1. Introduction
In recent years, with the large-scale construction of urban rail transit and the diversification of transportation modes, the development of conventional public transport has been confronted with unprecedented challenges and pressures. From 2019 to 2024, the annual passenger volume of conventional public transport across the country decreased from 69.176 billion person-times to 38.670 billion person-times, with a decline rate of 44.1% [
1]. This is mainly due to the continuous improvement of the metro network and the rapid growth in the ownership of electric bicycles. Faced with the new transportation situation, the development of conventional public transport should change traditional thinking, actively promote the supply-side reform of conventional public transport services, and shift from the previous focus on increasing the “quantity” of supply—such as simply expanding the density of the route network and the frequency of departures—to enhancing the “quality” of public transport services.
Accurate knowledge of passenger travel demand is a prerequisite for improving public transport service quality. Currently, the technology for estimating public transport passengers’ OD (Origin-Destination) data is relatively mature, but its spatial precision is limited: it can only estimate data down to the level of public transport stops. However, improving the “quality” of the supply side of conventional public transport services depends on precise POI-level OD data of public transport passengers. Such data reflect residents’ travel patterns and travel demands. They can provide data support for establishing a system of refined public transport implementation plans, identifying the travel needs of public transport passengers, and enhancing the targeting and precision of public transport services [2,3].
The commonly used method to obtain precise Origin-Destination (OD) data of public transport passengers is the manual survey method [
4], which is usually conducted in the form of questionnaires. This method has the advantage of being able to acquire comprehensive and accurate public transport passenger OD data. However, it also has drawbacks: poor repeatability, high labor costs, and it is often impossible to implement on a large scale. As a core component of precise OD data [
5], the residential distribution of public transport passengers can fully reflect the status of urban public transport and urban spatial layout. It serves as an important support for formulating urban comprehensive transportation system plans and urban spatial strategic plans. During the morning peak hours, public transport trips are highly concentrated, with commuting passengers being the main group of travelers, and residential areas being the primary starting points of commuting [
6,
7]. Therefore, morning peak public transport trip data can be used to study the distribution of passengers’ residential areas. Meanwhile, POI data has been widely applied in research on residents’ activities, the extraction of urban functional zones, the analysis of urban business formats, and other fields. Residential areas, also referred to as residential POI, are a special type of POI. With the development of Location-Based Services (LBS), real estate online platforms, and web crawler technology, open platforms such as Baidu Maps, Amap, and Tencent Maps provide API interfaces for high-concurrency data retrieval services, enabling efficient acquisition of information such as POI geographic coordinates [
8]. Major domestic real estate websites, including Lianjia, Fang.com, and Anjuke, record detailed information about residential POI, such as housing prices, property types, and longitude and latitude. Thus, it has become possible to automatically collect residential POI data in batches [
9,
10,
11].
In summary, data on the residential distribution of public transport passengers is of great significance for solving public transport-related issues, while the traditional manual survey method has inherent shortcomings. With the development of relevant technologies, residential POI—which are convenient to collect—have provided a new research perspective for obtaining data on the residential distribution of public transport passengers. Against this backdrop, this study focuses on realizing the estimation of the residential distribution of public transport passengers based on POI information accessible from stops and morning peak travel data. This aims to meet the demand for efficiently, conveniently, and cost-effectively obtaining data on the residential distribution of public transport passengers, to provide data support and a scientific basis for subsequent research, and to improve the sustainability level of urban transportation.
2. Literature Review
At present, exploration and research on the issue of obtaining data on the residential distribution of public transport passengers are mainly reflected in two aspects: research on public transport passengers’ Origin-Destination (OD) data and research on residents’ workplace-residence locations.
2.1. Public Transport Passengers’ OD Data
Some researchers at home and abroad analyze the original data of public transport IC card swipes and combine multi-source data obtained from other sources, such as on-board GPS data and vehicle operation data. Through data fusion, they form a dataset to estimate passengers’ Origin-Destination (OD) information. Additionally, some experts and scholars adopt methods such as intelligent video analysis and infrared sensor counting to identify the boarding and alighting behaviors of public transport passengers, thereby constructing a travel OD matrix. Wei Wang et al. [
12] used an automatic data collection system, leveraging public transport IC card transaction data and vehicle positioning data. Based on the principle of travel chains, they analyzed the boarding and alighting stops of public transport passengers in London, which were then used as the passengers’ OD points. Catherine Vanderwaart et al. [
13] designed a new service planning program that automatically aggregates public transport IC card data and vehicle positioning data to infer passengers’ boarding and alighting stops. These inferred stops were used as passengers’ OD information to address issues related to public transport network design and service planning. Widyawan et al. [
2] conducted an analysis at the level of public transport stops and routes based on the principle of travel chains. They used public transport IC card data to construct an initial OD matrix table for passengers and added the judgment of passenger behavior patterns, which was applied to quickly support the operational planning of public transport. R. Takao et al. [
3] utilized an intelligent video analysis system to identify information about the bus stops where passengers board and alight. This information was used as passenger OD data to understand passenger flow distribution, thereby providing data support for tasks such as extracting potential demand, adjusting timetables, and modifying routes. Lan Cheng [
14] collected data using an automatic passenger counting (APC) system based on infrared technology, matched the data to public transport routes and stops, and used this to estimate passengers’ travel OD information. Rodríguez González AB et al. [
15] used radio frequency identification (RFID) technology to count passengers, proposed a complete BIBO (Board-In Board-Out) system, and designed corresponding algorithms for calculating individual trips of each passenger and the corresponding OD matrix. Finally, the effectiveness of the system and algorithms was verified through two practical experiments.
2.2. Workplace-Residence Locations
Some researchers at home and abroad have conducted studies on the workplace-residence locations of urban residents. They use mobile phone signaling data to analyze features such as call locations, stay duration, and stay time periods, thereby proposing methods to identify workplace-residence locations. Additionally, some experts and scholars integrate multi-source data, including vehicle data, land use data, and social media data, to identify residents’ workplace-residence locations, aiming to provide new approaches for research on urban residents’ workplace-residence locations. Ge Q. et al. [
16] developed a maximum entropy model for estimating the Origin-Destination (OD) of residents’ residence- and work-related trips. This model utilizes mobile phone signaling data and is based on a sequence update algorithm grounded in the principle of maximum entropy. Zang H. et al. [
17] integrated mobile phone call data with available public data (e.g., census data). They identified residents’ workplace-residence locations by analyzing the frequency of mobile phone users’ call events. Frias-Martinez V. et al. [
18] identified individuals’ key activity locations through analytical methods such as spatial clustering, based on the spatial distribution of mobile phone calls. They then determined workplace-residence locations by combining these locations with the corresponding activity times. Shiva R. et al. [
7] constructed an activity-based travel demand model using mobile phone signaling data, which is built on a neuro-fuzzy inference system and a hidden Markov model. This model distinguishes commuters and further infers their workplace-residence locations. The effectiveness of the model was verified by comparing its results with three types of data: expert-labeled on-site actual data, activity-based trip volumes generated and attracted by different regions, and highway traffic volume data from survey reports. Jang Y. et al. [
6] designed an algorithm for pedestrian detection based on mobile phone mobility data and GPS base station information. This algorithm identifies pedestrian travel patterns and infers trip origins and destinations. Finally, the algorithm’s effectiveness was validated by comparing its outputs with data from household questionnaires.
2.3. Summary of the Literature Review
The research reviewed above supports the following conclusions:
(1) In the current research on public transport passengers’ Origin-Destination (OD) data, the boarding and alighting stops of passengers are basically taken as the passengers’ OD for research, while the actual locations of passengers’ OD are ignored. This deviates from the real situation and has a certain discrepancy with passengers’ actual travel demands, which hinders the improvement of the targeting and precision of public transport services.
(2) In the current research on residence estimation, most studies use mobile phone signaling data for estimation. The estimated residences are the residences of mobile phone users, and there is a lack of residence data specifically for public transport passengers. In addition, there are two disadvantages in using mobile phone signaling data to estimate workplace-residence locations: on the one hand, considering that current users pay more attention to protecting their own privacy, it is difficult to obtain such data; on the other hand, mobile phone signaling data needs to be purchased from mobile operators, and the cost is relatively high.
In summary, there are few current studies on the estimation of public transport passengers’ residences. The methods for obtaining data on the residential distribution of public transport passengers are inefficient, cumbersome, and expensive, and there is a serious lack of such data, which makes it difficult to provide a data reference and theoretical basis for accurately identifying passengers’ demands.
To address the above problems, this study explores an approach to estimate the residential distribution of public transport passengers based on the data of residential POI accessible from stops. Specifically, it integrates and uses the information of POI accessible from stops and public transport passenger flow data during morning peak hours, and conducts an in-depth analysis of the impacts of factors such as housing prices and types of residential POI, the convenience of public transport and subways, and the convenience of public transport stops on residents’ willingness to travel to public transport stops. Furthermore, a regression model for the number of public transport passengers from residential POI to stops is constructed to estimate the number of passengers traveling from each residential POI to all accessible public transport stops. Taking this number as the weight, the actual passenger flow of each public transport stop is allocated to the respective residential POI, and finally, the estimation of the residential distribution of public transport passengers is realized.
3. Methodology and Data
3.1. Problem Transformation
The research problem in this paper is the projection of transit passengers’ residences, which ultimately requires the distribution of transit passengers by residence. From the perspective of a residence, this means obtaining the number of passengers travelling from that residence to each reachable bus station. This number is determined by the population living in the residence and the residents’ willingness to travel to the bus station. The residential bus station travel ratio [19,20] (hereafter, the travel ratio) expresses this willingness, as shown in Equation (1). Because the population in a residence is a known constant, the research problem is transformed into the projection of the travel ratio.

$$Q_{ij} = P_i \, R_{ij} \tag{1}$$

where $Q_{ij}$ denotes the number of passengers from residence $i$ to the reachable bus station $j$; $P_i$ denotes the population in residence $i$; $R_{ij}$ denotes the ratio of trips from residence $i$ to its reachable bus station $j$; $i = 1, 2, \ldots, m$, where $m$ is the number of residences in the tract; and $j = 1, 2, \ldots, n_i$, where $n_i$ is the number of bus stations reachable by the residents of residence $i$ in the tract.
3.2. Influencing Factors
To estimate the proportion of trips to public transport stops, it is first necessary to explore the specific influencing factors. This paper conducts a further analysis from two aspects: public transport travel factors and public transport stop selection factors. The reason is that public transport travel factors affect residents’ choice of travel mode, while public transport stop selection factors influence residents’ choice of specific public transport stops.
Currently, research on the influencing factors of travel mode choice is quite mature. The main influencing factors include [21,22] residents’ income, age, car ownership, occupation, and the convenience of travel by bus or subway. In studies on public transport passengers’ stop selection [23,24], stop satisfaction is usually used as the evaluation criterion. The distance between the stop and the residential POI, and the number of bus routes at the stop, are used to characterize this satisfaction.
3.2.1. Indirect Influencing Factors
The available data do not directly provide residents’ per capita income, average age, per capita car ownership, or occupation proportion. Further analysis shows that the housing price of a residential POI is directly related to residents’ per capita income [25,26]; urban villages have a larger proportion of young people, while the age distribution in residential communities is more balanced [27,28,29]; and residential communities usually have more parking spaces, so their per capita car ownership is higher than that of urban villages [30,31]. Therefore, this paper uses the housing price and type of residential POI to indirectly reflect residents’ per capita income, average age, and per capita car ownership. The indirect parameter indicators are detailed in Table 1. Because no proxy for residents’ occupation proportion can be derived from the available data, this influencing factor is excluded.
3.2.2. Direct Influencing Factors
The scope of factors considered for the convenience of public transport, subways, and public transport stops in residential areas is relatively broad, among which accessibility [32,33,34] is the key aspect. Therefore, this paper selects two indicators as parameters for the convenience of public transport and subways: the number of public transport and subway lines accessible within a 500 m walking distance of a residential POI, and the average distance from the residential POI to the corresponding stops. For the convenience of public transport stops, the paper selects the number of lines at each stop reachable from the residential POI and the distance from the residential POI to that stop as parameter indicators.
Table 1 presents the indicators for each influencing factor.
3.2.3. Determination of Functional Relationships
This section fits several candidate functional forms between each influencing factor and the public transport travel ratio. The results indicate that the relationships between the community-related influencing factors and the travel ratio are all linear, while those between the urban village-related influencing factors and the travel ratio follow a power function. The fitting errors are presented in Table 2.
3.3. Regression Model for the Number of Public Transport Passengers from POI to Stops
3.3.1. Original Model
The residential POI data are divided into two categories: communities and urban villages. The regression model of residential POI bus station ridership is constructed from the perspective of stations, as shown in Equation (2).

$$\hat{T}_j = \sum_{i'=1}^{m'} P_{i'} R_{i'j} + \sum_{i''=1}^{m''} P_{i''} R_{i''j} \tag{2}$$

where $\hat{T}_j$ denotes the projected total number of passengers at bus station $j$; $P_{i'}$ denotes the population in community $i'$; $R_{i'j}$ denotes the ratio of trips from community $i'$ reachable from bus station $j$; $P_{i''}$ denotes the population in urban village $i''$; $R_{i''j}$ denotes the ratio of trips from urban village $i''$ reachable from bus station $j$; $m'$ denotes the number of communities in the tract; $m''$ denotes the number of urban villages within the tract; and $j = 1, 2, \ldots, N$, where $N$ is the number of bus stations within the tract that are reachable from communities and urban villages.
Further, based on the functional relationships determined in the previous section, all influencing factors related to the travel ratio of communities are linear, so the community travel ratio is the sum of each factor multiplied by its coefficient, while the influencing factors related to the travel ratio of urban villages follow a power function. Equation (2) can therefore be written as Equation (3) [35,36].

$$\hat{T}_j = \sum_{i'=1}^{m'} P_{i'} \left( a_1 H_{i'} + a_2 B_{i'} + a_3 M_{i'} + a_4 D_{i'} + a_5 L_j + C \right) + \sum_{i''=1}^{m''} P_{i''} \, k L_j^{S} \tag{3}$$

where $H_{i'}$, $B_{i'}$, $M_{i'}$, $D_{i'}$, and $L_j$ denote the house price of community $i'$, the number of reachable bus lines, the number of reachable subway lines, the average distance of reachable subway stations, and the number of lines of bus station $j$, respectively; $a_1$, $a_2$, $a_3$, $a_4$, and $a_5$ denote the coefficients of the influencing factors related to the travel ratio of communities; $C$ denotes a constant; in the urban village term, $L_j$ denotes the number of lines of bus station $j$; $k$ denotes the coefficient of the influencing factors related to the travel ratio of urban villages; and $S$ denotes the power function coefficient.

Equation (2) can be further transformed into Equation (4):

$$\hat{T}_j = a_1 \sum_{i'} P_{i'} H_{i'} + a_2 \sum_{i'} P_{i'} B_{i'} + a_3 \sum_{i'} P_{i'} M_{i'} + a_4 \sum_{i'} P_{i'} D_{i'} + \left( a_5 L_j + C \right) \sum_{i'} P_{i'} + k L_j^{S} \sum_{i''} P_{i''} \tag{4}$$

The analysis reveals that each coefficient $a_1, \ldots, a_4$ of the influencing factors related to the community travel ratio appears coupled with the population $P_{i'}$ of each community.
3.3.2. Improved Model
Considering that the total population in the communities reachable from a station is a fixed constant and that the total number of passengers depends only on the overall travel ratio influence factors, the model is improved to Equation (5). Following the formula for the number of passengers at bus stations from urban villages, the population in the community term of Equation (4) is separated from the influence factors, which avoids the problems of the original model.

$$\hat{T}_j = P'_j \left( a_1 H_j + a_2 B_j + a_3 M_j + a_4 D_j + a_5 L_j + C \right) + P''_j \, k L_j^{S} \tag{5}$$

where $P'_j$ denotes the total population in the communities reachable by bus station $j$; $H_j$, $B_j$, $M_j$, and $D_j$ denote the housing price, the number of reachable bus lines, the number of reachable subway lines, and the average distance of reachable subway stations for the communities reachable by bus station $j$, respectively; $L_j$ denotes the number of lines of bus station $j$; $P''_j$ denotes the total population in the urban villages reachable by bus station $j$, i.e., $P''_j = \sum_{i''} P_{i''}$; and $j = 1, 2, \ldots, N$, where $N$ is the number of bus stations in the tract.

Based on the improved model, the actual passenger flow at bus station $j$ is used as an approximation of $\hat{T}_j$, and the unknown coefficients $a_1$, $a_2$, $a_3$, $a_4$, $a_5$ (community factor coefficients), $C$ (constant), $k$ (urban village coefficient), and $S$ (power function coefficient) are solved. The residential POI data are then input, and the number of passengers from each residential POI to each reachable bus station can be deduced.
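As a rough illustration of how the improved model could be evaluated for one station, the sketch below assumes that the community term is the reachable community population multiplied by a linear combination of the influencing factors, and that the urban village term is the reachable urban village population multiplied by a power function of the number of lines at the station. The function names, the parameter dictionary, and all numeric values are hypothetical and are not taken from the paper.

```python
def community_ratio(price, bus_lines, metro_lines, metro_dist,
                    station_lines, coef, const):
    """Linear travel ratio for the communities around one station (assumed form)."""
    a1, a2, a3, a4, a5 = coef
    return (a1 * price + a2 * bus_lines + a3 * metro_lines
            + a4 * metro_dist + a5 * station_lines + const)


def village_ratio(station_lines, k, s):
    """Power-function travel ratio for urban villages (assumed form)."""
    return k * station_lines ** s


def station_ridership(pop_community, pop_village, features, params):
    """Projected riders at one station: population times ratio, per category.

    features = (price, bus_lines, metro_lines, metro_dist, station_lines)
    """
    r_c = community_ratio(*features, params["coef"], params["const"])
    r_v = village_ratio(features[-1], params["k"], params["s"])
    return pop_community * r_c + pop_village * r_v


# Hypothetical coefficients and inputs, for illustration only.
example_params = {"coef": (0.001, 0.002, 0.003, -0.01, 0.004),
                  "const": 0.01, "k": 0.01, "s": 0.5}
example_features = (8.0, 5, 2, 0.4, 6)
example_total = station_ridership(1000, 2000, example_features, example_params)
```

In practice the coefficients would come from fitting against observed station flows; here they are placeholders that only show how the two terms combine.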
3.4. Method of Projecting the Residence of Public Transport Passengers
3.4.1. Residence Projecting
Using the residential POI bus station ridership regression model, the number of passengers from each residential POI to each reachable bus station can be derived. Since the station ridership obtained by this projection deviates from the actual ridership, the projected number of passengers from each residential POI to each reachable bus station is used as a weight to allocate the actual passenger flow of each bus station to the residential POI, as shown in Equation (6), which finally realizes the projection of bus passenger residences.

$$A_{ij} = T_j \cdot \frac{Q_{ij}}{\hat{T}_j} \tag{6}$$

where $A_{ij}$ denotes the actual number of passengers at bus station $j$ whose residence is $i$; $T_j$ denotes the actual passenger flow at bus station $j$; $Q_{ij}$ denotes the imputed number of passengers from residential POI $i$ to bus station $j$; and $\hat{T}_j$ denotes the imputed total number of passengers at bus station $j$, i.e., $\hat{T}_j = \sum_{i} Q_{ij}$.
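The weighted allocation described above can be sketched in a few lines. This is a minimal illustration under the stated rule (each residential POI receives a share of the station's actual flow proportional to its imputed passenger count); the function name and the sample numbers are hypothetical.

```python
def allocate_station_flow(actual_flow, imputed_by_poi):
    """Split one station's actual flow across residential POIs.

    imputed_by_poi maps a POI identifier to the imputed number of passengers
    travelling from that POI to this station; these values act as weights.
    """
    imputed_total = sum(imputed_by_poi.values())
    if imputed_total <= 0:
        # No usable weights: nothing can be attributed to any POI.
        return {poi: 0.0 for poi in imputed_by_poi}
    return {poi: actual_flow * imputed / imputed_total
            for poi, imputed in imputed_by_poi.items()}


# Hypothetical station with 120 observed riders and two reachable POIs.
shares = allocate_station_flow(120, {"community_A": 30, "village_B": 10})
```

Because the weights are normalized by their sum, the allocated values always add up to the observed station flow, regardless of how far the imputed total is from it.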
3.4.2. Travel Ratio Projecting
Based on Equation (5), the travel ratio in the community is the linear term $a_1 H_j + a_2 B_j + a_3 M_j + a_4 D_j + a_5 L_j + C$, and the travel ratio in the urban village is the power term $k L_j^{S}$. Since the travel ratio projected in this way deviates from the actual one, this section projects the travel ratio $R_{ij}$ from the residence to the bus station based on the projected residences of the bus passengers, according to Equation (7). Taking the residence as the main body, the actual numbers of passengers at the bus stations reachable from the residence are summed to obtain the actual number of bus passengers in the residence, and the ratio of bus trips in the residence is then calculated, as shown in Equation (8).

$$R_{ij} = \frac{A_{ij}}{P_i} \tag{7}$$

$$R_i = \frac{A_i}{P_i} \tag{8}$$

In Equation (7), $R_{ij}$ denotes the ratio of trips from residence $i$ to its reachable bus station $j$; $A_{ij}$ denotes the actual number of passengers at bus station $j$ whose residence is $i$; and $P_i$ denotes the population of residence $i$. In Equation (8), $R_i$ denotes the ratio of bus trips of residence $i$, and $A_i$ denotes the actual number of bus passengers of residence $i$, i.e., $A_i = \sum_{j} A_{ij}$.
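The two ratios can be computed directly from the allocated passenger counts. The sketch below assumes the allocation is stored as a mapping from (residence, station) pairs to passenger counts; that data layout, the function name, and the sample values are illustrative assumptions.

```python
from collections import defaultdict


def travel_ratios(allocated, population):
    """Return (ratio per residence-station pair, bus trip ratio per residence).

    allocated:  {(residence, station): allocated actual passengers}
    population: {residence: resident population}
    """
    pair_ratio = {}
    riders_by_residence = defaultdict(float)
    for (residence, station), riders in allocated.items():
        pair_ratio[(residence, station)] = riders / population[residence]
        riders_by_residence[residence] += riders
    residence_ratio = {res: total / population[res]
                       for res, total in riders_by_residence.items()}
    return pair_ratio, residence_ratio


# Hypothetical allocation for two residences and two stations.
sample_allocated = {("A", "s1"): 90.0, ("A", "s2"): 30.0, ("B", "s1"): 30.0}
sample_population = {"A": 1200, "B": 600}
pair_ratio, residence_ratio = travel_ratios(sample_allocated, sample_population)
```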
3.5. XGBoost-Based Residential POI Bus Station Ridership Projecting Model
3.5.1. Introduction to the Algorithm
In this paper, a model built on a deep learning algorithm is not used as the main model for two reasons [37,38,39]. First, deep learning requires a large amount of data, and only 5050 samples are used to construct the model in this paper, which may lead to poor generalization. Second, a deep learning model cannot explain its own structure and may depart from the actual relationship; for example, it may represent the resident population and the travel ratio through a non-multiplicative functional relationship. However, to further verify the validity of the regression model, an XGBoost-based residential POI bus station ridership imputation model is constructed in this paper as a reference.
XGBoost is an improved learning algorithm based on the Gradient Boosting Decision Tree (GBDT). Its principle is to iteratively combine a large number of weak learners into a strong learner to achieve accurate predictions. It is an efficient implementation of GBDT, and its advantages are mainly reflected in two aspects. First, compared with GBDT, the objective function of XGBoost adds a regularization term, which helps to reduce the model variance and prevent overfitting. Second, GBDT uses only a first-order (negative gradient) Taylor expansion of the loss, while XGBoost uses a second-order Taylor expansion, which improves the prediction accuracy of the XGBoost algorithm.
3.5.2. Model Construction
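In this section, feature engineering is applied to build the input dataset, and the model is then trained and tuned to obtain the XGBoost-based residential POI bus station ridership imputation model.
(1) Feature Engineering
The feature inputs of the XGBoost model are consistent with those of the regression model. The community data have six inputs: the total population of the communities reachable from the station, the housing price, the number of reachable bus lines, the number of reachable subway lines, the average distance of reachable subway stations, and the number of lines at the bus station. The urban village data have two inputs: the total population of the urban villages reachable from the station and the number of lines at the bus station. All of these feature inputs are numerical variables with values in the range [0, +∞).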
(2) Model Tuning
The parameters of XGBoost include general parameters, booster parameters, and learning task parameters. The general parameters set the overall functionality, the booster parameters control each step of the regression tree, and the learning task parameters guide the optimization task. In this paper, these parameters are tuned using grid search with cross-validation, which iterates through all combinations of the supplied parameter values and returns the cross-validated evaluation score for each combination.
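The grid search cross-validation procedure can be illustrated without any external library. The sketch below is not the paper's XGBoost tuning code: it swaps in a hypothetical one-parameter stand-in model (a shrunken slope through the origin) so that the search loop itself stays visible, and every name in it is an assumption made for illustration.

```python
from itertools import product
from statistics import mean


def k_fold_indices(n, k):
    """Yield (train_indices, test_indices) for k interleaved folds."""
    folds = [list(range(f, n, k)) for f in range(k)]
    for held_out in folds:
        held = set(held_out)
        yield [i for i in range(n) if i not in held], held_out


def grid_search_cv(param_grid, fit, predict, xs, ys, k=5):
    """Score every parameter combination by mean absolute error over k folds."""
    names = sorted(param_grid)
    scores = {}
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        fold_errors = []
        for train, test in k_fold_indices(len(xs), k):
            model = fit([xs[i] for i in train], [ys[i] for i in train], **params)
            fold_errors.append(mean(abs(predict(model, xs[i]) - ys[i]) for i in test))
        scores[values] = mean(fold_errors)
    best = min(scores, key=scores.get)
    return dict(zip(names, best)), scores


# Stand-in model (NOT XGBoost): slope through the origin, shrunk by `lam`.
def fit_slope(xs, ys, lam):
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)


def predict_slope(slope, x):
    return slope * x


demo_x = [float(v) for v in range(1, 11)]
demo_y = [2.0 * v for v in demo_x]
best_params, all_scores = grid_search_cv(
    {"lam": [0.0, 10.0, 100.0]}, fit_slope, predict_slope, demo_x, demo_y)
```

With a real gradient boosting library, `fit` and `predict` would wrap that library's training and prediction calls, and the grid would hold its tree depth, learning rate, and similar parameters.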
3.6. Study Area
In this paper, the proposed model and method are validated and evaluated using Shenzhen as an example. For the bus travel data, morning peak (7:00–9:00) trips in December 2020 were selected, with a daily average of 838,900 records, accounting for 28.92% of whole-day trips. The average temporal distribution of passenger flow is shown in Figure 1; it rises and then falls, peaking at around 8:00. The average spatial distribution of passenger flow is shown in the heat map in Figure 2; passenger flow is concentrated mainly in the city center. The station and line data include 5050 bus stations, 1020 bus lines, 234 subway stations, and 11 subway lines. The residential POI data comprise 6282 records; counted per bus station, the number of reachable communities is 36,800 and the number of reachable urban villages is 29,800.
3.7. Data Resources
3.7.1. Residential POI Data Crawling
(1) Website Choosing
Among the mainstream real estate websites in China, Fang.com and Anjuke have anti-crawler mechanisms that require manual slider verification at regular intervals, whereas Lianjia does not, so its data can be crawled automatically and in bulk. Lianjia also covers 82 popular cities in China and holds 230,000 residential records, which meets the data demand.
(2) Crawling method
This paper adopts a breadth-first traversal crawling strategy and uses Python (PyCharm 2025.1.1) to build six sub-functional modules: a request configuration module, a URL de-duplication module, a robots protocol module, a web crawling module, a web parsing module, and a storage module. These modules crawl the residential POI data of Lianjia automatically in batches while following standard operating procedures.
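How the breadth-first strategy, URL de-duplication, and robots protocol check fit together can be sketched with the Python standard library. This is an illustrative outline, not the crawler used in the paper: the `fetch` and `extract_links` callables are hypothetical placeholders for the request and parsing modules, and the example site in the usage note is made up.

```python
from collections import deque
from urllib.parse import urljoin
from urllib.robotparser import RobotFileParser


def crawl_breadth_first(seed_url, fetch, extract_links, robots,
                        user_agent="*", max_pages=100):
    """Visit pages level by level, skipping duplicates and disallowed URLs.

    fetch(url) -> page content; extract_links(content) -> iterable of hrefs.
    robots is a RobotFileParser that has already been loaded.
    """
    frontier = deque([seed_url])   # FIFO queue gives breadth-first order
    seen = {seed_url}              # URL de-duplication
    pages = {}                     # simple in-memory storage
    while frontier and len(pages) < max_pages:
        url = frontier.popleft()
        if not robots.can_fetch(user_agent, url):
            continue               # respect the robots protocol
        content = fetch(url)
        pages[url] = content
        for href in extract_links(content):
            link = urljoin(url, href)
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return pages


def robots_from_lines(lines):
    """Build a RobotFileParser from robots.txt lines without network access."""
    parser = RobotFileParser()
    parser.parse(lines)
    return parser
```

For a live site, the parser would instead be pointed at the site's robots.txt with `set_url` and `read`, and `fetch` would wrap an HTTP client with the request configuration (headers, delays, retries) that the paper mentions.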
(3) Results
For Shenzhen, 4574 communities and 1708 urban villages were crawled. The communities are mainly distributed in the city center, with an average price of 8826.02 USD/m² and a total population of 5,361,900; the urban villages are more scattered, with an average price of 7934.94 USD/m² and a total population of 10,479,900. The total population of the residential POI (15,841,800) is compared with the resident population of 17,560,100 in the Seventh Census Analysis Report; the relative error is −9.79%, so the crawling result is basically in line with reality. The distribution of communities and urban villages is shown in Figure 3 and Figure 4.
3.7.2. Bus and Subway Data Obtaining
(1) Platform choosing
Among the mainstream domestic open map platforms, the personal quota of the Baidu data retrieval function is 30,000/day, while those of Tencent and Amap (Gaode) are 10,000/day and 5000/day, respectively; therefore, this paper chooses the Baidu Map open platform to obtain bus and subway data.
(2) Obtaining method
In this paper, Python is used to write query statements conforming to the platform format, access the application programming interfaces (APIs) of the Baidu Map open platform, call its high-concurrency data retrieval function, and obtain the bus and subway data from Baidu Map automatically in batches.
(3) Results
For Shenzhen, 1020 bus lines, 5050 bus stations, 11 metro lines, and 234 metro stations were obtained. After comparing and checking with the data of the Shenzhen Municipal Bureau of Transportation, the obtained data are relatively complete. The distribution maps of bus and subway stations are shown in Figure 5 and Figure 6.
4. Results and Discussion
4.1. Selection of Models for Comparison
The main models that accept multi-variable input include XGBoost, KNN regression, BP neural networks, and LSTM. The advantages and disadvantages of these four algorithms are summarized in the table below.
| Models | Advantages | Disadvantages |
| --- | --- | --- |
| XGBoost | (1) High prediction accuracy. The objective function is regularized to prevent overfitting, the second-order derivative is used when minimizing the objective function, enabling more accurate identification of the optimal solution, and shrinkage (learning rate) is applied. Reducing the learning rate increases the number of iterations, which acts as a regularization mechanism and improves model performance. (2) Fast computation speed. Supports feature sampling to reduce overfitting, lower computational complexity, and accelerate parallelization, performs parallel optimization at the feature granularity, significantly reducing the computational load, allows specifying default branch directions for specific values or missing values, greatly enhancing algorithm efficiency. (3) Low susceptibility to overfitting, low bias, and excellent generalization ability. (4) Effective solution for high-dimensional data problems. | Numerous and complex parameters make parameter tuning difficult. |
| KNN | (1) Simple concept, applicable to both classification and regression tasks; (2) No assumptions about data, and insensitive to outliers; (3) Relatively fast computation speed: it only needs to store training samples and labels, without parameter estimation or model training. | (1) Low efficiency: distances between the test data and all training data must be calculated, resulting in low computational efficiency when the data volume is large; (2) High dependence on training data: when the samples are imbalanced, the prediction accuracy for rare classes is low; (3) Poor performance in handling high-dimensional data. |
| BP Neural Networks | (1) Possesses strong nonlinear mapping capability, making it suitable for solving problems with complex internal mechanisms. (2) High self-learning and adaptive capabilities. (3) Strong generalization ability. | (1) The algorithm is prone to falling into local minima; (2) Slow convergence speed and low algorithm efficiency; (3) Prone to overfitting; (4) Lack of a unified standard for network structure selection. |
| LSTM | (1) Suitable for time-series data; (2) Alleviates the gradient vanishing or exploding problem in long-sequence tasks. | (1) Suboptimal performance in handling long-sequence tasks; (2) Slow computation speed and inability to support parallel processing. |
The data in this study fall into two categories: the data used for modeling, and the residential POI data that are ultimately input for prediction. Because the model is built on the former and applied to the latter, it must generalize well. This study therefore adopts the XGBoost algorithm for modeling, which offers both high prediction accuracy and excellent generalization ability.
4.2. Regression Model Validation
4.2.1. Model Statistical Analysis
The correlation coefficients (R-values) between the model's MAPE and the number of residents in residential districts and in urban villages are 0.678 and 0.560, respectively, both passing the significance test. This indicates that the model accuracy decreases as the number of residents corresponding to a residential POI increases. The R-values for the number of bus routes at stations and for passenger flow volume are -0.690 and -0.795, respectively, also passing the significance test. This shows that the model accuracy improves as the number of bus routes at stations and the passenger flow volume increase. In contrast, residential district housing prices, the number of accessible bus routes, the number of accessible metro routes in residential districts, and the average distance between stations all fail the significance test and have no significant impact on the model's MAPE.
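As a rough illustration of the statistical analysis above, the following sketch computes a Pearson correlation coefficient between one influencing factor and the per-sample absolute percentage error. It assumes that the reported R-values are Pearson coefficients, which the text does not state explicitly; the function name `pearson_r` and the toy data are hypothetical and are not taken from the study.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)

# Toy data (hypothetical): the error shrinks as a station serves more routes.
routes_per_station = [2, 4, 6, 8, 10]
abs_pct_error = [95.0, 80.0, 70.0, 55.0, 50.0]
r = pearson_r(routes_per_station, abs_pct_error)
print(f"R = {r:.3f}")
```

A negative coefficient in this toy case mirrors the direction reported for the number of bus routes at stations; a significance test would still be needed before drawing conclusions.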
4.2.2. Model Validity Analysis
Because no observed data on bus passengers' residences are available for comparison with the projected residences, this section instead verifies the accuracy of the projected total residential POI bus station ridership.
- (1)
Residential POI bus station ridership regression model
① Improved model
Based on Equation (5), the modeling data were first divided into a training set and a test set, with the training set accounting for 80% of the data. The coefficients of the regression model were then calculated from the training set. Finally, the total number of passengers at residential POI bus stations was projected from the test set, and the mean absolute error (MAE), root mean square error (RMSE), and MAPE were calculated using the actual passenger flow at bus stations as the true value.
② Original model
Based on Equation (4), the accuracy of the original model is calculated from the corresponding data and then compared with the results of the improved model.
- (2)
XGBoost-based residential POI bus station ridership projection model
In the same way as ①, after the model is tuned on the training set, the test set is input to project the total number of passengers at residential POI bus stations, and the MAE, RMSE, and MAPE are calculated.
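The validation procedure described above (an 80/20 split followed by MAE, RMSE, and MAPE against observed station flow) can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the helper names, the fixed random seed, the choice to skip stations with zero observed flow in MAPE, and the toy numbers are all assumptions.

```python
import math
import random

def train_test_split(rows, train_share=0.8, seed=0):
    """Shuffle rows reproducibly and split them into training and test lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_share)
    return shuffled[:cut], shuffled[cut:]

def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean square error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def mape(y_true, y_pred):
    """MAPE in percent; samples with zero observed flow are skipped (an assumption)."""
    pairs = [(t, p) for t, p in zip(y_true, y_pred) if t != 0]
    return 100.0 * sum(abs(t - p) / abs(t) for t, p in pairs) / len(pairs)

# Hypothetical observed and projected station ridership.
observed = [100.0, 200.0, 50.0, 400.0]
projected = [120.0, 150.0, 80.0, 380.0]
train, test = train_test_split(range(10))

print(len(train), len(test))
print(mae(observed, projected), rmse(observed, projected), mape(observed, projected))
```

Because MAPE divides by the observed value, stations with small observed flow can dominate it even when MAE and RMSE look moderate.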
The comparison results of the above models are shown in Table 3. The improved residential POI bus station ridership regression model has the smallest MAE, RMSE, and MAPE and is therefore the most accurate of the three models. However, its MAPE still reaches 72.024%, indicating that its prediction accuracy is only moderate. Nevertheless, considering the complexity of the problem, the model is deemed to have a certain reference value.
To further evaluate the best-performing model, the predicted and true values for the training set and the test set are compared in Figure 7 and Figure 8. The upper and lower lines in each figure mark errors of plus and minus 30% of the true values; the more points that fall within this range, the higher the accuracy of the model. As shown, the predicted values are fairly concentrated and the results are reasonable. The share of results with an error within 30% is 28.83% for the training set and 27.12% for the test set, and the accuracy of the model is acceptable.
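The share of points falling inside the plus or minus 30% band in Figures 7 and 8 could be computed along the following lines. The function name and the sample values are hypothetical, and the sketch assumes the band is defined relative to the true value, as the text describes.

```python
def share_within_band(y_true, y_pred, tolerance=0.30):
    """Share of predictions whose error is at most `tolerance` times the true value."""
    pairs = [(t, p) for t, p in zip(y_true, y_pred) if t > 0]
    if not pairs:
        raise ValueError("no positive true values to compare against")
    hits = sum(1 for t, p in pairs if abs(p - t) <= tolerance * t)
    return hits / len(pairs)

# Hypothetical observed and projected station ridership.
observed = [100.0, 200.0, 50.0, 400.0]
projected = [120.0, 150.0, 80.0, 380.0]
share = share_within_band(observed, projected)
print(f"{share:.2%} of predictions fall within the band")
```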
4.2.3. Reachable Distance Validation
In addition to the 500 m setting, three further reachable distances (300 m, 700 m, and 1000 m) were tested. The corresponding data were input to calculate the model MAPE for each distance, and the comparison results are shown in Figure 9. The results show that as the reachable distance increases, the model MAPE gradually increases and the projection performance deteriorates. Once the reachable distance exceeds 500 m, the model performance changes more markedly and becomes relatively worse, suggesting that the maximum walking distance of bus passengers is within 500 m. It is therefore reasonable to set 500 m as the reachable threshold in this paper.
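To make the role of the reachable distance threshold concrete, the sketch below selects the bus stations that lie within a given radius of a residential POI. It assumes straight-line (great-circle) distance, in line with the radius-based definition acknowledged in Section 5; the coordinates, station identifiers, and function names are hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in metres

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def reachable_stations(poi, stations, threshold_m=500.0):
    """Return (station_id, distance_m) pairs within threshold_m, nearest first."""
    lat, lon = poi
    hits = []
    for station_id, (s_lat, s_lon) in stations.items():
        d = haversine_m(lat, lon, s_lat, s_lon)
        if d <= threshold_m:
            hits.append((station_id, d))
    return sorted(hits, key=lambda pair: pair[1])

# Hypothetical residential POI and stations (latitude, longitude).
poi = (22.5400, 114.0500)
stations = {"A": (22.5420, 114.0500), "B": (22.5400, 114.0570)}

for threshold in (300.0, 500.0, 700.0, 1000.0):
    ids = [sid for sid, _ in reachable_stations(poi, stations, threshold)]
    print(threshold, ids)
```

Raising the threshold can only add stations to the reachable set, which is one reason a larger radius may dilute the projection with stations that residents rarely walk to.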
4.3. Passenger Projection
4.3.1. Residency Projection
Based on the model coefficients in Section 4.2.3, the residential POI data are input to project the number of passengers traveling from each residential POI to each reachable bus station. To evaluate the accuracy of the projection, the total number of passengers at residential POI bus stations is then verified. The projected ridership from residential POIs to each reachable bus station is summarized in Table 4; the results indicate that urban village bus station ridership is greater than community bus station ridership.
Using the actual passenger flow at the bus stations as the true value, the MAE, RMSE, and MAPE of the total number of passengers at residential POI bus stations are 155.490, 266.405, and 81.272%, respectively. The predicted and true values are compared in Figure 10. The vast majority of points are concentrated on or near the three straight lines, which is consistent with the MAPE of 81.272%. Compared with the results of the improved regression model in Table 3, the MAPE calculated in this section is higher, while the MAE and RMSE are lower. This can be attributed to the larger data volume input in this section, which leads to slightly lower MAE and RMSE values.
Using the projected number of passengers from each residential POI to each reachable bus station as the weight, the actual number of passengers at each bus station is allocated among the residential POIs. The resulting actual numbers of passengers at residential bus stations are shown in Table 4. After the allocation, the community average is higher than before and the urban village average is lower. A likely reason is that, before the allocation, community ridership made up a larger share of the total ridership at the bus stations concerned; the results are relatively reasonable.
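The weighted allocation described above can be illustrated with a short sketch: the observed flow at one station is split among its reachable residential POIs in proportion to their projected flows. The function name, POI labels, and numbers are hypothetical, and a single station is shown for simplicity.

```python
def allocate_station_flow(actual_flow, projected_by_poi):
    """Split a station's observed flow across POIs in proportion to projected flow."""
    total = sum(projected_by_poi.values())
    if total <= 0:
        raise ValueError("projected flows must sum to a positive value")
    return {poi: actual_flow * weight / total for poi, weight in projected_by_poi.items()}

# Hypothetical projected passengers from three residential POIs to one station.
projected = {"community_A": 30.0, "urban_village_B": 60.0, "community_C": 10.0}
allocated = allocate_station_flow(150.0, projected)
print(allocated)
```

By construction the allocated values sum to the observed station flow, so the allocation corrects the overall level of the projection while keeping the relative shares between POIs.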
Further, the actual number of bus passengers at each place of residence is calculated from the actual number of passengers allocated to that residence at its bus stations. The results show that the average actual number of residential bus passengers is 132.59; the community average is 105.01, with a maximum of 2769, and the urban village average is 168.04, with a maximum of 3602. The heat map of the distribution of residential bus passengers is shown in Figure 11, which shows that residential bus passengers are concentrated in downtown areas where public transit is more convenient. Compared with the residential POI distribution heat map in Figure 12, the number of residential POIs in area ① and area ② is relatively close, but the number of residential bus passengers in area ② is much larger than in area ①. A likely reason is that area ①, located in the northwest of Baoan District, consists mainly of urban villages, where the willingness to travel by bus is weaker, whereas area ②, located in Longhua District, contains more communities, where the willingness to travel by bus is stronger. The heat map of the distribution of residential bus passengers with the reachable distance threshold set to 1000 m is further plotted in Figure 13. Comparing it with Figures 5–18 shows that the distribution of public transport passengers in communities is relatively concentrated, with a generally high willingness to travel by public transport and a high level of transportation sustainability. In contrast, the distribution of public transport passengers in urban villages is relatively scattered, with fewer public transport passengers in the city center, reflecting a generally low willingness to travel by public transport and a low level of transportation sustainability.
4.3.2. Trip Ratio Projection
- (1)
Ratio of trips to bus stations in residential areas
Based on the residence projection results, the ratio of trips to bus stations in residential areas is projected according to Equation (7). The average trip ratio of community bus stations is 0.0115, with a maximum of 0.194; the average trip ratio of urban village bus stations is 0.0073, with a maximum of 0.187. To further verify the relationship between the number of bus routes at a station and the station trip ratio in communities and urban villages, the trip ratios were divided into five intervals according to the quintile method, and the mean number of bus routes per station in each interval was calculated, as shown in Figure 14 and Figure 15.
- (2)
Ratio of trips by bus in residential areas
Based on the residence projection results, the ratio of trips by bus in residential areas is calculated according to Equation (8). The average ratio of trips by bus in residential areas is 0.0561. For comparison, dividing the daily average of 838,900 morning peak transit trips by the resident population of 17,560,100 in Shenzhen gives a ratio of 0.0478; the error is 17.61%, and the results are relatively reasonable. The average community bus trip ratio is 0.0573, with a maximum of 0.197; the average urban village bus trip ratio is 0.0509, with a maximum of 0.214. The frequency distribution of the bus trip ratio in residential areas is shown in Figure 16; low bus trip ratios are more common, which is consistent with the current travel behavior of residents. The frequency distributions of the bus trip ratio in communities and urban villages are shown in Figure 17 and Figure 18; overall, the bus trip ratio in communities is higher than that in urban villages. The heat maps of the bus trip ratio in communities and urban villages are shown in Figure 19 and Figure 20. As shown, the communities with a high bus trip ratio are mainly concentrated in the city center, whereas the urban villages with a high bus trip ratio lie outside the city center and are more scattered.
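The city-wide plausibility check in item (2) is simple arithmetic and can be reproduced as below. The three input numbers are taken from the text; the variable names are illustrative. The relative error obtained this way depends on how the intermediate ratio is rounded, so it may differ slightly from the reported 17.61%.

```python
morning_peak_trips = 838_900      # daily average morning peak transit trips (from the text)
resident_population = 17_560_100  # resident population of Shenzhen (from the text)
model_mean_ratio = 0.0561         # mean bus trip ratio projected by the model (from the text)

# City-wide share of residents making a morning peak transit trip.
citywide_ratio = morning_peak_trips / resident_population

# Relative deviation of the model mean from the city-wide ratio.
relative_error = abs(model_mean_ratio - citywide_ratio) / citywide_ratio

print(f"city-wide ratio: {citywide_ratio:.4f}")
print(f"relative error:  {relative_error:.2%}")
```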
Further, to verify the relationship between the community bus trip ratio and the community housing price, the number of reachable bus lines, the number of reachable subway lines, and the average distance to reachable subway stations, the community bus trip ratio was divided into five intervals according to the quintile method, and the mean values of these four factors in each interval were calculated, as shown in Figure 21, Figure 22, Figure 23 and Figure 24. The results show that the community bus trip ratio has a linear relationship with each of these factors, which is consistent with the functional relationships determined earlier; the projected community bus trip ratio is therefore relatively reasonable. In particular, Figure 22 indicates that the number of accessible bus routes has a negative linear correlation with the bus trip ratio. This is mainly because areas with a higher density of bus routes are usually located downtown, where metro stations are also relatively concentrated. Since passengers generally prefer metro travel, the bus trip ratio in these areas is lower.
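The quintile grouping behind Figures 14, 15 and 21 to 24 might be implemented along the following lines. The sketch assumes that the "quintile method" means equal-frequency binning at the 20th, 40th, 60th, and 80th percentiles of the trip ratio; the function name and the toy data are hypothetical.

```python
import statistics

def quintile_means(ratios, factor_values):
    """Bin samples by trip-ratio quintile and return the mean factor value per bin."""
    if len(ratios) != len(factor_values):
        raise ValueError("ratios and factor_values must have the same length")
    cuts = statistics.quantiles(ratios, n=5)  # four cut points between five bins
    bins = [[] for _ in range(5)]
    for ratio, value in zip(ratios, factor_values):
        index = sum(1 for cut in cuts if ratio > cut)  # number of cut points below the ratio
        bins[index].append(value)
    return [statistics.mean(b) if b else None for b in bins]

# Toy data (hypothetical): the factor falls steadily as the trip ratio rises.
ratios = [i / 100 for i in range(1, 11)]
reachable_bus_lines = [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
means = quintile_means(ratios, reachable_bus_lines)
print(means)
```

Plotting the five bin means against the bin order gives the kind of line chart used in the figures; a steadily falling sequence corresponds to the negative relationship discussed for Figure 22.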
5. Summary
Because data on bus passengers' residences are currently inconvenient and expensive to acquire, this paper considers the factors that influence the proportion of trips made to bus stations and proposes the idea of "estimating the residences of bus passengers from residential POI characteristics such as housing price, bus convenience, and station convenience". A regression model of residential POI bus station ridership and an XGBoost-based projection model of residential POI bus station ridership are constructed. The proposed models and method were then verified and evaluated for the city of Shenzhen using bus travel data, residential POI data, and subway line and station data. The results show that the improved regression model has the highest accuracy, and the calculated results are consistent with reality and relatively reasonable.
This study has the following three application values:
- (1)
Supplementing accurate OD information of public transport passengers to provide data reference for subsequent research
Compared with the information derived solely from public transport IC card data, the residential location information of public transport passengers estimated in this study is more accurate. It can serve as a supplement to the OD information of public transport passengers, thereby enabling a timely grasp of passengers’ travel demands and patterns, and providing a basis and reference for subsequent research.
- (2)
Enhancing public transport passenger satisfaction and improving public transport service quality
By understanding the residential distribution of public transport passengers, operational strategies that meet passengers’ personalized needs can be formulated. Improving passenger satisfaction and establishing a public reputation for convenience and fulfillment among passengers are crucial to enhancing service quality and public transport competitiveness. They also constitute a key link in achieving the 81% green transport mode share target by 2025.
- (3)
Laying the foundation for the overall optimization of the public transport network in the future
The application of the research findings helps accurately identify the actual demands of public transport passengers and provides data support for public transport administrative departments and operating enterprises to formulate rational capacity allocation plans and passenger transport organization schemes. It further delivers more targeted and higher-quality public transport services to citizens and contributes to the improvement of the sustainability level of urban transportation.
This research still has some problems and inadequacies. The following aspects remain to be explored in the future:
- (1)
The reachable distance in this paper is simply set as a radius of 500 m, which still has a certain deviation from the actual walking distance of passengers. In future research, more accurate walking distances can be used, so as to better match the actual situation of passengers.
- (2)
Data could not be crawled from some websites that use anti-crawler programs. As a result, there are still some discrepancies between the residential POI data in this paper and the real situation. Follow-up research can improve the crawling approach so that the residential POI data are updated and supplemented in real time and become more accurate.
- (3)
This study mainly addresses the residences of bus passengers in the morning rush hour. Subsequent studies can incorporate bus travel data from other time periods, other types of POI data (such as enterprise POI and entertainment POI), and data on other travel modes (such as subway and online ride-hailing), so as to extend this work to an accurate OD projection for urban residents.
Author Contributions
Conceptualization, L.Z. (Liang Zou), L.Z. (Lingxiang Zhu), and Q.X.; methodology, L.Z. (Liang Zou) and L.Z. (Lingxiang Zhu); software, Q.X.; validation, L.Z. (Lingxiang Zhu); formal analysis, L.Z. (Liang Zou); investigation, L.Z. (Liang Zou) and Q.X.; resources, L.Z. (Liang Zou); data curation, L.Z. (Lingxiang Zhu) and Q.X.; writing—original draft preparation, L.Z. (Lingxiang Zhu) and Q.X.; writing—review and editing, L.Z. (Liang Zou); visualization, L.Z. (Liang Zou) and Q.X.; supervision, L.Z. (Liang Zou); funding acquisition, L.Z. (Liang Zou). All authors have read and agreed to the published version of the manuscript.
Funding
This work was supported by the Shenzhen Science and Technology Plan Project (No. KJZD20230923115223047) and the Shenzhen Higher Education Stable Support Plan Project (No. 20231123103157001).
Institutional Review Board Statement
Not applicable.
Informed Consent Statement
Not applicable.
Data Availability Statement
The data used to support the findings of this study are available from the corresponding author upon request.
Conflicts of Interest
The authors declare that they have no conflicts of interest.
References
- Facing Declining Passenger Flow and Operational Difficulties, How Can Public Transport Enterprises Achieve Stable and Long-Term Development? People’s Daily. 29 September 2025. Available online: https://baijiahao.baidu.com/s?id=1844552207705471183&wfr=spider&for=pc (accessed on 1 October 2025).
- Prakasa, B.; Putra, D.W.; Kusumawardani, S.S.; Widhiyanto, B.T.Y.; Habibie, F. Big data Analytic for Estimation of Origin-Destination Matrix in Bus Rapid Transit System. In Proceedings of the 2017 3rd International Conference on Science and Technology-Computer (ICST), Yogyakarta, Indonesia, 11–12 July 2017; pp. 165–170.
- Takao, R.; Ikeuchi, N.; Suzuki, H.; Matsumoto, Y. A Proposal for OD Data Estimation System of Bus Users with Intelligent Video Analysis and Its Application to Synerex. In Proceedings of the 2021 IEEE International Conference on Consumer Electronics (ICCE), Penghu, Taiwan, 15–17 September 2021; pp. 1–4.
- Zhang, W.S.; Lu, M.; Zhu, J.J.; Yan, T.; Duan, Z.N. OD Calculation of bus passenger flow based on IC card and AVL data. Comput. Appl. Softw. 2021, 38, 100–105.
- Liu, B.K. The defect and prospect of existing OD survey method. Shandong Jiaotong Keji 2016, 4, 109–110.
- Jang, Y.; Ku, D.; Lee, S. Pedestrian mode identification, classification and characterization by tracking mobile data. Transp. A Transp. Sci. 2021, 19, 2008044.
- Shiva, R.; Mehdi, G.; Hashemi, S.M.; Nickabadi, A. A hybrid of Neuro-Fuzzy Inference System and Hidden Markov Model for Activity-Based Mobility Modeling of Cellphone Users. Comput. Commun. 2021, 173, 79–94.
- Sun, Z.; Liu, J.M.; Yan, N. Prediction of urban residents’ OD matrix based on mobile phone big data. Math. Pract. Theory 2019, 49, 68–77.
- Zhang, X.D.; Jia, L.P.; Deng, S.C.; Wang, X.; Zhou, Z. Study on the operation characteristics of taxi and ride-hailing in Xiamen constrained by GPS track data and POI data. J. Beijing Univ. Civ. Eng. 2021, 37, 60–68.
- Peng, F.; Song, G.H.; Zhu, S. A method for extracting commuting trips of frequent passengers in urban public transportation. J. Transp. Syst. Eng. 2021, 21, 158–165+172.
- Tang, H.T.; Liu, Y.P.; Wu, Z.C. Analysis of spatial heterogeneity of influencing factors of housing price based on POI data: A case study of Changsha. Urban Probl. 2021, 95–103.
- Wang, W.; Attanucci, J.P.; Wilson, N. Bus Passenger Origin-Destination Estimation and Related Analyses Using Automated Data Collection Systems. J. Public Transp. 2011, 14, 131–150.
- Vanderwaart, C.; Attanucci, J.P.; Salvucci, F.P. Applications of Inferred Origins-Destinations and Interchanges in Bus Service Planning. Transp. Res. Rec. 2017, 2652, 70–77.
- Lan, C. Route-Level Transit Passenger Origin-Destination Trip Estimation from Automatic Passenger Counting Data: A Case Study in Edmonton. Master’s Thesis, University of Alberta, Edmonton, AB, Canada, 2015.
- Rodríguez González, A.B.; Vinagre Díaz, J.J.; Wilby, M.R. Detailed Origin-Destination Matrices of Bus Passengers Using Radio Frequency Identification. IEEE Intell. Transp. Syst. Mag. 2022, 14, 141–152.
- Ge, Q.; Fukuda, D. Updating origin-destination matrices with aggregated data of GPS traces. Transp. Res. Part C Emerg. Technol. 2016, 69, 291–312.
- Zang, H.; Bolot, J. Anonymization of Location Data Does not Work: A Large-scale Measurement Study. In Proceedings of the 17th Annual International Conference on Mobile Computing and Networking, Las Vegas, NV, USA, 20–22 September 2011.
- Frias-Martinez, V.; Soguero, C.; Frias-Martinez, E. Estimation of Urban Commuting Patterns Using Cellphone Network Data. In Proceedings of the ACM SIGKDD International Workshop on Urban Computing, Beijing, China, 12 August 2012.
- Liu, F.L.; Li, N.; Tian, L.F.; Wang, Y. Analysis and research on travel demand of public transport based on metropolitan comparison. Highway 2020, 65, 230–237.
- Zhang, X.M.; Gong, D.; Xie, B.L.; Ma, H. A study of the effectiveness of epidemic prevention policies on public transit usage based on the theory of planned behaviors. J. Transp. Inf. Saf. 2021, 39, 117–125.
- Hu, Y.Y.; Pu, Z.; Wang, P. Study on the impacts of traffic carbon emission pricing on resident trip behavior using logit model. J. Transp. Eng. Inf. 2021, 39, 117–125.
- Liu, Y.F.; An, T.; Chien, S.I.J.; Guo, J. Exploring influence factors for travel mode choice in cities with different scales. China J. Highw. Transp. 2022, 35, 286–297.
- Wang, W.L.; Yu, H. Research on evaluation method of pedestrian reachability and convenience of rail transit stations. Tianjin Constr. Sci. Technol. 2019, 29, 67–73.
- Zhang, S.Y.; Yang, Y.; Chu, Y.H.; Chen, Z.W. Evaluation of Wuhan bus station satisfaction based on structural equation. Highw. Automot. Appl. 2020, 9, 29–32+36.
- Cui, N.N.; Gu, H.Y.; Shen, T.Y. Study on the Impact of Transportation Spatial Layout on Urban Housing Prices—Based on the Correlation Analysis between Road Network Morphology and Housing Prices in Beijing. Price Theory Pract. 2019, 63–66.
- Kang, J.; Luo, J.J.; Xue, S.Y.; Guo, L.F.; Wei, M.X. An empirical study on the relationship between housing prices and residents’ income in Shanxi province. Sci. Technol. Inf. 2022, 20, 129–132.
- Ren, X.W.; Huang, P. A Test of the mediating effect of population urbanization on house price. West Forum Econ. Manag. 2021, 32, 58–66.
- Yu, X.; Wang, M.Y.; Dong, X.; Chen, X.; Lu, J.X. A study on migration tendency of floating population in urban villages: Taking the city of Xi’an as an Example. Mod. Urban Res. 2021, 8, 10–16.
- Yao, W.J.; Bai, L.S. Implementation-oriented urban village traffic management optimization measures-Shenzhen city as an example. Traffic Transp. 2021, 34, 197–200.
- Hao, Q.T.; Wang, H.Y.; Hao, J.J. Study on influencing factors of community residents’ housing satisfaction-take Yijing community in Dazhou Sichuan province as an example. Jiangxi Build. Mater. 2022, 7, 322–325+328.
- Liu, Y.B.; Li, X.; Li, A.X. Countermeasures for the treatment and promotion of traffic congestion and hidden dangers in “villages within city”-taking Buji Changlong Area of Shenzhen city as an example. Traffic Transp. 2021, 34, 201–205.
- Wang, J.Q.; Zhan, Y.T.; Li, S.J. Analysis of the relationship between traffic congestion and population density and car ownership in surrounding communities. Auto Time 2020, 9, 32–33.
- Xia, H.B.; Dai, X.Y.; Wang, Y.; Wang, Z. The analysis of traffic convenience on county level based on GIS. Areal Res. Dev. 2006, 25, 120–124+130.
- Qi, W.F.; Zhang, J.J. Evaluation of Subway Station Convenience Based on Walking Living Circles—A Case Study of Hangzhou Metro Line 1. Archit. Cult. 2022, 136–138.
- Xie, G.W.; Qian, L.B.; Pang, Y. Study on Public Transport Accessibility Measurement Based on GIS and Open Data. Logist. Technol. 2021, 44, 102–106.
- Gan, L.L.; Feng, X.H.; Bi, J.L.; Jiang, H.L. Study on Strength Prediction of High-Strength Concrete Based on Multivariate Nonlinear Regression Model. Concr. Cem. Prod. 2022, 1–7.
- Fofanah, A.J.; Kalokoh, I.; Hwase, K.T.; Namagonya, A.P. Adaptive Neuro-Fuzzy Inference System with Non-Linear Regression Model for Online Learning Framework. Int. J. Sci. Eng. Res. 2020, 11, 375–391.
- Wei, M.Y.; Li, L.L.; Huang, G.; Tang, F.; Zhang, Z. Deep learning in EEG decoding: A review. Chin. J. Biomed. Eng. 2019, 38, 464–472.
- Li, L.M.; Hou, M.M.; Chen, K. A Review of Research on the Interpretability of Deep Learning. Comput. Appl. 2022, 42, 3639–3650.
Figure 1. Time distribution of passenger flow.
Figure 2. Thermal spatial distribution diagram of passenger flow.
Figure 3. Scatter diagram of community distribution.
Figure 4. Scatter diagram of urban village distribution.
Figure 5. Scatter diagram of bus stations distribution.
Figure 6. Scatter diagram of subway stations distribution.
Figure 7. Improved regression model of residential POI bus station passenger training set.
Figure 8. Improved regression model of residential POI bus station passenger test set.
Figure 9. MAPE Comparison of different reachable walking distance.
Figure 10. Comparison between residential POI bus station total passengers and bus station actual passenger flow.
Figure 11. Thermal distribution diagram of bus passenger residence.
Figure 12. Thermal distribution diagram of residential POI population.
Figure 13. Thermal distribution diagram of bus passenger residence (1000 m).
Figure 14. Line chart of community station trip ratio and bus station lines.
Figure 15. Curving diagram of urban village station trip ratio and bus station lines.
Figure 16. Frequency distribution of residence trip ratio.
Figure 17. Frequency distribution of community trip ratio.
Figure 18. Frequency distribution of urban village trip ratio.
Figure 19. Thermal distribution diagram of community trip ratio.
Figure 20. Thermal distribution diagram of urban village trip ratio.
Figure 21. Line chart of community trip ratio and house price.
Figure 22. Line chart of community trip ratio and reachable bus lines.
Figure 23. Line chart of community trip ratio and reachable subway lines.
Figure 24. Line chart of community trip ratio and average distance of reachable subway station.
Table 1. Indicators of Influencing Factors.
| Factor Categories | Influencing Factors | Indicators | Design Method |
|---|---|---|---|
| Bus Travel Factors | Income per inhabitant in the place of residence | Residential POI housing prices | Indirect |
| Bus Travel Factors | Age per inhabitant in the place of residence | Residential POI type (community, urban village) | Indirect |
| Bus Travel Factors | Vehicle ownership per inhabitant in the place of residence | Residential POI type (community, urban village) | Indirect |
| Bus Travel Factors | Occupational share of residents in the place of residence | N/A | Excluded |
| Bus Travel Factors | Convenience of the bus in the place of residence | Number of bus routes reachable from the residential POI | Direct |
| Bus Travel Factors | Convenience of the bus in the place of residence | Number of bus stations reachable from the residential POI | Direct |
| Bus Travel Factors | Convenience of the subway in the place of residence | Number of subway lines reachable from the residential POI | Direct |
| Bus Travel Factors | Convenience of the subway in the place of residence | Number of subway stations reachable from the residential POI and their average distance | Direct |
| Bus Station Choosing Factors | Convenience of bus stations | Number of bus routes | Direct |
| Bus Station Choosing Factors | Convenience of bus stations | Distance from the residential POI to reachable bus stations | Direct |
Table 2. Fitting Results of Each Functional Form for Influencing Factors.
| Factor Categories | Influencing Factors | Linear Function | Power Function | Exponential Function |
|---|---|---|---|---|
| Community-related influencing factors | Housing price | 103.344 | 104.228 | 105.024 |
| Community-related influencing factors | Number of accessible bus routes | 84.501 | 88.265 | 104.893 |
| Community-related influencing factors | Number of accessible metro lines | 31.357 | 44.357 | 36.604 |
| Community-related influencing factors | Average distance to accessible metro stations | 83.350 | 94.219 | 84.363 |
| Community-related influencing factors | Number of bus routes per bus stop | 43.661 | 51.036 | 48.195 |
| Urban village-related influencing factors | Number of bus routes per bus stop | 35.476 | 24.1847 | 38.805 |
Table 3. Comparison of models.
| Model | MAE | RMSE | MAPE |
|---|---|---|---|
| Improved residential POI bus station ridership regression model | 161.027 | 288.069 | 72.024% |
| Original residential POI bus station ridership regression model | 180.642 | 311.052 | 113.531% |
| XGBoost-based residential POI bus station ridership projection model | 167.124 | 295.868 | 82.472% |
Table 4. Residential POI and residence bus station passenger.
| Type | Number of Passengers from Residential POI to Reachable Bus Stations (Projected) | Actual Number of Passengers at the Residential Bus Station (After Allocation) |
|---|---|---|
| Overall average | 21.48 | 23.07 |
| Community average | 19.34 | 22.29 |
| Urban villages average | 23.72 | 23.64 |
| Community maximum | 817 | 1327 |
| Urban villages maximum | 1088 | 1720 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.