Article

Estimation of Bus Passengers’ Residential Locations Based on Morning Rush Hour Travel Data and POI Information

1
College of Mathematics and Informatics, South China Agricultural University, Guangzhou 510642, China
2
College of Civil and Traffic Engineering, Shenzhen University, Shenzhen 518060, China
*
Author to whom correspondence should be addressed.
Sustainability 2026, 18(1), 41; https://doi.org/10.3390/su18010041
Submission received: 5 November 2025 / Revised: 11 December 2025 / Accepted: 16 December 2025 / Published: 19 December 2025
(This article belongs to the Special Issue Sustainable Transport System and Mobility in Urban Traffic)

Abstract

To address the issues of inefficiency and high costs in obtaining data on the residential distribution of public transport passengers at present, this paper proposes an approach of “estimating the residential distribution of public transport passengers based on characteristics such as housing prices of residential Point of Interest (POI) and the convenience of public transport and its stops”. First, from two aspects—public transport travel and the selection of public transport stops—eight influencing factors for the selection of public transport stops during travel are identified. Based on these factors, a regression model for the number of public transport passengers from residential POI to their corresponding stops is constructed, through which the number of passengers traveling from each residential POI to all accessible public transport stops is obtained. This number is then used as a weight to allocate the actual passenger flow of each public transport stop to the respective residential POI, thereby realizing the estimation of the residential distribution of public transport passengers. Furthermore, this approach enables the estimation of the proportion of trips made from residential areas to specific public transport stops and the overall proportion of public transport trips among all travel modes from residential areas. The proposed estimation method is verified and evaluated using Shenzhen as a case study.

1. Introduction

In recent years, with the large-scale construction of urban rail transit and the diversification of transportation modes, the development of conventional public transport has been confronted with unprecedented challenges and pressures. From 2019 to 2024, the annual passenger volume of conventional public transport across the country decreased from 69.176 billion person-times to 38.670 billion person-times, with a decline rate of 44.1% [1]. This is mainly due to the continuous improvement of the metro network and the rapid growth in the ownership of electric bicycles. Faced with the new transportation situation, the development of conventional public transport should change traditional thinking, actively promote the supply-side reform of conventional public transport services, and shift from the previous focus on increasing the “quantity” of supply—such as simply expanding the density of the route network and the frequency of departures—to enhancing the “quality” of public transport services.
Mastering accurate passenger travel demand is a prerequisite for improving public transport service quality. Currently, the technology for estimating public transport passengers’ OD (Origin-Destination) data is relatively mature, but its accuracy is not high—it can only estimate data down to the level of public transport stops. However, achieving improvements in the “quality” aspect of the supply side of conventional public transport services is inseparable from precise POI-level OD data of public transport passengers. Precise OD data of public transport passengers reflects residents’ travel patterns and travel demands. It can provide data support for establishing a system of refined public transport implementation plans, identifying the travel needs of public transport passengers, and enhancing the targeting and precision of public transport services [2,3].
The commonly used method to obtain precise Origin-Destination (OD) data of public transport passengers is the manual survey method [4], which is usually conducted in the form of questionnaires. This method has the advantage of being able to acquire comprehensive and accurate public transport passenger OD data. However, it also has drawbacks: poor repeatability, high labor costs, and it is often impossible to implement on a large scale. As a core component of precise OD data [5], the residential distribution of public transport passengers can fully reflect the status of urban public transport and urban spatial layout. It serves as an important support for formulating urban comprehensive transportation system plans and urban spatial strategic plans. During the morning peak hours, public transport trips are highly concentrated, with commuting passengers being the main group of travelers, and residential areas being the primary starting points of commuting [6,7]. Therefore, morning peak public transport trip data can be used to study the distribution of passengers’ residential areas. Meanwhile, POI data has been widely applied in research on residents’ activities, the extraction of urban functional zones, the analysis of urban business formats, and other fields. Residential areas, also referred to as residential POI, are a special type of POI. With the development of Location-Based Services (LBS), real estate online platforms, and web crawler technology, open platforms such as Baidu Maps, Amap, and Tencent Maps provide API interfaces for high-concurrency data retrieval services, enabling efficient acquisition of information such as POI geographic coordinates [8]. Major domestic real estate websites, including Lianjia, Fang.com, and Anjuke, record detailed information about residential POI, such as housing prices, property types, and longitude and latitude. Thus, it has become possible to automatically collect residential POI data in batches [9,10,11].
In summary, data on the residential distribution of public transport passengers is of great significance for solving public transport-related issues, while the traditional manual survey method has inherent shortcomings. With the development of relevant technologies, residential POI—which are convenient to collect—have provided a new research perspective for obtaining data on the residential distribution of public transport passengers. Against this backdrop, this study focuses on realizing the estimation of the residential distribution of public transport passengers based on POI information accessible from stops and morning peak travel data. This aims to meet the demand for efficiently, conveniently, and cost-effectively obtaining data on the residential distribution of public transport passengers, to provide data support and a scientific basis for subsequent research, and to improve the sustainability level of urban transportation.

2. The Literature Review

At present, exploration and research on the issue of obtaining data on the residential distribution of public transport passengers are mainly reflected in two aspects: research on public transport passengers’ Origin-Destination (OD) data and research on residents’ workplace-residence locations.

2.1. Public Transport Passengers’ OD Data

Some researchers at home and abroad analyze the original data of public transport IC card swipes and combine multi-source data obtained from other sources, such as on-board GPS data and vehicle operation data. Through data fusion, they form a dataset to estimate passengers’ Origin-Destination (OD) information. Additionally, some experts and scholars adopt methods such as intelligent video analysis and infrared sensor counting to identify the boarding and alighting behaviors of public transport passengers, thereby constructing a travel OD matrix. Wei Wang et al. [12] used an automatic data collection system, leveraging public transport IC card transaction data and vehicle positioning data. Based on the principle of travel chains, they analyzed the boarding and alighting stops of public transport passengers in London, which were then used as the passengers’ OD points. Catherine Vanderwaart et al. [13] designed a new service planning program that automatically aggregates public transport IC card data and vehicle positioning data to infer passengers’ boarding and alighting stops. These inferred stops were used as passengers’ OD information to address issues related to public transport network design and service planning. Widyawan et al. [2] conducted an analysis at the level of public transport stops and routes based on the principle of travel chains. They used public transport IC card data to construct an initial OD matrix table for passengers and added the judgment of passenger behavior patterns, which was applied to quickly support the operational planning of public transport. R. Takao et al. [3] utilized an intelligent video analysis system to identify information about the bus stops where passengers board and alight. This information was used as passenger OD data to understand passenger flow distribution, thereby providing data support for tasks such as extracting potential demand, adjusting timetables, and modifying routes. 
Lan Cheng [14] collected data using an automatic passenger counting (APC) system based on infrared technology, matched the data to public transport routes and stops, and used this to estimate passengers’ travel OD information. Rodríguez González AB et al. [15] used radio frequency identification (RFID) technology to count passengers, proposed a complete BIBO (Board-In Board-Out) system, and designed corresponding algorithms for calculating individual trips of each passenger and the corresponding OD matrix. Finally, the effectiveness of the system and algorithms was verified through two practical experiments.

2.2. Workplace-Residence Locations

Some researchers at home and abroad have conducted studies on the workplace-residence locations of urban residents. They use mobile phone signaling data to analyze features such as call locations, stay duration, and stay time periods, thereby proposing methods to identify workplace-residence locations. Additionally, some experts and scholars integrate multi-source data, including vehicle data, land use data, and social media data, to identify residents’ workplace-residence locations, aiming to provide new approaches for research on urban residents’ workplace-residence locations. Ge Q. et al. [16] developed a maximum entropy model for estimating the Origin-Destination (OD) of residents’ residence- and work-related trips. This model utilizes mobile phone signaling data and is based on a sequence update algorithm grounded in the principle of maximum entropy. Zang H. et al. [17] integrated mobile phone call data with available public data (e.g., census data). They identified residents’ workplace-residence locations by analyzing the frequency of mobile phone users’ call events. Frias-Martinez V. et al. [18] identified individuals’ key activity locations through analytical methods such as spatial clustering, based on the spatial distribution of mobile phone calls. They then determined workplace-residence locations by combining these locations with the corresponding activity times. Shiva R. et al. [7] constructed an activity-based travel demand model using mobile phone signaling data, which is built on a neuro-fuzzy inference system and a hidden Markov model. This model distinguishes commuters and further infers their workplace-residence locations. The effectiveness of the model was verified by comparing its results with three types of data: expert-labeled on-site actual data, activity-based trip volumes generated and attracted by different regions, and highway traffic volume data from survey reports. Jang Y. et al. [6] designed an algorithm for pedestrian detection based on mobile phone mobility data and GPS base station information. This algorithm identifies pedestrian travel patterns and infers trip origins and destinations. Finally, the algorithm’s effectiveness was validated by comparing its outputs with data from household questionnaires.

2.3. Summary of the Literature Review

From the above review of research at home and abroad, the following conclusions can be drawn:
(1) In the current research on public transport passengers’ Origin-Destination (OD) data, the boarding and alighting stops of passengers are basically taken as the passengers’ OD for research, while the actual locations of passengers’ OD are ignored. This deviates from the real situation and has a certain discrepancy with passengers’ actual travel demands, which hinders the improvement of the targeting and precision of public transport services.
(2) In the current research on residence estimation, most studies use mobile phone signaling data for estimation. The estimated residences are the residences of mobile phone users, and there is a lack of residence data specifically for public transport passengers. In addition, there are two disadvantages in using mobile phone signaling data to estimate workplace-residence locations: on the one hand, considering that current users pay more attention to protecting their own privacy, it is difficult to obtain such data; on the other hand, mobile phone signaling data needs to be purchased from mobile operators, and the cost is relatively high.
In summary, there are few current studies on the estimation of public transport passengers’ residences. The methods for obtaining data on the residential distribution of public transport passengers are inefficient, cumbersome, and expensive, and there is a serious lack of such data, which makes it difficult to provide a data reference and theoretical basis for accurately identifying passengers’ demands.
To address the above problems, this study explores an approach to estimate the residential distribution of public transport passengers based on the data of residential POI accessible from stops. Specifically, it integrates and uses the information of POI accessible from stops and public transport passenger flow data during morning peak hours, and conducts an in-depth analysis of the impacts of factors such as housing prices and types of residential POI, the convenience of public transport and subways, and the convenience of public transport stops on residents’ willingness to travel to public transport stops. Furthermore, a regression model for the number of public transport passengers from residential POI to stops is constructed to estimate the number of passengers traveling from each residential POI to all accessible public transport stops. Taking this number as the weight, the actual passenger flow of each public transport stop is allocated to the respective residential POI, and finally, the estimation of the residential distribution of public transport passengers is realized.

3. Methodology and Data

3.1. Problem Transformation

The research problem in this paper is the estimation of public transport passengers’ residences, which ultimately requires the distribution of passengers by residence. From the perspective of a residence, this means obtaining the number of passengers traveling from that residence to each reachable bus station. This number is determined by the population living in the residence and the residents’ willingness to travel to the bus station. The willingness is expressed by the residence-to-station travel ratio [19,20] (hereafter, the travel ratio), as shown in Equation (1). Since the population of a residence is a known constant, the research problem is transformed into the estimation of the travel ratio.
$$RB_{ij} = P_i \cdot \alpha_{ij} \tag{1}$$
where $RB_{ij}$ denotes the number of passengers from residence $i$ to reachable bus station $j$; $P_i$ denotes the population of residence $i$; $\alpha_{ij}$ denotes the ratio of trips from residence $i$ to its reachable bus station $j$; $i = 1, 2, \ldots, M$, where $M$ is the number of residences in the tract; and $j = 1, 2, \ldots, N_i$, where $N_i$ is the number of bus stations reachable from residence $i$ in the tract.

3.2. Influencing Factors

To estimate the proportion of trips to public transport stops, it is first necessary to explore the specific influencing factors. This paper conducts a further analysis from two aspects: public transport travel factors and public transport stop selection factors. The reason is that public transport travel factors affect residents’ choice of travel mode, while public transport stop selection factors influence residents’ choice of specific public transport stops.
Currently, research on the influencing factors of travel mode choice is quite mature. The main influencing factors include [21,22]: residents’ income, age, car ownership, occupation, and the convenience of travel by bus or subway. In studies on public transport passengers’ stop selection [23,24], stop satisfaction is usually used as the evaluation criterion. The distance between the stop and the residential POI, and the number of bus routes at the stop, are used to characterize this satisfaction.

3.2.1. Indirect Influencing Factors

The available data do not directly provide residents’ per capita income, average age, per capita car ownership, or occupation proportions. Further analysis shows that the housing price of a residential POI is directly related to residents’ per capita income [25,26]; urban villages have a larger proportion of young people, while the age distribution in residential communities is more balanced [27,28,29]; and residential communities usually have more parking spaces, so their per capita car ownership is higher than that of urban villages [30,31]. Therefore, this paper uses the housing price and type of residential POI as proxies for residents’ per capita income, average age, and per capita car ownership. The resulting proxy indicators are detailed in Table 1. The occupation proportion is excluded because no proxy for it can be constructed from the available data.

3.2.2. Direct Influencing Factors

The scope of factors considered for the convenience of public transport, subways, and public transport stops in residential areas is relatively broad, among which accessibility [32,33,34] is the key aspect. Therefore, this paper selects two indicators as parameters for the convenience of public transport and subways: the number of accessible (within a 500 m walking distance) public transport and subway lines from a residential POI, and the average distance from the residential POI to (relevant) stops. For the convenience of public transport stops, the paper selects the number of accessible lines at the stops (reachable from the residential POI) and the distance (from the residential POI to the stops) as its parameter indicators. Table 1 presents the indicators for each influencing factor.
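The accessibility indicators described above can be sketched as follows. This is a minimal illustration, not the authors’ implementation: it assumes straight-line (haversine) distance as a stand-in for the 500 m walking distance, and the record layout (`lat`, `lon`, `lines`) and all coordinates are hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def accessibility_indicators(poi, stops, radius_m=500.0):
    """Return (number of distinct accessible lines, mean distance to accessible stops).

    Assumption: straight-line distance approximates walking distance.
    """
    lines, distances = set(), []
    for stop in stops:
        d = haversine_m(poi["lat"], poi["lon"], stop["lat"], stop["lon"])
        if d <= radius_m:
            lines.update(stop["lines"])
            distances.append(d)
    mean_distance = sum(distances) / len(distances) if distances else None
    return len(lines), mean_distance

# Hypothetical residential POI and stops (illustrative coordinates only).
poi = {"lat": 22.5400, "lon": 114.0500}
stops = [
    {"lat": 22.5410, "lon": 114.0500, "lines": ["M1", "B2"]},  # roughly 110 m away
    {"lat": 22.5400, "lon": 114.0530, "lines": ["B2", "B3"]},  # roughly 310 m away
    {"lat": 22.5500, "lon": 114.0500, "lines": ["B9"]},        # beyond 500 m
]
n_lines, mean_dist = accessibility_indicators(poi, stops)
```

Lines served by several accessible stops are counted once, which matches the idea of counting accessible lines rather than stop visits.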

3.2.3. Determination of Functional Relationships

This section fits several candidate functional forms between each influencing factor and the public transport travel ratio. The results indicate that the relationships between the community-related influencing factors and the travel ratio are all linear, while those between the urban-village-related influencing factors and the travel ratio follow a power function. The fitting errors are presented in Table 2.
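One way to compare candidate functional forms is sketched below. The paper does not state its fitting procedure, so this is an assumption: ordinary least squares is used for the linear form, the power form is fitted by least squares on log-transformed data, and the two are compared by root-mean-square error. The data are synthetic.

```python
import math

def fit_linear(xs, ys):
    """Ordinary least squares for y = a*x + b; returns (a, b)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, mean_y - a * mean_x

def fit_power(xs, ys):
    """Fit y = k * x**s by least squares in log space (requires x > 0 and y > 0)."""
    s, log_k = fit_linear([math.log(x) for x in xs], [math.log(y) for y in ys])
    return math.exp(log_k), s

def rmse(ys, predictions):
    """Root-mean-square error between observed and predicted values."""
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(ys, predictions)) / len(ys))

# Synthetic example: a travel ratio that decays as a power of some factor.
xs = [1, 2, 4, 8, 16]
ys = [0.30 * x ** -0.5 for x in xs]

a, b = fit_linear(xs, ys)
k, s = fit_power(xs, ys)
err_linear = rmse(ys, [a * x + b for x in xs])
err_power = rmse(ys, [k * x ** s for x in xs])
```

On this synthetic data the power form recovers the generating parameters and has the smaller error, which is the kind of comparison a fitting-error table summarizes.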

3.3. Regression Model for the Number of Public Transport Passengers from POI to Stops

3.3.1. Original Model

The residential POI data are divided into two categories: communities and urban villages. The regression model for residential POI bus station ridership is constructed from the perspective of stations, as shown in Equation (2).
$$B_j = \sum_{i} P_i \cdot \alpha_{ij} + \sum_{i'} P_{i'} \cdot \alpha_{i'j} \tag{2}$$
where $B_j$ denotes the projected total number of passengers at bus station $j$; $P_i$ denotes the population of community $i$; $\alpha_{ij}$ denotes the ratio of trips from community $i$ to its reachable bus station $j$; $P_{i'}$ denotes the population of urban village $i'$; $\alpha_{i'j}$ denotes the ratio of trips from urban village $i'$ to its reachable bus station $j$; $i = 1, 2, \ldots, M$, where $M$ denotes the number of communities in the tract; $i' = 1, 2, \ldots, M'$, where $M'$ denotes the number of urban villages in the tract; and $j = 1, 2, \ldots, N_{i,i'}$, where $N_{i,i'}$ is the number of bus stations in the tract that are reachable from both community $i$ and urban village $i'$.
Further, based on the functional relationships determined in the previous section, all influencing factors related to the travel ratio of communities are linear, so the community travel ratio is written as a weighted sum of these factors plus a constant. The influencing factor related to the travel ratio of urban villages follows a power function. Equation (2) can therefore be rewritten as Equation (3) [35,36].
$$B_j = \sum_{i} P_i \cdot \left[\theta_1 X^{(i)}_1 + \theta_2 X^{(i)}_2 + \theta_3 X^{(i)}_3 + \theta_4 X^{(i)}_4 + \theta_5 X^{(j)}_5 + C\right] + \sum_{i'} P_{i'} \cdot \left[\theta' \cdot \left(X'_j\right)^{S}\right] \tag{3}$$
where $X^{(i)}_1$, $X^{(i)}_2$, $X^{(i)}_3$, $X^{(i)}_4$, and $X^{(j)}_5$ denote the housing price of community $i$, its number of reachable bus lines, its number of reachable subway lines, its average distance to reachable subway stations, and the number of lines at bus station $j$, respectively; $\theta_1$, $\theta_2$, $\theta_3$, $\theta_4$, and $\theta_5$ denote the coefficients of the influencing factors related to the travel ratio of communities; $C$ denotes a constant; $X'_j$ denotes the number of lines at bus station $j$ from the urban village perspective; $\theta'$ denotes the coefficient of the influencing factor related to the travel ratio of urban villages; and $S$ denotes the exponent of the power function.
Expanding the sums, Equation (3) can be further transformed into Equation (4). The analysis reveals that each coefficient $\theta$ of the community-related influencing factors is tied to the population $P_i$ of the individual communities.
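$$B_j = \left[\theta_1 \sum_{i} P_i X^{(i)}_1 + \theta_2 \sum_{i} P_i X^{(i)}_2 + \theta_3 \sum_{i} P_i X^{(i)}_3 + \theta_4 \sum_{i} P_i X^{(i)}_4 + \theta_5 X^{(j)}_5 \sum_{i} P_i + C \sum_{i} P_i\right] + \left[\theta' \cdot \left(X'_j\right)^{S} \cdot \sum_{i'} P_{i'}\right] \tag{4}$$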

3.3.2. Improved Model

The total population of the communities reachable from a station is a fixed constant, and the total number of passengers then depends only on the overall travel ratio influence factors. The urban village term $\theta' \cdot (X'_j)^{S} \cdot \sum_{i'} P_{i'}$ already has this form, with the population separated from the influence factor. Following it, the population is separated from the influence factors in the community term of Equation (4), which avoids the problem of the original model and gives the improved model in Equation (5).
Based on the improved model, the actual passenger flow at bus station $j$ is used as an approximation of $B_j$, and the unknown coefficients $\theta_1$, $\theta_2$, $\theta_3$, $\theta_4$, $\theta_5$, $C$, $\theta'$, and $S$ are estimated by fitting. The residential POI data are then input, and the number of passengers from each residential POI to each reachable bus station, $RB_{ij}$, can be deduced.
$$B_j = P_j \cdot \left[\theta_1 X_1 + \theta_2 X_2 + \theta_3 X_3 + \theta_4 X_4 + \theta_5 X^{(j)}_5 + C\right] + P'_j \cdot \left[\theta' \cdot \left(X'_j\right)^{S}\right] \tag{5}$$
where $P_j$ denotes the total population of the communities reachable from bus station $j$; $X_1$, $X_2$, $X_3$, and $X_4$ denote the housing price, the number of reachable bus lines, the number of reachable subway lines, and the average distance to reachable subway stations of the communities reachable from bus station $j$, respectively; $P'_j$ denotes the total population of the urban villages reachable from bus station $j$, i.e., $\sum_{i'} P_{i'}$; and $j = 1, 2, \ldots, N$, where $N$ is the number of bus stations in the tract.
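To make the structure of the improved model concrete, the sketch below evaluates it for one station: a linear travel ratio for communities and a power-function travel ratio for urban villages, each multiplied by the corresponding reachable population. The function name, argument names, and every coefficient value are hypothetical and chosen only for illustration; they are not fitted values from the paper.

```python
def predicted_station_flow(pop_comm, x_comm, lines_j, theta, const,
                           pop_vill, lines_j_vill, theta_vill, exponent):
    """Evaluate the improved model for one bus station.

    pop_comm      total population of reachable communities
    x_comm        [housing price, reachable bus lines, reachable subway lines,
                   average distance to reachable subway stations]
    lines_j       number of lines at the station (community term)
    theta         five linear coefficients (four for x_comm, one for lines_j)
    const         constant of the linear travel ratio
    pop_vill      total population of reachable urban villages
    lines_j_vill  number of lines at the station (urban village term)
    theta_vill    coefficient of the power function
    exponent      exponent of the power function
    """
    community_ratio = sum(t * x for t, x in zip(theta[:4], x_comm))
    community_ratio += theta[4] * lines_j + const
    village_ratio = theta_vill * lines_j_vill ** exponent
    return pop_comm * community_ratio + pop_vill * village_ratio

# Hypothetical inputs and coefficients (illustration only).
flow = predicted_station_flow(
    pop_comm=10_000, x_comm=[8.0, 6, 2, 0.4], lines_j=4,
    theta=[-0.002, 0.004, -0.005, 0.01, 0.003], const=0.02,
    pop_vill=20_000, lines_j_vill=4, theta_vill=0.01, exponent=0.5,
)
```

With these made-up numbers the community travel ratio is 0.034 and the urban village travel ratio is 0.02, so the station receives 340 + 400 = 740 projected passengers.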

3.4. Method of Projecting the Residence of Public Transport Passengers

3.4.1. Residence Projecting

Using the residential POI bus station ridership regression model, it is possible to derive the number of passengers from the residential POI to each reachable bus station. Since the number of bus station passengers obtained by this projection deviates from the actual one, it is proposed to use the number of passengers from residential POI to each reachable bus station as the weight to allocate the actual passenger flow of bus stations to each residential POI, as shown in Equation (6), to finally realize the projection of bus passenger residence.
$$RB^{\mathrm{act}}_{ij} = B^{\mathrm{act}}_j \cdot \frac{RB_{ij}}{B_j} \tag{6}$$
where $RB^{\mathrm{act}}_{ij}$ denotes the actual number of passengers at bus station $j$ whose residence is $i$; $B^{\mathrm{act}}_j$ denotes the actual passenger flow at bus station $j$; $RB_{ij}$ denotes the imputed number of passengers from residential POI $i$ to bus station $j$; and $B_j$ denotes the imputed total number of passengers at bus station $j$, i.e., $\sum_{i} RB_{ij}$.
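The weighted allocation can be illustrated with a short sketch. The function name and the POI labels are hypothetical; the logic is simply to share a station’s actual passenger flow among its reachable residential POI in proportion to their imputed passenger numbers.

```python
def allocate_station_flow(actual_flow, imputed_by_poi):
    """Share one station's actual flow among POI, weighted by imputed passengers."""
    imputed_total = sum(imputed_by_poi.values())
    if imputed_total <= 0:
        # No usable weights: nothing can be attributed to any POI.
        return {poi: 0.0 for poi in imputed_by_poi}
    return {
        poi: actual_flow * imputed / imputed_total
        for poi, imputed in imputed_by_poi.items()
    }

# Hypothetical imputed passengers from three residential POI to one station.
imputed = {"community_A": 120.0, "community_B": 60.0, "village_C": 220.0}
allocated = allocate_station_flow(500.0, imputed)
```

Here the imputed total is 400 while the observed flow is 500, so each POI is scaled by 1.25; the allocated values always sum to the actual flow, which is why the allocation corrects the deviation of the regression estimate.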

3.4.2. Travel Ratio Projecting

Based on Equation (5), the travel ratio of a community is $[\theta_1 X_1 + \theta_2 X_2 + \theta_3 X_3 + \theta_4 X_4 + \theta_5 X^{(j)}_5 + C]$, and the travel ratio of an urban village is $[\theta' \cdot (X'_j)^{S}]$. Since the travel ratio projected in this way deviates from the actual one, this section instead derives the travel ratio $\alpha_{ij}$ from residence $i$ to bus station $j$ from the projected residences of the bus passengers, according to Equation (7). Taking the residence as the unit, the actual numbers of passengers at its reachable bus stations are summed to obtain the actual number of bus passengers of the residence, from which the ratio of bus trips of the residence is calculated, as shown in Equation (8).
$$\alpha_{ij} = \frac{RB^{\mathrm{act}}_{ij}}{P_i} \tag{7}$$
$$\alpha_{i} = \frac{RB^{\mathrm{act}}_{i}}{P_i} \tag{8}$$
In Equation (7), $\alpha_{ij}$ denotes the ratio of trips from residence $i$ to its reachable bus station $j$; $RB^{\mathrm{act}}_{ij}$ denotes the actual number of passengers at bus station $j$ whose residence is $i$, as allocated by Equation (6); and $P_i$ denotes the population of residence $i$. In Equation (8), $\alpha_i$ denotes the ratio of bus trips of residence $i$, and $RB^{\mathrm{act}}_{i}$ denotes the actual number of bus passengers of residence $i$, i.e., $\sum_{j} RB^{\mathrm{act}}_{ij}$.
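The two ratios can be computed from the allocated passenger numbers as sketched below. The dictionary layout, function name, and all figures are hypothetical and serve only to show the arithmetic.

```python
def travel_ratios(allocated, population):
    """Compute per-station and overall bus travel ratios for each residence.

    allocated   {(residence, station): allocated actual passengers}
    population  {residence: resident population}
    """
    ratio_by_station = {
        (res, station): passengers / population[res]
        for (res, station), passengers in allocated.items()
    }
    passengers_by_residence = {}
    for (res, _station), passengers in allocated.items():
        passengers_by_residence[res] = passengers_by_residence.get(res, 0.0) + passengers
    ratio_by_residence = {
        res: total / population[res] for res, total in passengers_by_residence.items()
    }
    return ratio_by_station, ratio_by_residence

# Hypothetical allocated passengers and populations.
allocated = {("A", "s1"): 150.0, ("A", "s2"): 50.0, ("B", "s1"): 75.0}
population = {"A": 4000, "B": 1500}
alpha_ij, alpha_i = travel_ratios(allocated, population)
```

Residence A sends 150 and 50 passengers to its two stations, so its station-level ratios are 0.0375 and 0.0125 and its overall bus travel ratio is 0.05.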

3.5. XGBoost-Based Residential POI Bus Station Ridership Projecting Model

3.5.1. Introduction to the Algorithm

In this paper, a deep learning model is not used as the main model, for two reasons [37,38,39]. First, deep learning requires a large amount of data, whereas only 5050 samples are available to construct the model in this paper, which may lead to poor generalization. Second, a deep learning model cannot explain the principle it has learned, and the learned relationship may be far from the actual one; for example, it may relate the resident population and the travel ratio in a non-multiplicative form. However, to further verify the validity of the regression model, an XGBoost-based residential POI bus station ridership imputation model is constructed in this paper as a reference.
XGBoost is an improved learning algorithm based on the Gradient Boosting Decision Tree (GBDT). Its principle is to iteratively combine a large number of weak learners into a strong learner to achieve accurate predictions. It is an efficient implementation of GBDT, and its advantages are mainly reflected in two aspects. First, compared with GBDT, the objective function of XGBoost adds a regularization term, which helps to reduce model variance and prevent overfitting. Second, GBDT uses only the negative gradient (a first-order Taylor expansion) of the loss, while XGBoost uses a second-order Taylor expansion of the loss, which improves the prediction accuracy of the XGBoost algorithm.
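For reference, the two points above correspond to the standard form of the XGBoost objective at boosting round $t$. The symbols below ($n$ samples indexed by $k$, the new tree $f_t$, number of leaves $T$, leaf weights $w$, and penalties $\gamma$, $\lambda$) are not this paper’s notation and are introduced only for this sketch.

```latex
% Regularized objective at round t
\mathcal{L}^{(t)} = \sum_{k=1}^{n} l\!\left(y_k,\ \hat{y}_k^{(t-1)} + f_t(x_k)\right) + \Omega(f_t),
\qquad \Omega(f) = \gamma T + \tfrac{1}{2}\lambda \lVert w \rVert^{2}

% Second-order Taylor approximation around the previous prediction
\mathcal{L}^{(t)} \approx \sum_{k=1}^{n}\left[\, l\!\left(y_k, \hat{y}_k^{(t-1)}\right)
  + g_k f_t(x_k) + \tfrac{1}{2} h_k f_t(x_k)^{2} \right] + \Omega(f_t)

% First- and second-order derivatives of the loss
g_k = \frac{\partial\, l\!\left(y_k, \hat{y}_k^{(t-1)}\right)}{\partial\, \hat{y}_k^{(t-1)}},
\qquad
h_k = \frac{\partial^{2} l\!\left(y_k, \hat{y}_k^{(t-1)}\right)}{\partial \left(\hat{y}_k^{(t-1)}\right)^{2}}
```

Dropping the $h_k$ term and $\Omega$ leaves only the gradient information that plain GBDT fits at each round.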

3.5.2. Model Construction

In this section, feature engineering is applied to all features to form the input dataset, and the model is then trained and tuned to obtain the XGBoost-based residential POI bus station ridership imputation model.
(1) Feature Engineering
The feature inputs of the XGBoost model are consistent with those of the regression model: the inputs for the community data are $P_j$, $X_1$, $X_2$, $X_3$, $X_4$, and $X^{(j)}_5$; the inputs for the urban village data are $P'_j$ and $X'_j$. All of these feature inputs are numerical variables with values in the range [0, +∞).
(2) Model Tuning
The parameters of XGBoost include general parameters, booster parameters, and learning task parameters. The general parameters set the overall functionality, the booster parameters configure each step of the regression tree, and the learning task parameters guide the model’s optimization task. In this paper, these parameters are tuned using grid search with cross-validation, which iterates over all combinations of the supplied parameter values, evaluates each by cross-validation, and returns the evaluation scores for all combinations.
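The grid search cross-validation procedure can be sketched in plain Python as follows. To keep the example self-contained, a tiny closed-form regressor stands in for XGBoost; the stand-in model, its two parameters (`power`, `lam`), and the data are assumptions made for illustration and are not the parameters tuned in the paper.

```python
import itertools

def k_fold_splits(n_samples, k):
    """Yield (train_indices, validation_indices) for k contiguous, unshuffled folds."""
    sizes = [n_samples // k + (1 if f < n_samples % k else 0) for f in range(k)]
    start = 0
    for size in sizes:
        stop = start + size
        val = list(range(start, stop))
        train = [i for i in range(n_samples) if i < start or i >= stop]
        yield train, val
        start = stop

def fit_power_ridge(xs, ys, power, lam):
    """Stand-in model (NOT XGBoost): y ~ w * x**power with L2 penalty lam."""
    feats = [x ** power for x in xs]
    w = sum(f * y for f, y in zip(feats, ys)) / (sum(f * f for f in feats) + lam)
    return lambda x: w * x ** power

def grid_search_cv(xs, ys, param_grid, k=5):
    """Score every parameter combination by mean validation MSE over k folds."""
    names = sorted(param_grid)
    scores = {}
    for values in itertools.product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        fold_errors = []
        for train, val in k_fold_splits(len(xs), k):
            model = fit_power_ridge([xs[i] for i in train], [ys[i] for i in train], **params)
            mse = sum((ys[i] - model(xs[i])) ** 2 for i in val) / len(val)
            fold_errors.append(mse)
        scores[values] = sum(fold_errors) / len(fold_errors)
    best_values = min(scores, key=scores.get)
    return dict(zip(names, best_values)), scores

# Synthetic data generated by y = 2 * x**2.
xs = [float(x) for x in range(1, 11)]
ys = [2.0 * x ** 2 for x in xs]
best_params, all_scores = grid_search_cv(xs, ys, {"power": [1, 2], "lam": [0.0, 1.0, 10.0]})
```

The search evaluates all six combinations and returns both the best one and the full score table, mirroring the description of returning scores under all parameter combinations.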

3.6. Study Area

In this paper, the proposed model and method are validated and evaluated using Shenzhen as a case study. For the bus travel data, morning peak (7:00–9:00) trips in December 2020 were selected, with a daily average of 838,900 records, accounting for 28.92% of all-day trips. The average temporal distribution of passenger flow is shown in Figure 1; it rises and then falls, peaking at around 8:00. The average spatial distribution of passenger flow is shown in the heat map in Figure 2; passenger flow is concentrated mainly in the city center. The station and line data comprise 5050 bus stations, 1020 bus lines, 234 subway stations, and 11 subway lines. The residential POI data comprise 6282 residential POI in total; furthermore, the number of reachable communities for bus stations is 36,800, and the number of reachable urban villages is 29,800.

3.7. Data Resources

3.7.1. Residential POI Data Crawling

(1) Website Choosing
Among the mainstream real estate websites in China, Fang.com and Anjuke have anti-crawler mechanisms that require slider verification to be completed manually at regular intervals, whereas Lianjia has no anti-crawler mechanism, so data can be crawled automatically and in bulk. Lianjia also covers 82 popular cities in China and holds 230,000 residential records, which meets the data demand.
(2) Crawling Method
This paper adopts a breadth-first traversal crawling strategy and uses Python (PyCharm 2025.1.1) to build six sub-functional modules: a request configuration module, a URL de-duplication module, a robots protocol module, a web crawling module, a web parsing module, and a storage module. Together they crawl the residential POI data of Lianjia automatically in batches, following standard operating procedures.
(3) Results
For Shenzhen, 4574 communities and 1708 urban villages were crawled. The communities are mainly distributed in the city center, with an average price of 8826.02 USD/m² and a total population of 5,361,900; the urban villages are more scattered, with an average price of 7934.94 USD/m² and a total population of 10,479,900. Compared with the resident population of 17,560,100 in the Seventh Census Analysis Report, the total population of the residential POI (15,841,800) has a relative error of −9.79%, so the crawling result is basically in line with reality. The distributions of communities and urban villages are shown in Figure 3 and Figure 4.
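The breadth-first crawling strategy with URL de-duplication and a robots check, as described under the crawling method above, can be sketched as follows. This is not the authors’ crawler: page fetching and the robots rule are injected as plain functions, the link graph is an in-memory stand-in for real pages, and every path shown is hypothetical.

```python
from collections import deque

def crawl_breadth_first(seed_url, fetch_links, is_allowed, max_pages=1000):
    """Visit pages level by level, skipping duplicates and disallowed URLs.

    fetch_links(url) returns the links found on a page (request + parsing step).
    is_allowed(url) returns False for URLs excluded by the robots rules.
    """
    seen = {seed_url}            # URL de-duplication
    queue = deque([seed_url])    # FIFO queue gives breadth-first order
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        if not is_allowed(url):
            continue             # respect the robots rule: neither fetch nor follow
        visited.append(url)      # a storage step would persist parsed data here
        for link in fetch_links(url):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return visited

# Hypothetical site structure: city page -> district pages -> community pages.
LINKS = {
    "/city": ["/district/a", "/district/b"],
    "/district/a": ["/community/1", "/community/2", "/city"],
    "/district/b": ["/community/2", "/community/3", "/private/admin"],
    "/private/admin": ["/community/99"],
}
order = crawl_breadth_first(
    "/city",
    fetch_links=lambda url: LINKS.get(url, []),
    is_allowed=lambda url: not url.startswith("/private/"),
)
```

Each community page is visited once even when several district pages link to it, and pages behind the disallowed path are never fetched, so their links are never discovered.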

3.7.2. Bus and Subway Data Obtaining

(1)
Platform selection
Among the mainstream domestic open map platforms, the personal quota of Baidu’s data retrieval function is 30,000/day, while those of Tencent and Gaode are 10,000/day and 5000/day, respectively; therefore, this paper uses the Baidu Map open platform to obtain bus and subway data.
(2)
Acquisition method
In this paper, Python is used to write queries conforming to the platform’s format, access the application programming interfaces (APIs) of the Baidu Map open platform, call its high-concurrency data retrieval function, and obtain the bus and subway data from Baidu Map automatically and in batches.
(3)
Results
For Shenzhen, 1020 bus lines, 5050 bus stations, 11 subway lines, and 234 subway stations were obtained. After cross-checking against the data of the Shenzhen Municipal Bureau of Transportation, the obtained data were found to be relatively complete. The distribution maps of bus and subway stations are shown in Figure 5 and Figure 6.

4. Results and Discussion

4.1. Selection of Models for Comparison

The main models that accept multi-variable input include XGBoost, KNN regression, BP neural networks, and LSTM. The advantages and disadvantages of these four algorithms are summarized in the table below.
Model: XGBoost
Advantages:
(1) High prediction accuracy. The objective function is regularized to prevent overfitting; the second-order derivative is used when minimizing the objective function, enabling more accurate identification of the optimal solution; and shrinkage (learning rate) is applied. Reducing the learning rate increases the number of iterations, which acts as a regularization mechanism and improves model performance.
(2) Fast computation speed. Supports feature sampling to reduce overfitting, lower computational complexity, and accelerate parallelization; performs parallel optimization at the feature granularity, significantly reducing the computational load; allows specifying default branch directions for specific values or missing values, greatly enhancing algorithm efficiency.
(3) Low susceptibility to overfitting, low bias, and excellent generalization ability.
(4) Effective solution for high-dimensional data problems.
Disadvantages:
Numerous and complex parameters make parameter tuning difficult.

Model: KNN
Advantages:
(1) Simple concept, applicable to both classification and regression tasks;
(2) No assumptions about data, and insensitive to outliers;
(3) Relatively fast computation speed: only needs to store training samples and labels, without the need for parameter estimation or model training.
Disadvantages:
(1) Low efficiency: It requires calculating all training data and test data, resulting in low computational efficiency when the data volume is large;
(2) High dependence on training data: When the samples are imbalanced, the prediction accuracy for rare classes is low;
(3) Poor performance in handling high-dimensional data.

Model: BP Neural Networks
Advantages:
(1) Possesses strong nonlinear mapping capability, making it suitable for solving problems with complex internal mechanisms;
(2) High self-learning and adaptive capabilities;
(3) Strong generalization ability.
Disadvantages:
(1) The algorithm is prone to falling into local minima;
(2) Slow convergence speed and low algorithm efficiency;
(3) Prone to overfitting;
(4) Lack of a unified standard for network structure selection.

Model: LSTM
Advantages:
(1) Suitable for time-series data;
(2) Alleviates the gradient vanishing or exploding problem in long-sequence tasks.
Disadvantages:
(1) Suboptimal performance in handling long-sequence tasks;
(2) Slow computation speed and inability to support parallel processing.

The data in this study fall into two categories: the data used for modeling and the residential POI data that are ultimately input for prediction. Because the model is fitted on one data set and applied to another, it must generalize well. Therefore, this study adopts the XGBoost algorithm for modeling, which offers both high prediction accuracy and excellent generalization ability.

4.2. Regression Model Validation

4.2.1. Model Statistical Analysis

The correlation coefficients (R) between the model’s MAPE and the number of residents in residential districts and in urban villages are 0.678 and 0.560, respectively, both passing the significance test. This indicates that as the number of residents of a residential POI increases, the MAPE rises and the model accuracy decreases. The R-values for the number of bus routes at stations and for passenger flow volume are −0.690 and −0.795, respectively, also passing the significance test; as these two quantities increase, the MAPE falls and the model accuracy improves. In contrast, residential district housing prices, the number of accessible bus routes, the number of accessible metro routes in residential districts, and the average distance between stations all fail to pass the significance test and therefore have no significant effect on the model’s MAPE.
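The source does not state which correlation coefficient or significance test was used. As a hedged illustration only, the sketch below assumes a Pearson coefficient with the usual t-statistic on n − 2 degrees of freedom; the function name `pearson_r_and_t` and the five data points are hypothetical and are not taken from the study.

```python
import math
from typing import Sequence, Tuple


def pearson_r_and_t(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Return the Pearson correlation r and its t-statistic (n - 2 d.o.f.)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    r = sxy / math.sqrt(sxx * syy)
    # The t-statistic is unbounded when the points are perfectly collinear.
    t = math.inf if abs(r) == 1.0 else r * math.sqrt((n - 2) / (1.0 - r * r))
    return r, t


# Illustrative values only: a factor (e.g. residents per POI, rescaled)
# and the corresponding error measure.
factor = [1, 2, 3, 4, 5]
error = [2, 1, 4, 3, 5]
r, t = pearson_r_and_t(factor, error)
print(f"r = {r:.3f}, t = {t:.3f}")
```

A positive `r` between a factor and the MAPE means the error grows with that factor; the t-statistic would then be compared with the critical value for the chosen significance level.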

4.2.2. Model Validity Analysis

Because no actual data on transit passengers’ residences are available for comparison with the projected residences, this section focuses on verifying the accuracy of the total residential POI bus station ridership.
(1)
Residential POI bus station ridership regression model
① Improved model
Based on Equation (5), the modeling data were first divided into a training set and a test set, with the training set accounting for 80% of the data. The coefficients of the regression model were then estimated from the training set. Finally, the total number of passengers at residential POI bus stations was projected from the test set, and the mean absolute error (MAE), root mean square error (RMSE), and mean absolute percentage error (MAPE) were calculated using the actual passenger flow at bus stations as the true value.
② Original model
Based on Equation (4), the accuracy of the original model is calculated from the corresponding data and compared with the results of the improved model.
(2)
XGBoost-based residential POI bus station ridership projection model
In the same way as ①, after model tuning on the training set, the test set data are input to project the total number of passengers at residential POI bus stations, and the MAE, RMSE, and MAPE are calculated.
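The evaluation procedure used for all three models (an 80/20 split followed by MAE, RMSE, and MAPE) can be sketched as follows. This is a minimal illustration using the standard definitions of the three metrics; the helper names `split_train_test` and `error_metrics`, the random seed, and the sample numbers are assumptions, and the MAPE formula presumes that every actual passenger flow is non-zero.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple


def split_train_test(rows: Sequence, train_share: float = 0.8,
                     seed: int = 0) -> Tuple[List, List]:
    """Shuffle the rows reproducibly and split them into train and test sets."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_share)
    return shuffled[:cut], shuffled[cut:]


def error_metrics(actual: Sequence[float],
                  predicted: Sequence[float]) -> Dict[str, float]:
    """MAE, RMSE, and MAPE (in percent); actual values must be non-zero."""
    n = len(actual)
    abs_err = [abs(p - a) for a, p in zip(actual, predicted)]
    mae = sum(abs_err) / n
    rmse = math.sqrt(sum(e * e for e in abs_err) / n)
    mape = 100.0 * sum(e / abs(a) for e, a in zip(abs_err, actual)) / n
    return {"MAE": mae, "RMSE": rmse, "MAPE": mape}


# Illustrative numbers only, not data from the study.
train, test = split_train_test(range(10))
metrics = error_metrics(actual=[100, 200, 400], predicted=[110, 180, 500])
print(len(train), len(test), metrics)
```

Because MAPE divides each error by the true value, stations with small actual flows contribute large percentage errors, which is one reason MAPE and MAE can move in opposite directions across data sets.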
The comparison results of the above models are shown in Table 3. The improved regression model of residential POI bus station ridership has the smallest MAE, RMSE, and MAPE and thus the highest accuracy. However, its MAPE still reaches 72.024%, indicating that the prediction accuracy is only moderate. Nevertheless, considering the complexity of the problem, this model is deemed to have a certain reference value.
To further evaluate the optimal model, the predicted and true values for the training set and the test set are compared in Figure 7 and Figure 8. The upper and lower lines in each figure mark errors of plus and minus 30% of the true values; the more points that fall within this band, the higher the accuracy of the model. As shown, the predicted values are relatively concentrated, and the results are reasonable. The share of results with an error within 30% is 28.83% for the training set and 27.12% for the test set, so the accuracy of the model is acceptable.
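The share of points inside the ±30% band can be computed directly from the predicted and true values. The sketch below is illustrative: `share_within_band` is a hypothetical helper and the four sample pairs are invented for the example.

```python
from typing import Sequence


def share_within_band(actual: Sequence[float], predicted: Sequence[float],
                      tolerance: float = 0.30) -> float:
    """Percentage of predictions whose error is within tolerance of the true value."""
    hits = sum(1 for a, p in zip(actual, predicted)
               if abs(p - a) <= tolerance * abs(a))
    return 100.0 * hits / len(actual)


# Illustrative values only: two of the four predictions fall inside the band.
band_share = share_within_band(actual=[100, 200, 400, 50],
                               predicted=[125, 270, 390, 10])
print(f"{band_share:.2f}% of predictions are within 30% of the true value")
```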

4.2.3. Reachable Distance Validation

In addition to the 500 m threshold, reachable distances of 300 m, 700 m, and 1000 m are tested; the corresponding data are input to calculate the model MAPE for each distance, and the comparison results are shown in Figure 9. The results show that as the reachable distance increases, the model MAPE gradually increases and the projection becomes worse. At reachable distances of 500 m or more, the MAPE changes more sharply and the model performs relatively worse, suggesting that the maximum walking distance of bus passengers is within 500 m. Therefore, it is reasonable to set 500 m as the reachable threshold in this paper.
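A reachable-distance threshold of this kind can be applied as a straight-line radius around each residential POI. The sketch below assumes great-circle (haversine) distance and a spherical Earth of radius 6,371 km; the function names and all coordinates are hypothetical, and actual walking distance along the street network would generally be longer.

```python
import math
from typing import Dict, List, Tuple

EARTH_RADIUS_M = 6_371_000  # mean Earth radius, spherical approximation


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in meters between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def reachable_stations(poi: Tuple[float, float],
                       stations: Dict[str, Tuple[float, float]],
                       threshold_m: float = 500.0) -> List[str]:
    """Names of stations whose straight-line distance to the POI is within the threshold."""
    return [name for name, (lat, lon) in stations.items()
            if haversine_m(poi[0], poi[1], lat, lon) <= threshold_m]


# Hypothetical coordinates chosen so the stations lie roughly 330 m, 670 m,
# and 1030 m from the POI.
poi = (22.5400, 114.0500)
stations = {
    "S1": (22.5430, 114.0500),
    "S2": (22.5460, 114.0500),
    "S3": (22.5400, 114.0600),
}
for threshold in (300, 500, 700, 1000):
    print(threshold, reachable_stations(poi, stations, threshold))
```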

4.3. Passenger Projection

4.3.1. Residency Projection

Based on the model coefficients in Section 4.2.3, the residential POI data are input to project the number of passengers from each residential POI to each reachable bus station; to evaluate the accuracy of the projection, the total number of passengers at residential POI bus stations is then verified. The projected ridership from residential POIs to each reachable bus station is shown in Table 4; the results indicate that bus station ridership from urban villages is greater than that from communities.
Using the actual passenger flow at the bus stations as the true value, the MAE, RMSE, and MAPE of the total number of passengers at residential POI bus stations are 155.490, 266.405, and 81.272%, respectively; the comparison between the predicted and true values is shown in Figure 10. The vast majority of points are concentrated on or near the three straight lines, which is consistent with the MAPE of 81.272%. Compared with the results of the improved regression model above, the MAPE calculated in this section is higher, while the MAE and RMSE are lower. This can be attributed to the larger data volume input in this section, which leads to slightly lower MAE and RMSE values.
Using the projected number of passengers from each residential POI to each reachable bus station as the weight, the actual number of passengers at each bus station is allocated to the residential POIs; the resulting actual numbers of residential bus station passengers are shown in Table 4. After the allocation, the community average is higher than before, and the urban village average is lower than before. A likely reason is that, before the allocation, community passengers accounted for a larger share of the total number of passengers at bus stations; the results are therefore relatively reasonable.
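The weighted allocation can be sketched as a proportional split: each station’s actual flow is divided among its reachable residential POIs in proportion to their projected ridership to that station. The function names, the dictionary layout, and the numbers below are illustrative assumptions rather than the authors’ implementation.

```python
from collections import defaultdict
from typing import Dict, Tuple

Pair = Tuple[str, str]  # (residential POI, bus station)


def allocate_station_flow(predicted: Dict[Pair, float],
                          station_flow: Dict[str, float]) -> Dict[Pair, float]:
    """Split each station's actual flow across POIs in proportion to predicted ridership."""
    station_totals: Dict[str, float] = defaultdict(float)
    for (_, station), value in predicted.items():
        station_totals[station] += value
    allocated: Dict[Pair, float] = {}
    for (poi, station), value in predicted.items():
        total = station_totals[station]
        weight = value / total if total > 0 else 0.0
        allocated[(poi, station)] = station_flow.get(station, 0.0) * weight
    return allocated


def passengers_per_poi(allocated: Dict[Pair, float]) -> Dict[str, float]:
    """Sum the allocated passengers over all stations reachable from each POI."""
    totals: Dict[str, float] = defaultdict(float)
    for (poi, _), value in allocated.items():
        totals[poi] += value
    return dict(totals)


# Illustrative values only: POI A reaches station S1; POI B reaches S1 and S2.
predicted = {("A", "S1"): 30.0, ("B", "S1"): 10.0, ("B", "S2"): 20.0}
station_flow = {"S1": 200.0, "S2": 50.0}
per_poi = passengers_per_poi(allocate_station_flow(predicted, station_flow))
print(per_poi)
```

By construction the allocated values at each station sum to that station’s actual flow, so the total over all residential POIs equals the total observed passenger flow.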
Further, the actual number of bus passengers residing at each residential POI is calculated from the allocated station-level passenger numbers. The results show that the average actual number of residential bus passengers is 132.59; the community average is 105.01, with a maximum of 2769, and the urban village average is 168.04, with a maximum of 3602. The heat map of the distribution of residential bus passengers is shown in Figure 11, which shows that residential bus passengers are concentrated in downtown areas where public transit is more convenient. Comparison with the heat map of the residential POI number distribution in Figure 12 shows that areas ① and ② contain similar numbers of residential POIs, yet area ② has far more residential bus passengers than area ①. A likely reason is that area ①, located in the northwest of Baoan District, consists mainly of urban villages, where the willingness to travel by bus is weaker, whereas area ②, located in Longhua District, contains more communities, where the willingness to travel by bus is stronger. The heat map of the distribution of residential bus passengers with the reachable distance threshold set to 1000 m is plotted in Figure 13. Compared with Figures 5–18, it shows that public transport passengers in residential districts are relatively concentrated, with a generally high willingness to travel by public transport and a high level of transportation sustainability.
In contrast, public transport passengers in urban villages are relatively scattered, and there are fewer of them in the city center, resulting in a generally low willingness to travel by public transport and a low level of transportation sustainability.

4.3.2. Trip Ratio Projection

(1)
Ratio of trips to bus stations in residential areas
Based on the residence projection results, the ratio of trips from each residence to its bus stations is projected according to Equation (7). The average trip ratio of community bus stations is 0.0115, with a maximum of 0.194; the average trip ratio of urban village bus stations is 0.0073, with a maximum of 0.187. To further verify the relationship between the number of bus lines at a station and the ratio of trips to bus stations in communities and urban villages, these ratios were divided into five intervals according to the quintile method, and the mean number of bus lines at stations in each interval was calculated, as shown in Figure 14 and Figure 15.
(2)
Ratio of trips by bus in residential areas
Based on the residence projection results, the ratio of trips by bus in residential areas is calculated according to Equation (8). The average ratio of trips by bus in residential areas is 0.0561. By comparison, dividing the daily average of 838,900 morning peak transit trips by Shenzhen’s resident population of 17,560,100 gives a ratio of 0.0478; the error is 17.61%, so the results are relatively reasonable. The average community bus travel ratio is 0.0573, with a maximum of 0.197; the average urban village bus travel ratio is 0.0509, with a maximum of 0.214. The frequency distribution of the ratio of bus trips in residential areas is shown in Figure 16; low ratios are the most common, which is consistent with residents’ current travel behavior. The frequency distributions for communities and urban villages are shown in Figure 17 and Figure 18; overall, the ratio of bus trips is higher in communities than in urban villages. The heat maps of the ratio of bus trips in communities and urban villages are shown in Figure 19 and Figure 20. Communities with a high ratio of bus trips are mainly concentrated in the city center, whereas urban villages with a high ratio of bus trips lie outside the city center and are more scattered.
Further, to verify the relationship between the community transit travel ratio and the community housing price, the number of reachable bus lines, the number of reachable subway lines, and the average distance to reachable subway stations, the community transit travel ratio was divided into five intervals according to the quintile method, and the mean values of these four factors in each interval were calculated, as shown in Figure 21, Figure 22, Figure 23 and Figure 24. The results show that the community bus trip ratio has a linear relationship with each of these factors, consistent with the results determined by the functional relationship, so the estimated community bus trip ratio is relatively reasonable. Notably, Figure 22 indicates that the number of accessible bus routes is negatively and linearly correlated with the public transport travel ratio. This is mainly because areas with a higher density of bus routes are usually located downtown, where metro stations are also relatively concentrated; since passengers generally prefer metro travel, the bus travel ratio in these areas is lower.
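The quintile grouping used for Figures 14, 15, and 21–24 can be sketched with the standard library. The example assumes that interval boundaries are the four quintile cut points returned by `statistics.quantiles` with its default method; the paper does not specify how boundaries were computed, and the helper name `mean_by_quintile` and the sample values are hypothetical.

```python
import statistics
from typing import List, Optional, Sequence


def mean_by_quintile(ratios: Sequence[float],
                     factor: Sequence[float]) -> List[Optional[float]]:
    """Group observations into five intervals of the ratio and average the factor in each."""
    cuts = statistics.quantiles(ratios, n=5)  # four cut points
    groups: List[List[float]] = [[] for _ in range(5)]
    for ratio, value in zip(ratios, factor):
        index = sum(ratio > cut for cut in cuts)  # interval 0 (lowest) to 4 (highest)
        groups[index].append(value)
    return [statistics.mean(g) if g else None for g in groups]


# Illustrative values only: ten communities whose bus trip ratio rises while
# the number of reachable bus lines falls (a negative relationship).
ratios = [k / 100 for k in range(1, 11)]
bus_lines = [20, 18, 16, 14, 12, 10, 8, 6, 4, 2]
interval_means = mean_by_quintile(ratios, bus_lines)
print(interval_means)
```

Plotting the five interval means against the interval index gives line charts of the kind described above, where a steadily falling or rising sequence suggests an approximately linear relationship.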

5. Summary

In view of the inconvenience and expense of acquiring data on bus passengers’ residences, this paper considers the factors influencing the proportion of trips to bus stops and proposes “calculating the residence of bus passengers from the housing price of residential POIs and the convenience of buses and stations”. A regression model of the number of passengers from residential POIs to bus stations and an XGBoost-based projection model of the same quantity are constructed. For the city of Shenzhen, the proposed model and method were verified and evaluated using bus travel data, residential POI data, and subway line and station data. The results show that the regression model has the highest accuracy, and the calculated results are consistent with reality and relatively reasonable.
This study holds the following three application values:
(1)
Supplementing accurate OD information of public transport passengers to provide data reference for subsequent research
Compared with the information derived solely from public transport IC card data, the residential location information of public transport passengers estimated in this study is more accurate. It can serve as a supplement to the OD information of public transport passengers, thereby enabling a timely grasp of passengers’ travel demands and patterns, and providing a basis and reference for subsequent research.
(2)
Enhancing public transport passenger satisfaction and improving public transport service quality
By understanding the residential distribution of public transport passengers, operators can formulate operational strategies that meet passengers’ personalized needs. Improving passenger satisfaction and building a public reputation for convenience and fulfillment are crucial to enhancing service quality and the competitiveness of public transport. They are also a key link in achieving the target of an 81% green transport mode share by 2025.
(3)
Laying the foundation for the overall optimization of the public transport network in the future
The application of the research findings helps accurately identify the actual demands of public transport passengers and provides data support for public transport administrative departments and operating enterprises to formulate rational capacity allocation plans and passenger transport organization schemes. It further delivers more targeted and higher-quality public transport services to citizens and contributes to the improvement of the sustainability level of urban transportation.
This research still has some limitations. The following aspects remain to be explored in future work:
(1)
The reachable distance in this paper is simply set as a radius of 500 m, which still has a certain deviation from the actual walking distance of passengers. In future research, more accurate walking distances can be used, so as to better match the actual situation of passengers.
(2)
Data cannot be crawled from websites with anti-crawler programs. As a result, the residential POI data in this paper still deviate somewhat from the real situation. Follow-up research can study crawler techniques further in order to update and supplement the residential POI data in real time and make the data more accurate.
(3)
This study mainly addresses the residences of bus passengers during the morning rush hour. Subsequent studies can incorporate bus travel data from other time periods, other types of POI data (such as enterprise POIs and entertainment POIs), and data on other travel modes (such as subway and online ride-hailing), so as to extend this work to an accurate projection of the actual OD of urban residents.

Author Contributions

Conceptualization, L.Z. (Liang Zou), L.Z. (Lingxiang Zhu), and Q.X.; methodology, L.Z. (Liang Zou) and L.Z. (Lingxiang Zhu); software, Q.X.; validation, L.Z. (Lingxiang Zhu); formal analysis, L.Z. (Liang Zou); investigation, L.Z. (Liang Zou) and Q.X.; resources, L.Z. (Liang Zou); data curation, L.Z. (Lingxiang Zhu) and Q.X.; writing—original draft preparation, L.Z. (Lingxiang Zhu) and Q.X.; writing—review and editing, L.Z. (Liang Zou); visualization, L.Z. (Liang Zou) and Q.X.; supervision, L.Z. (Liang Zou); funding acquisition, L.Z. (Liang Zou). All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Shenzhen Science and Technology Plan Project (No. KJZD20230923115223047) and the Shenzhen Higher Education Stable Support Plan Project (No. 20231123103157001).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

References

  1. Facing Declining Passenger Flow and Operational Difficulties, How Can Public Transport Enterprises Achieve Stable and Long-Term Development? People’s Daily. 29 September 2025. Available online: https://baijiahao.baidu.com/s?id=1844552207705471183&wfr=spider&for=pc (accessed on 1 October 2025).
  2. Prakasa, B.; Putra, D.W.; Kusumawardani, S.S.; Widhiyanto, B.T.Y.; Habibie, F. Big data Analytic for Estimation of Origin-Destination Matrix in Bus Rapid Transit System. In Proceedings of the 2017 3rd International Conference on Science and Technology-Computer (ICST), Yogyakarta, Indonesia, 11–12 July 2017; pp. 165–170. [Google Scholar]
  3. Takao, R.; Ikeuchi, N.; Suzuki, H.; Matsumoto, Y. A Proposal for OD Data Estimation System of Bus Users with Intelligent Video Analysis and Its Application to Synerex. In Proceedings of the 2021 IEEE International Conference on Consumer Electronics (ICCE), Penghu, Taiwan, 15–17 September 2021; pp. 1–4. [Google Scholar]
  4. Zhang, W.S.; Lu, M.; Zhu, J.J.; Yan, T.; Duan, Z.N. OD Calculation of bus passenger flow based on IC card and AVL data. Comput. Appl. Softw. 2021, 38, 100–105. [Google Scholar]
  5. Liu, B.K. The defect and prospect of existing OD survey method. Shandong Jiaotong Keji 2016, 4, 109–110. [Google Scholar]
  6. Jang, Y.; Ku, D.; Lee, S. Pedestrian mode identification, classification and characterization by tracking mobile data. Transp. A Transp. Sci. 2021, 19, 2008044. [Google Scholar] [CrossRef]
  7. Shiva, R.; Mehdi, G.; Hashemi, S.M.; Nickabadi, A. A hybrid of Neuro-Fuzzy Inference System and Hidden Markov Model for Activity-Based Mobility Modeling of Cellphone Users. Comput. Commun. 2021, 173, 79–94. [Google Scholar]
  8. Sun, Z.; Liu, J.M.; Yan, N. Prediction of urban residents’ OD matrix based on mobile phone big data. Math. Pract. Theory 2019, 49, 68–77. [Google Scholar]
  9. Zhang, X.D.; Jia, L.P.; Deng, S.C.; Wang, X.; Zhou, Z. Study on the operation characteristics of taxi and ride-hailing in Xiamen constrained by GPS track data and POI data. J. Beijing Univ. Civ. Eng. 2021, 37, 60–68. [Google Scholar] [CrossRef]
  10. Peng, F.; Song, G.H.; Zhu, S. A method for extracting commuting trips of frequent passengers in urban public transportation. J. Transp. Syst. Eng. 2021, 21, 158–165+172. [Google Scholar]
  11. Tang, H.T.; Liu, Y.P.; Wu, Z.C. Analysis of spatial heterogeneity of influencing factors of housing price based on POI data: A case study of Changsha. Urban Probl. 2021, 95–103. [Google Scholar]
  12. Wang, W.; Attanucci, J.P.; Wilson, N. Bus Passenger Origin-Destination Estimation and Related Analyses Using Automated Data Collection Systems. J. Public Transp. 2011, 14, 131–150. [Google Scholar] [CrossRef]
  13. Vanderwaart, C.; Attanucci, J.P.; Salvucci, F.P. Applications of Inferred Origins-Destinations and Interchanges in Bus Service Planning. Transp. Res. Rec. 2017, 2652, 70–77. [Google Scholar] [CrossRef]
  14. Lan, C. Route-Level Transit Passenger Origin-Destination Trip Estimation from Automatic Passenger Counting Data: A Case Study in Edmonton. Master’s Thesis, University of Alberta, Edmonton, AB, Canada, 2015. [Google Scholar]
  15. Rodríguez González, A.B.; Vinagre Díaz, J.J.; Wilby, M.R. Detailed Origin-Destination Matrices of Bus Passengers Using Radio Frequency Identification. IEEE Intell. Transp. Syst. Mag. 2022, 14, 141–152. [Google Scholar]
  16. Ge, Q.; Fukuda, D. Updating origin-destination matrices with aggregated data of GPS traces. Transp. Res. Part C Emerg. Technol. 2016, 69, 291–312. [Google Scholar] [CrossRef]
  17. Zang, H.; Bolot, J. Anonymization of Location Data Does not Work: A Large-scale Measurement Study. In Proceedings of the 17th Annual International Conference on Mobile Computing and Networking, Las Vegas, NV, USA, 20–22 September 2011. [Google Scholar]
  18. Frias-Martinez, V.; Soguero, C.; Frias-Martinez, E. Estimation of Urban Commuting Patterns Using Cellphone Network Data. In Proceedings of the ACM SIGKDD International Workshop on Urban Computing, Beijing, China, 12 August 2012. [Google Scholar]
  19. Liu, F.L.; Li, N.; Tian, L.F.; Wang, Y. Analysis and research on travel demand of public transport based on metropolitan comparison. Highway 2020, 65, 230–237. [Google Scholar]
  20. Zhang, X.M.; Gong, D.; Xie, B.L.; Ma, H. A study of the effectiveness of epidemic prevention policies on public transit usage based on the theory of planned behaviors. J. Transp. Inf. Saf. 2021, 39, 117–125. [Google Scholar]
  21. Hu, Y.Y.; Pu, Z.; Wang, P. Study on the impacts of traffic carbon emission pricing on resident trip behavior using logit model. J. Transp. Eng. Inf. 2021, 39, 117–125. [Google Scholar]
  22. Liu, Y.F.; An, T.; Chien, S.I.J.; Guo, J. Exploring influence factors for travel mode choice in cities with different scales. China J. Highw. Transp. 2022, 35, 286–297. [Google Scholar]
  23. Wang, W.L.; Yu, H. Research on evaluation method of pedestrian reachability and convenience of rail transit stations. Tianjin Constr. Sci. Technol. 2019, 29, 67–73. [Google Scholar]
  24. Zhang, S.Y.; Yang, Y.; Chu, Y.H.; Chen, Z.W. Evaluation of Wuhan bus station satisfaction based on structural equation. Highw. Automot. Appl. 2020, 9, 29–32+36. [Google Scholar]
  25. Cui, N.N.; Gu, H.Y.; Shen, T.Y. Study on the Impact of Transportation Spatial Layout on Urban Housing Prices—Based on the Correlation Analysis between Road Network Morphology and Housing Prices in Beijing. Price Theory Pract. 2019, 63–66. [Google Scholar] [CrossRef]
  26. Kang, J.; Luo, J.J.; Xue, S.Y.; Guo, L.F.; Wei, M.X. An empirical study on the relationship between housing prices and residents’ income in Shanxi province. Sci. Technol. Inf. 2022, 20, 129–132. [Google Scholar]
  27. Ren, X.W.; Huang, P. A Test of the mediating effect of population urbanization on house price. West Forum Econ. Manag. 2021, 32, 58–66. [Google Scholar]
  28. Yu, X.; Wang, M.Y.; Dong, X.; Chen, X.; Lu, J.X. A study on migration tendency of floating population in urban villages: Taking the city of Xi’an as an Example. Mod. Urban Res. 2021, 8, 10–16. [Google Scholar]
  29. Yao, W.J.; Bai, L.S. Implementation-oriented urban village traffic management optimization measures-Shenzhen city as an example. Traffic Transp. 2021, 34, 197–200. [Google Scholar]
  30. Hao, Q.T.; Wang, H.Y.; Hao, J.J. Study on influencing factors of community residents’ housing satisfaction-take Yijing community in Dazhou Sichuan province as an example. Jiangxi Build. Mater. 2022, 7, 322–325+328. [Google Scholar]
  31. Liu, Y.B.; Li, X.; Li, A.X. Countermeasures for the treatment and promotion of traffic congestion and hidden dangers in “villages within city”-taking Buji Changlong Area of Shenzhen city as an example. Traffic Transp. 2021, 34, 201–205. [Google Scholar]
  32. Wang, J.Q.; Zhan, Y.T.; Li, S.J. Analysis of the relationship between traffic congestion and population density and car ownership in surrounding communities. Auto Time 2020, 9, 32–33. [Google Scholar]
  33. Xia, H.B.; Dai, X.Y.; Wang, Y.; Wang, Z. The analysis of traffic convenience on county level based on GIS. Areal Res. Dev. 2006, 25, 120–124+130. [Google Scholar]
  34. Qi, W.F.; Zhang, J.J. Evaluation of Subway Station Convenience Based on Walking Living Circles—A Case Study of Hangzhou Metro Line 1. Archit. Cult. 2022, 136–138. [Google Scholar] [CrossRef]
  35. Xie, G.W.; Qian, L.B.; Pang, Y. Study on Public Transport Accessibility Measurement Based on GIS and Open Data. Logist. Technol. 2021, 44, 102–106. [Google Scholar]
  36. Gan, L.L.; Feng, X.H.; Bi, J.L.; Jiang, H.L. Study on Strength Prediction of High-Strength Concrete Based on Multivariate Nonlinear Regression Model. Concr. Cem. Prod. 2022, 1–7. [Google Scholar] [CrossRef]
  37. Fofanah, A.J.; Kalokoh, I.; Hwase, K.T.; Namagonya, A.P. Adaptive Neuro-Fuzzy Inference System with Non-Linear Regression Model for Online Learning Framework. Int. J. Sci. Eng. Res. 2020, 11, 375–391. [Google Scholar] [CrossRef]
  38. Wei, M.Y.; Li, L.L.; Huang, G.; Tang, F.; Zhang, Z. Deep learning in EEG decoding: A review. Chin. J. Biomed. Eng. 2019, 38, 464–472. [Google Scholar]
  39. Li, L.M.; Hou, M.M.; Chen, K. A Review of Research on the Interpretability of Deep Learning. Comput. Appl. 2022, 42, 3639–3650. [Google Scholar]
Figure 1. Time distribution of passenger flow.
Figure 1. Time distribution of passenger flow.
Sustainability 18 00041 g001
Figure 2. Thermal spatial distribution diagram of passenger flow.
Figure 2. Thermal spatial distribution diagram of passenger flow.
Sustainability 18 00041 g002
Figure 3. Scatter diagram of community distribution.
Figure 3. Scatter diagram of community distribution.
Sustainability 18 00041 g003
Figure 4. Scatter diagram of urban village distribution.
Figure 4. Scatter diagram of urban village distribution.
Sustainability 18 00041 g004
Figure 5. Scatter diagram of bus stations distribution.
Figure 5. Scatter diagram of bus stations distribution.
Sustainability 18 00041 g005
Figure 6. Scatter diagram of subway stations distribution.
Figure 6. Scatter diagram of subway stations distribution.
Sustainability 18 00041 g006
Figure 7. Improved regression model of residential POI bus station passenger training set.
Figure 7. Improved regression model of residential POI bus station passenger training set.
Sustainability 18 00041 g007
Figure 8. Improved regression model of residential POI bus station passenger test set.
Figure 8. Improved regression model of residential POI bus station passenger test set.
Sustainability 18 00041 g008
Figure 9. MAPE Comparison of different reachable walking distance.
Figure 9. MAPE Comparison of different reachable walking distance.
Sustainability 18 00041 g009
Figure 10. Comparison between residential POI bus station total passengers and bus station actual passenger flow.
Figure 10. Comparison between residential POI bus station total passengers and bus station actual passenger flow.
Sustainability 18 00041 g010
Figure 11. Thermal distribution diagram of bus passenger residence.
Figure 11. Thermal distribution diagram of bus passenger residence.
Sustainability 18 00041 g011
Figure 12. Thermal distribution diagram of residential POI population.
Figure 12. Thermal distribution diagram of residential POI population.
Sustainability 18 00041 g012
Figure 13. Thermal distribution diagram of bus passenger residence (1000 M).
Figure 13. Thermal distribution diagram of bus passenger residence (1000 M).
Sustainability 18 00041 g013
Figure 14. Line chart of community station trip ratio and bus station lines.
Figure 14. Line chart of community station trip ratio and bus station lines.
Sustainability 18 00041 g014
Figure 15. Curving diagram of urban village station trip ratio and bus station lines.
Figure 15. Curving diagram of urban village station trip ratio and bus station lines.
Sustainability 18 00041 g015
Figure 16. Frequency distribution of residence trip ratio.
Figure 16. Frequency distribution of residence trip ratio.
Sustainability 18 00041 g016
Figure 17. Frequency distribution of community trip ratio.
Figure 17. Frequency distribution of community trip ratio.
Sustainability 18 00041 g017
Figure 18. Frequency distribution of urban village trip ratio.
Figure 19. Thermal distribution diagram of community trip ratio.
Figure 20. Thermal distribution diagram of urban village trip ratio.
Figure 21. Line chart of community trip ratio and house price.
Figure 22. Line chart of community trip ratio and reachable bus lines.
Figure 23. Line chart of community trip ratio and reachable subway lines.
Figure 24. Line chart of community trip ratio and average distance to reachable subway stations.
Table 1. Indicators of Influencing Factors.

| Factor Categories | Influencing Factors | Indicators | Designing Way |
|---|---|---|---|
| Bus Travel Factors | Income per inhabitant in the place of residence | Residential POI Home Prices | Indirect |
| | Age per inhabitant in the place of residence | Residential POI Type (community, urban village) | Indirect |
| | Vehicle ownership per inhabitant in the place of residence | | Indirect |
| | Occupational share of residents in the place of residence | - | Excluded |
| | The convenience of the bus in the place of residence | Number of residential POI reachable bus routes | Direct |
| | | Number of residential POI reachable by bus stations | |
| | The convenience of the subway in the place of residence | Number of residential POI reachable by subway lines | Direct |
| | | Number of residential POI reachable subway stations, average distance | |
| Bus Station Choosing Factors | Convenience of bus stations | Number of bus routes | Direct |
| | | Number of residential POI reachable bus stations distance | |
Table 2. Fitting Results of Each Functional Form for Influencing Factors.

| Factor Categories | Influencing Factors | Linear Function | Power Function | Exponential Function |
|---|---|---|---|---|
| Community-related influencing factors | Housing price | 103.344 | 104.228 | 105.024 |
| | Number of accessible bus routes | 84.501 | 88.265 | 104.893 |
| | Number of accessible metro lines | 31.357 | 44.357 | 36.604 |
| | Average distance to accessible metro stations | 83.350 | 94.219 | 84.363 |
| | Number of bus routes per bus stop | 43.661 | 51.036 | 48.195 |
| Urban village-related influencing factors | Number of bus routes per bus stop | 35.476 | 24.1847 | 38.805 |
Table 3. Comparison of models.

| Model | MAE | RMSE | MAPE |
|---|---|---|---|
| Improved residential POI bus station ridership regression model | 161.027 | 288.069 | 72.024% |
| Original residential POI bus station ridership regression model | 180.642 | 311.052 | 113.531% |
| XGBoost-based residential POI bus station ridership projection model | 167.124 | 295.868 | 82.472% |
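The three error measures reported in Table 3 can be illustrated with a minimal Python sketch using their standard textbook definitions. The function name `regression_errors` and the sample numbers are hypothetical, and the handling of zero observations in MAPE is an assumption, since the table itself does not state how the authors treated them.

```python
import math

def regression_errors(actual, predicted):
    """Return (MAE, RMSE, MAPE in percent) for paired observations.

    Hypothetical helper: standard definitions, not the authors' code.
    """
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(actual)
    abs_err = [abs(a - p) for a, p in zip(actual, predicted)]
    mae = sum(abs_err) / n
    rmse = math.sqrt(sum(e * e for e in abs_err) / n)
    # Assumption: pairs whose actual value is zero are skipped, because the
    # relative error is undefined there.
    rel = [e / abs(a) for e, a in zip(abs_err, actual) if a != 0]
    mape = 100.0 * sum(rel) / len(rel) if rel else float("nan")
    return mae, rmse, mape

# Made-up ridership values, for illustration only.
mae, rmse, mape = regression_errors([100, 200, 400], [110, 180, 400])
print(round(mae, 3), round(rmse, 3), round(mape, 3))
```

Because the relative error is unbounded when a prediction overshoots a small observed value, a MAPE above 100% is arithmetically possible under this definition.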
Table 4. Residential POI and residence bus station passenger.

| Type | The Number of Passengers from Residential POI to Reachable Bus Stations | The Actual Number of Passengers in the Residential Bus Station |
|---|---|---|
| Overall average | 21.48 | 23.07 |
| Community average | 19.34 | 22.29 |
| Urban villages average | 23.72 | 23.64 |
| Community maximum | 817 | 1327 |
| Urban villages maximum | 1088 | 1720 |
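The abstract describes using the modelled passenger numbers from each residential POI to its reachable stops as weights for allocating each stop's actual passenger flow back to the POIs. A minimal Python sketch of such a proportional allocation follows. The function name `allocate_stop_flow`, the dictionary layout, and the sample numbers are all hypothetical, and stops with no modelled passengers are simply skipped, which is an assumption rather than something the paper states.

```python
from collections import defaultdict

def allocate_stop_flow(modelled, actual_flow):
    """Distribute each stop's observed flow over POIs in proportion to modelled values.

    modelled: {(poi, stop): modelled passengers from poi to stop}
    actual_flow: {stop: observed passenger flow at the stop}
    Returns {poi: estimated number of resident bus passengers}.
    """
    # Total modelled passengers arriving at each stop from all reachable POIs.
    stop_totals = defaultdict(float)
    for (_poi, stop), value in modelled.items():
        stop_totals[stop] += value

    residents = defaultdict(float)
    for (poi, stop), value in modelled.items():
        total = stop_totals[stop]
        # Assumption: skip stops without modelled demand or without observed flow.
        if total > 0 and stop in actual_flow:
            residents[poi] += actual_flow[stop] * value / total
    return dict(residents)

# Made-up example: two POIs sharing stop s1, POI B also using stop s2.
modelled = {("A", "s1"): 30, ("B", "s1"): 10, ("B", "s2"): 20}
actual_flow = {"s1": 80, "s2": 50}
print(allocate_stop_flow(modelled, actual_flow))
```

In this sketch the allocated totals sum to the observed flow of every stop that has at least one contributing POI, so the estimate preserves the stop-level counts.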
Share and Cite

MDPI and ACS Style

Zhu, L.; Xuan, Q.; Zou, L. Estimation of Bus Passengers’ Residential Locations Based on Morning Rush Hour Travel Data and POI Information. Sustainability 2026, 18, 41. https://doi.org/10.3390/su18010041
