How Does the Built Environment Affect Intermodal Demand Between Bus and Metro: An Ensemble Explainable Machine Learning Analysis

Zhang, Hui; Qu, Ke

doi:10.3390/ijgi15060269

by Hui Zhang * and Ke Qu

School of Transportation Engineering, Shandong Jianzhu University, Jinan 250101, China

* Author to whom correspondence should be addressed.

ISPRS Int. J. Geo-Inf. 2026, 15(6), 269; https://doi.org/10.3390/ijgi15060269 (registering DOI)

Submission received: 20 April 2026 / Revised: 3 June 2026 / Accepted: 9 June 2026 / Published: 15 June 2026


Abstract

The integrated use of metro and bus services plays a key role in long-distance trips in large cities. Revealing the nonlinear relationship between intermodal transfer demand and the built environment is important for building a sustainable public transport system. This paper proposes a stacking ensemble explainable machine learning framework, in which a meta-learner learns from the predictions of diverse base learners to improve performance, to examine how influencing factors affect intermodal demand in both the metro-to-bus and bus-to-metro directions. In this framework, the ensemble is a stacking model in which ridge regression serves as the second-stage model. The base learners include tree-based models (e.g., Random Forest, XGBoost and CatBoost) and non-tree-based models (e.g., SVR and KNN). The framework is applied to a case study of Beijing, China, based on one weekday (13 May 2019) and one weekend day (18 May 2019) of smart card data covering the main urban districts within the Sixth Ring Road. The results indicate that the stacking ensemble learning model outperforms the base learning models. For the metro-to-bus direction, transfer time, bus station count, and degree centrality are the top three influential factors; for the bus-to-metro direction, transfer time, bus station count, and shopping POI count are the top three, with lower predictive performance due to greater variability in this direction. However, the interaction effect of transfer time and bus station count is negative. This study could provide new insights into public transport planning and management.

Keywords: stacking ensemble learning; intermodal demand; metro and bus; built environment

1. Introduction

In megacities, metro systems have become a dominant backbone in the daily travel of people as metro networks and urban sprawl have expanded. However, individuals often need to transfer to one or more other travel modes, such as buses, bike sharing and taxis, to accomplish their trips due to the limited coverage and accessibility of metro systems. Among these feeder modes, buses serve a dominant role by providing an affordable and effective way to cater to travel demand [1,2]. In reality, intermodal transfers between bus and metro systems are significant in long-distance travel, especially for younger people [3].

Transfers are a double-edged sword for passengers. On one hand, transfers allow different travel modes to be combined, which enables individuals to make long trips. Studies indicate that 30–80% of trips require at least one transfer, depending on network conditions [4]. On the other hand, transfers are viewed as a key factor reducing the attractiveness of public transportation because of the inconvenience and added time [5,6]. Survey data have advantages in understanding the route choices of travelers [7], while automatically collected data are superior for detecting the spatiotemporal characteristics of intermodal transfers [8,9]. Intermodal transfers include metro-to-bus and bus-to-metro transfers. Metro-to-bus transfers have been widely studied, but bus-to-metro characteristics remain underexplored because bus alighting information is missing under the single-swipe card model [10]. Identifying intermodal transfers in the travel chain of passengers is the core problem when studying transfer behaviors. A classic method to infer transfers between metro and bus services is to apply spatial and temporal thresholds [9].

Typically, intermodal transfer behaviors are affected by many factors, such as transfer time, comfort, convenience and weather [11,12,13,14]. Gu et al. pointed out that the interchange discount policy between metro and bus services had a positive impact on interchange behaviors [15]. In addition, considerable work has explored the relationship between the built environment and intermodal transfers using the ordinary least squares (OLS) model, the geographically weighted regression (GWR) model and its variants (MGWR and GTWR) [9,10,16]. In recent years, machine learning methods such as GBDT and XGBoost have become prevalent for revealing the determinants of transfers [17,18]. Compared to the traditional methods, machine learning methods often show better performance.

Although the spatiotemporal characteristics and determinants of metro–bus transfers have been studied, some research gaps remain. Firstly, intermodal transfers between metro and bus services play a dominant role in cities; however, they receive limited attention compared to other metro-related intermodal transfers such as those involving bike sharing and taxis [19,20]. Secondly, most studies of intermodal transfers between metro and bus services focus on metro-to-bus transfers and neglect bus-to-metro transfers because key alighting information is unavailable. To bridge these gaps, this paper proposes a stacking ensemble machine learning framework, containing GBDT, XGBoost, Bayesian Ridge, etc., to explore the nonlinear relationship between the built environment and both metro-to-bus and bus-to-metro transfer ridership. We intend to answer the following research questions: (1) Do the spatiotemporal characteristics of metro-to-bus transfer ridership differ from those of bus-to-metro transfer ridership? (2) How do built environment factors impact intermodal ridership between metro and bus services?

This paper is organized as follows. Section 2 gives the literature review. Section 3 introduces the methodology. The data description is shown in Section 4. Section 5 gives the results. Section 6 is the discussion. Section 7 concludes this paper.

2. Literature Review

2.1. Transfer Behavior Studies

Traditional studies on transfers between different travel modes are mostly built on survey data [21,22,23]. Survey data can provide diverse information, but it is a time-consuming and expensive process. Owing to the development of information technology, travelers’ origins and destinations can be tracked based on smart card data or global positioning system (GPS) data, which helps with exploring travel chains. A key problem in constructing the travel chains of travelers is inferring the transfers between different trips. The common method is based on the time threshold and spatial threshold.

The 30 min time threshold is extensively used to identify transfers of transit passengers [24,25]. Some researchers have used other time thresholds, such as 50 min [9]. Seaborn et al. recommended transfer time thresholds for different transfer modes: 20 min for metro-to-bus, 35 min for bus-to-metro and 45 min for bus-to-bus [26]. Liu et al. extracted trip chains using time thresholds of 21 min for metro–bus, 15 min for bus–metro and 25 min for bus–bus [27]. As for the spatial threshold, Wang et al. estimated the 75th-percentile walking distances to a metro station as 494 m and 712 m in the downtown area and the suburban area, respectively [1]. Li et al. used 800 m as an acceptable walking distance to metro stations [9]. Yadav et al. used 400 m and 800 m as the maximum walking distances to a bus station and a metro station, respectively [28].

Generally, the intermodal transfer characteristics are complicated and time-varying due to many influencing factors such as weather, transit network structure, commuting patterns and personal preferences. Eltved et al. found that the walking time from bus stations to the rail platform varies due to the walking speed of travelers and activities during transfers [6]. Wu et al. pointed out that the intermodal transfer ratio is affected by weather and that it will significantly increase under high temperature, strong wind, rainfall and low visibility [29]. Yang et al. studied the threshold range of walking, biking, bus and park-and-ride to determine metro catchment, and found that the bus–metro transfer distance can be extended to 6 km in the suburbs [30]. Gu et al. studied the effect of an interchange discount in Suzhou, China, and found that it has a positive impact on interchange behaviors [15]. Chen et al. revealed the deviation of “nearest station” in bus–rail intermodal trips, in which only 40% of riders select the nearest rail station [31]. Shi claimed that a metro-to-bus transfer service deficiency exists across the metropolitan area of Shanghai and varies spatially and temporally [32]. Zhang et al. studied the structural pattern of commuting trips by bus and metro, and found greater spatial similarity between bus and metro trips among prolonged commuters (40 min–60 min), indicating a potential mismatch between workplaces and residences [33]. The transfers between metro and bus are commonly regarded as virtual edges when constructing a multilayer network to detect important nodes or analyze the resilience of transit systems [34,35]. Cheng and Tseng provided suggestions to enhance metro–bus transfer usage, including timetable coordination between bus and metro, passenger guidance information, waiting information and low-floor bus service [22].

Although previous studies have provided valuable insights into transfer identification methods and transfer behavior characteristics, the reported thresholds and influencing factors vary considerably across different cities and datasets, indicating the complexity and heterogeneity of intermodal transfer behavior.

2.2. Built Environment Effects

In the recent literature, there is a large body of studies exploring the associations between intermodal transfer behavior and the built environment. Chen et al. claimed that the distance to the nearest metro station ranks highest among the impact factors, followed by bus route and land-use mix [36]. Li et al. indicated that the number of bus stations and routes have a more evidently positive effect on transfer ridership, while there is a negative correlation between the density of non-motorway lanes and transfer ridership [9]. Wu et al. stated that the transfer-related variables, real-time weather, socioeconomic characteristics and demographic factors play a crucial role in transfer ridership, while weather variables have little impact on transfer ridership on weekdays [37]. Shi and Zeng demonstrated that the associations between built environment and different metro station types vary significantly, and the transfer ridership at employment-oriented stations shows a positive relationship with bus route and distance to center while land-use diversity and intersection show negative impact [16].

These studies demonstrate that the built environment plays an important role in shaping intermodal transfer demand. However, different studies often identify different dominant factors and even inconsistent effects, suggesting that the impacts of built environment variables may be highly nonlinear and context-dependent. Traditional statistical approaches are effective in identifying general relationships, but their ability to capture complex nonlinear effects and interactions among variables remains limited.

2.3. Machine Learning Approaches and Research Gaps

To better capture nonlinear relationships, machine learning methods have been increasingly applied in transfer demand studies. Liu et al. used the extreme gradient-boosting decision-trees (XGBoost) model to examine the nonlinear effects of the built environment on bus–metro-transfer ridership, and found that the bus-network density plays the most influential role on transfer ridership [18]. Compared with traditional statistical models, machine learning approaches are generally more capable of handling nonlinear relationships and high-dimensional variables.

Despite the growing body of research on metro–bus transfers, several gaps remain. Existing studies have paid limited attention to the nonlinear interaction mechanisms among built environment variables, while most machine learning applications rely on a single algorithm and may not fully exploit the complementary strengths of different models. In addition, previous research has mainly focused on metro-to-bus transfers, with comparatively less attention given to bus-to-metro transfers due to the lack of bus alighting information. To address these gaps, this study develops a stacking ensemble learning framework to improve predictive performance and investigates both metro-to-bus and bus-to-metro transfer demand. Furthermore, SHAP is employed to reveal the nonlinear effects and interactions of built environment factors.

3. Methodology

This paper proposes a stacking machine learning framework to detect the nonlinear relationship between the intermodal demand and the built environment. To enhance the explanations, the SHapley Additive exPlanations (SHAP) model was adopted in the framework. The intermodal demand is divided into metro-to-bus demand and bus-to-metro demand, which are extracted from smart card data using spatial and temporal thresholds. Due to differences in the numbers of valid transfer observations, separate stacking models are developed for the two transfer directions, and the resulting directional comparisons should be interpreted with appropriate caution. The built environment around intermodal metro/bus stations within the buffer areas is considered as the set of independent variables, which contains 5D factors. The framework is shown in Figure 1.

3.1. Stacking Ensemble Learning

The ensemble learning approach constructs a predictive system that is more powerful than any single model by combining multiple learners [38]. Among various ensemble approaches, stacking represents a typical heterogeneous hierarchical architecture, whose performance highly depends on the composition of base learners, the number of models, and the design of the meta-learner. Accordingly, this study develops a feedback-optimized out-of-fold (OOF) stacking ensemble framework (Figure 1). Under strict data partition constraints, multiple rounds of experimental evaluation are conducted to identify base learner combinations that enhance predictive performance, followed by further optimization of the meta-learner structure.

(1) Base learner candidate pool and generation of out-of-fold predictions

To fully capture the nonlinear relationships between transfer demand and built environment factors, a candidate pool of base learners is first constructed, including multiple tree-based and non-tree-based models. This pool covers diverse model structures and learning mechanisms, thereby improving the diversity and robustness of the ensemble framework.

During the stacking procedure, the dataset is divided into a training set (60%), a validation set (20%), and a test set (20%). The training set is used for base learner training and the generation of out-of-fold (OOF) predictions, and the validation set is employed for evaluating model combinations and subsequent ensemble comparisons. The test set is reserved exclusively for final generalization assessment. For the training samples, K-fold cross-validation is implemented to train each base learner. In each fold, predictions are generated for the held-out subset that is not involved in model training, thereby constructing the OOF meta-feature matrix for the training set. This process effectively prevents information leakage and ensures that the meta-learner is trained solely on predictions derived from unseen samples.
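The partitioning and out-of-fold procedure described above can be sketched in a few lines. This is a minimal illustration rather than the authors' implementation: the helper names (`split_60_20_20`, `kfold`, `oof_column`), the toy data and the mean-predicting base learner are all assumptions introduced here, and only the Python standard library is used.

```python
import random
from statistics import mean

def split_60_20_20(n, seed=0):
    """Shuffle indices 0..n-1 and cut them into train/validation/test parts."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    a, b = (6 * n) // 10, (8 * n) // 10
    return idx[:a], idx[a:b], idx[b:]

def kfold(indices, k):
    """Yield (fit_idx, held_out_idx); every index is held out exactly once."""
    folds = [indices[i::k] for i in range(k)]
    for i, held_out in enumerate(folds):
        fit_idx = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield fit_idx, held_out

def mean_learner(X_fit, y_fit):
    """Toy base learner (hypothetical stand-in): predicts the mean target."""
    mu = mean(y_fit)
    return lambda x: mu

def oof_column(make_learner, X, y, train_idx, k=5):
    """One column of the OOF meta-feature matrix for the training samples."""
    oof = {}
    for fit_idx, held_out in kfold(train_idx, k):
        model = make_learner([X[j] for j in fit_idx], [y[j] for j in fit_idx])
        for j in held_out:
            oof[j] = model(X[j])  # sample j was not used to fit this model
    return oof

# Toy data: 20 samples with one feature each (illustrative values only)
X = [[float(i)] for i in range(20)]
y = [2.0 * i for i in range(20)]
train_idx, val_idx, test_idx = split_60_20_20(len(X))
oof = oof_column(mean_learner, X, y, train_idx, k=4)
```

Each training sample receives exactly one prediction, produced by a model that never saw that sample, which is the property that guards the meta-learner against leakage.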

(2) Construction of meta-features for the three data subsets

After generating the OOF predictions for the training set, each base learner is refitted on the full training data and subsequently used to produce predictions for the validation and test sets. For the validation set, each base learner generates a single prediction output to construct the corresponding meta-feature matrix. For the test set, the predictions obtained from the K-fold training process are averaged to form a stable meta-feature representation. As a result, the training, validation, and test sets are mapped at the base-learner level into three structurally consistent meta-feature spaces, while maintaining strict independence across all stages. Let M denote the final number of selected base learners and N the number of samples. The corresponding meta-feature matrix can therefore be expressed as:

$$
Z = \begin{bmatrix}
\hat{y}_{1}^{(1)} & \hat{y}_{2}^{(1)} & \dots & \hat{y}_{M}^{(1)} \\
\hat{y}_{1}^{(2)} & \hat{y}_{2}^{(2)} & \dots & \hat{y}_{M}^{(2)} \\
\vdots & \vdots & \ddots & \vdots \\
\hat{y}_{1}^{(N)} & \hat{y}_{2}^{(N)} & \dots & \hat{y}_{M}^{(N)}
\end{bmatrix} \tag{1}
$$

where $\hat{y}_{i}^{(k)}$ denotes the prediction generated by the $i$-th base learner for sample $k$.

(3) Meta-learner fusion and performance evaluation

Within the meta-feature space, a meta-learner is further constructed to perform secondary fusion of the predictions generated by multiple base learners. To this end, a candidate pool of meta-learners is predefined, and the optimal meta-learner structure and its associated parameters are selected through cross-validation. The meta-learner is trained exclusively on the training meta-features and evaluated on the validation meta-features, thereby reducing the risk of overfitting.

$$
\hat{y}_{\mathrm{meta}} = g(\hat{y}_{1}, \hat{y}_{2}, \dots, \hat{y}_{M}) \tag{2}
$$

where $g(\cdot)$ denotes the regression mapping function of the meta-learner, and $\hat{y}_{i}$ represents the outputs of the base learners.

After determining the final meta-learner, the stacking ensemble model produces the final predictions on the test-set meta-features. To comprehensively assess the effectiveness of the proposed approach, a weighted-average ensemble is also constructed as a benchmark. Model performance is then compared on the same test set using the R², RMSE, and MAE metrics, enabling a systematic evaluation of predictive differences among alternative modeling strategies.

$$
R^{2} = 1 - \frac{\sum_{i=1}^{n} (y_{i} - \hat{y}_{i})^{2}}{\sum_{i=1}^{n} (y_{i} - \bar{y})^{2}} \tag{3}
$$

$$
RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_{i} - \hat{y}_{i})^{2}} \tag{4}
$$

$$
MAE = \frac{1}{n} \sum_{i=1}^{n} |y_{i} - \hat{y}_{i}| \tag{5}
$$
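Equations (3)–(5) can be checked with a short script. The sketch below is illustrative only; the function names and the four-sample toy vectors are assumptions introduced here, not values from the study.

```python
from math import sqrt

def r2(y, y_hat):
    """Coefficient of determination, Eq. (3)."""
    y_bar = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - y_bar) ** 2 for a in y)
    return 1 - ss_res / ss_tot

def rmse(y, y_hat):
    """Root mean squared error, Eq. (4)."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y))

def mae(y, y_hat):
    """Mean absolute error, Eq. (5)."""
    return sum(abs(a - b) for a, b in zip(y, y_hat)) / len(y)

# Toy observed and predicted values (hypothetical)
y = [3.0, 5.0, 7.0, 9.0]
y_hat = [2.0, 5.0, 8.0, 9.0]
print(r2(y, y_hat), rmse(y, y_hat), mae(y, y_hat))
```

For these toy vectors the residual sum of squares is 2 and the total sum of squares is 20, so R² is 0.9, RMSE is the square root of 0.5, and MAE is 0.5.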

3.2. Individual Machine Learning Models

(1) Tree-Based Machine Learning Methods

This study incorporates several tree-based models as base learners, including Random Forest, ExtraTrees, AdaBoost, GBDT, XGBoost, LightGBM, and CatBoost. Tree-based models recursively partition the feature space to construct piecewise constant functions, which can be expressed as

$$
\hat{y}(x) = \sum_{m} c_{m} I(x \in R_{m}) \tag{6}
$$

where $m$ indexes the leaf nodes, $R_{m}$ represents the $m$-th region of the feature space, $c_{m}$ is the predicted constant associated with that region, and $I$ is the indicator function. This equation shows how a tree-based model makes predictions: it divides the feature space into disjoint regions and assigns a constant value to each region. Because each input falls into exactly one region, the sum for a single tree reduces to the constant of that region.
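A small numerical instance of Eq. (6) may help. The one-feature "tree" below, with three leaves defined over transfer time, is entirely hypothetical (split points and leaf constants are invented for illustration) and is evaluated literally as a sum of constants times indicators.

```python
def tree_predict(x, regions):
    """Evaluate Eq. (6): sum over leaves of c_m * I(x in R_m)."""
    return sum(c_m * (1 if lo <= x < hi else 0) for (lo, hi), c_m in regions)

# Hypothetical leaves: (interval of transfer time in minutes, leaf constant)
regions = [
    ((0.0, 5.0), 120.0),
    ((5.0, 15.0), 60.0),
    ((15.0, float("inf")), 10.0),
]

print(tree_predict(7.0, regions))  # only the second indicator is 1
```

Since the intervals do not overlap, exactly one indicator equals 1 for any non-negative input, so the call above returns the constant of the second leaf.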

Without requiring a predefined functional form, tree-based models are capable of capturing nonlinearities, threshold effects, and interaction effects between built environment characteristics within station buffers and transfer demand. Bagging-based models emphasize variance reduction and robustness improvement, whereas boosting-based models iteratively fit residuals to enhance predictive accuracy and capture complex structures. Tree-based approaches exhibit strong advantages in handling high-dimensional data, multicollinearity, and outliers, making them well suited for modeling the combined effects of population density, land-use mix, accessibility, and related factors on transfer behavior.

(2) Non-tree Machine Learning Methods

To address the limitations of tree-based models in representing smooth functional relationships, several non-tree models were incorporated as base learners. The multilayer perceptron (MLP) was employed to capture nonlinear coupling among built environment indicators; support vector regression (SVR) exhibited robustness under limited sample sizes and pronounced local variations; k-nearest neighbors regression (KNN) facilitated the identification of local similarity patterns in the feature space; and Bayesian Ridge regression provided a stable linear benchmark with interpretability.

At the second learning stage of the stacking ensemble, RidgeCV, LassoCV, and ElasticNetCV were selected as candidate meta-learners. RidgeCV mitigated multicollinearity among meta-features through L2 regularization and automatically selected the optimal regularization strength via cross-validation over the parameter set $\alpha \in \{0.1, 1, 10\}$. LassoCV and ElasticNetCV achieved coefficient sparsity and a balance between sparsity and stability through L1 regularization and combined L1–L2 constraints, respectively, and were used for comparative selection of the meta-learner structure.

3.3. SHapley Additive exPlanations (SHAP) Model

The SHAP model plays a key role in quantifying the contributions of each feature in machine learning models, which has been extensively applied in many research areas [39,40,41]. To interpret the predictive mechanism of the stacking ensemble model, a Stacking-SHAP interpretability approach consistent with the model structure is adopted. The SHAP method originates from the Shapley value theory in cooperative game theory. Its core idea is to quantify the contribution of a feature to the model prediction by calculating its weighted marginal contribution across all possible feature coalitions. For a given feature $i$, the classical SHAP value is defined as

$$
\phi_{i} = \sum_{S \subseteq F \setminus \{i\}} \frac{|S|!\,(|F| - |S| - 1)!}{|F|!} \left[ f_{S \cup \{i\}}(x_{S \cup \{i\}}) - f_{S}(x_{S}) \right] \tag{7}
$$

where $F$ denotes the set of all features, $S$ represents any subset of features excluding feature $i$, and $f_{S \cup \{i\}}$ and $f_{S}$ denote the prediction functions with and without feature $i$, respectively. $x_{S}$ represents the input feature values corresponding to subset $S$. In simple terms, this formulation evaluates how much a feature changes the model prediction and uses this information to quantify its overall importance.
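For a handful of features, Eq. (7) can be evaluated exactly by enumerating all coalitions. The sketch below does so under one loudly stated assumption: the restricted model $f_{S}$ is obtained by replacing features outside $S$ with fixed baseline values, which is only one of several conventions and is not specified in the text. The toy model `f` and its inputs are hypothetical.

```python
from itertools import combinations
from math import factorial

def shapley_values(f, x, baseline):
    """Exact Shapley values of f at x by brute-force coalition enumeration."""
    n = len(x)

    def value(S):
        # Assumption: features outside S take their baseline value
        z = [x[j] if j in S else baseline[j] for j in range(n)]
        return f(z)

    phi = []
    for i in range(n):
        rest = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for S in combinations(rest, size):
                total += weight * (value(set(S) | {i}) - value(set(S)))
        phi.append(total)
    return phi

# Hypothetical model with one interaction term between features 0 and 2
f = lambda z: 3.0 * z[0] - 2.0 * z[1] + z[0] * z[2]
x = [1.0, 2.0, 4.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(f, x, baseline)
```

Here the interaction term contributes 4 to the prediction and is split evenly between features 0 and 2, and the three values sum to the difference between the prediction at `x` and at the baseline. The enumeration grows exponentially with the number of features, which is why practical tools rely on model-specific algorithms or approximations.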

Unlike the conventional SHAP computation for single models, the SHAP values in the stacking framework are derived from the final ensemble prediction, reflecting the aggregated marginal contributions of each feature after multi-model fusion. Assume that the stacking model consists of $M$ base learners $f_{m}(x)$, and the meta-learner is specified as a linear model with the following prediction form

$$
\hat{y} = \sum_{m=1}^{M} w_{m} f_{m}(x) \tag{8}
$$

where $w_{m}$ denotes the weight coefficient learned by the meta-learner. To ensure comparability across base learners, the weights are normalized by their absolute values.

Based on this formulation, the main SHAP effect of the stacking model is defined as the weighted summation of the SHAP values from all base learners

$$
\phi_{ij}^{\mathrm{stack}} = \sum_{m=1}^{M} \tilde{w}_{m} \cdot \phi_{ij}^{(m)} \tag{9}
$$

where $\phi_{ij}^{(m)}$ denotes the SHAP value of the $j$-th feature for sample $i$ in the $m$-th base learner, and $\tilde{w}_{m}$ represents the normalized weight. Essentially, the final feature importance is obtained by averaging the contributions from all base learners using their learned weights.
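The aggregation in Eq. (9) amounts to a weighted sum over learners. The sketch below assumes one plausible reading of the normalization, namely dividing each meta-learner coefficient by the sum of absolute coefficients; the text does not give the exact formula, so this choice, the function names and the toy numbers are all assumptions.

```python
def normalize_weights(w):
    """Assumed normalization: w_m divided by the sum of |w_l| over learners."""
    total = sum(abs(v) for v in w)
    return [v / total for v in w]

def stack_shap(weights, shap_per_learner):
    """Eq. (9); shap_per_learner[m][i][j] is the SHAP value of feature j
    for sample i in base learner m."""
    w_tilde = normalize_weights(weights)
    n_samples = len(shap_per_learner[0])
    n_features = len(shap_per_learner[0][0])
    return [
        [
            sum(w_tilde[m] * shap_per_learner[m][i][j] for m in range(len(w_tilde)))
            for j in range(n_features)
        ]
        for i in range(n_samples)
    ]

# Two hypothetical base learners, one sample, two features
weights = [3.0, 1.0]
shap_per_learner = [
    [[4.0, -2.0]],  # learner 0
    [[0.0, 2.0]],   # learner 1
]
phi_stack = stack_shap(weights, shap_per_learner)
```

With these numbers the normalized weights are 0.75 and 0.25, so the stacked SHAP values for the single sample are 3.0 and -1.0.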

For feature interaction effects, SHAP interaction values are computed only for tree-based models that support interaction decomposition. The overall interaction effect of the stacking model is defined as the weighted summation of the interaction SHAP values from all tree-based learners

$$
\phi_{ijk}^{\mathrm{stack}} = \sum_{m \in \mathcal{T}} \tilde{w}_{m} \cdot \phi_{ijk}^{(m)} \tag{10}
$$

where $\mathcal{T}$ denotes the set of all tree-based models, and $j$, $k$ represent the two interacting features. This formulation measures how two features jointly influence the prediction by aggregating their interaction effects across all tree-based learners. Non-tree models contribute only to the main effect calculation and are not included in the interaction effect analysis. This is because exact SHAP interaction values are currently available only for tree-based models through TreeExplainer, whereas interaction effects for non-tree learners can only be estimated using model-agnostic approximation methods, which may introduce additional uncertainty. Moreover, the tree-based learners consistently achieved substantially higher predictive performance than the non-tree learners. Therefore, the aggregated interaction effects are expected to capture the dominant interaction mechanisms learned by the stacking framework.

4. Data Description

4.1. Study Area

Beijing is one of the most developed megacities in China in terms of public transportation, having established an integrated public transport system with rail transit as the backbone and conventional bus services as the primary feeder mode. During the study period, the metro network consisted of 27 operating lines with a total length exceeding 800 km, carrying more than 10 million passenger trips per day on average. The bus system operated over 1200 regular routes with nearly 30,000 vehicles, serving approximately 8 million passenger trips daily. Transfers between rail transit and bus systems constitute a critical component of residents’ daily travel. The study area is delineated based on the spatial distribution of bus and metro stations where actual transfer activities occur. It generally covers the main urban districts within the Sixth Ring Road of Beijing, as well as selected peripheral extension areas. The spatial extent of the study area is shown in Figure 2.

The smart card data were collected from 13 May 2019 to 19 May 2019. The one-week period was selected because it represents normal urban mobility conditions in Beijing, without major holidays, school breaks, or extreme weather events. A typical weekday (13 May) and weekend day (18 May) were used to study the transfer behavior. Since 28 December 2014, passengers have been required to swipe their cards both when boarding and when alighting from the bus. Therefore, the smart card data contain the full travel information of passengers, including boarding time, boarding station, alighting time and alighting station. To extract the transfer information between the metro and bus, a temporal threshold (30 min) and a spatial threshold (800 m) were used, following previous studies. These thresholds are appropriate for Beijing specifically: the 30 min threshold corresponds to the 95th percentile of observed transfer time differences in Beijing smart card data, and the 800 m threshold aligns with the official definition of metro station pedestrian catchment areas in the Beijing Transport Annual Report [42].
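The threshold rule can be illustrated with a short sketch. It assumes, hypothetically, that each trip leg is a record with a mode, alighting and boarding timestamps, and station coordinates in degrees, that the transfer time is the gap between alighting from one leg and boarding the next, and that distance is measured as great-circle distance; none of these details are stated in the text, and the field names and coordinates are invented.

```python
from datetime import datetime, timedelta
from math import radians, sin, cos, asin, sqrt

MAX_GAP = timedelta(minutes=30)  # temporal threshold from the text
MAX_DIST_M = 800.0               # spatial threshold from the text

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points."""
    p1, p2 = radians(lat1), radians(lat2)
    dphi, dlmb = p2 - p1, radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(p1) * cos(p2) * sin(dlmb / 2) ** 2
    return 2 * 6371000.0 * asin(sqrt(a))

def is_transfer(prev_leg, next_leg):
    """True if two consecutive legs of one card form an intermodal transfer."""
    gap = next_leg["board_time"] - prev_leg["alight_time"]
    dist = haversine_m(*prev_leg["alight_xy"], *next_leg["board_xy"])
    return (prev_leg["mode"] != next_leg["mode"]
            and timedelta(0) <= gap <= MAX_GAP
            and dist <= MAX_DIST_M)

# Hypothetical legs of one card on 13 May 2019
metro_leg = {"mode": "metro", "alight_time": datetime(2019, 5, 13, 8, 0),
             "alight_xy": (39.9000, 116.4000)}
bus_ok = {"mode": "bus", "board_time": datetime(2019, 5, 13, 8, 12),
          "board_xy": (39.9030, 116.4000)}    # about 330 m away, 12 min later
bus_late = {"mode": "bus", "board_time": datetime(2019, 5, 13, 8, 45),
            "board_xy": (39.9030, 116.4000)}  # 45 min later
bus_far = {"mode": "bus", "board_time": datetime(2019, 5, 13, 8, 12),
           "board_xy": (39.9100, 116.4000)}   # about 1.1 km away
```

Only the first bus leg satisfies both thresholds; the other two fail on time and on distance, respectively. The same function covers the bus-to-metro direction when the bus leg comes first.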

In this paper, we use the POI data, road network data, population data and socioeconomic data from 2019. The data are aggregated around each metro/bus station within a 1 km buffer as the independent variables. To avoid buffer overlap and ensure the uniqueness of spatial units, the Thiessen polygon method is further employed to clip and merge the buffers, generating a station catchment area (SCA).
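Membership in a station catchment area can be tested without constructing polygons explicitly: a point lies in a station's Thiessen cell exactly when that station is its nearest one, and it lies in the clipped buffer when the distance is at most 1 km. The sketch below applies this rule to point data. It assumes projected planar coordinates in metres, and the station layout, POI locations and function name are hypothetical.

```python
from collections import Counter
from math import hypot

def assign_to_sca(point, stations, radius_m=1000.0):
    """Return the id of the nearest station if it is within the buffer,
    otherwise None (the point belongs to no catchment area)."""
    def dist(sid):
        sx, sy = stations[sid]
        return hypot(point[0] - sx, point[1] - sy)
    nearest = min(stations, key=dist)
    return nearest if dist(nearest) <= radius_m else None

# Two hypothetical stations 1.5 km apart, so their 1 km buffers overlap
stations = {"A": (0.0, 0.0), "B": (1500.0, 0.0)}
pois = [(600.0, 0.0), (800.0, 0.0), (750.0, 2000.0)]
poi_count = Counter(assign_to_sca(p, stations) for p in pois)
```

The second POI falls inside both buffers but is counted only for station B, its nearest station, which is the uniqueness that the Thiessen clipping is meant to guarantee.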

4.2. Variables

(1) Dependent variables

The dependent variables are the intermodal transfer ridership, including metro-to-bus ridership and bus-to-metro ridership.

(2) Independent variables

To characterize the built environment surrounding public transport stations, this study constructs a system of built environment independent variables based on the SCA spatial analysis units. Variable selection follows the widely adopted “5D” framework [43] in built environment and travel behavior research and is further extended to account for the public transport transfer context, so as to comprehensively reflect the spatial conditions underlying transfer activities. In total, 26 built environment variables were selected, encompassing population and economic density, land-use diversity, transport facility provision, locational accessibility, and bus network structural characteristics. Specifically, the independent variables include population density, GDP, counts of various POI categories and their corresponding entropy indices, number of bus and metro stations and routes, road density, distance to the city center (DCC), as well as degree centrality (DC) and closeness centrality (CC) of bus stops within the local bus network. Taking the metro-to-bus direction on a weekday (Monday) as an example, descriptive statistics of the built environment variables are summarized in Table 1. For a small number of SCA units with missing attribute information, spatial interpolation was applied to ensure spatial continuity and data completeness, thereby avoiding sample size reduction or spatial distribution bias caused by missing values.
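Two of the listed variables can be made concrete with a short sketch. The text does not give formulas, so both definitions below are assumptions: the entropy index is taken as Shannon entropy of category shares normalized by the logarithm of the number of categories, and degree centrality as the number of neighbours divided by the number of other nodes in an undirected network. The counts and the toy network are hypothetical.

```python
from math import log

def entropy_index(counts):
    """Normalized Shannon entropy of category counts, in [0, 1] (assumed form)."""
    total, k = sum(counts), len(counts)
    if total == 0 or k < 2:
        return 0.0
    return -sum((c / total) * log(c / total) for c in counts if c > 0) / log(k)

def degree_centrality(edges):
    """Normalized degree centrality for an undirected edge list (assumed form).
    Each edge is expected to appear once."""
    nodes = {u for e in edges for u in e}
    deg = {u: 0 for u in nodes}
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return {u: d / (len(nodes) - 1) for u, d in deg.items()}

# Hypothetical POI counts in four categories for two catchment areas
mixed = entropy_index([10, 10, 10, 10])   # evenly mixed land use
single = entropy_index([40, 0, 0, 0])     # single-use area

# Hypothetical bus network: stop A is linked to stops B, C and D
dc = degree_centrality([("A", "B"), ("A", "C"), ("A", "D")])
```

An evenly mixed area reaches the maximum index of 1 and a single-use area gives 0; in the toy network, the hub stop has centrality 1 and each peripheral stop 1/3.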

Before model estimation, all built environment independent variables were subjected to systematic preprocessing diagnostics, with the corresponding statistical results summarized in Table 2. The results of the Moran’s I tests indicate that most independent variables exhibit significant spatial autocorrelation at the SCA scale, whereas closeness centrality (CC) shows relatively weak spatial autocorrelation, reflecting its network-structural property. Variance Inflation Factor (VIF) values are all below 10, suggesting that no severe multicollinearity is present among the independent variables.
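For readers unfamiliar with the spatial autocorrelation diagnostic, global Moran's I can be computed directly from its standard definition. The sketch below assumes a binary contiguity weight matrix, which is an illustrative choice since the weighting scheme used in the study is not stated, and the four-unit example is hypothetical.

```python
def morans_i(x, w):
    """Global Moran's I for values x and spatial weight matrix w (w[i][i] = 0)."""
    n = len(x)
    x_bar = sum(x) / n
    z = [v - x_bar for v in x]
    s0 = sum(sum(row) for row in w)  # sum of all weights
    num = sum(w[i][j] * z[i] * z[j] for i in range(n) for j in range(n))
    den = sum(v * v for v in z)
    return (n / s0) * num / den

# Four hypothetical units in a row; neighbours share a border
w = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
smooth = morans_i([1.0, 2.0, 3.0, 4.0], w)       # similar values cluster
alternating = morans_i([1.0, 4.0, 1.0, 4.0], w)  # neighbours are dissimilar
```

The smoothly increasing pattern yields a positive value (1/3 here), while the alternating pattern yields -1, matching the usual interpretation of positive and negative spatial autocorrelation.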

Pearson correlation analysis in Figure 3 further reveals the linear structural relationships among variables. The lower triangular heatmap shows that most pairwise correlation coefficients fall within low to moderate levels. Only certain functional facility indicators (e.g., commercial, life service, and restaurant facilities) demonstrate moderate positive correlations, reflecting spatial synergy among urban functional elements without forming highly overlapping structures, which is consistent with the VIF results. The correlation network graph visually illustrates the linear relationships between explanatory variables and the target variable (transfer volume). Red arcs denote positive correlations, blue arcs indicate negative correlations, line thickness represents the strength of correlation, and line style distinguishes statistical significance. Transfer time exhibits the strongest negative association with transfer demand, as indicated by the thickest blue arc, whereas metro station count demonstrates the strongest positive association, represented by the thickest red arc. Both relationships are connected by solid lines, indicating that the corresponding correlations are statistically significant. Overall, most built environment variables exhibit relatively weak correlations with transfer volume, with both positive and negative directions observed. This suggests that transfer activity is not driven by a single factor but results from the combined effects of multiple built environment dimensions. The weak linear characteristics further imply that traditional linear models may be insufficient to fully capture the complex mechanisms underlying these relationships, thereby providing methodological justification for the subsequent application of nonlinear ensemble models and SHAP-based interpretability analysis.

Although several built-environment variables exhibit moderate correlations, no pairwise correlation coefficient exceeds 0.8 and all VIF values remain below 5, indicating the absence of severe multicollinearity. Nevertheless, correlated variables may share explanatory contributions in SHAP analysis. Therefore, the SHAP importance of a variable should be interpreted as its relative contribution in the presence of other correlated variables rather than as a completely independent effect.

5. Results

5.1. Spatiotemporal Characteristics of the Intermodal Ridership

To characterize the temporal patterns of the metro–bus system, Figure 4 shows the proportions of different trip types across time periods of the day, including bus-only trips, metro-only trips, and transfer trips between metro and bus. The transfer trips are further divided into two directional categories, namely metro-to-bus and bus-to-metro, to better illustrate the spatiotemporal characteristics of intermodal transfers. Bus-to-metro trips outnumber metro-to-bus trips in the morning peak hours, while the pattern reverses in the evening peak hours. Moreover, in peak hours, the proportion of intermodal trips is larger on weekdays than on weekends. Conversely, in off-peak hours, it is smaller on weekdays than on weekends.

Figure 5 shows the spatial characteristics of the intermodal ridership, including metro-to-bus and bus-to-metro on weekdays and weekends, respectively. In Figure 5a,b, there are 1726 bus stations and 315 metro stations that have intermodal trips on weekdays. The metro stations with the highest intermodal ridership are terminals of metro routes, big transfer stations and stations surrounded by many bus stations. Similarly, the bus stations with large intermodal ridership are located around the metro stations with large intermodal ridership. Statistical results show that metro and bus stations with high transfer volumes account for only a small proportion of all stations. Take Figure 5a for instance: the proportion of bus stations with an intermodal ridership larger than 100 is only 24.62%.

5.2. Model Performance

Based on the aforementioned stacking ensemble framework, predictive models were constructed separately for the two transfer directions: metro-to-bus and bus-to-metro. The two directions differ substantially in sample size and distributional characteristics. Specifically, the metro-to-bus direction contains 1713 SCA units, whereas the bus-to-metro direction includes 315 units. In both cases, the validated set of 26 independent variables was used for prediction. Given the pronounced right-skewness and high kurtosis of transfer volumes, winsorization was applied to mitigate the influence of extreme observations. Sensitivity analysis (see Table A1) indicated that the 10% threshold provided the best balance between predictive performance and preprocessing consistency across both transfer directions. Therefore, a 10% winsorization was applied uniformly to the dependent variable. During model training, the original dataset was divided into training, validation, and testing subsets in a 60–20–20% ratio, ensuring a consistent mapping from the original feature space to the meta-feature space. To account for differences in sample size, distinct cross-validation schemes were adopted in the meta-feature construction stage: 5-fold cross-validation for the metro-to-bus direction and 10-fold cross-validation for the bus-to-metro direction, thereby enhancing the stability of meta-feature generation under limited sample conditions.
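The preprocessing steps in this paragraph can be sketched as follows. The sketch assumes that "10% winsorization" means clipping the dependent variable at its 10th and 90th percentiles (a two-sided reading that the text does not spell out) and that the 60–20–20 split is a simple random partition; the function names and the toy transfer volumes are hypothetical.

```python
import random

def percentile(values, q):
    """Linear-interpolation percentile for q in [0, 1]."""
    s = sorted(values)
    pos = q * (len(s) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (pos - lo) * (s[hi] - s[lo])

def winsorize(values, level=0.10):
    """Clip values to the [level, 1 - level] percentile range (assumed two-sided)."""
    lower, upper = percentile(values, level), percentile(values, 1 - level)
    return [min(max(v, lower), upper) for v in values]

def split_60_20_20(n, seed=0):
    """Shuffle indices 0..n-1 and cut them into train/validation/test parts."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_train = (n * 60) // 100
    n_val = (n * 20) // 100
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]

# Toy right-skewed transfer volumes (synthetic, for illustration only).
volumes = [3, 5, 8, 12, 20, 35, 60, 110, 240, 900, 4000]
clipped = winsorize(volumes, level=0.10)

# Sample sizes reported for the two directions.
train_m2b, val_m2b, test_m2b = split_60_20_20(1713)
train_b2m, val_b2m, test_b2m = split_60_20_20(315)
```

Under these assumptions, the bus-to-metro direction would leave only 63 units in the test subset, which helps explain why a larger number of folds is used when building meta-features for that direction.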

The resulting model configurations are presented in Table 3. Although the selected base learner combinations differ between the two directions, both stacking models are dominated by tree-based learners, complemented by non-tree models. While non-tree models exhibit relatively lower predictive accuracy when used individually, their differences from tree models in functional form and error structure introduce essential model diversity into the stacking architecture. Empirical results indicate that excluding these models leads to varying degrees of performance deterioration, underscoring their critical complementary role in improving ensemble generalization. For both directions, RidgeCV was ultimately selected as the meta-learner. Moreover, despite the relatively smaller sample size of the bus-to-metro dataset (315 SCAs), the stacking framework combines 10-fold out-of-fold prediction generation with independent test-set evaluation, which enhances model robustness and reduces the risk of overfitting.
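The mechanism of out-of-fold meta-feature generation followed by a ridge meta-learner can be illustrated with a deliberately small sketch. It is not the configuration in Table 3: the two base learners (a least-squares line and a k-nearest-neighbour average on a single feature) are toy stand-ins for the tree-based and non-tree-based learners, the ridge penalty is fixed rather than selected by cross-validation as RidgeCV would do, and all names and data are hypothetical.

```python
import random

def fit_linear(xs, ys):
    """Base learner 1: least-squares line; returns a predict function."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx else 0.0
    return lambda x: my + slope * (x - mx)

def fit_knn(xs, ys, k=3):
    """Base learner 2: average of the k nearest training targets."""
    pairs = list(zip(xs, ys))
    def predict(x):
        nearest = sorted(pairs, key=lambda p: abs(p[0] - x))[:k]
        return sum(y for _, y in nearest) / len(nearest)
    return predict

def out_of_fold(fit, xs, ys, n_folds, seed=0):
    """Predict each sample with a model trained on the other folds only."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    oof = [0.0] * len(xs)
    for f in range(n_folds):
        held = set(idx[f::n_folds])
        model = fit([xs[i] for i in idx if i not in held],
                    [ys[i] for i in idx if i not in held])
        for i in held:
            oof[i] = model(xs[i])
    return oof

def fit_ridge_meta(z1, z2, ys, alpha=1.0):
    """Meta-learner: ridge regression on two centred meta-features."""
    n = len(ys)
    m1, m2, my = sum(z1) / n, sum(z2) / n, sum(ys) / n
    a = [u - m1 for u in z1]
    b = [v - m2 for v in z2]
    c = [y - my for y in ys]
    s11 = sum(u * u for u in a) + alpha
    s22 = sum(v * v for v in b) + alpha
    s12 = sum(u * v for u, v in zip(a, b))
    t1 = sum(u * y for u, y in zip(a, c))
    t2 = sum(v * y for v, y in zip(b, c))
    det = s11 * s22 - s12 * s12
    w1 = (t1 * s22 - t2 * s12) / det
    w2 = (t2 * s11 - t1 * s12) / det
    return w1, w2, my - w1 * m1 - w2 * m2

# Synthetic data: a linear trend plus a step that the line cannot capture.
rng = random.Random(1)
xs = [rng.uniform(0, 10) for _ in range(60)]
ys = [2.0 * x + 5.0 * (x > 6) + rng.gauss(0, 0.5) for x in xs]

z_lin = out_of_fold(fit_linear, xs, ys, n_folds=5)   # meta-feature 1
z_knn = out_of_fold(fit_knn, xs, ys, n_folds=5)      # meta-feature 2
w_lin, w_knn, bias = fit_ridge_meta(z_lin, z_knn, ys)

# Final base learners are refit on all training data before stacking.
lin_model, knn_model = fit_linear(xs, ys), fit_knn(xs, ys)

def stacked_predict(x):
    return bias + w_lin * lin_model(x) + w_knn * knn_model(x)
```

Because each meta-feature value is produced by a model that never saw the corresponding sample, the meta-learner is fitted on predictions that resemble out-of-sample behaviour, which is the property the 5-fold and 10-fold schemes are meant to secure.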

Table 4 presents a comprehensive comparison of predictive performance across different modeling strategies for the two transfer directions. The comparison includes traditional statistical models, individual base learners, a simple weighted ensemble model, and the stacking ensemble model. Across both transfer directions, the stacking ensemble consistently outperforms the other approaches in terms of R², MAE, and RMSE, indicating that secondary learning through a meta-learner effectively enhances model generalization by integrating the predictions of base learners. The results further reveal a clear disparity in overall prediction difficulty between the two transfer directions. Predictive performance in the bus-to-metro direction is generally lower than that in the metro-to-bus direction, with particularly pronounced differences observed in MAE and RMSE. This suggests that transfer demand in the bus-to-metro direction exhibits greater variability and stronger error accumulation effects, making it more challenging to predict accurately.
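For reference, the three evaluation metrics named above follow their standard definitions, sketched below. The function name and the toy values are hypothetical and are not taken from Table 4.

```python
import math

def regression_metrics(y_true, y_pred):
    """Return R², MAE and RMSE for paired observations and predictions."""
    n = len(y_true)
    mean_y = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "mae": sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n,
        "rmse": math.sqrt(ss_res / n),
    }

# Toy example: three exact predictions and one error of 2 units.
metrics = regression_metrics([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 6.0])
```

Because RMSE squares the residuals, the single large error in the toy example raises RMSE (1.0) more than MAE (0.5), which is why a gap between the two metrics is often read as a sign of a few poorly predicted units.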

5.3. Model Interpretability

To further explore the feature contribution structure and underlying mechanisms behind the predictions of the stacking ensemble model, this study employs the SHAP method for systematic interpretability analysis. The analysis is conducted from three perspectives—global feature importance, nonlinear effects of individual features, and interactions between features—and focuses on a comparative examination of the metro-to-bus and bus-to-metro directions to reveal heterogeneity in built environment effects across different transfer contexts.

5.3.1. Global Feature Importance Analysis

Figure 6 and Figure 7 present the global feature importance results for the metro-to-bus and bus-to-metro directions, respectively. The left panels show SHAP summary (beeswarm) plots based on the stacking model. In these plots, features on the vertical axis are ranked in descending order according to their mean absolute stacking SHAP values, reflecting their relative importance in overall prediction. The horizontal axis represents the magnitude of stacking SHAP values, with positive and negative values indicating promotive and inhibitory effects on transfer demand, respectively. Each point corresponds to an SCA spatial unit, and point colors range from cool to warm tones to indicate feature values from low to high, thereby illustrating the relationship between feature magnitude, contribution direction, and impact strength. The right panels display polar (rose) charts that further quantify global feature importance. The length of each sector represents the relative proportion of the mean absolute stacking SHAP value of a feature among all variables, indicating its contribution to the overall explanatory power of the model. Sector colors reflect the average direction of stacking SHAP values, distinguishing the overall promotive or suppressive effect of each feature on transfer demand at the global level.

From a global perspective, transfer time emerges as the most influential factor in both transfer directions, although the degree of concentration differs substantially. In the metro-to-bus direction (Figure 6), transfer time accounts for 27.9% of total importance, markedly exceeding that of bus station count (15.4%) and DC (14.1%). Together, these three variables explain more than half of the total importance, indicating that transfer demand in this direction is primarily driven by transfer duration and bus stop provision, with a relatively concentrated set of dominant factors. In the bus-to-metro direction (Figure 7), feature importance is more dispersed. Transfer time is still the top contributor but with a markedly lower share of 14.9%, followed by bus station count (10.3%) and shopping count (8.6%). The top three features collectively account for only 33.8% of total importance. Notably, shopping count and medical care count rank relatively high and exhibit strong promotive effects, suggesting that commercial and medical land-use opportunities around stations play a more prominent role in encouraging passengers to transfer from bus to metro. The difference may be attributed to the distinct roles of the two transfer directions. Metro-to-bus transfers are typically associated with the last-mile stage of a trip, making transfer efficiency and bus service accessibility the dominant concerns. In contrast, bus-to-metro transfers are more closely linked to accessing major activity centers, where commercial and medical facilities generate substantial travel demand and thus promote transfer to metro services.
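The importance shares quoted above are relative proportions of mean absolute SHAP values. A minimal sketch of that calculation is given below, together with a check of the reported top-three totals. The function name and the toy SHAP matrix are hypothetical; only the percentage values are taken from the text.

```python
def importance_shares(shap_rows, names):
    """Share (%) of each feature in the total mean absolute SHAP value."""
    n = len(shap_rows)
    mean_abs = [sum(abs(row[j]) for row in shap_rows) / n for j in range(len(names))]
    total = sum(mean_abs)
    shares = {name: 100.0 * m / total for name, m in zip(names, mean_abs)}
    return dict(sorted(shares.items(), key=lambda kv: kv[1], reverse=True))

# Toy SHAP matrix: rows are spatial units, columns are features.
toy_rows = [[2.0, -1.0, 0.5], [-4.0, 1.0, 0.5], [3.0, -1.0, -0.5]]
toy_shares = importance_shares(toy_rows, ["transfer_time", "bus_stations", "shopping"])

# Top-three shares reported in the text for each direction.
metro_to_bus_top3 = round(27.9 + 15.4 + 14.1, 1)   # transfer time, bus station count, DC
bus_to_metro_top3 = round(14.9 + 10.3 + 8.6, 1)    # transfer time, bus station count, shopping count
```

The reported values sum to 57.4% and 33.8%, consistent with the statements that the top three factors explain more than half of total importance in the metro-to-bus direction and about a third in the bus-to-metro direction.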

5.3.2. Nonlinear Effects of Individual Features

Building on the global feature importance analysis, this study further employs SHAP single-feature dependence plots based on the stacking model to characterize the nonlinear effects of key built environment variables on transfer demand. Figure 8 and Figure 9 present the SHAP dependence relationships of selected representative features for the two transfer directions. In the dependence plots, the horizontal axis denotes the original value of a feature, while the vertical axis represents the corresponding stacking SHAP value, indicating the marginal contribution of the feature to the model prediction at a given value. Each point corresponds to an SCA spatial unit, and the smoothed curve is fitted using locally weighted regression (LOWESS) to capture the overall trend of SHAP contributions as feature values vary. The SHAP = 0 reference line is used to distinguish positive and negative effects: when the smoothed curve lies above this baseline, the feature exerts a promotive effect on transfer demand; otherwise, it exhibits an inhibitory effect.
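The reading of a dependence plot described above can be sketched in two steps: smooth the (feature value, SHAP value) points with a locally weighted linear fit, then locate where the smoothed curve crosses the SHAP = 0 line. The sketch below is a simplified LOWESS (tricube weights, no robustness iterations) and is not necessarily the implementation used for Figures 8 and 9; the function names and the toy data are hypothetical.

```python
def lowess(xs, ys, frac=0.5):
    """Local linear fit with tricube weights, evaluated at each x."""
    n = len(xs)
    k = max(2, int(round(frac * n)))          # number of neighbours in each window
    fitted = []
    for x0 in xs:
        dists = sorted(abs(x - x0) for x in xs)
        h = dists[k - 1] or 1e-12             # window half-width
        w = [max(0.0, 1.0 - (abs(x - x0) / h) ** 3) ** 3 for x in xs]
        sw = sum(w)
        mx = sum(wi * x for wi, x in zip(w, xs)) / sw
        my = sum(wi * y for wi, y in zip(w, ys)) / sw
        sxx = sum(wi * (x - mx) ** 2 for wi, x in zip(w, xs))
        sxy = sum(wi * (x - mx) * (y - my) for wi, x, y in zip(w, xs, ys))
        slope = sxy / sxx if sxx > 1e-12 else 0.0
        fitted.append(my + slope * (x0 - mx))
    return fitted

def sign_changes(xs, smooth):
    """Feature values where the smoothed curve crosses SHAP = 0."""
    pts = sorted(zip(xs, smooth))
    crossings = []
    for (x1, y1), (x2, y2) in zip(pts, pts[1:]):
        if y1 * y2 < 0:                       # sign flips between neighbours
            crossings.append(x1 + (x2 - x1) * (-y1) / (y2 - y1))
    return crossings

# Toy dependence data: SHAP value rises linearly and turns positive at x = 2.5.
toy_x = [float(i) for i in range(10)]
toy_shap = [2.0 * x - 5.0 for x in toy_x]
toy_smooth = lowess(toy_x, toy_shap)
toy_thresholds = sign_changes(toy_x, toy_smooth)
```

In this toy case the feature is inhibitory below 2.5 and promotive above it, which mirrors how the curve's position relative to the baseline is interpreted in the dependence plots.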

The SHAP dependence results reveal pronounced differences in the nonlinear response structures of key variables between the two transfer directions. Statistical uncertainty for the identified thresholds was quantified using bootstrap confidence intervals (95% CIs), with the complete results reported in Table 5. The slight discrepancy between the thresholds displayed in the figure and the point estimates in Table 5 arises because the point estimates are computed using the full dataset with higher precision. For transfer time, the metro-to-bus direction exhibits a stage-wise pattern, with critical turning points around 2.80 min and 10.5 min. In contrast, the bus-to-metro direction shows a single positive-to-negative transition at approximately 6.18 min, beyond which the contribution rapidly becomes negative, indicating a higher sensitivity to time delays in this direction. Bus station count displays a threshold-type increasing relationship in both directions, but with markedly different effective thresholds: about 16.84 for the metro-to-bus direction and as high as 101.11 for the bus-to-metro direction, suggesting a stronger dependence on bus network density in the latter. DCC also demonstrates a nonlinear gradient effect, with the peak contribution occurring at approximately 5.75 km in the metro-to-bus direction and a turning point around 10.61 km in the bus-to-metro direction, where negative fluctuations in peripheral areas are more pronounced. Regarding functional facilities, restaurant density in the metro-to-bus direction forms a stable positive contribution beyond approximately 29.72, whereas in the bus-to-metro direction, commercial and medical facilities exhibit clear threshold jumps around 9.57 and 42.58, respectively. Overall, the bus-to-metro direction is characterized by higher thresholds, larger amplitudes, and stronger negative penalties, reflecting greater sensitivity of transfer demand to changes in temporal and spatial conditions; by contrast, the metro-to-bus direction shows relatively smoother and more concentrated influence mechanisms.
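A percentile bootstrap of the kind used to attach 95% CIs to thresholds can be sketched as follows. The threshold estimator here (the zero crossing of a least-squares line through the feature and SHAP values) is a simple stand-in chosen for brevity, since the exact estimator behind Table 5 is not described in this section. The data are synthetic and all names are hypothetical.

```python
import random

def zero_crossing(xs, ys):
    """Feature value at which a least-squares line through (x, SHAP) equals zero."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return mx - my / slope

def bootstrap_ci(xs, ys, estimator, n_boot=1000, level=0.95, seed=0):
    """Percentile bootstrap CI: resample units with replacement and re-estimate."""
    rng = random.Random(seed)
    n = len(xs)
    estimates = []
    for _ in range(n_boot):
        sample = [rng.randrange(n) for _ in range(n)]
        try:
            estimates.append(estimator([xs[i] for i in sample],
                                       [ys[i] for i in sample]))
        except ZeroDivisionError:
            continue                          # skip degenerate resamples
    estimates.sort()
    tail = (1.0 - level) / 2.0
    low = estimates[int(tail * len(estimates))]
    high = estimates[min(len(estimates) - 1, int((1.0 - tail) * len(estimates)))]
    return low, high

# Synthetic dependence data whose contribution turns negative near 6 minutes.
rng = random.Random(42)
times = [rng.uniform(0.0, 12.0) for _ in range(80)]
shap_vals = [3.0 - 0.5 * t + rng.gauss(0.0, 0.3) for t in times]

point = zero_crossing(times, shap_vals)
ci_low, ci_high = bootstrap_ci(times, shap_vals, zero_crossing)
```

Resampling spatial units rather than individual SHAP values keeps each feature value paired with its own contribution, so the width of the interval reflects how much the threshold would move under a different sample of units.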

These findings suggest that the impacts of built environment and transfer-related variables on intermodal demand are highly nonlinear and direction-dependent. The bus-to-metro direction generally exhibits higher thresholds and stronger negative marginal effects, indicating that transfer demand in this direction is more sensitive to changes in temporal and spatial conditions. This may be because passengers are more sensitive to access costs and service conditions when deciding whether to enter the metro system. By contrast, the metro-to-bus direction shows relatively smoother nonlinear response patterns and lower effective thresholds, as these transfers are often associated with the final stage of a trip and are therefore subject to more stable travel needs. In addition, the observed threshold and saturation effects suggest that improvements in built environment conditions do not necessarily translate into proportional increases in transfer demand; instead, a minimum level of accessibility or facility provision is often required before a positive effect emerges. These findings further highlight the necessity of nonlinear modeling approaches for capturing complex metro–bus transfer behaviors.

5.3.3. Feature Interaction Analysis

To further reveal the synergistic or antagonistic relationships among built environment factors, this study constructs circular feature interaction network diagrams based on stacking-model SHAP interaction values (see Figure 10 and Figure 11). The interaction values are obtained by aggregating the SHAP interaction matrices of tree-based base learners using the weights learned by the meta-learner, thereby reflecting the dominant interaction contributions between feature pairs captured by the tree-based components of the stacking framework. In the diagrams, nodes represent explanatory variables, and node size corresponds to the mean absolute value of the stacking SHAP main effect, indicating the overall importance of an individual feature in prediction. Node color denotes the average direction of the stacking SHAP value, with a transition from cool to warm colors representing overall negative to positive effects. Edges between nodes indicate the strength of pairwise interactions, where edge width reflects the mean absolute SHAP interaction value and thus the significance of the interaction; edge color represents the average direction of the interaction SHAP value, distinguishing synergistic (positive) from antagonistic (negative) relationships.
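The aggregation step described above can be sketched as a weighted sum of per-learner SHAP interaction tensors, followed by per-pair summaries for edge width and edge color. Weighting by the meta-learner coefficients is consistent with the additivity of Shapley values when the meta-learner is linear. The sketch assumes that each learner's interaction matrix is symmetric, with the pairwise effect split equally between the two off-diagonal entries, so the two entries are added to recover the full pair effect. The tensors, weights, and function names are hypothetical.

```python
def stack_interactions(per_learner, weights):
    """Weighted sum over learners of interaction values per_learner[m][s][i][j]."""
    n_samples = len(per_learner[0])
    k = len(per_learner[0][0])
    return [[[sum(w * per_learner[m][s][i][j] for m, w in enumerate(weights))
              for j in range(k)]
             for i in range(k)]
            for s in range(n_samples)]

def edge_table(combined, names):
    """Per feature pair: (name_i, name_j, mean |interaction|, mean signed interaction)."""
    n = len(combined)
    edges = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            # Add both off-diagonal entries to recover the full pair effect.
            vals = [combined[s][i][j] + combined[s][j][i] for s in range(n)]
            edges.append((names[i], names[j],
                          sum(abs(v) for v in vals) / n,   # edge width
                          sum(vals) / n))                  # edge color (direction)
    return sorted(edges, key=lambda e: e[2], reverse=True)

# Toy tensors: two tree-based learners, two spatial units, two features.
learner_a = [[[1.0, 0.5], [0.5, 2.0]], [[1.0, -0.5], [-0.5, 2.0]]]
learner_b = [[[0.0, 1.0], [1.0, 0.0]], [[0.0, 1.0], [1.0, 0.0]]]
meta_weights = [0.5, 0.25]   # hypothetical meta-learner coefficients

combined = stack_interactions([learner_a, learner_b], meta_weights)
edges = edge_table(combined, ["transfer_time", "bus_station_count"])
```

Taking the absolute value after the weighted sum, rather than averaging per-learner magnitudes, allows opposing interactions from different learners to cancel, so the edge width reflects the interaction as seen by the stacked model.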

Distinct interaction patterns are observed between the two transfer directions. In the metro-to-bus direction (Figure 10), prominent interactions are mainly concentrated between transfer time and station-related variables. For example, transfer time and metro station count exhibit a positive interaction, suggesting that in areas with a higher number of metro stations, longer transfer times are more likely to be associated with continued travel via bus. In contrast, the interactions of bus station count with transfer time and with DC are predominantly negative, indicating that in areas where bus stops are already dense, the combined effects of time cost and station abundance do not further increase transfer volume and may instead disperse demand due to a wider range of travel options. In the bus-to-metro direction (Figure 11), interaction patterns differ in both composition and magnitude. While interactions involving transfer time and station-related variables remain salient, the overall interaction effects tend to be stronger and more uniformly signed, reflecting a higher degree of sensitivity of transfer demand to the combined influences of temporal conditions and surrounding built environment features in this direction. These interaction patterns indicate that transfer demand is jointly shaped by travel costs and the surrounding service environment rather than by individual factors in isolation. The negative interaction between transfer time and bus station count suggests a diminishing marginal effect of increasing bus stop provision when transfer costs are already high. Conversely, the positive interaction between transfer time and metro station count implies that well-connected transit environments may partially offset the adverse effects of longer transfer times by providing greater accessibility and travel opportunities.

6. Discussion

6.1. Analysis of Results

The spatiotemporal patterns identified in this study generally support existing evidence on commuting-oriented metro–bus integration. Similar to previous studies, transfer demand exhibits clear directional asymmetry between morning and evening peak periods, reflecting the spatial separation between residential and employment locations in large metropolitan areas. Spatially, transfer activities are highly concentrated around terminal stations and major transfer hubs, confirming the hierarchical structure of the metro–bus network reported in previous studies [10]. Compared with earlier research, the present study further highlights a pronounced concentration effect, whereby a relatively small proportion of stations accommodates the majority of transfer demand. This finding suggests that intermodal connectivity within urban transit systems is strongly dependent on a limited number of critical nodes, implying that targeted improvements at these locations may generate substantial network-wide benefits.

Unlike most existing studies that examine transfer demand from a single directional perspective, this study separately models metro-to-bus and bus-to-metro transfers, enabling a direct comparison of their behavioral characteristics. Methodologically, previous studies have primarily relied on individual machine-learning algorithms or hybrid frameworks such as the GTWR-RF model to capture nonlinear and spatiotemporal effects of the built environment on transfer demand [16]. In contrast, this study develops a stacking ensemble framework that integrates multiple learners with different modeling mechanisms. The superior performance of the stacking model across both transfer directions demonstrates that different algorithms capture complementary aspects of transfer behavior. Tree-based learners effectively identify nonlinear relationships and threshold effects, whereas non-tree learners contribute additional diversity through distinct functional structures and error patterns. Compared with Lei et al. [42], who developed separate XGBoost models for different time periods and spatial zones, and Shi et al. [16], who modeled different metro-station clusters independently, this study adopts a global perspective by analyzing transfer demand over an entire weekday. Such an approach inevitably introduces greater behavioral heterogeneity, which may partly explain the relatively moderate predictive performance observed.

The comparison between transfer directions further highlights substantial differences in prediction difficulty. Consistent with previous studies, predictive performance in the metro-to-bus direction is noticeably higher than that in the bus-to-metro direction. Despite the adoption of a more sophisticated stacking architecture and cross-validation strategy, the bus-to-metro model remains more difficult to predict accurately. This disparity can be attributed to differences in both data structure and travel behavior. From a data perspective, bus-to-metro demand is aggregated at metro stations, with each station integrating transfer flows from multiple surrounding bus stops, resulting in greater spatial and behavioral heterogeneity. From a methodological perspective, the bus-to-metro dataset contains substantially fewer samples than the metro-to-bus dataset. The limited sample size may restrict the ability of the learning algorithms to learn stable mapping relationships within the meta-feature space, thereby increasing prediction uncertainty. From a behavioral perspective, bus-to-metro trips are associated with diverse activity purposes, including commuting, shopping, healthcare, and leisure, whereas metro-to-bus trips are more frequently linked to last-mile travel and therefore exhibit more homogeneous behavioral patterns. These characteristics are consistent with the subsequent SHAP results, which reveal a more dispersed importance structure and stronger nonlinear effects in the bus-to-metro direction.

The SHAP analysis confirms that transfer time is the dominant determinant of intermodal transfer demand. Similar findings have been reported by Zeng et al. [10], who identified transfer time and bus-service attributes as key drivers of transfer behavior during peak periods. However, this study further demonstrates that the influence of transfer time is highly nonlinear and exhibits substantial directional heterogeneity, suggesting that passengers evaluate transfer costs differently depending on their travel stage and trip purpose. Regarding threshold effects, Zeng et al. [10] reported that metro-to-bus transfer demand stabilizes when transfer time exceeds approximately 12.5 min during peak periods. By contrast, the metro-to-bus direction in this study exhibits a dual-threshold pattern, indicating a more complex temporal response structure. Similarly, the identified threshold for bus-station provision differs considerably between transfer directions. The effective threshold is approximately 16.84 stations for metro-to-bus transfers but increases to 101.11 stations for bus-to-metro transfers.

The interaction analysis further extends current understanding of the joint effects of built-environment and transfer-related variables. Consistent with previous findings [10,42], interactions involving transfer time and transit-service attributes represent the most influential interaction mechanisms. However, the interaction structures identified in this study differ substantially between transfer directions. Metro-to-bus transfers are characterized by localized interactions primarily involving transfer time and station-related variables, whereas bus-to-metro transfers exhibit stronger and more consistent interaction patterns. This suggests that passengers entering the metro system are more sensitive to the combined effects of travel costs and surrounding built-environment conditions. Overall, these results indicate that transfer demand is shaped not only by individual factors but also by complex nonlinear and interactive mechanisms, highlighting the importance of adopting interpretable machine-learning approaches to capture such relationships.

6.2. Policy Implications

The findings provide several practical implications for improving metro–bus integration in Beijing.

First, reducing transfer time should be prioritized at major transfer hubs, as transfer time consistently emerges as the most influential determinant of intermodal demand. Stations such as Tiantongyuanbei, Dongzhimen, Jishuitan, and Liuliqiao East accommodate substantial transfer volumes and therefore represent priority locations for intervention.

Second, the pronounced directional differences identified in this study suggest that differentiated planning strategies are needed across urban areas. In central districts, where metro accessibility is already relatively high, planning efforts should focus on improving transfer convenience and reducing walking and waiting times. In suburban areas, where bus-to-metro transfers exhibit a stronger dependence on feeder-bus availability, expanding feeder routes and increasing bus-stop coverage around metro stations may be more effective for attracting intermodal passengers.

Third, the identified threshold effects indicate that infrastructure investments should be guided by critical service levels rather than continuous expansion. For example, increasing bus-stop provision beyond the effective threshold may generate diminishing returns, whereas improving service conditions in areas that remain below the threshold could produce substantially greater benefits.

Finally, the observed interaction effects imply that transfer demand is jointly influenced by transfer conditions and surrounding built-environment characteristics. Therefore, metro–bus integration should be coordinated with land-use planning. In areas with intensive commercial and medical facilities, improving feeder-bus accessibility may generate larger increases in transfer demand than equivalent investments in areas with weaker activity concentrations. Such targeted interventions could support a more efficient integration of transport services and urban development.

6.3. Limitations

First, the analysis is based on smart card data from only one weekday (13 May 2019) and one weekend day (18 May 2019). Although the dataset captures detailed transfer behavior, seasonal variations and long-term changes in travel patterns are not reflected. Future studies could employ multi-period datasets to examine the temporal stability of the identified relationships.

Second, the metro-to-bus and bus-to-metro datasets differ substantially in sample size. Although separate stacking models were developed for each direction and model evaluation incorporated cross-validation and independent test-set validation, differences in sample size may still affect model stability and the robustness of SHAP-based interpretations. Consequently, comparisons between the two transfer directions should be interpreted with appropriate caution.

Third, although the proposed framework is developed using data from Beijing, the identified threshold values and feature effects may be influenced by local urban form, transit-network structure, and population distribution. Additional validation in other cities is therefore needed to assess the transferability and generalizability of the findings.

7. Conclusions

The transfer behavior of passengers between metro and bus services is important for building a sustainable public transportation system. This study proposed a stacking machine learning framework, using diverse tree-based and non-tree-based learners, to reveal the nonlinear relationship between intermodal demand and the built environment. To enhance interpretability, the SHapley Additive exPlanations (SHAP) method was integrated into the framework. The findings of this study are as follows.

In terms of temporal distribution, metro-to-bus and bus-to-metro trips exhibit distinct peak-hour patterns. In the morning peak hours, bus-to-metro trips outnumber metro-to-bus trips, while the opposite holds in the evening peak hours. In terms of spatial distribution, the stations with large intermodal demand are terminals of metro routes and metro transfer stations. The stacking ensemble machine learning framework successfully fuses the outputs of the base learners to increase prediction accuracy. The stacking model outperforms the base learners, especially for bus-to-metro trips. The results indicate that transfer time is the dominant factor affecting intermodal trips, followed by bus station count; the third-ranked factor is degree centrality in the metro-to-bus direction and shopping POI count in the bus-to-metro direction. The interaction between transfer time and metro station count is positive, while that between transfer time and bus station count is negative. The latter implies that the adverse effect of longer transfer times becomes stronger in areas with a higher density of bus stops, as passengers may have more alternative travel options available.

This study could provide new insights for public transport system planning and management. Future studies will consider more travel modes, such as bike-sharing, taxis, and ride-hailing.

Author Contributions

Conceptualization, Hui Zhang; methodology, Hui Zhang; software, Ke Qu; validation, Ke Qu; formal analysis, Ke Qu; resources, Hui Zhang; data curation, Ke Qu; writing—original draft preparation, Ke Qu; writing—review and editing, Hui Zhang; visualization, Ke Qu; supervision, Hui Zhang. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Youth Innovation Team Science and technology support project in Colleges and Universities of Shandong Province (2021KJ058).

Data Availability Statement

The data are available from the corresponding author upon reasonable request.

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A

Table A1. Sensitivity analysis of different winsorization thresholds on stacking model performance.

Direction	Winsorization Level	R²
Metro-to-bus	5%	0.433
Metro-to-bus	10%	0.429
Metro-to-bus	15%	0.393
Bus-to-metro	5%	0.333
Bus-to-metro	10%	0.404
Bus-to-metro	15%	0.387
Average	5%	0.383
Average	10%	0.416
Average	15%	0.390

References

1. Wang, Z.J.; Chen, F.; Xu, T.K. Interchange between Metro and Other Modes: Access Distance and Catchment Area. J. Urban Plan. Dev. 2016, 142, 04016012.
2. Peng, B.Z.Z.; Wang, T.; Zhang, Y.; Li, C.Y.; Lu, C.X. Spatially Varying Effect Mechanism of Intermodal Connection on Metro Ridership: Evidence from a Polycentric Megacity with Multilevel Ring Roads. ISPRS Int. J. Geo-Inf. 2024, 13, 353.
3. Ma, X.W.; Tian, X.L.; Cui, H.J.; He, M.J.; Wang, J.B.; Cheng, L. What influences intermodal Choices: Metro-Centric, Bus-Centric, Hybrid? insights from Machine learning Approaches. Transp. Res. Part D Transp. Environ. 2024, 136, 104407.
4. Guo, Z.; Wilson, N.H.M. Assessing the cost of transfer inconvenience in public transport systems: A case study of the London Underground. Transp. Res. Part A Policy Pract. 2011, 45, 91–104.
5. Schakenbos, R.; La Paix, L.; Nijenstein, S.; Geurs, K.T. Valuation of a transfer in a multimodal public transport trip. Transp. Policy 2016, 46, 72–81.
6. Zhang, H.; Cui, Y.; Liu, Y.J.; Jia, J.M.; Shi, B.Y.; Yu, X.H. Exploring Travel Mobility in Integrated Usage of Dockless Bike-Sharing and the Metro Based on Multisource Data. ISPRS Int. J. Geo-Inf. 2024, 13, 108.
7. Yuan, H.X.; Qiu, Z.Q.; Xu, H.; Pan, R.B.; Yan, Y.S. Who is willing to switch to a less-crowded metro route via feeder bus connections? A case study in Chengdu, China. Travel Behav. Soc. 2026, 43, 101202.
8. Huang, Z.L.; Xu, L.H.; Lin, Y.J.; Wu, P.; Feng, B. Citywide Metro-to-Bus Transfer Behavior Identification Based on Combined Data from Smart Cards and GPS. Appl. Sci. 2019, 9, 3597.
9. Li, X.; Yan, Q.P.; Ma, Y.F.; Luo, C. Spatially Varying Impacts of Built Environment on Transfer Ridership of Metro and Bus Systems. Sustainability 2023, 15, 7891.
10. Zeng, L.H.; Shi, Y.J.; Zhang, Z. Exploring Nonlinear and Interactive Influences of Built Environment on Metro-To-Bus Transfer Behavior. Trans. GIS 2025, 29, e70107.
11. Cherry, T.; Townsend, C. Assessment of Potential Improvements to Metro-Bus Transfers in Bangkok, Thailand. Transp. Res. Rec. 2012, 2276, 116–122.
12. Huang, J.W.; Liu, X.T.; Zhao, P.X.; Zhang, J.W.; Kwan, M.P. Interactions between Bus, Metro, and Taxi Use before and after the Chinese Spring Festival. ISPRS Int. J. Geo-Inf. 2019, 8, 445.
13. Chen, Z.Y.; Huang, Z.F.; Yang, L.L.; Zheng, P.J. Evaluation of Transfer Efficiency between Subway and Bus Based on the Interval Number Ranking Method by Employing Probability Reliability: Taking Ningbo for Example. Discret. Dyn. Nat. Soc. 2021, 9125605.
14. Jin, H.; Gao, J.X.; Shen, Z.H.; Cai, M.; Zhu, X.; Wu, J.H. Dynamic Evaluation for Subway-Bus Transfer Quality Referring to Benefits, Convenience, and Reliability. Sustainability 2025, 17, 6684.
15. Gu, T.Q.; Zhang, K.H.; Xu, W.P.; Zhuang, C.T.; Jiang, Z.H.; Kim, I.; Chung, H. Free interchange for better transit? Assessing the multi-dimensional impacts on metro to bus interchange behavior—Insights from an explainable machine learning method. Travel Behav. Soc. 2025, 38, 100923.
16. Shi, Y.J.; Zeng, L.H. How do built environment characteristics influence metro-bus transfer patterns across metro station types in Shanghai? J. Transp. Geogr. 2025, 123, 104137.
17. Wang, S.X.; Zhao, L.Y.; Zhang, M.; Chen, R.Y.; Liang, S.C. Effects of Built Environment on Bus Trip Rates under Rail Transit Competition. J. Urban Plan. Dev. 2023, 149, 04022059.
18. Liu, D.; Rong, W.Y.; Zhang, J.; Ge, Y.E. Exploring the Nonlinear Effects of Built Environment on Bus-Transfer Ridership: Take Shanghai as an Example. Appl. Sci. 2022, 12, 5755.
19. Shen, H.P.; Weng, J.C.; Lin, P.F. Exploring the nuanced correlation between built environment and the integrated travel of dockless bike-sharing and metro at origin-route-destination level. Sustain. Cities Soc. 2025, 119, 106090.
20. Chen, Q.X.; Lv, B.; Hao, B.B.; Li, X.L. Impacts of the Feeder-Related Built Environment on Taxi-Metro Integrated Use in Lanzhou, China. J. Adv. Transp. 2023, 8251433.
21. Yang, M.; Zhao, J.Y.; Wang, W.; Liu, Z.Y.; Li, Z.B. Metro commuters’ satisfaction in multi-type access and egress transferring groups. Transp. Res. Part D Transp. Environ. 2015, 34, 179–194.
22. Cheng, Y.H.; Tseng, W.C. Exploring the effects of perceived values, free bus transfer, and penalties on intermodal metro-bus transfer users’ intention. Transp. Policy 2016, 47, 127–138.
23. Carrel, A.; Mishalani, R.G.; Sengupta, R.; Walker, J.L. In Pursuit of the Happy Transit Rider: Dissecting Satisfaction Using Daily Surveys and Tracking Data. J. Intell. Transp. Syst. 2016, 20, 345–362.
24. Devillaine, F.; Munizaga, M.; Trepanier, M. Detection of Activities of Public Transport Users by Analyzing Smart Card Data. Transp. Res. Rec. 2012, 2276, 48–55.
25. Zhao, D.; Wang, W.; Woodburn, A.; Ryerson, M.S. Isolating high-priority metro and feeder bus transfers using smart card data. Transportation 2017, 44, 1535–1554.
26. Seaborn, C.; Attanucci, J.; Wilson, N.H.M. Analyzing Multimodal Public Transport Journeys in London with Smart Card Fare Payment Data. Transp. Res. Rec. 2009, 2121, 55–62.
Liu, Y.; He, D.L.; Lei, J.Y.; He, M.W.; Shi, Z.B. Investigating the non-linear influence of the built environment on passengers’ travel distance within metro and bus networks using smart card data. Multimodal Transp. 2025, 4, 100188. [Google Scholar] [CrossRef]
Yadav, M.; Mepparambath, R.M.; Patil, G.R. An enhanced transit accessibility evaluation framework by integrating Public Transport Accessibility Levels (PTAL) and transit gap. J. Transp. Geogr. 2024, 121, 104013. [Google Scholar] [CrossRef]
Wu, P.; Xu, L.H.; Zhong, L.S.; Gao, K.; Qu, X.B.; Pei, M.Y. Revealing the determinants of the intermodal transfer ratio between metro and bus systems considering spatial variations. J. Transp. Geogr. 2022, 104, 103415. [Google Scholar] [CrossRef]
Yang, L.; Zhang, T.Y.; Wang, Y.Q.; Lian, Y.J.; Wang, Y.; Liu, Y.Y.; Zhao, H.; Yuan, Y.T.; Zhou, D.H. How far to allocate feeder transport to metro effectively? An empirical study in Xi’an, China. Transp. Policy 2025, 171, 867–881. [Google Scholar] [CrossRef]
Chen, E.H.; Stathopoulos, A.; Nie, Y. Transfer station choice in a multimodal transit system: An empirical study. Transp. Res. Part A Policy Pract. 2022, 165, 337–355. [Google Scholar] [CrossRef]
Shi, Y.J. Identifying the Spatiotemporal Metro-to-Bus Transfer Deserts in Shanghai, China. J. Urban Plan. Dev. 2023, 149, 04023013. [Google Scholar] [CrossRef]
Zhang, M.M.; Jiang, Q.L.; Tao, S.; Dai, T.Q.; Ma, S.J. The spatial structural patterns of commuting trips by bus and metro in Beijing, China: Complementary or competing? Travel Behav. Soc. 2025, 40, 101032. [Google Scholar] [CrossRef]
Zhou, Y.Y.; Zhang, M.Y.; Deng, S.S.; Hu, S.L.; Zheng, S.Y.; Chen, Y.Y. Node importance calculation in bus-metro composite network considering land use. Tunn. Undergr. Space Technol. 2025, 163, 106723. [Google Scholar] [CrossRef]
Du, Q.; Zong, X.Y.; Li, Y.; Guo, X.Q.; Ye, Z.N.; Li, S.S.; Bai, L.B. Resilience optimization of bus-metro double-layer network against extreme weather events. Transp. Res. Part D Transp. Environ. 2024, 135, 104378. [Google Scholar] [CrossRef]
Chen, E.H.; Ye, Z.R.; Wu, H. Nonlinear effects of built environment on intermodal transit trips considering spatial heterogeneity. Transp. Res. Part D Transp. Environ. 2021, 90, 102677. [Google Scholar] [CrossRef]
Wu, P.; Li, J.L.; Pian, Y.Z.; Li, X.C.; Huang, Z.L.; Xu, L.H.; Li, G.L.; Li, R.N. How Determinants Affect Transfer Ridership between Metro and Bus Systems: A Multivariate Generalized Poisson Regression Analysis Method. Sustainability 2022, 14, 9666. [Google Scholar] [CrossRef]
Li, Q.F.; Song, Z.M. Prediction of compressive strength of rice husk ash concrete based on stacking ensemble learning model. J. Clean. Prod. 2023, 382, 135279. [Google Scholar] [CrossRef]
Riis, C.; Antunes, F.; Bolic, T.; Gurtner, G.; Cook, A.; Azevedo, C.L.; Pereira, G.C. Explainable active learning metamodeling for simulations: Method and experiments for ATM performance assessment. Transp. Res. Part C Emerg. Technol. 2024, 166, 104788. [Google Scholar] [CrossRef]
Li, X.W.; Shi, L.X.; Shi, Y.; Tang, J.Q.; Zhao, P.J.; Wang, Y.T.; Chen, J. Exploring interactive and nonlinear effects of key factors on intercity travel mode choice using XGBoost. Appl. Geogr. 2024, 166, 103264. [Google Scholar] [CrossRef]
Doan, Q.C.; Ma, J.; Chen, S.T.; Zhang, X.H. Nonlinear and threshold effects of the built environment, road vehicles and air pollution on urban vitality. Landsc. Urban Plan. 2025, 253, 105204. [Google Scholar] [CrossRef]
Lei, J.Y.; He, M.; Shi, Z.B.; He, M.W.; Liu, Y.; Qian, Q.; Qian, H.M. How does the built environment affect intermodal transit demand across different spatiotemporal contexts? J. Transp. Geogr. 2024, 121, 104033. [Google Scholar] [CrossRef]
Yin, C.; Cao, J.; Sun, B.D.; Liu, J.H. Exploring built environment correlates of walking for different purposes: Evidence for substitution. J. Transp. Geogr. 2023, 106, 103505. [Google Scholar] [CrossRef]

Figure 1. Methodological framework of data processing, stacking ensemble model training, and SHAP-based interpretation.

Figure 2. Study area.

Figure 3. Correlations among variables.

Figure 4. The proportions of various trips in different time periods of a day.

Figure 5. Spatial characteristics of the intermodal ridership: (a) metro-to-bus on weekdays, (b) bus-to-metro on weekdays, (c) metro-to-bus on weekends, (d) bus-to-metro on weekends.

Figure 6. SHAP feature importance plot (metro-to-bus): (a) SHAP summary beeswarm plot; (b) polar chart showing relative global feature importance.

Figure 7. SHAP feature importance plot (bus-to-metro): (a) SHAP summary beeswarm plot; (b) polar chart showing relative global feature importance.

Figure 8. SHAP dependence plots (metro-to-bus): (a) Transfer time. (b) Bus station count. (c) DC. (d) Restaurant count. (e) Traffic facility count. (f) DCC.

Figure 9. SHAP dependence plots (bus-to-metro): (a) Transfer time. (b) Bus station count. (c) Shopping count. (d) Medical care count. (e) House price. (f) DCC.

Figure 10. Circular feature interaction network (metro-to-bus).

Figure 11. Circular feature interaction network (bus-to-metro).

Table 1. Descriptive statistics of independent variables.

Variable	Description	Max	Min	Mean	Standard Deviation
Diversity
Land-use mix	Index of POI categories within the SCA, measuring land-use and functional diversity.	1	0.262	0.843	0.076
Restaurant count	Number of catering facilities within the SCA.	308	0	26.196	29.490
S&L count	Number of sports and leisure facilities within the SCA.	151	0	9.325	10.403
Corporation count	Number of office and business facilities within the SCA.	838	0	49.584	69.460
Public facility count	Number of public service facilities within the SCA.	56	0	4.520	6.047
Commercial residence count	Number of commercial and residential facilities within the SCA.	108	0	12.439	11.193
E&C count	Number of education and culture facilities within the SCA.	1096	0	25.258	39.392
Shopping count	Number of shopping facilities within the SCA.	20	0	2.114	2.511
Medical count	Number of medical facilities within the SCA.	73	0	8.472	8.828
Traffic facility count	Number of traffic facilities within the SCA.	205	1	29.394	25.488
Scenery count	Number of scenery facilities within the SCA.	82	0	1.757	4.883
G&O count	Number of government and social organizations within the SCA.	165	0	17.035	18.696
Life service count	Number of life service facilities within the SCA.	201	0	17.305	19.693
F&I count	Number of financial and insurance facilities within the SCA.	156	0	9.574	13.363
Socioeconomic density
GDP	Gross domestic product within the SCA, representing the level of economic activity (10,000¥/km²).	65.500	0	5.746	6.628
Population	Population density within the SCA, reflecting the intensity of residential activities (persons/km²).	359	8	74.877	60.819
House price	Average residential housing price within the SCA (10,000¥).	24.470	0.337	6.299	3.438
Design
Road density	Total road length per unit area within the SCA, representing street network intensity (km/km²).	77.215	5.679	38.306	10.656
DC	Normalized degree centrality of bus stops within the SCA bus network.	1	0	0.759	0.283
CC	Normalized closeness centrality of bus stops within the SCA bus network.	1	0	0.666	0.309
Metro station count	Total number of metro stations within the SCA.	8	0	0.553	1.218
Bus station count	Total number of bus stops within the SCA.	237	0	26.718	25.823
Metro line count	Total number of metro lines serving the SCA.	8	0	1.696	1.573
Bus line count	Total number of bus lines serving the SCA.	149	1	36.270	24.511
Destination accessibility
DCC	Euclidean distance from the SCA centroid to the city center.	42.576	0.275	11.853	7.827
Distance to transit
Transfer time	Average transfer time between bus and metro within the SCA, extracted from smart card transfer records based on passengers’ alighting time and subsequent boarding time (minutes).	30	1	11.838	5.002

Table 2. Diagnostic tests of independent variables.

Variable	Moran’s I	p-Value	VIF	r
Land-use mix	0.088	0.01	1.818	−0.062
Restaurant count	0.022	0.01	3.823	0.021
S&L count	0.025	0.01	2.666	−0.035
Corporation count	0.029	0.01	3.277	−0.005
Public facility count	0.071	0.01	1.661	−0.063
Commercial residence count	0.028	0.01	3.675	−0.077
E&C count	0.027	0.01	1.571	−0.040
Shopping count	0.036	0.01	2.615	−0.001
Medical count	0.042	0.01	3.063	−0.094
Traffic facility count	0.054	0.01	3.768	−0.078
Scenery count	0.032	0.01	1.240	−0.020
G&O count	0.065	0.01	2.103	−0.104
Life service count	0.042	0.01	4.334	−0.034
F&I count	0.044	0.01	2.414	−0.013
GDP	0.217	0.01	1.713	−0.097
Population	0.264	0.01	3.139	−0.034
House price	0.512	0.01	1.973	−0.039
Road density	0.142	0.01	2.071	−0.009
DC	0.101	0.01	1.862	0.101
CC	0.004	0.09	1.037	−0.026
DCC	0.892	0.01	2.864	0.042
Metro station count	0.013	0.01	1.435	0.309
Bus station count	0.230	0.01	2.541	0.052
Metro line count	0.112	0.01	1.631	0.175
Bus line count	0.070	0.01	1.636	0.147
Transfer time	0.012	0.02	1.082	−0.218
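
The VIF column in Table 2 measures how strongly each predictor is linearly explained by the others; all reported values are below 5. The following is a minimal sketch of the standard definition, VIF_j = 1 / (1 − R_j²), where R_j² comes from regressing predictor j on the remaining predictors. It uses NumPy and synthetic stand-in data, not the study's variables, and the function name `variance_inflation_factors` is illustrative rather than taken from the paper.

```python
import numpy as np

def variance_inflation_factors(X):
    """Return the VIF of each column of X (shape: n_samples x n_features)."""
    X = np.asarray(X, dtype=float)
    n_samples, n_features = X.shape
    vifs = np.empty(n_features)
    for j in range(n_features):
        target = X[:, j]
        # Regress column j on all other columns plus an intercept.
        others = np.delete(X, j, axis=1)
        design = np.column_stack([np.ones(n_samples), others])
        coef, *_ = np.linalg.lstsq(design, target, rcond=None)
        residual = target - design @ coef
        ss_res = float(residual @ residual)
        ss_tot = float(((target - target.mean()) ** 2).sum())
        r_squared = 1.0 - ss_res / ss_tot
        vifs[j] = 1.0 / (1.0 - r_squared)
    return vifs

# Synthetic stand-in data (assumption: not the study's variables).
rng = np.random.default_rng(0)
x1 = rng.normal(size=500)
x2 = rng.normal(size=500)                 # independent of x1
x3 = x1 + 0.1 * rng.normal(size=500)      # nearly collinear with x1
vifs = variance_inflation_factors(np.column_stack([x1, x2, x3]))
print(np.round(vifs, 2))
```

In this toy example the independent column has a VIF close to 1, while the two nearly collinear columns have much larger values.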

Table 3. Optimized learner configurations.

Transfer Type	Model Role	Selected Models
Metro-to-bus	Base-learner	LightGBM, XGBoost, MLP, KNN, SVR
Metro-to-bus	Meta-learner	RidgeCV (α = 10)
Bus-to-metro	Base-learner	LightGBM, CatBoost, Random Forest, AdaBoost, ExtraTrees, MLP
Bus-to-metro	Meta-learner	RidgeCV (α = 10)
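
The metro-to-bus configuration in Table 3 can be approximated with scikit-learn's `StackingRegressor`, sketched below under several assumptions. The data are synthetic (26 features, matching the number of variables in Table 1), `GradientBoostingRegressor` is swapped in for LightGBM and XGBoost so the example depends only on scikit-learn, MLP is omitted for brevity, and the feature scaling and candidate α grid are illustrative choices. It is not the authors' implementation.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor, StackingRegressor
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic stand-in for the feature matrix (assumption: 26 features as in Table 1).
X, y = make_regression(n_samples=300, n_features=26, n_informative=8,
                       noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

base_learners = [
    # Stand-in for the gradient-boosted tree learners (LightGBM, XGBoost).
    ("gbr", GradientBoostingRegressor(n_estimators=300, max_depth=3,
                                      learning_rate=0.02, random_state=0)),
    # Distance-based and kernel learners are scaled first (illustrative choice).
    ("knn", make_pipeline(StandardScaler(),
                          KNeighborsRegressor(n_neighbors=7, weights="distance"))),
    ("svr", make_pipeline(StandardScaler(), SVR(kernel="rbf"))),
]

# The meta-learner is fitted on out-of-fold predictions of the base learners.
stack = StackingRegressor(
    estimators=base_learners,
    final_estimator=RidgeCV(alphas=[0.1, 1.0, 10.0]),  # candidate grid is an assumption
    cv=5,
)
stack.fit(X_train, y_train)

r2_test = stack.score(X_test, y_test)
print("selected alpha:", stack.final_estimator_.alpha_)
print("meta-learner weights:", stack.final_estimator_.coef_)
print("test R2:", round(r2_test, 3))
```

The ridge coefficients printed at the end show how much weight the meta-learner assigns to each base learner's out-of-fold predictions.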

Table 4. Comprehensive comparison of model performance.

Model	Hyperparameters	R²	RMSE	MAE
Metro-to-bus
OLS	/	0.245	95.781	72.863
GWR	/	0.219	97.335	74.860
LightGBM	n_estimators = 300, max_depth = 3, learning_rate = 0.02	0.423	85.810	60.675
XGBoost	n_estimators = 500, max_depth = 8, learning_rate = 0.01	0.422	85.843	60.228
MLP	sizes = (100, 50), activation = ‘relu’	0.292	95.105	65.949
KNN	n_neighbors = 7, weights = ‘distance’	0.220	99.783	71.731
SVR	kernel = ‘rbf’	0.162	103.446	60.461
Weighted average	(LightGBM = 0.217, XGBoost = 0.213, KNN = 0.184, SVR = 0.176, MLP = 0.210)	0.380	89.053	61.298
Stacking	5-fold CV	0.429	85.376	59.560
Bus-to-metro
OLS	/	0.178	654.888	518.689
GWR	/	0.157	663.746	531.031
LightGBM	n_estimators = 300, max_depth = 3, learning_rate = 0.02	0.393	613.207	513.515
CatBoost	iterations = 600, depth = 5, learning_rate = 0.01	0.350	634.235	523.129
Random Forest	n_estimators = 300, max_depth = 6	0.282	666.592	557.276
AdaBoost	n_estimators = 300, max_depth = 8, learning_rate = 0.06	0.279	668.059	509.657
ExtraTrees	n_estimators = 200, max_depth = 3	0.0916	749.818	579.418
MLP	sizes = (100, 50), activation = ‘relu’	0.115	740.275	592.565
Weighted average	(LightGBM = 0.174, CatBoost = 0.176, ExtraTrees = 0.145, Random Forest = 0.172, AdaBoost = 0.166, MLP = 0.166)	0.331	643.705	531.807
Stacking	10-fold CV	0.404	607.303	505.590
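
Table 4 compares models by R², RMSE, and MAE, and includes a weighted-average ensemble whose weights sum to approximately one. The sketch below shows how such a weighted combination and the three metrics can be computed in plain Python. The model names, predictions, and weights are toy values chosen for illustration; how the weights in Table 4 were derived is not stated in this section.

```python
import math

def weighted_average(predictions, weights):
    """Combine per-model prediction lists using per-model weights."""
    total = sum(weights.values())  # normalise in case weights do not sum to 1
    n = len(next(iter(predictions.values())))
    return [
        sum(weights[m] * predictions[m][i] for m in weights) / total
        for i in range(n)
    ]

def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

# Toy values (hypothetical; not results from the study).
y_true = [10.0, 20.0, 30.0, 40.0]
predictions = {
    "model_a": [12.0, 18.0, 33.0, 39.0],
    "model_b": [8.0, 24.0, 27.0, 45.0],
}
weights = {"model_a": 0.6, "model_b": 0.4}

combined = weighted_average(predictions, weights)
print("R2:", round(r2_score(y_true, combined), 3))
print("RMSE:", round(rmse(y_true, combined), 3))
print("MAE:", round(mae(y_true, combined), 3))
```

Unlike stacking, a fixed weighted average does not learn the combination from out-of-fold predictions, which is one possible reason it scores below the stacking model in Table 4.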

Table 5. Threshold estimates for SHAP dependence curves.

Feature	Point estimate	95% CI	Value displayed in figure
Metro-to-bus
Transfer time	2.82	[2.18, 3.39]	2.80
Transfer time	10.62	[10.50, 10.73]	10.50
Bus station count	16.86	[16.12, 17.58]	16.84
DC	0.78	[0.76, 0.81]	0.78
Restaurant count	29.87	[29.11, 30.57]	29.72
Traffic facility count	17.10	[16.66, 17.53]	16.96
DCC	5.77	[5.43, 6.07]	5.75
DCC	27.08	[25.96, 28.76]	27.0
Bus-to-metro
Transfer time	6.19	[6.14, 6.23]	6.18
Bus station count	100.82	[99.38, 102.59]	101.11
Shopping count	9.54	[9.35, 9.66]	9.57
Medical care count	42.57	[41.93, 43.36]	42.58
House price	3.22	[3.15, 3.31]	3.22
House price	12.79	[12.05, 13.69]	12.30
DCC	10.85	[10.46, 11.31]	10.61
DCC	36.96	[34.22, 39.58]	35.00

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2026 by the authors. Published by MDPI on behalf of the International Society for Photogrammetry and Remote Sensing. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.

Share and Cite

MDPI and ACS Style

Zhang, H.; Qu, K. How Does the Built Environment Affect Intermodal Demand Between Bus and Metro: An Ensemble Explainable Machine Learning Analysis. ISPRS Int. J. Geo-Inf. 2026, 15, 269. https://doi.org/10.3390/ijgi15060269

