Nonlinear Effects of Station-Area Environments on Commercial–Employment Composite Vitality: Evidence from Osaka’s Midosuji Line

Li, Yu; Wang, Zihao; Yao, Minfeng; Zhang, Yikang; Zhang, Qi

doi:10.3390/land15061054


by Yu Li ¹,†, Zihao Wang ²,†, Minfeng Yao ¹,³,*, Yikang Zhang ¹ and Qi Zhang

¹ School of Architecture, Huaqiao University, Xiamen 361021, China

² School of Architecture and Urban Planning, Beijing University of Civil Engineering and Architecture, Beijing 100044, China

³ Research Institute of Urban and Rural Construction and Environmental Protection, Huaqiao University, Xiamen 361021, China

* Author to whom correspondence should be addressed.

† These authors contributed equally to this work.

Land 2026, 15(6), 1054; https://doi.org/10.3390/land15061054 (registering DOI)

Submission received: 29 April 2026 / Revised: 7 June 2026 / Accepted: 12 June 2026 / Published: 15 June 2026

(This article belongs to the Special Issue Transport Planning in Smart Cities and Sustainable Urban Design)


Abstract

Rail-transit station areas concentrate commercial services, employment, and intensive land development, but their vitality is shaped by multiple built-environment conditions rather than rail accessibility alone. Focusing on 20 stations along the Osaka Metro Midosuji Line in Japan, this study uses Japanese chome units, which are small neighborhood-level address and statistical units, within an 800 m pedestrian catchment as analytical units and measures commercial-service agglomeration intensity, employment intensity, and commercial–employment composite vitality. The composite indicator measures the static co-concentration of commercial-service provision and employment carrying capacity, with pedestrian flow, consumption activity, and dwell time treated as separate dimensions of station-area vitality. Ten station-area environmental variables are examined using ordinary least squares (OLS), Lasso, Random Forest, Back-Propagation (BP) Neural Network, and extreme gradient boosting (XGBoost) models, with Shapley additive explanations (SHAP) applied to interpret variable contributions and nonlinear responses. Results show that nonlinear models generally outperform linear models. Development intensity, officially assessed land price, and network distance to the nearest metro station are the most influential variables, showing threshold, marginal, and non-monotonic effects. Split models indicate that commercial-service agglomeration is more sensitive to rail proximity and street-network conditions, whereas employment intensity is more associated with development intensity and land price. These findings support fine-grained station-area renewal and mixed-function planning.

Keywords: station area; commercial–employment composite vitality; built environment; nonlinear effects; XGBoost; SHAP

1. Introduction

Areas surrounding rail-transit stations are important spatial units in dense cities, where commercial services, employment activities, and land development are highly concentrated. Previous studies have shown that transit-oriented development (TOD) and station-area built environments significantly influence vitality levels around stations [1]. As key nodes in urban transport networks, metro stations improve regional accessibility and strengthen spatial connectivity, and are therefore closely tied to surrounding land use, functional configuration, and the distribution of economic activities [2,3]. Understanding these relationships is important for evaluating station-area development, improving land-use organization around stations, and supporting commercial and employment concentration.

Nevertheless, rail accessibility alone is insufficient to explain variations in vitality among station areas. Even within the same rail corridor, station surroundings may differ substantially in land value, development intensity, population base, street network, pedestrian conditions, and public-transport connections. Development intensity, land-use structure, and street-network conditions around stations can also affect vitality agglomeration [4]. These environmental conditions jointly shape the spatial concentration of commercial-service facilities and employment activities, and their effects often show nonlinear and synergistic characteristics [5,6]. In mature rail-transit corridors, station-area vitality is therefore more likely to result from the combined effects of rail proximity, development capacity, market conditions, and street-network structure. For planning practice, the key issue is not only whether a station area has high vitality, but also how commercial–employment composite vitality is distributed within the 800 m pedestrian catchment. Commercial and employment functions may not decrease smoothly with distance from the station; instead, they may be concentrated in particular chome units depending on development capacity, land value, street connectivity, and land-use conditions.

Existing studies have measured station-area vitality from different perspectives, including the built environment, land use, transport accessibility, and human activity. Commercial-facility density mainly describes service supply, ridership and pedestrian-flow data describe dynamic movement, and employment density describes job concentration [7,8]. These indicators are useful, but each captures only one dimension of station-area vitality. None of them alone can identify chome units where commercial-service provision and employment carrying capacity are jointly concentrated. This distinction is consistent with multidimensional urban-vitality research, which shows that different data sources correspond to different dimensions of vitality [9]. Therefore, this study focuses on the commercial–employment dimension of station-area vitality, defined as the static spatial co-agglomeration of commercial-service provision and employment carrying capacity within station-area chome units. This composite indicator differs from direct measures of pedestrian flow, consumption activity, or dwell time.

A further gap concerns how station-area vitality is analyzed for planning decisions within station catchments. Many studies use stations, fixed buffers, traffic analysis zones, or regular grids as basic analytical units. These scales are useful for comparing different station areas, but they often hide differences among smaller spatial units within the same 800 m pedestrian catchment. For land-use planning, these internal differences are important because renewal decisions are usually implemented at the block or neighborhood scale rather than at the whole-station scale [10]. In practice, the spatial pattern of commercial–employment composite vitality within a station catchment may be continuous around core stations, scattered in peripheral station areas, or discontinuous across adjacent chome units. Therefore, a fine-grained spatial unit is needed to examine whether station-area vitality follows a smooth station-centered gradient or a more localized and uneven pattern.

This fine-grained spatial heterogeneity also has implications for modeling. The relationship between station-area environmental variables and commercial–employment composite vitality does not necessarily follow a simple linear form. Development intensity, land price, rail distance, street connectivity, and land-use mix may exhibit threshold effects, marginal changes, or non-monotonic responses. Traditional linear models can identify average associations but are relatively limited in capturing complex nonlinear relationships and variable interactions. Machine-learning models and interpretable methods such as Shapley additive explanations (SHAP) provide new tools for identifying variable contributions and nonlinear responses [11,12].

To address these gaps, this study examines 20 stations along the Osaka Metro Midosuji Line in Japan, defines the 800 m pedestrian catchment of each station as the study boundary, and uses chome units, which are small neighborhood-level address and statistical units in Japanese cities, as the basic analytical units to capture internal differences within station service areas. Rather than treating each station catchment as a uniform zone, this study uses chome-level analysis to identify where commercial-service and employment functions overlap and how this spatial pattern is associated with local development intensity, land value, rail proximity, and street-network conditions. The Midosuji Line is Osaka’s north–south trunk rail corridor, connecting major business districts, commercial centers, and transport hubs such as Shin-Osaka, Umeda, Yodoyabashi, Hommachi, Shinsaibashi, Namba, and Tennoji. Many of these station areas are also embedded in Osaka’s wider rail network, where intersecting subway and railway corridors may provide additional accessibility beyond the Midosuji Line itself. The station areas along the line vary substantially in land value, development intensity, commercial facilities, employment activities, and transport accessibility, making it a suitable case for analyzing commercial–employment composite vitality in a mature rail-transit corridor.

Accordingly, this study addresses four questions: (1) How is commercial–employment composite vitality unevenly distributed among chome units within 800 m station catchments along the Osaka Midosuji Line? (2) How well do station-area environmental variables explain commercial–employment composite vitality, and which variables contribute most to model performance? (3) Do commercial-service agglomeration and employment intensity show the same environmental response mechanisms? (4) Do key environmental variables exhibit nonlinear, threshold, or non-monotonic responses in relation to commercial–employment composite vitality?

The contributions of this study are threefold. First, it constructs a commercial–employment composite vitality indicator and clarifies its applicability as a static spatial representation. Second, it uses chome units to identify environmental differences within station service areas. Third, by combining multi-model comparison with SHAP-based interpretable machine learning, it reveals nonlinear relationships between station-area environmental variables and commercial–employment composite vitality. These results help explain how development intensity, land value, rail proximity, and street-network conditions are associated with fine-grained differences within station catchments, thereby providing empirical evidence for station-area renewal, functional allocation, and commercial-space optimization in mature rail-transit corridors (Figure 1).

2. Literature Review

2.1. Measurement of Station-Area Vitality

Urban vitality is commonly used to describe the concentration of population activities, functional mix, and economic activities in urban space. In recent years, multisource data such as points of interest (POIs), mobile-positioning data, social media, and street-view images have been widely used to characterize urban vitality [13,14]. For areas surrounding rail-transit stations, high transport accessibility and sustained passenger distribution capacity make them important carriers of commercial services, employment activities, and mixed development. Station-area vitality can be understood from several perspectives. Pedestrian flow captures movement intensity, consumption data capture economic transactions, commercial POIs describe service-facility supply, and employment density reflects job concentration. These dimensions are related, but they do not measure the same aspect of vitality. This study focuses on one specific dimension of station-area vitality: the overlap between commercial-service provision and employment carrying capacity. This dimension is referred to as commercial–employment composite vitality.

Existing studies generally measure urban or station-area vitality using indicators such as commercial-facility density, retail outlet distribution, passenger ridership, pedestrian-flow intensity, employment density, and land-use intensity. With the development of multisource urban data, POIs, mobile-phone signaling, social-media check-ins, location-based service data, and street-view data have also been used to characterize the spatial distribution of urban activities. These studies have enriched methods for measuring urban vitality; however, different data sources and indicators usually correspond to different dimensions of activity. Commercial POIs are closer to facility supply and service-function distribution, ridership and mobile-phone signaling are closer to dynamic population activity, and employment density reflects job carrying capacity and the agglomeration of office functions [15].

A single indicator can reveal one aspect of station-area vitality, but it cannot show whether commercial services and employment activities are concentrated in the same small spatial units. In highly urbanized areas, some station areas do not rely solely on consumption-oriented commerce to generate vitality; rather, vitality is jointly supported by commercial-service facilities, office-employment activities, and intensive land development. On this basis, this study incorporates commercial-service agglomeration intensity and employment intensity into the same analytical framework and constructs a commercial–employment composite vitality indicator. The composite indicator measures the extent to which commercial-service provision and employment carrying capacity are both high within a chome unit. It is not intended to replace direct measures of pedestrian flow, consumption intensity, dwell time, or other dynamic dimensions of urban vitality. Existing studies have also emphasized that social, economic, and comprehensive vitality differ in terms of measurement objects [9,16].

2.2. Built Environment and TOD-Related Factors

Station-area vitality is closely associated with TOD, the built environment, and land-use structure. TOD-related studies generally argue that high-density development, mixed land use, a walkable environment, and convenient public-transport connections around rail stations help improve accessibility, functional diversity, and activity agglomeration in station areas [17]. Built-environment research also explains spatial differences in urban activities and travel behavior through dimensions such as density, diversity, design, destination accessibility, and distance to public transport.

In research on commercial and employment activities in station areas, factors such as population density, employment density, development intensity, land-use mix, street network, pedestrian environment, public-transport supply, and locational value are considered closely related to activity agglomeration [18,19]. Population density reflects potential consumer demand and the local service base; employment density and development intensity indicate office functions, building capacity, and spatial carrying capacity; land-use mix reflects functional diversity and daily activity opportunities; street networks and the pedestrian environment influence the diffusion of station pedestrian flows into surrounding blocks, thereby affecting the spatial distribution of commercial facilities and daily activities [20]; public-transport supply and rail proximity affect the efficiency of connections between station areas and the wider urban network.

However, existing studies still often use the station as a whole, fixed buffers, traffic analysis zones, or regular grids as analytical units. These scales are useful for identifying general patterns but may conceal spatial heterogeneity within station service areas. In practice, a single 800 m pedestrian catchment around a station may include different block morphologies, development intensities, land-use structures, and access conditions. If only station-level or buffer-level statistics are used, it may be difficult to reveal fine-grained relationships between internal environmental differences and vitality distribution. Recent station-area studies have also begun to emphasize spatial heterogeneity and local environmental differences around stations [21]. Accordingly, using spatial units that better correspond to actual block boundaries and basic statistical systems can improve the precision of station-area environmental measurement and mechanism identification.

2.3. Nonlinear Modeling and Explainable Machine Learning

Studies on the relationship between the built environment and urban activities have often used statistical models such as ordinary least squares (OLS), spatial regression, and geographically weighted regression. These methods can identify average associations or spatial differences, but they usually rely on relatively explicit functional forms. For areas surrounding rail-transit stations, the relationship between environmental variables and vitality may not follow a simple linear form [22]. For example, higher development intensity may enhance spatial carrying capacity, but its marginal contribution may flatten at high values; land price reflects locational value and market attractiveness but may also be associated with functional filtering and higher space costs; variables such as rail distance, intersection density, and land-use mix may also show threshold effects, marginal changes, or non-monotonic responses.

In recent years, machine-learning methods have gradually been applied to studies of urban vitality, the built environment, and transport behavior. Compared with traditional linear models, Random Forest, Back-Propagation (BP) Neural Network, and extreme gradient boosting (XGBoost) models can more flexibly capture complex nonlinear relationships and variable interactions. XGBoost, a gradient-boosting-tree-based ensemble method, usually performs strongly in structured-data prediction tasks and is suitable for identifying complex relationships between multidimensional environmental variables and spatial-activity indicators [23].

However, machine-learning models also suffer from limited interpretability. Merely reporting predictive accuracy cannot meet the explanatory needs of urban-planning and land-use research [24]. Shapley additive explanations (SHAP), based on the concept of Shapley values, decomposes model predictions into the marginal contributions of individual variables [25] and has been used to interpret the nonlinear effects of built-environment variables on urban vitality [26]. Combining machine-learning models with SHAP therefore helps improve the identification of nonlinear relationships while enhancing the interpretive value of model results for urban-planning research.

Based on the literature, this study addresses three aspects: the measurement of commercial–employment composite vitality, fine-grained spatial units, and nonlinear interpretable modeling. Specifically, it uses chome units as the basic analytical units, constructs a commercial–employment composite vitality indicator, and compares the model performance and explanatory contributions of linear models and nonlinear machine-learning models. It then uses SHAP to identify the relative contributions and nonlinear response patterns of station-area environmental variables, thereby providing empirical evidence on the mechanisms underlying functional agglomeration in mature rail-transit corridors.

3. Study Area, Data, and Methods

3.1. Study Area and Spatial Units

This study focuses on the areas surrounding stations along the Osaka Metro Midosuji Line in Japan. The Midosuji Line is Osaka’s north–south trunk rail line; it is approximately 24.5 km long and has 20 stations (Figure 2). In the central section of the corridor, several Midosuji Line stations are embedded in Osaka’s wider rail network and function as interchange nodes with other subway or railway lines. As a result, station-area accessibility along the corridor reflects both proximity to the Midosuji Line and the broader rail-network context of surrounding stations. Rail-station areas in Japanese metropolitan regions usually exhibit high levels of land-use mix and functional agglomeration [27]. Station surroundings along the line differ markedly in land value, development intensity, commercial facilities, employment activities, and transport accessibility, making the corridor suitable for analyzing commercial–employment composite vitality in a mature rail-transit context.

Following related research on pedestrian service areas around rail-transit stations, this study defines the 800 m pedestrian catchment around each station as the station service area. Previous studies commonly use a 500–1000 m range to identify the pedestrian influence area of rail stations [28,29]. Compared with conventional Euclidean buffers, network-based pedestrian catchments better reflect actual walking-access conditions [30]. This study further uses Japanese chome units—neighborhood-level address and statistical units commonly used in Japan—as the basic analytical units. Their boundaries generally correspond to local urban blocks and official statistical reporting units, which helps capture spatial differences within station service areas.
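The contrast between a Euclidean buffer and a network-based catchment can be illustrated with a minimal sketch. The toy street graph, node names, and edge lengths below are hypothetical and are not drawn from the study's data; the code simply runs Dijkstra's algorithm from a station node and keeps the nodes reachable within 800 m of walking distance.

```python
import heapq

def network_catchment(graph, source, max_dist=800.0):
    """Return {node: walking distance} for nodes within max_dist of source.

    graph: dict mapping node -> list of (neighbor, edge_length_m).
    """
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale queue entry, a shorter path was already found
        for nbr, length in graph.get(node, []):
            nd = d + length
            if nd <= max_dist and nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                heapq.heappush(heap, (nd, nbr))
    return dist

# Hypothetical toy street network (undirected edges, lengths in metres).
edges = [
    ("station", "a", 300.0),
    ("a", "b", 400.0),
    ("b", "c", 200.0),   # c lies 900 m from the station along the network
    ("station", "d", 500.0),
    ("d", "e", 250.0),
]
graph = {}
for u, v, w in edges:
    graph.setdefault(u, []).append((v, w))
    graph.setdefault(v, []).append((u, w))

catchment = network_catchment(graph, "station")
print(sorted(catchment.items(), key=lambda kv: kv[1]))
```

In this toy case node c could sit inside an 800 m straight-line buffer yet is excluded because its walking distance is 900 m, which is the kind of difference a network-based catchment is meant to capture.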

In the spatial-processing procedure, the 800 m pedestrian catchments of the 20 stations were overlaid with chome units to extract valid analytical units located within station service areas. Dependent and explanatory variables were then uniformly summarized at the chome scale. A total of 481 valid chome samples were obtained. All spatial data were projected to the same coordinate system and matched and calculated at the same spatial-unit scale. For a small number of missing values in continuous variables, ordinary kriging interpolation was used to estimate the missing values based on the spatial distribution of neighboring units. This procedure was conducted before model construction to ensure the completeness and consistency of the chome-level dataset.

3.2. Dependent Variables

Three dependent variables are specified to represent commercial-service agglomeration, employment carrying capacity, and their composite state. The first is commercial-service agglomeration intensity (Y1), which reflects the spatial concentration of commercial-service facilities within a chome unit. The second is employment intensity (Y2), which reflects employment activity and job carrying capacity within a chome unit. The third is commercial–employment composite vitality (Y), which identifies chome units where commercial-service provision and employment density are both high. This indicator should not be interpreted as a direct measure of pedestrian vitality, consumption intensity, dwell time, or business operating performance. Instead, it is a static facility-employment indicator constructed from commercial-facility and employment data. Its purpose is to identify areas in mature rail-station catchments where commercial services and employment functions overlap.

Commercial-service agglomeration intensity was constructed from Google Maps POI data. POI data can effectively reflect the supply of commercial-service facilities and the distribution of urban functions and have been widely used in urban-vitality studies [31,32]. Commercial-service facilities were classified into six categories: retail, catering, leisure and entertainment, life services, financial services, and tourist accommodation (Table 1). After data cleaning, deduplication, and spatial clipping, 35,879 valid commercial POIs were obtained. Considering the continuous spatial agglomeration of commercial facilities, this study used kernel density estimation to generate a commercial-service agglomeration surface. Similar methods are commonly used to identify spatial hotspots of urban facilities and activities [33]. The kernel-density results were then aggregated to the chome units to obtain the commercial-service agglomeration intensity of each chome.
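As a rough illustration of this step, the sketch below evaluates a two-dimensional Gaussian kernel density at sample points and averages it within a spatial unit. The kernel type, the 100 m bandwidth, the averaging rule, and all coordinates are assumptions made for the example; the text does not report the settings actually used.

```python
import math

def gaussian_kde_at(point, pois, bandwidth):
    """Kernel density of POIs at one location (2-D Gaussian kernel).

    point: (x, y) in metres; pois: list of (x, y); bandwidth in metres.
    Returns density in POIs per square metre.
    """
    px, py = point
    norm = 1.0 / (2.0 * math.pi * bandwidth ** 2)
    total = 0.0
    for x, y in pois:
        d2 = (x - px) ** 2 + (y - py) ** 2
        total += norm * math.exp(-d2 / (2.0 * bandwidth ** 2))
    return total

def unit_intensity(sample_points, pois, bandwidth):
    """Mean density over sample points that fall inside one spatial unit."""
    values = [gaussian_kde_at(p, pois, bandwidth) for p in sample_points]
    return sum(values) / len(values)

# Hypothetical POI coordinates (projected, metres), clustered near (0, 0).
pois = [(0, 0), (20, 10), (-15, 5), (10, -20), (400, 400)]

# Hypothetical grid sample points for two chome-like units.
unit_near = [(0, 0), (25, 0), (0, 25), (25, 25)]
unit_far = [(300, 0), (325, 0), (300, 25), (325, 25)]

bw = 100.0  # assumed bandwidth; the source does not report one
near = unit_intensity(unit_near, pois, bw)
far = unit_intensity(unit_far, pois, bw)
print(f"near={near:.3e}, far={far:.3e}")
```

The unit next to the POI cluster receives the higher intensity, and the value falls off smoothly with distance instead of dropping to zero at a unit boundary, which is the reason a density surface is preferred over raw POI counts per unit.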

Employment intensity was constructed using 2021 employment data from e-Stat, the Japanese government’s official statistical portal, to represent the employment carrying capacity of the chome units. Employment density is commonly used to characterize station-area office functions, economic activities, and spatial carrying capacity [34]. To reduce the influence of extreme values and improve comparability across indicators with different units, commercial-service agglomeration intensity and employment intensity were transformed using ln(1 + x) and standardized. The commercial–employment composite vitality indicator was then constructed using equal weights:

Y = \frac{Z\left[\ln\left(1 + \mathrm{Agg}_{i}^{\mathrm{csf}}\right)\right] + Z\left[\ln\left(1 + \frac{\mathrm{Emp}_{i}}{A_{i}}\right)\right]}{2} \quad (1)

where Y denotes commercial–employment composite vitality, the first logarithmic term is the commercial-service agglomeration intensity of chome unit i, the second is its employment intensity, and Z(·) denotes standardization. Equal weighting is used because this study aims to simultaneously characterize commercial-service supply and employment carrying capacity without presupposing differences in their weights within composite vitality. To avoid masking differences between the commercial and employment dimensions, separate models using Y1 and Y2 were also constructed and compared with the composite-vitality model (Table 2).
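Equation (1) can be sketched in a few lines. The input values below are hypothetical, the denominator of the employment term is read here as the area of the chome unit, and the population standard deviation is used for Z(·); the source specifies neither of the last two choices, so both are assumptions.

```python
import math
from statistics import mean, pstdev

def zscore(values):
    """Standardize to zero mean and unit (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]

def composite_vitality(agg, emp, area):
    """Equal-weight composite of log-transformed, standardized components."""
    z_com = zscore([math.log1p(a) for a in agg])
    z_emp = zscore([math.log1p(e / s) for e, s in zip(emp, area)])
    return [(c + e) / 2.0 for c, e in zip(z_com, z_emp)]

# Hypothetical values for four chome units (not data from the study).
agg = [120.0, 45.0, 8.0, 300.0]          # commercial agglomeration intensity
emp = [5200.0, 900.0, 150.0, 12000.0]    # number of employees
area = [0.08, 0.10, 0.12, 0.06]          # assumed unit area, e.g. km^2

y = composite_vitality(agg, emp, area)
print([round(v, 3) for v in y])
```

Because both components are standardized before averaging, the composite has zero mean across units, and a unit scores high only when it ranks high on both components.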

3.3. Independent Variables

The explanatory variables characterize station-area environmental conditions. Drawing on TOD, urban-vitality, and built-environment studies, this study selects ten variables representing market conditions, population structure, development intensity, land use, street network, pedestrian environment, bus supply, rail accessibility, and interchange accessibility. These variables include average officially assessed land price, residential population density, proportion of older adults, development-intensity index, land-use mix, intersection density, pedestrian friendliness, bus-stop density, network distance to the nearest metro station, and transfer-line count (Table 3). The variable system covers the built-environment dimensions of density, diversity, design, destination accessibility, and distance to public transport, together with rail-network connectivity [35].

Average officially assessed land price represents locational value and market attractiveness. Residential population density and the proportion of older adults reflect the potential demand base and demographic structure. The development-intensity index combines indicators such as floor-area ratio, building coverage ratio, proportion of commercial land, and proportion of high-FAR areas to represent spatial development and functional carrying capacity. Land-use mix reflects functional diversity. Intersection density and pedestrian friendliness characterize the street network and pedestrian environment. Bus-stop density represents surface public-transport supply. Network distance to the nearest metro station represents distance-based rail accessibility at the chome scale. Transfer-line count was introduced to represent the broader rail-network context of each station area. For each Midosuji Line station, transfer-line count was calculated as the number of subway or railway lines that intersect with, or are directly transferable from, that station. Because some chome units fall within overlapping 800 m station catchments, each chome unit was assigned the maximum transfer-line count among the relevant Midosuji Line stations. This variable distinguishes local metro-station proximity from the broader rail-network accessibility associated with interchange stations and intersecting rail corridors.
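The maximum-value rule for overlapping catchments can be written as a short sketch. The station identifiers, line counts, and catchment memberships below are hypothetical placeholders, not values from the study.

```python
def assign_transfer_lines(unit_stations, station_lines):
    """Give each unit the maximum transfer-line count among its stations."""
    return {
        unit: max(station_lines[s] for s in stations)
        for unit, stations in unit_stations.items()
    }

# Hypothetical counts and catchment membership, for illustration only.
station_lines = {"S1": 4, "S2": 1, "S3": 0}
unit_stations = {
    "chome_A": ["S1"],
    "chome_B": ["S1", "S2"],   # falls within two overlapping catchments
    "chome_C": ["S2", "S3"],
}

transfer_count = assign_transfer_lines(unit_stations, station_lines)
print(transfer_count)
```

Taking the maximum means a unit shared by an interchange station and an ordinary station is credited with the interchange-level connectivity, which matches the rule described above.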

For variables with clearly right-skewed distributions, such as intersection density and bus-stop density, logarithmic transformation was conducted before modeling to reduce the influence of extreme values.

3.4. Modeling Strategy and Interpretation

To compare environmental-response differences among commercial-service agglomeration, employment intensity, and commercial–employment composite vitality, this study constructs Model A, Model B, and Model C with Y1, Y2, and Y as dependent variables, respectively. Model A represents commercial-service agglomeration intensity, Model B represents employment intensity, and Model C represents commercial–employment composite vitality. The same set of station-area environmental variables listed in Table 3 was used across the three models to ensure comparability. By comparing predictive performance, variable importance, and SHAP response characteristics across the three models, the study evaluates whether commercial, employment, and composite vitality share the same environmental association mechanisms.

In model construction, both linear models and nonlinear machine-learning models are used. OLS and Lasso serve as linear baseline models, while Random Forest, Back-Propagation (BP) Neural Network, and XGBoost are used to capture potential nonlinear relationships and variable interactions. Similar model-comparison frameworks have been applied in research on the built environment, travel behavior, and urban vitality [36,37]. Before modeling, all variables were spatially matched at the chome scale and standardized. The samples were divided into training and test sets at an 8:2 ratio. Model performance was evaluated using the coefficient of determination (R²), root mean square error (RMSE), and mean absolute error (MAE). Five-fold cross-validation was further conducted to examine model stability.
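The evaluation protocol can be sketched with the standard library alone. The random seed, the shuffling scheme, and the choice to draw the five folds from the training portion are assumptions for illustration; the text states only the 8:2 ratio, the three metrics, and the use of five-fold cross-validation.

```python
import math
import random

def r2_rmse_mae(y_true, y_pred):
    """Return (R^2, RMSE, MAE) for paired observations and predictions."""
    n = len(y_true)
    mean_y = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    rmse = math.sqrt(ss_res / n)
    mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n
    return 1.0 - ss_res / ss_tot, rmse, mae

def train_test_indices(n, test_share=0.2, seed=0):
    """Shuffle indices once and split them into training and test sets."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = round(n * test_share)
    return idx[n_test:], idx[:n_test]

def k_fold_indices(indices, k=5):
    """Yield (train, validation) index lists for k roughly equal folds."""
    folds = [indices[i::k] for i in range(k)]
    for i, val in enumerate(folds):
        train = [j for m, f in enumerate(folds) if m != i for j in f]
        yield train, val

# 481 is the number of valid chome samples reported in Section 3.1.
train_idx, test_idx = train_test_indices(481)
print(len(train_idx), len(test_idx))

folds = list(k_fold_indices(train_idx, k=5))
print([len(val) for _, val in folds])

# Metric check on a tiny hypothetical example.
r2, rmse, mae = r2_rmse_mae([1.0, 2.0, 3.0, 4.0], [1.5, 2.0, 2.5, 4.0])
print(round(r2, 3), round(rmse, 3), round(mae, 3))
```

Holding the test indices fixed across all six models is what makes the reported test-set metrics directly comparable between the linear baselines and the nonlinear models.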

For model interpretation, SHAP was used to analyze variable contributions in the XGBoost model. SHAP is based on the Shapley-value concept and decomposes model predictions into the marginal contributions of explanatory variables. Mean absolute SHAP values were used to identify relative variable importance, and SHAP dependence analysis was used to examine the model response of key variables across different value ranges. This method can simultaneously present variable importance, direction of effect, and potential threshold intervals [38]. It should be noted that SHAP results reflect model-internal variable contributions and response patterns and should not be interpreted directly as strict causal effects.
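The Shapley-value decomposition underlying SHAP can be made concrete with a brute-force sketch. The response function, the feature values, and the baseline below are hypothetical and stand in for a fitted model; replacing absent features with baseline values is one common way to define the coalition value and is an assumption here. Exhaustive enumeration of coalitions is practical only for a handful of features, whereas tree-model explainers use faster algorithms.

```python
from itertools import combinations
from math import factorial

def shapley_values(model, x, baseline):
    """Exact Shapley values for one prediction.

    Features outside a coalition are replaced by their baseline value.
    """
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi

# Hypothetical response function, not the fitted model from the study:
# a capped (threshold-like) term, a linear distance penalty, an interaction.
def toy_model(z):
    intensity, distance, price = z
    return min(intensity, 2.0) - 0.5 * distance + 0.3 * intensity * price

x = [3.0, 0.4, 1.5]          # one hypothetical chome unit
baseline = [1.0, 1.0, 1.0]   # reference values, e.g. feature means

phi = shapley_values(toy_model, x, baseline)
print([round(p, 4) for p in phi])
```

The contributions sum to the difference between the prediction for the unit and the prediction at the baseline, which is the additivity property that lets mean absolute values be compared across variables. Plotting each variable's contribution against its value over many units yields the dependence curves used to read threshold and non-monotonic responses.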

To ensure comparability among the commercial, employment, and composite vitality models, the subsequent SHAP comparisons are all based on the XGBoost framework. Specifically, Model A, Model B, and Model C were each estimated using an XGBoost Tuned model fitted to the corresponding dependent variable. Although the BP Neural Network achieved the highest test-set accuracy for commercial–employment composite vitality, the difference between BP Neural Network and the XGBoost models was relatively small, and both belonged to the group of nonlinear models that clearly outperformed the linear baselines. Because the purpose of this study is not only prediction but also interpretation for station-area planning, XGBoost was selected as the main explanatory model. Its tree-based structure is well suited to SHAP decomposition, dependence analysis, and comparison of variable contributions across models. Using the tuned XGBoost framework consistently across the three dependent variables allows variable importance, nonlinear response patterns, and environmental-response differences to be compared on the same modeling basis. Therefore, the XGBoost results are used mainly for interpretation rather than to claim that XGBoost is the single best predictive model.

4. Results

4.1. Spatial Patterns of Commercial–Employment Composite Vitality

Figure 3 shows the spatial distribution of commercial–employment composite vitality within the 800 m pedestrian catchments of stations along the Osaka Midosuji Line. Overall, commercial–employment composite vitality exhibits a clear corridor-based agglomeration pattern. High-value units are mainly distributed around central and core stations along the Midosuji Line, especially in areas such as Umeda, Yodoyabashi, Hommachi, Shinsaibashi, Namba, and Tennoji. Existing research on metro vibrancy has similarly found that areas around central business districts and transport hubs generally have higher levels of station-area vitality [2,3]. These areas typically have high land values, strong development intensity, dense commercial-service facilities, and high employment intensity, reflecting the spatial overlap of commercial services, office employment, and mixed functions in mature rail corridors.

Along the line, high-value chome units around core-area stations are more continuous, whereas commercial–employment composite vitality around peripheral stations is generally weaker and high-value units are more scattered. The chome-scale results further indicate that even within the same 800 m pedestrian catchment, commercial–employment composite vitality is not evenly distributed; instead, it is more concentrated in local units that are closer to stations, have higher development intensity, higher land price, or better street connectivity. This demonstrates that using chome units as the analytical unit helps identify fine-grained spatial differences within station service areas. From a planning perspective, this means that an 800 m station catchment should not be treated as a homogeneous influence zone. Instead, commercial–employment composite vitality is concentrated in particular chome units where rail-network proximity, development intensity, land value, and local street conditions overlap.

Figure 4 further compares the spatial distributions of commercial-service agglomeration intensity, employment intensity, and commercial–employment composite vitality. All three indicators show high agglomeration in the core area of the Midosuji Line, but their spatial overlap is not complete. High values of commercial-service agglomeration tend to appear in chome units that are close to stations, have fine-grained street networks, and contain rich commercial-service functions. High values of employment intensity are more concentrated in central business districts characterized by high development intensity, high land price, and prominent office functions. High values of commercial–employment composite vitality are generally located in areas where the two dimensions overlap strongly. This spatial difference provides a necessary basis for the subsequent comparison of split models.

4.2. Model Performance for Commercial–Employment Composite Vitality

To evaluate the predictive ability of station-area environmental variables for commercial–employment composite vitality, this study compares the test-set performance of six models: OLS, Lasso, Random Forest, BP Neural Network, XGBoost Baseline, and XGBoost Tuned. Table 4 reports the R², RMSE, and MAE values of these models on the test set.
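As a reading aid for the three test-set metrics, the following minimal sketch computes R², RMSE, and MAE from paired observed and predicted values using their standard definitions. The function name `regression_metrics` and the five sample values are hypothetical illustrations and are not taken from the study's data or code.

```python
import math

def regression_metrics(y_true, y_pred):
    """Return (R^2, RMSE, MAE) for paired observed and predicted values."""
    n = len(y_true)
    mean_y = sum(y_true) / n
    # Residual and total sums of squares.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot
    rmse = math.sqrt(ss_res / n)
    mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n
    return r2, rmse, mae

# Hypothetical vitality scores for five test-set chome units (not study data).
observed = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted = [1.5, 2.0, 2.5, 4.0, 5.5]
r2, rmse, mae = regression_metrics(observed, predicted)
print(f"R2={r2:.3f}, RMSE={rmse:.3f}, MAE={mae:.3f}")
```

A higher R² and lower RMSE and MAE indicate a closer fit. RMSE penalizes large errors more heavily than MAE, which is why the two are reported together.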

The results show that nonlinear models generally outperform linear models, suggesting that the relationship between station-area environmental variables and commercial–employment composite vitality is not simply linear. The BP Neural Network achieves the highest test-set R², at 0.791, with RMSE and MAE values of 0.401 and 0.308, respectively. Random Forest and the two XGBoost models also show strong performance, with test-set R² values above 0.760. By contrast, the R² values of OLS and Lasso are 0.637 and 0.662, respectively. Although the BP Neural Network achieves the highest test-set accuracy, the XGBoost models show comparable performance and offer a more transparent basis for SHAP-based interpretation. Therefore, the model-comparison results are used to demonstrate the general advantage of nonlinear models over linear baselines, while XGBoost is used as the common explanatory framework for identifying variable contributions and nonlinear response patterns. These model-performance differences are visualized in Figure 5, while Figure 6 compares the observed and predicted values of commercial–employment composite vitality based on the XGBoost Tuned model. Previous studies have likewise found that gradient-boosting decision tree models, including XGBoost, usually outperform linear models in explaining station-area and urban vitality [1,5].

As shown in Table 5, the five-fold cross-validation results further confirm the advantage of nonlinear models. BP Neural Network and XGBoost Baseline obtain the highest mean R² values, both approximately 0.735, while XGBoost Tuned has a similar mean R² value of 0.731. The differences among the main nonlinear models are relatively small, whereas OLS and Lasso remain clearly lower. These results suggest that nonlinear modeling provides more stable explanatory capacity than linear baselines for commercial–employment composite vitality.
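The cross-validation logic can be sketched as follows. This is a simplified illustration that assumes a single predictor and a plain least-squares line in place of the six models compared in the study. The helper names (`k_fold_indices`, `fit_line`, `cross_validated_r2`) and the synthetic data are hypothetical.

```python
import math
import random

def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and split them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def fit_line(xs, ys):
    """Least-squares slope and intercept for one predictor."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

def r_squared(y_true, y_pred):
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def cross_validated_r2(xs, ys, k=5, seed=0):
    """Fit on k-1 folds, score R^2 on the held-out fold, and average."""
    scores = []
    for held_out in k_fold_indices(len(xs), k, seed):
        held = set(held_out)
        train = [i for i in range(len(xs)) if i not in held]
        slope, intercept = fit_line([xs[i] for i in train],
                                    [ys[i] for i in train])
        preds = [slope * xs[i] + intercept for i in held_out]
        scores.append(r_squared([ys[i] for i in held_out], preds))
    return sum(scores) / k, scores

# Synthetic data (not study data): a linear trend with a small deterministic wiggle.
xs = [i / 10 for i in range(50)]
ys = [2.0 * x + 1.0 + 0.1 * math.sin(i) for i, x in enumerate(xs)]
mean_r2, fold_scores = cross_validated_r2(xs, ys, k=5, seed=0)
```

Averaging the held-out R² over folds reduces the dependence of the score on any single train/test split, which is the sense in which cross-validated results are described as more stable.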

4.3. Model Comparison for Commercial, Employment, and Composite Vitality

To examine whether commercial–employment composite vitality masks different mechanisms between commercial-service agglomeration and employment intensity, this study further constructs Model A and Model B using Y1 and Y2, respectively, and compares them with Model C, which uses Y as the dependent variable. Table 6 summarizes the test-set performance of different models for the three dependent variables, and Figure 7 presents the observed-versus-predicted values for Model A and Model B.

For commercial-service agglomeration intensity (Y1), nonlinear models clearly outperform OLS and Lasso. BP Neural Network achieves the highest test-set R², at 0.759, followed by XGBoost Tuned with an R² of 0.704. By contrast, the R² values of OLS and Lasso are 0.315 and 0.351, respectively, indicating a strong nonlinear association between commercial-service agglomeration and station-area environmental variables.

For employment intensity (Y2), the linear models already show relatively strong predictive performance, with R² values of 0.729 for OLS and 0.732 for Lasso. Random Forest further improves the R² to 0.786, while BP Neural Network and the XGBoost models also maintain relatively high performance. This suggests that employment intensity has a more stable correspondence with structural variables such as development intensity and land price, although nonlinear models still provide additional predictive improvement.

For commercial–employment composite vitality (Y), the full model comparison has been presented in Table 4. In Table 6, the XGBoost Tuned result is retained for comparison with the SHAP-based models of commercial-service agglomeration and employment intensity. The relatively strong performance of the XGBoost model provides a consistent basis for the subsequent comparison of variable importance and response characteristics across the three dependent variables.

4.4. SHAP Interpretation of Commercial–Employment Composite Vitality

Within the XGBoost framework, SHAP analysis was used to identify the relative contribution of each station-area environmental variable. Figure 8 shows the variable-importance ranking based on mean absolute SHAP values, and Figure 9 shows the SHAP summary plot.

The results show that the development-intensity index has the highest contribution in the commercial–employment composite vitality model, with a mean absolute SHAP value of 0.400. Average officially assessed land price and network distance to the nearest metro station rank second and third, with mean absolute SHAP values of 0.179 and 0.169, respectively. These three variables correspond to three key spatial conditions for station-area functional concentration: physical development capacity, market-location advantage, and rail-network proximity. Intersection density and transfer-line count rank fourth and fifth, indicating that local street-network structure and interchange accessibility also contribute to the prediction of composite vitality. In contrast, land-use mix, pedestrian friendliness, bus-stop density, and the proportion of older adults contribute relatively little.
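The ranking by mean absolute SHAP value can be illustrated with a short sketch. It assumes a matrix of per-observation SHAP values is already available, one row per chome unit and one column per variable. The function name, the variable labels, and the numbers below are hypothetical and do not reproduce the values in Figure 8.

```python
def rank_by_mean_abs_shap(shap_rows, feature_names):
    """Rank features by the mean of |SHAP value| across observations."""
    n = len(shap_rows)
    importance = {
        name: sum(abs(row[j]) for row in shap_rows) / n
        for j, name in enumerate(feature_names)
    }
    # Largest mean absolute contribution first.
    return sorted(importance.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical SHAP values for four observations (not study data).
features = ["development_intensity", "land_price", "network_distance"]
shap_rows = [
    [0.60, -0.20, 0.10],
    [-0.30, 0.10, -0.20],
    [0.50, 0.30, 0.00],
    [-0.40, -0.20, 0.30],
]
ranking = rank_by_mean_abs_shap(shap_rows, features)
for name, value in ranking:
    print(f"{name}: {value:.3f}")
```

Taking absolute values before averaging matters: a variable that pushes predictions strongly upward in some units and strongly downward in others would otherwise average out to near zero despite being influential.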

These results suggest that high composite vitality is more likely to appear in chome units with sufficient development capacity, strong market-location advantages, good metro-station proximity, and favorable interchange accessibility. TOD studies also show that high development intensity and good rail proximity are important conditions for functional agglomeration in station areas [17,18]. The development-intensity index reflects spatial carrying capacity; average officially assessed land price captures locational value and market maturity; network distance indicates metro-station proximity; and transfer-line count represents interchange conditions and the broader rail-network context.

Figure 10 presents the SHAP dependence relationships for the six most important variables. The development-intensity index, average officially assessed land price, network distance to the nearest metro station, intersection density, transfer-line count, and residential population density all display nonlinear responses to varying degrees.

The SHAP dependence plots show differentiated nonlinear responses among the main variables. The development-intensity index generally shifts from negative or near-zero SHAP values to positive contributions when it reaches a higher level, suggesting that commercial–employment composite vitality requires sufficient spatial carrying capacity. Average officially assessed land price also shows a nonlinear positive response, but its marginal contribution tends to flatten in higher value ranges. This indicates that land price should be interpreted as a proxy for mature location, market attractiveness, and employment-opportunity concentration rather than as a direct policy target.
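To make the threshold-type response concrete, the sketch below computes exact Shapley values for a small toy function by enumerating all variable orderings against a single baseline observation. This is a simplified stand-in for illustration only: the study applies SHAP to tuned XGBoost models, whereas `toy_vitality`, its 0.6 threshold, and all input values here are hypothetical assumptions.

```python
from itertools import permutations

def toy_vitality(x):
    """Toy response: a capacity bonus appears only above a development threshold."""
    dev, price, dist = x
    capacity_bonus = 1.0 if dev >= 0.6 else 0.0  # hypothetical threshold
    return capacity_bonus + 0.5 * price - 0.3 * dist

def exact_shapley(model, x, baseline):
    """Average each feature's marginal contribution over all orderings."""
    n = len(x)
    phi = [0.0] * n
    orders = list(permutations(range(n)))
    for order in orders:
        current = list(baseline)
        prev = model(current)
        for j in order:
            current[j] = x[j]          # switch feature j from baseline to x
            new = model(current)
            phi[j] += new - prev
            prev = new
    return [p / len(orders) for p in phi]

baseline = (0.3, 0.2, 0.5)             # hypothetical reference unit
high_dev = (0.8, 0.4, 0.2)             # above the development threshold
low_dev = (0.5, 0.4, 0.2)              # below the development threshold

phi_high = exact_shapley(toy_vitality, high_dev, baseline)
phi_low = exact_shapley(toy_vitality, low_dev, baseline)
```

In this toy setting the development variable contributes nothing below the threshold and a full unit above it, while the contributions always sum to the difference between the prediction and the baseline prediction. That additivity is what allows SHAP values to be read as a decomposition of each prediction.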

Network distance to the nearest metro station shows a nonlinear distance-related response. Chome units with better metro-station proximity tend to have higher SHAP contributions, whereas the contribution declines in higher-distance ranges. Because the network-distance variable was processed during data preparation, the identified turning points should be interpreted as relative-change intervals rather than thresholds in actual meters.

Intersection density shows a non-monotonic response. Street-network research suggests that intersection density and street connectivity affect block-level activity, but their effects may depend on spatial scale and land-use structure [10]. The model contribution improves as intersection density increases to a moderate level, but it does not continue to rise in high-value intervals. This suggests that street-network planning should focus on pedestrian permeability, station access, and the continuity of commercial frontages rather than simply increasing the number of intersections.

Transfer-line count shows a moderate and generally positive SHAP contribution, indicating that broader rail-network connectivity is related to commercial–employment composite vitality. However, its contribution remains smaller than those of development intensity, land price, and metro-station proximity, so it should be interpreted as a complementary network-accessibility condition. Residential population density shows a weaker nonlinear contribution, suggesting that the local population base may support station-area commercial services but is less influential than structural, locational, and network-related variables in this mature rail-transit corridor.

Overall, the SHAP dependence plots demonstrate that the relationship between station-area environmental variables and commercial–employment composite vitality is not simply linear. The results should not be equated with strict causal effects, but they indicate that station-area vitality is associated with the combined effects of development capacity, market-location conditions, metro-station proximity, local street-network structure, population base, and broader rail-network connectivity.

4.5. Comparison of SHAP Importance Across Model A, Model B, and Model C

To further compare environmental-response differences among commercial-service agglomeration, employment intensity, and commercial–employment composite vitality, SHAP values were calculated for the tuned XGBoost models fitted separately for Model A, Model B, and Model C. This section focuses on explanatory comparison rather than on selecting the highest-accuracy predictive model. Using tuned XGBoost models allows variable contributions to be compared consistently across the three dependent variables under the same modeling framework. Table 7 and Figure 11 present the SHAP-based variable-importance comparison across the three models.

The results show that the development-intensity index ranks first in all three models, indicating that spatial development capacity is an important common correlate of commercial-service agglomeration, employment intensity, and commercial–employment composite vitality. However, the rankings of secondary variables differ by model. In Model A, network distance to the nearest metro station and intersection density rank second and third, respectively, indicating that commercial-service agglomeration is more sensitive to rail proximity and street-network conditions. This implies that commercial-service concentration depends strongly on how station entrances are connected to surrounding streets, blocks, and pedestrian routes. Related studies have also found that non-commuting destination choice and commercial-activity distribution are closely associated with station proximity, facility density, and the pedestrian environment [39,40]. In Model B, average officially assessed land price ranks second, and its SHAP value is much higher than those of network distance and intersection density. Employment concentration is therefore more closely related to core locational value, office-location advantages, development capacity, and market maturity than to pedestrian-scale street conditions alone.

The variable ranking in Model C lies between those of Model A and Model B. The development-intensity index and average officially assessed land price remain highly influential, while network distance to the nearest metro station and intersection density also retain importance. This indicates that commercial–employment composite vitality reflects the importance of development intensity and market value found in the employment model while retaining the effects of rail proximity and street-network conditions found in the commercial model. Commercial–employment composite vitality can therefore be understood as a static spatial representation of the joint agglomeration of commercial-service supply and employment carrying capacity, rather than as a substitute for a single commercial or employment indicator.

Further SHAP interaction-value results indicate that, in addition to strong main effects such as development intensity, land price, and network distance, the interactions between development intensity and population density, development intensity and land price, and development intensity and metro network distance also have relatively high strengths. This suggests that spatial differences in commercial–employment composite vitality may be related to the combined effects of spatial carrying capacity, market fundamentals, and rail proximity.

Based on the key variable combinations identified by SHAP interaction values, this study further plots two-dimensional partial-dependence relationships between the development-intensity index and average officially assessed land price, metro network distance, and intersection density, as well as between metro network distance and transfer-line count (Figure 12). The results show that higher predicted commercial–employment composite vitality mainly occurs in areas with both high development intensity and high land price. The combined response of development intensity and metro network distance indicates that high development intensity still needs to be matched with good rail proximity. The combined relationship between development intensity and intersection density suggests that street-network conditions further shape the vitality effect of development intensity. The interaction between metro network distance and transfer-line count indicates that interchange accessibility should be interpreted together with distance-based rail accessibility rather than in isolation. It should be noted that two-dimensional partial-dependence plots reflect average model-predicted response relationships and should not be interpreted as strict causal effects.
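The construction of a two-dimensional partial-dependence surface can be sketched as follows: two variables are fixed at each pair of grid values, the remaining variables keep their observed values, and the model predictions are averaged over all observations. The toy model, the three sample rows, and the grid below are hypothetical assumptions standing in for the tuned XGBoost model and the study's chome-level data.

```python
def partial_dependence_2d(model, rows, j, k, grid_j, grid_k):
    """Average model prediction with features j and k fixed at grid values."""
    surface = []
    for vj in grid_j:
        line = []
        for vk in grid_k:
            total = 0.0
            for row in rows:
                modified = list(row)
                modified[j] = vj       # override feature j for every observation
                modified[k] = vk       # override feature k for every observation
                total += model(modified)
            line.append(total / len(rows))
        surface.append(line)
    return surface

def toy_model(x):
    """Toy response with a development-by-land-price interaction (illustrative)."""
    dev, price, dist = x
    return dev * price + 0.2 * (1.0 - dist)

# Hypothetical rows: (development intensity, land price, network distance), scaled to [0, 1].
rows = [[0.2, 0.3, 0.2], [0.5, 0.6, 0.5], [0.9, 0.8, 0.8]]
grid = [0.0, 0.5, 1.0]
surface = partial_dependence_2d(toy_model, rows, 0, 1, grid, grid)
```

Because the toy model multiplies the two variables, the averaged prediction is high only where both are high, which mirrors the kind of joint pattern a two-dimensional plot is meant to reveal. As with any partial-dependence surface, the values are averages of model predictions and say nothing about causal effects.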

5. Discussion

5.1. Nonlinear Mechanisms and Chome-Scale Spatial Organization

The results indicate that commercial–employment composite vitality along the Osaka Midosuji Line is not the direct outcome of a single factor such as rail accessibility or population size. At the chome scale, vitality within an 800 m station catchment does not follow a simple station-centered gradient. High-value units are concentrated where development intensity, land value, metro-station proximity, interchange accessibility, and street-network conditions overlap. The 800 m catchment should therefore be understood as a set of differentiated neighborhood units rather than a homogeneous buffer around a station, consistent with recent explainable-machine-learning evidence that built-environment effects on urban vitality may exhibit nonlinear response patterns [41].

Development intensity has the strongest contribution, suggesting that sufficient spatial carrying capacity is a basic condition for the joint concentration of commercial services and employment. Land price reflects market-location advantages and commercial maturity, while network distance and transfer-line count show that both near-station accessibility and broader rail-network connectivity matter. However, these variables do not operate independently. The interaction results indicate that development capacity, land value, rail proximity, and interchange accessibility jointly shape the predicted level of composite vitality.

Street-network and population variables make secondary but still meaningful contributions. The non-monotonic response of intersection density suggests that good street connectivity can support station-area activity, but very high intersection density does not necessarily produce higher vitality. Residential population density mainly reflects the local demand base, but in this mature rail corridor its contribution is weaker than those of development capacity, market location, and rail-network accessibility. These findings support the use of chome-scale analysis to distinguish station catchments with continuous core-area agglomeration from those with scattered or discontinuous vitality patterns, echoing recent findings that the built environment’s influence on urban vitality can vary across space [42].

5.2. Interpretation Boundary and Different Spatial Logics of the Composite Indicator

The commercial–employment composite vitality indicator has a specific interpretation boundary. It identifies chome units where commercial-service provision and employment carrying capacity are jointly concentrated. In other words, it represents the facility-employment dimension of station-area vitality rather than urban vitality in a broad sense.

This indicator should not be interpreted as a direct measure of pedestrian vitality, consumption intensity, dwell time, street-level social interaction, or business operating performance. These behavioral and temporal dimensions require additional evidence from pedestrian-flow data, transaction data, mobile-phone signaling data, or street-view perception data. Therefore, the composite indicator should be understood as a static spatial representation of functional co-agglomeration in mature rail-station areas.

The split-model results further clarify the meaning of the composite indicator. Commercial-service agglomeration and employment intensity share a common foundation in development intensity, but they follow different spatial logics. Commercial-service agglomeration is more sensitive to rail-network proximity and street-network conditions. This suggests that commercial services depend strongly on how station entrances, pedestrian routes, street blocks, and commercial frontages are connected. In contrast, employment intensity is more closely associated with development intensity and officially assessed land price, indicating that employment concentration is more strongly linked to building capacity, office-location advantages, and market maturity.

The composite model lies between the commercial and employment models. It retains the importance of development intensity and land value found in the employment model, while also preserving the role of rail-network proximity and street connectivity found in the commercial model. Therefore, the value of the composite indicator is not to replace either the commercial-service indicator or the employment indicator. Rather, it identifies chome units where commercial services and employment are both concentrated. These units are places where good pedestrian access, sufficient development capacity, and strong market location jointly support commercial and employment concentration.

5.3. Planning Implications

The findings offer three implications for station-area renewal and functional optimization in mature rail-transit corridors, especially where trunk metro lines intersect with other rail corridors and central urban functions. They also suggest that planning strategies should be differentiated according to the spatial conditions of specific chome units rather than applied uniformly across the whole 800 m station catchment.

First, station-area renewal should shift from a single focus on near-station development to the coordinated improvement of rail proximity, interchange connectivity, and development capacity. Studies of station-area vitality suggest that station-area planning should formulate differentiated strategies by considering spatial morphology, functional distribution, and the pedestrian environment [43]. In areas with good rail accessibility but insufficient development intensity, rail or transfer accessibility alone may not generate high commercial–employment composite vitality. In planning practice, land-use adjustment, building-capacity optimization, and mixed-function allocation should be coordinated so that rail accessibility translates more effectively into activity-carrying capacity and the agglomeration of economic functions.

Second, differentiated renewal strategies should be adopted for different station types. For core stations with strong market foundations, such as Umeda, Shinsaibashi, and Namba, the priority should be to improve the coordination among commercial services, office employment, and public space and to avoid functional exclusion or spatial congestion caused by high land prices and intensive development. For peripheral stations or stations with weaker vitality, it is inappropriate to simply replicate the high-intensity commercial-development model of core areas; instead, priority should be given to improving development capacity, street connectivity, pedestrian accessibility, and the conditions for introducing new functions.

Third, near-station street networks and pedestrian organization remain important components of fine-grained station-area renewal. Commercial-service agglomeration is sensitive to network distance and intersection density, indicating that the ways in which station entrances are connected to surrounding streets, commercial frontages, and public spaces can influence the diffusion of pedestrian flows from stations to surrounding blocks. Planning practice should therefore emphasize direct pedestrian routes, block permeability, and continuity of commercial frontages to enhance the support that rail nodes provide for surrounding commercial services and employment activities.

Overall, station-area renewal should not only increase density or add commercial facilities. It should match development capacity, rail-network proximity, street connections, and market conditions to the needs of different chome units. The chome-scale analysis demonstrates that fine-grained identification of internal differences within station service areas can support more targeted station-area renewal and functional-allocation strategies.

The results further imply several types of planning responses at the chome scale. For chome units with high development intensity, high land value, and strong rail-network proximity, planning should focus less on additional intensification and more on improving public-space quality, pedestrian circulation, and the coordination between commercial and office functions. For units that are close to stations but have insufficient development intensity, land-use adjustment, building-capacity optimization, and mixed-function introduction may help transform accessibility advantages into functional concentration. For units with relatively high development intensity but weak street connectivity, renewal should prioritize pedestrian routes, block permeability, station-access links, and the continuity of commercial frontages. For peripheral or low-vitality station areas, gradual renewal, local service provision, and improved walking connections may be more suitable than copying the high-intensity development model of core stations.

6. Conclusions

This study examined 20 stations along the Osaka Metro Midosuji Line in Japan and used chome units within 800 m pedestrian catchments as the basic analytical units. It constructed three dependent variables—commercial-service agglomeration intensity, employment intensity, and commercial–employment composite vitality—and combined multi-model comparison with SHAP to analyze nonlinear relationships between station-area environmental variables and composite vitality. The composite vitality indicator represents the static co-agglomeration of commercial-service supply and employment-carrying capacity, rather than a comprehensive measure of all behavioral, temporal, or perceptual dimensions of urban vitality.

The results show that nonlinear models generally outperform linear models, indicating that the relationship between station-area environmental variables and commercial–employment composite vitality is not simply linear. The development-intensity index, average officially assessed land price, and network distance to the nearest metro station are the most influential variables, while intersection density and transfer-line count also play meaningful roles. SHAP results further reveal threshold, marginal, and non-monotonic responses. These findings indicate that commercial–employment composite vitality in mature rail-station areas is associated with the combined effects of development capacity, market fundamentals, distance-based rail accessibility, interchange accessibility, and street-network conditions.

The split-model results show that commercial-service agglomeration and employment intensity share a common foundation in development intensity but follow different spatial logics. Commercial-service agglomeration is more sensitive to rail proximity and street-network conditions, whereas employment intensity is more closely connected with land price and development intensity. The composite model lies between the two, indicating that commercial–employment composite vitality identifies chome units where commercial-service supply and employment carrying capacity co-agglomerate, while its interpretation remains bounded to the facility-employment dimension of vitality. Methodologically, the study demonstrates the value of combining chome-scale analysis with SHAP-based interpretable machine learning to reveal fine-grained and nonlinear station-area patterns.

This study has several limitations. First, the commercial–employment composite vitality indicator is mainly based on POI and employment data and therefore reflects a relatively static state of facility-employment agglomeration. It does not directly capture dynamic pedestrian flow, consumption behavior, dwell time, street-level social interaction, or business operating status. Second, this study uses the Osaka Midosuji Line as the case, and the findings require further validation in other lines and urban contexts. Cross-city comparisons and comparisons among different functional areas would help identify contextual differences in built-environment-vitality relationships [2,44]. Third, transfer-line count is a simplified measure of interchange accessibility and does not capture passenger volume, service frequency, transfer convenience, network centrality, or pedestrian connections inside interchange stations. Future research can combine mobile-phone signaling, rail ridership, service-frequency data, consumption data, and street-view perception data to further analyze dynamic changes and network mechanisms of station-area vitality. Existing studies suggest that multisource dynamic data can help reveal temporal heterogeneity and behavioral mechanisms of urban vitality [6,45].

Author Contributions

Y.L.: conceptualization, methodology, software, formal analysis, investigation, data curation, visualization, and writing—original draft; Z.W.: conceptualization, methodology, software, formal analysis, investigation, data curation, visualization, and writing—original draft; M.Y.: conceptualization, resources, supervision, project administration, funding acquisition, and writing—review and editing; Y.Z.: validation, investigation, and writing—review and editing; Q.Z.: validation, investigation, and writing—review and editing. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China [grant number 52278061].

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Acknowledgments

The authors would like to thank the public data providers, including e-Stat, the Ministry of Land, Infrastructure, Transport and Tourism of Japan, and the Geospatial Information Authority of Japan, for making relevant statistical, land-use, transport, and spatial data available. The authors also appreciate the support provided by Huaqiao University during the preparation of this study.

Conflicts of Interest

The authors declare no conflicts of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

Abbreviations

The following abbreviations are used in this manuscript:

TOD	transit-oriented development
POI	point of interest
OLS	ordinary least squares
BP	Back-Propagation
XGBoost	extreme gradient boosting
SHAP	Shapley additive explanations
MLIT	Ministry of Land, Infrastructure, Transport and Tourism
GSI	Geospatial Information Authority of Japan
RMSE	root mean square error
MAE	mean absolute error

Figure 1. Conceptual framework.

Figure 2. Osaka Metro network and the Midosuji Line.

Figure 3. Spatial distribution of commercial–employment composite vitality along the Midosuji Line.

Figure 4. Spatial comparison of commercial-service agglomeration, employment intensity, and commercial–employment composite vitality: (a) commercial-service agglomeration, (b) employment intensity, and (c) commercial–employment composite vitality.

Figure 5. Comparison of model performance.

Figure 6. Observed versus predicted values of commercial–employment composite vitality based on the XGBoost Tuned model. Each dot represents a chome sample, and the dotted line indicates the 1:1 reference line.

Figure 7. Observed versus predicted values for Model A and Model B: (a) Model A; (b) Model B. Each dot represents a chome sample, and the dotted line indicates the 1:1 reference line.

Figure 8. SHAP-based importance of station-area environmental variables.

Figure 9. SHAP summary plot of environmental variables.

Figure 10. Nonlinear SHAP dependence plots of key environmental variables.

Figure 11. Comparison of SHAP-based variable importance across Model A, Model B, and Model C.

Figure 12. Two-dimensional partial dependence of key interaction effects in Model C.

Table 1. Classification details of commercial service facilities.

Type	Examples
Retail	Supermarkets, shopping malls, and comprehensive stores
Catering	Restaurants, cafes, bakeries, dessert shops, and snack bars
Leisure and entertainment	Fitness centers, cinemas, bars, tea rooms, and amusement facilities
Life services	Beauty salons, medical services, training services, repair services, and cleaning services
Financial services	Banks and securities institutions
Tourist accommodation	Hotels, inns, and hostels

Table 2. Definitions of dependent variables and model settings.

Model	Variable	Name	Meaning	Equation
Model A	Y1	Commercial-service agglomeration intensity	Spatial concentration of commercial-service facilities within the analytical unit	$Y_{1} = Z\left[\ln\left(1 + \mathrm{Agg}_{i}^{\mathrm{csf}}\right)\right]$
Model B	Y2	Employment intensity	Agglomeration level of employment activities within the analytical unit	$Y_{2} = Z\left[\ln\left(1 + \frac{\mathrm{Emp}_{i}}{A_{i}}\right)\right]$
Model C	Y	Commercial–employment composite vitality	Static spatial state in which commercial-service provision and employment carrying capacity are jointly concentrated	$Y = \frac{1}{2} Y_{1} + \frac{1}{2} Y_{2}$
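
The definitions in Table 2 can be illustrated with a minimal sketch. It assumes that Z[·] denotes z-score standardization across all chome units, that the population standard deviation is used, and that Emp_i is an employment count; none of these details is confirmed by the table itself. The function names and the input values are hypothetical.

```python
import math
from statistics import fmean, pstdev


def zscore(values):
    """Standardize values to zero mean and unit standard deviation."""
    mean = fmean(values)
    std = pstdev(values)  # assumption: population SD; the table does not specify
    return [(v - mean) / std for v in values]


def composite_vitality(agg_csf, employment, area):
    """Return (Y1, Y2, Y) for lists of chome-level inputs.

    agg_csf    -- commercial-service agglomeration measure Agg_i^csf
    employment -- employment count Emp_i (assumed interpretation)
    area       -- chome area A_i, in the same unit for every chome
    """
    y1 = zscore([math.log1p(a) for a in agg_csf])
    y2 = zscore([math.log1p(e / a) for e, a in zip(employment, area)])
    # Equal weights of 1/2, as in the Model C equation.
    y = [0.5 * a + 0.5 * b for a, b in zip(y1, y2)]
    return y1, y2, y


# Hypothetical values for four chome units (not data from the study).
y1, y2, y = composite_vitality(
    agg_csf=[12.0, 85.0, 40.0, 3.0],
    employment=[900.0, 15000.0, 4200.0, 150.0],
    area=[0.08, 0.12, 0.10, 0.09],
)
```

Because both components are standardized before averaging, neither the commercial nor the employment dimension dominates the composite simply because of its measurement scale.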

Table 3. Definitions of independent variables.

Code	Variable	Meaning	Equation	Data Source
X1	Average officially assessed land price	Land value, commercial maturity, and market attractiveness	$X_{1} = \frac{\sum_{j} (P_{j} \times A_{j})}{\sum_{j} A_{j}}$	MLIT Real Estate Information Library
X2	Residential population density	Concentration of residential population	$X_{2} = \frac{\sum_{j} \left(\mathrm{Pop}_{j} \times \frac{A_{j}}{G_{j}}\right)}{A}$	e-Stat Population Census; jSTAT MAP
X3	Proportion of older adults	Degree of population ageing	$X_{3} = \frac{\mathrm{Old}}{\mathrm{Pop}}$	e-Stat Population Census
X4	Development-intensity index	Institutional development potential and commercial orientation	$X_{4} = \mathrm{PC}_{1}(D_{i})$ *	National Land Numerical Information: land-use zoning
X5	Land-use mix	Land-use diversity and richness of the commercial environment	$X_{5} = -\sum_{k=1}^{K} p_{k} \ln(p_{k})$	National Land Numerical Information: land-use zoning
X6	Intersection density	Street granularity and pedestrian permeability	$X_{6} = \ln\left(1 + \frac{\mathrm{Node}}{A}\right)$	National Land Numerical Information: road data
X7	Pedestrian friendliness	Proportion of pedestrian-oriented road systems	$X_{7} = \frac{L_{\mathrm{pedestrian}}}{L_{\mathrm{road}}}$	National Land Numerical Information: road data; GSI base-map information
X8	Bus-stop density	Convenience of surface public transport	$X_{8} = \ln\left(1 + \frac{\mathrm{Bus}}{A}\right)$	National Land Numerical Information: bus-stop data
X9	Network distance to the nearest metro station	Distance-based rail accessibility; smaller values indicate better metro-station proximity	$X_{9} = d_{s}^{\mathrm{network}}$	National Land Numerical Information: railway and road data
X10	Transfer-line count	Maximum number of intersecting or transferable rail lines within overlapping station catchments	$X_{10,i} = \max_{s \in S_{i}} (T_{s})$	Official railway-operator station information; railway network data

* Note: D_i denotes the development intensity indicator set for chome i, consisting of the standardized statutory floor area ratio FAR_i, building coverage ratio BCR_i, proportion of high-FAR land HighFAR_i, and proportion of commercial land ComLand_i. PC₁(·) denotes the first principal component score extracted from the indicator set through principal component analysis.
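
Several of the formulas in Table 3 can be sketched in a few lines. The example below is illustrative only: the function names are hypothetical, the inputs are invented, and the principal-component step uses a simple power iteration as a stand-in for whatever PCA routine the study actually used. The sign of a principal component is arbitrary, so the scores may need to be flipped so that larger values indicate higher development intensity.

```python
import math


def area_weighted_price(prices, areas):
    """X1: area-weighted mean of assessed land prices P_j with weights A_j."""
    return sum(p * a for p, a in zip(prices, areas)) / sum(areas)


def land_use_mix(shares):
    """X5: Shannon entropy of land-use shares p_k (zero shares contribute 0)."""
    return -sum(p * math.log(p) for p in shares if p > 0)


def log_density(count, area):
    """X6 and X8: ln(1 + count / area) for intersections or bus stops."""
    return math.log1p(count / area)


def first_pc_scores(rows, iterations=500):
    """X4: scores on the first principal component of the indicator set D_i.

    rows holds one list of standardized indicators per chome. Power iteration
    on the covariance matrix is an assumption made for this sketch; it needs a
    start vector that is not orthogonal to the leading component.
    """
    n, m = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(m)]
    centered = [[r[j] - means[j] for j in range(m)] for r in rows]
    cov = [[sum(c[a] * c[b] for c in centered) / n for b in range(m)]
           for a in range(m)]
    vec = [1.0 / (j + 1) for j in range(m)]  # asymmetric start vector
    for _ in range(iterations):
        nxt = [sum(cov[a][b] * vec[b] for b in range(m)) for a in range(m)]
        norm = math.sqrt(sum(x * x for x in nxt))
        vec = [x / norm for x in nxt]
    return [sum(c[j] * vec[j] for j in range(m)) for c in centered]


# Hypothetical inputs, not data from the study.
x1 = area_weighted_price(prices=[100.0, 200.0], areas=[1.0, 3.0])
x5 = land_use_mix([0.5, 0.5])
x6 = log_density(count=10, area=2.0)
x4 = first_pc_scores([[-1.0, -1.0], [0.0, 0.0], [1.0, 1.0]])
```

With these inputs the area-weighted price is 175.0, the entropy of two equal shares is ln 2, and the log-density is ln 6.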

Table 4. Test-set performance of models for commercial–employment composite vitality.

Model	R²	RMSE	MAE
OLS	0.637	0.528	0.400
Lasso	0.662	0.510	0.391
Random Forest	0.764	0.426	0.310
BP Neural Network	0.791	0.401	0.308
XGBoost Baseline	0.761	0.429	0.318
XGBoost Tuned	0.763	0.427	0.312
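
For reference, the three metrics reported in Table 4 follow their standard definitions, sketched below. The function name and the toy values are hypothetical and are not taken from the study.

```python
import math


def regression_metrics(observed, predicted):
    """Return (R2, RMSE, MAE) for paired observed and predicted values."""
    n = len(observed)
    residuals = [o - p for o, p in zip(observed, predicted)]
    mean_obs = sum(observed) / n
    ss_res = sum(r * r for r in residuals)               # residual sum of squares
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)  # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    rmse = math.sqrt(ss_res / n)
    mae = sum(abs(r) for r in residuals) / n
    return r2, rmse, mae


# Toy example: only the last prediction is off, by one unit.
r2, rmse, mae = regression_metrics([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 5.0])
```

In this toy case R² is 0.8, RMSE is 0.5, and MAE is 0.25. RMSE exceeds MAE because squaring penalizes the single large error more heavily, which is also why the two metrics can rank models differently.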

Table 5. Five-fold cross-validation results for commercial–employment composite vitality models.

Model	R²_Mean	R²_Std	RMSE_Mean	RMSE_Std	MAE_Mean	MAE_Std
OLS	0.635	0.054	0.540	0.046	0.401	0.035
Lasso	0.638	0.054	0.539	0.054	0.399	0.035
Random Forest	0.722	0.074	0.469	0.060	0.334	0.031
BP Neural Network	0.735	0.066	0.457	0.049	0.342	0.029
XGBoost Baseline	0.735	0.072	0.457	0.046	0.332	0.024
XGBoost Tuned	0.731	0.073	0.461	0.051	0.333	0.023
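
The mean and standard deviation columns in Table 5 summarize a metric across the five held-out folds. The following sketch shows that procedure under stated assumptions: the one-variable least-squares line is a hypothetical stand-in for the study's models, the shuffling seed is arbitrary, and the population standard deviation is assumed because the table does not state which form was used.

```python
import math
import random
from statistics import fmean, pstdev


def kfold_indices(n_samples, k=5, seed=0):
    """Shuffle sample indices and split them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def fit_line(xs, ys):
    """Hypothetical stand-in model: one-variable least-squares line."""
    mx, my = fmean(xs), fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = my - slope * mx
    return lambda x: slope * x + intercept


def cross_validate_rmse(xs, ys, k=5, seed=0):
    """Return (mean, std) of the held-out RMSE across k folds."""
    scores = []
    for fold in kfold_indices(len(xs), k, seed):
        held_out = set(fold)
        train = [i for i in range(len(xs)) if i not in held_out]
        model = fit_line([xs[i] for i in train], [ys[i] for i in train])
        squared = [(ys[i] - model(xs[i])) ** 2 for i in fold]
        scores.append(math.sqrt(fmean(squared)))
    return fmean(scores), pstdev(scores)


# Synthetic, exactly linear data: every fold should give an RMSE near zero.
xs = [float(i) for i in range(20)]
ys = [2.0 * x + 1.0 for x in xs]
rmse_mean, rmse_std = cross_validate_rmse(xs, ys)
```

A small standard deviation across folds indicates that performance does not depend heavily on which chome units happen to be held out.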

Table 6. Test-set performance comparison of Model A, Model B, and Model C.

Dependent Variable	Model	R²	RMSE	MAE
Y1	OLS	0.315	0.807	0.534
Y1	Lasso	0.351	0.785	0.517
Y1	Random Forest	0.694	0.540	0.376
Y1	BP Neural Network	0.759	0.479	0.345
Y1	XGBoost Baseline	0.688	0.544	0.375
Y1	XGBoost Tuned	0.704	0.531	0.364
Y2	OLS	0.729	0.501	0.394
Y2	Lasso	0.732	0.499	0.386
Y2	Random Forest	0.786	0.446	0.333
Y2	BP Neural Network	0.764	0.468	0.358
Y2	XGBoost Baseline	0.766	0.466	0.339
Y2	XGBoost Tuned	0.758	0.473	0.335
Y	XGBoost Tuned	0.763	0.427	0.312

Table 7. SHAP-based variable-importance comparison across Model A, Model B, and Model C.

Variable	Model A	Model B	Model C
X4 Development-intensity index	0.302, rank 1	0.466, rank 1	0.400, rank 1
X1 Average officially assessed land price	0.103, rank 4	0.268, rank 2	0.179, rank 2
X9 Network distance to the nearest metro station	0.286, rank 2	0.079, rank 3	0.169, rank 3
X6 Intersection density	0.199, rank 3	0.059	0.100, rank 4
X10 Transfer-line count	0.084, rank 5	0.060	0.075, rank 5
X2 Residential population density	0.072	0.075, rank 4	0.075
X5 Land-use mix	0.055	0.065, rank 5	0.058
X7 Pedestrian friendliness	0.044	0.051	0.041
X8 Bus-stop density	0.033	0.042	0.039
X3 Proportion of older adults	0.033	0.048	0.034
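
The rankings in Table 7 can be reproduced from the importance values with a short sketch. It assumes that each importance value is the mean absolute SHAP value of a variable, which is the conventional global summary but is not stated in the table itself. The function names and the small SHAP matrix are hypothetical; the Model A values are copied from Table 7.

```python
def mean_abs_shap(shap_rows, feature_names):
    """Global importance: mean absolute SHAP value per feature."""
    n = len(shap_rows)
    return {
        name: sum(abs(row[j]) for row in shap_rows) / n
        for j, name in enumerate(feature_names)
    }


def top_ranked(importance, top=5):
    """Return feature names in descending order of importance, truncated."""
    ordered = sorted(importance, key=importance.get, reverse=True)
    return ordered[:top]


# Model A column of Table 7.
model_a = {
    "X4": 0.302, "X1": 0.103, "X9": 0.286, "X6": 0.199, "X10": 0.084,
    "X2": 0.072, "X5": 0.055, "X7": 0.044, "X8": 0.033, "X3": 0.033,
}
model_a_top5 = top_ranked(model_a)

# Hypothetical SHAP matrix (two samples, two features), not study output.
toy_importance = mean_abs_shap([[0.2, -0.1], [-0.4, 0.3]], ["X4", "X9"])
```

For Model A the resulting order is X4, X9, X6, X1, X10, which matches ranks 1 to 5 in the table.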

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
