An Explainable Machine-Learning Framework Based on XGBoost–SHAP and Big Data for Revealing the Socioeconomic Drivers of Population Urbanization in China

Shangguan, Ziheng

doi:10.3390/systems13080679


by Ziheng Shangguan ¹,²,³

¹ School of Public Administration, Guizhou University of Finance and Economics, Guiyang 550025, China

² School of Geography, Earth and Atmospheric Sciences, The University of Melbourne, Melbourne, VIC 3010, Australia

³ Asia Institute, The University of Melbourne, Melbourne, VIC 3010, Australia

Systems 2025, 13(8), 679; https://doi.org/10.3390/systems13080679

Submission received: 14 July 2025 / Revised: 4 August 2025 / Accepted: 7 August 2025 / Published: 9 August 2025

(This article belongs to the Section Systems Practice in Social Science)


Abstract

The global acceleration of population urbanization has transformed cities into primary spatial hubs of human activity. As urban populations continue to expand, identifying the socioeconomic drivers of urbanization and elucidating their underlying mechanisms are essential for achieving Sustainable Development Goal 11, established by the United Nations. This study leverages machine learning and big data to investigate the determinants of population urbanization in China over the period 1991–2023. Utilizing the XGBoost algorithm combined with SHAP (Shapley Additive Explanations), the analysis reveals a tripartite structure of key drivers encompassing industrial support, employment orientation, and infrastructure accessibility. Regional assessments indicate distinct urbanization patterns: Eastern coastal areas are predominantly driven by finance and service industries; central inland regions follow an investment-led trajectory anchored in infrastructure development and real estate expansion; and the western interior relies mainly on employment-centered strategies. Partial Dependence Plots (PDPs) highlight spatial variations in the effects of sensitive factors, and interaction analyses reveal synergistic effects between tertiary sector shares and the working-age share in the eastern coastland, structural amplification by real estate investment with appropriate working-age population shares in the central inland, and balancing interactions between GDP growth rates and tertiary sector shares in the western interior. These findings contribute to a more nuanced understanding of the socioeconomic forces shaping urbanization and offer evidence-based recommendations for policymakers in other developing countries seeking to foster sustainable urban growth.

Keywords: population urbanization; socioeconomic drivers; machine learning; big data; China

1. Introduction

In recent years, the global pace of population urbanization has accelerated significantly, positioning cities as the primary spatial platforms for human activity. This sweeping transformation not only reshapes patterns of human settlement but also exerts profound impacts on economic structures, social systems, and environmental conditions [1,2]. As urban populations continue to surge, identifying the socioeconomic drivers behind urbanization becomes increasingly critical. Such insights are foundational to achieving the United Nations Sustainable Development Goals (SDGs), particularly Goal 11 (Sustainable Cities and Communities), Goal 1 (No Poverty), and Goal 9 (Industry, Innovation, and Infrastructure) [3,4]. Understanding these determinants enables the development of evidence-based urban policies and planning strategies that foster inclusive growth and environmental sustainability.

Recent research has demonstrated that population urbanization is shaped by a complex interplay of socioeconomic factors. These include, but are not limited to, the level of economic development, the distribution of educational resources, accessibility to healthcare and public services, the quality of transportation and infrastructure, as well as the population size and labor force structure. Each of these variables exerts a distinct influence on urbanization patterns, often interacting in nonlinear and regionally heterogeneous ways [5]. For instance, economic growth stimulates urban expansion by generating employment and attracting investment, but its effects may be uneven across regions depending on industrial structure and capital flows [6]. Similarly, access to high-quality education and healthcare tends to concentrate populations in urban cores, yet disparities in service provision can exacerbate social exclusion within cities [7]. Moreover, urban infrastructure and transport systems are critical in enhancing mobility and enabling peri-urban integration, but their development often lags behind population growth, especially in developing contexts [8]. Due to these intertwined and region-specific dynamics, a one-size-fits-all approach to urban policy is insufficient. A data-driven, context-sensitive analytical framework is urgently needed to disentangle causal relationships and better support sustainable urbanization strategies [9].

As the world’s largest developing country, China has undergone a rapid and expansive process of population urbanization in recent decades. Since the year 2000, China’s urbanization rate has surged from approximately 36% to over 65% by 2022, accompanied by profound shifts in demographic structure, resource allocation, and land-use patterns [10]. This transformation reflects not only the scale and speed of urban development but also the evolving role of socioeconomic drivers in shaping urban growth across diverse regions [11]. A deeper understanding of the socioeconomic mechanisms behind China’s urbanization is critical for informing more effective, equitable, and sustainable national development strategies. Moreover, China’s urbanization trajectory offers valuable lessons for other developing countries grappling with similar challenges, including rural–urban disparities, infrastructural constraints, and labor mobility. Insights derived from the Chinese context can help to advance global urbanization theory by integrating large-scale empirical evidence with development-oriented policy frameworks [12].

It is important to note that China’s population urbanization exhibits pronounced regional disparities. In the eastern region, the urbanization process has advanced rapidly, primarily due to strong economic foundations, dense industrial clusters, and well-developed infrastructure. In contrast, central and western regions face constraints stemming from differences in development stages, resource endowments, and levels of policy support, which often lead to distinct urbanization trajectories and divergent underlying mechanisms [13]. These spatial imbalances underscore the need for disaggregated and mechanism-specific analyses of urbanization drivers. Understanding region-specific determinants is crucial not only for tailoring effective policy interventions but also for enhancing the explanatory power of urbanization theory in large, heterogeneous economies, like that of China [14]. Such a differentiated research approach holds substantial theoretical value and practical relevance for promoting inclusive and coordinated urban development.

Against this backdrop, this study aims to introduce an empirically grounded and methodologically innovative framework for analyzing urbanization by integrating machine-learning techniques with interpretable modeling tools. Focusing on China’s 31 province-level administrative regions, we utilize a combination of official statistical yearbook data and large-scale socioeconomic datasets to explore the key drivers of population urbanization. Specifically, we apply the XGBoost algorithm to model urbanization levels and assess feature importance, supplemented by SHAP (Shapley Additive Explanations) and Partial Dependence Plot (PDP) techniques to enhance interpretability. This integrated approach enables both a global and a local understanding of feature effects and interaction mechanisms. Through this comprehensive analytical framework, the study seeks to offer both theoretical insights and policy-oriented evidence for promoting differentiated urbanization strategies. Additionally, it aims to serve as a methodological reference for future data-driven research on urban development. The objectives of this study are fourfold:

(1): Identify the critical socioeconomic variables influencing population urbanization;
(2): Reveal the primary driving factors across China’s eastern, central, and western regions, highlighting regional heterogeneity;
(3): Analyze the nonlinear and time-varying relationships between key sensitive variables and urbanization levels;
(4): Investigate how interactions among these sensitive factors shape population urbanization.

2. Literature Review and Indicator System Development

Population urbanization is a nonlinear process driven by the complex interaction of multiple socioeconomic factors. Economic development, education, healthcare, infrastructure, and demographic structure are intricately interlinked, and their respective influences vary significantly across regions. Economic development serves as one of the most fundamental drivers of population urbanization. According to classical urban economic theory, economic agglomeration in urban areas leads to increased productivity and employment opportunities, particularly in the manufacturing and service sectors [15]. Higher GDP per capita and greater industrial diversification attract rural labor seeking better income prospects and improved living standards [16]. Empirical studies in countries like Poland, Spain, and Ukraine confirm a statistically significant positive correlation between regional economic performance and urban population growth, reinforcing the idea that economic prosperity enhances urban appeal through job creation and infrastructure investment [17]. However, this process is not uniform across all regions. In developing countries with uneven economic development, such as China, urbanization often concentrates disproportionately in coastal megacities, while inland regions experience labor outflows and relative stagnation [18]. Thus, while economic development generally promotes urbanization, its spillover effects remain spatially limited in the absence of coordinated policy interventions.

The spatial allocation of educational resources plays a pivotal role in shaping urbanization patterns, particularly through its influence on rural-to-urban migration. Urban areas typically offer superior access to high-quality educational institutions, such as universities, technical colleges, and vocational training centers, which serve as powerful magnets for young and mobile populations [19]. The pursuit of better educational opportunities not only drives individual migration decisions but also enhances the long-term human capital stock in cities, reinforcing cycles of innovation, labor market upgrading, and economic growth [20]. A regional study in the southeastern United States found that educational access is one of the most decisive factors shaping migration behavior, especially among minority groups and younger populations [9]. Similarly, empirical evidence from China indicates that the spatial concentration of human capital in cities is both a driver and a consequence of rural-to-urban migration, with higher educational returns observed in urban centers [21]. However, in developing economies, education-driven migration can exacerbate regional inequality. As educated individuals leave rural areas for cities, the “brain drain” reduces the availability of skilled labor in less-developed regions, limiting their potential for endogenous development [22].

Access to healthcare and public services is a major determinant of urban migration, especially in contexts where rural service provision is inadequate or deteriorating [23]. Urban areas generally possess more hospitals, specialized clinics, and robust public health infrastructures, which offer not only improved medical care but also lower morbidity and mortality rates, thereby enhancing urban appeal to prospective migrants [24]. The sustainability of urban health systems has become central to urban planning, with increasing emphasis on environmental health, equitable access, and preventive care [25]. However, these benefits are not evenly distributed. Many urban poor face systemic exclusion due to institutional, financial, or social barriers, despite being physically close to health services. Research from China indicates that migrants often face restricted access to care due to institutional arrangements, like the hukou (household registration) system, and the absence of targeted welfare policies [26]. This phenomenon, sometimes termed the “spatial–functional paradox”, represents a major challenge for inclusive urban development. Addressing it requires not only infrastructure investments but also institutional reform to ensure access equity and service affordability [27].

Transportation networks and infrastructure quality significantly shape the spatial and functional contours of urbanization. Efficient public transit, road systems, utility services, and digital infrastructure reduce mobility frictions, enhance regional connectivity, and support higher urban densities [28]. These improvements expand labor markets, diversify housing options, and facilitate the economic integration of peripheral zones into metropolitan cores [29]. Empirical evidence from OECD countries suggests that integrated infrastructure systems are closely associated with reduced economic inequality and increased productivity across large urban agglomerations [8]. In China, the pattern holds as well—regions with better transportation access experience more rapid urbanization, whereas areas with deficient infrastructure face slower urban transitions and diminished migration attractiveness [30]. Notably, infrastructure investment has demonstrated spillover effects, contributing to both regional economic convergence and urban system cohesion. However, disparities in access to such infrastructure remain a central spatial determinant of China’s urban hierarchy, leading to distinct urban clusters and persistent regional inequalities [31].

Demographic factors, including the total population size and the age structure of the labor force, critically influence the scale and sustainability of urbanization. Regions with a high proportion of working-age residents and elevated labor force participation rates tend to urbanize more rapidly, driven by both the labor supply and increased consumer demand [32]. The concept of the “demographic dividend” captures the growth potential that arises when a large, youthful workforce is matched with employment and education opportunities [33]. In China, demographic transformations—including the shrinking labor force, population aging, and sustained rural-to-urban migration—have become defining forces shaping urban expansion and economic development trajectories [34]. For example, empirical research on Chongqing has shown how shifts in labor availability and aging populations directly impact urban land use and infrastructure demand [35]. However, if not properly managed, demographic shifts can become liabilities. An oversupply of low-skilled labor or a rapidly aging population may constrain innovation, increase healthcare and pension burdens, and erode long-term urban productivity [36]. Thus, aligning urbanization policies with labor market dynamics and the age structure is essential for achieving resilient and equitable urban growth [37].

To analyze the relative importance, nonlinear effects, and interaction mechanisms of key drivers of population urbanization in contemporary China, this study constructs a comprehensive indicator system of socioeconomic variables, as presented in Table 1. The selected indicators aim to capture multiple dimensions of urbanization dynamics, including economic performance, demographic characteristics, infrastructure, education, healthcare access, and public service provision. This multidimensional framework provides the empirical foundation for subsequent machine-learning modeling and interpretation.

3. Materials and Methods

3.1. Overview of the XGBoost Regression Model

To identify the key socioeconomic determinants influencing the level of population urbanization, this study employs the eXtreme Gradient Boosting (XGBoost) regression model. XGBoost is an ensemble learning algorithm, based on Classification and Regression Trees (CARTs), renowned for its high computational efficiency, strong predictive accuracy, and ability to handle missing data automatically. These features make it particularly well suited for complex regression problems involving nonlinear relationships among multiple variables [38]. The optimization is guided by a regularized objective function that balances the model’s accuracy and complexity, which can be formally expressed as follows:

$$L(\phi) = \sum_{i=1}^{n} l(Y_i, \hat{Y}_i) + \sum_{k=1}^{K} \Omega(f_k) \tag{1}$$

In the objective function, $l(Y_i, \hat{Y}_i)$ denotes the loss function, which measures the discrepancy between the actual and predicted values. The regularization term is defined as follows:

$$\Omega(f_k) = \gamma T + \frac{1}{2} \lambda \lVert \omega \rVert^{2} \tag{2}$$

where $T$ represents the number of leaf nodes in each regression tree, and $\omega$ denotes the vector of leaf weights. This term is designed to penalize model complexity and mitigate overfitting. The overall number of trees in the ensemble is denoted by $K$, and each individual tree is constructed to minimize the residual errors of the previous iteration. Specifically, in each boosting round, the $t$th tree is trained to approximate the negative gradient of the loss function with respect to the predicted values from the prior round as follows:

$$g_i = \frac{\partial l(Y_i, \hat{Y}_i^{(t-1)})}{\partial \hat{Y}_i^{(t-1)}}, \qquad h_i = \frac{\partial^{2} l(Y_i, \hat{Y}_i^{(t-1)})}{\partial (\hat{Y}_i^{(t-1)})^{2}} \tag{3}$$
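As a minimal sketch of Equations (2) and (3), the snippet below evaluates the gradient statistics and the tree penalty for a toy case. It assumes a squared-error loss $l = \frac{1}{2}(Y_i - \hat{Y}_i)^2$, which the text does not specify; the function names and all numbers are hypothetical and serve only to make the symbols concrete.

```python
def squared_error_grad_hess(y_true, y_pred):
    """Per-sample g_i and h_i for the ASSUMED loss l = 0.5 * (y - y_hat)**2."""
    grads = [pred - y for y, pred in zip(y_true, y_pred)]  # first derivative
    hess = [1.0] * len(y_true)                              # second derivative
    return grads, hess


def tree_penalty(leaf_weights, gamma, lam):
    """Omega(f) = gamma * T + 0.5 * lam * ||w||^2 for a single tree."""
    n_leaves = len(leaf_weights)
    return gamma * n_leaves + 0.5 * lam * sum(w * w for w in leaf_weights)


# Toy urbanization rates (illustrative values, not the paper's data).
y_true = [0.50, 0.60]
y_prev = [0.40, 0.70]  # predictions after boosting round t - 1
g, h = squared_error_grad_hess(y_true, y_prev)
penalty = tree_penalty([1.0, -2.0], gamma=0.5, lam=2.0)
```

Under this loss the gradient is simply the previous round's residual with its sign flipped, and the second derivative is constant, which is why each new tree can be read as fitting the remaining errors.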

New leaf node splits are determined by maximizing the gain function, which quantifies the improvement in the objective function after a split. The gain is calculated as follows:

$$\mathrm{Gain} = \frac{1}{2} \left[ \frac{\left(\sum_{i \in I_L} g_i\right)^{2}}{\sum_{i \in I_L} h_i + \lambda} + \frac{\left(\sum_{i \in I_R} g_i\right)^{2}}{\sum_{i \in I_R} h_i + \lambda} - \frac{\left(\sum_{i \in I} g_i\right)^{2}}{\sum_{i \in I} h_i + \lambda} \right] - \gamma \tag{4}$$

where $I_L$ and $I_R$ denote the instance sets assigned to the left and right child nodes, respectively, while $I$ is the original set before the split; $g_i$ and $h_i$ are the first- and second-order gradients of the loss function with respect to the model output, $\lambda$ is the regularization parameter controlling the leaf weight’s complexity, and $\gamma$ is the minimum loss reduction required to make a further partition.
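The gain in Equation (4) can be evaluated directly from the per-sample gradient statistics. The sketch below is a plain-Python illustration with made-up inputs; `split_gain` is a hypothetical helper, not part of the XGBoost library.

```python
def split_gain(g_left, h_left, g_right, h_right, lam, gamma):
    """Gain of a candidate split, following Equation (4)."""

    def score(g, h):
        # Structure score of one node: (sum g)^2 / (sum h + lambda)
        return sum(g) ** 2 / (sum(h) + lam)

    # The parent node I is the union of the two children I_L and I_R.
    parent = score(g_left + g_right, h_left + h_right)
    children = score(g_left, h_left) + score(g_right, h_right)
    return 0.5 * (children - parent) - gamma


# Toy gradients: the split separates negative from positive residuals.
g_l, h_l = [-1.0, -1.0], [1.0, 1.0]
g_r, h_r = [1.0, 1.0], [1.0, 1.0]

gain_free = split_gain(g_l, h_l, g_r, h_r, lam=1.0, gamma=0.0)
gain_penalized = split_gain(g_l, h_l, g_r, h_r, lam=1.0, gamma=2.0)
```

With $\gamma = 0$ the split is worthwhile (gain of 4/3), whereas with $\gamma = 2$ the same split has negative gain and would be rejected, which is how $\gamma$ acts as a minimum loss reduction threshold.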

3.2. SHAP Value Method

The gain metric is used to rank the contributions of input features based on their improvements to the model’s objective function during splits. This allows for the identification of the most influential variables in predicting population urbanization levels. To enhance interpretability, we further employ the SHAP value method, which provides a consistent and theoretically grounded approach for explaining individual predictions. Originating from cooperative game theory, SHAP values are derived from the Shapley value theory, which attributes the prediction output to each feature by computing its marginal contribution across all the possible feature combinations. The SHAP value for a feature $i$ is defined as follows:

$$\phi_i = \sum_{S \subseteq F \setminus \{i\}} \frac{|S|! \, (|F| - |S| - 1)!}{|F|!} \left[ f(S \cup \{i\}) - f(S) \right] \tag{5}$$

where $F$ is the set of all the input features, $S$ is a subset of $F$ not containing feature $i$, and $f(S)$ is the model prediction using the subset $S$. This formula evaluates the average marginal contribution of feature $i$ over all the possible feature coalitions, thereby offering a robust explanation of feature influence at both global and local levels.
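To make Equation (5) concrete, the following sketch enumerates every coalition for a three-feature toy game. It is a brute-force illustration only: the coalition value function `toy_value` and its numbers are invented, and practical SHAP implementations for tree ensembles rely on much faster tree-specific algorithms rather than full enumeration.

```python
from itertools import combinations
from math import factorial


def shapley_values(value_fn, features):
    """Exact Shapley values by enumerating all coalitions S of F without i."""
    n = len(features)
    phi = {}
    for i in features:
        others = [f for f in features if f != i]
        total = 0.0
        for size in range(len(others) + 1):
            # Weight |S|! (|F| - |S| - 1)! / |F|! from Equation (5)
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                total += weight * (value_fn(s | {i}) - value_fn(s))
        phi[i] = total
    return phi


def toy_value(coalition):
    """Hypothetical f(S): two additive effects plus one pairwise synergy."""
    value = 0.0
    if "TSS" in coalition:
        value += 0.20
    if "WAS" in coalition:
        value += 0.10
    if "TSS" in coalition and "WAS" in coalition:
        value += 0.05  # synergy, split equally between the two features
    return value


features = ["TSS", "WAS", "REI"]
phi = shapley_values(toy_value, features)
```

In this toy game the synergy term is shared equally, so TSS receives 0.225, WAS receives 0.125, and REI, which never changes the output, receives zero. The attributions sum to the difference between the full-coalition value and the empty-coalition value.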

3.3. Partial Dependence Plots

To further investigate the marginal effects between key features and the target variable, this study employs partial dependence plots (PDPs). PDPs illustrate the average effect of a single input variable on the model’s predictions, obtained by averaging over the observed values of all the other variables. This technique is particularly suitable for interpreting complex machine-learning models that capture nonlinear relationships and feature interactions. The univariate partial dependence function is defined as follows:

$$\hat{f}_{PDP}(x_S) = \frac{1}{n} \sum_{i=1}^{n} \hat{f}(x_S, x_C^{(i)}) \tag{6}$$

where $x_S$ is the feature of interest, $x_C^{(i)}$ denotes the remaining features for the $i$th observation, $\hat{f}(\cdot)$ is the trained XGBoost model, and $n$ is the total number of samples. This formulation computes the average model prediction across all the observations for fixed values of the focal feature, thus revealing its marginal contribution to the prediction. To explore potential interaction effects between features, the study further utilizes two-way partial dependence plots, which visualize the joint marginal effects of two variables on the predicted outcome. The bivariate PDP is given by

$$\hat{f}_{PDP}(x_{S1}, x_{S2}) = \frac{1}{n} \sum_{i=1}^{n} \hat{f}(x_{S1}, x_{S2}, x_C^{(i)}) \tag{7}$$
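Equations (6) and (7) amount to overwriting the focal feature(s) with a grid value in every observation and averaging the resulting predictions. The sketch below shows this with a hand-written stand-in for the trained model; `toy_model`, the two rows, and the grids are hypothetical and are not the fitted XGBoost model or the study data.

```python
def partial_dependence(model, rows, feature, grid):
    """Equation (6): mean prediction with `feature` forced to each grid value."""
    curve = []
    for value in grid:
        preds = [model({**row, feature: value}) for row in rows]
        curve.append(sum(preds) / len(preds))
    return curve


def partial_dependence_2d(model, rows, feat_a, grid_a, feat_b, grid_b):
    """Equation (7): mean prediction over a grid of two forced features."""
    surface = []
    for a in grid_a:
        line = []
        for b in grid_b:
            preds = [model({**row, feat_a: a, feat_b: b}) for row in rows]
            line.append(sum(preds) / len(preds))
        surface.append(line)
    return surface


def toy_model(row):
    """Hypothetical fitted model with a TSS x WAS interaction term."""
    return 0.1 + 0.5 * row["TSS"] * row["WAS"] + 0.2 * row["REI"]


rows = [
    {"TSS": 0.4, "WAS": 0.7, "REI": 0.1},
    {"TSS": 0.5, "WAS": 0.6, "REI": 0.2},
]
curve = partial_dependence(toy_model, rows, "TSS", [0.0, 1.0])
surface = partial_dependence_2d(toy_model, rows, "TSS", [0.0, 1.0], "WAS", [0.0, 1.0])
```

Because each row keeps its own values for the non-focal features, the curve reflects the empirical distribution of those features rather than a single fixed reference point.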

3.4. Data Collection

Data on economic development, education resources, and healthcare and public services were obtained from the Statistical Communiqué on the National Economic and Social Development and Regional Statistical Yearbooks published by the National Bureau of Statistics of China (https://www.stats.gov.cn/). Information on infrastructure access was sourced from the Resource and Environmental Science Data Platform of China’s Geographic Big Data (https://www.resdc.cn/) and supplementary datasets published by the Ministry of Housing and Urban–Rural Development (https://www.mohurd.gov.cn/). Data related to the population and labor force structure were derived from the Population and Employment Statistical Yearbook and relevant national census reports, made available by the Ministry of Human Resources and Social Security and the National Bureau of Statistics (https://www.mohrss.gov.cn/). The study utilizes panel data from the 31 province-level administrative regions of mainland China, with a temporal scope ranging from 1991 to 2023. Partially missing values in the China Statistical Yearbook series were imputed using an ARIMA model.

3.5. Research Framework

Figure 1 presents the conceptual and analytical framework developed for this study, outlining the key components and their interrelationships. Figure 2 shows the study area of this research and its spatial division into the eastern coastland, central inland, and western interior [10].

4. Results

4.1. Importance Analysis of Population-Urbanization-Driving Factors

To evaluate the performance of the XGBoost model in predicting province-level population urbanization, a series of baseline tests was conducted. To ensure the model’s generalizability and robustness, fivefold cross-validation was employed on the training dataset. The training data were partitioned into five equal subsets; in each iteration, four subsets were used for model training, and the remaining one was used for validation. The average performance metrics across all the folds were calculated to mitigate overfitting and reduce the variance in the model evaluation. On the test set, the model achieved a coefficient of determination (R²) of 0.932, indicating that it explains approximately 93.2% of the variance in urbanization levels. Additionally, the model yielded a root-mean-square error (RMSE) of 0.0185 and a mean absolute error (MAE) of 0.0141, further confirming its high predictive accuracy and small prediction errors. The relative importances of the driving factors identified by the machine-learning model are illustrated in Figure 3.
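The evaluation procedure described above can be sketched as follows. This is an illustration under stated assumptions: the folds are contiguous and unshuffled (the text does not say how the folds were drawn), and the metric inputs are toy numbers rather than the study's predictions.

```python
from math import sqrt


def kfold_indices(n_samples, k=5):
    """Yield (train_idx, val_idx) for k contiguous, near-equal folds."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        stop = start + size
        val_idx = list(range(start, stop))
        train_idx = [i for i in range(n_samples) if i < start or i >= stop]
        yield train_idx, val_idx
        start = stop


def regression_metrics(y_true, y_pred):
    """Return R^2, RMSE and MAE for paired observations and predictions."""
    n = len(y_true)
    mean_y = sum(y_true) / n
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "rmse": sqrt(ss_res / n),
        "mae": sum(abs(y - p) for y, p in zip(y_true, y_pred)) / n,
    }


folds = list(kfold_indices(10, k=5))
metrics = regression_metrics([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 5.0])
```

In a full pipeline, a model would be fitted on each `train_idx`, scored on the matching `val_idx`, and the fold metrics averaged. For panel data, grouping folds by province or by year may be preferable to avoid leakage between closely related observations.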

The TSS ranks first in feature importance, contributing nearly 0.40 to the model, highlighting the pivotal roles of industrial upgrading and the expansion of the service economy in driving population urbanization. UJG ranks second, underscoring the foundational importance of the employment absorption capacity in attracting and concentrating urban populations. Together, these two factors form a dual-engine mechanism—economic transformation and functional employment—that propels urbanization. The REI and WAS occupy middle positions in the ranking, reflecting the supportive roles of capital investment and the demographic structure in sustaining urban growth. In parallel, the GGR and transportation accessibility indicators—including the road length (RDL) and transport vehicle registration (TVR)—also show moderate influences, indicating that macroeconomic momentum and physical connectivity remain key contextual drivers of population mobility. Notably, variables traditionally regarded as essential—education resources (HES, TSR, and PCE) and healthcare and services (HBR, DSR, and PCH)—rank relatively lower. This may be attributed to the broad success of China’s compulsory nine-year education system and the ongoing equalization of basic public services [39]. These efforts have led to widespread coverage of education and healthcare facilities across provinces, reducing inter-regional variance and, thus, diminishing their explanatory power for differences in urbanization rates. In sum, the overall driving mechanism of population urbanization in China exhibits a composite structure characterized by industrial support, employment orientation, and spatial accessibility.

4.2. Regional Heterogeneity of Driving Factors According to the SHAP Model

The feature importance analysis based on SHAP values reveals significant regional disparities in the driving factors of population urbanization. This underscores the pronounced spatial heterogeneity in the structure of urbanization drivers across different areas, as illustrated in Figure 4.

As shown in Figure 4, the population urbanization process in the eastern coastal region is characterized by a multifactorial and complex mechanism. Among all the variables, UJG and TSS emerge as the most influential, closely aligning with the national-level trends predicted by the XGBoost model. Additionally, the WAS, REI, and GGR also exhibit significant impacts. These findings suggest that the financial and service sectors are the core engines of population urbanization in the eastern region, underpinned by favorable demographic structures and robust real estate investment. In contrast, REI ranks far above all the other variables in importance in the central inland region, highlighting the region’s strong dependence on the property sector for urban expansion and population absorption. While WAS, UJG, and GGR also contribute meaningfully, their influence is secondary. This indicates that, in the context of supportive population structures and moderate economic growth, investment-oriented urban development plays a dominant role in determining population urbanization in the central region. In the western interior, UJG is the overwhelmingly dominant variable, with SHAP values significantly higher than those of all the other factors. Although GGR and TSS show moderate relevance, urban population growth in the western region remains heavily reliant on the expansion of the labor market, with relatively limited influence from broader economic or service-based variables. Urban population change in the western interior therefore follows a more singular, employment-driven pattern. These results highlight the structural divergence of population urbanization drivers across China’s macro-regions and underscore the need for region-specific policy responses.
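A regional ranking of this kind is commonly obtained by averaging the absolute SHAP values of each feature within each region. The sketch below assumes that aggregation, which the text does not state explicitly; the records are invented placeholders, not the study's SHAP values.

```python
from collections import defaultdict


def mean_abs_shap_by_region(records):
    """Rank features per region by mean |SHAP|.

    `records` is an iterable of (region, {feature: shap_value}) pairs,
    one pair per province-year observation.
    """
    sums = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(int)
    for region, shap_row in records:
        counts[region] += 1
        for feature, value in shap_row.items():
            sums[region][feature] += abs(value)
    ranking = {}
    for region, feature_sums in sums.items():
        means = [(f, s / counts[region]) for f, s in feature_sums.items()]
        ranking[region] = sorted(means, key=lambda kv: kv[1], reverse=True)
    return ranking


# Hypothetical SHAP values for illustration only.
records = [
    ("east", {"TSS": 0.03, "UJG": -0.04}),
    ("east", {"TSS": -0.05, "UJG": 0.02}),
    ("west", {"TSS": 0.01, "UJG": 0.06}),
]
ranking = mean_abs_shap_by_region(records)
```

Taking absolute values before averaging keeps positive and negative contributions from cancelling, so the ranking reflects the magnitude of influence rather than its direction.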

4.3. Partial Dependence Analysis of Regional Sensitivity Factors

The partial dependence analysis reveals that the sensitivity factors of population urbanization vary significantly across regions. In the eastern coastland, the sensitivity factors are the tertiary sector share (TSS) and the working-age share (WAS); in the central inland, real estate investment (REI) and WAS; and in the western interior, the GDP growth rate (GGR) and TSS.

4.3.1. Static-Effect Analysis

The static analysis, which does not account for temporal effects, primarily reveals the marginal impacts of sensitivity factors on population urbanization (PU) while controlling for the influence of all the other variables, as illustrated in Figure 5.

Figure 5a illustrates the partial dependence of PU on TSS and WAS in the eastern coastland. The PDP for TSS reveals a weak marginal impact on PU when TSS is below 0.40. However, a marked increase in PU is observed once TSS reaches 0.40, indicating that the expansion of the service sector has a strong positive effect on population urbanization. Beyond 0.50, PU exhibits a fluctuating decline, suggesting potential saturation in the service sector or diminishing marginal returns. In contrast, the PDP for WAS presents a clearer upward trend. Between 0.645 and 0.72 in particular, PU rises significantly with increases in WAS, indicating that maintaining a favorable age structure is critical for population urbanization.

Figure 5b shows the PDPs for REI and WAS in the central inland. For REI, PU increases steadily with greater investment but plateaus at around 0.15, suggesting a saturation effect. This implies that while population urbanization in the central inland is highly dependent on real-estate-driven expansion, the marginal benefit levels off once investment reaches approximately 15% of the total GDP. The PDP for WAS displays a relatively gradual and stable upward trend between 0.65 and 0.75, with a slight acceleration beyond 0.75. Although not the dominant driver, WAS still provides structural support to population urbanization—especially when the share of the working-age population is high enough to enhance the labor market dynamics and city-level absorptive capacity.

Figure 5c examines GGR and TSS in the western interior. The PDP for GGR shows a moderate positive impact on PU when GGR is around 3%, but PU declines slightly as GGR exceeds this threshold. Overall, the marginal impact of the GGR on urbanization is limited, implying that economic growth alone is insufficient to drive population urbanization in less-developed regions. By contrast, the PDP for TSS shows a more pronounced effect. PU increases significantly as TSS reaches 0.40; the trend levels off after 0.43, suggesting that the expansion of the service sector has a saturation effect once the sector reaches a certain structural share.

4.3.2. Temporal-Effect Analysis

To further uncover the dynamic evolution of sensitivity factors influencing population urbanization over time, this study incorporates temporal variables into the analysis of the sensitivity factors. The results of this temporal-effect assessment are presented in Figure 6.

Figure 6a presents the temporal effects of TSS and WAS on PU in the eastern coastland. The time effect for TSS shows that prior to 2013, its impact on PU remained largely stable. After 2013, the influence of TSS increased noticeably until it began to decline slightly after 2020. For WAS, the temporal pattern is more pronounced. Before 1998, the impact of WAS on PU was relatively constant. However, after 1998, areas with higher WAS values began to show a stronger positive association with PU. Interestingly, after 2015, regions with lower WAS values also started to exhibit an increasing effect. Overall, the influences of both TSS and WAS on urbanization intensified over time, particularly after 2015.

Figure 6b displays the time dynamics of REI and WAS in the central inland. The results indicate that the effects of both variables on PU steadily increased over time, especially after 2002. Notably, the impact of the REI reached a stable level at around 2002, while the influence of the WAS continued to grow until it plateaued in 2017.

Figure 6c shows the temporal effects of GGR and TSS on PU in the western interior. The effects of both factors exhibit an inverted U-shaped trend, with the turning point occurring at around 2001. Notably, after 2011, the influences of both GGR and TSS on population urbanization declined significantly. This decline was especially pronounced in regions with higher GGR and lower TSS values.

4.4. Interaction Analysis of Regional Sensitivity Factors

To further investigate the combined effects of sensitivity factors, this study conducted bivariate PDPs to explore their interaction mechanisms. The three-dimensional interaction surfaces illustrating these effects are presented in Figure 7.

Figure 7a presents the interaction mechanism between TSS and WAS in the eastern coastland. The results indicate a significant positive interaction effect between these variables: When both TSS and WAS are at higher levels (TSS > 0.45; WAS > 0.73), PU reaches its optimal range (>0.415). This suggests that a high proportion of the service sector, combined with a large working-age population, constitutes favorable conditions for population urbanization. Conversely, when either variable remains at a low level, PU declines sharply, demonstrating that a single structural advantage alone is insufficient to sustain urbanization momentum. However, beyond approximately TSS = 0.50 and WAS = 0.76, the growth in PU begins to level off, signaling the presence of diminishing marginal returns.

Figure 7b shows the interaction mechanism between REI and WAS in the central inland. When REI exceeds 0.19 and WAS is above 0.78, PU reaches a relatively high interval (>0.460), indicating that the combined effect of the real estate investment and labor supply significantly enhances population urbanization in the central inland. Importantly, the marginal effect of the REI is substantially stronger than that of the WAS. Even when WAS remains low, PU increases markedly as REI rises, confirming the dominant role of real estate investment in driving population urbanization. In contrast, the impact of the WAS becomes more pronounced primarily at mid-to-high levels (>0.72), suggesting that human capital exerts its influence after crossing a certain threshold.

Figure 7c illustrates the interaction mechanism between GGR and TSS in the western interior. The results show that PU generally increases as GGR and TSS rise, with PU reaching a relatively high level (>0.625) when GGR exceeds 0.15 and TSS surpasses 0.50. This demonstrates that economic growth and industrial upgrading exert strong synergistic effects in driving population urbanization. Further observation reveals that when TSS exceeds 0.60, PU starts to decline slightly, suggesting that beyond a critical point, the marginal benefits of expanding the service sector diminish, potentially resulting in structural oversupply or mismatches in the employment capacity. This nonlinear pattern emphasizes the need to focus not only on the scale of the service sector expansion but also on its quality and diversification. GGR exhibits a stage-dependent enhancement: At lower levels (0–0.15), its marginal contribution to PU tends to be high. However, when GGR exceeds 0.20, PU decreases steadily, implying that moderate economic growth is a prerequisite for translating economic gains into population urbanization in the western interior.
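The bivariate PDP surfaces discussed for Figure 7 can be illustrated with a short sketch of the two-feature partial dependence calculation. This is an illustration under stated assumptions rather than the study's implementation: `toy_model`, its coefficients, the rows, and the grid values are all hypothetical, and the product term is included only so that the surface contains an interaction to detect.

```python
from statistics import mean


def partial_dependence_2d(predict, rows, i, j, grid_i, grid_j):
    """Mean prediction over rows with features i and j forced to each grid pair."""
    surface = []
    for value_i in grid_i:
        line = []
        for value_j in grid_j:
            predictions = []
            for row in rows:
                modified = list(row)
                modified[i], modified[j] = value_i, value_j
                predictions.append(predict(modified))
            line.append(mean(predictions))
        surface.append(line)
    return surface


def toy_model(row):
    """Hypothetical stand-in for a fitted model with a TSS x WAS product term."""
    tss, was, other = row
    return 0.20 + 0.10 * tss + 0.10 * was + 0.40 * tss * was + 0.05 * other


# Invented observations: (TSS, WAS, one further covariate)
rows = [(0.40, 0.70, 0.2), (0.45, 0.72, 0.4), (0.50, 0.75, 0.6)]
tss_grid = [0.40, 0.50]
was_grid = [0.70, 0.76]

surface = partial_dependence_2d(toy_model, rows, 0, 1, tss_grid, was_grid)

# Double-difference contrast: zero for a purely additive surface,
# positive when the two features reinforce each other.
synergy = surface[1][1] - surface[1][0] - surface[0][1] + surface[0][0]
print(f"interaction contrast: {synergy:.4f}")
```

A positive contrast means that the gain from raising both features together exceeds the sum of the gains from raising each alone, which is the sense in which the text describes a synergistic interaction.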

5. Discussion

5.1. China’s Tripartite Mechanism of Population Urbanization

This study identifies a tripartite mechanism driving China’s population urbanization—industrial support, employment orientation, and spatial accessibility—which aligns closely with classical theories of urban economics, particularly the Harris–Todaro model, which emphasizes employment-driven rural-to-urban migration [40]. In China, industrial agglomeration not only generates large-scale employment opportunities but also enhances migrants’ willingness to settle permanently, thereby serving as a fundamental engine of sustained urban population growth [41]. Moreover, long-standing national policies, such as industrial parks and special economic zones, have tightly coupled the industrial spatial layout with urban functional systems. These policies, when combined with massive investment in transportation and infrastructure, have fostered an integrated spatial network that facilitates the urbanization of mobile populations [42]. This mechanism can be conceptualized as a synergistic “production–employment–mobility” cycle, reflecting a uniquely Chinese urbanization pathway shaped by a blend of state leadership and market forces [43]. Importantly, recent research suggests that China’s urban systems are experiencing improved coupling coordination among the population, land, economy, and ecology, which is increasingly seen as a hallmark of high-quality urban development and further supports the systemic nature of this tripartite mechanism [44].

Compared to the urbanization of other developing countries, China’s urbanization demonstrates a distinct production-oriented pattern. In regions such as sub-Saharan Africa and South Asia, urban growth is often driven by deteriorating rural livelihoods, leading to passive migration, limited employment absorption, and the proliferation of informal settlements [45]. In contrast, China’s urbanization is heavily guided by industrial policy and infrastructure-led development, revealing a strong state-regulated model [46]. Even when compared with urbanization trajectories in Western economies, China’s urbanization trajectory appears to be unique. While transportation and employment are also critical in advanced economies, urban expansion in those contexts is primarily market driven, relying on private land development and lacking the “administrative coordination plus hierarchical regulation” structure that characterizes China. As a result, China has gradually built a tiered urban system—ranging from megacities to medium-sized cities and newly urbanized towns—which enables differentiated regulation and spatial population optimization [47]. This system offers both a policy blueprint and a theoretical reference for other countries seeking to develop orderly and sustainable urbanization strategies.

5.2. Structural Differentiation in Regional Population Urbanization Mechanisms

This study reveals a clear pattern of regional differentiation in the mechanisms driving population urbanization across China’s eastern coastland, central inland, and western interior regions. Specifically, the eastern coastland is characterized by a finance- and service-oriented model; the central region follows an investment-led trajectory dominated by infrastructure and real estate; and the western region is shaped by a single-core employment-driven pathway. This gradient structure of “service–investment–employment” can be interpreted through the lenses of regional development stage theory and institutional economics. In the eastern coastland, where industrialization has largely matured, cities are entering a transformation phase centered on services and innovation, and high-value services and financialization have become major forces behind urban population concentration [48]. Meanwhile, the central inland is undergoing rapid industrialization and urbanization, with local governments promoting the expansion of real estate and infrastructure through “land finance” and debt-driven investment, thereby creating a strong pull for population inflow [49]. This explains the emergence of an investment-led urbanization model in the central inland provinces [50]. By contrast, the western region, constrained by limited resource endowment and weaker economic foundations, primarily relies on employment opportunities generated by basic industries to absorb surplus rural labor, typifying a labor-market-led model of urban growth [51].

The “service–investment–employment” gradient fundamentally encapsulates the evolutionary trajectory of regional urban development in China—mirroring variations in fiscal capacity, industrial maturity, and mechanisms of population absorption. This structurally adaptive model presents several compelling advantages. By accommodating different regional dynamics within a unified national framework, it facilitates a multi-speed approach to urbanization that enhances the overall policy coherence and development efficiency [46]. The presence of dominant drivers in each region also strengthens spatial governance by enabling targeted policy design. Policymakers can respond with tailored strategies—such as incentivizing talent migration, upgrading infrastructure, or providing employment subsidies—that better align with local conditions and improve the effectiveness of population management [52]. From a global perspective, China’s gradient population urbanization model—which refers to a spatial distribution structure in which different regions exhibit dominant urbanization drivers sequentially led by services, investment, and employment—offers meaningful insights for other multi-layered economic systems. These include designing differentiated population urbanization pathways, aligning policy tools with industrial life cycles, and promoting orderly urban expansion alongside balanced regional development.

5.3. Nonlinear Dynamics of Sensitivity Factors in Population Urbanization

This study reveals that several key sensitivity factors exhibit pronounced nonlinear effects on population urbanization, reflecting hidden threshold dynamics and structural saturation within urban development trajectories. The service sector has consistently played a vital role in drawing population into cities, generating employment, and improving living standards. However, when the share of the tertiary sector approaches approximately 45%, its marginal effect on population urbanization tends to plateau or even diminish. This phenomenon can be attributed to internal structural homogeneity within services, declining labor absorption efficiency, and diminishing returns [53]. Internationally, countries such as the United States and Japan have experienced similar trends of urban growth deceleration and functional hollowing out during service-led development stages, suggesting that sole reliance on the service economy may be insufficient to sustain long-term population attractiveness [54,55]. Conversely, cities with a higher share of the working-age population demonstrate stronger persistence and stability in their population urbanization processes. The youthful demographic base provides sustained endogenous momentum through labor supply, consumption potential, and social vitality. Global experience indicates that the acceleration of population urbanization in middle-income countries often coincides with peaks in the working-age population, as seen in both China and India [56].

Analysis of real estate investment further reveals risk thresholds within urban expansion. When the share of real estate investment rises above approximately 15% of the GDP, its marginal effect on population urbanization diminishes markedly. This pattern likely reflects market saturation, rising housing affordability pressures, and diminishing investment efficiency. The recent emergence of high vacancy rates and structural oversupply in many Chinese cities exemplifies the waning efficacy of property-led urbanization and highlights the necessity for governments to remain cautious about overreliance on real estate as a growth engine [57]. Economic growth similarly shows an “optimal interval” in its influence on population shifts. When the GDP growth rate hovers around 3%, urbanization is most stable and sustainable. This growth pace is sufficient to support employment and infrastructure expansion while avoiding the inflationary pressures and cost escalation associated with overheated economies. Both excessive and insufficient growth can undermine urban attractiveness and create misalignment between economic expansion and the urban carrying capacity [58].
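One simple way to locate a saturation point of the kind discussed above is to examine the finite-difference slope of a PDP curve and report where it stays below a tolerance. The sketch below is illustrative only: the function name `plateau_start`, the tolerance, and the curve values are assumptions made for the example, not outputs or procedures of this study.

```python
def plateau_start(grid, curve, tol=0.05):
    """Return the first grid value from which all later slopes stay below tol.

    grid  -- increasing feature values at which the PDP was evaluated
    curve -- partial dependence values at those grid points
    tol   -- absolute slope below which the curve is treated as flat (assumed)
    Returns None if the curve never flattens.
    """
    slopes = [
        (curve[k + 1] - curve[k]) / (grid[k + 1] - grid[k])
        for k in range(len(grid) - 1)
    ]
    for k in range(len(slopes)):
        if all(abs(s) < tol for s in slopes[k:]):
            return grid[k]
    return None


# Invented PDP curve that rises and then flattens
grid = [0.05, 0.10, 0.15, 0.20, 0.25]
curve = [0.40, 0.44, 0.48, 0.481, 0.482]

print(plateau_start(grid, curve))
```

The reported point depends on the grid spacing and on the chosen tolerance, so a threshold read from a PDP in this way is best treated as approximate.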

Temporal differences further illuminate the evolutionary stages of sensitivity factors across regions. In the eastern and central regions, the influences of sensitivity factors have progressively strengthened over time, suggesting that these areas have gradually developed comprehensive advantages in policies, industries, and infrastructure. By contrast, the western region shows a clear inflection point at around 2011, after which the driving force behind population urbanization has weakened. This trend may be linked to the region’s dependence on resource-based industries, the tightening of ecological conservation policies, and persistent outmigration, all of which underscore the life-cycle characteristics of regional urban development [39].

Taken together, these nonlinear relationships demonstrate that the dynamics of urbanization are not simply driven by linear growth but are instead shaped by multiple adjustment mechanisms arising from structural saturation, efficiency thresholds, and spatiotemporal transitions. This finding holds critical implications for urban policy. Governments should develop threshold-monitoring frameworks grounded in the “effective intervals” of key variables and promote a transition from scale-driven to structure-optimized urbanization strategies.
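A threshold-monitoring framework of the kind suggested above could, in its simplest form, compare observed indicators against their effective intervals. The following is a hedged sketch: the upper bounds for the tertiary sector share (0.45) and the real estate investment share (0.15) echo the approximate plateau values reported in this study, whereas the lower bounds, the band around a 3% GDP growth rate, and all names (`Interval`, `EFFECTIVE_INTERVALS`, `flag_indicators`) are hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    low: float
    high: float

    def position(self, value):
        """Classify a value as 'below', 'within', or 'above' the interval."""
        if value < self.low:
            return "below"
        if value > self.high:
            return "above"
        return "within"


# Upper bounds for the first two entries follow the plateau values discussed
# in the text; every other bound is an illustrative assumption.
EFFECTIVE_INTERVALS = {
    "tertiary_sector_share": Interval(0.00, 0.45),
    "real_estate_investment_share": Interval(0.00, 0.15),
    "gdp_growth_rate": Interval(0.02, 0.04),
}


def flag_indicators(observed):
    """Map each monitored indicator to its position relative to its interval."""
    return {
        name: EFFECTIVE_INTERVALS[name].position(value)
        for name, value in observed.items()
        if name in EFFECTIVE_INTERVALS
    }


# Invented observation for a single region and year
observed = {
    "tertiary_sector_share": 0.47,
    "real_estate_investment_share": 0.12,
    "gdp_growth_rate": 0.03,
}
flags = flag_indicators(observed)
print(flags)
```

In practice, the intervals would need to be estimated separately for each region, since the PDP results indicate that thresholds differ across the eastern coastland, central inland, and western interior.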

5.4. Divergence in the Interaction Effects of Regional Population Urbanization Sensitivity Factors

The differentiated patterns of interaction mechanisms among sensitivity factors of population urbanization across the eastern coastland, central inland, and western interior regions in China stem from pronounced disparities in developmental stages, institutional frameworks, and resource endowments. For the eastern coastland, the synergistic interaction between the tertiary sector share and working-age share vividly illustrates the dynamics of growth pole theory, as realized in megacities [59], where the high-quality clustering of service industries has given rise to an industrial ecosystem dominated by knowledge- and capital-intensive sectors. This environment consistently attracts a highly skilled labor force, which, in turn, fuels further evolution and sophistication of the service economy. This dual engine of working-age population inflows and service sector deepening forms a self-reinforcing loop of endogenous growth [60]. Such a mechanism resonates with Krugman’s new economic geography, which emphasizes the mutually reinforcing effects of production and population concentration through agglomeration externalities [61]. This helps to explain why cities like New York, London, and Tokyo sustain powerful urbanization momentum despite already high levels of service sector dominance and population density [62].

Moving to the central inland, the “structural amplification” interaction between real estate investment and the working-age share highlights a pronounced dependence on fixed capital formation in emerging urban regions, which is most effective when the working-age population share is at an appropriate level. This mechanism aligns closely with the growth machine theory, wherein local governments leverage land and real estate as strategic instruments to attract investment and population inflows, propelling rapid economic expansion [63]. Under China’s land finance system, this dependence has become especially prominent. Real estate not only supplies the physical infrastructure required for urban growth but also shapes asset prices and the spatial configuration of cities, guiding labor migration and settlement patterns [64]. Comparable dynamics have emerged in second-tier cities across Brazil and India, where large-scale housing and infrastructure projects are commonly employed to activate urban functions and drive population growth [65].

By contrast, in the western interior, the “balanced regulation” coupling between the GDP growth rate and tertiary sector share underscores how population urbanization in resource-constrained environments relies on a dynamic equilibrium between endogenous growth and structural transformation; that is, both the GDP growth rate and tertiary sector share should be kept in an appropriate interval. This pattern can be effectively explained by sustainable urbanization theory, which posits that the success of urban development depends not only on the speed of growth but also on whether it drives industrial upgrading and service sector enhancement [11]. Given the constraints posed by the limited ecological carrying capacity and insufficient external capital inflows, these regions must pursue a careful balance between economic expansion and diversification of services within finite resources [66]. Similar trajectories are evident in parts of sub-Saharan Africa and Central Asia, where achieving sustainable urbanization requires coordination between the development pace, spatial structure, and economic fundamentals [67].

Taken together, China’s population urbanization follows a multidimensional and adaptive path rather than a single linear model, shaped by local conditions. This experience offers lessons for other countries: advanced regions should integrate knowledge and human capital; intermediate cities need to diversify investment and avoid overreliance on real estate; and less-developed areas should focus on employment, education, and public services to ensure sustainable urban growth.

6. Conclusions

This study investigates the socioeconomic drivers underlying China’s population urbanization process against the backdrop of rapid global urbanization and increasing challenges related to sustainable urban development. Utilizing machine-learning techniques, specifically, the XGBoost model, alongside big data analysis of China’s urbanization from 1991 to 2023, the research identifies a tripartite driving mechanism: industrial support, employment orientation, and infrastructure accessibility. This finding aligns closely with classical urban economics, particularly the employment-driven migration mechanism emphasized by the Harris–Todaro model [40]. Industrial agglomeration in China not only provides substantial employment opportunities but also strengthens migrant populations’ willingness to settle permanently, serving as a fundamental driver of sustained population growth. Furthermore, policies promoting industrial parks and special economic zones, coupled with extensive transportation infrastructure investments, have created a spatial network supporting urbanization, embodying a distinctive “production–employment–mobility” logic, characterized by the coexistence of state-led planning and market regulation.

Regional analysis reveals structural differences in driving factors of population urbanization across China. Specifically, eastern coastal regions exhibit a finance- and service-driven population urbanization model; central inland regions primarily follow an investment-driven pathway centered on infrastructure and real estate, whereas western regions predominantly rely on employment-led urbanization growth. This gradient structure of “service–investment–employment” reflects regional differences in fiscal capacity, industrial maturity, and population absorption mechanisms, facilitating differentiated policy interventions tailored to regional characteristics and enhancing governance effectiveness.

The study further highlights significant nonlinear effects of key sensitive factors influencing population urbanization, such as the service sector share, working-age population share, real estate investment, and GDP growth rate. Notably, the positive effect of the tertiary sector on urbanization reaches a plateau at around 45%, suggesting diminishing marginal returns due to structural homogeneity and declining labor absorption efficiency. Likewise, real estate investment displays a threshold effect at around 15% of the GDP, beyond which its marginal influence diminishes, underscoring market saturation risks and affordability pressures. GDP growth also shows an optimal range at around 3%, balancing economic vitality with the urban carrying capacity.

Interaction analyses reveal distinct regional coupling mechanisms among sensitive factors. In the eastern regions, the synergistic relationship between the tertiary sector share and the working-age population share exemplifies growth pole theory, where high-quality service industries continuously attract skilled labor, creating a reinforcing growth loop. Central regions display structural amplification by real estate investment with appropriate labor availability, reflecting a growth-machine approach reliant on fixed capital formation and land finance strategies. Conversely, western regions show balancing interactions between GDP growth and service sector development, indicating that both should be kept within an appropriate interval given limited resources and economic constraints.

Based on these findings, several policy recommendations can be proposed:

Economically advanced regions should prioritize aligning knowledge capital with human resources and promoting high-value service industries to continuously attract skilled populations and support sustainable urban growth;
Cities at intermediate development stages should optimize investment structures, reducing excessive reliance on real estate and encouraging integrated development across industry, population, and land to achieve more balanced urban expansion;
Resource-dependent and less-developed regions should emphasize strengthening employment opportunities, education infrastructure, and essential public services, thereby enhancing their capacity to sustain population inflow and improve the overall urban sustainability;
Governments should establish threshold-based monitoring frameworks anchored in the effective intervals of key indicators, including the share of the tertiary sector, the real estate investment intensity, the proportion of the working-age population, and the GDP growth rate.

Funding

This research was funded by the Hubei Provincial Federation of Social Sciences Post-Funded Project (HBSKJJ20243266) and the Open Fund Project of the Key Research Base of Humanities and Social Sciences in Hubei Province Universities, Research Center for Reservoir Resettlement (2022KFJJ04).

Data Availability Statement

The data supporting this article will be made available on reasonable request from the corresponding author.

Conflicts of Interest

The author declares no conflicts of interest.

References

Kundu, D.; Pandey, A.K. World urbanisation: Trends and patterns. In Developing National Urban Policies: Ways Forward to Green and Smart Cities; Springer: Singapore, 2020; pp. 13–49.
Zhang, X.Q. The trends, promises and challenges of urbanisation in the world. Habitat Int. 2016, 54, 241–252.
Onyango, A.O. Global and regional trends of urbanization: A critical review of the environmental and economic imprints. World Environ. 2018, 8, 47–62.
Balula, L.; Seixas, J. Contemporary city and plural knowledge: Reframing urban planning. In Architecture and the Social Sciences: Inter-and Multidisciplinary Approaches Between Society and Space; Springer International Publishing: Cham, Switzerland, 2017; pp. 69–84.
Buriachenko, A.; Koczar, J.; Biliavska, O.; Levchenko, K.; Kolomiiets, H. The impact of urbanization on socio-economic development: The experience of Poland, Spain and Ukraine. Sci. Bull. Natl. Min. Univ. 2024, 158–168.
Zhu, C.; Sun, S.; Li, Z. The influencing factors and spatial distribution of population urbanization in China. Geogr. Res. 2008, 27, 13–23.
Tacoli, C.; McGranahan, G.; Satterthwaite, D. Urbanisation, Rural-Urban Migration and Urban Poverty; Human Settlements Group, International Institute for Environment and Development: London, UK, 2015.
Castells-Quintana, D.; Royuela, V.; Veneri, P. Inequality and city size: An analysis for OECD functional urban areas. Pap. Reg. Sci. 2020, 99, 1045–1065.
Gyawali, B.R.; Hill, A.; Banerjee, S.; Chembezi, D.; Christian, C.; Bukenya, J.; Silitonga, M. Examining rural-urban population change in the southeastern United States. J. Rural. Soc. Sci. 2013, 28, 4.
Shangguan, Z. Investigating the spatiotemporal dynamics and interplay mechanisms between population urbanization and carbon dioxide emissions in China. J. Geochem. Explor. 2024, 266, 107571.
Shangguan, Z. Exploring the impact of population urbanization on the green economy development: A case study of 30 provincial-level administrative regions in China. Phys. Chem. Earth Parts A/B/C 2024, 136, 103727.
Zhang, P.; Yuan, H.; Tian, X. Sustainable development in China: Trends, patterns, and determinants of the “Five Modernizations” in Chinese cities. J. Clean. Prod. 2019, 214, 685–695.
Liu, Y.; Lu, S.; Chen, Y. Spatio-temporal change of urban–rural equalized development patterns in China and its driving factors. J. Rural. Stud. 2013, 32, 320–330.
Yang, K. New urbanization and coordinated regional development. Chin. J. Urban Environ. Stud. 2019, 7, 1975009.
Ahrend, R.; Lembcke, A.C.; Schumann, A. The role of urban agglomerations for economic and productivity growth. Int. Product. Monit. 2017, 32, 161–179.
Cohen, M.; Simet, L. Macroeconomy and urban productivity. In Urban Planet: Knowledge Towards Sustainable Cities; Cambridge University Press & Assessment: Cambridge, UK, 2018; pp. 130–146.
Cuaresma, J.C.; Doppelhofer, G.; Feldkircher, M. The determinants of economic growth in European regions. Reg. Stud. 2014, 48, 44–67.
Liao, F.H.; Wei, Y.D. Space, scale, and regional inequality in provincial China: A spatial filtering approach. Appl. Geogr. 2015, 61, 94–104.
Xing, C. Human Capital and Urbanization in the People’s Republic of China; ADBI Working Paper, No. 603; Asian Development Bank Institute: Tokyo, Japan, 2016.
Ginsburg, C.; Bocquier, P.; Beguy, D.; Afolabi, S.; Augusto, O.; Derra, K.; Odhiambo, F.; Otiende, M.; Soura, A.B.; Zabre, P.; et al. Human capital on the move: Education as a determinant of internal migration in selected INDEPTH surveillance populations in Africa. Demogr. Res. 2016, 34, 845.
Zhiqiang, L. Human capital externalities and rural–urban migration: Evidence from rural China. China Econ. Rev. 2008, 19, 521–535.
Zhou, J.; Song, J.; Huang, X. Human capital, well-being and growth rate of rural–urban migration in China. Singap. Econ. Rev. 2021, 1–34.
Tong, Y.; Tan, C.H.; Sia, C.L.; Shi, Y.; Teo, H.-H. Rural-urban healthcare access inequality challenge: Transformative roles of information technology. MIS Q. 2022, 46, 1937–1982.
Qin, V.M.; McPake, B.; Raban, M.Z.; Cowling, T.E.; Alshamsan, R.; Chia, K.S.; Smith, P.C.; Atun, R. Rural and urban differences in health system performance among older Chinese adults: Cross-sectional analysis of a national sample. BMC Health Serv. Res. 2020, 20, 372.
City, B.L.; Assessment, E.; Tool, R. Urbanization and health. Bull. World Health Organ. 2010, 88, 245–246.
Chindarkar, N.; Nakajima, M.; Wu, A.M. Inequality of Opportunity in Health Among Urban, Rural, and Migrant Children: Evidence from China. J. Soc. Policy 2024, 53, 950–969.
Hong, Y.; Li, X.; Stanton, B.; Lin, D. Too costly to be ill: Health care access and health seeking behaviors among rural-to-urban migrants in China. World Health Popul. 2006, 8, 22.
Lu, H.; Zhao, P.; Hu, H.; Zeng, L.; Wu, K.S.; Lv, D. Transport infrastructure and urban-rural income disparity: A municipal-level analysis in China. J. Transp. Geogr. 2022, 99, 103292.
Wu, W. Urban infrastructure financing and economic performance in China. Urban Geogr. 2010, 31, 648–667.
Xu, Y.; Zhu, S. Transport infrastructure, intra-regional inequality and urban-rural divide: Evidence From China’s high-speed rail construction. Int. Reg. Sci. Rev. 2024, 47, 378–406.
Chanieabate, M.; He, H.; Guo, C.; Abrahamgeremew, B.; Huang, Y. Examining the relationship between transportation infrastructure, urbanization level and rural-urban income gap in China. Sustainability 2023, 15, 8410.
Du, Y.; Yang, C. Demographic transition and labour market changes: Implications for economic development in China. China’s Econ. A Collect. Surv. 2015, 28, 25–44.
Che, S.Y.; Chen, W.; Guo, L. Demographic dividend in China’s economic growth. Popul. Econ. 2011, 3, 16–23.
Eggleston, K.; Oi, J.C.; Rozelle, S.; Sun, A.; Walder, A.; Zhou, X. Will demographic change slow China’s rise? J. Asian Stud. 2013, 72, 505–518.
Yin, Z.; Gang, Y. A Brief Analysis on the Development Strategies for New-Type Urbanization Simulated by Demographic Factors: Based on Real Evidence in Chongqing. Can. Soc. Sci. 2014, 10, 126.
Zhan, P.; Ma, X.; Li, S. Migration, population aging, and income inequality in China. J. Asian Econ. 2021, 76, 101351.
Chen, M.; Zhang, H.; Gong, Y. China’s Population Aging and New Urbanization. In E-Planning and Collaboration: Concepts, Methodologies, Tools, and Applications; IGI Global Scientific Publishing: Hershey, PA, USA, 2018; pp. 382–400.
Chen, T.; Guestrin, C. XGBoost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; pp. 785–794.
Wang, Y. Social Security: Establishment and Equalized Provision of Basic Public Services. In The Chinese Approach: How China Has Transformed Its Economy and System? Springer: Singapore, 2021; pp. 311–351.
Sancar, C.; Akbaş, Y.E. The Effect of Unemployment and Urbanization on Migration in Turkey: An Evaluation in terms of the Harris-Todaro Model. Sosyoekonomi 2022, 30, 215–239.
Du, Y.; Wang, M. Chapter Eight Population, Industrial Development, and Employment in Chinese Urbanization. In The China Population and Labor Yearbook; Brill: Leiden, The Netherlands, 2012; Volume 3, pp. 189–214.
Pannell, C.W. China’s continuing urban transition. Environ. Plan. A 2002, 34, 1571–1589.
Hamnett, C. Is Chinese urbanisation unique? Urban Stud. 2020, 57, 690–700.
Huang, S.; Lin, Y. Research on High-Quality Urbanization Development and Optimization Pathways Based on the Coupling Coordination Perspective of “Population–Land–Economy–Environment”: A Case Study of Jiangsu Province, China. Land 2025, 14, 435.
Floater, G.; Rode, P.; Robert, A.; Kennedy, C.; Hoornweg, D.; Slavcheva, R.; Godfrey, N. Cities and the New Climate Economy: The Transformative Role of Global Urban Growth. New Climate Economy Contributing Paper. Retrieved from: New Climate Economy and London School of Economics and Political Science, 2014. Available online: www.newclimateeconomy.net (accessed on 15 May 2017).
Zhang, K.H. Urbanization and industrial development in China. In China’s Urbanization and Socioeconomic Impact; Springer: Singapore, 2017; pp. 21–35.
Song, F.; Timberlake, M. Chinese urbanization, state policy, and the world economy. J. Urban Aff. 1996, 18, 285–306.
Pannell, C. China’s urban transition. J. Geogr. 1995, 94, 394–403.
Wu, Y.; Luo, J.; Zhang, X.; Skitmore, M. Urban growth dilemmas and solutions in China: Looking forward to 2030. Habitat Int. 2016, 56, 42–51.
Sun, Y.; Zhu, Y.; Zheng, W. Policy simulation on national large scale investment program based on the impaction of financial crisis. Geogr. Res. 2010, 29, 789–800.
Dang, J.; Wang, J.; Tu, B. The impact of National Forest City Construction on local employment: Evidence from China. For. Policy Econ. 2025, 172, 103421.
Shan, Z.; Wang, Y. Strategic talent development in the knowledge economy: A comparative analysis of global practices. J. Knowl. Econ. 2024, 15, 19570–19596.
Oyelaran-Oyeyinka, B.; Lal, K. Structural Transformation and Economic Development: Cross Regional Analysis of Industrialization and Urbanization; Routledge: London, UK, 2016.
Su, H.; Wei, H.; Zhao, J. Density effect and optimum density of the urban population in China. Urban Stud. 2017, 54, 1760–1777.
Ali, A. Nonlinear effects of urbanization routes on environmental degradation, evidence from China, India, Indonesia, the United States, and Brazil. Front. Psychol. 2023, 34, 3391–3416.
Sodhi, I.S. Urbanization in China and India. Rise India China Soc. Econ. Environ. Impacts 2020, 73, 73.
Wang, Z.; Wang, C.; Zhang, Q. Population ageing, urbanization and housing demand. J. Serv. Sci. Manag. 2015, 8, 516–525.
Kose, M.A.; Ohnsorge, F. Falling Long-Term Growth Prospects: Trends, Expectations, and Policies; World Bank Publications: Washington, DC, USA, 2024.
Polenske, K.R. Growth pole theory and strategy reconsidered: Domination, linkages, and distribution. In Regional Economic Development; Routledge: London, UK, 2017; pp. 91–111.
Qi, A.; Feng, Z.; Xie, M.; Song, Y.; Guan, H.; Hao, F. Measurement and Driving Mechanisms of Coordinated Development Between Innovation Capacity and Industrial Transformation in Northeast China Under the Background of Population Shrinkage. Chin. Geogr. Sci. 2025, 35, 819–834.
Nijkamp, P.; Kourtit, K.; Krugman, P.; Moreno, C. Old wisdom and the New Economic Geography: Managing uncertainty in 21st century regional and urban development. Reg. Sci. Policy Pract. 2024, 16, 100124. [Google Scholar] [CrossRef]
Sassen, S. The Global City: New York, London, Tokyo; Princeton University Press: Princeton, NJ, USA, 2013. [Google Scholar]
Hunt, S.D. Economic growth: Should policy focus on investment or dynamic competition? Eur. Bus. Rev. 2007, 19, 274–291. [Google Scholar] [CrossRef]
Wu, F. Land financialisation and the financing of urban development in China. Land Use Policy 2022, 112, 104412. [Google Scholar] [CrossRef]
Ren, X. Urban China; John Wiley & Sons: Hoboken, NJ, USA, 2013. [Google Scholar]
Wackernagel, M.; Rees, W.E. Perceptual and structural barriers to investing in natural capital: Economics from an ecological footprint perspective. Ecol. Econ. 1997, 20, 3–24. [Google Scholar] [CrossRef]
Saghir, J.; Santoro, J. Urbanization in Sub-Saharan Africa. In Meeting Challenges by Bridging Stakeholders; Center for Strategic & International Studies: Washington, DC, USA, 2018. [Google Scholar]

Figure 1. Theoretical framework of the study.

Figure 2. Regional division of the study area.

Figure 3. Importance ranking of driving factors derived from the XGBoost model.

Figure 4. Results of the regional heterogeneity analysis of driving factors: (a) eastern coastland; (b) central inland; (c) western interior.

Figure 5. Static partial dependence analysis: (a) eastern coastland; (b) central inland; (c) western interior.

Figure 6. Temporal partial dependence analysis: (a) eastern coastland; (b) central inland; (c) western interior.

Figure 7. Interaction analysis: (a) eastern coastland; (b) central inland; (c) western interior.

Table 1. Socioeconomic driving factors of population urbanization.

Dimension	Variable Name	Indicator Formula	Abb.
Economic Development	GDP Per Person	Gross Regional Product/Total Population	GPP
	GDP Growth Rate	GDP change/Previous GDP	GGR
	Tertiary Sector Share	Value of the Tertiary Sector/Total GDP	TSS
Education Resources	Higher Education Share	Higher-Education Population/Total Population	HES
	Teacher–Student Ratio	Number of Students/Number of Teachers	TSR
	Per Capita Education	Total Education Spending/Total Population	PCE
Healthcare and Services	Hospital Bed Rate	Total Hospital Beds/Total Population	HBR
	Doctor Service Rate	Number of Doctors/Total Population	DSR
	Per Capita Health	Total Health Spending/Total Population	PCH
Infrastructure Access	Road Density Degree	Total Road Length/Land Area	RDL
	Real Estate Investment	Total Real Estate Investment/Total GDP	REI
	Transit Vehicle Rate	Number of Public Buses/Total Population	TVR
Population and Labor	Working-Age Share	Population Aged 15–64/Total Population	WAS
	Population Density Level	Total Population/Land Area	PDL
	Urban Job Growth	Employment Change/Previous Employment	UJG
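
The indicator formulas in Table 1 can be sketched in code as follows. This is a minimal illustration under stated assumptions, not the author's preprocessing pipeline: the `RawRecord` field names, the `growth` helper, and the `table1_indicators` function are hypothetical. Table 1 does not specify units or scaling (for example, per 10,000 persons), so plain ratios are used. The two growth-rate indicators (GGR and UJG) need the previous year's record for the same region.

```python
from dataclasses import dataclass


@dataclass
class RawRecord:
    """Raw inputs for one region-year. All field names are hypothetical."""
    gdp: float
    tertiary_value: float
    population: float
    higher_ed_population: float
    students: float
    teachers: float
    education_spending: float
    hospital_beds: float
    doctors: float
    health_spending: float
    road_length: float
    land_area: float
    real_estate_investment: float
    public_buses: float
    working_age_population: float  # population aged 15-64
    employment: float              # assumed here to be urban employment


def growth(current: float, previous: float) -> float:
    """Relative change: (current - previous) / previous."""
    return (current - previous) / previous


def table1_indicators(cur: RawRecord, prev: RawRecord) -> dict:
    """Compute the 15 Table 1 indicators as plain ratios (no unit scaling)."""
    pop = cur.population
    return {
        # Economic Development
        "GPP": cur.gdp / pop,
        "GGR": growth(cur.gdp, prev.gdp),
        "TSS": cur.tertiary_value / cur.gdp,
        # Education Resources
        "HES": cur.higher_ed_population / pop,
        "TSR": cur.students / cur.teachers,  # students per teacher, as in Table 1
        "PCE": cur.education_spending / pop,
        # Healthcare and Services
        "HBR": cur.hospital_beds / pop,
        "DSR": cur.doctors / pop,
        "PCH": cur.health_spending / pop,
        # Infrastructure Access
        "RDL": cur.road_length / cur.land_area,
        "REI": cur.real_estate_investment / cur.gdp,
        "TVR": cur.public_buses / pop,
        # Population and Labor
        "WAS": cur.working_age_population / pop,
        "PDL": pop / cur.land_area,
        "UJG": growth(cur.employment, prev.employment),
    }
```

Calling `table1_indicators(cur, prev)` with two consecutive yearly records for the same region returns a dictionary keyed by the abbreviations in the last column of Table 1, which could then serve as one row of a feature matrix.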


© 2025 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
