Unveiling the Spatially Heterogeneous Driving Mechanisms of Net Migration in Chinese Cities: A Geographically Weighted Random Forest Approach
Abstract
1. Introduction
1. To identify the dominant global determinants of net migration and detect potential non-linear threshold effects, such as critical wage “take-off” points, that govern population redistribution;
2. To examine the spatial non-stationarity of these driving mechanisms and determine whether distinct migration regimes exist across China’s diverse geographical regions;
3. To validate the methodological advantages of the GWRF model by benchmarking its predictive performance and local interpretability against traditional OLS, GWR, and global RF approaches.
2. Literature Review and Theoretical Framework
2.1. Evolution of Migration Theories: From Flow Mechanisms to Urban Attraction
2.2. The “Linearity Bias” and the Necessity of Non-Linear Response
2.3. Spatial Non-Stationarity in Migration Mechanisms
2.4. Theoretical Framework: Spatially Heterogeneous Non-Linear Attraction
3. Data
3.1. Study Area and Data Source
3.2. Variable Selection and Preprocessing
1. Economic Drivers: Per capita GDP and Average Wage. Wages are the direct incentive for population movement, while per capita GDP reflects macro-economic strength.
2. Industry and Employment: Tertiary Industry Share and Registered Urban Unemployment Rate. The share of the tertiary industry represents industrial upgrading and employment absorption capacity, while the unemployment rate acts as a primary push factor.
3. Public Services and Innovation: Per capita Fiscal Expenditure, per capita Education Expenditure, and per capita Science and Technology Expenditure. These measure the quality of soft infrastructure and future development potential.
4. Environment and Cost: Annual average PM2.5 Concentration and Minimum Monthly Wage. The latter serves as a proxy for the local cost of living, as minimum wage standards in China are administratively set by provincial and municipal governments based on local economic conditions, consumer price indices, and housing cost levels. While granular city-level housing price data would be an ideal complement, its limited availability across all 278 cities necessitates this proxy approach.
5. Demographics and Scale: Gender Ratio and Urban Population Scale (log-transformed) to control for the non-linear impacts of agglomeration and congestion effects.
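The unit handling implied by the variable list can be sketched as follows. This is a minimal illustration, not the authors' preprocessing code: the function names, the example city values, and the choice of the natural logarithm are assumptions, since the source does not state the log base or show the conversion.

```python
import math

def per_capita_rmb(total_100m_rmb: float, population_10k: float) -> float:
    """Convert a total in 100 million RMB and a population in 10,000 persons
    into RMB per person."""
    return (total_100m_rmb * 1e8) / (population_10k * 1e4)

def log_population(population_10k: float) -> float:
    """Log-transform population scale (natural log is an assumption here)."""
    return math.log(population_10k)

# Hypothetical city: GDP of 3000 (100m RMB), population of 500 (10k persons).
gdp_per_capita = per_capita_rmb(3000.0, 500.0)
print(gdp_per_capita, log_population(500.0))
```

The same helper applies to fiscal, education, and science and technology expenditure, which the descriptive statistics table reports as totals in 100 million RMB.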
4. Methodology
4.1. Random Forest and Partial Dependence
4.2. Geographically Weighted Random Forest
1. Local Weighting: Sample points closer to city i are assigned higher weights during model training, while the weights of sample points beyond a specific bandwidth (b) are set to zero. This study utilizes an Adaptive Bisquare Kernel function to determine the spatial weights, following the established geographical weighting principles [30]:

   $$w_{ij} = \begin{cases} \left[1 - \left(d_{ij}/b\right)^{2}\right]^{2}, & d_{ij} < b \\ 0, & d_{ij} \ge b \end{cases}$$

   where $d_{ij}$ represents the distance between city i and observation j.
2. Bandwidth Optimization: The choice of bandwidth directly determines the model’s fit and spatial smoothness. This study employs the Incremental Spatial Autocorrelation (ISA) algorithm to search for the optimal bandwidth. The algorithm identifies the bandwidth value that minimizes the spatial autocorrelation (Moran’s I) of the model residuals, ensuring that the errors are randomly distributed and the local models are statistically valid [14].
3. Prediction Synthesis: The final prediction results are based on a spatially weighted average of the predicted values from all local models within the optimal bandwidth, balanced with the global model’s predicted values to ensure robust generalization.
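The three steps above can be sketched in a few lines of plain Python. This is an illustrative sketch under stated assumptions, not the authors' implementation: the function names, the `local_share` blending parameter, the nearest-neighbour definition of the adaptive bandwidth, and the example distances are all hypothetical.

```python
def adaptive_bandwidth(distances, k):
    """Distance to the k-th nearest observation, used as the local bandwidth b
    (a common way to make the bandwidth adaptive; assumed here)."""
    return sorted(distances)[k - 1]

def bisquare_weights(distances, bandwidth):
    """Bisquare kernel: (1 - (d/b)^2)^2 inside the bandwidth, zero at or beyond it."""
    return [(1.0 - (d / bandwidth) ** 2) ** 2 if d < bandwidth else 0.0
            for d in distances]

def morans_i(values, weights):
    """Global Moran's I for values (e.g. residuals) and a spatial weight matrix."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    pairs = [(i, j) for i in range(n) for j in range(n) if i != j]
    cross = sum(weights[i][j] * dev[i] * dev[j] for i, j in pairs)
    weight_sum = sum(weights[i][j] for i, j in pairs)
    return (n / weight_sum) * (cross / sum(d * d for d in dev))

def blend(local_pred, global_pred, local_share=0.5):
    """Combine local and global predictions; local_share is a hypothetical knob."""
    return local_share * local_pred + (1.0 - local_share) * global_pred

# Hypothetical distances (km) from city i to five observations, itself included.
distances = [0.0, 50.0, 100.0, 150.0, 400.0]
b = adaptive_bandwidth(distances, k=4)
weights = bisquare_weights(distances, b)
print(b, weights)
```

In a bandwidth search, one would refit the local models for each candidate bandwidth, compute `morans_i` on the residuals, and keep the candidate whose value is closest to zero.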
4.3. Model Evaluation and Validation
5. Results
5.1. Spatial Distribution Pattern of Net Migration
5.2. Random Forest Model: Global Feature Importance
1. Core Economic Incentives (Wage and Wealth). “Average Wage” (Importance = 0.230) stands out as the most decisive variable, with nearly double the importance of the second-ranked factor. This reinforces the “income maximization” hypothesis [34], indicating that actual labor compensation remains the fundamental magnet for population redistribution. It is followed by “Per Capita GDP” (0.124), representing a city’s macro-economic strength. The substantial gap between wage and GDP importance suggests that migrants are driven more by individual labor income than by a city’s aggregate wealth.
2. Industrial Structure and Labor Market. The “Tertiary Industry Share” (0.122) ranks third, acting as a critical employment reservoir due to its capacity to absorb labor. Meanwhile, the “Unemployment Rate” (0.092), serving as a primary push factor, also carries substantial importance, implying that job security is a baseline requirement for migration stability.
3. Agglomeration and Innovation Potential. “Population Scale” (0.084) and “Per Capita Science and Technology Expenditure” (0.081) represent the attraction of urban agglomeration and future development prospects, respectively. High technology investment signals industrial vitality and the potential for professional career growth.
4. Environmental Quality and Public Services. Factors such as “PM2.5” (0.043) and per capita fiscal or education expenditures rank lower in the global model. This indicates that while quality-of-life amenities are increasingly valued, economic opportunity and job security still constitute the dominant decision-making logic for the majority of intercity migrants in China.
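The ranking and the wage-to-GDP gap can be checked with simple arithmetic on the importance scores reported above. The snippet uses only the seven values stated in the text; the variable names are illustrative, and the scores of the remaining predictors are not given in this section, so they are left out.

```python
# Global RF importance scores as reported in the text (seven of the predictors).
importance = {
    "Average Wage": 0.230,
    "Per Capita GDP": 0.124,
    "Tertiary Industry Share": 0.122,
    "Unemployment Rate": 0.092,
    "Population Scale": 0.084,
    "Per Capita Science and Technology Expenditure": 0.081,
    "PM2.5": 0.043,
}

# Rank predictors from most to least important.
ranked = sorted(importance.items(), key=lambda item: item[1], reverse=True)

# Ratio behind the "nearly double" statement (about 1.85).
wage_to_gdp = importance["Average Wage"] / importance["Per Capita GDP"]

# Share of total importance covered by the reported predictors.
reported_total = sum(importance.values())

for name, score in ranked:
    print(f"{name}: {score:.3f}")
print(round(wage_to_gdp, 2), round(reported_total, 3))
```

If the scores are normalized to sum to one, which is an assumption about the importance measure used, the unreported predictors together account for the remainder.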
5.3. Non-Linear Threshold Effects: Partial Dependence Plots (PDP)
1. The “S-shaped” Growth and Saturation of Economic Drivers: The PDP curve for average wage shows a classic “S-shaped” pattern. Below 80,000 RMB, the curve is flat (the “low-level equilibrium trap”). Once the 80,000 RMB threshold is crossed, the slope rises sharply, showing strong increasing marginal returns. Beyond 90,000 RMB, the curve flattens again, suggesting that for ultra-high-income cities, salary becomes less decisive relative to non-economic factors. Figure 5 illustrates the response curves for income-related factors.
2. The “Threshold Effect” of Industry and Innovation: For Tertiary Industry Share, the impact is minimal below . However, once it exceeds (the mark of a post-industrial city), net flow increases explosively. This indicates that only when the service industry forms a scale effect can it effectively transform into an employment reservoir. Similarly, per capita tech expenditure shows a long stagnant period below million RMB, followed by a vertical surge after crossing this “critical mass” threshold (Figure 6).
3. The “U-shaped” Scale Point and Asymmetric Unemployment Shock: Population scale shows a U-curve. Between values of 9 and 10, the agglomeration effect is dominant. However, above (super-cities), the curve slope plateaus or slightly declines, indicating that “congestion effects” (high housing prices, traffic) begin to offset the benefits of agglomeration. For unemployment, once it exceeds , the curve drops sharply, indicating a psychological “red line” for migrants (Figure 7).
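For readers unfamiliar with how such response curves are produced, the sketch below computes a one-dimensional partial dependence curve: the feature of interest is fixed at each grid value for every observation and the model's predictions are averaged. The `toy_model`, its 85,000 RMB midpoint, and the two example rows are hypothetical stand-ins chosen only to mimic an S-shaped wage response; they are not the fitted random forest or the study data.

```python
import math

def partial_dependence(predict, rows, feature, grid):
    """Average prediction over all rows with `feature` forced to each grid value."""
    curve = []
    for value in grid:
        total = 0.0
        for row in rows:
            modified = dict(row)       # copy so the original row is untouched
            modified[feature] = value  # fix the feature of interest
            total += predict(modified)
        curve.append(total / len(rows))
    return curve

def toy_model(row):
    """Hypothetical model: logistic wage response minus an unemployment penalty."""
    wage_effect = 1.0 / (1.0 + math.exp(-(row["wage"] - 85_000) / 2_000))
    return wage_effect - 0.1 * row["unemployment"]

rows = [
    {"wage": 60_000, "unemployment": 0.5},
    {"wage": 90_000, "unemployment": 1.5},
]
grid = [70_000, 80_000, 85_000, 90_000, 100_000]
curve = partial_dependence(toy_model, rows, "wage", grid)
for wage, value in zip(grid, curve):
    print(wage, round(value, 3))
```

A threshold shows up as the grid interval where consecutive curve values differ the most, which in this toy setup lies between 80,000 and 90,000 RMB by construction.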
5.4. Spatial Heterogeneity: Mapping GWRF Local Mechanisms
5.4.1. Spatial Variation of Economic Drivers: Wage and GDP
5.4.2. The Role of Technology Investment in Shrinking Industrial Cities
5.4.3. Industrial Specialization and Demographic Flows in Manufacturing Hubs
5.4.4. Service Industry Development and Labor Market Stability
5.5. Dominant Driving Factors: A Typological Perspective
1. “Wage-oriented” Cities (Survival Law): Clustered in the Northwest (Lanzhou, Yinchuan) and Southwest (Guiyang, Yibin). Improving income levels is the most direct means of stemming outflow and attracting return migration in these regions where survival needs still carry overwhelming marginal utility.
2. “Technology-aspiring” Cities (Rust Belt Transformation): Dominant in the Northeast (Harbin, Changchun, Qiqihar) and North China (Taiyuan, Baotou). The key to retaining talent here lies in the “incremental expectations” created through technological innovation rather than current stock wealth.
3. “Gender-structure-driven” Cities (World Factory Effect): Highly concentrated in the PRD (Shenzhen, Dongguan, Foshan) and manufacturing bases in the Yangtze River Delta (Suzhou, Wuxi). The gender-based preferences of the light industry directly shape local migration patterns.
4. “Employment-security-driven” Cities (Stability Seekers): In core cities like Shanghai, Hefei, and Ningbo, the unemployment rate () replaces wage as the dominant factor. Middle-class and skilled workers attracted to these hubs prioritize long-term career stability over short-term wage premiums.
5. “Scale-dependent” Cities (Agglomeration vs. Congestion): Clustered in traditional industrial zones in Liaoning and Hubei (e.g., Anshan, Jingzhou). These cities rely on existing urban-scale inertia to maintain attraction or face outflow due to congestion effects.
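One plausible way to derive such a typology is to label each city by the predictor with the highest local importance in its GWRF model. The sketch below illustrates that rule only; the rule itself, the placeholder city names, and the importance values are assumptions, not results from the study.

```python
from collections import Counter

def dominant_factor(local_importance):
    """Return the predictor with the largest local importance score."""
    return max(local_importance, key=local_importance.get)

# Hypothetical local importance scores for three placeholder cities.
local_scores = {
    "City A": {"wage": 0.31, "tech": 0.12, "gender_ratio": 0.08, "unemployment": 0.10},
    "City B": {"wage": 0.14, "tech": 0.27, "gender_ratio": 0.06, "unemployment": 0.11},
    "City C": {"wage": 0.29, "tech": 0.09, "gender_ratio": 0.07, "unemployment": 0.18},
}

city_type = {city: dominant_factor(scores) for city, scores in local_scores.items()}
type_counts = Counter(city_type.values())
print(city_type)
print(type_counts.most_common())
```

Mapping `city_type` then gives the spatial clusters, and the counts per label indicate how widespread each regime is.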
5.6. Regional Heterogeneity: The “Mosaic” of Migration Mechanisms
1. Northwest and Southwest: Average wage exhibits the highest regional importance ( in NW, in SW). In inland regions, material rewards are the decisive factor. Policies focusing on “soft environments” (e.g., education, PM2.5) without competitive salaries may see limited results.
2. South China: This region is extremely sensitive to the “Unemployment Rate” (, the region’s top factor), even surpassing wages (). Migration here is highly “employment-following,” reacting rapidly to job market fluctuations. The “Gender Ratio” importance () is also the highest in the country.
3. North and Northeast: These regions show the highest sensitivity to “Per Capita Technology Expenditure” ( and , respectively). Facing the transition from old to new kinetic energy, increasing innovation input is crucial for curbing population loss.
4. Central China: “Population Scale” () is more significant here () than elsewhere. With the “Strong Provincial Capital” strategy in cities like Wuhan and Zhengzhou, large cities are exerting a powerful siphon effect on surrounding smaller cities.
5.7. Comparative Analysis of Model Performance
5.7.1. Advantages of Non-Linear Models
5.7.2. Advantages of Spatial Models
5.7.3. The Integrated Superiority of GWRF
6. Discussion
6.1. Re-Theorizing Migration: Thresholds and Asymmetry
6.2. The Spatial Mosaic of Migration Regimes
1. The “Survival-Oriented” West: In the Northwest and Southwest, wages remain the absolute dominant driver. This aligns with Maslow’s hierarchy of needs: where basic economic sufficiency is the primary concern, direct income maximization outweighs “soft” amenities, reflecting the pragmatic economic rationality of migrants in less-developed regions [34].
2. The “Structural-Screening” South: In the Pearl River Delta, migration is driven by structural factors like gender ratios and unemployment. This reflects a labor market shaped by light manufacturing that generated specific demographic demands.
3. The “Transformation-Hope” North: In the Northeast “Rust Belt,” technology expenditure plays a pivotal role. In these shrinking cities, innovation investment serves as a signal of industrial upgrading and future opportunity.
4. The “Stability-Quality” East: In the affluent Yangtze River Delta, drivers shift towards stability () and quality of life, suggesting a transition to a mature migration stage where high-human-capital workers become increasingly sensitive to urban consumption and quality of life [39].
6.3. Institutional Underpinnings of Spatial Heterogeneity
6.4. Policy Implications: From Expansion to Precision
6.5. Limitations and Future Directions
7. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
1. Bai, X.; Chen, J.; Shi, P. Landscape urbanization and economic growth in China: Positive feedbacks and sustainability dilemmas. Environ. Sci. Technol. 2012, 46, 132–139.
2. Chan, K. Crossing the 50 Percent Population Rubicon: Can China Urbanize to Prosperity? In Urbanization with Chinese Characteristics: The Hukou System and Migration; Chan, K., Ed.; Routledge: London, UK, 2018; pp. 166–189.
3. Su, Y.; Hua, Y.; Deng, L. Agglomeration of human capital: Evidence from city choice of online job seekers in China. Reg. Sci. Urban Econ. 2021, 91, 103621.
4. Long, Y.; Wu, K. Shrinking cities in a rapidly urbanizing China. Environ. Plan. A 2016, 48, 220–222.
5. Lee, E. A theory of migration. Demography 1966, 3, 47–57.
6. Harris, J.; Todaro, M. Migration, unemployment and development: A two-sector analysis. Am. Econ. Rev. 1970, 60, 126–142.
7. Glaeser, E.; Kolko, J.; Saiz, A. Consumer city. J. Econ. Geogr. 2001, 1, 27–50.
8. Xia, H.; Liu, Q.; Baptista, E. Spatial heterogeneity of internal migration in China: The role of economic, social and environmental characteristics. PLoS ONE 2022, 17, e0276992.
9. Zipf, G. The P1P2/D hypothesis: On the intercity movement of persons. Am. Sociol. Rev. 1946, 11, 677–686.
10. Robinson, C.; Dilkina, B. A machine learning approach to modeling human migration. In Proceedings of the 1st ACM SIGCAS Conference on Computing and Sustainable Societies, Menlo Park, CA, USA, 20–22 June 2018; pp. 1–8.
11. Fan, C. Migration and Labor-Market Returns in Urban China: Results from a Recent Survey in Guangzhou. Environ. Plan. A 2001, 33, 479–508.
12. Qi, W.; Deng, Y.; Fu, B. Rural attraction: The spatial pattern and driving factors of China’s rural in-migration. J. Rural Stud. 2022, 93, 461–470.
13. Brunsdon, C.; Fotheringham, A.; Charlton, M. Geographically weighted regression: A method for exploring spatial nonstationarity. Geogr. Anal. 1996, 28, 281–298.
14. Georganos, S.; Grippa, T.; Gadiaga, A.; Linard, C.; Lennert, M.; Vanhuysse, S.; Kalogirou, S. Geographical random forests: A spatial extension of the random forest algorithm to address spatial heterogeneity in remote sensing and population modelling. Geocarto Int. 2021, 36, 121–136.
15. Jia, J.; Lu, X.; Yuan, Y.; Xu, G.; Jia, J.; Christakis, N. Population flow drives spatio-temporal distribution of COVID-19 in China. Nature 2020, 582, 389–394.
16. Shi, F.; Geng, W.; Huang, R.; Mao, Y.; Jia, J. Push-pull mechanisms in China’s intercity population migration: Nonlinearity and asymmetry. Cities 2025, 157, 105624.
17. Shen, J. Increasing internal migration in China from 1985 to 2005: Institutional versus economic drivers. Habitat Int. 2013, 39, 1–7.
18. Shi, F.; Cao, W.; Huang, R.; Geng, W. Spatially heterogeneous drivers of hukou transfer intentions in China: A geographically weighted logistic regression analysis. Ann. Reg. Sci. 2025, 74, 59.
19. Eraydin, A.; Özatağan, G. Pathways to a resilient future: A review of policy agendas and governance practices in shrinking cities. Cities 2021, 115, 103226.
20. Tiebout, C. A pure theory of local expenditures. J. Polit. Econ. 1956, 64, 416–424.
21. China Highway & Transportation Society. 2020 Transportation of China. Technical Report, China Highway & Transportation Society, 2020. Available online: https://www.chts.cn/cms_files/filemanager/1389253025/attach/20235/99d768b212db4947909701bd6db04f2c.pdf (accessed on 6 March 2025).
22. Stark, O.; Bloom, D. The new economics of labor migration. Am. Econ. Rev. 1985, 75, 173–178.
23. Cooke, T. Migration in a family way. Popul. Space Place 2008, 14, 255–265.
24. Lenormand, M.; Louail, T.; Cantu-Ros, O.; Picornell, M.; Herranz, R.; Arias, J.; Ramasco, J. Influence of sociodemographic characteristics on human mobility. Sci. Rep. 2015, 5, 10075.
25. Bettencourt, L.; Lobo, J.; Strumsky, D.; West, G. Urban scaling and its deviations: Revealing the structure of wealth, innovation and crime across cities. PLoS ONE 2010, 5, e13541.
26. Bettencourt, L.; Lobo, J.; Helbing, D.; Kühnert, C.; West, G. Growth, innovation, scaling, and the pace of life in cities. Proc. Natl. Acad. Sci. USA 2007, 104, 7301–7306.
27. O’Brien, R. A caution regarding rules of thumb for variance inflation factors. Qual. Quant. 2007, 41, 673–690.
28. Breiman, L. Random Forests. Mach. Learn. 2001, 45, 5–32.
29. Friedman, J. Greedy function approximation: A gradient boosting machine. Ann. Stat. 2001, 29, 1189–1232.
30. Fotheringham, A.; Brunsdon, C.; Charlton, M. Geographically Weighted Regression: The Analysis of Spatially Varying Relationships; Wiley: Chichester, UK, 2002.
31. Hastie, T.; Tibshirani, R.; Friedman, J. The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd ed.; Springer: New York, NY, USA, 2009.
32. Roberts, D.; Bahn, V.; Ciuti, S.; Boyce, M.; Elith, J.; Guillera-Arroita, G.; Hauenstein, S.; Lahoz-Monfort, J.; Schröder, B.; Thuiller, W.; et al. Cross-validation strategies for data with temporal, spatial, hierarchical, or phylogenetic structure. Ecography 2017, 40, 913–929.
33. Liu, Y.; Li, Y. Revitalize the world’s countryside. Nature 2017, 548, 275–277.
34. Zhao, Y. Labor migration and earnings differences: The case of rural China. Econ. Dev. Cult. Change 1999, 47, 767–782.
35. Easterlin, R. Does economic growth improve the human lot? Some empirical evidence. In Nations and Households in Economic Growth: Essays in Honor of Moses Abramovitz; David, P., Reder, M., Eds.; Academic Press: New York, NY, USA, 1974; pp. 89–125.
36. Pun, N. Made in China: Women Factory Workers in a Global Workplace; Duke University Press: Durham, NC, USA, 2005.
37. Simini, F.; Barlacchi, G.; Luca, M.; Pappalardo, L. A Deep Gravity model for mobility flows generation. Nat. Commun. 2021, 12, 6576.
38. Moon, B. Paradigms in migration research: Exploring ‘moorings’ as a schema. Prog. Hum. Geogr. 1995, 19, 504–524.
39. Chen, Y.; Rosenthal, S. Local amenities and life-cycle migration: Do people move for jobs or fun? J. Urban Econ. 2008, 64, 519–537.
40. Zhou, J.; Hui, E. Housing prices, migration, and self-selection of migrants in China. Habitat Int. 2022, 119, 102479.
41. Fang, C.; Yu, D. China’s New Urbanization: Developmental Paths, Blueprints and Patterns; Springer: Cham, Switzerland, 2016.
42. Gu, H.; Liu, Z.; Shen, T. Spatial pattern and determinants of migrant workers’ interprovincial hukou transfer intention in China: Evidence from a national migrant population dynamic monitoring survey in 2016. Popul. Space Place 2020, 26, e2250.

| Variable Name | Description | Min | Max | Mean | Std. Dev. |
|---|---|---|---|---|---|
| Net Migration | Pre-Spring Festival net outflow (attraction proxy) | −1,451,142 | 3,192,970 | 10,074.20 | 630,415.70 |
| Population Scale | Permanent population in 2019 (10k) | 31.40 | 3208.90 | 461.25 | 389.71 |
| GDP Scale | Gross Domestic Product in 2019 (100m RMB) | 231.20 | 38,156.00 | 3319.40 | 4710.50 |
| Fiscal Expenditure | General public budget expenditure (100m RMB) | 31.80 | 8179.20 | 575.80 | 815.50 |
| Education Expenditure | Expenditure on education (100m RMB) | 4.50 | 1136.00 | 93.90 | 119.10 |
| Tech Expenditure | Expenditure on science and tech (100m RMB) | 0.20 | 548.40 | 18.40 | 54.05 |
| Average Wage | Annual average wage of employees (RMB) | 44,953 | 173,204 | 77,291.21 | 16,385.50 |
| PM2.5 | Annual average PM2.5 concentration (µg/m³) | 11.33 | 69.16 | 38.35 | 12.18 |
| Industrial Structure | Share of tertiary industry (%) | 28.33 | 83.52 | 49.15 | 8.24 |
| Minimum Wage | Minimum monthly wage standard (RMB) | 1180 | 2480 | 1517.62 | 200.46 |
| Unemployment Rate | Registered urban unemployment rate (%) | 0.04 | 3.14 | 0.60 | 0.41 |
| Gender Ratio | Ratio of males to females (%) | 96.37 | 130.05 | 104.53 | 4.52 |

| Model | RMSE | MAE | |
|---|---|---|---|
| OLS | | | |
| GWR | | | |
| RF (CV) | | | |
| GWRF (CV) | | | |
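
The two named error metrics in the model comparison table above can be computed as in the following sketch. The function names and the example vectors are illustrative assumptions; the table's numeric entries are not available in this text, so no model results are reproduced here.

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error: penalizes large misses more heavily."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))

def mae(y_true, y_pred):
    """Mean absolute error: average size of the miss, in the units of the target."""
    absolute = [abs(t - p) for t, p in zip(y_true, y_pred)]
    return sum(absolute) / len(absolute)

# Hypothetical observed and predicted net migration values for three cities.
observed = [1.0, 2.0, 3.0]
predicted = [1.0, 2.0, 5.0]
print(rmse(observed, predicted), mae(observed, predicted))
```

Because net migration spans several orders of magnitude across cities, RMSE will be dominated by the largest cities, so reporting MAE alongside it gives a more balanced picture.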
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Huang, R.; Shi, F.; Guo, H. Unveiling the Spatially Heterogeneous Driving Mechanisms of Net Migration in Chinese Cities: A Geographically Weighted Random Forest Approach. Sustainability 2026, 18, 3866. https://doi.org/10.3390/su18083866

