Water Functional Zoning Framework Based on Machine Learning: A Case Study of the Yangtze River Basin
Abstract
1. Introduction
2. Materials
2.1. Study Area
2.2. Source of Feature Data
3. Methods
3.1. Random Forest (RF)
3.2. Extreme Gradient Boosting (XGBoost)
3.3. Geographically Weighted Random Forest (GWRF)
3.4. Kruskal–Wallis H and Rank-Biserial Correlation
3.5. Chi-Square Test and Cramér’s V
3.6. Sample Data Extraction and Preprocessing
4. Results
4.1. Spatial Distribution of Characteristics in the Yangtze River Basin in 2010
4.2. Correlation Between Feature Variables and Water Functional Zone
4.3. Progressive Zoning Framework for Water Functional Zones
4.3.1. Model Parameter Optimization
4.3.2. Model Training Results and Model Selection
4.4. 2020 First-Level Water Functional Zoning Results
5. Discussion
5.1. Discussion of the Correlation Between Water Functional Zones and Feature Variables
5.2. Discussion of the Division Results of the First-Level Water Functional Zones in the Yangtze River Basin in 2020
5.3. Future Work and Limitations
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Appendix A
| Variable | Abbreviation | Variable Explanation |
|---|---|---|
| Cropland | CL | Cropland Proportion |
| Forest Land | FL | Forest Land Proportion |
| Shrubland | SL | Shrubland Proportion |
| Grassland | GL | Grassland Proportion |
| Waterbody | WB | Waterbody Proportion |
| Snow | SN | Snow Proportion |
| Bare Land | BL | Bare Land Proportion |
| Impervious Surface Area | ISA | Impervious Surface Proportion |
| Wetland | WL | Wetland Proportion |
| Electricity Consumption_Mean | EleCM | Annual Average Electricity Consumption |
| Light_Density_per_km2 | LigDen | Nighttime Light Intensity per Square Kilometer |
| NDVI_Min | NDVI | NDVI Minimum Value |
| PM2.5_Mean | PM2.5 | Average PM2.5 Value |
| NO2_Mean | NO2 | Average NO2 Value |
| Annual precipitation_Mean | APM | Annual Average Precipitation |
| Population_Density_per_km2 | PopDen | Total Population per Square Kilometer |
| GDP_per_km2 | GDP | Total GDP per Square Kilometer |
| DEM_Mean | DEM | Average DEM Value |
| Slope_Mean | Slope | Average Slope Value |
| Distance to provincial boundary | DisPb | Distance from Water Function Zone to Provincial Boundary |
| Intersect with the Provincial boundary | InterPb | Whether the Water Function Zone Intersects with Provincial Boundary |
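
InterPb is the one yes/no variable in the table above, so it is a natural candidate for the chi-square test and Cramér's V named in Section 3.5. The sketch below computes Cramér's V from a contingency table using only the standard formula $V = \sqrt{\chi^2 / (n \cdot \min(r-1, c-1))}$. The function name, the number of zone classes, and the counts are hypothetical and are not taken from the study.

```python
import math

def cramers_v(table):
    """Cramér's V for an r x c contingency table given as a list of rows.

    Assumes every row total and column total is non-zero.
    """
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]

    # Pearson chi-square statistic: sum of (observed - expected)^2 / expected
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected

    k = min(len(table), len(table[0])) - 1  # min(r - 1, c - 1)
    return math.sqrt(chi2 / (n * k))

# Hypothetical counts: rows = InterPb (no, yes), columns = four zone classes
counts = [
    [40, 30, 20, 10],
    [10, 20, 30, 40],
]
v = cramers_v(counts)
print(round(v, 4))
```

A value of 0 indicates no association between the two categorical variables and 1 indicates a perfect association, which makes the statistic comparable across tables of different sizes.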






References
- Yuan, H.; Shen, F.; Wei, K. Preliminary study on river function regionalization. Water Resour. Prot. 2011, 27, 13–16.
- Chen, W. Study on Optimum Layout of Water Function Areas—Taking Shaying River of Huaihe River Basin as an Example. Master’s Thesis, Northwest University, Xi’an, China, 2019.
- Long, D.; Pan, W. Stream protection and ecological rehabilitation. Adv. Sci. Technol. Water Resour. 2006, 26, 21–25.
- Wu, Y.; Wang, G.; Wu, Y.; Feng, H.; Shen, F.; Lei, S.; Shi, R. Methods of River Functional Zoning and Case Study. Adv. Water Sci. 2011, 22, 741–749.
- Huang, Y.F. Study on the Indicator System and Zoning Methods of Water Functional Areas in Fujian Province. Hydraul. Sci. Technol. 2016, 34–38, 44.
- TB 10099–2017; Code for Design of Railway Station and Terminal. China Railway Publishing House: Beijing, China, 2017.
- Zhang, X.; Ru, B. Water Functional Zoning and Management Recommendations for Nanhai District, Foshan City. Pearl River Navig. 2016, 90–91.
- Zhao, Y.; Ding, A.; Pan, C.; Xu, X.; Li, Y.; Li, J. Theory for River Functional Regionalization and A Case Study. Sci. Technol. Rev. 2013, 31, 60–64.
- Liu, F.; Guo, Y. Discussion on the Refinement of Water Functional Zoning and the Coordinated Optimization of Protection Zones in Poyang Lake. Yangtze River 2018, 49, 27–31.
- Liu, H.M.; Zhang, T.R.; Cen, D.H.; Lin, M.L. Water Function Zoning Adjustment Demonstration and a Case Analysis. Guangdong Water Resour. Hydropower 2018, 45–48, 51.
- Zhao, W.; Pang, Y.; Chen, Y.N. Study on adjustment of water environmental function zoning in Nanjing City. Water Resour. Prot. 2012, 28, 76–79.
- Hu, K.; Pang, Y.; Yu, H.; Li, Z.; Wang, M. Adjustment of water environmental functional zones in Taihu Lake. J. Hohai Univ. Nat. Sci. 2012, 40, 503–508.
- Luo, H.P.; Pang, Y.; Xu, L.Y. Preliminary study on adjustment scheme of water function zone in Wuxi. J. Water Resour. Water Eng. 2015, 26, 114–120.
- Rodrigues, C.; Veloso, M.; Alves, A.; Bento, C.L. Socioeconomic and functional zoning characterization in a city: A clustering approach. Cities 2025, 163, 106023.
- Song, X.; Pu, Y.; Liu, D.; Feng, Y. Mining Urban Functional Areas Using Pedestrians’ Movement Trajectories. Acta Geod. Cartogr. Sin. 2015, 44, 82–88.
- Luo, G.; Ye, J.; Wang, J.; Wei, Y. Urban Functional Zone Classification Based on POI Data and Machine Learning. Sustainability 2023, 15, 4631.
- Guangyu, Q. Urban Functional Area Identification and Spatial Structure Analysis Integrating Multi-Source Data—Taking Nanchang City as an Example; East China University of Technology: Nanchang, China, 2024.
- Li, Y.; Zhang, F.; Li, R.; Yu, H.; Chen, Y.; Yu, H. Comprehensive Ecological Functional Zoning: A Data-Driven Approach for Sustainable Land Use and Environmental Management—A Case Study in Shenzhen, China. Land 2024, 13, 1413.
- Zhang, Y.; Liu, S.; Yu, P.; Lu, Y.; Zhang, Y.; Zhang, J.; Chen, Y. Delineating Ecological Functional Zones and Grades for Multi-Scale Ecosystem Management. Land 2024, 13, 1624.
- Shao, S.; Yang, Y. Identification of ecological improvement zones in different ecological functional zones in north-west Hubei, China. Ecol. Indic. 2023, 155, 111032.
- Sunde, M.; Diamond, D.; Elliott, L. Ecological Systems Classification: Integrating Machine Learning, Ancillary Modeling, and Sentinel-2 Satellite Imagery. Remote Sens. 2024, 16, 4440.
- Li, N.; Lu, H. Regionalization method for water resources utilization based on cluster analysis. J. Shenyang Univ. Technol. 2021, 43, 425–431.
- Wang, S.C. Study on Water Functional Zoning Method Based on Dynamic Fuzzy Clustering. Hydraul. Sci. Technol. 2016, 29–33. Available online: https://d.wanfangdata.com.cn/periodical/CiFQZXJpb2RpY2FsQ0hJU29scjlTMjAyNTEwMjEwOTUwNDYSEHNodWlsa2oyMDE2MDQwMDgaCGNkN29jNXVi (accessed on 29 December 2025).
- Enguehard, L.; Falco, N.; Schmutz, M.; Newcomer, M.E.; Ladau, J.; Brown, J.B.; Bourgeau-Chavez, L.; Wainwright, H.M. Machine-Learning Functional Zonation Approach for Characterizing Terrestrial–Aquatic Interfaces: Application to Lake Erie. Remote Sens. 2022, 14, 3285.
- Bai, H.; Zhong, Y.; Ma, N.; Kong, D.; Mao, Y.; Feng, W.; Wu, Y.; Zhong, M. Changes and drivers of long-term land evapotranspiration in the Yangtze River Basin: A water balance perspective. J. Hydrol. 2025, 653, 132763.
- Wang, F.; Zhan, C.S.; Pan, C.Z.; Wang, H.X. Theoretical Study on River Functional Zoning. China Rural Water Resour. Hydropower 2009, 33–36. Available online: https://kns.cnki.net/kcms2/article/abstract?v=-djcopRf0qGct_YrvF591ob_eWcKPxFErmp58P2ujBOBeeJLBROD9NJaofhAKgTXvnkqsC4Q_4xk5X5r4oheI-ob8fDm9rmCi--kGuksRMzCAsV9ctXfd4V3kiPiyaGpXVZ6AhwuM6ks0266eMnMGWLCsbCNk19kwJh4frVcCC9TOGDpTrcinkLB27MjBiSP&uniplatform=NZKPT&captchaId=915ace3a-bbfc-4b49-a7aa-9fbd0c943535 (accessed on 29 December 2025).
- Approval of the National Major Rivers and Lakes Water Functional Zoning (2011–2030) by the State Council. China Water Resour. Bull. 2011. Available online: https://www.waizi.org.cn/law/9451.html (accessed on 29 December 2025).
- Bai, R.; Li, T.; Huang, Y.; Li, J.; Wang, G. An efficient and comprehensive method for drainage network extraction from DEM with billions of pixels using a size-balanced binary search tree. Geomorphology 2015, 238, 56–67.
- Didan, K. MOD13A3 MODIS/Terra Vegetation Indices Monthly L3 Global 1km SIN Grid V006. In NASA Land Processes Distributed Active Archive Center; NASA: Washington, DC, USA, 2015.
- Wei, J.; Li, Z.Q. ChinaHighPM2.5: High-Resolution and High-Quality Ground-Level PM2.5 Dataset for China (2000–2023); National Tibetan Plateau Data Center: Beijing, China, 2024.
- Wei, J.; Li, Z.Q. ChinaHighNO2: High-Resolution and High-Quality Ground-Level NO2 Dataset for China (2008–2023); National Tibetan Plateau Data Center: Beijing, China, 2024.
- Farr, T.G.; Rosen, P.A.; Caro, E.; Crippen, R.; Duren, R.; Hensley, S.; Kobrick, M.; Paller, M.; Rodriguez, E.; Roth, L.; et al. The Shuttle Radar Topography Mission. Rev. Geophys. 2007, 45, RG2004.
- Chen, J.; Gao, M. Global 1 km × 1 km gridded revised real gross domestic product and electricity consumption during 1992–2019 based on calibrated nighttime light data. Sci. Data 2022, 9, 202.
- Hu, J.; Miao, C. CHM_PRE V2: An upgraded high-precision gridded precipitation dataset for the Chinese mainland considering spatial autocorrelation and covariates (V2.0.2). Zenodo 2025. Available online: https://zenodo.org/records/14634575 (accessed on 7 January 2026).
- Wu, Y.; Shi, K.; Chen, Z.; Liu, S.; Chang, Z. An improved time-series DMSP-OLS-like data (1992–2024) in China by integrating DMSP-OLS and SNPP-VIIRS. Harv. Dataverse 2021. Available online: https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/GIYGJU (accessed on 7 January 2026).
- Qian, L. A Data-Driven Design for Fault Detection of Wind Turbines Using Random Forests and XGBoost. Master’s Thesis, Zhejiang University, Hangzhou, China, 2018.
- Niu, X.; Ling, F. Study on Personal Credit Risk Assessment Model Based on Hybrid Learning. J. Fudan Univ. Nat. Sci. 2021, 60, 703–719.
- Wan, C.; Li, X.; Yang, Z.; Du, F.; Chen, X. Comparative analysis of rural water supply risk identification models based on machine learning algorithms. J. China Inst. Water Resour. Hydropower Res. 2025, 23, 297–306.
- Yue, H.; Chen, J. Interpretable spatial machine learning for understanding spatial heterogeneity in factors affecting street theft crime. Appl. Geogr. 2025, 175, 103503.
- Zhang, Y.; Ge, J.; Wang, S.; Dong, C. Optimizing urban green space configurations for enhanced heat island mitigation: A geographically weighted machine learning approach. Sustain. Cities Soc. 2025, 119, 106087.
- Gautham, V.M.; Sumesh, A.; Jithin, E.V.; Rameshkumar, K.; Thekkuden, D.T. Evaluation of Time-Domain Acoustic Signature in TIG Welding of 5083 Aluminum Alloy: A Methodological Comparison of Feature Reduction Approaches. Results Eng. 2025, 26, 105062.
- Xu, S.; Shi, R.; Zhao, Q. Research on River Functional Zoning. Sci. China Press 2009, 39, 1521–1528.
- Garg, A.; Ramamurthi, N.; Das, S.S. Addressing Imbalanced Classification Problems in Drug Discovery and Development Using Random Forest, Support Vector Machine, AutoGluon-Tabular, and H2O AutoML. J. Chem. Inf. Model. 2025, 65, 3976–3989.
- Ghosh, K.; Bellinger, C.; Corizzo, R.; Branco, P.; Krawczyk, B.; Japkowicz, N. The class imbalance problem in deep learning. Mach. Learn. 2024, 113, 4845–4901.
- Ma, Y.; Kong, D.; Ye, X.; Ding, Y. Application and comparison of type 2 diabetes with comorbid hypertension classification prediction models based on random forest and XGBoost algorithms. J. Guangdong Med. Coll. 2024, 42, 523–534.
- Wu, D.; Zhang, Y.; Xiang, Q. Geographically weighted random forests for macro-level crash frequency prediction. Accid. Anal. Prev. 2024, 194, 107370.
- Yuan, H.; Luo, X. Method and practice of water function zone division. Yangtze River 2001, 32, 13–15.
- Hu, L.; Xing, J. The division of water function areas of Jiangxi Province and the analysis of water quality which reach the standard or not. Jiangxi Hydraul. Sci. Technol. 2003, 29, 154–157.
- Li, Z. Research on water function regionalization hierarchical classification system. Glob. Seabuckthorn Res. Dev. 2016, 26–28.
- Yang, Y.; Fu, H.; Wu, T. Protection and Management of Water Function Zone in the Yangtze River Basin. Technol. Econ. Change 2018, 2, 45–52.










| Category | Subcategory | Specific Feature Data |
|---|---|---|
| Natural Ecology Data | Hydrological Data | Water Quality, Average Water Flow, Sediment Concentration, Runoff Depth. |
| Natural Ecology Data | Meteorological Data | Air Temperature, Precipitation, Evapotranspiration (ET), Sunshine Duration, PM2.5, NO2. |
| Natural Ecology Data | Land Use Data | Forest, Shrubland, Grassland, Waterbody, Snow, Bare Land, Wetland. |
| Natural Ecology Data | Vegetation Distribution Data | NDVI, EVI, FVC. |
| Natural Ecology Data | Topographic Data | Drainage Area, River Length, Drainage Density, Distance to Provincial Boundary, DEM, Slope. |
| Socioeconomic and Human Activity Data | Socioeconomic Data | GDP, Population, Gross Output Value of Industry and Agriculture. |
| Socioeconomic and Human Activity Data | Human Activity Data | Nighttime Light, POI, Electricity Consumption, Impervious Surface, Cropland, Location of Water Withdrawal Points and Water Withdrawal Amount, Location of Wastewater Discharge Points and Wastewater Discharge Volume. |

| Name | Time | Spatial Resolution | Source |
|---|---|---|---|
| Land Use Data | 2000, 2010, 2020 | 30 m | http://globallandcover.com/ |
| NDVI Data [29] | Feb 2000–Dec 2024 | 1 km | https://www.earthdata.nasa.gov/ |
| PM2.5 Data [30] | 2000–2023 | 1 km | https://data.tpdc.ac.cn/home |
| NO2 Data [31] | 2008–2023 | 10 km (2008–2018); 1 km (2019–2023) | https://data.tpdc.ac.cn/home |
| DEM and Slope Data [32] | Null | 30 m | https://www.earthdata.nasa.gov/ |
| Electricity Consumption Data [33] | 1992–2019 | 1 km | https://figshare.com |
| GDP [33] | 1992–2019 | 1 km | https://figshare.com |
| Annual Precipitation Data [34] | 1 January 1960–31 December 2023 | 0.1° | https://zenodo.org |
| Population Data | 2010–2020 | 100 m | https://hub.worldpop.org |
| Nighttime Light Data [35] | 1992–2023 | 1 km | https://dataverse.harvard.edu |

| Parameter | Explanation | Bayesian Optimization Range | Optimal Parameter |
|---|---|---|---|
| max_depth | Maximum depth of the tree | (5, 15) | 12 |
| min_samples_split | Minimum number of samples required to split an internal node | (2, 15) | 6 |
| min_samples_leaf | Minimum number of samples required to be at a leaf node | (1, 10) | 5 |
| max_features | Maximum number of features considered when looking for the best split | (0.1, 1) | 0.67 |
| n_estimators | Number of trees in the forest | (10, 300) | 121 |
| max_samples | Maximum number of samples used to train each estimator | (0.7, 1) | 0.98 |
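
The parameter names in the table above coincide with keyword arguments of scikit-learn's `RandomForestClassifier`, so the optimal column can be written down as a configuration. This is a minimal sketch under that assumption; the source does not state which library was used, and `rf_params`, `rf_model`, and the `random_state` value are illustrative names and choices, not the authors'.

```python
# Optimal values copied from the table above; variable names are illustrative.
rf_params = {
    "max_depth": 12,
    "min_samples_split": 6,
    "min_samples_leaf": 5,
    "max_features": 0.67,  # a float is read as a fraction of the features per split
    "n_estimators": 121,
    "max_samples": 0.98,   # a float is read as a fraction of rows per tree (bootstrap=True, the default)
}

try:
    from sklearn.ensemble import RandomForestClassifier

    # random_state is an assumption added only for reproducibility.
    rf_model = RandomForestClassifier(**rf_params, random_state=0)
except ImportError:
    rf_model = None  # scikit-learn is not installed; the dict above is still usable

print(rf_params)
```

With scikit-learn available, `rf_model.fit(X, y)` would then train on a feature matrix `X` built from the Appendix A variables and zone labels `y`, both of which are placeholders here.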

| Parameter | Explanation | Bayesian Optimization Range | Optimal Parameter |
|---|---|---|---|
| max_depth | Maximum depth of the tree | (3, 15) | 5 |
| n_estimators | Number of boosting trees | (100, 500) | 111 |
| learning_rate | Learning rate | (0.01, 0.3) | 0.288 |
| subsample | Subsample ratio of the training instances | (0.5, 1) | 0.879 |
| colsample_bytree | Subsample ratio of columns when constructing each tree | (0.5, 1) | 0.597 |
| gamma | Minimum loss reduction required to make a further partition | (0, 5) | 1.459 |
| min_child_weight | Minimum sum of instance weight needed in a child node | (0, 5) | 4 |
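
As a quick consistency check on the table above, the sketch below stores each search range together with its reported optimum, confirms that every optimum lies inside its range, and then passes the optima to `xgboost.XGBClassifier` when that package is available. The variable names are hypothetical, and treating the table as an `XGBClassifier` configuration is an assumption rather than something the source confirms.

```python
# (lower bound, upper bound, reported optimum), copied from the table above.
xgb_table = {
    "max_depth": (3, 15, 5),
    "n_estimators": (100, 500, 111),
    "learning_rate": (0.01, 0.3, 0.288),
    "subsample": (0.5, 1, 0.879),
    "colsample_bytree": (0.5, 1, 0.597),
    "gamma": (0, 5, 1.459),
    "min_child_weight": (0, 5, 4),
}

# Keep only the optimum for each parameter.
xgb_params = {name: best for name, (_, _, best) in xgb_table.items()}

# Any parameter whose optimum falls outside its search range would be listed here.
out_of_range = [
    name for name, (low, high, best) in xgb_table.items() if not low <= best <= high
]

try:
    from xgboost import XGBClassifier

    xgb_model = XGBClassifier(**xgb_params)
except Exception:
    # xgboost is missing or its native library failed to load.
    xgb_model = None

print(out_of_range)
```

An empty `out_of_range` list means the reported optima are consistent with the stated Bayesian optimization ranges.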

| Model | Accuracy | Precision | Recall | F1-Score |
|---|---|---|---|---|
| RF | 0.7322 | 0.7570 | 0.7427 | 0.7481 |
| XGBoost | 0.7128 | 0.7548 | 0.7063 | 0.7245 |
| GWRF | 0.7477 | 0.7239 | 0.7316 | 0.7274 |
| RF+GWRF | 0.7808 | 0.7921 | 0.7965 | 0.7943 |
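
The table above does not say how the multi-class precision, recall, and F1 values were averaged. The sketch below shows one common convention, macro averaging, computed from a confusion matrix in plain Python. The function name and the 3-class matrix are hypothetical. Under macro averaging the F1 score is the mean of the per-class F1 values, so it generally differs from the harmonic mean of the averaged precision and recall; the RF and XGBoost rows show such a difference, which is consistent with per-class averaging but does not prove it.

```python
def macro_metrics(confusion):
    """Accuracy and macro-averaged precision, recall, and F1.

    `confusion` is a square matrix with rows = true class, columns = predicted class.
    """
    k = len(confusion)
    total = sum(sum(row) for row in confusion)
    accuracy = sum(confusion[i][i] for i in range(k)) / total

    precisions, recalls, f1s = [], [], []
    for i in range(k):
        tp = confusion[i][i]
        predicted = sum(confusion[r][i] for r in range(k))  # column total
        actual = sum(confusion[i])                          # row total
        p = tp / predicted if predicted else 0.0
        r = tp / actual if actual else 0.0
        f = 2 * p * r / (p + r) if (p + r) else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f)

    return accuracy, sum(precisions) / k, sum(recalls) / k, sum(f1s) / k

# Hypothetical 3-class confusion matrix (not data from the study)
confusion = [
    [8, 2, 0],
    [1, 6, 3],
    [0, 2, 8],
]
acc, prec, rec, f1 = macro_metrics(confusion)
harmonic = 2 * prec * rec / (prec + rec)
print(round(acc, 4), round(prec, 4), round(rec, 4), round(f1, 4), round(harmonic, 4))
```

For this matrix the macro F1 and the harmonic mean of macro precision and recall differ in the third decimal place, which illustrates why the F1 column need not follow directly from the precision and recall columns.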

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Liu, W.; Sun, Y.; Deng, F.; Wu, B.; Zhang, X.; Sun, M.; Li, L.; Li, H.; Yuan, Y. Water Functional Zoning Framework Based on Machine Learning: A Case Study of the Yangtze River Basin. Water 2026, 18, 209. https://doi.org/10.3390/w18020209

