Article

Water Functional Zoning Framework Based on Machine Learning: A Case Study of the Yangtze River Basin

1
School of Computer and Information Engineering, Xiamen University of Technology, Xiamen 361024, China
2
Chinese Academy of Environmental Planning, Water Ecology Research Center, Beijing 100000, China
3
China Sciences Landsenses (Amoy) Ecology and Environment Group, Xiamen 361000, China
*
Author to whom correspondence should be addressed.
Water 2026, 18(2), 209; https://doi.org/10.3390/w18020209
Submission received: 22 November 2025 / Revised: 1 January 2026 / Accepted: 9 January 2026 / Published: 13 January 2026

Abstract

Water functional zoning plays a crucial role in water resource allocation, pollution prevention, and ecological protection. With the increasing intensity of human activities, there is a significant mismatch between current water functional zoning and the economic, social development needs and ecological protection goals. Existing water functional zoning methods mainly rely on expert experience for qualitative judgment, which is highly subjective and inefficient. In response, this paper presents a transferable quantitative feature system and introduces a machine learning-based progressive zoning framework for water functions, validated through a case study of the Yangtze River Basin. The results show that the overall accuracy of the framework is 0.78, which is 4–7% higher compared to traditional single models. In terms of spatial distribution, the transformation of protection and reserved zones in 2020 mainly occurred in the middle and lower reaches, where human activities are frequent, particularly in Sichuan and Jiangxi provinces. The development zones are highly concentrated in the downstream areas, with some regions transitioning into protection or reserved zones, mainly in Hubei and Chongqing provinces. Adjustments to buffer zones are primarily concentrated along inter-provincial boundary areas, such as the junction between Hubei and Anhui provinces. This framework helps managers quickly identify key areas for optimizing water functional zones, providing valuable reference for the precise management of water resources and the formulation of ecological protection strategies in the basin.

1. Introduction

Rivers are an important component of water resources and the material basis for human survival and development [1]. As human activities intensify, the conflict and disconnect between water functional zoning, economic and social development needs, water resource utilization, and ecological protection goals have severely restricted the rational development and effective protection of water resources [2]. In this context, the rational development and effective utilization of rivers have become important issues related to resource management, environmental sustainability, economic and social development, and ecological and environmental safety [3]. Currently, water functional zoning mainly relies on qualitative judgment, supplemented by some quantitative methods. However, many key zoning indicators face challenges in quantification. Therefore, conducting systematic, quantitative, and universal research on water functional zoning not only helps clarify the functional positioning of river sections in spatial patterns but also enhances the efficiency of water resource utilization.
With the increasing availability of multi-source data such as land use, nighttime light intensity, population density, and regional GDP, a solid data foundation has been laid for water functional zoning research. For example, Wu Yongxiang et al. [4] used data on urban distribution, water quality, and intake locations to establish a zoning plan for the Taipu River; Huang Yongfu [5] used data on terrain, average annual precipitation, and population density to develop a zoning plan for the Jinjiang River Basin. It has been observed that different scholars use varying evaluation indicators in the zoning process across different regions, and a unified standard for feature selection has not been established. Therefore, there is a need to build a quantitative feature system to make water functional zoning more scientific, standardized, and practical.
Regarding water functional zoning methods, many scholars have used techniques such as system analysis and the elimination method, combined with the “Water Functional Zoning Standards” [6], to explore zoning approaches [7,8,9,10]. Although these qualitative analysis methods provide a valuable reference, they are highly subjective and arbitrary. Other scholars emphasize the use of mathematical water quality models, with adjustments based on quantitative calculations, to support water functional zoning. For instance, in studies conducted in regions such as Taihu, Nanjing, and Wuxi, scholars commonly used 2D hydrodynamic–water quality coupled models to accurately delineate the boundaries of pollution control areas and transition zones [11,12,13]. However, these methods have low computational efficiency and are limited when dealing with highly nonlinear relationships. The rapid development of machine learning has provided innovative technical support for functional zoning studies. In urban functional zoning, some researchers employed unsupervised learning methods such as GMM and K-Means to investigate the functional zoning of Lisbon and Hefei [14,15], while others used supervised learning methods such as KStar and LightGBM to explore the functional zoning of Nanning and Nanchang [16,17]. In ecological functional zoning studies, many researchers have utilized clustering methods [18,19,20] such as SOFM (Self-Organizing Feature Map) and SOM (Self-Organizing Map) to study ecological functional zoning in various regions. Some researchers have also applied supervised machine learning methods to ecological functional zoning. For example, Michael Sunde et al. [21] used the Random Forest (RF) model to classify the ecological functional zones in Arkansas, USA.
In the study of water functional zoning, some researchers employed K-means [22] and fuzzy clustering methods [23] to classify the water functional zones in the Jinjiang and Daduhe River basins. Léa Enguehard et al. [24] used hierarchical clustering to perform functional zoning for Lake Erie. However, since clustering is an unsupervised learning method, it still requires manual intervention to determine the specific water functional categories. Currently, there is relatively little research that directly applies supervised learning to water functional zoning. Therefore, how to utilize supervised classification models to scientifically classify and reasonably plan large-scale water functional zones remains an important direction for future research.
This paper aims to propose a transferable water functional zoning framework and validate it using the Yangtze River Basin as a case study. The specific research contents include the following: (1) Analyzing the correlation between first-level water functional zones and relevant characteristic variables in the Yangtze River Basin, identifying the main influencing factors of different types of water functional zones. (2) Training the 2010 Yangtze River Basin first-level water functional zone data using three supervised classification models (RF, GWRF, and XGBoost) and proposing a Progressive Zoning Framework. (3) Predicting the spatial distribution of first-level water functional zones in the Yangtze River Basin in 2020 based on this framework and analyzing the changes in water functional zones between 2010 and 2020.

2. Materials

2.1. Study Area

The Yangtze River is the third-longest river in the world and the longest in Asia, originating from the highest peak of the Tanggula Mountains, with a total length of approximately 6300 km. As the largest river basin in China, the Yangtze River Basin covers an extensive area of 1.8 million square kilometers, accounting for about 19% of China’s land area [25] (Figure 1). The Yangtze River Basin is rich in water resources and plays an extremely important role in China’s overall water resource distribution. The basin contains numerous rivers, lakes, and a well-developed water system, with a relatively balanced spatiotemporal distribution of water resources. This has provided a solid foundation for the socio-economic development and ecological protection of the surrounding regions. However, with the rapid economic and social development in the Yangtze River Basin, changes in river conditions, and the innovation of water management concepts, the original integrated basin planning can no longer meet the requirements for the sustainable use of river functions and water resources [26]. Therefore, water functional zoning plays an important role in basin management and water resource protection.
The water functional data used in this study comes from the “National Major Rivers and Lakes Water Functional Zoning (2011–2030)” [27], which includes Protection Zones, Reserved Zones, Buffer Zones, and Development Zones. Among these, the Development Zones encompass seven categories: drinking water source areas, industrial water use areas, agricultural water use areas, fishery water use areas, recreational water use areas, transition zones, and pollution control zones. In this study, all these categories are collectively referred to as Development Zones.
Protection Zones refer to waters that are of significant importance for the protection of water resources, natural ecosystems, and rare or endangered species, and thus require designated protection. Reserved Zones refer to waters with a low current level of resource development and utilization, reserved for future sustainable water use. Development Zones are waters designated to meet the functional needs of urban living, industrial and agricultural production, fishery, recreation, etc. Buffer Zones are waters designated to coordinate water use relations in regions with inter-provincial water conflicts or areas with significant water usage issues.
This study uses the first-level hydrological units of the Yangtze River Basin, based on the dataset created by the Tsinghua University team [28], defining a total of 1398 first-level water functional zones. These include 171 Protection Zones, 398 Reserved Zones, 728 Development Zones, and 101 Buffer Zones (Figure 2).

2.2. Source of Feature Data

To address the issue of indicator heterogeneity in water functional zoning, this study proposes a quantitative indicator system (Table 1). The feature variable data used in this study are shown in Table 2. The land use data come from GlobeLand and include 10 primary types: cropland, forest, grassland, shrubland, wetland, waterbody, tundra, impervious surface, bare land, and snow. Tundra is not used in this study. NDVI data come from NASA’s monthly product; the annual average NDVI is calculated from the 12 monthly NDVI rasters. PM2.5 and NO2 data are provided by the Qinghai–Tibet Plateau Scientific Data Center. DEM and Slope data are provided by NASA and USGS at a 30 m resolution and are resampled to a 1 km spatial resolution. GDP and electricity consumption data come from datasets published on the FigShare platform. Precipitation data are from Zenodo, and population data are from WorldPop. Nighttime light data are from the HARVARD platform. All datasets are for 2010 and 2020, except GDP and electricity consumption, which are for 2010 and 2019.

3. Methods

This study proposes a Progressive Zoning Framework based on machine learning models, where the RF model is first used to delineate the buffer zones, followed by the GWRF model to delineate the protection zones, reserved zones, and development zones. Additionally, this study uses the Yangtze River Basin as a typical case to analyze the correlation between water functional zones and feature variables, and predicts the spatial distribution of first-level water functional zones for 2020. The technical approach is divided into three core modules: data, methods, and results, as shown in Figure 3.
Specifically, this study first uses seven-level hydrological units as the smallest division to delineate the water functional zones in the Yangtze River Basin. Regional statistical methods are then applied to quantify the feature variables, creating a sample set for model training. Based on this, the KW-RBC and Chi-V methods are employed for correlation analysis, and feature optimization is performed. Subsequently, model training is conducted, and a progressive zoning framework is proposed. Finally, this framework is applied to predict the spatial distribution of the first-level water functional zones in the Yangtze River Basin for 2020, and the spatiotemporal evolution of water functional zones from 2010 to 2020 is illustrated by combining the spatial distribution results of 2010.

3.1. Random Forest (RF)

Random Forest is an ensemble learning method, where multiple weak models are combined to form a strong model through a specific combination approach [36]. In classification tasks, its core idea is to use the bootstrap sampling method during the training phase. This method involves repeatedly drawing samples with replacement from the original training set to generate multiple training subsets that are not completely identical. Each training subset is used to independently train a decision tree, and during the construction process, a subset of features is randomly selected to participate in node splitting, thus introducing additional model diversity. This reduces the correlation between trees and effectively mitigates overfitting. The dual randomness mechanism, including both sample randomness and feature randomness, enables the random forest to have strong generalization ability and noise resistance [37].
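The dual randomness described above can be illustrated with a minimal sketch that uses only the Python standard library. It draws one bootstrap replicate and one random feature subset; the function names and the choice of 4 candidate features per split are illustrative assumptions, not settings reported in this study.

```python
import random

def bootstrap_indices(n_samples, rng):
    """Draw n_samples indices with replacement (one bootstrap replicate)."""
    return [rng.randrange(n_samples) for _ in range(n_samples)]

def random_feature_subset(n_features, max_features, rng):
    """Pick the candidate features considered at a single node split."""
    return sorted(rng.sample(range(n_features), max_features))

rng = random.Random(0)
n_samples, n_features = 6456, 21  # sample and feature counts from Section 3.6

idx = bootstrap_indices(n_samples, rng)
# Share of distinct samples in the replicate; approaches 1 - 1/e for large n.
in_bag_fraction = len(set(idx)) / n_samples
# Hypothetical setting: 4 candidate features per split (roughly sqrt(21)).
candidates = random_feature_subset(n_features, 4, rng)
```

Because each tree sees a different replicate and different candidate features at each split, the trees make partly independent errors, which is what the averaging step relies on.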

3.2. Extreme Gradient Boosting (XGBoost)

Extreme Gradient Boosting (XGBoost) is an efficient machine learning ensemble algorithm [38]. Its core idea is to combine multiple weak classifiers in a weighted manner to create a strong classification model. During training, XGBoost optimizes a loss function with a regularization term to improve model performance. This loss function consists of both an error term and a model complexity term, which minimizes prediction error while preventing overfitting. The algorithm uses a second-order approximation of the objective function via Taylor expansion, allowing the model to more accurately assess the loss of samples in each iteration, thereby improving prediction accuracy. Compared to traditional gradient boosting methods, XGBoost has stronger stability and robustness when handling structured data, high-dimensional feature spaces, and incomplete data. It also maintains good generalization ability even when the sample size is limited or the data quality is low. The algorithm process of XGBoost is as follows:
$$D = \{(x_i, y_i) \mid i = 1, 2, \ldots, N\} \tag{1}$$
Let Formula (1) represent the input training data, where $x_i$ is the feature vector of the $i$-th sample and $y_i$ is its target class (Protection Area, Reserved Area, Buffer Area, Development Area).
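$$L(\phi) = \sum_{i=1}^{N} l\left(y_i, \hat{y}_i^{(t)}\right) + \sum_{k=1}^{t} \Omega(f_k) \tag{2}$$
$$\Omega(f_k) = \gamma T + \frac{1}{2} \lambda \lVert w \rVert^2 \tag{3}$$
$$l(\phi) \approx \sum_{i=1}^{N} \left[ g_i f_t(x_i) + \frac{1}{2} h_i f_t^2(x_i) \right] \tag{4}$$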
Formula (2) is the objective function to be minimized, consisting of two parts. Formula (3) is the regularization term, used to control the complexity of the tree, where $\gamma T$ is used to control the number of leaf nodes, and $\lambda \lVert w \rVert^2$ controls the magnitude of the leaf weights through L2 regularization. Formula (4) is the second-order Taylor expansion of the loss function, where $g_i$ is the first derivative, and $h_i$ is the second derivative.
$$W_j = -\frac{G_j}{H_j + \lambda} \tag{5}$$
$$L_t = -\frac{1}{2} \sum_{j=1}^{T} \frac{G_j^2}{H_j + \lambda} + \gamma T \tag{6}$$
The approximate loss terms corresponding to each sample are accumulated. Since each sample will eventually fall into a specific leaf node, the samples within the same leaf node can be processed together, with $G_j$ and $H_j$ denoting the sums of $g_i$ and $h_i$ over the samples in leaf $j$. In this way, the objective function can be transformed into a quadratic function with respect to the leaf node weights $W_j$. Using the quadratic function extremum formula, the optimal leaf weight can be directly obtained (Formula (5)), and the corresponding value of the objective function can be determined (Formula (6)).
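As a worked illustration of Formulas (5) and (6), the sketch below computes the gradient sums for a single leaf and evaluates the optimal leaf weight and structure score. It assumes a binary logistic loss purely for illustration (the study itself is multi-class), and the function names, labels, and the values of λ and γ are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def leaf_stats(labels, raw_scores):
    """Sum first (g) and second (h) derivatives of the logistic loss in one leaf."""
    G = H = 0.0
    for y, s in zip(labels, raw_scores):
        p = sigmoid(s)
        G += p - y          # g_i: first derivative w.r.t. the raw score
        H += p * (1.0 - p)  # h_i: second derivative w.r.t. the raw score
    return G, H

def optimal_leaf_weight(G, H, lam):
    """Formula (5): minimiser of G*w + 0.5*(H + lam)*w**2."""
    return -G / (H + lam)

def structure_score(leaves, lam, gamma):
    """Formula (6) for a tree described by (G_j, H_j) pairs, one per leaf."""
    return -0.5 * sum(G * G / (H + lam) for G, H in leaves) + gamma * len(leaves)

# Toy leaf: four samples, all with a previous raw score of 0 (p = 0.5).
labels = [1, 1, 0, 1]
raw_scores = [0.0, 0.0, 0.0, 0.0]
G, H = leaf_stats(labels, raw_scores)
w = optimal_leaf_weight(G, H, lam=1.0)
obj = structure_score([(G, H)], lam=1.0, gamma=0.1)
```

For this toy leaf, G = -1 and H = 1, so the weight is positive: the leaf pushes the raw score toward the majority class, and a larger λ shrinks that push.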

3.3. Geographically Weighted Random Forest (GWRF)

Traditional Random Forest (RF) is a global, ‘non-spatial’ model that cannot handle spatial non-stationarity, which may reduce its prediction accuracy [39]. To address this limitation, the Geographically Weighted Random Forest (GWRF) model was developed. This machine learning model accounts for spatial heterogeneity and was first applied in 2019 [40]. Its core idea is to combine the spatially local weighting concept from Geographically Weighted Regression (GWR) with the ensemble learning capability of Random Forest. By constructing local Random Forest models for each spatial location, it better captures the spatial heterogeneity present in geographic data. The choice of bandwidth parameter is crucial in the GWRF model, as it directly determines the weight distribution of local samples and the model’s ability to capture spatial heterogeneity. In this study, the Golden Section Search method commonly used in GWR [40] is employed to search for the optimal bandwidth. The core process is as follows:
$$W_{ij} = \exp\left(-\frac{d_{ij}^2}{2b^2}\right) \tag{7}$$
where $i$ denotes, in this study, the geographic center point (longitude and latitude coordinates) of a seven-level hydrological unit. The GWRF model assigns weights to neighboring samples based on the spatial weight function (Formula (7)) and then trains a local Random Forest. Here, $d_{ij}$ represents the spatial distance between location $i$ and sample point $j$, and $b$ is the bandwidth parameter that controls the size of the neighborhood.
$$\hat{f}_i(x) = \frac{1}{T} \sum_{t=1}^{T} h_t(x; D_i) \tag{8}$$
At location $i$, a neighborhood training set $D_i$ is generated via weighted sampling, and a local Random Forest model is trained. In this context, $T$ refers to the number of trees, and $h_t$ denotes the $t$-th decision tree.
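The weighting and neighborhood sampling steps can be sketched as follows. This is a minimal illustration, not the GWRF implementation used in the study: it assumes sample coordinates are already projected to metres so that distances are comparable with the bandwidth, and the function names and toy samples are hypothetical.

```python
import math
import random

def gaussian_weight(distance, bandwidth):
    """Formula (7): kernel weight for a sample at the given distance."""
    return math.exp(-(distance ** 2) / (2.0 * bandwidth ** 2))

def local_training_set(target, samples, bandwidth, size, rng):
    """Weighted resampling of (x, y, label) samples around a target location."""
    weights = [gaussian_weight(math.dist(target, (x, y)), bandwidth)
               for x, y, _ in samples]
    return rng.choices(samples, weights=weights, k=size)

# Toy samples at 0 km, 5 km and 50 km from the target (coordinates in metres).
samples = [(0.0, 0.0, "Protection"),
           (5000.0, 0.0, "Reserved"),
           (50000.0, 0.0, "Development")]
local = local_training_set((0.0, 0.0), samples, 6437.7, 200, random.Random(1))
far_count = sum(1 for _, _, label in local if label == "Development")
```

With the 6437.7 m bandwidth reported in Section 4.3.1, the sample 50 km away receives a negligible weight and is effectively excluded from the local training set, while the sample 5 km away still contributes substantially.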

3.4. Kruskal–Wallis H and Rank-Biserial Correlation

In this study, to explore the relationship between a discrete dependent variable and continuous feature variables, a combined approach of the Kruskal–Wallis H and Rank-Biserial Correlation (RBC) is used, referred to as the KW-RBC method.
First, the Kruskal–Wallis H test is a non-parametric statistical method commonly used to test whether there are statistically significant differences between three or more independent samples. Unlike traditional ANOVA, this method does not rely on the assumptions of normality and homogeneity of variance, making it suitable for situations where the distribution deviates from normality. In a binary classification scenario, it reduces to the Mann–Whitney U test [41]. Second, to measure the direction and strength of the correlation, this study introduces the Rank-Biserial Correlation (RBC). Its definition is based on the Mann–Whitney U statistic, with a range of [−1, 1]. The absolute value represents the effect size (the closer the value is to 1, the stronger the effect), and the sign indicates the direction of the difference. Because of this equivalence in the binary case, the RBC can be considered an effect size indicator for the Kruskal–Wallis test in binary classification. Their calculation methods are as follows:
$$H = \frac{12}{N(N+1)} \sum_{i=1}^{k} \frac{R_i^2}{n_i} - 3(N+1) \tag{9}$$
$$U = n_1 n_0 + \frac{n_1(n_1+1)}{2} - R_1 \tag{10}$$
Let the dependent variable be divided into $k$ categories, where $n_i$ is the sample size of the $i$-th category and $N$ is the total sample size. $R_i$ is the rank sum of the $i$-th category. When $k = 2$, the test based on $H$ is equivalent to the test based on $U$.
$$r_{rb} = \frac{2U}{n_1 n_0} - 1 \tag{11}$$
where $n_0$ and $n_1$ represent the sample sizes of the two groups, and $r_{rb}$ denotes the Rank-Biserial Correlation (RBC).
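The KW-RBC calculation can be sketched in plain Python as below. This is an illustrative implementation under stated assumptions: tied values receive average ranks, no tie correction is applied to H, and the function names and toy data are hypothetical. Under the formulas as written, the sign of the RBC depends on which group is labelled 1: the value is +1 when every observation in group 1 lies below every observation in group 0.

```python
def average_ranks(values):
    """Rank values from 1..N, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        avg = (start + end) / 2.0 + 1.0
        for pos in range(start, end + 1):
            ranks[order[pos]] = avg
        start = end + 1
    return ranks

def kruskal_wallis_h(groups):
    """H statistic from pooled rank sums, without tie correction."""
    pooled = [v for g in groups for v in g]
    ranks = average_ranks(pooled)
    n_total = len(pooled)
    acc, offset = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[offset:offset + len(g)])
        acc += rank_sum ** 2 / len(g)
        offset += len(g)
    return 12.0 / (n_total * (n_total + 1)) * acc - 3.0 * (n_total + 1)

def rank_biserial(group1, group0):
    """U from the rank sum of group1, then the rank-biserial correlation."""
    ranks = average_ranks(list(group1) + list(group0))
    n1, n0 = len(group1), len(group0)
    r1 = sum(ranks[:n1])
    u = n1 * n0 + n1 * (n1 + 1) / 2.0 - r1
    return 2.0 * u / (n1 * n0) - 1.0

# Toy feature values for two fully separated groups.
h_value = kruskal_wallis_h([[1, 2, 3], [4, 5, 6]])
r_low_first = rank_biserial([1, 2, 3], [4, 5, 6])
r_high_first = rank_biserial([4, 5, 6], [1, 2, 3])
```

In practice the significance of H would be assessed against a chi-square distribution with k − 1 degrees of freedom, which this sketch leaves out.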

3.5. Chi-Square Test and Cramér’s V

In this study, to explore the correlation between a discrete dependent variable and discrete feature variables, a combined approach of the Chi-square test and Cramér’s V is used, referred to as the Chi–V analysis. This method is applied in this paper to investigate the correlation between the categorical variable InterPb and the water functional zones.
The Chi-square test of independence is used to examine whether there is a statistically significant association between two categorical variables. Cramér’s V is used to measure the strength of the correlation, with a range of [0, 1], where a larger value indicates a stronger correlation. The calculation method is as follows:
$$\chi^2 = \sum_{i} \sum_{j} \frac{(O_{ij} - E_{ij})^2}{E_{ij}} \tag{12}$$
$$V = \sqrt{\frac{\chi^2}{n(k-1)}} \tag{13}$$
where $O_{ij}$ represents the observed frequency, $E_{ij}$ represents the expected frequency under the null hypothesis of independence, and $\chi^2$ is the Chi-square statistic. $n$ is the total sample size, and $k$ is the smaller of the number of rows and the number of columns.
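A minimal sketch of the Chi–V calculation on a contingency table is given below. The tables are hypothetical counts chosen only to show the two extremes (perfect association and independence); they are not data from this study.

```python
import math

def chi_square(table):
    """Return the chi-square statistic and total count for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return chi2, n

def cramers_v(table):
    """Cramer's V, with k the smaller of the row and column counts."""
    chi2, n = chi_square(table)
    k = min(len(table), len(table[0]))
    return math.sqrt(chi2 / (n * (k - 1)))

# Hypothetical 2x2 tables: rows = categorical feature, columns = zone / non-zone.
v_perfect = cramers_v([[10, 0], [0, 10]])
v_independent = cramers_v([[5, 5], [5, 5]])
```

The sketch assumes every expected frequency is non-zero; a table with an empty row or column would need to be filtered first.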

3.6. Sample Data Extraction and Preprocessing

When delineating water functional zones, it is necessary to take into account both upstream and downstream, both banks, as well as the short-term and long-term goals for water resources and water ecological protection, along with the socio-economic development needs [42]. Based on this principle, this study uses seven-level hydrological units as the smallest division unit to map the water functional zones of the Yangtze River Basin to each hydrological unit, establishing the relationship between hydrological units and first-level water functional zone categories. Then, regional statistical methods are used to quantify 21 feature indicators corresponding to each hydrological unit, providing data for subsequent model training. To ensure data accuracy and representativeness, line data with a length of less than 100 m and polygon data for hydrological units smaller than 1 square kilometer are excluded.
In total, 6456 sample points were obtained: 1080 from protection zones, 3236 from reserved zones, 1698 from development zones, and 442 from buffer zones. Detailed explanations of the feature variables are provided in Table A1, and violin plots of the feature variable distributions are shown in Figures A1 and A2.

4. Results

4.1. Spatial Distribution of Characteristics in the Yangtze River Basin in 2010

The spatial distribution map of the six major land use types in the Yangtze River Basin (Figure 4) shows that cropland is mainly concentrated in the middle and lower reaches of the Yangtze River plains and along some tributary banks, especially in the Sichuan Basin, Jianghan Plain, and the lower Yangtze River plain. Forest is widely distributed in the mountainous and hilly areas of the upper and middle reaches, forming a ring around the Sichuan Basin. Grassland is primarily found in the high-altitude regions of the upper Yangtze River, the Tibetan Plateau, and the western Sichuan Plateau. Impervious surfaces are mainly concentrated in the urbanized areas of the middle and lower Yangtze River, while bare land is found in the arid and semi-arid regions in the upper reaches and high-altitude areas. Wetlands are scattered, mainly located in the lake areas of the middle and lower Yangtze River, such as Dongting Lake and Poyang Lake. Among the six major land use types, forest occupies the largest proportion (42.53%), followed by cropland (27.85%), grassland (23.23%), impervious surfaces (1.27%), bare land (1%), and wetlands (0.3%).
The spatial distribution of socioeconomic and human activity features in the Yangtze River Basin (Figure 5) shows that the distribution of GDP and nighttime light intensity is highly consistent, with high-value areas primarily concentrated in the middle and lower reaches of the basin, particularly in the Yangtze River Delta and around some provincial capital cities. The spatial distribution of the population is positively correlated with GDP and nighttime light, with high population density areas also located in the middle and lower reaches, while the population density in the upper reaches is relatively lower. Meanwhile, the spatial distribution of electricity consumption follows a similar pattern to GDP and nighttime light intensity, with high-value areas concentrated in cities such as Chengdu and Chongqing, indicating that electricity demand is closely related to economic activity and population concentration.
The natural ecological features of the Yangtze River Basin (Figure 6) show that precipitation follows a pattern of increasing from northwest to southeast, with high-value areas concentrated in the southeastern part of the basin, including Jiangxi, Hunan, and surrounding regions, while the northwestern areas of Qinghai and Gansu experience less precipitation. In terms of vegetation coverage, the middle and lower reaches of the Yangtze River Basin, due to the warm, humid climate and abundant rainfall, exhibit significantly higher vegetation coverage (NDVI values) than the arid upper reaches. Regarding air pollution, high levels of PM2.5 and NO2 are concentrated in the urban clusters and industrial zones of the middle and lower reaches, particularly in areas such as the Sichuan Basin, Wuhan in Hubei, and the Yangtze River Delta, reflecting the significant impact of human activity on air quality. The DEM shows that the terrain of the Yangtze River Basin gradually decreases from west to east, with higher elevations in the upper reaches and flatter terrain in the middle and lower reaches. The slope distribution matches the elevation, with steep slopes concentrated around the mountainous areas of the Sichuan Basin, where local slopes exceed 15°, while the middle and lower reaches are dominated by plains and hills with slopes generally below 5°.

4.2. Correlation Between Feature Variables and Water Functional Zone

This study first conducted a normality test on the sample distribution and plotted their Quantile-Quantile (Q-Q) plots (Figure A3, Figure A4, Figure A5 and Figure A6). The results show that the majority of features across different water functional zones do not follow a normal distribution. Therefore, we used the KW–RBC non-parametric method to explore the correlation between continuous feature variables and water functional zone classification variables, and employed the Chi–V method to assess the correlation between the categorical variable InterPb and water functional zones.
The results (Figure 7) indicate that in protection areas, variables closely related to socioeconomic and human activity features (such as PopDen, LigDen, GDP, EleCM, CL, etc.) show a strong negative correlation with the protection areas. However, WL and DisPb have p-values greater than 0.05, indicating no significant difference. In development areas, features closely associated with socioeconomic and human activities show a strong positive correlation with the development areas. In reserved areas, features closely linked to both socioeconomic and human activity factors, as well as natural ecological elements (such as LigDen, GDP, FL, NDVI, etc.), exhibit a strong correlation with the reserved areas. In buffer areas, DisPb and InterPb show the most significant correlations with the buffer areas.

4.3. Progressive Zoning Framework for Water Functional Zones

4.3.1. Model Parameter Optimization

In the training set, the sample distribution across categories is imbalanced, with fewer samples in the buffer area. To address this, the study employs a class weighting strategy [43,44], which is essentially an implementation of cost-sensitive learning. The core idea is to adjust the weights of samples from different categories in the loss function, allowing the model to fit the overall data while maintaining accuracy in identifying the minority class. This improves the robustness and generalization ability of the classification task under imbalanced data conditions. Specifically, this is achieved by setting the class_weight='balanced' parameter provided by the scikit-learn package.
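To make the effect of balanced class weighting concrete, the sketch below reproduces the heuristic that scikit-learn documents for the 'balanced' setting, n_samples / (n_classes × class count), in plain Python. As an assumption for illustration, it uses the full-sample counts from Section 3.6 as a stand-in for the training-set counts, which are not reported separately.

```python
def balanced_class_weights(class_counts):
    """Weight each class by n_samples / (n_classes * class_count)."""
    n_samples = sum(class_counts.values())
    n_classes = len(class_counts)
    return {label: n_samples / (n_classes * count)
            for label, count in class_counts.items()}

# Sample counts per zone type from Section 3.6 (stand-in for the training set).
counts = {"Protection": 1080, "Reserved": 3236, "Development": 1698, "Buffer": 442}
weights = balanced_class_weights(counts)
# Total weighted sample mass; equals the original sample count by construction.
weighted_total = sum(weights[label] * counts[label] for label in counts)
```

Under these counts a buffer sample weighs roughly seven times as much as a reserved sample, so misclassifying a buffer zone costs the model correspondingly more.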
Before training the RF, XGBoost, and GWRF models, it is necessary to optimize the model parameters to achieve the best performance, generalization ability, interpretability, and training speed [45]. For the optimization of RF and XGBoost model parameters, Bayesian optimization is employed in this study, and the optimal parameters are shown in Table 3 and Table 4. For the GWRF model, bandwidth selection is a key factor influencing model performance. This study uses the Golden Section Search method, which is applied in geographically weighted regression (GWR), to determine the optimal bandwidth. The Golden Section Search efficiently approaches the optimal value by gradually narrowing the search interval, allowing for effective optimization with limited computational resources. To ensure the feasibility and computational efficiency of the experiment, the search range is restricted to between 3000 m and 12,000 m, with F1_Score used as the evaluation metric for bandwidth selection. After several rounds of experimentation, the final optimal bandwidth of 6437.7 m was obtained, showing an ideal balance between accuracy and efficiency.
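The interval-narrowing behaviour of the Golden Section Search can be sketched as follows. The score function here is a toy unimodal curve standing in for the cross-validated F1 score of a GWRF model at a given bandwidth; its peak location is arbitrary and does not come from the study, and the function name and tolerance are assumptions.

```python
import math

def golden_section_max(score, low, high, tol=1.0):
    """Maximise a unimodal score on [low, high] by golden-section search."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = low, high
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    fc, fd = score(c), score(d)
    while b - a > tol:
        if fc > fd:
            # Maximum lies in [a, d]; the old c becomes the new d.
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = score(c)
        else:
            # Maximum lies in [c, b]; the old d becomes the new c.
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = score(d)
    return (a + b) / 2.0

def toy_score(bandwidth):
    """Hypothetical stand-in for an F1 score; peaks at 6000 m."""
    return -((bandwidth - 6000.0) ** 2)

# Search range of 3000 m to 12,000 m as described in the text.
best_bandwidth = golden_section_max(toy_score, 3000.0, 12000.0)
```

Each iteration reuses one of the two interior evaluations, so only one new model fit is needed per step, which is why the method suits an expensive objective such as a locally fitted forest.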

4.3.2. Model Training Results and Model Selection

To ensure the reliability and robustness of the model evaluation results, five-fold cross-validation is applied to the XGBoost, GWRF, and RF models in this study. Specifically, the original dataset is randomly divided into five subsets. During each model validation, one subset is selected as the test set, and the remaining four subsets are used as the training set [46], with this process repeating until all subsets have been used as the test set. The final model performance evaluation metrics are obtained by averaging the results from the five experiments. This approach effectively reduces the random variability caused by data partitioning, thereby enhancing the robustness of the evaluation results. To compare the performance of different models on the test set, accuracy, precision, recall, and F1_Score are used to evaluate the classification performance of the models.
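The five-fold procedure and the per-class metrics can be sketched with the standard library alone. The splitter and metric function below are illustrative helpers with hypothetical names, not the code used in the study; the toy labels exist only to exercise the metric function.

```python
import random

def k_fold_indices(n_samples, k, seed=0):
    """Yield (train, test) index lists; each sample is tested exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test

def precision_recall_f1(y_true, y_pred, positive):
    """Precision, recall and F1 for one class treated as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

splits = list(k_fold_indices(6456, 5))  # 6456 samples, as in Section 3.6
toy_metrics = precision_recall_f1(["Buffer", "Buffer", "Other", "Other"],
                                  ["Buffer", "Other", "Buffer", "Other"],
                                  positive="Buffer")
```

The reported metrics would then be the averages of the per-fold values. With imbalanced classes, a stratified split that preserves class proportions in each fold is a common refinement; whether the study stratified its folds is not stated.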
Due to the clear distinction of the buffer area, this study adopts a progressive zoning strategy for model training. Specifically, the buffer area is partitioned first, followed by the partitioning of the other three categories. The procedure is as follows: the RF, GWRF, and XGBoost models are used to perform binary classification for the buffer and non-buffer areas, and the classification performance of each model is compared. The model with the best performance is then selected for predicting the buffer area. By removing features with insignificant or weak correlation with the buffer area, the prediction performance of the three models is obtained (shown in Figure 8a). The results indicate that the RF model outperforms the other two models in terms of F1-Score, suggesting that RF is the most suitable model for final buffer area prediction.
For the prediction tasks of the protection areas, reserved areas, and development areas, RF, GWRF, and XGBoost models are used for multi-class classification, and their performance is compared to select the best model for these areas’ predictions. The prediction results are shown in Figure 8b–d. GWRF exhibits a higher F1-score in the multi-class classification tasks for the protection areas, reserved areas, and development areas, indicating that GWRF effectively captures spatial heterogeneity and fully explores the local relationships between geographical location and spatial features, thus improving classification accuracy. Therefore, GWRF is considered the best choice for predicting the protection areas, reserved areas, and development areas. In addition, the three models were individually applied to predict water functional areas and compared in an experiment (Table 5). The experimental results show that after applying the Progressive Zoning Framework strategy, the model’s accuracy improved to 0.78, outperforming the predictions made using any single model.
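The control flow of the progressive strategy can be sketched as a two-stage pipeline. In the sketch below, simple rule-based callables stand in for the trained RF (stage 1) and GWRF (stage 2) models; the rules, thresholds, and the treatment of InterPb as a 0/1 flag are hypothetical and serve only to show how the stages are chained.

```python
def progressive_zoning(samples, is_buffer, classify_other):
    """Stage 1 flags buffer zones; stage 2 labels the remaining samples."""
    labels = []
    for features in samples:
        if is_buffer(features):
            labels.append("Buffer")
        else:
            labels.append(classify_other(features))
    return labels

def toy_buffer_model(features):
    """Hypothetical stand-in for the binary buffer / non-buffer classifier."""
    return features["InterPb"] == 1

def toy_multiclass_model(features):
    """Hypothetical stand-in for the three-class classifier."""
    if features["PopDen"] > 500:
        return "Development"
    if features["PopDen"] > 50:
        return "Reserved"
    return "Protection"

samples = [
    {"InterPb": 1, "PopDen": 800},
    {"InterPb": 0, "PopDen": 800},
    {"InterPb": 0, "PopDen": 120},
    {"InterPb": 0, "PopDen": 10},
]
predicted = progressive_zoning(samples, toy_buffer_model, toy_multiclass_model)
```

Separating the stages lets each model be trained on the features most relevant to its task, so the second model never has to distinguish buffer zones from the other three classes.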

4.4. 2020 First-Level Water Functional Zoning Results

Based on the results of model selection presented earlier, a progressive zoning strategy was used to classify and predict the first-level water functional zones of the Yangtze River Basin in 2020. The prediction results indicate that in 2020, the number of Protection Zones in the Yangtze River Basin increased from 171 in 2010 to 178, the number of Reserved Zones rose from 398 in 2010 to 428, while the number of Buffer Zones decreased from 101 in 2010 to 82, and the number of Development Zones declined from 728 in 2010 to 710. Based on these changes in numbers, we have mapped the spatiotemporal evolution of the first-level water functional zones in the Yangtze River Basin from 2010 to 2020 (Figure 9).
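As a quick arithmetic check on the counts quoted above, the following sketch computes the net change per zone type and confirms that the total number of zones is the same in both years. The counts are copied from the text; the variable names are illustrative only.

```python
# Zone counts as reported in the text for 2010 and for the 2020 prediction.
counts_2010 = {"Protection": 171, "Reserved": 398, "Buffer": 101, "Development": 728}
counts_2020 = {"Protection": 178, "Reserved": 428, "Buffer": 82, "Development": 710}

net_change = {zone: counts_2020[zone] - counts_2010[zone] for zone in counts_2010}
total_2010 = sum(counts_2010.values())
total_2020 = sum(counts_2020.values())

for zone, delta in net_change.items():
    print(f"{zone:<12} {counts_2010[zone]:>4} -> {counts_2020[zone]:>4} ({delta:+d})")
print(f"{'Total':<12} {total_2010:>4} -> {total_2020:>4}")
```

The gains in Protection and Reserved Zones (+7 and +30) are offset by the losses in Buffer and Development Zones (-19 and -18), so the same 1398 zones are being reclassified rather than added or removed.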
Figure 9 shows that approximately 83.0% of the zones classified as Protection Zones in 2010 remained Protection Zones in 2020, with only a small portion converting to Reserved Zones (8.8%), Development Zones (5.8%), and Buffer Zones (2.3%). Spatially, Protection Zones remain concentrated in the ecologically sensitive areas of the upper and parts of the middle Yangtze River, and the overall spatial pattern is stable. The few zones that converted are mainly located in the middle and lower reaches, where human activity is more intense, with concentrations in regions such as Sichuan and Jiangxi provinces.
For the water functional zones that were Reserved Zones in 2010, 82.7% remained in the same category, but 12.6% were converted to Development Zones, indicating some level of development in the Reserved Zones over the past decade. Spatially, the conversion of Reserved Zones is mainly concentrated in the cities and economically active regions of the middle and lower Yangtze River, while the upper and more remote areas still maintain relatively intact Reserved Zones.
Of the water functional zones that were Development Zones in 2010, 87.8% remained as Development Zones, with only a small portion converting to Reserved Zones (10.4%) and Protection Zones (1.8%). Spatially, Development Zones are still highly concentrated in the downstream areas, with regions that transitioned to Reserved Zones or Protection Zones mainly found in areas such as Hubei and Chongqing.
For the water functional zones that were Buffer Zones in 2010, 76.2% remained Buffer Zones in 2020, with the rest transitioning to Development Zones (10.9%), Reserved Zones (7.9%), and Protection Zones (5.0%). Spatially, the adjustments in Buffer Zones are mainly found at the provincial boundary areas, such as the boundary between Hubei and Anhui.
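The retention and conversion percentages reported above are row-normalized shares of a 2010-to-2020 transition table. A minimal sketch of that computation is given below; the function name `transition_shares` and the toy label lists are assumptions for illustration and do not reproduce the study's data.

```python
from collections import Counter


def transition_shares(labels_2010, labels_2020):
    """Return {origin: {destination: percent}}; each origin row sums to 100."""
    pair_counts = Counter(zip(labels_2010, labels_2020))
    origin_totals = Counter(labels_2010)
    shares = {}
    for (origin, destination), n in pair_counts.items():
        shares.setdefault(origin, {})[destination] = 100.0 * n / origin_totals[origin]
    return shares


# Toy example: nine made-up zones with their 2010 label and predicted 2020 label.
before = ["Buffer"] * 4 + ["Reserved"] * 5
after = ["Buffer", "Buffer", "Buffer", "Development"] + ["Reserved"] * 4 + ["Development"]
shares = transition_shares(before, after)

for origin, row in shares.items():
    for destination, percent in row.items():
        print(f"{origin} -> {destination}: {percent:.1f}%")
```

Applied to the reported figures, shares of 76.2%, 10.9%, 7.9%, and 5.0% over the 101 Buffer Zones of 2010 would correspond to roughly 77, 11, 8, and 5 zones, which is consistent with the stated total.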

5. Discussion

5.1. Discussion of the Correlation Between Water Functional Zones and Feature Variables

This study quantitatively reveals the correlation between different water functional zones and feature variables using the KW-RBC and Chi-V methods. In the protection and development zones, features reflecting socioeconomic and human activities, such as GDP, population density, nighttime light intensity, and cultivated land, show a negative correlation in the protection zones, while they exhibit a positive correlation in the development zones. This pattern aligns closely with existing water zoning theories. Yuan Hongren et al. [47] clearly pointed out that protection zones are critical for the protection of water resources, natural ecosystems, and endangered species. Hu Liping et al. [48] stated that development zones must meet the water demands of urban life, industrial and agricultural production, fisheries, and recreational activities, with their classification closely related to indicators such as industrial output, non-agricultural population, and urban water consumption. In the reserved zones, forest cover and NDVI values show significant positive correlations, while GDP, nighttime light intensity, and impervious surface area show strong negative correlations. This supports Li Zhan’s [49] concept of reserved zones as “sustainable development reserves,” which must balance current protection with future potential development. Finally, in the buffer zones, the significant correlation with the inter-provincial boundary distance (DisPb) verifies Yang Yongde’s [50] statement: buffer zones play a key role in coordinating inter-provincial water conflicts.
To enhance interpretability, we further present spatial distribution maps of key feature variables across the Yangtze River Basin (Figure 4, Figure 5 and Figure 6), helping readers intuitively understand their spatial distribution patterns and gain deeper insights into why certain regions are more likely to be classified into specific types of water functional zones. For instance, Protection Zones exhibit a significant negative correlation with both GDP and cultivated land. The spatial distribution maps of GDP and cultivated land (Figure 4 and Figure 5) clearly show that Protection Zones are typically located in areas with weaker economic activity, lower GDP levels, and lower cultivated land coverage. The spatial distribution map of DEM (Figure 6) reveals that changes in Protection Zones usually occur in lower elevation areas of the middle and lower reaches, while in the upstream high-altitude areas, the types of Protection Zones remain relatively stable. By presenting these spatial distribution maps of feature variables, we are able to more clearly reveal the spatial relationships between water functional zones and feature variables, thereby uncovering the underlying spatial logic behind this statistical relationship.
Although previous studies have theoretically demonstrated that water functional zoning should consider natural ecology, socioeconomic, and human activity attributes, most focus on qualitative analysis and lack quantitative validation of the “dominant role of specific factors in each functional zone.” This study not only validates the scientific basis of the zoning principles but also clarifies the intrinsic mechanisms between each functional zone and specific indicators, providing a quantifiable scientific foundation for the dynamic adjustment and precise optimization of future water functional zoning. This drives the zoning process from experience-based judgment to data-driven decision-making.

5.2. Discussion of the Division Results of the First-Level Water Functional Zones in the Yangtze River Basin in 2020

This study presents the division results of the first-level water functional zones in the Yangtze River Basin in 2020. The analysis reveals that over the past decade, the first-level water functional zones in the Yangtze River Basin have undergone various changes, such as the conversion of protection zones to development zones, reserved zones to development zones, and buffer zones to development zones. Examples include the national-level nature reserve for rare and endemic fish species in the upper reaches of the Yangtze River (Figure 10(a1,a2)), the Fuh River reserved zone (Figure 10(b1,b2)), and the Yangtze River buffer zone in Hubei and Jiangxi (Figure 10(c1,c2)). Satellite imagery of these areas shows intensifying degradation of riverside vegetation, accelerating urbanization, and increasing human activity, which suggests a high probability of change in these water functional zones. The machine learning methods employed in this study provide analytical results that can assist local governments in identifying and prioritizing water functional zones that require attention and adjustment, thereby offering decision support for water resource management and optimization.

5.3. Future Work and Limitations

This study has made some progress in water functional zoning, but there are still limitations. The absence of some key variables (such as water quality data and water intake point distribution) has impacted the model’s accuracy. Additionally, the changes in water functional zones in the Yangtze River Basin from 2010 to 2020 are data-driven results, and the actual changes still need to be further verified and adjusted in conjunction with local policies, economic activities, and social demands. However, from a management practice perspective, these results can help identify areas that require focused attention and adjustment in water functional zone optimization. Future research should focus on expanding and improving data collection, particularly by supplementing key data, to obtain more comprehensive information on changes in water functional zones in the basin. This will help improve the accuracy and reliability of the model, providing more detailed and scientific decision support for water resource management.

6. Conclusions

This study, based on machine learning methods and utilizing 2010 first-level water functional zone data from the Yangtze River Basin, proposes a Progressive Zoning Framework. The findings are as follows:
(1) In the Yangtze River Basin, Protection Zones are negatively correlated with socioeconomic and human activity-related variables (such as nighttime light intensity and GDP), while Development Zones show a positive correlation with these variables. Reserved Zones are significantly correlated with both socioeconomic and human activity factors and natural ecological factors (such as LigDen, GDP, FL, and NDVI). Buffer Zones exhibit a significant correlation with the distance from provincial boundaries.
(2) The Progressive Zoning Framework effectively leverages the advantages of the RF and GWRF models in each category, accurately classifying the four types of water functional zones. Compared to predictions made using individual models, the model accuracy improved to 0.78, significantly enhancing the precision and reliability of the water functional zone classification. Based on this Progressive Zoning Framework, the distribution of water functional zones in 2020 was predicted. It was found that the transitions of Protection and Reserved Zones were mostly located in areas with stronger human activity in the middle and lower reaches, primarily concentrated in the Sichuan Basin and Jiangxi Province. Development Zones remain highly concentrated in the downstream areas, with some areas transitioning to Reserved or Protection Zones, mainly in Hubei and Chongqing. Adjustments in the Buffer Zones are mainly located in the provincial boundary areas, such as the boundary between Hubei and Anhui Provinces.

Author Contributions

Methodology, W.L. and F.D.; investigation, Y.S., B.W. and W.L.; data curation, Y.S. and W.L.; writing—original draft preparation, Y.S., W.L., M.S., Y.Y., F.D. and B.W.; writing—review and editing, Y.S., W.L., F.D., L.L., H.L. and X.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Xiamen Natural Science Foundation Project (3502Z202372044), Fujian Provincial Education Department Project (JAT220330) and Science and Technology Research Project of Xiamen University of Technology (YKJ22040R).

Data Availability Statement

The data presented in this study are available on request from the corresponding author due to privacy.

Conflicts of Interest

Author Xiaoyan Zhang was employed by the company China Sciences Landsenses (Amoy) Ecology and Environment Group. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Appendix A

Table A1. Explanation and description of feature data.
| Variable | Abbreviation | Variable Explanation |
|---|---|---|
| Cropland | CL | Cropland Proportion |
| Forest Land | FL | Forest Land Proportion |
| Shrubland | SL | Shrubland Proportion |
| Grassland | GL | Grassland Proportion |
| Waterbody | WB | Waterbody Proportion |
| Snow | SN | Snow Proportion |
| Bare Land | BL | Bare Land Proportion |
| Impervious Surface Area | ISA | Impervious Surface Proportion |
| Wetland | WL | Wetland Proportion |
| Electricity Consumption_Mean | EleCM | Annual Average Electricity Consumption |
| Light_Density_per_km2 | LigDen | Nighttime Light Intensity per Square Kilometer |
| NDVI_Min | NDVI | NDVI Minimum Value |
| PM2.5_Mean | PM2.5 | Average PM2.5 Value |
| NO2_Mean | NO2 | Average NO2 Value |
| Annual precipitation_Mean | APM | Annual Average Precipitation |
| Population_Density_per_km2 | PopDen | Total Population per Square Kilometer |
| GDP_per_km2 | GDP | Total GDP per Square Kilometer |
| DEM_Mean | DEM | Average DEM Value |
| Slope_Mean | Slope | Average Slope Value |
| Distance to provincial boundary | DisPb | Distance from Water Function Zone to Provincial Boundary |
| Intersect with the Provincial boundary | InterPb | Whether the Water Function Zone Intersects with Provincial Boundary |
Figure A1. Violin Plot of Feature Variables 1.
Figure A2. Violin Plot of Feature Variables 2.
Figure A3. Q-Q Plot Distribution of Feature Variables in Protection Area.
Figure A4. Q-Q Plot Distribution of Feature Variables in Reserved Area.
Figure A5. Q-Q Plot Distribution of Feature Variables in Development Area.
Figure A6. Q-Q Plot Distribution of Feature Variables in Buffer Area.

References

  1. Yuan, H.; Shen, F.; Wei, K. Preliminary study on river function regionalization. Water Resour. Prot. 2011, 27, 13–16.
  2. Chen, W. Study on Optimum Layout of Water Function Areas—Taking Shaying River of Huaihe River Basin as an Example. Master’s Thesis, Northwest University, Xi’an, China, 2019.
  3. Long, D.; Pan, W. Stream protection and ecological rehabilitation. Adv. Sci. Technol. Water Resour. 2006, 26, 21–25.
  4. Wu, Y.; Wang, G.; Wu, Y.; Feng, H.; Shen, F.; Lei, S.; Shi, R. Methods of River Functional Zoning and Case Study. Adv. Water Sci. 2011, 22, 741–749.
  5. Huang, Y.F. Study on the Indicator System and Zoning Methods of Water Functional Areas in Fujian Province. Hydraul. Sci. Technol. 2016, 34–38, 44.
  6. TB 10099–2017; Code for Design of Railway Station and Terminal. China Railway Publishing House: Beijing, China, 2017.
  7. Zhang, X.; Ru, B. Water Functional Zoning and Management Recommendations for Nanhai District, Foshan City. Pearl River Navig. 2016, 90–91.
  8. Zhao, Y.; Ding, A.; Pan, C.; Xu, X.; Li, Y.; Li, J. Theory for River Functional Regionalization and A Case Study. Sci. Technol. Rev. 2013, 31, 60–64.
  9. Liu, F.; Guo, Y. Discussion on the Refinement of Water Functional Zoning and the Coordinated Optimization of Protection Zones in Poyang Lake. Yangtze River 2018, 49, 27–31.
  10. Liu, H.M.; Zhang, T.R.; Cen, D.H.; Lin, M.L. Water Function Zoning Adjustment Demonstration and a Case Analysis. Guangdong Water Resour. Hydropower 2018, 45–48, 51.
  11. Zhao, W.; Pang, Y.; Chen, Y.N. Study on adjustment of water environmental function zoning in Nanjing City. Water Resour. Prot. 2012, 28, 76–79.
  12. Hu, K.; Pang, Y.; Yu, H.; Li, Z.; Wang, M. Adjustment of water environmental functional zones in Taihu Lake. J. Hohai Univ. Nat. Sci. 2012, 40, 503–508.
  13. Luo, H.P.; Pang, Y.; Xu, L.Y. Preliminary study on adjustment scheme of water function zone in Wuxi. J. Water Resour. Water Eng. 2015, 26, 114–120.
  14. Rodrigues, C.; Veloso, M.; Alves, A.; Bento, C.L. Socioeconomic and functional zoning characterization in a city: A clustering approach. Cities 2025, 163, 106023.
  15. Song, X.; Pu, Y.; Liu, D.; Feng, Y. Mining Urban Functional Areas Using Pedestrians’ Movement Trajectories. Acta Geod. Cartogr. Sin. 2015, 44, 82–88.
  16. Luo, G.; Ye, J.; Wang, J.; Wei, Y. Urban Functional Zone Classification Based on POI Data and Machine Learning. Sustainability 2023, 15, 4631.
  17. Guangyu, Q. Urban Functional Area Identification and Spatial Structure Analysis Integrating Multi-Source Data—Taking Nanchang City as an Example; East China University of Technology: Nanchang, China, 2024.
  18. Li, Y.; Zhang, F.; Li, R.; Yu, H.; Chen, Y.; Yu, H. Comprehensive Ecological Functional Zoning: A Data-Driven Approach for Sustainable Land Use and Environmental Management—A Case Study in Shenzhen, China. Land 2024, 13, 1413.
  19. Zhang, Y.; Liu, S.; Yu, P.; Lu, Y.; Zhang, Y.; Zhang, J.; Chen, Y. Delineating Ecological Functional Zones and Grades for Multi-Scale Ecosystem Management. Land 2024, 13, 1624.
  20. Shao, S.; Yang, Y. Identification of ecological improvement zones in different ecological functional zones in north-west Hubei, China. Ecol. Indic. 2023, 155, 111032.
  21. Sunde, M.; Diamond, D.; Elliott, L. Ecological Systems Classification: Integrating Machine Learning, Ancillary Modeling, and Sentinel-2 Satellite Imagery. Remote Sens. 2024, 16, 4440.
  22. Li, N.; Lu, H. Regionalization method for water resources utilization based on cluster analysis. J. Shenyang Univ. Technol. 2021, 43, 425–431.
  23. Wang, S.C. Study on Water Functional Zoning Method Based on Dynamic Fuzzy Clustering. Hydraul. Sci. Technol. 2016, 29–33. Available online: https://d.wanfangdata.com.cn/periodical/CiFQZXJpb2RpY2FsQ0hJU29scjlTMjAyNTEwMjEwOTUwNDYSEHNodWlsa2oyMDE2MDQwMDgaCGNkN29jNXVi (accessed on 29 December 2025).
  24. Enguehard, L.; Falco, N.; Schmutz, M.; Newcomer, M.E.; Ladau, J.; Brown, J.B.; Bourgeau-Chavez, L.; Wainwright, H.M. Machine-Learning Functional Zonation Approach for Characterizing Terrestrial–Aquatic Interfaces: Application to Lake Erie. Remote Sens. 2022, 14, 3285.
  25. Bai, H.; Zhong, Y.; Ma, N.; Kong, D.; Mao, Y.; Feng, W.; Wu, Y.; Zhong, M. Changes and drivers of long-term land evapotranspiration in the Yangtze River Basin: A water balance perspective. J. Hydrol. 2025, 653, 132763.
  26. Wang, F.; Zhan, C.S.; Pan, C.Z.; Wang, H.X. Theoretical Study on River Functional Zoning. China Rural Water Resour. Hydropower 2009, 33–36. Available online: https://kns.cnki.net/kcms2/article/abstract?v=-djcopRf0qGct_YrvF591ob_eWcKPxFErmp58P2ujBOBeeJLBROD9NJaofhAKgTXvnkqsC4Q_4xk5X5r4oheI-ob8fDm9rmCi--kGuksRMzCAsV9ctXfd4V3kiPiyaGpXVZ6AhwuM6ks0266eMnMGWLCsbCNk19kwJh4frVcCC9TOGDpTrcinkLB27MjBiSP&uniplatform=NZKPT&captchaId=915ace3a-bbfc-4b49-a7aa-9fbd0c943535 (accessed on 29 December 2025).
  27. Approval of the National Major Rivers and Lakes Water Functional Zoning (2011–2030) by the State Council. China Water Resour. Bull. 2011. Available online: https://www.waizi.org.cn/law/9451.html (accessed on 29 December 2025).
  28. Bai, R.; Li, T.; Huang, Y.; Li, J.; Wang, G. An efficient and comprehensive method for drainage network extraction from DEM with billions of pixels using a size-balanced binary search tree. Geomorphology 2015, 238, 56–67.
  29. Didan, K. MOD13A3 MODIS/Terra Vegetation Indices Monthly L3 Global 1km SIN Grid V006. In NASA Land Processes Distributed Active Archive Center; NASA: Washington, DC, USA, 2015.
  30. Wei, J.; Li, Z.Q. ChinaHighPM2.5: High-Resolution and High-Quality Ground-Level PM2.5 Dataset for China (2000–2023); National Tibetan Plateau Data Center: Beijing, China, 2024.
  31. Wei, J.; Li, Z.Q. ChinaHighNO2: High-Resolution and High-Quality Ground-Level NO2 Dataset for China (2008–2023); National Tibetan Plateau Data Center: Beijing, China, 2024.
  32. Farr, T.G.; Rosen, P.A.; Caro, E.; Crippen, R.; Duren, R.; Hensley, S.; Kobrick, M.; Paller, M.; Rodriguez, E.; Roth, L.; et al. The Shuttle Radar Topography Mission. Rev. Geophys. 2007, 45, RG2004.
  33. Chen, J.; Gao, M. Global 1 km × 1 km gridded revised real gross domestic product and electricity consumption during 1992–2019 based on calibrated nighttime light data. Sci. Data 2022, 9, 202.
  34. Hu, J.; Miao, C. CHM_PRE V2: An upgraded high-precision gridded precipitation dataset for the Chinese mainland considering spatial autocorrelation and covariates (V2.0.2). Zenodo 2025. Available online: https://zenodo.org/records/14634575 (accessed on 7 January 2026).
  35. Wu, Y.; Shi, K.; Chen, Z.; Liu, S.; Chang, Z. An improved time-series DMSP-OLS-like data (1992–2024) in China by integrating DMSP-OLS and SNPP-VIIRS. Harv. Dataverse 2021. Available online: https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/GIYGJU (accessed on 7 January 2026).
  36. Qian, L. A Data-Driven Design for Fault Detection of Wind Turbines Using Random Forests and XGBoost. Master’s Thesis, Zhejiang University, Hangzhou, China, 2018.
  37. Niu, X.; Ling, F. Study on Personal Credit Risk Assessment Model Based on Hybrid Learning. J. Fudan Univ. Nat. Sci. 2021, 60, 703–719.
  38. Wan, C.; Li, X.; Yang, Z.; Du, F.; Chen, X. Comparative analysis of rural water supply risk identification models based on machine learning algorithms. J. China Inst. Water Resour. Hydropower Res. 2025, 23, 297–306.
  39. Yue, H.; Chen, J. Interpretable spatial machine learning for understanding spatial heterogeneity in factors affecting street theft crime. Appl. Geogr. 2025, 175, 103503.
  40. Zhang, Y.; Ge, J.; Wang, S.; Dong, C. Optimizing urban green space configurations for enhanced heat island mitigation: A geographically weighted machine learning approach. Sustain. Cities Soc. 2025, 119, 106087.
  41. Gautham, V.M.; Sumesh, A.; Jithin, E.V.; Rameshkumar, K.; Thekkuden, D.T. Evaluation of Time-Domain Acoustic Signature in TIG Welding of 5083 Aluminum Alloy: A Methodological Comparison of Feature Reduction Approaches. Results Eng. 2025, 26, 105062.
  42. Xu, S.; Shi, R.; Zhao, Q. Research on River Functional Zoning. Sci. China Press 2009, 39, 1521–1528.
  43. Garg, A.; Ramamurthi, N.; Das, S.S. Addressing Imbalanced Classification Problems in Drug Discovery and Development Using Random Forest, Support Vector Machine, AutoGluon-Tabular, and H2O AutoML. J. Chem. Inf. Model. 2025, 65, 3976–3989.
  44. Ghosh, K.; Bellinger, C.; Corizzo, R.; Branco, P.; Krawczyk, B.; Japkowicz, N. The class imbalance problem in deep learning. Mach. Learn. 2024, 113, 4845–4901.
  45. Ma, Y.; Kong, D.; Ye, X.; Ding, Y. Application and comparison of type 2 diabetes with comorbid hypertension classification prediction models based on random forest and XGBoost algorithms. J. Guangdong Med. Coll. 2024, 42, 523–534.
  46. Wu, D.; Zhang, Y.; Xiang, Q. Geographically weighted random forests for macro-level crash frequency prediction. Accid. Anal. Prev. 2024, 194, 107370.
  47. Yuan, H.; Luo, X. Method and practice of water function zone division. Yangtze River 2001, 32, 13–15.
  48. Hu, L.; Xing, J. The division of water function areas of jiangxi province and the analysis of water quality which reach the standard or not. Jiangxi Hydraul. Sci. Technol. 2003, 29, 154–157.
  49. Li, Z. Research on water function regionalization hierarchical classification system. Glob. Seabuckthorn Res. Dev. 2016, 26–28.
  50. Yang, Y.; Fu, H.; Wu, T. Protection and Management of Water Function Zone in the Yangtze River Basin. Technol. Econ. Change 2018, 2, 45–52.
Figure 1. Study area overview. (a) Geographical Location of the Yangtze River Basin in China. (b) DEM Distribution of the Yangtze River Basin.
Figure 2. Spatial distribution of first-level water functional zones. (a) Protection area, (b) Reserved area, (c) Development area, (d) Buffer area.
Figure 3. Technical flowchart.
Figure 4. Spatial distribution map of the proportions of six major land use types in the Yangtze River Basin in 2010. (a) Cropland, (b) Forest Land, (c) Grassland, (d) Impervious Surface Area, (e) Bare Land, and (f) Wetland.
Figure 5. The spatial distribution map of socioeconomic and human activity features in the Yangtze River Basin in 2010. (a) GDP, (b) Nighttime Light, (c) Electricity, and (d) Population.
Figure 6. The spatial distribution map of natural ecological features in the Yangtze River Basin in 2010. (a) Precipitation, (b) NDVI, (c) PM2.5, (d) NO2, (e) DEM, and (f) Slope.
Figure 7. Correlation Analysis Results Chart. Note: *** indicates a significance level of p < 0.001, ** indicates a significance level of p < 0.01, * indicates a significance level of p < 0.05. Bars with hatch patterns represent significant results, with + indicating a positive correlation and − indicating a negative correlation.
Figure 8. Model training results chart.
Figure 9. Spatiotemporal Evolution Characteristics of First-Level Water Functional Zones in the Yangtze River Basin in 2020. (a) Protection Area, (b) Reserved Area, (c) Development Area, (d) Buffer Area. Note: The arrow on the left side of the figure represents the water functional zone type in 2010, and the arrow on the right side represents the water functional zone type in 2020.
Figure 10. Satellite Images of Water Functional Zones for 2010 and 2020. Note: In the figure, (a1–c1) represent the satellite images from 2010; (a2–c2) represent the satellite images from 2020. The red line in the figure indicates the river, with the following coordinates: (a1,a2) (105.027777, 28.707996), (b1,b2) (116.456455, 27.054531), (c1,c2) (115.769840, 29.819398).
Table 1. Standardized quantitative indicator system.
| Category | Subcategory | Specific Feature Data |
|---|---|---|
| Natural Ecology Data | Hydrological Data | Water Quality, Average Water Flow, Sediment Concentration, Runoff Depth. |
| Natural Ecology Data | Meteorological Data | Air Temperature, Precipitation, Evapotranspiration (ET), Sunshine Duration, PM2.5, NO2. |
| Natural Ecology Data | Land Use Data | Forest, Shrubland, Grassland, Waterbody, Snow, Bare Land, Wetland. |
| Natural Ecology Data | Vegetation Distribution Data | NDVI, EVI, FVC. |
| Natural Ecology Data | Topographic Data | Drainage Area, River Length, Drainage Density, Distance to Provincial Boundary, DEM, Slope. |
| Socioeconomic and Human Activity Data | Socioeconomic Data | GDP, Population, Gross Output Value of Industry and Agriculture. |
| Socioeconomic and Human Activity Data | Human Activity Data | Nighttime Light, POI, Electricity Consumption, Impervious Surface, Cropland, Location of Water Withdrawal Points and Water Withdrawal Amount, Location of Wastewater Discharge Points and Wastewater Discharge Volume. |
Table 2. Source of Feature Data Used in This Paper.
| Name | Time | Spatial Resolution | Source |
|---|---|---|---|
| Land Use Data | 2000, 2010, 2020 | 30 m | http://globallandcover.com/ |
| NDVI Data [29] | Feb 2000–Dec 2024 | 1 km | https://www.earthdata.nasa.gov/ |
| PM2.5 Data [30] | 2000–2023 | 1 km | https://data.tpdc.ac.cn/home |
| NO2 Data [31] | 2008–2023 | 2008–2018: 10 km; 2019–2023: 1 km | https://data.tpdc.ac.cn/home |
| DEM and Slope Data [32] | Null | 30 m | https://www.earthdata.nasa.gov/ |
| Electricity Consumption Data [33] | 1992–2019 | 1 km | https://figshare.com |
| GDP [33] | 1992–2019 | 1 km | https://figshare.com |
| Annual Precipitation Data [34] | 1 January 1960–31 December 2023 | 0.1° | https://zenodo.org |
| Population Data | 2010–2020 | 100 m | https://hub.worldpop.org |
| Nighttime Light Data [35] | 1992–2023 | 1 km | https://dataverse.harvard.edu |
Table 3. Bayesian optimization of RF’s optimal parameters.
| Parameter | Explanation | Bayesian Optimization Range | Optimal Parameter |
|---|---|---|---|
| max_depth | Maximum depth of the tree | (5, 15) | 12 |
| min_samples_split | Minimum number of samples required to split an internal node | (2, 15) | 6 |
| min_samples_leaf | Minimum number of samples required to be at a leaf node | (1, 10) | 5 |
| max_features | Maximum number of features considered when looking for the best split | (0.1, 1) | 0.67 |
| n_estimators | Number of trees in the forest | (10, 300) | 121 |
| max_samples | Maximum number of samples used to train each estimator | (0.7, 1) | 0.98 |
Table 4. Bayesian optimization of XGBoost’s optimal parameters.
| Parameter | Explanation | Bayesian Optimization Range | Optimal Parameter |
|---|---|---|---|
| max_depth | Maximum depth of the tree | (3, 15) | 5 |
| n_estimators | Number of boosting trees | (100, 500) | 111 |
| learning_rate | Learning rate | (0.01, 0.3) | 0.288 |
| subsample | Subsample ratio of the training instances | (0.5, 1) | 0.879 |
| colsample_bytree | Subsample ratio of columns when constructing each tree | (0.5, 1) | 0.597 |
| gamma | Minimum loss reduction required to make a further partition | (0, 5) | 1.459 |
| min_child_weight | Minimum sum of instance weight needed in a child node | (0, 5) | 4 |
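For readers who want to reuse the tuned settings, the search ranges and optima of Tables 3 and 4 can be written as plain dictionaries. The sketch below only transcribes the tables and checks that each optimum lies inside its search range; the names `RF_SEARCH`, `XGB_SEARCH`, `best_params`, and `out_of_range` are illustrative. The parameter names happen to match keyword arguments of scikit-learn's `RandomForestClassifier` and XGBoost's `XGBClassifier`, but whether the authors used those libraries is an assumption.

```python
# (low, high) search range and reported optimum, transcribed from Tables 3 and 4.
RF_SEARCH = {
    "max_depth": ((5, 15), 12),
    "min_samples_split": ((2, 15), 6),
    "min_samples_leaf": ((1, 10), 5),
    "max_features": ((0.1, 1), 0.67),
    "n_estimators": ((10, 300), 121),
    "max_samples": ((0.7, 1), 0.98),
}
XGB_SEARCH = {
    "max_depth": ((3, 15), 5),
    "n_estimators": ((100, 500), 111),
    "learning_rate": ((0.01, 0.3), 0.288),
    "subsample": ((0.5, 1), 0.879),
    "colsample_bytree": ((0.5, 1), 0.597),
    "gamma": ((0, 5), 1.459),
    "min_child_weight": ((0, 5), 4),
}


def best_params(search):
    """Keep only the reported optimum for each parameter."""
    return {name: best for name, (_, best) in search.items()}


def out_of_range(search):
    """List parameters whose optimum falls outside the stated search range."""
    return [name for name, ((low, high), best) in search.items()
            if not low <= best <= high]


rf_params = best_params(RF_SEARCH)
xgb_params = best_params(XGB_SEARCH)
print(rf_params)
print(xgb_params)
print(out_of_range(RF_SEARCH), out_of_range(XGB_SEARCH))
```

Under the library assumption above, the resulting dictionaries could be unpacked as keyword arguments when constructing the corresponding classifiers.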
Table 5. Model Performance Comparison.
| Model | Accuracy | Precision | Recall | F1-Score |
|---|---|---|---|---|
| RF | 0.7322 | 0.7570 | 0.7427 | 0.7481 |
| XGBoost | 0.7128 | 0.7548 | 0.7063 | 0.7245 |
| GWRF | 0.7477 | 0.7239 | 0.7316 | 0.7274 |
| RF+GWRF | 0.7808 | 0.7921 | 0.7965 | 0.7943 |
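The accuracy gain of the combined RF+GWRF strategy over each single model can be read off Table 5 with a few lines of arithmetic. The sketch below uses the accuracy column only; the variable names are illustrative, and the distinction between percentage points and relative gain is added here for clarity rather than taken from the paper.

```python
# Accuracy values transcribed from Table 5.
accuracy = {"RF": 0.7322, "XGBoost": 0.7128, "GWRF": 0.7477, "RF+GWRF": 0.7808}
combined = accuracy["RF+GWRF"]

gains = {}
for model in ("RF", "XGBoost", "GWRF"):
    # Absolute gain in percentage points and relative gain in percent.
    points = (combined - accuracy[model]) * 100
    relative = (combined / accuracy[model] - 1) * 100
    gains[model] = points
    print(f"{model:<8} +{points:.2f} points, +{relative:.2f}% relative")
```

In absolute terms, the combined strategy is about 3.3 to 6.8 percentage points more accurate than the single models, with the smallest margin over GWRF and the largest over XGBoost.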
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Liu, W.; Sun, Y.; Deng, F.; Wu, B.; Zhang, X.; Sun, M.; Li, L.; Li, H.; Yuan, Y. Water Functional Zoning Framework Based on Machine Learning: A Case Study of the Yangtze River Basin. Water 2026, 18, 209. https://doi.org/10.3390/w18020209


Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.