Assessing and Predicting the Water Resources Vulnerability under Various Climate-Change Scenarios: A Case Study of Huang-Huai-Hai River Basin, China

The Huang-Huai-Hai River Basin plays an important strategic role in China’s economic development, but severe water resources problems restrict the development of the three basins. Most existing research focuses on the trends of single hydrological and meteorological indicators. However, there is a lack of research on the cause analysis and scenario prediction of water resources vulnerability (WRV) in the three basins, which is an important foundation for the management of water resources. First, based on an analysis of the causes of water resources vulnerability, this article sets up an evaluation index system of water resources vulnerability from three aspects: water quantity, water quality, and disaster. Second, we use the Improved Blind Deletion Rough Set (IBDRS) method to reduce the dimension of the index system, reducing the original 24 indexes to 12 evaluation indexes. Third, by comparing the accuracy of random forest (RF) and artificial neural network (ANN) models, we select the RF model, which has the higher fitting accuracy, as the evaluation and prediction model. Finally, we use the 12 evaluation indexes and the RF model to analyze the trend and causes of water resources vulnerability in the three basins during 2000–2015, and further predict the scenarios in 2020 and 2030. The results show that the vulnerability level of water resources in the three basins improved during 2000–2015, and that the three river basins should follow the development path of scenario 1 to ensure the safety of water resources. The research shows that the combination of IBDRS and an RF model is an effective method to evaluate and forecast the vulnerability of water resources in the Huang-Huai-Hai River Basin.


Introduction
Water resources are widely regarded as among the most important elements sustaining both ecosystems and the economy [1]. The comprehensive management of water resources with the basin as the unit conforms to the natural migration patterns and the economic and social characteristics of water resources, which ensures that all the functions of the water resources are addressed. However, the Huang-Huai-Hai Basin is facing severe water resources problems, primarily due to a huge imbalance between the limited gross water amount and the prosperous social and economic activities in this region. The water resources per capita are 462 m³ in the Huang-Huai-Hai Basin, which is only 21% of the national average level. Furthermore, the insufficient amount of water resources is not the only problem. This region also faces a serious problem of unevenly distributed precipitation, which means that it may suffer from both flooding and drought at the same time. According to statistics from other studies, there are two droughts every three years in this region. Due to the joint action of human activities and climate change, the underlying conditions of these basins have changed significantly, resulting in profound changes in the relationship between precipitation and water resources, so the water resources have been greatly reduced. Hence, as the water resources situation has become more vulnerable than in the past, water resources management in this region is facing great challenges.

Data Sources
For the empirical part of this study, our research period is from 2000 to 2015. For the scenario prediction part, we primarily focus on water vulnerability at two time points, the years 2020 and 2030. The data for these two parts are obtained in different ways.
We choose 2000-2015 as the research period for two reasons: one is the availability of data, and the other is the effect of the water resources policy of the State Council. The State Council issued the "State Council's Views on the Strictest Water Resources Management System (SWRMS)" in document [35]. We chose a research period that covers this event, which enables us to examine the effect of this new policy. Data are collected from each of the three basins. Since there are 16 observations for each basin, we have a total of 48 observations for all three river basins combined. Since future climate change is uncertain, we set up three future scenarios. For each scenario, the values of all 24 variables are changed following certain rules. The indicators of the three future scenarios fall into two categories: indicators that are greatly affected by climate, and indicators that are greatly affected by human social and economic activities. For these two types of indicators, we adopt different scenario data settings.
The first type of scenario data, which are greatly affected by climate, are mainly collected from the research results of Xia Jun's team and others [36][37][38]. Specifically, the precipitation and runoff data of the Huang-Huai-Hai Basin in 2020 and 2030 under different climate change scenarios were collected according to the Fifth Assessment Report of the IPCC. Scenarios 1, 2, and 3 in this paper correspond to the three emission concentration scenarios in the report, Representative Concentration Pathway (RCP) 2.6, RCP 4.5, and RCP 8.5, respectively. For example, to calculate the water production modulus (A₁), the total water resources are first calculated from the precipitation and water production coefficient data under the three emission concentration scenarios. A₁ is then obtained by dividing the total water resources of the basin by the area of the basin.
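The A₁ calculation described above can be sketched as follows. This is a minimal illustration, assuming that total water resources are estimated as precipitation depth times basin area times the water production coefficient; the function names, units, and input numbers are hypothetical and are not taken from the paper's data.

```python
def total_water_resources_m3(precip_mm, production_coeff, area_km2):
    """Total water resources in m^3.

    Assumption: precipitation volume (depth x area) multiplied by the
    water production coefficient gives the total water resources.
    """
    precip_m = precip_mm / 1000.0        # mm -> m
    area_m2 = area_km2 * 1_000_000.0     # km^2 -> m^2
    return precip_m * area_m2 * production_coeff


def water_production_modulus(total_m3, area_km2):
    """A1: total water resources divided by basin area (m^3 per km^2)."""
    return total_m3 / area_km2


# Hypothetical inputs for one basin under one scenario (illustrative only).
total = total_water_resources_m3(precip_mm=500.0, production_coeff=0.2, area_km2=1000.0)
print(water_production_modulus(total, area_km2=1000.0))
```

The same two functions would be called once per basin, year, and scenario, with the scenario-specific precipitation and coefficient as inputs.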
The second type of scenario data is greatly influenced by human social and economic activities. The forecast data of these indicators mainly come from the water resource management objectives for 2020 and 2030 in the "State Council's Views on the SWRMS" and from the comprehensive planning of the three river basins (2012-2030). In scenario 1, the indicators affected by economic activity fully achieve the targets set in the SWRMS, while the values in scenario 2 and scenario 3 are determined as a proportion of the data in scenario 1. For example, the qualification rate of water quality in the water function area (B₃) of each basin under scenario 1 fully achieves the planning objectives, while scenario 2 and scenario 3 are calculated according to the current progress and 80% or 60% of the planning objectives. Following this scenario setting method, we set the index data of the three future scenarios for 2020 and 2030. The scenario prediction data and prediction process are shown in Table A1 in Appendix A.

Methodology
The research method and flow chart of this paper are shown in Figure 2, which has three important steps. First, we construct the evaluation index system. Second, we use the Improved Blind Deletion Rough Set (IBDRS) method to reduce the dimension of and simplify the original index system. Third, the RF model and the artificial neural network (ANN) model are trained on the original data of the reduced indicators and the evaluation results. By comparing the fitting accuracy of the two models, we choose the one with the higher fitting accuracy as the model used for evaluation and prediction.

Evaluation Index System
In this paper, the index system method is used to evaluate and forecast the river basin water resources vulnerability. On the basis of the causes and manifestations of water resources vulnerability, we divide the water resource vulnerability index (WVI) into three secondary indexes, which are water shortage vulnerability (WSVI), water pollution vulnerability (WPVI), and water-related natural disaster vulnerability (WDVI) [39]. Then, we divide each secondary index into four third-level indexes, namely, pressure, state, impact, and response. Finally, we choose 24 indicators as the initial evaluation index set. The evaluation index system is shown in Table 1. In our original comprehensive evaluation index system, the number of evaluation indexes is large, which affects the efficiency of evaluation and prediction. Therefore, we need to reduce the dimension of the original index system. The main principles in reducing the indicator dimensions are to maintain the same accuracy as the original index system and ultimately to improve the efficiency of evaluation and prediction. To balance interpretability and predictive performance, and to keep the RF and ANN models comparable, we must make sure that all models being compared have the same input variables. Therefore, we disabled the variable selection process embedded in the RF and ANN models and used an independent variable selection process instead.
Specifically, in this study, the IBDRS is used to screen the evaluation indicators. Rough set theory is a mathematical tool for dealing with incompleteness and uncertainty. It is widely used in many fields of natural science, engineering technology, and social science [40]. Compared with common methods of index dimension reduction such as the traditional rough set (RS), the analytic hierarchy process (AHP), and principal component analysis (PCA), the IBDRS method achieves a good balance among data mining ability, variable interpretability, and the incorporation of expert judgment.
For example, in the process of reduction, the RS method finds hidden knowledge and rules through the analysis of the data of each index, which is objective. However, important indicators may be deleted at the same time. The AHP method is very subjective in the process of index deletion, because it depends on the judgment of experts. The PCA method is relatively objective, but dimensionality reduction produces new composite indicators, so the original indicators can no longer be analyzed directly. Because we want to retain some necessary attributes of the system on subjective grounds while still reducing the data dimension effectively, we choose the IBDRS method for this study.
The main ideas of IBDRS are as follows. First, according to the classification standard of each evaluation index, the original numerical data and the evaluation results calculated by the entropy weight method are discretized to form a decision table. Second, we use the traditional RS method to reduce the dimension of the index set and find its core. Third, we add important indicators to the core. At the same time, the balance of the number of indicators among the secondary indexes should be considered. After an indicator is added, the equivalence relationship between the subset before the addition and the subset after the addition is verified according to the rough set concept. The detailed principles and procedures of this method can be found in the work by Pawlak [40], and the detailed calculation steps are given in reference [39].

Random Forest and Artificial Neural Network Models
At present, there are two main types of prediction methods. The first is the traditional statistical method, mainly including regression analysis models, time series analysis models, and other methods. The advantage of these methods is that their structure is simple and easy to identify, but it is difficult to achieve ideal results when the data are non-linear [41]. The second is machine learning; commonly used methods in this category include ANN, RF, and Support Vector Machine (SVM) models [42]. At present, the application of machine learning models in the field of evaluation and prediction is booming.
RF is an ensemble algorithm based on multiple classification and regression trees (CART), first proposed by Breiman in 2001 [43]. RF has been widely used in classification, evaluation, and prediction. In recent years, it has also been applied in many studies in the fields of hydrology and water resources. In classification, a novel hierarchical object-based Random Forest classification approach can be used to distinguish different land cover types, with accuracy rates over 90% [44]. RF can also be used to establish a basin hydrological evaluation model by determining the weights of the indexes, with an accuracy higher than that of the entropy weight method [45]. In addition, RF has a strong advantage in hydrological data prediction, where its results are better than those of Poisson regression [46], and it can also effectively delineate flood-prone areas [47,48]. However, the research trend is to combine RF with other modeling methods, since a mixed model can make up for the shortcomings of a single model. Some scholars have tried to combine RF with the Wavelet model, Kernel Ridge Regression (KRR), and other models [49,50]. Empirical analysis has confirmed that these composite models perform better than a single model.
The random forest algorithm is based on statistical theory and uses the bootstrap resampling method to draw multiple samples from the original sample. For each drawn sample, a decision tree is constructed, which produces multiple full-depth tree models. When the random forest model is used for regression, the final predicted value is obtained by averaging the predicted values of the individual trees. When it is used to solve classification problems, the final prediction is generated by majority voting. Random forest is thus a combined forecasting model, which can be regarded as a strong predictor built from many weak predictors (decision trees). These weak predictors complement each other and reduce the impact of the errors of any single predictor, thus improving the accuracy and stability of prediction. The random forest approach is robust to outliers, noise, and multicollinearity [51]. Therefore, random forest performs well in multivariate prediction and its interpretation, and it has been widely used in many fields such as medicine and biology [52]. However, RF behaves almost like a black box: the internal operation of the model cannot be controlled. If there are too many training samples, many similar decision trees may be produced during training, which can mask the true results. In addition, for small or low-dimensional data, overfitting may occur on some noisy classification or regression problems. The accuracy of the model can nevertheless be improved, and overfitting reduced, by adjusting the parameter ntree. The RF model has a strong advantage in dealing with large-scale multivariate data, which meets the requirements of water resources vulnerability assessment and prediction involving multiple indicators and complex data processing.
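The bootstrap-and-average idea described above can be illustrated with a short sketch. It is a simplification under stated assumptions, not the model used in this study: each tree is a depth-1 regression tree (a stump) rather than a full-depth CART, and all function names are hypothetical. The parameters ntree and mtry play the same roles as in the text.

```python
import random
from statistics import mean


def fit_stump(X, y, features):
    """Fit a depth-1 regression tree using only the candidate features."""
    best = None  # (sse, feature, threshold, left_mean, right_mean)
    for j in features:
        values = sorted(set(row[j] for row in X))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2.0
            left = [yi for row, yi in zip(X, y) if row[j] <= t]
            right = [yi for row, yi in zip(X, y) if row[j] > t]
            lm, rm = mean(left), mean(right)
            sse = sum((v - lm) ** 2 for v in left) + sum((v - rm) ** 2 for v in right)
            if best is None or sse < best[0]:
                best = (sse, j, t, lm, rm)
    if best is None:  # no valid split: fall back to the sample mean
        m = mean(y)
        return lambda row: m
    _, j, t, lm, rm = best
    return lambda row: lm if row[j] <= t else rm


def fit_forest(X, y, ntree=100, mtry=1, seed=0):
    """Train ntree stumps, each on a bootstrap sample and mtry random features."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    trees = []
    for _ in range(ntree):
        idx = [rng.randrange(n) for _ in range(n)]   # sample n rows with replacement
        feats = rng.sample(range(p), mtry)           # random feature subset
        trees.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feats))
    return trees


def predict(trees, row):
    """Regression forest prediction: the average over all trees."""
    return mean(tree(row) for tree in trees)


# Toy data (hypothetical): the target jumps from 0 to 10 at x = 5.
X = [[float(i), 0.0] for i in range(10)]
y = [0.0] * 5 + [10.0] * 5
forest = fit_forest(X, y, ntree=50, mtry=2, seed=1)
print(predict(forest, [0.0, 0.0]), predict(forest, [9.0, 0.0]))
```

For classification, the averaging step in `predict` would be replaced by a majority vote over the class labels returned by the trees.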
ANN is a non-linear and adaptive information processing system based on the research results of modern neuroscience. ANN has low requirements for input information and has many applications in the field of hydrological prediction. It can effectively detect flood-prone areas and be used as a decision support system for the comprehensive evaluation and management of water resources [53]. Since the learning rate of the artificial neural network is fixed, the network converges slowly and needs a long training time. For this reason, in recent years some scholars have combined other algorithms with the ANN to speed up its convergence, and optimization models such as AEEMD-ANN (adaptive ensemble empirical mode decomposition with the ANN), SSA-ANN (singular spectrum analysis with the ANN), and PSO-ANN (particle swarm optimization with the ANN) have been proposed [54][55][56].
The artificial neural network model emerged from the field of artificial intelligence in the 1980s [57]. It processes information by simulating the structure and function of the neural network of the human brain. An artificial neural network is a non-linear system consisting of many interconnected neurons. It is not executed step by step according to a given procedure; instead, it is trained, and the weights of each neuron are repeatedly modified during learning. Because the learning rate of the ANN is fixed, the convergence speed of the network is slow, which requires a long training time. This can be improved by changing the learning rate or by using an adaptive learning rate. In addition, because of insufficient training samples, external noise, and other factors, the neural network may overfit during training. At present, the artificial neural network model is widely used in many fields such as economics, biology, and medicine, where it performs functions such as recognition, evaluation, prediction, and classification.
We use the RF and ANN models to fit the relationship between the original data and the vulnerability. By comparing the fitting ability of the two models, we select the model with the higher fitting accuracy as the evaluation and prediction model.

Result and Discussion
First, the IBDRS method is used to reduce the dimension of the initial index system, such that the reduced index system has the same evaluation ability as the initial one. Then, we compare the fitting accuracy of the RF and ANN models based on the reduced set of variables. Finally, the model with the higher fitting accuracy is selected to forecast the vulnerability of the three river basins in 2020 and 2030 under various scenarios.

Dimension Reduction of Evaluation Index
Entropy 2020, 22, 333
Our preliminary analysis has shown that some variables are highly correlated with others, which enables us to use a dimension reduction approach aimed at saving computational power while maintaining prediction performance. To reduce the original evaluation index set by the IBDRS method, it is necessary to discretize the original data and obtain a decision table C. The decision table C contains two kinds of data: conditional attribute data and decision attribute data. The first category is the condition attributes, which are obtained by discretizing the original data of each index using the k-means clustering algorithm. The continuous data are transformed into integers between 1 and 4, which gives the conditional attribute values in the decision table. The biggest advantage of k-means is that it is easy to understand, simple, and fast to run [58]. We use SPSS 21 software to carry out the k-means clustering process and specify the value of K as 4. The second category is the decision attribute, which is determined by the entropy weight method and pre-determined index thresholds. We use the entropy weight method and the threshold values of the 24 original evaluation indexes to determine the decision attributes. The specific steps are as follows. First, the weights of the dimensionless indexes are calculated by using the entropy weight method, and seven evaluation grade thresholds are calculated according to the thresholds of the 24 evaluation indexes, as shown in Table 2. Then, according to the weights and dimensionless values of the 24 indicators in the three basins from 2000 to 2015, the comprehensive evaluation values of water resources vulnerability in the three basins are obtained by weighted summation. The decision attributes of the three basins are obtained by assigning each comprehensive evaluation value to a level according to the threshold levels in Table 2.
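The decision-attribute calculation described above can be sketched as follows, using the textbook form of the entropy weight method. This is an illustration under stated assumptions rather than the authors' exact procedure: the input matrix is assumed to be already dimensionless and non-negative, and the grade thresholds and data values shown are hypothetical (the actual thresholds are those of Table 2).

```python
import bisect
import math


def entropy_weights(matrix):
    """Entropy weights for the columns of a dimensionless, non-negative matrix.

    Assumes at least two rows, a positive sum in every column, and at least
    one column whose values are not all identical.
    """
    n, m = len(matrix), len(matrix[0])
    k = 1.0 / math.log(n)
    divergences = []
    for j in range(m):
        col = [row[j] for row in matrix]
        total = sum(col)
        entropy = 0.0
        for v in col:
            p = v / total
            if p > 0:                      # treat 0 * ln(0) as 0
                entropy -= k * p * math.log(p)
        divergences.append(1.0 - entropy)  # low entropy -> more informative index
    s = sum(divergences)
    return [d / s for d in divergences]


def composite_scores(matrix, weights):
    """Weighted sum of the dimensionless index values for each observation."""
    return [sum(w * v for w, v in zip(weights, row)) for row in matrix]


def grade(score, thresholds):
    """Map a composite score to a grade 1..len(thresholds)+1 (ascending thresholds)."""
    return bisect.bisect_right(thresholds, score) + 1


# Hypothetical dimensionless data: 3 observations x 2 indexes.
data = [[0.5, 0.1], [0.5, 0.5], [0.5, 0.9]]
w = entropy_weights(data)
scores = composite_scores(data, w)
print(w, [grade(s, [0.2, 0.4, 0.6]) for s in scores])
```

In this toy example the first index is constant across observations, so it carries no information and receives a weight close to zero.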
The implementation process of reducing the dimension of the evaluation index set by the IBDRS method is as follows. We define the original conditional attribute set (the original evaluation index system) as C and the initial reduction index set as B. We select five indexes from each of WSVI, WPVI, and WDVI to establish the initial reduction index set B, and verify whether the equation pos_B(D) = pos_C(D) holds. If it holds, the index set B has the same classification ability as the initial condition attribute set C, so there is no need to add another indicator to B.
We use the IBDRS method to verify the necessity of each evaluation index in B. In the process of deleting indicators, in order to keep the number of reduced indicators in WSVI, WPVI, and WDVI as balanced as possible, the indicators are deleted one by one from WSVI, WPVI, and WDVI in turn for cyclic verification. The main steps are as follows.
Step 1: Remove the indicator A₅. The equation pos_{B−A₅}(D) = pos_B(D) holds, indicating that the indicator A₅ can be removed. In this way, the index set B is reduced by removing A₅.
Step 2: Remove the indicator B₄. The equation pos_{B−B₄}(D) = pos_B(D) holds, indicating that the indicator B₄ can be removed. In this way, the index set B is reduced by removing B₄.
Step 3: Remove the indicator C₄. The equation pos_{B−C₄}(D) = pos_B(D) holds, indicating that the indicator C₄ can be removed. In this way, the index set B is reduced by removing C₄.
The remaining indicators are verified in the same cyclic way in the subsequent steps. At this time, B is the smallest reduction set extracted from the original decision table. We define B as a core, and every indicator in this core is necessary: B = {A₆, A₇, B₃, B₅, C₃}.
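The deletion steps above rely on comparing positive regions before and after removing an indicator. The following sketch shows this check using the standard rough-set definition of the positive region; it is not the full IBDRS procedure of reference [39], and the decision table, attribute values, and function names are hypothetical.

```python
from collections import defaultdict


def positive_region(table, attrs, decision):
    """Indices of rows whose equivalence class under attrs has one decision value."""
    blocks = defaultdict(list)
    for i, row in enumerate(table):
        blocks[tuple(row[a] for a in attrs)].append(i)
    pos = set()
    for members in blocks.values():
        if len({table[i][decision] for i in members}) == 1:
            pos.update(members)
    return pos


def blind_deletion(table, attrs, decision):
    """Try to delete attributes one by one, in the given order.

    A deletion is kept only if the positive region stays the same,
    i.e. pos_{B-a}(D) = pos_B(D).
    """
    kept = list(attrs)
    target = positive_region(table, kept, decision)
    for a in attrs:
        trial = [b for b in kept if b != a]
        if trial and positive_region(table, trial, decision) == target:
            kept = trial
    return kept


# Hypothetical discretized decision table; D is the decision attribute.
rows = [
    {"A5": 1, "A6": 1, "B3": 1, "D": 1},
    {"A5": 1, "A6": 2, "B3": 1, "D": 2},
    {"A5": 2, "A6": 1, "B3": 2, "D": 2},
    {"A5": 2, "A6": 2, "B3": 2, "D": 3},
]
print(blind_deletion(rows, ["A5", "A6", "B3"], "D"))
```

In this toy table A5 carries the same information as B3, so it is removed first, after which neither A6 nor B3 can be removed without shrinking the positive region. The result depends on the order in which attributes are tried, which is why the text alternates deletions among WSVI, WPVI, and WDVI.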
We find that the number of indicators in the index set B is unevenly distributed among the secondary indexes. This situation may affect the prediction accuracy of WSVI, WPVI, and WDVI. Therefore, based on the judgment of experts, we add the 7 indicators {A₁, A₈, B₂, B₄, C₆, C₇, C₈} to the core indicator set {A₆, A₇, B₃, B₅, C₃} and form the final prediction indicator set B = {A₁, A₆, A₇, A₈, B₂, B₃, B₄, B₅, C₃, C₆, C₇, C₈}.

Selection of Evaluation and Prediction Models
In this part, we fit both the RF and ANN models and choose the one with the better fitting and prediction ability as the prediction model.

Optimization of Model Parameters
In this part, we use the reduced index set of 12 indicators, which includes the variables A₁, A₆, A₇, A₈, B₂, B₃, B₄, B₅, C₃, C₆, C₇, and C₈. The main parameters that need to be set in the RF model are ntree and mtry. The parameter ntree represents the number of decision trees, with ntree > 100. The parameter mtry is the most sensitive parameter in the RF model and represents the number of variables considered when a node of a decision tree is split. Throughout the generation of a random forest, mtry remains unchanged. When the number of original variables is n, it is recommended that mtry be n/3. We use the bootstrapping method to determine the optimal parameters by comparing the OOB (out-of-bag) errors under different parameter values. In other words, the optimal values of ntree and mtry are those that lead to the smallest OOB error. As shown on the left of Figure 3, the prediction accuracy of RF becomes higher and more stable as the number of trees increases. When ntree is 500 and mtry is 4, the OOB error is the smallest. The main parameter of the ANN model is the number of hidden layers. As one can see on the right of Figure 3, the accuracy does not rise substantially as the number of hidden layers increases. When the number of hidden layers of the neural network is 9, the accuracy is the best.

Evaluation of Fitting Accuracy of Models
This paper evaluates the goodness of fit of the two models based on the following two indicators: Mean Square Error (MSE) and Normalized Mean Square Error (NMSE). We use 10-fold cross-validation to calculate the mean values of MSE and NMSE as the accuracy criteria. The formulas for these two fitting accuracy evaluation indicators are as follows.
The values of MSE and NMSE indicate the differences between the predicted and actual values of the model. The range of the two indicators is usually 0-1, and the smaller the calculated value of the two indicators, the better the performance of the model.
We evaluate the fitting accuracy of the two models by comparing the MSE and NMSE results of the two models. It can be seen from Table 3 that the RF model shows superior performance compared to the ANN model. Therefore, we choose the RF as the assessment and prediction model.
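For reference, the two accuracy criteria and the cross-validation loop can be sketched as follows. The definitions used here are common textbook forms and are an assumption, since the exact normalization used for NMSE in this study is not recoverable from the text: MSE is the mean of the squared differences between actual and predicted values, and NMSE is the sum of squared errors divided by the sum of squared deviations of the actual values from their mean. All function names are hypothetical.

```python
def mse(y_true, y_pred):
    """Mean square error between actual and predicted values."""
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)


def nmse(y_true, y_pred):
    """Squared error normalized by the spread of the actual values.

    Assumes the actual values are not all identical (non-zero denominator).
    """
    y_bar = sum(y_true) / len(y_true)
    denom = sum((a - y_bar) ** 2 for a in y_true)
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / denom


def k_fold_indices(n, k):
    """Yield (train, test) index lists for k contiguous folds.

    Shuffle the data beforehand if the row order is meaningful.
    """
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n) if i < start or i >= start + size]
        yield train, test
        start += size


def cross_validate(X, y, fit, k=10):
    """Return the mean MSE and mean NMSE over k folds.

    `fit(X_train, y_train)` must return a function mapping one row to a prediction.
    """
    mses, nmses = [], []
    for train, test in k_fold_indices(len(y), k):
        model = fit([X[i] for i in train], [y[i] for i in train])
        actual = [y[i] for i in test]
        preds = [model(X[i]) for i in test]
        mses.append(mse(actual, preds))
        nmses.append(nmse(actual, preds))
    return sum(mses) / k, sum(nmses) / k


# Toy check with a model that simply predicts the training mean.
y_demo = [float(v) for v in range(20)]
X_demo = [[v] for v in y_demo]
mean_model = lambda X_tr, y_tr: (lambda row: sum(y_tr) / len(y_tr))
print(cross_validate(X_demo, y_demo, mean_model, k=5))
```

With 48 observations and k = 10, the folds produced by `k_fold_indices` contain 4 or 5 observations each, so the per-fold NMSE can be unstable when the actual values within a fold are nearly constant.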

The RF model was used to calculate the vulnerability in the Huang-Huai-Hai Basin during 2000-2015, including WVI, WSVI, WPVI, and WDVI. We compare the fitted values with the actual values, as shown in Figure 4, where the blue line represents the vulnerability value calculated by the original evaluation index system, and the red line represents the curve fitted by the reduced-dimension index set and the RF method. As shown in Figure 4, the RF model fits the original values well.

Assessment of Water Resources Vulnerability in Huang-Huai-Hai Basin
After variable selection, 12 indicators are supplied to the random forest model to evaluate water resources vulnerability in the Huang-Huai-Hai Basin. The results are shown in Table 4. Based on the calculation results in Table 4, we have drawn the trend chart of WVI in the Huang-Huai-Hai Basin, as illustrated in Figure 5. We can see that the values of WVI in the three basins all decreased slightly from 2000 to 2015. In general, these results show that the level of WRV in the Huang-Huai-Hai Basin improved, that is, vulnerability declined, during the 16-year period.
The WVI in the Huang River Basin was grade 4 during 2000-2006, which is considered a moderate level of vulnerability. During 2007-2014, the vulnerability level was at level 3, falling into the category of moderate to low vulnerability. It can be seen that the WVI has been alleviated in the Huang River Basin during the 16-year period.
The WVI of the Huai River Basin was grade 4 during 2000-2015, which is considered moderate vulnerability. From the perspective of vulnerability level, there is no significant improvement. However, the absolute value of vulnerability still shows a decreasing trend, indicating small improvements.
Compared with the other two basins, the WVI in the Hai River Basin is more severe. From 2000 to 2009, the WVI in the Hai River Basin was at level 5, which is classified as moderate to high vulnerability. The situation improved between 2010 and 2015, but the basin remained at level 4 (moderate vulnerability).
In general, our study has found that the Huang River Basin has the lowest WVI among the three river basins. Of the other two, the Huai River Basin is in a better position than the Hai River Basin in terms of water resource vulnerability.

Cause Identification of Vulnerability in Huang-Huai-Hai Basin
From the calculation results and comparative analysis of WVI, WSVI, WPVI, and WDVI in the three river basins, we found that the WSVI and WDVI are the highest in the Hai River Basin. The key vulnerability of the Hai River Basin is therefore caused by water shortage, flood, and drought disasters. In recent years, the water resources vulnerability has been alleviated to some extent in the Hai River Basin. The main reasons are irrigation with water diverted from the Huang River and the start-up of Phase I of the South-to-North Water Transfer Project. Between the opening of the South-to-North Water Transfer Project and the end of 2019, more than 300 billion m³ of water had been transferred to Beijing, Tianjin, and Hebei, which greatly alleviated the water shortage and drought in the Hai River Basin.
In addition, water pollution is the key vulnerability factor in the Huai River Basin and has become an important issue for the basin.
We used the 12 evaluation indexes obtained after dimensionality reduction, together with the RF model, to evaluate water resources vulnerability in the three basins. The error between these evaluation results and those based on the original 24 evaluation indexes is very small, which shows that the reduced index system is representative as well as more concise. In addition, the key vulnerability analysis shows that an index system built on the three aspects of water quantity, water quality, and disaster has advantages over other function-based methods and index systems: it supports cause analysis of the vulnerability of each river basin and helps target governance at the key problems.
The trend analysis of the three basins from 2000 to 2015 shows that the vulnerability levels of water resources in the Huang River Basin and Hai River Basin improved by one level after 2007 and 2010, respectively, while the vulnerability level in the Huai River Basin did not change and remained at level 4.
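The level changes described above can be tabulated in a few lines of Python. This is only an illustrative sketch: the periods and levels are copied from the trend analysis in the text, while the names `LEVEL_LABELS`, `WVI_LEVELS`, `level_change`, and `summarize` are hypothetical helpers introduced here and are not part of the original study.

```python
# Level labels as used in the text (only levels 2-5 are named there).
LEVEL_LABELS = {
    2: "mild",
    3: "moderate to low",
    4: "moderate",
    5: "moderate to high",
}

# (first year, last year, WVI level) per basin, copied from the trend analysis.
WVI_LEVELS = {
    "Huang": [(2000, 2006, 4), (2007, 2014, 3)],
    "Huai": [(2000, 2015, 4)],
    "Hai": [(2000, 2009, 5), (2010, 2015, 4)],
}


def level_change(periods):
    """Return last level minus first level; a negative value means less vulnerable."""
    ordered = sorted(periods)
    return ordered[-1][2] - ordered[0][2]


def summarize(levels_by_basin):
    """Map each basin to (level change, label of the most recent level)."""
    summary = {}
    for basin, periods in levels_by_basin.items():
        latest_level = sorted(periods)[-1][2]
        summary[basin] = (level_change(periods), LEVEL_LABELS[latest_level])
    return summary


if __name__ == "__main__":
    for basin, (change, label) in summarize(WVI_LEVELS).items():
        print(f"{basin}: change {change:+d}, latest level is {label}")
```

A negative change corresponds to what the text calls an improvement of the vulnerability level.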
The SWRMS issued by the State Council took effect in 2012, but the vulnerability of the three basins did not change significantly around 2012. This paper therefore cannot evaluate the effectiveness of this policy system, mainly because the effect of the policy needs to be monitored and observed over a longer period of time. In 2022, we will evaluate the effect of 10 years of policy implementation and study the changes in vulnerability levels before and after implementation.

Scenario Prediction of WRV in Huang-Huai-Hai Basin
In its fifth assessment report, the IPCC set four scenarios named Representative Concentration Pathways (RCPs), namely, the Mitigation Emission Pathway (RCP 2.6), the Middle Stability Emission Pathway (RCP 4.5), the High and Stable Emission Pathway (RCP 6.0), and the High Emission Pathway (RCP 8.5). In these scenarios, future climate change is predicted solely on the basis of emission changes, while other socio-economic conditions are not considered [33].
On the basis of the three emission pathways RCP 2.6, RCP 4.5, and RCP 8.5, this paper adds the influence of human social and economic activities. We set up three scenarios (scenario 1, scenario 2, and scenario 3) and calculated the WVI, WSVI, WPVI, and WDVI in the Huang-Huai-Hai Basin in 2020 and 2030 under each of them, as shown in Table 5.

Water Resources Vulnerability Prediction under Scenario 1
Climate change in scenario 1 is based on the RCP 2.6 concentration pathway in the IPCC's fifth assessment report, i.e., by 2100, the average temperature rise will be controlled within 2.0 °C, and radiative forcing will be stable at 2.6 W/m². For human activities, scenario 1 assumes that the indicators take the values set by the three red lines of the SWRMS issued by the State Council in 2012.
Table 5 shows that if each basin develops according to scenario 1, water resources vulnerability in the three basins will improve significantly in 2020 and 2030. In the Huang River Basin, the WVI, WSVI, WPVI, and WDVI will reach level 3 in 2020. By 2030, the WVI, WPVI, and WDVI can even reach level 2, which is the mild vulnerability level.
In the Huai River Basin, the WVI, WSVI, WPVI, and WDVI can basically reach level 3 (moderate to low vulnerability) in 2020 and 2030.
In the Hai River Basin, the WVI, WSVI, WPVI, and WDVI will be at level 4, level 4, level 3, and level 5, respectively, in 2020, and at level 3, level 4, level 3, and level 3, respectively, in 2030.

Water Resources Vulnerability Prediction under Scenario 2
Climate change in scenario 2 is based on the RCP 4.5 concentration pathway in the IPCC's fifth assessment report, i.e., radiative forcing stabilizes at 4.5 W/m² by 2100. For human activities, scenario 2 assumes that the indicators of each basin reach 80-90% of the three red-line control targets of the SWRMS issued by the State Council in 2012.
Under scenario 2, water resources vulnerability in the Huang-Huai-Hai Basin in 2020 and 2030 does not change significantly compared with the 2015 level. In the Huai River Basin and Hai River Basin in particular, the vulnerability of water resources is basically at levels 4 and 5, which correspond to moderate and moderate to high vulnerability.

Water Resources Vulnerability Prediction under Scenario 3
Climate change in scenario 3 is based on the RCP 8.5 concentration pathway in the IPCC's fifth assessment report, which assumes a lack of policies to deal with climate change, a low rate of technological innovation, the largest population, and slow improvement in energy intensity, all of which lead to high greenhouse gas emissions and long-term high energy demand; i.e., radiative forcing rises to 8.5 W/m² by 2100. For human activities, scenario 3 assumes extensive economic development that pays no attention to environmental protection, so the indicators reach only 60-80% of the levels required by the three red lines.
Under scenario 3, water resources vulnerability in the Huang-Huai-Hai Basin tends to deteriorate in 2020 and 2030 compared with the base year of 2015. The vulnerability of water resources in the three basins in 2020 and 2030 is basically at levels 4 and 5, which correspond to moderate and moderate to high vulnerability.
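For reference, the three scenarios can be written down as a small data structure. The sketch below is illustrative only: the RCP labels, forcing values, and red-line attainment ranges are taken from the scenario descriptions in the text, whereas the `Scenario` class, its field names, and the reading of scenario 1 as full (100%) attainment of the red-line targets are assumptions made here.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Scenario:
    name: str
    rcp: str
    forcing_w_m2: float                        # radiative forcing by 2100 (W/m^2)
    red_line_attainment: Tuple[float, float]   # (low, high) share of red-line targets met


# Assumption: scenario 1 is read as meeting the red-line targets in full.
SCENARIOS = [
    Scenario("scenario 1", "RCP 2.6", 2.6, (1.0, 1.0)),
    Scenario("scenario 2", "RCP 4.5", 4.5, (0.8, 0.9)),
    Scenario("scenario 3", "RCP 8.5", 8.5, (0.6, 0.8)),
]


def by_name(name):
    """Look up a scenario by its name, e.g. 'scenario 2'."""
    return next(s for s in SCENARIOS if s.name == name)


def midpoint_attainment(scenario):
    """Midpoint of the attainment range, a convenient single value for sketches."""
    low, high = scenario.red_line_attainment
    return (low + high) / 2


if __name__ == "__main__":
    for s in SCENARIOS:
        print(f"{s.name}: {s.rcp}, {s.forcing_w_m2} W/m^2, "
              f"attainment about {midpoint_attainment(s):.0%}")
```

Keeping the climate pathway and the human-activity assumption in one record makes it explicit that each scenario pairs a higher forcing with a lower attainment of the red-line targets.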
In this paper, the reduced-dimension index system and the RF model are used to predict the water resources vulnerability of the three basins in 2020 and 2030. The approach is simple and convenient: the vulnerability value and level are calculated directly by substituting the predicted index values into the trained RF model, without calculating the weight of each index.
However, this method is affected by the accuracy of the scenario prediction data for each indicator. In this paper, the forecast values of the indicators are calculated from existing research results and from the proportion of each basin's 2020 and 2030 planning goals that is realized. In the future, the prediction method for each index can be studied in more detail.
The scenario prediction analysis shows that if development follows scenario 1, the vulnerability level of water resources in the three basins will improve in 2020 and 2030. Therefore, development should follow scenario 1 to ensure the future water resources security of the river basins, and each river basin should achieve its established goals.

Conclusions
This paper constructs an evaluation index system based on the causes of water resources vulnerability. The IBDRS method is then used to reduce the dimension of the index system, and the reduced index system retains the evaluation ability of the original one. Furthermore, a comparison of the fitting ability of RF and ANN showed that the RF model is more accurate, so it was used to evaluate and predict the vulnerability of water resources in the Huang-Huai-Hai Basin.
This article draws the following important conclusions. First, the values and levels of WRV in the three river basins decreased during 2000-2015, indicating that the water resources situation in the three river basins has improved. Among the three aspects of water quantity, water quality, and disaster, the improvement in water quality and disaster prevention capacity is obvious in the three basins: the WPVI improved by two levels in the Huang River Basin, and the WDVI improved by two levels in the Hai River Basin. Second, the analysis of the causes of WRV shows that the water resources situation of the Huang River Basin is better than that of the other two basins, and its causes of vulnerability are not significant. The critical vulnerability of the Huai River Basin is caused by water quality, while the critical vulnerability of the Hai River Basin is caused by disasters. Governance planning should therefore be based on the causes of the critical vulnerability of each basin. Third, the scenario prediction of WRV shows that if the three basins develop according to scenario 1, the water resources situation will be greatly improved: the WRV in the Huai and Hai River Basins is expected to reach moderate to low vulnerability (level 3) in 2030, and the Huang River Basin may even reach mild vulnerability (level 2).
To sum up, this study makes the following contributions. First, it is the first to use the index system method to evaluate the WRV of the Huang-Huai-Hai Basin in a combined study. Specifically, by establishing an evaluation index system, we examined the WRV from the aspects of water shortage, water pollution, and natural hazards between 2000 and 2015. This reveals empirical evidence on water resources vulnerability for a river basin system that is of great strategic importance in China. Second, based on a reduced-dimension index system, we trained an RF model that can simulate the evaluation process based on the full index system. This random forest model has a more concise structure but retains the evaluation precision of the full model, which provides a more convenient tool for further studies of water resources vulnerability in this region. Third, in addition to assessing the current condition of water resources vulnerability in this region, we projected how it will change under three different climate change scenarios. These findings indicate that water resources vulnerability in this region may deteriorate under certain climate change conditions, which provides further evidence in support of efforts to curb climate change.
Influenced by climate conditions, geographical location, and other factors, the Huang-Huai-Hai Basin faces many water resources problems, and the water resources management situation is complex and severe. In the future, we will continue to follow the development of WRV in the three river basins as more data become available.

The specific calculation method of the scenario forecast data for each indicator is as follows.

Appendix A.1. Water Production Modulus A1
This indicator is mainly influenced by natural factors such as climate change. The water production modulus equals the total amount of basin water resources divided by the area of the basin.
The total amount of basin water resources equals the total annual precipitation multiplied by the water production coefficient. Total annual precipitation under the three scenarios is taken from existing research [36] (pp. 145-146). The water production coefficient, however, changes as the underlying surface changes, and the three basins show different degrees of change. According to the research, the water production coefficient of the Huai River Basin decreased by 0.02 per 10 years, while that of the Huang and Hai River Basins decreased by 0.06 per 10 years.
Using these methods, the water production modulus of the three basins in 2020 and 2030 can be predicted under the three scenarios.
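The calculation described above can be sketched as follows. The decadal declines of the water production coefficient (0.02 for the Huai, 0.06 for the Huang and Hai) come from the text; everything else is an assumption made for illustration, including the function names, the linear interpolation of the decline within a decade, and the input values in the example, which are placeholders rather than data from the study.

```python
# Decline of the water production coefficient per 10 years, as stated in the text.
COEFF_DECLINE_PER_DECADE = {"Huang": 0.06, "Huai": 0.02, "Hai": 0.06}


def projected_coefficient(basin, base_coeff, base_year, target_year):
    """Project the water production coefficient to target_year.

    Assumption: the decadal decline is applied linearly over time.
    """
    decades = (target_year - base_year) / 10
    return base_coeff - COEFF_DECLINE_PER_DECADE[basin] * decades


def water_production_modulus(total_precipitation, coeff, basin_area):
    """Total water resources (precipitation x coefficient) per unit basin area."""
    return total_precipitation * coeff / basin_area


if __name__ == "__main__":
    # Placeholder inputs for illustration only; not values from the study.
    coeff_2030 = projected_coefficient("Huai", base_coeff=0.30,
                                       base_year=2015, target_year=2030)
    modulus = water_production_modulus(total_precipitation=100.0,
                                       coeff=coeff_2030, basin_area=50.0)
    print(f"coefficient in 2030: {coeff_2030:.3f}, modulus: {modulus:.3f}")
```

The scenario-specific precipitation totals would be passed in as `total_precipitation`, so the same two functions serve all three scenarios.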

Appendix A.2. Utilization Rate of Groundwater Resources A6
This indicator is influenced by both climate and socio-economic development. The indicator data of the three river basins show no obvious trend in the period 2000-2015, so the data are processed as follows. First, the average value of this index in each of the three basins over the 16 years was calculated, and the minimum value during 2000-2015 was identified.
Then, the scenario prediction was carried out according to the 2020 and 2030 water resources planning goals of the three basins and the corresponding realization ratios.
For example, the 16-year average for the Huang River Basin is 0.3632 and the minimum value is 0.3039. For 2020 and 2030, scenario 1 therefore uses the minimum of 0.3039 and scenario 2 uses the average. Scenario 3 is obtained by multiplying the scenario 2 value by a coefficient, on the assumption that it is 10% higher than the scenario 2 value. The same calculation was used for the remaining two basins.
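The rule above can be sketched in Python using the Huang River Basin figures quoted in the text (average 0.3632, minimum 0.3039). The function names and the `uplift` parameter are hypothetical, and reading the scenario 3 coefficient as "10% above scenario 2" (a factor of 1.1) is an interpretation of the text rather than a confirmed detail of the authors' calculation.

```python
from statistics import mean


def scenario_values(minimum, average, uplift=0.10):
    """Scenario values for the groundwater utilization rate.

    scenario 1: historical minimum
    scenario 2: historical average
    scenario 3: scenario 2 raised by `uplift` (assumed to be 10%)
    """
    return {
        "scenario 1": minimum,
        "scenario 2": average,
        "scenario 3": average * (1 + uplift),
    }


def scenario_values_from_series(series, uplift=0.10):
    """Same rule, starting from the annual series of the indicator."""
    return scenario_values(min(series), mean(series), uplift)


if __name__ == "__main__":
    # Huang River Basin summary statistics quoted in the text.
    huang = scenario_values(minimum=0.3039, average=0.3632)
    for name, value in huang.items():
        print(f"{name}: {value:.4f}")
```

With the 2000-2015 annual series for a basin, `scenario_values_from_series` would reproduce the same three values without precomputing the minimum and average.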