Downscaling of GRACE-Derived Groundwater Storage Based on the Random Forest Model

: Groundwater is an important part of water storage and one of the important sources of agricultural irrigation, urban living, and industrial water use. The recent launch of Gravity Recovery and Climate Experiment (GRACE) Satellite has provided a new way for studying large-scale water storage. The application of GRACE in local water resources has been greatly limited because of the coarse spatial resolution, and low temporal resolution. Therefore, it is of great signiﬁcance to improve the spatial resolution of groundwater storage for regional water management. Based on the method of random forest (RF), this study combined six hydrological variables, including precipitation, evapotranspiration, runo ﬀ , soil moisture, snow water equivalent, and canopy water to conduct downscaling study, aiming at downscaling the resolution of the total water storage and groundwater storage from 1 ◦ (110 km) and to 0.25 ◦ (approximately 25 km). The results showed that, from the perspective of long time series, the prediction results of the RF model are ideal in the whole research area and the observations wells area. From the perspective of space, the detailed changes of water storage could be captured in greater detail after downscaling. The veriﬁcation results show that, on the monthly scale and annual scale, the correlation between the downscaling results and the observation wells is 0.78 and 0.94, respectively, and they both reach the conﬁdence level of 0.01. Therefore, the RF downscaling model has great potential for predicting groundwater storage.


Introduction
As an important part of water resources, groundwater has a close relationship with human social life. Due to its good water quality, wide distribution, stable and ease to use, it has become an ideal water supply source and it is being widely exploited and utilized by people. In most arid and semi-arid regions of the world, groundwater is the main source of domestic water, agricultural irrigation, and industrial water for local residents. In recent years, with the growth of population and economy, the pressure of groundwater exploitation, and consumption has been greatly increased. The over-exploitation of groundwater storage has caused the groundwater level continue to decline, and the water quality has deteriorated, forming geological disasters, such as underground funnels and ground subsidence. Therefore, it has important significance for regional scale groundwater monitoring and management [1][2][3][4]. Traditional groundwater monitoring mainly relies on observation wells. It is not only costly, time-consuming, and also has sparse and uneven spatial distribution, which makes it difficult to capture the spatial details of groundwater changes and realize real-time monitoring [5]. Research on large-scale changes in water storage is constrained due to the lack of observational well data and the limitation of spatial extent [6]. In recent years, with the emergence of some new earth observation technologies (such as INSAR, GPS, and GRACE Gravity Satellites), they have the The research of many authors showed that machine learning methods are excellent in dealing with complex nonlinear problems and they can be well applied to downscaling studies of hydrological variables [32][33][34]. Groundwater storage (GWS) is an important part of the total water storage (TWS). Therefore, in this study, we propose a downscaling method that is based on machine learning, first down-scaling TWS, and then deducting surface water from downscaled TWS to obtain final downscaling GWS results. The TWS is a complex hydrological variable. From the perspective of vertical stratification, regional water storage is composed of regional canopy water, snow water equivalent, soil moisture, and groundwater reserves [35]. From the perspective of water balance, the change of TWS is a comprehensive reflection of activities, such as precipitation, evapotranspiration, and runoff [36]. Ning et al. [28] used the water balance equation of TWS to carry out the statistical downscaling method, which indicated that TWS is comprehensively affected by precipitation, evapotranspiration, runoff, and other hydrological factors. Yang et al. [37] used linear regression and machine learning models to reconstruct the total water storage anomalies in Northwest China during 1948-2002 while using GRACE and GLDAS products, indicating that TWS is closely related to soil moisture, snow water equivalent, and canopy water. Therefore, in this study, we selected six modeling factors, including precipitation (P), evapotranspiration (ET), runoff, soil moisture (SM), snow water equivalent (SWE) and canopy water (CW), and analyzed four kinds of machine learning methods: such as random forest (RF) [38], support vector machine (SVR) [39], artificial neural network (ANN) [40], and multiple linear regression (MLR) [41] to conduct a downscaling study on the GRACE-derived TWS and GWS in the long time series. When compared with other machine learning methods, the random forest model has the advantages of simultaneous input of multiple variables and it does not easy overfit in predicting regression [32]. However, there are few researchers that have studied downscaling GRACE data while using random forest model.
Since the 1970s, the region has entered a period of low rainfall, and the continuous drought has affected the distribution and utilization of water resources. At the same time, with the rapid development of industry and agriculture and the rapid growth of population, the shortage of water resources has become one of the important factors affecting the development of the region. In recent years, the region has overexploited the groundwater in order to meet the needs of life and production [42]. Therefore, the study of groundwater storage of the region has important practical significance and practical value. Here, we present a random forest downscaling method for improving the spatial resolution of GWS from 1 • to 0.25 • . It has important practical significance and practical value for the study of GWS in the study area, which can be used to study the water storage in the study area and monitor providing data support.

Study Area
The northeast of mainland China in the range of latitude and longitude between 34 • N-39 • N and 112 • E-117 • E was selected as the study area in this research, which includes Shanxi, Shandong, Henan, Hebei, Tianjin, Jiangsu, and Anhui provinces. The study area is densely populated and economically developed. It is one of China's political, economic, and cultural prosperity. At the same time, the region also includes important grain production bases in China [43]. The terrain is relatively flat, with a general trend of gradual decline from west to east, which belongs to the boundary between temperate monsoon climate and subtropical monsoon climate. Mean annual temperature is between 8-15°C and the mean precipitation is range from 500 to 600 mm. There is a prominent contradiction between water supply and demand in the study area [44,45]. Figure 1 shows the geographical location of the study area.

Data Source and Processing
There are mainly four sets of data in this study. The GRACE gravity satellite data provides the total water storage (TWS) and the groundwater storage (GWS), and the global land surface assimilation system (GLDAS) NOAH model data provides runoff, soil moisture, snow water equivalent, and canopy water data. The Global Land Evaporation Amsterdam Model (GLEAM) provides evapotranspiration data and TRMM 3B43 provides precipitation data. In the above data, the spatial resolution of GRACE data is 1°, while the other data all have a resolution of 0.25°, and the temporal resolution is monthly. The study period was 163 months from January 2003 to July 2016.

GRACE TWS
The GRACE satellite was launched on March 17, 2002, which is jointly implemented by NASA and DLR under the NASA Earth System Science Pathfinder Program [46,47]. It was designed to provide effective observation data for studying the changes in the Earth's mass through high-precision time-varying gravitational fields [48][49][50]. There are three institutions that provide GRACE satellite data sets based on different algorithms, which are CSR (Center for Space Research, the University of Texas), GFZ (GeoForschungsZentrum Potsdam), and JPL (Jet Propulsion Laboratory), respectively [49,50]. GRACE data is divided into five different levels: Level-0, Level-1A, Level-1B, Level-2, and Level-3 [27]. In this research, we used level-three data provided by CSR [24] in the form of grid data product, which has undergone a series of processing, such as gaussian smoothing, destriping filter, glacier isostatic adjustment (GIA), and the anomalies relative has been deducted by default to 2004 to 2009 time-mean baseline. The total water storage should be multiplied the scaling factors to restore much of the energy removed by processing [30,37]. Total water storage of this product has a high accuracy and it is suitable to be applied in the research area of this study [49,51].

TRMM Precipitation
NASA and the Japanese space exploration agency (NASDA) developed the Tropical Rainfall Measuring Mission (TRMM). It is designed to measure rainfall in tropical and subtropical regions. It is the world's first satellite equipped with rain measuring radar, which also carries microwave

Data Source and Processing
There are mainly four sets of data in this study. The GRACE gravity satellite data provides the total water storage (TWS) and the groundwater storage (GWS), and the global land surface assimilation system (GLDAS) NOAH model data provides runoff, soil moisture, snow water equivalent, and canopy water data. The Global Land Evaporation Amsterdam Model (GLEAM) provides evapotranspiration data and TRMM 3B43 provides precipitation data. In the above data, the spatial resolution of GRACE data is 1 • , while the other data all have a resolution of 0.25 • , and the temporal resolution is monthly. The study period was 163 months from January 2003 to July 2016.

GRACE TWS
The GRACE satellite was launched on 17 March 2002, which is jointly implemented by NASA and DLR under the NASA Earth System Science Pathfinder Program [46,47]. It was designed to provide effective observation data for studying the changes in the Earth's mass through high-precision time-varying gravitational fields [48][49][50]. There are three institutions that provide GRACE satellite data sets based on different algorithms, which are CSR (Center for Space Research, the University of Texas), GFZ (GeoForschungsZentrum Potsdam), and JPL (Jet Propulsion Laboratory), respectively [49,50]. GRACE data is divided into five different levels: Level-0, Level-1A, Level-1B, Level-2, and Level-3 [27]. In this research, we used level-three data provided by CSR [24] in the form of grid data product, which has undergone a series of processing, such as gaussian smoothing, destriping filter, glacier isostatic adjustment (GIA), and the anomalies relative has been deducted by default to 2004 to 2009 time-mean baseline. The total water storage should be multiplied the scaling factors to restore much of the energy removed by processing [30,37]. Total water storage of this product has a high accuracy and it is suitable to be applied in the research area of this study [49,51].

TRMM Precipitation
NASA and the Japanese space exploration agency (NASDA) developed the Tropical Rainfall Measuring Mission (TRMM). It is designed to measure rainfall in tropical and subtropical regions. It is the world's first satellite equipped with rain measuring radar, which also carries microwave imager, visible light and infrared scanner, cloud and earth radiation energy system, and lightning imaging sensor [52]. This experiment adopted the TRMM 3B43 data products, and the data space coverage is 50 • N-50 • S in latitude, 180 • W, and 180 • E in longitude. The precipitation data were retrieved from Global Land Evaporation Amsterdam Model [53].  [54,55]. It is a set of algorithms that separately estimate the different components of land evaporation. The algorithm calculates five components of evapotranspiration (ET), including vegetation transpiration (ET), canopy interception evaporation (Ei), bare soil evaporation (Eb), snow sublimation evaporation (Es), and water surface evaporation (Ew). The spatial resolution is 0.25 • × 0.25 • , and the temporal resolution is divided into three types: day, month, and year, with the time span from 1980 to now. This study we select the monthly product data of v3.3a version, and this data were obtained from Global Land Evaporation Amsterdam Model [56].

GLDAS Data
The Goddard Space Flight Center (GSFC) provides Global Land Data Assimilation Systems (GLDAS) [57]. The GLDAS land surface assimilation system includes three land surface process models (CLM, Mosaic, NOAH) and a hydrological model (VIC), and it mainly includes two versions of data sets (GLDAS-1 and GLDAS-2). Among them, GLDAS-1 provides four sets of models to simulate global runoff, evapotranspiration, soil moisture, snow water equivalent, and canopy water, and other hydrological variables. The spatial resolution is 1 • × 1 • , the time resolution has 3 h and monthly scale, and the data set of NOAH model from 2000 to now is 0.25 • × 0.25 • . While GLDAS-2 only has NOAH model, provided data from 1948 to 2010, with spatial resolution of 1 • × 1 • and 0.25 • × 0.25 • . In this study, we mainly use the data of GLDAS-1 NOAH model with the 0.25 • × 0.25 • spatial resolution, which can be found in the Goddard Earth Sciences Data and Information Services Center (GES DISC) [58].

Ground-Based Measurements
In this study, the monthly groundwater level data of 57 observation wells in Handan, Hebei province from January 2006 to December 2015 were collected, which was provided by the groundwater monitoring bureau of Handan. These observation wells are densely distributed in Handan, so they can accurately verify the changes of underground water level in this area. Figure 2 shows the locations of observation wells.

Random Forest Model
Breiman first proposed the Random Forest (RF) model in 2001 [38] as a new integrated machine learning method consisting of multiple classification and regression decision trees CART (Classification and Regression Trees) combination, which is mainly applied to regression and classification. The steps for modeling are as follows: firstly, in the original input data, the Bootstrap method was used to randomly extract multiple samples for training the model, and the unextracted data were used as data out of the bag to test the accuracy of the model. Subsequently, a decision tree is constructed for each training sample, and all of the sample decision trees are combined into random forest model. The RF results are the average of the prediction results of each decision tree. With its advantages in regression and classification, the RF model has been widely used in remote  and Regression Trees) combination, which is mainly applied to regression and classification. The steps for modeling are as follows: firstly, in the original input data, the Bootstrap method was used to randomly extract multiple samples for training the model, and the unextracted data were used as data out of the bag to test the accuracy of the model. Subsequently, a decision tree is constructed for each training sample, and all of the sample decision trees are combined into random forest model. The RF results are the average of the prediction results of each decision tree. With its advantages in regression and classification, the RF model has been widely used in remote sensing and hydrology [32,59,60]. For a regression problem, the RF model has some unique advantages, as following: (1) it can input thousands of variables and it is not easy to overfitting; (2) it has high accuracy and it can run efficiently on large datasets; (3) it can detect the interaction and importance of variables; and, (4) it provided an effective method of estimating missing data and maintaining accuracy. The RF model has been widely used in established regression problem because of the above advantages, and good prediction results have been obtained [32].

Support Vector Machine Model
Vapnik originally proposed the Support Vector Machine (SVM) model [39] as a classification machine learning algorithm, which was later derived from regression algorithms. The theoretical basis is Vapnik Chervonenks Dimension (VC) and Structural Risk Minimized Inductive Principle. The optimization based on training error and minimizing the confidence range value can effectively solve the nonlinear regression problem. Jing et al. [56] described the details of the SVM method.

Artificial Neural Network
The artificial neural network (ANN) model [40] is a non-linear information processing system that is based on the abstraction, simplification, and simulation of the human brain, which is widely connected by a large number of neurons. ANN can realize multi-dimensional nonlinear precise mapping between the input layer and the output layer without setting up any mathematical calculation model between the two. In this study, we used a multi-layer feed-forward perceptron (MLP). A typical MLP consists of an input layer, an output layer, and one or more hidden layers. Each layer of MLP includes one or more neurons that are directionally linked to the neurons of the previous layer and the next layer.

Multiple Linear Regression
In a multi-factor geographic system, multiple (more than two) elements also have an interactive and interrelated linear regression relationship, which is called multiple linear regression (MLR). Nie et al. [61] found that some of the hydrological variables of GLDAS had a good linear relationship with the GRACE-derived TWS. Hence, in this study, we tried to use the MLR approach to compare the quality of the machine learning methods.
It is appropriate to select the random forest mode, given that the total storage is a complex hydrological variable, which is correlated with multiple variables, and that this study has a large number of training data.

Downscaling Model Design
Random forest downscaling is a nonlinear statistical downscaling method. In this paper, we choose random forest model to downscale the TWS and GWS from 1 • to 0.25 • . The basic idea of th random forest downscaling model established the statistical relationship between TWS and hydrological variables (precipitation, evapotranspiration, runoff, soil moisture, snow water equivalent, and canopy water). We used the TWS as the dependent variable, and the hydrological variables as the independent variables. First, we established the relationship between TWS and hydrological variables at a spatial resolution of 1 • and then applied it to the hydrological variables at a spatial resolution of 0.25 • to obtain high resolution TWS. Figure 3 shows the flowchart of the downscaling process. choose random forest model to downscale the TWS and GWS from 1° to 0.25°. The basic idea of th random forest downscaling model established the statistical relationship between TWS and hydrological variables (precipitation, evapotranspiration, runoff, soil moisture, snow water equivalent, and canopy water). We used the TWS as the dependent variable, and the hydrological variables as the independent variables. First, we established the relationship between TWS and hydrological variables at a spatial resolution of 1° and then applied it to the hydrological variables at a spatial resolution of 0.25° to obtain high resolution TWS. Figure 3 shows the flowchart of the downscaling process.

Data Comparison and Error Analysis
In this paper, four indices are used to evaluate the downscaling TWS result. The testing data and its model prediction results are involved in the calculation. Correlation coefficient R, root mean square error (RMSE), Nash efficiency coefficient (NSE), and mean absolute error (MAE) are the four evaluation indices. The detailed description and calculation formula are as follows:

Correlation Coefficient (R)
The correlation coefficient refers to the degree of linear relationship between two variables. For the model construction, the higher the correlation coefficient R between validation data and simulation data, the better the model accuracy.
where Xi and Yi represent two independent data sets with the mean value of X and Y. Xi represents the input total water storage, Yi represents the simulated value of RF model, and N is the total number of samples.

Root Mean Squared Error (RMSE)
Root-mean-square error is also called the standard error and it is the deviation between the real value and the predicted value of the model. The smaller the value, the closer the predicted value is to the real value, that is, the smaller the RMSE is, the higher the accuracy of the model.

Nash-Sutcliffe Efficiency Coefficient (NSE)
The Nash efficiency coefficient is generally used to test the estimated results of hydrological models, and in this paper, we used to test the accuracy of the random forest model. The value range of NSE is (−∞, 1). The closer NSE is to 1, the better the modeling accuracy and the higher the reliability of the model. The closer NSE is to 0, the closer the predicted value is to the average value of the real value, which means that the overall result is credible, but the process simulation error is relatively large. If NSE is much less than 0, then the model is less credible.

Mean Absolute Error (MAE)
The mean absolute error represents the average of the absolute value of the deviation of all single simulated values from the arithmetic mean, which can avoid the problem of the error canceling each other, so it can accurately reflect the actual prediction error.

Significance Test
In this study, we used the t-test as the significance test of the correlation between two data sets. At first, we should compute the correlation coefficient (R) of the time series and construct the hypothesis test. Finding a statistics book that has a table of critical values of R is the easiest way to test this hypothesis. The specified Significance level a is usually set at 0.05 (5%) or 0.01 (1%). We compare the threshold of correlation coefficient with the calculated correlation coefficient. The results can be divided into three situations: (1) |R| < R 0.05 , v means that the correlation is not significant; (2) when R 0.05 , v≤|, R| < R 0.01 , v represents the level of correlation significant α = 0.05; and, (3) when |R|≥R 0.01 , v represents a level of significant correlation α = 0.01.

Accuracy Estimation of Random Forest Model
The original input data was divided into two parts by the method of random secondary sampling, in which 70% of the training data and the remaining 30% were taken as verification data, in order to verify the rationality of the random forest model. With the above six variables as input, four machine Remote Sens. 2019, 11, 2979 9 of 19 learning prediction models were established to predict the TWS. Figure 4 shows the comparison between the predicted results and the original GRACE-derived TWS.

Accuracy Estimation of Random Forest Model
The original input data was divided into two parts by the method of random secondary sampling, in which 70% of the training data and the remaining 30% were taken as verification data, in order to verify the rationality of the random forest model. With the above six variables as input, four machine learning prediction models were established to predict the TWS. Figure 4 shows the comparison between the predicted results and the original GRACE-derived TWS. In Figure 4, the x-coordinate represents the change value of GRACE-derived TWS in the verified data, and the y-coordinate represents the change value of predicted TWS after introducing the independent variable value of the verified data in the random forest model. From the scatter plots in Figure 4, the RF model has the highest correlation of 0.83 and the lowest RMSE of 53.75 mm, respectively, followed by the ANN model with the correlation coefficient of 0.81, slightly lower than the RF model. The results of the support vector machine and multiple linear regression are slightly worse. From Table 1, the indicators of random forests are superior to other models, followed by artificial neural networks, and the results of SVR and MLR are poor. From the above results, it can be found the RF downscaling model is the best, which can meet the requirements of variable prediction. The predicted results, together with the residuals, can accurately predict high-resolution changes in TWS and GWS. Therefore, we only analyze the RF results in the following discussion.

Time Series and Spatial Accuracy
The change trend of TWS and GWS in the study area is almost the same, and they are in a state of decline on the whole, which is consistent with the fact that GWS is the main component of TWS, which indicated that the results of TWS and GWS retrieved by GRACE are reliable. Figure 5 is the time series variation diagram of TWS GWS before and after downscaling, with spatial resolutions of 1 • and 0.25 • , respectively. It can be seen from the Figure 5 that the variation amplitude and value of TWS and GWS in the study area before and after downscaling are consistent, which indicated that the downscaling method based on random forest is reasonable in this study. Figure 5a shows the change of TWS in the whole study area before and after downscaling, with the correlation coefficient R 2 reaching 0.98. The fitting effect is the best. Figure 5b shows the change of GWS in the whole study area before and after downscaling, with the correlation coefficient R 2 reaching 0.97. Figure 5c shows the changes of TWS in the verified area in Handan, where there are 57 observations wells before and after downscaling, with the correlation coefficient R 2 reaching 0.96. Figure 5d shows the changes of GWS in the verified area in Handan, before and after downscaling, with the correlation coefficient R 2 reaching 0.93. In general, the downscaling result of TWS is better than that of GWS, which might be related to the direct downscaling result of TWS, while GWS are indirectly calculated. However, the downscaling results of random forests are credible from both the whole region and the local region. The spatial resolution after downscaling is higher and the spatial variation can be described in more detail. We compared the spatial distribution of TWS and GWS before and after downscaling in order to analyze the consistency of spatial distribution before and after downscaling of the random forest model. As shown in Figure 6, we take the changes in TWS and GWS in February 2003 as an example. From the perspective of spatial variation, the TWS and GWS maintain good consistency both before and after downscaling. Among them, the TWS shows a decreasing trend from north to south. The region with the largest water storage is between 38° N and 39° N, and both of the images are We compared the spatial distribution of TWS and GWS before and after downscaling in order to analyze the consistency of spatial distribution before and after downscaling of the random forest model. As shown in Figure 6, we take the changes in TWS and GWS in February 2003 as an example. From the perspective of spatial variation, the TWS and GWS maintain good consistency both before and after downscaling. Among them, the TWS shows a decreasing trend from north to south. The region with the largest water storage is between 38 • N and 39 • N, and both of the images are obvious. The change of the maximum value before and after downscaling indicates that the downscaling TWS can more carefully depict their spatial changes. Similarly, the spatial variation trend of groundwater before and after downscaling is highly consistent. According to the above results, the downscaling method that is based on random forest model is effective. We once again compared the interannual spatial variation rules of TWS and GWS, taking 2014 as an example to further illustrate the reliability of downscaling results, as shown in Figure 7. In 2014, the TWS and GWS in the study area showed similar changes on the whole in space, but there were also certain differences. It is embodied in the region from 34° N to 35° N, which might be related to the change of soil moisture water in this region. However, the spatial changes before and after downscaling are basically the same. It shows that the downscaling model that is constructed by random forest method is feasible, which can not only predict the change of water storage in the study area, but also depict the change trend of water storage with higher spatial resolution. We once again compared the interannual spatial variation rules of TWS and GWS, taking 2014 as an example to further illustrate the reliability of downscaling results, as shown in Figure 7. In 2014, the TWS and GWS in the study area showed similar changes on the whole in space, but there were also certain differences. It is embodied in the region from 34 • N to 35 • N, which might be related to the change of soil moisture water in this region. However, the spatial changes before and after downscaling are basically the same. It shows that the downscaling model that is constructed by random forest method is feasible, which can not only predict the change of water storage in the study area, but also depict the change trend of water storage with higher spatial resolution.

Testing and Spatial Accuracy
We compared the measured water level with the GRACE-derived GWS and the downscaling GWS in order to further explore the reliability of downscaling results. With the correlation coefficient R as the criterion, the higher R is, the more accurate the result of downscaling. Figure 8, and 9 show the correlation between the measured water level and GWS in Handan area. The verified data are, respectively, the average of 57 measured sites on the monthly scale, the average of the GRACE-derived GWS, and the downscaling GWS in the grid (as shown in Figure 6 the inner region of the red rectangle). The results show that, on the monthly scale, the correlation between downscaling results and measured water level is 0.78, and on the annual scale is 0.94, both of which reach the confidence level of 0.01. The correlation coefficients between GRACE-derived GWS and measured water level are 0.79 and 0.96, respectively. Therefore, the random forest downscaling model not only improves the spatial resolution of data, but also ensures the accuracy of data.

Testing and Spatial Accuracy
We compared the measured water level with the GRACE-derived GWS and the downscaling GWS in order to further explore the reliability of downscaling results. With the correlation coefficient R as the criterion, the higher R is, the more accurate the result of downscaling. Figures 8 and 9 show the correlation between the measured water level and GWS in Handan area. The verified data are, respectively, the average of 57 measured sites on the monthly scale, the average of the GRACE-derived GWS, and the downscaling GWS in the grid (as shown in Figure 6 the inner region of the red rectangle). The results show that, on the monthly scale, the correlation between downscaling results and measured water level is 0.78, and on the annual scale is 0.94, both of which reach the confidence level of 0.01. The correlation coefficients between GRACE-derived GWS and measured water level are 0.79 and 0.96, respectively. Therefore, the random forest downscaling model not only improves the spatial resolution of data, but also ensures the accuracy of data. Remote Sens. 2019, 11, x FOR PEER REVIEW 14 of 20

Discussion
Overall, high-resolution hydrological products are a hot topic in future research. We have downscaled GWS from 1° to 0.25° based on the downscaling model of random forest, which provided a possibility for further research on GRACE data. The results showed that the RF

Discussion
Overall, high-resolution hydrological products are a hot topic in future research. We have downscaled GWS from 1 • to 0.25 • based on the downscaling model of random forest, which provided a possibility for further research on GRACE data. The results showed that the RF downscaling model could be well applied to the study area. The verification results of groundwater level data of only 57 observation wells were used in this study due to the limitation of the measured data. The observation wells were distributed in two grids, weakening the effect of spatial heterogeneity and making the verification results more reliable. If we could get more measured data, then the results might be more reliable. Handan is located in the southern part of Hebei Province, and its topography descends stepwise from west to east, and the landform types are complex and diverse. It belongs to the East Asian monsoon climate zone and it is a continental monsoon climate that transitions from warm temperate semi-humid regions to semi-arid regions. From the verification results, the RF downscaling model is suitable for continental monsoon climate zone, but whether this method is applicable to other climatic regions needs further study. However, in this study, we have provided suitability of RF method in downscaling TWS and GWS, which could be used in other regions.
The use of satellite data can improve the applicability of data-scarce areas with limited hydrological variables, such as groundwater observation well data with uneven spatial and temporal distribution. However, these data sources also bring significant uncertainty. For example, GRACE data has its own uncertainty in data processing, and other input data has contributed to uncertainty and potential error propagation of the total input data [30]. In this research, we chose CSR level-three data products. We multiplied scaling coefficients at 1 • location to restore much of the energy removed by the destriping, gaussian, and degree 60 filters to the land grids, but it may be uncertain in hydrology models at inter-annual. In subsequent research, readers can choose different data, such as using an advanced mass concentration ('mascon') data [63] and comparing the results of multiple data to reduce data uncertainty. At the same time, unavoidable systematic uncertainty in the training models of machine learning still remains. We compare the results of several common machine learning models (e.g., RF model, MLR model, SVR model, and ANN model). From the results, it can be found that the result of RF model is best. It shows that the random forest model is not easy overfitting and noise immunity ability is better than other machine learning models. When compared with other machine learning models, the RF downscaling model mitigated these errors, because it can handle errors associated with input data.
The advantage of the RF downscaling model is that we can simultaneously input multiple variables, which increases the flexibility of the model. At the same time, the input variables are independent of each other in the model and we can select different variable factors of the different study area. From the view of regional water management, they pay more attention to water storage at small regional scales than the water storage of GRACE-derived, and the downscaling random forest model can effectively solve the problem of current coarse resolution data, which provide more detailed information for water decision makers. The results of random forests are the best in the four machine learning downscaling models, followed by artificial neural networks. The worst result of the support vector machine might be that the data anti-noise effect of the model is not as good as the firs at two. Therefore, the reader can further experiment in the RF model and the ANN model; at the same time, I think that the method can be improved in the following aspects in the future study: (1) Syed et al. [12] pointed out that the zonal average indicates that precipitation in the tropics dominates the TWSC, evapotranspiration is the most effective in the mid-latitudes, and snowmelt runoff is the main dissipation flux in the high latitudes. Therefore, in subsequent experiments, according to the latitude of the study area, remote sensing products with higher precision and finer resolution can be selected for research, and the impact of local economic and demographic conditions on water reserves needs to be properly considered. (2) From the global perspective, it can be divided into arid regions, semi-arid regions, and humid regions. According to the different characteristics of the study area, other variables, such as temperature, NDVI, DEM, and so on, can be appropriately added. (3) For regional scale research, we can select higher resolution remote sensing products to meet the requirements of regional water managers.

Conclusions
With the launch and application of GRACE gravity satellite, it provides a new research method for exploring large-scale groundwater changes. However, with the progress of human society, the change of low-resolution water storage is increasingly difficult to meet people's requirements and demand. The need for detailed changes in groundwater is becoming more and more urgent. In this study, we tried to use four common machine learning methods to conduct downscaling study. From the results, the RF downscaling model has the highest accuracy, and the ANN model is second. Therefore, we proposed a downscaling method that was based on random forest to comprehensively consider the influence of multiple hydrological variables on TWS. We successfully downscaled the TWS from 1 • to 0.25 • and calculated the high-resolution GWS.
From the perspective of long time series, this downscaling method could predict the changes of regional water storage well. In both the whole study region and in the small local area, the prediction results of the model are ideal. The error downscaling result is very small, and the amplitude and phase of water storage before and after downscaling is consistent. From the perspective of spatial variation, the downscaling data can effectively capture the subgrid heterogeneity of TWS and GWS, which cannot be realized in the original scale. In addition, the method has a low cost and it can be easily used to detect changes in local and small water storage.
From the verification results of measured water level, the measured groundwater level data is in good agreement with the downscaling groundwater data. On the monthly and annual scales, the correlation coefficients reached 0.78 and 0.93, respectively, and they passed the confidence level of 0.01, where it is 0.79 and 0.96 before downscaling. It is shown that the method can not only improve the spatial resolution of the data effectively, but also ensure the accuracy of the data.