Improving the Spatial Resolution of GRACE-Derived Terrestrial Water Storage Changes in Small Areas Using the Machine Learning Spatial Downscaling Method

: Gravity Recovery and Climate Experiment (GRACE) satellites can effectively monitor terrestrial water storage (TWS) changes in large-scale areas. However, due to the coarse resolution of GRACE products, there is still a large number of deﬁciencies that need to be considered when investigating TWS changes in small-scale areas. Hence, it is necessary to downscale the GRACE products with a coarse resolution. First, in order to solve this problem, the present study employs modeling windows of different sizes (Window Size, WS) combined with multiple machine learning algorithms to develop a new machine learning spatial downscaling method (MLSDM) in the spatial dimension. Second, The MLSDM is used to improve the spatial resolution of GRACE observations from 0.5 ◦ to 0.25 ◦ , which is applied to Guantao County. The present study has veriﬁed the downscaling accuracy of the model developed through the combination of WS3, WS5, WS7, and WS9 and jointed with Random Forest (RF), Extra Tree Regressor (ETR), Adaptive Boosting Regressor (ABR), and Gradient Boosting Regressor (GBR) algorithms. The analysis shows that the accuracy of each combined model is improved after adding the residuals to the high-resolution downscaled results. In each modeling window, the accuracy of RF is better than that of ETR, ABR, and GBR. Additionally, compared to the changes in the TWS time series that are derived by the model before and after downscaling, the results indicate that the downscaling accuracy of WS5 is slightly more superior compared to that of WS3, WS7, and WS9. Third, the spatial resolution of the GRACE data was increased from 0.5 ◦ to 0.05 ◦ by integrating the WS5 and RF algorithm. The results are as follows: (1) The TWS (GWS) changes before and after downscaling are consistent, decreasing at − 20.86 mm/yr and − 21.79 mm/yr ( − 14.53 mm/yr and − 15.46 mm/yr), respectively, and the Nash–Sutcliffe efﬁciency coefﬁcient ( NSE ) and correlation coefﬁcient ( CC ) values of both are above 0.99 (0.98). (2) The CC between the 80% deep groundwater well data and the downscaled GWS changes are above 0.70. Overall, the MLSDM can not only effectively improve the spatial resolution of GRACE products but also can preserve the spatial distribution of the original signal, which can provide a reference scheme for research focusing on the downscaling of GRACE products.


Introduction
Water resources are some of the most indispensable material resources for human survival and social development.As the main component of freshwater resources, groundwater is essential for industrial production, agricultural irrigation, and domestic water [1,2].Affected by climate change, groundwater overexploitation, and lack of effective replenishment, groundwater has been severely depleted in many regions [3][4][5].Therefore, an accurate estimation of groundwater storage (GWS) changes is necessary for the effective management of water resources and to be able to ensure food security.In particular, irregular changes in factors such as the global surface temperature, rainfall, and evapotranspiration affect water storage balance, leading to droughts, floods, and other disasters [6,7].This requires some reliable methods that can be used to monitor changes in terrestrial water storage (TWS) and GWS.However, it takes a lot of manpower and material resources to monitor GWS changes through traditional methods such as large-scale monitoring networks.In addition, the use of sparsely and unevenly distributed groundwater observation wells makes it difficult to continuously estimate GWS changes in large-scale areas [8].Hence, many attempts have been made to improve the spatiotemporal characteristics of GWS changes in recent years, such as hydrological models and satellite remote sensing.
The GRACE satellite, which was successfully launched in 2002, provides an unprecedented approach for the time-continuous, large-scale regional monitoring of TWS changes [9].Previous studies have demonstrated that GRACE observations can estimate mass changes with an accuracy of 1 to 1.5 cm equivalent water height (EWH) over large-scale areas (~20,000 km 2 ) [10][11][12].At present, GRACE products have important applications in the study of global TWS changes [13,14], droughts and floods [15,16], glacier melting [17,18], and so on.GRACE-derived TWS changes contain different water storage components, e.g., soil moisture, groundwater, snow mass, and canopy water.According to the water storage balance equation, the corresponding components are subtracted from the TWS to estimate regional GWS changes [19,20].Through the integration of hydrological simulations, the GRACE satellites have been widely used to analyze the TWS and GWS changes in large-scale areas, such as the Amazon basin [21,22], the Yangtze River basin [23,24], northwestern India [25][26][27], the North China Plain [4,28,29], and California's Central Valley [5,[30][31][32].Nevertheless, it is challenging to employ GRACE data to profoundly investigate TWS changes in small-scale areas due to the coarse image resolution.Therefore, it is essential to obtain high-resolution GRACE data to analyze changes in terrestrial water and groundwater storage in small-scale areas.
At present, the downscaling of coarse-resolution products is an effective approach that can be used to obtain high-resolution products and mainly include dynamical and statistical downscaling [33,34].The computational process of dynamical downscaling is complicated and requires extensive computation times and resources [35].Therefore, it is difficult to implement under normal working conditions, limiting the promotion of this method.Correspondingly, the statistical downscaling method establishes regression models between low-resolution target data and variable data and then inputs high-resolution variable data into the regression model to output high-resolution target data [36].The statistical downscaling method is easy to implement, and the downscaled results can satisfy the accuracy requirements.Therefore, this method has been widely used to obtain highresolution hydrological and climate data, such as those pertaining to soil moisture [37][38][39], precipitation [40][41][42], temperature [40,43], and evapotranspiration [44,45].
Many studies attempting to obtain high-resolution GRACE data based on statistical downscaling methods have been conducted.Several earlier studies downscaled GRACE data in different regions through empirical regression relationships and achieved reasonable performance [46][47][48].For example, Ning et al. (2014) improved the spatial resolution of GRCAE products in Yunnan Province to 0.25 • through the statistical regression downscaling method [47].Specifically, because machine learning algorithms can solve nonlinear regression problems and can simulate complex hydrological processes, these methods are widely utilized in GRACE data downscaling research.For instance, Miro et al. (2018) estimated high-resolution GWS changes from the groundwater basins in the San Joaquin Valley in California using an artificial neural network (ANN) [49].Seyoum et al. (2019) trained a Boosted Regression Tree (BRT) model to estimate groundwater storage anomalies (GWSA) in a ~150,000 km 2 Glacial Aquifer System in the state of Illinois based on terrestrial water storage anomalies (TWSA) and other land surface and hydro-climatic variables [50].Milewski et al. (2019) downscaled GRACE TWSA products based on the BRT algorithm and obtained groundwater level anomaly maps at a 5 km resolution over the Upper Floridan Aquifer in Georgia [51].Sahour et al. (2020) introduced three machine learning algorithms to downscale TWSA across the Lower Peninsula of Michigan from a coarse-scale (13,700-33,100 km 2 ) resolution to a fine-scale (0.125 • × 0.125 • or 120 km 2 ) resolution [52].
Currently, several studies have established downscaling models in terms of temporal dimensions, which are mainly used to analyze large-scale regional TWS changes.However, few studies have established downscaling models in spatial dimensions and applied them to analyze TWS changes in small-scale regions.Unlike previous studies, the new machine learning spatial downscaling method (MLSDM) proposes the cross-combining of multiple machine learning algorithms with differently sized modeling windows in spatial dimensions; the algorithm and the size of the modeling windows are optimized according to the training results.Moreover, this study analyzes the consistency of TWS changes before and after downscaling using root mean square error (RMSE), mean absolute error (MAE), the Nash-Sutcliffe efficiency coefficient (NSE), and the correlation coefficient (CC).Finally, the in situ groundwater well observations are used to analyze the reliability of the downscaled GWSA, providing a certain reference basis for regional groundwater management.

The MLSDM for Spatial Downscaling
Generally, a great deal of research focusing on GRACE data downscaling aim to establish the downscaling models of each grid in the temporal dimension (Equation (1)).First, this method establishes the regression relationship between the TWSA and hydrological variables in a specific grid (such as the red grid) (Figure 1a).Then, it gradually establishes the downscaling models of all of the grids through iteration.In this method, the signals that are produced before and after downscaling have good consistency in the spatial distribution.Still, the downscaled signal has a relatively obvious boundary between the grids and does not conform to the characteristics of continuously changing hydrological signals.Hence, some studies have tried to establish downscaling models for GRACE data in the spatial dimension.Figure 1b shows that this method first establishes the regression relationship between the TWSA and hydrological variables in a specific month (such as M1) (Equation ( 1)).It gradually establishes downscaling models for all of the monthly data through iteration.However, these studies establish spatial downscaling models for the entire region and lack the analysis of the impact of the size of the modeling window on the downscaling accuracy.
This study establishes downscaling models that are based on the spatial dimension and discusses the influence of the size of the modelling window on the downscaling accuracy of the GRACE data, as shown in Figure 1c and Equation (2).In addition, the accuracy of the RF, ETR, ABR, and GBR algorithms for GRACE data downscaling has also been analyzed and compared.The TWS changes are closely related to the hydrological changes in the surrounding area.Hence, an in-depth study on how large an area to establish spatial dimensional downscaling models to better detect the hydrological change process in small-scale areas needs to be is worthy of attention.According to the relationship between TWSA and the hydrological variables, some studies establish downscaling models in the temporal dimension or in the spatial dimension.The equation is as follows: . . .

Study Area
Guantao County and its surrounding region (115 • 06 -115 • 40 E, 36 • 27 -36 • 47 N) is located in the southern Hebei Province, China, and has an area of approximately 456 km 2 [53], as shown in Figure 2. The terrain of the study area slopes from the southwest to the northeast and is generally flat, with an elevation of about 43 m in the south and 36 m in the north (Figure 2).Additionally, Guantao County is in a typical warm-temperate semi-humid continental monsoon climate area that receives plenty of sunshine.The annual average temperature is 14 • C, and the average temperatures of the coldest month (January) and the hottest month (July) are −2.5 • C and 27 • C, respectively [53].Due to the lack of rational utilization and effective management of water resources, the groundwater in Guantao County is seriously over-exploited.Generally, the depletion of groundwater has induced a series of ecological and geological problems (e.g., land subsidence, ground fissures, soil salinization) that affect the sustainable economic and social development of the region.Thus, it is crucial to analyze water resources storage changes in Guantao County.

Materials
The research data used in this study include CSR RL06 Mascon, GLDAS, MODIS, TEMP, and in situ observations, as shown in Table 1.Previous studies have revealed that these climate variables are closely related to TWS changes [50,52].

GRACE Solutions
The GRACE satellites were jointly developed by the National Aeronautics and Space Administration (NASA) and Deutsches Zentrum für Luft-und Raumfahrt (DLR).They were successfully launched in March 2002 and ended their mission in October 2017.Currently, GRACE solutions are mainly divided into two categories: Spherical Harmonic Coefficient (SHC) and Mass Concentration solutions (Mascon).The high-degree and high-order coefficients of SHC products are dominated by high-frequency noise and correlated errors in the north-south direction.Therefore, when applying SHC to estimate global TWS changes, data post-processing such as filtering and de-correlated error algorithms are required [54,55].Since pre-processing methods, such as degree 1 corrections, C20 (degree 2 order 0) replacement, and the glacial isostatic adjustment (GIA) correction, have been applied to Mascon solutions, post-processing is no longer required when using them.For more detailed information, please refer to the literature [56].Due to the regularization constraints, Mascon solutions have a higher signal-to-noise ratio than SHC products [57,58].
Hence, this study will analyze the regional TWS changes based on the CSR RL06 Mascon (CSR-M06) that was released by the Center for Space Research (CSR).CSR-M06 has a spatiotemporal resolution of 0.25 • × 0.25 • , and the monthly average values from January 2004 to December 2009 have been deducted.

GLDAS Model
The Global Land Data Assimilation System (GLDAS) was jointly developed by the Goddard Space Flight Center (GSFC) and the National Centers for Environmental Prediction (NCEP) [59].Using satellite and surface observation data as the primary material, GLDAS outputs global surface state and flux data based on advanced land surface models and data assimilation methods.Currently, GLDAS products include four land surface models: Noah, Mosaic, CLM, and VIC.This study applies the soil moisture from the Noah, which has spatial resolutions of 0.25 • × 0.25 • and temporal resolutions of one month, including four different soil moisture depths (SM) (i.e., 0~10 cm, 10~40 cm, 40~100 cm, and 100~200 cm).Since the spatial resolution cannot satisfy the requirements, we resampled the resolution of the GLDAS products to 0.05 • .

MODIS Data
The Moderate-resolution Imaging Spectroradiometer (MODIS), which is mounted on the Terra and Aqua satellites, is an essential instrument in the US Earth Observation System (EOS), and it is mainly used to detect the global biological and physical processes.Based on the Penman-Monteith equation and Mu's improved ET algorithm, the MODIS-MOD16 product uses daily meteorological reanalysis data, land cover, albedo, leaf area index, and enhanced vegetation index as input variables to calculate global land surface evaporation [60][61][62].The datasets include global evapotranspiration (ET), latent heat flux (LE), potential evapotranspiration (PET), and potential latent heat flux (PLE).The spatial resolution of the MODIS-MOD16 product is 500 m, and its temporal resolution is eight days (MOD16A2) and one year (MOD16A3).This study extracted ET components from the MOD16A2 products, and the 8-day results were averaged to obtain monthly temporal resolution products.The day and night land surface temperature (LST_day and LST_night) with a spatial resolution of 0.05 • × 0.05 • (~5.6 × 5.6 km) and the temporal resolution of one month were obtained from the MODIS-MOD11C3 product (version 6) [63].

TEMP Data
The TEMP is China's monthly average temperature dataset that is released at the National Qinghai-Tibet Plateau Science Data Center (TPDC) and has a spatial resolution of about 1 km and a period from January 1901 to December 2017.According to the global 0.5 • climate data released by CRU and the global high-resolution climate products released by WorldClim, this dataset was downscaled in China through the Delta spatial downscaling method.The observation data of 496 independent weather stations were applied to verify the dataset, which performed well [64,65].

In Situ Observations
The measured data from 21 groundwater wells were collected from the Guantao County Groundwater Monitoring and Management Department to verify the high-resolution GWS results after downscaling.These include shallow groundwater wells and deep groundwater wells.In Figure 2, the black dots show the spatial distribution of the groundwater wells.The groundwater level (GWL) changes can be obtained by subtracting the groundwater depth from the elevation of the groundwater well.The groundwater level anomalies (GWLA) can be obtained by subtracting its own average value from the GWL changes, which can be compared with the GRACE-derived GWSA.

Machine Learning Algorithms
The ensemble learning algorithm, a research hotspot of machine learning, integrates the training results of multiple "weak learners" to form a "strong learner".This can improve the generalization ability of the model, thereby improving the prediction accuracy [66,67].The ensemble learning algorithm has higher accuracy, robustness, and flexibility than the single learning algorithm, and it mainly includes the bagging and boosting algorithm.
The idea of bagging is to form sub-training samples by extracting data from the training dataset according to a specified ratio, and the training results of the sub-training samples are averaged as the final prediction result.Random Forest (RF) [68] and Extra Trees Regressor (ETR) [69] are the representative bagging algorithms.RF, which was first proposed by Breiman in 2001, is an ensemble machine learning algorithm that has demonstrated superior performance [68] and is a popular tool for classification and regression.The randomness in the RF is mainly manifested in two aspects: on the one hand, the same amount of samples is randomly selected from the original training data as training samples; on the other hand, partial features are randomly selected to construct decision trees.This randomness creates a low correlation between the different decision trees, which can improve the robustness and accuracy of the model [68].ETR, which was developed based on the RF, is a relatively novel ensemble machine learning algorithm.Compared to RF, the ETR splits the descriptors at the node entirely randomly, and each tree is trained with the entire dataset instead of sampling.
Unlike the bagging, boosting is an iterative algorithm that trains the same training samples multiple times.The next iteration adjusts the sample weight according to the previous training dataset and outputs the final result after meeting the requirements.The Adaptive Boosting Regressor (ABR) [70] is a representative boosting algorithm.The ABR algorithm first assigns equal weight to each sample, and the next iteration uses the error of the previous weak learner to update the weight of the sample.Thus, the iteration terminates after meeting the accuracy requirements.The Gradient Boosting Regressor (GBR) is also designed based on boosting and is often used for comparisons with the ABR [71].Different from the iterative conditions of the ABR algorithm, the GBR algorithm rebuilds the model in the gradient descent direction of the loss function of the previous iteration.Generally, the smaller the loss function, the better the model performance.We can improve the model performance of by decreasing the loss function along the gradient direction.The ensemble learning algorithm is widely used in the field of earth science because of its strong ability to deal with nonlinear problems.In the research focusing on the downscaling of GRACE data, RF and GBR have more applications, but there are few discussions about ETR and ABR.Therefore, this study utilizes RF, ETR, ABR, and GBR to downscale GRACE data and selects the best performing algorithm for analysis in the next section.These four machine learning algorithms can be called in Python's scikit-learn function library [72].

Data Processing Flow
The data processing in the research aims to set up modeling windows, optimize the combined model, obtain high-resolution TWSA and GWSA, and use in situ groundwater well observations to verify the downscaled results.The downscaling process is shown in Figure 3.The red arrow represents the input data, the blue arrow represents the output downscaling models and results, and the green arrow represents the validation data: (1) Firstly, to analyze the influence of the modeling window on the downscaled results, we set up modeling windows with the sizes of 3 × 3 (WS3), 5 × 5 (WS5), 7 × 7 (WS7), and 9 × 9 (WS9) based on a 0.5 • grid with Guantao County as the center (Figure 1c).

Model Evaluation Metrics
In this study, four metrics were utilized to evaluate the downscaled results of the above design schemes, namely RMSE, MAE, NSE, and CC.The calculation equations are as follows [73][74][75][76][77]: where Y is the observed value, X is the predicted value, Y and X are the mean values of Y and X, respectively, and n is the number of datasets.For the model construction, the higher the correlation coefficient NSE and CC between the observed value and predicted value, the better the model accuracy.The smaller the RMSE and MAE, the closer the predicted value was to the observed value and the higher the accuracy of the model.

Performance of Downscaling Model
According to Section 3.4, the monthly regression relationship between TWSA and five hydrological variables was established using the WS3, WS5, WS7, and WS9 modeling windows cross-combined with the RF, ETR, ABR, and GBR.In this way, the TWSA data was downscaled from 0.5 • to 0.25 • .Figure 4

Spatial Analysis of Downscaled TWSA before and after Adding Residuals
After training the low-resolution hydrological variables with the combined downscaling models, there is a residual error between the output predicted value and the original value.Previous studies have shown that it can improve the downscaling accuracy by interpolating the residuals to a higher resolution through the Kriging method and then adding them to the downscaled results with a high resolution [42,78,79].Therefore, although the ETR algorithm has slightly better performance, it is necessary to further evaluate the accuracy of each combined downscaling model.In order to display the distribution of the TWSA signals in the entire study area, Figure 5 shows the spatial distribution of the TWSA trends from 0.5 • downscaling to 0.25 • in WS9. Figure 5a shows the TWSA trend with a spatial resolution of 0.25 • derived by CSR-M06 from January 2003 to December 2016; Figure 5b shows the TWSA trend from 0.25 • resampling to 0.5 • ; Figure 5c-j are the results of different combined downscaling models.On the whole, the spatial distribution of the downscaled signal and the original signal have great consistency, both of which show that the TWSA declines severely in the southwest and slightly declines in the northeast (Figure 5c-f).After adding the residuals, the consistency of the spatial distribution of the downscaled TWSA and original TWSA is improved (Figure 5g-j).The downscaled signals of the WS3, WS5, and WS7 modeling windows all demonstrated similar performance.To further analyze the accuracy, we calculated the spatial distribution consistency of the CSR-M06 TWSA and downscaled TWSA, including the signal before and after adding residuals (Figure 6).After adding residuals, the accuracy of each combined downscaling model is improved.The RMSE values decreased from 1.69~2.55mm/yr to 1.36~1.89mm/yr; The MAE values decreased from 1.19~2.10mm/yr to 1.03~1.53mm/yr; The NSE values increased from 0.28~0.69 to 0.51~0.87;The CC values increased from 0.73~0.88 to 0.75~0.94(Figure 6).The RMSE, MAE, NSE, and CC increased by 20~26%, 13~27%, 20~82%, and 3~7%, respectively.The downscaling accuracy of the RF in each modeling window is better than other that of the algorithms after adding the residuals.Taking WS5 as an example, the RMSE and MAE of RF are the smallest, at 1.36 mm/yr and 1.12 mm/yr, respectively, and NSE and CC are both the largest, at 0.82 and 0.93, respectively.These results indicate that after the GRACE grid data are downscaled by the RF, the downscaled signal is closer to the original signal in spatial distribution.

Time Series Analysis of TWSA in Guantao County before and after Downscaling
From the spatial signal analysis in Section 4.1.2,it is known that the downscaling accuracy of the RF is better.Subsequently, this section analyzes the downscaling accuracy of the WS3, WS5, WS7, and WS9 modeling windows combined with RF from the temporal dimension.Figure 7 shows the TWSA time series in Guantao County (the small rectangle in Figure 5: 115.0 • -115.5 • E, 36.5 • -36.75 • N), including the CSR-M06 and the downscaled results of different modeling windows combined with RF.With the exception of several time nodes (e.g., the beginning of 2004, the beginning of 2014, the end of 2015), the TWSA time series in Guantao County derived by the downscaled results and the CSR-M06 model are in good agreement (Figure 7).To analyze the accuracy of each combination model in Figure 7, regression analysis was performed on the downscaled results and the CSR-M06 model (Figure 8).The NSE and CC of the downscaled results of the four modeling windows combined with the RF are both greater than 0.98.This demonstrates that the downscaled results of each modeling window combined with RF have high accuracy (Figure 8).The RMSE, MAE, NSE, and CC values of WS5 are 9.67 mm, 6.80 mm, 0.990, and 0.997, respectively, which are better than WS3, WS7, and WS9.The statistical accuracy of the WS3 and WS7 modeling windows is the second best, and the statistical accuracy of the WS9 modeling window is the worst.To further satisfy the analysis of the TWS changes in Guantao County, this section utilizes WS5 combined with RF to obtain TWSA with a spatial resolution of 0.05 • .Figure 9 shows the spatial distribution of the TWSA trends before and after downscaling from January 2003 to December 2016.In the WS5 modeling window, the spatial distribution of the signals is highly consistent before and after downscaling, which shows that the TWS decreases rapidly in the southwest and central regions and decreases slowly in the northeast.Moreover, the downscaled signal can better reflect the detailed characteristics of the TWS changes.Meanwhile, the CSR-M06 TWSA and 0.05 • downscaling results are used to calculate the TWSA time series in Guantao County (Figure 10).The NSE and CC of the two time series are above 0.99 and decrease at −20.86 mm/yr and −21.79 mm/yr, respectively.

Spatiotemporal Distribution of GWSA Downscaled Signal
According to the water storage balance equation, the SM are subtracted from the TWS to estimate the regional GWS. Figure 11 shows the spatial distribution of the GWSA after removing the SMA from the same spatial resolution from the CSR-M06 TWSA and 0.05 • downscaled TWSA.The spatial distribution of GWSA before and after downscaling is in good agreement, with a severe decrease in the western region and a slight decrease in the eastern region (Figure 11).Similarly, the GWSA time series in Guantao County are calculated (Figure 12).The NSE and CC of the downscaled GWSA and the CSR-M06derived GWSA are 0.980 and 0.994, respectively.The GWSA trends in Guantao County calculated by CSR-M06 model and the 0.05 • downscaled results are −14.53mm/yr and −15.46 mm/yr, respectively (Figure 12).The discussion shows that the downscaling method proposed in this study can not only improve the spatial resolution of GRACE data but also effectively preserves the original signal.

Validation Analysis of In Situ Observations
To analyze the reliability of high-resolution GWSA (0.05 • ) after downscaling, this study collects observational data from 21 groundwater wells in Guantao County for verification.There are 11 shallow groundwater wells (No. 1~11) and 10 deep groundwater wells (No. 12~21).The public period for GWSA and measured groundwater data is from 2012~2016.Due to the significant uncertainty of the specific yield, this study mainly analyzes the GWSA and measured groundwater change trends and their correlations.Figure 13 shows the spatial distribution of groundwater logging and part of the GWS time series.The black dots and red dots represent shallow and deep groundwater wells, respectively.We calculated the correlation coefficients (CC_monthly and CC_yearly) of the measured and inverted groundwater on the monthly and yearly time scales, as shown in Figure 13 and Table 2. Most of the measured data and the GWSA show a downward trend (Figure 13).Table 2 shows the CC between the GWLA and the GWSA.The observation data from the shallow groundwater wells have poor consistency with the downscaled GWSA.
The CC of observation wells 3, 5, and 6 are positive, and the others are negative.With the exception of the poor correlation between observation wells 18 and 19, the CC_monthly and CC_yearly of other deep groundwater wells are above 0.43 and 0.70, respectively.By comparing the observation wells at different depths with the GWSA, it is shown that the overexploitation of deep groundwater may mainly cause groundwater loss in Guantao County.Furthermore, compared to the CC before downscaling, most of the CC after downscaling are improved, indicating that the downscaling work is effective (Table 2).

Limitation and Outlook
From the above discussion, we know that the MLSDM proposed in this study can achieve good results when TWSA is downscaled.The downscaling model established in WS5 (~250 km × 250 km) has the best accuracy, indicating that the modeling window size needs to be analyzed when the downscaling model is established in the spatial dimension.Therefore, it is necessary to consider the influence of surrounding hydrological conditions on the study region during the downscaling process.Meanwhile, compared to downscaling in the temporal dimension, downscaling in the spatial dimension effectively alleviates the spatial discontinuity of the downscaled results between grids.In this study, the downscaled results of GWSA are compared with the in situ groundwater well observations.However, the in situ groundwater well observations are limited, and there is only 5 years of overlap with GRACE products.The long-term changes of the downscaled GWSA result cannot be effectively verified.In later work, we hope to obtain more measured data spanning a longer time period and a regular observation time to compare and analyze the seasonal changes of the downscaled GRACE products.In addition, follow-up work can consider integrating the downscaling results and the measured data that have good accuracy through data assimilation.This process can correct the downscaling results to further improve the downscaling accuracy.Generally, the method of establishing the TWSA downscaling model in the spatial dimension can achieve good results in Guantao County.Nevertheless, we still need more experimental analysis in other regions.

Conclusions
Fully understanding the changes in regional water resources is essential for its rational utilization and effective management.GRACE products have been widely used to analyze the TWS and GWS changes in large areas, but it is still a challenge to analyze the water storage changes in small areas.To satisfy small-scale regional water resources monitoring needs in smaller areas, this research proposes a new machine learning spatial downscaling method (MLSDM) that can effectively improve the spatial resolution of TWSA/GWSA.Compared to building downscaling models on a single grid, the MLSDM can improve the spatial discontinuity of the downscaled signals.This study discusses the accuracy of the downscaled results obtained by combining RF, ETR, ABR, and GBR for WS3, WS5, WS7, and WS9.The results are as follows: (1) To improve the spatial resolution of TWSA products, the MLSDM was constructed using different modeling windows combined with multiple machine learning algorithms.The verification results show that the downscaled results obtained by MLSDM are consistent with the original CSR-M06 model in the spatial distribution.Furthermore, after adding residuals, the downscaling accuracy of each combined model was improved, and the RMSE, MAE, NSE, and CC values increased by 20%~26%, 13%~27%, and 20%~82%, and 3%~7%.Specifically, the accuracy of RF in each modeling window is slightly better than ETR, ABR, and GBR.
(2) To further analyze the impact of the modeling windows, this study compared the TWS time series changes in Guantao County that were derived from the downscaled results of RF combined with different modeling windows.The RMSE, MAE, NSE, and CC of WS5 combined with RF were 9.67 mm, 6.80 mm, 0.990, and 0.997, respectively, which are slightly superior to the downscaled results of WS3, WS7, and WS9.(3) The combined model of WS5 and RF was utilized to downscale the TWSA/GWSA data to 0.05 • , and the signals before and after downscaling demonstrated high consistency.The NSE and CC of the TWSA time series before and after downscaling are 0.990 and 0.997, respectively, and the NSE and CC of GWSA time series before and after downscaling are 0.980 and 0.994, respectively.Subsequently, the measured groundwater level data was used to verify the high-resolution GWSA results.The CC between the high-resolution GWSA and 80% of the deep groundwater well data was above 0.70, but the correlation between shallow groundwater was relatively poor.

Figure 1 .
Figure 1.Establish TWSA downscaling models (a) in the temporal dimension and (b) in the spatial dimension; M1, M2, . . ., Mn represent the monthly resolution grid data used to establish the downscaling model.(c) Set up of the modeling window, the green grid represents the study area; the blue grid represents the modeling window.

Figure 2 .
Figure 2. Location and digital elevation model map of the Guantao County, (a) China; (b) Hebei Province; (c) Guantao County.

( 2 )
Secondly, due to the inconsistent spatiotemporal resolution, the research data needed to be preprocessed (Part I).The spatial resolution of TWSA was resampled to 0.5 • and 0.25 • ; the spatial resolution of ET, SM, LST_day, LST_night, and TEMP was resampled to 0.5 • , 0.25 • , and 0.05 • .With the exception of the in situ groundwater well observations, the temporal resolution of other research data is unified as one month.(3) Thirdly, The MLSDM is employed to establish the regression models between TWSA and five hydrological variables at a spatiotemporal resolution of 0.5 • × 0.5 • and one month (Figure 1b,c and Part II) (WS3, WS5, WS7, WS9 modeling window cross joint RF, ETR, GBR, ABR algorithm, such as WS3 + RF).We input the hydrological variables of 0.25 • into the regression model of the corresponding month (the red arrow in Part II) to obtain the downscaled 0.25 • TWSA.Then, this study compared RMSE, MAE, NSE, and CC between downscaled 0.25 • TWSA and CSR-M06-derived TWSA on spatiotemporal signals to determine the best combined model.(4) Finally, the optimal combined model in Part II was applied to the hydrological variables at a spatial resolution of 0.05 • to obtain the estimated 0.05 • TWSA.According to the water storage balance equation, the soil moisture anomalies (SMA) were subtracted from the TWSA to estimate a 0.05 • GWSA.Subsequently, the in situ groundwater well observations in Guantao County were used to compare and verify the downscaled GWSA (Part III).
shows the average RMSE, MAE, NSE, and CC of the validation set of the combined downscaling model.Overall, the results of each combined downscaling model have satisfactory performance.The RMSE and MAE are between 6.77~13.62mm and 3.36~10.21mm, respectively.Meanwhile, the NSE and CC are not less than 0.51 and 0.76.In WS3, WS5, WS7, and WS9, the RMSE of the ETR were 6.77 mm, 9.94 mm, 9.56 mm, and 10.90 mm, respectively (the MAE of the ETR are 3.36 mm, 4.62 mm, 4.32 mm, and 4.85 mm, respectively), which are better than other algorithms, and the NSE and CC of the ETR have the maximum value in each modeling window.The comparison indicates that the ETR is slightly superior to RF, ABR, and GBR in terms of the model accuracy statistics.

Figure 4 .
Figure 4. Accuracy statistics of different combined downscaling models.

Figure 5 .
Figure 5.The spatial distribution of the TWSA trends before and after downscaling (take WS9 as an example); (a) is the original signal of CSR-M06; (b) is the signal of (a) resampled to 0.5 • ; (c-f) is the result of downscaling the (b) signal using four machine learning algorithms (without adding residuals); (g-j) are the downscaled results of (c-f) after adding the residuals.

Figure 7 .
Figure 7.The time series of the downscaled results and CSR-M06 in Guantao County.

Figure 9 .
Figure 9. Spatial distribution of TWSA trends (a) before and (b) after downscaling in WS5.

Figure 10 .
Figure 10.Time series of TWSA before and after downscaling in Guantao County.(Notice that the time series is the average signal of the small blue rectangle in Figure 9: 115.0 • -115.5 • E, 36.5 • -36.75 • N).

Figure 11 .
Figure 11.Spatial distribution of GWSA trend (a) before and (b) after downscaling in WS5.

Figure 12 .
Figure 12.Time series of GWSA before and after downscaling in Guantao County.(Notice that the time series is the average signal of the small blue rectangle in Figure 11: 115.0 • -115.5 • E, 36.5 • -36.75 • N).

Figure 13 .
Figure 13.Time series of GWLA and GWSA after downscaling in Guantao County.

Table 1 .
Summary of variables (Sources, Scale, Unit, and Website) employed in the study.

Table 2 .
CC of GWSA and GWLA on the monthly and yearly time scale.