An Ensemble Machine Learning Approach for High-Resolution Estimation of Groundwater Storage Anomalies
Abstract
1. Introduction
2. Materials and Methods
2.1. Data and Preprocessing
2.1.1. GRACE-Derived TWSAs
2.1.2. GLDAS-2.1 Noah Data
- Soil moisture (SM), calculated as the sum of four soil depth layers (0–0.1 m, 0.1–0.4 m, 0.4–1.0 m, and 1.0–2.0 m), in kg/m²;
- Plant canopy surface water (PCS), in kg/m²;
- Snow water equivalent (SWE), in kg/m²;
- Surface runoff (SR), in kg/(m²·3 h).
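The soil moisture term above is a plain sum over the four Noah soil layers. A minimal sketch of that summation follows; the layer values are made-up illustrative numbers, and the names `sm_layers` and `total_soil_moisture` are hypothetical, not taken from the paper.

```python
import numpy as np

# Hypothetical per-layer soil moisture (kg/m^2) for three grid cells.
# The numbers are illustrative only, not GLDAS values.
sm_layers = {
    "0-0.1 m": np.array([20.0, 25.0, 18.0]),
    "0.1-0.4 m": np.array([60.0, 70.0, 55.0]),
    "0.4-1.0 m": np.array([120.0, 130.0, 110.0]),
    "1.0-2.0 m": np.array([200.0, 210.0, 190.0]),
}


def total_soil_moisture(layers):
    """Sum the soil-layer arrays cell by cell to get total SM in kg/m^2."""
    return np.sum(np.stack(list(layers.values())), axis=0)


sm_total = total_soil_moisture(sm_layers)
```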
2.1.3. The Reanalysis Data
2.1.4. MODIS Satellite Data
2.1.5. Other Geographic Data
2.1.6. In Situ Measurements
2.1.7. Feature Selection
2.2. Methodology
2.2.1. Derivative Calculation of GWSAs
2.2.2. Technical Route
- Data Preparation: We prepared three groups of data: GWSAs at a coarse spatial resolution of 0.25° (approximately 25 km), derived from GRACE satellite data and GLDAS hydrological model data; validation data (in situ measurements); and a set of 1 km high-resolution explanatory variables that directly or indirectly affect groundwater changes. The high-resolution predictors were resampled to 0.25° to match the pixel size of the GWSAs data.
- Low-Resolution Scene Modeling: First, we split the modeling dataset by calendar month into 12 monthly subsets and built three base models (XGBoost, LightGBM, and CatBoost) for each month. To reduce the bias introduced by the differing principles of the algorithms, the key hyperparameters of all three were optimized through grid search, and the optimal combination was chosen by weighing both prediction performance and generalization ability. Using the optimal hyperparameters, each ML model learned the monthly mapping between the explanatory variables and GWSAs in the low-resolution (0.25°) scene. Finally, within the Stacking framework, we proposed the ADWA approach and combined it with a ridge regression model to integrate the outputs of the base models, yielding an ensemble model for each month in the low-resolution (0.25°) scene.
- High-Resolution Estimation: We assumed that the relationship between the variables does not change with spatial resolution, so the ensemble model fitted at low resolution also applies at high resolution. The high-resolution (1 km) explanatory variables were fed into the ensemble model to obtain preliminary 1 km GWSAs.
- Residual Correction: The residuals are interpreted as natural random variation that the existing model cannot explain [54]. Residual correction compensates for the small-scale spatial variation that the regression model neglects, so that the prediction reflects local detail as well as the large-scale trend of the auxiliary variables. First, the coarse residual was obtained by subtracting the values simulated by the ensemble model at 0.25° from the original GWSAs data. These residuals are often spatially autocorrelated, which can cluster prediction errors, violate the independence assumption, and lead to underestimated uncertainty. Cubic spline interpolation was therefore used to interpolate the coarse residual to 1 km, smoothly capturing local variation to supplement the large-scale model and thereby reducing bias and improving prediction accuracy. Cubic spline interpolation produces a smooth surface with continuous second derivatives by minimizing curvature, and requires neither parameter tuning nor semivariogram modeling. Being computationally efficient and easy to implement, it is well suited to residual correction. Finally, the preliminary 1 km GWSAs from the previous step were added to the 1 km residual to produce the residual-corrected 1 km GWSAs.
- Validation and Analysis: The accuracy of the ensemble model was assessed with 10-fold cross-validation (10-fold CV) and the model evaluation metrics. The high-resolution estimates were further validated by comparing time-series correlations and by comparison with measured groundwater level data, after which the spatiotemporal distribution and evolution of GWSAs were analyzed.
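The residual-correction step in the route above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses SciPy's `RectBivariateSpline` with cubic degree as one possible cubic spline interpolator (the source does not name a library), a small toy grid instead of the real 0.25° and 1 km grids, and random arrays standing in for the GWSAs and for the ensemble model outputs. The function name `residual_correct` is hypothetical.

```python
import numpy as np
from scipy.interpolate import RectBivariateSpline


def residual_correct(coarse_obs, coarse_pred, fine_pred,
                     coarse_lat, coarse_lon, fine_lat, fine_lon):
    """Add a spline-interpolated coarse residual to a fine-grid prediction.

    coarse_obs, coarse_pred: 2-D arrays on the (coarse_lat, coarse_lon) grid.
    fine_pred: 2-D array on the (fine_lat, fine_lon) grid.
    All coordinate vectors must be strictly ascending.
    """
    # Coarse residual: observed GWSAs minus the model simulation.
    residual_coarse = coarse_obs - coarse_pred
    # Cubic (kx = ky = 3) interpolating spline; s=0 disables smoothing.
    spline = RectBivariateSpline(coarse_lat, coarse_lon, residual_coarse,
                                 kx=3, ky=3, s=0)
    # Evaluate the residual surface on the fine grid and add it back.
    residual_fine = spline(fine_lat, fine_lon)
    return fine_pred + residual_fine


# Toy grids (assumption): 6 x 6 coarse cells, 51 x 51 fine cells.
rng = np.random.default_rng(0)
coarse_lat = 30.0 + 0.25 * np.arange(6)
coarse_lon = 110.0 + 0.25 * np.arange(6)
fine_lat = np.linspace(coarse_lat[0], coarse_lat[-1], 51)
fine_lon = np.linspace(coarse_lon[0], coarse_lon[-1], 51)

# Random stand-ins for the coarse GWSAs and the ensemble model outputs.
coarse_gwsa = rng.normal(0.0, 30.0, size=(6, 6))
coarse_pred = coarse_gwsa + rng.normal(0.0, 5.0, size=(6, 6))
fine_pred = rng.normal(0.0, 30.0, size=(51, 51))

fine_gwsa = residual_correct(coarse_gwsa, coarse_pred, fine_pred,
                             coarse_lat, coarse_lon, fine_lat, fine_lon)
```

Because the spline is interpolating rather than smoothing, the correction applied at each coarse node equals the coarse residual there; between nodes the residual varies smoothly.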
2.2.3. Ensemble Machine Learning Model
- The XGBoost model uses the exact greedy split-finding algorithm, captures feature interactions well, and adds regularization terms to prevent overfitting;
- The LightGBM model uses the leaf-wise tree growth strategy and histogram-based feature binning, and is particularly suitable for processing large-scale datasets and improving accuracy;
- The CatBoost model performs well on small datasets and in scenarios where categorical features make up a large share of the inputs, owing to its improved gradient estimation and efficient encoding of categorical features.
2.2.4. Design of the Attention-Based Dynamic Weight Allocation (ADWA) Approach
- Correlation calculation: Suppose there are M base models in total (M = 3), each producing predictions for N samples of the target variable. For the mth base model, the correlation between its predictions and the observed values of the target variable is calculated as follows:
- Weight normalization and the introduction of the attention mechanism: To sharpen the contrast between the correlation-based weights while normalizing them to a usable range, this study applies the SoftMax attention function to the correlations. For the mth base model, the SoftMax weight is defined as:
- Error feedback adjustment: Since correlation alone cannot fully reflect the predictive capability of the base models, this study further introduces an error feedback factor to dynamically adjust the attention weights. The factor is defined based on RMSE as follows:
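The three steps above can be sketched in a few lines of NumPy. The exact formulas are not reproduced in this text, so the sketch rests on explicit assumptions: the correlation is taken to be the Pearson coefficient, the SoftMax is the standard exponential normalization, and the error feedback factor is assumed to be the normalized inverse RMSE, multiplied into the attention weights and renormalized. The function name `adwa_weights` and the synthetic data are hypothetical; the paper's actual definitions may differ.

```python
import numpy as np


def adwa_weights(preds, y, eps=1e-12):
    """Illustrative attention-style weights for M base models.

    preds: array of shape (M, N) with base-model predictions.
    y: array of shape (N,) with observed target values.
    """
    preds = np.asarray(preds, dtype=float)
    y = np.asarray(y, dtype=float)

    # Step 1 (assumed Pearson): correlation of each model with the target.
    corr = np.array([np.corrcoef(p, y)[0, 1] for p in preds])

    # Step 2: SoftMax over the correlations (shifted for numerical stability).
    exp = np.exp(corr - corr.max())
    attention = exp / exp.sum()

    # Step 3 (ASSUMPTION): error feedback as normalized inverse RMSE.
    rmse = np.sqrt(np.mean((preds - y) ** 2, axis=1))
    inv = 1.0 / (rmse + eps)
    feedback = inv / inv.sum()

    # Combine both signals and renormalize so the weights sum to 1.
    weights = attention * feedback
    return weights / weights.sum()


# Synthetic check: three "models" with increasing noise levels.
rng = np.random.default_rng(42)
y_obs = rng.normal(0.0, 30.0, 500)
base_preds = np.stack([y_obs + rng.normal(0.0, s, 500) for s in (5.0, 10.0, 20.0)])

weights = adwa_weights(base_preds, y_obs)
ensemble_pred = weights @ base_preds  # weighted combination of base outputs
```

In this toy setup the least noisy model receives the largest weight. In the study, the weighted outputs are further combined with a ridge regression model, which is omitted here.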
2.2.5. Model Evaluation Indicators
3. Results
3.1. Model Comparison
3.1.1. Model Evaluation and Validation
3.1.2. Time Series Comparison
3.1.3. Validation of In Situ Measurements
3.2. Sensitivity Analysis of Model Performance
3.2.1. Variability in Relative Importance of Explanatory Variables
3.2.2. Dynamic Characteristics of Base Models Weight Allocation
3.3. Comparative Analysis of Coarse and Fine Resolution Data
3.4. Analysis of the Fine-Scale Spatiotemporal Distribution of GWSAs
4. Discussion
4.1. Comparison of Existing Downscaling Models for GWSAs
4.2. Analysis of the Spatiotemporal Heterogeneity of High-Resolution GWSAs
4.3. Uncertainty Analysis
4.4. Study Limitations and Future Research Directions
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
Appendix A
Mission Period | Missing Months (201001–202012) |
---|---|
GRACE | 201101, 201106, 201111, 201204, 201205, 201210, 201303, 201308, 201309, 201402, 201407, 201412, 201505, 201507, 201510, 201511, 201602, 201604, 201609, 201610, 201702 |
Gap Period | 201706, 201707, 201708, 201709, 201710, 201711, 201712, 201801, 201802, 201803, 201804, 201805 |
GRACE-FO | 201808, 201809, 201902 |
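Variable (Acronym) | Unit | Spatiotemporal Resolution | Data Source |
---|---|---|---|
Total precipitation (Train) | m | Monthly; 0.25° × 0.25° | ECMWF ERA5-Land (https://doi.org/10.24381/cds.68d2bb30 (accessed on 10 November 2024)) |
2 m temperature (T2m) | K | Monthly; 0.25° × 0.25° | ECMWF ERA5-Land |
Runoff (Runoff) | m | Monthly; 0.25° × 0.25° | ECMWF ERA5-Land |
Surface pressure (SP) | Pa | Monthly; 0.25° × 0.25° | ECMWF ERA5-Land |
Surface net solar radiation (Snsr) | J/m² | Monthly; 0.25° × 0.25° | ECMWF ERA5-Land |
0–7 cm volumetric soil water (Swvl1) | m³/m³ | Monthly; 0.25° × 0.25° | ECMWF ERA5-Land |
7–28 cm volumetric soil water (Swvl2) | m³/m³ | Monthly; 0.25° × 0.25° | ECMWF ERA5-Land |
28–100 cm volumetric soil water (Swvl3) | m³/m³ | Monthly; 0.25° × 0.25° | ECMWF ERA5-Land |
100–289 cm volumetric soil water (Swvl4) | m³/m³ | Monthly; 0.25° × 0.25° | ECMWF ERA5-Land |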
Generalized Hyperparameter | Candidate Range |
---|---|
n_estimators | [200, 250, 300, 350, 400, 450, 500, 550, 600] |
max_depth | [5, 6, 7, 8, 9, 10, 11, 12, 13] |
learning_rate | [0.01, 0.03, 0.05, 0.08, 0.1] |
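The candidate ranges in the table above define the search space for the grid search. A minimal sketch of enumerating that grid with the standard library is shown below; the helper names `iter_grid` and `grid_search` are hypothetical, and `toy_score` is a placeholder that peaks at an arbitrary point, standing in for a real cross-validated score that is not given here.

```python
from itertools import product

# Candidate ranges copied from the table above.
param_grid = {
    "n_estimators": [200, 250, 300, 350, 400, 450, 500, 550, 600],
    "max_depth": [5, 6, 7, 8, 9, 10, 11, 12, 13],
    "learning_rate": [0.01, 0.03, 0.05, 0.08, 0.1],
}


def iter_grid(grid):
    """Yield every hyperparameter combination as a dict."""
    keys = list(grid)
    for values in product(*(grid[k] for k in keys)):
        yield dict(zip(keys, values))


def grid_search(grid, score_fn):
    """Return the combination with the highest score (higher is better)."""
    return max(iter_grid(grid), key=score_fn)


def toy_score(params):
    """Placeholder score with an arbitrary optimum; NOT a real CV metric."""
    return -(abs(params["n_estimators"] - 400)
             + abs(params["max_depth"] - 9)
             + abs(params["learning_rate"] - 0.05))


candidates = list(iter_grid(param_grid))
best = grid_search(param_grid, toy_score)
```

The three shared hyperparameters give 9 × 9 × 5 = 405 combinations per algorithm before any model-specific parameters are added, so an exhaustive search repeated for every month becomes expensive quickly.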
Month | XGBoost R² (95% CI) | XGBoost RMSE (95% CI) | LightGBM R² (95% CI) | LightGBM RMSE (95% CI) | CatBoost R² (95% CI) | CatBoost RMSE (95% CI) | Ensemble R² (95% CI) | Ensemble RMSE (95% CI) |
---|---|---|---|---|---|---|---|---|
Mon01 | [0.895, 0.897] | [27.649, 27.958] | [0.912, 0.914] | [25.268, 25.462] | [0.898, 0.901] | [27.192, 27.483] | [0.925, 0.928] | [23.236, 23.474] |
Mon02 | [0.880, 0.883] | [27.846, 28.163] | [0.897, 0.899] | [25.860, 26.074] | [0.884, 0.888] | [27.332, 27.611] | [0.914, 0.917] | [23.583, 23.767] |
Mon03 | [0.891, 0.894] | [29.173, 29.432] | [0.910, 0.912] | [26.581, 26.804] | [0.901, 0.903] | [27.915, 28.171] | [0.925, 0.927] | [24.231, 24.474] |
Mon04 | [0.894, 0.897] | [28.297, 28.557] | [0.912, 0.915] | [25.718, 25.982] | [0.902, 0.904] | [27.206, 27.469] | [0.927, 0.929] | [23.378, 23.629] |
Mon05 | [0.910, 0.912] | [27.713, 27.984] | [0.924, 0.926] | [25.497, 25.719] | [0.917, 0.920] | [26.574, 26.925] | [0.936, 0.938] | [23.321, 23.473] |
Mon06 | [0.926, 0.929] | [26.291, 26.588] | [0.938, 0.940] | [24.180, 24.457] | [0.930, 0.932] | [25.749, 25.946] | [0.947, 0.949] | [22.249, 22.468] |
Mon07 | [0.912, 0.914] | [26.345, 26.664] | [0.925, 0.927] | [24.222, 24.558] | [0.916, 0.919] | [25.587, 25.912] | [0.936, 0.939] | [22.300, 22.589] |
Mon08 | [0.900, 0.903] | [27.537, 27.827] | [0.917, 0.919] | [25.161, 25.366] | [0.909, 0.912] | [26.290, 26.520] | [0.930, 0.932] | [23.067, 23.293] |
Mon09 | [0.887, 0.890] | [29.642, 29.973] | [0.910, 0.912] | [26.580, 26.796] | [0.902, 0.905] | [27.650, 27.863] | [0.928, 0.930] | [23.788, 23.944] |
Mon10 | [0.886, 0.888] | [29.172, 29.550] | [0.908, 0.911] | [26.062, 26.413] | [0.899, 0.902] | [27.383, 27.717] | [0.923, 0.925] | [23.953, 24.253] |
Mon11 | [0.902, 0.905] | [28.163, 28.536] | [0.921, 0.924] | [25.346, 25.624] | [0.913, 0.916] | [26.517, 26.866] | [0.936, 0.938] | [22.768, 22.976] |
Mon12 | [0.896, 0.899] | [28.132, 28.519] | [0.916, 0.918] | [25.391, 25.712] | [0.905, 0.908] | [26.941, 27.273] | [0.929, 0.931] | [23.287, 23.503] |
References
- World Economic Forum. The Global Risks Report 2024; World Economic Forum: Cologny/Geneva, Switzerland, 2024.
- Konikow, L.F. Contribution of global groundwater depletion since 1900 to sea-level rise. Geophys. Res. Lett. 2011, 38.
- Condon, L.E.; Maxwell, R.M. Evaluating the relationship between topography and groundwater using outputs from a continental-scale integrated hydrology model. Water Resour. Res. 2015, 51, 6602–6621.
- Chen, J.; Famiglietti, J.S.; Scanlon, B.R.; Rodell, M. Groundwater Storage Changes: Present Status from GRACE Observations. Surv. Geophys. 2016, 37, 397–417.
- Thomas, B.F.; Famiglietti, J.S.; Landerer, F.W.; Wiese, D.N.; Molotch, N.P.; Argus, D.F. GRACE Groundwater Drought Index: Evaluation of California Central Valley groundwater drought. Remote Sens. Environ. 2017, 198, 384–392.
- Gleeson, T.; Cuthbert, M.; Ferguson, G.; Perrone, D. Global Groundwater Sustainability, Resources, and Systems in the Anthropocene. Annu. Rev. Earth Planet. Sci. 2020, 48, 431–463.
- Zell, W.O.; Sanford, W.E. Calibrated Simulation of the Long-Term Average Surficial Groundwater System and Derived Spatial Distributions of its Characteristics for the Contiguous United States. Water Resour. Res. 2020, 56, e2019WR026724.
- Konikow, L.F. Long-Term Groundwater Depletion in the United States. Groundwater 2015, 53, 2–9.
- Vasilis, G.; Carlo Nike, B. Mediterranean marine caves: A synthesis of current knowledge. In Oceanography and Marine Biology; CRC Press: Boca Raton, FL, USA, 2021; Volume 59, pp. 1–88.
- Wynne, J.J.; Howarth, F.G.; Mammola, S.; Ferreira, R.L.; Cardoso, P.; Lorenzo, T.D.; Galassi, D.M.P.; Medellin, R.A.; Miller, B.W.; Sánchez-Fernández, D.; et al. A conservation roadmap for the subterranean biome. Conserv. Lett. 2021, 14, e12834.
- Mammola, S.; Meierhofer, M.B.; Borges, P.A.V.; Colado, R.; Culver, D.C.; Deharveng, L.; Delić, T.; Di Lorenzo, T.; Dražina, T.; Ferreira, R.L.; et al. Towards evidence-based conservation of subterranean ecosystems. Biol. Rev. 2022, 97, 1476–1510.
- Xue, D.; Gui, D.; Ci, M.; Liu, Q.; Wei, G.; Liu, Y. Spatial and temporal downscaling schemes to reconstruct high-resolution GRACE data: A case study in the Tarim River Basin, Northwest China. Sci. Total Environ. 2024, 907, 167908.
- Arshad, A.; Mirchi, A.; Samimi, M.; Ahmad, B. Combining downscaled-GRACE data with SWAT to improve the estimation of groundwater storage and depletion variations in the Irrigated Indus Basin (IIB). Sci. Total Environ. 2022, 838, 156044.
- Ali, S.; Liu, D.; Fu, Q.; Cheema, M.J.M.; Pham, Q.B.; Rahaman, M.M.; Dang, T.D.; Anh, D.T. Improving the Resolution of GRACE Data for Spatio-Temporal Groundwater Storage Assessment. Remote Sens. 2021, 13, 3513.
- Miro, M.E.; Famiglietti, J.S. Downscaling GRACE Remote Sensing Datasets to High-Resolution Groundwater Storage Change Maps of California’s Central Valley. Remote Sens. 2018, 10, 143.
- Feng, W.; Shum, C.K.; Zhong, M.; Pan, Y. Groundwater Storage Changes in China from Satellite Gravity: An Overview. Remote Sens. 2018, 10, 674.
- Rodell, M.; Velicogna, I.; Famiglietti, J.S. Satellite-based estimates of groundwater depletion in India. Nature 2009, 460, 999–1002.
- Tapley, B.D.; Bettadpur, S.; Ries, J.C.; Thompson, P.F.; Watkins, M.M. GRACE Measurements of Mass Variability in the Earth System. Science 2004, 305, 503–505.
- Frappart, F.; Ramillien, G. Monitoring Groundwater Storage Changes Using the Gravity Recovery and Climate Experiment (GRACE) Satellite Mission: A Review. Remote Sens. 2018, 10, 829.
- Richey, A.S.; Thomas, B.F.; Lo, M.H.; Reager, J.T.; Famiglietti, J.S.; Voss, K.; Swenson, S.; Rodell, M. Quantifying renewable groundwater stress with GRACE. Water Resour. Res. 2015, 51, 5217–5238.
- Tapley, B.D.; Watkins, M.M.; Flechtner, F.; Reigber, C.; Bettadpur, S.; Rodell, M.; Sasgen, I.; Famiglietti, J.S.; Landerer, F.W.; Chambers, D.P.; et al. Contributions of GRACE to understanding climate change. Nat. Clim. Change 2019, 9, 358–369.
- Rateb, A.; Scanlon, B.R.; Pool, D.R.; Sun, A.; Zhang, Z.; Chen, J.; Clark, B.; Faunt, C.C.; Haugh, C.J.; Hill, M.; et al. Comparison of Groundwater Storage Changes From GRACE Satellites With Monitoring and Modeling of Major US Aquifers. Water Resour. Res. 2020, 56, e2020WR027556.
- Chen, J.; Cazenave, A.; Dahle, C.; Llovel, W.; Panet, I.; Pfeffer, J.; Moreira, L. Applications and Challenges of GRACE and GRACE Follow-On Satellite Gravimetry. Surv. Geophys. 2022, 43, 305–345.
- Seyoum, W.M.; Milewski, A.M. Improved methods for estimating local terrestrial water dynamics from GRACE in the Northern High Plains. Adv. Water Resour. 2017, 110, 279–290.
- He, Q.; Huang, B. Satellite-based mapping of daily high-resolution ground PM2.5 in China via space-time regression modeling. Remote Sens. Environ. 2018, 206, 72–83.
- Atkinson, P.M. Downscaling in remote sensing. Int. J. Appl. Earth Obs. Geoinf. 2013, 22, 106–114.
- Chen, F.R.; Liu, Y.; Liu, Q.; Li, X. Spatial downscaling of TRMM 3B43 precipitation considering spatial heterogeneity. Int. J. Remote Sens. 2014, 35, 3074–3093.
- Sandip, M.; Pawan Kumar, J.; Rahul Dev, G. Regression-Kriging Technique to Downscale Satellite-Derived Land Surface Temperature in Heterogeneous Agricultural Landscape. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2015, 8, 1245–1250.
- Zhang, Q.; Shen, Z.; Xu, C.-Y.; Sun, P.; Hu, P.; He, C. A new statistical downscaling approach for global evaluation of the CMIP5 precipitation outputs: Model development and application. Sci. Total Environ. 2019, 690, 1048–1067.
- Seon-Ho, K.; Jeongwoo, H.; Sankarasubramanian, A. Understanding the variability of large-scale statistical downscaling methods under different climate regimes. J. Hydrol. 2024, 641, 131818.
- Mao, T.; Shangguan, W.; Li, Q.; Li, L.; Zhang, Y.; Huang, F.; Li, J.; Liu, W.; Zhang, R. A Spatial Downscaling Method for Remote Sensing Soil Moisture Based on Random Forest Considering Soil Moisture Memory and Mass Conservation. Remote Sens. 2022, 14, 3858.
- Sun, Y.; Deng, K.; Ren, K.; Liu, J.; Deng, C.; Jin, Y. Deep learning in statistical downscaling for deriving high spatial resolution gridded meteorological data: A systematic review. ISPRS J. Photogramm. Remote Sens. 2024, 208, 14–38.
- Vishwakarma, B.D.; Zhang, J.; Sneeuw, N. Downscaling GRACE total water storage change using partial least squares regression. Sci. Data 2021, 8, 95.
- Wei, H.; Xiaohan, Z.; Yi, W.; Lizhe, W.; Xiaohui, H.; Jun, L.; Sheng, W.; Weitao, C.; Xianju, L.; Ruyi, F.; et al. A survey of machine learning and deep learning in remote sensing of geological environment: Challenges, advances, and opportunities. ISPRS J. Photogramm. Remote Sens. 2023, 202, 87–113.
- Yin, W.; Zhang, G.; Han, S.-C.; Yeo, I.-Y.; Zhang, M. Improving the resolution of GRACE-based water storage estimates based on machine learning downscaling schemes. J. Hydrol. 2022, 613, 128447.
- Abdellatif, R.; Yassine, A.B.; Abdelhakim, A.; Mohamed, O.; Bouchra, B.; Hamza, O.; Yassine, B.; Lhoussaine, B.; Abdelghani, C. Groundwater level forecasting in a data-scarce region through remote sensing data downscaling, hydrological modeling, and machine learning: A case study from Morocco. J. Hydrol. Reg. Stud. 2023, 50, 101569.
- Sabzehee, F.; Amiri-Simkooei, A.R.; Iran-Pour, S.; Vishwakarma, B.D.; Kerachian, R. Enhancing spatial resolution of GRACE-derived groundwater storage anomalies in Urmia catchment using machine learning downscaling methods. J. Environ. Manag. 2023, 330, 117180.
- Arman, A.; Mohammadali, O.; Zahra, H.; Mohammad, E.; Amin, Z.; Arash, G.; Andre, D.; Graham, E.F.; Mojtaba, S. Groundwater Level Modeling with Machine Learning: A Systematic Review and Meta-Analysis. Water 2022, 14, 949.
- Ashraf, A.A.; Sakina, S.; Antoifi, A.; Salissou, M.; Lukumon, O. Applications of machine learning to water resources management: A review of present status and future opportunities. J. Clean. Prod. 2024, 441, 140715.
- Fahad, H.; Paul, M.; Jason, D.; Gang, C. Advancing Hydrology through Machine Learning: Insights, Challenges, and Future Directions Using the CAMELS, Caravan, GRDC, CHIRPS, PERSIANN, NLDAS, GLDAS, and GRACE Datasets. Water 2024, 16, 1904.
- Zhang, J.; Liu, K.; Wang, M. Downscaling Groundwater Storage Data in China to a 1-km Resolution Using Machine Learning Methods. Remote Sens. 2021, 13, 523.
- Ali, S.; Khorrami, B.; Jehanzaib, M.; Tariq, A.; Ajmal, M.; Arshad, A.; Shafeeque, M.; Dilawar, A.; Basit, I.; Zhang, L.; et al. Spatial Downscaling of GRACE Data Based on XGBoost Model for Improved Understanding of Hydrological Droughts in the Indus Basin Irrigation System (IBIS). Remote Sens. 2023, 15, 873.
- Khorrami, B.; Ali, S.; Gündüz, O. Investigating the Local-scale Fluctuations of Groundwater Storage by Using Downscaled GRACE/GRACE-FO JPL Mascon Product Based on Machine Learning (ML) Algorithm. Water Resour. Manag. 2023, 37, 3439–3456.
- Zounemat-Kermani, M.; Batelaan, O.; Fadaee, M.; Hinkelmann, R. Ensemble machine learning paradigms in hydrology: A review. J. Hydrol. 2021, 598, 126266.
- Sagi, O.; Rokach, L. Ensemble learning: A survey. Wiley Interdiscip. Rev.-Data Min. Knowl. Discov. 2018, 8, e1249.
- Dong, X.; Yu, Z.; Cao, W.; Shi, Y.; Ma, Q. A survey on ensemble learning. Front. Comput. Sci. 2020, 14, 241–258.
- Uddin, M.S.; Mitra, B.; Mahmud, K.; Rahman, S.M.; Chowdhury, S.; Rahman, M.M. An ensemble machine learning approach for predicting groundwater storage for sustainable management of water resources. Groundw. Sustain. Dev. 2025, 29, 101417.
- Yin, J.; Medellín-Azuara, J.; Escriva-Bou, A.; Liu, Z. Bayesian machine learning ensemble approach to quantify model uncertainty in predicting groundwater storage change. Sci. Total Environ. 2021, 769, 144715.
- Li, Z.; Dong, H.; Zhang, Z.; Luo, L.; He, S. Estimation of Near-Ground Ozone With High Spatio-Temporal Resolution in the Yangtze River Delta Region of China Based on a Temporally Ensemble Model. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2023, 16, 7051–7061.
- Tao, K.; Wang, Z.; Chen, A.; Han, Y.; Liu, J.; Zhang, X.; Li, J. Unlocking Potential of Pyrochlore in Energy Systems via Soft Voting Ensemble Learning. Small 2024, 20, 2402756.
- Yin, J.; Slater, L.J.; Khouakhi, A.; Yu, L.; Liu, P.; Li, F.; Pokhrel, Y.; Gentine, P. GTWS-MLrec: Global terrestrial water storage reconstruction by machine learning from 1940 to present. Earth Syst. Sci. Data 2023, 15, 5597–5615.
- Rodell, M.; Houser, P.R.; Jambor, U.; Gottschalck, J.; Mitchell, K.; Meng, C.-J.; Arsenault, K.; Cosgrove, B.; Radakovich, J.; Bosilovich, M.; et al. The Global Land Data Assimilation System. Bull. Am. Meteorol. Soc. 2004, 85, 381–394.
- Lv, M.; Lv, M.; Zha, Y.; Wang, L.; Yang, Z.-L. A global dataset of average specific yield for soils. Sci. Data 2025, 12, 427.
- Fang, J.; Du, J.; Xu, W.; Shi, P.; Li, M.; Ming, X. Spatial downscaling of TRMM precipitation data based on the orographical effect and meteorological conditions in a mountainous area. Adv. Water Resour. 2013, 61, 42–50.
- Bentéjac, C.; Csörgő, A.; Martínez-Muñoz, G. A comparative analysis of gradient boosting algorithms. Artif. Intell. Rev. 2021, 54, 1937–1967.
- Healey, S.P.; Cohen, W.B.; Yang, Z.; Kenneth Brewer, C.; Brooks, E.B.; Gorelick, N.; Hernandez, A.J.; Huang, C.; Joseph Hughes, M.; Kennedy, R.E.; et al. Mapping forest change using stacked generalization: An ensemble approach. Remote Sens. Environ. 2018, 204, 717–728.
- Zheng, H.; Hou, H.; Qin, Z. Research on a Non-Stationary Groundwater Level Prediction Model Based on VMD-iTransformer and Its Application in Sustainable Water Resource Management of Ecological Reserves. Sustainability 2024, 16, 9185.
- Chen, T.; Guestrin, C. XGBoost: A Scalable Tree Boosting System. In Proceedings of the KDD ’16: The 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; pp. 785–794.
- Sepehri, M.; Afshin, G.; Mahboobeh, K.-H.; Reza, I.A.; Ali, T.; Rodrigo-Comino, J. Assessment of drainage network analysis methods to rank sediment yield hotspots. Hydrol. Sci. J. 2021, 66, 904–918.
- Niazkar, M.; Menapace, A.; Brentan, B.; Piraei, R.; Jimenez, D.; Dhawan, P.; Righetti, M. Applications of XGBoost in water resources engineering: A systematic literature review (Dec 2018–May 2023). Environ. Model. Softw. 2024, 174, 105971.
- Guo, X.; Gui, X.; Xiong, H.; Hu, X.; Li, Y.; Cui, H.; Qiu, Y.; Ma, C. Critical role of climate factors for groundwater potential mapping in arid regions: Insights from random forest, XGBoost, and LightGBM algorithms. J. Hydrol. 2023, 621, 129599.
- Kumar, V.; Kedam, N.; Sharma, K.V.; Mehta, D.J.; Caloiero, T. Advanced Machine Learning Techniques to Improve Hydrological Prediction: A Comparative Analysis of Streamflow Prediction Models. Water 2023, 15, 2572.
- Scanlon, B.R.; Faunt, C.C.; Longuevergne, L.; Reedy, R.C.; Alley, W.M.; McGuire, V.L.; McMahon, P.B. Groundwater depletion and sustainability of irrigation in the US High Plains and Central Valley. Proc. Natl. Acad. Sci. USA 2012, 109, 9320–9325.
- Condon, L.E.; Atchley, A.L.; Maxwell, R.M. Evapotranspiration depletes groundwater under warming over the contiguous United States. Nat. Commun. 2020, 11, 873.
- Chen, L.; He, Q.; Liu, K.; Li, J.; Jing, C. Downscaling of GRACE-Derived Groundwater Storage Based on the Random Forest Model. Remote Sens. 2019, 11, 2979.
- Ghaffari, Z.; Easson, G.; Yarbrough, L.D.; Awawdeh, A.R.; Jahan, M.N.; Ellepola, A. Using Downscaled GRACE Mascon Data to Assess Total Water Storage in Mississippi Alluvial Plain Aquifer. Sensors 2023, 23, 6428.
- Gong, H.; Pan, Y.; Zheng, L.; Li, X.; Zhu, L.; Zhang, C.; Huang, Z.; Li, Z.; Wang, H.; Zhou, C. Long-term groundwater storage changes and land subsidence development in the North China Plain (1971–2015). Hydrogeol. J. 2018, 26, 1417–1427.
- Yin, W.; Hu, L.; Zhang, M.; Wang, J.; Han, S.-C. Statistical Downscaling of GRACE-Derived Groundwater Storage Using ET Data in the North China Plain. J. Geophys. Res.-Atmos. 2018, 123, 5973–5987.
- Zhang, J.; Hu, L.; Sun, J.; Wang, D. Reconstructing Groundwater Storage Changes in the North China Plain Using a Numerical Model and GRACE Data. Remote Sens. 2023, 15, 3264.
- Reichstein, M.; Camps-Valls, G.; Stevens, B.; Jung, M.; Denzler, J.; Carvalhais, N.; Prabhat, F. Deep learning and process understanding for data-driven Earth system science. Nature 2019, 566, 195–204.
Model | Hyperparameters |
---|---|
XGBoost | n_estimators = 350, max_depth = 8, learning_rate = 0.03, reg_alpha = 2, reg_lambda = 3, gamma = 1 |
LightGBM | n_estimators = 400, max_depth = 9, learning_rate = 0.05, num_leaves = 2, max_bin = 3 |
CatBoost | n_estimators = 450, max_depth = 10, learning_rate = 0.08, border_count = 210 |
Month | Counts | XGBoost R² | XGBoost RMSE | LightGBM R² | LightGBM RMSE | CatBoost R² | CatBoost RMSE | Ensemble R² | Ensemble RMSE |
---|---|---|---|---|---|---|---|---|---|
Mon01 | N = 172,859 | 0.896 | 27.792 | 0.913 | 25.376 | 0.899 | 27.332 | 0.926 | 23.354 |
Mon02 | N = 170,456 | 0.882 | 27.997 | 0.898 | 25.972 | 0.886 | 27.467 | 0.915 | 23.666 |
Mon03 | N = 172,396 | 0.893 | 29.306 | 0.911 | 26.700 | 0.902 | 28.037 | 0.926 | 24.349 |
Mon04 | N = 173,006 | 0.895 | 28.417 | 0.913 | 25.851 | 0.903 | 27.330 | 0.928 | 23.497 |
Mon05 | N = 173,696 | 0.911 | 27.862 | 0.925 | 25.608 | 0.918 | 26.745 | 0.937 | 23.398 |
Mon06 | N = 167,635 | 0.928 | 26.443 | 0.939 | 24.335 | 0.931 | 25.843 | 0.948 | 22.354 |
Mon07 | N = 165,010 | 0.913 | 26.497 | 0.926 | 24.393 | 0.918 | 25.751 | 0.937 | 22.448 |
Mon08 | N = 162,674 | 0.902 | 27.696 | 0.918 | 25.266 | 0.910 | 26.408 | 0.931 | 23.187 |
Mon09 | N = 161,160 | 0.889 | 29.822 | 0.911 | 26.689 | 0.904 | 27.760 | 0.929 | 23.869 |
Mon10 | N = 172,912 | 0.887 | 29.357 | 0.909 | 26.233 | 0.900 | 27.534 | 0.924 | 24.104 |
Mon11 | N = 162,770 | 0.904 | 28.336 | 0.922 | 25.483 | 0.915 | 26.697 | 0.937 | 22.868 |
Mon12 | N = 164,708 | 0.898 | 28.314 | 0.917 | 25.547 | 0.906 | 27.095 | 0.930 | 23.391 |
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Yuan, Y.; Shen, D.; Cao, Y.; Wang, X.; Zhang, B.; Dong, H. An Ensemble Machine Learning Approach for High-Resolution Estimation of Groundwater Storage Anomalies. Water 2025, 17, 1445. https://doi.org/10.3390/w17101445