# Improving Groundwater Imputation through Iterative Refinement Using Spatial and Temporal Correlations from In Situ Data with Machine Learning

^{1}

^{2}

^{*}

## Abstract

**:**

## 1. Introduction

#### 1.1. Motivation

^{2}), and its susceptibility to leakage (i.e., measurements in one pixel are affected by adjacent pixels) makes it challenging to use on small-to-medium-sized aquifers directly. Additionally, the lack of temporal history for GRACE data, only about 20 years, does not provide the context needed to understand groundwater storage changes over longer periods that are associated with processes such as development and climate change [15,16].

`–`Monteith (PDSI) and NASA’s Global Land Data Assimilation System (GLDAS) data as the meteorological features [22,28,29]. Our approach also created a model for each well within an aquifer but demonstrated significantly improved imputation results through the use of an inductive bias to build prior features for each well in the imputation process. We were able to demonstrate imputation over large 50-year gaps using this extended method.

#### 1.2. Research Overview

## 2. Methods

#### 2.1. Methods Overview

- Step A: Obtain the raw groundwater data with gaps and preprocess each dataset so that the datasets at each well have the same time steps.
- Step B: Perform an initial imputation to generate a complete time series dataset for each well. This means that every discretized time step will have an associated value. After imputation, replace any imputed value that has an observed measurement with the original data. The imputation step only fills gaps (i.e., imputation).

- Step 1: Select the number of iterations, n, to pass through the entire data set.
- Step 2: Use a Hampel filter, described in Section 2.2, to smooth synthetic data spikes or model predictions that are unrealistic for groundwater data. This process removes outliers from the initially imputed dataset. Before each iteration step, apply this filter to remove outliers. The Hampel filter is used so that any anomalies do not propagate errors.
- Step 3: Iterate through each well, w, in the aquifer. For each well:
- ○
- Step 3a: Select a small set of imputed time series datasets from the wells correlated to the target well. We selected wells based on linear correlation and spatial distance; both ideas are explained in Section 2.3.
- ○
- Step 3b: Develop a model for the target well using the time series data selected in Step 3a.
- ○
- Step 3c: Run the target well model to generate a complete time series. Replace any predictions that have an observed value with the in situ measurement. The results of every model are updated synchronously at the end of the iteration. This means that an updated representation will not be available as a feature until the next iteration; if a particular well is selected multiple times as a feature, each model will see the same version of the data. Once every well has been visited, the model output is used as the input for the next iteration.

- Step 4: Repeat Steps 2 and 3 for n iterations.
- Step 5: Examine the results.

#### 2.2. Hampel Filter

_{1}, x

_{2}, …, x

_{n}). The algorithm uses a centered rolling window of size, k, to calculate the local median (Equation (1)). The local median represents the trend at a particular point. Using the local median, the filter calculates the median absolute deviation (MAD), ${\widehat{\mathit{\sigma}}}^{\mathit{M}\mathit{A}\mathit{D}}$, by calculating the deviation of each observation in the window from the local median (Equation (2)) [33]. We set a threshold, T, to represent the maximum allowable deviation from the local median for an observation; we used 3 ∗ ${\widehat{\mathit{\sigma}}}^{\mathit{M}\mathit{A}\mathit{D}}$ for T (Equation (3)). Finally, the filter checks if the deviation from the median for the center point of the window is greater than the threshold; if the deviation is larger than the threshold, the point is replaced by the local median, and if the deviation is within the threshold, the value remains unchanged (Equation (4)).

#### 2.3. Well Modeling

#### 2.3.1. Well Feature Selection

**d**) between the target well and the feature well using a correlation weight (${\mathit{w}}_{\mathit{c}})$ as shown in Equation (5). We used ${\mathit{R}}_{\mathit{w}}^{\mathbf{2}}$ to identify wells that exhibited similar usage patterns, mostly represented by the data correlation (${\mathit{r}}^{\mathbf{2}}$), and that experience similar environments such as local climate, water demand, or any isolated regional events, mostly represented by the physical distance (

**d**), that could affect groundwater availability.

**d**using the Euclidean distance, $\sqrt{{x}^{2}+{y}^{2}}$ where x and y represented the distances in cartesian space, normalized by the largest distance from the target well to any well in the aquifer. This provided

**d**with a value between 0 and 1. We weighed the two similarity measures by the correlation weight, ${\mathit{w}}_{\mathit{c}}$, which weighed the importance of both the linear correlation and physical distance when selecting wells as model features.

#### 2.3.2. Prior Features

#### 2.3.3. Temporal Features

#### 2.4. Iterative Refinement

## 3. Results

#### 3.1. Case Study: Beryl-Enterprise Utah Aquifer

#### 3.2. Aquifer Results

#### 3.3. Well Details

#### 3.4. Validation through Water Storage Analysis

^{3}(1.3 million acre-feet) of storage. This value was derived based on a storage coefficient of 0.2.

^{3}) (Figure 15). Our results indicate that the aquifer experienced a steady decline in storage over time, with the rate of depletion increasing in recent years. This is consistent with the results of previous studies and highlights the need for sustainable management practices to ensure the long-term productivity and health of the aquifer [39,40]. Overall, our geospatial analysis and imputation methods provide a valuable tool for understanding and monitoring the changes in groundwater storage over time. These results could be used to inform water management decisions and aid in the development of strategies to conserve and protect this vital resource.

## 4. Discussion

#### 4.1. Imputation Case I

#### 4.2. Imputation Case II

#### 4.3. Imputation Case III

## 5. Conclusions

## Author Contributions

## Funding

## Data Availability Statement

## Acknowledgments

## Conflicts of Interest

## References

- Barber, N.L. Summary of Estimated Water Use in the United States in 2005; U.S. Geological Survey: Reston, VA, USA, 2009.
- Giordano, M.; Villholth, K.G. The Agricultural Groundwater Revolution: Opportunities and Threats to Development; CABI: Long Beach, CA, USA, 2007; ISBN 978-1-84593-173-5. [Google Scholar]
- Konikow, L.F.; Kendy, E. Groundwater Depletion: A Global Problem. Hydrogeol. J.
**2005**, 13, 317–320. [Google Scholar] [CrossRef] - Sophocleous, M. Interactions between Groundwater and Surface Water: The State of the Science. Hydrogeol. J.
**2002**, 10, 52–67. [Google Scholar] [CrossRef] - Fogg, G.E.; LaBolle, E.M. Motivation of Synthesis, with an Example on Groundwater Quality Sustainability. Water Resour. Res.
**2006**, 42, W03S05. [Google Scholar] [CrossRef] - Famiglietti, J.S. The Global Groundwater Crisis. Nat. Clim Change
**2014**, 4, 945–948. [Google Scholar] [CrossRef][Green Version] - Beran, B.; Piasecki, M. Availability and Coverage of Hydrologic Data in the US Geological Survey National Water Information System (NWIS) and US Environmental Protection Agency Storage and Retrieval System (STORET). Earth Sci. Inform.
**2008**, 1, 119–129. [Google Scholar] [CrossRef][Green Version] - Barbosa, S.A.; Pulla, S.T.; Williams, G.P.; Jones, N.L.; Mamane, B.; Sanchez, J.L. Evaluating Groundwater Storage Change and Recharge Using GRACE Data: A Case Study of Aquifers in Niger, West Africa. Remote Sens.
**2022**, 14, 1532. [Google Scholar] [CrossRef] - Mower, R.W.; Sandberg, G.W. Hydrology of the Beryl-Enterprise Area, Escalante Desert, Utah, with Emphasis on Ground Water; with a Section on Surface Water; Technical Publication; Utah Department of Natural Resources, Division of Water Rights: Salt Lake City, UT, USA, 1982; Volume 73, p. 86. [Google Scholar]
- Evans, S.W.; Jones, N.L.; Williams, G.P.; Ames, D.P.; Nelson, E.J. Groundwater Level Mapping Tool: An Open Source Web Application for Assessing Groundwater Sustainability. Environ. Model. Softw.
**2020**, 131, 104782. [Google Scholar] [CrossRef] - Freeze, R.A.; Cherry, J.A. Groundwater; Prentice-Hall: Hoboken, NJ, USA, 1979; ISBN 978-0-13-365312-0. [Google Scholar]
- Alley, W.M.; Healy, R.W.; LaBaugh, J.W.; Reilly, T.E. Flow and Storage in Groundwater Systems. Science
**2002**, 296, 1985–1990. [Google Scholar] [CrossRef][Green Version] - Becker, M.W. Potential for Satellite Remote Sensing of Ground Water. Groundwater
**2006**, 44, 306–318. [Google Scholar] [CrossRef] - McStraw, T.C.; Pulla, S.T.; Jones, N.L.; Williams, G.P.; David, C.H.; Nelson, J.E.; Ames, D.P. An Open-Source Web Application for Regional Analysis of GRACE Groundwater Data and Engaging Stakeholders in Groundwater Management. JAWRA J. Am. Water Resour. Assoc.
**2021**, 58, 1002–1016. [Google Scholar] [CrossRef] - Rodell, M.; Chen, J.; Kato, H.; Famiglietti, J.S.; Nigro, J.; Wilson, C.R. Estimating Groundwater Storage Changes in the Mississippi River Basin (USA) Using GRACE. Hydrogeol. J.
**2007**, 15, 159–166. [Google Scholar] [CrossRef][Green Version] - Sun, A.Y. Predicting Groundwater Level Changes Using GRACE Data. Water Resour. Res.
**2013**, 49, 5900–5912. [Google Scholar] [CrossRef] - Tao, H.; Hameed, M.M.; Marhoon, H.A.; Zounemat-Kermani, M.; Heddam, S.; Kim, S.; Sulaiman, S.O.; Tan, M.L.; Sa’adi, Z.; Mehr, A.D.; et al. Groundwater Level Prediction Using Machine Learning Models: A Comprehensive Review. Neurocomputing
**2022**, 489, 271–308. [Google Scholar] [CrossRef] - Ahmadi, A.; Olyaei, M.; Heydari, Z.; Emami, M.; Zeynolabedin, A.; Ghomlaghi, A.; Daccache, A.; Fogg, G.E.; Sadegh, M. Groundwater Level Modeling with Machine Learning: A Systematic Review and Meta-Analysis. Water
**2022**, 14, 949. [Google Scholar] [CrossRef] - Vu, M.T.; Jardani, A.; Massei, N.; Fournier, M. Reconstruction of Missing Groundwater Level Data by Using Long Short-Term Memory (LSTM) Deep Neural Network. J. Hydrol.
**2021**, 597, 125776. [Google Scholar] [CrossRef] - Bowes, B.D.; Sadler, J.M.; Morsy, M.M.; Behl, M.; Goodall, J.L. Forecasting Groundwater Table in a Flood Prone Coastal City with Long Short-Term Memory and Recurrent Neural Networks. Water
**2019**, 11, 1098. [Google Scholar] [CrossRef][Green Version] - Evans, S.; Williams, G.P.; Jones, N.L.; Ames, D.P.; Nelson, E.J. Exploiting Earth Observation Data to Impute Groundwater Level Measurements with an Extreme Learning Machine. Remote Sens.
**2020**, 12, 2044. [Google Scholar] [CrossRef] - Ramirez, S.G.; Williams, G.P.; Jones, N.L. Groundwater Level Data Imputation Using Machine Learning and Remote Earth Observations Using Inductive Bias. Remote Sens.
**2022**, 14, 5509. [Google Scholar] [CrossRef] - Motevalli, A.; Naghibi, S.A.; Hashemi, H.; Berndtsson, R.; Pradhan, B.; Gholami, V. Inverse Method Using Boosted Regression Tree and K-Nearest Neighbor to Quantify Effects of Point and Non-Point Source Nitrate Pollution in Groundwater. J. Clean. Prod.
**2019**, 228, 1248–1263. [Google Scholar] [CrossRef] - Gundogdu, K.S.; Guney, I. Spatial Analyses of Groundwater Levels Using Universal Kriging. J. Earth Syst. Sci.
**2007**, 116, 49–55. [Google Scholar] [CrossRef][Green Version] - Ahmadi, S.H.; Sedghamiz, A. Application and Evaluation of Kriging and Cokriging Methods on Groundwater Depth Mapping. Environ. Monit. Assess.
**2008**, 138, 357–368. [Google Scholar] [CrossRef] [PubMed] - Sener, E.; Davraz, A.; Ozcelik, M. An Integration of GIS and Remote Sensing in Groundwater Investigations: A Case Study in Burdur, Turkey. Hydrogeol. J.
**2005**, 13, 826–834. [Google Scholar] [CrossRef] - Tapoglou, E.; Karatzas, G.P.; Trichakis, I.C.; Varouchakis, E.A. A Spatio-Temporal Hybrid Neural Network-Kriging Model for Groundwater Level Simulation. J. Hydrol.
**2014**, 519, 3193–3203. [Google Scholar] [CrossRef] - Ramirez, S.G.; Hales, R.C.; Williams, G.P.; Jones, N.L. Extending SC-PDSI-PM with Neural Network Regression Using GLDAS Data and Permutation Feature Importance. Environ. Model. Softw.
**2022**, 157, 105475. [Google Scholar] [CrossRef] - Rodell, M.; Houser, P.R.; Jambor, U.; Gottschalck, J.; Mitchell, K.; Meng, C.-J.; Arsenault, K.; Cosgrove, B.; Radakovich, J.; Bosilovich, M.; et al. The Global Land Data Assimilation System. Bull. Am. Meteorol. Soc.
**2004**, 85, 381–394. [Google Scholar] [CrossRef][Green Version] - Hampel, F.R. The Influence Curve and Its Role in Robust Estimation. J. Am. Stat. Assoc.
**1974**, 69, 383–393. [Google Scholar] [CrossRef] - Liu, H.; Shah, S.; Jiang, W. On-Line Outlier Detection and Data Cleaning. Comput. Chem. Eng.
**2004**, 28, 1635–1647. [Google Scholar] [CrossRef] - Outlier Removal Using Hampel Identifier—MATLAB Hampel. Available online: https://www.mathworks.com/help/signal/ref/hampel.html (accessed on 18 January 2023).
- Ruppert, D.; Matteson, D.S. Statistics and Data Analysis for Financial Engineering: With R Examples; Springer Texts in Statistics; Springer New York: New York, NY, USA, 2015; ISBN 978-1-4939-2613-8. [Google Scholar]
- EmilienDupont Interactive Visualization of Optimization Algorithms in Deep Learning. Available online: https://emiliendupont.github.io/2018/01/24/optimization-visualization/ (accessed on 1 February 2021).
- Géron, A. Hands-on Machine Learning with Scikit-Learn, Keras, and TensorFlow, 2nd ed.; O’Reilly Media, Incorporated: Sebastopol, CA, USA, 2019. [Google Scholar]
- Abadi, M.; Barham, P.; Chen, J.; Chen, Z.; Davis, A.; Dean, J.; Devin, M.; Ghemawat, S.; Irving, G.; Isard, M.; et al. TensorFlow: A System for Large-Scale Machine Learning. Oper. Syst. Des. Implement.
**2016**, 101, 582–598. [Google Scholar] - Chollet, F. Deep Learning with Python; Manning Publications Co.: Shelter Island, NY, USA, 2018; ISBN 978-1-61729-443-3. [Google Scholar]
- USGS Water Data for the Nation. Available online: https://waterdata.usgs.gov/nwis (accessed on 22 May 2022).
- Jones, K.L. Beryl Enterprise Ground Water Management Plan. Available online: https://waterrights.utah.gov/groundwater/ManagementReports/BerylEnt/berylEnterprise.asp (accessed on 28 January 2023).
- Mower, R.W. Ground-Water Data for the Beryl-Enterprise Area, Escalante Desert, Utah; Open-File Report; U.S. Geological Survey: Reston, VA, USA, 1981; Volume 81–340.

**Figure 4.**A prior estimate of well data, shown as the black line, which we generated by interpolating and extrapolating the observed groundwater data.

**Figure 5.**The prior estimate of groundwater data generated for the same well as shown in Figure 4. We used this prior feature in the iterative refinement model.

**Figure 6.**The Beryl-Enterprise aquifer (shown in green) located in the Southwest region of Utah within the United States.

**Figure 8.**Availability of preprocessed data for the Beryl-Enterprise aquifer where color corresponds to the point of the same color in Figure 7.

**Figure 15.**Groundwater storage change from 1948 to 2020 for the Beryl-Enterprise aquifer using storage coefficient of 0.2.

**Figure 18.**Well features have ground truth data overlapping in the training period and a few in testing.

**Table 1.**Number of well features used based on the ${\overline{\mathit{R}}}_{\mathit{w}}^{\mathbf{2}}$ of the best five well features.

${\overline{\mathit{R}}}_{\mathit{w}}^{2}$ Value | Feature Wells Used | ${\overline{\mathit{R}}}_{\mathit{w}}^{2}$ Value | Feature Wells Used |
---|---|---|---|

$0.0\text{}\le \text{}{\overline{R}}_{w}^{2}$ < 0.1 | 11 | $0.5\text{}\le \text{}{\overline{R}}_{w}^{2}$ < 0.6 | 6 |

$0.1\text{}\le \text{}{\overline{R}}_{w}^{2}$ < 0.2 | 10 | $0.6\text{}\le \text{}{\overline{R}}_{w}^{2}$ < 0.7 | |

$0.2\text{}\le \text{}{\overline{R}}_{w}^{2}$ < 0.3 | 9 | $0.7\text{}\le \text{}{\overline{R}}_{w}^{2}$ < 0.8 | 5 |

$0.3\text{}\le \text{}{\overline{R}}_{w}^{2}$ < 0.4 | 8 | $0.8\text{}\le \text{}{\overline{R}}_{w}^{2}$ < 0.9 | |

$0.4\text{}\le \text{}{\overline{R}}_{w}^{2}$ < 0.5 | 7 | $0.9\text{}\le \text{}{\overline{R}}_{w}^{2}$ < 1.0 |

**Table 2.**Example of the time features, including the one-hot encoding of monthly data (green and yellow squares) and the linear time feature in the right most column.

Date | Jan | Feb | Mar | Apr | May | Jun | Jul | Aug | Sep | Oct | Nov | Dec | Decimal Time |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|

1 January 1948 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0000 |

1 February 1948 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0012 |

1 March 1948 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0023 |

1 April 1948 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0035 |

1 May 1948 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0046 |

1 June 1948 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0058 |

1 July 2020 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0.9942 |

1 August 2020 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0.9954 |

1 September 2020 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0.9965 |

1 October 2020 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0.9977 |

1 November 2020 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0.9988 |

1 December 2020 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1.0000 |

Imputation Case I | Target and feature wells have measured data over the same intervals and the feature well has measured data over the gaps. |

Imputation Case II | Target and feature wells do not necessarily have measured data over the same intervals. Much of the correlation between the two is conducted through previous imputation results. The feature well have measured data within the gaps of the target well. |

Imputation Case III | Target and feature wells have measured data over the same interval, but only imputed values exist over the gap periods. The feature wells do not have any measured data in the gaps. |

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content. |

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Ramirez, S.G.; Williams, G.P.; Jones, N.L.; Ames, D.P.; Radebaugh, J.
Improving Groundwater Imputation through Iterative Refinement Using Spatial and Temporal Correlations from In Situ Data with Machine Learning. *Water* **2023**, *15*, 1236.
https://doi.org/10.3390/w15061236

**AMA Style**

Ramirez SG, Williams GP, Jones NL, Ames DP, Radebaugh J.
Improving Groundwater Imputation through Iterative Refinement Using Spatial and Temporal Correlations from In Situ Data with Machine Learning. *Water*. 2023; 15(6):1236.
https://doi.org/10.3390/w15061236

**Chicago/Turabian Style**

Ramirez, Saul G., Gustavious Paul Williams, Norman L. Jones, Daniel P. Ames, and Jani Radebaugh.
2023. "Improving Groundwater Imputation through Iterative Refinement Using Spatial and Temporal Correlations from In Situ Data with Machine Learning" *Water* 15, no. 6: 1236.
https://doi.org/10.3390/w15061236