Forecasting GRACE Data over the African Watersheds Using Artificial Neural Networks

The GRACE-derived terrestrial water storage (TWSGRACE) provides measurements of the mass exchange and transport between continents, oceans, and ice sheets. In this study, a statistical approach was used to forecast TWSGRACE data using 10 major African watersheds as test sites. The forecasted TWSGRACE was then used to predict drought events in the examined African watersheds. Using a nonlinear autoregressive with exogenous input (NARX) model, relationships were derived between TWSGRACE data and the controlling and/or related variables (rainfall, temperature, evapotranspiration, and Normalized Difference Vegetation Index). The performance of the model was found to be “very good” (Nash–Sutcliffe (NSE) > 0.75; scaled root mean square error (R*) < 0.5) for 60% of the investigated watersheds, “good” (NSE > 0.65; R* < 0.6) for 10%, and “satisfactory” (NSE > 0.50; R* < 0.7) for the remaining 30% of the watersheds. During the forecasted period, no drought events were predicted over the Niger basin, the termination of the latest (March–October 2015) drought event was observed over the Zambezi basin, and the onset of a drought event (January-March 2016) over the Lake Chad basin was correctly predicted. Adopted methodologies generate continuous and uninterrupted TWSGRACE records, provide predictive tools to address environmental and hydrological problems, and help bridge the current gap between GRACE missions.


Introduction
The joint National Aeronautics and Space Administration (NASA) and the German Aerospace Center (DLR) Gravity Recovery and Climate Experiment (GRACE) satellite mission, launched on 17 March 2002, was designed to map both the Earth's static and temporal fields with unprecedented accuracy [1]. The month-to-month variability in gravity field solutions delivered by the GRACE mission is directly related to the spatial and temporal variabilities in the dynamic components of the terrestrial water storage (TWS), including surface and groundwater, soil moisture, and snow/ice [2]. The GRACE-derived TWS (TWS GRACE ) provides insights about the exchange of mass at both global and regional scales, thus enabling the scientific community to address previously unresolvable questions and groundbreaking discoveries in various disciplines including hydrology, oceanography, cryosphere, and solid Earth fields [3][4][5][6][7][8][9]. However, GRACE has its shortcomings. These include, among others, the coarse spatial resolution, absence of vertical resolution, and leakage problems due to spherical harmonic representation of Earth's gravitational field. The leakage refers to contributions from neighboring regions to the gravity signal in the region of interest. Some of these shortcomings are being partially addressed by the recent releases of the University of Texas -Center for Space Research (UT-CSR) and the Jet Propulsion Laboratory (JPL) GRACE Mass Concentration (mascon) solutions that are of higher spatial resolution, higher signal-to-noise ratio, and reduced leakage error compared to the CSR-and JPL-derived spherical harmonics solutions [10,11]. The absence of vertical resolution of TWS GRACE is being addressed by subtracting outputs of the Land Surface Models (LSMs) from the TWS GRACE data [12][13][14][15][16][17][18] and/or assimilation of TWS GRACE data into LSM [19][20][21]. The relatively large latency of TWS GRACE data is being addressed by generating daily TWS solutions [22]. Additional shortcomings that have not been fully addressed include (1) the temporal gaps in the GRACE mission due to battery performance issues, and (2) the temporal gap between GRACE missions; a minimum of one-year TWS data gap exists between GRACE (terminated in October 2017) and GRACE-Follow On (GRACE-FO; launched in May 2018) mission.
Analysis of TWS GRACE time series and other relevant time series has been successfully used in many hydrological and climatological studies including management of water resources [5,7,8,[23][24][25][26][27][28][29], and analysis, prediction, and forecasting of climate change and variability in climatic (e.g., rainfall, evapotranspiration, and wind speed), and hydrologic (e.g., stream flow, flood, drought, infiltration, recharge, and ground/surface water quantity, quality, and consumption) parameters [30][31][32][33][34][35][36]. The successful implementation of these analyses heavily relies on the availability of comprehensive, continuous, and uninterrupted time series records [37,38]. For example, short gaps in TWS records could hinder the analysis and interpretation of temporal variability by introducing errors in the extracted amplitude and phase of the annual cycle, non-seasonal residual signals, as well as in secular trends and associated significance levels [39][40][41] and could also increase the level of uncertainty in the spectral modeling of temporal variability and cyclicity [42]. Long gaps, on the other hand, could obscure the temporal patterns of TWS data and consequently distort the results of any statistical analysis [38,43].
Previous attempts to address TWS GRACE data gaps included substitution with LSMs outputs to fill gaps in TWS GRACE [44]. The successful implementation of these procedures depends on the degree to which these models can simulate natural events such as drought or floods, which in turn are dependent on the physics, structure, and accuracy of the LSMs, and on the sources and accuracy of the forcing meteorological and/or land surface datasets [45]. While such models can potentially simulate TWS GRACE variabilities caused by natural phenomena, they are less successful simulating variabilities caused by anthropogenic contributions [46][47][48][49].
Many of the remaining approaches that address the TWS GRACE data gaps are statistical in nature. They relate TWS GRACE variabilities to variations in one or more other relevant parameters and use derived relationships to predict TWS GRACE during data-deficient (e.g., gap) periods. For example, Reager and Famiglietti [24] adopted an empirical approach that relates changes in TWS GRACE to precipitation forcing and generalized these empirical relationships knowing large-scale (area > 200,000 km 2 ) basin characteristics (e.g., percent forest cover and basin temperature). They found that temperature, soil water-holding capacity, and percent forest cover are essential controls on TWS variability, but basin area and mean terrain slope have less impact. Such applications were shown to be effective in predicting TWS GRACE over large-scale basins, but less efficient over small-scale basins, and require knowledge of basin characteristics that might not be available for all basins.
Forootan et al. [50] described a statistical approach to predict TWS GRACE over West Africa using rainfall and sea surface temperature data. Their model characterized two-dominant annual and inter-annual modes of TWS GRACE variability at accuracies of 79% and 67% (first year) and 62% and 57% (second year). While this approach was effective in predicting many of the TWS GRACE variabilities over West Africa, the independent component analysis that defines teleconnections between model input/output parameters could not fully describe all the observed variabilities. Moreover, the applications of this approach are restricted to areas experiencing strong ocean-land-atmosphere interactions (e.g., North America, West Africa, Australia). De Linage et al. [51] proposed a simple statistical modeling framework to forecast TWS GRACE in the Amazon Basin, on seasonal timescales, based on the relationship between TWS GRACE and sea surface temperature anomalies (Niño 4 and Tropical North Atlantic Index). The model performance varied seasonally and spatially, where a higher performance was observed during wet seasons in the northeastern regions, and during dry seasons in the central and western regions. The model performance was generally high (66%) in the northeastern region compared to the central and western areas (39%). Additional statistical approaches include, but are not limited to: (1) the use of the GPS-derived low-degree surface loading variations to predict the seasonal harmonic variations mapped by GRACE [52]; (2) the application of a statistical model that relates Alaska glacier mass variability to variations in monthly snowfall and temperature fields [53]; (3) the use of empirical orthogonal function decomposition to reconstruct TWS GRACE data over the Amazon basin by examining the correlation between TWS GRACE and water levels over inter-annual and multi-decadal time periods [54]; (4) the use of statistical models to reconstruct global natural-varying TWS GRACE using rainfall and temperature data [55]; and (5) the use of artificial intelligence to predict the TWS GRACE over a large karst plateau in Southwest China using in situ precipitation, monthly mean temperature, and GLDAS soil moisture [56] and to predict the TWS GRACE over West Africa using rainfall, evaporation, surface temperature, net-precipitation, soil moisture, and climate indices [57].
Artificial neural network (ANN) is an artificial intelligence technique inspired by the functioning of the human brain [58,59]. ANNs have the ability to model both linear and nonlinear systems using learning and prediction algorithms that extract complex relationships between model input(s) and target(s). Benefits of ANN applications include, but are not limited to (1) computational robustness, (2) no prior knowledge of underlying physical phenomena required, (3) the ability to handle large and complex datasets and to learn and generalize even if the data is noisy and/or incomplete, and (4) the ability to derive complex and nonlinear relationships that could be used as forecasting tools [60,61]. ANNs have been widely used in climatic predictions, streamflow forecasting, and water resources management research studies [62][63][64][65][66]. For example, ANNs has been used to predict the groundwater level using (1) in situ hydro-meteorological observations [67], (2) variable pumping and weather conditions [68][69][70][71], (3) population and irrigation data [72], and (4) TWS GRACE , precipitation, and temperature data [73].
Nonlinear autoregressive with exogenous input (NARX) neural network is used in this study to predict and forecast TWS GRACE time series over ten African watersheds. Given our previous knowledge of the spatiotemporal variabilities in TWS GRACE over the African watersheds and how they are controlled by natural and anthropogenic interventions [9,74], NARX model was selected because it provides effective, efficient, and powerful (1) nonlinear systems modelling and predictive tool, (2) learning algorithm that discovers, and is not affected by, the long temporal dependence in the model outputs and/or inputs [75,76], and (3) faster convergence in reaching the optimal weights of the connections between neurons and/or inputs [77][78][79][80].
In our approach, we use the readily available rainfall (P), evapotranspiration (ET), temperature (T), and normalized difference vegetation index (NDVI) to predict TWS GRACE data. Each of the ten major African watersheds (Niger, Volta, Zambezi, Okavango, Lake Chad, East-Central Coast, Mozambique Northeast Coast, Congo, Limpopo, and Nile watersheds) was used separately as our test site ( Figure 1). It is worth mentioning that the ultimate goal of this study is to generate continuous and uninterrupted TWS GRACE records and to develop a robust predictive tool that forecasts TWS GRACE to address environmental and hydrological problems and help bridge the current gap between GRACE missions. Hence, the performance of the predictive model is the focus of the modeling exercise while the relative importance of the inputs in predicting the outputs is not the subject of this study.

Materials and Methods
In the selection of model inputs process, we were guided by the water budget equation that explains the temporal variability in TWS GRACE depending upon P, ET, and runoff. Given the fact that most of the African watersheds lack runoff data over the investigated period, T and NDVI data were used as model inputs instead. Both datasets are extracted from readily available datasets and expected to be correlated with TWS GRACE . For example, an increase in soil moisture will increase NDVI and TWS, and an increase in temperature will increase evaporation and decrease TWS. Monthly NARX model input datasets, acquired over the time period of April 2002 to September 2015, were averaged over each of the examined African watersheds. A normalization technique was implemented to account for varying units, ranges, and magnitudes of the model input data. The normalization scales all input data in the range between 0 and 1 and ensures that all variables receive equal consideration. Normalization was applied using the following equation: wherex t is the normalized value for a given input value x t at a given time t and x max and x min are the maximum and minimum recorded values of x t , respectively. and TWS, and an increase in temperature will increase evaporation and decrease TWS. Monthly NARX model input datasets, acquired over the time period of April 2002 to September 2015, were averaged over each of the examined African watersheds. A normalization technique was implemented to account for varying units, ranges, and magnitudes of the model input data. The normalization scales all input data in the range between 0 and 1 and ensures that all variables receive equal consideration. Normalization was applied using the following equation: where is the normalized value for a given input value at a given time t and and are the maximum and minimum recorded values of , respectively.

Data
In this section, we briefly describe the inputs and target data used in the NARX model. The most recent version (Release 06) of GRACE mascon solutions provided by the JPL (available at: https://podaac-opendap.jpl.nasa.gov/opendap/GeodeticsGravity/tellus/L3/mascon/) was used in this study. It has been reported that JPL mascon solutions provide a better reduction of leakage errors and improved signal localization [8]. Detailed description of processing steps used to generate JPL

Data
In this section, we briefly describe the inputs and target data used in the NARX model. The most recent version (Release 06) of GRACE mascon solutions provided by the JPL (available at: https://podaac-opendap.jpl.nasa.gov/opendap/GeodeticsGravity/tellus/L3/mascon/) was used in this study. It has been reported that JPL mascon solutions provide a better reduction of leakage errors and improved signal localization [8]. Detailed description of processing steps used to generate JPL mascon solutions is provided in Watkins et al. [11] and Wiese at al. [81]. In summary, the Earth's oblateness (C 20 ) and degree-1 (geocenter) coefficients provided by Cheng et al. [82] and Swenson et al. [83], respectively, were used. The glacial isostatic adjustment correction advanced by Peltier et al. [84] was applied. The temporal mean was removed. The Coastline Resolution Improvement (CRI) version, used in this study, were generated using the spherical caps (native resolution of GRACE mission, 3 • the Equator, total: 4551 grids) approach and represented on 0.5 • × 0.5 • monthly TWS GRACE girds. The basin averaged TWS GRACE time series is calculated for each basin by averaging the results for all TWS GRACE grid points lying within the spatial domain of each basin. These time series were then rescaled to restore true signal amplitude following the approach advanced by Landerer and Swenson [85]. Errors associated with basin averaged TWS GRACE were quantified using the approach described in Ahmed and Abdelmohsen [18]. Figure 2 shows the temporal variations in TWS GRACE averaged over each of the examined African watersheds.
Remote Sens. 2019, 11, x FOR PEER REVIEW 5 of 21 mascon solutions is provided in Watkins et al. [11] and Wiese at al. [81]. In summary, the Earth's oblateness (C20) and degree-1 (geocenter) coefficients provided by Cheng et al. [82] and Swenson et al. [83], respectively, were used. The glacial isostatic adjustment correction advanced by Peltier et al. [84] was applied. The temporal mean was removed. The Coastline Resolution Improvement (CRI) version, used in this study, were generated using the spherical caps (native resolution of GRACE mission, 3° the Equator, total: 4551 grids) approach and represented on 0.5°×0.5° monthly TWSGRACE girds. The basin averaged TWSGRACE time series is calculated for each basin by averaging the results for all TWSGRACE grid points lying within the spatial domain of each basin. These time series were then rescaled to restore true signal amplitude following the approach advanced by Landerer and Swenson [85]. Errors associated with basin averaged TWSGRACE were quantified using the approach described in Ahmed and Abdelmohsen [18]. Figure 2 shows the temporal variations in TWSGRACE averaged over each of the examined African watersheds.

Temperature (T)
The air temperature data was extracted from the European Centre for Medium-Range Weather Forecasts (ECMWF) Re-Analysis (ERA Interim) project. The global ERA Interim (available at http: //apps.ecmwf.int/datasets/data/interim-full-daily/levtype=sfc/) assimilates, using a 4-dimensional variational scheme, the 2 m air temperature using multiple sources of observational (e.g., station, satellite, and radiosonde) datasets at a temporal resolution of 6 hours and a spatial sampling of 0.75 • × 0.75 • [99]. The gridded ERA40 data shows similar short-term variability with Climate Research Unit (CRU) Temperature data version 2 (CRUTEM2v) derived directly from monthly station data [100]. Compared to ERA40 products, ERA Interim provides higher spatial and temporal resolutions as well as improved description of low-frequency variability and stratospheric circulation [101].

Normalized Difference Vegetation Index (NDVI)
The Moderate-resolution Imaging Spectroradiometer (MODIS)-derived NDVI products were used in this study. The NDVI data were generated from averaged level-3 MODIS Terra (MOD13C2) and MODIS Aqua (MYD13C2) products (available at https://lpdaac.usgs.gov/dataset_discovery/modis/ modis_products_table). Both MOD13C2 and MYD13C2 are global monthly datasets with spatial sampling of 0.05 • [102]. Significant correlations were reported between MODIS Terra and Aqua products over the majority of the African continent [103].

Approach
In this section, we describe the NARX model theory, structure, and performance measures.

Nonlinear Autoregressive with Exogenous Input (NARX) Model
NARX model is used to derive relationships between TWS GRACE data on the one hand and P, ET, T, and NDVI on the other hand. NARX is a recurrent dynamic ANN model with feedback connections enclosing several layers of the network [104,105]. NARX is designed to model nonlinear relations between the past inputs, past outputs, and the current output. In other words, the value of the output is regressed on previous values of the outputs and previous values of the inputs. The NARX model is formulated as a discrete time input-output recursive equations: where,ŷ t+1 is the NARX model output at the time t + 1;ŷ t ,ŷ t−1 , . . . ,ŷ t−n y are the past outputs of the NARX model; y t , y t−1 , . . . , y t−n y are the true past values of the time series (desired output values); x t+1 , x t , x t−1 , . . . , x t−n x are the NARX model inputs; ε t defines the error term, generally assumed to be Gaussian and white noise; n x and n y are the input and output orders of the dynamical model (e.g., delays), where n x ≥ 0 and n y ≥ 1; and f is a nonlinear mapping function that is approximated by a regression model.
An alternative approach to NARX would have been to train several ANNs, each predicting TWS GRACE with different lead times up to the maximum desired forecasting lead time. The two approaches are based on the same past measurements. The NARX approach forecasts targets after a recursive process and requires training only one model while the alternate approach predicts the targets directly but requires the calibration and validation of several models. It is unlikely that significant differences would have been observed using the multiple ANN approaches, the NARX approach was selected as it involves calibrating only one model.
The NARX model is expressed in two configurations, parallel architecture (also called closed-loop) and series-parallel (also called open-loop) architecture [106][107][108][109]. In the series-parallel architecture, which is useful for training, validation, and testing phases, the true output is used instead of feeding back the estimated output (Figure 3a). In other words, the future value of the time seriesŷ t+1 is predicted from the present and past values of x t and true past values of the time series y t . In the parallel architecture configuration, which is used in the prediction phase, the output is fed back to the input of the feedforward network (Figure 3b). In this case, the future value of the time seriesŷ t+1 is predicted from the present and past values of x t and the past predicted values of the time seriesŷ t . The open-loop configuration is expressed by Equation (2a), whereas, the closed-loop configuration is expressed by Equation (2b), e.g., [109].
Remote Sens. 2019, 11, x FOR PEER REVIEW 7 of 21 delays), where ≥ 0 and ≥ 1; and is a nonlinear mapping function that is approximated by a regression model. An alternative approach to NARX would have been to train several ANNs, each predicting TWSGRACE with different lead times up to the maximum desired forecasting lead time. The two approaches are based on the same past measurements. The NARX approach forecasts targets after a recursive process and requires training only one model while the alternate approach predicts the targets directly but requires the calibration and validation of several models. It is unlikely that significant differences would have been observed using the multiple ANN approaches, the NARX approach was selected as it involves calibrating only one model.
The NARX model is expressed in two configurations, parallel architecture (also called closed-loop) and series-parallel (also called open-loop) architecture [106][107][108][109]. In the series-parallel architecture, which is useful for training, validation, and testing phases, the true output is used instead of feeding back the estimated output (Figure 3a). In other words, the future value of the time series is predicted from the present and past values of and true past values of the time series . In the parallel architecture configuration, which is used in the prediction phase, the output is fed back to the input of the feedforward network (Figure 3b). In this case, the future value of the time series is predicted from the present and past values of and the past predicted values of the time series . The open-loop configuration is expressed by Equation (2a), whereas, the closed-loop configuration is expressed by Equation (2b), e.g., [109].  [109]. (a) Series-parallel architecture (open-loop) that was used in the training phase, and (b) parallel architecture (closed-loop) that was used in the prediction phase. "W" and "b" denote connection weights and bias, respectively; "F1" and "F2" represents the hyperbolic tangent and the linear activation functions, respectively.
In Equation (2), while the general functionality of the mapping function ( ) is known (usually a combination of sigmoid and linear type functions), the exact function is determined as part of the training phase of the NARX model. The NARX model is based on a Multi-Layer Perceptron (MLP) architecture consisting of three main layers: input, hidden, and output (target) layers. Hidden and output layers have associated activation functions for their respective neurons, each with weights and biases ( Figure 3). Within the NARX model, the information flows, throughout the layers, from the input to the hidden layers, then to the output layers. In each layer, a scalar product ( × ) is calculated, by each neuron, by multiplying the input vector of the previous layer ( ) by the weights vector ( ). The neurons outputs ( ) are obtained by applying the activation function to the result of the scalar product: parallel architecture (closed-loop) that was used in the prediction phase. "W" and "b" denote connection weights and bias, respectively; "F 1 " and "F 2 " represents the hyperbolic tangent and the linear activation functions, respectively.
In Equation (2), while the general functionality of the mapping function ( f ) is known (usually a combination of sigmoid and linear type functions), the exact function is determined as part of the training phase of the NARX model. The NARX model is based on a Multi-Layer Perceptron (MLP) architecture consisting of three main layers: input, hidden, and output (target) layers. Hidden and output layers have associated activation functions for their respective neurons, each with weights and biases (Figure 3). Within the NARX model, the information flows, throughout the layers, from the input to the hidden layers, then to the output layers. In each layer, a scalar product (x j × W ij ) is calculated, by each neuron, by multiplying the input vector of the previous layer (x j ) by the weights vector (W ij ). The neurons outputs (y i ) are obtained by applying the activation function F to the result of the scalar product: where, i and j are the neuron index in the layer and the input index in the NARX, respectively. Examples of activation functions (F) include linear, sigmoid (logsig), and hyperbolic tangent (tansig). A tansig transfer function is utilized between the hidden layers as well as between input and hidden layers, whereas, a linear activation function is employed between hidden and output layers.
In the NARX training phase, the weights and biases of the ANN are progressively modified. The Bayesian regularization training function was selected to minimize a linear combination of squared errors and weights [106,109,110] while minimizing the chances of overfitting. The widely used Levenberg-Marquardt optimization method, e.g., [58] was selected to progressively update the ANN weights and biases. This approach was found to be fast and robust to train this study's moderate-sized feedforward networks.

Nonlinear Autoregressive with Exogenous Input (NARX) Model Structure
In this study, four input variables (P, ET, T, and NDVI) are used to predict the target (TWS GRACE ) during the TWS GRACE gap period and to forecast TWS GRACE measurements 6 months ahead. To fill the TWS GRACE gap, the NARX model used recent inputs up to the onset of the TWS GRACE gap to predict the missing TWS GRACE data. In the forecasting phase, we forecasted TWS GRACE for "future" months where we do not have the benefit of concurrent or recent input data. To forecast TWS GRACE data over African watersheds, we used "past" model inputs to predict the current or future TWS GRACE data. The forecasting process was tested by increasing lead times from 1 up to 6 months in advance. For the one-month lead time, the model was used to predict the next TWS GRACE data using previous inputs and TWS GRACE data up to the time of prediction. The lead time was then increased while testing for different length of the input time series. The varying lead time approach was adopted based on our prior work [74] in which we reported varying lag times between the individual model inputs and the response in TWS GRACE data.
NARX model was trained using 85% of the data and tested using the remaining 15%. Training and testing periods were selected randomly to maximize the model predictive capabilities by taking advantages of different wet and dry periods represented in the model input data. For consistency purposes, our results were displayed using April 2002-September 2013 and October 2013-September 2015 as training and testing periods, respectively. The NARX model was then used to forecast TWS GRACE for the period from October 2015 through March 2016 (forecasting period).
Generally, the number of hidden neurons should range between half and twice the number of predictors [111,112]. In this study, the optimal number of hidden neurons was determined by trial and error, where the hidden neurons were added gradually until the predicted and measured values start to match while checking for potential model overfitting. The number of neurons in the hidden layer for the ten examined watersheds ranged from four to eight and a delay value ranging between 2 and 24 months was attempted. The delay value reflects, to some extent, the long-term dependence structures between the predicted target (e.g., TWS GRACE ) and the input variables (i.e., P, ET, T, and NDVI) [9,74]. Table 1 lists the architecture of the NARX network that generated the optimum results for each of the examined watersheds. The Monte Carlo simulations were used to measure the model stability, quantify the associated uncertainty, and improve the model generalization. We ran the NARX model for 50 runs, and the final TWS GRACE predictions were calculated from the median of the model runs; the range of the model runs was used to estimate the uncertainties associated with the predicted TWS GRACE . Automated regularization approach, advanced by MacKay [113], was used to optimize and overcome the overparameterization and generalization problems for each of these runs. In this approach, the parameters of the NARX model, with predetermined distributions, are assumed to be random. The regularization parameters, in this case, are attributed to the unknown variances associated with these distributions.

Performance Measures
The performance of the NARX model was evaluated using several standard coefficients; Root mean square error (RMSE), scaled RMSE (R*), correlation coefficient (r), Nash-Sutcliffe efficiency (NSE) coefficient, and the seasonal adjusted NSE (NSE*) that are commonly used in the analysis and evaluation of statistical model results, in our case the NARX model outputs [114]. For example, NSE was found to be the best objective function for evaluating the overall fit between the predictive and observed values [115]. Likewise, r values were found to be sensitive to the differences between modeled and observed data including the extreme values (i.e., outliers) [116].

Remote Sens. 2019, 11, x FOR PEER REVIEW 9 of 21
The performance of the NARX model was evaluated using several standard coefficients; Root mean square error (RMSE), scaled RMSE (R*), correlation coefficient (r), Nash-Sutcliffe efficiency (NSE) coefficient, and the seasonal adjusted NSE (NSE*) that are commonly used in the analysis and evaluation of statistical model results, in our case the NARX model outputs [114]. For example, NSE was found to be the best objective function for evaluating the overall fit between the predictive and observed values [115]. Likewise, r values were found to be sensitive to the differences between modeled and observed data including the extreme values (i.e., outliers) [116]. RMSE measures the global fitness of a predictive model by measuring the difference between the predicted and the observed values. Generally, lower RMSE indicate a better fit between the predicted and the observed values. RMSE is calculated as: where, and are the modelled and observed values at time (t), respectively, and n is the number of target data used for testing phase. RMSE measures the global fitness of a predictive model by measuring the difference between the predicted and the observed values. Generally, lower RMSE indicate a better fit between the predicted and the observed values. RMSE is calculated as: where, y t and o t are the modelled and observed values at time (t), respectively, and n is the number of target data used for testing phase. The scaled RMSE (R*) is defined as the ratio between RMSE and the standard deviation of observations (σ) according to the following equation: The correlation coefficient (r) is a measure of how likely future outcomes are to be predicted by the model. The value of r is obtained by dividing the covariance of the observed and modelled values by the product of their standard deviations according to the following equation: where the overbar denotes mean values. The value of r ranges between −1 and +1, where values of +1 and −1 indicate perfect increasing and decreasing linear relationships between variables, respectively, and a value of 0 indicates absence of such relationships between variables. The NSE coefficient measures the predictive skill of a model relative to the mean of observations. In other words, NSE indicates how well the plot of observed versus modelled data fits the 1:1 line relationship [117]. The optimal value for the NSE is 1.0 and it is calculated using the following equation: The seasonal adjusted NSE coefficient (NSE*) takes into account the long-term annual cycle. NSE* was developed by Lorenz et al. [118] it is calculated using the following equation: whereô is the long-term mean annual cycle. NSE* is greater than zero if the modelled time series is better than the long-term annual cycle and less than zero if the modelled time series cannot improve the long-term mean annual cycle with respect to the observations [118].
In this study, the model performance is classified into three main categories [73,114]: (a) very good performance if the resulting NSE is greater than 0.75 and R* is less than 0.50, (b) good performance if NSE is greater than 0.65 and R* is less than 0.60, and (c) satisfactory performance if NSE is greater than 0.5 and R* is less than 0.70. Table 1 Figure 4. Examination of Table 1 shows that six watersheds (Niger, Volta, Zambezi, Okavango, Lake Chad, and East-Central Coast) show "very good" performance, one (Mozambique Northeast Coast) shows "good" performance, and the remaining three watersheds (Congo, Limpopo, and Nile) show "satisfactory" performance. The NSE* values are smaller than that of the NSE, but greater than zero, for all of the investigated African watersheds. Positive NSE* values indicate that the simulated TWS GRACE time series agrees with the observed TWS GRACE time series more than the values of the mean annual cycle. Thus, the NARX model is able to predict the seasonal and non-seasonal temporal variabilities.

Results
Trial and error simulations, while avoiding overfitting, indicate that the performance of the NARX model is enhanced by increasing the training period and delays as well as number of input and hidden layers. For example, a delay of 13 months resulted in "very good" NARX model performance over the Niger, Volta, Okavango, Lake Chad, and East Central Coast basins. The performance of the NARX model, however, has not been improved with increasing the delay by more than 7 months in case of the Zambezi and the Mozambique Northeast Coast basins. The number of hidden layers was kept between half and twice the number of input layers. Eight hidden layers resulted in optimal NARX model performance over basins in the "very good" and "good" performance categories. Hidden layers exceeding eight in number resulted in model overfitting, reduced generalization, and increased training time. The Limpopo, Congo, and Nile basins were assigned 4, 5, and 7 hidden layers, respectively. Fewer hidden layers (<4) resulted in NARX model underfitting. NARX model results over the African watersheds were compared to results from other ANN models (nonlinear autoregressive exogenous [ARX]) as well as results from multilinear regression (MLR) model. Unlike the NARX model, the ARX model identifies nonlinear relations between the past inputs and the current output using the following equation: The same architecture listed in Table 1 was used for the ARX model. The MLR model, on the other hand, maps the linear relationships between the current inputs and the current output according to the following equation: where a 0 is the intercept and a 1 , . . . , a n are the model coefficients. Figure 5 shows the NSE values for NARX, ARX, and MLR models over the investigated African watersheds. Examination of Figure 5 indicates that the NSE values of the ARX and MLR models are always lower than that of the NARX model. For the Nile, Congo, and Okavango basins, the NSE of the ARX and MLR models are negative, indicating that the simulated TWS GRACE means are worse than the observed TWS GRACE means. One reason for the better performance of the NRAX compared to the other models could be the nature of the spatiotemporal variabilities of TWS GRACE over the African watersheds and how they are related to episodic changes in natural variability (e.g., rainfall, temperature, etc.), as well as long-term natural and anthropogenic changes in storage (e.g., soil moisture, groundwater) signals. The forecasted TWSGRACE could potentially be used in several hydrological applications. We provide one demonstration for a potential application, where the projected TWSGRACE was used to forecast the hydrological drought events in three African watersheds (Zambezi, Lake Chad, and Niger basins). Thomas et al. [119] used the TWSGRACE to quantify the monthly water storage deficits. The latter provides a direct measure of the magnitude of the volumetric departure from "normal" hydrological conditions. We applied the same approach to forecast the hydrological drought using the NARX-derived TWSGRACE. We first calculated the mean monthly seasonal cycle by averaging each of the 12 months (January, February, etc.) in the TWSGRACE time series. The seasonal cycle was then subtracted from the observed TWSGRACE time series. Negative difference indicated TWSGRACE deficits defined as (M). M (in km 3 ) represents the volume of water required, in any month, to return to normal conditions and is calculated by multiplying the negative difference by the basin area. A hydrological drought event is identified if M lasts three or more continuous months [119]. Figure 6 shows the GRACE-derived hydrological drought events over Zambezi (Figure 6a), Lake Chad (Figure 6b), and Niger basins (Figure 6c). Examination of Figure 6 shows that during the forecasted TWSGRACE period (October 2015-March 2016), no drought events were observed over the Niger basin, the latest (March-October 2015) drought event over the Zambezi basin came to an end, and the Lake Chad basin witnessed the onset of a drought event (January-March 2016). Analysis of different datasets (e.g., rainfall, temperature, climate indices) confirmed the 2016 drought in the Zambezi basin [120].

Discussion
The relatively lower performance of the NARX model over the Congo, Limpopo, and Nile basins could be attributed to poor correspondence of TWSGRACE with the 6 months lead time of one or more of the input variables (P, ET, T, and NDVI) or the absence of one or more of the important controlling and/or related factors. Other factors that could affect the NARX model performance include the climatic settings, climate cyclicity, and anthropogenic factors. It is worth mentioning that the performance of a NARX model is generally enhanced by the presence of more "dynamics" in the training phase [121]. For example, the Limpopo basin lacks the strong seasonal patterns that were observed over the other investigated basins (Figure 2g). In addition, the Limpopo basin exhibits the lowest TWSGRACE amplitude of annual cycle (6.09 cm) compared to other investigated watersheds The anthropogenic activities change the portioning of TWSGRACE components as well as the phase, amplitude, residual signals, and trend of TWSGRACE time series [9]. The dynamics/seasonality in the Nile basin is clear (Figure 2a), yet the basin extends over several climatic settings (tropical, semi-arid, arid, and Mediterranean), experiences anthropogenic activities The forecasted TWS GRACE could potentially be used in several hydrological applications. We provide one demonstration for a potential application, where the projected TWS GRACE was used to forecast the hydrological drought events in three African watersheds (Zambezi, Lake Chad, and Niger basins). Thomas et al. [119] used the TWS GRACE to quantify the monthly water storage deficits. The latter provides a direct measure of the magnitude of the volumetric departure from "normal" hydrological conditions. We applied the same approach to forecast the hydrological drought using the NARX-derived TWS GRACE . We first calculated the mean monthly seasonal cycle by averaging each of the 12 months (January, February, etc.) in the TWS GRACE time series. The seasonal cycle was then subtracted from the observed TWS GRACE time series. Negative difference indicated TWS GRACE deficits defined as (M). M (in km 3 ) represents the volume of water required, in any month, to return to normal conditions and is calculated by multiplying the negative difference by the basin area. A hydrological drought event is identified if M lasts three or more continuous months [119]. Figure 6 shows the GRACE-derived hydrological drought events over Zambezi (Figure 6a), Lake Chad (Figure 6b), and Niger basins (Figure 6c). Examination of Figure 6 shows that during the forecasted TWS GRACE period (October 2015-March 2016), no drought events were observed over the Niger basin, the latest (March-October 2015) drought event over the Zambezi basin came to an end, and the Lake Chad basin witnessed the onset of a drought event (January-March 2016). Analysis of different datasets (e.g., rainfall, temperature, climate indices) confirmed the 2016 drought in the Zambezi basin [120].

Discussion
The relatively lower performance of the NARX model over the Congo, Limpopo, and Nile basins could be attributed to poor correspondence of TWS GRACE with the 6 months lead time of one or more of the input variables (P, ET, T, and NDVI) or the absence of one or more of the important controlling and/or related factors. Other factors that could affect the NARX model performance include the climatic settings, climate cyclicity, and anthropogenic factors. It is worth mentioning that the performance of a NARX model is generally enhanced by the presence of more "dynamics" in the training phase [121]. For example, the Limpopo basin lacks the strong seasonal patterns that were observed over the other investigated basins (Figure 2g). In addition, the Limpopo basin exhibits the lowest TWS GRACE amplitude of annual cycle (6.09 cm) compared to other investigated watersheds  [9]. The dynamics/seasonality in the Nile basin is clear (Figure 2a), yet the basin extends over several climatic settings (tropical, semi-arid, arid, and Mediterranean), experiences anthropogenic activities (e.g., construction of Merowe, Tekeze, Amerti-Neshi, and Millennium dams, heightened of Roseiris, Jebel Aulia, Khashm El Gibta dams, and groundwater extraction in the Western Desert of Egypt; [9]). The Nile Basin also witnesses climatic cyclicity (64-, 19-, 12-, and 7-year cycles) [122]. The 7-year cycle could be well represented in the training period, but the 64-, 19-, and 12-year cycles are not, given the short time span of the training period (12-year).
Likewise, the Congo basin is also experiencing anthropogenic activities (i.e., deforestation activities; [9]) and is apparently witnessing multi-year and decadal climatic cycles. Ahmed et al. [9] reported a decrease in annual NDVI that was consistent with the increase in the annual deforestation rates. Annual deforestation rates for the Ubangi, Sangha, and Congo subbasins of the Congo basin are on the rise (period 1990-2000: 0.58%, 0.38%, and 0.20%; 2000-2005: 0.62%, 0.40%, and 0.21%; 2005-2010: 0.65%, 0.41%, and 0.22%). In addition, the rainfall patterns over the Congo basin are cyclic in nature. They range from multi-year (cycles: 2-8 year) oscillations [123,124], to near-decadal, decadal, and multi-decadal fluctuations [123,125,126]. These cyclicity patterns in precipitation are reflected in the fluctuations in basin discharge (fluctuations in basin discharge: 2-4, 6-8, 13-15, and 30-35 year; [127][128][129]  (e.g., construction of Merowe, Tekeze, Amerti-Neshi, and Millennium dams, heightened of Roseiris, Jebel Aulia, Khashm El Gibta dams, and groundwater extraction in the Western Desert of Egypt; [9]). The Nile Basin also witnesses climatic cyclicity (64-, 19-, 12-, and 7-year cycles) [122]. The 7-year cycle could be well represented in the training period, but the 64-, 19-, and 12-year cycles are not, given the short time span of the training period (12-year). Likewise, the Congo basin is also experiencing anthropogenic activities (i.e., deforestation activities; [9]) and is apparently witnessing multi-year and decadal climatic cycles. Ahmed et al. [9] reported a decrease in annual NDVI that was consistent with the increase in the annual deforestation rates. Annual deforestation rates for the Ubangi, Sangha, and Congo subbasins of the Congo basin are on the rise (period 1990-2000: 0.58%, 0.38%, and 0.20%; 2000-2005: 0.62%, 0.40%, and 0.21%; 2005-2010: 0.65%, 0.41%, and 0.22%). In addition, the rainfall patterns over the Congo basin are cyclic in nature. They range from multi-year (cycles: 2-8 year) oscillations [123,124], to near-decadal, decadal, and multi-decadal fluctuations [123,125,126].  The launch of the GRACE-FO mission and the acquisition of TWSGRACE data over the upcoming years, together with those acquired throughout the GRACE mission, will enable the construction of a continuous decadal TWS record. The latter will undoubtedly enhance model performance over basins that experience decadal climatic cyclicities such as the Congo and Nile basins given the prolonged time periods that are needed for the NARX model to learn over such basins. The launch of the GRACE-FO mission and the acquisition of TWS GRACE data over the upcoming years, together with those acquired throughout the GRACE mission, will enable the construction of a continuous decadal TWS record. The latter will undoubtedly enhance model performance over basins that experience decadal climatic cyclicities such as the Congo and Nile basins given the prolonged time periods that are needed for the NARX model to learn over such basins.

Conclusions
The currently available TWS GRACE time series suffers from gaps in temporal records due to battery performance issues. In addition, a minimum of one-year TWS data gap exists between the past GRACE and current GRACE-FO missions. We developed and implemented a statistical approach that predicts and forecasts TWS GRACE time series 6 months in advance and tested the developed approach over 10 major African watersheds. The NARX model was used to derive relationships between TWS GRACE data (target) and the controlling and/or related factors (P, T, ET, and NDVI) over the selected watersheds. The performance of the developed model was evaluated using several coefficients and was found to be "very good" over six watersheds (Niger, Volta, Zambezi, Okavango, Lake Chad, and East-Central Coast), "good" over one (Mozambique Northeast Coast) and "satisfactory" over the remaining three watersheds (Congo, Limpopo, and Nile). The "satisfactory" performance over these three basins is attributed here to one or more of the following: (1) the watershed witnessing anthropogenic activities, (2) the watershed extending over multiple climatic settings, and (3) prolonged (e.g., decadal) climatic cyclicity that is not adequately represented in the model training period. In addition, the modeled TWS GRACE were used to forecast the severity of droughts over the Zambezi, Lake Chad, and Niger basins. During the forecasted TWS GRACE period, no drought events were observed over the Niger basin, the latest (March-October 2015) drought event over the Zambezi basin came to an end, and the Lake Chad basin witnessed the onset of a drought event (January-March 2016). Although not demonstrated here, the forecasted TWS could potentially be used to address a wide range of hydrological applications, including forecasting seasonal and inter-annual spatiotemporal variations in TWS over river basins, basin-scale ET, and natural/climatic and anthropogenic impacts on continental aquifer systems, and flood events.
Our approach is robust, effective, and advantageous, given the following: (1) the model does not require knowledge of basin characteristics and could be applied to relatively small basins (as small as 63,000 km 2 ) that could be resolved by GRACE data [130]; (2) our approach forecasts TWS GRACE time series 6 months in advance; (3) the significant forecasting time period for TWS GRACE of our model could be used to project TWS GRACE within various watersheds and thus could potentially be used for developing sound, year-to-year, sustainable water management scenarios for these watersheds; (4) our approach considers the full spatiotemporal and spectral signatures of TWS GRACE time series (e.g., seasonal and sub seasonal, trend, and residual terms); (5) our model provides high TWS GRACE predictability values; and (6) the model could potentially be modified to enable applications over a wide range of drainage basins and settings by incorporating additional variables (e.g., snow-related variables) and could potentially enable long term predictions once longer training time periods could be attained.
Our NARX model was successfully tested over a wide range of climatic and hydrologic settings across the entire continent of Africa including tropical and subtropical systems. Thus, it could potentially be applied to large sectors of the globe with similar climatic zones and hydrologic settings in South and Central America, Australia, and southern Asia. These areas cover approximately 60% of the Earth's continental surface. Our NARX model will have to be modified if we were to apply it to the remaining continental masses including polar, boreal, and temperate areas where the water budget of the hydrologic systems is partially or largely controlled by, or correlated with, snow parameters (e.g., snow fall, snow melt). In such areas, one would expect that snow-related parameters would have to be incorporated in the modified NARX model to adequately account for the observed TWS GRACE over these areas. In other words, multiple NARX models will have to be constructed if we were to apply this approach on a global scale. Studies should concentrate on evaluating the extent to which the advanced and advocated procedures could be applied to predict and forecast pixel-based TWS GRACE datasets, as opposed to watershed-based predictions. Although not shown, our preliminary results are encouraging and if successful, this will benefit the GRACE user community at large. The NARX model could be generally used as Kalman prediction and data update or prediction only in case of no TWS GRACE data.
The applications of techniques such as those applied and advocated here and the acquisition of TWS GRACE data over the upcoming years by GRACE-FO and GRACE-II (planned in 2025) missions could potentially enable the construction of continuous uninterrupted decadal TWS GRACE records over the major hydrologic systems across the globe. These records could be readily used to enhance the current TWS GRACE hydrologic applications and provide predictive tools that could address a wide range of environmental and hydrological problems.
The developed approach is of timely importance as well given the degradation of GRACE operational capability specifically during the last couple months of the mission lifetime, as manifested by the increase in the number of data gaps with time due to battery performance issues. For example, the data gaps increased from 3 months for the period April 2002 to December 2010 to 17 months for the period January 2011 to June 2017. In October 2017, NASA decommissioned the GRACE-2 satellite and ended the GRACE science mission. This decision was based on an age-related battery issue; the remaining battery capacity could no longer operate the science instruments and telemetry transmitter on board GRACE-2. There is an urgent need to develop approaches to construct a continuous and an uninterrupted TWS GRACE record that encompasses GRACE and GRACE-FO mission datasets and fills the gap between these two missions. The applied and the advocated methodologies could assist in achieving this goal.
Author Contributions: M.A. extracted and processed the model input data, developed, calibrated, and validated NARX, ARX, and MLR models, forecasted drought events, and prepared the manuscript. M.S. provided guidance and advise throughout the project and assisted in the development of the adopted methodology and in the preparation of the manuscript. T.E. and P.T. provided the statistical framework and the analysis for the project.
Funding: Funding was provided by the National Aeronautics and Space Administration (NASA) grants NNX12AJ94G and 80NSSC18K1681 to Western Michigan University. Funding was also provided by the Division of Research and Innovation at Texas A&M University-Corpus Christi (University Research Enhancement and Research Equipment and Infrastructure grants) to the first author.