Fusion of Multi-Satellite Data and Artificial Neural Network for Predicting Total Discharge

As research on the use of satellites in combination with existing hydrological monitoring techniques increases, interest in applying machine-learning approaches to the prediction of hydrological variables is growing. Ground-based measurements are often limited by the difficulty of measuring spatiotemporal variations, especially in ungauged areas. In addition, no existing satellite is capable of measuring total discharge directly. In this study, artificial neural network (ANN) machine-learning approaches are examined for the prediction of 0.25° total discharge data over the Korean Peninsula using the data fusion of multi-satellites, reanalysis data, and ground-based observations. Terrestrial water storage changes (TWSC) from the Gravity Recovery and Climate Experiment (GRACE) satellite, precipitation from the Tropical Rainfall Measuring Mission (TRMM), and soil moisture storage and average temperature from the Global Land Data Assimilation System (GLDAS) models are used as ANN model input data. The results demonstrate the relatively good performance of the ANN approach for predicting the total discharge in terms of the correlation coefficient (r = 0.65–0.95), mean absolute error (MAE = 13.28–20.35 mm/month), root mean square error (RMSE = 22.56–34.77 mm/month), and Nash-Sutcliffe efficiency (NSE = 0.42–0.90). Precipitation is identified as the most influential input parameter through a sensitivity analysis. Overall, the ANN-predicted total discharge shows spatial patterns similar to those from other methods, while GLDAS underestimates the total discharge, with a smaller dynamic range than the other models. Thus, the ANN approach described herein shows promise for predicting the total discharge based on the data fusion of multi-satellites, reanalysis data, and ground-based observations.


Introduction
Total discharge, which includes surface and subsurface discharge, is an elementary component of the Earth's hydrological cycle. Continuous total discharge data are vital for monitoring extreme hydrological events, such as droughts and floods, and for water management. Dependable total discharge predictions can increase water usage efficiency and minimize agricultural and economic losses [1,2]. Monitoring and predicting discharge calls for continuous and reliable historical hydrometeorological and hydraulic data (e.g., precipitation, humidity, temperature, flow velocity, and slope), which are typically applied to complex physical modelling methods. However, the quality of station measurement networks is limited in many mountainous parts of the world, where basic hydrologic data are insufficient and sparse and sites are inaccessible [3]. Moreover, physical models based on these data are difficult to set up, because data collection, archiving, and distribution have suffered from the inadequate operation and maintenance of facilities, especially in ungauged areas.
[...] which is sparse compared to those provided by the Korea Meteorological Administration (KMA) in South Korea. In addition, Figure 1b indicates that the river network in the Korean Peninsula is relatively dense (0.2-0.5 km/km²), with most of the rivers flowing into the West Sea due to geographic characteristics. Many of the rivers have gentle slopes and wide basins, leading to higher amounts of discharge. There are eleven major river basins (1-10 in Figure 1b) in North Korea and five major river basins (11-16 in Figure 1b) in South Korea (Table 1).

Terrestrial Water Storage Changes from the GRACE
The GRACE satellite system was launched by the National Aeronautics and Space Administration (NASA) with the objective of observing spatiotemporal variations in the Earth's gravitational field. The time variance in the gravitational field detected by GRACE reflects the redistribution of the Earth's water mass and can be converted to terrestrial water storage (TWS) changes after removing both nontidal (atmosphere and ocean) and tidal (solid Earth, ocean, and atmosphere) effects. The GRACE Level-2 RL05 datasets were processed by three organizations, namely the Center for Space Research (CSR), the Jet Propulsion Laboratory (JPL), and the GeoForschungsZentrum (GFZ), using various orders of spherical harmonics to provide gridded (1.0° × 1.0°) TWS anomaly (TWSA) data, which were downloaded from the JPL Tellus site (http://grace.jpl.nasa.gov). The TWSA is obtained by subtracting the time-mean value of the TWS (time span: January 2004 to December 2009) from the TWS using Equation (1) and can be used to compute the TWSC using Equation (2).

$$TWSA_i = TWS_i - \overline{TWS} \tag{1}$$

$$TWSC_i = \frac{TWSA_{i+1} - TWSA_{i-1}}{2\Delta t} \tag{2}$$

where i is the month, $\overline{TWS}$ is the average of the TWS, and $\Delta t$ is the time span. This study used the latest CSR RL05 data (Table 2), which were produced with spherical harmonic coefficients truncated at order 60 [20]. Although the RL05 data tend to reduce the retrieved TWSA amplitude due to destriping and smoothing, a multiplicative scaling factor can be applied to restore the attenuated signal [21]. To this end, scaling factors computed from the land TWSA time series output of the community land model (CLM) were applied to the RL05 TWSA dataset.
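The anomaly and change calculations referred to as Equations (1) and (2) can be illustrated with a short sketch. It assumes that the TWSC is a central difference of the TWSA over two months, which is a common choice for GRACE data but is an assumption here; the function names and the sample values are hypothetical.

```python
from statistics import mean

def twsa_series(tws, baseline):
    """Anomaly: each TWS value minus the mean TWS over the baseline slice."""
    reference = mean(tws[baseline])
    return [value - reference for value in tws]

def twsc_central(twsa, dt=1.0):
    """Assumed central-difference change rate; the two end months stay None."""
    change = [None] * len(twsa)
    for i in range(1, len(twsa) - 1):
        change[i] = (twsa[i + 1] - twsa[i - 1]) / (2.0 * dt)
    return change

# Hypothetical monthly TWS values (mm); the baseline here is the whole record.
tws = [100.0, 104.0, 110.0, 103.0, 98.0, 97.0]
twsa = twsa_series(tws, slice(0, 6))
twsc = twsc_central(twsa)
print(twsa)
print(twsc)
```

With monthly data and dt = 1, the result is in mm/month, the unit used for the other water balance terms.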

Precipitation from TRMM
One of the datasets provided by TRMM is the TRMM Multi-satellite Precipitation Analysis (TMPA), which is available at http://trmm.gsfc.nasa.gov. The TMPA 3B43 precipitation dataset was produced by combining the data of the TRMM satellite with data from the Special Sensor Microwave/Imager (SSM/I) sensor on the Defense Meteorological Satellite Program (DMSP) satellite, the advanced microwave scanning radiometer-Earth (AMSR-E) on the Aqua and National Oceanic and Atmospheric Administration (NOAA) satellites, and the global in situ precipitation measurements of the global precipitation climatology project (GPCP) produced by the NOAA's Climate Prediction Center and the Global Precipitation Climatology Center (GPCC). The TMPA 3B43 provides 0.25° (~25 km) monthly precipitation, covering 50° N to 50° S and 180° W to 180° E, from 1998 to the present. Although the TRMM satellite mission ended in 2015, the TMPA dataset will be continued by the next-generation global precipitation mission (GPM) satellite. Previous studies demonstrated the applicability of the TMPA 3B43 to the Korean Peninsula by comparison with ground-based observations [22,23]. Hence, the TMPA 3B43 monthly precipitation data for the Korean Peninsula during the study period were extracted in this study (Table 2).

Soil Moisture Storage and Average Temperature from GLDAS
The GLDAS system produces primary land surface flux and storage component variables of the water and energy cycles using forcing data from ground-based observations and satellites. Currently, the official website of GLDAS (http://ldas.gsfc.nasa.gov/gldas/) provides 3-h and monthly data with spatial resolutions of 0.25° and 1.0° through the application of widely used land surface models that have been well-proven in hydrometeorological fields, including the common land model (CLM), the Mosaic, the Noah, and the variable infiltration capacity (VIC) models. Each model output is driven by various ground-based and satellite data and is modified by data assimilation. In view of the high accuracy of the model output demonstrated by previous studies, GLDAS is the primary source of data for hydrological analyses in ungauged areas around the world. For use in this study, the monthly GLDAS/Noah data for soil moisture storage (SMS) and average temperature (T) with a spatial resolution of 0.25° (GLDAS_NOAH025_M) were gathered from the NASA land information system and the Goddard Earth Sciences Data and Information Services Center (GES DISC; Table 2). The product contains four depth layers of soil moisture, namely 0-0.1 m (layer 1), 0.1-0.4 m (layer 2), 0.4-1.0 m (layer 3), and 1.0-2.0 m (layer 4), and the soil moisture data of these layers were integrated to form the input data.

Dependent Data
The Water Management Information System (WAMIS), which is managed by the Korean government and the Korea Water Resources Corporation in South Korea, provides the discharge observations (http://www.wamis.go.kr/). Out of the 736 stations that have been operated up to 2016, 187 observation stations were adopted in this study for their data continuity and inclusion of the study period (2003-2016; Table 2). The gauge stations for discharge at the five major river basins (Han, Geum, Nakdong, Yeongsan, and Seomjin Rivers) are indicated in Figure 1c. Meanwhile, North Korea's available hydrological data are limited to the precipitation data from a sparse 27 stations provided by the World Meteorological Organization (WMO; Figure 1a). In addition, the amount of discharge is difficult to estimate due to facility and operational problems, leading to poor reliability of the precipitation data. To overcome these problems, the GLDAS data were used to validate the total discharge over the Korean Peninsula by combining the surface and subsurface discharges. Thus, only the GLDAS total discharge data for North Korea, where no ground-based data existed, were used as the dependent data, while the GLDAS and ground-based discharge data for South Korea were combined via conditional merging techniques and used as the dependent data. A detailed description of the conditional merging technique is given in Section 3.3.

Validation Data
The GLDAS total discharge results were used as validation data. In addition, long-term total discharge data from the Precipitation-Runoff Modeling System (PRMS) were used for South Korea. The PRMS is a physics-based quasi-distributed runoff model developed by the U.S. Geological Survey (USGS) in 1983 and has been validated by several previous studies [24,25]. This model takes hydrologic interactions between the atmosphere, vegetation, and soil into account; subdivides the entire basin into homogeneous units termed hydrologic response units (HRU); and calculates the energy balance of these HRU to determine the outflow of the basin. The model is also known to simulate the surface, intermediate, and subsurface discharges [26]. The results of the PRMS analysis over 50 years for the South Korean basins were provided by the WAMIS. The total discharge results for the Korean Peninsula obtained by Seo and Lee [11] based on the water balance were also used as validation data (Table 2). More details on this method are given in Section 3.1.

Methodology
The approach proposed in this study is represented by the process flow diagram in Figure 2. A total of four input variables were used to develop the total discharge prediction model, namely GRACE TWSC, TRMM precipitation, GLDAS soil moisture storage, and GLDAS average temperature. To match the spatial resolution of the other variables, the GRACE product was subsampled by dividing each 1.0° grid into uniform 0.25° sub-grids. The target variables were the total discharge from the GLDAS Noah model for the Korean Peninsula and the enhanced total discharge data obtained by combining the GLDAS and discharge gauge station data for South Korea (SK) via the conditional merging method. Then, the independent variables were fed into an ANN model to predict the downscaled total discharge (0.25°) for each target dataset. Validation was conducted to compare the performances of the total discharge data from GLDAS, gauges, PRMS, and the water balance results of Seo and Lee [11] and to rank the importance of the various input data.
Remote Sens. 2020, 12, 2248
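The subsampling of the GRACE grid mentioned above can be sketched as follows. The sketch assumes that every 0.25° sub-grid simply inherits the value of its parent 1.0° cell, with no interpolation; the function name and the sample values are hypothetical.

```python
def subsample(grid, factor=4):
    """Split each coarse cell into factor x factor sub-cells with the same value."""
    fine = []
    for row in grid:
        # Repeat every value along the longitude direction.
        fine_row = [value for value in row for _ in range(factor)]
        # Repeat the widened row along the latitude direction.
        fine.extend(list(fine_row) for _ in range(factor))
    return fine

# Hypothetical 2 x 2 block of 1.0-degree TWSC values (mm/month).
coarse = [[1.5, -0.7],
          [0.2, 3.1]]
fine = subsample(coarse)
print(len(fine), len(fine[0]))
```

A factor of 4 turns each 1.0° cell into sixteen 0.25° cells, so the GRACE field lines up with the 0.25° TRMM and GLDAS grids without adding spatial information.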

Water Balance-Based Total Discharge
The water balance is based on the conservation of the water mass, also referred to as the continuity equation. Generally, the total discharge estimate is obtained using the terrestrial water balance equation [11]:

$$Q_{total} = (Q_{ro} + Q_{go}) - (Q_{ri} + Q_{gi}) = P - ET - TWSC \tag{3}$$

where $Q_{total}$ is the total discharge (mm/month), which is the net surface and subsurface water flow; $Q_{ro}$ is the outflow of the surface water (mm/month); $Q_{go}$ is the outflow of the subsurface water (mm/month); $Q_{ri}$ is the inflow of the surface water (mm/month); $Q_{gi}$ is the inflow of the subsurface water (mm/month); P is the precipitation (mm/month); and ET is the actual evapotranspiration (mm/month). Changes in the regional total discharge ($Q_{total}$) can be evaluated by integrating the GRACE-measured TWSC with the precipitation and evapotranspiration observation data based on the terrestrial water balance. Seo and Lee [11] used the data from three satellites (GRACE, TRMM, and MODIS) to verify the feasibility of water balance-based $Q_{total}$ retrieval in the Korean Peninsula during the period January 2003-December 2014. However, the water balance-based total discharge estimation carries uncertainties arising from satellite data processing, such as measurement error and the integration of each product. In the following, the water balance-based total discharge results from multi-satellite data given by Seo and Lee [11] are used to validate the ANN results of this study.
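As a worked illustration of the water balance, total discharge can be computed as the residual of precipitation, evapotranspiration, and storage change (Q_total = P − ET − TWSC). The function name and all numbers below are hypothetical and are not taken from the study.

```python
def total_discharge(precip, evapo, storage_change):
    """Monthly total discharge (mm/month) as the water balance residual."""
    return [p - et - ds for p, et, ds in zip(precip, evapo, storage_change)]

# Hypothetical monthly values (mm/month) for one grid cell.
precip = [120.0, 310.0, 85.0]
evapo = [60.0, 95.0, 50.0]
storage_change = [15.0, 40.0, -10.0]

q_total = total_discharge(precip, evapo, storage_change)
print(q_total)
```

A negative storage change, as in the third month, means that stored water is being released, which adds to the discharge.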

Artificial Neural Network
The ANN is widely used in machine learning as a data-driven approach that employs training algorithms [18]. The most common training algorithm used in hydrological applications (e.g., prediction of rainfall, water level, and groundwater [14][15][16][17][18]) is the backpropagation algorithm with feed-forward nets. The ANN comprises nodes within input, hidden, and output layers, which form the basic computing units, and each layer is connected by several links with assigned connection weights. The conventional ANN model is the multi-layer perceptron (MLP) with at least one hidden layer between the input and output layers, which can flexibly represent complex nonlinear relationships between independent and dependent variables.
The ANN learning process is performed according to Equations (4) and (5) below. When the input value set of the n-dimensional vector x = [x_1, x_2, ..., x_n] is presented to the input layer, an output value set y is computed through the hidden layer by applying the activation function f and the connection weights. The connection weights are updated by minimizing the error (E) between the predicted value (y) and target value (t) in accordance with Equation (6).

$$h_j = f\left(\sum_{i} w^{1}_{ij} x_i + b_i\right) \tag{4}$$

$$y = f\left(\sum_{j} w^{2}_{j} h_j + b_j\right) \tag{5}$$

$$E = \frac{1}{2}\sum \left(t - y\right)^2 \tag{6}$$

Here, $h_j$ is the output of the jth hidden node; $w^{1}_{ij}$ is the connection weight between the input layer and hidden layer; $b_i$ is the bias term; $w^{2}_{j}$ is the connection weight between the hidden layer and output layer; $b_j$ is the bias; the sum in Equation (6) runs over the training samples; and i, j, and k index the nodes in the input, hidden, and output layers, respectively.
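The forward computation and the error described above can be sketched for a single-hidden-layer network. This is a minimal illustration, assuming a log-sigmoid activation on both layers and a half squared error; the function names, network size, and weight values are hypothetical.

```python
import math

def logsig(z):
    """Log-sigmoid activation function."""
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, w1, b1, w2, b2):
    """x: inputs; w1[i][j]: input i to hidden j; w2[j]: hidden j to output."""
    hidden = [
        logsig(sum(x[i] * w1[i][j] for i in range(len(x))) + b1[j])
        for j in range(len(b1))
    ]
    y = logsig(sum(h * w for h, w in zip(hidden, w2)) + b2)
    return hidden, y

def squared_error(y, t):
    """Half squared error between prediction y and target t."""
    return 0.5 * (t - y) ** 2

# Hypothetical network: 5 scaled inputs, 4 hidden nodes, 1 output.
x = [0.3, 0.6, 0.1, 0.5, 0.2]
w1 = [[0.1 * (i + 1) - 0.05 * j for j in range(4)] for i in range(5)]
b1 = [0.0, 0.1, -0.1, 0.05]
w2 = [0.4, -0.3, 0.2, 0.1]
b2 = 0.0

hidden, y = forward(x, w1, b1, w2, b2)
print(y, squared_error(y, 0.7))
```

Because the log-sigmoid output lies between 0 and 1, the inputs and the target would have to be scaled to that range before training under these assumptions.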

Model Setup
The above-mentioned ANN machine-learning approach was used to predict the 0.25° total discharge data. The GRACE TWSC, TRMM precipitation (P), GLDAS average temperature (T), and GLDAS soil moisture (SMS) were processed over the Korean Peninsula to build a prediction model. A total of 63,168 samples (168 months × 376 pixels) were obtained for the Korean Peninsula, and these were used to develop the ANN models. The feed-forward ANN MLP model with a backpropagation algorithm was used in this study. As shown in Figure 3, the input data for the ANN model consisted of monthly TWSC, P, SMS, and T. As previously explained, three tests were designed according to the target data, as listed in Table 3. For Case I, the target data were the total discharge over the Korean Peninsula from the GLDAS Noah model, whereas Cases II and III were based on the combined GLDAS and gauged discharge data for South Korea, obtained with the conditional merging method using inverse distance weighting (IDW; Case II) and Kriging (Case III) as the interpolator.
A caveat of ANN configurations is that overfitting or convergence to a local minimum can occur due to the complex computational processes and structures. Therefore, it is essential to choose an appropriate training algorithm and an appropriate number of nodes in the hidden layer. In addition, it is essential to select an activation function, which converts the input signal of a node into its output signal.
Hence, the Levenberg-Marquardt algorithm [27], which is based on the Gauss-Newton approximation and is generally used for training purposes [28], was applied to the ANN model in this study. The Levenberg-Marquardt algorithm has the advantage of converging faster and more reliably than the majority of training algorithms [27]. The maximum number of training epochs was set to 1000, and early-stopping criteria were defined to avoid overfitting and to enhance the generalization of the ANN model. The training was stopped when the target error in the training set was met or when the error gradient reached a minimum threshold. As shown in Table 3, the number of nodes in the hidden layer was varied from 1 to 10 to find the optimum number, considering the computing demand. The structure of the ANN model used in this study is shown in Figure 3.
Here, [x_1, x_2, x_3, x_4, x_5] are the input data, which are the antecedent (t-1 denotes the preceding month) TWSC, precipitation (P), soil moisture (SMS), and average temperature (T). The prediction (y_1) is the total discharge (Q_total). w_ij and w_jk denote the connection weights, b_j and b_k are the bias terms, and f represents the activation function (log-sigmoid).
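The pairing of current and one-month-lagged inputs with the target can be sketched for a single pixel. The five inputs used here are P(t), P(t-1), SMS(t-1), T(t-1), and TWSC(t-1), as named in the sensitivity results; their ordering as x_1 to x_5, the function name, and the sample values are assumptions made for illustration.

```python
def build_samples(p, sms, temp, twsc, q):
    """Pair each month's target with current P and one-month-lagged inputs."""
    inputs, targets = [], []
    for m in range(1, len(q)):  # the first month has no antecedent values
        inputs.append([p[m], p[m - 1], sms[m - 1], temp[m - 1], twsc[m - 1]])
        targets.append(q[m])
    return inputs, targets

# Hypothetical four-month series for one 0.25-degree pixel.
p = [30.0, 80.0, 250.0, 140.0]      # precipitation (mm/month)
sms = [410.0, 420.0, 455.0, 470.0]  # soil moisture storage (mm)
temp = [2.0, 9.0, 18.0, 24.0]       # average temperature (deg C)
twsc = [-5.0, 3.0, 20.0, 12.0]      # storage change (mm/month)
q = [10.0, 25.0, 120.0, 70.0]       # target total discharge (mm/month)

inputs, targets = build_samples(p, sms, temp, twsc, q)
print(len(inputs), inputs[0], targets[0])
```

Each pixel loses its first month because no antecedent values exist for it, so the number of usable samples is slightly smaller than months × pixels under this scheme.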

Sensitivity Analysis
The general objective of the sensitivity analysis in ANN models is to evaluate the relative importance of the independent data. It is necessary to understand whether the predictions of the developed ANN model are similar to real-world phenomena and whether a valid relation between the input and output data has been applied. A method proposed by Olden et al. [29] for determining the relative importance of input variables in neural networks is similar to the Garson algorithm [30], in that it uses connection weights. Garson's algorithm considers only the absolute magnitude of the connection weight, while the Olden method considers both the magnitude and sign (positive or negative) to analyze the response between input and output data in more detail. Thus, the Olden method can be applied not only to a single hidden layer but, also, to the multi-hidden layer ANN model [29,31].
As shown in Figure 3, the application of the Olden connection weight method proceeds in three steps [29]. First, the input-hidden connection weight ($w^{1}_{ij}$) and the hidden-output connection weight ($w^{2}_{j}$) between the input and output nodes are multiplied to give a product matrix $Q_i$. Second, the products are summed across all hidden nodes to give $S_i$. Finally, the relative importance ($RI_i$) of the ith input variable is defined by Equation (7).
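The connection weight steps described above can be sketched as follows. The products and their sums follow the description in the text; the final normalization, which divides each absolute sum by the total so that the importances add up to one (as the reported values do), is an assumption and may differ from the form used in the study. The function name and weights are hypothetical.

```python
def olden_importance(w1, w2):
    """w1[i][j]: input i to hidden j; w2[j]: hidden j to output."""
    # Multiply input-hidden and hidden-output weights, then sum over hidden nodes.
    sums = [
        sum(w1[i][j] * w2[j] for j in range(len(w2)))
        for i in range(len(w1))
    ]
    # Assumed normalization: share of each absolute sum in the total.
    total = sum(abs(s) for s in sums)
    shares = [abs(s) / total for s in sums]
    return sums, shares

# Hypothetical weights for 3 inputs and 2 hidden nodes.
w1 = [[0.8, -0.2],
      [0.1, 0.5],
      [-0.4, 0.3]]
w2 = [1.0, -0.5]

sums, shares = olden_importance(w1, w2)
print(sums)
print(shares)
```

The signed sums keep the direction of each input's effect, which is what distinguishes this approach from Garson's algorithm; the normalized shares discard the sign and only rank the inputs.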

Conditional Merging
The conditional merging method, otherwise known as the kriging error correction method, is a strategy for merging satellite and ground-based observation data that was introduced by Ehret [32] and Sinclair and Pegram [33]. This method has been used to overcome the limitations of point-based ground observations by combining the accuracy of the ground data with the spatial continuity of the satellite data. In this way, the spatial correlation of each dataset and the heterogeneity of the data can be identified [34]. Early studies using the conditional merging of precipitation data from radar and gauge sources demonstrated improved spatial and temporal variability. Recently, the conditional merging method has been applied in a wide range of hydrological studies, including soil moisture [35] and land surface temperature [36]. In this study, the conditional merging method was used to obtain improved total discharge data from GLDAS and gauge discharge data, which were then used as the dependent variables for South Korea. The IDW and Kriging interpolations (ordinary Kriging with an exponential semi-variogram model) were performed in ArcGIS.
As indicated in Figures 4 and 5, this process involved the following steps: (a) the observed discharge at 187 stations across South Korea was collected, (b) the interpolation field (0.25°) was computed from the station data using either the IDW (Case II) or the Kriging (Case III) interpolator, (c) the values corresponding to the stations used in step (a) were extracted from the GLDAS grid cells, (d) the 0.25° interpolation field was estimated by applying either the IDW (Case II) or Kriging (Case III) to the grid cell values extracted in step (c), (e) the residual between (b) and (d) was estimated, and (f) the merged 0.25° total discharge field was obtained by adding the residual field from (e) to the interpolation field (0.25°) of the ground-based discharge data from step (b).
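The merging steps can be sketched with a simple IDW interpolator. The sketch follows the usual formulation of Sinclair and Pegram, in which the residual is the GLDAS field minus the field interpolated from the GLDAS values at the stations; reading step (e) this way is an assumption, and the function names, coordinates, and values are hypothetical.

```python
import math

def idw(points, values, targets, power=2.0):
    """Inverse distance weighting from (x, y) points to (x, y) target cells."""
    field = []
    for tx, ty in targets:
        num = den = 0.0
        exact = None
        for (px, py), value in zip(points, values):
            dist = math.hypot(tx - px, ty - py)
            if dist == 0.0:
                exact = value  # a target on a station takes the station value
                break
            weight = 1.0 / dist ** power
            num += weight * value
            den += weight
        field.append(exact if exact is not None else num / den)
    return field

def conditional_merge(stations, gauge_q, cells, gldas_q, gldas_at_stations):
    gauge_field = idw(stations, gauge_q, cells)            # step (b)
    gldas_field = idw(stations, gldas_at_stations, cells)  # step (d)
    # Step (e), assumed form: spatial detail of GLDAS not captured by (d).
    residual = [g - f for g, f in zip(gldas_q, gldas_field)]
    return [b + r for b, r in zip(gauge_field, residual)]  # step (f)

# Hypothetical three-cell transect with gauges on the first and last cells.
cells = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
stations = [(0.0, 0.0), (2.0, 0.0)]
gauge_q = [50.0, 70.0]           # observed discharge, step (a)
gldas_q = [30.0, 45.0, 40.0]     # GLDAS total discharge on the grid
gldas_at_stations = [30.0, 40.0] # GLDAS values at the gauges, step (c)

merged = conditional_merge(stations, gauge_q, cells, gldas_q, gldas_at_stations)
print(merged)
```

Under this formulation the merged field reproduces the gauge values at the station cells, while the cell between them keeps the GLDAS departure from a smooth interpolation.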

Metrics for Evaluation
In this study, the performance of the ANN models for predicting the total discharge was evaluated using statistical indicators, namely the correlation coefficient (r; Equation (8)), mean absolute error (MAE; Equation (9)), root mean square error (RMSE; Equation (10)), and Nash-Sutcliffe efficiency (NSE; Equation (11)). The r indicates the degree of linear association between the prediction and observation; the closer the r value is to 1, the stronger the positive correlation between the two datasets. The MAE and RMSE measure the error; the closer they are to 0, the smaller the error. Finally, the NSE measures the predictive ability of a model relative to the mean of the observations and ranges over (−∞, 1], where a value of 1 indicates a perfect match between the datasets, a value of 0 indicates that the predictions are only as accurate as the mean of the observed data, and a negative value indicates that the mean of the observations is a better predictor than the model.

$$r = \frac{\sum_{i=1}^{N}\left(X_i - \overline{X}\right)\left(Y_i - \overline{Y}\right)}{\sqrt{\sum_{i=1}^{N}\left(X_i - \overline{X}\right)^2 \sum_{i=1}^{N}\left(Y_i - \overline{Y}\right)^2}} \tag{8}$$

$$MAE = \frac{1}{N}\sum_{i=1}^{N}\left|Y_i - X_i\right| \tag{9}$$

$$RMSE = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(Y_i - X_i\right)^2} \tag{10}$$

$$NSE = 1 - \frac{\sum_{i=1}^{N}\left(X_i - Y_i\right)^2}{\sum_{i=1}^{N}\left(X_i - \overline{X}\right)^2} \tag{11}$$

where X is the observed data, $\overline{X}$ is the mean of X, Y is the predicted data, $\overline{Y}$ is the mean of Y, and N is the number of data.
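The four indicators can be computed with a few lines of code. The sketch below uses the standard definitions, with MAE taken as the mean absolute error (the usual meaning of the abbreviation); the function name and the sample series are hypothetical.

```python
import math

def evaluate(obs, pred):
    """Return r, MAE, RMSE, and NSE for observed and predicted series."""
    n = len(obs)
    mean_obs = sum(obs) / n
    mean_pred = sum(pred) / n
    cov = sum((o - mean_obs) * (p - mean_pred) for o, p in zip(obs, pred))
    var_obs = sum((o - mean_obs) ** 2 for o in obs)
    var_pred = sum((p - mean_pred) ** 2 for p in pred)
    sq_err = sum((p - o) ** 2 for o, p in zip(obs, pred))
    return {
        "r": cov / math.sqrt(var_obs * var_pred),
        "MAE": sum(abs(p - o) for o, p in zip(obs, pred)) / n,
        "RMSE": math.sqrt(sq_err / n),
        "NSE": 1.0 - sq_err / var_obs,
    }

# Hypothetical observed and predicted total discharge (mm/month).
obs = [10.0, 20.0, 30.0, 40.0]
pred = [12.0, 18.0, 33.0, 37.0]
scores = evaluate(obs, pred)
print(scores)
```

Because the RMSE squares the errors before averaging, it can never be smaller than the MAE for the same series, which provides a quick consistency check on reported values.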

ANN-Predicted Total Discharge
The results of the ANN modelling to predict the downscaled total discharge using the various satellite and reanalysis products are presented in Figure 6, where each row represents a specific case and each column represents the results of each period (training, validation, and test). The backpropagation training was run for 1000 epochs over a total of 100 iterations, and the predictions were most accurate when the number of nodes in the hidden layer was set to four.
Among the three cases, Case III resulted in the best performance in the test period (March 2014-December 2016), with r = 0.74, MAE = 16.53 mm/month, RMSE = 27.06 mm/month, and NSE = 0.54 for South Korea. Similar to Case III, Cases I and II produced relatively good performances (r = 0.65-0.95, MAE = 13.28-20.35 mm/month, RMSE = 22.56-34.77 mm/month, and NSE = 0.42-0.90). Thus, the ANN method was able to produce relatively accurate models for the total discharge, and the predicted and target values showed positive slopes.
The correlation between the ANN-predicted total discharge and target data was examined for each case. The spatial distributions of the RMSE, r, and NSE between the target total discharge and ANN-predicted total discharge over the test period (2014.03-2016.12) are presented in Figure 7. Overall, a low RMSE is noted in each case (blue color; first column in Figure 7), whereas the RMSE in North Korea is higher than that of South Korea. In addition, certain Eastern and Southern coastal regions in South Korea and the border area between South and North Korea display higher RMSE values (red color; first column in Figure 7). Moreover, positive correlations (blue color; second column in Figure 7) are observed in most areas. Similarly, the overall NSE results are fairly accurate (blue color; third column in Figure 7), although a lower accuracy is indicated in the Western regions of North Korea.
In Cases II and III, a low NSE value is observed in the border regions. In the coastal regions, the satellite and reanalysis data carry uncertainties caused by tidal effects, so the degree of error is higher than in other areas. In addition, these regions lack ground observations. For these reasons, these regions show low correlations and high errors.
Modeling results for predicting the total discharge using ANN approaches at the 0.25° scale. r: correlation coefficient, MAE: mean absolute error, RMSE: root mean square error, and NSE: Nash-Sutcliffe efficiency.
As indicated in Figure 8, the P(t) was identified by the ANN as the most important input parameter for all three cases (I, II, and III), while the TWSC appeared to be the least important. As shown in Figure 8a, the relative importance of each parameter for Case I was 0.276 for P(t), followed by 0.210 for T(t-1) (one month earlier), 0.195 for P(t-1), 0.182 for SMS(t-1), and 0.137 for TWSC(t-1), thus indicating comparable levels of importance for each factor. In Case II, the relative importance was 0.318 for P(t), followed by 0.193 for P(t-1), 0.176 for SMS(t-1), 0.164 for T(t-1), and 0.149 for TWSC(t-1) (Figure 8b). Similarly, the P(t) was of the highest importance in Case III at 0.321, followed by P(t-1) at 0.190, SMS(t-1) at 0.175, T(t-1) at 0.161, and TWSC(t-1) at 0.153 (Figure 8c). Thus, in all three cases, the P(t) and P(t-1) were identified as being very useful for the prediction of the total discharge, followed by the SMS and T. Total discharge is one of the key controls of water flux and is closely linked to precipitation. However, since these values are similar to each other, the present results demonstrate that all input data have a noteworthy effect on the total discharge prediction.

Validation of ANN-Predicted Total Discharge
The monthly time series of the ANN-predicted total discharge at the 0.25° scale (Cases I, II, and III); PRMS total discharge; and water balance-based total discharge according to Seo and Lee [11], together with the corresponding precipitation in South Korea, are presented in Figure 9. The top panel presents the precipitation results from 56 KMA stations (Figure 1a), while the second, third, and fourth panels (Figure 9) present the results of Case I, Case II, and Case III, respectively, for South Korea, compared with the PRMS and water balance results from Seo and Lee [11]. In the Korean Peninsula, most of the annual precipitation falls from July to September due to the monsoon, and the ANN-predicted total discharge starts to rise in May and peaks in the summer season. Thus, the ANN-predicted total discharge closely matches the observed fluctuations in the precipitation. Moreover, the ANN-predicted total discharge is well-correlated with the dynamics of the PRMS and water balance total discharge data (r = 0.86-0.96). In all three cases, the ANN model tends to overestimate the water balance-based total discharge compared to the PRMS. In particular, the test results (March 2014-December 2016), which provide an independent ANN estimation and have no influence on the training/validation procedure, indicate that the ANN-predicted total discharge is closely correlated to the PRMS and water balance-based data.
The ANN-predicted total discharge (Case I), GLDAS, and water balance-based total discharge corresponding to precipitation in North Korea are shown in Figure 10. Overall, the total discharge of North Korea is lower than that of South Korea, and the GLDAS data tend to underestimate the total discharge compared to the other data. In particular, the GLDAS data show little total discharge regardless of precipitation during the test period, whereas the ANN-predicted and water balance-based total discharges display seasonal trends that follow the precipitation. This is thought to be the cause of the low correlation and the high RMSE for the North Korean region in Case I (Figure 7). Nevertheless, Case I shows a more reasonable correspondence with the water balance-estimated total discharge than with the GLDAS total discharge.
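As a minimal sketch of how agreement statistics such as r, RMSE, and NSE can be computed between a reference monthly series (e.g., PRMS or water balance-based) and a predicted series, the following Python example may be useful. The function names and the sample values are hypothetical and chosen only for illustration; they are not the study's code or data.

```python
import math


def pearson_r(reference, predicted):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(reference)
    mean_ref = sum(reference) / n
    mean_pred = sum(predicted) / n
    cov = sum((a - mean_ref) * (b - mean_pred)
              for a, b in zip(reference, predicted))
    var_ref = sum((a - mean_ref) ** 2 for a in reference)
    var_pred = sum((b - mean_pred) ** 2 for b in predicted)
    return cov / math.sqrt(var_ref * var_pred)


def rmse(reference, predicted):
    """Root mean square error, in the units of the inputs."""
    n = len(reference)
    return math.sqrt(
        sum((b - a) ** 2 for a, b in zip(reference, predicted)) / n
    )


def nse(reference, predicted):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 matches the
    skill of always predicting the reference mean."""
    mean_ref = sum(reference) / len(reference)
    residual = sum((a - b) ** 2 for a, b in zip(reference, predicted))
    spread = sum((a - mean_ref) ** 2 for a in reference)
    return 1.0 - residual / spread


# Hypothetical monthly totals in mm/month (illustrative only, not study data).
reference_mm = [12.0, 18.0, 45.0, 160.0, 210.0, 95.0, 30.0, 15.0]
predicted_mm = [15.0, 20.0, 40.0, 140.0, 190.0, 100.0, 35.0, 18.0]

print(f"r    = {pearson_r(reference_mm, predicted_mm):.2f}")
print(f"RMSE = {rmse(reference_mm, predicted_mm):.2f} mm/month")
print(f"NSE  = {nse(reference_mm, predicted_mm):.2f}")
```

A series that tracks the seasonal timing but is biased low can still yield a high r while the RMSE grows and the NSE drops, which is why the three statistics are usually read together.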

Spatial Distribution of Total Discharge
The ANN-predicted total discharges (Cases I, II, and III) are compared with the PRMS, water balance-estimated, and GLDAS total discharges for March 2014 and August 2014 for the major basins of the Korean Peninsula in Figure 11. Here, the ANN-predicted total discharge shows spatial patterns similar to those of the PRMS and water balance-based total discharges, whereas the GLDAS underestimates the total discharge with a smaller dynamic range than the other models. In addition, the water balance-based total discharge produces comparatively higher results. Furthermore, a visual comparison of the total discharge patterns by site (North Korea and South Korea), model, and month confirms that the ANN-predicted total discharge shows the best visual agreement with the PRMS overall, whereas the GLDAS cannot capture the high total discharge well.
The average annual precipitation and total discharge results for all major basins (Han, Geum, Nakdong, Yeongsan, and Seomjin Rivers) of South Korea are compared in Figure 12. Here, the variations in the total discharge follow the variations in the precipitation. The ANN-predicted average annual total discharges are 633.7 mm/yr (95% confidence interval: 565.37-702.23 mm/yr) for Case I, 509.97 mm/yr (95% confidence interval: 457.10-562.04 mm/yr) for Case II, and 501.12 mm/yr (95% confidence interval: 448.56-553.69 mm/yr) for Case III, while the PRMS average annual total discharge is 723 mm/yr (95% confidence interval: 616.82-830.56 mm/yr), and the water balance-based result is 1090.40 mm/yr (95% confidence interval: 980.96-1199.83 mm/yr). As shown in Figure 11, the PRMS and ANN-predicted total discharges (especially in Case I) are similar, whereas the water balance-based total discharge result is slightly higher than in other models.
Similarly, the average annual precipitation and total discharge results for the major basins in North Korea (Tuman, Amnok, Cheongcheon, the Northern and Southern parts of the East Coast, Daedong, Geumya, Jangyeon Namdae, Yeseong, Imjin, and North Han Rivers) are presented in Figure 13. Here, the precipitation and total discharge are comparatively lower than in South Korea. In detail, the ANN-predicted annual total discharge for Case I is 453.45 mm/yr (95% confidence interval: 405.70-501.20 mm/yr), while the GLDAS result is 396.23 mm/yr (95% confidence interval: 314.61-477.85 mm/yr), and the water balance-based result is 731.19 mm/yr (95% confidence interval: 622.59-839.79 mm/yr). Thus, as in Figure 11, similar values are obtained for Case I, GLDAS, and the water balance-based total discharge, although the GLDAS result is comparatively lower than the ANN-predicted result, and the water balance-based result is comparatively higher than the ANN prediction.
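The text does not state how the 95% confidence intervals of the average annual total discharge were obtained. As one plausible reading, the sketch below computes a symmetric interval for the mean of a set of annual totals under a normal approximation (mean ± 1.96 standard errors). The function name `mean_ci95`, the choice of z = 1.96, and the sample values are assumptions made for illustration, not the authors' procedure or data.

```python
import math
import statistics


def mean_ci95(annual_totals, z=1.96):
    """Return (mean, lower, upper) for an approximate 95% confidence
    interval of the mean, assuming a normal approximation.

    z = 1.96 is an assumption; with only a few years of data, a
    Student's t quantile would give a wider interval.
    """
    n = len(annual_totals)
    mean = statistics.fmean(annual_totals)
    # Sample standard deviation (n - 1 denominator) divided by sqrt(n).
    standard_error = statistics.stdev(annual_totals) / math.sqrt(n)
    half_width = z * standard_error
    return mean, mean - half_width, mean + half_width


# Hypothetical annual total discharge values in mm/yr (not study data).
annual_mm = [520.0, 610.0, 480.0, 700.0, 655.0, 590.0]

mean, lower, upper = mean_ci95(annual_mm)
print(f"{mean:.2f} mm/yr (95% confidence interval: "
      f"{lower:.2f}-{upper:.2f} mm/yr)")
```

Because the interval width scales with the year-to-year spread, a product with a larger dynamic range would be expected to show a wider interval for a similar number of years.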