Application of Artificial Neural Networks to Rainfall Forecasting in the Geum River Basin, Korea
Water 2018, 10, 1448

This study develops a late spring-early summer rainfall forecasting model using an artificial neural network (ANN) for the Geum River Basin in South Korea. After identifying the lagged correlation between climate indices and the rainfall amount in May and June, 11 significant input variables were selected for the preliminary ANN structure. From quantification of the relative importance of the input variables, the lagged climate indices of East Atlantic Pattern (EA), North Atlantic Oscillation (NAO), Pacific Decadal Oscillation (PDO), East Pacific/North Pacific Oscillation (EP/NP), and Tropical Northern Atlantic Index (TNA) were identified as significant predictors and were used to construct a much simpler ANN model. The final best ANN model, with five input variables, showed acceptable performance with relative root mean square errors of 25.84%, 32.72%, and 34.75% for training, validation, and testing data sets, respectively. The hit score, which is the number of hit years divided by the total number of years, was more than 60%, which indicates that the ANN model successfully predicts rainfall in the study area. The developed ANN model, incorporated with lagged global climate indices, could allow for more timely and flexible management of water resources and better preparation against potential droughts in the study region.


Introduction
Rainfall prediction is of great importance for preventing flooding and managing water resources, and thereby for saving lives and property and securing economic activities. Insufficient rainfall has a strong adverse influence on water supply, water quality, and the aquatic ecosystem. The ability to forecast rainfall several months in advance would enable more effective water use. Accurate rainfall forecasting is therefore an important, and challenging, task in operational water resources management [1].
Several methods are available for rainfall forecasting, such as numerical weather prediction (NWP) models, statistical methods, and machine learning techniques. Among these, machine learning techniques, such as artificial neural networks (ANNs), k-nearest neighbors, support vector machines, and random forests, are considered more suitable for rainfall forecasting because the physical processes affecting rainfall occurrence are highly complex and non-linear [2]. The ANN is a machine learning technique that has been widely used in rainfall prediction given its ability to identify highly complex non-linear relationships between input and output variables without the need to understand the nature of the physical processes.
Various studies on rainfall prediction using ANNs have been published. Bodri and Cermak [3] developed an ANN model for predicting the time series of monthly precipitation for two Czech meteorological stations using actual precipitation data in the previous months of the current year and a given month in the two previous years. Bodri and Cermak [4] further predicted the next month's precipitation for six Czech and four Hungarian meteorological stations using ANNs. Wu et al. [5] developed ANN models to forecast monsoon rainfall over the Yangtze delta region in China 1, 5, and 10 years in advance using only historical data of the total amount of summer rainfall. Philip and Joseph [6] predicted monthly rainfall in Kerala State, the southern part of the Indian Peninsula, using the adaptive basis function neural network with historical data of the previous four years and three months of rainfall. Chakraverty and Gupta [7] predicted southwestern monsoon rainfall over India six years in advance using only historical data as inputs for ANN models. Chattopadhyay and Chattopadhyay [8] generated forecasts using ANNs for Indian average summer monsoon rainfall with the previous year's rainfall amount in the months of June, July, and August. Gholizadeh and Darand [9] forecasted monthly precipitation for Tehran, Iran, a year in advance using ANNs with a genetic algorithm. Aksoy and Dahamsheh [10] used feed-forward back-propagation (FFBP), radial basis function (RBF), and generalized regression-type ANNs to forecast precipitation one month ahead. Bilgili and Sahin [11] applied ANNs to predict the long-term monthly temperature and rainfall for stations in Turkey with geographical variables and data from neighboring measuring stations. The ANN models in these studies relied on the characteristics of previously observed rainfall data; that is, they were rainfall-rainfall models. Those studies revealed that ANNs are a useful tool for forecasting rainfall amounts at various lead times in arid to humid areas.
Global teleconnections are statistical associations between climate variables separated by large distances [12]. Teleconnections are the result of large-scale dynamics between the ocean and atmosphere, linking different regional climates into a unified global climatic system [13,14]. Many attempts have been made to forecast precipitation using various climate indices data representing teleconnection patterns. Silverman and Dracup [15] applied ANNs to forecast the total water year precipitation of California's seven zones using the monthly 700-hPa teleconnection indices and El Niño Southern Oscillation (ENSO) indicators. They emphasized the possibility of long-term precipitation predictions using ANNs and large-scale climate variables. Kumar et al. [16] developed summer monsoon rainfall forecasting ANN models with current and lagged climate indices of ENSO, Indian Ocean Oscillation, and local ocean-land temperature contrast as inputs. Iseri et al. [17] developed ANN models for August rainfall forecasting in Fukuoka, Japan using sea surface temperature anomalies in the Pacific Ocean and the lagged climate indices of the Southern Oscillation Index (SOI), Pacific Decadal Oscillation (PDO), and North Pacific Index (NPI). Hartman et al. [18] predicted summer rainfall in the Yangtze River basin using a set of climate indices including the SOI, the East Atlantic/Western Russia (EA/WR) pattern, the Scandinavia (SCA) pattern, the Polar/Eurasia (POL) pattern, and several indices calculated from sea surface temperatures (SST), sea level pressures (SLP), and snow data. Yuan et al. [12] predicted summer precipitation in the source region of the Yellow River, China using ANN models with inputs of North Atlantic Oscillation (NAO), West Pacific (WP), ENSO, and POL patterns. Most studies showed the applicability of the combined use of ANNs and large-scale climate teleconnections for regional summer rainfall forecasting in monsoon areas.
Abbot and Marohasy [19] applied ANNs to forecast monthly and seasonal rainfall three months in advance in Queensland, Australia by inputting climate indices such as SOI, PDO, and Niño 3.4, as well as historical rainfall and temperature data. Abbot and Marohasy [20] further applied ANNs to forecast the one-year-ahead monthly rainfall for locations in the Murray-Darling basin, Australia. They used the SOI, Dipole Mode Index (DMI), Niño 4, Niño 3.4, Niño 3, Niño 1.2, and the Inter-Decadal Pacific Oscillation (IPO) climate indices as inputs. Badr et al. [21] employed ANNs to forecast summer rainfall anomalies in the Sahel region of Africa using springtime surface air temperature (SAT) anomalies and sea surface temperature (SST). Rasel et al. [22] predicted spring rainfall for South Australia using ANNs with the lagged climate indices of the ENSO-DMI-SAM (Southern Annular Mode). Badr et al. [21] and Rasel et al. [22] showed the predictive superiority of ANNs by comparing them with alternative statistical methods. These studies, which combined ANNs with teleconnection features and, in some cases, local meteorological parameters, showed that ANN models can achieve higher predictive accuracy than other forecasting methods and can enable a longer lead time. As this review shows, a variety of ANN-based regional rainfall forecast models using teleconnection climate indices have been developed around the world, but no such models exist for South Korea.
Extreme droughts are occurring more frequently, resulting in a serious reduction in the water supply in the central region of South Korea from 2013 to 2016 and in the southern region in 2017. As a result of these droughts, a large number of regions lost access to safe water. The droughts caused damage such as restrictive water rationing, restrictions of instream flow, and reduced agricultural water supply for dams located in the Han River and Geum River Basins [23]. Drought damage continued until the following year due to insufficient rainfall during the summer season. In 2017, cumulative spring (March to May) precipitation was only 50% of the historical normal for the season, and precipitation was 30% of normal in May and 38% of normal in June. The rainfall amount in late spring and early summer, when a large amount of irrigation water is required for farming, greatly impacts rice production. Therefore, it is essential to forecast seasonal rainfall in advance for more timely and efficient management of water resources. Global teleconnection patterns offer a basis for such forecasts.
In this study, a simple ANN model with inputs from several lagged climate indices is developed to predict late spring and early summer rainfall during May and June in the Geum River Basin. A preliminary ANN structure is constructed after identifying the lagged correlation between climate indices and rainfall. Then, a compact optimal ANN model is established through a process of determining the contribution of input variables, and the model performance is evaluated.

Rainfall Data
The Geum River Basin is located in the west central region of South Korea (Figure 1), with an area of 9912.15 km² and a main channel length of 360.70 km. The areal average monthly rainfall data for the period from January 1966 to December 2017 for the Geum River Basin were obtained from the Water Resources Management Information System (WAMIS) and the hydrological survey reports of the Geum River Flood Control Office (GRFCO) in Korea. Figure 1 shows the study area and the locations of the rainfall stations. This study attempts to predict the late spring-early summer rainfall, defined as the total amount of rainfall in May and June (M-J).

Climate Indices
The global climate indices and historical rainfall data were used as predictors of the ANN-based forecasting model in this study. The monthly values of climate indices were collected from the Climate Prediction Center of the National Oceanic and Atmospheric Administration (https://www.esrl.noaa.gov/psd/data/climateindices/list/). To select the predictor variables and identify the months that could be used as input to the ANNs, cross-correlation analyses were carried out for lagged climate indices. Many previous studies have validated that lagged climate variables are good predictors for ANN models [19-22,24]. Ten highly correlated climate indices, namely the Arctic Oscillation (AO), East Pacific/North Pacific Oscillation (EP/NP), East Atlantic Pattern (EA), North Atlantic Oscillation (NAO), North Tropical Atlantic (NTA), Tropical Northern Atlantic Index (TNA), Western Pacific Index (WP), Pacific Decadal Oscillation (PDO), Southern Oscillation Index (SOI), and Sea Level Pressure of Darwin (SLP_D), were selected as inputs for the ANN models. The October rainfall amount of the previous year was also found to be highly correlated with the predictand. Table 1 provides a list of inputs with time lags and cross-correlation values that were used for the forecasting model. The maximum negative correlation of −0.375 was obtained for NAO in December of the previous year. The maximum positive correlation of 0.326 was obtained for the six-month-lagged EP/NP. For different climate indices, different months had a significant correlation with the M-J rainfall.
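The lagged cross-correlation screening described above can be sketched as follows. This is a minimal illustration, not the procedure used in the study: the function name `lagged_corr`, the array layout (one row per calendar year, one column per month), and the `year_offset` convention are all assumptions made for the example.

```python
import numpy as np

def lagged_corr(index_by_year_month, target_by_year, month, year_offset=0):
    """Pearson correlation between a climate index in a given calendar month
    and a yearly target (e.g., total May-June rainfall).

    index_by_year_month : array of shape (n_years, 12), rows aligned with target years
    target_by_year      : array of shape (n_years,)
    month               : calendar month of the index, 1..12
    year_offset         : 0 for the same year, -1 for the previous year, and so on
    """
    if year_offset > 0:
        raise ValueError("a predictor must not come from a later year")
    idx = np.asarray(index_by_year_month, dtype=float)
    tgt = np.asarray(target_by_year, dtype=float)
    shift = -year_offset                      # number of years to look back
    n_years = tgt.shape[0]
    x = idx[: n_years - shift, month - 1]     # index values from the earlier year
    y = tgt[shift:]                           # targets they are paired with
    return float(np.corrcoef(x, y)[0, 1])
```

For instance, `lagged_corr(nao, mj_rain, month=12, year_offset=-1)` would pair each year's M-J rainfall with the December value of the previous year; `nao` and `mj_rain` are hypothetical arrays here.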

Artificial Neural Network
An ANN is a data-driven mathematical model that was developed to imitate the structure of the neural network of the human brain and has been widely applied to problems such as prediction and discrimination. The ANN is based on the perceptron, a term coined to combine the role of neurons with that of recognition (perception). The perceptron consists of one input layer and one output layer, and each layer contains nodes for data operations corresponding to a cell body. By adding a hidden layer and its nodes between the input layer and the output layer of the perceptron, the network expands to a multilayer perceptron structure. In general, an artificial neural network refers to a multilayer perceptron structure.
The three-layered feed-forward neural network has been widely used in hydrologic forecasting models [25-28]. The input data in the input layer are transferred to each neuron in the hidden layer as a weighted linear sum, and the output of each hidden neuron is obtained by passing that sum through the activation function. The same procedure is followed from the hidden layer to the output layer. A neural network with three layers can be expressed mathematically by a linear combination of the transferred input values as:
$$\hat{y}_k = f_0\left[\sum_{j=1}^{m} w_{kj}\, f_h\left(\sum_{i} w_{ji}\, x_i + w_{jb}\right) + w_{kb}\right], \quad k = 1, \ldots, n \qquad (1)$$
where $\hat{y}_k$ is the forecasted $k$th output value, $f_0$ is the activation function for the output neuron, $n$ is the number of output neurons, $w_{kj}$ is the weight connecting the $j$th neuron in the hidden layer and the $k$th neuron in the output layer, $f_h$ is the activation function for the hidden neuron, $m$ is the number of hidden neurons, $w_{ji}$ is the weight connecting the $i$th neuron in the input layer and the $j$th neuron in the hidden layer, $x_i$ is the $i$th input variable, $w_{jb}$ is the bias for the $j$th hidden neuron, and $w_{kb}$ is the bias for the $k$th output neuron [29,30].
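The three-layer mapping from inputs to outputs can be sketched in a few lines of NumPy. This is an illustrative sketch only: the hyperbolic tangent in the hidden layer follows the model development description in this paper, whereas the linear (identity) output activation and the names `forward`, `W_hidden`, `b_hidden`, `W_out`, and `b_out` are assumptions introduced for the example.

```python
import numpy as np

def forward(x, W_hidden, b_hidden, W_out, b_out):
    """Forward pass of a three-layer feed-forward network.

    x        : inputs, shape (n_inputs,)
    W_hidden : input-to-hidden weights w_ji, shape (m, n_inputs)
    b_hidden : hidden biases w_jb, shape (m,)
    W_out    : hidden-to-output weights w_kj, shape (n_outputs, m)
    b_out    : output biases w_kb, shape (n_outputs,)
    """
    hidden = np.tanh(W_hidden @ x + b_hidden)  # f_h applied to the weighted sum
    return W_out @ hidden + b_out              # assumed identity output activation
```

With five inputs, two hidden neurons, and one output, `W_hidden` would have shape (2, 5) and `W_out` shape (1, 2).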
Training the ANN model entails searching for the optimal weight vector used in Equation (1). In this study, the weights that minimize the sum of errors of the network in Equation (2) were calculated using the back-propagation algorithm [31]:
$$E = \sum_{p} E_p, \qquad E_p = \sum_{k=1}^{n} \left(y_{pk} - \hat{y}_{pk}\right)^2 \qquad (2)$$
where $E$ is the error for all input patterns and $E_p$ is the error based on the squared difference between the true outputs $y_{pk}$ and the forecasted outputs $\hat{y}_{pk}$ for pattern $p$ [32].
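A single back-propagation update for the sum-of-squared-errors objective can be sketched as below. This is a plain batch gradient-descent step written for illustration; it is not necessarily the training variant used in the study. The linear output activation, the absence of any constant factor in the error, and the function names are assumptions.

```python
import numpy as np

def predict(X, W1, b1, W2, b2):
    """Batch forward pass: X has shape (P, n_inputs); returns shape (P, n_outputs)."""
    return np.tanh(X @ W1.T + b1) @ W2.T + b2

def sum_squared_error(Y_true, Y_pred):
    """Total error E: squared differences summed over patterns and outputs."""
    return float(np.sum((Y_true - Y_pred) ** 2))

def backprop_step(X, Y, W1, b1, W2, b2, lr=0.01):
    """One gradient-descent update of all weights and biases."""
    H = np.tanh(X @ W1.T + b1)                # hidden outputs, shape (P, m)
    Y_hat = H @ W2.T + b2                     # network outputs, shape (P, n)
    d_out = 2.0 * (Y_hat - Y)                 # dE/dY_hat
    grad_W2 = d_out.T @ H                     # shape (n, m)
    grad_b2 = d_out.sum(axis=0)
    d_hid = (d_out @ W2) * (1.0 - H ** 2)     # error propagated back through tanh
    grad_W1 = d_hid.T @ X                     # shape (m, n_inputs)
    grad_b1 = d_hid.sum(axis=0)
    return (W1 - lr * grad_W1, b1 - lr * grad_b1,
            W2 - lr * grad_W2, b2 - lr * grad_b2)
```

Repeating `backprop_step` from a small random initial weight vector drives the error down as long as the learning rate is small enough for the data at hand.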

ANN Model Development
The identification of significant input variables and the optimization of the network structure are important steps in building an optimal ANN model. As described in Table 1, the input variables were determined by investigating cross-correlations between the lagged climate indices and the total rainfall in May and June. The number of input nodes is equal to the number of input variables. One hidden layer was used, and the number of nodes in the hidden layer was determined experimentally by trial and error with a learning rate of 0.01 and a hyperbolic tangent sigmoid activation function for the hidden layer. The number of hidden nodes was varied from 2 to 10 across the trial networks, and the choice was based on the relative root mean square error (RRMSE) of the forecasted M-J rainfall against the observed rainfall during the training and validation stages. Since the artificial neural network sets the initial weights randomly at the beginning of training, each training process creates a different neural network model with different performance. Therefore, the optimal prediction model was selected based on the average accuracy obtained by repeating the ANN model generation process 100 times.
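The RRMSE criterion can be sketched as follows. The normalization is an assumption: the text does not define RRMSE explicitly, and the sketch divides the RMSE by the mean of the observed values. That reading appears consistent with the RMSE and RRMSE pairs reported later in the paper (for example, 68.76 mm against 29.75% implies a divisor of about 231 mm), but it remains an inference.

```python
import numpy as np

def rmse(observed, predicted):
    """Root mean square error, in the units of the data."""
    obs = np.asarray(observed, dtype=float)
    pred = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((obs - pred) ** 2)))

def rrmse(observed, predicted):
    """RMSE as a percentage of the mean observed value (assumed definition)."""
    obs = np.asarray(observed, dtype=float)
    return 100.0 * rmse(obs, predicted) / float(np.mean(obs))

def mean_rrmse_over_runs(observed, predictions_per_run):
    """Average RRMSE over repeated trainings with different random initial weights."""
    return float(np.mean([rrmse(observed, p) for p in predictions_per_run]))
```

`mean_rrmse_over_runs` mirrors the idea of averaging accuracy over repeated model generations; the list of per-run predictions is a hypothetical input.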
In this study, the K-fold cross-validation (CV) procedure [33,34], which is one of the most widely used re-sampling methods, was used to evaluate the model performance. The K-fold CV method was used to allow the selection of the best model architecture and to avoid overfitting to a specific training data set. Data for the ANN model development were first divided into five equally sized, non-overlapping subsets (10 patterns per subset). Four subsets were used for training and validation, and the one remaining subset was used for testing. A four-fold CV procedure was therefore performed on the calibration data (i.e., training and validation data) to determine which model is best. After the training and validation were repeated four times using the four subsets in the cross-validation process, the RRMSEs of the folds were averaged to obtain an aggregate measure. In order to avoid overtraining, an early stopping technique was also applied with continuous monitoring of the errors in both the training set and the validation set during training. The number of hidden nodes that showed the smallest CV error was chosen for the M-J rainfall forecasts. After the selection of the optimum number of hidden nodes, the ANN model was trained with the aggregate data of the training and validation sets, and the trained model was finally tested using the unseen data set to evaluate the model performance.
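The data partition described above (five subsets of 10 patterns, one held out for testing, and four-fold CV on the rest) can be sketched as index bookkeeping. Whether the subsets were formed by random shuffling or as contiguous blocks of years is not stated, so the shuffling, the seed, and the choice of the last subset as the test set are assumptions of this sketch.

```python
import numpy as np

def split_patterns(n_patterns=50, n_subsets=5, test_subset=4, seed=0):
    """Return (folds, test_idx), where folds is a list of (train_idx, val_idx)."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_patterns)           # assumed random assignment
    subsets = np.array_split(order, n_subsets)    # equally sized, no duplicates
    test_idx = subsets[test_subset]
    calibration = [s for i, s in enumerate(subsets) if i != test_subset]
    folds = []
    for k, val_idx in enumerate(calibration):
        # Each calibration subset serves once as the validation set.
        train_idx = np.concatenate(
            [s for i, s in enumerate(calibration) if i != k]
        )
        folds.append((train_idx, val_idx))
    return folds, test_idx
```

With the defaults, each fold trains on 30 patterns and validates on 10, and the 10 test patterns are never seen during model selection.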
In order to quantify the contributions of each input variable to the prediction of the output variable in ANNs, we applied Garson's connection weight method [35] and Olden's connection weight method [36]. Garson's method uses the absolute values of the connection weights between nodes to determine the relative importance (RI) of each input variable. The computation procedure of Garson's algorithm is as follows: (1) The products $P_{ij}$ are obtained by multiplying the input-hidden connection weight and the hidden-output connection weight for each hidden neuron $i$ and repeating this for each input neuron $j$; (2) Scaled products $Q_{ij}$ are obtained by dividing the absolute values of $P_{ij}$ by the sum over all input variables, $\sum_{j=1}^{J} |P_{ij}|$, for each hidden neuron $i$; (3) The sum $S_j$ is obtained by summing $Q_{ij}$ over the hidden neurons for each input neuron $j$; and (4) Relative importance values $RI_j$ (%) are obtained by dividing $S_j$ by the sum over all the input variables, $\sum_{j=1}^{J} S_j$, and expressing the figure as a percentage.
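The four steps of Garson's algorithm translate directly into array operations for a network with a single output neuron. The function name and the weight layout (rows for hidden neurons $i$, columns for input neurons $j$) are conventions chosen for this sketch.

```python
import numpy as np

def garson_importance(W_input_hidden, w_hidden_output):
    """Relative importance (%) of each input by Garson's method.

    W_input_hidden  : shape (n_hidden, n_inputs), weight from input j to hidden i
    w_hidden_output : shape (n_hidden,), weight from hidden i to the output
    """
    W = np.asarray(W_input_hidden, dtype=float)
    v = np.asarray(w_hidden_output, dtype=float)
    P = W * v[:, None]                                     # step 1: products P_ij
    Q = np.abs(P) / np.abs(P).sum(axis=1, keepdims=True)   # step 2: scale per hidden neuron
    S = Q.sum(axis=0)                                      # step 3: sum over hidden neurons
    return 100.0 * S / S.sum()                             # step 4: percentage
```

Because absolute values are taken in step 2, the result carries no information about whether an input raises or lowers the output.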
Olden's method calculates the product of the raw input-hidden and hidden-output connection weights between each input node and output node, and sums the products across all hidden nodes [36].
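Olden's connection weight method is a signed counterpart that keeps the raw products. A minimal sketch for a single output neuron follows; the function name and the weight layout (rows for hidden neurons, columns for inputs) are assumptions of the example.

```python
import numpy as np

def olden_importance(W_input_hidden, w_hidden_output):
    """Overall connection weight of each input by Olden's method.

    W_input_hidden  : shape (n_hidden, n_inputs)
    w_hidden_output : shape (n_hidden,)
    Returns signed values, one per input; the sign indicates the direction
    of the input's influence on the output.
    """
    W = np.asarray(W_input_hidden, dtype=float)
    v = np.asarray(w_hidden_output, dtype=float)
    return (W * v[:, None]).sum(axis=0)   # raw products summed across hidden neurons
```

For weights `[[1.0, 2.0], [1.0, 0.5]]` and output weights `[1.0, -1.0]`, the first input receives an overall weight of 0 because its two paths cancel, while the second receives 1.5.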
After determining the contribution of the input variables, a simpler ANN model was constructed. The best ANN model was obtained from the four-fold CV procedure, as described previously, and the model performance was evaluated for the unseen testing data set.

Preliminary ANN Model for Rainfall Forecasting
The optimal ANN model structure with 11 input variables was determined using the four-fold CV procedure by varying the number of hidden neurons from 2 to 10. For each number of hidden neurons, four iterations of training and validation were performed. The average RRMSE between the observed rainfall and the rainfall predicted by the ANN models is presented in Table 2. The results show that the network performance did not differ significantly with the number of hidden neurons, and that a larger number of hidden neurons did not always lead to better performance. For the training part, the RRMSE values ranged from 29.75% to 31.54% (RMSE: 68.76 mm to 72.90 mm); for the validation part, from 35.52% to 35.73% (RMSE: 82.04 mm to 82.50 mm); and for the testing part, from 34.28% to 34.88% (RMSE: 85.82 mm to 87.15 mm). The Pearson correlation coefficient (CC) values ranged from 0.731 to 0.757 for training, from 0.453 to 0.460 for validation, and from 0.723 to 0.743 for testing. The resultant accuracy was within the acceptable range. Based on the minimum error in CV, the ANN structure with four hidden neurons was considered the best, and is denoted as ANN (11,4,1) hereafter. The selected ANN (11,4,1) was trained using the whole data set of the training and validation parts. The weights for this trained structure were saved and the network was evaluated for the testing part. The RRMSE values for the testing part for the other structures that were not selected as the best are given in Table 2 for comparison. As evident from the results shown in Table 2, the RRMSE values for the test data were acceptable, with little difference from the training and validation results.
The observed rainfall data were divided into three categories based on µ ± 0.43σ, under the assumption of a normal distribution with a mean of µ and standard deviation of σ. Then, the observed and the predicted rainfall were classified into one of the three categories of below-, near-, and above-normal rainfall conditions. Figure 2 compares the ANN (11,4,1)-forecasted rainfall with the observed rainfall for May-June. The shaded area in the figure indicates near-normal rainfall. Even though there are some deviations from the observed rainfall, the model results show reasonable accuracy. However, the model under-forecasted the high rainfall years (1979, 1980, 1986, 1999, and 2004) and over-forecasted the low rainfall years (1968 and 1992). That is, the prediction performance for the below- and above-normal rainfall conditions was not as good as for the near-normal condition. These characteristics are shown more clearly in the scatter plot of the observed and predicted rainfall values in Figure 3. The hit years are represented as filled symbols in the figure. Table 3 presents the number of years for each category and the hit score, which is the number of hit years divided by the total number of years. As shown in Table 3, the overall hit score from the ANN model was 62.0%, and for the near-normal rainfall condition, the hit score was particularly high at 70%. However, in the case of below- and above-normal rainfall, the hit scores were not as high as that of near-normal rainfall.
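The three-category classification and the hit score can be sketched as follows. Under a normal distribution, the interval within 0.43 standard deviations of the mean holds roughly one third of the probability, so the two thresholds split the data into approximate terciles. Estimating µ and σ from the observed sample and applying the same thresholds to the predictions are assumptions of this sketch, as are the function names.

```python
import numpy as np

def classify(values, mu, sigma, z=0.43):
    """Return -1 (below normal), 0 (near normal), or 1 (above normal) per value."""
    v = np.asarray(values, dtype=float)
    lower, upper = mu - z * sigma, mu + z * sigma
    return np.where(v < lower, -1, np.where(v > upper, 1, 0))

def hit_score(observed, predicted, z=0.43):
    """Percentage of years whose predicted category matches the observed one."""
    obs = np.asarray(observed, dtype=float)
    mu, sigma = obs.mean(), obs.std(ddof=1)   # thresholds from observations (assumed)
    hits = classify(obs, mu, sigma, z) == classify(predicted, mu, sigma, z)
    return float(100.0 * np.mean(hits))
```

A year counts as a hit only when the predicted category matches the observed one, so a forecast can be numerically close to the observation and still miss if the two fall on opposite sides of a threshold.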

Quantification of Relative Importance of Input Variables
Each input variable's contribution to the output value was evaluated using Garson's method and Olden's method. Figure 4 shows the relative importance of the 11 independent variables obtained from Garson's connection weight method in the form of a box plot. A wide variability of the importance values was observed depending on the different random initial weights used. The differences between the minimum and maximum values were about 6.0-12.8% for the input variables. The greatest difference was observed for the variable EA, and the lowest difference was observed for AO. The median predictor contributions ranged from 4.0% to 18.8%, with EA showing the strongest relationship with predicted rainfall, and SLP_D and WP exhibiting the weakest relationships. EA, followed by NAO, PDO, EP/NP, and TNA, was identified as the most important predictor of late spring and early summer rainfall in the Geum River Basin. These top five predictors, which always had a relative importance above 5%, were selected for constructing a more concise ANN model.
Olden and Jackson [36] stated that Garson's method may be misleading for the interpretation of the contribution of input variables because the method does not consider the direction of the input-output interaction. In some cases, the influence of an input variable on the output response can be negligible when a positive influence through one hidden node is counteracted by a negative influence through another hidden node. In order to compensate for this drawback of Garson's method, Olden's method was additionally applied for the quantification of the relative importance of input variables in the present study.
Figure 5 shows the overall connection weights for each input variable obtained from Olden's method. As expected, a higher variability in the overall connection weights was observed. The median contributions of the predictors ranged from −0.41 to 0.23. Most variables affected the output positively, except for EA, NAO, and WP, which showed a negative influence; in other words, as those values increased, the output rainfall decreased. The most influential variables were EA and NAO, and the least influential variables were SLP_D and GEUM. Thus, both methods performed similarly in terms of determining the variable importance. Ranking by the magnitude of the median values produced the order of EA, NAO, PDO, EP/NP, and TNA. These top five influential variables were therefore selected as predictors, which agrees with the results obtained from Garson's method.
As pointed out previously [37], the use of a single ANN structure can be misleading when identifying the contributing input variables because the relative importance from one network can differ greatly from that averaged over a group of ANNs. The wide ranges of relative importance are depicted in Figures 4 and 5. Therefore, a set of ANNs with different initial weights and two or more methods of variable importance quantification, such as Garson's and Olden's algorithms, should be used to select significant predictors and to produce a reliable output response.
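Summarizing importance values over a set of networks trained from different initial weights can be sketched as below. The median, minimum, and maximum correspond to what a box plot conveys, and the ranking uses the magnitude of the median as described above. The function names and input layout are assumptions, and the numbers in the usage note are made-up values for illustration, not results from the study.

```python
import numpy as np

def summarize_runs(importance_runs):
    """Median, minimum, and maximum importance per input across repeated runs.

    importance_runs : shape (n_runs, n_inputs), one row per trained network
    """
    runs = np.asarray(importance_runs, dtype=float)
    return np.median(runs, axis=0), runs.min(axis=0), runs.max(axis=0)

def rank_by_median(median, names, k):
    """Names of the k inputs with the largest absolute median importance."""
    order = np.argsort(-np.abs(np.asarray(median, dtype=float)), kind="stable")
    return [names[i] for i in order[:k]]
```

For three hypothetical runs `[[-0.4, 0.1, 0.2], [-0.2, 0.3, 0.0], [-0.3, 0.2, 0.1]]` with inputs named `x1`, `x2`, and `x3`, the medians are −0.3, 0.2, and 0.1, so the top two inputs by magnitude are `x1` and `x2`.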

Best ANN Model for Rainfall Forecasting
The optimal ANN model structure with five input variables (EA, NAO, PDO, EP/NP, and TNA) was determined by varying the number of hidden neurons from 2 to 10. Table 4 shows the training, validation, and testing results for the ANN structures. As the number of hidden neurons increased, the RRMSE for the training part decreased and reached a minimum value after a certain number of hidden neurons, but the RRMSE for the validation part increased. For the training part, the RRMSE values ranged from 25.84% to 26.70% (RMSE: 59.31 mm to 60.63 mm), and for the validation part, the RRMSE values ranged from 32.72% to 34.79% (RMSE: 75.73 mm to 80.55 mm), which indicates more accurate performance than the preliminary ANN model. CC values ranged from 0.763 to 0.774 for training, from 0.584 to 0.614 for validation, and from 0.623 to 0.656 for testing. As indicated in Table 4, the ANN (5,2,1) with two hidden neurons had the best prediction accuracy in the validation part. After the ANN (5,2,1) was re-trained using the whole data set of the training and validation parts, the performance was evaluated for the testing part, which showed acceptable results with an RRMSE value of 34.75% (RMSE: 86.84 mm).
Figure 6 shows the actual rainfall versus the predicted data using ANN (5,2,1). The model results agreed reasonably well with observed rainfall for most of the yearly M-J rainfall values, except for some deviation in 1980, 1986, 1988, 1992, and 1999. The model significantly under-forecasted the high rainfall years (1980, 1986, and 1999), over-forecasted the low rainfall years (1988 and 1992), and forecasted the remaining years with reasonable accuracy. Figure 7 shows the scatter plot of the observed and forecasted M-J rainfall values. A comparison of Figures 3 and 7 shows that the ANN (5,2,1) model performed better than the ANN (11,4,1) model in predicting higher or lower rainfall values. Table 5 includes the number of hits and fails for each category and the hit score. The overall hit score for the ANN (5,2,1) model was 66.0%, which was higher than the result for the ANN (11,4,1) model. The ANN (5,2,1) hit score for below-normal rainfall was very high at 75%, which is much higher than that of the ANN (11,4,1) model. Given this performance for below-normal rainfall, the ANN (5,2,1) model can be more useful for low rainfall forecasting.
This study was the first attempt to construct ANN models with predictors of global teleconnection patterns to forecast basin-scale rainfall amounts several months in advance in South Korea. The optimal ANN model had fairly acceptable predictive performance with an RRMSE of about 30% and a Pearson correlation coefficient of more than 0.6. This performance is as good as that of other studies [12,17,18] on forecasting monsoon summer rainfall for East Asian regions, even though the ANN model developed in the present study had a longer lead time.
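The hit score reported in Table 5 (hit years divided by total years, overall and per category) can be illustrated with a short sketch. The category thresholds used in this study are not given in this section, so the two thresholds below are hypothetical; terciles of the observed record, e.g. `np.percentile(obs, [33.3, 66.7])`, would be one plausible choice. The rainfall values are likewise made up for illustration.

```python
import numpy as np

CATEGORY_NAMES = ("below", "near", "above")

def categorize(values, lower, upper):
    """Return 0 (below normal), 1 (near normal), or 2 (above normal)."""
    return np.digitize(np.asarray(values, float), [lower, upper])

def hit_scores(observed, predicted, lower, upper):
    """Overall and per-category hit scores in percent.

    A year is a hit when the forecast falls in the same category as the
    observation; per-category scores are grouped by the observed category."""
    obs_cat = categorize(observed, lower, upper)
    pred_cat = categorize(predicted, lower, upper)
    hits = obs_cat == pred_cat
    overall = 100.0 * float(hits.mean())
    per_category = {
        name: 100.0 * float(hits[obs_cat == k].mean())
        for k, name in enumerate(CATEGORY_NAMES)
        if np.any(obs_cat == k)
    }
    return overall, per_category

# Hypothetical rainfall totals (mm) and thresholds, not data from this study.
obs = [120.0, 180.0, 230.0, 260.0, 310.0, 350.0]
pred = [150.0, 170.0, 240.0, 300.0, 280.0, 330.0]
overall, per_category = hit_scores(obs, pred, lower=200.0, upper=280.0)
print(overall, per_category)
```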
The present study intended to reduce the time-consuming work needed to construct an ANN architecture, such as the determination of significant input variables. A large number of climate variables could affect the output response; thus, it is difficult to find optimal input variables by a trial and error procedure. To overcome this problem, possible candidates of lagged climate indices were first determined from the correlation analysis, a preliminary ANN model was then constructed using the candidates, and finally the optimal ANN model was developed using a few significant inputs, which were selected by evaluating the contribution of each variable. With the help of the correlation analysis and the quantification of variable importance, the laborious trial and error procedure could be greatly reduced. To the best of our knowledge, the approach of using teleconnection climate indices and quantifying variable importance has not previously been applied to seasonal rainfall forecasting ANN models. We think that this approach can be useful for enabling quick forecasting.
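The first step of this procedure, screening lagged climate indices by correlation, might look like the sketch below. It assumes a monthly index series that starts in January of the first year and one rainfall value per year, and it measures the lag in months before May; these conventions, the function name, and the synthetic data are assumptions made for illustration rather than the exact procedure of this study.

```python
import numpy as np

def lagged_correlations(index_monthly, rainfall, max_lag=12, target_month=5):
    """Correlate yearly rainfall with a monthly index taken `lag` months
    before `target_month`, for lag = 1..max_lag. Returns {lag: r}."""
    index_monthly = np.asarray(index_monthly, float)
    rainfall = np.asarray(rainfall, float)
    years = np.arange(rainfall.size)
    result = {}
    for lag in range(1, max_lag + 1):
        pos = years * 12 + (target_month - 1) - lag  # position in monthly series
        valid = pos >= 0                             # drop years before the record
        if valid.sum() < 3:
            continue
        r = np.corrcoef(index_monthly[pos[valid]], rainfall[valid])[0, 1]
        result[lag] = float(r)
    return result

# Synthetic example: rainfall is driven by the index six months before May.
rng = np.random.default_rng(0)
n_years = 40
index = rng.normal(size=n_years * 12)
true_pos = np.maximum(np.arange(n_years) * 12 + 4 - 6, 0)
rain = 230.0 + 50.0 * index[true_pos] + rng.normal(scale=5.0, size=n_years)

correlations = lagged_correlations(index, rain)
best_lag = max(correlations, key=lambda lag: abs(correlations[lag]))
print(best_lag, round(correlations[best_lag], 3))
```

For each index, the lag with the highest absolute correlation would be kept as a candidate input for the preliminary ANN model.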

Conclusions
We constructed an artificial neural network model to predict rainfall in late spring and early summer for the Geum River Basin, South Korea. For this purpose, several lagged global climate indices were used as predictors and the areal average rainfall of the basin was used as the predictand of the ANN model.
After identifying the lagged correlation between climate indices and rainfall amount in May and June, a preliminary ANN model was constructed with 11 input variables, each with its own lag time: the global climate indices AO, EP/NP, EA, NAO, NTA, TNA, WP, SOI, PDO, and SLP_D, and the areal rainfall of the Geum River Basin. The optimal number of hidden neurons for the ANN model with 11 input variables was selected as four based on the four-fold CV procedure. The preliminary ANN (11,4,1) model showed satisfactory prediction performance with RRMSE values of 30.43%, 35.52%, and 34.41% for the training, validation, and testing data sets, respectively. The hit score, which is the number of hit years divided by the number of total years, was 62.0%. However, ANN (11,4,1) tended to under-forecast high rainfall years and over-forecast low rainfall years.
We quantified the relative importance of input variables using Garson's and Olden's connection weight methods to identify highly significant predictors and to construct a simple ANN model with a few input variables. The five lagged climate indices (EA, NAO, PDO, EP/NP, and TNA) were selected as predictors, and the optimal structure of the ANN with two hidden neurons was determined based on the four-fold CV results. The final best ANN (5,2,1) model showed acceptable performance with RRMSE values of 25.84%, 32.72%, and 34.75% for the training, validation, and testing parts, respectively. The hit score was 66% for all years, and 75.0%, 60.0%, and 64.3% for below-, near-, and above-normal historical conditions, respectively. The results revealed that ANN (5,2,1) was more successful than ANN (11,4,1) in predicting late spring and early summer rainfall of the basin of interest, with particularly good performance in the below-normal condition. The results also indicated that quantifying the relative importance of the variables improved the accuracy of rainfall forecasts by removing some input variables that show a weak correlation.
The optimal model predicted higher rainfall values acceptably, but its prediction of lower values was relatively insufficient. Future studies are needed to improve the prediction of extremely low rainfall amounts by additionally considering new climatic indices, as well as weather data such as temperature, humidity, and wind.
In conclusion, this study revealed the possibility of seasonal rainfall forecasting four months in advance for the study region using ANNs and lagged climate indices. Good prediction of the late spring-early summer rainfall amount could allow for more flexible operation of multi-purpose dams in the Geum River and provide sufficient time to prepare strategies against potential drought damage. The developed ANN model can be considered an alternative tool to the existing physically-based forecasting models.

Water 2018
Figure 1. Map of the study area and rainfall stations.

Figure 4. Relative importance of input variables.

Figure 5. Overall connection weights of input variables.

Table 1. Selected climate indices and lagged months with the highest cross-correlation value.
AO: The first leading mode from the Empirical Orthogonal Function (EOF) analysis of monthly mean height anomalies at 1000 hPa. Lag: 10 months. Cross-correlation: 0.196.
EP/NP: A Spring-Summer-Fall pattern with three main anomaly centers. Lag: 6 months. Cross-correlation: 0.326.
EA: The second prominent mode of low-frequency variability over the North Atlantic, and appears as a leading mode in all

Table 2. Preliminary artificial neural network (ANN) model performance for training, validation, and testing parts.

Table 3. Number of hit/fails and hit scores for three categories.

Table 4. ANN model performance for training, validation, and testing parts.

Table 5. Number of hit/fails and hit scores for three categories.