Medium-Term Rainfall Forecasts Using Artificial Neural Networks with Monte-Carlo Cross-Validation and Aggregation for the Han River Basin, Korea

Water 2020, 12, 1743

Abstract: In this study, artificial neural network (ANN) models were constructed to predict the rainfall during May and June for the Han River basin, South Korea. This was achieved using lagged global climate indices and historical rainfall data. Monte-Carlo cross-validation and aggregation (MCCVA) was applied to create an ensemble of forecasts. The input-output patterns were randomly divided into training, validation, and test datasets; this was done 100 times to achieve diverse data splittings. For each data splitting, ANN training was repeated 100 times using randomly assigned initial weight vectors of the network to construct 10,000 prediction ensembles and estimate their prediction uncertainty interval. The optimal ANN model used to forecast the monthly rainfall in May had 11 input variables, consisting of lagged climate indices such as the Arctic Oscillation (AO), East Atlantic/Western Russia Pattern (EAWR), Polar/Eurasia Pattern (POL), Quasi-Biennial Oscillation (QBO), Sahel Precipitation Index (SPI), and Western Pacific Index (WP). The ensemble of rainfall forecasts exhibited averaged root mean squared error (RMSE) values of 27.4, 33.6, and 39.5 mm, and averaged correlation coefficient (CC) values of 0.809, 0.725, and 0.641 for the training, validation, and test sets, respectively. The estimated uncertainty band covered 58.5% of the observed rainfall data with an average band width of 50.0 mm, an acceptable result. The ANN forecasting model for June has 9 input variables, which differed from those for May: the Atlantic Meridional Mode (AMM), East Pacific/North Pacific Oscillation (EPNP), North Atlantic Oscillation (NAO), Scandinavia Pattern (SCAND), Equatorial Eastern Pacific SLP (SLP_EEP), and POL. The averaged RMSE values are 39.5, 46.1, and 62.1 mm, and the averaged CC values are 0.853, 0.771, and 0.683 for the training, validation, and test sets, respectively.
The estimated uncertainty band for the June rainfall forecasts covered 67.9% of the observed data with an average band width of 83.0 mm. It can be concluded that the neural network with MCCVA enables us to provide acceptable medium-term rainfall forecasts and to define the prediction uncertainty interval.


Introduction
Accurate and timely rainfall forecasting is necessary for efficient water resources management, flood protection, and drought risk mitigation [1]. In particular, rainfall forecasting on a monthly or seasonal basis supports effective water resources allocation, water supply planning, and water demand reduction during drought periods. Medium- to long-term rainfall forecasting is an interesting and challenging problem in the fields of meteorology and hydrology. Barrow and Crone [18] proposed the Monte-Carlo cross-validation and aggregation (MCCVA) method for combining neural network forecasts, which combines bootstrapping and cross-validation in a single approach through the repeated random splitting of the original learning data into mutually disjoint datasets for training and validation. However, previous work on Monte-Carlo cross-validation has not focused on ANN-based medium- to long-term rainfall forecasting. Our study uses the MCCVA method to generate an ensemble of forecasts and evaluates the performance of the rainfall forecasting model.
The objective of this study is to develop practical ANN models using MCCVA to forecast the rainfall for the Han River basin in South Korea during May and June. This time period was selected because this is when agricultural water demand abruptly increases and severe droughts in recent years have occurred during both months. The monthly global climate indices and past rainfall data with different lag times are determined as the predictors of the ANN models. The MCCVA method is used for random subsampling to split the entire dataset, create diverse networks, and evaluate the general performance of the resulting ensemble of rainfall forecasts. In addition, the output uncertainty arising from the variability of the network parameters is assessed by constructing the prediction interval.

Data
The study area of the Han River basin (Figure 1) is located in the central region of the Korean Peninsula. A substantial proportion of the basin, 86.1% of the total area of 34,428 km2, is located in South Korea, and the remainder is located in North Korea. South Korea's Ministry of the Environment provides area-averaged rainfall data at the basin scale, calculated with the Thiessen polygon method from rainfall measured at gauging points within and around the basin. The historical monthly area-averaged rainfall data for the Han River basin have been recorded since 1966 and were obtained from the Water Resources Information System [19].
The choice of appropriate input variables is crucial in prediction modeling. Generally, the selection of input variables is based on a priori knowledge of causal variables [20]. Kim et al. [21] investigated the relationships between large-scale atmospheric teleconnections and the warm-season hydro-climatology in the Han River basin. They found that the East Atlantic/Western Russia (EAWR) and East Atlantic (EA) patterns show a significant relationship with the regional precipitation and streamflow. Kim et al. [22] analyzed past and future extreme weather events and the associated flow regime of the Han River basin using a variety of local extreme climate indices. However, a priori knowledge for choosing the large-scale climate indices affecting the monthly and seasonal rainfall is insufficient for the study area, particularly for the target months (May and June) of this study. A large number of potential inputs may affect the rainfall occurrence in May and June. Finding the most significant input variables by trial-and-error can be time-consuming; thus, this study uses a cross-correlation analysis to determine the candidate input variables that could be potentially related to the target rainfall.
Large-scale climate indices have been widely used for seasonal predictions because they can compactly express the regional variabilities of the atmosphere and the surfaces of the land and ocean [23]. These climate indices are updated on a monthly basis by the Climate Prediction Center (CPC) under the National Oceanic and Atmospheric Administration (NOAA), and users can easily obtain them from the CPC website [24]. Many climate indices have been provided from 1950 to the present. In this study, the potential climate index data for the 54 years from 1965 to 2018 were collected and are summarized in Table 1; the data length was extended one year back from the period of the areal rainfall data used. Constructing forecasting models from all possible combinations of the collected climate indices is impractical because the computational work would be too extensive. Therefore, candidate climate indices were initially screened through cross-correlation analysis. The correlation coefficients were calculated between the target rainfall amounts in May and June and each monthly climate index with a lag time of one to twelve months, as well as the past monthly rainfall data (Han) with a lag time of one to four months. Thirteen lagged climate indices were selected as candidate input variables of the ANN models, mostly those with an absolute correlation coefficient higher than 0.2 (three of the selected indices fell below this threshold). Further, the significantly correlated monthly precipitation data, with lag times of three and four months for the May and June rainfall forecasting models, respectively, were additionally included in the set of candidate input variables. Table 2 presents the candidate input variables, lag times, and the values of the correlation coefficients.
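The lagged cross-correlation screening described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name, the toy series, and the synthetic 3-month lead are all invented for the example, and the 0.2 threshold mirrors the selection rule in the text.

```python
import numpy as np

def lagged_correlations(index_series, target, lags):
    """Pearson correlation between a monthly climate index lagged by
    `lag` months and the target monthly rainfall series."""
    out = {}
    for lag in lags:
        x = index_series[:-lag]   # index value `lag` months earlier
        y = target[lag:]          # target rainfall `lag` months later
        out[lag] = float(np.corrcoef(x, y)[0, 1])
    return out

# Toy example: a synthetic index that leads the target by 3 months.
rng = np.random.default_rng(0)
idx = rng.normal(size=200)
target = np.roll(idx, 3) * 0.8 + rng.normal(scale=0.3, size=200)
target[:3] = rng.normal(size=3)  # discard the wrapped-around values

cors = lagged_correlations(idx, target, lags=range(1, 13))
best_lag = max(cors, key=lambda k: abs(cors[k]))
# Candidates would be kept where |r| exceeds roughly 0.2, per the text.
candidates = [lag for lag, r in cors.items() if abs(r) > 0.2]
```

Scanning lags 1 through 12 for every index, as the paper does, is just this loop repeated over each climate series.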
It was determined that the monthly rainfall in May has a maximum positive correlation of 0.338 with the 3-month-lagged Han (3), followed by EAWR (3) and WP (5), and a maximum negative correlation of −0.374 with EAWR (7). The value in parentheses indicates the lag time (in months). For the June rainfall, the maximum positive correlation of 0.356 is achieved for Han (4), and the maximum negative correlation of −0.450 is achieved for NAO (6). It is noted that most of the highly correlated climate indices differ between May and June, except that Han and POL are selected as candidate input variables for both months. In addition, the relative importance of the variables is quantified later, during the training phase, to finally determine the most significant inputs among the 14 selected candidates; the results are described in Section 3.1. Multicollinearity occurs when two or more explanatory variables fed into a statistical model (e.g., a multiple regression model) are highly correlated. This can lead to the inappropriate identification of relevant predictor variables and to variance inflation. De Veaux and Ungar [25] pointed out that "neural networks tend to be fairly insensitive to problems of multicollinearity." Therefore, the neural network model is often regarded as free from this limitation of statistical regression models [1], and collinearity among the predictor variables has usually not been dealt with when the machine learning model focuses on predictive power [26]. However, given that neural networks are similar to regression models, multicollinearity can undermine the ability of a machine learning model just as it does a statistical model.
To resolve the multicollinearity problem, in general, principal components have been used [27,28], or any variable that is highly correlated with other variables was removed in the data-driven model with the evaluation of the variance inflation factor (VIF) values among the predictors [29].
The present study investigated multicollinearity by calculating the correlation coefficients and the VIF between the predictor variables. Among the predictors for the May rainfall forecasting, QBO (7) and QBO (8) showed a very high correlation coefficient of 0.988 with VIF = 41.9. The second-highest correlation of 0.428 (VIF = 1.2) occurred between POL (12) and EAWR (3), and most pairs showed an insignificant correlation much less than 0.3 (VIF = 1.0). For the predictors of the June rainfall forecasting models, the highest correlation of 0.875 (VIF = 4.3) was found between AMM (11) and AMM (12), followed by 0.706 (VIF = 2.0) between NAO (6) and AO (6), and 0.542 (VIF = 1.4) between SLP_E (3) and SLP_E (6). Similar to the May case, most pairs exhibited a weak correlation below 0.3. According to Lin [30], a VIF value exceeding 5-10 indicates a multicollinearity problem. Therefore, it was determined that only the pair QBO (7) and QBO (8) greatly violates the limit, and the pair AMM (11) and AMM (12) has a possible multicollinearity problem.
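The VIF screening can be sketched from its standard definition, VIF_i = 1/(1 − R_i²), where R_i² comes from regressing predictor i on all the others. The sketch below is illustrative only; the synthetic predictors mimic a nearly collinear pair such as QBO (7) and QBO (8) rather than reproducing the paper's data.

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of X (n_samples, n_predictors):
    regress each predictor on the others (with intercept) and report
    1 / (1 - R^2), computed here as SS_tot / SS_res."""
    n, p = X.shape
    vifs = []
    for i in range(p):
        y = X[:, i]
        others = np.delete(X, i, axis=1)
        A = np.column_stack([others, np.ones(n)])      # include intercept
        beta, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ beta
        ss_res = float(resid @ resid)
        ss_tot = float(((y - y.mean()) ** 2).sum())
        vifs.append(ss_tot / ss_res if ss_res > 0 else np.inf)
    return vifs

# Two nearly collinear predictors plus one independent predictor.
rng = np.random.default_rng(1)
a = rng.normal(size=100)
X = np.column_stack([a, a + rng.normal(scale=0.1, size=100),
                     rng.normal(size=100)])
v = vif(X)  # first two VIFs are large, the third is close to 1
```

Against the 5-10 rule of thumb cited from Lin [30], the first two columns here would be flagged while the third would not.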

Procedure of Artificial Neural Network (ANN) Model Development
ANNs imitate the biological behavior of the brain and nervous system. They consist of a system of interconnected artificial neurons that can receive information by inputs, weigh and subsequently sum the information through mathematical functions, and thereafter transfer it to other neurons. There are diverse structures of artificial neural networks such as the recurrent neural network, the deep neural network, and the time-delay neural network. In this study, a multi-layer perceptron (MLP) feed-forward neural network is used as a rainfall forecasting model. A simple neural network with one hidden layer between the input and output layers can be mathematically expressed as follows.
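The network equation itself appears to have been lost in extraction. Based on the symbol definitions given immediately afterwards, the standard one-hidden-layer MLP form (presumably Equation (1)) would be as follows, where n_i (number of inputs) and n_h (number of hidden neurons) are assumed index bounds:

```latex
\hat{y}_k = f_o\!\left( \sum_{j=1}^{n_h} w_{kj}\, f_h\!\left( \sum_{i=1}^{n_i} w_{ji}\, x_i + w_{jb} \right) + w_{kb} \right)
```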
where ŷ_k is the output of the network; x_i is the input to the network; w_ji and w_kj are the connection weights between the neurons of the input layer and hidden layer and between the neurons of the hidden layer and output layer, respectively; w_jb and w_kb are the biases of the hidden layer and the output layer, respectively; and f_h and f_o are the activation functions of the hidden layer and output layer, respectively [13,31]. The hyperbolic tangent sigmoid function for f_h and the linear function for f_o were used in this study. The data were normalized to a range of −1.0 to 1.0 in consideration of the limits of the selected activation functions. The procedure for developing the ANN models is as follows: identify the suitable input variables; determine the number of hidden layers and neurons; estimate the parameters of the network in the training phase; and evaluate the model performance in the validation/testing phase.
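A minimal forward pass with the activation choices described above (tanh hidden layer, linear output, inputs normalized to [−1, 1]) might look like the following sketch. The 11-input, 4-hidden geometry echoes the final ANN-M model, but all weights and input values here are arbitrary stand-ins, not trained parameters.

```python
import numpy as np

def normalize(x, lo, hi):
    """Scale data to [-1, 1], respecting the tanh activation's range."""
    return 2.0 * (x - lo) / (hi - lo) - 1.0

def forward(x, W_hidden, b_hidden, W_out, b_out):
    """One-hidden-layer MLP: tanh hidden activation, linear output."""
    h = np.tanh(W_hidden @ x + b_hidden)  # hidden layer, f_h = tanh
    return W_out @ h + b_out              # output layer, f_o = identity

rng = np.random.default_rng(2)
n_in, n_hidden = 11, 4                    # e.g. the final ANN-M geometry
x = normalize(rng.uniform(0, 300, size=n_in), 0, 300)
W_h = rng.normal(scale=0.5, size=(n_hidden, n_in))
b_h = rng.normal(scale=0.5, size=n_hidden)
W_o = rng.normal(scale=0.5, size=(1, n_hidden))
b_o = rng.normal(scale=0.5, size=1)

y = forward(x, W_h, b_h, W_o, b_o)        # single rainfall output
```

A trained model would invert the normalization on y to recover rainfall in mm; that step is omitted here.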
Identification of the most significant input variables is the first step in developing the ANN rainfall forecasting model. This study used the relative-importance technique applied by Lee et al. [13] to determine the appropriate input variables. A preliminary ANN structure using the 14 candidate input variables selected in Section 2.1 is first assumed. A stepwise trial-and-error procedure is then used to determine the simplest well-performing structure by varying the number of input variables: it starts with the preliminary ANN structure and builds progressively simpler models by removing insignificant input variables through an analysis of the relative importance (RI) of the variables. The method selected in this study to determine the RI of the input variables is the connection weights method [32,33], which has the following mathematical expression.
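The expression itself appears to have been lost in extraction. From the description in the surrounding text (products of the input-hidden and hidden-output weights, summed across the hidden neurons), Olden's connection weights measure takes the form below, where n_h (the number of hidden neurons) is an assumed symbol:

```latex
RI_i = \sum_{j=1}^{n_h} w_{ij}\, w_{jk}
```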
where RI_i is the relative importance of the variable x_i with respect to the output neuron. With this method, the products of the raw input-hidden weights (w_ij) and the hidden-output connection weights (w_jk) are calculated and then summed across all hidden neurons. The connection weights method has the advantage of considering the direction of the input-output interaction: a strongly positive or strongly negative RI indicates a significant predictor variable for the output of the neural network model. As mentioned in Section 2.1, the candidate input variables are determined using cross-correlation analysis. The number of input variables is initially set to 14, equal to the number of candidate input variables. Thereafter, the number of inputs is reduced to construct a more compact network by finding the most significant input variables during the training and validation processes. A stepwise removal approach is used to reduce the number of input variables, in which (1) the preliminary network is trained using all of the candidate input variables; (2) the weakest input variable is determined through the analysis of the relative importance of the variables; (3) the network is re-trained without the input neuron and the relevant weights of the weakest input variable; (4) the relative importance of the variables is again quantified to find the second-weakest input variable; (5) the networks are successively trained by removing one weak variable at a time; and (6) finally, the diverse ANN models with different numbers of input variables are compared to find the optimal number of input neurons.
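The connection weights calculation and the "find the weakest input" step can be sketched as follows; the tiny weight matrices are made-up numbers for illustration, not values from the paper.

```python
import numpy as np

def relative_importance(W_hidden, W_out):
    """Olden's connection weights method: for each input i, sum the
    products of input->hidden and hidden->output weights across all
    hidden neurons. The sign is preserved, so RI reflects the
    direction of the input-output interaction."""
    # W_hidden: (n_hidden, n_inputs); W_out: (1, n_hidden)
    return (W_out @ W_hidden).ravel()

def weakest_input(W_hidden, W_out):
    """Index of the least influential input (smallest |RI|), i.e. the
    candidate to drop in the stepwise-removal procedure."""
    ri = relative_importance(W_hidden, W_out)
    return int(np.argmin(np.abs(ri)))

# Two hidden neurons, three inputs (illustrative weights).
W_h = np.array([[0.9, 0.05, -0.4],
                [0.7, -0.02, 0.3]])
W_o = np.array([[0.5, -0.5]])

ri = relative_importance(W_h, W_o)  # one signed RI value per input
drop = weakest_input(W_h, W_o)      # input to remove before re-training
```

In the paper's procedure this would be repeated: drop the weakest input, re-train, recompute RI, and continue until only four inputs remain.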
Determining the appropriate numbers of hidden layers and their neurons is another important step in building the ANN architecture. This is important for capturing the complex and nonlinear relationships between diverse input and output variables and achieving good network performance. For this study, one hidden layer was used for simplicity, and 2 to 10 neurons were tested for this layer. A trial-and-error approach was applied to determine the optimal number of hidden nodes, preferring the smallest number of neurons for the final model when there was no noticeable difference in model performance.
The MCCVA method is applied to create diversity in the training datasets, obtain an ensemble of forecasts, and estimate the prediction errors while accounting for the uncertainty from random splitting patterns (random subsampling) as well as from the random initialization of the network weights. The entire set of original input-output patterns was randomly subsampled without replacement to compose the training dataset for estimating the model parameters, the validation dataset for early stopping to avoid over-fitting, and the testing dataset for evaluating the model's general performance. In this study, a total of 53 patterns of monthly input-output data were randomly split into three parts: 60% of the data for training, 20% for validation, and the remaining 20% for testing. The weights between the neurons were initialized with a set of random values at the start of training, which also produces variability in the outputs. Neural networks were repeatedly trained (10,000 times) using 100 sets of randomly subsampled data and 100 sets of initial weight vectors under the chosen numbers of input variables and hidden neurons.
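The Monte-Carlo splitting step can be sketched as follows, using the 53 patterns and the 60/20/20 proportions stated above; the function name and seed handling are illustrative, not the authors' implementation.

```python
import numpy as np

def mccva_splits(n_patterns, n_splits, train_frac=0.6, val_frac=0.2, seed=0):
    """Monte-Carlo cross-validation: repeatedly shuffle the pattern
    indices and cut them into mutually disjoint training, validation,
    and test sets (60/20/20 here, as in the text)."""
    rng = np.random.default_rng(seed)
    n_train = round(n_patterns * train_frac)
    n_val = round(n_patterns * val_frac)
    for _ in range(n_splits):
        idx = rng.permutation(n_patterns)   # sampling without replacement
        yield (idx[:n_train],
               idx[n_train:n_train + n_val],
               idx[n_train + n_val:])

splits = list(mccva_splits(53, 100))
train, val, test = splits[0]
# 100 splits x 100 random weight initialisations -> 10,000 ensemble members.
```

Each of the 100 splits would then be paired with 100 random weight initializations, and the resulting 10,000 trained networks form the forecast ensemble.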

Performance Evaluation of the Artificial Neural Network (ANN) Outputs
The prediction accuracy of the 10,000 ensemble members was assessed in terms of the root mean squared error (RMSE) and the Pearson correlation coefficient (CC). Further, the forecast quality was measured by calculating the Heidke skill score (HSS). The HSS is an index for categorical forecasts in which the proportion-correct measure is scaled against a reference forecast expected by chance [34]. The HSS measures the fraction of possible improvement afforded by the forecasts over the reference forecast. It is defined as

HSS = (NC − E) / (T − E), (3)

where NC = Σ_{i=1}^{m} X_ii, E = Σ_{i=1}^{m} (X_ip X_pi)/T, and T = X_pp. In Equation (3), NC equals the number of correct forecasts (i.e., the number of times the forecast and the observations match); T represents the total number of forecasts; E equals the number of forecasts expected to verify by chance; m is the number of categories; and the elements X_ij indicate the number of times the forecast was in the jth category and the observation was in the ith category. The row and column totals are denoted by the subscript (and category) p, so E is computed from the marginal totals of the contingency table. A larger positive value of HSS is better; a negative HSS indicates that a forecast is worse than a random forecast.
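The HSS computation from an m x m contingency table can be sketched as follows; the two toy tables (a perfect forecast and a chance-level forecast) are invented to show the score's limiting values.

```python
import numpy as np

def heidke_skill_score(X):
    """Heidke skill score from an m x m contingency table X, where
    X[i, j] counts cases observed in category i and forecast in
    category j."""
    X = np.asarray(X, dtype=float)
    T = X.sum()                      # total number of forecasts
    NC = np.trace(X)                 # correct forecasts (diagonal)
    # Expected number of chance hits from the marginal totals.
    E = (X.sum(axis=1) * X.sum(axis=0)).sum() / T
    return (NC - E) / (T - E)

perfect = np.diag([4, 3, 4])         # every forecast correct -> HSS = 1
random_like = np.full((3, 3), 2)     # no skill over chance   -> HSS = 0

hss_perfect = heidke_skill_score(perfect)
hss_random = heidke_skill_score(random_like)
```

Applied to the paper's Tables 6 and 7, this calculation is what yields the reported scores of about 0.45 (May) and 0.32 (June).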

Uncertainty Analysis of the Artificial Neural Network (ANN) Outputs
The sources of uncertainty related to the neural networks can originate from incomplete training data, diverse training algorithms, non-unique model structure, and inappropriate model parameterization. In particular, the training dataset selection has a significant impact on model performance [35]. In this study, the uncertainty arising from the variability of training data due to the random subsampling and the different initialization of the network weights was assessed. Generally, the uncertainty is evaluated after constructing the ensemble of the model outputs. The ensemble of the rainfall forecasts, which is the result of the MCCVA, is used to evaluate the output uncertainty as well as evaluate the model performance.
The uncertainty associated with the ANN model output was evaluated by constructing a 95% prediction band for the ensemble of rainfall forecasts for May and June. The uncertainty band is useful for quantifying the uncertainty associated with the ensemble of forecasts. To construct the 95% prediction interval, the 2.5th and 97.5th percentiles of the empirical cumulative distribution of the output variables are selected as the lower and upper limits of the prediction interval, respectively. The percentage of coverage (POC) and the standardized average width (SAW) of the constructed prediction interval are then calculated. The POC is obtained by counting the number of observed data points enveloped by the prediction interval and dividing by the total number of observations; a high POC value is desirable. The SAW, also known as the r-factor (Singh et al. [36]), is obtained by calculating the average width (AW) of the band, i.e., the mean difference between its upper and lower values, and dividing the AW by the standard deviation of the observed data. It is desirable for the AW of the prediction interval to be as narrow as possible and for the r-factor to be less than 1 [36].
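The band construction and its two diagnostics can be sketched as follows; the synthetic ensemble (observations plus Gaussian noise) is illustrative only and does not reproduce the paper's 58.5%/67.9% coverage figures.

```python
import numpy as np

def interval_metrics(ensemble, observed):
    """95% prediction band from the 2.5th and 97.5th ensemble
    percentiles, plus the percentage of coverage (POC) and the
    r-factor (average band width standardized by the observations'
    standard deviation)."""
    lower = np.percentile(ensemble, 2.5, axis=0)
    upper = np.percentile(ensemble, 97.5, axis=0)
    covered = (observed >= lower) & (observed <= upper)
    poc = 100.0 * covered.mean()           # % of observations enveloped
    aw = float((upper - lower).mean())     # average band width
    r_factor = aw / observed.std(ddof=1)   # SAW; < 1 is desirable
    return poc, aw, r_factor

rng = np.random.default_rng(3)
obs = rng.normal(100, 40, size=53)                  # "observed" rainfall (mm)
ens = obs + rng.normal(0, 8.0, size=(10_000, 53))   # synthetic ensemble

poc, aw, r = interval_metrics(ens, obs)
```

Because the synthetic ensemble is centered exactly on the observations, its coverage is near-perfect; a real forecast ensemble, as in the paper, trades coverage against band width.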

Determination of the Preliminary Input Variables
Preliminary network geometries were constructed using the 14 candidate input variables and 2 to 10 hidden neurons. They were repeatedly trained on the 100 training samples with 100 sets of initial weights and biases to minimize the errors between the computed and observed data. To assess each input variable's contribution to the rainfall forecasts, the relative importance (RI) values were calculated using Olden's connection weights method [32,33], which sums the products of the input-hidden and hidden-output weights across all hidden neurons. Figure 2 shows the RI values for each input variable in the form of a box plot. A box plot, also known as a box-and-whisker diagram, graphically depicts the distribution of a dataset through five components: the minimum, the maximum, the sample median (50th percentile), and the first (25th) and third (75th) quartiles [37]. The variations of the RI for each variable are due to the random data sampling and the random initialization of the network weights and biases; these variations have a significant effect on individual model predictions, thus resulting in uncertainty in the model output. The most significant input variable for the rainfall in May is EAWR (7), which exhibits a median RI value of −0.257, followed by Han (3) with 0.211 and POL (7) with 0.188. The weakest was determined to be EAWR (3), with an RI of 0.035. The variables AO (10), EAWR (3), POL (7), POL (12), WP (5), and Han (3) exhibit a positive influence, and the remainder have a negative impact on the output. Meanwhile, for the rainfall in June, the strongest input variable is Han (4) and the least influential is WP (4), with RI values of 0.366 and 0.023, respectively. Nine of the 14 candidate input variables were positively related to the output, and the remainder were negatively related.
This study aimed to determine the optimal number of input variables of the ANN model to produce the best performance. The weakest input variable was removed from the input nodes to obtain a simpler network geometry. The training was conducted after leaving out the weakest input variable, and the relative importance of the input variables was analyzed again. This procedure was conducted iteratively, starting with 14 input variables, until the number of inputs decreased to four, removing the least influential input variable after every iteration. Tables 3 and 4 present the summaries of the RI values for each input variable for the 11 types of models with different numbers of input variables. The blanks represent the weakest input variable of each type of model, which is removed from the networks. Here, ANN-M and ANN-J denote the ANN rainfall forecasting models for May and June, respectively.

Table 3. Summary of the relative importance (RI) values of the input variables for the ANN-M models.

Performance of the Artificial Neural Network (ANN) Models
To find the optimal network geometry, the performances of the ANN models were evaluated by varying the numbers of input and hidden neurons from 4 to 14 and from 2 to 10, respectively. Two statistical measures, the RMSE and CC between the observed and predicted rainfall values, were used for evaluating the model performance. For each combination of input and hidden neuron numbers, the values of the RMSE and CC averaged across the 10,000 models were obtained. The structure with the highest performance, exhibiting the minimum RMSE and the maximum CC on the training and validation datasets, was selected as the optimal ANN model. Figure 3 presents the performance of the different ANN-M models with varying numbers of input and hidden neurons for the training and validation datasets. It is observed that the variations of the RMSE and CC across the number of input neurons are more significant than those across the number of hidden neurons. It was determined that the ANN-M with 11 input neurons and four hidden nodes is the best, exhibiting averaged RMSE values of 27.4 mm and 33.6 mm, and CC values of 0.809 and 0.725 for the training and validation datasets, respectively. Based on the above results, this study reduced the number of input variables in the ANN-M model from 14 to 11. The final input variables were determined as AO (8), EAWR (7), EAWR (10), NOI (12), POL (7), POL (12), QBO (7), SPI (9), WP (5), and Han (3). This outcome indicates that removing irrelevant input variables can improve the performance of the rainfall prediction. As described in Section 2.2, a multicollinearity problem can occur when QBO (7) and QBO (8) are used together. However, only QBO (7) was included in the 11 final input variables; therefore, the appropriate identification of the predictor variables was achieved.

Figure 4 illustrates the values of the RMSE and CC of the ANN-J models according to the different combinations of input and hidden neurons for the training and validation datasets. It is also noted that the impact of the number of inputs on the model performance can be clearly seen despite the slight difference in errors, whereas the number of hidden neurons does not have a significant effect on the model predictions.
The best ANN-J model, with nine input variables and three hidden neurons, was finally chosen; it exhibited RMSE values of 39.5 and 46.1 mm and CC values of 0.853 and 0.771 for the training and validation datasets, respectively. The optimal ANN-J was determined to have the input variables of AMM (12), EPNP (2), EPNP (7), NAO (6), PNA (4), POL (12), SCAND (3), SCAND (10), SLP_E (3), and Han (4). It can also be noted that the final input variables did not violate the multicollinearity limit because they contained AMM (12) but not AMM (11). The optimal ANN-M and ANN-J models determined from the training and validation phases were thereafter used with the test datasets to evaluate the general model performance. Table 5 gives a summary of the performance statistics for the training, validation, and testing datasets.
The averaged RMSE and CC values of the ANN-M model for the testing dataset were 39.5 mm and 0.641, respectively. The model yielded larger errors for the testing dataset than for the training and validation datasets; however, the overall performance is satisfactory because the difference is small. The ANN-J model exhibited similar behavior, performing worse for the testing dataset than for the others; this is still acceptable, with an averaged RMSE of 62.1 mm and CC of 0.683 for the testing dataset. It is noticeable that the standard deviations are about 20–30% of the mean values.
This implies that any individual model would exhibit significant output variance; MCCVA reduces this variance by generating ensembles and aggregating them. The observed rainfall data were divided into three categories based on µ + 0.43σ and µ − 0.43σ, under the assumption of a normal distribution with mean µ and standard deviation σ. The observed and predicted rainfall were then classified into one of three categories: below normal, near normal, and above normal. Tables 6 and 7 present the Heidke skill score (HSS) contingency tables comparing the forecast and observed rainfall amounts in May and June for the test phase, respectively. The HSS values were calculated by plugging the counts in the tables into Equation (3). The HSS is about 0.45 for the May rainfall forecasts, which indicates a 45% improvement over the reference forecast. The HSS for the June rainfall forecasts is 0.32, meaning that the forecast quality of the ANN-J model is 32% better than forecasts expected by chance.
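The categorization and skill-score steps above can be sketched as follows. Equation (3) is not reproduced in this excerpt, so the standard multi-category HSS formula (proportion correct relative to chance agreement) is assumed here; the function names are illustrative:

```python
import numpy as np

def classify_category(x, mu, sigma):
    """Assign a rainfall amount to below (0), near (1), or above (2) normal,
    using the mu - 0.43*sigma and mu + 0.43*sigma thresholds."""
    if x < mu - 0.43 * sigma:
        return 0
    if x > mu + 0.43 * sigma:
        return 2
    return 1

def heidke_skill_score(table):
    """Multi-category Heidke skill score from a contingency table
    (rows: forecast category, columns: observed category)."""
    table = np.asarray(table, dtype=float)
    n = table.sum()
    pc = np.trace(table) / n                                      # proportion correct
    exp = (table.sum(axis=1) * table.sum(axis=0)).sum() / n ** 2  # chance agreement
    return (pc - exp) / (1.0 - exp)
```

An HSS of 1 corresponds to a perfect forecast, 0 to no skill beyond chance, and negative values to forecasts worse than chance.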

Uncertainty of the Artificial Neural Network (ANN) Models
To assess the uncertainty of the model outputs, the 95% prediction intervals were estimated using the ensemble of rainfall forecasts from the ANN-M and ANN-J models. Figure 5 compares the observed May rainfall data and the ensembles forecasted with the optimal ANN-M model for the entire dataset. The line with symbols represents the observed monthly rainfall in May, and the shaded area represents the 95% prediction interval. The overall forecast values followed the general behavior of the observations. The estimated prediction interval reasonably represents the observed data, although the observations exceed the upper or lower limits of the interval in some years and are rather overestimated in years with low rainfall. The ANN-M produced ensembles with 58.5% of the observed data points lying within the prediction interval, which is a reasonable uncertainty band even though this fraction is below the nominal confidence level. Ideally, the predictive uncertainty band should be narrow while still containing a large proportion of the observations. The AW of the band is estimated as 50.0 mm, which is acceptable because its standardized value (SAW = 1.03) is close to the upper criterion of 1.0 proposed by Singh et al. [36]. Figure 6 illustrates the 95% uncertainty band of the June rainfall forecasts using the optimal ANN-J model along with the observations.
It is clear from the figure that the overall predictions fit the observed data well, and many observed data points fall within the prediction interval of the ensembles. It may also be noted that the model cannot reproduce extremely high and low rainfall, likely because most of the training samples lie in the middle rainfall range near the normal. The ANN-J model generates ensembles with 67.9% of the observed rainfall data lying within the uncertainty band. The 95% uncertainty band for the June prediction has a larger average band width of 83.0 mm (SAW = 1.04) than the corresponding band for May; however, more observations are included inside the band, which provides a reasonable estimate of the uncertainty of the rainfall forecasts. The results of the uncertainty analysis indicate that the middle range of rainfall near the normal is well-forecasted by the ANN models irrespective of the variation in the training samples.
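The coverage and average width statistics of the prediction band can be computed from the forecast ensemble as sketched below. The standardization of the average width (SAW) is assumed here to divide by the standard deviation of the observations; the paper follows Singh et al. [36], whose exact definition is not reproduced in this excerpt:

```python
import numpy as np

def band_statistics(ensemble, observed):
    """Coverage (%) and average width of the 95% prediction interval.

    ensemble: (n_members, n_times) array of ensemble forecasts
    observed: (n_times,) array of observations
    """
    lower = np.percentile(ensemble, 2.5, axis=0)   # lower bound of 95% interval
    upper = np.percentile(ensemble, 97.5, axis=0)  # upper bound of 95% interval
    inside = (observed >= lower) & (observed <= upper)
    coverage = 100.0 * inside.mean()               # % of observations inside band
    aw = (upper - lower).mean()                    # average band width (AW)
    saw = aw / observed.std(ddof=1)                # standardized AW (assumed form)
    return coverage, aw, saw
```

For the May model this procedure yields a coverage of 58.5% and AW of 50.0 mm; for June, 67.9% and 83.0 mm.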
However, the performance for extremely high or low rainfall may not be satisfactory because limited patterns were used for training the networks. These features have also been found in previous long-term rainfall forecasting studies [1,13]. It is worth mentioning that accurate forecasts of rainfall below the normal condition are important for water resource management and drought risk mitigation. Accordingly, the lower limit of the uncertainty band estimated in this study may be better suited for practical purposes. In the future, it will be necessary to improve long-term forecasting of low and high rainfall by finding other influential input variables, applying other data-driven models, using diverse learning algorithms with different hyper-parameters, and combining the use of physically based models.

Conclusions
This study presented ANN models based on the MCCVA technique for forecasting rainfall in May and June for the Han River basin in South Korea. The MCCVA technique was used to generate different parameters of the ANN models while accounting for the variability of both the random sampling of the training datasets and the random assignment of the initial network weights. To build the ANN structures, the most influential input variables among the lagged climate indices and the historical rainfall data were selected through cross-correlation analysis between the input and output variables and through quantification of the relative contribution of each variable to the output. This resulted in 11 types of ANN models with 4 to 14 input variables. The number of hidden layers was set to one, and the number of hidden neurons varied from 2 to 10. For each combination of input variables and hidden neurons, 10,000 ANN models were generated using the MCCVA technique, with 100 random splits into training, validation, and test datasets and 100 sets of initial weights. The predictive errors were evaluated to find the optimal rainfall forecasting ANN models for May and June. The optimal ANN model for forecasting the monthly rainfall in May with a three-month lead time was determined to have 4 hidden neurons and 11 input variables of the lagged climate indices such as the Arctic Oscillation (AO), the East Atlantic/Western Russia Pattern (EAWR), the Polar/Eurasia Pattern (POL), the Quasi-Biennial Oscillation (QBO), the Sahel Precipitation Index (SPI), and the Western Pacific Index (WP). The prediction errors of the ensemble of forecasts were acceptable, with averaged RMSE values of 27.4, 33.6, and 39.5 mm and correlation coefficients of 0.809, 0.725, and 0.641 for the training, validation, and test datasets, respectively. The uncertainty band obtained from the ensemble of forecasts covered 58.5% of the observed rainfall data with an average band width of 50.0 mm.
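The MCCVA procedure described above (repeated random data splits, repeated model fits, and aggregation of the resulting forecast ensemble by its mean) can be sketched as follows. The paper's ANN is replaced here by a linear least-squares fit purely as a stand-in model, and `mccva_ensemble` and its parameters are illustrative, not the authors' implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def mccva_ensemble(X, y, n_splits=100, test_frac=0.2):
    """Monte-Carlo cross-validation and aggregation (MCCVA) sketch.

    For each of n_splits random partitions, fit a model on the training
    subset and predict over all samples; the ensemble of predictions is
    aggregated by its mean. (The paper uses 100 splits x 100 random
    weight initializations of an ANN; a least-squares linear model
    stands in for the ANN here.)
    """
    n = len(y)
    preds = []
    for _ in range(n_splits):
        idx = rng.permutation(n)
        train = idx[int(test_frac * n):]                    # random training subset
        A = np.column_stack([X[train], np.ones(len(train))])
        coef, *_ = np.linalg.lstsq(A, y[train], rcond=None)  # fit stand-in model
        preds.append(np.column_stack([X, np.ones(n)]) @ coef)
    preds = np.array(preds)            # (n_splits, n_samples) forecast ensemble
    return preds.mean(axis=0), preds   # aggregated forecast and full ensemble
```

The full ensemble is retained so that prediction intervals (e.g., the 2.5th and 97.5th percentiles across members) can be estimated alongside the aggregated forecast.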
Meanwhile, the best ANN model for forecasting June rainfall two months in advance has 3 hidden neurons and 9 input variables, including the Atlantic Meridional Mode (AMM), the East Pacific/North Pacific Oscillation (EPNP), the North Atlantic Oscillation (NAO), and the Scandinavia Pattern (SCAND), which differ significantly from those of the May model. The averaged RMSE values of the model are 39.5, 46.1, and 62.1 mm, and the correlation coefficient values are 0.853, 0.771, and 0.683 for the training, validation, and test datasets, respectively. The uncertainty band for the June rainfall forecasts has a coverage of 67.9% with an average band width of 83.0 mm, both slightly larger than those obtained for May. Both ANN models provide satisfactory forecasting performance despite the limited learning data. The middle rainfall range near the normal is well bracketed by the estimated prediction interval; however, extremely high or low rainfall tends to fall outside the band. Further research will be required to improve the ability of the ANN models to capture extremely high and low rainfall. This can be achieved by incorporating other climate indices that may play an important role at varying time scales, pre-processing the learning data, and combining the use of other statistical models.
It can be concluded that the ANN models with MCCVA enable us to construct a reasonable prediction uncertainty interval as well as to provide an ensemble of rainfall forecasts with improved reliability by reducing variance. The ANN forecasting models developed for the study area are expected to be used for effective and timely water resource management during May and June, which are prone to drought.
Author Contributions: All authors substantially contributed in conceiving and designing the approach and realizing this manuscript. J.L. implemented the artificial neural network models with Monte-Carlo cross-validation and analyzed the results. C.-G.K. and J.E.L. worked on the analysis and presentation of the results. N.W.K. and H.K. analyzed the results and supervised the entire research. All five authors jointly wrote the paper. All authors have read and approved the final manuscript.
Funding: This research was funded by the Korea Institute of Civil Engineering and Building Technology (grant number 20200041-001) and the APC was funded by the Korea Institute of Civil Engineering and Building Technology.