Monthly Precipitation Forecasting in the Han River Basin, South Korea, Using Large-Scale Teleconnections and Multiple Regression Models

In this study, long-term precipitation forecasting models capable of reflecting constantly changing climate characteristics and providing forecasts for up to 12 months in advance were developed using lagged correlations with global and local climate indices. These models were applied to predict monthly precipitation in the Han River basin, South Korea. Based on the lead month of forecast, 10 climate indices with high correlations were selected and combined to construct four-variable multiple regression models for monthly precipitation forecasting. The forecast results for the analytical period (2010–2019) showed that predictability was low for some summer seasons but satisfactory for other seasons and long periods. In the goodness-of-fit test results, the Nash–Sutcliffe efficiency (0.48–0.57) and the ratio of the root mean square error to the standard deviation of the observation (0.66–0.72) were evaluated to be satisfactory while the percent bias (9.4%–15.5%) was evaluated to be between very good and good. Due to the nature of the statistical models, however, the predictability is highly likely to be reduced if climate phenomena that are different from the statistical characteristics of the past appear in the forecast targets or predictors. The forecast results were also presented as tercile probability information (below normal, normal, above normal) through a comparison with the observation data of the past 30 years. The results are expected to be utilized as useful forecast information in practice if the predictability for some periods is improved.


Introduction
Reliable prediction of precipitation is essential for stable operation and management of water resources. In particular, a long-term forecast ranging from one month to several months, which corresponds to seasonal forecasts, is very important for responding to disasters caused by extreme weather events, such as droughts, floods, and heat waves, and securing stable water resources. Forecasting models have been widely used to predict climate elements, such as precipitation and temperature. These models can be mainly divided into two types depending on their approach [1-3]. The first type uses dynamic models that obtain numerical solutions through various equations that describe the fluid phenomena in the atmosphere, ocean, and land. The general circulation models (GCMs) are widely used here. The second type performs forecasting through statistical relationships between the forecast targets and the atmosphere-ocean-land data.
Statistical approaches in use include analog methods, time series models, multiple regression models, canonical correlation models, and artificial neural network models.
Some countries, including South Korea, Australia, the United States, the United Kingdom, and Japan, use numerical dynamic models to provide monthly and seasonal forecasting data on precipitation and temperature. With the development of modeling and computing technologies, short-term (a few days or less) predictability has increased through the reproduction of complex and nonlinear phenomena. However, the accuracy of long-term prediction of more than several months is still not sufficient when using numerical dynamic models, due to the uncertainty caused by initial conditions and the error that grows with the integration time of these models [1,4,5]. Statistical models, on the other hand, are relatively flexible in model construction and have the potential to improve the forecast period and predictability as observations and related data accumulate, although their predictability is unstable compared to dynamic models. Until 2013, the long-term weather forecast in Australia was performed using statistical techniques [6], and the runoff forecast data for major nationwide branches are still being generated using statistical techniques. Statistical techniques are also utilized in South Korea, the United States, and Japan, and many studies have attempted forecasts based on various statistical approaches because of their benefits compared to dynamic models, despite the fact that they cannot explain physical relationships among natural phenomena. A representative statistical approach in recent years is to derive monthly or seasonal forecast information by constructing forecasting models, such as multiple regression and artificial neural network models, from the teleconnections between large-scale climate indices and forecast targets (e.g., precipitation).
Many statistical forecasts have been performed with the focus on precipitation and temperature, as well as on runoff [7-12], snowfall [13-15], crop yield [16-19], and brown planthopper infestation [20].
There are two limitations in precipitation forecasting studies using teleconnections with climate indices, other than the predictability issue. One limitation is that it is difficult to judge predictability on future periods that were not used in the analysis even though the studies [21,25-27,33,34] generally present optimized results for calibration and validation periods by dividing the past analytical period. This means that if forecasting models are derived based on the correlation results for a certain period only, the utilization of the models can be difficult when teleconnections vary due to climate change in the future. The other limitation is that the lead time of the forecast is limited by the lead time of the predictors. For example, if the analysis results for correlations with climate data that occurred in the same month as the forecast target are presented [35-37], it is not possible to forecast more than one month using these results although statistical interactions can be identified. In other words, when n-month preceding data of climate indices are selected as predictors, only n months or less can be forecasted.
Therefore, this study was conducted with a focus on building more flexible models and providing forecast information to improve utilization for practical purposes. The monthly precipitation of the Han River basin in South Korea was set as the forecast target. The 39 global climate indices and 6 local climate indices in the target basin, including precipitation and temperature, were utilized as predictors. Statistical multiple regression models were used as the forecasting technique, and forecasting models capable of predicting up to 12 months in advance were constructed by analyzing the lagged correlations with climate indices and selecting the optimal predictors according to the forecast lead time. The range of forecast values was derived from a number of forecasting models constructed for each target month, and tercile probability information was also presented for monthly forecasts through a comparison with the observation data of the past 30 years.

Study Area and Data Sets
The Han River basin, which is the focus area of this study, is located in the center of the Korean Peninsula as shown in Figure 1. It is a large basin that represents approximately 19% (41,947 km²) of the total area of the Korean Peninsula. This area includes Seoul, the capital of South Korea, with more than 27 million people. Table 1 shows the names and geographical locations of the automated synoptic observation system (ASOS) stations of the Korea Meteorological Administration (KMA). The monthly precipitation for the last 10 years (2010-2019) was selected as forecast targets. A total of 45 indices, including 39 global climate indices, such as the Antarctic oscillation (AAO) and the Atlantic meridional mode (AMM), provided by the National Oceanic and Atmospheric Administration (NOAA) and 6 local climate indices for the target basin were utilized as predictors. Table 2 summarizes the global and local climate indices that were used as predictors, where the global indices are monthly data provided by each institution.
To analyze the lagged correlations between precipitation and predictors, the data of the predictors from 1968 to 2019 were used, where the data of precipitation and other local climate indices were area-averaged using the monthly data from 34 ASOS stations of the KMA in and around the basin.

Teleconnection Analysis
To determine predictors for constructing forecasting models, the correlations between the past precipitation data and the preceding 1-18 months of data for each predictor were analyzed.
Here, the lagged correlations between the forecast target (precipitation) and predictors were analyzed using the data of the 40 years preceding the target month, considering the short-term variability of each index and their long-term variability due to climate change. Figure 2 shows the correlations between the precipitation and the preceding data for 1-18 months for each index from the past 40 years (1970-2009) based on January and July 2010, and Figure 3 shows the results of analyzing the correlations between the monthly precipitation and each predictor over the past 40 years (1979-2018) based on January and July 2019. The interpretation of correlation coefficients differs depending on the field of study [39], and as suggested by Evans [40], in statistics, absolute values above 0.4 are considered to be moderate. Therefore, in this study, when the absolute value of the correlation was 0.4 or higher, the numerical values were also indicated. Red indicates a positive correlation, and blue indicates a negative one. The absence of color means that there is no climate index available for that period.
As shown in Figure 2(a), the correlations between the precipitation in January from 1970 to 2009, and the predictors revealed the highest value, −0.483 in the EASMI data of the preceding 17 months, i.e., EASMI (17). Other significant correlations were found in TPI (7) (−0.420) and HrDL (9) (0.407). Meanwhile, as shown in Figure 2(b), TNA (12) (0.539) exhibited the highest correlation with the precipitation in July from 1970 to 2009, followed by NTA (11) (0.520) and NTA (12) (0.517). The correlation results were different for each month of each year. In general, significant positive correlations were observed for summer (July to September), and significant negative correlations were observed for winter.
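The lagged correlation analysis described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the `(year, month)` dictionary layout, and the rule for skipping lags with too few data points are all assumptions made for the example. The only details taken from the text are the 1-18 month lag range and the 40-year window ending the year before the target year (1970-2009 for 2010, 1979-2018 for 2019).

```python
import numpy as np

def training_window(target_year, n_years=40):
    """First and last year of the n_years window ending the year before target_year."""
    return target_year - n_years, target_year - 1

def lagged_correlations(precip, index, target_month, target_year,
                        max_lag=18, n_years=40):
    """Pearson r between target-month precipitation and an index lagged 1..max_lag months.

    precip, index: dicts mapping (year, month) -> value (hypothetical layout).
    Returns {lag: r}; lags with fewer than three paired values are skipped.
    """
    first, last = training_window(target_year, n_years)
    out = {}
    for lag in range(1, max_lag + 1):
        xs, ys = [], []
        for year in range(first, last + 1):
            # Step back `lag` months from (year, target_month).
            total = year * 12 + (target_month - 1) - lag
            key = (total // 12, total % 12 + 1)
            if key in index and (year, target_month) in precip:
                xs.append(index[key])
                ys.append(precip[(year, target_month)])
        if len(xs) > 2:
            out[lag] = float(np.corrcoef(xs, ys)[0, 1])
    return out
```

Calling `lagged_correlations` once per index and per target month would give the kind of lag-by-index correlation matrix that Figures 2 and 3 display, and moving `target_year` forward shifts the 40-year window along with it.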

Results of Monthly Precipitation Forecasting
[Top 10 climate indices used in the monthly forecasting models, 2010-01 to 2019-12]
As shown in the above two examples, the forecast ranges for spring and winter, when precipitation is relatively low, were narrow and not significantly different from the observed values. However, the range was very wide in the summer when there was a large amount of precipitation, and the difference from the observed value was also large for some months. Figures 11 and 12 show the 12-month forecast results predicted in June 2010 and in June 2018, respectively. Compared to the results in Figures 9 and 10, the forecast ranges were different depending on the lead time of forecasting, but the impact of the lead time was not significant.
As shown in Figures 9 to 12, it is possible to derive the forecast values for the next 12 months based on the forecast time. Figure 13 compares the average values of the monthly forecasts with the observed values for the entire analytical period (January 2010 to December 2019). As up to 12 average values for the forecasts for each month were generated depending on the lead month (LM) of the forecast, these were divided by shades. As shown in Figure 13, there was no significant difference in forecast depending on the lead month; however, there were clear differences from summer observation data for certain years. In July 2011 and 2013, when there was much precipitation, the models did not sufficiently reproduce the heavy rain. Meanwhile, for 2014, 2015, and 2019, when the precipitation was low, the predictability of summertime was found to be relatively low. For the other periods, however, the tendencies of the observation data were reproduced relatively well.
Figure 14 shows the results of the goodness-of-fit test analyzed by the lead time for the forecast results presented in Figure 13.
The percent bias (PBIAS) ranged from 9.4% to 15.5%, and the ratio of the root mean square error (RMSE) to the standard deviation of the observation (RSR) ranged from 0.66 to 0.72. The Nash-Sutcliffe efficiency (NSE) ranged from 0.48 to 0.57, and the Pearson correlation coefficient (r) ranged from 0.73 to 0.78. When compared with the evaluation grade of each index suggested by Moriasi et al. [42], NSE and RSR were evaluated to be satisfactory while PBIAS was rated to be between very good and good. In other words, the predictability for some specific periods was low, but the predictability over a long period was found to be satisfactory. There was no tendency according to the lead month of forecast (1-12 months of lead time), and some goodness-of-fit indices exhibited better results for the 12-month lead time. These appear to be the characteristics of statistical models.
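The four goodness-of-fit indices above can be computed as in the sketch below. The formulas follow the commonly used definitions associated with Moriasi et al. [42]; the sign convention for PBIAS (observed minus simulated) and the function name are assumptions here, since the text does not spell them out.

```python
import numpy as np

def goodness_of_fit(obs, sim):
    """Return PBIAS (%), RSR, NSE and Pearson r for paired observed/simulated series."""
    obs = np.asarray(obs, dtype=float)
    sim = np.asarray(sim, dtype=float)
    resid = obs - sim
    sse = np.sum(resid ** 2)                 # sum of squared errors
    sst = np.sum((obs - obs.mean()) ** 2)    # spread of the observations
    return {
        "PBIAS": float(100.0 * resid.sum() / obs.sum()),  # assumed sign: obs - sim
        "RSR": float(np.sqrt(sse) / np.sqrt(sst)),        # RMSE / std. dev. of obs
        "NSE": float(1.0 - sse / sst),
        "r": float(np.corrcoef(obs, sim)[0, 1]),
    }
```

Under these definitions NSE equals 1 − RSR², which appears roughly consistent with the reported ranges: an RSR of 0.66 to 0.72 corresponds to an NSE of about 0.56 to 0.48.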

Probabilities of Monthly Precipitation Forecasts
Some countries whose government agencies perform long-term forecasts, such as South Korea, Australia, the United States, the United Kingdom, and Japan, provide tercile probability information (below normal, normal, above normal) through a comparison with historical observations. In this study, the tercile probability was analyzed for the 2520 forecasts of each month compared to the observed precipitation data over the past 30 years. Figure 15(a) uses box plots to compare the forecast results from January to December 2010 predicted in December 2009 with the observation data of the past 30 years for each month, and Figure 15(b) likewise compares the forecasts for the next 12 months predicted in December 2018 with the observations over the past 30 years. From the figures, it is possible to qualitatively judge the levels of the forecast results compared to the past observation data. Figure 16 quantitatively shows the results in Figure 15 as tercile probability information. Figure 16 shows that more precipitation than normal was forecasted for February and July in 2010, and normal precipitation was highly probable for January and April 2019.
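One plausible way to turn a set of forecasts into tercile probabilities is sketched below. It assumes that the category boundaries are the 33.3rd and 66.7th percentiles of the 30 years of observations for the target month and that each forecast counts equally; the text does not state the exact procedure, so both choices, and the function name, are illustrative assumptions.

```python
import numpy as np

def tercile_probabilities(forecasts, climatology):
    """Share of forecasts falling below, within, and above the climatological terciles.

    forecasts:   forecast values for one target month (e.g. one per regression model).
    climatology: observed values for the same calendar month over the reference years.
    """
    forecasts = np.asarray(forecasts, dtype=float)
    # Assumed boundaries: lower and upper terciles of the observations.
    lower, upper = np.percentile(climatology, [100 / 3, 200 / 3])
    below = float(np.mean(forecasts < lower))
    above = float(np.mean(forecasts > upper))
    return {"below": below, "normal": 1.0 - below - above, "above": above}
```

Applied month by month, the three shares correspond to the below-normal, normal, and above-normal categories presented in Figure 16.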

Conclusions
In this study, statistical precipitation forecasting models capable of reflecting long-term climate change and providing forecasts up to 12 months in advance were developed through the continuous update of the past climate data and the use of selected predictors according to the forecast lead time.
As a test, the monthly precipitation of the Han River basin was set as the forecast target, and 1-18 months of lagged correlations were analyzed for 45 global and local climate indices. Ten climate indices with high correlations were selected for each target month according to the lead month of forecast. These were combined to construct four-variable multiple regression models for monthly precipitation forecasting. In this process, many indices with positive correlations were selected for summer, and indices with negative correlations were selected for winter. In the course of deriving regression models, the models that had multicollinearity between the independent variables or derived values smaller than zero for the target month were excluded. In addition, leave-one-out cross-validation was performed for the data of the past 40 years to derive 2520 forecasting models with good predictability for each month.
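The model-building step can be illustrated with the following sketch, which enumerates every four-variable combination of the candidate predictors and scores each ordinary least-squares model by leave-one-out cross-validation. Ranking by leave-one-out RMSE is an assumption, since the text does not name the selection score, and the function names are hypothetical. The sketch also omits the exclusion of models with multicollinearity or negative predicted precipitation.

```python
from itertools import combinations
import numpy as np

def loocv_rmse(X, y):
    """Leave-one-out RMSE of an ordinary least-squares fit with an intercept."""
    n = len(y)
    A = np.column_stack([np.ones(n), X])
    errors = []
    for i in range(n):
        keep = np.arange(n) != i              # drop year i, fit on the rest
        coef, *_ = np.linalg.lstsq(A[keep], y[keep], rcond=None)
        errors.append(y[i] - A[i] @ coef)     # error on the held-out year
    return float(np.sqrt(np.mean(np.square(errors))))

def rank_four_variable_models(predictors, y, k=4):
    """Score every k-column subset of predictors; return [(rmse, columns), ...] sorted.

    predictors: array of shape (n_years, n_candidates), e.g. 40 x 10.
    """
    results = []
    for cols in combinations(range(predictors.shape[1]), k):
        results.append((loocv_rmse(predictors[:, list(cols)], y), cols))
    return sorted(results)
```

With 10 candidate predictors there are 210 four-variable combinations per lead month, so an exhaustive search of this kind stays computationally cheap.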
When examining the climate indices used in the monthly forecasting models for the entire period (2010-2019), we found that similar indices were also used in previous studies for the analysis of precipitation in South Korea for the months of February, May, June, October, and November. The results of comparing monthly forecasts (forecast ranges) with observation data for the period between 2010 and 2019 revealed that predictability was relatively low for the summer of 2011 and 2013 when the precipitation was high due to sudden heavy rain, and for the summer of 2014, 2015, and 2019 when the precipitation was significantly lower than normal. For the other periods, however, the tendencies of the observation data were reproduced relatively well. In the goodness-of-fit analysis results, NSE and RSR were evaluated to be satisfactory while PBIAS was evaluated to be between very good and good. In other words, predictability for some specific periods was low, but predictability over a long period was found to be satisfactory. The monthly forecast results were also presented as tercile probability information (below normal, normal, above normal) through a comparison with the observation data of the past 30 years to be used for practical purposes in the future.
The novelty of this study lies in the fact that, unlike previous studies, it developed a method that allows forecasts for future periods (up to 12 months), by updating the relationships between the forecast targets and predictors of each month based on the statistical characteristics of the past data. However, the predictability of some summer periods is very low. This is attributed to irregular rainfall characteristics (such as heavy rains caused by typhoons and monsoonal front) in summer in northeast Asia, including Korea. In particular, the prediction of summer precipitation is a very important issue in water management related to the response to flooding. Due to the limitations of statistical models, however, predictability is bound to decline if precipitation or climate indices that are different from the past statistical characteristics occur. That is, for now, there is a problem of poor predictability due to insufficient past available data required to find a statistical relationship, but in the future, if the data continues to accumulate and some statistical relationship is explained from the accumulated data, the predictability for a specific period may improve. In future studies, it will be important to track predictors that can reflect abnormal weather conditions (e.g., sudden typhoons, heavy rain, and temporary droughts) and to improve forecasting models that consider such predictors.

Discussions
The investigation of teleconnections with climate signals, as performed by many researchers, is the first process to predict precipitation statistically using the relationships with climate indices. Many studies have been successful in deriving meaningful correlations between precipitation and climate indices [2,21,24-31,33,37], but using these correlations to obtain reliable long-term precipitation estimates is still difficult. As shown in this study, the results show high accuracy for regions or periods with relatively stationary climate characteristics, but the prediction accuracy is still low for irregular climatic conditions, making it difficult to use in practice. In particular, the predictability of the heavy rains caused by typhoons or monsoonal fronts in northeast Asia is significantly lower. The reasons for this may be a failure to find a signal of a climate index related to such a heavy rain, a failure to properly understand the signal of precipitation, or a lack of a significant teleconnection between the precipitation and climate indices in the historical data. Since the relationship between precipitation and climate indices is very complicated, and the signals of precipitation and climate indices have regular or irregular variability depending on temporal scales, various studies have been conducted to analyze these variabilities. He and Guan [43] developed a wavelet-based method to solve these problems, presenting three difficulties (interdependent difficulty of explanatory variables, multitemporal component difficulty, nonuniform time-lag difficulty) in the process of correlation and variability analysis.
For the interdependent difficulty, a multicollinearity validation was performed to avoid intercorrelations between the climate indices in this study. The variance inflation factor (VIF) was used as a criterion for multicollinearity. Various recommendations for acceptable levels of VIF have been published in the literature, and this study applied a value of 10, which is the most commonly recommended maximum level of VIF [44-47].
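The VIF screening can be sketched as follows, using the standard definition VIF_j = 1 / (1 − R_j²), where R_j² is obtained by regressing predictor j on the remaining predictors. The function names are hypothetical, and treating a maximum VIF of exactly 10 as acceptable is an assumption about how the threshold was applied.

```python
import numpy as np

def variance_inflation_factors(X):
    """VIF of each column of X, from regressing it on the other columns plus an intercept."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    vifs = []
    for j in range(p):
        target = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ coef
        r2 = 1.0 - resid @ resid / np.sum((target - target.mean()) ** 2)
        vifs.append(float(1.0 / (1.0 - r2)))
    return vifs

def passes_vif(X, threshold=10.0):
    """True when no predictor exceeds the VIF threshold (10 in this study)."""
    return max(variance_inflation_factors(X)) <= threshold
```

A candidate four-variable model would be dropped whenever `passes_vif` returns `False` for its predictor matrix.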
Multitemporal component analysis is considered to be a very useful way to examine several limited climate indices. However, this study tried to include as many climate indices as possible. As a result, a total of 45 indices (39 global and 6 local) were used as candidates as predictors. Using many climate indices does not necessarily increase predictability, but it can increase the utility in constructing statistically flexible models. A direct statistical relationship was analyzed for the original time series between precipitation and climate indices. As He and Guan [43] mentioned, there is a limitation that a direct comparison of the original time series can only reveal part of the relationship, and if various correlations for different temporal scales are found, it can be very useful for prediction. In the future, more detailed analyses of the correlations at different temporal scales will be performed for the climate indices that have been found to be highly correlated in this study through other studies.
For the non-uniform time-lag difficulty, as in the examples of Figures 2 and 3, since the correlation was analyzed using data from the past 40 years according to the forecast month, the results of the correlation analysis differ from month to month. This is the difference from previous studies. Most of the existing studies [2,21,23-25,27,31,33,34] have constructed a "fixed" model using these climate indices after investigating the lag time (month) and the climate indices that showed the maximum (or higher) correlation among all available historical data. If the signal characteristics of the climate indices differ from the past (i.e., the statistical characteristics vary with time), there will be restrictions on the continued use of the "fixed" model in the future. However, in this study, the models were constructed by flexibly analyzing the corresponding past data according to the forecast month, and even if the climate characteristics changed, the statistical relationship was newly derived from the moved historical data to update the forecast models. These "flexible" models have the advantage of being able to actively respond to long-term climate change.
Acknowledgments:
This research was supported by a grant from a Strategic Research Project (Developing technology for water scarcity risk assessment and securing water resources of small and medium sized catchments against abnormal climate and extreme drought) funded by the Korea Institute of Civil Engineering and Building Technology. The authors thank the editors of the journal and the reviewers for their valuable comments and suggestions for improvements.

Conflicts of Interest:
The authors declare no conflict of interest.