Linking Singular Spectrum Analysis and Machine Learning for Monthly Rainfall Forecasting

Monthly rainfall forecasts can be translated into monthly runoff predictions that could support water resources planning and management activities. Therefore, development of monthly rainfall forecasting models in reservoir watersheds is essential for generating future rainfall amounts as an input to a water-resources-system simulation model to predict water shortage conditions. This research aims to examine the reliability of linking a data preprocessing method (singular spectrum analysis, SSA) with machine learning, least-squares support vector regression (LS-SVR), and random forest (RF), for monthly rainfall forecasting in two reservoir watersheds (Deji and Shihmen reservoir watersheds) located in Taiwan. Merging SSA with LS-SVR and RF, the hybrid models (SSA-LSSVR and SSA-RF) were developed and compared with the standard models (LS-SVR and RF). The proposed models were calibrated and validated using the watersheds’ observed areal monthly rainfalls separated into 70 percent of data for calibration and 30 percent of data for validation. Model performances were evaluated using two accuracy measures, root mean square error (RMSE) and Nash–Sutcliffe efficiency (NSE). Results show that the hybrid models could efficiently forecast monthly rainfalls. Nonetheless, the performances of the hybrid models vary in both watersheds which suggests that prior knowledge about the watershed’s hydrological behavior would be helpful to implement the appropriate model. Overall, the hybrid models significantly surpass the standard models for the two studied watersheds, which indicates that the proposed models are a prudent modeling approach that could be employed in the current research regions for monthly rainfall forecasting.


Introduction
In hydrological research, timely rainfall forecasts are of major concern, and can be used to deliver valuable data for agrarian development, controlling of water resources, and application of crop insurance [1,2]. Additionally, accurately forecast rainfalls can be used for reservoir operation and flooding prevention [3,4]. However, the spatial and temporal rainfall distribution has substantial consequences on the water availability on land and thus on agrarian operations. Considering that agricultural operations and crop production rely on the rainfall distribution, monthly rainfall forecasting is critical for agricultural scheduling and flood control. However, accurate forecasting of monthly rainfall remains a major challenge among hydrologists. Monthly rainfall forecasting with an appropriate method is an essential requirement to support water resources management [5,6]. Therefore, monthly rainfall forecasting is widely applicable in the field of hydrology [7,8]. For example, in Taiwan, due to industrial expansion and population growth, the demand for water has risen, and the hydrologic variability has increased, possibly because of the impact of climate change [3]. The forecasting of monthly precipitation is then very critical in Taiwan.
Genetic Programming models. Ref [40] developed a downscaling method by LS-SVR for improving the downscaling of extreme rainfall. Ref [41] used LS-SVR to forecast the reservoir inflows of Demirkopru Dam in Turkey. The findings promote LS-SVR efficiency. Ref [42] forecast multistep-ahead flow by means of LS-SVR, with the everyday river flow changes modelled prior to forecasting. Ref [43] used LS-SVR to calibrate parameters and to develop models. The study shows that the SVR approach is superior to the Box-Jenkins approach. The study concludes that the explanation for SVR's good performance lies in the non-linear characteristic of the captured and used SVR space.
More recently, interest in applications of random forest (RF) has extended to a number of areas [44][45][46]. However, there are only few applications of RF in hydrology. Ref [40] applied RF and LS-SVR to boost downscaling of intense precipitation. Ref [47] used RF for rainfall forecasting at Besut station on the eastern coast of the Malaysian Peninsula. Ref [48] carried out a comprehensive literature review to compare various approaches of machine learning, including SVR and RF. Ref [49] argues that RF offers a good alternative to SVR and has often performed better than SVR. Ref [50] developed an RF-based evaluation model for evaluating regional flood hazards, and [51] suggested a RF-based drought forecast to predict the monthly standardized precipitation index (SPI) time series. Ref [52] forecasted the start of winter rain by RF in Australia. Ref [53] applied RF in Vietnam to forecast the incoming flood of the Hoa Binh reservoir. Ref [54] proposed RF for daily and monthly rainfall forecasting. Ref [55] compared ANN, SVR, and RF performances in general flood applications and RF delivered the best output. Nevertheless, there is no study of RF-based models with a data preprocessing technique for monthly rainfall forecasts, which prompts this current study to introduce a hybrid model coupling RF with a data preprocessing technique.
Today, data preprocessing approaches have been applied to break down observed data into subseries so as to boost the accuracy of time series forecasting. Efficient forecast outcomes can be enhanced with independent modeling of subseries. The most extensively applied approaches of preprocessing data can be found in wavelet transformation [56], singular spectrum analysis (SSA) [57], ensemble empirical mode decomposition [58], and principal component analysis [59].
SSA has been applied to many different areas such as medical engineering [60], economics [61] and hydrology [62]. The SSA strategy comprises two phases: decomposition and reconstruction. In this study, only decomposition was implemented. Decomposition into different principle components (PCs) of the initial time series including trend patterns, oscillating components and noise [57]. It is used to conduct a spectrum analysis of the input data, remove high-frequency components, and invert the remaining components to produce a 'filter' time series. In SSA, different parts of a time series can be extracted [63,64]. For data pretreatment and modeling, SSA has shown that it is a successful algorithm [65,66]. Ref [17] introduced SSA in order to forecast rainfall at an Indian weather station. The results demonstrate that the SSA technique can separate the different components of the time series efficiently and allow forecasting day-to-day precipitation for a long period, such as one year in a single cycle, with fair precision. Ref [67] compare wavelet analysis with SSA. The results indicate that SSA performs better than WA. Ref [68] propose a hybrid model of an SSA adaptive noise reduction algorithm and a traditional feed-forward neural network forecasting, with the goal of significantly improving short-and long-duration time series forecasting. The findings indicate that the coupled model has superior performance as compared with the forecasts provided directly on raw data by the same network and is therefore well suited for forecasting short and noisy time series with the intrinsic deterministic method of data output. Ref [69] used SSA for decomposing and reconstructing water consumption with six climate variables so as to build a time series for seasonal forecasting. The SSA model is an efficient tool that can split the original time series down into a quantity of separate components, including trend patterns, oscillating components, and noise. Ref [62] applied SSA to annual precipitation, monthly runoff, and hourly water temperature time series to evaluate its capacity and forecasting ability to distinguish significant information from those series. The research concludes that the SSA has the ability to obtain and provide excellent predictions for significant hydrological time series components with distinctive uneven behaviors, such as precipitation and runoff series.
Ref [70] introduced SSA to extract a trend. The study showed that SSA is an attractive trend extraction technique, since it needs no model description of time series and trend, extracts noisy time series trends with uncertain oscillations in time series, and is sensitive to outliers. The aforementioned literature shows the significance of using a suitable data preprocessing method (SSA) to raw input signals to supply high quality data before being implemented as a model input. Although a large number of previous studies have been conducted to explore the advantage of coupled data preprocessing and machine leaning, to the best of our knowledge, this is the first time that SSA coupled with LSSVM and RF has been used for monthly rainfall forecasting. Subsequently, these two coupled models (i.e., SSA-LSSVM and SSA-RF) are compared to find a better model for each reservoir watershed.
There is convincing evidence that the hybrid modeling approach based on LS-SVR and RF is reliable and that the SSA produces superior outputs. Thus, for improving the forecasting performance of the standard models, the data preprocessing technique is adopted in the current study. The primary objectives are as follows: 1.
Linking SSA with machine learning techniques (i.e., LS-SVR and RF) to construct hybrid models (SSA-LSSVR and SSA-RF) for monthly rainfall forecasting in two reservoir watersheds of Taiwan where the hybrid models have not been applied before.

2.
Comparison between the hybrid models (i.e., SSA-LSSVR and SSA-RF) and the standard models (i.e., LS-SVR and RF) to validate the efficiency of the data preprocessing technique.
The remainder of this paper is arranged as follows. Section 2 ("Study Sites and Data collection") summarizes the study sites and the data collection. Section 3 ("Methodology") briefly describes the machine learning techniques (LS-SVR and RF), data preprocessing technique (SSA), coupling of SSA with LS-SVR and RF, and measurement of model performance. Section 4 ("Results and Discussion") shows the results of the proposed models for monthly rainfall forecasting, and elaborates and discusses the findings of the current study. Section 5 ("Conclusions") gives the conclusions and considers the need for future work.

Study Sites and Data Collection
Taiwan is located in the Western Pacific Ocean. The East Asian Monsoon characterizes its climate. The main sources of rainfall are heavy rainfall systems (Mei-Yu front) and typhoon events. The average rainfall for Taiwan is 2510 mm per year, spatially and temporally unequally distributed. About 70 percent of the annual rainfall occurs in the rainy season (May-October), and the dry season occurs from November to April.
This present study selected the Shihmen and Deji reservoir watersheds as the study areas, as shown in Figure 1. The study areas were chosen as the research interest for rainfall forecasting because they are critical to water resources in Taiwan. The watershed area of Shihmen reservoir is situated in the Danshuei river basin, in the north of Taiwan. The storage capacity in this reservoir is around 309 million m 3 . The watershed's area is 763 km 2 and the reservoir's elevation is 250 m above sea level. The annual average rainfall is about 2250 mm. In 1964, Shihmen reservoir was constructed as a multipurpose water supply storage tank for water irrigation and was used to produce hydroelectricity, prevent flooding, and for recreation [71]. Deji reservoir is situated in the middle of northern Taiwan. It has a capacity of about 266 million m 3 . The watershed covers an area of 1411 m above sea level. The annual average rainfall is about 2450 mm. Like Shihmen reservoir, Deji reservoir has been used since 1974 to supply municipal water, to produce hydropower, for recreation, and to prevent flooding.
Data were collected from Taiwan Water Resource Bureau, which includes the monthly rainfall data during 1958-2018 for each rain gauge in the Shihmen watershed and during 1981-2017 for each rain gauge in the Deji Reservoir watershed. For each reservoir watershed, the first 70% of the entire data collection was used for training, and the remaining 30% for validation. Tables 1 and 2 present geographical information on the two studied watersheds. The areal monthly rainfalls for each reservoir watershed were calculated using the Thiessen polygon method to calculate areas in relationship to Appl. Sci. 2020, 10, 3224 5 of 20 the rain gauges and thereby compute the average amount of rainfall (areal rainfall) that fell in the watershed, which was then used for constructing the rainfall forecasting models in the present study. Data were collected from Taiwan Water Resource Bureau, which includes the monthly rainfall data during 1958-2018 for each rain gauge in the Shihmen watershed and during 1981-2017 for each rain gauge in the Deji Reservoir watershed. For each reservoir watershed, the first 70% of the entire data collection was used for training, and the remaining 30% for validation. Tables 1 and 2 present geographical information on the two studied watersheds. The areal monthly rainfalls for each reservoir watershed were calculated using the Thiessen polygon method to calculate areas in relationship to the rain gauges and thereby compute the average amount of rainfall (areal rainfall) that fell in the watershed, which was then used for constructing the rainfall forecasting models in the present study.   To understand the characteristics of the monthly rainfall series in Deji and Shihmen Reservoir watersheds, the descriptive statistics were used for exploring their rainfall characteristics. The statistical behavior of any hydrological series can be described based on certain parameters such as mean, standard deviation, kurtosis, skewness, and coefficient of variation. In the present study, the mean, standard deviation, skewness, and kurtosis are used to describe the variability of monthly rainfall. Using the observed data, the descriptive statistics of rainfall data were estimated. Table 3 shows the statistical parameters of the monthly rainfall series for the two reservoir watersheds. The standard deviation value indicates that the precipitation pattern varies greatly. The skewness values show that rainfall patterns do not comply with a normal distribution. The skewness and kurtosis indicate differences in their statistical distribution.

Least-Squares Support Vector Machine
The least-squares support vector machine (LS-SVM), a new type of SVM, comprises a sequence of similar supervised learning techniques that analyze data and identify patterns. Based on the constraints of equality rather than inequality, the LS-SVM technique is employed for classification and regression [72]. Instead of resolving the convex quadratic programming problem, LS-SVM solutions are reached by resolving a series of linear equations. This alteration decreases the computational complexity and makes the LS-SVM more attractive. In addition, the model has the benefits of simplifying the problem and solution finding without losing precision [73,74]. More detailed information about LS-SVM is available from [75]. The LS-SVM technique has been applied to a variety of fields such as text classification [23,76], image processing [77,78], and time series forecasting [79,80]. In this analysis, the LS-SVR is applied to forecast monthly rainfall. In the following section, we briefly present the basic theory on LS-SVR in time series forecasting.
Considering a training set as input data x i and output y i , the regression function that links the input vector to the output can be defined as [81]: where the nonlinear mapping ϕ(x) maps the input data into a higher-dimensional feature space.
Transforming the regression problem in Equation (1) into a constrained quadratic optimization problem, by minimizing the cost variable, w and b can be estimated. In the structural minimization principle, the regression problem can be formulated as: subject to the following constraints: 7 of 20 where γ denotes the penalty term and e i is the training error for x i . In order to solve the optimization problem, the solution for optimizing the LS-SVM is to create a Lagrangian function as follows: where α i are the Lagrange multipliers (support values). Under the state of the Karush condition [82], we can partially differentiate the solution of Equation (4) to obtain w, b, e and alpha, respectively, as: e i and w elimination would generate a linear process rather than a quadratic problem of programming: Mercer's theorem can be fulfilled (a detailed explanation of the Mercer's condition can be found in [83]). Next, the LS-SVM method is implemented for estimating the function: Within LS-SVM, there are many possible choices for the kernel function, such as sigmoid kernel, polynomial kernel, linear kernel, and radial bias function (RBF). The latter has fewer parameters and needs to be adjusted for practical problems. In this study, we apply the Gaussian RBF kernel [84,85] followed by Equation (8): In the formula stated above two parameters need to be chosen: the penalty term (γ) and the kernel bandwidth sigma (σ). In this work, we used the LS-SVR implementation from the MATLAB toolbox LSSVMLAB.

Random Forest
Random forest (RF) is one of the most powerful machine learning models for predictive analysis, comprising of a range or group of simple trees. RF is an enhancement of the decision tree model based on the method of bagging (bootstrapping + aggregating). The bagging method may constitute the overfitting problem of computer models. In bagging, multiple stochastic error scenarios are performed by choosing training sets individually and randomly from the total historical observation set; aggregating the forecast of each individual trained model is accomplished by taking its arithmetic mean value. The decision tree model is chosen as the individual model of forecast in the RF-based model. In addition to forecasting, RF can also assess the significance of explaining variables to the forecast according to the simple rule, "the more relevant the explanatory variable, the more important the effect on the forecast" in order to use the RF-based model for the selection of variables. In bootstrap sampling, the total observations are divided into two groups for each individual model: in-bag subset (i.e., training subset) and out-of-bag subset. The out-of-bag subsets may be used to determine the significance of each explanatory variable: (1) randomizing the values for one chosen explanatory variable in the out-of-bag subset; (2) using the randomized out-of-bag subset and the original sample to make new predictions; (3) testing the significance of the chosen explanatory variable by increasing the mean square error of the new forecast. A detailed description of RF can be found in [86][87][88]. The current study employs the random forest package [88].

Singular Spectrum Analysis
SSA presents an effective approach to time series analysis in many areas of scientific research [89]. The SSA approach is particularly important when time series are decomposed into major components like trend patterns, oscillations, and noise [64]. A major benefit of the SSA approach is that it is nonparametric, meaning it can be tailored to the underlying data set and deny the need for an a priori model. For this reason, the SSA technique is considered a model-free method. According to [90], two supplementary stages are involved in the SSA method: decomposition and reconstruction. The theoretical development of SSA is as follows. Decomposition The process of decomposition comprises two steps: embedding and singular value decomposition (SVD). This decomposition is the main result of the SSA algorithm. Decomposition is important when each restored subsection can be categorized as either a trend pattern, or as an oscillation or aspect of noise.
Step 1: Embedding The embedding step is the initial step in the SSA algorithm. This method converts the observed time series to a multi-dimensional vector sequence.
The embedding technique maps the original time series by shaping K = N − L + 1 lagged vectors X i = x i , . . . , x i+L−1 T , i = 1, . . . ..K − L + 1 of the series f as: In Equation (9) we notice that X has the same elements on the anti-diagonals. This type of matrix is called a Hankel matrix.
According to [90], the size of the window L should be sufficiently large, but less than half of the time series. Larger values of L allow longer period oscillations to be detected, but a too-high value of L may require a large number of eigentriples and skip some essential major components with high contributions. Ref [62] selected window length L from experience, [91] repeatedly tried with varying window length, and another study [92] used proportional data length, such as N 3 , N 4 . In the present study, the method of proportional data length proposed by [92] is adopted.
Step 2: Singular Value Decomposition (SVD) The SVD of the trajectory matrix is the primary component of the decomposition process. From matrix X, define the covariance matrix XX T . The SVD of XX T provides a set of L eigenvalues in descending order of magnitude (λ 1 ≥ λ 2 ≥ ... ≥ λ L ≥ 0) and the corresponding eigenvectors U 1 , U 2 , . . . , U L . The SVD of the trajectory matrix can then be formulated as: . . , L, (equivalent to the ith column of XX T ).
The matrices X 1 have rank 1; thus, they are elementary matrices. The collection √ λ i , U i , V i denotes the ith eigentriple.

Coupling SSA with Machine Learning (LS-SVR and RF)
The values of monthly rainfall were standardized by their respective means and standard deviations prior to training the standard models (LS-SVR and RF). To train the standard models, the standardized rainfall values were then used. As mentioned in Section 3.1, two parameters, the penalty term (γ) and the kernel width (σ), should be selected in the calibration process of the LS-SVR model. The grid search technique is utilized for optimizing parameters during the calibrating period of the LS-SVR model. The grid search technique is capable of producing an optimum parameter set and can overcome the problems of over-fitting of the model by means of the cross-validation procedure.
RF has two parameters, the number of variables √ M (mtry) and the number of trees (ntree), which need to be determined. Ref [93] found that √ M (mtry) would usually produce near-optimum results, so the value of mtry was selected by trial and error using the value around √ M (value of M is 4). The range of ntree from 0 to 2000 was used to search the best value. However, no substantial change was achieved compared to the default value for ntree of 500. Hence, the value of 500 for ntree was adopted in this current study.
The areal monthly rainfalls in Deji Reservoir watershed from 1981-2017 and in Shihmen Reservoir watershed from 1958-2018 were used. In Deji Reservoir watershed, the first 25 years of rainfall data were applied for training and the remaining 10 years of rainfall data were used for testing, whereas in Shihmen Reservoir watershed, the first 41 years of rainfall data were used for training and the remaining 18 years of rainfall data were used for testing. The areal monthly rainfall forecasting for each reservoir watershed was implemented using the forecasting models (i.e., LS-SVR and RF). The collection of appropriate input data sets is a major concern for LS-SVR and RF modeling. Different combination of antecedent values of the rainfall data were considered as inputs (i.e., (1) R (t) ; (2) R (t) , R (t-1) ; (3) R (t) , R (t-1) , R (t-2) ). The output is rainfall times series data to be forecasted with 1-, 2-, and 3month lead-time (i.e., R (t+1) , R (t+2) , and R (t+3) ).
The hybrid models (i.e., SSA-LSSVR and SSA-RF) were obtained by combining two different methods. In view of the SSA's dominance, the hybrid models are designed to improve forecasting performance and reliability. The results of the standard and hybrid models are compared to assess the model performance in rainfall forecasting in different study catchments.
The subsequent steps give a summary of the methodological processes for the model: 1. Initially, the time series of rainfall data was decomposed into several principal components (PCs) using SSA.

2.
The relevant principal components are calculated on the basis of the trend or period of each series, and a new series for each variable is constituted by adding up the primary components to be defined. The new series was used to construct model input.

3.
LS-SVR and RF models are applied to every component of the reconstruction so that the architecture of LS-SVR and RF is different for each component of the reconstruction.

4.
Finally, LS-SVR and RF models are fed with the new series to forecast the future rainfalls for 1-, 2-, and 3-month lead-time. This is the principal idea of coupling SSA with machine learning techniques (LS-SVR and RF), as illustrated in Figure 2. Appl. Sci. 2020, 10, x FOR PEER REVIEW 10 of 20

Forecast Verification
The performances of the forecasting models are analyzed using the two criteria, root mean square error (RMSE) and the Nash-Sutcliffe model efficiency coefficient (NSE), for the calibration and validation periods. The definitions of the different criteria are presented below: where and are the observed and estimated values, respectively. Smaller values of RMSE suggest higher accuracy.
An NSE of 0.75-1.0 corresponds to a "very good" performance, 0.65-0.75 to a "good" performance, and 0.5-0.65 to a "reasonable performance", while values below 0.5 reflect unsatisfactory performance [94]. In essence, the closer NSE is to 1, the more accurate is the forecast.

Results and Discussion
The current study constructed the monthly rainfall forecasting models for 1-, 2-, and 3-month lead-time for two reservoir watersheds (i.e., Deji and Shihmen) in different regions of Taiwan. The standard models (i.e., LS-SVR and RF) and the hybrid models (i.e., SSA-LSSVR and SSA-RF) have comparable processes for the data modeling. The difference is that the standard models used the raw data as model input, whereas the hybrid models used the decomposed input data generated by SSA instead of raw data. Table 4 presents the forecasting performances for 1-, 2-, and 3-month lead-time for the standard and hybrid models during the validation period by the criteria of RMSE and NSE.

Forecast Verification
The performances of the forecasting models are analyzed using the two criteria, root mean square error (RMSE) and the Nash-Sutcliffe model efficiency coefficient (NSE), for the calibration and validation periods. The definitions of the different criteria are presented below: where y i andŷ i are the observed and estimated values, respectively. Smaller values of RMSE suggest higher accuracy.
An NSE of 0.75-1.0 corresponds to a "very good" performance, 0.65-0.75 to a "good" performance, and 0.5-0.65 to a "reasonable performance", while values below 0.5 reflect unsatisfactory performance [94]. In essence, the closer NSE is to 1, the more accurate is the forecast.

Results and Discussion
The current study constructed the monthly rainfall forecasting models for 1-, 2-, and 3-month lead-time for two reservoir watersheds (i.e., Deji and Shihmen) in different regions of Taiwan. The standard models (i.e., LS-SVR and RF) and the hybrid models (i.e., SSA-LSSVR and SSA-RF) have comparable processes for the data modeling. The difference is that the standard models used the raw data as model input, whereas the hybrid models used the decomposed input data generated by SSA instead of raw data. Table 4 presents the forecasting performances for 1-, 2-, and 3-month lead-time for the standard and hybrid models during the validation period by the criteria of RMSE and NSE. The numbers 1, 2, and 3 denote the lead times of one, two, and three months, respectively.
As observed from Table 4, it is found that RMSE and NSE exhibit very poor values for the standard models using original data when compared to the hybrid models using data generated by SSA. There is also evidence that the standard models have been poorly validated for all three lead-times (i.e., 1-, 2-, and 3-month lead-time). Using only the original data as input for rainfall forecasting seems less efficient than using the data generated by SSA. The LS-SVR's output is noise-sensitive and may not be effective when the level of noise is high. Therefore, the LS-SVR coupling with the SSA filtering the raw rainfall data will reduce the noise effects. Furthermore, the trained LS-SVR in conjunction with the SSA thus paid greater attention to the noise sensitivity, increasing the overall forecasting performance of the hybrid model. Ref [68] suggested that the coupled model (the SSA with predictive model) has superior performance as compared to those directly using the raw data and is therefore well suited for forecasting noisy time series. Figures 3 and 4 illustrate the time-series graphs of the observed and 1-month lead-time forecasted rainfalls by the standard models during the validation process for Deji and Shihmen Reservoir watersheds respectively. From the figures, we find that the forecasted values from the hybrid models are closer to the observed values than the values forecasted by the standard models. The scatterplots of observed vs. values forecasted by the standard models (LS-SVR and RF) and the hybrid models (SSA-LSSVR and SSA-RF) in Deji and Shihmen Reservoir watersheds are also illustrated in Figures 5 and 6 for 1-,2-, and 3-month lead-time forecasts, respectively. The rainfalls forecasted by the hybrid models were found to be very closely limited to the line of equality, whereas the rainfalls forecasted by the standard models are not close to the line of equality. This also reveals that the standard models without a data preprocessing technique are not efficient to forecast rainfalls and the hybrid models are able to approximate different rainfall properties much better than the standard models.
Moreover, the performances of the hybrid models (SSA-LSSVR and SSA-RF) vary in both reservoir watersheds. For example, the SSA-LSSVR forecasts better than the SSA-RF in Deji reservoir watershed for all three lead-times. NSE values above 0.75 may characteristically be considered as a "very good" performance [94]. From Table 4 The above results show that the SSA-LSSVR outperforms the SSA-RF in Deji Reservoir watershed, whereas the SSA-RF is superior to the SSA-LSSVR in Shihmen Reservoir watershed, which may be attributed to the different rainfall characteristics of the catchment. The catchment characteristics are an important feature of all models for hydrological simulation and prediction. The performance of simulation and prediction techniques for single hydrometric stations differ according to their watershed climate zone and characteristics, and the climate conditions in different watersheds will significantly affect the effectiveness of different forecasting techniques [95]. In some cases, the behavior of the watersheds is more sensitive to the effects of rainfall patterns, so some models would perform better than others [96]. The above results show that the SSA-LSSVR outperforms the SSA-RF in Deji Reservoir watershed, whereas the SSA-RF is superior to the SSA-LSSVR in Shihmen Reservoir watershed, which may be attributed to the different rainfall characteristics of the catchment. The catchment characteristics are an important feature of all models for hydrological simulation and prediction. The performance of simulation and prediction techniques for single hydrometric stations differ according to their watershed climate zone and characteristics, and the climate conditions in different watersheds will significantly affect the effectiveness of different forecasting techniques [95]. In some cases, the behavior of the watersheds is more sensitive to the effects of rainfall patterns, so some models would perform better than others [96]. Appl. Sci. 2020, 10, x FOR PEER REVIEW 13 of 20      In general, given the aforementioned differences in model performances, our findings support the view that the hybrid models are more efficient for monthly rainfall forecasting than the standard models. Furthermore, the performances of the two hybrid models (SSA-LSSVR and SSA-RF) vary in both reservoir watersheds, which might result from different hydrological behaviors (e.g., rainfall patterns). Having a prior knowledge of the rainfall characteristics (standard deviation, skewness, and kurtosis) of the studied watershed would be helpful to apply the suitable model. However, to be able to make a thorough conclusion, more study cases should be investigated in future work. On the basis of our current findings, only a general assumption can be drawn that if the standard deviation, skewness, and kurtosis values are small then SSA-LSSVR may perform better as seen in Table 3; otherwise, SSA-RF should be preferred. Thus, the current study preliminarily investigated the characteristics of the monthly rainfall series for the two reservoir watersheds. In Table 3, the rainfall data for both watersheds can be observed with higher positive skewness and very high kurtosis values. This illustrates a strong right-tailed distribution with a very sharp peak and a fat tail. Furthermore, with the standard deviation values of monthly rainfall (Table 3), it could be concluded that rainfall data for the Shihmen Reservoir watershed is slightly more scattered than that of Deji Reservoir watershed. Moreover, the difference in standard deviation values for the rainfall can justify the difference in the behavior of the reservoir watersheds. In the current study, it seems that the SSA-RF performs better than the SSA-LSSVR for the watershed with more scattered rainfall data. Nevertheless, linking the rainfall characteristics of watersheds to the forecasting model performances should be further investigated for more study sites in future work.

Conclusions
The current study examined the reliability of combining a data preprocessing method (SSA) with the machine learning techniques, LS-SVR and RF, for monthly rainfall forecasting in two reservoir watersheds of Taiwan. One of the major findings is that the hybrid models (SSA-LSSVR and SSA-RF) have better performance than the standard models (LS-SVR and RF) for both watersheds. It can be concluded that the hybrid models are a prospective modeling approach that can be applied to forecast the monthly rainfalls in the present study region. However, the two hybrid models perform differently in the two watersheds (i.e., SSA-LSSVR performs better than SSA-RF in one watershed but performs worse in the other). Linking the rainfall characteristics of watersheds to the forecasting model performances is suggested to be further investigated in future work. In addition, only one data preprocessing method (SSA) has been adopted; thus, future work may consider various preprocessing techniques (wavelets) and compare their efficiencies to achieve more accurate predictive outcomes. Moreover, only the areal monthly rainfall data from two reservoir watersheds was used in the current study. More study areas should be included for validating the findings.