Assessment of Quarterly, Semiannual and Annual Models to Forecast Monthly Rainfall Anomalies: The Case of a Tropical Andean Basin

: Rainfall forecasting is essential to manage water resources and make timely decisions to mitigate adverse effects related to unexpected events. Considering that rainfall drivers can change throughout the year, one approach to implementing forecasting models is to generate a model for each period in which the mechanisms are nearly constant, e.g., each season. The chosen predictors can be more robust, and the resulting models perform better. However, it has not been assessed whether the approach mentioned above offers better performance in forecasting models from a practical perspective in the tropical Andean region. This study evaluated quarterly, semiannual and annual models for forecasting monthly rainfall anomalies in an Andean basin to show if models implemented for fewer months outperform accuracy; all the models forecast rainfall on a monthly scale. Lagged rainfall and climate indices were used as predictors. Support vector regression (SVR) was used to select the most relevant predictors and train the models. The results showed a better performance of the annual models mainly due to the greater amount of data that SVR can take advantage of in training. If the training of the annual models had less data, the quarterly models would be the best. In conclusion, the annual models show greater accuracy in the rainfall forecast. This study shows a real approach to implementing operational forecasting models and al-lowing more accurate insight into the generalization of the models in a production environment. Author Contributions: Conceptualization, methodology, formal draft visualization, authors


Introduction
Rain is a phenomenon that significantly conditions human activity. Knowing its dynamics and forecasting its behavior is essential to optimize water use, for example, in human consumption, hydroelectric generation, agriculture [1], and industry. On the other hand, anticipating extreme rainfall events helps to take measures to mitigate possible adverse effects (e.g., landslides, floods, and droughts). Examples of such extreme events are the droughts in the Southwest US [2], the São Francisco river basin (Brazil) [3], the northeast region of Brazil [4,5], and over Brazil [6]. Additionally, rain is an essential atmospheric variable to characterize the climate [7]. Therefore, unveiling the drivers related to this hydrologic process is essential to understanding possible changes in its dynamics under low-frequency natural climate variability [8] or climatic change [9].
Different models allow us to anticipate rain behavior. There are different methods that can be used to make predictions. Dynamic models are physically consistent [10], but these have a tremendous computational burden. Instead, statistical methods are widely  The integrated management of natural resources in the Machángara basin guarantees the provision of essential services, for example, water for human consumption for more than 390,000 inhabitants of Cuenca ( ∼ =60% of the population), irrigation for more than 3900 users, the generation of 39.5 MW of hydroelectricity (the first source of electric energy in Ecuador [34]), and the provision of water for various industries in the area. In the highest part of the basin, two representative stations were selected for the study, namely El Labrado and Chanlúd, which are at approximately 3335 and 3485 m a.s.l., respectively. The two stations are located in the dams that bear the same names [35] and are greatly important in the national hydroelectric generation system.

Data
The daily rainfall data of El Labrado and Chanlúd go back to 1964 and 1981, respectively. This study used monthly rainfall data from 1981 to 2021 (41 y). Therefore, the daily rainfall corresponding to each month of the period 1981-2021 was added to generate data on a monthly scale. Figure 2a shows the monthly rainfall data of El Labrado and Chanlúd in the study period. Rainfall shows a marked bimodal seasonality, which is shown in Figure 2b. A peak of heavy rain is present in April, while the driest month is August.

Data
The daily rainfall data of El Labrado and Chanlúd go back to 1964 and 1981, respectively. This study used monthly rainfall data from 1981 to 2021 (41y). Therefore, the daily rainfall corresponding to each month of the period 1981-2021 was added to generate data on a monthly scale. Figure 2a shows the monthly rainfall data of El Labrado and Chanlúd in the study period. Rainfall shows a marked bimodal seasonality, which is shown in Figure 2b. A peak of heavy rain is present in April, while the driest month is August. The target variable of the forecasting models is the rainfall anomalies. Most of the predictors of such models are based on climate indices. Table 1 shows the 38 climate indices used in the study, which have a monthly resolution. Thirty-four were downloaded from the National Oceanic and Atmospheric Administration (NOAA) (https://psl.noaa.gov/data/climateindices/list/, accessed on 21 January 2022), and the sources of the rest are indicated in the table footer.

North Hemisphere
Pacific/North American Index (PNA), East Pacific/North Pacific Oscillation (EP/NP) [37], West Pacific Index (WP), North Atlantic Oscillation (NAO) [38], Jones NAO (J.NAO *) [39], East Atlantic (EA) and Arctic Oscillation (AO) [40]. South Hemisphere Antarctic Oscillation (AAO) [41]. The target variable of the forecasting models is the rainfall anomalies. Most of the predictors of such models are based on climate indices. Table 1 shows the 38 climate indices used in the study, which have a monthly resolution. Thirty-four were downloaded from the National Oceanic and Atmospheric Administration (NOAA) (https://psl.noaa.gov/ data/climateindices/list/, accessed on 21 January 2022), and the sources of the rest are indicated in the table footer.
Both rainfall and climate indices constitute the raw monthly dataset. The dataset was split into training and testing subsets. The former spanned from 1981 to 2015 (35 y) and the latter from 2016 to 2021 (6 y). Figure 2a shows the training subset period in a white background, while the testing period is shaded. The training subset was used to compute rainfall anomalies, standardize the dataset, generate and select predictors, and train the forecasting models. The testing subset was only used to evaluate the performance of the models. It is worth noting that the standardization of the testing subset was based on the parameters found in the training subset. The latter ensures an adequate evaluation since it simulates a scenario in which nothing is known beyond the data available for model training.

Settings and Workflow
This subsection explains the workflow followed in the study in a general manner. The following subsections describe details about data or methods in each step. So, Scheme 1 shows the workflow to assess the quarterly, semiannual and annual models to forecast monthly rainfall anomalies.
The first step shown in Scheme 1 is the deseasonalization of rainfall. The monthly rainfall climatology (Figure 2b) was computed based on the training subset period . Then, the climatology was subtracted from the raw rainfall signal. The second step is the standardization of the rainfall anomalies and climate indices data by removing the mean and scaling to unit variance. The Standard Scaler from the Scikit-learn library [52] was used in this step. As an example, Figure 2c shows the scaled and standardized rainfall anomalies. The shaded period in Figure 2c was not used to compute the parameters in the scaling and standardization. The third step is the generation of the candidate predictor set. For this, lagged versions of the time series of the rainfall anomalies and climate indices were generated up to a maximum τ lag (τ max ) which was chosen through an autocorrelation analysis. These signal delays are operationally essential to generate predictors with past information that serve the current forecast, anticipating the decision making.
The fourth step is the selection of relevant predictors for each forecasting model. For each forecasting model, different subsets of relevant predictors were selected (dashed squares in step 4) through the sequential forward selection (SFS) algorithm. There were seven different subsets for four quarterly models, two semiannual models, and one annual model. The quarterly models forecast anomaly rainfall of months belonging to each of the four seasons (e.g., the DJF model predicts rainfall in Dec-Feb), the semiannual models forecast rainfall of months belonging to each of the two semesters (e.g., the NDJFMA model forecasts rainfall in November-April), and the annual model forecasts rainfall of any month of the year. In any case, the forecasting horizon was one year. Those seven models were trained in the fifth step shown in Scheme 1. The forecasting models were based on the support vector regression (SVR) learning algorithm [53]. indices were generated up to a maximum τ lag (τmax) which was chosen through an autocorrelation analysis. These signal delays are operationally essential to generate predictors with past information that serve the current forecast, anticipating the decision making. Scheme 1. Workflow followed in the study.
The fourth step is the selection of relevant predictors for each forecasting model. For each forecasting model, different subsets of relevant predictors were selected (dashed squares in step 4) through the sequential forward selection (SFS) algorithm. There were seven different subsets for four quarterly models, two semiannual models, and one annual model. The quarterly models forecast anomaly rainfall of months belonging to each of the four seasons (e.g., the DJF model predicts rainfall in Dec-Feb), the semiannual models forecast rainfall of months belonging to each of the two semesters (e.g., the NDJFMA model forecasts rainfall in November-April), and the annual model forecasts rainfall of any month of the year. In any case, the forecasting horizon was one year. Those seven models were trained in the fifth step shown in Scheme 1. The forecasting models were based on the support vector regression (SVR) learning algorithm [53]. The last step is the evaluation. Each year of rainfall was forecasted independently for the testing subset period (2016-2021). With the entire testing period, a qualitative comparison was first made. Then, the evaluation was performed with seven evaluation metrics (the models had a horizon of one year). The comparison was performed with the raw testing rainfall data, so the results of the models were firstly converted (seasonalization) to the original scale (based on the parameters of the training subset).

Maximum τ lag (τ max )
In order to choose the τ max for the generation of candidate predictors, the autocorrelation of the raw rainfall signals was used. Each station's autocorrelation function (ACF) was plotted with 95% confidence intervals employing the statsmodels library [54]. These confidence intervals suggest that the correlation values within them are likely a statistical fluke. The standard deviation computation for the confidence intervals was performed according to Bartlett's formula [55,56].
The 38 climate indices from which the candidate predictors were derived had a particular τ max , after which, the correlation with rainfall (target variable) was no longer significant. Beyond an analysis of autocorrelations, an exhaustive analysis of lagged crosscorrelations (such as in [57]) would allow one to obtain a τ max for each index concerning the target variable and to possibly achieve higher-performance forecasting models. In addition, the above should have been taken into account for both El Labrado and Chanlúd rainfall. This would have led to 39 different τ max for each station. However, in order not to divert the study from the objective pursued, only the autocorrelation of rainfall was taken into account. Once the autocorrelation graphs of each station were analyzed, a single reasonable τ max value was taken through the study.

Generation of Candidate Predictor Sets
Four quarterly models were trained, one by each season, i.e., DJF, MAM, JJA and SON, two semiannual models related to semesters November-April (NDJFMA) and May-October (MJJASO) and one annual model (J-D). These models were schematized as blank squares in step five in Scheme 1. The forecasting horizon was one year, and as an example, if one wanted to forecast rainfall for 2016, January rainfall would be forecasted with the DJF model. Such a value would be immediately used as a possible predictor to forecast the February rainfall with the DJF model. The MAM model would be used to forecast March rainfall, and the same for April and May rainfall. The exact process would be used to forecast the rest of the months. Thus, the following lags were used for each forecasting model to generate the candidate predictor sets.
Candidate predictor set for annual models: • J-D: 13 to τ max .
The minimum limit of each interval shown above allows one to leverage as much information as possible. For instance, for forecasting March-May rainfall using the MAM model, information with a minimum lag of six means that information from November of the previous year could be used.

Sequential Forward Selection (SFS) of Predictors
The sequential forward selection (SFS) is a greedy approach to selecting the best new predictor iteratively from the candidate predictor set to aggregate to a subset of selected predictors [58]. The algorithm initially finds one predictor (the first) that maximizes a cross-validated score when a learning algorithm is trained on this single predictor. After the first (best) predictor is selected, the algorithm finds the second predictor that maximizes the score of the learning algorithm when it is trained on these two single predictors. The process is repeated by adding new predictors to the subset of selected predictors in each iteration. The optimum subset of relevant predictors is the one that gives the best cross-validated performance. There are different implementations of the SFS algorithm, and here, the one from the MLxtend library [59] was used. Moreover, this study used the support vector regression (SVR) learning algorithm to compute the score based on the selected predictor subsets. The SVR implementation of the Scikit-learn library [52] was used with the default hyperparameters. The cross-validated score was obtained through 5-fold cross-validation.

Support Vector Regression (SVR)
The support vector regression (SVR) model was proposed by Vapnik [60] and is a suitable model for linear and nonlinear regression. SVR is based on elements of the support vector machine (SVM), where support vectors are the closest points toward the generated hyperplane in a high-dimensional feature space [53]. As in most machine learning models, the training data are divided into two subsets: the training and validation sets [61]. The SVR model maps the training data to a high-dimensional feature space using a kernel. The radial basis function (RBF) kernel was used in this study. The hyperparameters are then optimized (i.e., model training) by fitting the model to the training data in that feature space. The formal definition of the SVR model is as follows.
Given {x i , y i } denoted as a characteristic vector of sample data with i = 1, 2, . . . , m samples, where x i R n , n is the number of predictors, and y i R is the target variable (rainfall anomalies). The SVM regression estimation function is defined as where W T is the weights matrix of the independent function, φ(x) is the nonlinear (kernel) mapping function, and b is the intercept. Then, W T and b can be obtained by minimizing the equation Min : where ||W|| 2 is known as regularized term; C is the penalty parameter; and R ε is the insensitive loss function (error control function) of the margin of tolerance ε. The SVR model generally requires a small sample size for training, has a simple statistical structure and performs better than complex models, e.g., artificial neural networks.
In the model training, the training data were not only divided into a new training subset and a validation subset. Instead, the 5-fold cross-validation technique was used to gain generalization. A particular subset of predictors for each forecasting model was used (dashed squares of step 4 in Scheme 1). Moreover, the following set of values for each of the hyperparameters was used in the training process: The γ hyperparameter corresponds to the RBF kernel used and relates to the inverse of the radius of influence of registers selected by the model as support vectors. The number of models trained until the best-performing one is found for each forecasting model is 31,713 × 5 = 158,565 (31,713 hyperparameter combinations, and 5 corresponds to the 5-fold cross-validation).
Mean Absolute Relative Error (MARE). The MARE measures how much error exists in the forecasted rainfall relative to the observed values in absolute terms. It is computed by The MARE is independent of the time series scale, and its value ranges from 0 to ∞, with 0 being the measure of a perfect forecast.
Mean Absolute Error (MAE). The MAE represents the average of the absolute difference between the forecasted values and the observations. It measures the average of the residuals regardless of their sign. The MAE is defined as This metric scale depends on the scale of rainfall, and its value ranges from 0 to ∞, with 0 being the best value.
Root Mean Square Error (RMSE). The RMSE is the square root of the average of the squared difference between the forecasted values and the observations. The RMSE is defined as The RMSE values are dependent on the time-series scale, and its value ranges from 0 to ∞, with 0 being the measure for a perfect forecast.
Nash-Sutcliffe Efficiency (NSE). The NSE [66] is widely used to evaluate the performance of hydrological models. Although the NSE is susceptible to outliers because it takes a sum over the squared values of the differences between the forecasted values and the observations, it is even better than other metrics, such as the coefficient of determination. The NSE is defined as where − y is the average of the rainfall time series (observations). The scale of this metric is independent of the scale of the rainfall values. The values of this metric go from −∞ to 1, with 1 meaning perfect forecasting, 0 meaning that the results are as good as always using − y for the forecasting and negative values meaning arbitrarily bad results.
Kling-Gupta Efficiency (KGE). The KGE [67] is a robust performance measure based on three equally weighted components: variability, linear correlation, and bias ratio between forecasted and observed rainfall. The KGE is defined as where α is the variability (the ratio between the standard deviation of forecasted rainfall over the observed rainfall), cc is the linear correlation coefficient between forecasted and observed values, and β is the division between the average of the forecasted rainfall over the average of the observed rainfall. The KGE is independent of the rainfall scale, and its value goes from −∞ to 1. The higher the value, the better the forecast.
Explained Variance (EV). The EV measures the proportion of the variance of the residuals (differences between y i andŷ i ) and the rainfall variance. It is computed by The EV is independent of the time series scale, and its value ranges from −∞ to 1, with 1 being the optimum value and negative values indicating arbitrarily bad forecasting results. EV = 0 indicates that the model is as good as using any fixed value for the forecast.
Percent Bias (PBIAS). The PBIAS determines whether there is a tendency in the values forecasted by the model (i.e., if these are higher or lower than the observed values). A positive PBIAS indicates that the model overestimates the forecasted variable, while a negative value indicates that the variable is underestimated. The optimal value is a PBIAS equal to zero. This metric is defined as This metric is independent of the rainfall scale, and the closer the value of |PBIAS| to 0, the better the results, with 0 being the optimum value. |PBIAS| values greater than 100 indicate arbitrarily bad results. Figure 3 shows the ACFs of El Labrado and Chanlúd rainfall. The autocorrelation demonstrated a high seasonal rainfall signal with statistical significance until 37 months in El Labrado and 49 months in Chanlúd. A lag of 37 months was then taken as the τ max to create the candidate predictor set. Since this study concentrates on comparing models differing in the number of months used in the implementation, the same number of 37 lags was taken as the maximum to create the candidate predictors based on the 38 climate indices (Table 1).

τ max for Generating the Candidate Predictors
is a PBIAS equal to zero. This metric is defined as This metric is independent of the rainfall scale, and the closer the value of |PBI to 0, the better the results, with 0 being the optimum value. |PBIAS| values greater t 100 indicate arbitrarily bad results. Figure 3 shows the ACFs of El Labrado and Chanlúd rainfall. The autocorrela demonstrated a high seasonal rainfall signal with statistical significance until 37 mo in El Labrado and 49 months in Chanlúd. A lag of 37 months was then taken as the τm create the candidate predictor set. Since this study concentrates on comparing mo differing in the number of months used in the implementation, the same number o lags was taken as the maximum to create the candidate predictors based on the 38 clim indices (Table 1).   Figure 4 shows the results of selecting the optimum number of predictors through the SFS approach. Each column corresponds to the different model sets, i.e., quarterly, semiannual and annual models. Each row corresponds to the models for El Labrado and Chanlúd. The performance behavior of the models showed a similar tendency. The optimum number of predictors was around 81 and 68 for El Labrado and Chanlúd, respectively. The optimum number of predictors for quarterly, semiannual and annual models for El Labrado were around 93, 73 and 55, respectively. Meanwhile, for Chanlúd, the optimum numbers were around 69, 49 and 105. However, the performance of the annual model for Chanlúd (Figure 4f) had a lower variance in the approximated interval of 35-105 predictors. Thus, the greater the number of months that are considered in the models with different periods, the fewer the predictors that are needed to get the best performance. A possible explanation for this behavior is the greater number of records (instances) that the models had in the training stage of the SFS as they were built for more months (e.g., annual models). The annual models used 100% of the available records in the selection of predictors (383 of the 420 because of the candidate predictor generation with 37 lags), the semiannual models used 50% of such records (November-April: 191, May-October: 192), and the quarterly models used 25% (DJF: 95, MAM: 96, JJA: 96, SON: 96). The SFS used the SVR learning algorithm with the default hyperparameters, so varying the number of records changed the amount of relevant information for the forecasting. From a purely predictive point of view, models for more months achieve better performance with fewer predictors that have more relevant information to achieve better generalization. with 37 lags), the semiannual models used 50% of such records (November-April: 191, May-October: 192), and the quarterly models used 25% (DJF: 95, MAM: 96, JJA: 96, SON: 96). The SFS used the SVR learning algorithm with the default hyperparameters, so varying the number of records changed the amount of relevant information for the forecasting. From a purely predictive point of view, models for more months achieve better performance with fewer predictors that have more relevant information to achieve better generalization.

Relevant Predictors
The optimum number of predictors allows light to be shed on the more prominent indices to predict rainfall anomalies in the stations of the study zone. First, quarterly models allowed indices influencing each season to be analyzed. The analysis was conducted using the times an index with different lags was chosen. This number is labeled the frequency in Figures 5-8. The mean number of lags of the different indices chosen are shown in intervals of up to 12 months, 24 months and more than 24 months. Figure 5 shows the climate indices providing more information for the predictor selection stage in each season for El Labrado. The most prominent feature was the EP/NP climate index in all seasons. In all the seasons, the EP/NP mean lag was within the 12 months before the rainfall value that had to be forecasted. The NP index was the second most prominent feature present in the models for the four seasons. NP was the second most prominent in DJF and SON, third in MAM, and sixth in JJA. Like EP/NP, the NP mean lag was within 12 months before the forecasted rainfall value. The Niño 3 index was present within the seven most prominent indices in DJF and MAM. DJF is a season when the climate conditions of the ENSO regions in the Pacific are more important in learning about rainfall in the Ecuadorian Andes [68]. The results showed that information 12 months before the rainfall observation is also important in learning about rainfall anomalies. It is worth noting that the Niño 3 index is a mean value index and does not refer to anomalies.

Relevant Predictors
The optimum number of predictors allows light to be shed on the more prominent indices to predict rainfall anomalies in the stations of the study zone. First, quarterly models allowed indices influencing each season to be analyzed. The analysis was conducted using the times an index with different lags was chosen. This number is labeled the frequency in Figures 5-8. The mean number of lags of the different indices chosen are shown in intervals of up to 12 months, 24 months and more than 24 months.  On the other hand, the same signal (El Labrado) and the Niño 1+2 index were prominent indices in JJA and SON models. El Labrado signal was the third most frequent index appearing in the models for JJA and SON. The Niño 1+2 index mean lag was 12 months before the rainfall value. Again, Niño 1+2 is not an anomaly index but provides information in selecting predictors. September-November. The size of each circle corresponds to the frequency with which a climate index (with different lags) appears as a predictor of the model. The color of each circle corresponds to the mean of the lags with which a climate index appears as a predictor of the model.   GMSST was not prominent and even not present in SON. PDO was not prominent because it is a shallow frequency signal. PacWarm was only present in DJF and MAM. Niño 4 was present in DJF and JJA but with a low frequency; it was not present in MAM and SON. BEST and TPI.IPO were only present in SON but were not very prominent. CAR, QBO and ESPI were only present in DJF. TNI, MEIv2, Niño 4.A, and AMM were not present in any season. Niño 3 in NDJFMA; the rest were not present in any model. TNI, EMI, TPI.IPO, TNA, TSA, IDO.W and DMI did not appear in any model. CAR was present in both models for Chanlúd but not for El Labrado. QBO was present in NDJFMA for both El Labrado and Chanlúd but not for MJJASO.  Figure 4b); (c,d) models for Chanlúd (dots in Figure 4e); (a,c) semester November-April; (b,d) semester May-October. The size of each circle corresponds to the frequency with which a climate index (with different lags) appears as a predictor of the model. The color of each circle corresponds to the mean of the lags with which a climate index appears as a predictor of the model. Figure 8 shows the predictors selected for the annual models for El Labrado ( Figure  8a) and Chanlúd (Figure 8b). Interestingly, the same signal of El Labrado and Chanlúd rainfall, with a mean lag within 12 months, was within the higher-frequency indices. CAR was the most frequently chosen for El Labrado. Meanwhile, lags of the Chanlúd  Figure 4b); (c,d) models for Chanlúd (dots in Figure 4e); (a,c) semester November-April; (b,d) semester May-October. The size of each circle corresponds to the frequency with which a climate index (with different lags) appears as a predictor of the model. The color of each circle corresponds to the mean of the lags with which a climate index appears as a predictor of the model. Figure 6 shows the climate indices providing more information in the predictor selection stage in each season for Chanlúd. Like for El Labrado, the most prominent climate index was EP/NP. Again, the mean lag of the predictors derived from EPO was within 12 months. Unlike El Labrado, NP was not found as a frequent index even though it was present in all seasons. Chanlúd appeared as the third most frequent index in SON. TNI and PDO were only present in SON. Niño 4 was only present in DJF. TPI.IPO was only present in MAM. MEIV2, QBO and AMM were not present in any season. rainfall anomalies were the most frequent in Chanlúd. NAO and EP/NP were indices within the six prominent índices in both El Labrado and Chanlúd.
In models for Chanlúd AO, Niño 1+2, Niño 3, Niño 3.A, Niño 3.4, Niño 3.4.A, Niño 4, Niño 4.A, TNI, WHWP and TNA were not present. Figure 4c); (b) model for Chanlúd (dots in Figure 4f). The size of each circle corresponds to the frequency with which a climate index (with different lags) appears as a predictor of the model. The color of each circle corresponds to the mean of the lags with which a climate index appears as a predictor of the model. Figure 9 allows one to compare the rainfall forecasts from quarterly, semiannual and annual models. An outstanding feature in El Labrado (Figure 9a) is the overestimation of the semiannual model in November and December. This characteristic was prominent from 2016 to 2019. Quarterly and annual models showed similar results even though quarterly models showed better results for some specific months, for instance, from October to December 2016, and October and December 2019. Generally, quarterly models best reproduced the pattern of September-December. Likewise, semiannual models showed the best results in months such as December 2017, January 2018 and August-October 2021. Figure 9b shows that in Chanlúd, the semiannual models tended to result in mean values demonstrating the lowest performance. Annual models were better than semiannual models reproducing high values of rainfall. However, the general characteristic of quarterly models in reproducing the highest values of rainfall made them the best option. Nevertheless, it should be mentioned that quarterly models showed poor performance in some specific cases, such as in October 2017 and April 2019.  Figure 4f). The size of each circle corresponds to the frequency with which a climate index (with different lags) appears as a predictor of the model. The color of each circle corresponds to the mean of the lags with which a climate index appears as a predictor of the model. Figure 7 shows the most prominent climate indices for El Labrado (Figure 7a,b) and Chanlúd (Figure 7c,d) in the semiannual models. As the number of months increased from quarterly to semiannual models, NP appeared with less relevance in El Labrado and even did not appear in Chanlúd for NDJFMA. EP/NP was the most frequent index in the subset of predictors that allowed the best performance in the models, both in El Labrado and Chanlúd. The mean lag for EP/NP was around 12 months before the forecasted month. Unlike quarterly models, in semiannual models, the higher frequency of EP/NP was 16 in MJJASO. For El Labrado, the same signal with a mean lag of 12 months appeared with more repetitions, especially in MJJASO. AO and TNA area climate indices appeared in both semiannual models for El Labrado.

Qualitative Evaluation
Concerning El Labrado, PacWarm only appeared in NDJFMA, but its relevance was low. BEST, TNI and MEIV2 only appeared in NDJFMA with low frequency but a mean lag of 12 months. EA, Niño 1+2, Niño 1+2.A, Niño 3.4 and Niño 3.4.A were only present in MJJASO, but their frequency was low. Niño 3 only appeared in MJJASO, but its frequency was low, with a mean lag beyond 24 months. This is interesting since only Niño 4 and Niño 4.A corresponding to El Niño indices appeared in NDJFMA. EMI, Niño 4 and Niño 4.A only appeared in NDJFMA. QBO was only present in NDJFMA with the lowest frequency and a mean lag greater than 24 months. TSA, WHWP, IOD.W, IOD.E and DMI only appeared in MJJASO. Niño 3.A, TPI.IPO and CAR were not present in any semiannual models.
Concerning Chanlúd, NP, SOI, BEST, MEIv2, QBO, ESPI and PacWarm were only present in NDJFMA. Although NAO appeared in both semiannual models, J.NAO did not appear in any semiannual models. AAO, PDO, AMM, IOD.E and WHWP only appeared in MJJASO. Concerning the El Niño indices, Niño 1+2 appeared in MJJASO and Niño 3 in NDJFMA; the rest were not present in any model. TNI, EMI, TPI.IPO, TNA, TSA, IDO.W and DMI did not appear in any model. CAR was present in both models for Chanlúd but not for El Labrado. QBO was present in NDJFMA for both El Labrado and Chanlúd but not for MJJASO. Figure 8 shows the predictors selected for the annual models for El Labrado (Figure 8a) and Chanlúd (Figure 8b). Interestingly, the same signal of El Labrado and Chanlúd rainfall, with a mean lag within 12 months, was within the higher-frequency indices. CAR was the most frequently chosen for El Labrado. Meanwhile, lags of the Chanlúd rainfall anomalies were the most frequent in Chanlúd. NAO and EP/NP were indices within the six prominent índices in both El Labrado and Chanlúd.
In   Figure 10 shows the performance results for El Labrado (left-hand side of each figure panel) and Chanlúd (right-hand side of each figure panel) models that employed the seven evaluation metrics. The semiannual models showed the worst results in all the metrics, becoming the worst approach to forecast rainfall anomalies in both stations. According to the MARE, MAE, RMSE, NSE and EV metrics (Figure 10a-d,f), the best model was undoubtedly the annual model for El Labrado and Chanlúd. Moreover, the PBIAS (Figure 10g) metric confirmed the annual models as the best for Chanlúd. For El Labrado, the PBIAS indicated that the quarterly models were the best. However, the KGE metric showed that the quarterly models had the best performance.

Quantitative Evaluation
As indicated in Section 3.2, annual models leverage a major amount of records in the  Figure 9b shows that in Chanlúd, the semiannual models tended to result in mean values demonstrating the lowest performance. Annual models were better than semiannual models reproducing high values of rainfall. However, the general characteristic of quarterly models in reproducing the highest values of rainfall made them the best option. Nevertheless, it should be mentioned that quarterly models showed poor performance in some specific cases, such as in October 2017 and April 2019. Figure 10 shows the performance results for El Labrado (left-hand side of each figure panel) and Chanlúd (right-hand side of each figure panel) models that employed the seven evaluation metrics. The semiannual models showed the worst results in all the metrics, becoming the worst approach to forecast rainfall anomalies in both stations. According to the MARE, MAE, RMSE, NSE and EV metrics (Figure 10a-d,f), the best model was undoubtedly the annual model for El Labrado and Chanlúd. Moreover, the PBIAS (Figure 10g) metric confirmed the annual models as the best for Chanlúd. For El Labrado, the PBIAS indicated that the quarterly models were the best. However, the KGE metric showed that the quarterly models had the best performance.

Discussion
As more months were used to generate the models, the predictors chosen a most relevant changed, especially for the annual models. EP/NP was the most promi As indicated in Section 3.2, annual models leverage a major amount of records in the model training stage, so they probably obtain better results. In order to give evidence for such conjecture, five new annual models were trained with the same followed method but by only using 95 randomly chosen records in training. This was the same number of records used when training the DJF quarterly models (in the rest of the seasons, 96 records were used). The mean values of the evaluation metrics are indicated as bars with dots in Figure 10.
Comparing the annual models trained with 95 records with the quarterly and semiannual models (Figure 10), all the metrics indicated that the quarterly models were the best models. According to KGE, the quarterly models were always the best performers.

Discussion
As more months were used to generate the models, the predictors chosen as the most relevant changed, especially for the annual models. EP/NP was the most prominent index in the quarterly, semiannual and annual models. This index was related to the most frequently selected predictors for quarterly and semiannual models for El Labrado and Chanlúd. In the case of annual models, EP/NP was related to the fifth most frequently selected predictors for El Labrado and Chanlúd. Except for the annual model for El Labrado, in all cases, the average lag of the predictors associated with this index was between 12 and 24 months. The EP/NP is a northern hemisphere index related to 500 hPa height anomalies over three main anomaly centers: Alaska/western Canada, central North Pacific and eastern North America. Since it is a relevant index for the climate of North America, most of the works are related to that geographical area. Córdoba Machado et al. [69] found weak but significant correlations between EP/NP and rainfall in Colombia. However, lagged correlations were not used. Mora and Willens carried out a study analyzing the relationship between the index and rainfall in a basin where the Machángara is located [70]. They found correlations around |R 2 | = 0.6. This study showed that EP/NP also turned out to be an index with information that allows one to forecast rainfall anomalies with a horizon of approximately one year.
For the quarterly models, another prominent index was the north Pacific pattern (NP). NP is another northern hemisphere index. Specifically, it is the area-weighted sea level pressure over the region 30 • N-65 • N, 160 • E-140 • W. This confirms the relevance of information from the North Pacific in predicting rainfall anomalies in high tropical mountain areas. However, more research must be conducted to shed light on the acting mechanisms.
For the semiannual models, other prominent indices were TNA for El Labrado and AO for El Labrado and Chanlúd. TNA is a tropical Atlantic index and is defined as the anomaly of the average of the monthly SST over the region 5.5 • N-23.5 • N, 15 • W-57.5 • W. The tropical Atlantic SST is a driver of rainfall in the study zone [29,71,72]. Therefore, this study showed the applicability of the index in providing information to forecast rainfall anomalies in the study zone around 12 months in the future. AO is another northern hemisphere index that, like EP/NP, needs further study to understand the underlying mechanisms that make it a prominent index for forecasting rainfall in the study area.
For the annual models, the most prominent indices were CAR, NAO and the lagged versions of the same anomaly rainfall signal. CAR is related to the SST anomalies over the Caribbean and is not as prominent in quarterly and semiannual models. The Caribbean is known as a source of humidity that influences rainfall in the study zone [29,32]. NAO is another north hemisphere index and is a prominent teleconnection pattern in all seasons. Like EP/NP, further study is needed to understand the underlying mechanisms that make it a significant index for forecasting rainfall in the study area. Despite the known correlation between the conditions of El Niño zones and rainfall in Ecuador, these indices (Niño 3.A, Niño 3.4, Niño 3.4.A, Niño 4 and Niño 4.A) did not provide any relevant information to forecast rainfall anomalies in annual models. It should be borne in mind that the indices above are used in these models, but with delays of 13 to 37 months in order to generate forecasts with a one-year horizon. Despite the known relation between these indices and rainfall, this relation can fade when using signals with information distant in time. In addition, there may be other indices correlated with those of Niño with linear and nonlinear correlation with rainfall anomalies (standardized) that, more importantly, together with the rest of the selected predictors, are better leveraged by SVR, producing higher accuracy. Finally, another possible reason is that there are not enough events for SVR to learn the most significant patterns between the rainfall anomalies and the indices.
Another relevant result is the selection of SOI by many of the models, whether quarterlies, semiannuals or annuals. SOI had a linear correlation with the Niño 3.4 and Niño 3.4.A indices of −0.65 and −0.73, respectively. However, these last two were not selected for most models. The possible explanation is related to what was explained in the previous paragraph. Despite a high linear correlation between the indices, SOI contributed more to the learning algorithm in the context of the set of predictors chosen to produce a higheraccuracy model. In fact, the correlations between the Niño 3.4 and Niño 3.4.A indices with the standardized rainfall anomalies at Chanlúd (−0.04) were only slightly lower in magnitude than the correlation between SOI and rainfall anomalies (0.06) (in El Labrado, they were −0.07, −0.08, and 0.08, respectively). This means that SOI is more relevant to forecast rainfall when used with the other predictors in the model.
The distance between El Labrado and Chanlúd is approximately 8 km, and the difference in altitude is approximately 150 m. Despite the above, the SFS algorithm selected groups of predictors that differ for the models of these two stations, except those derived from EP/NP. EP/NP was the most relevant climatic index for the quarterly and semiannual models and was among the five most relevant in the annual one. Some reasons can explain this difference. First, note that the correlation between the standardized rainfall anomalies (Figure 2c) of the two stations (0.87) decreased compared to that of the raw data (0.92) (Figure 2a). These anomalies were used until before the evaluation (Scheme 1). As Figure 2b shows, there were systematic differences between the rainfall that were evident, for example, between May and August. Due to the above, it seems reasonable to expect certain differences between the groups of selected predictors because SVR made the best possible use of the highly nonlinear relationships in each group. Second, the relevance of the indices (size of the circles in Figures 5-8) could have been affected by the number of derived predictors that were selected (different lags). Since we used the same number of lags in climate indices and rainfall to generate the predictors, it is possible that for one of the two stations, a different τ max should have been used (see Section 2.4). With that, it is possible that a larger number of predictors related to certain indices could be selected. Third, the SFS of predictors used SVR with the default hyperparameters to compute the score on the selected predictor subsets. Variations in the selection approach (e.g., sequential backward selection [73]) or in the model to calculate the score (e.g., random forests [74]) or its tuning would lead to a profoundly exhaustive sensitivity analysis of predictors. However, analyzing the influence of all the above is beyond the scope of the study and is proposed for future research.
According to almost all evaluation metrics, the annual models were the best among quarterly, semiannual and annual models. However, the KGE metric showed that quarterly models were the best. The implementation of annual models with fewer registers showed that such high performance is possibly due to the amount of information that the learning algorithm can leverage in the training stage. Not enough (or not the optimal) predictors were selected in such a case to be exploited by the SVR algorithm. When comparing such results with those from the quarterly models, all metrics demonstrated that quarterly models were the best, which KGE indicates for all models.
The semiannual models were the ones that reported the worst results. This could be related to the bimodal seasonality of rainfall (Figure 2b) that does not allow data to be separated into periods with similar rainfall characteristics. Each semiannual model contained information on both rainy months and drier months. This means that the selection of predictors was not so robust since the selection algorithm used information on transition periods. The hypothesis that quarterly models could perform better by selecting more robust predictors was not necessarily true in practical terms. This is because, for operational reasons, the amount of information that can be used in annual models is greater. Specifically, this can be evidenced by using SVR as the learning algorithm for generating the forecast models. Depending on the learning algorithm, the negative effect of the amount of information to be used could be greater or lesser. Therefore, other studies using other learning algorithms are necessary to reach general and conclusive results.
A comprehensive comparison of the results with the predictions of South American models (e.g., the SEAS5 model) is pending, but a very brief discussion follows. Gubler et al. [68] demonstrated high precision in the highlands of Ecuador during the austral summer, which is consistent with our findings. However, Coelho et al. [75] state the low seasonal forecast skill of either empirical or coupled multimodel predictions in South America but highlight forecast assimilation's importance in obtaining better forecasts. Barnston and Tippett [10] showed that the North American Multimodel Ensemble project (NMME) [76,77] with a correction of bias with statistical methods does not improve the skill of the forecasts in South America.

Conclusions and Remarks
This study had the following main objectives: first, to know if the performance of the monthly rainfall forecast models improved when they were implemented for each season, by semesters or a single model for all the months of the year. These models always forecast rainfall on a monthly scale and differ in the months that are used in training and the months for which they make the forecast. Second, to analyze the main predictors that influence the improvement of the performance of the models. These predictors were generated using 38 climate indices and the same rainfall signal using lags of up to 37 months. The El Labrado and Chanlúd stations, located in the Andean Machángara basin, were used for the study.
The annual models were the best according to six of seven evaluation metrics. However, the Kling-Gupta efficiency shows that the quarterly models were the best. The study gives evidence that the performance of annual models was due to the more significant number of records (instances) that could be exploited when training them. When the annual models were trained with the same number of records as the quarterly models, the quarterly models were the best. Therefore, from a pragmatic point of view, annual models should be used to generate operational rainfall forecasting models in the study area. Studies in more areas are necessary to generalize the results obtained here.
The largest number of predictors that were chosen for the forecasting models were those derived from the EP/NP climate index. The influence of this northern hemisphere index on the Machángara rainfall has not been extensively studied, so it must be taken into account to investigate the mechanisms involved. For the quarterly models, another prominent index was NP, another northern hemisphere index. For the semiannual models, other prominent indices were the tropical Atlantic index TNA for El Labrado and the northern hemisphere index AO for El Labrado and Chanlúd. Finally, for the annual models, the most prominent indices were CAR (related to the conditions in the Caribbean), NAO (north hemisphere index) and the lagged versions of the same anomaly rainfall signal.
The results show that annual models can be operationally helpful since rainfall forecasts could be made in the current month with climatic information from twelve or more previous months. This is essential to anticipate water resources management in different sectors, e.g., agriculture and hydroelectricity.
There were some limitations in the study, which are described next. First, we selected a single τ max for the climate indices and rainfall in El Labrado and Chanlúd. Tuning a τ max for each station and index could reduce the search space of the learning algorithms and improve their performance. Second, SVR was used with the default hyperparameters in selecting predictors through SFS. Other selection methods [78][79][80][81][82][83] could be tested in future studies, as well as other learning algorithms that serve as scoring functions (e.g., random forests) performing a hyperparameter tuning. The former would help analyze the most relevant indices that influence the study area more exhaustively and with greater significance. Finally, the training of the forecast models could be carried out with other learning algorithms [17,84], also comparing their performance when combined with different selection methods.
This study shows a real approach to implementing operational forecasting models and allowing more accurate insight into the generalization of the models in a production environment. Funding: The authors would like to thank to Corporación Ecuatoriana para el Desarrollo de la Investigación y Academia-CEDIA, Empresa Electro Generadora del Austro ELECAUSTRO S.A., and Universidad de Cuenca for the financial support given to the present research work through the FONDO 11 program, especially for the project "Modelo matemático para optimización hidroenergética del complejo hidroeléctrico Machángara incluyendo criterios ambientales y de adaptación a los impactos del cambio climático". The first author was also supported by the VIUC through "Conjunto de horas 1".

Informed Consent Statement: Not applicable.
Data Availability Statement: Data about climate indices are publicly available online (see Data subsection). Rainfall data analyzed during the current study are not publicly available and must be requested from the electricity generation company ELECAUSTRO S.A.