Ensemble Precipitation Estimation Using a Fuzzy Rule-Based Model †

Abstract: In this study, a Takagi-Sugeno (TS) fuzzy rule-based (FRB) model is used for ensembling precipitation time series. The TS FRB model takes precipitation predictions of grid-based regional climate models (RCMs) from the EUR11 domain, available from the CORDEX database, as inputs to generate ensembled precipitation time series for two meteorological stations (MSs) in the Mediterranean region of Turkey. For each MS, RCM data from the grid closest to that MS are used. To generate the fuzzy rules of the TS FRB model, the subtractive clustering (SC) algorithm is utilized. Together with the TS FRB, the simple ensemble mean approach is also applied, and the performances of these two models and the individual RCM predictions are compared. The results show that the ensemble models outperform the individual RCMs for monthly precipitation at both MSs. On the other hand, although the ensemble models capture the general trend in the observations, they underestimate the peak precipitation events.


Introduction
Precipitation is one of the main meteorological parameters that affect water resources, and it is directly influenced by climate change [1]. General circulation models (GCMs) are used to estimate precipitation under climate change [2]. However, the systematic biases between the simulated and observed values, and the coarse resolution of GCMs (generally a few hundred kilometers), prevent their application in regional-scale climate impact studies [3]. Although regional climate models (RCMs), obtained by the dynamical downscaling of GCMs, provide spatially and physically consistent outputs with a finer resolution (typically tens of kilometers), they still have significant biases [3]. Moreover, it has long been recognized that a single model prediction does not provide the range of outcomes that is required to assess the risks of future climate change [4].
One of the alternatives to overcome the above-mentioned issues is to use an ensemble approach (EA). The EA is applied in various fields, such as biology [5], water resources [6], medicine [7], and decision-making [8]. However, the application of the EA for climate predictions is relatively new [9]. The EA is applied by [10] to assess climate change impact on hydrology and water resources. Ref. [11] assessed climate change impact on climate extremes, while [12] carried out future predictions of atmospheric rivers, by using GCMs with the EA. The EA is used by [13] to assess climate change impact on surface winds, and by [14] on extreme rainfall events as well.
Multiple linear regression (MLR) is a simple EA used to generate the superensemble through minimization of the sum of the squares of the differences between predictors and predictands. Several studies conclude that MLR has better performance skills compared to single model predictions [15][16][17]. Machine learning methods are also used for ensembling climate models; artificial neural networks, random forest, K-nearest neighbor, and support vector machines are among the methods applied [18][19][20]. In this study, another data-driven method, namely, TS FRB, identified by SC, is applied for ensembling RCM predictions. The TS FRB model is a powerful practical engineering tool for the modeling and control of complex systems [21]. The main advantage of the TS model is that simple, fuzzily defined local models result in a nonlinear (high-order) global model [22]. The TS FRB model has wide applications in a number of fields, including adaptive nonlinear control, fault detection, performance analysis, forecasting, knowledge extraction, and behavior modeling [21]. In this study, the relation between the individual RCM predictions and the observed precipitation is represented using fuzzy rules, which are formulated through SC. Clustering is used in grouping, pattern recognition, data mining, and machine learning [23].
Fuzzy models are used for statistical downscaling of precipitation [24][25][26]; nevertheless, to the best of our knowledge, this is the first application of the TS FRB model as an EA for climate models in Turkey. The TS FRB model with one rule is equivalent to the MLR; this provides a benchmark for the evaluation of model performance in this study area. In addition to MLR, the performance of the TS FRB model is compared to that of simple ensemble mean (SEM), in terms of the prediction of monthly precipitation in the study area.

Data
Daily precipitation observations from two MSs are obtained from the Turkish State Meteorological Service. One of the MSs (Afyon: MS17190) is located inland, while the other (Anamur: MS17320) is located close to the shoreline (Figure 1). The observational data are subjected to a two-step quality check (QC). First, the months with more than 10 days of missing data are eliminated. Then, the observed data are examined for the whole historical period, and the commonly dry months are identified as July, August, and September. Months other than these with an average precipitation of less than 0.1 mm are considered unreliable and are removed from the time series. For the remaining months, the monthly average time series is obtained and named the final dataset for observations (FDO).
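The two-step QC described above can be sketched as follows. This is only an illustration under stated assumptions, not the code used in the study: the function name `monthly_qc` and the record layout (a dictionary mapping (year, month) to a list of daily values in mm, with `None` marking a missing day) are hypothetical.

```python
DRY_MONTHS = {7, 8, 9}  # July, August, September (commonly dry, per the text)

def monthly_qc(daily, max_missing=10, min_mean=0.1):
    """Return {(year, month): monthly mean precipitation} after the two QC steps.

    `daily` maps (year, month) to a list of daily values in mm,
    with None marking a missing day (hypothetical layout).
    """
    fdo = {}
    for (year, month), values in sorted(daily.items()):
        observed = [v for v in values if v is not None]
        missing = len(values) - len(observed)
        # Step 1: drop months with more than `max_missing` missing days.
        if missing > max_missing or not observed:
            continue
        mean = sum(observed) / len(observed)
        # Step 2: outside the dry months, a near-zero mean is treated as unreliable.
        if month not in DRY_MONTHS and mean < min_mean:
            continue
        fdo[(year, month)] = mean
    return fdo

daily = {
    (2000, 1): [2.0] * 31,                # complete, wet month: kept
    (2000, 2): [None] * 11 + [1.0] * 18,  # 11 missing days: removed
    (2000, 3): [0.0] * 31,                # zero mean outside the dry months: removed
    (2000, 8): [0.0] * 31,                # zero mean in a dry month: kept
}
fdo = monthly_qc(daily)
```

Here the monthly average is taken over the available days only; whether the study handles partially missing months this way is an assumption.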
To build an ensemble model, monthly average precipitation simulations from eight different CORDEX RCMs are obtained for the corresponding months of the FDO. RCM predictions are extracted by using the code developed by [27]. Information about the CORDEX RCMs used in this study, and their long-term monthly means and standard deviations (µ/σ) for the grids closest to MS17190 (~17190) and MS17320 (~17320), are shown in Table 1. The long-term monthly µ/σ for the observed precipitation at MS17190 and MS17320 are 1.16/0.85 and 2.88/3.45, respectively.
Table 1. RCMs used in the study.


Takagi-Sugeno Fuzzy Rule-Based Model
In this study, a TS FRB model is developed to obtain an ensemble precipitation time series for each MS. The mathematical representation of the TS FRB model is as follows:

$$FRB_i^t = f\left(RCM_{i,1}^t, RCM_{i,2}^t, \ldots, RCM_{i,8}^t\right)$$

where $i$ is the index for the MS (here $i = 1, 2$), $t$ is the index for time, $FRB_i^t$ is the ensembled precipitation value for MS $i$ in month $t$, and $RCM_{i,1}^t, RCM_{i,2}^t, \ldots, RCM_{i,8}^t$ are the precipitation predictions of the 8 different RCMs at the grid closest to MS $i$ in month $t$.
The rule-based structure of the TS FRB model is identified by using SC. The TS FRB model takes the precipitation predictions of eight RCMs as inputs to predict the ensembled precipitation time series at the corresponding MS as the output. In SC, the output data are included in the clustering process together with the input data. Before clustering, a log-transformation is applied to all datasets, and the feature space is normalized so that all data are bounded in a unit hypercube. In SC, each data point is treated as a candidate cluster center (cc), and the potential of each data point to be a cc is calculated as follows [28]:

$$P_m = \sum_{j=1}^{n} \exp\left(-\frac{4\,\lVert X_m - X_j \rVert^2}{r_a^2}\right)$$

where $X_m$ is the normalized data point $m$, $P_m$ is the potential of $X_m$, $r_a$ is a user-defined positive constant identifying the cluster radius, and $n$ is the number of data points. The contribution of each data point to the potential decays exponentially with the square of its distance to $X_m$. In this way, the data points with many neighboring data points have higher chances of being cluster centers. The data point having the highest potential value is assigned as the first cc. Then, the potentials are updated as follows [28]:

$$P_m \leftarrow P_m - P_z^{*} \exp\left(-\frac{4\,\lVert X_m - X_z^{*} \rVert^2}{r_b^2}\right)$$

where $X_z^{*}$ is the cc $z$, $P_z^{*}$ is the potential of $X_z^{*}$, and $r_b$ is a user-defined positive constant. In this study, $r_b$ is taken as $1.5\,r_a$ to avoid closely spaced cluster centers.
In the updating process, the subtracted amount decays exponentially with the square of the distance from each data point to the previously assigned cc. Hence, the updating process ensures that the potentials of the data points close to the previously assigned cc drop significantly compared to those of the data points distant from it; specifically, the potential of the previous cc becomes zero. Note that only the most recently assigned cc is used in each update. The data point having the highest potential after the update is assigned as the next cc. The updating procedure is repeated until a user-defined number of cluster centers is identified.
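The clustering procedure described above can be sketched in a few lines. This is a minimal illustration, not the implementation used in the study: the function name `subtractive_clustering` is hypothetical, the constants 4/r_a² and 4/r_b² follow the usual formulation of SC in [28], r_b = 1.5 r_a follows the text, and the loop stops after a fixed number of centers, as described above.

```python
import math

def subtractive_clustering(points, r_a, n_centers, r_b_factor=1.5):
    """Pick `n_centers` cluster centers from normalized `points` (lists of floats)."""
    r_b = r_b_factor * r_a
    alpha = 4.0 / r_a ** 2
    beta = 4.0 / r_b ** 2

    def sqdist(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))

    # Initial potential of each point: sum of exponentially decaying
    # contributions from every data point (including itself).
    potential = [sum(math.exp(-alpha * sqdist(p, q)) for q in points) for p in points]

    centers = []
    for _ in range(min(n_centers, len(points))):
        # The point with the highest remaining potential becomes the next center.
        z = max(range(len(points)), key=potential.__getitem__)
        center, p_star = points[z], potential[z]
        centers.append(center)
        # Subtract the influence of the new center; its own potential becomes zero.
        potential = [p - p_star * math.exp(-beta * sqdist(x, center))
                     for p, x in zip(potential, points)]
    return centers
```

For example, given two well-separated groups of points in the unit square and `n_centers=2`, the densest point of each group is returned, in order of decreasing potential.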
Each cc is, in essence, a prototypical data point that exemplifies a characteristic behavior of the dataset. Therefore, each cc can be used as the basis of a fuzzy rule that describes the system behavior [28]. To convert the 9-dimensional cc (eight input RCMs and one output) to fuzzy rules, each cc is decomposed into two vectors (the first with eight elements and the second with one element). In this study, each fuzzy rule has the following form:

$$\text{Rule } a:\ \text{IF } x_1 \text{ is } M_1^a \text{ AND } x_2 \text{ is } M_2^a \text{ AND } \ldots \text{ AND } x_8 \text{ is } M_8^a \text{ THEN } y = N_1^a x_1 + N_2^a x_2 + \ldots + N_8^a x_8 + N_9^a$$

where $x_1, x_2, \ldots, x_8$ are the input variables and $y$ is the output variable, $M_1^a, M_2^a, \ldots, M_8^a$ are the antecedent fuzzy sets for rule $a$, which are defined by Gaussian membership functions, and $N_1^a, N_2^a, \ldots, N_8^a, N_9^a$ are the parameters to be optimized for rule $a$. After obtaining the fuzzy rules, the well-known TS FRB model [29] is constructed, and the parameters are optimized by recursive least squares estimation. For further details, the reader may refer to [28,30].
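How a set of such rules produces one prediction can be sketched as below. The sketch assumes the common first-order TS inference, in which the output is the firing-strength-weighted average of the local linear models, and it assumes Gaussian memberships centered on the cluster centers with a width tied to r_a, as in the formulation of [28]. The function name `ts_predict` and the argument layout are hypothetical, and the consequent parameters are taken as given rather than estimated.

```python
import math

def ts_predict(x, centers, consequents, r_a):
    """Evaluate a first-order TS model at input vector `x`.

    centers[a]     : antecedent part of cluster center a (same length as x)
    consequents[a] : [N_1, ..., N_k, N_{k+1}], linear weights plus intercept
    """
    alpha = 4.0 / r_a ** 2  # assumed link between membership width and r_a
    weighted_sum, weight_total = 0.0, 0.0
    for c, coef in zip(centers, consequents):
        # Firing strength: product of the Gaussian memberships over all inputs.
        mu = math.exp(-alpha * sum((xi - ci) ** 2 for xi, ci in zip(x, c)))
        # Local linear model of this rule (zip stops before the intercept).
        y_local = sum(n * xi for n, xi in zip(coef, x)) + coef[-1]
        weighted_sum += mu * y_local
        weight_total += mu
    return weighted_sum / weight_total
```

With a single rule the firing strength cancels out and the prediction reduces to the linear model itself, which is the sense in which a one-rule TS FRB model is equivalent to MLR. For inputs very far from every center the firing strengths can underflow to zero, so a practical implementation would need to guard the final division.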
To select the best combination of the clustering parameters (i.e., the number of cc and $r_a$), a trial-and-error procedure is applied. Limiting the search space to a maximum of 20 cc and a maximum $r_a$ of 1, 400 FRB models are built for each MS, using the FDO as the output and the corresponding RCMs as inputs. The dataset is randomly sampled, and 75% of the dataset is used for training while the rest is used for validation. The combination resulting in the best performance on the validation set is selected, and the selected number of cc and $r_a$ are used in ensembling.
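The parameter search can be pictured as an exhaustive loop over the grid. The sketch below is illustrative only: the 400 models are assumed to come from 20 values of the number of cc combined with 20 equally spaced values of r_a (0.05, 0.10, ..., 1.00), which the text does not state explicitly, and `evaluate` is a hypothetical callable that trains a model on the training split and returns a validation error (lower is better).

```python
import random

def split_75_25(n, seed=0):
    """Randomly split indices 0..n-1 into 75% training and 25% validation."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    cut = int(round(0.75 * n))
    return idx[:cut], idx[cut:]

def grid_search(evaluate, max_cc=20, n_radii=20):
    """Return (best_score, n_cc, r_a) over the max_cc x n_radii grid."""
    best = None
    for n_cc in range(1, max_cc + 1):
        for j in range(1, n_radii + 1):
            r_a = j / n_radii  # 0.05, 0.10, ..., 1.00 (assumed spacing)
            score = evaluate(n_cc, r_a)
            if best is None or score < best[0]:
                best = (score, n_cc, r_a)
    return best
```

Because ties are resolved in favor of the first combination encountered, smaller numbers of cc are preferred when several combinations score equally.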
The framework of the TS FRB is given in Figure 2. In the EA, 5-fold validation is used. The predictions for the five validation folds are combined to form the validation time series (VTS) of precipitation. Note that each data point is used in the validation dataset exactly once.
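The construction of the VTS from the five folds can be sketched as follows. The function name `kfold_validation_series` and the `fit` interface are hypothetical, and the random assignment of months to folds is an assumption; the point of the sketch is that every month receives exactly one out-of-fold prediction.

```python
import random

def kfold_validation_series(inputs, targets, fit, k=5, seed=0):
    """Return out-of-fold predictions aligned with `targets`.

    `fit(train_inputs, train_targets)` must return a callable model
    mapping one input row to a prediction (hypothetical interface).
    """
    n = len(targets)
    order = list(range(n))
    random.Random(seed).shuffle(order)
    folds = [order[j::k] for j in range(k)]  # k disjoint index sets
    vts = [None] * n
    for fold in folds:
        held_out = set(fold)
        train = [i for i in range(n) if i not in held_out]
        # Train on the other k-1 folds only.
        model = fit([inputs[i] for i in train], [targets[i] for i in train])
        # Predict the held-out months and store them at their original positions.
        for i in fold:
            vts[i] = model(inputs[i])
    return vts
```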

Simple Average of the Models for Ensembling
The SEM is formed to predict monthly precipitation values for the grid closest to the selected MS. The formulation of the SEM is as follows [31]:

$$SEM_i^t = \overline{OBS}_i + \left(\overline{RCM}_i^t - \overline{RCM}_i\right)$$

where $SEM_i^t$ is the precipitation prediction at grid $i$ in time $t$, $\overline{OBS}_i$ is the climatology of the precipitation observation at grid $i$, $\overline{RCM}_i^t$ is the average of the eight RCMs for grid $i$ at time $t$, and $\overline{RCM}_i$ is the climatology of the eight RCMs.
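A minimal sketch of the SEM computation is given below, assuming the usual bias-removed form in which the prediction is the observed climatology plus the anomaly of the multi-model mean with respect to its own climatology. The function name `simple_ensemble_mean` and the data layout are hypothetical, and the model climatology is computed here over the same months that are passed in, which is also an assumption.

```python
def simple_ensemble_mean(rcm, obs_climatology):
    """Bias-removed ensemble mean for one grid cell.

    rcm[t][k] is the prediction of model k in month t;
    obs_climatology is the long-term mean of the observations.
    """
    # Multi-model average for every month.
    ens_mean = [sum(row) / len(row) for row in rcm]
    # Climatology (long-term mean) of the multi-model average.
    ens_clim = sum(ens_mean) / len(ens_mean)
    # Observed climatology plus the ensemble anomaly.
    return [obs_climatology + (m - ens_clim) for m in ens_mean]
```

For instance, with two models predicting 1 and 3 in the first month and 2 and 6 in the second, the monthly ensemble means are 2 and 4, their climatology is 3, and an observed climatology of 1.5 gives predictions of 0.5 and 2.5.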

Results
In this section, the prediction performances of the SEM and TS FRB models are analyzed and compared for the VTS. In the clustering parameter selection process for the TS FRB, it is observed that as the number of cluster centers increases, the models tend to overfit. On the other hand, the performance of the models with relatively low numbers of cluster centers is very similar to that of the models with one cc (i.e., equivalent to the MLR). As a result of the trial-and-error procedure, the TS FRB model with 2 cc and an $r_a$ of 0.45 is selected for MS17190, while 2 cc and an $r_a$ of 0.65 are selected for MS17320. Using the selected parameters, ensembling is carried out, and the VTS of each MS is formed. The performances of these models, together with those of the TS FRB model with one cluster center (i.e., MLR) and the SEM, are given in Table 2, in terms of correlation (corr), root mean square error (RMSE) and percent bias (PBias).
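The three performance measures can be computed as in the sketch below. Pearson correlation and RMSE are standard; for PBias the sign convention is an assumption (positive values mean overestimation here), since the text does not define it and both conventions appear in the literature. The function names are hypothetical.

```python
import math

def corr(obs, sim):
    """Pearson correlation coefficient between observations and predictions."""
    n = len(obs)
    mo, ms = sum(obs) / n, sum(sim) / n
    cov = sum((o - mo) * (s - ms) for o, s in zip(obs, sim))
    var_o = sum((o - mo) ** 2 for o in obs)
    var_s = sum((s - ms) ** 2 for s in sim)
    return cov / math.sqrt(var_o * var_s)

def rmse(obs, sim):
    """Root mean square error."""
    return math.sqrt(sum((s - o) ** 2 for o, s in zip(obs, sim)) / len(obs))

def pbias(obs, sim):
    """Percent bias; assumed convention: positive means the model overestimates."""
    return 100.0 * sum(s - o for o, s in zip(obs, sim)) / sum(obs)
```

A prediction that doubles every observation illustrates why the measures are reported together: its correlation is perfect, while its RMSE and PBias are both large.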
As shown in Table 2, where the best performance is given in bold, the prediction skills of TS FRB, MLR and SEM are very similar for both stations. On the other hand, the ensembled results have higher prediction skills in terms of corr and RMSE, and lower prediction skills in terms of PBias, compared to the best-performing RCM. The time series of the observations and the predictions of TS FRB and SEM for MS17190 and MS17320 are given in Figures 3 and 4, respectively. In Figures 3 and 4, transparent colors are used for the TS FRB and SEM predictions to increase visibility.
Although TS FRB and SEM follow the general trend in the observations for both MSs, both models underestimate the peak precipitations, as can be seen in Figures 3 and 4. However, despite its simplicity, SEM has higher skill in the prediction of peak precipitations compared to the nonlinear TS FRB. On the other hand, although the peak precipitations for MS17320 are much larger than those of MS17190, the EAs have higher prediction skills for MS17320, especially in terms of corr.

Conclusions
In this study, the ensembling performance of a TS FRB model for monthly precipitation is compared to those of the SEM and the individual RCMs.

• The analysis shows that the performance of the EA, in terms of corr and RMSE, is better than that of the individual RCMs for both MSs. However, the PBias values of the best-performing RCM are much better than those of SEM and TS FRB;
• The nonlinear TS FRB model has very similar prediction skills to the simple SEM model. So, when the effort to select the best cc and $r_a$ combination is taken into account, SEM is more efficient than the TS FRB model for ensembling;
• Both models failed to predict peak precipitation events.