Rainfall Prediction in the State of Para í ba, Northeastern Brazil Using Generalized Additive Models

: The state of Para í ba is part of the semi-arid region of Brazil, where severe droughts have occurred in recent years, resulting in signiﬁcant socio-economic losses associated with climate variability. Thus, understanding to what extent precipitation can be inﬂuenced by sea surface temperature (SST) patterns in the tropical region can help, along with a monitoring system, to set up an early warning system, the ﬁrst pillar in drought management. In this study, Generalized Additive Models for Location, Scale and Shape (GAMLSS) were used to ﬁlter climatic indices with higher predictive e ﬃ ciency and, as a result, to perform rainfall predictions. The results show the persistent inﬂuence of tropical SST patterns in Para í ba rainfall, the tropical Atlantic Ocean impacting the rainfall distribution more e ﬀ ectively than the tropical Paciﬁc Ocean. The GAMLSS model showed predictive capability during summer and southern autumn in Para í ba, highlighting the JFM (January, February and March), FMA (February, March and April), MAM (March, April and May), and AMJ (April, May and June) trimesters as those with the highest predictive potential. The methodology demonstrates the ability to be integrated with regional forecasting models (ensemble). Such information has the potential to inform decisions in multiple sectors, such as agriculture and water resources, aiming at the sustainable management of water resources and resilience to climate risk.


Introduction
The high spatial-temporal variability of rainfall in the Northeast of Brazil (NEB) is highly influenced by global teleconnection patterns associated with the El Niño-Southern Oscillation (ENSO), as well as The flexibility of probabilistic modeling can be observed even under non-stationary conditions, making it easier to understand how distribution parameters depend on external covariates [40][41][42][43][44][45]. Consequently, due to these characteristics, GAMLSS can be used in the analysis of the climate risk associated with hydroclimatic variables [46].
This study aims to identify the influence of SST anomalies on rainfall in the state of Paraíba using GAMLSS, to quantify the contribution of climatic indices to the distribution of rainfall and, consequently, to obtain seasonal rainfall forecasts for the state. In this way, the results may serve as support for decision-makers, since these models with local characteristics are not systematically used by the agencies responsible for the implementation of Brazil's climate forecasts, particularly in the state of Paraíba. Figure 1 shows the spatial distribution of the rain gauges in the state of Paraíba used in this study. Paraíba is located in the eastern portion of NEB, between the parallels of 06 • S-09 • S and the meridians of 34 • W-39 • W, bounded to the north by the state of Rio Grande do Norte, to the south by the state of Pernambuco, to the west by the states of Ceará and the east by the Atlantic Ocean. The population is estimated to be 4,025,558 inhabitants in 223 municipalities, with a demographic density of 66.70 inhabitants/km 2 [47].

Data
Rainfall data from 45 locations were used ( Figure 1). Monthly time series information was provided by Development Superintendence of Northeast (SUDENE) [48] from 1948 to 1990, and State Water Management Executive Agency [49] from 1994 to 2017. The filling of missing data was performed using the Expectation-Maximization with Bootstrapping (EMB) algorithm. Following the completion of the missing data, the average seasonal corresponding to the period 1986-2015 was used for this study.
The analysis of SST behavior was performed from anomaly indices in the regions of Niño 1 + 2, Niño 3, Niño 3.4, Niño 4, and SOI (Southern Oscillation Index) in the Pacific Ocean obtained from the Climate Prediction Center/National Centers for Environmental Prediction/National Oceanic and Atmospheric Administration [50]. Anomaly indices in the TNA (Tropical North Atlantic) and TSA (Tropical South Atlantic) regions, and the decadal scales AMO (Atlantic Multidecadal Oscillation) and PDO (Pacific Decadal Oscillation), were provided by the Earth System Research Laboratory/National Oceanic & Atmospheric Administration [51].

Descriptive Analysis
Time series analysis describes the data structure, acquiring its characteristics, and enabling predictions of its future behavior. In this study, all calculations were performed using the R programming language [52], including its library of described functions.
Missing observations are often found in time-series studies [53] as the methodologies associated with this type of analysis often must be applied in the presence of a complete historical series. Therefore, the resampling of these data using the Expectation-Maximization with Bootstrapping (EMB) algorithm, implemented in the Amelia II R package [54], was carried out to fill missing data found in the time series.
This method seeks to simulate up to four possible precipitation scenarios to replace the missing or inconsistent information found in the historical series [55]. These four imputations are the combinations generated by the algorithm, in which the user can subjectively select the value to be used in the imputation of the missing data. Amelia II is a complete R package for multiple imputations of missing data, which works faster and is easier to use than various Monte Carlo approaches via the Markov Chain, and provides essentially the same answer [54].
It is desirable to use the series of rain gauges closest to the location requiring data resampling so that the method identifies the characteristics and patterns of variability between the series and improves the method's ability to simulate missing data. It is also necessary to verify the reliability of the resampling of these data by simulating the observed data. In order to subjectively select the most suitable data for resampling, it is imperative to understand the influence of the meteorological systems regulating the region's rainfall regime and their dependence on the SST regions. Previous studies have emphasized the evolution and effectiveness of this method in information reconstruction [56][57][58].
The use of hierarchical cluster analysis [59][60][61], using the average monthly total for each location, is proposed to improve interpretation when interpreting results. This technique quantifies the similarity between sites by recognizing patterns that divide the area into groups of homogeneous precipitation regions [61,62]. The hierarchical cluster analysis is a multivariate technique that aims to group data according to the degree of similarity between them. In this way, its application is useful to generate precipitation groups with homogeneous regions within the cluster and heterogeneity between them. It is crucial to instruct that the robustness of this technique will depend on the amount of data available in the time series, where the sample must be quite representative. As in the context of this work, 30 years of data were applied, it is admissible that the selection of groups of homogeneous regions (mesoregions) was used safely.
The factoextra package [63] was used to facilitate the measurement of similarity as it provides accessible functions for extracting and visualizing data output and simplifies hierarchical cluster analysis using the Ward.D2 method [64]. The Ward.D2 algorithm uses square Euclidean distances to minimize errors while selecting dendrogram nodes, showing equivalence between locations. Each node indicates that there is a set (mesoregions) of cities, in which the rainfall regime has similar characteristics.
One of the difficulties in analyzing the association between the SST anomaly in the Atlantic and Pacific Oceans with precipitation of a given region is to have prior knowledge of the impact of temporal lags on this relationship. In order to obtain this information, it is possible to carry out climatological analyzes applying statistical predictions in precipitation. For this, it is necessary to use some method that can observe the sensitivity between the changes in the SST and precipitation in Paraíba and the lag associated with this relationship, before applying the main methodology.
Lag correlations between time series (Yt) and external variables (Xt) are often verified using the Cross-Correlation Function (CCF). However, this technique is limited to the characteristics of a stationary series, since the autocorrelation of Yt and Xt influences the CCF. To account for this, we use the prewhitening method [65] to identify the monthly lag between the SST climatic indices and the rainfall time series.
Prewhitening generates a visual identification of the signal in which there is a higher probability of having a temporal dependence, thus highlighting the possible value of the lag between the series. For this purpose, it is desired the vertical line of the graph (not shown), which varies between the zero of the CCF, exceeds the horizontal line that indicates the confidence level at p < 0.05. This procedure has been used and adapted by several authors to remove the influence of serial correlation, as well as to eliminate the trend structure of the series [66][67][68][69]. In this study, we used the prewhitening function found in the TSA package [70].

GAMLSS Model
A fundamental criterion for studying time series is to analyze the presence of stationarity. When applying the stationarity criteria, it is possible to choose the best adjustment of the distribution of potential models to the variables to be explored. However, when working with environmental data, such as meteorological data that follow non-stationary characteristics, mathematical transformations are needed to make the series stationary. GAMLSS [33] can adjust to a non-stationary time series, highlighting its practicality.
The GAMLSS model class was used because the data are complex, given the existence of considerable asymmetry in the data and many zeros. With this class of models, it was possible to filter the climatic indexes with greater predictive efficiency and, consequently, make a climatic forecast of precipitation. The specific method was selected due to the small amount of research using statistical modeling in the regional context, which associates the rainy season in Paraíba with the SST. There is a recently published study using GAMLSS modeling in Paraíba, but using latitude, longitude, and topography as predictor variables [35]. The innovation of this study is due to the use of oceanic indices that influence and regulate the region's rainfall regime, mainly between summer and southern autumn. Therefore, it is expected that the result found will be better because it contains the main variables that predict precipitation in the localities.
As already described, in this work, the average between three consecutive months was used. The justification for this analysis is given by the historical context when working with climatic variables. For example, the Climate Prediction Center (CPC) classifies events from El Niño or La Niña from reading the averages of the anomalies of three consecutive months in the Niño 3.4 region. Another advantage of this approach is the smoothing of the high monthly climatic variability that exists in the region. When applying this form, it facilitates to measure by simulation the volume of water to be distributed in the regions of Paraíba. This performance is verified during the application of climate forecasts in the years 2016 and 2017.
In GAMLSS models, the distribution of the response variable Y does not need to be part of the exponential distribution family. In this sense, for each distribution parameter, different additive terms can be included in the predictor. This is due to the ability to model not only the mean but also the higher-order moments, including the variance, skewness, and kurtosis parameters of the predictor variable distribution, which is ideal for the analysis of highly variable time series. Therefore, there is a possibility that the model has the potential to capture the influence of climatic indices (Niño 1 + 2, Niño 3, Niño 3.4, Niño 4, SOI, AMO, PDO, TNA, and TSA) on rainfall behavior in the state. The model adjustment can be applied in two general ways, the first is by selecting the predictor variables with the greatest influence (statistical significance) on rainfall in Paraíba state, and the second is by differentiating (smoothing) the predictor variables with low statistical significance in the attempt to improve them. In the development of this work, only the predictor variables with the best statistical significance in the modeling were selected.
This model was selected for the development of the seasonal forecast with a horizon of three months. The use of this, in the first half of the year, was chosen to be the period with the highest water recharge in cities located in the interior of the state of Paraíba. As these localities live with frequent droughts and a few months of rain, it is essential to use this periodicity to analyze the behavior of the model's predictions with the observed values of precipitation to provide a tool that can improve the management of water resources by decision-makers.
A better understanding of how GAMLSS works can be found in [33,40,41]. In the GAMLSS model framework, it is assumed that the y i data series are independent, for i = 1, . . . , n, with F Y (y i θ i ) being the cumulative probability density function (p.d.f.) where θ iT = θ i1 , θ i2 , . . . , θ ip is a vector of p parameters related to explanatory variables and random effects. This parameter vector includes the location, scale, and shape parameters of the p.d.f. Usually, in the context of hydro-meteorological data, p ≤ 4, since one to four parameter families have sufficient flexibility for characterizing continuous and discrete distributions [39].
In a GAMLSS model [33], the additive terms represent the smoothing terms that allow the flexibility of the parameter dependence of p.d.f. on climate indices. For a vector of observations of the response variable given by y T = (y 1 , y 2 , . . . , y n ), considering k = 1, 2, . . . , p, a monotonous function of g k (·) relating the k th parameter θ k explanatory variables random effects through an additive model written form where θ k and η k are vectors of size n, for example, θ T k = (θ 1k , θ 2k , . . . , θ nk ), β T k = β 1k , β 2k , . . . , β J k k is a parameter vector with size J k , X k is a known structure matrix of the order n × J k . Z jk is a fixed matrix of known structure n × q jk , and γ jk is a q jk dimensional random variable. When choosing Z jk = I n where I n is an identity matrix n × n, γ jk = h jk = h jk x jk for all combinations of j and k, thus obtaining the semi-parametric additive equation of the GAMLSS, where the h jk is an unknown function of the explanatory variable x jk and h jk = h jk x jk is a vector representing the h jk function of x jk . The attributes of the four parameters (p = 4) are usually characterized by location (µ), scale (σ), skewness (ν), and kurtosis (τ). The first two parameters θ 1 and θ 2 (Equation (1)), denoted by µ and σ are defined in the literature as the location (or mean) and scale (or variance) parameters, respectively; the last two ν = θ 3 and τ = θ 4 are called shape parameters.
We used seven probability distributions ( Table 1) that have been widely used in the climate analysis of precipitation time series, such as [71][72][73][74][75][76], with the support of the gamlss R package [33,77]. Table 1. Summary of the seven distributions considered in the study.

Distribution Function Abbreviation Probability Density Function
In the absence of smoothing functions, the model is reduced to the parametric GAMLSS model, g k (θ k ) = X k β k , k = 1, 2, 3, 4 estimated by the maximum likelihood (ML), with the maximum likelihood function given by [77] where f represents the p.d.f. of the response variable. However, for non-parametric models, it is necessary to resort to the penalized ML method, through the logarithm of the likelihood function given by [77],

Model Evaluation Criteria
In order to compare and select the appropriate probability distribution for the rainfall time series in the different GAMLSS models, we chose the model with the lowest Akaike Information Criterion (AIC) [78] and the Bayesian Information Criterion (BIC) [79]. The independence and normality of the residues are verified by the worm plot graph, which is a qq-plot of the residues. These graphs visually help to decide if the selected model fits the behavior of the studied series when verifying whether the residuals of the simulated versus the observed value are similar [80,81]. For optimal adjustment, 95% of the points are expected to be between the two elliptical curves (unit normal quantile). Otherwise, when finding points outside the range of high-frequency elliptical curves means that the model distribution adjustments (selection model) are insufficient to explain precipitation behavior (response variable), the outliers can also be found distant from the two elliptical curves on the graph [77,82] reiterate that this step is essential to enable model optimization and computational processing and to finally identify the most significant predictors to be used in the determination of predictions, as stated by [83].

Efficiency for Seasonal Prediction
Initially, the quantile technique was discussed by [84] and addressed by other authors [85][86][87], describing that this method can be applied to precipitation series, where the precipitation Y of a locality over time is considered a continuous variable capable of characterizing the dry, normal and rainy periods, based on the quantile categories. A quantile of order p is a numerical value that classifies a data set between intervals, the choice of which may vary according to the desired percentage of the research.
We decided to use the quantile technique to characterize the high annual variability of rainfall found in the region due to the dependence on SST anomalies. For each month, we classified the forecasted precipitation by applying the "Type = 6" algorithm because it is used in continuous data, where observations can assume any value in an interval. The algorithm is present in the quantile function of the stats package, the associated equation in the interactions of this algorithm is m = p.
Thus, it is possible not only to examine whether the response of the model corresponds to the classification of the observed value but also to determine whether the model at least captures the general precipitation behavior towards normality, in the sense of achieving the proper distribution for that period. In this case, the technique was adapted for the locations studied in Paraíba to classify the precipitation predictions as extremely dry (ED), very dry (VD), dry (D), normal (N), rainy (R), very rainy (VR) and extremely rainy (ER), as shown in Table 2. Table 2. Corresponding range to climate classifications represented by their quantiles.

Quantiles
Rating Very rainy Q(0.95) < p Extremely rainy Four different statistical indices were adopted to measure the model's performance and its applicability in conducting climate studies in Paraíba, due to the need to qualify the adjustment of the simulated values to those observed. Comparisons for each homogeneous locality and regions were made by Mean Absolute Error (MAE), Percent Bias (PBIAS), Root Mean Square Error (RMSE), and Coefficient of Determination (R 2 ). The MAE and the RMSE represent the average deviation between the observed and predicted data. The PBIAS assesses the trend that the mean of the predicted values has with the observed ones. Therefore, positive and negative values indicate overestimation and underestimation of precipitation by the model, respectively. Finally, R 2 indicates the proportion in which the variable can be explained by the model [29].
This enabled the quantification of the underestimation, overestimation, and the best adjustments of the model. Optimal fit estimation occurs when the estimation error indicated by MAE, PBIAS, and RMSE is nearby to zero, and the R 2 is next to one [29]. As noted, the study was designed to follow multiple statistical criteria to achieve efficiency and enhance time series analysis and its ideal procedures, significantly improving the product generated by the GAMLSS model.

Series Investigation
The quantile cluster analysis facilitates the identification of five homogeneous precipitation regions in the state of Paraíba (Figure 2), which were R1 (Litoral), R2 (Brejo), R3 (Agreste), R4 (Cariri and Carimataú) and R5 (Sertão). After obtaining the locations belonging to each homogeneous region (mesoregion), three representative cities were selected from each of the mesoregions, and pre-whitening was applied to calculate the lag between the precipitation time series and the climatic indices obtained from the regions of Niño1 + 2, Niño 3, Niño 3.4, Niño 4, SOI, TNA, and TSA (Table 3), except AMO and PDO. The lag value of the abnormal levels of AMO and PDO due to the characteristics of the series being multidecadal and decadal, respectively, is not precisely indicated by the pre-whitening method. Since the calculation obtains signals with strong multicollinearity, the technique would indicate the existence of several lags (lag = 0, 1, . . . , 12, . . . , n months), which could lead to an error in the interpretation of the results. As a result, the highest correlation level for these two indices is lag 0 [89], so we applied a lag equal to zero.    R1  0  0  0  7  3  0  0  3  1  R2  0  0  0  7  3  0  0  3  1  R3  0  0  0  7  3  0  0  3  1  R4  2  2  2  6  3  0  0  3  2  R5  2  2  2  6  3  0  0  3  2 It is necessary to justify that there are situations in which two or three "ideal" lag values are indicated by pre-whitening due to the different seasonal dependencies. Therefore, we generated Table 3 and compared this result with those already described in the existing rainy season literature in the eastern and northern parts of the NEB [8,[90][91][92][93][94][95], which is the target of this research.

Application of the GAMLSS Model
The performance of the GAMLSS model class was verified by adjusting the non-linear characteristics (shape) of the precipitation time series (similar to [45]), and by assessing the dependence of the series at 45 locations in Paraíba based on SST anomaly indices. SST anomalies, classified as predictive variables, have been used in the distributions to obtain the best fit and combination.
The distributions that best fit the Paraíba precipitation time series can be seen in Table 4. The GG distribution resulted in the best model of rainfall in the regions R1, R2, and R3, which are localities in the state where there are no long periods with zero precipitation and are close to the east coast. The application of the GG distribution at the other locations of the study (i.e., R4 and R5) would not be adequate due to the presence of zeros in the observed data. It should be noted that the total annual rainfall of R3 is less than or equal to R5. However, R3 has more months of zero precipitation than R5. The precipitation deficit in R3 for the June to December period is often affected by negative TSA anomalies [95].  The inflated zero model, i.e., the ZAGA distribution function, provided the best time-series adjustments with zero presence, according to the AIC and BIC. Consequently, given that regions R4 and R5 present frequent periods of drought, the ZAGA distribution was the best way to characterize the behavior of the series.
Once the characteristic distribution for each location was determined, we verified the residual behavior of the model and the predictions under two scenario hypotheses. In the first scenario (S1), only the location parameter (µ) was used in the model equation, i.e., only the mean of coefficients of the predictor variables associated with the SST was used in the framework of the GAMLSS model, both in residual analysis and in the obtaining of climatic predictions. For the second scenario (S2), the location (µ), scale (σ) and asymmetry (ν) parameters are adopted to not only using the mean of the distribution of the GAMLSS model but also inserting information related to the behavior of the variability and asymmetry associated with the SST indices.
Understanding the pattern of climate variability increases the accuracy found in climate modeling methodologies [96][97][98]. Therefore, the estimation of climatic parameters (Figures 3 and 4) allows the identification of probable variables that regulate the state rainfall regime. The GAMLSS model structure penalizes p-values above 5%, avoids multi-collinear problems, and increases the model's predictive capacity to select which climate indices have the highest correlation with rainfall at each location.   Figure 3 represents S1, where only the mean behavior of the climatic indices is used. When using the other parameters of variance and asymmetry, as in the equations expressed in S2, it is possible for the model to remove some of the climatic indices from the equation of the mean parameter, to reallocate the equations associated with the parameters of variance and asymmetry.
The parameters estimated by the GAMLSS model (Figures 3 and 4) indicated that the indices associated with the Niño regions in the Tropical Pacific Ocean showed a low frequency of contribution to the state's rainfall regime when compared to the other analyzed indices. This does not mean that the Tropical Pacific Ocean region plays a weak role in the state since the large-scale contribution associated with the Niño region influences the behavior of smaller systems, such as those associated with SST anomalies in the tropical Atlantic Ocean. It is worth noting that [20] also observed this contribution.
Therefore, it is essential to highlight the influence of Niño 1 + 2 in the R3, Niño 3 in the R4, Niño 3.4 in the R1 and R4, and Niño 4 in the R5. The SOI, AMO, PDO, TNA, and TSA indices are found to be more prevalent in rainfall behavior, as shown in Figures 3 and 4, with SOI and PDO being the most present. For this reason, the potential of these indices can be highlighted as good predictors, as already stated by some authors [36,99,100]. It is also noted that during the positive phase of SST in the North Atlantic Ocean, the behavior of TNA in the R1, R2, and R3 regions, and, in particular, the magnitude of AMO in the R4 and R5, makes it difficult for precipitation to occur in these regions.
The quality of the linear relationship between precipitation and climatic indices by the GAMLSS model can be verified by using only the practicality found in the information from the worm-plot graph observed in Figures 5 and 6, as confirmed by [101].
Characteristics of the quality of the model can be verified in most of the analyzed locations because the points of the worm-plot graphs (Figures 5 and 6) are close to the solid red line, with little oscillation, close to the zero-deviation line, and within the 95% confidence intervals (dotted elliptical curves). This indicates that the model fits the time series better during the rainy season of the regions. However, it is necessary to be careful, since the large deviations found in the upper and lower normalized quantiles can be interpreted as a limitation of the model in capturing the occurrence of maximum and minimum accumulated precipitation because the curve behavior is asymmetric and leptokurtic at the ends of the worm-plot graphic. This limitation of time series adjustment associated with observed environmental data has already been verified in recent studies [34,77]. Therefore, caution is needed when using this methodology in periods of irregular rainfall, such as those that occur during the dry season.

Quantile Prediction
When quantile residues approximately follow the characteristic behavior of normal distribution [102], the rationale validates that the coefficients of the parameters found are capable of diagnosing the influence of the climate indices [103] on the distribution of rainfall and, consequently, its use in the simulation of the seasonal precipitation forecast in the state of Paraíba. Therefore, we used the quantile technique to examine the data and to classify the observed precipitation by percentile intervals (categories) of seasonal rainfall for each of the 45 locations, and to observe the applicability of the GAMLSS model to reproduce seasonal rainfall variability in the post-2015 period. This makes it possible to evaluate in practice (operational work), the quality of the model in estimating the behavior of the rainy season for the next two years (2016 and 2017) in the region, which were climatologically dry. As the ideal period, i.e., with less climate variability and more stable meteorological systems, for forecasting in the NEB is the six-month seasonal rainy (from January to June), given that rainfall during the dry season is insignificant [104], particularly in R4 and R5, for representing the highest percentage of annual rainfall, presenting more weather systems and less climatic variability. The three-month periods used in the approach were divided as DJF (December/January/February), JFM (January/February/March), FMA (February/March/April), MAM (March/April/May), AMJ (April/May/June), MJJ (May/June/July).
Predictions found in both scenarios were satisfactory for the observed rainfall data in the first period of 2016 ( Figure 7). Even with the high seasonal and interannual variability of rainfall in the state resulting from mechanisms associated with SST [10], the model was able to highlight the role of climate indices as good predictors. As presented in Figure 7, the predictive capability of the model was able to indicate the general behavior of rainfall for R1, R2 and R3 during the DJF, JFM, FMA, AMJ and MJJ periods. The most reliable predictions for R4 and R5 were found in FMA, MAM, and AMJ. Figure 8 presents the forecast simulation for 2017, emphasizing the results found between the JFM and AMJ periods. The most reliable rainfall estimates for R1, R2, and R3 were during the periods of FMA, MAM, and AMJ characterized for the proximity of forecasting, as demonstrated by a good association between the sign of observed rainfall and the simulated values. The adjustment of the model to the observed data was also present in the R4 region, in the periods of JFM, FMA, MAM, and AMJ. Subjectively, it is possible to see in Figure 8 coherence between the observed and predicted quantiles in MJJ. In the R5 region, the estimates did not achieve the same success as those found in the R4 region. However, acceptable results for DJF are observed because they indicate approximately the variability of the rainfall volume identified between normal, dry and very dry.

Efficiency Indicators
Possible changes in climate are likely to reduce the NEB rainy season [105], intensifying the meteorological systems that cause extreme rainfall and droughts [106], thus requiring the effectiveness of the models [107]. The reliability of the forecasts using SST anomalies in tropical regions is evident, and the efficacy of the models must be presented by statistical criteria [108]. Due to the adequacy of the scenarios for the first semester of 2016 and 2017, it was necessary to assess the accuracy of the estimates for each homogeneous region of the study. The authors of [109] found that using a statistical model could be the easiest way to predict reservoir capacity and drought events at the regional level.
The performance of the model in estimating the rainfall volume of each homogeneous region of Paraíba through statistical and efficiency indicators can be seen in Tables 5-9. For this purpose, it was chosen to compare the average rainfall of the observed and simulated data for 2016 and 2017.     The MAE, PBIAS, and RMSE indicators (Table 5) allowed the identification of the limited reliability of the model in estimating rainfall in R1, with only the FMA period being adequate. In other periods, indicators display significant disparity between observed and estimated rainfall.
High values of MAE, PBIAS and RMSE were also observed in the R2 region, as shown in Table 6. However, these values were lower than those shown in Table 5. The FMA seasonal continued to be the most statistically significant for estimating since the associated error between the observed and the simulated values was small, as observed in the MAE, PBIAS and RMSE indicators. As in the R1 region, the values found in the coefficient of determination for the JFM, FMA, AMJ and MJJ periods indicate that the forecast was able to follow the general pattern of rainfall in the region. Once the climate forecast for a given region has been obtained, decision-makers can prepare measures that are resilient to known climatic adversities, corroborating the findings of [98], which underlines the importance of investing in hydro-climatic monitoring systems.
The statistical indicators presented in Table 7 show that the model was capable of capturing rainfall characteristics in the R3 region for the entire first semester. Precipitation estimates for the FMA and MJJ periods stand out as those with the best statistical efficiency, which can be observed in the results of MAE, PBIAS and RSME, with the indication of little associated error and the coefficient of determination to indicate that the model can explain the variability of rainfall.
In addition, the validation of the model demonstrates its applicability qualitatively throughout the semester in the display of the expected climatological behavior, thus enabling the implementation of the climate forecast. This statement is associated with the low value found for PBIAS and the minimally acceptable value of the coefficient of determination The predictive ability of the model showed efficiency in making estimates in the R4 region, as observed in the MAE, PBIAS, and RMSE indicators ( Table 8). The reliability of the prognoses in reproducing the climatological behavior for the region is justified by the determination coefficient, which proves the adequacy in using the GAMLSS model class to generate seasonal precipitation prediction.
DJF and AMJ, as can be seen in Table 9, were seasonal that the model was able to follow at least the variability of rains that occur in R5. In others, it was possible to observe the model's low efficiency through statistical indicators. The predictions managed to follow the general pattern of rainfall behavior observed, as highlighted in the results obtained by the coefficient of determination. Although the indicators highlight the low efficiency in estimating the observed value, it can be seen that, at some locations in Figures 7 and 8, the forecast classification followed the observed rainfall pattern, suggesting that regions with the best predictive characteristics could be filtered when performing these tests by location.
The authors of [110] highlighted the importance of calibration in the modeling of statistical downscaling to obtain quality in the final result. However, they recommend that this modeling cannot be considered as an alternative, but that it should be integrated with other models in an ensemble. The authors of [111] identified that the choice of the appropriate model is not simple and depends on several factors such as location, climatic factors, data quality, among others. They observed a significant improvement in the ability of the models when selecting predictive variables with a higher level of influence in the study area.

Discussion
Studies on SST interactions and their impact on NEB rainfall are essential for the assessment of climate change scenarios [106,107] argue that rainfall forecasts have the potential to benefit different sectors of society, but that there is a need for effective implementation of climate services, public policies, and close collaboration between scientists, climate professionals and organized civil society. This information makes it possible to reduce seasonal errors, to filter periodicity with higher statistical significance, and to improve the accuracy of the statistical models. Several authors have discussed the influence of SST in the precipitation variability on NEB [86,[90][91][92][93][94][95].
Seasonal forecasts are crucial for the management of water resources [109], mainly from January to May in Paraíba, a critical period that guarantees the recharge of reservoirs [104] in the R4 and R5 regions. This is due to the frequent problem of water scarcity, which requires a higher predictive capacity of the model; this enables sustainable regional water management, reducing conflicts by improving food and water security [98].
The parameters estimated by the GAMLSS model for precipitation time series allow an understanding of the physical mechanisms associated with precipitation, indicating the climatic indices with the most significant influence on rainfall in the region, as shown in [69]. The presence of AMO, SOI, PDO, TNA and TSA indices was found to be highly representative, with the most relevant AMO, TNA and TSA indices, highlighting the importance of the tropical Atlantic Ocean as the primary regulator of the rainfall regime in the region. These results agree with [17].
The positive TSA phase, associated with the negative TNA phase, contributes to the formation of the north-south thermodynamic gradient [112,113], favoring the intensification of the low TSA, which in turn strengthens the northeast trade winds, displacing the southernmost ITCZ from its average climate position. As observed by [20], this pattern contributes to the occurrence of meteorological systems such as instability lines, easterly wave disturbances, and more frequent convective mesoscale complexes, which cooperate in the distribution of rainfall in all regions, particularly R4 and R5.
The possible explanation for the existence of TNA favoring precipitation in the R4 and R5 regions may be related to the behavior of trade winds [95] because, during the positive TNA phase, the northeast trades are weakened. As a result, rainfall in the R1, R2 and R3 regions is reduced. In this scenario, the increase in consecutive dry days and, thus, the reduction in expected rainfall for some locations can be observed [113]. However, this configuration contributes to the advancement of weather systems by the NEB, to some front and UTCV fragments inside Paraíba during the rainy pre-season, due to the strengthening of southeastern trade winds and cyclonic circulation in eastern South America and the South Atlantic [114].
A description of the behavior of rainfall distribution, which is conditioned by climate indices, is valuable information for water resource management in the state of Paraíba. Resilience measures for social problems associated with weather events, floods, landslides, and drought conditions can, therefore, be planned for the development and implementation of public policies. As the potential of the GAMLSS model is extensive, it can be discussed in future studies, in particular on the subject of environmental science. We also believe that this research can stimulate new applications of statistical modeling, particularly in the preparation of climate forecasts for the northeast region of Brazil.

Conclusions
It was proposed in this study to create a methodological structure capable of establishing a significant relationship between climatic indices and rainfall in the state of Paraíba, corresponding to 30 years (1986-2015), to develop seasonal rainfall predictions in 2016 and 2017. The results of the GAMLSS model showed the dominant influence of the tropical Atlantic Ocean on rainfall in Paraíba. In addition, the tropical Pacific Ocean also contributes to rainfall behavior in the R4 and R5 regions.
The model emphasized that the SOI, PDO, TNA, TSA, and AMO indices are the main precipitation regulators in Paraíba, classifying them as good predictors. The indices associated with the Niño regions have their characteristics of rainfall contribution, which stand out from the performance of Niño 3 and Niño 3.4.
The model has been shown to have predictive capabilities in all regions of the state during the summer and southern autumns. However, improvements can be applied to the development of the model. The potential contribution of the ZAGA regression model is highlighted in locations where there is no rainfall during the dry season. More significant results are found on a local scale than on a regional level.
The methodology applied can be used as an additional tool for the achievement of climate predictions by decision-makers. It highlights innovation in the application of statistical models for the simulation of rainfall behavior. This will allow the various sectors to promote strategies and planning for water resources in the region, as they proactively manage, with actions to combat, prevent, and respond to climate risks.