3.1.1. Historical Data
The first analysis was based on the series of daily rainfall observations recorded at Dar Es Salaam (TZ) Airport, Tanzania (latitude: 6.87° S; longitude: 39.20° E; elevation: 53 m a.s.l.) over the period 1 January 1958–31 December 2010. The annual maxima are plotted in Figure 1.
The data were assumed to be independent observations from the GEV distribution. Maximum Likelihood Estimation (MLE), performed with the extRemes [28] and evd [29] packages in R [30], gave the parameter estimates $(\hat{\mu}, \hat{\sigma}, \hat{\xi}) = (68.25, 16.93, 0.039)$, with standard errors of 2.55, 1.83 and 0.083, respectively. The approximate 95% confidence intervals were [63.25, 73.25], [13.35, 20.52] and [−0.124, 0.202] for $\mu$, $\sigma$ and $\xi$, respectively. Although the point estimate of the shape parameter is positive, its 95% confidence interval extends well below zero, which highlights the uncertainty of the estimate.
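As a quick check, the sketch below recomputes normal-approximation (Wald) 95% intervals, estimate ± 1.96 × standard error, from the reported MLEs and standard errors. It assumes the intervals were built this way, which the agreement to within about 0.01 suggests; the helper name `wald_ci` is illustrative.

```python
from statistics import NormalDist


def wald_ci(estimate, se, level=0.95):
    """Symmetric normal-approximation interval: estimate +/- z * se."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for level=0.95
    return estimate - z * se, estimate + z * se


# MLEs and standard errors reported in the text for (mu, sigma, xi).
reported = {"mu": (68.25, 2.55), "sigma": (16.93, 1.83), "xi": (0.039, 0.083)}
intervals = {name: wald_ci(est, se) for name, (est, se) in reported.items()}
for name, (lo, hi) in intervals.items():
    print(f"{name}: [{lo:.3f}, {hi:.3f}]")
```

The interval for $\xi$ straddles zero, which is the numerical counterpart of the uncertainty noted above.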
The estimated 100-year return level was $\hat{z}_{100} = 153.6$ mm, with a 95% confidence interval of [129, 208] mm. The return level plot in Figure 2 is nearly linear because the estimate of $\xi$ is close to 0. Diagnostic plots (not shown for brevity), such as the probability and quantile plots, showed that the plotted points lie roughly on a straight line, supporting the use of the GEV model.
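The return level follows from inverting the GEV distribution function: for an annual exceedance probability $p = 1/T$, $z_p = \mu - \frac{\sigma}{\xi}\left[1 - y_p^{-\xi}\right]$ with $y_p = -\log(1 - p)$, which tends to the Gumbel form $z_p = \mu - \sigma \log y_p$ as $\xi \to 0$. The sketch below (function name illustrative) evaluates this at the MLEs and reproduces the reported 153.6 mm. The Gumbel limit is linear in $\log y_p$, which is why the return level plot is almost a straight line when $\hat{\xi}$ is near zero.

```python
import math


def gev_return_level(mu, sigma, xi, period, eps=1e-6):
    """Level exceeded on average once every `period` years by the annual maximum."""
    p = 1.0 / period
    y_p = -math.log1p(-p)  # -log(1 - p)
    if abs(xi) < eps:
        return mu - sigma * math.log(y_p)  # Gumbel limit as xi -> 0
    return mu - (sigma / xi) * (1.0 - y_p ** (-xi))


z100 = gev_return_level(68.25, 16.93, 0.039, 100)
print(round(z100, 1))  # 153.6
```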
A more reliable analysis based on a Bayesian approach was then carried out with the evdbayes package in R [23], which provides functions for the Bayesian analysis of extreme value models using Markov Chain Monte Carlo (MCMC) methods. The only genuine prior information available concerned the GEV shape parameter; the priors for $\mu$ and $\sigma$ were therefore non-informative normal distributions with variance $10^4$. For $\xi$, the more specific empirical evidence of Koutsoyiannis [31,32], who found $\xi \approx 0.15$ for Europe, was used.
A normal prior centred on 0.15 with variance 0.2 was adopted, restricting the variation of $\xi$ to a physically reasonable range [22]. Using the MLEs as the initial vector $\theta_0 = (\hat{\mu}, \hat{\sigma}, \hat{\xi}) = (68.25, 16.93, 0.039)$ and the proposal standard deviations psd = (6.191, 0.230, 0.216) (identified through pilot runs), an MCMC chain of length 100,000 with satisfactory mixing properties was generated (Figure 3). Graphical inspection of the chain (Figure 3) and the Geweke diagnostic [33] indicated a burn-in period of 10 iterations.
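The MCMC step can be illustrated with a component-wise random-walk Metropolis sampler. This is a minimal sketch, not the evdbayes implementation. It assumes that the priors act on $(\mu, \log\sigma, \xi)$, with $N(0, 10^4)$ priors on $\mu$ and $\log\sigma$ and a $N(0.15, 0.2)$ prior (variance 0.2) on $\xi$, so that the proposal standard deviation 0.230 for $\sigma$ is read as a step on the log scale. The data are synthetic draws, not the Dar Es Salaam record, and all function names are hypothetical.

```python
import math
import random


def gev_loglik(data, mu, sigma, xi, eps=1e-6):
    """Log-likelihood of i.i.d. GEV(mu, sigma, xi) data; -inf outside the support."""
    if sigma <= 0:
        return -math.inf
    ll = 0.0
    for x in data:
        z = (x - mu) / sigma
        if abs(xi) < eps:  # Gumbel limit as xi -> 0
            ll += -math.log(sigma) - z - math.exp(-z)
            continue
        s = 1.0 + xi * z
        if s <= 0:  # observation outside the GEV support
            return -math.inf
        ll += -math.log(sigma) - (1.0 + 1.0 / xi) * math.log(s) - s ** (-1.0 / xi)
    return ll


def norm_logpdf(x, mean, sd):
    return -0.5 * ((x - mean) / sd) ** 2 - math.log(sd * math.sqrt(2.0 * math.pi))


def log_posterior(theta, data):
    """theta = (mu, log_sigma, xi); ASSUMED priors: N(0, 100^2) on mu and log_sigma,
    N(0.15, variance 0.2) on xi."""
    mu, log_sigma, xi = theta
    log_prior = (norm_logpdf(mu, 0.0, 100.0) + norm_logpdf(log_sigma, 0.0, 100.0)
                 + norm_logpdf(xi, 0.15, math.sqrt(0.2)))
    return log_prior + gev_loglik(data, mu, math.exp(log_sigma), xi)


def metropolis(data, theta0, psd, n_iter, seed=1):
    """Component-wise random-walk Metropolis; returns (chain, acceptance rate)."""
    rng = random.Random(seed)
    theta = list(theta0)
    current = log_posterior(theta, data)
    chain, accepted = [], 0
    for _ in range(n_iter):
        for j in range(len(theta)):
            proposal = theta.copy()
            proposal[j] += rng.gauss(0.0, psd[j])
            candidate = log_posterior(proposal, data)
            # Accept with probability min(1, exp(candidate - current)).
            if math.log(1.0 - rng.random()) < candidate - current:
                theta, current = proposal, candidate
                accepted += 1
        chain.append((theta[0], math.exp(theta[1]), theta[2]))  # store sigma, not log
    return chain, accepted / (n_iter * len(theta))


# Synthetic annual maxima (NOT the observed record), drawn by inverting the GEV cdf.
gen = random.Random(42)
data = [68.0 + 17.0 / 0.05 * ((-math.log(gen.uniform(1e-12, 1 - 1e-12))) ** -0.05 - 1.0)
        for _ in range(53)]
chain, acc = metropolis(data, (68.25, math.log(16.93), 0.039), (6.191, 0.230, 0.216), 4000)
print(f"acceptance rate: {acc:.2f}")
```

Updating one component at a time lets each proposal standard deviation be tuned separately, which is the role of the pilot runs mentioned in the text.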
The sample means and standard deviations of each marginal component of the chain were:
whereas the 95% credible intervals were [63.10, 73.52], [14.32, 22.12] and [−0.11, 0.22] for $\mu$, $\sigma$ and $\xi$, respectively.
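Summaries of this kind (posterior means, standard deviations and equal-tailed 95% credible intervals after discarding the burn-in) can be computed from any stored chain along the following lines. The sketch assumes one list of draws per parameter; `summarize` is a hypothetical helper, and the draws in the usage example are synthetic standard-normal values used only to show the expected behaviour (an interval close to ±1.96).

```python
import random
from statistics import mean, quantiles, stdev


def summarize(draws, burn_in=0):
    """Mean, standard deviation and equal-tailed 95% interval of draws after burn-in."""
    kept = draws[burn_in:]
    cuts = quantiles(kept, n=40)  # 39 cut points at 2.5%, 5%, ..., 97.5%
    return {"mean": mean(kept), "sd": stdev(kept), "ci95": (cuts[0], cuts[-1])}


rng = random.Random(7)
fake_chain = [rng.gauss(0.0, 1.0) for _ in range(20000)]
summary = summarize(fake_chain, burn_in=10)
print(summary)
```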
Transforming the sequence of simulated $(\mu_i, \sigma_i, \xi_i)$ values gave a sample from the posterior distribution of the 100-year return level (Figure 4). This gave an estimate of $\hat{z}_{100} = 161.8$ mm, with a 95% credible interval of [131.6, 219.1] mm. The posterior return level plot in Figure 5 shows that the upper 95% bound lies farther from the median than the lower bound. This is due to the heavier upper tail of the posterior distribution (Figure 4), which results from the prior on $\xi$ being centred on a positive value. The results for the Dar Es Salaam (TZ) data are summarized in Table 2.
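Each posterior draw $(\mu_i, \sigma_i, \xi_i)$ is mapped through the return level formula, and the resulting values form a sample from the posterior of $z_{100}$. In the sketch below the draws are NOT the MCMC output: as a stand-in, they are generated from independent normal distributions centred on the MLEs with the reported standard errors. This assumption is made only to show how the convexity of $z_p$ in $\xi$ pushes the upper bound farther from the median than the lower bound. Function names are illustrative.

```python
import math
import random
from statistics import median, quantiles


def return_level(mu, sigma, xi, period=100):
    """GEV return level for the given return period (Gumbel limit near xi = 0)."""
    y = -math.log1p(-1.0 / period)
    if abs(xi) < 1e-6:
        return mu - sigma * math.log(y)
    return mu - sigma / xi * (1.0 - y ** (-xi))


# Stand-in "posterior" draws (assumption): normal around the MLEs with the reported SEs.
rng = random.Random(3)
draws = [(rng.gauss(68.25, 2.55), rng.gauss(16.93, 1.83), rng.gauss(0.039, 0.083))
         for _ in range(20000)]
z100 = [return_level(mu, sigma, xi) for mu, sigma, xi in draws]
cuts = quantiles(z100, n=40)
lower, med, upper = cuts[0], median(z100), cuts[-1]
print(f"median {med:.1f} mm, 95% interval [{lower:.1f}, {upper:.1f}] mm")
```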
Both methods estimated $\mu$ at about 68 mm; however, the MLE of the scale parameter $\sigma$ was lower than the Bayesian estimate. In terms of interval width, the shape parameter was estimated more precisely by the Bayesian method.
The Bayesian estimates were relatively insensitive to the prior distributions, as shown by the similar parameter and quantile estimates. The computational effort of the Bayesian approach in R (evdbayes package) was small, requiring little extra processing time compared with the MLE. The only costs, in terms of time, were setting up the priors and ensuring that the MCMC chain had the desired properties. The inclusion of genuine prior information was a compelling argument in favour of Bayesian inference. Together with the limited amount of historical data available for Dar Es Salaam (TZ), this provided strong grounds for preferring the Bayesian analysis to the MLE.
In Figure 6, the autocorrelations of all three parameters decay rapidly, becoming small within about 5 lags, which indicates good mixing.
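The quantity plotted in Figure 6 is the sample autocorrelation of each parameter's chain at increasing lags. A minimal sketch follows; the AR(1) test series with coefficient 0.5 is synthetic and hypothetical, chosen because its theoretical autocorrelation at lag $k$ is $0.5^k$, so decay to near zero within about five lags is what good mixing looks like.

```python
import random


def autocorrelation(x, max_lag):
    """Sample autocorrelation of the sequence x at lags 0..max_lag."""
    n = len(x)
    m = sum(x) / n
    dev = [v - m for v in x]
    c0 = sum(d * d for d in dev) / n
    return [sum(dev[t] * dev[t + k] for t in range(n - k)) / n / c0
            for k in range(max_lag + 1)]


# Synthetic AR(1) chain x_t = 0.5 x_{t-1} + e_t (illustrative, not the paper's chain).
rng = random.Random(11)
x, series = 0.0, []
for _ in range(20000):
    x = 0.5 * x + rng.gauss(0.0, 1.0)
    series.append(x)
acf = autocorrelation(series, max_lag=10)
print([round(r, 3) for r in acf[:6]])
```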
Table 3 shows the results of three convergence diagnostics, namely the Gelman–Rubin, Geweke and Raftery–Lewis methods, as reported in [34]. The Gelman–Rubin diagnostic is equal to 1.000 for all of $\mu$, $\sigma$ and $\xi$. Since values close to 1 indicate convergence, the chains can be accepted: the draws come from the stationary distribution of the parameters, as depicted in Figure 7. In Table 3, Geweke's test statistics are 0.4307, 0.6353 and 0.9895 for $\mu$, $\sigma$ and $\xi$, respectively. These z-scores are all below 1.96 in absolute value, so the chain is again acceptable, as shown in Figure 8. The last quantitative diagnostic is the Raftery–Lewis method. In Table 3, the dependence factors $I$ are 4.320, 3.860 and 3.85 for $\mu$, $\sigma$ and $\xi$, respectively. In this method, dependence factors greater than 5 indicate strong autocorrelation between successive draws and hence poor mixing; since all three values are below 5, the chain mixes well.
After the stationary analysis of the historical data, possible non-stationarity was investigated by fitting a GEV distribution with a linear trend in the location parameter.
The GEV log-likelihood assumes that the data are observed values of independent random variables $X_1, \ldots, X_n$, with $X_i \sim \mathrm{GEV}(\mu, \sigma, \xi)$ for each $i = 1, \ldots, n$. This assumption can be extended to $X_i \sim \mathrm{GEV}(\mu_i, \sigma, \xi)$, where $\mu_i = \mu_0 + \mu_{\mathrm{trend}} t_i$. The parameters $(\mu_0, \mu_{\mathrm{trend}})$ are estimated, while the covariate vector $t = (t_1, \ldots, t_n)$ is specified by the user.
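Because only the location parameter varies, the non-stationary log-likelihood is the stationary GEV log-likelihood evaluated with $\mu_i = \mu_0 + \mu_{\mathrm{trend}} t_i$ for each observation. The sketch below writes it as a negative log-likelihood, the quantity that maximum likelihood fitting routines minimise numerically; the function name and the synthetic data are hypothetical. The usage example checks a property that follows directly from the model: adding a linear trend $c\,t_i$ to the data and to $\mu_{\mathrm{trend}}$ leaves the likelihood unchanged.

```python
import math
import random


def gev_nll_trend(data, t, mu0, mu_trend, sigma, xi, eps=1e-6):
    """Negative log-likelihood of X_i ~ GEV(mu0 + mu_trend * t_i, sigma, xi)."""
    if sigma <= 0:
        return math.inf
    nll = 0.0
    for x, ti in zip(data, t):
        z = (x - (mu0 + mu_trend * ti)) / sigma
        if abs(xi) < eps:  # Gumbel limit
            nll += math.log(sigma) + z + math.exp(-z)
            continue
        s = 1.0 + xi * z
        if s <= 0:
            return math.inf  # outside the support: zero likelihood
        nll += math.log(sigma) + (1.0 + 1.0 / xi) * math.log(s) + s ** (-1.0 / xi)
    return nll


# Hypothetical example: 53 synthetic maxima with covariate t_i = 0, 1, ..., 52.
rng = random.Random(9)
t = list(range(53))
flat = [60.0 - 16.0 * math.log(-math.log(rng.uniform(1e-12, 1 - 1e-12))) for _ in t]
trended = [x + 0.25 * ti for x, ti in zip(flat, t)]
nll_flat = gev_nll_trend(flat, t, 61.34, 0.0, 16.22, 0.071)
nll_trended = gev_nll_trend(trended, t, 61.34, 0.25, 16.22, 0.071)
print(round(nll_flat, 4), round(nll_trended, 4))
```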
In this case study, the MLE fit for the location parameter was $\hat{\mu}_i = 61.34 + 0.25\, t_i$ (where $t_i = 0, 1, 2, \ldots, 52$ years), with standard errors of 4.92 and 0.15 for $\mu_0$ and $\mu_{\mathrm{trend}}$, respectively (Figure 9). The $\hat{\sigma}$ and $\hat{\xi}$ estimates were 16.22 and 0.071, with standard errors of 1.81 and 0.091, respectively. As shown in Figure 9, this again gave a satisfactory fit. The trend observed in the fitted GEV model can be considered a relevant result and suggests that further analysis should be performed on extended time series with climate models. The stationary and non-stationary models can be compared analytically with the likelihood-ratio test (extRemes package [28]). Here, the likelihood-ratio statistic was about 2.4897, lower than the 95% quantile of the $\chi^2_1$ distribution (3.8415), suggesting that the model with covariate $t_i$ did not provide a significant improvement over the model without a covariate. This conclusion is also supported by the p-value of 0.114.
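The likelihood-ratio test compares the maximised log-likelihoods $\ell_0$ (stationary) and $\ell_1$ (with trend) through $D = 2(\ell_1 - \ell_0)$, which under the null hypothesis is approximately $\chi^2_1$ distributed because the two models differ by one parameter. For one degree of freedom, $P(\chi^2_1 > D) = \operatorname{erfc}(\sqrt{D/2})$, so the critical value and p-values can be checked with the Python standard library alone; the helper names are illustrative. Applied to the statistics reported in the text, the sketch reproduces the critical value 3.8415 and the p-values 0.114 and 0.000258.

```python
import math
from statistics import NormalDist


def chi2_1_sf(d):
    """Upper-tail probability P(X > d) for X ~ chi-square with 1 degree of freedom."""
    return math.erfc(math.sqrt(d / 2.0))


def lr_test(loglik_stationary, loglik_trend):
    """Deviance statistic D = 2 (l1 - l0) and its chi-square(1) p-value."""
    d = 2.0 * (loglik_trend - loglik_stationary)
    return d, chi2_1_sf(d)


critical = NormalDist().inv_cdf(0.975) ** 2  # 95% quantile of chi-square(1)
for d in (2.4897, 13.3543):  # statistics reported for the two datasets
    print(f"D = {d}: significant = {d > critical}, p = {chi2_1_sf(d):.6f}")
print(f"critical value = {critical:.4f}")
```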
3.1.2. Historical Data with CMCC Simulation Data
In this section, the analysis is extended by integrating the rainfall observations recorded at Dar Es Salaam (TZ) Airport with simulated data up to the year 2050, derived from the climate projections produced by the Euro-Mediterranean Centre for Climate Change (CMCC) for the IPCC RCP8.5 scenario with the COSMO CLM model. The data were downscaled to a spatial resolution of 1 km. The total dataset thus comprised 93 annual maximum rainfall events: 53 observed and 40 simulated (Figure 10).
The same analysis as for the historical data was performed, and the results are summarized in Table 4. For the Bayesian analysis, the MLEs were used as the initial vector $\theta_0 = (\hat{\mu}, \hat{\sigma}, \hat{\xi}) = (59.62, 20.96, -0.00645)$, with proposal standard deviations psd = (5.931, 0.193, 0.194). An MCMC chain of length 100,000 with good mixing properties was generated. Graphical inspection of the chain and the Geweke diagnostic showed that a burn-in period of very few iterations (about 50) was satisfactory. As in the previous analysis, after the stationary fit, possible non-stationarity of the data was investigated with a GEV model having a linear trend in the location parameter. In this case, the MLE fit for the location parameter was $\hat{\mu}_i = 72.46 - 0.273\, t_i$ (where $t_i = 0, 1, 2, \ldots, 92$ years), with standard errors of 3.88 and 0.0683 for $\mu_0$ and $\mu_{\mathrm{trend}}$, respectively. The $\hat{\sigma}$ and $\hat{\xi}$ estimates were 18.83 and 0.0542, with standard errors of 1.60 and 0.074, respectively. The likelihood-ratio statistic was about 13.3543, greater than the 95% quantile of the $\chi^2_1$ distribution (3.8415). This suggests that the model with covariate $t_i$ is a significant improvement over the model without a covariate, with a small p-value of 0.000258.
The posterior return level plot in Figure 11 again shows that the upper 95% bound lies farther from the median than the lower one.
A naive Bayesian analysis was then performed, using near-flat priors that reflect the absence of external information. The prior on $\mu_{\mathrm{trend}}$ was a non-informative normal distribution with a standard deviation of 100, as for the $\mu$ and $\sigma$ parameters. Using the MLEs as the initial vector $\theta_0 = (\hat{\mu}_0, \hat{\sigma}, \hat{\xi}, \hat{\mu}_{\mathrm{trend}}) = (72.46, 18.83, 0.0542, -0.27)$ and the proposal standard deviations psd = (5.679, 0.202, 0.187, 0.095), an MCMC chain of length 100,000 with good mixing properties was generated. As before, the proposal standard deviations were determined by pilot runs. Graphical inspection of the chain (Figure 12) and the Geweke diagnostic plot produced with the coda package in R [35] (Figure 13) indicated a burn-in period of only 100 iterations.
The sample means and standard deviations of each marginal component of the chain were:
whereas the 95% credible intervals were [64.14, 79.58], [16.42, 23.15], [−0.0792, 0.211] and [−0.405, −0.129] for $\mu_0$, $\sigma$, $\xi$ and $\mu_{\mathrm{trend}}$, respectively.
In this case, a single 100-year return level $\hat{z}_{100}$ cannot be estimated, because the location parameter varies linearly with time. Figure 14 therefore plots three return levels against time, from 1958 to 2050, as obtained from the Bayesian analysis, together with their 95% credible intervals (dashed and dotted lines).
Figure 14 shows that the linear variation of the location parameter over time (although it improves the fit) leads to only modest changes in the return level estimates. For example, the 100-year return level was 175 mm in 1958, with a 95% credible interval of [127, 259] mm, and 150 mm in 2050, with a 95% credible interval of [90, 248] mm: a reduction of only 25 mm. Table 5 summarizes the full set of results of the non-stationary analysis.
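With $\mu_t = \mu_0 + \mu_{\mathrm{trend}}\, t$ and constant $\sigma$ and $\xi$, the $T$-year return level in year $t$ is $z_p(t) = \mu_t - \frac{\sigma}{\xi}\left[1 - y_p^{-\xi}\right]$, so for any fixed parameter vector the change between two years is simply $\mu_{\mathrm{trend}}\,\Delta t$. The sketch below evaluates this at the MLEs of the combined 1958–2050 fit. These are point estimates only; the values in the text are Bayesian posterior summaries, so the levels themselves differ. Over $\Delta t = 92$ years the shift is $-0.273 \times 92 \approx -25.1$ mm, consistent with the reported 25 mm reduction. The function name is illustrative.

```python
import math


def return_level_at(t, mu0, mu_trend, sigma, xi, period=100):
    """T-year return level in year index t for a GEV with a linear location trend."""
    y = -math.log1p(-1.0 / period)
    mu_t = mu0 + mu_trend * t
    if abs(xi) < 1e-6:
        return mu_t - sigma * math.log(y)  # Gumbel limit
    return mu_t - sigma / xi * (1.0 - y ** (-xi))


mle = dict(mu0=72.46, mu_trend=-0.273, sigma=18.83, xi=0.0542)  # from the text
z_1958 = return_level_at(0, **mle)   # t = 0 corresponds to 1958
z_2050 = return_level_at(92, **mle)  # t = 92 corresponds to 2050
print(round(z_2050 - z_1958, 1))  # -25.1
```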
For both methods, the $\mu_0$ parameter was estimated at approximately 72 mm. However, maximum likelihood returned a lower estimate of the scale parameter $\sigma$ than the Bayesian method. The shape parameter was estimated more precisely by the Bayesian method, whereas the $\mu_{\mathrm{trend}}$ parameter was estimated with similar precision by both approaches.