Jointly Modeling Drought Characteristics with Smoothed Regionalized SPI Series for a Small Island

: The paper refers to a study on droughts in a small Portuguese Atlantic island, namely Madeira. The study aimed at addressing the problem of dependent drought events and at developing a copula-based bivariate cumulative distribution function for coupling drought duration and magnitude. The droughts were identiﬁed based on the Standardized Precipitation Index (SPI) computed at three and six-month timescales at 41 rain gauges distributed over the island and with rainfall data from January 1937 to December 2016. To remove the spurious and short duration-dependent droughts a moving average ﬁlter (MA) was used. The run theory was applied to the smoothed SPI series to extract the drought duration, magnitude, and interarrival time for each drought category. The smoothed series were also used to identify homogeneous regions based on principal components analysis (PCA). The study showed that MA is necessary for an improved probabilistic interpretation of drought analysis in Madeira. It also showed that despite the small area of the island, three distinct regions with different drought temporal patterns can be identiﬁed. The copulas approach proved that the return period of droughts events can differ signiﬁcantly depending on the way the relationship between drought duration and magnitude is accounted for.


Introduction
Droughts, perceived as prolonged and regionally extensive occurrences of below average natural water availability, are among the most destructive hazards and can arise virtually everywhere on the planet [1]. In island environments, where freshwater is often a limiting factor and people strongly rely on precipitation to refill the surface and underground water reservoirs and to support activities, such as rain-fed agriculture, droughts have frequently led to water insecurity-ranging from chronic water scarcity, lack of access to safe drinking water and sanitation services, to hydrological uncertainty [2][3][4]. As stated by the Intergovernmental Panel on Climate Change, IPCC, in its periodical assessment reports [5][6][7], compared to continental areas, islands are specifically more vulnerable to natural hazards due to their lower adaptive capacity, and are more often affected by extreme hydrological events (e.g., floods and droughts) and climate change, especially the so-called small islands with areas between 100 km 2 and 5000 km 2 [8]. Although small islands are not a homogeneous group, they share many common features that distinguish them from larger islands [9], which make more challenging their adaptation to the projected climate change risks-such as the increase in the probability of drought and rainfall deficits [10].
The pronounced hydrological temporal but also spatial variability in some of the small islands makes drought complex to analyze and simultaneously a poorly understood extreme hydrological events (e.g., compared to floods) [1]. Examples are Madeira with a very pronounced wet season and with notable differences in rainfall between northern and southern slopes; the nearby Porto Santo Island, with some signs of aridity and relatively low rainfall concentrated in a few days [11]; the Azores archipelago with very wet high regions and drier coastal areas [12]; and the Canary Islands, located in a dry belt with very low rainfall near the coast, especially in the flat islands [8]. These particular features have contributed to the absence of comprehensive drought assessment methods for small islands.
From a hydrological perspective, droughts are mainly characterized into three major types, with their own specific spatiotemporal characteristics [13], according to their duration and type of freshwater reservoir they affect: meteorological, agricultural, and hydrological droughts [14][15][16]. Meteorological droughts are characterized by a prolonged deficit of rainfall from its long-term average. Triggered by longer rainfall deficits, agricultural droughts are characterized by reduced soil moisture. Hydrological droughts are related to the impacts of persistent shortage of rainfall on lakes and reservoirs, rivers, surface water, and groundwater. Droughts can develop from over short periods (a few months) to longer periods (seasons, years, or even decades) [15,17].
The monitoring of drought employs widely used drought indices, such as the Palmer Drought Severity Index (PDSI) [18,19], the Standardized Precipitation Evapotranspiration Index (SPEI) [20,21], or the Standardized Precipitation Index (SPI) [22]. The PDSI uses precipitation and temperature data in a water balance model to compare meteorological and hydrological droughts across space and time, the SPEI considers precipitation and potential evapotranspiration, whereas the SPI only uses precipitation as state variable. Different authors have recommended that droughts should be studied within a regional context [17], because the results of individual case studies may not be comparable [23]. To make the drought analyses results comparable, regardless of the studied region, the drought indexes should be standardized [14] which is precisely one of the most important characteristics of the Standardized Precipitation Index. The SPI is likely to be the most frequently used drought indicator worldwide, because it is applicable in all climate regimes [14,[24][25][26].
At given location, the SPI quantifies the observed rainfall as a standardized departure from a selected probability distribution function that models the raw rainfall data for the timescale of interest, from 1 to 48-month or longer (1, 3, 6, 9, and 12-month are the most common timescales [27]). The rainfall data are fitted to a probability distribution function, which is then transformed into a normal distribution so that the mean SPI for the location and desired timescale is zero [22]. Negative SPI values represent rainfall deficit, whereas positive SPI values indicate rainfall surplus.
By applying the run theory [28] to the SPI series at a given timescale, the following characteristics of the droughts can be determined ( Figure 1): drought duration (Dd), during which the SPI is continuously below an adopted critical level or threshold (u); magnitude (Dm) indicating the cumulative absolute deficit due to a drought event below the threshold; drought maximum intensity (Dmi) indicating the minimum value of the drought index below the critical level, and interarrival time (L), which is the time range between the initiation of two consequent drought events. The drought characterization in Madeira used the daily rainfalls, from January 1937 to December 2016 (80 years), at the 41 rain gauges of Figure 2 and Table 1. The records were provided by the Portuguese Institute for the Ocean and Atmosphere (IPMA), which has high data quality standards and is one of the main sources of Portuguese hydrometeorological data. The series had a few missing values that were filled in using a gap-filling procedure tested for Madeira, namely the Multiple Imputation by Chained Equations (MICE) [47,48]. The monthly and annual rainfall series were obtained from the complete daily series.

Standardized Precipitation Index (SPI) Calculation and Drought Recognition
As previously mentioned, the drought characterization used the SPI, developed by McKee et al. [22], and described in detail by Edwards and McKee [49]. Such index measures the rainfall anomalies at a given location, based on the comparison of the rainfall for a timescale of interest, with the average rainfall in the same period. The historic records are fitted to a probability distribution, which is then transformed into a normal distribution such that the mean SPI value for that location and timescale is zero. The Pearson Type III distribution with parameters given by the L-moments was applied to describe both the monthly rainfalls and the cumulative rainfalls at Madeira [26,47].
The drought events were assessed based on the SPI computed at 3 (SPI3) and 6 (SPI6) months for the period of 80 years (960 months of rainfall data) and for the 41 selected rain gauges. These two timescales are usually related to meteorological and agricultural droughts [50], and, the latter, also to early signs of streamflow shortfalls [27]. Such sustained shortfalls are very important for the management of water resources in Madeira due to its high dependence on rainfall for groundwater recharge-which is the main source of freshwater on the island [46,47]. The drought categories for SPI were those proposed by Agnew [51], presented in Table 2. According to this classification, there is a drought whenever the SPI falls below the threshold u < −0.84. Furthermore, the droughts were characterized based on run theory as depicted in Figure 1. It is noteworthy that drought duration (Dd) and magnitude (Dm) are necessarily mutually dependent, because the longer Dd is, the higher Dm should be.

Moving Average Filter (MA)
Relatively few studies have addressed the run theory applied to the recognition of the drought characteristics ( Figure 1) to cope with the possible dependence of the periods under drought conditions. Van Loon et al. [16], Fleig et al. [31], Tallaksen et al. [52] have applied moving average (MA), with different running lengths and at different SPI timescales, to smooth out short-term fluctuations and highlight possible longer drought events. Similar procedures have been carried out by López-Moreno et al. [30], although applied to drought analysis based on streamflow series, Zelenhasić and Salvai [53], Byzedi and Saghafian [54], Li et al. [55], Shin et al. [56]. However, criteria about the running length to adopt, also taking into account the different timescales of SPI, are missing.
The moving average (MA) is the most common filter in digital signal processing, mainly because it is the easiest to understand and use [33]. The MA is a convolution using a very simple filter kernel. This makes it the premier filter for encoded signals in the time domain in which samples are usually created by sampling at regular intervals of time containing information that is interpretable without reference to any other sample [57]. In the present paper, the MA operates by averaging several points from the input signal symmetrically chosen around each point in the output signal, according to: where x i is the input signal, y i is the output signal, and M is the number of points in the average (running length). For M = 1, the output coincides with the original or unsmoothed series. For the smoothing of SPI3 and SPI6 at the adopted 41 rain gauges, Equation (1) was applied aiming at reducing the random noise, i.e., the spurious short droughts and drought interruptions, while maintaining a sharp step response (e.g., Dm, Dmi). Different M were tested and assessed for each SPI timescale, namely for SPI3, 2 and 3, and for SPI6, 3 and 5. The SPI output series for M equal to 1, have obviously the same values as the input series and were also analyzed for both timescales (unsmoothed series). Symmetrical averaging usually requires that M be an odd number. However, for the case of SPI3, because only two values could be adopted for M, they were both tested, despite one being an even number. It is acknowledged that the higher M, the greater the noise reduction, but also the greater the possibility that the signal would be distorted by the smoothing operation. A key issue in SPI series smoothing is the choice of M. In the Madeira case study, the selection of its value for each timescale was assessed based on the regionalized SPI series derived from the unsmoothed and smoothed SPI series at the 41 rain for different values of M, as discussed in Section 4.1.

Regionalized SPI Series Based on Principal Components Analysis (PCA)
Principal components analysis is a statistical procedure that transforms several (possibly) correlated variables into a (smaller) number of uncorrelated variables called principal components (PC) [58]. When applied to the SPI values from a set of rain gauges, it allows their regrouping and consequently, the delimitation of climatic regions in relation to synoptic situations, i.e., a regionalized SPI series-such as the three distinct areas with coherent climatic variability identified by Bonaccorso et al. [59] in Sicily from 1926 to 1996, the three homogeneous regions adopted for the drought characterization in mainland Portugal (which is approximately 120 times the size of Madeira) Santos et al. [29], or the two climatic sub-regions in the western Iran by Raziei et al. [60], all of them based on the SPI field. In the case of Madeira, the PCA was applied to the unsmoothed and smoothed (with different M) SPI time series at the timescales of 3 and 6 months.
The PCA consists of computing the covariance matrix of the SPI series with the corresponding eigenvalues (λ) and eigenvectors (v). The main applications of the PCA, specifically the principal factor analysis (PFA), are: (1) to reduce the number of variables; (2) to detect structures in the relationship between variables; (3) to reduce the system's information entropy, i.e., the information not directly available about a system due to the uncertainty or randomness of data flow [61,62]; and (4) to combine correlated variables into factors [63]. As for Madeira, the variables are the SPI series at the rain gauges and the factors, the regionalized SPI. Henceforth, the term factor analysis will be used generically to encompass both PCA and PFA.
The determination of the number of components or factors to retain was based on the Kaiser's rule [63], according to which the factors whose eigenvalues are greater than 1 must be retained. The spatial patterns of the eigenvectors (factor loadings) represent the correlation between the original data and the corresponding factor time series. More localized patterns are obtained by applying the Varimax rotation technique to selected factor loadings [64]. The projection of the SPI fields onto the orthonormal eigenfunctions provides the factor score time series [65]. Factor scores are estimates of the actual values of individual cases (observations). For instance, the estimated factor score (regionalized SPI) on factor j for observation, or month, i, F ji , can be represented as follows: where w j is the regression weight, multidimensional value referred to as factor score coefficient; and z i is the variable, i.e., the SPI series at a single rain gauge. For any single common factor, an infinite number of sets of scores can be derived that would be consistent with the same factor loadings [66]. The factor scores are particularly useful to perform further regional analyses that have been identified in the factor analysis, such as fitting drought characteristics with copulas, as will be mentioned in the next subsections.

Univariate Analysis of Drought Duration and Magnitude; Selection of Probability Distribution Functions
The univariate analysis of drought duration, Dd, and magnitude, Dm, has not yet established a consensus on the marginal distribution to be used. Shiau [42] suggested the use of Exponential and Gamma distributions to model separately Dd and Dm, respectively. The advisable practice, however, is to test different families to achieve the best-fitted model for each region and drought characteristic [67,68]. Thus, in this study, Exponential, Gamma, Logistic, Log-Normal, Normal, and Weibull were inspected-based on the most relevant factor scores for the smoothed regionalized SPI3 and SPI6 series-to determine the most suitable distribution function for drought duration and magnitude values. The Akaike Information Criterion (AIC) was used as a goodness-of-fit test to define the best-fitted distribution and the parameters were estimated using the Maximum Likelihood Estimation (MLE) [69].

Bivariate Analysis of Drought Duration and Magnitude
Few researchers have been devoted to the multivariate modeling of extreme events due to the considerably more data requirements, sophisticated mathematical treatment and models required, and complex interpretation of the outputs. A bivariate probabilistic distribution is thus more common, provided it proves to be able to account for the relevant correlations under analysis, mostly because it is easier to apply. One of the drawbacks of using bivariate distributions is that the same family is needed for each marginal distribution (e.g., [70][71][72]). Several methods have been proposed to investigate the bivariate characteristic of droughts, such as the product of the conditional distribution of drought severity for a given drought duration and the marginal distribution of drought duration to construct the joint distribution of drought duration and magnitude used by Salas et al. [73] and González and Valdés [74], with complex mathematical derivation involved. Nevertheless, multivariate distributions using copulas, whose applications in hydrology have been increasing in recent years with several and different uses [67,[75][76][77][78][79][80], can overcome such issues.
Copulas, introduced by Sklar [81], are functions that can be used to separate the marginal distributions from the dependency structure of a given multivariate distribution. According to Nelsen [82], to model n random correlated variables (x 1 , ., x n ) is given by the copula function C(u 1 , u 2 , ..., u n ) according to Equation (3): where F k (x k ) = u k for k = 1, ..., n, with u k ∼ u(0, 1). The two major classes of copula families are the Meta-elliptic copulas and Archimedean copulas. Meta-elliptic copulas are directly obtained by inverting Sklar's Theorem [81]. Given a bivariate distribution function F with invertible margins F 1 and F 2 , a bivariate copula C(u 1 , u 2 ) for u 1 , u 2 ∈ [0, 1] is given by Equation (4). Meta-elliptic copulas are symmetric and hence lower and upper tail dependence coefficients are the same.
The Archimedean copulas are more flexible than Meta-elliptic and can present lower or upper tail dependence. They are defined by Equation (5): where ϕ is the generator function of the copula C and ϕ[0, 1] → [0, ∞] is a continuous and factually reducing function. The Meta-elliptic Gaussian and t-Student, and the Archimedean Clayton, Frank, and Gumbel, were tested to verify best fit. Table 3 presents the formulations of the candidate copula families. When the Archimedean copula is rotated 180 • it is called survival copula and can invert the predefined tail dependence to best fit the data. Table 3. Copula candidate family and mathematical formulation.

Copula Family Mathematical Formulation
Meta-elliptic Gaussian

Copula Parameters Estimation
The parameters for the families of the candidate copula were estimated considering the maximum likelihood MLE method, by choosing the Inference Functions from Margins (IFM) method [83]. The use of IFM method requires previous fitting of marginal distributions functions to transform its values into the (0, 1) interval.

Best-Fitted Copula
To compare the bivariate copula models from several families and choose the best-fitted model, the fit statistic AIC was used. First, all the candidate copulas, Gaussian, t-Student, Clayton, Frank, and Gumbel, are fitted using maximum likelihood estimation, MLE. Then the AIC is computed for all copula families and the one with the minimum AIC is chosen. According to Brechmann and Schepsmeier [84], for observations u i,j with i = 1, ..., N and j = 1, 2, the AIC of a bivariate copula family c with θ parameter(s) is defined by Equation (6): (6) where one parameter copulas have k = 1 and the two-parameter t-student has k = 2.
The two-parameter copula is penalized in the minimization of AIC value to reduce overfitting possibility due to parsimony principle.

Univariate Return Period
A common approach used in hydraulic and hydrological design is based on frequency analysis of the recurrence interval or return period (T) of a given hydrological event. Shiau and Shen [85] define the return period as the average elapsed time between occurrences of the event with a certain magnitude or greater. The highest the return period the more exceptional the event is. The univariate return period of droughts, based on the concept of stochastic processes, is derived as follows. The return period of drought duration (T Dd ) or magnitude (T Dm ), is described as function of the expected interarrival time E(L) and of the cumulative distribution functions (CDF) of the drought characteristic marginal distributions F Dd (d) or F Dm (m), as defined in Equations (7) and (8), where both return periods and E(L) are expressed in years [71,85,86]. The E(L) is calculated by adjusting a distribution function to interarrival time and deriving its mean value.

Bivariate Drought Return Periods
Due to the multivariate nature of droughts, Shiau [42] proposed a methodology that categorizes the return periods of bivariate distributed hydrological events as joint and conditional return periods. The joint drought duration and magnitude return periods can be defined in two cases: return period for Dd ≥ d or Dm ≥ m and return period for Dd ≥ d and Dm ≥ m, as described by Equations (9) and (10), respectively: where T Dd or Dm is the return period for Dd ≥ d or Dm ≥ m; T Dd & Dm is the return period for Dd ≥ d and Dm ≥ m.

Conditional Drought Return Periods
Similarly, the conditional return periods can be defined for two cases: the return period of drought duration given drought magnitude exceeding a certain threshold (T Dd|Dm≥m ), and the return period of drought magnitude given drought duration exceeding a certain threshold (T Dm|Dd≥d ). These conditional return periods are calculated by Equations (11) and (12): Detailed discussions on the relationships between univariate, bivariate, and conditional return periods can be found in Shiau [42] and have been applied worldwide [67][68][69][86][87][88].

Smoothed Regionalized SPI3 and SPI6
At each of the 41 rain gauges, the SPI series were computed based on 80 years (960 months) of rainfall data, as described in Section 3.1, and then smoothed by applying the MA technique, mentioned in Section 3.2. The SPI series have different lengths according to the timescale and running length: e.g., the length of the unsmoothed SPI3 is n = 960 − 2 = 958 while for M = 3 is n = 960 − 2 − 2 = 956; the unsmoothed SPI6 series has n = 960 − 5 = 955 elements, and the smoothed series with M of 5, n = 960 − 5 − 4 = 951 elements. The smoothed SPI series were organized into matrices, X, with different number of rows (cases or observations) but with the same number of columns (the 41 rain gauges, i.e., 41 variables), namely for SPI3, X 958×41 , X 957×41 , X 956×41 , for M equal to 1, 2 and 3, respectively, and for SPI6, X 955×41 , X 953×41 , X 951×41 , for M equal to 1, 3 and 5, also respectively.
The factor analysis was applied to the data set to transform the 41 variables into the same number of factors (F)-extracted by the PC method-by means of simple linear transformations. The first factors are expected to account for meaningful amounts of variance. By using the eigenvalues (λ), information about the number of F to retain and the contribution to the data variance of each F (individually and cumulative) can be extracted. The Kaiser's rule [63], with threshold of λ > 1.00, was used as the criterion to choose the number of F to retain (Section 3.3). According to the results of Table 4, this rule suggests a three-factor solution regardless the timescale of SPI and running length, with explained cumulative variance (Σ%Var) higher than 80.00% (Tables 4 and 5 for the unrotated and rotated cases, respectively).
By retaining the first three F, the cumulative %Var is in any case higher than 84.00% of the original data set. Up from the fourth F (F4), the individually explained variance is relatively meaningless. Based on the Kaiser rule, only the first three F were retained for a Varimax maximizing rotation (Varimax normalized) of the original variable space. The rotation aims at best fitting the SPI series to the axes that represent the factors, and redistributing the explained variance of each F in the most unequal way. By this way, the original problem of 41 dimensions, was transformed into a three-dimensional problem. The same number of F was retained in a previous study by Espinosa et al. [47] but with different factor retention criteria-based on the scree plot and on some clustering techniques [58].  Table 5. Factor loadings (correlation coefficients) from the factor analysis associated with the rotated solution for different timescales of SPI and running lengths (M). Correlations equal or higher than 0.60 are marked with an asterisk (*). Smaller factor loadings were assumed to be uncorrelated with their respective F. The rain gauges are sorted in terms of the proposed regionalization.  Factor loadings are part of the output from factor analysis. They explain the correlations between the observed variables and the underlying factors. According to the three-dimensional solution (after Varimax maximizing rotation), for the first factor (F1) and regardless of the timescale and M, there are relatively strong factor loadings (correlation coefficients of 0.56 or higher) on the SPI series at the 11 rain gauges located in the northern slope of Madeira and small loadings at the rain gauges located in the southern slope and in central region, as reported in Table 5.

Region Code
It is clear from Table 5 that the rain gauges with SPI series with the strongest association with the underlying latent variable factor 2 (F2), are the 14 located in the southern slope, with factor loadings of 0.60 or higher. The SPI series at the 16 rain gauges of the central highlands region are associated with factor 3 (F3)-factor loadings of 0.61 or higher. The foregoing suggests that individual factors are modeled by different individual SPI series (provided they are relevant). Please note that among comparable unrotated and rotated solutions, the %Var of each F changes but the Σ%Var remains constant (Tables 4 and 5). Factor loadings smaller than 0.60 were assumed to be uncorrelated with their respective F. Table 5 shows that the cumulative variance, Σ%Var, decreases as the running length increases, but this is just an effect of the decreasing length of the series. Figure 3 depicts the spatial distribution of the rotated factor loadings over Madeira, suggesting that the island encompasses three distinct regions characterized by different SPI temporal variability. The factor loading patterns for all the SPI timescales and running lengths are similar, delineating three regions that include all the rain gauges, nearly cover the whole study area and do not overlap, namely: the northern slope or RG1 (11 rain gauges), the southern slope or RG2 (14 rain gauges), and the central region or RG3 (16 rain gauges) (column Region of Table 1). Each map of the figure was obtained by cropping and joining the three regions with factor loadings approximately higher than 0.6. Each one of these regions was delimited by applying the inverse distance weighting (IDW) with an exponent of 2 [89] to the corresponding factor loadings. The regions are related to the regionalized SPI, i.e., to the factor scores of F1, F2, and F3-SPI-RG1, SPI-RG2 and SPI-RG3, respectively-as shown in Figure 4. The similar spatial ( Figure 3) and temporal ( Figure 4) patterns suggests that the application of MA did not influence the drought regionalization.  However, the factor loadings shown in Table 5 give rise to some considerations that support the choice of the M values. In any region and SPI timescale, when selecting a value for M it is important to ensure that the corresponding factor loadings for the rain gauges there located are the highest, i.e., for that M, those rain gauges should be highly positively correlated with the F of the region but uncorrelated, as much as possible, to the F of the other regions.
Due to the importance of the central region, RG3, on the availability and management of Madeira freshwater resources, special attention was given to the same when applying the previous criteria. Accordingly, based on the values of Table 5 For the central region, RG3, Figure 5 presents the temporal evolution for the period from 2000 to 2016, adopted as example, of the regionalized SPI (SPI-RG3) from the factor analysis based on the unsmoothed and smoothed series with different running lengths. The figure shows that the application of MA to the original SPI series prior to the factor analysis, eliminates spurious and short duration events for both regionalized SPI3 and SPI6, and that the value of the running length determines the smoothness of the factor score time series. Although the SPI series with M of 2, 3, and 5, are smoother than those computed based on the unsmoothed data, the general pattern for the different time series is the same.  Also, for RG3, Figure 7 enables a closer look at the drought characteristics. For both SPI timescales, it shows that for the unsmoothed SPI series there is a higher number of 1-month droughts; however with smaller magnitude which can make the drought analysis biased and, therefore, less representative. As the running length increases, the number of drought events decreases, while Dd and Dm increase. The smoothed regionalized series always allow the recognition of the more severe droughts with longer duration and higher magnitude. On the basis of these findings and of those from the factor analysis, the smoothed SPI series, whether at the rain gauges (non-regionalized) or regionalized based on M = 3, for SPI3, and on M = 5, for SPI6, have been selected in pursuing the study based on copulas.

Estimation of Drought Characteristics and Univariate Analysis
The understanding of the relationship among drought duration and magnitude is essential to better comprehend the phenomenon and its impacts. For this purpose, the Dd and Dm values were obtained for the 80-year period by applying the run theory (Section 3.1) to the smoothed SPI3 and SPI6 series (M = 3 and M = 5, respectively), as shown in Figure 8. The resulting descriptive statistics are presented in Table 6. The number of events that occurred in the different regions of Madeira from 1937-2016 ranged from 58 to 65, for SPI3, and from 29 to 35, for SPI6. Despite the smaller number of events for SPI6 comparatively to SPI3 (which resulted in larger interarrival times), the Dd and Dm values indicate worse drought conditions. The Kendall's τ correlation coefficient presents very high values (higher than 0.80) which are in accordance with dependence visible in the scatter plots of Figure 8. However, such coefficient measures only the overall strength of the association between Dd and Dm, but does not give information about how such association varies across the distribution [90]. The higher Kendall's τ for SPI6 denotes a stronger correlation between Dd and Dm. The mean values of Dd and Dm suggest, for SPI3, regions with similar characteristics, while for SPI6, the values for RG3 are higher. However, these findings are quite vague because they are based on a univariate appraisal which justifies a copula-based characterization of the dependency structure between Dd and Dm, presented in the next sections. Table 6. For each region, drought events statistics for u < −0.84 based on the regionalized SPI3 (with M = 3) and SPI6 (with M = 5). E(L) is the expected drought interarrival time. Kendall's τ is the correlation between Dd (in month) and Dm.

Estimation of Bivariate Joint Distributions
The bivariate copulas of Table 7 were used to determine the best-fitted copula for modeling bivariate joint distribution between drought duration and magnitude aiming at obtaining the corresponding joint return periods, which are essential for evaluating and predicting the regional drought risks. According to Sections 3.4 and 3.5, one copula function was selected for each homogeneous region and timescale. The estimated parameters and values of the goodness-of-fit test (AIC) for the best-fitted copula are listed in the same table. For SPI3, for example, the Survival Gumbel was chosen for all the three regions. On the other hand, for SPI6, a specific copula family was assigned to each region, meaning that the drought characteristics and their dependence are more distinct among regions. Table 7. Selected copula families, Kendall tau (τ), marginal distributions and parameters of the models. Par stands for the copula parameter and Par1 and Par2 for the marginal distribution parameters.  Figure 9 presents the copulas that best describe the observed dependence patterns between normalized values of Dd and Dm. The normalization was performed to allow direct comparisons among multivariate normal shapes and to bring out specific features, such as sharp corners indicating tail dependence [84]. Only SPI6 in region RG3 (SPI6-RG3) presented a copula without tail dependence, i.e., a Gaussian one. On the other hand, SPI6-RG2 presented the greatest lower tail dependence. This result indicates that with the increase in drought duration and magnitude values, the linear correlation between the variables tends to decrease. Although the copulas for SPI6-RG1 and SPI6-RG2 are of the same class (Archimedean copulas, Table 3), the dependence structure of the former is closer to SPI6-RG3 (Meta-elliptic copula). This result indicates the existence of a lower tail dependence for SPI6-RG1 less pronounced than the one for SPI6-RG2.

Regional Bivariate Return Period of Drought Events
Since various Dd and Dm combinations can result in the same return period, the joint return periods defined by Equations (9) and (10)  For each region and SPI timescale, Table 8 identifies the five drought events with the highest return periods according to the different approaches. Just for the sake of systematizing, the start dates of the droughts were identified based on the events to which correspond the return periods T Dd & Dm .
A numerical example shows that for SPI6-RG1, regardless the approach, the worst drought, with Dd = 27 months and Dm = 39.49, occurred in June 2002. According to the univariate approach (Equations (7) and (8)), the return periods are 167.5 and 99.7 years. As for the bivariate approach, the return periods T Dd or Dm and T Dd & Dm are 96.1 and 178.7 years (Equations (9) and (10)). Finally, the conditional return periods are T Dd|Dm ≥m = 7867.7 years and T Dm|Dd ≥d = 13, 218.2 years (Equations (11) and (12), respectively). The univariate return periods were computed based on the best-fitted distribution for each region and drought characteristic (Section 3.4), and are also identified in Table 7. Table 8 shows that for RG1 and RG2 the ranking of the exceptionality of the drought events for each SPI timescale seldom coincides. RG3 is an exception, with only the event beginning in September 2013, according to SPI3, out of order. This coherence among results may indicate a better performance of the models due to the more homogeneous behavior of the region. The table also stresses that the conditional return periods are always much higher because they are restricted to smaller sample spaces due to the conditional constraints (drought magnitude or duration exceeding certain thresholds). Figures 11 and 12 represent the conditional return periods although only for SPI6, in order to shorten the paper, but also because the results for SPI3 are similar. Please note that due to the adopted scale of the axes some of the more exceptional drought events of Table 8 are not represented.     Figure 13 summarizes the results from the univariate and bivariate analysis based on the representation of the return periods of all the drought events from 1937 to 2016. Although the return period is a discontinuous variable, in the figure the different points were connected just to improve its readability, thus allowing easy detection of the most exceptional drought events (i.e., with the highest return periods) in Madeira Island.
The results for SPI3 show that the period from January 1937 to December 1999 has been characterized by numerous drought events (represented by the black bullets) that were particularly intense in RG1 and RG2 in the earlier years. In RG1, the first and long-term drought event took place from September 1943 to July 1944 with T Dd & Dm = 327.9 years (Table 8) (not depicted in Figure 13 due to the vertical scale adopted for improving the legibility of the figure.). Approximately three years after its end (that is, in July 1947), two considerable events occurred, also spaced of three years (T Dd & Dm of 230.8 and 119.9 years), but this time in RG2. After the end of these events there were not very exceptional droughts in any region until November 2003. From this year on, several events with sustained high return periods took place, namely in RG3, the most important region in terms of freshwater sources. As for SPI6, RG1 discloses a performance opposite to that of SPI3, with more exceptional droughts in more recent years. The results for RG2 and RG3 are similar to those based on SPI3, with a higher concentration of important droughts in the past and more considerable events in recent times, respectively. These results are in line with the study about time-dependent occurrence rates of droughts also in Madeira by Espinosa et al. [47].

Discussion and Conclusions
This work presents a systematic analysis of the effect of the moving average filter on drought assessment based on the SPI series (SPI3 and SPI6) from 1936 to 2016, and of the jointly modeling of drought characteristics with bivariate copulas for Madeira. The factor loadings from the factor analysis applied to unsmoothed and smoothed SPI series identified three distinct regions with different temporal patterns of the droughts: northern slope (RG1), southern slope (RG2) and central region (RG3). RG1 denotes exceptional droughts both in earlier years and at present, RG2 suffered the worst droughts in the past, while RG3 has featured more exceptional drought events from 2000 onwards. Special attention was given to this last region due to its relevance for the island's water security, as main region for the recharge of the groundwater reservoirs.
Planning and management of water resources systems under drought conditions often require the estimation of return periods of the exceptional drought events [59,85]. However, droughts are defined by multiple characteristics, some of them, presumably highly correlated. Based on the regionalized SPI series, two drought characteristics, namely drought duration (Dd) and magnitude (Dm) were analyzed and bivariate copulas were implemented to construct their joint distributions aiming at estimating return periods. The drought maximum intensity (Dmi, Figure 1) was not considered to avoid possible redundant information, since it is already part of the Dm data, and also due to its poorly correlation with Dd as stated by Sharma [91], Dracup et al. [92] and Chen et al. [87].
The bivariate approach enabled the generation of the joint return periods return periods between Dd and Dm of Figure 10. The figure shows that the events that took place more recently, namely after January 2000 (red circles), present higher or even the highest return periods, meaning that the drought events they represent were more exceptional. This is particularly evident in RG3, regardless the SPI timescale, but also in RG1, in this case only for SPI6. Table 8 also stresses the critical situation of RG3. In fact, from the five more exceptional drought events, regardless the SPI timescale, four took place after 2000 (more specifically after 2006). The table also shows the poorer performance of the univariate approach: in fact, for a same region and SPI timescales, all the droughts with the same duration would have the same return period, regardless the drought magnitude, and vice versa.
Aiming at discussing the information gained with the application of factor analysis and copulas, the results presented in Figure 13 will be compared to those from a drought characterization considering the 41 original SPI series. For that purpose, for the entire island and for each of the homogeneous regions, the yearly areas affected by moderate, severe and extreme drought were computed, based on the non-regionalized and unsmoothed SPI for wettest months of the rainy season and for the entire season itself, SPI3 Nov−Jan and SPI6 Oct−Mar , respectively, as shown in Figures 14 and 15. The Thiessen polygon method [93] was applied to assign an areal influence (ATP from Table 1) to each rain gauge. The area attributed in each year to a specified drought category was given by the cumulative areas of the rain gauges with values of SPI within the limits of that category ( Table 2). The total areas thus achieved for each year were made dimensionless by division by the Madeira area or by the area assigned to each region, in this last case, computed based on the rain gauges located in the region. Figures 14 and 15 show that generally, the island and each of its three regions have been similarly affected in terms of percentage of the area under drought conditions. Most of the same driest spells occurred in the three regions-such as exceptionally extreme droughts of 1947, 1995, and 2011 [11]-but sometimes with different distribution of the areas assigned to the three drought categories. Figure 15 suggests that the distribution of the years with moderate, severe and extreme droughts is similar among regions for SPI3 Nov−Jan . However, in the case of SPI6 Oct−Mar , regardless the affected area, for RG1 and RG3 the concentration of years with severe and extreme droughts is higher in recent years (2000)(2001)(2002)(2003)(2004)(2005)(2006)(2007)(2008)(2009)(2010)(2011)(2012)(2013)(2014)(2015)(2016), whereas RG2 exhibits an opposite behavior, i.e., a higher concentration of years with severe and extreme events in the past. Even though Figure 15 refers to annual series and Figure 13 to continuous series there is a certain similarity between sub-periods with more severe and extreme droughts and with the highest return periods, denoting coherence among different characterizations.  Finally, the same copula approach was applied to the regionalized unsmoothed SPI. The results for T Dd & Dm based on SPI6-RG3 are presented in Figure 16 which also includes the ones from the smoothed series with a running length of M = 5. The figure confirms that smoothing the SPI series prior to factor analysis allows the elimination of the spurious drought events and/or their clustering, thus improving the visibility of the more exceptional droughts, as it happened in November 2010 and October 2012. It should be stressed that the return periods that result from the unsmoothed SPI series may be much higher than those from the smoothed series due to the higher temporal variability of the unsmoothed SPI series. The less steep response of the smoothed series compared to the one from the unsmoothed series may be because the MA filter has a good performance in the time domain (as mentioned in Section 3.2), but has a poor performance in the frequency domain [33]. Since in the present study, the response that describes how the information in the time domain (the SPI series) is being modified by the system is the important parameter and the frequency response is of little concern, this makes the MA filter applicable. In other fields of hydrology, such as in flood frequency analysis Halbert et al. [94], Archer et al. [95], the response in the frequency domain is all important, while the one in the time domain does not matter. Consequently, a frequency-domain filter may be more appropriate, e.g., the Fourier transforms [96]. Therefore, the selection of a digital filter should consider the features of the studied phenomenon. Advances were made in the study of drought analysis based on regionalized smoothed series including on the criteria to selected the running length, M, and on the consequences of different M values. The copula approach showed that the drought events may have completely different return periods, depending on how the relationship between Dd and Dm is accounted for. In any case, the univariate approach only provides part of the information, often underestimating the exceptionality of the events. The use of bivariate approaches, namely based on copulas, can easily overcome such constraint.