Parametric Assessment of Trend Test Power in a Changing Environment

In the context of climate and environmental change assessment, the use of probabilistic models in which the parameters of a given distribution may vary with time has reinforced the need for appropriate procedures to recognize the "statistical significance" of trends in data series arising from stochastic processes. This paper introduces a parametric methodology, which exploits a measure based on the Akaike Information Criterion (AICΔ) and a Rescaled version of the Generalized Extreme Value (GEV) distribution, in which a linear deterministic trend in the position parameter is accounted for. A Monte Carlo experiment was set up with the generation of nonstationary synthetic series characterized by different sample lengths and covering a wide range of the shape and scale parameters. The performances of statistical tests based on the parametric AICΔ and the non-parametric Mann-Kendall measures were evaluated and compared with reference to observed ranges of annual maxima of precipitation, peak flow, and wind speed. Results allow for a sensitivity analysis of the test power and show a strong dependence on the trend coefficient and on the L-Coefficient of Variation of the parent distribution, from the upper-bounded to the heavy-tailed special cases. An analysis of the sample variability of the position parameter is also presented, based on the same generation sets.


Introduction
The environment and climate change are recognized worldwide as an interdisciplinary area of investigation which reaches into different fields of research (e.g., [1,2]). It regards earth sciences such as climatology, atmospheric physics and chemistry, and hydrology. It is also an area of investigation in life sciences, including ecology and the biology of any living organism on earth, and has strong implications for the social, economic, and industrial sciences, considering new branches of emerging risks and cascading effects. Due to such interactions, multi-disciplinary approaches should be further developed, as proposed by Metzger et al. [3] and Cagle et al. [4], considering that, even in contiguous fields of environmental analyses and applications, the perspective and scale of investigations can be very different, ranging from global to regional, catchment-size, and at-site assessment of physical quantities and their observations [5,6].
This paper deals with trend assessment in extreme events of environmental variables and how it affects their frequency distribution. We focus on annual maxima of peak flow, rainfall, and wind speed, which trigger natural hazards such as floods, storms, sea waves, and storm surges. The same approach could also be adapted to describe other phenomena, such as heatwaves and droughts.
Our perspective is typical of the hydrological sciences and is strongly based on the analysis of time series of variables modeled as stochastic processes. Moreover, this is the approach adopted in the civil engineering practice, which is focused on providing sustainable solutions for living and developing human activity and security in terms of adaptation to environmental changes and societal preparedness for natural hazards.
Over the last few decades, increasing attention has been dedicated by scientists to statistical analyses of environmental random variables. Interest in this field is reasonably motivated by the need to find reliable approaches for detecting and understanding the nature of climate change. This has led to a vast body of studies on the detection of trends, change points, or shifts in time series, collecting a wide range of tests, each one with its own peculiarities and characteristics (e.g., [7]). A paramount example is the well-known Mann-Kendall (MK) test, which is probably the most used tool in environmental analysis for investigating the statistical significance of an observed trend and belongs to the vast category of Null-Hypothesis Statistical Tests (NHST). It should be remarked that while many studies are produced by exploiting NHST, a significant number of other papers warn of theoretical and practical drawbacks to their use (e.g., [8,9]). Some of these warnings deal with the misuse of statistical tests. Two types of error may occur during the application of these tests: the first one (called Type I error) is to reject the null hypothesis when it is true; the second one (Type II error) is to accept the null hypothesis when it is false. In practice, the probability of committing a Type I error can be estimated theoretically or numerically, thanks also to the increasing availability of tools (such as the R software) able to provide results in a few steps. It should be noted that the ease of applying these approaches and of managing their results has often led to misuse or misconceptions of their outputs. In this regard, while much attention has always been paid to the Type I error, much less care has been dedicated to evaluating the Type II error, which should be considered more important [10][11][12].
In this debate, the American Statistical Association (ASA) dedicated a journal special issue "Moving to a World Beyond 'p < 0.05'", to encourage researchers to be "more thoughtful when applying statistical significance tests" [13].
The need for giving an appropriate evaluation of the power of the MK test was highlighted, among others, by Totaro et al. [14]. They followed the approach proposed by Yue et al. [15] that produced a numerical evaluation of the MK test power for Generalized Extreme Value (GEV) distributed samples. Wang et al. [16] provided a practical relationship for a proper power assessment of specific time series.
In environmental applications in particular, the length of the observed series required to reach an acceptable power value could be very different according to the variable's parent distribution and the strength of the trend [10,14,15,16]. It could be much shorter for temperature and humidity time series than for runoff and precipitation time series, which usually show higher space-time variability.
Focusing attention on the hydrological field, a proper approach to the theory of stochastic processes is a conditio sine qua non for addressing the analysis correctly. In particular, the occurrence and magnitude of the hydrological variables can be considered as an outcome of a process that can be characterized by one or more probability distributions. As a consequence, a correct analysis should rely on a process of identification able to discern between stationary and nonstationary models [17,18]. From this point of view, the goal of this kind of analysis is shifted to the model selection framework in which nonstationarity is identified as a deterministic time-dependence in parameters of the parent distribution [19].
According to Yohe et al. [20], a sustainable approach to risk, hazard, and disaster risk management entails both disaster risk reduction and adaptation to climate change. One way to pursue these targets is by changing guidelines and procedures of design to account for the increasing magnitude of events due to climate change. The evaluation of frequency distributions for design purposes is typically treated with a parametric approach, through probabilistic models with parameters estimated from observed data (see [21]). Therefore, the parent distribution is exploited to find quantiles of the assigned probability of exceedance. The parametric procedure proposed here allows for the detection of trends to be used for design purposes accounting for nonstationary conditions. On the other hand, non-parametric approaches are very commonly adopted in the environmental analysis because they only exploit observed data and do not require the identification of a parent distribution and its parameters. A typical example is the MK test.
Following this path of investigation, Totaro et al. [14] showed that both parametric and non-parametric tests for trend assessment suffer from the following issues: (i) the results of the test are dependent on the parent distribution and (ii) for weak trends, i.e., for small values of the trend coefficient, the test power reduces to low and unacceptable values.
They carried out a sensitivity analysis on a set of parameters of the GEV distribution that was representative of annual maximum rainfall series. For a series of length L = 30, depending on the parent parameter values, trend coefficients up to 0.8 mm per year may lead to power values below the conventional threshold of 0.8. The situation improves for longer series; nevertheless, the problem persists also for L = 50: the power is still too low for trend values up to 0.5 mm/y for a Gumbel distributed parent, and up to even larger values for heavy-tailed distributions.
They conclude that, as already acknowledged in many other scientific fields such as the medical and social sciences [22,23], the evaluation of power should always be performed to assess the real physical meaning of the test outcomes. To such purpose, they proposed the use of a parametric approach, which allows for a simple power evaluation by numerical Monte Carlo experiments, applicable to any specific context.
The goals of this paper are (i) to further develop the use of parametric tests based on model selection criteria (the Akaike Information Criterion, AICΔ), described in Section 2, (ii) to provide a generalized methodology which exploits a Rescaled GEV distributed variable, and (iii) to extend the evaluations performed by Totaro et al. [14] to parameter ranges covering the variability of different physical variables suitable for climate change assessment. Such a development is of great practical interest because, in principle, it provides a general tool for estimating the power value and assessing the real statistical significance of trends in extreme values of any variable of environmental interest.
To this purpose, we investigated the range of variability of three different annual maximum series: precipitation, peak flow, and wind speed. In particular, the search was conducted focusing on world datasets found in the literature that follow the GEV distribution, exploiting parameter values as a function of the L-moment ratios [24] widely employed in regional frequency analysis.
The paper is structured as follows: in Section 2, the MK and AIC ∆ tests are illustrated and a Rescaled GEV distribution is derived; in Section 3, the explored world datasets and the range-values of L-moments ratios are described. Section 4 focuses on the power of the above-mentioned tests as a function of L-Coefficient of Variation (L-CV) and trend slope. Considerations about the sampling variability of the parent distribution position parameters are also reported. Section 5 summarizes the main conclusions of this work and focuses on the main implications of these findings in research perspectives.

Methodologies
In this section, three main topics are described. In Section 2.1, the parametric AICΔ test is proposed. In Section 2.2, the Rescaled GEV distribution is defined, specifying the relationships between dimensionless parameters and theoretical GEV L-moments. Finally, in Section 2.3, the sensitivity analysis and the procedure followed for the evaluation of test power are illustrated.

The Akaike Information Criterion Based Test
NHST are precious tools for identifying and quantifying particular properties of recorded data. Their correct use includes a conscious interpretation of their outputs for supporting deductions in the framework of climate change.
The non-parametric Mann-Kendall test [25,26] is widely diffused in environmental analysis, due to its simplicity in the evaluation of the statistical significance of a trend. Its structure and main properties are reported in Appendix A.
In this paper, we introduce a new parametric test based on the use of the Akaike Information Criterion (AIC; [27]) for model selection (e.g., [28][29][30]). AIC is probably the most used likelihood-based criterion for model selection; its success can be attributed to its simplicity and to its conceptual basis, the Kullback-Leibler distance measure. From a practical standpoint, let θst and θns denote the parameter sets characterizing the stationary and nonstationary probabilistic models, respectively, and let ℓ̂(·) be the maximized value of the log-likelihood function. AIC is then defined as

AIC = 2k − 2ℓ̂(θ)    (1)

where k is the number of parameters of the selected model. According to this criterion, the model with the lowest value of AIC should be the best one among the set of models examined. Following the idea of exploiting model selection criteria for detecting changes in time series analysis, Totaro et al. [14] proposed the use of AICR, defined as the ratio between the AIC values obtained, respectively, from the nonstationary and stationary candidate models. They showed that, through Monte Carlo numerical experiments, it is possible to evaluate the AICR statistical distribution and find the threshold value AICR,α corresponding to an assigned level of significance α. In this way, AICR can be used as a test variable whose power depends on the parent distribution shape and scale parameters.
In this paper, the same procedure is applied to AICΔ [31], defined as the difference

AICΔ = AICns − AICst    (2)

which assumes positive values if the stationary model is the preferred one and negative values in the opposite case. The AICΔ distribution can be numerically computed, in order to find a threshold value AICΔ,α for an α level of significance, by the following procedure:

1. Generate a large number of samples (e.g., 10,000) from an assigned stationary model, assumed as the null hypothesis of the test;
2. Compute AICΔ for all samples and determine its empirical probability distribution;
3. Evaluate the threshold value AICΔ,α corresponding to an assigned level of significance α (here set to 0.05), which will be used for accepting or rejecting the null hypothesis of stationarity of point 1.
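The three-step procedure above can be sketched in Python. The example below is a minimal illustration, not the implementation used in the paper: it assumes a Gumbel (EV1) null model with a linear trend in the position parameter of the nonstationary alternative, and uses far fewer replicates than the 10,000 suggested above; all parameter values are illustrative.

```python
import numpy as np
from scipy.stats import gumbel_r
from scipy.optimize import minimize

rng = np.random.default_rng(42)
alpha, n_sim, L = 0.05, 300, 50  # illustrative; the text suggests ~10,000 replicates

def nll_stationary(params, x):
    # negative log-likelihood of the stationary Gumbel model (k = 2 parameters)
    loc, scale = params
    return np.inf if scale <= 0 else -gumbel_r.logpdf(x, loc=loc, scale=scale).sum()

def nll_trend(params, x, t):
    # negative log-likelihood with a linear trend in position (k = 3 parameters)
    z0, z1, scale = params
    return np.inf if scale <= 0 else -gumbel_r.logpdf(x, loc=z0 + z1 * t, scale=scale).sum()

def aic_delta(x):
    t = np.arange(len(x))
    st = minimize(nll_stationary, [x.mean(), x.std()], args=(x,), method="Nelder-Mead")
    ns = minimize(nll_trend, [x.mean(), 0.0, x.std()], args=(x, t), method="Nelder-Mead")
    return (2 * 3 + 2 * ns.fun) - (2 * 2 + 2 * st.fun)  # AIC_ns - AIC_st

# Steps 1-2: empirical AIC_delta distribution under the stationary null hypothesis
sims = np.array([aic_delta(gumbel_r.rvs(loc=1.0, scale=0.3, size=L, random_state=rng))
                 for _ in range(n_sim)])
# Step 3: threshold for level alpha; stationarity is rejected below this value
threshold = np.quantile(sims, alpha)
print(round(threshold, 2))
```

Note that for nested models fitted by Maximum Likelihood, AICΔ equals 2 minus the likelihood-ratio statistic, so for large samples the threshold should fall roughly near 2 − χ²₁(0.95) ≈ −1.84.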

Rescaled GEV Distribution
The Generalized Extreme Value (GEV) distribution was introduced in the hydrological literature by Jenkinson [32]. Its success in practical applications is mainly due to its ability to contemplate the Gumbel (EV1), Fréchet (EV2), and Weibull (EV3) distributions as special cases. Assuming z is an annual maximum stationary process variable, with parameter set θ = [ζ, σ, ε], the GEV cumulative distribution function is expressed as

F(z; θ) = exp{−[1 + ε (z − ζ)/σ]^(−1/ε)}    (3)

where ζ, σ, and ε are, respectively, the position, scale, and shape parameters. Analytical positions require that σ be strictly positive and that 1 + ε (z − ζ)/σ > 0. Several methods are available for estimating GEV parameters in the stationary case, such as Maximum Likelihood, probability-weighted moments, or L-moments. The latter approach, introduced by Hosking [24], makes use of linear combinations of order statistics. For the aim of this paper, it is useful to report the theoretical expressions of the following L-moment values (first and second order) and ratios (L-CV, L-Skewness, L-Kurtosis) [33]:

λ1 = ζ + σ [Γ(1 − ε) − 1]/ε    (4)
λ2 = σ (2^ε − 1) Γ(1 − ε)/ε    (5)
τ = λ2/λ1    (L-CV)    (6)
τ3 = 2 (3^ε − 1)/(2^ε − 1) − 3    (L-Skewness)    (7)
τ4 = [5(4^ε − 1) − 10(3^ε − 1) + 6(2^ε − 1)]/(2^ε − 1)    (L-Kurtosis)    (8)

valid for ε < 1, with the Gumbel (ε → 0) limits λ1 = ζ + γσ and λ2 = σ ln 2, where γ is the Euler-Mascheroni constant. Equation (3) can be extended to the case of nonstationarity, simply assuming that its three parameters can be modeled as functions of time [19,34]:

θns(t) = [ζ(t), σ(t), ε(t)]    (9)

Whilst, from a theoretical perspective, the θns parameter set in Equation (9) could be modeled in a variety of ways, in this paper we adopt the simplest time-dependent behavior by considering σ and ε constant and letting only the position parameter follow a linear trend,

ζ(t) = ζ0 + ζ1 t    (10)

so that the nonstationary parameter set reduces to

θns = [ζ0, ζ1, σ, ε]    (11)

This requires only four parameters to be estimated, which can be done via the Maximum Likelihood method. Several R packages can be used for this purpose, among which are extRemes [35] and ismev.
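As a hedged sketch of this four-parameter Maximum Likelihood estimation outside the R packages cited above, the fit can also be written with scipy; note that scipy's genextreme uses the opposite sign convention for the shape parameter (its c equals −ε). This is an illustration under assumed parameter values, not the packages' implementation.

```python
import numpy as np
from scipy.stats import genextreme
from scipy.optimize import minimize

def fit_ns_gev(z):
    """ML estimate of [zeta0, zeta1, sigma, eps] for a GEV whose position
    parameter follows a linear trend zeta(t) = zeta0 + zeta1 * t."""
    t = np.arange(len(z))

    def nll(theta):
        z0, z1, sig, eps = theta
        if sig <= 0:
            return np.inf
        # scipy's shape parameter c is the negative of the paper's epsilon
        ll = genextreme.logpdf(z, c=-eps, loc=z0 + z1 * t, scale=sig)
        return -ll.sum() if np.all(np.isfinite(ll)) else np.inf

    start = [np.mean(z), 0.0, np.std(z), 0.1]  # simple starting point
    return minimize(nll, start, method="Nelder-Mead").x

# Synthetic nonstationary sample: zeta0 = 1, zeta1 = 0.01, sigma = 0.3, eps = -0.1
t = np.arange(70)
z = genextreme.rvs(c=0.1, loc=1.0 + 0.01 * t, scale=0.3,
                   random_state=np.random.default_rng(7))
z0, z1, sig, eps = fit_ns_gev(z)
print(np.round([z0, z1, sig, eps], 2))  # should lie near the generating values
```

The same negative log-likelihood can be handed to any optimizer; Nelder-Mead is used here only because it needs no gradients.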
The range of values of the four parameters is strongly linked to the observed time series, leading to a variety of values and cases depending on the hydrological variable taken into account (e.g., rainfall, flow peak, wind speed). In order to produce results of general interest, in this paper a dimensionless formulation of Equations (3)-(11) is proposed by introducing the rescaled random variable z' defined as

z' = z/ζ0    (12)

Following this transformation, the position and scale parameters are also rescaled by ζ0, while the shape parameter (which is dimensionless by definition) remains unchanged; hence we obtain

ζ1' = ζ1/ζ0,  σ' = σ/ζ0    (13)

where, hereafter, the rescaled trend coefficient ζ1' is simply denoted ζ1 for brevity. This provides the Rescaled GEV, written as

F(z'; θ) = exp{−[1 + ε (z' − 1 − ζ1 t)/σ']^(−1/ε)}    (14)

with θ = [ζ1, σ', ε], which is nonstationary if ζ1 ≠ 0 and stationary if ζ1 = 0. The first- and second-order L-moments are also rescaled to give

λ1'(t) = 1 + ζ1 t + σ' [Γ(1 − ε) − 1]/ε    (15)
λ2' = σ' (2^ε − 1) Γ(1 − ε)/ε    (16)
τ'(t) = λ2'/λ1'(t)    (17)

while the other L-moment ratios (L-Skewness and L-Kurtosis) remain unaffected. It is worth noting that both the mean and the L-CV values change in time according to Equations (15)-(17). In the following we will refer to the stationary (or initial) L-CV value, evaluated as

L-CV = λ2'/λ1'(0) = [σ' (2^ε − 1) Γ(1 − ε)/ε] / [1 + σ' (Γ(1 − ε) − 1)/ε]    (18)

With these statements, a generalization of the results provided by Totaro et al. [14] can be provided.
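The stationary L-CV relationship (Equation (18)) can be inverted to obtain the σ' value corresponding to a target L-CV and shape ε, which is how generation sets such as those of Table 4 can be built. Below is a minimal Python sketch under the paper's sign convention (ε > 0: heavy tail); the simulation check against a sample L-CV estimate is only illustrative.

```python
import numpy as np
from math import gamma
from scipy.stats import genextreme

def sigma_rescaled(lcv, eps):
    """sigma' of the Rescaled GEV (initial position 1) matching a target
    stationary L-CV; eps follows the paper's convention (eps > 0: heavy tail)."""
    if eps == 0.0:
        # Gumbel limit: lambda1 = 1 + gamma_E * sigma', lambda2 = sigma' * ln 2
        a, b = np.euler_gamma, np.log(2.0)
    else:
        a = (gamma(1.0 - eps) - 1.0) / eps
        b = (2.0 ** eps - 1.0) * gamma(1.0 - eps) / eps
    return lcv / (b - lcv * a)

def sample_lcv(x):
    """Sample L-CV = lambda2 / lambda1 from the usual L-moment estimators."""
    x = np.sort(x)
    n = len(x)
    b0 = x.mean()
    b1 = np.sum(np.arange(n) * x) / (n * (n - 1.0))
    return (2.0 * b1 - b0) / b0

eps, lcv = -0.5, 0.3                      # upper-bounded EV3 case
s = sigma_rescaled(lcv, eps)
# sanity check by simulation (scipy's shape parameter c is -eps)
z = genextreme.rvs(c=-eps, loc=1.0, scale=s, size=200_000, random_state=1)
print(round(s, 3), round(sample_lcv(z), 3))  # sample L-CV should be close to 0.3
```

Solving Equation (18) for σ' gives σ' = L-CV/(b − L-CV·a), with a and b as in the code above; this is the inversion the sketch implements.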

Power Evaluation and Sensitivity Analysis
The application of NHST is based on the definition of two important quantities: the level of significance and the power. The first one is defined as the probability of committing a Type I error, i.e., incorrectly rejecting the null hypothesis when it is true. Defining the level of significance is a required task and its use is widespread, sometimes also in comparison with observed p-values. On the other hand, it has to be remarked that equal or even greater importance should be given to the probability of a Type II error, i.e., the incorrect acceptance of the null hypothesis when it is false. Complementary to this is the power of the test, defined as the probability of correctly rejecting the null hypothesis when it is false. Compared to the huge number of studies dealing with statistical hypothesis testing of stationarity in the environmental sciences, only a few papers focus on the evaluation of power and its crucial role (e.g., [9,10]). Among these, Totaro et al. [14] highlighted the need to establish a conventional power value as a minimum for accepting test results and also suggested the use of a parametric approach to evaluate the test power accounting for the length of the data series and the parent parameter values. Thus, applying both the MK and AICΔ tests and assuming as a null hypothesis that data originate from a stationary model, the test power can be evaluated by simulating a certain number of samples N from a nonstationary model and counting the number of series Nr in which the null hypothesis is rejected:

power = Nr/N    (19)

The sensitivity analysis presented in this paper was carried out by repeating such a procedure for sets of N = 2000 series generated by a Rescaled GEV parent distribution, with parameter sets obtained by the combination of different values of ζ1, σ', ε, and different sample lengths L.
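The power estimate Nr/N can be reproduced with a short Monte Carlo sketch. The example below uses the MK test in its simplest form, i.e., the p-value of Kendall's tau between the time index and the series (scipy's kendalltau), and a Rescaled GEV parent with a linear trend in position; the parameter values are illustrative and N is kept smaller than the 2000 used in the paper.

```python
import numpy as np
from scipy.stats import genextreme, kendalltau

def mk_power(zeta1, sigma, eps, L=50, alpha=0.05, n_rep=500, seed=0):
    """Monte Carlo power of the MK test for a Rescaled GEV parent with a
    linear trend zeta1 in the position parameter (initial position = 1)."""
    rng = np.random.default_rng(seed)
    t = np.arange(L)
    rejections = 0
    for _ in range(n_rep):
        # scipy's shape parameter c is the negative of the paper's epsilon
        z = genextreme.rvs(c=-eps, loc=1.0 + zeta1 * t, scale=sigma, random_state=rng)
        _, p = kendalltau(t, z)     # MK test via Kendall's tau against time
        rejections += p < alpha
    return rejections / n_rep       # N_r / N

print(mk_power(zeta1=0.02, sigma=0.3, eps=0.0, L=50))   # strong trend: power near 1
print(mk_power(zeta1=0.0, sigma=0.3, eps=0.0, L=50))    # null case: close to alpha
```

Running the function over a grid of ζ1, σ', ε, and L values reproduces the kind of sensitivity analysis described above; the null case doubles as a numerical check that the Type I error rate stays near the nominal α.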

Literature Parameter Range of Extreme Rainfall, Flow Peak, and Wind Speed
Datasets of three hydrological variables (annual maxima of rainfall, flood, and wind) were analyzed for this study. The following sub-paragraphs report the main sample properties of each of these variables; the corresponding ranges of L-moment ratios found in the literature are summarized in Tables 1-3.

Literature Studies on Extreme Rainfall
The use of L-moments for regional frequency analysis is widely diffused in hydrological practice. Several applications of this approach can be found concerning the analysis of annual maximum daily rainfall [36][37][38][39]. Table 1 reports studies illustrating ranges of the main L-moment ratios. In particular, Koutsoyiannis and Papalexiou [40] collected a world dataset of annual maximum daily rainfall, providing reliable information about the state of the art of the distribution of their statistical characteristics.
It is worth noting that the L-CV has a relatively limited range of variation, which is between 0.13 and 0.44. The L-Skewness, instead, moves from negative (about −0.13) to positive values up to 0.5. An exception is represented by the Iran case study, where the upper limits are 0.6 (L-CV) and 0.75 (L-Skewness).

Literature Studies on Extreme Flow Peak
Flood frequency analysis is a widely used technique for evaluating the design values of peak flow. Its importance is mainly due to the strong impact that floods exercise on societies and assets. A wide literature is available on methods and approaches for the implementation of statistical analyses, especially in regional contexts. In this framework, numerous studies have been carried out (e.g., [47][48][49][50]). In Table 2, an overview of the L-moment ratios from the analyzed datasets is provided.

Literature Studies on Extreme Wind Speed
Jung and Schindler [59] provided a wide and complete review of the selection of wind speed distributions. Several studies have assessed the best-fitting distributions and methodologies for wind speed prediction [60,61]. Fawad et al. [62] highlighted the importance of extreme wind speed distribution in different fields, such as renewable energies, structural designs, and meteorology. The use of L-moments for extreme value analysis was applied by [63][64][65][66]. Hundecha et al. [67] exploited the application of the nonstationary GEV model for investigating the annual maxima of wind speed in Canada.
In Table 3, summary statistics for annual maximum wind speeds are reported.

Results and Discussion
The sensitivity analysis was carried out by evaluating the test power as described in Section 2.3 and repeating such a procedure for sets of N = 2000 series generated by a Rescaled GEV parent distribution with three different values of ε accounting for different GEV shapes: upper-bounded EV3 (ε = −0.5), EV1 (ε = 0), and heavy-tailed EV2 (ε = 0.5). Furthermore, according to the database analysis reported in Sections 3.1-3.3, we selected a range of L-CV from 0.1 to 0.8, sampled with a step of 0.1. We then exploited relationship (18) to evaluate the σ' values corresponding to the chosen set of ε and L-CV values; the resulting σ' values are reported in Table 4. We considered different values of the trend ζ1 ranging from −0.05 to 0.05 (±5% per year) with a step of 0.005, and three sample lengths L (30, 50, 70). Considering all combinations of ε, σ', ζ1, and L, we obtained 1512 different generation sets.

Power of AICΔ and MK Tests for Different Values of ε and L-CV
For each of the underlying distributions described above, Figures 1 and 2 show the dependence between the rescaled trend coefficient ζ1 and the power of the AICΔ and MK tests, according to the three defined values of the shape parameter ε (−0.5, 0, 0.5), sample sizes L (30, 50, 70), and L-CV values (from 0.1 to 0.8 with a step of 0.1). All curves obtained with trend values ζ1 ≠ 0 are generated by a nonstationary parent distribution. In general, AICΔ shows a higher power than MK, except for the upper-bounded GEV, where MK performs slightly better. Such differences are more evident for the heavy-tailed GEV and smaller sample lengths. Nevertheless, both show a strong dependence on ζ1, ε, L-CV, and L. The power, for all ε values, decreases with increasing L-CV and with decreasing ζ1 and L.
It is interesting to remark that, apart from the well-known dependence on trend value, sample size, and shape parameter, a strong influence is exerted by the L-CV values. As an example, the left column of Figure 1 shows results for the minimum considered sample length (L = 30): for all the ε values and the entire range of trend values considered, the AICΔ power is below 0.8 when L-CV is higher than 0.5. For further increases of L-CV, it collapses to values as small as 0.2. This situation gradually disappears when increasing the sample size up to 70. Faster growth in power performance is detected for the upper-bounded GEV (ε = −0.5).
A similar but even worse situation is found for the MK test. For L = 30 and all the values of ε, the test is characterized by power values lower than 0.8 if L-CV is higher than 0.3. Such an implication is remarkable because it affects the reliability of this test for all series characterized by high variability. As for AICΔ, there is an obvious improvement in the performance as the sample size increases. These results are in perfect agreement with the findings of Yue et al. [15], who analyzed the MK test power on synthetic series generated from a GEV parent distribution with different values of the trend, coefficient of variation, and sample size. They are also consistent with the results shown by Wang et al. [16] and extend the range of climatic variability explored by Totaro et al. [14].
The results shown above suggest that, for different combinations of ε, L-CV, and L, it is possible to evaluate the range of trend values ζ1 providing acceptable power values. For some cases, we had to extend the range of ζ1 beyond ±5% per year, up to values that allow an acceptable power value to be reached.
This analysis is reported with reference to absolute values of the trend because the power curves are all symmetric with respect to the vertical axis ζ1 = 0. Figure 3 reports the dependence on L-CV of the minimum absolute value of the trend coefficient ζ1 necessary for achieving the conventional power value of 0.8. This dependence is shown for different ε and L values, for both the AICΔ and MK tests. The role of L-CV is crucial for a proper interpretation of the results in this case too. Small L-CV values correspond to smaller |ζ1| ranges, with an increase of up to 300% when L-CV varies from 0.1 to 0.8. In principle, looking at Figure 3, it is possible to evaluate whether the chosen test is applicable for a series of known distribution and given length.

Sampling Variance of ζ0
A required task for applying the methodology suggested in this paper to real data is the evaluation of the parameters of the parent distribution [ζ0, ζ1, σ, ε]. Totaro et al. [14] already analyzed the sample variability and the empirical distribution, obtained via Maximum Likelihood, of the parameters ζ1, ε, and σ. Whilst this paper provides an extension of the range of considered values of σ, the results obtained here and the conclusions drawn are not different. Nevertheless, for the purpose of this paper, we focus on the sample variability of ζ0, which is crucial for obtaining the rescaled values ζ1 and σ'. Results are obtained via Maximum Likelihood (ML) estimation of the four dimensionless parameters [ζ0, ζ1, σ, ε] from the generations of the nonstationary GEV parent distribution performed in the Monte Carlo experiment. Hence, the generations are all characterized by ζ0 = 1 and different values of ζ1, σ', and ε. Only results for the shortest considered sample length (L = 30) are reported for brevity. Figure 4 shows that the empirical distributions of the ML estimate of ζ0 (ML-ζ0) are generally unbiased and independent of the trend value ζ1. The estimator efficiency decreases with increasing L-CV for all ε values. The sample variability of ζ0 can also be analyzed in Figure 5, where the standard deviation of the ML estimate of ζ0 is reported. The three subplots highlight that for the upper-bounded GEV and EV1 the sample variability of ζ0 is affected by L-CV but is independent of ζ1. Nevertheless, the results relative to the heavy-tailed GEV (ε = +0.5) show that for the highest L-CV value (0.8) the sample variability of ζ0 sharply increases for series characterized by negative trend values lower than −0.04 and positive trend values higher than about +0.045.
Such evidence suggests that the efficiency of the ML estimator remains the same as in the stationary case (ζ1 = 0) for a wide range of parameter values in the upper-bounded GEV and EV1, while for the heavy-tailed GEV, with high L-CV and trend values close to ±5% per year, estimators other than ML should be used.
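The sampling variability just discussed can be probed with a small Monte Carlo sketch: repeatedly generate nonstationary samples with ζ0 = 1, refit the four parameters by ML, and inspect the spread of ML-ζ0. This is an illustrative reconstruction, not the paper's exact experiment (fewer replicates, a single EV1 parameter set, a Nelder-Mead optimizer).

```python
import numpy as np
from scipy.stats import genextreme
from scipy.optimize import minimize

def fit_ns_gev(z):
    # ML fit of [zeta0, zeta1, sigma, eps]; scipy's shape parameter c equals -eps
    t = np.arange(len(z))

    def nll(theta):
        z0, z1, sig, eps = theta
        if sig <= 0:
            return np.inf
        ll = genextreme.logpdf(z, c=-eps, loc=z0 + z1 * t, scale=sig)
        return -ll.sum() if np.all(np.isfinite(ll)) else np.inf

    return minimize(nll, [z.mean(), 0.0, z.std(), 0.1], method="Nelder-Mead").x

rng = np.random.default_rng(3)
L, zeta1, sigma, eps = 30, 0.01, 0.3, 0.0   # illustrative EV1 generation set
t = np.arange(L)
est = np.array([fit_ns_gev(genextreme.rvs(c=-eps, loc=1.0 + zeta1 * t,
                                          scale=sigma, random_state=rng))[0]
                for _ in range(100)])
print(round(est.mean(), 2), round(est.std(), 2))  # mean near 1 indicates little bias
```

Repeating the loop over the grids of ζ1, σ', ε, and L used in the paper would yield the standard-deviation surfaces of Figure 5.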


Discussion
Salas and Obeysekera [19], exploiting the nonstationary GEV formulation provided by Coles [34], suggested that environmental and climate change may affect the probability distribution of environmental variables by shifting the position parameter of the probability function in time (and possibly also affecting other distribution features, such as the scale and shape parameters), so that formerly rare events occur with increasing frequency. They showed that the presence of a positive trend in the position and scale parameters of the parent distribution may heavily affect the design standard of a built structure, such as a levee (floods), a bridge (wind speed), or other structural defenses like coastal protection structures and harbor breakwaters.
As an example, depending on the values of the observed trends, a structure designed (under stationary conditions) for a 100-year return period could face an expected waiting time [70] to the next exceedance of fewer than 30 years (see also Du et al. [71]).
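This effect can be illustrated numerically. The sketch below, under illustrative GEV parameters and a hypothetical upward drift of the position parameter, computes the expected waiting time of the first exceedance of the stationary 100-year design quantile:

```python
import numpy as np
from scipy import stats

# stationary design: 100-year quantile of a GEV (illustrative parameters;
# SciPy's c > 0 is the upper-bounded case)
c, mu, sigma = 0.2, 10.0, 2.0
q100 = stats.genextreme.ppf(1 - 1 / 100, c, loc=mu, scale=sigma)

# nonstationary exceedance probabilities: position drifts upward in time
trend = 0.05  # hypothetical position shift per year
t = np.arange(1, 2001)
p = stats.genextreme.sf(q100, c, loc=mu + trend * t, scale=sigma)

# expected waiting time of the first exceedance:
# E[T] = sum_x x * p_x * prod_{t<x} (1 - p_t)
surv = np.cumprod(1 - p)
f = p * np.concatenate(([1.0], surv[:-1]))
ET = np.sum(t * f)
print(f"expected waiting time: {ET:.1f} years (100 under stationarity)")
```

Under stationarity the expected waiting time equals the return period (100 years); with the assumed drift it drops well below that value.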
In the environmental sciences and related applications, the detection of stationarity or nonstationarity for a selected time series is usually performed by using nonparametric tests available in the literature (e.g., Mann-Kendall or CUSUM test). Recent studies (e.g., Totaro et al. [14]) show that this kind of analysis may exploit well-developed model selection tools (e.g., the Akaike Information Criterion); this implies the use of parametric methods, and raises issues related to parameter estimation of a probability distribution, in stationary and nonstationary conditions. In particular, this paper confirms results obtained by Totaro et al. [14] and extends their analysis to wider ranges of the GEV shape and scale parameters, depending on L-skewness and L-CV values typical of physical variables of interest for climate change assessment.
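The parametric approach can be sketched as follows: fit a stationary GEV and a GEV with a linear trend in the position parameter by Maximum Likelihood, then compare their AIC values. This is a minimal illustration with hypothetical parameter values; the plain AIC difference computed here is only a simple proxy for the paper's AIC ∆ measure.

```python
import numpy as np
from scipy import stats, optimize

rng = np.random.default_rng(7)

def neg_log_lik(params, x, t):
    # GEV with linear trend in the position parameter: loc(t) = mu0 + b*t
    mu0, b, sigma, c = params
    if sigma <= 0:
        return np.inf
    ll = stats.genextreme.logpdf(x, c, loc=mu0 + b * t, scale=sigma)
    return -ll.sum() if np.all(np.isfinite(ll)) else np.inf

# synthetic series with a genuine trend (illustrative values)
t = np.arange(60)
x = stats.genextreme.rvs(0.2, loc=10 + 0.08 * t, scale=2.0,
                         size=60, random_state=rng)

# stationary model (3 parameters): trend coefficient fixed at zero
c0, loc0, sc0 = stats.genextreme.fit(x)
aic_stat = 2 * 3 + 2 * neg_log_lik([loc0, 0.0, sc0, c0], x, t)

# nonstationary model (4 parameters): linear trend in the position parameter
res = optimize.minimize(neg_log_lik, x0=[loc0, 0.0, sc0, c0],
                        args=(x, t), method="Nelder-Mead")
aic_trend = 2 * 4 + 2 * res.fun

delta = aic_stat - aic_trend  # positive values favour the trend model
print(f"AIC difference: {delta:.1f}")
```

Repeating this comparison over many synthetic series, and counting how often the trend model is selected, yields the Monte Carlo power estimates discussed in this paper.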
The power values of the AIC ∆ and MK tests are reported for different sample sizes and three GEV special cases: upper-bounded, EV1, and heavy-tailed. A strong dependence of power on L-CV is found in all cases, and values below the conventional threshold of 0.8 are found for ranges of ζ 1 which in many cases exceed the limits of our analysis, i.e., ±5% per year. For such cases, we extended the range of ζ 1 up to values that allow an acceptable power value to be reached. As a result of these analyses, we provide in Figure 3 the minimum absolute value of ζ 1 that can be detected with acceptable power (greater than 0.8). Such a threshold value of the trend depends on the test used and also on the L-CV and the sample length L of the observed series. The same figure may be used to determine the minimum sample length needed to perform trend analysis for known values of ζ 1 and L-CV.
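A Monte Carlo power estimate of the kind underlying Figure 3 can be sketched for the MK test. The snippet below exploits the fact that, for a series without ties, the Mann-Kendall test is equivalent to Kendall's tau computed against the time index; parameter values are illustrative, not the paper's generation sets.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def mk_power(n, trend, sigma, c, reps=500, alpha=0.05):
    """Monte Carlo power of the MK test for a GEV series with a linear
    trend in the position parameter (illustrative parameterization)."""
    t = np.arange(n)
    hits = 0
    for _ in range(reps):
        x = stats.genextreme.rvs(c, loc=10.0 + trend * t, scale=sigma,
                                 size=n, random_state=rng)
        # MK test == Kendall's tau between the series and the time index
        _, p = stats.kendalltau(t, x)
        hits += (p < alpha)
    return hits / reps

p0 = mk_power(50, 0.0, 2.0, 0.2)  # no trend: rejection rate near alpha
p1 = mk_power(50, 0.2, 2.0, 0.2)  # strong trend: power near 1
print(f"power without trend: {p0:.3f}, with trend: {p1:.3f}")
```

Sweeping the trend coefficient until the estimated power crosses 0.8 gives the minimum detectable trend for each sample length and L-CV.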
The analysis of the sample variability of the position parameter shows that the ML procedure usually suggested for evaluating parameters of the nonstationary GEV distribution, in the explored range of parameter values and sample lengths, performs as well as in the stationary case (ζ 1 = 0) for both the upper-bounded and the EV1 special cases of the GEV. For heavy-tailed GEV with L-CV higher than 0.5, the sample variability sharply grows for trend coefficients approaching ±5% per year. On the other hand, the ML bias is independent of the trend coefficient over the entire examined range and for all the parameter sets used.
Results of this study, after Vogel et al. [10], suggest that procedures for assessing the statistical significance of observed trends in real-world data should follow a recognized experimental design protocol aimed at guaranteeing a minimum level of test power or, otherwise, a numerical assessment of the test power should be provided through a properly designed Monte Carlo experiment. This could contribute to understanding the complex change patterns often observed in time series all over the world (e.g., [72,73]), together with the increasing number of natural hazards and related losses reported by many studies (e.g., [74,75]). Relatively weak but important trends could then be investigated apart from the concept of statistical significance as provided by procedures based on the adoption of a stationary null hypothesis [13]. On the other hand, much more effort should be devoted to the estimation of the trend coefficient. As shown by Totaro et al. [14], the very high sample variability of this parameter, even when it is estimated with a non-parametric method such as the well-known Sen's slope [76], may yield a trend of the wrong sign (negative rather than positive, or vice-versa) for relatively short samples (30 years).
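The sign-variability issue mentioned above can be illustrated with a short experiment: estimate Sen's slope (the median of all pairwise slopes) on many 30-year EV1 (Gumbel) series with a weak positive trend and count how often the estimate comes out negative. Parameter values are illustrative, not the paper's generation sets.

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(1)

def sens_slope(x):
    """Sen's slope: median of all pairwise slopes (x_j - x_i)/(j - i)."""
    n = len(x)
    slopes = [(x[j] - x[i]) / (j - i) for i, j in combinations(range(n), 2)]
    return np.median(slopes)

# short (30-year) Gumbel series with a weak positive trend in the position
n, true_slope = 30, 0.02
t = np.arange(n)
est = np.array([sens_slope(rng.gumbel(loc=10 + true_slope * t, scale=2.0))
                for _ in range(1000)])
share_wrong_sign = np.mean(est < 0)
print(f"share of negative Sen's slopes: {share_wrong_sign:.2f}")
```

Even with a genuinely positive trend, a substantial fraction of the short samples yields a negative slope estimate, which is the behaviour reported by Totaro et al. [14].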
In order to exploit the parametric approach within an accepted framework for risk assessment, other important steps are required. The use of nonstationary distributions, such as the Rescaled GEV in Equation (14), raises the problem of estimating a four-parameter set from observed data. This task is not trivial, as was shown in Section 4 for the position parameter, and by Totaro et al. [14] for the trend coefficient and the scale and shape parameters. Moreover, in real-world data, trends may be nonlinear, may also affect other parameters, and may not be homogeneous over a long period spanning several decades. In principle, deterministic trends for future projections should be provided by suitable climate models according to specifically selected scenarios. In such a case, parametric procedures for trend detection on observed series could be useful to validate or select the climate models based on their runs over a reference period.
Such procedures are also of broad general interest, since the recent occurrence of relevant natural disasters (such as floods, hurricanes, earthquakes, and tsunamis) has prompted the scientific community to seek a deeper understanding of the relationships between extreme events and new emerging risks, with particular focus on cascading multi-hazard effects [77][78][79]. In this framework, procedures of risk assessment accounting for environmental and climate change may lead to the recognition of new areas subject to risk. The evaluation of areas exposed to a potential hazard should be periodically revised, with the consequence that critical facilities and infrastructures once assumed to be in a safe location may have to be reconsidered as exposed to risk.

Conclusions
In this paper, a parametric methodology based on the AIC is proposed, suitable for establishing whether a specified trend is detectable in a series of known characteristics. To this purpose, we introduced a GEV distributed variable rescaled by the position parameter.
We compared the performances of the parametric AIC ∆ test and the non-parametric MK test, analyzing their power with reference to observed ranges of different hydrological extreme variables. To this aim, we surveyed case studies from the literature dealing with extensive databases of annual maximum series of precipitation, peak flow, and wind speed. Through Monte Carlo simulations, a sensitivity analysis of both tests' power with respect to the parameters of the Rescaled GEV variable was carried out, showing that power strongly depends on the dimensionless parameters ε, L-CV, and ζ 1.
Results are of practical interest; in particular, we provided a tool for evaluating whether the length of an available sample series suffices to reach a conventional power value, depending on the values of the parent distribution, under the hypothesis that only a linear trend in the position parameter occurs. Obviously, from a general point of view, other sources and types of nonstationarity may also occur and negatively affect test power.
This paper sheds more light on parametric and non-parametric approaches for assessing nonstationarity of climatic variables and highlights important drawbacks in both of them. Nevertheless, it also shows that through a parametric procedure, the evaluation of method performances is feasible. More effort is needed to develop powerful methodologies for the analysis of real data, accounting for many other sources of variability and nonstationarity affecting the climatic and environmental extreme processes.
Author Contributions: All authors contributed in equal measure to all stages of the development and production of this paper. All authors revised the manuscript and approved the final manuscript.

Conflicts of Interest:
The authors declare no conflict of interest.

Appendix A
The Mann-Kendall test is a trend detection tool widely used in environmental studies due to its conceptual and practical simplicity; its origins can be traced back to the seminal works of Mann [25] and Kendall [26]. Given a sample of length L, z = [z 1 , z 2 , . . . , z L ], this test relies on the null hypothesis that data are independent and identically distributed, while the alternative hypothesis is that a monotonic trend can be detected [80].
The statistic of this test is defined as

S = Σ_{i=1}^{L−1} Σ_{j=i+1}^{L} sgn(z_j − z_i),

whose variance under the null hypothesis is

Var(S) = [L(L − 1)(2L + 5) − Σ_{j=1}^{g} t_j(t_j − 1)(2t_j + 5)]/18,

with g being the number of tied groups in sample z and t_j the extent of the j-th group.
Usually, this test is implemented by standardizing the test statistic as

Z = (S − 1)/√Var(S) if S > 0, Z = 0 if S = 0, Z = (S + 1)/√Var(S) if S < 0.

In this way, Z asymptotically follows a standard normal distribution with zero mean and unit variance.
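The statistic, its ties-corrected variance, and the standardization can be collected in a short, self-contained implementation (a minimal sketch using the normal approximation for the two-sided p-value):

```python
import numpy as np
from math import sqrt, erfc

def mann_kendall(z):
    """Mann-Kendall statistic S, ties-corrected variance, standardized Z,
    and two-sided p-value under the normal approximation."""
    z = np.asarray(z, dtype=float)
    L = len(z)
    # S = sum over all pairs i < j of sgn(z_j - z_i)
    S = sum(np.sign(z[i + 1:] - z[i]).sum() for i in range(L - 1))
    # ties correction: g tied groups with t_j members each
    _, counts = np.unique(z, return_counts=True)
    var_S = (L * (L - 1) * (2 * L + 5)
             - sum(t * (t - 1) * (2 * t + 5) for t in counts)) / 18.0
    if S > 0:
        Z = (S - 1) / sqrt(var_S)
    elif S < 0:
        Z = (S + 1) / sqrt(var_S)
    else:
        Z = 0.0
    p = erfc(abs(Z) / sqrt(2))  # two-sided normal p-value
    return S, var_S, Z, p

S, var_S, Z, p = mann_kendall([1, 2, 3, 4, 5])
print(S, Z, p)
```

For a strictly increasing sample of length 5, S reaches its maximum of L(L − 1)/2 = 10, and the monotonic trend is flagged as significant at the 5% level.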