Temperature Extremes: Estimation of Non-Stationary Return Levels and Associated Uncertainties

Estimating temperature extremes (TEs) and associated uncertainties under the non-stationary (NS) assumption is a key research question in several domains, including the nuclear safety field. Methods for estimating TEs and associated confidence intervals (CIs) have often been used in the literature, but in a stationary context, separately, and without detailed comparison. Extreme value theory is often used to assess risks in a context of climate change. It provides an accurate indication of the distributions describing the frequency of occurrence of TEs. However, in an NS context, the notion of the return period is not easily interpretable. For instance, to predict a high return level (RL) in a future year, time-varying distributions must be used and compared. This study examines the performance of a new concept to predict RLs in an NS context and compares three methods for constructing the associated CIs (delta, profile likelihood, and parametric bootstrap). The present work takes up the concept of integrated return periods, which defines the T-year RL as the level for which the expected number of events in a T-year period is one, and proposes a new method based on conditional predictions that is useful for predicting high RLs of extreme events in the near future (the 100-year RL in the year 2030, for instance). The daily maximum temperature (DMT) observed at the Orange Station in France was used as a case study. Several trend models were compared and a new likelihood-based method to detect breaks in TEs is proposed. The analyses were conducted assuming a time-varying Generalized Extreme Value (GEV) distribution. The concepts have been implemented in a software package (Non-Stationary Generalized Extreme Value (NSGEV)). The application demonstrates that the RL estimates for NS situations can be quite different from those corresponding to stationary conditions.
Overall, the results suggest that the NS analysis can be helpful in making a more appropriate assessment of the risk for periodic safety reviews during the life of a nuclear power plant (NPP).


Introduction
Nuclear energy is used to derive over 80% of the electricity in France. Nuclear power plants (NPPs) are not exempt from natural and climatic hazards (flooding, extreme temperatures and heat waves, earthquakes, etc.). The direct, indirect, and cumulative effects of floods, temperature extremes (TEs), and heatwaves impede the safe functioning of NPPs. Nuclear power is a low greenhouse gas emitter but, paradoxically, global warming is making the technology more vulnerable in terms of providing electricity. NPPs in France were constructed to withstand hot summers. In addition, operators are continually investigating prospective internal or external aggressions due to hot temperatures. Nevertheless, climate change and its impact on TEs remain an important issue for the Institute for Radiological Protection and Nuclear Safety (IRSN). During the last two decades, France and other European countries experienced three violent heatwaves (2003, 2006, and 2015) that gave rise to very high temperatures in summer. Urbanization and industrialization, the effect of natural climatic variability, and increased greenhouse gases in the atmosphere have been suggested as the leading causes of the increase in duration, magnitude, and frequency of TEs and heatwaves in the context of the current climate evolution. Despite the fact that nuclear facilities were designed to withstand these TEs with a low probability of failure, the current evolution of the climate (global warming) is handled in an overly simplistic way by the existing frequency analysis models based on extreme value theory.
In the current practice using probabilistic methods to estimate design variables, extreme events are generally assumed to be stationary (i.e., in an unchanging climate in a statistical sense) (e.g., [1,2]). The idea that some mechanisms in hydrological systems are time-invariant and the fact that hydrological predictions are based on assumptions that should include stationarity have been argued in the literature (e.g., [3][4][5]). We must keep in mind that these studies emphasize that any hydrological model, including deterministic and nonstationary approaches, is affected by uncertainty and, therefore, should include a random component that is stationary. From these specific studies, however, it is noted that the developments and applications of non-stationary (NS) models in hydrologic frequency analyses in a changing climate and environments can be frustrated when the additional uncertainty related to trend models, for example, is accounted for along with the sampling uncertainty [6].
Nevertheless, many studies in the last few decades have shown that climatic and meteorological records exhibit some type of non-stationarity such as trends and shifts [7][8][9]. Several approaches have been proposed in the literature to tackle non-stationarity in hydrometeorological and climatological extremes. The frequency analysis of TEs with time-varying distribution functions (or moments such as the mean and the variance) is one of the approaches that have been proposed in the literature to characterize TEs in an NS context (e.g., [10][11][12]).
To meet safety requirements, estimating return levels with an appropriate confidence level becomes a major operational concern. Definitions of the return period in an NS context, its statistical significance, as well as its interpretation, have to be addressed properly and with extreme care when used for environmental, health, and safety concerns. However, a frequently occurring 'gap' is the widespread use of time-varying frequency models to estimate high return levels (without a projection into the future) while assuming the classic return level concept used in stationary environments: the return level Z T for T years is the level for which the probability of exceedance every year is equal to 1/T. Nevertheless, the hypothesis allowing the use of this classic definition of the return period T no longer makes sense in an NS case [10,[13][14][15][16][17][18][19][20][21][22]. Some early studies on the topic have been described by Wigley [22,23] who presented, in a simple way, how non-stationarity may be considered in the classic concept of risk. Katz et al. [10] proposed the "effective return levels" concept to present non-stationarity in terms of time-varying quantiles while the occurrence probability of an extreme event remains constant. Alternatively, the Design Life Level concept was introduced by Rootzén and Katz [24] to estimate the probability of exceeding a fixed threshold during the design life of a project.
However, even before this, in a paper published in 1941 [25], Emil Gumbel, a pioneer in the field of statistics of extremes, had not only addressed the invalidity of this assumption but also thought about climate evolution that was identified half a century later. He cautioned that: "In order to apply any theory, we have to suppose that the data are homogeneous, i.e., that no systematical change of climate and no important change in the basin have occurred within the observation period and that no such changes will take place in the period for which extrapolations are made." Thus, early on, the use of the extreme values theory was limited by the homogeneity of data and stationarity of both the studied phenomenon (i.e., TEs) and the environment (i.e., an important evolution of urbanization), an issue that will receive attention in the present paper.
Unsurprisingly, some creative ideas have been proposed by researchers for communicating risk and helping to address such an important issue. The most important contributions propose two different definitions of the return period under non-stationarity. The first defines the T-year return level (RL) as the level for which the expected waiting time until the exceedance is T years and proposes the concept of "the expected waiting time before failure" as an equivalent definition of the return period under stationary conditions [18]. More recently, the notion of return period was extended to the NS setting by Parey et al. [19,26]; it defines the T-year RL as the level for which the expected number of events in a T-year period is one. These two definitions suggested by Olsen et al. [18] and Parey et al. [19] for communicating risk were reviewed, compared, and illustrated by Cooley [15] with an application to annual peak flow measurements for the Red River of the North, USA. Obeysekera and Park [17] also used non-stationarity concepts assuming the Generalized Extreme Value (GEV) distribution for estimating sea-level extreme events. The time-varying design level (expressed in return period of extreme events) of coastal facility protection was used. Another concept (replacing the average return period by one) was proposed more recently by Read and Vogel [20]. The authors introduced the reliability (or "risk of failure") as a new concept to communicate the risk of experiencing an exceedance event within a planning horizon. They defined the risk of failure over a planning period as the likelihood of experiencing at least one event exceeding the design event over a given project life of n years. This approach in a way originated from those proposed by Olsen et al. [18] and Parey et al. [19]. 
Read and Vogel [20] strongly believe that the probability of failure over an infrastructure's lifetime (or its associated reliability) is the most important piece of information an engineer can communicate, and rightly so given the huge importance of this information for the public and planners.
A related issue concerns the detection and estimation of break dates in a time series. Indeed, it is commonly known that climate and human activity have a direct influence on some hydrometeorological processes. Large-scale climatic and oceanic phenomena such as El Niño and the North Atlantic Oscillation (NAO), as well as all kinds of large-scale urbanization and industry projects, may have a direct or indirect impact on some climatic parameters such as ocean and air temperature. Indeed, in addition to the current climatic evolution due to the emission of greenhouse gases into the atmosphere, these particular phenomena are likely to suddenly modify TEs [27]. Therefore, in a context of non-stationarity, identifying abrupt changes in the time series (and locating the times of significant changes) becomes an important step in modeling temporal trends and taking them into account in statistical inference (frequency analysis and associated uncertainties). Various methods have been proposed in the literature for change-point analysis. Non-parametric techniques were introduced for this problem (e.g., [28,29]). Pettitt [28] obtained exact and approximate results for testing the null hypothesis of no change. The study of Servat et al. [29] was based on a set of methods both for interpolation and for cartographic representation, as well as on statistical methods for the detection of breaks in precipitation time series in Africa. Yu et al. [30] then proposed a general fuzzy piecewise regression analysis with automatic change-point detection. More recently, Xiong and Guo [31] adopted the Bayesian approach to detect a single change in the mean level of both the annual minimum flow series and the annual mean discharge series of the Yangtze River (in China) at the Yichang Hydrological Station. The Bayesian method has the advantage of making inferences on the posterior distribution with respect to the change-point location.
Several R packages have been developed for the analysis of extremes in an NS context (e.g., extRemes, GEVcdn, NEVA). The extRemes package [32] is based on the concept of effective return levels introduced by Katz et al. [10]. The GEVcdn package [33] supplies a framework for a conditional density estimation network. The package for Non-Stationary Extreme Value Analysis (NEVA) was developed by Cheng et al. [13] to estimate climate extremes using Bayesian inference. The NEVA software provides three approaches for estimating RLs in the NS context: (1) RL (for each return period) with a constant "design exceedance probability" during the lifetime considered for the design; (2) time-varying exceedance probability with constant thresholds; and (3) effective return levels as introduced by Katz et al. [10]. However, these packages do not provide any generalization of the concepts of NS return periods as addressed by Olsen et al. [18] and Parey et al. [19].
In this study, a framework for TE analysis under NS conditions is introduced. Based on the concepts proposed by Olsen et al. [18] and Parey et al. [19,26], two notions of NS return period have been implemented in a software package called Non-Stationary Generalized Extreme Value (NSGEV). Under the NS assumption, NSGEV provides two different kinds of predictions for estimating return levels in a future year: (a) conditional prediction (CP), in which the RL is conditional on a fixed date (the future year 2030, for instance); and (b) integrated prediction (IP), in which the RL is integrated over a future period (i.e., the concept introduced by Parey et al. [19,26]). A unique feature of NSGEV is that it provides CIs calculated with the three most widely used methods: delta, profile likelihood, and bootstrap. Furthermore, unlike existing tools, calculation of the CIs by profile likelihood is automated in NSGEV. Further noteworthy features of the NSGEV package are that it provides and plots the density functions at any future date, provides the number of exceedances of any extreme temperature (the highest observed value or the 100-year RL, for instance), and uses the likelihood-based method to detect breaks in climatic and hydrometeorological time series. These features make NSGEV a practical tool that is particularly useful for carrying out expert assessments to characterize climatological and hydrometeorological hazards and analyze extremes in both stationary and NS contexts.
Overall, our goal is to build on the definitions and developments proposed in the literature and revive the debate as to how engineers can effectively communicate the risks associated with climatological and hydrometeorological hazards, especially over medium- and short-term planning horizons. This goal is in line with the recent literature that challenges the stationarity assumption and clearly demonstrates the importance of conducting extreme-value analyses in a non-stationary context. In order to achieve this goal, a simple framework to estimate RLs under stationary and NS assumptions is introduced with an application to make it accessible to a broad audience in the field of NS extreme-value analysis and make it easily interpretable by engineers.
The paper is organized as follows: The theoretical basis for these approaches is addressed in Section 2. The methodology for trend estimation and break-date detection forms the subject of Section 3. Section 4 presents the NSGEV R-package, takes up the concept of integrated return periods introduced by Parey et al. [19,26], and proposes a new one based on conditional predictions. In Sections 5 and 6, the concepts are applied using the daily maximum temperatures (DMTs) observed at the Orange Station in France as a case study. Finally, we conclude in Section 7 with recommendations concerning suitable statements of unconditional return periods in non-stationary settings.

Extreme Values in a Non-Stationary Environment-Theory
Extreme Value Theory (EVT) is an appropriate framework for analyzing the risks associated with climate extremes and their return levels [10,34]. It is assumed here that the reader is familiar with statistical extreme value analysis using this theory; those who are not are referred to Coles [34]. In this section, we give a brief background, primarily to fix the notation used here and to present the theoretical elements necessary for understanding the EVT and the concepts and definitions proposed in this paper.
One of the three limiting distributions-Gumbel, Fréchet, or Weibull-is often fitted to environmental time series of extreme values such as annual maxima [35]. These three distributions come from the same family, the generalized extreme value (GEV) distribution. The GEV distribution has been applied in a large number of studies to estimate and analyze extremes [11,[35][36][37]. The frequency model using this family of distributions is often referred to as the block maxima approach (e.g., [34]). A similar theory holds for excesses over a high threshold. The peaks-over-threshold (POT) approach in which the excesses are analyzed with the Generalized Pareto (GP) distribution is another widely used frequency model based on the EVT (e.g., [34]). The GEV distribution depends on three parameters: location (µ), scale (σ), and shape (ξ), with a cumulative distribution function expressed as:

G(z) = exp{ −[1 + ξ(z − µ)/σ]^(−1/ξ) }, defined for 1 + ξ(z − µ)/σ > 0,

with the Gumbel limit G(z) = exp{−exp[−(z − µ)/σ]} as ξ → 0. The location parameter specifies the center of the distribution, the scale parameter gives an indication of the size of deviations around the location, and the shape parameter governs the tail behavior of the GEV distribution. For NS processes and environments, it is often necessary to allow the underlying distribution function to be time dependent [14,21,32,38] by allowing its parameters (µ, σ, ξ) to depend on time or other covariates. This is commonly referred to as the time-varying frequency model, which is used as an alternative approach.
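As a quick numerical check of the GEV form above, the sketch below evaluates the CDF directly and compares it with SciPy's implementation. Note that SciPy's shape parameter `c` is the negative of the ξ used in this paper (the Coles convention); the parameter values are purely illustrative, not fitted to any station.

```python
# Minimal sketch: evaluating the stationary GEV CDF in the Coles
# parameterization G(z) = exp{-[1 + xi*(z - mu)/sigma]^(-1/xi)}.
# Caution: scipy's shape parameter c equals MINUS the xi used here.
import numpy as np
from scipy.stats import genextreme

mu, sigma, xi = 36.0, 1.5, -0.2   # illustrative values (deg C), not fitted

def gev_cdf(z, mu, sigma, xi):
    """GEV CDF in the location/scale/shape convention of the text."""
    s = 1.0 + xi * (z - mu) / sigma
    if s <= 0:                    # outside the support
        return 0.0 if xi > 0 else 1.0
    return np.exp(-s ** (-1.0 / xi))

z = 39.0
manual = gev_cdf(z, mu, sigma, xi)
scipy_val = genextreme.cdf(z, c=-xi, loc=mu, scale=sigma)
```

Both evaluations agree to machine precision, which is a convenient sanity check when mixing hand-written likelihoods with library routines.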

The Time-Varying Distribution and Methodology for Non-Stationarity
We address three methodological issues in the present section. First, we use the time-varying GEV distribution to estimate TEs under the non-stationarity assumption. In this first step, trend identification and estimation are also performed. Second, in a context of non-stationarity, it is essential to investigate whether there are sudden changes in the time series and to locate the times of significant changes (e.g., [29]), if any. It is then useful to model the time series with multiple linear trends, and a method for break-date analysis is proposed. Third, we estimate the uncertainty by examining CIs for the time-varying GEV parameters and other quantities of interest such as the desired NS-RLs that are usually required for nuclear safety (the 100-year RL, for instance).

Time-Varying Distribution
The location and scale parameters are often time-varying (expressed as a function of time t) in the NS-GEV distribution (e.g., [39,40]). Other covariates can also be used [34] (only time is used herein). As is usually done, the location and scale parameters are assumed to be polynomial functions of time to account for non-stationarity (Equation 4) while keeping the shape parameter constant. Indeed, even in the stationary case, reliable estimates of the shape parameter cannot be obtained easily, which is why it is advisable to keep it constant (e.g., [11,34]) unless there is a convincing reason to let it be time-varying. Another reason for keeping the shape parameter constant is that reliably modeling its temporal variability requires long observation periods (often not available for practical applications). It should be noted that the methods developed in the ensuing sections are not constrained by the assumption of a constant shape parameter. The cumulative NS-GEV distribution function is given by the following equation [34]:

G(z; t) = exp{ −[1 + ξ(z − µ(t))/σ(t)]^(−1/ξ) }, defined for 1 + ξ(z − µ(t))/σ(t) > 0.

Many methods for estimating the distribution parameters are available. The most popular include maximum likelihood (e.g., [34]), probability weighted moments (e.g., [41]), L-moments (e.g., [42]), and Bayesian methods (e.g., [43]). Without carrying out a comparative study of these methods, we merely describe briefly the maximum likelihood method used in the present work. It is commonly known that this approach may give unrealistic estimates of the shape parameter when relatively short samples are used (e.g., [44]). Nevertheless, for applications on extreme temperatures, enough data are typically available for the maximum likelihood method to be stable. More interestingly, this method is applied herein because it allows one to easily incorporate time (or any other covariates) into the parameter estimates.
The maximum likelihood method is more straightforward than most of the alternative methods for obtaining error bounds for parameter estimates.
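To make the maximum likelihood step concrete, the following sketch fits a time-varying GEV with a linear trend in the location parameter, µ(t) = µ0 + µ1 t, and constant σ and ξ, by direct minimization of the negative log-likelihood. This is a hypothetical minimal version of what packages such as NSGEV or extRemes do internally; the synthetic data, starting values, and log link for σ are our own assumptions.

```python
# Sketch: ML fit of an NS-GEV with mu(t) = mu0 + mu1*t, constant sigma
# and xi. Synthetic annual maxima stand in for observed DMT maxima.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
t = np.arange(65)                                    # 65 "years" of annual maxima
z = 36.0 + 0.03 * t + rng.gumbel(0.0, 1.5, len(t))   # warming trend + Gumbel noise

def neg_log_lik(theta, t, z):
    mu0, mu1, log_sigma, xi = theta
    mu, sigma = mu0 + mu1 * t, np.exp(log_sigma)     # log link keeps sigma > 0
    s = 1.0 + xi * (z - mu) / sigma
    if np.any(s <= 0):
        return 1e10                                  # outside the GEV support
    # GEV negative log-likelihood (Coles convention)
    return np.sum(np.log(sigma) + (1.0 + 1.0 / xi) * np.log(s) + s ** (-1.0 / xi))

fit = minimize(neg_log_lik, x0=[35.0, 0.0, 0.5, 0.1], args=(t, z),
               method="Nelder-Mead", options={"maxiter": 2000})
mu0_hat, mu1_hat, log_sigma_hat, xi_hat = fit.x
```

The fitted slope `mu1_hat` recovers the imposed trend of 0.03 per year (up to sampling noise), which illustrates how time enters the parameter estimates directly.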

Estimation of Non-Stationarity and Time-Varying Distribution Parameters
To use the best time-varying GEV distribution, criteria for selecting the best time-varying model among the GEV models that allow non-stationary scale and location parameters are compared. The Akaike information criterion (AIC) and the Bayesian information criterion (BIC) are widely used to detect non-stationarity and to select an adequate distribution. In addition, the likelihood ratio test (or the deviance criterion) can be used to compare the fit of two nested models: the stationary case (with maximized likelihood L{null}) is represented by the null model, whereas the alternative is the NS case (L{alternative}). The log-likelihood ratio statistic can be expressed as follows [13,34]:

D = 2 [ log L{alternative} − log L{null} ],

where D is compared with quantiles of the χ² distribution whose number of degrees of freedom equals the difference in the number of parameters between the two models. To analyze low-frequency NS TEs, it is useful to detect and estimate the temporal trends. Regression structures allowing quadratic dependence on time can be considered for the µ and σ parameters:

µ(t) = µ0 + µ1 t + µ2 t²,
log σ(t) = σ0 + σ1 t + σ2 t².

In the climate and hydrometeorology literature, linear or log-linear trends are usually preferred when selecting trends in the occurrence of extremes [45]. In this study, only non-stationarity with respect to the location and scale parameters is discussed. The simplest case, with a linear regression for the location parameter µ(t) = µ0 + µ1 t while keeping the scale and shape parameters constant, has often been considered in the literature (e.g., [14]). Three general trend models are tested:
• M1: only the trend after the break date is considered;
• M2: both trends, before and after the break date, are considered;
• M3: there is no break date: a single trend over the entire dataset.
A trend model is preferred over another if it is more significant (at the 90% and 95% levels). For convenience, only linear models are used for the temporal evolution of DMTs. Trend models are compared using the Akaike and Bayesian information criteria (AIC and BIC). The likelihood ratio test is used as well. For each model M_i (i = 1, 2, 3), three cases are considered, corresponding to the polynomial degree (constant, linear, or quadratic) used for the time dependence of the parameters. Following this notation, a model M(1, 2) corresponds to an NS linear model for µ(t) and an NS quadratic model for log σ(t). On the other hand, it is commonly known today that the estimation of the shape parameter ξ may sometimes cause difficulty, as observed by Coles and Dixon [46], who proposed the use of a penalized likelihood function to avoid this problem. Similarly, Martins and Stedinger [47] proposed restricting the estimate of ξ to fall within a restricted range. However, we did not encounter any difficulty in the estimation of ξ in the present work.
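The deviance comparison described above can be sketched as follows. The log-likelihood values are illustrative placeholders (not fitted results); the point is the mechanics of the likelihood ratio test and the AIC comparison between a stationary GEV and a linear-trend alternative.

```python
# Sketch: likelihood ratio test (deviance) and AIC for two nested GEV
# models. D = 2*(l_alt - l_null) is referred to a chi-squared
# distribution with df = difference in number of parameters.
from scipy.stats import chi2

ll_null, k_null = -150.2, 3        # stationary GEV: mu, sigma, xi (illustrative)
ll_alt,  k_alt  = -146.7, 4        # NS GEV: mu0, mu1, sigma, xi (illustrative)

D = 2.0 * (ll_alt - ll_null)       # deviance statistic
p_value = chi2.sf(D, df=k_alt - k_null)

aic_null = 2 * k_null - 2 * ll_null
aic_alt  = 2 * k_alt  - 2 * ll_alt
# Prefer the NS model when p_value is below the chosen level (0.05 or
# 0.10) and, consistently, when its AIC is lower.
```

With these placeholder values, D = 7.0 and the test rejects stationarity at the 95% level, in agreement with the AIC ordering.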

Break Dates
If a breakpoint exists but is ignored, the use of a frequency model is impaired by relying on a single trend meant to be representative of all the data. Indeed, the use of a single trend where a trend break exists would introduce a significant bias in the results. On the other hand, acknowledging the breakpoint but retaining only the post-break data would significantly reduce the sample size, which would be an equally poor solution.
As mentioned in the introductory section, many approaches have been proposed and applied in the literature for change-point analysis [28][29][30][31]. In this study, a new likelihood-based approach was adopted to detect change-points in the temporal evolution of annual maximum temperatures. Unlike existing methods and tests for break-point detection (except for non-parametric tests such as the Pettitt test [28,29]), which generally assume that observations are normally distributed (such as the standard normal homogeneity test (SNHT) introduced by Alexandersson (1986) [48] and the Buishand range test [49]), the likelihood-based method has the advantage of making inferences on extreme value distributions (i.e., GEV, GP). In addition, it allows the use of multiple linear trends and exploits all available data, considering the post-break slope as well as the one prior to the occurrence of each proposed breakpoint. Still, even if this strategy uses the suitable breakpoint of a specific series, it has its limitations. Indeed, it is not uncommon for the slopes on either side of the breakpoint to be of opposite signs, i.e., a negative trend before the break and a positive one after. A single, generalized trend model (one trend model for all the data) is adopted in this case (the one after the break, of course). A single trend, as will be shown later in this paper with the case study, should be used if no break date has been identified.
Two types of models can be used. In the first model with one trend, there is only one trend after the break date (and no trend before). In the second model, there can be two different trends, a trend before and a trend after the break date. As mentioned earlier, the break date is determined using the principle of maximum likelihood. Indeed, m models (with a different break for each) are tested where m is the number of years of data available. In the developed procedure, each year of the data period is assumed to be a potential candidate for a break year. Looking through all the years of the period of interest, a log-likelihood profile is computed and plotted. Finally, the maximum of the likelihood profile, corresponding to the most appropriate break date, is extracted. It is to be noted that the developed method that optimizes the breakpoint with the likelihood function by evaluating the most likely break dates is not yet implemented in the NSGEV package.

Uncertainty Associated with Non-Stationary Extreme Events
Generally, when examining TEs, we are interested in asking: How often do we expect a nuclear power plant to experience an extreme temperature? Additionally, if that happens, how high will the temperature be? To answer these questions, we need to calculate the return level for a given return period. The 1/p return level ẑ_p (computed from the GEV distribution) is the 1 − p quantile and is given by:

ẑ_p = µ − (σ/ξ) [ 1 − y_p^(−ξ) ],

where y_p = −log(1 − p). The degree of uncertainty in the estimate of a return level is then closely related to that of the model parameters. It is a common belief today that a good estimate of the uncertainty in extreme levels can be as important as the estimate of the level itself (e.g., [34]). It is then noteworthy that a frequency model predicts future return values well only if it produces RL estimates that fit inside a CI. Three methods for computing CIs (delta, profile likelihood, and bootstrap) are used here in stationary and NS contexts. Appendix A presents a description of these methods, which are quite well documented in the literature. Standard errors (and their corresponding confidence intervals) of the GEV distribution parameters and of the return levels of interest were estimated and examined for each method. Although the variances and confidence intervals based on the maximum likelihood estimators and the delta method are asymptotic, the scientific community no longer has any doubt that parametric confidence intervals based on a normal approximation cannot be expected to be accurate for extremes; that is, their actual coverage probabilities will not be close to the nominal values [50]. Uncertainty in predictions also stems from model selection. The CIs are based on the assumption that the correct model has been selected and take no account of the alternatives that could be considered. Probabilistic approaches exist and are used in many contexts to overcome this objection, but they are not considered here.
To make the best possible use of the expertise, we compare the three methods to best characterize the uncertainties. This comparison, which sometimes leads us to make conservative choices, gives us an indication of the conditions under which one method or another is more appropriate. However, in some cases, the estimation of extreme quantiles can inherently be so uncertain that the discrepancy between different types of confidence intervals is not of major relevance. Furthermore, in the nuclear safety field in France, the 70% confidence interval is considered "reasonable" for most reference situations (as the upper bound of the confidence interval is often involved in the estimation of the return level of interest). It is an expert choice used by French nuclear operators.
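The return level formula and one of the three CI methods (the parametric bootstrap) can be sketched together as follows, including the 70% interval width mentioned above. The GEV parameter values and the sample size are hypothetical, not the Orange Station estimates.

```python
# Sketch: 100-year return level z_p (p = 0.01) from stationary GEV
# parameters, with a 70% parametric-bootstrap confidence interval.
import numpy as np
from scipy.stats import genextreme

mu, sigma, xi = 36.5, 1.4, -0.15     # hypothetical fitted GEV parameters
p = 0.01
y_p = -np.log(1.0 - p)

def return_level(mu, sigma, xi, y_p):
    """z_p = mu - (sigma/xi) * (1 - y_p**(-xi))."""
    return mu - (sigma / xi) * (1.0 - y_p ** (-xi))

z100 = return_level(mu, sigma, xi, y_p)

# Parametric bootstrap: resample from the fitted model, refit, recompute z_p.
rng = np.random.default_rng(2)
boot = []
for _ in range(200):
    sample = genextreme.rvs(c=-xi, loc=mu, scale=sigma, size=65, random_state=rng)
    c_hat, loc_hat, scale_hat = genextreme.fit(sample)
    boot.append(return_level(loc_hat, scale_hat, -c_hat, y_p))
ci_low, ci_high = np.percentile(boot, [15, 85])      # central 70% interval
```

The hand-written formula agrees with SciPy's quantile function `genextreme.ppf`, and the bootstrap interval gives a direct, simulation-based counterpart to the asymptotic delta-method interval discussed above.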

The NS Return Period-NSGEV Package Features
As explained in the introductory section of the present paper, the classic definition of the RL (i.e., the T-year RL is the level for which the probability of exceedance is 1/T each year) no longer holds in an NS case. Indeed, depending on the identified temporal trend, one must instead define the level whose expected number of exceedances in the next T years is one. In the context of non-stationarity, the RL value varies over time, and it is in this perspective that the NSGEV package has been developed. Existing packages estimating extreme values in an NS context (e.g., extRemes, GEVcdn, or NEVA) consider time as any other explanatory variable. They therefore provide no "time-varying" prediction of the return level, and this is the object of the NSGEV package. It is developed to estimate, and to extrapolate beyond the observation period, the desired return level of extreme events and to evaluate the risk associated with NS phenomena. Time then becomes an important variable in the return level equation (the model becomes conditioned on time). The initial date and the number of years considered in making the prediction are the two key parameters of the RL equation in an NS context. Therefore, the return level Z_T for T years starting from an initial date t0 (e.g., 2017) will be defined.
Two notions of the NS return period have been implemented in NSGEV (two kinds of predictions are allowed):
1. Return period conditional to a fixed date: relative to a future block (e.g., the year 2030). It is conditional on the explanatory variable, which is the date of this block. In the introductory section, this is called the conditional prediction (CP).
2. Return period integrated over a future period: as proposed by Parey et al. [19], the calculated RL corresponds to an expected number of exceedances equal to one over this period. The user must select the first year of the projection and then calculate the predictions for different periods starting from this year. This is what we called the integrated prediction (IP).
One key advantage of the first concept is that, unlike the IP, we do not need to assume that the current trend will remain unchanged up to far horizons. Indeed, this is an attractive feature for the analysis and prediction of TEs, especially for those who do not venture into planning and projecting far into the future (i.e., over medium- and long-term horizons) when dealing with a temperature-related project. Furthermore, for the needs of expertise and the periodic evaluation of the safety demonstration of NPPs that operators can put forward, we need to predict high RLs (the 100-year RL, for instance) in the near future (10-30 years, for instance) rather than RLs corresponding to short-term horizons. That is why the unconditional definition of a return period, as introduced by Parey et al. [19,26], may not be fully relevant for describing the recurrence of extreme temperature events (100-year RLs, for instance) in the near future. Alternatively, the conditional definition introduced herein provides high RLs for reasonable horizons and without making any assumption about the persistence of trends. The two definitions are implemented in NSGEV with the three methods of calculating the CIs: delta, profile likelihood, and bootstrap. Furthermore, unlike existing tools, the calculation of the CIs by profile likelihood is automated in NSGEV.
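The two notions can be contrasted numerically. The sketch below assumes a hypothetical linear trend µ(t) = µ0 + µ1 (t − 1952) with constant σ and ξ (illustrative values, not the NSGEV fit): the CP is a quantile of the single-block distribution at a fixed future year, while the IP solves for the level whose expected number of exceedances over the whole T-year window equals one.

```python
# Sketch: conditional prediction (CP) vs integrated prediction (IP)
# for an NS-GEV with a linear trend in the location parameter.
import numpy as np
from scipy.optimize import brentq
from scipy.stats import genextreme

mu0, mu1, sigma, xi = 35.0, 0.04, 1.3, -0.1   # hypothetical trend model
mu = lambda year: mu0 + mu1 * (year - 1952)

def conditional_rl(T, year):
    """CP: level exceeded with probability 1/T in the single block 'year'."""
    return genextreme.ppf(1.0 - 1.0 / T, c=-xi, loc=mu(year), scale=sigma)

def integrated_rl(T, start_year):
    """IP (Parey et al. style): level whose expected number of exceedances
    over the T blocks starting at start_year equals one."""
    span = np.arange(start_year, start_year + T)
    def expected_exceed(z):
        return sum(genextreme.sf(z, c=-xi, loc=mu(y), scale=sigma)
                   for y in span) - 1.0
    return brentq(expected_exceed, 30.0, 60.0)    # bracket chosen generously

cp_2030 = conditional_rl(100, 2030)   # 100-year RL conditional on 2030
ip_2018 = integrated_rl(100, 2018)    # 100-year RL integrated over 2018 onward
```

Under a positive trend, the IP level exceeds the CP level at the start of the period, since exceedances accumulate in the later, hotter blocks; this is precisely why the IP requires assuming the trend persists over the whole horizon while the CP does not.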

Case Study: The Orange Station in France
The TE frequency analysis was performed at the Orange Station, located in Southern France. The 1952-2016 DMTs were provided by the French National Meteorological Service (Météo-France). One of the most important features of the Orange Station is that the region in which it is located has experienced important TEs and heatwave events during the last two decades (e.g., the European heatwave of the summer of 2003, the heatwave of the summer of 2006 in Western Europe, and the heatwave of the summer of 2015 in France). Figure 1 displays the geographic location of the Orange Station. The Orange DMT series is considered homogeneous by Météo-France and is regarded as one of the best and longest daily series available at Météo-France [19]. As part of a climate watch and periodic nuclear safety reviews, all reference extreme temperatures (at all the meteorological stations representative of NPPs in France) are subject to revision. The developed methodology was applied to no fewer than 10 sites, which allowed us to assess its general validity. The model was applied in all its stages except in two situations where opposite trends (a negative trend before the breakpoint and a positive trend after it) were obtained. Only the trend after the breakpoint was considered in these two cases. Aside from the validity of the proposed methodology, the main recommendation of the exercise (in its regional context) was to conduct a more in-depth study to regionalize the trend models.

Results and Discussion
We report herein the results of the conditional and integrated prediction methods applied to the Orange Station. As with any sensitive facility, high return levels (RLs) (the 100-year hot temperature, for instance) in a future year are needed for the safety of NPPs. Furthermore, for the expertise needs and periodic evaluations of the safety demonstration of NPPs, we need to predict these high RLs, with the associated upper bounds of the CIs, in the near future (10-30 years, for instance). Despite this, we have taken the liberty of presenting integrated predictions covering the period up to 2118 (a 100-year NS return period). These results are presented in the form of probability plots and tables summarizing the estimates of the desired RLs and associated CIs calculated using three methods: delta, profile likelihood, and bootstrap. In the probability plots, the time-varying GEV distribution function is represented by the solid red line while the hatched area represents the 70% CIs. The CPs are first performed for three arbitrary dates; the integrated predictions are presented afterward, together with other features of the NSGEV package and additional results. Before that, however, we need to identify and estimate the optimal trend characterizing the TEs at the Orange Station.

Optimal Trend
For convenience, only linear models are used for the temporal evolution of DMTs. As mentioned above, trend models are compared using the AIC and BIC. The results of the trend estimations are summarized in Table 1. The analysis of the results shows that the AIC and BIC give an advantage to the M3 model with an NS linear trend on the location parameter. Furthermore, the outputs of the likelihood ratio test show that the trend of model M3, with an NS linear evolution of the location parameter and no break date (line 1 and column 6 of Table 1), is significant at the 95% level. The comparison based on the likelihood ratio statistic also favors the M3 trend model: when compared to the most competitive non-stationary alternative (M2, with linear trends on the location parameter before and after the break date), the p-value of 49.33% indicates that the additional parameters of M2 are not significant.
The latter result is consistent with the output of the NSGEV likelihood-based algorithm for identifying breakpoints. Indeed, the plot (not presented herein) of the likelihood as a function of time (candidate break dates) does not present any global extremum likely to correspond to a maximum of the likelihood (whose projection onto the time axis would give a potential break date). It should be noted that these results are also consistent with those obtained by Laurent and Parey [51] for the Orange Station. However, this is not entirely consistent with the date of the onset of climate change (the 1970s) commonly shared in the literature, from which trends in extreme temperatures become significant. Figure 2 shows the time evolution of the annual maximum temperatures over the period 1952-2016. The NS linear evolution of the expected value (without any break date) is represented by the straight red line in the figure. Since the density of the annual maximum temperatures is close to that of a GEV distribution whose parameters were estimated by the maximum-likelihood method, the expectation is approximated by µ + γσ, where γ ≈ 0.577 is the Euler-Mascheroni constant.
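The model-selection step can be mimicked on synthetic data. The sketch below (Python with scipy, not the NSGEV implementation) fits a stationary GEV and a linear-location GEV by maximum likelihood and compares their AIC values; the synthetic series uses a deliberately strong warming slope so that the linear model wins. It also evaluates the Gumbel-limit expectation µ(t) + γσ used for the trend line of Figure 2.

```python
import numpy as np
from scipy.stats import genextreme
from scipy.optimize import minimize

rng = np.random.default_rng(1)
years = np.arange(1952, 2017)            # 65 annual maxima, as at Orange
t = years - years[0]
# Synthetic series with a clear warming trend (illustrative only).
data = genextreme.rvs(c=0.1, loc=34.0 + 0.04 * t, scale=1.5, random_state=rng)

def nll_stationary(theta):
    mu, log_sigma, xi = theta
    return -genextreme.logpdf(data, c=-xi, loc=mu,
                              scale=np.exp(log_sigma)).sum()

def nll_linear(theta):
    mu0, mu1, log_sigma, xi = theta
    return -genextreme.logpdf(data, c=-xi, loc=mu0 + mu1 * t,
                              scale=np.exp(log_sigma)).sum()

opts = {"maxiter": 5000, "fatol": 1e-8}
fit0 = minimize(nll_stationary, x0=[data.mean(), 0.0, 0.0],
                method="Nelder-Mead", options=opts)
fit1 = minimize(nll_linear, x0=[data.mean(), 0.0, 0.0, 0.0],
                method="Nelder-Mead", options=opts)

aic0 = 2 * 3 + 2 * fit0.fun              # AIC = 2k + 2 * (negative log-lik)
aic1 = 2 * 4 + 2 * fit1.fun

# Gumbel-limit expectation of the fitted NS model: E[Y_t] ~ mu(t) + gamma*sigma
gamma = 0.5772
mu0_hat, mu1_hat, log_sigma_hat, _ = fit1.x
trend_line_2016 = mu0_hat + mu1_hat * t[-1] + gamma * np.exp(log_sigma_hat)
```

A likelihood ratio test between the two nested fits would use 2*(fit0.fun − fit1.fun) against a chi-squared distribution with one degree of freedom.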

Conditional and Integrated Predictions
A convenient tool for estimating either conditional or integrated predictions is the "predict" function in the NSGEV package. The CP was performed for three dates (2017, 2030, and 2050) and the results are summarized in Table 2 and shown in Figure 3. It can be seen that under the non-stationarity assumption, the 100-year return level reaches about 44.0 °C by the year 2050. The results of the CP for the year 2050 are shown in the right panel of Figure 3. In this figure, the CI was calculated with the three methods (delta, profile likelihood, and bootstrap) considering a trend for the period 1952-2016. Looking out to the year 2050, the highest upper bound of the CI around the 100-year return level was obtained by the profile likelihood method and is equal to 44.9 °C (44.6 °C with the delta method and 44.8 °C with the bootstrap method). For comparison purposes, we also analyzed the same DMT series (for the same period) under the stationarity assumption, focusing on the 100-year RL and associated CIs. This yields an estimate of 41.1 °C for the 100-year return level and 41.9 °C for the upper bound of the CI calculated with the bootstrap method. The intent of this analysis is only to illustrate how the RL estimates for non-stationary situations can be quite different from, and significantly larger than, those corresponding to stationary conditions. As can also be noticed in Figure 3, the CIs (calculated with the three methods) are relatively narrow and the relative width (∆CIs/RLs) of these intervals does not exceed 6%. They are likewise very similar, and the differences in the upper bounds are almost negligible (they do not exceed 0.3 °C) for all of the dates. When the predictions are integrated over a period with a chosen start block date (say t0 = 2018), the IPs corresponding to a return period of T years were calculated using the T years from t0 to t0 + (T − 1).
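A parametric bootstrap CI for a conditional RL, of the kind compared above, can be sketched as follows (Python with scipy; the generating model and the 200-resample budget are illustrative, not the paper's fitted values):

```python
import numpy as np
from scipy.stats import genextreme
from scipy.optimize import minimize

# Parametric bootstrap for the conditional 100-year RL in 2050.
# The "true" model below is illustrative, not the fitted Orange model.
mu0, mu1, sigma, xi = 34.0, 0.03, 1.5, -0.1
years = np.arange(1952, 2017)
t = years - 1952

def refit(sample):
    """Maximum-likelihood refit of the linear-location GEV."""
    def nll(theta):
        a, b, log_s, x = theta
        return -genextreme.logpdf(sample, c=-x, loc=a + b * t,
                                  scale=np.exp(log_s)).sum()
    return minimize(nll, x0=[sample.mean(), 0.0, 0.0, 0.0],
                    method="Nelder-Mead").x

def rl_2050(theta, T=100):
    a, b, log_s, x = theta
    return genextreme.ppf(1 - 1 / T, c=-x, loc=a + b * (2050 - 1952),
                          scale=np.exp(log_s))

rng = np.random.default_rng(2)
boot = []
for _ in range(200):              # 200 resamples keep the sketch fast
    sample = genextreme.rvs(c=-xi, loc=mu0 + mu1 * t, scale=sigma,
                            random_state=rng)
    boot.append(rl_2050(refit(sample)))
lo, hi = np.percentile(boot, [15.0, 85.0])   # two-sided 70% bootstrap CI
```

Each resample is simulated from the fitted time-varying model, refitted, and its 2050 conditional RL recorded; the 15th and 85th percentiles of these replicates give the 70% interval.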
Table 3 and Figure 4 show the results of the integrated predictions from 2018. For example, the temperature exceeded on average once during the period 2018-2048 (30 years from 2018) is approximately 42.0 °C. The associated upper bound does not exceed 42.5 °C. As quadratic, or even linear, trends for the distribution parameters will eventually result in unrealistic predictions beyond a certain point past the time horizon of the sample, it is inappropriate to extrapolate the integrated predictions far into the future, and it is not reasonable to estimate return levels over long time horizons (100 years, for instance). Accordingly, as shown in Table 3 and Figure 4, estimates beyond a time horizon of about 20 to 30 years are flagged as unreliable. Rather, our need and aim are to perform predictions of high RLs (with the associated upper bounds of the CIs) over 10-30 years, for instance. Despite this, we have taken the liberty of presenting integrated predictions covering the period up to 2118 (a 100-year NS return period). The highest upper bound of the 70% CI around the 100-year return level was obtained by the delta method and is equal to 47.3 °C. As shown in Figure 4, the CIs calculated with the bootstrap and profile likelihood methods are very similar over the entire distribution (left tail, body, and right tail), especially for the upper bounds. They are slightly different from those obtained with the delta method, especially for the high return periods. The upper bounds given by these two methods are slightly lower than those provided by the delta method, but the difference is only 0.2 °C. It is also interesting to note that, whatever the method of calculation, the relative widths of the CIs remain rather small.
The uncertainty is also shown in Table 3, which presents the 5-year to 100-year RLs and compares the results of the three methods of calculating CIs. These results confirm the visual impression of Figure 4 and give the exact values of the return levels and the bounds of the CIs. The values indicate that the differences in CIs between the bootstrap and the profile likelihood methods are small and almost negligible. The worst-case uncertainty scenario suggests a relative width of the CI in the range of 5-6%. This holds for all the methods of calculating CIs and all the return periods of interest (30, 50, and 100 years). It is important to understand that the very small relative width of the CIs, together with the fact that these CIs (calculated by different methods) are very similar, is solely a statistical effect caused by the robustness of the frequency model, which is based on the concept of integrated predictions from a given year and over a period of time. That is to say, if the IPs are estimated with sufficiently high precision, then the time-varying frequency model introduced in this paper will produce narrow CIs regardless of the method of their calculation.
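For comparison with the bootstrap, the delta method propagates the MLE covariance (the inverse of the numerically evaluated observed information) through the gradient of the return level. The sketch below does this for a stationary GEV for compactness; the data and parameters are illustrative, and the numerical Hessian stands in for an analytic information matrix:

```python
import numpy as np
from scipy.stats import genextreme, norm
from scipy.optimize import minimize, approx_fprime

rng = np.random.default_rng(0)
# Synthetic stationary annual maxima (illustrative scale, not Orange data).
data = genextreme.rvs(c=0.1, loc=35.0, scale=1.5, size=65, random_state=rng)

def nll(theta):
    mu, log_sigma, xi = theta
    return -genextreme.logpdf(data, c=-xi, loc=mu,
                              scale=np.exp(log_sigma)).sum()

theta_hat = minimize(nll, x0=[data.mean(), 0.0, 0.0],
                     method="Nelder-Mead").x

def return_level(theta, T=100):
    mu, log_sigma, xi = theta
    return genextreme.ppf(1 - 1 / T, c=-xi, loc=mu, scale=np.exp(log_sigma))

# Observed information: numerical Hessian of the negative log-likelihood.
eps, k = 1e-4, len(theta_hat)
H = np.empty((k, k))
for i in range(k):
    step = np.zeros(k)
    step[i] = eps
    H[i] = (approx_fprime(theta_hat + step, nll, 1e-6)
            - approx_fprime(theta_hat - step, nll, 1e-6)) / (2 * eps)
V = np.linalg.inv((H + H.T) / 2)          # symmetrize, then invert

grad = approx_fprime(theta_hat, return_level, 1e-6)
z = return_level(theta_hat)
se = float(np.sqrt(grad @ V @ grad))      # delta-method standard error
q = norm.ppf(0.85)                        # two-sided 70% normal quantile
lo_ci, hi_ci = z - q * se, z + q * se
```

The symmetric normal interval produced this way is what typically differs from the profile likelihood and bootstrap intervals in the upper tail, since those two methods allow asymmetry.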

Further Results
The developed package is particularly useful for carrying out an expert evaluation. The number of exceedances of any extreme temperature (the 100-year RL, for instance) observed in many simulated samples (generated using the parameters estimated from the initial sample) can be calculated with NSGEV. One thousand replicates of the initial sample were generated over the period 2018 to 2118 (an arbitrary choice). The 90%, 95%, and 99% extreme quantiles were calculated for each sample and are plotted in Figure 5 as black, red, and green lines, respectively. The average quantiles for each sample are also plotted (the bottom thick line). The number of times the level of 45.9 °C (the 100-year integrated prediction for the period 2018-2118) is exceeded was then calculated. This level is exceeded in 60% of the simulated samples. The highest upper bound of the 70% CI around the 100-year return level (47.3 °C) is also exceeded, but in only 15% of the simulations. A further noteworthy feature of the NSGEV package is that it provides and plots the density functions at any future date. For illustration, the density functions for 1 January of the years 2017, 2040, 2070, and 2100 were calculated and plotted. These densities are shown in Figure 6. It can be seen, for example, that a temperature of 42.6 °C was extremely unlikely in 2017, whereas from 2040 onward this temperature becomes quite plausible (the green shaded area in Figure 6). It is noteworthy that 42.6 °C is the highest temperature ever recorded (observed in the summer of 2003) at the Orange Station.
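The exceedance-counting experiment can be reproduced in miniature. The sketch below (Python with scipy; an illustrative trend model, not the fitted Orange model) computes the 100-year integrated RL for a projection window and counts how often simulated trajectories exceed it at least once:

```python
import numpy as np
from scipy.stats import genextreme
from scipy.optimize import brentq

# Illustrative NS GEV with a linear location trend (not the Orange fit).
mu0, mu1, sigma, xi = 34.0, 0.03, 1.5, -0.1
years = np.arange(2018, 2118)                 # 100-year projection window
mu = mu0 + mu1 * (years - 1952)

# Integrated 100-year RL: expected number of exceedances over the window = 1.
def expected_exceedances_minus_one(z):
    return np.sum(genextreme.sf(z, c=-xi, loc=mu, scale=sigma)) - 1.0
level = brentq(expected_exceedances_minus_one, 20.0, 80.0)

# Simulate 1000 trajectories and count those with at least one exceedance.
rng = np.random.default_rng(3)
n_sims = 1000
sims = genextreme.rvs(c=-xi, loc=mu, scale=sigma,
                      size=(n_sims, len(years)), random_state=rng)
frac = np.mean(sims.max(axis=1) > level)
```

With one expected exceedance over the window, the number of exceedances is approximately Poisson(1), so the fraction of trajectories with at least one exceedance should be close to 1 − e⁻¹ ≈ 0.63, consistent with the roughly 60% exceedance rate reported above.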

Conclusions and Perspectives
In the present paper, we have provided detailed reasoning for the need to extend the traditional, stationary definition of the return level (i.e., the return level Z_T for T years is the level whose probability of exceedance in any given year is 1/T) with a new return level concept defined over planning horizons in an NS context, as a metric for the design of structures and for communicating the risks associated with climate hazards.
Several ideas and approaches have been proposed in the literature to tackle non-stationarity in hydrometeorological and climatological extremes. The present work supports these ideas, takes up the concept of integrated return periods, which defines the T-year RL as the level for which the expected number of events in a T-year period is one (introduced by Parey et al. [19]), and proposes a new method based on conditional predictions that is useful for predicting high RLs of extreme events in the near future (the 100-year RL in the year 2030, for instance). In the present work, the concepts of conditional and integrated NS return periods have been implemented in a software package called Non-Stationary Generalized Extreme Value (NSGEV).
Another contribution of this paper is the application and illustration of these approaches on the example of DMTs at the Orange Station, Southern France, over the period 1952-2016. The time-varying GEV distribution was used, and CIs for the time-dependent parameters and RLs were examined using three methods (delta, profile likelihood, and parametric bootstrap). Overall, the results suggest that the non-stationary analysis can be helpful in making a more appropriate assessment of the risk during periodic safety reviews over the life of an NPP. The application demonstrates that the RL estimates for NS situations can be quite different from those corresponding to stationary conditions.
If the RLs obtained using the conditional and integrated predictions have sufficiently high precision, then the concept introduced here will produce relatively narrow and very similar CIs, whatever the method of their calculation. However, unlike the IP concept, the conditional prediction does not rely on the assumption that the current trend persists to far horizons: it provides high RLs for near-term horizons. This attractive feature makes it more interesting from a practical point of view.
To accomplish this, trends in the GEV distribution parameters must be identified and estimated. In NSGEV, regression structures allowing up to quadratic dependence on time are available and can be compared to select the best time-varying model. To account for non-stationarity in this paper, a linear trend of the location parameter without a break date was found to be the most appropriate for the annual maximum temperatures observed at the Orange Station (the absence of a break date is specific to the Orange Station). Using the whole series better absorbs the impact of the 2003, 2006, and 2015 high temperatures, and the resulting trend is more realistic for use in future projections.
An in-depth study could help to further improve the NSGEV package and to apply the developed concept at other sites of interest. The concept of conditional predictions, the methodology developed here, and the integrated return level definition should find additional applications in the assessment of risks associated with other hazards in other climate and geoscience fields (e.g., coastal hazards).