Frequency-Domain Evidence for Climate Change

Abstract: The goal of this paper is to search for conclusive evidence against the stationarity of the global air surface temperature, which is one of the most important indicators of climate change. For this purpose, possible long-range dependencies are investigated in the frequency domain. Since conventional tests of hypotheses about the memory parameter, which measures the degree of long-range dependence, are typically based on asymptotic arguments and are therefore of limited practical value in case of small or medium sample sizes, we employ a new small-sample test as well as a related estimator for the memory parameter. To safeguard against false positive findings, simulation studies are carried out to examine the suitability of the employed methods, and hemispheric datasets are used to check the robustness of the empirical findings against low-frequency natural variability caused by oceanic cycles. Overall, our frequency-domain analysis provides strong evidence of non-stationarity, which is consistent with previous results obtained in the time domain with models allowing for stochastic or deterministic trends.


Introduction
While the presence of global warming is largely undisputed among experts and public climate change awareness has reached levels of over 90% in Western countries (Lee et al. 2015; Poortinga et al. 2018; Baiardi and Morana 2020), the verification of a change in the global surface temperature in terms of statistical significance remains a challenging task. The main problem is that there is no consensus about the true nature of the underlying process which has generated the observed temperature record. It may have a deterministic trend, a stochastic trend, or even a combination of both. It may also share its trend in certain subperiods with other time series such as greenhouse gas emissions, which would allow us to draw conclusions about the cause of the warming trend. In order to address questions of this type, various sophisticated time series models have been employed in previous studies, a selection of which will be reviewed in Section 1. However, when we are only interested in the question of whether the global surface temperature is changing or not, we may be able to choose a relatively simple setting and possibly avoid imposing implausibly strong assumptions.
In this paper, we consider only fractionally integrated processes with constant mean, and try to distinguish between the stationary case, where the memory parameter d is less than 0.5, and the non-stationary case, where d ≥ 0.5. In this setting, the rejection of the stationarity hypothesis H0: d < 0.5 can always be interpreted as evidence of climate change, regardless of whether the rejection is due to a large value of d (infinite variance) or a nonconstant mean (deterministic trend). To emphasize the importance of including the non-stationary, yet mean-reverting case 0.5 ≤ d < 1 in the alternative hypothesis of climate change, we plotted realizations of fractional white noise for different values of d and different sample sizes. Figure 1g shows that even for a relatively large value of d in the stationary range, the long-term fluctuations are modest. In contrast, the changes are quite dramatic for d = 0.8. See Figure 1h. Changes of this magnitude would be disastrous in the case of the global temperature. However, when testing the hypothesis H0: d < 0.5, we have to strike a balance between simplicity and plausibility. On the one hand, we must allow for short-range dependence in order to keep the test realistic, hence using only fractionally integrated white noise is not an option. On the other hand, the class of data generating processes (DGPs) must also not be too rich because Pötscher (2002) has shown that inference on unit roots and cointegration as well as the problem of estimating the memory parameter fall into the category of ill-posed estimation problems under commonly used specifications for the set of feasible DGPs. For example, reliable unit root inference is not possible when this set is rich enough to contain all ARMA(1, 1) processes.
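Realizations of fractional white noise like those in Figure 1 can be generated from the truncated MA(∞) representation of (1 − L)^{-d}, whose coefficients obey the recursion ψ_k = ψ_{k−1}(k − 1 + d)/k with ψ_0 = 1. The following sketch is our own illustrative helper (not code from the paper), valid as an approximation for the values of d considered here:

```python
import numpy as np

def fractional_white_noise(n, d, seed=None, burn=500):
    """Simulate ARFIMA(0, d, 0): (1 - L)^d y_t = u_t, u_t ~ N(0, 1).

    Uses the truncated MA(infinity) representation y_t = sum_k psi_k u_{t-k},
    with psi_0 = 1 and the recursion psi_k = psi_{k-1} * (k - 1 + d) / k.
    """
    rng = np.random.default_rng(seed)
    m = n + burn                      # extra burn-in so the truncation bias is small
    psi = np.empty(m)
    psi[0] = 1.0
    for k in range(1, m):
        psi[k] = psi[k - 1] * (k - 1 + d) / k
    u = rng.standard_normal(m)
    # full convolution of innovations with MA weights; keep the last n values
    y = np.convolve(u, psi)[:m][-n:]
    return y
```

For d in the stationary range (e.g., 0.3) the paths fluctuate moderately around the mean, while for d = 0.8 the long excursions of the kind visible in Figure 1h appear.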
Conventional tests of hypotheses about the memory parameter d are based on the asymptotic normality of semiparametric estimators for d which are obtained from the periodogram ordinates at Fourier frequencies in a small neighborhood of frequency zero. The restriction to the lowest Fourier frequencies certainly helps to avoid any distortions caused by short-range dependencies, but it also compromises the large-sample assumption required for the asymptotics to kick in. Mangat and Reschenhofer (2019) therefore developed a small-sample test which requires only that the total sample size is large but not the number of Fourier frequencies that are used in the test procedure. Building on this test, a related estimator for d was introduced, which performed well in terms of bias and root-mean-square error (RMSE) in an extensive simulation study. Both the test and the estimator will be used together with other methods in Section 3 to search for conclusive evidence against the stationarity of the global air surface temperature. Beforehand, Section 2 describes all employed methods in detail. Section 4 concludes the paper.

Literature Review
A popular way to test for a trend in a time series is based on the simple linear trend model. Under the assumption that the errors are stationary with mean zero and variance σ², the OLS estimator β̂ of the slope parameter β is, for a large sample size n, approximately normally distributed with mean β and variance 12σ²/n³ (Grenander and Rosenblatt 1957). Using the fact that the error variance σ² is proportional to the spectral density of the error process at frequency zero, it can be estimated by a suitable lag window estimator based on the sample autocovariances of the OLS residuals. However, as pointed out by Fomby and Vogelsang (2002), this approach breaks down when there is a unit root in the errors or the errors have long memory because in these cases the spectral density at frequency zero does not exist. In their analysis of six global temperature series, Fomby and Vogelsang (2002) therefore used the robust trend test proposed by Vogelsang (1998) instead and were able to reject the null hypothesis of a nonpositive linear trend at the 5% level in all but one case. The robustness of this test to strong serial correlation implies that it cannot be used for the detection of a stochastic trend, hence other tests must be used for this purpose.
Using various unit root tests to test for the presence of stochastic trends in three temperature series (northern hemisphere, southern hemisphere, global; period: 1865-1994), Stern and Kaufmann (1999) obtained conflicting results. The augmented Dickey-Fuller (ADF) test (Dickey and Fuller 1979, 1981) and the Kwiatkowski-Phillips-Schmidt-Shin (KPSS) test (Kwiatkowski et al. 1992) suggested that all temperature series are I(1) (integrated of order one), which means that they require differencing to achieve stationarity, whereas the Phillips-Perron test (Phillips and Perron 1988) and the Schmidt-Phillips test (Schmidt and Phillips 1992) suggested that all temperature series are I(0) (trend stationary), which means that the subtraction of a deterministic trend is sufficient to achieve stationarity. Taking also the spatial distribution of the temperature into account, Chang et al. (2020) detected functional unit roots in time series of cross-sectional temperature densities.
Once we have established that surface temperature follows a stochastic trend, we might go one step further and investigate the causes of the changes in surface temperature. This could be done by testing whether surface temperature cointegrates with independent variables like greenhouse gases. The existence of a cointegrating relationship would indicate that there is a common stochastic trend and consequently also that human activity is responsible for the rising temperature. However, the methodological problems do not become any smaller when we move from a univariate analysis to a multivariate analysis. Indeed, the reports of cointegrating relationships by Kaufmann and Stern (2002) and Kaufmann et al. (2006) were challenged by Gay-Garcia et al. (2009), who argued that surface temperature is stationary around a broken linear trend. Kaufmann et al. (2010) responded that the in-sample forecasts generated by the cointegration model are more accurate than the in-sample forecasts generated by the broken trend model. Conceding that a single break may not be enough for the rejection of non-stationarity, Lai and Yoon (2018) tried up to three breaks and were then able to reject the unit root hypothesis in favor of stationarity about a broken trend for both global and hemispheric temperatures. They used the unit root test proposed by Harvey et al. (2013), which extends the minimum GLS detrended Dickey-Fuller (MDF) test of Perron and Rodríguez (2003) by allowing for multiple breaks.
However, the evidence presented by Lai and Yoon (2018) must be interpreted with caution. Firstly, in relation to the modest sample size of n = 136 (observation period from 1880 to 2015), there are simply too many tuning parameters besides the maximum number of breaks: the trimming parameters 0 < τL < τU < 1, which determine the times τLn and τUn of the earliest and the latest possible break, respectively; the separation parameter η, which determines the minimum number ηn of observations between breaks; and the number p of lags used in the ADF-type autoregression (or the maximum number of lags if p is chosen automatically, e.g., by the modified Akaike information criterion (MAIC) procedure of Ng and Perron (2001) and Perron and Qu (2007)). Moreover, Lai and Yoon (2018) also considered three different types of breaks, occurring either in the intercept, in the slope, or in both. Less problematic are the specifications required for the calculation of the asymptotic critical values because they leave no room for data snooping when the values provided by Harvey et al. (2013) are used. Secondly, the underlying asymptotics of this test are intuitively not very plausible. Suppose that three breaks in 150 years are a realistic estimate; then we would expect six breaks in 300 years and 30 breaks in 1500 years. Why should the maximum number of breaks remain fixed when the sample size increases? This can only happen when we keep the length of the observation period fixed and increase only the sampling frequency. For example, when we switch from annual observations to weekly observations, the sample size increases dramatically from n to n* = 52n, but the number of breaks does not change and the minimum number of observations between breaks increases accordingly from ηn to ηn*. Clearly, this kind of asymptotics is inappropriate for the investigation of climate change.
Not much new can be learned about the long-term trend of the global surface temperature when weekly, daily or hourly measurements are used instead of annual measurements. Unfortunately, the "right asymptotics", where the number of breaks grows at the same rate as the length of the observation period, does not appear to be a practical option.
Instead of assuming that surface temperature and greenhouse gases have stochastic trends and searching for evidence of a cointegrating relationship, we may as well assume trend-stationarity and search for a common deterministic trend in order to establish that human factors are responsible for the rising temperature. Estrada et al. (2013) provided evidence for the existence of such a trend, which is characterized by common breaks in the slope. Perron and Yabu (2009) designed a test for structural breaks in a deterministic trend which does not require any prior knowledge as to whether the deviations from the trend are difference- or trend-stationary. Applying this test to various temperature series (land and sea; global, northern and southern hemispheres) as well as series of greenhouse gases and other factors contributing to radiative forcing (the difference between the sunlight absorbed by the Earth and the energy radiated back to space), Estrada et al. (2017) found breaks in the slope of the trend function in almost all series but also large differences in the break date estimates. They attributed these differences to natural variability, e.g., low-frequency oscillations such as the Atlantic Multidecadal Oscillation (AMO) and shorter-term variations such as the El Niño/Southern Oscillation (ENSO). The results of the cotrending analyses of Estrada et al. (2017) and Estrada and Perron (2019), which take the AMO into account, show that temperatures and forcing variables share the same long-term trend, which is highly influenced by greenhouse gases.

Methods
In light of the discussion in Section 1, we will employ low-order fractionally integrated ARMA (ARFIMA) models for the investigation of the long-range dependencies in the global surface temperature series. Our goal is to find conclusive evidence against the null hypothesis that the memory parameter d is less than 0.5. If d ≥ 0.5, the variance diverges and the time series will always tend to leave its previous range. Suppose that the observations (y_t)_{t=1}^n can be described by an ARFIMA(p, d, q) process (Granger and Joyeux 1980; Hosking 1981),

(1 − φ_1 L − · · · − φ_p L^p)(1 − L)^d y_t = (1 + θ_1 L + · · · + θ_q L^q) u_t,

where L denotes the lag operator and (u_t) is white noise with mean zero and variance σ². The memory parameter (or fractional differencing parameter) d is a measure of the degree of long-range dependence, while the autoregressive parameters φ_1, …, φ_p and the moving average parameters θ_1, …, θ_q model the short-range dependence. Since we are only interested in long-range dependence and the ARMA order (p, q) is usually unknown anyway, we adopt a semiparametric approach and estimate d independently of the ARMA parameters. This can be achieved by carrying out the estimation in the frequency domain and focusing on the low-frequency components. Indeed, writing the ARFIMA spectral density as

f(ω) = f_0(ω) (2 sin(ω/2))^{−2d},

where f_0(ω) = (σ²/2π) |1 + θ_1 e^{−iω} + · · · + θ_q e^{−iqω}|² / |1 − φ_1 e^{−iω} − · · · − φ_p e^{−ipω}|² is the spectral density of the ARMA component, f_0 is roughly constant for small values of ω and can therefore be approximated by the constant f_0(0) in a neighborhood of frequency zero. Frequency-domain estimators for d and frequency-domain tests of hypotheses about d will be discussed in Sections 2.1 and 2.2, respectively, and later employed for the analysis of the temperature series in Section 3.
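The factorization above is easy to evaluate numerically. The helper below is our own illustrative sketch (the function name and defaults are assumptions, not part of the paper); it computes f(ω) = f_0(ω)(2 sin(ω/2))^{−2d} and lets one verify that the short-memory factor f_0 is nearly flat near frequency zero:

```python
import numpy as np

def arfima_spectrum(w, d, phi=(), theta=(), sigma2=1.0):
    """Spectral density f(w) = f0(w) * (2 sin(w/2))^(-2d) of an ARFIMA(p, d, q) process."""
    w = np.atleast_1d(np.asarray(w, float))
    z = np.exp(-1j * w)
    ma = 1 + sum(t * z ** (k + 1) for k, t in enumerate(theta))   # MA polynomial at e^{-iw}
    ar = 1 - sum(p * z ** (k + 1) for k, p in enumerate(phi))     # AR polynomial at e^{-iw}
    f0 = sigma2 / (2 * np.pi) * np.abs(ma) ** 2 / np.abs(ar) ** 2  # ARMA (short-memory) part
    return f0 * (2 * np.sin(w / 2)) ** (-2 * d)
```

For example, with φ_1 = 0.5 the factor f_0 changes only slowly over the lowest Fourier frequencies, which is exactly the property that justifies replacing f_0(ω) by f_0(0) in the semiparametric approach.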

Estimation of the Memory Parameter
The periodogram is a sample version of the spectral density. It is given by

I(ω) = (1/(2πn)) |Σ_{t=1}^n y_t e^{−iωt}|²

and is related to the sample autocovariances γ̂(j) via

I(ω_j) = (1/(2π)) Σ_{|h|<n} γ̂(h) e^{−iω_j h},

where ω_j = 2πj/n, j = 1, …, ⌊n/2⌋, are the Fourier frequencies. Taking the logarithm of each side of the spectral density representation f(ω) = f_0(ω)(2 sin(ω/2))^{−2d} and adding the logarithm of the periodogram to both sides, we obtain

log(I(ω_j)) = log(4^{−d} f_0(ω_j)) + d(−2 log(sin(ω_j/2))) + log(I(ω_j)/f(ω_j)).
Since f_0(ω) is approximately constant near frequency zero and the ratios I(ω_j)/f(ω_j) are approximately i.i.d. standard exponential (Künsch 1986), the memory parameter d can be estimated by a simple linear regression with log(I(ω_j)) as dependent variable and −2 log(sin(ω_j/2)) as independent variable (Geweke and Porter-Hudak 1983). The resulting estimator for d is denoted as d̂_GPH when the K lowest Fourier frequencies are included in the regression and as d̂_tr when the H very lowest Fourier frequencies are trimmed away.
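A minimal implementation of this log-periodogram regression might look as follows (the function name is ours; H > 0 reproduces the trimmed variant d̂_tr):

```python
import numpy as np

def gph_estimate(y, K, H=0):
    """Log-periodogram (GPH) regression estimate of the memory parameter d.

    Regresses log I(w_j) on -2*log(sin(w_j/2)) over Fourier frequencies
    j = H+1, ..., K; H > 0 trims away the very lowest frequencies.
    """
    n = len(y)
    x = np.asarray(y, float) - np.mean(y)
    # periodogram I(w_j) = |sum_t x_t e^{-i w_j t}|^2 / (2 pi n)
    per = np.abs(np.fft.fft(x)) ** 2 / (2.0 * np.pi * n)
    j = np.arange(H + 1, K + 1)
    w = 2.0 * np.pi * j / n
    reg = -2.0 * np.log(np.sin(w / 2.0))   # regressor
    logI = np.log(per[j])                  # dependent variable
    reg_c = reg - reg.mean()
    # OLS slope = estimate of d
    return float(np.sum(reg_c * (logI - logI.mean())) / np.sum(reg_c ** 2))
```

With the series length n = 170 considered in the paper, the conventional choice K = ⌊√n⌋ gives K = 13 included frequencies.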
Since the periodogram is a very volatile estimator for the spectral density, smoothing is an obvious option to boost its performance. Accordingly, we may replace the periodogram ordinates I(ω_j) occurring in the log-periodogram regression either by simple moving averages like (I(ω_{j−h}) + · · · + I(ω_{j+h}))/(2h + 1) or by the more sophisticated lag window estimator (Hassler 1993; Peiris and Court 1993; Reisen 1994)

f̂(ω) = (1/(2π)) Σ_{|s|≤m} w(s/m) γ̂(s) e^{−iωs},

where w is a lag window satisfying w(0) = 1, |w(s)| ≤ 1 and w(−s) = w(s). A commonly used lag window is the Parzen window, given by

w(s) = 1 − 6s² + 6|s|³ for |s| ≤ 1/2,  w(s) = 2(1 − |s|)³ for 1/2 < |s| ≤ 1,  w(s) = 0 otherwise.

In contrast to the last two estimators, which depend also on the additional tuning parameters h and m, respectively, the estimator d̂_W depends only on the number K of included Fourier frequencies.
It is obtained by maximization of the Whittle likelihood in the narrow frequency band (0, ω_K] (Künsch 1987). An approximation of the (profiled, negative log) Whittle likelihood is given by (Robinson 1995)

R(d) = log( (1/K) Σ_{j=1}^K ω_j^{2d} I(ω_j) ) − (2d/K) Σ_{j=1}^K log(ω_j),

which d̂_W minimizes. Similarly, the estimator d̂_KS is obtained by minimization of the test statistic of a two-sided Kolmogorov-Smirnov goodness-of-fit test applied to the cumulative normalized periodogram. This estimator is closely related to the small-sample test which will be described in the next subsection and which will be the most important tool in our empirical study of the temperature data.
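The narrow-band Whittle estimator just described reduces to a one-dimensional minimization of R(d), which scipy's bounded scalar minimizer handles directly. This sketch uses our own function name and search bounds:

```python
import numpy as np
from scipy.optimize import minimize_scalar

def local_whittle(y, K):
    """Narrow-band (local) Whittle estimate of d over the band (0, w_K].

    Minimizes R(d) = log(mean_j I(w_j) w_j^{2d}) - 2d * mean_j log(w_j).
    """
    n = len(y)
    x = np.asarray(y, float) - np.mean(y)
    per = np.abs(np.fft.fft(x)) ** 2 / (2.0 * np.pi * n)
    j = np.arange(1, K + 1)
    w = 2.0 * np.pi * j / n
    I, logw = per[j], np.log(w)

    def R(d):
        return np.log(np.mean(I * w ** (2.0 * d))) - 2.0 * d * logw.mean()

    # search bounds are an assumption covering the cases relevant here
    res = minimize_scalar(R, bounds=(-0.49, 1.2), method="bounded")
    return float(res.x)
```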

Testing
The asymptotic normality of the estimator d̂_GPH established by Hurvich et al. (1998) under the assumptions that K = o(n^{4/5}) and log²(n) = o(K) can be used to test hypotheses about the memory parameter d. The problem with this approach is that not only the sample size n but also the number K of included Fourier frequencies must be large. A popular choice for K is ⌊√n⌋. The length of our temperature series is 170, hence K = 13, which is definitely not a large number. Alternatively, we may use the standard variance formula of the least squares estimator of the slope in a simple linear regression instead of the asymptotic variance. However, Mangat and Reschenhofer (2019) found that the test based on the asymptotic variance has very low power and the test based on the standard variance formula tends to over-reject under the null hypothesis. They therefore developed a new frequency-domain test which also works when K is small.
Their test is based on the assumption that a relatively small number of normalized periodogram ordinates at low Fourier frequencies are approximately i.i.d. Exp(1), possibly after omitting the very first frequencies, and consequently that their cumulative sums are approximately distributed as the order statistics of a random sample from a uniform distribution on [0, 1], which is plausible in the light of the asymptotic result by Künsch (1986) and the small-sample evidence provided by Mangat and Reschenhofer (2019). Clearly, the normalization requires knowledge of the true value of d. If the value specified in the null hypothesis is wrong, there will be a discrepancy between the empirical distribution function of the sample, F̂, and the cumulative distribution function of the uniform distribution, F_U, which can be detected with a goodness-of-fit test like the Kolmogorov-Smirnov test. Compared to the small-sample test used by Reschenhofer (2013), this new test has the advantage that it allows for more general null hypotheses and does not require tables of critical values.
The empirical distribution function will be convex if the null hypothesis H0: d ≤ d0 is false and concave if the reverse null hypothesis H0: d ≥ d0 is false. Fortunately, these are exactly the cases where the Kolmogorov-Smirnov test is most powerful. Thus, we use the one-sided test D⁻_KS based on the statistic

D⁻ = sup_x (F_U(x) − F̂(x))

in order to reject the hypothesis H0: d ≤ d0 when we suspect that d > d0. Analogously, we use the one-sided test D⁺_KS based on the statistic

D⁺ = sup_x (F̂(x) − F_U(x))

in order to reject the hypothesis H0: d ≥ d0 when we suspect that d < d0. However, it is a priori not clear whether this test can also be employed in case of a null hypothesis such as H0: d < 0.5, which requires the examination of the non-stationary borderline case d = 0.5. It will be shown in the next section how this can be done.
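A sketch of such a one-sided test, in the spirit of (but not identical to) the Mangat-Reschenhofer procedure, normalizes the low-frequency periodogram ordinates under d = d0 and evaluates the one-sided Kolmogorov-Smirnov distribution via scipy's `ksone`; all names here are ours:

```python
import numpy as np
from scipy.stats import ksone

def ks_minus_test(y, d0, K, trim=0):
    """One-sided test of H0: d <= d0 based on the cumulative normalized periodogram.

    Under d = d0 the cumulative sums of the normalized ordinates behave like
    uniform order statistics; a large D- (empirical cdf below the uniform cdf,
    i.e., a convex cdf) indicates d > d0.  Returns (D_minus, p_value).
    """
    n = len(y)
    x = np.asarray(y, float) - np.mean(y)
    per = np.abs(np.fft.fft(x)) ** 2 / (2.0 * np.pi * n)
    j = np.arange(trim + 1, K + 1)
    w = 2.0 * np.pi * j / n
    ratios = per[j] * (2.0 * np.sin(w / 2.0)) ** (2.0 * d0)  # divide out the d0-spectrum
    t = np.cumsum(ratios)[:-1] / np.sum(ratios)              # m = K - trim - 1 points in (0, 1)
    m = len(t)
    d_minus = np.max(t - np.arange(m) / m)  # sup_x (F_U(x) - F_hat(x)) over the sorted t's
    return float(d_minus), float(ksone.sf(d_minus, m))
```

When the true d exceeds d0, the lowest normalized ordinates dominate, the cumulative sums rise quickly toward one, and D⁻ becomes large, which is exactly the convex-cdf case described above.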

Empirical Results
Important theoretical results for the analysis of long-range dependence have typically been derived under the assumption −0.5 < d < 0.5, hence they cannot be used for the construction of tests of hypotheses such as H0: d ≤ 0.5 and H0: d ≤ −0.5, which are the practically testable versions of the hypotheses of interest, H0: d < 0.5 and H0: d < −0.5, respectively. Fortunately, there is a big difference between the two cases, which still allows us to get meaningful test results. For example, when we test the hypothesis H0: d ≤ 0.4 instead of the hypothesis H0: d ≤ 0.5 for the original time series, we clearly cannot interpret a rejection of that hypothesis as evidence against stationarity because the alternative hypothesis still contains the stationary region (0.4, 0.5). In contrast, when we test the hypothesis H0: d ≤ −0.4 instead of the hypothesis H0: d ≤ −0.5 for the differenced series, a rejection of that hypothesis implies that d > −0.4 for the differenced series and consequently also that d > 0.6 for the original series, which is strong evidence against stationarity. In our empirical study, we will therefore also have a closer look at the first differences of the temperature series.
The main datasets used in our investigation were the hemispheric and global annual means from 1850 to 2019 of the combined land and marine temperature anomalies (HadCRUT4; Morice et al. 2012), the land air temperature anomalies (CRUTEM4; see Jones et al. 2012), and the sea surface temperature anomalies (HadSST3; see Kennedy et al. 2011). These datasets were developed by the Climatic Research Unit of the University of East Anglia and the Met Office Hadley Centre and are available at the website http://www.cru.uea.ac.uk/. Temperature anomalies were defined relative to the 1961-1990 temperature mean. Secondary datasets were the NASA Goddard Institute for Space Studies (GISS) hemispheric annual means from 1880 to 2019, which are available at the website https://data.giss.nasa.gov/gistemp/ (Lenssen et al. 2019). In addition, we also used the annual unsmoothed (1856-2019) and smoothed (1861-2014) AMO series obtained by averaging from the corresponding monthly series, which are available at the website https://psl.noaa.gov/data/timeseries/AMO/ of the NOAA Physical Sciences Laboratory.
First, we looked at the global means of the combined land and marine temperature anomalies from the HadCRUT4 dataset. The central question was whether the clearly noticeable recent rise in temperature (see Figure 2a) is just a transient phenomenon or a strong indication of non-stationarity.

[Figure 2e,f caption: log periodogram ordinates log(I(ω_k)) plotted against −2 log(sin(ω_k/2)), together with various estimates of d (d̂_GPH: red, d̂_tr: blue, d̂_sm: green, d̂_smP^0.9: gold) based on K = ⌊√n⌋ = 13 Fourier frequencies, and parallel lines (gray, dotted) with slope 0.5, which represent the boundary of the region of non-stationarity (the slope is −0.5 in the case of the differenced series).]
The periodograms of the temperature series and the differenced temperature series, which are displayed in Figure 2c,d, respectively, were consistent with a pole at frequency zero in the spectral density of the first series and a zero at frequency zero in the spectral density of the second series. A pole in an ARFIMA spectral density implies that the memory parameter d is greater than zero and a zero implies that d is less than zero, hence Figure 2c suggests that d > 0 for the original series and Figure 2d that d < 0 for the differenced series, which implies that d < 1 for the original series. More informative is a plot of the log periodogram against −2 log(sin(ω_j/2)). The slope appeared to be greater than the slope of the lines in the background, which was 0.5 in case of the temperature series (see Figure 2e) and −0.5 in case of the differenced series (see Figure 2f), hence we concluded that d > 0.5, which implies that the temperature series is non-stationary. This finding was corroborated by the estimates of d obtained with the estimators d̂_GPH (log-periodogram regression), d̂_tr (trimming with H = 1), d̂_sm (simple smoothing by averaging over neighboring periodogram ordinates with h = 1), d̂_smP^0.9 (smoothing with the Parzen window and truncation point m = n^β, where β = 0.9), d̂_W (narrow-band Whittle likelihood) and d̂_KS (Kolmogorov-Smirnov goodness-of-fit test), which were all greater than 0.86 (see Table 1).
Finally, in order to verify our findings in terms of statistical significance, we also employed the small-sample test described in Section 2.2. The hypotheses to be tested were of the form H0: d ≤ d0, where d0 = 0.4, 0.49, 0.5 in case of the original temperature series and d0 = −0.4 in case of the differenced series. For this type of hypothesis, the one-sided test D⁻_KS is appropriate. The test results obtained by using K = 8, 10, 12, 14 Fourier frequencies are shown in the third column of Table 2. Clearly, a small value of K increases the probability of a type II error (non-rejection of a false null hypothesis) because of the insufficient sample size, whereas a large value increases the probability of a type I error (rejection of a true null hypothesis) because of distortions caused by short-range dependencies. For all values of d0, the null hypothesis could be rejected at the 5% level if K ≥ 12.
As pointed out earlier, the rejections for d0 = 0.4 and 0.49 were not conclusive because the alternative hypotheses still contained the stationary regions (0.4, 0.5) and (0.49, 0.5), respectively. The problem with d0 = 0.5 was the lack of theoretical justification. We therefore carried out a simulation study to search for possible over-rejections. The results showed that this problem was of no great concern (see Table 3). In any case, the test results obtained with d0 = −0.4 for the differenced series were significant throughout.
Significance at the 10%, 5%, and 1% level is indicated by (*), (**), and (***), respectively.

In our frequency-domain analysis, we assumed that the periodogram in the neighborhood of frequency zero was not distorted by any short- and medium-range dependencies. However, as pointed out by Estrada et al. (2013), there is some low-frequency natural variability caused by oceanic cycles which needs to be addressed. In the following, we will focus on the AMO, which is a striking pattern visible in a plot of the detrended North Atlantic sea surface temperature. Compared to shorter-term variations such as the ENSO, the AMO has a much greater amplitude and a much longer period (six decades or more). Naturally, it has a large influence on the northern hemisphere climate. Estrada and Perron (2019) rejected the unit root hypothesis in favor of trend stationarity with a break in the rate of growth both for temperature and forcing series and observed that all series have approximately the same break dates once the AMO is filtered out from the global and northern hemisphere temperature series. Because of the different degrees to which the two hemispheres are affected by the AMO, we tested the stationarity hypothesis separately for the northern and southern hemispheres. For this purpose, we used the two hemispheric temperature series from the HadCRUT4 dataset. The results are shown in the 5th and 6th columns of Table 2.

Table 3. Rejection rates for the true null hypothesis H0: d ≤ d0 at the 1% and 5% levels of significance obtained by applying the one-sided test based on D⁻_KS with K = 8, 10, 12, 14 to 5000 realizations (n = 170) of a Gaussian fractionally integrated ARMA (ARFIMA) process with memory parameter d0 and AR-parameter φ1.

[Table 3 layout: columns d0 and φ1, followed by K = 8, 10, 12, 14 under each of the 1% and 5% significance levels.]

For the northern hemisphere (NH), the results were similar to those obtained for the global series and generally weaker in case of the southern hemisphere (SH). Noticeable in the latter case was the discrepancy between the original series and the differenced series, which could not be explained by a simple type of short-term autocorrelation. Indeed, Table 4 shows that in the case of a simple ARFIMA(1, d, 0) model, the number of rejections of a false null hypothesis H0: d ≤ d0 generally decreased as d0 increased from 0.4 over 0.49 and 0.5 to 0.6 (which corresponds to −0.4 after differencing). Clearly, the effect of differencing is less predictable in case of a large low-frequency oscillation such as the AMO. The best way to deal with the AMO properly is to remove it. The problem is that it is unknown and, despite its long history of possibly hundreds of years (see Delworth and Mann 2000), also difficult to estimate because each cycle has a different period, phase, amplitude, and non-sinusoidal shape.

Table 4. Rejection rates for the false hypothesis H0: d ≤ d0 at the 5% level of significance obtained by applying the one-sided test based on D⁻_KS with K = 6, 12, 18, 24 to 5000 realizations (n = 170) of a Gaussian ARFIMA process with memory parameter d_A and AR-parameter φ1.

By definition, the AMO is closely related to the NH sea temperature series. In Figure 3h, this close relationship is illustrated graphically by applying the Hodrick-Prescott (HP) filter with a medium and a large smoothing parameter to the NH sea temperature series and comparing the difference with the smoothed AMO series. The similarity to the smoothed AMO series is still visible when the global sea temperature series (Figure 3g) or the corresponding combined sea and land temperature series (Figure 3b: NH, Figure 3a: global) are used instead.
This means that, in principle, an oscillation like the AMO could also have been defined on the basis of the global combined temperature series. However, in this case, we would be reluctant to filter out the whole oscillation because we would be aware that it is the sum of an unknown "periodic" component and random fluctuations with similar periods. This line of reasoning is corroborated by Figure 4d, which shows the log periodogram of the global temperature series after the removal of the AMO (by a regression on the smoothed AMO or on an estimate obtained by applying an HP filter to the unsmoothed AMO), and by the corresponding figures for the two hemispheric series (Figure 4e,f). While the third periodogram ordinate (from the right, because the log periodogram is plotted against −1 times the log frequency) was well above the regression lines in Figure 2e, it is now substantially smaller, which indicates that the AMO removal has possibly gone too far. Noting that the AMO can be approximated very well by a sinusoid with a period of 61 years and still quite satisfactorily by a sinusoid with a period of n/3 = 56.6 years (see Figure 4a), which corresponds to the third Fourier frequency, we may alternatively account for the AMO in the frequency domain simply by omitting the third periodogram ordinate. For a robustness check of our earlier test results, we employed the test based on the statistic D⁻_KS again, first after the removal of the AMO in the time domain by regressing the temperature on the smoothed AMO and second after the omission of the third periodogram ordinate. The fact that the first approach probably removes too much and the second too little is not the only possible cause for discrepancies in the results. There was also a difference in sample sizes, due to the fact that the AMO series are shorter. We therefore used the HadCRUT4 dataset only for the examination of the effect of the AMO removal in the frequency domain.
For the AMO removal in the time domain, we used the shorter NH temperature series from the GISS dataset and, in addition, replaced the smoothed AMO series by the series obtained by applying the HP filter with λ = 1000 to the unsmoothed AMO. Table 2 shows that the test results obtained for the global temperature series hardly changed when the third Fourier frequency was omitted. Furthermore, comparing the last two columns of Table 2, which show the test results for the original and the AMO-detrended NH temperature series, respectively, we again found no noticeable differences, hence we concluded that the rejection of the null hypothesis of stationarity was not just due to the AMO.

[Figure 4 caption, second column: the log periodogram ordinates log(I(ω_k)), k = 1, …, 13, obtained from the original temperature series (black) and two versions of AMO-detrended temperature series (smoothed AMO: red, HP filter: green) are plotted against −2 log(sin(ω_k/2)) for different regions ((d): global, (e): northern hemisphere, (f): southern hemisphere). The parallel lines (gray, dotted) with slope 0.5 represent the boundary of the non-stationarity region.]
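The HP detrending used here can be reproduced with a few lines of linear algebra. The following is a generic textbook implementation (the function name is ours; λ = 1000 follows the text): the HP trend minimizes Σ(y_t − τ_t)² + λ Σ(Δ²τ_t)², whose first-order condition is the linear system (I + λD'D)τ = y with D the second-difference operator:

```python
import numpy as np

def hp_trend(y, lam=1000.0):
    """Hodrick-Prescott trend: argmin_tau sum (y_t - tau_t)^2 + lam * sum (second diff of tau)^2.

    Solves the first-order condition (I + lam * D'D) tau = y, where D maps
    a series to its second differences.
    """
    y = np.asarray(y, float)
    n = len(y)
    D = np.zeros((n - 2, n))
    for i in range(n - 2):
        D[i, i : i + 3] = (1.0, -2.0, 1.0)   # second-difference row
    return np.linalg.solve(np.eye(n) + lam * D.T @ D, y)
```

An exactly linear series passes through the filter unchanged (its second differences vanish), while for noisy data the trend is strictly smoother than the input, which is the property exploited when the smoothed AMO is compared with HP-filtered temperature series.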

Discussion
In this article, we have employed new frequency-domain methods for the investigation of global and hemispheric temperature series in order to obtain conclusive evidence of a long-term persistent change in the climate. Regardless of whether there is a deterministic upward trend, a stochastic trend, or overly strong serial correlation in the temperature series, there will always be a very steep increase in the spectral density in the neighborhood of frequency zero, which can be estimated or subjected to a formal statistical test. The advantage of this focus on a narrow frequency band is that no strong assumptions about the data-generating process are required. The disadvantage is that the proof of a steep increase implies only that the temperature series are non-stationary; it does not provide any details about the true nature of these series.
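The point that a deterministic trend, a stochastic trend, and strong serial correlation all leave the same footprint near frequency zero can be illustrated with a small simulation. This is only a sketch; the sample size of 170, the trend slope, and the noise scale are arbitrary choices.

```python
import numpy as np

def low_freq_share(x, m=5):
    """Share of the total periodogram mass lying on the m lowest
    (non-zero) Fourier frequencies."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    n = len(x)
    I = np.abs(np.fft.fft(x)[1:n // 2]) ** 2  # periodogram ordinates
    return float(I[:m].sum() / I.sum())

rng = np.random.default_rng(1)
n = 170  # comparable to the length of the annual temperature records
noise = rng.standard_normal(n)                            # stationary benchmark
det_trend = 0.03 * np.arange(n) + rng.standard_normal(n)  # deterministic trend
stoch_trend = np.cumsum(rng.standard_normal(n))           # stochastic trend

# both trend variants concentrate their spectral mass near frequency zero
print(low_freq_share(noise), low_freq_share(det_trend), low_freq_share(stoch_trend))
```

Both trend variants yield a much larger low-frequency share than the white-noise benchmark, even though their generating mechanisms differ, which is precisely why the narrow-band approach does not need to distinguish between them.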
While establishing evidence of global warming in a statistically sound way is important in its own right, issues of more practical relevance must, of course, still be studied in a more concrete setting. For example, employing highly complex econometric models, Morana and Sbrana (2019) found that global warming (which is driven by radiative forcing) might directly enhance Atlantic hurricane intensity/risk, which is inconsistent with the steady decline in the risk premia for catastrophe bonds since their introduction in the mid-1990s. Pretis (2020) showed that energy-balance models, which describe the change in temperatures as a function of incoming and outgoing radiation, can be mapped to cointegrated vector autoregressions, which allows the use of standard econometric techniques for the estimation of climate responses and the testing of their feedbacks.
Unfortunately, conventional tests of stationarity are based on asymptotic arguments and are therefore not really appropriate for temperature series, which are relatively short because too few data are available from the period before 1850 (or even 1880 under more stringent criteria) for the estimation of global averages (for analyses of longer time series which are based on estimates rather than measurements, see, e.g., Mills 2012; Davidson et al. 2016). We have therefore employed a new small-sample test proposed by Mangat and Reschenhofer (2019), which is based on the comparison of the cumulative normalized periodogram in a narrow frequency band close to frequency zero with the cumulative distribution function of the standard uniform distribution. This test requires only that the standard distributional assumptions for the included periodogram ordinates hold approximately, which is plausible when the sample size is not too small. However, in contrast to conventional frequency-domain tests, the number of included frequencies does not have to be large either. The results obtained with this test after suitably transforming the temperature series and slightly modifying the null hypothesis are highly significant. These findings are corroborated by the estimates of the memory parameter obtained with various methods, which all lie in the region of non-stationarity.
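The general construction of such a cumulative-periodogram test can be sketched as follows. This is a simplified illustration, not the exact statistic of Mangat and Reschenhofer (2019); the band width m = 13 and the one-sided distance are choices made for this sketch.

```python
import numpy as np

def cum_pgram_distance(x, m=13):
    """Kolmogorov-Smirnov-type distance between the cumulative normalized
    periodogram on the m lowest Fourier frequencies and the uniform CDF.

    Under stationarity with a roughly flat spectrum near zero, the
    cumulative normalized ordinates track the diagonal; under
    non-stationarity the mass piles up at the lowest frequencies, so the
    one-sided deviation above the diagonal becomes large.
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    k = np.arange(1, m + 1)
    I = np.abs(np.fft.fft(x - x.mean())[k]) ** 2
    F = np.cumsum(I) / I.sum()   # cumulative normalized periodogram
    U = k / m                    # uniform CDF at the jump points
    return float(np.max(F - U))  # one-sided deviation above the diagonal

rng = np.random.default_rng(2)
print(cum_pgram_distance(rng.standard_normal(170)))             # moderate for white noise
print(cum_pgram_distance(np.cumsum(rng.standard_normal(170))))  # large for a random walk
```

Because only m ordinates enter the statistic, its null distribution can be handled exactly for small m, which is what makes this type of test attractive for short series.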
However, even when the frequency band under consideration is very small, there is still the danger of distortions caused by low-frequency oscillations such as the AMO. Since the AMO mainly influences the northern-hemisphere climate, we checked the robustness of our findings by testing the hypothesis of stationarity separately for the NH temperature series after the removal of the AMO. Regardless of whether the AMO was removed in the time domain or in the frequency domain, the test results remained significant. We conclude that our empirical evidence of a persistent change in the surface temperature is genuine and not just due to a stationary phenomenon like the AMO.
Author Contributions: Both authors have contributed equally to all aspects of the work. They have both read and agreed to the published version of the manuscript.
Funding: This research received no external funding.