Monitoring Cointegrating Polynomial Regressions: Theory and Application to the Environmental Kuznets Curves for Carbon and Sulfur Dioxide Emissions

This paper develops residual-based monitoring procedures for cointegrating polynomial regressions (CPRs), i.e., regression models including deterministic variables and integrated processes, as well as integer powers, of integrated processes as regressors. The regressors are allowed to be endogenous, and the stationary errors are allowed to be serially correlated. We consider five variants of monitoring statistics and develop the results for three modified least squares estimators for the parameters of the CPRs. The simulations show that using the combination of self-normalization and a moving window leads to the best performance. We use the developed monitoring statistics to assess the structural stability of environmental Kuznets curves (EKCs) for both CO2 and SO2 emissions for twelve industrialized countries since the first oil price shock.


Introduction
This paper develops residual-based monitoring procedures for structural change in cointegrating polynomial regressions (CPRs), using the terminology of Wagner and Hong (2016). CPRs are regression models that include as explanatory variables deterministic terms, integrated processes, and integer powers of integrated processes. The regressors are allowed to be endogenous, and the stationary errors are allowed to be serially correlated. Structural change-at an unknown point in time-can occur in two facets: First, the relationship may turn into a spurious relationship. 1 Second, the parameters of the relationship may change. The developed monitoring statistics extend those of Wagner and Wied (2017) in two dimensions. First, a variety of monitoring statistics is considered, including self-normalized versions and moving window detectors. 2 Second, the approach is extended from cointegrating linear to cointegrating polynomial regressions.
All considered monitoring statistics are based on parameter estimation for the CPR relationship over a calibration period known to be-or at least assumed to be-free of structural change, an approach to monitoring inspired by Chu et al. (1996). With regressors that 1 In the CPR setting the concept of spurious regression has to be interpreted a bit wider than in cointegrating linear regression settings. If, e.g., the polynomial degree of a CPR relationship increases at a certain point in time but one continues to consider a CPR relationship with an unchanged polynomial degree, then the error term of this spurious relationship contains higher order powers of an integrated process and is, thus, not integrated, as in the usual form of spuriousness considered in linear cointegration. 2 Some of these possibilities have been mentioned in Wagner and Wied (2017, Footnote 4) but have not been explored in full detail and systematically. changes in environmental legislation that has been put in place in the 1970s. 6 Taking into account that the polynomial functional form should probably be interpreted more as an approximation to an underlying relationship of unknown form rather than as a true relationship, of course, implies a trade-off (at least in-sample) between finding structural breaks and approximation with a higher polynomial degree. 7 Against this background, it may be interesting to use monitoring tools to study whether, and at which point in time, the EKC relationship needs to be modeled with more curvature: Over the calibration period , for most countries (as minimum polynomial degree), a cointegrating linear relationship prevails. 8 This is not too big of a surprise since, about until the mid 1970s, economic activity expanded roughly in line with emissions in many countries. Only thereafter, and triggered-as mentioned-by price, technical, and legislative changes the economic activity and pollution start to be decoupled to a certain extent. From a CPR perspective, this could mean either a structural change in the parameters of a relationship of given degree or a change to a relationship with a higher polynomial degree, e.g., from a linear to a quadratic relationship. For both CO 2 and SO 2 emissions, for nine of the twelve countries structural breaks are detected. The detected break points are, in some cases, quite late, which most likely reflects the delays inherent in monitoring procedures. The evidence, when considering also the full sample, is mixed with respect to structural change in the parameters but unchanged polynomial degree or structural change also with respect to the polynomial degree. The monitoring decisions lead to, as a simple empirical cross-check, good results in the following sense: For those country-pollutant combinations where no structural break is detected, using the specification and parameter values from the calibration period leads to good fit also for the full period until 2016, with obviously even better fit when re-estimating the relationship over the full sample.
The paper is organized as follows: Section 2 contains the setting, assumptions, monitoring statistics, and asymptotic results. Section 3 discusses finite sample simulation results. Section 4 presents the monitoring application to CO 2 and SO 2 emissions data. Section 5 briefly summarizes and concludes. Appendix A contains all proofs. In addition, four supplementary appendices are available: Supplementary Appendix B discusses local asymptotic power properties. Supplementary Appendix C contains additional finite sample simulation results, and Supplementary Appendix D presents additional empirical results. Supplementary Appendix E includes tables with critical values for the detectors for a broad variety of specifications relevant for EKC-type analysis. MATLAB code-including the critical values for the mentioned variety of specifications-for the monitoring statistics developed in this paper is available upon request. 9 We use the following notation: Definitional equality is signified by := and ⇒ denotes weak convergence. x denotes the integer part of x ∈ R and diag(·) denotes a diagonal matrix. For a vector x ∈ R n we denote by x 2 := ∑ n i=1 x 2 i and for a matrix M ∈ R m×n we denote by M := sup x Mx x . E(·) denotes the expected value and L is the backwardshift operator, i.e., L{x t } t∈Z := {x t−1 } t∈Z . The first-difference operator is denoted with ∆ := 1 − L. We denote the kdimensional identity matrix with I k . A Brownian motion with covariance matrix specified in the context is denoted by B(r), and W(r) denotes a standard Brownian motion. 6 This means that we use our monitoring tools for an ex-post analysis rather than "true" online monitoring. 7 This is obvious since one can achieve perfect fit with a polynomial of degree sample size minus one. There is an ongoing discussion in the EKC literature concerning appropriate functional form and estimation strategies (see, e.g., Bertinelli and Strobl 2005;Millimet et al. 2003;Schmalensee et al. 1998). Inverted U-shaped relationships are also considered, e.g., in the intensity of use or material Kuznets curve (MKC) literature that investigates the potentially inverted U-shaped relationship between gross domestic product (GDP) and energy or metals use per unit of GDP (see, e.g., Grabarczyk et al. 2018;Guzmán et al. 2005;Labson and Crompton 1993;Stuermer 2018), for which the tools developed in this paper may also be useful. 8 A cointegrating linear relationship implies tautologically that CPRs with higher polynomial degrees are also present, albeit with (theoretically) zero coefficients to the higher order powers. 9 The MATLAB code can be straightforwardly modified to other specifications to obtain additional critical values; under the assumption of full design.
Under the null hypothesis, no structural change occurs, that is θ 1 = θ and {u t } t∈Z is an I(0) process, with detailed assumptions specified below, throughout. Under the alternative hypothesis, either some parameters change or the relation turns spurious, i.e., {u t } t∈Z turns into an I(1) process for every θ ∈ R q+p , or both at a sample fraction rT that has to be-as discussed in the introduction-larger than mT for some 0 < m < 1. 10 In formal terms: . . , rT is I(0) and u t , t = rT + 1, . . . , T, is I(1).

Remark 1.
Note that the regression model given in (1) is a special case of the CPR model considered in Wagner and Hong (2016) since only one of the integrated regressors, w.l.o.g. x kt , is allowed to enter the regression model with powers larger than one. Whilst this is, obviously, restrictive compared to the case where higher order powers of all elements of x t can be present as regressors, this special case covers EKCs and MKCs and similar applications. The mathematical reason for considering this special case is that it allows for-potentially up to a scalar nuisance parameter that can be consistently estimated and scaled out-nuisance parameter-free limiting distributions of the considered detectors that can be simulated. In the terminology of Vogelsang and Wagner (2014b), a cointegrating polynomial regression needs to exhibit full design to allow for asymptotic standard inference by scaling out a scalar long-run variance. This is the case for EKC-type relationships with only one integrated regressor present with powers larger than one but also for some other economically relevant more general cases of CPRs, e.g., Translog functions (see, e.g., Christensen et al. 1971). Given our focus on EKCs, we abstain from formulating the results here in the most general form; the required extensions are straightforward. For even more general specifications that do not exhibit full design, a sub-sampling approach may be considered relying upon similar arguments as discussed in Wagner and Hong (2016, Proposition 6). The performance of sub-sampling-based procedures, however, suffers particularly from short sample periods, as also illustrated by the simulations reported in Wagner and Hong (2016). Consequently, a sub-sampling-based approach cannot be expected to perform well in a monitoring context.
The assumptions on the deterministic components, the regressors, and error terms are similar to the assumptions used in Wagner and Hong (2016) and, more implicitly, in Wagner and Wied (2017). In particular, Assumption 2 is sufficient for a functional central limit theorem to hold for {η t } t∈Z : with the positive definite long-run covariance matrix Ω := ∑ ∞ j=−∞ E(η t−j η t ) and W(s) := [W u·v (s), W v (s) ] a (k + 1)-dimensional vector of standard Brownian motions. We also define the half long-run covariance matrix ∆ := ∑ ∞ j=0 E(η t−j η t ). The matrices Ω and ∆ are partitioned according to the partitioning of B(s), i.e., Using, e.g., the Cholesky decomposition of Ω yields: where ω 2 u·v := Ω uu − Ω uv Ω −1 vv Ω vu and λ uv := Ω uv (Ω 1/2 vv ) −1 . The conditional long-run variance ω 2 u·v is a key quantity that needs to be estimated for all but the two self-normalized detectors.
Where needed, consistent long-run covariance estimation is performed non-parametrically, requiring the choice of both a kernel function and a bandwidth parameter. The inputs in the non-parametric estimation are the OLS residuals from estimating (1) over the calibration period and the first difference of x t over the same period. For consistent long-run covariance estimation, it suffices to assume (following, e.g., Jansson 2002): Assumption 3. The kernel function k(·) satisfies: All our monitoring statistics discussed in the following subsection are based on consistent parameter estimators that are required to lead to limiting distributions that are nuisance parameter free up to a scalar parameter, the conditional long-run variance ω 2 u·v , that can (asymptotically) be scaled out, either by scaling by a consistent estimator, which we refer to later as standardized, or by self-normalization.
As indicated, estimation takes place on the calibration sample t = 1, . . . , mT for some 0 < m < 1 that is known to be generated under the null hypothesis. This approach to monitoring, based on parameter estimation on a calibration sample known to be-or at least assumed to be-free of structural change, has been popularized in the econometrics community by the seminal work of Chu et al. (1996).
The cointegration literature provides a variety of modified ordinary least squares estimators of θ with the required asymptotic properties; see, e.g., Wagner (2018), for a survey. All these estimators commence from the fact that the OLS estimator of θ is consistent with-in case of regressor endogeneity and error serial correlation-a limiting distribution that is contaminated by second order bias terms. These second order bias terms are removed, one way or another, by the various modifications of OLS. In this paper, we consider three modified OLS estimators: Fully Modified OLS (FM-OLS), Dynamic OLS (D-OLS), and Integrated Modified OLS (IM-OLS). These three estimators were originally developed for cointegrating linear regressions: FM-OLS in Phillips and Hansen (1990); D-OLS in Saikkonen (1991), Phillips andLoretan (1991), or Stock andWatson (1993); and IM-OLS in Vogelsang and Wagner (2014a). The extensions to the CPR setting are discussed for FM-OLS in Wagner and Hong (2016), for D-OLS in Choi and Saikkonen (2010), and for IM-OLS in Vogelsang and Wagner (2014b). 11 Our brief discussion of the three estimators first necessitates the definition of a few more quantities, i.e., Z t := [D t , X t ] and y + t,m := y t − ∆x tΩ −1 vv,mΩvu,m , with the second subscript m indicating that estimation of the long-run covariances is-as mentioned-also based on the calibration sample t = 1, . . . , mT . Furthermore, define: 11 To be precise, Choi and Saikkonen (2010) propose an extension of the dynamic regression approach, adding leads and lags of the first differences of the integrated regressors, to a more general setting than CPRs. Given that the CPR model is linear in parameters, D-OLS can be extended relatively straightforwardly, without the need to resort to modified nonlinear least squares type estimators. Vogelsang and Wagner (2014b) consider an extension of IM-OLS to general multivariate polynomials allowing also for arbitrary cross-products of powers of integrated regressors. Stypka and Wagner (2020) extend the FM-OLS estimation principle to this more general polynomial-type setting.
Long-run covariance estimation uses the OLS residuals of (1) from estimation over the calibration period t = 1, . . . , mT in conjunction with the first differences of the integrated regressors, i.e.,η t,m := [û t,m , v t ] , withû t,m denoting the OLS residuals here: In both the finite sample simulations, as well as the application, we use for long-run covariance estimation, in line with Assumptions 3 and 4, the Bartlett kernel with bandwidth chosen according to Newey and West (1994). With all required quantities defined, the FM-OLS estimator computed over the calibration sample is given by: While FM-OLS is based on a two-part nonparametric transformation to remove endogeneity and serial correlation related bias terms from the limiting distribution of the OLS estimator, D-OLS is based on a more "classical projection and orthogonalization" argument by performing OLS estimation in an augmented version of the CPR regression (1), with leads and lags of the first differences of x t added as regressors to "clean the limiting distribution". The D-OLS regression-estimated by OLS over the calibration sample-is given by: with the number of leads (d 1 ) and lags (d 2 ) chosen to ensure consistent parameter estimation of θ with a limiting distribution that is-up to a scalar-nuisance parameter-free. In general, this requires that d 1 , d 2 → ∞ at suitable rates. More specifically, in our finite sample simulations and the application, we choose leads and lags using the AIC-type criterion of Choi and Kurozumi (2012). The resultant OLS estimators of θ andΘ j from (14), estimated over the calibration sample, are referred to asθ D m andΘ D j,m , respectively. The third estimation principle addresses endogeneity correction by partial summation. Define for a sequence z t , t = 1, . . . , T, the partial summed variable by S z t := ∑ t j=1 z j , t = 1, . . . , T. Then, the IM-OLS regression-estimated by OLS over the calibration sample-is given by: The OLS estimators of θ and ϕ from (15) estimated over the calibration sample are referred to asθ I m andφ I m , respectively. Note that endogeneity correction in the IM-OLS estimator does not require any lead-lag or kernel-bandwidth choices, as it suffices to simply add the original integrated regressor vector x t to the partial summed regression.
The key input for the monitoring statistics discussed in the following subsection are the residual (processes) obtained with these three estimators. In particular, the asymptotic null behavior of the residual (partial sum) processes is the key ingredient to derive asymptotic properties of any of our variance-ratio type monitoring statistics. This result is formalized in the following lemma, for which formulation requires to define some additional (asymptotic) quantities first, i.e., W v (s) : Lemma 1. Let the data be generated according to (1) and (2) with Assumptions 1 and 2 in place. Furthermore, let long-run covariance estimation be performed under Assumptions 3 and 4 and let lead-lag choices be made as discussed in Choi and Kurozumi (2012). Based on estimation over the calibration period t = 1, . . . , mT -with the estimatorsθ F m ,θ D m andθ I m as discussed before-denote the FM-OLS residuals by: the D-OLS residuals by: For T → ∞, it holds under the null hypothesis for m ≤ s ≤ 1 that: 12 The lemma shows that indeed all three partial sum processes of the residuals converge to processes that are (i) functionals of standard Brownian motions, W v (r) and W u·v (r), and (ii) proportional to ω u·v , a scalar nuisance parameter that can be consistently estimated and hence scaled out from the limit processes or that can be eliminated by self-normalization. The limiting null distributions of test statistics based on the (normalized) limit processes consequently can be obtained by simulating the corresponding functionals of standard Brownian motions. Note that the FM-OLS and D-OLS residual partial sum processes converge to the same limiting process, which is a consequence of these two estimators having identical limiting distributions.

The Monitoring Statistics
Similarly to Wagner and Wied (2017), the starting point of our monitoring statistics is to combine the approach of Chu et al. (1996) with variance-ratio statistics that diverge under the alternative. More specifically, the underlying variance-ratio (full sample) statistic motivating the construction of our monitoring statistics is the Kwiatkowski et al. (1992) sta-tionarity test, respectively, the related Shin (1994) cointegration test. 13 Using our notation, the (full sample) Shin-statistic is given by: withû t denoting the residuals from (full sample) estimation with, e.g., FM-OLS or D-OLS.
In case that IM-OLS is used for estimation, the resultant residuals are already partial summed quantities, i.e., one immediately obtains (by construction) quantities to insert into the expression in the second line of (22). The test statistic given above converges under the null hypothesis to a functional of standard Brownian motions, which is as expected when considering (22), where convergence to the squared integral of a standard Brownian motion follows immediately from our assumptions if instead ofû t the errors u t were used (and scaling would take place by a consistent estimator of ω 2 u ). Usingû t instead of u t leads to a similar result, but with a different (specification dependent) functional of standard Brownian motions after scaling out ω 2 u·v rather than ω 2 u . To be precise, when using FM-OLS or D-OLS for parameter estimation, the limiting null distribution will be a function of W u·v,m (r). When using IM-OLS for parameter estimation, the limiting null distribution will be a function of P u·v,m (r); see Proposition 1 below.
The above test statistic (22) can be easily seen to diverge under the alternative hypothesis of a structural break occurring after the calibration period. Consider, e.g., the FM-OLS residuals (with the argument entirely analogous for all three considered estimators) using our already established notation: Now, suppose that at some time point rT > mT a structural change occurs. If, e.g., {u t } t∈Z turns from being I(0) to I(1), then the first term in (24) diverges for s > r. Similarly, the third or fourth term (or both) diverge in case of change in the parameter vector, i.e., when θ 1 = θ, as, of course,θ F m → θ, because of parameter estimation on the calibration sample.
We consider, for each of the three considered estimators-neglecting for notational brevity the dependence of the residuals and, thus, the test statistics on the estimation method-, five monitoring statistics: The monitoring statisticĤ m (s) given in (25) is of the same form as the monitoring statistic used in Wagner and Wied (2017) considered here in the CPR context. The monitoring statisticĤ m d (s) given in (26)-with a term calculated only over the calibration sample subtracted-is of a similar form as used in Chu et al. (1996). The third variantĤ m sn (s) given in (27) is a self-normalized statistic, for which, under the null hypothesis, both the numerator and denominator converge (appropriately scaled) to functionals of standard Brownian motions proportional to ω u·v , which is hence scaled out in the ratio. Long-run covariance estimation is known to be a notoriously problematic aspect in unit root and cointegration analysis; therefore, test statistics that do not require this step may exhibit better performance. The fourth considered variant is a moving window statisticĤ m,n mov (s) given in (28) with n denoting the moving window (sample fraction or) length. The key difference between the moving window detector and the expanding window detectors is thatĤ m,n mov (s) is based on a constant number of residual partial sums for all values of s. This construction increases, under the alternative hypothesis, the impact of post-break residuals on the test statistic, which is ex ante expected to lead to faster detection of structural breaks. The performance of the fourth variant will depend on the length of the moving window, to be chosen in applications. Finally, the fifth monitoring statistic H m,n mov,sn (s) given in (29) combines self-normalization and moving window estimation, with the performance as for the fourth variant expected to depend upon the moving window length. 14 The following proposition summarizes the asymptotic behavior of the monitoring statistics under the null hypothesis. Proposition 1. Let the data be generated according to (1) and (2) with Assumptions 1 and 2 in place. Furthermore, let long-run covariance estimation be performed under Assumptions 3 and 4 and let lead-lag choices be made as discussed in Choi and Kurozumi (2012). In case parameter estimation is performed with FM-OLS or D-OLS, the limiting process Q u·v,m (s) below equals W u·v,m (s), and, in case parameter estimation is performed by IM-OLS, it equals P u·v,m (s). The 14 To be precise, only when usingĤ m sn (s) orĤ m,n mov,sn (s) in conjunction with D-OLS or IM-OLS, no long-run covariance estimators are required, whereas estimated long-run covariances are required for FM-OLS estimation. For D-OLS, still, lead-lag length choices have to be made, and only when usinĝ H m sn (s) orĤ m,n mov,sn (s) in conjunction with the IM-OLS estimator does no kernel/bandwidth or lead-lag choices have to be made. In this case, the only choice to still be made when usingĤ m sn (s) is the length of the calibration sample, a choice required throughout. In case ofĤ m,n mov,sn (s), both the calibration sample length m and the moving window length n have to be chosen. defined monitoring statistics converge under the null hypothesis for T → ∞; in particular, it holds that:Ĥ It is widely-used practice in monitoring to base the decision not on monitoring statistics as just defined, but on monitoring statistics divided by a weighting function, say g(s). For chosen weighting function g(s)-with 0 < g(s) < ∞-, the null hypothesis is rejected, if the weighted monitoring statistic Ĥ (s) g(s) is larger than a critical value c for the first time. We denote this point in time as detection time τ m , i.e., withĤ(s) short-hand notation for any of the considered detectors. 15 In case no structural change is detected, i.e., A finite value of τ m not only indicates a structural break but also contains information about the location of the potential break point.
Weighting function and critical value have to be chosen so that, under the null hypothesis, it holds that: with α denoting the chosen significance level, and H(s) short-hand notation for the limit corresponding to the considered monitoring statistic. Considering only positive and bounded weighting functions, also seen in Aue et al. (2012, Assumption 3.6), allows us to derive the required result given above based on the developed asymptotic null behavior of the monitoring statistics and the continuous mapping theorem.
Proposition 2. Let the data be generated according to (1) and (2) with Assumptions 1 and 2 in place. Furthermore, let long-run covariance estimation be performed under Assumptions 3 and 4 and let lead-lag choices be made as discussed in Choi and Kurozumi (2012). In addition, assume that the weighting function g(s) is continuous and bounded. Then, there exist critical values c = c(α,Ĥ, g, m, n) such that, for any 0 < α < 1, it holds that: 16 Clearly, the choice of a weighting function g(s) impacts the performance of monitoring procedures and has to combine two opposing goals: (i) small size distortions under the null hypothesis and (ii) small delays under the alternative hypothesis, that is, detection of a break as soon as possible after the break. The discussion in Chu et al. (1996, Section 3) makes clear that it is, in general, even in more standard regression models, impossible to derive analytically tractable optimal weighting functions (from a certain class of functions), e.g., with respect to minimal expected delay. 17 Given the lack of analytical results concerning optimal choices of weighting functions, we have performed a large number of preliminary simulations using a range of candidate weighting functions. 18 The starting point of these considerations is Wagner and Wied (2017), who choose the weighting function in relation to the expected value of the monitoring statistics, resulting in g(s) = s 3 in case D t = 1 (intercept only) and g(s) = s 5 in case D t = [1, t] (intercept and linear trend). In case of a linear trend, we have, in addition, experimented with g(s) ∈ {1, s 10 , the last two functions inspired by Horváth et al. (2004). 19 It turns out that no weighting, i.e., g(s) = 1 does not lead to favorable performance compared to g(s) = s 3 or s 5 . The function s 10 is chosen by "extrapolation" of the fact that s 5 works better than s 0 = 1. The idea of the third and fourth functions is-merely the result of some experimentation and heuristics-to make the detector more sensitive by increasing the value of the statistic, whilst at the same leaving the critical values effectively unchanged. The effects are to a certain extent as expected, without, however, leading to overall better performance. Taking s 6 = (s 3 ) 2 or s 10 = (s 5 ) 2 as weighting functions does indeed lead, e.g., to earlier detection times, however, often also to detections in cases when there is no structural change, i.e., these functions lead to larger over-rejections under the null hypothesis. The two functions s 5(0.5+m) , s 5(0.85+m) 2 , where we have also experimented with other powers and values, try to strike a balance between earlier rejections and size distortions. Altogether, however, the simple functions s 3 and s 5 perform most stably over a variety of configurations, in terms of comparably low over-rejections under the null hypothesis as first priority and short delays in the detection times. 20 Finally, the two functions inspired by Horváth et al. (2004), where we have also experimented with different powers, lead to essentially the same null rejection probabilities and size corrected power as, e.g., s 3 or s 5 , but lead to partly substantially bigger delays than the other weighting functions. Therefore, we stick to the weighting functions already used also in Wagner and Wied (2017), i.e., the end point of the considerations is the starting point. As mentioned, it remains an open challenge to make progress on finding optimal weighting functions for the monitoring problem and detectors considered in this paper, or more generally when monitoring cointegrating relationships. 16 By construction, the critical values depend on n only for the moving window detectors. 17 Aue et al. (2009) derive the limiting distributions of the delay time for a one-time parameter change in a linear regression model with stationary errors for a simple class of weighting functions depending only upon a single (tuning) parameter. The situation is much more involved in our context and any result concerning asymptotic distributions of delay times will depend upon intricate crossing-probability calculations involving complicated functions of Brownian motions. Results in this direction, therefore, appear to be very hard to obtain, at least for us. 18 We have performed the type of simulations reported in Section 3 investigating the performance with respect to null rejection probabilities, size corrected power, and detection times for all weighing functions discussed here. The simulations in Section 3 report the results based on the overall best performing weighting function, s 3 (intercept only) and s 5 (intercept and linear trend). 19 In the intercept only specification, the set of functions considered are given by {1, s 6 , s 3(0.5+m) , s 3(0.85+m) 2 , observations are similar for both specifications of the deterministic component. 20 More specifically, the simpler functions lead to the lowest over-rejections almost throughout, the "race" in terms of size-corrected power is relatively even, and, in some cases, the more complicated weighting functions, particularly s 3(0.85+m) 2 or s 5(0.85+m) 2 , lead to slightly smaller delays.
It remains to characterize the asymptotic behavior of the proposed monitoring procedures under the relevant alternatives in our setting: First, the error process {u t } t∈Z changes its behavior from I(0) to I(1), i.e., it changes to being an integrated process, and, second, there are breaks in (some of) the parameter values. For both cases, we consider the asymptotic behavior against fixed and local alternatives, with the local alternatives having to be specified, as always, in line with the convergence rates of parameter estimation.
Proposition 3. Let the data be generated for t = 1, . . . , rT according to (1) and (2) with Assumptions 1 and 2 in place. Furthermore, let long-run covariance estimation be performed under Assumptions 3 and 4 and let lead-lag choices be made as discussed in Choi and Kurozumi (2012). In addition, assume that the weighting function g(s) is continuous, positive, and bounded. Furthermore,Ĥ(s) again denotes any of the considered monitoring statistics. fulfilled.
Then, the monitoring procedures are consistent, i.e., for any 0 < c < ∞, it holds that for all t > rT , where {u 0 t } t∈Z and {γ t } t∈Z are independent processes and where {γ t } t∈Z fulfills an invariance principle with long-run variance ω 2 γ > 0 and δ > 0; The local asymptotic power (LAP) properties of the procedures, with respect to break type, estimation method, self-normalization, and window size of the moving window detectors, are discussed in some detail in Supplementary Appendix B. 21 The LAP results carry over by and large to similar relative performance findings in the finite sample simulations presented in Section 3, e.g., in case of I(1) breaks, a small moving window leads to both highest LAP, as well as highest size-corrected finite sample power. Another example is power differences across estimation methods, with IM-OLS dominated by FM-/D-OLS in terms of LAP and also in finite sample size-corrected power for large sample sizes.
The critical values depend on the specification in several ways: on the specification of the deterministic component, the number of I(1) regressors, the highest power of the single integrated regressor that enters the CPR with higher order powers, the estimation method, and-as discussed in Proposition 2-the detector, the weighting function, the calibration fraction, and for the moving window detectors, in addition the window size.
Supplementary Appendix E provides tables containing critical values for the usual deterministic components, i.e., an intercept only and intercept and linear trend, and up to four integrated regressors, of which one regressor enters the model with up to power three. Furthermore, the critical values are available for all considered detectors, FM-/D-OLS and IM-OLS estimation and significance levels of 0.01, 0.025, 0.05, 0.1. The weighting function g(s) is set to s 3 and s 5 , depending on the deterministic specification. Critical values are available for a fine grid of the calibration fraction m ranging from 0.1, 0.11, . . . , 0.9. For the moving window detectors, critical values are available for the window sizes equal to 10%, 20%, and 30% of the sample size. 22 It remains to clarify the "meaning" of m and T for the monitoring procedures (also see p. 967 in Wagner and Wied 2017). It is convenient to interpret T as the sample size including the out-of-sample monitoring period. Let T 0 denote the length of the actually available sample-in our application, in Section 4, T 0 = 71 with annual data from 1946-2016. Then, denote T = T 0 + H, with H > 0, indicating that out of sample monitoring is intended, and H = 0, indicating that monitoring takes place on a historic data set, as in Section 4. The fact that the critical values depend on m, and the moving detectors on n, means that a decision has to be made about the length of the calibration period, potentially the length of the moving window and about the out-of-sample monitoring period H prior to the analysis. The latter necessity renders our procedure a closed-end monitoring procedure. The calibration period will be chosen as large as possible (as a sub-sample 1, . . . , T C of 1, . . . , T 0 ) to increase the precision of the parameter estimates while avoiding the risk of having a structural break in the calibration period. Now, m is given by m = T C T 0 +H . Thus, choosing H larger implies that m is smaller, which in turn implies that the critical values are larger (since they are decreasing in m). This decreases ceteris paribus, despite asymptotic size control, the empirical rejection probabilities under both the null and the alternative. This is the reason why one should choose the monitoring period as short as possible, a calibration period as large as possible, and an out-of-sample monitoring period as short as possible.
Finally, similar to Wagner and Wied (2017), the procedures are consistent also against a variety of other forms of structural changes, with all results following more or less straightforwardly from the construction principle. Of course, the finite sample performance may be relatively poor in some cases.
Remark 2. The developed monitoring procedures are, in addition to the results provided in Proposition 3, also consistent against the following types of structural change: (i) The process {u t } t∈Z changes its behavior from I(0) to being a near-integrated process, compare Phillips (1987), from rT + 1 onwards. In this case, effectively, under the alternative functionals of Wiener processes will be replaced by functionals of Ornstein-Uhlenbeck processes.
The rates of divergence are the same as for the I(1) alternative. (ii) Similarly, consistency also prevails in case {u t } t∈Z changes its behavior from rT + 1 onwards to being fractionally integrated, compared to Davidson and de Jong (2000), with fractional integration parameter 0 < f < 1/2. In this case, contrary to item (i), the divergence rate under the alternative changes and depends upon f since, under this alternative,

1
T 1/2+ f ∑ sT t= rT +1 u t converges to a fractional Brownian motion. Thus, the smaller f , the more difficult it will be be to detect this form of structural change.
(iii) The approach can also be employed for detecting bubbles. In the recent literature, a bubble is often characterized as a period where the behavior of a time series has switched to explosive behavior, compare, e.g., Phillips et al. (2011), Phillips et al. (2015a, 2015b, and Phillips and Shi (2018). Thus, our procedure allows us to detect (the beginning of) a bubble by considering the first difference of the series since, in the absence of a bubble, the first differences are stationary, whereas, in case of explosive behavior, the first differences also exhibit explosive behavior. (iv) In relation to the previous item, with bubbles typically considered to be temporary rather than permanent phenomena, it has to be noted that our procedures will be consistent in detecting episodes of I(1) or explosive behavior, as long as these episodes have asymptotically positive length. In, e.g., the case of only one period under the alternative, it has to hold that this period occurs over a sub-sample of the form r 1 T , . . . , r 2 T with r 1 < r 2 . It is immediate that consistency generalizes to multiple periods of this form.

Finite Sample Performance
Under the null hypothesis of no structural change, we consider the same data generating process as Wagner and Hong (2016, Section 3), i.e., we consider a quadratic cointegrating polynomial regression model: with the errors u t and ∆x t = v t generated as: where (e 1,t , e 2,t ) ∼ N (0, I 2 ). The parameter ρ 1 controls the extent of serial correlation in {u t } t∈Z and is set to ρ 1 = 1 after rT under the alternative of I(1) errors, whereas ρ 2 controls the extent of regressor endogeneity. The parameter values are θ D 0 = θ D 1 = 1, θ X 1,1 = 5 and θ X 1,2 = −0.3, with the values for θ X 1,1 and θ X 1,2 inspired by the FM-OLS EKC coefficient estimates for Austria (see Wagner 2015). 23 We provide simulation results for T ∈ {200, 500} and ρ 1 = ρ 2 ∈ {0, 0.3, 0.6, 0.9}. 24 For the moving window and self-normalized moving window detectors, we use window sizes n ∈ {0.1, 0.2, 0.3}. As indicated in the previous section, long-run covariance estimation is performed with the Bartlett kernel with bandwidth chosen according to Newey and West (1994). For D-OLS, estimation leads and lags choices are performed using the AIC-type criterion of Choi and Kurozumi (2012). The number of replications is 10,000 throughout. All monitoring decisions are performed at the nominal 5% significance level.
We start the analysis by considering empirical null rejection probabilities. In doing so, we vary the calibration fraction over a grid of 81 values in the range m = 0.1, 0.11, . . . , 0.9. Two main observations that allow us to zoom in subsequently, mostly on the self-normalized detectors, emerge. Figure 1 clearly shows that with respect to null rejection probabilities, the detectors are separated in two groups, with the better performance offered by self- 23 The results are, of course, invariant with respect to the values chosen for the parameters θ D 0 , θ D 1 , θ X 1,1 , and θ X 1,2 . 24 Varying ρ 1 and ρ 2 separately has the following effects on the performance: Starting from a given pair of values for (ρ 1 , ρ 2 ), an increase of ρ 1 (alone) has bigger detrimental effects on larger over-rejections under the null hypothesis, smaller size-corrected power, and larger delays under the alternative than an increase of ρ 2 (alone) by the same amount. In this sense, serial correlation is ceteris paribus more detrimental than endogeneity, and the magnitude of ρ 1 effectively drives performance. For brevity, therefore, we only report results for the ρ 1 = ρ 2 cases.
normalization. Figure 2 shows that, when using moving window detectors, the choice of the window size has no visible impact on null rejection probabilities. 25 To assess the performance improvement that can be realized by self-normalization in some detail, Figure 3 comparesĤ m,0.1 mov andĤ m,0.1 mov,sn for T = 200. 26 This figure illustrates also other more generally observed patterns: First, using IM-OLS for parameter estimation leads to the smallest over-rejections under the null hypothesis. Often, and particularly for small values of m, D-OLS estimation leads to the largest over-rejections. The differences across estimation methods widen for increasing ρ 1 , ρ 2 , with D-OLS most strongly negatively affected. In addition, with the exception of D-OLS, self-normalization attenuates the detrimental impact of increasing ρ 1 , ρ 2 . An increasing sample size, of course, reduces over-rejections by and large.
We turn to size-corrected power. 27 For brevity, the main text focuses on size-corrected power against I(1) breaks, i.e., the situation where {u t } t∈Z changes its behavior from I(0) to 25 The null rejection probability differences between the standardized and self-normalized detectors increase with increasing ρ 1 , ρ 2 . The null rejection probability results for T = 500 are contained in Figures S8 and S9 in Supplementary Appendix C. As expected, over-rejections are smaller than for T = 200, especially for small values of m, and the differences between the detectors also decrease. 26 Figure S10 in Supplementary Appendix C displays the corresponding results for T = 500. 27 We focus on size-corrected power because of the potential over-rejection problems under the null hypothesis. This allows us to see power differences across detectors while holding null rejection probabilities constant at 0.05. Clearly, this is useful for theoretical power comparisons, but it has to be kept in mind that such size-corrections are not feasible in practice.
I(1) after rT . The other two breaks dealt with in Proposition 3, trend and slope breaks, with changes in θ D or θ X , respectively, are discussed in Supplementary Appendix C. 28 Size-corrected power simulations are performed for m, r ∈ {0.25, 0.5, 0.75}, which includes, therefore, cases where r < m, i.e., where a break occurs in the calibration period. We report results for all nine detectors considered, for all three estimation methods, ρ 1 = ρ 2 = 0.3 until rT and T = 200 in Table 1. 29 The following observations emerge: Grosso modo, highest size-corrected power is achieved by the moving window detector with n = 0.1, either with or without self-normalization; with as reported above selfnormalization leading to smaller over-rejections under the null hypothesis. The differences in size-corrected power are typically relatively or even very small. This, in conjunction with the performance improvement of self-normalization under the null hypothesis, makes the case in favor of self-normalization clear. In line with standard asymptotic theory concerning estimator efficiency, size-corrected power is typically lower for IM-OLS than for FM-/D-OLS, with the difference between the latter two often rather small, also in line with asymptotic theory. 30 These results, of course, have to be seen in conjunction with the smaller over-rejections of IM-OLS under the null hypothesis. Next, consider the impact of m and r on size-corrected power. In case r < m, i.e., when a break occurs in the calibration period, size-corrected power is often low, which is related to inconsistency of parameter estimation due to the structural break in the calibration period. It turns out that, in this case, the self-normalized detectors lead to lower sizecorrected power than the standardized detectors; see Table 1. For m ≤ r, size-corrected power behavior is as expected, i.e., for fixed m, size-corrected power decreases with increasing r, and, for fixed r, size-corrected power increases with increasing calibration 28 See Figures S11 to S19 in Supplementary Appendix C for size-corrected power results in case of trend and slope breaks. 29 Table S1 in Supplementary Appendix C displays the corresponding size-corrected power results for T = 500. In addition, Tables S2 to S7 in Supplementary Appendix C display the results for ρ 1 = ρ 2 ∈ {0, 0.6, 0.9} for both T = 200 and T = 500. 30 The main motivation for developing IM-OLS in Vogelsang and Wagner (2014a) was to develop an estimator that allows to perform fixed-b inference, which is an alternative asymptotic theory that captures the impact of kernel and bandwidth choices. These aspects are not covered by standard asymptotic theory.
period m. The case m = r leads to the highest size-corrected power. In addition, as expected, increasing T leads to higher size-corrected power, whereas increasing ρ 1 , ρ 2 leads to lower size-corrected power. For trend breaks-investigated in some more detail in Supplementary Appendix C-, many observations are qualitatively similar. One difference is that, in case of trend breaks, the detectorsĤ m orĤ m d lead to highest size-corrected power in some configurations. This finding, however, has to be considered in light of the larger over-rejections exhibited by these two detectors under the null hypothesis. Slope breaks lead to very similar results as I(1) breaks. It remains to investigate the detection times and delays. We again consider the case of I(1) breaks in the main text, for T = 200 and for ρ 1 = ρ 2 = 0.3 in Figure 4. 31 With respect to m and r we only display results for m = 0.5 and r = 0.5; additional configurations leading to qualitatively very similar results are displayed in Figures S21-S26 in Supplementary Appendix C. The figures display the results in the form of box-whiskers plots-for completeness-for all nine considered detectors. The numbers below the detector labels indicate the corresponding null rejection probabilities, i.e., size-corrected power, given in Table 1. Thus, the different box-whiskers plots are based on different numbers of 31 Figure S20 in Supplementary Appendix C displays the corresponding detection times results for T = 500.
replications because of different numbers of rejections across different detectors, sample sizes, and ρ values.
One relatively clear observation that emerges from the figures, both in the main text and in Supplementary Appendix C, is that the choice of the estimation method does not exhibit major impacts on detection times and delays. Given the standard asymptotic properties of the estimators, as expected, IM-OLS leads to slightly larger delays than FM-OLS and D-OLS in many cases. The moving window detectors lead to the shortest delays, both standardized and self-normalized, with the best performance achieved with n = 0.1. One interesting observation is that, for the delay, the choice of the window size does matter and, in fact, exerts bigger influence on the results than the choice of standardizing or selfnormalizing. Furthermore, an increasing sample size leads to a-ceteris paribus-more concentrated distribution of the estimated detection times (based on a larger number of observations) but does not, throughout, lead to smaller average delays. The results of the finite sample simulations can essentially be summarized as follows: First, when comparing self-normalized monitoring statistics with their standardized versions, self-normalization leads to smaller over-rejections, without having any systematic or sizeable negative impacts on size-corrected power (for m ≤ r), and without leading to larger delays when compared to standardization. Second, within the group of selfnormalized detectors, the moving window detector with n = 0.1 leads almost throughout to highest size-corrected power without detrimental effects on null rejection probabilities. For the moving window detectors, size-corrected power decreases with increasing window size. For all moving window detectors, size-corrected power is at least as good or higher than for the expanding window detector. Some exceptions occur in case of trend breaks, in which case the expanding window detector sometimes outperforms the moving window detectors in terms of size-corrected power. In addition, the window size exhibits some impact on the delays, with the moving window detector with n = 0.1 leading to the shortest delays. Third, with respect to estimator choice, the usual trade-off between IM-OLS, on the one hand, and FM-/D-OLS, on the other, occurs. IM-OLS leads to lower over-rejections at the cost of lower size-corrected power and slightly larger delays than FM-OLS, with D-OLS outperformed by FM-OLS in terms of larger over-rejections by D-OLS under the null hypothesis. Size-corrected power and detection times based on D-OLS are by and large very similar compared to those based on FM-OLS. Self-normalization in conjunction with IM-OLS is particularly beneficial for small samples, as in this case no long-run covariance estimates are required. The poor performance of FM-OLS compared to IM-OLS for small samples can be traced back to long-run covariance estimation required for FM-OLS parameter estimation, even when using self-normalized detectors. For larger sample sizes, with better properties of long-run covariance estimation, the asymptotic efficiency advantage of FM-OLS over IM-OLS becomes visible, particularly with respect to higher size-corrected power.
Altogether, we recommend to use the self-normalized moving window detector with n = 0.1, i.e.,Ĥ m,0.1 mov,sn . The choice of the estimator is dictated by the trade-off between null rejection probabilities on target and high size-corrected power.

The Environmental Kuznets Curves (EKCs) for Carbon and Sulfur Dioxide Emissions
We now apply the developed monitoring procedures to EKCs for carbon and sulfur dioxide (CO 2 and SO 2 ) emissions. We commence from a sample of 18 industrialized countries, listed in Table 2, over the period 1946-2016. 32 The highest polynomial degree we consider is the cubic cointegrating polynomial relationship, i.e., with x t the logarithm of real per capita GDP and y t the logarithm of per capita CO 2 or SO 2 emissions. As will be seen below, over the calibration period 1946-1973, for many countries, in fact, a cointegrating quadratic or even cointegrating linear relationship prevails, i.e., is not rejected. The GDP and CO 2 emissions data are similar to those used in Wagner et al. (2020). The main difference is the extension of the sample period from 2013 to 2016; additionally, there are some marginal changes in the GDP data in the newer vintage. More precisely, the GDP (in 2011 prices) and population data stem from the Maddison project database (in the 2018 version of Bolt et al. 2018). The CO 2 emissions data-which cover CO 2 emissions from fossil fuel usage-are taken from the Carbon Dioxide Information and Analysis Center (CDIAC) of the U.S. Department of Energy; see Boden et al. (2018). The SO 2 emissions have been combined from two sources. The data for 1946-2005 are from the NASA Socioeconomic Data and Applications Center (SEDAC); see Smith et al. (2011). The data for the period 2006-2016 are from the OECD (2020). 33 Table 2. List of countries included in the empirical analysis. The sample range is 1946-2016. Italic country names indicate that the augmented Dickey-Fuller test rejects the unit root null hypothesis for log GDP per capita on the calibration period   With the developed methods resting upon a calibration period, a corresponding choice has to be made. We choose the calibration period 1946-1973, reflecting that the first oil price shock of 1974 is considered a major event for changes in energy consumption patterns. 34 In our notation, this amounts to m = 28/71 ≈ 0.4. Given this choice, the first step is to perform the CPR modeling cycle for the calibration period. The augmented Dickey and Fuller (1981) test results, obtained on the calibration period for the null hypothesis of a unit root in log real per capita GDP against the alternative of (linear-)trend stationarity, are 32 Only for twelve of these 18 countries, as discussed below, is the unit root null hypothesis not rejected for the logarithm of real per capita GDP. This, of course, reduces the number of countries for which monitoring is performed in this section to twelve countries. 33 Note that the combination of these two data sources using growth rates rests upon the assumption that the share of SO 2 in SO x is constant at about 98% also over the period 2006 onwards, as the OECD data comprise all SO x emissions and not only SO 2 emissions. 34 In addition, and preceding the oil price shock, many countries have put more stringent environmental legislation in place in the late 1960s or early 1970s, e.g., the United States introduced Clean Air Acts in 1963 and 1970, Canada introduced a similarly named law 1971, and Sweden introduced its Environmental Protection Act in 1969.
contained in the country list Table 2. 35 These results lead to the following twelve countries being included in the subsequent analysis (based on a 5% significance level): Australia, Belgium, Canada, Denmark, Finland, Italy, Japan, Portugal, Spain, Sweden, the United Kingdom (UK), and the United States (U.S.). The next step is to test for the prevalence of a CPR relationship over the monitoring period-for both CO 2 and SO 2 emissions for the countries with I(1) log real per capita GDP. Given that the polynomial degree is ex ante unclear, we perform a testing sequence with polynomial degrees ranging from three (the cubic specification) to one (linear cointegration). 36 The deterministic specification consists of intercept and linear trend throughout. The lowest degree polynomial for which a CPR relationship is not rejected is considered as starting point for monitoring in the following subsections; see Table 3. The most striking feature of Table 3 is that, in most countries-for CO 2 even more than for SO 2 -, a cointegrating linear relationship appears to be present over the period . This seems to be at odds, at first sight, with the EKC hypothesis of an inverted U-shaped relationship. However, these results reflect the fact that, until the early 1970s, per capita GDP and per capita emissions developed very similarly on a "log-linear extension path"; see the scatter plots in Figures S27 and S28 in Supplementary Appendix D. Only starting in the mid 1970s, the oil price shock, as well as environmental legislation-particularly with respect to SO 2 and "acid rain"-, lead to a decoupling of the two quantities, to a certain extent, in many countries. 37 Given these preliminary results, we turn in the following two subsections to monitoring, using the moving window detectorĤ m,0.1 mov,sn with window size n = 0.1, i.e., seven years. 38

Results for Carbon Dioxide Emissions
The detection times for the twelve considered countries using, with the exceptions of Finland and the United States, where the quadratic specification is estimated, the linear specification are displayed in Table 4. With the exception of Canada and Spain for all countries the linear relationship found over the calibration period appears to break down. Furthermore, Portugal is a "mixed" case, with unsurprisingly similar behavior to Spain in many ways, as discussed below. In line with the simulation results, the detection times are-with the exception of Japanearlier with FM-OLS than with IM-OLS, with the differences often just a few years. Mostly, breaks are dated in the 1980s or early 1990s, with a few exceptions: Australia (IM-OLS: 2001), Denmark (IM-OLS: 2011), Portugal (FM-OLS: 1998; IM-OLS: no break, which is the only difference in finding a break point or not between the two methods). This can be interpreted, admittedly unsurprisingly, as strong evidence against a continued log-linear co-movement, i.e., aligned expansion, of log per capita GDP and CO 2 emissions from the-given the delays seen in the simulation section at latest-early 1990s onwards. Or, the other way around, these findings are indicative of curvature picking up in CO 2 EKC-type relationships in the 1980s, in most of the considered countries, with the exception of Canada, and potentially Portugal and Spain, which is discussed further below. 39 As an illustration, Figure 5 displays the mechanics of the monitoring procedure for Finland. 40 The upper graph displays the FM-and IM-OLS residuals over both the calibration and the monitoring period. By definition, since an intercept is included, the residuals have zero mean over the calibration period and then turn systematically (more and more) negative thereafter on the monitoring period. This, by definition, means that the estimated-in case of Finland quadratic-relationship systematically over-predicts actual log per capita CO 2 emissions, i.e., the slope of an actual relationship between output and emissions is estimated as too high; also see Figure 6. The statistical monitoring procedures need to collect enough signals until a break is detected; in this example, this takes until 1989 for FM-OLS and until 1990 for IM-OLS, which is clearly too late. Nevertheless, it may be interesting to note that Finland was severely adversely affected by the collapse of the Soviet Union in the early 1990s, which, in fact, for Finland, led to a deeper recession than the Great Depression in the 1930s. 39 Considering a quadratic specification over the full sample period, see Table S17 in Supplementary Appendix D for details, confirms this. For all countries, except Canada, Portugal, and Spain, the coefficients to log per capita GDP squared are (significantly) negative, whereas they show a positive sign for these three countries. The coefficient is, however, only significantly positive for Portugal, underlining the "borderline case" behavior of Portugal. 40 Figures S32 to S42 in Supplementary Appendix D display monitoring results for CO 2 emissions for the other eleven countries in exactly the same format as Figure 5 for Finland. Finland Actual values 1946-1973Actual values 1974-1988Actual values 1989-2016Fitted values (estimation 1946-1973 Fitted values (estimation 1946 -1988 Figure 6 shows the fitted values obtained when estimating the quadratic EKC with FM-OLS over the calibration sample, the sample ending prior to the detected break-point 1988 and when estimation takes place over the full sample. 41 The dashed red line shows that, in fact, more or less immediately after the oil price shock, the relationship between GDP and CO 2 emissions appears to have changed, with the detector taking about 15 years to collect enough signals to declare the null hypothesis rejected at the 5% significance level. 42 Parameter estimation until 1988 "catches the turn" of the mid 1970s and leads to good fit. However, using the full sample for parameter estimation of the quadratic EKC leads to the best fit, particularly for the period after 1989. This leads to some support for the interpretation that maybe the relationship is after all quadratic and one needs to have a sample that is (i) larger and (ii) covers the inverted U behavior to be able to estimate the relationship with sufficient precision; compare Footnote 37. 43 To complete the analysis for CO 2 emissions, we now turn to the countries for which no structural change is indicated by the monitoring procedure, Canada, the mixed case Portugal, and Spain. Table 5 displays the estimated coefficients for these three countries when the cointegrating (linear) relationships between log per capita GDP and log per capita CO 2 emissions are estimated over the calibration sample and over the whole sample period.
The results reported in Table 5 are a bit mixed, and no uniform pattern across all three countries emerges, with some of the smoke clearing when considering Figure 7 below. For Canada, the slope coefficient corresponding to log per capita GDP becomes much smaller when estimated over the full sample, but the trend coefficient also becomes less negative. For Portugal, the trend coefficient is not significant in both periods, but turns from non-negative point estimates in the calibration period to smaller, and for FM-OLS negative, values in the full period. The slope coefficient, however, is bigger, when estimated over the full sample. For Spain, the trend coefficient does become more negative and significant when estimated over the full sample period and the slope coefficient becomes bigger. Thus, qualitatively, the results are similar for the two countries on the Iberian peninsula, which 41 See Figure S29 in Supplementary Appendix D for analogous-and very similar-IM-OLS results. 42 Clearly, this is a long delay-particularly when looking at Figure 6, but it has to be taken into account that the estimation sample comprises only 28 observations, which is an rather small sample for cointegration analysis. 43 This appears to be the case indeed. The coefficients to squared log per capita GDP are very small but positive and borderline significant when estimated over the calibration sample (results are available upon request) and are significantly negative when estimated over the full sample (see Table S17 in Supplementary Appendix D). not only share many economic similarities but also political similarities, with the end of the Franco and Salazar regimes in these countries in the mid 1970s. Probably more informative, Figure 7 displays similar results for these three countries as displayed for Finland in Figure 6. 44 The message is clear, particularly in comparison with Figure 6. In case no structural break is detected, estimation over the larger sample does lead-unsurprisingly-to better fit, but the differences in fit are, compared to the differences observed in Figure 6, minor. From a standard inverted U quadratic EKC perspective, the excellent fit obtained for these three countries with the linear specification is to a certain extent surprising-and, which might be bad news, of course, there is no turning point.
Note that, when considering the full sample period 1946-2016, we find a cointegrating EKC relationship for all nine countries, for which monitoring detects a structural break. 45 For about a third of the countries, the polynomial degree of the EKC appears to be larger for the full sample than for the calibration period, supporting the notion of more curvature. For most countries, the evidence, with an unchanged minimal polynomial degree, points, however, more towards structural change of the parameters for a given form of the relationship. For Finland and the U.S., it appears that the polynomial degree is lower over the full sample than for the calibration period, albeit the coefficients to squared log per capita GDP are significant when estimated over the full sample, which leads to an unclear picture. Altogether, the results in this CO 2 subsection indicate that the procedures work even on such small samples, albeit, of course, for many applications the sizeable delay is most certainly problematic. 44 Figure S30 in Supplementary Appendix D shows the corresponding results obtained with IM-OLS. 45 Table S12 in Supplementary Appendix D contains similar results concerning the minimal polynomial degrees of a CPR relationship for the full sample period as Table 3 for the calibration period. Canada Actual values 1946-1973Actual values 1974-2016Fitted values (estimation 1946-1973 Fitted values (estimation 1946 -2016) 1950 1960 1970 1980 1990 2000

Results for Sulfur Dioxide Emissions
We now turn to monitoring the EKC for SO 2 emissions, with the results displayed in Table 6 for the detectorĤ m,0.1 mov,sn for both FM-OLS and IM-OLS. 46 Qualitatively, the results for SO 2 are similar to the results for CO 2 discussed in the previous subsection. With the exception of Australia, the United States and the mixed case (as for CO 2 emissions) Portugal monitoring indicates structural breaks. Again, in line with the simulation findings, the FM-OLS break points are in no case later than the IM-OLS break points, with equal break dates from both estimators for three countries. This similarity of detected break points can be tentatively interpreted as evidence for underlying structural changes in the relation between economic activity and emissions in-or due to the delays prior to-the 1980s, potentially as a consequence of tighter environmental legislation. 47 Table 7 displays the estimation results for Australia, Portugal and the U.S., i.e., the three countries where no structural break was detected, or-to be precise-where, for Portugal, only FM-OLS estimation leads to a break being detected. The results lead to some interesting observations. For Australia, estimation on the calibration period leads to (with all coefficients not significantly different from zero) positive trend coefficient and negative slope coefficient in a cointegrating linear relationship. 48 This is surprising to a certain extent, as a negative trend slope is typically considered to capture autonomous energy efficiency increases, admittedly more important for CO 2 emissions rather than SO 2 emissions. The expected signs, with the trend coefficient negative and the slope coefficient positive (and significant), emerge only over the full sample. For Portugal, the coefficient signs are as expected, with the trend coefficient only significant for the full sample. The slope coefficient to log per capita GDP becomes much larger when estimated over the full sample. The results for the cubic specification estimated for the U.S. are hard to interpret with the enormous fluctuations in the coefficient values. However, the implied turning points do change in expected ways, meaning that the first one becomes smaller and the second one larger, e.g., for FM-OLS and estimation over the short sample the two turning points are at about US$ 16,000 and 27,000, whereas full sample estimation leads to turning points at about US$ 5400 and 79,000.
The results in Figure 8, however, indicate that indeed the relationship specified over the calibration period, when estimated over the full sample, leads to very good fit, either with 47 This finding again indicates the potential gains to be reaped by considering pooling, across countries, or pollutants or both. Compare Wagner et al. (2020) for a discussion of pooling issues and options in the context of EKC analysis. This suggests that a worthwhile extension to be considered could be combining Wagner et al. (2020) with the monitoring ideas pursued in this paper. Such an extension, beyond the scope of this paper, could potentially lead to smaller delays. 48 It is in all likelihood mere coincidence that the trend coefficients are of more or less same magnitude over both sample periods with, effectively, only the sign changing. Considering the full sample period leads-similar to CO 2 emissions-to a CPR relationship for all nine countries where a structural break has been detected. In addition, 49 Figure S31 in Supplementary Appendix D shows similar results when using IM-OLS for parameter estimation. similar to CO 2 , the evidence is mixed with respect to either change in the parameters or a change in the minimal polynomial degree of a cointegrating EKC for SO 2 emissions. 50 Table 7. FM-OLS and IM-OLS estimation results for a cointegrating linear relationship between log per capita GDP and log per capita SO 2 emissions for Australia and Portugal, and a cointegrating cubic relationship for the United States, for the calibration period 1946-1973 (left panel) and the full sample period 1946-2016 (right panel). Italic entries indicate significance of coefficients at the 10% level, and bold entries significance of coefficients at the 5% level.

Summary and Conclusions
The paper extends the residual-based monitoring procedure for cointegrating relationship developed in Wagner and Wied (2017) in two dimensions. First, in addition to the detector studied in that paper, we consider a number of detectors that consider two aspects not considered in detail in the earlier paper: self-normalization and moving window detectors. Second, the approach is extended from cointegrating linear to cointegrating polynomial regressions (CPRs).
The starting point is the extension of the Shin (1994) test statistic for the null hypothesis of linear cointegration to the CPR setting (see, e.g., Wagner 2020). The full sample test statistic is then used as the basis for developing monitoring procedures, similar to Wagner and Wied (2017) for the case of cointegrating linear regressions. As also usual in the cointegrating regression literature, the regressors are allowed to be endogenous and the stationary errors are allowed to be serially correlated. Consequently, parameter estimation is based on modified OLS estimators, to be precise, FM-OLS, D-OLS, and IM-OLS, in their versions adapted to the CPR setting to allow for the construction of nuisance parameter-free monitoring statistics by scaling out a scalar long-run variance. However, even the usage of suitably modified least squares regressors is not, in the general case, sufficient to arrive at nuisance parameter-free monitoring statistics. In the CPR case, as mentioned in Remark 1, additionally, full design is required to arrive at nuisance parameter-free test statistics, either by standardization or by self-normalization. By construction, and as in the cointegrating linear case, the limiting distributions of the monitoring statistics coincide for FM-OLS and D-OLS, whilst IM-OLS leads to different limiting distributions for the monitoring statistics.
Both self-normalization, as well as the consideration of moving window detectors, turn out to be beneficial. The combination of these two new elements leads to the, grosso modo, best performance. More precisely, this means that self-normalization leads to smaller over-rejections under the null hypothesis, without leading to strong disadvantages in terms of either size-corrected power or detection delays. The performance differences between a standardized and a self-normalized detector reflect the (well-known smallsample) problems associated with long-run covariance estimation. Using a moving window rather than an expanding window also contributes positively to performance. The idea 50 See Table S12 in Supplementary Appendix D for the minimal polynomial degrees on the full sample. behind moving window detectors is to reduce the impact of pre-break observations on the monitoring statistics in the post-break period, and the simulation evidence indicates that this is indeed what happens. The finite sample performance differs to a certain extent across the three estimators. IM-OLS leads to, with the differences more pronounced for small samples, smaller over-rejections under the null hypothesis than FM-OLS, which in turn leads to lower over-rejections than D-OLS. This, however, comes in conjunction with slightly smaller size-corrected power and slightly bigger delays of IM-OLS compared to FM-OLS. The evidence is a bit mixed, and the differences are often a bit too small to give an unequivocal recommendation concerning the usage of either FM-OLS or IM-OLS.
The monitoring statistics, precisely (in the main text only) the best performing variant using self-normalization in conjunction with a short moving window, are then used to investigate the stability of cointegrating (polynomial) EKCs for both carbon and sulfur dioxide emissions for a sample of twelve countries over the period 1946-2016, with a calibration period ranging from 1946-1973, i.e., until prior to the first oil price shock. One interesting observation is that, over the calibration period, especially for CO 2 for the majority of countries, in fact, a cointegrating linear relationship is present, in line with effectively "balanced" growth of economic activity and emissions (from burning fossil fuel), until the mid 1970s. The monitoring procedures indicate, for both CO 2 and SO 2 , structural changes in nine out of twelve countries. Of course, especially for the cases where a linear cointegrating relationship prevails over the calibration period, this may not be too big a surprise. No structural break is detected for Canada, Portugal, and Spain for CO 2 emissions, and Australia, Portugal, and the U.S. for SO 2 emissions. For the countrypollutant combinations where no structural break is detected, the type of cointegrating relationship specified over the calibration period leads to very good fit also over the full sample period, and, of course, even better fit when the parameters are estimated over the full sample. This is not the case for countries where breaks are detected, indicating that structural breaks are indeed detected when present. Altogether, with the caveat of, in some cases, late detection times, remember the delays illustrated in the simulation study, the monitoring procedures lead to "plausible" findings in our application.
Future work needs to address inter alia the following issues left open in this paper: First, of course, large delays are problematic, even in an ex-post exercise one would want to date the breaks as precisely as possible, e.g., to gauge the time it takes until behavioral or legislative changes translate into changes in the economic activity-emissions nexus. For certain applications (when, e.g., legislation comes into place in a group of countries at the same time), it may be possible to rely also upon the cross-sectional dimension to date break points. The seemingly unrelated cointegrating polynomial regression analysis put forward in Wagner et al. (2020) or the classical panel EKC analysis of de Jong and Wagner (2018) might serve as starting points for developing monitoring statistics for small and large cross-sectional dimension, respectively. Second, if one wants to use the monitoring procedures in a real time manner-rather than for a historical analysis as in this paper-, it may be important to consider extending end-of-sample cointegrating break point tests to the setting considered here (one such test for the cointegrating linear case is discussed in Andrews and Kim 2006). Extensions to more general settings are, however, not obvious. Third, it may be interesting to monitor and detect structural breaks in the other direction, i.e., from a spurious regression to a cointegrating polynomial regression. This could be achieved, e.g., by suitably extending the results of Sakarya et al. (2015) from cointegrating linear to cointegrating polynomial regressions. Data Availability Statement: The analysis in this paper utilizes only publicly available data, which are available here: GDP and population, https://www.rug.nl/ggdc/historicaldevelopment/ maddison/releases/maddison-project-database-2018?lang=en; CO 2 emissions, https://data.essdive.lbl.gov/view/doi:10.15485/1712447; SO 2 emissions, http://sedac.ciesin.columbia.edu/data/ set/haso2-anthro-sulfur-dioxide-emissions-1850-2005-v2-86; SO x emissions, https://doi.org/10.178 7/93d10cf7-en.

Acknowledgments:
The authors are grateful to the editor Claudio Morana and three anonymous referees for their valuable suggestions. The helpful comments of participants at the 3rd Conference on Econometric Models of Climate Change in September 2018 in Rome, as well as those of the participants at the Annual Meeting of the Austrian Economic Association in February 2020 in Vienna, are gratefully acknowledged. The paper solely reflects the views of the authors and not necessarily those of the Bank of Slovenia or the European System of Central Banks. On top of this, the usual disclaimer applies.

Conflicts of Interest:
The authors declare no conflict of interest.

Appendix A. Proofs
Proof of Lemma 1. The lemma follows with straightforward changes essentially from available full sample results: For FM-OLS, see Wagner and Hong (2016, Proof of Proposition 5), and, for D-OLS, see Choi and Saikkonen (2010, Lemma A.2). For IM-OLS, the result follows with two changes from the results in Vogelsang and Wagner (2014b). First, again, a change from a full sample result to a calibration (sub-)sample result and, second, the results in that paper are all formulated only under the null hypothesis of a true linear relationship nested in a CPR-type relationship; a few results have to be correspondingly-and straightforwardly-modified.
Proof of Proposition 1. Using the results from Lemma 1, the continuous mapping theorem (see, e.g., Hall and Heyde 1980, Theorem A.3) and the assumption of consistent longrun covariance estimation leads to the stated asymptotic distributions under the null hypothesis.
Proof of Proposition 2. The result follows from Proposition 1 in conjunction with the continuous mapping theorem since the limit H(s) ofĤ(s), using short-hand notation also for the limit quantities, is well defined for all considered monitoring statistics. The same holds true for |Ĥ (s) g(s) | since 0 < g(s) < ∞ is assumed to be continuous.
Proof of Proposition 3. The proof of the proposition is similar in spirit and follows from similar arguments as the proof of Proposition 2 of Wagner and Wied (2017, Supplementary Appendix A) for monitoring in the linear cointegration case.
To show consistency against fixed alternatives for part (a) of the proposition, the limiting behavior of the residual partial sum processes is key. Consider here the FM-OLS residual partial sum process, with the results being similar for both D-OLS and IM-OLS (without partial summing due to the partial summed regression). For 0 < m ≤ r < s ≤ 1, it holds that: The first term converges to ω u·v W u·v,m (r) according to Lemma 1. Depending upon which case considered, at least one of the other terms diverges in case of a fixed alternative. In case (i), the second term diverges at rate T since, in this case, {u t } is an I(1) process for t ≥ rT + 1. Similarly, in cases (ii) or (iii), either the next to last or last term diverges-or both in case breaks occurring in both θ D and θ X .
Therefore, for any of the considered fixed alternatives, the partial summed residual process or-in case of IM-OLS-the residual process diverges. The result stated in (a) follows immediately from the definitions of the considered monitoring statistics.
We now turn to part (b) of Proposition 3 and consider local asymptotic power. For item (i), it follows for the residual partial sum processes of FM-OLS, D-OLS, and IM-OLS residuals that: with Q u·v,m (s) denoting the corresponding limiting process as in Proposition 1. For item (ii), it follows that: It remains to consider the asymptotic behavior of the residual partial sum process appearing in the third item of part (b) of the proposition. The partial sum process of the residuals converges to the following limiting process: with Ω 1/2 vv := diag(Ω 1/2 vv , ω 2 k , . . . , ω p k k ) and ω k the lower right-hand corner element of Ω 1/2 vv corresponding to the k-the element of v t . Equations (A2) to (A4) show that in case of local alternatives the limiting processes differ from the limiting processes derived under the null hypothesis by an extra term, depending upon the case considered equal to δω γ s r (W γ (z) − W γ (r))dz, s r D(z) dz∆ θ D and s r W v (z) dzΩ 1/2 vv ∆ θ X . These terms can be made arbitrarily big (in mean square sense) by choosing δ, ∆ θ D or ∆ θ X which translates in the detectors also becoming arbitrarily large. To illustrate the mechanics, considerĤ m,n mov,sn and the case of a local I(1) break, in which case one obtains for 0 < m ≤ r < s ≤ 1: H m,n