HAR Testing for Spurious Regression in Trend

The usual t test, the t test based on heteroskedasticity and autocorrelation consistent (HAC) covariance matrix estimators, and the heteroskedasticity and autocorrelation robust (HAR) test are three statistics that are widely used in applied econometric work. The use of these significance tests in trend regression is of particular interest given the potential for spurious relationships in trend formulations. Following a longstanding tradition in the spurious regression literature, this paper investigates the asymptotic and finite sample properties of these test statistics in several spurious regression contexts, including regression of stochastic trends on time polynomials and regressions among independent random walks. Concordant with existing theory (Phillips, 1986, 1998; Sun, 2004, 2014), the usual t test and HAC standardized test fail to control size as the sample size n→∞ in these spurious formulations, whereas HAR tests converge to well-defined limit distributions in each case and therefore have the capacity to be consistent and control size. However, it is shown that when the number of trend regressors K→∞, all three statistics, including the HAR test, diverge and fail to control size as n→∞. These findings are relevant to high dimensional nonstationary time series regressions.


Introduction
In a well-cited contribution that emphasized the importance of diagnostic testing in econometrics, David Hendry (1980) highlighted how easy it is to mistake spurious relationships for genuine ones when using trending data of the type so commonly encountered in econometric work, especially in macroeconomics. Spurious regressions occur when conventional significance tests are so seriously biased towards rejection of the null hypothesis of no relationship that the alternative of a genuine relationship is accepted even though the variables have no meaningful relationship and may even be statistically independent. Hendry's article showcased the potential for nonsense regressions with a regression between UK consumer prices and cumulative rainfall that displayed a high level of 'significance' and passed many, but not all, diagnostic tests.
Spurious regressions continue to attract considerable attention in econometric work, long after the original study by Yule (1926), the simulation experiments of Granger and Newbold (1974), and cautionary warnings made by David Hendry and many other writers since then.
The limit theory of Phillips (1986) and Durlauf and Phillips (1988) provided the first analytic step forward by explaining the phenomenon of persistent null hypothesis rejections in spurious regressions. These studies helped applied researchers understand the failure of conventional significance tests by showing that, in regressions with independent or even correlated trending $I(1)$ data, the usual regression t- and F-ratio test statistics do not possess limiting distributions but actually diverge as the sample size $n \to \infty$, leading inevitably to rejections of the null of no association. These studies formed the basis of a large subsequent literature that has analyzed spurious regressions among various classes of trend stationary, long memory, nonstationary, and near-nonstationary time series. A recent article by Ernst et al. (2017) provided further analysis by deriving an expression for the standard deviation of the sample correlation coefficient between two independent standard Brownian motions. While this expression does not explain the phenomenon of spurious regression between two independent random walks, it does reveal that the limiting correlation is not concentrated around the origin and is highly dispersed.
This result complements the finding in Phillips (1986) and many subsequent papers that the coefficient of determination in a spurious regression has a well-defined limit distribution and does not converge in probability to zero.
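The divergence of the usual t-ratio described above is easy to reproduce numerically. The sketch below is a minimal Monte Carlo in Python with NumPy, not code from any of the cited studies; the function name `spurious_rejection_rate`, the inclusion of an intercept, and the nominal 5% two-sided critical value 1.96 are illustrative assumptions. If the theory is right, the share of spurious "rejections" of a zero slope should climb towards one as $n$ grows.

```python
import numpy as np

def spurious_rejection_rate(n, reps=1000, crit=1.96, seed=0):
    """Share of replications with |t| > crit when an I(1) series is regressed
    (with intercept) on an independent I(1) series."""
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(reps):
        y = np.cumsum(rng.standard_normal(n))   # Gaussian random walk
        x = np.cumsum(rng.standard_normal(n))   # independent Gaussian random walk
        X = np.column_stack([np.ones(n), x])
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        resid = y - X @ beta
        s2 = resid @ resid / (n - 2)            # usual error variance estimate
        se_slope = np.sqrt(s2 * np.linalg.inv(X.T @ X)[1, 1])
        rejections += int(abs(beta[1] / se_slope) > crit)
    return rejections / reps

for n in (50, 200, 1000):
    print(n, spurious_rejection_rate(n))
```

Because the t-ratio grows like $\sqrt{n}$, raising the critical value does not remove the problem; it only delays it.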
In later work, Phillips (1998) pointed out that spurious regressions typically reflect the fact that trending data may always be 'explained' by a coordinate system of other trending variables, which includes the example of UK prices being well explained by cumulative rainfall that was used by David Hendry (1980). In this broad interpretation, there are no spurious regressions for trending time series, just alternative 'valid' representations of the time series trajectories (and those of their limiting stochastic processes, given a suitable normalization) in terms of other stochastic processes and deterministic functions of time.
The asymptotic theory in Phillips (1998) utilized the general representation of a stochastic process in terms of an orthonormal system and provided an extension of the Weierstrass theorem to include the approximation of continuous functions and stochastic processes by Wiener processes. That theory was applied to two classic examples of spurious regressions: regression of stochastic trends on time polynomials, and regressions among independent random walks.
Such regressions were shown to reproduce asymptotically in part (and in whole as the regressor space expands with the sample size) the underlying valid representations of one trending process in terms of others, a coordinate system that is entirely analogous to orthonormal or Fourier series representations of a continuous function in terms of polynomials or other simple classes of functions over some interval. An important feature of these 'valid' trend relationships is that the coefficients in the representations, like those in the Karhunen-Loève representation of a general stochastic process, are themselves random variables. Randomness in the representation of time series trajectories is embodied in these coefficients. Much subsequent work has utilized these ideas and methods, either in justifying certain regression representations or in using partial versions of these regression representations to focus on certain features of the data, such as long run features (Phillips, 2005, 2014; Müller, 2007; Sun, 2004, 2014a, 2014b, 2014c; Hwang and Sun, 2018; Müller and Watson, 2016, 2018).
An important element in Hendry's (1980) discussion of econometric practice was its emphasis on the value of diagnostic testing to ascertain the limitations of regressions used in applications. In any empirical regression equation, the properties of the residuals inevitably depend on the properties of the data. To build upon a saying of the famous statistician John Tukey, in the regression equation $y = X\beta + u$ the empirical investigator chooses the variables $y$ and $X$ (possibly with the aid of an autometric regression or a machine learning algorithm) and God gives back $u$. Any misspecification in the relationship between $y$ and $X$ must therefore be manifest in the properties of $u$. This is precisely what occurs in a spurious regression: the residual embodies the consequences of a model's fundamental error of specification, as is revealed by the fact that tests for residual serial correlation such as the Durbin-Watson statistic converge in probability to zero in such regressions (Phillips, 1986).
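The Durbin-Watson result cited above can be illustrated in the same way. In this hypothetical sketch (NumPy; the function names are ours, and the design regresses one Gaussian random walk on an independent one with an intercept), the median DW statistic should fall towards zero as the sample size increases, roughly like $1/n$.

```python
import numpy as np

def durbin_watson(resid):
    # DW = sum_t (e_t - e_{t-1})^2 / sum_t e_t^2
    return np.sum(np.diff(resid) ** 2) / np.sum(resid ** 2)

def median_dw_spurious(n, reps=500, seed=1):
    """Median DW statistic from regressions (with intercept) of one Gaussian
    random walk on an independent one."""
    rng = np.random.default_rng(seed)
    stats = []
    for _ in range(reps):
        y = np.cumsum(rng.standard_normal(n))
        x = np.cumsum(rng.standard_normal(n))
        X = np.column_stack([np.ones(n), x])
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        stats.append(durbin_watson(y - X @ beta))
    return float(np.median(stats))

for n in (100, 500, 2000):
    print(n, median_dw_spurious(n))
```

A DW value near zero is exactly the residual signature of the misspecification: the residuals inherit the random-walk behavior that the regressor could not remove.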
Accommodating departures of fitted relationships from conventional assumptions on the properties of regression errors, and thereby some of the effects of misspecification, has been a longstanding goal of econometrics. One of the great advances in econometric research over the last half century in response to this goal has been the development of methods of inference that are robust to some of the properties of the data and, particularly, to those of the regression error. Such robustness can offer protection against specification error in validating inference. This research has led to the progressive development of heteroskedasticity and autocorrelation consistent (HAC)¹ procedures and subsequently to heteroskedasticity and autocorrelation robust (HAR)² methods. These methods control for the effects of serial dependence and heterogeneity in regression errors, and they play a key role in achieving robustness in inference. One area where HAC procedures for valid statistical inference have proved especially important in practice is regression involving trending variables and cointegration. This goal motivated the early research on optimal semiparametric approaches to the estimation of cointegrating relationships (Phillips and Hansen, 1990) and continues to play a role in subsequent developments in this field (Phillips, 2014; Hwang and Sun, 2018).
HAC methods generally have good asymptotic properties, but they are susceptible to large size distortions in practical work. Several alternative methods have been proposed in the recent literature to improve finite sample performance. Among these, the 'fixed-b' lag truncation rule (Kiefer and Vogelsang, 2002a, 2002b, 2005) has attracted considerable interest. The method uses a truncation lag $M$ for including sample serial covariances that is proportional to the sample size $n$ (i.e., $M = bn$ for some fixed $b \in (0, 1)$) and sacrifices consistent variance matrix (and hence standard error) estimation in the interest of improved performance in statistical testing, which it achieves by mirroring finite sample characteristics of the test statistics in a new asymptotic theory for these tests. t-ratio and Wald statistics formed from HAC estimators without truncation belong to the more general class of HAR test statistics. There are known analytic advantages to the fixed-b approach, primarily related to controlling size distortion. In particular, research by Jansson (2004), Sun et al. (2009), and Sun (2014b) has provided evidence from Edgeworth expansions of enhanced higher order asymptotic size control in the use of these tests. Recently, Müller (2014), Lazarus et al. (2018), and Sun (2018) have surveyed work in this literature and given recommendations for practical implementation.
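To make the difference between the two bandwidth rules concrete, here is a minimal sketch of a kernel long-run variance estimator for a scalar series in NumPy. The Bartlett kernel and the helper names `bartlett` and `kernel_lrv` are illustrative choices, not taken from the cited papers; the only change between a conventional HAC estimate and a fixed-b estimate is the bandwidth, $M = n^{1/4}$ (so that $M/n + 1/M \to 0$) versus $M = bn$.

```python
import numpy as np

def bartlett(x):
    # Bartlett (triangular) kernel: k(x) = 1 - |x| for |x| < 1, and 0 otherwise
    return np.maximum(1.0 - np.abs(x), 0.0)

def kernel_lrv(v, M, kernel=bartlett):
    """Kernel long-run variance estimate of a scalar series v with bandwidth M:
    sum_{|j| < n} k(j/M) * gamma_hat(j), where gamma_hat(j) = n^{-1} sum_t v_t v_{t-j}."""
    v = np.asarray(v, dtype=float)
    n = v.size
    lrv = v @ v / n                          # lag-0 term
    for j in range(1, n):
        w = kernel(j / M)
        if w == 0.0:
            continue                         # lag lies outside the kernel support
        gamma_j = v[j:] @ v[:-j] / n         # sample autocovariance at lag j
        lrv += 2.0 * w * gamma_j             # lags j and -j contribute equally
    return lrv

rng = np.random.default_rng(0)
n = 2000
v = rng.standard_normal(n)                   # white noise: true long-run variance is 1
M_hac = n ** 0.25                            # conventional HAC rate: M/n + 1/M -> 0
M_fixed_b = 0.2 * n                          # fixed-b rule: M = b n with b = 0.2
print(kernel_lrv(v, M_hac), kernel_lrv(v, M_fixed_b))
```

With $M = bn$ the estimate no longer converges to the true long-run variance; under fixed-b asymptotics it converges to a random multiple of it, and that randomness is what fixed-b critical values account for. The Bartlett kernel is used here because it guarantees a nonnegative estimate for any bandwidth.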
¹ Heteroskedasticity-robust standard errors were introduced by Eicker (1967), Huber (1967) and White (1980). HAC estimators were introduced by White (1982) and have a long subsequent history of enhancement.
² Heteroskedasticity and autocorrelation robust standard errors were introduced in Kiefer and Vogelsang (2002a, 2002b) and, following this lead, Phillips (2005a) used the HAR terminology to characterize a class of robust inferential procedures in an article concerned with the development of automated mechanisms of valid inference in econometrics. Other important early contributions concerning HAC covariance matrix estimators without truncation are given by Robinson (1998), Kiefer, Vogelsang and Bunzel (2000), and Kiefer and Vogelsang (2005).
In studying spurious regression on trend phenomena, Phillips (1998) showed that the use of HAC methods attenuated the misleading divergence rate (under the null hypothesis of no association) by the extent to which the truncation lag $M \to \infty$. In particular, the divergence rate of the t statistic in a spurious regression involving independent $I(1)$ variables is $O_p(\sqrt{n/M})$ rather than $O_p(\sqrt{n})$. Pursuing this philosophy further, Sun (2004) offered a new solution to inference in spurious regressions. He argued that the divergence of the usual t-statistic arises from the use of a standard error estimator that underestimates the true variation of the ordinary least squares (OLS) estimator, and he proposed the use of a fixed-b HAR standard error estimator with a bandwidth proportional to the sample size (so that $M = bn \to \infty$ at the same rate as $n$). The resulting t-statistic converges to a non-degenerate limiting distribution which depends on nuisance parameters. These discoveries revealed that prudent use of HAR techniques in regression testing might widen the range of valid inference to include spurious regressions.
In the same spirit as Sun (2004, 2014), the present contribution analyzes possible advantages of using HAR test statistics in the context of simple trend regressions such as
$$X_t = at + u_t, \qquad t = 1, \dots, n, \tag{1.1}$$
where $u_t$ is $I(1)$. For trend assessment in models of this type it is of interest to test the null hypothesis $H_0: a = 0$ of the absence of a deterministic trend in (1.1). This framework is a prototypical example of much more complex models where deterministic and stochastic trend components are present and valid testing is needed.
The paper considers three types of t test widely used in econometrics: the usual t test, the t test based on HAC covariance matrix estimators, and the fixed-b HAR test. We apply these t-statistics to three classic examples of spurious regressions: regression of stochastic trends on time polynomials, regression of stochastic trends on a deterministic linear time trend, and regression among independent random walks. The asymptotic behavior of these three t-statistics is investigated. In the regression of stochastic trends on time polynomials and the regression among independent random walks, it is shown that the usual t test and the HAC-based t test indicate a significant relation with probability that goes to one as the sample size $n \to \infty$. However, provided the number of regressors $K$ is fixed, the HAR t-statistics converge to well-defined distributions free from nuisance parameters. As a result, when appropriate critical values are drawn from these limiting distributions, the HAR t-statistics do not diverge and valid inference on the regression coefficients is possible, concordant with Sun (2004).
In contrast to these results and those of Sun (2004), we find that HAR t-statistics diverge at rate $\sqrt{K}$ as $K \to \infty$. Hence, the characteristics of spurious regression return even with the use of HAR test statistics in models with an increasing number of regressors. These findings seem relevant for machine learning and autometric model building methods that accommodate large numbers of regressors, including those of the $p > n$ variety, where model searching often begins with more regressors than sample observations and penalized methods of estimation are needed to obtain even preliminary results.
Our results also reveal that the other two t-statistics (the usual t and the HAC-based t) diverge at faster rates when $K \to \infty$ than when $K$ is fixed. In the regression of stochastic trends on a deterministic time trend, we derive the limiting distributions of the statistics under both the null and alternative hypotheses. The HAR test turns out to be the only test that is consistent and has controllable size. All the limit theory for these tests receives strong support in simulations. And, as will become evident, the appealing asymptotic properties of the HAR test in the fixed number of regressors case are manifest even in situations where some commonly used regularity conditions in the construction of HAR tests are violated.
The rest of the paper is organized as follows. Section 2 examines regressions of stochastic trends on a complete orthonormal basis in $L^2[0,1]$ and establishes the limiting distributions of the three t-statistics, with explicit application to the prototypical case of a spurious linear trend regression. Section 3 examines the limit behavior of the t-statistics in regressions among independent random walks. Simulations are reported in Section 4. Section 5 concludes.
All proofs are given in the Appendix.

Model Details and Background
The development in this section concentrates on a simple unit root time series $X_t$ whose increments $\varepsilon_t = \Delta X_t$ form a stationary time series with zero mean, finite absolute moments to order $p > 2$, and continuous spectral density function $f(\cdot)$. We assume that $X_t$ satisfies the functional central limit theorem (FCLT) $n^{-1/2}X_{\lfloor n\cdot \rfloor} \Rightarrow B(\cdot) = BM(\omega^2)$ with $\omega^2 = 2\pi f(0)$, for which primitive conditions are well known (e.g., Phillips and Solo, 1992). The results that follow are illustrative and apply with suitable modification to more general nonstationary time series, such as near integrated or long memory series, which upon standardization converge to limiting stochastic processes with sample paths that are continuous almost surely.
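By the Karhunen-Loève (KL) expansion theorem (e.g., Loève, 1963, p. 478), any stochastic process that is continuous in quadratic mean has a decomposition into a countable linear combination of orthogonal functions. The KL representation for the Brownian motion $B(r)$ is
$$B(r) = \sum_{k=1}^{\infty} \sqrt{\lambda_k}\,\varphi_k(r)\,\xi_k, \qquad r \in [0,1], \tag{2.3}$$
where $\lambda_k$ and $\varphi_k(\cdot)$ are the eigenvalues and corresponding eigenfunctions of the Brownian motion's covariance kernel, and the $\xi_k$ are independently and identically distributed (iid) as $N(0,1)$. For $B = BM(\omega^2)$, whose covariance kernel is $\omega^2\min(r,s)$, these are $\lambda_k = \omega^2/[(k-\tfrac12)^2\pi^2]$ and $\varphi_k(r) = \sqrt{2}\sin[(k-\tfrac12)\pi r]$. This series representation of $B(r)$ converges almost surely and uniformly in $r \in [0,1]$. Denoting $z_k = \sqrt{\lambda_k}\,\xi_k$ as the stochastic coefficients, the KL representation (2.3) can be rewritten as
$$B(r) = \sum_{k=1}^{\infty} z_k\,\varphi_k(r). \tag{2.4}$$
Starting from the KL representation of $B(r)$, Phillips (1998) studied the asymptotic properties of regressions of $X_t$ on deterministic regressors of the type
$$X_t = \sum_{k=1}^{K} b_k\,\varphi_k(t/n) + u_t = \beta_K'\varphi_{Kt} + u_t, \qquad t = 1, \dots, n. \tag{2.5}$$
Least squares estimation gives $\hat\beta_K = (\hat b_1, \dots, \hat b_K)' = (\Phi_K'\Phi_K)^{-1}\Phi_K'X$, where $\Phi_K = (\varphi_{K1}, \dots, \varphi_{Kn})'$ with $\varphi_{Kt} = (\varphi_1(t/n), \dots, \varphi_K(t/n))'$, and $X = (X_1, \dots, X_n)'$. Let … where $\Lambda_K = \mathrm{diag}(\lambda_1, \dots, \lambda_K)$ and $\varphi_K(r) = (\varphi_1(r), \dots, \varphi_K(r))'$. In the expanding regressor case where $K = K(n) \to \infty$ and $K/n \to 0$, it was also shown in Phillips (1998) that …, where the $z_k$ are the random coefficients in the KL representation (2.4). Therefore, the fitted coefficients in regression (2.6) tend to random variables in the limit as $n \to \infty$ that match those in the KL representation of the limit process $B(\cdot)$. In other words, least squares regressions reproduce in part (when $K$ is finite) and in whole (when $K \to \infty$) the underlying orthonormal representations.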
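The KL quantities above can be checked numerically. The following NumPy sketch (helper names hypothetical; $\omega^2 = 1$ and Gaussian increments are assumed) does two things: it verifies that the truncated series $\sum_{k\le K}\lambda_k\varphi_k(r)^2$ approaches $\mathrm{Var}\,B(r) = r$, and it compares $n^{-1/2}\hat\beta_K$ from the least squares regression of a random walk on $\varphi_{Kt}$ with a Riemann-sum approximation of $\int_0^1\varphi_k(r)B(r)\,dr$, which by orthonormality of the $\varphi_k$ equals $z_k$ in (2.4).

```python
import numpy as np

def kl_eigen(K, omega2=1.0):
    """First K eigenvalues and an eigenfunction evaluator for the covariance
    kernel omega2 * min(r, s) of BM(omega2)."""
    freq = (np.arange(1, K + 1) - 0.5) * np.pi
    lam = omega2 / freq ** 2                       # lambda_k
    def phi(r):
        # len(r) x K matrix with entries sqrt(2) sin((k - 1/2) pi r_i)
        return np.sqrt(2.0) * np.sin(np.outer(np.atleast_1d(r), freq))
    return lam, phi

def kl_variance(r, K, omega2=1.0):
    """Var B(r) implied by the first K KL terms: sum_k lambda_k phi_k(r)^2."""
    lam, phi = kl_eigen(K, omega2)
    return phi(r) ** 2 @ lam

def scaled_ls_coefficients(X, K):
    """n^{-1/2} * beta_hat from the LS regression of X_t on phi_K(t/n)."""
    n = X.size
    _, phi = kl_eigen(K)
    Phi = phi(np.arange(1, n + 1) / n)
    beta_hat, *_ = np.linalg.lstsq(Phi, X, rcond=None)
    return beta_hat / np.sqrt(n), Phi

print(kl_variance([0.25, 0.5, 1.0], K=5000))      # close to [0.25, 0.5, 1.0]

rng = np.random.default_rng(0)
n, K = 5000, 3
X = np.cumsum(rng.standard_normal(n))             # random walk with omega^2 = 1
b_scaled, Phi = scaled_ls_coefficients(X, K)
proj = Phi.T @ (X / np.sqrt(n)) / n               # Riemann sum of int phi_k(r) B_n(r) dr
print(b_scaled, proj)
```

The two printed vectors agree up to $O(1/n)$ because $n^{-1}\Phi_K'\Phi_K = I_K + O(1/n)$ for this sine basis, so the scaled regression coefficients are essentially sample analogues of the KL coefficients $z_1, \dots, z_K$.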

Three t-statistics
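Suppose interest centers on testing whether the regression coefficients are significant or, more generally, whether some linear combination $C_K'\beta_K$ of the underlying coefficients $\beta_K = (b_1, \dots, b_K)'$ in the estimated regression (2.5) is equal to zero, that is,
$$H_0: C_K'\beta_K = 0.$$
Three types of t-statistics are considered. The first is the usual t-ratio, defined as
$$t_{C_K'\hat\beta_K} = \frac{C_K'\hat\beta_K}{\{s^2\,C_K'(\Phi_K'\Phi_K)^{-1}C_K\}^{1/2}},$$
where $s^2$ is the usual error variance estimate. The second t-statistic is constructed by using a HAC variance estimator and has the following form:
$$t^{HAC}_{C_K'\hat\beta_K} = \frac{C_K'\hat\beta_K}{\{C_K'(\Phi_K'\Phi_K)^{-1}[n\,\widehat{\mathrm{lrvar}}_{HAC}(v_t)](\Phi_K'\Phi_K)^{-1}C_K\}^{1/2}}, \qquad \widehat{\mathrm{lrvar}}_{HAC}(v_t) = \frac{1}{n}\sum_{t=1}^{n}\sum_{s=1}^{n} k\Big(\frac{t-s}{M}\Big)v_tv_s'. \tag{2.9}$$
Here, $\widehat{\mathrm{lrvar}}_{HAC}(v_t)$ is a kernel estimate of the long run variance of its argument, $k(\cdot)$ is a lag kernel, $M$ is a bandwidth parameter satisfying $M/n + 1/M \to 0$ as $n \to \infty$, and the argument in (2.9) is $v_t = \hat u_t\varphi_{Kt}$. If we choose a fixed $b \in (0,1]$ and set $M = \lfloor bn \rfloor$, the condition $M/n + 1/M \to 0$ as $n \to \infty$ is violated. In that case, the long run variance estimate is a fixed-b estimate and leads to the HAR t-statistic
$$t^{HAR}_{C_K'\hat\beta_K} = \frac{C_K'\hat\beta_K}{\{C_K'(\Phi_K'\Phi_K)^{-1}[n\,\widehat{\mathrm{lrvar}}_{HAR}(v_t)](\Phi_K'\Phi_K)^{-1}C_K\}^{1/2}}, \qquad \widehat{\mathrm{lrvar}}_{HAR}(v_t) = \frac{1}{n}\sum_{t=1}^{n}\sum_{s=1}^{n} k_b\Big(\frac{t-s}{n}\Big)v_tv_s', \tag{2.13}$$
where $k_b(j/n) = k(j/(nb))$ and $k(\cdot)$ is a lag kernel function as before. With minor changes to the proof given in Phillips (1998), it is easy to deduce that, for fixed $K$, $t_{C_K'\hat\beta_K} = O_p(\sqrt{n})$ and $t^{HAC}_{C_K'\hat\beta_K} = O_p(\sqrt{n/M})$, so that such tests indicate statistically significant regression coefficients with probability that goes to one as $n \to \infty$. These results match what is now standard spurious regression limit theory for inference.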
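A direct transcription of the three statistics may help fix ideas. The sketch below assumes NumPy, a Bartlett kernel, and the divisor $n$ in $s^2$ (the text does not pin these down), and the function names are hypothetical. It uses the identity $C_K'(\Phi_K'\Phi_K)^{-1}[n\,\widehat{\mathrm{lrvar}}(v_t)](\Phi_K'\Phi_K)^{-1}C_K = n\,\widehat{\mathrm{lrvar}}(w_t)$ with the scalar series $w_t = C_K'(\Phi_K'\Phi_K)^{-1}\varphi_{Kt}\hat u_t$, so only a scalar long-run variance is needed; the HAC and fixed-b HAR versions differ only in the bandwidth passed in.

```python
import numpy as np

def bartlett(x):
    # Bartlett kernel: k(x) = 1 - |x| for |x| < 1, and 0 otherwise
    return np.maximum(1.0 - np.abs(x), 0.0)

def scalar_lrv(w, M, kernel=bartlett):
    # n^{-1} sum_t sum_s k((t - s)/M) w_t w_s for a scalar series w
    n = w.size
    out = w @ w / n
    for j in range(1, n):
        k = kernel(j / M)
        if k != 0.0:
            out += 2.0 * k * (w[j:] @ w[:-j]) / n
    return out

def three_t_stats(X, Phi, C, M, b, kernel=bartlett):
    """Usual, HAC and fixed-b HAR t-ratios for H0: C'beta_K = 0 in X = Phi beta_K + u."""
    n = Phi.shape[0]
    PtP_inv = np.linalg.inv(Phi.T @ Phi)
    beta_hat = PtP_inv @ (Phi.T @ X)
    u_hat = X - Phi @ beta_hat
    num = C @ beta_hat
    s2 = u_hat @ u_hat / n                        # error variance (divisor n assumed)
    t_usual = num / np.sqrt(s2 * (C @ PtP_inv @ C))
    # Sandwich C'(Phi'Phi)^{-1}[n lrvar(v_t)](Phi'Phi)^{-1}C equals n lrvar(w_t)
    # for the scalar series w_t = C'(Phi'Phi)^{-1} phi_Kt u_hat_t.
    w = (Phi @ (PtP_inv @ C)) * u_hat
    t_hac = num / np.sqrt(n * scalar_lrv(w, M, kernel))       # M/n + 1/M -> 0
    t_har = num / np.sqrt(n * scalar_lrv(w, b * n, kernel))   # k(j/(bn)) with b fixed
    return t_usual, t_hac, t_har

rng = np.random.default_rng(0)
n, K = 400, 2
X = np.cumsum(rng.standard_normal(n))             # random walk, no deterministic trend
r = np.arange(1, n + 1) / n
Phi = np.sqrt(2.0) * np.sin(np.outer(r, (np.arange(1, K + 1) - 0.5) * np.pi))
C = np.array([1.0, 0.0])                          # H0: b_1 = 0
print(three_t_stats(X, Phi, C, M=n ** 0.25, b=0.2))
```

All three ratios share the numerator $C_K'\hat\beta_K$; only the standard error changes, which is why the three statistics can have very different asymptotic behavior on the same data.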
In addition, as we show in Theorem 2.3 below, the large regressor case where $K \to \infty$ leads to different results. In this case, both $t_{C_K'\hat\beta_K}$ and $t^{HAC}_{C_K'\hat\beta_K}$ diverge at rates that depend on the expansion rate of $K$, given by $t_{C_K'\hat\beta_K} = O_p(\sqrt{nK})$ and $t^{HAC}_{C_K'\hat\beta_K} = O_p(\sqrt{nK/M})$. Thus, with the addition of more regressors, the combined effect of the regression coefficients, as well as that of the individual coefficients, appears more significant and diverges when $K \to \infty$ as $n \to \infty$. In consequence, large numbers of regressors effectively worsen the spurious regression problem.
Is there a test which does not always indicate that the coefficients $\hat\beta_K$ are significant in the "spurious" regression (2.5)? As the results of Sun (2004) show, the answer is positive for the case where $K$ is fixed. In this event, the HAR test is appealing in the sense that $t^{HAR}_{C_K'\hat\beta_K}$ converges to a well-defined limit distribution when $n \to \infty$ and $K$ is fixed, so that test size is controlled in the limit. Therefore, when appropriate critical values obtained from the limit distribution of $t^{HAR}_{C_K'\hat\beta_K}$ are used, valid inference on the regression coefficients is possible. These results are collected in the following two theorems.
Theorem 2.1 For fixed $K$, as $n \to \infty$ and $M/n + 1/M \to 0$, we have …
… $t^{HAR}_{C_K'\hat\beta_K}$ asymptotically follows a well-defined limit distribution when the number of regressors $K$ is fixed. The limit distribution is free from nuisance parameters and is easily computable, but it depends on the lag kernel as well as on the form of the trend regressors, which influence the detrended standard Brownian motion process $W_{\varphi_K}$. The asymptotic critical values therefore differ from those of the usual standard normal limit distribution of a t-statistic. But the specific features of the limit distribution of $t^{HAR}_{C_K'\hat\beta_K}$, which retain randomness in the denominator of the limiting statistic, help to control size in finite sample testing.
Theorem 2.3 As $n, K \to \infty$, with $M/n + 1/M \to 0$ and $K^{5/2}/n + K^{3/2}/n^{1/2-1/p} \to 0$, the following results hold: …
Remark 2.4 Theorem 2.3 shows that all three t-statistics diverge as $n \to \infty$ but at different rates, each of which depends on $K$. The divergence rate of the fixed-b test statistic, $t^{HAR}_{C_K'\hat\beta_K} = O_p(\sqrt{K})$, is the slowest and depends only on $K$. These results strengthen the finding in Phillips (1998) that attempts to deal with serial dependence in controlling size in significance testing generally fail when enough effort is put into the regression design to fit the trajectory. This failure now includes HAR testing when $K \to \infty$. All the tests are therefore ultimately confirmatory of the existence of a 'relationship' (in the present case, a coordinate representation relationship among different types of trends), at least when a complete representation is attempted by allowing the number of regressors $K$ to diverge with $n$. The results of the theorem may be interpreted to mean that when a serious attempt is made to model a stochastic trend using deterministic functions (either a large number of such regressors or regressors that are carefully chosen to provide a successful representation and trajectory fit), the attempt will end up being successful even when a spurious-regression-robust method such as the fixed-b HAR test is used.
An additional matter concerning the form of these tests may usefully be highlighted. To construct the HAC and HAR t-statistics, the following condition is usually imposed (e.g., Kiefer et al., 2000; Kiefer and Vogelsang, 2002a, 2002b), as in standard approaches to robust covariance matrix estimation: … (2.14). In other words, the process $\{\varphi_{Kt}X_t\}$ is typically assumed to be unconditionally stationary or weakly dependent with uniformly bounded second moments, so that series such as (2.14) converge. However, this condition is violated in both regressions (2.5) and (2.6), as … depends on $t$. For example, when the components $\varepsilon_s$ are iid $(0, \sigma^2)$ with partial sums $X_t = \sum_{s=1}^{t}\varepsilon_s$, the autocovariance $E(X_tX_{t-j}) = \sigma^2(t-j)$ for $0 \le j < t$ depends on $t$. Regardless of this violation, HAC and HAR t-statistics may still be constructed in the traditional way, and the HAR statistic $t^{HAR}_{C_K'\hat\beta_K}$ has nuisance-parameter-free asymptotic properties even though the above unconditional stationarity condition is not satisfied.
The above results apply straightforwardly to the simple case of a spurious linear regression on trend where the time series is a unit root process generated by
$$X_t = at + X_t^0, \qquad t = 1, \dots, n, \tag{2.15}$$
with $a = 0$ and $X_t^0 = \sum_{s=1}^{t}\varepsilon_s$ the partial sum of a zero-mean stationary process $\{\varepsilon_s\}$ with continuous spectral density $f(\cdot)$. The standardized process $X_n(r) = n^{-1/2}X^0_{\lfloor nr \rfloor}$ satisfies the functional law $X_n(r) \Rightarrow B(r) = BM(\omega^2)$ with $\omega^2 = 2\pi f(0)$. The fitted regression model is $X_t = \hat a t + \hat u_t$, or equivalently,
$$\frac{X_t}{\sqrt{n}} = \sqrt{n}\,\hat a\,\frac{t}{n} + \frac{\hat u_t}{\sqrt{n}}, \tag{2.16}$$
where $\hat a = \sum_{t=1}^{n}tX_t/\sum_{t=1}^{n}t^2$ is the least squares (LS) estimate of $a$, which satisfies (Durlauf and Phillips, 1988)
$$\sqrt{n}(\hat a - a) = \frac{n^{-5/2}\sum_{t=1}^{n}tX_t^0}{n^{-3}\sum_{t=1}^{n}t^2} \Rightarrow 3\int_0^1 rB(r)\,dr, \tag{2.17}$$
so that $\hat a$ is consistent, including the case where $a = 0$. However, as is well known, the usual t-statistic has order $O_p(\sqrt{n})$ and diverges as $n \to \infty$, indicating a significant relationship between $\{X_t\}$ and $t$ in spite of the fact that $a = 0$. This outcome follows directly from Theorem 2.1 and the (alternate) representation for the standard Brownian motion $W(r)$ as $W(r) = r\xi_0 + \sqrt{2}\sum_{k=1}^{\infty}\frac{\sin(k\pi r)}{k\pi}\xi_k$ with $\xi_k$ iid $N(0,1)$. Thus, when $a = 0$, the scaled LS estimator $\sqrt{n}\,\hat a$ has a random limit $\sim N(0, \tfrac{6}{5}\omega^2)$ from (2.17) that approximates but does not exactly reproduce the leading random coefficient term $\omega\xi_0$. This dependence induces an asymptotic inefficiency in the trend coefficient estimate $\hat a$, since $\tfrac{6}{5}\omega^2 > \mathrm{Var}(\omega\xi_0) = \omega^2$. Next, in testing $H_0: a = 0$ versus $H_1: a \neq 0$, the following statistics are considered: … where $k(\cdot)$ is a kernel function, $k_b(j/n) = k(j/(nb))$ and $\hat u_t = X_t - \hat a t$ for $t = 1, \dots, n$. The asymptotic properties of these test statistics follow in the same way as before when $n \to \infty$ with $M/n + 1/M \to 0$, giving the following results.
… (2.24)
(ii) Under $H_1: a \neq 0$,
$$\frac{t_a}{n} \Rightarrow \frac{a}{\big(3\int_0^1 B^2\big)^{1/2}}; \tag{2.25}$$
… Hence the HAR test has effective discriminatory power, being consistent and having controllable size. These results match those in Sun (2004, 2014), showing that for simple trend misspecifications, such as a finite degree polynomial trend function in place of a stochastic trend, use of fixed-b HAR testing controls size and leads to a consistent test.
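Two parts of this analysis can be checked with a short NumPy sketch (function names hypothetical). First, for iid $N(0,1)$ increments, so that $\omega^2 = 1$, the exact finite-sample variance $n\,\mathrm{Var}(\hat a) = n\sum_t\sum_s ts\min(t,s)/(\sum_t t^2)^2$ should be close to $6/5$, the variance of the limit in (2.17). Second, under $H_0: a = 0$ the median of the usual $|t_a|$ should grow with $n$ while the fixed-b HAR statistic stays stable. The bandwidths ($M = n^{1/4}$ for HAC, $b = 0.2$ for HAR) mirror the simulation design described in Section 4, but the Bartlett kernel is our substitution for the uniform kernel used there.

```python
import numpy as np

def exact_scaled_variance(n):
    """n * Var(a_hat) under a = 0 with iid N(0,1) increments:
    n * sum_t sum_s t s min(t, s) / (sum_t t^2)^2, which tends to 6/5."""
    t = np.arange(1, n + 1, dtype=float)
    cov = np.minimum.outer(t, t)                  # Cov(X_t, X_s) = min(t, s)
    return n * (t @ cov @ t) / (t @ t) ** 2

def bartlett(x):
    return np.maximum(1.0 - np.abs(x), 0.0)

def lrv(v, M):
    # n^{-1} sum_t sum_s k((t - s)/M) v_t v_s with the Bartlett kernel
    n = v.size
    out = v @ v / n
    for j in range(1, min(n, int(np.ceil(M)))):
        out += 2.0 * bartlett(j / M) * (v[j:] @ v[:-j]) / n
    return out

def trend_t_stats(X, b=0.2):
    """Usual, HAC (M = n^{1/4}) and fixed-b HAR (M = b n) t-ratios for H0: a = 0
    in the no-intercept regression X_t = a t + u_t."""
    n = X.size
    t = np.arange(1, n + 1, dtype=float)
    stt = t @ t
    a_hat = (t @ X) / stt
    u_hat = X - a_hat * t
    t_usual = a_hat / np.sqrt((u_hat @ u_hat / n) / stt)
    v = t * u_hat                                 # score series v_t = t * u_hat_t
    t_hac = a_hat / np.sqrt(n * lrv(v, n ** 0.25) / stt ** 2)
    t_har = a_hat / np.sqrt(n * lrv(v, b * n) / stt ** 2)
    return t_usual, t_hac, t_har

def median_abs_t(n, a=0.0, reps=500, seed=0):
    rng = np.random.default_rng(seed)
    trend = np.arange(1, n + 1)
    draws = np.array([trend_t_stats(a * trend + np.cumsum(rng.standard_normal(n)))
                      for _ in range(reps)])
    return np.median(np.abs(draws), axis=0)       # (usual, HAC, HAR)

print(exact_scaled_variance(2000))                # approximately 6/5
for n in (100, 400, 1600):
    print(n, median_abs_t(n))
```

Setting `a` to a nonzero value in `median_abs_t` gives a rough look at power; by the consistency result above, all three statistics should then grow with $n$, but only the HAR statistic also has a stable null distribution.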

Regressions among independent random walks
This section extends these ideas to regressions among independent random walks. Let $B(\cdot)$ be a Brownian motion on the interval $[0,1]$. Phillips (1998) proved that there exist a sequence of independent standard Brownian motions $\{W_i\}_{i=1}^{\infty}$ that are independent of $B(\cdot)$ and a sequence of random variables $\{d_i\}_{i=1}^{\infty}$, defined on an augmented probability space $(\Omega, \mathcal{F}, P)$, such that, as $K \to \infty$,
$$\sum_{i=1}^{K} d_iW_i(r) \to B(r). \tag{3.1}$$
The random coefficients $d_i$ are statistically dependent on $B(\cdot)$. Replacing the Wiener processes $W_i$ by orthogonal functions $V_i(r)$ in $L^2[0,1]$ obtained with the Gram-Schmidt process gives the representation … (3.2)
In the following, we consider the unit root process $y_t = \sum_{s=1}^{t}\varepsilon_s$ with mean-zero stationary components $\{\varepsilon_s\}$ having continuous spectral density $f(\cdot)$ and satisfying the functional law $n^{-1/2}y_{\lfloor nr \rfloor} \Rightarrow B(r) = BM(\omega^2)$, $\omega^2 = 2\pi f(0) > 0$. Let $x_t = (x_{1t}, \dots, x_{Kt})'$ be $K$ independent standard Gaussian random walks, all of which are independent of $y_t$. Consider the linear regression $y_t = \hat b_x'x_t + \hat u_t$, based on $n > K$ observations of these series. The large $n$ asymptotic behavior of $\hat b_x$ is (Phillips, 1986)
$$\hat b_x \Rightarrow \Big(\int_0^1 W_xW_x'\Big)^{-1}\int_0^1 W_xB,$$
where $W_x$ is the vector standard Brownian motion weak limit of the standardized partial sum processes $n^{-1/2}x_{\lfloor n\cdot \rfloor}$.
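The non-degenerate limit of $\hat b_x$ is visible in a small simulation. This NumPy sketch uses a single regressor ($K = 1$), no intercept, and a hypothetical helper `iqr_of_slope`; the interquartile range is used instead of the standard deviation because the limiting ratio $\int W_xB/\int W_x^2$ may have heavy tails. If $\hat b_x$ were consistent for zero, the interquartile range would shrink as $n$ grows; instead it should settle near a fixed positive value.

```python
import numpy as np

def iqr_of_slope(n, reps=2000, seed=0):
    """Interquartile range of the LS slope b_hat_x from regressing y_t on one
    independent Gaussian random walk x_t (no intercept, K = 1)."""
    rng = np.random.default_rng(seed)
    y = np.cumsum(rng.standard_normal((reps, n)), axis=1)   # one path per row
    x = np.cumsum(rng.standard_normal((reps, n)), axis=1)
    b_hat = np.sum(x * y, axis=1) / np.sum(x * x, axis=1)
    q75, q25 = np.percentile(b_hat, [75, 25])
    return q75 - q25

for n in (50, 200, 1000):
    print(n, iqr_of_slope(n))
```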
Suppose we orthogonalize the regressors $\{x_k = (x_{kt})_{t=1}^{n} : k = 1, \dots, K\}$ using the Gram-Schmidt process … By standard weak convergence arguments we have … Now let $z_t = (z_{kt})_{k=1}^{K}$ and consider the regression $y_t = \hat b_{zK}'z_t + \hat u_t$. The LS estimator $\hat b_{zK} = [\sum_{t=1}^{n}z_tz_t']^{-1}\sum_{t=1}^{n}z_ty_t$ has the limit $\hat b_{zK} \Rightarrow$ …, where $V_K = (V_k)_{k=1}^{K}$ is a $K \times 1$ vector. Thus, the empirical regression of $y_t$ on $z_t$ reproduces the first $K$ terms in the representation of the limit Brownian motion $B$ in terms of an orthogonalized coordinate system formed from $K$ independent standard Brownian motions.
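Numerically, Gram-Schmidt orthogonalization of the regressor columns is most conveniently carried out through a thin QR decomposition. In the sketch below (NumPy; the helper name is hypothetical), the column signs are adjusted so that the triangular factor has a positive diagonal, which reproduces the classical Gram-Schmidt ordering, and the scaling $z = nQ$ (so that $n^{-2}z'z = I_K$) is our assumption for keeping $n^{-1/2}z_{\lfloor n\cdot \rfloor}$ on the same scale as the random walks. Because $z_t$ spans the same space as $x_t$, the fitted values are unchanged; only the coordinate system differs.

```python
import numpy as np

def gram_schmidt_regressors(x):
    """Orthogonalize the columns of the n x K array x (classical Gram-Schmidt
    ordering) via a thin QR decomposition, scaled so that n^{-2} z'z = I_K."""
    n = x.shape[0]
    Q, R = np.linalg.qr(x)                 # x = Q R with Q'Q = I_K
    signs = np.sign(np.diag(R))            # force diag(R) > 0, as in Gram-Schmidt
    return n * Q * signs

rng = np.random.default_rng(0)
n, K = 500, 4
x = np.cumsum(rng.standard_normal((n, K)), axis=0)    # K independent random walks
y = np.cumsum(rng.standard_normal(n))                 # independent of x
z = gram_schmidt_regressors(x)
b_z = np.linalg.solve(z.T @ z, z.T @ y)               # LS coefficients on z_t
b_x = np.linalg.solve(x.T @ x, x.T @ y)               # LS coefficients on x_t
print(b_z)
```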
Suppose now that we are interested in testing whether a linear combination of $b_{zK}$ equals zero, viz.,
$$H_0: C_K'b_{zK} = 0,$$
with $C_K \in \mathbb{R}^K$ satisfying $C_K'C_K = 1$. Again, three types of t-statistics are considered: … The following theorem establishes the limiting distributions of these three t-statistics.
Theorem 3.1 For fixed $K$, as $n \to \infty$, … where $W(\cdot)$ is a standard Brownian motion. Hence, the nuisance parameter $\omega$ appearing in the numerator and denominator of the limiting distribution of $t^{HAR}_{\hat b_{zK}}$ cancels, and the limit distribution of $t^{HAR}_{\hat b_{zK}}$ is therefore free of nuisance parameters.

Remark 3.3 Even when
Thus $E(z_ty_ty_{t-j}z_{t-j}') = E(y_ty_{t-j})\,E(z_tz_{t-j}')$ depends on $t$ in a similar way. Therefore, as discussed earlier, the usual regularity conditions employed in constructing HAC and HAR t-statistics do not apply here.
Remark 3.4 In view of (3.2) and Theorem 4.3 in Phillips (1998), $W_{yK}(r) \to 0$ almost surely and uniformly as $K \to \infty$. We can therefore expect the rates of divergence of $t_{\hat b_{zK}}$ and $t^{HAC}_{\hat b_{zK}}$ to be greater when $K \to \infty$ than when $K$ is fixed. Moreover, similar to the earlier finding in Theorem 2.3, the HAR statistic $t^{HAR}_{\hat b_{zK}}$ can be expected to diverge when $K \to \infty$.

Simulations
This section reports simulations investigating the finite sample performance of the different t-statistics in spurious trend regressions, simple time trend regression, and spurious regression among stochastic trends.
We first examine spurious regression of a stochastic trend on time polynomials. Consider the standard Gaussian random walk $X_t = \sum_{s=1}^{t}\varepsilon_s$, where $\varepsilon_s \sim$ iid $N(0,1)$. Orthogonal basis functions $\{\varphi_k(\cdot)\}_{k=1}^{K}$, where $\varphi_k(r) = \sqrt{2}\sin[(k-0.5)\pi r]$, were used as regressors, and fitted time trend regressions of the form $X_t = \varphi_{Kt}'\hat\beta + \hat u_t$ were run with $\varphi_{Kt} = [\varphi_1(t/n), \dots, \varphi_K(t/n)]'$. We focus on the prototypical null hypothesis $H_0: \beta_1 = 0$ in what follows. In the construction of the HAC and HAR t-statistics, a uniform kernel function was employed. Figure 1 reports kernel estimates of the probability densities of these t-statistics under different model scenarios, based on 10,000 simulations. The first panel of the figure gives the results for the different t-statistics as the sample size $n$ increases with fixed $K = 1$. It is evident that both the usual t-statistic and the HAC t-statistic (with $M = \lfloor nb \rfloor = n^{1/4}$, i.e., $b = n^{-3/4}$) diverge as $n$ increases, and that the HAC statistic diverges at a slower rate. In contrast, the HAR t-statistic ($b = 0.2$) evidently converges to a well-defined probability distribution as the sample size expands. These results clearly corroborate Theorem 2.1.
The second panel presents the estimated densities of the three t-statistics as $K$ increases for a fixed sample size $n = 200$. As $K$ increases, all three t-statistics are clearly divergent, but at different rates, and for each statistic the increase in dispersion as $K$ increases is evident.
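The qualitative pattern in the first two panels can be approximated with a compact Monte Carlo. The sketch below is not the authors' simulation code: it uses NumPy, a Bartlett kernel in place of the uniform kernel (so that the variance estimates stay nonnegative), $M = n^{1/4}$ for HAC and $b = 0.2$ for HAR, and hypothetical helper names. It reports median absolute t-ratios for $H_0: \beta_1 = 0$ at $n = 200$ as $K$ rises; all three medians should increase with $K$.

```python
import numpy as np

def bartlett(x):
    return np.maximum(1.0 - np.abs(x), 0.0)

def scalar_lrv(w, M):
    # n^{-1} sum_t sum_s k((t - s)/M) w_t w_s with the Bartlett kernel
    n = w.size
    out = w @ w / n
    for j in range(1, min(n, int(np.ceil(M)))):
        out += 2.0 * bartlett(j / M) * (w[j:] @ w[:-j]) / n
    return out

def t_stats_first_coef(X, K, b=0.2):
    """Usual, HAC (M = n^{1/4}) and fixed-b HAR t-ratios for H0: beta_1 = 0
    in the regression of X_t on phi_1(t/n), ..., phi_K(t/n)."""
    n = X.size
    r = np.arange(1, n + 1) / n
    Phi = np.sqrt(2.0) * np.sin(np.outer(r, (np.arange(1, K + 1) - 0.5) * np.pi))
    PtP_inv = np.linalg.inv(Phi.T @ Phi)
    beta_hat = PtP_inv @ (Phi.T @ X)
    u_hat = X - Phi @ beta_hat
    a = PtP_inv[:, 0]                        # (Phi'Phi)^{-1} C with C = e_1
    w = (Phi @ a) * u_hat                    # scalar score series
    t_usual = beta_hat[0] / np.sqrt((u_hat @ u_hat / n) * a[0])
    t_hac = beta_hat[0] / np.sqrt(n * scalar_lrv(w, n ** 0.25))
    t_har = beta_hat[0] / np.sqrt(n * scalar_lrv(w, b * n))
    return t_usual, t_hac, t_har

def median_abs_t_by_K(K, n=200, reps=500, seed=0):
    rng = np.random.default_rng(seed)
    draws = np.array([t_stats_first_coef(np.cumsum(rng.standard_normal(n)), K)
                      for _ in range(reps)])
    return np.median(np.abs(draws), axis=0)  # (usual, HAC, HAR)

for K in (1, 5, 20):
    print(K, median_abs_t_by_K(K))
```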
The last panel reports the results for the HAR t-statistic with $K = 1, 5, 20$ and bandwidth coefficient $b = 0, 0.1, 0.4, 0.6, 0.8, 1$. As $K$ increases with the bandwidth setting held fixed, the densities become progressively more dispersed. For fixed $K$, the quantiles are clearly not monotonic functions of $b$. For $K = 1, 5$, when $b$ is close to zero, the limiting distributions become more dispersed. When $b$ is close to one, the limiting distributions also become more dispersed for all three choices of $K$. As explained in Sun (2004), for small or moderate $K$, when $b$ is close to zero the behavior of the t-statistic may be better captured by conventional limit theory, which does not take into account the persistence of the regression residuals. But when $b$ is close to unity, we cannot expect the standard variance estimate to capture the strong autocorrelation. If we choose the kernel $k(x) = 1$ and use the full sample (i.e., set $b = 1$), the long run variance estimate equals zero by construction, because the least squares normal equations force $\sum_{t=1}^{n}\hat u_t\varphi_{Kt} = 0$. We conjecture that for fixed $K$ it may be possible to find an optimal bandwidth $b_{opt}(K)$ that controls for size and power by following an approach similar to the method used in Sun, Phillips and Jin (2009).
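The claim that the uniform kernel with $b = 1$ gives a zero long-run variance estimate follows from the least squares normal equations, and it is easy to confirm. In this NumPy sketch (the sine regressors match the simulation design described above; variable names are ours), every kernel weight equals one, so the estimate reduces to the outer product of $\sum_t v_t = \Phi_K'\hat u$ with itself, divided by $n$.

```python
import numpy as np

rng = np.random.default_rng(0)
n, K = 200, 3
X = np.cumsum(rng.standard_normal(n))                      # Gaussian random walk
r = np.arange(1, n + 1) / n
Phi = np.sqrt(2.0) * np.sin(np.outer(r, (np.arange(1, K + 1) - 0.5) * np.pi))
beta_hat, *_ = np.linalg.lstsq(Phi, X, rcond=None)
u_hat = X - Phi @ beta_hat
V = Phi * u_hat[:, None]                                   # row t is v_t' = (phi_Kt u_hat_t)'

# Uniform kernel k(x) = 1 with b = 1: every weight k((t - s)/n) equals 1, so
# n^{-1} sum_t sum_s v_t v_s' collapses to the outer product of the column sums.
ones = np.ones((n, n))
Omega_uniform = V.T @ ones @ V / n
print(V.sum(axis=0))                  # normal equations: Phi' u_hat = 0 up to rounding
print(np.abs(Omega_uniform).max())    # numerically zero
```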
From the shape of the densities in the last panel of Figure 1, we would expect any such optimal bandwidth $b_{opt}(K)$ to get closer to zero as $K$ gets larger. Extension of robust testing techniques to machine learning regressions, where $K$ may be very large, will likely require very careful bandwidth selection in significance testing that takes the magnitude of $K$ into account.
Next, we consider a simple spurious linear trend regression of $X_t$ on a time trend. The first panel of Figure 2 presents kernel estimates of the densities of the t-statistics for sample sizes $n = 50, 100, 400, 800$. Again, the usual t-statistic and the HAC statistic are divergent but at different rates, whereas the HAR statistic evidently converges. The second panel of Figure 2 provides results for the HAR statistic with different bandwidth choices. The distributions clearly become more dispersed as $b$ moves close to zero or close to one. In this respect the findings are similar to those of Figure 1 when $K = 1$.
Last, we consider spurious regressions of a standard Gaussian random walk process on independent Gaussian random walks. Figure 3 shows kernel estimates of the probability densities of these t-statistics under different scenarios, based on 10,000 simulations. The patterns exhibited are evidently similar to those in Figure 1, and the same qualitative observations made for Figure 1 therefore apply to these regressions.

Conclusions
Robust inference in trend regression poses many challenges. Not least of these is the critical difficulty that a trending time series trajectory can be represented in a coordinate system by many different functions, be they relevant or irrelevant, stochastic or non-stochastic. Valid … The present work has studied the asymptotic and finite sample performance of simple t statistics that seek to achieve some degree of robustness to misspecification in such settings.
The analysis is based on three classic examples of spurious regressions: regression of stochastic trends on time polynomials, regression of stochastic trends on a simple linear trend, and regression among independent random walks. Concordant with existing theory, the usual t-statistic and the HAC standardized t-statistic both diverge and imply 'nonsense relationships' with probability going to one as the sample size tends to infinity. Also concordant with existing theory, when the number of regressors $K$ is fixed, the HAR standardized t-statistics converge to non-degenerate distributions free from nuisance parameters, thereby controlling size and leading to valid significance tests in these spurious regressions. These findings reinforce the optimism expressed in earlier work that fixed-b methods of correction may fix inference problems in spurious regressions.
But when the number of trend regressors $K \to \infty$, the results are different. First, the rates of divergence of the usual t-statistic and the HAC t-test are greater by the factor $\sqrt{K}$ than when $K$ is fixed. Second, the fixed-b HAR t-statistic is no longer convergent and instead diverges at the rate $\sqrt{K}$, leading to spurious inference of significance when $K \to \infty$.
(iii)-(iv) The proofs of (iii) and (iv) are similar, so only the proof of (iv) is given below. Noting that $\xi_k \sim$ iid $N(0,1)$ and that, for each $k = 1, \dots, K$, the functions $\varphi_k(r) = \sqrt{2}\sin[(k-1/2)\pi r]$ are bounded uniformly in $r$, we have … Therefore, since $k_b(r-q)$ is uniformly bounded and …, … Therefore, … Finally, we get $\mathrm{Var}$ … Proof of Lemma A.2. (i) See Phillips (2002), Lemma 2.2.
Hence, for any $|j| \le M$,