1. Introduction
Granger and Newbold (1974) and others show that the regression of independent (nearly) nonstationary time series could result in spurious outcomes, Pesaran et al. (1999) and others find that series with mixed orders of integration, that is, I(0) or I(1), could be cointegrated with a stationary residual, and Westerlund (2008) documents that many studies commit a Type II error by failing to reject the no-cointegration hypothesis. On the other hand, Engle and Granger (1987) establish the relationship between cointegration and error correction models that was first suggested in Granger (1981) and develop estimation procedures and tests for the cointegration model. In addition, Phillips (1986) develops an asymptotic theory for regressions of integrated random processes, including the spurious regressions discovered by Granger and Newbold (1974) and the cointegrating regressions developed by Engle and Granger (1987). Entorf (1997) analyses the regression of two independent random walks with drifts and shows that the convergence to pseudo-true values applies to the estimation of spurious fixed-effects models. Readers may refer to Ventosa-Santaulária (2009) for an overview of spurious regression.
Is it possible that the regression of two independent and nearly nonstationary series does not have any spurious problem? In this paper, we explore this issue. We first conjecture that under some situations, the regression of two independent and nearly nonstationary series does not have any spurious problem at all. To check whether the conjecture holds, we generate two independent and nearly nonstationary AR(1) processes,  and  with . We then regress  on the independent  to get  and compute the proportion of rejections of the null hypothesis that the beta () is zero. Consistent with the literature, we find that under some situations, regressing two independent and (nearly) nonstationary time series could be spurious. Nonetheless, different from the literature, we also find that under some other situations the rejection rates are much smaller than the 5% level of significance, implying that regressing nearly nonstationary  on independent and nearly nonstationary  does not produce any spurious problem in any such case simulated in our paper.
The rest of the paper is organized as follows. In Section 2, we state the basic models for the regression and the regression with a spurious problem. In Section 3, we state our model setup and construct the algorithm for the simulation. In Section 4, we discuss the findings of our simulation, and the last section concludes.
  3.  Model Setup and Algorithm
In this section, we first state the model setup for generating two purely independent and nearly nonstationary time series, regressing one of them onto the other, and examining whether the corresponding regression is spurious. We then construct the algorithm for the simulation and discuss our simulation results in the next section.
  3.1. Model Setup
We consider the simple linear regression in (
1) between two unrelated nearly nonstationary AR(1) series 
 and 
 such that
        
        in which 
 . For simplicity, we assume that both 
 and 
 follow:
        where 
 and 
. We note that 
, 
, and 
 follows a Student’s 
t distribution with 
 degrees of freedom (df). For 
, 
k is equated to 1 and in this case, 
 in (
8) is simply a scale parameter. When 
, it becomes a Cauchy distribution and when 
, it becomes a normal distribution. Readers may refer to 
Pötzelberger  (
1990), 
Tiku and Wong  (
1998), 
Tiku et al.  (
1999, 
2000), 
Wong and Bian  (
2005), 
Fu and Fu  (
2015), and others for further properties of AR(1) series.
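As an illustration of the setup above, the following is a minimal sketch of how a nearly nonstationary AR(1) path with normal or Student's t errors might be simulated. The function names, the burn-in length, the seed, and the coefficient values 0.99 and -0.99 are assumptions made for this example only; they are not the paper's notation or settings.

```python
import math
import random


def student_t(rng, df):
    """Draw one Student's t variate as Z / sqrt(chi2_df / df)."""
    z = rng.gauss(0.0, 1.0)
    # A chi-square variate with df degrees of freedom is Gamma(shape=df/2, scale=2).
    chi2 = rng.gammavariate(df / 2.0, 2.0)
    return z / math.sqrt(chi2 / df)


def ar1_series(phi, length, rng, df=None, burn_in=200):
    """Simulate an AR(1) path; df=None means standard normal errors."""
    value = 0.0
    path = []
    for step in range(burn_in + length):
        shock = rng.gauss(0.0, 1.0) if df is None else student_t(rng, df)
        value = phi * value + shock
        if step >= burn_in:  # discard the burn-in observations
            path.append(value)
    return path


rng = random.Random(12345)  # illustrative seed
y = ar1_series(0.99, 400, rng, df=5)   # positive coefficient (assumed value)
x = ar1_series(-0.99, 400, rng, df=5)  # negative coefficient (assumed value)
```

Setting `df=1` in this sketch gives Cauchy errors, and `df=None` gives standard normal errors.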
To simulate  and  properly, without loss of generality, we consider different factors that could affect the behavior of the time series. First, we consider the distribution of the error terms. We study the following four iid error distributions:
Situation 1. We assume that the error terms  and  defined in (7) follow one of the following distributions:
- 1. a standard normal distribution: that is, both  and ∼ N(0,1);
- 2. a t-distribution with df = 5: that is, both  and ∼ t(5);
- 3. a t-distribution with df = 2: that is, both  and ∼ t(2); and
- 4. a t-distribution with df = 1: that is, both  and  follow the standard Cauchy distribution.
Second, we vary the lengths of the time series and simulate series with the following four lengths:
Situation 2. We consider the lengths of the time series  and  defined in (7) to be: (i) T = 100; (ii) T = 200; (iii) T = 400; and (iv) T = 800.
After deciding the error distributions and the lengths of the AR(1) processes, we now consider the different values of 
 and 
. In our model, since both 
 and 
 are nearly nonstationary, we choose 
  and, in particular, we define 
 and 
 and consider the following values for both 
 and 
 as stated in Situations 3 and 4:
Situation 3. We consider values of both  and  such that  and .
Situation 4. We consider values of both  and  such that  and .
We consider Situations 3 and 4 because two autoregressive processes, one associated with the zero frequency (the AR(1) with a positive coefficient in our paper) and the other associated with the Nyquist frequency π (the AR(1) with a negative coefficient in our paper, which has power at frequency π and completes a cycle every 2 observations), are independent or even asymptotically orthogonal. Readers may refer to 
Johansen and Schaumburg  (
1999), 
Ghysels and Osborn  (
2001), and 
del Barrio Castro et al. (
2018, 
2019) for more information. Readers may also refer to seasonal unit root tests, see, for example, 
del Barrio Castro et al.  (
2012) and 
Smith et al.  (
2009), and cointegration for processes integrated at different frequencies, see, for example, 
del Barrio Castro et al.  (
2020) with properties that are related to the series we are using in our paper.
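The link between the sign of the AR(1) coefficient and the frequency at which the series has most of its power can be checked numerically. The sketch below uses the standard spectral density of a stationary AR(1) process, f(ω) = σ² / (2π (1 − 2φ cos ω + φ²)); the symbols φ and σ² and the values ±0.99 are assumptions for this example rather than the paper's own notation.

```python
import math


def ar1_spectral_density(phi, omega, sigma2=1.0):
    """Spectral density of a stationary AR(1) process at frequency omega."""
    return sigma2 / (2.0 * math.pi * (1.0 - 2.0 * phi * math.cos(omega) + phi * phi))


# Evaluate on a grid of frequencies from 0 to pi.
freqs = [k * math.pi / 200 for k in range(201)]

# Frequency with the largest power for a positive and a negative coefficient.
peak_pos = max(freqs, key=lambda w: ar1_spectral_density(0.99, w))
peak_neg = max(freqs, key=lambda w: ar1_spectral_density(-0.99, w))

# A component at frequency omega has period 2*pi/omega observations.
period_at_peak_neg = 2.0 * math.pi / peak_neg
```

In this sketch the positive coefficient puts the peak at frequency zero, while the negative coefficient puts it at π, which corresponds to a cycle of 2 observations.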
With four different error distributions, four different time series lengths, and the above 50 combinations of  and  values as stated in Situations 1, 2, 3, and 4, there are in total 4 × 4 × 50 = 800 simulation cases in our study in which the autoregressive coefficients  and  have different signs.
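The case count can be verified by enumerating the simulation grid. In the sketch below, the 50 coefficient combinations are represented only by placeholder indices, since this is an illustration of the count and not of the specific coefficient values.

```python
from itertools import product

error_distributions = ["N(0,1)", "t(5)", "t(2)", "t(1)"]  # Situation 1
lengths = [100, 200, 400, 800]                            # Situation 2
coefficient_pairs = range(50)  # placeholder indices for the 50 combinations

# One simulation case per (distribution, length, coefficient pair).
cases = list(product(error_distributions, lengths, coefficient_pairs))
n_cases = len(cases)
```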
Nevertheless, in this paper, we also study the cases in which both autoregressive coefficients  and  have the same sign, either positive or negative. Thus, we include the following situations in our study:
Situation 5. We consider values of both  and  such that  and .
Situation 6. We consider values of both  and  such that  and .
   3.2. Algorithm
The two series 
 and 
 are generated from independent error terms, and thus, they are expected not to be related. However, Granger and Newbold (1974) and others have shown that regression of independent nonstationary time series could result in spurious outcomes. In this paper, we believe that regressing independent and nearly nonstationary 
 and 
 as shown in Equation (
1) may not be spurious under some situations, as stated in Conjecture 1. To check whether Conjecture 1 holds under some situations, we run the following algorithm for each situation (different error distributions, different time series lengths, different combinations of 
 and 
) as described in 
Section 3.1:
		
Algorithm 1: For each situation (different error distributions, different time series lengths, different combinations of  and ) as described in Section 3.1, we conduct the following steps in our simulation:
For each situation (different error distributions, different time series lengths, different combinations of 
 and 
) as described in 
Section 3.1, we will conduct simulation as described in Algorithm 1 and discuss the results in the next section.
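A simulation of this kind can be sketched as follows, under stated assumptions: standard normal errors only, an OLS regression with intercept, a two-sided test of a zero slope at the 5% level using a large-sample normal critical value, and illustrative choices of 300 replications, length 100, and coefficients ±0.95. The function names and all of these settings are hypothetical and are not taken from Algorithm 1.

```python
import math
import random
from statistics import NormalDist


def ar1(phi, length, rng, burn_in=100):
    """Simulate an AR(1) path with standard normal errors."""
    value, path = 0.0, []
    for step in range(burn_in + length):
        value = phi * value + rng.gauss(0.0, 1.0)
        if step >= burn_in:
            path.append(value)
    return path


def slope_t_stat(y, x):
    """t statistic for the slope in an OLS regression of y on x with intercept."""
    n = len(y)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    beta = sxy / sxx
    alpha = my - beta * mx
    rss = sum((b - alpha - beta * a) ** 2 for a, b in zip(x, y))
    return beta / math.sqrt(rss / (n - 2) / sxx)


def rejection_rate(phi_y, phi_x, length, reps, seed, level=0.05):
    """Share of replications in which the zero-slope null is rejected."""
    rng = random.Random(seed)
    crit = NormalDist().inv_cdf(1 - level / 2)  # large-sample approximation
    rejections = 0
    for _ in range(reps):
        y = ar1(phi_y, length, rng)
        x = ar1(phi_x, length, rng)
        if abs(slope_t_stat(y, x)) > crit:
            rejections += 1
    return rejections / reps


opposite_signs = rejection_rate(0.95, -0.95, length=100, reps=300, seed=1)
same_signs = rejection_rate(0.95, 0.95, length=100, reps=300, seed=1)
```

Comparing `opposite_signs` and `same_signs` with the nominal 0.05 gives a quick check of whether the regression over-rejects or under-rejects in each configuration.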
  4. Simulation
We follow Algorithm 1 to conduct simulation for each situation (different error distributions, different time series lengths, different combinations of 
 and 
) as described in 
Section 3.1. The simulation helps us to examine whether the 
T statistic as shown in Equation (
3) for the model as shown in Equation (
1) follows a Student’s t-distribution. If 
 and 
 are unrelated, the true null hypothesis that all 
 coefficients are zero should be rejected about 5% of the time at the 5% significance level. If the T test is valid, that is, 
’s follow Student’s t-distribution, the rejection rate should be close to 5%. If the rejection rate is significantly greater than 5%, then we conclude that a spurious problem exists. In addition, we believe that regressing independent and nearly nonstationary 
 and 
 as shown in (
1) may not be spurious under some situations, as hypothesized in Conjecture 1. We examine this conjecture in this section. We first discuss the results of the simulation for the cases when 
 and 
 are of different signs in the next subsection.
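Whether a simulated rejection rate differs significantly from the nominal 5% can be judged with a binomial standard error. The sketch below assumes a hypothetical 10,000 replications per case and uses two rejection rates of the magnitude reported in this section (0.0004 and 0.49) purely as example inputs; the function name is illustrative.

```python
import math


def size_z_score(rejection_rate, reps, nominal=0.05):
    """z score of an observed rejection rate against the nominal size."""
    # Under a correctly sized test, rejections are Binomial(reps, nominal).
    standard_error = math.sqrt(nominal * (1.0 - nominal) / reps)
    return (rejection_rate - nominal) / standard_error


reps = 10_000  # assumed number of replications, for illustration only
z_low = size_z_score(0.0004, reps)   # far below 5%: under-rejection
z_high = size_z_score(0.49, reps)    # far above 5%: over-rejection
```

A z score well below -1.96 points to under-rejection and one well above 1.96 points to over-rejection relative to the nominal level.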
  4.1. Simulation for the Cases When  and  Are of Different Signs
We first analyze cases as stated in Situation 3 and exhibit the results in 
Table A1, 
Table A2, 
Table A3 and 
Table A4, displayed in 
Appendix A, which report the rejection frequency of the 
T test when 
. From 
Table A1, 
Table A2, 
Table A3 and 
Table A4, one can observe that when the values of both 
 and 
 as stated in Situation 3 range from 
 to 
, the rejection rate is about 0.0000 for any 
n and for any error distribution studied in our paper, except when the error term follows a 
, in which case the rejection rates are close to 0.0004.
We then analyze the cases as stated in Situation 4 and show the results in 
Table A5, 
Table A6, 
Table A7 and 
Table A8, displayed in 
Appendix B, which report the rejection frequency of the 
T test when 
. Similarly, from 
Table A5, 
Table A6, 
Table A7 and 
Table A8, one can observe that when the values of both 
 and 
 as stated in Situation 4 are between 
 and 
, the rejection rate is zero or close to zero for any 
n and any error distribution studied in our paper.
Our analysis shows that for all the cases in which the values of 
 follow Situations 3 and 4 and both 
 and 
 are between 
 and 
, the rejection rates are much smaller than the 5% level of significance. This implies that none of the corresponding regressions encounters a spurious problem in the cases simulated in our paper, confirming that Conjecture 1 holds. In other words, our analysis shows that when independent 
 and 
 follow nearly nonstationary AR(1) models and the autoregressive coefficients 
 and 
 have opposite signs, there is no spurious problem in the regression stated in Equation (
1).
  4.2. Simulation for the Cases When  and  Are of the Same Sign
We turn to examine whether the regression shown in Equation (
1) is spurious for the cases when both 
 and 
 are of the same signs; that is, both 
 and 
 are positive or both are negative. To do so, we follow Algorithm 1 to conduct simulations for the cases when both 
 and 
 are positive and both are negative as displayed in Situations 5 and 6 and exhibit the results in 
Table A9, 
Table A10, 
Table A11, 
Table A12, 
Table A13, 
Table A14, 
Table A15 and 
Table A16, displayed in 
Appendix C and 
Appendix D, respectively.
We first discuss the cases when both 
 and 
 are positive as stated in Situation 5. Compared with the results in 
Table A1, 
Table A2, 
Table A3, 
Table A4, 
Table A5, 
Table A6, 
Table A7 and 
Table A8, all of the rejection rates in 
Table A9, 
Table A10, 
Table A11 and 
Table A12 are significantly higher than 5% and the rejection frequency of the 
T test is higher than 49% for any 
n and any error distribution studied in our paper, except when the error term follows 
, in which case the rejection rates are higher than 32%. In addition, as 
n increases, or either 
 or 
 increases, or as the error distributions move further away from the normal distribution, the rejection rate increases even further.
We turn to discuss the cases when both  and  are negative as stated in Situation 6. Similar to the cases when both  and  are positive, when both  and  are negative, the rejection frequency of the T test is higher than 50% for any n and for any error distribution studied in our paper, except when the error term follows , in which case the rejection rates are higher than 31%. In addition, similar to the cases when both  and  are positive, as n increases, or either  or  increases, or as the error distributions move further away from the normal distribution, the rejection rate increases even further.
Our analysis shows that, different from all the cases in which  and  have different signs, for all the cases in which  and  have the same sign, either positive or negative, as stated in Situations 5 and 6, respectively, and in which both  and  are between  and , the rejection rate is much higher than the 5% level of significance and could be higher than 49%. This implies that when  follow Situations 5 and 6 and both  and  are between  and , the chance of the regressions being spurious is very high for all the cases simulated in our paper, which, in turn, rejects Conjecture 1 for all the cases in Situations 5 and 6.
  5. Concluding Remarks
In this paper, we conjecture that under some situations, the regression of two independent and nearly nonstationary series does not have any spurious problem at all. To check whether our conjecture holds, we first generate two independent and nearly nonstationary AR(1) processes,  and  in which . We then regress  on independent  to get  and compute the proportion of rejections of the null hypothesis that the beta () is zero. Consistent with the literature, which finds that regressing two independent and (nearly) nonstationary time series could be spurious, we first find that when  and  have the same sign, either positive or negative, and the values of both  and  are between  and , the rejection rate is much larger than the 5% level of significance in all the cases examined in our simulation and could be higher than 49% in many cases, implying that the chance of the regressions being spurious is very high in all the cases in which  and  have the same sign.
Nonetheless, for all the cases in which  and  have different signs, our results differ from the literature: when both  and  are between  and , the rejection rates are much smaller than the 5% level of significance for all the cases studied in our paper, implying that when  and  have different signs, regressing nearly nonstationary  on independent and nearly nonstationary  does not produce any spurious problem in any of the cases simulated in our paper.
The literature shows that the regression of independent and (nearly) nonstationary time series could result in spurious outcomes. In this paper, we conjecture that under some situations, the regression of two independent and nearly nonstationary series does not have any spurious problem at all, and we aim to find situations in which our conjecture holds. We find that when  or , our conjecture holds. This does not imply that these are the only situations in which our conjecture holds; there could be others, and we leave it to future studies to find them. The purpose of our paper is to tell readers that when one finds a regression of two or more time series that does not have any spurious problem, this does not necessarily imply that the series are not independent. Thus, academics and practitioners should conduct proper tests to show whether the series are independent.
Some academics may wonder whether there are financial or economic time series that exhibit extreme negative autocorrelations. We believe that some financial or economic time series could exhibit positive autocorrelations and some could exhibit negative autocorrelations. We note that the time period used in our paper need not be daily or monthly; it should be set to fit the nature of the time series. It is well known that stock returns could overreact or underreact, which means that they could be positively or negatively autocorrelated and that the true unobserved stock returns are positively or negatively autocorrelated. Whether they are extremely positively or negatively autocorrelated will depend on the particular stocks. In addition, as mentioned above, when  or , our conjecture holds, which does not imply that these are the only situations in which it could hold. There may be other situations in which our conjecture holds. Some financial or economic time series could follow other situations that are yet to be discovered, and thus, the conjecture could be important not only for statistics but also for economics and finance. We also note that in our paper, we only consider  to cover nearly nonstationary series but do not cover the situations  and . We do not cover the situation  because it has been well studied in the literature. On the other hand, we do not cover the situation  because, we believe, it is of no practical relevance.
To the best of our knowledge, this paper is the first to find that under some situations, the regression of two independent and nearly nonstationary series does not have any spurious problem at all. We follow 
Granger and Newbold (
1974) and others to provide simulation results to show our discovery. Academics could follow 
Phillips  (
1986), 
Johansen and Schaumburg  (
1999), and others to provide a formal proof of the finding in our paper, replacing Brownian motions with OU processes with 
 to approximate 
.
We will leave it to further research to develop the theoretical results to explain the phenomena discovered in this paper. We also note that in this paper, we get very good results by using 
. One may get good results by using the near-integrated approach. We will leave this to future studies.
Another limitation of our study is that the test suffers from serious under-rejection. Further study could examine this problem and correct the test properly.