#### 4.1. A SAD Stock Market Cycle

In empirical finance, a large number of market anomalies have been identified, where it is claimed that a stock market is systematically influenced by factors unrelated to market fundamentals. This evidence is at odds with the efficient market hypothesis, which is a cornerstone of modern finance theory. Central to this literature are the findings that investors’ mood systematically affects stock returns. For example, it is hypothesized that less sunlight or more cloudiness negatively affects investors’ mood, which in turn exerts a negative impact on stock market returns. The seminal papers in this area include Saunders (1993), Hirshleifer and Shumway (2003), and Kamstra et al. (2003). However, as Kim (2017) reports, the studies in this area typically show negligible effects with high statistical significance, accompanied by large sample sizes and negligible ${R}^{2}$ values.

Kamstra et al. (2003) study the effect of depression linked with seasonal affective disorder (SAD) on stock returns. They claim that, through the link between SAD and depression, and the link between depression and risk aversion, seasonal variation in the length of day can translate into seasonal variation in equity returns. They consider a regression model of the following form:

$${R}_{t}={\gamma}_{0}+{\gamma}_{1}{R}_{t-1}+{\gamma}_{2}{R}_{t-2}+{\gamma}_{3}{M}_{t}+{\gamma}_{4}{T}_{t}+{\gamma}_{5}SA{D}_{t}+{\gamma}_{6}{F}_{t}+{\gamma}_{7}{C}_{t}+{\gamma}_{8}{P}_{t}+{\gamma}_{9}{G}_{t}+{u}_{t},\qquad (9)$$

where ${R}_{t}$ denotes the stock return in percentage on day $t$; $M$ is a dummy variable for Monday; $T$ a dummy for the last trading day or the first five trading days of the tax year; $F$ a dummy for fall; $C$ cloud cover; $P$ precipitation; and $G$ temperature. $SA{D}_{t}$ is a measure of seasonal depression, which takes the value of ${H}_{t}-12$, where ${H}_{t}$ represents the time from sunset to sunrise, if day $t$ is in the fall or winter, and 0 otherwise.
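
The definition of $SA{D}_{t}$ can be sketched in Python as follows. This is a minimal illustration only: the function name `sad_measure` and its arguments are hypothetical, and the night length ${H}_{t}$ is assumed to be supplied in hours from some external source.

```python
def sad_measure(night_hours: float, in_fall_or_winter: bool) -> float:
    """Return SAD_t = H_t - 12 in fall or winter, and 0 otherwise.

    night_hours is H_t, the time from sunset to sunrise in hours
    (assumed to be computed elsewhere, e.g. from latitude and date).
    """
    return night_hours - 12.0 if in_fall_or_winter else 0.0


# Hypothetical inputs: a 14.5-hour winter night and a 9-hour summer night.
winter_value = sad_measure(14.5, True)
summer_value = sad_measure(9.0, False)
```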

Kamstra et al. (2003; p. 326) argue that lower returns should commence with autumn because depressed investors shun risk and rebalance their portfolios in favor of safer assets (i.e., ${\gamma}_{6}<0$). This is followed by abnormally higher returns when days begin to lengthen and SAD-affected investors begin resuming their risky holdings (i.e., ${\gamma}_{5}>0$). They use daily index return data from markets around the world: U.S. (S&P 500, NYSE, NASDAQ, AMEX), Sweden, U.K., Germany, Canada, New Zealand, Japan, Australia, and South Africa. They report, for nearly all markets, that the estimate of ${\gamma}_{5}$ is positive and statistically significant at a conventional level of significance, and that of ${\gamma}_{6}$ is negative and statistically significant. These results are the basis of their evidence for the existence of SAD effects around the world. However, the results are based on tests of a point null hypothesis at a conventional level of significance under large sample sizes, a practice about which Rao and Lovric (2016), among others, have raised concerns. In this section, we evaluate the regression results of Kamstra et al. (2003) using the interval-based tests.

#### 4.1.1. Evaluating the Results of Kamstra et al.

We first conduct the interval tests using the regression results reported in Kamstra et al. (2003). Table 1 reports the sample size ($T$) and ${R}^{2}$ values of the regression (9), reproduced from Kamstra et al. (2003; Tables 2 and 4A–C). From these values, we calculate the $F$-statistic for the null hypothesis that all slope coefficients are jointly zero (${H}_{0}:{\gamma}_{1}=\dots ={\gamma}_{9}=0$), as reported in Table 1. The $CR$ column reports the 5% critical values from the central $F$ distributions, which are around 1.88 regardless of sample size. The null hypothesis of the $F$-test is clearly rejected for all markets at a conventional significance level, which indicates that the slope coefficients of regression (9) are jointly statistically significant. However, this is at odds with the negligible ${R}^{2}$ values reported in Table 1, which indicate little predictive power for all markets.
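
The $F$-statistic for joint significance can be recovered from ${R}^{2}$ and the sample size alone, using the standard identity $F=({R}^{2}/K)/((1-{R}^{2})/(T-K-1))$ for a regression with $K$ slope coefficients and an intercept. A minimal Python sketch follows; the function name and the example inputs are illustrative assumptions, not values taken from Table 1.

```python
def f_stat_from_r2(r2: float, n_obs: int, n_slopes: int) -> float:
    """F-statistic for H0: all slope coefficients are zero, from R^2 alone."""
    if not 0.0 <= r2 < 1.0:
        raise ValueError("r2 must lie in [0, 1)")
    df_resid = n_obs - n_slopes - 1  # residual degrees of freedom
    return (r2 / n_slopes) / ((1.0 - r2) / df_resid)


# Hypothetical example: tiny R^2, large sample, nine slope coefficients.
f_large_sample = f_stat_from_r2(r2=0.01, n_obs=10_000, n_slopes=9)
# The same R^2 with a much smaller sample gives a much smaller F.
f_small_sample = f_stat_from_r2(r2=0.01, n_obs=500, n_slopes=9)
```

For a fixed ${R}^{2}$, the statistic grows roughly in proportion to the sample size, which is how a negligible ${R}^{2}$ can coexist with a decisive rejection against a central $F$ critical value of about 1.88.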

Suppose that, for a regression model for stock return to be economically significant, it should explain at least 5% of the return variation. That is, we test ${H}_{0}:0\le {R}_{p1}^{2}\le 0.05$ against ${H}_{1}:{R}_{p1}^{2}>0.05$. The column labeled $C{R}_{2}$ reports the 5% critical values associated with $F(J,T-K-1;{\lambda}_{max})$, where the value of ${\lambda}_{max}$ is associated with ${R}_{p1}^{2}=0.05$ (and ${R}_{p0}^{2}=0$). According to these critical values, the null hypothesis of an economically negligible effect cannot be rejected for any market index except US4. The critical values listed in column $C{R}_{1}$ are those associated with ${H}_{0}:0\le {R}_{p1}^{2}\le 0.01$, which deliver rejection in four markets only. If we test ${H}_{0}:0\le {R}_{p1}^{2}\le 0.1$, the critical values in the column labeled $C{R}_{3}$ indicate that the predictive power of the estimated models is economically negligible for all markets.

The economic significance of the magnitude of the regression coefficients reported in Kamstra et al. (2003) is also questionable. For example, for the U.S. market with the S&P 500 index (US1), ${\widehat{\gamma}}_{6}=-0.058$ and its 90% confidence interval is $[-0.10,-0.01]$. The point estimate means that the stock return is on average lower by 0.058% during the autumn period. Suppose that, for a factor to have an economically meaningful impact on stock return, its marginal effect should be at least 0.5% (either positive or negative) to justify transaction costs. Then, one can formulate the null hypothesis of an economically negligible effect as ${H}_{0}:-0.5\le {\gamma}_{6}\le 0.5$. The 90% confidence interval is clearly within this bound, so we do not reject ${H}_{0}$ at the 5% level of significance. The same inferential outcomes apply to all the other regression coefficients of (9) reported in Kamstra et al. (2003). Note that, depending on the attitude of the researcher, one can formulate the null hypothesis as ${H}_{0}:({\gamma}_{6}<-0.5)\cup ({\gamma}_{6}>0.5)$, but it is also clearly rejected at the 5% level in favor of a negligible effect. Although Kamstra et al. (2003) justify their effect size using the annualized return, this annualized return does not take account of the underlying volatility of stock returns or the trading costs involved.
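
The comparison of a 90% confidence interval with the band of negligible effects can be written as a simple decision rule. The sketch below is illustrative: the function name and the three labels are hypothetical, while the interval $[-0.10,-0.01]$ and the bound of 0.5 are the values quoted above.

```python
def classify_effect(ci_low: float, ci_high: float, bound: float) -> str:
    """Compare a confidence interval with the band [-bound, bound]."""
    if -bound <= ci_low and ci_high <= bound:
        return "negligible"        # interval lies entirely inside the band
    if ci_high < -bound or ci_low > bound:
        return "non-negligible"    # interval lies entirely outside the band
    return "inconclusive"          # interval straddles a band limit


# 90% confidence interval for gamma_6 (US1) against the 0.5% threshold.
result_us1 = classify_effect(-0.10, -0.01, 0.5)
```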

#### 4.1.2. Replicating the Results of Kamstra et al.

We now replicate model (9) using the value-weighted daily returns from the NYSE composite index (CRSP). The SAD variable and other dummy variables are generated following Kamstra et al. (2003), using the R programming language (R Core Team 2017). The data for the weather variables ($C$, $P$, and $G$) are collected from the National Centers for Environmental Information.² Our data for the regression range from January 1965 to April 1996 (7886 observations), due to the limited availability of the weather data ($C$) for New York. We obtain the following estimates for the key coefficients: ${\widehat{\gamma}}_{5}=0.032$ with a $t$-statistic of 2.29; ${\widehat{\gamma}}_{6}=-0.055$ with a $t$-statistic of $-2.17$; and ${R}^{2}=0.05$. These values are fairly close to those reported in Table 4A of Kamstra et al. (2003).

We first consider the point null hypothesis ${H}_{0}:{\gamma}_{5}={\gamma}_{6}=0$ for joint significance of the SAD effects. The $F$-statistic is 3.18 with a $p$-value of 0.04, rejecting ${H}_{0}$ at the 5% significance level. This is despite the observation that the incremental contribution of these two variables is negligible, measured by ${R}_{1}^{2}-{R}_{0}^{2}=0.0008$ with ${R}_{1}^{2}=0.0501$ and ${R}_{0}^{2}=0.0493$. Next, we consider an interval hypothesis of minimum-effect. Suppose that the incremental contribution of these variables should be at least 0.01 to be economically significant. That is,

$${H}_{0}:0\le {R}_{p1}^{2}-{R}_{p0}^{2}\le 0.01\quad \text{against}\quad {H}_{1}:{R}_{p1}^{2}-{R}_{p0}^{2}>0.01.$$

Assuming ${R}_{p0}^{2}=0.05$, we have ${\lambda}_{max}=83.87$ and the corresponding 5% critical value is 58.97, obtained from $F(J,T-K-1;{\lambda}_{max})$. With this critical value being much larger than the $F$-statistic of 3.18, the above interval null hypothesis of minimum-effect cannot be rejected at the 5% level, providing evidence that the SAD cycle is economically negligible in the U.S. stock market.
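
The critical value of a non-central $F$ distribution can be approximated by simulation. The Python sketch below makes several assumptions that are not stated in this section: the non-centrality parameter is taken to be $T({R}_{p1}^{2}-{R}_{p0}^{2})/(1-{R}_{p1}^{2})$, the denominator degrees of freedom are taken to be $T-K-1$ with $K=9$, and the denominator chi-square is replaced by a normal approximation because its degrees of freedom are large. All function names are hypothetical.

```python
import math
import random


def lambda_max(n_obs: int, r2_base: float, r2_increment: float) -> float:
    """Non-centrality implied by an incremental R^2 (assumed formula)."""
    r2_full = r2_base + r2_increment
    return n_obs * r2_increment / (1.0 - r2_full)


def ncf_critical_value(df_num, df_den, nc, level=0.05, draws=100_000, seed=1):
    """Monte Carlo upper critical value of a non-central F(df_num, df_den; nc)."""
    rng = random.Random(seed)
    shift = math.sqrt(nc)
    sims = []
    for _ in range(draws):
        # Non-central chi-square: one shifted normal plus df_num - 1 central ones.
        num = (rng.gauss(0.0, 1.0) + shift) ** 2
        num += sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(df_num - 1))
        # chi-square(df_den) / df_den, normal approximation for large df_den.
        den = 1.0 + math.sqrt(2.0 / df_den) * rng.gauss(0.0, 1.0)
        sims.append((num / df_num) / den)
    sims.sort()
    return sims[int((1.0 - level) * draws)]


lam = lambda_max(n_obs=7886, r2_base=0.05, r2_increment=0.01)
crit = ncf_critical_value(df_num=2, df_den=7876, nc=lam)
reject_negligible = 3.18 > crit  # compare the reported F-statistic
```

Under these assumptions the non-centrality parameter comes out at roughly 83.9, close to the reported 83.87, and the simulated critical value should land near the reported 58.97, far above the $F$-statistic of 3.18.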

#### 4.2. Empirical Validity of an Asset-Pricing Model

An asset-pricing model explains the variation of asset returns as a function of a range of risk factors. The most fundamental is the capital asset pricing model (CAPM), which stipulates that an asset (excess) return is a linear function of the market (excess) return. The slope coefficient (often called beta) measures the sensitivity of an asset return to market risk. While the CAPM is theoretically motivated, market risk alone cannot fully explain the variation of asset returns. In response to this, several multi-factor models have been proposed, which augment the CAPM with a number of empirically motivated risk factors such as the size premium or value premium (see, for example, Fama and French 1993). The most recently proposed multi-factor model is the five-factor model of Fama and French (2015), which can be written as

$${R}_{it}-{R}_{ft}={a}_{i}+{b}_{i}({R}_{Mt}-{R}_{ft})+{s}_{i}SM{B}_{t}+{h}_{i}HM{L}_{t}+{r}_{i}RM{W}_{t}+{c}_{i}CM{A}_{t}+{e}_{it},\qquad (10)$$

where ${R}_{it}$ is the return on asset or portfolio $i$ at time $t$ ($i=1,\dots ,N$; $t=1,\dots ,T$), ${R}_{ft}$ is the risk-free rate, ${R}_{Mt}$ is the return on a (value-weighted) market portfolio at time $t$, $SM{B}_{t}$ is the return on a diversified portfolio of small stocks minus the return on a diversified portfolio of big stocks, $HM{L}_{t}$ is the spread in returns between diversified portfolios of high book-to-market stocks and low book-to-market stocks, $RM{W}_{t}$ is the spread in returns between diversified portfolios of stocks with robust and weak profitability, and $CM{A}_{t}$ is the spread in returns between diversified portfolios of low and high investment firms. The precursors to this 5-factor model include the 3-factor model of Fama and French (1993), which includes $({R}_{Mt}-{R}_{ft})$, $SMB$, and $HML$; and the 4-factor model of Carhart (1997), which adds a momentum factor ($MOM$) to the 3-factor model. If these factors fully or adequately capture the variation of asset returns, then the intercept terms ${a}_{i}$ (which may be interpreted as the risk-adjusted return) should be zero or sufficiently close to it. On this basis, the model’s empirical validity is evaluated by testing ${H}_{0}:{a}_{1}=\dots ={a}_{N}=0$, which is a point-null hypothesis.

#### 4.2.1. GRS Test: Minimum-Effect

The $F$-test for ${H}_{0}$ is widely called the GRS test, proposed by Gibbons et al. (1989). Let $a={({a}_{1},\dots ,{a}_{N})}^{\prime}$ be the vector of $N$ intercept terms, and $\mathsf{\Sigma}$ be the $N\times N$ covariance matrix of error terms. The model (10) is estimated using ordinary least squares: $\widehat{a}$ denotes the estimator for $a$ and $\widehat{\mathsf{\Sigma}}$ the estimator for $\mathsf{\Sigma}$. The $F$-test statistic is written as

$$F=\frac{T-N-K}{N}\,\frac{{\widehat{a}}^{\prime}{\widehat{\mathsf{\Sigma}}}^{-1}\widehat{a}}{1+{\widehat{\mu}}^{\prime}{\widehat{\mathsf{\Omega}}}^{-1}\widehat{\mu}},$$

where $T$ is the sample size, $K=5$ is the number of risk factors, $\widehat{\mathsf{\Omega}}$ is the $K\times K$ covariance matrix of risk factors, and $\widehat{\mu}$ is the $K\times 1$ mean vector. Under the assumption that the error terms $e$ follow a multivariate normal distribution, the statistic follows the $F(N,T-N-K;\lambda )$ distribution, with the non-centrality parameter

$$\lambda =\frac{T}{1+{\widehat{\theta}}^{2}}\,{a}^{\prime}{\mathsf{\Sigma}}^{-1}a=\frac{T}{1+{\widehat{\theta}}^{2}}\,{\theta}^{2}\left[{\left(\frac{\theta}{{\theta}^{\ast}}\right)}^{-2}-1\right],$$

where $\widehat{\theta}$ is the ex-post maximum Sharpe ratio of the $K$-factor portfolio, $\theta $ is the ex-ante maximum Sharpe ratio of the $K$-factor portfolio, and ${\theta}^{\ast}$ is the slope of the ex-ante efficient frontier based on all assets. Gibbons et al. (1989) call $\theta /{\theta}^{\ast}$ the proportion of the potential efficiency. Note that, under ${H}_{0}$, this ratio is equal to one and $\lambda =0$.

However, perfect efficiency cannot exist in practice. It is unrealistic that all of the ${a}_{i}$ values are jointly and exactly zero. On this point, it is sensible to consider interval-based hypothesis testing. For example, consider ${H}_{0}:0.75<\theta /{\theta}^{\ast}\le 1$ against ${H}_{1}:\theta /{\theta}^{\ast}<0.75$. This is based on the judgment that factors with a proportion of potential efficiency of 0.75 or higher provide practically efficient asset pricing.
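
As a worked illustration of how the boundary value $\theta /{\theta}^{\ast}=0.75$ maps into a non-centrality parameter, assume the standard result of Gibbons et al. (1989) that ${a}^{\prime}{\mathsf{\Sigma}}^{-1}a={\theta}^{\ast 2}-{\theta}^{2}$ and that $\lambda =T\,{a}^{\prime}{\mathsf{\Sigma}}^{-1}a/(1+{\widehat{\theta}}^{2})$. This is a sketch under that assumption, with $\theta $ treated as given, and it may differ in detail from the computation behind Table 2:

$${\lambda}_{max}=\frac{T}{1+{\widehat{\theta}}^{2}}\,{\theta}^{2}\left[{\left(\frac{1}{0.75}\right)}^{2}-1\right]=\frac{T}{1+{\widehat{\theta}}^{2}}\cdot \frac{7}{9}\,{\theta}^{2}.$$

A stricter boundary (a ratio closer to one) shrinks the bracketed term toward zero, so ${\lambda}_{max}$ approaches the point-null value of zero.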

The monthly data are available from French’s data library for 1963 to 2015 ($T=630$).³ We use 25 portfolio returns ($N=25$) sorted by size and book-to-market ratio, extensively analyzed by Fama and French (1993, 2015). Table 2 reports the test results. The GRS test clearly rejects ${H}_{0}:{a}_{1}=\dots ={a}_{N}=0$ for all models considered, with the $p$-value (not reported) practically 0 in all cases. The critical values of this test (from the central $F$ distributions) are listed in the column labeled $CR$. These results suggest that none of the asset-pricing models is able to fully capture asset return variations. This is at odds with the high values of ${R}^{2}$ and small values of $\left|a\right|$, especially for the multi-factor models. For the 4-factor and 5-factor models, the estimated ratio of potential efficiency is much higher than for the other models, close to 0.7.

Table 2 also reports the critical values ($C{R}_{2}$) for ${H}_{0}:0.75<\theta /{\theta}^{\ast}\le 1$, which are calculated from the $F(N,T-N-K;{\lambda}_{max})$ distribution with the value of ${\lambda}_{max}$ implied by $\theta /{\theta}^{\ast}=0.75$. It is found that, for the 4-factor and 5-factor models, ${H}_{0}:0.75<\theta /{\theta}^{\ast}\le 1$ cannot be rejected at the 5% level of significance. This suggests that these multi-factor models have captured the variation of asset returns adequately, with economically negligible deviation from perfect efficiency. For the CAPM and 3-factor models, the interval-based ${H}_{0}$ is rejected at the 5% level, but this seems consistent with the estimated values of potential efficiency, which are less than 0.5 in both cases. It is worth noting that the critical values $CR$ for the point-null hypothesis (based on the central $F$-distribution) are nearly identical in all cases, regardless of estimation results such as ${R}^{2}$ and $\left|a\right|$. However, those for the interval-based tests differ, depending on the model estimation results.

#### 4.2.2. LR Test: Model Equivalence

We now test the validity of the asset-pricing models using the model equivalence test discussed in Section 3.6. We calculate the LR test statistic given in (8) for ${H}_{0}:{a}_{1}=\dots ={a}_{N}=0$, which is written as

$$LR=T\left[\mathrm{log}\left|\widehat{\mathsf{\Sigma}}({H}_{0})\right|-\mathrm{log}\left|\widehat{\mathsf{\Sigma}}({H}_{1})\right|\right],$$

where $\widehat{\mathsf{\Sigma}}({H}_{i})$ denotes the maximum likelihood estimator for $\mathsf{\Sigma}$ under ${H}_{i}$. For the model equivalence test given in (7), the above LR statistic follows the ${\chi}_{N,{\delta}^{2}}^{2}$ distribution with ${\delta}^{2}=T{\Delta}^{2}$. Using the same data set as in Section 4.2.1, the LR statistic is 105.67, 88.05, 75.82, and 69.26 for the CAPM, 3-factor model, 4-factor model, and 5-factor model, respectively. If we set ${\Delta}^{2}$ to 0.1, the 5% critical value is 61.12, indicating that ${H}_{0}$ is not approximately valid for any of the models. If we set ${\Delta}^{2}$ to 0.15, the 5% critical value is 87.19, indicating that ${H}_{0}$ is approximately valid only for the 4-factor and 5-factor models. If we set ${\Delta}^{2}$ to 0.20, the 5% critical value is 114.00, indicating that ${H}_{0}$ is approximately valid for all models. The results are therefore sensitive to the choice of ${\Delta}^{2}$. However, at a reasonable value of ${\Delta}^{2}=0.15$, the results are consistent with the minimum-effect test based on the GRS test conducted above.
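
The decision rule of the model equivalence test can be summarized in a few lines of Python. The sketch below uses only the LR statistics and critical values reported above; the variable and function names are hypothetical, and the rule that ${H}_{0}$ is judged approximately valid when the LR statistic falls below the critical value is inferred from the outcomes described in the text.

```python
LR_STATS = {"CAPM": 105.67, "3-factor": 88.05, "4-factor": 75.82, "5-factor": 69.26}
CRITICAL_VALUES = {0.10: 61.12, 0.15: 87.19, 0.20: 114.00}  # keyed by Delta^2
T_OBS = 630  # monthly observations


def noncentrality(delta_sq: float, n_obs: int = T_OBS) -> float:
    """Non-centrality parameter delta^2 = T * Delta^2."""
    return n_obs * delta_sq


def approximately_valid(delta_sq: float) -> list:
    """Models whose LR statistic falls below the critical value for delta_sq."""
    crit = CRITICAL_VALUES[delta_sq]
    return [name for name, lr in LR_STATS.items() if lr < crit]


valid_by_margin = {d: approximately_valid(d) for d in CRITICAL_VALUES}
```

Widening the equivalence margin ${\Delta}^{2}$ raises the critical value, so the set of models judged approximately valid can only grow as the margin increases.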

#### 4.3. Testing for Persistence of a Time Series

The presence of a unit root in economic and financial time series has strong implications for many economic theories and their empirical validity (see Choi 2015). For example, a unit root in the real exchange rate is evidence that purchasing power parity does not hold (Lothian and Taylor 1996); and a unit root in the real GNP supports the view that a shock to the economy has a permanent effect, which is not consistent with the traditional (or Keynesian) view of the business cycle (Campbell and Mankiw 1987). To test this hypothesis, the unit root test proposed by Dickey and Fuller (1979) has been widely used, and a large number of extensions and improvements have been proposed. The augmented Dickey–Fuller (ADF) test for a time series $Y$ is based on a regression of the form

$$\Delta {Y}_{t}={\delta}_{0}+{\delta}_{1}t+\theta {Y}_{t-1}+\sum _{j=1}^{m-1}{\rho}_{j}\Delta {Y}_{t-j}+{u}_{t},\qquad (13)$$

where $\Delta {Y}_{t}={Y}_{t}-{Y}_{t-1}$; $m$ is the autoregressive (AR) order of $Y$; and ${u}_{t}$ is an $i.i.d.$ error term with zero mean and fixed variance. Note that $\theta \equiv \tau -1$, where $\tau $ is the sum of all AR($m$) coefficients in the level of $Y$, measuring the degree of persistence. The test for a unit root is based on the point-null hypothesis ${H}_{0}:\theta =0$ against ${H}_{1}:\theta <0$. Under ${H}_{0}$, the $t$-test statistic asymptotically follows the Dickey–Fuller distribution, from which the critical values of the test are obtained. Under ${H}_{1}$, the $t$-test statistic asymptotically follows the standard normal distribution.

The problems of the unit root test are well documented (see, for example, Choi 2015). The most well-known is its low power (at a conventional significance level), which means that there is a high chance of committing a Type II error (failure to reject a false null hypothesis). On this point, Kim and Choi (2017) propose the unit root test at the optimal level of significance, which is obtained by minimizing the expected loss from hypothesis testing. They find that the optimal level is in the range of 0.3 to 0.4 for many economic time series, arguing that the exclusive use of the 0.05 level has led to an accumulation of false stylized facts. The other problem of the test is the discontinuity of the sampling distributions of the test statistic under ${H}_{0}$ and ${H}_{1}$. This makes the decision highly sensitive to the value specified under ${H}_{0}$.

More importantly, as discussed in Section 2.3, it is unrealistic to assume that an economic time series such as the real GNP or real exchange rate has an autoregressive root exactly equal to one. An economist may wish to test whether a time series shows a degree of persistence practically different from that of a unit root time series. The test can be conducted in the context of the non-inferiority test discussed in the previous section. To do this, we need to find the value of $\tau $ or $\theta $ under which a time series shows a practically different degree of persistence from a unit root time series. According to DeJong et al. (1992), plausible values of $\tau $ under ${H}_{1}:\theta <0$ are 0.85, 0.95, and 0.99 for annual, quarterly, and monthly data, respectively, which translate to $\theta $ values of $-0.15$, $-0.05$, and $-0.01$. On this basis, we test for the persistence of a time series using the following interval hypotheses:

$${H}_{0}:\theta \le {\theta}_{1}\quad \text{against}\quad {H}_{1}:\theta >{\theta}_{1},$$

where ${\theta}_{1}\in \{-0.15,-0.05,-0.01\}$ depending on the data frequency. The time series is practically trend-stationary under this ${H}_{0}$. This test is a standard one-sample $t$-test whose statistic asymptotically follows the standard normal distribution. However, we note that the least-squares estimator for $\tau $ or $\theta $ is biased in small samples, which may adversely affect the small-sample properties of the test. As an alternative to the non-inferiority test, we also use the bias-corrected bootstrap confidence interval for $\theta $ for improved statistical inference, similar to those of Kilian (1998a, 1998b) and Kim (2004).
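
The one-sided $t$-test described above can be sketched in Python using only the standard library. The function name and the example estimate and standard error are hypothetical and are not taken from Table 3; the critical value comes from the standard normal distribution, as stated in the text.

```python
from statistics import NormalDist


def persistence_test(theta_hat, std_err, theta_1, level=0.05):
    """One-sided test of H0: theta <= theta_1 against H1: theta > theta_1."""
    t_stat = (theta_hat - theta_1) / std_err
    crit = NormalDist().inv_cdf(1.0 - level)  # about 1.645 at the 5% level
    return t_stat, crit, t_stat > crit


# Hypothetical annual series: theta_hat = -0.05 with standard error 0.04.
t_stat, crit, reject = persistence_test(-0.05, 0.04, theta_1=-0.15)
```

In this hypothetical case the statistic exceeds the critical value, so the series would be judged more persistent than a practically trend-stationary one; a statistic below the critical value would leave ${H}_{0}$ standing.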

For a time series $({Y}_{1},\dots ,{Y}_{T})$, we first estimate the parameters of model (13) using the bias-corrected estimators. Let $({\widehat{\delta}}_{0},{\widehat{\delta}}_{1},\widehat{\theta},{\widehat{\rho}}_{1},\dots ,{\widehat{\rho}}_{m-1})$ be the bias-corrected estimators; and let $\{{e}_{t}\}$ denote the corresponding residuals. Generate the artificial data set as

$${Y}_{t}^{\ast}={\widehat{\delta}}_{0}+{\widehat{\delta}}_{1}t+{\widehat{\beta}}_{1}{Y}_{t-1}^{\ast}+\dots +{\widehat{\beta}}_{m}{Y}_{t-m}^{\ast}+{e}_{t}^{\ast},$$

using $({Y}_{1},\dots ,{Y}_{m})$ as the starting values, where ${e}_{t}^{\ast}$ is a random draw with replacement from ${\{{e}_{t}\}}_{t=m+1}^{T}$ and $({\widehat{\beta}}_{1},\dots ,{\widehat{\beta}}_{m})$ are the AR coefficients in level associated with $(\widehat{\theta},{\widehat{\rho}}_{1},\dots ,{\widehat{\rho}}_{m-1})$. Using ${\{{Y}_{t}^{\ast}\}}_{t=1}^{T}$, estimate the $AR(m)$ coefficients, again with bias correction, to obtain $({\widehat{\delta}}_{0}^{\ast},{\widehat{\delta}}_{1}^{\ast},{\widehat{\beta}}_{1}^{\ast},\dots ,{\widehat{\beta}}_{m}^{\ast})$. For bias correction, we use the asymptotic formula of Shaman and Stine (1988) with stationarity correction, following Kilian (1998b) and Kim (2004). We obtain ${\widehat{\theta}}^{\ast}={\widehat{\tau}}^{\ast}-1$, where ${\widehat{\tau}}^{\ast}={\sum}_{j=1}^{m}{\widehat{\beta}}_{j}^{\ast}$. Repeat this process $B$ times to obtain the bootstrap distribution ${\{{\widehat{\theta}}^{\ast}(j)\}}_{j=1}^{B}$, which can be used as an approximation to the sampling distribution of $\widehat{\theta}$. If the confidence interval for $\theta $ obtained from ${\{{\widehat{\theta}}^{\ast}(j)\}}_{j=1}^{B}$ covers ${\theta}_{1}$, then this is evidence that the time series shows a degree of persistence practically no different from that of a trend-stationary time series.
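
The resampling loop can be illustrated with a deliberately simplified Python sketch. It departs from the procedure above in several ways that should be kept in mind: it uses an AR(1) model with an intercept only (no trend and no higher lags), plain least squares without the Shaman and Stine bias correction, and a percentile interval instead of a bias-corrected one. The function names and the simulated series are hypothetical.

```python
import random


def fit_ar1(y):
    """OLS of y_t on a constant and y_{t-1}: (intercept, slope, residuals)."""
    x, z = y[:-1], y[1:]
    n = len(x)
    mx, mz = sum(x) / n, sum(z) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxz = sum((xi - mx) * (zi - mz) for xi, zi in zip(x, z))
    slope = sxz / sxx
    intercept = mz - slope * mx
    resid = [zi - intercept - slope * xi for xi, zi in zip(x, z)]
    return intercept, slope, resid


def bootstrap_theta_interval(y, n_boot=999, level=0.95, seed=0):
    """Percentile bootstrap interval for theta = tau - 1 in an AR(1) model."""
    rng = random.Random(seed)
    intercept, slope, resid = fit_ar1(y)
    thetas = []
    for _rep in range(n_boot):
        y_star = [y[0]]  # original observation as the starting value
        for _step in range(len(y) - 1):
            e_star = rng.choice(resid)  # residual drawn with replacement
            y_star.append(intercept + slope * y_star[-1] + e_star)
        thetas.append(fit_ar1(y_star)[1] - 1.0)
    thetas.sort()
    tail = (1.0 - level) / 2.0
    return slope - 1.0, thetas[int(tail * n_boot)], thetas[int((1.0 - tail) * n_boot)]


# Simulated stationary AR(1) series with coefficient 0.5 (true theta = -0.5).
rng = random.Random(42)
series = [0.0]
for _ in range(199):
    series.append(0.5 * series[-1] + rng.gauss(0.0, 1.0))

theta_hat, ci_low, ci_high = bootstrap_theta_interval(series)
covers_theta_1 = ci_low <= -0.15 <= ci_high  # check against theta_1 = -0.15
```

The final line applies the coverage check described above: if the interval contains ${\theta}_{1}$, the series is treated as practically no more persistent than a trend-stationary one.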

Table 3 reports the results from the extended Nelson and Plosser (1982) data for a set of annual U.S. macroeconomic time series, setting ${\theta}_{1}=-0.15$. Firstly, the ADF test (a point-null hypothesis test) provides $p$-values larger than 0.05 for most of the time series, providing evidence that many macroeconomic time series have a unit root. In contrast, the $t$-test (non-inferiority test) results for ${H}_{0}:\theta \le -0.15$ against ${H}_{1}:\theta >-0.15$ show that we clearly cannot reject this ${H}_{0}$ at the 5% level of significance (asymptotic critical value 1.645) for the real GNP, real per capita GNP, industrial production, employment, and unemployment rate, providing evidence that these time series are practically trend-stationary. As for the bootstrap inference, it is found that the 95% bias-corrected bootstrap confidence interval for $\theta $ covers $-0.15$ for the real GNP, real per capita GNP, industrial production, employment, unemployment rate, real wage, and interest rate, indicating that these time series show a degree of persistence practically equal to that of a trend-stationary time series. The two alternative methods agree in their inferential outcomes, except for real wage and interest rate.

The results for the test of persistence based on the non-inferiority test are largely consistent with those of Kim and Choi (2017), who re-evaluate the ADF test results at the optimal level of significance and report evidence that the real GNP, real per capita GNP, employment, and money stock do not have a unit root. These results are also largely consistent with the Bayesian evidence of Schotman and van Dijk (1991).