Long Run Returns Predictability and Volatility with Moving Averages

This paper examines how the size of the rolling window, and the frequency used in moving average (MA) trading strategies, affects financial performance when risk is measured. We use the MA rule for market timing, that is, for when to buy stocks and when to shift to the risk-free rate. The important issue regarding the predictability of returns is assessed. It is found that performance improves, on average, when the rolling window is expanded and the data frequency is low. However, when the size of the rolling window reaches three years, the frequency loses its significance and all frequencies considered produce similar financial performance. Therefore, the results support stock returns predictability in the long run. The procedure takes account of the issues of variable persistence as we use only returns in the analysis. Therefore, we use the performance of MA rules as an instrument for testing returns predictability in financial stock markets.


Introduction
Gartley introduced the moving average (MA) trading rule to detect stochastic trends in the prices of risky assets. According to the rule, unnecessary price fluctuations are supposedly reduced when the rolling averages are calculated over the price history. If the rolling average is lower (higher) than the current closing price, the rule suggests that an uptrend (downtrend) prevails in risky asset prices. Black (1986) refined the idea of Gartley by assuming that all price fluctuations that are independent of fundamental information concerning the risky assets are noise (that is, the opposite of fluctuations produced by fundamental information). This means that any price variation that has nothing to do with new information regarding risky assets can simply be referred to as noise. This idea has become elementary in the behavioral finance literature that started from Shiller (1981), which asks why asset price fluctuations are more severe than expected fundamentals would account for. Merton (1981) stated that trading rules are usually used to determine when to buy or sell stocks, and called this 'market timing'. According to Merton, market timing by a technical trading rule is useless in efficient financial markets, where its performance should equal that of random market timing. Because of short selling costs or fund manager constraints, the market timing strategy usually reduces to a simple rule that specifies when to buy risky assets, and when to sell them and switch to the risk-free asset.
The phenomenon is important because Menkhoff (2010) reports that 87% of fund managers use technical trading rules in their investment decisions, and tend to use the weekly horizon as the time frame. According to Zhu and Zhou (2009), the MA200 daily rule is the most popular trend chasing rule in practice. This market timing rule means that the rolling window is 200 trading days, and every trading day is included in the calculation of the historical average. Ilomäki et al. (2018) used the Dow Jones Industrial Average (DJIA) stocks from the beginning of 1988 through to the end of 2017, and found that the lower the frequency in the MA rule, the higher the average daily returns, even though average volatilities remained unchanged. The MA was calculated for the following frequencies: daily, weekly, monthly, every other month, every 3rd month, every 4th month, and every 5th month, from the maximum rolling window of 200 days to the smallest. Monthly frequencies were produced, for example, with ten, nine, eight, seven, six, five, four, three, and two monthly observations. The largest rolling window (200 days) produced the best results, on average, at all frequencies, which corresponds, for example, to ten observations at the monthly frequency.
More importantly, the MA200 was found to produce a lower Sharpe ratio than the random market timing strategy, implying that the most popular MA rule among practitioners was useless for risk averse market timing. However, starting from the monthly frequency, that is, every 22nd trading day in the 200 day rolling window, the MA200 Sharpe ratio began to exceed that of the random timing strategy. Moreover, the Sharpe ratio continued to rise when the frequency was reduced. This suggests that the MA rules are more accurate in detecting long term stochastic trends.
The empirical results indicated an anomaly: lower frequency increases returns and Sharpe ratios with relatively unchanged volatility. The anomaly can be explained by the time varying risk premium of aggregate risk averse investors, or by investor affection for high volatility (see Baker et al., 2011). The literature in financial economics discusses stock returns being predictable in the long run, as well as problems raised by the persistence of explanatory variable observations (mainly dividend yields and dividend-price ratios). Our procedure solves the problem by using only returns as observations.
In Ilomäki et al. (2018), the annualized Sharpe ratios were calculated using annualized average volatilities and average returns. This raises a question about conditional volatility, which indicates time-varying risk. Among other issues, this paper tackles that question. In addition, what would happen to the performance if the rolling window size were expanded? What about long term stock return predictability? The null hypothesis is that neither the size of the rolling window nor the frequency explains the performance of MA rules. The empirical results confirm previous empirical findings, namely that reducing the frequency within the rolling window raises returns while the conditional risk remains the same, on average.
In addition, when the rolling window size is expanded, the financial performance improves. However, when the rolling window is 800 trading days (about three years), the significance of the frequency disappears. The results support previous empirical findings in the financial economics literature, namely that stock market returns are predictable in the long run.
Moreover, this empirical finding is free from the non-stationarity issue (as reported in Valkanov (2003) and Boudoukh et al. (2008)) that has been a major problem concerning the long-term predictability of stock returns with dividend yields, or with dividend-price ratios.
The remainder of the paper is as follows. Section 2 presents a literature review. Section 3 discusses the model specification. The empirical tests for expanded rolling windows and the conditional volatility analysis are presented in Section 4. Section 5 gives some concluding comments.

Literature Review
Beginning with the influential work of Fama and French (1988), there is substantial evidence to suggest that stock returns are predictable by dividend yields, by dividend price ratios, or by interest rate term spreads over the longer horizon, that is, from two to four years ahead (see, for example, Campbell and Shiller, 1988; Fama, 1998; Campbell and Cochrane, 1999; Cochrane, 1999; Campbell and Viceira, 1999; and Menzly et al., 2004). Cochrane (1999) notes that stock returns are predictable in the long run over business cycles, whereas daily, weekly and monthly returns remain mainly unpredictable.
However, Valkanov (2003) emphasizes that long term predictability is mainly due to the non-stationarity issues in the regressors, such as in dividend yields and in dividend/price ratios, thereby producing spurious regression results over the longer horizon. More importantly, Cochrane (2011) reports that variations in dividend/price ratios match almost perfectly with variations in discount rates, indicating that changes in risk-free rates and in risk premia can be substituted for the reported non-stationary dividend/price ratios.
In addition, Campbell and Yogo (2006), Ang and Bekaert (2007), Campbell and Thompson (2008), Hjalmarsson (2010), and Maio (2014) show that stock returns are partly predictable mainly through changes in short term interest rates over a short horizon, whereas changes in long term bond yields do not seem to predict stock returns. Obviously, short term predictability is explained by changes in the discount factor in present value models for cash flows for investors from risky assets. In fact, Boudoukh et al. (2008) stress that weak predictability over a short horizon reflects stronger predictability over a long horizon due to persistence in dividend yields and in dividend price ratios.
However, the technique we espouse in the paper for measuring stock returns predictability does not suffer from non-stationarity issues as we only analyze trading strategy returns, and compare the risk and returns that are produced by different MA frequencies and by different rolling window sizes. Brown and Jennings (1987) note that investors use technical trading rules assuming that past prices incorporate useful information. Brock et al. (1992) report that MA rules are valuable for investors, while Sullivan et al. (1999) note that MA rules can become useless when transaction costs are considered. On the other hand, Allen and Karjalainen (1999), Lo et al. (2000), and Zhu and Zhou (2009) report that risk averse investors benefit from MA rules. Neely et al. (2014) and Ilomäki (2018) find, using monthly data, that MA rules are beneficial for investors, and Marshall et al. (2017) draw the same conclusion using daily data on US small stocks. In addition, Ni et al. (2015) report that a combination of two MA rules (in which the so-called dead cross emerges) is useful for investors. However, Hudson et al. (2017) and Yamamoto (2012) conclude that MA rules are totally useless in high-frequency trading.
The financial economics literature stresses that investors are risk averse, which means that they care about the first and second moments of return distributions equally, that is, both returns and variability. This basic assumption of modern financial theory can be traced back to Markowitz (1951) and Tobin (1958). Furthermore, the Capital Asset Pricing Model (CAPM) of Sharpe (1964) and Lintner (1965) indicates that the excess return of any share is linearly and positively dependent on the excess returns of the whole market. Beginning with LeRoy (1973), Merton (1973) and Lucas (1978), time-varying risk-premia have been regarded as rational phenomena because investors are risk averse.
This can lead to a non-linear relationship between risk and returns. Malkiel (2003) states the common wisdom, namely that efficient financial markets do not allow investors to earn above average returns without accepting above average risk. Therefore, market efficiency can be examined by testing Malkiel's claim as a null hypothesis (allowing non-linearity in returns, as the null hypothesis is the buy and hold performance of the market portfolio). Cochrane (2008) emphasizes this by claiming that the time-varying standard deviation of realized returns reflects the time-varying expected excess returns, thereby implying a constant Sharpe ratio over time.
Stock market returns for a share $i$ are assumed to be stationary over time. A traditional approach is to assume that the returns of share $i$ have a constant variance, $\sigma_i^2$, which also implies constant volatility, $\sigma_i$, as volatility is simply the square root of the variance. However, Engle (1982) shows that the conditional variance, $h_t$ (and the conditional volatility, $\sqrt{h_t}$), can vary over time. In the simplest version, this leads to the following ARCH(1) process:

$$h_t = \alpha_0 + \alpha_1 \varepsilon_{t-1}^2,$$

where $\varepsilon_{t-1}$ is the lagged return shock (see [...] (2014)). The unconditional variance is $\alpha_0 / (1 - \alpha_1)$. Bollerslev (1986) generalized the ARCH process to GARCH by adding a lagged conditional variance, $h_{t-s}$, to ARCH, so that GARCH(1,1) is given as:

$$h_t = \alpha_0 + \alpha_1 \varepsilon_{t-1}^2 + \beta_1 h_{t-1}.$$

The conditional volatility can be detected in trading rule returns by using the GARCH(1,1) model. However, Bollerslev and Engle (1986) report that stock market returns may actually exhibit an integrated GARCH (that is, IGARCH) process, resulting in $\alpha_1 + \beta_1 = 1$. If an IGARCH process is identified, the unconditional variance cannot be determined as it will expand linearly in the forecasting horizon.
However, we can still estimate the conditional volatility, for example, one year ahead.
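As a rough illustration of such a forecast, the sketch below iterates the GARCH(1,1) forecast recursion over a horizon of 260 trading days. The function names and all parameter values are hypothetical and chosen only for illustration; they are not estimates from this paper.

```python
import math


def garch_variance_forecasts(alpha0, alpha1, beta1, h_next, horizon):
    """Return k-step-ahead conditional variance forecasts for k = 1..horizon.

    h_next is the one-step-ahead conditional variance h_{t+1}, known at time t.
    For k >= 2 the GARCH(1,1) forecast satisfies
    E_t[h_{t+k}] = alpha0 + (alpha1 + beta1) * E_t[h_{t+k-1}].
    """
    forecasts = [h_next]
    for _ in range(horizon - 1):
        forecasts.append(alpha0 + (alpha1 + beta1) * forecasts[-1])
    return forecasts


def horizon_volatility(forecasts):
    """Square root of the summed daily variance forecasts over the horizon."""
    return math.sqrt(sum(forecasts))


# Hypothetical daily parameters (assumed for illustration only).
stationary = garch_variance_forecasts(1e-6, 0.08, 0.90, 1.5e-4, 260)
integrated = garch_variance_forecasts(1e-6, 0.10, 0.90, 1.5e-4, 260)

# Stationary case: forecasts revert toward alpha0 / (1 - alpha1 - beta1).
long_run_variance = 1e-6 / (1 - 0.08 - 0.90)

# IGARCH case (alpha1 + beta1 = 1): each step adds alpha0, so the forecast
# grows linearly with the horizon and never settles at a long-run level.
igarch_step = integrated[-1] - integrated[-2]

one_year_volatility = horizon_volatility(stationary)
```

In the stationary case the daily forecasts converge to the long-run variance, whereas in the integrated case they keep rising by a constant amount per day. Either way, the sum over the horizon gives a finite one-year-ahead conditional volatility.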
In addition, Allen et al. (2014) found that the realized volatility exceeds the forecasted volatility in stock markets. Corsi (2009) introduced an estimation method where the possible long memory of realized volatility can be investigated, denoting the method as a heterogeneous autoregressive (HAR) model, as an approximation to long memory models (see, for example, [...] and [...] for empirical examples of HAR modelling in tourism research and agricultural commodity futures returns, respectively).

Model Specification
The model follows Ilomäki et al. (2018) closely. Assume an overlapping generation economy with a continuum of young and old investors, $[0,1]$. A young risk-averse investor $j$ is assumed to invest her initial wealth, $w_t^j$, in infinitely lived risky assets, $i$ = DJIA index, and in risk-free assets that produce the risk-free rate of return, $r_f$ = the rate of the three-month U.S. Treasury bill. A risky asset $i$ pays dividend $D_t$ and has $x_i^s$ shares outstanding.

A young investor $j$ maximizes utility from old time consumption through optimal allocation of initial resources, $w_t^j$, between risky and risk-free assets:

$$\max_{x_t^j} \; E_t\left[-\exp\left(-\nu^j w_{t+1}^j\right)\right] \quad \text{s.t.} \quad w_{t+1}^j = x_t^j \left(P_{t+1} + D_{t+1}\right) + \left(w_t^j - x_t^j P_t\right)\left(1 + r_f\right),$$

where $E_t$ is the expectations operator, $P_t$ is the price of one share of the stock index, $\nu^j$ is a constant risk-aversion parameter for investor $j$, $\sigma^2$ is the variance of returns for the DJIA index, and $x_t^j$ is the demand for risky assets of investor $j$. The first-order condition is:

$$E_t\left(P_{t+1} + D_{t+1}\right) - \left(1 + r_f\right) P_t - \nu^j x_t^j \sigma^2 = 0,$$

which results in the following optimal demand for risky assets:

$$x_t^j = \frac{E_t\left(P_{t+1} + D_{t+1}\right) - \left(1 + r_f\right) P_t}{\nu^j \sigma^2}. \quad (1)$$

Suppose that an investor $j$ is a macro forecaster who allocates their initial wealth between risky stocks and risk-free assets according to their forecast about the return of the risky alternative. Then, equation (1) says that the investor invests in the risky stocks only if the numerator on the right-hand side is positive (see Ilomäki et al., 2018).

The rolling windows are 200, 400, 600 and 800 trading days. The first frequency calculates the MA from every trading day; the 2nd frequency takes into account every 5th trading day (thereby providing a proxy for the weekly rule); the 3rd frequency takes into account every 22nd trading day (proxy for the monthly rule); the 4th frequency takes into account every 44th trading day (proxy for every other month); the 5th frequency takes into account every 66th trading day (proxy for every 3rd month); the 6th frequency takes into account every 88th trading day (proxy for every 4th month); and the 7th frequency takes into account every 110th trading day (proxy for every 5th month).

Empirical Analysis
In this way, the procedure generates 219,100 return observations (in addition to the 7,825 buy and hold observations) that will be used in the empirical analysis.
The trading rule for all cases is a simple crossover rule. When the trend-chasing MA turns lower (higher) than the current daily closing price, we invest in the stock index (three-month US Treasury Bills) at the closing price of the next trading day. Thus, the trading rule provides a market timing strategy where we invest all wealth either in the DJIA index, or in the risk-free asset (three-month U.S. Treasury bill), while the moving average rule advises on the timing.
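The crossover rule described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the exact procedure used in the paper: the sampled average is assumed to count back from the current close, simple (not logarithmic) returns are used, the daily risk-free rate is taken as a constant, and the function names `ma_signal` and `strategy_returns` are hypothetical.

```python
def ma_signal(prices, window, step):
    """Return daily positions: 1 = hold the index, 0 = hold T-bills.

    The MA at day t averages the closes sampled every `step` trading days
    within the last `window` days, counting back from day t (an assumption
    about how the sampled average is aligned).
    """
    signals = []
    for t in range(len(prices)):
        if t + 1 < window:
            signals.append(None)  # not enough history yet
            continue
        recent_first = prices[t - window + 1 : t + 1][::-1]
        sampled = recent_first[::step]
        ma = sum(sampled) / len(sampled)
        # MA below the current close signals an uptrend: hold the index.
        signals.append(1 if ma < prices[t] else 0)
    return signals


def strategy_returns(prices, signals, daily_rf):
    """Signal observed on day t, position taken at the close of day t+1,
    so the return earned runs from the close of t+1 to the close of t+2."""
    returns = []
    for t in range(len(prices) - 2):
        if signals[t] is None:
            continue
        if signals[t] == 1:
            returns.append(prices[t + 2] / prices[t + 1] - 1.0)
        else:
            returns.append(daily_rf)
    return returns


# Toy price path (hypothetical), with a 4-day window sampled every 2nd day.
toy_prices = [100, 101, 102, 103, 104, 105, 104, 103, 102, 101, 100]
toy_signals = ma_signal(toy_prices, window=4, step=2)
toy_returns = strategy_returns(toy_prices, toy_signals, daily_rf=0.0001)
```

With `window=200` and `step=22`, the same function averages ten sampled closes, which corresponds to the monthly proxy within the 200-day window.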
The MA200, MA400, MA600 and MA800 are calculated as: [...], we buy the stock at the closing price, $P_t$, thereby giving daily returns as: [...] In addition, Figure 1 shows the realized volatilities for these returns series.
<Figure 1 goes here> <Table 2 goes here> The results concerning the conditional volatilities are again more drastic. While the buy and hold strategy produces an average yearly conditional volatility, one year ahead, of 0.255, the trading rule volatility reduces to 0.098, on average, indicating a 62% reduction. Moreover, the conditional volatility for the random timing is approximated as $0.255 \times \sqrt{0.52} = 0.184$. Figure 2 shows the realized volatilities with these returns series.
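The arithmetic behind these figures can be checked with a short sketch. It assumes that 0.52 is the share of trading days on which the random timing strategy holds the stock index, and that the risk-free leg contributes negligible variance; both are interpretations made here for illustration rather than statements from the text.

```python
import math

buy_and_hold_cv = 0.255  # conditional volatility of buy and hold (from the text)
trading_rule_cv = 0.098  # average conditional volatility of the MA rules (from the text)
share_invested = 0.52    # assumed: fraction of days spent in the stock index

# If the invested days are chosen at random and the risk-free leg has
# negligible variance, the variance scales approximately with the share of
# days invested, so the volatility scales with its square root.
random_timing_cv = buy_and_hold_cv * math.sqrt(share_invested)

# Relative reduction in conditional volatility from buy and hold to the MA rules.
reduction = 1.0 - trading_rule_cv / buy_and_hold_cv
```

Multiplying 0.255 by 0.52 directly would give about 0.133, so the reported value of 0.184 is consistent only with the square-root scaling.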
<Figure 2 goes here> <Table 3 goes here>  Figure 3 shows the realized volatilities with these returns series.
<Figure 3 goes here> <Table 4 goes here>  Figure 3 shows the realized volatilities with these returns series.
<Table 5 goes here.> Table 5 suggests that when the size of the rolling window is 800 trading days (about three years), the frequency used in the MA rules becomes unimportant. In order to analyze how the size of the rolling window and the frequencies affect the performance of the trading rules (see Table 5), we estimate the following OLS regression models: [...] where $SR_i$ denotes the Sharpe ratio, $RW$ denotes the rolling window, all explanatory variables are taken to be dummies, and the benchmark group is the random timing strategy. Therefore, equations (2) and (3) amount to an analysis of variance (ANOVA).
<Table 6 goes here.> Table 6 shows that all the estimated parameters are statistically significant, and the random timing strategy produces 0.40 for the Sharpe ratio, on average. With the rolling window of 200 trading days, the Sharpe ratio remains statistically the same. However, RW400 produces 0.55, RW600 produces 0.54, and the rolling window of 800 trading days produces 0.58, on average. Note that we use the small sample adjusted heteroskedasticity consistent standard errors (JHCSE) for all OLS estimates. Table 5 shows that the sample size is 32 for the OLS estimates in Tables 6-9.
According to the small sample adjusted Jarque-Bera test, the residuals are normally distributed, with a p-value of 0.25 for this case.
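The logic of a dummy-variable regression with a benchmark group can be illustrated with a small sketch. With mutually exclusive group dummies and the random timing strategy as the omitted benchmark, the intercept equals the benchmark mean and each coefficient equals the difference between a group mean and the benchmark mean. The Sharpe ratios below are hypothetical numbers chosen for illustration, not values from Tables 5 or 6, and `ols` is an assumed minimal solver rather than the estimation routine used in the paper.

```python
def ols(X, y):
    """Solve the normal equations (X'X) b = X'y by Gauss-Jordan elimination."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [
        [sum(row[i] * row[j] for row in X) for j in range(k)]
        + [sum(row[i] * yi for row, yi in zip(X, y))]
        for i in range(k)
    ]
    for col in range(k):
        # Partial pivoting for numerical stability.
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


# Hypothetical Sharpe ratios by group (illustrative only).
groups = {
    "random": [0.38, 0.42],  # benchmark group, no dummy
    "RW400": [0.50, 0.60],
    "RW800": [0.55, 0.61],
}

# Design matrix columns: intercept, RW400 dummy, RW800 dummy.
X, y = [], []
for name, ratios in groups.items():
    for sr in ratios:
        X.append([1.0, float(name == "RW400"), float(name == "RW800")])
        y.append(sr)

intercept, b_rw400, b_rw800 = ols(X, y)

# Group means implied by the regression: intercept plus the group coefficient.
implied_rw400_mean = intercept + b_rw400
implied_rw800_mean = intercept + b_rw800
```

Testing whether a dummy coefficient differs from zero is therefore a test of whether that group's average Sharpe ratio differs from that of the random timing benchmark.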
These empirical results suggest that the widest window yields the best performance, beating the random timing performance by a 45% increase in the Sharpe ratio, on average. The adjusted $R^2$ value is 0.34, indicating that the size of the rolling window explains about one-third of the variations in the Sharpe ratios. The empirical results show that even the stochastic trend information from three years ago seems to improve the performance of the trading strategies.
Moreover, the random timing (Efficient Market Hypothesis) performance is beaten by MA trading strategies, using the long run rolling window. This indicates that stock returns are more predictable in the long run.
<Table 7 goes here.> Table 7 shows that the random timing beta, as well as the betas from monthly frequency onwards, are statistically significant. Moreover, the random timing strategy produces a Sharpe ratio of 0.40, on average. However, using monthly frequencies, the Sharpe ratio increases to 0.54, every other month produces 0.55, every 3rd month frequency produces 0.57, every 4th month produces 0.56 and every 5th month produces a Sharpe ratio of 0.61, on average. According to the small sample adjusted Jarque-Bera test, the residuals are normally distributed, with a p-value of 0.78.
These results support the results of Ilomäki et al. (2018), suggesting that the lowest frequency produces the best performance, beating the random timing performance by a 51% increase in the Sharpe ratio, on average. The results suggest that using daily and weekly frequencies is practically useless, except when the widest rolling window is used. The adjusted $R^2$ value is 0.38, which indicates that the frequency explains 38% of the variations in the Sharpe ratios.
These empirical findings suggest that the long run stochastic trend information (that is, the observations in every 5th month), enhances the performance of trading strategies, and the random timing (Efficient Market Hypothesis) performance is clearly beaten by MA trading strategies. This indicates that the stock returns are more predictable in the long run.
Next, we change the explained variable in equations (2) and (3) to the Sharpe ratio in which the unconditional volatility is replaced by the conditional volatility (CV) measures in Tables 1-4, which are presented in Table 5. We denote this performance measure as $CSR_i$, which is calculated as: [...] where $r_i$ is the annualized average return for trading rule $i$, $x\%$ is the share of time invested in the stock index, 0.026 is the average annual dividend, 0.022 is the average annualized risk-free rate of return, and $\sigma_i^{cv}$ is the annualized average conditional standard deviation, which is estimated yearly by GARCH(1,1) for 260 trading days ahead for trading rule $i$.
Then, we estimate the ANOVA equations: [...] where the benchmark group is the random market timing, and all the explanatory variables are taken to be dummies. Table 8 presents the regression results for the model given in equation (4). <Table 9 goes here.> Table 9 shows that the random timing strategy produces

Concluding Remarks
This paper investigated the performance of Moving Average (MA) market timing strategies when the rolling window used in such strategies was expanded, and the frequency used in the calculations was also changed. The timing considered rolling windows of 200, 400, 600, and 800 trading days, and daily, weekly, monthly, every other month, every 3rd month, every 4th month and every 5th month frequencies were used. The primary purpose was to apply the performance of MA rule returns as an instrument for testing returns predictability in stock markets.
The first empirical finding is that, on average, using daily or weekly frequencies does not beat random market timing performance. For example, the MA200 trading rule, which is the most common rule among practitioners, underperforms the random market timing strategy.
However, it was also found that, when the rolling window was expanded from 400 trading days (a year and a half) onwards, with monthly and lower frequencies, the performance of MA trading strategies started to exceed that of random market timing when the unconditional volatility was used in the Sharpe ratios. Random market timing would dominate if expected stock returns were constant or, as in our test, if the Sharpe ratios with unconditional and conditional volatility were fairly constant over time.
Furthermore, we found that, when the unconditional volatility was changed to the conditional volatility in the Sharpe ratios, the results became more variable, as expected, but the main results remained fairly consistent with each other. However, when the conditional volatility was incorporated in the Sharpe ratio, the monthly frequency seemed to lose power in predicting stock returns, on average. In addition, when the size of the rolling window reached 800 trading days (about three years), the frequencies produced a similar performance in the tested MA rules. This held for Sharpe ratios using both unconditional and conditional volatilities.
In summary, the empirical results indicated that stock returns were indeed predictable in the long run, and also over business cycles and stochastic trends. The results were also independent of the persistence issues of explanatory variables in predictions, which have been noted in the literature, because only returns were considered in the empirical analysis.