Longer-Term Forecasting of Excess Stock Returns—The Five-Year Case

Long-term return expectations or predictions play an important role in planning and in the guidance of long-term investors. Five-year stock returns are less volatile around their geometric mean than returns of higher frequency, such as one-year returns. One would, therefore, expect models using the latter to better reduce the noise and beat the simple historical mean than models based on the former. However, this paper shows that the general tendency is surprisingly the opposite: long-term forecasts over five years have similar or even better predictive power than forecasts for the one-year case. We consider a long list of economic predictors and benchmarks relevant for the long-term investor. Our predictive approach consists of adopting and implementing a fully nonparametric smoother with the covariates and the smoothing parameters chosen by cross-validation. We consistently find that long-term forecasting performs well and recommend drawing more attention to it when designing investment strategies for long-term investors. Furthermore, our preferred predictive model stood the test of Covid-19, providing a relatively optimistic outlook in March 2020, when lockdowns and an unknown new pandemic created widespread uncertainty.


Introduction
In recent years, investment planners and expert forecasters have recognized the importance of the horizon when long-term predictions of (excess) stock returns are constructed. It is well known in the financial literature that it is difficult to provide better forecasts than the simple historical long-term mean. There are also regular discussions in academic as well as practitioner circles on whether it is possible at all. This paper follows and adds to the work and insight of Lioui and Poncet [1] and Møller and Rangvid [2], which provide evidence that it is indeed possible to predict better than the trivial long-term mean when a careful validation approach is applied, based on reasonable long-term economic drivers of the financial market to be predicted. It is also well recognized, of course, that the longer the horizon, the less noise will disturb the prediction. Therefore, one would expect that it becomes more and more difficult to beat the long-term mean when the horizon is increased, simply because the noise involved in the long term is so low that it seems to be difficult to lower it even further through predictive modeling. In this paper, we find that, counter-intuitively, long-term forecasts improve on the simple long-term mean forecast as much as, or sometimes even more than, one-year forecasts do. We show this concretely via a comparison between the five-year and the one-year case. The surprisingly good results we get from the five-year prediction approach should lead to reconsiderations when deciding on the horizon to use in the design of investment strategies for long-term investors. The linear regression model is the classical benchmark in this predictive modeling context. However, there is some evidence documented in the literature that stock return predictability is much stronger when the functional form of the relationship between stock returns and predictive variables is allowed to be nonlinear [3][4][5].
Along those lines, we adopt and implement the fully nonparametric local-linear smoothing technique in combination with a leave-k-out cross-validation. The linear function can be estimated through the local-linear smoother without any bias and is thus automatically embedded in our approach. We extend the single- and full-benchmarking of Kyriakou et al. [6] to the five-year horizon, since a careful imposition of structure in the statistical modeling process has been shown to be promising in previous work [7][8][9]. Our preferred predictive model gives an optimistic outlook, even though we are at the beginning of a worldwide economic crisis caused by the Covid-19 pandemic. This surprisingly optimistic outlook was indeed in line with the performance of the stock market from March till June 2020.
The remainder of this paper is organized as follows. In Section 2, we present our framework for nonlinear predictive long-term regressions. We define the underlying financial model, describe the local-linear smoother with its theoretical properties, and present our validation criterion for the model and smoothing parameter selection. In Section 3, we provide the details of our dataset and a short descriptive analysis. Subsequently, we illustrate our empirical findings from two different validated scenarios: (i) a single benchmarking approach where only the dependent variable is measured in excess of the benchmark; and (ii) the case where both the independent and dependent variables are adjusted according to the benchmark (full-benchmarking approach). Finally, we comment on real-income pension prediction and give one-year-ahead real-time predictions. Section 4 summarizes the key points of our analysis and concludes the paper.

A Method for Long-Term Prediction
The focus of our analysis lies on nonlinear predictive relationships between stock returns over the next T years in excess of a reference rate (or benchmark) and a set of explanatory variables. We aim to investigate different benchmark models and their predictability over horizons of one year and five years. We consider the four different benchmarks proposed in Kyriakou et al. [6]: the short-term interest rate, the long-term interest rate, the earnings-by-price ratio, and the inflation rate.

The One-year Case
We investigate stock returns $S_t = (P_t + D_t)/P_{t-1}$, where $D_t$ denotes the (nominal) dividends paid during year $t$ and $P_t$ the (nominal) stock price at the end of year $t$, in excess (log-scale) of a given benchmark $B^{(A)}_{t-1}$:

$$Y^{(A)}_t = \ln \frac{S_t}{B^{(A)}_{t-1}}, \qquad (1)$$

where $A \in \{R, L, E, C\}$ indexes the benchmark, which is based on, respectively, the short-term interest rate $R_t$, the long-term interest rate $L_t$, the earnings $E_t$ accruing to the index in year $t$, and the consumer price index $CPI_t$ for year $t$. The predictive nonparametric regression model for the one-year horizon is then given by

$$Y^{(A)}_t = m(X_{t-1}) + \xi_t, \qquad (3)$$

where $m$ is an unknown smooth function which we want to estimate with the local-linear smoother and $\xi_t$ is an error term. The sequence of error terms in Equation (3) is assumed to form a martingale difference process, consisting of serially uncorrelated zero-mean random variables, given the past, of an unknown conditionally heteroscedastic form $\sigma(x)$.
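The construction of the one-year dependent variable can be sketched in a few lines of Python. This is a minimal illustration, assuming that the log-scale excess return takes the form ln(S_t / B_{t-1}) with a gross benchmark value B_{t-1}; the function names and all numbers are hypothetical and are not taken from the data used in the paper.

```python
import math

def gross_returns(prices, dividends):
    """S_t = (P_t + D_t) / P_{t-1} for t = 1, ..., n-1."""
    return [(prices[t] + dividends[t]) / prices[t - 1]
            for t in range(1, len(prices))]

def log_excess_returns(prices, dividends, benchmark):
    """Y_t = ln(S_t / B_{t-1}); benchmark[t] is the gross benchmark B_t (assumed form)."""
    s = gross_returns(prices, dividends)
    # s[i] belongs to year t = i + 1, so it is paired with benchmark[i] = B_{t-1}.
    return [math.log(s[i] / benchmark[i]) for i in range(len(s))]

# Toy numbers, invented for illustration only.
prices = [100.0, 108.0, 104.0, 115.0]
dividends = [3.0, 3.2, 3.3, 3.5]
benchmark = [1.02, 1.03, 1.025, 1.02]  # hypothetical gross benchmark, e.g. 1 + rate

y = log_excess_returns(prices, dividends, benchmark)
print([round(v, 4) for v in y])
```

Note that the benchmark is lagged by one year relative to the return, so the first price observation is used only as the base of the first return.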

The T-year Case
Longer horizons are of fundamental interest to long-term investors, such as pension funds, insurance companies, other institutional investors, or market participants saving for a distant pay-off. Long-term investors are in general willing to take on more risk for higher rewards. Note that the risk taken usually increases with the investment horizon. However, Rapach and Zhou [10] report that longer horizons also tend to produce better estimates than shorter horizons. Munk and Rangvid [11] indicate that major investors today use longer horizons (up to ten years) to stabilize and improve future predictions.
In our paper, we concentrate on the five-year view as the first compromise for a comparison between a longer horizon and the shorter one-year horizon. The choice of the five-year horizon is arbitrary and is intended to illustrate the potential of our approach. Any other combination of short- and long-term horizons could be considered. However, it seems that shorter horizons based on monthly, weekly, daily, or even intra-day data do not provide good information on the pension saver's future income. Therefore, these types of short-term predictions, also known as investment robots, are not suitable when the (maximum) risk level of the pensioner is defined. In more detail, for longer horizons $T$ we consider the sum of annual continuously compounded returns:

$$Z^{(A)}_t = \sum_{i=0}^{T-1} Y^{(A)}_{t+i}.$$

Note that the returns $Z^{(A)}_t$ are overlapping, which requires careful econometric modelling.
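To make the overlap concrete, the following sketch builds T-year returns as rolling sums of annual log excess returns, assuming the T-year return for year t is the sum of the T annual values from t to t + T - 1. The helper name and the numbers are hypothetical.

```python
def overlapping_sums(y, horizon):
    """Z_t = Y_t + ... + Y_{t+horizon-1} for every t where the window fits."""
    return [sum(y[t:t + horizon]) for t in range(len(y) - horizon + 1)]

# Invented annual log excess returns, for illustration only.
y = [0.05, -0.02, 0.08, 0.01, 0.03, -0.04, 0.06]
z = overlapping_sums(y, horizon=5)

# Two neighbouring five-year returns share four of their five annual terms.
shared_years = set(range(0, 5)) & set(range(1, 6))
print(len(z), len(shared_years))
```

With seven annual observations only three five-year returns are available, and adjacent ones differ in a single annual term, which is the source of the dependence discussed in the text.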
Assume for illustration a linear relationship in Equation (3) between $Y^{(A)}_t$ and $X_{t-1}$, as well as some (linear) persistence of the forecasting variable (treating the variables as deviations from their means):

$$Y^{(A)}_t = \beta X_{t-1} + \xi_t, \qquad X_t = \gamma X_{t-1} + \eta_t,$$

with $\xi_t$ as in Equation (3), $\eta_t$ being white noise, and slope parameters $\beta$ and $\gamma$. The $T$-year regression problem that is implied by this pair of one-year regressions is now

$$Z^{(A)}_t = \beta \left(1 + \gamma + \dots + \gamma^{T-1}\right) X_{t-1} + \nu_t,$$

that is, the excess stock return for year $t$ over the next $T$ years can be decomposed in two parts: a predictive part depending on the variable $X_{t-1}$ and an unpredictable error term $\nu_t$. To avoid functional misspecification due to our simplistic assumption, we allow for nonlinearity and set up our predictive nonparametric regression model in the same fashion as in Equation (3):

$$Z^{(A)}_t = m(X_{t-1}) + \nu_t, \qquad (9)$$

where $m$ is again an unknown smooth function. The important difference between the models (3) and (9) is now that $\xi_t$ is a martingale difference process but $\nu_t$ will be serially correlated by construction. The predictive variables under consideration collected in the $q$-dimensional vector $X$ include again the dividend-by-price ratio $d$, earnings-by-price ratio $e$, short-term interest rate $r$, long-term interest rate $l$, inflation $\pi$, term spread $s$, and the one-year excess stock return $Y^{(A)}$.
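As a worked illustration of why the T-year error term is serially correlated by construction, consider the smallest overlapping case T = 2 in the illustrative linear setting. The derivation below additionally assumes that the two error sequences are mutually uncorrelated at all leads and lags and that the one-year error has a constant variance $\sigma_\xi^2$; these are simplifying assumptions made only for this sketch.

```latex
% Assumed one-year pair (illustrative linear case):
%   Y_t = \beta X_{t-1} + \xi_t,   X_t = \gamma X_{t-1} + \eta_t
\begin{align*}
Z_t &= Y_t + Y_{t+1} \\
    &= \beta X_{t-1} + \xi_t + \beta X_t + \xi_{t+1} \\
    &= \beta (1 + \gamma) X_{t-1}
       + \underbrace{\xi_t + \xi_{t+1} + \beta \eta_t}_{\nu_t}, \\
\nu_{t+1} &= \xi_{t+1} + \xi_{t+2} + \beta \eta_{t+1}, \\
\operatorname{Cov}(\nu_t, \nu_{t+1}) &= \operatorname{Var}(\xi_{t+1}) = \sigma_\xi^2 > 0 .
\end{align*}
```

Neighbouring errors share the term $\xi_{t+1}$, so they are positively correlated even though the one-year errors are not. For T = 5 the same argument gives shared terms up to lag four.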

The Local-linear Smoother for the T-year Horizon
As mentioned before, the important difference between the one-year and the T-year case is the inherited serial correlation of the error terms $\nu_t$ in Equation (9). It is well documented in the statistical literature that, in the presence of correlated errors, quite fundamental problems occur: (i) while the consistency result from Theorem 3 in Kyriakou et al. [6] still holds, the left-out information of the error dependency leads to less efficient estimators [12,13]; and (ii) the commonly applied automatic smoothing parameter selection procedures, like cross-validation or plug-in, break down [14,15]. The latter problem will be discussed in detail in the next section.
More efficient estimators are proposed in the literature. For example, Xiao et al. [12] use a pre-whitening transformation of the dependent variable that has to be estimated from the data. In more detail, the residual process $\nu_t$ is assumed to be stationary, zero-mean with variance $\sigma^2_\nu$, and to have an invertible linear process representation

$$\nu_t = \sum_{i=0}^{\infty} c_i \varepsilon_{t-i},$$

where $\varepsilon_t$ are i.i.d. with zero mean and variance $\sigma^2_\varepsilon$. Define $c(L) = \sum_{i=0}^{\infty} c_i L^i$ with the usual lag operator $L$. By inverting $c(L)$ (with the normalization $c_0 = 1$) one gets an autoregressive representation of $\nu_t$ of infinite order:

$$\nu_t = \sum_{i=1}^{\infty} a_i \nu_{t-i} + \varepsilon_t,$$

and thus $a(L)\nu_t = \varepsilon_t$ with $a(L) = c(L)^{-1} = 1 - \sum_{i=1}^{\infty} a_i L^i$. The transformed regression problem (9) is then

$$\tilde Z^{(A)}_t = Z^{(A)}_t - \sum_{i=1}^{\infty} a_i \left( Z^{(A)}_{t-i} - m(X_{t-i-1}) \right) = m(X_{t-1}) + \varepsilon_t, \qquad (14)$$

with an uncorrelated error term $\varepsilon_t$. In practice, $\tilde Z^{(A)}_t$ is replaced by an approximation based on estimates of the coefficients $\{a_i\}$ and a truncation of the infinite sum. Other contributions which provide more efficient local-polynomial estimators under similar settings can be found in Su and Ullah [13], Linton and Mammen [16], or more recently in Geller and Neumann [17] (and citations therein).
We do not apply such techniques in our paper for several reasons. First, additional parameters $\{a_i\}$ have to be estimated and the infinite sum must be truncated at a meaningful value in Equation (14), or the residual process has to be adequately modeled by some parametric ARMA process or even nonparametrically, where the appropriate lag length has to be specified. Second, most examples and simulations given in the literature are one-dimensional. However, we apply our local-linear smoother to a multidimensional problem, and it is not clear what the efficiency gain would be in our scarce data environment. Finally, in a recent study, Scholz et al. [8] show that, in a long-term set-up with annual data, the reduction of the prediction bias is crucial as it enters the prediction mean squared error in squared form. Our approach of imposing economic structure by using explanatory variables that are transformed according to the chosen benchmark (full benchmarking) aims in a similar direction.
Thus, we think that the more severe problem caused by autocorrelation is the misleading smoothing parameter selection for methods like cross-validation or plug-in. These will be discussed in detail in the next section.

A Principle of Validation for Model Selection and Smoothing Parameter Choice
For our nonparametric estimation technique, we require an adequate measure of the predictive power. Classical in-sample measures, such as the $R^2$ or adjusted $R^2$, are not appropriate. Note further that in prediction, we are not interested in how well a model explains the variation inside the sample but, instead, in its out-of-sample performance. Therefore, we aim to estimate the prediction error directly.
We follow Nielsen and Sperlich [7] and use the validated $R^2_V$, which is based on a leave-k-out cross-validation for both model selection and optimal bandwidth (smoothing parameter) selection. This method has been shown to be suitable also in a time series context. For example, Bergmeir et al. [15] show that, in the case of uncorrelated errors, cross-validation is preferred to out-of-sample evaluation where a section from the end of the series is withheld for evaluation. For the latter, only one evaluation on a test data set is possible, whereas cross-validation performs various evaluations. This property is beneficial, especially for small data sets such as the one used in Section 3.
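The validation criterion for one-year predictions is defined as

$$R^2_{V,1} = 1 - \frac{\sum_t \left( Y^{(A)}_t - \hat m_{-t} \right)^2}{\sum_t \left( Y^{(A)}_t - \bar Y^{(A)}_{-t} \right)^2}, \qquad (15)$$

where leave-one-out estimators are used: $\hat m_{-t}$ for the conditional mean function $m$ from Equation (3) and $\bar Y^{(A)}_{-t}$ for the unconditional mean of $Y^{(A)}_t$. For the $T$-year horizon, the criterion is

$$R^2_{V,T} = 1 - \frac{\sum_t \left( Z^{(A)}_t - \hat m_{-t} \right)^2}{\sum_t \left( Z^{(A)}_t - \bar Z^{(A)}_{-t} \right)^2}, \qquad (16)$$

where leave-k-out estimators are used: $\hat m_{-t}$ for the conditional mean function $m$ from Equation (9) and $\bar Z^{(A)}_{-t}$ for the unconditional mean of $Z^{(A)}_t$. Both are computed by removing $k$ observations around the $t$th time point. Here we use $k = 2T - 1$ due to the construction of the dependent variable over a horizon of $T$ years, that is, for the five-year horizon the leave-nine-out estimator. As it is clear from the context whether we are in the one- or five-year case, we use in the following for both horizons the shorter notation $R^2_V$. Note that the $R^2_V$ measures the predictive power of a given model compared to the cross-validated historical mean; a positive $R^2_V$ implies that the predictor-based regression model (3) or (9) outperforms the corresponding historical average excess stock return over 1 or $T$ years, respectively.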
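The validation criterion can be sketched for a single covariate as follows. This is a simplified illustration rather than the implementation used for the results in this paper: it assumes a one-dimensional local-linear smoother with a quartic kernel, removes the k = 2T - 1 observations centred on each time point, and falls back to the left-out mean when the kernel window contains too little data (the fallback is our own choice for the sketch). All names and the toy data are hypothetical.

```python
import math

def quartic(u):
    """Quartic (biweight) kernel with support [-1, 1]."""
    return 15.0 / 16.0 * (1.0 - u * u) ** 2 if abs(u) <= 1.0 else 0.0

def local_linear(x0, xs, ys, h):
    """Local-linear estimate of m(x0); None if the window is degenerate."""
    s0 = s1 = s2 = t0 = t1 = 0.0
    for x, y in zip(xs, ys):
        d = x - x0
        w = quartic(d / h)
        s0 += w
        s1 += w * d
        s2 += w * d * d
        t0 += w * y
        t1 += w * d * y
    det = s0 * s2 - s1 * s1
    if det <= 1e-10 * s0 * s2:
        return None
    # Intercept of the kernel-weighted least-squares line centred at x0.
    return (s2 * t0 - s1 * t1) / det

def validated_r2(xs, ys, h, horizon):
    """1 - sum (y_t - m_hat_{-t})^2 / sum (y_t - mean_{-t})^2 with leave-k-out.

    xs[t] is the (already lagged) predictor that belongs to ys[t].
    """
    half = horizon - 1  # k = 2 * horizon - 1 points centred on t are removed
    num = den = 0.0
    for t in range(len(ys)):
        keep = [i for i in range(len(ys)) if abs(i - t) > half]
        x_in = [xs[i] for i in keep]
        y_in = [ys[i] for i in keep]
        mean_hat = sum(y_in) / len(y_in)
        m_hat = local_linear(xs[t], x_in, y_in, h)
        if m_hat is None:
            m_hat = mean_hat  # fallback chosen for this sketch only
        num += (ys[t] - m_hat) ** 2
        den += (ys[t] - mean_hat) ** 2
    return 1.0 - num / den

# Toy data, invented for illustration only.
xs = [i / 10.0 for i in range(30)]
ys_linear = [1.0 + 2.0 * x for x in xs]
ys_wavy = [0.3 * x + 0.2 * math.sin(3.0 * x) for x in xs]

r2_linear = validated_r2(xs, ys_linear, h=1.0, horizon=5)
r2_wavy = validated_r2(xs, ys_wavy, h=1.0, horizon=5)
print(round(r2_linear, 6), round(r2_wavy, 3))
```

Setting `horizon=1` gives the leave-one-out version. The exactly linear toy series is reproduced without error by the local-linear fit, so its validated value is one up to rounding; the bandwidth would in practice be chosen by maximizing this criterion over a grid.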
Cross-validation often requires the omission of more than one single data point and our five-year scenario is one such example. It can also happen that additional corrections are necessary when the omitted fraction of data is considerable [18]. In addition, De Brabanter et al. [14] show that automatic tuning parameter selection methods, such as cross-validation or plug-in, can fail when serial correlation arises (as in our longer-horizon application) and the structure of the error terms is ignored. Here, the problem is that for increasing correlations the chosen bandwidths become smaller, and the corresponding model fits become progressively more under-smoothed [19]. This reduces the bias of the predictor, which contributes in a squared fashion to the prediction mean squared error (the numerator of the ratio in Equations (15) and (16)). Consequently, the $R^2_V$ also increases, not because of a better fit but due to the ignored correlation structure. The consequence is a misleading decision on the bandwidth or model specification (the set of preferred covariates). To avoid such problems, Chu and Marron [20] propose the use of bimodal kernel functions, which are known to remove the correlation structure very effectively. Nevertheless, the estimator $\hat m$ suffers from increased mean squared error [14]. To overcome the problems mentioned, De Brabanter et al. [14] propose a correlation-corrected cross-validation which consists of two steps: (i) finding the amount of data $k$ to be left out in the estimation process when a bimodal kernel function is used; and (ii) applying the actual choice of the smoothing parameter using leave-k-out cross-validation with a unimodal kernel function. As we know $k$ in our set-up by construction, we can skip the first step. For example, remember that $Z^{(A)}_t = Y^{(A)}_t + \dots + Y^{(A)}_{t+4}$ in the five-year case. It is easy to see from Figure 1 that this amounts to a leave-nine-out set of $Z^{(A)}_{t-4}, \dots, Z^{(A)}_{t+4}$.

The Data Set
We applied our local-linear smoother to annual US stock-market data. This dataset was provided by Robert Shiller and is made available from http://www.econ.yale.edu/~shiller/data.htm. It includes, among other variables, long-term changes of the Standard and Poor's (S&P) Composite Stock Price Index, consumer price index changes, and interest rate data from 1872 to 2019. This is an updated and revised version of (Shiller [21], Chapter 26), which provides a detailed description of the data.
Note that the extension of the risk-free rate series (based on the six-month commercial paper rate until 1997 and, afterward, on the six-month certificate of deposit rate, secondary market) was not possible as it was discontinued in 2013. Here, we followed the strategy of Welch and Goyal [22] and Mammen et al. [23] and replaced this variable by an annual yield based on the six-month Treasury-bill rate, secondary market, from https://fred.stlouisfed.org/series/TB6MS. This new series was only available from 1958 to 2019. In the absence of information prior to 1958, we had to estimate it. To this end, we regressed the Treasury-bill rate on the risk-free rate from Shiller's data for the overlapping period 1958 to 2013. Assuming a linear relationship and using an ordinary least squares regression, we obtained the estimated equation: Treasury-bill rate = 0.0961 + 0.8648 × commercial paper rate, with an $R^2$ of 98.6%. Therefore, we instrumented the risk-free rate from 1872 to 1957 with the predictions from the estimated regression equation. The correlation between the actual Treasury-bill rate and the predictions for the estimation period was 99.3%.
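The backfilling step can be illustrated with a short sketch. The intercept and slope are the estimates reported above; the helper names, the assumption that both rates are quoted in percent, and the input value are hypothetical. The small least-squares helper shows how such coefficients would be obtained from an overlapping sample.

```python
INTERCEPT = 0.0961  # estimate reported in the text
SLOPE = 0.8648      # estimate reported in the text

def backcast_tbill(commercial_paper_rate):
    """Predicted Treasury-bill rate from the commercial paper rate (assumed in percent)."""
    return INTERCEPT + SLOPE * commercial_paper_rate

def ols_line(x, y):
    """Intercept and slope of a simple least-squares regression of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# Hypothetical commercial paper rate of 5 percent.
predicted = backcast_tbill(5.0)
print(round(predicted, 4))

# Sanity check on synthetic points that lie exactly on the reported line.
cp = [1.0, 2.0, 3.0, 4.0]
tb = [backcast_tbill(v) for v in cp]
fitted_intercept, fitted_slope = ols_line(cp, tb)
```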

Descriptive Analysis
There is much research on the predictability of returns and a lot is known about the characteristics of short-horizon stock returns. For example, stylized facts about daily and monthly returns include excess kurtosis, distributions which are not normal, and volatility clustering [24]. Less is known about distributions of long-horizon returns, although such characteristics are of central interest to investors saving for distant pay-offs. Figure 2 shows the time series of the one-year returns (left) and five-year returns (right), both in excess of the risk-free benchmark $R$, which are displayed on the same scale for the sake of comparison. The five-year series exhibits larger positive returns, which is not surprising as a longer period under risk should be rewarded with a higher risk premium. The autoregressive structure of the five-year returns can be easily seen in comparison to the assumed weak dependence of one-year returns.
Figure 3 shows histograms of the one-year returns (left) and five-year returns (right) together with a kernel density estimate (red) and a fitted normal distribution (green). One notes again that the distribution of the five-year returns is shifted to the right but has a higher volatility. A Jarque-Bera test rejects the hypothesis of normality for one-year returns (p-value = 0.013) but not for five-year returns (p-value = 0.522). Furthermore, the density estimate for the five-year returns resembles a mixture of normals more than a single normal distribution, which is some evidence of a possible structural break in the data-generating process. Including structural changes in the modelling process could be beneficial, as shown in the literature especially for higher-frequency returns [25,26]. However, this comes with additional effort which is beyond the scope of our article and is left for future research.
Several important points would have to be taken into consideration, for example: (i) it is not clear for which point in time one should incorporate a structural break in our annual data (the Great Recession, the Second World War, the Global Financial Crisis, the Bretton Woods agreement, etc.); (ii) a simple sample split would result in even smaller and not balanced data sets. From a statistical perspective, as we apply a fully nonparametric method, this would lead to mostly one-dimensional and potentially linear models. This way, we would lose the analysis of higher-dimensional models and nonlinear relationships between excess stock returns and their predictor variables.
This section is concluded with Table 1, which displays standard descriptive statistics for one-year and five-year returns as well as the available covariates. Both the one-year and five-year excess returns had a negative skewness, that is, the left tail of the distribution (large negative returns) was longer or fatter than the right tail (large positive returns). Note that this is more pronounced in the case of one-year rather than five-year returns. While one-year returns were leptokurtic (positive excess kurtosis of 0.68), five-year returns exhibited a small negative excess kurtosis of −0.37. Versions of Table 1 for the other benchmarks are available upon request from the authors. In the next sections, we analyze the predictability of one-year and five-year stock returns in excess of the different benchmarks.
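For readers who want to reproduce this kind of descriptive check, the sketch below computes sample skewness, excess kurtosis, and the Jarque-Bera statistic with its asymptotic p-value. It assumes the textbook form of the statistic with moment-based estimators and a chi-squared reference distribution with two degrees of freedom; statistical packages may apply small-sample variants, so the numbers need not match those reported here. The function name and sample are hypothetical.

```python
import math

def jarque_bera(x):
    """Return (skewness, excess kurtosis, JB statistic, asymptotic p-value)."""
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n
    m3 = sum((v - mean) ** 3 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    skew = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0
    jb = n / 6.0 * (skew ** 2 + excess_kurtosis ** 2 / 4.0)
    # Survival function of a chi-squared variable with 2 degrees of freedom.
    p_value = math.exp(-jb / 2.0)
    return skew, excess_kurtosis, jb, p_value

# Small symmetric sample, invented for illustration only.
sample = [-2.0, -1.0, 0.0, 1.0, 2.0]
skew, excess_kurtosis, jb, p_value = jarque_bera(sample)
print(round(skew, 4), round(excess_kurtosis, 4), round(p_value, 4))
```

A small p-value speaks against normality. With overlapping five-year returns the observations are dependent, so the asymptotic p-value should be read with some caution.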

The Single Benchmarking Approach
In this section, we consider a single benchmarking approach as in Kyriakou et al. [6], where only the variable $S_t$ is adjusted according to some benchmark $B^{(A)}_{t-1}$, as shown in (1), while the independent variable(s) is (are) measured on the original nominal scale. The models (3) and (9) are estimated with a local-linear kernel smoother using the quartic kernel, and the optimal bandwidth is chosen by cross-validation, that is, by maximizing the $R^2_V$ given by (15) and (16). Moreover, it should be kept in mind that the nonparametric method can estimate linear functions without any bias, given that we apply a local-linear smoother. Thus, the linear model is automatically embedded in our approach. We study the empirical findings of $R^2_V$ values based on different validated scenarios shown for the one-year horizon in Table 2 and the five-year horizon in Table 3. Note that the one-year predictions may differ from those originally reported in Kyriakou et al. [6] due to the updated data set and the replacement of the commercial paper rate by the Treasury-bill rate; nevertheless, the models remain similarly ranked.
We found that in the case of the five-year returns, which is the focal point of this paper, the term spread $s$ was, overall, the most powerful predictive variable for excess stock returns; this superior performance was also observed in the one-year case. In more detail, with the prediction constrained to a single covariate, the term spread is the best predictor for the one-year and five-year horizons under the short interest benchmark $B^{(R)}$ with, respectively, $R^2_V$ = 9.7% and 15.5%; it also does quite well in the one-year case under the long rate and earnings-by-price benchmarks, $B^{(L)}$ and $B^{(E)}$, with $R^2_V$ ∈ {6.2%, 7.5%} (for $B^{(C)}$ the best is $\pi$ with 10.3%). In the five-year case under $B^{(L)}$ and $B^{(E)}$, the term spread $s$ yields a high $R^2_V$ ∈ {8.0%, 11.5%}, whereas under $B^{(C)}$ the dividend-by-price ratio $d$ gains ground with $R^2_V$ = 7.6%. In light of these remarks, we therefore focus on the relationship between the spread and the excess stock returns. We present in the top panel of Figure 4 the estimated functions $\hat m$ (red solid line) for the one-year horizon (left) and the five-year horizon (right) under the single risk-free benchmark together with a corresponding linear model (dash-dotted green line), and a 45-degree line (dashed black line). Figure 4 thereby shows the three single covariates with the largest $R^2_V$ ($T = 1$ and $T = 5$): the term spread (9.7% and 15.5%), the short-term interest rate (3.0% and 7.8%), and the long-term interest rate (0.0% and 1.4%). Our findings for both horizons conform to the fact that an increase in the spread corresponds to an increase in the excess stock return. While a positive spread corresponds to a positive return for the one-year case, a spread larger than −1% gives, on average, a positive five-year return.
This finding is in line with, for example, Resnick and Shoesmith [27], who find that the value of the yield spread holds important information about the probability of a bear stock market. Regarding our validation procedure, Figure 4 also confirms that our approach of correcting for autocorrelation in the five-year prediction problem was successful. The estimated functions are quite smooth, indicating that the chosen bandwidth is not too small and that the resulting fit and validated $R^2$ are reasonable.
Returning to our discussion of the results in Tables 2 and 3, in broad terms, five-year predictability improved over one-year: 67 out of 112 models achieve a larger $R^2_V$, and we observed 64 (five-year) versus 52 (one-year) models with nonnegative $R^2_V$, that is, our proposed predictor-based regression model for the longer forecast horizon in this application surpassed the historical average excess return in the majority of cases. In addition, combining the term spread $s$ with the dividend-by-price ratio $d$ lifts predictability to 26.2%; this combination is, in fact, the best-performing one for 3 out of 4 benchmarks ($B^{(R)}$, $B^{(L)}$, $B^{(C)}$). In particular, adding a covariate to $s$ results in a one-year $R^2_V$ in the range 6-10% under $B^{(R)}$; under other benchmarks, such as $B^{(L)}$, the one-year $R^2_V$ is in the range 3-6.5% (approx.); changing to the $B^{(E)}$ benchmark results in $R^2_V$ in the range 4.5-7.5%. Interestingly, for a five-year horizon, we observed a substantially improved predictive power, with our cross-validated $R^2_V$ ranking some two-dimensional models above one-dimensional ones, and more so than for a one-year horizon: in particular, as possibly anticipated by the aforementioned performances of $d$ and $s$, the two-dimensional covariate $(d, s)$ boosts $R^2_V$ to 26.2% under $B^{(R)}$, performs best with 21% under $B^{(L)}$, and comes second with 12% under $B^{(E)}$, being beaten by $(Y^{(E)}, s)$ with 14.1%.
In the one-year case, quite remarkable is the predictor $\pi$, either in itself or combined with covariates $Y^{(C)}$, $d$, $e$, $r$, $l$, under the inflation benchmark $B^{(C)}$, leading to $R^2_V$ in the range 9.5-15.4%. In addition, when put together with the term spread, the resulting combination $(\pi, s)$ under $B^{(C)}$ is the clear winner, reaching up to $R^2_V$ = 15.4%. This is probably good news in an actuarial context where the inflation benchmark can be seen as an important one in pension product applications. In the five-year case, $\pi$ still does quite well in the range 6.8-10.8% (with an exception of 0.9% for $(r, \pi)$) and remains generally the best predictor under $B^{(C)}$; nevertheless, it is no longer the globally best one.

The Full Benchmarking Approach
The next step now is to analyze whether transforming the explanatory variables can improve predictions. Recall that fully nonparametric models suffer in general from the curse of dimensionality, as in our framework where we confront sparsely distributed annual observations in higher dimensions. Imposing more structure in the estimation process can help reduce or circumvent such problems.
Here, we extend the study in Section 3.3 using economic structure in the sense that we consider adjusting both the independent and dependent variables according to the same benchmark. To this end, in our full (double) benchmarking approach, the prediction problems are reformulated as

$$Y^{(A)}_t = m\left(X^{(A)}_{t-1}\right) + \xi_t \qquad \text{and} \qquad Z^{(A)}_t = m\left(X^{(A)}_{t-1}\right) + \nu_t,$$

where we use transformed predictive variables $X^{(A)}_{t-1}$, $X \in \{d, e, r, l, \pi\}$, that is, the predictors adjusted according to the benchmark $B^{(A)}_{t-1}$. This approach can be interpreted as a way of reducing the dimensionality of the estimation procedure, as $X^{(A)}_{t-1}$ encompasses an additional predictive variable.

[Table fragment: benchmark $B^{(A)}$ (rows) by explanatory variable(s) $X$ (columns). Column $(\pi^{(A)}, s^{(A)})$: short-term rate 7.2; long-term rate 3.6; earnings-by-price 5.1; inflation -.]

Results of this empirical study are presented for the one-year horizon in Table 4 and for the five-year horizon in Table 5. In addition, Figure 5 presents the three single covariates with the largest $R^2_V$ ($T = 1$ and $T = 5$) for the double inflation benchmark case: the earnings-by-price ratio (12.2% and 12.4%), the dividend-by-price ratio (10.4% and 10.9%), and the long-term interest rate (10.5% and 8.7%).
We find that, in the majority of cases, the full benchmarking approach outperforms the single benchmarking approach, even more so when we consider a longer horizon, and the number of models with nonnegative $R^2_V$ (that is, cases of beating the historical average excess return) increases: 68 out of 82 models (full benchmarking, five-year); 55 out of 82 (full benchmarking, one-year); 64 out of 112 (single benchmarking, five-year); and 52 out of 112 (single benchmarking, one-year).
The pair $(d^{(R)}, s^{(R)})$ in the full benchmarking approach for a five-year horizon yields $R^2_V$ = 21.9% against 26.2% in the single benchmarking under $B^{(R)}$, whereas $(e^{(C)}, s^{(C)})$ in the full benchmarking approach for a one-year horizon yields $R^2_V$ = 17.8% against 15.4% using the predictor $(\pi, s)$ in the single benchmarking under $B^{(C)}$. It, therefore, seems that $s$ is an important predictor, whose power is mostly unveiled when combined with another predictor depending on the benchmark choice and the forecast horizon. In addition, although under $B^{(R)}$ and $B^{(L)}$ full benchmarking does not improve predictability, it does under $B^{(E)}$ and, especially, $B^{(C)}$, which is important if we aim to identify a likely common well-performing benchmark and predictor, that is, $(e^{(C)}, s^{(C)})$, independently of the horizon length. For $B^{(C)}$ full benchmarking, $R^2_V$ lies in the range 14.7-10.1% (five-year) and 17.8-11.5% (one-year), which are both an improvement over $B^{(C)}$ single benchmarking yielding $R^2_V$ in 13.7-0.9% (five-year) and 15.4-9.5% (one-year), that is, a maximum width reduction by a factor of almost 3 for the five-year horizon.
Overall, we conclude that the term spread is a good predictor; if we aim to homogenize our choice of predictor and benchmark with respect to the horizon length, then the earnings-by-price and the term spread under the inflation benchmark would be an ideal compromise, even if not the winning one. This is welcome, as, for example, in pension research or other long-term saving strategies, it is sensible to look at real value and employ such a model with all returns and covariates net-of-inflation.

Real-Income Long-Term Pension Prediction
In long-term pension planning, real-income protection is often an important aspect [28][29][30]. When optimizing investment asset allocation for the long term, one therefore needs a good econometric model in real terms. Based on the research in this paper and in Kyriakou et al. [6], we are able to conclude that, in the natural double benchmark setting for real-income econometrics, earnings divided by price appears to be the natural covariate to consider. In Table 2 of Kyriakou et al. [6] and in Table 4 of this paper, it is concluded that earnings divided by price is the best single covariate to use in the double inflation benchmark case and, in this paper's Table 5, this is also the case in the five-year view. On balance, we therefore conclude that the intuitively appealing earnings divided by price is a good long-term predictor for real-income forecasting. In the one-year view of Kyriakou et al. [6], the nonparametric smoother estimated for the relationship between earnings divided by price and return in the inflation double benchmarking case has the exact functional form of a simple line. So, even though we consider a nonparametric estimator that can pick any functional form, the resulting functional form is a simple line. This provides us with a strong argument for using the simple line in this case. The functional form of a line has been picked via a validation procedure against all functional forms. The linear expression is

Real one-year stock return = 0.004875 + 1.119 × real earnings-by-price. (21)

Notice that a very good long-term predictor of real income can, therefore, be expressed as a simple linear relationship, where the expected return adds first 12% to the earnings divided by price and then another 0.5%. This is a very simple relationship that is easy for long-term investors to remember. Similarly to the one-year view, our validation procedure exactly picks a line against all other functional alternatives in the five-year view.
The linear form for the five-year view is
Real five-year stock return = 0.2068 + 2.264 × real earnings-by-price.
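The two fitted lines quoted above can be evaluated directly. The following is a minimal sketch, not the authors' code: the function names are hypothetical, and it assumes that real earnings-by-price is supplied as a decimal ratio (for example, 0.05 for 5%) and that the returned values are on the same return scale as the fitted equations.

```python
def real_return_1y(real_ep: float) -> float:
    """One-year linear rule quoted in the text: 0.004875 + 1.119 * E/P."""
    return 0.004875 + 1.119 * real_ep


def real_return_5y(real_ep: float) -> float:
    """Five-year linear rule quoted in the text: 0.2068 + 2.264 * E/P."""
    return 0.2068 + 2.264 * real_ep


if __name__ == "__main__":
    # Hypothetical earnings-by-price levels, chosen only for illustration.
    for ep in (0.03, 0.05, 0.08):
        print(f"E/P = {ep:.2f}: one-year = {real_return_1y(ep):.4f}, "
              f"five-year = {real_return_5y(ep):.4f}")
```

Both rules are increasing in earnings-by-price, so a higher earnings yield translates into a higher predicted real return at either horizon.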
The top panel of Figure 5 shows the estimated nonparametric function $\hat{m}$ (red solid line) for the one-year horizon (left) and five-year horizon (right) under the double inflation benchmark for the earnings-by-price covariate, together with the corresponding linear model (dash-dotted green line) and a 45-degree line (dashed black line). Note that the linear relationship discovered for the earnings-by-price predictor need not hold for other covariates or their combinations. In those cases, the full benefit of our approach comes into its own. For example, the bottom panel of Figure 5 clearly shows nonlinearities for the five-year case when the long-term interest rate is considered. For a suitable statistical smoothing-based test (nonparametric versus linear model), see, for example, the test based on wild bootstrap proposed in Scholz et al. [8] or the discussion in the survey of González-Manteiga and Crujeiras [31].
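One way to see how a cross-validated smoother can end up selecting a line is sketched below. This is an illustration under stated assumptions, not the authors' implementation: it assumes a local-linear smoother with a Gaussian kernel, a small hypothetical bandwidth grid, synthetic data, and a validated $R^2$ computed as one minus the ratio of leave-one-out squared errors of the smoother and of the historical mean. For a local-linear smoother, a very large bandwidth reproduces the ordinary least-squares line, so cross-validation favouring a large bandwidth amounts to choosing the linear form.

```python
import numpy as np


def local_linear(x0, x, y, h):
    """Local-linear estimate of E[y | x = x0] with a Gaussian kernel (bandwidth h)."""
    w = np.exp(-0.5 * ((x - x0) / h) ** 2)            # kernel weights
    X = np.column_stack([np.ones_like(x), x - x0])    # local intercept and slope
    WX = X * w[:, None]
    beta = np.linalg.solve(X.T @ WX, WX.T @ y)        # weighted least squares
    return beta[0]                                    # intercept = fit at x0


def validated_r2(x, y, h):
    """1 - (leave-one-out error of the smoother) / (leave-one-out error of the mean)."""
    idx = np.arange(len(y))
    err_model = 0.0
    err_mean = 0.0
    for i in idx:
        keep = idx != i                               # drop observation i
        err_model += (y[i] - local_linear(x[i], x[keep], y[keep], h)) ** 2
        err_mean += (y[i] - y[keep].mean()) ** 2
    return 1.0 - err_model / err_mean


# Synthetic data (an assumption for illustration): a noisy straight line.
rng = np.random.default_rng(0)
x = rng.uniform(0.02, 0.12, size=80)
y = x + rng.normal(scale=0.01, size=80)

# Hypothetical bandwidth grid; the last value is effectively a global linear fit.
for h in (0.01, 0.02, 0.05, 0.2, 1.0):
    print(f"h = {h:>4}: validated R2 = {validated_r2(x, y, h):.3f}")
```

A positive validated $R^2$ means the smoother beats the leave-one-out mean out of sample; comparing the values across the grid shows which degree of smoothing, from very flexible to essentially linear, validates best on a given data set.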

One-Year ahead Real-Time Predictions
The four benchmarks proposed in this paper are useful in different situations. While in Section 3.5 we focused on real-income long-term predictions based on models using the double inflation benchmark, we now want to explore the development of 'pure' stock returns $S$ (without a benchmark) in the near future. Kyriakou et al. [6] found the model using the earnings benchmark with the term spread as the covariate, that is, $\hat{Y}^{(E)} = \hat{m}(s)$, to perform best in terms of $R^2_V$ when the predicted values are back-transformed and validated on stock returns $S$. We use this simple model to illustrate the usefulness of the earnings benchmark. For this purpose, we estimate this model over the full sample as before and predict the stock returns in excess of earnings using the current spread (in the period September 2018 to March 2020). Finally, we back-transform those predictions to get a one-year-ahead forecast for nominal stock returns, $\hat{S}_{t,\mathrm{nom}}$, recovering $\ln \hat{S}_{t,\mathrm{nom}}$ from $\hat{Y}^{(E)}$. As a by-product, we also calculate a prediction for the real stock return, $\hat{S}_{t,\mathrm{real}} = \hat{S}_{t,\mathrm{nom}} - \pi_t$, and the risk premium, $RP_t = \hat{S}_{t,\mathrm{nom}} - R_t$, of holding stocks versus a risk-free asset. Table 6 shows the results of this forecasting exercise. The considered forecasting period is of interest for two specific features: (i) the term spread is U-shaped, that is, it falls, turns negative (an inverted yield curve in August and September 2019), and finally increases again; and (ii) the external shock to the market caused by the Covid-19 pandemic leads to large negative returns starting in March 2020.
Table 6. One-year-ahead real-time forecasts: predictions from the model using the single earnings benchmark and the term spread $s$ as covariate ($\hat{Y}^{(E)} = \hat{m}(s)$), estimated over the full sample period.

US Stock Market Data Predictions
We find (i) that for the (slightly) negative spread, the predicted stock return in excess of earnings is also negative. Nevertheless, the predicted nominal stock return is positive (around 4.1%, mainly driven by earnings of around 4.5%), as are the real return (around 2.4%) and the risk premium (around 2.3%). Note (ii) that the one-year-ahead predictions in March 2020 are not frightening, even though we are at the beginning of a worldwide economic crisis and recession. They seem to reflect optimistic market expectations. Of course, we cannot incorporate external shocks in our model, but the key variables show comforting figures: both the term spread and the earnings-by-price ratio are at their second-highest values in the last 20 months. Thus, our model predicts that, compared to the month before the crisis started, nominal one-year stock returns increase by 2.3%, real returns increase by 3.1%, and the risk premium increases by 3.4%. Low inflation, low short-term interest rates (the latter being almost zero), and increasing prices were expected to bring the market back to past performance, and this prediction was in line with what happened.
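The by-product quantities used in this forecasting exercise, the real return (nominal forecast minus inflation) and the risk premium (nominal forecast minus the risk-free rate), are simple differences. The sketch below illustrates the arithmetic only. The function and argument names are hypothetical, and the inflation and risk-free inputs are illustrative values implied by the rounded figures quoted in the text (4.1% nominal, 2.4% real, 2.3% risk premium), not entries taken from Table 6.

```python
def by_products(s_nom: float, inflation: float, risk_free_rate: float) -> dict:
    """Real return and risk premium derived from a nominal return forecast."""
    return {
        "real_return": s_nom - inflation,          # S_real = S_nom - pi
        "risk_premium": s_nom - risk_free_rate,    # RP = S_nom - R
    }


if __name__ == "__main__":
    # Illustrative inputs only (assumed, back-implied from rounded text figures).
    out = by_products(s_nom=0.041, inflation=0.017, risk_free_rate=0.018)
    for name, value in out.items():
        print(f"{name}: {value:.3f}")
```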

Conclusions and Outlook
In this paper, we extend the original working framework of Kyriakou et al. [6] for forecasting stock returns in excess of different benchmarks, including the short-term rate, long-term rate, earnings-by-price ratio and inflation, from a one-year to a five-year horizon. We use predictors such as the dividend-by-price ratio, earnings-by-price ratio, short interest rate, long interest rate, term spread, inflation, as well as the lagged excess stock return, in one- and two-dimensional settings, benchmarking either the returns alone or also the covariates used to predict them.
We conclude that five-year returns can be forecasted via our economic variables. The improvement in overall variability compared to predicting a simple mean, measured via $R^2_V$, is good and comparable to the overall forecasting improvement we have seen earlier in Kyriakou et al. [6] for the one-year view. We find that for both one-year and five-year returns, the term spread is, overall, the most powerful predictive variable for excess stock returns. Combining this with the dividend-by-price ratio in the five-year case boosts the predictive power to a maximum. In the one-year case, the inflation predictor is quite remarkable under the inflation benchmark, either by itself or combined with other covariates such as the term spread to achieve a best-performing pair for the given horizon. Notice that earnings seem to be the best overall predictor when working net of inflation. Under double benchmarking with the inflation benchmark, earnings is the best individual predictor and is almost as strong as earnings and spread combined. Based on the results of this paper and also of Kyriakou et al. [6], we therefore conclude that modeling earnings-by-price is a good and relatively simple starting point when constructing forecasting models for real-value pension prognoses.
Finally, a good compromise when promoting only one set of predictors for both the one-year and the five-year view would be earnings-by-price and the term spread. It seems that earnings-by-price tends to define the overall level of the return, while the term spread provides information on short-term market corrections. This is why the earnings-by-price benchmark with the term spread as a covariate was the superior combination among all considered options when all forecasts were back-calculated to nominal returns (see Kyriakou et al. [6]). The different roles played by earnings-by-price and the term spread may explain why they work so well in combination. So, the overall conclusions can be (i) that one should work with these two predictors when forecasting long-term stock returns and (ii) that the good results of our approach should lead to reconsiderations, along the lines of some of the comments of Lioui and Poncet [1], when deciding on the horizon of the investment strategy for the long-term investor.
Future research might work on econometric modeling that can combine short-term and long-term predictions. One could, for example, imagine an econometric model having the same predictive mean and variance one year ahead provided by the optimal one-year forecast, while simultaneously having the same predictive mean and variance five years ahead as provided by the optimal five-year forecast. Current efforts are being undertaken in that direction by the research team behind this paper.

Conflicts of Interest:
The authors declare no conflict of interest.