Linear regression models are popular in predictive modeling as these classical benchmarks are easy to estimate and interpret. However, the fixed functional form of the relationship between stock returns and predictive variables leads to inferior predictive power compared with nonlinear approaches [8, 9, 10, 11, 12]. Therefore, using a fully nonparametric smoother, we focus on potentially nonlinear predictive relationships between returns over the next T years in excess of a reference rate (or benchmark) and a set of economic predictors relevant for the long-term investor. We analyze the two most important benchmark models of Kyriakou et al. [7, 13]: the short-term interest rate and the inflation rate. Note that the former directly corresponds to the prediction of the risk premium (over a risk-free investment), whereas the latter refers to the forecast of real returns. We aim, first, to investigate their predictability over horizons of one year and five years separately and then to provide an intuitive single econometric model that combines both predictive horizons.
2.1. One-Year Predictions
We start with annual nominal stock returns defined by
, where
is the stock price at the end of year
t and
is the dividends paid during year
t. We focus on returns in excess (log-scale) of a given benchmark
with
:
where
and
with
denoting the short-term interest rate and
is the inflation rate for the consumer price index
for year
t.
Our predictive nonparametric regression model for the one-year (T = 1) excess returns defined in Equation (1) is now given by
Note that the conditional mean in Equation (2) is unknown and its functional form is not predetermined (for example, to be linear) but can take any shape. Our preferred nonparametric method to estimate this function is the local-linear smoother because of its flexibility and well-known statistical properties. For example, a linear function is estimated without any bias and is thus automatically embedded in our analysis; that is, if the data-generating process is linear, we recover this simple functional form. Note further that the error terms in Equation (2) form a martingale difference process, i.e., they are serially uncorrelated random variables with zero mean, given the past, and an unknown conditionally heteroscedastic variance. The elements of the q-dimensional vector in Equation (2), which collects the explanatory variables, are also transformed under the chosen benchmark A according to
Therefore, the vector of explanatory variables contains (combinations of transformed) popular time-lagged predictive variables based on the (i) dividend-by-price ratio; (ii) earnings-by-price ratio, computed from the earnings accruing to the index in year t; (iii) short-term interest rate; (iv) long-term interest rate; (v) inflation rate; and (vi) term spread. The use of such a transformation is one example of the careful imposition of additional structure in the statistical modeling process, which has shown promising results in previous works [10, 11, 14]. We call this adjustment of both the independent and dependent variables according to the same benchmark double (or full) benchmarking.
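As a minimal sketch of the local-linear smoother mentioned above, the following Python function fits a kernel-weighted least-squares line around each evaluation point and returns its intercept. The Gaussian kernel, the function name `local_linear`, and the toy data are assumptions made for illustration; they are not taken from the paper.

```python
import numpy as np

def local_linear(x0, x, y, h):
    """Local-linear estimate of E[y | x = x0] (Gaussian kernel, bandwidth h)."""
    x0 = np.atleast_1d(np.asarray(x0, dtype=float))
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    fitted = np.empty_like(x0)
    for i, point in enumerate(x0):
        u = x - point
        w = np.exp(-0.5 * (u / h) ** 2)              # kernel weights (assumed Gaussian)
        design = np.column_stack([np.ones_like(u), u])
        wd = design * w[:, None]
        # Weighted least squares of y on (1, x - point); the intercept is the fit.
        beta = np.linalg.solve(design.T @ wd, wd.T @ y)
        fitted[i] = beta[0]
    return fitted

# Toy check: exactly linear data (made-up coefficients).
x = np.linspace(0.0, 1.0, 25)
y = 0.02 + 0.5 * x
print(local_linear([0.1, 0.5, 0.9], x, y, h=0.3))
```

The final lines illustrate the property noted in the text: when the data are exactly linear, the fitted values lie on that line whatever bandwidth is chosen.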
2.2. Longer-Horizon Predictions
A main contribution of our work is the combination of short- and long-term predictions into one single model. Hence, we introduce, in addition to the short one-year predictions, our version of long-horizon predictions. We highlight three important points that fundamentally distinguish the two cases: first, the autoregressive behavior of the underlying predictive variable in Equation (6), which is also used as the building block of our econometric model in Section 2.4; second, the more complicated error structure (serial correlation by construction) in the predictive relationship (8); and, third, closely related to the last point, a more complicated smoothing parameter selection for the correct estimation of the unknown function in Equation (9).
For longer horizons T, with T > 1, we consider the sum of annual continuously compounded returns defined in Equation (1), that is,
Here, careful econometric modeling is necessary because of the overlapping nature of the T-year returns (refer also to Appendix A). For ease of illustration, assume a linear model for the conditional mean in Equation (2) as well as linear autoregressive behavior of order one for the forecasting variable:
with
as in Equation (
2),
being a white noise, and regression parameters
, and
. A simple linear model for the
T-year (
) regression problem that directly follows from Equations (
5) and (
6) is then
with parameters
and
, and error terms
(more details are deferred to Appendix A). Equation (7) shows that the excess stock return for year t over the next T years can be decomposed into two parts: a predictive linear part that depends only on the same predictive variable as in the one-year case, and unpredictable error terms, which are now serially correlated by construction.
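The decomposition described above can be sketched as a short worked derivation. The symbols used here ($Y_t$ for the one-year excess return, $X_t$ for the predictor, $Z_t$ for the $T$-year return, $\alpha_0, \alpha_1, \beta_0, \beta_1$ for the regression parameters, and $\xi_t, \eta_t$ for the error terms) and the timing convention are assumptions chosen for illustration and may differ from the notation of Equations (5) to (7).

```latex
\begin{align*}
Y_t &= \alpha_0 + \alpha_1 X_{t-1} + \xi_t, \qquad
X_t = \beta_0 + \beta_1 X_{t-1} + \eta_t, \qquad
Z_t = \sum_{i=0}^{T-1} Y_{t+i}, \\
X_{t-1+i} &= \beta_0 \sum_{j=0}^{i-1} \beta_1^{j} + \beta_1^{i} X_{t-1}
            + \sum_{j=0}^{i-1} \beta_1^{j}\, \eta_{t-1+i-j}, \\
Z_t &= \underbrace{T\alpha_0 + \alpha_1 \beta_0 \sum_{i=0}^{T-1}\sum_{j=0}^{i-1} \beta_1^{j}}_{\text{intercept}}
     + \underbrace{\alpha_1 \sum_{i=0}^{T-1} \beta_1^{i}}_{\text{slope}} \, X_{t-1}
     + \underbrace{\sum_{i=0}^{T-1} \Bigl( \xi_{t+i}
     + \alpha_1 \sum_{j=0}^{i-1} \beta_1^{j}\, \eta_{t-1+i-j} \Bigr)}_{\text{error}} .
\end{align*}
```

Under these assumptions, the error terms of $Z_t$ and $Z_{t+1}$ share $\xi_{t+1}, \dots, \xi_{t+T-1}$, so they are correlated even when the one-year errors are not.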
As the linear setup of Equation (6) could be misspecified and thus not account for important nonlinearities, we model the functional relationship between the predictive variable and the T-year excess stock returns in a more flexible nonparametric way, analogous to Equation (2),
where the conditional mean function is an unknown smooth function. Note again the important difference between the error terms of Model (2) and Model (8): while the former form a martingale difference process, the latter are serially correlated by construction. This property has to be considered when estimating the unknown conditional mean function; otherwise, two fundamental problems occur: the estimators are still consistent but less efficient than those correcting for autocorrelation [15, 16, 17, 18]; and, more importantly, the commonly applied automatic smoothing parameter selection procedures (such as cross-validation and plug-in) break down [19, 20]. In the empirical part of our work, we overcome these problems using a special leave-l-out cross-validation strategy, which is closely related to our method of measuring predictive power. Our approach to this issue is discussed in detail in the next section.
Before we proceed, we summarize what we have discussed so far: the nonparametric Models (2) and (8) for one-year and T-year returns, the autoregressive behavior of order one for the predictive variable in (6), and the necessity of a leave-l-out cross-validation in the estimation procedure.
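The serial correlation that overlapping returns induce can be illustrated with a small simulation. This is only a sketch under simplifying assumptions: it accumulates simulated, serially uncorrelated one-year errors over overlapping windows and ignores the contribution of the predictor's own innovations; the function names and the horizon of five years are chosen for illustration.

```python
import numpy as np

def overlapping_sums(series, horizon):
    """Sum of `horizon` consecutive values starting at each t (overlapping windows)."""
    return np.convolve(series, np.ones(horizon), mode="valid")

def lag1_autocorr(series):
    """Sample lag-one autocorrelation."""
    centred = series - series.mean()
    return float(np.sum(centred[1:] * centred[:-1]) / np.sum(centred ** 2))

rng = np.random.default_rng(1)
one_year_errors = rng.standard_normal(20_000)          # simulated, serially uncorrelated
five_year_errors = overlapping_sums(one_year_errors, horizon=5)

print(lag1_autocorr(one_year_errors))    # close to 0
print(lag1_autocorr(five_year_errors))   # close to (T - 1) / T = 0.8 for T = 5
```

Adjacent five-year sums share four of their five terms, which is why their lag-one autocorrelation is near 0.8 even though the one-year errors are uncorrelated.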
2.3. Predictive Power, Variable Selection, and Smoothing Parameter Choice
For our nonparametric one- and T-year models defined earlier, we need an adequate measure that (a) quantifies and validates the predictive power, (b) allows for comparisons and ranking of models when different sets of explanatory variables are used (variable selection), and (c) best selects the bandwidth(s) and thus determines the functional form of the conditional mean for the given predictive variables (smoothing parameter choice). In our work, we apply the validated R-squared ($R_V^2$) of Nielsen and Sperlich [14], which conforms to these requirements. It directly aims to estimate the k-year-ahead prediction error based on a leave-l-out cross-validation and can thus be used for both variable and smoothing parameter selection. In our notation, the validated R-squared is defined as
where the estimators leave out l observations around the tth point in time, both for the conditional mean function from Equations (2) or (8) and for the unconditional (historical) mean of the k-year return to predict (the one-year return for k = 1 and the T-year return for k = T). To keep the notation simple, we drop an extra subscript for the bandwidth h used in the calculation of $R_V^2$, as we always choose h in the numerator of Equation (10) so that the prediction error is minimized and thus the largest possible $R_V^2$ is achieved for the given predictive variables. Note that $R_V^2$ measures the predictive power of a given model against a benchmark (here, the cross-validated historical mean). For our setup, this means that when $R_V^2$ is positive, the predictor-based regression Model (2) or (8) outperforms the corresponding historical mean forecast.
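The validation criterion described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' implementation: the Gaussian kernel, the reading of "leave out l observations around t" as a centred window of odd length l in time order, the bandwidth grid, and the simulated data are all choices made here.

```python
import numpy as np

def local_linear_at(x0, x, y, h):
    """Local-linear fit at a single point x0 (Gaussian kernel assumed)."""
    u = x - x0
    w = np.exp(-0.5 * (u / h) ** 2)
    design = np.column_stack([np.ones_like(u), u])
    wd = design * w[:, None]
    return np.linalg.solve(design.T @ wd, wd.T @ y)[0]

def validated_r2(x, y, h, l=1):
    """1 - (CV error of the smoother) / (CV error of the historical mean)."""
    n = len(y)
    half = (l - 1) // 2                      # assumption: odd l, window centred on t
    num = den = 0.0
    for t in range(n):
        keep = np.abs(np.arange(n) - t) > half        # drop l points around t
        num += (y[t] - local_linear_at(x[t], x[keep], y[keep], h)) ** 2
        den += (y[t] - y[keep].mean()) ** 2
    return 1.0 - num / den

# Simulated data with a clearly nonlinear signal (illustrative only).
rng = np.random.default_rng(0)
x = rng.uniform(-2.0, 2.0, 150)
y = np.sin(x) + 0.1 * rng.standard_normal(150)

# Choose the bandwidth that maximizes the validated R-squared on a small grid.
grid = [0.2, 0.4, 0.8, 1.6]
scores = {h: validated_r2(x, y, h, l=3) for h in grid}
best_h = max(scores, key=scores.get)
print(best_h, scores[best_h])
```

A positive value for the selected bandwidth indicates that the smoother beats the cross-validated historical mean on these simulated data; the same function can be called with different predictor sets to compare models.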
In a time-series context, out-of-sample evaluations are often proposed, where a fraction of the data from the end of the time series is not used for estimation but is withheld for evaluation. In the case of uncorrelated errors, Bergmeir et al. [20] showed that cross-validation, as proposed in this section, is preferred to out-of-sample evaluation. Another advantage is that cross-validation involves many evaluations, whereas out-of-sample analysis can test the data only once. This property is especially beneficial when the number of recorded observations is small, as in our case with annual stock market data. When errors are correlated, as discussed in Section 2.2 for our T-year predictions, it may be necessary to omit more than a single point and apply leave-l-out cross-validation. This strategy avoids model fits that are progressively under-smoothed because of bandwidths that are too small [21]. Alternative approaches involve, for example, bimodal kernels [22] or correlation-corrected cross-validation [19]. Note that when a large fraction of the data is skipped, additional corrections might be required [23].
2.4. An Econometric Model for Combined Short- and Long-Term Predictions
In this section, we present a simple method of combining short- and long-term predictions. Our model builds on the autoregressive development of the earnings variable or, more precisely, on the change in earnings growth, which has been identified as one of the key drivers of stock prices P. Other important factors, such as the dividend yield, can easily be incorporated in our model as well, for example, as covariates in the one- or five-year conditional mean regressions in (2) or (8), which will be used to calibrate our model. The contribution of our approach is twofold. First, applying predictive regressions for two different horizons individually reduces the noise or risk for short- and long-term investments. Second, combining predictions of different horizons further reduces the noise or risk for the short-term investment. The reason is that even one-year returns still include a large amount of speculative variation, which is clearly reduced in longer-horizon investments. When such T-year predictions are used in combination with the one-year ones, the latter benefit from the former because they are forced to sum up to the long-term predictions after our model is calibrated. In other words, our model provides one- and T-year predictions that are equal to the conditional mean forecasts based on regressions (2) and (8) (and thus, with an interpolation argument, for the horizons in between) and reduces the variation in the short-term predictions after year one.
We start with the linear formulations of the autoregressive behavior of order one of the predictive variable and the linear model version of one-year return predictions in Equation (6). Here, we take the earnings variable to be this special predictor and estimate the linear models by ordinary least squares (OLS). In a first step, we obtain:
with unknown parameters
and
, sample average of earnings
, and independent and identically distributed error terms
. The OLS estimates of
and
shall be denoted by
and
, respectively. In a second step, we apply the linear version of Equation (
2) for the earnings variable
:
with unknown parameters
and
, which will be estimated again by OLS; their estimates are denoted by
and
, respectively. Remember that we have
n observations in our records. Thus, with Equations (
11) and (
12) and the corresponding OLS estimates, which we keep fixed in the following steps, we now forecast out-of-sample
.
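The two OLS steps and the subsequent out-of-sample forecasts can be sketched as follows. The sketch assumes an uncentred AR(1) form with an intercept for the earnings variable and a linear regression of the return on the lagged earnings variable; the exact parametrization of Equations (11) and (12), the function names, and the noiseless toy series are assumptions made here for illustration.

```python
import numpy as np

def ols_line(x, y):
    """Intercept and slope of the least-squares line y = a + b * x."""
    slope, intercept = np.polyfit(x, y, deg=1)
    return intercept, slope

def recursive_forecasts(earnings, returns, horizon):
    """One-year-ahead return forecasts for years n+1, ..., n+horizon."""
    # Step 1: AR(1) for the earnings variable (this year's value on last year's).
    a0, a1 = ols_line(earnings[:-1], earnings[1:])
    # Step 2: linear predictive regression of the return on lagged earnings.
    b0, b1 = ols_line(earnings[:-1], returns[1:])
    forecasts, e_last = [], earnings[-1]     # start from the last observation
    for _ in range(horizon):
        forecasts.append(b0 + b1 * e_last)   # return forecast given latest earnings
        e_last = a0 + a1 * e_last            # roll the earnings forecast forward
    return np.array(forecasts), (a0, a1, b0, b1)

# Noiseless toy series with made-up coefficients, so OLS recovers them.
e = [0.10]
for _ in range(19):
    e.append(0.02 + 0.6 * e[-1])
e = np.array(e)
y = np.zeros_like(e)
y[1:] = 0.01 + 0.5 * e[:-1]

forecasts, params = recursive_forecasts(e, y, horizon=5)
print(params)
print(forecasts)
```

The estimated coefficients are kept fixed while the earnings recursion is iterated forward, which mirrors the description in the text; the correction of these forecasts is a separate step.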
Our aim is to construct an econometric model that reflects one-year and T-year predictions (from the preferred Models (2) and (8) at hand) simultaneously. For this reason, we correct these out-of-sample forecasts in the following linear way:
where
and
are unknown parameters;
, and
are independent error terms with unknown variances
and
. Note that we allow for a different variation in the first corrected one-year-ahead prediction
in Equation (
13) compared with the second to
Tth corrected one-year-ahead predictions
in Equations (14) to (15). This way, our model can account for the lower variation in longer-horizon returns relative to one-year returns. In other words, after calibrating the model, we expect
to be smaller than
. Note further that from Equations (
13)–(
15), we directly obtain an expression for the corrected
T-year return
:
Next, we adequately calibrate Equations (
13)–(
16), i.e., choose the model parameters
,
,
, and based on these, obtain the corrected one-year and
T-year returns. Here, we use the recursive representation of the earnings
from Equation (11) with the starting value
(the last earnings observation in our records) together with the linear predictive model (
12) and the corresponding OLS estimates
,
,
,
. Plugging-in for the corrected
, and
gives:
and
Now, we fix the first and second moments of
and
with the estimated values from our preferred (best) one- and
T-year predictive Models (
2) and (
8). By doing so, we obtain a linear equation system with four equations, which can be easily solved for the four unknown parameters
,
,
. For this purpose, let
and
be the conditional mean forecast and its estimated variance from Equation (
2), respectively; and
and
be the conditional mean forecast and its estimated variance from Equation (
8), respectively. Note that
and
can be readily calculated from the
of the predictive regressions (
2) and (
8). A closer inspection of Equation (10) shows that the ratio in our validation criterion compares the sample variance of the estimated residuals from the preferred predictive model (the numerator) with the sample variance of the benchmarked returns (the denominator). Algebraically, we therefore have that
and
, and
Given
the solution of the equation system (
23) is
where
and
The a priori expectations about our model are the following: First, when the autoregressive behavior of the earnings in Model (11) and the linear model for stock returns (
12) produce reasonable predictions
, only a marginal correction is necessary, i.e.,
is close to zero and
close to one. Second, when
, one-year returns should diminish over time (as the sum of
still has to be equal to
) and
becomes negative. Now
takes the role of an upper limit (larger than
), from which increasing values (over time) are subtracted to match the
T-year prediction
. Finally, note that by construction,
if and only if
, that is, the cumulated risk over
T periods of short-term investments exceeds the risk of a
T-year investment (as discussed earlier).
2.5. Data Sources and Descriptive Statistics
Our empirical application is based on historical U.S. stock market data at an annual frequency. The dataset includes, among other variables, the Standard and Poor’s (S&P) Composite Stock Price Index, dividends and earnings accruing to the index, as well as macroeconomic measures such as the short-term interest rate, the long-term interest rate, and the consumer price index, covering the period from 1872 to 2020. Table 1 exhibits their basic descriptive statistics.
We use an updated and revised version of Shiller’s ([24], Chapter 26) data, which are available from http://www.econ.yale.edu/~shiller/data.htm (accessed on 16 April 2020). Note that a simple extension of the risk-free rate series was not possible because the underlying 6-month certificate of deposit rate (secondary market) was discontinued in 2013. We thus followed the strategy of Welch and Goyal [25] and replaced this variable by an annual risk-free rate based on the 6-month treasury-bill rate (secondary market) from https://fred.stlouisfed.org/series/TB6MS (accessed on 16 April 2020). As this series is available only from 1958, we had to estimate the information prior to 1958 using results from an OLS regression of the treasury-bill rate on the risk-free rate from Shiller’s data for the overlapping period 1958 to 2013. With the estimated linear model ($R^2$ of 98.6%, estimated standard errors in brackets) of
we finally instrumented the risk-free rate from 1872 to 1957. The high correlation of 99.3% between the actual treasury-bill rate and the predictions for the estimation period verified the usefulness of this approach.
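The backfilling step described above can be sketched as a simple regression-and-predict routine. The function name `backfill` and all numbers below are made up for illustration; they are not the series or the estimates used in the paper.

```python
import numpy as np

def backfill(target_overlap, proxy_overlap, proxy_early):
    """Regress the target on the proxy over the overlap, then predict the early period."""
    slope, intercept = np.polyfit(proxy_overlap, target_overlap, deg=1)
    fitted = intercept + slope * np.asarray(proxy_overlap)
    corr = np.corrcoef(fitted, target_overlap)[0, 1]   # in-sample fit check
    return intercept + slope * np.asarray(proxy_early), corr

# Made-up rates in percent: proxy = older risk-free series, target = treasury-bill rate.
proxy_overlap = np.array([3.0, 4.5, 6.0, 8.0, 5.0, 2.0])
target_overlap = 0.2 + 0.9 * proxy_overlap             # exact relation, for the toy check
proxy_early = np.array([4.0, 5.5])

estimated_early, corr = backfill(target_overlap, proxy_overlap, proxy_early)
print(estimated_early, corr)
```

With real data the relation is not exact, so the correlation between fitted and actual values over the overlap serves as the plausibility check mentioned in the text.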
This section concludes with Table 2, which displays the standard descriptive statistics for the transformed variables according to Equations (1), (4) and (5). The predictive variables under the inflation benchmark are more spread out, with a wider range and a higher standard deviation than the variables under the risk-free rate benchmark. This property of the inflation benchmark could be beneficial for the estimation process because a larger variability in the regressors usually leads to a more efficient predictor.
However, the returns transformed with the two benchmarks differ only slightly. A small upward shift under the inflation benchmark is noticeable in Figure 1, which shows density plots of the benchmarked returns for both the one- and five-year horizons.