Forecasting U.S. Aggregate Stock Market Excess Return: Do Functional Data Analysis Add Economic Value?

Abstract: This paper analyzes the forecast performance of historical S&P500 and Dow Jones Industrial Average (DJIA) excess returns using nonparametric functional data analysis (NP-FDA). The empirical results show that the NP-FDA forecasting strategy outperforms not only the prevailing-mean model, but also the traditional univariate predictive regressions with standard predictors used in the literature and, in most cases, combination approaches that use all predictors jointly. Our results also have important implications for investors: from an asset allocation perspective, a mean-variance investor realizes substantial economic gains. Indeed, NP-FDA is the only individual model that beats the historical average forecasts of excess returns in a statistically and economically significant manner for both the S&P500 and the DJIA over the entire period and over NBER recession and expansion periods.


Introduction
Stock return predictability has been of considerable interest to practitioners and academics in finance. Practitioners attempt to improve asset allocation and risk management performance using real-time prediction of stock returns. Further, academics use information regarding stock return forecasting to generate more realistic asset-pricing models, to test the market efficiency hypothesis, and other financial problems. Therefore, it is not surprising that the search for accurate and reliable return forecasts has attracted great interest from both finance professionals and academics.
In the early 1970s, it was widely accepted that stock markets were efficient. However, over the last two decades, academics have found evidence of stock return predictability and highlighted the potential benefits for actively managed investment strategies (see [1], for a recent survey). While many studies report evidence of in-sample return predictability, out-of-sample return predictability remains controversial. In an influential article, Ref. [2] show that it is difficult to find models that can surpass even the most naive benchmarks out-of-sample. They observe that most individual predictor variables cannot provide statistically significant forecast improvements over a historical average benchmark.
Recent studies report evidence of excess return predictability that is based on macroeconomic variables, technical indicators, short interest rates, investor sentiment, and so on. In addition, several studies demonstrate that the out-of-sample forecasting ability of commonly used predictors can be improved using various strategies: imposing economically motivated restrictions on the model [3][4][5][6][7], using diffusion indices [8][9][10][11], using regime switching models [12][13][14][15], or combining forecasts from individual predictive models [1,5,7,16,17,18]. In particular, Ref. [16] find that a combination forecast approach is able to deliver consistently superior out-of-sample US equity premium prediction. Refs. [19,20] use partial least squares methodology to create a sentiment index for predicting future market returns and find results that are both statistically and economically significant.
In this paper, we investigate the capacity of nonparametric functional data analysis (NP-FDA) to directly forecast the S&P500 and Dow Jones Industrial Average (DJIA) excess returns and compare its performance to that of the traditional regression model. We model monthly returns as curves in a functional space. Specifically, we assume that daily cumulative returns can be interpreted as curves describing the cumulative return during a given month. The daily cumulative return curves give more relevant information, because they show how returns evolve during the month. The NP-FDA methodology can be seen as a functional regression, in which the regressor is the curve for a given month, t. As our main objective is the one-step-ahead forecast of the monthly return, the regressand, in this case, is the cumulative return on the last day of the next month, t + 1. Moreover, the NP-FDA estimator described in Section 2.1 may be viewed as a weighted mean of monthly returns in which the weights are a function of a measure of proximity between a given curve and the other curves in the sample. An interesting relation between our methodology and the prevailing-mean emerges. With appropriate choices for the bandwidth, b, and the kernel function, K(·), our estimates coincide with those of the prevailing-mean: the kernel would have to be the density of the Uniform distribution, with a bandwidth wide enough to give every curve in the sample the same weight. Nevertheless, since the bandwidth is automatically chosen by a cross-validation procedure and the kernel is a density with smooth decay, following the nonparametric literature, this coincidence will not happen in general. In other words, we attempt, in this article, to forecast monthly returns using a flexible nonparametric estimator that exploits the information in the sample of curves of daily cumulative returns. For every fixed month, these curves exhibit a specific pattern, typically with some upward or downward momentum, disturbed by some noise. Therefore, it is tempting to study their statistical behaviour using the framework of NP-FDA.
We employ monthly S&P500 and DJIA excess returns and the 12 monthly economic fundamentals of [2] for a sample period ranging from January 1927 to December 2019 for S&P500 and January 1889 to June 2020 for DJIA, respectively. We also consider a set of economic variables originally proposed by [2] in the predictive regression models. The analysis is performed purely out-of-sample to inform real-time investment decisions. A detailed performance assessment of forecast combinations strategies in comparison to individual models is provided in terms of both statistical accuracy and economic relevance. Because the final goal of returns forecasting is to improve economic and financial decision making, we propose evaluating the accuracy and economic relevance of excess returns forecast based on a mean-variance investor with quadratic utility, as in [1].
Our main empirical finding is that the NP-FDA delivers a statistically significant monthly out-of-sample $R^2$ of 0.57% and 0.37% for DJIA and S&P500, respectively. This is considerably higher than the $R^2$ values for the best individual model and for the standard forecast combination based on all 12 predictors. In addition, the NP-FDA also delivers high economic value in the context of a dynamic mean-variance strategy. The average utility gain (certainty equivalent return) is approximately 0.98% and 0.28% per year over and above the historical mean benchmark for the full sample for DJIA and S&P500, respectively. When we consider the economic cycle, the results are also striking: 0.70% per year in expansions and 2.68% per year in recessions for the S&P500 dataset.
The remaining sections of the paper are organized as follows. In Section 2, the predictive regression framework is presented and the NP-FDA estimation is detailed. Section 3 describes the datasets and reports out-of-sample results for statistical and economic evaluation. Section 4 contains concluding remarks.

Excess Return Forecast Models
In this section, we outline our empirical framework to predict DJIA and S&P500 excess returns, which we apply to our data in subsequent sections. In addition, we describe the methods for statistical and economic evaluation of the accuracy of the return forecast. Now, let $r_{t+1}$ denote the return of the index held from period t to period t + 1, in excess of the risk-free rate. We use continuously compounded returns obtained by subtracting the short T-bill rate from the DJIA and S&P500 returns, including dividends, which is given by:
$$ r_t = \log(1 + R_t) - \log(1 + R_{f,t-1}), \qquad (1) $$
where $R_t$ is the return on a broad stock market index and $R_{f,t-1}$ denotes the risk-free rate from period t − 1 to t. The unit of time can be day, month, or year.
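As a small illustration of the excess return definition, the sketch below assumes that "continuously compounded" means the log-difference form log(1 + R_t) − log(1 + R_{f,t−1}). The function name and the sample numbers are hypothetical and are not taken from the paper's data.

```python
import math

def log_excess_return(index_return, risk_free_rate):
    """Continuously compounded excess return: log(1 + R_t) - log(1 + R_f)."""
    return math.log1p(index_return) - math.log1p(risk_free_rate)

# Hypothetical simple monthly index returns (including dividends) and T-bill rates.
index_returns = [0.020, -0.010, 0.035]
tbill_rates = [0.003, 0.003, 0.004]

excess = [log_excess_return(R, rf) for R, rf in zip(index_returns, tbill_rates)]
```

For small returns the result is close to the simple difference R_t − R_{f,t−1}, which is why the two definitions are often used interchangeably at the monthly frequency.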

Functional Data Methodology and Nonparametric Estimation
In this section, we explain how the functional data analysis methodology can be applied to our problem (for further details about the method, see [21,22]). Following [23], let $\mathcal{X}$ be a functional random variable taking values in a semi-metric space $(E, d)$, where $E$ is an infinite-dimensional space and $d$ is a semi-metric (as described below). Moreover, let $X$ be an observation of $\mathcal{X}$. In particular, when $\mathcal{X}$ denotes a random curve, it is convenient to identify $\mathcal{X} = \{\mathcal{X}(\tau); \tau \in \mathcal{D}\}$ (respectively, $X = \{x(\tau); \tau \in \mathcal{D}\}$), where $\mathcal{D}$ is the set of points at which the curve is evaluated. Now, consider the problem of predicting a dependent scalar random variable $y$ as a function of a functional regressor $\mathcal{X}$. Let
$$ r_{NP}(X) = E(y \mid \mathcal{X} = X) \qquad (2) $$
be the nonlinear regression operator and $(X_t, y_t)_{t=1,\dots,T}$ be a sequence of random pairs taking values in $E \times \mathbb{R}$. Given a fixed element $X_s \in E$, the nonparametric estimator for $r_{NP}(X_s)$ is defined by
$$ \hat{r}_{NP}(X_s) = \frac{\sum_{t=1}^{T} y_t \, K\!\left(b^{-1} d(X_s, X_t)\right)}{\sum_{t=1}^{T} K\!\left(b^{-1} d(X_s, X_t)\right)}, \qquad (3) $$
where $K$ is an asymmetric kernel and $b > 0$ is a bandwidth. The estimator presented in (3) may be viewed as a weighted mean, in which the weights are ultimately determined by the kernel density. The argument of this kernel depends on the semi-metric, $d$, and the bandwidth, $b$. The proper selection of the bandwidth is of fundamental importance. Given an observed sample $(X_t, y_t)_{t=1,\dots,T}$, the optimal bandwidth, $b_{opt}$, may be selected by the following cross-validation procedure
$$ b_{opt} = \arg\min_{b} CV(b), \qquad (4) $$
where
$$ CV(b) = \sum_{t=1}^{T} \left(y_t - \hat{r}_{(-t)}(X_t)\right)^2 \qquad (5) $$
and
$$ \hat{r}_{(-t)}(X) = \frac{\sum_{s \neq t} y_s \, K\!\left(b^{-1} d(X, X_s)\right)}{\sum_{s \neq t} K\!\left(b^{-1} d(X, X_s)\right)}. \qquad (6) $$
The estimator presented in Equation (6) is well known in the nonparametric literature as the leave-one-out estimator and it is also suggested by Ferraty et al. [24] and Ferraty and Vieu [21]. A semi-metric $d$ satisfies the axioms of a metric, except that $d(x, y) = 0$ does not imply $x = y$, where $x, y \in E$. This characteristic is suitable when dealing with infinite-dimensional spaces, since, in such spaces, there is no equivalence between norms. Many semi-metrics are available in the literature, each one with its strengths and weaknesses. (Here, we focus on the PCA semi-metric. For other types of semi-metrics, the reader can refer to [21].) As pointed out in [21], the PCA semi-metric is adequate for computing proximities between curves in a reduced dimensional space and it has the advantage of being usable even if the curves are rough. On the other hand, it only applies to balanced data. Technically, as stated in [21] and in Ferraty et al. [24], the PCA semi-metric may be built when assuming that $E \int \mathcal{X}^2(s)\,ds < \infty$. Under this assumption, the following expansion holds
$$ \mathcal{X} = \sum_{k=1}^{\infty} \left( \int \mathcal{X}(s)\, v_k(s)\, ds \right) v_k, \qquad (7) $$
where $v_1, v_2, \dots$ are orthonormal eigenfunctions of the covariance operator $\Gamma_{\mathcal{X}}(t, s) = E(\mathcal{X}(t)\mathcal{X}(s))$. Regarding its empirical counterpart, let
$$ \tilde{\mathcal{X}}^{(q)} = \sum_{k=1}^{q} \left( \int \mathcal{X}(s)\, v_k(s)\, ds \right) v_k \qquad (8) $$
be a truncated version of $\mathcal{X}$ (Equation (7)). Based on the $L^2$-norm, we have, for all $(X_1, X_2) \in E^2$, the following parametrized family of semi-metrics
$$ d_q^{PCA}(X_1, X_2) = \sqrt{\sum_{k=1}^{q} \left( \int \left[X_1(s) - X_2(s)\right] v_k(s)\, ds \right)^2}. \qquad (9) $$
It is worth noting that $q$ is a tuning parameter that needs to be estimated. One possibility is choosing $q$ via cross-validation. In the next section, we detail how to estimate the non-observable quantities that are presented in Equation (9).
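The kernel-weighted mean and the leave-one-out bandwidth selection described above can be sketched in a few lines. This is a minimal illustration and not the authors' implementation: the quadratic kernel on [0, 1], the Euclidean distance used as a stand-in semi-metric, the bandwidth grid, and the toy curves are all assumptions made here for the example.

```python
import math

def quadratic_kernel(u):
    """Asymmetric kernel supported on [0, 1] (an assumed choice, up to a constant)."""
    return 1.0 - u * u if 0.0 <= u <= 1.0 else 0.0

def l2_distance(x1, x2):
    """Stand-in semi-metric: Euclidean distance between discretized curves."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x1, x2)))

def kernel_estimate(x_new, curves, responses, b, dist=l2_distance, skip=None):
    """Weighted mean of the responses with weights K(d(x_new, X_t) / b).
    `skip` leaves one observation out (used for cross-validation)."""
    num = den = 0.0
    for t, (x, y) in enumerate(zip(curves, responses)):
        if t == skip:
            continue
        w = quadratic_kernel(dist(x_new, x) / b)
        num += w * y
        den += w
    return num / den if den > 0 else float("nan")

def cv_score(curves, responses, b, dist=l2_distance):
    """Leave-one-out sum of squared prediction errors for bandwidth b."""
    total = 0.0
    for t, (x, y) in enumerate(zip(curves, responses)):
        pred = kernel_estimate(x, curves, responses, b, dist, skip=t)
        if math.isnan(pred):          # no neighbour inside the bandwidth
            return float("inf")
        total += (y - pred) ** 2
    return total

def select_bandwidth(curves, responses, grid, dist=l2_distance):
    """Bandwidth on the grid with the smallest leave-one-out score."""
    return min(grid, key=lambda b: cv_score(curves, responses, b, dist))

# Toy data: two upward and two downward curves with matching responses.
curves = [[0.0, 0.1, 0.2], [0.0, 0.1, 0.25], [0.0, -0.1, -0.2], [0.0, -0.1, -0.25]]
responses = [1.0, 1.2, -1.0, -1.2]
b_opt = select_bandwidth(curves, responses, grid=[0.1, 0.3, 1.0])
forecast = kernel_estimate([0.0, 0.1, 0.22], curves, responses, b_opt)
```

With a small bandwidth only the curves that resemble the new one receive weight, so the forecast for the upward curve lands between the two "upward" responses; a very large bandwidth would pull it toward the overall mean.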

Estimation Details
In practice, it is only possible to observe a discretized version of $\mathcal{X} = \{\mathcal{X}(\tau); \tau \in \mathcal{D}\}$, which we denote by $X = \{x(\tau); \tau = 1, 2, \dots, D\}$, implying that $\mathcal{D} = \{1, 2, \dots, D\}$. Specific to our problem, a particular month is viewed as a function (curve) that links days to cumulative excess returns. Therefore, we observe a sequence $\{X_1, X_2, \dots, X_T\}$ of realizations of $\mathcal{X}$, where $X_t = \{x_t(\tau); \tau = 1, 2, \dots, D\}$ corresponds to the observed excess return curve in month $t = 1, \dots, T$, and $D$ represents the number of days in each month. It is worth noting that balanced data are needed in order to use the PCA semi-metric. In Section 3, we detail how to adjust the data to make $D$ constant over $t$.
The main goal of this article is to forecast, one step ahead, the excess return on the last day of a given month, considering the information available before the forecast is made. In other words, given a sequence $\{X_1, X_2, \dots, X_T\}$ of realizations of $\mathcal{X}$, we want to estimate
$$ r_{NP}(X_T) = E(y_{T+1} \mid \mathcal{X} = X_T), \qquad (10) $$
where $y_t := x_t(D)$, which means that the regressor is the excess return curve of a particular month and the regressand is the last-day excess return of the next month. Now, Equation (3) may be adapted to our problem in the following way:
$$ \hat{r}_{NP}(X_T) = \frac{\sum_{t=1}^{T-1} y_{t+1} \, K\!\left(b^{-1} d(X_T, X_t)\right)}{\sum_{t=1}^{T-1} K\!\left(b^{-1} d(X_T, X_t)\right)}, \qquad (11) $$
where $K$, $b$ and $d$ are defined as before. Roughly speaking, Equation (11) states that the one-step-ahead forecast for the excess return on the last day of the month is a weighted mean, in which the weights depend on the proximity, in terms of the semi-metric $d$, between the curve $X_T$ and the other curves in the sample. Now, we turn attention to the PCA semi-metric. Because $\Gamma_{\mathcal{X}}$ is not observable, its empirical version is set to be
$$ \Gamma_n(s, u) = \frac{1}{T} \sum_{t=1}^{T} X_t(s) X_t(u), $$
and the integral in Equation (9) can be approximated by numerical quadrature. Following [21], the empirical version of the PCA semi-metric is defined for all $(X_1, X_2)$ by
$$ d_q^{PCA}(X_1, X_2) = \sqrt{\sum_{k=1}^{q} \left( \sum_{j=1}^{D} w_j \left(x_1(j) - x_2(j)\right) [v_k]_j \right)^2}, \qquad (12) $$
where $v_1, v_2, \dots$ are the $W$-orthonormal eigenvectors of the covariance matrix
$$ \Gamma_n W = \frac{1}{T} \sum_{t=1}^{T} X_t^{\top} X_t W, $$
where $W = \mathrm{diag}(w_1, w_2, \dots, w_D)$ is the diagonal matrix with the quadrature weights on the main diagonal. In order to estimate $q$, we apply the cross-validation procedure described in (4) for $q = 1, 2, 3$ and choose the pair $(q, b_{opt})$ that produces the smallest value in (5).
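To make the empirical PCA semi-metric and the one-step-ahead forecast more concrete, here is a hedged NumPy sketch. It assumes unit quadrature weights by default, a quadratic kernel on [0, 1], and simulated cumulative-return curves; the function names are hypothetical. The W-orthonormal eigenvectors are obtained from the symmetrized matrix W^{1/2} Γ_n W^{1/2}, which is one standard way to solve the eigenproblem and not necessarily the route taken in the paper.

```python
import numpy as np

def pca_semimetric_matrix(curves, q, weights=None):
    """Pairwise PCA semi-metric between the rows of `curves` (T x D)."""
    X = np.asarray(curves, dtype=float)
    T, D = X.shape
    w = np.ones(D) if weights is None else np.asarray(weights, dtype=float)
    sqrt_w = np.sqrt(w)
    gamma = (X.T @ X) / T                             # empirical second-moment matrix
    sym = sqrt_w[:, None] * gamma * sqrt_w[None, :]   # W^{1/2} Gamma_n W^{1/2}
    _, eigvecs = np.linalg.eigh(sym)                  # eigenvalues in ascending order
    top = eigvecs[:, ::-1][:, :q]                     # q leading eigenvectors
    v = top / sqrt_w[:, None]                         # W-orthonormal: v' W v = I
    scores = (X * w) @ v                              # sum_j w_j x(j) [v_k]_j
    diff = scores[:, None, :] - scores[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2))

def npfda_forecast(curves, q, b):
    """Forecast of the next month's last-day value given the latest curve X_T."""
    X = np.asarray(curves, dtype=float)
    d = pca_semimetric_matrix(X, q)
    y_next = X[1:, -1]                     # y_{t+1} = x_{t+1}(D), t = 1..T-1
    u = d[-1, :-1] / b                     # scaled distances from X_T to X_1..X_{T-1}
    k = np.where(u <= 1.0, 1.0 - u ** 2, 0.0)
    return float(k @ y_next / k.sum()) if k.sum() > 0 else float("nan")

# Simulated data: 30 hypothetical months of 20 daily cumulative returns.
rng = np.random.default_rng(0)
X = rng.normal(0.0, 0.01, size=(30, 20)).cumsum(axis=1)
d2 = pca_semimetric_matrix(X, q=2)
forecast = npfda_forecast(X, q=2, b=np.median(d2[-1, :-1]))
```

Two sanity checks follow from the definitions: with q = D and unit weights the semi-metric reduces to the Euclidean distance, and with a very large bandwidth all weights become equal, so the forecast collapses to the plain average of the responses.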

Predictive Regressions
Following the literature on excess return predictability, we consider the standard multivariate regression model for predicting excess returns, which can be expressed as
$$ r_{t+1} = \beta_0 + \beta_1^{\top} Z_t + \varepsilon_{t+1}, \qquad (13) $$
where $r_{t+1}$ is the stock market index log excess return from period t to t + 1, $Z_t$ is a vector of $n$ predictive variables available at the end of period t, $\beta_0$ is the conditional average excess return, $\beta_1$ is the incremental expected excess return with respect to a one-unit change in the predictor variables $Z_t$, $\varepsilon_{t+1}$ is the regression residual, which is assumed to follow a standard normal distribution, and $n$ is the number of predictors. We then divide the total sample of T observations for $r_t$ and $Z_t$ into an in-sample period that comprises the first p observations and an out-of-sample period comprising the last q observations (q = T − p). As a simple no-predictability benchmark, we use the naive prevailing-mean and variance model, as suggested by [3], which can be expressed as
$$ \bar{r}_{t+1|t} = \frac{1}{t} \sum_{s=1}^{t} r_s, $$
namely, the constant mean and volatility model. Note that the prevailing-mean forecast ignores information in any predictor variable; it is simply the historical average excess return, which is calculated over all prior $r_s$. This is equivalent to restricting $\beta_1 = 0$ in (13).
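The two competing forecasts can be illustrated with a short sketch. It covers only the univariate case (a single predictor), uses closed-form OLS, and relies on made-up numbers; the function names are hypothetical.

```python
def ols_forecast(r, z):
    """One-step-ahead forecast from r_{s+1} = b0 + b1 * z_s + e_{s+1}.
    r[s] and z[s] are both observed at the end of period s."""
    y = r[1:]           # r_2, ..., r_t (regressand)
    x = z[:-1]          # z_1, ..., z_{t-1} (lagged predictor)
    n = len(y)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0 + b1 * z[-1]      # plug in the latest predictor value z_t

def prevailing_mean_forecast(r):
    """Historical average of all returns observed so far (beta_1 = 0)."""
    return sum(r) / len(r)

# Made-up data in which r_{s+1} = 0.5 + 2 * z_s holds exactly.
z = [1.0, 2.0, 3.0, 4.0, 5.0]
r = [0.0, 2.5, 4.5, 6.5, 8.5]
regression_forecast = ols_forecast(r, z)
benchmark_forecast = prevailing_mean_forecast(r)
```

In an out-of-sample exercise both functions would be called repeatedly, each time on the window of data available at the forecast date.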

Data and Results
In this section, we first describe the predictor variables that are used in the literature and the details of their data. Subsequently, we describe the methodology used to evaluate the excess return forecasts that were obtained from the NP-FDA and regression based models. Furthermore, we present the statistically based and then the economic value assessments of the forecasts based on the mean-variance optimizing investor with quadratic utility.

Data and Traditional Predictors
We examined the performance of the NP-FDA and the competing models discussed in Section 2 in forecasting excess returns on the DJIA and S&P500 indices. To relate our findings to the extensive literature on excess market return predictability, we compare the predictive accuracy of NP-FDA to that of twelve predictor variables suggested by [2] (the data on stock returns and the data used to construct the popular predictors are downloaded from the website of Amit Goyal: http://www.hec.unil.ch/agoyal/). Market excess returns are computed from the DJIA and S&P500 indexes (including dividends). In order to capture excess returns, a short rate (Treasury bill) is subtracted from the returns, as in Equation (1). Below, we provide a list of the predictors that constitute the set of variables used to predict the excess return in (13):
• Dividend-price ratio (DP): the difference between the log of a twelve-month moving sum of dividends paid on the S&P500 index and the log of stock prices.
Specifically, we estimate predictive regressions using updated monthly excess return data for DJIA and S&P500 from [2] and Kenneth French's Data Library (available at http://mba.tuck.dartmouth.edu/pages/faculty/ken.french/datalibrary.html). For the main results we start the sample in 1926:04 to account for the lagged predictors when estimating the predictive regressions. After accounting for the lagged predictors, the available estimation sample covers from 1926:04 to 2019:12 (1116 observations).
In addition, in the out-of-sample analysis with NP-FDA, we consider a longer DJIA return series, from 1885:01 to 2020:06 (1620 observations).

Dataset Used in NP-FDA Estimation
The PCA semi-metric requires a balanced dataset to be usable, as pointed out in Section 2.1. However, we have a sample of daily cumulative returns, and the number of business days varies from month to month, so we need to adjust the data in order to obtain a balanced dataset. We propose organizing these data in a matrix, denoted by X, in which each row represents a month and each column represents a day. We fix the number of days at 20. The first column receives the data of the first business day of each month in the sample; likewise, the last column is filled with the data of the last business day. The second and the penultimate columns receive the second and the penultimate business day data, respectively. We proceed in this way until the matrix is filled. Some months have more than 20 business days. In such cases, the remaining data are dropped. For months with fewer than 20 days, we apply linear interpolation in order to impute the missing data. The main justification for proceeding this way is that the beginning and end of each month carry more information about market movements.
It is important to highlight how the matrix X relates to Equation (11) in the NP-FDA estimation process. Let $X_1, X_2, \dots, X_T$ and $Y_1, Y_2, \dots, Y_{20}$ be the rows and the columns of X, respectively. Because we are interested in the one-step-ahead forecast, the NP-FDA regressor is the matrix formed by stacking $X_1, X_2, \dots, X_{T-1}$ and the regressand is the last column of X, i.e., $Y_{20}$, with its first element discarded.
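The balancing step can be sketched as follows. The text does not spell out exactly where the interpolated points go in short months, so this sketch assumes that the observed days are split between the two ends of the row and that the gap in the middle is filled by linear interpolation; for long months it keeps the first and last ten days and drops the middle. The function name and the toy months are hypothetical.

```python
def balance_month(values, n_days=20):
    """Map one month of daily cumulative returns onto exactly n_days columns."""
    values = list(values)
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations per month")
    half = n_days // 2
    if n >= n_days:
        # Fill from both ends and drop the surplus days in the middle.
        return values[:half] + values[n - (n_days - half):]
    # Short month: keep observed days at both ends, interpolate the middle gap.
    front = values[:(n + 1) // 2]
    back = values[(n + 1) // 2:]
    gap = n_days - n
    step = (back[0] - front[-1]) / (gap + 1)
    filled = [front[-1] + step * k for k in range(1, gap + 1)]
    return front + filled + back

# Toy months with 22 and 18 business days (values are placeholders).
long_month = balance_month(range(22))
short_month = balance_month(range(18))

# Regressor and regressand for the one-step-ahead problem.
X = [long_month, short_month, balance_month(range(20))]
regressor = X[:-1]                          # rows X_1, ..., X_{T-1}
regressand = [row[-1] for row in X[1:]]     # last column without its first element
```

Other imputation layouts are possible; what matters for the PCA semi-metric is only that every row ends up with the same number of columns.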

Forecast Combination
In addition to individual forecasting models, we consider another approach to improve the excess return forecasts based on forecast combination. The motivation for doing this is in the methodological literature on forecasting, which shows that more accurate predictions can be obtained from a linear combination of two or more forecasts in relation to the use of just one forecast [25][26][27]. In addition, adaptive strategies to combine predictions can also alleviate the effect of structural breaks, model uncertainty, and incorrect model specification [26,28]. In particular, there is recent evidence that the combination of nested models can significantly improve the accuracy of the forecasts when compared to predictions obtained from single model specifications [29].
Assuming M different models, a combined forecast for the one-step-ahead return is given by
$$ \hat{r}^{c}_{t+1|t} = \sum_{m=1}^{M} w_{t+1|t,m} \, \hat{r}_{t+1|t,m}, $$
where $w_{t+1|t,m}$ stands for the time-$t$ weight assigned to the forecast of the $m$th model, $\hat{r}_{t+1|t,m}$. The forecast combination strategies are mostly adaptive, which means that the forecasts included in $\mathcal{M} = \{\hat{r}_{t+1|t,m}, m = 1, 2, \dots, M\}$ and/or the corresponding weights $w_{t+1|t,m}$ are selected in a sub-sample of observations, based on some criteria. In this article, the following four combination strategies are considered:
1. Equally weighted forecasts or pooled (POOL-AVG): this forecast combination method assigns equal weights to the forecasts of all individual models, i.e., $w_{t+1|t,m} = 1/M$ for $m = 1, \dots, M$. This approach is likely to work well if the forecasting errors of different models have similar variances and are highly correlated, as explained in [30]. Therefore, in many cases, this simple average of forecasts can work well against more sophisticated weighting schemes [26,29].
2. Thick Modeling Approach with MSFE (POOL-DMSFE): the second scheme consists of selecting models by means of a thick modeling approach. Following [31], the weight for model m is computed as:
$$ w_{t+1|t,m} = \frac{\phi_{t,m}^{-1}}{\sum_{k=1}^{M} \phi_{t,k}^{-1}}, \qquad \phi_{t,m} = \sum_{s=j+1}^{t} \theta^{\,t-s} \left(r_s - \hat{r}_{s|s-1,m}\right)^2, $$
where θ is a discount factor. Thus, the DMSFE approach assigns greater weights at time t + 1 to individual predictive models with lower MSFE values over the holdout out-of-sample period, which ranges from j + 1 to t.
3. Diffusion Index: this scheme involves the estimation of factors that are subsequently used for forecasting. The idea here is to extract a small number of common factors (often called diffusion indexes) assumed to drive the dynamics associated with a large number of potential return predictors (see, e.g., [9]). The basic intuition behind this approach is to filter out the noise present in the individual predictors, as discussed in [1]. The resulting factor structure is more parsimonious, thus generating a more reliable signal.
4. Sum-of-the-Parts Method: the fourth combination scheme is the sum-of-the-parts, which is in line with the ideas that are presented in [4]. The sum-of-the-parts scheme consists of decomposing the return index into three components: the dividend yield, the earnings growth rate, and the growth rate in the price-earnings ratio. Subsequently, each of these components is predicted separately. Ref. [4] show that their sum-of-the-parts forecast scheme significantly outperforms the historical average forecast.
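The first two combination schemes lend themselves to a compact illustration. The sketch below assumes the usual discounted-MSFE form, in which each model's weight is proportional to the inverse of its discounted sum of squared holdout errors; the default discount factor of 0.9, the function names, and the numbers are hypothetical choices for the example.

```python
def pool_avg(forecasts):
    """Equally weighted combination (POOL-AVG)."""
    return sum(forecasts) / len(forecasts)

def dmsfe_weights(past_forecasts, past_returns, theta=0.9):
    """Inverse discounted-MSFE weights.
    past_forecasts[m][s] is model m's forecast for holdout period s and
    past_returns[s] is the realized return; the last period is discounted least.
    A model with zero holdout error would need special handling (division by zero)."""
    n = len(past_returns)
    inverse_phi = []
    for model_forecasts in past_forecasts:
        phi = sum(theta ** (n - 1 - s) * (past_returns[s] - model_forecasts[s]) ** 2
                  for s in range(n))
        inverse_phi.append(1.0 / phi)
    total = sum(inverse_phi)
    return [v / total for v in inverse_phi]

def combine(forecasts, weights):
    """Weighted sum of the individual one-step-ahead forecasts."""
    return sum(w * f for w, f in zip(weights, forecasts))

# Two hypothetical models: B's holdout errors are twice as large as A's.
past_returns = [0.0, 0.0, 0.0]
past_forecasts = [[0.1, 0.1, 0.1],    # model A
                  [0.2, 0.2, 0.2]]    # model B
weights = dmsfe_weights(past_forecasts, past_returns)
new_forecasts = [0.010, 0.020]
combined_dmsfe = combine(new_forecasts, weights)
combined_avg = pool_avg(new_forecasts)
```

Because squared errors enter the weights, doubling a model's errors cuts its relative weight by a factor of four, which is why model A receives 80% of the weight here.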

Forecast Evaluation
Our out-of-sample procedure mimics the situation faced by real-time forecasters. Forecasts from NP-FDA and regression models are generated using only information available at period t. To compute the results, we use a rolling window estimation of 360 monthly observations (30 years). (To save space, we do not present the results of the in-sample analysis. The results are available upon request.) Therefore, our out-of-sample period for the forecast evaluation ranges from January 1957 to December 2019. Most of our results refer to the full sample and two subsamples, one that focuses on economic expansions and the other on recessions. These periods are defined according to the National Bureau of Economic Research (NBER) business cycle dating committee methodology. Thus, a recession is the period following the peak of economic activity until the trough. First, we assess statistical and economic predictability using individual models. Subsequently, we use the combination schemes that are presented above.
We follow the literature and compute the root mean squared forecast error (RMSFE) in order to evaluate out-of-sample forecasts. Given a sample of P out-of-sample forecasts for the 1-step-ahead forecast horizon, the RMSFE for model m is defined as follows:
$$ RMSFE_m = \sqrt{\frac{1}{P} \sum_{t=1}^{P} \left(r_{t+1} - \hat{r}_{t+1|t,m}\right)^2}, $$
where $r_{t+1}$ is the observed return at time t + 1, and $\hat{r}_{t+1|t,m}$ is the corresponding forecast made at time t.
In addition, we follow the literature in using the out-of-sample $R^2$ ($R^2_{OOS}$) in order to evaluate the forecasting performance. The $R^2_{OOS}$ compares the unconditional one-month-ahead forecasts of the prevailing-mean benchmark, $\hat{r}_{t+1|t,bench}$, to the conditional forecasts, $\hat{r}_{t+1|t,m}$, of an alternative model, and it is defined as follows:
$$ R^2_{OOS} = 1 - \frac{\sum_{t=1}^{P} \left(r_{t+1} - \hat{r}_{t+1|t,m}\right)^2}{\sum_{t=1}^{P} \left(r_{t+1} - \hat{r}_{t+1|t,bench}\right)^2}. $$
A positive $R^2_{OOS}$ means that the alternative model presents a lower MSFE than the benchmark model. We implement the test proposed by [32] in order to assess the statistical significance of the $R^2_{OOS}$. In this test, the null hypothesis is that the benchmark is not outperformed by the competing model in terms of forecast MSFE, against the complementary alternative hypothesis. The statistic is calculated by first defining
$$ f_{t+1} = \left(r_{t+1} - \hat{r}_{t+1|t,bench}\right)^2 - \left[\left(r_{t+1} - \hat{r}_{t+1|t,m}\right)^2 - \left(\hat{r}_{t+1|t,bench} - \hat{r}_{t+1|t,m}\right)^2\right]. $$
After regressing $\{f_{t+1}\}_{t=1}^{P}$ on a constant, the Clark and West statistic is the t-statistic of the constant. The p-value for the test may be obtained from the standard normal distribution.
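The three statistical measures can be computed with the standard library alone. The sketch below is a minimal illustration with made-up numbers: it uses the plain t-statistic of the mean (no autocorrelation correction, which is an assumption that is reasonable only for one-step-ahead forecasts) and a one-sided p-value from the standard normal distribution.

```python
import math
from statistics import NormalDist, mean, stdev

def rmsfe(actual, forecast):
    """Root mean squared forecast error."""
    return math.sqrt(mean([(a - f) ** 2 for a, f in zip(actual, forecast)]))

def r2_oos(actual, model, bench):
    """Out-of-sample R^2 of `model` relative to the benchmark forecasts."""
    sse_model = sum((a - f) ** 2 for a, f in zip(actual, model))
    sse_bench = sum((a - f) ** 2 for a, f in zip(actual, bench))
    return 1.0 - sse_model / sse_bench

def clark_west(actual, model, bench):
    """MSFE-adjusted statistic: t-statistic of the mean of f_{t+1}."""
    f = [(a - b) ** 2 - ((a - m) ** 2 - (b - m) ** 2)
         for a, m, b in zip(actual, model, bench)]
    t_stat = mean(f) / (stdev(f) / math.sqrt(len(f)))
    p_value = 1.0 - NormalDist().cdf(t_stat)     # one-sided (upper tail)
    return t_stat, p_value

# Made-up monthly excess returns and forecasts.
actual = [0.02, -0.01, 0.03, 0.00, 0.01, -0.02]
bench = [0.005] * 6
model = [0.015, -0.005, 0.02, 0.002, 0.008, -0.01]

relative_rmsfe = (rmsfe(actual, model) - rmsfe(actual, bench)) / rmsfe(actual, bench)
r2 = r2_oos(actual, model, bench)
t_stat, p_value = clark_west(actual, model, bench)
```

A negative relative RMSFE and a positive out-of-sample R^2 both indicate that the model beats the benchmark; the adjustment term in f removes the noise penalty that the larger model pays under the null.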

Out-of-Sample Excess Returns Predictability Results
Tables 1 and 2 report the statistical measures of the out-of-sample forecasting performance from the NP-FDA, individual regression models, and four combination schemes for S&P500 and DJIA excess returns. All of the models are evaluated relative to the prevailing-mean benchmark. The out-of-sample forecasts are generated by rolling predictive regressions. The period 1926:12-1956:12 is considered as the initial in-sample estimation period. Hence, we compute out-of-sample forecasts for 1957:01-2019:12 (756 observations). In addition to the full forecast evaluation period, we present results that were computed separately during NBER-dated business-cycle expansions and recessions.
The first column of Tables 1 and 2 reports the RMSFE relative to the benchmark prevailing-mean model, i.e., $(RMSFE_{model} - RMSFE_{bench})/RMSFE_{bench}$. Therefore, negative entries indicate that the candidate model outperforms the benchmark model in terms of RMSFE, thus generating more accurate point forecasts, while positive values indicate the opposite. The evidence found is consistent with [2]: very few of the univariate models beat the historical average benchmark in terms of RMSFE. In fact, for S&P500, only the NP-FDA and SVAR models produce lower RMSFE values than the benchmark model when we consider the entire out-of-sample period and individual models. For DJIA, seven predictors and the NP-FDA produce lower RMSFE values than the prevailing-mean model for the entire out-of-sample period.
In Table 1, for the S&P500 excess return forecasts, none of the individual predictors generates a significantly positive $R^2_{OOS}$ in the three sample periods, except NP-FDA. On the other hand, all of the combining methods exhibit positive $R^2_{OOS}$ (ranging from 0.21% to 0.48%) in the full sample period. Nevertheless, no combining approach generates a significantly positive $R^2_{OOS}$ (at the 10% level) in all considered sub-samples. The superiority of the NP-FDA forecast is further confirmed by the fact that it outperforms the historical average forecast in the full sample period, expansions, and recessions. Similarly, Table 2 (for the DJIA excess returns) shows that the individual predictors also fail to deliver significantly positive $R^2_{OOS}$, except TBL, LTY, and LTR in the full sample period. On the other hand, the NP-FDA method consistently generates significantly superior performance compared to the historical average premium forecast. In addition to NP-FDA, for the DJIA excess returns, all of the combination forecast strategies consistently generate performance that is significantly superior to the historical average.
Overall, we find that the NP-FDA exhibits a positive and statistically significant $R^2_{OOS}$ for both indices. The result is particularly strong over the entire out-of-sample period, where the NP-FDA produces $R^2_{OOS}$ of 0.37% and 0.57% for S&P500 and DJIA, respectively. Among the other predictors, we highlight the performance of TBL, LTY, and LTR. For both indices, we find that 7 of the 14 predictors result in negative $R^2_{OOS}$ values, suggesting underperformance relative to the benchmark model. These results are consistent with the findings of [2] and more recent literature ([1,33,34], among others) that it is hard to find an individual variable that can significantly beat the historical average model. It is worth noting that predictive regressions typically have a very low $R^2$. A monthly $R^2$ statistic of even 0.5 percent can yield an economically significant result (in terms of utility gain) in a return predictability study [3]. The last four rows in each table show the results for combination schemes. Tables 1 and 2 also report statistics computed separately during NBER expansions and recessions. The most important finding is that NP-FDA performs well in expansions and recessions, exhibiting positive and statistically significant $R^2_{OOS}$ for all sub-sample periods. In contrast, none of the 14 predictors performs well in both states of the economy for both indices. When we consider the other economic fundamentals, the results change considerably between periods of recession and expansion. In fact, most of the predictors produce better results in periods of expansion than in periods of recession.
Recent literature (e.g., [1,16,34]) shows that individual models have performed poorly in forecasting stock returns because of model uncertainty and instability. Ref. [16] finds that combining individual forecasts provides convincing predictability performance out-of-sample. Inspired by their paper, we examined whether forecast combinations while using NP-FDA and traditional predictor models can lead to better performance when compared to univariate counterparts.
We consider four combination strategies, which differ in the weighting scheme, as discussed in Section 3.3 above. We find that, for the entire out-of-sample period and for both indices, the combination strategies under consideration produce $R^2_{OOS}$ values that range from 0.18% to 0.50%, and they are statistically significant at the 10% level. These results are consistent with the findings of [7,16,34], among others. Interestingly, we find that the Sum-of-the-Parts scheme results in higher $R^2_{OOS}$ during expansion periods, but deteriorates in NBER recessions. Although the relative RMSFE and $R^2_{OOS}$ provide statistical measures of excess return predictability, they do not take account of an investor's risk during the out-of-sample period. $R^2_{OOS}$ values are typically small; nevertheless, a modest forecasting ability may yield substantial utility gains for risk-averse investors [3]. In the next section, we use a utility gain measurement to assess the economic significance of return predictability.

Economic-Based Forecast Evaluation
Although our analysis is focused on statistical measures of predictive accuracy, it is important to evaluate the extent to which apparent gains in predictive accuracy can be used in real time to improve the economic utility of the investor, that is, translate into better investment performance. Given that statistical significance does not necessarily imply economic significance ([1,5,33,34], among others), we assess the economic value of the predictive power of stock returns by investigating the utility gains for investors who exploit the predictability of excess returns, relative to an alternative without predictability associated with the prevailing-mean model. The motivation here is that investors are primarily concerned with the performance of return forecasts in terms of asset allocation.
We consider a mean-variance investor with quadratic utility who splits her portfolio between the stock market index and risk-free bills [1]. At the end of t, she chooses to invest, during t + 1, the following share of her portfolio in stocks:
$$ w_{m,t} = \frac{1}{\gamma} \, \frac{\hat{r}_{t+1|t,m}}{\hat{\sigma}^2_{t+1}}, $$
where γ is the coefficient of relative risk aversion and $\hat{\sigma}^2_{t+1}$ is a forecast of the variance of the index returns. Similar to [16], we simply estimate the variance using a 10-year rolling window of quarterly returns. Over the forecast evaluation period, the investor realizes the average utility
$$ \hat{\nu}_m = \hat{\mu}_m - \frac{\gamma}{2} \hat{\sigma}^2_m, \qquad (19) $$
where $\hat{\mu}_m$ and $\hat{\sigma}^2_m$ are, respectively, the out-of-sample mean and variance of the return on the dynamic portfolio formed on the basis of $\hat{r}_{t+1|t,m}$ and $\hat{\sigma}^2_{t+1}$ over the forecast evaluation period for each model m. The quantity defined in Equation (19) may be calculated for the prevailing-mean model as well ($\hat{\nu}_{bench}$). The difference $\hat{\nu}_m - \hat{\nu}_{bench}$ represents the utility gain of using the competitor model m to forecast the excess returns in place of the prevailing-mean forecast in the asset allocation decision. This utility gain is the certainty equivalent return, which may be viewed as the amount an investor would be willing to pay to have access to the return forecasts generated using the model m instead of using the information in the historical average.
We compute the utility for the same investor who uses the historical average excess return forecast. We measure the utility gain (or certainty equivalent return) as the extra utility that is generated from Equation (19) relative to this benchmark. This difference is multiplied by 1200 to express the utility gain as an average annualized percentage return. Table 3 reports the certainty equivalent gains (in annualized percent return) for a mean-variance investor with γ = 5 who allocates between stocks and risk-free bills using forecasts that are based on the competitor models in place of historical average forecasts.
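The asset allocation exercise can be sketched as follows. The example assumes the standard mean-variance weight (forecast divided by γ times the variance forecast), builds the portfolio return as the risk-free rate plus the weight times the realized excess return, and imposes no bounds on the weights; whether the paper restricts the weights is not stated here. All numbers and function names are hypothetical.

```python
from statistics import mean, variance

def portfolio_weight(forecast, var_forecast, gamma=5.0):
    """Share of the portfolio allocated to stocks for the next period."""
    return forecast / (gamma * var_forecast)

def certainty_equivalent(portfolio_returns, gamma=5.0):
    """Average utility: mean minus (gamma / 2) times the sample variance."""
    return mean(portfolio_returns) - 0.5 * gamma * variance(portfolio_returns)

def utility_gain(model_returns, bench_returns, gamma=5.0):
    """Annualized percentage gain for monthly returns (difference times 1200)."""
    return 1200.0 * (certainty_equivalent(model_returns, gamma)
                     - certainty_equivalent(bench_returns, gamma))

def realized_returns(forecasts, var_forecasts, excess, risk_free, gamma=5.0):
    """Portfolio return each month: r_f + w * realized excess return."""
    return [rf + portfolio_weight(f, v, gamma) * e
            for f, v, e, rf in zip(forecasts, var_forecasts, excess, risk_free)]

# Made-up monthly data (decimals, not percent).
excess = [0.02, -0.01, 0.03, 0.00]
risk_free = [0.003] * 4
var_fc = [0.002] * 4
model_fc = [0.010, 0.000, 0.012, 0.004]
bench_fc = [0.005] * 4

model_port = realized_returns(model_fc, var_fc, excess, risk_free)
bench_port = realized_returns(bench_fc, var_fc, excess, risk_free)
gain = utility_gain(model_port, bench_port)
```

A positive value of `gain` would be read as the annual fee, in percent, that the investor would accept to switch from the benchmark forecasts to the model forecasts.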
The results indicate that NP-FDA is the only model that provides economic value relative to the prevailing-mean benchmark for all considered datasets and for both market indices. Specifically, over the entire out-of-sample period, the NP-FDA approach provides annualized gains above 0.7% and 0.3% for the S&P500 and DJIA, respectively. In other words, the investor would be willing to pay more than 0.7% and 0.3% per year, respectively, to have access to the NP-FDA forecasts instead of the historical average forecasts. The certainty equivalent gains of the popular predictor models show only moderate economic performance relative to the prevailing-mean model. Indeed, only 5 of the 14 univariate models produce a positive ∆ for S&P500 excess returns, that is, a predictive regression forecast with a higher certainty equivalent than the historical average. The performance for the DJIA index is even worse. Overall, the average utility gains reported here support the NP-FDA approach for forecasting stock index excess returns. It is important to emphasize the need to supplement standard statistical criteria with more direct economic-based measures when analyzing out-of-sample return predictability.
Note: This table presents the average utility gain (∆), that is, the portfolio management fee (in annualized percentage return) that an investor would be willing to pay to have access to the NP-FDA forecasts and to the predictive regression forecasts based on the economic variables given in the first column, relative to the prevailing-mean benchmark forecast. All results are out-of-sample and were calculated using a rolling window estimation of the most recent 30 years of observations (1957:01-2019:12).
As discussed earlier, forecast combinations improve the forecast results by weighting the individual regression forecasts, which yields a lower variance than that of the individual forecasts. The utility gains in the certainty equivalent, ∆ (annual %), are presented in the last four rows of Table 3 for the S&P500 and DJIA. These utility gains provide economic evidence of stock return predictability for the combination forecast models. Specifically, we find consistent evidence that incorporating NP-FDA improves forecast combination performance, from both the statistical and economic perspectives, mainly for S&P500 excess returns. The best performance is found for the diffusion index scheme, for which the utility gains exceed 230 basis points in all of the considered time periods. This finding on return predictability based on forecast combination is consistent with the results in [7,16,34], among others. Interestingly, for the S&P500 index, the simple equally weighted forecast combination (POOL-AVG) delivers a ∆ (annual %) of 2.62%, which is considerably higher than the ∆ (annual %) of the best individual model.
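As an illustration of the simplest scheme mentioned above, the equally weighted combination (POOL-AVG) averages the individual forecasts period by period. The sketch below assumes the forecasts are stored in a two-dimensional array with one row per out-of-sample period and one column per individual model; the function name and layout are hypothetical, and the other combination schemes (such as the diffusion index) are not covered.

```python
import numpy as np

def pool_avg(forecasts):
    """Equally weighted forecast combination.
    forecasts: array of shape (n_periods, n_models).
    Returns one combined forecast per period."""
    forecasts = np.asarray(forecasts, dtype=float)
    return forecasts.mean(axis=1)
```

Adding the NP-FDA forecasts to the pool then amounts to appending one more column to the array before averaging.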
Our results show that the performance of some of the predictor models is quite similar under the $R^2_{OOS}$ and utility gain metrics. For example, for the S&P500 index, the NP-FDA approach has an $R^2_{OOS}$ of 0.37 percent and a utility gain of 0.98 percent. The prior literature suggests a lack of association between utility gain and $R^2_{OOS}$ [1] and shows that different performance metrics can lead researchers to draw different conclusions regarding the forecasting ability of the predictor variables.
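For reference, the statistical metric can be sketched as follows. This assumes the usual out-of-sample R-squared definition, one minus the ratio of the model's sum of squared forecast errors to that of the prevailing-mean benchmark, expressed in percent; the function and argument names are hypothetical.

```python
import numpy as np

def r2_oos(realized, model_forecast, bench_forecast):
    """Out-of-sample R-squared in percent (assumed standard definition):
    100 * (1 - SSE_model / SSE_benchmark)."""
    realized = np.asarray(realized, dtype=float)
    sse_model = np.sum((realized - np.asarray(model_forecast, dtype=float)) ** 2)
    sse_bench = np.sum((realized - np.asarray(bench_forecast, dtype=float)) ** 2)
    return 100.0 * (1.0 - sse_model / sse_bench)
```

A positive value means the model's forecasts have a lower squared error than the benchmark. Because the utility gain also depends on the variance forecast and on γ, the two metrics need not rank models in the same order.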

Robustness Check
In addition to evaluating the forecast performance of the NP-FDA approach in both NBER expansions and recessions, we extend our analysis to a longer DJIA dataset, covering 1885:01-2020:06. As in the previous analysis, we use a 30-year rolling window of the data, generating 1252 out-of-sample forecasts for the period 1915:01-2020:06. Table 4 reports the same statistics that were considered before. In this case, we compare the NP-FDA forecasts with those of the prevailing-mean model. Once again, we report results for the full sample as well as for NBER-dated expansions and recessions. In the expansion periods, NP-FDA exhibits a positive and statistically significant $R^2_{OOS}$ of 0.48%. However, the performance deteriorates in recession periods. For the full sample period, the NP-FDA forecasts beat the historical mean, both statistically and economically. More importantly, given that investors are primarily concerned with the performance of return forecasts in terms of asset allocation, the NP-FDA forecasts outperform the prevailing-mean model in economic terms in all time periods considered. In summary, this result corroborates the previous finding that NP-FDA performs well when used to forecast excess stock returns.

Conclusions
This study provides a comprehensive examination of the out-of-sample predictability of excess returns for both the S&P500 and DJIA indices using the NP-FDA approach, a variety of individual regression models, and combination forecast models. Consistent with the previous literature, our results reveal that it is difficult to find an individual predictor that consistently outperforms the historical average benchmark and provides significant out-of-sample forecasts of the equity premium. Nevertheless, NP-FDA tends to generate significantly positive $R^2_{OOS}$ for both indices and for different out-of-sample periods. Assuming that daily cumulative returns can be interpreted as curves describing the cumulative return during a given month, we propose predicting the S&P500 and DJIA excess returns out of sample based on the NP-FDA estimator. This estimator may be viewed as a weighted mean of monthly stock excess returns, in which the weights are defined by a measure of proximity between those curves. In addition, we propose combining the forecasts generated by NP-FDA with those of the traditional predictor variables. We find that NP-FDA produces excess return forecasts that are statistically and economically superior to those of the benchmark prevailing-mean model over our out-of-sample period, 1957:01-2019:12. Indeed, our results show that NP-FDA is the only individual model that outperforms the historical average forecasts of excess returns in a statistically and economically significant manner for both the S&P500 and DJIA over the entire period as well as over NBER recession and expansion periods. Furthermore, NP-FDA provides, to a large extent, additional relevant information for predicting stock returns beyond that contained in the popular predictor variables.
Our results show that adding NP-FDA forecasts to the traditional univariate predictive regressions with popular predictors significantly improves forecast combination performance under different schemes, in terms of both conventional statistical measures of forecast performance and economic gains. Economic gains are of particular importance from the perspective of investors, and our NP-FDA approach is of considerable value in this regard when compared with the methods adopted in the existing literature so far.
As part of future analysis, it would be interesting to extend our analysis to other developed and emerging market economies in order to confirm the superiority of the NP-FDA approach in a robust manner.