Prediction of Stock Returns: Sum-of-the-Parts Method and Economic Constraint Method

Forecasting stock market returns is of great significance for asset allocation, risk management, and asset pricing, but stock returns are notoriously difficult to predict. In this paper, we combine the sum-of-the-parts (SOP) method with three economic constraint methods (a non-negative economic constraint strategy, a momentum of return prediction strategy, and a three-sigma strategy) to improve the prediction of stock returns. The constraints are applied to the forecast of the price-earnings ratio growth rate (gm). Empirical results suggest that the stock return forecasts from the proposed models are both statistically and economically significant, and that these forecasts hold up across various robustness tests.


Introduction
The prediction of stock returns has long been a concern of scholars and investors, as it affects fundamental aspects of capital budgeting and investment decisions [1]. Nevertheless, stock returns are difficult to predict. Goyal and Welch [2] argue that predictive models based on a long list of economic variables fail to outperform the historical average out of sample, even though these models of interest perform better than the benchmark in sample. Hence, researchers have adopted a range of predictors, such as the dividend yield [3], inflation [4], oil price increases [5], and technical indicators [6,7]. Other researchers aim to obtain more accurate stock return forecasts through improved methods, for example, the non-negative economic constraint (CT) [8], the sum-of-the-parts (SOP) method [9], combination forecasts [10,11], the conditional Sharpe ratio [12], the momentum of return prediction (MOP) [13], and the three-sigma rule (three-sigma) [14].
Although Ferreira and Santa-Clara [9] show that the SOP method improves overall prediction accuracy and delivers better out-of-sample results, the predictive power for the price-earnings ratio growth rate (gm) component remains weak. To address this weakness of the SOP method, we combine SOP [9] with three economic constraint methods from Campbell and Thompson [8], Wang et al. [13], and Zhang et al. [14] to forecast stock returns.
Specifically, the SOP model divides the stock return into three parts. Each part is predicted separately, and the three predicted values are then added up to obtain the stock return forecast. Moreover, constraints can be imposed on the coefficients of the predictive model during estimation. Going beyond the simple SOP method, we also take into account the influence of 14 economic variables [9] on one of the SOP components. In other words, we first decompose the stock return into three parts: the earnings growth rate (ge), the dividend-price ratio (dp), and the price-earnings ratio growth rate (gm). For dp, we use its current value as the forecast because of its high persistence. For ge, we use its 20-year moving average [9] as the forecast. These forecasting methods for the first two components follow the previous literature. For the third component, gm, we forecast it with 14 economic variables instead of setting it to 0, as in the simple version of SOP. Notably, we restrict the parameters when regressing gm on the macroeconomic variables in order to improve prediction performance.
After obtaining parameter-constrained forecasts of gm, we adopt three strategies to obtain the final gm forecast. The first is the CT strategy [8]: we keep only the positive part of the gm forecast (negative forecasts are set to 0), because we prefer positive stock return forecasts. To reduce the impact of extreme values, we also use a new three-sigma strategy that differs slightly from that of Zhang et al. [14]: when the gm forecast for month $t+1$ falls outside the three-sigma range, we use $\hat{gm}_t$; otherwise we use the CT-based forecast $\hat{gm}^{CT}_{t+1}$. Finally, we use the MOP strategy [13], which exploits a momentum effect: some attribute of the past period carries over to the present. We argue that gm may exhibit such a momentum effect because it is a component of the stock return, and Wang et al. [13] show that accounting for momentum clearly improves stock return predictability. We thus obtain three gm forecasts based on different strategies. We call the resulting methods constrained SOP, because each can be interpreted as imposing a constraint on the forecasts of gm.
In the empirical analysis, our sample spans December 1927 to December 2018. We use a recursive (expanding) window to generate out-of-sample stock return forecasts. Following Binsbergen and Koijen [15], Paye [16], and Rapach et al. [17], we evaluate prediction performance with the monthly out-of-sample $R^2$ ($R^2_{OS}$), defined as the percentage reduction in mean squared prediction error (MSPE) relative to the benchmark, and assess statistical significance with the p-value of the Clark and West [18] statistic. The out-of-sample results show that the SOP models with CT, three-sigma, and MOP constraints increase the average $R^2_{OS}$ by 0.132%, 0.158%, and 0.401%, respectively, compared with the SOP model, which indicates that imposing such restrictions on the SOP model does improve stock return predictability. What is more, our new three-sigma SOP model yields the largest increase in average out-of-sample $R^2_{OS}$. To assess economic performance, we further calculate the certainty equivalent return (CER) gain following Rapach and Zhou [19]: the difference between the CER of a mean-variance investor with a relative risk aversion coefficient of five who uses an SOP or constrained-SOP forecast of the equity risk premium, and the CER of the same investor using the benchmark forecast. In the univariate analysis, the average CER gain increases from 0.688% for the SOP model to 0.799%, 0.891%, and 0.843% when the CT, MOP, and three-sigma constraints are imposed, respectively. This again indicates that combining SOP with the three constraints generally yields a moderate improvement in out-of-sample economic value.
In addition, when principal component predictive regression is used to forecast the price-earnings multiple growth rate, we obtain significant stock return predictability in both statistical and economic terms, which suggests that incorporating multivariate information helps improve out-of-sample performance.
The stock return predictability of the proposed models is robust to reasonable transaction costs and to alternative risk aversion coefficients, which further supports the advantages of our constrained-SOP models.
The rest of this paper is organized as follows. Section 2 introduces the methodology: the SOP model, the three constrained-SOP models, and the out-of-sample evaluation. Section 3 reports descriptive statistics for the monthly data. Section 4 discusses the out-of-sample empirical results, including statistical and economic performance. Section 5 presents robustness tests. Section 6 concludes the paper.

Return Decomposition
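Following Ferreira and Santa-Clara [9], we decomposed the stock return into three parts and obtained the stock return forecast by adding up the forecasts of the individual parts.
The stock return at month $t+1$, $R_{t+1}$, can be divided into the capital gain, $CG_{t+1}$, and the dividend yield, $DY_{t+1}$:

$$1 + R_{t+1} = 1 + CG_{t+1} + DY_{t+1} = \frac{P_{t+1}}{P_t} + \frac{D_{t+1}}{P_t}, \tag{1}$$

where $P_{t+1}$ is the stock price at time $t+1$ and $D_{t+1}$ is the dividend per share paid from time $t$ to $t+1$.
The capital gain can be written as

$$1 + CG_{t+1} = \frac{P_{t+1}}{P_t} = \frac{P_{t+1}/E_{t+1}}{P_t/E_t}\,\frac{E_{t+1}}{E_t} = \frac{M_{t+1}}{M_t}\,\frac{E_{t+1}}{E_t} = (1 + GM_{t+1})(1 + GE_{t+1}), \tag{2}$$

where $E_{t+1}$ denotes earnings per share, $M_{t+1} = P_{t+1}/E_{t+1}$ is the price-earnings multiple, $GM_{t+1}$ is the price-earnings multiple growth rate, and $GE_{t+1}$ is the earnings growth rate. Similarly, the dividend yield can be written as

$$DY_{t+1} = \frac{D_{t+1}}{P_t} = \frac{D_{t+1}}{P_{t+1}}\,\frac{P_{t+1}}{P_t} = DP_{t+1}(1 + GM_{t+1})(1 + GE_{t+1}), \tag{3}$$

where $DP_{t+1}$ is the dividend-price ratio. Substituting the capital gain and the dividend yield into Equation (1) gives

$$1 + R_{t+1} = (1 + GM_{t+1})(1 + GE_{t+1})(1 + DP_{t+1}). \tag{4}$$

Finally, taking logarithms of Equation (4) gives

$$r_{t+1} = \log(1 + R_{t+1}) = dp_{t+1} + ge_{t+1} + gm_{t+1}, \tag{5}$$

where lowercase letters denote rates in logarithmic form. For instance, $gm_{t+1}$ is the log of one plus the price-earnings multiple growth rate.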
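As a quick sanity check on Equations (1) to (5), the following Python sketch computes dp, ge, and gm for a single hypothetical month and confirms that they add up exactly to the log total return. The function name `sop_components` and the input numbers are illustrative only; they are not data from the paper.

```python
import math

def sop_components(p0, p1, e0, e1, d1):
    """Return (dp, ge, gm) in log form for one period, following Eqs. (1)-(5).

    p0, p1: prices at t and t+1; e0, e1: earnings per share at t and t+1;
    d1: dividend per share paid between t and t+1.
    """
    gm = math.log((p1 / e1) / (p0 / e0))  # log(1 + GM): growth of the P/E multiple
    ge = math.log(e1 / e0)                # log(1 + GE): earnings growth
    dp = math.log(1.0 + d1 / p1)          # log(1 + DP): dividend-price ratio at t+1
    return dp, ge, gm

# Hypothetical month: price 100 -> 103, EPS 5.0 -> 5.1, dividend 0.2
dp, ge, gm = sop_components(100.0, 103.0, 5.0, 5.1, 0.2)
r = math.log((103.0 + 0.2) / 100.0)       # log total return, log(1 + R)
print(abs(r - (dp + ge + gm)) < 1e-12)    # True: the identity in Eq. (5) holds
```

Because the earnings terms cancel in the product of Equation (4), the check holds for any positive inputs, up to floating-point rounding.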

The Sum-of-the-Parts Method
As shown in Section 2.1, we forecast the components of the stock return in Equation (5) separately:

$$E_t(r_{t+1}) = E_t(dp_{t+1}) + E_t(ge_{t+1}) + E_t(gm_{t+1}), \tag{6}$$

where $E_t(ge_{t+1})$ is the expected earnings growth, estimated with a 20-year moving average and regarded as the low-frequency predictable part, in line with Binsbergen and Koijen [15]. Following the analysis of Ferreira and Santa-Clara [9], the dividend-price ratio is assumed to follow a random walk, so its expectation $E_t(dp_{t+1})$ equals its current value. Following the extended version of the SOP method in Ferreira and Santa-Clara [9], we use 14 economic variables to predict $E_t(gm_{t+1})$, rather than setting it to zero as in the simple version of the SOP (see Ferreira and Santa-Clara [9]). The price-earnings multiple growth rate is predicted with the univariate predictive regression

$$gm_{t+1} = a_i + b_i x_{i,t} + \varepsilon_{i,t+1}, \tag{7}$$

where $x_{i,t}$ is the $i$-th macroeconomic variable at month $t$. Following Ferreira and Santa-Clara [9] and Connor [20], and in order to lessen the influence of model uncertainty on prediction performance, we restrict the intercept and the slope coefficient instead of directly using $\hat{a}_i$ and $\hat{b}_i$, the ordinary least squares (OLS) estimates; specifically, we convert the OLS slope coefficient into a shrunk coefficient. Hence, in what follows, $\hat{gm}_{t+1}$ denotes the forecast of the log of one plus the price-earnings multiple growth rate with shrinkage.
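A minimal sketch of the univariate forecast in Equation (7), assuming the predictor values and the following month's gm are aligned in plain Python lists. The exact shrinkage formula of Ferreira and Santa-Clara [9] and Connor [20] is not reproduced here: the `shrink` factor below is a hypothetical stand-in that scales the OLS slope toward zero while keeping the fitted line through the sample means. Function names are illustrative.

```python
def ols_fit(x, y):
    """Simple OLS of y on a constant and x; returns (a_hat, b_hat)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b

def gm_forecast(x_hist, gm_hist, x_now, shrink=0.5):
    """Forecast gm_{t+1} from the predictor's current value x_now.

    x_hist[j] is paired with gm_hist[j], the gm realized one month later,
    so the fitted regression is gm_{j+1} = a + b * x_j (Eq. (7)).
    HYPOTHETICAL shrinkage: the OLS slope is multiplied by `shrink`, and the
    intercept is reset so the line still passes through the sample means.
    """
    _, b = ols_fit(x_hist, gm_hist)
    mx = sum(x_hist) / len(x_hist)
    my = sum(gm_hist) / len(gm_hist)
    return my + shrink * b * (x_now - mx)
```

With `shrink=1.0` the function reproduces the plain OLS forecast; smaller values pull the forecast toward the historical mean of gm, which is the general spirit of shrinkage.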
In addition, we incorporate information from multiple variables to predict the price-earnings multiple growth rate using principal components. Let $X_t = (x_{1,t}, x_{2,t}, \ldots, x_{14,t})^{\top}$ denote the 14-dimensional vector of all predictors, and let $E_t = (e_{1,t}, e_{2,t}, \ldots, e_{K,t})^{\top}$ denote the vector of the first $K$ principal components extracted from $X_t$ (where $K < 14$). The principal component predictive regression (PC model) forecast is

$$\hat{gm}_{t+1} = \hat{\alpha} + \hat{\beta}^{\top} E_t. \tag{8}$$

Principal components parsimoniously incorporate information from a large number of potential predictors into a predictive regression, since the first few principal components capture the key co-movements among the entire set of predictors. Following Neely et al. [5], we set the number of principal components to $K = 3$.
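The PC model in Equation (8) can be sketched in dependency-free Python as below. This is an illustration under stated assumptions, not the authors' code: predictors are standardized before extraction (an assumption), principal components are obtained by power iteration with deflation (a simple substitute for a library eigen-decomposition), and the default `K = 3` follows the text. Because in-sample component scores have zero mean and are mutually uncorrelated, the multiple regression on K components reduces to K separate slopes plus the mean of gm.

```python
import math
import random

def standardize(X):
    """Column-standardize a T x p matrix given as a list of rows."""
    T, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / T for j in range(p)]
    sds = [math.sqrt(sum((row[j] - means[j]) ** 2 for row in X) / T) for j in range(p)]
    Z = [[(row[j] - means[j]) / sds[j] for j in range(p)] for row in X]
    return Z, means, sds

def top_eigenpairs(C, K, iters=1000, seed=0):
    """Leading K eigenpairs of a symmetric matrix by power iteration with deflation."""
    p = len(C)
    C = [row[:] for row in C]
    rng = random.Random(seed)
    pairs = []
    for _ in range(K):
        v = [rng.uniform(-1.0, 1.0) for _ in range(p)]  # random start vector
        for _ in range(iters):
            w = [sum(C[i][j] * v[j] for j in range(p)) for i in range(p)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        lam = sum(v[i] * sum(C[i][j] * v[j] for j in range(p)) for i in range(p))
        pairs.append((lam, v))
        # Deflate: remove the component just found before searching for the next
        C = [[C[i][j] - lam * v[i] * v[j] for j in range(p)] for i in range(p)]
    return pairs

def pc_forecast(X_hist, gm_next, x_now, K=3):
    """Sketch of Eq. (8): regress gm_{t+1} on the first K principal components of X_t.

    X_hist[j] holds the predictors at month j and gm_next[j] the gm realized
    at month j + 1; x_now holds the predictors at the forecast origin t.
    """
    Z, means, sds = standardize(X_hist)
    T, p = len(Z), len(Z[0])
    C = [[sum(z[i] * z[j] for z in Z) / T for j in range(p)] for i in range(p)]
    vecs = [v for _, v in top_eigenpairs(C, K)]
    scores = [[sum(z[j] * v[j] for j in range(p)) for v in vecs] for z in Z]
    # Zero-mean, uncorrelated scores: the OLS fit splits into K univariate slopes
    alpha = sum(gm_next) / T
    betas = []
    for k in range(K):
        f = [s[k] for s in scores]
        betas.append(sum(fi * (yi - alpha) for fi, yi in zip(f, gm_next))
                     / sum(fi * fi for fi in f))
    z_now = [(x_now[j] - means[j]) / sds[j] for j in range(p)]
    f_now = [sum(z_now[j] * v[j] for j in range(p)) for v in vecs]
    return alpha + sum(b * f for b, f in zip(betas, f_now))
```

In a recursive exercise, `pc_forecast` would be re-run each month on the expanding history, so both the components and the regression coefficients are re-estimated using only past data.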
Overall, the SOP forecast adopted in this paper is

$$\hat{r}_{t+1} = \overline{ge}_t + dp_t + \hat{gm}_{t+1}, \tag{9}$$

where $\overline{ge}_t$ denotes the 20-year moving average of ge up to month $t$, $dp_t$ is the log of one plus the dividend-price ratio at month $t$, and $\hat{gm}_{t+1}$ is obtained from Equation (7) or (8).
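Putting the three components together, Equation (9) amounts to a simple sum. In this hypothetical helper, `ge_hist` holds monthly log earnings growth in chronological order, the 20-year moving average is taken over the last 240 monthly observations (our reading of "twenty-year"), and `gm_hat` would come from Equation (7) or (8).

```python
def sop_forecast(ge_hist, dp_now, gm_hat, window=240):
    """Eq. (9) sketch: 20-year mean of ge + current dp + forecast of gm."""
    recent = ge_hist[-window:]            # last 240 months (20 years) of ge
    ge_bar = sum(recent) / len(recent)    # low-frequency earnings-growth component
    return ge_bar + dp_now + gm_hat       # dp enters at its current value (random walk)
```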

Forecasting with Constrained-SOP Model
To improve the prediction accuracy for the log price-earnings multiple growth rate (gm), we impose reasonable constraints on the forecasts from Models (7) and (8). Three economic constraints are applied in the forecasting process.
Campbell and Thompson [8] argue that reasonable investors do not act on a negative stock return forecast. In the same spirit, we expect a positive price-earnings multiple growth rate. Consequently, the CT forecasts are defined as

$$\hat{gm}^{CT}_{i,t+1} = \max\left(\hat{gm}_{i,t+1},\, 0\right),$$

where $\hat{gm}^{CT}_{i,t+1}$ denotes the CT forecast of gm at month $t+1$. The second constraint is based on the momentum effect. Wang et al. [13] find that whichever of the model of interest and the benchmark has predicted better in the recent past is more likely to continue to do so. The past predictability $pp_t(k)$ is an indicator that equals one when the model of interest has performed better than the benchmark over the past $k$ months and zero otherwise, where $k$ is the look-back period. Clearly, $pp_t(k)$ depends on $k$; we choose $k = 3$ because its out-of-sample performance is relatively better than that of the other look-back periods. The momentum of predictability (MOP) strategy is

$$\hat{gm}^{MOP}_{t+1}(k) = \begin{cases} \hat{gm}_{t+1}, & \text{if } pp_t(k) = 1, \\ \overline{gm}_{t+1}, & \text{if } pp_t(k) = 0, \end{cases}$$

where $\hat{gm}^{MOP}_{t+1}(k)$ denotes the MOP forecast of gm, $\overline{gm}_{t+1}$ is the historical average of gm up to time $t$, and $\hat{gm}_{t+1}$ is the forecast of the log of one plus the price-earnings multiple growth rate with shrinkage.
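The CT and MOP rules can be sketched as follows. The CT rule follows directly from the text. For the MOP indicator $pp_t(k)$, the text only states that it equals one when the model of interest beat the benchmark over the past $k$ months; measuring "beat" by the sum of squared forecast errors over that window is our assumption. Function names are hypothetical.

```python
def ct_forecast(gm_hat):
    """Campbell-Thompson style constraint: truncate negative gm forecasts at zero."""
    return max(gm_hat, 0.0)

def past_predictability(actual, model_fc, bench_fc, k=3):
    """pp_t(k): 1 if the model beat the benchmark over the last k months, else 0.

    ASSUMPTION: 'beat' means a smaller sum of squared forecast errors over the
    look-back window. Lists hold realized gm and the forecasts made for those months.
    """
    sse_model = sum((a - m) ** 2 for a, m in zip(actual[-k:], model_fc[-k:]))
    sse_bench = sum((a - b) ** 2 for a, b in zip(actual[-k:], bench_fc[-k:]))
    return 1 if sse_model < sse_bench else 0

def mop_forecast(gm_hat, gm_bar, pp):
    """MOP switch: model forecast after good recent performance, else historical mean."""
    return gm_hat if pp == 1 else gm_bar
```

In use, `past_predictability` is evaluated with information up to month $t$ and its result selects between the shrinkage forecast and the historical average for month $t+1$.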
The third constraint is based on the three-sigma strategy. Zhang et al. [14] argue that rational investors are unlikely to act on a return forecast that is extremely large or extremely small. Moreover, rational investors tend to expect positive returns. In view of these two points, we propose the following new three-sigma constraint strategy, which differs slightly from the method proposed in Zhang et al. [14]:

$$\hat{gm}^{new}_{i,t+1} = \begin{cases} \hat{gm}_{i,t}, & \text{if } \hat{gm}_{i,t+1} \text{ lies outside the three-sigma range}, \\ \hat{gm}^{CT}_{i,t+1}, & \text{otherwise}, \end{cases}$$

where $\hat{gm}_{i,t}$ is the OLS forecast, with shrinkage, of the log of one plus the price-earnings multiple growth rate, and $\sigma_t$, the standard deviation of the log of one plus the price-earnings multiple growth rate at month $t$ computed with an expanding window, determines the width of the three-sigma range. $\hat{gm}^{new}_{i,t+1}$ is the forecast under the new three-sigma constraint, and $\hat{gm}^{CT}_{i,t+1}$ is the CT forecast; the rule usually helps to exclude the impact of outliers.
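A minimal sketch of the new three-sigma rule, assuming the three-sigma band is centred on the expanding-window mean of realized gm (the text specifies $\sigma_t$ but not the centre of the range) and that $\sigma_t$ is the sample standard deviation. Inside the band the CT forecast is used; outside it the previous forecast $\hat{gm}_{i,t}$ is kept. The function name and argument layout are illustrative.

```python
import statistics

def three_sigma_forecast(gm_hat_next, gm_hat_prev, gm_hist):
    """New three-sigma rule (sketch).

    ASSUMPTION: band = expanding-window mean of realized gm +/- 3 * sigma_t,
    with sigma_t the expanding-window sample standard deviation of gm.
    """
    mu = statistics.mean(gm_hist)
    sigma = statistics.stdev(gm_hist)
    if abs(gm_hat_next - mu) > 3.0 * sigma:
        return gm_hat_prev              # outside the band: keep the previous forecast
    return max(gm_hat_next, 0.0)        # inside the band: CT forecast
```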
Overall, we combined the SOP method in Equation (9) with the three constraints to obtain more precise stock return forecasts.

Forecast Evaluation
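We used the recursive window method to generate out-of-sample forecasts. Specifically, the full sample of $T$ observations was divided into two parts: the first $M$ observations form the initial in-sample (estimation) period and the remaining $N = T - M$ observations form the out-of-sample period. The first out-of-sample forecast is generated from models estimated on the initial $M$ observations; the estimation window is then expanded by one observation, the models are re-estimated, and the next forecast is generated. This process is repeated until all $N$ out-of-sample forecasts are obtained.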
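The recursive (expanding) window scheme can be sketched generically: any forecasting rule that maps the history up to month $t-1$ to a forecast for month $t$ can be plugged in. The helper names are hypothetical, and the historical-average rule in the example is the usual benchmark.

```python
def recursive_forecasts(series, M, fit_and_predict):
    """Generate N = len(series) - M out-of-sample forecasts with an expanding window.

    fit_and_predict(history) must use only `history` (observations 0..t-1)
    and return a forecast for observation t.
    """
    forecasts = []
    for t in range(M, len(series)):
        forecasts.append(fit_and_predict(series[:t]))  # window grows by one each step
    return forecasts

def hist_mean(history):
    """Historical-average benchmark forecast."""
    return sum(history) / len(history)
```

For example, `recursive_forecasts(r, M, hist_mean)` returns the benchmark forecasts for every out-of-sample month.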
Following [21–29], forecast quality is evaluated by $R^2_{OS}$, the percentage reduction in the MSPE of the model of interest relative to the benchmark model:

$$R^2_{OS} = 1 - \frac{\sum_{t=M+1}^{T} (r_t - \hat{r}_t)^2}{\sum_{t=M+1}^{T} (r_t - \bar{r}_t)^2},$$

where $\hat{r}_t$ and $\bar{r}_t$ are the return forecasts from the model of interest and the historical average benchmark, respectively, and the historical average is $\bar{r}_t = \frac{1}{t-1}\sum_{j=1}^{t-1} r_j$. A positive $R^2_{OS}$ indicates that the predictive regression forecast outperforms the benchmark in terms of MSPE, while a negative value indicates the opposite.
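The $R^2_{OS}$ statistic is a one-line computation once realized returns and both sets of forecasts are aligned over the out-of-sample period. This sketch reports it in percent, matching how the results are quoted; the function name is illustrative.

```python
def r2_os(actual, model_fc, bench_fc):
    """Out-of-sample R^2 in percent: 100 * (1 - MSPE_model / MSPE_bench)."""
    sse_model = sum((a - m) ** 2 for a, m in zip(actual, model_fc))
    sse_bench = sum((a - b) ** 2 for a, b in zip(actual, bench_fc))
    return 100.0 * (1.0 - sse_model / sse_bench)
```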
Moreover, we used the Clark and West [18] statistic (the MSFE-adjusted statistic) to test the significance of $R^2_{OS}$. The null hypothesis is $MSPE_{bench} \le MSPE_{model}$ against the one-sided (upper-tail) alternative $MSPE_{bench} > MSPE_{model}$. The CW statistic is computed by first defining

$$f_t = (r_t - \bar{r}_t)^2 - \left[(r_t - \hat{r}_t)^2 - (\bar{r}_t - \hat{r}_t)^2\right].$$

The t-statistic from regressing $f_t$ on a constant is the CW statistic, and its one-sided p-value is used to assess significance.
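A sketch of the Clark and West test, assuming aligned lists of realized returns, model forecasts, and benchmark forecasts. It uses the ordinary (homoskedastic) standard error from regressing $f_t$ on a constant; applied work sometimes uses a heteroskedasticity- and autocorrelation-consistent standard error instead, and the text does not say which is used. The one-sided p-value is the standard normal upper tail, computed with `math.erfc`.

```python
import math
import statistics

def clark_west(actual, model_fc, bench_fc):
    """Clark-West MSFE-adjusted statistic and one-sided p-value (sketch)."""
    # f_t = (r - r_bar)^2 - [(r - r_hat)^2 - (r_bar - r_hat)^2]
    f = [(a - b) ** 2 - ((a - m) ** 2 - (b - m) ** 2)
         for a, m, b in zip(actual, model_fc, bench_fc)]
    n = len(f)
    # Regressing f_t on a constant: slope = mean(f), OLS s.e. = stdev(f) / sqrt(n)
    t_stat = statistics.mean(f) / (statistics.stdev(f) / math.sqrt(n))
    p_value = 0.5 * math.erfc(t_stat / math.sqrt(2.0))  # P(Z > t) under N(0, 1)
    return t_stat, p_value
```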

Stock Return
Given data availability, we used the return on the S&P 500 index (hereinafter S&P 500 or SP500) as the market return. The data can be obtained from https://finance.yahoo.com/, and the stock return is computed as

$$r_t = \log(P_t) - \log(P_{t-1}),$$

where $P_t$ is the closing price at month $t$. For ease of presentation, we multiplied the logarithmic return by 100.
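The return definition above in code, assuming a chronological list of month-end closing prices; the function name is hypothetical.

```python
import math

def log_returns_pct(prices):
    """r_t = 100 * (log P_t - log P_{t-1}) from month-end closing prices."""
    return [100.0 * (math.log(p1) - math.log(p0)) for p0, p1 in zip(prices, prices[1:])]
```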

Macroeconomic Variables
To explore the predictability of stock returns, we selected 14 monthly macroeconomic variables from Amit Goyal's home page at http://www.hec.unil.ch/agoyal/. The variables include the following:
• stock variance (SVAR), the sum of the squared daily returns on the S&P 500 index;

Table 1 displays summary statistics for the stock return and the 14 macroeconomic variables at a monthly frequency over the full sample period from December 1927 to December 2018. The average monthly stock return was 0.769%, and the monthly standard deviation was 5.375%. Although the first-order autocorrelation of SP500 was very low (0.078), 10 of the 14 economic variables were highly persistent, with autocorrelation coefficients above 0.9. In terms of means, the mean of SP500 approximately equaled the sum of the means of dp, ge, and gm, which is consistent with the decomposition in Equation (5).

This table reports the summary statistics for the stock return on the SP500 index and the related predictors used in the study. Ret is the market return. gm is the growth in the price-earnings ratio. ge is the growth in earnings. dp is the dividend-price ratio. The remaining variables are defined as follows: stock return variance (SVAR), default return spread (DFR), long-term bond yield (LTY), long-term bond return (LTR), inflation rate (INFL), term spread (TMS), treasury bill rate (TBL), default yield spread (DFY), net equity expansion (NTIS), log dividend payout ratio (DE), earnings-price ratio (EP), log dividend-price ratio (DP), log dividend yield (DY), and log book-to-market ratio (BM). AR(1) refers to the first-order autocorrelation. The sample period ranges from December 1927 to December 2018.
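For reference, an AR(1) coefficient like those in Table 1 can be estimated as the OLS slope of a series on its own first lag, as in this sketch. Whether Table 1 uses this estimator or the sample autocorrelation is not stated; the two differ only slightly in long samples. The function name is illustrative.

```python
def ar1_coefficient(x):
    """First-order autocorrelation as the OLS slope of x_t on x_{t-1} (with intercept)."""
    y, z = x[1:], x[:-1]
    mz, my = sum(z) / len(z), sum(y) / len(y)
    num = sum((zi - mz) * (yi - my) for zi, yi in zip(z, y))
    den = sum((zi - mz) ** 2 for zi in z)
    return num / den
```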

Out-of-Sample Forecasting Performance
Following Ferreira and Santa-Clara [9], we forecasted the stock return with the SOP method; however, rather than only using different economic variables as predictors for the third component (gm), we combined it with the CT strategy, the MOP strategy, or the three-sigma strategy. Table 2 reports the out-of-sample statistical results. The $R^2_{OS}$ and the p-value of the Clark and West [18] MSFE-adjusted statistic were used to evaluate whether our models could beat the benchmark. The forecast period ran from December 1947 through December 2018. Twelve of the 14 macroeconomic variables had positive $R^2_{OS}$ values, which is broadly consistent with the literature (e.g., Ferreira and Santa-Clara [9]). That is, the SOP method with multiple growth (gm) and shrinkage generally outperformed the historical average.
Interestingly, after adding the CT constraint to the SOP method, 12 predictors generated larger $R^2_{OS}$ statistics than under the original SOP method. When we implemented the three-sigma strategy in the SOP predictive model, most $R^2_{OS}$ values matched or exceeded their SOP counterparts. Remarkably, all $R^2_{OS}$ values for the three constraint methods were positive, as shown in the third to fifth columns, meaning that all three constraint methods outperformed the benchmark model. Although the MOP-based SOP method with a look-back period of $k = 3$ did not outperform the SOP method for every predictor, it raised the $R^2_{OS}$ of the EP and DP regressions from −0.44% to 0.129% and from −0.157% to 0.532%, respectively, which indicates that it still helps improve predictability. The SOP method based on the three-sigma strategy outperformed the other three SOP methods and yielded the largest $R^2_{OS}$, 1.535%, significant at the 1% level. The principal component predictive regression forecasts showed stronger out-of-sample predictive ability than the univariate predictive regression forecasts. Meanwhile, for the principal component predictive regression forecasts, the economic constraint SOP models also yielded larger $R^2_{OS}$ values than the original SOP model.
Overall, the predictive ability of SOP became stronger with the addition of the CT constraint, the MOP strategy, and the three-sigma strategy, with the three-sigma strategy performing best.

In Table 2, CT-SOP, MOP-SOP, and three-sigma SOP denote the SOP models with the Campbell and Thompson [8] economic constraint, the momentum of predictability (MOP) strategy, and our new three-sigma constraint approach, respectively. PC denotes principal component predictive regression forecasts. The return forecasts are generated from a recursive window. In the MOP-SOP model, the look-back period is three. We measured statistical significance using the Clark and West [18] test statistic. ***, **, and * indicate significance at the 1%, 5%, and 10% levels, respectively. The out-of-sample period is December 1947 to December 2018.

Asset Allocation
In the previous subsection, we investigated out-of-sample performance from a statistical perspective. We now quantify the economic value of the stock return forecasts from the four SOP models by computing the certainty equivalent return (CER) and the CER gain. As in [30–39], we consider a risk-averse investor with mean-variance utility who optimally allocates her wealth between the stock index and risk-free bills according to the stock return forecasts. The portfolio weight on equities during month $t+1$ is

$$\omega_t = \frac{1}{\gamma}\,\frac{\hat{r}_{t+1}}{\hat{\sigma}^2_{t+1}},$$

where $\hat{r}_{t+1}$ denotes the stock return forecast from one of the four SOP approaches, $\gamma$ is the risk aversion coefficient, which we set to 5, and $\hat{\sigma}^2_{t+1}$ is the forecast of the stock return variance, computed from a five-year rolling window of past returns. We also restricted $\omega_t$ to lie between −0.5 and 1.5, which limits short selling and leverage to at most 50%.
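A minimal sketch of the weight rule, assuming forecasts and returns are in decimal units (not multiplied by 100) and that the five-year window means the last 60 monthly returns with a sample (n − 1) variance; the function names are hypothetical.

```python
def rolling_variance(returns, window=60):
    """Variance forecast from the last `window` monthly returns (five years = 60 months)."""
    recent = returns[-window:]
    m = sum(recent) / len(recent)
    return sum((r - m) ** 2 for r in recent) / (len(recent) - 1)

def equity_weight(r_hat, var_hat, gamma=5.0, lower=-0.5, upper=1.5):
    """Mean-variance weight w_t = r_hat / (gamma * var_hat), clipped to [lower, upper]."""
    w = r_hat / (gamma * var_hat)
    return min(max(w, lower), upper)
```

Passing `gamma=3.0` or `gamma=4.0` reproduces the alternative risk aversion settings used in the robustness checks.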
The portfolio return at month $t+1$ is

$$R_{p,t+1} = \omega_t r_{t+1} + (1 - \omega_t) R_{f,t+1},$$

where $R_{f,t+1}$ is the risk-free rate. The CER of the portfolio is

$$CER_p = \hat{\mu}_p - \frac{1}{2}\gamma\hat{\sigma}^2_p,$$

where $\hat{\mu}_p$ and $\hat{\sigma}^2_p$ are the sample mean and variance of the portfolio return over the out-of-sample period. We then calculated the CER gain as the difference between the CER obtained by using one of the four SOP models to forecast stock returns and the CER obtained by using the benchmark forecast. Following Rapach et al. [10], we multiplied the CER gain by 1200 to express it as an annualized percentage. The CER gain can thus be interpreted as the annual management fee that an investor would be willing to pay for access to a trading strategy based on the SOP forecasts instead of one based on the benchmark forecast. Table 3 shows the portfolio results for the single-variable and principal component predictive regression forecasts. Compared with the SOP model, 11 of 15 variables improved their CER gains under the CT constraint. The univariate three-sigma SOP model delivered the highest out-of-sample CER gain, 224.8 basis points, demonstrating that stock return predictability was economically significant. Moreover, 9 of the 14 predictors in the three-sigma SOP model achieved CER gains above 1%, and 11 of the 14 predictors, including LTY, DE, and DY, improved their CER gains under the CT and three-sigma strategies. In particular, after implementing the three-sigma strategy, the CER gain increased from 1.318% to 1.538% for LTY, from 1.26% to 2.248% for DE, and from 1.671% to 2.024% for DY. Twelve of the 14 predictors in the MOP-SOP model had positive CER gains, and 8 of them exceeded 1%, although the MOP-SOP model did not perform better than the SOP model.
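The portfolio return, CER, and annualized CER gain can be computed as below. This sketch assumes monthly returns in decimal units and uses the sample mean and sample variance of the out-of-sample portfolio returns; the function names are illustrative.

```python
import statistics

def portfolio_returns(weights, stock_ret, rf):
    """R_p,t+1 = w_t * r_{t+1} + (1 - w_t) * R_f,t+1 for each out-of-sample month."""
    return [w * r + (1.0 - w) * f for w, r, f in zip(weights, stock_ret, rf)]

def cer(port, gamma=5.0):
    """Certainty equivalent return: mean - 0.5 * gamma * variance of portfolio returns."""
    return statistics.mean(port) - 0.5 * gamma * statistics.variance(port)

def cer_gain_annualized(port_model, port_bench, gamma=5.0):
    """CER gain in annualized percent: 1200 * (CER_model - CER_bench)."""
    return 1200.0 * (cer(port_model, gamma) - cer(port_bench, gamma))
```

The factor 1200 converts a monthly decimal difference into an annual percentage (12 months times 100).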
For the principal component predictive regression forecasts, the CER gain was larger than that of the univariate predictive regression model, which is consistent with the out-of-sample prediction performance in Section 4.1. Meanwhile, for the principal component forecasts, the CER gains of the economic constraint SOP models were larger than that of the original SOP model. In general, the CER gain of the SOP model improved to some extent after adding the different restrictions, and most CER gains exceeded 1%, which indicates the superior economic performance of the SOP model with restrictions. In short, the SOP models with constraints performed better than the historical average model. For a mean-variance investor who allocates wealth between a stock index and the risk-free asset according to the various stock return forecasts, the SOP model and the CT-SOP model generate systematic gains for all predictors except DP, EP, and NTIS, while the MOP-SOP and three-sigma SOP models generate systematic gains for all predictors except EP and NTIS.

Investor Risk Aversion Choices
So far, we have assumed that the investor's relative risk aversion equals 5. However, the optimal weight depends on the investor's relative risk aversion. To examine whether the economic value of the stock return forecasts depends on the level of risk aversion, we re-ran the asset allocation with alternative risk aversion coefficients of 3 and 4. Table 4 reports the resulting CER gains. In the univariate regressions, for risk aversion coefficients of both 3 and 4, 11 of the 15 predictors in the CT-SOP model clearly increased their CER gains compared with the original SOP model. When $\gamma = 3$, 11 of the 14 predictors had higher CER gains under the three-sigma SOP model than under the original SOP model, while only 7 predictors improved under the MOP-SOP model. The same pattern held for $\gamma = 4$, consistent with the results in Section 4.2. We also note that CER gains decreased as the risk aversion coefficient increased. For the principal component predictive regression forecasts, the CER gains of the economic constraint SOP models exceeded that of the original SOP model under each alternative risk aversion coefficient. These findings suggest that the superior economic performance of the SOP models with constraints is robust.

Transaction Cost
In practice, investors also pay transaction costs, which were ignored (that is, set to zero) in Section 4.2. Following prior literature (see, e.g., Neely et al. [5]; Zhang et al. [14]), we calculated the certainty equivalent return (CER) gains with a transaction cost of 50 basis points (Balduzzi and Lynch [23]). Table 5 reports this robustness check. As pointed out by Zhang et al. [14], return forecasts from the model of interest usually fluctuate more than the historical average forecast, which leads to more frequent portfolio rebalancing and hence relatively lower economic value once transaction costs are included. As Table 5 shows, 10 of the 14 predictors in the CT-SOP model, 10 of the 14 predictors in the MOP-SOP model, and 12 of the 14 predictors in the three-sigma SOP model had larger CER gains than the original SOP model. For the principal component predictive regression forecasts, the CER gains of the economic constraint SOP models remained larger than that of the original SOP model even after transaction costs were taken into account. In sum, the economic performance of the SOP models with economic constraints is robust to both the risk aversion coefficient and transaction costs.

This table reports the certainty equivalent return (CER) gains assuming a proportional transaction cost of 50 basis points per transaction. The CER gains are calculated as the difference between the CER from the model of interest and that from the benchmark forecasts. We multiply CER gains by 1200 to obtain annualized values. PC denotes the CER gains based on principal component predictive regression forecasts. SOP refers to the original SOP model with multiple growth and shrinkage, while CT-SOP, MOP-SOP, and three-sigma SOP denote the SOP models with the Campbell and Thompson [8] economic constraint, the momentum of predictability (MOP) strategy, and our new three-sigma constraint approach, respectively.
The risk aversion coefficient is set to 5, and the weight on stocks is restricted to values between −0.5 and 1.5. The out-of-sample period spans December 1947 to December 2018.
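One way to include a proportional transaction cost of 50 basis points is sketched below. The text does not specify how turnover is measured; here it is simplified to the absolute change in the equity weight between consecutive months, ignoring the drift of weights with realized returns, so this is an assumption rather than the paper's exact procedure. The function name is hypothetical.

```python
def net_portfolio_returns(weights, stock_ret, rf, cost=0.005):
    """Portfolio returns net of a proportional transaction cost (sketch).

    SIMPLIFYING ASSUMPTION: turnover in each month is |w_t - w_{t-1}|, ignoring
    weight drift between rebalancing dates; the first month is charged for
    moving from an all-bill position (w = 0).
    """
    net, w_prev = [], 0.0
    for w, r, f in zip(weights, stock_ret, rf):
        gross = w * r + (1.0 - w) * f           # same allocation rule as before costs
        net.append(gross - cost * abs(w - w_prev))
        w_prev = w
    return net
```

The net returns can then be fed into the same CER calculation as in the cost-free case.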

Conclusions and Implication
This paper combines the SOP method and economic constraint methods to predict stock returns. Unlike ordinary economic constraint models, which use a strategy to choose directly between the stock return forecast from the model of interest and the historical average, we apply the strategy to an individual component of the return. For instance, a 20-year moving average is sufficient to capture the direction of a component such as ge, because it has a low-frequency predictable part [34], while dp is well described by a random walk. The purpose of combining the CT restriction with the SOP method is to rule out negative forecasts of gm, in line with investors' expectation of positive stock returns. To mitigate the impact of extreme values of the components, we combine a new three-sigma rule with SOP. Applying MOP to the gm component exploits the tendency of its past predictive performance to persist into the present.
Our out-of-sample results show that, in general, the SOP model with restrictions can significantly improve predictive ability, especially when our new three-sigma restriction is applied. Overall, the economic constraint SOP models obtain better forecasting results than the original SOP model. Moreover, the economic performance of the SOP models with economic constraints is robust to both the risk aversion coefficient and transaction costs.
Finally, our findings have the following possible implications: (i) predictive power can be improved by using economic constraint SOP models; (ii) principal component predictive regression can further enhance out-of-sample predictive ability; and (iii) in practice, a mean-variance investor can use an appropriate constrained SOP model to obtain economically meaningful gains.