Open Access Article

Peer-Review Record

Tree-Based Methods of Volatility Prediction for the S&P 500 Index

Computation 2025, 13(4), 84; https://doi.org/10.3390/computation13040084

by Marin Lolic

Reviewer 1: Anonymous

Reviewer 2: Anonymous

Reviewer 3: Anonymous

Submission received: 27 January 2025 / Revised: 17 March 2025 / Accepted: 22 March 2025 / Published: 24 March 2025

(This article belongs to the Special Issue Quantitative Finance and Risk Management Research: 2nd Edition)

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

The topic of the paper is in a highly interesting area. However, its novelty is difficult to identify. Additionally, the contribution appears to be minimal. Moreover, the mathematical formatting and overall editing require professional refinement.

Author Response

Comment 1: The topic of the paper is in a highly interesting area. However, its novelty is difficult to identify. Additionally, the contribution appears to be minimal.

Response 1: I have added to the Introduction and Discussion sections to make the novelty and contribution more clear. In short, the novelty/contribution is the application of tree-based methods to volatility prediction, which I do not believe has been done before.

Comment 2: Moreover, the mathematical formatting and overall editing require professional refinement.

Response 2: I used standard LaTeX formulas throughout, so I'm not sure why the mathematical formatting is an issue. The editing comment is also not specific.

Reviewer 2 Report

Comments and Suggestions for Authors

The paper presents a well-structured and technically sound exploration of tree-based ensemble methods for forecasting the volatility of the S&P 500 Index. It provides a thorough review of classical volatility prediction methods and effectively compares them with machine learning approaches, particularly random forests and gradient boosting. However, I recommend the following aspects to strengthen its contribution.

First, the authors should use the journal's template. The article was submitted in a document that does not adhere to the journal's specific template.

In the literature review section, I believe the paper mainly uses raw historical returns as input features. While the variable importance section suggests that tree models naturally learn useful patterns, discussing alternative feature engineering techniques (e.g., inclusion of lagged volatility, macroeconomic indicators) could demonstrate the model's adaptability.

In the methodology section, the paper addresses high-dimensional settings where tree-based models can be affected by redundant or correlated features. Was there any assessment of feature collinearity, and how does it impact model interpretability?

In the results section, the paper presents machine learning models, particularly gradient boosting, which are prone to overfitting. The paper should discuss whether cross-validation techniques (beyond simple train-test splits) were employed to mitigate this risk.

Author Response

Comment 1: First, the authors should use the journal's template. The article was submitted in a document that does not adhere to the journal's specific template.

Response 1: I was told I could submit in an ordinary Word document. If the editor requests I use the template, then I will do so.

Comment 2: While the variable importance section suggests that tree models naturally learn useful patterns, discussing alternative feature engineering techniques (e.g., inclusion of lagged volatility, macroeconomic indicators) could demonstrate the model's adaptability.

Response 2: See Discussion section, where I include this as a possible future extension. Additionally, I include VIX as an external variable, which leads to the most successful of the 7 models I test.

Comment 3: Was there any assessment of feature collinearity, and how does it impact model interpretability?

Response 3: I have added feature collinearity to Section 3.2. In short, there is almost no collinearity of features.
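One common way to assess feature collinearity is to inspect the pairwise correlations between feature columns. The Python sketch below is not the author's procedure (Section 3.2 is not reproduced in this record, and the paper's code is in R). It assumes the features are lagged daily returns, uses synthetic data, and all function names are hypothetical.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def lagged_columns(returns, n_lags):
    """Column k holds the return k days before the most recent day of each row."""
    rows = len(returns) - n_lags + 1
    return [[returns[r + n_lags - 1 - k] for r in range(rows)]
            for k in range(n_lags)]


def max_abs_offdiag_corr(columns):
    """Largest absolute pairwise correlation across distinct feature columns."""
    worst = 0.0
    for i in range(len(columns)):
        for j in range(i + 1, len(columns)):
            worst = max(worst, abs(pearson(columns[i], columns[j])))
    return worst


if __name__ == "__main__":
    random.seed(0)
    # Synthetic i.i.d. returns stand in for the index series; not real data.
    fake_returns = [random.gauss(0.0, 0.01) for _ in range(2000)]
    columns = lagged_columns(fake_returns, n_lags=5)
    print("max |corr| between lag columns:", max_abs_offdiag_corr(columns))
```

A maximum absolute off-diagonal correlation close to zero would be consistent with the "almost no collinearity" statement, although this check only captures linear pairwise dependence.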

Comment 4: The paper should discuss whether cross-validation techniques (beyond simple train-test splits) were employed to mitigate this risk.

Response 4: See Section 3.3, where I describe the cross-validation procedures used to minimize overfitting risk.

Reviewer 3 Report

Comments and Suggestions for Authors

I have read the paper with interest and have the following comments.

1) The paper is well-written so no issues with the language or design. However, I believe the paper has three minor shortcomings: The first is the author’s shallow review of the comparison models. It seems like the author is either not an expert on those models or he elected to use a rather shallow review of those models. The background section is extremely short and covers a very small fraction of the relevant models. I will detail this later.

The second issue is that the coverage lacks details in forecast results in terms of how they were generated and how the comparison statistics were obtained.

Finally, the third issue is the style that the paper does not follow the typical style of a research paper. Its explanations look more like news briefings rather than usual research paper explanations.

For instance, in Section 3.3 we see “Using the R programming language and associated libraries, we examine seven total methods of volatility prediction - three classical, one using options, and three based on decision trees. The first method is an equal-weighted moving average of past values. This simply involves calculating the realized volatility of the final 21 days of each data slice and using it as the prediction for the following 21 days; it implicitly assumes that each day in the past contributes an equal amount of information. The second method utilizes…” This is not a typical explanatory paragraph expected in a journal named “Computation”. The author should present the equations, with explanations and results separated for all three methods, instead of merely “saying” what they are.

The same goes for the second paragraph of section 3.3. Each of those methods must be identified using symbols, equations, etc. so we will be able to follow and understand their differences quantitatively and their relative contributions. As it is, it looks like an informational statement not suitable for a research paper that has to elaborate on the methods including steps. Ultimately, several journals require data and method files (with a batch file explaining how to obtain the results) so that the results must be replicated.

As a result of those issues, I think the paper should go over a minor-to-major revision. The following sections will elaborate my evaluation hoping the author will benefit from those and will revise the paper in line with the suggestions.

2) The author highlights that the main issue is “predicting the asset return volatility.” However, the paper uses the S&P 500 index as the main data and generalizes. While the market index is of course an excellent portfolio of assets, quantitative finance also focuses on individual asset volatilities or custom portfolio volatilities. So I suggest addressing this issue in the abstract as well as the introduction section. The volatility of an index consisting of 500 major stocks might be significantly different from other smaller and engineered portfolio volatilities as well as the volatilities of individual assets. If I monitor Amazon shares, can I still use the suggested methods? What if I designed a hedge fund to hedge my 15-stock portfolio? Again, I am not criticizing the use of the S&P 500 as it is a common practice, but suggesting an explanatory paragraph highlighting its limitations.

3) Mandelbrot, Benoit. (1963) also needs more elaboration. The persistence highlighted by Mandelbrot seems to be misunderstood by the author. Mandelbrot highlights volatility clustering: large changes in the price of an asset are often followed by other large changes, and small changes are often followed by small changes. This behavior has been reported by numerous other studies, such as Baillie et al. (1996), Chou (1988), and Schwert (1989). So, I suggest that the author make a correction underlining that Mandelbrot is not saying volatility is more predictable than future returns. “Persistence” implies the “size” of volatility, not the “existence” of volatility. Furthermore, the relevant literature highlights that volatility always reverts toward its “normal” (mean) level: if it is too high, it declines; if it is too low, it increases.

4) Section 3.2. shows that the study covers 31 years of daily returns! Researchers know the issue of using daily returns for such a long study period: It introduces a lot of noise. So, the author should underline why they do not study weekly or monthly data. If it is justified, we accept the use of daily returns. Keep in mind that even the beta regressions will use only the last 3 years with monthly data.

Another issue is the period studied. The author would not explain the logic of going this much backward… One should not go back this much to discover the stylized facts of volatility as three decades is way too long to obtain past information to forecast future volatility. The author needs to explain how and why he decided to use this for 31 years. The author actually explained that he did not go more backward than 31 years although the data is available. However, 31 years is a very long period for this sort of memory model.

5) Section 2.1 Volatility Predictions seems to be quite incomplete and not up to date. To the extent that the author's volatility models classification totally ignores the volatility models that are not functions of the observables. This whole section resembles a news article rather than a research paper.

6) Section 3.1 underlines that “the daily returns are simple returns, not logarithmic.” The statement begs for further elaboration. Why not logarithmic? The author is not establishing the notation properly here. What is the asset price at time t, and what is the return (with proper compounding, such as continuous) on the asset over the period t-1 to t? How are the conditional mean and variance defined? None of these were elaborated in the paper.
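For reference, the notation requested here can be written in the conventional textbook form below. This is the standard definition, not necessarily the notation adopted in the manuscript; the symbols $P_t$ (price), $R_t$ (simple return), $r_t$ (log return), $\mathcal{F}_{t-1}$ (information available at time $t-1$), $\mu_t$, and $\sigma_t^2$ are assumptions introduced for illustration.

```latex
\begin{align}
R_t &= \frac{P_t}{P_{t-1}} - 1, &
r_t &= \ln\frac{P_t}{P_{t-1}} = \ln(1 + R_t), \\
\mu_t &= \mathbb{E}\left[ r_t \mid \mathcal{F}_{t-1} \right], &
\sigma_t^2 &= \operatorname{Var}\left( r_t \mid \mathcal{F}_{t-1} \right).
\end{align}
```

Since $\ln(1 + R_t) \approx R_t$ for small $R_t$, the two return definitions are close at the daily frequency, which is why the choice usually matters little for daily volatility estimates.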

7) Figure 1 is too general and looks like something taken from Wikipedia. When one searches with the keyword “random forest” this exact figure pops up. It needs to be recreated with proper explanations showing why it is needed in this research paper and how it explains the process. Instead of having X₁ and t₁, it should use the relevant variable names.

8) The author should provide the results with and without the purged set for the readers to compare the benefits of introducing the purged set as well as how the size of the purged set has been decided.

9) Figure 3 is not comprehensive enough for any reader. It has to have a caption explaining what is going on in the figure. What is the main time series? I managed to get those using the available data file but for a reader who would not bother, a more comprehensive figure is more valuable.

10) The paper never touches on positive and negative shocks that may have the same impact on the volatility. As explained by Engle, this may be called a leverage effect and sometimes a risk premium effect. “In the former theory, as the price of a stock falls, its debt-to-equity ratio rises, increasing the volatility of returns to equity holders.” So, not using the GARCH model may miss this important information although the forecast is better with the proposed method.

11) The author is not up to date with the references.

Baillie R T, Bollerslev T and Mikkelsen H O 1996 Fractionally integrated generalized autoregressive conditional heteroskedasticity J. Econometrics 74 3–30

Chou R Y 1988 Volatility persistence and stock valuations: some empirical evidence using GARCH J. Appl. Econometrics 3 279–94

Schwert G W 1989 Why does stock market volatility change over time? J. Finance 44 1115–53

Author Response

Comment 1: Finally, the third issue is the style that the paper does not follow the typical style of a research paper. Its explanations look more like news briefings rather than usual research paper explanations.

Response 1: I have revised Section 3.3 to tie each of the classical prediction methods back to Equations 1, 2, and 3. Additionally, I have attached the full R code used to generate all the results. As for references, I note that there are many variants on GARCH, and I don't think I can reasonably cover all of them. I have chosen to focus on simple moving average, EWMA, and GARCH because of their longevity and popularity in financial risk management (my professional field).

Comment 2: The author highlights that the main issue is “predicting the asset return volatility.” However, the paper uses the S&P 500 index as the main data and generalizes.

Response 2: I have added to the introduction, noting that these results may not generalize to individual securities.

Comment 3: Mandelbrot, Benoit. (1963) also needs more elaboration.

Response 3: I have changed the relevant section to discuss persistence and not predictability.

Comment 4: Section 3.2. shows that the study covers 31 years of daily returns!

Response 4: While I use 31 years of data in total, each prediction uses a much shorter time frame of 126 days. Thus, I'm not claiming that data from 31 years ago is relevant to predicting volatility today, but that the patterns learned from 31-year-old data could be. I have added to the section about choosing the length of historical data.
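The slicing idea in this response can be illustrated with a minimal Python sketch (the paper's own code is in R and is not reproduced here). It assumes each sample pairs 126 daily returns as features with the realized volatility of the following 21 days as the target, and it includes the equal-weighted baseline quoted by Reviewer 3. The function names, the non-annualized sample-standard-deviation definition of realized volatility, and the step between slices are illustrative assumptions, not details confirmed by the manuscript.

```python
import math
import random

LOOKBACK = 126  # feature window length, from the response above
HORIZON = 21    # prediction horizon, from the Section 3.3 excerpt quoted by Reviewer 3


def realized_vol(returns):
    """Sample standard deviation of daily returns (assumed definition, not annualized)."""
    n = len(returns)
    mean = sum(returns) / n
    return math.sqrt(sum((r - mean) ** 2 for r in returns) / (n - 1))


def make_slices(returns, step=HORIZON):
    """Return (features, target) pairs; the `step` between slice starts is an assumption."""
    slices = []
    last_start = len(returns) - LOOKBACK - HORIZON
    for start in range(0, last_start + 1, step):
        features = returns[start:start + LOOKBACK]
        future = returns[start + LOOKBACK:start + LOOKBACK + HORIZON]
        slices.append((features, realized_vol(future)))
    return slices


def moving_average_forecast(features):
    """Equal-weighted baseline: realized volatility of the final 21 days of the slice."""
    return realized_vol(features[-HORIZON:])


if __name__ == "__main__":
    random.seed(0)
    # Synthetic returns stand in for the S&P 500 series; they are not real data.
    fake_returns = [random.gauss(0.0, 0.01) for _ in range(1000)]
    slices = make_slices(fake_returns)
    features, target = slices[0]
    print(len(slices), len(features), moving_average_forecast(features), target)
```

Each slice only ever sees 126 days of history, so data from decades ago contributes training examples rather than inputs to any present-day forecast, which is the distinction the response draws.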

Comment 5: Section 2.1 Volatility Predictions seems to be quite incomplete and not up to date.

Response 5: See Response 1. I have added a reference to models that incorporate macro variables.

Comment 6: Section 3.1 underlines that “the daily returns are simple returns, not logarithmic.”

Response 6: To reduce confusion, I have taken out the statement on simple versus logarithmic returns.

Comment 7: Figure 1 is too general and looks like something taken from Wikipedia.

Response 7: Figure 1 comes from a standard text on machine learning (The Elements of Statistical Learning), which is why it can be found in a Google search. I thought it would be better to include a simple schematic on how decision trees are built rather than relying on text alone.

Comment 8: The author should provide the results with and without the purged set for the readers to compare the benefits of introducing the purged set as well as how the size of the purged set has been decided.

Response 8: I describe how the length of the purged set was decided in Section 3.2. Given how small the purged set is as a percentage of the total data, I don't think it makes sense to present two sets of results.
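To make the idea of a purged set concrete, here is a hedged Python sketch of one common approach: after a chronological split, samples whose feature or target window straddles the train/test boundary are set aside so that no training sample shares days with the test period. This is an assumption about how purging might work, not the procedure of Section 3.2, and the function name and parameters are hypothetical.

```python
def purged_split(n_days, lookback, horizon, step, test_start_day):
    """Assign slice start indices to train, purged, and test sets.

    A slice starting at day `s` spans days [s, s + lookback + horizon).
    - test:   slices that start on or after `test_start_day`
    - train:  slices that end before `test_start_day`
    - purged: slices that straddle the boundary and are discarded
    """
    train, purged, test = [], [], []
    span = lookback + horizon
    for s in range(0, n_days - span + 1, step):
        if s >= test_start_day:
            test.append(s)
        elif s + span <= test_start_day:
            train.append(s)
        else:
            purged.append(s)
    return train, purged, test


if __name__ == "__main__":
    # Illustrative sizes only; these are not the paper's actual split points.
    train, purged, test = purged_split(
        n_days=1000, lookback=126, horizon=21, step=21, test_start_day=800
    )
    print(len(train), len(purged), len(test))
```

Under this scheme the purged set grows with the slice length rather than with the size of the dataset, which would be consistent with it being a small fraction of the total data.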

Comment 9: Figure 3 is not comprehensive enough for any reader.

Response 9: I describe the main time series in Section 3.2, prior to Figure 2. Figure 3 is simply a visual representation of the slicing process that I describe in the text immediately before Figure 3.

Comment 10: The paper never touches on positive and negative shocks that may have the same impact on the volatility.

Response 10: I have added this topic in the Discussion Section.

Comment 11: The author is not up to date with the references.

Response 11: As I mentioned in Response 1, the potential number of papers one could cite on this topic is huge. This is particularly true for the many derivatives of GARCH, which would warrant their own paper (or book).

Round 2

Reviewer 1 Report

Comments and Suggestions for Authors

Major: The author claims that tree-based methods have been applied to volatility prediction, which they believe has not been done before. However, previous studies have already used tree-based models to predict financial data. The author should compare this study with existing literature to clarify its unique contribution.

Basak, S., Kar, S., Saha, S., Khaidem, L., & Dey, S. R. (2019). Predicting the direction of stock market prices using tree-based classifiers. The North American Journal of Economics and Finance, 47, 552-567.

Sadorsky, P. (2022). Forecasting solar stock prices using tree-based machine learning classification: How important are silver prices?. The North American Journal of Economics and Finance, 61, 101705.

Montgomery, J. M., & Olivella, S. (2018). Tree‐Based Models for Political Science Data. American Journal of Political Science, 62(3), 729-744.

Sadorsky, P. (2021). Predicting gold and silver price direction using tree-based classifiers. Journal of Risk and Financial Management, 14(5), 198.

Minor: The author mentioned that the equations were written using LaTeX. However, equations (2), (4), (5), and (6) appear to be inserted as images rather than properly typeset equations. Additionally, the resolution of these images is low, making them difficult for readers to view clearly.

Author Response

Comment 1: The author claims that tree-based methods have been applied to volatility prediction

Response 1: I have clarified the statements in the introduction to specify that tree-based methods have been used on financial data, just not for volatility prediction. I have also added references to the papers you mention.

Comment 2: The author mentioned that the equations were written using LaTeX.

Response 2: I have directly incorporated the LaTeX equations into the Word document, replacing the images.

Reviewer 2 Report

Comments and Suggestions for Authors

I have reviewed the revised document submitted for the second round of evaluation. However, the changes made based on the first review are not clearly highlighted. It is common practice to submit the revised version with the modifications marked for clarity. Additionally, it is important to ensure that the article is formatted according to the journal's template. Please provide a version where the adjustments are evident and properly incorporated into the required format.

Author Response

Comment 1: Please provide a version where the adjustments are evident and properly incorporated into the required format.

Response 1: I have submitted the new version in the Computation template. Additionally, I have highlighted changes from the original in yellow.
