Peer-Review Record

CNN-CBAM-LSTM: Enhancing Stock Return Prediction Through Long and Short Information Mining in Stock Prediction

Mathematics 2024, 12(23), 3738; https://doi.org/10.3390/math12233738

by Peijie Ye †, Hao Zhang † and Xi Zhou *

Reviewer 1: Anonymous

Reviewer 2: Anonymous

Reviewer 3: Aristeidis Mystakidis

Submission received: 11 October 2024 / Revised: 20 November 2024 / Accepted: 27 November 2024 / Published: 27 November 2024

Round 1

Reviewer 1 Report (Previous Reviewer 3)

Comments and Suggestions for Authors

The authors have incorporated all my criticisms regarding the previous version of the manuscript, and have corrected the issues of non-stationarity, the comparison with other predictive models, the use of predictive error metrics and predictive performance tests. The authors have also performed additional analyses with other indices.

Thus, the authors have adequately responded to my criticisms, and I am in favor of publishing the article in its current form.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 2 Report (Previous Reviewer 1)

Comments and Suggestions for Authors

The authors made adequate changes to the paper.

An overall revision of the language and formulas is still necessary.

For instance,

- the figure number is missing in line 332

- the subscript d W_d in line 177

- it seems that eq 4 should be generalized, since k may take different sizes (3, 5, 7), while eq 4 appears to be specific to size 3

There are other items to be corrected. Please make a thorough revision of the new version.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 3 Report (New Reviewer)

Comments and Suggestions for Authors

Comments:

The authors present a study on a hybrid deep learning model (CNN-CBAM-LSTM) aimed at improving stock return predictions by leveraging both long- and short-term time series forecasting information. The study claims significant predictive advantages and robustness across multiple datasets by incorporating the Convolutional Block Attention Module (CBAM) and other deep learning modules.

In general, the authors present an interesting topic, utilizing and comparing DL and ML architectures and showing how their proposed model can outperform the other models for stock prediction. I believe that the authors invested a lot of effort in this comparative analysis; however, the manuscript is lacking in several respects, such as result evaluation, literature coverage, data presentation, highlighting of novelty, and overall presentation, with many abbreviation issues. The paper requires several revisions to improve clarity, methodological rigor, and presentation before it can be published in a journal of this level.

Basic Justification for the Score

Relevance: High

The paper is highly relevant to both finance and machine learning fields, especially given the increasing interest in applying advanced neural networks for financial forecasting. The integration of CNN, CBAM, and LSTM modules addresses a contemporary need to improve prediction accuracy for financial data forecasting, making the research suitable for MDPI’s Mathematics journal.

Novelty: Moderate

Although several studies have already applied hybrid DL approaches to financial data and stock market prediction, and the proposed paper resembles them in combining DL models such as CNN and LSTM with CBAM (https://github.com/ZiruiFeng/Predicting-Stock-Prices-With-LSTM-and-CBAM-Model), it proposes a different methodology for combining the DL models and a different dataset for stock price prediction.

Significance: Moderate

Although this work successfully demonstrates how CNN-CBAM-LSTM models can outperform the other models, the comparison should provide more detail in some respects. As DNNs are superior models, especially compared to statistical or ML ones, the outcome is somewhat expected. Furthermore, as attention-based gating mechanisms are very popular, the authors should consider comparing the proposed model with other similar DL models that use attention-based gating mechanisms.

Readability: Moderate to Low

Although the manuscript is generally well-structured, with a logical flow of information, there are some areas where grammar, syntax, figures, and phrasing could be improved, such as the abstract, literature review, data presentation, results, and abbreviations.

Technical Quality: Low to Moderate

While the methodology is well-documented, some parts lack detailed explanation, especially the literature and why this comparison is necessary from a novelty point of view. Moreover, the authors should include a more in-depth comparison with other state-of-the-art attention-based methods, or at least with more advanced tree-based models such as XGBoost, LGBM, CatBoost, etc. Furthermore, there is no further validation of the optimized model.

General Comments:

There are several comments that I believe could significantly help to improve this work:

1. Abstract, as a general comment, is informative but could be improved by clearly stating the key contributions and findings of the study. Please check and revise.

2. Abstract, 6th sentence: "… the predicted recall rate of the…" should be rephrased or made more specific. The authors should provide details on the metrics they used, such as RMSE, MAE, or R2.

3. The abstract has many abbreviation issues. The authors should consider explaining abbreviations like CNN and LSTM, as they are introduced without a full definition.

4. Abstract: similarly to the previous comment, the full term "Convolutional Block Attention Module" should be introduced before the abbreviation "CBAM" is used, as it is currently introduced only on page 5. Please check and revise all abbreviations across the manuscript.

5. Introduction, the section would benefit from a more extensive intro of recent studies on hybrid deep learning models used in financial forecasting, particularly concerning attention mechanisms like CBAM.

6. Section 2, Related Work, is somewhat limited: it covers the literature in depth only for the modeling part of this study and not for the importance of the related problems.

7. Section 2, Related Work: the authors should consider adding some information about the overall metrics utilized in load forecasting research and the scores reported in related works. Please check and revise.

8. Section 3, Proposed Methodology, subsection 3.1: although it is briefly stated that the aim is "to construct a sliding time window, utilizing the historical data from the past T days for prediction", it is unclear what final parameters are used to train the models. Is there any datetime information, such as time and date in a timestamp format? Did the authors use it to create other seasonal parameters? Is this a univariate problem or a multivariate one? Is there a detailed description of each parameter? Please clarify.

9. Section 3, Proposed Methodology, subsection 3.1: no detailed statistics are presented about the dataset. Please provide distribution diagrams or boxplots to show whether the utilized parameters are normally distributed or whether there is bias or skewness.

10. Section 3, Proposed Methodology, subsection 3.2, 1st paragraph: similar abbreviation issues for recurrent neural network (RNN), long short-term memory [LSTM], etc. The authors should consider revising all abbreviations.

11. Section 3, Proposed Methodology, subsection 3.3: Fig 3 is not discussed in depth in the text of this section. For example, what is feature map O? Why is it called O? The same applies to feature map Ms. Every part of the figures should be described in the text with the necessary definitions. Please check and revise.

12. Section 4, Experiment, subsection 4.2, figure 4 seems to be reversed. Please check and revise.

13. Section 4, Experiment, subsection 4.2: I believe the authors should alter the structure of sections 3 and 4. Several subsections of the study, like 4.1 and 4.3, should be moved to or merged with parts of the 3rd section for readability purposes. For example, some information about the utilized dataset is also mentioned in section 3.1. Please check and revise.

14. Section 4, Experiment, subsection 4.5, table 2: does "Look back (days)" refer to the use of the sliding window? If yes, since the authors already mentioned the implementation of the sliding window (SW) in section 3.1, they should consistently refer to the historical input of the model as the sliding window. If not, they should consider a deeper explanation of review periods in the text.

15. Section 4, Experiment, subsection 4.5: did the authors use any further validation? How did they avoid overfitting issues?

16. Section 4, as a generic comment: although it is mentioned, it is not clear to me whether this is a one-step-ahead or multistep-ahead forecasting problem. The authors should clarify the forecasting horizon for each model. This will give the reader a clearer overview of the study's scope and outcomes. Please check and revise.

17. Section 4, Experiment: did the authors use any preprocessing techniques? How did they handle missing values and data fluctuations?

18. Section 5, Conclusion, is somewhat limited and should be expanded. No limitations or future work are provided. Please check and revise.
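To make the questions in comments 8, 14, 15 and 16 concrete, the following is a minimal sketch, not the authors' implementation, of how a look-back window, a forecasting horizon, and a chronological validation split relate to each other. It assumes a univariate series; the helper names `make_windows` and `chronological_split` and all parameter names are hypothetical and do not come from the manuscript.

```python
from typing import List, Sequence, Tuple


def make_windows(
    series: Sequence[float], lookback: int, horizon: int = 1
) -> Tuple[List[List[float]], List[float]]:
    """Build (input window, target) pairs from a univariate series.

    Each input holds `lookback` consecutive values. The target is the value
    `horizon` steps after the end of the window, so horizon=1 corresponds to
    one-step-ahead forecasting.
    """
    if lookback < 1 or horizon < 1:
        raise ValueError("lookback and horizon must be positive")
    inputs: List[List[float]] = []
    targets: List[float] = []
    last_start = len(series) - lookback - horizon
    for start in range(last_start + 1):
        end = start + lookback
        inputs.append(list(series[start:end]))
        targets.append(series[end + horizon - 1])
    return inputs, targets


def chronological_split(n_samples: int, n_val: int, n_test: int):
    """Split sample indices in time order (no shuffling), so validation and
    test samples always come after the training samples."""
    train_end = n_samples - n_val - n_test
    if train_end <= 0:
        raise ValueError("not enough samples for the requested split")
    val_end = train_end + n_val
    return range(0, train_end), range(train_end, val_end), range(val_end, n_samples)


# Assumed toy data: six daily values, look-back of three days.
windows, targets = make_windows([1, 2, 3, 4, 5, 6], lookback=3, horizon=1)
train_idx, val_idx, test_idx = chronological_split(len(windows), n_val=1, n_test=1)
```

Stating the equivalents of `lookback`, `horizon`, and the split sizes explicitly in the manuscript would answer most of the questions above.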

Comments on the Quality of English Language

There are many abbreviations that need attention.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Round 2

Reviewer 3 Report (New Reviewer)

Comments and Suggestions for Authors

The authors have addressed many of my concerns. Here are some remaining minor comments that need adjustments.

Comment 6: The authors have successfully addressed this issue; however, 1 or 2 citations are needed for the new subsection.

Comment 15: The authors provided clarifications in their response. I believe the validation part should also be mentioned in some part of the section.

Comment 17: Similarly to Comment 15, this should also be mentioned in the text.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

This manuscript is a resubmission of an earlier submission. The following is a list of the peer review reports and author responses from that submission.

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

The contribution of the paper is not clear. Since many other studies explore ML in stock price prediction using various models, the specific advantages of the proposed model are not evident. There are other issues to be addressed.

- The text should be thoroughly revised to correct misprints and improve the format. For instance, "the previous T day.".

- In Figure 1, it should be "Channel attention" and "Spatial attention" (instead of "Channel attetion", "Spatial attetion").

- Equations should be explained better (e.g. what is n in eq 1, why the number of columns in eq 2 is 4).

- Relevant information or definitions should be accompanied by references (e.g. "However, RNNs and their derivative structures (such as long short-term memory [LSTM]), which are mostly used in this field to memorize time series, still have difficulty solving the problem of gradient disappearance", "Convolution also enhances parallel computing capabilities, speeding up both training and inference").

- I am not sure whether the acronyms were adequately described (e.g. CBAM).

- Table 1 seems to be unnecessary, as it just depicts some of the data used. The description in the previous paragraph is enough for readers to understand the variables.

- It is not clear why the factor analysis was performed. It is expected that close, open, high, low prices (in monetary values) should be highly correlated.

- Other metrics (such as changes in high prices, interval of prices, or intraday volatility) would be better predictors.

- In addition, the literature review should have suggested many other variables to be used in the predicting models.

- It is not clear how a lower prediction error can be helpful. For instance, it is not clear what trading strategy investors or portfolio managers would use to take advantage of the better prediction accuracy. When should investors buy or sell the index? In addition, would the trading strategy lead to abnormal returns?

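As an illustration of the alternative predictors suggested above (changes in high prices, interval of prices, intraday volatility), the sketch below derives three such quantities from daily open/high/low/close bars. The dictionary layout, the function name `price_features`, and the choice of the range divided by the open as a volatility proxy are assumptions made for this example only, not definitions taken from the manuscript.

```python
from typing import Dict, List


def price_features(bars: List[Dict[str, float]]) -> List[Dict[str, float]]:
    """Derive simple predictors from consecutive daily OHLC bars.

    `bars` is assumed to be ordered in time, each bar holding the keys
    "open", "high", "low" and "close" (a hypothetical layout).
    """
    features = []
    for prev, cur in zip(bars, bars[1:]):
        price_range = cur["high"] - cur["low"]
        features.append(
            {
                # day-over-day change in the high price
                "high_change": cur["high"] - prev["high"],
                # interval of prices within the day
                "price_range": price_range,
                # crude intraday volatility proxy: range relative to the open
                "relative_range": price_range / cur["open"],
            }
        )
    return features


# Assumed toy bars, two trading days.
example = price_features(
    [
        {"open": 10.0, "high": 12.0, "low": 9.0, "close": 11.0},
        {"open": 11.0, "high": 13.0, "low": 10.0, "close": 12.0},
    ]
)
```

Unlike the raw price levels, these quantities are not expected to be almost perfectly correlated with one another.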
Comments on the Quality of English Language

Moderate language review.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 2 Report

Comments and Suggestions for Authors

This paper presents an innovative approach to the design of a hybrid, multiple attention mechanism, Deep Neural Network model for feature extraction of data sourced from regional financial stock markets. However, I have a concern regarding the use of a linear regression model in the final prediction step. Given the complexity of stock price movements, which almost always involve complex non-linear relationships, it seems that a more flexible model, such as a multi-layer perceptron (MLP), might be more appropriate. In the spirit of Neural Networks being "black boxes", an MLP could still revert to a linear model if the data warrants it, but it would also allow for the modeling of any non-linear patterns (left in the data) in that final step. This could potentially improve prediction performance and at worst give the same results. The authors should objectively clarify their rationale for choosing a strictly linear model, in particular how the use of a linear model in the final step compares to the use of an MLP (in that same position within the model) in terms of the metrics that they have used in their analysis.

While the methodology presented in the paper shows promise, and given that what has been presented is essentially a model tuned to empirical data, I find the naming of the proposed technique as 'LSIM' (Long Short Information Mining) somewhat unclear. The Literature Review does not seem to provide a clear link into the need for this introduction and choice of terminology. Typically, this approach would be described as a hybrid deep neural network, and the decision to assign a unique name suggests that some deeper theoretical basis—such as derived from a formal Information Theory analysis—underpins this model. If 'Information' is indeed being created or extracted in a novel way, it would be beneficial for the authors to explicitly demonstrate this through mathematical relationships or an appropriate analysis of their results. The authors should clarify why they chose this name, and if there is no such analysis, reconsider whether a more conventional term might better describe the model.

The focus of the paper is on deep learning models, so Section 2.1 may be somewhat redundant, especially as section 2.2 already integrates many of the techniques discussed in the context of neural networks. The authors might consider either removing this section entirely, or integrating some of the more relevant references more succinctly into section 2.2, as they deem most appropriate.

Comments on the Quality of English Language

The English in this paper is of a very high level. Nonetheless, it is abrasive to find so many verbose sentences beginning with "Therefore,...". The same goes for other sentence openings that one typically finds in the work of automated and manual English formatters. Not that I am against them (in fact, I wholeheartedly recommend their use for non-English speakers), just that one should really adjust the text to suit the format of the final publication and readers' expectations.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 3 Report

Comments and Suggestions for Authors

The paper presents a new framework for forecasting financial asset prices using a new deep learning model based on mining long and short-time series information (LSIM). Although the topic of the paper is interesting, and there are possibly gains in forecasting with the use of nonlinear models and machine learning, the empirical analyses performed need to be expanded so that the results can be compared with other works.

In particular, all the analyses performed only compare models in similar classes, and other important models were not included in the analysis. For example, the most basic reference for comparison is a random walk, where the forecast is the last observed value for all horizons ahead. This model, based on the market efficiency/no arbitrage hypothesis, is the essential benchmark that needs to be included in this analysis. In addition, other nonlinear models, such as random forests, models with conditional volatility, etc. would also need to be included in the comparison.
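The random walk benchmark described above can be sketched in a few lines. This is a minimal illustration with hypothetical function names, assuming a plain list of observed values; the RMSE helper is included only so that the benchmark can be scored with the same kind of error metric as the other models.

```python
import math
from typing import List, Sequence


def random_walk_forecast(history: Sequence[float], horizon: int) -> List[float]:
    """Random walk (no-change) forecast: the last observed value is the
    prediction for every step ahead."""
    if not history:
        raise ValueError("history must contain at least one observation")
    if horizon < 1:
        raise ValueError("horizon must be positive")
    return [history[-1]] * horizon


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean squared error between two equally long sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("sequences must be non-empty and equally long")
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Assumed toy values: three observed prices, forecasts for the next two days.
benchmark = random_walk_forecast([100.0, 101.5, 99.8], horizon=2)
```

Any proposed model would then be expected to achieve a lower error than this benchmark on the same out-of-sample period.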

Another important issue is the treatment of non-stationarity in the data. Although deep learning models do not assume this hypothesis, it is common in asset forecasting models to perform forecasts using return series rather than price series. Forecasting returns and comparing them in these series is in fact the most appropriate way to analyze these assets, since the goal is to predict price changes.
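For reference, converting a price series to a return series is a one-line transformation. The sketch below uses log returns, which is one common choice and an assumption of this example (simple percentage returns would serve the same purpose); the function names are hypothetical.

```python
import math
from typing import List, Sequence


def log_returns(prices: Sequence[float]) -> List[float]:
    """Log return r_t = ln(P_t / P_{t-1}) for each consecutive pair of prices."""
    if any(p <= 0 for p in prices):
        raise ValueError("prices must be strictly positive")
    return [math.log(cur / prev) for prev, cur in zip(prices, prices[1:])]


def prices_from_returns(start_price: float, returns: Sequence[float]) -> List[float]:
    """Rebuild the price path from a starting price and log returns, so that
    forecasts made on returns can be mapped back to price levels."""
    prices = [start_price]
    for r in returns:
        prices.append(prices[-1] * math.exp(r))
    return prices


# Assumed toy prices.
returns = log_returns([100.0, 105.0, 102.0])
rebuilt = prices_from_returns(100.0, returns)
```

Forecast errors would then be computed and compared on the return series rather than on the price levels.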

Traditional tests for evaluating predictive performance are also absent. Although there are point differences in the forecast metrics presented, it would be essential to verify whether the predictive difference is in fact statistically significant, using tests such as Diebold-Mariano, for example.
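A simplified sketch of such a test is given below. It assumes one-step-ahead forecasts, so no autocovariance correction is applied to the variance of the loss differential, and it uses the asymptotic normal approximation for the p-value; small-sample corrections and multi-step horizons are deliberately left out. The function name and arguments are hypothetical.

```python
import math
from statistics import NormalDist, mean
from typing import Sequence, Tuple


def diebold_mariano(
    errors_a: Sequence[float], errors_b: Sequence[float], power: int = 2
) -> Tuple[float, float]:
    """Simplified Diebold-Mariano test for one-step-ahead forecasts.

    Takes the forecast errors of two models on the same test period and
    returns (statistic, two-sided p-value). A positive statistic means that
    model A has the larger average loss. With power=2 the loss is the
    squared error; with power=1 it is the absolute error.
    """
    if len(errors_a) != len(errors_b) or len(errors_a) < 2:
        raise ValueError("need two equally long error series with at least 2 points")
    # Loss differential between the two models at each time step.
    diff = [abs(a) ** power - abs(b) ** power for a, b in zip(errors_a, errors_b)]
    n = len(diff)
    d_bar = mean(diff)
    variance = sum((d - d_bar) ** 2 for d in diff) / n
    if variance == 0:
        raise ValueError("loss differential has zero variance")
    statistic = d_bar / math.sqrt(variance / n)
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(statistic)))
    return statistic, p_value


# Assumed toy forecast errors for two competing models.
stat, p = diebold_mariano([1.0, -2.0, 1.5, -1.0, 2.0], [0.5, -1.0, 0.5, -0.5, 1.0])
```

Reporting the statistic and p-value for each pair of models, next to the point metrics, would show whether the observed differences are statistically significant.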

Another limitation of the article is that the entire predictive analysis is performed in only one predictive period, and in this regard, it is not possible to verify the robustness to sample selection problems (data snooping) in this analysis. In this regard, it would be important to compare the models using tools that are robust to the sample period, such as the model confidence set (Hansen, P. R., Lunde, A., & Nason, J. M. (2011). The model confidence set. Econometrica, 79(2), 453-497. https://doi.org/10.3982/ECTA5771) which uses the resampling principle to verify the robustness of the predictive performance to the sample used.

Comments on the Quality of English Language

Minor editing of English language required.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf
