Article
Peer-Review Record

Hybrid Machine Learning Models for Long-Term Stock Market Forecasting: Integrating Technical Indicators

J. Risk Financial Manag. 2025, 18(4), 201; https://doi.org/10.3390/jrfm18040201
by Francis Magloire Peujio Fozap
Reviewer 1:
Reviewer 2: Anonymous
Submission received: 5 February 2025 / Revised: 24 March 2025 / Accepted: 31 March 2025 / Published: 8 April 2025
(This article belongs to the Special Issue Risk Management in Capital Markets)

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

The manuscript presents a well-structured and well-researched study on hybrid deep learning models (LSTM-CNN) for stock market forecasting. The topic is relevant, the methodology is robust, and the manuscript is overall well-written and technically sound. The hybrid LSTM-CNN model is well-motivated, and the comparisons with traditional models (SVM, RF, ARIMA) are justified. However, the manuscript needs improvement in the following ways:

1) To attract the attention of professional or general readers, the discussion section should focus more on the practical implications of the findings, not just numerical performance.

2) In the methodology section, provide reasoning for selecting 150 epochs and a batch size of 64. Did other configurations perform worse? In addition, explain why Random Forest achieved the highest R² (0.5655) and the lowest RMSE (0.0859) despite lacking sequential learning capabilities.

3) The paper should address potential weaknesses, such as overfitting risks and limitations of technical indicators in volatile markets.

4) More emphasis should be placed on integrating macroeconomic indicators, sentiment analysis, or reinforcement learning to improve real-world applications.

5) Mention how data preprocessing (handling missing values, scaling techniques) influenced performance.

6) Consider analyzing statistical significance by adding p-values or confidence intervals to performance metrics. Also, justify hyperparameter choices and preprocessing steps.

Author Response

Comment 1: [To attract the attention of professional or general readers, the discussion section should focus more on the practical implications of the findings, not just numerical performance.]

Response 1: [I have expanded the discussion section (4.3) to highlight the practical implications of the findings. The revised section now includes:

1- The potential use of the hybrid LSTM-CNN model for institutional investors, portfolio managers, and risk analysts.

2- How the model can be integrated into automated trading strategies and financial risk assessment tools.

3- The relevance of these findings for emerging fintech applications and algorithmic trading.]

Comment 2: [In the methodology section, provide reasoning for selecting 150 epochs and a batch size of 64. Did other configurations perform worse? In addition, explain why Random Forest achieved the highest R² (0.5655) and the lowest RMSE (0.0859) despite lacking sequential learning capabilities.]

Response 2: [I now justify the choice of 150 epochs and a batch size of 64. I tested different configurations (e.g., 100 and 200 epochs; batch sizes of 32 and 128), and 150 epochs with a batch size of 64 provided the best balance between computational efficiency and convergence (Section 4.2). The Random Forest’s superior R² and RMSE are further analyzed in Section 4.3. I explain that RF’s ensemble learning approach effectively captures complex relationships in financial data despite its inability to model sequential dependencies, making it well-suited for short-term price movements.]
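The grid-search selection described in this response can be sketched as follows. The validation RMSE scores below are assumed placeholders for illustration only, not the paper's actual results; in practice each score would come from training the LSTM-CNN under that configuration.

```python
from itertools import product

# Hypothetical validation RMSE for each (epochs, batch_size) configuration.
# These numbers are illustrative assumptions, not reported results.
val_rmse = {
    (100, 32): 0.0921, (100, 64): 0.0910, (100, 128): 0.0934,
    (150, 32): 0.0898, (150, 64): 0.0882, (150, 128): 0.0905,
    (200, 32): 0.0895, (200, 64): 0.0884, (200, 128): 0.0901,
}

# Grid search: evaluate every configuration, keep the lowest validation error.
grid = product([100, 150, 200], [32, 64, 128])
best = min(grid, key=lambda cfg: val_rmse[cfg])
print(best)  # (150, 64) under these assumed scores
```

Under these assumed scores, 200 epochs offers no improvement over 150, so the cheaper configuration wins on the efficiency/convergence trade-off the response describes.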

Comment 3: [3) The paper should address potential weaknesses, such as overfitting risks and limitations of technical indicators in volatile markets.]

Response 3: [I have added a new subsection under Limitations and Future Work discussing overfitting risks and how dropout regularization, batch normalization, and early stopping were used to mitigate them. The limitations of technical indicators in highly volatile conditions are discussed in Section 5. I highlight that technical indicators alone may not capture exogenous shocks such as political events or economic crises.]

Comment 4: [More emphasis should be placed on integrating macroeconomic indicators, sentiment analysis, or reinforcement learning to improve real-world applications.]

Response 4: [I now emphasize how macroeconomic variables (e.g., interest rates, inflation, oil prices) could be integrated into future models. A discussion on sentiment analysis (social media & financial news sentiment scores) is included, outlining potential benefits in capturing investor sentiment. Reinforcement Learning (RL) is briefly introduced as a future approach for dynamically adjusting trading strategies based on market conditions.]

Comment 5: [5) Mention how data preprocessing (handling missing values, scaling techniques) influenced performance.]

Response 5: [Handling missing values: Forward-fill imputation was applied to avoid introducing artificial volatility. Scaling techniques: Min-Max scaling was used to normalize inputs, preventing large values from dominating learning. The influence of these preprocessing steps on model performance is now explicitly stated in Section 3.1 (Data Collection & Preprocessing).]
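The two preprocessing steps named in this response can be sketched in a few lines. The price series below is a made-up example; the paper's actual data and feature set are not reproduced here.

```python
import numpy as np
import pandas as pd

# Hypothetical daily closing prices with gaps (e.g., missing quotes).
prices = pd.Series([100.0, np.nan, 102.0, 101.5, np.nan, 103.0])

# Forward-fill imputation: carry the last observed value forward, so no
# artificial volatility is introduced by interpolating between prices.
filled = prices.ffill()

# Min-Max scaling to [0, 1], preventing large raw values from dominating
# gradient-based learning.
scaled = (filled - filled.min()) / (filled.max() - filled.min())

print(scaled.round(4).tolist())  # [0.0, 0.0, 0.6667, 0.5, 0.5, 1.0]
```

Note that for a real train/test split, the min and max would be computed on the training set only and reused on the test set to avoid look-ahead leakage.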

Comment 6: [6) Consider analyzing statistical significance by adding p-values or confidence intervals to performance metrics. Also, justify hyperparameter choices and preprocessing steps.]

Response 6: [I have now incorporated confidence intervals (95% CI) for RMSE, MAE, and R² scores by running multiple trials and reporting the mean and corresponding confidence bounds. Additionally, I have computed p-values to assess the statistical significance of the model’s performance improvements. These updates provide a more rigorous validation of the findings and enhance the reliability of the results. (Table 2 and explanation)

Furthermore, I have expanded Section 3.3 to include a detailed justification of the hyperparameter tuning choices, including the rationale behind selecting specific learning rates, dropout rates, and lookback periods. The selection process was based on systematic grid-search optimization and empirical evaluation of model performance across multiple configurations. I also clarify how preprocessing steps, such as normalization, handling missing values, and feature engineering, affected the model’s predictive accuracy and stability (Section 3.1).]
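The multi-trial confidence-interval procedure described in this response can be sketched as follows. The RMSE values are invented placeholders standing in for repeated training runs; the t critical value is for 95% confidence with 4 degrees of freedom.

```python
import math
import statistics

# Hypothetical RMSE from five independent training runs (assumed numbers,
# not the paper's reported results).
rmse_trials = [0.0861, 0.0874, 0.0850, 0.0868, 0.0857]

n = len(rmse_trials)
mean = statistics.mean(rmse_trials)
sd = statistics.stdev(rmse_trials)   # sample standard deviation
t_crit = 2.776                        # two-sided 95% t critical value, df = n - 1 = 4
half_width = t_crit * sd / math.sqrt(n)

print(f"RMSE = {mean:.4f} ± {half_width:.4f} (95% CI)")
```

The same pattern applies to MAE and R²; a paired test across trials (e.g., comparing two models run on the same splits) would then yield the p-values mentioned above.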

Reviewer 2 Report

Comments and Suggestions for Authors
  1. The authors did not list the technical indicators used for the study; they only stated the key components of the dataset.
  2. Was there hyperparameter tuning of the machine learning techniques and the hybrid deep learning technique? If there was, the authors are expected to present the range of tuning parameters used and the optimal parameters after fine-tuning.
  3. The authors claim to use data from September 2010 to September 2024 in the Data Collection section. However, there is a contradictory statement that the data used was from January 1, 2010 to December 31, 2024 in the Experimental Setup section. The authors should reconcile these two statements for consistency.
  4. It should be noted that the training data used for the study did not capture the 2020 pandemic (COVID-19); only the testing data included this condition. This drawback is evident in Figure 3: in the first 200 days of the plot, the model performed well but deviated sharply as the days increased.
  5. Figure 3 shows 720 days, which is fewer than the number of days in the testing data. The authors should explain this discrepancy.
  6. The authors compared a standalone Random Forest and Support Vector Machine to a hybrid of LSTM and CNN. It would be appropriate for the authors to also compare other hybrid models to their proposed hybrid LSTM-CNN.


Comments on the Quality of English Language

Not Applicable

Author Response

Comment 1: [Authors did not list the technical indicators used for the study. They only stated key components of the dataset.]

Response 1: [We have now explicitly listed the technical indicators used in the study in Section 3.1 (Data Collection)]

Comment 2: [Was there hyperparameter tuning of the machine learning techniques and the hybrid deep learning technique? If there was, authors are expected to present that range of tuning parameters used and the optimal parameters after the fine tuning.]

Response 2: [Yes, hyperparameter tuning was conducted for the deep learning models using grid search and empirical validation. Section 3.3_b]

Comment 3: [Authors claim to use data from September 2010 to September 2024 in the Data Collection section. However, there is a contradictory statement that the data used was from January 1, 2010 to December 31, 2024 in the Experimental Setup section. Authors should reconcile these two statements for consistency.]

Response 3: [We have corrected this inconsistency by standardizing the data period across all sections to January 2010 – December 2024]

Comment 4: [It should be noted that the training data used for the study did not capture the 2020 pandemic (COVID-19). It was only the testing data that included this condition. This drawback is evident in Figure 3. Clearly, in the first 200 days of the plot, the model performed well but deviated sharply as the days increase.]

Response 4: [We acknowledge this limitation and have added a discussion on how the lack of COVID-19 data in the training set may have impacted the model’s generalization. We now suggest future research could use transfer learning or online learning to adapt models dynamically to sudden market shocks. Section 4.2]

Comment 5: [From Figure 3, there are 720 days, which is less than the number of days used for the testing data. Authors should explain the discrepancies.]

Response 5: [We have now clarified that Figure 3 displays only a subset of the testing period (approximately 3 years) for visualization clarity. However, the full test set includes 5 years (2020–2024).]

Comment 6: [Authors compared a standalone Random Forest and Support Vector Machine to a hybrid of LSTM and CNN. It would be appropriate for the authors to also compare other hybrid models to their proposed hybrid LSTM-CNN.]

Response 6: [We acknowledge the reviewer’s suggestion regarding the inclusion of additional hybrid model comparisons. The primary focus of our study was to evaluate the effectiveness of hybrid deep learning models compared to traditional machine learning techniques. Given the complexity and computational costs associated with training multiple deep learning hybrids, we prioritized the LSTM-CNN model due to its established success in capturing both temporal dependencies and spatial feature patterns in financial data. However, we recognize the importance of comparing different hybrid architectures. Future research could extend our findings by incorporating additional hybrid models, such as LSTM-GRU, CNN-GRU, or Transformer-based models, to assess their relative performance in financial forecasting. Section 4.3]

Round 2

Reviewer 2 Report

Comments and Suggestions for Authors

Comment 4 from the previous review has not been implemented.
The authors should divide the data so that the training data captures part of the COVID-19 period. This is a major drawback that should not be swept under the carpet.

Comments on the Quality of English Language

Not Applicable

Author Response

Comment: [Comment 4 from the previous review has not been implemented.
The authors should divide the data so that the training data captures part of the COVID-19 period. This is a major drawback that should not be swept under the carpet.]

Response [

We appreciate this important observation. We understand the significance of incorporating the COVID-19 period in the training phase. However, in our original study design, we intentionally excluded this period from the training set to evaluate the model’s ability to generalize under unseen crisis conditions. This form of out-of-sample stress testing is common in financial forecasting literature and allows us to simulate how the model might perform during unexpected economic shocks.

That said, we acknowledge the merit of also evaluating model performance when such shocks are included in training. We have now clarified this rationale in the manuscript and recommend future work to include COVID-19 data in both training and testing to assess adaptive learning under crisis-driven volatility.]

Round 3

Reviewer 2 Report

Comments and Suggestions for Authors

Authors have responded to all of my comments. 
