Peer-Review Record

Hybrid CNN-BiGRU-AM Model with Anomaly Detection for Nonlinear Stock Price Prediction

Electronics 2025, 14(7), 1275; https://doi.org/10.3390/electronics14071275
by Jiacheng Luo 1,†, Yun Cao 1,†, Kai Xie 1,*, Chang Wen 2, Yunzhe Ruan 1, Jinpeng Ji 3, Jianbiao He 4 and Wei Zhang 5
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Submission received: 2 February 2025 / Revised: 7 March 2025 / Accepted: 18 March 2025 / Published: 24 March 2025

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors
  1. The abstract lacks detailed quantitative results and key performance metrics. Briefly highlight the contributions over existing methods.
  2. The introduction did not identify a clear research gap and did not explain why CNN-BiGRU-AM was chosen over other hybrid models.
  3. The references cited in the manuscript are outdated; include recent relevant studies from 2022-2024.
  4. Include a discussion section to interpret the proposed results and demonstrate the proposed method's superiority over other techniques. Figures lack detailed captions and significant discussion.
  5. There is no detailed information on the model's hyperparameters: what tuning process was used, and what is the rationale for the chosen values (0.01)? How was the anomaly detection threshold selected? How sensitive is the model to different anomaly detection thresholds? No sensitivity analysis was performed on the hyperparameters.
  6. Why were 16 and 32 filters chosen in CNN layers? Justify the kernel size [1, 1]; could other sizes perform better?
  7. How does the attention layer contribute to the final prediction accuracy? Provide ablation studies to prove the necessity of AM.
  8. How does the CNN-BiGRU-AM perform compared to other temporal CNN hybrids? There is no clear explanation for the performance differences among the compared methods.
  9. Provide details on the data normalization and how missing values were handled during training.
  10. Why is an 85:15 split used for training and testing? Why not 70:30, 75:25, or 80:20?
  11. Include cross-validation performance. The paper lacks in-depth interpretation of the results.
  12. The conclusion repeats the results without providing critical insights into the research. The suggestions for future work are too vague and should be clearly outlined.

Major revisions are required.

Comments on the Quality of English Language

The manuscript contains grammatical errors and awkward sentence constructions. The manuscript needs thorough proofreading for language consistency. Improve clarity by simplifying complex sentences and ensuring consistency in technical terms.

Author Response

Comment #1:  The abstract lacks detailed quantitative results and key performance metrics. Briefly highlight the contributions over existing methods.
Response 1: Thank you for pointing this out. We agree with this comment. We have further revised the abstract. Firstly, we simplified and refined it to reduce its length. Secondly, we provided clearer explanations of the performance metrics mentioned in the paper. Finally, we analyzed the contributions of our proposed method in conjunction with the CNN-BiGRU model used in the ablation experiment. Our updated abstract is presented as follows:
"To address challenges in stock price prediction including data nonlinearity and anomalies, we propose a hybrid CNN-BiGRU-AM framework integrated with deep learning-based anomaly detection. First, an anomaly detection module identifies irregularities in stock price data. The CNN component then extracts local features while filtering anomalous information, followed by nonlinear pattern modeling through BiGRU with attention mechanisms. Final predictions undergo secondary anomaly screening to ensure reliability. Experimental evaluation on SSE daily closing prices demonstrates superior performance with R² = 0.9903, RMSE = 22.027, MAE = 19.043, and a Sharpe Ratio of 0.65. It is noteworthy that the MAE of this model is reduced by 14.7%, and the RMSE is decreased by 7.7% compared to its ablation model. The framework achieves multi-level feature extraction through convolutional operations and bidirectional temporal modeling, effectively enhancing model generalization via nonlinear mapping and anomaly correction. Comparative Sharpe Ratio analysis across models provides practical insights for investment decision-making. This dual-functional system not only improves prediction accuracy but also offers interpretable references for market mechanism analysis and regulatory policy formulation."
Comment #2: The introduction did not identify a clear research gap and did not explain why CNN-BiGRU-AM was chosen over other hybrid models.
Response 2: Agree. We have reorganized the introduction section, optimized its structure, and provided timely summaries of the described content. We have also elaborated in detail on the research gaps and the innovative aspects of our proposed method. Additionally, we have partially replaced and re-explained the cited literature by incorporating previously reviewed studies. The main changes we have made are as follows:
"In summary, existing research presents three main limitations: first, traditional prediction methods struggle to adapt to the high-frequency fluctuations and nonlinear dynamics of stock price data; second, mainstream hybrid models have not effectively integrated the synergistic mechanisms of spatiotemporal feature extraction and dynamic anomaly filtering; and third, prediction systems lack the ability to adaptively correct for abnormal interference. These shortcomings contribute to the significant lack of prediction robustness in existing models under complex market conditions."
"The innovative design of the method proposed in this paper addresses three core challenges: first, the CNN component tackles the issue of feature extraction from nonlinear patterns in price series by using convolution kernels to capture multi-scale local features; second, the BiGRU component overcomes the limitation of unidirectional information flow in traditional RNNs by employing a bidirectional gating structure, which simultaneously models the influence of both historical and future information on the current data point; third, the AM combined with the autoencoder (AE) module addresses the problem of feature distortion caused by outliers, using attention weights and outlier filtering to dynamically purify the feature space, thereby enhancing the performance and reliability of the prediction model."
Comment #3: The references cited in the manuscript are outdated; include recent relevant studies from 2022-2024.
Response 3: Agree. We have updated the references in the manuscript by integrating both previously reviewed literature and recently studied works. This ensures a comprehensive understanding of the current research progress and existing gaps, thereby providing a solid foundation for our study.
Comment #4: Include a discussion section to interpret the proposed results and demonstrate the proposed method's superiority over other techniques. Figures lack detailed captions and significant discussion.
Response 4: Agree. We sincerely appreciate the reviewer for pointing out the issues in the discussion section of our manuscript. Our original manuscript did include a discussion section, but it primarily focused on elaborating the concluding data. The discussion of why our proposed method outperforms other techniques was included in the conclusion section and presented in a structured, point-by-point manner. Additionally, we have provided more detailed descriptions of the figures and tables included in the manuscript.
Comment #5: There is no detailed information on the model's hyperparameters: what tuning process was used, and what is the rationale for the chosen values (0.01)? How was the anomaly detection threshold selected? How sensitive is the model to different anomaly detection thresholds? No sensitivity analysis was performed on the hyperparameters.
Response 5: Agree. Thank you for pointing out this series of issues related to the model hyperparameters. Our tuning process was as follows: we first generated value ranges for the different categories of hyperparameters based on experience and references. Then, using nested for loops in MATLAB, we iterated through these ranges to explore the various hyperparameter combinations. For each combination, we computed the performance metrics, trained the model, and saved the results. Finally, we filtered the results to identify the optimal hyperparameter configuration. This process was gradual, with the parameter ranges and step sizes decreasing progressively.
For the selection of the anomaly detection threshold, we calculated the Mean Squared Error (MSE) of the AutoEncoder model as the reconstruction error and determined the threshold based on the statistical distribution of the reconstruction errors on the training set. As for the model's sensitivity to different anomaly detection thresholds, since the hyperparameter adjustment process included scenarios where the threshold remained unchanged while other hyperparameters varied, we observed no difference in the detection of anomalies. Therefore, we concluded that a separate sensitivity analysis was unnecessary, as our experiments implicitly addressed this aspect to some extent. Additionally, we have included a brief explanation of the anomaly detection threshold in the latest version of the manuscript.
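As an illustration of the two procedures just described, a small Python sketch follows. The grid values, the dummy `train_and_score` stand-in, and the mean-plus-three-sigma threshold rule are assumptions; the authors used nested loops in MATLAB and derived the threshold from the training-set error distribution.

```python
import itertools
import numpy as np

# Hypothetical hyperparameter grid, mirroring the nested-loop search described.
grid = {
    "learning_rate": [0.1, 0.05, 0.01, 0.005],
    "gru_units": [32, 64, 128],
    "dropout": [0.1, 0.2, 0.3],
}

def train_and_score(params):
    # Stand-in for "train the model and return validation RMSE";
    # a dummy deterministic score keeps the sketch runnable.
    return params["learning_rate"] + 1.0 / params["gru_units"] + params["dropout"]

best = min(
    (dict(zip(grid, combo)) for combo in itertools.product(*grid.values())),
    key=train_and_score,
)
print("best configuration:", best)

# Anomaly threshold from the autoencoder's reconstruction errors (MSE) on the
# training set; a mean + 3*sigma rule is one common statistical choice.
def anomaly_threshold(train_errors):
    return train_errors.mean() + 3.0 * train_errors.std()

print(anomaly_threshold(np.random.rand(1000)))
```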
Comment #6: Why were 16 and 32 filters chosen in the CNN layers? Justify the kernel size [1, 1]; could other sizes perform better?
Response 6: Thank you for pointing out this technical issue related to CNN in our manuscript.
Rationale for choosing 16 and 32 filters: The first convolutional layer uses 16 filters to help the model capture basic local features, such as short-term fluctuations and intraday changes in stock prices; in time series data, these local fluctuations often carry valuable trend information. The second convolutional layer uses 32 filters to extract more complex features, representing longer-term trends or finer-grained patterns. By increasing the number of filters, the network can learn richer high-level information, enhancing the model's expressive capability.
Rationale for choosing a [1, 1] convolutional kernel: In stock price time series data, a [1, 1] kernel effectively captures local features at each time step, particularly short-term fluctuations and relationships between individual moments. Since stock price data exhibits temporal dependencies, a [1, 1] kernel lets the model focus on each time step without introducing excessive contextual information. Compared to larger kernels (e.g., [3, 3] or [5, 5]), a [1, 1] kernel also significantly reduces the computational load and the number of parameters, which is particularly important when handling large-scale stock price data and helps prevent overfitting.
Of course, we also acknowledge that we cannot perfectly explain the deeper, more fundamental reasons why such parameters perform well in black-box models. Perhaps that will be a direction for our future efforts. Our explanations for the parameter settings are based on the results, as these parameters were not arbitrarily set at once but were determined through experimentation.
To help readers better understand the rationale behind these parameter settings while reading our paper, we have added the following statement when describing these parameters in the manuscript:
"The goal of extracting features layer by layer and gradually increasing the number of filters is to capture sufficient features without significantly increasing computational cost. The first layer uses a smaller number of filters (16) to capture short-term fluctuation features, while the second layer employs more filters (32) to extract more complex, long-term trend patterns. The convolution kernel size of [1, 1] is designed to effectively capture the local features at each time point, particularly short-term fluctuations. This design avoids the introduction of redundant information from larger convolution kernels, reduces computational complexity, and helps mitigate overfitting."

Comment #7: How does the attention layer contribute to the final prediction accuracy? Provide ablation studies to prove the necessity of AM.
Response 7: We sincerely appreciate the reviewer pointing out the issue regarding the contribution of the AM. In our original manuscript, we designed the following two sets of ablation experiments: one comparing CNN-BiGRU-AM with CNN-BiGRU, and the other comparing GRU with GRU-AM. The experimental results show that the models incorporating AM consistently outperform those without AM across the various performance metrics. Therefore, we believe there is sufficient justification to acknowledge the contribution of AM to the model. We sincerely welcome further guidance from the reviewer.
In the latest version of the manuscript, we also have placed greater emphasis on analyzing these data to demonstrate our attention to this feedback.
Comment #8: How does the CNN-BiGRU-AM perform compared to other temporal CNN hybrids? There is no clear explanation for the performance differences among the compared methods.
Response 8: Agree. We have adopted the approach of comparing our proposed CNN-BiGRU-AM model with a classic and well-performing model in the realm of similar CNN-based prediction models, namely CNN-BiLSTM-AM. Through this comparative experiment, we aimed to identify differences. On one hand, this allows us to demonstrate the strengths and weaknesses of our model compared to existing high-performing models. On the other hand, it also helps explore the distinctions with other CNN-based models, as mentioned by the reviewer. Additionally, the ablation experiment in our manuscript, which uses CNN-BiGRU as a comparison, serves a similar purpose.
Comment #9: Provide details on the data normalization and how missing values were handled during training.
Response 9: Agree. Our data normalization process is as follows: we first apply the MATLAB built-in function 'mapminmax' to scale the data to the range [0, 1], and then use the 'reshape' function to flatten the data, adjusting it to the input shape required by the model. Regarding missing values during the training process: missing data points are removed, which typically result from normal occurrences such as market closures when no trading data is available.
In the latest version of the manuscript, we have added the following more detailed description:
"To guarantee the precision and uniformity of the data, the following preprocessing steps were undertaken: Missing data points are removed, typically resulting from normal occurrences such as market closures when no trading data is available. For outliers, the previous data point is used for imputation. "
"Specifically, we first apply the MATLAB built-in function 'mapminmax' to scale the data to the range [0, 1], and then use the 'reshape' function to flatten the data, adjusting it to the input shape required by the model.This improves the model training efficiency and stability. "
Comment #10: Why is an 85:15 split used for training and testing? Why not 70:30, 75:25, or 80:20?
Response 10: Agree. Our experimental process in this regard is roughly as follows: First, by reviewing relevant literature using the same dataset and researching articles on machine learning dataset partitioning, we initially determined a range for the training set ratio. Then, through nested loops, we experimented with different partitioning ratios in each iteration to identify the most suitable ratio for our dataset. Ultimately, we settled on this ratio, which has proven to yield the best experimental results.
To justify the rationality of this partitioning ratio, we have included a brief explanation in the latest version of the manuscript:
"This ratio was found to be the optimal choice through multiple experiments. It also falls within the commonly used range for dataset splits in machine learning, ensuring that the model can effectively learn from the training data and generalize well to unseen data. "
Comment #11: Include cross-validation performance. The paper lacks in-depth interpretation of the results.
Response 11: Agree. We sincerely thank you for raising these insightful critiques. Your observations regarding the rigor of cross-validation and insufficient interpretability of results precisely identify critical areas for improvement, reflecting the need to enhance the completeness of our argumentation and analytical depth.
Rationale for Current Methodology
In this preliminary study focused on validating the fundamental adaptability of the algorithm to financial time-series data, we adopted fixed-period data partitioning for two key reasons:
Temporal Integrity of Financial Data: Financial datasets exhibit strong temporal sensitivity and event dependency. Traditional cross-validation methods risk introducing information leakage from future data, which could distort the true performance evaluation of the model.
Phase-Specific Research Goals: At this exploratory stage, we prioritized evaluating the core mechanisms of the algorithm under diverse market regimes rather than optimizing validation protocols.
Commitment to Methodological Enhancement
We fully acknowledge the limitations in methodological robustness and propose the following phased improvements:
Validation Framework:
Implement rolling window validation with expanding/retraining mechanisms (see the sketch after this list)
Introduce regime-switching detection to ensure robustness across market cycles
Interpretability Augmentation:
Integrate feature contribution analysis (e.g., temporal SHAP values)
Develop volatility attribution explanations to disentangle model decisions
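A minimal sketch of the rolling window validation proposed above, with an expanding training window so no future data leaks into training; the fold count and sizes are assumptions for illustration.

```python
import numpy as np

def walk_forward_splits(n_samples, n_folds=5, min_train=500):
    # Each fold trains on everything before the cutoff and tests on the
    # next contiguous block, preserving temporal order.
    test_size = (n_samples - min_train) // n_folds
    for k in range(n_folds):
        train_end = min_train + k * test_size
        yield np.arange(train_end), np.arange(train_end, train_end + test_size)

for train_idx, test_idx in walk_forward_splits(2000):
    print(f"train [0, {train_idx[-1]}] -> test [{test_idx[0]}, {test_idx[-1]}]")
```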
Your critique has fundamentally reshaped our research trajectory. We are currently trying to design a unified evaluation protocol that reconciles statistical rigor with financial realism.
This invaluable feedback has elevated both the scholarly and practical dimensions of our work. We will explicitly address these enhancements in our upcoming journal extension and welcome further guidance as we advance toward deployable solutions.
Comment #12: The conclusion repeats the results without providing critical insights into the research. The suggestions for future work are too vague and should be clearly outlined.
Response 12: Agree. We have made significant revisions to the conclusion. On one hand, we have provided a concise summary of the paper, highlighting our insights and the innovative aspects of the model. On the other hand, we have summarized the limitations of our current work and outlined future research directions. These revisions are presented in a structured, point-by-point manner. The two main revisions are as follows:
"This study systematically addresses the two core challenges of nonlinear modeling and abnormal data interference in stock price prediction by constructing a CNN-BiGRU-AM hybrid model, achieving the following key advancements:
    Multimodal Collaborative Optimization Mechanism: This mechanism enables the seamless integration of CNN-based local mutation feature extraction, BiGRU-based bidirectional time series modeling, and AM-AE-based dynamic feature purification, overcoming the limitations of traditional hybrid models that simply stack components. Experiments demonstrate that this architecture achieves an R² of 0.9903 for Shanghai Composite Index prediction, which is 0.08% higher than the best-performing baseline model, CNN-BiLSTM-AM (R² = 0.9895), and 1% higher than the ablation comparison model, CNN-BiGRU (R² = 0.9801), thereby validating the significant improvement in prediction accuracy through feature collaborative optimization.
    Anomaly Immunity Prediction Paradigm: A three-stage anomaly handling framework, comprising 'pre-detection, mid-filtering, and post-correction,' is proposed to overcome the limitations of isolated anomaly analysis in traditional approaches. By deeply integrating anomaly detection into the feature learning process, this paradigm enables adaptive feature optimization, mitigating the impact of anomalies while preserving the model's ability to capture normal price mechanisms. This approach fosters an endogenous synergy between anomaly immunity and prediction optimization.
    Risk-Return Balance Ability: Risk exposure is dynamically adjusted through the attention mechanism, resulting in a model Sharpe ratio of 0.65, which is 22.6% higher than that of GRU-AM (0.53) and 16.1% higher than BiGRU (0.56). "
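For reference, the Sharpe ratios compared above measure excess return per unit of return volatility; a minimal sketch of the computation follows, where the daily frequency, zero risk-free rate, and 252-day annualization are assumptions rather than the paper's stated convention.

```python
import numpy as np

def sharpe_ratio(returns, risk_free=0.0, periods=252):
    # Annualized mean excess return divided by the volatility of returns.
    excess = np.asarray(returns) - risk_free / periods
    return np.sqrt(periods) * excess.mean() / excess.std()

print(sharpe_ratio(np.random.normal(0.0005, 0.01, size=500)))
```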
"Future research can be deepened from three aspects:
    Fusion of Multi-Source Heterogeneous Data: Incorporating unstructured information, such as news sentiment analysis (BERT), institutional holdings data (13F Filings), and public opinion factors, to construct a cross-modal prediction system is anticipated to enhance prediction accuracy.
    Real-Time Decision Support: Explore lightweight deployment solutions to address the low-latency prediction requirements in high-frequency trading scenarios.
    Enhanced Interpretability: SHAP values are employed to analyze the model’s decision logic, with a focus on uncovering the impact of nonlinear feature interactions on prediction outcomes. "
4. Response to Comments on the Quality of English Language
We have revised the lengthy parts of the text to make it more comprehensible. We sincerely welcome further guidance from the reviewer.

Author Response File: Author Response.pdf

Reviewer 2 Report

Comments and Suggestions for Authors

This paper presents a novel hybrid model combining Convolutional Neural Networks (CNN), Bidirectional Gated Recurrent Units (BiGRU), and an Attention Mechanism (AM) for stock price prediction, with a focus on addressing nonlinearity and anomaly detection in stock market data.

The key contributions of this work appear to be:

  • The authors introduce a deep learning-based anomaly detection mechanism to identify and filter out anomalous data.
  • The combination of CNN, BiGRU, and AM allows the model to capture both local features and long-term dependencies in stock price data, while the attention mechanism helps in weighting important features, leading to more accurate predictions.

Below is a list of suggestions to improve the presentation:

  1. The Abstract is too verbose. A shorter and structured abstract would be useful.
  2. Many phrases are too long and difficult to understand (e.g., those in lines 31-36, 59-63, and others).
  3. Some other phrases are just difficult to understand (e.g. „As anomalies are frequently rare and challenging to define through conventional statistical methodologies.” „The modelinitially extracts local features by inputting a convolutional neural network following anomaly detection”.)
  4. The references do not follow the style of the journal.
  5. The authors could have included a more comprehensive literature review that discusses recent advancements in stock price prediction, particularly those involving hybrid deep learning models (e.g., CNN-LSTM, Transformer-based models, or other attention-based architectures). This would provide context for their work and highlight the gaps they aim to address.
  6. The entire subsection 2.2 is really difficult to understand. Starting with the definition of autoencoder as: „Auto-encoder is a neural network model for unsupervised learning, mainly used to learn efficient coding of data.”
  7. The authors could have compared their results with those reported in other studies that use similar datasets (e.g., the Shanghai Composite Index or other stock indices). For example, they could have cited studies that use CNN-LSTM, Transformer models, or other hybrid architectures for stock price prediction and compared their performance metrics.
  8. A discussion about the limitations of the study is always useful.

 

Comments on the Quality of English Language

Proofreading is highly recommended.

Author Response

Comment #1: The Abstract is too verbose. A shorter and structured abstract would be useful.
Response 1: Thank you for pointing this out. We agree with this comment and have addressed it in the revised manuscript. In response to the reviewers' suggestions, we further revised the abstract. First, we simplified and refined it to shorten its length. Second, we retained the performance metrics so that the results can be grasped more intuitively. Finally, we analyzed the contribution of our proposed method in conjunction with the CNN-BiGRU model used in the ablation experiment. Our updated abstract is presented below:
"To address challenges in stock price prediction including data nonlinearity and anomalies, we propose a hybrid CNN-BiGRU-AM framework integrated with deep learning-based anomaly detection. First, an anomaly detection module identifies irregularities in stock price data. The CNN component then extracts local features while filtering anomalous information, followed by nonlinear pattern modeling through BiGRU with attention mechanisms. Final predictions undergo secondary anomaly screening to ensure reliability. Experimental evaluation on SSE daily closing prices demonstrates superior performance with R² = 0.9903, RMSE = 22.027, MAE = 19.043, and a Sharpe Ratio of 0.65. It is noteworthy that the MAE of this model is reduced by 14.7%, and the RMSE is decreased by 7.7% compared to its ablation model. The framework achieves multi-level feature extraction through convolutional operations and bidirectional temporal modeling, effectively enhancing model generalization via nonlinear mapping and anomaly correction. Comparative Sharpe Ratio analysis across models provides practical insights for investment decision-making. This dual-functional system not only improves prediction accuracy but also offers interpretable references for market mechanism analysis and regulatory policy formulation."
Comment #2: Many phrases are too long and difficult to understand (e.g., those in lines 31-36, 59-63, and others).
Response 2: Agree. Two phrases are mentioned in lines 31-36 of the text: moving average and regression analysis. These two phrases are explained as follows:
Moving average (MA) is a statistical calculation commonly used in time series analysis to smooth out short-term fluctuations and highlight longer-term trends or cycles in data. It’s frequently used in financial markets to analyze stock prices, trading volume, and other data. The basic idea is to calculate an average of data points over a specified period and update it as new data becomes available.
Regression analysis is a statistical technique used to examine the relationship between one or more independent variables (also called predictors or features) and a dependent variable (also called the response or target). The goal of regression analysis is to model this relationship so that you can predict the dependent variable based on new values of the independent variables, or understand how changes in the independent variables affect the dependent variable.
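As a concrete illustration of the moving average described above (the prices and the 5-day window are made-up values):

```python
import numpy as np

prices = np.array([10.0, 10.2, 10.1, 10.5, 10.4, 10.8, 11.0])
k = 5  # averaging window
ma = np.convolve(prices, np.ones(k) / k, mode="valid")
print(ma)  # one smoothed value per complete 5-day window
```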
Lines 59-63 refer to a gated recurrent unit model, i.e., a GRU model; this model is described in Section 2.4 below.
Alternatively, the text may have been difficult to read because the reference citations were not formatted correctly. We sincerely apologize; the problem has been corrected, as mentioned below. We sincerely welcome further guidance from the reviewer.
Comment #3: Some other phrases are just difficult to understand (e.g. „As anomalies are frequently rare and challenging to define through conventional statistical methodologies.” „The modelinitially extracts local features by inputting a convolutional neural network following anomaly detection”.)
Response 3: Agree. We have reorganized the language of the abstract section and optimized the structure of the section. At the same time, we summarized the points in the conclusion section.
Comment #4: The references do not follow the style of the journal.
Response 4: Agree. We have reformatted the references according to the style of your journal, and updated the references in the manuscript by integrating previous literature and the latest research findings.
Comment #5: The authors could have included a more comprehensive literature review that discusses recent advancements in stock price prediction, particularly those involving hybrid deep learning models (e.g., CNN-LSTM, Transformer-based models, or other attention-based architectures). This would provide context for their work and highlight the gaps they aim to address.
Response 5: Our introduction does recount the history of stock price prediction under deep learning models. After revising the literature to the journal style, we rewrote the hybrid-model section and enumerated, point by point, the current problems in stock price prediction. The newly revised manuscript describes the difficulties of present-day stock price prediction and identifies three limitations of current predictive models.
Comment #6: The entire subsection 2.2 is really difficult to understand. Starting with the definition of autoencoder as: „Auto-encoder is a neural network model for unsupervised learning, mainly used to learn efficient coding of data.”
Response 6: Agree. We have polished the language of the first paragraph. The modifications are as follows:
"An auto-encoder is a neural network model employed for unsupervised learning, primarily aimed at learning efficient data representations. The core idea is to train the network to compress the input data into a lower-dimensional representation, also called the latent space, and then to reconstruct the data from this compressed representation. The auto-encoder consists of two main components: the encoder, which maps the input data to the latent space, and the decoder, which reconstructs the data from the latent space."
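A minimal PyTorch sketch of the auto-encoder described in the revised paragraph; the 30-dimensional input and 8-dimensional latent space are assumptions for illustration.

```python
import torch
import torch.nn as nn

class AutoEncoder(nn.Module):
    def __init__(self, n_in=30, n_latent=8):
        super().__init__()
        # Encoder: map the input into the lower-dimensional latent space.
        self.encoder = nn.Sequential(nn.Linear(n_in, n_latent), nn.ReLU())
        # Decoder: reconstruct the input from the latent representation.
        self.decoder = nn.Linear(n_latent, n_in)

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = AutoEncoder()
x = torch.randn(16, 30)
recon_error = ((model(x) - x) ** 2).mean(dim=1)  # per-sample reconstruction MSE
print(recon_error.shape)  # torch.Size([16])
```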
Comment #7: The authors could have compared their results with those reported in other studies that use similar datasets (e.g., the Shanghai Composite Index or other stock indices). For example, they could have cited studies that use CNN-LSTM, Transformer models, or other hybrid architectures for stock price prediction and compared their performance metrics.
Response 7: Our manuscript includes a section on ablation experiments for hybrid models, in which we compare CNN-BiGRU-AM with CNN-BiGRU. In addition, in Section 3.6 we compare our model with CNN-BiGRU and with CNN-BiLSTM-AM. We believe these experiments demonstrate the advantages and disadvantages of our model relative to existing high-performance models.
Comment #8: A discussion about the limitations of the study is always useful.
Response 8: Agree. We have made significant revisions to the conclusion. On one hand, we have provided a concise summary of the paper, highlighting our insights and the innovative aspects of the model. On the other hand, we have summarized the limitations of our current work and outlined future research directions. These revisions are presented in a structured, point-by-point manner. The two main revisions are as follows:
"This study systematically addresses the two core challenges of nonlinear modeling and abnormal data interference in stock price prediction by constructing a CNN-BiGRU-AM hybrid model, achieving the following key advancements:
    Multimodal Collaborative Optimization Mechanism: This mechanism enables the seamless integration of CNN-based local mutation feature extraction, BiGRU-based bidirectional time series modeling, and AM-AE-based dynamic feature purification, overcoming the limitations of traditional hybrid models that simply stack components. Experiments demonstrate that this architecture achieves an R² of 0.9903 for Shanghai Composite Index prediction, which is 0.08% higher than the best-performing baseline model, CNN-BiLSTM-AM (R² = 0.9895), and 1% higher than the ablation comparison model, CNN-BiGRU (R² = 0.9801), thereby validating the significant improvement in prediction accuracy through feature collaborative optimization.
    Anomaly Immunity Prediction Paradigm: A three-stage anomaly handling framework, comprising 'pre-detection, mid-filtering, and post-correction,' is proposed to overcome the limitations of isolated anomaly analysis in traditional approaches. By deeply integrating anomaly detection into the feature learning process, this paradigm enables adaptive feature optimization, mitigating the impact of anomalies while preserving the model's ability to capture normal price mechanisms. This approach fosters an endogenous synergy between anomaly immunity and prediction optimization.
    Risk-Return Balance Ability: Risk exposure is dynamically adjusted through the attention mechanism, resulting in a model Sharpe ratio of 0.65, which is 22.6% higher than that of GRU-AM (0.53) and 16.1% higher than BiGRU (0.56). "
"Future research can be deepened from three aspects:
    Fusion of Multi-Source Heterogeneous Data: Incorporating unstructured information, such as news sentiment analysis (BERT), institutional holdings data (13F Filings), and public opinion factors, to construct a cross-modal prediction system is anticipated to enhance prediction accuracy.
    Real-Time Decision Support: Explore lightweight deployment solutions to address the low-latency prediction requirements in high-frequency trading scenarios.
    Enhanced Interpretability: SHAP values are employed to analyze the model’s decision logic, with a focus on uncovering the impact of nonlinear feature interactions on prediction outcomes. "
4. Response to Comments on the Quality of English Language
We have revised the lengthy parts of the text to make it more comprehensible. We sincerely welcome further guidance from the reviewer.

Author Response File: Author Response.pdf

Reviewer 3 Report

Comments and Suggestions for Authors

Good parts: The paper provides a solid background and basis for the novelty of this research. The anomaly detection in financial time-series forecasting is really interesting and is practically shown with improvements over traditional DL methods. The experiment design is well structured in my opinion.

 

I have some concerns that need some revisions/rebuttals as applicable:

1. The R² value is 0.9903, which is unrealistically high for stock market forecasting, because stock prices are very unstable and volatile in my opinion. Technically, they are noisy, volatile, and influenced by non-quantifiable factors (breaking news, macroeconomic events, etc.) that the authors may not have exactly considered or explained as a drawback or assumption of this study. Stock prices do not follow a pattern, financial data is non-stationary, and a high R² value may indicate overfitting to historical trends. Furthermore, stock prices have a randomness that makes an R² value close to 1 unlikely. This may also be due to inadequate generalization, as overfitted models perform well on training data but fail with new data. Try k-fold cross validation and walk-forward validation. Try different datasets (FTSE 100, NASDAQ, etc.). The authors can reduce model complexity by introducing dropout layers, regularization techniques (L1/L2 penalties), and early stopping.

 

2. The model was evaluated only on its technical accuracy; no real-world economic validation was provided to exhibit its practicality. Predicting stock prices accurately is one part of a financial model, but such models should also generate profitable trading strategies. This paper needs backtesting to check whether the predictions lead to profitable trading decisions, as accuracy does not mean profitability!

 

3. There is no risk exposure evaluation or assessment of drawdown; this may put traders at high risk, leading to massive losses. There is also no consideration of friction in the market. Use common financial metrics like maximum drawdown (MDD), the Sortino ratio, etc. There is a need for a comparative benchmark study, which can be cited to compare the model's performance against buy-and-hold, moving-average crossovers, etc.

 

4. The justification for choosing BiGRU over LSTM may need more explanation, as stock prices are not symmetrical, and their future movements do not affect past pricing. Stock prices move sequentially over time, and this makes me skeptical about the bidirectional processing. There is also the computational cost associated with BiGRU, which will eat up funds in the form of higher computational power and more memory compared with LSTM or a standard GRU. Maybe the authors can provide a comparison for BiGRU and cite papers for LSTM and GRU to compare.

 

5. How does the model arrive at its predictions? This is not very well explained, or somewhat absent, as DL models are like black boxes, making it difficult to understand why they make certain predictions. SHAP or attention heatmaps would be really helpful in this case, as investors need interpretability, and financial institutions need the transparency of XAI models for risk assessment and decision making. It is also necessary to determine which factors influence or contribute the most to the prediction (factors like volume, volatility, momentum indicators). SHAP can be useful in this case.

 

6. The financial market demands speed and real-time decision making. The model's complexity would suggest a lot of things, but there is a need to discuss training time, memory consumption, and inference speed, which have not been mentioned in this study.

 

Author Response

Comment #1: The R² value is 0.9903, which is unrealistically high for stock market forecasting, because stock prices are very unstable and volatile in my opinion. Technically, they are noisy, volatile, and influenced by non-quantifiable factors (breaking news, macroeconomic events, etc.) that the authors may not have exactly considered or explained as a drawback or assumption of this study. Stock prices do not follow a pattern, financial data is non-stationary, and a high R² value may indicate overfitting to historical trends. Furthermore, stock prices have a randomness that makes an R² value close to 1 unlikely. This may also be due to inadequate generalization, as overfitted models perform well on training data but fail with new data. Try k-fold cross validation and walk-forward validation. Try different datasets (FTSE 100, NASDAQ, etc.). The authors can reduce model complexity by introducing dropout layers, regularization techniques (L1/L2 penalties), and early stopping.
Response 1: Thank you for pointing this out. We understand this concern; however, we believe that an R² value of 0.9903 is acceptable. On one hand, similar performance metrics can be observed in related literature; for example, the paper "A CNN-BiLSTM-AM method for stock price prediction" by Lu et al. (2021) reports comparable experimental results. On the other hand, we believe the possibility of overfitting can be ruled out: as the reviewer suggests, the code used for our initial results already included dropout layers and regularization techniques to prevent overfitting, which contributed to the reported outcome.
As suggested in the reviewer's comments, we also adjusted the dropout probability, the number of dropout layers, and the parameters related to the regularization techniques. However, in terms of R², the results were slightly worse than before, with values still relatively high. We therefore decided to retain the previous settings. If the reviewer requires further clarification, we can provide portions of the code to aid understanding.
Comment #2: The model was evaluated only on its technical accuracy; no real-world economic validation was provided to exhibit its practicality. Predicting stock prices accurately is one part of a financial model, but such models should also generate profitable trading strategies. This paper needs backtesting to check whether the predictions lead to profitable trading decisions, as accuracy does not mean profitability!
Response 2: We agree on the necessity of backtesting for strategy evaluation. In response to your valuable suggestion, we provide the following explanation:
Current Research Focus: The core innovation of this paper lies in the algorithmic breakthrough of a high-precision prediction framework. Due to limitations in manuscript length and the current research stage, we have primarily focused on technical-level verifiability (including basic metrics such as the Sharpe ratio). As you rightly pointed out, this does not yet fully meet the comprehensive requirements of financial engineering research.
Follow-up Research: In our ongoing second-phase research, we have designed a long-short strategy engine and a high-frequency backtesting system. We plan to conduct simulation validation using datasets, including but not limited to those mentioned in your comments. We sincerely invite you to continue following the in-depth development of this research.
Author action: In the latest version of the manuscript, we have added a discussion and outlook on the practical economic significance of this research in the discussion section as follows:
“These findings not only offer a novel methodological framework for forecasting complex financial time series, but also uncover several practical principles:
Market Dynamics Analysis: The model feature weights reveal the underlying driving mechanisms of price fluctuations and offer a quantitative reference for understanding the multi-scale coupling effects within the market, such as the nonlinear interactions between macro policies and micro trading behaviors.
Investment Decision Optimization: The risk-return dynamic balance mechanism offers an adaptive regulatory framework for portfolio management, facilitating the transition from traditional experience-based strategies to a data-driven, intelligence-enhanced paradigm.”
Comment #3: There is no risk exposure evaluation or assessment of drawdown; this may put traders at high risk, leading to massive losses. There is also no consideration of friction in the market. Use common financial metrics like maximum drawdown (MDD), the Sortino ratio, etc. There is a need for a comparative benchmark study, which can be cited to compare the model's performance against buy-and-hold, moving-average crossovers, etc.
Response 3: Agree. The issues you raised regarding risk exposure assessment and market friction are indeed crucial, and we fully acknowledge their significance in real-world trading scenarios. However, the core objective of this study is to propose a universal methodological framework for prediction, rather than conducting an empirical study on specific trading strategies. Below, we provide further clarification on this research positioning:
Methodological Innovation: The CNN-BiGRU-AM framework addresses the limitations of traditional prediction models in capturing nonlinear patterns and detecting anomalies.
Scalability: It provides a modular interface for follow-up research, facilitating the integration of practical application modules such as risk management and transaction cost optimization.
Comment #4: The justification for choosing BiGRU over LSTM may need more explanation, as stock prices are not symmetrical, and their future movements do not affect past pricing. Stock prices move sequentially over time, and this makes me skeptical about the bidirectional processing. There is also the computational cost associated with BiGRU, which will eat up funds in the form of higher computational power and more memory compared with LSTM or a standard GRU. Maybe the authors can provide a comparison for BiGRU and cite papers for LSTM and GRU to compare.
Response 4: Agree. We sincerely appreciate the reviewer's insightful critique regarding our choice of BiGRU. The concerns raised about bidirectional processing in stock price prediction are well-founded, as stock price sequences inherently evolve unidirectionally over time. In our implementation, we adopted BiGRU not to leverage future data (which is unavailable in real-time prediction), but to allow the model to holistically analyze historical patterns by bidirectionally processing the already observed time series. This approach aims to capture richer temporal dependencies within the training window, potentially identifying latent patterns that unidirectional models might overlook.
While LSTM's gate mechanisms excel at long-term dependency modeling, we prioritized BiGRU's structural simplicity for two reasons:
Parameter Efficiency: BiGRU's streamlined architecture (compared to LSTM) reduces computational overhead while maintaining competitive performance.
Empirical Validation: Our ablation studies showed BiGRU achieved comparable accuracy to LSTM variants with 23% fewer parameters, striking a practical balance between performance and resource demands.
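The parameter-efficiency point can be checked directly: a GRU has three gates to the LSTM's four, so a like-for-like layer carries roughly 25% fewer parameters (the 23% figure above reflects the full architecture; the sizes below are arbitrary assumptions).

```python
import torch.nn as nn

def n_params(m):
    return sum(p.numel() for p in m.parameters())

d, h = 32, 64  # assumed input and hidden sizes
lstm = nn.LSTM(d, h)  # 4 * h * (d + h + 2) = 25,088 parameters
gru = nn.GRU(d, h)    # 3 * h * (d + h + 2) = 18,816 parameters (~25% fewer)
print(n_params(lstm), n_params(gru))
```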
Comment #5: How does the model arrive at its predictions? This is not very well explained, or somewhat absent, as DL models are like black boxes, making it difficult to understand why they make certain predictions. SHAP or attention heatmaps would be really helpful in this case, as investors need interpretability, and financial institutions need the transparency of XAI models for risk assessment and decision making. It is also necessary to determine which factors influence or contribute the most to the prediction (factors like volume, volatility, momentum indicators). SHAP can be useful in this case.
Response 5: While the proposed CNN-BiGRU-AM framework demonstrates exceptional predictive accuracy, it must be acknowledged that the inherent black-box nature of deep neural networks imposes limitations on microscopic interpretation of forecasting results. This not only hinders the establishment of trust among investment decision-makers but also partially conflicts with the transparency requirements of regulatory technology (RegTech) in financial compliance. Due to research timeline constraints, the current work has not yet systematically integrated Explainable AI (XAI) techniques – such as SHAP value analysis and attention weight visualization – which will become a key focus of our subsequent investigations. Specifically, we plan to enhance model transparency through three pathways:
Developing a hierarchical feature attribution system to quantify the contribution of multi-dimensional factors (e.g., trading volume, volatility, momentum indicators) to predictions;
Constructing dynamic attention heatmaps to reveal decision-making logic across different temporal scales;
Establishing a predictive confidence assessment framework to explicitly quantify model uncertainty.
In the latest version of the manuscript, we have included a brief explanation of future research directions in the conclusions section:
“Future research can be deepened from three aspects:
Fusion of Multi-Source Heterogeneous Data: Incorporating unstructured information, such as news sentiment analysis (BERT), institutional holdings data (13F Filings), and public opinion factors, to construct a cross-modal prediction system is anticipated to enhance prediction accuracy.
Real-Time Decision Support: Explore lightweight deployment solutions to address the low-latency prediction requirements in high-frequency trading scenarios.
Enhanced Interpretability: SHAP values are employed to analyze the model’s decision logic, with a focus on uncovering the impact of nonlinear feature interactions on prediction outcomes.”
Comment #6: The financial market demands speed and real-time decision making. The model's complexity would suggest a lot of things, but there is a need to discuss training time, memory consumption, and inference speed, which have not been mentioned in this study.
Response 6: Agree. We sincerely thank you for raising these critical points. The issues you highlighted (training efficiency, resource consumption, and inference speed) are indeed paramount for real-world deployment in financial applications, and we fully acknowledge their importance. In this study, our focus has been on validating the core algorithm's effectiveness in time-series forecasting tasks, which represents an exploratory proof-of-concept phase. Consequently, the initial work prioritized architectural innovation and breakthroughs in prediction accuracy. We wholeheartedly accept the limitations you have identified. In subsequent research, we will systematically incorporate engineering-oriented metrics and optimize the model holistically through:
Lightweight design (e.g., neural architecture search for efficient modules)
Knowledge distillation and model compression (e.g., pruning, quantization)
Hardware-aware acceleration (e.g., FPGA/GPU co-design)
The goal is to strike an optimal balance between predictive performance and real-time operational requirements in financial scenarios. Your insights have provided essential guidance for these improvements, and we deeply appreciate your expertise in advancing the practical relevance of this work.

Author Response File: Author Response.pdf

Round 2

Reviewer 2 Report

Comments and Suggestions for Authors

The authors addressed all my comments and made extensive edits to the initial draft. The result is not perfect, but I find the current version acceptable.

Author Response

Comment 1:  The authors addressed all my comments and made extensive edits to the initial draft. The result is not perfect, but I find the current version acceptable.
Response 1: Thank you very much for your detailed review and valuable feedback on our manuscript! We are truly honored to receive your acknowledgment that the "current version is acceptable." Your meticulous review and constructive comments have played a crucial role in improving our research, and we have benefited immensely from them.
We look forward to your further guidance. Wishing you continued success in your work and a pleasant life!

Reviewer 3 Report

Comments and Suggestions for Authors

I find the rebuttal to be adequate; along with the reasonable timeframe to make changes, the responses are satisfactory.

Author Response

Comment 1: I find the rebuttal to be adequate; along with the reasonable timeframe to make changes, the responses are satisfactory.
Response 1: We sincerely appreciate your second-round review comments and recognition of our manuscript. It is a great honor to receive your affirmation. Your meticulous review and constructive feedback have played a significant role in improving our research, and we have gained invaluable insights from them.
