Article

Enhancing Stock Price Forecasting with CNN-BiGRU-Attention: A Case Study on INDY

Department of Statistics, Universitas Padjadjaran, Jl. Bandung Sumedang km 21 Jatinangor, Sumedang 45363, Indonesia
* Author to whom correspondence should be addressed.
These authors contributed equally to this work.
Mathematics 2025, 13(13), 2148; https://doi.org/10.3390/math13132148
Submission received: 12 May 2025 / Revised: 23 June 2025 / Accepted: 24 June 2025 / Published: 30 June 2025

Abstract

The stock price of PT Indika Energy Tbk (INDY) reflects the dynamics of Indonesia’s energy sector, which is heavily influenced by global coal price fluctuations, national energy policies, and geopolitical conditions. This study aimed to develop an accurate forecasting model to predict the movement of INDY stock prices using a hybrid machine learning approach called CNN-BiGRU-AM. The objective was to generate future forecasts of INDY stock prices based on historical data from 28 August 2019 to 24 February 2025. The method applied a hybrid model combining a Convolutional Neural Network (CNN), Bidirectional Gated Recurrent Unit (BiGRU), and an Attention Mechanism (AM) to address the nonlinear, volatile, and noisy characteristics of stock data. The results showed that the CNN-BiGRU-AM model achieved high accuracy with a Mean Absolute Percentage Error (MAPE) below 3%, indicating its effectiveness in capturing long-term patterns. The CNN helped extract local features and reduce noise, the BiGRU captured bidirectional temporal dependencies, and the Attention Mechanism allocated weights to the most relevant historical information. The model remained robust even when stock prices were sensitive to external factors such as global commodity trends and geopolitical events. This study contributes to providing more accurate forecasting solutions for companies, investors, and stakeholders in making strategic decisions. It also enriches the academic literature on the application of deep learning techniques in financial data analysis and stock market forecasting within a complex and dynamic environment.

1. Introduction

The stock price of PT Indika Energy Tbk (INDY) reflects the dynamic nature of Indonesia’s energy sector. Its movements are highly influenced by global market conditions, national energy policies, and various geopolitical factors [1]. As such, INDY stock serves as a key indicator in the analysis of the energy market [1]. The volatility exhibited by INDY’s stock prices highlights the unpredictable and fast-paced nature of the sector [1]. This volatility underlines the importance of accurate stock price forecasting, which plays a crucial role for companies, investors, and policymakers [2,3,4].
The growing complexity and nonlinearity of financial time series data have prompted researchers to explore more advanced modeling techniques. One promising approach is the development of hybrid models, such as the CNN-BiGRU-AM architecture [5,6,7,8]. This model integrates the strengths of three powerful components: Convolutional Neural Networks (CNNs), Bidirectional Gated Recurrent Units (BiGRUs), and the Attention Mechanism [5,6,7,8]. The CNN is particularly effective in extracting local features from historical stock data and in reducing irrelevant noise [5,9,10,11]. The BiGRU, on the other hand, is capable of capturing long-term dependencies in both the forward and backward directions [12,13,14,15]. Meanwhile, the Attention Mechanism allows the model to focus more precisely on the most relevant segments of historical data during the prediction process [7,16,17,18].
To further enhance the modeling of financial time series data, recent advances also explore neurodynamic-based optimization frameworks. For instance, the work by Leung et al. [19] presents a collaborative neurodynamic approach to tackle minimax and biobjective portfolio selection problems. Their framework optimizes both return and risk under nonlinear and constrained environments—conditions often encountered in stock market forecasting. While our work adopts a hybrid deep learning framework (CNN-BiGRU-AM) rather than a neurodynamic scheme, both approaches share a common objective of improving predictive robustness in high-volatility, high-noise financial domains. This connection affirms the need for further exploration of hybrid and adaptive architectures that account for multi-dimensional objectives and optimization landscapes in finance.
The integration of deep learning and attention-based mechanisms makes the hybrid model more adaptive to shifts in market patterns [7,9,15,16,18]. Unlike traditional models that often struggle with nonstationary behavior, the CNN-BiGRU-AM architecture offers increased flexibility in adjusting to sudden market changes [8,20,21]. This adaptability is critical in stock price forecasting, where patterns are not only complex but also subject to rapid external shocks [22,23,24,25]. By assigning higher weights to the most informative historical features, the model enhances its learning process and improves prediction accuracy [9,21,26]. As a result, the forecast outcomes are more aligned with real-world market conditions [27,28,29].
This research provides practical value by offering a tool that supports strategic decision making. Accurate stock price predictions can reduce the uncertainty faced by stakeholders in the energy and financial sectors [30,31,32]. Companies can plan their financial strategies more effectively, investors can optimize their portfolios, and shareholders can make informed decisions [30,31,32]. Therefore, the proposed model is not only a theoretical contribution but also a practical innovation with high relevance to real-world applications.
Despite the advances in deep learning, conventional models and standard Recurrent Neural Networks (RNNs) still present significant limitations [6,33,34]. One of the most persistent challenges is the vanishing gradient problem, which hinders the model’s ability to retain information over long sequences [35,36,37]. This issue becomes particularly problematic in financial data, where temporal dependencies often span extended periods [38,39,40]. Traditional models, while useful in capturing short-term trends, typically fail to recognize long-term structural patterns [28,29,33]. This limitation calls for more sophisticated and adaptive modeling approaches that can preserve important historical information.
To date, there is still a lack of studies that focus specifically on forecasting INDY’s stock price using complex hybrid architectures. Most existing research tends to concentrate on large-cap stocks or general market indices, leaving mid-cap and sector-specific stocks underexplored. INDY, as a representative of Indonesia’s energy sector, possesses unique characteristics that merit in-depth analysis [1]. The energy sector is highly sensitive to external influences, including global commodity prices, regulatory changes, and geopolitical events [41,42]. This sensitivity further amplifies the need for a focused and customized prediction model.
In recent years, the INDY stock has demonstrated significant instability. Yet, few studies attempt to analyze this behavior using nonlinear modeling approaches. Linear models often fail to capture extreme price fluctuations and regime shifts inherent in stock data. Given that INDY stock exhibits patterns of high variance and nonlinearity, it is more appropriate to approach its forecasting using nonlinear and adaptive methods. This study addresses this gap by employing a hybrid deep learning model tailored to the stock’s unique characteristics.
Currently, there are limited machine learning solutions that effectively handle high-volatility stock data, especially in emerging markets like Indonesia. Many models do not incorporate long-term temporal dependencies, which are critical in understanding trends in the energy sector. Furthermore, most forecasting models are not designed to respond dynamically to evolving market conditions. As a result, there is a pressing need for the development of more reliable prediction tools that combine deep learning with temporal and attention-based features.
The demand for accurate forecasting of INDY’s stock price continues to grow. The uncertainty in the energy sector requires models that are capable of adapting quickly to shifting patterns and external influences. Stakeholders need predictive insights that are both timely and accurate to make sound investment and operational decisions. A robust model can help anticipate risks and seize market opportunities, thereby enhancing financial resilience and strategic positioning.
This study aims to construct a hybrid CNN-BiGRU-AM model specifically designed to predict INDY’s stock prices with a high degree of accuracy. The model was trained using historical data ranging from 2019 to 2025, enabling it to learn from various market cycles and anomalies. The ultimate goal is to produce a forecasting system that is not only precise but also efficient in capturing the intricate behaviors of financial time series data. Through this research, we seek to contribute both to academic knowledge and to the practical field of investment analytics.
This study contributes to the literature by presenting a tailored hybrid architecture, CNN-BiGRU-AM, specifically designed to address the forecasting challenges of mid-cap stocks in emerging markets, with INDY stock as a representative case. Unlike conventional models, this approach effectively captures nonlinear dependencies and adapts to regime changes commonly observed in the energy sector. Methodologically, the study introduces a sequence modeling pipeline that combines convolutional feature extraction, bidirectional memory retention, and attention-based weighting. This configuration not only improves predictive accuracy but also enhances the interpretability of the model's focus on temporally significant data. The proposed framework thus bridges the gap between theoretical advancements in deep learning and their practical applicability in dynamic financial forecasting.

2. Materials and Methods

2.1. Data

This study used secondary data consisting of daily stock prices (on working days) of INDY from 28 August 2019 to 24 February 2025. The data were obtained from the Yahoo Finance website and comprised a total of 1333 observations. The dataset was divided into three subsets (training, validation, and testing) with a 60:20:20 ratio.
The selection of 28 August 2019 as the starting point was based on the availability of complete, consistent daily trading data from that date onward. This range enabled the model to capture both the pre-pandemic and post-pandemic market conditions, particularly those influenced by global energy demand shocks and economic disruptions. Including the early months preceding COVID-19 ensured that the model learned from baseline price behavior while still reflecting the volatility that emerged during and after the pandemic. This period was also sufficient to meet the data volume requirements for training deep learning architectures effectively.
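The 60:20:20 partition described above can be sketched as follows. Because stock prices form a time series, the split must be chronological rather than shuffled; `chronological_split` is an illustrative helper name, not code from the study:

```python
import numpy as np

def chronological_split(series, train_frac=0.6, val_frac=0.2):
    """Split a time series 60:20:20 without shuffling.

    Chronological order is preserved so that the model is validated
    and tested only on data that lies after the training window.
    """
    n = len(series)
    train_end = int(n * train_frac)
    val_end = int(n * (train_frac + val_frac))
    return series[:train_end], series[train_end:val_end], series[val_end:]

# 1333 daily observations, as in the INDY dataset
prices = np.arange(1333, dtype=float)
train, val, test = chronological_split(prices)
```

With 1333 observations this yields 799 training, 267 validation, and 267 testing points.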

2.2. Data Pre-Processing

In this study, the dataset underwent a normalization procedure to comply with the modeling standards commonly required in neural-network-based forecasting. Normalization functions as a transformation mechanism designed to reduce substantial differences in the range of feature values [43] by mapping each data point into a standardized interval between 0 and 1 [44]. This process facilitates a more efficient and stable training phase for deep learning models.
The normalization approach adopted in this research is the MinMax Normalization method, also widely recognized as the MinMax Scaler. The transformation is mathematically formulated as expressed in Equation (1):
$$\breve{z}_i = \frac{z_i - z_{\min}}{z_{\max} - z_{\min}} \quad (1)$$
In this notation, $\breve{z}_i$ indicates the normalized value of the $i$-th observation, which is constrained within the interval $[0, 1]$. The symbol $z_i$ denotes the original, unnormalized observation, while $z_{\max}$ and $z_{\min}$ represent the maximum and minimum values within the corresponding feature, respectively. The use of a breve mark over $z$ highlights that the variable has been transformed from its original state, aiding in distinguishing the normalized values throughout the analysis.
This formulation ensures that each input feature contributes proportionally during the model training process and avoids the domination of features with inherently larger scales. As a result, it enhances both learning stability and convergence speed in deep learning architectures.
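Equation (1) and its inverse (needed to map model outputs back to the original price scale) can be sketched in NumPy; the function names here are illustrative rather than taken from the study's code:

```python
import numpy as np

def minmax_normalize(z):
    """Map each observation into [0, 1] via Equation (1)."""
    z = np.asarray(z, dtype=float)
    z_min, z_max = z.min(), z.max()
    return (z - z_min) / (z_max - z_min)

def minmax_denormalize(z_norm, z_min, z_max):
    """Invert the transform to recover values on the original scale."""
    return z_norm * (z_max - z_min) + z_min
```

In practice, $z_{\min}$ and $z_{\max}$ should be computed from the training subset only and reused for the validation and test subsets, to avoid leaking future information into training.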

2.3. CNN-BiGRU-AM Model Design

The design process of the CNN-BiGRU-AM model involved several layers, beginning with convolutional and pooling operations using a one-dimensional Convolutional Neural Network (1D-CNN), followed by the BiGRU sequence processing layer, the weight allocation using the Attention Mechanism (AM), and final integration through a fully connected layer.

2.3.1. Convolutional Layer (Conv1D)

The Conv1D layer functions as a feature extractor by performing convolution operations on the sequential input data, resulting in a set of representations known as feature maps that are forwarded to the subsequent layers. Mathematically, the convolutional output at time step $\tau$ and convolutional filter index $\zeta$, represented as $\varphi_{\tau,\zeta}$, is formulated as shown in Equation (2) [45]:
$$\varphi_{\tau,\zeta} = \mathrm{ReLU}\left( \sum_{\nu=1}^{\kappa} W_{\nu,\zeta} \cdot \hat{\chi}_{\tau+\nu-1} + \delta_{\zeta} \right) \quad (2)$$
In Equation (2), $\varphi_{\tau,\zeta}$ denotes the activation output (feature map) at temporal index $\tau$ for the $\zeta$-th convolutional filter. The term $\hat{\chi}_{\tau+\nu-1}$ refers to the pre-processed input data (i.e., normalized and/or imputed) at the shifted time index $\tau+\nu-1$. The matrix $W_{\nu,\zeta}$ represents the learnable weights associated with the $\zeta$-th kernel at the $\nu$-th position of the receptive field, while $\delta_{\zeta}$ is the corresponding bias term, typically initialized to zero for simplification in predictive modeling. The index $\nu$ iterates across the window length $\kappa$, capturing localized temporal dependencies in the input sequence. The ReLU function is applied to introduce nonlinearity and enhance the model's ability to represent complex patterns in the data. This convolutional mechanism is particularly effective in isolating important short-term trends and local dynamics from time series input such as stock prices.
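Equation (2) can be made concrete with a minimal NumPy sketch of a "valid" 1-D convolution followed by ReLU; this is an illustration of the operation, not the framework implementation used in the study:

```python
import numpy as np

def conv1d_relu(x, W, delta):
    """Valid 1-D convolution with ReLU, following Equation (2).

    x     : (T,) pre-processed input sequence
    W     : (kappa, F) kernel weights, one column per filter
    delta : (F,) bias term per filter
    Returns a (T - kappa + 1, F) feature map.
    """
    kappa, F = W.shape
    T = len(x)
    out = np.empty((T - kappa + 1, F))
    for tau in range(T - kappa + 1):
        window = x[tau:tau + kappa]      # chi_{tau+nu-1}, nu = 1..kappa
        out[tau] = window @ W + delta    # inner sum over nu for every filter
    return np.maximum(out, 0.0)          # ReLU nonlinearity
```

For example, a length-2 averaging-style kernel of ones slides over the sequence and produces one activation per window position.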

2.3.2. Pooling Layer

The pooling layer is designed to extract the most significant and informative characteristics from the feature maps produced by the preceding convolutional process while concurrently reducing their spatial resolution. This operation enhances computational efficiency and reduces the likelihood of overfitting by limiting the number of trainable parameters [46]. In this study, an average pooling strategy was utilized to preserve the general pattern of localized features, especially in sequential data like stock prices.
Mathematically, the result of the pooling process for the $\rho$-th region, denoted by $\omega_{\rho}$, is described by Equation (3):
$$\omega_{\rho} = \frac{1}{|R_{\rho}|} \sum_{\xi \in R_{\rho}} \alpha_{\xi} \quad (3)$$
In this formulation, $|R_{\rho}|$ represents the number of elements within the pooling segment $R_{\rho}$, and $\alpha_{\xi}$ corresponds to the activation value at position $\xi$ in that region. This averaging operation results in a downsampled yet information-rich representation of the original feature map, effectively summarizing the critical temporal or spatial signals needed for the next modeling stages, such as the bidirectional GRU or attention mechanism.
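Equation (3) over non-overlapping regions can be sketched in a few lines of NumPy; the pool size of 2 used in the example is an arbitrary illustration, not a value reported by the study:

```python
import numpy as np

def average_pool1d(feature_map, pool_size=2):
    """Average pooling over non-overlapping regions, Equation (3)."""
    T, F = feature_map.shape
    n_regions = T // pool_size                 # drop any incomplete tail region
    trimmed = feature_map[:n_regions * pool_size]
    # Group consecutive rows into regions and average within each region
    return trimmed.reshape(n_regions, pool_size, F).mean(axis=1)
```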

2.3.3. BiGRU Layer

The output generated from the pooling layer is subsequently directed to the Bidirectional Gated Recurrent Unit (BiGRU) architecture, which is responsible for extracting temporal dependencies in both chronological and reverse chronological directions. This dual-path mechanism enhances the model’s ability to learn complex sequential behavior in time series data, such as stock price fluctuations.
The BiGRU executes two distinct operations: a forward GRU denoted as $\overrightarrow{G}(\eta_{\tau})$ and a backward GRU expressed as $\overleftarrow{G}(\eta_{\tau})$. These are later combined to form the bidirectional hidden representation $\eta_{\tau}$. In each GRU path, the process begins with the computation of the reset gate $\rho_{\tau}$, followed by the update gate $\zeta_{\tau}$, and then the generation of a candidate hidden state $\hat{\eta}_{\tau}$. These steps are mathematically defined in Equations (4)–(7):
$$\rho_{\tau} = \sigma\left( \Theta_{\rho} \cdot \chi_{\tau} + \Gamma_{\rho} \cdot \eta_{\tau-1} + \beta_{\rho} \right) \quad (4)$$
$$\zeta_{\tau} = \sigma\left( \Theta_{\zeta} \cdot \chi_{\tau} + \Gamma_{\zeta} \cdot \eta_{\tau-1} + \beta_{\zeta} \right) \quad (5)$$
$$\hat{\eta}_{\tau} = \tanh\left( \Theta_{\nu} \cdot \chi_{\tau} + \Gamma_{\nu} \cdot \left( \rho_{\tau} \odot \eta_{\tau-1} \right) \right) \quad (6)$$
$$\eta_{\tau} = (1 - \zeta_{\tau}) \odot \eta_{\tau-1} + \zeta_{\tau} \odot \hat{\eta}_{\tau} \quad (7)$$
In these expressions, $\chi_{\tau}$ denotes the input vector at time step $\tau$, $\Theta_{*}$ and $\Gamma_{*}$ are learnable weight matrices associated with the input and previous hidden state, respectively, and $\beta_{*}$ represents the bias term for each gate. The operator $\odot$ denotes element-wise multiplication.
The construction of the bidirectional hidden state continues with the forward and backward computations defined as follows:
$$\overrightarrow{\eta}_{\tau} = \overrightarrow{G}\left( \chi_{\tau}, \overrightarrow{\eta}_{\tau-1} \right) \quad (8)$$
$$\overleftarrow{\eta}_{\tau} = \overleftarrow{G}\left( \chi_{\tau}, \overleftarrow{\eta}_{\tau+1} \right) \quad (9)$$
$$\eta_{\tau} = \overrightarrow{\eta}_{\tau} \,\|\, \overleftarrow{\eta}_{\tau} \quad (10)$$
In Equation (10), the symbol $\|$ represents the concatenation of the forward and backward hidden states. This final representation $\eta_{\tau}$ encapsulates context from both temporal directions and is used as input to the subsequent attention mechanism or output layer.
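Equations (4)–(10) can be traced step by step in a minimal NumPy sketch. The parameter layout and function names are illustrative assumptions (the study used a deep learning framework rather than hand-written cells), and for simplicity the forward and backward passes below share one set of weights, whereas a real BiGRU learns separate parameters per direction:

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gru_step(chi, eta_prev, params):
    """One GRU step implementing Equations (4)-(7).

    params holds the input weights Theta_*, recurrent weights Gamma_*,
    and biases beta_* for the reset gate, update gate, and candidate.
    """
    Th_r, Ga_r, b_r, Th_z, Ga_z, b_z, Th_n, Ga_n = params
    rho = sigmoid(Th_r @ chi + Ga_r @ eta_prev + b_r)        # reset gate, Eq. (4)
    z = sigmoid(Th_z @ chi + Ga_z @ eta_prev + b_z)          # update gate, Eq. (5)
    eta_hat = np.tanh(Th_n @ chi + Ga_n @ (rho * eta_prev))  # candidate, Eq. (6)
    return (1.0 - z) * eta_prev + z * eta_hat                # new state, Eq. (7)

def bigru(sequence, params, hidden):
    """Forward and backward passes concatenated per Equations (8)-(10)."""
    fwd, bwd = [], []
    h = np.zeros(hidden)
    for chi in sequence:                  # chronological pass, Eq. (8)
        h = gru_step(chi, h, params)
        fwd.append(h)
    h = np.zeros(hidden)
    for chi in reversed(sequence):        # reverse pass, Eq. (9)
        h = gru_step(chi, h, params)
        bwd.append(h)
    bwd.reverse()
    # Concatenation per time step, Eq. (10)
    return [np.concatenate([f, b]) for f, b in zip(fwd, bwd)]
```

Each output vector has twice the hidden size, since it joins the forward and backward states.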

2.3.4. Application of Attention Mechanism

The Attention Mechanism was applied to the latent representations generated by the preceding BiGRU layer, aiming to emphasize information at each time step based on its relative significance in producing the final predictive outcome. In this framework, the Bahdanau attention—also known as additive attention—was adopted due to its efficacy in managing nonlinear sequential relationships. The computation involved is detailed in Equations (11)–(13). The context vector, denoted by $\chi$, was computed as a weighted aggregation of temporal hidden representations $\mu_{\nu}$, where the weights $\omega_{\nu}$ quantify the attention level assigned to each temporal unit, as shown in Equation (11):
$$\chi = \sum_{\nu=1}^{T} \omega_{\nu} \mu_{\nu} \quad (11)$$
The attention weight $\omega_{\nu}$ was determined by normalizing the attention energy score $\zeta_{\nu}$ using a softmax function to ensure probabilistic interpretation over the time steps, which is expressed as
$$\omega_{\nu} = \mathrm{softmax}(\zeta_{\nu}) = \frac{\exp(\zeta_{\nu})}{\sum_{\kappa=1}^{T} \exp(\zeta_{\kappa})} \quad (12)$$
The attention energy $\zeta_{\nu}$ was then computed using a combination of the decoder's previous hidden representation $\delta_{\tau-1}$ and the encoder output $\mu_{\nu}$ passed through a nonlinear transformation involving a hyperbolic tangent function, as shown in Equation (13):
$$\zeta_{\nu} = \eta^{\top} \tanh\left( U_{\delta}\, \delta_{\tau-1} + U_{\mu}\, \mu_{\nu} + \epsilon \right) \quad (13)$$
In these equations, $\chi$ represents the final context vector capturing the global attention-weighted information; $\mu_{\nu}$ denotes the encoder's latent vector at time step $\nu$; $\omega_{\nu}$ indicates the learned importance of each encoder output; $\delta_{\tau-1}$ is the decoder's hidden state from the previous step; $U_{\delta}$ and $U_{\mu}$ are the weight matrices corresponding to the decoder and encoder states, respectively; $\epsilon$ is the bias term; and $\eta$ is a learned parameter vector that projects the combined transformation into a scalar attention score.
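Equations (11)–(13) can be sketched directly in NumPy. This is an illustrative reimplementation of additive attention under the paper's notation, not the study's own code:

```python
import numpy as np

def bahdanau_attention(mu, delta_prev, U_delta, U_mu, eps, eta):
    """Additive (Bahdanau) attention following Equations (11)-(13).

    mu         : (T, H) encoder hidden states mu_nu from the BiGRU
    delta_prev : (D,) previous decoder hidden state delta_{tau-1}
    Returns the context vector chi and the attention weights omega.
    """
    # Energy scores zeta_nu, Equation (13)
    scores = np.array([eta @ np.tanh(U_delta @ delta_prev + U_mu @ mu_nu + eps)
                       for mu_nu in mu])
    # Softmax normalization, Equation (12) (max-shifted for numerical stability)
    exp_scores = np.exp(scores - scores.max())
    omega = exp_scores / exp_scores.sum()
    # Weighted aggregation into the context vector, Equation (11)
    chi = omega @ mu
    return chi, omega
```

A sanity check on the mechanism: if all projection parameters are zero, every energy score is identical, the softmax produces uniform weights, and the context vector reduces to the plain average of the encoder states.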

2.3.5. Fully Connected Layer

In the final processing stage of the hybrid CNN-BiGRU-AM architecture, the aggregated context vector derived from the attention mechanism was forwarded as the input to the dense (fully connected) layer. This transformation was designed to project the high-level temporal representations into the target output space. Mathematically, the output of the dense layer, denoted as $\psi_{\tau}^{\mathrm{FC}}$, is defined in Equation (14):
$$\psi_{\tau}^{\mathrm{FC}} = \varsigma\left( M^{\mathrm{FC}} \cdot \chi + \xi^{\mathrm{FC}} \right) \quad (14)$$
In this expression, $\psi_{\tau}^{\mathrm{FC}}$ represents the final activated output of the dense layer; $M^{\mathrm{FC}}$ is the trainable weight matrix associated with the fully connected transformation; $\chi$ is the context vector summarizing information from all time steps as produced by the attention mechanism; $\xi^{\mathrm{FC}}$ refers to the bias parameter specific to the dense layer; and $\varsigma(\cdot)$ denotes the activation function—commonly a sigmoid function—applied element-wise to introduce nonlinearity into the final prediction. This operation ensured the model's ability to map complex, sequential dependencies into the desired predictive output form using the comprehensive information encoded in $\chi$.
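Equation (14), with the sigmoid as the activation $\varsigma$, reduces to a one-line NumPy sketch (an illustration, not the study's code):

```python
import numpy as np

def dense_output(chi, M, xi):
    """Fully connected output with sigmoid activation, Equation (14)."""
    return 1.0 / (1.0 + np.exp(-(M @ chi + xi)))
```

Because the sigmoid maps into (0, 1), its output lives on the same normalized scale as the MinMax-transformed targets and must be denormalized to recover a price.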

2.4. Model Evaluation Metrics

To evaluate the predictive performance of the proposed hybrid model, a model validation process was carried out using a widely accepted error metric in time series forecasting [47]. Among various evaluation criteria such as the Root Mean Squared Error (RMSE) and Mean Absolute Error (MAE), the Mean Absolute Percentage Error (MAPE) was selected as the primary indicator of model accuracy in this study, owing to its interpretability in percentage terms and its sensitivity to relative errors. The MAPE was computed based on the following formulation:
$$\mathrm{MAPE} = \frac{1}{\kappa} \sum_{j=1}^{\kappa} \left| \frac{Y_j - \hat{Y}_j}{Y_j} \right| \times 100\%$$
In this expression, $\kappa$ denotes the total number of time instances observed in the evaluation set; $Y_j$ refers to the ground truth value at the $j$-th point in time, and $\hat{Y}_j$ indicates the corresponding predicted value produced by the model. Notably, the symbol $\hat{Y}$ is used to emphasize that the predicted output is derived from the imputed and learned sequences rather than raw observations. The use of the MAPE enabled the assessment of relative prediction errors across all forecasting points, facilitating a comprehensive understanding of the model's generalization ability in capturing temporal patterns.
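The MAPE formulation above translates directly into NumPy:

```python
import numpy as np

def mape(y_true, y_pred):
    """Mean Absolute Percentage Error, as defined in Section 2.4."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.mean(np.abs((y_true - y_pred) / y_true)) * 100.0
```

Note that the metric is undefined when any $Y_j$ equals zero; this is not a concern for strictly positive stock prices.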

3. Results

3.1. Determination of Hyperparameter Tuning Intervals

This study utilized the Optuna library in Python 3 for conducting the hyperparameter tuning process. The range of values for each hyperparameter considered during the tuning process is shown in Table 1.
The selection of the most optimal hyperparameter combination from the specified intervals was based on the minimum value of the loss function. A time step of 5 was used in this study, as stock price movements were highly influenced by short-term factors. After performing the tuning, the combination of hyperparameters that yielded the lowest training loss (0.0004623) and testing loss (0.0001786) is presented in Table 2.
This combination was then used for training the CNN-BiGRU-AM model.

3.2. Model Training

The CNN-BiGRU-AM model was trained to achieve optimal performance in forecasting the stock prices of INDY. The training process was conducted sequentially with a batch size of 62: each batch of data was processed only after the previous one had completed.
During training, the model was run for a total of 1000 epochs. However, based on the training and validation loss graph shown in Figure 1, the training stopped early after 257 iterations, when no further improvement was observed. At this point, the training loss reached 0.00045, while the validation loss settled at 0.00035.
The close resemblance between the training and validation loss trends indicated that the model generalized well to unseen data and was not overfitting. This suggests that the model has strong potential to perform robust predictions even on data with different characteristics.
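The early-stopping rule described above (halt once validation loss stops improving) can be sketched as a simple patience counter. The class below is an illustrative reimplementation of the idea, not the framework callback the study used, and the patience value is an assumed example:

```python
class EarlyStopping:
    """Stop training once validation loss stops improving.

    A minimal sketch of the early-stopping rule; `patience` is the
    number of epochs to wait for an improvement before halting.
    """
    def __init__(self, patience=10, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")
        self.wait = 0

    def should_stop(self, val_loss):
        if val_loss < self.best - self.min_delta:
            self.best = val_loss        # improvement: reset the counter
            self.wait = 0
            return False
        self.wait += 1                  # no improvement this epoch
        return self.wait >= self.patience
```

In the study's run, this kind of rule terminated training at epoch 257 of the 1000 that were scheduled.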

3.3. Model Visualization

The results from the CNN-BiGRU-AM model training were visualized to compare the actual and predicted stock prices. These visualizations are presented in Figure 2, Figure 3, Figure 4 and Figure 5.
The graph for the validation data (Figure 3) demonstrates that the predicted curve closely followed the actual stock price values, although there were a few deviations during periods of significant change, such as between March and May 2024. Similarly, in the testing data (Figure 4), the predicted prices successfully captured the overall trend of the actual prices, with visible alignment even during large fluctuations, such as those observed around August to September 2024.
Overall, the CNN-BiGRU-AM model was able to forecast stock prices with high fidelity, even during periods of considerable market volatility. This performance was particularly notable in the context of major external factors such as the COVID-19 pandemic and the Russia–Ukraine conflict, which heavily influenced global coal prices during the 2021–2022 period.

3.4. Model Evaluation

The model was evaluated using the Mean Absolute Percentage Error (MAPE), a common metric that measures the average magnitude of forecasting errors as a percentage of actual values. The evaluation results for training, validation, testing, and overall data are summarized in Table 3.
As observed in Table 3, all MAPE values were below 3%, indicating that the model achieved excellent prediction accuracy across all subsets of data. These results suggest that the CNN-BiGRU-AM model is well suited for forecasting stock prices with high precision and reliability.

3.5. Benchmarking with Traditional Models and Related Works

To evaluate the relative performance of the proposed CNN-BiGRU-AM model, a benchmarking comparison was conducted against two widely used classical forecasting approaches: the AutoRegressive Integrated Moving Average (ARIMA) model and Exponential Smoothing. Both models were applied to the same normalized INDY stock price dataset used for our deep learning model. The optimal ARIMA configuration was determined using the Akaike Information Criterion (AIC), resulting in ARIMA(2,1,0). For Exponential Smoothing, the Holt–Winters additive method was employed.
Table 4 summarizes the comparative results in terms of the Mean Absolute Percentage Error (MAPE). The CNN-BiGRU-AM model outperformed the traditional models, achieving a significantly lower MAPE value. This demonstrates its superior capacity in capturing the nonlinear and dynamic patterns present in the INDY time series data.
These results confirm that the hybrid deep learning architecture not only performed better than classical models but also surpassed the results reported in recent studies. The integration of CNN, BiGRU, and Attention Mechanism components contributes significantly to this improvement by capturing short-term features, long-range dependencies, and salient temporal information simultaneously.

3.6. Error Direction Analysis

The forecasted prices for the five-day prediction window, as presented in Table 5, consistently exceeded the actual prices observed in the market. This pattern may initially suggest a directional bias in the model’s predictions. All five forecasts showed overestimated values, with deviations ranging between 56 and 146 units from the corresponding actual prices. Although this overestimation is evident in the short-term forecast window, it is important to examine whether such a bias persists across the broader test set.
To address this, an extended error direction analysis was performed over the entire test dataset, as presented in the next subsection. The results reveal that approximately 53.2% of the predictions were overestimated, while 46.8% were underestimated. This indicates a mild upward tendency in forecast outputs but does not reflect a dominant systematic bias. This observation highlights the need for continuous recalibration, particularly in volatile or bearish market conditions.

Analysis of Prediction Error Direction

To evaluate whether the model’s predictions exhibit directional bias, we analyzed the sign of prediction errors on the entire test set. Let the prediction error be defined as e t = y ^ t y t . The results show that out of all predictions, 53.2% were overestimations ( e t > 0 ), while 46.8% were underestimations ( e t < 0 ). This distribution suggests a slight upward bias, but the imbalance is not substantial enough to conclude the presence of a systematic forecasting error. The error direction histogram is presented in Figure 6, which confirms the near-balanced error spread around zero.
Figure 6 displays the histogram of prediction errors ( y ^ t y t ) generated by the proposed CNN-BiGRU-AM model across the full test dataset. The distribution illustrates both positive and negative deviations from actual stock prices, allowing for a clearer assessment of model bias. The red dashed vertical line at zero represents a perfect prediction. The relatively symmetric distribution around this zero point suggests that the model produces both overestimations and underestimations in comparable proportions. This indicates an absence of a dominant systematic forecasting bias. The histogram also confirms that the majority of prediction errors fall within a narrow range, further supporting the model’s robustness and generalization capability.
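The over/underestimation shares reported above follow from the sign of $e_t = \hat{y}_t - y_t$ across the test set, which can be computed as a short NumPy sketch (the function name is illustrative):

```python
import numpy as np

def error_direction(y_true, y_pred):
    """Fractions of over- and underestimated predictions, e_t = y_hat - y."""
    e = np.asarray(y_pred, dtype=float) - np.asarray(y_true, dtype=float)
    over = np.mean(e > 0)    # share of positive errors (overestimation)
    under = np.mean(e < 0)   # share of negative errors (underestimation)
    return over, under
```

Applied to the full test set, this analysis yielded the reported 53.2% overestimations versus 46.8% underestimations.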

3.7. Forecasting

After completing the training and evaluation stages, the final step involved forecasting the INDY stock price for the next five working days based on the last available data from the testing period. The results of this short-term forecast are shown in Table 5 and visualized in Figure 7.
The forecasted prices showed a gradual downward trend, which closely mirrored the actual movement of the stock prices during the same period. The MAPE for this 5-day forecast was calculated at 8.03%, which was still considered acceptable given the highly volatile nature of stock markets.
These forecast results could serve as valuable references for corporate decision making, particularly in the domains of operational planning, investment strategies, and financial policy formulation. Moreover, for investors and shareholders, the ability to anticipate short-term market movements enables more informed buy–sell decisions, potentially leading to profit maximization and risk mitigation.

4. Discussion

The process of determining the hyperparameter tuning interval employed the Optuna library in Python 3. The intervals for each hyperparameter were selected carefully to cover a wide range of possible optimal values, ensuring a thorough exploration of the parameter space. The tuning procedure relied on the minimization of the loss function, with the objective of identifying the best-performing model configuration. The tuning results indicate that the optimal combination includes 181 filters, a kernel size of 2, 200 GRU units, and a dropout rate of 0.0474. In addition, the selection of a time step of 5 supports the model’s ability to capture short-term trends, which is crucial in stock price forecasting, where market behavior is often driven by immediate, short-lived factors.
The training process was conducted sequentially with a batch size of 62 observations per iteration. Although the training was initially set for 1000 epochs, the implementation of early stopping allowed the process to terminate automatically after 257 iterations. This technique helps prevent overfitting and reduces unnecessary computational time. The loss graph illustrates a parallel movement between training and validation loss curves, suggesting that the model maintains a strong generalization capability and does not overfit the training data. The final training loss and validation loss values were 0.00045 and 0.00035, respectively, both of which reflect excellent model performance and stability.
The results of the modeling process have been visualized through graphs comparing the actual and predicted values for the training, validation, and testing datasets. These visualizations provide critical insights into the model’s predictive capabilities. The model accurately followed the movement patterns of stock prices, even though some discrepancies between actual and predicted values occurred, particularly during periods of abrupt price changes. Notably, during March–May 2024 and August–September 2024, there were visible significant fluctuations in the actual stock prices. The model was still able to capture these changes with reasonable accuracy, demonstrating its adaptive response to volatile market conditions, which could be influenced by external events such as political instability or global economic disruptions.
Model evaluation used the Mean Absolute Percentage Error (MAPE) as the primary metric, which quantifies the average percentage difference between actual and predicted values and is widely used to assess time series forecasting accuracy. The model recorded a MAPE of 2.68% on the training data, 1.88% on the validation data, and 1.99% on the testing data. The overall MAPE of 2.38% indicates a high level of accuracy, supporting the conclusion that the CNN-BiGRU-AM model is both robust and reliable. These values are well within acceptable forecasting error thresholds and affirm that the model can be trusted for real-world financial decision making.
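The MAPE metric used throughout the evaluation is straightforward to compute; a minimal implementation with a small worked example (the numbers are illustrative, not from the study) is:

```python
def mape(actual, predicted):
    """Mean Absolute Percentage Error, in percent:
    100/n * sum(|actual - predicted| / |actual|)."""
    return 100.0 * sum(abs(a - p) / abs(a)
                       for a, p in zip(actual, predicted)) / len(actual)

# Illustrative values only (not from the paper):
actual    = [100, 200, 400]
predicted = [110, 190, 400]
error_pct = mape(actual, predicted)  # (10% + 5% + 0%) / 3 = 5.0%
```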
Stock prices were forecast for the five working days beyond the testing dataset, providing practical insight into the model's predictive utility. The forecast exhibited a gradual downward trend. While some variation exists between predicted and actual values, the forecast maintained an acceptable MAPE of 8.03%. This level of accuracy remains useful for short-term forecasting under fluctuating market conditions. The comparison graph of predicted and actual values further confirms that the model captures the general movement trend, even where minor discrepancies are present.
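A common way to produce such a multi-day forecast from a one-step model with a time step of 5, consistent with the configuration above, is iterated prediction: each one-step forecast is appended to the input window and fed back in. The sketch below assumes this scheme (the paper does not detail its forecasting procedure); `model_predict` is a hypothetical stand-in for the trained CNN-BiGRU-AM's one-step output.

```python
def forecast_recursive(history, model_predict, horizon=5, window=5):
    """Iterated one-step forecasting: predict one step ahead, append the
    prediction to the series, and repeat for `horizon` steps.
    `model_predict` is a hypothetical stand-in for the trained model."""
    series = list(history)
    out = []
    for _ in range(horizon):
        x = series[-window:]   # last `window` observations (time step = 5)
        y_hat = model_predict(x)
        out.append(y_hat)
        series.append(y_hat)   # feed the prediction back into the window
    return out

# Toy stand-in model for illustration: predicts the mean of the window.
mean_model = lambda x: sum(x) / len(x)
preds = forecast_recursive([1500, 1510, 1505, 1495, 1500], mean_model, horizon=5)
```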
The implications of this forecasting exercise are substantial for both corporate and investor perspectives. For companies, such predictive capabilities support strategic decision making in areas such as inventory planning, resource allocation, and capital investment. Reliable short-term price forecasts also help firms prepare for market shifts and mitigate risks associated with price volatility. From an investor’s standpoint, the model’s ability to provide advance warnings of potential price movements allows for better-informed decisions regarding stock buying, selling, or holding strategies. Thus, the model not only serves academic and analytical purposes but also holds practical relevance for stakeholders involved in financial markets.
In summary, the use of the hybrid CNN-BiGRU-AM model yields promising results in forecasting stock prices. The careful selection and tuning of hyperparameters, combined with an effective model training strategy and a robust evaluation framework, enable the model to generate predictions that align closely with actual data. The model demonstrates reliability across training, validation, and testing datasets, as well as in unseen data used for real-time forecasting. These results validate the suitability of this model for short-term stock price forecasting and highlight its potential application in broader financial analytics contexts.

5. Conclusions

This study confirms the effectiveness of the CNN-BiGRU-Attention (CNN-BiGRU-AM) model in forecasting INDY stock prices with high accuracy. The model combines three powerful deep learning components: a CNN for extracting essential features and reducing noise, a BiGRU for capturing bidirectional temporal dependencies, and an Attention Mechanism for focusing on the most relevant parts of the data. This integration enables the model to better understand and adapt to complex patterns in volatile stock price movements.
The results show that the model achieved a low forecasting error, with a Mean Absolute Percentage Error (MAPE) consistently below 3% across different data splits. This indicates that the model successfully addresses the nonlinear and highly fluctuating nature of stock market data. It also maintains robust performance during periods affected by external factors such as global coal prices, energy policy shifts, and geopolitical events.
Practically, the model’s accurate forecasts provide valuable support for companies and investors in making more informed decisions. Businesses can utilize these predictions to plan strategic actions, manage financial risk, and respond more effectively to market shifts. Theoretically, this research contributes to the development of advanced predictive models in financial analytics, highlighting the potential of hybrid deep learning architectures in time series forecasting.
In addition to its predictive success, the study offers significant methodological contributions. By integrating three distinct neural components into a unified forecasting model, the CNN-BiGRU-AM architecture provides a replicable framework for time series modeling in volatile and nonlinear financial environments. Its performance on INDY stock demonstrates its applicability to other sector-specific assets with similar volatility profiles. Moreover, the modular design of the model allows for adaptation across various forecasting horizons and domains, paving the way for future applications in energy economics, portfolio optimization, and risk assessment. These findings contribute to the growing body of research on hybrid deep learning methods and support their continued adoption in financial analytics.
Overall, the CNN-BiGRU-AM model not only enhances forecasting performance but also offers practical and theoretical benefits that align with the evolving demands of financial market analysis.

Author Contributions

Conceptualization, M.L., G.D. and B.T.; methodology, M.L. and B.T.; software, M.L. and B.T.; validation, M.L. and B.T.; formal analysis, M.L.; investigation, M.L.; resources, G.D.; data curation, M.L.; writing—original draft preparation, M.L.; writing—review and editing, G.D.; visualization, M.L.; supervision, G.D. and B.T.; project administration, G.D.; funding acquisition, G.D. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by Universitas Padjadjaran under Rector's Decree Number 1/UN6.RKT/Kep/HK/2025 concerning Assistance for Publishing Costs of Scopus-Indexed Journals at Universitas Padjadjaran.

Data Availability Statement

The dataset is available from the corresponding author upon reasonable request to ensure ethical compliance and responsible data sharing.

Acknowledgments

The authors extend their utmost gratitude to Universitas Padjadjaran for sponsoring and funding the publication of this thesis under number 'S.279./MAD/PROG/STAT/2025'. This support enables the research to proceed and enhances the work's scholarly contribution.

Conflicts of Interest

The authors declare no conflicts of interest.

Figure 1. Training loss and validation loss over epochs.
Figure 2. Actual vs. predicted values for training data.
Figure 3. Actual vs. predicted values for validation data.
Figure 4. Actual vs. predicted values for testing data.
Figure 5. Actual vs. predicted values for all data subsets.
Figure 6. Histogram of prediction errors (ŷ_t − y_t) for the test set.
Figure 7. Forecast vs. actual for 5-day forecast horizon.
Table 1. Hyperparameter tuning interval.

Hyperparameter    Range Interval
Filter            [16–256]
Kernel Size       [2–4]
GRU Units         [16–256]
FC Units          [16–256]
Learning Rate     [0.0001–0.01]
Batch Size        [16–256]
Dropout Rate      [0.01–0.5]
Table 2. Optimal hyperparameter configuration.

Hyperparameter    Optimal Value
Filter            181
Kernel Size       2
GRU Units         200
FC Units          74
Learning Rate     0.0017
Batch Size        62
Dropout Rate      0.0474
Table 3. MAPE scores for CNN-BiGRU-AM model.

Data          MAPE
Training      2.68%
Validation    1.88%
Testing       1.99%
Overall       2.38%
Table 4. Forecasting accuracy comparison with traditional models and related studies.

Model                    MAPE
ARIMA (2,1,0)            4.76%
Exponential Smoothing    5.12%
Proposed CNN-BiGRU-AM    1.99%
Table 5. Forecasted INDY stock prices for 5 working days.

Schedule            Forecasted Price    Actual Price
25 February 2025    1520                1440
26 February 2025    1508                1400
27 February 2025    1490                1420
28 February 2025    1480                1334
3 March 2025        1476                1365
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Louisa, M.; Darmawan, G.; Tantular, B. Enhancing Stock Price Forecasting with CNN-BiGRU-Attention: A Case Study on INDY. Mathematics 2025, 13, 2148. https://doi.org/10.3390/math13132148

