CEGH: A Hybrid Model Using CEEMD, Entropy, GRU, and History Attention for Intraday Stock Market Forecasting

Intraday stock time series are noisier and more complex than other financial time series with longer time horizons, which makes them challenging to predict. We propose a hybrid CEGH model for intraday stock market forecasting. The CEGH model contains four stages. First, we use complete ensemble empirical mode decomposition (CEEMD) to decompose the original intraday stock market data into different intrinsic mode functions (IMFs). Then, we calculate the approximate entropy (ApEn) and sample entropy (SampEn) values of each IMF to eliminate noise. After that, we group the retained IMFs into four groups and predict the comprehensive signal of each group using a feedforward neural network (FNN) or a gated recurrent unit with history attention (GRU-HA). Finally, we obtain the final prediction results by integrating the prediction results of each group. Experiments were conducted on the U.S. and China stock markets to evaluate the proposed model. The results demonstrate that the CEGH model improved forecasting performance considerably. Our major contribution is the collaboration between CEEMD, entropy-based denoising, and GRU-HA. This hybrid model improves the signal-to-noise ratio of stock data and extracts global dependence more comprehensively in intraday stock market forecasting.


Introduction
It is generally accepted that stock markets are crucial for modern societies and economies. With the increasing availability of stock time series at an intraday frequency and the development of quantitative trading, intraday stock market forecasting has become a hot issue for economists, investors, and regulators. Intraday stocks are securities that trade on the markets during regular business hours. Typically, 5-min, 30-min, and 60-min charts are used to capture intraday stock price movements. Unfortunately, intraday stock time series are noisier and more complex than other financial time series with longer time horizons. To begin with, the intraday market is more vulnerable to policy uncertainty and investor sentiment. Furthermore, intraday time series contain intraday seasonal cycles. In addition, black swans, such as the current COVID-19 pandemic, can have a serious impact on stock markets, which are particularly sensitive to changes [1]. Moreover, forecasting models often pick up noise instead of signals, since intraday stock time series have low signal-to-noise ratios. These factors make it challenging to predict the intraday stock market.
Several related methods have been proposed to predict the stock market. Traditional statistical models have a long history in stock forecasting. These models assume that time series are generated from a linear and stationary process and try to model the underlying generation process. However, this is inconsistent with the real stock market. The availability and analyzability of high-frequency financial data have been gradually enhanced by advances in big data and artificial intelligence. For example, one hybrid approach to stock market forecasting combined CEEMDAN, ADF, ARMA, and LSTM to predict the stock index [28]. The results of these studies suggest that hybrid models based on decomposition and RNNs perform better than individual models in stock forecasting.
Improving the performance of hybrid models using various methods is a rising trend in stock prediction.
Intraday stock time series are inherently noisy, and therefore need denoising before forecasting. Information entropy can measure the complexity of time series, which helps to filter noise. Delgado-Bonal and Marshak showed that approximate entropy (ApEn) and sample entropy (SampEn) are two algorithms for determining the regularity of series based on the existence of patterns [29]. Considering the temporal dimension of uncertainty, Vinte and Ausloos proposed a comprehensive cross-sectional volatility estimator for stock markets based on intrinsic entropy [30]. Olbrys and Majewska used SampEn to capture sequential regularity in stock market time series [31]. Raubitzek and Neubauer found a correlation between ApEn, SampEn, and predictability in stock market data [32]. Moreover, some studies have introduced entropy to the EMD models. Chou et al. extracted the entropy features (ApEn and SampEn) of EMD-derived signals to predict the fall risk of elderly people [33]. Shang et al. combined the advantages of CEEMD and ApEn to eliminate the influence of noise in partial discharge and demonstrated the hybrid model's effectiveness and superiority [34].
Despite the extensive studies above, using hybrid models combining frequency decomposition methods and machine learning for intraday stock market forecasting still faces several limitations. First, there has been little discussion about the combination of entropy and EMD-based hybrid models in stock forecasting. Specifically, very little attention has been paid to filtering the high-frequency noise effectively after decomposition for stock forecasting. Second, most hybrid models use the same forecasting model for the IMFs indiscriminately, ignoring the characteristics of different signals. Third, most existing RNN models only use RNN layers to learn the sequential dependence in hidden states or directly stack the hidden states as inputs of the next layer. These models cannot effectively learn the global dependence of stock time series.
To overcome such limitations, we propose a CEGH (CEEMD-Entropy-GRU-HA) model for intraday stock market forecasting. This hybrid model is composed of frequency decomposition, entropy, GRU, and attention mechanisms. We first use CEEMD to decompose the original intraday stock market data into IMFs. Then, entropies, which can measure the complexities of signals, are used to denoise the IMFs. On this basis, the retained IMFs are divided into several groups and predicted using the FNN or the GRU with history attention (GRU-HA). Finally, we integrate the prediction results of the groups to obtain the ensemble results.
The main contributions are summarized as follows: (1) To the best of our knowledge, our work is among the first to attempt to create a collaboration between frequency decomposition, entropy, and attention. The proposed model was compared with the individual state-of-the-art models and other decomposition-based models. The experiments on the U.S. and China markets showed that this hybrid model improved accuracy in intraday stock forecasting; (2) we introduced an attention mechanism to IMF forecasting, which enhanced the ability to explore global dependence in stock time series; (3) we used entropies to remove the noise with high complexity, and this could improve the signal-to-noise ratio of the intraday stock data.
The rest of this paper is organized as follows. Section 2 describes the proposed CEGH model. Section 3 describes the experiments on the U.S. stock market and the China stock market. Section 4 evaluates the forecasting performance of the CEGH model and discusses the findings. Section 5 concludes this paper.

The CEGH Model
In this section, we first introduce the framework of CEGH in general and then describe the four main components.

Framework
To improve the signal-to-noise ratio of intraday stock data and extract global dependence more comprehensively, we propose a hybrid CEGH model. This model combines frequency decomposition, entropies, recurrent neural networks, and attention mechanisms. As is shown in Figure 1, the model consists of four stages.

Stage 1: Decomposition. Intraday forecasting with raw stock data is challenging due to the complexity of financial time series. Using highly complex data may result in poor predictive performance from an excessively complicated machine learning model. To better extract high-level features from the original intraday stock data, CEEMD is used to decompose the raw data into several IMFs, each with a well-defined instantaneous frequency.
Stage 2: Entropy-based denoising. Information entropy measures the complexity of a time series, which helps to filter noise. In this stage, the approximate entropy and sample entropy of each IMF are calculated, and a noise threshold is set to filter out the noisy IMFs.
Stage 3: Grouping and forecasting. Different IMFs have different complexities and time-frequency characteristics, which call for different prediction models. In this stage, the retained IMFs are divided into a high-frequency group, a medium-frequency group, a low-frequency group, and a trend group. The comprehensive signal of each group is then predicted using the FNN or the GRU-HA separately.
Stage 4: Ensemble. The final prediction results are obtained by integrating the prediction results for each group in Stage 3.

CEEMD
In Stage 1, CEEMD [24] is used to decompose the original intraday stock data into IMFs. CEEMD is a noise-assisted EMD technique, which can be described with the following steps.
Step 1: Add white noise to the original intraday stock data x as follows:

x^i = x + \varepsilon_0 w^i, \quad i = 1, 2, \ldots, I,

where w^i are different realizations of white noise and \varepsilon_0 is the parameter of the white noise power.
Step 2: Decompose each x^i using EMD to obtain its first mode IMF_1^i. By averaging the IMF_1^i, the first component can be obtained as follows:

IMF_1 = \frac{1}{I} \sum_{i=1}^{I} IMF_1^i.

Step 3: Calculate the first residue as follows:

r_1 = x - IMF_1.

Step 4: Calculate the second IMF as follows:

IMF_2 = \frac{1}{I} \sum_{i=1}^{I} E_1\big(r_1 + \varepsilon_1 E_1(w^i)\big),

where E_j(\cdot) is the operator that produces the j-th mode obtained using EMD.
Step 5: For k = 2, \ldots, K, calculate the k-th residue as follows:

r_k = r_{k-1} - IMF_k.

Step 6: Calculate the (k + 1)-th IMF as follows:

IMF_{k+1} = \frac{1}{I} \sum_{i=1}^{I} E_1\big(r_k + \varepsilon_k E_k(w^i)\big).

Step 7: Repeat Step 5 and Step 6 until the obtained residue can no longer be feasibly decomposed. The final residue can be described as follows:

R = x - \sum_{k=1}^{K} IMF_k.

Therefore, the given intraday stock data can be expressed as follows:

x = \sum_{k=1}^{K} IMF_k + R.
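The noise-assisted averaging loop above can be sketched in Python. This is a minimal illustration, not the paper's implementation: real CEEMD uses EMD sifting (e.g., via the PyEMD package), so the `extract_mode` local-mean detrender below is a stand-in operator for E_1(·), and all function names are assumptions. What the sketch does preserve is the residual bookkeeping of Steps 3-7, which guarantees that the IMFs and the final residue sum back exactly to the input.

```python
import numpy as np

def local_mean(x, w=5):
    # Stand-in for an EMD envelope mean: centered moving average with edge padding.
    pad = np.pad(x, (w // 2, w - w // 2 - 1), mode="edge")
    return np.convolve(pad, np.ones(w) / w, mode="valid")

def extract_mode(x, w=5):
    # Stand-in for E_1(.): the high-frequency part left after removing the local mean.
    return x - local_mean(x, w)

def ceemd_sketch(x, n_trials=20, eps=0.2, n_modes=4, seed=0):
    """Average the first mode of (residue + eps * noise) over many noise
    realizations (Steps 1-2), then subtract it to form the next residue
    (Steps 3-7)."""
    rng = np.random.default_rng(seed)
    noises = rng.standard_normal((n_trials, x.size))
    imfs, residue = [], x.astype(float).copy()
    for _ in range(n_modes):
        mode = np.mean([extract_mode(residue + eps * w) for w in noises], axis=0)
        imfs.append(mode)
        residue = residue - mode
    return np.array(imfs), residue

t = np.linspace(0, 4 * np.pi, 256)
x = np.sin(t) + 0.3 * np.sin(12 * t)   # slow trend plus a fast oscillation
imfs, residue = ceemd_sketch(x)
# By construction, the IMFs and the residue reconstruct the original signal.
assert np.allclose(imfs.sum(axis=0) + residue, x)
```

In practice one would replace `extract_mode` with a proper EMD first-mode extractor; the telescoping sum x = Σ IMF_k + R holds for any choice of mode operator.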

Entropy-Based Denoising
In Stage 2, information entropy is used to remove the IMFs that represent noise components. Because of the low signal-to-noise ratios of intraday stock data, denoising is particularly important. Information entropy quantifies the complexity and regularity of a time series: the larger the entropy value, the higher the probability that the sequence generates a new pattern, and the more complex and irregular the sequence is. Unlike previous studies that used only a single entropy for denoising [34], we use approximate entropy (ApEn) [35] and sample entropy (SampEn) [36] to measure the complexities of the intraday stock IMFs.
In our model, if the entropy value is above a certain threshold, the IMF is regarded as noise and is discarded. Otherwise, the IMF is assumed to contain useful intraday stock market information and is kept. Given a time series u(i), i = 1, 2, . . . , N, we define an embedding dimension m and a tolerance r. The calculation of ApEn and SampEn can be defined with the following steps.
Step 1: Construct the m-dimensional embedding vectors as follows:

V_m(i) = [u(i), u(i+1), \ldots, u(i+m-1)], \quad i = 1, 2, \ldots, N - m + 1.

Step 2: Calculate the distance between V_m(i) and V_m(j) as the maximum absolute difference of their components:

d[V_m(i), V_m(j)] = \max_{k = 0, \ldots, m-1} |u(i+k) - u(j+k)|.

Step 3: Calculate the approximate entropy. First, measure the regularity and frequency of patterns within tolerance r as follows:

C_i^m(r) = \frac{\#\{\, j : d[V_m(i), V_m(j)] \le r \,\}}{N - m + 1}.

Then, calculate the mean value of the logarithm of C_i^m(r) as follows:

\Phi^m(r) = \frac{1}{N - m + 1} \sum_{i=1}^{N-m+1} \ln C_i^m(r).

Finally, the ApEn can be defined as follows:

ApEn(m, r, N) = \Phi^m(r) - \Phi^{m+1}(r).

Step 4: Calculate the sample entropy. First, calculate the two coefficients A^m(r) and B^m(r): B_i^m(r) is the number of j \ne i with d[V_m(i), V_m(j)] \le r, normalized by N - m - 1, and A_i^m(r) is the analogous count for vectors of length m + 1. Averaging them gives

B^m(r) = \frac{1}{N - m} \sum_{i=1}^{N-m} B_i^m(r), \quad A^m(r) = \frac{1}{N - m} \sum_{i=1}^{N-m} A_i^m(r).

The SampEn can then be defined as follows:

SampEn(m, r, N) = -\ln\frac{A^m(r)}{B^m(r)}.
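The two entropy definitions can be sketched directly in numpy. This is a minimal reference implementation of the standard formulas (the paper's own code and parameter handling may differ); it uses O(N²) memory for the pairwise distance matrix, which is fine for short series. A regular signal should score lower than noise on both measures.

```python
import numpy as np

def _embed(u, m):
    # All length-m template vectors V_m(i).
    return np.array([u[i:i + m] for i in range(len(u) - m + 1)])

def _cheb_dist(X):
    # Pairwise Chebyshev (max-abs) distances between template vectors.
    return np.max(np.abs(X[:, None, :] - X[None, :, :]), axis=2)

def approximate_entropy(u, m=2, r=None):
    """ApEn = Phi^m(r) - Phi^{m+1}(r); self-matches are counted, as in Pincus' definition."""
    u = np.asarray(u, dtype=float)
    r = 0.2 * np.std(u) if r is None else r
    def phi(mm):
        d = _cheb_dist(_embed(u, mm))
        C = np.mean(d <= r, axis=1)        # C_i^m(r), never zero thanks to self-matches
        return np.mean(np.log(C))
    return phi(m) - phi(m + 1)

def sample_entropy(u, m=2, r=None):
    """SampEn = -ln(A/B); self-matches are excluded."""
    u = np.asarray(u, dtype=float)
    r = 0.2 * np.std(u) if r is None else r
    def matches(mm):
        d = _cheb_dist(_embed(u, mm))
        n = d.shape[0]
        return (np.sum(d <= r) - n) / 2    # matched pairs, excluding the diagonal
    return -np.log(matches(m + 1) / matches(m))

rng = np.random.default_rng(0)
regular = np.sin(np.linspace(0, 8 * np.pi, 300))
noisy = rng.standard_normal(300)
# A regular signal generates fewer new patterns, hence lower entropy than noise.
assert sample_entropy(regular) < sample_entropy(noisy)
assert approximate_entropy(regular) < approximate_entropy(noisy)
```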

GRU-HA
After decomposing and grouping, we use different models to predict different group signals. The classic FNN is used for the relatively regular groups and the GRU-HA is used for the more complex groups. The GRU-HA combines the GRU and an attention mechanism as shown in Figure 2. First, the GRU layers learn the hidden states of the input stock data, and then the history attention layers exploit the global dependency in the hidden states. The history attention representation and the last hidden state are concatenated into an input vector of a fully connected layer. Finally, we obtain the output of the dense layer as the predicted value. The main components of the GRU-HA are described in detail as follows.

GRU
With the GRU-HA, we first use the GRU to learn the history states of the input data as follows:

h_t = \mathrm{GRU}(x_t, h_{t-1}),

where x_t denotes the total signal of each group at time step t and h_t denotes the hidden state, which contains the sequential dependence learned by the GRU. The structure of the GRU is shown in Figure 3.
At time step t, the activation h_t^j of the j-th GRU unit is a linear interpolation between the previous activation h_{t-1}^j and the candidate activation \tilde{h}_t^j:

h_t^j = (1 - z_t^j)\, h_{t-1}^j + z_t^j\, \tilde{h}_t^j,

where the update gate z_t^j controls how much the unit updates its information. The update gate is defined as follows:

z_t^j = \sigma(W_z x_t + U_z h_{t-1})^j.

The candidate activation \tilde{h}_t^j is calculated as follows:

\tilde{h}_t^j = \tanh\big(W x_t + U (r_t \odot h_{t-1})\big)^j,

where r_t is the reset gate, which is defined as follows:

r_t^j = \sigma(W_r x_t + U_r h_{t-1})^j.
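One GRU step can be checked against the gate equations with a minimal numpy sketch. The weight names follow the symbols in the equations; the random initialization and dimensions are purely illustrative, not the paper's configuration.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gru_step(x_t, h_prev, p):
    """One GRU update: update gate z_t, reset gate r_t, candidate activation,
    and the linear interpolation h_t = (1 - z_t) * h_prev + z_t * h_cand."""
    z = sigmoid(p["Wz"] @ x_t + p["Uz"] @ h_prev)            # update gate
    r = sigmoid(p["Wr"] @ x_t + p["Ur"] @ h_prev)            # reset gate
    h_cand = np.tanh(p["W"] @ x_t + p["U"] @ (r * h_prev))   # candidate activation
    return (1.0 - z) * h_prev + z * h_cand

rng = np.random.default_rng(0)
d_in, d_h = 3, 8
params = {
    "Wz": 0.1 * rng.standard_normal((d_h, d_in)),
    "Uz": 0.1 * rng.standard_normal((d_h, d_h)),
    "Wr": 0.1 * rng.standard_normal((d_h, d_in)),
    "Ur": 0.1 * rng.standard_normal((d_h, d_h)),
    "W":  0.1 * rng.standard_normal((d_h, d_in)),
    "U":  0.1 * rng.standard_normal((d_h, d_h)),
}
h = np.zeros(d_h)
for _ in range(5):                      # run a few steps on random inputs
    h = gru_step(rng.standard_normal(d_in), h, params)
# tanh candidates and the convex interpolation keep the state bounded in (-1, 1)
assert h.shape == (d_h,) and np.all(np.abs(h) < 1.0)
```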


History Attention
The hidden state h t learned by the GRU can represent the sequential dependencies of x t , but cannot effectively learn the global dependencies. To enhance the efficiency of dependency learning, we combine the GRU with history attention.
We define a query vector q_T for the last hidden state h_T:

q_T = W_q h_T.

We then define a key vector k_t and a value vector v_t for all hidden states h_t:

k_t = W_k h_t, \quad v_t = W_v h_t,

where W_q, W_k, and W_v are the parameters to learn. Then, following Luong's multiplicative style, the attention score is computed as follows:

s_t = q_T^{\top} k_t.

Next, the attention scores are used to calculate the attention weights:

\alpha_t = \frac{\exp(s_t)}{\sum_{t'=1}^{T} \exp(s_{t'})}.

Finally, the history-state attention representation can be calculated as follows:

c = \sum_{t=1}^{T} \alpha_t v_t.
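The history attention computation above can be sketched in a few lines of numpy. This is a hedged illustration of Luong-style multiplicative attention with a last-state query, not the paper's TensorFlow layer; the shapes and initialization are assumptions.

```python
import numpy as np

def softmax(s):
    e = np.exp(s - np.max(s))   # shift for numerical stability
    return e / e.sum()

def history_attention(H, Wq, Wk, Wv):
    """Multiplicative attention over all hidden states H (shape T x d):
    the query comes from the last state; keys and values from every state."""
    q = Wq @ H[-1]              # q_T = W_q h_T
    K = H @ Wk.T                # rows are k_t = W_k h_t
    V = H @ Wv.T                # rows are v_t = W_v h_t
    scores = K @ q              # s_t = q_T . k_t
    alpha = softmax(scores)     # attention weights
    return alpha @ V, alpha     # context c = sum_t alpha_t v_t

rng = np.random.default_rng(0)
T, d = 6, 4
H = rng.standard_normal((T, d))
Wq, Wk, Wv = (0.5 * rng.standard_normal((d, d)) for _ in range(3))
context, alpha = history_attention(H, Wq, Wk, Wv)
assert context.shape == (d,)
assert np.isclose(alpha.sum(), 1.0) and np.all(alpha >= 0)
```

In the full GRU-HA, `context` would be concatenated with the last hidden state and fed to a dense output layer.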

Experiments
In this section, we empirically evaluate the proposed model using the U.S. stock market and the China stock market. All the experiments are performed in Python 3.6.12. Specifically, the experiments related to neural network models and attention models are implemented using the end-to-end deep learning platform TensorFlow. The decomposition models are implemented using PyEMD and pyhht.

Data
The U.S. stock market data were collected from the Trade and Quote database (TAQ) of Wharton Research Data Services (WRDS). The Standard and Poor's 500 (SP500) was chosen to represent the U.S. stock market. We used the price of SPY, the actively traded SP500 ETF, to represent the SP500. The U.S. stock market opens at 9:30 and closes at 16:00 Eastern Time. Every trading day has 13 half-hour intervals. The China stock market data were obtained from the Wind database. The China Securities Index 300 (CSI300) typically represents the overall China stock market. Chinese stock exchanges operate from 9:30 to 11:30 and from 13:00 to 15:00. Every trading day has eight half-hour intervals. The sample period spans from January 2015 through December 2020, a period which covers the well-known market event, the COVID-19 pandemic. The data are collected every half-hour during stock market trading hours.
We selected the last 20% of the sample for testing (out-of-sample data) and split the remaining data (in-sample data) into a training set and a validation set in a ratio of 9:1.
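The chronological 80/10/10 split described above can be sketched as follows; the function name and the rounding of split points are illustrative assumptions.

```python
import numpy as np

def chrono_split(series, test_frac=0.2, val_ratio=0.1):
    """Hold out the last test_frac of the series for testing, then split the
    remaining in-sample data into training and validation sets at 9:1,
    preserving chronological order (no shuffling)."""
    n = len(series)
    n_test = int(n * test_frac)
    in_sample = series[: n - n_test]
    n_val = int(len(in_sample) * val_ratio)
    train = in_sample[: len(in_sample) - n_val]
    val = in_sample[len(in_sample) - n_val:]
    test = series[n - n_test:]
    return train, val, test

x = np.arange(1000)                      # stand-in for a half-hourly price series
train, val, test = chrono_split(x)
assert (len(train), len(val), len(test)) == (720, 80, 200)
# chronological order is preserved across the split boundaries
assert train[-1] < val[0] < test[0]
```

Keeping the split chronological matters for financial series: shuffling would leak future information into the training set.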

Decomposing and Denoising
As mentioned before, we used CEEMD to decompose the raw intraday stock market signals into IMFs. The number of trials was set to 200, and the white noise standard deviation was set to 0.2. The decomposition results are shown in Figure 4. The arrangement of IMFs is from high to low frequency, and the residual occurs at the end.
To improve the signal-to-noise ratio of the intraday stock data, we use entropy-based denoising to remove the irregular noise. Following previous research [34], the parameters of the entropy algorithms are set to m = 2 and r = 0.2E_SD, where E_SD is the standard deviation of the IMFs. The ApEn and SampEn values of the IMFs are shown in Table 1. Different IMFs possess different information entropy values, which means that different degrees of complexity exist at the diverse decomposition levels. The ApEn and SampEn values gradually decrease for both the SP500 and the CSI300, so the irregularity of the IMFs gradually decreases. We set the noise thresholds as λ ApEn = 1 and λ SampEn = 0.6 for the two entropies. A decomposed sub-signal is retained only if its ApEn value is less than 1 and its SampEn value is less than 0.6. Accordingly, IMF1 and IMF2 are discarded as noise, and IMF3-IMF11 are kept as useful information for training.
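The retention rule reduces to a simple boolean mask over the per-IMF entropy values. The numbers below are placeholders that merely decrease with frequency, as in Table 1 of the paper; they are not the paper's measured values.

```python
import numpy as np

# Hypothetical ApEn/SampEn values for 11 IMFs, decreasing from high to low frequency.
apen   = np.array([1.8, 1.3, 0.9, 0.7, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05, 0.01])
sampen = np.array([1.5, 0.9, 0.5, 0.4, 0.3, 0.2, 0.15, 0.1, 0.05, 0.02, 0.01])

# Keep an IMF only if BOTH entropies fall below their noise thresholds.
keep = (apen < 1.0) & (sampen < 0.6)
retained = np.flatnonzero(keep) + 1          # 1-based IMF indices
# With these placeholder values, IMF1 and IMF2 are discarded, mirroring the text.
assert list(retained) == [3, 4, 5, 6, 7, 8, 9, 10, 11]
```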

Grouping and Forecasting
After denoising, we divide the retained IMFs into several groups and use different models to predict them.

Grouping
The IMFs are grouped based on their characteristics. The standard deviation values are shown in Table 1. For the U.S. stock market, the standard deviations of IMF3 and IMF4 are relatively small, both less than 0.5. This reflects the fluctuations caused by short-term news. Therefore, IMF3 and IMF4 are used as the high-frequency group. The amplitudes of IMF5-IMF8 increase gradually, with the standard deviations concentrated in the 0.7-2.5 range, and the period of IMF5-IMF8 becomes longer, reflecting the fluctuations caused by medium-term market factors. Therefore, IMF5-IMF8 are regarded as the medium-frequency group. The standard deviations of IMF9 and IMF10 are close to each other, around 5.7, and the periods become longer still. This represents fluctuations caused by special events, such as the COVID-19 pandemic. Therefore, IMF9 and IMF10 are regarded as the low-frequency group. IMF11, the residual term after decomposition, represents the long-term economic situation of the U.S. stock market and is regarded as the trend group. As with the U.S. stock market, the retained IMFs of the China stock market are divided into four groups. IMF3, IMF4-IMF7, IMF8-IMF10, and IMF11 are regarded as the high-frequency group, the medium-frequency group, the low-frequency group, and the trend group, respectively.
The different groups reflect the volatility of the stock market in different financial cycles. For the U.S. stock market, the fourth part of Figure 5a shows that there is an overall growth trend during 2015-2019, but that the growth rate slows down significantly in 2018 and 2019. As can be seen in the third part, there is a relatively stable trend from 2015 to 2017. The market fluctuates greatly in 2018 and 2019. In 2020, the market resembles a rollercoaster, influenced by the COVID-19 pandemic. The first and second parts of Figure 5a show that there were extreme short-term fluctuations in 2020.
For the China stock market, as is shown in the fourth part of Figure 5b, the stock market has been on an upward trend from 2015 to 2019. Compared with the U.S. stock market, the China stock market displays a steeper curve, reflecting the rapid growth of the Chinese capital market. The fluctuations of the low-frequency group reflect the influences of long-term market factors. As illustrated in the third part of Figure 5b, the stock market in 2015-2016 experiences a rapid growth followed by a huge recession, which corresponds to the stock market crash in 2015. Moreover, the first and second parts of Figure 5b show that this recession is accompanied by intense fluctuations, and the rise in fluctuations was due to the government's bailout policy, which did not reverse the bear market trend. This decline was accompanied by the participation of a large amount of leveraged funds, which also led to a greater fluctuation and a stronger downtrend.

Forecasting
Before training, the input data had to be rescaled to the range of 0 to 1. We use the min-max scaler to normalize the grouped IMFs as follows:

\tilde{x} = \frac{x - x_{\min}}{x_{\max} - x_{\min}}. \quad (30)

After prediction, the output values were converted back to their original scale by reversing Equation (30). Table 2 shows the hyperparameters of the forecasting models. As mentioned before, we select different models for the groups according to their characteristics, since predicting a complex group is much more difficult than predicting a relatively regular one. As is shown in Figure 1, the high-frequency group and the medium-frequency group are predicted using the GRU-HA, and the low-frequency group and the trend group are predicted using the FNN. The FNN is composed of dense layers, i.e., regular densely connected neural network layers. The GRU-HA consists of GRU layers, attention layers, and dense layers. The grid search method is employed to determine the appropriate network structure, including the number of hidden layers and the number of units. For the hidden layers, we apply the rectified linear unit (ReLU) activation function, ReLU(x) = max(0, x). For training, we use Adam [37], a stochastic gradient descent method based on adaptive moment estimation, with its default learning rate of 0.001. As the loss function, we use the mean squared error (MSE) between the labels and the predicted values:

\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2.

The number of epochs is the number of complete passes through the entire training set. Since too many epochs could lead to overfitting, we use early stopping as regularization: the training loop checks at the end of every epoch whether the loss has stopped decreasing. The maximum number of epochs is set to 500.
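The min-max normalization and its inverse can be sketched as follows. Function names are illustrative; note that in a real pipeline the min/max bounds should be computed on the training set only, to avoid leaking test-set information.

```python
import numpy as np

def minmax_scale(x):
    """Equation (30): x' = (x - x_min) / (x_max - x_min), mapping x into [0, 1]."""
    x_min, x_max = x.min(), x.max()
    return (x - x_min) / (x_max - x_min), (x_min, x_max)

def minmax_invert(x_scaled, bounds):
    # Reverse Equation (30) to recover the original scale after prediction.
    x_min, x_max = bounds
    return x_scaled * (x_max - x_min) + x_min

x = np.array([3.0, 7.5, 1.2, 9.9, 5.0])
x_scaled, bounds = minmax_scale(x)
assert x_scaled.min() == 0.0 and x_scaled.max() == 1.0
# the round trip recovers the original values exactly
assert np.allclose(minmax_invert(x_scaled, bounds), x)
```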

Results and Discussion
To test the forecasting performance of our model on the intraday stock market, we conducted experiments using the proposed CEGH model and the baseline methods. In this section, we first define the evaluation metrics for the forecasting models. The baseline methods are then described. Last, we report the results and discuss the findings in two phases: group results and ensemble results.

Evaluation Metric
The evaluation metric indicates the difference between the predicted value and the actual value. In our study, the mean absolute error (MAE) and root mean squared error (RMSE) are used as metrics to evaluate the forecasting models, and they can be defined as follows:

\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} |\hat{y}_i - y_i|,

\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (\hat{y}_i - y_i)^2},

where \hat{y}_i is the predicted value and y_i is the actual value. The lower these two error values, the better the forecasting performance.
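The two metrics can be computed directly; a small worked example: for y = [1, 2, 3] and ŷ = [1, 2, 5], the absolute errors are [0, 0, 2], so MAE = 2/3 and RMSE = √(4/3).

```python
import numpy as np

def mae(y_true, y_pred):
    # Mean absolute error: average magnitude of the prediction errors.
    return np.mean(np.abs(np.asarray(y_pred) - np.asarray(y_true)))

def rmse(y_true, y_pred):
    # Root mean squared error: penalizes large errors more heavily than MAE.
    return np.sqrt(np.mean((np.asarray(y_pred) - np.asarray(y_true)) ** 2))

y_true = [1.0, 2.0, 3.0]
y_pred = [1.0, 2.0, 5.0]
assert np.isclose(mae(y_true, y_pred), 2.0 / 3.0)
assert np.isclose(rmse(y_true, y_pred), np.sqrt(4.0 / 3.0))
```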

Baseline Methods
The proposed CEGH model is compared with the following baseline models:
Feedforward neural network (FNN): the classical artificial neural network in which the connections between nodes do not form a loop.
Long short-term memory (LSTM) [38]: a widely used extension of the recurrent neural network (RNN) with three logic gates.
Gated recurrent unit (GRU) [39]: a variant of the LSTM with a simpler unit structure.
GRU-HA: a hybrid model that combines a GRU and history attention.
CEEMD-GRU-HA: a hybrid model using CEEMD, a GRU, and HA. Compared with the CEGH, the CEEMD-GRU-HA does not have an entropy-based denoising stage.
The FNN, LSTM, GRU, and GRU-HA models are undecomposed models. The CEEMD-GRU-HA model is a decomposition-based model. Specifically, the GRU-HA and CEEMD-GRU-HA models are proposed in this paper.

Group Results
As mentioned in Section 3.3, we divided the IMFs into a high-frequency group, a medium-frequency group, a low-frequency group, and a trend group. After that, the total signal of each group was predicted. Using the method described above, we obtained the group results. Table 3 shows the performance comparison for the groups in Stage 3. The control group was composed of IMF1, IMF2, and IMF3. The total signals of the different groups have different characteristics and degrees of complexity; training the models on groups with low complexity is easier than training them on IMFs with high complexity. For the control group, the high-frequency group, and the medium-frequency group, we used the FNN, LSTM, and GRU models as baseline methods. The trend group and low-frequency group were predicted directly using the FNN due to the relatively low complexity of their patterns. Some interesting aspects of this table are highlighted below. (1) The recurrent neural network models performed better than the FNN, a finding which is consistent with results obtained in previous studies [8]. For the high-frequency U.S. stock market group, the MAE values of the FNN, LSTM, and GRU models were 0.0774, 0.0312, and 0.0308, respectively. For the high-frequency China stock market group, the MAE values of the FNN, LSTM, and GRU models were 1.1830, 0.6829, and 0.4744, respectively. These results indicate that, of the two RNN variants, the GRU model performed better than the LSTM model, a finding which is consistent with a previous study [9]. It may be reasonable to suppose that the GRU is a better choice when the size of the training sample is limited.
(2) The GRU-HA performed better than the GRU. For the high-frequency U.S. stock market group, the MAE value of the GRU-HA was 18% lower than that of the GRU. The MAE value of the GRU-HA was as much as 80% lower than that of the GRU for the medium-frequency group. For high-frequency China stock market group, the MAE value of the GRU-HA was 7% lower than that of the GRU. The MAE value of the GRU-HA was 35% lower than that of the GRU for the medium-frequency group. A similar descending trend was also true for the RMSE value. These results imply that adding attention to the GRU enhances the prediction performance. Moreover, the performance improvement of the medium-frequency group was much larger than that of the high-frequency group. The high-frequency group retains only IMF3 after denoising. The GRU is sufficient for identifying this sequential pattern of relatively low complexity. The medium-frequency group composed of multiple IMFs is more complex. It may be inferred that the GRU-HA has more significant advantages for complex time series prediction.

Ensemble Results
In our study, the ensemble results were obtained by integrating the prediction results of the high-frequency group, the medium-frequency group, the low-frequency group, and the trend group, as explained in the previous subsection. The baseline method, CEEMD-GRU-HA, integrated the results of the above groups. The FNN, LSTM, GRU, and GRU-HA models were undecomposed models, as the original intraday stock data were predicted by these models directly. Table 4 shows the performance comparison of the CEGH and baseline methods for the intraday stock market. In general, the proposed model had the lowest forecasting error, which verified its effectiveness. Some interesting observations are highlighted below. (1) Specifically, the decomposition-based models outperformed the undecomposed models, which is consistent with results obtained in previous studies [27,40,41]. For the U.S. stock market, the MAE values of the decomposition-based models were about 20% lower than that of the best-performing undecomposed model, the GRU-HA. For the China stock market, the MAE values of the decomposition-based models were about 40% lower than that of the best-performing undecomposed model, the GRU-HA. The RMSE values exhibited a similar descending trend. These results indicate that such hybrid methods could extract the sequential representation more efficiently. A possible explanation for this is that the decomposition method transforms the original intraday stock time series into several relatively regular components, and time series with low complexity are much easier to predict.
(2) According to the results for the undecomposed models in Table 4, the GRU-HA had the lowest error values, which is consistent with the group results in Section 4.3. The results indicate that the RNN models performed better than the FNN. They also imply that the idea of introducing history attention to the GRU for intraday stock market forecasting is effective. Compared with the existing models that only use RNNs to learn the sequential dependence in stock time series, the GRU-HA extracted global dependence more comprehensively.
(3) According to the results for the decomposition-based models in Table 4, the CEGH outperformed the CEEMD-GRU-HA. As can be seen, the MAE errors of the CEGH were approximately 7% lower than those of the CEEMD-GRU-HA for both the U.S. and China stock markets. Specifically, from the comparison of the control group and the high-frequency group in Table 3, it can be seen that the errors of the high-frequency group were much lower than those of the control group. This could be attributed to the entropy-based denoising stage of the CEGH. ApEn and SampEn can measure the complexity of stock time series to recognize the noisy components in IMFs. This is an effective way to improve forecasting performance because it increases the signal-to-noise ratio before training the model. Furthermore, Figure 6 shows brief graphs of the ensemble results for a better comparison. In these figures, the blue lines represent the actual values, the red lines indicate the predicted values, and the green lines express the errors between the actual values and the predicted values. As we can see, the proposed CEGH model had a poor performance during the COVID-19 outbreak. Figure 6a indicates that the absolute values of the errors became significantly larger from February to March 2020. During this time, the U.S. stock market fell sharply due to the impact of the COVID-19 pandemic. As we can see in Figure 6b, the absolute values of the errors were extremely large in early February. For the China stock market, the errors were highly volatile when the market fell again around March 2020. It is noteworthy that the stock market was closed due to the Chinese New Year holiday at the end of January. This may imply that the proposed model is not good at learning from the limited historical data of intraday stock markets under the influence of black swan events.

Conclusions
We proposed a hybrid CEGH model to better analyze and predict intraday stock markets. This model combined frequency decomposition, entropy, GRU, and history attention. Based on the results and discussion of experiments on the U.S. and China stock markets, the following conclusions can be drawn: (1) The decomposition-based models outperformed the undecomposed models, demonstrating the advantages of CEEMD. This could be due to the fact that CEEMD reduced the complexity of the intraday stock time series. (2) The GRU-HA performed better than the other individual models. This implies that introducing history attention to the GRU for intraday stock market forecasting is effective, as the history attention allows it to extract global dependence more comprehensively. (3) The prediction error of the CEGH was smaller than that of the CEEMD-GRU-HA. This indicates that the entropy-based denoising section of the CEGH improved the forecasting performance. An explanation could be that ApEn and SampEn measured the complexity of the stock time series to recognize the noisy components in IMFs, which allowed us to increase the signal-to-noise ratio before training the model. (4) The CEGH performed poorly at the beginning of the COVID-19 pandemic, since it is difficult to learn patterns from limited historical training data during black swan events. This suggests that machine learning forecasting models should not be blindly relied on when extreme events occur in the stock market.
Overall, our results provide compelling evidence for the effectiveness of the CEGH in intraday stock market forecasting. However, some limitations are worth noting: (1) the stock forecasting model was only evaluated using error values; (2) the model did not account for changes in stock market behavior driven by the broader economic situation; (3) the experiments were only conducted on the U.S. and China stock markets. Future work could investigate the following: (1) incorporating trading strategies and profit-based evaluation; (2) combining adaptive learning methods with trading rules based on financial concepts; (3) extending the experiments to European or other stock markets.