Predicting the Trend of Stock Market Index Using the Hybrid Neural Network Based on Multiple Time Scale Feature Learning

Abstract: In the stock market, predicting the trend of price series is one of the most widely investigated and challenging problems for investors and researchers. Financial time series contain features on multiple time scales, owing to the different durations of impact factors and of traders' trading behaviors. In this paper, we propose a novel end-to-end hybrid neural network based on multiple time scale feature learning to predict the price trend of the stock market index. Firstly, the hybrid neural network extracts two types of features on different time scales through the first and second layers of a convolutional neural network (CNN); together with the raw daily price series, they reflect relatively short-, medium-, and long-term features of the price sequence. Secondly, considering the time dependencies existing in the three kinds of features, the proposed hybrid neural network leverages three long short-term memory (LSTM) recurrent neural networks to capture such dependencies, respectively. Finally, fully connected layers are used to learn joint representations for predicting the price trend. The proposed hybrid neural network demonstrates its effectiveness by outperforming benchmark models on a real dataset.


Introduction
The trend of the stock market index refers to the upward or downward movement of the price series in the future. Accurately predicting the trend of the stock market index can help investors avoid risks and obtain higher returns in the stock exchange [1]. Hence, it has become an active research field and attracted many researchers' attention.
Existing studies were mostly conducted on single time scale features of the stock market index, but it is also meaningful to study multiple time scale features, which are inherent to the stock market index. On the one hand, the stock market is affected by many factors, such as the economic environment, political policy, industrial development, market news, and natural factors, and the durations of these factors differ from each other. On the other hand, each investor has a different investment cycle, such as long- and short-term investment. Therefore, we can observe features on multiple time scales in the stock market index. Among them, features on a long time scale can reflect the long-term trend of the price, while features on a short time scale reflect the short-term fluctuation of the price. The combination of multi-scale features facilitates accurate prediction.
In recent years, the convolutional neural network (CNN) has shown high power in feature extraction. Inspired by existing research, we use a CNN to extract multiple time scale features for a more comprehensive learning of price sequences. For instance, the daily closing price series in Figure 1a is learned by a two-layer convolutional neural network. The outputs of the two layers of the CNN are called Feature map1 and Feature map2, respectively. As illustrated in Figure 1b, each point of a feature map corresponds to a region of the original price series (termed the receptive field [23]) and can be considered a description of that region. Due to their different receptive fields, Feature map1 and Feature map2 describe the input price series on two time scales. Compared with Feature map1, Feature map2 describes the original price sequence on a larger time scale. Therefore, we regard the outputs of different layers of the CNN as features on varying time scales of the original price series. Meanwhile, the Long Short-Term Memory (LSTM) network works well on sequence data with long-term dependencies due to its internal memory mechanism [24,25]. Many studies use LSTM networks to learn the long-term relationships of features extracted by a CNN. In this way, we utilize LSTMs to learn long-term dependencies of the multiple time scale feature sequences obtained by the CNN.
In this paper, we present a novel end-to-end hybrid neural network to learn multiple time scale features for predicting the trend of the stock market index. The network first combines the features obtained from different layers of the CNN with daily price subsequences to form multiple time scale features, reflecting the short-, medium-, and long-term laws of price series. Subsequently, three LSTM recurrent neural networks are utilized to capture time dependencies in the multiple time scale features obtained in the previous step. Then, several fully connected layers combine the features learned by the LSTMs to predict the trend of the stock market index. Experimental analysis on a real dataset demonstrates that the proposed hybrid network outperforms a variety of baselines in terms of trend prediction accuracy.
The rest of the paper is organized as follows. Section 2 presents related work. Section 3 presents the proposed hybrid neural network based on multiple time scale features of the stock market index. Section 4 reports experiments and results, and Section 5 concludes the paper.

Related Work
From traditional methods to deep learning models, there are numerous techniques for forecasting financial time series, among which deep learning has received widespread attention due to its superior performance.
LSTM is the most preferred deep learning model in studies predicting financial time series. LSTM can extract dependencies in time series through its internal memory mechanism. In [11], LSTM networks were used to predict out-of-sample directional movements for the constituent stocks of the S&P 500, and were found to outperform memory-free classification methods. Si et al. [12] constructed a trading model for the Chinese futures market through DRL and LSTM: deep neural networks were used to discover market features, and an LSTM was then applied to make continuous trading decisions. Bao et al. [13] used wavelet transforms and stacked autoencoders to learn useful information from technical indicators, and used LSTMs to learn time dependencies for forecasting stock prices. In [14], limit order book and history information were input to an LSTM model to determine stock price movements. Tsantekidis et al. [15] utilized the limit order book and an LSTM model for trend prediction. These works prove that LSTMs can successfully extract time dependencies in financial sequences. However, they do not consider the multiple time scale features in the price series.
Several studies focused on utilizing CNN models, inspired by their remarkable achievements in other fields such as image recognition [26], speech processing [27], and natural language processing [28]. Convolutional neural networks can directly extract features from the input without sophisticated preprocessing and can efficiently process various kinds of complex data. Chen et al. [16] used a one-dimensional CNN with an agent-based RL algorithm to study Taiwan stock index futures. In [17], Siripurapu et al. converted price sequences into pictures and then used a CNN to learn useful features for prediction. In [18], a new CNN model was proposed to predict the trend of stock prices; correlations between instances and features are utilized to order the features before they are presented as inputs to the CNN. These works use CNNs to extract features on a single time scale of the price series. But in financial time series, multiple time scale features are ubiquitous, and it is meaningful to study them.
Besides, some studies combine the advantages of CNNs and LSTMs to form hybrid networks. In [19], the proposed model makes the stock selection strategy using a CNN and then makes the timing strategy using an LSTM. Wang et al. [20] proposed a Deep Co-investment Network Learning (DeepCNL) model, which combines convolutional and recurrent neural layers. Both [19,20] take advantage of the combination of CNN and LSTM; however, they ignore the multiple time scale features that exist in financial time series. In [21], numerous pipelines combining a CNN and a bi-directional LSTM were built for improved stock market index prediction. In [22], both convolutional and recurrent neurons are integrated to build a multi-filter structure, so that information from different feature spaces and market views can be obtained. Although the authors of [21,22] proposed models based on multi-scale features, they used multiple pipelines or networks to extract them, which makes the models complex and vast and is not conducive to training or to obtaining useful information.
Differing from previous work, we propose a hybrid neural network that mainly focuses on multiple time scale features in financial time series for trend prediction. We innovatively use a CNN to extract features on multiple time scales, simplifying the model and facilitating better predictions. Then we use several LSTMs to learn time dependencies in feature sequences extracted by the CNN, and fully connected layers for higher-level feature abstraction.

Hybrid Neural Network Based on Multiple Time Scale Feature Learning
In this section, we provide the formal definition of the trend learning and forecasting problem. Then, we present the proposed hybrid neural network based on multiple time scale feature learning.

Problem Formulation
In the stock market, there are 5 trading days in a week and 20 trading days in a month. Investors are usually interested in price movements after a week or a month. Therefore, we use a series of closing prices over 40 consecutive days to predict the trend of the closing price n trading days later, where n is 5 (a week) or 20 (a month). Formally, we define the sequence of historical closing prices as X_i = (x_{i+1}, x_{i+2}, · · · , x_{i+t}, · · · , x_{i+40}), where x_{i+t} is the value of the closing price on the (i + t)-th day. Meanwhile, the upward or downward trend to be predicted is defined by the following rule: Y_i = 1 if x_{i+40+n} > x_{i+40}, and Y_i = 0 otherwise, where Y_i denotes the trend of the closing price a week (n = 5) or a month (n = 20) later, 0 represents the downward trend, 1 represents the upward trend, x_{i+40} is the closing price on the (i + 40)-th day, and x_{i+40+n} is the closing price on the (i + 40 + n)-th day. We then aim to propose a hybrid neural network to learn a function f(X) that predicts the price trend one week or one month later.
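As a sketch, building one training instance from a list of closing prices can look like the following (the function and variable names are ours, introduced only for illustration; indices are 0-based, so the window end x_{i+40} corresponds to prices[i + 39]):

```python
def make_instance(prices, i, n):
    """Build (X_i, Y_i) from a list of daily closing prices.

    X_i: 40 consecutive closing prices starting at list index i.
    Y_i: 1 if the closing price n trading days after the window end
         is higher than the price at the window end, else 0.
    """
    X_i = prices[i:i + 40]                    # 40-day input window
    last = prices[i + 39]                     # closing price at the window end
    future = prices[i + 39 + n]               # closing price n days later
    Y_i = 1 if future > last else 0
    return X_i, Y_i

# A strictly rising toy series must yield an upward label (Y = 1)
prices_up = [float(p) for p in range(60)]
X, Y = make_instance(prices_up, 0, 5)
```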

Hybrid Neural Network Based on Multiple Time Scale Feature Learning
In this part, we present an overview of the proposed hybrid neural network based on multiple time scale feature learning for the trend forecasting. Then, we detail each component of the hybrid neural network.

Overview
The idea of the proposed hybrid neural network is divided into three parts. The first part is to extract the characteristics of different time scales of the price series through different layers of a CNN, and combine them with the original daily price series to reflect the relatively short-, medium-, and long-term changes in the price sequence, respectively. The second part is to use multiple LSTMs to learn the time dependencies of the features of different time scales. The last part is to combine all the information learned by the LSTMs through a fully connected neural network to forecast the trend of the closing price in the future. Though the hybrid neural network is composed of different kinds of network architectures, it can be jointly trained with one loss function. Figure 2 shows the structure of the hybrid neural network, which can be viewed as a combination of three models based on single time scale feature learning. The three models are shown in Figure 3. Next, we will introduce each part of the proposed model in detail.
Figure 2. The proposed hybrid neural network based on multiple time scale feature learning.

Multiple Time Scale Feature Learning
There are multiple time scale features in the stock market index sequence, and combining these features can help to predict the price trend more accurately. In this paper, we research the internal laws of price movement from three time scales. On the one hand, the daily price subsequence, represented by F 1 , can be regarded as the feature sequence of the minimum time scale. It can reflect local price changes, which are vital to the prediction. On the other hand, the outputs of different layers of the CNN describe the original price series on different time scales, and these outputs can be regarded as features of different time scales. Since the CNN has two layers, we obtain features on two different time scales, represented as F 2 and F 3 . In this way, we obtain three kinds of features, F 1 , F 2 and F 3 , which can reflect relatively short-, medium-, and long-term trend changes, respectively.

Learning the Dependencies in Multiple Time Scale Features
We use three LSTMs to learn the time dependencies in features of different time scales. A map-to-sequence layer first converts the feature maps extracted by the CNN into feature sequences suitable for the LSTMs. As shown in Figure 4, the feature maps represent features learned by the CNN; different colors indicate feature maps obtained by different convolution kernels. The points in each feature map are arranged chronologically from left to right. The feature sequence is the input of the LSTM, and the feature vector in the feature sequence is denoted by fv_t, where the subscript t corresponds to its order in the sequence. Each feature vector is generated from left to right over the feature maps by column; that is, the i-th feature vector is the concatenation of the i-th columns of all the maps. Each LSTM network learns the time dependencies in its corresponding feature sequence, and the process is described as follows. In the LSTM, each cell has three main gates: the input gate, the forget gate, and the output gate. Suppose that the input feature vector at time t is fv_t and the hidden state at the previous time step is h_{t−1}. The input gate i_t is calculated by i_t = σ(W_i [h_{t−1}, fv_t] + b_i). The forget gate f_t is calculated by f_t = σ(W_f [h_{t−1}, fv_t] + b_f). The output gate o_t is calculated by o_t = σ(W_o [h_{t−1}, fv_t] + b_o). The principle of the memory mechanism is to control the addition of new information through the input gate and the forgetting of former information through the forget gate. The old information is represented by c_{t−1}, and the latest candidate information is calculated as c̃_t = tanh(W_c [h_{t−1}, fv_t] + b_c). The information stored in the memory unit is updated as c_t = f_t ⊙ c_{t−1} + i_t ⊙ c̃_t. Then the output of the LSTM cell is expressed as h_t = o_t ⊙ tanh(c_t), where ⊙ denotes the element-wise product.
After all the feature vectors have been processed, we use the final output h_t as the time dependencies learned by the LSTM. The time dependencies learned by the three LSTMs are denoted by D_1, D_2 and D_3, which are all one-dimensional vectors.
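A single LSTM cell update of the kind described here can be sketched in NumPy as follows (toy dimensions; stacking the four gate pre-activations into one weight matrix W, in the order input/forget/output/candidate, is our convention for compactness):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(fv_t, h_prev, c_prev, W, b):
    """One LSTM cell step: W maps [h_prev; fv_t] to the four gate pre-activations."""
    z = W @ np.concatenate([h_prev, fv_t]) + b
    n = h_prev.size
    i_t = sigmoid(z[0:n])            # input gate
    f_t = sigmoid(z[n:2 * n])        # forget gate
    o_t = sigmoid(z[2 * n:3 * n])    # output gate
    c_tilde = np.tanh(z[3 * n:4 * n])          # candidate information
    c_t = f_t * c_prev + i_t * c_tilde         # memory update
    h_t = o_t * np.tanh(c_t)                   # cell output
    return h_t, c_t

# Toy shapes: feature vector of size 3, hidden state of size 2
rng = np.random.default_rng(0)
W = rng.normal(size=(8, 5))
b = np.zeros(8)
h, c = lstm_step(rng.normal(size=3), np.zeros(2), np.zeros(2), W, b)
```

Because h_t = o_t ⊙ tanh(c_t) with both factors bounded, every component of the hidden state stays inside (−1, 1).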


Feature Fusion and Output
The concatenate layer is used to combine the output representations from the three LSTM recurrent neural networks. As shown in Figure 2, D_1, D_2 and D_3 are concatenated to form a joint feature. Then, this joint feature is fed to the fully connected layers to provide the trend prediction. Mathematically, the prediction of the hybrid neural network is expressed as Ŷ = φ(W (W_1 D_1 + W_2 D_2 + W_3 D_3) + b), where φ is the sigmoid activation function, W_1, W_2 and W_3 are the weights of the first fully connected layer, and W and b are the weights and bias of the second fully connected layer.
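A minimal NumPy sketch of this fusion step follows; it assumes one plausible reading of the prose, namely that W_1, W_2, W_3 act on the three LSTM outputs to form the joint representation before the final sigmoid layer (the exact equation form is an assumption, and the dimensions are illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict_trend(D1, D2, D3, W1, W2, W3, W, b):
    """First FC layer applies W1, W2, W3 to the three LSTM outputs;
    the second FC layer (weights W, bias b) yields the trend probability."""
    joint = W1 @ D1 + W2 @ D2 + W3 @ D3   # joint 10-unit representation
    return sigmoid(W @ joint + b)          # scalar probability in (0, 1)

rng = np.random.default_rng(1)
D1, D2, D3 = (rng.normal(size=10) for _ in range(3))
W1, W2, W3 = (rng.normal(size=(10, 10)) for _ in range(3))
p = predict_trend(D1, D2, D3, W1, W2, W3, rng.normal(size=10), 0.0)
```

A probability p above 0.5 would be read as an upward trend, below 0.5 as downward.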

Experiments and Results
In this section, we report experiments and detailed results to demonstrate the process of obtaining multiple time scale features and the advantage of the proposed model by comparing it to a variety of baselines.

Experimental Data
The S&P 500 index (Standard & Poor's 500 Index) is a market capitalization-weighted index of 500 large publicly traded U.S. companies, and it is widely used in scientific research. In this paper, we study the daily closing prices of the S&P 500 index from 30 January 1999 to 30 January 2019, a total of 20 years, obtained from the Yahoo Finance website [29].
Data normalization is required to transform raw time series data into an acceptable form for applying machine learning techniques. The normalization maps the raw closing price series into the interval [0, 1] according to the following formula: X̃ = (X − X_min) / (X_max − X_min), where X is the original closing price before normalization, X_min and X_max are the minimum and maximum values before normalization, respectively, and X̃ is the data after normalization. After normalization, data instances are built by combining historical closing prices and the target trend for each time series subsequence. We then take the samples from 30 January 1999 to 30 January 2015 as the training set, the samples from 1 February 2015 to 30 January 2017 as the validation set, and the remaining samples for testing.
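The min–max scaling above is straightforward to implement; a short sketch:

```python
import numpy as np

def min_max_normalize(x):
    """Scale a price series into [0, 1] using min-max normalization."""
    x = np.asarray(x, dtype=float)
    return (x - x.min()) / (x.max() - x.min())

# Toy closing prices: the minimum maps to 0, the maximum to 1
prices = [2500.0, 2600.0, 2550.0, 2700.0]
norm = min_max_normalize(prices)
```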

Models Based on Single Time Scale Features
• Model based on F 1 : As shown in Figure 3a, the model directly treats the daily price series as a relatively short-term feature sequence that is subsequently learned by an LSTM and fully connected layers.
• Model based on F 2 : As shown in Figure 3b, the model uses a convolutional neural network with one layer to extract the relatively medium-term features, and then predicts the price trend through an LSTM and fully connected layers.
• Model based on F 3 : As shown in Figure 3c, the relatively long-term features are extracted by a CNN with two layers. The model then uses an LSTM and fully connected layers to forecast the trend of the closing price.
• LSTM: Because of the time dependencies in financial time series, LSTM is often used in financial forecasting, as in [11,12,15]. We mainly adjusted the parameters L (number of network layers) and N (number of hidden units), selecting appropriate values from L ∈ {1, 2, 3} and N ∈ {10, 20, 30}.
• CNN: Similar to LSTM, CNN is also a common model in this field, as in [16–18]. We mainly adjusted the parameters L (number of network layers), S (convolution kernel size) and N (number of convolution kernels).
• MFNN: In [22], the authors proposed a novel end-to-end model named the multi-filters neural network (MFNN) specifically for prediction on financial time series. Both convolutional and recurrent neurons are integrated to build the multi-filters structure, so that information from different feature spaces and market views can be obtained.

Evaluation Metric
Generally, stock index trend prediction can be considered a classification problem. To evaluate the quality of predictions, we use Accuracy as the evaluation metric. Accuracy represents the proportion of correctly predicted samples among the total number of samples; the higher the Accuracy, the better the predictive performance of the model. Accuracy is calculated as follows: Accuracy = N_correct / N_all, where N_correct is the number of samples whose predicted trend matches the actual trend, and N_all is the total number of samples.
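In code, the metric reduces to counting matches between predicted and actual trend labels:

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted trend matches the actual trend."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)

acc = accuracy([1, 0, 1, 1], [1, 0, 0, 1])  # 3 of 4 correct -> 0.75
```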

Training
The proposed hybrid neural network includes a CNN to extract multiple time scale features, three LSTMs to learn time dependencies, and fully connected layers for higher-level feature abstraction. The CNN has two layers, each containing a 1-d convolution, activation, and pooling operation. The convolution operations of the two layers have 10 and 20 filters, respectively. Considering that the data we study is a one-dimensional price series, 1-d convolution is sufficient here. The activation function is LeakyReLU, and the pooling operation is max pooling with size 2 and stride 2. The number of LSTM units is 10, and the numbers of units in the subsequent fully connected layers are 10 and 1.
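The lengths of the feature sequences fed to the LSTMs follow from this architecture. A small sketch traces them for the 40-day window, assuming 'valid' (no-padding) stride-1 convolutions, which the paper does not state explicitly, and using the week-ahead kernel sizes (7 and 5) selected later in the experiments:

```python
def conv_len(n, k, stride=1):
    """Output length of a 'valid' 1-d convolution (padding is an assumption)."""
    return (n - k) // stride + 1

def pool_len(n, size=2, stride=2):
    """Output length of max pooling with size 2 and stride 2."""
    return (n - size) // stride + 1

n = 40                          # 40-day input window
n = pool_len(conv_len(n, 7))    # after layer 1 (10 filters, kernel 7)
L1 = n                          # length of the F2 feature sequence
n = pool_len(conv_len(n, 5))    # after layer 2 (20 filters, kernel 5)
L2 = n                          # length of the F3 feature sequence
```

Under these assumptions the F2 sequence has 17 time steps and the F3 sequence has 6, so each LSTM processes a short sequence of column vectors.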
Besides, the loss function is binary cross-entropy, and an Adam optimizer is used to train the network. The learning rate is initialized to 0.001 and decays [30] with the iteration according to Equation (11): lr = lr_0 / (1 + d · I),
where lr represents the learning rate, I is the current iteration, and d is the attenuation coefficient, which takes a value of 10^−6. The batch size is 80. We use early stopping to prevent the network from overfitting: training stops if the accuracy on the validation set has not improved for 70 epochs. In addition, we use dropout [31] to control the capacity of the network, and batch normalization [32] after the convolution operations to reduce internal covariate shift for better performance.
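Assuming the common iteration-based decay form lr_0 / (1 + d · I), which is consistent with the attenuation coefficient described here (the precise form of Equation (11) is our assumption), the schedule can be sketched as:

```python
def decayed_lr(lr0, d, iteration):
    """Iteration-based learning-rate decay: lr = lr0 / (1 + d * I)."""
    return lr0 / (1.0 + d * iteration)

lr_start = decayed_lr(0.001, 1e-6, 0)      # first iteration keeps the initial rate
lr_later = decayed_lr(0.001, 1e-6, 1000)   # slightly smaller after 1000 iterations
```

With d = 10^−6 the rate shrinks very slowly, which matches the intent of a gentle attenuation over training.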

Determination of Time Scales of Features
Considering that we predict the price trend by combining multiple time scale features (F 1 , F 2 and F 3 ), we need to determine appropriate time scales for these features in order to predict more accurately. Specifically, since we directly treat the daily data as the feature sequence (F 1 ), we only need to determine time scales for the features learned by the CNN (F 2 and F 3 ). The time scales of F 2 and F 3 correspond to the receptive fields, which are determined by the kernel sizes and strides of the max-pooling and convolution operations. Since we fix the other parameters at common values, we only need to adjust the kernel sizes of the two convolution layers to find appropriate time scales. The results are shown in Figure 5.

In Figure 5, we observe that when predicting the price trend one week later, the convolution kernel sizes of the two convolutional layers are preferably 7 and 5, respectively. Therefore, the time scales of F 2 and F 3 are 8 trading days and 18 trading days; that is, each point in F 2 is obtained from the closing price sequence of 8 trading days, and each point in F 3 is obtained from the closing price sequence of 18 trading days. Similarly, when predicting the price trend one month later, the sizes of the two convolution kernels are preferably 9 and 7. In this case, the time scales of F 2 and F 3 are 10 trading days and 24 trading days.
In addition, we find that the time scales of the features used to predict the price trend in a month are larger than those used to predict the price trend in a week. This is consistent with our experience: larger-grained data are used for long-term forecasting, and smaller-grained data for short-term forecasting.
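The receptive-field arithmetic behind these time scales can be checked with a short sketch, using the standard recurrence for stacked 1-d layers (stride-1 convolutions and 2/2 max pooling, as in the training setup):

```python
def receptive_field(layers):
    """Receptive field of a stack of 1-d layers.

    layers: list of (kernel_size, stride) pairs applied in order.
    r grows by (k - 1) * j per layer, where j is the cumulative stride.
    """
    r, j = 1, 1
    for k, s in layers:
        r += (k - 1) * j
        j *= s
    return r

# Week-ahead setting: conv k=7 -> pool 2/2 -> conv k=5 -> pool 2/2
week = [(7, 1), (2, 2), (5, 1), (2, 2)]
f2_week = receptive_field(week[:2])   # 8 trading days per point of F2
f3_week = receptive_field(week)       # 18 trading days per point of F3

# Month-ahead setting: conv k=9 -> pool 2/2 -> conv k=7 -> pool 2/2
month = [(9, 1), (2, 2), (7, 1), (2, 2)]
f2_month = receptive_field(month[:2])  # 10 trading days
f3_month = receptive_field(month)      # 24 trading days
```

The computed values reproduce the time scales reported above: 8 and 18 trading days for the weekly horizon, 10 and 24 for the monthly horizon.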

Comparisons with Models Based on Single Time Scale Features
In this part, we investigate the advantages of the proposed hybrid neural network in combining multiple time scale features. We compare the proposed network with models based on single time scale features in terms of accuracy. The results are reported in Table 1. The models based on F 1 , F 2 , and F 3 are the three models in Figure 3; they are all based on single time scale features. Table 1 shows that combining the features of multiple time scales promotes accurate prediction: the models based on F 1 , F 2 , and F 3 rely on features of a single time scale, while the proposed hybrid neural network combines the features of multiple time scales to predict the trend, and its predictive performance is better than that of the other networks. Therefore, combining the features of multiple time scales is helpful.
In the next group of experiments, taking forecasting price trends one week later as an example, we visualize the trend prediction using test samples, as shown in Figures 6-8. From these graphs, we can intuitively understand the benefits of combining features of multiple time scales.

(Table 1. Prediction accuracy (%): the proposed model achieves 74.55, compared with 73.64 for a model based on single time scale features.)
In Figure 6, we can see that, owing to short-term fluctuations, the model based on F1 sometimes fails to accurately predict the price trend one week ahead. Similarly, in Figure 7, the model based on F2 extracts medium-term features to predict the trend, but ultimately fails because it neglects the long- and short-term changes in the price series. In Figure 8, the model based on F3 attends only to the long-term characteristics of the price series and cannot predict the trend accurately. We can therefore conclude that, given the complexity and variability of price series, predicting the price trend from features of a single time scale is infeasible in some cases. It makes sense to combine the features of multiple time scales to predict the future price trend.



Figure 8. Visualization of the trend prediction by different models on test example 3.

Comparisons with Existing Models
Because of differences in data preprocessing, model training methods, and learning targets, it is difficult to compare directly with existing works. We therefore selected several models commonly used in financial forecasting and tuned each to its best configuration to make the comparisons relatively fair. The experimental results are shown in Table 2. From Table 2, the proposed hybrid neural network performs better than the other models for both the one-week and one-month forecast horizons. On the one hand, SVM is a machine learning model commonly used in financial forecasting, while CNN and LSTM are commonly used deep learning models. Compared with the Simplistic Model, these models can extract profitable information, but they learn features at only a single scale and thus ignore some useful information. On the other hand, the Multiple Pipeline Model and MFNN are models based on multi-scale feature learning for financial time series forecasting. They use separate branches or networks to extract features at different scales, which increases model complexity. In contrast, the proposed model uses a single CNN to extract features at different scales, which simplifies the model and yields more accurate predictions. We therefore conclude that the proposed hybrid neural network is superior to the models commonly used in existing works.

Discussion
In this paper, we propose a hybrid neural network based on multiple time scale feature learning for stock market index trend prediction. Because financial time series contain multi-scale features, it makes sense to combine them to predict future trends. First, the proposed model uses only one CNN to extract multiple time scale features, instead of using multiple networks as other models do, which simplifies the model and yields more accurate predictions. Second, the time dependencies within the multiple time scale features are learned by three LSTMs. Finally, the information learned by the LSTMs is fused through fully connected layers to predict the price trend.
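As an illustration only, the pipeline described above can be sketched in PyTorch as follows. The kernel sizes (7 and 5) follow the one-week setting reported earlier; the channel widths, hidden sizes, window length, and two-class output are assumptions for the sketch, not the paper's exact hyperparameters.

```python
import torch
import torch.nn as nn

class HybridNet(nn.Module):
    """Sketch: one CNN yields F2 and F3 as intermediate feature maps, while F1
    is the raw price series. Three LSTMs read the three scales and fully
    connected layers fuse them. Layer sizes are illustrative assumptions."""
    def __init__(self, hidden=32):
        super().__init__()
        self.block1 = nn.Sequential(nn.Conv1d(1, 16, kernel_size=7),
                                    nn.ReLU(), nn.MaxPool1d(2, 2))
        self.block2 = nn.Sequential(nn.Conv1d(16, 32, kernel_size=5),
                                    nn.ReLU(), nn.MaxPool1d(2, 2))
        self.lstm1 = nn.LSTM(1, hidden, batch_first=True)
        self.lstm2 = nn.LSTM(16, hidden, batch_first=True)
        self.lstm3 = nn.LSTM(32, hidden, batch_first=True)
        self.fc = nn.Sequential(nn.Linear(3 * hidden, 32), nn.ReLU(),
                                nn.Linear(32, 2))

    def forward(self, x):            # x: (batch, days) raw closing prices
        f1 = x.unsqueeze(1)          # (batch, 1, days): short-term scale
        f2 = self.block1(f1)         # medium-term scale features
        f3 = self.block2(f2)         # long-term scale features
        # Each LSTM consumes (batch, time, channels); keep its last hidden state.
        _, (h1, _) = self.lstm1(f1.transpose(1, 2))
        _, (h2, _) = self.lstm2(f2.transpose(1, 2))
        _, (h3, _) = self.lstm3(f3.transpose(1, 2))
        fused = torch.cat([h1[-1], h2[-1], h3[-1]], dim=1)
        return self.fc(fused)        # logits for the up/down trend classes

logits = HybridNet()(torch.randn(4, 60))  # a batch of 4 windows of 60 days
print(logits.shape)  # torch.Size([4, 2])
```

The single CNN produces both F2 and F3 as by-products of one forward pass, which is the simplification over multi-branch designs noted above.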
The experimental results demonstrate that such a hybrid network can indeed enhance predictive performance compared with the benchmark networks. Firstly, by comparing with the models based on F1, F2, and F3, we conclude that combining multiple time scale features promotes accurate prediction. Secondly, in comparison with the Simplistic Model, we find that the proposed model can learn valuable information. SVM, CNN, and LSTM all learn features in the price series at a single scale; however, financial time series contain multiple scales, which sometimes prevents these methods from making accurate predictions. Finally, both the Multiple Pipeline Model and MFNN are based on multi-scale feature learning, but they use several branches or networks to extract multi-scale features, making the network large and complex. The hybrid neural network we propose uses only one CNN to extract multi-scale features, simplifying the model and predicting more accurately.
However, the proposed model cannot accurately predict the trend for some data samples. There may be two reasons for this. First, the three time scale features may not be sufficient to capture the underlying dynamics of the price series. Second, these samples may be strongly affected by factors we have not considered, such as political policy, industrial development, and natural factors. This suggests directions for future work: we can use a deeper CNN to extract features at more scales for prediction, and we can extract useful information from more sources, such as macroeconomic indicators, news, and market sentiment.
Author Contributions: Y.H. proposed the basic framework of the hybrid neural network and completed the model construction, experimental research, and writing of the paper. Throughout the process, Q.G. provided guidance and many suggestions. All authors have read and agreed to the published version of the manuscript.