Genetic Algorithm-Optimized Long Short-Term Memory Network for Stock Market Prediction

Abstract: With recent advances in computing technology, massive amounts of data and information are constantly being accumulated. Especially in the field of finance, there are great opportunities to create useful insights by analyzing that information, because the financial market produces a tremendous amount of real-time data, including transaction records. Accordingly, this study intends to develop a novel stock market prediction model using the available financial data. We adopt a deep learning technique because of its excellent ability to learn from massive datasets. In this study, we propose a hybrid approach integrating a long short-term memory (LSTM) network and a genetic algorithm (GA). Heretofore, trial and error based on heuristics has commonly been used to estimate the time window size and architectural factors of an LSTM network. This research investigates the temporal property of stock market data by suggesting a systematic method to determine the time window size and topology of the LSTM network using GA. To evaluate the proposed hybrid approach, we chose daily Korea Stock Price Index (KOSPI) data. The experimental results demonstrate that the hybrid model of LSTM network and GA outperforms the benchmark model.


Introduction
With recent advances in computing technology, massive amounts of data and information are constantly being accumulated. Big data is being used as a key mechanism to support the innovation of artificial intelligence (AI) techniques, which have undergone rapid development in recent years, and it is expected to play an important role in improving social and environmental sustainability. This study adopts big data and AI techniques in the field of finance, in order to manage the potential risks of the financial market and help achieve socioeconomic sustainability.
In the field of finance, there are great opportunities to create useful insights by analyzing this information, because the financial market produces a tremendous amount of real-time data, including transaction records. Accordingly, this study intends to develop a novel stock market prediction model using the available financial data. We adopt a deep learning technique, since one of its main advantages is its excellent ability to learn from massive datasets.
Stock market predictions have an important role, since they can significantly impact the global economy. Due to its functional importance, analyzing stock market volatility has become a major research issue in various areas, including finance, statistics, and mathematics [1]. However, most stock indices behave very similarly to a random walk, because financial time series data are noisy and non-stationary in nature [2]. Undoubtedly, it is very difficult to predict the stock market, since the volatility is too large to be captured in a model [3]. Many studies have suggested general approaches based on statistical methods or trial and error, along with various heuristics. We apply the GA technique to obtain the best solution and optimize the prediction efficacy [12]. To the best of our knowledge, most research on LSTM networks does not take this aspect into account. We tested our method on the Korea Composite Stock Price Index (KOSPI) for 2000-2016, and found that it achieved better predictive performance than other methods.
The remainder of this paper is organized as follows: Section 2 provides a brief overview of the theoretical literature. Section 3 describes the methodologies that are used in this study, and introduces the hybrid model of LSTM network and GA. Section 4 describes the data and variables that are used in this study. Section 5 presents the experimental results and compares the proposed method to a benchmark model. Section 6 summarizes the findings and provides suggestions for further research.

Stock Market Prediction
Stock market forecasting is known to be a challenging task, since the market is non-stationary and highly uncertain [2]. Stock market prediction has been studied for decades, although the efficient market hypothesis (EMH) asserts that price changes in the capital market occur independently; in addition, several empirical studies have demonstrated that stock market predictions are possible, to some extent [4,13,14]. EMH can be divided into three types (weak, semi-strong, and strong) according to the level of reflection of market information. Among the three types of EMH, this study assumes the weak form, which only concerns past market trading data [15].
Previous studies usually employed statistical and machine learning techniques to forecast future financial values. Traditional stock market prediction techniques, based on statistical methods, are generated via a linear process [12]. Statistical analysis based on historical stock data, such as the autoregressive integrated moving average (ARIMA) model, the autoregressive conditional heteroscedasticity (ARCH) model, and the generalized autoregressive conditional heteroscedasticity (GARCH) model, has been widely used to make predictions about the financial market [16][17][18][19]. However, prediction systems based on statistical methods do not perform well, and have their own limitations because they require more historical data to meet statistical assumptions, such as normality postulates [20].
Since stock markets are regarded as nonlinear and non-parametric dynamic systems [2], more flexible methods that can learn complex dimensionality are essential to improve the prediction performance. Machine learning techniques have strong advantages in that respect, because they can extract nonlinear relationships between data without prior knowledge of the input data [3]. These techniques have been widely adopted, with relative success in making stock market predictions [21]. Among them, ANN and SVM are the most popular techniques for forecasting financial time series, since they can investigate the noisy behavior of data without imposing any statistical restrictions [4]. Empirical results show that machine learning techniques produce outstanding performance as compared with statistical models [7,22], since they have a better ability to learn the hidden relationships among market factors and capture the complex patterns in data [23].
Saad et al. (1998) compared three neural network models (time delay, recurrent, and probabilistic neural networks) for stock trend prediction, and employed the conjugate gradient and multi-stream extended Kalman filter training methods for the time delay neural network (TDNN) and RNN [9]. RNN showed the best performance among the models. Chen et al. (2005) utilized a neural network, a TS fuzzy system, and a hierarchical fuzzy system to verify the efficacy of the hybrid model, and various parameters of each model were optimized by a search algorithm, the particle swarm optimization (PSO) algorithm [24]. Yu et al. (2014) employed SVM to construct a stock selection system and applied principal component analysis (PCA) to obtain low-dimensional and informative financial time series [25]. The experimental results showed that the returns of stocks selected by PCA-SVM were clearly superior to other benchmarks. Chen and Hao (2017) proposed a hybrid framework to predict stock market indices with a feature weighted SVM and a feature weighted K-nearest neighbor [26]. They used information gain to measure the influence of each feature, which made it possible to take into account the relative importance of each feature.
Recently, following their outstanding performance in various classification problems [27,28], there have been attempts to apply deep learning techniques to stock market prediction. Deep learning techniques have achieved remarkable success in numerous prediction tasks, since they can extract useful features automatically during the learning process [29,30]. Chong et al. (2017) predicted future market trends by examining the effect of three unsupervised feature extraction methods (PCA, autoencoder, and restricted Boltzmann machine (RBM)) on the deep learning network [21]. Sezer et al. (2017) proposed a stock trading system based on a deep neural network for buy-sell-hold predictions [31]. GA was used to optimize the technical analysis parameters and create the buy-sell points of the system.
There have also been some approaches to integrate qualitative information with deep learning techniques for stock market forecasting. Yoshihara et al. (2014) exploited textual information as an input variable and predicted market trends based on an RNN model combined with an RBM to investigate the temporal effects of past events [8]. Ding et al. (2015) proposed an event-driven stock market prediction system [32]. The events were obtained from news text, and a deep CNN was exploited to examine the long-term and short-term influences of the extracted events on the S&P 500 index and individual stock movements. Table 1 presents a summary of recent stock market prediction studies.

RNN for Time Series Prediction
Most ANNs, including the multi-layer perceptron (MLP), can only learn spatial patterns from time-independent inputs and outputs [5]. RNN provides advantages over traditional ANN because the "memory feature" of RNN can be employed to elicit temporal patterns in data. RNN has been used for time series analyses, due to its useful characteristics. In recent years, with the significant advance in deep learning techniques, RNN has been actively applied to various tasks that deal with sequential data, such as natural language processing, speech recognition, and computer vision [38,39]. In addition, several existing studies using RNN, including LSTM networks, achieved satisfactory performance in the financial time series forecasting problem. Lin et al. (2009) predicted the closing price of the following trading day by utilizing one variant of RNN, echo state networks (ESN). They chose an initial transient by using the Hurst exponent, and selected the subseries with the greatest forecasting capability during training [6]. Wei and Cheng (2012) proposed a hybrid method using a synthesis feature selection to detect crucial technical indicators for stock market prediction [20]. They exploited stepwise regression and a decision tree to reduce the dimension of financial data. The experimental results showed the superiority of the proposed model. Dixon (2017) applied RNN to high frequency trading, classifying short-term price movements from the limit order books of financial futures to predict the price flip of the next event [40]. Fischer and Krauss (2018) deployed LSTM networks to forecast the directional movement of constituent stocks of the S&P 500 from 1992 to 2015 [41]. They compared the simulation results with memory-free classifiers, such as random forest (RF), deep neural network (DNN), and logistic regression. The LSTM model outperformed the other comparative models by a very clear margin, and they found that the LSTM network is suitable for the financial domain.
Furthermore, studies that integrate RNN and search algorithms have been conducted. A 2011 study suggested an integrated system combining an artificial bee colony (ABC) algorithm and RNN for stock market prediction [12]. The authors applied ABC to optimize the connection weights of the RNN, and utilized the wavelet transform to decompose the market data and remove the noise. Rather et al. (2015) presented an integrated model comprising RNN and two linear models, ARIMA and exponential smoothing, to predict stock returns, and the optimal weight of RNN was produced by GA [43].
RNN has highly sensitive network parameters that can affect its performance, such as the number of hidden neurons, the depth of the network, and the size of the time window. Determining the time window size is particularly important, because it defines the shape of the input variables entered into the RNN and the degree of past information to be considered. Trial and error based on heuristics is commonly used to estimate the time window size of RNN. Meanwhile, various statistical or mathematical techniques, such as the autocorrelation function (ACF), rescaled range analysis (R/S analysis), and information theory, can be applied to determine the appropriate time lag for time series analysis. ACF presents the degree of autocorrelation as time progresses [5]. It represents the covariance and correlation coefficient between time points in sequential data, and investigates the pattern of seasonality. R/S analysis, like ACF, is used as a measure of the long-term memory of a time series [6]. Information theory offers quantification methods that can search for the embedding length that describes the time series data in a statistically significant manner. Zhang et al. (2017) adopted the concept of mutual information from information theory to specify the time shift of the input variable [44]. However, most studies on RNN still rely on experience rather than systematic approaches, and the literature remains limited.
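As a rough illustration of how ACF can hint at a useful time lag, the following minimal sketch computes the sample autocorrelation of a series at a given lag. The function name `autocorrelation` and the toy series are assumptions made for illustration only; they are not taken from the study.

```python
def autocorrelation(series, lag):
    """Sample autocorrelation of `series` at a non-negative integer `lag`."""
    n = len(series)
    if not 0 <= lag < n:
        raise ValueError("lag must satisfy 0 <= lag < len(series)")
    mean = sum(series) / n
    # Denominator: total sum of squared deviations from the mean.
    denom = sum((x - mean) ** 2 for x in series)
    # Numerator: sum of products of deviations that are `lag` steps apart.
    numer = sum((series[t] - mean) * (series[t + lag] - mean)
                for t in range(n - lag))
    return numer / denom


# Hypothetical toy series with a period of 4, used only for illustration.
toy = [1.0, 2.0, 3.0, 2.0] * 6
acf = [autocorrelation(toy, k) for k in range(9)]
```

In this toy case the autocorrelation is high at lag 4 (the period) and negative at lag 2, which is the kind of pattern one would inspect when choosing a candidate window size by hand.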

Long Short-Term Memory (LSTM) Network
LSTM network is a type of deep RNN model composed of LSTM units. As discussed earlier, RNN is a deep learning network with internal feedback between neurons. These internal feedbacks enable the memorization of significant past events and incorporate past experience. Unlike a traditional fully connected feedforward network, RNN shares parameters across all parts of a model, so it can generalize to sequence lengths that have not been seen during training. Figure 1 presents an example of an RNN architecture that produces an output at every time step and has recurrent connections among hidden neurons [45].
The RNN has a weight matrix $U$ for the input-to-hidden connections, a weight matrix $W$ for the hidden-to-hidden connections, and a weight matrix $V$ for the hidden-to-output connections. Forward propagation proceeds by defining the initial state of the hidden units. Then, for each subsequent time step, we apply the following update equations. The input value $h_j^t$ of hidden neuron $j$ at time $t$ is given as

$$h_j^t = \sum_i u_{ji} x_i^t + \sum_{j'} w_{jj'} z_{j'}^{t-1},$$

where $u_{ji}$ is the weight between input neuron $i$ and hidden neuron $j$, and $x_i^t$ is the input value at time $t$. $w_{jj'}$ denotes the weight between hidden neurons $j$ and $j'$, and $z_{j'}^{t-1}$ is the output value of hidden neuron $j'$ at time $t-1$.
The transfer function of the hidden neuron is named $f$, and the output of the hidden neuron is expressed as

$$z_j^t = f(h_j^t).$$

Finally, the output value of the hidden layer $z$ is fed into output neuron $k$, and the output value of the output layer is given as

$$y_k^t = \sum_j v_{kj} z_j^t,$$

where $v_{kj}$ is the weight between hidden neuron $j$ and output neuron $k$.
However, RNN has difficulty in learning long time-dependencies that are more than a few time steps in length [46]. As the number of time steps to consider increases, information from past events disappears exponentially. LSTM is proposed as a way to overcome the long-term dependency problem. LSTM networks can retain information from more than 1000 time steps in the past. LSTM can scale to much longer sequences than simple RNN, overcoming the intrinsic drawbacks of simple RNN, i.e., vanishing and exploding gradients. Today, LSTM is widely used in many sequential modeling tasks, including speech recognition, motion detection, and natural language processing [47]. The LSTM block diagram is depicted in Figure 2.
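The forward pass of the simple RNN described above can be sketched in a few lines of plain Python. This is only an illustrative sketch: the function name `rnn_forward`, the choice of tanh as the transfer function, the linear output layer, and the toy weights are all assumptions and are not taken from the study.

```python
import math


def rnn_forward(xs, U, W, V, z0):
    """Run a simple RNN over the sequence `xs` and return one output vector per step.

    U[j][i] is an input-to-hidden weight, W[j][k] a hidden-to-hidden weight,
    V[k][j] a hidden-to-output weight, and z0 the initial hidden outputs.
    """
    z_prev = list(z0)
    outputs = []
    for x in xs:
        # Net input of each hidden neuron: current input plus previous hidden outputs.
        h = [sum(U[j][i] * x[i] for i in range(len(x)))
             + sum(W[j][k] * z_prev[k] for k in range(len(z_prev)))
             for j in range(len(U))]
        # Transfer function of the hidden neurons (tanh is an assumption here).
        z = [math.tanh(h_j) for h_j in h]
        # Linear output layer (also an assumption).
        y = [sum(V[k][j] * z[j] for j in range(len(z))) for k in range(len(V))]
        outputs.append(y)
        z_prev = z
    return outputs


# Hypothetical toy network: 1 input, 2 hidden neurons, 1 output.
U = [[0.5], [-0.3]]
W = [[0.1, 0.0], [0.0, 0.1]]
V = [[1.0, 1.0]]
ys = rnn_forward([[1.0], [0.0]], U, W, V, z0=[0.0, 0.0])
```

Note how the second output depends on the first input only through the previous hidden outputs, which is the "memory" that the recurrent connections provide.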
The LSTM block contains a memory cell and three multiplicative gating units: an input gate, an output gate, and a forget gate. There are recurrent connections between the cells, and each gate provides continuous operations for the cells. The cell is responsible for conveying "state" values over arbitrary time intervals, and each gate conducts write, read, and reset operations for the cells [45][46][47].
The computation process within an LSTM block is as follows. The input value can only be preserved in the state of the cell if the input gate permits it. The value of the input gate, $i_t$, and the candidate value of the memory cell, $\tilde{C}_t$, at time step $t$ are calculated as follows:

$$i_t = \sigma(W_i x_t + U_i h_{t-1} + b_i),$$

$$\tilde{C}_t = \tanh(W_c x_t + U_c h_{t-1} + b_c),$$

where $W$, $U$, and $b$ represent the weight matrices and bias, respectively, $x_t$ is the input at time $t$, and $h_{t-1}$ is the output of the cell at the previous time step. The weight of the state unit is managed by the forget gate, and the value of the forget gate is computed as

$$f_t = \sigma(W_f x_t + U_f h_{t-1} + b_f).$$

Through this process, the new state of the memory cell is updated as

$$C_t = i_t \odot \tilde{C}_t + f_t \odot C_{t-1}.$$

Once the new state of the memory cell is available, the output value of the gate is calculated as follows:

$$o_t = \sigma(W_o x_t + U_o h_{t-1} + b_o).$$

The final output value of the cell is defined as

$$h_t = o_t \odot \tanh(C_t).$$

The output of the cell can be blocked by the output gate; all gates use a sigmoidal nonlinearity, and the state unit can serve as an extra input to other gating units [45][46][47]. Through this process, the LSTM architecture can solve the problem of long-term dependencies at small computational cost [48].
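To make the gate computations concrete, here is a minimal single-unit sketch of one LSTM time step. It assumes the standard LSTM formulation without peephole connections; the function names `sigmoid` and `lstm_step`, the parameter dictionary, and all numeric values are hypothetical and chosen only for illustration.

```python
import math


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def lstm_step(x_t, h_prev, c_prev, p):
    """One time step of a single-unit LSTM cell with a scalar input."""
    i_t = sigmoid(p["W_i"] * x_t + p["U_i"] * h_prev + p["b_i"])        # input gate
    c_tilde = math.tanh(p["W_c"] * x_t + p["U_c"] * h_prev + p["b_c"])  # candidate state
    f_t = sigmoid(p["W_f"] * x_t + p["U_f"] * h_prev + p["b_f"])        # forget gate
    c_t = i_t * c_tilde + f_t * c_prev                                  # new cell state
    o_t = sigmoid(p["W_o"] * x_t + p["U_o"] * h_prev + p["b_o"])        # output gate
    h_t = o_t * math.tanh(c_t)                                          # cell output
    return h_t, c_t


# Hypothetical parameters: all weights zero, so only the biases matter.
params = {name: 0.0 for name in
          ("W_i", "U_i", "b_i", "W_c", "U_c", "b_c",
           "W_f", "U_f", "b_f", "W_o", "U_o", "b_o")}
params["b_f"] = 20.0   # forget gate almost fully open: keep the old state
params["b_i"] = -20.0  # input gate almost fully closed: ignore new input

h, c = 0.0, 1.0
for x in [0.3, -0.7, 0.1]:
    h, c = lstm_step(x, h, c, params)
```

With the forget gate open and the input gate closed, the cell state is carried forward almost unchanged across steps, which illustrates how the memory cell can preserve information over long intervals.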

Genetic Algorithm (GA)
GA is a metaheuristic and stochastic optimization algorithm inspired by the process of natural evolution [49]. It is widely used to find near-optimal solutions to optimization problems with large search spaces. The process of GA includes operators that imitate natural genetic and evolutionary principles, such as crossover and mutation. The major feature of GA is the population of "chromosomes". Each chromosome acts as a potential solution to a target problem, and is usually expressed in the form of a binary string. These chromosomes are generated randomly, and the ones that provide better solutions get more chances to reproduce [15].
The GA process can be divided into six stages: initialization, fitness calculation, termination condition check, selection, crossover, and mutation, as shown in Figure 3 [50]. In the initialization stage, chromosomes in the search space are arbitrarily selected, and then the fitness of each selected chromosome is calculated in accordance with the predefined fitness function. The fitness function is a concept used to numerically encode a chromosome's performance [5]. In optimization algorithms such as GA, the definition of the fitness function is a crucial factor that affects performance. Through the fitness calculation, only solutions with excellent performance are preserved for further reproduction. Because chromosomes are chosen stochastically according to their fitness, some chromosomes are selected several times through the selection process, while others disappear without being selected. That is, the chromosomes with prominent performance have a higher probability of being inherited by the next generation. Selected superior chromosomes produce offspring by interchanging corresponding parts of the string and changing gene combinations. The crossover process leads to new solutions being created from existing ones. In the mutation process, one of the chromosomes is selected to change one randomly chosen bit. The aim of this process is to introduce diversity and novelty into the solution pool by arbitrarily swapping or turning off solution bits. The crossover process is limited in that completely new information cannot be generated. However, this limitation can be overcome by the mutation operation, which changes the corresponding bits to completely new values.
The fitness of each chromosome newly generated through the selection, crossover, and mutation processes is calculated, and the termination criteria are checked. The standard procedure of GA ends when the termination criteria have been satisfied. If the termination criteria are not satisfied, the selection, crossover, and mutation processes are repeated to generate a superior chromosome with higher performance. In this study, chromosomes are represented as binary arrays, and the mean squared error (MSE) of the prediction model serves as the fitness value.
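The six stages can be sketched as a compact loop over binary chromosomes. The sketch below is not the implementation used in the study: it swaps in binary tournament selection, single-point crossover, per-bit flip mutation, and elitism as simple stand-ins, and the function names `run_ga` and `toy_fitness` as well as the default parameter values are hypothetical.

```python
import random


def run_ga(fitness, n_bits, pop_size=20, generations=30,
           crossover_rate=0.7, mutation_rate=0.05, seed=0):
    """Minimise `fitness` over binary chromosomes; returns (best, history)."""
    rng = random.Random(seed)
    # Initialization: random binary chromosomes.
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    best = min(pop, key=fitness)
    history = [fitness(best)]
    for _ in range(generations):  # termination: fixed number of generations
        def select():
            # Binary tournament: the fitter of two random chromosomes wins.
            a, b = rng.choice(pop), rng.choice(pop)
            return a if fitness(a) <= fitness(b) else b

        children = [best[:]]  # elitism: carry over the best solution so far
        while len(children) < pop_size:
            p1, p2 = select()[:], select()[:]
            if rng.random() < crossover_rate:
                cut = rng.randint(1, n_bits - 1)  # single-point crossover
                p1, p2 = p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]
            for child in (p1, p2):
                for k in range(n_bits):
                    if rng.random() < mutation_rate:
                        child[k] = 1 - child[k]  # bit-flip mutation
                children.append(child)
        pop = children[:pop_size]
        best = min(pop, key=fitness)
        history.append(fitness(best))
    return best, history


def toy_fitness(bits):
    """Hypothetical fitness: squared distance of the decoded integer from 10."""
    value = int("".join(str(b) for b in bits), 2)
    return (value - 10) ** 2


best, history = run_ga(toy_fitness, n_bits=6)
```

Because the best chromosome is always copied into the next generation, the recorded best fitness never gets worse from one generation to the next; in the hybrid model, the toy fitness would be replaced by the MSE of a trained prediction model.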

A Hybrid Approach to Optimization in LSTM Network with GA
Evolutionary algorithms, mostly GA, have been widely applied to neural network models, such as MLP and RNN, and used in various hybrid approaches for financial time series forecasting to optimize technical analysis or train the neural network [23,36]. Muhammad and King (1997) exploited evolutionary fuzzy networks in foreign exchange market forecasting [51], and Kai and Wenhua (1997) proposed an ANN model trained with GA to predict the stock price [52]. Kim and Han (2000) optimized the connection weights between layers and conducted feature discretization with GA, reducing the dimensionality of the feature space [23]. Kim et al. (2006) also predicted the stock index using a GA-based multiple classifier combination technique to incorporate classifiers that stem from machine learning, experts, and users [53]. Meanwhile, combining an optimized time window size with LSTM networks in time series forecasting has not been studied extensively.
In this study, we propose a hybrid approach that integrates an LSTM network with GA to find the customized time window and number of LSTM units for financial time series prediction. Since the LSTM network uses past information during the learning process, a suitably chosen time window plays an important role in achieving good performance. If the window is too small, the model will neglect important information, while, if the window is too large, the model will be overfitted on the training data. Figure 4 illustrates the proposed hybrid approach.
This study consists of two stages, which are as follows. The first stage of the experiment involves designing the appropriate network parameters for the LSTM network. We use an LSTM network with a sequential input layer followed by two hidden layers, and the optimal number of hidden neurons in each hidden layer is investigated by GA. In the LSTM-RNN model, the hyperbolic tangent function is utilized as the activation function of the input nodes and hidden nodes. The hyperbolic tangent function is a scaled sigmoid function, and maps its input into a range between −1 and 1. The activation function of the output node is a linear function, since our goal is to predict the closing price of the next day, which can be formulated as a regression problem. Initial weights of the network are set to random values, and the network weights are adjusted by using the gradient-based "Adam" optimizer, which is known for its simplicity, straightforwardness, and computational efficiency. The method is appropriate for problems with large amounts of data and parameters, and is also well suited to non-stationary problems with very noisy and sparse gradients [54].
As described above, we employ one of the evolutionary search algorithms, GA, to investigate the optimal size of the time window and the architectural factors of the LSTM network. In the second stage, various sizes of time windows and different numbers of LSTM units in each hidden layer are applied to evaluate the fitness of GA. The population, which is composed of possible solutions, is initialized with random values before the genetic operators start to explore the search space. The chromosomes used in this study are encoded in binary bits that represent the size of the time window and the number of LSTM cells. Based on the population, the selection and recombination operators begin to search for the superior solution. The solutions are evaluated by the predefined fitness function, and strings with prominent performance are selected for reproduction. The fitness function is a crucial part of GA, and has to be chosen carefully. In this research, we use the MSE to calculate the fitness of each chromosome, and the subset of architectural factors that returns the smallest MSE is selected as the optimal solution. If the output of the reproduction process satisfies the termination criteria, the derived optimal or near-optimal solution is applied to the prediction model. If not, the whole process of selection, crossover, and mutation is repeated. Genetic parameters, such as the crossover rate, mutation rate, and population size, can affect the quality of the solution obtained. In this study, we use a population size of 70, a crossover rate of 0.7, and a mutation rate of 0.15. As a stopping condition, the number of generations is set to 10.
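The time window determines how the raw series is cut into training samples. The following sketch shows one plausible way to do this, pairing each run of consecutive daily feature rows with the value observed on the following day. The helper `make_windows` and the toy data are hypothetical and only illustrate the idea.

```python
def make_windows(rows, targets, window):
    """Pair each run of `window` consecutive feature rows with the next target."""
    if window < 1 or window >= len(rows):
        raise ValueError("window must be between 1 and len(rows) - 1")
    X, y = [], []
    for end in range(window, len(rows)):
        X.append(rows[end - window:end])  # features of the past `window` days
        y.append(targets[end])            # value to predict on the following day
    return X, y


# Hypothetical toy data: 6 days, 2 features per day, and a closing price series.
rows = [[float(d), float(d) * 2.0] for d in range(6)]
closes = [100.0 + d for d in range(6)]
X, y = make_windows(rows, closes, window=3)
```

A larger window yields fewer samples, each carrying more history, which is one reason the window size interacts with overfitting and has to be tuned rather than fixed arbitrarily.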
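One way to picture the binary encoding is to split a chromosome into three fields: the time window size and the number of LSTM units in each of the two hidden layers. The bit layout below (5 bits for the window, 4 bits per layer, plus an offset of 1) is purely an assumption for illustration, since the exact encoding is not specified here; `decode`, `bits_to_int`, `mse`, and `GA_SETTINGS` are hypothetical names. Only the values in `GA_SETTINGS` come from the text.

```python
def bits_to_int(bits):
    """Interpret a list of 0/1 values as an unsigned binary number."""
    value = 0
    for b in bits:
        value = value * 2 + b
    return value


def decode(chromosome, window_bits=5, unit_bits=4):
    """Split a chromosome into (window size, units in layer 1, units in layer 2)."""
    expected = window_bits + 2 * unit_bits
    if len(chromosome) != expected:
        raise ValueError(f"expected {expected} bits, got {len(chromosome)}")
    w = chromosome[:window_bits]
    u1 = chromosome[window_bits:window_bits + unit_bits]
    u2 = chromosome[window_bits + unit_bits:]
    # Add 1 so that every field decodes to a positive size.
    return bits_to_int(w) + 1, bits_to_int(u1) + 1, bits_to_int(u2) + 1


def mse(actual, predicted):
    """Mean squared error, used as the fitness value (lower is better)."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


# GA parameters as stated in the text.
GA_SETTINGS = {"population_size": 70, "crossover_rate": 0.7,
               "mutation_rate": 0.15, "generations": 10}

# Hypothetical chromosome under the assumed layout.
example = [0, 1, 0, 0, 1,  1, 1, 1, 0,  0, 1, 1, 0]
window, units_1, units_2 = decode(example)
```

In a full pipeline, the fitness of a chromosome would be obtained by decoding it, training an LSTM network with those settings, and computing `mse` on validation predictions.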

Data Description
Research data in this study comes from the daily Korea Stock Price Index (KOSPI) for January 2000-December 2016. The dataset comprises 4203 trading days, and the historical data is obtained from Bloomberg; each sample contains daily price information, including the low price, high price, opening price, closing price, and trading volume [55]. The entire dataset is divided into a training set (the first 80% of the whole) and a holdout set (the last 20% of the whole). Within the training set, a validation set (15% of the training set) is set aside to adopt some form of weight pruning, preventing overfitting. The training set is used to investigate efficient parameters and the specification of the model, while the holdout set is reserved for out-of-sample evaluation and performance comparison among prediction models.
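The split proportions can be turned into index ranges as in the sketch below. It assumes that the split is chronological and that the validation set is taken from the end of the training period; the text states only the percentages, so the placement of the validation set and the rounding rule are assumptions, and `chronological_split` is a hypothetical helper.

```python
def chronological_split(n_samples, holdout_frac=0.20, val_frac=0.15):
    """Return index ranges (train, validation, holdout) in time order."""
    n_train_total = int(n_samples * (1 - holdout_frac))  # first 80% of the data
    n_val = int(n_train_total * val_frac)                # 15% of the training set
    n_fit = n_train_total - n_val
    train = range(0, n_fit)
    validation = range(n_fit, n_train_total)  # assumed: last part of training set
    holdout = range(n_train_total, n_samples)
    return train, validation, holdout


train, validation, holdout = chronological_split(4203)
```

Under these assumptions, the 4203 trading days split into 2858 days for fitting, 504 for validation, and 841 for the holdout set.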

Feature Selection
In this study, five technical indicators and five historical values (high price, low price, opening price, closing price, and trading volume) are employed as input variables. The output of the prediction model is the closing price of the next day. Many investors and traders in the stock market use technical indicators as cues for future market trends [56]. We selected five technical indicators by consulting domain experts and reviewing prior research [23,33,57]. All indicators are computed from the collected raw data, and the original data is scaled into the range of [0, 1], yielding a normalized multi-dimensional time series. Since the range of values of the raw data varies widely, this linear scaling normalizes each feature component to the specified range, which helps gradient descent to converge much faster. The linearly scaled value of $x$ is as follows:

$$x' = \frac{x - \min(x)}{\max(x) - \min(x)},$$

where $\min(x)$ and $\max(x)$ are the minimum and maximum values of the attribute $x$, respectively. The technical indicators used in this study, including their formulae, are summarized in Table 2, and descriptive statistics of the input variables are shown in Table 3. A brief explanation of each indicator is provided here.
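The linear scaling step can be sketched for a single attribute as follows. The helper name `min_max_scale` is hypothetical, and how constant attributes should be handled is not discussed in the text, so the sketch simply rejects them.

```python
def min_max_scale(values):
    """Linearly rescale one attribute into the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("cannot scale a constant attribute")
    return [(v - lo) / (hi - lo) for v in values]


scaled = min_max_scale([10.0, 15.0, 20.0])
```

Each attribute (for example, the closing price or the trading volume) would be scaled separately, so that widely different raw ranges end up on a common scale.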

Method
Indicators: simple 10-day moving average; weighted 10-day moving average.
Note: $C_t$ is the closing price, $L_t$ is the low price, and $H_t$ is the high price at time t. $LL_t$ and $HH_t$ are the lowest low and highest high in the last t days, respectively. $Up_t$ represents the upward price change and $Dw_t$ represents the downward price change at time t.

The moving average is a popular technical indicator that can identify short-, medium-, and long-term price trends. The simple moving average (SMA) is the unweighted mean value over the specified time period. The weighted moving average (WMA) assigns more weight to the latest data points, since they contain more relevant information than data points in the distant past. In this study, the time period of the moving average is 10 days.
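The two moving averages can be sketched as below. The function names are hypothetical, and the linearly decreasing weights (n for the latest close down to 1 for the oldest) are an assumption: the text states only that recent points receive more weight.

```python
def sma(closes, n=10):
    """Unweighted mean of the last n closing prices."""
    if len(closes) < n:
        raise ValueError("need at least n closing prices")
    return sum(closes[-n:]) / n


def wma(closes, n=10):
    """Weighted mean of the last n closes with linear weights.

    Assumption: the newest close gets weight n, the oldest gets weight 1.
    """
    if len(closes) < n:
        raise ValueError("need at least n closing prices")
    window = closes[-n:]
    weights = range(1, n + 1)
    return sum(w * c for w, c in zip(weights, window)) / sum(weights)


closes = [float(c) for c in range(1, 11)]  # toy rising series 1..10
print(sma(closes), wma(closes))
```

On a rising series the weighted average sits above the simple one, which reflects its stronger emphasis on recent prices.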

Relative Strength Index (RSI)
RSI is a momentum indicator that compares the current and historical gains and losses over the recent trading period, measured on a scale from 0 to 100. It measures the speed and change of price movements of a security [26]. Since the most commonly used timeframe for RSI is 14 days, we adopted it in our research.
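A minimal sketch of a 14-day RSI follows. It uses plain averages of upward and downward changes, which is one common variant and is an assumption here; smoothed variants exist, and the function name `rsi` is hypothetical.

```python
def rsi(closes, n=14):
    """Relative Strength Index over the last n price changes (0 to 100)."""
    if len(closes) < n + 1:
        raise ValueError("need at least n + 1 closing prices")
    changes = [closes[i] - closes[i - 1]
               for i in range(len(closes) - n, len(closes))]
    avg_up = sum(c for c in changes if c > 0) / n
    avg_down = sum(-c for c in changes if c < 0) / n
    if avg_down == 0:
        # No losses in the window: 100 if there were gains, 50 if flat
        # (the flat case is a convention chosen for this sketch).
        return 100.0 if avg_up > 0 else 50.0
    return 100.0 - 100.0 / (1.0 + avg_up / avg_down)
```

Values near 100 indicate a window dominated by gains, and values near 0 a window dominated by losses.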

Stochastic Oscillator
The stochastic oscillator shows the position of a stock's closing price relative to the high-low range of the price over a set period. There are two kinds of stochastic oscillators: stochastic %K and stochastic %D, where %D is the 3-day moving average of %K.
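The relation between %K and %D can be sketched as follows. The lookback length `n` is left as a parameter because the text does not state it here; the function names and the value 50 for a zero-width range are assumptions of this sketch.

```python
def stochastic_k(closes, highs, lows, n):
    """%K for each day that has a full n-day lookback window."""
    ks = []
    for t in range(n - 1, len(closes)):
        hh = max(highs[t - n + 1:t + 1])   # highest high in the window
        ll = min(lows[t - n + 1:t + 1])    # lowest low in the window
        if hh == ll:
            ks.append(50.0)                # assumption: flat range maps to 50
        else:
            ks.append(100.0 * (closes[t] - ll) / (hh - ll))
    return ks


def stochastic_d(ks, m=3):
    """%D as the m-day simple moving average of %K (m = 3 in the text)."""
    return [sum(ks[i - m + 1:i + 1]) / m for i in range(m - 1, len(ks))]
```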

Result and Analysis
As discussed earlier, this study applies GA to investigate the optimal architectural factors, including the size of the time window to be fed to the LSTM network, and derives results through this genetic search. The best time window size for stock market prediction was chosen as 10 by GA. In other words, using the information of the past 10 trading days is most effective for stock market prediction. Moreover, the best numbers of LSTM units composing the two hidden layers were derived as 15 and 7, respectively.
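To make the role of the time window concrete, the sketch below turns a feature series into samples of 10 consecutive trading days, each paired with the next day's closing price. The function name `make_windows` and the list-of-lists layout are assumptions for illustration only.

```python
def make_windows(rows, closes, window=10):
    """Build (sample, target) pairs for next-day prediction.

    rows[t] is the feature vector of day t; each sample holds the
    `window` days before day t, and the target is the close of day t.
    """
    samples, targets = [], []
    for t in range(window, len(rows)):
        samples.append(rows[t - window:t])
        targets.append(closes[t])
    return samples, targets


# Toy data: 12 days with a single feature per day.
rows = [[float(i)] for i in range(12)]
closes = [float(i) for i in range(12)]
samples, targets = make_windows(rows, closes, window=10)
```

A larger window gives the network more history per sample but fewer samples overall, which is one reason the window size is worth searching rather than fixing by hand.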
We apply an input embedding of the last 10 time steps and the optimized architecture to verify the effectiveness of the GA-LSTM model on the holdout data. The result of the GA-optimized LSTM network is measured by computing the mean squared error (MSE), mean absolute error (MAE), and mean absolute percentage error (MAPE) between the actual closing price of the stock market and the output of the proposed hybrid model. MSE is defined as

$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2$$

where $\hat{y}_i$ is the predicted output value of the model for the ith observation, $y_i$ is the desired one, and n denotes the number of samples.
MAE is defined as follows:

$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right|$$

MAPE is given as

$$\mathrm{MAPE} = \frac{100}{n}\sum_{i=1}^{n}\left|\frac{y_i - \hat{y}_i}{y_i}\right|$$

These performance measures have been widely used in several studies, and provide the means with which to determine the effectiveness of the model for forecasting the daily stock index [24,[35][36][37]. To test the efficacy of the proposed model, the results are compared to a simple algorithm that predicts no day-to-day change, tested against the same dataset used in this study. Table 4 presents the experimental results of the proposed approach. As shown in Table 4, the GA-optimized LSTM network performs better than the benchmark in all error measures. The MSE of the benchmark model is 209.45, while the MSE of the combined GA and LSTM network model is 181.99, an improvement of 13.11% over the benchmark model. The MAE of the benchmark model is 11.71, while the MAE of the proposed model is 10.21, an improvement of 12.80% over the benchmark model. Lastly, the MAPE of the benchmark, which expresses accuracy as a percentage of error, is 1.10%, while the MAPE of the GA-LSTM hybrid model is 0.91%. The normalized prediction output of the GA-optimized LSTM network model is presented in Figure 5, in which the blue line shows the actual closing price, while the red line shows the prediction output of the proposed model.
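The error measures and the no-change benchmark can be sketched as below. The function names are hypothetical; `persistence_forecast` is an illustrative reading of "predicts no day-to-day change", meaning that each day's prediction is the previous day's close.

```python
def mse(actual, predicted):
    return sum((y - p) ** 2 for y, p in zip(actual, predicted)) / len(actual)


def mae(actual, predicted):
    return sum(abs(y - p) for y, p in zip(actual, predicted)) / len(actual)


def mape(actual, predicted):
    """Mean absolute percentage error, in percent (actual values nonzero)."""
    return 100.0 * sum(abs((y - p) / y)
                       for y, p in zip(actual, predicted)) / len(actual)


def persistence_forecast(closes):
    """No-change benchmark: the forecast for day t is the close of day t-1."""
    return closes[:-1]   # aligns with the actual values closes[1:]


def improvement_pct(benchmark_error, model_error):
    """Relative error reduction of the model over the benchmark, in percent."""
    return 100.0 * (benchmark_error - model_error) / benchmark_error


# Reported MSE values: benchmark 209.45, GA-LSTM 181.99.
print(round(improvement_pct(209.45, 181.99), 2))
```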
The superior performance of the GA-LSTM model may be explained by the fact that the globally searched time window and architecture of the LSTM network enhanced the efficiency of the learning process and prevented unnecessary computations. The results suggest that appropriate tuning of the parameters is an important condition for achieving satisfactory performance. Despite the fast growth of deep learning algorithms, it is a very difficult task to find an optimal set of parameters for deep architectures by expert knowledge. However, the experimental results demonstrate that the method used in this study can be an effective tool for determining the optimal or near-optimal model for deep learning algorithms, and show the potential for its applicability.
A form of statistical verification, the t-test, has been conducted to investigate whether GA-LSTM outperforms the benchmark significantly. A t-test is employed to investigate the difference between two groups by comparing the mean values of samples drawn from each group. The t-test result for prediction performance, between the proposed model and the benchmark, is summarized in Table 5. The p-value of the t-test between the GA-based model and the benchmark is 0.015, which means that the difference between the two models is statistically significant at the 5% significance level. This result verifies that the GA-LSTM network performs better than the benchmark model, and indicates the capability of the proposed model to consider the temporal properties in the stock market prediction problem.

We also applied the GA-LSTM model to individual stocks included in the KOSPI, to validate the efficiency of the proposed model. In addition to forecasting the market index, predictability for individual stocks is also important for investment decisions. We start by identifying the ten largest stocks in terms of market capitalization, among which we eliminate the stocks with no price records over the sample period. Six stocks are left, the list of which is presented in Table 6. The experimental results for those individual stocks are shown in Table 7. As reported in Table 7, the integrated model of GA and LSTM network performs as well on the individual stocks as on the index. GA-LSTM even shows better prediction performance and statistical significance.
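The comparison of two models' errors can be sketched as a t-statistic on per-day error differences. The text does not state which t-test variant was used, so the paired form here is an assumption, as are the function names; the p-value uses a normal approximation that is only reasonable for large samples.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def paired_t_statistic(errors_a, errors_b):
    """t-statistic of the mean per-day error difference (a minus b).

    Assumption: a paired test on errors from the same holdout days.
    """
    diffs = [a - b for a, b in zip(errors_a, errors_b)]
    spread = stdev(diffs)
    if spread == 0:
        raise ValueError("all differences are identical")
    return mean(diffs) / (spread / sqrt(len(diffs)))


def approx_two_sided_p(t_stat):
    """Two-sided p-value from the normal approximation to the t distribution."""
    return 2.0 * (1.0 - NormalDist().cdf(abs(t_stat)))


# Toy absolute errors of a benchmark and a model on four days.
t = paired_t_statistic([1.0, 2.0, 3.0, 4.0], [0.0, 1.0, 1.0, 2.0])
```

A p-value below 0.05 would indicate a significant difference at the 5% level, which is the criterion applied to the reported value of 0.015.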
Based on these empirical results, we conclude that optimizing the time window and topology of the LSTM network is an important task that can improve the performance of prediction models significantly. The integrated approach of combining GA and LSTM network has advantages in financial time series prediction problems, and this is evident when the simultaneous optimization over the time lag and the number of composing units is accomplished. The proposed hybrid model learned to capture some aspects of the market's chaotic behavior, and was able to predict the price index of the next day with prominent performance compared to the benchmark.
In this research, we use the GA technique, and show that it is able to find the optimal or near-optimal solution for the model effectively. Through the overall results, we identify the superiority of the GA-LSTM network in predicting the stock market. The experimental results suggest that using a suitably chosen time window and architectural factors on financial time series tasks can significantly improve the predictability of the LSTM network. The proposed GA-LSTM model could also compensate for the limitations of commonly used methods, which are heuristic approaches.
In recent years, research on stock prediction using deep learning has been increasing. Chen et al. (2015) adopted an LSTM network for forecasting the return of the Chinese stock market, which demonstrated poor performance [58]. Hiransha et al. (2018) predicted the price of different companies from the National Stock Exchange (NSE) of India and the New York Stock Exchange (NYSE) using several deep networks, including LSTM network and CNN, and compared their performance [59]. The recent work by Dixon (2018) also used an LSTM network, and predicted short-term price movements on the S&P500 E-mini futures level II data [40]. However, these studies do not deal with the problem of network architecture when it comes to defining the number of hidden layers and the nodes each can include. This is where our model is useful, as it provides a solution based on GA rather than a simple rule of thumb, and was shown to achieve better performance along with higher efficiency. This hybrid deep learning method can make better predictions, and it successfully deals with complex high-dimensional data by offering an optimal model for financial prediction.

Conclusions
The prediction of the stock market can generate an actual financial loss or gain, so it is practically important to enhance the predictability of models. Consequently, many studies have tried to model and predict financial time series, using statistical or soft computing techniques that are capable of examining the complex and chaotic financial market. In recent years, deep learning techniques have been actively applied, based on their excellent achievements in various classification problems.
In this study, we constructed a stock price prediction model based on RNN using LSTM units, which is one of the typical methodologies of deep learning. We integrated GA and LSTM network to consider the temporal properties of the stock market, and utilized the customized architectural factors of the model. The LSTM network used in this study is composed of two hidden layers, which is a deep architecture for expressing nonlinear and complex features of the stock market more effectively. GA was employed to search for the optimal or near-optimal value for the size of the time window and the number of LSTM units in the LSTM network.
To verify the effectiveness of this approach, we performed the experiment on 17 years' worth of KOSPI values and predicted the closing price one day ahead; the result was compared with a simple benchmark model that predicts no day-to-day change. The experimental results show that our proposed approach has lower MSE, MAE, and MAPE, and the improvements are found to be statistically significant. These overall results demonstrate that a GA-LSTM approach can be an effective method for stock market forecasting that reflects temporal patterns.
This study suggests useful implications for designing the proper architecture for an LSTM network, which affects the detection of temporal patterns. Defining the time window is particularly crucial, because it plays an important role in investigating the temporal properties of a given dataset; yet when an LSTM network is applied, the time window is often not specifically fine-tuned to the purpose of the problem, which causes the model to learn the least significant patterns. Much of the existing literature that uses LSTM networks in time series problems employs subjective approaches based on trial and error, rather than systematic approaches, to find the optimal size of the time window. Furthermore, other approaches that use statistical methods have limitations that come from various statistical assumptions. We addressed this problem by adopting an evolutionary search algorithm, GA, and our empirical results support the efficacy of the proposed model. We suggest a purpose-specific model with fewer restrictions that can capture more significant leading signals in financial time series. The ability of the proposed model to track noisy patterns of financial time series may be applicable to various domains.
Although the proposed integrated model has a prominent predictive performance, it still has some limitations. First, we did not take trading commissions into consideration in the analysis, and only forecasted the value of the stock index and prices. However, in real-world investment environments, it is necessary to consider trading commissions for higher returns, which can be a good topic for further discussion. Second, this study is conducted using only Korean stock market data. Therefore, further research can include data from various stock markets. Third, as mentioned earlier, the decision output of the LSTM network is not easy to comprehend. Another improvement can be derived by providing an interpretable LSTM model in association with other machine learning techniques. Finally, there are some requirements to consider when designing the LSTM network architecture: restriction of overfitting and the influence of noise. These difficulties are particularly relevant for financial time series, which are chaotic and require complex learning algorithms. To prevent these problems, the learning parameters of the neural network should be properly selected; thus, further research can be conducted to optimize various other hyperparameters in LSTM networks. In addition, when it comes to setting the control parameters of GA, such as the crossover rate and mutation rate, many suitable combinations may exist that can improve performance.

Figure 1. Basic structure of a simple recurrent neural network (RNN).

Sustainability 2018, 10, x FOR PEER REVIEW

novelty into the solution pool by arbitrarily swapping or turning off solution bits. The crossover process has the limitation that completely new information cannot be generated. However, this limitation can be overcome by the mutation operation, which changes the corresponding bits to completely new values.
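The difference between the two operators can be illustrated with a short sketch on bit-string chromosomes. One-point crossover and per-bit flip mutation are common textbook choices and are assumptions here, as are the function names; the exact operators and rates of the study are not given in this passage.

```python
import random


def one_point_crossover(parent_a, parent_b, rng):
    """Swap the tails of two equal-length bit lists at a random cut point."""
    point = rng.randint(1, len(parent_a) - 1)
    child_a = parent_a[:point] + parent_b[point:]
    child_b = parent_b[:point] + parent_a[point:]
    return child_a, child_b


def bit_flip_mutation(bits, rate, rng):
    """Flip each bit independently with probability `rate`."""
    return [1 - bit if rng.random() < rate else bit for bit in bits]


rng = random.Random(0)
# Identical parents: crossover can only recombine existing bits,
# so the children are copies of the parents.
same = [1, 0, 1, 1]
children = one_point_crossover(same, same, rng)
# Mutation can still introduce values that neither parent carries.
mutated = bit_flip_mutation(same, 1.0, rng)
```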

Figure 3. Basic process of a genetic algorithm (GA).
depicts the flowchart of the model proposed in our work.

Figure 5. Normalized prediction outputs of the holdout data.

Table 1. A summary of recent studies on stock market prediction.
Cai et al. (2007) combined global search algorithm PSO and evolutionary algorithm (EA) with RNN to estimate missing values in time series data [42]. Hsieh et al. (

Table 2. Selected technical indicators and their formulae.

Table 3. Summary statistics of selected input variables.

Table 5. t-Test results of model comparison.

Table 6. List of sample stocks used in this study.

Table 7. Experimental result on individual stocks.