Prediction of Stock Performance Using Deep Neural Networks

Abstract: Stock performance prediction is one of the most challenging issues in time series data analysis. Machine learning models have been widely used to predict financial time series during the past decades. Even though automatic trading systems that use Artificial Intelligence (AI) have become a commonplace topic, there are few examples that successfully leverage the proven methods invented by human stock traders to build automatic trading systems. This study proposes to build an automatic trading system by integrating AI and the proven methods invented by human stock traders. First, the knowledge and experience of successful stock traders are extracted from their related publications. A Long Short-Term Memory-based deep neural network is then developed to use the human stock traders' knowledge in the automatic trading system. Four different strategies are developed for stock performance prediction, and feature selection is performed to achieve the best performance in the classification of stocks with good performance. Finally, the proposed deep neural network is trained and evaluated on historical data of the Japanese stock market. Experimental results indicate that the proposed ranking-based stock classification considering historical volatility has the best performance among the four developed strategies. This method can achieve about a 20% earning rate per year above the basis of all stocks and has a lower risk than the basis. Comparison experiments also show that the proposed method outperforms conventional methods.


Introduction
Stock performance prediction is one of the most challenging issues in time series data analysis. How to accurately predict changes in stock performance remains an open question for both the financial industry and academia. Stock performance prediction is a difficult task due to the complexity and dynamics of the markets and the many implicit, intertwined factors involved. Economic analysts and stock traders were the earliest pioneers of stock performance prediction. In the past several decades, thousands of books on stock trading have been published.
Many economic analysts and stock traders have studied the historical patterns of financial time series data and have proposed various methods to predict stock performance. In order to achieve promising performance, most of these methods require the careful selection of index variables and the identification of features shared among distinguished stocks. William J. O'Neil and M. Minervini are two representative successful traders. They summarized their stock trading experience in their publications [1-4]. William J. O'Neil's CAN SLIM method has a huge following, and has also performed well in the American Association of Individual Investors (AAII)'s implementation of his screen. The first contribution of this study is that the knowledge and experience of successful human stock traders are used in a deep neural network to predict future stock performance. The second contribution of this study is the verification of the effectiveness of the knowledge of human stock traders and various investment strategies for constructing a successful intelligent trading system. In this study, which index variables are the most significant, and how to perform stock performance prediction so as to maximize earnings and minimize the risk of investment, are investigated. This study focuses on Japanese stock data to explore a reliable investment algorithm for the Japanese stock market. It also aims to verify whether methods invented based on United States (US) stocks are also effective for Japanese stocks, because the traders William J. O'Neil and M. Minervini summarize their experience based on US stocks.
The rest of the paper is organized as follows: Section 2 describes the important index variables and four strategies for stock classification. Section 3 presents the proposed deep neural network to classify the distinguished stocks with good performance. Section 4 shows the evaluation of the proposed systems. Finally, Section 5 concludes this paper.

Important Index Variables for Stock Performance Prediction
In the current stock market, there are hundreds of index variables indicating the value of a stock from different aspects. Professional analysts and stock traders have tried hard to find the correlation between these variables and the future performance of stocks. William J. O'Neil and M. Minervini provided many important points and rules for successful stock trading [1,2]. Table 1 lists the 21 index variables (also called features) that are the most frequently used for recognizing distinguished stocks in their related publications [1,2].
One of the most important issues in the development of an intelligent trading system is deciding which features should be used for stock performance prediction. One way is simply to follow the suggestions of human stock traders and feed all features into the developed system. In addition to using all these suggested features, this study presents a feature selection test and verifies the effectiveness of those features. Based on the definitions and characteristics of the features, the 21 features are categorized into four groups: price-related features, trading volume features, company financial status-related features, and others. The results of the feature selection test are discussed in Section 4. In this study, data for the important indices with a weekly resolution are downloaded from the stock database, and these important features are used as the input of the deep learning algorithm. Daily price-related data and daily trading volume data are also used as input data of the deep learning algorithm, because these two kinds of data (price and trading volume) are the most important for the prediction of stock price from the viewpoint of human stock traders. Using the additional daily resolution data avoids missing significant dynamics within each week. When the weekly and daily data are used together, the two kinds of data must be synchronized in time. The solution for data synchronization is explained in Section 3.

Definition of Positive Samples for Stock Classification
One of the simplest ways of constructing an intelligent trading system is to employ a binary classification algorithm to classify all stocks into two groups: positive samples, which are the stocks with good future performance, and negative samples, which are the rest of the stocks. This study presents four strategies for classification and evaluates them in terms of both earning rate and risk of investment.

Constant Threshold-Based Stock Classification
Fund managers are concerned about the rising rate α_C of the stock price, expressed in Equation (1). For example, if the price rising rate of a stock surpasses a threshold in the next 12 weeks, this stock could be a good candidate for investment. In Equation (1), CP is the closing price in the current week, and HP_in_next_12weeks denotes the highest price in the next 12 weeks. In this constant threshold-based stock classification, α_C is used to classify stocks: if α_C of a stock is higher than the threshold, the stock will be defined as a positive sample in the stock classification; otherwise, the stock is a negative sample. In this study, 70% was chosen as the threshold based on the experience of fund managers. If the developed intelligent trading system uses the constant threshold-based method to select stocks, the system will simply predict whether α_C of the stock is over the threshold or not. The system will buy the positive stock in the current week, and sell it when its α_C reaches the threshold within the next 12 weeks.
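Based on the surrounding definitions (CP and HP_in_next_12weeks), Equation (1) can be reconstructed as the 12-week price rising rate; this is a reconstruction, and the original notation may differ in detail:

```latex
\alpha_C = \frac{HP_{in\_next\_12weeks} - CP}{CP} \times 100\% \quad (1)
```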

Ranking-Based Stock Classification
Because the situation of the market is different every year, in a "good" year, many stocks have good performance and a high price rising rate, while in a "bad" year, the number of stocks with a high price rising rate becomes smaller. In this case, the investment will be concentrated in fewer stocks if the constant threshold method is used for selecting stocks. However, it is desirable to maintain the number of selected stocks and distribute the investment across different stocks to reduce risk.
This study proposes a second strategy for stock classification. All the stock samples are ranked based on the rising rate α_R expressed in Equation (2); the top x% of samples are defined as positive samples and the remaining (100 − x)% are negative samples. In Equation (2), CP is the closing price in the current week, and CP_after_12weeks denotes the closing price after 12 weeks. In this ranking-based stock classification, α_R is used to classify the stocks. The value of x was empirically set to 10 in this research. When the developed intelligent trading system uses this method to select stocks, the system will predict whether a stock is ranked in the top 10% or not, and select those 10% of stocks as positive samples for investment. In this method, there is no constant threshold to decide whether a stock is in the top 10%; therefore, it is impossible to use the trading strategy of the constant threshold-based method (selling the positive stock when its α_C reaches the threshold). Instead, the developed intelligent trading system will predict whether the α_R of a stock is in the top 10%, and keep the detected top 10% stocks for 12 weeks before selling them. Therefore, α_R is defined as the price rising rate after 12 weeks, which is different from Equation (1).
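Based on the definitions of CP and CP_after_12weeks, Equation (2) can be reconstructed as follows; this is a reconstruction, and the original notation may differ in detail:

```latex
\alpha_R = \frac{CP_{after\_12weeks} - CP}{CP} \times 100\% \quad (2)
```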

Constant Threshold-Based Stock Classification Considering Historical Volatility
In addition to the price rising rate, volatility is also a factor that needs to be considered in the investment. It is necessary to select the stocks which have both a high price rising rate and low volatility. Therefore, this study proposes the third strategy: constant threshold-based stock classification considering historical volatility. In this method, the target rate β C is defined as Equation (3).
where α_C can be calculated using Equation (1), and CP_in_past_12weeks is a vector which includes the daily closing prices in the past 12 weeks. STD(CP_in_past_12weeks/CP) is the standard deviation of the normalized price in the past 12 weeks, and its value indicates the historical volatility. In this method, if β_C of a stock is higher than a threshold, the stock will be defined as a positive sample in the stock classification; otherwise, the stock is a negative sample. In this study, 8 was chosen as the threshold based on the experience of fund managers. When the developed intelligent trading system uses this strategy, the system will predict whether β_C of the stock is over the threshold or not. The system will buy the positive stock in the current week, and sell it when its β_C reaches the threshold within the next 12 weeks.
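From the definitions above (the rising rate divided by the historical volatility), Equation (3) can be reconstructed as follows; this is a reconstruction consistent with the surrounding text, and the original form may differ:

```latex
\beta_C = \frac{\alpha_C}{\mathrm{STD}\left(CP_{in\_past\_12weeks}/CP\right)} \quad (3)
```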

Ranking Threshold-Based Stock Classification Considering Historical Volatility
Similar to the idea in the second strategy, it is also possible to develop the ranking threshold-based stock classification considering historical volatility. In this method, the target rate β R is defined as Equation (4).
where α_R can be calculated using Equation (2). The developed intelligent trading system will predict whether β_R of the stock is in the top 10%, and keep the detected top 10% stocks for 12 weeks before selling them. The four designed strategies are summarized in Table 2. Section 4 presents the performance of the stock trading systems developed based on the proposed four strategies.
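By analogy with Equation (3), Equation (4) can be reconstructed as follows; again, this is a reconstruction based on the surrounding definitions:

```latex
\beta_R = \frac{\alpha_R}{\mathrm{STD}\left(CP_{in\_past\_12weeks}/CP\right)} \quad (4)
```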

Long Short-Term Memory Networks
Long Short-Term Memory networks, usually just called "LSTMs", are a special kind of RNN equipped with a gating mechanism that controls access to memory cells. Since the introduction of the gates, LSTM and its variants have shown great promise in tackling various sequence modeling tasks in machine learning, e.g., natural language processing, image captioning, and speech recognition. Basically, an LSTM unit consists of an input gate, a forget gate, and an output gate. The architecture of an LSTM unit is shown in Figure 1. Suppose that x_t is the input and h_t−1 is the hidden output from the last time step t−1. The input gate decides how much of the new information will be added to the cell state c_t and generates a candidate c̃_t, where i_t can be thought of as a knob that the LSTM learns in order to selectively consider c̃_t for the current time step. σ is the logistic sigmoid function and φ is tanh. Generally, W terms denote weight matrices (e.g., W_xi is the matrix of weights from the input to the input gate), and b terms are bias vectors. The forget gate decides how much previous information will be kept in the new time step. Then, the cell state c_t is updated, where ⊙ is the element-wise product of vectors. Finally, the output gate uses the output o_t to control what is read from the new cell state c_t onto the hidden vector h_t. In this study, the functional LSTM(·,·,·) is used as shorthand for the LSTM model in Equation (11), where W and b include the weight matrices and bias vectors indicated in Equations (5)-(9). The values of W and b are determined in the training step.
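A reconstruction of Equations (5)-(11), following the standard LSTM formulation and the notation above (the original displays may differ in detail):

```latex
i_t = \sigma(W_{xi} x_t + W_{hi} h_{t-1} + b_i) \quad (5)\\
\tilde{c}_t = \varphi(W_{xc} x_t + W_{hc} h_{t-1} + b_c) \quad (6)\\
f_t = \sigma(W_{xf} x_t + W_{hf} h_{t-1} + b_f) \quad (7)\\
c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \quad (8)\\
o_t = \sigma(W_{xo} x_t + W_{ho} h_{t-1} + b_o) \quad (9)\\
h_t = o_t \odot \varphi(c_t) \quad (10)\\
(h_t, c_t) = \mathrm{LSTM}(x_t, h_{t-1}, c_{t-1}) \quad (11)
```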

Concatenated Double-Layered LSTM for Stock Performance Prediction
This study proposes an LSTM-based network to predict the future performance of stocks by classification. The proposed network classifies stocks into two categories (buying or not) based on historical sequence data. The "many to one" model has been widely used in sequence data processing. To fully use the memory and forgetting ability of LSTM, the proposed network is also a "many to one" model. The architecture of the proposed classification network is shown in Figure 2. This means that when classifying the stocks into two categories (buying or not), the historical sequence data from time t−n to t, (r_t−n, ..., r_t), are input into the network together. In the proposed network, n was empirically set to 52 by considering the experience of professional traders. The network contains two stacked LSTM layers, and the output is connected to the last LSTM layer. Here, x_t denotes the input data at week t. The double-layered LSTM model can be expressed using the functional LSTM(·,·,·) shorthand. In Figure 3, s^n_t stands for (h^n_t, c^n_t), n = 1 or 2. To reduce the complexity of the task, a binary classification system was developed for stock performance prediction, meaning the classification system is expected to recognize two categories. Therefore, the output of the final hidden layer connects to two nodes that indicate the probabilities of the two categories. The probabilities can be estimated from the output of the second LSTM layer, where W_2 is a weight matrix from the hidden layer to the output layer, and b_2 is the bias vector. P_t is a vector indicating the probability of the sample belonging to each of the two categories: 0 and 1. The category "0" means not buying a stock, and the category "1" means buying a stock.
The softmax layer and classification layer are responsible for normalization and category selection. The double-layer LSTM model can predict whether the automatic trading system should buy or not buy a stock, given the past 52 weeks of history information for that stock. The final output is decided in the category selection block based on a comparison of the normalized probabilities provided by the softmax layer. If the probability of category "0" (not buying a stock) is higher than the probability of category "1" (buying a stock), the output is [1, 0]. Otherwise, the output is [0, 1].
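To make the architecture concrete, the many-to-one forward pass through two stacked LSTM layers, the softmax layer, and the category selection block can be sketched in NumPy as below. This is an illustrative sketch, not the authors' implementation: the hidden size (16), the random initialization, and the helper names (`lstm_step`, `classify`) are all assumptions; in practice, the weights are learned in training.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h, c, W, b):
    """One LSTM unit step: input, forget, output gates and candidate state."""
    z = W @ np.concatenate([x, h]) + b  # stacked gate pre-activations
    H = h.shape[0]
    i = sigmoid(z[:H])            # input gate
    f = sigmoid(z[H:2 * H])       # forget gate
    o = sigmoid(z[2 * H:3 * H])   # output gate
    g = np.tanh(z[3 * H:])        # candidate cell state
    c_new = f * c + i * g         # cell state update
    h_new = o * np.tanh(c_new)    # hidden output
    return h_new, c_new

def init_params(D, H, rng):
    """Random weights for two stacked LSTM layers plus the output layer."""
    W1 = rng.normal(0, 0.1, (4 * H, D + H)); b1 = np.zeros(4 * H)
    W2 = rng.normal(0, 0.1, (4 * H, H + H)); b2 = np.zeros(4 * H)
    Wo = rng.normal(0, 0.1, (2, H)); bo = np.zeros(2)
    return W1, b1, W2, b2, Wo, bo

def classify(seq, params, H):
    """Many-to-one pass: feed the whole 52-step sequence, read one decision."""
    W1, b1, W2, b2, Wo, bo = params
    h1 = c1 = h2 = c2 = np.zeros(H)
    for x in seq:
        h1, c1 = lstm_step(x, h1, c1, W1, b1)   # first LSTM layer
        h2, c2 = lstm_step(h1, h2, c2, W2, b2)  # second LSTM layer
    logits = Wo @ h2 + bo                       # two output nodes
    p = np.exp(logits - logits.max()); p /= p.sum()  # softmax layer
    return [1, 0] if p[0] >= p[1] else [0, 1]   # category selection block
```

A 52-week sequence of 21 features would be passed as an array of shape (52, 21); the returned one-hot vector corresponds to "not buying" ([1, 0]) or "buying" ([0, 1]).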
In the training of the network, pairs ((r_t−n, ..., r_t); ground truth [1, 0] or [0, 1]) are used as the sample data because of the "many to one" architecture. The training algorithm automatically adjusts the parameters in the model based on the principle of gradient descent. In the training, the loss function is a cross entropy, where N is the number of samples in the training dataset and T_j is the row-vector-format ground truth for sample j. The objective of the training process is to minimize the value of the loss function in Equation (17). In Figure 3, both weekly and daily data are used as the input of the LSTM-based deep learning network; one weekly datum can be connected with five daily data from Monday to Friday.
Considering the particularities of stock classification, mistakes in the classification have different practical meanings. For example, compared to a false negative (a stock with good performance that is missed in detection), a false positive (a stock incorrectly detected as a good performance stock) carries a higher risk in real investment. It is possible to give a higher weighting to false positives in the loss function to force the training to reduce them. For example, the ratio of the weighting of false negatives to false positives could be 1:2, 1:3, and so on. Therefore, the loss function is reformulated as Equation (18). In the experiments, this research presents an attempt to find the best value of the parameter W_Loss in order to achieve good performance of the developed trading system.
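The class-weighted loss of Equation (18) can be approximated by weighting each sample's cross entropy according to its ground-truth category, so that misclassifying a negative sample (a potential false positive) costs more than missing a positive one. A sketch under that assumption (the exact form of Equation (18) may differ):

```python
import numpy as np

def weighted_cross_entropy(P, T, w_fn=1.0, w_fp=2.0):
    """Weighted cross entropy over N samples.
    P: (N, 2) predicted probabilities; T: (N, 2) one-hot ground truth,
    with [1, 0] = not buying and [0, 1] = buying.
    w_fn : w_fp is the 1:2 weighting ratio mentioned in the text."""
    eps = 1e-12
    ce = -np.sum(T * np.log(P + eps), axis=1)  # per-sample cross entropy
    # A misclassified negative sample ([1, 0]) becomes a false positive,
    # so negative-truth samples receive the heavier weight w_fp.
    w = np.where(T[:, 0] == 1, w_fp, w_fn)
    return np.mean(w * ce)
```

With this weighting, a confident wrong prediction on a negative sample contributes twice as much loss as the same mistake on a positive sample.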

Experiment Setup and Evaluation Criteria
In this study, a deep neural network was adopted, and 52 weeks of historical feature data were used as the input of the deep neural network for binary classification. For example, when the system performed the classification on 2018/04/01 to select the stocks with good performance in the next 12 weeks, the historical data of 2017/04/03-2018/04/01 (52 weeks) were input into the deep neural network. The ground truth for each historical sample is binary: buying the stock or not, in terms of its performance over the following 12 weeks. Because the data are organized weekly, one stock can provide 52 samples per year, as illustrated in Figure 4. In the following description, this paper uses the end time of the data to denote the 52-week historical data input into the deep neural network. For example, "2018/04/01" denotes the historical data "2017/04/03-2018/04/01". In this study, the training dataset, validation dataset, and test dataset were separated based on year and month, as shown in Table 3. In order to verify the repeatability of the proposed method, this study presents evaluations on different datasets. For example, when the system was tested on the dataset of the period from 2018/04/01 to 2018/09/30 (as indicated in the final row of Table 3), the data from 2017/04/01 to 2017/12/31 were used for validation, and the data from 2001/04/01 to 2016/09/30 were used for training. The datasets in each row of Table 3 are considered as one set. On average, the number of samples in each training, validation, and test dataset is about 450,000, 45,000, and 30,000, respectively. It is important to note that there is no overlap among training, validation, and test data in each set. The basic process in the evaluation of each set of datasets is to use the training dataset to train the model and obtain multiple classifiers. After that, the best classifier is selected based on the validation dataset. Finally, the selected classifier is evaluated on the test dataset.
This process was conducted on each set of datasets to demonstrate the repeatability of the proposed methods.
The training dataset was used to train the model. Training is an iterative process with multiple epochs; one epoch means that all training data have been used once for backpropagation. In this study, the number of epochs was set to 50, because 50 epochs are enough for the convergence of the training process. The training process outputs one classifier after each epoch; therefore, 50 different classifiers were generated after 50 epochs. Theoretically, the final classifier should have the best performance. However, the performance of the classifiers did not change much during the training process. One reason is that enough training data were provided for the deep learning algorithm: after several epochs, the training process converged and the parameters of the classifier were optimal. However, how to choose the best classifier remains a problem. In this study, a validation dataset was used to choose the best classifier. As shown in Table 3, the validation dataset is the most recent year before the test dataset. In addition, there is a three-month gap between the validation data and the test data. When the system works on 2018/03/31 and predicts the future of stocks in the next 12 weeks, the validation dataset should be the data from 2017/04/01 to 2018/03/31. However, the future 12-week data for the historical data from 2018/01/01 to 2018/03/31 are not available on 2018/03/31. Therefore, the data from 2018/01/01 to 2018/03/31 cannot be used for validation, and the validation dataset has a 9-month period.
In addition, as described in Section 3, a low false positive count is also expected in stock selection. Therefore, when choosing the classifier, it is also necessary to consider which one has a low false positive count while maintaining a high true positive count. In this study, the following best-precision criterion was adopted to select the classifier on the validation dataset, where TP is the number of true positive samples and FP is the number of false positive samples. In the repeatability evaluation, the test dataset covered a half-year period, and the validation dataset covered a 9-month period. The training dataset is the data excluding the test and validation datasets. For example, when the test dataset is the data from 2011/04/01 to 2011/09/30, the validation dataset is the data from 2010/04/01 to 2010/12/31. In this case, the training dataset was the data from 2001/04/01 to 2010/03/31 plus the data from 2012/10/01 to 2018/09/30. In this study, future data after the test data were also used for training, because deep learning needs a large amount of training data to achieve good performance; using the data after the test data period increases the number of training samples. It is important to note that there is a one-year gap from the end of the test data period to the start of the later training data period, because excluding the data in that year strictly guarantees that no part of the test data is used in training.
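The best-precision criterion amounts to evaluating each epoch's classifier on the validation dataset and keeping the one with the highest precision. A minimal sketch (the function names and the (epoch, TP, FP) record format are illustrative):

```python
def precision(tp, fp):
    """Precision = TP / (TP + FP); defined as 0 when nothing is detected."""
    return tp / (tp + fp) if (tp + fp) > 0 else 0.0

def select_best_classifier(validation_results):
    """validation_results: list of (epoch, TP, FP) triples, one per epoch's
    classifier evaluated on the validation dataset.  Returns the epoch whose
    classifier maximizes precision, together with that precision."""
    best_epoch, best_prec = None, -1.0
    for epoch, tp, fp in validation_results:
        p = precision(tp, fp)
        if p > best_prec:
            best_epoch, best_prec = epoch, p
    return best_epoch, best_prec
```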
In this study, True Positive Rate (TPR, Recall), True Negative Rate (TNR), Average Correction Rate (ACR), and Precision were used to evaluate the performance of the developed prediction systems. The evaluation criteria are denoted in Equations (20)-(23), e.g., Precision = TP/(TP + FP) (23), where P is the number of all positive samples and N is the number of all negative samples. TP is the number of true positive samples, FP is the number of false positive samples, TN is the number of true negative samples, and FN is the number of false negative samples. In addition to the four criteria, the average maximum price rising rate and the average maximum price decreasing rate of stocks were also used in the evaluation. Moreover, this study also presents a simulation of stock trading to evaluate the performance of the proposed methods; the details of the simulation are presented in each of the following subsections.
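The four criteria can be computed from the confusion counts as sketched below. Note that the exact definition of ACR is not spelled out in this excerpt; the sketch assumes it is the mean of TPR and TNR, which matches the name "average correction rate" but should be treated as an assumption.

```python
def classification_metrics(tp, fp, tn, fn):
    """TPR (recall), TNR, ACR, and Precision from confusion counts.
    P = TP + FN is the number of all positive samples;
    N = TN + FP is the number of all negative samples."""
    p = tp + fn
    n = tn + fp
    tpr = tp / p                 # True Positive Rate (recall)
    tnr = tn / n                 # True Negative Rate
    acr = (tpr + tnr) / 2        # Average Correction Rate (assumed definition)
    prec = tp / (tp + fp)        # Precision
    return {"TPR": tpr, "TNR": tnr, "ACR": acr, "Precision": prec}
```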

Results of Constant Threshold-Based Stock Classification
In the evaluation of the method of constant threshold-based stock classification, two factors should be discussed: the W_Loss parameter in Equation (18) and the features in Table 1. Table 4 shows the performance of the classification using all features and a W_Loss of 1:1.
The first column indicates the time period of the test dataset. True negative rate, true positive rate (recall), average correction rate, and precision are listed from the second to fifth columns. The average of the maximum rising rate of detected good performance stocks and all stocks are demonstrated in the sixth and seventh columns. The average of the maximum decreasing rate of detected good performance stocks and all stocks are shown in the eighth and ninth columns.
Moreover, this study also presents a simulation of a real stock trading system. In the case of the binary classification system using the 70% rising rate threshold, the system sets a 70% rising rate as the selling point. The system first buys all selected stocks. If a selected stock (a detected positive sample) achieves a 70% rising rate, the system sells it immediately; otherwise, the stock is kept and sold at the end of 12 weeks. The tenth and eleventh columns of Table 4 show the earning rate of the simulated stock trading system. In addition, the twelfth column of Table 4 provides the basis of all stocks. The basis is the average earning rate from the present to 12 weeks later.
In addition, two other criteria are used in the evaluation of risk: the Sharpe ratio with trading on the selling point and the Sharpe ratio without trading on the selling point. The two criteria are defined in Equations (24) and (25), where CP is the closing price in the current week, P_selling is the price when selling the stock, and CP_after_12weeks is the closing price after 12 weeks. In Equation (24), CP_until_selling is a vector which includes the daily closing prices from the current week until selling. CP_in_next_12weeks is a vector which includes the daily closing prices in the next 12 weeks. In fact, the Sharpe ratio without trading on the selling point means the stocks will be sold at the end of 12 weeks. The thirteenth and fourteenth columns of Table 4 show the Sharpe ratio with trading on the selling point, and the fifteenth and sixteenth columns illustrate the Sharpe ratio without trading on the selling point. In addition, the average of all tests is listed in the last row of Table 4. This study presents an evaluation of different features and W_Loss values. Because of page length limitations, Table 5 shows a summary of these evaluations. In this study, the results generated using different W_Loss values were compared, and then the best W_Loss values were chosen for the feature selection. The following conclusions can be obtained from the data in Table 5: (1) Values of the parameter W_Loss affect the earning rate.
(2) Price features can provide the best performance compared to all features.
(3) The best earning rate occurs when the stock classification uses W_Loss (1:2) and price features. A 2.146% (6.037% − 3.891%) earning rate per 12 weeks above the basis is achieved in the constant threshold-based stock classification. (4) The Sharpe ratio of the detected stocks is lower than that of all stocks, which indicates that the constant threshold-based stock classification method has a relatively high risk. The reason for this is that stock volatility is not considered in this method.
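The two Sharpe-ratio criteria of Equations (24) and (25) can be sketched as below. The normalization of the daily closing prices by the buy price CP follows the definitions in the text, but the exact formula in the original may differ, so this should be read as an approximation.

```python
import numpy as np

def sharpe_with_selling_point(cp, p_selling, cp_until_selling):
    """Sharpe-ratio-style criterion with trading on the selling point:
    return from the buy price CP to the selling price, divided by the
    volatility of the normalized daily closing prices until selling
    (a sketch of Equation (24))."""
    ret = (p_selling - cp) / cp
    vol = np.std(np.asarray(cp_until_selling) / cp)
    return ret / vol

def sharpe_without_selling_point(cp, cp_after_12weeks, cp_in_next_12weeks):
    """Counterpart without a selling point (Equation (25)): the stock is
    held and sold at the end of the 12 weeks."""
    ret = (cp_after_12weeks - cp) / cp
    vol = np.std(np.asarray(cp_in_next_12weeks) / cp)
    return ret / vol
```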

Results of Ranking-Based Stock Classification
Similar to the constant threshold-based stock classification, this study also presents multiple evaluations for the ranking-based stock classification. In the evaluation, the effect of different values of W_Loss and different input features was tested. In the ranking-based stock classification, the top 10% of stocks were considered positive samples. Table 6 shows the results generated using the different configurations. There was no fixed threshold for stock trading; therefore, the simulated stock trading system kept the detected (or all) stocks until the end of the following 12 weeks and then sold them. Thus, there is no Sharpe ratio with trading. The following conclusions can be obtained from the data in Table 6.
(1) Values of parameter W Loss affect the earning rate.
(2) Price and trading volume-related features provide the best performance compared to all features. (3) The best earning rate occurs when the stock classification uses W_Loss (1:2) and price and trading volume-related features. A 6.984% (10.875% − 3.891%) earning rate per 12 weeks above the basis is achieved in the ranking-based stock classification. (4) The Sharpe ratio of the detected stocks is lower than that of all stocks, which indicates that the ranking-based stock classification method has a relatively high risk. The reason is that stock volatility is not considered in this method. (5) In terms of earning rate, the ranking-based stock classification method is more effective than the constant threshold-based stock classification method.

Results of Constant Threshold-Based Stock Classification Considering Historical Volatility
Similar to the constant threshold-based stock classification, this study also presents multiple evaluations for the constant threshold-based stock classification considering historical volatility. In the evaluation, the effect of different values of the parameter W_Loss and different input features was tested. Table 7 shows the results using different configurations. In this method, the threshold for the target rate β_C was set to 8 to define positive samples. In the simulated trading system, the target rate 8 was set as the selling point: if a stock achieves the target rate 8, it is sold immediately; otherwise, the system keeps the stock and sells it at the end of 12 weeks. The following conclusions can be obtained from the data in Table 7.
(1) Values of parameter W Loss affect the earning rate.
(2) Price and trading volume-related features provide the best performance compared to all features. (3) The best earning rate occurs when the stock classification uses W_Loss (1:1) and price and trading volume-related features. A 2.504% (6.395% − 3.891%) earning rate per 12 weeks above the basis is achieved. (4) The Sharpe ratio of the detected stocks is higher than that of all stocks, which indicates that this method can also reduce risk in stock selection. The reason is that stock volatility is considered in this method. (5) In terms of the Sharpe ratio, this method has better performance than the stock classification methods that do not consider historical volatility.

Results of Ranking-Based Stock Classification Considering Historical Volatility
Similar to the previous methods, this study also presents multiple evaluations for the ranking-based stock classification considering historical volatility. In the evaluation, the effect of different values of the parameter W_Loss and different input features was tested. In this method, the top 10% of stocks were considered positive samples. Table 8 shows a summary of the evaluations using different configurations. There was no fixed threshold for stock trading; therefore, the simulated stock trading system kept the detected (or all) stocks until the end of the next 12 weeks and then sold them. Thus, there is no Sharpe ratio with trading. The following conclusions can be obtained from the data in Table 8.
(1) Values of parameter W Loss affect the earning rate.
(2) Price and trading volume-related features provide the best performance compared to all features. (3) The best earning rate occurs when the stock classification uses W_Loss (1:1) and price and trading volume-related features. The ranking-based stock classification considering historical volatility achieves a 9.044% earning rate per 12 weeks on Japanese stock data from 2011 to 2018, i.e., a 5.153% (9.044% − 3.891%) earning rate per 12 weeks above the basis. (4) The Sharpe ratio of the detected stocks is higher than that of all stocks, which indicates that this method can reduce the risk in stock classification. The Sharpe ratio of this method is very similar to that of the constant threshold-based stock classification considering historical volatility. (5) Considering both earning rate and Sharpe ratio, the ranking-based stock classification considering historical volatility is the most effective of the four proposed strategies.
In addition, in this study, the proposed deep neural network-based stock performance prediction method was compared with two conventional methods: Logistic Regression-based classification and Support Vector Machine (SVM)-based classification. This comparison was performed for the ranking-based stock classification considering historical volatility. The comparison in Table 9 shows the proposed method has a higher earning rate and a lower risk than the conventional methods.

Conclusions
This study presents four strategies for stock classification and performs feature selection to achieve a higher earning rate and lower risk in stock classification. The following points are concluded based on the evaluations and analysis: (1) Using the ranking method can improve the earning rate above the basis.
(2) Using historical volatility information for training can increase the Sharpe ratio and reduce the risk in stock classification. (3) Using price and trading volume-related features yields better performance than using all features in stock classification. (4) Considering both earning rate and risk, the ranking-based stock classification method considering historical volatility is the most effective of the four proposed methods. About a 5.2% earning rate per 12 weeks above the basis of all stocks is achieved in the repeatability test, and the selected stocks have lower risk than the basis. (5) The proposed deep neural network-based stock performance prediction method has better performance than the conventional methods.
The proposed method can achieve about a 36% earning rate per year (9.044% per 12 weeks) in the Japanese stock market. However, excellent human stock traders have achieved triple-digit returns per year [6]. There is still a gap between the performance of the developed intelligent trading system and the achievements of human stock traders. This paper proposed using a classification network to classify stocks into two categories: buying or not. In the future, a regression network will be developed to predict the exact value of the future price. In this way, the developed trading system is expected to obtain higher earnings.