A Nonlinear Technical Indicator Selection Approach for Stock Markets. Application to the Chinese Stock Market

Abstract: In this paper we present a combinatorial nonlinear technical indicator approach for the identification of appropriate combinations of stock technical indicators as inputs in non-linear models. This approach is illustrated with the example of Chinese stock indexes and 35 different stock technical indicators, using neural networks as the chosen non-linear method. Stock market technical indicators can generate contradictory signals regarding the future performance of the stock analyzed. Furthermore, some non-linear methods, such as neural networks, can have poor generalization power when dealing with problems of high dimensionality due to the issue of local minima. Therefore, non-linear approaches that can identify appropriate combinations of input variables are of clear importance. It will be shown that the proposed approach, when using neural networks as classifiers, generates error rates lower than those obtained by applying neural networks directly without dimensionality reduction. It will also be shown that merely increasing the number of neurons does not increase the accuracy. The approach proposed in this article is illustrated with an application to the stock market using neural networks, but it could be applied to other fields and can also be used with other non-linear techniques such as support vector machines.


Introduction
Over the last few decades there has been an increase in the use of non-linear forecasting techniques, such as neural networks (NN) and support vector machines (SVM). The non-linearity of stock market performance has been mentioned by many researchers, such as Vrbka and Rowland [1]. In fact, this non-linearity was one of the reasons why Vrbka and Rowland [1] chose neural networks (a non-linear approach) as a model to forecast stock prices on the Prague Stock Exchange. In that paper the authors successfully applied multilayer perceptron and radial basis networks to that particular stock market. While neural networks are an important stock forecasting technique, it should be noted that, like any other technique, they have limitations. In this regard, Horack and Krulicky [2] made an interesting comparison between the exponential time series alignment method and the time series alignment with neural networks method. The authors highlighted the importance of neural networks in the field of stock forecasting, mentioning that neural networks generally provide better forecasts than traditional methods. However, they also concluded that in their example, applied to a volatile stock, the traditional forecasting method generated better results than the neural network. This highlights the importance of using appropriate stock forecasting techniques, as papers in the existing literature report less than optimal results for some popular techniques. For instance, Groda and Vrbka [3] concluded that the Box-Jenkins method is not an appropriate method in the case of stocks listed on the Prague Stock Exchange.
An important development in recent years is the very large increase in data available in many disciplines. It is relatively frequent to have a high number of variables that can potentially have an impact on non-linear processes, see Guyon and Elisseeff [4]. There is substantial research covering the topic of variable selection using linear methods, such as Hocking [5], but these techniques might not be ideal when intended for non-linear modelling. There is relatively little literature covering the issue of variable selection for non-linear processes from a combinatorial approach. The basic assumption is that in non-linear processes the way in which different independent variables interact with each other can be highly complex. Identifying which combination of variables works better for a non-linear problem is clearly not trivial, see Yuan and Lin [6]. There are some interesting methods, such as Rech and Terasvirt [7], using polynomial approximation. The drawback of this type of approach is that it is only applicable when there is a relatively small number of variables.
Ye and Sun [8] proposed an iterative method in which, starting from all the variables considered, one of the variables is dropped, a neural network is trained on the resulting set of variables, and the results are then compared.
The stock market, like many other fields, has seen a large increase in the amount of data available. More specifically, many researchers and practitioners have developed a large number of technical indicators. Getting into the details of these indicators is beyond the scope of this article, but it is important to mention that they are typically constructed using historical data, such as the closing price of a stock. A moving average is a well-known technical indicator [9][10][11][12][13][14][15]. A simple moving average can be constructed as the average of the closing price of a certain stock or index over a certain period of time. There are many different technical indicators, and strategies based on those indicators, with diverse levels of profitability [16][17][18][19]. Neural networks have been applied successfully to the field of stock market forecasting [20][21][22][23][24][25]. One of the frequently mentioned drawbacks of neural networks is the issue of local minima [26][27][28][29], which can cause the neural network to generalize poorly or, in other words, to generate poor forecasts when faced with new data. In this regard there has been a focus on reducing the dimensionality of the data to avoid having the neural network stuck in these local minima [30][31][32][33][34]. In this paper we propose a non-linear combinatorial approach for variable selection applied to stock market technical indicators. Given the number of technical indicators available today, it is clear that not all possible combinations can be tested.
In this work we use a pool of 35 technical indicators, so the number of possible combinations is staggering. Because of that, the proposed approach resorts to randomization. The algorithm starts by generating a preset number of combinations, each with a size of half the number of available indicators. The reason for using this size is that the maximum number of combinations for a given $n$ and $k$, that is, the binomial coefficient $\binom{n}{k}$, is attained for $k = n/2$ (rounded to an adjacent integer when $n$ is odd). This is evident from inspecting Pascal's triangle, and can be proved using the properties of the Newton binomial. From this initial set of combinations, new combinations are generated randomly, retaining the best ones, in an iterative process that ends when the stop condition is met. The proposed method is shown to improve on the baseline approach, that is, using all the available indicators, when applied to the task of predicting market trends. As a case study for the proposed method, this paper focuses on the Chinese stock market. The Chinese stock market is very dynamic and of increasing importance due to the economic growth of China. The most relevant Chinese stock indexes have been considered. The strategy described in this paper has obtained good results, and better combinations of technical indicators have been identified for these indexes.
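As a quick numerical check of the combinatorial argument above, the following Python sketch counts the combinations of each size for a pool of 35 indicators. The variable names are illustrative only and are not taken from the paper's implementation.

```python
import math

N = 35  # size of the indicator pool used in this paper

# Number of combinations of each size k = 0, ..., N.
counts = {k: math.comb(N, k) for k in range(N + 1)}

# Size with the largest number of combinations. For odd N the maximum is
# shared by k = 17 and k = 18; max() returns the first one it encounters.
k_max = max(counts, key=counts.get)

# Total number of non-empty combinations of any size.
total_nonempty = 2**N - 1

print(k_max, counts[k_max], total_nonempty)
```

With 35 indicators there are 4,537,567,650 combinations of size 17 alone and 34,359,738,367 non-empty combinations overall, which is why an exhaustive search is not attempted.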
The remainder of the paper is organized as follows: Section 2 presents the proposed approach. Section 3 shows the application of the proposed approach to the Chinese stock market. The results are discussed in Section 4, and Section 5 presents the conclusions of the paper.

Technical Indicator Selection Approach
Let $X^i_T(t)$ be the tuple of $T$ values of the $i$-th technical indicator, from a pool of up to $N$ nonlinear technical indicators (such as a moving average), computed at the time period $t$ (the time period is usually measured in days, but could be weeks, months or any other interval):

$$X^i_T(t) = \left(X^i(t-(T-1)), \ldots, X^i(t-1), X^i(t)\right),$$

where $X^i(t)$ is the value of the $i$-th technical indicator evaluated at time $t$. Let $R_T(t)$ be a vector that groups the direction of the change of the price of a stock or index in the period from $t-(T-1)$ to $t$, that is, $R_T(t) \in \{0, 1\}^T$, with 0 meaning that the stock increases or remains constant at the end of a period and 1 meaning otherwise. Additionally, let $\hat{\varphi}$ define a nonlinear mapping from $X^i_T(t)$, $i = 1, \ldots, N$, to $\{0, 1\}^T$:

$$\hat{R}_T(t) = \hat{\varphi}\left(X^1_T(t), \ldots, X^N_T(t)\right),$$

where $\hat{R}_T(t)$ is an estimation of $R_T(t)$. In order to guarantee that an estimator $\hat{\varphi}$ can be found, it is necessary to assume the following:

Assumption 1 (Ground truth function existence). It is assumed that a mapping $\varphi$ from $X^i_T(t)$ to $R_T(t)$ exists.
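To make the notation concrete, the following Python sketch builds the window of $T$ values of one indicator (a simple moving average) and the corresponding direction labels from a short list of closing prices. The prices, the function names and the 0-based indexing are assumptions made for illustration; they are not part of the paper's Matlab implementation.

```python
from typing import List, Sequence, Tuple


def simple_moving_average(close: Sequence[float], window: int) -> List[float]:
    """Average of the last `window` closing prices, one value per period
    for which a full window is available."""
    return [
        sum(close[j - window + 1 : j + 1]) / window
        for j in range(window - 1, len(close))
    ]


def direction_labels(close: Sequence[float]) -> List[int]:
    """0 if the price rises or stays constant with respect to the previous
    period, 1 otherwise (the convention used for R_T in the text)."""
    return [0 if close[j] >= close[j - 1] else 1 for j in range(1, len(close))]


def last_T(values: Sequence, t: int, T: int) -> Tuple:
    """Tuple of the T values from position t - (T - 1) to t (0-based)."""
    return tuple(values[t - (T - 1) : t + 1])


# Hypothetical closing prices, for illustration only.
close = [10.0, 11.0, 12.0, 12.0, 11.0, 13.0]
sma = simple_moving_average(close, window=3)
labels = direction_labels(close)

X_T = last_T(sma, t=len(sma) - 1, T=3)        # window of one indicator
R_T = last_T(labels, t=len(labels) - 1, T=3)  # matching direction vector
print(X_T, R_T)
```

How the indicator window is aligned with the label to be forecast (for instance the direction in period $t + 1$) is a modelling choice that this sketch leaves open.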
Technical indicators in the stock market can produce contradictory signals, and some non-linear techniques can suffer from local minima issues when the input is high-dimensional (large $N$ or $T$). Therefore, it may be convenient to find a combination of the technical indicators $X^i_T$ rather than using all $N$ available indicators. We present a combinatorial approach for selecting technical indicators as input variables of non-linear processes, which does not rely on linear combinations of the variables.
The steps are as follows:

1. Split the available instances of $X^i_T(t)$ and $R_T(t)$ data into two subsets, an estimation subset $S_e = \{X^i_{T,e}(t), R_{T,e}(t)\}$ and a validation subset $S_v = \{X^i_{T,v}(t), R_{T,v}(t)\}$ (to keep the notation as clear as possible, the term "validation" is used here with the meaning of test).

2. Generate $C_s$, a set of $M$ combinations of $N/2$ random numbers in the range $\{1, 2, \ldots, N\}$, denoted from $i_1$ to $i_{N/2}$. Check for repetition within each combination. If there is repetition, leave out the repeated numbers and keep generating random numbers in the aforementioned range until there is no repetition. Similarly, check that there are no repeated combinations and replace the redundant ones.

3. For each of the $M$ combinations in $C_s$, denoted as $I \in C_s$, perform the following steps:

(3a) Compute $\hat{\varphi}$, i.e., the estimated non-linear classification mapping, using any technique of choice. In this case, the nonlinear mapping will have as arguments $X^i_T(t)$, $\forall i \in I$. Using neural networks as an example, this step involves training a neural network with the training data set $\{X^i_{T,e}(t), R_{T,e}(t)\}$ with $i \in I$.

(3b) Evaluate $\hat{\varphi}$ over the validation set. Let $\hat{R}_{T,v}$ denote the estimated classification output of $R_{T,v}$.

(3c) Calculate the error $\xi_T(t)$ of the non-linear classification approach for each instance in the validation set, so that

$$\xi_T(t) = \begin{cases} 0 & \text{if } \hat{R}_{T,v}(t) = R_{T,v}(t) \\ 1 & \text{otherwise} \end{cases}$$

with the resulting total error being:

$$\xi_{Total} = \frac{1}{\mathrm{card}(S_v)} \sum_{t \in S_v} \xi_T(t),$$

where $\mathrm{card}(S_v)$ denotes the cardinality of $S_v$. This is therefore the total error for the initial random combination of technical indicators chosen in step 2.

(3d) Randomly generate a single value $i_a \in \{1, 2, \ldots, N\}$ that denotes a technical indicator candidate to be added to the combination chosen in step 2. Check for repetition with the previously generated $N/2$ values. If there is repetition, randomly generate another value $i_a$. Repeat this step until there is no repetition.

(3e) Randomly generate a single value $i_r \in \{i_1, \ldots, i_{N/2}\}$. This value denotes a technical indicator to be removed from the combination chosen in step 2.

(3f) Form the combinations $I^{up} = I \cup \{i_a\}$ and $I^{down} = I \setminus \{i_r\}$, compute their total errors as in steps 3a to 3c, and pick the two combinations with the smallest error among $I$, $I^{up}$ and $I^{down}$. Substitute the initial combination $I$ by these two combinations. This will double the number of combinations in set $C_s$, i.e., $C_s$ will contain $2M$ combinations after this step. Finally, reject any repeated combinations.

4. Retain the $M$ combinations in $C_s$ with the smallest error and discard the remaining $M$ combinations with the greatest error.

5. Find the minimum $\xi_{Total}$ of all the combinations $I \in C_s$.

6. Repeat step 3 starting from 3d until a certain stop condition is met. Here it is proposed to check whether a certain objective error is achieved or a maximum number of iterations is reached, so that an infinite loop is avoided.
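The steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the Matlab implementation used in the paper: `error_fn` is a hypothetical stand-in for steps 3a to 3c (training and validating a classifier on the selected indicators), `toy_error` is an invented error function used only to make the sketch runnable, and caching the error of each combination is a simplification, since a retrained neural network would not return exactly the same error twice.

```python
import random
from typing import Callable, Dict, FrozenSet, Tuple

Combo = FrozenSet[int]


def select_indicators(n_indicators: int, n_combos: int, n_iters: int,
                      error_fn: Callable[[Combo], float],
                      target_error: float = 0.0,
                      seed: int = 0) -> Tuple[Combo, float]:
    rng = random.Random(seed)
    pool = list(range(1, n_indicators + 1))
    cache: Dict[Combo, float] = {}

    def err(combo: Combo) -> float:
        # Steps 3a-3c: total validation error of one combination.
        if combo not in cache:
            cache[combo] = error_fn(combo)
        return cache[combo]

    # Step 2: M distinct random combinations of N/2 indicators each.
    # Assumes n_combos does not exceed the number of such combinations.
    current: Dict[Combo, float] = {}
    while len(current) < n_combos:
        combo = frozenset(rng.sample(pool, n_indicators // 2))
        current[combo] = err(combo)

    for _ in range(n_iters):
        candidates: Dict[Combo, float] = {}
        for combo in current:
            trio = {combo: current[combo]}
            outside = [i for i in pool if i not in combo]
            if outside:  # step 3d: indicator to add
                up = combo | {rng.choice(outside)}
                trio[up] = err(up)
            if len(combo) > 1:  # step 3e: indicator to remove
                down = combo - {rng.choice(sorted(combo))}
                trio[down] = err(down)
            # Step 3f: keep the two best of I, I_up, I_down.
            # Using a dict keyed by combination rejects repeats.
            for c in sorted(trio, key=trio.get)[:2]:
                candidates[c] = trio[c]
        # Step 4: retain the M combinations with the smallest error.
        best_m = sorted(candidates, key=candidates.get)[:n_combos]
        current = {c: candidates[c] for c in best_m}
        # Steps 5-6: stop when the objective error is reached.
        if min(current.values()) <= target_error:
            break

    best = min(current, key=current.get)
    return best, current[best]


def toy_error(combo: Combo) -> float:
    # Invented error: distance to a hypothetical "ideal" set {2, 5}.
    return len(combo ^ {2, 5}) / 6


best, best_error = select_indicators(6, 2, 50, toy_error, seed=1)
print(sorted(best), best_error)
```

Because the best of $I$, $I^{up}$ and $I^{down}$ is always kept, the smallest error in the set cannot increase from one iteration to the next in this sketch.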

Remark 1.
This strategy can be easily parallelized as the tasks in step 3 can be done independently for each combination, hence can be done in parallel, with one exception. The latter part of step 3f, that is the rejection of repeated combinations, cannot be done in parallel and must be done sequentially in a separate step outside of step 3.
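A minimal Python sketch of the parallelization described in Remark 1 is given below, under the assumption that only the error evaluation of each combination is distributed and that repeated combinations are rejected sequentially. The function `toy_error` is a hypothetical stand-in for training and validating a classifier. A thread pool is used to keep the example self-contained; for CPU-bound training in Python a process pool would normally be the better choice.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, FrozenSet, Iterable

Combo = FrozenSet[int]


def toy_error(combo: Combo) -> float:
    # Invented error function, used only to make the sketch runnable.
    return len(combo ^ {2, 5}) / 6


def evaluate_in_parallel(combos: Iterable[Iterable[int]],
                         error_fn: Callable[[Combo], float],
                         max_workers: int = 4) -> Dict[Combo, float]:
    # Sequential part: reject repeated combinations, preserving order.
    unique = list(dict.fromkeys(frozenset(c) for c in combos))
    # Parallel part: each combination is evaluated independently.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        errors = list(pool.map(error_fn, unique))
    return dict(zip(unique, errors))


results = evaluate_in_parallel([{1, 2, 3}, {2, 5}, {3, 2, 1}], toy_error)
print(len(results))
```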

Remark 2.
The number of combinations in $C_s$, that is $M$, and the number of iterations are related in the sense that similar performance can be achieved by using a greater $M$ and fewer iterations or vice versa. Besides the differences in performance due to the randomized nature of the proposed strategy, a practical difference can lead to one choice or the other. With a greater $M$ one can exploit the parallelizable nature of the algorithm, whereas a high number of iterations cannot be parallelized, because each iteration depends on the result of the previous one.
To illustrate the proposed approach, consider a simple example with just two combinations (i.e., $M = 2$) and one iteration. Assume, for instance, that we have a total of 6 technical indicators to choose from, i.e., $\{X^1_5, X^2_5, X^3_5, X^4_5, X^5_5, X^6_5\}$, where the sub-index 5 means that each of the indicators is evaluated over five periods of time. There is also a related classification vector $R_5$ identifying up and down movements in the stock at each time $t$ (in this case $R_5 \in \{0, 1\}^5$). First, the algorithm randomly selects 2 combinations of 3 initial technical indicator indexes. For example: Then the non-linear classification error is estimated for this configuration; let us assume that the obtained value is (with a slight abuse of notation the parameters of each combination will denote in this example the set of the parameters of both combinations): Then the indexes $i_a$ and $i_r$ are computed for each combination: The addition of the new technical indicator index is done randomly, ensuring that there is no repetition, i.e., each input variable (technical indicator) is used only once. The index removed is also computed randomly. Then the combinations $I^{up}$ and $I^{down}$ are formed: Note that in every combination generated so far no index is repeated and that there are no repeated combinations. Once the new combinations are formed, their total classification errors are computed: For every combination in $C_s$ we choose the two combinations (from $I$, $I^{up}$ and $I^{down}$) with the smallest error. From the first combination we then choose $I^{down}$. We order the combinations using the error as the sort criterion: Now we can reduce the number of combinations to its original value ($M = 2$) by picking the two with the smallest error: Then, at the end of this first iteration, the algorithm would pick the combination $I^{down}_2$ as the best technical indicator combination, as it has the lowest error (0.2).
Thus, the technical indicators proposed to forecast the direction of the change of price will be $\{X^1_5, X^3_5\}$. In practice, the number of combinations and iterations will be determined by the computing power available. The process will thus be repeated until a stopping criterion is reached (that is, a maximum number of iterations or a target classification error).

Data and Methodology
The Chinese equity market is an increasingly important market, propelled by the large expansion of the Chinese economy over the last few decades. The Chinese equity market is divided into two major stock exchanges, the Shanghai and the Shenzhen Stock Exchanges, with several major stock indexes describing the performance of those markets. We used 6 different stock indexes describing the Chinese stock market. For completeness, and to exclude some form of regional bias in the results, we also used two other international (non-Chinese) stock indexes (see Table 1). As described in the previous section, the selection approach is used to forecast the direction of the movement of the stock index (up or down) in the next time period rather than the exact end price for that period. Daily closing prices for all the indexes mentioned in Table 1 were collected from the Bloomberg database for the period from 14 February 2007 to 30 March 2020. The returns of those indexes can be seen in Figure 1. Positive or zero returns result in an $R_T$ value of 0, and negative returns in a value of 1. The proposed approach was used to identify an appropriate combination of technical indicators to describe the performance of the equity market. Stock technical indicators are typically based on the historical performance of the stock as well as the traded volume of that stock. There is an ever increasing number of technical indicators in the existing literature, which can generate contradictory signals. We used 35 commonly used stock technical indicators (see Table 2) extracted from the Bloomberg database.

Table 2. Technical indicators. $C_t$ denotes the current price, $L_t$ and $H_t$ are the minimum and maximum prices in period $t$, and MA means moving average. The exact formulas for some of the indexes are proprietary, with the indicators for specific stocks obtainable from databases such as Bloomberg. Source: Bloomberg, Kim [35].
The main hypothesis is that it is possible to construct an algorithm that estimates an appropriate combination of inputs (technical indicators) for non-linear forecasting tools, generating lower error rates when forecasting the direction of stock movements than using all the available technical indicators directly. Another hypothesis is that, on average, such an algorithm generates lower error rates than completely random combinations of the input variables (technical indicators).
The proposed approach was implemented in Matlab using neural networks as classifiers. 100 initial configurations times 2500 iterations were carried out for each index, which translates into 250,000 neural networks per index and a total of 2,000,000 neural networks (8 indexes). The neural networks used were backpropagation classification neural networks with one hidden layer of 25 neurons, trained with the Levenberg-Marquardt rule. The number of neurons was chosen as a result of the preliminary sensitivity analysis on the number of neurons presented in Section 3.2.

Results
Prior to testing the proposed strategy, a preliminary sensitivity analysis was carried out to find the most convenient number of neurons in the hidden layer. The full set of 35 technical indicators was used, and the number of neurons in the hidden layer was increased from 25 to 25,000 in steps of 25 neurons. Simply increasing the number of neurons did not appear to increase the accuracy with which the neural networks classified the performance of the stock index in the following period ($t + 1$) (see Figure 2) for any of the six indexes analyzed. On the other hand, Figure 3 shows an example of the evolution of the error rate (misclassification of up/down days) during the training of one of the neural networks. It can be seen that training improves the fit to the data up to a significant number of training iterations, suggesting that the NN is learning the ground truth function φ.

The indicator selection approach is a combinatorial approach that can be used to select an appropriate combination of variables in non-linear models, using techniques such as neural networks. In the example illustrated in this article the proposed approach was applied to six different Chinese stock indexes plus two world indexes. The error rate obtained using the algorithm in combination with neural networks was lower than the error rate obtained using neural networks directly with all the 35 available variables, see Table 3. The average improvement in the error rate (over all the considered indexes) was 9.1%. This is a very good result considering how difficult the task of predicting the movements of the market is and the substantial gains that can be realized even with a small improvement in forecasting. Moreover, in some of the indexes the improvement is quite high, in excess of 11% (A50, Shanghai Composite and SSE50).
A very interesting finding is that the baseline approach, that is, considering all the technical indicators available for forecasting, yields error rates greater than 50% in some cases (A50, CSI800, Shanghai Composite, SSE 50 and Euro Stoxx 50). This means that tossing a coin produces better results than using the full pool of indicators. The reason is that, as pointed out in Section 3.1, some of the technical indicators produce contradictory signals. The histogram indicating the frequency of appearance of the technical indicators in the output of the algorithm for the various indexes shown in Table 3 can be seen in Figure 4. It is evident that some of the technical indicators are picked more frequently than others, so they are more likely to provide better accuracy in the prediction of the price change direction. Nevertheless, this does not imply that using only these more relevant indicators will lead to better forecasting. As an example, the combination formed by the indicators with the highest frequency of occurrence in Table 3 and Figure 4, i.e., {2, 4, 12, 25, 34}, gives an average improvement of 2.3% over the baseline approach, which is considerably worse than that achieved by using the proposed combinations.

Table 4 shows the improvement along the iterations of the algorithm. The average improvement from the first iteration to the last was 7.1%. While this value clearly shows that the algorithm improves the initial combinations, a more rigorous test has also been carried out. A Wilcoxon test was performed comparing the distributions of error rates obtained in the first and last iterations for each index. The Wilcoxon test rejects the hypothesis, for all the indexes analyzed, that the medians of the error rates for the initial and final distributions are statistically the same (Table 5), suggesting that the iterative process does significantly improve accuracy.
The same approach was followed to compare the error rate using neural networks directly (with all 35 technical indicators) with the error rate obtained using the technical indicator selection approach (Table 6). The Wilcoxon test rejects the null hypothesis that the medians of the error rates obtained using these two methods are statistically equivalent, suggesting that the proposed method significantly improved the forecasting accuracy of up/down stock index movements.

Remark 2 has also been taken into account, and the algorithm has also been run with M = 2 and a total of 125,000 iterations, which should be roughly equivalent to the parameters used previously, that is, 100 combinations and 2500 iterations. The average improvement in this case was 8.7%, which is marginally worse than that previously achieved. The difference could be due to the lower diversity of the set of candidate solutions, but also to the randomized nature of the strategy.
The average total time per index (100 initial configurations times 2500 iterations) was 157,691 s. The calculation time for each index can be seen in Table 7. The calculations were carried out in Matlab 2016 on an Intel i5-3470, 3.2 GHz, 64-bit computer. The selection approach requires a significant amount of computation time, but it is clearly faster than calculating all the possible combinations of technical indicators, which, for the example presented in this paper, is not a feasible calculation on a normal computer.
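The following Python sketch gives a rough sense of scale for this comparison. It uses only the figures reported above and makes one crude assumption that is not in the paper: that every neural network takes the same average time to train regardless of the number of inputs.

```python
SECONDS_PER_INDEX = 157_691       # average total time per index (reported)
NETWORKS_PER_INDEX = 100 * 2500   # initial configurations times iterations
N_INDICATORS = 35

# Average time per trained network and total time per index in hours.
sec_per_network = SECONDS_PER_INDEX / NETWORKS_PER_INDEX
hours_per_index = SECONDS_PER_INDEX / 3600

# Exhaustive search: every non-empty subset of the 35 indicators,
# assuming (crudely) the same average time per network.
all_combos = 2**N_INDICATORS - 1
years_exhaustive = all_combos * sec_per_network / (3600 * 24 * 365)

print(round(sec_per_network, 2), round(hours_per_index, 1),
      round(years_exhaustive))
```

Under this assumption the selection approach takes about 0.63 s per network and roughly 44 hours per index, whereas an exhaustive evaluation of all combinations would take on the order of several hundred years for a single index.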

Discussion
The proposed method can be a feasible approach when trying to determine a combination of variables or features to be used when forecasting the behaviour of non-linear processes. In the particular example of the stock market there is a very large number of technical indicators that are intended to give the investor some indication of the future performance of the stock. These indicators can generate contradictory signals, and selecting the appropriate combination of technical indicators can become a difficult task. Reducing the dimensionality of the problem is also important to avoid issues such as local minima, which can cause poor generalization when applying techniques such as neural networks. We showed in this paper that it is possible to use our approach in the Chinese stock market (generating an appropriate combination of independent variables for non-linear models), obtaining better results than directly applying neural networks to all the available independent variables. This was tested using 6 Chinese stock indexes (as well as two international indexes) and 35 technical indicators.
There was an average 9.1% improvement when using the combinatorial approach with neural networks over the results obtained using all the technical indicators directly with neural networks as the non-linear forecasting technique. The formal statistical analysis comparing the results using neural networks directly (all technical indicators) with the results from the combinatorial approach using neural networks shows that there are statistically significant differences in the error rates obtained at the 1%, 5% and 10% significance levels, supporting once more the hypothesis that the combinatorial approach using neural networks is a more appropriate tool for forecasting the direction of the stock market movement, at least for the 8 indexes analyzed, than using neural networks directly. Thus, for eight different indexes, better combinations of the technical indicators have been found, offering a practical choice to improve the forecasting accuracy and hence the expected benefits. The average total calculation time per index (100 initial configurations times 2500 iterations) was 157,691 s. While this is a substantial amount of time, it is a calculation that can be done with a normal laptop computer. Moreover, many of the operations of the proposed algorithm can be done in parallel, further shortening the computation times.
While direct comparison is challenging, the combinatorial approach with neural networks seems to generate better results for stock forecasting purposes than other approaches used in the existing literature, such as the Box-Jenkins approach used by Groda and Vrbka [3], which the authors considered not suitable. A more comparable paper is Kim and Han [36], which achieved a hit rate of 61% in the Korean market using a genetic algorithm in combination with neural networks, which is comparable to the 59% rate that we obtained in the Chinese market. Nevertheless, comparisons across different stock markets should be taken with caution. For instance, it would be naïve to believe that the same approach would generate the same results in two markets as different as South Korea and China, with South Korea being an open stock market dominated by institutional investors while the Chinese market is dominated by local retail investors.
The selection approach was illustrated in the context of the stock market and using neural networks, but the approach is easily applicable to other fields. This is increasingly important as the amount of data available in many fields has increased substantially over the last few decades, with an ever increasing need for tools to process large databases. Besides neural networks, other non-linear models, such as support vector machines, can be used with the proposed approach. This could be an interesting area of future work.

Conclusions
The proposed combinatorial method for variable selection is applicable to the problem of forecasting the direction of stock market movements using non-linear techniques such as neural networks. This approach generates better results than directly applying non-linear forecasting methods such as neural networks using all the available variables. For a large number of technical indicators (independent variables) it is clearly not possible to estimate the forecasts for all the combinations, and the proposed method provides a reasonable alternative. In fact, it has been shown that using all the available indicators can be counterproductive, as it can yield a higher error than a purely random approach. Another relevant result is that better indicator choices have been identified for 8 different indexes. The calculation time for the combinatorial approach is another factor to take into account, as it is computationally demanding, but clearly more efficient, from a calculation time point of view, than estimating the forecasts for all the possible combinations.
The combinatorial approach was tested thoroughly for the Chinese stock market as well as for some indexes describing the US and European stock markets. The forecasting accuracy of this approach in other markets might be different, and this is a potential area of future work. There are several factors that could potentially impact the forecasting accuracy of this approach. For instance, narrow and deep markets might behave differently, and hence the combinatorial approach might also have rather different forecasting accuracy in each.
Overall, when there are many potential variables (technical indicators) that can affect the performance of the stock market and no strong fundamental support for choosing a specific combination of these variables, the proposed approach can be an appropriate alternative. Similarly, while the approach was tested using neural networks it can be easily applied to other non-linear forecasting techniques such as support vector machines. It can also be generalized to other forecasting problems besides stocks. It is in that sense a rather general approach with many potential applications.