Training Multilayer Perceptron with Genetic Algorithms and Particle Swarm Optimization for Modeling Stock Price Index Prediction

Predicting stock market (SM) trends is an issue of great interest among researchers, investors and traders, since successfully predicting the direction of SMs may promise various benefits. Because of the highly nonlinear nature of the historical data, accurate estimation of the SM direction is a rather challenging problem. The aim of this study is to present a novel machine learning (ML) model to forecast the movement of the Borsa Istanbul (BIST) 100 index. Modeling was performed with multilayer perceptron–genetic algorithms (MLP–GA) and multilayer perceptron–particle swarm optimization (MLP–PSO) in two scenarios, with Tanh (x) and with the default Gaussian function as the output function. The historical financial time series data utilized in this research span 1996 to 2020 and consist of nine technical indicators. Results are assessed using Root Mean Square Error (RMSE), Mean Absolute Percentage Error (MAPE) and correlation coefficient values to compare the accuracy and performance of the developed models. Based on the results, using Tanh (x) as the output function significantly improved the accuracy of the models compared with the default Gaussian function. MLP–PSO with population size 125, followed by MLP–GA with population size 50, provided the highest accuracy in testing, with RMSE of 0.732583 and 0.733063, MAPE of 28.16% and 29.09%, and correlation coefficients of 0.694 and 0.695, respectively. According to the results, using the hybrid ML method could successfully improve the prediction accuracy.


Introduction
Accurately predicting the direction of a stock market (SM) index has frequently been a topic of great interest for many researchers, economists, traders and financial analysts [1]. Nonetheless, the SM field is neither static nor easily predictable. In fact, SM trends are sensitive to both external and internal drivers, so SM index movement estimation can be categorized as a complex-systems problem [2]. Stock price movement is often interpreted as the direction of the stock price and used for prediction. The rest of this research is organized as follows: the next section reviews the relevant literature. Section 3 deals with the research methodology. The results are given in Section 4, while the findings are discussed in Section 5. The conclusions are presented in the final section.

Literature Review
In recent years, a great number of papers have investigated the direction of next-day SM trends. Academicians and traders have made enormous efforts to forecast the next-day trend of SM indices in order to translate the predictions into profits (Kara et al., 2011). In this section, we review the methods and technical indicators utilized for forecasting the direction of movement of stock indices. As shown in Table 1, an ANN model was used in some of the studies [18,25,29,30], whilst hybrid models were preferred in other studies [17,21,31–33], as displayed in Table 2. In Table 1, the notable algorithms are the back-propagation neural network (BPNN), independent component analysis-BPNN (ICA-BPNN), Naive Bayes (NB), and the k-nearest neighbors algorithm (k-NN). Table 1 summarizes several machine learning methods proposed for stock exchange index direction prediction. The most popular methods are RF, SVM and ANN, followed by k-NN and NB, with dissimilar accuracy outcomes. The state of the art shows a research gap in using hybrid models. Table 2 summarizes the notable hybrid machine learning models. Hybrid models of the SVM trained with simple evolutionary algorithms such as GA have been the most popular. The state of the art of hybrid models shows a research gap in using more sophisticated machine learning models trained with advanced soft computing techniques. Table 1. Stock market index direction forecasting with machine learning considering comparative analysis involving ANN-based methods.
[21] GMM-SVM, Indonesia, ASII.JK (2000-2017): the GMM-SVM model was found to be superior to the other models.

Technical Indicators
As mentioned above, technical indicators have been useful and effective financial instruments for estimating the direction of a stock price index for years. The technical indicators used for SM direction prediction, from past to present, can be seen in Table 3.
In summary, MACD, %K, %D, RSI, %R, A/D, MOM, EMA, CCI, OSCP and SMA are the technical indicators most frequently preferred by researchers. Furthermore, ANN and its extensions (MLP, PNN, etc.) are the most widely used methods. As far as the authors know, MLP-GA and MLP-PSO methodologies with and without Tanh (x) as the output function have not previously been proposed for stock exchange movement prediction for any stock exchange in the literature. As a result, it is anticipated that this paper will constitute a significant contribution to the related field. Representative entries from Table 3 (author/s and technical indicators):
[29] MACD, SMA, %R, CCI, A/D oscillator, %D, weighted moving average (WMA), RSI, MOM, %K.
[24] %D, %K, RSI, MOM, MACD, WMA, %R, A/D oscillator, SMA, CCI.
[35] SMA, MACD, RSI, OSCP, MOM, volume.
[18] MACD, RSI, %D, SMA, Bollinger band, MOM, %R.

Data
As shown in Table 4, nine technical indicators for each trading day were utilized as input data. Many investors and traders rely on particular technical indicators, and a great number of such indicators are available. As already mentioned above, technical indicators have often been considered as input variables in the construction of forecasting systems for estimating the trend of movement of an SM index [24]. Accordingly, we selected nine technical indicators based on previous studies and the opinions of domain experts.

Technical Indicators Abbreviation Formulas
Simple n (10 here)-day moving average: SMA = (C_t + C_{t-1} + ··· + C_{t-n+1}) / n.
Notation: C_t is the closing price, L_t the low price and H_t the high price at time t; DIFF = EMA(12)_t - EMA(26)_t; EMA is the exponential moving average, EMA_t = a × x_t + (1 - a) × EMA_{t-1}, with smoothing factor a = 2/(1 + k), where k is the time period of the k-day exponential moving average; LL_t and HH_t denote the lowest low and the highest high in the last t days, respectively; M_t = (H_t + L_t + C_t)/3. The input variables utilized in this work are the technical indicators described in Table 4 and the direction of change in the daily Borsa Istanbul (BIST 100) SM index. The entire dataset covers the period from 28 March 1996 to 7 February 2020, providing a total of 5968 trading-day observations. Furthermore, opening and closing prices are available for each trading day. The number of observations with decreasing direction is 2827 (47.36%), whereas the number with increasing direction is 3141 (52.64%). All the data were obtained from Matriks Information Delivery Services Inc. (https://www.matriksdata.com).
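The moving-average definitions above can be computed in a few lines. The following is an illustrative Python sketch (the study itself used MATLAB); the `prices` array is made-up sample data, not the BIST 100 series:

```python
import numpy as np

def sma(close, n=10):
    """Simple n-day moving average: mean of each window of n closing prices."""
    return np.convolve(close, np.ones(n) / n, mode="valid")

def ema(close, k=10):
    """Exponential moving average with smoothing factor a = 2 / (1 + k)."""
    a = 2.0 / (1 + k)
    out = np.empty(len(close), dtype=float)
    out[0] = close[0]  # seed the recursion with the first price
    for t in range(1, len(close)):
        out[t] = a * close[t] + (1 - a) * out[t - 1]
    return out

def momentum(close, n=10):
    """n-day momentum: C_t - C_{t-n}."""
    return close[n:] - close[:-n]

prices = np.array([10.0, 10.5, 10.2, 10.8, 11.0, 10.9,
                   11.3, 11.1, 11.6, 11.4, 11.8])
print(sma(prices, 10))       # two 10-day averages from 11 observations
print(momentum(prices, 10))  # one 10-day momentum value
```

The other Table 4 indicators (%K, %D, RSI, etc.) follow the same pattern of rolling-window arithmetic over the close, high and low series.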

Multilayer Perceptron (MLP)
The architecture of an ANN is based on layers connected by nodes called neurons, analogous to the biological neurons of the brain [50]. Each path transmits a signal among neurons in a manner similar to that of synapses [51]. The MLP, as a feedforward ANN, contains three main parts: one input layer, one or more hidden layers and one output layer, and can be successfully employed for prediction, classification, signal processing and error filtering [52]. Each node applies one nonlinear function. The MLP employs the backpropagation learning algorithm for the training process [53,54]. The MLP, as a popular and frequently used technique, was employed here to predict the direction value and was developed using MATLAB software. Figure 1 shows the architecture of the developed network. Initially, the data were randomly divided into a training set (with a share of 80%) and a testing set (with a share of 20%). The first step of the training process was to find the optimum number of neurons in the hidden layer. In each training run, the Mean Square Error (MSE) was computed as the performance function. Genetic algorithm (GA) and particle swarm optimization (PSO), as evolutionary algorithms, were then employed to train the neural network.
This hybridization of the ANN has several advantages, such as increasing the accuracy of the ANN by updating the weight and bias values using GA and PSO [55,56]. The aim of this study is to estimate the weights of the hidden and output layers of the ANN architecture using GA and PSO through a convergent and accurate estimation process, and, on the other hand, to control the deviation from the target point in such a way that large errors are prevented even across different runs. Because of the random selection of samples, the network would otherwise need repeated runs to arrive at an answer with appropriate accuracy; the stability and reliability of the neural network are therefore supported through GA and PSO.
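The setup described above (nine indicator inputs, an 80/20 random split, a hidden layer with a nonlinear activation, Tanh on the output in scenario one, and MSE as the performance function) can be sketched as follows. This is a minimal illustrative Python version on synthetic data, not the authors' MATLAB implementation, and the layer sizes are placeholders:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for the 9 technical-indicator inputs and the direction target.
X = rng.standard_normal((100, 9))
y = rng.uniform(-1, 1, size=(100, 1))

# 80/20 random split, as described for the training phase.
idx = rng.permutation(len(X))
cut = int(0.8 * len(X))
train, test = idx[:cut], idx[cut:]

def init_mlp(n_in=9, n_hidden=10, n_out=1):
    """Random weights and biases for a one-hidden-layer MLP."""
    return {
        "W1": rng.standard_normal((n_in, n_hidden)) * 0.1, "b1": np.zeros(n_hidden),
        "W2": rng.standard_normal((n_hidden, n_out)) * 0.1, "b2": np.zeros(n_out),
    }

def forward(p, X):
    """Hidden layer with tanh, and tanh on the output as in scenario one."""
    h = np.tanh(X @ p["W1"] + p["b1"])
    return np.tanh(h @ p["W2"] + p["b2"])

def mse(p, X, y):
    """Mean Square Error, the performance function used during training."""
    return float(np.mean((forward(p, X) - y) ** 2))

params = init_mlp()
print(mse(params, X[train], y[train]))
```

The GA and PSO described in the next sections would then search over the entries of `W1`, `b1`, `W2` and `b2` to minimize this MSE.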

Genetic Algorithm (GA)
The GA is an approximation technique in computer science for estimating a proper solution to optimization problems. It is a type of evolutionary algorithm (EA) that employs biological mechanisms such as heredity and mutation [57,58]. The algorithm begins with a completely random population and proceeds through generations. In each generation, the fitness of the whole population is assessed, several individuals are selected stochastically from the current generation (based on their fitness) and modified (mutated or recombined) to form a new generation, which becomes the current generation in the next iteration of the algorithm [59-61].
The optimization process in the genetic algorithm is a randomly guided search based on Darwin's theory of gradual evolution. In this method, a fixed-size set of candidate solutions, the so-called population, is randomly generated, and a fitness value is assigned to each member of the population [62-64]. This process is repeated for each member; the next generation is then formed by applying the genetic operators, including mutation and selection, and the procedure continues until the convergence criterion is met [59,65]. Three stopping criteria are common: the algorithm execution time, the number of generations created and the convergence of the error criterion. The process of implementing the GA, which is the basis of evolutionary algorithms, is presented in Figure 2, adapted and regenerated from [66]. The main components of the genetic algorithm are: representation of the environment, the evaluation function, the population (set of candidate solutions), the parent-selection process, the variation (diversity) operators, the survivor-selection process (choosing the best individuals to build the next generation) and the stopping condition. The genetic representation determines how each individual is encoded and behaves. Differences in representation are one of the criteria distinguishing the various methods of evolutionary computation. The genetic algorithm uses a linear binary representation, most commonly an array of bits, although arrays of other data types can also be used; their constant size facilitates the crossover operation [61,67,68]. However, it is also possible to use variable-length structures, which makes the implementation of crossover considerably more complex.
In this research, the genetic algorithm was utilized to find the optimal point of complex nonlinear functions in combination with the artificial neural network: the GA optimizes the neural network's weight and bias values. In effect, the objective function of the genetic algorithm is a function of the statistical results of the MLP. To train each generation's population of size P, the MLP was randomly initialized and the error rate was calculated on the training data. In the next step, the network parameters were updated according to the input and output values. The training process was repeated until the network improved, taking the new population into account. In the last step, the output of the network was compared with the actual values, and model execution finished by minimizing the difference between the two. Figure 3 presents the flowchart of the MLP-GA algorithm, and Table 5 presents the setting parameters for the GA.
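The MLP-GA loop described above can be sketched as follows. This is a hedged Python illustration on synthetic data: the GA evolves the MLP's flattened weight/bias vector through selection, uniform crossover, mutation and elitism. The tiny network, population size and mutation rate are illustrative choices, not the paper's Table 5 settings:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy data and a tiny MLP whose flattened weight vector the GA evolves.
X = rng.standard_normal((60, 3))
y = np.tanh(X @ np.array([[0.5], [-0.3], [0.8]]))  # synthetic target

N_HID = 4
DIM = 3 * N_HID + N_HID + N_HID + 1  # W1, b1, W2, b2 flattened (21 here)

def unpack(v):
    # Index boundaries below assume N_HID = 4.
    W1 = v[:12].reshape(3, N_HID); b1 = v[12:16]
    W2 = v[16:20].reshape(N_HID, 1); b2 = v[20:]
    return W1, b1, W2, b2

def fitness(v):
    """GA objective: MSE of the MLP defined by weight vector v."""
    W1, b1, W2, b2 = unpack(v)
    pred = np.tanh(np.tanh(X @ W1 + b1) @ W2 + b2)
    return float(np.mean((pred - y) ** 2))

POP, GEN, MUT = 50, 100, 0.1
pop = rng.standard_normal((POP, DIM))
for _ in range(GEN):
    scores = np.array([fitness(v) for v in pop])
    order = np.argsort(scores)
    elite = pop[order[: POP // 2]]                 # selection: keep the fitter half
    pa = elite[rng.integers(len(elite), size=POP)]
    pb = elite[rng.integers(len(elite), size=POP)]
    mask = rng.random((POP, DIM)) < 0.5
    pop = np.where(mask, pa, pb)                   # uniform crossover
    pop = pop + MUT * rng.standard_normal((POP, DIM)) * (rng.random((POP, DIM)) < 0.2)  # mutation
    pop[0] = elite[0]                              # elitism: carry the best forward

best = min(pop, key=fitness)
print(f"final MSE: {fitness(best):.4f}")
```

Replacing the toy `fitness` with the real MLP's training MSE yields the MLP-GA scheme of Figure 3.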

Particle Swarm Optimization (PSO)
PSO is a popular and robust optimization method for problems in n-dimensional space. It is a population-based search algorithm modeled on the social behavior of bird flocks, originally inspired by the way birds in a flock change their paths simultaneously and regroup. In PSO, particles move through the search space influenced by their own experience and the knowledge of their neighbors; thus, the positions of the other particles affect how a given particle searches. The result of this behavior is that particles converge towards successful regions: they follow each other, move towards their best neighbors and are regulated throughout their neighborhood [69-72].
At the beginning, a group of particles is generated to search for the best solution. In each step, every particle is updated using the finest position it has found itself (pbest) and the finest position ever obtained by the whole particle population (gbest), as presented in Figure 4, adapted from [73-77]. After these best values are found, the velocity and position of each particle are updated using Equations (1) and (2):

v(t + 1) = v(t) + c1 × rand(t) × (pbest(t) − position(t)) + c2 × rand(t) × (gbest(t) − position(t))  (1)
position(t + 1) = position(t) + v(t + 1)  (2)
The right side of Equation (1) has three parts: the current particle velocity v(t); the second part, c1 × rand(t) × (pbest(t) − position(t)); and the third part, c2 × rand(t) × (gbest(t) − position(t)). The latter two govern the rate of change of the particle velocity and its direction towards the best personal experience (nostalgia) and the finest experience of the group (collective intelligence), respectively. If the first part v(t) is omitted from this equation, the velocity of the particles is determined only by the current position and the best experiences, and in practice the effect of the current velocity and its inertia is eliminated. Accordingly, the best particle in the group stays in place while the others move toward it; without the first part of Equation (1), the swarm's movement becomes a process in which the search space gradually shrinks and a local search forms around the best particle. The parameters c1 and c2 (with values of about 2) determine the importance and weight of collective intelligence and nostalgia [74-76].
As for the stopping condition, the following options are available:
• a fixed number of iterations;
• reaching a satisfactory fitness threshold;
• a number of iterations without improvement in fitness (for example, the fitness remaining constant over 10 consecutive iterations);
• the aggregation density around the optimal point.
One advantage of PSO over GA is its simplicity and its small number of parameters. Selecting good values for the cognitive and social components accelerates the algorithm and prevents premature convergence at local optima. In PSO optimization of a neural network, the decision variables comprise the network weights and biases. The process is as follows: first, N position vectors Xi are generated randomly, where N equals the number of particles in the swarm. The neural network is executed with its parameters set from each vector, and the error obtained from each run is taken as the fitness of that vector. This process is repeated until final convergence is achieved, namely reaching the optimal position vector (the optimal weight and bias values) that minimizes the training error. The objective function of this optimization is therefore the forecast error to be minimized [78-80]. Table 6 presents the setting values of the MLP-PSO.
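The PSO procedure above can be sketched compactly. This is an illustrative Python version that minimizes a stand-in sphere objective rather than the MLP training error; the inertia weight w is a common refinement assumed here for stable convergence (not explicit in Equation (1)), and all parameter values are illustrative rather than the paper's Table 6 settings:

```python
import numpy as np

rng = np.random.default_rng(2)

def sphere(x):
    """Stand-in objective; in the paper this would be the MLP training error."""
    return float(np.sum(x ** 2))

def pso(f, dim=10, n_particles=30, iters=200, c1=2.0, c2=2.0, w=0.7):
    """Minimal PSO following Equations (1) and (2): velocity update, then position update."""
    pos = rng.uniform(-5, 5, (n_particles, dim))
    vel = np.zeros((n_particles, dim))
    pbest = pos.copy()
    pbest_val = np.array([f(p) for p in pos])
    gbest = pbest[pbest_val.argmin()].copy()
    for _ in range(iters):
        r1 = rng.random((n_particles, dim))
        r2 = rng.random((n_particles, dim))
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)  # Eq. (1)
        pos = pos + vel                                                    # Eq. (2)
        vals = np.array([f(p) for p in pos])
        improved = vals < pbest_val
        pbest[improved], pbest_val[improved] = pos[improved], vals[improved]
        gbest = pbest[pbest_val.argmin()].copy()
    return gbest, f(gbest)

best, err = pso(sphere)
print(f"best error: {err:.6f}")
```

Substituting the MLP's training MSE for `sphere`, with `dim` equal to the number of weights and biases, yields the MLP-PSO scheme described above.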

Training Phase
The training process comprises two main steps: first, selecting the best architecture of the ANN; second, integrating the MLP with the optimizers. Accordingly, training was performed with 10 to 19 neurons in the first hidden layer and 2 neurons in the second hidden layer, using 80 percent of the total data, according to Table 7; these MLP models are referred to as Models 1-6. After this process, the MLP was integrated with GA using population sizes 50, 100 and 150 (Models 7-9, respectively) and with PSO using particle sizes 50, 75, 100 and 125 (Models 10-13, respectively). Training was performed under two scenarios: with Tanh (x) as the output function and with the Gaussian function as the default. Table 8 presents the evaluation criteria, which compare predicted and target values. These metrics indicate the performance of the models in predicting the target values: they compare the model outputs with the target values to produce a measure of accuracy [51,56,81]. Table 8. Model evaluation metrics.

Accuracy and Performance Index Description
N: number of data; X: target value; Y: output value.
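The Table 8 metrics can be written directly from their standard definitions. A minimal Python sketch with made-up target/output vectors (the exact formulas in Table 8 are not reproduced here, so these are the conventional forms of RMSE, MAPE and the Pearson correlation coefficient):

```python
import numpy as np

def rmse(x, y):
    """Root Mean Square Error between target x and output y."""
    return float(np.sqrt(np.mean((x - y) ** 2)))

def mape(x, y):
    """Mean Absolute Percentage Error, in percent (targets must be non-zero)."""
    return float(np.mean(np.abs((x - y) / x)) * 100)

def corr(x, y):
    """Pearson correlation coefficient between target and output."""
    return float(np.corrcoef(x, y)[0, 1])

x = np.array([1.0, 2.0, 3.0, 4.0])  # made-up targets
y = np.array([1.1, 1.9, 3.2, 3.8])  # made-up model outputs
print(rmse(x, y), mape(x, y), corr(x, y))
```

Lower RMSE and MAPE and a higher correlation coefficient indicate a better model, which is how Tables 9-12 are read below.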

Results
Training was performed with 80% of the total data. Results were evaluated in terms of the correlation coefficient, MAPE and RMSE, as shown in Tables 9 and 10. Table 9 presents the results of the training step without Tanh (x) (i.e., using the default Gaussian function) as the output function, and Table 10 gives the results of the training step with Tanh (x) as the output function. As is clear from Tables 9 and 10, Model 13 provides higher accuracy than the other models. It is also clear that using Tanh (x) as the output function of the MLP increases the prediction accuracy. According to the results, the hybrid methods increase the processing time (s) compared with the single methods, as also reported by Mosavi et al., 2019 [82] and Ardabili et al., 2019 [83]. The main reason is the optimization process over the MLP's weight and bias values, which consumes additional processing time (s). On the other hand, Tables 9 and 10 show that the GA requires more processing time than the PSO, which can be attributed to the greater complexity of the optimizer [84,85]. Using Tanh (x) reduced the processing time.

Testing Results
Tables 11 and 12 present the testing results for the Gaussian function and for Tanh (x), respectively. As is clear, the testing results differ from the training results. According to the testing results, Model 7 followed by Model 13, in both scenarios (with Tanh (x) and with the Gaussian function), provides higher accuracy and lower error than the other models. The presence of Tanh (x) as the output function clearly increases the accuracy and reduces the error values. Figure 6 presents the deviation from the target values of all models in the two scenarios, with the default Gaussian function and with Tanh (x) as the output function. According to Figure 6, the presence of Tanh (x) as the output function reduces the range of deviation, and models with high accuracy have lower deviation values than the others. Figure 7 shows a simple yet informative Taylor diagram for the testing process of the developed models. This diagram is constructed from the correlation coefficient and the standard deviation; a point with a lower standard deviation and a higher correlation coefficient indicates higher accuracy. As is clear from Figure 7, Model 13 and Model 7 present higher accuracy than the other models.
According to the results, the advantage of the single models is their lower processing time, but their lower accuracy is their most important limitation and disadvantage compared with the hybrid models, as several studies have also reported. Among the hybrid models, the advantages of MLP-PSO, namely higher accuracy and lower processing time, allow it to overtake MLP-GA.

Conclusions
In this paper, modeling was performed by MLP-GA and MLP-PSO in two scenarios, with Tanh (x) and with the default Gaussian function as the output function, across thirteen model categories. Research outcomes were evaluated using RMSE, MAPE and correlation coefficient values to compare the accuracy and performance of the developed models in the training and testing steps. Based on the results, using Tanh (x) as the output function significantly improved the accuracy of the models. MLP-PSO with population size 125, followed by MLP-GA with population size 50, provided the highest accuracy in the testing step, with RMSE of 0.732583 and 0.733063, MAPE of 28.16% and 29.09%, and correlation coefficients of 0.694 and 0.695, respectively. The only advantage of the single MLP is its lower processing time, while its important disadvantage is its lower accuracy compared with the hybrid models. According to the results, using the hybrid ML method could successfully improve the prediction accuracy. Accordingly, MLP-PSO, with lower processing time and higher accuracy (the main advantages of PSO compared with GA), overtakes MLP-GA. In this way, the problem statements were successfully addressed by the solution presented in this study. The main direction for future work is beating a new stock market index using evolutionary methods; future work will address beating the stock market while successfully accounting for the variance of returns. Thus, the return variance poses a limitation of the present research.
Acknowledgments: The support of the Alexander von Humboldt Foundation is acknowledged.

Conflicts of Interest:
The authors declare no conflict of interest.
