Article

Genetic Feature Selection Applied to KOSPI and Cryptocurrency Price Prediction

Department of Computer Science, Kwangwoon University, 20 Kwangwoon-ro, Nowon-gu, Seoul 01897, Korea
*
Author to whom correspondence should be addressed.
Mathematics 2021, 9(20), 2574; https://doi.org/10.3390/math9202574
Submission received: 8 September 2021 / Revised: 1 October 2021 / Accepted: 4 October 2021 / Published: 14 October 2021
(This article belongs to the Special Issue Swarm and Evolutionary Computation—Bridging Theory and Practice)

Abstract

Feature selection reduces the dimension of input variables by eliminating irrelevant features. We propose feature selection techniques based on a genetic algorithm, a metaheuristic inspired by the natural selection process. We compare two types of feature selection for predicting a stock market index and cryptocurrency prices. The first method is a newly devised genetic filter, whose fitness function is designed to increase the relevance between the target and the selected features and to decrease the redundancy among the selected features. The second method is a genetic wrapper, which finds better feature subsets related to the KOSPI by exploring the solution space more thoroughly. Both genetic feature selection methods improved the predictive performance of various regression functions. Our best model was applied to predict the KOSPI, cryptocurrency prices, and their respective trends after COVID-19.

1. Introduction

When using multidimensional data in the real world, the number of cases required to find the best feature subsets increases exponentially. The problem of finding a globally optimal feature subset is NP-hard [1]. Rather than finding a global optimum by exploring the entire solution space, heuristic search techniques [2] are used to find a reasonable solution within a constrained time frame. In stock markets, a specific index is related to a number of other economic indicators; however, it is difficult to predict a stock index, which tends to be non-linear, uncertain, and irregular. There are two mainstream approaches to predicting a stock index: improving feature selection techniques and improving the regression models used for prediction. We take the former approach and predict the stock market index using various machine learning methods. This study is a new attempt to predict the KOSPI using various external variables rather than internal time series data. The predictive performance was improved through feature selection that chooses meaningful variables among many external variables. We propose two new feature selection techniques using a genetic algorithm [3,4], which is a metaheuristic [5] method. The first technique is a genetic filter [6,7], and the second is a genetic wrapper [8,9]. In our genetic filter, a new fitness function was applied to overcome the disadvantages of traditional filter-based feature selection. In addition, our genetic wrapper can find better feature subsets by exploring the solution space more thoroughly. The remainder of the paper is organized as follows. The background is explained in Section 2. In Section 3, the operation and structure of our genetic algorithm for feature selection are introduced. Section 4 contains the results of KOSPI prediction using feature selection techniques with various machine learning methods. In addition, our best model was applied to predict the KOSPI, cryptocurrency prices, and their respective trends after COVID-19. Our conclusions are presented in Section 5.

2. Related Work

2.1. Feature Selection

Machine learning algorithms can be constructed using either linear or non-linear models. Because the performance of machine learning is highly dependent on the quantity and quality of data, the ideal input data contain information that is neither excessive nor insufficient. Moreover, high-dimensional data may contain redundant or irrelevant features. Thus, the latent space that effectively explains the target variable may be smaller than the original input space. Dimensionality reduction transforms data from a high-dimensional space into a low-dimensional space so that the low-dimensional representation retains the important properties of the original data; it finds such a latent space by compressing the original data or removing noisy data. Feature selection [10] is a representative method for reducing the dimension of data. Filter methods use a simple but fast scoring function to select features, whereas wrapper methods use a predictive model to score each feature subset. Filter-based feature selection is suitable for ranking features by how relevant each feature is, rather than for deriving the best feature subset for the target data. Although filter-based feature selection is computationally efficient compared to wrapper methods, it may select redundant features because it does not consider the relationships between the selected features. In contrast, wrapper-based feature selection selects the feature subset that shows the best predictive accuracy. It requires significant time to train and test a new model for each feature subset; nonetheless, it usually provides prominent feature sets for the particular learning model.

2.2. Genetic Algorithm

A genetic algorithm is a metaheuristic technique for global optimization that explores the solution space by imitating the evolutionary process of living things in the natural world. It is widely used for solving non-linear or intractable complex problems in fields such as engineering and natural science [11,12,13,14]. To find the optimal solution with a genetic algorithm, two things must be defined: the solution of the problem should be expressed in the form of a chromosome, and a fitness function has to be derived to evaluate each chromosome. This series of processes is analogous to evaluating how well an individual adapts to its environment. Each generation consists of a population, which can be regarded as a set of chromosomes. Selection is performed based on the fitness of each chromosome, and then crossover, mutation, and replacement are performed. By repeating this process, the generated solutions improve, and the solution space is searched until specific stopping conditions are satisfied.

2.3. Stock Index Prediction

There have been various methods and frameworks for analyzing stock indices. Among them are portfolio theory [15] and the efficient market hypothesis [16], which are based on rational expectations theory and the assumption that economic agents are rational. On the contrary, studies of stock indices using behavioral finance theory [17] also exist. Many studies have attempted to analyze stock indices by combining data mining [18] with the above viewpoints. Tsai et al. [19] used optimized feature selection through a combination of a genetic algorithm, principal component analysis, and decision trees, and predicted stock prices using neural networks. Längkvist et al. [20] proposed a method that applies deep learning to multivariate time series data including stock indices, social media, transaction volume, market conditions, and political and economic factors. Zhang et al. [21] proposed a model that performs feature selection using minimum redundancy maximum relevance [22,23] on stock index data. Naik et al. [24] improved the performance of stock index prediction using the Boruta feature selection algorithm [25] with an artificial neural network [26]. Yuan et al. [27] compared the performance of stock index prediction models such as the support vector machine (SVM) [28], random forest [29], and artificial neural network. Hu et al. [30] improved the performance of stock index prediction by improving Harris hawks optimization.

3. Genetic Algorithm for Feature Selection

3.1. Encoding and Fitness

The initial task when using a genetic algorithm is to design an encoding scheme and a fitness function. The solution of the genetic algorithm is expressed in the form of a chromosome through an appropriate data structure, which is called encoding. In this study, encoding was conducted with a binary bit string, which indicates whether each feature is included or not. In the first experiment, a 264-bit string was used as a chromosome to predict the KOSPI, and in the second experiment, a 268-bit string was used to predict a cryptocurrency price. In a genetic algorithm, fitness is measured to evaluate how well an encoded chromosome solves the problem. The fitness is obtained from the implemented fitness function, and we used different fitness functions for the genetic filter and the genetic wrapper. The fitness of our genetic filter is a numerical value obtained by combining the correlations between the selected features, and the fitness of our genetic wrapper is the mean absolute error between the target values and the values predicted by a machine learning algorithm trained on the selected features.
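As an illustration, the sketch below shows one way such a bit-string encoding can be set up; the chromosome length and population size mirror the settings reported in Table 1, but the helper functions are our own and are not taken from the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(42)

N_FEATURES = 264   # one bit per candidate feature (KOSPI experiment)
POP_SIZE = 100     # population size (Table 1)

def random_chromosome(n_features: int) -> np.ndarray:
    """A chromosome is a binary mask: 1 = feature selected, 0 = feature excluded."""
    return rng.integers(0, 2, size=n_features, dtype=np.int8)

def selected_indices(chromosome: np.ndarray) -> np.ndarray:
    """Column indices of the features that the chromosome marks as selected."""
    return np.flatnonzero(chromosome)

population = [random_chromosome(N_FEATURES) for _ in range(POP_SIZE)]
print(selected_indices(population[0])[:10])  # first few selected feature indices
```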

3.2. Selection

Selection is the process of choosing the parent chromosomes that generate offspring chromosomes in each generation. In this study, we used roulette wheel selection based on fitness. We set the selection probability of each chromosome in proportion to its fitness and then selected chromosomes at random. This means that chromosomes with good fitness are more likely to be selected as parents, and chromosomes with relatively poor fitness are less likely to be selected.
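A minimal roulette-wheel selection sketch is given below; it assumes fitness values have already been computed, and it shifts them to be non-negative before converting them into selection probabilities (for the genetic wrapper, whose fitness is an error, the values would first be negated so that a lower error means a higher fitness).

```python
import numpy as np

rng = np.random.default_rng(0)

def roulette_wheel_select(population, fitnesses, n_parents):
    """Sample parent chromosomes with probability proportional to fitness."""
    fit = np.asarray(fitnesses, dtype=float)
    fit = fit - fit.min()                      # shift so all values are non-negative
    if fit.sum() == 0:                         # degenerate case: fall back to uniform selection
        probs = np.full(len(fit), 1.0 / len(fit))
    else:
        probs = fit / fit.sum()
    idx = rng.choice(len(population), size=n_parents, p=probs)
    return [population[i] for i in idx]
```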

3.3. Crossover

Crossover is an operation that generates the offspring of the next generation by crossing the parent chromosomes obtained through selection. There are several methods of crossover; in this study, multi-point crossover was implemented. Multi-point crossover is an extension of one-point crossover. One-point crossover randomly selects a point on the chromosomes and crosses them at that point, and multi-point crossover is similar but uses two or more points. A multi-point crossover with an even number of points has the effect of crossing circular chromosomes, because the first and last genes of the chromosome can be treated as adjacent. Because the degree of perturbation of multi-point crossover is larger than that of one-point crossover, a relatively wide solution space can be explored. However, strong perturbation may slow convergence, and multi-point crossover with an odd number of points may not maintain uniform traits of the selected chromosomes. In this study, we treated the chromosome as circular, since it is a list of features whose order carries no meaning. To increase the degree of perturbation moderately and to cross circular chromosomes effectively, we used two-point crossover.
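The following sketch shows a two-point crossover consistent with this description; the implementation is ours, not the authors'.

```python
import numpy as np

rng = np.random.default_rng(0)

def two_point_crossover(parent_a: np.ndarray, parent_b: np.ndarray):
    """Swap the gene segment between two random cut points.
    With an even number of points the chromosome effectively behaves like a ring,
    which suits a feature list whose order carries no meaning."""
    n = len(parent_a)
    p1, p2 = sorted(rng.choice(n, size=2, replace=False))
    child_a, child_b = parent_a.copy(), parent_b.copy()
    child_a[p1:p2], child_b[p1:p2] = parent_b[p1:p2], parent_a[p1:p2]
    return child_a, child_b
```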

3.4. Mutation and Replacement

Mutation is an operator that modifies genes of a chromosome to prevent premature convergence and increase the diversity of the population. A general mutation generates a random number between 0 and 1 for each gene on a chromosome; if the value is less than a threshold, the corresponding gene is modified. In this study, the mutation probability was set to 0.001. Replacement is an operator that replaces chromosomes of the existing population with the offspring chromosomes produced by crossover and mutation. We replaced existing chromosomes with offspring chromosomes, and we also applied elitism to carry the best chromosome of the previous population over to the next generation (Figure 1).
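A sketch of these two operators under the stated settings (mutation probability 0.001, elitism keeping the single best chromosome); as before, the function names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

MUTATION_RATE = 0.001  # per-gene mutation probability used in this study

def mutate(chromosome: np.ndarray, rate: float = MUTATION_RATE) -> np.ndarray:
    """Flip each bit independently with probability `rate`."""
    flip = rng.random(len(chromosome)) < rate
    child = chromosome.copy()
    child[flip] ^= 1
    return child

def replace_with_elitism(old_population, old_fitnesses, offspring):
    """Replace the population with the offspring while carrying over the best
    chromosome of the previous generation (elitism)."""
    best = old_population[int(np.argmax(old_fitnesses))]
    return [best] + offspring[: len(old_population) - 1]
```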

3.5. Genetic Filter

Filter-based feature selection [31,32,33] has the advantage of deriving feature subsets by identifying correlations between features within a relatively short time; however, it has the disadvantage that it may be difficult to quantify relevance and redundancy between selected features. In this study, a new fitness function was devised to emphasize the advantages and make up for the disadvantages. Equation (1) favors feature subsets that are highly correlated with the target variable and largely uncorrelated with each other.
$$\mathrm{fitness} = \sum_{i=1}^{n} f(S_{\mathrm{target}}, S_i) - \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} f(S_i, S_j) \qquad (1)$$
subject to $f(S_i, S_j) = IG(S_i, S_j) + F(S_i, S_j) + C(S_i, S_j)$, where $n$ corresponds to the total number of features, $S_{\mathrm{target}}$ is the target variable, and $IG$, $F$, and $C$ refer to the information gain, F-statistic, and Pearson correlation coefficient (PCC), respectively.
Moreover, fitness was obtained by combining the information gain, F-statistic, and PCC to derive various correlations of chromosomes. Specifically, to calculate the fitness of a chromosome, the sum of the information gain, F-statistic, and PCC between the target data and each selected feature $S_i$ was obtained. Another sum was also obtained for those between the selected features $S_i$ and $S_j$. Finally, the difference between the two summations was calculated to identify the fitness of each chromosome. Figure 2 shows the flow diagram of our genetic filter.
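A sketch of how this fitness could be evaluated for one chromosome with scikit-learn's scoring functions is shown below. The normalization of each term (Table 2) is omitted here, the absolute PCC is used, and mutual information stands in for the information gain, so this is an approximation of Equation (1) rather than the authors' exact implementation.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_regression, f_regression

def pairwise_score(a: np.ndarray, b: np.ndarray) -> float:
    """f(a, b) = information gain + F-statistic + Pearson correlation (cf. Equation (1))."""
    a2d = a.reshape(-1, 1)
    ig = mutual_info_regression(a2d, b)[0]          # mutual-information estimate
    f_stat = f_regression(a2d, b)[0][0]             # univariate F-statistic
    pcc = abs(np.corrcoef(a, b)[0, 1])              # absolute Pearson correlation
    return ig + f_stat + pcc

def filter_fitness(X: np.ndarray, y: np.ndarray, chromosome: np.ndarray) -> float:
    """Relevance of the selected features to the target minus their mutual redundancy."""
    idx = np.flatnonzero(chromosome)
    relevance = sum(pairwise_score(X[:, i], y) for i in idx)
    redundancy = sum(
        pairwise_score(X[:, i], X[:, j])
        for k, i in enumerate(idx)
        for j in idx[k + 1:]
    )
    return relevance - redundancy
```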

3.5.1. Mutual Information

Mutual information [34] provides a numerical value quantifying the relationship between two random variables. The mutual information of random variables $X$ and $Y$ is $I(X;Y)$, the probability that events $X$ and $Y$ occur simultaneously is $P(X,Y)$, and the pointwise mutual information (PMI) of the events $X$ and $Y$ is $PMI(X;Y)$. If the random variables are continuous, Equation (2) is satisfied.
$$I(X;Y) = \int_x \int_y P(x,y) \cdot PMI(x;y) \, dx \, dy, \qquad PMI(x;y) = \log \frac{p(x,y)}{p(x)\,p(y)} = \log \frac{p(x \mid y)}{p(x)} = \log \frac{p(y \mid x)}{p(y)} \qquad (2)$$
In other words, the mutual information of variables $X$ and $Y$ is the integral, over all cases belonging to $X$ and $Y$, of the $PMI$ multiplied by the joint probability. The $PMI$ is the value obtained by dividing the probability of two events occurring at the same time by the product of the probabilities of each event occurring individually. The closer the mutual information is to 0, the less related $X$ and $Y$ are to each other.

3.5.2. F-Test

Hypothesis testing methods for testing differences in sample variance can be divided into the chi-squared test and the F-test. The chi-squared test is applied when the population of a single sample follows a normal distribution and its variance is known in advance; because the population variance is generally not known in advance, the F-test is used when the population variance is unknown. The F-test is a statistical hypothesis test that determines whether or not the difference in variance between two samples is statistically significant. We endeavored to capture statistical significance between features by adding the F-statistic to the fitness of the genetic filter.

3.5.3. Pearson Correlation Coefficient

In statistics, the Pearson correlation coefficient [35] quantifies the correlation between two variables $X$ and $Y$. By the Cauchy–Schwarz inequality, it takes a value in [−1, 1]; a value near 0 indicates no correlation, a value near 1 indicates a positive linear correlation, and a value near −1 indicates a negative linear correlation.
$$r = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_i (x_i - \bar{x})^2} \, \sqrt{\sum_i (y_i - \bar{y})^2}} \qquad (3)$$
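For instance, the coefficient can be computed directly from this formula with NumPy (the sample values below are made up for illustration):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

r = ((x - x.mean()) * (y - y.mean())).sum() / np.sqrt(
    ((x - x.mean()) ** 2).sum() * ((y - y.mean()) ** 2).sum()
)
print(round(r, 4))                        # close to 1: strong positive linear correlation
print(round(np.corrcoef(x, y)[0, 1], 4))  # matches NumPy's built-in estimate
```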

3.6. Genetic Wrapper

While our genetic filter calculates fitness through the correlations between features, our genetic wrapper [36,37] uses machine learning models to evaluate the fitness of each chromosome. Therefore, its computational time is longer than that of the genetic filter; however, the genetic wrapper searches for an optimal feature subset tailored to a particular learning algorithm. We used three machine learning models for our genetic wrapper. Figure 3 shows the flow diagram of our genetic wrapper.
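Conceptually, the wrapper fitness of a chromosome is the validation error of a model trained only on the selected features. A minimal sketch, assuming pre-split training and validation arrays and using GP regression as the evaluator (any of the three models could be plugged in), is:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.metrics import mean_absolute_error

def wrapper_fitness(X_train, y_train, X_val, y_val, chromosome, model=None):
    """Validation MAE of a model trained on the selected features only (lower is better)."""
    idx = np.flatnonzero(chromosome)
    if idx.size == 0:
        return np.inf                      # an empty feature subset is never useful
    model = model or GaussianProcessRegressor()
    model.fit(X_train[:, idx], y_train)
    return mean_absolute_error(y_val, model.predict(X_val[:, idx]))
```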

3.6.1. Support Vector Regression

Support vector regression (SVR) [38] refers to the use of an SVM to solve regression problems. The SVM is used for classification based on training data, but an ε-insensitive loss function is introduced in the regression variant of the SVM to predict unknown real values. The goal of SVR is therefore quite different from that of the SVM: as shown in Figure 4, SVR minimizes the error outside the margin so that as many data points as possible lie within the margin.
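A minimal scikit-learn usage sketch (the kernel and hyperparameters below are illustrative defaults, not the values used in the experiments):

```python
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# epsilon sets the width of the insensitive tube; errors inside it are ignored
svr = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=1.0, epsilon=0.1))
# svr.fit(X_train, y_train); y_pred = svr.predict(X_test)
```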

3.6.2. Extra-Trees Regression

The random forest is a representative ensemble model; it assembles multiple decision trees trained on bootstrap samples to prevent overfitting, and its general performance is higher than that of a single tree. Extra-trees [39] is a variant of the random forest model that increases randomness by randomly choosing the split points of the selected attributes when splitting a node. The feature importance evaluated by Extra-trees is higher than that evaluated by the random forest model; that is, Extra-trees evaluates features from a broader perspective. We used the feature selection results obtained using Extra-trees regression.
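In scikit-learn, such a model can be instantiated as follows (the number of trees is illustrative):

```python
from sklearn.ensemble import ExtraTreesRegressor

# split thresholds are drawn at random, adding randomness beyond bootstrap sampling
etr = ExtraTreesRegressor(n_estimators=100, random_state=0)
# etr.fit(X_train, y_train)
# etr.feature_importances_   # per-feature importance scores, usable for selection
```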

3.6.3. Gaussian Process Regression

Gaussian process (GP) regression [40,41] is a representative model of the Bayesian non-parametric methodology and is mainly used to solve regression problems. Assuming that f is a function that describes the relation between the input and output data, the GP assumes that the joint distribution of any finite set of f values follows a multivariate normal distribution. In general, the mean is assumed to be 0, and the covariance C is set by a kernel function. GP regression gives high prediction performance, allows a probabilistic interpretation of the prediction results, and can be implemented with relatively simple matrix operations. Figure 5 shows that the deviation of the sampled functions is very small near the given samples, whereas in unknown regions without samples the predicted function values show a large variance. Finding this distribution over functions is the main point of GP regression. Since GP regression involves computationally expensive operations, various approximation algorithms have been devised.
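A minimal GP regression sketch with a zero-mean prior and a kernel-defined covariance (the kernel choice is an assumption; the paper does not specify it):

```python
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

# zero-mean prior; the covariance is set by the kernel (RBF plus observation noise)
gpr = GaussianProcessRegressor(kernel=RBF() + WhiteKernel(), normalize_y=True)
# gpr.fit(X_train, y_train)
# mean, std = gpr.predict(X_test, return_std=True)  # predictive mean and uncertainty
```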

4. Experiments and Evaluation

4.1. Experimental Setup

We first applied our genetic filter and genetic wrapper to the KOSPI data; then, we compared the prediction results obtained using the machine learning models. The data for 12 years (from 2007 to 2018), which had 264 features including global economic indices, exchange rates, commodity indices, and so on, were used (Figure 6). Because the Korean economy is very sensitive to external variables due to its industrial structure, it was very important to grasp the trend of the global economy. Therefore, major countries and global economic indicators closely related to South Korea were selected. The various index data were preprocessed in three forms: index, net changes, and percentage changes (Figure 7). To compensate for missing data, linear interpolation was used; furthermore, non-trading days were excluded based on the KOSPI. The test data were not affected by the training data during preprocessing and the experiments. SVR, Extra-trees, and GP regression were applied to compare the performance of the preprocessed data with and without feature selection. Next, we selected the feature selection method and evaluation model that showed the best performance among them, and we conducted an experiment to predict the KOSPI in 2020 by adding data corresponding to 2019 and 2020 to the 12-year data from 2007 to 2018. Consequently, we endeavored to verify whether or not our feature selection technique also explains the data after COVID-19 adequately. We also tested whether feature selection improved predictive performance. The last experiment changed the target data to cryptocurrency. Cryptocurrency is electronic information that is encrypted with blockchain technology, distributed, and issued, and that can be used as a currency in a certain network. Cryptocurrency was devised as a medium for the exchange of goods, that is, a means of payment. However, it also serves as an investment whose price is determined by supply and demand in the market through exchanges. Therefore, we conducted feature selection with the cryptocurrency price as the target to check whether cryptocurrency can be regarded as an economic indicator affected by the market.
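As a sketch of the preprocessing described above, each raw series can be turned into the three representations with pandas; the column names are ours, and the exact handling of non-trading days follows the paper's description only loosely.

```python
import pandas as pd

def preprocess(series: pd.Series) -> pd.DataFrame:
    """Build the three representations used in the experiments from one daily series."""
    s = series.interpolate(method="linear")        # fill missing values linearly
    return pd.DataFrame({
        "index": s,                                # raw index level
        "net_change": s.diff(),                    # day-over-day difference
        "pct_change": s.pct_change() * 100.0,      # day-over-day percentage change
    }).dropna()
```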

4.2. KOSPI Prediction

4.2.1. Experiment of Genetic Filter

Table 1 shows the parameters of our genetic filter. We trained and evaluated the data from 2007 to 2018 by dividing them into 20 intervals, as shown in Table A1 (see Appendix A). As mentioned in Section 4.1, all variables of the data were preprocessed into three different forms: index, net changes, and percentage changes.
Our genetic filter was applied to each dataset, and the results of applying SVR, Extra-trees regression, and GP regression are shown in Table A1, Table A2 and Table A3 (see Appendix A). The results of predicting net changes and percentage changes were converted back into the original indices, and the mean absolute error (MAE) with respect to the actual indices was derived. The results obtained without any feature selection were compared with those obtained by applying our genetic filter; our genetic filter showed an improved average MAE for all three types of preprocessed data. When the experimental results were classified by evaluation method, GP regression showed the best overall performance among SVR, Extra-trees regression, and GP regression. When the experimental results were classified by preprocessing type, predicting percentage changes and converting them into indices showed the least error. The experiment in which feature selection was performed with percentage changes in GP regression showed the best performance, and the average error was improved by approximately 32% compared to the case without feature selection. Table 2 shows the process in which our genetic algorithm selects features between 2015 and 2016; the number of features and the fitness of the best solution in each generation are shown. The features frequently selected among the feature subsets obtained for each interval are shown in Table 3, which identifies the feature subset closely related to the KOSPI.
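The conversion from predicted percentage changes back to index levels, and the MAE reported throughout the tables, amount to the following (a sketch with hypothetical array names):

```python
import numpy as np

def pct_to_index(prev_index: np.ndarray, pred_pct: np.ndarray) -> np.ndarray:
    """Reconstruct index levels from predicted daily percentage changes."""
    return prev_index * (1.0 + pred_pct / 100.0)

def mae(actual: np.ndarray, predicted: np.ndarray) -> float:
    """Mean absolute error between actual and reconstructed index levels."""
    return float(np.mean(np.abs(actual - predicted)))
```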

4.2.2. Experiment of Genetic Wrapper

Similar to the application of the genetic filter in Section 4.2.1, the parameters of the genetic wrapper are the same as in Table 1, but with a different number of generations. As in Section 4.2.1, the intervals and types of data are the same. Table A1, Table A2 and Table A3 show the results of applying the genetic wrapper to each dataset in combination with SVR, Extra-trees regression, and GP regression (see Appendix A). As before, the results of predicting net changes and percentage changes were converted into the original indices, and the MAE with respect to the actual indices was derived. When we compared the results, our genetic wrapper showed a lower average MAE than the case without feature selection. Our genetic wrapper also showed better results than the genetic filter in all intervals. In particular, when we used GP regression on the percentage changes data and compared with the results without feature selection, our genetic wrapper improved the error by approximately 39%. Therefore, based on the findings of this study, the best way to explain the KOSPI is to apply percentage changes data to a genetic wrapper combined with GP regression.

4.2.3. Prediction of KOSPI after COVID-19

As in the global financial crisis of 2008, the KOSPI could not avoid the impact of COVID-19 on the stock market in 2020 and showed significant fluctuations. Predicting a situation in which the stock index fluctuates sharply during an economic crisis is important in practice. We added the data for 2019–2020 to the existing 2007–2018 data, resulting in a total of 14 years of data. We tried to predict the KOSPI after COVID-19 in 2020 by training on the 13 years of data corresponding to 2007–2019. We applied the combination of the genetic wrapper and GP regression on the percentage changes data, which had shown the best performance in Section 4.2.1 and Section 4.2.2. Figure 8 shows the actual KOSPI, the results of applying feature selection, and those without applying feature selection. It was confirmed that GP regression on the selected features could predict the KOSPI after COVID-19 better, and without the considerable fluctuation observed when feature selection was not applied.
It is meaningful to predict the KOSPI itself, but from an actual investment point of view, predicting whether the stock index will rise or fall compared to the previous day may be of greater interest. The optimization carried out in this study is genetic feature selection, which aims to better predict the numerical value of the target data. Additional experiments were carried out to see whether the predicted index data can also predict the direction of the stock index. We compared the prediction results of GP regression combined with the genetic wrapper against those of GP regression without any feature selection, both on the percentage changes data. Each target value was post-processed to UP or DOWN, which denote the upward and downward direction of the stock price, respectively. Table 4 shows the results of predicting the UP and DOWN of the KOSPI. The technique that predicted the KOSPI well in the section above also predicted the actual UP and DOWN of the KOSPI relatively well. Although the objective of our optimization was not the UP or DOWN compared to the previous day, our feature selection could predict the UP and DOWN of the KOSPI with relatively high accuracy.
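The post-processing into UP and DOWN and the reported metrics can be reproduced along the following lines (a sketch; the array names are hypothetical):

```python
import numpy as np
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

def to_direction(prev_index, index):
    """Label each day 1 (UP) if the index rose versus the previous day, else 0 (DOWN)."""
    return (np.asarray(index) > np.asarray(prev_index)).astype(int)

# y_true = to_direction(prev_actual, actual)
# y_pred = to_direction(prev_actual, predicted)
# precision, recall, f1, _ = precision_recall_fscore_support(y_true, y_pred, average="binary")
# accuracy = accuracy_score(y_true, y_pred)
```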

4.3. Prediction of Cryptocurrency Price and Direction

Cryptocurrency [42,43], which advocates decentralization, aims to serve as an independent and objective safe asset distinct from exchange rates and other economic indicators. However, unintentional artificial surges and plunges may occur, and, similar to other safe assets, fluctuations occur owing to changes in currency values such as interest rate increases, inflation, and deflation. Until now, we have used stock index data from the actual stock market, such as the KOSPI. In this section, however, feature selection was applied with cryptocurrency set as the target data. We tried to predict the daily prices and the UP and DOWN of Bitcoin. A total of 268 features, including the KOSPI data, were preprocessed in the same manner as in Section 4.2.3. The start of the data was set to 2013 because Bitcoin prices began fluctuating to a meaningful extent only from 2013. Bitcoin prices in 2020 were predicted by training on the seven years of data from 2013 to 2019. The results of predicting Bitcoin prices by applying the combination of the genetic wrapper and GP regression were compared with those without feature selection. We converted the predicted percentage changes of Bitcoin prices from the previous day into the original Bitcoin prices and obtained the MAE with respect to the actual Bitcoin prices.
Figure 9 shows the actual Bitcoin prices, the results of applying feature selection, and those of not applying feature selection. Bitcoin prices predicted without any feature selection show considerable fluctuation in specific intervals, which means that the training did not proceed properly. However, when the genetic wrapper was applied, the prediction was similar to the actual Bitcoin prices and did not show considerable fluctuation. An additional experiment was carried out to determine whether our feature selection can adequately explain the fluctuations. Table 5 shows the results of predicting the direction of Bitcoin prices. The feature selection technique that predicted the KOSPI and Bitcoin prices well in the sections above also showed better precision, recall, F1-score, and accuracy for the UP and DOWN of the Bitcoin prices. Although the purpose of our optimization was to accurately predict the Bitcoin prices, the actual UP and DOWN were also predicted quite accurately.

5. Conclusions

In this study, we proposed genetic feature selection techniques to predict the KOSPI and performed various experiments to predict the KOSPI using machine learning. Traditional feature selection techniques aim to create an improved model through dimensionality reduction of the data. We presented a new genetic filter that strengthens the advantages of feature selection while reducing its shortcomings, and a new genetic wrapper that maximizes prediction performance. The three important findings of this study are as follows. First, a genetic filter and a genetic wrapper, combined with various statistical techniques and machine learning, were applied to index, net changes, and percentage changes data. These combinations were compared, and the optimal form of the input data was percentage changes; by converting percentage changes back into the original index, we created a better predictive model. Second, to overcome the disadvantages of traditional filter-based feature selection, we devised a new fitness function. Redundant features were removed, and the formula was designed to have high relevance with the target variable; thus, improved results were obtained across various evaluation functions. Third, the genetic wrapper that performed best on the 2007–2018 interval also produced meaningful results in predicting the KOSPI and cryptocurrency prices after COVID-19, which means that our stock index prediction model does not overfit to past data. Our genetic filter reduced the MAE by 32% when using Gaussian process (GP) regression and percentage changes data. When the genetic wrapper was applied, the results were improved in all intervals compared to the genetic filter; GP regression with the genetic wrapper showed the best result, with an improvement of approximately 39%. Although the proposed genetic wrapper has relatively good performance compared to our genetic filter, it has the disadvantage of a long computation time; our genetic filter runs faster than the genetic wrapper. In the next experiment, the genetic wrapper combined with GP regression, which showed the best result, was used to predict the KOSPI and cryptocurrency prices after COVID-19. We trained predictive models using 2007–2019 data and tested them with 2020 data. Our feature selection improved KOSPI predictions in the post-COVID era. In addition, our genetic feature selection improved the prediction of stock market direction in terms of accuracy and F1-score. Our final experiment was conducted to predict cryptocurrency prices after COVID-19, and our feature selection also improved the Bitcoin price predictions.
As future work, we plan experiments to find a better fitness combination by applying a wider variety of statistical techniques in the genetic filter. In addition to the filter improvement, it will be necessary to apply various prediction models and to tune the hyperparameters of the models. With respect to the wrapper, it will be necessary to reduce the computational cost without degrading prediction quality. Furthermore, it is promising to derive more meaningful models by applying ensemble methods over several predictors. Finally, we aim to predict various equities or assets, such as the US stock market, the Chinese stock market, Ethereum, and Ripple, using our genetic feature selection.

Author Contributions

Conceptualization, Y.-H.K.; methodology, Y.-H.K.; software, D.-H.C.; validation, D.-H.C.; formal analysis, S.-H.M.; investigation, D.-H.C.; resources, Y.-H.K.; data curation, S.-H.M.; writing–original draft preparation, D.-H.C.; writing–review and editing, S.-H.M.; visualization, D.-H.C.; supervision, Y.-H.K.; project administration, Y.-H.K.; funding acquisition, Y.-H.K. All authors have read and agreed to the published version of the manuscript.

Funding

The work reported in this paper was conducted during the sabbatical year of Kwangwoon University in 2021. This work was also supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. 2021R1F1A1048466).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

All used datasets are publicly available at Investing (https://www.investing.com) (accessed on 1 January 2021).

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A. Results of Applying Genetic Feature Selection to Various Data

In this appendix, we provide results of applying feature selections to KOSPI, the net changes of KOSPI, and the percentage changes of KOSPI. Each table shows the MAE values of SVR, Extra-trees regression, and GP regression.
Table A1. Results of applying feature selection to KOSPI.
Support Vector Regression
Train (year) | ’07–’08 | ’09–’10 | ’11–’12 | ’13–’14 | ’15–’16 | Average | ’07–’09 | ’10–’12 | ’13–’15 | Average | ’07–’10 | ’11–’14 | Average
Test (year) | ’09–’10 | ’11–’12 | ’13–’14 | ’15–’16 | ’17–’18 | | ’10–’12 | ’13–’15 | ’16–’18 | | ’11–’14 | ’15–’18 |
All features (MAE) | 177.766 | 314.833 | 45.645 | 51.226 | 321.983 | 182.291 | 312.076 | 89.221 | 234.325 | 211.874 | 317.419 | 200.754 | 259.086
Genetic filter (MAE) | 177.757 | 314.844 | 45.640 | 51.224 | 321.982 | 182.289 | 312.058 | 89.227 | 234.326 | 211.870 | 317.415 | 200.756 | 259.086
Genetic wrapper (MAE) | 177.755 | 314.840 | 45.639 | 51.223 | 321.975 | 182.286 | 312.035 | 89.226 | 234.316 | 211.859 | 317.411 | 200.756 | 259.083
Train (year) | ’07–’09 | ’09–’11 | ’11–’13 | ’13–’15 | ’15–’17 | Average | ’07–’10 | ’10–’13 | ’13–’16 | Average | ’07–’11 | ’11–’15 | Average
Test (year) | ’10 | ’12 | ’14 | ’16 | ’18 | | ’11–’12 | ’14–’15 | ’17–’18 | | ’12–’14 | ’16–’18 |
All features (MAE) | 185.103 | 201.155 | 38.720 | 39.381 | 287.315 | 150.335 | 310.083 | 74.941 | 332.257 | 239.094 | 251.678 | 239.433 | 245.556
Genetic filter (MAE) | 185.080 | 201.089 | 38.713 | 39.374 | 287.311 | 150.313 | 310.068 | 74.949 | 332.264 | 239.094 | 251.559 | 239.427 | 245.493
Genetic wrapper (MAE) | 185.018 | 201.089 | 38.712 | 39.373 | 287.307 | 150.300 | 310.037 | 74.949 | 332.258 | 239.081 | 251.557 | 239.418 | 245.487
Extra-Trees Regression
Train (year) | ’07–’08 | ’09–’10 | ’11–’12 | ’13–’14 | ’15–’16 | Average | ’07–’09 | ’10–’12 | ’13–’15 | Average | ’07–’10 | ’11–’14 | Average
Test (year) | ’09–’10 | ’11–’12 | ’13–’14 | ’15–’16 | ’17–’18 | | ’10–’12 | ’13–’15 | ’16–’18 | | ’11–’14 | ’15–’18 |
All features (MAE) | 165.491 | 127.589 | 66.537 | 62.434 | 232.963 | 131.002 | 260.481 | 56.035 | 246.607 | 187.708 | 108.637 | 231.505 | 170.071
Genetic filter (MAE) | 176.295 | 120.528 | 58.085 | 53.078 | 224.110 | 126.419 | 197.495 | 84.312 | 155.840 | 145.883 | 152.652 | 179.329 | 165.991
Genetic wrapper (MAE) | 165.460 | 82.843 | 47.618 | 50.715 | 214.924 | 112.312 | 94.575 | 65.917 | 154.673 | 105.055 | 66.519 | 176.728 | 121.624
Train (year) | ’07–’09 | ’09–’11 | ’11–’13 | ’13–’15 | ’15–’17 | Average | ’07–’10 | ’10–’13 | ’13–’16 | Average | ’07–’11 | ’11–’15 | Average
Test (year) | ’10 | ’12 | ’14 | ’16 | ’18 | | ’11–’12 | ’14–’15 | ’17–’18 | | ’12–’14 | ’16–’18 |
All features (MAE) | 134.866 | 128.808 | 110.170 | 95.954 | 117.360 | 117.431 | 92.621 | 54.512 | 285.179 | 144.104 | 194.587 | 186.409 | 190.498
Genetic filter (MAE) | 140.876 | 47.645 | 49.813 | 54.896 | 151.581 | 88.962 | 142.724 | 67.217 | 215.173 | 141.705 | 141.372 | 206.220 | 173.796
Genetic wrapper (MAE) | 75.676 | 44.411 | 47.640 | 52.637 | 137.501 | 71.573 | 89.292 | 64.663 | 214.648 | 122.868 | 110.697 | 202.073 | 156.385
Gaussian Process Regression
Train (year) | ’07–’08 | ’09–’10 | ’11–’12 | ’13–’14 | ’15–’16 | Average | ’07–’09 | ’10–’12 | ’13–’15 | Average | ’07–’10 | ’11–’14 | Average
Test (year) | ’09–’10 | ’11–’12 | ’13–’14 | ’15–’16 | ’17–’18 | | ’10–’12 | ’13–’15 | ’16–’18 | | ’11–’14 | ’15–’18 |
All features (MAE) | 72.135 | 167.170 | 265.885 | 137.362 | 91.568 | 146.824 | 211.181 | 405.919 | 173.120 | 263.407 | 365.998 | 155.047 | 260.522
Genetic filter (MAE) | 76.803 | 141.921 | 239.784 | 117.810 | 144.677 | 144.199 | 276.622 | 329.960 | 106.970 | 237.851 | 361.232 | 146.354 | 253.793
Genetic wrapper (MAE) | 73.758 | 134.860 | 174.954 | 117.760 | 102.760 | 120.818 | 259.285 | 143.801 | 101.354 | 168.147 | 353.156 | 128.033 | 240.594
Train (year) | ’07–’09 | ’09–’11 | ’11–’13 | ’13–’15 | ’15–’17 | Average | ’07–’10 | ’10–’13 | ’13–’16 | Average | ’07–’11 | ’11–’15 | Average
Test (year) | ’10 | ’12 | ’14 | ’16 | ’18 | | ’11–’12 | ’14–’15 | ’17–’18 | | ’12–’14 | ’16–’18 |
All features (MAE) | 129.863 | 50.691 | 77.237 | 74.696 | 125.353 | 91.568 | 71.647 | 94.057 | 142.701 | 102.801 | 634.979 | 364.232 | 499.605
Genetic filter (MAE) | 100.722 | 65.604 | 62.663 | 72.879 | 100.608 | 80.495 | 82.995 | 65.525 | 152.018 | 100.179 | 169.793 | 189.886 | 179.839
Genetic wrapper (MAE) | 94.505 | 41.152 | 59.638 | 46.656 | 85.101 | 65.410 | 82.416 | 61.501 | 141.643 | 95.186 | 58.731 | 114.770 | 86.750
Table A2. Results of applying feature selection to the net changes of KOSPI.
Support Vector Regression
Train (year) | ’07–’08 | ’09–’10 | ’11–’12 | ’13–’14 | ’15–’16 | Average | ’07–’09 | ’10–’12 | ’13–’15 | Average | ’07–’10 | ’11–’14 | Average
Test (year) | ’09–’10 | ’11–’12 | ’13–’14 | ’15–’16 | ’17–’18 | | ’10–’12 | ’13–’15 | ’16–’18 | | ’11–’14 | ’15–’18 |
All features (MAE) | 13.960 | 19.010 | 10.509 | 11.296 | 12.964 | 13.548 | 16.843 | 10.979 | 12.158 | 13.327 | 14.869 | 12.122 | 13.495
Genetic filter (MAE) | 13.959 | 19.009 | 10.509 | 11.294 | 12.963 | 13.547 | 16.842 | 10.979 | 12.157 | 13.326 | 14.867 | 12.122 | 13.494
Genetic wrapper (MAE) | 13.959 | 19.009 | 10.508 | 11.294 | 12.963 | 13.546 | 16.841 | 10.978 | 12.157 | 13.325 | 14.867 | 12.120 | 13.493
Train (year) | ’07–’09 | ’09–’11 | ’11–’13 | ’13–’15 | ’15–’17 | Average | ’07–’10 | ’10–’13 | ’13–’16 | Average | ’07–’11 | ’11–’15 | Average
Test (year) | ’10 | ’12 | ’14 | ’16 | ’18 | | ’11–’12 | ’14–’15 | ’17–’18 | | ’12–’14 | ’16–’18 |
All features (MAE) | 12.586 | 14.109 | 9.489 | 10.559 | 15.770 | 12.503 | 18.976 | 10.694 | 12.977 | 14.215 | 11.834 | 12.155 | 11.994
Genetic filter (MAE) | 12.586 | 14.108 | 9.488 | 10.559 | 15.770 | 12.502 | 18.974 | 10.693 | 12.975 | 14.214 | 11.833 | 12.154 | 11.994
Genetic wrapper (MAE) | 12.584 | 14.107 | 9.487 | 10.558 | 15.769 | 12.501 | 18.973 | 10.692 | 12.975 | 14.214 | 11.830 | 12.154 | 11.992
Extra-Trees Regression
Train (year) | ’07–’08 | ’09–’10 | ’11–’12 | ’13–’14 | ’15–’16 | Average | ’07–’09 | ’10–’12 | ’13–’15 | Average | ’07–’10 | ’11–’14 | Average
Test (year) | ’09–’10 | ’11–’12 | ’13–’14 | ’15–’16 | ’17–’18 | | ’10–’12 | ’13–’15 | ’16–’18 | | ’11–’14 | ’15–’18 |
All features (MAE) | 17.789 | 18.748 | 16.976 | 15.051 | 14.989 | 16.710 | 18.353 | 17.503 | 14.276 | 16.711 | 16.636 | 17.106 | 16.871
Genetic filter (MAE) | 18.960 | 19.971 | 15.290 | 13.144 | 15.117 | 16.496 | 18.087 | 14.889 | 15.303 | 16.093 | 16.068 | 14.470 | 15.269
Genetic wrapper (MAE) | 17.869 | 19.843 | 14.943 | 12.750 | 15.101 | 16.101 | 17.903 | 14.535 | 14.762 | 15.733 | 15.898 | 14.290 | 15.094
Train (year) | ’07–’09 | ’09–’11 | ’11–’13 | ’13–’15 | ’15–’17 | Average | ’07–’10 | ’10–’13 | ’13–’16 | Average | ’07–’11 | ’11–’15 | Average
Test (year) | ’10 | ’12 | ’14 | ’16 | ’18 | | ’11–’12 | ’14–’15 | ’17–’18 | | ’12–’14 | ’16–’18 |
All features (MAE) | 13.876 | 17.787 | 15.190 | 13.718 | 17.290 | 15.572 | 19.517 | 16.007 | 13.694 | 16.406 | 14.408 | 14.740 | 14.574
Genetic filter (MAE) | 14.479 | 16.353 | 12.314 | 14.360 | 17.488 | 14.999 | 19.142 | 15.415 | 14.391 | 16.316 | 14.524 | 12.962 | 13.743
Genetic wrapper (MAE) | 13.973 | 15.773 | 12.176 | 13.535 | 16.690 | 14.429 | 19.056 | 14.808 | 14.221 | 16.028 | 14.419 | 12.940 | 13.680
Gaussian Process Regression
Train (year) | ’07–’08 | ’09–’10 | ’11–’12 | ’13–’14 | ’15–’16 | Average | ’07–’09 | ’10–’12 | ’13–’15 | Average | ’07–’10 | ’11–’14 | Average
Test (year) | ’09–’10 | ’11–’12 | ’13–’14 | ’15–’16 | ’17–’18 | | ’10–’12 | ’13–’15 | ’16–’18 | | ’11–’14 | ’15–’18 |
All features (MAE) | 18.663 | 19.083 | 15.703 | 12.739 | 13.709 | 15.980 | 17.343 | 14.351 | 12.513 | 14.736 | 15.196 | 13.847 | 14.522
Genetic filter (MAE) | 14.533 | 17.897 | 12.812 | 10.943 | 12.665 | 13.770 | 16.472 | 12.343 | 11.995 | 13.603 | 14.420 | 12.589 | 13.504
Genetic wrapper (MAE) | 14.336 | 17.639 | 12.494 | 10.857 | 12.276 | 13.520 | 15.814 | 12.255 | 11.962 | 13.344 | 13.690 | 12.579 | 13.134
Train (year) | ’07–’09 | ’09–’11 | ’11–’13 | ’13–’15 | ’15–’17 | Average | ’07–’10 | ’10–’13 | ’13–’16 | Average | ’07–’11 | ’11–’15 | Average
Test (year) | ’10 | ’12 | ’14 | ’16 | ’18 | | ’11–’12 | ’14–’15 | ’17–’18 | | ’12–’14 | ’16–’18 |
All features (MAE) | 14.762 | 14.875 | 12.527 | 10.941 | 15.226 | 13.666 | 17.952 | 13.568 | 12.932 | 14.817 | 12.675 | 12.675 | 12.675
Genetic filter (MAE) | 12.615 | 13.716 | 11.732 | 10.733 | 15.423 | 12.844 | 17.436 | 11.500 | 12.453 | 13.796 | 11.877 | 12.657 | 12.267
Genetic wrapper (MAE) | 12.386 | 13.678 | 10.928 | 10.706 | 14.566 | 12.453 | 16.848 | 11.403 | 12.130 | 13.460 | 11.512 | 11.752 | 11.632
Table A3. Results of applying feature selection to the percentage changes of KOSPI.
Support Vector Regression
Train (year) | ’07–’08 | ’09–’10 | ’11–’12 | ’13–’14 | ’15–’16 | Average | ’07–’09 | ’10–’12 | ’13–’15 | Average | ’07–’10 | ’11–’14 | Average
Test (year) | ’09–’10 | ’11–’12 | ’13–’14 | ’15–’16 | ’17–’18 | | ’10–’12 | ’13–’15 | ’16–’18 | | ’11–’14 | ’15–’18 |
All features (MAE) | 13.961 | 19.090 | 10.509 | 11.294 | 12.964 | 13.564 | 16.904 | 10.997 | 12.153 | 13.351 | 14.952 | 12.119 | 13.535
Genetic filter (MAE) | 13.925 | 19.017 | 10.457 | 11.284 | 12.895 | 13.516 | 16.817 | 10.966 | 12.063 | 13.282 | 14.897 | 12.091 | 13.494
Genetic wrapper (MAE) | 13.918 | 19.007 | 10.429 | 11.284 | 12.874 | 13.502 | 16.809 | 10.954 | 12.044 | 13.269 | 14.866 | 12.088 | 13.477
Train (year) | ’07–’09 | ’09–’11 | ’11–’13 | ’13–’15 | ’15–’17 | Average | ’07–’10 | ’10–’13 | ’13–’16 | Average | ’07–’11 | ’11–’15 | Average
Test (year) | ’10 | ’12 | ’14 | ’16 | ’18 | | ’11–’12 | ’14–’15 | ’17–’18 | | ’12–’14 | ’16–’18 |
All features (MAE) | 12.583 | 14.155 | 9.482 | 10.558 | 15.764 | 12.508 | 19.046 | 10.692 | 12.966 | 14.235 | 11.894 | 12.147 | 12.021
Genetic filter (MAE) | 12.506 | 14.066 | 9.434 | 10.482 | 15.739 | 12.445 | 18.974 | 10.632 | 12.932 | 14.179 | 11.736 | 11.912 | 11.824
Genetic wrapper (MAE) | 12.501 | 14.061 | 9.429 | 10.478 | 15.706 | 12.435 | 18.931 | 10.626 | 12.882 | 14.147 | 11.662 | 11.865 | 11.764
Extra-Trees Regression
Train (year) | ’07–’08 | ’09–’10 | ’11–’12 | ’13–’14 | ’15–’16 | Average | ’07–’09 | ’10–’12 | ’13–’15 | Average | ’07–’10 | ’11–’14 | Average
Test (year) | ’09–’10 | ’11–’12 | ’13–’14 | ’15–’16 | ’17–’18 | | ’10–’12 | ’13–’15 | ’16–’18 | | ’11–’14 | ’15–’18 |
All features (MAE) | 15.077 | 17.519 | 15.258 | 12.920 | 13.083 | 14.771 | 16.806 | 15.176 | 13.707 | 15.230 | 15.628 | 15.342 | 15.485
Genetic filter (MAE) | 16.233 | 16.884 | 14.212 | 12.465 | 12.839 | 14.527 | 15.634 | 15.452 | 14.059 | 15.048 | 14.843 | 15.569 | 15.206
Genetic wrapper (MAE) | 15.994 | 16.343 | 14.203 | 12.418 | 12.688 | 14.329 | 14.072 | 13.496 | 12.963 | 13.510 | 14.508 | 13.426 | 13.967
Train (year) | ’07–’09 | ’09–’11 | ’11–’13 | ’13–’15 | ’15–’17 | Average | ’07–’10 | ’10–’13 | ’13–’16 | Average | ’07–’11 | ’11–’15 | Average
Test (year) | ’10 | ’12 | ’14 | ’16 | ’18 | | ’11–’12 | ’14–’15 | ’17–’18 | | ’12–’14 | ’16–’18 |
All features (MAE) | 14.937 | 13.654 | 12.411 | 12.764 | 14.666 | 13.686 | 17.272 | 13.459 | 14.404 | 15.045 | 13.235 | 13.345 | 13.290
Genetic filter (MAE) | 12.329 | 14.053 | 13.244 | 11.953 | 16.279 | 13.571 | 16.764 | 14.947 | 13.392 | 15.034 | 12.915 | 13.155 | 13.035
Genetic wrapper (MAE) | 11.604 | 13.985 | 12.310 | 11.460 | 14.191 | 12.710 | 16.103 | 12.541 | 12.327 | 13.657 | 12.853 | 12.694 | 12.773
Gaussian Process Regression
Train (year) | ’07–’08 | ’09–’10 | ’11–’12 | ’13–’14 | ’15–’16 | Average | ’07–’09 | ’10–’12 | ’13–’15 | Average | ’07–’10 | ’11–’14 | Average
Test (year) | ’09–’10 | ’11–’12 | ’13–’14 | ’15–’16 | ’17–’18 | | ’10–’12 | ’13–’15 | ’16–’18 | | ’11–’14 | ’15–’18 |
All features (MAE) | 15.832 | 18.772 | 19.413 | 41.222 | 13.029 | 21.653 | 15.233 | 19.002 | 19.773 | 18.003 | 15.708 | 34.803 | 25.255
Genetic filter (MAE) | 13.571 | 14.390 | 12.775 | 24.164 | 10.957 | 15.171 | 15.321 | 12.048 | 9.914 | 12.428 | 12.459 | 17.251 | 14.855
Genetic wrapper (MAE) | 12.377 | 13.954 | 12.235 | 19.341 | 10.238 | 13.629 | 12.705 | 11.531 | 9.415 | 11.217 | 12.224 | 12.241 | 12.233
Train (year) | ’07–’09 | ’09–’11 | ’11–’13 | ’13–’15 | ’15–’17 | Average | ’07–’10 | ’10–’13 | ’13–’16 | Average | ’07–’11 | ’11–’15 | Average
Test (year) | ’10 | ’12 | ’14 | ’16 | ’18 | | ’11–’12 | ’14–’15 | ’17–’18 | | ’12–’14 | ’16–’18 |
All features (MAE) | 12.138 | 10.604 | 11.681 | 30.996 | 13.213 | 15.727 | 16.498 | 13.435 | 11.471 | 13.801 | 12.048 | 23.822 | 17.935
Genetic filter (MAE) | 10.616 | 11.080 | 10.616 | 7.825 | 13.634 | 10.754 | 14.244 | 11.364 | 11.226 | 12.278 | 10.498 | 9.764 | 10.131
Genetic wrapper (MAE) | 10.262 | 10.756 | 10.110 | 7.663 | 11.778 | 10.114 | 12.473 | 10.944 | 10.308 | 11.242 | 10.077 | 9.730 | 9.904

References

  1. Hochba, D.S. Approximation algorithms for NP-hard problems. ACM Sigact News 1997, 28, 40–52. [Google Scholar] [CrossRef]
  2. Pearl, J. Heuristics: Intelligent Search Strategies for Computer Problem Solving; Addison-Wesley: Boston, MA, USA, 1984. [Google Scholar]
  3. Holland, J.H. Adaptation in Natural and Artificial Systems: An Introductory Analysis with Applications to Biology, Control, and Artificial Intelligence; MIT Press: Cambridge, MA, USA, 1992. [Google Scholar]
  4. Mitchell, M. An Introduction to Genetic Algorithms; MIT Press: Cambridge, MA, USA, 1998. [Google Scholar]
  5. Glover, F.W.; Kochenberger, G.A. Handbook of Metaheuristics; Kluwer: Norwell, MA, USA, 2003. [Google Scholar]
  6. Yu, L.; Liu, H. Feature selection for high-dimensional data: A fast correlation-based filter solution. In Proceedings of the 20th International Conference on Machine Learning, Washington, DC, USA, 21–24 August 2003; pp. 856–863. [Google Scholar]
  7. Lanzi, P.L. Fast feature selection with genetic algorithms: A filter approach. In Proceedings of the IEEE International Conference on Evolutionary Computation, Indianapolis, IN, USA, 13–16 April 1997; pp. 537–540. [Google Scholar]
  8. Hall, M.A.; Smith, L.A. Feature selection for machine learning: Comparing a correlation-based filter approach to the wrapper. In Proceedings of the 12th International Florida Artificial Intelligence Research Society Conference, Orlando, FL, USA, 1–5 May 1999; pp. 235–239. [Google Scholar]
  9. Huang, J.; Cai, Y.; Xu, X. A hybrid genetic algorithm for feature selection wrapper based on mutual information. Pattern Recognit. Lett. 2007, 28, 1825–1844. [Google Scholar] [CrossRef]
  10. Guyon, I.; Elisseeff, A. An introduction to variable and feature selection. J. Mach. Learn. Res. 2003, 3, 1157–1182. [Google Scholar]
  11. Mahfoud, S.; Mani, G. Financial forecasting using genetic algorithms. Appl. Artif. Intell. 1996, 10, 543–566. [Google Scholar] [CrossRef]
  12. Kim, K.J.; Han, I. Genetic algorithms approach to feature discretization in artificial neural networks for the prediction of stock price index. Expert Syst. Appl. 2000, 19, 125–132. [Google Scholar] [CrossRef]
  13. Cho, H.-Y.; Kim, Y.-H. A genetic algorithm to optimize SMOTE and GAN ratios in class imbalanced datasets. In Proceedings of the Genetic and Evolutionary Computation Conference Companion, Cancún, Mexico, 8–12 July 2020; pp. 33–34. [Google Scholar]
  14. Kim, Y.-H.; Yoon, Y.; Kim, Y.-H. Towards a better basis search through a surrogate model-based epistasis minimization for pseudo-Boolean optimization. Mathematics 2020, 8, 1287. [Google Scholar] [CrossRef]
  15. Markowitz, H.M. Foundations of portfolio theory. J. Financ. 1991, 46, 469–477. [Google Scholar] [CrossRef]
  16. Malkiel, B.G. The efficient market hypothesis and its critics. J. Econ. Perspect. 2003, 17, 59–82. [Google Scholar] [CrossRef] [Green Version]
  17. Hursh, S.R. Behavioral economics. J. Exp. Anal. Behav. 1984, 42, 435–452. [Google Scholar] [CrossRef]
  18. Bramer, M. Principles of Data Mining; Springer: London, UK, 2007. [Google Scholar]
  19. Tsai, C.F.; Hsiao, Y.C. Combining multiple feature selection methods for stock prediction: Union, intersection, and multi-intersection approaches. Decis. Support Syst. 2010, 50, 258–269. [Google Scholar] [CrossRef]
  20. Längkvist, M.; Karlsson, L.; Loutfi, A. A review of unsupervised feature learning and deep learning for time-series modeling. Pattern Recognit. Lett. 2014, 42, 11–24. [Google Scholar] [CrossRef] [Green Version]
  21. Zhang, X.D.; Li, A.; Pan, R. Stock trend prediction based on a new status box method and AdaBoost probabilistic support vector machine. Appl. Soft Comput. 2016, 49, 385–398. [Google Scholar] [CrossRef]
  22. Peng, H.; Long, F.; Ding, C. Feature selection based on mutual information criteria of max-dependency, max-relevance, and min-redundancy. IEEE Trans. Pattern Anal. Mach. Intell. 2005, 27, 1226–1238. [Google Scholar] [CrossRef] [PubMed]
  23. Ding, C.; Peng, H. Minimum redundancy feature selection from microarray gene expression data. J. Bioinform. Comput. Biol. 2005, 3, 185–205. [Google Scholar] [CrossRef] [PubMed]
  24. Naik, N.; Mohan, B.R. Optimal feature selection of technical indicator and stock prediction using machine learning technique. In Proceedings of the International Conference on Emerging Technologies in Computer Engineering, Jaipur, India, 1–2 February 2019; pp. 261–268. [Google Scholar]
  25. Kursa, M.B.; Rudnicki, W.R. Feature selection with the Boruta package. J. Stat. Softw. 2010, 36, 1–13. [Google Scholar]
  26. Hassoun, M.H. Fundamentals of Artificial Neural Networks; MIT Press: Cambridge, MA, USA, 1995. [Google Scholar]
  27. Yuan, X.; Yuan, J.; Jiang, T.; Ain, Q.U. Integrated long-term stock selection models based on feature selection and machine learning algorithms for China stock market. IEEE Access 2020, 8, 22672–22685. [Google Scholar] [CrossRef]
  28. Noble, W.S. What is a support vector machine? Nat. Biotechnol. 2006, 24, 1565–1567. [Google Scholar] [CrossRef] [PubMed]
  29. Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32. [Google Scholar]
  30. Hu, H.; Ao, Y.; Bai, Y.; Cheng, R.; Xu, T. An improved Harris’s hawks optimization for SAR target recognition and stock market index prediction. IEEE Access 2020, 8, 65891–65910. [Google Scholar] [CrossRef]
  31. Moon, S.-H.; Kim, Y.-H. An improved forecast of precipitation type using correlation-based feature selection and multinomial logistic regression. Atmos. Res. 2020, 240, 104928. [Google Scholar] [CrossRef]
  32. Kim, Y.-H.; Yoon, Y. A genetic filter for cancer classification on gene expression data. Bio-Med. Mater. Eng. 2015, 26, S1993–S2002. [Google Scholar] [CrossRef] [Green Version]
  33. Cho, D.-H.; Moon, S.-H.; Kim, Y.-H. An improved predictor of daily stock index based on a genetic filter. In Proceedings of the Genetic and Evolutionary Computation Conference Companion, Lille, France, 10–14 July 2021; pp. 49–50. [Google Scholar]
  34. Kraskov, A.; Stögbauer, H.; Grassberger, P. Estimating mutual information. Phys. Rev. E 2004, 69, 066138. [Google Scholar] [CrossRef] [Green Version]
  35. Benesty, J.; Chen, J.; Huang, Y.; Cohen, I. Pearson correlation. In Noise Reduction in Speech Processing; Benesty, J., Chen, J., Eds.; Springer: Berlin/Heidelberg, Germany, 2009; pp. 1–4. [Google Scholar]
  36. Cho, D.-H.; Moon, S.-H.; Kim, Y.-H. A daily stock index predictor using feature selection based on a genetic wrapper. In Proceedings of the Genetic and Evolutionary Computation Conference Companion, Cancún, Mexico, 8–12 July 2020; pp. 31–32. [Google Scholar]
  37. Seo, J.-H.; Lee, Y.-H.; Kim, Y.-H. Feature selection for very short-term heavy rainfall prediction using evolutionary computation. Adv. Meteorol. 2014, 2014, 1–15. [Google Scholar] [CrossRef] [Green Version]
  38. Smola, A.J.; Schölkopf, B. A tutorial on support vector regression. Stat. Comput. 2004, 14, 199–222. [Google Scholar] [CrossRef] [Green Version]
  39. Geurts, P.; Ernst, D.; Wehenkel, L. Extremely randomized trees. Mach. Learn. 2006, 63, 3–42. [Google Scholar]
  40. Quiñonero-Candela, J.; Rasmussen, C.E. A unifying view of sparse approximate Gaussian process regression. J. Mach. Learn. Res. 2005, 6, 1939–1959. [Google Scholar]
  41. Liu, H.; Ong, Y.-S.; Shen, X.; Cai, J. When Gaussian process meets big data: A review of scalable GPs. IEEE Trans. Neural Netw. Learn. Syst. 2020, 31, 4405–4423. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  42. Abraham, J.; Higdon, D.; Nelson, J.; Ibarra, J. Cryptocurrency price prediction using tweet volumes and sentiment analysis. SMU Data Sci. Rev. 2018, 1, 1–21. [Google Scholar]
  43. Stübinger, J. Statistical arbitrage with optimal causal paths on high-frequency data of the S&P 500. Quant. Financ. 2019, 19, 921–935. [Google Scholar]
Figure 1. An example of our two-point crossover.
Figure 2. Flowchart of our genetic filter.
Figure 3. Flowchart of our genetic wrapper.
Figure 4. Examples of the one-dimensional SVM and SVR model.
Figure 5. Examples of Gaussian process regression.
Figure 6. Daily KOSPI from 2007 to 2018.
Figure 7. The percentage changes of the KOSPI from 2007 to 2018.
Figure 8. Prediction of KOSPI after COVID-19.
Figure 9. Prediction of Bitcoin in 2020.
Table 1. Operators and parameters of our genetic filter.
Operator / Parameter | Value
Size of population | 100
Number of generations | 500
Length of chromosome | 264
Selection | Roulette wheel
Crossover | 2-point
Mutation rate | 0.001
Replacement | Elitism
Table 2. The fitness of our genetic filter from 2015 to 2016.
The table pairs, over the generations of one run, the relevance terms between the target and the selected features, $C(S_{\mathrm{target}}, S_i)$ ($\alpha_1$), $IG(S_{\mathrm{target}}, S_i)$ ($\alpha_3$), and $F(S_{\mathrm{target}}, S_i)$ ($\alpha_5$), with the corresponding redundancy terms among the selected features, $C(S_i, S_j)$ ($\alpha_2$), $IG(S_i, S_j)$ ($\alpha_4$), and $F(S_i, S_j)$ ($\alpha_6$), and reports the fitness obtained from the normalized terms $\alpha_1^*, \ldots, \alpha_6^*$. [Per-generation plots omitted.] The above terms follow Equation (1); $\alpha^*$ denotes the normalized value of $\alpha$.
Table 3. List of features highly relevant to KOSPI.
Category | Feature
Commodities | Gas, Corn, Wheat
Bond yield | South Korea, Japan, France
Forex | USD/JPY, INR/KRW, GBP/KRW, EUR/GBP
Indices | SSEC, FTSE, IDX, CSE
Table 4. Prediction of the direction of KOSPI. Results without feature selection (top) and with the genetic wrapper (bottom).
All features (no feature selection):
Observed \ Predicted | + | − | Total
+ | 77 | 49 | 126
− | 75 | 46 | 121
Total | 152 | 95 | 247
 | Up | Down
Precision | 0.611 | 0.380
Recall | 0.507 | 0.484
F1-score | 0.554 | 0.426
Accuracy | 0.498

Genetic wrapper:
Observed \ Predicted | + | − | Total
+ | 100 | 52 | 152
− | 40 | 55 | 95
Total | 140 | 107 | 247
 | Up | Down
Precision | 0.658 | 0.579
Recall | 0.715 | 0.514
F1-score | 0.685 | 0.545
Accuracy | 0.628
Table 5. Prediction of the direction of Bitcoin price. Results without feature selection (top) and with the genetic wrapper (bottom).
All features (no feature selection):
Observed \ Predicted | + | − | Total
+ | 76 | 62 | 138
− | 65 | 44 | 109
Total | 141 | 106 | 247
 | Up | Down
Precision | 0.551 | 0.403
Recall | 0.539 | 0.415
F1-score | 0.545 | 0.409
Accuracy | 0.486

Genetic wrapper:
Observed \ Predicted | + | − | Total
+ | 91 | 58 | 149
− | 50 | 48 | 98
Total | 141 | 106 | 247
 | Up | Down
Precision | 0.611 | 0.490
Recall | 0.645 | 0.453
F1-score | 0.628 | 0.471
Accuracy | 0.563
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
