Predicting Bitcoin (BTC) Price in the Context of Economic Theories: A Machine Learning Approach

Bitcoin (BTC)—the first cryptocurrency—is a decentralized network used to make private, anonymous, peer-to-peer transactions worldwide, yet its pricing raises numerous issues due to its arbitrary nature, limiting its use amid skepticism among businesses and households. However, machine learning approaches offer vast scope for predicting future prices precisely. One major problem with previous research on BTC price prediction is that it is primarily empirical, lacking sufficient analytical support to back up its claims. Therefore, this study aims to address the BTC price prediction problem in the context of both macroeconomic and microeconomic theories by applying new machine learning methods. Previous work, however, shows mixed evidence on the superiority of machine learning over statistical analysis and vice versa, so more research is needed. This paper applies comparative approaches, including ordinary least squares (OLS), ensemble learning, support vector regression (SVR), and the multilayer perceptron (MLP), to investigate whether macroeconomic, microeconomic, technical, and blockchain indicators grounded in economic theories predict the BTC price. The findings indicate that some technical indicators are significant short-run BTC price predictors, confirming the validity of technical analysis. Moreover, macroeconomic and blockchain indicators are found to be significant long-term predictors, implying that supply, demand, and cost-based pricing theories are the underlying theories of BTC price prediction. Likewise, SVR is found to be superior to the other machine learning and traditional models. This research's innovation is looking at BTC price prediction through theoretical aspects. This paper makes several contributions.
It can contribute to international finance as a reference for asset pricing and improved investment decision-making. It also contributes to the economics of BTC price prediction by introducing its theoretical background. Moreover, as authors still doubt whether machine learning can beat traditional methods in BTC price prediction, this research contributes to machine learning configuration and helps developers use it as a benchmark.


Introduction
Cryptocurrency is a private system that enables trades between individuals without a central, intermediate agency. In early 2009, Bitcoin (BTC) was valued for the first time, at US$0.08. The currency fluctuated for more than four years until the price touched $1110 in 2013. Due to the high volatility and massive price fluctuations of cryptocurrencies, accurate price prediction is a complex and challenging task. That is mainly because the costs of

Underlying Theory of the Macroeconomic Indicators: Demand and Supply Theory
The quantity theory of money is a concept in monetary economics that holds that money's supply and demand determine the price level. Using this paradigm, Buchholz et al. [11] highlighted how the forces of supply and demand are the main factors influencing the price of Bitcoin. Additionally, utilizing the Keynesian theory of speculative demand for money, NaiFovino et al. [12] and Ciaian et al. [13] highlighted the association between macrofinancial indicators and Bitcoin prices. According to the hypothesis, people who trade in currencies do so to avoid suffering a capital loss on their investments in bonds and other financial assets. A rise in interest rates lowers the value of economic assets, resulting in a loss on investments in financial assets [14].
Kristoufek [15] extended the research to study the impact of some macroeconomic indicators on the BTC price prediction. He found that Bitcoin appreciates in the long run if it is used more for trade, i.e., non-exchange transactions.

Underlying Theory of the Microeconomic Indicators: Microstructure Theory
The theoretical framework of the microstructure approach developed by Lyons [16] implies that the market information structure is asymmetric: not all market participants know the market information, and some agents hold private information, not necessarily about fundamentals. Lyons found that order flow is the most critical determinant of exchange rate determination in the short run. According to Lyons [16], order flow can be measured as the number of buyer-initiated orders minus the number of seller-initiated orders in the market. In microeconomics, supply and demand form the standard model of market price determination [17,18]. Theory and empirical evidence suggest that, for an asset with a given cash flow, the higher its market liquidity, the lower its expected return (e.g., [19,20]). Market liquidity affects asset prices and expected returns. In the Bitcoin market, the bid-ask spread serves as a proxy for market liquidity. As more and more buy and sell orders are placed, overall supply and demand become more apparent. Some empirical studies also showed the short-term predictability of the Bitcoin microstructure. For example, Dyhrberg et al. [21] investigated the liquidity and transaction costs of Bitcoin markets as a microstructure analysis of Bitcoin. Scaillet et al. [22] showed the bid-ask spread has significant predicting power over jumps in the Bitcoin price. In another study, Guo et al. [23] made a short-term prediction of BTC price fluctuations (measured with volatility) using buy and sell orders.
Private information in the BTC market differs from that in the stock market. In stock market trading, private information refers to an improved understanding of a firm's prospects, allowing a better evaluation of potential cash flows. When private information is accessible only to a particular group of traders, it helps create a clear-cut distinction between the BTC market and the stock market. However, it is essential to note that, like the stock market, BTC entertains an uninformed group of traders who enter the market for liquidity only. The questions then follow: What if there is no future cash flow available for discounting, or no asset for valuation? In such a scenario, what exactly would private information provide?
It is indicated that the valuation of BTC is strongly dependent on the confidence of its traders. Hence, private information provides a better estimate and prediction of the value that BTC can potentially gain. Such valuations depend on the consumption and usage of BTC. Private information of this kind adds to the price of BTC and stimulates its demand; since BTC has a fixed supply, private information helps increase demand, raising prices in the global market. Data provided by the order book cover all the causes of demand and supply conditions of an asset in the form of bids and asks, which are ultimately implemented as trades. These data provide insight into the market's microstructure and an internal overview that might not be easy to comprehend otherwise. The bid and ask prices are two essential components of private information. The bid price refers to the highest price a potential buyer of BTC is willing to pay; it is also referred to as the buying price on the exchange. When demand for BTC is high, the bid price increases, which means trading volume affects the bid price.
The ask price is the lowest price a seller is willing to accept for BTC. If demand falls, the ask price falls as well. Ask prices are generally higher than bid prices; the difference between the two, called the spread, is precisely the profit extracted in these exchanges. BTC prices are highly volatile, causing extreme fluctuations in the spread, which is why sellers enter this market after a great deal of negotiation with investors and traders to initiate a bidding war. Once that happens, this buying pressure will force prices up.
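The bid, ask, and spread mechanics described above can be sketched in a few lines (a minimal illustration; the function name and quote values are hypothetical):

```python
def quote_stats(best_bid: float, best_ask: float) -> dict:
    """Spread measures from the top of a (hypothetical) BTC order book.

    best_bid: highest price a buyer is willing to pay
    best_ask: lowest price a seller is willing to accept
    """
    spread = best_ask - best_bid            # the profit margin discussed above
    mid = (best_ask + best_bid) / 2.0       # mid-price reference
    return {"spread": spread, "mid": mid, "relative_spread": spread / mid}

# Example quote: 27,000 bid / 27,010 ask
stats = quote_stats(27_000.0, 27_010.0)
```

A wider relative spread signals lower market liquidity, which is exactly how the bid-ask spread acts as a liquidity proxy in the microstructure literature cited above.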

Underlying Theory of Blockchain Information Indicators: Cost-Based Pricing Theory
According to Noble and Gruca [24], the cost price of any service or product can be computed based on a predefined profit margin percentage calculated over the total cost. Cost-based pricing theory focuses primarily on the variable and fixed cost components classified as internal costs. This pricing theory is crucial to BTC miners, as it helps them compute the cost price above which mining becomes profitable. Blockchain information is one of the critical considerations in BTC's cost price, as per Wang and Vergne [5]. Mining hardware efficiency can be improved significantly with the right technology, resulting in a reduced cost of mining BTC and a lower price; the lower cost and price lead to increased demand, ultimately improving the return on the overall investment in BTC. Conversely, adding extra hashing power to the global mining network raises the difficulty level, leading to higher mining costs and higher BTC prices, which reduces demand and returns.
By developing a cost-of-production model for valuing Bitcoin, Hayes [25] showed that three factors, computational power, rate of coin production, and mining difficulty, might account for more than 84% of relative value formation. Increasing the difficulty results in fewer units produced for a given amount of hash power, increasing the relative cost of production; similarly, reducing the block reward results in fewer units. The marginal cost of production is reduced by improved mining hardware energy efficiency, a global drop in electricity charges, or reduced mining difficulty. As technical processes improve, the efficiency of mining improves, which reduces the cost of production and in turn puts downward pressure on prices. In another study, Hayes [26] back-tests a marginal cost of production model applied to value Bitcoin. The author applied vector autoregression (VAR) and traditional regression models to historical data from 29 June 2013 through 27 April 2018, sampled when the mining difficulty changes, i.e., every two weeks. The results demonstrate that the marginal cost of production is important in explaining Bitcoin pricing in the long run (considering every two weeks a long-run prediction).
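The marginal-cost logic behind Hayes's model can be illustrated with a minimal sketch: daily electricity spend divided by expected daily coin output gives a break-even price per coin. The function name and all numbers below are hypothetical, not taken from [25,26]:

```python
def marginal_cost_per_btc(hash_rate_ths: float, efficiency_j_per_th: float,
                          electricity_usd_per_kwh: float, btc_per_day: float) -> float:
    """Break-even cost of producing one BTC: daily electricity spend / daily coin output.

    hash_rate_ths * efficiency_j_per_th gives power draw in watts (J/s),
    since 1 TH/s * 1 J/TH = 1 J/s. All example values below are hypothetical.
    """
    watts = hash_rate_ths * efficiency_j_per_th
    kwh_per_day = watts * 24.0 / 1000.0
    return kwh_per_day * electricity_usd_per_kwh / btc_per_day

# 100 TH/s at 30 J/TH -> 3 kW -> 72 kWh/day; at $0.05/kWh and 0.0005 BTC/day
cost = marginal_cost_per_btc(100.0, 30.0, 0.05, 0.0005)
```

Note how each lever discussed in the text maps to a parameter: better hardware lowers `efficiency_j_per_th`, cheaper power lowers `electricity_usd_per_kwh`, and higher difficulty lowers `btc_per_day`, raising the marginal cost.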
The block size limits the number of transactions verified with each block, so more computation power is required to verify larger blocks. This increased need for computational power will increase the cryptocurrency price, in line with what has been discussed. By definition, the hash rate is the quantum of processing and computing power that the mining process contributes to the network, and its value indicates the power of the network. This value is used to correct the mining difficulty, i.e., to increase or decrease it, and thereby correspondingly increase or decrease the BTC price.
The average block time of the network is evaluated after every n blocks; if it is higher than the expected block time, the difficulty level of the proof-of-work algorithm is decreased. On the contrary, if the average block time is less than expected, the difficulty level is increased. This is in line with the economic concept of the law of diminishing marginal utility: the faster something is made available, the more its value decreases over time. In BTC terms, the faster the rate of unit formation, the lower the price of the coin goes.
Difficulty is changed based on the time it took to discover the previous 2016 blocks: if a block is found every 10 min, finding 2016 blocks takes exactly two weeks. The more (less) time was spent finding the previous 2016 blocks, the more the difficulty is lowered (raised). Because mining remains lucrative even as the difficulty adjusts higher and margins become somewhat slimmer, more miners are encouraged to join. More miners joining the effort means that the network is growing, which is good for Bitcoin's price in the long run. This cycle keeps going until a sizable part of the miners can no longer keep up. Some are compelled to sell a growing proportion of the newly created Bitcoins, which finally depletes their treasuries and increases the supply of Bitcoins for sale on the market. They eventually give up and cease mining. The difficulty is then adjusted downward as the hash rate declines.
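The retargeting rule described above can be sketched as follows (a simplified illustration of the protocol's behavior; the function name is ours):

```python
def retarget_difficulty(old_difficulty: float, actual_seconds: float) -> float:
    """Sketch of Bitcoin's retargeting rule after each window of 2016 blocks.

    Difficulty is rescaled so the next 2016 blocks take ~2 weeks at the observed
    hash rate; the protocol clamps each adjustment to a factor of 4.
    """
    expected_seconds = 2016 * 10 * 60            # two weeks at one block per 10 min
    ratio = expected_seconds / actual_seconds    # faster blocks -> ratio > 1
    ratio = max(0.25, min(4.0, ratio))           # protocol clamp
    return old_difficulty * ratio

# Blocks found twice as fast as targeted -> difficulty doubles
doubled = retarget_difficulty(1000.0, 2016 * 10 * 60 / 2)
```

If the previous window took exactly two weeks, the difficulty is unchanged; if the hash rate falls and blocks come slower, the same rule lowers the difficulty.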

Application of Machine Learning in Real-World Problem Solving
Artificial intelligence (AI) is a relatively new trend in science that aims to bring about fundamental changes in people's lives. AI is a little challenging to define, but it can be said that it combines different sciences to make machines more intelligent. One of the most popular, and hotly debated, subfields of artificial intelligence is machine learning, whose impact everyone feels in daily life. Simply put, machine learning is a science that teaches machines how to learn new things on their own. Machine learning is one of the modern inventions that has contributed to the progress of various industries and businesses and has also been very influential in the individual lives of human beings [27]. It is a subset of artificial intelligence that focuses on learning from databases to build intelligent computer systems. At present, machine learning has been used in various fields and industries, for example, to diagnose and treat diseases [28], in image processing [29], in classification [30], and more. Support vector regression can be used in many areas, such as dynamic response prediction of a magnetorheological elastomer base isolator [31], thermal springback of hot press forming [32], text classification [33], etc.

Related Work and Research Gap
Thus far, empirical studies do not demonstrate a clear advantage for the emerging techniques of using machine learning algorithms to predict the BTC price, and research in this area is insufficient [34,35]. Therefore, this study will help to show the significance of machine learning methods for the BTC price prediction problem. Moreover, some research shows that machine learning outperforms statistical analysis, while some still hold to the superiority of conventional statistical analysis. Table 1 presents some related work on the BTC price prediction problem. The current research differs from previous studies in its completeness and comprehensiveness, and the comparative analysis in the current study has not been conducted before. In addition, a variety of indicators, including macroeconomic indicators, microstructure indicators, blockchain information, and technical indicators, are used to identify the significant variables as BTC price predictors. In the existing literature, there is no comprehensive work in which almost all categories of indicators are investigated, and most works on BTC price prediction are empirical analyses. The current study, by contrast, first looks at the BTC price prediction problem from the perspective of economic theories, including demand and supply theory, microstructure theory, and cost-based pricing theory. It then identifies the associated variables affecting the BTC price. After that, we empirically test the predictive power of the attributes through emerging machine learning models and traditional methods.

Materials and Methods
This research applies the traditional OLS method [45] and several machine learning methods to the BTC price prediction problem, including ensemble learning, support vector regression (SVR), and the multilayer perceptron (MLP), which are briefly explained below.

Multilayer Perceptron
Rosenblatt [46] introduced the perceptron concept in 1958; a multilayer perceptron (MLP) consists of an input layer, middle layers, and an output layer. The input layer is the connection between outer space and the network. The middle layers are called hidden layers because they have no connection with the outside world; their values are not observed in the training set. The number of neurons in the input layer corresponds to the number of input parameters, while the number of neurons in the hidden layer can be determined by trial and error. The output layer includes neurons according to the desired output, e.g., the forecasted value in forecasting problems. A set of weights connects the neurons (see Figure 1). The output value y of a three-layer perceptron can be formulated as

y = \phi_2 \left( \sum_{j=1}^{N} v_j z_j + b_0 \right),

where N is the number of neurons in the hidden layer, v_j is the weight of the second layer, z_j is the output of neuron j, b_0 is the bias of the output neuron, and \phi_2 is the activation function of the output neuron. Several training algorithms have been used to fit MLP models, such as scaled conjugate gradient (SCG), Levenberg-Marquardt (LM), gradient descent with adaptive learning rate (GDA), gradient descent with momentum (GDM), and others. The output value of neuron j in the hidden layer is given by

z_j = \phi_1 \left( \sum_{i=1}^{M} w_{ij} x_i + b_j \right),

where M is the number of inputs, w_{ij} are the weights of the first layer, x_i are the inputs, b_j is the bias of neuron j, and \phi_1 is the activation function of the hidden layers. MLPs were chosen because they are fast to train and can afford a hidden layer of size 256 instead of 32-64; moreover, their large variance yields a strong ensemble from a single model type.
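As a minimal sketch of the two equations above, a forward pass of a three-layer perceptron with a tanh hidden activation and a linear output can be written as follows (the weights below are hypothetical, chosen only for illustration):

```python
import numpy as np

def mlp_forward(x, W1, b1, v, b0):
    """Three-layer perceptron: z_j = tanh(sum_i w_ij x_i + b_j), y = sum_j v_j z_j + b_0."""
    z = np.tanh(W1 @ x + b1)   # hidden-layer outputs (phi_1 = tanh)
    return float(v @ z + b0)   # linear output neuron (phi_2 = identity)

# Hypothetical weights for M = 2 inputs and N = 3 hidden neurons
W1 = np.array([[1.0, 0.0],
               [0.0, 1.0],
               [1.0, 1.0]])
b1 = np.zeros(3)
v = np.array([0.5, -0.25, 1.0])
b0 = 0.1

y0 = mlp_forward(np.zeros(2), W1, b1, v, b0)  # tanh(0) = 0, so y0 equals the bias b0
```

Training then amounts to adjusting W1, b1, v, and b0 so the output matches the target, using one of the algorithms listed above (e.g., Levenberg-Marquardt).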

Support Vector Regression
Support vector regression (SVR) is an emerging nonlinear regression method based on statistical learning theory that has a more stable solution than traditional neural network models. Adopting the structural risk minimization principle in SVM reduces overfitting and local-minima issues. In SVR, the nonlinear regression problem is transformed into a linear regression problem by mapping the input data into a high-dimensional feature space through kernel functions [47]. Consider a set of data \{(x_i, y_i)\}_{i=1}^{l} \subset \mathbb{R}^n \times \mathbb{R}, where x_i is a vector of inputs and y_i represents the scalar output. In the nonlinear regression case, the linear estimation function can be formulated as

f(x) = \langle w, \varphi(x) \rangle + b,

where w \in \mathbb{R}^n is the weight vector, \varphi is the mapping function, \langle \cdot, \cdot \rangle denotes the dot product in the feature space, and b is a constant. Several cost functions can be used in SVR, including Huber's, the Gaussian, the ε-insensitive, and the Laplacian. The robust ε-insensitive loss function introduced by Vapnik [48] is the most frequently used, and can be formulated as

L_\varepsilon(y, f(x)) = \begin{cases} 0, & |y - f(x)| \le \varepsilon, \\ |y - f(x)| - \varepsilon, & \text{otherwise,} \end{cases}

where ε is the tube radius around the regression function f(x), affecting the number of support vectors used to construct the regression function. The cost of errors on points inside the tube is zero. Figure 2 shows a schematic diagram of nonlinear regression by SVR.

The SVR performs linear regression in the feature space using the ε-insensitive loss function by minimizing the empirical risk \sum_{i=1}^{l} L_\varepsilon(y_i, f(x_i)) as well as the regularization term \|w\|^2, which reduces model complexity (flatness). The slack variables \xi_i and \xi_i^* represent the deviation of training samples outside the ε-insensitive zone. The optimal regression function is obtained by solving [47]

\min_{w, b, \xi, \xi^*} \; \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{l} (\xi_i + \xi_i^*) \quad \text{s.t.} \quad y_i - \langle w, \varphi(x_i) \rangle - b \le \varepsilon + \xi_i, \;\; \langle w, \varphi(x_i) \rangle + b - y_i \le \varepsilon + \xi_i^*, \;\; \xi_i, \xi_i^* \ge 0,

where C is the regularization constant determining the trade-off between the empirical risk and the regularization term. This optimization problem can be solved by introducing the Lagrangian multipliers \alpha_i and \alpha_i^* and applying the Karush-Kuhn-Tucker conditions, yielding a dual problem in which the data appear only through the kernel function K(x_i, x_j), defined as the inner product of \varphi(x_i) and \varphi(x_j) in the feature space. After solving the optimization problem, the optimal form of the regression function is obtained as [47]

f(x) = \sum_{i=1}^{l} (\alpha_i - \alpha_i^*) K(x_i, x) + b.

By setting the parameters C and ε and the kernel parameters, the desired estimation accuracy can be obtained. SVR was chosen because it is robust to outliers, its decision model can be easily updated, it has excellent generalization capacity with high prediction accuracy, and its implementation is straightforward.
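As a hedged illustration (the toy data and hyperparameter values below are hypothetical, not the study's configuration), an ε-insensitive RBF-kernel SVR can be fitted with scikit-learn:

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Hypothetical noisy nonlinear signal standing in for a price series
rng = np.random.default_rng(42)
X = np.sort(rng.uniform(0.0, 5.0, 200)).reshape(-1, 1)
y = np.sin(X).ravel() + rng.normal(0.0, 0.1, 200)

# C trades empirical risk against flatness; epsilon is the tube radius;
# the RBF kernel supplies the nonlinear feature mapping phi implicitly.
model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0, epsilon=0.1))
model.fit(X, y)
in_sample_r2 = model.score(X, y)
```

Standardizing the inputs before fitting matters because the RBF kernel, like the ε tube, is scale-sensitive.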

Ensemble Method
Experience shows that no single training algorithm in machine learning is the best and most accurate for all applications. Each algorithm is a particular model based on certain assumptions; sometimes these assumptions are met, and sometimes they are violated. Therefore, no algorithm alone can succeed in all situations, and ensemble methods have been introduced to solve this problem. The primary motivation for developing the ensemble method is to reduce the error rate: the forecasting error of an ensemble, a group of techniques, is much lower than that of a single model. When independent and diverse classifiers are combined, the likelihood of making the right decision is strengthened, since each of these classifiers performs better than a random guess.
Hansen and Salamon [49] presented deploying multiple models on regression. They showed that with N independent classifiers, each with a probability of error e < 0.5, the overall error E decreases uniformly with N; conversely, overall performance degrades significantly if dependent classifiers are used. The methodology consists of two consecutive steps, the training and testing phases. As shown in Figure 3, several predictive models are produced using training samples in the training phase; these models are then combined to make predictions in the testing phase. Some popular ensemble methods are Boosting, Bagging, and Blending, of which the Bagging approach is used in this research. There are two main reasons to choose an ensemble model: performance and robustness. An ensemble model can make better forecasts than any single model, and it reduces the spread or dispersion of the estimates, improving model accuracy.
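A minimal Bagging sketch of this logic (toy data; scikit-learn's BaggingRegressor defaults to decision-tree base learners, whose predictions are averaged over bootstrap resamples):

```python
import numpy as np
from sklearn.ensemble import BaggingRegressor

# Hypothetical regression data
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))
y = 2.0 * X[:, 0] - X[:, 1] + rng.normal(0.0, 0.1, 300)

# Each base learner (a decision tree by default) is trained on a bootstrap
# resample of the data; averaging their predictions reduces variance.
bag = BaggingRegressor(n_estimators=50, random_state=0)
bag.fit(X, y)
r2 = bag.score(X, y)
```

Because the trees are high-variance learners trained on different resamples, the averaged ensemble has a lower spread of estimates than any single tree, which is precisely the robustness argument above.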

Feature Selection Methods
Feature selection, variable selection, or attribute selection plays an essential role in classification problems. It reduces the number of attributes by excluding the irrelevant and redundant ones to achieve a lower-complexity model (see Figure 4); simpler and faster models with fewer variables are desirable in machine learning. Feature selection is also an essential guard against overfitting, which happens when the model learns details and noise introduced by too many variables and then fails to generalize well to new data. In this research, several feature selection methods, such as principal component analysis (PCA), particle swarm optimization (PSO), evolutionary search, genetic search, best-first search, and the variance inflation factor (VIF), are used.
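As a small illustration of one of these methods, PCA can drop a redundant indicator while retaining almost all of the variance (the correlated toy matrix below is hypothetical):

```python
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical indicator matrix whose 4th column nearly duplicates the 1st
rng = np.random.default_rng(1)
base = rng.normal(size=(200, 3))
redundant = 0.9 * base[:, :1] + rng.normal(0.0, 0.01, (200, 1))
X = np.hstack([base, redundant])   # 4 columns, only 3 effective dimensions

pca = PCA(n_components=0.99)  # keep the fewest components explaining 99% of variance
X_reduced = pca.fit_transform(X)
```

The fourth column carries almost no independent information, so PCA collapses the four indicators into three components, which is exactly the complexity reduction described above.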

Model Evaluation
A model evaluation metric quantifies a predictive model's performance. Evaluation typically involves training a model on a dataset, using the model to make predictions on a "test dataset" not used during training, and then comparing the predictions to the expected values in the test dataset. Different authors use different metrics to compare their models. Table 2 shows the evaluation metrics used in this study. In all formulas, y_t, ŷ_t, and T are the target value, the predicted value, and the size of the test dataset in out-of-sample or out-of-fold prediction, respectively. Table 2. Common types of evaluation metrics.

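Three of the metrics used in this study (RMSE, R², and Pearson r) can be computed directly from their definitions; a sketch with a small hypothetical prediction vector:

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error."""
    return float(np.sqrt(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2)))

def r_squared(y_true, y_pred):
    """Coefficient of determination R^2."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    ss_res = np.sum((y_true - y_pred) ** 2)          # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares
    return float(1.0 - ss_res / ss_tot)

def pearson_r(y_true, y_pred):
    """Pearson correlation coefficient between targets and predictions."""
    return float(np.corrcoef(y_true, y_pred)[0, 1])

# Hypothetical target and prediction vectors (T = 4)
y_t = np.array([1.0, 2.0, 3.0, 4.0])
y_hat = np.array([1.1, 1.9, 3.2, 3.8])
```

Note that R² and Pearson r measure different things: a biased model can have a high r yet a poor R², which is why the study reports both alongside RMSE.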

Model Validation
Cross-validation, one of the more widely used statistical analyses, helps assess and validate a machine learning model's performance. The key intention behind evaluating the model is to check whether the trained model generalizes. In the K-fold cross-validation process, the entire data set is first split into several folds; the model is then trained on all folds but one and tested on the remaining fold. This is repeated until the model has been tested on every fold. Finally, the average of the scores obtained on each fold is taken as the final metric. Predictions are made on test sets that were not used to train the model; these predictions are called 'out-of-fold predictions,' a type of 'out-of-sample' forecast. In contrast to a simple train-test split, this method prevents overfitting and provides a more robust form of model evaluation.
Cross-validation on a rolling basis is used for cross-validating time series models. According to Kuhn and Johnson [52], a value of k = 10 is recommended, and repeated K-fold cross-validation, which replicates the entire process multiple times, is a prevalent method: for instance, if ten-fold cross-validation were repeated five times, it would yield 50 sets of out-of-fold predictions for estimating the model's efficacy. As depicted in Figure 5, the process starts with a small subset of the data for training; a forecast is then made for the next data point, which is used to check accuracy. The same data point is included in the following training set, on the basis of which the subsequent data points are predicted.
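Rolling-origin splits of this kind can be generated with scikit-learn's TimeSeriesSplit; a sketch on a hypothetical 20-observation series:

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

# Hypothetical series of 20 daily observations
X = np.arange(20).reshape(-1, 1)

# Rolling-origin splits: every training window ends where its test window begins,
# so the model never trains on observations from its own future.
tscv = TimeSeriesSplit(n_splits=4)
splits = [(int(train.max()), int(test.min()), int(test.max()))
          for train, test in tscv.split(X)]
```

Each successive split extends the training window to include the previously tested points, mirroring the expanding-window procedure in Figure 5.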

Results and Discussion
This section consists of three parts. In the first part, a multilinear regression model is built for the BTC price prediction problem on monthly BTC prices from 18 August 2010 to 17 September 2018. Data includes macroeconomic and blockchain information indicators. The second part presents two comparative approaches: feature-based and category-based comparative analysis consisting of OLS, Ensemble methods, SVR, and MLP for the BTC price prediction problem on a daily data set from 11 October 2016 to 12 June 2017. Data is composed of macroeconomic, microeconomic, and technical indicators. All predictions in this part are out-of-fold predictions.
During the k-fold cross-validation process, predictions are made on test sets composed of data not used to train the model. These predictions are called out-of-fold predictions, a type of out-of-sample prediction. A similar analysis to the second part is described in the third part on a different BTC dataset, including macroeconomic, microeconomic, blockchain information, and technical indicators from 1 January 2018 to 5 June 2018. For validation of the results in this research, three metrics, namely RMSE, R², and Pearson's r, have been used to compare the out-of-sample and out-of-fold predictive models under the T-test at the significance level of 0.05. K-fold cross-validation with k = 10 (so-called cross-validation on a rolling basis) is used to construct a high-performance model and obtain robust results. Results are averaged over 100 prediction trials.
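For reference, the three evaluation metrics used throughout can be computed from the actual and predicted price series as follows (a minimal pure-Python sketch with illustrative names; in practice WEKA reports these values directly).

```python
import math

def rmse(actual, predicted):
    """Root mean squared error: typical magnitude of prediction errors."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)

def r_squared(actual, predicted):
    """Coefficient of determination: share of variance explained."""
    mean_a = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_a) ** 2 for a in actual)
    return 1 - ss_res / ss_tot

def pearson_r(x, y):
    """Pearson correlation between the actual and predicted series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)
```

Lower RMSE and higher R² and Pearson's r all indicate a better fit between predictions and the realized prices.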

The BTC Price Prediction Problem Using OLS
According to the theoretical analysis regarding demand and supply theory, macroeconomic indicators have long-term predictive power on BTC prices. For the empirical analysis, a multilinear regression model is built for the BTC price prediction problem (model 1 in Appendix A) on monthly BTC prices from 18 August 2010 to 17 September 2018, including macroeconomic and blockchain information indicators.
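For the single-predictor case, the OLS mechanics underlying such a regression reduce to closed-form slope and intercept estimates; the sketch below is illustrative only (the study's actual model is multivariate and estimated with standard statistical software).

```python
def ols_fit(x, y):
    """Single-predictor OLS: slope and intercept that minimize the
    sum of squared residuals, via the closed-form estimates."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    intercept = my - slope * mx
    return slope, intercept

# Fitting data that follow y = 2x exactly recovers slope 2, intercept 0.
```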

Data Description
Monthly BTCUSD transactions occurring on the major BTC exchanges, available at blockchain.com from 18 August 2010 to 17 September 2018 and including 24 variables, have been examined. The independent variables can be categorized into macroeconomic indicators and blockchain information indicators, obtained via the API provided at blockchain.com (see Table 3). Descriptive statistics, including minimum, maximum, mean, and standard deviation, have been calculated and are shown in Table A1 in Appendix A.1 to summarize the data. First, data cleaning, including handling outliers (extreme values) and missing values, was applied to the raw data to build a better data set. After that, the VIF was applied to the data set to deal with multicollinearity. Table A2 in Appendix A.1 shows the variables, namely market capitalization, transactions per block, hash rate, mining difficulty, cost per transaction, total transactions per day, Nasdaq Composite, Dow Jones Industrial Average, and S&P 500, that have a VIF greater than 10. Instead of dropping variables, the entire sample period has been tested in nine models with different combinations of variables. Table A3 in Appendix A.1 shows the results of the nine regression models built to avoid multicollinearity. The variables in quotes are the highly correlated variables; each is added to the rest of the variables to build a new regression model. The response variable in each model is the BTC price. The values in parentheses are the t-test results for the variables rejecting the null hypothesis, based on a p-value of 0.05. The R² from the regression models is relatively high, suggesting, for example, that approximately 73% of the variation in BTC prices in model "9" is explained by the variables in the model. Based on the t-statistics and p-values, all models are statistically significant. The coefficients are of non-negligible magnitude, indicating that all variables are economically significant in the models.
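The VIF screening used above follows the standard definition VIF_j = 1/(1 − R²_j), where R²_j comes from regressing predictor j on the remaining predictors; in the two-predictor case this reduces to 1/(1 − r²) for their correlation r. A minimal illustrative sketch (names not from the study):

```python
import math

def pairwise_vif(x1, x2):
    """VIF for two predictors: 1 / (1 - r^2), where r is their
    Pearson correlation. Values above 10 flag multicollinearity."""
    n = len(x1)
    m1, m2 = sum(x1) / n, sum(x2) / n
    cov = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1 = math.sqrt(sum((a - m1) ** 2 for a in x1))
    s2 = math.sqrt(sum((b - m2) ** 2 for b in x2))
    r = cov / (s1 * s2)
    return 1.0 / (1.0 - r ** 2)

# Two nearly collinear series produce a large VIF, so one of them would
# be set aside (or tested in a separate model, as done in this study).
```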

OLS Regression for BTC Price Prediction
The regression analysis showed that the significant macroeconomic indicators for monthly BTC price in all models are market capitalization, the Nasdaq Composite, the Dow Jones Industrial Average, and the S&P 500. Therefore, macroeconomic indicators have long-term predictive power on BTC prices, as expected a priori, and the t-statistics indicate the significance of the results. Blockchain information indicators, including block size, cost per transaction, mining difficulty, hash rate, transaction fees, and estimated transaction value, verify that supply and demand theory underlies these predictors. Therefore, blockchain information indicators also have long-term predictive power on BTC prices, as expected a priori. The t-statistics indicate that it is highly statistically significant that blockchain information indicators influence the price, confirming that cost-based pricing theory underlies these predictors. The empirical results answer the first and second research questions: (1) What are the significant variables as short-term or long-term BTC price predictors? (2) What are the underlying economic theories of BTC price predictors?

Proposed Comparative Analysis for Dataset 1
According to the theoretical analysis regarding demand and supply theory, macroeconomic indicators do not have short-term predictive power on BTC prices. For the empirical analysis, a comparative machine learning approach, including OLS, Ensemble methods, SVR, and MLP, is applied to the BTC price prediction problem on data sets from 11 October 2016 to 12 June 2017, including macroeconomic, microeconomic, and technical indicators. Feature selection methods, namely Best First Search, PSO Search, and Evolutionary Search, are applied to the data. The price prediction model is described in Appendix A (model 2).

Data Description
Daily BTC/USD transactions occurring on the Bitfinex exchange, obtained via the API provided at bitfinex.com (accessed on 2 October 2019) from 11 October 2016 to 12 June 2017 and including 22 independent variables, have been examined. The independent variables can be categorized into three groups: macroeconomic indicators, obtained at fred.stlouisfed.org, and microeconomic and technical indicators, extracted from bitfinex.com. Table 4 shows the specification for each group. Descriptive statistics, including minimum, maximum, mean, and standard deviation, have been calculated and are shown in Table A4 in Appendix A.2 to summarize the data.

Feature-Based Comparative Analysis
This section applies the comparative analysis to different datasets containing the indicators chosen by different feature selection techniques, including VIF, genetic search, evolutionary search, and best-first search. Table A5 in Appendix A.2 shows the different features chosen by the various methods. The comparison is conducted under the T-test at the significance level of 0.05 using WEKA software (version 3.9.4, developed at the University of Waikato, New Zealand). To evaluate the predictive machine learning models' performance and obtain robust results, the 10-fold cross-validation-on-a-rolling-basis evaluation technique is used, and each model is repeated ten times. The average results of the 100 prediction trials, reporting the forecasting ability of the models in terms of RMSE and Pearson's r, are shown in Tables 5 and 6; the standard deviation is shown in parentheses. According to Tables 5 and 6, SVR performs better on the attributes produced by PCA; thus, a combination of SVR and PCA can be used to boost the model. No feature selection improves the models. The VIF method is the worst of the mentioned feature selection methods due to its poor prediction results. The different models are compared to identify the best model for each data set, except for the VIF data (due to its unpromising forecasting results). Table 7 summarizes the model comparisons, showing that the SVR model has the best accuracy and the MLP the worst.
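The T-test behind these comparisons is a paired test over the per-fold scores of two models. A minimal sketch of the statistic follows (illustrative names; the two-sided 0.05 critical value of 2.262 for 9 degrees of freedom is a standard t-table value):

```python
import math

def paired_t_statistic(scores_a, scores_b):
    """t statistic for paired per-fold scores of two models:
    mean of the fold-wise differences divided by its standard error."""
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    mean_d = sum(diffs) / n
    var_d = sum((d - mean_d) ** 2 for d in diffs) / (n - 1)
    return mean_d / math.sqrt(var_d / n)

# With 10 folds (df = 9), |t| > 2.262 indicates a significant
# difference between the two models at the 0.05 level (two-sided).
```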

Category-Based Comparative Analysis
This section applies the comparative analysis to different datasets containing the different categories of indicators: macroeconomic, microeconomic, and technical. The comparison is conducted under the T-test at the significance level of 0.05 using WEKA software. To evaluate the predictive machine learning models' performance and obtain robust results, the 10-fold cross-validation-on-a-rolling-basis evaluation technique is used, and each model is repeated ten times. The average results of the 100 prediction trials, reporting the forecasting ability of the models in terms of RMSE and Pearson's r, are shown in Tables 8 and 9; the standard deviation is shown in parentheses. According to Tables 8 and 9, technical indicators drive the prediction results in the OLS and SVR models. The Ensemble methods and MLP models have the best accuracy on the data including all variables. Prediction using the technical indicators alone has nearly the same accuracy as using all indicators. In addition, all models applied to the macroeconomic and microeconomic indicators have poor accuracy, with a very low Pearson's r and a high RMSE; their use is therefore not recommended. The order of the indicators according to their impact on prediction is shown in Table 10. The models applied to all attributes and to the technical indicators are compared in Table 11. In both cases, the SVR model outperforms the other models, and MLP is the worst model.

Table 11. The order of the models in terms of accuracy.

All Indicators: SVR, OLS, Ensemble methods, and MLP
Technical Indicators: SVR, OLS, Ensemble methods, and MLP

The category-based comparative analysis showed that the macroeconomic indicators (trade-weighted US dollar index, gold fixing price, DJIA index, Brent crude oil price, and WTI) are not significant predictors of the short-term BTC price. The microeconomic indicators are also not significant except in the MLP model. In addition, the technical indicators, namely volume, MTM, CCI, and SMA, predict the price with nearly the same accuracy as the prediction model using all indicators. Therefore, the recommendation is to use technical analysis to predict the short-term BTC price. These empirical results answer the first and second research questions: (1) What are the significant variables as short-term or long-term BTC price predictors? (2) What are the underlying economic theories of BTC price predictors? To answer the third research question (What machine learning model performs better? What are the best feature selection techniques?), the empirical results showed that the SVR model outperforms the other models in both the feature-based and category-based comparative analyses. In terms of data preparation, no feature selection improved the model, and VIF turned out to be the worst feature selection method.
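Two of the technical indicators found significant here, the simple moving average (SMA) and momentum (MTM), are straightforward to compute from a closing-price series. The sketch below is illustrative; the window lengths used in the study are not reproduced here.

```python
def sma(prices, window):
    """Simple moving average over a fixed window of closing prices."""
    return [sum(prices[i - window + 1:i + 1]) / window
            for i in range(window - 1, len(prices))]

def mtm(prices, lag):
    """Momentum: difference between the current close and the close
    `lag` periods earlier."""
    return [prices[i] - prices[i - lag] for i in range(lag, len(prices))]

# Example closing prices; 3-period SMA and 2-period momentum.
closes = [10, 11, 13, 12, 14, 15]
sma3 = sma(closes, 3)
mtm2 = mtm(closes, 2)
```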

Proposed Comparative Analysis for Dataset 2
According to the theoretical analysis regarding demand and supply theory and cost-based pricing theory, macroeconomic and blockchain information indicators do not have short-term predictive power on BTC prices. For the empirical analysis, a comparative machine learning approach, including OLS, Ensemble methods, SVR, and MLP, is applied to the BTC price prediction problem on datasets from 1 January 2018 to 5 June 2018, including macroeconomic, microeconomic, technical, and blockchain information indicators. Feature selection methods, namely best first search, PSO search, and evolutionary search, are applied to the data. The price prediction model is described in Appendix A (model 3).

Data Description
Daily BTCUSD transactions occurring on the Bitfinex exchange, obtained via the API provided at bitfinex.com from 1 January 2018 to 5 June 2018 and including 17 independent variables, have been examined. The independent variables can be categorized into macroeconomic indicators, extracted from macrotrends.net (accessed on 2 October 2019), microeconomic and technical indicators, and blockchain information indicators, obtained from data.BTCity.org. Table 12 shows the specification for each group. Descriptive statistics, including minimum, maximum, mean, and standard deviation, have been calculated and are shown in Table A6 in Appendix A.3 to summarize the data.

Feature-Based Comparative Analysis
This section applies the comparative analysis to different datasets containing the indicators chosen by different feature selection techniques, including best-first search, evolutionary search, PSO search, and the PCA dimension reduction method. Table A7 in Appendix A.3 presents the different features chosen by the various methods. For the analysis, machine learning models, including OLS, Ensemble methods (bagging), SVR (with a polynomial kernel), and MLP (with one hidden layer and nine neurons), have been applied to the different datasets containing the indicators selected by the various feature selection methods. The aim is to identify the best feature selection method and the best machine learning method. To evaluate the predictive machine learning models' performance and obtain robust results, the 10-fold cross-validation-on-a-rolling-basis evaluation technique is used, and each model is repeated ten times. The average results of the 100 prediction trials, reporting the forecasting ability of the models in terms of RMSE and Pearson's r, are shown in Tables 13 and 14. According to Tables 13 and 14, all models applied to all indicators have better accuracy than those applied to the other datasets; therefore, it can be concluded that no feature selection improves the models' accuracy. All models applied to the data reduced by PCA have the lowest accuracy compared to those applied to the other datasets; therefore, the PCA reduction method is not a promising feature selection method for this research's data. The different models are compared on each data set to identify the best model. Table 15 summarizes the model comparisons, showing that the SVR model has the best accuracy on all datasets and the MLP the worst.
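PCA's leading component, the direction of maximum variance onto which the data are projected, can be extracted with a simple power iteration on the sample covariance matrix. The following is an illustrative pure-Python sketch of the idea only; the study itself used a standard PCA implementation.

```python
import math

def first_principal_component(data, iters=200):
    """Leading eigenvector of the sample covariance matrix, found by
    power iteration: the direction of maximum variance in the data."""
    n, d = len(data), len(data[0])
    # center each column on its mean
    means = [sum(row[j] for row in data) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in data]
    # sample covariance matrix
    cov = [[sum(centered[k][i] * centered[k][j] for k in range(n)) / (n - 1)
            for j in range(d)] for i in range(d)]
    # repeatedly multiply and renormalize a starting vector
    v = [1.0] * d
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v
```

Projecting each observation onto this vector (and onto further components) yields the reduced attribute set the models were then trained on.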

Category-Based Comparative Analysis
OLS, Ensemble methods, SVR, and MLP are applied to the economic and technical indicators. The aim is to see which indicators can be selected as better predictive indicators. In addition, the different models are compared on the same data to find the more accurate model. To evaluate the predictive machine learning models' performance and obtain robust results, the 10-fold cross-validation-on-a-rolling-basis evaluation technique is used, and each model is repeated ten times. The average results of the 100 prediction trials, reporting the forecasting ability of the models in terms of RMSE and Pearson's r, are shown in Tables 16 and 17. According to Tables 16 and 17, all models applied to all indicators have the best accuracy. Therefore, the combination of technical, microeconomic, macroeconomic, and blockchain information indicators works better for price prediction than any indicator category alone. Technical indicators alone are also good predictors; however, prediction improves slightly when they are combined with the other variables.
Blockchain information and macroeconomic indicators are considered bad predictive indicators due to their very low Pearson's r and high RMSE. The order of the indicators according to their impact on prediction is shown in Table 18. The models applied to all indicators and to the technical indicators are compared in Table 19. In both cases, the SVR model outperforms the other models, and MLP is the worst model. The results of the category-based comparative analysis showed that the macroeconomic indicators (trade-weighted US dollar index, gold-fixing price, DJIA index, Brent crude oil price, and WTI) are not significant predictors. The blockchain information indicators, including hash rate, mining difficulty, number of transactions per block, and block time, are likewise not significant predictors of the short-term BTC price. The microeconomic indicators, including trades per minute, bid/ask sum, bid-ask spread, and buy/sell signals, are also not significant for BTC price prediction except in the MLP model. Since the technical indicators yield nearly the same results as all indicators, the recommendation is to use technical analysis to predict the short-term BTC price. These empirical results answer the first and second research questions: (1) What are the significant variables as short-term or long-term BTC price predictors? (2) What are the underlying economic theories of BTC price predictors? To answer the third research question (What machine learning model performs better? What are the best feature selection techniques?), the empirical results showed that the SVR model outperforms the other models in both the feature-based and category-based comparative analyses. In terms of data preparation, no feature selection improved the model, and PCA dimension reduction turned out to be the worst feature selection method.

Conclusions
Today, international finance is a multi-trillion-dollar sector that needs a secure and stable mechanism, toward which cryptocurrencies are currently inching. Cryptocurrencies were developed on top of Blockchain technology. In contrast with traditional central-authority systems, in which sole control lies with one organization, Blockchain technology takes a decentralized approach. This paper applied several machine learning models to the BTC price prediction problem on different data sets to verify the theoretical analysis and answer the research questions. A multilinear regression model applied to monthly BTC prices showed that macroeconomic and blockchain information indicators are significant long-term predictors, verifying that supply and demand theory and cost-based pricing theory underlie the BTC price predictors. These empirical results answer the first and second research questions: (1) What are the significant variables as short-term or long-term BTC price predictors? (2) What are the underlying economic theories of BTC price predictors? In addition, the empirical results showed that SVR is the best machine learning model and that no feature selection technique proved best, which answers the third research questions (Are machine learning algorithms superior to traditional methods for BTC price prediction? What machine learning model performs better? What are the best feature selection techniques?).
The conclusions are relevant to central bankers, investors, asset managers, and others who are generally interested in which indicators provide reliable, accurate forecasts of the BTC price. The study can be used to set asset pricing and improve investment decision-making. Therefore, it provides a significant opportunity to contribute to international finance, since the results have significant implications for the future decisions of asset managers. In time series prediction, the correlation between the independent variables and the dependent variable differs from time to time; consequently, prediction models may need to be re-estimated periodically. This study has used many data categories, comprising macroeconomic, microstructure, blockchain information, and technical indicators, to make the work wide-ranging.
In this study, the attributes are selected based on economic theories. The macroeconomic indicators are chosen based on supply and demand theory. Microstructure theory is the underlying theory of the microeconomic indicators, and the blockchain information indicators are selected according to cost-based pricing theory. Previous studies are mostly empirical research focused on prediction methods. After describing the price movement from the perspective of economic theories, the empirical results confirmed the theoretical analysis. This study compared methodologies to predict short-term and long-term BTC prices. The conclusions are also helpful for machine learning developers in understanding the configuration of machine learning prediction models and using them as benchmarks. According to the literature review, the authors still doubt whether machine learning can beat traditional methods for BTC price prediction; this study provides evidence of the superiority of machine learning. This research has some suggestions for future work. Only a few critical feature selection methods have been applied to the data sets here; many other attribute selection techniques, including ranker search, Tabu search, and more, could be examined to improve the model. Other research can compare trending models, such as recurrent neural networks (RNNs), with SVR. According to this research, a correct prediction of BTC prices can be profitable and can therefore diversify a portfolio. Further studies can examine the portfolio return from adding BTC to a portfolio to determine the right amount of BTC to hold. Future research can also predict other cryptocurrencies, including Ethereum and Ripple. In addition, other indicators, such as news, can be investigated in further studies.

Data Availability Statement: The datasets used and analyzed during this study are available from the corresponding author on reasonable request.

Conflicts of Interest:
The authors declare no conflict of interest.

X_i,21: West Texas Intermediate (WTI) crude oil prices, or Texas light sweet, in month i is a benchmark in oil pricing, refined mainly in the Midwest and Gulf Coast regions of the United States.
X_i,22: The US federal funds rate in month i is the interest rate at which depository institutions trade federal funds with each other overnight. When a depository institution has a surplus in its reserve account, it can lend to other banks that need those funds; in other words, a bank with extra cash can lend it to another bank with a liquidity problem, so that the cash balance of the bank short on cash is replenished quickly.
X_i,23: The breakeven inflation rate in month i is a measure of expected inflation: the difference between the yield of a nominal bond and that of an inflation-linked bond with the same maturity.

'***' Significant at the 0.001 level; '**' significant at the 0.01 level; '*' significant at the 0.05 level; '.' significant at the 0.1 level.