Journal of Risk and Financial Management
  • Article
  • Open Access

10 February 2022

Forecasting the Price of the Cryptocurrency Using Linear and Nonlinear Error Correction Model

1 Statistics Discipline, Division of Science and Mathematics, University of Minnesota-Morris, Morris, MN 56267, USA
2 School of Business and Natural Science, Black Hills State University, Spearfish, SD 57783, USA
3 Department of Finance, Bloomsburg University of Pennsylvania, Bloomsburg, PA 17815, USA
* Author to whom correspondence should be addressed.
This article belongs to the Special Issue Machine Learning Applications in Finance

Abstract

We employed linear and nonlinear error correction models (ECMs) to predict the log returns of Bitcoin (BTC). The linear ECM is the best model for predicting BTC compared to the neural network and autoregressive models in terms of RMSE, MAE, and MAPE. Using a linear ECM, we are able to understand how BTC is affected by other coins. In addition, we performed Granger-causality tests on fourteen cryptocurrencies.

1. Introduction

The Coronavirus Disease 2019 (COVID-19) pandemic made the investment environment more uncertain. Since then, it has been very difficult for investors to forecast financial markets because of the uncertainty surrounding the pandemic. Bitcoin, a decentralized currency in the cryptocurrency market, has emerged as a popular investment asset and is often referred to as the currency of the future. Recently, various error correction models have been applied to the cryptocurrency market. Using an ECM, Liang (2021) found that the relationship between the Bitcoin rate of return and the indicators used to measure monetary function was not significant, which rejects the assumption that Bitcoin can assume a monetary function and indicates that Bitcoin has neither the ability nor the potential to do so. Haffar and Le Fur (2021) analyzed the impact of shocks in the financial markets of emerging and developed countries on the price of Bitcoin using a structural vector error correction model. Keilbar and Zhang (2021) analyzed the role of cointegration relationships within a large system of cryptocurrencies using a vector error correction model (VECM) framework. Szetela et al. (2021) verified the existence of short-term and long-term relationships between the strength of a trend and the volume in bullish and bearish cryptocurrency markets through the application of a VECM to Bitcoin daily data. Giudici and Pagnottoni (2020) investigated return connectedness across eight of the major Bitcoin exchanges, from both a static and a dynamic viewpoint, by extending the order-invariant forecast error variance decomposition proposed by Diebold and Yilmaz (2012) to a generalized vector error correction framework. Using a time-varying VEC model, Chang and Shi (2020) examined the dynamic information shares of the top four cryptocurrencies: Bitcoin, Ethereum, Ripple, and Litecoin.
Kapar and Olmo (2020) proposed an empirical model for analyzing the dynamics of Bitcoin prices by considering a VEC model over two overlapping periods: 2010–2017 and 2010–2019. Their findings provided empirical evidence of a correction in the price of Bitcoin during the period 2018–2019 that was uncorrelated with market fundamentals. Ibrahim et al. (2020) forecasted the Bitcoin closing price using vector autoregression (VAR) and Bayesian vector autoregression (BVAR) prediction models. Experimental results showed that the VAR models achieved better performance compared to the traditional autoregression models and the BVAR models. Hakim das Neves (2020) studied the relationship between the price of virtual currency, the price of Bitcoin, and the number of Google searches that used the terms bitcoin, bitcoin crash, and crisis between December 2012 and February 2018 by using an error correction model. Goczek and Skliarov (2019) aimed to determine what drives the price of Bitcoin and analyzed a large set of data by using VEC models augmented by factors representing unobservable economic forces. Goczek and Skliarov (2019) also found that the main factor driving the price of Bitcoin is its popularity. Wang et al. (2016) performed a cointegration analysis and used a VEC model to demonstrate that there is a relationship between the price of Bitcoin and some variables, including the stock price index, the price of oil, and the daily trading volume of Bitcoin. The short-run analysis of Wang et al. (2016) revealed that the price of oil and the Bitcoin trading volume had little influence on the price of Bitcoin, whereas the stock price index had a relatively larger influence on the price of Bitcoin. Georgoula et al. (2015) used a VECM to investigate the existence of long-term relationships between cointegrated variables. Georgoula et al. (2015) revealed that the price of Bitcoin was positively associated with the number of Bitcoins and negatively associated with the S&P 500 stock market index.
Even though Bitcoin is the most dominant cryptocurrency and was found by Kwapień et al. (2021) to influence other cryptocurrencies, our goal in this research was to build a model to predict Bitcoin (BTC) log returns based on other cryptocurrencies’ prices, because BTC is strongly correlated with other major cryptocurrencies, such as Ethereum (ETH) and Binance Coin (BNB). Recently, Miller and Kim (2021) applied several deep learning time-series models to predict BTC log returns, but there is no standard guideline for selecting the correct deep learning tools, a choice that requires knowledge of the topology, training method, and other parameters. Therefore, we still need a prediction model from which researchers can make statistical inferences on cryptocurrency price data. We propose linear and nonlinear ECM prediction models and compare them with currently available univariate time-series models, including the neural network time-series model.
This paper is organized as follows. Section 2 presents the summary and graphical data analysis for the top fourteen cryptocurrencies. Section 3 gives an overview of the econometric models used in this study. Section 4 compares the proposed methods in terms of error measures, and Section 5 presents the conclusion.

2. Description of Data

The cryptocurrency data used in this study were obtained from the crypto2 R package. The variables for each of the cryptocurrency datasets before manipulation were low, open, time, high, volume from, volume to, conversion type, conversion symbol, and close. The data period studied for each of the fourteen cryptocurrencies was from 1 January 2019 to 27 August 2021. Each variable is calculated as the log return $\log(P_t / P_{t-1})$. Table 1 shows the 14 cryptocurrencies that are used in this paper.
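The log-return transformation above can be sketched in a few lines. This is a minimal illustration in plain Python rather than the R workflow used in the study; the function name `log_returns`, the `as_percent` switch, and the example prices are assumptions made for illustration only.

```python
import math


def log_returns(prices, as_percent=False):
    """Return log(P_t / P_{t-1}) for t = 1, ..., len(prices) - 1."""
    if any(p <= 0 for p in prices):
        raise ValueError("prices must be strictly positive")
    scale = 100.0 if as_percent else 1.0
    # Pair each price with its predecessor and take the log of the ratio.
    return [scale * math.log(curr / prev) for prev, curr in zip(prices, prices[1:])]


# Hypothetical prices, for illustration only (not taken from the study's data).
example_prices = [100.0, 110.0, 99.0, 99.0]
example_returns = log_returns(example_prices)
```

A series of n prices yields n − 1 log returns, and the returns sum to the log of the ratio of the last price to the first, which is one reason log returns are convenient for time-series work.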
Table 1. Variable definitions.
Table 2 shows the summary statistics for each of the cryptocurrency datasets. In terms of the median log returns, Bitcoin (BTC), Ethereum (ETH), Cardano (ADA), Binance Coin (BNB), XRP, Tether (USDC), Bitcoin Cash (BCH), Litecoin (LTC), Chainlink (LINK), Ethereum Classic (ETC), and Stellar (XLM) have positive values. ADA has the highest median log return, but DOGE has the highest mean log return among the 14 cryptocurrencies. The kurtosis values of the log returns of all cryptocurrencies in Table 2 are greater than 3, indicating heavier tails than the normal distribution. BTC, ETH, ADA, BNB, XRP, BCH, LTC, LINK, and ETC are left skewed, while USDT, DOGE, USDC, LUNA, and XLM are right skewed.
Table 2. Summary Statistics.
Table 3 shows Pearson and Kendall correlations between the price log returns of cryptocurrencies. We can notice that (BTC, ETH), (BTC, XRP), (BTC, BCH), (BTC, LTC), (ETH, ADA), (ETH, BNB), (ETH, XRP), (ETH, BCH), (ETH, LTC), (ETH, ETC), (ETH, XLM), (ADA, XRP), (ADA, BCH), (ADA, LTC), (ADA, XLM), (XRP, BCH), (XRP, LTC), (XRP, ETC), (XRP, XLM), (BCH, LTC), (BCH, ETC), (BCH, XLM), (LTC, ETC), and (LTC, XLM) have high Pearson and Kendall correlations, greater than or equal to 0.50.
Table 3. Correlation Tables. Data range is from 1/1/2019 to 8/27/2021. Each variable is calculated as log(Pt/Pt−1) and expressed as a percentage.
Table 4 shows the augmented Dickey–Fuller (ADF) unit root tests for the log returns of the 14 cryptocurrencies. The p-values in Table 4 are smaller than the significance level of 0.05, so the null hypothesis of a unit root is rejected. This means that the log returns of the 14 cryptocurrencies are stationary time-series data. With the tsm and vars R packages, we also performed an alternative unit root test that allows for a structural break in the data (Table 4), and we again rejected the null hypothesis of a unit root with constant and trend at the 5% significance level. We also performed principal component analysis by using the “prcomp” command in the stats R package. Figure 1 shows the 2D and 3D graphical presentations based on the leading two and three principal components. In Figure 1, LUNA, USDT, and USDC are located apart from the other cryptocurrencies, including BTC and ETH.
Table 4. ADF and alternative unit root tests. Data range is from 1/1/2019 to 8/27/2021. Each variable is calculated as log(Pt/Pt−1) and expressed as a percentage.
Figure 1. Graphical display by principal component analysis.

3. Econometrical Methods

In this section, we briefly define the econometric methods that we used in this paper. We first examine causality in mean by using linear Granger causality (Granger 1969) in a vector autoregressive (VAR) system to explore informational linkages between pairs of markets. Given any pair of stationary series ($X_t$ and $Y_t$), variable $X_t$ Granger-causes $Y_t$ linearly, provided that lags of $X_t$ offer significant information for explaining the current values of $Y_t$. The bivariate Granger causality is specified in a VAR system as follows:
$$X_t = \varphi_1 + \sum_{i=1}^{k} a_{1i} X_{t-i} + \sum_{j=1}^{k} b_{1j} Y_{t-j} + v_{1t}$$
and
$$Y_t = \varphi_2 + \sum_{i=1}^{k} a_{2i} X_{t-i} + \sum_{j=1}^{k} b_{2j} Y_{t-j} + v_{2t}$$
where $\varphi_1$ and $\varphi_2$ are the constant terms of the system of equations; $a$ and $b$ denote estimated coefficients; $k$ is the optimal lag length based on the Akaike information criterion (AIC); and $v_{1t}$ and $v_{2t}$ represent residuals from the VAR model. The general format of an error correction model (ECM) is:
$$\Delta y_t = \beta_0 + \beta_1 \Delta x_{1,t} + \dots + \beta_i \Delta x_{i,t} + \gamma \left( y_{t-1} - \left( \alpha_1 x_{1,t-1} + \dots + \alpha_i x_{i,t-1} \right) \right)$$
The ECM function of the R Package ‘ecm’ in Bansal (2021) modifies the equation to the following:
$$\Delta y_t = \beta_0 + \beta_1 \Delta x_{1,t} + \dots + \beta_i \Delta x_{i,t} + \gamma y_{t-1} + \gamma_1 x_{1,t-1} + \dots + \gamma_i x_{i,t-1}$$
where $\gamma_i = -\gamma \alpha_i$, so it can be modeled as a simpler ordinary least squares (OLS) function using R’s lm function.
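The reduction of the ECM to an ordinary regression can be illustrated with a small sketch. The Python code below is not the ‘ecm’ package and not the study's code: it fits a single-regressor ECM by OLS through the normal equations on synthetic, noise-free data. It assumes the equilibrium term enters as $\gamma (y_{t-1} - \alpha x_{t-1})$, so the long-run coefficient is recovered as $\alpha = -\gamma_1 / \gamma$. All function names and parameter values are hypothetical.

```python
import random


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def ols(rows, targets):
    """Least squares coefficients from the normal equations (X'X) b = X'y."""
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(p)]
    return solve_linear_system(xtx, xty)


def fit_ecm(y, x):
    """Regress dy_t on [1, dx_t, y_{t-1}, x_{t-1}] for one regressor x."""
    rows, targets = [], []
    for t in range(1, len(y)):
        rows.append([1.0, x[t] - x[t - 1], y[t - 1], x[t - 1]])
        targets.append(y[t] - y[t - 1])
    beta0, beta1, gamma, gamma1 = ols(rows, targets)
    # gamma1 = -gamma * alpha, so the long-run coefficient is -gamma1 / gamma.
    return {"beta0": beta0, "beta1": beta1, "gamma": gamma,
            "gamma1": gamma1, "alpha": -gamma1 / gamma}


# Synthetic, noise-free data generated from hypothetical ECM parameters.
random.seed(0)
TRUE = {"beta0": 0.1, "beta1": 0.5, "gamma": -0.3, "alpha": 2.0}
x = [0.0]
for _ in range(200):
    x.append(x[-1] + random.gauss(0.0, 1.0))  # random-walk regressor
y = [1.0]
for t in range(1, len(x)):
    dy = (TRUE["beta0"] + TRUE["beta1"] * (x[t] - x[t - 1])
          + TRUE["gamma"] * (y[t - 1] - TRUE["alpha"] * x[t - 1]))
    y.append(y[-1] + dy)

est = fit_ecm(y, x)
```

Because the synthetic data contain no noise, the fitted coefficients reproduce the generating values up to floating-point error. With real data the same regression would give estimates with sampling uncertainty, and a dedicated linear-algebra routine would be preferable to hand-written normal equations.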
By default, R’s base ‘lm’ is used to fit the model. However, researchers can opt to use ‘earth’, which uses Jerome Friedman’s multivariate adaptive regression splines (MARS) to build a nonlinear regression model, which transforms each continuous variable into piecewise linear hinge functions. This allows for non-linear features in both the transient and equilibrium terms. ECM models are used for time-series data.
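The piecewise linear hinge functions mentioned above have a simple form. The sketch below, in plain Python, only evaluates a mirrored pair of hinge functions at a given knot and combines them linearly. It does not perform the knot search or pruning that a MARS implementation such as ‘earth’ carries out, and the function names, knot, and coefficients are hypothetical.

```python
def hinge_pair(x, knot):
    """Return the mirrored hinge values (max(0, x - knot), max(0, knot - x))."""
    return max(0.0, x - knot), max(0.0, knot - x)


def piecewise_linear(x, intercept, knot, slope_right, slope_left):
    """Evaluate intercept + slope_right * max(0, x - knot) + slope_left * max(0, knot - x)."""
    right, left = hinge_pair(x, knot)
    return intercept + slope_right * right + slope_left * left


# Hypothetical coefficients: the response changes slope at the knot x = 1.
values = [piecewise_linear(v, 0.5, 1.0, 2.0, -1.0) for v in (0.0, 1.0, 3.0)]
```

Each hinge is zero on one side of the knot and linear on the other, so a sum of such terms gives a continuous function whose slope can differ between regions.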
To forecast the log-returns of BTC, we used the Hyndman et al. (2021) “forecast” R package for employing univariate time-series models, such as autoregressive integrated moving average (ARIMA) model, exponential smoothing state space (ETS) model, autoregressive fractional integrated moving average (ARFIMA) model, BATS model (exponential smoothing state space model with Box-Cox transformation, ARMA errors, trend and seasonal components), TBATS, which is a modification of BATS that allows for multiple non-integer seasonality cycles, and neural network autoregressive (NNAR) model, which is a feed-forward neural network with a single hidden layer and lagged inputs for forecasting univariate time series. We also used a hybrid univariate time-series model through the hybridModel function in “forecastHybrid” R package from Shaub and Ellis (2020). The hybridModel function fits multiple individual model specifications to allow the easy creation of ensemble forecasts. With our data, the automated selected model from hybridModel function is (ETS, NNAR, THETAM, TBATS). THETAM fits an exponential smoothing state space model with an artificial neural network to the target variable, having first performed classic multiplicative seasonal adjustment. These two “forecast” and “forecastHybrid” R packages automatically select the best model in each time-series model based on the AIC model selection method.

4. Data Analysis

In this section, we examine the Granger-causality tests and compare the forecasting methods. Table 5 shows the Granger-causality test results at the 5% significance level: at lag order 1, the variables that Granger-cause BTC are ADA, DOGE, ETC, and XLM; the variables that Granger-cause ETH are BTC, ADA, DOGE, ETC, and XLM; and the variables that Granger-cause XRP are BTC and XLM.
Table 5. Granger Causality Test.
To forecast the log returns of BTC, we divide the 968 total observations into training data (80%, 774 observations) and test data (20%, 194 observations). To compare the accuracy of the univariate time series models, we employ three measures.
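The split sizes above follow from truncating 80% of 968 to an integer. The sketch below assumes the split is chronological (earliest observations for training, latest for testing), which is the usual practice for time series but is an assumption here; the function name and placeholder data are hypothetical.

```python
def chronological_split(series, train_fraction=0.8):
    """Split a series into leading training and trailing test segments."""
    n_train = int(len(series) * train_fraction)  # truncate toward zero
    return series[:n_train], series[n_train:]


# Placeholder index values standing in for the 968 observations.
observations = list(range(968))
train, test = chronological_split(observations)
```

Keeping the test segment strictly after the training segment avoids using future information when fitting the models.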
Root mean square (prediction) error (RMSE):
$$RMSE = \sqrt{\frac{\sum_{t=1}^{n} (y_t - \hat{y}_t)^2}{n}}$$
Mean absolute error (MAE):
$$MAE = \frac{\sum_{t=1}^{n} |y_t - \hat{y}_t|}{n}$$
Weighted mean absolute percentage error (WMAPE):
$$WMAPE = \frac{\sum_{t=1}^{n} |y_t - \hat{y}_t|}{\sum_{t=1}^{n} |y_t|}$$
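The three measures can be computed directly from their definitions. The following plain Python sketch is for illustration and is not the code used in the study; the function names and the example values are hypothetical.

```python
import math


def _check(actual, predicted):
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")


def rmse(actual, predicted):
    """Square root of the mean squared forecast error."""
    _check(actual, predicted)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mae(actual, predicted):
    """Mean of the absolute forecast errors."""
    _check(actual, predicted)
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def wmape(actual, predicted):
    """Sum of absolute errors divided by the sum of absolute actual values."""
    _check(actual, predicted)
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / sum(abs(a) for a in actual)


# Hypothetical actual and predicted values, for illustration only.
actual = [1.0, -2.0, 3.0, -4.0]
predicted = [1.5, -1.0, 2.0, -4.5]
scores = {"RMSE": rmse(actual, predicted),
          "MAE": mae(actual, predicted),
          "WMAPE": wmape(actual, predicted)}
```

WMAPE divides by the total absolute actual value rather than by each individual value, which keeps it defined when single log returns are zero or close to zero.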
Error metrics such as the RMSE, MAE, and WMAPE are used to analyze the performance of the methods. MAE is less sensitive to outliers than RMSE because it weights all errors linearly, whereas RMSE squares the errors and therefore penalizes large deviations more heavily. RMSE reflects both the bias and the variance of the forecast errors and is expressed in the same units as the data. Model 1 is a linear ECM of BTC with 13 other cryptocurrencies; the summary of estimates is shown in Table 6. The R-squared of model 1 was 0.883. Model 2 is a nonlinear MARS-based ECM of BTC with 13 other cryptocurrencies; the summary of estimates is shown in Table 7. The R-squared of the nonlinear ECM (model 2) was 0.888. Table 8 shows the measures of accuracy of forecasting BTC. Among the eight models compared, model 1 has the smallest RMSE, MAE, and WMAPE. Thus, our proposed ECM prediction of BTC price log returns performed better than the univariate time-series models, including the neural network time-series model. Therefore, our ECM prediction model can help cryptocurrency market investors to identify threats to capital and earnings arising from uncertain and unexpected financial volatility. Financial policy committees in each country can reduce the difficulty of making future financial decisions by using the predicted cryptocurrency price log returns from our prediction model.
Table 6. Summary for model 1 with training data.
Table 7. Summary for model 2 with training data.
Table 8. Measures of accuracy of forecasting BTC.

5. Conclusions

By comparing linear and nonlinear ECMs with six different univariate time series models, such as neural network and autoregressive models, for predicting the price log returns of cryptocurrencies based on their previous values and relationships with each other, a better understanding can be achieved of whether these models can be used to predict the log returns of BTC. We found that the linear ECM was the best model compared to the other machine learning univariate time-series models. The linear ECM can be used to predict the future log returns of each cryptocurrency from highly correlated cryptocurrencies.

Author Contributions

Conceptualization, J.-M.K.; methodology, C.J. and J.-M.K.; software, C.C. and J.-M.K.; validation, C.C. and J.-M.K.; formal analysis, C.C., C.J. and J.-M.K.; investigation, C.J.; resources, C.J. and J.-M.K.; data curation, C.J., and J.-M.K.; writing—original draft preparation, C.C., C.J., and J.-M.K.; writing—review and editing, C.C. and J.-M.K.; visualization, C.C., and J.-M.K.; supervision, C.C. and J.-M.K.; project administration, J.-M.K. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Bansal, Gaurav. 2021. Build Error Correction Models. R Package ‘ecm’. Vienna: Foundation for Statistical Computing.
  2. Chang, Le, and Yanlin Shi. 2020. Does Bitcoin dominate the price discovery of the Cryptocurrencies market? A time-varying information share analysis. Operations Research Letters 48: 641–45.
  3. Diebold, Francis X., and Kamil Yilmaz. 2012. Better to give than to receive: Predictive directional measurement of volatility spillovers. International Journal of Forecasting 28: 57–66.
  4. Georgoula, Ifigeneia, Demitrios Pournarakis, Christos Bilanakos, Dionisios Sotiropoulos, and George M. Giaglis. 2015. Using Time-Series and Sentiment Analysis to Detect the Determinants of Bitcoin Prices. Paper presented at the 2015 Mediterranean Conference on Information Systems, Samos, Greece, October 3–5.
  5. Giudici, Paolo, and Paolo Pagnottoni. 2020. Vector error correction models to measure connectedness of Bitcoin exchange markets. Applied Stochastic Models in Business and Industry 36: 95–109.
  6. Goczek, Łukasz, and Ivan Skliarov. 2019. What drives the Bitcoin price? A factor augmented error correction mechanism investigation. Applied Economics 51: 6393–410.
  7. Granger, Clive W. J. 1969. Investigating Causal Relations by Econometric Models and Cross-Spectral Methods. Econometrica 37: 424–38.
  8. Haffar, Adlane, and Eric Le Fur. 2021. Structural vector error correction modelling of Bitcoin price. The Quarterly Review of Economics and Finance 80: 170–78.
  9. Hakim das Neves, Rodrigo. 2020. Bitcoin pricing: Impact of attractiveness variables. Financial Innovation 6: 1–18.
  10. Hyndman, Rob, George Athanasopoulos, Christoph Bergmeir, Gabriel Caceres, Leanne Chhay, Mitchell O’Hara-Wild, Fotios Petropoulos, Slava Razbash, Earo Wang, Farah Yasmeen, et al. 2021. Forecast: Forecasting Functions for Time Series and Linear Models. Vienna: Foundation for Statistical Computing.
  11. Ibrahim, Ahmed, Rasha Kashef, Menglu Li, Esteban Valencia, and Eric Huang. 2020. Bitcoin Network Mechanics: Forecasting the BTC Closing Price Using Vector Auto-Regression Models Based on Endogenous and Exogenous Feature Variables. Journal of Risk and Financial Management 13: 189.
  12. Kapar, Burcu, and Jose Olmo. 2020. Analysis of Bitcoin prices using market and sentiment variables. The World Economy 44: 45–63.
  13. Keilbar, Georg, and Yanfen Zhang. 2021. On cointegration and cryptocurrency dynamics. Digital Finance 3: 1–23.
  14. Kwapień, Jarosław, Marcin Wątorek, and Stanisław Drożdż. 2021. Cryptocurrency Market Consolidation in 2020–2021. Entropy 23: 1674.
  15. Liang, Huan. 2021. Application of Error Correction model with monetary function factors in the return of bitcoin. Journal of Physics: Conference Series 1941: 012058.
  16. Miller, Dante, and Jong-Min Kim. 2021. Univariate and Multivariate Machine Learning Forecasting Models on the Price Returns of Cryptocurrencies. Journal of Risk and Financial Management 14: 486.
  17. Shaub, David, and Peter Ellis. 2020. Convenient Functions for Ensemble Time Series Forecasts. R Package ‘forecastHybrid’. Vienna: Foundation for Statistical Computing.
  18. Szetela, Beata, Grzegorz Mentel, Yuriy Bilan, and Urszula Mentel. 2021. The relationship between trend and volume on the bitcoin market. Eurasian Economic Review 11: 25–42.
  19. Wang, Junpeng, Yubo Xue, and Minghao Liu. 2016. An Analysis of Bitcoin Price Based on VEC Model. Paper presented at the 2016 International Conference on Economics and Management Innovations, Beijing, China, July 9–10.
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
