Forecasting of Bitcoin Illiquidity Using High-Dimensional and Textual Features
Abstract
1. Introduction
- Providing an approach to calculate the illiquidity label;
- Examining data split policies for illiquidity prediction;
- Investigating the impact of deep learning approaches in predicting illiquidity;
- Presenting a hybrid approach to forecast illiquidity.
- Collection of hash rate and historical Bitcoin data;
- Data preprocessing (dividing data into different intervals, applying different indicators, imputing missing values, removing outliers and labeling data);
- Applying the RNN network to predict illiquidity.
- Bitcoin trades continuously in a 24 h daily cycle, which means there is no daily close price.
- There are no prior case studies on predicting illiquidity, so, even in the best case, comparisons with previous approaches cannot be made.
- Price features alone are not sufficient to predict illiquidity, so richer features are needed.
- Given the continuous price of Bitcoin (the lack of daily closes), is there a way to calculate its illiquidity?
- Are deep learning approaches suitable for predicting illiquidity?
- Do the features of the eight Bitcoin rates provide a good basis for predicting illiquidity?
2. Related Work
- Description of the market structure (see the research in [24]): In this article, the authors examine Bitcoin investments by estimating transaction costs and daily trading patterns of the BTC–USD exchange rate. They found that implicit transaction costs are low and that the scale of trades involved is smaller than in major global markets, while depth is sufficient for midterm trades. Bitcoin shows a distinct intraday pattern, with significant trading throughout the day. Transaction volume correlates positively with volatility and negatively with capital expansion. Overall, their results show that Bitcoin is particularly investable for retail transactions.
- The relationship between liquidity and volatility [25,26]: In [25], the liquidity of 456 different digital currencies was examined, and it was shown that the predictability of returns decreases in digital currencies with high market liquidity. It was also shown that while Bitcoin returns show signs of efficiency, cryptocurrency returns are autocorrelated and non-independent. Their findings also show a solid cross-sectional relationship between panic strength and liquidity. They therefore concluded that liquidity plays a vital role in market efficiency and in the predictability of returns of new digital currencies. In [26], it was investigated whether the volatility and liquidity of digital currencies are related to each other. The data sample included 12 digital currencies with high trading capital, and liquidity was considered at daily and weekly horizons. To investigate the dependence between digital currencies, the authors used a causality approach, applying the asymmetric causality test to separate the effects of volatility increases and decreases on changes in liquidity, and vice versa. Overall, the empirical results show that high volatility is a Granger cause of high liquidity, which means that high volatility attracts investors and induces more interest in new financial instruments. The Granger causality test, a statistical hypothesis test of whether one time series helps predict another, was first proposed in 1969. Typically, regression reflects “pure” correlations, but Clive Granger argued that causality in economics could be tested by measuring the ability to predict future values of one time series using previous values of another.
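The Granger idea described above — past values of one series improving the prediction of another beyond its own past — can be illustrated with a small F-test sketch. The implementation and the toy series below are ours for illustration, not the asymmetric causality test used in [26]:

```python
import numpy as np

def granger_f_test(x, y, lags=2):
    """Granger-style F-test: do past values of x help predict y beyond
    y's own past? Compares a restricted AR model of y with an
    unrestricted model that adds lags of x. Illustrative sketch only."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    rows = n - lags
    Y = y[lags:]
    own = np.column_stack([y[lags - k:n - k] for k in range(1, lags + 1)])
    cross = np.column_stack([x[lags - k:n - k] for k in range(1, lags + 1)])
    ones = np.ones((rows, 1))
    X_r = np.hstack([ones, own])          # restricted: y's own past only
    X_u = np.hstack([ones, own, cross])   # unrestricted: plus x's past

    def rss(X):
        beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
        return float(np.sum((Y - X @ beta) ** 2))

    rss_r, rss_u = rss(X_r), rss(X_u)
    df_den = rows - X_u.shape[1]
    return ((rss_r - rss_u) / lags) / (rss_u / df_den)

# Toy pair in which x leads y by one step, so x should Granger-cause y.
rng = np.random.default_rng(0)
x = rng.normal(size=300)
y = np.roll(x, 1) + 0.1 * rng.normal(size=300)
```

A large F-statistic for the x-to-y direction and a small one for the reverse direction reproduces the one-way causality pattern the test is designed to detect.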
- Liquidity [27,28]: In Ref. [27], the authors analyzed the liquidity of four digital currencies in four major trading venues over four years. They estimated the Abdi–Ranaldo spread estimator from the hourly transaction data and compared the liquidity of cryptocurrencies and exchanges. In order to identify the drivers of digital currency liquidity, they analyzed a comprehensive set of explanatory variables from general financial markets and global digital currency markets, as well as specific variables of each currency–currency pair. They concluded that the volatility of digital currency returns, the volume of dollar transactions, and the number of transactions are the most critical determinants of liquidity. Simultaneously, it is noted that conventional financial market variables exhibit a limited explanatory capability. Within the analysis of the four cryptocurrencies (Bitcoin, Ethereum, Litecoin, and Ripple), Bitcoin stands out as the most liquid, while among the four examined exchanges, Coinbase Pro claims the highest liquidity. Regression analysis findings suggest that cryptocurrency liquidity is mostly independent of broader financial markets, including stocks and foreign exchange (FX). Instead, it predominantly relies on variables unique to digital currencies. In a complementary investigation [28], the authors explore the dynamic changes in Bitcoin liquidity and the factors influencing it.
- 4. The paper titled “How to gauge liquidity in the digital currency market” [27] explores the effectiveness of liquidity measures derived from low-frequency transactions in capturing real-time (high-frequency) liquidity dynamics. Noteworthy among these measures are the estimators proposed by Corwin and Schultz [29] and Abdi and Ranaldo [30], both proving adept at describing time series changes across various observation frequencies, transaction venues, high-frequency liquidity measures, and digital currencies. These measures exhibit robust performance in periods of both high and low returns, volatility, and trading volume. In contrast, Kyle and Obizhaeva’s [31] estimator and Amihud’s [32] liquidity ratio excel at estimating liquidity levels and reliably identifying differences in liquidity between trading venues. The findings underscore the absence of a universally superior measure while confirming the effectiveness of certain low-frequency measures.
- (a) QS (percentage quoted spread): for interval t, it is defined as $QS_t = (A_t - B_t)/M_t$, where $A_t$ and $B_t$ are the best ask and bid quotes and $M_t = (A_t + B_t)/2$ is the quote midpoint;
- (b) ES (percentage effective spread): for interval t, it is defined as $ES_t = 2D_j(P_j - M_t)/M_t$, where $P_j$ refers to the first transaction after the order book snapshot was recorded and $D_j$ is a trade indicator variable (+1 for buyer-initiated and −1 for seller-initiated trades);
- (c) PI (percentage price impact): for interval t, defined as $PI_t = 2D_j(M_{t+1} - M_t)/M_t$, where $M_{t+1}$ is the quote midpoint from the next order book snapshot;
- (d) AvgD (average BBO depth): depth for interval t, equal to the average U.S. dollar depth available at the best bid and offer;
- (e) DV (U.S. dollar volume): for interval t, $DV_t = \sum_j P_j B_j$, where $B_j$ is the amount of bitcoins traded in transaction j;
- (f) numTX (number of transactions in interval t);
- (g) OI (order imbalance): for interval t, this measure is equal to $OI_t = |N_t^{buy} - N_t^{sell}|/(N_t^{buy} + N_t^{sell})$, the normalized imbalance between the numbers of buyer- and seller-initiated trades;
- (h) OIV (order imbalance volume): for interval t, this measure is defined analogously to OI but using buy and sell volumes, $OIV_t = |V_t^{buy} - V_t^{sell}|/(V_t^{buy} + V_t^{sell})$;
- (i) CRT (percentage cost of a round trade): the relative cost of immediately buying and then selling a given quantity against the standing order book.
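Several of the spread- and volume-type measures above reduce to one-line formulas. The following sketch uses standard microstructure definitions with variable names of our choosing; the depth and round-trip measures, which need full order book data, are omitted:

```python
def quoted_spread(ask, bid):
    """Percentage quoted spread: (A - B) / M, with M the quote midpoint."""
    mid = (ask + bid) / 2.0
    return (ask - bid) / mid

def effective_spread(price, mid, direction):
    """Percentage effective spread: 2 * D * (P - M) / M,
    D = +1 for buyer-initiated, -1 for seller-initiated trades."""
    return 2.0 * direction * (price - mid) / mid

def price_impact(mid_now, mid_next, direction):
    """Percentage price impact: 2 * D * (M_next - M_now) / M_now."""
    return 2.0 * direction * (mid_next - mid_now) / mid_now

def dollar_volume(trades):
    """U.S. dollar volume: sum of price * BTC amount over trades in t."""
    return sum(p * q for p, q in trades)

def order_imbalance(buy_side, sell_side):
    """Order imbalance: |buys - sells| / (buys + sells)."""
    return abs(buy_side - sell_side) / (buy_side + sell_side)
```

For example, a 99 bid against a 101 ask gives a 2% quoted spread, and a buy executed at 100.5 against a midpoint of 100 gives a 1% effective spread.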
- 5. Liquidity prediction: None of the above research has attempted to explain or predict liquidity in the digital currency market. In addition, considering the complexity of its market microstructure, the authors in [36] argue that non-parametric models are better suited to predicting it. They found that the k-nearest neighbor (KNN) approach is more suitable for predicting cryptocurrency market liquidity than a classical linear model such as the autoregressive moving average (ARMA). Their sample covers units as diverse as the Canadian dollar, British pound, Ethereum, Australian dollar, Euro, Japanese yen, Danish krone, Mexican peso, South African rand, Swedish krona, Norwegian krone, Swiss franc, New Zealand dollar, Bitcoin, Taiwanese dollar, Brazilian real, Ripple, Singaporean dollar, and South Korean won. They compared short-term market liquidity forecasts of major cryptocurrencies and fiat currencies using classical time series models such as ARMA and GARCH and a non-parametric machine learning algorithm, the KNN approach. Owing to the nonlinearity and complexity of market liquidity and its market microstructure, the KNN algorithm predicted cryptocurrency and fiat rates better than the ARMA and GARCH models.
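A k-nearest-neighbour liquidity forecast in the spirit of [36] can be sketched as follows; the window length, k, and the plain averaging rule are illustrative assumptions, not the configuration used in that study:

```python
import numpy as np

def knn_forecast(series, k=5, lags=3):
    """One-step-ahead forecast with k-nearest neighbours.

    Every past window of `lags` observations is a candidate pattern
    whose target is the observation that followed it; the forecast
    averages the targets of the k windows closest to the most
    recent window."""
    s = np.asarray(series, dtype=float)
    X = np.array([s[i:i + lags] for i in range(len(s) - lags)])
    y = s[lags:]
    query = s[-lags:]                        # the most recent window
    dists = np.linalg.norm(X - query, axis=1)
    nearest = np.argsort(dists)[:k]
    return float(y[nearest].mean())
```

On a perfectly periodic series the nearest windows are exact matches, so the forecast reproduces the next value of the cycle; on real liquidity series the averaging smooths over noisy neighbours.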
3. Research Methodology
- Simple RNN: A simple recurrent neural network (sRNN) can be viewed as a single-layer recurrent neural network in which the activation is delayed and fed back together with the external input (or the previous layer’s output). Mathematically, an sRNN is expressed as [41]: $h_t = \sigma(W x_t + U h_{t-1} + b)$, where $x_t$ is the input at time t, $h_{t-1}$ is the previous hidden state, $W$ and $U$ are weight matrices, $b$ is a bias vector, and $\sigma$ is an elementwise activation function (typically tanh).
- Gated recurrent unit (GRU): Gated recurrent neural networks (gated RNNs) have been successfully applied in several sequential or temporal data applications, for example, speech recognition, music synthesis, natural language processing, machine translation, and medical and biomedical applications. Long short-term memory (LSTM) RNNs and the subsequently introduced gated recurrent unit (GRU) RNNs have performed reasonably well on long-sequence problems. The GRU reduces the gate signals from three in the LSTM architecture to two: the update gate $z_t$ and the reset gate $r_t$. The GRU model was first presented in its original form in [43], expressed as follows: $z_t = \sigma(W_z x_t + U_z h_{t-1})$, $r_t = \sigma(W_r x_t + U_r h_{t-1})$, $\tilde{h}_t = \tanh(W x_t + U(r_t \odot h_{t-1}))$, and $h_t = z_t \odot h_{t-1} + (1 - z_t) \odot \tilde{h}_t$.
- Independent recurrent neural network (IndRNN): IndRNN was proposed in [44] as a basic RNN building block, defined as $h_t = \sigma(W x_t + u \odot h_{t-1} + b)$, where the recurrent weight $u$ is a vector applied with the elementwise (Hadamard) product, so that each neuron has an independent recurrent connection.
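The three recurrent cells just described can be sketched in NumPy with their standard textbook update rules. The parameter names, the GRU gate convention, and the ReLU activation for IndRNN are illustrative assumptions, not the paper’s exact configuration:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def srnn_step(x, h, W, U, b):
    """Simple RNN: h_t = tanh(W x_t + U h_{t-1} + b)."""
    return np.tanh(W @ x + U @ h + b)

def gru_step(x, h, p):
    """GRU cell (original update-gate convention):
    z = sigmoid(Wz x + Uz h), r = sigmoid(Wr x + Ur h),
    h~ = tanh(W x + U (r * h)), h_t = z * h + (1 - z) * h~."""
    z = sigmoid(p["Wz"] @ x + p["Uz"] @ h)
    r = sigmoid(p["Wr"] @ x + p["Ur"] @ h)
    h_tilde = np.tanh(p["W"] @ x + p["U"] @ (r * h))
    return z * h + (1 - z) * h_tilde

def indrnn_step(x, h, W, u, b):
    """IndRNN: h_t = relu(W x_t + u * h_{t-1} + b); the recurrent
    weight u is elementwise, so each neuron recurs independently."""
    return np.maximum(0.0, W @ x + u * h + b)
```

The elementwise recurrence of `indrnn_step` (a vector `u` instead of a matrix `U`) is what lets IndRNN stack deeper and reach longer sequences than the fully connected recurrence of the sRNN.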
4. Data Collection
4.1. Hash Rate Data Collection (Feature Vector)
4.2. Computational Linguistics Data Collection (Linguistic Vector)
- Numbers of words and sentences: The number of words in tweets is distributed over a broad spectrum, which shows that some fake tweets have very few words and some have many. Word count is only a simple view for analyzing tweets. In addition, real tweets have more sentences on average than fake tweets. These two counts form a 2-dimensional vector containing the daily averages over tweets.
- Question marks, exclamation marks, and capital letters: Considering the text of the tweets, it can be concluded that spam tweets contain more punctuation marks than real tweets. Real tweets have fewer question marks than spam tweets, possibly because spam tweets contain many rhetorical questions, which are used to emphasize ideas and consciously intensify emotions. These three counts form a 3-dimensional vector of daily averages over tweets.
- Psychological perspective: From a psychological perspective, we also examine the use of first-person pronouns (e.g., I, we, and me) in real and fake tweets. Deceptive people often use language that minimizes self-reference, tending to avoid personal pronouns such as “we” and “I”. On average, fake tweets contain fewer first-person pronouns. We define the vector extracted from this step as Ph, which contains the daily average over tweets.
- Sentiment analysis: The TextBlob (https://textblob.readthedocs.io/en/dev/ (accessed on 18 September 2023)) library was used for sentiment analysis. TextBlob is a Python (2 and 3) library for processing textual data. It provides a simple API for common natural language processing (NLP) tasks such as part-of-speech tagging, noun phrase extraction, sentiment analysis, classification, translation, and more. The library is built on the core of NLTK (Natural Language Toolkit); its input is a single sentence, and its outputs are polarity and subjectivity. The polarity score ranges from −1 to 1, where −1 indicates the most negative words, such as “disgusting”, “awful”, and “pathetic”, and 1 indicates the most positive words, such as “excellent” and “best”. The subjectivity score ranges from 0 to 1 and reflects the amount of personal opinion in the text: if a sentence has high subjectivity, i.e., close to 1, the text contains personal opinion rather than factual information. We call the 2-dimensional vector extracted from this step Se; it contains the daily averages over tweets.
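As an illustration of how the daily linguistic vectors can be assembled, here is a minimal pure-Python sketch. The tokenisation, pronoun list, and feature names are simplifying assumptions of ours, and the TextBlob-based polarity/subjectivity features are omitted:

```python
import re
from statistics import mean

FIRST_PERSON = {"i", "we", "me", "my", "our", "us"}

def tweet_features(text):
    """Per-tweet linguistic counts used to build the daily vectors
    (word/sentence counts, punctuation counts, first-person pronouns)."""
    words = re.findall(r"[A-Za-z']+", text)
    return {
        "words": len(words),
        "sentences": max(1, len(re.findall(r"[.!?]+", text))),
        "question_marks": text.count("?"),
        "exclamation_marks": text.count("!"),
        "capitals": sum(1 for c in text if c.isupper()),
        "first_person": sum(1 for w in words if w.lower() in FIRST_PERSON),
    }

def daily_vector(tweets):
    """Average each counter over all tweets of one day."""
    feats = [tweet_features(t) for t in tweets]
    return {k: mean(f[k] for f in feats) for k in feats[0]}
```

For example, `daily_vector(["I love BTC!", "Why? Because we do."])` averages each counter over the two tweets, yielding one fixed-length daily record that can be concatenated with the price-based feature vector.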
4.3. Illiquidity Label
5. Results
- RNN (feature vector and word/sentence counts): The input of this model is all the extracted indicator features plus the features for the number of words and sentences of tweets on that day. The feature space of this model is the union of the feature vector and the 2-dimensional word/sentence-count vector.
- RNN (feature vector and punctuation counts): The input of this model is all the extracted indicator features plus the question mark, exclamation mark, and capital letter counts of tweets on that day. The feature space of this model is the union of the feature vector and this 3-dimensional vector.
- RNN (feature vector and Ph): In this model, the features of the feature vector and Ph are used as input features. The feature space of this model is the union of the two.
- RNN (feature vector and SE): In this model, the features of the feature vector and SE are used as input features. The feature space of this model is the union of the two.
- RNN (all features): This model incorporates all features as inputs, encompassing the entire feature space of the previously considered combinations.
- Split validation: This method involves dividing the dataset into training and test groups, with the training set typically larger than the test set. The training dataset is utilized for training a machine learning model, while the test dataset evaluates the trained model. Both datasets feature a label attribute containing the prediction column indicating the degree of illiquidity.
- Cross-validation: In this method, the dataset is partitioned into N groups, with each group serving as the test set in turn while the remaining groups are used for training; the final outcome is the average of the per-group results. Although cross-validation is more demanding, it typically produces more dependable results. However, caution is warranted in this study, because cross-validation forces the model to predict Bitcoin’s early illiquidity from later data. Examining Figure 4, the Bitcoin price chart shows a significant historical price surge, accompanied by increased volatility in recent years. Forecasting the initial years incurs little error because of this late rise, and since these early years outnumber the recent high-price years, averaging the errors across all cross-validation groups masks the inaccurate forecasts of later years and substantially reduces the final mean error. This phenomenon gives the illusion of effective prediction by the machine learning algorithm. Consequently, split validation is considered more reliable for predicting Bitcoin’s illiquidity, since it prioritizes anticipating the cryptocurrency’s future trajectory over refitting its early history.
- Linear: This approach preserves the order of the records based on the original dataset. For example, suppose the split ratio is 80% and 20% for the training and testing datasets. In that case, the training dataset will be the first 80% of the initial dataset, and the test dataset will be the last 20%.
- Random: This strategy involves the random selection of unique records from the original dataset, while ensuring the distribution ratio of label features is maintained in both the training and testing datasets.
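The two split policies can be sketched as follows; the `train_ratio` and seed values are illustrative, and the label-distribution balancing of the random policy is omitted for brevity:

```python
import random

def linear_split(records, train_ratio=0.8):
    """Linear policy: the first 80% of records (in chronological
    order) form the training set, the last 20% the test set."""
    cut = int(len(records) * train_ratio)
    return records[:cut], records[cut:]

def random_split(records, train_ratio=0.8, seed=42):
    """Random policy: shuffle the records before splitting, so
    training and test examples are drawn from the whole period."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]
```

The linear policy is the stricter test of forecasting ability: the model never sees any record that lies after the cut-off date, whereas the random policy leaks late-period patterns into training.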
- Artificial neural network (ANN): The neural network considered in the study is characterized by specific hyperparameters: optimizer (Adam), hidden layers (2 layers with 128 neurons each), learning rate (0.08), epochs (5000), batch size (64), activation function (ReLU), and loss function (logcosh). The original article applies this network in both regression and classification modes; for our purposes, we specifically employed the regression mode to predict illiquidity.
- Stacked artificial neural network (SANN): In this approach, five ANN networks were considered with the settings mentioned in the ANN approach. A SANN consists of five individual ANNs that are used to train a larger ANN model. The individual models are trained on the training data with fivefold cross-validation, each with the same configuration in a separate layer. Since ANNs start from random initial weights, each trained ANN converges to different weights, which lets the stacked model learn from their differences. The original article uses this network in regression and classification modes; we used the regression mode to predict illiquidity.
- Support vector machine (SVM): This algorithm is a supervised ML model that separates points using a hyperplane, with the primary goal of maximizing the margin. SVM kernels can be linear or nonlinear depending on the data and include the radial basis function (RBF), hyperbolic tangent, and polynomial kernels. The algorithm can provide predictions with a low error rate for small datasets without much training. The original article considers this approach with the Gaussian RBF kernel; here, only its regression mode was used to predict illiquidity.
- Long short-term memory (LSTM): This approach is an RNN that uses gating mechanisms to learn long sequences; RNN approaches were discussed in the previous section. It can be used in both regression and classification modes, and in this research its regression mode was used, according to the type of labels.
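The log-cosh loss used by the ANN/SANN baselines and the MAE/MAPE error metrics reported in the results tables can be written out directly; a minimal pure-Python sketch (function names are ours):

```python
import math

def logcosh_loss(y_true, y_pred):
    """log-cosh loss: mean of log(cosh(y_pred - y_true)).
    Smooth like MSE near zero, nearly linear like MAE for large errors."""
    return sum(math.log(math.cosh(p - t))
               for t, p in zip(y_true, y_pred)) / len(y_true)

def mae(y_true, y_pred):
    """Mean absolute error (reported in $ in the results tables)."""
    return sum(abs(p - t) for t, p in zip(y_true, y_pred)) / len(y_true)

def mape(y_true, y_pred):
    """Mean absolute percentage error (reported in % in the tables)."""
    return 100.0 * sum(abs((p - t) / t)
                       for t, p in zip(y_true, y_pred)) / len(y_true)
```

For example, true values (10, 20) against predictions (11, 18) give a MAPE of 10%, since both predictions are off by 10% of their targets.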
6. Conclusions and Gap Analysis
- Resampling: Time series forecasting is a challenging task in which the nonstationary characteristics of the data demand careful experimental settings.
- High-dimensional imbalanced time-series classification (OHIT) [42]: OHIT first uses a density ratio-based joint nearest-neighbor clustering algorithm to capture minority class states in a high-dimensional space.
- IB-GAN [48]: The standard methods of class weight, oversampling, or data augmentation are the approaches studied in “An empirical survey of data augmentation for time series classification with neural networks”.
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Huberman, G.; Leshno, J.; Moallemi, C.C. An economic analysis of the bitcoin payment system. Columbia Bus. Sch. Res. Pap. 2019, 17, 92. [Google Scholar]
- Sensoy, A. The inefficiency of Bitcoin revisited: A high-frequency analysis with alternative currencies. Financ. Res. Lett. 2019, 28, 68–73. [Google Scholar] [CrossRef]
- Saito, T. Bitcoin: A Search-Theoretic Approach. In Digital Currency: Breakthroughs in Research and Practice; IGI Global: Hershey, PA, USA, 2019; pp. 1–23. [Google Scholar]
- Mensi, W.; Al-Yahyaee, K.H.; Kang, S.H. Structural breaks and double long memory of cryptocurrency prices: A comparative analysis from Bitcoin and Ethereum. Financ. Res. Lett. 2019, 29, 222–230. [Google Scholar] [CrossRef]
- Kajtazi, A.; Moro, A. The role of bitcoin in well diversified portfolios: A comparative global study. Int. Rev. Financ. Anal. 2019, 61, 143–157. [Google Scholar] [CrossRef]
- Conti, M.; Kumar, E.S.; Lal, C.; Ruj, S. A survey on security and privacy issues of bitcoin. IEEE Commun. Surv. Tutor. 2018, 20, 3416–3452. [Google Scholar] [CrossRef]
- Pilkington, M.; Crudu, R.; Grant, L.G. Blockchain and bitcoin as a way to lift a country out of poverty-tourism 2.0 and e-governance in the Republic of Moldova. Int. J. Internet Technol. Secur. Trans. 2017, 7, 115–143. [Google Scholar]
- Hong, K. Bitcoin as an alternative investment vehicle. Inf. Technol. Manag. 2017, 18, 265–275. [Google Scholar] [CrossRef]
- Koo, E.; Geonwoo, K. Prediction of Bitcoin price based on manipulating distribution strategy. Appl. Soft Comput. 2021, 110, 107738. [Google Scholar] [CrossRef]
- Chen, P.-W.; Jiang, B.-S.; Wang, C.-H. Blockchain-based payment collection supervision system using pervasive Bitcoin digital wallet. In Proceedings of the 2017 IEEE 13th International Conference on Wireless and Mobile Computing, Networking and Communications (WiMob), Rome, Italy, 9–11 October 2017. [Google Scholar]
- Décourt, R.F.; Chohan, U.W.; Perugini, M.L. Bitcoin returns and the Monday effect. Horiz. Empres. 2017, 16. [Google Scholar]
- Baur, D.G.; Hong, K.; Lee, A.D. Bitcoin: Medium of exchange or speculative assets? J. Int. Financ. Mark. Inst. Money 2018, 54, 177–189. [Google Scholar] [CrossRef]
- Khalilov, M.C.K.; Levi, A. A survey on anonymity and privacy in bitcoin-like digital cash systems. IEEE Commun. Surv. Tutor. 2018, 20, 2543–2585. [Google Scholar] [CrossRef]
- Presthus, W.; O’Malley, N.O. Motivations and barriers for end-user adoption of bitcoin as digital currency. Procedia Comput. Sci. 2017, 121, 89–97. [Google Scholar] [CrossRef]
- Fanusie, Y.; Robinson, T. Bitcoin laundering: An analysis of illicit flows into digital currency services. Cent. Sanction. Illicit Financ. Memo. 2018, 1–15. [Google Scholar] [CrossRef]
- Xu, M.; Chen, X.; Kou, G. A systematic review of blockchain. Financ. Innov. 2019, 5, 27. [Google Scholar] [CrossRef]
- Kim, J.-H.; Hanul, S. Understanding bitcoin price prediction trends under various hyperparameter configurations. Computers 2022, 11, 167. [Google Scholar] [CrossRef]
- Chen, M. A Study of How Stock Liquidity Differs in Bullish and Bearish Markets: The Case of China’s Stock Market. In Advances in Economics, Business and Management Research; Atlantis Press: Amsterdam, The Netherlands, 2019. [Google Scholar]
- Ebrahimi, P.; Basirat, M.; Yousefi, A.; Nekmahmud, M.; Gholampour, A.; Fekete-Farkas, M. Social networks marketing and consumer purchase behavior: The combination of SEM and unsupervised machine learning approaches. Big Data Cogn. Comput. 2022, 6, 35. [Google Scholar] [CrossRef]
- Ebrahimi, P.; Khajeheian, D.; Soleimani, M.; Gholampour, A.; Fekete-Farkas, M. User engagement in social network platforms: What key strategic factors determine online consumer purchase behaviour? Econ. Res.-Ekon. Istraživanja 2023, 36, 2106264. [Google Scholar] [CrossRef]
- Salamzadeh, A.; Ebrahimi, P.; Soleimani, M.; Fekete-Farkas, M. Grocery apps and consumer purchase behavior: Application of Gaussian mixture model and multi-layer perceptron algorithm. J. Risk Financ. Manag. 2022, 15, 424. [Google Scholar] [CrossRef]
- Matz, L.; Neu, P. Liquidity Risk Measurement and Management: A Practitioner’s Guide to Global Best Practices; John Wiley & Sons: Hoboken, NJ, USA, 2006; Volume 408. [Google Scholar]
- Comerton-Forde, C.; Frino, A.; Mollica, V. The impact of limit order anonymity on liquidity: Evidence from Paris, Tokyo and Korea. J. Econ. Bus. 2005, 57, 528–540. [Google Scholar] [CrossRef]
- Dyhrberg, A.H.; Foley, S.; Svec, J. How investible is Bitcoin? Analyzing the liquidity and transaction costs of Bitcoin markets. Econ. Lett. 2018, 171, 140–143. [Google Scholar] [CrossRef]
- Wei, W.C. Liquidity and market efficiency in cryptocurrencies. Econ. Lett. 2018, 168, 21–24. [Google Scholar] [CrossRef]
- Będowska-Sójka, B.; Hinc, T.; Kliber, A. Volatility and liquidity in cryptocurrency markets—The causality approach. In Contemporary Trends and Challenges in Finance; Jajuga, K., Locarek-Junge, H., Orlowski, L., Staehr, K., Eds.; Springer Proceedings in Business and Economics; Springer: Cham, Switzerland, 2020; pp. 31–43. [Google Scholar]
- Brauneis, A.; Mestel, R.; Theissen, E. What drives the liquidity of cryptocurrencies? A long-term analysis. Financ. Res. Lett. 2021, 39, 101537. [Google Scholar] [CrossRef]
- Scharnowski, S. Understanding bitcoin liquidity. Financ. Res. Lett. 2021, 38, 101477. [Google Scholar] [CrossRef]
- Corwin, S.A.; Schultz, P. A simple way to estimate bid-ask spreads from daily high and low prices. J. Financ. 2012, 67, 719–760. [Google Scholar] [CrossRef]
- Abdi, F.; Ranaldo, A. A simple estimation of bid-ask spreads from daily close, high, and low prices. Rev. Financ. Stud. 2017, 30, 4437–4480. [Google Scholar] [CrossRef]
- Kyle, A.S.; Obizhaeva, A.A. Market microstructure invariance: Empirical hypotheses. Econometrica 2016, 84, 1345–1404. [Google Scholar] [CrossRef]
- Amihud, Y. Illiquidity and stock returns: Cross-section and time-series effects. J. Financ. Mark. 2002, 5, 31–56. [Google Scholar] [CrossRef]
- Ee, M.S.; Hasan, I.; Huang, H. Stock liquidity and corporate labor investment. J. Corp. Financ. 2022, 72, 102142. [Google Scholar] [CrossRef]
- Diebold, F.X.; Yilmaz, K. Better to give than to receive: Predictive directional measurement of volatility spillovers. Int. J. Forecast. 2012, 28, 57–66. [Google Scholar] [CrossRef]
- Baruník, J.; Křehlík, T. Measuring the frequency dynamics of financial connectedness and systemic risk. J. Financ. Econom. 2018, 16, 271–296. [Google Scholar] [CrossRef]
- Cortez, R.M.; Johnston, W.J. The Coronavirus crisis in B2B settings: Crisis uniqueness and managerial implications based on social exchange theory. Ind. Mark. Manag. 2020, 88, 125–135. [Google Scholar] [CrossRef]
- Bianchi, D.; Babiak, M.; Dickerson, A. Trading volume and liquidity provision in cryptocurrency markets. J. Bank. Financ. 2022, 142, 106547. [Google Scholar] [CrossRef]
- Kubiczek, J.; Tuszkiewicz, M. Intraday Patterns of Liquidity on the Warsaw Stock Exchange before and after the Outbreak of the COVID-19 Pandemic. Int. J. Financ. Stud. 2022, 10, 13. [Google Scholar] [CrossRef]
- Dospinescu, N.; Dospinescu, O. A profitability regression model in financial communication of Romanian stock exchange companies. Ecoforum 2019, 8, 1–4. [Google Scholar]
- Chikwira, C.; Mohammed, J. The Impact of the Stock Market on Liquidity and Economic Growth: Evidence of Volatile Market. J. Econ. 2023, 11, 155. [Google Scholar] [CrossRef]
- Lipton, Z.C.; Berkowitz, J.; Elkan, C. A critical review of recurrent neural networks for sequence learning. arXiv 2015, arXiv:1506.00019. [Google Scholar]
- Werbos, P.J. Backpropagation through time: What it does and how to do it. Proc. IEEE 1990, 78, 1550–1560. [Google Scholar] [CrossRef]
- Chung, J.; Gulcehre, C.; Cho, K.; Bengio, Y. Empirical evaluation of gated recurrent neural networks on sequence modeling. arXiv 2014, arXiv:1412.3555. [Google Scholar]
- Li, S.; Li, W.; Cook, C.; Zhu, C.; Gao, Y. Independently recurrent neural network (indrnn): Building a longer and deeper rnn. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018. [Google Scholar]
- Mudassir, M.; Bennbaia, S.; Unal, D.; Hammoudeh, M. Time-series forecasting of Bitcoin prices using high-dimensional features: A machine learning approach. Neural Comput. Appl. 2020, 1–15. [Google Scholar] [CrossRef]
- Liu, F.T.; Ting, K.M.; Zhou, Z.-H. Isolation-based anomaly detection. ACM Trans. Knowl. Discov. Data (TKDD) 2012, 6, 1–39. [Google Scholar] [CrossRef]
- Hansen, P.R.; Kim, C.; Kimbrough, W. Periodicity in Cryptocurrency Volatility and Liquidity. arXiv 2021, arXiv:2109.12142. [Google Scholar] [CrossRef]
- Deng, G.; Han, C.; Dreossi, T.; Lee, C.; Matteson, D.S. Ib-gan: A unified approach for multivariate time series classification under class imbalance. In Proceedings of the 2022 SIAM International Conference on Data Mining (SDM), Alexandria, VA, USA, 28–30 April 2022. [Google Scholar]
- Sasani, F.; Mousa, R.; Karkehabadi, A.; Dehbashi, S.; Mohammadi, A. TM-vector: A Novel Forecasting Approach for Market stock movement with a Rich Representation of Twitter and Market data. arXiv 2023, arXiv:2304.02094. [Google Scholar]
Indicator | Equation
---|---
Amihud’s illiquidity index | $ILLIQ = \frac{1}{D}\sum_{d=1}^{D} |r_d| / DV_d$
Turnover rate |
Fluctuation range |
Zhuang Zhao’s liquidity index |
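Amihud’s illiquidity index [32], the first indicator in the table, averages the daily price impact per dollar traded: a minimal sketch (variable names are ours):

```python
def amihud_illiquidity(returns, dollar_volumes):
    """Amihud's illiquidity index: mean of |r_d| / DV_d over D days.
    High values mean a small dollar volume moves the price a lot,
    i.e. the asset is illiquid."""
    terms = [abs(r) / dv for r, dv in zip(returns, dollar_volumes)]
    return sum(terms) / len(terms)
```

For example, a 2% move on $100 of volume and a 1% move on $50 of volume both contribute the same per-dollar impact, so the index for those two days equals that common value.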
Price Prediction Features

III | II | I | Feature
---|---|---|---
* | * | | MDT fee 30: median transaction fee 30 trx
* | * | | MDT fee 7: median transaction fee 7 trx
* | | | Price 90 ema
* | * | * | Size 90 trx
* | * | | Transactions
* | * | | Price 30 wma
* | * | | Price 3 wma
* | * | * | Price 7 wma
* | * | | Median transaction fee 7 roc
* | * | | Difficulty 30 rsi
* | * | * | Mining profitability
* | * | * | Price 30 sma USD
* | * | |
* | * | | Sentinusd 90 ema
* | * | * | Transaction value
* | * | * | Top 100 cap
* | | | Difficulty 90 mom
* | * | | Hashrate 90 var
* | * | * | Price 90 wma
* | * | | Sentinusd 90 sma
* | * | * | Median transaction fee
Dataset | Interval 1 | Interval 2 | Interval 3 |
---|---|---|---|
Range | April 2013–July 2016 | April 2013–April 2017 | April 2013–April 2022 |
# Records | 1206 | 1462 | 3285 |
# Train (80%) | 964 | 1169 | 2628 |
# Test (20%) | 242 | 293 | 657 |
Parameter Name | Dimension | Features
---|---|---
Feature vector | 20 | Median transaction fee 30 trx USD
| | Median transaction fee 7 trx USD
| | Price 90 ema USD
| | Size 90 trx
| | Transactions
| | Price 30 wma USD
| | Price 3 wma USD
| | Price 7 wma USD
| | Median transaction fee 7 roc USD
| | Difficulty 30 rsi
| | Mining profitability
| | Price 30 sma USD
| | Sentinusd 90 ema USD
| | Transaction value USD
| | Top 100 cap
| | Difficulty 90 mom
| | Hashrate 90 var
| | Price 90 wma USD
| | Sentinusd 90 sma USD
| | Median transaction fee USD
Word/sentence vector | 2 | Word count and sentence count
Punctuation vector | 3 | Question mark, exclamation mark, and capital letter counts
Ph | 1 | Psychology (first-person pronoun use)
SE | 2 | Polarity and subjectivity
Validation Method ↓ | Metrics → | MAE ($) | MAPE (%) | ||||
---|---|---|---|---|---|---|---|
Intervals → Model ↓ | I | II | III | I | II | III | |
Random split (paper) | ANN | 0.45 | 2.61 | 9.50 | 1.08 | 1.28 | 2.78 |
SVM | 0.72 | 3.23 | 7.04 | 0.74 | 1.28 | 1.44 | |
SANN | 0.24 | 2.13 | 4.58 | 0.55 | 0.93 | 2.73 | |
LSTM | 0.20 | 4.55 | 6.90 | 0.95 | 1.95 | 3.61 | |
Simple RNN | 0.67 | 3.2 | 3.67 | 0.72 | 1.21 | 1.42 | |
GRU | 0.22 | 2.12 | 3.23 | 0.56 | 1.19 | 1.51 | |
IndRNN | 0.21 | 1.99 | 3.89 | 0.45 | 0.93 | 1.04 | |
Random split (run) | ANN | 1.05 | 8.12 | 6.37 | 3.00 | 8.22 | 1.32 |
SVM | 1.23 | 5.37 | 9.47 | 0.96 | 2.09 | 2.21 | |
SANN | 1.06 | 5.77 | 7.45 | 2.96 | 6.13 | 1.22 | |
LSTM | 0.55 | 4.47 | 5.54 | 0.68 | 1.70 | 1.11 | |
Simple RNN | 1.22 | 2.41 | 9.40 | 0.98 | 2.53 | 2.01 | |
GRU | 1.45 | 3.65 | 6.66 | 0.69 | 1.91 | 1.12 | |
IndRNN | 0.52 | 1.21 | 5.01 | 0.65 | 1.83 | 1.19 | |
Linear split (run) | ANN | 8.21 | 9.4 | 8.50 | 5.70 | 22.2 | 3.21 |
SVM | 2.04 | 6.19 | 9.87 | 0.87 | 14.49 | 7.51 | |
SANN | 2.75 | 12.7 | 12.13 | 3.89 | 9.58 | 2.10 | |
LSTM | 2.83 | 5.02 | 14.77 | 3.75 | 1.18 | 2.51 | |
Simple RNN | 2.05 | 6.12 | 6.10 | 0.90 | 1.51 | 7.92 | |
GRU | 2.40 | 5.67 | 4.87 | 3.80 | 1.90 | 3.01 | |
IndRNN | 2.01 | 4.80 | 3.89 | 3.70 | 1.15 | 2.42 |
Model | ANN | SVM | SANN | LSTM | Simple RNN | GRU | IndRNN |
---|---|---|---|---|---|---|---|
ANN | Statistic = 0.0, p-value = 1.0, df = 34.0 | Statistic = −0.0622, p-value = 0.9507, df = 34.0 | Statistic = −0.1756, p-value = 0.8616, df = 34.0 | Statistic = −0.0071, p-value = 0.9943, df = 34.0 | Statistic = −0.3070, p-value = 0.7606, df = 34.0 | Statistic = −0.1642, p-value = 0.8704, df = 34.0 | Statistic = −0.2393, p-value = 0.8122, df = 34.0 |
SVM | Statistic = 0.0622, p-value = 0.9507, df = 34.0 | Statistic = 0.0, p-value = 1.0, df = 34.0 | Statistic = −0.1134, p-value = 0.9103, df = 34.0 | Statistic = 0.0551, p-value = 0.9563, df = 34.0 | Statistic = −0.2453, p-value = 0.8076, df = 34.0 | Statistic = −0.1020, p-value = 0.9192, df = 34.0 | Statistic = −0.1773, p-value = 0.8602, df = 34.0 |
SANN | Statistic = 0.1756, p-value = 0.8616, df = 34.0 | Statistic = 0.1134, p-value = 0.9103, df = 34.0 | Statistic = 0.0, p-value = 1.0, df = 34.0 | Statistic = 0.1685, p-value = 0.8671, df = 34.0 | Statistic = −0.1323, p-value = 0.8954, df = 34.0 | Statistic = 0.0115, p-value = 0.9908, df = 34.0 | Statistic = −0.0640, p-value = 0.9493, df = 34.0 |
LSTM | Statistic = 0.0071, p-value = 0.9943, df = 34.0 | Statistic = −0.0551, p-value = 0.9563, df = 34.0 | Statistic = −0.1685, p-value = 0.8671, df = 34.0 | Statistic = 0.0, p-value = 1.0, df = 34.0 | Statistic = −0.2999, p-value = 0.7660, df = 34.0 | Statistic = −0.1571, p-value = 0.8760, df = 34.0 | Statistic = −0.2322, p-value = 0.8177, df = 34.0 |
Simple RNN | Statistic = 0.3070, p-value = 0.7606, df = 34.0 | Statistic = 0.2453, p-value = 0.8076, df = 34.0 | Statistic = 0.1323, p-value = 0.8954, df = 34.0 | Statistic = 0.2999, p-value = 0.76606, df = 34.0 | Statistic = 0.0, p-value = 1.0, df = 34.0 | Statistic = 0.1439, p-value = 0.8863, df = 34.0 | Statistic = 0.06848, p-value = 0.9457, df = 34.0 |
GRU | Statistic = 0.1642, p-value = 0.8704, df = 34.0 | Statistic = 0.1020, p-value = 0.9192, df = 34.0 | Statistic = −0.01152, p-value = 0.9908, df = 34.0 | Statistic = 0.1571, p-value = 0.8760, df = 34.0 | Statistic = −0.14391, p-value = 0.8863, df = 34.0 | Statistic = 0.0, p-value = 1.0, df = 34.0 | Statistic = −0.0755, p-value = 0.9401, df = 34.0 |
IndRNN | Statistic = 0.2393, p-value = 0.8122, df = 34.0 | Statistic = 0.1773, p-value = 0.8602, df = 34.0 | Statistic = 0.0640, p-value = 0.9493, df = 34.0 | Statistic = 0.2322, p-value = 0.8177, df = 34.0 | Statistic = −0.0684, p-value = 0.9457, df = 34.0 | Statistic = 0.0755, p-value = 0.9401, df = 34.0 | Statistic = 0.0, p-value = 1.0, df = 34.0 |
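Pairwise comparisons of this kind can be reproduced in outline with paired two-sample t-tests: with df = 34, each comparison rests on 35 paired error observations. A minimal sketch, assuming SciPy is available and using synthetic placeholder error series (not the paper's per-model errors):

```python
# Illustrative paired t-tests between models' error series, as in the
# table above (df = 34 implies 35 paired samples per comparison).
# Model names and error values here are placeholders, not the paper's data.
from itertools import combinations

import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
models = ["ANN", "SVM", "LSTM", "GRU"]
# One absolute-error value per model per test point (35 points -> df = 34).
errors = {m: rng.normal(loc=2.0, scale=0.5, size=35) for m in models}

for a, b in combinations(models, 2):
    t, p = stats.ttest_rel(errors[a], errors[b])
    df = len(errors[a]) - 1
    print(f"{a} vs {b}: statistic = {t:.4f}, p-value = {p:.4f}, df = {df}")
```

Swapping the two operands flips the sign of the statistic but leaves the p-value unchanged, which matches the antisymmetry of the table above.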
| Validation Method | Model | MAE ($), I | MAE ($), II | MAE ($), III | MAPE (%), I | MAPE (%), II | MAPE (%), III |
|---|---|---|---|---|---|---|---|
| Random split (paper) | ANN | 1.53 | 3.62 | 10.3 | 2.98 | 2.38 | 4.35 |
| | SVM | 1.75 | 3.22 | 9.76 | 1.70 | 2.77 | 2.43 |
| | SANN | 1.25 | 3.14 | 6.54 | 1.56 | 1.89 | 5.55 |
| | LSTM | 1.24 | 4.53 | 8.92 | 1.94 | 2.09 | 2.13 |
| | Simple RNN | 1.63 | 5.24 | 4.66 | 1.76 | 1.64 | 2.42 |
| | GRU | 1.22 | 2.42 | 4.22 | 1.54 | 1.16 | 2.78 |
| | IndRNN | 1.23 | 2.93 | 4.85 | 1.08 | 1.95 | 2.15 |
| Random split (run) | ANN | 2.53 | 4.63 | 8.39 | 4.31 | 9.24 | 3.64 |
| | SVM | 2.33 | 9.63 | 11.5 | 1.89 | 4.13 | 3.15 |
| | SANN | 2.83 | 6.75 | 8.49 | 3.94 | 7.89 | 2.54 |
| | LSTM | 1.75 | 5.91 | 6.57 | 1.38 | 2.13 | 2.25 |
| | Simple RNN | 2.23 | 2.98 | 10.42 | 1.31 | 2.19 | 3.19 |
| | GRU | 2.44 | 6.63 | 8.64 | 1.13 | 2.12 | 2.08 |
| | IndRNN | 1.47 | 2.08 | 7.91 | 0.98 | 2.85 | 2.42 |
| Linear split (run) | ANN | 9.46 | 10.24 | 9.93 | 4.78 | 25.3 | 4.42 |
| | SVM | 3.34 | 8.08 | 10.83 | 1.47 | 16.41 | 9.04 |
| | SANN | 3.73 | 13.91 | 13.10 | 4.22 | 11.12 | 3.24 |
| | LSTM | 3.33 | 6.93 | 15.74 | 4.34 | 3.17 | 3.54 |
| | Simple RNN | 3.35 | 7.13 | 7.12 | 1.43 | 3.13 | 7.90 |
| | GRU | 3.65 | 7.09 | 5.24 | 5.52 | 3.92 | 4.24 |
| | IndRNN | 2.23 | 5.78 | 4.06 | 5.72 | 3.88 | 2.98 |
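For reference, the two error metrics reported in these tables, MAE (in the units of the target, reported here in $) and MAPE (in percent), can be computed as in the sketch below. The values are illustrative, not the paper's predictions:

```python
# Minimal definitions of the two metrics used in the result tables.
import numpy as np

def mae(y_true, y_pred):
    """Mean absolute error, in the units of the target."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean(np.abs(y_true - y_pred)))

def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent (assumes no zeros in y_true)."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(100.0 * np.mean(np.abs((y_true - y_pred) / y_true)))

# Illustrative values only.
y_true = [100.0, 120.0, 80.0]
y_pred = [101.0, 118.0, 84.0]
print(mae(y_true, y_pred))   # mean of |1|, |2|, |4|
print(mape(y_true, y_pred))  # mean of 1%, 1.67%, 5%
```

Note that MAPE weights errors by the inverse of the true value, so the two metrics can rank models differently, as they occasionally do in the tables above.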
| Validation Method | Model | MAE ($), I | MAE ($), II | MAE ($), III | MAPE (%), I | MAPE (%), II | MAPE (%), III |
|---|---|---|---|---|---|---|---|
| Random split (paper) | ANN | 1.50 | 3.54 | 11.3 | 3.98 | 3.14 | 3.67 |
| | SVM | 1.72 | 4.26 | 10.76 | 2.70 | 2.42 | 4.46 |
| | SANN | 1.23 | 4.87 | 7.54 | 1.56 | 3.40 | 6.57 |
| | LSTM | 1.27 | 3.59 | 9.92 | 2.94 | 4.63 | 3.14 |
| | Simple RNN | 1.67 | 4.95 | 5.66 | 1.76 | 2.24 | 3.47 |
| | GRU | 1.98 | 3.42 | 4.22 | 2.54 | 1.15 | 2.79 |
| | IndRNN | 1.26 | 2.76 | 4.85 | 1.08 | 2.83 | 2.53 |
| Random split (run) | ANN | 1.57 | 5.94 | 9.39 | 5.31 | 5.13 | 5.67 |
| | SVM | 2.38 | 8.36 | 12.5 | 2.89 | 6.38 | 2.24 |
| | SANN | 1.85 | 7.97 | 9.49 | 4.94 | 4.14 | 4.75 |
| | LSTM | 1.78 | 6.92 | 7.57 | 1.38 | 4.13 | 4.23 |
| | Simple RNN | 1.29 | 3.92 | 9.42 | 3.31 | 2.30 | 2.65 |
| | GRU | 3.49 | 7.75 | 9.64 | 2.13 | 2.17 | 3.61 |
| | IndRNN | 2.58 | 3.80 | 6.91 | 1.98 | 2.43 | 3.42 |
| Linear split (run) | ANN | 10.98 | 9.22 | 8.93 | 3.78 | 22.5 | 5.46 |
| | SVM | 6.65 | 9.96 | 11.83 | 2.47 | 17.93 | 10.6 |
| | SANN | 4.81 | 12.04 | 13.10 | 5.22 | 14.14 | 4.29 |
| | LSTM | 3.82 | 7.47 | 19.74 | 4.34 | 2.14 | 4.34 |
| | Simple RNN | 2.86 | 8.64 | 8.12 | 2.43 | 5.98 | 8.85 |
| | GRU | 2.78 | 6.92 | 6.24 | 5.52 | 5.13 | 5.53 |
| | IndRNN | 3.99 | 4.65 | 5.06 | 6.72 | 5.97 | 3.75 |
| Validation Method | Model | MAE ($), I | MAE ($), II | MAE ($), III | MAPE (%), I | MAPE (%), II | MAPE (%), III |
|---|---|---|---|---|---|---|---|
| Random split (paper) | ANN | 2.51 | 2.56 | 10.3 | 4.98 | 3.14 | 3.67 |
| | SVM | 2.31 | 5.56 | 11.7 | 3.70 | 2.42 | 4.46 |
| | SANN | 1.25 | 5.25 | 6.54 | 2.56 | 3.40 | 6.57 |
| | LSTM | 1.23 | 5.54 | 10.92 | 1.94 | 4.63 | 3.14 |
| | Simple RNN | 1.23 | 3.42 | 6.66 | 2.72 | 1.24 | 4.47 |
| | GRU | 0.53 | 2.23 | 5.22 | 1.12 | 2.15 | 4.79 |
| | IndRNN | 1.55 | 1.55 | 5.85 | 4.81 | 3.83 | 5.53 |
| Random split (run) | ANN | 2.23 | 5.42 | 10.39 | 4.32 | 6.13 | 2.67 |
| | SVM | 2.3 | 8.14 | 11.5 | 1.29 | 5.38 | 3.24 |
| | SANN | 1.89 | 7.91 | 10.49 | 5.93 | 3.14 | 3.75 |
| | LSTM | 2.24 | 6.24 | 7.57 | 2.12 | 5.13 | 3.23 |
| | Simple RNN | 1.23 | 3.54 | 9.42 | 3.32 | 2.30 | 2.65 |
| | GRU | 2.43 | 8.23 | 9.64 | 2.31 | 4.17 | 4.61 |
| | IndRNN | 1.12 | 4.82 | 5.91 | 1.32 | 7.43 | 5.42 |
| Linear split (run) | ANN | 13.40 | 8.25 | 8.93 | 5.13 | 23.5 | 6.46 |
| | SVM | 7.24 | 8.23 | 10.83 | 2.43 | 13.93 | 9.6 |
| | SANN | 5.24 | 10.3 | 13.10 | 5.23 | 15.14 | 5.29 |
| | LSTM | 4.98 | 8.23 | 14.74 | 4.54 | 4.14 | 5.34 |
| | Simple RNN | 1.68 | 8.52 | 5.12 | 2.09 | 3.98 | 7.85 |
| | GRU | 4.42 | 6.23 | 7.24 | 5.89 | 4.13 | 3.53 |
| | IndRNN | 3.24 | 5.42 | 6.06 | 4.33 | 6.97 | 4.75 |
| Validation Method | Model | MAE ($), I | MAE ($), II | MAE ($), III | MAPE (%), I | MAPE (%), II | MAPE (%), III |
|---|---|---|---|---|---|---|---|
| Random split (paper) | ANN | 1.53 | 1.56 | 12.3 | 3.08 | 2.14 | 3.67 |
| | SVM | 2.34 | 6.56 | 13.7 | 2.24 | 3.42 | 4.46 |
| | SANN | 2.22 | 6.25 | 6.54 | 1.34 | 4.40 | 6.57 |
| | LSTM | 2.25 | 3.54 | 11.92 | 2.93 | 5.63 | 3.14 |
| | Simple RNN | 2.15 | 5.42 | 7.66 | 2.43 | 3.66 | 3.40 |
| | GRU | 1.14 | 2.23 | 6.22 | 3.76 | 2.35 | 2.73 |
| | IndRNN | 2.13 | 2.55 | 7.85 | 2.04 | 2.64 | 2.24 |
| Random split (run) | ANN | 3.21 | 4.42 | 11.39 | 4.34 | 5.42 | 6.43 |
| | SVM | 3.32 | 2.14 | 16.5 | 1.43 | 6.42 | 2.65 |
| | SANN | 3.81 | 4.91 | 11.49 | 5.93 | 4.10 | 4.35 |
| | LSTM | 2.21 | 2.24 | 8.57 | 2.36 | 4.15 | 4.09 |
| | Simple RNN | 2.22 | 5.54 | 10.42 | 1.35 | 2.34 | 2.24 |
| | GRU | 2.43 | 9.23 | 10.64 | 1.13 | 2.90 | 3.66 |
| | IndRNN | 2.11 | 3.82 | 6.91 | 2.94 | 2.65 | 3.46 |
| Linear split (run) | ANN | 11.2 | 10.9 | 9.93 | 2.73 | 22.6 | 5.35 |
| | SVM | 8.23 | 10.3 | 11.83 | 3.48 | 17.2 | 10.8 |
| | SANN | 6.22 | 11.3 | 11.10 | 4.25 | 14.9 | 4.34 |
| | LSTM | 5.94 | 13.0 | 16.74 | 5.36 | 2.90 | 4.30 |
| | Simple RNN | 3.61 | 12.5 | 6.12 | 3.44 | 5.23 | 8.42 |
| | GRU | 2.42 | 11.0 | 8.24 | 5.53 | 5.42 | 5.65 |
| | IndRNN | 3.23 | 11.0 | 7.06 | 6.71 | 5.23 | 3.74 |
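The tables contrast two data-split policies: a random split, which shuffles samples before holding out a test set, and a linear (chronological) split, which reserves the most recent samples for testing. A minimal sketch of both, with an illustrative series and ratio (the actual split proportions are an assumption here):

```python
# Illustrative comparison of the two split policies in the tables above.
# The series, the 80/20 ratio, and the seed are assumptions for illustration.
import numpy as np

def linear_split(x, test_ratio=0.2):
    """Chronological split: the last test_ratio of the series is the test set."""
    cut = int(len(x) * (1 - test_ratio))
    return x[:cut], x[cut:]

def random_split(x, test_ratio=0.2, seed=0):
    """Random split: shuffle indices, then hold out test_ratio of them."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(x))
    cut = int(len(x) * (1 - test_ratio))
    return x[idx[:cut]], x[idx[cut:]]

series = np.arange(100)  # stand-in for a daily feature series
train_lin, test_lin = linear_split(series)
train_rnd, test_rnd = random_split(series)
print(len(train_lin), len(test_lin))  # 80 20
```

A chronological split avoids look-ahead leakage from the future into training, which is consistent with the generally higher errors in the "Linear split (run)" rows relative to the random splits.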
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Sasani, F.; Moghareh Dehkordi, M.; Ebrahimi, Z.; Dustmohammadloo, H.; Bouzari, P.; Ebrahimi, P.; Lencsés, E.; Fekete-Farkas, M. Forecasting of Bitcoin Illiquidity Using High-Dimensional and Textual Features. Computers 2024, 13, 20. https://doi.org/10.3390/computers13010020