# An Advanced CNN-LSTM Model for Cryptocurrency Forecasting


## Abstract


## 1. Introduction

## 2. Related Work

## 3. Multiple-Input Cryptocurrency Deep Learning Model

- Convolutional layer: Convolutional layers [21] constitute a class of neural network layers characterized by their remarkable ability to learn internal representations of their inputs. This is performed by applying convolution operations between the input data and convolution kernels, called "filters", to develop new feature values;
- Pooling layer: Pooling layers [21] are utilized to reduce the spatial dimensions of their input, which in turn reduces the number of operations required in all subsequent layers. Notice that less spatial information implies fewer weights, hence less chance of overfitting the training data and less computational effort. In more detail, these layers downsample the output of a previous layer, which is usually a convolutional layer, attempting to pass on only valid and useful information. Max pooling and average pooling layers are probably the most widely utilized choices; they use the maximum value and the average value from each cluster of outputs of the previous layer, respectively [22];
- LSTM layer: LSTM layers [23] belong to the class of recurrent neural network layers, enhanced with a separate memory cell and adaptive gate units (input, forget, and output) for controlling the information flow. The utilization of gates in each cell implies that data can be filtered, discarded, or added, thereby maintaining useful information in the memory cell for longer periods of time. The advantage of LSTM layers is their ability to identify both short- and long-term correlation features within time series and to considerably mitigate the vanishing gradient problem [23];
- Dense layer: Dense layers constitute the most popular and widely utilized choice for composing the hidden layers of a deep neural network [24]. In particular, each dense layer is composed of neurons which are connected to all neurons of the previous layer. Generally, dense layers add a non-linearity property, and in theory a neural network composed of dense layers is able to model any mathematical function [25];
- Batch-normalization layer: Batch normalization constitutes an elegant technique for training deep neural networks which focuses on stabilizing the learning process by standardizing the inputs of the next layer for each mini-batch [21]. Batch normalization significantly reduces the problem of coordinating updates across many layers and usually accelerates training by considerably reducing the number of required epochs;
- Dropout layer: Dropout constitutes one of the most popular regularization methods for preventing neural networks from overfitting. The dropout layer is a non-learnable layer which is added between existing layers of a neural network model. It is applied to the outputs of the prior layer, temporarily setting a random subset of them to zero with a pre-defined probability p, called the dropout rate, before they are fed to the next layer. The key idea behind dropout, and its motivation, is to make each layer less sensitive to statistical fluctuations in its inputs [26].
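The mechanics of the convolution, pooling, and dropout layers described above can be illustrated with a minimal NumPy sketch (the input sequence, filter weights, and dropout mask below are invented for illustration; real layers learn their filter weights during training):

```python
import numpy as np

x = np.array([1.0, 3.0, 2.0, 5.0, 4.0, 6.0])  # toy 1-D input sequence

# Convolution: slide a size-2 filter over the input, producing one
# feature value per window (no padding, stride 1).
kernel = np.array([0.5, 0.5])
conv = np.array([x[i:i + 2] @ kernel for i in range(len(x) - 1)])
# conv -> [2.0, 2.5, 3.5, 4.5, 5.0]

# Pooling: downsample by taking the max / average of each cluster of 2.
pairs = x.reshape(-1, 2)
max_pool = pairs.max(axis=1)    # [3.0, 5.0, 6.0]
avg_pool = pairs.mean(axis=1)   # [2.0, 3.5, 5.0]

# Dropout (training time): zero a random subset of outputs with
# probability p and rescale the survivors by 1/(1-p) ("inverted" dropout).
p = 0.4
rng = np.random.default_rng(0)
mask = rng.random(x.shape) >= p
dropped = x * mask / (1.0 - p)
```

Note how both the convolution (length 6 to 5) and the pooling (length 6 to 3) shrink the sequence, which is why stacking them reduces the computation required by subsequent layers.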

## 4. Data

## 5. Numerical Experiments

- Model${}_{1}$ consists of a convolutional layer of 16 filters of size 2, followed by an average pooling layer of size 2, an LSTM layer of 50 units, a batch normalization layer, a dropout layer with $p=0.4$, a dense layer of 64 neurons, a batch normalization layer, a dropout layer with $p=0.2$, and an output layer of one neuron;
- Model${}_{2}$ consists of a convolutional layer of 32 filters of size 2, followed by an average pooling layer of size 2, an LSTM layer of 50 units, a batch normalization layer, a dropout layer with $p=0.5$, a dense layer of 128 neurons, a batch normalization layer, a dropout layer with $p=0.2$, and an output layer of one neuron;
- The MICDL model consists of 3 convolutional layers with 16 filters of size 2, each of which takes as input a unique cryptocurrency time-series, i.e., BTC, ETH, or XRP. Each convolutional layer is followed by an average pooling layer of size 2 and an LSTM layer with 50 units. The outputs of the LSTM layers are merged by a concatenate layer, which is followed by a dense layer of 256 neurons, a batch normalization layer, a dropout layer with $p=0.3$, a dense layer of 64 neurons, a batch normalization layer, a dropout layer with $p=0.2$, and a final output layer of one neuron.
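The MICDL architecture described above could be assembled with the Keras functional API roughly as follows. This is a sketch, not the authors' implementation: the window length (7), activations, and input shapes are assumptions, while the filter counts, units, dropout rates, and layer ordering follow the description.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

WINDOW = 7  # assumed lag/window length

def branch(name):
    # One per-cryptocurrency branch: conv (16 filters, size 2) ->
    # average pooling (size 2) -> LSTM (50 units).
    inp = layers.Input(shape=(WINDOW, 1), name=name)
    x = layers.Conv1D(16, kernel_size=2, activation="relu")(inp)
    x = layers.AveragePooling1D(pool_size=2)(x)
    x = layers.LSTM(50)(x)
    return inp, x

inputs, branches = zip(*(branch(n) for n in ("BTC", "ETH", "XRP")))

# Merge the three branches and apply the dense head from the text.
x = layers.Concatenate()(list(branches))
x = layers.Dense(256, activation="relu")(x)
x = layers.BatchNormalization()(x)
x = layers.Dropout(0.3)(x)
x = layers.Dense(64, activation="relu")(x)
x = layers.BatchNormalization()(x)
x = layers.Dropout(0.2)(x)
out = layers.Dense(1)(x)  # single-neuron output layer

micdl = Model(list(inputs), out)
```

The functional API is the natural fit here because the multiple-input structure (one branch per cryptocurrency) cannot be expressed with a plain `Sequential` model.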

## 6. Discussion, Conclusions, & Future Research

## Author Contributions

## Funding

## Conflicts of Interest

## References

- Nasir, M.A.; Huynh, T.L.D.; Nguyen, S.P.; Duong, D. Forecasting cryptocurrency returns and volume using search engines. Financ. Innov. **2019**, 5, 2.
- Livieris, I.E.; Stavroyiannis, S.; Pintelas, E.; Pintelas, P. A novel validation framework to enhance deep learning models in time-series forecasting. Neural Comput. Appl. **2020**, 32, 17149–17167.
- Derbentsev, V.; Matviychuk, A.; Soloviev, V.N. Forecasting of Cryptocurrency Prices Using Machine Learning. In Advanced Studies of Financial Technologies and Cryptocurrency Markets; Springer: Berlin/Heidelberg, Germany, 2020; pp. 211–231.
- Chowdhury, R.; Rahman, M.A.; Rahman, M.S.; Mahdy, M. Predicting and Forecasting the Price of Constituents and Index of Cryptocurrency Using Machine Learning. arXiv **2019**, arXiv:1905.08444.
- Wan, R.; Mei, S.; Wang, J.; Liu, M.; Yang, F. Multivariate temporal convolutional network: A deep neural networks approach for multivariate time series forecasting. Electronics **2019**, 8, 876.
- Vidal, A.; Kristjanpoller, W. Gold Volatility Prediction using a CNN-LSTM approach. Expert Syst. Appl. **2020**, 157, 113481.
- Kim, T.Y.; Cho, S.B. Predicting residential energy consumption using CNN-LSTM neural networks. Energy **2019**, 182, 72–81.
- Xie, H.; Zhang, L.; Lim, C.P. Evolving CNN-LSTM Models for Time Series Prediction Using Enhanced Grey Wolf Optimizer. IEEE Access **2020**, 8, 161519–161541.
- Livieris, I.E.; Pintelas, E.; Pintelas, P. A CNN–LSTM model for gold price time-series forecasting. Neural Comput. Appl. **2020**, 32, 17351–17360.
- Lu, W.; Li, J.; Wang, J.; Qin, L. A CNN-BiLSTM-AM method for stock price prediction. Neural Comput. Appl. **2020**, 1–13.
- Pintelas, E.; Livieris, I.E.; Stavroyiannis, S.; Kotsilieris, T.; Pintelas, P. Investigating the Problem of Cryptocurrency Price Prediction: A Deep Learning Approach. In IFIP International Conference on Artificial Intelligence Applications and Innovations; Springer: Berlin/Heidelberg, Germany, 2020; pp. 99–110.
- Livieris, I.E.; Pintelas, E.; Stavroyiannis, S.; Pintelas, P. Ensemble Deep Learning Models for Forecasting Cryptocurrency Time-Series. Algorithms **2020**, 13, 121.
- Yates, R.D.; Goodman, D.J. Probability and Stochastic Processes: A Friendly Introduction for Electrical and Computer Engineers; John Wiley & Sons: Hoboken, NJ, USA, 2014.
- Sun, Y.; Zhu, L.; Wang, G.; Zhao, F. Multi-input convolutional neural network for flower grading. J. Electr. Comput. Eng. **2017**, 2017, 9240407.
- Li, H.; Shen, Y.; Zhu, Y. Stock price prediction using attention-based multi-input LSTM. In Proceedings of the Asian Conference on Machine Learning (ACML 2018), Beijing, China, 14–16 November 2018; pp. 454–469.
- Livieris, I.E.; Dafnis, S.D.; Papadopoulos, G.K.; Kalivas, D.P. A Multiple-Input Neural Network Model for Predicting Cotton Production Quantity: A Case Study. Algorithms **2020**, 13, 273.
- Apolo-Apolo, O.E.; Pérez-Ruiz, M.; Martínez-Guanter, J.; Egea, G. A mixed data-based deep neural network to estimate leaf area index in wheat breeding trials. Agronomy **2020**, 10, 175.
- Pan, Y.; Xiao, Z.; Wang, X.; Yang, D. A multiple support vector machine approach to stock index forecasting with mixed frequency sampling. Knowl.-Based Syst. **2017**, 122, 90–102.
- Patel, M.M.; Tanwar, S.; Gupta, R.; Kumar, N. A Deep Learning-based Cryptocurrency Price Prediction Scheme for Financial Institutions. J. Inf. Secur. Appl. **2020**, 55, 102583.
- Cai, J.X.; Zhong, R.; Li, Y. Antenna selection for multiple-input multiple-output systems based on deep convolutional neural networks. PLoS ONE **2019**, 14, e0215672.
- Goodfellow, I.; Bengio, Y.; Courville, A. Deep Learning; MIT Press: Cambridge, MA, USA, 2016; Volume 1.
- Ibrahim, Z.; Isa, D.; Idrus, Z.; Kasiran, Z.; Roslan, R. Evaluation of Pooling Layers in Convolutional Neural Network for Script Recognition. In International Conference on Soft Computing in Data Science; Springer: Berlin/Heidelberg, Germany, 2019; pp. 121–129.
- Greff, K.; Srivastava, R.K.; Koutník, J.; Steunebrink, B.R.; Schmidhuber, J. LSTM: A search space odyssey. IEEE Trans. Neural Netw. Learn. Syst. **2016**, 28, 2222–2232.
- Demuth, H.B.; Beale, M.H.; De Jess, O.; Hagan, M.T. Neural Network Design; Oklahoma State University: Stillwater, OK, USA, 2014.
- Bennur, A.; Gaggar, M. LCA-Net: Light Convolutional Autoencoder for Image Dehazing. arXiv **2020**, arXiv:2008.10325.
- Livieris, I.E.; Stavroyiannis, S.; Pintelas, E.; Kotsilieris, T.; Pintelas, P. A dropout weight-constrained recurrent neural network model for forecasting the price of major cryptocurrencies and CCi30 index. Evol. Syst. **2020**.
- Brockwell, P.; Davis, R. Introduction to Time Series and Forecasting; Springer: Berlin/Heidelberg, Germany, 2016.
- Montgomery, D.C.; Jennings, C.L.; Kulahci, M. Introduction to Time Series Analysis and Forecasting; John Wiley & Sons: Hoboken, NJ, USA, 2015.
- Kurbiel, T.; Khaleghian, S. Training of deep neural networks based on distance measures using RMSProp. arXiv **2017**, arXiv:1708.01911.
- Hodges, J.L.; Lehmann, E.L. Rank methods for combination of independent experiments in analysis of variance. Ann. Math. Stat. **1962**, 33, 482–497.
- Finner, H. On a monotonicity problem in step-down multiple test procedures. J. Am. Stat. Assoc. **1993**, 88, 920–923.
- Chan, E. Algorithmic Trading: Winning Strategies and Their Rationale; John Wiley & Sons: Hoboken, NJ, USA, 2013; Volume 625.
- Ma, Y.; Yang, B.; Su, Y. Technical trading index, return predictability and idiosyncratic volatility. Int. Rev. Econ. Financ. **2020**, 69, 879–900.

**Figure 2.** Daily price of the cryptocurrencies Bitcoin (BTC), Ethereum (ETH), and Ripple (XRP) in USD from 1 January 2017 to 31 October 2020.

| Data | Set | Minimum | Maximum | Mean | Std. Dev. | Median | Skewness | Kurtosis |
|---|---|---|---|---|---|---|---|---|
| BTC | Training set | 777.76 | 19,497.40 | 6459.19 | 3496.35 | 6588.31 | 0.44 | 0.26 |
| | Validation set | 4970.79 | 9951.52 | 7810.27 | 1358.18 | 7801.33 | −0.25 | −1.08 |
| | Testing set | 9045.39 | 13,780.99 | 10,663.35 | 1178.23 | 10,680.84 | 0.47 | −0.46 |
| ETH | Training set | 8.17 | 1396.42 | 291.33 | 240.22 | 217.05 | 1.71 | 3.03 |
| | Validation set | 110.61 | 243.53 | 181.16 | 35.93 | 192.09 | −0.35 | −1.12 |
| | Testing set | 222.96 | 477.05 | 328.80 | 71.57 | 353.21 | −0.26 | −1.44 |
| XRP | Training set | 0.01 | 3.38 | 0.39 | 0.36 | 0.30 | 3.68 | 19.98 |
| | Validation set | 0.14 | 0.24 | 0.19 | 0.02 | 0.20 | −0.23 | −0.26 |
| | Testing set | 0.18 | 0.31 | 0.24 | 0.04 | 0.24 | 0.15 | −1.00 |

| Data | Set | Up | Up (%) | Down | Down (%) |
|---|---|---|---|---|---|
| BTC | Training set | 630 | 54.74% | 521 | 45.26% |
| | Validation set | 50 | 53.76% | 43 | 46.24% |
| | Testing set | 87 | 57.24% | 65 | 42.76% |
| ETH | Training set | 577 | 50.22% | 572 | 49.78% |
| | Validation set | 51 | 54.84% | 42 | 45.16% |
| | Testing set | 85 | 55.92% | 67 | 44.08% |
| XRP | Training set | 538 | 46.74% | 613 | 53.26% |
| | Validation set | 52 | 55.91% | 41 | 44.09% |
| | Testing set | 80 | 52.63% | 72 | 47.37% |

| Time-Series | | t-Statistic | p-Value |
|---|---|---|---|
| Levels | BTC | −1.831855 | 0.364765 |
| | ETH | −2.058319 | 0.261601 |
| | XRP | −3.880291 | 0.002185 |
| Transformed | BTC | −36.832969 | 0.000000 * |
| | ETH | −36.832969 | 0.000000 * |
| | XRP | −36.832969 | 0.000000 * |

| Model | Lag | MAE | RMSE | ${\mathit{R}}^{2}$ | Accuracy | AUC | GM | Sen | Spe |
|---|---|---|---|---|---|---|---|---|---|
| Model${}_{1}$ | 7 | 169.817 | 256.688 | 0.953 | 55.03% | 0.494 | 20.058 | 0.881 | 0.108 |
| Model${}_{2}$ | | 169.604 | 256.318 | 0.953 | 53.64% | 0.497 | 26.567 | 0.770 | 0.224 |
| MICDL | | 170.761 | 257.728 | 0.952 | 53.04% | 0.502 | 30.886 | 0.698 | 0.306 |
| Model${}_{1}$ | 14 | 171.292 | 262.339 | 0.950 | 53.53% | 0.504 | 27.962 | 0.720 | 0.288 |
| Model${}_{2}$ | | 170.105 | 256.849 | 0.952 | 52.60% | 0.506 | 23.727 | 0.645 | 0.367 |
| MICDL | | 171.147 | 257.847 | 0.952 | 51.88% | 0.508 | 29.173 | 0.582 | 0.434 |

| Model | Lag | MAE | RMSE | ${\mathit{R}}^{2}$ | Accuracy | AUC | GM | Sen | Spe |
|---|---|---|---|---|---|---|---|---|---|
| Model${}_{1}$ | 7 | 9.172 | 13.517 | 0.964 | 51.51% | 0.495 | 25.770 | 0.666 | 0.324 |
| Model${}_{2}$ | | 9.302 | 13.591 | 0.964 | 48.85% | 0.498 | 23.961 | 0.419 | 0.576 |
| MICDL | | 9.233 | 13.551 | 0.964 | 50.86% | 0.504 | 29.582 | 0.483 | 0.526 |
| Model${}_{1}$ | 14 | 9.309 | 13.657 | 0.964 | 49.57% | 0.482 | 27.888 | 0.596 | 0.369 |
| Model${}_{2}$ | | 9.196 | 13.539 | 0.964 | 50.38% | 0.503 | 27.930 | 0.513 | 0.492 |
| MICDL | | 9.146 | 13.492 | 0.964 | 51.11% | 0.495 | 30.461 | 0.628 | 0.363 |

| Model | Lag | MAE | RMSE | ${\mathit{R}}^{2}$ | Accuracy | AUC | GM | Sen | Spe |
|---|---|---|---|---|---|---|---|---|---|
| Model${}_{1}$ | 7 | 0.005 | 0.007 | 0.960 | 48.97% | 0.497 | 22.031 | 0.348 | 0.646 |
| Model${}_{2}$ | | 0.005 | 0.007 | 0.958 | 49.61% | 0.497 | 22.274 | 0.475 | 0.520 |
| MICDL | | 0.005 | 0.007 | 0.958 | 49.07% | 0.498 | 25.053 | 0.366 | 0.630 |
| Model${}_{1}$ | 14 | 0.005 | 0.007 | 0.962 | 49.34% | 0.499 | 22.418 | 0.391 | 0.607 |
| Model${}_{2}$ | | 0.006 | 0.009 | 0.936 | 49.23% | 0.501 | 23.351 | 0.340 | 0.662 |
| MICDL | | 0.007 | 0.009 | 0.953 | 49.23% | 0.495 | 26.157 | 0.442 | 0.549 |
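The metrics tabulated above can be computed as follows (a NumPy sketch on invented toy values; here GM is taken to be the geometric mean of sensitivity and specificity, a common definition and an assumption on our part, since the text does not spell it out):

```python
import numpy as np

# Toy ground truth / predictions (invented for illustration).
y_true = np.array([3.0, 5.0, 2.5, 7.0])
y_pred = np.array([2.5, 5.0, 4.0, 8.0])

# Regression metrics.
mae = np.mean(np.abs(y_true - y_pred))            # mean absolute error
rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))   # root mean squared error

# Directional (up/down) labels derived from price moves: 1 = up, 0 = down.
d_true = np.array([1, 1, 0, 0, 1, 0])
d_pred = np.array([1, 0, 0, 1, 1, 0])

tp = np.sum((d_pred == 1) & (d_true == 1))  # correctly predicted "up"
tn = np.sum((d_pred == 0) & (d_true == 0))  # correctly predicted "down"
fp = np.sum((d_pred == 1) & (d_true == 0))
fn = np.sum((d_pred == 0) & (d_true == 1))

accuracy = (tp + tn) / len(d_true)
sensitivity = tp / (tp + fn)   # true-positive ("up") rate, Sen
specificity = tn / (tn + fp)   # true-negative ("down") rate, Spe
gm = np.sqrt(sensitivity * specificity)
```

Reporting sensitivity and specificity (and their geometric mean) alongside accuracy matters here because, as the class-distribution table shows, the up/down classes are mildly imbalanced, so accuracy alone can flatter a model that leans toward the majority class.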

**Table 7.** Friedman aligned ranking (FAR) test and Finner post hoc test based on the mean absolute error (MAE) metric.

| Series | Friedman Ranking | Finner p-Value | ${\mathit{H}}_{0}$ |
|---|---|---|---|
| MICDL | 11.5 | − | − |
| Model${}_{2}$ | 7.167 | 0.29398 | accepted |
| Model${}_{1}$ | 9.833 | 0.38694 | accepted |

| Series | Friedman Ranking | Finner p-Value | ${\mathit{H}}_{0}$ |
|---|---|---|---|
| Model${}_{2}$ | 8.4167 | − | − |
| MICDL | 9.4167 | 0.745603 | accepted |
| Model${}_{1}$ | 10.667 | 0.71420 | accepted |

| Series | Friedman Ranking | Finner p-Value | ${\mathit{H}}_{0}$ |
|---|---|---|---|
| Model${}_{1}$ | 8.0833 | − | − |
| MICDL | 9.5 | 0.587788 | accepted |
| Model${}_{2}$ | 10.9167 | 0.645784 | accepted |

| Series | Friedman Ranking | Finner p-Value | ${\mathit{H}}_{0}$ |
|---|---|---|---|
| MICDL | 6.917 | − | − |
| Model${}_{2}$ | 8.250 | 0.66531 | accepted |
| Model${}_{1}$ | 13.333 | 0.07332 | rejected |

| Series | Friedman Ranking | Finner p-Value | ${\mathit{H}}_{0}$ |
|---|---|---|---|
| MICDL | 3.500 | − | − |
| Model${}_{2}$ | 12.500 | 0.006989 | rejected |
| Model${}_{1}$ | 12.500 | 0.006989 | rejected |

| Series | Friedman Ranking | Finner p-Value | ${\mathit{H}}_{0}$ |
|---|---|---|---|
| MICDL | 3.500 | − | − |
| Model${}_{1}$ | 11.833 | 0.006857 | rejected |
| Model${}_{2}$ | 13.167 | 0.003419 | rejected |
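The statistical comparison above uses the Friedman aligned ranking (FAR) test with the Finner post hoc procedure. SciPy does not implement FAR or Finner, but its standard Friedman test illustrates the underlying idea of comparing matched per-model scores; the score matrix below is invented for illustration:

```python
from scipy.stats import friedmanchisquare

# Each list holds one model's scores (e.g., MAE) over the same six
# matched evaluation settings; lower is better. Invented toy values.
micdl  = [0.11, 0.13, 0.10, 0.12, 0.11, 0.10]
model1 = [0.14, 0.15, 0.13, 0.16, 0.14, 0.15]
model2 = [0.13, 0.14, 0.12, 0.15, 0.13, 0.14]

# Null hypothesis: all models perform equivalently. A small p-value
# means at least one model differs, which is when a post hoc test
# (such as Finner in the paper) localizes the differences.
stat, p_value = friedmanchisquare(micdl, model1, model2)
```

Because `micdl` ranks first in every matched setting here, the test statistic reaches its maximum for this configuration and the null hypothesis is rejected.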

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Livieris, I.E.; Kiriakidou, N.; Stavroyiannis, S.; Pintelas, P.
An Advanced CNN-LSTM Model for Cryptocurrency Forecasting. *Electronics* **2021**, *10*, 287.
https://doi.org/10.3390/electronics10030287
