# Price Movement Prediction of Cryptocurrencies Using Sentiment Analysis and Machine Learning

## Abstract

## 1. Introduction

## 2. Materials and Methods

#### 2.1. Market Data

#### 2.2. Social Data

- Was created during the study period: earlier tweets are not taken into account, even though they may influence current behavior, as such analysis is outside the scope of this study.
- Contains the name (e.g., bitcoin) or the ticker symbol (e.g., btc) of one of the analyzed currencies in either its text fields or its tags: this gives a high degree of confidence that the tweet is at least related to one of the cryptocurrencies in question.
- Is written in English: being dictionary based, our sentiment analysis tool only works with the English language.
- Is not duplicated: retweets were allowed, as they may signal a sentiment trend, but duplicated tweets were not taken into consideration, as this type of activity is mainly displayed by bot accounts.
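
The four criteria above can be sketched as a simple filter. This is a minimal illustration, not the collection pipeline used in the study: the tweet fields (`text`, `tags`, `lang`, `created_at`, `is_retweet`), the keyword list, the word-boundary matching, and the rule that a duplicate is a non-retweet repeating an already-seen text are all assumptions made for the example.

```python
import re
from datetime import datetime

# Assumed keyword list: names and ticker symbols of the four analyzed currencies.
KEYWORDS = ("bitcoin", "btc", "ethereum", "eth", "ripple", "xrp", "litecoin", "ltc")
# Word boundaries keep "eth" from matching inside words such as "method".
KEYWORD_RE = re.compile(r"\b(" + "|".join(KEYWORDS) + r")\b", re.IGNORECASE)


def mentions_currency(tweet: dict) -> bool:
    """True if the text or any tag contains a currency name or ticker."""
    haystack = " ".join([tweet["text"], *tweet.get("tags", [])])
    return KEYWORD_RE.search(haystack) is not None


def filter_tweets(tweets: list[dict], start: datetime, end: datetime) -> list[dict]:
    """Keep tweets that satisfy the four selection criteria, in input order."""
    seen_texts = set()
    kept = []
    for tweet in tweets:
        # Criterion 1: created within the study period.
        if not (start <= tweet["created_at"] <= end):
            continue
        # Criterion 3: written in English (hypothetical `lang` field).
        if tweet.get("lang") != "en":
            continue
        # Criterion 2: mentions one of the analyzed currencies.
        if not mentions_currency(tweet):
            continue
        # Criterion 4 (assumption): a duplicate is a non-retweet whose
        # normalized text was already seen; retweets are always allowed.
        normalized = " ".join(tweet["text"].lower().split())
        if not tweet.get("is_retweet", False) and normalized in seen_texts:
            continue
        seen_texts.add(normalized)
        kept.append(tweet)
    return kept
```

Calling `filter_tweets(tweets, start, end)` with two `datetime` bounds returns the surviving tweets in their original order. Exact-text matching is only one possible definition of a duplicate; a real pipeline might compare tweet identifiers or use fuzzy matching instead.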

#### 2.3. Sentiment Analysis

#### 2.4. Feature Vectors

- $neu$ is the average of neutral sentiments, defined as $\frac{\sum_{i=1}^{n} t_{neu}}{n}$
- $neg$ is the average of negative sentiments, defined as $\frac{\sum_{i=1}^{n} t_{neg}}{n}$
- $norm$ is the average of the per-tweet sums of the valence scores of each word, defined as $\frac{\sum_{i=1}^{n} t_{norm}}{n}$
- $pos$ is the average of positive sentiments, defined as $\frac{\sum_{i=1}^{n} t_{pos}}{n}$
- $pol$ is the geometric mean of $pos$ and $neg$, defined as $\sqrt{V_{pos} V_{neg}}$
- close is the closing price in the time period
- high is the highest price in the time period
- low is the lowest price in the time period
- open is the opening price in the time period
- volumeto is the trading volume for the time period
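
As an illustration of the definitions above, the following sketch assembles one feature vector for a single time period. It assumes each tweet has already been scored by the sentiment tool into a dict with the hypothetical keys `neu`, `neg`, `pos` and `norm`, and that the market data for the period is a dict with the keys `close`, `high`, `low`, `open` and `volumeto`. The function name and data layout are assumptions for the example, not the study's code.

```python
import math
from statistics import fmean


def build_feature_vector(tweet_scores: list[dict], candle: dict) -> dict:
    """Combine averaged tweet sentiment with market data for one time period.

    `tweet_scores` must be non-empty; `fmean` raises StatisticsError otherwise.
    """
    # Averages over the n tweets collected in the period.
    neu = fmean([s["neu"] for s in tweet_scores])
    neg = fmean([s["neg"] for s in tweet_scores])
    norm = fmean([s["norm"] for s in tweet_scores])
    pos = fmean([s["pos"] for s in tweet_scores])
    # Polarity: geometric mean of the averaged positive and negative scores.
    pol = math.sqrt(pos * neg)
    return {
        "neu": neu,
        "neg": neg,
        "norm": norm,
        "pos": pos,
        "pol": pol,
        "close": candle["close"],
        "high": candle["high"],
        "low": candle["low"],
        "open": candle["open"],
        "volumeto": candle["volumeto"],
    }
```

Dropping the five sentiment keys or the five market keys from the returned dict gives the market-only and Twitter-only inputs that the results tables compare.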

#### 2.5. Multi-Layer Perceptron

#### 2.6. Support Vector Machines

#### 2.7. Random Forests

#### 2.8. Training

## 3. Results

#### 3.1. Setup

#### 3.2. Evaluation

- ${t}_{p}$ = Number of true positive values
- ${t}_{n}$ = Number of true negative values
- ${f}_{p}$ = Number of false positive values
- ${f}_{n}$ = Number of false negative values.
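
For reference, the four counts above combine into the metrics reported in the results tables as follows. This sketch uses the standard binary definitions of accuracy, precision, recall and $F_1$; whether the reported values are per-class averages is not stated here, so treat it as an illustration rather than the exact evaluation code. The function name and the convention of returning 0.0 for an undefined ratio are assumptions.

```python
def classification_metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Standard binary metrics from confusion-matrix counts.

    Ratios with a zero denominator are reported as 0.0 (a convention
    chosen for this sketch).
    """
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}
```

For example, `classification_metrics(tp=6, tn=5, fp=2, fn=3)` gives an accuracy of 11/16, a precision of 6/8, a recall of 6/9 and an $F_1$ score of 12/17.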

#### 3.3. Results

## 4. Discussion

## 5. Conclusions

## Author Contributions

## Funding

## Acknowledgments

## Conflicts of Interest

## Data Availability

## References

Cryptocurrency | Collected Tweets | Percentage of Total |
---|---|---|
Bitcoin | 13,096,598 | 63% |
Ethereum | 5,366,126 | 25.81% |
Ripple | 1,143,634 | 5.5% |
Litecoin | 1,183,214 | 5.69% |

Cryptocurrency | Price Increased | Price Decreased |
---|---|---|
Bitcoin | 28 | 32 |
Ethereum | 28 | 32 |
Ripple | 23 | 37 |
Litecoin | 29 | 31 |

**Table 3.** Results of applying multi-layer perceptron (MLP), support vector machine (SVM) and random forest (RF) using Twitter data, market data or both for predicting daily market movements for Bitcoin.

Model | Accuracy (95% CI) | Precision | Recall | $F_1$ Score |
---|---|---|---|---|
MLP Twitter | 0.39 (±0.02) | 0.38 | 0.39 | 0.38 |
MLP Market | 0.72 (±0.03) | 0.74 | 0.72 | 0.71 |
MLP Twitter and Market | 0.72 (±0.06) | 0.76 | 0.72 | 0.72 |
SVM Twitter | 0.50 (±0.03) | 0.29 | 0.50 | 0.37 |
SVM Market | 0.55 (±0.03) | 0.53 | 0.56 | 0.47 |
SVM Twitter and Market | 0.55 (±0.03) | 0.31 | 0.56 | 0.40 |
RF Twitter | 0.44 (±0.04) | 0.50 | 0.80 | 0.62 |
RF Market | 0.61 (±0.04) | 0.67 | 0.25 | 0.36 |
RF Twitter and Market | 0.44 (±0.04) | 0.28 | 0.44 | 0.34 |
Random | 0.50 (±0.28) | 0.49 | 0.50 | 0.50 |
Majority | 0.55 (±0.0) | 0.31 | 0.56 | 0.40 |

**Table 4.** Results of applying MLP, SVM and RF using Twitter data, market data or both for predicting daily market movements for Ethereum.

Model | Accuracy (95% CI) | Precision | Recall | $F_1$ Score |
---|---|---|---|---|
MLP Twitter | 0.39 (±0.02) | 0.44 | 0.39 | 0.38 |
MLP Market | 0.44 (±0.02) | 0.44 | 0.39 | 0.35 |
MLP Twitter and Market | 0.44 (±0.03) | 0.56 | 0.44 | 0.39 |
SVM Twitter | 0.39 (±0.03) | 0.15 | 0.39 | 0.22 |
SVM Market | 0.39 (±0.03) | 0.15 | 0.39 | 0.22 |
SVM Twitter and Market | 0.39 (±0.03) | 0.15 | 0.39 | 0.22 |
RF Twitter | 0.33 (±0.03) | 0.14 | 0.33 | 0.19 |
RF Market | 0.28 (±0.03) | 0.12 | 0.28 | 0.17 |
RF Twitter and Market | 0.39 (±0.03) | 0.15 | 0.39 | 0.22 |
Random | 0.50 (±0.28) | 0.54 | 0.50 | 0.49 |
Majority | 0.61 (±0.0) | 0.37 | 0.61 | 0.46 |

**Table 5.** Results of applying MLP, SVM and RF using Twitter data, market data or both for predicting daily market movements for Ripple.

Model | Accuracy (95% CI) | Precision | Recall | $F_1$ Score |
---|---|---|---|---|
MLP Twitter | 0.54 (±0.03) | 0.50 | 0.50 | 0.50 |
MLP Market | 0.64 (±0.04) | 0.68 | 0.67 | 0.66 |
MLP Twitter and Market | 0.56 (±0.02) | 0.56 | 0.56 | 0.55 |
SVM Twitter | 0.53 (±0.04) | 0.60 | 0.56 | 0.50 |
SVM Market | 0.50 (±0.04) | 0.50 | 0.50 | 0.41 |
SVM Twitter and Market | 0.50 (±0.04) | 0.25 | 0.50 | 0.33 |
RF Twitter | 0.39 (±0.03) | 0.39 | 0.39 | 0.39 |
RF Market | 0.50 (±0.03) | 0.50 | 0.50 | 0.41 |
RF Twitter and Market | 0.44 (±0.03) | 0.44 | 0.44 | 0.44 |
Random | 0.50 (±0.28) | 0.50 | 0.50 | 0.49 |
Majority | 0.50 (±0.0) | 0.25 | 0.50 | 0.33 |

**Table 6.** Results of applying MLP, SVM and RF using Twitter data, market data or both for predicting daily market movements for Litecoin.

Model | Accuracy (95% CI) | Precision | Recall | $F_1$ Score |
---|---|---|---|---|
MLP Twitter | 0.59 (±0.05) | 0.61 | 0.61 | 0.61 |
MLP Market | 0.61 (±0.04) | 0.78 | 0.61 | 0.54 |
MLP Twitter and Market | 0.61 (±0.04) | 0.62 | 0.61 | 0.60 |
SVM Twitter | 0.52 (±0.04) | 0.50 | 0.50 | 0.41 |
SVM Market | 0.52 (±0.04) | 0.25 | 0.50 | 0.33 |
SVM Twitter and Market | 0.66 (±0.04) | 0.80 | 0.67 | 0.62 |
RF Twitter | 0.50 (±0.03) | 0.50 | 0.50 | 0.49 |
RF Market | 0.50 (±0.03) | 0.50 | 0.50 | 0.49 |
RF Twitter and Market | 0.61 (±0.03) | 0.66 | 0.61 | 0.58 |
Random | 0.50 (±0.28) | 0.50 | 0.50 | 0.50 |
Majority | 0.50 (±0.0) | 0.25 | 0.50 | 0.33 |

