# A Novel Machine Learning Approach for Sentiment Analysis on Twitter Incorporating the Universal Language Model Fine-Tuning and SVM


## Abstract


## 1. Introduction

## 2. Related Work

#### 2.1. Transfer Learning with ULMFiT

#### 2.2. Language Model

#### 2.3. AWD-LSTM

#### 2.4. Support Vector Machine (SVM)

#### 2.5. Long Short-Term Memory (LSTM)

## 3. Methodology

#### 3.1. ULMFiT-SVM Model

- General-domain LM pre-training;
- Target task LM fine-tuning;
- Target task classifier.

#### 3.2. Pretrained Phase

#### 3.3. Fine-Tuning the Language Model

#### 3.3.1. Slanted Triangular Learning Rates (STLR)

- $T$ is the total number of training iterations (the number of epochs times the number of updates per epoch).
- $cut\_frac$ is the fraction of iterations during which the learning rate (LR) is increased.
- $cut = \lfloor T \cdot cut\_frac \rfloor$ is the iteration at which the LR switches from increasing to decreasing.
- For $t < cut$, $p = t/cut$ is the fraction of the increasing iterations completed so far.
- For $t \ge cut$, $p = 1 - \frac{t - cut}{cut \cdot (1/cut\_frac - 1)}$ is the fraction of the decreasing iterations remaining.
- $ratio$ specifies how many times smaller the lowest LR is than the maximum LR, $\eta_{max}$.
- $\eta_t = \eta_{max} \cdot \frac{1 + p\,(ratio - 1)}{ratio}$ is the learning rate at iteration $t$.
- We use $cut\_frac = 0.1$, $ratio = 32$, and $\eta_{max} = 0.01$.
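Using these definitions (the STLR schedule of Howard and Ruder), the learning rate at any iteration can be sketched as follows; the function name is ours, and the defaults follow the values listed above:

```python
import math

def stlr(t, T, cut_frac=0.1, ratio=32, eta_max=0.01):
    """Slanted triangular learning rate at iteration t of T total iterations."""
    cut = math.floor(T * cut_frac)  # iteration where the LR peaks
    if t < cut:
        p = t / cut  # fraction of the increasing phase completed
    else:
        # fraction of the decreasing phase remaining (p = 1 at the peak, 0 at t = T)
        p = 1 - (t - cut) / (cut * (1 / cut_frac - 1))
    return eta_max * (1 + p * (ratio - 1)) / ratio
```

At the peak iteration the LR equals $\eta_{max}$; at the first and last iterations it falls to $\eta_{max}/ratio$.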

#### 3.3.2. Discriminative Fine-Tuning (DFT)
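Discriminative fine-tuning assigns each layer its own learning rate instead of a single global one: following Howard and Ruder, the rate of each lower layer is obtained by dividing the rate of the layer above it by 2.6. A minimal sketch (the function name is illustrative):

```python
def discriminative_lrs(base_lr, n_layers, factor=2.6):
    """Per-layer learning rates: the last (top) layer gets base_lr, and each
    lower layer gets the rate of the layer above divided by `factor`."""
    return [base_lr / factor ** (n_layers - 1 - l) for l in range(n_layers)]
```

For example, with `base_lr = 0.01` and three layers, the rates are $0.01/2.6^2$, $0.01/2.6$, and $0.01$ from bottom to top.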

#### 3.4. Model Training

#### 3.4.1. Concat Pooling
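Concat pooling concatenates the last hidden state of the LSTM with the max-pooled and mean-pooled hidden states over all time steps, so the classifier sees both the final state and the most salient information from the whole document. A minimal list-based sketch (the function name and plain-list representation are illustrative):

```python
def concat_pooling(hidden_states):
    """Concatenate the last hidden state with max- and mean-pooled states
    over time; `hidden_states` is a list of per-time-step vectors."""
    T = len(hidden_states)      # number of time steps
    d = len(hidden_states[0])   # hidden size
    last = hidden_states[-1]
    maxpool = [max(h[j] for h in hidden_states) for j in range(d)]
    meanpool = [sum(h[j] for h in hidden_states) / T for j in range(d)]
    return last + maxpool + meanpool  # vector of length 3 * d
```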

#### 3.4.2. Gradual Unfreezing

- First, the last LSTM layer is unfrozen, and the model is fine-tuned for one epoch.
- The next lower layer is then unfrozen.
- This unfreezing procedure is repeated for all layers until the model is fine-tuned to convergence.
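The unfreezing schedule above can be sketched as follows (a simplified illustration that tracks layers by name and takes a training callback, rather than operating on real model parameters):

```python
def gradual_unfreeze(layer_names, fine_tune_one_epoch):
    """Unfreeze layers one at a time, starting from the last (output-side)
    layer, fine-tuning for one epoch after each unfreezing step."""
    frozen = {name: True for name in layer_names}
    history = []
    for name in reversed(layer_names):
        frozen[name] = False  # unfreeze the next lower layer
        trainable = [n for n in layer_names if not frozen[n]]
        fine_tune_one_epoch(trainable)  # one epoch on the currently unfrozen layers
        history.append(trainable)
    return history
```

Unfreezing gradually, rather than fine-tuning all layers at once, reduces the risk of catastrophic forgetting of the pretrained representations.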

#### 3.4.3. BPTT for Text Classification (BPT3C)

#### 3.4.4. Bidirectional Language Model

#### 3.5. Dataset Overview

#### 3.6. Dataset Preprocessing

#### 3.7. Word Embedding

- Extra spaces, tab characters, newline characters, and other special characters are removed or replaced with regular characters.
- To tokenize the data, we use the spaCy library. Since spaCy does not provide a parallel/multicore tokenizer, the fast.ai package is used to supply this feature. This parallel version of the spaCy tokenizer takes advantage of all available CPU cores and is significantly faster than the serial version.

- Building a vocabulary: a list of all the words that appear, in order of frequency.
- Replacing each word with its index in that list.
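The two numericalization steps can be sketched as follows (the helper names and the `max_vocab`/`min_freq` cutoffs are illustrative; the `<unk>`/`<pad>` special tokens are a common convention rather than the exact fast.ai tokens):

```python
from collections import Counter

def build_vocab(tokenized_docs, max_vocab=60000, min_freq=2):
    """Build a frequency-ordered vocabulary from tokenized documents."""
    counts = Counter(tok for doc in tokenized_docs for tok in doc)
    # Reserve index 0 for unknown words and index 1 for padding.
    itos = ["<unk>", "<pad>"] + [t for t, c in counts.most_common(max_vocab)
                                 if c >= min_freq]
    stoi = {t: i for i, t in enumerate(itos)}
    return itos, stoi

def numericalize(doc, stoi):
    """Replace each token with its vocabulary index (0 for unknown tokens)."""
    return [stoi.get(tok, 0) for tok in doc]
```

Capping the vocabulary size and dropping rare words keeps the embedding matrix manageable while mapping out-of-vocabulary tokens to a single unknown index.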

#### 3.8. Evaluation Metrics
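The results tables in Section 4 report accuracy; precision, recall, and F1-score are the standard companion metrics for sentiment classification. Their computation from confusion-matrix counts can be sketched as (function name is ours):

```python
def precision_recall_f1(tp, fp, fn):
    """Precision, recall, and F1-score from true-positive, false-positive,
    and false-negative counts; guards against division by zero."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1
```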

## 4. Performance Evaluation

#### 4.1. Evaluation Based on Testing Data

#### 4.2. Effect of Hyper-Parameters and the Number of Hidden Units on Model Efficiency

## 5. Discussion and Additional Comparisons

## 6. Conclusions

## Author Contributions

## Funding

## Institutional Review Board Statement

## Informed Consent Statement

## Data Availability Statement

## Conflicts of Interest

## Abbreviations

| Abbreviation | Meaning |
|---|---|
| ULMFiT | Universal Language Model Fine-Tuning |
| SVM | Support Vector Machine |
| SLT | Statistical Learning Theory |
| NLP | Natural Language Processing |
| AWD | ASGD Weight-Dropped |
| LSTM | Long Short-Term Memory Networks |
| RNN | Recurrent Neural Network |
| RBF | Radial Basis Function |
| DFT | Discriminative Fine-Tuning |

## References

- Asr, F.T.; Taboada, M. The data challenge in misinformation detection: Source reputation vs. content veracity. In Proceedings of the First Workshop on Fact Extraction and VERification (FEVER), Brussels, Belgium, November 2018; pp. 10–15.
- Mukherjee, S. Sentiment analysis. In ML.NET Revealed; Springer: Berlin/Heidelberg, Germany, 2021; pp. 113–127.
- Tompkins, J. Disinformation Detection: A review of linguistic feature selection and classification models in news veracity assessments. arXiv **2019**, arXiv:1910.12073.
- Hepburn, J. Universal Language model fine-tuning for patent classification. In Proceedings of the Australasian Language Technology Association Workshop, Dunedin, New Zealand, 11–12 December 2018; pp. 93–96.
- Katwe, P.; Khamparia, A.; Vittala, K.P.; Srivastava, O.A. Comparative Study of Text Classification and Missing Word Prediction Using BERT and ULMFiT. In Evolutionary Computing and Mobile Sustainable Networks; Springer: Berlin/Heidelberg, Germany, 2021; pp. 493–502.
- Shu, K.; Bhattacharjee, A.; Alatawi, F.; Nazer, T.H.; Ding, K.; Karami, M.; Liu, H. Combating disinformation in a social media age. Wiley Interdiscip. Rev. Data Min. Knowl. Discov. **2020**, 10, e1385.
- Howard, J.; Ruder, S. Universal language model fine-tuning for text classification. arXiv **2018**, arXiv:1801.06146.
- Chauhan, U.A.; Afzal, M.T.; Shahid, A.; Moloud, A.; Basiri, M.E.; Xujuan, Z. A comprehensive analysis of adverb types for mining user sentiments on amazon product reviews. World Wide Web **2020**, 23, 1811–1829.
- Liu, B. Sentiment Analysis: Mining Opinions, Sentiments, and Emotions; Cambridge University Press: Cambridge, UK, 2020.
- Zhao, W.; Peng, H.; Eger, S.; Cambria, E.; Yang, M. Towards scalable and reliable capsule networks for challenging NLP applications. arXiv **2019**, arXiv:1906.02829.
- Georgieva-Trifonova, T.; Duraku, M. Research on N-grams feature selection methods for text classification. In IOP Conference Series: Materials Science and Engineering; IOP Publishing: Bristol, UK, 2021; Volume 1031, p. 012048.
- Chaturvedi, I.; Ong, Y.S.; Tsang, I.W.; Welsch, R.E.; Cambria, E. Learning word dependencies in text by means of a deep recurrent belief network. Knowl.-Based Syst. **2016**, 108, 144–154.
- Basiri, M.E.; Kabiri, A. HOMPer: A new hybrid system for opinion mining in the Persian language. J. Inf. Sci. **2020**, 46, 101–117.
- Abdar, M.; Basiri, M.E.; Yin, J.; Habibnezhad, M.; Chi, G.; Nemati, S.; Asadi, S. Energy choices in Alaska: Mining people's perception and attitudes from geotagged tweets. Renew. Sustain. Energy Rev. **2020**, 124, 109781.
- Cambria, E.; Li, Y.; Xing, F.Z.; Poria, S.; Kwok, K. SenticNet 6: Ensemble application of symbolic and subsymbolic AI for sentiment analysis. In Proceedings of the 29th ACM International Conference on Information & Knowledge Management, Virtual, 19–23 October 2020; pp. 105–114.
- Zhang, L.; Ghosh, R.; Dekhil, M.; Hsu, M.; Liu, B. Combining Lexicon-Based and Learning-Based Methods for Twitter Sentiment Analysis; Technical Report HPL-2011; HP Laboratories: Palo Alto, CA, USA, 2011; Volume 89.
- Sharaf Al-deen, H.S.; Zeng, Z.; Al-sabri, R.; Hekmat, A. An Improved Model for Analyzing Textual Sentiment Based on a Deep Neural Network Using Multi-Head Attention Mechanism. Appl. Syst. Innov. **2021**, 4, 85.
- Singh, J.; Singh, G.; Singh, R. Optimization of sentiment analysis using machine learning classifiers. Hum.-Cent. Comput. Inf. Sci. **2017**, 7, 1–12.
- Dong, J.; Ding, C.; Mo, J. A low-profile wideband linear-to-circular polarization conversion slot antenna using metasurface. Materials **2020**, 13, 1164.
- Jakkula, V. Tutorial on support vector machine (SVM). Sch. EECS Wash. State Univ. **2006**, 37, 121–167.
- Suthaharan, S. Support vector machine. In Machine Learning Models and Algorithms for Big Data Classification; Springer: Berlin/Heidelberg, Germany, 2016; pp. 207–235.
- Pisner, D.A.; Schnyer, D.M. Support vector machine. In Machine Learning; Elsevier: Amsterdam, The Netherlands, 2020; pp. 101–121.
- Hope, T.; Resheff, Y.S.; Lieder, I. Learning TensorFlow: A Guide to Building Deep Learning Systems; O'Reilly Media, Inc.: Sebastopol, CA, USA, 2017.
- Tarasov, D. Deep recurrent neural networks for multiple language aspect-based sentiment analysis of user reviews. In Proceedings of the 21st International Conference on Computational Linguistics Dialogue, Sydney, NSW, Australia, July 2015; Volume 2, pp. 53–64.
- Tai, K.S.; Socher, R.; Manning, C.D. Improved semantic representations from tree-structured long short-term memory networks. arXiv **2015**, arXiv:1503.00075.
- Socher, R.; Perelygin, A.; Wu, J.; Chuang, J.; Manning, C.D.; Ng, A.Y.; Potts, C. Recursive deep models for semantic compositionality over a sentiment treebank. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, Seattle, WA, USA, 18–21 October 2013; pp. 1631–1642.
- Yu, F.; Liu, Q.; Wu, S.; Wang, L.; Tan, T. A Convolutional Approach for Misinformation Identification. In Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence, Melbourne, Australia, 19–25 August 2017; pp. 3901–3907.
- Czapla, P.; Howard, J.; Kardas, M. Universal language model fine-tuning with subword tokenization for Polish. arXiv **2018**, arXiv:1810.10222.
- Zhang, J.; Cui, L.; Fu, Y.; Gouza, F.B. Fake news detection with deep diffusive network model. arXiv **2018**, arXiv:1805.08751.
- Rane, A.; Kumar, A. Sentiment classification system of Twitter data for US airline service analysis. In Proceedings of the 2018 IEEE 42nd Annual Computer Software and Applications Conference (COMPSAC), Tokyo, Japan, 23–27 July 2018; Volume 1, pp. 769–773.
- Maas, A.; Daly, R.E.; Pham, P.T.; Huang, D.; Ng, A.Y.; Potts, C. Learning word vectors for sentiment analysis. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Portland, OR, USA, 19–24 June 2011; pp. 142–150.
- Abdul-Mageed, M.; Novak, P.K. Deep Learning for Natural Language Sentiment and Affect. Available online: http://kt.ijs.si/dlsa/2018-09-14-ECML-DLSA-tutorial.pdf (accessed on 14 October 2021).
- Rathi, M.; Malik, A.; Varshney, D.; Sharma, R.; Mendiratta, S. Sentiment analysis of tweets using machine learning approach. In Proceedings of the 2018 Eleventh International Conference on Contemporary Computing (IC3), Noida, India, 2–4 August 2018; pp. 1–3.
- Can, E.F.; Ezen-Can, A.; Can, F. Multilingual sentiment analysis: An RNN-based framework for limited data. arXiv **2018**, arXiv:1806.04511.
- Wang, J.; Yu, L.C.; Lai, K.R.; Zhang, X. Dimensional sentiment analysis using a regional CNN-LSTM model. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), Berlin, Germany, 7–12 August 2016; pp. 225–230.
- Singh, R.; Singh, R.; Bhatia, A. Sentiment analysis using machine learning technique to predict outbreaks and epidemics. Int. J. Adv. Sci. Res. **2018**, 3, 19–24.
- Basiri, M.E.; Nemati, S.; Abdar, M.; Cambria, E.; Acharya, U.R. ABCDM: An attention-based bidirectional CNN-RNN deep model for sentiment analysis. Future Gener. Comput. Syst. **2021**, 115, 279–294.
- Xie, Q.; Dai, Z.; Hovy, E.; Luong, M.T.; Le, Q.V. Unsupervised data augmentation for consistency training. arXiv **2019**, arXiv:1904.12848.
- Benesty, J.; Chen, J.; Huang, Y. Automatic Speech Recognition: A Deep Learning Approach. 2008. Available online: https://www.microsoft.com/en-us/research/publication/automatic-speech-recognition-a-deep-learning-approach/ (accessed on 12 October 2021).
- Lai, S.; Xu, L.; Liu, K.; Zhao, J. Recurrent convolutional neural networks for text classification. In Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence, Austin, TX, USA, 25–30 January 2015.
- Aldayel, H.K.; Azmi, A.M. Arabic tweets sentiment analysis: A hybrid scheme. J. Inf. Sci. **2016**, 42, 782–797.
- Rani, S.; Singh, J. Sentiment analysis of Tweets using support vector machine. Int. J. Comput. Sci. Mob. Appl. **2017**, 5, 83–91.
- Agarwal, A.; Yadav, A.; Vishwakarma, D.K. Multimodal sentiment analysis via RNN variants. In Proceedings of the 2019 IEEE International Conference on Big Data, Cloud Computing, Data Science & Engineering (BCD), Honolulu, HI, USA, 29–31 May 2019; pp. 19–23.

| Class | Split | Twitter US Airlines [30] | IMDB [31] | GOP Debate [32] |
|---|---|---|---|---|
| Positive | Train | 1773 | 18,750 | 1665 |
| | Test | 590 | 6250 | 555 |
| Negative | Train | 6884 | 18,750 | 6357 |
| | Test | 2294 | 6250 | 2104 |
| Neutral | Train | 2325 | – | 2393 |
| | Test | 774 | – | 797 |
| Total | – | 14,640 | 50,000 | 13,871 |

| Method | Accuracy |
|---|---|
| Support Vector Machine (SVM) | 78.5% |
| Bag-of-words SVM | 78.5% |
| Deep Learning Model with Dropouts in Keras | 77.9% |
| SIS-ULMFiT [7] | 84.1% |
| ULMFiT-SVM (Ours) | 99.78% |

| Hyper-Parameter Name | Meaning | Best Value |
|---|---|---|
| em-sz | Embedding vector size | 0.77 |
| nh | Hidden activations number | 0.000005 |
| nl | Number of layers | 3 |
| bs | Batch size | 32 |
| $\beta_1$ | Optimal bias | 0.8 |
| $\beta_2$ | Optimal bias | 0.99 |
| C-GAMMA | SVM parameters | 5.6569–1.0667 |

**Table 4.** Performance comparison of ULMFiT-SVM with several related approaches on the Twitter US Airlines, IMDB, and GOP Debate datasets.

| Dataset | Used Model | Accuracy |
|---|---|---|
| Twitter US Airlines [30] | SVM only [33] | 78% |
| | RNN/LSTM (ULMFiT) [34] | 77.8% |
| | LSTM, CNN [35] | 79.64% |
| | MultinomialNB [36] | $\pm 80$% |
| | ABCDM [37] | $\pm 92.75$% |
| | ULMFiT-SVM (Ours) | 99.78% |
| IMDB [31] | ToWE-SG [38] | 90.8% |
| | ULMFiT [7] | 95.4% |
| | BERT large fine-tune UDA [39] | 95.8% |
| | RCNN [40] | 84.70% |
| | ULMFiT-SVM (Ours) | 99.71% |
| GOP Debate [32] | SIS-ULMFiT [41] | 55.034% |
| | ULMFiT-SVM (Ours) | 95.78% |

**Table 6.** Training time, testing time, and number of support vectors (nSV) of ULMFiT-SVM compared with SVM for binary classification.

| Technique | Training Time (s) | Testing Time (s) | nSV |
|---|---|---|---|
| SVM | 901.095 | 18.533 | 3649 |
| ULMFiT-SVM | 682.10 | 4.321 | 3448 |

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

AlBadani, B.; Shi, R.; Dong, J.
A Novel Machine Learning Approach for Sentiment Analysis on Twitter Incorporating the Universal Language Model Fine-Tuning and SVM. *Appl. Syst. Innov.* **2022**, *5*, 13.
https://doi.org/10.3390/asi5010013
