Sentiment Analysis on Twitter Data of World Cup Soccer Tournament Using Machine Learning
Abstract
1. Introduction
2. Related Work
3. Dataset and Preprocessing
Data Cleaning and Noise Reduction
4. Linguistic Data Processing Using Natural Language Processing (NLP)
4.1. Computing Sentiment Polarity
4.2. Machine Learning Techniques for Sentiment Analysis
- X: the instance to be classified (the feature representation of a tweet)
- Y: the sentiment class (positive, negative, or neutral)
- P(X|Y): the likelihood of observing instance X in a particular class, for each value of Y (the class-conditional density)
- P(Y): the prior probability of class Y
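Combining these terms, Bayes' rule gives the posterior probability of a sentiment class for a tweet, and the Naïve Bayes classifier selects the class that maximizes it under the assumption that word features are conditionally independent; this is the standard formulation, stated here for reference:

$$P(Y \mid X) = \frac{P(X \mid Y)\,P(Y)}{P(X)}, \qquad \hat{y} = \arg\max_{Y}\; P(Y)\prod_{i=1}^{n} P(x_i \mid Y),$$

where $x_1, \dots, x_n$ are the word features of tweet $X$.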
5. Results and Discussion
6. Conclusions and Future Work
Author Contributions
Funding
Conflicts of Interest
References
Hashtag | Collection Period | Number of Tweets | Approximate File Size |
---|---|---|---|
#brazil2014 | 8 June to 15 June 2014 (8 days) | 1,415,958 | 268 MB |
#worldcup | 6 June to 14 July 2014 (40 days) | 44,040,192 | 4 GB |
Game hashtags (e.g., #ALGRUS Algeria vs. Russia) | June–July 2014 | Approx. 2 million tweets | More than 2 GB |
Twitter Statistic | Value |
---|---|
Monthly active users | 313 M |
Unique monthly visits to sites with embedded tweets | 1 billion |
Active users on mobile | 82% |
Employees around the world | 3860 |
Offices around the world | 35+ |
Accounts outside the U.S. | 79% |
Languages supported | 40+ |
Employees in technical roles | 40% |
Id | Created at | Screen Name | Followers Count | Retweet | Text |
---|---|---|---|---|---|
4760000000 | Sun 8 June 19:49:54 2014 CDT | ravi2talk28 | 4 | TRUE | RT @MeetTheMazda: birthday From Waka Waka for South Africa to this for Brazil. LOVE Shakira _ÙÕÄ #Brazil2014 |
4760000000 | Mon 9 June 23:59:58 2014 CDT | Franc**** | 185 | FALSE | Feel it, it‘s here I know how Brazilians r feeling, that feeling is special @robertmarawa @YesWeCrann @Soccer_Laduma @GoalcomSA |
47600000002 | Mon 9 June 23:59:16 2014 CDT | B**Farlz | 27 | TRUE | RT @Socceroos: NEWS | Chile are likely to be without Arturo Vidal for our #Brazil2014 opener - http://t.co/yJ4ej6M6lS #GoSocceroos #CHIAUS |
Input: Twitter comments or text data
Output: Preprocessed text data

For each comment in the Twitter data file:
Initialize a temporary empty string processedTweet to store the output.
1. Replace all URLs (http/https links) with the word ‘URL’ using regular expression methods and store the result in processedTweet.
2. Replace all ‘@username’ mentions with the word ‘AT_USER’ and store the result in processedTweet.
3. Filter all #hashtags and ‘RT’ from the comment and store the result in processedTweet.
4. Look for repetitions of two or more characters and replace them with the character itself. Store the result in processedTweet.
5. Filter all additional special characters (: \ | [ ] ; { } – + ( ) < > ? ! @ # % * ,) from the comment. Store the result in processedTweet.
6. Remove the word ‘URL’ that was inserted in step 1 and store the result in processedTweet.
7. Remove the word ‘AT_USER’ that was inserted in step 2 and store the result in processedTweet.
Return processedTweet.
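A minimal Python sketch of this cleaning routine, assuming the raw tweet is available as a plain string; the function name preprocess_tweet and the exact regular expressions are illustrative, not taken from the original implementation:

```python
import re

def preprocess_tweet(tweet):
    # Step 1: replace URLs with the placeholder 'URL'
    processed = re.sub(r'(https?://\S+|www\.\S+)', 'URL', tweet)
    # Step 2: replace @username mentions with the placeholder 'AT_USER'
    processed = re.sub(r'@\w+', 'AT_USER', processed)
    # Step 3: drop the retweet marker and keep hashtag words without the '#'
    processed = re.sub(r'\bRT\b', '', processed)
    processed = re.sub(r'#(\w+)', r'\1', processed)
    # Step 4: collapse runs of a repeated character (one reading of "two or more repetitions")
    processed = re.sub(r'(.)\1{2,}', r'\1', processed)
    # Step 5: remove the remaining special characters listed in the algorithm
    processed = re.sub(r'[:\\|\[\];{}\-+()<>?!@#%*,]', ' ', processed)
    # Steps 6-7: remove the placeholders inserted in steps 1 and 2
    processed = re.sub(r'\b(URL|AT_USER)\b', '', processed)
    # normalize whitespace before returning
    return ' '.join(processed.split())

print(preprocess_tweet("RT @Socceroos: NEWS | Chile are likely to be without Arturo Vidal "
                       "http://t.co/yJ4ej6M6lS #GoSocceroos #CHIAUS"))
```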
Preprocessing Task | File Size (KB) | File Size (% of Original) | Processing Time (s) |
---|---|---|---|
Before preprocessing | 4308 | 100% | NA |
After removing URLs | 3695 | 85.77% | 1.15 |
Renaming and removing “RT @username” from the tweets | 3518 | 81.66% | 1.32 |
Filtering #Hashtags from tweets | 3442 | 79.90% | 2.06 |
Removing repeated characters | 3431 | 79.64% | 2.42 |
Removing special characters | 3420 | 79.39% | 2.70 |
Input: Filtered tweets
Output: Tokenized words

For each processed tweet:
Tokenize the tweet by passing it to the TweetTokenizer method and append the tokens to the tokenized sentence.
Return the tokenized sentence.
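A minimal sketch of this step using NLTK's tweet-aware tokenizer; the constructor options shown (preserve_case, reduce_len) are illustrative and not settings reported in the paper:

```python
from nltk.tokenize import TweetTokenizer

# TweetTokenizer handles Twitter-specific tokens (hashtags, emoticons, elongated words)
tokenizer = TweetTokenizer(preserve_case=True, reduce_len=True)

tweet = "I am connect with world cup and it' GOOD Connect each other with team World Cup Song"
tokens = tokenizer.tokenize(tweet)
print(tokens)
```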
Input: Tokenized words
Output: Stemmed and lemmatized words

Initialize the StemmedSentence variable to an empty list.
For each word in the word tokens:
If the length of the word is greater than 2:
Stem the word using the PorterStemmer object.
Lemmatize the word using the WordNetLemmatizer object.
Append the result to the StemmedSentence list.
Return the StemmedSentence list.
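A minimal sketch of this step with NLTK's PorterStemmer and WordNetLemmatizer; it assumes the WordNet corpus is available (e.g., after nltk.download('wordnet')), and the helper name stem_and_lemmatize is illustrative:

```python
from nltk.stem import PorterStemmer, WordNetLemmatizer

stemmer = PorterStemmer()
lemmatizer = WordNetLemmatizer()

def stem_and_lemmatize(tokens):
    stemmed_sentence = []
    for word in tokens:
        if len(word) > 2:                      # skip very short tokens, as in the listing above
            stemmed = stemmer.stem(word)       # e.g., 'connected' -> 'connect'
            lemma = lemmatizer.lemmatize(stemmed)
            stemmed_sentence.append(lemma)
    return stemmed_sentence

print(stem_and_lemmatize(['I', 'am', 'connected', 'with', 'the', 'world', 'cup', 'audiences']))
```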
Input: Stemmed and lemmatized words
Output: Negation-tagged marks: ‘1’ for a word used in a negative context and ‘0’ for a word used in a positive context

Initialize Total_Mark_List.
For each neg_mark returned by mark_negation:
Parse the last four characters of neg_mark.
If the word carries the tag ‘_NEG’:
Add 1 to Total_Mark_List.
Else:
Add 0 to Total_Mark_List.
Return Total_Mark_List.
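A minimal sketch of this step using NLTK's mark_negation utility, which appends the '_NEG' suffix to words inside a negation scope; converting the suffixes to 0/1 marks with a list comprehension is an illustrative choice, not the paper's code:

```python
from nltk.sentiment.util import mark_negation

tokens = ['I', "don't", 'enjoy', 'this', 'game', 'it', 'was', 'disgusting',
          'and', 'all', 'the', 'audience', 'was', 'upset']
marked = mark_negation(tokens)   # words after the negation cue receive the '_NEG' suffix
print(marked)

# Translate the '_NEG' suffixes into the 0/1 marks used by the listing above
negation_marks = [1 if word.endswith('_NEG') else 0 for word in marked]
print(negation_marks)
```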
Example: Negation Words | Example 1 | Example 2 |
---|---|---|
Input words | [‘I’, ‘am’, ‘connect’, ‘with’, ‘world’, ‘cup’, ‘and’, “it’”, ‘GOOD’, ‘Connect’, ‘each’, ‘other’, ‘with’, ‘team’, ‘World’, ‘Cup’, ‘Song’, ‘connect’, ‘Worldcup’, ‘2014’, ‘Brazil’, ‘2014’] | [‘I’, “don’t”, ‘enjoy’, ‘this’, ‘game’, ‘it’, ‘was’, ‘disgusting’, ‘and’, ‘all’, ‘the’, ‘audience’, ‘was’, ‘upset’] |
First step (Mark Negation) | [‘I’, ‘am’, ‘connect’, ‘with’, ‘world’, ‘cup’, ‘and’, “it’”, ‘GOOD’, ‘Connect’, ‘each’, ‘other’, ‘with’, ‘team’, ‘World’, ‘Cup’, ‘Song’, ‘connect’, ‘Worldcup’, ‘2014’, ‘Brazil’, ‘2014’] | [‘I’, “don’t”, ‘enjoy_NEG’, ‘this_NEG’, ‘game_NEG’, ‘it_NEG’, ‘was_NEG’, ‘disgust_NEG’, ‘and_NEG’, ‘all_NEG’, ‘the_NEG’, ‘audienc_NEG’, ‘was_NEG’, ‘upset_NEG’] |
Second Step List of score: ‘1’—Negative sense word meaning in sentence ‘0’—Positive sense word meaning in sentence | [‘0’, ‘0’, ‘0’, ‘0’, ‘0’, ‘0’, ‘0’, ‘0’, ‘0’, ‘0’, ‘0’, ‘0’, ‘0’, ‘0’, ‘0’, ‘0’, ‘0’, ‘0’, ‘0’, ‘0’, ‘0’, ‘0’, ‘0’] | [‘0’, ‘0’, ‘1’, ‘1’, ‘1’, ‘1’, ‘1’, ‘1’, ‘1’, ‘1’, ‘1’, ‘1’, ‘1’, ‘1’] |
Input: POS (part-of-speech) tagged word and negation mark (‘1’ for negative or ‘0’ for positive)
Output: A unique synset for the word, with its part of speech, that is closest in meaning to the word.

Method GetSynset(POS-tagged word, negation mark):
Sanitize the part-of-speech (POS) tag to a WordNet-accepted POS.
For each synset in the WordNet synsets for (word, POS tag): # returns the list of synsets for the word
For each lemma in the synset:
If the word equals the lemma name:
Append it to the Synonyms list (words with the same meaning).
If the lemma has antonyms:
Append them to the Antonyms list (words with the opposite meaning).
If the negation mark is ‘0’ and the Synonyms list is not empty:
Return the first synonym of the word with its POS tag from the Synonyms list.
Else if the negation mark is ‘1’ and the Antonyms list is not empty:
Return the first antonym of the word with its POS tag from the Antonyms list.
Else:
Return the same word and POS as requested.
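A minimal sketch of this lookup with NLTK's WordNet interface; the helper name get_synset_word is illustrative, and the WordNet corpus must be available:

```python
from nltk.corpus import wordnet as wn

def get_synset_word(word, pos, neg_mark):
    """Return a representative synonym (neg_mark == 0) or antonym (neg_mark == 1) for a word."""
    synonyms, antonyms = [], []
    for synset in wn.synsets(word, pos=pos):
        for lemma in synset.lemmas():
            if lemma.name() == word:
                synonyms.append(lemma.name())
                for antonym in lemma.antonyms():
                    antonyms.append(antonym.name())
    if neg_mark == 0 and synonyms:
        return synonyms[0], pos
    if neg_mark == 1 and antonyms:
        return antonyms[0], pos
    return word, pos

print(get_synset_word('good', wn.ADJ, 0))   # -> ('good', 'a')
print(get_synset_word('good', wn.ADJ, 1))   # -> ('bad', 'a'), via the antonym relation
```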
Example: Word Sanitize | Tagged Text Data |
---|---|
POS tagged sentence | [(‘I’, ‘PRP’), (‘am’, ‘VBP’), (‘connect’, ‘JJ’), (‘with’, ‘IN’), (‘world’, ‘NN’), (‘cup’, ‘NN’), (‘and’, ‘CC’), (“it’”, ‘VB’), (‘GOOD’, ‘JJ’), (‘Connect’, ‘NNP’), (‘each’, ‘DT’), (‘other’, ‘JJ’), (‘with’, ‘IN’), (‘team’, ‘NN’), (‘World’, ‘NNP’), (‘Cup’, ‘NNP’), (‘Song’, ‘NNP’), (‘connect’, ‘NN’), (‘Worldcup’, ‘NNP’), (‘2014’, ‘CD’), (‘Brazil’, ‘NNP’), (‘2014’, ‘CD’)] |
Sanitized POS tags with word | (I , None) (am , v) (connect , a) (with, None) (world, n) (cup, n) (and, None) (it’, v) (GOOD, a) (Connect, n) (each, None) (other, a) (with, None) (team, n) (World, n) (Cup, n) (Song, n) (connect, n) (Worldcup, n) (2014, None) (Brazil, n) (2014, None) |
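A minimal sketch of the POS "sanitizing" step, mapping Penn Treebank tags to WordNet POS constants; the mapping below is inferred from the example output above rather than taken from the original implementation:

```python
from nltk.corpus import wordnet as wn

def sanitize_pos(treebank_tag):
    """Map a Penn Treebank tag to a WordNet POS ('a', 'v', 'n', 'r'), or None if there is no counterpart."""
    if treebank_tag.startswith('J'):
        return wn.ADJ    # adjectives (JJ, JJR, JJS) -> 'a'
    if treebank_tag.startswith('V'):
        return wn.VERB   # verbs (VB, VBP, ...) -> 'v'
    if treebank_tag.startswith('N'):
        return wn.NOUN   # nouns (NN, NNP, ...) -> 'n'
    if treebank_tag.startswith('R'):
        return wn.ADV    # adverbs (RB, ...) -> 'r'
    return None          # determiners, prepositions, numbers, pronouns, etc.

tagged = [('connect', 'JJ'), ('world', 'NN'), ('am', 'VBP'), ('with', 'IN'), ('2014', 'CD')]
print([(word, sanitize_pos(tag)) for word, tag in tagged])
```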
Example: Synsets for Word | Text Data |
---|---|
Sanitized POS tags with word | (I, None) (am, v) (connect, a) (with, None) (world, n) (cup, n) (and, None) (it‘, v) (GOOD, a) (Connect, n) (each, None) (other, a) (with, None) (team, n) (World, n) (Cup, n) (Song, n) (connect, n) (Worldcup, n) (2014, None) (Brazil, n) (2014, None) |
Synsets obtained for each word followed by POS tag and sense number # | [Synset(‘iodine.n.01’), Synset(‘one.n.01’), Synset(‘i.n.03’), Synset(‘one.s.01’)] [Synset(‘be.v.01’), Synset(‘be.v.02’), Synset(‘be.v.03’), Synset(‘exist.v.01’), Synset(‘be.v.05’), Synset(‘equal.v.01’), Synset(‘constitute.v.01’), Synset(‘be.v.08’), Synset(‘embody.v.02’), Synset(‘be.v.10’), Synset(‘be.v.11’), Synset(‘be.v.12’), Synset(‘cost.v.01’)] [Synset(‘universe.n.01’), Synset(‘world.n.02’), Synset(‘world.n.03’), Synset(‘earth.n.01’), Synset(‘populace.n.01’), Synset(‘world.n.06’), Synset(‘worldly_concern.n.01’), Synset(‘world.n.08’)] [Synset(‘cup.n.01’), Synset(‘cup.n.02’), Synset(‘cup.n.03’), Synset(‘cup.n.04’), Synset(‘cup.n.05’), Synset(‘cup.n.06’), Synset(‘cup.n.07’), Synset(‘cup.n.08’)] [Synset(‘good.a.01’), Synset(‘full.s.06’), Synset(‘good.a.03’), Synset(‘estimable.s.02’), Synset(‘beneficial.s.01’), Synset(‘good.s.06’), Synset(‘good.s.07’), Synset(‘adept.s.01’), Synset(‘good.s.09’), Synset(‘dear.s.02’), Synset(‘dependable.s.04’), Synset(‘good.s.12’), Synset(‘good.s.13’), Synset(‘effective.s.04’), Synset(‘good.s.15’), Synset(‘good.s.16’), Synset(‘good.s.17’), Synset(‘good.s.18’), Synset(‘good.s.19’), Synset(‘good.s.20’), Synset(‘good.s.21’)] [Synset(‘each.s.01’), Synset(‘each.r.01’)] [Synset(‘other.a.01’), Synset(‘other.s.02’), Synset(‘early.s.03’), Synset(‘other.s.04’)] [Synset(‘team.n.01’), Synset(‘team.n.02’)] [Synset(‘universe.n.01’), Synset(‘world.n.02’), Synset(‘world.n.03’), Synset(‘earth.n.01’), Synset(‘populace.n.01’), Synset(‘world.n.06’), Synset(‘worldly_concern.n.01’), Synset(‘world.n.08’)] [Synset(‘cup.n.01’), Synset(‘cup.n.02’), Synset(‘cup.n.03’), Synset(‘cup.n.04’), Synset(‘cup.n.05’), Synset(‘cup.n.06’), Synset(‘cup.n.07’), Synset(‘cup.n.08’)] [Synset(‘song.n.01’), Synset(‘song.n.02’), Synset(‘song.n.03’), Synset(‘birdcall.n.01’), Synset(‘song.n.05’), Synset(‘sung.n.01’)] [Synset(‘brazil.n.01’), Synset(‘brazil_nut.n.02’)] |
Example: Obtaining Lemmas (Head Words) from the Synsets | Text Data (Example Shown for the Synset Term “Good”) |
---|---|
Synsets obtained for each word followed by POS tag and sense number # | [Synset(‘good.a.01’), Synset(‘full.s.06’), Synset(‘good.a.03’), Synset(‘estimable.s.02’), Synset(‘beneficial.s.01’), Synset(‘good.s.06’), Synset(‘good.s.07’), Synset(‘adept.s.01’), Synset(‘good.s.09’), Synset(‘dear.s.02’), Synset(‘dependable.s.04’), Synset(‘good.s.12’), Synset(‘good.s.13’), Synset(‘effective.s.04’), Synset(‘good.s.15’), Synset(‘good.s.16’), Synset(‘good.s.17’), Synset(‘good.s.18’), Synset(‘good.s.19’), Synset(‘good.s.20’), Synset(‘good.s.21’)] |
Lemmas for the synsets Here the last or end word is known as ‘lemma’s name’ | Lemma(‘good.a.01.good’) Lemma(‘full.s.06.full’) Lemma(‘full.s.06.good’) Lemma(‘good.a.03.good’) Lemma(‘estimable.s.02.estimable’) Lemma(‘estimable.s.02.good’) Lemma(‘estimable.s.02.honorable’) Lemma(‘estimable.s.02.respectable’) Lemma(‘beneficial.s.01.beneficial’) Lemma(‘beneficial.s.01.good’) Lemma(‘good.s.06.good’) Lemma(‘good.s.07.good’) Lemma(‘good.s.07.just’) Lemma(‘good.s.07.upright’) Lemma(‘adept.s.01.adept’) Lemma(‘adept.s.01.expert’) Lemma(‘adept.s.01.good’) Lemma(‘adept.s.01.practiced’) Lemma(‘adept.s.01.proficient’) Lemma(‘adept.s.01.skillful’) Lemma(‘adept.s.01.skilful’) Lemma(‘good.s.09.good’) Lemma(‘dear.s.02.dear’) Lemma(‘dear.s.02.good’) Lemma(‘dear.s.02.near’) Lemma(‘dependable.s.04.dependable’) Lemma(‘dependable.s.04.good’) Lemma(‘dependable.s.04.safe’) Lemma(‘dependable.s.04.secure’) Lemma(‘good.s.12.good’) Lemma(‘good.s.12.right’) Lemma(‘good.s.12.ripe’) Lemma(‘good.s.13.good’) Lemma(‘good.s.13.well’) Lemma(‘effective.s.04.effective’) |
Example: Assigning Polarity Using SentiWordNet | Text Data |
---|---|
Input: Sanitized POS tags with word | (I, None) (am, v) (connect, a) (with, None) (world, n) (cup, n) (and, None) (it‘, v) (GOOD, a) (Connect, n) (each, None) (other, a) (with, None) (team, n) (World, n) (Cup, n) (Song, n) (connect, n) (Worldcup, n) (2014, None) (Brazil, n) (2014, None) |
Output: Sentiment score for the synset term obtained from the SentiWordNet database | <i.n.01: PosScore = 0.0 NegScore = 0.0> <be.v.01: PosScore = 0.25 NegScore = 0.125> <universe.n.01: PosScore = 0.0 NegScore = 0.0> <cup.n.01: PosScore = 0.0 NegScore = 0.0> <good.a.01: PosScore = 0.75 NegScore = 0.0> <each.s.01: PosScore = 0.0 NegScore = 0.0> <other.a.01: PosScore = 0.0 NegScore = 0.625> <team.n.01: PosScore = 0.0 NegScore = 0.0> <universe.n.01: PosScore = 0.0 NegScore = 0.0><cup.n.01: PosScore = 0.0 NegScore = 0.0> <song.n.01: PosScore = 0.0 NegScore = 0.0> <brazil.n.01: PosScore = 0.0 NegScore = 0.0> |
Text Data | Total PosScore | Total NegScore | Sentiment Polarity |
---|---|---|---|
[‘I’, ‘am’, ‘connect’, ‘with’, ‘world’, ‘cup’, ‘and’, “it‘”, ‘GOOD’, ‘Connect’, ‘each’, ‘other’, ‘with’, ‘team’, ‘World’, ‘Cup’, ‘Song’, ‘connect’, ‘Worldcup’, ‘2014’, ‘Brazil’, ‘2014’] | 1.0 | 0.75 | positive |
[‘MATCHDAY’, ‘arg’, ‘v’, ‘bel’, ‘argbel’, ‘WorldCup’, ‘2014’, ‘TousEnsembl’] | 0.0 | 0.0 | neutral |
[‘I’, ‘am’, ‘child’, ‘woman’, ‘swimmer’, ‘and’, ‘I’, ‘like’, ‘swim’] | 0.375 | 0.125 | positive |
[‘I’, “don’t”, ‘enjoy’, ‘thi’, ‘game’, ‘it‘, ‘wa’, ‘disgust’, ‘and’, ‘all’, ‘the’, ‘audienc’, ‘wa’, ‘upset’] | 0.375 | 1.125 | negative |
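A minimal sketch of this scoring step using NLTK's SentiWordNet interface (nltk.corpus.sentiwordnet) instead of the raw database file, summed over a subset of the synsets selected for the first example tweet above; the thresholding into positive, negative, and neutral mirrors the labels in the table:

```python
from nltk.corpus import sentiwordnet as swn

def tweet_polarity(synset_names):
    """Sum SentiWordNet positive/negative scores over the selected synsets and label the tweet."""
    pos_total = neg_total = 0.0
    for name in synset_names:
        senti = swn.senti_synset(name)   # e.g., 'good.a.01'
        pos_total += senti.pos_score()
        neg_total += senti.neg_score()
    if pos_total > neg_total:
        label = 'positive'
    elif neg_total > pos_total:
        label = 'negative'
    else:
        label = 'neutral'
    return pos_total, neg_total, label

# The non-zero contributions come from 'be.v.01', 'good.a.01', and 'other.a.01';
# the call is expected to reproduce the totals of the first row above (1.0, 0.75, positive)
print(tweet_polarity(['be.v.01', 'universe.n.01', 'cup.n.01', 'good.a.01',
                      'other.a.01', 'team.n.01', 'song.n.01', 'brazil.n.01']))
```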
Ratio (Train:Test) | Sensitivity/Recall | FP Rate | Precision | F-Measure | Accuracy | AUC |
---|---|---|---|---|---|---|
Classifier: Naïve Bayes | ||||||
60:40 | 0.860 | 0.137 | 0.868 | 0.856 | 86.00% | 0.957 |
70:30 | 0.876 | 0.119 | 0.881 | 0.873 | 87.57% | 0.958 |
80:20 | 0.878 | 0.116 | 0.883 | 0.876 | 87.79% | 0.958 |
90:10 | 0.882 | 0.111 | 0.887 | 0.880 | 88.17% | 0.958 |
Classifier: SVM | ||||||
60:40 | 0.409 | 0.362 | 0.427 | 0.393 | 40.91% | 0.524 |
70:30 | 0.411 | 0.362 | 0.426 | 0.393 | 41.11% | 0.524 |
80:20 | 0.412 | 0.363 | 0.428 | 0.394 | 41.22% | 0.525 |
90:10 | 0.417 | 0.364 | 0.431 | 0.398 | 41.74% | 0.527 |
Classifier: Random Forest | ||||||
60:40 | 0.853 | 0.127 | 0.853 | 0.840 | 85.32% | 0.970 |
70:30 | 0.848 | 0.135 | 0.848 | 0.835 | 84.75% | 0.969 |
80:20 | 0.850 | 0.130 | 0.850 | 0.837 | 85.01% | 0.970 |
90:10 | 0.848 | 0.132 | 0.848 | 0.835 | 84.81% | 0.968 |
Classifier: KNN | ||||||
60:40 | 0.875 | 0.086 | 0.874 | 0.874 | 87.45% | 0.911 |
70:30 | 0.875 | 0.086 | 0.874 | 0.875 | 87.48% | 0.911 |
80:20 | 0.875 | 0.085 | 0.874 | 0.875 | 87.49% | 0.911 |
90:10 | 0.875 | 0.086 | 0.874 | 0.874 | 87.45% | 0.911 |
Predicted Class | |||||
---|---|---|---|---|---|
Actual Class | A (positive) | B (neutral) | C (negative) | Total | |
A (positive) | 5955 (TP) | 1171 (FN) | 128 (FN) | 7254 | |
B (neutral) | 308 (FP) | 12,479 | 206 | 12,993 | |
C (negative) | 68 (FP) | 998 | 3022 | 4088 | |
Total | 6331 | 14,648 | 3356 | 24,335 |
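As a quick consistency check, the reported accuracies follow from the diagonal of each confusion matrix; a minimal sketch using the Naïve Bayes matrix above:

```python
# Naïve Bayes confusion matrix from the table above
# rows = actual class, columns = predicted class (positive, neutral, negative)
confusion = [
    [5955, 1171, 128],    # actual positive
    [308, 12479, 206],    # actual neutral
    [68, 998, 3022],      # actual negative
]

correct = sum(confusion[i][i] for i in range(3))    # 21,456 correctly classified tweets
total = sum(sum(row) for row in confusion)          # 24,335 tweets in the test set
print(f"Accuracy: {correct / total:.2%}")           # -> 88.17%, matching the Naïve Bayes row at the 90:10 split
```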
Predicted Class | |||||
---|---|---|---|---|---|
Actual Class | A (positive) | B (neutral) | C (negative) | Total | |
A (positive) | 4463 (TP) | 2537 (FN) | 254 (FN) | 7254 |
B (neutral) | 6972 (FP) | 5542 | 479 | 12,993 | |
C (negative) | 2324 (FP) | 1612 | 152 | 4088 | |
Total | 13,759 | 9691 | 885 | 24,335
Predicted Class | |||||
---|---|---|---|---|---|
Actual Class | A (positive) | B (neutral) | C (negative) | Total | |
A (positive) | 6606 (TP) | 498 (FN) | 150 (FN) | 7254 |
B (neutral) | 448 (FP) | 12,364 | 181 | 12,993 | |
C (negative) | 510 (FP) | 1785 | 1793 | 4088 | |
Total | 7564 | 14,647 | 2124 | 24,335 |
Predicted Class | |||||
---|---|---|---|---|---|
Actual Class | A (positive) | B (neutral) | C (negative) | Total | |
A (positive) | 6183 (TP) | 795 (FN) | 276 (FN) | 7254 |
B (neutral) | 722 (FP) | 11,835 | 436 | 12,993 | |
C (negative) | 299 (FP) | 517 | 3272 | 4088 | |
Total | 7204 | 13,147 | 3984 | 24,335 |
Classifier | Ratio | Accuracy | AUC |
---|---|---|---|
Naïve Bayes | 90:10 | 88.17% | 0.958 |
SVM | 90:10 | 41.74% | 0.527 |
Random Forest | 60:40 | 85.32% | 0.970 |
KNN | 80:20 | 87.49% | 0.911 |
© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).