TF-TDA: A Novel Supervised Term Weighting Scheme for Sentiment Analysis
Abstract
1. Introduction
- The proposal of a new term weighting scheme for effective text classification, especially for SA purposes. The proposed method relies on three aspects:
- – Adopting a method to distinguish between specific and neutral common terms by classifying them into several groups.
- – Revising the term weight differently for each term group so that the weight reflects the group's discriminating ability.
- – Ensuring that the term weight reflects the term's actual presence in the dataset by handling data imbalance issues.
- The performance of the proposed scheme is validated by conducting a comparative study with two other weighting schemes. Moreover, the performance of the three term weighting schemes (including the proposed scheme) has been explored using three different local factors.
- The experiments are performed on four sentiment analysis datasets diversified in terms of language, size, and subject using two ML classification models.
2. Related Work
2.1. Document Representation and Term Weighting
2.2. Local Factor
2.3. Global Factor
2.3.1. UTW Schemes
2.3.2. STW Schemes
- IDF Variants:
- Feature Selection Metrics:
- Relevance Frequency:
- Class Frequency:
3. The Proposed STW Scheme
3.1. Term Classification Approach
- Frequently positive: common terms related to the positive class.
- Frequently negative: common terms related to the negative class.
- General (G): common terms related to both classes (neutral).
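The grouping above can be sketched as follows. This is only an illustration: the threshold `k` and the exact ratio test are assumptions (the paper's formal constraints are given later in Section 3), and the labels `P`, `N`, `FP`, `FN`, and `G` are placeholder names for the pure positive, pure negative, frequently positive, frequently negative, and general groups.

```python
def classify_term(pos_df, neg_df, k):
    """Assign a term to a group from its per-class document frequencies.

    pos_df / neg_df: number of positive / negative training documents
    containing the term. k: leaning threshold (an assumed parameter).
    """
    if pos_df > 0 and neg_df == 0:
        return "P"            # pure positive term
    if neg_df > 0 and pos_df == 0:
        return "N"            # pure negative term
    # Common term: check how strongly it leans toward one class.
    if pos_df >= k * neg_df:
        return "FP"           # frequently positive common term
    if neg_df >= k * pos_df:
        return "FN"           # frequently negative common term
    return "G"                # general (neutral) common term

assert classify_term(30, 0, 2.0) == "P"
assert classify_term(12, 3, 2.0) == "FP"
assert classify_term(5, 6, 2.0) == "G"
```

Raising `k` shrinks the frequently positive/negative groups and grows the general group, which is the trade-off explored when tuning `k` in Section 5.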
3.2. Weight Revision Approach
3.2.1. Distinguishing Ability
3.2.2. Relevant Class
- The denominator avoids division by zero when weighting pure term groups. Moreover, since TF-TDA deals with the percentage of a word's presence, the factor can take very small values (less than 1), and dividing directly by such small values would give the word more weight than it deserves.
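A toy numeric illustration of the point above: dividing by a presence percentage below 1 inflates the weight, whereas adding 1 to the denominator bounds it and keeps it defined for pure groups. The function names and values here are hypothetical; the actual TF-TDA formula is the one given in Section 3.2.

```python
def raw_weight(presence_pct):
    # Direct division: the factor explodes as the percentage shrinks.
    return 1.0 / presence_pct

def smoothed_weight(presence_pct):
    # Adding 1 to the denominator bounds the factor and avoids
    # division by zero for pure term groups (presence_pct == 0).
    return 1.0 / (1.0 + presence_pct)

assert raw_weight(0.25) == 4.0        # small percentage -> inflated weight
assert smoothed_weight(0.25) == 0.8   # smoothing keeps it moderate
assert smoothed_weight(0.0) == 1.0    # pure group: no division by zero
```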
4. Experimental Setup
4.1. Datasets
- Multi-domain Arabic resources for sentiment analysis (MARSA): The largest annotated Gulf dataset, provided by the Arabic sentiment analysis research group at Imam Muhammad Ibn Saud Islamic University [42]. It covers several domains; however, only the social and sport domains were used, as two independent datasets, in this study. For the sport domain, tweets were collected using hashtags generated around football matches. The social dataset concentrated on issues affecting Saudi society; accordingly, its hashtags covered social issues such as royal orders, the Saudi budget, and issues affecting the income of Saudi citizens.
- SenWave: The third dataset consists of Arabic tweets concerning the COVID-19 domain and was published by [43] for SA purposes.
- Twitter airline sentiment: A dataset made available by Kaggle, consisting of English tweets that review six United States (US) airline companies.
Data Pre-Processing
- Remove stop words: Tweets typically contain valueless words known as stop words, including pronouns, prepositions, and other words [44]. Accordingly, two stop word lists were defined based on the Arabic and English stop word corpora contained in the NLTK package, and these words were then removed from all tweets.
- Normalization: The data must be normalized by converting all word forms into a common form. For the English dataset, all words are converted to lower case. For the Arabic datasets, the nature of the Arabic language may require additional steps, such as the following:
- – Remove Arabic diacritical marks (tashkeel);
- – Remove the 'tatweel' (elongation) character 'ـ';
- – Replace elongation characters with a single character, for example: [أإاآ] is replaced with ا and [ؤئ] with ء; also replace the final letter ة with ه and ى with ي. Moreover, some special characters are normalized, for example: گ is normalized to ك.
- Emoji handling: Emojis are widely used on social media to express opinions and can provide significant information about a text, especially in SA. Therefore, emojis were handled by replacing them with their text format. Moreover, text formats consisting of several words were combined into a single word; for example, “broken heart” was converted to “brokenheart”. As a word such as “heart” can appear in emojis with opposite meanings, such as “red heart” and “broken heart”, weighting “heart” as an individual feature would make it a meaningless (neutral) term. Combining the corresponding text into a single word also reduces feature dimensionality.
- Stemming: Stemming is the process of reducing a word to its stem, base, or root [44]. In this study, two stemmers included in NLTK were used: ISRIStemmer for the Arabic datasets and PorterStemmer for the English dataset.
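The pre-processing steps above can be sketched in a standalone way. The study itself used NLTK's stop word corpora, ISRIStemmer, and PorterStemmer; to keep this example runnable without corpus downloads, it uses a tiny hand-rolled stop list and only the normalization rules quoted in the text.

```python
import re

STOP_WORDS = {"the", "a", "to", "of"}  # illustrative subset of the NLTK lists

def normalize_arabic(text):
    text = re.sub(r"[\u064B-\u0652]", "", text)  # strip diacritics (tashkeel)
    text = text.replace("\u0640", "")            # strip the tatweel character
    text = re.sub("[أإآ]", "ا", text)            # unify alef forms as ا
    text = re.sub("[ؤئ]", "ء", text)             # unify hamza carriers as ء
    return text.replace("ة", "ه").replace("ى", "ي")

def combine_emoji_text(text):
    # Combine a multi-word emoji transcription into a single token.
    return text.replace("broken heart", "brokenheart")

def preprocess_english(tweet):
    # Lower-case, then drop stop words.
    return [w for w in tweet.lower().split() if w not in STOP_WORDS]

assert preprocess_english("The flight to Dallas") == ["flight", "dallas"]
assert normalize_arabic("جمــيل") == "جميل"      # tatweel removed
assert combine_emoji_text("a broken heart emoji") == "a brokenheart emoji"
```

In the actual pipeline, the stemming step would follow, e.g. `nltk.stem.isri.ISRIStemmer` for Arabic and `nltk.stem.porter.PorterStemmer` for English.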
4.2. Feature Representation and Term Weighting
4.3. Classification Models
4.4. Criteria for Performance Evaluation
5. Experimental Results
5.1. Discussion
5.1.1. The Effect of Using Different Local Factors
5.1.2. Find the Optimal k Value
- The same distribution being produced among term groups: some consecutive k values produce the same distribution of terms among the groups and, therefore, the same result. For example, the values from 3.7 to 4.3 in Figure 8b produced the same term distribution and the same F1 score. Table 11 presents the number of terms in each group and the F1 score obtained.
- The effect of using certain terms together: A term’s presence or absence within a group of terms can significantly affect the result. For example, when using the SVM with MARSA (Sport), consider the results of values 4.4, 4.5, and 4.6, as presented in Table 12.
- The value that provides the highest weighted F1 score was chosen because the F1 score incorporates both precision and recall scores.
- Among the set of values that provided the same highest F1 score, the value that provided the lowest number of G terms was chosen.
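The two selection rules above can be sketched as follows, using the k values, F1 scores, and G-term counts reported in Table 12: the highest weighted F1 score wins, and ties are broken by the smallest number of G terms.

```python
def choose_k(candidates):
    """candidates: iterable of (k, weighted_f1, num_g_terms) tuples."""
    best_f1 = max(f1 for _, f1, _ in candidates)
    tied = [c for c in candidates if c[1] == best_f1]
    # Among ties on F1, prefer the k that yields the fewest G terms.
    return min(tied, key=lambda c: c[2])[0]

# Values from Table 12 (SVM on MARSA (Sport)): 4.4 and 4.6 tie on F1,
# and 4.4 gives fewer G terms.
table12 = [(4.4, 0.8864, 3211), (4.5, 0.8861, 3212), (4.6, 0.8864, 3213)]
assert choose_k(table12) == 4.4
```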
5.1.3. Evaluate the Effectiveness of Common Term Grouping
- Approach 1: all common terms were considered general (G); hence, there were only three term groups (G and the two pure term groups).
- Approach 2: the common terms were classified based on the optimal k presented in Table 14; hence, there were five term groups (G, the frequently positive and frequently negative groups, and the two pure term groups).
6. Conclusions and Future Directions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Dogan, T.; Uysal, A.K. On term frequency factor in supervised term weighting schemes for text classification. Arab. J. Sci. Eng. 2019, 44, 9545–9560.
- Giachanou, A.; Crestani, F. Like it or not: A survey of Twitter sentiment analysis methods. ACM Comput. Surv. (CSUR) 2016, 49, 1–41.
- Dogra, V.; Alharithi, F.S.; Álvarez, R.M.; Singh, A.; Qahtani, A.M. NLP-Based Application for Analyzing Private and Public Banks Stocks Reaction to News Events in the Indian Stock Exchange. Systems 2022, 10, 233.
- Kharde, V.; Sonawane, P. Sentiment analysis of twitter data: A survey of techniques. arXiv 2016, arXiv:1601.06971.
- Narayanaswamy, G.R. Exploiting BERT and RoBERTa to Improve Performance for Aspect Based Sentiment Analysis. Master's Thesis, Technological University Dublin, Dublin, Ireland, 2021.
- Alruily, M. Classification of Arabic tweets: A review. Electronics 2021, 10, 1143.
- Adwan, O.; Al-Tawil, M.; Huneiti, A.; Shahin, R.; Zayed, A.A.; Al-Dibsi, R. Twitter sentiment analysis approaches: A survey. Int. J. Emerg. Technol. Learn. (iJET) 2020, 15, 79–93.
- Aggarwal, C.C. Machine Learning for Text; Springer: Berlin/Heidelberg, Germany, 2018; Volume 848.
- Shanavas, N. Graph-Theoretic Approaches to Text Classification. Ph.D. Thesis, Ulster University, Ulster, Ireland, 2020.
- Kumar, A.; Dabas, V.; Hooda, P. Text classification algorithms for mining unstructured data: A SWOT analysis. Int. J. Inf. Technol. 2020, 12, 1159–1169.
- Ezzat, S.; El Gayar, N.; Ghanem, M.M. Sentiment analysis of call centre audio conversations using text classification. Int. J. Comput. Inf. Syst. Ind. Manag. Appl. 2012, 4, 619–627.
- Fayyad, U.M.; Piatetsky-Shapiro, G.; Uthurusamy, R. Summary from the KDD-03 panel: Data mining: The next 10 years. ACM SIGKDD Explor. Newsl. 2003, 5, 191–196.
- Prusa, J.D.; Khoshgoftaar, T.M.; Dittman, D.J. Impact of feature selection techniques for tweet sentiment classification. In Proceedings of the Twenty-Eighth International FLAIRS Conference, Hollywood, FL, USA, 18–20 May 2015.
- Parlar, T.; Özel, S.A. An Investigation of Term Weighting and Feature Selection Methods for Sentiment Analysis. Majlesi J. Electr. Eng. 2018, 12, 63–68.
- Zheng, A.; Casari, A. Feature Engineering for Machine Learning: Principles and Techniques for Data Scientists; O'Reilly Media, Inc.: Sebastopol, CA, USA, 2018.
- Domeniconi, G.; Moro, G.; Pasolini, R.; Sartori, C. A comparison of term weighting schemes for text classification and sentiment analysis with a supervised variant of tf.idf. In Proceedings of the International Conference on Data Management Technologies and Applications, Colmar, France, 20–22 July 2015; pp. 39–58.
- Wu, H.; Gu, X.; Gu, Y. Balancing between over-weighting and under-weighting in supervised term weighting. Inf. Process. Manag. 2017, 53, 547–557.
- Salton, G.; Buckley, C. Term-weighting approaches in automatic text retrieval. Inf. Process. Manag. 1988, 24, 513–523.
- Jones, D. Group nepotism and human kinship. Curr. Anthropol. 2000, 41, 779–809.
- Liu, Y.; Loh, H.T.; Sun, A. Imbalanced text classification: A term weighting approach. Expert Syst. Appl. 2009, 36, 690–701.
- Leopold, E.; Kindermann, J. Text categorization with support vector machines. How to represent texts in input space? Mach. Learn. 2002, 46, 423–444.
- Jones, K.S. A statistical interpretation of term specificity and its application in retrieval. J. Doc. 1972, 28, 11–21.
- Mujahid, M.; Lee, E.; Rustam, F.; Washington, P.B.; Ullah, S.; Reshi, A.A.; Ashraf, I. Sentiment analysis and topic modeling on tweets about online education during COVID-19. Appl. Sci. 2021, 11, 8438.
- Aslam, N.; Xia, K.; Rustam, F.; Hameed, A.; Ashraf, I. Using Aspect-Level Sentiments for Calling App Recommendation with Hybrid Deep-Learning Models. Appl. Sci. 2022, 12, 8522.
- Rustam, F.; Ashraf, I.; Mehmood, A.; Ullah, S.; Choi, G.S. Tweets classification on the base of sentiments for US airline companies. Entropy 2019, 21, 1078.
- Aslam, N.; Xia, K.; Rustam, F.; Lee, E.; Ashraf, I. Self voting classification model for online meeting app review sentiment analysis and topic modeling. PeerJ Comput. Sci. 2022, 8, e1141.
- Altawaier, M.; Tiun, S. Comparison of machine learning approaches on Arabic twitter sentiment analysis. Int. J. Adv. Sci. Eng. Inf. Technol. 2016, 6, 1067–1073.
- Wu, H.; Salton, G. A comparison of search term weighting: Term relevance vs. inverse document frequency. In Proceedings of the 4th Annual International ACM SIGIR Conference on Information Storage and Retrieval: Theoretical Issues in Information Retrieval, Oakland, CA, USA, 31 May–2 June 1981; pp. 30–39.
- Tokunaga, T.; Iwayama, M. Text Categorization Based on Weighted Inverse Document Frequency; Information Processing Society of Japan: Tokyo, Japan, 1994.
- Martineau, J.; Finin, T. Delta TFIDF: An improved feature space for sentiment analysis. In Proceedings of the International AAAI Conference on Web and Social Media, San Jose, CA, USA, 17–20 May 2009; Volume 3, pp. 258–261.
- Paltoglou, G.; Thelwall, M. A Study of Information Retrieval Weighting Schemes for Sentiment Analysis. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, Uppsala, Sweden, 11–16 July 2010; pp. 1386–1395.
- Debole, F.; Sebastiani, F. Supervised term weighting for automated text categorization. In Proceedings of the 2003 ACM Symposium on Applied Computing, Melbourne, FL, USA, 9–12 March 2003; pp. 784–788.
- Deng, Z.H.; Luo, K.H.; Yu, H.L. A study of supervised term weighting scheme for sentiment analysis. Expert Syst. Appl. 2014, 41, 3506–3513.
- Lan, M.; Tan, C.L.; Su, J.; Lu, Y. Supervised and traditional term weighting methods for automatic text categorization. IEEE Trans. Pattern Anal. Mach. Intell. 2008, 31, 721–735.
- Carvalho, F.; Guedes, G.P. TF-IDFC-RF: A novel supervised term weighting scheme. arXiv 2020, arXiv:2003.07193.
- Wang, D.; Zhang, H. Inverse-category-frequency based supervised term weighting scheme for text categorization. arXiv 2010, arXiv:1012.2609.
- Ren, F.; Sohrab, M.G. Class-indexing-based term weighting for automatic text classification. Inf. Sci. 2013, 236, 109–125.
- Jiang, Z.; Gao, B.; He, Y.; Han, Y.; Doyle, P.; Zhu, Q. Text classification using novel term weighting scheme-based improved TF-IDF for internet media reports. Math. Probl. Eng. 2021, 2021, 1–30.
- Chen, K.; Zhang, Z.; Long, J.; Zhang, H. Turning from TF-IDF to TF-IGM for term weighting in text classification. Expert Syst. Appl. 2016, 66, 245–260.
- Ghosh, S.; Desarkar, M.S. Class specific TF-IDF boosting for short-text classification: Application to short-texts generated during disasters. In Proceedings of the Web Conference 2018, Lyon, France, 23–27 April 2018; pp. 1629–1637.
- Roul, R.K.; Sahoo, J.K.; Arora, K. Modified TF-IDF term weighting strategies for text categorization. In Proceedings of the 2017 14th IEEE India Council International Conference (INDICON), Roorkee, India, 15–17 December 2017; pp. 1–6.
- Alowisheq, A.; Al-Twairesh, N.; Altuwaijri, M.; Almoammar, A.; Alsuwailem, A.; Albuhairi, T.; Alahaideb, W.; Alhumoud, S. MARSA: Multi-domain Arabic resources for sentiment analysis. IEEE Access 2021, 9, 142718–142728.
- Yang, Q.; Alamro, H.; Albaradei, S.; Salhi, A.; Lv, X.; Ma, C.; Alshehri, M.; Jaber, I.; Tifratene, F.; Wang, W.; et al. SenWave: Monitoring the global sentiments under the COVID-19 pandemic. arXiv 2020, arXiv:2006.10842.
- Oussous, A.; Benjelloun, F.Z.; Lahcen, A.A.; Belfkih, S. ASA: A framework for Arabic sentiment analysis. J. Inf. Sci. 2020, 46, 544–559.
- Wu, X.; Kumar, V.; Ross Quinlan, J.; Ghosh, J.; Yang, Q.; Motoda, H.; McLachlan, G.J.; Ng, A.; Liu, B.; Yu, P.S.; et al. Top 10 algorithms in data mining. Knowl. Inf. Syst. 2008, 14, 1–37.
- Sabbah, T.; Selamat, A.; Selamat, M.H.; Al-Anzi, F.S.; Viedma, E.H.; Krejcar, O.; Fujita, H. Modified frequency-based term weighting schemes for text classification. Appl. Soft Comput. 2017, 58, 193–206.
- Abdelaal, H.M.; Elmahdy, A.N.; Halawa, A.A.; Youness, H.A. Improve the automatic classification accuracy for Arabic tweets using ensemble methods. J. Electr. Syst. Inf. Technol. 2018, 5, 363–370.
- Duwairi, R.M.; Qarqaz, I. A framework for Arabic sentiment analysis using supervised classification. Int. J. Data Mining Model. Manag. 2016, 8, 369–381.
- AlSalman, H. An improved approach for sentiment analysis of Arabic tweets in twitter social media. In Proceedings of the 2020 3rd International Conference on Computer Applications & Information Security (ICCAIS), Riyadh, Saudi Arabia, 19–21 March 2020; pp. 1–4.
- Aljabri, M.; Chrouf, S.M.B.; Alzahrani, N.A.; Alghamdi, L.; Alfehaid, R.; Alqarawi, R.; Alhuthayfi, J.; Alduhailan, N. Sentiment analysis of Arabic tweets regarding distance learning in Saudi Arabia during the COVID-19 pandemic. Sensors 2021, 21, 5431.
- Duwairi, R.M.; Marji, R.; Sha'ban, N.; Rushaidat, S. Sentiment analysis in Arabic tweets. In Proceedings of the 2014 5th International Conference on Information and Communication Systems (ICICS), Irbid, Jordan, 1–3 April 2014; pp. 1–6.
- Dietterich, T.G. Approximate statistical tests for comparing supervised classification learning algorithms. Neural Comput. 1998, 10, 1895–1923.
Notation | Formulation | Description
---|---|---
– | – | Raw term frequency; the number of occurrences of the term in the document.
– | – | Logarithmic term frequency.
– | – | Square root of the term frequency.
– | 1 if the term occurs in the document; 0 otherwise | Term presence.
– | – | Augmented term frequency [18].
– | – | BM25 [19].
– | – | Normalized term frequency [20].
– | – | Inverse term frequency [21].
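The local factors in the table can be sketched as below. The notation and exact formulations did not survive extraction, so the logarithmic and augmented forms shown here are the common textbook variants and should be read as assumptions rather than the paper's exact definitions.

```python
import math

def log_tf(tf):
    # Common logarithmic variant: 1 + ln(tf) for tf > 0, else 0 (assumed form).
    return 1 + math.log(tf) if tf > 0 else 0.0

def sqrt_tf(tf):
    # Square root of the raw term frequency.
    return math.sqrt(tf)

def presence(tf):
    # Term presence: 1 if the term occurs, 0 otherwise.
    return 1 if tf > 0 else 0

def augmented_tf(tf, max_tf):
    # Common augmented variant, normalized by the document's most
    # frequent term (assumed form).
    return 0.5 + 0.5 * tf / max_tf

assert presence(3) == 1 and presence(0) == 0
assert sqrt_tf(9) == 3.0
assert augmented_tf(4, 4) == 1.0
```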
Notation | Description
---|---
a | Positive document frequency; the number of documents in the positive class in which the term occurs at least once.
b | Number of training documents in the positive class in which the term does not occur.
c | Negative document frequency; the number of documents in the negative class in which the term occurs at least once.
d | Number of training documents in the negative class in which the term does not occur.
N | Total number of training documents across all classes; N = a + b + c + d.
– | Total number of training documents in the positive class (a + b).
– | Total number of training documents in the negative class (c + d).
– | Number of classes in the collection.
– | Class frequency: the number of classes that include the term.
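The counts a, b, c, and d above can be computed directly from a labelled training set, following the definitions in the table. The toy corpus below is invented for illustration.

```python
def contingency(term, docs):
    """docs: list of (set_of_terms, label) pairs, label 'pos' or 'neg'."""
    a = sum(1 for terms, y in docs if y == "pos" and term in terms)
    b = sum(1 for terms, y in docs if y == "pos" and term not in terms)
    c = sum(1 for terms, y in docs if y == "neg" and term in terms)
    d = sum(1 for terms, y in docs if y == "neg" and term not in terms)
    return a, b, c, d

# Hypothetical four-document training set.
docs = [({"good", "flight"}, "pos"), ({"good"}, "pos"),
        ({"late", "flight"}, "neg"), ({"late"}, "neg")]

a, b, c, d = contingency("good", docs)
assert (a, b, c, d) == (2, 0, 0, 2)
assert a + b + c + d == len(docs)   # N = a + b + c + d, as in the table
```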
Traditional Feature Engineering | |||
---|---|---|---|
Ref. | Feature Engineering | Dataset | Classification Model |
[23] | BoW and TF-IDF | Online-education-during-COVID-19 | ETC, AdaBoost, GNB, kNN, SGD, RF, DT, SVM |
[24] | BoW, TF-IDF, and hashing | Reviews regarding the calling apps | SVM, kNN, DT, LR, RF, LSTM, CNN, and GRU |
[25] | TF, TF-IDF and word2vec | Twitter-airline-sentiment | Calibrated, SVM, AdaBoost, DT, Gaussian NB, ET, RF, LR, SGD, and GBM |
[26] | TF-IDF, BoW, and hashing | Meeting app’s reviews | SVM, DT, LR, kNN, and RF |
Improved Weighting Schemes | |||
Ref. | Global Factor | Dataset | Classification Model |
[30] | didf | Movie review | SVM |
[31] | dsidf, dspidf, dbidf | Movie review, multi-domain sentiment dataset of Amazon products reviews, and BLOGS06 | SVM |
[32] | ig, gr, chi | Reuters-21578 | Rocchio, kNN, and SVM
[33] | ig, mi, chi | Cornell movie review, multi-domain sentiment dataset of Amazon products reviews, and Stanford large movie review | SVM |
[1] | DFS | Reuters-21578 and 20 newsgroups | SVM and Rocchio
[34] | rf | Reuters-21578, 20 newsgroups, and Ohsumed | kNN and SVM |
[35] | IDFC-RF | Movie review, subjectivity, Amazon sarcasm, and polarity | NB, SVM |
[16] | idfec-b | Reuters-21578, 20 newsgroups, movie review data, and multi-domain sentiment dataset of Amazon product reviews | RF, SVM |
[20] | prob-based | MCV1 and Reuters-21578 | NB, SVM |
[36] | icf, icf-based | Reuters-21578, the balanced 20 newsgroups, and la12 | kNN, SVM, and centroid |
[37] | IDF-ICF, IDF-ICSF | Reuters-21578, 20 newsgroups, and RCV1-v2 | Centroid-based, SVM, and NB |
Term Group | Constraint |
---|---|
G |
Term | Weight | |||
---|---|---|---|---|
30 | 10 | 20 | ||
30 | 21 | 9 |
Term Group | Weighting Scheme |
---|---|
TF(,) | |
TF(,) | |
G | TF(,) |
TF(,) | |
TF(,) |
Dataset | Class Label | # Training Samples | # Testing Samples | # Total Samples |
---|---|---|---|---|
MARSA (Sport) | Positive | 9276 | 2283 | 19,568 |
Negative | 6378 | 1631 | ||
MARSA (Social) | Positive | 1992 | 507 | 7523 |
Negative | 4026 | 998 | ||
SenWave | Positive | 1206 | 305 | 4244 |
Negative | 2189 | 544 | ||
Airline | Positive | 1580 | 416 | 10,838 |
Negative | 7090 | 1752 |
Dataset | # All Features | |||
---|---|---|---|---|
MARSA (Sport) | 3228 | 2375 | 2230 | 7833 |
MARSA (Social) | 2249 | 3211 | 862 | 6322 |
SenWave | 1771 | 817 | 2507 | 5095 |
Airline | 1775 | 767 | 3968 | 6510 |
Model | Dataset | TF-IDF | icf-Based | TF-TDA | Improvement over TF-IDF | Improvement over icf-Based * | Average
---|---|---|---|---|---|---|---|
M-NB | MARSA (Sport) | 87.70% | 86.93% | 89.00% | 1.30% | 2.07% | 1.69% |
MARSA (Social) | 82.31% | 83.25% | 85.52% | 3.21% | 2.27% | 2.74% | |
Senwave | 78.62% | 78.19% | 80.96% | 2.34% | 2.77% | 2.56% | |
Airline | 88.10% | 90.20% | 90.72% | 2.62% | 0.52% | 1.57% | |
SVM | MARSA (Sport) | 85.62% | 88.04% | 88.64% | 3.02% | 0.60% | 1.81% |
MARSA (Social) | 79.23% | 81.55% | 83.22% | 3.99% | 1.67% | 2.83% | |
Senwave | 75.29% | 77.04% | 76.94% | 1.65% | −0.10% | 0.77% | |
Airline | 85.93% | 88.76% | 89.76% | 3.83% | 1.00% | 2.42% |
Dataset | Model | TF-TDA vs. TF-IDF | TF-TDA vs. icf-Based |
---|---|---|---|
p-Value | p-Value | ||
MARSA (Sport) | M-NB | 0.846 | 0.0000597 |
SVM | 0.0006 | 0.0455 | |
MARSA (Social) | M-NB | 0.00052 | 0.0000296 |
SVM | 0.000233 | 0.000039 | |
SenWave | M-NB | 0.021 | 0.000021 |
SVM | 0.38 | 0.92 | |
Airline | M-NB | 0.076 | 0.0034 |
SVM | 0.000028 | 0.00005 |
k | #G | # | # | F1 Score |
---|---|---|---|---|
[3.7–4.3] | 2235 | 12 | 2 | 83.22% |
k | #G | # | # | F1 Score |
---|---|---|---|---|
4.4 | 3211 | 7 | 10 | 88.64% |
4.5 | 3212 | 7 | 9 | 88.61% |
4.6 | 3213 | 6 | 9 | 88.64% |
Terms in | Terms in | ||||||||
---|---|---|---|---|---|---|---|---|---|
Term ID | Term | Term ID | Term | ||||||
برك | * | * | * | هلل | * | * | * | ||
الف | * | * | * | الي | * | * | * | ||
فوز | * | * | * | * | * | * | |||
زعم | * | * | * | دونيس | * | * | * | ||
بطل | * | * | * | فتح | * | * | * | ||
جمل | * | * | * | فرج | * | * | * | ||
شكر | * | * | - | حسب | * | * | * | ||
تسل | * | - | - | ||||||
* | * | * | |||||||
ردس | * | * | * | ||||||
Total = | 7 | 7 | 6 | Total = | 10 | 9 | 9 |
Model | Dataset | k | #G | # | # |
---|---|---|---|---|---|
M-NB | MARSA (Sport) | 0.3 | 2806 | 156 | 266 |
MARSA (Social) | 0.5 | 1995 | 111 | 143 | |
SenWave | 3.1 | 1751 | 16 | 4 | |
Airline | 0.7 | 1594 | 57 | 124 | |
SVM | MARSA (Sport) | 3.1 | 3200 | 13 | 15 |
MARSA (Social) | 3.7 | 2235 | 12 | 2 | |
SenWave | 3 | 1749 | 17 | 5 | |
Airline | 2.3 | 1737 | 12 | 26 |
Model | Dataset | Approach 1 | Approach 2 | Improvement |
---|---|---|---|---|
M-NB | MARSA (Sport) | 89.35% | 89.00% | −0.35% |
MARSA (Social) | 84.93% | 85.52% | 0.59% | |
SenWave | 80.37% | 80.96% | 0.59% | |
Airline | 89.67% | 90.72% | 1.05% | |
SVM | MARSA (Sport) | 88.61% | 88.64% | 0.03% |
MARSA (Social) | 83.02% | 83.21% | 0.19% | |
SenWave | 76.83% | 76.94% | 0.11% | |
Airline | 89.56% | 89.76% | 0.20% |
Alshehri, A.; Algarni, A. TF-TDA: A Novel Supervised Term Weighting Scheme for Sentiment Analysis. Electronics 2023, 12, 1632. https://doi.org/10.3390/electronics12071632