Article

Comparison of Supervised Classification Models on Textual Data

Department of Industrial Engineering & Management, Cheng Shiu University, Kaohsiung 83347, Taiwan
Mathematics 2020, 8(5), 851; https://doi.org/10.3390/math8050851
Received: 8 May 2020 / Revised: 21 May 2020 / Accepted: 21 May 2020 / Published: 24 May 2020
Text classification is an essential part of many applications, such as spam detection and sentiment analysis. With the growing volume of textual documents and datasets generated through social media and news articles, an increasing number of machine learning methods are required for accurate text classification. In this paper, a comprehensive evaluation of multiple supervised learning models, including logistic regression (LR), decision trees (DT), support vector machine (SVM), AdaBoost (AB), random forest (RF), multinomial naive Bayes (NB), multilayer perceptrons (MLP), and gradient boosting (GB), was conducted to assess the efficiency, robustness, and limitations of these models on the classification of textual data. SVM, LR, and MLP generally performed better, with SVM being the best, while DT and AB had much lower accuracies than the other tested models. Further exploration of different SVM kernels demonstrated the advantage of linear kernels over polynomial, sigmoid, and radial basis function kernels for text classification. The effect of removing stop words on model performance was also investigated; DT performed better with stop words removed, while all other models were relatively unaffected by the presence or absence of stop words.
Keywords: machine learning; text classification; sentiment analysis; IMDb reviews; 20 newsgroups
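
To make the stop-word comparison described in the abstract concrete, the following is a minimal sketch of one of the listed model families, multinomial naive Bayes with add-one smoothing, trained with and without stop-word removal. It is not the experimental setup of the paper: the toy reviews, the stop-word list, and the function names (`tokenize`, `train_nb`, `predict_nb`) are all assumptions made up for illustration, and a real study would use a full corpus and a library implementation.

```python
import math
from collections import Counter, defaultdict

# Hypothetical stop-word list and toy reviews, invented for illustration only.
STOP_WORDS = {"the", "a", "an", "is", "was", "this", "and", "it", "of", "i"}

TRAIN = [
    ("this movie was a great and moving story", "pos"),
    ("i loved the acting and the great script", "pos"),
    ("a wonderful film and it is a joy", "pos"),
    ("this movie was a boring and dull mess", "neg"),
    ("i hated the acting and the awful script", "neg"),
    ("a terrible film and it is a bore", "neg"),
]


def tokenize(text, remove_stop_words=False):
    """Lowercase, split on whitespace, and optionally drop stop words."""
    tokens = text.lower().split()
    if remove_stop_words:
        tokens = [t for t in tokens if t not in STOP_WORDS]
    return tokens


def train_nb(docs, remove_stop_words=False):
    """Collect per-class document counts, per-class word counts, and the vocabulary."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in docs:
        class_counts[label] += 1
        for tok in tokenize(text, remove_stop_words):
            word_counts[label][tok] += 1
            vocab.add(tok)
    return class_counts, word_counts, vocab


def predict_nb(model, text, remove_stop_words=False):
    """Return the class with the highest log posterior under add-one smoothing."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(class_counts):
        score = math.log(class_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for tok in tokenize(text, remove_stop_words):
            if tok not in vocab:
                continue  # ignore tokens never seen during training
            smoothed = (word_counts[label][tok] + 1) / (total_words + len(vocab))
            score += math.log(smoothed)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


if __name__ == "__main__":
    for remove in (False, True):
        model = train_nb(TRAIN, remove_stop_words=remove)
        for review in ("a great film", "an awful and boring movie"):
            label = predict_nb(model, review, remove_stop_words=remove)
            print(f"stop words removed={remove}: {review!r} -> {label}")
```

On this toy data the predicted labels are the same with and without stop words, which only illustrates the mechanics of the comparison and says nothing about the accuracies reported in the paper.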
MDPI and ACS Style

Hsu, B.-M. Comparison of Supervised Classification Models on Textual Data. Mathematics 2020, 8, 851. https://doi.org/10.3390/math8050851