This section reviews state-of-the-art approaches to sentiment analysis. The existing methods can be broadly divided into two categories: machine learning approaches and deep learning approaches.
2.1. Machine Learning Approaches
Hemakala and Santhoshkumar (2018) [3] conducted sentiment analysis on a dataset collected from Indian Airlines using seven classical machine learning algorithms, namely Decision Tree, Random Forest, Support Vector Machine, K-Nearest Neighbors, Logistic Regression, Gaussian Naive Bayes, and AdaBoost. The dataset was preprocessed to remove stop words and perform lemmatization. The results showed that the AdaBoost model achieved the highest precision of 84.5%.
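As a rough illustration of this kind of pipeline (a sketch, not the authors' implementation), the snippet below trains an AdaBoost classifier on TF-IDF features and reports precision; `texts` and `labels` are hypothetical placeholders for the preprocessed airline feedback and its sentiment annotations.

```python
# Illustrative sketch only: AdaBoost on TF-IDF features with precision as the
# evaluation metric, mirroring the setup reported above.
from sklearn.ensemble import AdaBoostClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import precision_score
from sklearn.pipeline import make_pipeline

texts = [
    "the flight was on time and the crew was friendly",
    "luggage was lost and nobody helped",
    "comfortable seats and smooth landing",
    "delayed for hours with no explanation",
]
labels = [1, 0, 1, 0]  # 1 = positive, 0 = negative

model = make_pipeline(TfidfVectorizer(stop_words="english"), AdaBoostClassifier(random_state=0))
model.fit(texts, labels)
print("precision:", precision_score(labels, model.predict(texts)))
```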
Makhmudah et al. (2019) [4] performed sentiment analysis on a dataset of tweets regarding homosexuality in Indonesia using a Support Vector Machine (SVM). The dataset was labeled into two classes: positive and negative. The raw data were preprocessed to remove stop words and perform lemmatization and stemming. Term Frequency–Inverse Document Frequency (TF-IDF) features were used as the input representation for the SVM. The method achieved an accuracy of 99.5% on the dataset.
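A minimal sketch of a TF-IDF + SVM sentiment pipeline of the kind described above is shown below; the tweets and labels are placeholders, and the linear kernel is an assumption.

```python
# Minimal TF-IDF + SVM sketch (not the authors' code); `tweets` and `labels`
# stand in for the preprocessed Indonesian tweets and their annotations.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

tweets = ["saya setuju", "saya tidak setuju", "dukungan penuh", "sangat menentang"]
labels = ["positive", "negative", "positive", "negative"]

svm = make_pipeline(TfidfVectorizer(), SVC(kernel="linear"))
svm.fit(tweets, labels)
print(svm.predict(["saya setuju sekali"]))
```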
In the research carried out by Alsalman (2020) [5], Multinomial Naive Bayes was used for sentiment analysis on Arabic tweets. The authors employed a 4-gram tokenization technique and the Khoja stemmer for text preprocessing, and represented the processed text as TF-IDF features. The Multinomial Naive Bayes model was trained on a dataset of 2000 tweets labeled into positive and negative classes and was evaluated using five-fold cross-validation. The proposed approach achieved an accuracy of 87.5% on the Arabic tweet dataset.
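The sketch below illustrates a Multinomial Naive Bayes classifier on n-gram TF-IDF features evaluated with five-fold cross-validation. Character 4-grams are an assumption (the paper's exact tokenization may be word-level), and the short English strings are placeholders for the Arabic tweets.

```python
# Illustrative sketch: Multinomial NB on 4-gram TF-IDF features, five-fold CV.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

pos = ["excellent service", "really wonderful", "loved the product", "great quality", "very happy"]
neg = ["awful service", "really disappointing", "did not like it", "poor quality", "not satisfied"]
texts, labels = pos + neg, [1] * 5 + [0] * 5

pipeline = make_pipeline(
    TfidfVectorizer(analyzer="char_wb", ngram_range=(4, 4)),  # character 4-gram features (assumed)
    MultinomialNB(),
)
print(cross_val_score(pipeline, texts, labels, cv=5).mean())  # mean five-fold accuracy
```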
In the study by Tariyal et al. (2018) [6], various classification algorithms were compared for sentiment analysis on product review tweets. The methods explored included Linear Discriminant Analysis, K-Nearest Neighbors, Classification and Regression Trees (CART), SVM, Random Forest, and C5.0. The dataset comprised 1150 tweets that underwent preprocessing steps, including stop-word and punctuation removal, case folding, and stemming. The cleaned text was transformed into a term-document matrix and fed into the classification algorithms for sentiment analysis. The experimental results indicated that the CART method achieved the highest accuracy of 88.99%.
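A brief sketch of the term-document-matrix + CART setup follows; scikit-learn's DecisionTreeClassifier implements the CART algorithm, and `reviews`/`labels` are hypothetical placeholders.

```python
# Sketch: build a (document x term) count matrix and fit a CART-style tree.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.tree import DecisionTreeClassifier

reviews = ["great product, works well", "broke after one day",
           "good value for money", "terrible quality"]
labels = [1, 0, 1, 0]

vectorizer = CountVectorizer()            # rows = documents, columns = terms
X = vectorizer.fit_transform(reviews)     # sparse term-frequency matrix
cart = DecisionTreeClassifier(random_state=0).fit(X, labels)
print(cart.predict(vectorizer.transform(["works great"])))
```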
Gupta et al. (2019) [7] proposed a sentiment analysis approach that leverages four different machine learning algorithms: Logistic Regression, Decision Tree, Support Vector Machine, and Neural Network. The Sentiment140 dataset was used in the experiments. The raw data were preprocessed, including stop-word removal and lemmatization, and then represented as TF-IDF features. The authors found that the Neural Network model achieved the highest accuracy, 80%, among the four models.
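The loop below sketches how the four model families can be compared on shared TF-IDF features; the scikit-learn estimators (with MLPClassifier standing in for the neural network) and the placeholder tweets are assumptions for illustration.

```python
# Illustrative comparison of four classifiers on the same TF-IDF representation.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier
from sklearn.svm import LinearSVC
from sklearn.tree import DecisionTreeClassifier

texts = ["love this so much", "worst day ever", "feeling great today", "really disappointed"]
labels = [1, 0, 1, 0]

X = TfidfVectorizer().fit_transform(texts)
models = {
    "Logistic Regression": LogisticRegression(),
    "Decision Tree": DecisionTreeClassifier(),
    "SVM": LinearSVC(),
    "Neural Network": MLPClassifier(hidden_layer_sizes=(32,), max_iter=1000),
}
for name, model in models.items():
    print(name, model.fit(X, labels).score(X, labels))  # training accuracy, for illustration
```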
Similarly, Jemai et al. (2021) [8] developed a sentiment analyzer that employs a set of machine learning algorithms, including Naive Bayes, Bernoulli Naive Bayes, Multinomial Naive Bayes, Logistic Regression, and Linear Support Vector Classification. The experiments were conducted on the “twitter samples” corpus in the Natural Language Toolkit (NLTK), which includes 5000 positive and 5000 negative tweets. The preprocessing steps included tokenization, stop-word removal, URL removal, symbol removal, case folding, and lemmatization. The results showed that the Naive Bayes model achieved the highest accuracy of 99.73%.
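The sketch below shows how this corpus can be loaded from NLTK and used to train a Naive Bayes baseline; the count-based features and the Multinomial variant are assumptions, and the exact preprocessing in the study may differ.

```python
# Sketch: load NLTK's twitter_samples corpus and train a Naive Bayes classifier.
import nltk
from nltk.corpus import twitter_samples
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB

nltk.download("twitter_samples")
pos = twitter_samples.strings("positive_tweets.json")   # 5000 positive tweets
neg = twitter_samples.strings("negative_tweets.json")   # 5000 negative tweets
texts, labels = pos + neg, [1] * len(pos) + [0] * len(neg)

X_train, X_test, y_train, y_test = train_test_split(texts, labels, test_size=0.2, random_state=0)
vectorizer = CountVectorizer(lowercase=True, stop_words="english")
nb = MultinomialNB().fit(vectorizer.fit_transform(X_train), y_train)
print("accuracy:", nb.score(vectorizer.transform(X_test), y_test))
```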
This subsection discusses studies that apply various machine learning algorithms to sentiment analysis on different datasets, including Indian Airlines feedback, Indonesian tweets about homosexuality, Arabic tweets, product review tweets, and the Sentiment140 dataset. Preprocessing steps such as stop-word removal, lemmatization, and stemming were employed in the majority of the studies. The algorithms used include AdaBoost, Support Vector Machine, Multinomial Naive Bayes, Logistic Regression, Decision Tree, Bernoulli Naive Bayes, and Linear Support Vector Classification. The results show that performance varies with both the dataset and the algorithm, with reported accuracies ranging from 80% to 99.73%.
2.2. Deep Learning Approaches
In the work by Ramadhani and Goo (2017) [9], a deep learning method, the Multilayer Perceptron (MLP), was utilized for sentiment analysis. The authors employed a self-collected dataset of 4000 tweets in Korean and English for their experiments. To preprocess the dataset, several steps were applied, including tokenization, case folding, stemming, and the removal of numbers, stop words, and punctuation. The MLP model consisted of three hidden layers and used Stochastic Gradient Descent (SGD) as the optimizer. The proposed method achieved an accuracy of 75.03%.
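A minimal sketch of a three-hidden-layer MLP trained with SGD is given below; the layer widths, learning rate, and the random placeholder features standing in for the vectorized tweets are all assumptions.

```python
# Illustrative three-hidden-layer MLP trained with SGD (tf.keras).
import numpy as np
import tensorflow as tf

X = np.random.rand(100, 300).astype("float32")   # placeholder feature vectors
y = np.random.randint(0, 2, size=(100,))         # placeholder sentiment labels

model = tf.keras.Sequential([
    tf.keras.Input(shape=(300,)),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=0.01),
              loss="binary_crossentropy", metrics=["accuracy"])
model.fit(X, y, epochs=3, batch_size=16, verbose=0)
```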
Similarly, in the work by Demirci et al. (2019) [10], an MLP model was used for sentiment analysis on Turkish tweets. The authors used a dataset of 3000 positive and negative Turkish tweets with the hashtag “15Temmuz”. The data were preprocessed with the Turkish Deasciifier, tokenization, stop-word and punctuation removal, and stemming. To convert the text into embeddings, the authors employed a pretrained Word2vec model. An MLP consisting of six dense layers and three dropout layers was then used for sentiment classification. The proposed method recorded an accuracy of 81.86% on the dataset.
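The sketch below illustrates the embedding-plus-MLP pattern: tweets are mapped to Word2vec vectors (here a tiny model trained on placeholder tokens rather than the pretrained model used in the study), averaged per tweet, and fed to an MLP with dense and dropout layers (a smaller stack than the six dense and three dropout layers reported).

```python
# Sketch: averaged Word2vec tweet vectors feeding a small dense/dropout MLP.
import numpy as np
import tensorflow as tf
from gensim.models import Word2Vec

tokenized = [["harika", "bir", "gün"], ["çok", "kötü", "bir", "gün"],
             ["mükemmel", "hizmet"], ["berbat", "deneyim"]]
labels = np.array([1, 0, 1, 0])

w2v = Word2Vec(tokenized, vector_size=50, min_count=1, epochs=20)
X = np.array([np.mean([w2v.wv[t] for t in toks], axis=0) for toks in tokenized])

model = tf.keras.Sequential([
    tf.keras.Input(shape=(50,)),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dropout(0.3),
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dropout(0.3),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(X, labels, epochs=5, verbose=0)
```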
Additionally, Raza et al. (2021) [11] utilized an MLP architecture for sentiment analysis on COVID-19-related tweets. The collected dataset consisted of 101,435 tweets labeled as positive or negative. Preprocessing involved the removal of HTML tags and non-letter characters, tokenization, and stemming. The cleaned texts were then transformed into numerical features using a Count Vectorizer and a TF-IDF Vectorizer. The resulting features were fed into an MLP with five hidden layers for sentiment classification. The study found that the MLP with the Count Vectorizer achieved an accuracy of 93.73%.
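The snippet below sketches a comparison of count and TF-IDF features feeding an MLP; the five hidden-layer sizes and the placeholder tweets are illustrative assumptions, not the study's configuration.

```python
# Sketch: CountVectorizer vs. TfidfVectorizer feeding a five-hidden-layer MLP.
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline

tweets = ["vaccines are rolling out, feeling hopeful", "cases rising again, this is awful",
          "grateful for the health workers", "another lockdown, so frustrating"]
labels = [1, 0, 1, 0]

for name, vec in [("CountVectorizer", CountVectorizer()), ("TfidfVectorizer", TfidfVectorizer())]:
    clf = make_pipeline(vec, MLPClassifier(hidden_layer_sizes=(256, 128, 64, 32, 16), max_iter=500))
    clf.fit(tweets, labels)
    print(name, clf.score(tweets, labels))
```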
In another work, Rhanoui et al. (2019) [12] proposed a hybrid model that combines a Convolutional Neural Network (CNN) and Bidirectional Long Short-Term Memory (Bi-LSTM) for sentiment analysis. The dataset used in their study included 2003 articles and international news items in French, labeled as neutral, positive, or negative. The texts were represented as embeddings using a pretrained doc2vec model. The hybrid model consisted of a convolutional layer, a max-pooling layer, a Bi-LSTM layer, a dropout layer, and a classification layer. The results showed that the proposed hybrid model achieved an accuracy of 90.66% on the dataset.
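A sketch of a CNN + Bi-LSTM hybrid with three output classes is shown below. For simplicity it uses a trainable token-embedding layer rather than the pretrained doc2vec representation used in the study, and all sizes and the random placeholder data are assumptions.

```python
# Illustrative CNN + Bi-LSTM hybrid with a three-class softmax output.
import numpy as np
import tensorflow as tf

vocab_size, seq_len = 5000, 100
X = np.random.randint(1, vocab_size, size=(64, seq_len))   # placeholder token ids
y = np.random.randint(0, 3, size=(64,))                    # 0 = negative, 1 = neutral, 2 = positive

model = tf.keras.Sequential([
    tf.keras.Input(shape=(seq_len,)),
    tf.keras.layers.Embedding(vocab_size, 128),
    tf.keras.layers.Conv1D(64, kernel_size=5, activation="relu"),
    tf.keras.layers.MaxPooling1D(pool_size=2),
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(64)),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(3, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
model.fit(X, y, epochs=2, verbose=0)
```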
Similarly, Tyagi et al. (2020) [13] employed a hybrid architecture that integrates a CNN and Bi-LSTM for sentiment analysis on the Sentiment140 dataset. This dataset consists of 1.6 million positive and negative tweets, which were preprocessed to remove stop words, numbers, URLs, Twitter user names, and punctuation, and subjected to case folding and stemming. The hybrid model consisted of an embedding layer initialized with pretrained GloVe vectors, a one-dimensional CNN layer, a Bi-LSTM layer, multiple fully connected layers, dropout layers, and a classification layer. The proposed hybrid model achieved an accuracy of 81.20% on the Sentiment140 dataset.
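The distinctive step here is initializing the embedding layer with pretrained GloVe vectors; a sketch is given below. The GloVe file path and the small `word_index` mapping are assumptions for illustration.

```python
# Sketch: load GloVe vectors into a frozen Keras Embedding layer.
import numpy as np
import tensorflow as tf

word_index = {"good": 1, "bad": 2, "movie": 3}   # e.g., from a fitted tokenizer
embedding_dim, vocab_size = 100, len(word_index) + 1

glove = {}
with open("glove.6B.100d.txt", encoding="utf-8") as f:   # assumed local GloVe file
    for line in f:
        values = line.split()
        glove[values[0]] = np.asarray(values[1:], dtype="float32")

embedding_matrix = np.zeros((vocab_size, embedding_dim))
for word, idx in word_index.items():
    if word in glove:
        embedding_matrix[idx] = glove[word]

embedding_layer = tf.keras.layers.Embedding(
    vocab_size, embedding_dim,
    embeddings_initializer=tf.keras.initializers.Constant(embedding_matrix),
    trainable=False,   # keep the pretrained GloVe vectors fixed
)
```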
Jang et al. (2020) [14] further improved the hybrid CNN–Bi-LSTM architecture by incorporating an attention mechanism. The study used the Internet Movie Database (IMDb) dataset, which contains 50,000 positive and negative reviews, for the experiments. The texts were represented using a pretrained word2vec embedding model. The model was optimized using the Adam optimizer, L2 regularization, and dropout. The proposed model recorded an accuracy of 90.26% on the IMDb dataset.
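The sketch below illustrates adding an attention step on top of a CNN + Bi-LSTM encoder; it uses Keras' built-in dot-product Attention layer as a simple stand-in for the mechanism described in the study, and all layer sizes are assumptions.

```python
# Illustrative CNN + Bi-LSTM encoder with self-attention, dropout, and L2 regularization.
import tensorflow as tf

inputs = tf.keras.Input(shape=(200,))                      # token ids for a review
x = tf.keras.layers.Embedding(20000, 128)(inputs)          # pretrained word2vec weights could be loaded here
x = tf.keras.layers.Conv1D(64, 5, activation="relu")(x)
x = tf.keras.layers.MaxPooling1D(2)(x)
x = tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(64, return_sequences=True))(x)
x = tf.keras.layers.Attention()([x, x])                    # self-attention over the Bi-LSTM states
x = tf.keras.layers.GlobalAveragePooling1D()(x)
x = tf.keras.layers.Dropout(0.5)(x)
outputs = tf.keras.layers.Dense(1, activation="sigmoid",
                                kernel_regularizer=tf.keras.regularizers.l2(1e-4))(x)

model = tf.keras.Model(inputs, outputs)
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```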
In the study by Hossain et al. (2020) [15], a hybrid model combining a CNN and an LSTM was proposed for sentiment analysis. The authors used a self-collected dataset containing 100 restaurant reviews from the Foodpanda and Shohoz Food apps, which underwent preprocessing to remove unimportant words and symbols. The texts were then transformed into word embeddings using the word2vec algorithm. The hybrid model consisted of an embedding layer using the pretrained word2vec model, a convolutional layer, a max-pooling layer, an LSTM layer, a dropout layer, and a classification layer. The model achieved an accuracy of 75.01% on the self-collected dataset.
In Yang (2018) [16], the author proposed a Recurrent Neural Filter-based Convolutional Neural Network (RNN-CNN) and LSTM model for sentiment analysis, in which the RNN was utilized as the convolutional filter. The experiments were conducted on the Stanford Sentiment Treebank dataset, with texts transformed into word embeddings using the GloVe model. The model consisted of an embedding layer using the pretrained GloVe model, a pooling layer, and an LSTM layer. The model was trained with the Adam optimizer, and early stopping was used to prevent overfitting. The proposed RNN-CNN-LSTM model achieved an accuracy of 53.4% on the Stanford Sentiment Treebank dataset.
The study conducted by Harjule et al. (2020) [17] compared the performance of machine learning and deep learning methods for sentiment analysis. The authors utilized two datasets, namely the Sentiment140 and Twitter US Airline Sentiment datasets. Before the analysis, the datasets were preprocessed to remove noise such as stop words, URLs, hashtags, and punctuation, and were tokenized. Five methods were compared: Multinomial Naive Bayes, Logistic Regression, Support Vector Machine, Long Short-Term Memory, and an ensemble of Multinomial Naive Bayes, Logistic Regression, and Support Vector Machine using majority voting. The results showed that Long Short-Term Memory achieved the highest accuracy of 82% on the Sentiment140 dataset, whereas the Support Vector Machine recorded the highest accuracy of 68.9% on the Twitter US Airline Sentiment dataset.
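A sketch of the majority-voting ensemble (Multinomial Naive Bayes + Logistic Regression + SVM) using scikit-learn's VotingClassifier is shown below; the TF-IDF features and placeholder tweets are assumptions.

```python
# Sketch: hard (majority) voting over MNB, Logistic Regression, and a linear SVM.
from sklearn.ensemble import VotingClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

tweets = ["great flight, friendly crew", "flight cancelled with no refund",
          "loved the service", "worst airline experience"]
labels = [1, 0, 1, 0]

ensemble = make_pipeline(
    TfidfVectorizer(),
    VotingClassifier(
        estimators=[("mnb", MultinomialNB()),
                    ("lr", LogisticRegression()),
                    ("svm", LinearSVC())],
        voting="hard",   # majority vote over the three predictions
    ),
)
ensemble.fit(tweets, labels)
print(ensemble.predict(["the crew was helpful"]))
```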
Various studies that employed deep learning approaches for sentiment analysis are presented in this subsection. Most studies used an MLP architecture or a hybrid model combining a CNN with an LSTM or Bi-LSTM. Dataset preprocessing included tokenization, case folding, stemming, and the removal of numbers, stop words, punctuation, and other irrelevant symbols. The datasets used in the experiments covered different languages, such as Turkish, French, English, and Korean, and included tweets, restaurant reviews, news articles, and movie reviews. The reported accuracies ranged from 53.4% to 93.73%. Several studies also used pretrained embedding models such as Word2vec, GloVe, and Doc2vec. Although text embedding is a crucial component of sentiment analysis, surprisingly few studies have used Transformer models for this task, despite their ability to capture contextual relationships, handle long-range dependencies, and produce highly expressive representations that outperform traditional embedding techniques (a minimal embedding sketch is given after this paragraph). A summary of the related works is presented in Table 1.
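The sketch below shows how contextual sentence embeddings can be extracted with a Transformer via the Hugging Face transformers library; the model name is an example, not one used in the works reviewed above, and the resulting vectors could feed any downstream sentiment classifier.

```python
# Minimal sketch: Transformer-based sentence embeddings with Hugging Face transformers.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

texts = ["the movie was wonderful", "the service was terrible"]
inputs = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Use the [CLS] token's final hidden state as a contextual sentence embedding.
embeddings = outputs.last_hidden_state[:, 0, :]
print(embeddings.shape)   # torch.Size([2, 768])
```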