An Analysis of Sentiment: Methods, Applications, and Challenges †

Abstract: Sentiment analysis involves contextually examining text to identify and extract subjective information from source material. It helps businesses understand public sentiment about their brand, product, or service while monitoring online discussions. Nevertheless, the analysis of social media content is often limited to basic sentiment analysis and simple count-based metrics. Devices that allow the collection of huge amounts of unstructured, opinionated data are becoming increasingly connected with humans. Advances in Internet-based services such as social media platforms and blogs have produced large volumes of comments and evaluations about everyday activities. This study provides a comprehensive assessment of sentiment analysis approaches to give academics a global perspective on sentiment analysis and its associated domains, applications, and challenges. To clarify the applications of sentiment analysis, this article explains in detail the techniques used to perform it. The methods are then evaluated, compared, and discussed to identify the benefits and drawbacks of each. Finally, the difficulties of sentiment analysis are evaluated to establish future perspectives.


Introduction
The computational study of people's behavior, opinions, and feelings towards an entity is referred to as sentiment analysis (SA) or opinion mining (OM). The entity may be a person, an event, or a topic. SA is a Natural Language Processing (NLP) approach that tries to extract opinions and attitudes from texts [1,2]. Its objective is to identify whether a particular text reflects a positive or negative attitude; hence, the task may be thought of as a text-classification problem [3]. Nevertheless, while sentiment analysis may appear to be a simple procedure, it involves other NLP subtasks, including sarcasm detection and subjectivity identification [4,5].
Nowadays, sentiment analysis attracts wide interest from businesses, governments, and organizations in addition to researchers [6]. Given the long and distinguished history of using public opinion in decision making, a number of prior publications address SA as a branch of NLP [7]. Although SA has evolved considerably in the modern era, it remains in active use. This paper examines the issues, approaches, applications, and algorithms of sentiment analysis, and it presents the material with simple tables, flowcharts, and graphs that support comparative analysis. To our knowledge, existing surveys typically overlook certain SA methods in favor of lexicon-based, machine learning (ML), and hybrid techniques. The goals of this study are to develop the conceptual framework and discuss its applications and challenges in this sector. The key contributions of the survey may be summed up as follows:

•
A detailed definition of the sentiment analysis procedure and the identification of well-known techniques for performing it, based on a review of a variety of publications.

•
An analysis of the different techniques to determine which is best suited to a given application.

•
A categorization and description of commonly used SA methodologies, which makes techniques such as ML, lexicon-based models, and hybrid models easier to understand.
The remainder of the paper is structured as follows. Section 2 describes the phases of SA. Section 3 explains data gathering, feature extraction, and feature selection, covering each step from data extraction through to the different sentiment analysis techniques. Section 4 details the basic sentiment analysis approaches and summarizes them. Sections 5 and 6 present the applications and challenges of sentiment analysis in several fields, and Section 7 concludes the work.

Phases of Sentiment Analysis
Sentiment analysis has been studied at several levels. Opinions and views are primarily examined at the document phase, the sentence phase, or the aspect phase [8,9]. The stages of sentiment analysis are depicted in Figure 1. The first two phases are both interesting and quite difficult. The third phase, however, is more challenging, since it conducts a fine-grained analysis. Each stage is briefly explained below:

Document Phase
At the document phase, SA assigns a single polarity to the whole document. Not many organizations use SA of this kind. It could be used, for example, to classify the chapters or pages of a book as positive, negative, or neutral. At this stage, the document can be categorized using supervised and unsupervised learning techniques [10]. The most important difficulties in document-level sentiment analysis are cross-domain and cross-lingual sentiment analysis [11].

Sentence Phase
At this stage, each sentence is analyzed to identify its polarity. This is especially useful when the subject is associated with a wide range of emotions [12]. This phase of classification is connected to subjectivity classification [13]. For some applications, the document-level view is not enough [14], and prior research on sentence-level analysis has aimed at identifying subjective sentences. Nevertheless, there are more challenging tasks, including coping with conditional words or ambiguous statements [15].

Aspect Phase
This phase conducts a fine-grained analysis to identify attitudes towards specific aspects of an entity. As a result, analysis at this level helps in understanding precisely what people like and dislike [16]. According to [17], the fundamental task of sentiment analysis is aspect extraction, and aspects may be expressed either explicitly or implicitly. Mowlaei et al. [18] proposed a method for aspect-based SA employing adaptive aspect-based lexicons. To achieve higher performance, two or more levels can be combined rather than relying on a single level. Mai and Le [19] proposed a combined strategy of sentiment analysis at the aspect and phrase levels for product comments on YouTube.

Data Pre-Processing
Social media in particular produces a plethora of unstructured data. In their raw form, these data may be noisy and contain a variety of syntactic and grammatical problems [20]. The overall process includes numerous typical tasks:

Feature Extraction
In sentiment classification, feature extraction (FE) is a crucial operation, since it sifts through text data to find important information that can be used to improve the model's performance. Because extracting features from text is difficult, Venugopalan and Gupta (2015) [21] also included features in their research.

Term Frequency (TF)
Term frequency is among the simplest ways of representing features and is widely used in sentiment analysis and other NLP applications, such as information retrieval. According to Sharma et al. (2013) [22], it counts the occurrences of a single word (a uni-gram) or of a group of two or three words (a bi-gram or tri-gram).
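
The counting of uni-grams, bi-grams, and tri-grams described above can be sketched as follows. This is a minimal illustration rather than the procedure used in [22]; the function names `ngrams` and `term_frequencies`, the whitespace tokenization, and the sample review are assumptions made for the example.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all n-grams (as tuples) of a token list, in order."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def term_frequencies(text, n=1):
    """Count how often each n-gram occurs in a whitespace-tokenized text."""
    tokens = text.lower().split()
    return Counter(ngrams(tokens, n))


# Hypothetical review text used only for illustration.
review = "good phone good battery not good camera"
unigram_tf = term_frequencies(review, n=1)
bigram_tf = term_frequencies(review, n=2)
```

Here `unigram_tf` counts single words, while `bigram_tf` keeps pairs such as ("not", "good") together, which a uni-gram count would split.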

Parts of Speech (PoS) Tag
Grammatical tagging is the technique of labeling the words in a text corpus according to their syntactic and semantic roles. Tokens are divided into nouns, verbs, pronouns, adverbs, adjectives, and prepositions. PoS taggers, such as those included in NLTK or spaCy, may be used for this job. The Stanford PoS tagger is the one most frequently used in the research [23].

Negations
Negations are words that can transform the meaning of a sentence and modify the polarity of an opinion. Negation terms such as not, cannot, neither, never, nowhere, and none are often used. Excluding all negation terms from the stop-word list might increase processing costs and decrease the model's accuracy, since not every negation word in a sentence causes the polarity to shift. Negation words should therefore be handled with great care [24].
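
One common way of keeping negation information during pre-processing is to mark the tokens that follow a negation word. The sketch below illustrates that idea only; it is not the method of [24]. The `NOT_` prefix, the rule that the negation scope ends at the next punctuation mark, and the names `NEGATIONS` and `mark_negation` are assumptions made for the example.

```python
import re

# Negation terms taken from the list in the text, plus "no".
NEGATIONS = {"not", "no", "never", "cannot", "neither", "nowhere", "none"}
PUNCTUATION = set(".,;:!?")


def mark_negation(text):
    """Prefix tokens after a negation word with NOT_ until punctuation."""
    tokens = re.findall(r"[a-z']+|[.,;:!?]", text.lower())
    marked, negated = [], False
    for tok in tokens:
        if tok in PUNCTUATION:
            negated = False          # assumed: scope ends at punctuation
            marked.append(tok)
        elif tok in NEGATIONS:
            negated = True
            marked.append(tok)
        elif negated:
            marked.append("NOT_" + tok)
        else:
            marked.append(tok)
    return marked


tokens = mark_negation("The film was not good, but the music was great.")
```

With this marking, "good" after "not" becomes the separate feature `NOT_good`, so a classifier can learn a different weight for it. Contracted forms such as "don't" are not handled by this simple rule.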

Bag of Words (BoW)
BoW is among the most basic ways of extracting text features. BoW describes the occurrences of words in a document. The bag is the vocabulary of words from which a vector is created for every statement. The primary issue with this approach is that it disregards the grammar and word order of the text. TF-IDF weighting is commonly applied to the BoW representation, which generally makes it more effective.
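
A minimal sketch of a BoW representation and a TF-IDF weighting built on top of it is given below. The variant shown (term frequency as count divided by document length, inverse document frequency as log(N / df)) is one of several common definitions and is an assumption of this example, as are the function names and the toy documents.

```python
import math
from collections import Counter


def build_vocabulary(documents):
    """Sorted list of distinct tokens across all documents (the 'bag')."""
    return sorted({tok for doc in documents for tok in doc.lower().split()})


def bow_vector(document, vocabulary):
    """Raw term counts for one document, ordered by the vocabulary."""
    counts = Counter(document.lower().split())
    return [counts[term] for term in vocabulary]


def tfidf_vectors(documents):
    """TF-IDF weights with tf = count / length and idf = log(N / df)."""
    vocabulary = build_vocabulary(documents)
    n_docs = len(documents)
    bows = [bow_vector(doc, vocabulary) for doc in documents]
    # Document frequency: number of documents containing each term.
    doc_freq = [sum(1 for bow in bows if bow[j] > 0)
                for j in range(len(vocabulary))]
    vectors = []
    for bow in bows:
        length = sum(bow) or 1  # guard against empty documents
        vectors.append([(count / length) * math.log(n_docs / doc_freq[j])
                        for j, count in enumerate(bow)])
    return vocabulary, vectors


# Hypothetical documents used only for illustration.
docs = ["great phone great screen", "bad phone", "great value"]
vocab, weights = tfidf_vectors(docs)
```

In this toy corpus, "screen" appears in only one document, so it receives a larger weight in the first document than "phone", which appears in two.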

Feature Selection (FS)
As explained previously, a feature describes an attribute of the data. A feature may be relevant, irrelevant, or redundant. Several feature selection (FS) approaches are used to eliminate redundant and irrelevant features. FS finds and removes such features from the feature list in order to reduce the dimensionality of the feature space and increase the accuracy of sentiment classification [25].
Feature selection uses lexicon-based approaches and statistical techniques [26]. In lexicon-based techniques, features are developed by humans. The procedure often begins by gathering terms with strong sentiment to form a small feature set. This collection is then extended with more terms using web resources or synonym detection. The SentiWordNet lexicon is a well-known illustration of this methodology. Statistical techniques fall into four standard categories: the filter method, the wrapper method, the embedded method, and the hybrid method [27].

Filter Method
This is the most widely used method of feature selection. It chooses features based on general characteristics of the training data, without involving any machine learning algorithm. Various statistical criteria are used to score the features, and those with the highest rankings are then selected [28].
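
As an illustration of a filter method, the sketch below scores each term with the chi-square statistic of a 2x2 contingency table (term present or absent versus positive or negative label) and keeps the top-ranked terms. Chi-square is only one possible statistical criterion and is chosen here as an assumption; the function names and the toy data are hypothetical.

```python
def chi_square(term, documents, labels, positive="pos"):
    """Chi-square statistic between term presence and the positive label."""
    a = b = c = d = 0
    for doc, label in zip(documents, labels):
        present = term in doc.lower().split()
        is_pos = label == positive
        if present and is_pos:
            a += 1      # term present, positive document
        elif present:
            b += 1      # term present, negative document
        elif is_pos:
            c += 1      # term absent, positive document
        else:
            d += 1      # term absent, negative document
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return 0.0 if denom == 0 else n * (a * d - b * c) ** 2 / denom


def select_top_k(documents, labels, k):
    """Rank vocabulary terms by chi-square score and keep the best k."""
    vocab = sorted({t for doc in documents for t in doc.lower().split()})
    ranked = sorted(vocab, key=lambda t: chi_square(t, documents, labels),
                    reverse=True)
    return ranked[:k]


# Hypothetical labeled reviews used only for illustration.
docs = ["great phone", "great battery", "bad phone", "bad screen"]
labels = ["pos", "pos", "neg", "neg"]
top_terms = select_top_k(docs, labels, k=2)
```

No classifier is trained at any point: the ranking depends only on the statistics of the training data, which is what distinguishes a filter from a wrapper.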

Wrapper Method
This method depends on ML approaches, since it relies on the results of the learning algorithm. As a consequence of this dependence, wrapper approaches are usually iterative and computationally time-consuming, even though they can discover the optimum feature set for a given prediction method.

Embedded Method
This framework incorporates FS into the operation of the learning algorithm. It employs classification algorithms with feature selection functionality built in. However, the method is algorithm specific [29].

Hybrid Method
This technique combines filter and wrapper approaches; hybrid techniques typically use different methods to create the ideal feature subset. Several hybrid feature selection algorithms have been created for sentiment analysis [30].

Classification of Sentiment Analysis
Sentiment analysis is a fast-growing and dynamic research field with numerous applications. Research aims to enhance the performance of sentiment analysis and address the challenges related to this topic. It is therefore important to organize the current techniques from different viewpoints. Nevertheless, the majority of research separates SA methodologies into three groups: ML methodologies, lexicon-based methodologies, and hybrid methodologies [31]. The lexicon-based technique makes use of a sentiment lexicon, which is a collection of expressions that are commonly used to express either positive or negative sentiment [32]. A hybrid technique, on the other hand, combines machine learning and lexicon-based methods to enhance sentiment analysis performance. Figure 3 depicts the structure of the SA techniques.

Machine Learning Approaches
ML techniques categorize sentiment polarity based on training and test datasets. According to [33], these methods may be categorized as unsupervised learning, semi-supervised learning, and supervised learning. The supervised technique is used when the classification problem has a predetermined set of classes; when identifying this set is difficult due to a lack of labeled data, the unsupervised method may be needed instead. The semi-supervised approach, in contrast, can be applied to largely unlabeled datasets that contain some labeled instances. Reinforcement learning algorithms use trial and error to help an agent interact with its environment and maximize cumulative rewards.
However, classifiers trained on data from a specific domain do not perform well on different domains [34].
These algorithms must be trained before being applied to real data. Features can be extracted from data in text format. SA systems can be trained to go beyond simple content to detect informational content, profanity, and misuse of words. Commonly used algorithms include the following:

Naive Bayes (NB)
NB is a probabilistic classifier based on the Bayes rule, and it is used for both training and classification. It applies the Bayes rule to estimate the probability that a given set of features belongs to a particular label. NB is commonly used when the amount of training data is small. Positive NB categorization was 10% more accurate than negative NB categorization. As an outcome of its implementation, average accuracy was enhanced.
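
The way NB applies the Bayes rule to a set of word features can be sketched as a small multinomial classifier. This is a minimal illustration under stated assumptions: add-one (Laplace) smoothing, whitespace tokenization, unseen words ignored at prediction time, and a hypothetical class name `NaiveBayes` with toy training reviews.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing (illustrative only)."""

    def fit(self, documents, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for doc, label in zip(documents, labels):
            self.word_counts[label].update(doc.lower().split())
        self.vocabulary = {w for counts in self.word_counts.values()
                           for w in counts}
        return self

    def log_posteriors(self, document):
        """Unnormalized log P(label | document) for every label."""
        total_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocabulary)
        scores = {}
        for label, n_docs in self.class_counts.items():
            total_words = sum(self.word_counts[label].values())
            score = math.log(n_docs / total_docs)          # log prior
            for word in document.lower().split():
                if word in self.vocabulary:                # skip unseen words
                    count = self.word_counts[label][word]
                    score += math.log((count + 1) / (total_words + vocab_size))
            scores[label] = score
        return scores

    def predict(self, document):
        scores = self.log_posteriors(document)
        return max(scores, key=scores.get)


# Hypothetical training reviews used only for illustration.
train_docs = ["great phone love it", "excellent battery great value",
              "terrible screen bad battery", "bad phone hate it"]
train_labels = ["pos", "pos", "neg", "neg"]
model = NaiveBayes().fit(train_docs, train_labels)
```

Calling `model.predict("great battery")` compares the log posterior of each label and returns the larger one; working in log space avoids numerical underflow when many word probabilities are multiplied.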

Support Vector Machine (SVM)
SVM analyzes the data and sets decision boundaries using hyperplanes. SVMs are a non-probabilistic supervised learning method that is widely used for classification problems. The main goal of SVM is to find the hyperplane that divides the data into classes most effectively. For this reason, SVM seeks the hyperplane with the maximum achievable margin. Borg and Boldt (2020) [35] used a linear SVM and VADER to predict the sentiment of customer reviews.
The linear SVM classifier performed well [36], with an F1 score of 83.4% and a mean AUC of 0.896. Moreover, their approach could predict the sentiment of not-yet-seen emails in an email exchange. In contrast to conventional binary classification, the problem emphasized that the subjectivity of the opinion and the credibility of its author must be evaluated [37]. Wu et al. (2020) [38] proposed a methodology for integrating opinions from microblogs. The topics mentioned in the opinions that were relevant to the users' queries were located and extracted, and the opinions were then classified using SVM. For the experiment, ref. [39] also used Twitter data. They considered that summarizing opinions from microblogs would be advantageous.
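
The maximum-margin idea described in this subsection is usually written as the following optimization problem for linearly separable data. The notation is an assumption introduced here for illustration: $\mathbf{x}_i$ is the feature vector of the $i$-th training text, $y_i \in \{-1, +1\}$ its polarity label, and $\mathbf{w}$ and $b$ the normal vector and offset of the hyperplane.

```latex
\min_{\mathbf{w},\, b} \ \frac{1}{2}\lVert \mathbf{w} \rVert^{2}
\quad \text{subject to} \quad
y_i \left( \mathbf{w}^{\top} \mathbf{x}_i + b \right) \ge 1,
\qquad i = 1, \dots, n
```

Under these constraints the margin between the two classes has width $2 / \lVert \mathbf{w} \rVert$, so minimizing $\lVert \mathbf{w} \rVert$ is equivalent to maximizing the margin.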

Logistic Regression (LR)
Logistic regression is a machine-learning technique in which each input value is multiplied by a weight. It is a classifier that learns which input features help to distinguish between the positive and negative classes. The independent variables may be of any type. The LR technique [40] assumes that the dependent variable is binary and that the predictor variables exhibit little to no multicollinearity.
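
The weighted-sum step of logistic regression can be sketched as follows. The weights here are fixed by hand purely for illustration (in practice they are learned from training data), and the function names, the example feature meaning, and the 0.5 threshold are assumptions of this example.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function mapping a score to (0, 1)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def positive_probability(features, weights, bias=0.0):
    """Multiply each input by its weight, sum, and squash with sigmoid."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)


def classify(features, weights, bias=0.0, threshold=0.5):
    """Binary decision between the positive and negative classes."""
    p = positive_probability(features, weights, bias)
    return "positive" if p >= threshold else "negative"


# Hypothetical features: counts of positive and negative lexicon words.
weights = [1.5, -2.0]
label_a = classify([2, 0], weights)   # two positive words, no negative ones
label_b = classify([0, 1], weights)   # one negative word only
```

The sign and size of each weight indicate how strongly the corresponding feature pushes the prediction towards the positive or the negative class.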

K-Nearest Neighbors (KNN)
Although KNN is not frequently used in sentiment analysis, it has been found to yield good results when properly tuned. The value of K can be chosen using any hyper-parameter tuning process, such as grid search or randomized search with cross-validation. The polarity can be determined via soft voting or hard voting, depending on the scores of the K nearest neighbors.
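
The two voting schemes can be sketched as below. The sketch assumes Euclidean distance between feature vectors and interprets soft voting as summing inverse-distance weights per label; both choices, as well as the function names and the toy examples, are assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def k_nearest(query, examples, k):
    """Return the k (distance, label) pairs closest to the query vector."""
    distances = [(math.dist(query, vec), label) for vec, label in examples]
    return sorted(distances)[:k]


def hard_vote(neighbors):
    """Majority label among the neighbors."""
    return Counter(label for _, label in neighbors).most_common(1)[0][0]


def soft_vote(neighbors):
    """Label with the largest sum of inverse-distance weights."""
    totals = defaultdict(float)
    for distance, label in neighbors:
        totals[label] += 1.0 / (distance + 1e-9)  # avoid division by zero
    return max(totals, key=totals.get)


# Hypothetical 2-D feature vectors with polarity labels.
examples = [([1.0, 0.0], "pos"), ([0.9, 0.1], "pos"),
            ([0.0, 1.0], "neg"), ([0.1, 0.9], "neg"), ([0.2, 0.8], "neg")]
neighbors = k_nearest([0.8, 0.2], examples, k=3)
```

Hard voting treats every neighbor equally, whereas the soft variant lets closer neighbors count more; the two can disagree when a few very close neighbors are outnumbered by more distant ones.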

Decision Tree (DT)
The DT classifier is a supervised learning method in which a tree is built from the training examples to classify the polarity of a text. The training examples are used to identify the attribute that best separates the data, which is placed at the root. Yan-Yan et al. (2010) proposed a graph-based strategy for combining sentence-level information [41]. Labeled data that can be used to distinguish between genuine and fraudulent reviews are presented in [42].
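
Choosing the attribute for the root of the tree can be illustrated with information gain, one widely used splitting criterion. The sketch assumes binary features (a token is present or absent in a review) and entropy-based gain; the criterion, the function names, and the toy data are assumptions of this example rather than details given in the text.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())


def information_gain(rows, labels, feature):
    """Entropy reduction from splitting on the presence of one token."""
    total = len(labels)
    remainder = 0.0
    for value in (True, False):
        subset = [lab for row, lab in zip(rows, labels)
                  if (feature in row) == value]
        if subset:
            remainder += len(subset) / total * entropy(subset)
    return entropy(labels) - remainder


def best_root_feature(rows, labels):
    """Token whose presence/absence split yields the highest gain."""
    features = sorted({tok for row in rows for tok in row})
    return max(features, key=lambda f: information_gain(rows, labels, f))


# Hypothetical reviews represented as token sets.
rows = [{"great", "phone"}, {"great", "battery"},
        {"bad", "phone"}, {"poor", "battery"}]
labels = ["pos", "pos", "neg", "neg"]
root = best_root_feature(rows, labels)
```

In this toy data the token "great" separates the two classes perfectly, so it is selected for the root, while "phone" occurs equally in both classes and provides no gain.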

Lexicon-Based Approach
Lexicons are collections of tokens, each with a predetermined score that indicates whether it is neutral, positive, or negative. The lexicon-based approach assigns positive, negative, or neutral values to each token of a given review or text. The text is first split into single-word tokens, and each token's polarity is computed; the values are then aggregated (for example, averaged), and the statement is assigned the overall polarity with the highest value. The lexicon-based method is especially useful for feature-level and sentence-level sentiment analysis. Because no training data are necessary, this strategy may be classified as unsupervised.
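
The token-scoring procedure can be sketched as follows. The tiny `LEXICON` and its scores are hypothetical and stand in for a real resource; averaging the token scores and using zero as the neutral threshold are assumptions of this example.

```python
# Hypothetical sentiment lexicon: token -> polarity score.
LEXICON = {"good": 1.0, "great": 2.0, "love": 2.0,
           "bad": -1.0, "terrible": -2.0, "slow": -1.0}


def lexicon_polarity(text, lexicon=LEXICON):
    """Average the lexicon scores of all tokens and map the sign to a label."""
    tokens = [tok.strip(".,!?;:") for tok in text.lower().split()]
    scores = [lexicon.get(tok, 0.0) for tok in tokens]  # unknown -> neutral
    average = sum(scores) / len(scores) if scores else 0.0
    if average > 0:
        label = "positive"
    elif average < 0:
        label = "negative"
    else:
        label = "neutral"
    return label, average


label, score = lexicon_polarity("Great screen, but slow charging.")
```

No labeled training data are involved: the result depends only on the lexicon, which is why the quality and domain coverage of the lexicon determine the quality of the output.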

Corpus-Based Approach
This approach starts from a list of words with known orientation and uses the semantic and syntactic structure of sentences, through a syntactic or similar model, to determine the meaning and orientation of other tokens in a larger corpus. It works well for learning from multiple domains. Two kinds of corpus-based methods, statistical and semantic, are described below.
Statistical method: Statistical methods can be used to identify facts or patterns of association between terms.
Semantic method: This method calculates similarity scores between tokens for sentiment evaluation. WordNet is widely used for this purpose.

Dictionary-Based Method
This strategy is predicated on the idea that synonyms share the same sentiment polarity, whereas antonyms have opposite polarity; well-known dictionaries are used to generate the sentiment lexicons. The main problem with dictionary-based methods is their inability to capture domain- and context-specific sentiment expressions, which makes them unsuitable for domain-specific content. Additionally, writing dependency code is difficult and time consuming.
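
The synonym and antonym idea can be sketched as a breadth-first expansion from a few seed words. The small `SYNONYMS` and `ANTONYMS` tables are hypothetical stand-ins for a real dictionary such as WordNet, and the rule that the first polarity assigned to a word is kept is an assumption of this example.

```python
from collections import deque

# Hypothetical dictionary relations used only for illustration.
SYNONYMS = {"good": ["fine", "nice"], "nice": ["pleasant"],
            "bad": ["poor"], "poor": ["awful"]}
ANTONYMS = {"good": ["bad"], "nice": ["nasty"]}


def expand_lexicon(seeds, synonyms=SYNONYMS, antonyms=ANTONYMS):
    """Propagate seed polarities: synonyms keep the sign, antonyms flip it."""
    lexicon = dict(seeds)
    queue = deque(seeds)
    while queue:
        word = queue.popleft()
        polarity = lexicon[word]
        related = [(s, polarity) for s in synonyms.get(word, [])]
        related += [(a, -polarity) for a in antonyms.get(word, [])]
        for other, other_polarity in related:
            if other not in lexicon:        # first assignment wins
                lexicon[other] = other_polarity
                queue.append(other)
    return lexicon


expanded = expand_lexicon({"good": 1})
```

Starting from the single seed "good", the expansion reaches words such as "pleasant" and "awful" through chains of relations. The limitation noted above is visible here as well: the dictionary carries no information about how a word is used in a particular domain.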

Hybrid Methods
The hybrid model combines lexicon-based and machine learning techniques. To handle ambiguity and include the context of sentiment words, it integrates the efficiency of lexical analysis with the adaptability of ML methodologies [43]. The key benefit of the hybrid technique is that it inherits both stability from the lexicon-based model and good accuracy from ML. The hybrid system thus integrates techniques from the two preceding methods to overcome their disadvantages and enhance their advantages.

Comparison of Different Classification Techniques
This section evaluates and compares the accuracy, precision, and recall of different ML-based classification techniques. Table 1 shows the results. Among the techniques compared, SVM achieved the highest accuracy (94.05%) and the highest recall (87.34%), whereas the Decision Tree (DT) achieved the highest precision (88.86%). Figure 4 shows a graphical representation of the overall comparison of the classification techniques.
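
For reference, the three measures used in this comparison can be computed from predicted and true labels as sketched below. The function name and the toy label lists are hypothetical; they are not the data behind Table 1.

```python
def classification_metrics(y_true, y_pred, positive="pos"):
    """Accuracy, precision, and recall for a binary sentiment task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    return {
        # Share of all predictions that are correct.
        "accuracy": (tp + tn) / len(pairs) if pairs else 0.0,
        # Share of predicted positives that are truly positive.
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        # Share of true positives that were found.
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Hypothetical labels used only for illustration.
y_true = ["pos", "pos", "pos", "neg", "neg", "neg", "neg", "pos"]
y_pred = ["pos", "pos", "neg", "neg", "neg", "pos", "neg", "pos"]
metrics = classification_metrics(y_true, y_pred)
```

Because the three measures weigh errors differently, one technique can lead on accuracy and recall while another leads on precision, as in the comparison above.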

Applications of Sentiment Analysis
Sentiment analysis is used in the following sectors and areas:

Eng. Proc. 2023, 59, 68

Business Analysis
SA has several advantages in the context of business intelligence (BI). Businesses may use sentiment analysis data to improve products, investigate customer feedback, and build creative marketing plans.

Health Care and Medical Domain
Sentiment analysis has recently been applied in this sector as well. Data may be gathered from surveys, Twitter [44], blogs, news stories, reviews, and other sources. Domain specialists are actively researching the use of sentiment analysis and other NLP applications [45] (Ebadi et al., 2021).

Review Analysis
The entertainment industry makes substantial use of sentiment analysis. Reviews of movies, television series, and short films can be examined to establish public opinion [46] (Kumar et al., 2019). The travel industry is likewise working to improve the customer experience through machine learning and web-based customer insights, often with a focus on smart, data-driven decision making.

Customer Voice
User input from call centers, emails, surveys, chats, and the web can be compiled and evaluated; sentiment analysis can then categorize and organize these data to find patterns and recurring problems and concerns.

Social Media Monitoring
Social media data tracks client opinions in real time, around the clock, seven days a week, allowing businesses to respond promptly to unfavorable comments and better their reputation when they receive positive comments.

Challenges in Sentiment Analysis
SA primarily focuses on analyzing evaluations and comments from various people and processing them to extract helpful information. The SA process is influenced by several factors, all of which must be managed properly to produce the final classification or clustering report. Some of these difficulties are described below.

Sarcasm Handling
Sarcasm is the use of words whose literal meaning is the opposite of what is intended, e.g., "What a good batsman he is, he scores zero in every other inning". Such phrases are challenging to identify.

Domain Dependency
In SA, words are the main unit of analysis. However, the meaning of a term varies from sentence to sentence. Some terms have definitions that vary depending on the context. Additionally, there are terms known as contronyms that have contradictory meanings in different contexts. Identifying the context in which a term is used can be difficult, yet it determines how the text is analyzed and, ultimately, how the result turns out.

Negations
Negation words can drastically change the meaning of the phrase in which they appear. As a result, it is critical to account for them when analyzing reviews.

Spam Detection
Sentiment analysis examines reviews. However, little analysis has been performed so far to determine whether reviews are genuine and were written by a legitimate reviewer. Many people who do not know the company's product or service nevertheless review it favorably or unfavorably. Telling fake reviews from real ones is quite difficult, but it is crucial for SA.

Conclusions
This paper provides a brief description of sentiment analysis and its associated techniques. Its major goal is to explore and categorize the most prevalent sentiment classification methods. The various levels of sentiment analysis and crucial steps such as data collection and FS were reviewed first. The discussion then moved on to sentiment classification approaches, and several strategies for classifying sentiment were compared. In this field, supervised ML algorithms are frequently used due to their ease of use and high precision.

Figure 1. Various phases of sentiment analysis.

Figure 2 illustrates the general sentiment analysis process. Data are collected in different formats from a wide range of sources, transformed into text, and then analyzed with NLP methods. The processing step includes, in particular, text pre-processing, feature extraction, and feature selection.

Figure 2. General flow of sentiment analysis.

Figure 4. Performance comparison of classification techniques.

Table 1. Evaluation results of various machine learning techniques.