Multi-Aspect Oriented Sentiment Classification: Prior Knowledge Topic Modelling and Ensemble Learning Classifier Approach

Abstract: User-generated content on numerous sites is indicative of users' sentiment towards many issues, from daily food intake to using new products. Amid the active usage of social networks and micro-blogs, notably during the COVID-19 pandemic, we may glean insights into any product or service through users' feedback and opinions. However, it is often difficult and time-consuming to go through all the reviews and analyse them in order to judge their overall goodness or badness before making any decision. To overcome this challenge, sentiment analysis has been used as an effective and rapid way to automatically gauge consumers' opinions. Long reviews may contain both positive and negative opinions on different features of a product or service in the same review. Therefore, this paper proposes an aspect-oriented sentiment classification using a combination of a prior knowledge topic model algorithm (SA-LDA), automatic labelling (SentiWordNet) and an ensemble method (Stacking). The framework is evaluated using datasets from different domains. The results show that the proposed SA-LDA outperformed the standard LDA. In addition, the suggested ensemble learning classifier increased the accuracy by more than 3% compared to baseline classification algorithms. The study concluded that the proposed approach is equally adaptable across multi-domain applications.


Introduction
Amid the active usage of social networks and micro-blogs, especially during the COVID-19 pandemic, we may glean insights into any product or service through users' feedback and opinions. Platforms such as micro-blogs, social media sites, online reviews, and discussion forums are growing rapidly. As a result, it is challenging and time-consuming to go through all the reviews and analyse them in order to judge their overall goodness or badness. Accordingly, methods that automatically analyse the sentiment of users' reviews are increasingly needed.
Opinion mining and sentiment analysis are automatic classifications of textual information that focus on classifying data according to polarity (positive or negative). These automatic techniques are among the ways to gauge both user impressions and satisfaction. User-generated content usually contains unstructured text that is used in classification tasks such as information extraction (IE), text analysis and natural language processing (NLP), and these techniques are applied to a vast number of reviews. Therefore, there is an urgent demand for advanced frameworks and methods that can handle this massive amount of information precisely and provide accurate results.

Related Work
In this section, we offer a brief summary of the previous work in the context of aspect extraction via prior knowledge topic modelling, sentiment lexicon classification, and ensemble learning methods for Sentiment Analysis.

Multi-Aspect Topic Modelling for Aspect Extraction (Prior Knowledge Models)
Aspect extraction is one of the central phases in analysing the opinions, emotions and viewpoints expressed in textual data about a certain topic. Although current aspect extraction procedures are based on topic models, relying on topic models alone tends to generate unrelated and incoherent aspects. Prior knowledge semi-supervised models were introduced to improve the correctness of aspect extraction using topic models with minimal user involvement. These models use domain-specific knowledge to guide the topic extraction task and so limit the number of unrelated extracted topics.
Several studies revealed that employing prior knowledge of a topic model has raised the aspect extraction accuracy. However, existing research studies have concentrated on a single domain using knowledge to extract aspects from a specific domain. For instance, Shi et al. [1] proposed a novel clustering method by leveraging prior knowledge to enhance the web services clustering task accuracy using a semi-supervised technique. The results have confirmed that the approach provides a major improvement in the clustering accuracy.
Overall, these and most other prior knowledge topic modelling techniques are LDA-based, and their reported results indicate that the extracted aspects are more coherent and more accurate, as they significantly improve on the performance of the baseline topic models [10,11].

Sentiment Lexicon Classification
Sentiment lexicon classification (sentiment analysis) is the computational analysis of people's thoughts, ideas, and feelings towards an entity [12], and it involves classifying them into positive, neutral, or negative categories. Sentiment lexicon approaches are applied to label data and to measure the sentiment polarity. Sentiment lexicon classification relies on two sorts of approaches which are corpus-based and dictionary-based [13].
Many existing studies have applied sentiment lexicons to different domains and languages [14][15][16][17][18]. Most of these studies used the SentiWordNet lexicon to extract sentiments with little manual intervention. The chosen lexicon improved the accuracy in terms of topic-specific lexical sentiments.

Ensemble Learning Method
Ensemble learning methods are among the top current research topics in machine learning [19]. Machine learning models are used for predictive classification in order to achieve good performance, and special attention has been drawn to sentiment classification tasks. Some of the common ensemble learning methods include Averaging, Bagging, AdaBoost and Stacking.

Materials and Methods
An overview of the proposed methodology is shown in Figure 1. It consists of data preprocessing followed by three core modules: (1) aspect extraction using the prior knowledge topic model (SA-LDA) algorithm; (2) automatic labelling (SentiWordNet); (3) ensemble learning classifier (Stacking). The details of each component are described in the following subsections.

Dataset and Pre-Processing
The first module of the proposed methodology consists of data collection and preprocessing. In this module, data about users' opinions towards different aspects is collected from online reviews in several domains. Table 1 shows the basic descriptive information of the three datasets used in the experimental analysis. A step-by-step procedure for data collection and pre-processing is outlined in Algorithm 1. The result is a pre-processed textual corpus of opinion units (sentences), ready for the extraction of aspects and opinion aspects in the next step.

Aspect Extraction
The next step of the proposed model pipeline is automatically extracting semantic aspects (also called topics) from the pre-processed textual corpus. In this paper, a modified LDA model, called Seeded-Aspects LDA (SA-LDA), is proposed. It takes as input an unlabelled pre-processed textual corpus that contains opinion units of a specific domain, together with an aspect specification. An aspect specification is a set of predefined aspects (seed words). Basic LDA tends to detect only the most obvious aspects of a text corpus, which may not cover the expected and desired aspects. Thus, we propose a modified LDA model that is provided with seed words (seed aspects) to guide it to generate only words from the analogous seed aspects, as presented in Figure 2.
SA-LDA is at its basis an LDA-based topic model, extended with biased topic modelling hyper-parameters (β and α) that are based on continuous word embeddings. The number of aspects (k) is set based on the number of unique main aspects needed. Each review is modelled by an aspect and contains a sentence. The proposed model is illustrated in plate notation in Figure 2, and its generative process is described in Algorithm 2.
Algorithm 2: Generative process of SA-LDA
For each aspect k = 1, ..., K,
For each review d,
• Draw a topic z_n ∼ Multi(θ_d).
• Draw an indicator y_{d,n} ∼ Bern(π_{d,n}).
• If y_{d,n} = A:
We provided the model with several seed words for each main aspect, as shown in Table 2. After feeding in the unique aspects and seed words for each dataset, each review sentence is ready for the next phase of the sentiment analysis task, as described in the next subsection.
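One simple way to picture how seed words can bias a topic model is an asymmetric topic-word prior in which each aspect gives extra prior mass to its own seed words. The sketch below illustrates only that idea and is not the SA-LDA implementation: the paper derives its biased β and α from continuous word embeddings, whereas the constant `boost`, the function name `seeded_beta`, and the toy vocabulary and seed lists are all illustrative assumptions.

```python
from typing import Dict, List


def seeded_beta(vocab: List[str],
                seeds: Dict[str, List[str]],
                base: float = 0.01,
                boost: float = 1.0) -> Dict[str, List[float]]:
    """Build one Dirichlet prior vector over the vocabulary per aspect.

    Every word starts at `base`; the seed words of an aspect get `boost`
    added, so that aspect is more likely to generate them a priori.
    Both values are hypothetical, not taken from the paper.
    """
    index = {word: i for i, word in enumerate(vocab)}
    priors: Dict[str, List[float]] = {}
    for aspect, words in seeds.items():
        row = [base] * len(vocab)
        for word in words:
            if word in index:  # ignore seed words missing from the corpus
                row[index[word]] += boost
        priors[aspect] = row
    return priors


# Hypothetical restaurant-style vocabulary and seed aspects.
vocab = ["pizza", "waiter", "price", "tasty", "slow", "cheap"]
seeds = {"food": ["pizza", "tasty"], "service": ["waiter", "slow"]}
beta = seeded_beta(vocab, seeds)
```

Each row of `beta` could then be passed as the topic-word prior of an LDA implementation that accepts a per-topic, per-word prior; the number of rows equals the number of main aspects.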

Automatic Labelling System
Automatic labelling uses the sentiment lexicon approach to label data and to measure the sentiment polarity. In this work, SentiWordNet is applied to label a dataset. SentiWordNet is derived from the WordNet dictionary, and each word in it is associated with a numerical score. In this phase, for each sentence, the SentiWordNet dictionary is applied to determine the polarity of each word, and then the polarity of the whole sentence is calculated by adding the polarity of each word. If a word is not in the SentiWordNet dictionary, it is searched for in the WordNet dictionary. WordNet is an English language dictionary in which synonymous words are gathered into sets called synsets. Thus, the words related to the missing word are fetched from WordNet and looked up in the SentiWordNet dictionary, and their sentiment score is used for the polarity calculation. This procedure increases the efficiency and effectiveness of automatic labelling.
Furthermore, some words, called negation words, may affect the sentiment orientation of other words in the sentence. Negation words are words that reverse the polarity of the sentence when they occur in it. For example, in the text "the food is not good", the negation word "not" reverses the polarity of the sentence. To handle this issue, negation is considered in the polarity calculation. The algorithm of the automatic labelling phase is illustrated in Algorithm 3. Its output consists of the label (1 for positive and 0 for negative) and the sentiment polarity. The labelled dataset is then used in the next phase to train the classification model, an ensemble learning classifier for sentiment classification. Specifically, stacked generalization is employed on different classifier algorithms, as explained in the next sub-section.
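The labelling steps described above (summing word polarities, falling back to synonyms, and reversing polarity after a negation word) can be sketched as follows. This is a minimal illustration rather than the paper's Algorithm 3: the small `LEXICON` and `SYNONYMS` dictionaries are hypothetical stand-ins for SentiWordNet and the WordNet synsets, the scores are invented, and flipping only the next scored word after a negation is one assumed way of handling negation.

```python
from typing import Dict, List, Tuple

# Hypothetical toy scores standing in for SentiWordNet (positive minus negative).
LEXICON: Dict[str, float] = {"good": 0.6, "tasty": 0.5, "bad": -0.6, "slow": -0.3}
# Hypothetical synonym sets standing in for the WordNet fallback.
SYNONYMS: Dict[str, List[str]] = {"delicious": ["tasty"], "awful": ["bad"]}
NEGATIONS = {"not", "no", "never"}


def word_score(word: str) -> float:
    """Look the word up directly, then through its synonyms, else return 0."""
    if word in LEXICON:
        return LEXICON[word]
    for synonym in SYNONYMS.get(word, []):
        if synonym in LEXICON:
            return LEXICON[synonym]
    return 0.0


def label_sentence(sentence: str) -> Tuple[int, float]:
    """Return (label, polarity): label is 1 for positive and 0 for negative."""
    polarity = 0.0
    negate = False
    for token in sentence.lower().split():
        word = token.strip(".,!?\"'")
        if word in NEGATIONS:
            negate = True  # reverse the next opinion word that carries a score
            continue
        score = word_score(word)
        if score != 0.0:
            polarity += -score if negate else score
            negate = False
    # A sentence with zero polarity is labelled 0 here; that tie rule is an assumption.
    return (1 if polarity > 0 else 0), polarity


print(label_sentence("the food is not good"))
print(label_sentence("The pizza was delicious!"))
```

With the toy scores above, the negated sentence receives a negative polarity and label 0, while the second sentence is resolved through the synonym fallback and receives label 1.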

Predicting Polarity of Large-Scale Social Data Using Supervised Learning (The Ensemble Learning Classifier Method)
An ensemble algorithm is trained on the labelled dataset to classify unseen reviews as positive or negative. To date, numerous ensemble learning methods have been developed to enhance the performance of classification tasks. The main purpose of ensemble models is to combine a set of classifiers in order to achieve better and more reliable predictive performance than a single classifier [43]. The focus here is on the capability of an ensemble model to generate a better result than each baseline classifier. In this experiment, a stacked generalization method was used because it minimizes generalization error. The idea of stacked generalization is to combine the predictions of several base classifiers in the first level using a meta classifier in the next level. The process of performing stacked generalization with k-fold cross-validation is shown in Figure 3.
The first step is to train the base classifiers in the first level, which are support vector machine, logistic regression, random forest, decision tree, naïve Bayes, and K-nearest neighbours, by employing k-fold cross-validation on each classifier. The dataset is divided into k subsets. In each of k sequential rounds, one of the k subsets is used as the test set and the other k − 1 subsets are used as the training set. After that, each base classifier generates a prediction. Then, the prediction values from each classifier are combined and provided as the dataset for the second level. Finally, a meta classifier is trained on the second level with the first-level dataset to produce the final prediction. Algorithm 4 describes the stacked generalization with k-fold cross-validation with k = 10.

Algorithm 4: Stacked Generalization with k-fold cross-validation
Input: Dataset D, base classifiers t, base classifier prediction p, meta classifier m
Output: Ensemble classifier prediction P
Apply k-fold CV, k = 10, D_n = {D_1, D_2, ..., D_10} // split the dataset into 10 subsets
for k ← 1 to n do
    for each t ← 1 to T do // base classifiers
        train the classifier p_kt from D_n
    end for
    for D_p do // generate first-level dataset
        get a dataset D_p, where D_p = {p_t1, p_t2, ..., p_T}
    end for
end for
train m from D_p // meta classifier
return P // final prediction
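The procedure in Algorithm 4 can be sketched in plain Python as below. This is an illustration of the mechanics only, under stated assumptions: the base learners are toy single-feature threshold classifiers rather than the six classifiers used in the paper, the meta learner is a simple lookup table over patterns of base predictions, the data are synthetic, and refitting the base learners on the full dataset before prediction is a common convention that the paper does not spell out. All function names are hypothetical.

```python
from typing import Callable, List, Sequence

Predictor = Callable[[Sequence[float]], int]
Learner = Callable[[List[Sequence[float]], List[int]], Predictor]


def k_fold(n: int, k: int) -> List[List[int]]:
    """Split indices 0..n-1 into k folds (round-robin, no shuffling)."""
    return [list(range(i, n, k)) for i in range(k)]


def threshold_learner(feature: int) -> Learner:
    """Toy base learner: cut one feature halfway between the class means."""
    def fit(X, y):
        pos = [x[feature] for x, t in zip(X, y) if t == 1]
        neg = [x[feature] for x, t in zip(X, y) if t == 0]
        if not pos or not neg:  # only one class seen: predict it always
            constant = 1 if pos else 0
            return lambda x: constant
        mean_pos, mean_neg = sum(pos) / len(pos), sum(neg) / len(neg)
        cut = (mean_pos + mean_neg) / 2
        sign = 1 if mean_pos >= mean_neg else -1
        return lambda x: 1 if sign * (x[feature] - cut) > 0 else 0
    return fit


def table_learner(X, y) -> Predictor:
    """Toy meta learner: most frequent label per pattern of base predictions."""
    counts = {}
    for x, t in zip(X, y):
        counts.setdefault(tuple(x), [0, 0])[t] += 1

    def predict(x):
        c = counts.get(tuple(x))
        if c is None:  # unseen pattern: fall back to a majority vote
            return 1 if 2 * sum(x) > len(x) else 0
        return 1 if c[1] > c[0] else 0
    return predict


def fit_stack(X, y, base_learners, meta_learner, k: int = 10) -> Predictor:
    n = len(X)
    # First level: out-of-fold predictions form the second-level dataset.
    meta_X = [[0] * len(base_learners) for _ in range(n)]
    for test_idx in k_fold(n, k):
        held_out = set(test_idx)
        train_X = [X[i] for i in range(n) if i not in held_out]
        train_y = [y[i] for i in range(n) if i not in held_out]
        for j, learner in enumerate(base_learners):
            model = learner(train_X, train_y)
            for i in test_idx:
                meta_X[i][j] = model(X[i])
    # Second level: train the meta classifier on the base predictions.
    meta = meta_learner(meta_X, y)
    # Assumption: base learners are refit on all data for final prediction.
    final_bases = [learner(X, y) for learner in base_learners]
    return lambda x: meta([b(x) for b in final_bases])


# Synthetic data: feature 0 separates the classes, feature 1 is noise.
X = [((i / 40) if i < 10 else 0.75 + (i - 10) / 40, (i * 7 % 20) / 20)
     for i in range(20)]
y = [0] * 10 + [1] * 10
predict = fit_stack(X, y, [threshold_learner(0), threshold_learner(1)],
                    table_learner, k=10)
```

Training the meta learner on out-of-fold predictions, rather than on predictions the base learners make for their own training data, is what keeps the second level from simply rewarding base learners that overfit.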

Evaluation Criteria and Experimental Results
The evaluation methods for classification models used in this paper are precision, recall and F-measure, as in [44]. They were used to estimate the performance result of each classifier. We evaluated our classifiers and models according to a 10-fold cross-validation scheme on the datasets.
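For reference, the three measures can be computed from the counts of true positives, false positives and false negatives, as in the minimal sketch below. It assumes binary labels with 1 as the positive class; the function name and the example labels are illustrative, and the paper does not state whether its scores are averaged over classes.

```python
from typing import Sequence, Tuple


def precision_recall_f1(y_true: Sequence[int],
                        y_pred: Sequence[int],
                        positive: int = 1) -> Tuple[float, float, float]:
    """Compute precision, recall and F-measure for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F-measure is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


# Illustrative labels: 2 true positives, 1 false positive, 1 false negative.
scores = precision_recall_f1([1, 1, 1, 0, 0], [1, 1, 0, 1, 0])
```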
In this section, we evaluate and discuss the three main modules of the proposed model. In the first module (aspect extraction), we evaluated the proposed model, named SA-LDA topic modelling. This evaluation relies on two parts: (1) manual evaluation of each extracted aspect; (2) comparison of the results with the baseline topic modelling algorithm for each domain.
In the second module (automatic labelling), we tested the accuracy of the proposed lexicon-based approach and verified the results with the manually labelled dataset. We also compared three lexicon-based approaches with the related works and the present results.
In the third module (ensemble classifier), we illustrated the performance of the proposed classifier model for the purpose of aspect sentiment analysis. This evaluation relies on two parts: (1) evaluating the performance and accuracy of the proposed model on three different domains; (2) comparing the proposed model to the baseline classifiers as well as another ensemble method.

Aspect Extraction (SA-LDA Model)
The result shows that SA-LDA extracts valuable aspects and relates them to the main aspect. In contrast, LDA extracts many unrelated aspects along with some adjectives, which are better regarded as opinion words than as aspects. Table 3 compares the results obtained from both models for each domain. The words coloured in red indicate errors or unrelated aspects. We evaluated the models manually, based on the number of extracted words that are related to the seed words/aspect. Even with these unrelated words, the proposed model produces better results. Moreover, the proposed model is flexible in that it can be adapted to any domain by specifying the seed words for the needed aspects.
Additionally, when the two results are compared, it is obvious that the proposed model outperforms the baseline model. Tables 3 and 4 illustrate the results of the performance of the two models in light of the three domains. Concerning the accuracy of SA-LDA, as illustrated in Table 4, it is clear that the Restaurant has the highest score with 86.7% while the Movie comes second with a score of 83.3%. Yet, Domestic Saudi Airline has the lowest score of 80%. Conversely, the standard model (LDA) scored lower accuracy results with 54%, 41% and 32% for Movie, Restaurant and Domestic Saudi Airlines, respectively. In conclusion, these results indicate that the proposed model has been more successful in detecting more correlated aspects, and it is likely to yield improved results with better performance.

Automatic Labelling (SentiWordNet)
Sentiment classification is a core task of sentiment analysis, which is a subfield of natural language processing. The lexicon approach is applied to extract the opinion on each aspect by using SentiWordNet, which determines whether the text content expresses a positive or a negative review. Opinion extraction and automatic labelling are carried out in three steps: (1) applying part-of-speech tagging to each sentence; (2) extracting all the opinion words and detecting the polarity of each opinion word; (3) looking for a negation word that is close to any opinion word and, once it is found, reversing the polarity.
Opinion words are usually adjectives, adverbs, and verbs, such as "like" or "really", which affect the final result. For instance, the sentences "I like pizza" and "I really like pizza" both contain positive opinions, but the second sentence is more positive. Opinion words can be identified after applying POS tagging to each sentence, and they are typically found near the aspect.
The accuracy of SentiWordNet performance was measured by applying SVM classifier and five-fold cross-validation. The overall results of the accuracy for each domain are shown in Table 5. The results are compared with the related work where SentiWordNet and SVM classifier have been used for different sentiment analysis tasks.
The results indicate that the 'Restaurant' domain recorded the highest accuracy with 69.4%, while 'Movie' comes second with 65%, and the lowest score is recorded by the 'Domestic Saudi Airline' with 63.2%. The percentage distribution of the sentiment polarity for each aspect of the three domains is presented in Figure 4.

Ensemble Classifier (Stacking Generalization)
The performance evaluation of the proposed ensemble classifier model for the purpose of aspect sentiment analysis relies on two parts: (1) comparing the proposed model with the baseline classifiers and with other ensemble methods on three different domains; (2) evaluating the performance and accuracy of the proposed model on three different domains. Tables 6 and 7 illustrate the comparison between the proposed model and the baseline classifiers as well as three other ensemble methods (bagging, AdaBoost and majority voting) for the selected domains.

As outlined in Tables 6 and 7, the proposed model scored better results than the baseline classifiers and the other ensemble classifier methods, with an accuracy of 81.2%, precision of 81.1%, recall of 80.4%, and F1-score of 81%. Among the other ensemble methods, the lowest accuracy is that of majority voting with 77.5%. Among the baseline classifiers, the lowest accuracy is that of the decision tree with 68.8%, whereas the highest is 80.4% for the naïve Bayes classifier.

Conclusions
The main aim of this paper is to develop an efficient model to discover sentiments associated with different aspects of a given text in order to make a more accurate decision from the users' perspective. The main objectives of the proposed system are: (1) Designing an efficient model to identify and extract all the possible aspects from given textual data. This is achieved by using natural language processing (NLP) to prepare the text in a format accepted by a topic model, and then using a topic model to extract the main topics/aspects in that text. (2) Mapping between the extracted aspects and their opinions using linguistic and statistical techniques through a topic model and lexicon classification. (3) Developing a sentiment classification model in order to identify the sentiment orientation of the extracted aspect using an ensemble learning classifier.
To evaluate the performance of the proposed framework, we compared each component to the baseline algorithms for topic modelling, the lexicon-based method and ensemble learning classifiers. The results have shown that the proposed framework is able to predict labels of the three review domains (restaurant, movie, and Saudi airlines) with an accuracy of 83.2%, 84% and 84.4%, respectively. Furthermore, when the proposed system is compared to the baseline algorithms, it scores better results (higher by more than 2%) in terms of the ability to predict the labels correctly.
This study has shown some promising results in the field of aspect-based sentiment analysis, and it opens the way for further research to enhance and expand this area. For future research, the proposed framework could be expanded to handle Arabic texts, which will be a challenging task. Likewise, future studies could apply more resources to the proposed framework to further enhance the results.

Conflicts of Interest:
The authors declare no conflict of interest.