A New Ontology-Based Method for Arabic Sentiment Analysis

Safaa M. Khabour; Qasem A. Al-Radaideh; Dheya Mustafa

doi:10.3390/bdcc6020048

Abstract

Arabic sentiment analysis aims to extract the subjective opinions of users about different subjects, since these opinions and sentiments reveal their perspectives and judgments in a particular domain. Few research studies have addressed semantic-oriented approaches for Arabic sentiment analysis based on domain ontologies and feature importance. In this paper, we built a semantic orientation approach for calculating the overall polarity of Arabic subjective texts based on a purpose-built domain ontology and an available sentiment lexicon. We used the ontology concepts to extract and weight the semantic domain features by considering their levels in the ontology tree and their frequencies in the dataset, and then computed the overall polarity of a given textual review based on the importance of each domain feature. For evaluation, an Arabic dataset from the hotels’ domain was selected to build the domain ontology and to test the proposed approach. The overall accuracy and f-measure reach 79.20% and 78.75%, respectively. The results showed that the approach outperformed the other semantic orientation approaches considered, making it an appealing option for Arabic sentiment analysis.

Keywords:

sentiment analysis; arabic language; domain ontology; semantic orientation; feature level sentiment analysis; features selection and weighting; domain features

1. Introduction

The Web offers a massive virtual space where users can express and publish their opinions and experiences. People use social media daily as a primary place in a wide range of applications in their lives, not only for social life purposes but also e-learning, e-commerce, politics, and many other applications. In the Middle East (where Arabic is the mother language), Facebook and Twitter were determined as the most prevalent social media websites that affect youth [1,2]. While web content has witnessed an unprecedented increase in size, the process of extracting useful information is becoming more challenging as well [3,4].

Sentiment analysis, or opinion mining, is a type of text mining research that depends mainly on Machine Learning (ML) and Natural Language Processing (NLP) approaches for mining subjective texts [4,5,6,7,8,9]. The scope of sentiment analysis research in computer science is growing very quickly [10]. The semantic web is a logical expansion of the World Wide Web, which is intended to make the web more machine-understandable [11]. The ontology is an essential semantic technology used widely for data handling in the semantic web [12,13]. Ontologies facilitate communication between humans and agents; they also describe the domain theories for the explicit representation of the semantics of the data [14] and web interoperability [15]. An ontology is a systematic account of existence [16]; it can be used to formalize and model specific domain knowledge to be represented and applied in different fields, such as the semantic web, artificial intelligence, system engineering, information architecture, enterprise bookmarking, and biomedical informatics. Furthermore, the ontology concept is valuable in text mining applications, such as text clustering and classification [9,17] and text summarization [18], which can be applied to different domains.

Ontology is usually used for accomplishing two main tasks in sentiment analysis research, either for lexicon creation or for aspect (feature) extraction [19,20,21,22,23,24,25,26]. Sentiment analysis researchers are immediately directed toward ontology-based approaches to represent a common sense knowledge base [27,28]. Tartir and Abdul-Nabi [29] utilized ontology for lexicon creation as they created Arabic Sentiment Ontology (ASO). SenticNet is another example of a concept-based resource, which was created to comprise 5732 single and multi-word concepts along with their polarity scores in the range of –1 to +1 [30]. The use of ontology in such text mining applications has achieved considerable results [31].

Few studies have been conducted on sentiment analysis for the Arabic language compared with those for the English language, although several researchers have recently shifted toward the analysis of Arabic [32]. The major challenges of Arabic sentiment analysis (ASA) are related to the nature of the language [33]. Arabic has a rich morphology, in which the same meaning can be expressed in different ways by combining different stems, roots, prefixes, and suffixes. This increases the need to conduct a morphological analysis, where each term is divided into morphemes, and each morpheme is associated with morphological information such as root, stem, POS, and affix [5]. This complexity raises the need to improve NLP tools that handle morphological analysis, tokenization, stemming, spell checking, lemmatization, part-of-speech tagging, and pattern matching [34,35]. Furthermore, the existence of various spoken dialects in real life and on the web [8], the richness of Arabic synonyms, and the low number of studies that concentrate on ASA using domain-specific ontologies and feature importance are the motivations for this research.

This study used ontology to extract features based on ontology concepts, along with concepts’ levels and concepts’ frequencies. Then, it determines the importance of each feature based on its location in the ontology tree and its appearance in the review’s dataset. The proposed semantic orientation approach benefits from domain ontologies and sentiment lexicons for accomplishing ASA to increase the accuracy of the analysis process of the users’ subjective opinions. In this approach, we used ontology to explain some knowledge about the domain features during the feature extraction and selection phases.

We intend to solve the limitations in the previous research on ASA focused on the semantic orientation approaches, where the semantic features in the proposed approach are treated with their different weights of importance in the subjective text. Overall, this paper makes the following contributions:

Building a semantic orientation approach using ontology for mining the different opinions to decrease the effort needed by ordinary users or organizations to make more accurate sentiments classification. The approach is working at the level of semantic features, which are extracted and weighted using the domain ontology.
Using the domain features’ levels to determine the polarity of the overall review. Also, the important domain features from the users’ point of view are used to efficiently calculate the overall semantic polarity of a subjective text. This approach is different from the previous ontology-based approaches in using a weighting method with two factors to identify the different weights of importance for each semantic domain feature.
Evaluating the proposed approach with an Arabic dataset from the hotels’ domain, which was selected to build the domain ontology.

The rest of the paper is organized as follows: The next section discusses related work. Section 3 presents the research methodology and proposed sentiment analysis approach. Evaluation and results are presented and discussed in Section 4, followed by conclusions in Section 5.

2. Related Work

Sentiment classification approaches can be categorized into three groups: Semantic Orientation (SO), Machine Learning (ML), and hybrid approaches. Nithish et al. [36] proposed a feature-based sentiment analysis model of the English language using product reviews. They applied the feature level analysis to mobile product reviews and reached 70% accuracy. Thakor and Sasi [19] presented a sentiment analysis approach to classifying negative sentiments in social media content based on ontology. The proposed approach successfully classified 253 negative tweets out of 494 tweets. In [21], the authors proposed a sentiment analysis approach based on Latent Dirichlet Allocation (LDA) topic clusters, domain ontology, and SentiWordNet for Nokia 6610 cellular phone reviews. The precision of the extracted product features was 76.1%.

Alfonso and Sardinha [22] proposed an approach for holding the relationships between aspects, associations of aspects, and their expressions of opinion for aspect-based sentiment analysis using a fuzzy ontology. They tested their approach on the hotels’ domain, where each aspect of the hotel got a score, and then they calculated the total score for the hotel by accumulating the scores of each aspect.

Zehra et al. [23] proposed an approach to construct a recommendation system based on sentiment analysis using ontology. The researchers focused on a Facebook closed group that includes posts and comments about various schools collected randomly. Salas-Zárate et al. [24] proposed an aspect-level opinion mining approach to the diabetes domain using ontologies to identify the aspects related to diabetes in the tweets.

Lazhar and Yamina [37] examined the effectiveness of domain ontologies in ASA. Mahyoub et al. [38] presented a sentiment lexicon for the Arabic language, in which the proposed system assigns sentiment scores to each word included in the Arabic WordNet. The accuracy of the classification reached 96%. Soliman et al. [39] presented an approach to building a Slang Sentimental Words and Idioms Lexicon (SSWIL) of opinion words. They also categorized Arabic news comments on Facebook using an SVM classifier with two classes, satisfy and dissatisfy, with an accuracy rate of 86.86%.

ML approaches use ML techniques such as Naive Bayes (NB), Support Vector Machines (SVM), Bayesian Network (BN), Maximum Entropy (ME), and Neural Network (NN) for building classifier models [12,40,41,42,43,44,45,46]. Several ML approaches were proposed for sentiment analysis of standard or dialect Arabic tweets dataset based on classes of polarity [47,48,49,50,51,52] or using the rough set-based concepts [53]. Tripathy et al. [54] presented an ML approach for English language sentiment classification. Jagdale et al. [55] applied sentiment analysis to the English language using ML techniques on a dataset collected from Amazon about different products reviews. Other studies considered sentiment analysis for different languages, such as Chinese [56], Turkish [57], and Lithuanian [58].

Hybrid approaches combine different semantic orientation approaches with different machine learning approaches to improve the results of the sentiment analysis process [59,60,61]. Several studies proposed a hybrid approach for sentiment analysis of different Arabic dialects tweets [62,63,64] and tweets of product reviews [65].

We benefit from these studies to build an enhanced approach for ASA using the ontology model. Tartir and Abdul-Nabi [29] focused on the semantic relations between sentiments and their instances to present a semantic orientation approach. Other semantic orientation approaches, such as those of Thakor and Sasi and related studies [19,20,21,22,23,24], focused on the use of ontology for feature identification and extraction without considering any other information from the ontology tree, such as the levels of features. El-Halees and Al-Asmar [25], in contrast, used the levels of features to calculate the polarity by multiplying each feature level by its sentiment polarity, where the levels indicate the feature importance. In this research, we used the ontology to identify and extract the domain features and their levels, while the frequencies of these features in the reviews dataset are also used to identify the importance of each feature.

3. Method

This section presents and discusses the methodology followed in this study. The first subsection describes the overall approach design. The Arabic resources used in this work are described in Section 3.2, while the third subsection describes the main research phases and the entire steps in each phase in more detail.

3.1. Overall Approach Design

The overall methodology to classify Arabic textual reviews based on sentiment analysis using ontology is divided into five main phases, and each phase has several steps, as illustrated in Figure 1.

Figure 1. Overall approach design.

3.2. Description of the Arabic Resources

To illustrate the steps of the proposed method, it is beneficial to first introduce the Arabic resources that were used in the evaluation; this will help the reader gain insight into the proposed method. We used the dataset of ElSahar and El-Beltagy [66] to extract the domain-specific ontology and to evaluate and test the model. The overall dataset comprises around 33 thousand automatically annotated reviews in various domains: movies, restaurants, hotels, and products. It also includes domain-specific lexicons containing about two thousand entries semi-automatically generated from the reviews.

The hotel reviews dataset contains around 15 thousand Arabic user reviews, extracted from the TripAdvisor website. The authors employed the open-source Scrapy framework, for establishing custom web crawlers. Table 1 describes the general statistics of the hotels’ datasets of ElSahar and El-Beltagy [66]. Table 2 holds a sample hotel review from the dataset, where each row is considered as a user opinion on a particular hotel and the identified polarity for that review. We added the review translation.

Table 1. Hotel reviews’ statistics of ElSahar and El-Beltagy dataset.

Table 2. Sample hotel review from ElSahar and El-Beltagy dataset.

Previous ASA studies suffered from the unavailability of adequate resources that classify the opinion words (sentiment lexicons). Although there exist some efforts to build lexicons in Arabic, they still have limitations such as unclear usability, small size, and non-publicly shared lexicons. ArSenL is an Arabic SentiWordNet lexicon developed by Badaro et al. [67] to solve the previously mentioned limitations. The developers created the first large-scale publicly shared resource for opinion mining in standard Arabic. Their lexicon was built based on three different available resources: English sentiWordNet, the Standard Arabic Morphological Analyzer (SAMA), and Arabic WordNet.

Two values are attached to each lemma entry in the lexicon, indicating its positive and negative polarity scores. The lexicon contains four types of Part of Speech (POS) tags (adjective, noun, verb, and adverb). The lemmas are presented in Buckwalter’s (2004) format to facilitate the NLP processes. ArSenL contains a total of around 28,760 lemmas and 157,969 synsets, which makes it a large-scale Arabic sentiment lexicon. Table 3 provides a sample of the ArSenL lexicon content; we added a column that represents each sentiment in an Arabic form and its translation in English as well.

Table 3. Sample of ArSenL lexicon.

3.3. Main Phases of the Approach

This section aims to briefly describe and discuss the main phases depicted in Figure 1 by explaining the steps and processes which are used for each phase.

3.3.1. Ontology Building

For the proposed semantic orientation approach of sentiment analysis, we need to build domain ontology. This ontology is used as a domain concept dictionary to extract the domain features with their importance. In this phase, we built domain ontology by extracting the concepts that are relevant to the hotel domain using Latent Dirichlet Allocation (LDA) with manual approaches. Two lists of domain concepts are generated; one of them is extracted using the LDA algorithm, and the other list is extracted from the dataset manually because LDA ignores the concepts with low frequencies [26].

Figure 2 provides a graphical representation of LDA topic modeling. The Latent Dirichlet Allocation (LDA) model, proposed by Blei et al. [68], is an unsupervised method that is well known in text mining applications. It can recognize the latent topics in a collection of documents automatically [26]. LDA is used to arrange a document text into specified topics. It generates a topics-per-document model and a words-per-topic model using Dirichlet distributions [69]. Each topic is a collection of keywords, and each keyword contributes a specific weight to the topic [68]. The variables and parameters that appear in Figure 2 are interpreted as follows: D is the number of documents in the corpus, N is the number of words in a specified document, α is the Dirichlet prior parameter on the per-document topic distributions, β is the Dirichlet prior parameter on the per-topic word distribution, θ is the topic distribution for a specified document, φ is the word distribution for a specified topic k, TP is the topic assignment for a word in the specified document, and W is the specified word.

Figure 2. Graphical model representation of LDA.

In the proposed approach, LDA is first used to generate topic clusters from the dataset, where each topic contains a group of keywords. Algorithm 1 is used to implement the LDA model in Python. The portion of the dataset assigned to ontology building is imported into Python. Several preprocessing steps are applied to normalize the review sentences, tokenize them into words, and remove unnecessary words. Running the LDA model requires two inputs, the dictionary and the corpus, which record the distinct words and their repetitions in the training data. The Term Frequency-Inverse Document Frequency (TF-IDF) transformation is applied to the entire corpus, and then the LDA is run. The resulting topics contain keywords that are unlikely to be domain concepts, such as sentiments [21], so human evaluation is used to filter these topics and to judge each keyword to determine suitable domain concepts. Table 4 provides a sample of LDA-generated topics from the dataset of ElSahar and El-Beltagy [66], where the keywords in bold represent possible domain concepts.

Algorithm 1 Building the LDA topic model

Input: Hotel Reviews Dataset
Output: Topics with Keywords

1: Load the hotel reviews dataset.
2: Preprocess the reviews dataset:
   (a) Normalization.
   (b) Tokenization.
   (c) Stopword removal.
3: Apply Bag-of-Words (BoW) to the dataset.
4: Apply the TF-IDF transformation to the entire corpus.
5: Train the LDA topic model using the Gensim module.
6: Present the topics of the LDA model.
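Steps 2 to 4 of Algorithm 1 can be sketched in plain Python. This is only a minimal illustration and not the authors’ code: the normalization rules, the tiny stopword set, the toy reviews, and the TF-IDF variant (raw count times log(N/df)) are all assumptions, since the paper does not specify them. In the actual pipeline, the resulting dictionary and corpus would be passed to Gensim’s LDA implementation (step 5), which is omitted here.

```python
import math
import re
from collections import Counter

# Hypothetical stopword set; a real run would use a full Arabic stopword list.
STOPWORDS = {"في", "من", "على", "و"}

def normalize(text):
    """Apply two common Arabic normalizations (assumed, not from the paper)."""
    text = re.sub(r"[\u064B-\u0652\u0640]", "", text)    # strip diacritics and tatweel
    text = re.sub(r"[\u0622\u0623\u0625]", "\u0627", text)  # unify alef variants
    return text

def preprocess(review):
    """Normalize, tokenize on whitespace, and drop stopwords."""
    return [t for t in normalize(review).split() if t not in STOPWORDS]

def bag_of_words(docs):
    """Return one term-count Counter per tokenized document."""
    return [Counter(doc) for doc in docs]

def tfidf(bows):
    """Weight each term count by log(N / document frequency)."""
    n_docs = len(bows)
    df = Counter(term for bow in bows for term in bow)
    return [{t: c * math.log(n_docs / df[t]) for t, c in bow.items()}
            for bow in bows]

# Toy reviews used only to exercise the functions.
reviews = [
    "الفندق رائع والغرفة نظيفة",
    "الغرفة صغيرة والخدمة سيئة",
    "الخدمة في الفندق ممتازة",
]
docs = [preprocess(r) for r in reviews]
weights = tfidf(bag_of_words(docs))
```

Terms that occur in fewer reviews receive larger weights, which is the property the TF-IDF step contributes before topic modeling.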

Table 4. Sample of the generated LDA topics and keywords.

For the manual list of concepts, human evaluators extracted domain concepts from a set of reviews manually; the extracted concepts were then compared with the list of concepts obtained using LDA, and the two lists were combined. The evaluators read the final list to identify the distinct concepts and their synonyms; they also identified the relationships between them to determine their positions from the top to the bottom of the ontology tree. The final ontology is presented using the Protégé tool [70] to facilitate identifying the level of each concept, where the classes and subclasses represent concepts and sub-concepts for that domain [71]. We used the Protégé tool only to draw the ontology instead of drawing it manually.

After identifying the concepts, the Arabic WordNet browser and Google translation are used to search for more semantic Arabic synonyms for each concept. This phase aims to extract all semantic domain features and all words that have the same meaning as the domain features. Table 5 provides an example of semantic synonyms for extracted hotel concepts from the dataset of ElSahar and El-Beltagy [66]. Table 6 shows the total number of distinct domain concepts and the total number of levels in the constructed hotel ontology.

Table 5. Example of semantic synonyms for extracted hotel concepts.

Table 6. Characteristics of the constructed hotel ontology model.

For each concept, the level is identified using the Protégé structure. We assume that the highest level (6) is at the root of the ontology tree and the lowest level (1) is at the bottommost feature in the tree. Furthermore, for each concept, we identify the total frequency by summing the concept’s frequency and its synonyms’ frequencies, and then two importance weights (level importance and frequency importance) are calculated for each concept. All the needed information from the ontology is stored in a separate file as a domain concepts dictionary. Each row in the domain concepts dictionary consists of Domain_Concept, Concept_Level_Importance, Concept_Frequency_Importance, and List_of_Synonyms.
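One possible in-memory layout for the domain concepts dictionary is a nested Python dictionary keyed by concept. The sketch below is illustrative only: the English concept names stand in for the Arabic ones, and the counts, levels, and importance values are made up rather than taken from the constructed ontology.

```python
from collections import Counter

def total_frequency(concept, synonyms, token_counts):
    """Sum the corpus frequency of a concept and all of its synonyms."""
    return sum(token_counts[w] for w in [concept, *synonyms])

# Made-up token counts standing in for the ontology-building reviews.
token_counts = Counter({"room": 120, "chamber": 15, "hotel": 300, "inn": 5})

# One entry per Domain_Concept, holding the other three fields named in the text.
domain_concepts = {
    "hotel": {
        "Concept_Level_Importance": 6,        # root of the ontology tree
        "Concept_Frequency_Importance": 1.0,
        "List_of_Synonyms": ["inn"],
    },
    "room": {
        "Concept_Level_Importance": 5,
        "Concept_Frequency_Importance": 0.75,
        "List_of_Synonyms": ["chamber"],
    },
}

# Reverse index so that any synonym found in a review maps back to its concept.
synonym_to_concept = {
    word: concept
    for concept, entry in domain_concepts.items()
    for word in [concept, *entry["List_of_Synonyms"]]
}

room_total = total_frequency(
    "room", domain_concepts["room"]["List_of_Synonyms"], token_counts
)
```

The reverse index is a convenience added for this sketch; it lets feature extraction treat a concept and its synonyms as the same domain feature.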

3.3.2. Text Preprocessing

The reviews dataset is unstructured and contains stopwords, so it needs to be preprocessed. Text preprocessing is intended to make the reviews consistent and to represent them in some standard form to facilitate conducting systematic processes. Some NLP processes were used to preprocess the textual reviews. These processes include sentence tokenization, normalization, stopword removal, word tokenization, POS tagging, and stemming. Table 7 provides an example for each of them. English translation of the Arabic input is added.

Table 7. Example of a user review after each of the preprocessing steps.

3.3.3. Domain Features and Initial Polarity Identification

This phase aims to distinguish the domain features and sentiment words using the POS, where the nouns are considered as candidate domain features for identification and extraction using the domain dictionary. The noun tags using the Stanford POS tagger [72] are NN, DTNN, NNP, DTNNP, NNS, DTNNS, NNPS, DTNNPS, NOUN, NOUN_QUANT. The other words such as adjectives, verbs, and the residual nouns which were not found in the domain dictionary are considered candidate sentiment words to match with the lexicon [37].

To extract the sentiment words around each domain feature, we use the N-gram-around method, which has achieved considerable results in identifying the sentiment words related to each domain feature [20,24]. The initial polarity of the domain feature is calculated from the sum of the positive scores and the sum of the negative scores of the sentiment words extracted using the N-gram-around method.
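The N-gram-around extraction can be sketched as a simple window over the token list. The function names, the English toy tokens, and the lexicon scores below are hypothetical. Combining the two sums by subtraction (positive minus negative) is also an assumption, because the text only states that the initial polarity is based on both sums.

```python
def ngram_around(tokens, index, n):
    """Return up to n tokens before and n tokens after position `index`."""
    before = tokens[max(0, index - n):index]
    after = tokens[index + 1:index + 1 + n]
    return before + after

def initial_polarity(context, lexicon):
    """Sum of positive scores minus sum of negative scores (assumed rule)."""
    pos = sum(lexicon.get(w, (0.0, 0.0))[0] for w in context)
    neg = sum(lexicon.get(w, (0.0, 0.0))[1] for w in context)
    return pos - neg

# Toy lexicon: word -> (positive score, negative score). Values are made up.
lexicon = {"clean": (0.5, 0.0), "noisy": (0.0, 0.625), "small": (0.125, 0.25)}
tokens = ["the", "room", "was", "clean", "but", "small", "and", "noisy"]

feature_index = tokens.index("room")
context = ngram_around(tokens, feature_index, n=2)
score = initial_polarity(context, lexicon)
```

Widening the window pulls in more candidate sentiment words, so the choice of N trades coverage against the risk of attaching a sentiment word to the wrong feature.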

To search and match each sentiment word with the lexicon, three methods are used: the original word is matched with the lexicon; if not found, the word stem is matched with the lexicon; and if not found, the word root is matched with the lexicon. If neither the word nor its stem nor its root is found, its sentiment polarity is considered zero. For this step, we used the Tashaphyne stemmer [73], which is supported in Python, to generate both stems and roots.
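The three-step matching order can be written as a short fallback chain. In this sketch the stemmer and root extractor are passed in as functions; `toy_stem` and `toy_root` are hypothetical English stand-ins used only so the example runs without an Arabic stemmer, and the lexicon entries are made up.

```python
def lookup_polarity(word, lexicon, stem, root):
    """Try the surface word, then its stem, then its root; default to zero."""
    for candidate in (word, stem(word), root(word)):
        if candidate in lexicon:
            return lexicon[candidate]
    return (0.0, 0.0)

# Hypothetical stand-ins for a real stemmer such as Tashaphyne.
def toy_stem(word):
    return word[:-1] if word.endswith("s") else word

def toy_root(word):
    return toy_stem(word)[:4]

# Toy lexicon: word -> (positive score, negative score).
lexicon = {"comfort": (0.5, 0.0), "dirt": (0.0, 0.75)}

exact = lookup_polarity("comfort", lexicon, toy_stem, toy_root)
by_stem = lookup_polarity("dirts", lexicon, toy_stem, toy_root)
by_root = lookup_polarity("dirtiest", lexicon, toy_stem, toy_root)
missing = lookup_polarity("xyz", lexicon, toy_stem, toy_root)
```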

Negations and intensifiers are handled while identifying the polarities of sentiment words. Negation in the Arabic language is expressed by adding (مش/لا/لن/عدم) “not” before a verb, noun, or adjective. If any of the negation terms appears before a sentiment word, it reverses the meaning of that word; adding a negation particle before a positive word makes it negative, and vice versa. For example, in the sentence (الادارة مش أمانة), the word (أمانة) is positive, and its positive and negative scores in the ArSenL lexicon are (0.083, 0.05), respectively. When the negation particle (مش) comes before it, its scores are swapped to (0.05, 0.083), which is negative. Intensifiers in the Arabic language, such as (كثيرا جدا/تماما), are added after a sentiment word to emphasize and strengthen its meaning. We therefore double the polarity of a sentiment word when an intensifier appears after it. For example, in (الفندق رائع), the word (رائع) is a sentiment word with positive and negative scores of (0.402, 0.069), respectively. After adding (جدا) to the sentence, its scores change to (0.804, 0.138). Table 8 provides an example of this phase.
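The two adjustments can be sketched as follows, reusing the two examples from the text. Two points are assumptions made for this illustration: the particle is only checked in the position immediately next to the sentiment word, and the intensifier set treats each listed word as a separate token.

```python
NEGATIONS = {"مش", "لا", "لن", "عدم"}
INTENSIFIERS = {"جدا", "تماما", "كثيرا"}  # assumed tokenization of the listed intensifiers

def adjust_scores(tokens, index, scores):
    """Adjust the (positive, negative) scores of the sentiment word at tokens[index]."""
    pos, neg = scores
    # A negation particle directly before the word swaps its two scores.
    if index > 0 and tokens[index - 1] in NEGATIONS:
        pos, neg = neg, pos
    # An intensifier directly after the word doubles both scores.
    if index + 1 < len(tokens) and tokens[index + 1] in INTENSIFIERS:
        pos, neg = 2 * pos, 2 * neg
    return pos, neg

negated = adjust_scores(["الادارة", "مش", "أمانة"], 2, (0.083, 0.05))
intensified = adjust_scores(["الفندق", "رائع", "جدا"], 1, (0.402, 0.069))
```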

Table 8. Example of domain features and initial polarity identification.

3.3.4. Overall Semantic Review Polarity Calculation

Based on the extracted semantic domain features for each review, we need to calculate a total semantic review polarity. The initial polarity of each domain feature is affected by the importance of that feature. Formula (1) is used to calculate the overall semantic review polarity based on the importance of the semantic features:

\text{Overall Semantic Review Polarity} = \sum_{i = 1}^{n} \text{Initial Polarity}(DF_i) \times \left( L(DF_i) + F(DF_i) \right)

(1)

where n is the number of domain features extracted from a review, DFi represents a specific domain feature that has an initial polarity, L represents the level importance of the domain feature (DFi), which is identified based on its level in the ontology tree, and F represents the frequency importance of the domain feature (DFi), which takes one of the values 0.1, 0.25, 0.50, 0.75, or 1 to indicate its importance from the domain users’ point of view. Since the features’ levels do not depend on the dataset, we use the domain feature frequency to represent the importance of the domain features as they are repeated in the dataset. A high frequency of a domain feature in the dataset means that users are more interested in that feature than in the other features of the domain. Domain features are divided into five groups based on their frequencies; the most frequent features in the dataset get the highest importance value (1), and so on, whereas the least frequent features get the lowest importance value (0.1). We experimented with assigning different weights to this factor for each group of domain features and noticed that these importance weights improved the performance of the semantic orientation sentiment analysis.
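The assignment of frequency importance to five groups can be illustrated with a rank-based split. How the group boundaries were chosen is not stated in the paper, so splitting the ranked concepts into five equal-sized groups is purely an assumption here, as are the concept names and counts.

```python
IMPORTANCE_VALUES = [1.0, 0.75, 0.50, 0.25, 0.1]  # most to least frequent group

def frequency_importance(frequencies):
    """Map each concept to one of five importance values by frequency rank.

    Equal-sized groups are an assumption; the source does not define
    the group boundaries.
    """
    ranked = sorted(frequencies, key=frequencies.get, reverse=True)
    n = len(ranked)
    return {
        concept: IMPORTANCE_VALUES[min(rank * 5 // n, 4)]
        for rank, concept in enumerate(ranked)
    }

# Made-up total frequencies for ten hypothetical hotel concepts.
counts = [900, 750, 400, 320, 150, 90, 40, 25, 8, 3]
freqs = {f"concept_{i}": count for i, count in enumerate(counts)}
importance = frequency_importance(freqs)
```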

The review label is determined as positive (+1) if the overall semantic review polarity is greater than or equal to zero. The third class (neutral) is ignored in the proposed approach, and we noticed that very few reviews in the dataset have a total semantic polarity exactly equal to zero, so we considered them positive reviews. Conversely, the review label is determined as negative (−1) if the overall semantic review polarity is less than zero. Table 9 illustrates the phase of calculating the overall semantic review polarity using the example from the previous phase.
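Formula (1) and the labeling rule translate directly into a few lines of Python. The feature tuples below are illustrative values chosen for this sketch, not entries from the dataset.

```python
def overall_polarity(features):
    """Formula (1): sum of initial polarity times (level + frequency importance)."""
    return sum(polarity * (level + freq) for polarity, level, freq in features)

def review_label(polarity):
    """Return +1 when polarity >= 0 (zero counts as positive), otherwise -1."""
    return 1 if polarity >= 0 else -1

# Each tuple: (initial polarity, level importance L, frequency importance F).
features = [(0.25, 4, 0.5), (-0.5, 1, 0.25)]

total = overall_polarity(features)
label = review_label(total)
```

In this toy review the negative feature has the larger initial polarity in absolute terms, but the positive feature sits higher in the ontology and is more frequent, so the weighted total ends up positive.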

Table 9. Example of calculating overall semantic review polarity.

The overall review polarity is considered positive, although the review contains one positive feature with an initial polarity of (+0.33757) and one negative feature with an initial polarity of (−0.51666), where the negative feature has the larger absolute value. This is because the positive feature has higher importance than the negative feature: the total importance of the positive feature is (6), whereas that of the negative feature is (2.25).
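Substituting the values quoted above into Formula (1) makes the outcome explicit. The arithmetic below takes the total importance values (6 and 2.25) as the sums L + F for each feature, which is how Formula (1) defines the weight; the figures shown in Table 9 may be rounded differently.

```latex
\begin{aligned}
\text{Overall Semantic Review Polarity}
  &= (+0.33757 \times 6) + (-0.51666 \times 2.25) \\
  &= 2.02542 - 1.162485 \\
  &= 0.862935 \;\geq\; 0 \quad\Rightarrow\quad \text{positive } (+1)
\end{aligned}
```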

3.3.5. Performance Evaluation

In this phase, some performance evaluation metrics are used to measure the performance of the proposed approach, and to compare it with some other semantic orientation approaches used by researchers in the literature. The performance evaluation measures are accuracy, recall, precision, and f1-measure. Referring to [54], the precision and recall measures can be computed for the positive class using the following equations:

\text{Precision (Positive)} = \frac{TP}{TP + FP}

(2)

\text{Recall (Positive)} = \frac{TP}{TP + FN}

(3)

where:

TP (True Positive): represents the number of reviews that are classified as positive in both original classifications and predicted classifications.
TN (True Negative): represents the number of reviews that are classified as negative in both original classifications and predicted classifications.
FP (False Positive): represents the number of reviews that are classified as positive in the predicted classifications, while classified as negative in the original classifications.
FN (False Negative): represents the number of reviews that are classified as negative in the predicted classifications, while classified as positive in the original classifications.
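The text lists accuracy and the f1-measure among the evaluation measures but gives equations only for precision and recall of the positive class. For completeness, the standard definitions of the remaining measures are shown below. These are the usual textbook forms and are assumed, not confirmed by the source, to be the ones used here; the negative-class measures follow by exchanging the roles of the two classes.

```latex
\begin{aligned}
\text{Accuracy} &= \frac{TP + TN}{TP + TN + FP + FN} \\[4pt]
\text{F1 (Positive)} &= \frac{2 \times \text{Precision (Positive)} \times \text{Recall (Positive)}}
                            {\text{Precision (Positive)} + \text{Recall (Positive)}} \\[4pt]
\text{Precision (Negative)} &= \frac{TN}{TN + FN}, \qquad
\text{Recall (Negative)} = \frac{TN}{TN + FP}
\end{aligned}
```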

4. Results and Discussion

This section describes and discusses the experiments conducted for performance evaluation. We implemented an automatic framework that combines several tools and libraries. The software architecture is depicted in Figure 3. We used Python 3.7 on Anaconda 3 with the following libraries and modules: Pandas, Gensim, NLTK, CLTK, PyArabic, PyAramorph, Stanford POS Tagger, and Tashaphyne. Pandas offers a DataFrame object for quick and effective data handling with integrated indexing, along with tools for reading and writing data between in-memory data structures and various formats such as text, Excel, and CSV files [74]. A Python dictionary is a data structure that consists of a set of (key: value) pairs, where the keys are unique within one dictionary. The main functions of a dictionary are storing and extracting values using their keys [75]. We used nested dictionaries, where a collection of dictionaries is stored inside a single dictionary.

Figure 3. The tools and libraries that are used for semantic orientation evaluation.

4.1. Dataset Balancing

We examine the proposed approach using the hotel reviews dataset of ElSahar and El-Beltagy [66] which was presented in Section 3.2. The dataset consists of unbalanced classes because it contains different sizes of positive, negative, and neutral reviews. At first, the neutral reviews were excluded based on the assumption that neutral texts are located close to the boundary of the binary classifier. Moreover, neutral texts are supposed to be less informative in comparison with clear positive or negative texts [76].

After that, we balanced the remaining positive and negative reviews using the under-sampling method. The objective of using under-sampling to balance the reviews is to gain a high performance of classification and to prevent the classifier from acting biased toward the majority group examples [77]. The random under-sampling is a non-heuristic method that is used to balance class sizes through the random elimination of majority class examples to make them equivalent to the smallest class size [78].
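Random under-sampling as described above can be sketched with the standard library alone. The function name, the fixed seed, and the toy class sizes are assumptions made for this illustration.

```python
import random

def undersample(reviews_by_class, seed=0):
    """Randomly keep only as many examples per class as the smallest class has."""
    rng = random.Random(seed)
    target = min(len(items) for items in reviews_by_class.values())
    return {
        label: rng.sample(items, target)
        for label, items in reviews_by_class.items()
    }

# Toy data: the class sizes are made up; only the balancing behaviour matters.
data = {
    "positive": [f"pos_review_{i}" for i in range(10)],
    "negative": [f"neg_review_{i}" for i in range(4)],
}
balanced = undersample(data)
```

Sampling without replacement guarantees that no review is duplicated in the balanced set, and fixing the seed makes the selection reproducible.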

Table 10 shows the size of each class before and after class balancing. The balanced reviews dataset consists of 5294 hotel reviews (2647 positive reviews and 2647 negative reviews). Of these, 3294 hotel reviews are used for domain ontology extraction using LDA and the manual approach; the main goal is to extract the domain ontology from the available reviews dataset. The remaining 2000 hotel reviews (1000 positive reviews and 1000 negative reviews) are used for the ASA experiments to evaluate the effectiveness of the proposed approach. The authors of [20] divided their reviews dataset in a similar way for the same purposes.

Table 10. Dataset statistics before and after class balancing.

4.2. Lexicon Baseline Evaluation

The lexicon baseline approach is selected for comparison because it does not consider the domain concepts when identifying review polarity; it simply uses a sentiment lexicon to look up all the words of the review with their polarities. The ArSenL lexicon of Badaro et al. [67] is used in this experiment. Table 11 and Table 12 present the confusion matrix and performance measures of the lexicon baseline approach.
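A lexicon-only baseline of this kind can be sketched in a few lines. How the baseline combines the word scores and breaks ties is not spelled out in the text, so subtracting the negative sum from the positive sum and labeling zero as positive (mirroring the proposed approach) are assumptions; the lexicon entries are made up.

```python
def lexicon_baseline_label(tokens, lexicon):
    """Label a review from all of its words found in the lexicon, ignoring domain features."""
    pos = sum(lexicon.get(t, (0.0, 0.0))[0] for t in tokens)
    neg = sum(lexicon.get(t, (0.0, 0.0))[1] for t in tokens)
    return 1 if pos - neg >= 0 else -1

# Toy lexicon: word -> (positive score, negative score).
toy_lexicon = {"great": (0.75, 0.0), "dirty": (0.0, 0.5), "slow": (0.0, 0.5)}

positive_review = lexicon_baseline_label(["great", "pool", "slow", "wifi"], toy_lexicon)
negative_review = lexicon_baseline_label(["dirty", "room", "slow", "service"], toy_lexicon)
```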

Table 11. Confusion matrix for lexicon baseline approach.

Table 12. Performance evaluation of lexicon baseline approach.

The confusion matrix of the lexicon baseline approach shows that the number of correctly classified positive reviews is 898, and the number of correctly classified negative reviews is 596. The number of reviews incorrectly classified as positive is 404, and the number of reviews incorrectly classified as negative is 102. The overall precision of the lexicon baseline approach is 77.17%, with a higher precision value for the negative reviews; the opposite is the case for recall, where the positive class has the higher value and the overall recall is 74.70%. The overall f-measure value is 74.10%.
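The reported figures can be checked from the four confusion-matrix counts. The sketch below assumes that the overall precision, recall, and f-measure are unweighted (macro) averages over the two classes; the paper does not state the averaging scheme, but this assumption gives values close to the reported 77.17%, 74.70%, and 74.10%.

```python
def class_metrics(tp, fp, fn):
    """Precision, recall, and F1 for one class."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Counts reported for the lexicon baseline approach.
tp, tn, fp, fn = 898, 596, 404, 102

positive = class_metrics(tp, fp, fn)
# For the negative class the roles swap: TN are its hits and FN its false alarms.
negative = class_metrics(tn, fn, fp)

accuracy = (tp + tn) / (tp + tn + fp + fn)
# Unweighted average over the two classes (assumed averaging scheme).
macro = [(p + n) / 2 for p, n in zip(positive, negative)]

print(f"accuracy = {accuracy:.4f}")
print("macro precision, recall, F1 =", [round(m, 4) for m in macro])
```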

4.3. Ontology Baseline Evaluation

The hotel ontology, which consists of 203 concepts organized in 6 levels, is used in this experiment as a dictionary of domain concepts for feature selection. The domain features are considered the best semantic features to represent each review. The hotel concepts, along with the noun POS tags, are used to identify the domain features, and their polarities are calculated using the N-gram around method with N = 4: the four words before and the four words after each domain feature are extracted and looked up in the ArSenL lexicon to identify the feature’s polarity. The confusion matrix of this approach is shown in Table 13. The number of true-positive reviews is 929, and the number of true-negative reviews is 638. The number of false-positive reviews is 362, and the number of false-negative reviews is 71. Table 14 presents the performance measures of the ontology baseline approach. The overall precision is 80.96%, with a higher precision value for the negative class. The overall recall is 78.35%, where the positive class obtained a higher recall value. The overall f-measure is 77.87%, with the higher f-measure value for the positive class.
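The N-gram around method can be illustrated with a short Python sketch. Several simplifications here are assumptions made for the example only: the lexicon is a toy dictionary that maps a word to a single signed score (ArSenL itself provides separate sentiment scores per lemma), the tokens are English placeholders, features are detected by simple membership in a concept set without POS tagging, and the scores in the window are combined by summation.

```python
def window_polarity(tokens, feature_index, lexicon, n=4):
    """Sum the lexicon scores of the n words before and after a feature."""
    start = max(0, feature_index - n)
    before = tokens[start:feature_index]
    after = tokens[feature_index + 1:feature_index + 1 + n]
    # Words missing from the lexicon contribute a neutral score of 0.
    return sum(lexicon.get(word, 0.0) for word in before + after)


def feature_polarities(tokens, concepts, lexicon, n=4):
    """Map each domain feature found in the review to its window polarity."""
    polarities = {}
    for index, token in enumerate(tokens):
        if token in concepts:
            score = window_polarity(tokens, index, lexicon, n)
            polarities[token] = polarities.get(token, 0.0) + score
    return polarities


# Hypothetical toy inputs.
concepts = {"room", "staff"}                # stands in for the ontology concepts
lexicon = {"clean": 1.0, "quiet": 1.0, "rude": -1.0}
review = "the room was very clean and quiet however at the front desk the staff were rude"
tokens = review.split()

polarities = feature_polarities(tokens, concepts, lexicon, n=4)
print(polarities)
```

In this toy review, "quiet" falls outside the four-word window around "room", so only "clean" contributes to that feature, while "rude" determines the polarity of "staff".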

Table 13. Confusion matrix for ontology baseline approach.

Table 14. Performance evaluation of ontology baseline approach.

4.4. Ontology with Level Importance Evaluation

The ontology with level importance approach uses the ontology both to extract the domain features and to determine their importance based on their levels in the ontology tree. The hotel dictionary, which was built from the extracted ontology, is used to determine the hotel features and their levels. The confusion matrix of this approach is depicted in Table 15. This approach correctly classified 938 of the 1000 positive reviews and 644 of the 1000 negative reviews. It misclassified 356 of the original negative reviews and 62 of the original positive reviews.

Table 15. Confusion matrix for ontology with level importance approach.

The performance measures of the ontology with level importance approach are shown in Table 16. The negative class precision is 91.21%, which is higher than the precision of the positive class, and the average precision of this approach is 81.84%. The positive class recall is 93.80%, which is higher than the negative class recall, and the average recall for both classes is 79.10%. The average f-measure is 78.63%, with a higher value for the positive class.

Table 16. Performance evaluation of ontology with level importance approach.

4.5. Ontology with Level and Frequency Importance Evaluation

In this experiment, we extracted the hotel features by matching the review words against the ontology concepts, and we identified the level and the frequency importance of each matched feature. The hotel concepts dictionary is therefore used to identify all three elements: the features, their levels, and their frequencies. Table 17 and Table 18 present the confusion matrix and performance measures of the ontology with level and frequency importance approach.

Table 17. Confusion matrix for ontology with level and frequency importance approach.

Table 18. Performance evaluation of ontology with level and frequency importance approach.

The number of correctly classified positive reviews using this approach is 937, and the number of correctly classified negative reviews is 647. The number of reviews incorrectly classified as positive (false positives) is 353, and the number of reviews incorrectly classified as negative (false negatives) is 63. The performance measures presented in Table 18 show that the proposed approach achieved an overall precision of 81.87%, with a higher precision value for the negative class, and an overall recall of 79.20%, with a higher recall value for the positive class. The f-measure value is 78.75%.

4.6. Results Summary and Discussion

Using the ontology with domain features’ importance in the two approaches, we observed the following: the ontology with level importance and the ontology with level and frequency importance achieve the best results among all the semantic orientation approaches, with a minor difference between them. Figure 4 summarizes the results of the four schemes described earlier, averaged over the positive and negative classes.

Figure 4. Performance evaluation of ontology for the four implemented schemes.

A comparison between different state-of-the-art approaches for ASA is depicted in Figure 5. The first approach (ontology with level importance) yields an accuracy of 79.10%, and the second approach (ontology with level and frequency importance) yields an accuracy of 79.20%. This small gain may indicate that the way we used the concepts’ frequencies in the formula needs improvement to further enhance the proposed approach. Although the difference between their performances is small, the suggested method, which incorporates two factors to represent the importance of semantic domain features, still achieves results comparable to those of other approaches.

Figure 5. Accuracy of different state of the art approaches for ASA.

Combining the domain ontology with the lexicon baseline approach improved accuracy by 3.65 percentage points (from 74.70% to 78.35%). The lexicon baseline approach did not apply any feature selection method; it simply extracted all review words. Adding the domain features’ importance, computed from the two factors, to the ontology baseline approach improved accuracy by a further 0.85 percentage points (from 78.35% to 79.20%). Overall, the proposed approach improved on the accuracy of the lexicon baseline approach by 4.50 percentage points.
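These accuracy differences follow directly from the confusion matrices reported in Sections 4.2 to 4.5. The short Python sketch below recomputes them as a check on the arithmetic; the scheme labels and the helper names `accuracy` and `gain` are illustrative.

```python
# Confusion-matrix counts (TP, TN, FP, FN) for the four schemes.
confusions = {
    "lexicon baseline": (898, 596, 404, 102),
    "ontology baseline": (929, 638, 362, 71),
    "ontology + level": (938, 644, 356, 62),
    "ontology + level + frequency": (937, 647, 353, 63),
}


def accuracy(tp, tn, fp, fn):
    """Share of correctly classified reviews, in percent."""
    return 100 * (tp + tn) / (tp + tn + fp + fn)


acc = {name: accuracy(*counts) for name, counts in confusions.items()}


def gain(baseline, improved):
    """Accuracy difference in percentage points, rounded to two decimals."""
    return round(acc[improved] - acc[baseline], 2)


for name, value in acc.items():
    print(f"{name:30s} {value:.2f}%")

print(gain("lexicon baseline", "ontology baseline"))
print(gain("ontology baseline", "ontology + level + frequency"))
print(gain("lexicon baseline", "ontology + level + frequency"))
```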

Table 19 compares the proposed approach with some state-of-the-art deep learning, machine learning, and aspect-based classifiers used for ASA. We selected approaches that use either the same sentiment lexicon [67] or, for the aspect-level methods, the same hotels domain dataset [66].

Table 19. A comparison with some state-of-the-art approaches for ASA.

Al-Sallab et al. [79] presented a Recursive Deep Learning Model for Opinion Mining in Arabic (AROMA). AROMA was tested on three Arabic datasets that varied in writing style and genre. On the second dataset, their method obtains an accuracy similar to that of our approach (79.2%). Baly et al. [80] presented another deep learning approach for opinion mining using Recursive Neural Tensor Networks (RNTN). Their method obtains a slightly higher accuracy than our approach, with a best value of 80%. The methods of Mataoui et al. [81] and Mohammad et al. [83] were based on aspect detection and extraction on hotel datasets. Compared with their experimental results, which were 74.39% and 76.42% accuracy, respectively, our proposed approach to sentiment analysis based on domain aspect detection outperformed the first method by 4.81 percentage points and the second method by 2.78 percentage points.

5. Conclusions

In this paper, we proposed a semantic orientation approach for ASA using ontology. It incorporates a weighting method for the importance of semantic domain features. The approach works at the feature level, using an ontology of the domain concepts to extract the semantic features. It combines two factors, the features’ levels in the ontology tree and the features’ frequencies in the dataset, to generate the overall semantic polarity of a review based on the importance of its domain features. The results of the experiment on the ontology with level and frequency importance approach demonstrated that using the frequency importance factor along with the level importance factor as an indication of domain feature importance improves on the performance of the lexicon baseline and ontology baseline approaches, with overall accuracy and f-measure values reaching 79.20% and 78.75%, respectively. The proposed approach is comparable with state-of-the-art methods for sentiment analysis in the Arabic language.

During this work, we faced several limitations, including the unavailability of a suitable Arabic ontology for the selected domain and the unavailability of adequate lexicons for the different Arabic dialects. Future work can be derived from these limitations: (1) using a fully automatic approach to extract the domain ontology from the available dataset; (2) building and using sentiment lexicons for the different dialects of the Arabic language, in addition to the lexicon used for standard Arabic; (3) building and using domain-specific sentiment lexicons for different domains.

Author Contributions

Conceptualization, S.M.K. and Q.A.A.-R.; methodology, S.M.K. and Q.A.A.-R.; software, S.M.K.; validation, S.M.K. and D.M.; formal analysis, D.M.; investigation, S.M.K. and D.M.; resources, D.M.; data curation, S.M.K. and D.M.; writing—original draft preparation, S.M.K. and D.M.; writing—review and editing, S.M.K. and D.M.; visualization, S.M.K. and D.M.; funding acquisition, D.M. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement