Multi-Attribute Online Decision-Making Driven by Opinion Mining

With the evolution of data mining systems, organizations increasingly demand timely insights from unstructured text. Existing opinion mining systems offer a variety of capabilities, such as ranking of product features and feature-level visualizations; however, organizations require decision-making support based on customer feedback. Therefore, this work proposes an opinion mining system that ranks reviews and features using novel ranking schemes and couples them tightly with an innovative opinion-strength-based feature-level visualization, enabling users to spot important product features and their ranking across a large number of reviews. Enhancements are made at different phases of the opinion mining pipeline, including new ways to evaluate review quality, rank product features, and visualize an opinion-strength-based feature-level summary. The target users of the proposed system are business analysts and customers who want to explore customer comments to inform business strategies and purchase decisions. Finally, the proposed system is evaluated on a real dataset, and a usability study is conducted for the proposed visualization. The results demonstrate that incorporating review and feature ranking can improve the decision-making process.


Introduction
Improvements in information and communication technologies break down geographical boundaries, allowing for faster connection and communication worldwide [1]. Further, the proliferation of social networks due to the ubiquity of Web 2.0 revolutionizes the way people present their opinions by providing social interactions [2,3]. As a result, consumers from all over the world are sharing their emotions, opinions, evaluations and judgments to a wide ranging audience by connecting themselves to online platforms such as blogs, newsgroups, discussion boards, and social networking sites [4][5][6]. Consequently, the Web consists of huge volumes of publicly available opinion data about different objects, for instance, individuals, government, products, events, organizations, services, education, news [7,8]. The volume of opinion data about different entities (individuals, products, events, organizations, services) is growing rapidly on these platforms due to the accessibility, scalability, and enhanced user participation of Web 2.0. The fast-growing opinion of review quality and feature ranking. A feature-based opinion summary with ample visualization may be more valuable than a summary showing only an average rating for features of a target product [41]. However, existing review quality evaluation methods are not integrated with feature ranking, opinion visualizations, and user preferences and they overlook a few parameters, such as visitor count and title information. Therefore, there is a need for an integrated system that ranks, analyzes, summarizes and visualizes these online reviews to fulfil the requirements of consumers and enterprises.
In light of the above discussion, the motivation of this work is to (i) remove low-quality reviews from feature ranking, as suggested by Ref. [47], (ii) enhance feature ranking by incorporating missing parameters, and (iii) improve existing opinion visualization to provide an opinion-strength-based summary. Therefore, this study aims to propose and develop a reputation system that provides users with a multi-level analysis and summarization of consumer opinions. Specifically, the objectives of this paper are to propose (i) a review ranking method incorporating vital parameters and user preferences, (ii) a feature ranking method based on indispensable parameters, and (iii) an opinion-strength-based visualization.
The main contributions include:
(a) A scheme for the selection of high-quality reviews that incorporates users' preferences.
(b) A feature ranking scheme based on multiple parameters for a deeper understanding of consumers' opinions.
(c) An opinion-strength-based visualization built on high-quality reviews to provide high-quality information for decision-making. The proposed visualization provides a multi-level view of consumers' opinions (ranging from −3 to +3) on critical product features at a glance, which allows entrepreneurs and consumers to identify decisive product features that have a key impact on sales, product choice, and adoption.
(d) An evaluation of the reputation system on a real dataset.
(e) A usability study evaluating the proposed visualization.
The rest of the paper is organized as follows. Section 2 presents existing work on review quality evaluation, feature ranking, and opinion visualizations. Section 3 presents the proposed system. Section 4 presents the results and discussion, and finally, Section 5 concludes the paper.

Related Work
Existing studies on review quality evaluation, feature ranking, and opinion visualizations are presented in this section.

Review Quality Evaluation and Review Ranking
Because of the massive volume of reviews, it is difficult for customers and enterprises to identify high-quality reviews that reflect the true quality of a target product. Existing studies of review quality evaluation have focused on features such as helpfulness votes, rating, review length, and term frequency [20,48,49]. In Ref. [30], five feature classes of a review were explored to predict its quality: lexical (unigrams and bigrams), structural (i.e., length, number of sentences), syntactic (nouns, adjectives, verbs), metadata (rating), and semantic (features and opinion words). Review length, user ratings, and term frequency were found to be significant in review quality prediction. Similarly, the experimental results of Ref. [29] highlighted shallow syntactic features such as verbs, nouns, and interjections as the strongest predictors of review quality. The authors of Ref. [50] utilized three more feature sets (reviewer history features, reviewer profile features, and readability features) to identify the helpfulness of a review. Their results demonstrated a correlation between these feature sets and the perceived helpfulness of reviews. Ref. [38] examined the helpfulness of a review from three different perspectives: the writing style of the review, the reviewer's expertise, and the timeliness of the review. The experimental results of Ref. [26] found two main features, review length and the number of product features, to be significant when ranking reviews. Ref. [51] proposed a review ranking scheme for book reviews based on the number of features: a score is assigned to each review on the basis of the number of features that appear in it, and reviews are then ranked according to the assigned score. This review ranking scheme outperformed the helpfulness-votes-based ranking scheme. Ref. [42] extended the previous scheme by incorporating the number of opinion words along with the number of features for review ranking. The extended scheme outperformed the term-frequency-based scheme.
In Ref. [20], more weight is given to the title of the review than to the body when computing the review ranking. According to the authors, the title of the review conveys the overall mood of the reviewer and an effective summary of the review. However, this work only considers opinion words from the title and ignores product features. The pioneering work that integrated review quality evaluation, feature ranking, and user preferences was presented by Shamim et al. [32], who proposed a review quality evaluation scheme based on user preferences and four other parameters: helpfulness ratio, review rating, number of features, and number of opinion words. In terms of user preferences, the users are allowed to (i) adjust the weight of each parameter and (ii) select reviews for feature ranking.
However, Ref. [52] ignored the title of the review in ranking reviews and features, and Ref. [26] ignored product features in the review title for ranking reviews. To address these limitations, the review ranking method proposed in the current study calculates separate weights for the title and body of the review, based on both the number of features and the number of opinion words expressed in the title and body, along with metadata features (review rating and helpfulness ratio). We enhanced the previous review ranking scheme of Ref. [52] by including the title score. Further, in contrast to Ref. [26], in which the authors considered rating for review ranking, our work considers multiple parameters: (i) feature frequency, (ii) number of opinion words, (iii) accumulated strength of associated opinion words, and (iv) title information of a review. The title information is included in the ranking because the title summarizes a review and presents the overall opinion of the reviewer [42]. Considering the significance of the title, the title score is included in the review ranking and is associated with the weight coefficient α.

Feature Ranking
The pioneering work on feature-based opinion mining was done by Hu and Liu [53], who mined and summarized customers' reviews by developing a system called feature-based summarization (FBS). This work aimed to identify product features and the opinion orientation for each feature, and to present a summary of the identified features and corresponding opinions in textual form. In this work, the authors utilized the classification based on associations (CBA) system, using an Apriori algorithm, to extract frequent explicit features. The adjective synonyms and antonyms of WordNet are utilized in FBS to identify the opinion orientation of opinion words. An average accuracy of 84% was achieved by FBS for opinion orientation.
In the literature, a variety of feature ranking schemes are available that rank features on the basis of feature frequency, opinion words, star rating, and semantic polarities. The most popular feature ranking approach ranks features based on the frequency of a feature [44,45]. For instance, Ref. [44] utilized feature frequency for feature ranking. Likewise, the feature-frequency-based PageRank algorithm was revised for product ranking [45], and this approach showed promising results. In Ref. [42], the authors utilized the number of opinion words associated with each feature to rank features. This ranking outperformed ranking based on feature frequency. The previous ranking approach was enhanced by Ref. [43] by integrating guidelines defined by the review website with the number of associated opinion words. The outcomes of this integration exhibited significant improvement in the accuracy of the existing system [53]. The ranking approach of Ref. [42] was also extended by Ref. [54], whose authors incorporated review rating with opinion words for feature ranking. The extended approach outperformed the frequency-based method. Correspondingly, the combination of review rating and opinion polarity resulted in higher precision than the frequency-based method [41]. Semantic polarity with feature frequency was also deployed in ranking features, and this method achieved 92% precision [55]. Ref. [52] provided two types of feature ranking, positive and negative, based on the semantic orientation and intensity (strength).
In the context of feature ranking, Refs. [44,45] utilized only feature frequency, Ref. [43] targeted feature frequency and opinion words, Ref. [52] ignored opinion and feature frequency, and Ref. [55] overlooked opinion strength and opinion frequency. Therefore, the current work proposes new methods for feature ranking based on five parameters: (i) title count, (ii) review count, (iii) accumulated opinion strength, (iv) feature frequency, and (v) opinion orientation count. These parameters are described in detail in Section 3. The current work contributes to the literature on feature ranking by providing four types of ranking based on novel ranking methods: ranking by weight, ranking by positive credence, ranking by negative credence, and overall ranking.

Opinion Visualizations
The existing literature highlights a variety of visualizations that have been utilized to show consumer opinions, including bar charts, radials, pie charts, graphs, and maps. Radial visualization was deployed in the opinion wheel and rose plot to present hotel customer feedback and sentiment contents from a large number of documents, respectively [24,56,57]. Graphs used for opinion visualization include coordinated graphs [57], line graphs and pie charts [58], positioning maps [59], comparative relation maps [16], and bar charts [52]. The contradictory comments on 'The Da Vinci Code' (a bestselling and controversial novel) were visualized using a coordinated graph [57]. The positioning map [59], comparative relation map [16], and bar chart [60] provide competitive intelligence by comparing competing products based on key features.
A scalable method, the 'visual summary report', for comparing several products and features at a glance was proposed in Ref. [61]. The glowing bars [62] and bars with different shapes [31] present a visual analysis of Really Simple Syndication (RSS) news feeds. The treemap in Ref. [39] presents a summary of car reviews in which prominent keywords are rendered as boxes. The size of a box indicates the number of sentences in which a specific keyword has appeared. The color of a box encodes the average opinion on a keyword, ranging from red (negative opinion) to green (positive opinion). The treemap provides multi-dimensional information, such as the most common keywords, the average semantic orientation associated with keywords, and the most positive and negative keywords.
The treemap [39] and bar chart [52] are unable to present opinion strength (ranging from −3 to +3) on each feature of a target product. Therefore, the treemap is enhanced in this work to present opinion strength on each feature of a target product. We selected the treemap visualization based on the findings of a usability study with 146 participants, performed in our previous work to identify a suitable opinion visualization [63]. This work contributes to the opinion visualization literature by proposing an opinion-strength-based visualization that provides a multi-dimensional view of consumers' opinions by displaying the comparison of positive and negative opinions at each level of opinion strength (from +3 to −3) together with the significance of a feature.

Theoretical Framework
Let a document $D$ of product reviews contain $n$ reviews, $R = [r_1, r_2, r_3, \ldots, r_n]$. Every review $r_k$ comprises a set of feature-opinion pairs, each consisting of a feature $f_j$ and an opinion word $OPW_o$. A feature $f_j$ may pair with more than one opinion word in a single review or over the set of $n$ reviews. In the proposed system, each review $r_k$ is represented by a tuple (termed the review tuple) of two elements, $MD_{r_k}$ and $B_{r_k}$:

$$r_k = \langle MD_{r_k},\ B_{r_k} \rangle$$
$$MD_{r_k} = [\,MD_{r_k}\mathit{HR},\ MD_{r_k}\mathit{Rating},\ MD_{r_k}\mathit{Title}\,]$$
$$MD_{r_k}\mathit{Title} = [\,MD_{r_k}\mathit{TitleF},\ MD_{r_k}\mathit{TitleOPW}\,]$$
$$B_{r_k} = \{s_1, s_2, s_3, \ldots, s_p\}, \qquad \forall s_i \in B_{r_k}:\ s_i = [\,B_{r_k}s_i f_j,\ B_{r_k}s_i f_j\mathit{SP},\ B_{r_k}s_i f_j\mathit{OS},\ B_{r_k}s_i\mathit{Content}\,]$$

Here, $MD$ represents the metadata of a review, and $B$ represents the set of sentences in the body of the review. Table 1 describes the abbreviations used. Each sentence $s_i$ in $B_{r_k}$ is represented by a proposed tuple, which extends the tuples presented in Refs. [40,58]. As shown in Figure 1, each sentence $s_i$ contains a single feature $f_j$. The opinion related to the feature $f_j$ in a sentence $s_i$ can be positive ($OSP_{POS}$) or negative ($OSP_{NEG}$). Opinion polarity is estimated in the range of −3 to −1 for negative opinions and +1 to +3 for positive opinions (three for the strongest and one for the weakest). An opinion with positive semantic polarity can have an opinion strength of strong positive ($OS_{POS\_S}$), mildly positive ($OS_{POS\_M}$), or weak positive ($OS_{POS\_W}$) [52,64,65]. Similarly, an opinion with negative semantic polarity can have an opinion strength of strong negative ($OS_{NEG\_S}$), mildly negative ($OS_{NEG\_M}$), or weak negative ($OS_{NEG\_W}$) [52,64,65]. A feature tuple, which is part of the review tuple, is also proposed in this work.

Consider the review shown in Figure 2. The helpfulness ratio ($MD_{r_k}\mathit{HR}$) of this review is 75 (3/4 × 100), with a 5-star rating. The title of the review indicates a positive opinion: the opinion word 'great' in the title has an opinion strength of strong positive ($OS_{POS\_S}$), which is associated with the weight +3 ($W\_OS_{POS\_S}$). The review presents opinions on the battery, picture quality, and viewfinder features of a camera. The 'battery', 'picture quality', and 'viewfinder' features are described by the opinion words good, poor, and very good, respectively. The opinion word good is a positive word with weak positive ($OS_{POS\_W}$) opinion strength, which is associated with the weight +1 ($W\_OS_{POS\_W}$). However, the opinion word poor is a negative word with $OS_{NEG\_W}$ strength (associated with the weight −1). The semantic orientation of the opinion word very good is positive, with an opinion strength of $OS_{POS\_S}$ (where $W\_OS_{POS\_S} = 3$). The tuple of the review presented in Figure 2 is shown in Figure 3.

Consider the reviews shown in Figure 4, which contain opinions on three different features (picture quality, battery, viewfinder) of a digital camera. The resultant feature tuple of the battery feature is presented in Figure 5. The battery feature is mentioned three times across the reviews; therefore, its weight is three. Three opinion words (good, poor, disappointing) are associated with the battery feature. These opinion words are weakly positive, weakly negative, and mildly negative, with corresponding values of +1, −1, and −2, respectively. One positive opinion word with an opinion strength of +1 is associated with the battery feature, and hence the $OSP_{POS}$ value of the battery is +1, while two negative words with opinion strengths of −1 and −2 are associated with it, so the $OSP_{NEG}$ value of the feature equals −3 (−2 + −1).
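The review tuple and the battery example can be sketched in code. The following is a minimal illustration, not the authors' implementation: the class names, field names, and the three sample reviews are hypothetical stand-ins (the actual reviews of Figure 4 are not reproduced here), and the +2 weight for mildly positive is inferred from the stated −3 to +3 range.

```python
from dataclasses import dataclass

# Opinion-strength weights on the -3..+3 scale described in the text.
# POS_M = +2 is inferred from the range; the other values are stated.
STRENGTH = {"POS_S": 3, "POS_M": 2, "POS_W": 1,
            "NEG_W": -1, "NEG_M": -2, "NEG_S": -3}

@dataclass
class Sentence:
    feature: str        # f_j, the single feature discussed in the sentence
    opinion_word: str   # OPW_o paired with the feature
    strength: str       # key into STRENGTH

@dataclass
class Review:
    helpful_votes: int
    total_votes: int
    rating: int
    body: list          # list of Sentence objects (B_rk)

    @property
    def helpfulness_ratio(self):
        """Helpfulness ratio as a percentage, e.g. 3 of 4 votes -> 75.0."""
        if self.total_votes == 0:
            return 0.0
        return 100 * self.helpful_votes / self.total_votes

def feature_tuple(feature, reviews):
    """Aggregate mention count, opinion words, and OSP sums for one feature."""
    weight, words, osp_pos, osp_neg = 0, [], 0, 0
    for review in reviews:
        for s in review.body:
            if s.feature != feature:
                continue
            weight += 1
            words.append(s.opinion_word)
            value = STRENGTH[s.strength]
            if value > 0:
                osp_pos += value
            else:
                osp_neg += value
    return {"feature": feature, "weight": weight, "opinion_words": words,
            "osp_pos": osp_pos, "osp_neg": osp_neg}

# Hypothetical reviews mirroring the battery example in the text.
reviews = [
    Review(3, 4, 5, [Sentence("battery", "good", "POS_W")]),
    Review(1, 2, 2, [Sentence("battery", "poor", "NEG_W")]),
    Review(0, 1, 1, [Sentence("battery", "disappointing", "NEG_M")]),
]
battery = feature_tuple("battery", reviews)
```

With these inputs, `battery["weight"]` is 3, `battery["osp_pos"]` is 1, and `battery["osp_neg"]` is −3, matching the values worked out above.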

Architecture of the System
The proposed system consists of five components: pre-processor, feature and opinion extractor, review ranker, feature ranker, and opinion visualizer (see Figure 6). This architecture is based on a previous study [52].

Pre-Processor
The pre-processor prepares a document containing reviews for review and feature ranking. It performs a variety of processes, including conversion of review text to lower case, removal of non-alphabetic characters, tokenization, stop word filtering, spell checking, word stemming, and part-of-speech (POS) tagging. Firstly, the text of the document is transformed into lower case. Secondly, stop words are eliminated from the document by applying a stop word filter that uses a defined list of stop words. Thirdly, word stemming is performed to convert derivational and inflectional forms of words into a base form. After that, noise is removed from the document by spell checking. Then, POS tagging is employed to assign a POS category to each word in the document. Finally, tokenization returns a list of words.
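A simplified version of this pipeline can be sketched with the Python standard library alone. This is an illustrative sketch under several assumptions: the stop word list is a tiny hypothetical sample, `naive_stem` is a crude suffix stripper standing in for a real stemmer, and spell checking and POS tagging are omitted because they require external resources. The sketch also tokenizes before filtering, since the filter operates on individual words.

```python
import re

# Tiny illustrative stop word list (an assumption, not the paper's list).
STOP_WORDS = {"the", "a", "an", "is", "are", "was", "and", "of", "to", "it"}

def naive_stem(word):
    """Crude suffix stripping; a stand-in for a real stemming algorithm."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def preprocess(text):
    """Lower-case, drop non-alphabetic characters, tokenize, filter, stem."""
    text = text.lower()
    text = re.sub(r"[^a-z\s]", " ", text)   # remove non-alphabetic characters
    tokens = text.split()                   # whitespace tokenization
    tokens = [t for t in tokens if t not in STOP_WORDS]
    return [naive_stem(t) for t in tokens]

tokens = preprocess("The battery lasted long, and the pictures are great!")
```

Here `tokens` evaluates to `['battery', 'last', 'long', 'picture', 'great']`.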

Feature and Opinion Extractor
The feature and opinion extractor extracts candidate features along with opinion words to generate a list of potential features. An existing study revealed that 60-70% of product features are represented by frequent explicit nouns [55]. The current study considers frequent nouns or noun phrases as candidate features, based on the findings of existing studies [42,52,53,58]. Suppose there are q nouns (i.e., features) (n 1 , n 2 , n 3 , . . ., n q ) extracted from all the review tuples and stored in a list. A window-based approach [41] is then utilized to extract the opinion words associated with a particular feature: opinion words that occur within K words of the feature are selected as its associated opinion words. In contrast to existing studies [42-44,52], which extract nouns based only on feature frequency, we utilized three parameters to extract prominent features from a review document.
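The window-based association step can be illustrated as follows. This is a minimal sketch, assuming the review is already tokenized and that opinion words come from a given lexicon; the function name `opinion_words_near` and the sample lexicon are hypothetical.

```python
def opinion_words_near(tokens, feature, opinion_lexicon, k=3):
    """Return opinion words found within k tokens of each feature mention."""
    found = []
    for i, token in enumerate(tokens):
        if token != feature:
            continue
        lo = max(0, i - k)                   # left edge of the window
        hi = min(len(tokens), i + k + 1)     # right edge (exclusive)
        for j in range(lo, hi):
            if j != i and tokens[j] in opinion_lexicon:
                found.append(tokens[j])
    return found

tokens = "the battery is good but the viewfinder is poor".split()
lexicon = {"good", "poor"}                   # illustrative opinion lexicon
battery_words = opinion_words_near(tokens, "battery", lexicon, k=2)
viewfinder_words = opinion_words_near(tokens, "viewfinder", lexicon, k=2)
```

With K = 2, `battery_words` is `['good']` and `viewfinder_words` is `['poor']`. A larger K captures more distant opinion words at the cost of attaching words that belong to a neighboring feature.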
The weight of noun $n_j$ ($n_j\mathit{weight}$) is calculated based on the assumption that nouns that are frequently discussed in a large number of high-quality reviews, associated with several opinion words, and mentioned in a considerable number of review titles are significant product features. To identify potential features and associated opinion words from the list of nouns, an algorithm called feature and opinion extractor is proposed and presented in Figure 7. The inputs to the algorithm are the list of nouns (NounsList[]) extracted from the review tuples (stored in the list Reviews[]) and the review document. NounsList[] is an adjacency list [10] that can store the opinion words associated with a noun. Associated opinion words for feature $n_j$ are searched for in all reviews, and opinion words that are within a distance of K words from the selected noun are added to NounsList[]. The following five equations are used to calculate $n_j\mathit{weight}$. In Equation (1), the frequency of noun $n_j$ ($n_j\mathit{freq}$) is calculated based on its occurrence in the review document consisting of m reviews. In other words, $n_j\mathit{freq}$ in Equation (1) represents the number of occurrences of $n_j$ in the reviews.
The number of opinion words associated with noun $n_j$ in the whole review document, $n_j\mathit{OPWCount}$, is calculated in Equation (2).
The number of review titles in which noun $n_j$ appears is given by Equation (3), where $\mathit{TitleCountwith}\,n_j$ is the number of titles in which the noun $n_j$ is discussed. In Equation (3), the bracketed value is 1 if the condition holds and 0 otherwise (using the Iverson notation [10]). That is, the condition $[n_j\mathit{ExistinReviewTitle}(r_k) = \mathit{True}]$ in Equation (3) returns 1 if $n_j$ exists in the title of review $r_k$.
Equation (4) computes the number of reviews in which noun $n_j$ appears ($\mathit{ReviewCountwith}\,n_j$).
Equation (5) shows the calculation of the feature weight. The values of $n_j\mathit{freq}$, $n_j\mathit{OPWCount}$, $\mathit{TitleCountwith}\,n_j$, and $\mathit{ReviewCountwith}\,n_j$ calculated in Equations (1)-(4) are utilized in Equation (5) to calculate the weight of a feature. Therefore, our proposed method to calculate the weight of a noun is based on four parameters: (i) the frequency of noun $n_j$, (ii) the number of opinion words associated with noun $n_j$, (iii) the number of times noun $n_j$ appears in review titles, and (iv) the number of reviews in which $n_j$ appears ($\mathit{ReviewCountwith}\,n_j$). These parameters are calculated for each noun in Lines 8-17 of the algorithm.
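Equations (1)-(4) can be written out explicitly from the definitions above. The following is a hedged reconstruction rather than the authors' exact notation: $\mathrm{count}(n_j, r_k)$ (the number of occurrences of $n_j$ in review $r_k$), $\mathrm{OPW}(n_j, r_k)$ (the set of opinion words within K words of $n_j$ in $r_k$), and the predicate $n_j\mathit{ExistinReview}(r_k)$ are assumed helper notations, and square brackets denote the Iverson bracket.

```latex
\begin{align}
n_j\mathit{freq} &= \sum_{k=1}^{m} \mathrm{count}(n_j, r_k) \tag{1}\\
n_j\mathit{OPWCount} &= \sum_{k=1}^{m} \bigl|\mathrm{OPW}(n_j, r_k)\bigr| \tag{2}\\
\mathit{TitleCountwith}\,n_j &= \sum_{k=1}^{m} \bigl[\, n_j\mathit{ExistinReviewTitle}(r_k) = \mathit{True} \,\bigr] \tag{3}\\
\mathit{ReviewCountwith}\,n_j &= \sum_{k=1}^{m} \bigl[\, n_j\mathit{ExistinReview}(r_k) = \mathit{True} \,\bigr] \tag{4}
\end{align}
```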
$$n_j\mathit{weight} = n_j\mathit{freq} + n_j\mathit{OPWCount} + \mathit{TitleCountwith}\,n_j + \mathit{ReviewCountwith}\,n_j \tag{5}$$

The noun frequency $n_j\mathit{freq}$, $n_j\mathit{OPWCount}$, and $\mathit{TitleCountwith}\,n_j$ are then summed up to compute the score of the noun, called $n_j\mathit{FinalScore}$ in the algorithm (Lines 20-21). This $n_j\mathit{FinalScore}$ is used to filter the nouns. Nouns having an $n_j\mathit{FinalScore}$ above a threshold β are selected as potential features. After this, FeatureOpinionList[], containing the selected (most frequent) features with their associated opinion words, is built.
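The weighting and filtering steps can be sketched as below. This is an illustration under stated assumptions, not the algorithm of Figure 7: reviews are assumed to be dictionaries with tokenized `title` and `body` fields, frequency and review count are computed over the body only, the score is the four-term sum of Equation (5), and the helper names and sample data are hypothetical.

```python
def count_nearby(tokens, noun, opinion_lexicon, k):
    """Count opinion words within k tokens of each mention of the noun."""
    total = 0
    for i, token in enumerate(tokens):
        if token == noun:
            window = tokens[max(0, i - k): i] + tokens[i + 1: i + k + 1]
            total += sum(1 for w in window if w in opinion_lexicon)
    return total

def noun_weights(reviews, nouns, opinion_lexicon, k=2):
    """Compute the Equation (5) style weight for every candidate noun."""
    weights = {}
    for noun in nouns:
        freq = sum(r["body"].count(noun) for r in reviews)
        opw_count = sum(count_nearby(r["body"], noun, opinion_lexicon, k)
                        for r in reviews)
        title_count = sum(1 for r in reviews if noun in r["title"])
        review_count = sum(1 for r in reviews if noun in r["body"])
        weights[noun] = freq + opw_count + title_count + review_count
    return weights

def select_features(weights, beta):
    """Keep nouns whose weight exceeds the threshold beta."""
    return [noun for noun, w in weights.items() if w > beta]

# Hypothetical two-review document.
reviews = [
    {"title": ["great", "battery"], "body": "the battery is good".split()},
    {"title": ["poor", "camera"],
     "body": "battery is poor and the strap is ok".split()},
]
weights = noun_weights(reviews, ["battery", "strap"], {"good", "poor", "great"})
features = select_features(weights, beta=3)
```

For this input, `weights` is `{'battery': 7, 'strap': 2}` and `features` is `['battery']`: the battery is mentioned twice, has two nearby opinion words, appears in one title, and occurs in two reviews.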

Figure 7. Algorithm for identifying potential features with associated opinion words.

Review Ranker
The job of the review ranker is to calculate the rank of reviews stored in the review document. To calculate the rank of a review, first, the weight of each review is calculated based on five parameters. A user can define the contribution of these parameters by assigning them a weight. After assigning the weight to the reviews, the reviews are classified into five classes: (a) excellent, (b) good, (c) average, (d) fair, and (e) poor according to their weights.
To compute the class of each review stored in the review document, a ReviewRanking algorithm is proposed and presented in Figure 8. The core of the algorithm is to assign weights to reviews. The parameters used to compute the weight of a review tuple are: (i) title score (TitleScore), (ii) number of features in the review body (B r k Fcount), (iii) number of opinion words in the review body (B r k OPWcount), (iv) helpfulness ratio (MD r k HR), and (v) users' rating MD r k rating. The Title score depicts the sum of the number of feature and opinion words in the review title. Firstly, for each review tuple, the algorithm computes the number of features (MD r k TitleFcount) and opinion words (MD r k TitleOPWcount) appearing in the review title and then these computations are used to calculated title score (TitleScore) (Line 4-12). In other words, the Title score represents the sum of MD r k TitleFcount, and MD r k TitleOPWcount Moreover, for each review tuple the number of opinion words (B r k OPWcount) and features (B r k Fcount) appearing in the body are calculated (Line 14-21). Weight of each review (r k Weight) is computed based on the values of the parameters (B r k Fcount, MD r k rating, MD r k HR, B r k OPWcount, TitleScore). Users' preferences are incorporated in review ranking by defining the weight of each parameter (W_UP 1 , W_UP 2 , W_UP 3 , W_UP 4 , and W_UP 5 ) by users. User preferences weights (W_UP 1 , W_UP 2 , W UP 3 , W_UP 4 , and W_UP 5 ) and the weights assigned to TitleScore (α) are presented in (Line 23-25). Title weight coefficient (α) can be adjusted depending on the size and nature of experimental data. We set the value of α to 10 based on the conclusion of Ref. [42]. Maximum weight (MaxWeightReview) among the m reviews is computed in Line 27. After calculating the weights of reviews, class of each review is calculated (based on review own weight and maximum weight (MaxWeightReview) among all reviews weights). 
Based on r_k Weight and MaxWeightReview, r_k is classified into one of the following review classes: (i) Excellent, (ii) Good, (iii) Average, (iv) Fair, and (v) Poor. We adopted these five review classes from Ref. [52] to depict the quality of each review in the review document and to distinguish high-quality reviews (HQ_reviews) from low-quality reviews in order to improve feature ranking. The presented scheme requires the user to decide which classes to select. The reviews that belong to the selected classes (termed high-quality reviews) are considered for feature ranking and opinion summary. For example, if the user selects the classes Excellent and Good, then all reviews with r_k Class equal to Excellent or Good are declared high-quality reviews.
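The classification and selection step can be illustrated with a minimal Python sketch. The text states only that the class depends on a review's own weight and MaxWeightReview; the equal-width cut-offs on the ratio weight / MaxWeightReview used below, and the names CLASS_BANDS, classify_review, and select_high_quality, are assumptions made for illustration and are not taken from Figure 8.

```python
# Hypothetical cut-offs on weight / MaxWeightReview (assumption: equal-width bands).
CLASS_BANDS = [(0.8, "Excellent"), (0.6, "Good"), (0.4, "Average"), (0.2, "Fair")]


def classify_review(weight, max_weight):
    """Map a review weight to one of the five quality classes."""
    ratio = weight / max_weight if max_weight > 0 else 0.0
    for lower_bound, label in CLASS_BANDS:
        if ratio > lower_bound:
            return label
    return "Poor"


def select_high_quality(weights, selected_classes):
    """Return the indices of reviews whose class was selected by the user."""
    max_weight = max(weights) if weights else 0.0
    return [
        index
        for index, weight in enumerate(weights)
        if classify_review(weight, max_weight) in selected_classes
    ]


# Illustrative review weights (made-up values).
weights = [25.4, 18.0, 9.5, 3.1, 22.0]
hq = select_high_quality(weights, {"Excellent", "Good"})
print(hq)
```

With these illustrative weights, the first, second, and fifth reviews fall into the selected classes and would be passed on to feature ranking.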
Consider the review shown in Figure 9 as an example. In this review, features are highlighted in red while opinion words are highlighted in green.
The TitleScore of the review is two, as the title of the review comprises one feature (Picture) and one opinion word (Brilliant). There are four features in the body of the review (picture quality, viewfinder, zoom, battery); as a result, the B_{r_k}Fcount of the review is four. Four opinion words (excellent, poor, good, and fantastic) are expressed in the review, so the B_{r_k}OPWcount of the review is four. Assuming a value of 0.20 for all preferences (W_UP_1, W_UP_2, W_UP_3, W_UP_4, and W_UP_5), substituting these values into the r_k Weight computation for the review r_k shown in Figure 9 yields 25.4.
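To make the role of the five parameters concrete, the following Python sketch computes a review weight as a preference-weighted sum. The linear form, the pairing of each W_UP_i with a particular parameter, and the function name review_weight are assumptions; the exact expression is given in Figure 8 and is not reproduced in the text. The helpfulness ratio and rating of the review in Figure 9 are not stated above, so they are set to zero here and the result is not expected to reproduce 25.4.

```python
def review_weight(title_fcount, title_opwcount, body_fcount, body_opwcount,
                  helpfulness_ratio, rating, prefs=(0.2,) * 5, alpha=10):
    """Preference-weighted sum of the five review parameters (assumed form)."""
    title_score = title_fcount + title_opwcount  # TitleScore as defined in the text
    w1, w2, w3, w4, w5 = prefs                   # W_UP_1 .. W_UP_5 (pairing assumed)
    return (w1 * alpha * title_score
            + w2 * body_fcount
            + w3 * body_opwcount
            + w4 * helpfulness_ratio
            + w5 * rating)


# Counts from the Figure 9 example; helpfulness ratio and rating are placeholders.
partial_weight = review_weight(
    title_fcount=1, title_opwcount=1,
    body_fcount=4, body_opwcount=4,
    helpfulness_ratio=0.0, rating=0.0,
)
print(partial_weight)  # title, feature, and opinion-word contributions only, about 5.6
```

Under these assumptions the title contributes 4.0 of the partial weight, which shows how strongly α = 10 emphasizes the review title relative to the body counts.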

Feature Ranker
After discarding low-quality reviews (with Class r_k below a certain threshold θ specified by the user) among n reviews, we are left with m high-quality reviews (HQ-Reviews[]). The feature ranker ranks the extracted features (FeatureOpinionList[]) using the high-quality reviews provided by the review ranker. In contrast to Ref. [52], we enhance the feature ranking by incorporating opinion and feature frequency along with opinion strength and orientation. The proposed feature ranker computes four rankings for every feature f_j based on the information presented in high-quality reviews: (i) feature weight (f_j weight), given in Equation (10); (ii) positive credence (f_j POSCred), given in Equation (12); (iii) negative credence (f_j NEGCred), given in Equation (14); and (iv) overall credence (f_j Rank), given in Equation (15). An algorithm to calculate these rankings of a feature is proposed and presented in Figure 10.
The f_j weight is calculated based on the idea that features that are frequently discussed in a large number of high-quality reviews, are associated with many opinion words, and appear in a substantial number of review titles are decisive product features. Therefore, the value of f_j weight is calculated using four parameters: (i) the count of occurrences of f_j in high-quality reviews (f_j freq), (ii) the number of opinion words associated with f_j (f_j OPWCount), (iii) the number of reviews that discuss the feature in the title or body (ReviewCountwith f_j), and (iv) the number of review titles that contain the feature (TitleCountwith f_j). These parameters are computed using Equations (6)-(9), respectively, as shown in Lines 3-10 of the algorithm. Moreover, TitleCountwith f_j and ReviewCountwith f_j are exploited in calculating the weight of a feature f_j on the grounds that a feature discussed in many reviews and titles is significant. The calculation of f_j weight is given in Equation (10).
$$f_j\,\mathit{weight} = f_j\,\mathit{freq} + f_j\,\mathit{OPWCount} + \mathit{ReviewCountwith}\,f_j + \mathit{TitleCountwith}\,f_j \qquad (10)$$
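Equation (10) can be sketched in Python as follows. The review representation (dictionaries with the keys title_features, body_features, and opinions) and the function name feature_weight are hypothetical, and counting title occurrences towards f_j freq is an assumption, since Equations (6)-(9) are not reproduced in the text.

```python
def feature_weight(feature, hq_reviews):
    """Sum of the four counts of Equation (10) over the high-quality reviews."""
    freq = opw_count = review_count = title_count = 0
    for review in hq_reviews:
        in_title = review["title_features"].count(feature)
        in_body = review["body_features"].count(feature)
        freq += in_title + in_body                    # f_j freq (assumed: title + body)
        opw_count += sum(1 for f, _ in review["opinions"] if f == feature)  # f_j OPWCount
        review_count += 1 if (in_title or in_body) else 0  # ReviewCountwith f_j
        title_count += 1 if in_title else 0                # TitleCountwith f_j
    return freq + opw_count + review_count + title_count


# Two illustrative high-quality reviews (made-up data).
hq_reviews = [
    {"title_features": ["picture"],
     "body_features": ["picture", "zoom"],
     "opinions": [("picture", "brilliant"), ("zoom", "good")]},
    {"title_features": [],
     "body_features": ["picture", "battery"],
     "opinions": [("picture", "poor"), ("battery", "fantastic")]},
]
picture_weight = feature_weight("picture", hq_reviews)
zoom_weight = feature_weight("zoom", hq_reviews)
print(picture_weight, zoom_weight)
```

In this toy data, "picture" is mentioned in both reviews and in one title, so it receives a larger weight than "zoom", which matches the intuition stated above.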
Equation (11) gives the count of positive opinions for the feature f_j in all sentences of the m reviews. In Equation (11), the condition returns 1 if the opinion on f_j is positive; otherwise it returns zero. In Equation (12), the term with the condition s_k f_j OS = OS_{POS_M} accumulates the opinion strength OS_{POS_M} once for each time it appears in the sentences of all m reviews. The positive credence (f_j POSCred) of a feature f_j in Equation (12) reflects the number of positive opinion words used to describe the feature f_j and the accumulated strength of the associated positive opinion words. A larger value of f_j POSCred denotes that the feature f_j was discussed more positively and more often.
$$f_j\,\mathit{POSCred} = \ldots\; W\_OS_{POS\_W} \ast \left( s_k f_j OS = OS_{POS\_W} \right) \qquad (12)$$
Equation (13) depicts the total number of occurrences of negative opinions of a feature f_j in the body of the m high-quality reviews. f_j NEGCred in Equation (14) reflects the number of negative opinion words used to describe a feature and the total strength of these negative opinion words. The idea behind f_j NEGCred is that the rank of a feature f_j should be higher than other features if f_j is associated with more negative words. A high value of f_j NEGCred indicates that the feature is discussed negatively by a large number of users.
$$f_j\,\mathit{NEGCred} = \ldots\; W\_OS_{NEG\_W} \ast \left( s_k f_j OS = OS_{NEG\_W} \right) \qquad (14)$$
Count f_j OSP_NEG is subtracted from f_j POSCred of a feature f_j to obtain an overall rank (f_j Rank), as shown in Equation (15).
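The credence idea, a count of opinions plus their accumulated strength, can be sketched in Python as below. This is one reading of Equations (12) and (14) only; the strength weights in STRENGTH_WEIGHT are hypothetical values (the values of W_OS are not given in the text), and the function name credence and the (orientation, strength) tuple representation are assumptions. The overall rank of Equation (15) is deliberately left out.

```python
# Hypothetical strength weights; the source does not state the values of W_OS.
STRENGTH_WEIGHT = {"strong": 3, "mild": 2, "weak": 1}


def credence(opinions, orientation):
    """Count of opinions with the given orientation plus their summed strength.

    opinions: list of (orientation, strength) pairs collected for one feature
    from the sentences of the high-quality reviews.
    """
    matching = [strength for o, strength in opinions if o == orientation]
    count = len(matching)
    strength_sum = sum(STRENGTH_WEIGHT[strength] for strength in matching)
    return count + strength_sum


# Illustrative opinions on a single feature (made-up data).
opinions_on_zoom = [("pos", "strong"), ("pos", "weak"), ("neg", "mild")]
pos_cred = credence(opinions_on_zoom, "pos")  # 2 opinions + strengths 3 + 1
neg_cred = credence(opinions_on_zoom, "neg")  # 1 opinion + strength 2
print(pos_cred, neg_cred)
```

Because strength is added on top of the raw count, a feature praised in a few strongly worded sentences can outrank one mentioned more often but only weakly.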

Opinion Visualizer
An extensive literature review followed by a usability study with 146 participants was performed by the authors in their previous work to identify a suitable opinion visualization [63]. In Ref. [63], a questionnaire survey was performed to get feedback from users about existing opinion visualizations. Users' preferred visualization (tree map [39]) is adapted for the current study based on the findings of a previous study. The proposed visualization provides a multi-dimensional view of consumer opinions. The proposed visualization is discussed in the result section.

Dataset
Python 2.7 with the Natural Language Toolkit (NLTK) was used to implement the proposed system. For the evaluation of the proposed system, experiments were performed on a real dataset (from amazon.com) that was also utilized by Refs. [52,53,64,66,67]. The dataset contains user reviews of five digital devices, as shown in Table 2. The evaluation of the proposed system was performed by computing the accuracy of review quality evaluation, f_j POSCred, f_j NEGCred, and f_j Rank.
The manually calculated class (actual class) is compared with the system-generated class (extracted class) to calculate the accuracy of review classification, as defined in Equation (16). The accuracy reveals how accurate the proposed review ranking scheme is in calculating the review quality class. Correspondingly, the actual values of f_j POSCred, f_j NEGCred, and f_j Rank are compared with the extracted values to find the accuracy of the proposed feature ranking scheme. An example of the accuracy calculation of f_j POSCred, f_j NEGCred, and f_j Rank is given in Table 3.
Table 3. Calculation of accuracy.
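The class-level comparison described above can be illustrated with a short Python sketch. It assumes that accuracy is the percentage of reviews whose extracted class equals the manually assigned class; this is a common definition and is not necessarily the exact form used in the paper. The function name classification_accuracy and the example labels are hypothetical.

```python
def classification_accuracy(actual, extracted):
    """Percentage of reviews whose extracted class matches the actual class."""
    if len(actual) != len(extracted):
        raise ValueError("actual and extracted must have the same length")
    if not actual:
        return 0.0
    matches = sum(a == e for a, e in zip(actual, extracted))
    return 100.0 * matches / len(actual)


# Illustrative labels for five reviews (made-up data).
actual = ["Good", "Good", "Poor", "Excellent", "Fair"]
extracted = ["Good", "Average", "Poor", "Excellent", "Fair"]
accuracy = classification_accuracy(actual, extracted)
print(accuracy)
```

In this example four of the five extracted classes agree with the manual labels.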

Review Quality Classification
The classification of the review quality of 'Digital Camera 1' shows reviews of mixed quality (Table 4). The majority of the reviews are classified as good, presenting sufficient opinions on Digital Camera 1. It is interesting to note that only a few reviews are labelled as excellent. Furthermore, 64% of the reviews belong to the 'good' and 'average' classes, delivering ample opinions on different features of Digital Camera 1. Only 14 reviews out of 45 (31%) were found to be 'fair' or 'poor'. The review quality classes of 'DVD Player' are also illustrated in Table 4. Forty-four percent of the reviews were collectively classified as 'Excellent' and 'Good'. However, many reviews belong to the poor class, showing the low quality of these reviews. Notably, in Table 4, 58% of reviews fall into the top three classes (Excellent, Good, Average) of review quality. The average accuracy of the review classification for all five products is presented in Figure 11. The system achieved greater than 80% accuracy for every product, with an average accuracy of 85% across all products.

Figure 11. Accuracy of Reviews Quality Classification.

Feature Ranking
This section reports the f_j POSCred, f_j NEGCred, and f_j Rank of the data files along with the accuracy achieved by the system in Table 5. The f_j POSCred, f_j NEGCred, and f_j Rank of each feature f_j were computed using Equations (12), (14), and (15), respectively, given in Section 3.2.4. The accuracy of the feature ranking scheme was calculated using Equation (16) given in Section 4.1. Due to word limits, only the results of DVD Player are presented here.
The top ten features of 'DVD Player' are highlighted in Table 5 according to the positive credence (f_j POSCred). The feature 'Player' received the highest f_j POSCred, indicating its appreciation by many users. The next three features ('Play', 'Price', 'Feature') show users' endorsement with positive ranks of 31, 28, and 23, respectively. The features 'Apex', 'Picture', 'Work', 'Product', and 'Unit' are also acknowledged positively by some users. The accuracy of the top ten features of DVD Player according to f_j POSCred is shown in Table 5. The accuracy of the features 'Product', 'Unit', 'Service', and 'Feature' was found to be 100%. Moreover, another four features, 'Player', 'Play', 'Apex', and 'Work', achieved accuracy of 86%, 90%, 92%, and 87.5%, respectively. Further, an accuracy of only about 60% was obtained for the feature 'Price', resulting in an average accuracy of 90%.
The top ten features of DVD Player according to f_j NEGCred (negative credence) are also shown in Table 5. The feature 'Player' received an f_j NEGCred of 196, indicating its inadequacy. Users also disapproved of the 'Play', 'Picture', 'Apex', and 'Quality' features of DVD Player, as indicated by their larger f_j NEGCred. The features 'Video', 'Unit', 'Disc', 'Button', and 'Product' were also discussed negatively by some users. The accuracy of the top ten features of DVD Player according to f_j NEGCred is shown in Table 5. Three features ('Apex', 'Button', 'Unit') achieved 100% accuracy. The 'Player', 'Play', and 'Product' features showed more than 80% accuracy, resulting in an overall accuracy of 81%. However, the accuracy of one feature, 'Quality', is only 58%.
The top ten features of DVD Player according to f_j Rank are highlighted in Table 5. The top four features, namely 'Feature', 'Price', 'Work', and 'Product', have a positive f_j Rank, describing users' satisfaction with these features. Conversely, the negative f_j Rank scores of the features 'Unit', 'Service', 'Play', 'Button', 'Disc', and 'Apex' illustrate users' dissatisfaction. The accuracy of DVD Player's top ten features according to f_j Rank is shown in Table 5, where four features achieved 100% accuracy ('Feature', 'Unit', 'Service', 'Button'). However, the average accuracy of the system was found to be 81%.
Table 5. Top features of DVD Player by positive credence, negative credence, and overall rank.
Feature   f_j POSCred   Accuracy (%)   Feature   f_j NEGCred   Accuracy (%)   Feature   f_j Rank   Accuracy (%)
Player    144           87             Player    196           91             Feature   23         100
Play      31            90             Play      35            81             Price     17         61
Price     28            61             Picture   27            69             Work      7          71
Feature   23            100            Apex      22            100            Product   3          67
Apex      14            93             Quality   11            58             Unit      −3         100
Picture   13            77             Video     9

Comparison of Proposed System with FBS System and Opinion Analyzer
We compared the results of the proposed system with two state-of-the-art systems, namely, the opinion analyzer (our previous work, which is enhanced in the current study) [52] and the FBS system [53]. These systems were selected for the comparison because they use the same dataset. In addition, the objective of both systems is feature ranking based on consumers' opinions. It is notable that the top ten features of Digital Camera 1 produced by the proposed system and by the opinion analyzer differ, as the methods used to extract features in the two systems are different. To compare these systems, we first extracted the common features from the top ten features of each system according to positive and negative ranks. There are eight and nine common features in the positive and negative ranks, respectively. Second, we compared the accuracy of these common features for positive and negative ranks separately, as shown in Figures 12 and 13. The average accuracy of the proposed system (95%) is slightly better than that of the opinion analyzer (92%) for positive rank (Table 6). However, the proposed system showed a slight degradation in average accuracy for negative rank (Table 6). This might be due to the fact that we utilized more parameters for feature extraction than the opinion analyzer.
Similarly, we compared the average accuracy of the top ten features of the five products based on the positive and negative credences with the accuracy of the FBS system [53]. The proposed system outperformed the FBS system on the accuracy of four products (cellular phone, Digital Camera 1, MP3 Player, DVD Player), as shown in Figure 14. In the case of Digital Camera 2, the proposed system exhibited a slight drop in accuracy; however, the proposed system surpassed FBS on average accuracy. Table 6 shows the average accuracy achieved by the proposed system and the opinion analyzer for positive and negative ranks. It can be seen that the average accuracy for positive and negative ranks is the same for both systems.
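The two-step comparison on common features can be sketched in Python as follows. The function name compare_on_common and the accuracy values are hypothetical and serve only to illustrate the procedure; they are not results from Figures 12 and 13 or Table 6.

```python
def compare_on_common(accuracy_a, accuracy_b):
    """Average the accuracies of two systems over the features both report.

    accuracy_a, accuracy_b: dicts mapping a top-ten feature name to its
    accuracy (%) for one system.
    """
    common = sorted(set(accuracy_a) & set(accuracy_b))
    if not common:
        return common, None, None
    mean_a = sum(accuracy_a[f] for f in common) / len(common)
    mean_b = sum(accuracy_b[f] for f in common) / len(common)
    return common, mean_a, mean_b


# Made-up accuracies for two systems with partially overlapping top features.
system_a = {"camera": 100, "picture": 90, "lens": 80}
system_b = {"camera": 90, "picture": 90, "flash": 70}
common, mean_a, mean_b = compare_on_common(system_a, system_b)
print(common, mean_a, mean_b)
```

Restricting the averages to shared features keeps the comparison fair when the two systems extract different feature sets.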

Opinion Visualizer
In this work, due to space constraints, we present the opinion summary of Digital Camera 1 only. The proposed tree map visualization is shown in Figure 15. The tree map consists of ten rectangles, each representing one feature. The weight of a feature is depicted by the size of its rectangle. Each rectangle is further divided into sections according to opinion orientation and strength. Positive and negative opinions on a feature are expressed at six levels: three for positive (weakly positive, mildly positive, strongly positive) and three for negative (weakly negative, mildly negative, and strongly negative), using different shades of red and green. Figure 16 shows the color scheme used in the tree map. In contrast to the tree map of Reference [39], the proposed tree map presents the comparison of opinions at six levels of opinion strength.
A large number of users discussed the camera, as shown by the size of the camera rectangle in Figure 15. Two types of negative opinions (strongly negative and mildly negative) were expressed on the camera. However, users appreciated the camera with strongly, mildly, and weakly positive opinions. The second feature by rectangle size (weight) is picture. It received only three types of opinions: strongly positive, mildly positive, and strongly negative. The features 'Battery' and 'Use' received only positive opinions. On the other hand, the 'Viewfinder' of the camera is discussed negatively, with mildly negative or weakly negative comments. Only mildly positive opinions were expressed by users on the features 'LCD' and 'Lens'. The features 'Software' and 'Flash' of Digital Camera 1 were considered both positively and negatively by the users. The overall opinion of users on Digital Camera 1 was found to be positive.
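The data behind such a tree map can be sketched in Python as follows: each feature receives a share of the total area proportional to its weight, and its rectangle is split among the six opinion levels that actually occur. The function name treemap_cells, the dictionary layout, and the example numbers are assumptions for illustration; rendering and the color scheme of Figure 16 are outside this sketch.

```python
LEVELS = [
    "strongly negative", "mildly negative", "weakly negative",
    "weakly positive", "mildly positive", "strongly positive",
]


def treemap_cells(feature_weights, opinion_counts):
    """Compute each feature's area share and its split across opinion levels."""
    total_weight = sum(feature_weights.values())
    cells = {}
    for feature, weight in feature_weights.items():
        counts = opinion_counts.get(feature, {})
        n_opinions = sum(counts.get(level, 0) for level in LEVELS)
        sections = {}
        if n_opinions:
            # Keep only the levels that occur, as fractions of the rectangle.
            sections = {
                level: counts[level] / n_opinions
                for level in LEVELS
                if counts.get(level, 0) > 0
            }
        cells[feature] = {
            "area_share": weight / total_weight if total_weight else 0.0,
            "sections": sections,
        }
    return cells


# Made-up weights and opinion counts for two features.
weights = {"camera": 60, "picture": 40}
counts = {
    "camera": {"strongly positive": 3, "mildly negative": 1},
    "picture": {"mildly positive": 2},
}
cells = treemap_cells(weights, counts)
print(cells["camera"])
```

Storing the sections as fractions rather than raw counts lets a drawing routine scale each shaded area directly from the rectangle's size.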

Case Study
The proposed opinion-strength-based visualization was evaluated by conducting a usability study. The aim of the usability study was to identify the effectiveness and usefulness of the visualization. A total of ten participants (6 male, 4 female) participated in the study. First, the concepts of the proposed visualization were presented to the participants. The participants were then asked to provide their feedback about the user-friendliness, visual appeal, informativeness, understandability, and intuitiveness of the visualization. A 5-point Likert scale (Strongly Disagree to Strongly Agree) was utilized to collect the feedback. Figure 17 demonstrates the result of the usability study.
None of the participants strongly disagreed with the visual appeal, understandability, intuitiveness, or informativeness. Figure 17 shows that most of the participants reported strong agreement or agreement on the usability of the proposed visualization. The use of a color scale to increase the understanding of the visualization was suggested by two participants, and this suggestion was incorporated. Another suggestion, provided by many participants, was to increase the width of the borders; this suggestion was also incorporated. The last modification was an increase in font size, based on the results of the usability study.

Conclusion, Limitation and Future work
In this paper, the authors proposed novel ranking schemes for users' reviews and product features, along with an opinion-strength-based visualization, to present users with high-quality information from massive reviews. The focus is on improving existing ranking schemes for reviews and features by incorporating users' preferences with an enhanced parameter set not considered in previous studies. In contrast to existing opinion mining systems, the proposed system integrates review ranking and feature ranking by utilizing only high-quality reviews, selected according to users' preferences, for feature ranking, which results in enhanced product feature ranking. First, the information overload problem (selecting high-quality reviews) was addressed by proposing a new review ranking scheme. Second, a new scheme for feature ranking based on an enhanced parameter set was proposed. Third, binary-classification-based visualization was improved by the introduction of an opinion-strength-based visualization that presents users' opinions on critical product features at multiple levels according to opinion intensity. Fourth, the accuracy of the system was assessed using a real dataset of 332 reviews of five products from amazon.com. Finally, a usability study was performed to evaluate the quality of the proposed visualization. Our results show an average accuracy of 85% for review quality classification. Moreover, the results for Digital Camera 1 and DVD Player show five classes of reviews, giving insight into the quality of the reviews. Player, play, and price are found to be the top three features of the DVD Player according to positive credence, whereas player, play, and picture are the top three features according to negative credence. The proposed system achieved promising results compared with existing systems. The study has some limitations, as the system was evaluated on 332 reviews from one domain (electronic products).
Future research should target more products with a large number of reviews from different domains.