Article

Predicting the Helpfulness of Online Customer Reviews across Different Product Types

Department of Business Administration, Seoul National University of Science and Technology, Seoul 01811, Korea
Sustainability 2018, 10(6), 1735; https://doi.org/10.3390/su10061735
Submission received: 27 April 2018 / Revised: 21 May 2018 / Accepted: 23 May 2018 / Published: 25 May 2018
(This article belongs to the Special Issue Sustainability in E-Business)

Abstract

Online customer reviews are a sustainable form of word of mouth (WOM) and play an increasingly important role in e-commerce. However, low-quality reviews often inconvenience review readers. The purpose of this paper is to automatically predict the helpfulness of reviews. This paper analyzes the characteristics embedded in product reviews across five different product types and explores their effects on review helpfulness. Furthermore, four data mining methods were examined to determine the one that best predicts review helpfulness for each product type, using five real-life review datasets obtained from Amazon.com. The results show that reviews for different product types have different psychological and linguistic characteristics and that the factors affecting their helpfulness also differ. Our findings also indicate that the support vector regression method predicts review helpfulness most accurately among the four methods for all five datasets. This study contributes to the more efficient utilization of online reviews.

1. Introduction

Online product reviews written by customers who have already purchased products help future customers make better purchase decisions. Reviews can be defined as peer-generated, open-ended comments about a product posted on company or third-party websites [1]. Since reviews are autonomously updated by customers themselves without corporate effort, they are perceived as a sustainable form of word of mouth (WOM) in e-business.
However, as the reviews accumulate, it becomes almost impossible for customers to read all of them; furthermore, poorly authored low-quality reviews can even cause inconvenience. Thus, it becomes important for e-business companies to identify helpful reviews and selectively present them to their customers.
In fact, customers often require only a small set of helpful reviews. Some online vendors provide mechanisms to identify the reviews that customers perceive as most helpful [1,2,3]. The most widely applied method is simply asking review readers to vote on the question "Was this review helpful to you?", with an answer of either "Yes" or "No". Review helpfulness is then evaluated by dividing the number of helpful votes by the total number of votes [4]. Thereafter, the reviews that receive the highest ratings are moved to the top of the web page so that customers can easily check them. Leading online retailers—such as Amazon.com and TripAdvisor—also use this method to measure review helpfulness. Figure 1 shows how Amazon.com gathers helpful votes on reviews from its readers.
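As a minimal illustration, the following Java sketch computes this vote-ratio measure exactly as defined above (the class and method names are ours, for illustration only):

    // Helpfulness as defined above: the share of "Yes" votes among all votes.
    public final class Helpfulness {

        // Returns helpfulness as a percentage, e.g., 8 of 10 votes -> 80.0.
        static double score(int helpfulVotes, int totalVotes) {
            if (totalVotes == 0) {
                throw new IllegalArgumentException("review has no votes yet");
            }
            return 100.0 * helpfulVotes / totalVotes;
        }

        public static void main(String[] args) {
            System.out.println(score(8, 10)); // prints 80.0
        }
    }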
However, a large proportion of online reviews have few or no votes at all; thus, it is hard to identify their helpfulness. According to Yang et al. [5], more than 80% of the reviews in the Amazon review dataset [6] have fewer than five votes. Moreover, newly authored reviews and reviews of less well-known products have fewer opportunities to be read by other customers and thus cannot receive many votes. Therefore, to use the entire review dataset efficiently, it is necessary to estimate the helpfulness of online reviews with an automatic system rather than depending entirely on manual helpfulness voting.
The purpose of this paper is to predict the helpfulness of product reviews automatically by analyzing the psychological as well as linguistic features of the reviews. This study helps online customers access helpful reviews easily and efficiently even when reviews have no manual votes, which supports a sustainable e-business strategy by improving the continuous utilization of online reviews.
There are some previous studies on this issue; however, most of them focus on linguistic characteristics or limit themselves to basic psychological characteristics, such as positivity. This study considers some in-depth psychological characteristics, such as the level of analytical thinking, authentic expression, expertise, and the ratios of perceptual and cognitive process words embedded in reviews, in addition to the basic features. The product type is also used as a control variable, because the factors determining review helpfulness can vary with product type. For example, a highly analytical review may be perceived as helpful by readers looking for cell phone products, while it may not be perceived as helpful by those buying clothing products.
In short, our research focuses on the following three questions. First, what are the psychological and linguistic review characteristics across different product types and how are they different? Second, what are the factors determining perceived helpfulness of reviews based on product type? Finally, which data mining method, among the four widely used data mining methods, best predicts review helpfulness?
To address these research questions, five online review datasets from different product types (beauty, cellphone, clothing, grocery, and video) on Amazon.com are used. The psychological and linguistic characteristics of the online reviews for each product type are extracted using a widely adopted text analysis program, Linguistic Inquiry and Word Count (LIWC). Then, the review characteristics across the five product types are compared with each other using one-way analysis of variance (ANOVA). Next, the determinant factors of review helpfulness for each product type are examined using regression analysis. Finally, instead of depending on a single analytical method, four widely used data mining methods (linear regression, support vector regression, M5P, and random forest) are implemented to predict review helpfulness, using the WEKA data mining package and the Java programming language. The methods' mean absolute error (MAE) performances are compared to determine the one that predicts review helpfulness most accurately.
The rest of this paper is organized into four sections. Section 2 presents literature related to the current study. Section 3 describes the research settings, and Section 4 presents the results and discussion of this study. Finally, the conclusion and scope for further research are described in Section 5.

2. Literature Review

Many previous studies consider two important issues in predicting the helpfulness of reviews: first, identifying the variables that affect review helpfulness, and second, adopting a suitable analysis method for predicting it. Section 2.1 reviews the related studies focusing on the first issue, and Section 2.2 describes the studies focusing on the second.

2.1. The Characteristics of Online Reviews Affecting Their Helpfulness

Previous studies have addressed many features affecting review helpfulness, such as linguistic characteristics (the number of words, words per sentence, etc.), the content of reviews (positivity/negativity, subjectivity/objectivity, etc.), and other peripheral factors (product rating score, review time, reviewer reputation, etc.).
In terms of the linguistic aspect, reviews of appropriate length, with high readability, and free of grammatical errors are likely to be perceived as helpful [1,4,7,8,9,10,11]. Mudambi and Schuff [1] study the effect of review length (word count) and review extremity on review helpfulness by analyzing Amazon.com review datasets. Their results show that review length has a positive effect on review helpfulness and that product type moderates this relationship. Pan and Zhang [7] also collected review datasets from Amazon.com for both experiential and utilitarian products and show a positive relationship between review length and review helpfulness. Korfiatis et al. [8] examine the effects of readability on review helpfulness using Amazon.com review datasets. They use four readability measures—Gunning's fog index, the Flesch reading ease index, the automated readability index, and the Coleman–Liau index—and show that readability has a greater effect on review helpfulness than review length. Ghose and Ipeirotis [9] consider six readability predictors together with other variables, such as reviewer information, subjectivity levels, and the extent of spelling errors, using Amazon.com review datasets. Their study also supports the view that readability-based features influence perceived review helpfulness and product sales. Similarly, Forman et al. [10] examine the effect of readability and spelling errors on review helpfulness, as well as the subjectivity of the review text and reviewer information. They use three types of products on Amazon.com (audio and video players, digital cameras, and DVDs) and show that the readability of reviews has a positive impact on perceived helpfulness, while spelling errors have a negative impact. Furthermore, Krishnamoorthy [11] considers a greater variety of linguistic features, such as the ratios of adjectives, state verbs, and action verbs in reviews. That study considers four kinds of features—metadata, subjectivity, readability, and linguistic category—and shows that a hybrid set of features delivers the best predictive accuracy. Additionally, the results show that, in most cases, a stand-alone model using linguistic features delivers superior performance compared to a model using either subjectivity or readability features.
In terms of the content aspect of reviews, semantic and sentiment features have been covered in previous studies. Cao et al. [2] extracted the meaning of reviews with the help of latent semantic analysis (LSA); they empirically examined the impact of the basic, stylistic, and semantic characteristics of online reviews on review helpfulness and showed that the semantic characteristics are the most influential. Other research examines the effect of the subjectivity of reviews on review helpfulness. Ghose and Ipeirotis [9] show that reviews containing a mixture of objective and subjective sentences are rated as more helpful by other users than reviews that include only subjective or only objective information. Forman et al. [10] also showed that reviews with a mixture of subjective and objective elements are more helpful.
Emotions embedded in a review have also been identified as important determinants of review helpfulness [7,12]. Pan and Zhang [7] find that consumers tend to rate positive reviews as more helpful than negative ones; this is often manifested in inflated helpfulness ratings for positive reviews, which can misguide consumers. On the other hand, some studies claim that negative reviews tend to be more influential than positive ones [13,14,15,16]. The study of Chevalier and Mayzlin [13], which uses Amazon.com and Barnesandnoble.com datasets, shows that most reviews are overwhelmingly positive, but that negative reviews have a greater impact on sales than positive ones. Kuan et al. [14] also show that negative reviews are more likely to receive helpful votes and are generally considered more helpful. Yin et al. [3] explore the effects of emotions in greater detail, focusing on two negative emotions—anxiety and anger. They claim that anxiety-embedded reviews are considered more helpful than anger-embedded reviews, because anxious reviewers write their reviews more carefully than angry ones do. Ahmad and Laroche [17] also study how discrete emotions—such as hope, happiness, anxiety, and disgust—affect the helpfulness of a product review. They adopt LSA to measure the emotional content of reviews, and their results show that discrete emotions have different effects on review helpfulness.
There are other peripheral factors influencing review helpfulness, such as a reviewer's reputation or the product rating score. Otterbacher [18] collected data on the total votes a reviewer has received, the total reviews written, and the reviewer rank on Amazon.com to measure reviewer reputation; the results show that reviewer reputation is positively correlated with review helpfulness. The product rating score was also found to be a strong determinant of review helpfulness in previous research [1,2,8]. In addition, Luan et al. [19] studied consumers' review search behavior across product types and showed that customers respond more positively to attribute-based online reviews than to experience-based reviews for search products, while the opposite holds for experience products.
In previous work related to this study, Park and Kim [20] analyze review characteristics using LIWC and explore the determinant factors affecting review helpfulness. However, that research is limited to identifying determinant factors using linear regression for two product types—electronics and clothing—on Amazon.com and does not predict review helpfulness using data mining methods.

2.2. The Analyzing Methods for Predicting Review Helpfulness

Analysis methods differ depending on whether the dependent variable (DV) is numeric or nominal. As mentioned in Section 1, the dependent variable, review helpfulness, is defined as the percentage of helpful votes, which is numeric. In this case, one of the most widely adopted methods is linear regression. It has been used frequently in previous studies because it is generally faster than other methods and can explain how the explanatory variables affect the dependent variable. Thus, many previous studies, including Mudambi and Schuff [1], Yin et al. [3], Yang et al. [5], Korfiatis et al. [8], Forman et al. [10], Chevalier and Mayzlin [13], Otterbacher [18], and Park and Kim [20], have adopted linear regression for predicting review helpfulness scores. Some studies transformed the raw percentage of helpful votes into nominal data, such as "unhelpful" or "helpful", based on whether the raw percentage exceeds a benchmark cutoff value [10]; in that case, prediction becomes a classification problem. Cao et al. [2] use logistic regression to examine the impact of basic, stylistic, and semantic characteristics on a "helpfulness rank" based on the number of votes a review receives. Likewise, Pan and Zhang [7] used logistic regression for classifying helpful reviews.
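As a minimal sketch of this transformation, the numeric vote percentage can be mapped to a nominal label once a benchmark cutoff is chosen (the 60% threshold below is an illustrative assumption, not a value from the cited studies):

    // Turns the numeric helpfulness percentage into a nominal class label.
    public class HelpfulnessLabel {

        static String label(double helpfulnessPercent, double cutoff) {
            return helpfulnessPercent >= cutoff ? "helpful" : "unhelpful";
        }

        public static void main(String[] args) {
            System.out.println(label(75.0, 60.0)); // helpful
            System.out.println(label(40.0, 60.0)); // unhelpful
        }
    }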
Support vector machines (SVM) have also been used in related research. SVM can handle both linear and nonlinear relationships between the dependent variable (DV) and independent variables (IVs), and can predict both numeric and nominal DVs. Specifically, the version of SVM called support vector regression (SVR) is used for regression, and the version called support vector classification (SVC) is used for classification. Kim et al. [4] and Zhang [21] applied SVR for predicting review helpfulness on Amazon.com datasets. Similarly, Hu and Chen [22] predict review helpfulness on a TripAdvisor dataset with three data mining methods (SVR, linear regression, and M5P) and show that M5P significantly outperforms the other two. Other related research adopts the SVC method. Martin and Pu [12] apply SVC with two other data mining methods—naïve bayes and random forest—and show that SVC performs the best of the three on a TripAdvisor.com dataset. Krishnamoorthy [11] also adopted the SVC, naïve bayes, and random forest methods on an Amazon.com dataset; however, unlike Martin and Pu [12], they show that random forest produces the best results.
Finally, decision tree methods such as JRip, J48, and random forest have also been applied in related research. Decision trees are a non-parametric supervised learning method used for classification and regression [23]; they produce output rules that are easy to understand and suit non-linear relationships between the DV and IVs. Ghose and Ipeirotis [9] use random-forest-based classifiers for predicting the impact of reviews on sales and their perceived usefulness. O'Mahony and Smyth [24] use two decision tree methods—JRip and J48—as well as naïve bayes, and show that JRip predicts review helpfulness most accurately.
In our study, we adopted four data mining methods (linear regression, SVR, M5P, and random forest) and compared their results to find the best method for predicting review helpfulness. Linear regression was selected because it is the most popular method in previous research; the other three methods (SVR, M5P, and random forest) were selected because each was identified as the best-performing method in more than one related study. The features and analysis methods of the previous studies are summarized in Table 1.

3. Research Settings

3.1. Data & Research Variables

The data used in this study were originally collected from Amazon.com, spanning May 1996–July 2014, and were obtained from http://jmcauley.ucsd.edu/data/amazon/. We chose five product types with distinct characteristics: beauty, cellphone, clothing, grocery, and video. Beauty and grocery products are both categorized as experience goods, for which it is relatively difficult and costly to obtain information on product quality prior to interaction [1]. The difference is that beauty products are closer to hedonic products, which are consumed for luxury purposes, while grocery products are closer to utilitarian products, which are consumed for practical use or for survival. Cellphone products are categorized as search goods, for which it is relatively easy to obtain information on product quality prior to interaction [1]. In addition, they are electronic products based on relatively advanced technology and are thus the subject of more complex reviews. Clothing involves a mix of search and experience attributes: branded clothing is categorized as search goods, whereas non-branded clothing could be considered experience goods. Video products are categorized as digital products, unlike the other four product types, which are physical products.
The original dataset contained 859,998 reviews; from these, we selected the 41,850 reviews with more than 10 votes, because helpfulness scores based on a small number of votes can be biased and unreliable. The details of the data used in this experiment for each product type are presented in Table 2.
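A sketch of this filtering step is shown below. It assumes the JSON-lines layout of the publicly distributed McAuley Amazon dataset, in which each record carries a "helpful" array of [helpful votes, total votes]; the file name and the use of the Gson library are our assumptions, not details stated in the paper.

    import com.google.gson.JsonObject;
    import com.google.gson.JsonParser;
    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.util.ArrayList;
    import java.util.List;

    public class ReviewFilter {
        public static void main(String[] args) throws Exception {
            List<JsonObject> kept = new ArrayList<>();
            try (BufferedReader in = new BufferedReader(new FileReader("reviews_Beauty.json"))) {
                String line;
                while ((line = in.readLine()) != null) {
                    JsonObject review = JsonParser.parseString(line).getAsJsonObject();
                    // "helpful" holds [helpfulVotes, totalVotes] in this dataset layout.
                    int totalVotes = review.getAsJsonArray("helpful").get(1).getAsInt();
                    if (totalVotes > 10) { // keep only reviews with more than 10 votes
                        kept.add(review);
                    }
                }
            }
            System.out.println(kept.size() + " reviews retained");
        }
    }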
The initial form of the review data is presented in Figure 2a; we used the review text, rating, the number of helpful votes, and the total number of votes from the original dataset. Review time was excluded because it does not capture recency, that is, the relationship between when a review was written and when it received its votes.
Because the review text was in unstructured form, we transformed it to a structured form with numeric scores, as presented in Figure 2b. To transform the text, LIWC 2015 was used. LIWC is a text analysis software program developed by Pennebaker et al. [25] for evaluating psychological and structural components of text samples. The tool has been widely adopted in psychology and linguistics [26], and its reliability and validity have been investigated extensively [25,27]. It operates on the basis of an internal dictionary and produces approximately 90 output variables.
However, these 90 output variables are not all mutually exclusive and many of them are part of a hierarchy [28]. For example, as shown in Table 3, the sadness variable belongs to the broader negative emotion variable, and negative emotion belongs to the affective process variable. Thus, using both higher and lower variables belonging to the same hierarchy would cause information redundancy and multi-collinearity problems. Furthermore, many of these variables may not influence the prediction of review helpfulness. For example, a proportion of biological process words may not affect the review helpfulness. Therefore, we selectively use 11 variables which may influence the review helpfulness, rather than using the entire range of LIWC variables.
We categorize these variables into three groups—psychological, linguistic, and metadata. The psychological group is related to thinking and feeling processes based on semantics, while the linguistic group is related to the structure of sentences, or grammar. Unlike the previous two groups, the metadata group captures observations that are independent of the text [4]. The seven selected psychological variables are Analytic, Clout, Authentic, CogProc, Percept, PosEmo, and NegEmo; the three linguistic variables are WC, WPS, and Compare; the one metadata variable is the product rating given by a reviewer. Analytic, Clout, Authentic, CogProc, Percept, and Compare are exploratory variables, considered for the first time in research on this topic, whereas the other variables are confirmatory, having already been identified as determinants in previous research. The research variables and the reason for selecting each are explained below; detailed explanations, including scales and calculation methods, are given in Table 4.
  • Explanatory Variables
    [Psychological variables]
    - Analytic: The level of formal, logical, and hierarchical thinking. Reviews containing analytical thinking are assumed to be more helpful, especially for information-intensive search goods.
    - Clout: The level of expertise and confident thinking. Reviews containing more professional expressions are assumed to be more helpful for complex products, such as hi-tech electronic products.
    - Authentic: The level of honest and disclosing thinking. Reviews containing more personal expressions and disclosures are assumed to be more helpful for high-involvement goods, which customers consider as representing themselves.
    - CogProc: The ratio of cognitive process words such as "cause", "know", and "ought". Reviews containing terms related to cognitive processes are assumed to be more helpful, especially for search goods, since their quality is often evaluated cognitively rather than through the senses.
    - Percept: The ratio of perceptual process words such as "look", "heard", and "feeling". Reviews containing terms related to perceptual processes are assumed to be more helpful for goods whose quality is often evaluated through the senses.
    - PosEmo: The ratio of positive emotion words. It is a confirmatory variable identified as a determinant of review helpfulness in previous studies [7,13,14,15,16,29].
    - NegEmo: The ratio of negative emotion words. It is a confirmatory variable identified as a determinant of review helpfulness in previous studies [7,13,14,15,16,29].
    [Linguistic variables]
    - WC: The length of a review, measured by the number of words in the review text. It is a confirmatory variable identified as a determinant of review helpfulness in previous studies [1,7].
    - WPS: The level of conciseness of a review, measured by the average number of words per sentence. A lower value reflects more concise and readable sentences. It is a confirmatory variable identified as a determinant of review helpfulness in previous studies [8,9,10].
    - Compare: The ratio of comparison words such as "bigger", "best", and "smaller". Reviews with more comparative expressions are assumed to be more helpful for describing experience goods, which are hard to explain by listing their attributes and easier to explain by comparison with other products.
    [Metadata variable]
    - Rating: The product rating score given by a reviewer. It is a confirmatory variable identified as a determinant of review helpfulness in previous studies [1,2,8,29].
  • Dependent Variable
    - Helpfulness: The helpful quality perceived by readers, measured by the number of helpful votes divided by the total number of votes.

3.2. Research Method

The overall procedure and research methodologies used in this research are explained step by step in this section. In the first step, the characteristics of the review text in the Amazon.com dataset described in Section 3.1 were transformed into numeric form. To do this, each word in the review text was looked up in the LIWC dictionary file; if the target word matched a dictionary word, the scale of the matched word category was incremented. In this way, the scores of the explanatory variables explained in Table 4 were produced. Figure 3 shows how target words were categorized using LIWC 2015, and Figure 4 presents the resulting scores of review text representing psychological and linguistic characteristics.
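The following Java sketch illustrates this dictionary-matching step in simplified form. LIWC's actual dictionary is proprietary and matches word stems, so the word lists below are toy examples, and the per-100-words normalization is one common LIWC-style convention:

    import java.util.Arrays;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    public class CategoryCounter {

        // Toy stand-in for the LIWC dictionary: category -> member words.
        static final Map<String, List<String>> DICTIONARY = Map.of(
            "Percept", Arrays.asList("look", "heard", "feeling"),
            "CogProc", Arrays.asList("cause", "know", "ought"),
            "Compare", Arrays.asList("bigger", "best", "smaller"));

        // Increments a category each time one of its words appears, then
        // normalizes the counts to matches per 100 words of review text.
        static Map<String, Double> score(String reviewText) {
            String[] words = reviewText.toLowerCase().split("\\W+");
            Map<String, Double> scores = new HashMap<>();
            for (String word : words) {
                for (Map.Entry<String, List<String>> category : DICTIONARY.entrySet()) {
                    if (category.getValue().contains(word)) {
                        scores.merge(category.getKey(), 1.0, Double::sum);
                    }
                }
            }
            scores.replaceAll((category, count) -> 100.0 * count / words.length);
            return scores;
        }

        public static void main(String[] args) {
            System.out.println(score("I know it is bigger than the one I heard about."));
        }
    }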
In the second step, the characteristics embedded in product reviews across the five product types were explored by performing exploratory data analysis (EDA); that is, we calculated the averages and standard deviations of the review characteristics for each product type. Then, the average scores of the explanatory variables across the five product types were statistically compared using one-way ANOVA. These results are presented in Section 4.1.
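For the one-way ANOVA itself, a sketch using Apache Commons Math is given below (the library is our choice for illustration; the paper does not name a tool). The toy arrays stand in for the per-review scores of one characteristic, say WC, in each product type:

    import java.util.Arrays;
    import org.apache.commons.math3.stat.inference.OneWayAnova;

    public class CharacteristicAnova {
        public static void main(String[] args) {
            // Toy per-review word counts for three product types.
            double[] beauty    = {210, 195, 240, 180, 205};
            double[] cellphone = {350, 330, 365, 310, 372};
            double[] video     = {100, 95, 120, 110, 90};

            // p < 0.05 would indicate the group means differ significantly.
            double p = new OneWayAnova().anovaPValue(
                Arrays.asList(beauty, cellphone, video));
            System.out.printf("ANOVA p-value = %.6f%n", p);
        }
    }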
In the third step, the effect of explanatory variables on review helpfulness was explored using linear regression (LR) with a stepwise option in statistical software SPSS. LR has many advantages, such as being easy to understand and capable of explaining how the explanatory variables affect a dependent variable; thus, it is one of the most widely adopted methods for identifying determinants. The performances of the derived LR models were measured using the adjusted R-squared values and p-values of the F-test. These results are presented in Section 4.2.
Next, in step four, the helpfulness of online reviews was predicted using the four most widely used data mining methods (LR, SVR, M5P, and RandF). Every data mining method has its own advantages and disadvantages, so it is important to choose a method suitable for the data being analyzed [33]. Thus, we compared the results of these four data mining methods to determine the best method for the review dataset. In this step, we excluded the computationally expensive neural networks method and the less scalable case-based reasoning (CBR) method and included relatively fast and simple methods. The models were built using 10-fold cross-validation so that all examples in a dataset could be used for both training and testing. In this 10-fold cross-validation, the entire dataset was divided into 10 mutually exclusive subsets with the same class distribution; each fold was used once as a test dataset to evaluate the performance of the predictive model generated from the training dataset formed by the remaining nine folds [34]. The data mining methods were implemented in the Java programming language with the WEKA package. The examined data mining methods and the WEKA functions used to implement them are explained below, followed by a code sketch of the evaluation loop.
  • Explanation of the examined data mining methods:
    (1) Linear regression (LR): This approach is used to analyze the linear relationship between a dependent variable and one or more independent variables. The standard least-squares LR method, contained in weka.classifiers.functions.LinearRegression, was used.
    (2) Support vector regression (SVR): This is a sequential minimal optimization algorithm for solving regression problems. SVR is the adapted form of SVM for a numerical rather than categorical dependent variable [23]. The weka.classifiers.functions.SMOreg method with the PolyKernel option was used.
    (3) Random forest (RandF): This is an ensemble learning algorithm that operates by constructing a multitude of decision trees [35,36]. The weka.classifiers.trees.RandomForest method was used.
    (4) M5P: This is a decision tree algorithm for solving regression problems using the separate-and-conquer approach. In each iteration, it builds a model tree and makes the "best" leaf into a rule [37,38]. The weka.classifiers.trees.M5P method was used.
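A sketch of this evaluation loop, using the WEKA classes listed above, is given below; the ARFF file name is an assumption, and the last attribute is taken to be the numeric helpfulness score:

    import java.util.LinkedHashMap;
    import java.util.Map;
    import java.util.Random;
    import weka.classifiers.Classifier;
    import weka.classifiers.Evaluation;
    import weka.classifiers.functions.LinearRegression;
    import weka.classifiers.functions.SMOreg;
    import weka.classifiers.trees.M5P;
    import weka.classifiers.trees.RandomForest;
    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;

    public class HelpfulnessPrediction {
        public static void main(String[] args) throws Exception {
            Instances data = new DataSource("beauty_reviews.arff").getDataSet();
            data.setClassIndex(data.numAttributes() - 1); // helpfulness as the DV

            Map<String, Classifier> methods = new LinkedHashMap<>();
            methods.put("LR", new LinearRegression());
            methods.put("SVR", new SMOreg()); // PolyKernel is the default kernel
            methods.put("M5P", new M5P());
            methods.put("RandF", new RandomForest());

            // 10-fold cross-validation, reporting the MAE of each method.
            for (Map.Entry<String, Classifier> method : methods.entrySet()) {
                Evaluation evaluation = new Evaluation(data);
                evaluation.crossValidateModel(method.getValue(), data, 10, new Random(1));
                System.out.printf("%s MAE = %.4f%n",
                    method.getKey(), evaluation.meanAbsoluteError());
            }
        }
    }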
The performances of these four data mining methods were measured by MAE, computed as

MAE = \frac{1}{n} \sum_{i=1}^{n} \left| Y_i - \hat{Y}_i \right|,

where Y_i is the real helpfulness of review i, \hat{Y}_i is its predicted helpfulness, and n is the number of records in a test dataset.
Lastly, the MAE results were compared with each other using repeated-measure ANOVA. In other words, MAE results for each fold of a method were compared with the corresponding fold for the other methods in 10-fold cross validation results. The results of Step 4 are presented in Section 4.3.
The whole procedure of this research is briefly summarized in Figure 5.

4. Results and Discussion

4.1. Review Characteristics According to Product Type

Our first research question concerned whether review characteristics vary across different product types and, if so, how they differ. The averages of the review characteristics were explored, and one-way ANOVA was performed to examine the differences, as presented in Table 5. The ANOVA results show that all research variables differ significantly across the five product types at the 95% confidence level. Figure 6 graphically compares the averages of the variables that share a similar scale. The distinctive results by product type can be interpreted as follows.

Product reviews in the cellphone category (345 words) are the longest of the five product types based on WC, approximately triple the length of reviews for video products (103 words). Moreover, WPS (22.233) and Analytic (65.447) are highest for cellphones, meaning that cellphone reviews are composed of lengthy and analytical sentences. This may occur because reviewers need more words to write analytical reviews of complex, high-technology cellphone products.

The level of Clout is highest (44.037) for video but lowest (27.531) for beauty. The Authentic and Compare scores show the opposite pattern: reviews for beauty have the highest Authentic (53.836) and Compare (2.998) scores, while video has the lowest Authentic (30.735) and Compare (2.232) scores. In other words, product reviews for video tend to be written in an expert manner, whereas those for beauty are written authentically, with many comparative expressions.

Additionally, the CogProc (12.454) and Percept (5.211) scores for beauty are the highest among the five product types. For Percept, this is expected, because reviewers may use many sensory expressions such as "looked", "heard", or "feeling" for beauty products, whose quality is evaluated through the senses. The CogProc result is surprising, since cellphone reviews were expected to include more cognitive process words than beauty reviews; this may be due to functional cosmetic products, but further research is needed to explain it.

Finally, the PosEmo (positive emotion) scores are highest for clothing (4.479) and lowest for cellphones (3.424), while the NegEmo (negative emotion) scores are highest for video and lowest for clothing (0.976). In other words, reviews for electronic and digital products tend to be written more critically than those for the other product types.
In conclusion, as seen in the previous results, reviews for different product types have different characteristics, thus it would be necessary to analyze review helpfulness for each product type separately.

4.2. Factors Determining Review Helpfulness

Our second research question was related to identifying the determinant factors in the perceived helpfulness of reviews depending on their product type. We performed a preliminary correlation analysis to check the linear relationships between the explanatory variables and the dependent variable, review helpfulness, as presented in Table 6. It was found that all explanatory variables were significant for more than one product category at the 95% confidence interval. However, Pearson’s correlation coefficients for Authentic, Compare, and Percept ranged between −0.1 and 0.1 for all product types, which means there was almost no linear relationship between Authentic, Compare, and Percept and review helpfulness. In this research, we did not remove any of the explanatory variables, because even though their correlation coefficients were small, they were statistically significant and there may have been non-linear relationships between them and review helpfulness.
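As a small sketch of this preliminary check, Pearson's r between one explanatory variable and helpfulness can be computed as follows (the values are toy data, and Apache Commons Math is an assumed tool choice):

    import org.apache.commons.math3.stat.correlation.PearsonsCorrelation;

    public class CorrelationCheck {
        public static void main(String[] args) {
            // Toy per-review Analytic scores and helpfulness percentages.
            double[] analytic    = {50.0, 65.4, 54.2, 60.8, 63.3, 48.7, 72.1, 58.9};
            double[] helpfulness = {62.0, 80.0, 70.0, 75.0, 78.0, 60.0, 85.0, 72.0};

            double r = new PearsonsCorrelation().correlation(analytic, helpfulness);
            System.out.printf("Pearson r = %.3f%n", r);
        }
    }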
Next, regression analysis was performed to examine the explanatory variables affecting review helpfulness for each product category. Table 7 shows the detailed regression results for the beauty category. Because a separate regression model was fitted for each of the five product types, we summarize the results for brevity in Table 8, where the standardized coefficients of the statistically significant explanatory variables are marked with * for all datasets.
For all product types, Rating, WC, and Analytic have a positive effect on review helpfulness. In other words, a review with a high rating score, many words, and highly analytical content is perceived as helpful for all five datasets, while the other variables influence review helpfulness only for some product types. Clout has a negative effect on helpfulness for the grocery dataset but a positive effect for the video dataset, which means that reviews with a high level of expertise and confidence are perceived as more helpful for video products, whereas such reviews negatively affect helpfulness for grocery products. Authentic affects review helpfulness only for cellphone products, and WPS only for clothing. That is, reviews containing more honest, personal, and disclosing expressions are perceived as more helpful only for cellphone products, and reviews comprising concise sentences are perceived as more helpful only for clothing products. Similarly, Percept, reflecting expressions such as "looking", "hearing", and "feeling", positively affects review helpfulness only for beauty products. In the beauty and clothing datasets, reviews with more comparative expressions tend to be perceived as more helpful. PosEmo and NegEmo also affect review helpfulness: PosEmo has a positive relationship with helpfulness for the beauty category, and NegEmo has a negative relationship with helpfulness for the cellphone and clothing categories. CogProc has a negative effect on helpfulness for the beauty category, but the opposite effect for the cellphone and video categories.
In short, not only the conventional explanatory variables, such as Rating, WC, WPS, PosEmo and NegEmo, but also the novel variables used in this research, such as Analytic, Clout, Authentic, Compare, CogProc, and Percept, have a significant influence on review helpfulness. In particular, Analytic affects the review helpfulness for all five datasets, and the others partially influence the review helpfulness for some product types, according to their characteristics.
Adjusted R-square values of the regression models range from 0.135 to 0.388. The p-value for the F-test is less than 0.001 for all datasets; thus, the five regression models are significant overall.

4.3. Prediction Results of Review Helpfulness Using Datamining Methods

In this section, the prediction results of the four data mining methods (SVR, LR, RandF, and M5P) are examined and the results of their comparison are presented. The detailed MAE results of the four methods for each fold are presented in Table A1, Table A2, Table A3, Table A4 and Table A5 in the Appendix; they are arranged sequentially for the beauty, cellphone, clothing, grocery, and video datasets. To graphically compare the results of these four data mining methods, we depict the results for beauty products in Figure 7. The results show that the SVR method performs the best, producing the smallest MAE among the four methods. Likewise, the SVR method performs the best among the four methods for the other four datasets as well, as presented in Table A2, Table A3, Table A4 and Table A5.
To compare the overall results more efficiently, the average MAE of each data mining method is calculated and ranked, as presented in Table 9. The results indicate that the SVR method produces the most accurate predictive results among the four data mining methods across all five datasets, and the M5P method produces the second-best results.
In order to verify whether the differences in MAEs across the four data mining methods are statistically significant, repeated-measure ANOVA was performed. The null hypothesis in the ANOVA is that there is no difference in the average MAEs, and the alternative hypothesis is that they are not all equal. As presented in Table 9, the p-values indicate that the differences among the data mining methods are statistically significant for four out of the five datasets (beauty, clothing, grocery, and video) at the 95% confidence interval and they are statistically insignificant for the cellphone dataset.
Furthermore, we examined whether the best-performing SVR method statistically outperforms the other methods by performing paired t-tests. Although ANOVA can detect differences among the four data mining methods, a significant ANOVA result does not by itself imply that SVR outperforms each of the others; thus, paired t-tests between SVR and the other methods were also examined. The results show that SVR statistically outperforms the other methods in 7 out of 15 comparisons at the 95% confidence level, as presented in Table 10. Conclusively, based on these experimental results, SVR is the most desirable of the four data mining methods for predicting review helpfulness.
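The paired t-test can be reproduced from the fold-wise MAEs in Appendix A; the sketch below compares LR and SVR on the beauty dataset using the ten fold results of Table A1 (Apache Commons Math is an assumed tool choice):

    import org.apache.commons.math3.stat.inference.TTest;

    public class PairedComparison {
        public static void main(String[] args) {
            // Fold-wise MAEs from Table A1 (beauty dataset).
            double[] lr  = {12.0820, 12.7120, 13.2242, 12.3903, 11.4204,
                            11.6658, 12.1031, 11.3936, 11.8692, 12.5354};
            double[] svr = {11.6135, 12.2497, 13.1854, 11.9664, 10.9451,
                            11.1959, 11.5863, 10.8911, 11.2281, 12.3415};

            // Two-sided p-value of the paired t-test on the fold differences.
            double p = new TTest().pairedTTest(lr, svr);
            System.out.printf("paired t-test p = %.4f%n", p);
        }
    }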
Finally, the estimated results of helpfulness using the SVR method for sample reviews having no votes are presented in Figure 8. Although these reviews do not have manual votes, the SVR model can predict their helpfulness automatically and these prediction results can be gainfully used for reordering the reviews.

5. Conclusions and Future Work

In this paper, three research questions were examined. First, we examined the psychological as well as linguistic characteristics embedded in product reviews across five different product types and showed how they differ. The reviews for the cellphone product category were found to be the longest and most analytical among the five product types. The reviews for video products were the most professional and confident, but the least authentic. The reviews for beauty products were the most authentic but the least analytical; moreover, they contained the most comparison expressions, cognitive process words, and perceptual expressions. We demonstrated that the differences in review characteristics among the five product types were statistically significant at the 95% confidence level. Second, the determinant factors for each product category were explored. The results showed that rating, word count, and analytical thinking affect review helpfulness for all five product types, whereas positive/negative emotions, comparative expressions, cognitive process words, and perceptual process words influence review helpfulness for only some product types. Finally, among the four widely used data mining methods, the method that best predicts review helpfulness was determined: support vector regression (SVR) performs the best for all data types. This study can help online customers efficiently access helpful reviews, even when reviews have few or no manual votes.
There are several limitations to this study. First, we did not verify the reliability and validity of the review characteristics extracted by LIWC. Although the reliability and validity of LIWC have been investigated in prior research, whether LIWC works well for analyzing review text must be studied further. Second, we chose five different product types without broadly categorizing them into groups such as hedonic vs. utilitarian or experience vs. search products. Even though some product groups have changed in the e-commerce era and the boundaries between them have become blurred, analyzing reviews according to these product groups would still be meaningful. Finally, we implemented the experiments using only Amazon.com datasets. To obtain more general results, we would like to expand this study with datasets from other e-business companies.
Several directions for future work follow from this study. First, providing personalized reviews for each customer, considering his or her preferences, would be an interesting research topic. Second, comparing the characteristics of reviews written on social media with those posted on online shopping malls would be a promising direction, since social media has become an increasingly important marketing channel for spreading e-WOM [39]. Lastly, customer reviews can be analyzed with several different methods, and combining them using grey systems theory [40] could be useful in future research.

Funding

This study was supported by a research program funded by SeoulTech (Seoul National University of Science and Technology).

Conflicts of Interest

The author declares no conflict of interest.

Appendix A

Table A1. Average MAEs of employing the data mining methods for each fold (beauty).

Fold | Data # | LR | SVR | M5P | RandF
0 | 836 | 12.0820 | 11.6135 | 11.7944 | 11.9215
1 | 836 | 12.7120 | 12.2497 | 12.4989 | 12.6919
2 | 836 | 13.2242 | 13.1854 | 13.1315 | 13.2427
3 | 836 | 12.3903 | 11.9664 | 12.3084 | 12.2743
4 | 836 | 11.4204 | 10.9451 | 11.4255 | 11.5257
5 | 836 | 11.6658 | 11.1959 | 11.4729 | 11.8278
6 | 836 | 12.1031 | 11.5863 | 11.9537 | 12.1718
7 | 835 | 11.3936 | 10.8911 | 11.2587 | 11.3696
8 | 835 | 11.8692 | 11.2281 | 12.0592 | 12.1587
9 | 835 | 12.5354 | 12.3415 | 12.3263 | 12.5452
Average | | 12.1396 | 11.7203 | 12.0229 | 12.1729
(Std. Dev) | | 0.5567 | 0.6862 | 0.5414 | 0.5300
Table A2. Average MAEs of employing the data mining methods for each fold (cellphone).

Fold | Data # | LR | SVR | M5P | RandF
0 | 520 | 12.5359 | 11.1354 | 12.1070 | 12.0818
1 | 520 | 13.2352 | 11.5486 | 12.5098 | 12.8459
2 | 520 | 11.8478 | 10.4736 | 11.7598 | 11.9244
3 | 520 | 11.2427 | 10.3240 | 10.6372 | 10.7557
4 | 520 | 12.6055 | 11.8700 | 12.2581 | 12.8158
5 | 520 | 11.7116 | 11.5203 | 11.8930 | 11.9088
6 | 520 | 12.6367 | 11.9216 | 12.4486 | 12.4628
7 | 520 | 11.6525 | 12.1916 | 11.8303 | 11.9767
8 | 520 | 12.4423 | 13.3505 | 12.2984 | 12.4866
9 | 520 | 12.8362 | 12.9721 | 12.6732 | 12.8100
Average | | 12.2746 | 11.7308 | 12.0415 | 12.2068
(Std. Dev) | | 0.5938 | 0.9178 | 0.5495 | 0.6025
Table A3. Average MAEs of employing the data mining methods for each fold (clothing).

Fold | Data # | LR | SVR | M5P | RandF
0 | 751 | 9.8365 | 9.5695 | 9.8050 | 9.6974
1 | 751 | 9.5054 | 9.5225 | 9.3553 | 9.6078
2 | 750 | 9.0600 | 8.3580 | 8.8952 | 9.1874
3 | 750 | 9.1818 | 8.6497 | 8.9937 | 9.4431
4 | 750 | 9.7132 | 9.1588 | 9.7762 | 10.0726
5 | 750 | 9.3782 | 8.7390 | 9.3870 | 9.6297
6 | 750 | 8.2561 | 7.4258 | 8.0411 | 8.4316
7 | 750 | 9.4463 | 8.8855 | 9.3704 | 9.6762
8 | 750 | 8.9833 | 7.8892 | 8.9548 | 9.1500
9 | 750 | 8.8458 | 8.1802 | 9.0928 | 9.5963
Average | | 9.2207 | 8.6378 | 9.1671 | 9.4492
(Std. Dev) | | 0.4397 | 0.6566 | 0.4822 | 0.4213
Table A4. Average MAEs of employing the data mining methods for each fold (grocery).

Fold | Data # | LR | SVR | M5P | RandF
0 | 586 | 13.8986 | 14.2439 | 13.4640 | 13.9866
1 | 585 | 13.6954 | 13.4277 | 13.2228 | 13.2893
2 | 585 | 13.4438 | 13.2304 | 12.9410 | 13.1115
3 | 585 | 12.0219 | 11.7364 | 11.9728 | 12.0896
4 | 585 | 14.4767 | 13.8730 | 14.0294 | 14.2847
5 | 585 | 13.2147 | 12.3069 | 13.2147 | 13.3313
6 | 585 | 11.9302 | 10.7235 | 11.8649 | 12.0930
7 | 585 | 12.6908 | 11.8275 | 12.3122 | 12.7311
8 | 585 | 13.3153 | 12.6267 | 13.2127 | 13.6465
9 | 585 | 14.2575 | 13.8538 | 14.4272 | 14.5736
Average | | 13.2945 | 12.7850 | 13.0662 | 13.3137
(Std. Dev) | | 0.8200 | 1.0764 | 0.7892 | 0.8040
Table A5. Average MAEs of employing the data mining methods for each fold (video).

Fold | Data # | LR | SVR | M5P | RandF
0 | 1494 | 20.7592 | 20.2875 | 20.6702 | 20.9268
1 | 1494 | 21.2397 | 20.7469 | 21.1785 | 21.5217
2 | 1494 | 21.9531 | 21.6277 | 21.8754 | 21.9022
3 | 1494 | 21.6837 | 21.3527 | 21.3362 | 21.5131
4 | 1494 | 18.7974 | 18.0692 | 18.5636 | 19.0017
5 | 1494 | 19.9168 | 19.2861 | 19.5705 | 20.0399
6 | 1494 | 19.5824 | 19.5126 | 19.2674 | 19.5827
7 | 1494 | 19.7675 | 20.4105 | 19.4794 | 19.4520
8 | 1494 | 18.8761 | 18.4220 | 18.6284 | 19.5134
9 | 1494 | 18.9859 | 19.2449 | 18.8067 | 19.0826
Average | | 20.1562 | 19.8960 | 19.9376 | 20.2536
(Std. Dev) | | 1.1177 | 1.1276 | 1.1603 | 1.0484

References

  1. Mudambi, S.M.; Schuff, D. What makes a helpful online review? A study of customer reviews on Amazon.com. MIS Q. 2010, 34, 185–200.
  2. Cao, Q.; Duan, W.; Gan, Q. Exploring determinants of voting for the ‘helpfulness’ of online user reviews: A text mining approach. Decis. Support Syst. 2011, 50, 511–521.
  3. Yin, D.; Bond, S.; Zhang, H. Anxious or angry? Effects of discrete emotions on the perceived helpfulness of online reviews. MIS Q. 2014, 38, 539–560.
  4. Kim, S.M.; Pantel, P.; Chklovski, T.; Pennacchiotti, M. Automatically assessing review helpfulness. In Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing, Sydney, Australia, 22–23 July 2006; pp. 423–430.
  5. Yang, Y.; Yan, Y.; Qiu, M.; Bao, F. Semantic analysis and helpfulness prediction of text for online product reviews. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing, Beijing, China, 26–31 July 2015; pp. 38–44.
  6. McAuley, J.; Leskovec, J. Hidden factors and hidden topics: Understanding rating dimensions with review text. In Proceedings of the 7th ACM Conference on Recommender Systems (RecSys ’13), Hong Kong, China, 12–16 October 2013; pp. 165–172.
  7. Pan, Y.; Zhang, J.Q. Born unequal: A study of the helpfulness of user-generated product reviews. J. Retail. 2011, 87, 598–612.
  8. Korfiatis, N.; Garcia-Bariocanal, E.; Sanchez-Alonso, S. Evaluating content quality and helpfulness of online product reviews: The interplay of review helpfulness vs. review content. Electron. Commer. Res. Appl. 2012, 11, 205–217.
  9. Ghose, A.; Ipeirotis, P.G. Estimating the helpfulness and economic impact of product reviews: Mining text and reviewer characteristics. IEEE Trans. Knowl. Data Eng. 2011, 23, 1498–1512.
  10. Forman, C.; Ghose, A.; Wiesenfeld, B. Examining the relationship between reviews and sales: The role of reviewer identity disclosure in electronic markets. Inf. Syst. Res. 2008, 19, 291–313.
  11. Krishnamoorthy, S. Linguistic features for review helpfulness prediction. Expert Syst. Appl. 2015, 42, 3751–3759.
  12. Martin, L.; Pu, P. Prediction of helpful reviews using emotions extraction. In Proceedings of the 28th AAAI Conference on Artificial Intelligence, Québec City, QC, Canada, 27–31 July 2014; No. EPFL-CONF-210749.
  13. Chevalier, J.A.; Mayzlin, D. The effect of word of mouth on sales: Online book reviews. J. Mark. Res. 2006, 43, 345–354.
  14. Kuan, K.K.; Hui, K.L.; Prasarnphanich, P.; Lai, H.Y. What makes a review voted? An empirical investigation of review voting in online review systems. J. Assoc. Inf. Syst. 2015, 16, 48–71.
  15. Sen, S.; Lerman, D. Why are you telling me this? An examination into negative consumer reviews on the web. J. Interact. Mark. 2007, 21, 76–94.
  16. Willemsen, L.M.; Neijens, P.C.; Bronner, F.; De Ridder, J.A. “Highly recommended!” The content characteristics and perceived usefulness of online consumer reviews. J. Comput. Mediat. Commun. 2011, 17, 19–38.
  17. Ahmad, S.N.; Laroche, M. How do expressed emotions affect the helpfulness of a product review? Evidence from reviews using latent semantic analysis. Int. J. Electron. Commer. 2015, 20, 76–111.
  18. Otterbacher, J. ‘Helpfulness’ in online communities: A measure of message quality. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, Boston, MA, USA, 4–9 April 2009; pp. 955–964.
  19. Luan, J.; Yao, Z.; Zhao, F.; Liu, H. Search product and experience product online reviews: An eye-tracking study on consumers’ review search behavior. Comput. Hum. Behav. 2016, 65, 420–430.
  20. Park, Y.-J.; Kim, K.-J. Impact of semantic characteristics on perceived helpfulness of online reviews. J. Intell. Inf. Syst. 2017, 23, 29–44.
  21. Zhang, Z. Weighing stars: Aggregating online product reviews for intelligent e-commerce applications. IEEE Intell. Syst. 2008, 23, 42–49.
  22. Hu, Y.H.; Chen, K. Predicting hotel review helpfulness: The impact of review visibility, and interaction between hotel stars and review ratings. Int. J. Inf. Manag. 2016, 36, 929–944.
  23. Witten, I.H.; Frank, E.; Hall, M.A.; Pal, C.J. Data Mining: Practical Machine Learning Tools and Techniques; Morgan Kaufmann: Burlington, MA, USA, 2016.
  24. O’Mahony, M.P.; Smyth, B. Learning to recommend helpful hotel reviews. In Proceedings of the 3rd ACM Conference on Recommender Systems (RecSys ’09), New York, NY, USA, 23–25 October 2009; pp. 305–308.
  25. Pennebaker, J.W.; Booth, R.J.; Francis, M.E. Linguistic Inquiry and Word Count (LIWC2007); LIWC: Austin, TX, USA, 2007; Available online: http://www.liwc.net (accessed on 27 April 2018).
  26. Tausczik, Y.R.; Pennebaker, J.W. The psychological meaning of words: LIWC and computerized text analysis methods. J. Lang. Soc. Psychol. 2010, 29, 24–54.
  27. Pennebaker, J.W.; Francis, M.E. Cognitive, emotional, and language processes in disclosure. Cogn. Emot. 1996, 10, 601–626.
  28. Pennebaker, J.W.; Boyd, R.L.; Jordan, K.; Blackburn, K. The Development and Psychometric Properties of LIWC2015. Available online: http://hdl.handle.net/2152/31333 (accessed on 27 April 2018).
  29. Hu, N.; Koh, N.S.; Reddy, S.K. Ratings lead you to the product, reviews help you clinch it? The mediating role of online review sentiments on product sales. Decis. Support Syst. 2014, 57, 42–53.
  30. Pennebaker, J.W.; Chung, C.K.; Frazee, J.; Lavergne, G.M.; Beaver, D.I. When small words foretell academic success: The case of college admissions essays. PLoS ONE 2014, 9, e115844.
  31. Kacewicz, E.; Pennebaker, J.W.; Davis, M.; Jeon, M.; Graesser, A.C. Pronoun use reflects standings in social hierarchies. J. Lang. Soc. Psychol. 2013, 33, 125–143.
  32. Newman, M.L.; Pennebaker, J.W.; Berry, D.S.; Richards, J.M. Lying words: Predicting deception from linguistic style. Personal. Soc. Psychol. Bull. 2003, 29, 665–675.
  33. Auria, L.; Moro, R.A. Support Vector Machines (SVM) as a Technique for Solvency Analysis; DIW Berlin Discussion Paper; Deutsches Institut für Wirtschaftsforschung (DIW): Berlin, Germany, 2008.
  34. Park, Y.-J.; Kim, B.-C.; Chun, S.-H. New knowledge extraction technique using probability for case-based reasoning: Application to medical diagnosis. Expert Syst. 2006, 23, 2–20.
  35. Ho, T.K. Random decision forests. In Proceedings of the 3rd International Conference on Document Analysis and Recognition, Montreal, QC, Canada, 14–15 August 1995; pp. 278–282.
  36. Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32.
  37. Quinlan, R.J. Learning with continuous classes. In Proceedings of the 5th Australian Joint Conference on Artificial Intelligence, Singapore, 16–18 November 1992; pp. 343–348.
  38. Wang, Y.; Witten, I.H. Induction of Model Trees for Predicting Continuous Classes. Available online: https://researchcommons.waikato.ac.nz/handle/10289/1183 (accessed on 25 May 2018).
  39. Mikalef, P.; Giannakos, M.; Pateli, A. Shopping and word-of-mouth intentions on social media. J. Theor. Appl. Electron. Commer. Res. 2013, 8, 17–34.
  40. Julong, D. Introduction to grey system theory. J. Grey Syst. 1989, 1, 1–24.
Figure 1. An example of helpful votes in a review on Amazon.com.
Figure 2. Transformation of unstructured review data to structured data. (a) An example of the original review data in an unstructured form; (b) transformed review data in a structured form.
Figure 3. Categorizing target words in review text using LIWC 2015. (a) Matched target words in review text in LIWC dictionary; (b) categorizing target words in review text.
Figure 4. Transformed review text into numeric scores representing psychological and linguistic characteristics.
Figure 5. Overall procedure of this research.
Figure 6. Average scores of Analytic, Clout, and Authentic across product types.
Figure 7. Performance of the four data mining methods for the 10 folds of the beauty dataset.
Figure 8. Estimated helpfulness using the SVR method for sample reviews having no votes.
Table 1. Previous studies on review helpfulness.

Work | Analyzing Method | Dataset
Chevalier and Mayzlin [13] | Linear regression | Amazon.com, Barnesandnoble.com
Kim et al. [4] | SVR | Amazon.com
Forman et al. [10] | Linear regression | Amazon.com
Zhang [21] | SVR | Amazon.com
Otterbacher [18] | Linear regression | Amazon.com
Mudambi and Schuff [1] | Linear regression | Amazon.com
O’Mahony and Smyth [24] | JRip, J48, NB | TripAdvisor.com
Cao et al. [2] | Logistic regression | CNET
Ghose and Ipeirotis [9] | RandF | Amazon.com
Pan and Zhang [7] | Logistic regression | Amazon.com
Korfiatis et al. [8] | Linear regression | Amazon.com
Yin et al. [3] | Linear regression | Yahoo! Shopping
Martin and Pu [12] | NB, SVC, RandF | TripAdvisor.com
Krishnamoorthy [11] | NB, SVC, RandF | Amazon.com
Yang et al. [5] | Linear regression | Amazon.com
Hu and Chen [22] | Linear regression, M5P, SVR | TripAdvisor.com
Park and Kim [20] | Linear regression | Amazon.com
(SVR: support vector regression, SVC: support vector classification, NB: naïve bayes, RandF: random forest. The original table also marks the review characteristic groups (content, linguistic, reviewer, others) used by each study.)
Table 2. Number of reviews in each product type dataset.

Product Type | Beauty | Cellphone | Clothing | Grocery | Video
# of Reviews | 8357 | 5200 | 7502 | 5851 | 14,940
Table 3. Example of output variables of LIWC in the affective process category.

Category | Examples | # of Words in the Category
Affective processes | happy, cried | 1393
Positive emotion | love, nice, sweet | 620
Negative emotion | hurt, ugly, nasty | 744
Anxiety | worried, fearful | 116
Anger | hate, kill, annoyed | 230
Sadness | crying, grief, sad | 136
Table 4. Explanation of the research variables.

Variable | Explanation | Calculation
Rating | Rating score of a product from a reviewer, scaled from 1 to 5 | Rating score of a product
WC | Total number of words included in the review text | Word count
WPS | Average number of words in a sentence | # of words / # of sentences
Compare | Ratio of the number of comparison words (bigger, best, smaller, etc.) in the review text to the 317 comparison words in the LIWC 2015 dictionary | (# of related words in the review text / total # of related words) × 100
Analytic | Level of formal, logical, and hierarchical thinking, scaled from 0 to 100; lower numbers reflect more informal, personal, here-and-now, and narrative thinking | Derived based on previously published findings from Pennebaker et al. [30]
Clout | Level of expertise and confident thinking, scaled from 0 to 100; low Clout numbers suggest a more tentative, humble, and even anxious style | Derived based on previously published findings from Kacewicz et al. [31]
Authentic | Level of honest, personal, and disclosing thinking, scaled from 0 to 100; lower numbers suggest a more guarded, distanced form of discourse | Derived based on previously published findings from Newman et al. [32]
CogProc | Ratio of the number of cognitive process words (cause, know, ought, etc.) in the review text to the 797 cognitive words in the LIWC 2015 dictionary | (# of related words in the review text / total # of related words) × 100
Percept | Ratio of the number of perceptual process words (look, heard, feeling, etc.) in the review text to the 436 perceptual words in the LIWC 2015 dictionary | (# of related words in the review text / total # of related words) × 100
PosEmo | Ratio of the number of positive emotion words (love, nice, sweet, etc.) in the review text to the 620 positive emotion words in the LIWC 2015 dictionary | (# of related words in the review text / total # of related words) × 100
NegEmo | Ratio of the number of negative emotion words (hurt, ugly, nasty, etc.) in the review text to the 744 negative emotion words in the LIWC 2015 dictionary | (# of related words in the review text / total # of related words) × 100
Helpfulness | Ratio of the number of helpful votes to the total number of votes | (# of helpful votes / total # of votes) × 100
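To make the Calculation column concrete, the following is a minimal sketch of how the dictionary-ratio features and the helpfulness score could be computed. LIWC 2015 is a commercial dictionary, so the three-word positive-emotion set below is a hypothetical stand-in, and the function names (category_ratio, helpfulness_score) are illustrative rather than the author's code.

```python
# Minimal sketch of the Table 4 dictionary-ratio features and the
# helpfulness score. POS_EMO_WORDS is a tiny hypothetical stand-in for
# the 620-word positive-emotion category of LIWC 2015.

POS_EMO_WORDS = {"love", "nice", "sweet"}

def category_ratio(review_text: str, category_words: set) -> float:
    """(# of category words occurring in the review / category size) x 100,
    following the Calculation column of Table 4."""
    tokens = review_text.lower().split()
    hits = sum(1 for token in tokens if token.strip(".,!?") in category_words)
    return hits / len(category_words) * 100

def helpfulness_score(helpful_votes: int, total_votes: int) -> float:
    """(# of helpful votes / total # of votes) x 100; undefined with no votes."""
    return helpful_votes / total_votes * 100 if total_votes else float("nan")

review = "Love this cream, the scent is nice and the texture is sweet."
print(category_ratio(review, POS_EMO_WORDS))  # PosEmo-style score: 100.0
print(helpfulness_score(8, 10))               # 80.0
```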
Table 5. Descriptive statistics and comparison of the average scores for review characteristics across product types.

Attribute | Beauty | Cellphone | Clothing | Grocery | Video | F | p-Value
Rating | 4.071 (1.387) | 3.883 (1.421) | 4.056 (1.300) | 3.931 (1.485) | 2.740 (1.759) | 1640.043 | 0.000
WC | 202.312 (190.045) | 345.135 (422.671) | 153.158 (158.714) | 134.391 (132.207) | 103.548 (122.574) | 1457.094 | 0.000
Analytic | 50.015 (21.696) | 65.447 (19.907) | 54.293 (23.048) | 60.858 (23.579) | 63.328 (25.266) | 623.545 | 0.000
Clout | 27.531 (18.853) | 38.274 (18.464) | 33.494 (20.036) | 36.573 (21.139) | 44.037 (24.230) | 869.354 | 0.000
Authentic | 53.836 (27.863) | 42.573 (24.381) | 49.848 (28.237) | 31.968 (25.736) | 30.735 (27.414) | 1369.799 | 0.000
WPS | 18.474 (9.108) | 22.233 (18.697) | 16.862 (9.177) | 18.450 (10.409) | 16.839 (11.862) | 224.095 | 0.000
Compare | 2.998 (1.848) | 2.688 (1.713) | 2.797 (1.991) | 2.779 (2.176) | 2.232 (2.182) | 233.564 | 0.000
PosEmo | 3.688 (2.171) | 3.424 (2.115) | 4.479 (2.725) | 3.979 (2.642) | 3.942 (3.200) | 137.371 | 0.000
NegEmo | 1.018 (1.104) | 1.078 (1.056) | 0.976 (1.403) | 1.102 (1.404) | 2.260 (2.560) | 1078.256 | 0.000
CogProc | 12.454 (3.871) | 10.843 (3.239) | 10.495 (3.713) | 10.737 (4.434) | 10.560 (4.922) | 316.567 | 0.000
Percept | 5.211 (2.992) | 4.388 (2.566) | 3.569 (2.687) | 4.084 (2.980) | 3.258 (2.683) | 733.929 | 0.000
(Values are average (standard deviation); F and p-value are from a one-way ANOVA across product types.)
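The F and p-values above come from one-way ANOVAs comparing the five product types on each attribute. Below is a minimal sketch with SciPy; the five arrays are hypothetical stand-ins simulated around the reported Analytic means, standard deviations, and dataset sizes, not the actual review scores.

```python
# One-way ANOVA across the five product types, as in Table 5.
# The arrays are hypothetical stand-ins simulated from the reported
# Analytic moments; real per-review scores would be used in practice.
import numpy as np
from scipy.stats import f_oneway

rng = np.random.default_rng(42)
beauty    = rng.normal(50.015, 21.696, 8357)
cellphone = rng.normal(65.447, 19.907, 5200)
clothing  = rng.normal(54.293, 23.048, 7502)
grocery   = rng.normal(60.858, 23.579, 5851)
video     = rng.normal(63.328, 25.266, 14940)

f_stat, p_value = f_oneway(beauty, cellphone, clothing, grocery, video)
print(f"Analytic: F = {f_stat:.3f}, p = {p_value:.4f}")
```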
Table 6. Results of the correlation analysis.

Attribute | Beauty | Cellphone | Clothing | Grocery | Video
Rating | 0.441 ** | 0.448 ** | 0.352 ** | 0.578 ** | 0.609 **
WC | 0.122 ** | 0.154 ** | 0.064 ** | 0.101 ** | 0.184 **
Analytic | 0.107 ** | 0.136 ** | 0.032 ** | 0.145 ** | 0.162 **
Clout | 0.072 ** | 0.016 | 0 | 0.085 ** | 0.203 **
Authentic | 0.004 | 0.030 * | 0.008 | −0.011 | −0.030 **
WPS | 0.057 ** | 0.037 ** | −0.014 | 0.039 ** | 0.102 **
Compare | 0.072 ** | 0.040 ** | 0.054 ** | 0.053 ** | 0.050 **
PosEmo | 0.127 ** | 0.070 ** | 0.106 ** | 0.143 ** | 0.170 **
NegEmo | −0.132 ** | −0.158 ** | −0.119 ** | −0.160 ** | −0.177 **
CogProc | −0.131 ** | −0.042 ** | −0.038 ** | −0.097 ** | −0.087 **
Percept | 0.082 ** | 0.027 | 0.013 | 0.047 ** | 0.025 **
(* p < 0.05, ** p < 0.01).
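Each cell in Table 6 is a Pearson correlation between one review attribute and the helpfulness score. A minimal sketch with SciPy follows; the DataFrame df is a hypothetical toy stand-in for one product-type dataset built with the Table 4 variables.

```python
# Pearson correlations between review attributes and helpfulness,
# as in Table 6. 'df' is hypothetical toy data; in practice it would
# hold the Table 4 variables for one product-type dataset.
import numpy as np
import pandas as pd
from scipy.stats import pearsonr

rng = np.random.default_rng(1)
df = pd.DataFrame({
    "Rating": rng.integers(1, 6, 300).astype(float),
    "WC": rng.integers(10, 500, 300).astype(float),
    "Helpfulness": rng.uniform(0, 100, 300),
})

for attribute in ("Rating", "WC"):
    r, p = pearsonr(df[attribute], df["Helpfulness"])
    stars = "**" if p < 0.01 else "*" if p < 0.05 else ""
    print(f"{attribute}: r = {r:.3f}{stars} (p = {p:.4f})")
```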
Table 7. Regression result for beauty products.

Attribute | Coefficient | Std. Error | Std. Coeff. | t-Value | p-Value
Rating | 5.550 | 0.137 | 0.414 | 40.571 | 0.000
WC | 0.010 | 0.001 | 0.107 | 10.699 | 0.000
Analytic | 0.051 | 0.009 | 0.059 | 5.630 | 0.000
Compare | 0.472 | 0.100 | 0.047 | 4.745 | 0.000
CogProc | −0.192 | 0.051 | −0.040 | −3.760 | 0.000
PosEmo | 0.318 | 0.087 | 0.037 | 3.645 | 0.000
Percept | 0.192 | 0.061 | 0.031 | 3.152 | 0.002
Adjusted R2 = 0.217; F (p-value) = 331.772 (0.000).
Table 8. Summary regression results.

Attribute | Beauty | Cellphone | Clothing | Grocery | Video
Rating | 0.414 *** | 0.434 *** | 0.349 *** | 0.566 *** | 0.580 ***
WC | 0.107 *** | 0.090 *** | 0.055 *** | 0.048 *** | 0.089 ***
Analytic | 0.059 *** | 0.091 *** | 0.024 * | 0.065 *** | 0.065 ***
Clout | | −0.050 *** | | | 0.045 ***
Authentic | | 0.067 *** | | |
WPS | | | −0.030 ** | |
Compare | 0.047 *** | | 0.054 *** | |
PosEmo | 0.037 *** | | | |
NegEmo | | | | −0.026 * | −0.044 ***
CogProc | −0.040 *** | 0.034 * | | | 0.015 *
Percept | 0.031 ** | | | |
Adjusted R2 | 0.217 | 0.224 | 0.135 | 0.341 | 0.388
F | 331.772 *** | 251.587 *** | 168.031 *** | 1009.740 *** | 1891.262 ***
(Standardized coefficients; * p < 0.05, ** p < 0.01, *** p < 0.001; blank cells: variable not retained in the model for that product type.)
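The following is a minimal sketch of the kind of multiple linear regression behind Tables 7 and 8, using statsmodels. The toy DataFrame and the three predictors are hypothetical; z-scoring both sides makes the fitted coefficients comparable to the standardized coefficients reported above, though the author's exact estimation and variable-selection procedure is not reproduced here.

```python
# Hypothetical sketch of the OLS regressions behind Tables 7 and 8.
# 'df' is toy data standing in for one product-type dataset; z-scoring
# predictors and outcome yields standardized coefficients.
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "Rating": rng.integers(1, 6, 500).astype(float),
    "WC": rng.integers(10, 500, 500).astype(float),
    "Analytic": rng.uniform(0, 100, 500),
    "Helpfulness": rng.uniform(0, 100, 500),
})

predictors = ["Rating", "WC", "Analytic"]
X = (df[predictors] - df[predictors].mean()) / df[predictors].std()
y = (df["Helpfulness"] - df["Helpfulness"].mean()) / df["Helpfulness"].std()

model = sm.OLS(y, sm.add_constant(X)).fit()
print(model.summary())  # coefficients, t-values, p-values, adjusted R^2
```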
Table 9. Rank-ordered MAE for each data mining method and repeated-measure ANOVA results.

Dataset | Rank 1 | Rank 2 | Rank 3 | Rank 4 | F | p-Value
Beauty | SVR (11.7203) | M5P (12.0229) | LR (12.1396) | RandF (12.1729) | 7.126 | 0.001
Cellphone | SVR (11.7308) | M5P (12.0415) | RandF (12.2068) | LR (12.2746) | 2.056 | 0.130
Clothing | SVR (8.6378) | M5P (9.16715) | LR (9.2207) | RandF (9.4492) | 55.142 | 0.000
Grocery | SVR (12.7850) | M5P (13.0662) | LR (13.2945) | RandF (13.3137) | 267.262 | 0.000
Video | SVR (19.8960) | M5P (19.9376) | LR (20.1562) | RandF (20.2536) | 3.883 | 0.020
(MAE in parentheses; SVR: support vector regression, LR: linear regression, RandF: random forest).
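The comparison in Table 9 rests on 10-fold cross-validation with mean absolute error (MAE) as the performance measure. Below is a minimal sketch under the assumption of a ready-made feature matrix; the random X and y are stand-ins, and M5P, a Weka model-tree learner [37,38], has no direct scikit-learn counterpart and is omitted here.

```python
# Sketch of the 10-fold cross-validation behind Table 9 and Figure 7.
# X and y are random stand-ins for the Table 4 review features and the
# helpfulness scores; M5P (a Weka model-tree learner) is omitted.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVR

rng = np.random.default_rng(7)
X = rng.random((500, 11))        # 11 review features
y = rng.random(500) * 100        # helpfulness, scaled 0-100

models = {
    "SVR": SVR(),
    "LR": LinearRegression(),
    "RandF": RandomForestRegressor(random_state=0),
}
fold_mae = {}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=10,
                             scoring="neg_mean_absolute_error")
    fold_mae[name] = -scores     # one MAE per fold
    print(f"{name}: mean MAE = {fold_mae[name].mean():.4f}")
```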
Table 10. Overview of the paired t-test results (p-values).

Dataset | SVR-LR | SVR-M5P | SVR-RandF
Beauty | 0.005 | 0.161 | 0.134
Cellphone | 0.000 | 0.004 | 0.159
Clothing | 0.079 | 0.200 | 0.000
Grocery | 0.000 | 0.001 | 0.000
Video | 0.085 | 0.779 | 0.099
(SVR: support vector regression, LR: linear regression, RandF: random forest).
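Each p-value above comes from a paired t-test on the fold-wise MAEs of SVR and a rival method, pairing by fold. A minimal sketch with SciPy follows; the two arrays are hypothetical per-fold MAEs for illustration, and in practice they would come from the cross-validation in the previous sketch.

```python
# Paired t-test on fold-wise MAEs, as in Table 10 (here SVR vs. LR).
# The arrays are hypothetical per-fold MAEs; real values would come
# from the 10-fold cross-validation shown earlier.
import numpy as np
from scipy.stats import ttest_rel

svr_mae = np.array([11.7, 11.9, 11.6, 11.8, 11.7,
                    11.5, 11.9, 11.8, 11.6, 11.7])
lr_mae = np.array([12.2, 12.3, 12.1, 12.2, 12.1,
                   12.0, 12.4, 12.3, 12.1, 12.2])

t_stat, p_value = ttest_rel(svr_mae, lr_mae)  # paired: same folds
print(f"SVR vs. LR: t = {t_stat:.3f}, p = {p_value:.4f}")
```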
