Predicting eWOM’s Influence on Purchase Intention Based on Helpfulness, Credibility, Information Quality and Professionalism

Product reviews co-written by many Internet reviewers can help consumers make purchase decisions and give companies a basis for improving their business strategies. For companies, the most important thing is to understand how the various factors of reviews influence purchase intention. We therefore took this issue as our core question and investigated the influence of eWOM on purchase intention based on helpfulness, credibility, information quality and professionalism. We adopted feature filtering algorithms and proposed an ensemble model that integrates the classification results to obtain the most accurate prediction. The empirical evaluation shows that models based on the four important aspects of reviews can effectively predict the degree of impact of reviews on purchase intention.


Introduction
Online reviews are one of the main sources of eWOM communication, and consumers consult them before making a purchase decision. The BrightLocal survey [1] reported that 93% of customers read online reviews to make a purchase decision. In addition, 85% of consumers trust reviews as much as personal recommendations from friends or family. The survey also found that online reviews of restaurants and hotels play an especially important role for consumers. Online reviews have therefore become not only an important reference for consumers making purchase decisions [2][3][4] but also an important source of strategic intelligence for companies seeking to understand their current market position and competitive advantages [5][6][7].
Previous studies [5][6][7] have shown that online reviews greatly affect the purchase decisions of consumers who read them and may affect a company's sales and revenue. A large body of literature [8][9][10][11][12][13] studies the influence of online reviews on purchase intention based on different aspects of reviews, such as usefulness, credibility, professionalism and information quality. Many of these studies use predictive models to estimate how review characteristics (usefulness [8][14][15][16], polarity [17], acceptance [18], quality [19], reputation [20], etc.) affect purchase decisions. These models are designed to understand, from different perspectives, how reviews affect consumers' purchase intentions. Whichever aspect of a review is examined, its impact on purchase intention is evident. Online reviews therefore play an important role in consumer purchasing decisions.
For companies, understanding which characteristics of online reviews affect consumers' purchase intentions, and how, has become a top priority. Past studies have used several important characteristics of reviews to explore how they affect readers' purchase intentions. However, each of these studies examined the influence of online reviews on purchase intention through the characteristics of a single aspect of reviews and lacks a comprehensive view. For example, studies have examined this influence from the perspective of credibility [21], usefulness [22], information quality [23] and professionalism [24]. Here, credibility refers to the degree of trust or belief, usefulness to the degree to which information can help users, information quality to the quality of the information content and professionalism to expected abilities or skills. A detailed review and discussion of these four factors is given in Section 2.
These examples show that various aspects of reviews can affect consumers' purchase intentions, and the important aspects include usefulness, information quality, professionalism and credibility. One shortcoming of these past studies, however, is that each explores the impact on purchase intention from a single angle, even though these aspects are related to each other yet distinct. Their exploration of purchase intention is therefore partial and incomplete, and they lack a holistic understanding of the factors that may affect it. This shortcoming points to a research need: to integrate the different perspectives used in the past to understand influences on purchasing, so as to achieve a holistic and comprehensive understanding, identify a more complete list of influencing factors and build a predictive model of how these factors affect purchase intention.
In this paper, we aim to address these research gaps. We examine the factors influencing purchase intention from the consumer's perspective and analyze several aspects jointly. Our model uses the important review-related aspects identified in the literature, combined with features that readers can observe, to capture the relevant variables. This paper therefore discusses the factors that may affect purchase intention from four aspects (usefulness, information quality, credibility and professionalism) and extracts these factors from the review content, reviewer information, review actions and product information. In total, 41 variables were extracted. We then examined more closely how these components of a review affect purchase intention. The overall conceptual framework is shown in Figure 1. The advantage of this framework is that it combines the four dimensions used in previous studies, so the variables of all four dimensions can be considered together to determine which variables have more influence and which have less. The framework also helps clarify the similarities and differences among the variables of the four dimensions and how they complement each other. Furthermore, it allows the predictive capability of each aspect for purchase intention to be evaluated objectively, and integrating these aspects can significantly improve the overall predictive capability.
This paper makes multiple contributions to the field. First, we identify the important factors of reviews that affect purchase intention from four aspects rather than a single aspect. Second, our proposed model provides a more comprehensive understanding of which characteristics of reviews are more likely to influence purchase decisions, because our perspective covers four different but related aspects. Third, we propose an ensemble model that predicts the influence of reviews on purchase intention by combining the four aspects. Fourth, the experimental results show that our model predicts review influence more accurately than models based on a single aspect.
The rest of the paper is organized as follows. Section 2 is a review of related work. Section 3 introduces the method and process of constructing the prediction model. Section 4 presents a series of experiments to prove the effectiveness of our prediction model. Section 5 is the conclusion of this paper.

Literature Review
Past research on the characteristics of eWOM has focused on several significant aspects, namely usefulness, credibility, information quality and professionalism. The related literature is reviewed below.

Usefulness/Helpfulness
Among studies related to reviews, the most common concern usefulness and helpfulness [8][9][25][26][27][28][29]. In these papers, "helpfulness" and "usefulness" are used interchangeably. This reflects a recent phenomenon: helpfulness votes on online reviews have become particularly important because they serve as a check on consumer decisions during the purchase process. Many large e-commerce websites have helpfulness voting mechanisms. Such a mechanism affects not only the reviews themselves but also the judgments of the reviewers. Generally, reviews with more votes are considered more useful, so usefulness is regarded as a criterion that reflects readers' judgments. Gupta et al. [30] found that highly useful reviews help customers evaluate products and enhance the credibility of reviews, which means that these reviews can increase customers' confidence in making purchase decisions. Mudambi et al. [31] also pointed out that usefulness analysis can effectively help explain the impact of reviews on the decision-making process, and that review depth is positively correlated with perceived usefulness. Moreover, useful reviews carry more weight for product experience: compared with advertising, people pay more attention to whether content describing actual experience is useful to them.
Racherla and Friske [32] pointed out that negative reviews are more influential than positive reviews. The study suggests that finding negative reviews is a better way to understand the content of the product. Salehan and Kim [14] further found that neutral polarity reviews were considered more helpful. Korfiatis et al. [9] used basic readability measures, average length measures, text content and positive and negative sentiments to assess the quality and usefulness of reviews. The findings of [8] on the impact of reviews on sales and perceived usefulness indicate that subjectivity, readability, self-disclosed identity, product-related information and negative reviews are important impact characteristics. Almutairi et al. [33] showed that the reviewer's disclosure and the reviewer's history will affect the online social community and behaviors such as review helpfulness. Therefore, we also use the reviewer's personal information and past records as research variables. Liu and Park [34] found that the combination of reviewers and information features has a significant impact on the usefulness of reviews, while [32] pointed out that online reviews are essentially a source of information used by consumers to obtain knowledge about products and services. Therefore, the amount of information available in the reviews helps customers evaluate the function of the product. Based on the above findings, this study used polarity, relevance, readability, subjectivity, popularity, information and history as the influencing factors of usefulness.

Credibility
The trustworthiness of messages has always been an important topic in online communication research. As the number of Internet users continues to grow, consumers gain more online experience, yet even with this experience it is often impossible to determine the credibility of the online environment. How to evaluate credibility is therefore a research direction that many scholars consider important [35][36][37][38]. Gunawan and Huarng [39] argue that trust in information affects consumers' attitudes and subjective consciousness, which in turn affect their purchase intentions. Some studies [40][41] have pointed out that credible reviews are accepted by readers and affect their subsequent behavior. In general, when product reviews are more credible, they may have a significant impact on consumers' preferences for products.
Kusumasondjaja et al. [36] used positive reviews, negative reviews and reviewers' personal identification information to investigate the credibility of online reviews and initial trust. Zhang and Watts [42] found that information consistency is key. A review is consistent when its information follows the same trend as other messages. Because consumers treat messages as cues, the way they receive them is also affected, and similar messages presented by numerous reviewers may be considered more credible. This is consistent with the findings of [43][44]. Support from a growing number of other people strengthens the so-called "consensus power": readers tend to believe what most other people believe, even if those beliefs may not be true.
However, there are many anonymous messages on the Internet, and anonymity often leads to irresponsible speech. Even though many operators actively develop monitoring mechanisms to detect false reviews, such reviews cannot be eliminated entirely. Xie et al. [45] studied how the presence of reviewers' personal identifying information influences consumers' processing of contradictory online reviews and their booking intentions, and found that the presence of such information has a positive impact on the credibility of online reviews. In addition, [46] found that when a review does not reveal the reviewer's identity, there is little difference in how consumers respond to positive and negative reviews. Almost all websites now require identity authentication, which gives operators more control and gives consumers more information to refer to. The reviewer's personal information and the website's reputation are therefore also important variables. Based on the above findings, this study used consistency, richness, subjectivity, relevancy, reputation and information as the influencing factors of credibility.

Information Quality
The impact of a review may also be related to the quality of its information: the higher the quality, the more likely it is to benefit consumers. Atika et al. [47] indicated that source credibility and information quality have a significant impact on brand image and purchase intention. The research model of [48] confirms that information quality is the criterion users mention most often when evaluating and judging online information. In the research of [11], the criteria for judging information quality and credibility were refined; the most common concerns are the usefulness, correctness and specificity of the message content, while reputation, expertise and enthusiasm relate to the reliability of authors. Chua and Banerjee [49] found that the relationship between information quality and review helpfulness varies with sentiment and product type. The study also identified three dimensions for evaluating information quality: comprehensibility, specificity and reliability. The research of [50] treated the quality evaluation of product reviews as a classification problem. The message characteristics it used to assess quality cover all aspects of a review, including believability, subjectivity, reputation, relevance, timeliness, completeness, appropriate quantity, comprehension and simplicity. These were also adopted in our study, except that we use consistency in place of believability. In the decision-making literature, [21] showed that the quantity and quality of online consumer reviews are two of the characteristics that affect users' information processing. Lee and Shin [51] showed that review quality has a greater impact on purchase intention, especially in the evaluation of search products. In addition, when reviewers' photos are present, review quality affects the evaluation of the website, which suggests that reviewer photos may lead consumers to adopt different message-processing strategies.
Based on the above findings, this study used subjectivity, richness, polarity, relevancy, timeliness, reputation and consistency as the influencing factors of information quality.

Professionalism
Anyone can access resources on the Internet, but whether to accept them depends on personal judgment. Professionalism is an indicator closely related to credibility, and most people judge credibility by the professionalism revealed in a message. For example, people usually trust reviews that contain detailed information. Rieh and Belkin [52] showed that information seekers judge perceived credibility based on the reliability and professionalism of information. From the consumer's point of view, a "professional" opinion usually provides the final confirmation when considering or making a decision. Professional knowledge means in-depth knowledge of a specific field. The research of [53] found that, compared with search products, reviewer expertise adds more value and influence for trust-based and experience-based services. People are more influenced by messages from authoritative and professional sources, and trust and authority are enhanced especially when professional reviews have eliminated personal prejudice to a certain extent [54]. The study of [55] demonstrated the importance of the interaction between professional reviewers' comments and other external signals that may influence consumer behavior. Zhou and Duan [24] confirmed that, in addition to their direct impact, professional reviews influence online users' choices indirectly through the volume of online user reviews. Numerous studies link the process of generating credibility with the professionalism of the source. Mackiewicz [56] further studied how claims of expertise are connected to contributions, showing that reviewers gain credibility through the performance of their behavior or expression (an indirect way) rather than through asserting professional knowledge (a direct way).
Jameson's research [57] also shows that persuasiveness can be achieved through narrated experience, and the process of linking past experiences with products naturally turns into expertise. Lin et al. [58] showed that the expertise of senders who post comments in consumer reviews attracts users to adopt the information and decide to purchase. Generally speaking, professionalism can be viewed as "authoritativeness," "competence" and "expertness" [59]. Based on the above findings, this study used information, history, reputation, relevancy and subjectivity as the influencing factors of professionalism.

Framework and Method
In this study, evaluating the impact of electronic word-of-mouth on purchase intention is treated as a classification problem, and we propose a model that predicts the level of influence of reviews. We therefore selected several suitable classification algorithms to develop predictive models and analyzed which method best captures the relationship between attributes and influence scores. The framework of this article is described in the Graphical abstract and can be divided into eight steps. First, we collected a review dataset from TripAdvisor, one of the largest travel platforms in the world (Data Collection). We used the important aspects of reviews identified in the literature to obtain a rich set of variables (Feature Extraction). Then, we asked experts to assign an impact score to each review and divided the scores into three levels (Dataset Labeling). We then preprocessed the dataset, including data conversion and standardization (Preprocess). Next, using Pearson correlation analysis (Correlation Analysis) and five feature selection algorithms, we evaluated and filtered the variables to find the set with the strongest explanatory power for modeling (Feature Assessment). The model was trained using cross-validation, and finally several standard evaluation criteria were used to assess the performance of the predictive model (Model Training and Validation).

Data Collection
Our research selected TripAdvisor, one of the major travel websites in the world, as its target. Its business is mainly to provide searchers with reviews and information about hotels, restaurants and tourist destinations worldwide. As a frequently used website, it is an important resource for travel reviews. In this study, we selected 10 restaurants in each of the top three regions (New York, Los Angeles and Chicago) on TripAdvisor and collected 10 reviews from each restaurant, giving a dataset of 300 reviews (3 regions × 10 restaurants × 10 reviews). Table 1 lists the variables associated with the four important aspects (helpfulness, credibility, information quality and professionalism) in the review-mining literature. On this basis, we extracted from the TripAdvisor website the indicator variables that appear frequently in the literature on each aspect, together with the data that consumers pay attention to. In total, we extracted 41 features from four sources: (1) review content: the text of the review; (2) review action: actions or events that follow from the review; (3) product data: information about the product itself; and (4) review author: information about the author who posted the review. Table 2 lists the features considered in developing the prediction models. It covers the four important aspects (helpfulness, credibility, information quality and professionalism) and related data that consumers attend to on the website (such as the website's predefined scoring items). All 41 variables were standardized as z-scores. The variables from each of the four sources are described below.
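The z-score standardization mentioned above replaces each value x of a variable with (x − mean) / standard deviation, computed over all reviews. The following is a minimal sketch. The paper does not say whether the population or the sample standard deviation was used, so the population form (`statistics.pstdev`) is an assumption, and `zscores` is a hypothetical helper name.

```python
import statistics


def zscores(values):
    """Standardize one variable (column) to mean 0 and unit population standard deviation.

    Hypothetical helper: a constant column is mapped to all zeros to avoid dividing by zero.
    """
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)  # population SD (assumption; the sample SD is also common)
    if sd == 0:
        return [0.0 for _ in values]
    return [(v - mean) / sd for v in values]


# Example: standardizing a toy "number of photos" (f11) column for three reviews.
photos = [0, 2, 4]
print(zscores(photos))
```

Each of the 41 columns would be standardized independently in this way before modeling.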

Features Extraction
Table 2. Features considered in developing the prediction models (f8 to f21)
Source | Category | Feature | ID
Review content (RC) | Readability | Automated readability index | f8
Review content (RC) | Subjectivity | The ratio of opinion words used | f9
Review content (RC) | Subjectivity | The number of opinion words | f10
Review content (RC) | Richness | Number of photos | f11
Review content (RC) | Polarity | The polarity of the content | f12
Review content (RC) | Polarity | The polarity of the title | f13
Review content (RC) | Relevancy | Degree of relevance | f14
Reviews action (RA) | Consistency | Degree of deviation from average score | f15
Reviews action (RA) | Consistency | Degree of deviation between score and extreme score | f16
Reviews action (RA) | Rating | The points of this review | f17
Reviews action (RA) | Response | Whether there is any response | f18
Reviews action (RA) | Response | Total number of responses | f19
Reviews action (RA) | Timeliness | How long the review has been posted | f20
Reviews action (RA) | Helpful vote | The number of helpful votes for the review | f21

Review Content (RC)
We extracted a total of 14 variables (f1 to f14) from the review content (RC), belonging to five categories: readability, subjectivity, richness, polarity and relevancy. First, [61] pointed out that easy-to-read text is better understood and helps readers form a more reasonable view of it. For readability we used eight variables (f1 to f8); the readability indexes f5, f6, f7 and f8 are defined in [62][63][64] (see Table 1). In addition, Chen and Tseng [50] indicated that subjective opinions in online reviews may help readers make decisions. The degree of subjectivity is based on the opinion words used, measured as "the number of opinion words" (f10) and "the ratio of opinion words used" (f9). For richness, we use "the number of photos" (f11). For polarity, we used the vaderSentiment package for sentiment analysis and captured "the sentiment score of the review content" (f12) and "the sentiment score of the review's headline" (f13), because readers may pay attention to several reviews at the same time. Review relevancy refers to how often a review mentions the important characteristics of the target restaurant. Twenty-three characteristics are considered: food, taste, price, quality, size, ingredients, drink, dessert, service, parking, staff, open hours, menu, pets allowed, WIFI, seating, ambiance, environment, view, located, discount, payment and smoke. Degree of relevance (f14) is the total number of times these 23 items are mentioned in a review.
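Several review-content features can be computed directly from the text. The sketch below, assuming simple regular-expression tokenization, covers the automated readability index (f8, using the standard ARI formula 4.71 × characters/words + 0.5 × words/sentences − 21.43), the two subjectivity features (f9, f10) and the degree of relevance (f14) over the 23 listed items. `OPINION_WORDS` is a hypothetical placeholder lexicon, and the whole-word, case-insensitive matching rule is an assumption. The paper does not state which opinion lexicon or matching rule was used.

```python
import re

# The 23 restaurant characteristics listed in the text. Whole-word, case-insensitive
# matching is an assumption (e.g. "sizes" is not counted as "size").
ASPECTS = [
    "food", "taste", "price", "quality", "size", "ingredients", "drink",
    "dessert", "service", "parking", "staff", "open hours", "menu",
    "pets allowed", "wifi", "seating", "ambiance", "environment", "view",
    "located", "discount", "payment", "smoke",
]

# Hypothetical toy opinion lexicon; a real study would use a full opinion lexicon.
OPINION_WORDS = {"great", "good", "bad", "delicious", "terrible", "friendly", "slow"}

# A word is a run of letters/digits, optionally followed by an apostrophe suffix ("don't").
WORD_RE = re.compile(r"[A-Za-z0-9]+(?:'[A-Za-z]+)?")


def tokenize(text):
    return WORD_RE.findall(text)


def automated_readability_index(text):
    """f8: ARI = 4.71 * (characters / words) + 0.5 * (words / sentences) - 21.43."""
    words = tokenize(text)
    if not words:
        return 0.0
    characters = sum(ch.isalnum() for w in words for ch in w)  # letters and digits only
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    return 4.71 * characters / len(words) + 0.5 * len(words) / sentences - 21.43


def subjectivity_features(text):
    """Return (f9, f10): share of opinion words among all words, and their count."""
    words = [w.lower() for w in tokenize(text)]
    n_opinion = sum(w in OPINION_WORDS for w in words)
    ratio = n_opinion / len(words) if words else 0.0
    return ratio, n_opinion


def degree_of_relevance(text):
    """f14: total number of mentions of the 23 restaurant characteristics."""
    lowered = text.lower()
    return sum(
        len(re.findall(r"\b" + re.escape(term) + r"\b", lowered)) for term in ASPECTS
    )


review = "Great food and friendly staff. The food was delicious but service was slow!"
print(automated_readability_index(review), subjectivity_features(review), degree_of_relevance(review))
```

The polarity features (f12, f13) are left out of this sketch because they depend on the external vaderSentiment package.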

Reviews Action (RA)
This section covers users' follow-up behavior after reading reviews: star ratings, other people's responses, other readers' votes on the helpfulness of the review, and the relationship between a single review and the other reviews taken as a whole. First, since readers usually look at more than one review and notice how a review differs from the others, we used two features to evaluate the consistency of a review: "the degree of deviation between the reviewer's rating and the average rating" (f15) and "the degree of deviation between the rating and the extreme (polar) rating" (f16). To compute them, the rating level is mapped onto the scale of the sentiment score and the difference is then taken. In addition, the review-related behaviors include "Rating" (f17), "Helpful Vote" (f21) and "Response." For "Response," we use "whether there are any replies" (f18) and "the total number of responses" (f19) to observe the effect of responses. Finally, according to [21], 41% of consumers only refer to reviews posted within about the last two weeks, and only 18% refer to reviews from the previous year. "Timeliness" (f20) must therefore also be considered as a variable.
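The consistency and timeliness features can be sketched as follows. The exact mapping is not given beyond "map the rating level onto the sentiment-score scale and subtract," so this sketch assumes a linear map of 1 to 5 stars onto [−1, 1] (the range of a VADER compound score). It also assumes that f16 is the distance to the nearer extreme rating. The function names and both formulas are hypothetical illustrations, not the authors' definitions.

```python
from datetime import date


def map_rating(stars, low=1, high=5):
    """Linearly map a star rating from [low, high] onto [-1, 1] (assumed target scale)."""
    mid = (low + high) / 2
    half_range = (high - low) / 2
    return (stars - mid) / half_range


def deviation_from_average(stars, average_stars):
    """f15 (hypothetical form): absolute gap between this review's rating and the
    restaurant's average rating, both on the mapped scale."""
    return abs(map_rating(stars) - map_rating(average_stars))


def deviation_from_extreme(stars):
    """f16 (hypothetical form): distance from the nearer extreme rating
    (-1 or +1 on the mapped scale); 0 means the rating is itself extreme."""
    return 1.0 - abs(map_rating(stars))


def timeliness_days(posted, collected):
    """f20: number of days the review had been online when the data were collected."""
    return (collected - posted).days


print(deviation_from_average(5, 4.0), deviation_from_extreme(4),
      timeliness_days(date(2020, 1, 1), date(2020, 1, 15)))
```

Taking the absolute value in f15 treats deviations above and below the average alike. A signed difference would be an equally plausible reading of the text.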

Product Data (PD)
According to [65], eWOM affects purchase intention differently for different products and product types. We therefore also include product-related variables, hoping to provide further insight through the analysis of the service industry. The "Product Data" section contains six variables. We use "number of reviews" (f22) and "average score" (f23) as indicators of popularity. In addition, we directly apply the predefined scoring aspects of the TripAdvisor website: "food score" (f24), "service score" (f25), "value score" (f26) and "atmosphere score" (f27).

Author of a Review (AR)
Forman et al. [46] confirmed the importance of revealing the identity and location of the reviewer. Paek et al. [66] also showed that a user's background information helps predict the importance of the information he or she publishes. Since our goal is to predict the impact of reviews on purchase intention, whether reviewers' information affects the judgment of products and reviews is an important factor. To this end, we collected each reviewer's personal information, past records and reputation on the website. Personal information includes "gender" (f28), "age" (f29) and "year of joining" (f30). For geographic location, we use "whether this reviewer's location is the same as the restaurant's location" (f31). From the self-introduction section of the reviewer's page, we record "whether he/she disclosed more identity information" (f32). By hovering over the reviewer's name, readers can see further information, such as "the rating this reviewer has given most often" (f33), "the number of places visited" (f34) and "the total number of photos provided" (f35). The reputation variables come from TripAdvisor's recognition program for reviewers, which shows each reviewer's contribution to TripAdvisor and his/her status and reputation on the site, to give other travelers a better experience. Six variables were extracted in this part. "Contribution" (f36) is the reviewer's total contribution on the site. "Followers" (f37) is the number of followers the reviewer has. "Reviewer ranking" (f38) is published by TripAdvisor: the more points a reviewer earns on the website, the higher his/her level. "Number of badges" (f39) indicates the travel knowledge possessed by the reviewer: the more travel points he/she collects, the more travel badges he/she obtains, and these are shown in the profile. "Total points" (f40) equals the reviewer's travel points on TripAdvisor, which are earned by posting reviews, photos and so on. The 41st variable is "the total number of useful votes" (f41).

Dataset Labeling
Following past research, we recruited experts to label the data [67]. The three people who assisted with labeling are TripAdvisor users who are familiar with the functions of the website and have experience searching for information on it. They also have experience making decisions influenced by reviews. During labeling, all three experts (coders) rated the same review on a scale of 1 to 5. When the difference between the maximum and minimum scores did not exceed 0.5, the three scores were averaged and consensus was considered reached. When the difference exceeded 0.5, no consensus was reached and the review was excluded. Reviews continued to be rated in this way until 300 consensus reviews had been collected.
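The consensus rule above is a simple filter, shown in the following minimal sketch. The function name and the input layout (a sequence of review IDs with their three coder scores, in the order the reviews were rated) are hypothetical.

```python
def consensus_scores(rated_reviews, max_spread=0.5, target=300):
    """Keep reviews whose coder scores differ by at most max_spread and
    return {review_id: mean score}, stopping once `target` reviews are kept.

    rated_reviews: iterable of (review_id, [score_coder1, score_coder2, score_coder3]).
    """
    kept = {}
    for review_id, scores in rated_reviews:
        if max(scores) - min(scores) <= max_spread:  # consensus reached
            kept[review_id] = sum(scores) / len(scores)
            if len(kept) == target:
                break
        # otherwise: no consensus, so the review is excluded
    return kept


ratings = [("r1", [3, 3, 3.5]), ("r2", [2, 4, 3]), ("r3", [1, 1, 1])]
print(consensus_scores(ratings))
```

Here "r2" is dropped because its scores span 2 points, while "r1" and "r3" keep their mean scores.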
After all reviews were labeled, we used the K-means algorithm to divide the impact scores into three levels that reflect people's general psychological judgments (large, medium and small). Level 1 is the least influential and Level 3 the most influential. The K-means cluster centroids of Levels 1 to 3 are 1.4961, 2.8319 and 3.9636, respectively. To check the consistency of the three coders' scores, we used Kendall's W (Kendall's coefficient of concordance). The descriptive statistics of the coders are shown in Table 3. Kendall's W is 0.979, the chi-square value is 878.400 and the p-value is below 0.001, well under 0.05, which means that the ratings of the three coders are highly consistent.
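Both steps can be sketched in plain Python: one-dimensional K-means (Lloyd's algorithm) to turn consensus scores into levels, and Kendall's W = 12S / (m²(n³ − n)) with its chi-square statistic χ² = m(n − 1)W. With m = 3 coders and n = 300 reviews, 3 × 299 × 0.979 ≈ 878.2, which agrees with the reported 878.400 once W is rounded to three decimals. The quantile-based initialization, the omission of the tie correction and all function names are assumptions of this sketch, not details from the paper.

```python
def kmeans_1d(values, k=3, max_iter=100):
    """Lloyd's k-means on scalar scores; returns the k centroids in ascending order.
    Initial centroids sit at evenly spaced positions of the sorted data (an assumption)."""
    xs = sorted(values)
    n = len(xs)
    centroids = [xs[int((i + 0.5) * n / k)] for i in range(k)]
    for _ in range(max_iter):
        clusters = [[] for _ in range(k)]
        for x in xs:
            nearest = min(range(k), key=lambda j: abs(x - centroids[j]))
            clusters[nearest].append(x)
        updated = [sum(c) / len(c) if c else centroids[j] for j, c in enumerate(clusters)]
        if updated == centroids:  # centroids unchanged: converged
            break
        centroids = updated
    return sorted(centroids)


def to_level(score, centroids):
    """Influence level 1..k = index of the nearest centroid (centroids ascending)."""
    return 1 + min(range(len(centroids)), key=lambda j: abs(score - centroids[j]))


def average_ranks(scores):
    """1-based ranks, with tied scores sharing the average of their ranks."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        for t in range(i, j + 1):
            ranks[order[t]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def kendalls_w(ratings):
    """Kendall's W for m raters (rows) scoring the same n items (columns), without
    the tie correction; also returns the chi-square statistic m(n - 1)W."""
    m, n = len(ratings), len(ratings[0])
    rank_rows = [average_ranks(row) for row in ratings]
    rank_sums = [sum(row[i] for row in rank_rows) for i in range(n)]
    mean_sum = m * (n + 1) / 2
    s = sum((r - mean_sum) ** 2 for r in rank_sums)
    w = 12 * s / (m ** 2 * (n ** 3 - n))
    return w, m * (n - 1) * w


paper_centroids = [1.4961, 2.8319, 3.9636]
print(to_level(2.0, paper_centroids), to_level(3.5, paper_centroids))
print(kendalls_w([[1, 2, 3], [1, 2, 3], [1, 2, 3]]))
```

Identical rankings from all raters give W = 1, and perfectly rotated rankings give W = 0. Statistical packages usually apply a tie correction, so their W can differ slightly from this uncorrected form when many scores are tied.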

Feature Selection
In this study, we applied five feature subset selection methods and the Pearson correlation (PC) method to evaluate the variables. Each method aims to select the subset of attributes most relevant to the target variable. The selection results show which variables are more important for prediction, and they also provide a basis for building a good prediction model.

1. CfsSubsetEval (CS) uses Pearson correlation as the basis for evaluating both the predictive ability of each attribute and the redundancy among attributes.

2. InfoGainAttributeEval (IG) uses information gain as the criterion for evaluating the worth of attributes. It measures, through entropy, how well an attribute serves as a basis for classifying the data.

3. ReliefFAttributeEval (RF) evaluates the worth of an attribute by how well its values distinguish between samples that are similar to each other but belong to different classes.

4. OneRAttributeEval (OR) builds a one-rule (OneR) classifier for each attribute on the training data and ranks attributes by the error of that rule. It is usually used as a baseline benchmark for attribute evaluation.

5. SymmetricalUncertAttributeEval (SU) evaluates the worth of an attribute by measuring its symmetrical uncertainty with respect to the class and assigns each attribute an SU score. This criterion is similar to the gain ratio criterion.
The above subset selection methods were implemented with the Weka data mining package.
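The scoring criteria behind several of these evaluators can be written in a few lines. The sketch below covers information gain IG = H(class) − H(class | feature), symmetrical uncertainty SU = 2·IG / (H(feature) + H(class)), the Pearson correlation used by PC, and the CFS merit k·r̄_cf / √(k + k(k − 1)·r̄_ff) of a k-feature subset. This is an illustration only, not Weka's code. Weka discretizes numeric attributes before computing IG and SU, and this sketch simply assumes the features are already discretized. The toy data are hypothetical.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (bits) of a list of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def conditional_entropy(feature, labels):
    """H(labels | feature) for a discretized feature."""
    n = len(labels)
    groups = {}
    for x, y in zip(feature, labels):
        groups.setdefault(x, []).append(y)
    return sum(len(g) / n * entropy(g) for g in groups.values())


def info_gain(feature, labels):
    """IG = H(class) - H(class | feature)."""
    return entropy(labels) - conditional_entropy(feature, labels)


def symmetrical_uncertainty(feature, labels):
    """SU = 2 * IG / (H(feature) + H(class)), which lies in [0, 1]."""
    denom = entropy(feature) + entropy(labels)
    return 2 * info_gain(feature, labels) / denom if denom else 0.0


def pearson(x, y):
    """Pearson correlation coefficient of two numeric lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def cfs_merit(k, mean_feature_class_corr, mean_feature_feature_corr):
    """CFS merit of a k-feature subset: k * r_cf / sqrt(k + k(k - 1) * r_ff)."""
    return k * mean_feature_class_corr / math.sqrt(k + k * (k - 1) * mean_feature_feature_corr)


levels = [1, 1, 2, 2, 3, 3]        # influence level of six toy reviews
photos_bin = [0, 0, 1, 1, 2, 2]    # discretized feature that tracks the level
noise_bin = [0, 1, 0, 1, 0, 1]     # discretized feature unrelated to the level
print(info_gain(photos_bin, levels), symmetrical_uncertainty(noise_bin, levels))
```

The CFS merit rewards subsets whose features correlate strongly with the class but weakly with each other, which is why CS can discard a relevant feature that is redundant with one already selected.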

Ensemble Model
Ensemble methods use multiple learning algorithms to obtain predictive performance better than that of any of the constituent algorithms alone. Their advantage is that they combine several learners and smooth out the biases of the individual learning algorithms. For the proposed ensemble model, we implemented stacked generalization (stacking). The stack has two levels. Level 0 combines five base classifiers: support vector machine (SVM), logistic regression, neural network, random forest and REPTree. Level 1 takes the predictions of the level-0 classifiers as input and uses a meta-classifier to combine them. We use Naive Bayes as the meta-classifier to combine the base classifiers' outputs and reduce the generalization error. Naive Bayes is a probabilistic classifier that computes the probability of each class for a given sample and outputs the class with the highest probability. Figure 2 shows the framework of our ensemble model. The influence predictions of the proposed ensemble model are evaluated with 10-fold cross-validation, which randomly divides the original dataset into 10 subsets; in each round, nine subsets are used as training data to build the model and the remaining subset is used to evaluate it. Finally, we compare our method with several classification algorithms: random forest, multilayer perceptron, logistic regression, SVM and REPTree. All stages of classification (training and modeling) are implemented with the Weka data mining package.
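A comparable two-level stack can be sketched with scikit-learn instead of Weka, which the study actually used. The sketch makes several assumptions. Synthetic data from `make_classification` stands in for the 300 × 41 feature matrix with three influence levels. scikit-learn has no REPTree, so a cost-complexity-pruned `DecisionTreeClassifier` substitutes for it. `MLPClassifier` plays the neural network, and `GaussianNB` is the Naive Bayes meta-classifier. `StackingClassifier` fits the meta-classifier on out-of-fold level-0 outputs (an inner 5-fold split here, itself an assumption), and the outer `StratifiedKFold` provides the 10-fold estimate. All hyperparameters are illustrative, not the paper's.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the 300 reviews x 41 features with three influence levels.
X, y = make_classification(n_samples=300, n_features=41, n_informative=10,
                           n_classes=3, random_state=0)

# Level 0: the five base classifiers.
base_learners = [
    ("svm", SVC(random_state=0)),
    ("logreg", LogisticRegression(max_iter=1000)),
    ("mlp", MLPClassifier(hidden_layer_sizes=(20,), max_iter=500, random_state=0)),
    ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
    # Weka's REPTree has no scikit-learn equivalent; a pruned CART tree stands in.
    ("tree", DecisionTreeClassifier(ccp_alpha=0.01, random_state=0)),
]

# Level 1: Naive Bayes trained on out-of-fold predictions of the level-0 models.
stack = make_pipeline(
    StandardScaler(),  # z-score standardization, refit inside each training fold
    StackingClassifier(estimators=base_learners, final_estimator=GaussianNB(), cv=5),
)

# Outer 10-fold cross-validation of the whole stack.
outer_cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
scores = cross_val_score(stack, X, y, cv=outer_cv, scoring="accuracy")
print(f"10-fold accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")
```

Weka's Stacking and scikit-learn's `StackingClassifier` differ in details. One example is which level-0 outputs (class probabilities or decision scores) reach the meta-classifier. Scores from this sketch are therefore not directly comparable with the paper's results. Running `cross_val_score` on each base learner alone with the same `outer_cv` gives the single-classifier comparison.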

Experiment and Evaluation
In the experiment, we used seven attribute sets and six algorithms to build models. The attribute sets are all attributes and the subsets selected by PC, CF, IG, RF, OR and SU. The six model-building algorithms are support vector machine, logistic regression, neural network, random forest, REPTree and the ensemble model. In Study 1, we examined the performance of the models built with all variables. In Study 2, we examined the performance of the models built with the variables selected by PC, CF, IG, RF, OR and SU, respectively. Finally, in Study 3, we examined the performance of the models built with variables from four aspects: helpfulness, credibility, information quality and professionalism.

Study 1: Using All 41 Predictive Variables
In this experiment, we used all 41 variables to build the models. The average accuracy of all algorithms is around 65%, while the ensemble model proposed in this study reaches 71% (see the "All features" column of Table 4). Table 5 shows the Pearson correlation value of each feature. According to this analysis, the significant attributes are f1, f10, f11, f14, f15, f16, f17, f18, f19, f20, f21, f26, f32, f34, f35, f36, f38, f39 and f40.
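The correlation screening described above can be sketched in a few lines of Python. This is a hedged illustration, not the authors' code: the function names are hypothetical, and the sketch stops at the t statistic t = r·sqrt((n − 2)/(1 − r²)); the p-values behind the significance markers come from Student's t distribution with n − 2 degrees of freedom, which a statistics library would supply.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length numeric sequences.
    Raises ZeroDivisionError if either sequence is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """Test statistic for H0: rho = 0 with n - 2 degrees of freedom (needs |r| < 1)."""
    return r * math.sqrt((n - 2) / (1 - r * r))


def rank_by_correlation(features, target):
    """Return (name, r) pairs sorted by |r| with the target, strongest first.
    The sign of r is kept, so negative predictors such as f20 stay visible."""
    scores = {name: pearson_r(col, target) for name, col in features.items()}
    return sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)
```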
These significant variables belong to the four categories related to reviews (RC, RA, PD, AR); in the variable set selected in this study, each category contains factors that influence purchase intention. f14 (the degree to which product features are mentioned), f38 (the reviewer's rank on the website), f20 (the time since the review was posted), f32 (whether the reviewer discloses more identity information) and f11 (the number of photos) are the top five significant variables, which we consider the primary potential predictors of variation in influence between reviews. These results suggest that whether a review addresses the topic directly and describes relevant product features may be the first thing readers pay attention to. Greater identity disclosure may be a cue that readers use to judge authenticity, and the reviewer's rank on the website also affects a review's influence. In particular, f20 is negatively correlated: we found that the longer a review has been posted, the less influence it has, regardless of the reviewer's identity and rank. This is consistent with the timeliness dimension of information quality. Among the other influencing variables, f15 (the degree of deviation from the average score) and f16 (the degree of deviation between the score and the extreme score) are also negatively correlated. Both are indicators of consistency, measuring how far a review's rating departs from other ratings. The larger this difference, the lower the consistency and the smaller the review's influence. As explained by Russo et al. [68], when people choose between two options, they may distort new information to support whichever option they temporarily prefer. For example, information that agrees with a pre-existing preference is judged more useful. In addition, when only the significant variables selected by the Pearson correlation analysis are used, the accuracy of the proposed model reaches 75% (see the PC column in Table 4).
Note: *: p-value < 0.05; **: p-value < 0.01.

CfsSubsetEval (CF) Filtering Method
In the CfsSubsetEval (CF) filtering method, we used BestFirst and Greedy Stepwise as the search methods. The variables selected by BestFirst are f10, f14, f15, f20, f32, f37 and f41, representing the number of opinion words, degree of relevance, degree of deviation from the average score, timeliness, any disclosure, number of followers and total votes. The variables selected by Greedy Stepwise are f14, f15, f20, f32, f37 and f41, representing the degree of relevance, degree of deviation from the average score, timeliness, any disclosure, number of followers and total votes, respectively. With our ensemble model, the BestFirst subset achieves an accuracy of 75.33%, while the Greedy Stepwise subset achieves 77.33% (see the "CF" column in Table 4). The variable set selected with Greedy Stepwise as the search method therefore yields better modeling accuracy.
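CfsSubsetEval scores a whole subset rather than single attributes, using Hall's correlation-based merit, which rewards features that correlate with the class but not with each other: Merit(S) = k·mean(r_cf) / sqrt(k + k(k − 1)·mean(r_ff)) for a subset of k features. The Python sketch below pairs that merit with a forward greedy stepwise search. It is an illustration under assumptions: it takes precomputed feature-class (`r_cf`) and feature-feature (`r_ff`) correlation tables as hypothetical inputs, whereas Weka computes the correlations internally, and Weka's BestFirst search can additionally backtrack.

```python
import math
from itertools import combinations


def cfs_merit(subset, r_cf, r_ff):
    """Hall's CFS merit: k*mean(r_cf) / sqrt(k + k(k-1)*mean(r_ff)).
    r_cf maps feature -> |corr(feature, class)|;
    r_ff maps frozenset({f, g}) -> |corr(f, g)|."""
    k = len(subset)
    if k == 0:
        return 0.0
    mean_cf = sum(r_cf[f] for f in subset) / k
    pairs = list(combinations(sorted(subset), 2))
    mean_ff = sum(r_ff[frozenset(p)] for p in pairs) / len(pairs) if pairs else 0.0
    return k * mean_cf / math.sqrt(k + k * (k - 1) * mean_ff)


def greedy_stepwise(features, r_cf, r_ff):
    """Forward search: repeatedly add the feature that most increases the merit;
    stop as soon as no remaining feature improves it."""
    selected, best = [], 0.0
    while True:
        candidates = [(cfs_merit(selected + [f], r_cf, r_ff), f)
                      for f in features if f not in selected]
        if not candidates:
            return selected, best
        merit, f = max(candidates)
        if merit <= best:
            return selected, best
        selected.append(f)
        best = merit
```

For example, with hypothetical correlations in which feature `b` is almost a copy of `a`, the search keeps `a` and the less redundant `c` and rejects `b`, which mirrors why CF ends with a short variable list.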

InfoGainAttributeEval (IG) Filtering Method
Based on the results of this attribute evaluation algorithm, we used the Ranker search method to select the attributes with a score greater than 0. This yields 16 variables, in descending order: f14, f38, f20, f39, f41, f37, f32, f34, f36, f1, f40, f11, f15, f16, f30 and f17. We then fed increasing numbers of these variables, from the highest-ranked downward, into the ensemble model. The test results show that accuracy peaks at 79% when the last included variable is f32 (see the IG column in Table 4), which is the highest of all results. Therefore, we used these seven variables as the attribute filtering result of InfoGainAttributeEval (IG): f14, f38, f20, f39, f41, f37 and f32, representing the degree of relevance, reviewer rank, time since the review was posted, number of badges, total votes, number of followers and any disclosure, respectively.
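The rank-then-sweep procedure used here (and again for SU, RF and OR below) can be sketched as follows. This is a minimal illustration: `ranker` and `best_prefix` are hypothetical helper names, and `evaluate` is a placeholder callback standing in for the 10-fold cross-validated accuracy of the ensemble model on a given feature list.

```python
def ranker(scores, threshold=0.0):
    """Keep attributes whose score exceeds the threshold, highest score first."""
    ordered = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return [name for name, score in ordered if score > threshold]


def best_prefix(ranked, evaluate):
    """Evaluate the top-1, top-2, ... prefixes of a ranked list; keep the best one.
    `evaluate` is a hypothetical callback returning, e.g., cross-validated accuracy."""
    best_k, best_score = 0, float("-inf")
    for k in range(1, len(ranked) + 1):
        score = evaluate(ranked[:k])
        if score > best_score:  # strict '>' keeps the smaller subset on ties
            best_k, best_score = k, score
    return ranked[:best_k], best_score
```

Breaking ties in favour of the shorter prefix reflects the later observation that fewer, more informative variables tend to work better than many redundant ones.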

SymmetricalUncertAttributeEval (SU) Filtering Method
Similarly, for this attribute evaluation method, we sorted the attributes with a score greater than zero according to the SU results and then used increasing numbers of attributes, from the highest-ranked downward, as the input of the ensemble model. The results show that the model performs best, at 77.67%, when eight attributes are used (see the "SU" column in Table 4). Therefore, we used the variable set f14, f37, f41, f20, f39, f32, f38 and f36 as the result of this attribute filtering method. These variables represent relevance, number of followers, total votes, timeliness, number of badges, any disclosure, contribution and reviewer rank.

ReliefFAttributeEval (RF) Filtering Method
In the fifth attribute evaluation method, we first sorted the variables according to their scores and then tested each variable combination in turn, from the highest-scoring variable to the lowest. With our ensemble model, the best performance, 75.33%, was achieved when the variables up to and including the 17th variable (f29) were used (see the "RF" column in Table 4). We therefore used f32, f38, f14, f17, f15, f28, f10, f13, f16, f26, f20, f23, f30, f33, f11, f24 and f29 as the variable set of this filtering algorithm; these variables represent any disclosure, degree of relevance, gender, reviewer rank, the rating of review, degree of deviation from average score, number of photos, number of opinion words, timeliness, join years, value score, age rank, average rating, food score, the polarity of title, degree of deviation between score and extreme score and most frequently rated score.
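The idea behind ReliefF can be illustrated with a simplified Relief-style sketch in Python: a feature gains weight when it differs between an instance and its nearest neighbours of another class (misses) and loses weight when it differs from its nearest neighbours of the same class (hits). This is a hypothetical simplification, not Weka's ReliefFAttributeEval: it uses every instance as a sample, treats all other classes together as misses without weighting them by class prior, and range-normalizes numeric features.

```python
def relief_weights(X, y, k=1):
    """Simplified Relief: for each instance, penalise features that differ from its
    k nearest same-class neighbours (hits) and reward features that differ from its
    k nearest other-class neighbours (misses). Features are range-normalised."""
    n, d = len(X), len(X[0])
    lows = [min(col) for col in zip(*X)]
    spans = [(max(col) - lo) or 1.0 for col, lo in zip(zip(*X), lows)]

    def diff(a, b, j):
        return abs(a[j] - b[j]) / spans[j]

    def dist(a, b):
        return sum(diff(a, b, j) for j in range(d))

    w = [0.0] * d
    for i in range(n):
        others = sorted((dist(X[i], X[t]), t) for t in range(n) if t != i)
        hits = [t for _, t in others if y[t] == y[i]][:k]
        misses = [t for _, t in others if y[t] != y[i]][:k]
        for j in range(d):
            w[j] -= sum(diff(X[i], X[h], j) for h in hits) / (n * k)
            w[j] += sum(diff(X[i], X[m], j) for m in misses) / (n * k)
    return w
```

Because the weights depend on neighbourhoods rather than on one feature at a time, Relief-style methods can credit features that matter only in combination, which may partly explain why RF keeps a longer list (17 variables) than the univariate filters.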

OneRAttributeEval (OR) Filtering Method
In the last attribute evaluation method, we again sorted the variables according to the algorithm's results and then used increasing numbers of variables, from the highest-ranked downward, as the input of the ensemble model. The combination of the top 16 variables, ending with f33, is the most accurate, at 72% (see the "OR" column in Table 4), so we used this 16-variable set as the result of the OR method. The selected variables are f14, f32, f38, f11, f37, f15, f1, f19, f18, f39, f27, f23, f24, f25, f17 and f33, representing degree of relevance, any disclosure, reviewer rank, number of photos, number of followers, degree of deviation from average score, total number of words, total number of responses, any response, number of badges, atmosphere score, average rating, food score, service score, the points of this review and average rating, respectively.
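OneR ("one rule") builds, for a single attribute, a rule that maps each attribute value to its most frequent class; OneRAttributeEval ranks attributes by the accuracy of such rules. The Python sketch below is a hypothetical illustration that scores each discrete attribute by the training-set accuracy of its rule; Weka's evaluator has its own options for how this accuracy is estimated and how numeric attributes are bucketed.

```python
from collections import Counter, defaultdict


def one_r_score(column, y):
    """Accuracy of the OneR rule for one discrete attribute:
    each attribute value predicts its most frequent class."""
    by_value = defaultdict(Counter)
    for v, c in zip(column, y):
        by_value[v][c] += 1
    correct = sum(counts.most_common(1)[0][1] for counts in by_value.values())
    return correct / len(y)


def one_r_rank(features, y):
    """Attribute names sorted by OneR rule accuracy, best first."""
    scores = {name: one_r_score(col, y) for name, col in features.items()}
    return sorted(scores, key=scores.get, reverse=True)
```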
After completing Study 2, we analyzed the results. Tables 6 and 7 below show that f14 and f32 are the only two variables selected by the Pearson correlation analysis and by all five attribute evaluation methods. They are "relevance" and "any disclosure," which the significance analysis also identified as main potential influence characteristics. This suggests that the degree to which product features are mentioned and the presence of identity disclosure are key factors in attracting readers' attention. Next, two variables were selected five times: f20 (timeliness) and f38 (reviewer rank). These feature filtering results show that readers not only pay attention to the content of a review but also consider the identity of its author when making decisions. Other frequently selected variables are f15 (degree of deviation from the average score), f37 (number of followers) and f39 (number of badges). Among them, f37 was not significant in the Pearson correlation analysis. Most reviewers in our dataset have few followers, and only a few have many, so it may be difficult to determine the importance of this feature from this dataset. However, a larger number of followers may indirectly raise a reviewer's reputation on the website and thereby affect the reviewer's ranking.
Table 6. Results of feature selection.

Method  Selected features
PC      f1, f10, f11, f14, f15, f16, f17, f18, f19, f20, f21, f26, f32, f34, f35, f36, f38, f39, f40
CF      f14, f15, f20, f32, f37, f41
IG      f14, f38, f20, f39, f41, f37, f32
SU      f14, f37, f41, f20, f39, f32, f38, f36
RF      f32, f38, f14, f17, f15, f28, f10, f13, f16, f26, f20, f23, f30, f33, f11, f24, f29
OR      f14, f32, f38, f11, f37, f15, f1, f19, f18, f39, f27, f23, f24, f25, f17, f33

In addition, f11 (number of photos) was among the top five potential influence variables in the Pearson correlation analysis but was selected by only two of the five attribute filtering algorithms. The presence or absence of photos did matter during data labeling: when tagging the dataset, the three coders tended to pay more attention to reviews with photos and discussed them more. We therefore infer that photos are also a key factor. However, because the number of photos that can be uploaded per review is limited, few reviews contain photos. Therefore, although f11 appears less influential in the analysis, it may have some influence in practice.
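The selection frequencies discussed for f14, f32, f20, f38, f15, f37, f39 and f11 can be tallied directly from Table 6. The short Python script below does so; the dictionary is simply a transcription of Table 6, and the variable names are our own.

```python
from collections import Counter

# Transcription of Table 6: features selected by each method.
TABLE_6 = {
    "PC": "f1 f10 f11 f14 f15 f16 f17 f18 f19 f20 f21 f26 f32 f34 f35 f36 f38 f39 f40",
    "CF": "f14 f15 f20 f32 f37 f41",
    "IG": "f14 f38 f20 f39 f41 f37 f32",
    "SU": "f14 f37 f41 f20 f39 f32 f38 f36",
    "RF": "f32 f38 f14 f17 f15 f28 f10 f13 f16 f26 f20 f23 f30 f33 f11 f24 f29",
    "OR": "f14 f32 f38 f11 f37 f15 f1 f19 f18 f39 f27 f23 f24 f25 f17 f33",
}

# How many of the six methods (PC plus five filters) selected each feature.
counts = Counter(f for feats in TABLE_6.values() for f in feats.split())

# How many of the five filters alone (excluding PC) selected each feature.
filters_only = Counter(
    f for method, feats in TABLE_6.items() if method != "PC" for f in feats.split()
)

for feature, n in counts.most_common():
    if n >= 4:
        print(feature, n)
```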

Experimental Results of Study 2
The experiments in Study 2 applied the six algorithms to the variable sets selected by the six attribute selection methods. Table 4 shows the accuracy results of all these combinations.
Each cell of Table 4 gives the accuracy of one combination of algorithm and feature selection method. For example, the best performance is achieved by combining the ensemble algorithm with the IG feature selection method, while the worst is obtained by combining the REPTree algorithm with the OR feature selection method.
The rows of Table 4 give the average performance of each of the six algorithms. The results show that the best-performing algorithm is the ensemble algorithm, followed by SVM, multilayer perceptron, random forest, logistic regression and finally REPTree. In fact, the ensemble algorithm outperforms the other five algorithms on every feature set, which suggests that it is not only effective but also robust. Based on their performance, the six algorithms can be divided into three groups. The first group consists of the ensemble algorithm alone. The second group includes SVM, multilayer perceptron and random forest, which have similar accuracy. The last group contains logistic regression and REPTree, which perform markedly worse.
The columns of Table 4 give the average performance of each attribute set. The results indicate that the best feature selection method is IG and the worst is OR. In fact, OR performs worse than using all 41 variables, as shown in Table 4. As Table 6 shows, IG uses only seven variables, while OR uses 16. This comparison suggests that using too many redundant variables can hurt model performance. Interestingly, the three best feature selection methods in Table 4, IG, SU and CF, are also the three that use the fewest variables to build the model (7, 8 and 6 variables, respectively, as shown in Table 6). This suggests that it is better to use fewer but important variables than more but redundant ones.
In addition to accuracy, we used other indicators to measure the effectiveness of these combinations: precision, recall, F1 measure and ROC area. Table 8 shows the average performance of six classification algorithms on these five indicators, and Table 9 shows the average performance of the six feature selection algorithms on the same indicators. Since the results of these two tables are similar to those of Table 4, we omit their discussion.
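For reference, the additional indicators can be computed as in the minimal Python sketch below. It is an illustration rather than the study's evaluation code: the function names are hypothetical, precision, recall and F1 are computed for one designated positive class, ROC area is computed as the probability that a randomly chosen positive instance scores higher than a randomly chosen negative one (ties count one half), and per-class values are combined with a class-frequency-weighted average, which to our understanding is how Weka's "Weighted Avg." row is formed.

```python
def precision_recall_f1(y_true, y_pred, positive):
    """Precision, recall and F1 for one class treated as positive."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def roc_area(y_true, scores, positive):
    """ROC area: P(score of a random positive > score of a random negative)."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def weighted_average(per_class, support):
    """Average per-class metric values, weighting each class by its frequency."""
    total = sum(support.values())
    return sum(per_class[c] * support[c] / total for c in per_class)
```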

Study 3: For the Four Prediction Facets
In Study 3, we used the four important theoretical aspects (helpfulness, credibility, information quality and professionalism) as input variable combinations for the ensemble model. In other words, when we used helpfulness to predict the target variable, the 24 variables listed under the helpfulness aspect in Table 1 were used as the input variables for model building. The results are shown in Table 10. The most accurate ensemble model in this study uses the variables selected by IG, with an accuracy close to 79%. All of the selected variables (f14, f38, f20, f39, f41, f37, f32) relate to the four theoretical aspects proposed in this study. According to Table 1, f14 and f32 belong to helpfulness; f14, f38, f39, f41, f37 and f32 belong to credibility; f14, f38, f20, f39, f41 and f37 belong to information quality; and f38, f39, f41, f37 and f32 belong to professionalism. Table 10 shows the analysis results for the four theoretical aspects. Information quality (IQ) yields the most accurate model, followed by credibility. Surprisingly, the helpfulness aspect yields the lowest accuracy. Moreover, whichever aspect is used alone, its performance is more than 10% lower than that of the model built on all four aspects. The experiment of Study 3 therefore indicates that none of the individual aspects of e-commerce reviews discussed in the previous literature is sufficient on its own to fully represent a review's influence on purchase intention. It also shows that, from the perspective of readers and consumers, the impact on purchase intention is not equivalent to helpfulness, credibility, information quality or professionalism alone. Therefore, our results show that predicting purchase intention from all four aspects together yields a good assessment of influence.

Conclusions
This study proposes a method for assessing the impact of product reviews based on four important theoretical aspects of reviews in the field of e-commerce, namely helpfulness, credibility, information quality and professionalism. We adopted feature filtering algorithms (CF, IG, RF, OR, SU) and proposed an ensemble model that integrates the classification results to obtain the most accurate prediction. The results show that the accuracy of the ensemble model can reach 79%. Furthermore, we conducted separate experiments on each of the four theoretical aspects. These confirmed that the important aspects of e-commerce discussed in the previous literature are not individually sufficient to fully represent a review's influence. From the consumer's perspective, the impact on purchase intention differs in meaning and degree from helpfulness, credibility, information quality and professionalism, although these factors are interrelated. These results help clarify the relationships among these factors and show which kinds of reviews influence purchase intention.
The products we chose are experience goods, not search or credence goods, so readers' criteria and perceptions when reading reviews and making decisions may differ for other product types. In addition, the age and nationality of reviewers can also affect readers' perceptions.
In future work, we may consider more factors and examine the correlations between additional variables and filters to improve the accuracy of the system and conduct a more comprehensive analysis.

Data Availability Statement:
The data presented in this study are available on request from the corresponding author.

Conflicts of Interest:
The authors declare no conflict of interest.