#### 4.2. Model Findings

Table 4 shows the output of the system GMM regression. Five models are presented, depending on the assumed review visibility case and on the weighting approach followed.

We observe that the review non-textual and textual variables are significant in every model: ln_volume, ln_rating, ln_rating_inconsistency, ln_analytic, ln_authentic, and ln_clout. The interaction term ln_rating*ln_rating_inconsistency is also significant in every model, so H1a and H1b are supported. Therefore, both review non-textual and textual features influence product sales not only in the traditional review visibility case (case 1), where every review is assumed to have the same probability of being viewed, but also in the remaining cases (cases 2 and 3), where we assume that consumers sort online reviews either by the most helpful mechanism (case 2) or by the most recent mechanism (case 3).
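The three visibility cases can be sketched in code. This is a minimal illustration: the 1/rank decay for the "every review with decreasing probability" approach, the equal-weight top-five cutoff, and the review fields (`rating`, `helpful_votes`, `date`) are all assumptions for the sketch, not the paper's exact weighting functions.

```python
# Sketch of the review visibility cases; the 1/rank decay and the
# equal-weight top-five cutoff are illustrative assumptions.

def weighted_avg_rating(reviews, sort_key=None, top_k=None):
    """Visibility-weighted average rating.

    sort_key=None          -> case 1: every review equally visible.
    sort_key, top_k=None   -> cases 2.1 / 3.1: weight 1/rank.
    sort_key, top_k=5      -> cases 2.2 / 3.2: top five, equal weight.
    """
    if sort_key is not None:
        reviews = sorted(reviews, key=sort_key, reverse=True)
    if top_k is not None:
        reviews = reviews[:top_k]
    if sort_key is not None and top_k is None:
        weights = [1.0 / rank for rank in range(1, len(reviews) + 1)]
    else:
        weights = [1.0] * len(reviews)
    total = sum(weights)
    return sum(w * r["rating"] for w, r in zip(weights, reviews)) / total

reviews = [
    {"rating": 5, "helpful_votes": 40, "date": "2021-03-01"},
    {"rating": 2, "helpful_votes": 90, "date": "2021-01-15"},
    {"rating": 4, "helpful_votes": 5,  "date": "2021-04-20"},
]
by_helpful = lambda r: r["helpful_votes"]
by_recency = lambda r: r["date"]          # ISO dates sort chronologically

case_1  = weighted_avg_rating(reviews)                        # uniform
case_21 = weighted_avg_rating(reviews, by_helpful)            # 1/rank
case_22 = weighted_avg_rating(reviews, by_helpful, top_k=5)   # top five
case_31 = weighted_avg_rating(reviews, by_recency)            # 1/rank
```

With the toy reviews above, `case_21` falls below `case_1` because the most helpful review is also the most negative one; this is exactly the kind of divergence between visibility cases that the models capture.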

To test H2a and H2b, we should look at possible differences in the review variable coefficients across models. To show the results of Models 1 to 5 graphically, Figure 3 and Figure 4 plot the coefficients of the review non-textual and textual variables, respectively, in the different models. As far as H2a is concerned, we observe that the sign of the review non-textual coefficients is the same in every model, while their magnitude changes across models. Therefore, we can support H2a, because we observe differences among review visibility cases. H2b is also supported, since the magnitude, and even the sign, of the review textual coefficients differs across review visibility cases. In fact, we find larger differences between review visibility cases for the review textual variables.

Differences between review visibility cases have been further explored. First, when we compare the two visibility approaches within the same case (Model 2 vs. Model 3, and Model 4 vs. Model 5), the approach where we assume that consumers view either the top five most helpful or the top five most recent online reviews (Models 3 and 5, respectively) yields larger review variable coefficients than the approach where we assume consumers view every online review with decreasing probability when sorting either by most helpful or by most recent (Models 2 and 4, respectively). These findings might suggest that the review features of the top five ranked online reviews (either most helpful or most recent) have a stronger influence on consumer purchase decisions. Second, we also notice that the review variable coefficients are higher in case 2 than in case 3. This might indicate that the information contained in the most helpful online reviews has a greater impact on consumer purchasing behavior than the information in the most recent online reviews. A possible explanation is that consumers might experience a "wisdom of the crowd" effect when they evaluate the most helpful online reviews [56]. This effect refers to the fact that, since many other consumers have voted the information contained in those reviews as helpful, consumers might believe that this information is a better approximation to the truth, and are therefore more likely to rely on it when making a purchase. Moreover, if we look at case 2, we observe that the coefficients are larger in Model 3 than in Model 2, which might indicate that those top five online reviews have a strong influence on consumers' purchase behavior. This influence is greater than when we consider every individual online review with its corresponding visibility probability, as represented by Model 2. Therefore, these findings might suggest not only that the most helpful reviews are more influential than the most recent reviews, but also that the online reviews placed on the first page of each product's reviews have an even greater impact on consumers' purchase behavior.

Overall, if we considered only Model 1, in which all reviews are assumed to have the same probability of being viewed (the approach traditionally used in previous literature), we could reach misleading conclusions, because the strength, and even the sign, of some effects is not the same as in the other review visibility cases. For example, the coefficient of ln_rating is δ = 0.116 in Model 1, while it is δ = 0.578 in Model 3. Therefore, the product average rating has a higher impact when we assume that consumers read the top five most helpful reviews of each product. In other words, this might indicate that the impact of the average rating of the five most helpful online reviews is greater than the impact of the overall product average rating. In this line, another pattern we observe is that the effect of the review non-textual variables, ln_rating and ln_rating_inconsistency, and also of the interaction term ln_rating*ln_rating_inconsistency, is greater (they have higher coefficients) when we assume that consumers evaluate either the top five most helpful reviews (v_{2.2}) or the top five most recent reviews (v_{3.2}) than when we assume that consumers evaluate every online review following either the most helpful (v_{2.1}) or the most recent rank order (v_{3.1}). However, we do not observe this pattern for the review textual variables.
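Because all variables enter the model in logs, these coefficients read as elasticities. A small worked example using the δ values quoted above (the 10% rating change itself is illustrative):

```python
# In a log-log specification, ln(y') - ln(y) = delta * (ln(x') - ln(x)),
# so changing the rating by factor (1 + g) multiplies the inverse sales
# rank by (1 + g) ** delta. The 10% change below is illustrative.
def sales_rank_inverse_multiplier(delta, rating_growth):
    return (1.0 + rating_growth) ** delta

m1 = sales_rank_inverse_multiplier(0.116, 0.10)  # Model 1 coefficient
m3 = sales_rank_inverse_multiplier(0.578, 0.10)  # Model 3 coefficient
```

Under Model 3, the same 10% improvement in the visibility-weighted average rating implies roughly a five-times-larger percentage shift in the dependent variable than under Model 1.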

Concerning the variable L1_ln_sales_rank_inverse, it is positive and significant in every model, which means that the bestselling rank of the previous week positively impacts the bestselling rank of the current week. This confirms the dynamic behavior of the dependent variable in our model. Moreover, this finding is even more relevant in our context, where consumers are likely to be subject to a social influence effect when choosing between products within a category: since consumers believe that many people have bought the products in the top positions of the bestselling list, they are likely to continue buying those products [12,59,112]. The variable ln_volume is always positive and significant, so the higher the number of online reviews a product has, the more likely it is to occupy top positions in the bestselling rank. The coefficients for L1_ln_sales_rank_inverse and ln_volume are quite stable across models, which might indicate that the effect of these variables on the sales rank does not depend much on the review visibility case.

Ln_rating is also positive and significant in every model. Therefore, the better the product average rating, the better the bestselling position of the product: regardless of the review visibility case, the average rating always has a positive impact on the bestselling rank. However, we observe larger differences in coefficient magnitude. Ln_rating has a stronger impact when it is built considering the most helpful visibility of online reviews (case 2): the higher the average rating of the most helpful online reviews, the stronger the positive effect of ln_rating on the bestselling rank. This finding makes sense because the online reviews in top positions of the most helpful ranking are not only positive (high star ratings), but have also been voted as helpful by other consumers, which means that many consumers have found the information they provide useful or diagnostic. On the other hand, the effect of ln_rating when considering review visibility by most recent (v_{3.1} and v_{3.2}) is also significant, but smaller than in case 2. Thus, the average rating of the most recent online reviews also has a positive effect on the bestselling rank, but a smaller one than that of the most helpful reviews. A possible explanation is that the date itself does not provide consumers any extra information about the usefulness or diagnosticity of an online review; it just means that the review has been recently published. The number of helpful votes, in contrast, is by itself rich information provided by online reviews.

The effect of ln_rating_inconsistency is positive and significant in every model: the higher the difference between each individual review rating and the product average rating, the better the impact on the bestselling rank. Thus, it might be good for products to have online reviews whose ratings differ from the product average, which might indicate that products with more "extreme" online reviews are more likely to reach better bestselling positions. A possible reason is that, since most online reviews at the online retailer are very positive (5-star reviews), it is good for a product to also have negative online reviews. In this way, consumers learn both the positive and negative features of the product. Being aware of both kinds of information gives consumers a better attitude towards the product, because they might believe they have more realistic information than if they had only positive or only negative information. Comparing across models, there are also differences in coefficient magnitude. Again, the effect of ln_rating_inconsistency is stronger when we assume that online reviews are sorted by the most helpful criterion (case 2) rather than the most recent criterion (case 3). This might indicate that the presence of both positive and negative online reviews in top positions of the most helpful ranking has a greater positive impact on the product bestselling ranking. As with ln_rating, being in top positions of the most helpful rank means that many other consumers have found the information in those online reviews useful or diagnostic. So both positive and negative reviews in top positions of that ranking have been useful for consumers, and prospective consumers therefore find that information more trustworthy and closer to reality. If we had looked only at case 1, we would think that the effect is much stronger than it is when we consider review visibility.
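One plausible way to operationalize this inconsistency measure is the mean absolute deviation of individual ratings from the product average. The paper's exact formula is not reproduced here, so the sketch below only illustrates the idea:

```python
import math

def rating_inconsistency(ratings):
    """Mean absolute deviation of individual ratings from the product
    average rating: one plausible operationalization, not necessarily
    the paper's exact formula."""
    avg = sum(ratings) / len(ratings)
    return sum(abs(r - avg) for r in ratings) / len(ratings)

consensus = [5, 5, 4, 5, 5]   # ratings cluster near the average
polarized = [5, 1, 5, 1, 5]   # 'extreme' ratings far from the average

# log1p keeps the log transform defined when inconsistency is zero.
ln_inc_consensus = math.log1p(rating_inconsistency(consensus))
ln_inc_polarized = math.log1p(rating_inconsistency(polarized))
```

The polarized product scores much higher on this measure than the consensus product, matching the "extreme reviews" interpretation above.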

We have also incorporated into the model an interaction term between ln_rating and ln_rating_inconsistency. In every model the interaction term is negative and significant, which indicates that the effect of ln_rating on ln_sales_rank_inverse is mitigated by ln_rating_inconsistency. In other words, when there is a large difference between individual review ratings and the product average rating, the effect of the product average rating on the product bestselling rank is reduced. Thus, the presence of "extreme" online reviews makes ln_rating itself less influential on the product bestselling rank. As in previous cases, this relationship is stronger in case 2 than in case 3: when the presence of "extreme" reviews in the top most helpful ranking is high, the effect of the average rating of those most helpful online reviews on the product bestselling rank is smaller.
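With the interaction term, the marginal effect of ln_rating is no longer a single coefficient. A minimal sketch, with illustrative magnitudes except for the sign of the interaction (negative, as reported above):

```python
# d(ln_sales_rank_inverse) / d(ln_rating) = b_rating + b_inter * ln_inconsistency
# b_inter < 0 matches the negative interaction reported in the models;
# the numeric values are illustrative, not the estimated coefficients.
def marginal_effect_of_rating(b_rating, b_inter, ln_inconsistency):
    return b_rating + b_inter * ln_inconsistency

b_rating, b_inter = 0.5, -0.2
effect_low_inconsistency  = marginal_effect_of_rating(b_rating, b_inter, 0.1)
effect_high_inconsistency = marginal_effect_of_rating(b_rating, b_inter, 1.0)
```

As rating inconsistency rises, the marginal payoff of a higher average rating shrinks, which is exactly the mitigation the negative interaction coefficient captures.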

Finally, the effect of the review textual variables ln_analytic, ln_authentic, and ln_clout is significant in every model, but there are some differences in both sign and magnitude. Ln_analytic has a negative impact in cases 1 and 3, while it is positive in case 2. Having more organized, logical, and hierarchically written online reviews is positive in case 2, where consumers evaluate online reviews by the most helpful criterion. However, this feature has a negative impact on sales in case 1, where all reviews have the same visibility, and in case 3, where consumers evaluate online reviews by the most recent criterion. Thus, consumer decision-making may change depending on the set of online reviews they view and evaluate. Ln_authentic and ln_clout positively influence product sales in case 1, but both have significant and negative coefficients in the rest of the models. Therefore, if only Model 1 were evaluated, which is the one traditionally used, we might think that products with more personal and humble online reviews (high values in ln_authentic) and products with online reviews showing high reviewer confidence and leadership (high values in ln_clout) are more likely to be sold. However, we observe the opposite effect in the other review visibility cases: when reviews do not all have the same probability of being viewed and consumers evaluate them by either the most helpful or the most recent criterion, both ln_authentic and ln_clout negatively influence product sales.

Overall, we observe that considering just one review visibility case (case 1) might lead to biased conclusions, since Model 1's output differs from that of the rest of the models. To get a broader picture of the effect of online reviews on product sales, several review visibility cases should be explored.

#### 4.3. Misspecification Tests and Alternative Panel Data Models

Four misspecification tests are conducted to check the validity of the models and are reported in Table 5. First, two Wald tests of the joint significance of the reported coefficients (z1) and of the time dummy variables (z2) are reported, with degrees of freedom in parentheses. The null hypothesis for z1 is that the explanatory variables are jointly non-significant, and the null hypothesis for z2 is that the time dummy variables are jointly non-significant. Both Wald tests indicate joint significance of the explanatory variables and of the time dummy variables. Second, the Hansen test verifies the validity of the instruments or, in other words, the lack of correlation between the instruments and the error term. The null hypothesis is that the instruments are valid, so failure to reject the null hypothesis supports instrument validity. We do not reject the null hypothesis, so our instruments are valid. Finally, the AR(2) test [113] was conducted to test for the lack of second-order serial correlation in the first-differenced residuals. The null hypothesis is that the residuals are serially uncorrelated; if it is not rejected, there is evidence of no second-order serial correlation and the GMM estimator is consistent. The AR(2) tests in our models indicate that we cannot reject the null hypothesis, so there is no second-order serial correlation and the GMM estimator is consistent. Overall, the four tests indicate that the models are well specified.
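The logic of the Hansen over-identification test can be illustrated on toy data. This is a hand-rolled sketch (L = 2 simulated valid instruments, K = 1 parameter), not the paper's estimation code: the statistic J = N * gbar' W gbar is compared against a chi-square(L - K) distribution, and a small J means the over-identifying restrictions are not rejected.

```python
import random

# Toy Hansen J test: y = beta * x + e with two valid instruments z1, z2.
random.seed(0)
N = 500
Y, X, Z1, Z2 = [], [], [], []
for _ in range(N):
    z1, z2, e = random.gauss(0, 1), random.gauss(0, 1), random.gauss(0, 1)
    x = z1 + 0.5 * z2 + random.gauss(0, 1)
    Y.append(2.0 * x + e)          # true beta = 2
    X.append(x); Z1.append(z1); Z2.append(z2)

def mean(v):
    return sum(v) / len(v)

# First-step GMM with identity weight: minimize ||gbar(beta)||^2 where
# gbar_l(beta) = mean(z_l * (y - beta * x)).
mzy = [mean([z * y for z, y in zip(Z, Y)]) for Z in (Z1, Z2)]
mzx = [mean([z * x for z, x in zip(Z, X)]) for Z in (Z1, Z2)]
beta = sum(a * b for a, b in zip(mzy, mzx)) / sum(b * b for b in mzx)

# Moment contributions g_i = (z1_i * u_i, z2_i * u_i) at beta-hat.
U = [y - beta * x for y, x in zip(Y, X)]
g1 = [z * u for z, u in zip(Z1, U)]
g2 = [z * u for z, u in zip(Z2, U)]
gbar = (mean(g1), mean(g2))

# W = S^{-1}, with S = (1/N) * sum g_i g_i' (a 2x2 matrix, inverted by hand).
s11, s22 = mean([a * a for a in g1]), mean([b * b for b in g2])
s12 = mean([a * b for a, b in zip(g1, g2)])
det = s11 * s22 - s12 * s12
w11, w22, w12 = s22 / det, s11 / det, -s12 / det

# Hansen J statistic, chi-square with L - K = 1 df under the null.
J = N * (w11 * gbar[0] ** 2 + 2 * w12 * gbar[0] * gbar[1] + w22 * gbar[1] ** 2)
```

Because the simulated instruments really are valid, J should be an unremarkable draw far below conventional chi-square(1) critical values, mirroring the non-rejection reported in Table 5.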

There is theoretical and empirical evidence that the system GMM is the panel data model that best controls for unobserved heterogeneity and for the endogeneity of explanatory variables, and therefore the one with the least estimation bias [110,111,116]. To explore the output of other commonly used panel data models, which do not control for endogeneity, and to compare it with the system GMM results, we have estimated those models for each review visibility case.

Table 5 reports the output of the different panel data models when we consider case v_{2.1}, where we assume consumers sort online reviews by the most helpful order and all reviews have a decreasing probability of being viewed. Column 1 shows the results of the Ordinary Least Squares (OLS) estimator, and columns 2 and 3 report the results of the Fixed Effects (FE) and Random Effects (RE) estimators. Finally, column 4 shows the output of the adopted system GMM estimator. Focusing on the review variables, we observe some differences concerning the review numeric variables, but no clear pattern. For example, ln_rating is significant in every model except the FE model, and ln_rating_inconsistency is significant in the FE and system GMM models, but not in the OLS and RE models. We observe a clearer pattern for the review text variables, which are only significant in the system GMM model. Thus, we can conclude that not dealing with endogeneity in our analysis might bias the results. We have estimated every model (OLS, FE, RE, and system GMM) for the rest of the review visibility cases (v_{1}, v_{2.2}, v_{3.1}, and v_{3.2}), and overall the results follow the same pattern as in the discussed case v_{2.1}, shown in Table 5. Comparison tables for each review visibility case are shown in Appendix A (Table A1, Table A2, Table A3 and Table A4).
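A minimal simulation (toy data, not the paper's dataset) of why pooled OLS and the within (FE) estimator can disagree when unobserved product heterogeneity is correlated with a regressor, which is the kind of divergence visible across the columns of Table 5:

```python
import random
from collections import defaultdict

# Toy panel: an entity effect a_i enters y directly and is correlated
# with x, so pooled OLS is biased while the within (FE) transformation,
# which demeans per entity, recovers the true slope.
random.seed(1)
TRUE_BETA = 1.0
rows = []
for i in range(200):                       # 200 products
    a_i = random.gauss(0, 2)               # unobserved entity effect
    for t in range(10):                    # 10 weeks
        x = a_i + random.gauss(0, 1)       # x correlated with a_i
        y = TRUE_BETA * x + a_i + random.gauss(0, 1)
        rows.append((i, x, y))

def ols_slope(pairs):
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    return sxy / sxx

pooled = ols_slope([(x, y) for _, x, y in rows])       # biased upward

# Within (FE) transformation: demean x and y per entity, then run OLS.
groups = defaultdict(list)
for i, x, y in rows:
    groups[i].append((x, y))
demeaned = []
for obs in groups.values():
    mx = sum(x for x, _ in obs) / len(obs)
    my = sum(y for _, y in obs) / len(obs)
    demeaned.extend((x - mx, y - my) for x, y in obs)
fe = ols_slope(demeaned)                               # close to TRUE_BETA
```

The system GMM estimator used in the paper goes further, also instrumenting for regressors that remain endogenous after demeaning, which this FE sketch does not address.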