4.1. Single Variable Necessity Analysis
In line with the basic strategies of set-theoretic analysis, before a truth table is constructed for configuration analysis, it is necessary to evaluate whether any single condition or combination of conditions is a necessary condition for the outcome. Necessary condition analysis (NCA) was used for this necessity test. Ceiling regression (CR) was used for estimation with continuous variables, and ceiling envelopment (CE) with categorical variables. An antecedent variable is identified as a necessary condition for the outcome variable only when two criteria are met at the same time: the effect size is not at a low level, and the effect is statistically significant. The analysis was run with the NCA package in R, and the specific results are shown in Table 6. The effect size of every antecedent variable is 0, so none of them constitutes a necessary condition for the outcome variable, indicating that perceived review helpfulness results from the joint influence of multiple factors.
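The effect size reported by NCA can be illustrated with a minimal sketch. It assumes a step-shaped ceiling envelopment ceiling (free disposal hull) and defines the effect size as the empty area above the ceiling divided by the area of the observed scope. The function name and the toy data are hypothetical, and the sketch omits the significance test and is not the NCA package's implementation.

```python
from typing import Sequence


def ce_fdh_effect_size(x: Sequence[float], y: Sequence[float]) -> float:
    """Return d = (ceiling-zone area) / (scope area) under a step-shaped ceiling."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    x_min, x_max = min(x), max(x)
    y_min, y_max = min(y), max(y)
    scope = (x_max - x_min) * (y_max - y_min)
    if scope == 0:
        raise ValueError("x or y is constant, so the scope has zero area")

    # Highest outcome observed at each distinct level of the condition.
    best_y = {}
    for xi, yi in zip(x, y):
        best_y[xi] = max(yi, best_y.get(xi, yi))

    # Walk left to right; the ceiling is the running maximum of y, held
    # constant until the next condition level (a non-decreasing step function).
    levels = sorted(best_y)
    ceiling = y_min
    empty_area = 0.0
    for left, right in zip(levels, levels[1:]):
        ceiling = max(ceiling, best_y[left])
        empty_area += (right - left) * (y_max - ceiling)
    return empty_area / scope


# Hypothetical data: the highest outcome already occurs at the lowest
# condition level, so no area above the ceiling is empty.
no_constraint = ce_fdh_effect_size([0, 1, 2], [2, 0, 1])
# Hypothetical data: the outcome rises only as the condition rises.
constraint = ce_fdh_effect_size([0, 1, 2], [0, 1, 2])
```

An effect size of 0, as in the first toy case, means that high outcome values are observed even at the lowest level of the condition, so the condition does not constrain the outcome.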
The necessary conditions function in fsQCA 3.0 was then used to cross-check the NCA results, with the consistency threshold set to 0.9 [42]. The specific results are shown in Table 7. In fsQCA necessity analysis, consistency captures how closely the data approximate a relationship of necessity, while coverage indicates the relevance (or, conversely, the triviality) of a necessary condition. The results show that the consistency of every single condition for high review helpfulness is lower than 0.9, indicating that no single antecedent variable constitutes a necessary condition, which is consistent with the NCA results.
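The two necessity measures can be sketched with the standard fuzzy-set formulas: consistency is the sum of min(x, y) divided by the sum of y, and coverage is the sum of min(x, y) divided by the sum of x. The function name and the membership scores below are hypothetical illustrations, not values from this study.

```python
from typing import Sequence, Tuple


def necessity_metrics(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Return (consistency, coverage) of condition x as necessary for outcome y."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    if sum(x) == 0 or sum(y) == 0:
        raise ValueError("membership scores must not all be zero")
    overlap = sum(min(xi, yi) for xi, yi in zip(x, y))
    consistency = overlap / sum(y)  # how far the outcome is a subset of the condition
    coverage = overlap / sum(x)     # how relevant (non-trivial) the condition is
    return consistency, coverage


# Hypothetical calibrated membership scores for four reviews.
condition = [0.9, 0.6, 0.2, 0.8]
outcome = [0.8, 0.7, 0.1, 0.4]
consistency, coverage = necessity_metrics(condition, outcome)
passes_threshold = consistency >= 0.9  # threshold used in this section
```

A condition is treated as necessary only if its consistency reaches the 0.9 threshold, and coverage is then inspected to judge whether the necessity is trivial.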
4.2. Configuration Analysis
In constructing the truth table, the minimum case frequency, raw consistency, and PRI consistency thresholds were set to 2, 0.80, and 0.60, respectively [43]. To bolster the credibility of the findings from this larger-sample QCA, a minimum case frequency threshold higher than the customary ‘1’ used in small-N studies was adopted. If a logical combination of conditions falls below the thresholds, its outcome is manually coded as 0. Following Ding’s [43] approach, this study reports only the parsimonious solution, for two main reasons. First, the QCA method still includes redundant elements when generating complex and intermediate solutions, which raises concerns about the interpretability of causal inferences and makes parsimonious solutions more reliable [44]. Second, this study yielded 12 parsimonious solutions, of which all but two have consistency scores greater than 0.8, thereby constituting sufficient conditions for the outcome.
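The truth-table coding rule described above can be sketched as follows. The sketch assumes the standard fuzzy-set definitions of raw consistency and PRI consistency and counts a case as belonging to a row when its membership exceeds 0.5. The names `RowStats`, `row_stats`, and `code_outcome` and the toy data are hypothetical, and this is not the fsQCA software's implementation.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class RowStats:
    n_cases: int
    raw_consistency: float
    pri_consistency: float


def row_stats(row_membership: Sequence[float], outcome: Sequence[float]) -> RowStats:
    """Summarize one truth-table row from case memberships in the row and the outcome."""
    total = sum(row_membership)
    overlap = sum(min(m, y) for m, y in zip(row_membership, outcome))
    # Part of the overlap that is shared with the negated outcome as well.
    shared = sum(min(m, y, 1 - y) for m, y in zip(row_membership, outcome))
    if total == 0 or total == shared:
        raise ValueError("row has no usable membership")
    n_cases = sum(1 for m in row_membership if m > 0.5)
    return RowStats(n_cases, overlap / total, (overlap - shared) / (total - shared))


def code_outcome(stats: RowStats, min_freq: int = 2,
                 min_raw: float = 0.80, min_pri: float = 0.60) -> int:
    """Code the row outcome as 1 only if all three thresholds are met, else 0."""
    meets_all = (stats.n_cases >= min_freq
                 and stats.raw_consistency >= min_raw
                 and stats.pri_consistency >= min_pri)
    return int(meets_all)


# Hypothetical row with three member cases.
stats = row_stats([0.9, 0.8, 0.7], [0.9, 0.9, 0.6])
coded = code_outcome(stats)
```

Requiring all three thresholds jointly means that a row supported by a single case, or one with high raw consistency but low PRI consistency, is coded 0 and does not enter the minimization.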
The specific solution results are shown in Table 8. After removing solutions with consistency below the threshold, a total of 10 parsimonious solutions were obtained. The overall coverage is 0.536, indicating that the 10 parsimonious solution configurations can explain 53.6% of the cases with high review helpfulness. Following the derivation and categorization of the parsimonious solutions, the theoretical naming of these configurational paths was undertaken. Each label was formulated through a systematic interpretation of the combination of conditions within its respective solution group, grounded in the core mechanisms and essential attributes that these configurations exhibited in explaining high review helpfulness. Based on the occurrence of conditions, the configurational paths leading to high review helpfulness are categorized into five types: effective explanation type, unilateral negative type, insufficient integrated type, sufficient integrated type, and complementary type.
4.2.1. Effective Explanation Type
The characteristics of the effective explanation type comments are that the image uses an effect attribute, and the number of interactions appears in a conditional form. Configuration 1 shows a higher consistency between the image and the text, while configuration 2 shows that the comment image contains more information. An example of this type is in Table 9.
When shopping on online platforms, consumers lack real-life product interaction and often use alternative methods to evaluate product value in pursuit of a realistic experience [45]. Showing the product usage effect in reviews can reduce consumers’ uncertainty and improve the perceived helpfulness of the review to a certain extent. For example, in Table 9, the comments in configuration 1 show consumers’ dissatisfaction with the camera’s bad pixels, and the persuasiveness of the comments is enhanced by posting pictures of bad pixels. The comments in configuration 2 show the camera’s shooting effect, expressing consumers’ satisfaction with the product’s usage effect and requesting interaction with other consumers. Therefore, by clearly demonstrating product effects and functionality, this ‘Effective Explanation Type’ primarily helps consumers manage essential processing, as the core information about the product’s performance becomes more concrete and easier to mentally digest. According to CTML, observation of product usage behavior will trigger consumers’ memory retrieval of similar behaviors and help them understand the overall content of the review.
4.2.2. Unilateral Negative Type
Unilateral negative information comments are characterized by negative textual sentiment, accompanied by either high textual information entropy or high image information entropy. Comments in configuration 3 exhibit higher image information entropy, while comments in configuration 4 exhibit higher textual information entropy. Specific examples of comments are provided in Table 10.
Negative reviews are comments in which consumers tend to express dissatisfaction with their shopping experience by conveying negative emotions, thereby warning other consumers. Existing research indicates that consumers are more likely to trust negative reviews compared to positive ones. This is because negative reviews often use text or images to provide detailed descriptions of product issues or defects. These specific issues are perceived by consumers as more objective evidence of a product’s performance, making it easier to associate such reviews with the product’s objective attributes and enhancing the perceived helpfulness of the review.
For example, in configuration 3, the review highlights dissatisfaction with the product’s shipping service, particularly the lack of attention to the packaging of valuable items. Additionally, the review is supplemented with eight images illustrating different product attributes, demonstrating high image information entropy. In configuration 4, the textual review identifies product shortcomings from multiple perspectives, such as packaging issues, quality problems, and customer service concerns. Although it includes only one review image, it reflects high textual information entropy. The ‘Unilateral Negative Type’ enhances perceived helpfulness by primarily helping consumers reduce irrelevant processing, as the direct and information-rich negative feedback (whether textual or visual) allows for a focused and efficient assessment of potential product shortcomings. Thus, negative reviews with high image or textual information entropy are likely to be more helpful.
4.2.3. Insufficient Integrated Type
The characteristics of insufficient integrated text–image reviews are that the review images focus on a single product attribute, while the integration level between text and image information is relatively low. In Configuration 5, the review images focus on the global attributes of the product, whereas in Configuration 6, the review images focus on the local attributes of the product. Specific examples of reviews are provided in Table 11.
Text–image integration represents the alignment between image information and textual information. Previous studies suggest that lower levels of text–image integration reduce the cognitive processing fluency of readers, thereby lowering the perceived usefulness of reviews [8]. This is because when the “visual model” and the “language model” process different concepts, people often experience confusion or a sense of conflict.
However, this study finds that reviews with lower integration levels can also result in high perceived usefulness. For example, in Configuration 5, the textual review describes various aspects such as price protection, free gifts, packaging, and fingerprints, but only the first image corresponds to the information mentioned in the review. In Configuration 6, the textual review describes defects in the camera lens and the rights protection process, but there is only one image which highlights a scratch on the lens.
In these cases, the volume of review images is relatively small, while the text mentions multiple product features or problem scenarios. As a result, the images can only reflect part of the content, making it difficult to maintain consistency with the lengthy text. The ‘Insufficient Integrated Type’, despite its lower integration, enhances review helpfulness by primarily aiding consumers to reduce irrelevant processing, as the focused, albeit limited, imagery can prevent information overload from disparate cues and guide attention to specific key points within the broader textual content. In this situation, the smaller number of images reduces the likelihood of irrelevant cognitive processing, allowing consumers to focus on the key feedback points emphasized by the reviewer and concentrate on the necessary cognitive processing.
On the other hand, the product attributes highlighted by the review images may align with the product information consumers expect to obtain, thereby enhancing the perceived usefulness of the review.
4.2.4. Sufficient Integrated Type
The characteristics of text–image sufficient reviews are defined by simultaneously high textual and image information entropy, along with high reviewer credibility and a significant number of interactions. In configuration 7, the review images focus on the additional attributes of the product. In configuration 8, the review demonstrates a high degree of text–image integration in addition to high textual and image information entropy. Specific examples of such reviews are provided in Table 12.
Multimodal reviews can enhance consumers’ depth of processing product information through the diversity of information entropy in both text and images. When the information entropy in a review is high, consumers may expend more cognitive effort to process this information but, in return, gain a more comprehensive understanding and insight into the product. High information entropy in both images and text together creates a comprehensive and in-depth review, aiding consumers in making more effective decisions. This integration of information significantly enhances the perceived usefulness of the review. The ‘Sufficient Integrated Type’ leverages its rich, high-entropy content in both text and images, often coupled with high credibility and strong integration, to help users not only manage essential processing of this detailed information but also to actively promote generative processing, thereby facilitating a comprehensive and insightful understanding. For example, in configuration 7, the review text and images detail the buyer’s satisfaction with the product and service from multiple perspectives, including product packaging, the attitude of logistics personnel, and delivery speed. In configuration 8, the review text adopts a structured format, describing the positive shopping experience from angles such as product packaging, convenience, and other unique features. This structured approach improves the readability of the review, reduces irrelevant cognitive processing in the verbal module, and facilitates subsequent multimodal information integration. As a result, the review in configuration 8 provides diverse product information while reducing the cognitive effort required for processing, making it easier for readers to integrate the information.
Additionally, the platform membership badge of the reviewer enhances the credibility of the information source, improving the overall perceived usefulness during the integration stage where prior knowledge plays a role. Therefore, multimodal reviews with high information entropy in both text and images represent an ideal form of high-quality reviews.
4.2.5. Complementary Type
The characteristics of text–image complementary reviews lie in the conditional complementarity between textual information entropy and image information entropy. Specific examples of such reviews are provided in Table 13.
A core principle of the CTML is that working memory capacity is limited, and information from different modalities can share the cognitive load [46]. When textual information entropy is low while image information entropy is high, images serve as the primary source of information delivery. Through the diversity and richness of visual information, they can quickly convey key messages and reduce the likelihood of irrelevant cognitive processing by readers. Conversely, when textual information entropy is high while image information entropy is low, detailed textual descriptions provide sufficient information, with images playing a supplementary or intuitive role without requiring complex or diverse visual content. In essence, the ‘Complementary Type’ optimizes cognitive load by primarily helping consumers reduce irrelevant processing, as one modality clearly carries the main informational weight while the other offers focused support without introducing unnecessary complexity or cognitive distraction.
For example, in configuration 9, the review text provides a detailed explanation of issues related to product packaging and invoice services, using only a single image for clarification. In configuration 10, the textual information in the review is relatively simple, but it is supplemented with eight images showcasing different product attributes. Thus, the complementary combination of textual and image information entropy helps optimize the way information is conveyed, thereby enhancing the overall perceived usefulness of the review.
4.3. Robustness Check
Acknowledging the sensitivity and potential for arbitrariness inherent in the calibration of antecedent conditions and the outcome variable in fsQCA, a robustness check was performed. For this check, the PRI consistency threshold was raised to 0.7, with all other analytical parameters held constant.
Table 14 displays the results. The parsimonious solution yielded five configurations that met the conditions outlined previously. Specifically, configurations 1, 2, 3, and 4 were identified as subsets of the configurations from the original analysis. Moreover, configuration 5, constituted by its antecedent conditions, showed no substantive changes. This provides a degree of confidence in the stability of the analytical findings.
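The subset relation used in this comparison can be sketched as follows, assuming each configuration is represented as a mapping from a condition to its required state (present or absent). A configuration from the stricter analysis is a subset of an original one when it keeps every requirement of the original and possibly adds more. The function name and the example configurations are hypothetical; only the labels IV and EX come from this section.

```python
from typing import Dict


def is_subset(candidate: Dict[str, bool], reference: Dict[str, bool]) -> bool:
    """True if `candidate` imposes every requirement that `reference` imposes."""
    return all(candidate.get(cond) == state for cond, state in reference.items())


# Hypothetical configurations: True = condition present, False = condition absent.
original = {"IV": True}
stricter = {"IV": True, "EX": True}

stricter_in_original = is_subset(stricter, original)   # adds a requirement
original_in_stricter = is_subset(original, stricter)   # lacks the EX requirement
```

Under this representation, adding conditions narrows the set of cases a configuration covers, which is why the more demanding configuration counts as the subset.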
To scrutinize the research findings from an alternative analytical perspective, a binary Logit regression model was introduced as a supplementary analysis. This approach aimed to estimate the impact of each antecedent condition on the likelihood of a review achieving high usefulness. If conditions identified as core or consistently important within the fsQCA configurations also demonstrated a statistically significant influence in the Logit model, this would provide corroborating evidence from a different methodological standpoint for the importance of these factors.
In constructing the Logit model, the outcome variable “Review Usefulness (RH)” was dichotomized. Specifically, when a review’s fuzzy-set membership score for RH exceeded 0.5, the dependent variable, “High Usefulness Review,” was coded as 1; otherwise, it was coded as 0. The independent variables were kept consistent with the antecedent conditions used in the fsQCA analysis, totaling 11 variables. The basic form of the model is as follows:
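A standard binary Logit specification consistent with this description can be written as below. The notation is assumed here for illustration: $Y_i$ is the dichotomized high-usefulness indicator for review $i$, $X_{ki}$ is the $k$-th of the 11 antecedent conditions, and $\beta_0$ and $\beta_k$ are the intercept and coefficients to be estimated.

```latex
\ln\!\left(\frac{P(Y_i = 1)}{1 - P(Y_i = 1)}\right)
  = \beta_0 + \sum_{k=1}^{11} \beta_k X_{ki},
\qquad
Y_i =
\begin{cases}
1, & \text{if the fuzzy-set membership score of review } i \text{ in RH exceeds } 0.5,\\
0, & \text{otherwise.}
\end{cases}
```

In this form, a positive and significant $\beta_k$ indicates that a higher value of condition $k$ raises the log-odds of a review being classified as highly useful.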
The analytical results of the Logit regression model are presented in Table 15. Both interaction volume (IV) and experience attributes (EX) demonstrate a significant positive impact on achieving high review usefulness in the Logit model. This finding aligns with their identification as core conditions in multiple high-usefulness configurations derived from the fsQCA. For instance, IV is present as a core condition in seven configurational paths, and EX appears as a core condition within the ‘Effective Explanation Type’ configuration. Consequently, the results of the Logit regression analysis, particularly in the identification of key factors, exhibit consistency and complementarity with certain findings from the fsQCA, thereby enhancing the robustness of this study’s conclusions.