1. Introduction
In the digital era, consumers increasingly rely on online information to make purchasing decisions. Imagine a traveler planning a vacation and choosing between two comparable hotels on an online platform. While one hotel provides numerous text-based reviews, the other offers multimodal reviews enriched with vibrant images that showcase room interiors, amenities, and surrounding landscapes. The visual cues displayed by these images can significantly influence travelers’ perceptions, potentially tipping the decision in favor of the hotel with image-enhanced reviews. This scenario illustrates how strongly visual information can shape consumer perceptions and decisions in online environments.
Visual information has a profound impact on consumer behavior, as it enhances understanding, reduces uncertainty, and influences emotions. Studies have shown that images in online content can increase engagement, improve memory recall, and affect purchase intentions [1]. In online reviews, visual cues such as images complement textual information by providing richer representations of products or services, thereby making the reviews more helpful [2,3]. From a photographic perspective, images in reviews convey specific information about the target products or services, including related components within the same context [4]. The complexity of these images, which encompasses elements such as color diversity and size, can either enhance or impede their effectiveness [5]. While some research suggests that complex images can capture attention and convey detailed information [6], other studies argue that excessive complexity may lead to cognitive overload, reducing information processing efficiency [7].
Online reviews are now indispensable in decision-making processes, acting as a form of electronic word-of-mouth (eWOM) that significantly influences perceptions of product quality, credibility, and trustworthiness [8]. With the exponential growth of e-commerce, consumers increasingly rely on the experiences and opinions of others to guide their purchasing decisions, particularly in environments characterized by information asymmetry and perceived risk [9,10]. Helpful online reviews affect not only individual consumer choices but also product sales and brand reputation, in measurable ways [11].
The helpfulness of reviews reflects the extent to which they assist consumers in making informed decisions by providing valuable information [12]. Consumers often trust reviews that others have deemed helpful, using these endorsements as a heuristic for credibility and utility [2]. Extensive research has explored factors affecting review helpfulness, focusing primarily on textual characteristics such as review length, sentiment, extremity, and linguistic style [2,12,13,14]. For instance, longer reviews tend to be perceived as more helpful because of the detailed information they provide [15]. Reviews with moderate ratings are often seen as more credible, as they appear more balanced and less biased [12]. However, despite the growing prevalence of images in online reviews, little attention has been paid to how image complexity influences perceived helpfulness. This gap raises several unanswered research questions. What visual cues influence review helpfulness? How do visual cues influence review helpfulness? Is this influence affected by the characteristics of the review text? As platforms increasingly incorporate visual content, understanding its impact on review helpfulness becomes essential for both theory and practice.
To address this gap, the present study examines how visual cues (color diversity and texture homogeneity) and textual cues (readability) interact to influence consumers’ evaluations of online reviews. Drawing on Information Diagnosticity Theory and Dual Coding Theory, we develop a research framework and then analyze online review data from the hotel and travel sectors to examine how these attributes shape review helpfulness. This research contributes to the literature by integrating visual factors into the analysis of review helpfulness, offering practical insights for consumers, platform managers, and marketers seeking to enhance the effectiveness of online reviews.
The rest of the paper is organized as follows. Section 2 reviews related research. Section 3 introduces the theoretical foundations used to develop the hypotheses. Section 4 describes the empirical data and variables. Section 5 presents the empirical analysis, including the empirical model, results, robustness tests, and heterogeneity analysis. Finally, Section 6 presents the discussion, including theoretical and managerial implications, limitations, and directions for future research.
4. Empirical Data and Variables Description
4.1. Data Description
The datasets used to test the proposed hypotheses focus on hotels and travel, selected for two primary reasons. First, with the rapid growth of the experience economy, consumer demand for services has steadily increased, positioning travel as a key avenue for leisure and entertainment. In planning their trips, consumers often prioritize the comfort of their accommodations. Unlike traditional hotels, homestays offer personalized and unique experiences, making them a preferred choice for travelers seeking distinctive lodging options. Second, both hotels and travel generate substantial volumes of online reviews, reflecting strong consumer interest. Research has shown that images of destinations and restaurants in online reviews significantly influence consumers’ intentions to visit or make a purchase in the tourism and hospitality industry [63,64]. This abundance of user-generated content provides a rich dataset for empirical analysis.
Data were sourced from two major online travel platforms in China, Ctrip and Tongcheng, both known for their extensive user bases, transaction data, and structured review systems, ensuring the availability of high-quality data. For the hotel segment, reviews were collected from Ctrip for accommodations in Shanghai, Hangzhou, and Ningbo. A total of 42,333 online reviews were gathered from these cities. For the travel segment, 43,615 reviews were collected from Tongcheng.
The data collection process captured various variables, including review content, number of reviews, ratings, images, publication dates, and reviewer names. After data cleaning, 14,111 image-containing reviews were extracted from the 42,333 hotel reviews, and 11,010 image-containing reviews were selected from the 43,615 travel reviews. The final dataset comprised 25,121 image-based reviews, serving as the basis for further analysis.
4.2. Variable Design
4.2.1. Dependent Variable
The dependent variable in this study is review helpfulness. On both Ctrip and Tongcheng platforms, users indicate a review’s helpfulness by clicking a “like” button. This mechanism enables consumers to signal the helpfulness of a review, thereby helping others to filter for more informative content. Accordingly, the number of votes a review receives, i.e., Vote Count, serves as the measure of its helpfulness.
4.2.2. Independent Variables
The independent variables in this study capture image complexity through two components: color diversity and texture homogeneity. Both are derived from the image data using computational techniques and provide insight into how visual elements in reviews influence their perceived helpfulness.
Color diversity refers to the variety of distinct colors present in an image. In this study, it is measured by color histogram entropy, which quantifies how colors are distributed. Each image is first converted to a perceptually oriented color space (HSV or Lab), and a color histogram is computed over the Hue (H) channel using $N$ bins. The histogram is then normalized so that $p_i$ denotes the probability of color bin $i$, and entropy is calculated using Shannon’s formula:
$$\text{ColorDiversity} = -\sum_{i=1}^{N} p_i \log_2 p_i$$
where bins with $p_i = 0$ contribute zero to the sum.
The value of color diversity is calculated in Python and quantifies the richness of colors in an image. A higher value indicates a greater variety of distinct colors within the image. Images with high color diversity typically depict vibrant and complex scenes, such as a scenic landscape that includes the sky, trees, water, and people, or a food photograph of assorted hotpot ingredients. In contrast, images with low color diversity are often monochromatic or grayscale, such as minimalist product shots against a uniformly colored background.
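The entropy computation above can be sketched in a few lines of NumPy. This is a minimal illustration, not the study’s code: it assumes RGB input, uses the standard HSV definition of hue, and picks $N = 36$ hue bins because the bin count is not reported. Achromatic pixels (black, white, gray) have no defined hue and are assigned hue 0 here, a simplifying assumption that makes a grayscale image score zero. The function names are hypothetical.

```python
import numpy as np


def rgb_to_hue(rgb):
    """Return HSV hue in [0, 1) for an (H, W, 3) RGB array (uint8 or float in [0, 1])."""
    rgb = np.asarray(rgb)
    rgb = rgb / 255.0 if np.issubdtype(rgb.dtype, np.integer) else rgb.astype(np.float64)
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    cmax = rgb.max(axis=-1)
    delta = cmax - rgb.min(axis=-1)
    hue = np.zeros_like(cmax)          # achromatic pixels (delta == 0) keep hue 0 (assumption)
    chromatic = delta > 0
    red_max = chromatic & (cmax == r)
    green_max = chromatic & (cmax == g) & ~red_max
    blue_max = chromatic & ~red_max & ~green_max
    # Piecewise HSV hue, measured in sixths of the color wheel
    hue[red_max] = ((g[red_max] - b[red_max]) / delta[red_max]) % 6
    hue[green_max] = (b[green_max] - r[green_max]) / delta[green_max] + 2
    hue[blue_max] = (r[blue_max] - g[blue_max]) / delta[blue_max] + 4
    return hue / 6.0


def color_histogram_entropy(rgb, n_bins=36):
    """Shannon entropy (bits) of the normalized hue histogram of one image."""
    counts, _ = np.histogram(rgb_to_hue(rgb).ravel(), bins=n_bins, range=(0.0, 1.0))
    p = counts / counts.sum()
    p = p[p > 0]                       # empty bins contribute 0 (0 * log 0 := 0)
    return float(np.sum(p * np.log2(1.0 / p)))


# Usage: a half-red, half-blue image has two equally likely hue bins, i.e. 1 bit of entropy.
img = np.zeros((4, 4, 3), dtype=np.uint8)
img[:, :2] = [255, 0, 0]
img[:, 2:] = [0, 0, 255]
diversity = color_histogram_entropy(img)
```

Hue is computed by hand only to keep the sketch self-contained; in practice a library conversion such as OpenCV’s `cv2.cvtColor` would typically be used, bearing in mind that OpenCV stores 8-bit hue in the range 0 to 179.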
Figure 1 illustrates sample images: panels (a) and (c) show images with high color diversity, while the remaining two panels show images with low color diversity.
Texture homogeneity captures the uniformity of texture patterns within an image, reflecting visual surface characteristics such as roughness or smoothness. This study calculates texture homogeneity from the gray-level co-occurrence matrix (GLCM), which records how frequently pairs of pixel values occur in a specific spatial relationship. The value is computed in Python using the following formula:
$$\text{TextureHomogeneity} = \sum_{i}\sum_{j} \frac{P(i,j)}{1 + |i - j|}$$
where $P(i,j)$ is the normalized GLCM value for the pixel-value pair $(i,j)$, and $|i - j|$ represents the difference between the two pixel values. Higher texture homogeneity reflects visual smoothness and simplicity in the image, characterized by fewer intricate patterns and reduced surface variation. Examples of images with high texture homogeneity include a white hotel bedsheet or a clean, uncluttered floor. Conversely, low texture homogeneity is found in images with rich surface details and complex textures, such as patterned fabrics, brick walls, or busy scenes with overlapping visual elements.
In Figure 1, panels (a) and (b) show images with low texture homogeneity, while the remaining panels show images with high texture homogeneity. For each review, the color diversity and texture homogeneity values are averaged across all included images to represent the overall image complexity of that review.
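The GLCM homogeneity measure and the per-review averaging can be sketched with NumPy alone. The settings below are assumptions, since the text does not report them: images are converted to gray with ITU-R BT.601 luma weights, quantized to 8 gray levels, and a single symmetric horizontal offset of one pixel is used. The pair weighting follows the $1/(1 + |i - j|)$ form. All function names are hypothetical.

```python
import numpy as np


def to_gray_levels(rgb, levels=8):
    """Convert an (H, W, 3) RGB image to integer gray levels 0..levels-1."""
    rgb = np.asarray(rgb)
    rgb = rgb / 255.0 if np.issubdtype(rgb.dtype, np.integer) else rgb.astype(np.float64)
    gray = rgb[..., :3] @ np.array([0.299, 0.587, 0.114])   # BT.601 luma (assumption)
    return np.clip(np.floor(gray * levels).astype(int), 0, levels - 1)


def glcm(gray, levels=8, dx=1, dy=0):
    """Symmetric, normalized co-occurrence matrix for one offset (dy, dx >= 0)."""
    h, w = gray.shape
    ref = gray[: h - dy, : w - dx]        # reference pixels
    nbr = gray[dy:, dx:]                  # neighbors at the chosen offset
    counts = np.zeros((levels, levels), dtype=np.float64)
    np.add.at(counts, (ref.ravel(), nbr.ravel()), 1.0)
    counts = counts + counts.T            # count each pair in both directions
    total = counts.sum()
    return counts / total if total > 0 else counts


def texture_homogeneity(rgb, levels=8):
    """Sum of P(i, j) / (1 + |i - j|) over the normalized GLCM."""
    p = glcm(to_gray_levels(rgb, levels), levels=levels)
    i, j = np.indices(p.shape)
    return float(np.sum(p / (1.0 + np.abs(i - j))))


def review_texture_homogeneity(images):
    """Review-level value: mean homogeneity across all images attached to a review."""
    return float(np.mean([texture_homogeneity(img) for img in images]))
```

A flat image yields the maximum value of 1, while a black-and-white checkerboard, whose neighboring pixels always differ by the full gray range, yields 1/8 under these settings. scikit-image’s `graycomatrix` and `graycoprops` offer a library route, but its `'homogeneity'` property weights each pair by $1/(1 + (i - j)^2)$ rather than the absolute-difference weighting used here, so values from the two approaches are not interchangeable.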
It is essential to note that color diversity and texture homogeneity provide objective and replicable measures of image complexity; however, they may not always align with visual quality as perceived by consumers. For example, a grainy or overexposed photo may score high on these complexity measures yet offer little diagnostic value. Therefore, these metrics should be interpreted as proxies rather than direct reflections of perceived informativeness or clarity.
4.2.3. Moderator Variable
Readability of Chinese text is measured using the ‘cntext’ package (https://github.com/hidadeng/cntext, accessed on 4 March 2024), which adapts principles from the Fog Index [65] to Chinese texts. The algorithm incorporates two key components. The first component, denoted as $R_1$, is the average number of characters per sub-sentence. This metric captures sentence length and reflects the density of information within each segment of the text. The second component, denoted as $R_2$, is the proportion of adverbs and conjunctions in each sentence, serving as an indicator of syntactic complexity. These two components are integrated into a single readability score using the following formula:
$$\text{Readability} = 0.5 \times (R_1 + R_2)$$
In this formulation, a higher readability score indicates that the text is more complex and imposes a greater cognitive load on the reader, suggesting that the material is more difficult to understand. Conversely, a lower score implies that the text is simpler and easier to comprehend. This objective measure of textual clarity is crucial for our study, as it serves as a moderator variable to investigate whether the ease of processing review texts influences how consumers integrate visual cues with textual information in evaluating review helpfulness.
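The two-component score can be sketched as follows. This is a hypothetical re-implementation for illustration, not the cntext source: it assumes the text has already been split into sub-sentences and segmented into words upstream (e.g., with a Chinese tokenizer), that the adverb/conjunction list is supplied by the caller, and that the components are combined as $0.5 \times (R_1 + R_2)$. Details such as the sentence-splitting rules or the exact denominator of $R_2$ may differ from the package.

```python
from statistics import mean


def readability_score(sentences, adv_conj_words):
    """Return (score, R1, R2) for pre-segmented Chinese text.

    sentences:      list of sub-sentences, each a list of word tokens (assumed segmented upstream)
    adv_conj_words: set of tokens treated as adverbs or conjunctions (assumed lexicon)
    """
    sentences = [s for s in sentences if s]          # ignore empty segments
    if not sentences:
        raise ValueError("no non-empty sentences")
    # R1: average number of characters per sub-sentence
    r1 = mean(sum(len(tok) for tok in s) for s in sentences)
    # R2: average share of adverb/conjunction tokens per sentence
    r2 = mean(sum(tok in adv_conj_words for tok in s) / len(s) for s in sentences)
    # Assumed Fog-style combination: the simple average of the two components
    return 0.5 * (r1 + r2), r1, r2


# Usage with a toy, hand-segmented review ("the room is clean, but a bit noisy")
score, r1, r2 = readability_score(
    [["房间", "很", "干净"], ["但是", "有点", "吵"]],
    {"很", "但是", "有点"},
)
```

Because $R_1$ is measured in characters while $R_2$ is a proportion, $R_1$ dominates the combined score; this scale difference is one reason a skewed distribution and the later log transformation are plausible for this variable.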
4.2.4. Control Variables
Consumers consider multiple factors when evaluating the helpfulness of reviews. Prior research has demonstrated that variables such as rating score, sentiment score, sentence count, image count, word count, review lifespan, and product type significantly influence review helpfulness.
The rating score refers to the numerical score assigned to a product or service by reviewers (typically on a scale of 1 to 5), which shapes consumers’ perceptions. Moderate or negative ratings are frequently viewed as more objective, while highly positive reviews may be perceived as subjective [12,28].
The sentiment score measures the overall emotional tone of the review text. It is computed from a Chinese sentiment lexicon while accounting for the influence of degree adverbs. Reviews with negative sentiment are often regarded as more credible and influential in consumer decision-making than positive reviews [17].
Sentence count refers to the total number of sentences in a review text. Higher sentence counts often indicate more comprehensive content, which may influence consumers’ perceptions of helpfulness [20]. Thus, sentence count is included as a control variable to account for length.
Image count refers to the total number of images included in a review. Reviews with more images generally provide richer information, making them more helpful [13].
Word count refers to the total number of words in a review. Longer reviews, with more detailed content, are generally perceived as more helpful [15].
Review lifespan refers to the time elapsed between a review’s publication and the date on which it was collected. Older reviews are often considered more credible due to their longevity and visibility [20].
Product type is a categorical variable indicating the category of the product or service being reviewed, such as hotels or travel. This variable is critical for controlling potential differences in review characteristics and consumer evaluations across distinct product categories [29].
To ensure the robustness of the results, these control variables are incorporated into the regression models to account for their potential influence on review helpfulness, as supported by the prior literature. All the variables used in the study are listed in Table 1.
4.3. Descriptive Analysis
To understand the basic information about the variables, we present the descriptive statistics for these variables in Table 2, including the mean, standard deviation, minimum, and maximum values.
Before conducting the empirical analysis, the data were preprocessed to ensure consistency and reliability. We examined the distribution of each variable. The analysis revealed that the moderator variable, Readability, exhibited a pronounced right-skewed distribution. To mitigate the influence of extreme values and outliers, a logarithmic transformation was applied to Readability. Additionally, to eliminate dimensional differences among variables and enhance model stability, Z-score standardization was applied to all variables except the dependent variable. This standardization enables direct comparisons among variables measured on different scales, thereby improving the accuracy and interpretability of the model.
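The two preprocessing steps can be sketched as below. This is a minimal illustration under stated assumptions: the natural logarithm is used (the base is not reported), values are assumed strictly positive, and the standard deviation uses the sample ($n-1$) denominator. The numbers are invented purely to show the effect on a right-skewed variable and are not data from this study.

```python
import numpy as np


def log_transform(x):
    """Natural-log transform for a strictly positive, right-skewed variable."""
    x = np.asarray(x, dtype=np.float64)
    if np.any(x <= 0):
        raise ValueError("log transform requires strictly positive values")
    return np.log(x)


def zscore(x, ddof=1):
    """Z-score standardization: subtract the mean, divide by the sample standard deviation."""
    x = np.asarray(x, dtype=np.float64)
    sd = x.std(ddof=ddof)
    if sd == 0:
        raise ValueError("cannot standardize a constant variable")
    return (x - x.mean()) / sd


# Illustrative values only: one extreme observation drives the right skew
readability = np.array([3.1, 3.4, 3.9, 4.2, 4.8, 5.5, 7.0, 12.6, 31.0])
readability_z = zscore(log_transform(readability))   # log first, then standardize
```

Taking the log before standardizing pulls the extreme observation closer to the bulk of the distribution, so its standardized value is smaller than it would be if the raw variable were standardized directly.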
We then examined the correlations among all variables (Table 3), which indicate that the independent variables are associated with the dependent variable. To assess potential multicollinearity, we conducted a Variance Inflation Factor (VIF) analysis (Table 4). The results show an average VIF of 2.31, with no variable exceeding the conventional threshold of 10, indicating that multicollinearity is not a serious concern. Thus, the independent variables are suitable for regression analysis.
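For reference, a VIF of the kind summarized in Table 4 is obtained by regressing each predictor on all the others (with an intercept) and computing $\text{VIF}_j = 1/(1 - R_j^2)$. The NumPy sketch below illustrates that definition; it is not the software used in the study, and the function name is hypothetical.

```python
import numpy as np


def vif(X):
    """Variance inflation factor for each column of an (n, k) predictor matrix."""
    X = np.asarray(X, dtype=np.float64)
    n, k = X.shape
    out = np.empty(k)
    for j in range(k):
        y = X[:, j]
        # Auxiliary regression: column j on an intercept plus every other column
        A = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ beta
        r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
        out[j] = np.inf if r2 >= 1.0 else 1.0 / (1.0 - r2)
    return out


# Usage: vifs = vif(predictors); mean_vif = vifs.mean(); flagged = vifs > 10
```

statsmodels offers `variance_inflation_factor(exog, exog_idx)` in `statsmodels.stats.outliers_influence`; it does not add an intercept itself, so a constant column is usually included in `exog` to match the definition above.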
6. Discussion
6.1. Theoretical Implications
This study advances the literature on online review effectiveness by integrating established frameworks with recent empirical insights. Our findings extend Information Diagnosticity Theory [12] by demonstrating that the quality and richness of visual cues, namely color diversity and texture homogeneity, play a critical role in reducing consumer uncertainty [36]. Recent evidence from Zhao et al. [74] corroborates this view, showing that vibrant images in social networks provide detailed and contextually relevant information, thereby enhancing customer engagement.
Furthermore, our research refines Dual Coding Theory [42,63,75] by highlighting the importance of multimodal integration in online information processing. While classic theory emphasizes the additive benefits of processing both verbal and non-verbal information, our results indicate that text readability significantly moderates the effectiveness of visual cues. Eitel and Scheiter [61] demonstrated that presenting clear and concise textual information alongside images facilitates understanding. Our analysis further clarifies the differentiated role of text readability across product categories. For hotel consumers, the texture and clarity of images directly convey functional value, rendering the influence of text readability negligible. In contrast, for travelers, moderately textured images paired with readable text amplify the vividness and emotional tone of the content, thereby enhancing perceived hedonic value.
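What "readability moderates the effect of a visual cue" means can be made concrete with a generic linear interaction specification. This is an illustrative sketch only; the models actually estimated in Section 5 (including their functional form for the vote count) may differ. Here $\text{Visual}_i$ stands for either color diversity or texture homogeneity:

$$
\text{Helpfulness}_i = \beta_0 + \beta_1\,\text{Visual}_i + \beta_2\,\text{Readability}_i + \beta_3\,(\text{Visual}_i \times \text{Readability}_i) + \boldsymbol{\gamma}^{\top}\mathbf{Controls}_i + \varepsilon_i
$$

$$
\frac{\partial\,\text{Helpfulness}_i}{\partial\,\text{Visual}_i} = \beta_1 + \beta_3\,\text{Readability}_i
$$

Because the predictors are z-standardized, $\beta_1$ is the effect of the visual cue at average readability. Since a higher readability score denotes harder text, a negative $\beta_3$ would indicate that the visual cue helps less as the accompanying text becomes harder to process, and a $\beta_3$ indistinguishable from zero corresponds to the "negligible" moderation described for hotel reviews.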
Additionally, the inverted U-shaped relationship we observe between texture homogeneity and review helpfulness, especially in hotel reviews, extends Berlyne’s Aesthetic Theory [68]. Although earlier studies established that moderate complexity optimizes aesthetic appeal, our findings reveal that this optimal balance varies by product category. In hotel consumption contexts, where utilitarian value is prioritized, image clarity plays a more critical role. Moderate texture homogeneity enhances the perceived quality of reviews, suggesting that aesthetic balance is a key determinant in consumers’ evaluative judgments of utilitarian products.
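An inverted U is typically identified through a quadratic term: in $y = b_0 + b_1 x + b_2 x^2$ with $b_2 < 0$, helpfulness peaks at $x^* = -b_1 / (2 b_2)$. The sketch below fits a bivariate quadratic with `np.polyfit` and locates that peak; it omits the control variables of the full models and is an illustration of the technique, not the study’s estimation. The function name is hypothetical.

```python
import numpy as np


def inverted_u_turning_point(x, y):
    """Fit y = b0 + b1*x + b2*x^2 by least squares and return the peak location x*."""
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    b2, b1, b0 = np.polyfit(x, y, deg=2)      # np.polyfit returns the highest power first
    if b2 >= 0:
        raise ValueError("quadratic term is not negative: no inverted U")
    x_star = -b1 / (2.0 * b2)
    # Common companion check: the peak should fall inside the observed range of x
    inside = bool(x.min() < x_star < x.max())
    return x_star, inside


# Usage: x_star, inside = inverted_u_turning_point(texture_homogeneity_z, vote_count)
```

A significant negative quadratic coefficient alone is not sufficient evidence: if the computed peak lies outside the range of the data, the relationship is merely concave over the observed values rather than rising and then falling.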
By bridging classic theories, our research deepens the theoretical understanding of multimodal information processing in digital environments. These contributions offer a new perspective for future research to investigate dynamic interactions among sensory channels in various product contexts.
6.2. Managerial Implications
Our findings offer actionable insights for online review platforms and businesses aiming to optimize the effectiveness of user-generated content. First, for hedonic products such as travel experiences, color diversity has a positive impact on review helpfulness. Platforms should encourage users to upload vibrant, context-rich images when reviewing destinations; prior research suggests that vibrant images enhance consumer engagement [74]. Platforms can implement features such as AI-driven image recommendations or review prompts to help users select the most relevant images. In addition, for utilitarian products such as hotel accommodations, moderate texture homogeneity enhances the informativeness of reviews. Platforms can provide automated image enhancement tools that balance sharpness and detail to optimize image quality for hotel reviews. This finding aligns with those of Ma et al. and Zhang et al. [76,77], who emphasize that visual clarity improves consumer trust in online content.
Second, given the hedonic nature of travel, text readability significantly amplifies the effect of image complexity in travel-related reviews. Platforms should incorporate readability assessments or AI-powered writing assistance tools to help users write clearer, better-structured reviews. Prior research suggests that cognitive load is reduced when textual and visual elements are well integrated [10,75]. Review templates or suggested sentence structures may further help users compose easily digestible content.
Our findings suggest that review helpfulness is driven by different factors depending on the product category. In travel reviews that emphasize hedonic experiences, color diversity and text readability are crucial, whereas in hotel reviews that prioritize utilitarian experiences, texture homogeneity plays a more significant role. By implementing these strategies, businesses and review platforms can enhance user engagement, improve the informativeness of reviews, and ultimately influence consumer purchase decisions, contributing to a more effective digital consumer experience.
6.3. Limitations and Future Research
While this study offers valuable insights into the impact of image complexity on the perceived helpfulness of online reviews, several limitations exist.
First, the data used in this research are limited to two specific product categories, namely hotels and travel, collected from two Chinese online travel platforms: Ctrip and Tongcheng. This narrow focus may limit the generalizability of the findings to other product types or cultural contexts. Future research could expand the scope by examining a broader range of products and services across different e-commerce platforms and in various cultural settings. This would help determine whether the observed effects of color diversity and texture homogeneity are consistent across different consumer markets and product categories.
Second, this study relies on computational measures of image complexity, specifically color diversity and texture homogeneity. While these metrics effectively capture key visual properties, they do not account for contextual relevance or semantic meaning within images. Consumers may perceive an image as helpful not solely based on its complexity but also on its alignment with the review content and its ability to depict key product features. Future research could integrate advanced computer vision techniques, such as object recognition, scene analysis, or sentiment-based image classification, to assess how the content and meaning of images influence review helpfulness.
Third, this study focuses on readability as a textual feature, but other linguistic factors, such as argument quality, writing style, sentiment polarity, and emotional appeal, may also moderate the effect of visual cues. For instance, highly emotional language may increase engagement, while concise, well-structured arguments may enhance credibility. Future research could leverage natural language processing (NLP) techniques to explore how different textual characteristics interact with visual elements in shaping consumer perceptions of review helpfulness.
Finally, this study utilizes observational data from online reviews, which limits the ability to make causal inferences. Although robust regression models and robustness checks were employed, the study cannot fully isolate the causal relationships between image complexity, readability, and review helpfulness. Future research could employ experimental designs to manipulate image characteristics and textual readability in controlled settings, providing stronger causal evidence on how multimodal information processing influences consumer decision-making.
By addressing these limitations, future research can deepen our understanding of how consumers integrate visual and textual information in online reviews, further advancing theories in consumer behavior, online marketing, and human-information interaction.