1. Introduction
E-commerce has developed rapidly in recent decades, and it has been estimated that there are at least 18 billion people purchasing online, within 59.5% of netizens around the world in 2021, a 7.3% increase compared to 2020 [
1]. Among various forms of e-commerce, online platforms are perceived as concentrated reflections of national consumption power and social trends. Digital traces, generated by massive online consumers and accumulated in e-platforms, offer a new probe for the collective behaviors of consumers. Most of the existing studies that focus on the e-commerce area are based on datasets from Amazon in developed countries [
2,
3,
4,
5] and aimed at the inner reactions and features of e-commerce, for instance, purchase intention [
6], purchase return prediction [
7], customer engagement [
8], consumer use intention [
9], the helpfulness of reviews [
10] and advertisement features to promote consumer conversion [
11]. In contrast, China, with the largest potential market and highest expansion speed in the world, at 904 million and 14.3%, respectively [
1,
12], has been seldom afforded substantial attention in previous exploitations. In China, online buying is experiencing rapid growth, and over 80.3% of Internet users now purchase online [
13], suggesting a high penetration rate of the top e-commerce platforms such as JD.com. Furthermore considering that the total number of online consumers in China has already exceeded the entire population of the USA, while its increasing rates of netizens and online consumers are positioned in the forefront of the world [
1], samples from JD.com would reasonably offer a solid and reasonable base for the behavioral understanding of online reviewing [
14,
15].
Among the many features of e-commerce, the online review, in the mixed form of a quantitative score and a qualitative text content, is always receiving great attention in both academia and business studies, proving its ability to change consumer attitudes [
16], purchase probability [
17,
18] and even company reputation [
19]. In particular, negative reviews with poor ratings, which serve as a channel for users to express complaints and feelings about unsatisfactory experiences after purchases, show the online review’s inherent capability to help in mining consumers’ perspectives and then platform service improvement [
20,
21]. It has been extensively proven that negative reviews are unexpectedly perceived as more helpful [
22] and persuasive [
23] to other consumers, suggesting their profound impact on future sales [
24,
25]. Moreover, different from positive reviews, negative reviews demonstrate a stronger nature in the expression of emotion as an intent [
26,
27], and review extremity [
10] and a questioning attitude [
28] can affect their helpfulness, suggesting the necessity of monitoring and mining consumers’ emotional characteristics through negative reviews. Even more important are the close connections among negative reviews and the purchase conversion of consumers [
11], product awareness [
7], user preferences [
29] and consumer attitude [
30], which are also reasonably demonstrated, indicating that these negative reviews could be a new inspiring source of understanding consumer behavior.
Previous studies, though suggesting the value of negative reviews, are mainly focused on negative reviews rather than negative reviewing. The user-centric questions, such as who often posts negative reviews or when and why consumers post them, still remain unknown. Moreover, the differentiated characteristics of consumers with different intention roles [
31], cultures [
32], engagement levels [
30] or reviewing periods [
33] and the application of sentiment analysis on user-level feature recognition [
34] provide the possibility of dividing online consumers into distinctive groups based on consumer characteristics, offering a finer resolution in probing the various patterns in the behaviors of online reviewing. Furthermore, prior literature that focused on various features, such as the sequential and temporal dynamics of online ratings [
35], different perceptional influence of negative reviews across cultures [
36] and emotional content and evolution in reviews [
37], indicates the importance of mining temporal, perceptional and emotional patterns in reviews to provide insights on how consumers understand reviews and then managerial implications regarding product development, advertisement and platform design. These indeed motivate our study. To fill the research gap and obtain a comprehensive and deep understanding of user behaviors in negative reviewing on e-commerce platforms, we propose three research questions (RQ) from three views:
RQ 1: Are there temporal patterns in posting negative reviews on e-commerce platforms, such as what time users tend to post negative reviews?
RQ 2: Are there perceptional patterns in posting negative reviews on e-commerce platforms, such as how different users understand negative reviews and why they post negative reviews?
RQ 3: Are there emotional patterns in posting negative reviews on e-commerce platforms, such as what emotions are expressed in negative reviews and how these emotions evolve and shift?
In this paper, a data-driven solution with various cutting-edge methods is employed to mine the dataset from JD.com, the largest and the most influential Chinese B2C platform, and reveal universal behavioral patterns in negative reviewing. Based on massive samples, data mining with machine learning models can dig deeper into the patterns in the real world and has the advantage of reducing statistical and subjective bias. Moreover, different levels of online consumers are taken into consideration here.
This paper has contributions in both academia and practice. In terms of academics, this paper creatively focuses on the differences across user levels to characterize negative reviewing in a more precise and distinguished way. From the practical perspective, it provides guidelines for precise marketing and profiling according to user levels to mitigate the negative impact of negative reviews and promote sales, consequently improving service quality. This is the first time, to our best knowledge, that a comprehensive probe of the behavioral patterns in posting negative reviews has been undertaken.
The remainder of this paper is organized as follows. In
Section 2, we introduce the related literature to demonstrate the motivations of the present study. In
Section 3, we introduce the dataset and preliminary methods employed. The results of our study are described in
Section 4, including patterns in the temporal, perceptional and emotional aspects.
Section 5 and
Section 6 provide discussions and concluding remarks for future research.
4. Results
In this paper, to understand online consumer negative reviewing behavior from a comprehensive perspective, consideration is given to the following: when consumers would like to post negative reviews, such as the exact time in a day or the interval after they bought a product, concluded as the temporal dimension; what content they express in negative reviews or why they post negative reviews, summarized as the perceptional dimension, and how they express the negativeness and how their emotion evolves, generalized to the emotional dimension.
4.1. Temporal Patterns
The temporal patterns of negative reviewing were probed from the review-creation (RC) time and intervals between the review-creation time and the product-purchase (PP) time.
4.1.1. Review-Creation Time
The distribution of review-creation time on an hourly scale can be seen in
Figure A1, which contrasts that of the product-purchase time. As shown in
Figure A1, the posting pattern of negative reviews in the temporal aspect is consistent with most online behaviors, such as the user active time [
65] or the log-in time on social media [
66]. Besides a minor lag in RC time compared to PP time, there are few obvious differences we can observe between the two distributions. It seems that there is more shopping in the morning but more negative reviewing at night. Moreover, this lag also inspired a further examination of the interval between RC time and PP time.
4.1.2. Intervals between RC Time and PP Time
There is always a time difference, i.e., an interval, between RC time and PP time, and it suggests a process of product or service experience and usage of consumers. The exploration of intervals between RC and PP times helps us understand consumers’ rhythms in negative reviewing from both general and user-level perspectives.
For every purchase record from different sectors in our dataset, we obtained the interval in hours and its log–log distribution, as shown in
Figure 3, which contains distributions for all five user levels. As presented, the overall form corresponds to heavy-tail-like and power-law-like distributions, which is consistent with most online human dynamics [
46,
65,
66], indicating that negative reviewing after purchase is bursty [
67]. In addition, comparing the all-user group with different-user-level groups, no significant difference can be found, and the five user levels all perform according to a heavy-tail distribution.
At the same time, in each distribution of
Figure 3, a periodic fluctuation is observed when the interval is larger than 10 min. Though the exact distance in the distribution becomes narrow because of the log scale, a rough measurement surprisingly shows that the fluctuation cycle is approximately 24 h, with the first peak at 24, which means the interval between RC and PP times is more likely to be hours in multiples of 24, i.e., one day. This interesting phenomenon suggests a circadian rhythm in negative reviewing.
To further examine the reliability of the periodic intervals, we thoroughly checked the exact correlation between the purchase hours and the reviewing hours of all negative reviews. As illustrated in
Figure 4a, for all users, the occurrence of pairs with the same or close PP and RC hours is significantly higher than that of other cases. It is also interesting that as pairs of the RC and PP hours approach the diagonal, i.e., reviewing at the same time of buying, the occurrences demonstrate a significant increase. This finding indicates the reliability of periodic intervals and their cycles of 24 h, which can be interpreted as online consumers tending to post negative reviews for a certain product at the same hour of buying.
Comparing the performances among user levels, in
Figure 4b–f, it can be further observed that the same patterns of periodic intervals for silver and golden levels are closer to those for all users. However, the patches in grids for diamond users and PLUS users are relatively disarrayed, lacking a regular pattern of gradual change. Therefore, it can be summarized that the online interactive behavior, such as purchasing and posting reviews, of users with higher levels can be triggered in a more random manner, especially compared with that of users at lower levels. This can be well explained by the more frequent purchases and more active interactions of higher-level users.
Periodic intervals indicate an interesting relationship between negative reviewing and purchase behavior, where the purchase is motivated by user demand and reviewing comes from user experience after using the product. According to this, we suppose that periodic intervals with a 24 h cycle might be related to a perceptional rhythm of online consumption, which means that users tend to perform or carry out related activities at a fixed period in a day. To verify this, we implemented an experiment inspired by Yang et al. [
68], where the discrepancy of behavior performance on weekdays and the weekend was examined, implying that randomness and recreation in behavior during weekends are much more intense than those in behavior during weekdays.
Figure A2 displays the result of the discrepancy between weekdays and weekends in periodic intervals and supports our conjecture to some extent. As we can see, the periodic interval of the weekday is more consistent and intensive than that of the weekend, indicating that regular work or study activities enforce the perceptional rhythm, while recreation activities weaken the perceptional rhythm. Therefore, we propose that the regular activities of humans are related to the periodic intervals between RC and PP times.
Furthermore, as
Figure 5a,b display, the different distributions of PP and RC times in different product categories provide more evidence that perceptional rhythms vary across sectors, which indicates that the time of consumer purchase and reviewing might be associated with the features of a certain product that might lead to a regular timeline for consumer online behavior and then to periodic intervals. To provide more details, we can see in
Figure 5a,b that the PP time or RC time of products from
Phone and Accessories and
Clothing shows peaks in the evening and early morning, which is always for recreation time. In contrast,
Computers-related PP and RC times are performed more actively than others at 10 to 16 o’clock, which are normally office hours.
To enhance the reliability of verification, we implemented another experiment from the view of identified users, preventing the statistical bias coming from users who purchase products frequently. In
Figure 5c, identified users with no less than three purchases are kept to obtain the average hour difference between PP and RC times. Moreover, the proportion of users with more than five purchases (generalized as frequent-purchase users) in all valid identified users is relatively small, which suggests that the high frequency of purchases from frequent-purchase users is not the main factor that leads to periodic intervals. Note that frequent-purchase users account for 19.16% of all users but generate more than half of all purchases. From
Figure 5c, we can conclude that periodic intervals come from the stable habits of online consumers in consumption. Therefore, the effectiveness and reliability of circadian rhythms in negative reviewing in
Figure 4 and
Figure A2 and then the robustness of periodic intervals can be accordingly testified.
Explorations about periodic intervals suggest that the purchase time and reviewing time are sector-dependent and have an interesting relationship with one another, which was rarely accessed previously and can add elements to marketing strategies. Hence, the results of this subsection answer RQ 1.
4.2. Perceptional Patterns
One profound value of negative reviews is that they offer a channel for the expression of consumer dissatisfaction about a product or a service, providing future consumers with reference information and sellers with a direction for improvement. Therefore, the recognition of what online consumers complain about in reviews and patterns of how they perceive online buying turn out to be significant. Perceptional patterns refer to the exact reasons behind the consumers’ postings of negative reviews and level-dependent preferences related to the cognition toward negativeness. Moreover, it is worth mentioning that differences among consumers’ negative reasons might originate from not only the users themselves but also from platform policies for different levels.
4.2.1. Main Reasons for Negative Reviews
For a medium-sized horizontal e-commerce platform, it can receive several million negative reviews that complain about problems from different aspects in one month, of which different departments are in charge. Under this circumstance, it is necessary to automatically identify reasons for a large number of negative reviews, preparing for settlements of the exact problem and other exploration. As discussed in
Section 4, we constructed a logistic regression classifier for automatic reason identification and applied it to all negative reviews in our dataset, and
Figure A3 shows the proportion of the four main negative review reasons of all five user levels. As can be seen, the sector Gifts and Flowers received the largest proportion of complaints about logistics, which is consistent with our common sense that products from this category have high demands for distribution punctuality and professional equipment. Regarding negativeness toward product quality, Clothing ranked first, followed by Computers, and Gifts and Flowers was the last. The former two have an obvious character of ‘commodity first’. With respect to dissatisfaction about customer service and false marketing, the ranking order is exactly the opposite, suggesting that the triggers underlying negative reviewing are sector-dependent. As there is no evident difference among the five user levels, indicating that all consumers perceive the main reasons similarly, more detailed reasons are probed in the later analysis.
To explore whether there are regular patterns for when to post certain kinds of negative reviews in a day, we followed thoughts from the temporal aspect to focus on time-triggered patterns of reasons for negative reviews.
Figure A4 and
Figure A5 present the proportions of main reasons in an hourly scale for two sectors, which demonstrate different trends. Specifically, the first peak is generally located at 10–12 o’clock as expected. However, the reach of the second peak is sector-dependent, for instance, the sector Clothing, with obvious personal usage features, reaches its second peak at 20–21 o’clock of the personal time for recreation, while Computers, with obvious office attributes, misses the second peak. This indicates that even in the hourly pattern of negative reviewing, the reasons that lead to poor reviews are still sector-related.
Comparing the four sectors we selected from our dataset, there are common laws in the time distributions of the main reasons for negative reviews that can be summarized. First, as observed in
Figure A4b, the negative reviews of logistics obviously reach peaks at 11 to 12 a.m. and 16 to 18 p.m., which is at the pick-up time. Second, there is an upward trend in the number of negative reviews of product quality in the evening, especially for categories with personal use features, such as Phone and Accessories and Clothing, and this trend is not very evident for sectors with office attributes. Third, a downward trend can be seen in negative reviews that complain about customer service for categories with office attributes, which vanishes in categories with personal use features. Moreover, there is no trend observed for negative reviewing due to false advertising. These findings imply that the hourly patterns of posting negative reviews for a certain reason have a close relationship with the schedule of daily life and demonstrate stable rhythms. Moreover, the distribution of the reasons that lead to negative reviewing is user-level-independent, as no evident discrimination across levels is found here.
4.2.2. Detailed Aspects Leading to Negative Reviewing
To explore the deeper regulations among users about how they regard negative reviewing, a division of only four main reasons is somehow insufficient. Therefore, we constructed an LDA model based on every main reason to further detect online consumers’ detailed aspects for negative reviews. It should be noted that we did not construct an LDA model for the reasons of product defects, considering that the review content of product defects is always sensitive to product function, design and category, lacking the possibility of stable clusters.
We considered model perplexity and topic coherence together in setting the parameters of the LDA. Model perplexity can be characterized by a positive number that indicates the uncertainty level, while topic coherence is used to describe the similarity among different topics and can be characterized by Umass coherence [
69], which is commonly a negative number indicating a better topic distribution through a smaller absolute value. According to this, we determined the criteria for the selection parameters as the multiplier of Umass coherence and perplexity, called the
multiplier parameter, with a larger
multiplier parameter suggesting a better performance of LDA. Besides the
multiplier parameter, the topic number was also considered when selecting models to reduce the overfitting possibility. In line with this, we determined three models for the three main reasons, with topic numbers 8, 9 and 7. The details of the parameter selection and topic words for every model can be seen in
Figure A6 and
Appendix B.
According to the topic words (see
Appendix B) of each topic, we summarized complaints on logistics, customer service and marketing, as seen in
Table 2. The percentage of each aspect is shown in
Figure 6. More than 40% of the logistics-related negative reviews complain about the low speed of delivery, followed by improper delivery timing with 18% and packaging problems with 11%. For customer service, complaints about shipping fees when returning goods come first, followed by rude customer service attitudes, delivery time and gift packaging negotiation issues. Thus, it can be proposed that the e-commerce platform should take more effective actions to enforce employee training, the construction of logistics infrastructure and channels of information feedback. Moreover, an abnormal price increase or an abrupt price reduction after purchase serves as the most dominant issue in false advertising, suggesting that e-commerce platforms should pay more attention to the price monitoring and management aspects.
Additionally, we can further find discrepancy among user levels’ preferences in
Figure 6. For example, golden users have low tolerances for common problems, such as slow delivery and shipping fees when returning goods, while copper users are sensitive to consumer service attitudes and logistics speed. Even more inspiring, copper and golden users have more consistent preferences, while silver and PLUS users are more diversified in detailed aspects leading to negative reviewing. Though the main reasons that trigger negative reviews are level-independent, different preferences across levels in detailed aspects demonstrate that consumers of various levels post negative reviews for diverse reasons, implying that a further examination of level-related discrimination is necessary.
4.2.3. Expression Habits in Reviewing
The reasons for users posting negative reviews are directly reflected in a single sentence of reviews, while expression habits are explored through establishing a frequent connection in multiple sentences and are an externalization of buyer perception. In this paper, we attempt to characterize the expression habits of different user levels from the aspects of review length and semantic similarity.
Review length is an indicator of review helpfulness, information diagnosticity [
10] and product demand [
70] and is an important factor for user centrality [
71]. Here, we implemented two ways to measure review length, with the same outcomes. The first method was to measure the exact length of all characters within the text of the review, and the second was to count the number of effective words after cutting the text into a word list and filtering stopping words.
Figure 7 shows the distribution of negative review length for five user levels, in which both methods were employed with the same performance. Considering most of the outliers come from the deliberate repetition of meaningless words and increase the difficulty of viewing and comparing among user levels, fliers are ignored here. It can be seen that with the increase in user level, the negative review length has an upward trend. Moreover, the upward trend is statistically significant in the one-way
t-test. The growth of review length with levels implies that consumers of high levels post longer negative reviews and offer more reference and helpfulness for both consumers and sellers. From this view, the value of negative reviews from high-level users should be stressed in practice.
Differences in users’ expression habits can be reflected in the landscape of word similarity in contexts.
Figure 8 shows the semantic networks of the word ‘match’ for five different user levels, where words connected to it are the top-four most similar synonyms of different contexts. As can be seen, users from different levels indeed demonstrate diversity in expression habits in terms of network structures. According to this, we constructed word networks for each user level to examine the similarity in user expression habits across user levels.
For the selection of
N in building word networks as it is illustrated in
Section 2.2, the proportion of the largest connected subset in commonSet,
, was used to narrow the range. As
N increases,
increases, implying that more words are connected to form a more complete landscape of semantic similarity. Until
ends its highest rate of increase, it can be conjectured that the most representative and strongly connected words have been contained in the network. According to this, we set
N as 4 (see
Figure A7).
After establishing five networks with
, the similarity in user language expression habits was measured by the Kendall correlation coefficient in a bootstrapping manner. Regarding the indicator, we focused on three different ones that characterized the different features of the network: degrees measuring node connections, clustering coefficients measuring adjacent node connection density and k-core indexes, meaning node locations within the network. When implementing bootstrapping for a robustness check, here, the repeating frequency was set as 1000 and the size as 500, i.e., 500 words were randomly selected from the common set for all user levels.
Figure 9 presents semantic similarity in the Kendall correlation among the five user levels. On the one hand, it can be observed that users with such adjacent levels show greater similarity, implying more sharing in expression habits. The silver user and golden user obtain the greatest similarity of 0.22–0.28. On the other hand, the absolute values of similarity are mostly in the range 0.1–0.3, suggesting that users of different levels are profoundly different in their expressions of negative reviewing. In addition, the great differences among user levels are truly beyond our expectation, which indicates that if e-commerce platforms regard different users’ expressions as the same, there will be a high possibility of misunderstanding users’ reviews and then implementing improper actions. Note that the results from other settings of
N are consistent, and our above observations are robust (see
Figure A8).
The exploration is aimed at perceptional discrimination in negative reviewing, examining causes beyond reviews and expression habits in reviews. Our investigations reliably demonstrate the existence of level-dependent patterns in negative reviewing. The conclusions here can answer RQ 2.
4.3. Emotional Patterns
Similar to why consumers post negative reviews, the emotion features of consumers can also be extracted from the texts of reviews, along with scores that represent consumers’ emotion baselines. Emotion, serving as an attribute for the entire review, can directly influence how practitioners solve complaints from consumers. Here, we regarded negative reviews as mixtures of both negative and positive emotions, instead of only negativeness, and aimed at the distribution of buyer emotion from different levels and its evolutionary patterns in both time and tendency.
4.3.1. Distribution of User Emotions
Consumers from different levels may have different emotion distributions, such as the proportion of negative or positive sentiments and the extent of polarization. If they exist, some patterns can be used to guide sellers to pay more attention or take special actions regarding those with extremely negative feelings.
First, we calculated
and
to characterize the emotional degrees of different user levels.
Figure 10a shows the average percentage of positive and negative emotion words in negative reviews posted in 2017. As can be seen, the degree of negative emotion decreases as user levels increase, while there is no evident trend in positive emotion. Interestingly, golden users have the lowest positive words usage, and copper users deliver the highest degrees of both positiveness and negativeness. It can be summarized that consumers from upper levels are usually less emotional than those of lower levels. The results may be explained to agree with our expectation, as consumers from upper levels have more experience in product selection and negotiation with sellers, leading to being less emotional but more rational. The results of the other years, from 2013 to 2016 (seen
Figure A9), are consistent with the stabilization of the user structure and the business model.
The single indexes
or
can reflect only the positive or negative degrees of a review. To measure the polarization, we introduced the index
here to figure out the distribution of buyer emotion, as seen in
Figure 10b. It can be found, as expected, that negative polarization dominates negative reviewing for all consumers of all sectors. For the sectors of Gifts and Flowers, Clothing and Phone and Accessories, the negative and extremely negative polarization increases as the user levels increase, while for the sector of Computers, it shows an upward tendency first and then goes downward. For the upward tendency in the user level that is different from that shown in
Figure 10a, it can be interpreted as the different compositions of the two indexes. The index in
Figure 10b omits words that are neither positive nor negative in the sentiment dictionary. Therefore, the different results can be explained by considering that consumers with lower levels tend to use more negative words than upper-level users; however, in negative reviews from upper-level users, positive words co-occur with negative words less, leading
to be equal to or near −1. Through these findings, we can conclude that upper-level users are less emotional than lower-level users, but the emotion expression in negative reviewing is more condensed and concentrated.
To further probe the differences in the usage of emotional words across the five user levels, we calculated the possibility of appearing in one negative review for each word
w in the sentiment dictionary, which is defined as
, where
is the number of all negative reviews posted by users of a certain level and
is
w’s occurrences within these reviews. For each word, the variance in occurring possibilities across the five user levels was used to filter out words with greater differences in emotional preferences, as seen in
Table 3. Interestingly, it can be concluded that users from lower levels place more importance on service attitudes and whether the product matches what they are in marketing, while upper-level users tend to have diversified shopping demands and preferences and emphasize the overall experience. The diverse demands of upper-level users, along with their more condensed review content, can lead to a less biased narration. This observation is consistent with the findings from the perception of the causes leading to negative reviewing.
4.3.2. Emotion Evolution over Time
Online reviewing might occur continuously, such as posting multiple times in the same day. The continuous probability in one day for the PP and RC times of reviews from identified consumers helps to verify its existence. The results show that the continuity for RC time is stronger than that of PP time in three sectors, and the remaining one has two numbers that are very close, indicating that compared with purchase time, reviewing time is more continuous (see
Figure A10). In line with this, a natural question is how consumers’ emotions evolve in continuous reviewing, which makes the examination of the emotion sequence necessary.
In addition to sentiment indexes, the review score is also a quantitative indicator of the emotional attitude and can be employed to reflect the evolution of emotion. For every identified user, we sorted the scores of continuous reviewing in a time-ascending order and then cut sequential scores into sequences in which the intervals of adjacent scores should be smaller than a specified threshold, e.g., 0.5 or one hour. The continuous probability of every pair of adjacent scores was defined to determine whether the postscore would be influenced by the former one. For example, a score pair could be defined as (1, 2), which means that the previous one is score 1 and that the post one is score 2. Furthermore, to eliminate the influence of the unbalanced occurrences of different scores, the proportion of five scores in all reviews was implemented to normalize the continuous probability.
Figure 11 shows the continuous probabilities after the normalization. From the figure, we find that all users demonstrate a momentum in negative emotion, which is shown by the diagonal bubbles that are larger than the others; moreover, the farther away from the diagonal, the smaller the bubbles are. In addition, comparing
Figure 11b–f suggests that users from upper levels have a stronger momentum in negative emotion than those of lower levels, as the sizes of the diagonal bubbles become larger as the user level increases. For every threshold in [0.5, 1, 2, 24], we obtained the same conclusion, as a promise of robustness. In contrast with the conclusion above that the upper-level users are less emotional, a more intensive momentum in negative emotion indicates that although upper-level users have richer sets of experiences and more stable emotions, once they are faced with an unsatisfactory experience, it will cost more for sellers to comfort them and alleviate the negative outcomes.
4.3.3. Tendency of Emotion Shifting
Emotion toward a certain objective is not static and will change over time due to external factors. To explore how online consumers’ emotions shift over time, we focused on reviews with after-use comments, which were posted after the initial review, usually separated by a certain time of product usage. Posted after-use comments, compared with the initial one, often contain some shifts regarding usage experience and the resulting attitudes and emotions. According to the polarization pair for the initial review and after-use comment, abbreviated as (,), respectively, we defined three directions of mood shifting, that is, increasing with the pair as (−,+), (0,+) or (−,0), implying a shift toward more positive emotion, decreasing as (+,0 ), (+,−) or (0,−), implying a shift toward more negative emotion, and stable as (0,0), (+,+) or (−,−), which implies no conversion. To reveal the exact patterns of emotion shifts beyond negative reviewing, we examined the occurrence of difference shift directions and the time needs for shifts across both user levels and review scores.
Regarding the proportion of the three shift directions, as shown in
Figure 12, more than 35% of consumers who posted negative reviews obtained emotional improvement, which can be an indicator of how effective the action was that was taken after the poor ratings. Moreover, there is a slight downward trend of the increasing situation as the user level increases, suggesting that it is more difficult to change upper-level users from dissatisfied to satisfied. Then, in
Figure A11, the scores of initial reviews are taken into consideration. It can be observed that the gap first narrows between the percentages of emotion increasing and decreasing and then increases in the opposite direction. This trend proves that the saturation effect occurred in the emotion shifts, which is described as always being much easier for emotion to develop in reverse rather than to become stronger in the initial direction. This result further implies that the shift toward the opposite direction will easily trigger the posting of after-use comments.
The interval between an initial review and the after-use comment is measured as the lag needed for the emotion shift, with the resolution of an hour. In
Figure A12, we provide a different perspective from the user level. It can be seen that for all negative reviews with after-use comments, lags become longer as the user level increases, except that a slight fluctuation occurs with PLUS users. Furthermore, the same phenomenon can be seen only in negative reviews with an increasing emotion shift. Therefore, it is conjectured that the average interval becomes larger as the user level increases, together with the lags for emotion improving. This suggests that users of upper levels may need a longer time to change their negative impressions of products, which agrees well with our conclusion about upper-level users’ more intensive momentum in negative emotion.
In this section of sentiment patterns, we focus on the polarization, evolution features and shifting emotions in negative reviewing and determine that lower-level users tend to have more intensive negative emotions, while upper-level users demonstrate a more intensive emotion momentum when faced with unsatisfactory shopping experiences. Moreover, patterns in emotion shifts prove again the conjecture about upper users’ negative momentum and the existence of the saturation effect in emotion. These conclusions answer RQ 3.
5. Discussion
Online reviews and consumer behavior on e-commerce platforms has been widely studied; however, with respect to negative reviews that can be regarded as a kind of additional reference information for consumers and as a focus on performance improvement for sellers, the behavioral patterns that underlie them have not been explored in a comprehensive manner in the literature. Moreover, existing research aimed at online consumer behavior does not distinguish online users from different levels and only pays attention to a few aspects of features, which is not sufficient to obtain a deep and systemic understanding of consumer characteristics.
This paper attempts to provide a systemic understanding of negative reviewing from the temporal, perceptional and emotional aspects. It utilizes a multisector and multibrand dataset from JD.com, the largest B2C platform in China, and implements various methods of semantic analysis and statistics to offer the first empirical experiment based on a Chinese e-commerce dataset with more than several million pieces of data. Our main findings are the following:
(1) Circadian rhythms exist with regard to negative reviews after buying which are related to the time consistency of people’s daily activities. (2) The similarity in different users’ expression habits from adjacent levels is significantly greater than that of other level pairs, and users with lower levels are more sensitive to prices and a seller’s deceitful acts concerning pricing or to the rude attitudes of sellers, while the demands of users at higher levels are more varied and exhaustive. (3) In the emotional dimension, users at lower levels experience more intensive expressions of negativeness but with lower emotional momentum in negativeness.
Our study has implications for both academics and practitioners. For academics, we contribute to an enhanced understanding of online consumers’ negative reviews, from the perspectives of temporal patterns, what kinds of experiences they tend to regard as failures, and emotional patterns regarding how they express and how this expression changes. Although there have been many recent studies that examine features of online reviews and negative reviews, to our knowledge, there is a lack of research aimed at online negative reviewing. In addition to insights into behavioral patterns of negative reviewing, the richness of our empirical setting provides a possibility for future research to dig deeply into not only online negative reviewing but also other online behaviors within buying. For example, the conclusions and conjectures put forward in the temporal dimension suggest that the purchase time and reviewing time are sector-dependent and have an interesting relationship with one another, which was rarely discovered previously, and provide value for more research into human behavior with perceptional regulation. Investigations from the perceptional aspect reliably demonstrate the existence of level-dependent patterns in negative reviewing and provide the possibility of establishing perception models for precisely profiling different consumers. Patterns in emotion prove the feature of upper users’ more intensive negative momentum and the existence of the saturation effect in emotion, which are seldom paid attention to and suggest a deeper dig into distinguished emotional patterns across groups. There are previous studies that divide online users into groups to study their characteristics [
30,
33,
57], for instance, differentiating by engagement level and concluding there are different performances in emotional perception [
30]. Our investigations from the perceptional and emotional dimensions provide more fine-grained user differentiation and deeper pattern mining to reveal their different service preferences, expression habits and negative momentum. To sum up, some unexpected findings shed light on the diversity of user behavior, such as the relationship between purchase time and reviewing time, the great differences among users’ expressions and the different amounts of emotion momentum, concepts previously examined in a limited way.
For practitioners, some findings in this paper are important because they offer guidelines or tools to determine a platform’s or a seller’s problems or disadvantages and then to devise targeted management strategies to improve a corporation’s performance. Moreover, the results about the characteristics of different user levels underscore the necessity of implementing a distinguished strategy toward different user groups. In the temporal aspect, explorations about periodic intervals suggest that in addition to personal preferences on products, the time line of regular activities can also be modeled to help user profiles and precise marketing, such as combining a timing preference with a user and the product category or precisely profiling consumers’ online active hours. Findings and inspirations in the perceptional dimension indicate that understanding the behavioral patterns behind negative reviewing can indeed help to trace back to the existing deficiencies and spark new improvements, such as information feedback channels and price monitoring. Moreover, the profound differences in expression habits among user levels also make level-dependent strategies necessary in an e-commerce platform, such as implementing different word strategies when facing different user levels to reduce misunderstandings. The preferences for negative reasons suggest that managers should provide more purchase and selection directions for junior users with less experience but more extremely negative expressions, which harm a platform’s reputation, and provide the personalization to review displays on platforms. In addition, according to momentum in negative emotion, even though upper-level users employ less negative word-of-mouth, as long as they experience unsatisfactory shopping, the platform should pay more in costs or measures to comfort and alleviate their negative mood and then to mitigate or avoid the negative word-of-mouth from breeding and spreading negative emotions. These proactive strategies could boost sales, reduce negative word-of-mouth and increase consumer satisfaction. This reaches a similar implication as the research of Richins and Marsha [
19] and Le and Ha [
18], which suggests the negative effect of untimely disposal and the importance of managerial responses towards negative reviews, while the finding here further points out the differences among users’ perceptions of sellers’ responses and suggests corresponding strategies.