Behavioral Patterns beyond Posting Negative Reviews Online: An Empirical View

Negative reviews on e-commerce platforms are posted to express complaints about unsatisfactory experiences. However, exactly how online consumers post negative reviews remains unknown. To obtain an in-depth understanding of how users post negative reviews on e-commerce platforms, a big-data-driven approach combining text mining and sentiment analysis is employed to detect various behavioral patterns. Specifically, using 1,450,000 negative reviews from JD.com, the largest B2C platform in China, posting patterns are comprehensively explored from temporal, perceptional and emotional perspectives. A massive population of consumers across four sectors over the most recent 10 years is split into five levels to reveal group discrepancies at a fine resolution. Circadian rhythms of negative reviewing after purchase are found, suggesting stable habits in online consumption. Consumers from lower levels express more intense negative feelings, especially on product pricing and customer service attitudes, while those from upper levels demonstrate a stronger momentum of negative emotion. The value of negative reviews from higher-level consumers is thus unexpectedly highlighted because of their less emotional and less biased narration, while the longer-lasting nature of these consumers' negative responses also calls for more attention from sellers. Our results shed light on implementing differentiated proactive strategies for different buyer groups to help mitigate the negative impact of negative reviews. This paper attempts to provide a systematic understanding of negative reviewing from the temporal, perceptional and emotional aspects. It utilizes a multisector and multibrand dataset from JD.com and implements various methods of semantic analysis and statistics, offering the first empirical study based on a Chinese e-commerce dataset with millions of records.


Introduction
E-commerce has developed rapidly in recent decades: it has been estimated that at least 1.8 billion people purchase online, among the netizens who made up 59.5% of the world's population in 2021, a 7.3% increase compared to 2020 [1]. Among various forms of e-commerce, online platforms are perceived as concentrated reflections of national consumption power and social trends. Digital traces, generated by massive numbers of online consumers and accumulated on e-platforms, offer a new probe into the collective behaviors of consumers. Most existing studies in the e-commerce area are based on datasets from Amazon in developed countries [2][3][4][5] and aim at the internal reactions and features of e-commerce, for instance, purchase intention [6], purchase return prediction [7], customer engagement [8], consumer use intention [9], the helpfulness of reviews [10] and advertisement features to promote consumer conversion [11]. In contrast, China, with the largest potential market and highest expansion speed in the world, at 904 million online consumers and 14.3%, respectively [1, 12], has seldom been afforded substantial attention in previous studies. In China, online buying is experiencing rapid growth, and over 80.3% of Internet users now purchase online [13], suggesting a high penetration rate of top e-commerce platforms such as JD.com. Furthermore, considering that the total number of online consumers in China has already exceeded the entire population of the USA, while its growth rates of netizens and online consumers rank at the forefront of the world [1], samples from JD.com offer a solid base for the behavioral understanding of online reviewing [14,15].
Among the many features of e-commerce, the online review, a mixed form of a quantitative score and a qualitative text, has long received great attention in both academia and business studies, given its proven ability to change consumer attitudes [16], purchase probability [17,18] and even company reputation [19]. In particular, negative reviews with poor ratings, which serve as a channel for users to express complaints and feelings about unsatisfactory post-purchase experiences, demonstrate an inherent capability to help mine consumers' perspectives and thereby improve platform service [20,21]. It has been extensively shown that negative reviews are unexpectedly perceived as more helpful [22] and persuasive [23] by other consumers, suggesting a profound impact on future sales [24,25]. Moreover, unlike positive reviews, negative reviews are more strongly driven by the intent to express emotion [26,27], and review extremity [10] and a questioning attitude [28] can affect their helpfulness, suggesting the necessity of monitoring and mining consumers' emotional characteristics through negative reviews. Even more important, close connections between negative reviews and the purchase conversion of consumers [11], product awareness [7], user preferences [29] and consumer attitudes [30] have also been convincingly demonstrated, indicating that negative reviews could be an inspiring new source for understanding consumer behavior.
Previous studies, though suggesting the value of negative reviews, mainly focus on negative reviews rather than negative reviewing. The user-centric questions, such as who often posts negative reviews or when and why consumers post them, still remain unanswered. Moreover, the differentiated characteristics of consumers with different intention roles [31], cultures [32], engagement levels [30] or reviewing periods [33], together with the application of sentiment analysis to user-level feature recognition [34], make it possible to divide online consumers into distinctive groups based on their characteristics, offering a finer resolution for probing the various patterns of online reviewing behavior. Furthermore, prior literature on features such as the sequential and temporal dynamics of online ratings [35], the different perceptional influences of negative reviews across cultures [36] and the emotional content and evolution of reviews [37] indicates the importance of mining temporal, perceptional and emotional patterns in reviews, which yields insights into how consumers understand reviews and, in turn, managerial implications for product development, advertisement and platform design. These observations motivate our study. To fill the research gap and obtain a comprehensive, deep understanding of user behaviors in negative reviewing on e-commerce platforms, we propose three research questions (RQs):
RQ 1: Are there temporal patterns in posting negative reviews on e-commerce platforms, such as what time users tend to post negative reviews?
RQ 2: Are there perceptional patterns in posting negative reviews on e-commerce platforms, such as how different users understand negative reviews and why they post them?
RQ 3: Are there emotional patterns in posting negative reviews on e-commerce platforms, such as what emotions are expressed in negative reviews and how these emotions evolve and shift?
In this paper, a data-driven solution with various cutting-edge methods is employed to mine the dataset from JD.com, the largest and most influential Chinese B2C platform, and to reveal universal behavioral patterns in negative reviewing. Based on massive samples, data mining with machine learning models can dig deeper into real-world patterns and has the advantage of reducing statistical and subjective bias. Moreover, different levels of online consumers are taken into consideration. This paper makes contributions to both academia and practice. In academic terms, it creatively focuses on the differences across user levels to characterize negative reviewing in a more precise and differentiated way. From the practical perspective, it provides guidelines for precise marketing and profiling according to user levels to mitigate the negative impact of negative reviews and promote sales, consequently improving service quality. To the best of our knowledge, this is the first comprehensive probe of the behavioral patterns behind posting negative reviews.
The remainder of this paper is organized as follows. In Section 2, we introduce the related literature to demonstrate the motivations of the present study. In Section 3, we introduce the dataset and preliminary methods employed. The results of our study are described in Section 4, including patterns in the temporal, perceptional and emotional aspects. Sections 5 and 6 provide discussions and concluding remarks for future research.

Literature Review
Driven by multiple factors, such as industry, business and society, e-commerce has been greatly boosted, creating opportunities and providing various benefits to organizations, individuals and society [38], even guiding the formation of new lifestyles and economic situations in recent years. Therefore, many studies have focused on e-commerce due to its vital role in the economy.

e-Commerce: Related Literature
Extensive studies have addressed the advantages [39], challenges [39] and patterns across different cultures [40] of e-commerce, aiming to determine its development model or to enhance techniques, such as data mining methods, that help corporations with operations and CRM [41][42][43] and with understanding consumer purchase behavior. He et al. employed text mining approaches to model the business competition environment of a pizza seller with social media comments [44], Huang et al. examined the impact of social media interaction on consumer perception in online purchases [28], and Tsai and Bui revealed that social media users' WOM significantly influences consumers' purchase intentions for cruise travel products [6]. In addition, the interacting factors and techniques implemented on e-commerce platforms have been studied from various perspectives and dimensions. Much attention has been paid to the internal reactions and features of e-commerce, for instance, purchase return prediction [7], customer engagement [8], consumer use intention [9], the helpfulness of reviews [10], the various identities of online consumers [45], the characteristics of different consumers [30] and advertisement features [11], to promote consumer conversion. However, there is still much room for further research into internal reactions and human behaviors in e-commerce. Furthermore, there is no shortage of empirical explorations based on specific e-commerce platforms, such as Amazon [2][3][4][5], JD.com [46], Taobao [47,48] and eBay [48]. China, with nearly one-fifth of the global population and the largest market in the world, was officially reported to have achieved 8.5% year-on-year growth in national e-commerce transaction volume and 14.3% growth in the national online shopping user scale in 2018 [12].
Considering the surge of e-commerce in China, along with its vast number of online consumers and the world's highest growth rates in netizens and online consumers [1,13], this paper aims to provide reasonable yet profound insight into consumer behavior in e-commerce based on Chinese e-commerce data. Furthermore, compared with previous studies that aimed to discern how e-commerce influences consumers, the reverse perspective, namely the feedback that e-commerce can acquire from consumer behavioral patterns, is lacking and needs attention.

Negative Reviews: Related Literature
Among the various topics of e-commerce research, negative reviews, as a kind of word-of-mouth (WOM), are always a topic of great concern, as users express their complaints and negative feelings about unsatisfactory experiences through online reviews after a purchase. In the trend of user-generated content, the negative review is an ideal channel for online sellers to mine consumers' perspectives and thus help the e-commerce platform discover problems and improve service accordingly. Surveys have pointed out that most online consumers browse online product reviews before making shopping decisions [17], that their attitudes toward a product can be influenced by related reviews or WOM [16] and that online reviews can also have a disciplinary effect on corporate policies [49]. The different features of negative reviews or negative WOM have been explored, suggesting the unexpected value of negative reviews in the business model of e-commerce. Existing efforts have focused on the reference value of reviews and shown that negative reviews are more helpful [22,23] for consumers and are regarded as more persuasive [50] in supporting purchase decisions. Unlike positive WOM, negative WOM is more strongly driven by the intent to express emotion [26] and is more emotional [27], so it can serve as a proxy for monitoring consumers' feelings. It has been disclosed that review extremity [10] and questioning attitudes [28] in reviews can affect their helpfulness. In addition, Bogdan and Nicoleta focused on the differences in effectiveness between paid product reviews and nonpaid ones [51]. Richins suggested that if a negative review is not addressed in a timely manner by the seller, its negative effect will spread more widely and cause harm to the company [19].
Similarly, Le and Ha found that managerial responses have positive effects in changing potential consumers' attitudes and behavior and in moderating the negative impacts of negative reviews [18]. Nevertheless, positive outcomes of negative reviews have also, surprisingly, been unraveled. Ghose et al. investigated the relationship between the objectivity of a review and product sales or helpful votes and pointed out that more objective negative reviews obtain more votes [24]. Berger et al. further provided evidence that negative reviews may even increase sales by increasing awareness [25]. The studies above therefore highlight the value of negative reviews and imply the need for a systematic understanding of the behavioral patterns behind negative reviewing from the perspective of online consumers; however, this perspective has received little attention in existing explorations.

Online User Behavior: Related Literature
In fact, modeling online user behavior can help e-platforms target customers and implement market segmentation [52] and marketing strategy [53]. Koufaris applied a technology acceptance model and flow theory to model online consumers and identified the online consumer's dual identity as both a buyer and a computer user [45]. Hassan et al. examined the interlinks of social media use, online purchase behavior and mental health using partial least squares structural equation modeling [54]. Corradini et al. proposed a multidimensional social-network-based model to construct the profile of negative influencers and their characteristics on Yelp [55]. Tan et al. introduced sentiment analysis and network structure analysis into user-level feature identification [34] but ignored the preferences of consumers. Bogdan et al. investigated the effects of emotional and rational social media posts on persuasiveness and consumer behavior [56]. While Ghose et al. focused on the user's status in the e-commerce network and modeled the helpfulness of reviews with users' historical data [24], they did so without delineating the overall or stable features of individuals. Bai et al. focused on the characteristics of early reviewers and their benefits for marketing [33]. Raphaeli et al. compared online consumer behavior on mobile versus PC devices [57]. Lee et al. divided online users into high-engagement and low-engagement groups and concluded that the two groups perform differently in emotional perception and purchase desire [30], suggesting the possibility of further dividing groups based on consumers' characteristics. However, existing studies on buyer behavior inherently miss the focus on negative reviewing, such that the various patterns depicting how consumers of different groups post negative reviews still remain unknown.
A review of the relevant research on negative reviews and related behavior among the sources we accessed revealed that the insightful value of negative reviews has been illustrated; however, we were unable to find research focused on understanding the buyer behavior that underlies negative reviews. Moreover, related research has not reached a fine resolution in distinguishing different users, and the features considered lack richness. Motivated by these issues, this paper aims to shed light on online consumer behavioral patterns in negative reviewing from various dimensions and at a fine resolution of user levels.

Dataset Overview
In this study, we focus on e-commerce platforms in China, which has the largest e-commerce market in terms of the scale of consumers and related firms and the highest growth rate of online consumers, to probe various behaviors behind negative reviewing. We therefore implemented our research on an online review dataset from JD.com, the largest B2C platform in China, whose market share and gross merchandise volume grew at rates of 0.8% and 24.37% in 2019, respectively.
An overview of the dataset used in this study can be found in Table 1, and a brief description of the attributes is shown in Table A1. In total, we collected 1.45 million negative reviews out of 47 million reviews across four sectors, with a time span from 2008 to 2018, covering the most recent 10 years. The large volume of online consumers was split into five levels, referring to a feature of user accounts on JD.com. Note that the user level generally rises with frequent purchases or other interaction behaviors, such as sharing, reviewing and answering questions in forums. A similar mechanism also exists on other platforms, such as Amazon, Tmall, etc. Due to anonymity, only 2.05 million users associated with these reviews could be uniquely identified. In addition, the rich attributes of online reviews on JD.com guaranteed the feasibility of measuring the various reviewing behaviors in the following experiments. The four sectors, selected from all 14 sectors, are the main commodity categories on JD.com and cover different commodity characteristics, enabling universal patterns to be obtained. It should be mentioned that there were few spam or bot reviews in our dataset due to the comment mechanism of JD.com, in which sophisticated techniques such as CAPTCHAs are extensively employed to prevent false reviews. Furthermore, as a self-operated platform [58], JD.com has no motivation to encourage paid reviews. Moreover, we deduplicated the dataset according to the comment ID, a unique identifier of reviews. The definition of negative reviews may differ across e-commerce platforms but mainly refers to reviews expressing consumers' negative evaluations, usually with low scores. According to JD's user interface, negative reviews are the one-score (the lowest score) reviews from consumers.
The evaluation system for consumer reviews on JD.com is similar to that on Amazon.com: consumers who bought certain goods can post comments about their purchase experiences in the form of text, photos and scores. Considering the platform definition and general awareness, we identify negative reviews as reviews with a score of 1. Reviews are thus regarded as generalized indexes of reviewers' attitudes and serve as ideal proxies for probing buyer behavior online in this paper. Our dataset is publicly available through https://doi.org/10.6084/m9.figshare.11944947.v3, accessed on 17 September 2020. It should be noted that the data form and underlying logic of JD.com involved in this study, including the business model, user interface, definition of negative reviews, user-level taxonomy and so on, have remained typical and unchanged in recent years, especially after 2018. Thus, our findings from this dataset are anticipated to reflect stable and even general patterns.
An overview of the negative review dataset can be seen in Figure 1. Figure 1a shows the proportion of different scores in our dataset; negative reviews occupy only a small percentage (<0.05). Moreover, the proportion of reviews with a score of 2 is significantly smaller than that of reviews with a score of 1, which is one of the reasons we excluded score-2 reviews from the negative reviews. Even though the number of negative reviews is small, their significance is, to some extent, greater than that of positive reviews with higher scores in terms of problem exposure and purchase helpfulness, for which the attribute usefulCount can serve as an indicator [22,23] (see Figure 1b). Figure 1d shows the percentage of different review scores in different user groups. Though identified consumers account for a smaller share than anonymous ones, their total volume across the four sectors still approaches 2.05 million, far exceeding the population size of conventional surveys or questionnaires. This reliably facilitates the subsequent tracing of the emotion evolution of negative reviewing at the individual level. However, for examinations at the collective level, where individual identification of consumers is unnecessary, the user-level attribute (see Table A1) was used directly to divide consumers into different levels. Specifically, according to the platform's description, from copper and silver to golden and diamond, a higher user level indicates that the user has purchased more, spent more money or interacted more frequently, and PLUS users can be regarded as VIP users who pay an extra fee to enjoy more membership benefits.

Methods
To characterize online consumer behavior in negative reviewing from multiple aspects, the texts of negative reviews offer rich signals, and several cutting-edge text mining methods were employed here. Specifically, topic classification, aspect mining, word embedding and semantic distance measuring were used to probe various reviewing behaviors. Note that before all our experiments, we preprocessed the review dataset by deduplication and the feature vector generation of review texts. Figure 2 shows the analysis procedure with the methods employed in this paper.
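As a minimal sketch of the deduplication step mentioned above, the following keeps only the first occurrence of each comment ID; the field name commentId is an illustrative stand-in for the actual comment-ID attribute (see Table A1):

```python
def deduplicate_reviews(reviews):
    """Keep only the first occurrence of each comment ID,
    the unique identifier of a review."""
    seen = set()
    unique = []
    for review in reviews:
        cid = review["commentId"]  # hypothetical field name
        if cid not in seen:
            seen.add(cid)
            unique.append(review)
    return unique

reviews = [
    {"commentId": "a1", "text": "slow delivery"},
    {"commentId": "a1", "text": "slow delivery"},  # exact duplicate
    {"commentId": "b2", "text": "broken screen"},
]
print(len(deduplicate_reviews(reviews)))  # → 2
```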

Classifier Construction
To recognize what consumers were complaining about in negative reviews, we implemented a supervised classifier for automatic reason detection. It has been found that product complaints and poor service are frequent sources of online consumers' unsatisfactory shopping experiences and trigger the posting of negative reviews [59,60]. Accordingly, after browsing hundreds of negative reviews in our dataset, we determined four factors that lead to negative reviewing online: poor performance in logistics, product, customer service and marketing. They are outlined as follows: logistics: reviews complaining about unsatisfactory performance related to logistics, such as slow logistics, slow delivery, bad attitudes of courier staff, parcel damage during delivery, etc.; product: reviews expressing defects of the product itself, which can differ across product categories; for example, negative reviews about computers may contain comments about poor CPU performance and heat dissipation; customer service: reviews whose content is about customer service, such as a lack of feedback response, problems with installation services, slow solutions to problems, rude attitudes of sellers, etc.; and marketing: reviews about sellers' false marketing behavior, such as untrustworthy or malicious price increases or false advertisements.
In constructing the classifier, 8153 negative reviews were randomly selected from the four sectors as a training set and tagged separately by five well-trained coders. Among the algorithms we applied, logistic regression with a bag-of-words model produced the best performance, taking a word list as input and the four reason labels as output. Specifically, five-fold cross-validation of the classifier demonstrated that its accuracy and recall both exceeded 0.80.
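As an illustration of this setup, the sketch below trains a bag-of-words logistic regression on toy English stand-ins for the hand-tagged Chinese reviews and evaluates it with five-fold cross-validation; the texts and labels are invented for demonstration only:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

# Toy stand-ins for the 8153 hand-tagged reviews and their reason labels.
texts = [
    "delivery was very slow and the parcel arrived damaged",
    "courier never showed up and shipping took two weeks",
    "the screen is defective and the battery drains fast",
    "poor build quality, the product broke after one day",
    "customer service never replied to my complaint",
    "rude seller and no help with the installation problem",
    "the price was raised right before the sale, false discount",
    "misleading advertisement, the promotion was a scam",
] * 5  # repeated so five-fold cross-validation has enough samples per class
labels = ["logistics", "logistics", "product", "product",
          "service", "service", "marketing", "marketing"] * 5

# Bag-of-words features feeding a logistic regression, as described above.
clf = make_pipeline(CountVectorizer(), LogisticRegression(max_iter=1000))
scores = cross_val_score(clf, texts, labels, cv=5, scoring="accuracy")
clf.fit(texts, labels)
print(clf.predict(["parcel damaged during delivery"])[0])  # → logistics
```

On real data, the cross-validated accuracy and recall reported above (both over 0.80) would replace the trivially high scores this toy corpus produces.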

Aspect Mining
To further divide the reasons into more concrete topics and obtain a more detailed understanding of why users post negative reviews, we applied a generative statistical model, latent Dirichlet allocation (LDA) [61,62], for automatic topic recognition. The advantages of automatic clustering and manpower savings make LDA suitable for situations where the concrete aspects are unknown. With respect to the parameter settings, the topic number, topic similarity and perplexity were comprehensively considered.
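A minimal sketch of this selection procedure, using scikit-learn's LDA implementation on a toy corpus with two obvious latent aspects; real runs would sweep a wider range of topic numbers and also inspect topic similarity:

```python
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

# Toy corpus: two latent aspects (logistics vs. product defects).
docs = [
    "slow delivery late courier parcel",
    "late parcel slow courier delivery",
    "broken screen defective battery",
    "defective battery broken screen",
] * 3

X = CountVectorizer().fit_transform(docs)

# Compare candidate topic numbers by perplexity (lower is better);
# the paper also weighs topic similarity when choosing the final number.
for k in (2, 3, 4):
    lda = LatentDirichletAllocation(n_components=k, random_state=0).fit(X)
    print(k, round(lda.perplexity(X), 1))
```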

Semantic Network Construction
Consumers may have different expression habits in negative reviewing, especially considering the various user levels. It is assumed that the habits of expression in reviewing could be essentially reflected by the co-occurrence of words in texts, i.e., the similarity in word pairs, in which more similar words would co-occur more frequently. According to this assumption, the drifting of expression habits across levels could be well measured through the variation in similarities between common words that were adopted by all levels. To measure the similarity landscape, a word network of semantic closeness was first established for each user level.
To better reflect semantic similarity in the context of online shopping, we first trained five word-embedding models on the negative reviews posted by users from the different levels. Word2vec is a group of models in which words or phrases from a vocabulary are mapped to multidimensional vectors of real numbers [63]. The continuous bag-of-words (CBOW) algorithm was employed to train the embedding models, effectively enriching the semantic density of short texts and overcoming possible sparsity, with a minimum count of 1 and a vector size of 200. The semantic similarity between words could then be calculated by the cosine distance between the corresponding embedding vectors derived from the embedding model.
In building the word network of a certain level, each common word was connected to its N most similar neighbors inferred from the word2vec model of the corresponding level, where N can be tuned to adjust the threshold of semantic closeness. Then, to measure the drift of expression habits between two levels, we ranked the common words according to structural indicators, such as degree, k-core index and clustering coefficient, in the level-dependent word networks and employed the Kendall rank correlation to represent the variation in the semantic landscape across the two levels, in which larger correlations indicate more consistent expression habits.
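The cross-level comparison can be sketched as follows. Random toy vectors stand in for the level-specific word2vec embeddings, and the summed cosine similarity to each word's N nearest neighbors is used as a simple continuous proxy for the structural indicators (degree, k-core index) named above:

```python
import numpy as np
from scipy.stats import kendalltau

words = ["price", "quality", "delivery", "service", "refund", "seller"]
rng = np.random.default_rng(0)

def neighbor_strength(vectors, n_neighbors=2):
    """For each word, sum the cosine similarities to its N most similar
    neighbors: a continuous proxy for structural indicators (degree,
    k-core index) in the level-dependent word network."""
    v = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    sim = v @ v.T
    np.fill_diagonal(sim, -np.inf)  # exclude self-similarity
    return np.sort(sim, axis=1)[:, -n_neighbors:].sum(axis=1)

# Toy embeddings standing in for the level-specific word2vec (CBOW) models.
strength_a = neighbor_strength(rng.normal(size=(len(words), 8)))
strength_b = neighbor_strength(rng.normal(size=(len(words), 8)))

# Rank common words in each level and compare the rankings:
# a larger tau means more consistent expression habits across levels.
tau, _ = kendalltau(strength_a, strength_b)
print(f"Kendall tau across levels: {tau:.2f}")
```

With independent random vectors the correlation is weak by construction; on real level-specific embeddings, tau quantifies how much the semantic landscape drifts between two user levels.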

Emotion Measures
To measure the emotions delivered in reviews, a dictionary-based solution [64] was selected, which contains 1180 positive and 1589 negative Chinese words. Two emotion indexes were further defined. The positive (negative) rate, denoted as r_pos (r_neg), is the proportion of positive (negative) words in a review: r_pos = n_pos/n and r_neg = n_neg/n, where n_pos and n_neg refer to the numbers of positive and negative words in the review text and n refers to the total number of words. Furthermore, the polarization is defined as i_polar = (n_pos - n_neg)/(n_pos + n_neg) when n_pos + n_neg ≠ 0, and i_polar = 0 otherwise. A positive i_polar implies positive emotion, a negative value implies negative emotion and 0 is neutral; a polarization approaching -1 indicates that the review is extremely negative.
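The two indexes and the polarization can be computed directly from token counts; the tiny English dictionaries below are illustrative stand-ins for the Chinese lexicon [64]:

```python
def emotion_indexes(tokens, pos_words, neg_words):
    """Return (r_pos, r_neg, i_polar) for a tokenized review."""
    n = len(tokens)
    n_pos = sum(t in pos_words for t in tokens)
    n_neg = sum(t in neg_words for t in tokens)
    r_pos = n_pos / n
    r_neg = n_neg / n
    # Polarization is defined as 0 when the review has no emotion words.
    i_polar = 0.0 if n_pos + n_neg == 0 else (n_pos - n_neg) / (n_pos + n_neg)
    return r_pos, r_neg, i_polar

pos = {"good", "great"}              # illustrative stand-in dictionaries
neg = {"bad", "terrible", "broken"}
print(emotion_indexes(["terrible", "broken", "screen", "good"], pos, neg))
# → (0.25, 0.5, -0.3333333333333333)
```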
It is worth mentioning that all of the following analysis was conducted from two views, in which the collective view meant putting all users together and the user-level view meant grouping users into levels.

Results
In this paper, to understand online consumers' negative reviewing behavior from a comprehensive perspective, consideration is given to the following: when consumers post negative reviews, such as the exact time of day or the interval after they bought a product (the temporal dimension); what content they express in negative reviews and why they post them (the perceptional dimension); and how they express negativeness and how their emotions evolve (the emotional dimension).

Temporal Patterns
The temporal patterns of negative reviewing were probed from the review-creation (RC) time and intervals between the review-creation time and the product-purchase (PP) time.

Review-Creation Time
The distribution of review-creation time on an hourly scale can be seen in Figure A1, contrasted with that of the product-purchase time. As shown in Figure A1, the temporal posting pattern of negative reviews is consistent with most online behaviors, such as user active time [65] or log-in time on social media [66]. Besides a minor lag of RC time relative to PP time, few obvious differences can be observed between the two distributions; it seems that there is more shopping in the morning but more negative reviewing at night. This lag also inspired a further examination of the interval between RC time and PP time.

Intervals between RC Time and PP Time
There is always a time difference, i.e., an interval, between RC time and PP time, reflecting the consumer's process of experiencing and using the product or service. Exploring the intervals between RC and PP times helps us understand consumers' rhythms in negative reviewing from both general and user-level perspectives.
For every purchase record from the different sectors in our dataset, we obtained the interval in hours; its log-log distribution is shown in Figure 3, which contains distributions for all five user levels. The overall form corresponds to a heavy-tail, power-law-like distribution, consistent with most online human dynamics [46,65,66], indicating that negative reviewing after purchase is bursty [67]. In addition, comparing the all-user group with the user-level groups, no significant difference can be found: all five user levels follow a heavy-tail distribution. At the same time, in each distribution in Figure 3, a periodic fluctuation is observed when the interval is larger than 10 min. Though the distances between peaks appear narrow because of the log scale, a rough measurement surprisingly shows that the fluctuation cycle is approximately 24 h, with the first peak at 24 h, which means the interval between RC and PP times is more likely to be a multiple of 24 h, i.e., whole days. This interesting phenomenon suggests a circadian rhythm in negative reviewing.
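A sketch of how such a log-log interval distribution can be tabulated, with synthetic heavy-tailed intervals standing in for the real RC and PP time differences:

```python
import numpy as np

rng = np.random.default_rng(1)
# Synthetic heavy-tailed intervals (in hours) standing in for real data.
intervals = rng.pareto(1.5, size=100_000) * 24

# Logarithmic binning: for power-law-like data the per-width density
# falls roughly on a straight line in log-log coordinates.
bins = np.logspace(0, np.log10(intervals.max() + 1), 30)
counts, edges = np.histogram(intervals, bins=bins)
density = counts / np.diff(edges) / counts.sum()
for start, d in list(zip(edges[:-1], density))[:5]:
    print(f"bin start {start:7.1f} h  density {d:.2e}")
```

On real data, the per-bin densities would additionally show the 24 h periodic fluctuation described above superimposed on the heavy tail.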
To further examine the reliability of the periodic intervals, we thoroughly checked the correlation between the purchase hours and reviewing hours of all negative reviews. As illustrated in Figure 4a, for all users, pairs with the same or close PP and RC hours occur significantly more often than other cases. It is also interesting that as pairs of RC and PP hours approach the diagonal, i.e., reviewing at the same hour as buying, the occurrences increase significantly. This finding confirms the reliability of the periodic intervals and their 24 h cycle, which can be interpreted as online consumers tending to post negative reviews for a product at the same hour at which they bought it.
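This hour-by-hour check can be reproduced in miniature as a 24 × 24 co-occurrence matrix; the synthetic data below deliberately concentrates reviewing at the purchase hour to mimic the observed diagonal dominance:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 50_000
# Purchase hours uniform over the day; review hours mostly equal the
# purchase hour, mimicking the circadian habit seen in the data.
pp_hours = rng.integers(0, 24, size=n)
offsets = rng.choice([0, 0, 0, 1, -1, 6], size=n)
rc_hours = (pp_hours + offsets) % 24

# 24 x 24 co-occurrence grid of (purchase hour, review hour) pairs.
grid = np.zeros((24, 24), dtype=int)
np.add.at(grid, (pp_hours, rc_hours), 1)

diag_mean = grid.trace() / 24                       # same-hour cells
off_mean = (grid.sum() - grid.trace()) / (24 * 23)  # all other cells
print(diag_mean > off_mean)  # → True: the diagonal dominates, as in Figure 4a
```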
Comparing the performances among user levels in Figure 4b-f, it can be further observed that the patterns of periodic intervals for the silver and golden levels are closest to those for all users. However, the patches in the grids for diamond and PLUS users are relatively disarrayed, lacking a regular pattern of gradual change. Therefore, the online interactive behavior of higher-level users, such as purchasing and posting reviews, appears to be triggered in a more random manner than that of lower-level users. This can be well explained by the more frequent purchases and more active interactions of higher-level users. The periodic intervals indicate an interesting relationship between negative reviewing and purchase behavior: the purchase is motivated by user demand, while the review comes from the user's experience after using the product. Accordingly, we suppose that the periodic intervals with a 24 h cycle might be related to a perceptional rhythm of online consumption, meaning that users tend to carry out related activities at a fixed period of the day. To verify this, we implemented an experiment inspired by Yang et al. [68], examining the discrepancy between behavior on weekdays and weekends, where randomness and recreation are much more intense on weekends. Figure A2 displays the discrepancy between weekdays and weekends in the periodic intervals and supports our conjecture to some extent: the weekday periodic interval is more consistent and intense than that of the weekend, indicating that regular work or study activities reinforce the perceptional rhythm while recreational activities weaken it. Therefore, we propose that the regular activities of humans are related to the periodic intervals between RC and PP times.
Furthermore, as Figure 5a,b display, the different distributions of PP and RC times across product categories provide more evidence that perceptional rhythms vary across sectors, indicating that the times of consumer purchasing and reviewing might be associated with the features of a certain product, which lead to a regular timeline for consumer online behavior and thus to periodic intervals. In more detail, Figure 5a,b show that the PP and RC times of products from Phone and Accessories and Clothing peak in the evening and early morning, which is typically recreation time. In contrast, Computers-related PP and RC activity is more intense than that of other sectors between 10:00 and 16:00, which are normally office hours. To enhance the reliability of this verification, we implemented another experiment from the view of identified users, preventing the statistical bias introduced by users who purchase products frequently. In Figure 5c, identified users with no fewer than three purchases are kept to obtain the average hour difference between PP and RC times. Moreover, the proportion of users with more than five purchases (hereafter, frequent-purchase users) among all valid identified users is relatively small, which suggests that the high purchase frequency of these users is not the main factor behind the periodic intervals. Note that frequent-purchase users account for 19.16% of all users but generate more than half of all purchases. From Figure 5c, we can conclude that periodic intervals stem from the stable consumption habits of online consumers. Therefore, the effectiveness and reliability of the circadian rhythms in negative reviewing shown in Figures 4 and A2, and in turn the robustness of the periodic intervals, are confirmed.
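The identified-user filter and per-user average PP-RC hour gap can be sketched as below. The record layout (a user id plus the two hours) is an assumption for illustration, and the difference is folded onto a 24 h clock so that, e.g., buying at 23:00 and reviewing at 01:00 counts as a 2 h gap.

```python
# Sketch of the per-user average hour gap behind Figure 5c; the
# (user_id, pp_hour, rc_hour) record layout is a hypothetical stand-in.
from collections import defaultdict

def mean_hour_gap_per_user(records, min_purchases=3):
    """Keep users with at least `min_purchases` records and return
    {user_id: mean circular |RC - PP| difference in hours (0..12)}."""
    by_user = defaultdict(list)
    for uid, pp, rc in records:
        d = abs(rc - pp) % 24          # raw difference on a 24 h clock
        by_user[uid].append(min(d, 24 - d))  # fold into 0..12
    return {u: sum(v) / len(v) for u, v in by_user.items()
            if len(v) >= min_purchases}

recs = [("a", 10, 12), ("a", 23, 1), ("a", 9, 9), ("b", 8, 20)]
print(mean_hour_gap_per_user(recs))  # {'a': 1.333...}; user 'b' filtered out
```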
These explorations of periodic intervals suggest that purchase time and reviewing time are sector-dependent and bear an interesting relationship to one another, which was rarely addressed previously and can inform marketing strategies. Hence, the results of this subsection answer RQ 1.

Perceptional Patterns
One profound value of negative reviews is that they offer a channel for expressing consumer dissatisfaction with a product or a service, providing future consumers with reference information and sellers with directions for improvement. Therefore, recognizing what online consumers complain about in reviews and how they perceive online buying is significant. Perceptional patterns refer to the exact reasons behind consumers' posting of negative reviews and the level-dependent preferences related to the cognition of negativeness. Moreover, it is worth mentioning that differences among consumers' reasons for negativity might originate not only from the users themselves but also from platform policies for different levels.

Main Reasons for Negative Reviews
A medium-sized horizontal e-commerce platform can receive several million negative reviews in one month, complaining about problems from different aspects for which different departments are responsible. Under this circumstance, it is necessary to automatically identify the reasons behind a large number of negative reviews, in preparation for resolving the exact problems and for further exploration. As discussed in Section 4, we constructed a logistic regression classifier for automatic reason identification and applied it to all negative reviews in our dataset; Figure A3 shows the proportions of the four main negative review reasons for all five user levels. As can be seen, the sector Gifts and Flowers received the largest proportion of complaints about logistics, which is consistent with the common-sense observation that products from this category place high demands on distribution punctuality and professional equipment. Regarding negativeness toward product quality, Clothing ranked first, followed by Computers, with Gifts and Flowers last; the former two have an obvious 'commodity first' character. With respect to dissatisfaction about customer service and false marketing, the ranking order is exactly the opposite, suggesting that the triggers underlying negative reviewing are sector-dependent. As there is no evident difference among the five user levels, indicating that all consumers perceive the main reasons similarly, more detailed reasons are probed in the later analysis.
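The paper does not spell out the classifier's implementation; below is a minimal softmax (multinomial logistic) regression over bag-of-words features in the same spirit, with toy English texts and labels standing in for the Chinese review data and reason categories.

```python
# Minimal sketch of a reason classifier: bag-of-words softmax (multinomial
# logistic) regression trained by gradient descent. Toy data only; the
# vocabulary and labels are hypothetical stand-ins for the paper's dataset.
import numpy as np

def featurize(texts, vocab):
    X = np.zeros((len(texts), len(vocab)))
    for i, t in enumerate(texts):
        for w in t.split():
            if w in vocab:
                X[i, vocab[w]] += 1.0
    return X

def train_softmax(X, y, n_classes, lr=0.5, epochs=300):
    W = np.zeros((X.shape[1], n_classes))
    Y = np.eye(n_classes)[y]                  # one-hot labels
    for _ in range(epochs):
        Z = X @ W
        Z -= Z.max(axis=1, keepdims=True)     # numerical stability
        P = np.exp(Z)
        P /= P.sum(axis=1, keepdims=True)
        W -= lr * X.T @ (P - Y) / len(y)      # gradient step
    return W

texts = ["slow delivery late package", "rude service attitude",
         "broken screen defect", "price dropped after purchase"]
labels = [0, 1, 2, 3]   # logistics, service, quality, false marketing
vocab = {w: i for i, w in enumerate(sorted({w for t in texts for w in t.split()}))}
W = train_softmax(featurize(texts, vocab), np.array(labels), 4)
pred = featurize(["late slow delivery"], vocab) @ W
print(int(pred.argmax()))  # 0 -> logistics
```

In practice the features would be TF-IDF weights over segmented Chinese tokens, but the training loop is the same.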
To explore whether there are regular patterns governing when certain kinds of negative reviews are posted in a day, we followed the temporal perspective and focused on time-triggered patterns in the reasons for negative reviews. Figures A4 and A5 present the hourly proportions of the main reasons for two sectors, which demonstrate different trends. Specifically, the first peak is generally located at 10:00-12:00, as expected. However, the second peak is sector-dependent: the sector Clothing, with obvious personal-use features, reaches its second peak at 20:00-21:00, a typical time for personal recreation, while Computers, with obvious office attributes, lacks a second peak. This indicates that even in the hourly pattern of negative reviewing, the reasons that lead to poor reviews are still sector-related.
Comparing the four sectors selected from our dataset, common regularities in the time distributions of the main reasons for negative reviews can be summarized. First, as observed in Figure A4b, negative reviews about logistics clearly peak at 11:00-12:00 and 16:00-18:00, which correspond to typical package pick-up times. Second, there is an upward trend in the number of negative reviews about product quality in the evening, especially for categories with personal-use features such as Phone and Accessories and Clothing; this trend is not evident for sectors with office attributes. Third, a downward trend can be seen in negative reviews complaining about customer service for categories with office attributes, which vanishes for categories with personal-use features. Moreover, no trend is observed for negative reviewing due to false advertising. These findings imply that the hourly patterns of posting negative reviews for a certain reason are closely related to the schedule of daily life and demonstrate stable rhythms. Moreover, the distribution of the reasons that lead to negative reviewing is user-level-independent, as no evident discrepancy across levels is found here.

Detailed Aspects Leading to Negative Reviewing
To explore deeper regularities in how users regard negative reviewing, a division into only four main reasons is insufficient. Therefore, we constructed an LDA model for each main reason to further detect the detailed aspects behind online consumers' negative reviews. It should be noted that we did not construct an LDA model for product defects, considering that the review content regarding product defects is always sensitive to product function, design and category and thus lacks the possibility of forming stable clusters.
We considered model perplexity and topic coherence together when setting the parameters of the LDA. Model perplexity is a positive number indicating the level of uncertainty, while topic coherence describes the similarity among different topics and can be characterized by UMass coherence [69], which is commonly a negative number whose smaller absolute value indicates a better topic distribution. Accordingly, we set the selection criterion as the product of the UMass coherence and the perplexity, called the multiplier parameter, with a larger multiplier parameter suggesting better LDA performance. Besides the multiplier parameter, the topic number was also considered when selecting models, to reduce the possibility of overfitting. In line with this, we determined three models for the three main reasons, with topic numbers 8, 9 and 7. The details of the parameter selection and the topic words for every model can be seen in Figure A6 and Appendix B.
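For concreteness, a hand-rolled UMass coherence over toy documents is sketched below. The formula follows the standard UMass definition (pairwise log co-document frequencies of a topic's top words); the selection rule of maximizing the coherence-perplexity product is our reading of the text, not the paper's exact code.

```python
# Hand-rolled UMass topic coherence; a sketch, not the paper's pipeline.
import math

def umass_coherence(top_words, docs):
    """top_words: ordered top words of one topic; docs: list of token sets."""
    def df(*ws):  # number of documents containing all given words
        return sum(1 for d in docs if all(w in d for w in ws))
    score = 0.0
    for i in range(1, len(top_words)):
        for j in range(i):
            wi, wj = top_words[i], top_words[j]
            score += math.log((df(wi, wj) + 1) / df(wj))
    return score

docs = [{"slow", "delivery"}, {"slow", "delivery", "late"}, {"rude", "attitude"}]
c = umass_coherence(["slow", "delivery"], docs)
# Selection sketch: given perplexity p for a candidate model, the text's
# "multiplier parameter" would be c * p, maximized over candidates.
print(round(c, 3))  # 0.405
```

Library implementations (e.g., gensim's CoherenceModel with coherence='u_mass') average such pairwise scores per topic; the toy version above keeps the raw sum for transparency.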
According to the topic words (see Appendix B) of each topic, we summarized the complaints about logistics, customer service and marketing, as seen in Table 2. The percentage of each aspect is shown in Figure 6. More than 40% of the logistics-related negative reviews complain about slow delivery, followed by improper delivery timing with 18% and packaging problems with 11%. For customer service, complaints about shipping fees when returning goods come first, followed by rude customer service attitudes, delivery time and gift-packaging negotiation issues. It can thus be proposed that e-commerce platforms should take more effective actions to strengthen employee training, the construction of logistics infrastructure and channels for information feedback. Moreover, an abnormal price increase or an abrupt price reduction after purchase is the most dominant issue in false advertising, suggesting that e-commerce platforms should pay more attention to price monitoring and management. Additionally, we can further find discrepancies among user levels' preferences in Figure 6. For example, golden users have low tolerance for common problems, such as slow delivery and shipping fees when returning goods, while copper users are sensitive to customer service attitudes and logistics speed. More interestingly, copper and golden users have more consistent preferences, while silver and PLUS users are more diversified in the detailed aspects leading to negative reviewing. Though the main reasons that trigger negative reviews are level-independent, the different preferences across levels in detailed aspects demonstrate that consumers of various levels post negative reviews for diverse reasons, implying that a further examination of level-related discrepancies is necessary.
Figure 6. Proportions of detailed aspects of negative reviews for the three main reasons.
In each plot, the x-axis represents different aspects, as illustrated in Table 2, and the y-axis stands for different user levels. The horizontal sum of the patches in every subplot is 1.

Expression Habits in Reviewing
The reasons users post negative reviews are directly reflected in single sentences of reviews, while expression habits emerge from frequent connections across multiple sentences and are an externalization of buyer perception. In this paper, we attempt to characterize the expression habits of different user levels in terms of review length and semantic similarity.
Review length is an indicator of review helpfulness and information diagnosticity [10] and of product demand [70], and it is an important factor in user centrality [71]. Here, we measured review length in two ways, with the same outcomes: the first counts all characters within the review text, and the second counts the number of effective words after segmenting the text into a word list and filtering out stopwords. Figure 7 shows the distribution of negative review length for the five user levels, with both methods exhibiting the same behavior. Considering that most of the outliers come from the deliberate repetition of meaningless words and hinder viewing and comparing across user levels, the fliers are omitted here. It can be seen that negative review length trends upward as the user level increases, and this upward trend is statistically significant under a one-sided t-test. The growth of review length with level implies that high-level consumers post longer negative reviews, offering more reference value and helpfulness for both consumers and sellers. From this view, the value of negative reviews from high-level users should be stressed in practice.
Figure 7. In each plot, the x-axis refers to the user levels, rising from left to right, and the y-axis refers to review length. The growth of review length with user level is statistically significant under a one-sided t-test.
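The two length measures can be sketched as below; the stopword list is a toy English stand-in, and a real pipeline for Chinese reviews would first segment the text with a tokenizer such as jieba.

```python
# The two review-length measures: raw character count vs. effective word
# count after stopword filtering. Toy stopword list; real reviews would
# require Chinese word segmentation first.
STOPWORDS = {"the", "a", "is", "and", "very"}

def char_length(review):
    """Measure 1: number of characters in the review text."""
    return len(review)

def effective_word_count(review):
    """Measure 2: number of non-stopword tokens."""
    return sum(1 for w in review.split() if w not in STOPWORDS)

r = "the delivery is very slow and the box is damaged"
print(char_length(r), effective_word_count(r))  # 48 4
```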
Differences in users' expression habits can be reflected in the landscape of word similarity in context. Figure 8 shows the semantic networks of the word 'match' for the five user levels, where the connected words are the top-four most similar synonyms in different contexts. As can be seen, users from different levels indeed demonstrate diversity of expression habits in terms of network structure. Accordingly, we constructed word networks for each user level to examine the similarity of user expression habits across levels. For the selection of N in building the word networks, as illustrated in Section 2.2, the proportion of the largest connected subset in commonSet, r_c, was used to narrow the range. As N increases, r_c increases, implying that more words are connected to form a more complete landscape of semantic similarity. Once r_c passes its highest rate of increase, it can be conjectured that the most representative and strongly connected words have been included in the network. Accordingly, we set N to 4 (see Figure A7).
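The construction of the top-N similarity network and the r_c statistic can be sketched as follows; the toy 2-D vectors stand in for trained word embeddings, and the function names are ours.

```python
# Sketch of the top-N word network and the r_c statistic used to choose N;
# toy 2-D vectors stand in for trained word embeddings.
import math
from collections import defaultdict

def top_n_network(vectors, n):
    """Link each word to its n most cosine-similar words (undirected)."""
    def cos(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        return dot / (math.hypot(*a) * math.hypot(*b))
    edges = defaultdict(set)
    for w, v in vectors.items():
        sims = sorted(((cos(v, u), x) for x, u in vectors.items() if x != w),
                      reverse=True)
        for _, x in sims[:n]:
            edges[w].add(x)
            edges[x].add(w)
    return edges

def r_c(edges, words):
    """Share of words inside the largest connected component."""
    seen, best = set(), 0
    for w in words:
        if w in seen:
            continue
        comp, stack = set(), [w]
        while stack:
            u = stack.pop()
            if u in comp:
                continue
            comp.add(u)
            stack.extend(edges.get(u, ()))
        seen |= comp
        best = max(best, len(comp))
    return best / len(words)

vecs = {"match": (1, 0), "fit": (0.9, 0.1), "slow": (0, 1), "late": (0.1, 0.9)}
edges = top_n_network(vecs, 1)
print(r_c(edges, list(vecs)))  # 0.5: two components of two words each
```

Sweeping n upward and watching where the growth rate of r_c drops is the selection heuristic described in the text.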
After establishing the five networks with N = 4, the similarity of user language expression habits was measured by the Kendall correlation coefficient in a bootstrapping manner. We focused on three indicators characterizing different features of the network: degree, measuring node connections; the clustering coefficient, measuring the connection density among adjacent nodes; and the k-core index, indicating node location within the network. When implementing bootstrapping for a robustness check, the number of repetitions was set to 1000 and the sample size to 500, i.e., 500 words were randomly selected from the common set of all user levels. Figure 9 presents semantic similarity as the Kendall correlation among the five user levels. On the one hand, it can be observed that users at adjacent levels show greater similarity, implying more shared expression habits; silver and golden users obtain the greatest similarity, at 0.22-0.28. On the other hand, the absolute values of similarity mostly lie in the range 0.1-0.3, suggesting that users of different levels differ profoundly in their expression of negative reviews. In addition, the great differences among user levels are truly beyond our expectation, indicating that if e-commerce platforms regard different users' expressions as the same, there is a high possibility of misunderstanding users' reviews and then implementing improper actions. Note that the results for other settings of N are consistent, so our observations are robust (see Figure A8). This exploration targeted perceptional discrepancies in negative reviewing, examining the causes behind reviews and the expression habits within them. Our investigations reliably demonstrate the existence of level-dependent patterns in negative reviewing. The conclusions here answer RQ 2.
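The bootstrapped Kendall comparison described above can be sketched as below; we use a plain O(n^2) tau-a without tie correction for transparency, and the degree dictionaries are toy stand-ins for the per-level network indicators.

```python
# Sketch of the bootstrapped Kendall comparison of two levels' networks;
# deg_a/deg_b map words to a network indicator (e.g., degree) per level.
import random

def kendall_tau(xs, ys):
    """Plain O(n^2) Kendall tau-a over paired samples (no tie correction)."""
    n, s = len(xs), 0
    for i in range(n):
        for j in range(i + 1, n):
            a = (xs[i] > xs[j]) - (xs[i] < xs[j])
            b = (ys[i] > ys[j]) - (ys[i] < ys[j])
            s += a * b
    return 2 * s / (n * (n - 1))

def bootstrap_tau(deg_a, deg_b, common, size=500, reps=1000, seed=7):
    """Average tau over `reps` resamples of `size` words from the common set."""
    rng = random.Random(seed)
    taus = []
    for _ in range(reps):
        words = [rng.choice(common) for _ in range(size)]
        taus.append(kendall_tau([deg_a[w] for w in words],
                                [deg_b[w] for w in words]))
    return sum(taus) / reps
```

With size=500 and reps=1000 this mirrors the setting in the text; scipy.stats.kendalltau would be the production choice, with tie-corrected variants available.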

Emotional Patterns
As with the reasons consumers post negative reviews, the emotional features of consumers can also be extracted from the review texts, along with scores that represent consumers' emotional baselines. Emotion, serving as an attribute of the entire review, can directly influence how practitioners resolve complaints from consumers. Here, we regarded negative reviews as mixtures of both negative and positive emotions, instead of negativeness only, and focused on the distribution of buyer emotion across levels and its evolutionary patterns in both time and tendency.

Distribution of User Emotions
Consumers from different levels may have different emotion distributions, such as in the proportions of negative or positive sentiment and the extent of polarization. If such patterns exist, they can guide sellers to pay more attention, or take special actions, regarding those with extremely negative feelings.
First, we calculated r_pos and r_neg to characterize the emotional degrees of the different user levels. Figure 10a shows the average percentages of positive and negative emotion words in negative reviews posted in 2017. As can be seen, the degree of negative emotion decreases as the user level increases, while there is no evident trend in positive emotion. Interestingly, golden users have the lowest positive word usage, and copper users deliver the highest degrees of both positiveness and negativeness. It can be summarized that consumers from upper levels are usually less emotional than those from lower levels. This agrees with our expectation, as consumers from upper levels have more experience in product selection and negotiation with sellers, leading them to be less emotional and more rational. The results for the other years, from 2013 to 2016 (see Figure A9), are consistent, owing to the stabilization of the user structure and the business model. The single indexes r_pos and r_neg can reflect only the positive or negative degree of a review. To measure polarization, we introduced the index i_polar to determine the distribution of buyer emotion, as seen in Figure 10b. As expected, negative polarization dominates negative reviewing for all consumers across all sectors. For the sectors Gifts and Flowers, Clothing and Phone and Accessories, the negative and extremely negative polarization increases as the user level increases, while for Computers, it first trends upward and then downward. The upward tendency with user level, which differs from Figure 10a, can be attributed to the different compositions of the two indexes: the index in Figure 10b omits words that are neither positive nor negative in the sentiment dictionary.
Therefore, the differing results can be explained by lower-level consumers tending to use more negative words than upper-level users; however, in negative reviews from upper-level users, positive words co-occur with negative words less often, pushing i_polar to, or near, −1. Through these findings, we can conclude that upper-level users are less emotional than lower-level users, but their emotional expression in negative reviewing is more condensed and concentrated.
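The three indexes as we read them can be sketched as below: r_pos and r_neg are the shares of positive and negative dictionary words among all tokens of a review, and i_polar is assumed to be (n_pos − n_neg)/(n_pos + n_neg), so a review using only negative sentiment words scores −1. The tiny word lists are illustrative stand-ins for a real sentiment dictionary.

```python
# Sketch of the emotion indexes; the i_polar formula and the toy POS/NEG
# word lists are our assumptions, not the paper's exact definitions.
POS = {"good", "nice", "fast"}
NEG = {"slow", "broken", "rude", "bad"}

def emotion_indexes(tokens):
    n_pos = sum(1 for t in tokens if t in POS)
    n_neg = sum(1 for t in tokens if t in NEG)
    r_pos = n_pos / len(tokens)
    r_neg = n_neg / len(tokens)
    # Polarization over sentiment words only; neutral tokens are omitted,
    # matching the text's note about Figure 10b.
    i_polar = (n_pos - n_neg) / (n_pos + n_neg) if n_pos + n_neg else 0.0
    return r_pos, r_neg, i_polar

print(emotion_indexes(["delivery", "slow", "box", "broken"]))
# -> (0.0, 0.5, -1.0)
```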
To further probe the differences in the usage of emotional words across the five user levels, we calculated the probability of each word w in the sentiment dictionary appearing in a negative review, defined as f_w = n_w / n_r, where n_r is the number of all negative reviews posted by users of a certain level and n_w is the number of occurrences of w within these reviews. For each word, the variance of these probabilities across the five user levels was used to filter out words with greater differences in emotional preference, as seen in Table 3. Interestingly, it can be concluded that users from lower levels place more importance on service attitudes and on whether the product matches what is advertised, while upper-level users tend to have diversified shopping demands and preferences and emphasize the overall experience. The diverse demands of upper-level users, along with their more condensed review content, can lead to a less biased narration. This observation is consistent with the findings on the perception of the causes leading to negative reviewing.
Table 3. Emotional words with different trends as the user level increases.

[Table 3 body: emotional words listed by aspect, split into those whose frequency of usage increases and those whose frequency decreases as the user level rises.]
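The frequency statistic f_w and the variance filter can be sketched as below; reviews are represented as token sets, a simplification of the tokenized review texts, and the level names are illustrative.

```python
# Sketch of f_w = n_w / n_r per level and the cross-level variance filter
# behind Table 3; token-set reviews and level names are toy stand-ins.
def word_frequency(reviews, word):
    """f_w: share of a level's negative reviews containing `word`."""
    return sum(1 for r in reviews if word in r) / len(reviews)

def variance_across_levels(levels, word):
    """Population variance of f_w across the user levels."""
    fs = [word_frequency(r, word) for r in levels.values()]
    mean = sum(fs) / len(fs)
    return sum((f - mean) ** 2 for f in fs) / len(fs)

levels = {
    "copper": [{"rude", "attitude"}, {"slow"}],
    "plus":   [{"experience"}, {"experience", "slow"}],
}
print(variance_across_levels(levels, "rude"))  # 0.0625
```

Words are then ranked by this variance, and the high-variance ones are the level-discriminating emotional words reported in Table 3.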

Emotion Evolution over Time
Online reviewing might occur continuously, such as posting multiple times in the same day. The within-day continuous probabilities of the PP and RC times of reviews from identified consumers help to verify this. The results show that the continuity of RC times is stronger than that of PP times in three sectors, and in the remaining sector the two values are very close, indicating that reviewing time is more continuous than purchase time (see Figure A10). In line with this, a natural question is how consumers' emotions evolve during continuous reviewing, which makes an examination of the emotion sequence necessary.
In addition to the sentiment indexes, the review score is also a quantitative indicator of emotional attitude and can be employed to reflect the evolution of emotion. For every identified user, we sorted the scores of continuous reviewing in time-ascending order and then cut the sequential scores into sequences in which the interval between adjacent scores was smaller than a specified threshold, e.g., 0.5 or one hour. The continuous probability of every pair of adjacent scores was defined to determine whether the later score is influenced by the former one. For example, the score pair (1, 2) means that the previous score is 1 and the subsequent score is 2. Furthermore, to eliminate the influence of the unbalanced occurrences of different scores, the proportion of the five scores among all reviews was used to normalize the continuous probability. Figure 11 shows the continuous probabilities after normalization. From the figure, we find that all users demonstrate momentum in negative emotion, shown by the diagonal bubbles being larger than the others; moreover, the farther from the diagonal, the smaller the bubbles. In addition, comparing Figure 11b-f suggests that upper-level users have a stronger momentum in negative emotion than lower-level users, as the sizes of the diagonal bubbles grow with the user level. For every threshold in [0.5, 1, 2, 24] hours, we obtained the same conclusion, which promises robustness. In contrast with the earlier conclusion that upper-level users are less emotional, a more intensive momentum in negative emotion indicates that although upper-level users have richer experience and more stable emotions, once they face an unsatisfactory experience, it will cost sellers more to comfort them and alleviate the negative outcomes.
Figure 11. Normalized continuous probability for every score pair with a threshold of one hour.
The x-axis and y-axis are the scores of the previous review and the subsequent review, respectively. Every bubble is normalized by dividing the initial probability by the proportion of the corresponding post score. Bubbles with a value larger than one are colored blue, and the others red. Therefore, the size of a bubble, along with its color, represents the extent of the influence from the emotional basis of the previous review.
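The normalized continuous probability behind Figure 11 can be sketched as follows; the event layout, time-stamped scores for one user level, is our simplification of the per-user score sequences.

```python
# Sketch of the normalized continuous probability behind Figure 11: count
# adjacent score pairs within a time threshold, then normalize each pair
# by the overall proportion of its post score.
from collections import Counter

def normalized_pairs(events, threshold_h=1.0):
    """events: time-sorted (timestamp_h, score) tuples for one user level."""
    pair_counts = Counter()
    score_counts = Counter(s for _, s in events)
    for (t0, s0), (t1, s1) in zip(events, events[1:]):
        if t1 - t0 <= threshold_h:          # adjacent within the threshold
            pair_counts[(s0, s1)] += 1
    total_pairs = sum(pair_counts.values())
    n = len(events)
    return {pair: (c / total_pairs) / (score_counts[pair[1]] / n)
            for pair, c in pair_counts.items()}

events = [(0.0, 1), (0.5, 1), (0.7, 2), (5.0, 1)]
print(normalized_pairs(events))  # {(1, 1): 0.666..., (1, 2): 2.0}
```

A diagonal pair such as (1, 1) standing above 1 after normalization is what the text calls momentum in negative emotion.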

Tendency of Emotion Shifting
Emotion toward a certain object is not static and changes over time due to external factors. To explore how online consumers' emotions shift over time, we focused on reviews with after-use comments, which are posted after the initial review, usually separated by a certain period of product usage. Compared with the initial review, after-use comments often contain shifts in the usage experience and in the resulting attitudes and emotions. According to the pair of polarization indexes for the initial review and the after-use comment, abbreviated as (i_polar-initial, i_polar-afteruse), we defined three directions of mood shifting: increasing, with the pair (−, +), (0, +) or (−, 0), implying a shift toward more positive emotion; decreasing, with the pair (+, 0), (+, −) or (0, −), implying a shift toward more negative emotion; and stable, with the pair (0, 0), (+, +) or (−, −), implying no conversion. To reveal the exact patterns of emotion shifts behind negative reviewing, we examined the occurrence of the different shift directions and the time needed for shifts across both user levels and review scores.
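The three shift directions reduce to a comparison of the signs of the two polarization indexes, which can be sketched as below.

```python
# The three emotion-shift directions defined above, keyed on the signs of
# the initial and after-use polarization indexes.
def shift_direction(i_initial, i_afteruse):
    s0 = (i_initial > 0) - (i_initial < 0)    # sign in {-1, 0, 1}
    s1 = (i_afteruse > 0) - (i_afteruse < 0)
    if s1 > s0:
        return "increasing"   # (-,+), (0,+), (-,0)
    if s1 < s0:
        return "decreasing"   # (+,0), (+,-), (0,-)
    return "stable"           # (0,0), (+,+), (-,-)

print(shift_direction(-0.8, 0.2))  # increasing
```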
Regarding the proportions of the three shift directions, as shown in Figure 12, more than 35% of consumers who posted negative reviews experienced emotional improvement, which can serve as an indicator of how effective the actions taken after poor ratings were. Moreover, there is a slight downward trend in the increasing direction as the user level rises, suggesting that it is more difficult to turn upper-level users from dissatisfied to satisfied. Then, in Figure A11, the scores of the initial reviews are taken into consideration. It can be observed that the gap between the percentages of increasing and decreasing emotion first narrows and then widens in the opposite direction. This trend supports the existence of a saturation effect in emotion shifts: it is always much easier for emotion to develop in the reverse direction than to grow stronger in its initial direction. This result further implies that a shift toward the opposite direction more easily triggers the posting of an after-use comment. The interval between the initial review and the after-use comment is measured as the lag needed for the emotion shift, at a resolution of one hour. In Figure A12, we provide another perspective, from the user level. It can be seen that for all negative reviews with after-use comments, the lags become longer as the user level increases, except for a slight fluctuation among PLUS users. Furthermore, the same phenomenon is seen when considering only negative reviews with an increasing emotion shift. Therefore, it is conjectured that the average interval grows with the user level, together with the lag for emotional improvement. This suggests that upper-level users may need a longer time to change their negative impressions of products, which agrees well with our conclusion about upper-level users' more intensive momentum in negative emotion.
In this section on sentiment patterns, we have focused on the polarization, evolution and shifting of emotions in negative reviewing and determined that lower-level users tend to have more intensive negative emotions, while upper-level users demonstrate a more intensive emotional momentum when faced with unsatisfactory shopping experiences. Moreover, the patterns in emotion shifts further support the conjecture about upper-level users' negative momentum and the existence of a saturation effect in emotion. These conclusions answer RQ 3.

Discussion
Online reviews and consumer behavior on e-commerce platforms have been widely studied; however, with respect to negative reviews, which can be regarded as a kind of additional reference information for consumers and a focus of performance improvement for sellers, the behavioral patterns that underlie them have not been explored in a comprehensive manner in the literature. Moreover, existing research on online consumer behavior does not distinguish online users of different levels and pays attention to only a few aspects of their features, which is insufficient for a deep and systemic understanding of consumer characteristics.
This paper attempts to provide a systemic understanding of negative reviewing from the temporal, perceptional and emotional aspects. It utilizes a multisector and multibrand dataset from JD.com, the largest B2C platform in China, and implements various methods of semantic analysis and statistics to offer the first empirical study based on a Chinese e-commerce dataset with several million records. Our main findings are the following: (1) Circadian rhythms exist in negative reviewing after buying, related to the time consistency of people's daily activities. (2) The similarity in expression habits between users of adjacent levels is significantly greater than that of other level pairs, and lower-level users are more sensitive to prices, to sellers' deceitful acts concerning pricing and to rude seller attitudes, while the demands of higher-level users are more varied and exhaustive. (3) In the emotional dimension, lower-level users express negativeness more intensively but with lower emotional momentum in negativeness.
Our study has implications for both academics and practitioners. For academics, we contribute an enhanced understanding of online consumers' negative reviews from the perspectives of temporal patterns, the kinds of experiences consumers tend to regard as failures, and emotional patterns concerning how they express themselves and how this expression changes. Although many recent studies examine the features of online reviews and negative reviews, to our knowledge, research aimed at the behavior of online negative reviewing itself is lacking. In addition to insights into the behavioral patterns of negative reviewing, the richness of our empirical setting enables future research to dig deeply not only into online negative reviewing but also into other online buying behaviors. For example, the conclusions and conjectures put forward in the temporal dimension suggest that purchase time and reviewing time are sector-dependent and bear an interesting relationship to one another, which was rarely observed previously, and they provide value for further research into human behavior under perceptional regulation. Investigations from the perceptional aspect reliably demonstrate the existence of level-dependent patterns in negative reviewing and open the possibility of establishing perception models for precisely profiling different consumers. The patterns in emotion evidence upper-level users' more intensive negative momentum and the existence of a saturation effect in emotion, which have seldom received attention and suggest deeper exploration of distinct emotional patterns across groups. Previous studies have divided online users into groups to study their characteristics [30,33,57], for instance, differentiating by engagement level and concluding that there are different performances in emotional perception [30].
Our investigations in the perceptional and emotional dimensions provide finer-grained user differentiation and deeper pattern mining to reveal users' different service preferences, expression habits and negative momentum. To sum up, some unexpected findings shed light on the diversity of user behavior, such as the relationship between purchase time and reviewing time, the great differences among users' expressions and the differing amounts of emotional momentum, concepts previously examined only in a limited way.
For practitioners, the findings in this paper are important because they offer guidelines and tools for determining a platform's or a seller's problems or disadvantages and then devising targeted management strategies to improve corporate performance. Moreover, the results on the characteristics of different user levels underscore the necessity of implementing differentiated strategies toward different user groups. In the temporal aspect, the explorations of periodic intervals suggest that, in addition to personal product preferences, the timeline of regular activities can also be modeled to support user profiling and precision marketing, such as combining a timing preference with a user and product category, or precisely profiling consumers' online active hours. The findings in the perceptional dimension indicate that understanding the behavioral patterns behind negative reviewing can indeed help trace existing deficiencies and spark improvements, such as in information feedback channels and price monitoring. Moreover, the profound differences in expression habits among user levels also make level-dependent strategies necessary on an e-commerce platform, such as adopting different wording strategies when facing different user levels to reduce misunderstandings. The preferences among negative reasons suggest that managers should provide more purchase and selection guidance for junior users, who have less experience but more extremely negative expressions that harm a platform's reputation, and should personalize review displays on the platform. In addition, given the momentum in negative emotion, even though upper-level users employ less negative word-of-mouth, once they experience unsatisfactory shopping, the platform should invest more in measures to comfort them and alleviate their negative mood, thereby mitigating or preventing negative word-of-mouth from breeding and spreading negative emotions.
These proactive strategies could boost sales, reduce negative word-of-mouth and increase consumer satisfaction. This echoes the implications of Richins [19] and Le and Ha [18], who highlight the negative effect of untimely handling and the importance of managerial responses to negative reviews; our findings further point out the differences in how users perceive sellers' responses and suggest corresponding strategies.
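The timing-preference profiling suggested above can be sketched in a few lines of code. The snippet below is a minimal illustration, not the paper's actual pipeline: `hourly_activity_profile` is a hypothetical helper that turns raw review-creation (RC) or product-purchase (PP) timestamps into hourly activity proportions of the kind plotted in Figure A1.

```python
from collections import Counter
from datetime import datetime

def hourly_activity_profile(timestamps):
    """Share of events falling in each hour of the day (0-23).

    `timestamps` is an iterable of ISO-8601 strings, e.g. the
    review-creation (RC) or product-purchase (PP) times in the dataset.
    """
    hours = Counter(datetime.fromisoformat(ts).hour for ts in timestamps)
    total = sum(hours.values())
    return {h: hours.get(h, 0) / total for h in range(24)}

# Toy example: three evening purchases and one at noon.
profile = hourly_activity_profile([
    "2017-06-01 21:05:00",
    "2017-06-02 21:40:00",
    "2017-06-03 22:10:00",
    "2017-06-04 12:00:00",
])
peak_hour = max(profile, key=profile.get)  # the consumer's most active hour
```

A platform could aggregate such profiles per user level or per sector to schedule customer-service staffing or promotion timing around the observed peaks.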

Conclusions
In this paper, we have studied the online consumer behavior of negative reviewing from a systematic, user-differentiated perspective. Our analysis dimensions range from the temporal and perceptional to the emotional and, to some extent, offer a comprehensive understanding of online consumer complaint behavior compared with prior literature, which mostly lacks such richness of analysis dimensions. We have found circadian rhythms in negative reviewing that can be related to the time consistency of people's daily activities. Moreover, the detailed causes leading to negative reviews and the expression habits across user levels are profiled in detail, and differences in users' perceptions of negative reviews and in their emphasis on services are revealed. In addition, users from lower levels express more intensive negativity, but those from upper levels demonstrate stronger momentum in negative emotions. Our findings demonstrate the feasibility and value of segmenting online users by user level and mining their multidimensional behavioral patterns. For e-commerce practitioners, these findings and insights can help platforms redesign and improve their marketing strategies.
In the present work, due to limitations of our dataset, the geographic location of consumers and the applicability of the corresponding findings to other platforms are not fully explored; these remain promising directions for future research.

Conflicts of Interest:
The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript or in the decision to publish the results.

Abbreviations
The following abbreviations are used in this manuscript:

B2C
business-to-consumer
CRM	consumer relationship management
WOM	word-of-mouth
LDA	latent Dirichlet allocation
CBOW	continuous bag-of-words
RC time	review-creation time
PP time	product-purchase time

Appendix A
Table A1. A brief description of the attributes of our dataset.

Review (text). Description: text that consumers post on an e-commerce platform expressing their attitude toward and evaluation of goods; it usually comes with a numerical score or photos of the goods. Snippet: 'The outer casing is severely deformed, the cooling effect is poor, and the noise is large'.
Score of review (int). Description: a numerical score on goods from buyers, representing the consumer's overall evaluation, from the lowest score of one for a poor experience to the highest score of five for a good one. Snippet: a review about clothing that receives a score of five is 'It fits me well, and the seller is really patient'.
Negative review (text). Description: reviews that receive the lowest score of 1, often representing unpleasant purchase experiences. Snippet: 'The outer casing is severely deformed, the cooling effect is poor, and the noise is large'.
Sector (text, with a unique code). Description: a product's category on the e-commerce platform, formulated according to the function or features of the goods; it is usually retrieved from the platform itself.
User nickname (text). Description: the consumer's displayed name. Snippet: an anonymous user in our dataset contains '***', indicating that the platform encrypts the user's nickname; an identified user is, e.g., buaa_jduser.
usefulCount (int). Description: the number of helpful votes a review had received from other users by the time we collected the data from the JD platform. Snippet: the review 'The outer casing is severely deformed, the cooling effect is poor, and the noise is large' received a usefulCount of five.
replyCount (int). Description: the number of replies a review had received from the seller by the time we collected the data from the JD platform. Snippet: the review 'The outer casing is severely deformed, the cooling effect is poor, and the noise is large' received a replyCount of five.
After-use review (text). Description: a review that a user posts after the first review on a certain product, often containing updated information or attitudes. Snippet: 'After using it for a while, I feel that my skin is really getting better'.
User client (text, with a unique code). Description: the client type used when a user logs in, e.g., iPhone, Android, WeChat, iPad or Web Page.

Figure A1. Distribution of review-creation time and product-purchase time on an hourly scale. All data represent negative reviews. The x-axis is the hour of day, from 0 to 23, and the y-axis is the proportion of consumers that purchase or post negative reviews at the corresponding hour.
Compared with the two sectors in Figure A4, it can be concluded that Phone and Accessories are products with strong personal features, like Clothing, whereas peaks in the Gifts and Flowers sector often appear at 10-12, 16 and 21-22 o'clock, which is often delivery time.
Figure A6. Parameter selection. (a-c) are curves for the main reasons of logistics, customer service and marketing, respectively. The x-axis is the number of topics for model training, and the y-axis is the value of the multipliers. This curve was not the only basis for parameter selection; the topic content and topic number were also considered.
Figure A7. r_c curves for all five user levels. The x-axis shows N, the number of most similar words chosen to form the network, and the y-axis shows the value of r_c; the five differently colored curves represent the five user levels. For the selection of N, we aimed at the point at or near the largest second derivative of the curve, which captures the most effective or representative part contained in the network; therefore, N = 4 was chosen, with N = 5, 6, 7 used to verify robustness.
Figure A8. Semantic similarity among the five user levels under the bootstrapping method (N = 5, 6, 7). The similarity is measured through the Kendall correlation coefficient of 500 randomly selected words, repeated 1000 times. The x-axis and y-axis represent the five user levels; the number in each patch is the average similarity, with the standard deviation in brackets behind it, and darker colors indicate greater similarity between two levels. These outcomes also illustrate the reliability of the semantic similarity among user levels.
The five groups, from left to right, refer to the user levels copper, silver, golden, diamond and PLUS. The absolute value of the y-axis is the percentage of emotional words, and the error bars are sample standard deviations. As the four bar plots show, the same tendency appears in the negative degree and is regular in the positive degree in 2015-2017; comparing consecutive years suggests that JD has emphasized paid users' experience more and provided more rights or guarantees for them in recent years.
The x-axis represents the four sectors in our dataset, and the y-axis is the continuous probability; the continuous probability for RC time is often larger than that for PP time.
Figure A11. Proportion of emotion shifts in all reviews and in reviews with different scores, taking the Computer sector as an example. The x-axis shows the three types of emotion shifts defined above, and the heights of the differently colored bars refer to the percentage of each kind of emotion shift in all after-use comments. The results from the other sectors are the same.
Figure A12. Average lag, in hours, of the five user levels' emotion shifts in all negative reviews, taking the Computer sector as an example. The x-axis shows the five user levels' emotion shifts, and the bar heights refer to the average lag of each user level's emotion shifts among after-use comments with initial scores equal to 1. The results from the other sectors are the same.
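The bootstrap similarity measure behind Figure A8 can be sketched as follows. This is a minimal re-implementation under stated assumptions, not the authors' code: `scores_a` and `scores_b` are hypothetical per-level word scores (for instance, values derived from level-specific CBOW embeddings), and the 500-word / 1000-round sampling is scaled down in the toy check.

```python
import random
from itertools import combinations

def kendall_tau(xs, ys):
    """Plain Kendall rank correlation between two equal-length score lists."""
    concordant = discordant = 0
    for i, j in combinations(range(len(xs)), 2):
        s = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    return (concordant - discordant) / (concordant + discordant)

def bootstrap_similarity(scores_a, scores_b, vocab,
                         n_words=500, n_rounds=1000, seed=0):
    """Average Kendall correlation over repeated random word samples,
    mirroring the 500-word / 1000-repetition procedure of Figure A8."""
    rng = random.Random(seed)
    taus = []
    for _ in range(n_rounds):
        words = rng.sample(vocab, min(n_words, len(vocab)))
        taus.append(kendall_tau([scores_a[w] for w in words],
                                [scores_b[w] for w in words]))
    return sum(taus) / len(taus)

# Toy check: a user level compared with itself is perfectly similar.
vocab = [f"w{i}" for i in range(20)]
level_a = {w: i for i, w in enumerate(vocab)}
sim_self = bootstrap_similarity(level_a, level_a, vocab,
                                n_words=10, n_rounds=50)
```

Averaging over many random subsamples, as in the figure, also yields the standard deviation reported in brackets, which indicates how stable the similarity estimate is.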
Note that the upward trend of lags with increasing user level is statistically significant under a one-sided t-test, which again supports the momentum of negative emotion.
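The average-lag comparison in Figure A12 reduces to a simple grouped mean. A minimal sketch, assuming each observed emotion shift has been reduced to a (user level, lag in hours) pair; the toy values are illustrative only and merely echo the reported trend.

```python
from collections import defaultdict

def average_lag_by_level(shifts):
    """Average lag (in hours) between the initial negative review and its
    after-use follow-up, grouped by user level.

    `shifts` is a list of (user_level, lag_hours) pairs, one per emotion
    shift observed in after-use comments with an initial score of 1.
    """
    sums, counts = defaultdict(float), defaultdict(int)
    for level, lag in shifts:
        sums[level] += lag
        counts[level] += 1
    return {level: sums[level] / counts[level] for level in sums}

# Illustrative values only, echoing the reported trend that upper-level
# users hold their negative emotion longer (a larger average lag).
lags = average_lag_by_level([
    ("copper", 24), ("copper", 36),
    ("PLUS", 96), ("PLUS", 120),
])
```

In practice, the per-level means would be compared with a significance test, as in the one-sided t-test reported above, before concluding that the lags truly increase with user level.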