Text Mining Based Approach for Customer Sentiment and Product Competitiveness Using Composite Online Review Data

Abstract: We aimed to provide a realistic portrayal of customer sentiment and product competitiveness, and to inspire businesses to optimise their products and enhance their services. This paper uses 11,919 pairs of real composite reviews as a corpus to examine customer sentiment and product competitiveness. The research combines TF-IDF text mining with a time-stage division obtained through k-means clustering. The study identified ‘quality’, ‘taste’, ‘appearance packaging’, ‘logistics’, ‘prices’, ‘service’, ‘evaluations’, and ‘customer loyalty’ as the commodity dimensions that customers are most concerned about. These dimensions should therefore serve as the primary entry point for improving the commodity and understanding customers. The analysis shows that composite reviews can be divided into three time stages and that customer sentiment can become increasingly negative over time. Product competitiveness based on the composite reviews can be characterised by four quadrants, namely ‘Advantage Area’, ‘Struggle Area’, ‘Opportunity Area’, and ‘Waiting Area’, and merchants can target these areas to improve product competitiveness according to the distribution of the dimensions. In the future, customer demographics could also be taken into account in order to gain a deeper understanding of the customer base.


Introduction
Online shopping has become a primary method of shopping for customers. Customer reviews provide insight into the quality of purchased goods and the customer's overall experience. They are a significant source of information for potential customers and exert a considerable influence on purchasing decisions [1]. In the absence of clear and reliable information, potential customers tend to rely on prior customers' feedback and product reviews to reduce the decision risk associated with information uncertainty in online shopping [2]. The customer base is heterogeneous, comprising individuals with varying personal characteristics, levels of knowledge, and economic status [3,4]. Furthermore, customers' shopping interests and preferences are dynamic and may change over time. The academic community has studied customer behaviour with a particular focus on the dynamic nature of customer interests. This focus is motivated by the differences between customers' demand preferences and their actual purchases, as well as by emotional drift, in which the distribution of emotions shifts over time. These variables can be managed with an appropriate approach. In a highly competitive market, it is important to identify customers' changing interests and emotional trends accurately. Doing so enables businesses to enhance their products and services, strengthen their core competitiveness, and increase customer satisfaction and purchase conversions. The objective of this study is to provide a realistic portrayal of customer sentiment and product competitiveness, and thereby to motivate merchants to optimise their products, improve their service and increase purchase conversion rates. To this end, the paper examines the textual differences between initial and additional reviews and performs sentiment analysis on composite review data for fresh fruit on e-commerce platforms. The analysis identifies the dimensions of customer concern and the competitive advantages of the products, which yields clear management insights.

Composite Online Reviews and Sentiment Analysis
The composite review function, which has been implemented by numerous e-commerce platforms, has significantly enhanced the review system. It enables customers to submit two reviews of the same shopping experience: an initial review and an additional review. 'Initial reviews' are product reviews provided by customers after they have received the product. In contrast, 'additional reviews' are product reviews provided by customers who have used the product for a certain period of time after completing the initial review. The additional review provides an accurate and comprehensive representation of the product's features and attributes. It contains valuable information [5], which has attracted the attention of numerous scholars [6,7]. Zhao et al. investigated the impact of management responses to online reviews on customer satisfaction, based on additional reviews [8]. Chen et al. examined the interaction effect of online composite reviews and time pressure on customer information persuasiveness [9]. Shi et al. found that additional reviews differed from initial reviews in terms of content length, number, and affective intensity. Furthermore, the additional reviews indicated that the customer's perception of the product evolved over time, exhibiting a more nuanced attitude in the later stages of the customer experience [10]. Wang et al. compared the two types of reviews in terms of their perceived usefulness to customers [11]. Zhang et al. examined the influence of changes in the emotional valence of composite reviews on customers' willingness to adopt information [12]. Zhou et al. investigated the impact of additional reviews on customer information adoption and demonstrated that combinations of reviews with different emotional tendencies can affect customer purchase decisions differently [13]. Other findings indicate that the direct sales of goods can be influenced by additional reviews, including the number of reviews and the time interval between them [14]. Sentiment analysis of customer reviews has thus become a subject of considerable interest to scholars [15-17], which highlights the importance of additional reviews in understanding customer purchase behaviour. Current approaches to sentiment analysis follow several paths, including those based on sentiment dictionaries, rules and machine learning. Among these methods, the sentiment dictionary-based approach is notable for its simplicity and intuitiveness, and it is suitable when complex models and large amounts of data are not available. It is also easy to extend, which has contributed to its widespread adoption by scholars in the field of sentiment analysis. This study therefore employs a sentiment dictionary-based method for sentiment analysis.
In summary, the majority of research in this field compares the value of initial and additional reviews, emphasising their impact on customers' decision-making behaviours, including temporal, emotional, and textual characteristics. Both merchants and researchers have also focused on the length and number of reviews. However, previous studies have not examined the temporal changes in customer preferences and product competitiveness. Additional reviews can provide valuable insights, and there is a time-series dependence effect on customers' purchasing decisions. Because initial and additional reviews vary over time, they can help merchants to depict changes in customers' attitudes and interest drift accurately. This is significant for identifying customer preferences, improving product competitiveness, and increasing the conversion rate of customer purchases.

Customer Interest Mapping
Online reviews can be anonymous, massive, and disparate, which places an information burden on customers who need to find content of interest quickly. Customers typically rely on a small subset of reviews, usually between five and ten, when making shopping decisions [18]. Consequently, content-based recommender systems have emerged as a valuable tool for enhancing customer loyalty and conversion rates. They do so by building a customer interest model that represents customer preferences and by actively recommending information of interest to customers [19]. Current e-commerce research focuses on how to represent customers' interests [20,21]. Methods for representing customer interest include the keyword list method [22], ontological representation [23], topic representation [24], and vector space model representation [25]. The vector space model-based representation is the most commonly used because of its strong computability and operability [25]. Previous research has predominantly concentrated on constructing customer preference models [26], designing recommendation algorithms [27], predicting customer interest [28] and detecting interest drift [29]. However, this body of work does not distinguish between customers' interests in different time stages and lacks a comprehensive representation of how those interests are distributed and how they vary over time. This paper divides customer interest into three time stages: same-day, short-term, and long-term. The vector space model is employed to describe the customer's preferences at each time stage, and both qualitative and quantitative measurements of the specific interest value are provided. A higher interest value indicates that the customer is more interested in the information and that the correlation between the customer and that information is stronger. Consequently, merchants can discern customers' interests during different time stages, which allows them to provide a more tailored service and achieve mutual benefit.

Review-Based Product Competitiveness Analysis
Product competitiveness is defined as the combined capabilities and relative competitive advantages demonstrated by two or more products in competition [30,31]. Merchants who can identify the characteristic dimensions of customer concerns from feedback and formulate effective marketing strategies will gain a competitive advantage in the market. Previous research has sought to ascertain how commodities can gain such an advantage by analysing customer feedback. Xu et al. developed a graphical model to visualise and compare the relationship between products and customer reviews using data from the Amazon e-commerce platform. This approach assists enterprises in identifying potential crises and implementing risk management strategies [32]. Wu et al. presented a framework based on hybrid text mining and visualisation tools for competitiveness analysis and recommendations for product improvement [33]. Kim et al. applied text mining and sentiment analysis to social media data from Twitter and Facebook to compare the relationship between social opinion and merchandise sales performance [34]. The objective of that research was to predict market performance, evaluate the advantages of competing products, and make timely adjustments to market strategies. Xiao et al. conducted a sentiment analysis of online reviews on product characteristics, which enabled companies to identify primary and secondary competitors [35]. Xu et al. identified the key factors that lead to customer satisfaction or dissatisfaction based on online reviews of hotels, providing companies with insights to support their marketing activities [36]. Zhao et al. proposed a product competitiveness model based on mutual information and quantile regression [37]. The model integrates multiple online sources in order to analyse product competitiveness.
Previous studies have predominantly focused on sentiment analysis and text mining of online reviews from the perspective of customer satisfaction. However, few studies have attempted to analyse product competitiveness based on the drift of customer interests and preferences. The contribution of this paper is to use a real-world corpus of composite reviews and to combine text mining techniques such as TF-IDF with k-means clustering in order to analyse customer sentiment and product competitiveness. The approach uses time-stage segmentation to identify shifts in customer interests and preferences over time, while also monitoring changes in customer sentiment and offering insights for enhancing product competitiveness. In this way it extends the depth and breadth of existing research.

Data Collection and Pre-Processing
This paper analyses online composite reviews of fresh fruit products on the Jingdong e-commerce platform, a prominent platform in the Chinese e-commerce landscape. The research dataset comprises online reviews of fresh fruit products delivered by Jingdong Logistics from January 2019 to January 2022, collected using Python. The products cover five representative categories of fresh fruit. A total of 17,319 pairs of composite reviews were collected, including customer names, product IDs, initial review times and content, additional review times and content, and other information. For data pre-processing, this study uses the Python drop_duplicates method for filtering and deduplication, with the objective of eliminating duplicates and records with missing fields. The process yielded 11,919 pairs of valid reviews. The valid reviews are then segmented into words with Jieba, and stop words are removed using the HIT stop word list. This yields the corpus used for the subsequent textual and sentiment analysis.
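The pre-processing steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: it uses only the standard library as a stand-in for pandas `drop_duplicates`, the field names in `REQUIRED` are hypothetical, the stop word set is a placeholder for the HIT list, and `str.split` stands in for a Chinese word segmenter such as `jieba.lcut`.

```python
def clean_reviews(rows, required, stop_words, tokenize):
    """Drop incomplete rows and exact duplicates, then tokenise and remove stop words."""
    seen = set()
    cleaned = []
    for row in rows:
        # Skip rows in which a required field is missing or empty.
        if any(not row.get(field) for field in required):
            continue
        key = tuple(row[field] for field in required)
        if key in seen:  # exact duplicate of an earlier row
            continue
        seen.add(key)
        tokens = [t for t in tokenize(row["initial_text"]) if t not in stop_words]
        cleaned.append({**row, "initial_tokens": tokens})
    return cleaned


# Hypothetical field names and a placeholder stop word list.
REQUIRED = ("customer", "product_id", "initial_text", "additional_text")
STOP_WORDS = {"the", "is", "and"}

rows = [
    {"customer": "a", "product_id": "p1", "initial_text": "the apple is fresh", "additional_text": "sweet"},
    {"customer": "a", "product_id": "p1", "initial_text": "the apple is fresh", "additional_text": "sweet"},
    {"customer": "b", "product_id": "p2", "initial_text": "fast delivery", "additional_text": ""},
]

# str.split stands in for a Chinese segmenter such as jieba.lcut.
result = clean_reviews(rows, REQUIRED, STOP_WORDS, str.split)
```

In this toy input the second row is a duplicate and the third has an empty additional review, so only one review pair survives the filter.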

Time-Stage Processing
The literature indicates a temporal pattern in customer reviewing behaviour [13]. By understanding customers' behavioural patterns during different time stages, merchants can identify customer demand preferences and review motivations. This allows enterprise resources to be distributed reasonably and satisfactory customer service to be provided. This research characterises the distribution of the time interval sequence between the initial and additional reviews, as well as the number of additional reviews. The time interval is calculated for each composite review, and the analysis explores customer demand preferences and sentiment trends over different time stages. The optimal number of clusters k is determined using bisecting k-means and the elbow method [37-39]. Equation (1) gives the within-cluster sum of squared errors (SSE), which reflects the cohesion of the clusters.

$$\mathrm{SSE} = \sum_{i=1}^{k} \sum_{x \in C_i} \lVert x - c_i \rVert^{2} \tag{1}$$

In this context, k is the number of clusters, $C_i$ represents the ith cluster, $c_i$ represents the centre of cluster $C_i$, and x represents a sample from that cluster.

The silhouette coefficient is calculated for each point in the dataset, and the average value is used as the overall silhouette coefficient of the current clustering. The coefficient ranges between −1 and 1, and a higher value indicates better clustering performance. Equation (2) gives the silhouette coefficient for point i.

$$s(i) = \frac{b(i) - a(i)}{\max\{a(i),\, b(i)\}} \tag{2}$$

In this equation, a(i) represents the average distance between vector i and the other points in the same cluster, while b(i) represents the minimum, over all other clusters, of the average distance between vector i and the points in that cluster.
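The selection of k from the SSE and the silhouette coefficient can be sketched on one-dimensional time intervals as below. This is an illustrative sketch only: it uses plain Lloyd's k-means with a deterministic quantile initialisation rather than the bisecting variant used in the paper, and the `intervals` list is invented toy data, not the study's dataset.

```python
def kmeans_1d(values, k, iters=100):
    """Plain Lloyd's k-means on 1-D data with deterministic quantile initialisation."""
    data = sorted(values)
    centres = [data[(2 * j + 1) * len(data) // (2 * k)] for j in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for x in data:
            nearest = min(range(k), key=lambda j: abs(x - centres[j]))
            clusters[nearest].append(x)
        # An empty cluster keeps its previous centre.
        new_centres = [sum(c) / len(c) if c else centres[j] for j, c in enumerate(clusters)]
        if new_centres == centres:
            break
        centres = new_centres
    return clusters, centres


def sse(clusters, centres):
    """Within-cluster sum of squared errors."""
    return sum((x - c) ** 2 for cluster, c in zip(clusters, centres) for x in cluster)


def mean_silhouette(clusters):
    """Average silhouette coefficient over all points."""
    scores = []
    for ci, cluster in enumerate(clusters):
        for idx, x in enumerate(cluster):
            if len(cluster) == 1:
                scores.append(0.0)  # common convention for singleton clusters
                continue
            a = sum(abs(x - y) for j, y in enumerate(cluster) if j != idx) / (len(cluster) - 1)
            b = min(sum(abs(x - y) for y in other) / len(other)
                    for cj, other in enumerate(clusters) if cj != ci and other)
            denom = max(a, b)
            scores.append((b - a) / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy intervals in days between initial and additional reviews (invented data).
intervals = [0, 0, 1, 1, 10, 10, 11, 11, 100, 100, 101, 101]
results = {}
for k in range(2, 6):
    clusters, centres = kmeans_1d(intervals, k)
    results[k] = (sse(clusters, centres), mean_silhouette(clusters))
```

Plotting the SSE values in `results` against k gives the elbow curve, and the silhouette values give a second check on the chosen k.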

Calculation of Text Feature Weights
Many algorithms exist for text feature weighting, the most prevalent being term frequency-inverse document frequency (TF-IDF), information gain, and the chi-square test. Among these techniques, TF-IDF captures the importance of individual words within a corpus of documents and is relatively straightforward to compute and interpret. It also reduces the impact of common words on document similarity, thereby making the key information in a document more visible. Consequently, TF-IDF is regarded as a general and efficient text feature weighting algorithm for assessing the importance of a word in a particular document relative to the corpus. The more frequently a word occurs in a document and the less frequently it occurs in other documents in the corpus, the greater the contribution of that word. In this paper, we use TF-IDF to calculate the importance of the service feature dimensions. The main calculation formulae are shown in Equations (3)-(5), where N denotes a frequency of occurrence; $N(t_j, i)$ is the number of times word $t_j$ appears in document i; $S_i$ is the total number of words in document i; m is the total number of documents; and $N(t_j)$ is the number of documents in the corpus that contain word $t_j$.
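For reference, a common form of the TF-IDF weights that is consistent with the symbols defined above can be written as follows. The unsmoothed logarithmic IDF is an assumption here, since variants with additive smoothing in the denominator are also widely used.

```latex
\mathrm{TF}(t_j, i) = \frac{N(t_j, i)}{S_i}, \qquad
\mathrm{IDF}(t_j) = \log \frac{m}{N(t_j)}, \qquad
\mathrm{TFIDF}(t_j, i) = \mathrm{TF}(t_j, i) \times \mathrm{IDF}(t_j)
```

Under this form, a word that appears in every document has $N(t_j) = m$ and therefore an IDF of zero, which is how common words are down-weighted.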
The matrix W, which represents the summed importance of each word in the corpus, is given by Equation (6). The data are normalised through a linear function, as shown in Equation (7), where $\sum_{i=1}^{m} P_{i1}$ represents the sum of the importance of the words in the corpus and $W_i$ is the value in the ith row and first column of the matrix W, used to differentiate the weights of the various dimensions.
To obtain customers' attention $I^{(r)}$ to dimension r in the overall evaluation, the focus is placed on indicators with larger weights. Equation (8) accumulates the weights of the words that belong to the same feature dimension. A greater frequency of keywords on the same page indicates a higher level of discussion. However, manually screening words into feature dimensions is subjective, and this bias can affect the accuracy of the summarised word set. To address this issue, the actual word frequency of each feature dimension is recalculated by combining the number of words, the word frequency sum, and the weight sum of the feature dimension word set. The actual word frequency $H^{(r)}$ is the average number of occurrences of each word of the same dimension, as shown in Equation (9).
In these equations, $W_i^{*}$ represents the weight of the word $t_i$ in the corpus; $j \in v_r$ indexes the set of words $t_j$ belonging to dimension r; $\sum_{i=1}^{m} N(t_j, i)$ represents the frequency of word $t_j$ in the corpus; and $L^{(r)}$ represents the number of words associated with dimension r.
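The attention degree and actual word frequency described above can be sketched as follows. This is one plausible reading of the definitions, not the authors' code: per-word weights are taken as TF-IDF summed over documents with an unsmoothed IDF, normalisation divides by the total weight, and the documents and dimension word sets are invented for illustration.

```python
import math
from collections import Counter


def tfidf_weights(docs):
    """Sum TF-IDF over all documents for each word (unsmoothed IDF, an assumption)."""
    m = len(docs)
    doc_freq = Counter(word for doc in docs for word in set(doc))
    weights = Counter()
    for doc in docs:
        for word, n in Counter(doc).items():
            weights[word] += (n / len(doc)) * math.log(m / doc_freq[word])
    return weights


def dimension_stats(docs, dimensions):
    """Return {dimension: (attention degree, actual word frequency)}."""
    weights = tfidf_weights(docs)
    total = sum(weights.values())
    corpus_freq = Counter(word for doc in docs for word in doc)
    stats = {}
    for name, words in dimensions.items():
        # Attention degree: accumulated normalised weight of the dimension's words.
        attention = sum(weights[w] for w in words) / total
        # Actual word frequency: mean corpus frequency per word of the dimension.
        actual_freq = sum(corpus_freq[w] for w in words) / len(words)
        stats[name] = (attention, actual_freq)
    return stats


# Invented tokenised reviews and hypothetical dimension word sets.
docs = [["fresh", "sweet", "fast"], ["sweet", "juicy"], ["fast", "delivery", "fast"]]
dimensions = {"taste": ["sweet", "juicy"], "logistics": ["fast", "delivery"]}
stats = dimension_stats(docs, dimensions)
```

Dividing the frequency sum by the number of words in the dimension keeps a dimension with many rarely used words from outranking one with a few heavily used words.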

Calculation of Emotional Intensity
This paper uses the Chinese emotion vocabulary of Dalian University of Technology, which is classified into seven emotion categories based on emotion attributes: joy, good, sadness, surprise, fear, disgust and anger, comprising a total of 27,079 emotion words [40,41].
Customer reviews can be difficult to interpret from emotion vocabulary alone, because negation words and degree adverbs are often used to express emotional tendencies. Negation words and degree adverbs have a significant impact on the emotional tone and intensity of a review. To capture the customer's emotional tendency accurately, we define the intensity value by constructing a negation dictionary and a degree adverb dictionary.
Sentiment scores are calculated according to the following rules. If only sentiment words are present, the sentiment intensity value is used directly. If negation words are present, the sentiment tendency is reversed for an odd number of negation words and left unchanged for an even number. If degree adverbs are present, the sentiment intensity value is multiplied by the degree adverb weight, i.e., $E = d_i Q_g$, $i = 1, 2, 3, 4$, where $d_i$ is the degree adverb weight and $Q_g$ is the sentiment intensity value.
When a negation word is combined with a degree adverb in the format 'degree adverb + negation word', the affective tendency is reversed while the effect of the degree adverb is retained, i.e., $E = (-1)\, d_i Q_g$. Similarly, if the format is 'negation word + degree adverb', the affective tendency is reversed and scaled by 0.5, again taking into account the effect of the degree adverb, i.e., $E = (-1) \times 0.5\, d_i Q_g$. The sentiment score of each review is calculated by weighting the lexicon words it contains according to their number. Table 1 lists the lexicons and their corresponding sentiment intensity values.

In conclusion, the review data were validated by removing duplicates from the composite data set. The k-means algorithm was then employed to perform the clustering analysis and complete the division of the review time stages. The TF-IDF algorithm and sentiment word classification were employed to conduct a competitive analysis based on feature statistics and a sentiment analysis based on sentiment scores, respectively. The ultimate objective of this research was to inspire merchants to optimise their products and improve their services and purchase conversion rates on the basis of these analyses (Figure 1).
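The scoring rules in this section can be sketched as a small function. This is an illustrative reading, not the authors' implementation: the 'negation word + degree adverb' rule is interpreted as a reversal scaled by 0.5, and the word lists and weights in `NEGATIONS` and `DEGREE_WEIGHTS` are hypothetical English placeholders for the Chinese dictionaries.

```python
def phrase_score(intensity, modifiers, degree_weights, negations):
    """Score one sentiment word given the modifier words that precede it, in order.

    intensity: signed lexicon intensity of the sentiment word.
    modifiers: preceding words, which may include negations and degree adverbs.
    """
    neg_positions = [i for i, w in enumerate(modifiers) if w in negations]
    deg_positions = [i for i, w in enumerate(modifiers) if w in degree_weights]

    score = intensity
    # Degree adverbs scale the intensity.
    for i in deg_positions:
        score *= degree_weights[modifiers[i]]
    # An odd number of negations reverses the tendency; an even number cancels out.
    if len(neg_positions) % 2 == 1:
        score = -score
        # 'negation + degree adverb': the reversal is scaled by 0.5 (an interpretation).
        if deg_positions and neg_positions[0] < deg_positions[0]:
            score *= 0.5
    return score


# Hypothetical placeholder dictionaries.
NEGATIONS = {"not", "never"}
DEGREE_WEIGHTS = {"very": 1.5, "slightly": 0.5}

plain = phrase_score(2.0, [], DEGREE_WEIGHTS, NEGATIONS)
negated = phrase_score(2.0, ["not"], DEGREE_WEIGHTS, NEGATIONS)
double_negated = phrase_score(2.0, ["never", "not"], DEGREE_WEIGHTS, NEGATIONS)
intensified = phrase_score(2.0, ["very"], DEGREE_WEIGHTS, NEGATIONS)
degree_then_negation = phrase_score(2.0, ["very", "not"], DEGREE_WEIGHTS, NEGATIONS)
negation_then_degree = phrase_score(2.0, ["not", "very"], DEGREE_WEIGHTS, NEGATIONS)
```

A review-level score would then aggregate these phrase scores over all sentiment words found in the review.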

Table 1 (fragment). Degree adverb level 'Marginally': slightly, a little, somewhat, only, ……

In this paper, we present a list of high-frequency words and their frequencies in online reviews (Table 2). High-frequency words occur markedly more often in the initial reviews than in the additional reviews. The initial reviews exhibit the highest word frequency of 2625, with 14 words having a frequency greater than 1000. In contrast, the highest word frequency in the additional reviews is 2040, with only six words having a frequency greater than 1000. The additional reviews emphasise fresh food characteristics, using high-frequency words such as 'delicious', 'fresh', 'texture', and 'flavour'. Both the initial and additional reviews reflect what customers pay attention to, although there are significant differences. The two types of review share 76% of their high-frequency words, but the distribution and ranking of these words are inconsistent. The initial reviews highlight words such as 'evaluation', 'Jingdong', 'customer', 'content', and 'unfilled', while the additional reviews focus on terms such as 'delicious', 'apple', 'Jingdong', 'very', and 'good'. Online composite reviews are shaped by customers' varying demands and preferences, which are reflected in the different review types. This highlights the diversity of the customer base in terms of review dimensions.

Our research identified eight feature dimensions that customers consider when purchasing fresh products: quality, taste, appearance packaging, logistics, prices, service, evaluations, and customer loyalty (Table 3). To improve these commodities and better understand customers, we recommend focusing on these dimensions. The diversity of high-frequency words in the additional reviews may indicate differences in the degree to which the product fulfils customers' needs.

In this paper, we used bisecting k-means clustering to partition the review data into time stages. The range of k was set to [2, 10], and we ran the k-means algorithm to obtain the within-cluster SSE for each k. This allowed us to characterise the distribution of the 'initial review-additional review' time interval sequence and the number of additional reviews. The clustering results (Figure 2) show a marked improvement in distortion at the elbow point k = 3, with a silhouette coefficient of 0. Based on the silhouette coefficient evaluation, the temporal clustering is divided into three stages: same-day (0 days), short-term (1-4 days), and long-term (5-180 days). The corresponding numbers of reviews are 6955, 2441, and 2524, respectively. The fact that most reviews are posted on the day of receipt is consistent with customer behaviour: the majority of customers who post a review evaluate the product immediately after receiving it. The comparable number of reviews in the short-term and long-term stages, on the other hand, may be related to the specific type of product purchased and the duration of its use.
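The three time stages reported above translate into a simple rule on the number of days between the initial and the additional review. The sketch below is illustrative: the function name and stage labels are hypothetical, and intervals above 180 days are simply treated as long-term because the text does not say how they are handled.

```python
from datetime import date


def time_stage(initial_date, additional_date):
    """Map the gap between two review dates to a time stage label."""
    days = (additional_date - initial_date).days
    if days < 0:
        raise ValueError("additional review precedes initial review")
    if days == 0:
        return "same-day"
    if days <= 4:
        return "short-term"
    return "long-term"  # 5 days or more


stage_same = time_stage(date(2021, 3, 1), date(2021, 3, 1))
stage_short = time_stage(date(2021, 3, 1), date(2021, 3, 5))
stage_long = time_stage(date(2021, 3, 1), date(2021, 3, 6))
```

Applying this function to every review pair and counting the labels gives the per-stage review counts.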

Textual Characterisation of Reviews
The results of the cluster analysis presented above provide the foundation for this analysis. To identify customer preferences and differences in composite reviews, the review set is divided into three time stages. The high-frequency word sets of the feature dimensions are extracted separately for each time stage, and the screened high-frequency words are drawn from the vocabulary in Table 3. The TF-IDF-based text feature weighting algorithm (F1 = 0.930, Precision = 0.933) is used to calculate the number of words, sum of word frequencies, attention degree, and actual word frequency for each feature dimension across the review types (Table 4). In this study, the accumulated weight of the word set of a feature dimension is defined as the attention degree, which reflects how much attention customers pay to each dimension of the product. The actual word frequency of dimension keywords appearing on the same review page is defined as the discussion degree, which reflects how much customers discuss each dimension. Attention degree and discussion degree are important indicators for depicting the competitiveness of the dimensions discussed later in the text. Grouping words into feature dimensions significantly improves the discrimination of the word-frequency ranking, as shown by comparing the sum of word frequencies with the actual word frequencies. The data indicate that logistics and distribution were the most frequently discussed topics, with a greater variety of related words. Nevertheless, their actual word frequency was not higher than that of taste and flavour.

The statistics of the most dominant feature dimensions in different periods are shown in Table 5. The comparison reveals that the dominant feature dimensions in the two types of reviews change with the review period. In the initial reviews, the advantageous feature dimensions shift from logistics and customer loyalty to longer-term feature dimensions such as taste, prices and evaluations. As the number of advantageous feature dimensions increases, the focus of the reviews shifts from logistics, service and evaluations to long-term quality and prices. This shift is accompanied by a gradual decrease in the number of advantageous feature dimensions. The shift in advantageous dimensions can be attributed to the fact that customers do not yet have a deep experience of the product on the day of receipt. Instead, their reviews mostly convey intuitive impressions, such as logistics. Over time, however, they pay greater attention to the intrinsic nature of fresh fruit and to feedback on problems, such as quality and taste.

This research visualises the distribution of feature dimensions in composite reviews across different time stages in Figure 3. For each period, the percentage of the actual word frequency of each feature dimension in composite reviews is calculated. A higher percentage corresponds to a larger area in the figure. The results show that the proportion of reviews related to the 'quality' dimension fluctuated across the time stages, with percentages of 18.06%, 23.39%, and 17.90%. This indicates that customers prioritise quality in the short term. Furthermore, customer preferences change over time, with different aspects receiving attention during different time stages. Customer preferences also differ clearly between review types, showing a pattern of succession and complementarity. For instance, the 'quality' dimension increases by 22.81% in the initial review, but decreases by 20.85% in the additional review. The initial review focuses on 'service' and 'appearance packaging', while the additional review covers 'taste', 'flavour', 'appearance packaging', and 'customer loyalty'. Together, these dimensions show customer preferences across the review types.
This research visualises the distribution of feature dimensions in composite reviews across different time stages in Figure 3. For each period, the percentage of actual word frequency of each feature dimension in composite reviews is calculated. A higher percentage of actual word frequency corresponds to a larger area range. The results show that the proportion of reviews related to the 'quality' dimension fluctuated across the time stages, with percentages of 18.06%, 23.39%, and 17.90%. This indicates that customers prioritise quality in the short term. Furthermore, customer preferences change over time, with the focus falling on different aspects in different time stages. Customer preferences also differ distinctly between review types, showing a clear pattern of succession and complementarity. For instance, the dimension of 'quality' increases by 22.81% in the initial review, but decreases by 20.85% in the subsequent review. The initial review focuses on 'service' and 'appearance packaging', while the additional review covers 'taste', 'flavour', 'appearance packaging', and 'customer loyalty'. These dimensions collectively demonstrate customer preferences across the various types of reviews.
The paper employs a vector space model to illustrate the distribution of customers' interests across different time stages of composite reviews. The five most frequently occurring words are selected as keywords, and their weights are indicated using the TF-IDF index. This approach effectively distinguishes the differences in customers' interest preferences based on their attention to different types and time stages of reviews, as shown in Table 6.
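The keyword weighting described above can be illustrated with a minimal sketch. It assumes that each time stage is treated as one document, that reviews are already tokenised, and that a smoothed IDF of the form ln((1 + N) / (1 + df)) + 1 is used; the paper does not state its exact TF-IDF variant, and the function name `stage_keywords` and the toy tokens are hypothetical.

```python
import math
from collections import Counter


def stage_keywords(tokens_by_stage, k=5):
    """Pick the k most frequent words per stage and weight them by TF-IDF.

    tokens_by_stage maps a stage label to the list of tokens from all
    reviews in that stage. Each stage counts as one document (assumption).
    """
    n_stages = len(tokens_by_stage)

    # Document frequency: the number of stages in which each term occurs.
    doc_freq = Counter()
    for tokens in tokens_by_stage.values():
        doc_freq.update(set(tokens))

    keywords = {}
    for stage, tokens in tokens_by_stage.items():
        counts = Counter(tokens)
        total = sum(counts.values())
        weighted = []
        for term, count in counts.most_common(k):
            tf = count / total
            # Smoothed IDF (assumed variant, not taken from the paper).
            idf = math.log((1 + n_stages) / (1 + doc_freq[term])) + 1.0
            weighted.append((term, round(tf * idf, 4)))
        keywords[stage] = weighted
    return keywords


# Toy tokens invented for illustration only; not data from the study.
toy_corpus = {
    "same-day": ["logistics", "fast", "logistics", "service",
                 "good", "packaging", "fast", "logistics"],
    "short-term": ["quality", "fresh", "quality", "taste",
                   "sweet", "quality", "fresh"],
    "long-term": ["taste", "price", "sweet", "taste",
                  "price", "quality", "taste"],
}

keywords = stage_keywords(toy_corpus)
for stage, terms in keywords.items():
    print(stage, terms)
```

Selecting keywords by raw frequency first and weighting them afterwards follows the order described in the text; ranking directly by TF-IDF weight would be an alternative design that favours stage-specific terms.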

Product Competitiveness Analysis Based on Composite Reviews
The research identifies four regional quadrants: 'Advantage Area', 'Struggle Area', 'Opportunity Area', and 'Waiting Area' (Figure 4). These quadrants characterise product competitiveness based on online reviews and feature dimensions. The 'Advantage Area' contains the feature dimensions that customers frequently discuss and pay the most attention to. It is the most effective in promoting customers' perceived value and determines the core competitive advantage of the business. Conversely, the 'Struggle Area' contains feature dimensions that lack information but receive considerable attention. These feature dimensions provide valuable product information, and businesses may therefore seek to direct their customers to post adequate and high-quality reviews in order to bring them closer to the 'Advantage Area'. The 'Opportunity Area' contains dimensions that customers discuss less enthusiastically and pay less attention to. Consequently, merchants must adjust their resources in real time according to the market dynamics. The 'Waiting Area' contains feature dimensions that are frequently discussed but paid less attention to, and it is a key direction for merchants to improve service competition and seize market opportunities in the future.
The feature dimensions in the initial reviews, such as 'appearance packaging' and 'logistics', are more competitive than those in the additional reviews. This may be attributed to the fact that customers have not yet had the opportunity to fully appreciate the product, with only a superficial understanding of its packaging and logistics features. These factors can initially influence the customer experience. The initial feedback indicated that the 'evaluations' aspect was highly favoured, while the 'quality' aspect was the most concerning. Conversely, the additional feedback was most positive about the 'taste' aspect, but expressed the most concern about the 'evaluations' aspect. An additional review is a characterisation of the customer's experience after using the product for a period of time. This allows the customer to assess the product in terms of its quality and taste, which reflects their original intention and need to purchase the product.
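The quadrant logic described in this section can be sketched as follows. The sketch assumes that each feature dimension carries two scores, a discussion intensity and an attention degree, and that the mean of each score serves as the cut-off between 'high' and 'low'; the paper does not specify the thresholds, so the cut-off rule, the function name `classify_quadrants`, and the placeholder scores are hypothetical.

```python
from statistics import mean


def classify_quadrants(dimensions):
    """Assign each feature dimension to one of the four areas.

    dimensions maps a name to a (discussion, attention) pair. The mean of
    each score is used as the high/low cut-off (an assumption).
    """
    discussion_cut = mean(d for d, _ in dimensions.values())
    attention_cut = mean(a for _, a in dimensions.values())

    labels = {}
    for name, (discussion, attention) in dimensions.items():
        high_discussion = discussion >= discussion_cut
        high_attention = attention >= attention_cut
        if high_discussion and high_attention:
            labels[name] = "Advantage Area"    # much discussed, much attention
        elif high_attention:
            labels[name] = "Struggle Area"     # little information, much attention
        elif high_discussion:
            labels[name] = "Waiting Area"      # much discussed, little attention
        else:
            labels[name] = "Opportunity Area"  # little of both
    return labels


# Placeholder scores invented for illustration; not results from the study.
toy_scores = {
    "dimension_A": (0.9, 0.8),
    "dimension_B": (0.3, 0.7),
    "dimension_C": (0.8, 0.2),
    "dimension_D": (0.2, 0.1),
}

quadrants = classify_quadrants(toy_scores)
for name, area in quadrants.items():
    print(name, "->", area)
```

A mean-based cut-off keeps the quadrants relative to the set of dimensions being compared; a median or a fixed threshold would be equally plausible choices.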

Sentiment Analysis Based on Composite Reviews
Online reviews can significantly impact customer preferences and purchase intentions [7]. This research analyses sentiment value differences across three time stages (same-day, short-term, and long-term) for initial and additional reviews, respectively. The sentiment mean value was then calculated by averaging the sentiment scores of the three stages. The research findings indicate that emotional attitudes in both initial and additional reviews became increasingly negative over time. Although subsequent reviews were more diverse, they still carried negative emotions, resulting in an overall low emotional mean (Figure 5). This emphasises the importance for merchants of prioritising the long-term interests of their customers and of making efforts to enhance the quality, appearance packaging, and taste of their products to obtain positive evaluations. This shift in sentiment, in addition to reflecting the merchant's shortcomings, may also be correlated with a change in the customer's perception of the product. As the customer becomes more aware of the product over time, this has a direct impact on the customer's satisfaction and the sentiment expressed in the review.
Additionally, the sentiment mean value for additional reviews was significantly lower than that of initial reviews. Additional reviews are usually further evaluations of the product experience. The sentiment value of the additional reviews is generally lower than that of the initial reviews, indicating that the product does not fully align with the customer's needs. Negative reviews that appear in both the initial and additional reviews will undoubtedly reduce potential customers' confidence in shopping. Therefore, businesses should maintain close service communication with customers to ensure that they receive an effective solution to their problem after providing a negative evaluation. If an initial review expresses negative emotions, the business should be prompt in appeasing the customer to prevent the deterioration of emotions. Customers who receive a satisfactory service are more likely to provide positive feedback, which can be further incentivised through offers such as coupons. Furthermore, fluctuations in the types of fresh fruits due to seasonal factors may result in seasonal patterns in services related to fresh fruits. The resulting seasonal fluctuations in prices may also lead to an increase in customers' price sensitivity, which in turn affects their review sentiment. The research presented in this paper identifies potential avenues for improvement in the service provided by current fresh food platforms, as customers' emotional attachments to fresh fruits tend to decrease over time. Nevertheless, in the case of fresh food platforms that consistently deliver high-quality products, the emotional intensity of customers does not necessarily decline gradually. Rather, it remains at a high level.
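The stage-wise averaging described in this section can be sketched as follows, assuming each review has already been assigned a numeric sentiment score, a review type, and a time stage. The function name `stage_sentiment_means`, the record layout, and the sample scores are hypothetical and do not reproduce the values behind Figure 5.

```python
from collections import defaultdict
from statistics import mean


def stage_sentiment_means(reviews):
    """Average sentiment per (review type, stage), then per review type.

    reviews is an iterable of (review_type, stage, score) records.
    Returns (stage_means, overall_means).
    """
    grouped = defaultdict(list)
    for review_type, stage, score in reviews:
        grouped[(review_type, stage)].append(score)

    # Mean sentiment score for each review type within each time stage.
    stage_means = {key: mean(scores) for key, scores in grouped.items()}

    # Sentiment mean value per review type: the average of its stage means.
    per_type = defaultdict(list)
    for (review_type, _stage), value in stage_means.items():
        per_type[review_type].append(value)
    overall_means = {t: mean(values) for t, values in per_type.items()}

    return stage_means, overall_means


# Sample scores invented for illustration; not data from the study.
sample_reviews = [
    ("initial", "same-day", 0.75),
    ("initial", "same-day", 0.5),
    ("initial", "short-term", 0.5),
    ("initial", "long-term", 0.25),
    ("additional", "same-day", 0.5),
    ("additional", "short-term", 0.25),
    ("additional", "long-term", 0.0),
]

stage_means, overall_means = stage_sentiment_means(sample_reviews)
for key in sorted(stage_means):
    print(key, stage_means[key])
print(overall_means)
```

Averaging the stage means, rather than pooling all reviews, gives each time stage equal weight regardless of how many reviews it contains; whether the study weights stages this way is an assumption of the sketch.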
J. Theor. Appl. Electron. Commer. Res. 2024, 19, FOR PEER REVIEW

Conclusions
This research employs text mining and sentiment analysis of online reviews of fresh produce on e-commerce platforms to identify shifts in customer preferences and differences in composite reviews at different times. The findings reveal that customers' interests in purchasing fresh produce online are evolving. Merchants should direct their attention to enhancing the eight quality dimensions that customers primarily consider when making online purchases of fresh products. These dimensions include quality, taste, appearance packaging, logistics, prices, service, evaluations, and customer loyalty. This will result in an enhancement of the products in question.
At the same time, assessing the competitiveness of product features in different time stages based on the degree of attention and discussion intensity can assist businesses in adjusting their resources according to customer interests, developing corresponding marketing strategies, and providing accurate and effective information services to their customers. It can be observed that customer preferences exhibit heterogeneity in various types of feedback and at different stages. Furthermore, the level of attention paid by customers to service features and the intensity of discussion may change due to shifts in customer preferences and interests.
The customer's emotional attitude may become increasingly negative over time, which could result in a greater diversity of reviews with a negative sentiment bias. This suggests that the observed change in customer satisfaction with online fresh fruit consumption can be attributed to perceived differences in the consumption experience across various characteristic dimensions. Merchants must maintain close communication with customers in order to ensure effective problem resolution and to prevent negative evaluations. If a customer expresses negative emotions in their first review, the merchant must promptly appease them to maintain positive customer relations and improve satisfaction.
Furthermore, this research has identified that product competitiveness can be analysed based on initial and additional reviews, which can be categorised into four quadrants: 'Advantage Area', 'Struggle Area', 'Opportunity Area', and 'Waiting Area'. The feature dimensions in the 'Advantage Area' serve as the primary indicators of the core competitiveness of the merchant's products. In contrast, the feature dimensions in the 'Struggle Area' require customers to post sufficient and high-quality reviews in order to enhance the core competitiveness of the products. The 'Opportunity Area' represents product dimensions that are less frequently reviewed and discussed. The 'Waiting Area', by comparison, contains feature dimensions that are frequently discussed but receive less attention. Companies should treat these dimensions as a priority for improving their service competitiveness and seizing market opportunities in the future. This research proposes a product competitiveness quadrant that can serve as a framework for businesses seeking to enhance their products and operations.
The present research examined the temporal evolution of sentiment mean values. Subsequent studies should combine the validity and sentiment scores of each dimension in order to refine the analysis. Furthermore, future studies should explore the influence of demographic characteristics and product presentation on customer preferences.

Figure 3. Distribution of feature dimensions of composite reviews in three time stages. (a) Initial reviews. (b) Additional reviews.

Figure 4. Competitiveness analysis of feature dimensions.

Figure 5. Comparison of emotional means in three time stages.

Table 1. Emotional intensity values for selected words.

Table 2. The frequencies of the top 50 high-frequency words.

Table 3. Initial and additional reviews on the high-frequency word sets.

Table 4. Text feature statistics of dimensions.
Note: Initial reviews * and additional reviews * indicate that the feature dimensions of actual word frequencies do not match the ordering of the sum of word frequencies.

Table 5. The most advantageous interest preference statistics in three time stages.