Mining Customer Satisfaction from Online Reviews: An Explainable Kano-Based Framework for Product Improvement

Yu, Huiru; Li, Yanlai

doi:10.3390/systems14050585


by Huiru Yu and Yanlai Li *

Department of Management Science and Engineering, Business School, Liaoning University, Shenyang 110136, China

* Author to whom correspondence should be addressed.

Systems 2026, 14(5), 585; https://doi.org/10.3390/systems14050585

Submission received: 9 April 2026 / Revised: 4 May 2026 / Accepted: 19 May 2026 / Published: 20 May 2026

(This article belongs to the Section Systems Practice in Social Science)


Abstract

Improving customer satisfaction (CS) and gaining competitive advantage are central goals of product improvement, and both rely on accurate classification of product attributes. Online reviews on e-commerce platforms provide firms with abundant customer feedback, but accurately classifying and prioritizing product attributes from this feedback remains challenging. To address this issue, we propose an interpretable Kano model. In this method, the Biterm Topic Model (BTM) is first used to identify product attributes from reviews. Then, the Enhanced BERT Model with Attribute-Aware and Convolutional Mechanisms (BERT-A-Conv) is employed to classify the sentiment category of each attribute. Given the critical role of neutral sentiment, positive, neutral, and negative attribute sentiment (AS) are all incorporated into a Light Gradient Boosting Machine (LightGBM) model to quantify the impact of AS on CS. The Shapley additive explanations (SHAP) method is then used to construct the marginal contribution difference (MCD) between adjacent sentiment categories, which serves as the basis for classifying product attributes into five Kano categories. On this basis, we calculate the attribute improvement priority score (AIPS) by combining each attribute’s MCD and improvement potential, thereby offering firms a systematic analytical framework to support product iteration and improvement. A case study on smartwatches demonstrates the applicability and feasibility of the proposed method.

Keywords:

online reviews; customer satisfaction; sentiment analysis; Kano model; product improvement

1. Introduction

In a rapidly changing business environment, a customer-centered product improvement strategy has become an important means for enterprises to understand customer evaluations and thereby improve customer satisfaction (CS) [1,2]. The effective implementation of this strategy depends on accurately identifying customer sentiment feedback and on scientifically grounded guidance for product improvement [3]. However, traditional data collection methods, such as questionnaires, are increasingly inadequate: they cannot support the real-time mining of massive amounts of unstructured, authentic customer feedback in the big data era [4].

In contrast to traditional survey-based methods, the widespread adoption of e-commerce and social platforms has made online reviews a major medium through which customers express their usage experiences and opinions, as well as an important factor influencing product sales and corporate reputation [5,6,7]. Online review data not only reveal customers’ preferences, behaviors, and subjective perceptions [8], but also provide a dynamic, rich, and real-time source of customer insight into products [9,10], thereby providing strong support for customer-centered targeting and agile product improvement.

Customers’ experiences with products or services often involve multiple dimensions. Therefore, identifying sentiment only at the overall review level may obscure their attitudes toward specific product attributes [11,12,13]. In fact, customers may express distinct or even opposing sentiments toward different attributes within the same review [14,15]. Accurately extracting the attributes customers focus on, together with the fine-grained sentiments expressed toward them, is therefore important for revealing customers’ actual concerns and sentiment tendencies [16,17]. However, real-world review texts are often characterized by grammatical irregularities, semantic ambiguity, and complex rhetorical devices, which can interfere with the accurate identification of such information [8]. Improving attribute-level sentiment identification has thus become a key issue in understanding customer attitudes [18]. This capability not only helps reveal product shortcomings and improvement directions in greater detail, but also provides precise, data-driven support for product iteration and decision optimization in e-commerce contexts [19]. Although existing studies can automatically identify key attributes and their sentiment polarity in review texts, there remains substantial room for improvement in modeling fine-grained, multi-attribute sentiment relationships [20].

To translate attribute sentiment (AS) into actionable product improvement insights, the Kano model provides a useful analytical framework. The Kano model has been widely applied in product and service contexts, including hotels [3], tourism services [21], fresh products [22], and shopping platforms [23], because it can reveal the asymmetric relationship between AS and CS. When constructing Kano models from online reviews, prior research has mainly relied on binary sentiment (positive or negative) to investigate the impact of AS on CS. However, this binary treatment generally overlooks the independent value of neutral sentiment and often categorizes it as either positive or negative. This approach may lead to the loss of critical sentiment information [24], weaken the analytical value of objective factual information contained in review texts, and reduce the explanatory power of the mechanism through which AS influences CS, thereby distorting the true relationship between AS and CS [25].

Identifying attribute priorities is a critical step in resource allocation decisions [26]. However, existing studies have largely stopped at Kano category classification and have paid insufficient attention to the relative priority of attributes within the same category [27]. This gap weakens the decision-support value of their findings for managerial practice [28,29], because even attributes that belong to the same Kano category may differ significantly in their perceived importance among customers [30]. Therefore, how to rank attributes within the same category in a reasonable manner has become a critical issue in current research.

Accordingly, we propose the following research questions in this paper:

How can AS in online reviews be accurately recognized?
How can neutral sentiment be incorporated into the Kano classification method?
How can attributes within the same category be ranked?

To address these research questions, this study develops a systematic decision-support framework to mine CS from online reviews and guide product improvement. This paper makes three main contributions. First, an attribute sentiment analysis method, termed BERT-A-Conv, is developed based on Bidirectional Encoder Representations from Transformers (BERT) by integrating an attribute-aware mechanism with convolutional feature extraction, thereby enabling accurate identification of sentiments associated with multiple attributes in online reviews. Second, we propose the marginal contribution difference-based Kano model (MCD-Kano) to incorporate neutral sentiment into Kano classification, thereby addressing the limitations of traditional binary sentiment. Third, the Attribute Improvement Priority Score (AIPS) is developed by integrating attribute MCDs with their improvement potential to rank attributes within a category, thereby providing quantitative support for enterprise resource allocation and product improvement. Overall, this study supports a data-driven decision-support logic that transforms product-attribute-level information into interpretable product-improvement insights.

The remainder of this paper proceeds as follows. Section 2 reviews the relevant literature; Section 3 presents the methodological framework; Section 4 reports the case study and comparative analysis; and Section 5 discusses the theoretical and practical contributions, as well as the study’s limitations and future research directions.

2. Related Works

2.1. Kano Classification and Attribute Ranking

Customer needs are heterogeneous in real-world contexts, and customers differ in the degree of attention they pay to product attributes and in their value judgments. As a result, different types of attributes influence CS and behavioral responses through distinct mechanisms. Therefore, systematically classifying and prioritizing product attributes not only helps reveal intrinsic differences in attribute values but also provides a quantitative basis and decision support for product and service improvement [31]. Existing methods for attribute classification include multi-attribute decision-making methods [32,33,34], Importance–Performance Analysis (IPA) [35,36,37], and the Kano model [23,38]. Among these methods, the Kano model has been widely used as a practical tool for attribute classification.

As shown in Figure 1, the Kano model characterizes the relationship between product AS and CS [39,40]. Based on this relationship, the Kano model classifies attributes into five categories: (1) Must-be (M-type) attributes represent basic requirements: their fulfillment contributes only a limited improvement to CS, whereas their absence significantly reduces CS. (2) For One-dimensional (O-type) attributes, the higher the level of fulfillment, the higher the CS, whereas their absence reduces CS. (3) Attractive (A-type) attributes can significantly enhance CS when fulfilled, but their absence usually does not lead to dissatisfaction. (4) Indifferent (I-type) attributes have no significant effect on CS. (5) Reverse (R-type) attributes reduce CS when present.

Existing studies on CS modeling based on the Kano model have mainly identified asymmetric attribute effects by comparing the impacts of positive and negative sentiments on CS. Representative approaches include the partial utility function model [41], the modified ordered choice model [42], multiple regression combined with penalty–reward contrast analysis [43], binary logistic regression [3], and Tobit regression [44]. In addition, machine learning methods have been introduced to capture nonlinear relationships between AS and CS, such as ensemble neural networks [45], multi-layer perceptrons [38], and long short-term memory (LSTM) models [46]. From the perspective of satisfaction asymmetry, the effect of AS on CS is not always linear or symmetric, and transitions between different sentiment states may correspond to satisfaction responses of different magnitudes. Therefore, comparing only positive and negative sentiments may be insufficient to fully capture the nonlinear and asymmetric mechanisms through which AS affects CS [47]. Although these studies have advanced data-driven Kano classification, most still rely primarily on a binary positive–negative sentiment structure when modeling attribute effects. This simplification may be insufficient in online review contexts, where neutral sentiment is not merely the absence of emotion but may also reflect objective factual descriptions, moderate evaluations, or mixed perceptions. If neutral sentiment is conflated with positive or negative sentiment, the relationship between review sentiment and CS may be distorted [25]. Previous studies have also shown that neutral sentiment words account for a large proportion of review texts [48], that neutral sentiment is more likely to be shared than positive or negative emotions [49], and that incorporating neutral sentiment into sentiment analysis can improve classification accuracy [24]. These findings indicate that ignoring neutral sentiment may reduce the validity of Kano-based CS evaluation in online review contexts.

In addition, existing studies using the Kano model for attribute ranking still lack a unified standard for quantifying attribute priority. Most studies have linked attribute priority to attribute weights or importance scores. Representative approaches include coefficient-difference-based importance measurement [3], IPA-based priority ranking [31], and multi-criteria decision-making methods such as PROMETHEE II [50]. Although these studies have provided useful references for attribute prioritization, most focus on overall attribute rankings and pay little attention to within-category prioritization, even though attributes within the same Kano category may differ in importance [30]. Therefore, further distinguishing attribute priorities within each Kano category can provide more targeted guidance for product improvement.

Accordingly, this study explicitly incorporates neutral sentiment into attribute classification and uses LightGBM to model the impact of AS on CS. Based on SHAP values, it derives MCDs between adjacent sentiment categories to characterize changes in the satisfaction response curve across the negative, neutral, and positive sentiment states. It then develops an MCD-Kano classification framework and further ranks attributes by considering value differences across Kano categories.

2.2. Sentiment Analysis Based on Online Reviews

With the ongoing advancement of the customer-centered philosophy, CS has become an important driver of product innovation and service improvement, as it reflects customers’ overall perceptions and evaluations of product performance, functionality, and experience in real-world use. Traditional methods for obtaining CS mainly rely on questionnaires, interviews, and focus groups [31]. However, limited sample size, infrequent data updates, and inefficient data collection constrain these approaches, while respondents’ willingness to express their views may further affect survey implementation and cost control [4,36]. In recent years, the widespread adoption of e-commerce platforms and the continuous accumulation of online reviews have provided new data sources for CS modeling. Online reviews not only reflect customers’ direct perceptions of products but also reveal their satisfaction levels [51,52]. Compared with traditional surveys, these data are larger in scale, easier to obtain, and less costly. More importantly, because they originate from customers’ voluntary feedback [42], they can better reflect customer sentiments in real-world consumption contexts [44].

In CS modeling based on online reviews, sentiment analysis is commonly used to extract customer sentiment information from review texts. Existing studies can generally be divided into coarse-grained and fine-grained approaches. Coarse-grained sentiment analysis generally focuses on sentiment identification at the document or sentence level. For example, Arif et al. [53] extracted features such as Term Frequency–Inverse Document Frequency (TF-IDF) and sentiment lexicon scores and combined them with an improved classifier to determine the overall sentiment polarity of tweets or movie reviews. Xu et al. [54] developed a Convolutional Neural Network (CNN) and Word2Vec-based model that employed parallel convolution and max pooling over character-level word vectors to achieve end-to-end sentiment polarity classification for short Weibo texts. Zhang et al. [55] combined a CNN with NB-SVM and performed secondary classification on low-confidence sentences to improve the accuracy of sentence-level analysis. Li et al. [56] introduced sentiment-padding features derived from a sentiment lexicon and integrated local features with sequential information via a dual-channel CNN and Bidirectional Long Short-Term Memory (BiLSTM) architecture to achieve sentence-level sentiment polarity classification. However, coarse-grained sentiment analysis cannot capture the sentiment characteristics associated with specific entities and their related attributes. It thus cannot meet the requirements of attribute-level CS modeling [57].

Attribute-level analysis, as an important branch of fine-grained sentiment analysis, aims to identify specific attributes and their associated sentiment information. In this field, machine learning methods, particularly deep learning approaches, have been widely applied, and model fusion has been used to improve recognition accuracy [58]. For example, Wu et al. [59] constructed a phrase dependency graph based on syntactic dependency analysis to capture the dependency relations between aspect terms and sentiment words, and combined it with Gated Recurrent Units (GRUs) to model contextual features, thereby achieving aspect-level sentiment classification. Chen et al. [60] employed an improved convolutional memory neural network that integrates a CNN and a BiLSTM to perform polarity classification and intensity calculation for feature-level review sentences. In deep semantic modeling, BERT, based on the Transformer architecture, has become a major focus of research due to its strong contextual modeling capabilities [61]. For example, Zhao and Yu [62] proposed a knowledge-enabled BERT model that embeds external domain knowledge into BERT’s input representations to achieve aspect-based sentiment analysis. Xu [63] developed a model integrating BERT, BiLSTM, and Conditional Random Field (CRF) to identify dissatisfaction in hotel online reviews. Liu et al. [64] combined BERT’s character-level semantics, Word2Vec’s word-level domain features, and a dual-channel CNN structure to enhance the modeling of fine-grained sentiment information.

Although attribute-level sentiment analysis has made significant progress, sentiment features may still be confused in multi-attribute review scenarios, especially when multiple attributes and sentiments appear in the same review. Existing methods also provide limited explicit guidance for focusing on target attributes. To address this issue, this study introduces an attribute attention mechanism to guide the model toward semantic segments highly relevant to the target attribute, thereby reducing cross-interference among sentiment features. Meanwhile, a convolution module is employed to capture local feature patterns and enhance the modeling of fine-grained sentiment information. This collaborative modeling strategy, integrating attribute awareness and local features, improves the accuracy of attribute-level sentiment identification and provides methodological support for attribute improvement and strategy formulation based on customer feedback.

3. Methodology

As shown in Figure 2, this paper proposes an explainable Kano-based decision-support framework for mining CS from online reviews and guiding product improvement. The framework aligns with the research stream of data-driven CS analysis and product improvement informed by online reviews. Rather than treating product attributes, customer sentiment, and satisfaction outcomes as separate analytical objects, the framework connects them within a unified decision-support process that transforms product-attribute-level information into interpretable product-improvement insights. The proposed method consists of five key steps: (1) Data Acquisition and Pre-processing; (2) Product Attribute Extraction; (3) Attribute Sentiment Analysis; (4) Construction of the MCD-Kano model; and (5) Attribute Improvement Priority Assessment Based on MCD. Together, these steps establish a coherent logic for converting online review data into actionable product improvement priorities.

3.1. Data Acquisition and Pre-Processing

During the data collection stage, large-scale customer review data were obtained from e-commerce platforms using a self-developed web crawler, covering key fields such as customer ID, review time, review content, and ratings. To ensure the authenticity and validity of the review data, we implemented a hierarchical quality control procedure during the pre-processing stage. First, the raw reviews were cleaned, including removing duplicate records, excluding reviews with missing text or invalid content, retaining only Chinese-language text, and filtering out excessively short reviews with fewer than 5 words, thereby reducing the proportion of noisy, information-insufficient samples. Second, to address potential interference from fake reviews and spamming, we performed additional consistency and anomaly screening, including verifying the correspondence between ratings and review texts, identifying abnormally active users who posted large numbers of reviews within a short period, and removing the associated samples. For text processing, we used Jieba 0.42.1 for Chinese word segmentation, constructed a synonym dictionary to enhance semantic recognition, and introduced a stop-word dictionary to reduce interference from noisy information and irrelevant terms.
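The cleaning rules above can be sketched in Python as follows. This is a minimal illustration, not the authors' crawler or pipeline: the review record layout, the burst-detection window (`BURST_WINDOW`) and limit (`BURST_LIMIT`) are hypothetical values, the `tokenize` argument stands in for a Chinese segmenter such as Jieba's `jieba.lcut`, and the rating–text consistency check is omitted because it would require a separate sentiment model.

```python
import re
from collections import defaultdict
from datetime import datetime, timedelta

MIN_WORDS = 5                       # from the text: drop reviews with fewer than 5 words
BURST_WINDOW = timedelta(hours=1)   # hypothetical burst-detection window
BURST_LIMIT = 5                     # hypothetical: more than 5 reviews per window flags a user

HAN = re.compile(r"[\u4e00-\u9fff]")
# Keep Han characters and sentence punctuation (needed later for sentence splitting).
NOT_KEPT = re.compile(r"[^\u4e00-\u9fff，。！？；,.!?;]")


def flag_burst_users(reviews):
    """Return users who post more than BURST_LIMIT reviews inside any BURST_WINDOW."""
    times_by_user = defaultdict(list)
    for r in reviews:
        times_by_user[r["user"]].append(r["time"])
    flagged = set()
    for user, times in times_by_user.items():
        times.sort()
        start = 0
        for end in range(len(times)):
            # Shrink the sliding window until it spans at most BURST_WINDOW.
            while times[end] - times[start] > BURST_WINDOW:
                start += 1
            if end - start + 1 > BURST_LIMIT:
                flagged.add(user)
                break
    return flagged


def clean_reviews(reviews, tokenize=list):
    """Deduplicate, keep Chinese text, drop short reviews, then drop burst users.

    `tokenize` stands in for a word segmenter such as jieba.lcut; the default
    `list` treats every character as a token so the sketch has no dependencies.
    """
    seen, kept = set(), []
    for r in reviews:
        text = NOT_KEPT.sub("", r.get("text") or "")
        key = (r["user"], r["time"], text)
        if not HAN.search(text) or key in seen:   # empty, non-Chinese, or duplicate
            continue
        seen.add(key)
        words = [w for w in tokenize(text) if HAN.search(w)]
        if len(words) < MIN_WORDS:                # information-insufficient review
            continue
        kept.append({**r, "text": text})
    flagged = flag_burst_users(kept)
    return [r for r in kept if r["user"] not in flagged]
```

With a real segmenter, `clean_reviews(reviews, tokenize=jieba.lcut)` would count words rather than characters, which matches the 5-word rule more closely.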

3.2. Product Attribute Extraction

Because a single review may contain multiple attributes, direct modeling may easily lead to topic mixing. To improve identification accuracy in multi-attribute expressions, this paper adopts a sentence-level segmentation strategy that divides each review into multiple sentences based on punctuation. This study then combines all sentences into a corpus and uses the bitermplus toolkit in Python 3.11.5 to train a Biterm Topic Model (BTM) for identifying potential product attributes. BTM represents the topic structure by capturing the co-occurrence patterns of word pairs (biterms) across the global corpus and is particularly suitable for short-text scenarios [65,66].
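A minimal sketch of the sentence-level segmentation and of the biterms that BTM operates on is shown below. The punctuation set and the helper names are assumptions, tokens are assumed to come from a word segmenter, and the study itself trains BTM with the bitermplus toolkit, which handles biterm construction with its own utilities; the code only makes the unit of analysis concrete.

```python
import re
from itertools import combinations

SENTENCE_SPLIT = re.compile(r"[，。！？；,.!?;]+")   # assumed sentence delimiters


def split_sentences(review):
    """Split a review into sentence-level units on Chinese/ASCII punctuation."""
    return [s for s in SENTENCE_SPLIT.split(review) if s.strip()]


def sentence_corpus(reviews, tokenize):
    """Pool the sentences of all reviews into one short-text corpus of token lists."""
    return [tokenize(s) for r in reviews for s in split_sentences(r)]


def biterms(tokens):
    """All unordered word pairs co-occurring in one short text (the BTM unit).

    Each pair is sorted so that (a, b) and (b, a) count as the same biterm.
    """
    return [tuple(sorted(pair)) for pair in combinations(tokens, 2)]
```

Because each sentence is short, enumerating every pair stays cheap, and pooling biterms over the whole corpus is what lets BTM estimate topics despite the sparsity of individual sentences.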

To determine the optimal number of topics, this paper adopts the C_v coherence score for evaluation [31]. This metric measures the degree of semantic cohesion among keywords within the same topic. A higher C_v score indicates that the semantics within a topic are more concentrated and the expression is clearer, thereby reflecting higher topic quality.
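Selecting the number of topics K by coherence can be sketched as below. For brevity, the score here is a simplified average NPMI over pairs of a topic's top words using document-level co-occurrence; it is a stand-in for, not an implementation of, C_v, which additionally uses a sliding window and the cosine similarity of NPMI context vectors (available, for example, as `coherence='c_v'` in gensim's `CoherenceModel`). The input `topics_by_k` is assumed to hold top-word lists from BTM models trained with each candidate K.

```python
import math
from itertools import combinations


def npmi_coherence(top_words, docs):
    """Average NPMI over all pairs of a topic's top words (simplified proxy, not C_v)."""
    doc_sets = [set(d) for d in docs]
    n = len(doc_sets)

    def prob(*words):
        # Fraction of documents that contain all of the given words.
        return sum(all(w in d for w in words) for d in doc_sets) / n

    scores = []
    for wi, wj in combinations(top_words, 2):
        p_ij = prob(wi, wj)
        if p_ij == 0:
            scores.append(-1.0)           # never co-occur: minimum NPMI
            continue
        denom = -math.log(p_ij)
        if denom == 0:
            scores.append(1.0)            # co-occur in every document: maximum NPMI
            continue
        pmi = math.log(p_ij / (prob(wi) * prob(wj)))
        scores.append(pmi / denom)
    return sum(scores) / len(scores)


def select_num_topics(topics_by_k, docs):
    """Return the K with the highest mean topic coherence, plus all mean scores."""
    mean_scores = {
        k: sum(npmi_coherence(t, docs) for t in topics) / len(topics)
        for k, topics in topics_by_k.items()
    }
    return max(mean_scores, key=mean_scores.get), mean_scores
```

In practice the candidate K values would be scanned over a range and the curve of mean coherence inspected, since coherence often plateaus rather than peaking sharply.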

Figure 3 illustrates the generative process of BTM and its role in attribute extraction in the present study.

(1) The model generates the global topic distribution θ according to the hyperparameter σ, which controls the topic sampling of biterms in the corpus;

(2) For each biterm, the topic variable z is sampled from θ to determine the topic membership of the word pair;

(3) The word distribution of each topic, P(w|k), k ∈ {1, 2, …, K}, is generated according to the hyperparameter τ, and the two words of each biterm, w_p and w_q, are then sampled independently from the word distribution of the biterm's topic z (K denotes the predefined number of topics in BTM);

(4) After model training, the set of high-probability keywords for each topic is output;

(5) Through manual inspection and synonym merging, semantically similar topics are merged and mapped to specific product attributes, and an attribute lexicon is constructed for subsequent attribute context extraction and sentiment prediction.
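To make steps (1) to (3) concrete, the sketch below samples synthetic biterms from the generative process with NumPy. It illustrates the model's assumptions only (it is not a training procedure such as Gibbs sampling), and the default values of `sigma` and `tau` are illustrative rather than the settings used in the study.

```python
import numpy as np


def simulate_btm(num_biterms, vocab_size, K, sigma=1.0, tau=0.1, seed=0):
    """Sample synthetic biterms from the BTM generative process.

    sigma and tau are the Dirichlet hyperparameters for the global topic
    distribution and the topic-word distributions; defaults are illustrative.
    """
    rng = np.random.default_rng(seed)
    theta = rng.dirichlet(np.full(K, sigma))                # step (1): global topic distribution
    phi = rng.dirichlet(np.full(vocab_size, tau), size=K)   # step (3): P(w|k) for k = 1..K
    biterms = []
    for _ in range(num_biterms):
        z = int(rng.choice(K, p=theta))                     # step (2): topic of this biterm
        # step (3): both words of the biterm are drawn independently from P(w|z)
        w_p, w_q = (int(w) for w in rng.choice(vocab_size, size=2, p=phi[z]))
        biterms.append((z, w_p, w_q))
    return theta, phi, biterms
```

Training BTM inverts this process: given observed biterms, it infers θ and P(w|k), and the top words of each inferred P(w|k) are what step (4) outputs.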

3.3. Attribute Sentiment Analysis

Based on the attributes extracted in Section 3.2, this section incorporates a sentiment analysis model to achieve attribute-level sentiment analysis. This study proposes the BERT-A-Conv model. In this model, an attribute-aware mechanism guides the model to focus on semantic segments that are highly relevant to the target attribute, alleviating sentiment feature confusion in multi-attribute reviews, while a convolutional module captures local feature patterns, which makes the model suitable for fine-grained sentiment analysis. The model takes an “attribute–review” combination as input. Accordingly, each review involving multiple attributes is decomposed into several “attribute–review” samples; Algorithm A1 in Appendix A presents the matching algorithm. This design enables the model to learn the sentiment expression of each attribute separately and to focus explicitly on review segments semantically related to that attribute, thereby reducing sentiment feature confusion in multi-attribute scenarios.
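The decomposition into “attribute–review” samples might look like the sketch below. Algorithm A1 is not reproduced here, so the matching rule (keyword lookup in the attribute lexicon, with the matched sentences joined as the context) and the function name `attribute_review_pairs` are assumptions.

```python
import re

SENTENCE_SPLIT = re.compile(r"[，。！？；,.!?;]+")   # assumed sentence delimiters


def attribute_review_pairs(review, lexicon):
    """Decompose one review into (attribute, context) samples.

    lexicon: {attribute: [keywords]} from the attribute lexicon built after BTM.
    Context = the sentences that mention the attribute, joined with a Chinese
    comma. This is a hypothetical rule standing in for Algorithm A1.
    """
    sentences = [s for s in SENTENCE_SPLIT.split(review) if s.strip()]
    pairs = []
    for attribute, keywords in lexicon.items():
        hits = [s for s in sentences if any(k in s for k in keywords)]
        if hits:
            pairs.append((attribute, "，".join(hits)))
    return pairs
```

For example, a review praising battery life but criticizing screen brightness yields two samples, one per attribute, each of which can be labeled and classified independently.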

During the feature encoding stage, pre-trained BERT is used to obtain the contextual representation of the input text. Unlike conventional approaches that feed the entire review sentence directly into a sentiment model, this study introduces an attribute-aware strategy at the encoding stage: the target attribute and the review text are concatenated into a single input sequence with special separators, forming an embedding representation that incorporates both attribute semantics and review context [67]. Through its multi-layer Transformer architecture, BERT captures global dependencies, and the resulting contextual vectors preserve both review semantics and attribute context. Attribute-aware attention then dynamically adjusts the weights of words in the contextual vectors, so that when processing multi-attribute reviews the model prioritizes the sentiment signals most relevant to the current attribute and is less affected by information associated with other attributes. On this basis, a CNN is employed to extract n-gram-level local sentiment features: convolutional kernels slide over the review text representations to capture phrase-level sentiment patterns, and max pooling retains the most salient sentiment cues, enhancing the model’s capability for fine-grained sentiment identification. The output vectors of the attribute-aware attention mechanism and the CNN module are then fused, and a fully connected layer performs sentiment classification for the target attribute. The output layer uses the Softmax activation function to produce a probability distribution over three sentiment categories (positive, negative, and neutral), and the category with the highest probability is taken as the final sentiment class for the target attribute.
Figure 4 illustrates the structure of the BERT-A-Conv model and the functions of its modules.
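The forward pass described above can be sketched in NumPy to show how the pieces fit together. This is a hypothetical, untrained head rather than the BERT-A-Conv implementation: the token matrix `H` stands in for BERT's last-layer output, the attribute vector is taken as the mean of the attribute tokens, attention is scaled dot-product against that vector, the CNN runs on the review-side tokens (here including the special tokens), and all weights are random.

```python
import numpy as np

LABELS = ("positive", "negative", "neutral")


def softmax(x):
    e = np.exp(x - x.max())          # subtract the max for numerical stability
    return e / e.sum()


class AttributeConvHead:
    """Hypothetical, untrained attribute-aware attention + CNN head over BERT token vectors."""

    def __init__(self, d, n_filters=4, kernel_sizes=(2, 3), seed=0):
        rng = np.random.default_rng(seed)
        # One filter bank per n-gram width k, applied to flattened k-token windows.
        self.conv = {k: rng.normal(0.0, 0.1, (k * d, n_filters)) for k in kernel_sizes}
        fused_dim = d + n_filters * len(kernel_sizes)
        self.W_out = rng.normal(0.0, 0.1, (fused_dim, len(LABELS)))
        self.b_out = np.zeros(len(LABELS))

    def forward(self, H, attr_mask):
        """H: (T, d) contextual vectors of '[CLS] attribute [SEP] review [SEP]'.
        attr_mask: boolean (T,) array marking the attribute tokens."""
        a = H[attr_mask].mean(axis=0)                     # attribute representation
        R = H[~attr_mask]                                 # review-side tokens (incl. special tokens)
        alpha = softmax(R @ a / np.sqrt(H.shape[1]))      # attribute-aware attention weights
        attended = alpha @ R                              # attention-pooled vector, shape (d,)
        pooled = []
        for k, W in self.conv.items():
            windows = np.stack([R[t:t + k].ravel() for t in range(len(R) - k + 1)])
            feats = np.maximum(windows @ W, 0.0)          # ReLU activations per n-gram window
            pooled.append(feats.max(axis=0))              # max pooling over positions
        fused = np.concatenate([attended, *pooled])       # fuse attention and CNN features
        probs = softmax(fused @ self.W_out + self.b_out)  # Softmax over the three classes
        return dict(zip(LABELS, probs)), LABELS[int(probs.argmax())]
```

In a trained model the same computation would run on real BERT outputs with learned weights; the sketch only fixes the shapes and the order of operations (attention, convolution, max pooling, fusion, Softmax).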

To validate model performance, this study adopted commonly used classification metrics, including Precision, Recall, and F1-score, to evaluate the sentiment analysis model. For a given class, Precision measures the proportion of correctly predicted samples among all samples predicted as that class, while Recall measures the proportion of correctly predicted samples among all samples that actually belong to that class. The F1-score is the harmonic mean of Precision and Recall. Their calculation formulas are shown in Equation (1).

\begin{array}{l} \mathrm{Precision} = \dfrac{\mathrm{TruePositive}}{\mathrm{TruePositive} + \mathrm{FalsePositive}} \\ \mathrm{Recall} = \dfrac{\mathrm{TruePositive}}{\mathrm{TruePositive} + \mathrm{FalseNegative}} \\ F_{1} = \dfrac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \end{array}

(1)
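Equation (1) applied one-vs-rest to each of the three sentiment classes can be sketched as follows. The `macro_average` helper is one common way to summarize the three classes; the text does not state whether macro or weighted averages are reported, so treat that helper as an assumption.

```python
LABELS = ("positive", "negative", "neutral")


def per_class_metrics(y_true, y_pred, labels=LABELS):
    """One-vs-rest Precision, Recall and F1 (Equation (1)) for each sentiment class."""
    pairs = list(zip(y_true, y_pred))
    metrics = {}
    for c in labels:
        tp = sum(t == c and p == c for t, p in pairs)   # true positives for class c
        fp = sum(t != c and p == c for t, p in pairs)   # predicted c, actually another class
        fn = sum(t == c and p != c for t, p in pairs)   # actually c, predicted another class
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        metrics[c] = (precision, recall, f1)
    return metrics


def macro_average(metrics):
    """Unweighted mean of (Precision, Recall, F1) across classes."""
    n = len(metrics)
    return tuple(sum(m[k] for m in metrics.values()) / n for k in range(3))
```

For instance, with true labels (positive, positive, negative, neutral) and predictions (positive, neutral, negative, neutral), the positive class has Precision 1.0, Recall 0.5, and F1 of about 0.667.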

3.4. Construction of the MCD-Kano Model

In this paper, we propose the MCD-Kano model, which maps attributes to Kano categories by quantifying the marginal contribution of each attribute’s sentiment states to changes in overall CS, thereby providing decision support for product improvement in the e-commerce environment.

In the CS modeling stage, this study adopted LightGBM, a gradient-boosting decision tree framework [67]. LightGBM has demonstrated high computational efficiency and predictive performance in large-scale, high-dimensional data analysis [68,69], which makes it suitable for online review data characterized by complex structures and substantial feature redundancy. Prior empirical comparisons report that LightGBM outperforms XGBoost and the bagging-based Random Forest model in prediction accuracy, training efficiency, and feature importance interpretation [61,70]. Specifically, the model’s input features were the attribute sentiment categories extracted in the sentiment analysis stage, and the target variable was the review rating (R), thereby establishing a nonlinear mapping between AS and CS. During model training, this study constructed a unique index from timestamps and user IDs to ensure precise correspondence between AS and R, and tuned hyperparameters using a hierarchical optimization strategy. To further ensure the robustness and generalizability of the model, five-fold cross-validation was used during training, and predictive performance was evaluated using Precision, Recall, and F1-score. Each attribute is represented by three binary indicators, one per sentiment category (positive, neutral, and negative): the indicator of the sentiment category expressed toward the attribute in a review is coded as 1 and the others as 0, and if the review does not involve the attribute, all three are recorded as {0, 0, 0}. Table 1 shows the data structure of the model input.
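The input coding described above can be sketched as a function that turns per-review attribute sentiments into the binary matrix. The column naming scheme (`attribute_pos` and so on) and the one-category-per-attribute input format are hypothetical. The resulting matrix, with the rating R as target, could then be fitted with LightGBM (for example `lightgbm.LGBMClassifier`) under five-fold cross-validation; that step is omitted here to keep the sketch dependency-light.

```python
import numpy as np

CATEGORIES = ("pos", "neu", "neg")


def encode_reviews(review_sentiments, attributes):
    """Build the binary AS feature matrix.

    review_sentiments: list of dicts {attribute: category}, one per review.
    Each attribute contributes three indicator columns (pos, neu, neg); an
    attribute not mentioned in a review stays {0, 0, 0}.
    """
    columns = [f"{a}_{c}" for a in attributes for c in CATEGORIES]
    index = {col: j for j, col in enumerate(columns)}
    X = np.zeros((len(review_sentiments), len(columns)), dtype=np.int8)
    for i, sentiments in enumerate(review_sentiments):
        for attribute, category in sentiments.items():
            X[i, index[f"{attribute}_{category}"]] = 1   # mark the expressed sentiment
    return X, columns
```

Keeping an explicit neutral column, rather than folding neutral mentions into the all-zero "not mentioned" pattern, is what lets the later SHAP analysis separate neutral sentiment from absence.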

This study used the Shapley additive explanations (SHAP) method to explain how AS affects CS. SHAP builds on the game-theoretic attribution framework proposed by Shapley [71] and quantifies a feature’s average marginal contribution to the model output, that is, the prediction change induced by adding the feature, averaged over all possible feature combinations [72]. Because the attribution accounts for all feature combinations, SHAP is suitable for interpreting the nonlinear relationship between AS and CS captured by the LightGBM-based model, which traditional linear models may not adequately represent [73]. For tree models, the TreeSHAP method adopted in this study retains the theoretical properties and interpretability of SHAP values while exploiting the tree structure to reduce computational complexity from exponential to polynomial, making it an efficient and exact implementation [74]. SHAP values represent the marginal contribution of an attribute to the rating: positive values indicate that the attribute increases CS, whereas negative values indicate that it reduces CS. Because the contribution of the same attribute may differ in direction across reviews, direct averaging may conceal these differences; therefore, this study separately computed the mean SHAP value for each sentiment category to represent its marginal contribution to CS. Let the sentiment category be denoted as c ∈ {pos, neu, neg}, and let N_i^c be the number of reviews in which attribute i carries sentiment c. For the j-th such review, the SHAP value of attribute i is φ_{i,j}^c, and the marginal contribution E_i^c of attribute i for sentiment category c is computed as shown in Equation (2).

E_{i}^{c} = \frac{1}{N_{i}^{c}} \sum_{j=1}^{N_{i}^{c}} φ_{i, j}^{c}

(2)
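The Shapley attribution underlying SHAP can be illustrated by brute-force enumeration of coalitions, as below. This sketch fills absent features from a single baseline vector (so-called baseline Shapley values), which is a simplification: TreeSHAP, as exposed for instance by `shap.TreeExplainer`, instead computes expectations over the tree paths in polynomial time. The toy model `f` in the usage note is hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(f, x, baseline):
    """Exact Shapley values of f at x by enumerating all coalitions.

    A coalition S keeps x's values for features in S and the baseline values
    elsewhere. Cost is exponential in len(x), which TreeSHAP avoids for trees.
    """
    n = len(x)

    def v(S):
        return f([x[k] if k in S else baseline[k] for k in range(n)])

    phi = []
    for i in range(n):
        others = [k for k in range(n) if k != i]
        total = 0.0
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for S in combinations(others, size):
                S = set(S)
                total += weight * (v(S | {i}) - v(S))   # marginal contribution of feature i
        phi.append(total)
    return phi
```

Usage: for f(z) = 2z_0 + 3z_1 + z_0 z_2 with x = (1, 1, 1) and a zero baseline, the values are 2.5, 3.0, and 0.5. The interaction term is split equally between z_0 and z_2, and the values sum to f(x) - f(baseline) = 6, the efficiency property that makes per-review SHAP values additive.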

On this basis, drawing on reference-dependent utility theory, this study treats the neutral state as the reference point and calculates the MCDs between adjacent sentiment categories. Specifically,

M C D_{i}^{p o s - n e u}

measures the change in the marginal contribution of attribute

i

when sentiment shifts from neutral to positive, whereas

M C D_{i}^{n e u - n e g}

measures the change in the marginal contribution when sentiment shifts from negative to neutral. Equation (3) presents the detailed expression. These two measures represent changes in the satisfaction response curve across the “negative–neutral–positive” sentiment states. A positive

M C D_{i}^{p o s - n e u}

indicates that positive sentiment contributes more to CS than neutral sentiment, whereas a negative value indicates its contribution is weaker than that of neutral sentiment. Similarly, a positive

M C D_{i}^{n e u - n e g}

indicates that neutral sentiment contributes more to CS than negative sentiment, whereas a negative value indicates that the contribution of neutral sentiment is weaker than that of negative sentiment.

\begin{array}{l} \mathrm{MCD}_{i}^{pos-neu} = E_{i}^{pos} - E_{i}^{neu} \\ \mathrm{MCD}_{i}^{neu-neg} = E_{i}^{neu} - E_{i}^{neg} \end{array}

(3)
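Equation (3) reduces to two subtractions per attribute. The hedged sketch below computes both MCDs from a dictionary of category means. The attribute name "Attribute X" and its values are hypothetical. They were chosen only to show a V-shaped pattern: a small change on the negative-to-neutral side and a large gain on the neutral-to-positive side.

```python
def mcd(category_means, attribute):
    """Return (MCD^{pos-neu}, MCD^{neu-neg}) for one attribute, following Equation (3).

    category_means maps (attribute, sentiment) to E_i^c, with sentiment in
    {"pos", "neu", "neg"}.
    """
    e_pos = category_means[(attribute, "pos")]
    e_neu = category_means[(attribute, "neu")]
    e_neg = category_means[(attribute, "neg")]
    return e_pos - e_neu, e_neu - e_neg


# Hypothetical category means for one attribute.
means = {
    ("Attribute X", "pos"): 0.35,
    ("Attribute X", "neu"): -0.05,
    ("Attribute X", "neg"): -0.02,
}
pos_neu, neu_neg = mcd(means, "Attribute X")
```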

Based on these two indicators, this study constructed an MCD-Kano model (as shown in Figure 5) with the following classification rules. First, if |\mathrm{MCD}_{i}^{pos-neu}| < ς and |\mathrm{MCD}_{i}^{neu-neg}| < ς, the attribute is classified as I-type. Here, ς is a near-zero threshold that experts determine based on domain knowledge and experience to exclude the influence of weak effects and random fluctuations. In this case, the marginal contributions of the attribute in the two sentiment transition intervals, namely from neutral to positive and from negative to neutral, are both close to zero. Therefore, the attribute neither significantly increases CS nor effectively reduces customer dissatisfaction, and its overall effect on sentiment change is weak. Second, if \mathrm{MCD}_{i}^{pos-neu} < 0 and \mathrm{MCD}_{i}^{neu-neg} < 0, the attribute is classified as R-type. This indicates that the attribute exerts negative effects in both sentiment transition directions: it not only fails to move customer sentiment from neutral to positive, but also fails to restore sentiment from negative to neutral, and may even intensify negative experiences. For the remaining attributes, the relative magnitudes of \mathrm{MCD}_{i}^{pos-neu} and \mathrm{MCD}_{i}^{neu-neg} are further compared to identify the shape of the contribution curve. Here, the threshold δ determines whether the relative difference between the pos−neu and neu−neg intervals meets the predefined criterion. Because this boundary is typically highly context-dependent and cannot be determined solely from sample data, it must be specified by domain experts in light of the specific research context. If \mathrm{MCD}_{i}^{pos-neu} > \mathrm{MCD}_{i}^{neu-neg}, the attribute shows a larger difference in the pos−neu interval than in the neu−neg interval. Furthermore, if \left|\frac{\mathrm{MCD}_{i}^{pos-neu}}{\mathrm{MCD}_{i}^{neu-neg}}\right| > δ, the relative difference in strength between the two intervals exceeds the predefined threshold, and the attribute is therefore classified as A-type (with a V-shaped curve). In other words, the primary role of this attribute is to generate additional satisfaction, whereas its absence does not necessarily lead to obvious dissatisfaction. Otherwise, the attribute is classified as O-type (with a linearly increasing curve), indicating that the attribute’s effect intensity is relatively balanced across the two adjacent intervals. If \mathrm{MCD}_{i}^{pos-neu} \leq \mathrm{MCD}_{i}^{neu-neg}, the attribute shows a larger difference in the neu−neg interval than in the pos−neu interval. Furthermore, if \left|\frac{\mathrm{MCD}_{i}^{neu-neg}}{\mathrm{MCD}_{i}^{pos-neu}}\right| > δ, the relative difference in strength between the two intervals exceeds the predefined threshold, and the attribute is therefore classified as M-type (with an inverted V-shaped curve). Otherwise, the attribute is classified as O-type.
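These rules can be encoded as a short function. The sketch below applies them in the order described: I-type first, then R-type, then the ratio test. The text does not specify how to handle a zero denominator in the ratio, so the sketch treats it as an unbounded ratio; this is an assumption. The MCD values later reported for Service (−0.144, 0.898) and Wearing (−0.455, 0.952) serve as inputs. With ς = 0.05 and δ = 2, both attributes come out as M-type, and Wearing becomes O-type if δ is raised to 3. This matches the reported sensitivity results.

```python
def mcd_kano(mcd_pos_neu, mcd_neu_neg, sigma=0.05, delta=2.0):
    """Classify an attribute as I, R, A, O, or M from its two MCDs.

    sigma stands for the near-zero threshold ς and delta for the ratio threshold δ.
    """
    # I-type: both transitions have near-zero marginal contribution differences.
    if abs(mcd_pos_neu) < sigma and abs(mcd_neu_neg) < sigma:
        return "I"
    # R-type: negative effects in both transition directions.
    if mcd_pos_neu < 0 and mcd_neu_neg < 0:
        return "R"

    def abs_ratio(num, den):
        # Zero denominator treated as an unbounded ratio (assumption, not from the text).
        return float("inf") if den == 0 else abs(num / den)

    if mcd_pos_neu > mcd_neu_neg:
        # Larger change on the neutral-to-positive side: A-type if clearly dominant.
        return "A" if abs_ratio(mcd_pos_neu, mcd_neu_neg) > delta else "O"
    # Larger change on the negative-to-neutral side: M-type if clearly dominant.
    return "M" if abs_ratio(mcd_neu_neg, mcd_pos_neu) > delta else "O"


print(mcd_kano(-0.144, 0.898))             # Service: M
print(mcd_kano(-0.455, 0.952))             # Wearing with δ = 2: M
print(mcd_kano(-0.455, 0.952, delta=3.0))  # Wearing with δ = 3: O
```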

3.5. Attribute Improvement Priority Assessment Based on MCD

In this study, \mathrm{Perf}_{i} is defined as the global arithmetic mean marginal contribution of attribute i across all sentiment categories. It is calculated by averaging the marginal contributions of attribute i across all its occurrences in positive, neutral, and negative reviews, as shown in Equation (4), where c \in \{pos, neu, neg\} denotes the sentiment category corresponding to positive, neutral, and negative, respectively.

\mathrm{Perf}_{i} = \frac{1}{\sum_{c} N_{i}^{c}} \sum_{c} \sum_{j=1}^{N_{i}^{c}} \varphi_{i,j}^{c}

(4)

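\mathrm{Perf}_{i} was normalized using Equation (5) to obtain \mathrm{Perf}_{i}^{*}, where \mathrm{Perf}_{min} and \mathrm{Perf}_{max} are the minimum and maximum values of \mathrm{Perf}_{i} across all attributes.

\mathrm{Perf}_{i}^{*} = \frac{\mathrm{Perf}_{i} - \mathrm{Perf}_{min}}{\mathrm{Perf}_{max} - \mathrm{Perf}_{min}}

(5)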
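Equations (4) and (5) can be sketched as follows. Equation (4) pools all occurrences of an attribute, so it is a count-weighted mean of the category means E_i^c rather than their simple average. The input layout, attribute names, and toy values are hypothetical. The guard for identical minimum and maximum values is an assumption, since the text does not discuss that case.

```python
from collections import defaultdict


def performance(records):
    """Perf_i (Eq. 4): pooled mean SHAP value over every occurrence of attribute i.

    records: iterable of (attribute, sentiment, shap_value) triples.
    """
    totals, counts = defaultdict(float), defaultdict(int)
    for attribute, _sentiment, phi in records:
        totals[attribute] += phi
        counts[attribute] += 1
    return {a: totals[a] / counts[a] for a in totals}


def min_max_normalize(perf):
    """Perf_i^* (Eq. 5): rescale Perf_i to [0, 1] across attributes."""
    low, high = min(perf.values()), max(perf.values())
    if high == low:  # degenerate case; mapping everything to 0 is an assumption
        return {a: 0.0 for a in perf}
    return {a: (v - low) / (high - low) for a, v in perf.items()}


toy = [
    ("X", "pos", 0.2), ("X", "neg", 0.4),
    ("Y", "neg", -0.1),
    ("Z", "pos", 0.1), ("Z", "neu", 0.1), ("Z", "neg", 0.1),
]
perf_star = min_max_normalize(performance(toy))
```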

Because different categories of attributes function differently in enhancing satisfaction and alleviating dissatisfaction, adopting a uniform evaluation criterion may obscure their actual improvement value [75,76].

For A-type attributes, according to the two-factor theory, the core mechanism is that fulfillment brings additional CS enhancement, whereas non-fulfillment does not necessarily lead to obvious dissatisfaction. Therefore, \mathrm{MCD}_{i}^{pos-neu} is more suitable for characterizing their satisfaction enhancement effect. For M-type attributes, the core mechanism is that fulfillment may not significantly improve CS, whereas non-fulfillment leads to dissatisfaction. Therefore, \mathrm{MCD}_{i}^{neu-neg} is more suitable for capturing their dissatisfaction-prevention effect. For O-type attributes, attribute performance generally exhibits a relatively linear or monotonic relationship with CS; that is, higher fulfillment improves CS, whereas non-fulfillment or poor performance leads to dissatisfaction. Therefore, O-type attributes involve both satisfaction enhancement and dissatisfaction prevention, and this study uses both \mathrm{MCD}_{i}^{pos-neu} and \mathrm{MCD}_{i}^{neu-neg} to characterize these two components [77,78]. The preference direction of R-type attributes is opposite to that in conventional improvement logic, whereas I-type attributes exert no significant effects on either satisfaction or dissatisfaction. Therefore, firms should reduce resource allocation to both categories to avoid blind investment, and this study excludes both categories from the scope of the priority discussion. This category-specific specification allows attributes within the same Kano category to be ranked according to the impact dimension most relevant to their improvement value.

On this basis, considering the important role of improvement potential in product improvement [79], this study incorporates improvement potential into the MCD of attributes to evaluate intra-class priorities and construct AIPS. This method simultaneously captures the impact intensity of attributes and their actual room for improvement, thereby providing a more targeted basis for identifying priority improvement targets under constrained resources.

In Equation (6), k_{i} denotes the Kano category of attribute i, and \alpha_{k_{i}} and \beta_{k_{i}} denote binary indicator coefficients whose values are defined in Equation (7). These coefficients specify which directional component enters AIPS, while the corresponding MCD value determines the effect magnitude:

\mathrm{AIPS}_{i} = \left(\alpha_{k_{i}} \mathrm{MCD}_{i}^{pos-neu} + \beta_{k_{i}} \mathrm{MCD}_{i}^{neu-neg}\right)\left(1 - \mathrm{Perf}_{i}^{*}\right)

(6)

(\alpha_{k_{i}}, \beta_{k_{i}}) = \begin{cases} (1, 0), & k_{i} = A \\ (0, 1), & k_{i} = M \\ (1, 1), & k_{i} = O \end{cases}, \quad k_{i} \in \{A, M, O\}

(7)
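A sketch of Equations (6) and (7) is given below. The (α, β) lookup mirrors Equation (7), and I- and R-type attributes return None because they are excluded from prioritization. The example uses the reported MCDs for Wearing and Service. Wearing's normalized performance is 0, since Table 5 reports it as the lowest-performing attribute and Equation (5) maps the minimum to 0. The value 0.4 for Service is purely hypothetical.

```python
# (alpha, beta) indicator coefficients from Equation (7).
COEFFICIENTS = {"A": (1, 0), "M": (0, 1), "O": (1, 1)}


def aips(kano_type, mcd_pos_neu, mcd_neu_neg, perf_star):
    """Attribute improvement priority score (Eq. 6); None for I- and R-type attributes."""
    if kano_type not in COEFFICIENTS:
        return None  # excluded from the priority discussion
    alpha, beta = COEFFICIENTS[kano_type]
    effect = alpha * mcd_pos_neu + beta * mcd_neu_neg  # category-relevant MCD component(s)
    return effect * (1 - perf_star)                    # scaled by improvement potential


scores = {
    "Wearing": aips("M", -0.455, 0.952, perf_star=0.0),
    "Service": aips("M", -0.144, 0.898, perf_star=0.4),  # perf_star is hypothetical
}
ranking = sorted(scores, key=scores.get, reverse=True)  # within-category order
```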

4. Case Study

4.1. Data Description and Pre-Processing Results

JD.com (http://www.jd.com) and Taobao.com (https://www.taobao.com) both have large, active customer bases, particularly in electronics and smart devices [8]. Given their large user bases and authentic transaction contexts, these two platforms provide abundant online review data for research on product performance analysis and customer demand mining. Therefore, this study regards them as ideal data sources for identifying customer needs in the product domain.

To validate the effectiveness of the proposed method, this study selected smartwatches as the research case. The market for this type of product is relatively mature, and customer cognition is relatively stable, which helps reduce data interference and improve the accuracy and consistency of Kano category classification. In addition, smartwatches typically generate a large volume of high-quality customer reviews, providing a rich data foundation for attribute extraction, sentiment identification, and model validation. We selected eight representative mainstream smartwatch brands with high customer recognition as the research objects. Using web-crawling technology, this study collected customer reviews from self-operated flagship stores on the two aforementioned platforms from June 2024 to March 2025, yielding a total of 38,027 original reviews, including 17,316 from Taobao.com and 20,711 from JD.com. After text cleaning and pre-processing, a total of 35,640 valid reviews were retained. Specifically, this study strictly cleaned the data according to predefined rules by removing duplicate reviews, blank or garbled reviews, obviously meaningless reviews, and reviews irrelevant to the research object, thereby minimizing the effects of noise and subjective selection bias on the analysis results. To ensure the reliability of the cleaning results, this study further compared the samples before and after cleaning with respect to major characteristics, such as review time distribution and platform source, and found no significant shift in the overall distribution. Table 2 provides detailed information for each product.

4.2. Product Attribute Extraction Results

In this study, the BTM was implemented in Python after data pre-processing. Drawing on previous studies, this study set the expected number of topics to K, σ, and τ to 50/K, 0.01, and 1, respectively, and the number of iterations and the random state to 1000 and 1, respectively. By comparing topic coherence scores across different topic numbers, 19 was selected as the expected number of topics because it achieved a high coherence score and semantically interpretable topics; ultimately, 11 topics were determined, with a maximum coherence score of 0.646. On this basis, domain experts semantically named the keywords under each topic. Table 3 presents the 11 identified attributes and the five most representative keywords for each.

4.3. Attribute Sentiment Analysis Results

To enable attribute-level sentiment analysis, this study matched review texts to the attribute lexicon and constructed input samples in the form of “attribute–review” pairs. Specifically, when a representative keyword of a given attribute appeared in a review, that attribute was paired with the entire review to generate a new input record; if the same review involved multiple attributes, multiple corresponding “attribute–review” samples were constructed. After attribute matching, 11,532 attribute–review pairs were generated as supervised samples. All pairs were manually annotated as positive, neutral, or negative by two trained research team members over two months using a unified annotation guideline. Positive labels indicated explicit approval, support, optimism, or other favorable attitudes; negative labels indicated explicit criticism, concern, rejection, pessimism, or other unfavorable attitudes; and neutral labels indicated factual, descriptive, ambiguous, balanced, or non-dominant evaluative content. Borderline cases were labeled as neutral when the sentiment orientation was weak, implicit, mixed, or insufficiently supported by textual evidence. To assess annotation reliability, Cohen’s κ was calculated based on the two annotators’ independent labels, yielding κ = 0.82, indicating strong inter-annotator agreement. A domain expert resolved disagreements or ambiguous cases to determine the final labels.
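The pair construction and the agreement statistic can be sketched in Python as follows. Keyword matching uses plain substring search. This suits unsegmented Chinese text but is an assumption about the exact matching rule. The lexicon entries, reviews, and labels are hypothetical English placeholders. Cohen's κ is computed as (p_o − p_e)/(1 − p_e), where p_o is the observed agreement and p_e is the agreement expected by chance from each annotator's label frequencies.

```python
from collections import Counter


def build_pairs(reviews, lexicon):
    """Create one (attribute, review) sample per attribute whose keyword occurs in the review."""
    pairs = []
    for review in reviews:
        for attribute, keywords in lexicon.items():
            if any(keyword in review for keyword in keywords):
                pairs.append((attribute, review))
    return pairs


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators; raises ZeroDivisionError if p_e == 1."""
    n = len(labels_a)
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n   # observed agreement
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)  # chance agreement
    return (p_o - p_e) / (1 - p_e)


lexicon = {  # hypothetical placeholder keywords
    "Battery": ["battery", "charge"],
    "Wearing": ["strap", "comfortable"],
    "Price": ["price", "cheap"],
}
reviews = ["battery lasts long but the strap is itchy", "great price"]
pairs = build_pairs(reviews, lexicon)
kappa = cohens_kappa(["pos", "pos", "neg", "neu"], ["pos", "neg", "neg", "neu"])
```

A review that mentions several attributes yields several samples, as described above. In the toy data, the first review produces both a Battery pair and a Wearing pair.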

The finalized dataset was split into training, validation, and test sets at an 8:1:1 ratio. The BERT-A-Conv model was initialized with the Google Chinese BERT-Base pretrained checkpoint. Hyperparameter tuning was performed empirically based on the model’s performance on the validation set. Key hyperparameters, including the learning rate, batch size, dropout rate, convolution kernel size, and number of convolution filters, were adjusted and selected according to the validation macro-F1 score. The maximum sequence length, embedding dimension, number of attributes, convolution kernel size, number of convolution filters, dropout rate, and batch size were set to 512, 768, 11, 3, 128, 0.3, and 16, respectively. The model was optimized using Adam with a learning rate of 2 \times 10^{-5} and trained for up to 5 epochs. Early stopping was applied with a patience of 3 based on the validation macro-F1 score, and the best-performing checkpoint on the validation set was retained for testing. The experiments were conducted on an NVIDIA GeForce RTX 4060 Laptop GPU, and each BERT-A-Conv training run took approximately 8 h under the above parameter settings. Across five independent runs with different random seeds, the model achieved macro-averaged Precision, Recall, and F1-score values of 0.926 ± 0.004, 0.941 ± 0.005, and 0.933 ± 0.004, respectively, on the test set, indicating stable performance in attribute-level sentiment identification. To avoid relying on an unusually favorable single run, the downstream attribute-level sentiment distribution analysis was conducted using the median-performing run, defined as the third-ranked run by validation macro-F1 among the five independent runs.
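The run-selection rule and the mean ± SD summary can be illustrated as follows. The five validation and test scores are hypothetical. The use of the sample standard deviation (`statistics.stdev`) is an assumption, since the text does not say which SD was reported.

```python
import statistics


def summarize_runs(val_macro_f1, test_macro_f1):
    """Return (mean, SD) of test macro-F1 and the index of the median run.

    The median run is the one ranked in the middle by validation macro-F1
    (the third of five runs).
    """
    mean = statistics.mean(test_macro_f1)
    sd = statistics.stdev(test_macro_f1)  # sample standard deviation (assumption)
    ranked = sorted(range(len(val_macro_f1)), key=lambda k: val_macro_f1[k], reverse=True)
    return mean, sd, ranked[len(ranked) // 2]


val_scores = [0.930, 0.935, 0.928, 0.940, 0.932]   # hypothetical validation scores, one per seed
test_scores = [0.931, 0.936, 0.927, 0.938, 0.933]  # hypothetical test scores, same seeds
mean, sd, median_run = summarize_runs(val_scores, test_scores)
```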

4.4. MCD-Kano Classification Results

To reduce model bias and ensure the reliability of the analytical procedure, we divided the dataset into training, validation, and test sets at a ratio of 6.5:1:2.5. The original review ratings were binarized into satisfaction labels: ratings of 4 and 5 were coded as satisfied reviews, denoted as class 1, whereas ratings of 1–3 were coded as not satisfied reviews, denoted as class 0. After binarization, class 0 accounted for 38.22% of the samples, while class 1 accounted for 61.78%. In the LightGBM model, we set the initial learning rate μ to 0.05 and the maximum number of iterations (n_estimators) to 1000. We also introduced an early stopping mechanism (early_stopping_rounds = 50) to prevent overfitting. Using the training set, we tuned the model hyperparameters through five-fold cross-validation, while the validation set was used to monitor early stopping. The optimal configuration is reported in Table A1 of Appendix A. After hyperparameter tuning based on the validation macro-F1 score, the model was retrained on the combined training and validation sets and evaluated on an independent test set.
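The labeling and splitting steps can be sketched with the standard library only. The split below is a plain seeded random split. The text does not state whether the study stratified by class, so that choice is an assumption. In the `lightgbm` Python package, the reported settings would plausibly correspond to `LGBMClassifier(learning_rate=0.05, n_estimators=1000)` trained with the `lightgbm.early_stopping(stopping_rounds=50)` callback on the validation set. That mapping is an interpretation, not the authors' code.

```python
import random


def binarize(rating):
    """Map a 1-5 star rating to the satisfaction label: 4-5 -> 1, 1-3 -> 0."""
    return 1 if rating >= 4 else 0


def split_indices(n, ratios=(6.5, 1.0, 2.5), seed=1):
    """Shuffle indices 0..n-1 with a fixed seed and cut them into train/validation/test."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    total = sum(ratios)
    n_train = round(n * ratios[0] / total)
    n_val = round(n * ratios[1] / total)
    return indices[:n_train], indices[n_train:n_train + n_val], indices[n_train + n_val:]


labels = [binarize(r) for r in [5, 4, 3, 1, 2]]
train_idx, val_idx, test_idx = split_indices(1000)
```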

Under these settings, the LightGBM model was evaluated across five independent runs with different random seeds to assess its prediction stability. The model achieved mean ± SD Precision, Recall, and F1-score values of 0.915 ± 0.004, 0.944 ± 0.006, and 0.929 ± 0.005, respectively, on the test set. Using the same dataset and optimization strategy, we also evaluated XGBoost and Bagging–Random Forest, and Table A2 of Appendix A presents the complete comparative results. Based on the final LightGBM model, SHAP values were computed on the test set to examine the contribution of each sentiment attribute to CS predictions.

Table 4 presents the SHAP-derived MCD values. Most attributes show a positive \mathrm{MCD}_{i}^{pos-neu}, indicating that positive sentiment generally contributes to CS improvement. Price, Battery, Quality, Practicality, Design, Fitness tracking, and Operate present relatively high \mathrm{MCD}_{i}^{pos-neu} values, suggesting stronger satisfaction gains when customer perceptions shift from neutral to positive. By contrast, Service and Wearing show a negative \mathrm{MCD}_{i}^{pos-neu}, implying limited additional satisfaction gains from positive sentiment for these attributes.

The \mathrm{MCD}_{i}^{neu-neg} results further reveal the role of attributes in dissatisfaction reduction. Practicality, Operate, Fitness tracking, Wearing, Service, and Connectivity present relatively high positive values, indicating that moving from negative to neutral sentiment substantially improves CS. In contrast, Design and Price show a negative \mathrm{MCD}_{i}^{neu-neg}, while App support, Quality, and Battery show only limited positive values, suggesting weaker effects in reducing dissatisfaction.

These differences explain the resulting Kano classifications. Based on interviews with product development engineers and cost analysis experts, we set ς to 0.05 and selected δ = 2 to distinguish between different curve shapes. A-type attributes, including App support, Design, Battery, Price, and Quality, are mainly characterized by stronger gains from neutral-to-positive sentiment shifts. O-type attributes, including Connectivity, Practicality, Fitness tracking, and Operate, are characterized by relatively balanced marginal contribution differences across adjacent sentiment transitions, indicating a more consistent association between sentiment shifts and CS changes. M-type attributes, including Service and Wearing, are characterized by larger \mathrm{MCD}_{i}^{neu-neg} values, suggesting that their primary role is to prevent dissatisfaction rather than to generate additional satisfaction gains.

4.5. Attribute Improvement Priority Assessment Results Based on MCD

Table 5 presents the performance values calculated using Equations (4) and (5), together with their normalized results. Practicality has the highest performance, indicating that this attribute is currently highly recognized. By contrast, Wearing has the lowest performance, suggesting that it still has considerable room for improvement. The remaining attributes lie between these two extremes.

Figure 6 presents the AIPS-based priority results, which integrate the MCD and the improvement potential of each attribute. The results show that improvement priority is not determined solely by the Kano category. Attributes within the same Kano category may have different priority levels because they differ in both current performance and marginal contribution to CS or dissatisfaction. For example, although Wearing and Service are both classified as M-type attributes, Wearing receives a higher AIPS value. This is because it has lower current performance and a stronger negative impact on CS when it performs poorly (\mathrm{MCD}_{i}^{neu-neg} of 0.952, versus 0.898 for Service). Similarly, among A-type attributes, Battery ranks higher than App support because it has greater improvement potential and a higher positive marginal contribution. Among O-type attributes, Operate receives the highest priority because its current performance remains relatively low while its combined contribution across both sentiment transitions is high. These results indicate that Kano-based prioritization should account for intra-category heterogeneity rather than assuming that attributes within the same category have equivalent improvement value. By integrating MCD with improvement potential, AIPS extends Kano analysis from requirement classification to fine-grained priority assessment, thereby providing a more nuanced basis for data-driven product improvement decisions.

In summary, AIPS provides a clear quantitative basis for enterprises to optimize resource allocation and develop improvement strategies. When optimizing products or services, enterprises should adopt differentiated improvement strategies. For M-type attributes, deficiencies should be addressed first to avoid customer dissatisfaction caused by inadequate basic functions. For O-type attributes, firms should prioritize improvements in core performance to achieve steady CS growth. A-type attributes can serve as important leverage points for enhancing product competitiveness and building differentiated advantages. By directing limited resources toward attributes with high AIPS values, enterprises can not only enhance CS more effectively but also maximize the benefits of improvement under cost constraints.

4.6. Sensitivity Analysis and Comparative Evaluation

4.6.1. Sensitivity Analysis of Threshold Parameters

To assess the appropriateness of the threshold settings in the MCD-Kano model, this study conducted a sensitivity analysis as a robustness check. Specifically, ς was varied across {0.02, 0.05, 0.10}, and δ across {1.5, 2.0, 3.0} to evaluate the stability of the MCD-Kano classification results under alternative threshold settings.

Table 6 shows that the MCD-Kano classification is insensitive to changes in ς, as all 11 attributes retain the same Kano category across the tested values. For δ, the classification results remain identical to the baseline when δ is decreased from 2.0 to 1.5. Only when δ is increased to 3.0 do two attributes, Design and Wearing, change category, while the remaining 9 out of 11 attributes retain their original Kano categories. This variation can be explained by the role of δ in the classification rules. In the proposed model, δ serves as the cutoff ratio for distinguishing asymmetric from one-dimensional effects. Increasing δ from 2.0 to 3.0 imposes a stricter requirement for identifying A-type or M-type attributes. As a result, attributes whose asymmetric marginal contributions are not strong enough to satisfy the higher cutoff may be reclassified as O-type. This explains why Design shifts from A-type to O-type and Wearing shifts from M-type to O-type under δ = 3.0.
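The threshold grid can be reproduced mechanically. The approach is to re-run the classification rules for every (ς, δ) combination and list the attributes whose category differs from the baseline (ς = 0.05, δ = 2). The sketch below restates the rules compactly so that it runs on its own. It uses only the two MCD pairs reported in Section 4.6.4; values for the other attributes are in Table 4 and are not reproduced here. With these two inputs, only Wearing changes, and only at δ = 3.0.

```python
from itertools import product


def mcd_kano(pn, nn, sigma, delta):
    """Compact restatement of the MCD-Kano rules (pn = MCD^{pos-neu}, nn = MCD^{neu-neg})."""
    if abs(pn) < sigma and abs(nn) < sigma:
        return "I"
    if pn < 0 and nn < 0:
        return "R"
    if pn > nn:
        # A zero denominator is treated as an unbounded ratio (assumption).
        return "A" if nn == 0 or abs(pn / nn) > delta else "O"
    return "M" if pn == 0 or abs(nn / pn) > delta else "O"


def sensitivity(mcds, sigmas, deltas, baseline=(0.05, 2.0)):
    """Map each (sigma, delta) pair to the attributes whose category leaves the baseline."""
    base = {a: mcd_kano(pn, nn, *baseline) for a, (pn, nn) in mcds.items()}
    return {
        (s, d): [a for a, (pn, nn) in mcds.items() if mcd_kano(pn, nn, s, d) != base[a]]
        for s, d in product(sigmas, deltas)
    }


reported = {"Service": (-0.144, 0.898), "Wearing": (-0.455, 0.952)}
changes = sensitivity(reported, [0.02, 0.05, 0.10], [1.5, 2.0, 3.0])
```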

These changes are consistent with the MCD-Kano classification rules and suggest that Design and Wearing are located near the decision boundary between asymmetric and one-dimensional effects, rather than being arbitrarily classified. Moreover, the changes follow the expected direction from asymmetric categories to the one-dimensional category, without irregular or contradictory shifts. Therefore, the sensitivity analysis does not undermine the rationality of the baseline classification. Instead, it shows that most attributes remain stable under reasonable threshold settings, while the few observed changes are rule-consistent and interpretable.

4.6.2. Comparison of Product Attribute Extraction Model

In the product attribute extraction stage, two topic modeling methods, LDA and BTM, were employed. To ensure a fair comparison, the analysis applied both models to the same product review dataset, fixed the number of topics at 19, and then calculated topic coherence scores to assess model quality. As shown in Figure 7, BTM achieved a significantly higher score than LDA, demonstrating stronger topic coherence and structural identification capability.

Further manual interpretation and semantic analysis showed that the high-frequency words generated by BTM had higher semantic concentration and fewer redundant or meaningless words, indicating that BTM can more effectively capture the core attribute information in the text and improve the accuracy and interpretability of attribute extraction. Combined with the quantitative results and manual analysis, the findings of this study are consistent with those of Zhang et al. [31], namely that when processing short or highly sparse texts, BTM can significantly improve topic coherence and semantic aggregation by modeling word-pair co-occurrence patterns while reducing noise interference.
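To make the word-pair idea concrete, the sketch below generates biterms, the unordered word pairs that co-occur within a single short review. It then pools their counts over the corpus, which is how BTM sidesteps per-document sparsity. It covers only this preprocessing step, not BTM's Gibbs sampling. The optional window limit is an assumption (some implementations restrict pairs by distance), and the tokens are hypothetical, already-segmented words.

```python
from collections import Counter
from itertools import combinations


def extract_biterms(tokens, window=None):
    """Return all unordered word pairs from one tokenized short text.

    If window is given, only pairs whose positions are at most `window` apart are kept.
    """
    biterms = []
    for (i, w1), (j, w2) in combinations(enumerate(tokens), 2):
        if window is not None and j - i > window:
            continue
        biterms.append(tuple(sorted((w1, w2))))  # unordered: (a, b) == (b, a)
    return biterms


def corpus_biterm_counts(docs, window=None):
    """Pool biterm counts over the whole corpus instead of keeping them per document."""
    counts = Counter()
    for tokens in docs:
        counts.update(extract_biterms(tokens, window))
    return counts


docs = [["battery", "life", "short"], ["life", "battery", "great"]]
counts = corpus_biterm_counts(docs)
```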

4.6.3. Comparison of Sentiment Analysis Models

To evaluate the contribution of the main components of the proposed model, this study conducted an ablation analysis by comparing the full BERT-A-Conv model with four baseline models: BERT, BERT-CNN, CNN, and BERT-attention. These comparisons were designed to examine the effects of contextual semantic representation, convolutional feature extraction, and attention-based attribute modeling on attribute-level sentiment classification.

As shown in Table 7, the proposed model achieved the highest macro-average Precision, Recall, and F1-Score, with low standard deviations of 0.004, 0.005, and 0.004, respectively, indicating stable performance across five independent runs. To further assess the consistency of these performance differences, pairwise one-sided Wilcoxon signed-rank tests were conducted using macro-F1 scores from the five runs, showing that BERT-A-Conv significantly outperformed the compared models at the 0.05 significance level.
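Each test uses only five paired macro-F1 scores, so it is worth spelling out what the exact one-sided Wilcoxon signed-rank test can detect. The sketch below enumerates all 2^5 sign patterns under the null hypothesis. Pairing the runs by random seed is an assumption, and the scores are hypothetical. The smallest attainable p-value is 1/32 ≈ 0.031, so significance at the 0.05 level requires BERT-A-Conv to win in all five paired runs. A single loss on the smallest difference already gives p = 2/32 = 0.0625. `scipy.stats.wilcoxon(x, y, alternative="greater")` provides a library implementation of the same test.

```python
from itertools import product


def wilcoxon_one_sided_p(x, y):
    """Exact one-sided Wilcoxon signed-rank p-value for H1: x tends to exceed y.

    Assumes no zero differences (raises ValueError) and no tied absolute
    differences (ties would receive arbitrary distinct ranks here).
    """
    d = [a - b for a, b in zip(x, y)]
    if any(v == 0 for v in d):
        raise ValueError("zero differences are not handled in this sketch")
    order = sorted(range(len(d)), key=lambda k: abs(d[k]))
    ranks = [0] * len(d)
    for r, k in enumerate(order, start=1):
        ranks[k] = r
    w_plus = sum(r for r, v in zip(ranks, d) if v > 0)
    # Under H0 each rank carries a + or - sign with probability 1/2.
    n = len(d)
    as_extreme = sum(
        1 for signs in product((0, 1), repeat=n)
        if sum(r for r, s in zip(range(1, n + 1), signs) if s) >= w_plus
    )
    return as_extreme / 2 ** n


model = [0.933, 0.930, 0.936, 0.929, 0.935]     # hypothetical macro-F1, full model
baseline = [0.920, 0.925, 0.921, 0.926, 0.918]  # hypothetical macro-F1, one baseline
p_value = wilcoxon_one_sided_p(model, baseline)
```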

4.6.4. Comparison of Kano-Based Attribute Classification and Prioritization

To examine whether the attribute classification results derived from online reviews are consistent with consumers’ needs, a standard Kano questionnaire was adopted for external validation. A total of 350 questionnaires were distributed. After excluding invalid responses that failed the attention check, selected the same option for all items, or had abnormally short completion times, 311 valid responses were obtained. The questionnaire design followed the standard Kano method, with one pair of functional and dysfunctional questions designed for each product attribute. The functional question asked how consumers would feel if the attribute existed or performed well, whereas the dysfunctional question asked how consumers would feel if the attribute were absent or performed poorly. Each question was measured using a five-point scale ranging from “Like” to “Dislike”. The attributes were then classified as A, O, M, I, R, or questionable (Q-type) according to the Kano evaluation table, using the responses to the functional and dysfunctional questions. The descriptive statistics of the respondents are reported in Table A3 in Appendix A.
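The Kano evaluation table used for the questionnaire can be encoded as a lookup, as sketched below. The table is the standard one from the Kano literature, with functional answers as rows and dysfunctional answers as columns. The text specifies only the scale endpoints and "the Kano evaluation table". The three intermediate answer labels and the majority (mode) rule for assigning each attribute its final category are therefore assumptions.

```python
from collections import Counter

# Five answer options; the three middle labels are assumed wording.
ANSWERS = ["Like", "Must-be", "Neutral", "Live-with", "Dislike"]

# Rows: functional answer; columns: dysfunctional answer.
TABLE = [
    ["Q", "A", "A", "A", "O"],
    ["R", "I", "I", "I", "M"],
    ["R", "I", "I", "I", "M"],
    ["R", "I", "I", "I", "M"],
    ["R", "R", "R", "R", "Q"],
]


def kano_category(functional, dysfunctional):
    """Look up the Kano category for one respondent's answer pair."""
    return TABLE[ANSWERS.index(functional)][ANSWERS.index(dysfunctional)]


def classify_attribute(responses):
    """Most frequent category over (functional, dysfunctional) pairs (mode rule, assumed)."""
    counts = Counter(kano_category(f, d) for f, d in responses)
    return counts.most_common(1)[0][0]


responses = [("Neutral", "Dislike"), ("Neutral", "Dislike"), ("Like", "Dislike")]
category = classify_attribute(responses)
```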

On this basis, the classification results obtained by the proposed method were compared with those of the standard Kano questionnaire and the method of Joung and Kim [80]. To enable comparison with binary-sentiment-based Kano classification, the three sentiment categories were transformed into binary sentiment under the Joung and Kim setting by merging neutral sentiment into the corresponding positive or negative category. The reliability of the attribute-level sentiment classification and CS prediction models was confirmed before the marginal contributions were calculated and Kano classification was performed.

Table 8 summarizes the comparison of all 11 attributes across the three classification methods. Taking the standard Kano questionnaire results as the external validation benchmark, the proposed method was consistent with the questionnaire results for all attributes. In contrast, the Joung and Kim method produced two inconsistent classifications, identifying Service and Wearing as O-type attributes. Further analysis shows that the Joung and Kim method identifies Kano types mainly based on the sign direction of the marginal contributions of positive and negative sentiments, which led it to classify Service and Wearing as O-type. However, relying solely on positive and negative sentiments captures only the marginal effects at the two endpoints and cannot reveal differences in attribute effects during transitions between sentiment states. After incorporating neutral sentiment, the proposed method shows that the \mathrm{MCD}_{i}^{neu-neg} values for Service and Wearing are 0.898 and 0.952, respectively. These are much higher than their \mathrm{MCD}_{i}^{pos-neu} values (−0.144 and −0.455, respectively). This pattern indicates that the satisfaction response curves for these two attributes rise markedly from negative to neutral but do not rise further, and even decline, from neutral to positive. Overall, they present an inverted V-shaped trend with neutral sentiment as the peak, which is consistent with the characteristics of M-type attributes. In addition, although the Joung and Kim method classifies the A-type attributes consistently, it cannot reflect the characteristic shape of A-type curves. For App support, Design, Battery, Price, and Quality, the \mathrm{MCD}_{i}^{neu-neg} values are mostly close to 0 or negative, whereas all \mathrm{MCD}_{i}^{pos-neu} values are clearly positive. This indicates that A-type attributes show little change, or even a slight decline, when shifting from negative to neutral sentiment, but increase substantially when shifting from neutral to positive sentiment. Thus, they exhibit an overall V-shaped pattern, with neutral sentiment as the low point and the increase on the right side substantially greater than the change on the left. For Connectivity, Practicality, Fitness tracking, and Operate, both \mathrm{MCD}_{i}^{pos-neu} and \mathrm{MCD}_{i}^{neu-neg} are positive. This indicates that the marginal contribution increases continuously from negative to neutral and then to positive, a monotonically increasing pattern consistent with the characteristics of O-type attributes.

In summary, treating only positive and negative sentiments as endpoints is equivalent to assuming a two-endpoint shape for the satisfaction response curve. By incorporating neutral sentiment, the model introduces an intermediate reference point and captures the marginal changes across the three states of “negative–neutral–positive,” thereby distinguishing O, A, M, R, and I-type attributes more effectively. This result also supports Tang et al.’s [81] view that neutral sentiment is not noise but should be separately identified and modeled. Therefore, incorporating neutral sentiment helps reduce classification bias and improves the rationality and reliability of Kano classification results.

To further validate the practical relevance of the AIPS-based within-category rankings, we adopted an expert-judgment procedure. Experts with experience in product development and cost evaluation rated AIPS-derived within-category pairwise priority relationships using a seven-point Likert scale, combining pairwise comparison for relative-priority judgment with expert-based validation practices [82,83].

After evaluating 17 pairwise priority relationships, the experts showed strong agreement with the AIPS-derived rankings. Specifically, 16 comparisons received mean scores between 5.60 and 6.80, indicating broad support for the within-category priority relationships. The only exception was Design over Quality, which received a lower score and greater dispersion (Mean = 4.00, SD = 1.22). This divergence may be due to the small AIPS difference between the two attributes and to potential biases in online review data, in which customers may more readily express visible design impressions than quality-related concerns unless failures occur. It may also reflect experts’ greater emphasis on reliability, durability, failure risk, and long-term brand trust. Thus, closely ranked attributes should be interpreted with caution when online review signals and practical development considerations differ.

5. Discussion and Conclusions

The MCD-Kano model proposed in this study aims to decode CS for e-commerce products and accurately classify and prioritize the attributes that influence CS, thereby providing decision support for product improvement in review-rich e-commerce contexts. This method comprises key stages, including attribute extraction, attribute-level sentiment analysis, construction of the MCD-Kano model, and within-category attribute priority determination, forming an analytical process from semantic modeling to optimization decision-making. This systematic framework helps address several limitations of previous studies, including the neglect of neutral sentiment, insufficient accuracy in Kano classification, and ambiguous priority determination, thereby providing a basis for product iteration and improvement.

In the case study, 11 key attributes of best-selling smartwatches were systematically analyzed and classified into 4 O-type, 5 A-type, and 2 M-type attributes using the Kano model. The limited number of M-type attributes suggests that only a small proportion of the identified attributes are perceived as basic requirements. Nevertheless, these M-type attributes remain critical, as poor performance in them is more likely to cause customer dissatisfaction and adversely affect overall product evaluations. The relatively high proportion of A-type attributes suggests that more attributes are closely associated with enhancing CS and creating differentiated customer experiences. At the same time, this pattern may also stem from the relatively short time since some products were launched, reflecting the strong market enthusiasm commonly observed in early-adopter feedback.

The results show that Wearing was identified as a key M-type attribute. This classification is highly consistent with the physical characteristics of smartwatches. Smartwatches usually remain in close contact with the customer’s body for extended periods and are often worn throughout the day to support continuous health monitoring, activity tracking, and sleep monitoring. Therefore, even if the product performs well in other functional dimensions, discomfort during wear, excessive weight, or unsuitable strap materials may directly undermine the customer experience and significantly reduce overall satisfaction with the product. Beyond basic requirements, firms should also make full use of differentiated attributes to enhance the perceived value of their products. Notably, the model results indicate that Battery is the highest-priority A-type attribute. This finding reflects the tension between current technological bottlenecks in the wearable device industry and consumer expectations. Compared with traditional watches, smartwatches integrate multiple functions, including health monitoring, message notifications, location services, and app-based interactions. As a result, they consume more power, and battery life has long been a core concern for customers. When a product achieves significant breakthroughs in battery stability, battery life, or charging efficiency, this attribute can become a powerful “delighter” that substantially exceeds customer expectations and serves as an important driver of brand differentiation. Finally, regarding O-type attributes, the results suggest that operational experience (the Operate attribute), including response speed, ease of operation, and system fluency, is a key area for achieving linear improvements in CS.
Given the inherent limitations of interaction on small smartwatch screens, a smooth and efficient operational experience can translate directly into a higher perceived quality of use. Therefore, firms should adopt a phased iterative strategy in product optimization: first, they should strictly ensure the performance of M-type attributes to prevent customer loss caused by unmet basic requirements; second, they should continuously improve O-type attributes to steadily enhance product usability and operational efficiency; and finally, they should prioritize investment in A-type attribute innovation to strengthen product differentiation and further consolidate market competitiveness.

This paper provides theoretical support and managerial insights for evaluating product satisfaction and improving enterprise products in the e-commerce context.

5.1. Theoretical Contribution

This study makes three main theoretical contributions. First, this study extends research in information systems on user-generated content mining and the intelligent analysis of customer feedback. This paper regards attribute-level sentiment information in online reviews as the key analytical unit linking the identification of customer feedback with the construction of demand knowledge, and develops an interpretable Kano analysis framework. This framework integrates attribute extraction, attribute-level sentiment analysis, Kano requirement classification, and interpretive analysis into a unified logic, thereby providing a clearer theoretical pathway for understanding how user-generated content can be systematically transformed into product requirement knowledge.

Second, this study advances the application of the Kano model and the theory of asymmetric CS in the context of online reviews. Traditional online-review-based customer requirement analysis often adopts a binary positive–negative sentiment classification, thereby neglecting the large number of neutral or weak sentiment expressions in customer evaluations. This paper introduces a neutral sentiment state and, drawing on game-theoretic logic, employs the SHAP method to measure differences in the marginal contributions of product attributes to CS and dissatisfaction under different sentiment states. On this basis, it constructs a Kano classification method and extends customer feedback from the traditional positive–negative sentiment classification to multi-dimensional satisfaction signals that include positive, negative, and neutral evaluations. This method helps reveal, in greater detail, the differentiated effects of sentiment states on CS and dissatisfaction, and provides a more interpretable theoretical basis for explaining the asymmetric role of product attributes in the structure of customer requirements.
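The game-theoretic logic referred to here is the Shapley value, on which SHAP is built: a feature’s attribution is its marginal contribution averaged over every order in which features can join a coalition. As a minimal sketch of that idea only (not the paper’s MCD computation), the code below computes exact Shapley values by enumerating all orderings for a hypothetical three-player game; the players A, B, C and the function toy_value are illustrative assumptions. For tree models such as LightGBM, SHAP’s TreeExplainer uses a tree-specific algorithm instead of this brute-force enumeration.

```python
from itertools import permutations
from math import factorial
from typing import Callable, Dict, FrozenSet, Sequence


def shapley_values(
    players: Sequence[str], value: Callable[[FrozenSet[str]], float]
) -> Dict[str, float]:
    """Exact Shapley values: average marginal contribution over all player orderings."""
    phi = {p: 0.0 for p in players}
    for order in permutations(players):
        coalition: FrozenSet[str] = frozenset()
        for p in order:
            with_p = coalition | {p}
            # Marginal contribution of p given the players that joined before it.
            phi[p] += value(with_p) - value(coalition)
            coalition = with_p
    n_orders = factorial(len(players))
    return {p: total / n_orders for p, total in phi.items()}


def toy_value(coalition: FrozenSet[str]) -> float:
    """Hypothetical game: A adds 0.2 on its own; B and C add 0.5 only when both are present."""
    score = 0.0
    if "A" in coalition:
        score += 0.2
    if {"B", "C"} <= coalition:
        score += 0.5
    return score


if __name__ == "__main__":
    print(shapley_values(["A", "B", "C"], toy_value))
```

The result is approximately A = 0.2, B = 0.25, C = 0.25: the interaction term is split evenly between B and C, and the attributions sum to v({A, B, C}) − v(∅) = 0.7 (the efficiency property).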

Third, this study deepens research on decision support for product improvement. Traditional Kano analysis mainly focuses on the classification of attribute categories but gives insufficient attention to differences in attribute values within the same category and to the order of improvement. Drawing on two-factor theory, this paper combines differences in marginal contribution with improvement potential and distinguishes the evaluative dimensions most relevant to different Kano categories, thereby constructing an evaluation mechanism for prioritizing attribute improvement. This enables Kano analysis to extend from requirement category identification to attribute-level improvement ranking. Accordingly, this study provides a more refined theoretical–analytical perspective on product iteration and data-driven decision support in complex product contexts.

5.2. Practical Contribution

First, this study offers more precise and actionable guidance for enterprise product development and resource allocation. It enables product managers to move beyond the tendency to add features indiscriminately and instead direct limited budgets toward core attributes with greater satisfaction-conversion efficiency, thereby improving the effectiveness of product investment and maximizing returns.

Second, compared with traditional questionnaire-based approaches that are often constrained by delayed feedback, the automated analytical framework developed in this study helps e-commerce firms establish an agile market-monitoring mechanism. In practice, the framework can be embedded into regular reporting systems or decision dashboards to generate attribute priorities, satisfaction drivers, and emerging pain-point signals. For firms with limited technical resources, a simplified implementation focusing on attribute extraction and sentiment tracking can still provide useful support for early-stage product optimization. Moreover, the framework may be further extended to other consumer electronics products with sufficient online review data, thereby providing scalable support for customer need monitoring and managerial decision-making.

Third, this methodological framework is not limited to supporting product R&D; it also provides direct managerial value to marketing and customer service functions. By generating fine-grained attribute priorities, it enables marketing departments to identify differentiated selling points and helps customer service teams accurately detect pain-point attributes associated with customer dissatisfaction. As a result, firms can formulate proactive service recovery strategies to improve customer retention and business conversion in the e-commerce context.

5.3. Limitations and Future Research Directions

Although the proposed method has been validated in the case study, several limitations remain and point to directions for future research. First, the data are mainly derived from online reviews collected from two Chinese e-commerce platforms and cover a single product category; the findings should therefore be generalized with caution. In addition, the measurement of satisfaction in this study is constrained by the indicators available on e-commerce platforms. Following previous studies, we used review ratings as a proxy for CS; however, ratings may not fully capture the complexity of customer perceptions. Future research could therefore incorporate other satisfaction-related indicators, such as repurchase behavior, return records, or follow-up survey data, to further validate the proposed framework. Second, although expert- and consumer-based validation provides preliminary evidence of the method’s effectiveness, the approach has not yet been applied or tested in actual product improvement processes. Future research should therefore implement it in the relevant departments and establish a closed-loop verification mechanism through before-and-after comparisons of product improvement decisions, thereby bridging the gap between theoretical deduction and practical application. Third, the relatively short evaluation periods of some products may introduce early-adopter bias, potentially skewing sentiment evaluations toward the positive. Longitudinal studies could examine how the Kano classifications evolve as products mature.

Author Contributions

Conceptualization, H.Y.; Data curation, H.Y.; Funding acquisition, Y.L.; Investigation, H.Y.; Project administration, Y.L.; Software, H.Y. and Y.L.; Supervision, Y.L.; Validation, Y.L.; Writing—original draft, H.Y.; Writing—review and editing, H.Y. and Y.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Major Entrusted Project of the Liaoning Provincial Social Science Planning Fund (No. L23ZD045) and the Liaoning Provincial Federation of Social Sciences Research Project on Economic and Social Development (No. 2024lslybkt-026).

Data Availability Statement

The raw data supporting the conclusions of this article are not publicly available due to privacy and ethical restrictions associated with user-generated online review data; they will be made available by the corresponding author upon reasonable request.

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A

Algorithm A1. Constructing attribute–review samples from review text

Input:

reviews, a collection of Chinese review texts

attr_dict, a dictionary mapping each attribute to its set of representative keywords extracted by the BTM model and expanded with related synonyms and lexical variants

Output:

samples, a list of (attribute, review) pairs

Steps:

samples ← [ ]
for review in reviews do
    matched_attributes ← ∅
    for (attr, keywords) in attr_dict do
        if contains_any(review, keywords) then
            matched_attributes.add(attr)
        end if
    end for
    if matched_attributes ≠ ∅ then
        for attr in matched_attributes do
            sample ← (attr, review)
            samples.append(sample)
        end for
    end if
end for
return samples
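Algorithm A1 can be sketched in Python as follows. The algorithm does not define contains_any; this sketch assumes plain substring matching, which suits unsegmented Chinese text. The function name build_attribute_review_samples and the English toy keywords (taken from Table 3, whose entries are translations of the Chinese originals) are illustrative assumptions.

```python
from typing import Dict, Iterable, List, Set, Tuple


def contains_any(review: str, keywords: Iterable[str]) -> bool:
    # Assumption: a keyword "occurs" if it appears as a substring of the review.
    return any(kw in review for kw in keywords)


def build_attribute_review_samples(
    reviews: Iterable[str], attr_dict: Dict[str, Set[str]]
) -> List[Tuple[str, str]]:
    """Pair each review with every attribute whose keywords it mentions."""
    samples: List[Tuple[str, str]] = []
    for review in reviews:
        # A list (rather than a set) keeps attribute order deterministic;
        # each attribute is visited once, so there are no duplicates.
        matched_attributes = [
            attr for attr, keywords in attr_dict.items() if contains_any(review, keywords)
        ]
        # A review mentioning several attributes yields one sample per attribute;
        # reviews with no match contribute nothing.
        for attr in matched_attributes:
            samples.append((attr, review))
    return samples


if __name__ == "__main__":
    toy_dict = {"Battery": {"standby", "fast charge"}, "Wearing": {"strap", "lightweight"}}
    toy_reviews = ["The strap is comfortable and standby lasts a week", "Fast delivery"]
    print(build_attribute_review_samples(toy_reviews, toy_dict))
```

The first toy review yields two samples (Battery and Wearing), and the second yields none, because no attribute in the toy dictionary covers delivery.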

Table A1. Optimal Ensemble-Level and Decision-Tree-Level Parameters of the LightGBM Binary Classification Model.

Parameter Name	Code Name	Optimal Value
Ensemble level
Boosting type	boosting_type	gbdt
Learning rate	learning_rate	0.035
Number of boosting iterations	n_estimators	742
Feature sampling fraction	feature_fraction	0.78
Bagging fraction	bagging_fraction	0.85
Bagging frequency	bagging_freq	5
L1 regularization term on weights	lambda_l1	0.5
L2 regularization term on weights	lambda_l2	3.0
Decision-tree level
Maximum number of leaves	num_leaves	128
Maximum tree depth	max_depth	10
Minimum number of samples in a leaf	min_data_in_leaf	45
Minimum sum of instance Hessian in a leaf	min_sum_hessian_in_leaf	1.2
Minimum split gain	min_split_gain	0.02
Maximum number of bins	max_bin	255
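Table A1 mixes scikit-learn-style names (boosting_type, n_estimators, min_split_gain) with native LightGBM names; LightGBM accepts these three as aliases of boosting, num_iterations, and min_gain_to_split. As a sketch only, the tuned values can be collected into a native-API configuration as below. The binary objective follows the table title; training data, random seeds, and any validation or early-stopping scheme are not reported here and are left out. The names LGBM_PARAMS, NUM_BOOST_ROUND, and train_booster are illustrative, and the import sits inside the function so the configuration can be inspected without LightGBM installed.

```python
from typing import Any, Dict

# Values transcribed from Table A1, keyed by LightGBM's native parameter names.
LGBM_PARAMS: Dict[str, Any] = {
    "objective": "binary",           # binary rating target R
    "boosting": "gbdt",              # boosting_type in Table A1
    "learning_rate": 0.035,
    "feature_fraction": 0.78,
    "bagging_fraction": 0.85,
    "bagging_freq": 5,
    "lambda_l1": 0.5,
    "lambda_l2": 3.0,
    "num_leaves": 128,
    "max_depth": 10,
    "min_data_in_leaf": 45,
    "min_sum_hessian_in_leaf": 1.2,
    "min_gain_to_split": 0.02,       # min_split_gain in Table A1
    "max_bin": 255,
    "verbose": -1,
}
NUM_BOOST_ROUND = 742  # n_estimators in Table A1


def train_booster(X, y):
    """Train a LightGBM booster with the Table A1 settings (sketch; requires lightgbm)."""
    import lightgbm as lgb

    train_set = lgb.Dataset(X, label=y)
    return lgb.train(LGBM_PARAMS, train_set, num_boost_round=NUM_BOOST_ROUND)
```

Passing the number of rounds through num_boost_round, rather than as num_iterations inside the dictionary, avoids LightGBM’s warning about a parameter being set twice. With max_depth = 10 a tree could hold up to 2^10 = 1024 leaves, so num_leaves = 128 is the binding limit on tree size.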

Table A2. Comparison of LightGBM with other models.

Model	Precision	Recall	F1-Score	Mean F1 Difference	Wilcoxon W	One-Sided p-Value
LightGBM	0.915 ± 0.004	0.944 ± 0.006	0.929 ± 0.005	-	-	-
XGBoost	0.832 ± 0.013	0.895 ± 0.008	0.862 ± 0.008	0.067	15	0.031
Bagging–Random Forest	0.883 ± 0.005	0.809 ± 0.003	0.844 ± 0.002	0.085	15	0.031

Note: The Mean F1 difference denotes the average paired difference in F1-score between LightGBM and each compared model across the five runs.
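Tables A2 and 7 report the same W = 15 and p = 0.031 for every comparison. This matches an exact one-sided Wilcoxon signed-rank test over five paired runs in which every paired F1 difference favors the reference model: W⁺ = 1 + 2 + 3 + 4 + 5 = 15, and only one of the 2^5 = 32 equally likely sign patterns reaches it, so p = 1/32 ≈ 0.031, the smallest attainable value with n = 5. The standard-library sketch below computes the exact statistic by enumerating sign patterns. It assumes zero differences are dropped and tied absolute differences receive average ranks; the function names and the F1 values in the usage line are hypothetical.

```python
from itertools import product
from typing import List, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ascending ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def wilcoxon_one_sided_exact(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float]:
    """Return (W+, exact p) for H1: values in a tend to exceed the paired values in b."""
    diffs = [x - y for x, y in zip(a, b) if x != y]  # zero differences are dropped
    ranks = average_ranks([abs(d) for d in diffs])
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    # Under H0 every sign pattern is equally likely: count those with W+ >= observed.
    hits = total = 0
    for signs in product((0, 1), repeat=len(ranks)):
        total += 1
        if sum(r for r, s in zip(ranks, signs) if s) >= w_plus - 1e-12:
            hits += 1
    return w_plus, hits / total


if __name__ == "__main__":
    # Hypothetical per-run F1 scores for a reference model and a baseline.
    print(wilcoxon_one_sided_exact([0.93, 0.92, 0.94, 0.93, 0.92],
                                   [0.86, 0.87, 0.85, 0.88, 0.84]))
```

Because p cannot fall below 1/32 with five runs, the identical W and p values across rows indicate a consistent direction of improvement rather than its size; the Mean F1 Difference column carries the magnitude.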

Table A3. Descriptive statistics of respondents in the smartwatch Kano questionnaire.

Characteristic	Category	Frequency	Percentage (%) *
Gender	Female	146	46.95
	Male	165	53.05
Age (years)	<20	15	4.82
	20–25	89	28.62
	26–30	102	32.80
	31–35	51	16.40
	36–40	33	10.61
	>40	21	6.75
Current brand	Huawei	106	34.08
	Honor	51	16.40
	Oppo	47	15.11
	Xiaomi	86	27.65
	OnePlus	14	4.50
	Others	7	2.25

Note: * Percentages are computed based on the total usable sample of 311 respondents.

References

Jin, J.; Jia, D.; Chen, K. Mining Online Reviews with a Kansei-Integrated Kano Model for Innovative Product Design. Int. J. Prod. Res. 2022, 60, 6708–6727.
Li, S.; Zhu, B.; Zhang, Y.; Liu, F.; Yu, Z. A Two-Stage Nonlinear User Satisfaction Decision Model Based on Online Review Mining: Considering Non-Compensatory and Compensatory Stages. J. Theor. Appl. Electron. Commer. Res. 2024, 19, 272–296.
Kim, S.-A.; Park, S.; Kwak, M.; Kang, C. Examining Product Quality and Competitiveness via Online Reviews: An Integrated Approach of Importance Performance Competitor Analysis and Kano Model. J. Retail. Consum. Serv. 2025, 82, 104135.
Li, Y.; Yu, H.; Shen, Z. Dynamic Prediction of Product Competitive Position: A Multisource Data-Driven Competitive Analysis Framework from a Multi-Competitor Perspective. J. Retail. Consum. Serv. 2025, 85, 104289.
Thakur, R. Customer Engagement and Online Reviews. J. Retail. Consum. Serv. 2018, 41, 48–59.
Wang, Q.; Zhang, W.; Li, J.; Mai, F.; Ma, Z. Effect of Online Review Sentiment on Product Sales: The Moderating Role of Review Credibility Perception. Comput. Hum. Behav. 2022, 133, 107272.
Zhai, M.; Wang, X.; Zhao, X. The Importance of Online Customer Reviews Characteristics on Remanufactured Product Sales: Evidence from the Mobile Phone Market on Amazon.com. J. Retail. Consum. Serv. 2024, 77, 103677.
Wei, W.; Hao, C.; Wang, Z. User Needs Insights from UGC Based on Large Language Model. Adv. Eng. Inform. 2025, 65, 103268.
Chen, J.; Wu, J.; Wang, D.; Stantic, B. Beyond Static Rankings: A Tourist Experience-Driven Approach to Measure Destination Competitiveness. Tour. Manag. 2025, 106, 105022.
Ding, H.; Zhu, W.; Hu, G.; Bu, Z. Capturing Dynamic User Preferences: A Recommendation System Model with Non-Linear Forgetting and Evolving Topics. Systems 2025, 13, 1034.
Li, H.; Yu, B.X.B.; Li, G.; Gao, H. Restaurant Survival Prediction Using Customer-Generated Content: An Aspect-Based Sentiment Analysis of Online Reviews. Tour. Manag. 2023, 96, 104707.
Liu, Y.; Shi, J.; Huang, F.; Hou, J.; Zhang, C. Unveiling Consumer Preferences in Automotive Reviews through Aspect-Based Opinion Generation. J. Retail. Consum. Serv. 2024, 77, 103605.
Schouten, K.; Frasincar, F. Survey on Aspect-Level Sentiment Analysis. IEEE Trans. Knowl. Data Eng. 2016, 28, 813–830.
Tripathy, G.; Sharaff, A. Traversing the Landscape of Aspect-Based Sentiment Analysis: Delving Deeper into Techniques, Trends, and Future Directions. Comput. Sci. Rev. 2026, 60, 100885.
Wu, H.; Huang, C.; Deng, S. Improving Aspect-Based Sentiment Analysis with Knowledge-Aware Dependency Graph Network. Inf. Fusion 2023, 92, 289–299.
Xiao, Y.; Li, C.; Thürer, M.; Liu, Y.; Qu, T. User Preference Mining Based on Fine-Grained Sentiment Analysis. J. Retail. Consum. Serv. 2022, 68, 103013.
Peng, S.; Ye, D.; Tan, H. Modeling User Requirement for Value-Oriented Design: A Multi-Dimensional Perception Evidence from the Automobile Market. Systems 2026, 14, 251.
Hua, Y.C.; Denny, P.; Wicker, J.; Taskova, K. A Systematic Review of Aspect-Based Sentiment Analysis: Domains, Methods, and Trends. Artif. Intell. Rev. 2024, 57, 296.
Wu, J.; Chen, J.; Yang, T.; Zhao, N. How to Stay Competitive: An Innovative Concept to Assess the Business Competitiveness Using Online Restaurant Reviews. Int. J. Hosp. Manag. 2024, 122, 103836.
Zhang, H.; Cheah, Y.-N.; Alyasiri, O.M.; An, J. Exploring Aspect-Based Sentiment Quadruple Extraction with Implicit Aspects, Opinions, and ChatGPT: A Comprehensive Survey. Artif. Intell. Rev. 2024, 57, 17.
Zhou, K.; Yao, Z. Analysis of Customer Satisfaction in Tourism Services Based on the Kano Model. Systems 2023, 11, 345.
Zhang, D.; Shen, Z.; Li, Y. Requirement Analysis and Service Optimization of Multiple Category Fresh Products in Online Retailing Using Importance-Kano Analysis. J. Retail. Consum. Serv. 2023, 72, 103253.
Sinemus, K.; Zielke, S.; Dobbelstein, T. Improving Consumer Satisfaction through Shopping App Features: A Kano-Based Approach. J. Retail. Consum. Serv. 2025, 85, 104243.
Koppel, M.; Schler, J. The Importance of Neutral Examples for Learning Sentiment. Comput. Intell. 2006, 22, 100–109.
Son, J.; Lee, H.-K.; Choi, H.; Oh, O.-O. Are Neutral Sentiments Worth Considering When Investigating Online Consumer Reviews? Their Relationship with Review Ratings. In Proceedings of the Hawaii International Conference on System Sciences 2022 (HICSS-55), Maui, HI, USA, 4–7 January 2022; pp. 4643–4652.
Bohanec, M.; Kadoić, N.; Begičević Ređep, N. Preferential Knowledge for Multi-Criteria Decision Making: Stability and Consistency of Decision Rules and Weights. EURO J. Decis. Process. 2026, 14, 100067.
Slevitch, L. Kano Model Categorization Methods: Typology and Systematic Critical Overview for Hospitality and Tourism Academics and Practitioners. J. Hosp. Tour. Res. 2025, 49, 449–479.
Ho, S.-C.; Chuang, W.-L. Identifying and Prioritizing the Critical Quality Attributes for Business-to-Business Cross-Border Electronic Commerce Platforms. Electron. Commer. Res. Appl. 2023, 58, 101239.
Wenninger, A.; Rau, D.; Röglinger, M. Improving Customer Satisfaction in Proactive Service Design. Electron. Mark. 2022, 32, 1399–1418.
Lee, S.; Park, S.; Kwak, M. Revealing the Dual Importance and Kano Type of Attributes through Customer Review Analytics. Adv. Eng. Inform. 2022, 51, 101533.
Zhang, K.; Lin, K.-Y.; Wang, J.; Ma, Y.; Li, H.; Zhang, L.; Liu, K.; Feng, L. UNISON Framework for User Requirement Elicitation and Classification of Smart Product-Service System. Adv. Eng. Inform. 2023, 57, 101996.
Du, S. Hybrid Kano-DEMATEL-TOPSIS Model Based Benefit Distribution of Multiple Logistics Service Providers Considering Consumer Service Evaluation of Segmented Task. Expert Syst. Appl. 2023, 213, 119292.
Wang, P.; Chu, J.; Yu, S.; Chen, C.; Hu, Y. A Consumers’ Kansei Needs Mining and Purchase Intention Evaluation Method Based on Fuzzy Linguistic Theory and Multi-Attribute Decision Making Method. Adv. Eng. Inform. 2024, 59, 102267.
Zheng, W.-Q.; Cheung, S.-M.; Zhu, B.-W.; Xiong, L.; Tzeng, G.-H. A Hybrid Multi-Attribute Decision-Making Model for the Systematic Evaluation of Exoticism-Themed Retail Spaces from the Perspective of Consumer Experience. J. Retail. Consum. Serv. 2024, 79, 103848.
Shen, Z.; Zhao, C.; Li, Y. Customer Requirements Analysis and Product Service Improvement Framework Using Multi-Source User-Generated Content and Dual Importance–Performance Analysis: A Case Study of Fresh E-Commerce. J. Theor. Appl. Electron. Commer. Res. 2026, 21, 19.
Shen, Z.; Li, Y.; Wang, S.; Zhao, C. Exploring Dynamic Customer Requirement Trend of Buffet Restaurant: A Two-Stage Analysis from Online Reviews. Br. Food J. 2025, 127, 413–430.
Wu, X.; Liao, H.; Zhang, C. Importance-Performance Analysis to Develop Product/Service Improvement Strategies through Online Reviews with Reliability. Ann. Oper. Res. 2023, 342, 1905–1924.
Zhang, Y.; Guo, W.; Chang, Z.; Ma, J.; Fu, Z.; Wang, L.; Shao, H. User Requirement Modeling and Evolutionary Analysis Based on Review Data: Supporting the Design Upgrade of Product Attributes. Adv. Eng. Inform. 2024, 62, 102861.
Bi, J.-W.; Liu, Y.; Fan, Z.-P.; Zhang, J. Wisdom of Crowds: Conducting Importance-Performance Analysis (IPA) through Online Reviews. Tour. Manag. 2019, 70, 460–478.
Liu, Y.; Wu, J.; Fu, Q.; Feng, H.; Liu, J.; Fang, Y.; Niu, Y.; Xue, C. A Method for Constructing an Ergonomics Evaluation Indicator System for Community Aging Services Based on Kano-Delphi-CFA: A Case Study in China. Adv. Eng. Inform. 2024, 62, 102842.
Qi, J.; Zhang, Z.; Jeon, S.; Zhou, Y. Mining Customer Requirements from Online Reviews: A Product Improvement Perspective. Inf. Manag. 2016, 53, 951–963.
Xiao, S.; Wei, C.-P.; Dong, M. Crowd Intelligence: Analyzing Online Product Reviews for Preference Measurement. Inf. Manag. 2016, 53, 169–182.
Bi, J.-W.; Liu, Y.; Fan, Z.-P.; Zhang, J. Exploring Asymmetric Effects of Attribute Performance on Customer Satisfaction in the Hotel Industry. Tour. Manag. 2020, 77, 104006.
Wang, T.; Wang, W.; Feng, J.; Fan, X.; Guo, J.; Lei, J. A Novel User-Generated Content-Driven and Kano Model Focused Framework to Explore the Impact Mechanism of Continuance Intention to Use Mobile APPs. Comput. Hum. Behav. 2024, 157, 108252.
Bi, J.-W.; Liu, Y.; Fan, Z.-P.; Cambria, E. Modelling Customer Satisfaction from Online Reviews Using Ensemble Neural Network and Effect-Based Kano Model. Int. J. Prod. Res. 2019, 57, 7068–7088.
Liu, H.; Wu, S.; Zhong, C.; Liu, Y. The Effects of Customer Online Reviews on Sales Performance: The Role of Mobile Phone’s Quality Characteristics. Electron. Commer. Res. Appl. 2023, 57, 101229.
Mittal, V.; Ross, W.T., Jr.; Baldasare, P.M. The Asymmetric Impact of Negative and Positive Attribute-Level Performance on Overall Satisfaction and Repurchase Intentions. J. Mark. 1998, 62, 33–47.
Park, S.-M. Root Cause Analysis Based on Relations Among Sentiment Words. Cogn. Comput. 2021, 13, 903–918.
Huang, H.; Zeng, X.; Ge, L.; Sun, K. Social Media Engagement in Waste Sorting: The Role of Sentiment in Shaping Public Awareness. Humanit. Soc. Sci. Commun. 2025, 12, 1763.
Becerra, C.E.T.; de Melo, F.J.C.; Xavier, L.d.A.; de Albuquerque, A.P.G.; Barbosa, A.A.L.; de Oliveira, L.A.B.; de Carvalho, R.S.M.C.; de Medeiros, D.D. A Holistic Quality Improvement Model for Food Services: Integrating Fuzzy Kano and PROMETHEE II. Systems 2024, 12, 422.
Kumar, A.; Bala, P.K.; Chakraborty, S.; Behera, R.K. Exploring Antecedents Impacting User Satisfaction with Voice Assistant App: A Text Mining-Based Analysis on Alexa Services. J. Retail. Consum. Serv. 2024, 76, 103586.
Yi, J.; Oh, Y.K. The Informational Value of Multi-Attribute Online Consumer Reviews: A Text Mining Approach. J. Retail. Consum. Serv. 2022, 65, 102519.
Arif, M.H.; Li, J.; Iqbal, M.; Liu, K. Sentiment Analysis and Spam Detection in Short Informal Text Using Learning Classifier Systems. Soft Comput. 2018, 22, 7281–7291.
Xu, D.; Tian, Z.; Lai, R.; Kong, X.; Tan, Z.; Shi, W. Deep Learning Based Emotion Analysis of Microblog Texts. Inf. Fusion 2020, 64, 1–11.
Zhang, Y.; Zhang, Z.; Miao, D.; Wang, J. Three-Way Enhanced Convolutional Neural Networks for Sentence-Level Sentiment Classification. Inf. Sci. 2019, 477, 55–64.
Li, W.; Zhu, L.; Shi, Y.; Guo, K.; Cambria, E. User Reviews: Sentiment Analysis Using Lexicon Integrated Two-Channel CNN–LSTM Family Models. Appl. Soft Comput. 2020, 94, 106435.
Zhang, J.; Zhang, A.; Liu, D.; Bian, Y. Customer Preferences Extraction for Air Purifiers Based on Fine-Grained Sentiment Analysis of Online Reviews. Knowl.-Based Syst. 2021, 228, 107259.
Bello, A.; Ng, S.-C.; Leung, M.-F. A BERT Framework to Sentiment Analysis of Tweets. Sensors 2023, 23, 506.
Wu, H.; Zhang, Z.; Shi, S.; Wu, Q.; Song, H. Phrase Dependency Relational Graph Attention Network for Aspect-Based Sentiment Analysis. Knowl.-Based Syst. 2022, 236, 107736.
Chen, K.; Zheng, J.; Jin, J. Ranking Products through Online Opinions: A Text Analysis and Regret Theory-Based Approach. Appl. Soft Comput. 2024, 158, 111571.
Darko, A.P.; Antwi, C.O.; Adjei, K.; Zhang, B.; Ren, J. Predicting Determinants Influencing User Satisfaction with Mental Health App: An Explainable Machine Learning Approach Based on Unstructured Data. Expert Syst. Appl. 2024, 249, 123647.
Zhao, A.; Yu, Y. Knowledge-Enabled BERT for Aspect-Based Sentiment Analysis. Knowl.-Based Syst. 2021, 227, 107220.
Xu, W. Understanding Customer Complaints from Negative Online Hotel Reviews: A BERT-Based Deep Learning Approach. Int. J. Hosp. Manag. 2025, 126, 104057.
Liu, Y.; You, T.-H.; Zou, J.; Cao, B.-B. Modelling Customer Requirement for Mobile Games Based on Online Reviews Using BW-CNN and S-Kano Models. Expert Syst. Appl. 2024, 258, 125142.
He, X.; Xu, H.; Li, J.; He, L.; Yu, L. FastBTM: Reducing the Sampling Time for Biterm Topic Model. Knowl.-Based Syst. 2017, 132, 11–20.
Cheng, X.; Yan, X.; Lan, Y.; Guo, J. BTM: Topic Modeling over Short Texts. IEEE Trans. Knowl. Data Eng. 2014, 26, 2928–2941.
Devlin, J.; Chang, M.-W.; Lee, K.; Toutanova, K. BERT: Pre-Training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of the NAACL-HLT 2019, Minneapolis, MN, USA, 2–7 June 2019.
Janizadeh, S.; Thi Kieu Tran, T.; Bateni, S.M.; Jun, C.; Kim, D.; Trauernicht, C.; Heggy, E. Advancing the LightGBM Approach with Three Novel Nature-Inspired Optimizers for Predicting Wildfire Susceptibility in Kauaʻi and Molokaʻi Islands, Hawaii. Expert Syst. Appl. 2024, 258, 124963.
Wen, X.; Xie, Y.; Wu, L.; Jiang, L. Quantifying and Comparing the Effects of Key Risk Factors on Various Types of Roadway Segment Crashes with LightGBM and SHAP. Accid. Anal. Prev. 2021, 159, 106261.
Wu, S.; Wu, S.; Chen, J.; Pan, C. Predicting Geriatric Environmental Safety Perception Assessment Using LightGBM and SHAP Framework. Sci. Rep. 2025, 15, 27444.
Lundberg, S.M.; Lee, S.-I. A Unified Approach to Interpreting Model Predictions. In Proceedings of the Advances in Neural Information Processing Systems; Curran Associates, Inc.: Red Hook, NY, USA, 2017; Volume 30, pp. 4765–4774.
Lundberg, S.M.; Erion, G.; Chen, H.; DeGrave, A.; Prutkin, J.M.; Nair, B.; Katz, R.; Himmelfarb, J.; Bansal, N.; Lee, S.-I. From Local Explanations to Global Understanding with Explainable AI for Trees. Nat. Mach. Intell. 2020, 2, 56–67.
Joung, J.; Kim, H. Interpretable Machine Learning-Based Approach for Customer Segmentation for New Product Development from Online Product Reviews. Int. J. Inf. Manag. 2023, 70, 102641.
Mitchell, R.; Frank, E.; Holmes, G. GPUTreeShap: Massively Parallel Exact Calculation of SHAP Scores for Tree Ensembles. PeerJ Comput. Sci. 2022, 8, e880.
Ji, G.; Ng, S.I.; Cheah, J.-H.; Choo, W.-C. Understanding Asymmetric Effects of Attribute Performance on Tourist Satisfaction with Island Tourism Using User-Generated Data. J. Hosp. Tour. Insights 2023, 7, 2704–2722.
Mikulić, J.; Šerić, M.; Krešić, D. Asymmetric Effects of Wellness Destination and Wellness Facility Attributes on Tourist Satisfaction. Tour. Rev. 2024, 79, 969–980.
Kano, N.; Seraku, N.; Takahashi, F.; Tsuji, S. Attractive Quality and Must-Be Quality. J. Jpn. Soc. Qual. Control 1984, 14, 147–156.
Matzler, K.; Bailom, F.; Hinterhuber, H.H.; Renzl, B.; Pichler, J. The Asymmetric Relationship between Attribute-Level Performance and Overall Customer Satisfaction: A Reconsideration of the Importance–Performance Analysis. Ind. Mark. Manag. 2004, 33, 271–277.
Wang, J.; Sun, K.; Liu, P.; Zhang, K.; Feng, L.; Wu, X.; Zhang, Z. Dynamic Elicitation and Forecasting Innovation Requirement of Smart Product-Service System via User-Manufacturer Value Co-Creation Perspective Using Multi-Source Data. Comput. Ind. Eng. 2024, 197, 110511.
Joung, J.; Kim, H.M. Explainable Neural Network-Based Approach to Kano Categorisation of Product Features from Online Reviews. Int. J. Prod. Res. 2022, 60, 7053–7073.
Tang, T.; Fang, E.; Wang, F. Is Neutral Really Neutral? The Effects of Neutral User-Generated Content on Product Sales. J. Mark. 2014, 78, 41–58.
Colquitt, J.A.; Sabey, T.B.; Rodell, J.B.; Hill, E.T. Content Validation Guidelines: Evaluation Criteria for Definitional Correspondence and Definitional Distinctiveness. J. Appl. Psychol. 2019, 104, 1243–1265.
Dawes, J. Do Data Characteristics Change According to the Number of Scale Points Used? An Experiment Using 5-Point, 7-Point and 10-Point Scales. Int. J. Mark. Res. 2008, 50, 61–104.

Figure 1. Kano model.

Figure 2. The research framework proposed in this study.

Figure 3. Biterm Topic Model.

Figure 4. The BERT-A-Conv model.

Figure 5. MCD-Kano model.

Figure 6. Attribute improvement priority assessment results.

Figure 7. Comparison between the BTM (a) and the LDA model (b).

Table 1. Input structure of the LightGBM model.

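Review	Rating (R)	X₁^pos	X₁^neu	X₁^neg	X₂^pos	X₂^neu	X₂^neg	…
1	1	1	0	0	0	1	0	…
2	0	0	0	1	1	0	0	…
…	…	…	…	…	…	…	…	…

Note: Xᵢ^pos, Xᵢ^neu, and Xᵢ^neg indicate (1 = yes, 0 = no) whether Attributeᵢ is expressed with positive, neutral, or negative sentiment in the review. The ellipsis “…” indicates omitted items following the same structure.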
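A minimal sketch of how rows with the layout of Table 1 could be assembled from attribute-level sentiment labels. It assumes each review is supplied as a binary rating R plus a mapping from the attributes it mentions to one of "pos", "neu", or "neg", and that an attribute not mentioned in a review receives zeros in all three of its columns; the function name build_feature_rows and this input format are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

SENTIMENTS = ("pos", "neu", "neg")  # column order within each attribute block


def build_feature_rows(
    attributes: Sequence[str],
    reviews: Dict[int, Tuple[int, Dict[str, str]]],
) -> List[List[int]]:
    """Return rows [review_id, R, X_1^pos, X_1^neu, X_1^neg, X_2^pos, ...]."""
    rows = []
    for review_id, (rating, attr_sentiment) in sorted(reviews.items()):
        row = [review_id, rating]
        for attr in attributes:
            label = attr_sentiment.get(attr)  # None when the attribute is not mentioned
            row.extend(1 if label == s else 0 for s in SENTIMENTS)
        rows.append(row)
    return rows


if __name__ == "__main__":
    # The two example reviews of Table 1.
    example = {
        1: (1, {"Attribute1": "pos", "Attribute2": "neu"}),
        2: (0, {"Attribute1": "neg", "Attribute2": "pos"}),
    }
    for r in build_feature_rows(["Attribute1", "Attribute2"], example):
        print(r)
```

Run on the two example reviews, this reproduces the first two data rows of Table 1: [1, 1, 1, 0, 0, 0, 1, 0] and [2, 0, 0, 0, 1, 1, 0, 0]. The review ID is kept only for traceability and would be dropped before fitting LightGBM.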

Table 2. Detailed information on smartwatches.

Product Name	Release Price (CNY)	Release Time	Number of Valid Reviews
Huawei Watch GT 5	1188	September 2024	4062
Honor Watch 4 Pro	1599	October 2023	3159
Oppo Watch X2	2249	February 2025	5402
Huawei Watch GT 5 Pro	2189	September 2024	3356
Xiaomi Watch S4	999	January 2024	4181
OnePlus Watch 2	1799	July 2024	4692
Huawei Watch Fit 3	949	May 2024	6831
Huawei Watch D2	2888	November 2024	3957

Table 3. Product attributes and keywords.

Attribute	Keywords
Connectivity	Bluetooth, Calling, NFC, eSIM, Connection
Fitness tracking	Monitoring, Exercise, Sleep, Heart rate, Positioning
App Support	System, App, Compatibility, Installation, Synchronization
Design	Texture, Beautiful, Color, Fashion, Dial, Theme
Battery	Power consumption, Charging speed, Standby, Fast charge, Magnetic charging
Price	Cost performance, Government subsidy, Cheap, Budget, Cost-effective
Service	Response, After-sales, Customer service, Delivery, Repair
Quality	Waterproof, Wear-resistant, Dustproof, Swimming, Material
Operate	Responsive, Convenience, Lag, Switching, Interface
Wearing	Comfortable, Weight, Strap, Lightweight, Adjustment
Practicality	Payment, Travel, Weather, Notification, WeChat

Table 4. MCD-Kano-based classification of product attributes.

Attribute	MCDᵢ^(pos−neu)	MCDᵢ^(neu−neg)	Kano Category
App support	0.612	0.107	A
Connectivity	0.481	0.599	O
Design	0.768	−0.283	A
Service	−0.144	0.898	M
Practicality	0.874	1.492	O
Battery	1.008	0.239	A
Price	1.355	−0.283	A
Wearing	−0.455	0.952	M
Fitness tracking	0.766	1.226	O
Operate	0.724	1.304	O
Quality	0.997	0.169	A

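Note: MCDᵢ^(pos−neu) and MCDᵢ^(neu−neg) are the marginal contribution differences of attribute i between the positive and neutral sentiment categories and between the neutral and negative sentiment categories, respectively. O, One-dimensional attribute; A, Attractive attribute; M, Must-be attribute.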

Table 5. Performance of each attribute.

Attribute	Performance	Normalized Performance
App support	0.443	0.178
Connectivity	1.108	0.578
Design	0.224	0.046
Service	1.371	0.737
Practicality	1.809	1.000
Battery	0.352	0.123
Price	0.778	0.380
Wearing	0.148	0.000
Fitness tracking	0.510	0.218
Operate	1.105	0.576
Quality	0.651	0.303
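The Normalized Performance column is consistent with min–max scaling of the Performance column, (x − min) / (max − min), where Wearing (0.148) maps to 0 and Practicality (1.809) maps to 1; recomputing from the three-decimal values reproduces the table to within 0.001 (Service and Price differ in the last digit, presumably because of rounding in the reported performance values). A minimal sketch, with the function name chosen for illustration:

```python
from typing import Dict

# "Performance" column of Table 5.
PERFORMANCE: Dict[str, float] = {
    "App support": 0.443, "Connectivity": 1.108, "Design": 0.224,
    "Service": 1.371, "Practicality": 1.809, "Battery": 0.352,
    "Price": 0.778, "Wearing": 0.148, "Fitness tracking": 0.510,
    "Operate": 1.105, "Quality": 0.651,
}


def min_max_normalize(values: Dict[str, float]) -> Dict[str, float]:
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values.values()), max(values.values())
    if hi == lo:
        raise ValueError("min-max scaling is undefined when all values are equal")
    return {k: (v - lo) / (hi - lo) for k, v in values.items()}


if __name__ == "__main__":
    for name, score in min_max_normalize(PERFORMANCE).items():
        print(f"{name}: {score:.3f}")
```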

Table 6. Sensitivity analysis of MCD-Kano classification under different threshold settings.

Attribute	ς = 0.02, δ = 1.5	ς = 0.02, δ = 2.0	ς = 0.02, δ = 3.0	ς = 0.05, δ = 1.5	ς = 0.05, δ = 2.0	ς = 0.05, δ = 3.0	ς = 0.10, δ = 1.5	ς = 0.10, δ = 2.0	ς = 0.10, δ = 3.0
App support	A	A	A	A	A	A	A	A	A
Connectivity	O	O	O	O	O	O	O	O	O
Design	A	A	O	A	A	O	A	A	O
Service	M	M	M	M	M	M	M	M	M
Practicality	O	O	O	O	O	O	O	O	O
Battery	A	A	A	A	A	A	A	A	A
Price	A	A	A	A	A	A	A	A	A
Wearing	M	M	O	M	M	O	M	M	O
Fitness tracking	O	O	O	O	O	O	O	O	O
Operate	O	O	O	O	O	O	O	O	O
Quality	A	A	A	A	A	A	A	A	A

Note: O, One-dimensional attribute; A, Attractive attribute; M, Must-be attribute.

Table 7. Performance comparison of sentiment analysis models with Wilcoxon signed-rank test results.

Model	Macro Precision	Macro Recall	Macro F1	Mean F1 Difference	Wilcoxon W	One-Sided p-Value
BERT-A-Conv	0.926 ± 0.004	0.941 ± 0.005	0.933 ± 0.004	-	-	-
BERT	0.855 ± 0.003	0.821 ± 0.006	0.836 ± 0.004	0.097	15	0.031
BERT-CNN	0.826 ± 0.003	0.857 ± 0.007	0.841 ± 0.003	0.093	15	0.031
CNN	0.819 ± 0.012	0.853 ± 0.009	0.834 ± 0.005	0.099	15	0.031
BERT-attention	0.861 ± 0.008	0.812 ± 0.008	0.834 ± 0.008	0.099	15	0.031

Note: Mean F1 Difference denotes the average paired difference in macro F1 between BERT-A-Conv and each compared model across the five runs.
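
All four comparisons in Table 7 share W = 15 and p = 0.031. With five paired runs and BERT-A-Conv ahead in every run, all differences are positive, so the signed-rank statistic reaches its maximum 1 + 2 + 3 + 4 + 5 = 15 and the exact one-sided p-value is 1/2^5 = 0.03125, whatever the size of the gaps. This is also the smallest one-sided p-value attainable with five runs. The sketch below checks this by enumerating all sign patterns; it assumes W in Table 7 is the sum of positive ranks and that there are no ties or zero differences.

```python
from itertools import product


def wilcoxon_exact_one_sided(w_plus, n):
    """P(W+ >= w_plus) under H0 for n nonzero, untied paired differences."""
    ranks = range(1, n + 1)
    hits = 0
    total = 0
    # Under H0 each rank is equally likely to carry a + or - sign
    for signs in product((False, True), repeat=n):
        total += 1
        if sum(r for r, positive in zip(ranks, signs) if positive) >= w_plus:
            hits += 1
    return hits / total


print(wilcoxon_exact_one_sided(15, 5))  # prints 0.03125
```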

Table 8. Comparison of Kano classification results.

Attribute	Joung and Kim’s Method	Our Proposed Method	Questionnaire Results
App support	A	A	A
Connectivity	O	O	O
Design	A	A	A
Service	O	M	M
Practicality	O	O	O
Battery	A	A	A
Price	A	A	A
Wearing	O	M	M
Fitness tracking	O	O	O
Operate	O	O	O
Quality	A	A	A

Note: O, One-dimensional attribute; A, Attractive attribute; M, Must-be attribute.
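
Taking the questionnaire results as the reference, agreement in Table 8 can be expressed as the share of attributes whose category matches. The sketch below computes simple percentage agreement from the table as transcribed; the helper names are illustrative, and a chance-corrected measure such as Cohen's kappa could be used instead.

```python
# (attribute, Joung and Kim's method, proposed method, questionnaire) from Table 8
TABLE8 = [
    ("App support", "A", "A", "A"), ("Connectivity", "O", "O", "O"),
    ("Design", "A", "A", "A"), ("Service", "O", "M", "M"),
    ("Practicality", "O", "O", "O"), ("Battery", "A", "A", "A"),
    ("Price", "A", "A", "A"), ("Wearing", "O", "M", "M"),
    ("Fitness tracking", "O", "O", "O"), ("Operate", "O", "O", "O"),
    ("Quality", "A", "A", "A"),
]


def mismatches(col):
    """Attributes where column `col` (1 or 2) differs from the questionnaire."""
    return [row[0] for row in TABLE8 if row[col] != row[3]]


def agreement(col):
    """Share of attributes where column `col` matches the questionnaire."""
    return 1 - len(mismatches(col)) / len(TABLE8)


print(f"Joung and Kim: {agreement(1):.3f}, mismatches: {mismatches(1)}")
print(f"Proposed:      {agreement(2):.3f}, mismatches: {mismatches(2)}")
```

On Table 8 this gives 9/11 (0.818) for Joung and Kim's method, with Service and Wearing misclassified as O, and 11/11 for the proposed method.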

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

MDPI and ACS Style

Yu, H.; Li, Y. Mining Customer Satisfaction from Online Reviews: An Explainable Kano-Based Framework for Product Improvement. Systems 2026, 14, 585. https://doi.org/10.3390/systems14050585

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.