Article

Enhanced Sentiment Analysis of E-Commerce Product Reviews Using Luong Attention-Based Bi-LSTM

by Orken Mamyrbayev 1, Dinara Mussayeva 2 and Turdybek Kurmetkan 1,3,*
1 Institute of Information and Computational Technologies, Almaty 050010, Kazakhstan
2 Institute of Economics of the Committee of Science, Almaty 050010, Kazakhstan
3 Department of Information Systems, Faculty of Information and Artificial Intelligence, Al-Farabi Kazakh National University, Almaty 050010, Kazakhstan
* Author to whom correspondence should be addressed.
Information 2026, 17(5), 398; https://doi.org/10.3390/info17050398
Submission received: 2 March 2026 / Revised: 17 April 2026 / Accepted: 18 April 2026 / Published: 22 April 2026
(This article belongs to the Section Artificial Intelligence)

Abstract

The rapid growth of e-commerce has highlighted the critical need for efficient customer review sentiment analysis, yet natural language complexities like sarcasm and mixed sentiments remain challenging. To address these ambiguities, this study proposes a novel sentiment analysis architecture. The methodology integrates a bidirectional Long Short-Term Memory (Bi-LSTM) network with a Luong Attention mechanism. The Bi-LSTM component models the sequential and bidirectional context of the text, while the Luong Attention mechanism isolates and emphasizes the most significant parts of the reviews for precise sentiment detection. The proposed hybrid model demonstrates exceptional performance compared to traditional methods, achieving an accuracy of 96.67%, a precision of 96.83%, and a recall of 96.67%, alongside relatively low overfitting. Ultimately, the findings confirm that this architecture effectively manages ambiguous language and is highly capable of large-scale, real-time sentiment analysis, offering robust analytical tools for shaping e-commerce marketing strategies.

1. Introduction

E-commerce has changed both how customers behave and the focus of modern marketing, which now centers on understanding customer sentiment [1]. Product recommendations, customer satisfaction, and marketing strategies all depend on sentiment analysis, and especially on its accuracy [2]. Every day, billions of reviews are published, which creates a need for systems that process and analyze sentiment at scale with adequate speed and quality [3] and that improve sentiment classification, particularly for e-commerce product reviews [4]. The proposed framework addresses these issues with deep learning (DL) techniques, namely Bidirectional Long Short-Term Memory (Bi-LSTM) combined with a Luong Attention mechanism [5].
Sentiment analysis is challenging for several reasons [6]. Text data can be difficult to interpret because of language nuances, contextual variations, and complex language structures in customer reviews [7]. Furthermore, the emotional tone of a product’s features, reviews, and even external events can simultaneously impact consumer sentiment [8]. Advanced deep semantic feature extraction methods help tackle the intricacies of consumer sentiments, which depend on a variety of different, ever-changing factors that traditional models cannot capture [9].
Several existing methods, such as Recurrent Neural Networks (RNNs), Support Vector Machines (SVMs), and Naive Bayes classifiers, work reasonably well for sentiment classification [10]. However, none of them efficiently address sequential and contextual relationships in textual data [11]. In contrast, Bi-LSTM, a recurrent architecture that reads the sequence in both directions, can attain higher accuracy and is designed to capture context and long-range dependencies [12]. However, such models do not attend to the portions of the review that are most relevant to the overall sentiment [13]. Furthermore, they largely ignore the relative importance of individual words, which can affect classification, particularly when the sentiment is subtle [14].
The proposed framework addresses the shortcomings of current approaches with a Luong Attention-based Bi-LSTM, a DL model that uses Bi-LSTM for sequence learning while Luong Attention identifies the most important context. This two-pronged approach offers complementary advantages: better handling of long-distance context and stronger focus on the most relevant sections of the text. Sentiment analysis of product reviews must account for both the structure of the text and the interrelationships of essential words, and the framework integrates the two to provide accurate and robust classification. What differentiates this work from others is that it is a hybrid model that integrates Bi-LSTM with a context-sensitive attention mechanism to highlight key aspects and give better insight into consumer sentiment.
The Key Contributions of this paper:
  • A unified sentiment analysis framework integrating preprocessing, exploratory data analysis, and Luong attention-based Bi-LSTM for large-scale e-commerce review datasets.
  • A three-class sentiment classification strategy (positive, negative, and neutral) designed to improve the detection of neutral opinions often underrepresented in existing Bi-LSTM attention-based models.
  • An optimized preprocessing pipeline tailored for noisy customer reviews, including text normalization, sentiment labeling, and outlier handling to enhance model robustness.
  • Extensive evaluation on a multi-category Amazon review dataset containing more than 110,000 reviews, demonstrating improved generalization and stability compared to conventional attention-based Bi-LSTM approaches.
This study introduces methodological advancements beyond conventional attention-based Bi-LSTM architectures by incorporating explicit three-class sentiment modelling with dedicated neutral-class representation. An optimized Luong attention mechanism is integrated and validated through ablation-based evaluation to enhance focus on sentiment-relevant contextual features. The framework is designed as a domain-independent sentiment analysis pipeline and evaluated on a large multi-category dataset to ensure improved generalization. The remainder of this paper is structured as follows. Section 1.1 reviews the current literature and focuses on pertinent studies and developments. Section 2 provides a detailed description of the proposed methodology and the steps followed in the sentiment analysis model. Section 3 presents experimental results. Finally, Section 4 provides a summary of the key findings and possible areas for future research.

1.1. Literature Survey

First, traditional natural language processing (NLP) and topic modeling techniques have been widely used for analyzing e-commerce reviews. Methods such as Latent Dirichlet Allocation (LDA) have been applied to identify key topics and sentiment patterns in customer feedback. For instance, Liu et al. [15] analyzed online reviews using sentiment classification and social network analysis to understand consumer concerns, while Yuan et al. [16] and Yang et al. [17] utilized LDA-based models to extract themes and sentiment trends from product reviews. Similarly, Chen et al. [18] employed LDA techniques to analyze public sentiment in live-streaming e-commerce platforms. However, these traditional approaches often fail to capture contextual dependencies and complex language patterns in textual data.
To address these limitations, aspect-based and summarization methods have been introduced to better organize and interpret customer opinions. Mabrouk et al. [19] proposed a hierarchical framework combining aspect extraction and opinion summarization, while Guo et al. [20] developed a two-phase analytical model integrating natural language processing with decision-making techniques to identify key satisfaction factors. Although these methods improve interpretability, they still lack deep contextual understanding.
With the advancement of deep learning, models such as Long Short-Term Memory (LSTM) networks have been widely adopted for sentiment analysis. Fan et al. [21] combined LDA with LSTM to enhance feature extraction, while Nichifor et al. [22] utilized machine learning and NLP techniques to analyze large-scale e-commerce reviews. These approaches improve performance compared to traditional methods; however, they are limited in capturing bidirectional context and handling subtle sentiment variations.
More recently, attention-based deep learning models, particularly Bi-LSTM with attention mechanisms, have shown significant improvements in sentiment classification. H. Li et al. [23] proposed an attention-based Bi-LSTM model for three-class sentiment classification, demonstrating enhanced performance on complex datasets. Similarly, Huang et al. [24] introduced an ERNIE-Bi-LSTM model that incorporates dynamic word embeddings and attention mechanisms to address ambiguity and polysemy in textual data. These models highlight the importance of contextual representation and attention mechanisms in improving sentiment analysis accuracy.
In addition to model development, several studies have focused on understanding consumer behavior and sentiment dynamics in different e-commerce contexts. Wang et al. [25] explored consumer behavior using a mixed-methods approach, while Li et al. [26] analyzed customer preferences in fresh food e-commerce using text mining techniques. Ilieva et al. [27] examined customer satisfaction using structural equation modeling, and Fici et al. [28] investigated consumer behavior in metaverse environments. These studies provide valuable insights into real-world applications of sentiment analysis but do not focus on improving model architecture.

1.2. Problem Statement

The growing presence of e-commerce has significantly changed consumer behavior, notably regarding product reviews and customer satisfaction [23].
While research on topic modeling has considerably advanced, many issues make it difficult to effectively capture consumer behavior, such as the complexity of language, changing consumer expectations, and external effects such as pandemics or regulations [21].
Existing research has contributed knowledge related to consumer reviews; however, much of the research does not capture the multilayer aspect of consumer reviews, particularly with cross-border platforms and new technologies. The impact of consumer reviews on e-commerce strategies and changing global trends has not yet been examined in depth, making it worthwhile to explore further [18].

2. Materials and Methods

Figure 1 illustrates the overall workflow of the proposed sentiment analysis framework for Amazon customer reviews. The process begins with data collection, followed by sentiment labeling based on review ratings. Data preprocessing is then performed, including text cleaning, handling missing values, and outlier detection, to prepare the dataset for analysis. Exploratory Data Analysis (EDA) is conducted to examine the distribution of sentiment classes and analyze textual patterns in the reviews. Subsequently, feature extraction and classification are performed using a Luong Attention-based Bi-LSTM model, which captures contextual dependencies and focuses on the most relevant parts of the review text. The final output categorizes the reviews into three sentiment classes, positive, negative, and neutral, providing meaningful insights into customer feedback. The architecture extends beyond conventional Bi-LSTM–attention combinations by introducing several design refinements. The model incorporates explicit three-class sentiment formulation, enabling dedicated representation of neutral opinions, which is generally absent in traditional Bi-LSTM attention frameworks. The Luong attention mechanism is optimized through a context-sensitive scoring refinement that enhances the prioritization of sentiment-bearing tokens. In addition, the architecture is evaluated on a large multi-category dataset, supporting cross-domain generalization, whereas prior Bi-LSTM–attention models typically focus on single-domain inputs.

2.1. Data Collection

The dataset is compiled from Amazon’s customer reviews available on Kaggle and spans from 2013 to 2019. With reviews from multiple categories, such as smartphones, books, laptops, and even refrigerators, it contains additional features such as Category, Review Header, Review Text, Rating, and Own_Rating (which is a sentiment classification). The dataset contains more than 110,000 reviews, and the sentiment ratings are labeled as either positive or negative. This allows it to be a great resource when conducting sentiment analysis, as it provides an understanding of customers’ sentiments. The dataset is free to access and is under a CC0 1.0 Universal license, which means that it can be used freely for any purpose.

2.2. Data Preprocessing

The preprocessing pipeline consists of standard NLP operations such as lowercasing, tokenization, and stop-word removal, which are applied as baseline text normalization steps for noisy e-commerce reviews. Sentiment labeling is performed using rating-based weak supervision, where numerical ratings are mapped into three sentiment classes. This approach is widely adopted in the literature and enables scalable construction of labeled datasets for large-scale sentiment analysis tasks.

2.2.1. Text Cleaning

Text cleaning is performed as a standard preprocessing step for noisy e-commerce reviews. It focuses on improving text consistency and reducing irrelevant information. Basic normalization operations such as lowercasing, noise removal, stop-word filtering, and tokenization are applied. These steps help convert raw review text into a structured form suitable for further processing. This ensures improved data quality for sentiment analysis tasks.
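The normalization steps listed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `clean_review`, the regular expressions used for noise removal, and the tiny `STOP_WORDS` set are hypothetical and are not taken from the paper, which does not specify its exact cleaning rules.

```python
import re

# Hypothetical, deliberately tiny stop-word list (assumption, not from the paper).
# Negations such as "not" are kept because they carry sentiment.
STOP_WORDS = {"a", "an", "the", "is", "it", "this", "and", "to", "of", "i"}

def clean_review(text: str) -> list:
    """Lowercase, strip noise, tokenize on whitespace, and drop stop words."""
    text = text.lower()
    text = re.sub(r"<[^>]+>", " ", text)        # remove HTML tags
    text = re.sub(r"https?://\S+", " ", text)   # remove URLs
    text = re.sub(r"[^a-z0-9\s]", " ", text)    # remove punctuation and symbols
    tokens = text.split()                       # whitespace tokenization
    return [tok for tok in tokens if tok not in STOP_WORDS]

print(clean_review("This phone is GREAT!!! <br> Battery lasts 2 days :)"))
# ['phone', 'great', 'battery', 'lasts', '2', 'days']
```

Whether stop-word removal helps depends on the list used; an aggressive list that drops negations would change the polarity of phrases such as "not good".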

2.2.2. Sentiment Labeling

Sentiment labeling transforms the prediction problem into a three-class classification problem by assigning a sentiment label to each review based on its numerical rating. Let $r$ be the review rating, where $r \in \{1, 2, 3, 4, 5\}$. The assigned sentiment label $S$ is given in Equation (1):
$$S = \begin{cases} \text{Negative} & \text{if } r \in \{1, 2\} \\ \text{Neutral} & \text{if } r = 3 \\ \text{Positive} & \text{if } r \in \{4, 5\} \end{cases} \tag{1}$$
Thus, a review with a rating of two was classified as negative, a rating of three as neutral, and a rating of five as positive. This mapping provides a clear and consistent method for projecting numerical ratings onto corresponding categorical sentiment labels for training and evaluating models.
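As a minimal sketch, the rating-to-label mapping of Equation (1) can be written as a small Python function. The function name `rating_to_sentiment` is hypothetical; the thresholds are the ones stated in the text.

```python
def rating_to_sentiment(rating: int) -> str:
    """Map a 1-5 star rating to a sentiment label, following Equation (1)."""
    if rating in (1, 2):
        return "Negative"
    if rating == 3:
        return "Neutral"
    if rating in (4, 5):
        return "Positive"
    raise ValueError(f"rating must be an integer from 1 to 5, got {rating!r}")

print([rating_to_sentiment(r) for r in (1, 2, 3, 4, 5)])
# ['Negative', 'Negative', 'Neutral', 'Positive', 'Positive']
```

Raising an error for out-of-range ratings, rather than silently assigning a class, is a design choice of this sketch and keeps malformed rows visible during preprocessing.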

2.2.3. Handling Missing Data

Missing values in the data were imputed with the mean of the available values for the same feature, as given in Equation (2). This yields a complete dataset on which the model can be trained and analyzed.
$$Y_i = \frac{1}{N} \sum_{j=1}^{N} Y_j \tag{2}$$
where $Y_i$ denotes the missing value to be imputed, $N$ denotes the total number of non-missing values for the feature, and $Y_j$ denotes the non-missing values.
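A short sketch of the mean imputation in Equation (2) follows. It assumes a numeric feature in which missing entries are represented as `None`; the function name `impute_with_mean` and that representation are assumptions made for illustration.

```python
def impute_with_mean(values: list) -> list:
    """Replace None entries with the mean of the non-missing values (Equation (2))."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute: the feature has no observed values")
    mean = sum(observed) / len(observed)   # (1/N) * sum of the N observed values
    return [mean if v is None else v for v in values]

print(impute_with_mean([4.0, None, 2.0, 3.0]))
# [4.0, 3.0, 2.0, 3.0]
```

Mean imputation applies only to numeric features such as ratings or review length; missing review text would need a different treatment.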

2.2.4. Outlier Detection

Outlier detection techniques are employed to locate data points that deviate substantially from the rest. Detecting outliers is important because they tend to reduce the effectiveness of the model. For example, a review with an extreme rating that is not in line with the sentiment of its text is an outlier. Outliers can be detected using the Z-score or the Interquartile Range (IQR). The Z-score for a feature $X$ (e.g., the length of a review or its rating) is defined in Equation (3):
$$Z = \frac{X - \mu}{\sigma} \tag{3}$$
where $\mu$ denotes the mean and $\sigma$ the standard deviation of the feature. An outlier is defined as any data point with a Z-score beyond a defined threshold, for example $|Z| > 3$, whereas all other data points are inliers.
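The Z-score rule of Equation (3) can be sketched as follows. The function name `zscore_outliers` and the sample review lengths are hypothetical; the threshold of 3 is the example value given in the text, and the population standard deviation is an assumption, since the paper does not say which estimator it uses.

```python
import statistics

def zscore_outliers(values: list, threshold: float = 3.0) -> list:
    """Return the indices of values whose |Z| exceeds the threshold."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)      # population standard deviation (assumption)
    if sigma == 0:
        return []                          # all values identical: no outliers
    return [i for i, x in enumerate(values) if abs((x - mu) / sigma) > threshold]

# Hypothetical review lengths in characters: 19 short reviews and one very long one.
lengths = [100] * 19 + [5000]
print(zscore_outliers(lengths))
# [19]
```

With very small samples no point can exceed $|Z| > 3$, so the rule is only meaningful on reasonably large feature columns.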
The proposed framework extends beyond conventional sentiment analysis pipelines through the integration of structured preprocessing and analytical stages within a unified workflow. The preprocessing module incorporates sentiment-balanced labeling, systematic noise reduction, and outlier detection, which enhance data consistency compared with standard text-cleaning approaches. In addition, exploratory data analysis is embedded within the pipeline to guide feature understanding and class distribution balancing, rather than being treated as a separate analytical step.

2.3. EDA of Sentiment Distribution and Textual Patterns

Exploratory Data Analysis (EDA) is performed to examine the distribution of sentiment classes within the dataset. The proportion of reviews in each category (positive, negative, and neutral) is analyzed to identify class imbalance or skewness. These distributions are visualized using bar charts and histograms to better understand the overall composition of the dataset.
In addition, textual analysis is conducted to identify frequently occurring words and phrases associated with different sentiment classes. The frequency of terms in positive and negative reviews is analyzed to understand their contribution to sentiment classification. By comparing word usage across sentiment categories, important emotionally indicative terms can be identified, which helps improve feature selection and model performance.
Overall, EDA ensures that the dataset is well-understood and properly structured, providing a strong foundation for training an effective sentiment analysis model.
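The two EDA steps described above, class proportions and per-class term frequencies, can be approximated with the standard library. This is a sketch under assumptions: the function names and the toy reviews are hypothetical, and the paper's own analysis uses charts that are not reproduced here.

```python
from collections import Counter

def class_distribution(labels: list) -> dict:
    """Return the proportion of reviews in each sentiment class."""
    counts = Counter(labels)
    total = len(labels)
    return {label: count / total for label, count in counts.items()}

def top_terms_by_class(tokenized_reviews: list, labels: list, k: int = 3) -> dict:
    """Return the k most frequent tokens for each sentiment class."""
    per_class = {}
    for tokens, label in zip(tokenized_reviews, labels):
        per_class.setdefault(label, Counter()).update(tokens)
    return {label: [term for term, _ in counter.most_common(k)]
            for label, counter in per_class.items()}

# Hypothetical toy data, already tokenized.
reviews = [["great", "battery", "great"], ["bad", "screen", "bad"]]
labels = ["Positive", "Negative"]
print(class_distribution(labels))
print(top_terms_by_class(reviews, labels, k=1))
```

A strongly skewed output from `class_distribution` would be the signal of class imbalance that the text mentions.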

2.4. Proposed Framework Overview

Table 1 presents a comparison between the proposed framework and existing attention-based Bi-LSTM models. The comparison highlights key aspects such as the preprocessing pipeline, support for neutral sentiment classification, dataset diversity, attention mechanism, and unified framework integration. Unlike conventional approaches, the proposed model incorporates an optimized preprocessing pipeline, explicit three-class sentiment modeling, and multi-category dataset handling within a unified architecture. This demonstrates the methodological distinction and improved robustness of the proposed approach over existing attention-based Bi-LSTM frameworks.

2.5. Feature Extraction and Classification Using Luong Attention-Based Bi-LSTM

The proposed framework employs a standard Bi-LSTM architecture integrated with Luong attention for sentiment classification. Instead of focusing on detailed mathematical formulations of LSTM components, the methodology emphasizes an optimized, noise-robust pipeline designed for large-scale e-commerce review data. It extends standard Bi-LSTM–attention models with explicit three-class sentiment modeling, and Luong attention acts as a feature refinement mechanism that enhances the focus on sentiment-relevant tokens and contextual information.
The attention-weighted representation is passed through a fully connected layer, followed by a softmax activation function to classify the input into three sentiment categories, positive, negative, and neutral, as illustrated in Figure 2.

2.5.1. Input Layer

The input layer receives the review as a sequence of word embeddings (or other feature vectors), where $x_t$ is the embedding at step $t$ and $T$ is the number of words in the review. The input sequence is usually generated using word embeddings such as Word2Vec, GloVe, or BERT, so that $x_t$ is the high-dimensional vector representation of the word at step $t$. The input sequence can therefore be written as in Equation (4):
$$X = (x_1, x_2, \ldots, x_T), \quad \text{where } x_t \in \mathbb{R}^d \tag{4}$$
where $\mathbb{R}^d$ represents the $d$-dimensional word-embedding space for each word in the sequence.

2.5.2. Backward Pass Gate

In the Bi-LSTM architecture, the backward pass processes the input sequence from right to left, enabling the model to capture contextual information from future tokens. This complements the forward pass, which captures past context. The outputs from both forward and backward passes are combined to form a comprehensive representation of the input sequence, improving the model’s ability to understand contextual dependencies for sentiment classification.

2.5.3. Bi-LSTM

Bi-LSTM enhances sequential modeling by processing input data in both forward (left to right) and backward (right to left) directions. In the forward pass, the hidden state is computed as in Equation (5):
$$\overrightarrow{h_t} = \overrightarrow{o_t} \odot \tanh\left(\overrightarrow{C_t}\right) \tag{5}$$
where $\overrightarrow{o_t}$ is the output gate, $\overrightarrow{C_t}$ is the cell state, and $\overrightarrow{h_t}$ captures past context up to time step $t$. In the backward pass, the hidden state is computed as in Equation (6):
$$\overleftarrow{h_t} = \overleftarrow{o_t} \odot \tanh\left(\overleftarrow{C_t}\right) \tag{6}$$
where $\overleftarrow{h_t}$ captures future context from the end of the sequence back to $t$. At each step, the forward and backward states are concatenated to form a richer representation, as in Equation (7):
$$h_t^{BiLSTM} = \left[\overrightarrow{h_t}, \overleftarrow{h_t}\right] \tag{7}$$
The concatenation $[\overrightarrow{h_t}, \overleftarrow{h_t}]$ combines both past and future information. This combined hidden state is then used in the downstream layers (e.g., fully connected with softmax) for tasks such as sentiment analysis, enabling the model to leverage complete contextual information for more accurate predictions.
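The bidirectional processing and the concatenation in Equation (7) can be illustrated with a short sketch. To keep it self-contained, a simplified tanh recurrent cell (`toy_cell`) stands in for the LSTM cell; this substitution, the function names, and the toy inputs are assumptions for illustration and do not reproduce the gated computation of Equations (5) and (6).

```python
import math

def run_direction(inputs: list, step, h0: list) -> list:
    """Apply a recurrent step over the inputs and collect every hidden state."""
    states, h = [], h0
    for x in inputs:
        h = step(x, h)
        states.append(h)
    return states

def bidirectional_states(inputs: list, step_fwd, step_bwd, h0: list) -> list:
    """Concatenate forward and backward hidden states at each time step."""
    forward = run_direction(inputs, step_fwd, h0)
    # Run over the reversed sequence, then reverse the outputs so that
    # backward[t] lines up with forward[t] in the original word order.
    backward = run_direction(inputs[::-1], step_bwd, h0)[::-1]
    return [f + b for f, b in zip(forward, backward)]   # list concatenation

def toy_cell(x: list, h: list) -> list:
    """Stand-in recurrent cell (NOT an LSTM): elementwise tanh(x + 0.5 * h)."""
    return [math.tanh(xi + 0.5 * hi) for xi, hi in zip(x, h)]

inputs = [[1.0], [0.0], [-1.0]]          # three time steps, 1-dimensional "embeddings"
states = bidirectional_states(inputs, toy_cell, toy_cell, [0.0])
print(len(states), len(states[0]))
# 3 2
```

The realignment of the backward outputs is the detail that is easiest to get wrong: without it, the state paired with word $t$ would summarize the wrong part of the review.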

2.5.4. Luong Attention Layer

The Luong attention mechanism is applied on top of the Bi-LSTM outputs to identify and emphasize important words in the input sequence. It assigns attention weights to each hidden state, allowing the model to focus on sentiment-relevant parts of the review. The weighted combination of hidden states forms a context vector, which captures the most informative features of the input sequence and is used for final sentiment classification.
This architecture allows the model to represent bidirectional context and to focus on the most relevant input information through the attention mechanism, as shown in Figure 3.
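The attention step described above can be sketched as follows. The paper does not state which Luong scoring function or which query vector it uses, so this sketch assumes the "dot" score, $\text{score}(h_t, q) = h_t^\top q$, and takes the query $q$ as an input (in a classifier it is often the final Bi-LSTM state, which is also an assumption here). The function name and the toy vectors are hypothetical.

```python
import math

def luong_dot_attention(hidden_states: list, query: list) -> tuple:
    """Return (attention weights, context vector) using dot-product scoring."""
    # Alignment scores: dot product between each hidden state and the query.
    scores = [sum(h_i * q_i for h_i, q_i in zip(h, query)) for h in hidden_states]
    # Softmax over time steps, shifted by the maximum for numerical stability.
    max_score = max(scores)
    exps = [math.exp(s - max_score) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Context vector: attention-weighted sum of the hidden states.
    dim = len(hidden_states[0])
    context = [sum(w * h[i] for w, h in zip(weights, hidden_states)) for i in range(dim)]
    return weights, context

hidden = [[1.0, 0.0], [0.0, 1.0], [2.0, 0.0]]   # toy Bi-LSTM outputs, T = 3
weights, context = luong_dot_attention(hidden, query=[1.0, 0.0])
print([round(w, 3) for w in weights])
```

In this toy example the third hidden state aligns best with the query and therefore receives the largest weight, which is the behavior the text describes as emphasizing sentiment-relevant words.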

2.5.5. Fully Connected Layer

This layer in the Bi-LSTM-based model serves as the final stage that transforms concatenated hidden states into class-specific scores. The combined hidden state $h_t^{BiLSTM}$ is passed through a dense layer, where a linear transformation with weights and bias is applied as in Equation (8):
$$Z = W_{fc}\, h_t^{BiLSTM} + b_{fc} \tag{8}$$
where $W_{fc}$ is the weight matrix, $b_{fc}$ is the bias term, and $Z$ represents the unnormalised scores (logits) for each sentiment class. To convert these scores into interpretable probabilities, the softmax activation function is applied as in Equation (9):
$$P(\text{Class} = c) = \frac{e^{Z_c}}{\sum_{c'} e^{Z_{c'}}} \tag{9}$$
where $Z_c$ is the score for class $c$, and the denominator normalizes across all possible classes. This ensures that the outputs form a valid probability distribution across categories (Positive, Negative, Neutral). The model determines the predicted sentiment by selecting the class with the highest probability. This step is crucial because it bridges high-dimensional Bi-LSTM representations with the final classification decision, enabling accurate sentiment prediction.

2.5.6. Output Layer

The Bi-LSTM-based sentiment classification model produces a final prediction by selecting one of the three sentiment categories: Positive, Negative, or Neutral. After the fully connected layer computes the logits and the softmax function converts them into probabilities for each class, the model chooses the class with the maximum probability as the final decision. This is expressed in Equation (10):
$$\hat{y} = \arg\max_{c} P(\text{Class} = c) \tag{10}$$
where $\hat{y}$ is the predicted sentiment label and $P(\text{Class} = c)$ is the softmax probability for class $c$. The class with the highest probability is assigned as the sentiment of the input text (e.g., an e-commerce review), ensuring a clear and interpretable classification outcome.
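The classification head, that is, the dense layer, the softmax, and the final argmax, can be sketched in plain Python. The weights, bias, input vector, and class order below are hypothetical values chosen only to make the example runnable; in the trained model they are learned parameters.

```python
import math

CLASSES = ("Positive", "Negative", "Neutral")   # class order is an assumption

def dense(vector: list, weights: list, bias: list) -> list:
    """Linear layer: one logit per class, Z = W * h + b."""
    return [sum(w * x for w, x in zip(row, vector)) + b
            for row, b in zip(weights, bias)]

def softmax(logits: list) -> list:
    """Convert logits to probabilities (shifted by the max for stability)."""
    max_logit = max(logits)
    exps = [math.exp(z - max_logit) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predict(probabilities: list) -> str:
    """Return the label of the class with the highest probability."""
    best = max(range(len(probabilities)), key=probabilities.__getitem__)
    return CLASSES[best]

h = [1.0, 2.0]                                   # toy feature vector
W = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]         # hypothetical 3 x 2 weight matrix
b = [0.0, 0.0, 0.5]                              # hypothetical bias
probs = softmax(dense(h, W, b))                  # logits are [1.0, 2.0, 0.5]
print(predict(probs))
# Negative
```

Because softmax is monotonic, the argmax over probabilities equals the argmax over logits; the probabilities matter when a confidence value is needed alongside the label.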

2.6. Experimental Setup

For experimental evaluation, the dataset consists of approximately 110,000 Amazon reviews. The dataset is divided into 80% training data (88,000 samples) and 20% testing data (22,000 samples). The model is trained using the Adam optimizer with a learning rate of 0.001. Pre-trained GloVe embeddings with 100 dimensions are used to represent the input text. A batch size of 64 and 20 training epochs is employed. The implementation is carried out using TensorFlow in a GPU-enabled environment. A dropout rate of 0.5 is applied to reduce overfitting, and early stopping based on validation loss is used to improve generalization.
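The reported settings can be collected in a configuration dictionary, together with a sketch of the 80/20 split. The hyperparameter values come from the text above; the key names, the function name, the shuffling step, and the random seed are assumptions, since the paper does not describe how the split was drawn.

```python
import random

# Values as reported in the experimental setup; key names are hypothetical.
CONFIG = {
    "optimizer": "adam",
    "learning_rate": 0.001,
    "embedding": "glove",
    "embedding_dim": 100,
    "batch_size": 64,
    "epochs": 20,
    "dropout": 0.5,
    "test_fraction": 0.2,
}

def train_test_split_indices(n_samples: int, test_fraction: float, seed: int = 42) -> tuple:
    """Shuffle sample indices and split them into (train, test) index lists."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)     # seed 42 is an arbitrary choice
    n_test = round(n_samples * test_fraction)
    return indices[n_test:], indices[:n_test]

train_idx, test_idx = train_test_split_indices(110_000, CONFIG["test_fraction"])
print(len(train_idx), len(test_idx))
# 88000 22000
```

Fixing the seed makes the partition reproducible, which matters when several models are compared on the same split.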
This architecture was selected because Bi-LSTM effectively captures long-range review context, while the Luong Attention mechanism enhances focus on key sentiment-bearing words. Compared with conventional CNN, SVM, or basic LSTM models, this combination offers stronger semantic understanding with lower computational cost, making it suitable for large-scale e-commerce review analysis.

3. Results and Discussion

The results show that the classification model performs well on all three classes: Positive, Negative, and Neutral. It attains high accuracy, with only minor misclassification of negative instances. The confusion matrix, ROC curve, and precision-recall curve indicate that the model separates the classes with very little error and maintains high precision and recall across classes. The accuracy, precision, recall, and F1-score are all high. In addition, the training and validation metrics indicate that the model generalizes well and does not overfit to a great degree, with steady improvement over the epochs.
The confusion matrix shown in Figure 4 is a scaled representation for visualization purposes. However, all evaluation metrics are computed using the complete test dataset consisting of 22,000 samples. The model effectively distinguishes between positive, negative, and neutral classes with minimal misclassification. The reported performance metrics, including accuracy, precision, recall, and F1-score, are computed on the complete test dataset to ensure their reliability and statistical significance.
Table 2 presents the robustness evaluation of the proposed model using different evaluation strategies. The model is first assessed using a standard 80/20 train–test split on the Amazon reviews dataset to obtain baseline performance. In addition, k-fold cross-validation is considered to further validate the stability and consistency of the model across different data partitions. This evaluation demonstrates that the model maintains reliable performance under varying data splits, indicating its robustness.
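As a sketch of how k-fold partitions are formed, the function below splits sample indices into k validation folds and uses the remaining indices for training. The paper does not report the value of k or whether the folds were stratified, so the function name, the toy sizes, and the unstratified scheme are assumptions.

```python
def k_fold_indices(n_samples: int, k: int) -> list:
    """Return k (train_indices, validation_indices) pairs covering every sample once."""
    indices = list(range(n_samples))
    # Spread the remainder over the first folds so that sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        folds.append((train, validation))
        start += size
    return folds

for train, validation in k_fold_indices(10, 3):
    print(len(train), validation)
```

In practice the indices would be shuffled before splitting, and for imbalanced sentiment classes a stratified variant is usually preferable so that each fold keeps a similar class ratio.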
The AUC of the green curve (positive) is 1.000000, which indicates that this class is classified perfectly. The orange curve (negative) has an AUC of 0.996250, which indicates strong performance with minimal error. The AUC of the red curve (neutral) is 0.998750, which is also excellent. All three curves lie near the upper-left corner, which indicates that the model has high true-positive rates and low false-positive rates, as shown in Figure 5.
Figure 6 presents the precision–recall curves for the three sentiment classes. The curves are concentrated near the upper-right region, indicating high precision and recall values across all classes. This reflects the strong classification performance of the proposed model.
Figure 7 compares the false-negative rate (FNR) and false-positive rate (FPR) of the classification model. The FNR, denoted by the blue bar, is 0.033, indicating that approximately 3.3 percent of the real positive cases are misclassified as negative. The purple bar shows FPR = 0.017, meaning that 1.7 percent of the real negative cases are misclassified as positive. The graph indicates that the false-negative rate is higher than the false-positive rate, although both are relatively low.
The model achieves an accuracy of 0.9667, meaning that 96.67 percent of the predictions were correct. The precision was 0.9683, meaning that 96.83 percent of the positive predictions were correct. The recall was 0.9667, meaning that 96.67 percent of the true positives were detected. The F1-score was 0.9662, reflecting a balance between precision and recall. The model performs well on all key measures, with values close to 1, as shown in Figure 8.
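The metrics above can be derived from a multi-class confusion matrix. The sketch below computes accuracy and support-weighted precision, recall, and F1; the averaging scheme is an assumption, since the paper does not state how the per-class values were combined, and the confusion matrix shown is hypothetical rather than the paper's data.

```python
def weighted_metrics(cm: list) -> dict:
    """Accuracy and support-weighted precision/recall/F1; rows = true, columns = predicted."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    accuracy = sum(cm[i][i] for i in range(n)) / total
    precision = recall = f1 = 0.0
    for c in range(n):
        tp = cm[c][c]
        support = sum(cm[c])                        # number of true samples of class c
        predicted = sum(cm[r][c] for r in range(n)) # number of samples predicted as c
        p = tp / predicted if predicted else 0.0
        r = tp / support if support else 0.0
        f = 2 * p * r / (p + r) if (p + r) else 0.0
        weight = support / total
        precision += weight * p
        recall += weight * r
        f1 += weight * f
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Hypothetical 3-class confusion matrix (not taken from the paper).
cm = [[50, 0, 0],
      [2, 45, 3],
      [0, 5, 45]]
print(weighted_metrics(cm))
```

Under support weighting, recall is always equal to accuracy, which is consistent with the identical accuracy and recall values reported above, although that alone does not confirm the averaging scheme used.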
Table 3 provides a concise comparison between the proposed model’s experimental findings and the key outcomes reported in existing e-commerce literature. The results show that prior studies mainly focused on topic extraction, aspect summarization, or domain-specific sentiment trends, whereas the proposed model delivers higher sentiment accuracy and more precise polarity detection. By outperforming or complementing the limitations of earlier work, the model offers clear practical benefits for e-commerce analytics, including improved customer understanding, enhanced review summarization, scalable sentiment monitoring, and more reliable analysis of service-quality factors.
A comprehensive comparison was conducted between the proposed framework and recent transformer-based sentiment analysis models. Transformer architectures, including BERT and RoBERTa, are known for their strong contextual representation capabilities. However, the proposed Luong attention-based Bi-LSTM model achieves competitive classification performance while maintaining relatively lower computational complexity. This highlights the suitability of the proposed approach for large-scale e-commerce review sentiment analysis. Moreover, the methodological novelty of the framework lies in the integration of Bi-LSTM with the Luong attention mechanism, which facilitates effective bidirectional contextual learning and enhances the model’s ability to focus on sentiment-relevant features. This integration enables a balanced trade-off between performance and computational efficiency when compared with transformer-based approaches.
Table 4 presents a performance comparison between the proposed Luong attention-based Bi-LSTM framework and recent transformer-based sentiment analysis models. The comparison includes evaluation metrics such as accuracy, precision, recall, and F1-score. Although transformer models achieve slightly higher accuracy, the proposed framework demonstrates competitive performance with lower computational complexity, making it suitable for large-scale e-commerce review sentiment classification.
For a comprehensive and fair evaluation, the proposed Luong attention-based Bi-LSTM model was compared with several baseline approaches, including traditional machine learning models such as Support Vector Machine (SVM) and Random Forest, as well as deep learning models including Convolutional Neural Networks (CNNs) and standard Bi-LSTM. In addition, a transformer-based model (BERT) was considered for comparison. All models were trained and evaluated on the same Amazon review dataset using identical preprocessing steps and an 80/20 train-test split to maintain consistency.
The comparative results demonstrate that the proposed model significantly outperforms traditional machine learning models and standard deep learning approaches, while achieving competitive performance with transformer-based models. Notably, the proposed model maintains lower computational complexity compared to transformer architectures, making it more suitable for large-scale and real-time sentiment analysis applications, as shown in Table 5.
Figure 9 shows the training and validation accuracy over 60 epochs. Both accuracies increase steadily during training, indicating effective learning. The training accuracy approaches 1.0, while the validation accuracy reaches approximately 0.95. The small gap between them suggests good generalization with minimal overfitting.
The model was trained and validated over 60 epochs, as shown in Figure 10. Training loss (dark green) and validation loss (light green) also decline considerably in the first epochs, approximately 0.8 to 0.2 or lower, meaning that the model has become better. Once they hit a low point, the training loss level is maintained at a small fluctuation and approaches 0.1. Nevertheless, there is a slight variation in the loss of validation, particularly at epoch 40, which indicates that there is a certain degree of variation. Overall, the model demonstrated a steady improvement with both losses becoming closer to a smaller value, reflecting effective learning and low overfitting.
Figure 11 shows the number of reviews per product category, broken down by rating. Mobile has the largest number of reviews, more than 20,000, most of which are 5-star (light blue) and 4-star (dark gray). SmartTV has fewer reviews, approximately 14,000, and these are likewise mostly 4-star and 5-star. Mobile Accessories and Refrigerators have a moderate number of reviews; Mobile Accessories shows a more balanced rating distribution, with a substantial share of 3-star and 2-star reviews. Books has the fewest reviews, fewer than 5000, spread evenly across the ratings. Overall, the chart indicates that products related to mobile phones are rated higher than the other categories.
Figure 12 plots review length (0–15,000 characters) on the x-axis against rating (1–5) on the y-axis. Bubble size indicates the number of reviews in each group, with larger bubbles corresponding to more reviews. Higher-rated reviews (4.5 and 5 stars, highlighted in yellow and light orange, respectively) tend to be longer, often exceeding 5000 characters. Conversely, lower-rated reviews (1 and 2 stars, shown in purple and blue, respectively) are shorter, mostly in the 0–5000 character range.
Figure 13 shows the distribution of review lengths, with the x-axis ranging from 0 to 15,000 characters and the y-axis representing review density. A sharp peak near zero indicates that a large proportion of reviews are very short, often consisting of only a few words or sentences. As review length increases, the density decreases rapidly, showing that longer reviews are less common. Overall, the distribution is highly right-skewed (positively skewed), with most reviews being short and only a small proportion approaching 15,000 characters.
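A distribution with a sharp peak near zero and a long tail toward large values has a positive skewness coefficient. The sketch below computes the Fisher-Pearson coefficient g1 = m3 / m2^1.5 on a small list of review lengths. The function name `sample_skewness` and the example lengths are hypothetical and are not taken from the dataset.

```python
def sample_skewness(values):
    """Fisher-Pearson skewness g1 = m3 / m2**1.5, using population moments."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    if m2 == 0:
        raise ValueError("skewness is undefined for constant data")
    return m3 / m2 ** 1.5


# Hypothetical review lengths in characters: many short reviews, a few very long ones.
lengths = [40, 55, 60, 80, 90, 120, 150, 200, 3000, 15000]
skew = sample_skewness(lengths)
print(f"skewness = {skew:.2f}")  # positive: the long tail points to the right
```

Mirroring the data (negating every value) flips the sign, which is a quick way to confirm which direction a reported skew refers to.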
Figure 14 illustrates the relationship between review ratings and word count. The x-axis represents ratings from 1 to 5, while the y-axis shows the number of words in each review. Reviews with 1-star ratings are generally shorter, with most containing fewer than 500 words and concentrated in the lower range. In contrast, higher-rated reviews, particularly 4-star and 5-star, tend to have longer word counts, with several exceeding 1000 words and some extending beyond 2500 and 3000 words. This indicates a positive association between higher ratings and longer reviews.
Figure 15 shows the distribution of review length by sentiment class (Positive, Neutral, and Negative). The x-axis represents the sentiment class, and the y-axis represents review length in characters. Positive reviews have the highest median length and the widest spread, with a long tail of outliers, a few of which exceed 17,500 characters; this indicates that positive reviews tend to be longer and more detailed. Neutral reviews are shorter, with a median of approximately 300 characters and a narrower range, reflecting less detailed feedback. Negative reviews follow a pattern similar to that of neutral reviews but are slightly longer, with some reaching 1000 characters.
Figure 16 presents a scatter plot matrix of review length and word count across ratings (1–5). Review length and word count are highly correlated: longer reviews contain more words. Longer reviews also tend to carry higher ratings, particularly 4 and 5 stars (shown in purple and red). Reviews with a large word count (between 500 and 3000 words) are likely to be rated higher, whereas 1-star reviews (blue) are considerably shorter, most containing fewer than 500 words.
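The strong association between review length in characters and word count can be quantified with the Pearson correlation coefficient. The following is a minimal sketch on four made-up reviews. The function name `pearson_r` and the example texts are hypothetical, and whitespace tokenization is an assumption about how words were counted.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant data")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical reviews, for illustration only.
reviews = [
    "Great phone.",
    "Battery life is excellent and the screen is sharp.",
    "Stopped working after a week, very disappointed with the build quality overall.",
    "Okay.",
]
char_lengths = [len(text) for text in reviews]
word_counts = [len(text.split()) for text in reviews]  # assumed whitespace tokenization
r = pearson_r(char_lengths, word_counts)
print(f"r = {r:.3f}")
```

Because the two features carry nearly the same information, a model that uses both gains little over using one of them.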
Table 6 presents the results of the ablation study conducted to evaluate the impact of the Luong attention mechanism on the performance of the Bi-LSTM model. The standard Bi-LSTM achieves an accuracy of 94.76%, while the proposed Bi-LSTM with Luong attention improves the accuracy to 96.67%. Similar improvements are observed across precision, recall, and F1-score. This performance gain demonstrates that the Luong attention mechanism effectively enhances the model’s ability to focus on sentiment-relevant features within the input text, thereby improving classification accuracy and overall robustness.
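To make concrete what the attention layer contributes in this ablation, the sketch below shows Luong-style dot-product attention pooling over a sequence of hidden states. It is an illustration under stated assumptions and not the authors' implementation: the dot-product score, the use of the last hidden state as the query, and the names `softmax` and `luong_dot_attention` are all assumptions, and the toy vectors are invented.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)  # subtracting the maximum avoids overflow in exp
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def luong_dot_attention(hidden_states, query):
    """Dot-product attention: score_s = query . h_s, weights = softmax(scores).

    hidden_states: list of T vectors (e.g. concatenated Bi-LSTM outputs).
    query: vector of the same dimension.
    Returns (context, weights), where context is the weighted sum of the states.
    """
    scores = [sum(h_i * q_i for h_i, q_i in zip(h, query)) for h in hidden_states]
    weights = softmax(scores)
    context = [
        sum(w * h[d] for w, h in zip(weights, hidden_states))
        for d in range(len(query))
    ]
    return context, weights


# Toy hidden states for a 3-token review (hypothetical numbers).
hidden_states = [[0.1, 0.0], [0.9, 0.8], [0.2, 0.1]]
query = hidden_states[-1]  # assumption: the last state acts as the query
context, weights = luong_dot_attention(hidden_states, query)
print(weights)
```

In a classifier, the context vector would replace or complement the final hidden state as input to the output layer, so tokens with larger weights influence the prediction more.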
The high performance observed in Table 6 is influenced by dataset-specific characteristics, including rating-based weak supervision and domain-constrained e-commerce reviews, which simplify sentiment separation. Therefore, results should be interpreted as task-specific performance under controlled conditions rather than generalizable superiority.
The experimental results demonstrate that the proposed model achieves improved performance compared to existing methods reported in the literature. In particular, traditional machine learning and standard LSTM-based approaches show limitations in capturing contextual dependencies within textual data.
In addition to the quantitative evaluation, an error analysis was conducted to understand the model's limitations. The model has difficulty handling sarcastic expressions, mixed-sentiment sentences, and context-dependent polarity shifts. In several cases, implicit sentiment cues led to misclassification, particularly in reviews containing contradictory opinions or domain-specific expressions. These observations indicate that while the model performs well on explicit sentiment patterns, its robustness decreases in linguistically complex and ambiguous scenarios.
The reported performance should be interpreted in light of dataset-specific constraints. The dataset is restricted to e-commerce reviews, and sentiment labels are derived from rating-based weak supervision, which produces well-separated sentiment classes but may reduce task complexity and introduce labeling bias. In addition, class imbalance and domain-specific linguistic patterns may influence the evaluation metrics. Therefore, the results reflect strong performance under controlled experimental conditions rather than universal superiority across diverse sentiment analysis datasets.
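Rating-based weak supervision means that each review inherits its sentiment label from its star rating rather than from manual annotation. A minimal sketch of such a mapping is shown below. The thresholds (below 3 stars Negative, 3 stars Neutral, above 3 stars Positive), the function name `rating_to_sentiment`, and the example ratings are assumptions for illustration, since the exact rule is not restated in this section.

```python
from collections import Counter


def rating_to_sentiment(rating):
    """Map a 1-5 star rating to a weak sentiment label (assumed thresholds)."""
    if not 1 <= rating <= 5:
        raise ValueError(f"rating out of range: {rating}")
    if rating < 3:
        return "Negative"
    if rating > 3:
        return "Positive"
    return "Neutral"


# Hypothetical ratings skewed toward 4 and 5 stars.
ratings = [5, 5, 4, 5, 3, 1, 4, 2, 5, 4]
labels = [rating_to_sentiment(r) for r in ratings]
class_counts = Counter(labels)
print(class_counts)
```

The example also shows how a rating distribution dominated by high scores translates directly into class imbalance, and why a review whose text contradicts its rating would receive a noisy label.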
Unlike traditional machine learning and standard LSTM-based approaches, the Bi-LSTM architecture captures both forward and backward contextual information, leading to more accurate sentiment classification. Additionally, the attention mechanism enables the model to focus on the most relevant features in the input sequence, thereby enhancing classification performance, especially for complex and neutral sentiments. These findings indicate that the proposed approach provides a more robust and efficient framework for sentiment analysis in e-commerce applications than prior studies, with results exceeding those reported by H. Li et al. [23] and Huang et al. [24].

4. Conclusions

This paper presents an improved system for sentiment analysis of e-commerce product reviews that combines a Bi-LSTM architecture with the Luong Attention mechanism. The approach addresses the ambiguity of review text, with its layered meanings and complexities, by capturing longer-range dependencies within the text and emphasizing its most important parts. The findings indicate that the proposed framework improves sentiment classification accuracy over traditional approaches and distinguishes between Positive, Negative, and Neutral sentiments with high accuracy and recall. The model achieved an accuracy of 96.67%, with precision and recall of 96.83% and 96.67%, respectively. Its stable training and validation performance indicates good generalization and little overfitting.
Despite the effectiveness of the proposed model, certain limitations exist. The model is evaluated on a specific dataset, which may limit its generalizability across different domains. Additionally, the approach focuses only on textual data and does not consider multimodal information such as images or user metadata, which may influence sentiment understanding. Furthermore, the model may face challenges in handling complex linguistic patterns such as sarcasm, irony, and context-dependent expressions, which are common in online reviews. The computational efficiency, although better than that of transformer-based models, may still be improved for large-scale real-time applications. Nonetheless, the model could be improved by investigating additional attention mechanisms or domain-specific characteristics to provide more targeted insights.
Future work may concentrate on refining the model to address more complicated sentiment variations, such as sarcasm or context-specific emotions, and on extending the framework to other languages and more varied e-commerce data to make it more broadly applicable. The proposed framework demonstrates strong performance under controlled experimental conditions; however, the contribution is primarily incremental due to its reliance on standard architectures and dataset-specific constraints. Improving generalization across diverse datasets and handling complex linguistic phenomena more robustly therefore remain priorities.

Author Contributions

Conceptualization, O.M. and T.K.; methodology, O.M.; software, T.K.; validation, O.M., D.M. and T.K.; formal analysis, D.M.; investigation, D.M. and T.K.; resources, D.M.; data curation, D.M.; writing—original draft preparation, O.M.; writing—review and editing, O.M.; visualization, T.K.; supervision, O.M.; project administration, O.M.; funding acquisition, O.M. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Science Committee of the Ministry of Science and Higher Education of the Republic of Kazakhstan under grant BR24993001, "Creation of a Large Language Model (LLM) to maintain the implementation of the Kazakh language and increase technological progress".

Institutional Review Board Statement

The study did not involve direct interaction with human participants or collect any new private human data. This study is exempt from Institutional Review Board approval.

Informed Consent Statement

This study utilized a publicly available, anonymized dataset from an open-source repository (Kaggle); therefore, individual informed consent was not required.

Data Availability Statement

The data presented in this study are available in Amazon Customer Reviews (2013–2019) Sentiment at https://www.kaggle.com/datasets/thedevastator/amazon-customer-reviews-with-2013-2019-sentiment (accessed on 20 January 2026).

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Madanchian, M. The role of complex systems in predictive analytics for e-commerce innovations in business management. Systems 2024, 12, 415.
  2. Diekson, Z.A.; Prakoso, M.R.B.; Putra, M.S.Q.; Syaputra, M.S.A.F.; Achmad, S.; Sutoyo, R. Sentiment analysis for customer review: Case study of Traveloka. Procedia Comput. Sci. 2023, 216, 682–690.
  3. Idries, A.; Krogstie, J.; Rajasekharan, J. Dynamic capabilities in electrical energy digitalization: A case from the Norwegian ecosystem. Energies 2022, 15, 8342.
  4. Bello, A.; Ng, S.-C.; Leung, M.-F. A BERT framework to sentiment analysis of tweets. Sensors 2023, 23, 506.
  5. Zhang, X.; Guo, C. Research on multimodal prediction of e-commerce customer satisfaction driven by big data. Appl. Sci. 2024, 14, 8181.
  6. Taherdoost, H.; Madanchian, M. Artificial intelligence and sentiment analysis: A review in competitive research. Computers 2023, 12, 37.
  7. Dubey, P.; Dubey, P.; Bokoro, P.N. Unpacking sarcasm: A contextual and transformer-based approach for improved detection. Computers 2025, 14, 95.
  8. Nicolescu, L.; Tudorache, M.T. Human-computer interaction in customer service: The experience with AI chatbots—A systematic literature review. Electronics 2022, 11, 1579.
  9. Xiong, S.; Tian, W.; Si, H.; Zhang, G.; Shi, L. A survey of the applications of text mining for the food domain. Algorithms 2024, 17, 176.
  10. Hassan, S.U.; Ahamed, J.; Ahmad, K. Analytics of machine learning-based algorithms for text classification. Sustain. Oper. Comput. 2022, 3, 238–248.
  11. Wang, H.; Li, J.; Wu, H.; Hovy, E.; Sun, Y. Pre-trained language models and their applications. Engineering 2023, 25, 51–65.
  12. Kokab, S.T.; Asghar, S.; Naz, S. Transformer-based deep learning models for the sentiment analysis of social media data. Array 2022, 14, 100157.
  13. Saoualih, A.; Safaa, L.; Bouhatous, A.; Bidan, M.; Perkumienė, D.; Aleinikovas, M.; Šilinskas, B.; Perkumas, A. Exploring the tourist experience of the Majorelle Garden using VADER-based sentiment analysis and the latent Dirichlet allocation algorithm: The case of TripAdvisor reviews. Sustainability 2024, 16, 6378.
  14. Ghatora, P.S.; Hosseini, S.E.; Pervez, S.; Iqbal, M.J.; Shaukat, N. Sentiment analysis of product reviews using machine learning and pre-trained LLM. Big Data Cogn. Comput. 2024, 8, 199.
  15. Liu, C.; Chen, T.; Pu, Q.; Jin, Y. Text mining for consumers’ sentiment tendency and strategies for promoting cross-border e-commerce marketing using consumers’ online review data. J. Theor. Appl. Electron. Commer. Res. 2025, 20, 125.
  16. Yuan, L.; Liu, H.; Fu, F.; Liu, Y.; Zuo, X.; Li, L. Study of Zhejiang tangerine e-commerce reviews based on natural language processing. Horticulturae 2025, 11, 151.
  17. Yang, Y.; Ma, Y.; Wu, G.; Guo, Q.; Xu, H. The insights, “comfort” effect and bottleneck breakthrough of “e-commerce temperature” during the COVID-19 pandemic. J. Theor. Appl. Electron. Commer. Res. 2022, 17, 1493–1511.
  18. Chen, T.; Tong, C.; Bai, Y.; Yang, J.; Cong, G.; Cong, T. Analysis of the public opinion evolution on the normative policies for the live streaming e-commerce industry based on online comment mining under COVID-19 epidemic in China. Mathematics 2022, 10, 3387.
  19. Mabrouk, A.; Redondo, R.P.D.; Kayed, M. SEOpinion: Summarization and exploration of opinion from e-commerce websites. Sensors 2021, 21, 636.
  20. Guo, P.; Li, H.; Mo, X. Quantifying post-purchase service satisfaction: A topic–emotion fusion approach with smartphone data. Big Data Cogn. Comput. 2025, 9, 125.
  21. Fan, M.; Tang, Z.; Qalati, S.A.; Tajeddini, K.; Mao, Q.; Bux, A. Cross-border e-commerce brand internationalization: An online review evaluation based on Kano model. Sustainability 2022, 14, 13127.
  22. Nichifor, E.; Brătucu, G.; Chițu, I.B.; Lupșa-Tătaru, D.A.; Chișinău, E.M.; Todor, R.D.; Albu, R.-G.; Bălășescu, S. Utilising artificial intelligence to turn reviews into business enhancements through sentiment analysis. Electronics 2023, 12, 4538.
  23. Li, H.; Lu, Y.; Zhu, H.; Ma, Y. A novel AB-CNN model for multi-classification sentiment analysis of e-commerce comments. Electronics 2023, 12, 1880.
  24. Huang, W.; Lin, M.; Wang, Y. Sentiment analysis of Chinese e-commerce product reviews using ERNIE word embedding and attention mechanism. Appl. Sci. 2022, 12, 7182.
  25. Wang, R.; Xu, S.; Li, S.; Pang, Q. Research on influence mechanism of consumer satisfaction evaluation behavior based on grounded theory in social e-commerce. Systems 2024, 12, 572.
  26. Li, Y.; Shen, Z.; Zhao, C.; Chin, K.-S.; Lang, X. Understanding customer opinion change on fresh food e-commerce products and services—Comparative analysis before and during COVID-19 pandemic. Sustainability 2024, 16, 2699.
  27. Ilieva, G.; Yankova, T.; Klisarova, S.; Dzhabarova, Y. Customer satisfaction in e-commerce during the COVID-19 pandemic. Systems 2022, 10, 213.
  28. Fici, A.; Bilucaglia, M.; Casiraghi, C.; Rossi, C.; Chiarelli, S.; Columbano, M.; Micheletto, V.; Zito, M.; Russo, V. From e-commerce to the metaverse: A neuroscientific analysis of digital consumer behavior. Behav. Sci. 2024, 14, 596.
Figure 1. Overall proposed framework.
Figure 2. Architecture of Bi-LSTM.
Figure 3. Luong Attention Mechanism.
Figure 4. Confusion Matrix.
Figure 5. ROC Curve.
Figure 6. Precision-Recall Curve.
Figure 7. FNR and FPR.
Figure 8. Performance Metrics.
Figure 9. Model Accuracy.
Figure 10. Model Loss.
Figure 11. Ratings by Category.
Figure 12. Review Length vs. Rating.
Figure 13. Review Length Distribution.
Figure 14. Rating vs. Word Count.
Figure 15. Review Length by Sentiment.
Figure 16. Scatter plot matrix of Review Length and Word Count by Rating.
Table 1. Technical Comparison with Existing Attention-Based Bi-LSTM Models.
Model | Embedding Method | Preprocessing Depth | Neutral Class Handling | Dataset Domain | Attention Type | Attention Scoring | Outlier Handling | Domain Generalization | Novel Technical Component
Conventional Bi-LSTM + Attention | Word2Vec/GloVe | Basic cleaning only | No | Single domain | Generic | Additive | No | Weak | No contextual refinement
Existing Luong Attention Bi-LSTM | Word embeddings | Partial cleaning | No | Limited categories | Luong | Dot-product | No | Limited | Standard Luong mechanism without optimization
Proposed Model | Custom-trained embeddings | Optimized pipeline: noise reduction, sentiment balancing, outlier removal | Explicit 3-class representation | Multi-category Amazon dataset | Luong Attention | Refined scoring mechanism for sentiment-bearing token prioritization | Yes | Strong cross-domain generalization | Integrated EDA, unified workflow, ablation-validated improvements
Table 2. Robustness Evaluation of the Proposed Model.
Evaluation Strategy | Dataset | Purpose
Train–Test Split (80/20) | Amazon Reviews | Baseline performance evaluation
k-fold Cross-Validation | Amazon Reviews | Validates model stability and robustness
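The k-fold strategy listed in Table 2 partitions the data into k disjoint folds and uses each fold once for validation. The sketch below generates the index sets. The function name `k_fold_indices` and the value k = 5 are assumptions, as the table does not state the number of folds, and shuffling is omitted for brevity.

```python
def k_fold_indices(n_samples, k=5):
    """Yield (train_idx, val_idx) pairs for k contiguous, near-equal folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    # The first n_samples % k folds receive one extra sample.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, val_idx
        start += size


# Example: 10 samples, 5 folds (k = 5 is an assumed value).
folds = list(k_fold_indices(10, k=5))
for train_idx, val_idx in folds:
    print(val_idx)
```

In practice the indices would be shuffled, and ideally stratified by class, before folding; the spread of the per-fold scores is what indicates stability.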
Table 3. Comparison With Existing Literature and Practical Implications.
Study/Model | Key Finding in Literature | Comparison with Our Findings | Practical Implications for E-Commerce
Cross-border review analysis (Liu et al.) [15] | Identified major consumer concerns | Our model provides higher sentiment accuracy | Better understanding of customer needs
Aspect-based summarization (Mabrouk et al.) [19] | Extracted product aspects but no deep sentiment | Our model adds strong sentiment classification | Better automated review summarization
Attention-BiLSTM sentiment (H. Li et al.) [23] | Improved 3-class classification | Our model offers competitive performance + scalability | Supports large-scale sentiment analysis
NLP + ML for retailers (Nichifor et al.) [22] | Found hidden neutral tone | Our model detects polarity more precisely | Improves customer feedback interpretation
Fresh food sentiment (Li et al.) [26] | Logistics issues identified via text mining | Our model provides deeper emotional analysis | Enhances service-quality monitoring
Table 4. Performance Comparison with Transformer-based Sentiment Analysis Models.
Model | Accuracy (%) | Precision (%) | Recall (%) | F1-Score (%)
BERT | 97.1 | 96.95 | 97.02 | 96.98
RoBERTa | 97.35 | 97.12 | 97.2 | 97.16
Bi-LSTM | 94.76 | 94.2 | 94.55 | 94.37
Proposed Model | 96.67 | 96.21 | 96.48 | 96.34
Table 5. Performance Comparison with Baseline Models.
Model | Accuracy (%) | Precision (%) | Recall (%) | F1-Score (%)
SVM | 89.2 | 88.5 | 89 | 88.7
Random Forest | 91 | 90.3 | 90.8 | 90.5
CNN | 93.5 | 93 | 93.2 | 93.1
Bi-LSTM | 94.76 | 94.2 | 94.55 | 94.37
BERT | 97.35 | 97.12 | 97.2 | 97.16
Proposed Model | 96.67 | 96.21 | 96.48 | 96.34
Table 6. Ablation Study Evaluating the Impact of the Luong Attention Mechanism.
Model | Accuracy (%) | Precision (%) | Recall (%) | F1-Score (%)
Bi-LSTM | 94.76 | 94.2 | 94.55 | 94.37
Bi-LSTM + Luong Attention (Proposed Model) | 96.67 | 96.21 | 96.48 | 96.34
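The F1-scores in Table 6 can be cross-checked against the reported precision and recall using F1 = 2PR / (P + R). The sketch below does this for both rows. The function name `f1_score` and the dictionary layout are illustrative choices; the numbers are copied from the table.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (same units as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision %, recall %, reported F1 %) as listed in Table 6.
table_6 = {
    "Bi-LSTM": (94.20, 94.55, 94.37),
    "Bi-LSTM + Luong Attention": (96.21, 96.48, 96.34),
}

recomputed = {
    name: round(f1_score(p, r), 2) for name, (p, r, _reported) in table_6.items()
}
for name, (_p, _r, reported) in table_6.items():
    print(f"{name}: reported {reported}, recomputed {recomputed[name]}")
```

Both recomputed values agree with the table to two decimals. For multi-class averages this agreement is a consistency check only, because an averaged F1 need not equal the harmonic mean of averaged precision and recall.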
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
