AI-Driven Anomaly Detection in E-Commerce Services: A Deep Learning and NLP Approach to the Isolation Forest Algorithm Trees

Mah, Pascal Muam; Skalna, Iwona; Pelech-Pilichowski, Tomasz

doi:10.3390/jtaer20030214

by Pascal Muam Mah ¹,*, Iwona Skalna ² and Tomasz Pelech-Pilichowski ³

¹ Department of Information and Communication Technology, AGH University of Krakow, 30-059 Krakow, Poland
² Department of Business Informatics and Management Engineering, AGH University of Krakow, 30-059 Krakow, Poland
³ Department of Applied Computer Science, AGH University of Krakow, 30-059 Krakow, Poland
* Author to whom correspondence should be addressed.

J. Theor. Appl. Electron. Commer. Res. 2025, 20(3), 214; https://doi.org/10.3390/jtaer20030214

Submission received: 23 May 2025 / Revised: 31 July 2025 / Accepted: 4 August 2025 / Published: 14 August 2025

Abstract

The accelerated development of e-commerce has given rise to sophisticated systems defined by significant user interaction, a variety of product offerings, and considerable quantities of structured and unstructured data. Upholding trust and operational security is becoming ever more essential. E-commerce platforms are susceptible to deceptive practices, including counterfeit reviews, dubious transactions, and anomalous usage behaviors. This research introduces a framework for anomaly detection powered by artificial intelligence, integrating deep learning and natural language processing (NLP) with the isolation forest algorithm tree to enhance the identification of unusual activities on e-commerce platforms. We leveraged customer feedback, transaction logs, and user interaction data obtained from Kaggle. Textual reviews were interpreted using natural language processing (NLP), while deep learning was utilized to discern behavioral patterns. The isolation forest algorithm tree was employed to detect statistical anomalies in multidimensional data. The hybrid model surpassed conventional techniques in terms of detection accuracy, recall, and interpretability. It successfully detects suspicious actions and clarifies anomalies in their relevant context. The application of AI techniques, particularly natural language processing, deep learning, and isolation forest algorithm trees, establishes a solid foundation for anomaly detection in the realm of e-commerce. This approach fosters a more secure and trustworthy experience for online consumers.

Keywords:

e-commerce services; natural language processing (NLP); isolation forest algorithm trees; deep learning; artificial intelligence; anomaly detection

1. Introduction

E-commerce platforms have profoundly reshaped consumer behavior by enabling smooth transactions, personalized experiences, and access on a global scale. However, this swift digital evolution has also introduced vulnerabilities such as financial fraud, spam, and deceptive user behaviors, particularly as platforms rely more on user-generated content like reviews, ratings, and feedback. The requirement for advanced, reliable, and flexible systems to maintain authenticity and security has become increasingly important [1,2,3].

This study pioneers a framework that combines interpretable AI strategies with multi-dimensional isolation forest algorithm trees (IFAT), with each tree corresponding to various quality of service (QoS) dimensions, including quality, responsiveness, availability, security, assurance, and loyalty. Employing a synthetic dataset of customer–agent interactions, these well-crafted decision trees expose patterns of normal and abnormal user feedback, thereby enhancing interpretability and visual clarity. This novel methodology allows for actionable insights that can improve e-commerce service quality and customer satisfaction.

Utilizing advancements in natural language processing (NLP), deep learning (DL), biosignal simulations, and artificial intelligence (AI), this research combines unsupervised anomaly detection through isolation forests with comprehensive quality of service (QoS) metrics. The proposed framework facilitates improved personalization, real-time sentiment and anomaly detection, as well as secure intelligent interactions. The developed IFAT model provides a scalable and interpretable approach that correlates various service quality dimensions with user feedback patterns, validated using both real-world and synthetic e-commerce datasets [4].

Recent studies have emphasized the necessity of flexible e-commerce service architectures that support a variety of technological integrations [1]; the essential role of agility and digital trust, especially during crises [2]; and the growing relevance of mobile commerce usability [3]. In addition, AI, ML, and sophisticated analytics have been recognized for their importance in proactive cybersecurity, personalized customer service, and operational efficiency in the realm of e-commerce [5,6,7]. This research builds upon these foundational insights by focusing on interpretable anomaly detection related to quality of service and sentiment.

Through the combination of multi-dimensional QoS perspectives and interpretable anomaly detection, the proposed framework confronts essential challenges related to e-commerce security, customer satisfaction, and service optimization. It equips platform developers, data scientists, and security analysts with scalable AI solutions that bolster reliability, transparency, and the creation of actionable insights, facilitating enhanced decision making and trustworthiness in digital commerce contexts.

The objective of this research is threefold:

To explore how deep learning and NLP can enrich e-commerce data representations;
To integrate isolation forests for effective unsupervised anomaly detection;
To validate the proposed framework on real-world and synthetic e-commerce datasets.

The isolation forest algorithm trees (IFAT)-based model reinterprets intricate e-commerce interactions into clear, interpretable decision trees, with each tree highlighting a distinct aspect of quality of service (QoS). This clarity aids in pinpointing service challenges and unusual patterns in customer feedback, allowing businesses to enhance their offerings, optimize user experience, and bolster trust. The approach guarantees adaptability and scalability, empowering e-commerce platforms to confront evolving challenges in a highly dynamic digital landscape.

2. Literature Review

The evolution of computing and transformative Internet technologies has accelerated the development and popularization of e-commerce platforms and services. In an era characterized by economic digitization and globalization, e-commerce has played a constructive role in the economic and social growth of nations. Engaging with e-commerce therefore directly enhances business efficiency and supports sustained and healthy economic development, which is a key focus and challenge in present-day economies.

One of the foundational requirements for employing e-commerce to improve business efficiency is a thorough understanding of the e-commerce platform, which, supported by the analysis and forecasting of e-commerce data, attracts profits and boosts business visibility to potential customers.

Natural language processing (NLP), deep learning (DL), biosignal simulations, and artificial intelligence (AI) are interconnected in their role of improving the quality of service (QoS) in e-commerce. NLP facilitates the comprehension of user intent; DL enhances the ability to recognize patterns and make predictions; biosignal simulations offer insights into emotional contexts; and AI synthesizes these elements for informed decision making. Collectively, they foster adaptive, responsive, and personalized services, thereby enhancing efficiency, satisfaction, and trust within e-commerce platforms through fluid human–computer interaction and real-time service optimization.

Mudgal (2025) [8] examines the application of AI and ML in the proactive detection of threats within e-commerce. This chapter underscores the significance of real-time monitoring, anomaly detection, and predictive analytics to avert cyber threats, which in turn secures data, strengthens system resilience, and promotes safer online transactions. In addition, Rane, Choudhary, and Rane [6] offer a thorough examination of the impact of artificial intelligence (AI) and machine learning (ML) on business intelligence, finance, and e-commerce. The findings highlight the role of AI-driven data analytics, predictive modeling, and automation as pivotal elements that enhance decision making, operational efficiency, and customer experiences. Additionally, the study addresses the challenges posed by data privacy, ethical dilemmas, and the requirement for explainable AI to cultivate trust and transparency. Kalusivalingam et al. [9] introduce a hybrid model that integrates ensemble learning with anomaly detection to improve B2B fraud detection. By employing methods such as random forest and isolation forest, the research enhances detection precision, minimizes false positives, and adjusts to changing fraud patterns within intricate transactional settings. In their 2025 work, Gracious et al. [10] examine strategic innovations in AI and ML that are designed to enhance security within the e-commerce sector. This chapter underscores the significance of real-time threat detection, adaptive learning models, and automated defense systems. Furthermore, it explores future avenues, including the ethical application of AI, compliance with regulations, and the integration of new technologies to fortify digital trust and resilience. Khurana and Kaul [11] suggest dynamic cybersecurity strategies for AI-augmented e-commerce through the use of federated learning. This decentralized methodology facilitates collaborative threat detection and adaptive defense mechanisms while safeguarding data privacy, thereby improving security without the need for centralized data sharing.

2.1. Anomaly Detection in E-Commerce

The process of anomaly detection involves recognizing rare or atypical patterns that stray from the expected norm. Within the realm of e-commerce, these anomalies might encompass fraudulent activities, dubious user behaviors, irregular transactions, or the improper use of systems.

Traditional approaches, including K-means clustering, PCA, and statistical thresholding, have been employed, but they often fall short in high-dimensional and noisy settings.

Villegas-Ch. et al. [12] advocate for the integration of explainable AI into anomaly detection systems aimed at e-commerce threat management. This model enhances the clarity of security threat identification, which in turn boosts trust and compliance. The findings emphasize the necessity of interpretability in AI-driven cybersecurity measures for online environments. In addition, Al-Ebrahim, Bunian, and Nour [13] conduct a review of machine learning applications within the realm of e-commerce, emphasizing the analysis of consumer behavior, personalization strategies, inventory optimization, and fraud detection mechanisms. They underscore the challenges associated with implementation and suggest potential future pathways for the integration of machine learning to improve operational efficiency and enhance the customer experience. Zhang et al. [14] conduct a comprehensive survey of machine and deep learning methodologies within the realm of e-commerce, addressing applications such as sentiment analysis, recommendation systems, fraud detection, and product classification. They pinpoint challenges including data imbalance, overfitting, and the interpretability of models, while also emphasizing emerging trends in personalization, chatbots, and multi-modal learning. Kalla [15] examines the role of big data analytics and artificial intelligence in improving the performance of e-commerce organizations. The dissertation delves into AI-fueled decision making, customer insights, and the optimization of operations. It underscores the importance of data-driven strategies for gaining a competitive edge, while also addressing the challenges and best practices associated with the implementation of AI and analytics within e-commerce platforms.

2.2. Deep Learning for E-Commerce

Deep learning (DL), especially with models such as autoencoders, LSTM networks, and CNNs, has proven to be highly effective in uncovering complex patterns from vast datasets.

Autoencoders are often employed for the purpose of anomaly detection by learning to generate compact data representations and identifying instances with significant reconstruction errors.
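
The reconstruction-error idea can be illustrated with a deliberately small sketch. It is not the model used in this study: it assumes a linear autoencoder with tied weights and a one-dimensional bottleneck, trained by plain batch gradient descent, and the toy data, learning rate, and function names are all hypothetical. Normal points lie on a line, so they compress well; a point off that line reconstructs poorly and receives a large error.

```python
def encode(w, x):
    """Compress x to a single number z = w . x (the bottleneck)."""
    return sum(wi * xi for wi, xi in zip(w, x))

def reconstruction_error(w, x):
    """Squared distance between x and its reconstruction z * w (tied weights)."""
    z = encode(w, x)
    return sum((xi - z * wi) ** 2 for xi, wi in zip(x, w))

def train(data, epochs=500, lr=0.05):
    """Fit w by batch gradient descent on the mean reconstruction error."""
    dim = len(data[0])
    w = [0.5] * dim  # fixed starting point keeps the sketch deterministic
    for _ in range(epochs):
        grad = [0.0] * dim
        for x in data:
            z = encode(w, x)
            residual = [xi - z * wi for xi, wi in zip(x, w)]
            rw = sum(ri * wi for ri, wi in zip(residual, w))
            for k in range(dim):
                # d/dw_k of ||x - (w . x) w||^2
                grad[k] += -2.0 * (x[k] * rw + z * residual[k])
        w = [wi - lr * g / len(data) for wi, g in zip(w, grad)]
    return w

# Hypothetical "normal" behavior: two features that move together (x2 = 2 * x1).
normal = [(t / 10.0, 2.0 * t / 10.0) for t in range(-10, 11)]
weights = train(normal)

normal_errors = [reconstruction_error(weights, x) for x in normal]
outlier_error = reconstruction_error(weights, (2.0, -1.0))  # breaks the pattern
```

In practice a threshold on the error (for example, a high percentile of the errors observed on normal data) would separate the two groups; the choice of threshold is an assumption left open here.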

In the context of e-commerce, deep learning has been applied to areas such as recommendation systems, dynamic pricing, and fraud detection.

Feng (2022) [16] introduces a deep learning strategy utilizing gradient boosted decision trees (GBDT) for the analysis and forecasting of e-commerce data. By extracting 107 features indicative of user behavior, the model positions purchase prediction as a binary classification task. It demonstrates superior performance compared to traditional methods, thereby enhancing predictive accuracy in the realm of online retail. In a 2018 study, Yu et al. [17] present a multi-layered deep learning framework for the purpose of e-commerce product categorization. They address challenges including category imbalance and complex taxonomies by constructing hierarchical classification models that employ FastText and AbLSTM. Their approach enhances classification accuracy by integrating hierarchical tree structures alongside ensemble methodologies.

Deep learning (DL) is instrumental in advancing e-services through various functions, such as pattern recognition, decision making, predictive maintenance, and intelligent automation. Table 1 provides the mathematical formulation and system architecture that enable these functions.

The work of Shankar et al. [18] introduces VisNet, a deep convolutional neural network aimed at facilitating large-scale visual search and recommendation in e-commerce. By concurrently addressing both tasks, VisNet captures multi-level visual similarities, resulting in enhanced accuracy for image retrieval. Its deployment on Flipkart not only improves user experience but also increases conversion rates. Nabi et al. [19] investigate the application of convolutional neural networks (CNNs)—notably VGG16, ResNet50, and InceptionV3—for the prediction of e-commerce profitability. Their deep learning models demonstrate superior performance compared to conventional techniques, providing enhanced accuracy in estimating profit margins. The research highlights the capability of CNNs to improve strategic financial planning and decision making within the realm of online retail. Zhang [20] introduces a deep learning framework aimed at e-commerce product recognition, employing convolutional neural networks (CNNs) to scrutinize product images. The model significantly boosts classification accuracy and retrieval efficiency, tackling issues such as visual similarity and category overlap. This methodology enhances product search capabilities and improves user experience on online retail platforms.

2.3. Natural Language Processing in E-Commerce

Natural language processing empowers machines to comprehend and analyze textual information, including product reviews, user feedback, and inquiries.

Methods like sentiment analysis, topic modeling, and transformer-based architectures (for instance, BERT) facilitate the extraction of semantic meaning from user-generated content.

The combination of natural language processing with anomaly detection enhances the development of contextually aware and behaviorally intelligent systems.

Jha, Sivasankari, and Venugopal [21] introduce a framework utilizing natural language processing for the sentiment analysis of product reviews in e-commerce. Their approach significantly improves both the efficiency and accuracy of real-time classification of customer feedback, thereby assisting businesses in comprehending consumer sentiment and enhancing product offerings as well as customer satisfaction. Additionally, Lin [22] explores the sentiment analysis of customer reviews in e-commerce through the application of natural language processing methods. The research assesses a range of machine learning algorithms, such as naive Bayes, support vector machines, and decision trees, to categorize reviews as either positive or negative. The results underscore the significance of choosing the appropriate model and the necessity of data preprocessing to improve classification accuracy. Moreover, Soundarapandian [23] investigates the use of natural language processing (NLP) within the realm of e-commerce to improve the customer experience. The text elaborates on various NLP methodologies, including sentiment analysis, chatbots, and tailored recommendations, highlighting their significance in boosting customer engagement, satisfaction, and loyalty. Additionally, it examines the challenges associated with implementation and the prospective trends in AI-enhanced customer service. Furthermore, Ismail, Ghareeb, and Youssry [24] investigate the ways in which sentiment analysis and natural language processing (NLP) improve customer experiences in e-commerce. Their research indicates that these technologies enhance customer satisfaction and loyalty by providing personalized services, improving chatbot effectiveness, and offering insights from reviews. Additionally, they highlight the importance of addressing ethical issues such as data privacy and cultural sensitivity to ensure responsible application.

2.4. Isolation Forest Algorithm Trees

The isolation forest algorithm comprises a collection of lightweight decision trees designed to model and illustrate anomaly detection across multiple quality of service (QoS) dimensions, utilizing customer feedback such as complaints, contextual keywords, and reviews as features.

Within this framework, each tree:

Isolates potential anomalies through a recursive division of features that stem from user-reported service problems (such as complaints, bugs, and refunds).
Maps specific perception keywords to quality of service (QoS) metrics, including quality, responsiveness, availability, and security.
Visualizes the decision boundaries that distinguish between normal and anomalous behavior, contingent upon the presence or absence of these keywords.
Acts like a single tree in the full isolation forest, isolating anomalous samples with fewer splits, which constitutes the fundamental principle of the algorithm.

2.4.1. Isolation Forest Algorithm Trees for E-Commerce Services

The isolation forest algorithm trees (IFAT) designed for quality of service (QoS) delineate twelve scenarios, with each scenario reflecting a specific QoS dimension, such as quality, security, or responsiveness. The initial step involves discerning customer perceptions and generating synthetic binary data to replicate their feedback. Labels are designated as normal or anomalous, contingent upon the presence of terms linked to anomalies. For each QoS category, relevant keywords are identified, and a decision tree is developed based on these features. The visualization process facilitates the recognition of patterns in customer perceptions related to service quality and highlights possible issues.
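
The encoding and labeling steps described above can be sketched as follows. The keyword lists, the set of anomaly-linked terms, and the function names are hypothetical placeholders chosen for illustration; the study's actual vocabularies are not reproduced here.

```python
import re

# Hypothetical perception keywords per QoS dimension (illustrative only).
QOS_KEYWORDS = {
    "quality": ["defective", "broken", "excellent"],
    "responsiveness": ["slow", "delayed", "fast"],
    "security": ["fraud", "hacked", "secure"],
}

# Hypothetical terms whose presence marks a piece of feedback as anomalous.
ANOMALY_TERMS = {"defective", "broken", "slow", "delayed", "fraud", "hacked"}

def tokenize(feedback):
    """Lowercase the text and keep alphabetic words only."""
    return set(re.findall(r"[a-z]+", feedback.lower()))

def encode(feedback, dimension):
    """Binary feature vector: 1 if the keyword occurs in the feedback, else 0."""
    words = tokenize(feedback)
    return [1 if keyword in words else 0 for keyword in QOS_KEYWORDS[dimension]]

def label(feedback):
    """'anomalous' if any anomaly-linked term is present, otherwise 'normal'."""
    return "anomalous" if tokenize(feedback) & ANOMALY_TERMS else "normal"

complaint = "The item arrived broken and defective."
praise = "Fast delivery, excellent product!"

complaint_features = encode(complaint, "quality")   # one column per keyword
complaint_label = label(complaint)
praise_label = label(praise)
```

Each QoS dimension then yields its own binary feature table, on which a small decision tree can be fitted and drawn.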

Figure 1: Isolation forest algorithm trees (IFAT) for e-commerce QoS-based perception analysis. The framework emphasizes the detection of anomalies in service quality, responsiveness, availability, security, assurance, and loyalty, facilitates the identification of user experience problems, and informs service enhancement strategies.

The illustration above (Figure 1) represents 12 simulated isolation forest algorithm trees, each focusing on a distinct QoS dimension through the use of perception-based keywords.

Table 2 represents six unique decision trees operating within an isolation forest framework, each corresponding to a fundamental quality of service (QoS) dimension—quality, responsiveness, availability, security, assurance, and loyalty. Each tree employs particular perception-related keywords to identify anomalies in user experiences. This organized mapping improves the comprehension of service performance and customer feedback, facilitating proactive anomaly detection and focused enhancements in service delivery, trust, and overall user satisfaction.

Gałka, Karczmarek, and Tokovarov [25] introduced an enhanced isolation forest method that incorporates minimal spanning tree clustering, thereby improving the accuracy and robustness of anomaly detection. Their approach minimizes false positives and boosts precision, especially in datasets that are high-dimensional and complex. Additionally, Marteau et al. [26] introduce a hybrid isolation forest technique aimed at improving anomaly detection, especially in the context of intrusion detection systems. This method integrates conventional isolation forest with distance-based metrics to enhance the precision of identifying infrequent, suspicious occurrences within intricate network traffic. Moreover, Cheng, Zou, and Dong [27] conduct a comparative analysis of the isolation forest and local outlier factor techniques for the purpose of outlier detection. Their research, presented at the Research in Adaptive and Convergent Systems conference, evaluates the performance of both methods across various datasets, highlighting the advantages of each and suggesting a hybrid approach to enhance anomaly detection robustness.

The isolation forest algorithm trees play a crucial role in helping e-commerce services recognize anomalies in customer feedback by sorting issues into QoS domains, including quality, responsiveness, availability, and security.

This systematic detection process bolsters service reliability, cultivates trust, and increases customer satisfaction through the swift identification and resolution of essential perception-driven service shortcomings.

2.4.2. Isolation Forest Algorithm

The isolation forest algorithm is a tree-based, unsupervised learning method specifically developed for the purpose of anomaly detection.

It focuses on isolating anomalies rather than profiling normal data points, which enhances its efficiency and effectiveness when dealing with large datasets.

In contrast to distance-based or density-based approaches, isolation forests exhibit linear scalability and demonstrate strong performance in high-dimensional environments, rendering them particularly suitable for dynamic and feature-rich e-commerce data.

Liu, Ting, and Zhou (2008) [28] introduce the isolation forest algorithm for the purpose of anomaly detection. Differing from traditional methods, it utilizes random partitioning to isolate anomalies. This strategy is efficient, scalable for large datasets, and proficient in detecting outliers, making it well suited for high-dimensional data in practical applications. Hariri, Kind, and Brunner [29] introduce the extended isolation forest (EIF), which enhances the conventional isolation forest by employing randomly oriented hyperplanes rather than relying on axis-aligned splits. The EIF significantly improves the accuracy and consistency of anomaly detection, particularly in high-dimensional datasets, all while ensuring computational efficiency suitable for scalable applications in real-world scenarios. Ding and Fei [30] put forth an anomaly detection strategy for streaming data that leverages the isolation forest algorithm with a sliding window method. This technique successfully identifies anomalies in real-time data streams by regularly updating detection boundaries, thus providing adaptability, efficiency, and suitability for ever-changing environments. Moreover, Heigl et al. [31] improve the isolation forest algorithm for better outlier detection in streaming data. Their technique introduces adaptive mechanisms to address evolving data distributions, which guarantees robustness and accuracy. The optimized algorithm is efficient, scalable, and appropriate for real-time anomaly detection in dynamic contexts.
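
The random-partitioning principle can be made concrete with a compact sketch of the basic, axis-aligned isolation forest. This is an illustrative reimplementation under simplifying assumptions, not the code used in this study: the dictionary-based tree layout, the default of 100 trees with a subsample of 64 points, and the toy data are all choices made for the example.

```python
import math
import random

EULER_GAMMA = 0.5772156649

def average_path_length(n):
    """Expected path length of an unsuccessful binary-search-tree lookup over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n

def build_tree(points, depth, max_depth, rng):
    """Recursively split on a random feature at a random value until points are isolated."""
    if depth >= max_depth or len(points) <= 1:
        return {"size": len(points)}
    dim = rng.randrange(len(points[0]))
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    if lo == hi:  # nothing left to split on this feature
        return {"size": len(points)}
    split = rng.uniform(lo, hi)
    left = [p for p in points if p[dim] < split]
    right = [p for p in points if p[dim] >= split]
    return {"dim": dim, "split": split,
            "left": build_tree(left, depth + 1, max_depth, rng),
            "right": build_tree(right, depth + 1, max_depth, rng)}

def path_length(node, x, depth=0):
    """Number of splits needed to reach a leaf, adjusted for unsplit points in that leaf."""
    if "split" not in node:
        return depth + average_path_length(node["size"])
    child = node["left"] if x[node["dim"]] < node["split"] else node["right"]
    return path_length(child, x, depth + 1)

def fit_forest(data, n_trees=100, sample_size=64, seed=0):
    """Grow each tree on its own random subsample; returns (trees, subsample size)."""
    rng = random.Random(seed)
    psi = min(sample_size, len(data))
    max_depth = math.ceil(math.log2(psi))
    trees = [build_tree(rng.sample(data, psi), 0, max_depth, rng)
             for _ in range(n_trees)]
    return trees, psi

def anomaly_score(forest, x):
    """Score in (0, 1); shorter average paths give scores closer to 1."""
    trees, psi = forest
    mean_path = sum(path_length(tree, x) for tree in trees) / len(trees)
    return 2.0 ** (-mean_path / average_path_length(psi))

# Toy data: a dense cluster of "normal" records plus one far-away record.
data_rng = random.Random(42)
data = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(200)]
data.append((8.0, 8.0))

forest = fit_forest(data)
inlier_score = anomaly_score(forest, (0.0, 0.0))
outlier_score = anomaly_score(forest, (8.0, 8.0))
```

Because each tree is grown on a small subsample to a bounded depth, the cost of fitting grows with the number of trees and the subsample size rather than with pairwise distances between all records, which is consistent with the scalability noted above.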

2.5. E-Services Powered by WSN + IoT

E-services are defined as services provided through digital means, leveraging wireless sensor networks (WSNs) and Internet of things (IoT) frameworks. Prominent sectors include:

1. E-Healthcare: Facilitating remote monitoring of patients and issuing emergency alerts.

2. Online Education: Developing smart learning environments and implementing adaptive tutoring systems.

3. Smart Homes: Promoting energy efficiency, enhancing security, and automating processes.

$SQS = α_{1} Q + α_{2} R + α_{3} A + α_{4} S$

(1)

where:

Q: quality of data;
R: responsiveness (latency);
A: availability;
S: security assurance;
$α_{i}$ : weights assigned to each parameter such that $\sum_{i = 1}^{4} α_{i} = 1$ .
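
As a worked illustration of Equation (1), the short sketch below computes SQS for one set of metric values. The function name, the equal default weights, the example scores, and the assumption that each metric is scaled to the range 0 to 1 are hypothetical and serve only to show the arithmetic.

```python
import math

def sqs(q, r, a, s, weights=(0.25, 0.25, 0.25, 0.25)):
    """Weighted sum of the four metrics; the weights must add up to 1."""
    if not math.isclose(sum(weights), 1.0):
        raise ValueError("weights must sum to 1")
    return sum(w * value for w, value in zip(weights, (q, r, a, s)))

# Hypothetical scores on a 0-1 scale for Q, R, A, and S.
equal_weight_score = sqs(0.9, 0.7, 0.8, 0.6)

# The same scores with security weighted more heavily.
security_focused_score = sqs(0.9, 0.7, 0.8, 0.6, weights=(0.2, 0.2, 0.2, 0.4))
```

With equal weights the score is simply the mean of the four metrics; shifting weight toward the weakest metric (here, security) lowers the overall score.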

The importance of e-services, as illustrated by the e-service quality radar, is rooted in their capacity to assess service quality across various dimensions. This highlights essential metrics—quality, responsiveness, availability, and security—each given equal importance. By scoring each metric, it reveals the performance level of the e-service. These visual representations enable businesses to pinpoint their strengths and identify areas needing enhancement, thereby ensuring that customer expectations are fulfilled effectively and securely.

Dzemydienė et al. [32] suggest an architecture for an e-service system that combines IoT and wireless sensor networks to oversee and control intermodal freight transportation. This system features algorithms designed for object identification, authentication, and secure integration with IoT platforms, thereby improving real-time tracking, operational efficiency, and safety within the logistics sector. In addition, Pathan et al. [33] present a framework aimed at providing e-services to rural regions by utilizing wireless ad hoc and sensor networks. This model highlights the importance of affordable infrastructure, effective data transmission, and scalability, thereby facilitating healthcare, education, and disaster management services in underdeveloped and remote areas through intelligent connectivity. Sattar et al. [34] introduce a four-tier IoT-driven smart agricultural system designed to improve e-services within the agricultural sector. This framework combines contextual information, sensor networks, and mobile applications to enhance irrigation efficiency, resource utilization, and decision-making processes, with the goal of boosting productivity and resilience in the face of climate change.

Table 3 shows the core evaluation metrics for e-services. Table 2 shows the availability of services and the extent of platform usage in e-learning, and assesses whether there is accuracy in anomaly detection or event prediction based on the mathematical functions.

This section demonstrates the potential to identify the quality of services with regard to online e-commerce products through reviews and discussions between customers and agents. The table presents a scenario for predicting quality of service (QoS) based on the tone of the text.

Table 4 presents the quote, QoS, predicted tone, and heartbeat model data derived from the text. The subsequent paragraphs delineate the distinctions in recognizing the scenarios:

1. Heartbeat vs. QoS Metric (Scenario One): Scenario one shows the distribution of estimated heartbeats (bpm) based on text comment tone for each quality of service (QoS) category, including “Q” (quality), “R” (responsiveness), “A” (availability), and “S” (security assurance). The table highlights the spread and variability of heart rate data within each category. By comparing the heartbeats across QoS classifications, it helps identify potential emotional and stress-related differences between categories, suggesting how different types of issues (e.g., quality, responsiveness) may affect heart rate.

2. QoS Distribution (Scenario Two): Scenario two shows the frequency of each QoS category mentioned in the communication dataset. The chart helps to visually understand the prevalence of each QoS metric: “Q” (quality), “R” (responsiveness), “A” (availability), and “S” (security assurance). The distribution highlights which QoS issues are most commonly discussed, offering insights into customer concerns and the focus areas for improvements in product or service quality.

3. Heartbeat Over Time (Scenario Three): Scenario three tracks the estimated heartbeat (bpm) over time for each QoS classification across the sequence of messages. Each line represents a different QoS category (“Q”, “R”, “A”, “S”), allowing for a comparison of how heart rate changes as different issues are discussed. This visualization can indicate patterns of increasing stress or concern over time, especially when certain QoS issues repeatedly arise, offering a temporal view of emotional or physiological responses to ongoing issues.

Arnab et al. [35] introduce a deep-learning-driven system for predicting quality of service (QoS) in cellular networks, utilizing historical traffic data alongside contextual features. This model improves network performance by forecasting quality parameters, which facilitates proactive resource allocation and enhances user experience in environments driven by the Internet of things (IoT). In addition, Rehman, Nasralla, and Philip [36] created a medical quality of experience (m-QoE) prediction model that is aware of quality of service (QoS), content, and device characteristics for ultrasound streaming in small cell networks. Utilizing a multilayer perceptron neural network, this model incorporates network conditions, features of video content, and characteristics of devices to forecast m-QoE. Validated through subjective evaluations by healthcare professionals, it facilitates adaptive video streaming, thereby maintaining diagnostic quality across various devices and network settings. Moreover, Vijayakumar et al. [37] introduce a deep learning framework designed to forecast multimedia quality of experience (QoE) by leveraging electrocardiogram and respiration signals. Their approach, which incorporates models like CNN, BLSTM, and CNN-BLSTM, achieves an impressive F1-score of up to 87.55.

2.6. Natural Language Processing in Human–Device Interaction

Advancements in technology have empowered Internet of things (IoT) devices, enabling mobile application users to track and control their daily activities via graphical user interfaces, touchscreens, keyboards, mice, joysticks, voice inputs, or even eye movements.

Natural language processing (NLP) plays a crucial role in the field of human–device interaction, encompassing the following applications:

Voice Commands: Allowing users to manage devices using their voice.
Multilingual Interfaces: Supporting input processing in diverse languages.
Context-Aware Services: Suggesting services tailored to the user’s past behavior or expressed intentions.

Human–device interaction (HDI) refers to the translation of human desires into operational commands for devices. The framework presented by Rubio-Drosdov et al. [38] addresses seamless human–device interaction in IoT environments, concentrating on context-aware services, identity management, and trust negotiation. This approach significantly enhances user experience by facilitating intuitive, secure, and transparent interactions between users and smart devices in environments characterized by ubiquitous computing. Additionally, Söldner et al. [39] investigate the interaction between humans and devices within life science laboratories, highlighting the integration of smart biolabs. They address the importance of intuitive interfaces, automation, and real-time feedback systems to improve the accuracy, efficiency, and reproducibility of experiments, with the goal of revolutionizing laboratory practices through sophisticated human-centered and interconnected technological solutions.

Preprocessing: TF-IDF. TF-IDF converts unprocessed text into significant numerical representations by emphasizing key terms in relation to specific documents.

$\text{TF-IDF}(t, d, D) = \text{TF}(t, d) \times \log\left(\frac{N}{|\{d \in D : t \in d\}|}\right)$

(2)

Here, t is defined as a term (or word), d refers to a specific document, D represents a collection of documents, commonly referred to as a corpus, and N is the total number of documents within that corpus. The term $\text{TF}(t, d)$ denotes term frequency, which measures the frequency of the term t within the document d. On the other hand, $|\{d \in D : t \in d\}|$ indicates document frequency, which counts the number of documents in the corpus D that include the term t.
In the context of e-services, it enhances the extraction of keywords and the understanding of user intent, thereby enabling virtual assistants or chatbots to interpret user inquiries more accurately. This is particularly beneficial in sectors such as education and healthcare, where the use of precise terminology improves the efficiency of automated service provision.
Word Embedding: Word2Vec (Skip-gram Objective). Word2Vec is instrumental in recognizing the semantic relationships that exist between words, which enables systems to grasp language contextually.

$L = -\sum_{t=1}^{T} \sum_{\substack{-c \leq j \leq c \\ j \neq 0}} \log P(w_{t+j} \mid w_{t})$

(3)

Let T represent the total count of words within the corpus. The symbol $w_{t}$ indicates the word situated at position t. The notation $w_{t + j}$ refers to a context word that is positioned at a distance of j from the current word $w_{t}$ . The variable c signifies the size of the context window, which dictates the number of surrounding words to be taken into account around $w_{t}$ . The expression $P (w_{t + j} ∣ w_{t})$ denotes the probability of accurately predicting the context word $w_{t + j}$ based on the current word $w_{t}$ .
This advancement is significant in e-services, as it enhances the machine’s understanding of user inputs, thereby facilitating more intelligent responses in applications such as e-health consultations and smart learning platforms. In these contexts, the varying meanings of words significantly influence the provision of personalized and precise feedback or actions.
Intent Classification: Softmax Layer. Intent classification translates user inputs into specific actions through models such as Softmax.

$P(y = j \mid x) = \frac{e^{z_{j}}}{\sum_{k=1}^{K} e^{z_{k}}}, \quad \text{where } z_{j} = w_{j}^{\top} x + b_{j}$

(4)

Let x represent the input feature vector, which may include word embeddings; y signifies the output class that indicates the intent; and K denotes the total number of potential intent classes. The parameters $w_{j}$ and $b_{j}$ refer to the weight vector and bias term associated with class j, respectively, while $z_{j}$ represents the logit, or the unprocessed score calculated prior to the application of the Softmax function for class j.
This process allows systems to discern user intentions—such as arranging a health appointment or retrieving an online course—facilitating smooth interactions. It plays a vital role in electronic services by accurately activating services in response to both voice and text commands.
Dialogue State Tracking (Context Vector Update). Dialogue state tracking is essential for preserving the context of conversations over time, employing models such as GRU or transformer.

$c_{t} = \text{GRU}(x_{t}, h_{t-1}) \quad \text{or} \quad h_{t} = \text{TransformerLayer}(x_{t})$

(5)

The variable $x_{t}$ signifies the input at time step t, while $h_{t - 1}$ indicates the hidden state from the preceding time step. The term $c_{t}$ refers to the dialogue context vector at the current moment. The gated recurrent unit (GRU) is a specific kind of recurrent neural network (RNN) that modifies the context in response to sequential inputs. The TransformerLayer is a model component that employs self-attention mechanisms to effectively capture long-range dependencies within the input sequence.
This process guarantees continuity in multi-turn dialogues, facilitating user navigation through health assessments or educational modules. Consequently, e-services can effectively manage intricate inquiries, comprehend follow-up questions, and provide seamless, human-like interactions via intelligent interfaces.
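As a minimal sketch of Equations (2) and (4), the following standard-library Python computes a TF-IDF weight and a softmax distribution. It assumes a raw-count term frequency and an unsmoothed logarithmic inverse document frequency; the toy corpus, the logits, and the function names `tf_idf` and `softmax` are illustrative and not taken from this study.

```python
import math
from collections import Counter

def tf_idf(term, doc, corpus):
    """TF-IDF of `term` in the tokenized document `doc` (Eq. 2).

    Assumes raw-count TF and IDF = log(N / document frequency).
    """
    tf = Counter(doc)[term]
    df = sum(1 for d in corpus if term in d)
    if df == 0:
        return 0.0  # term never occurs in the corpus
    return tf * math.log(len(corpus) / df)

def softmax(logits):
    """Softmax over a list of logits (Eq. 4), shifted by the max for stability."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical toy corpus of tokenized user messages.
corpus = [
    ["refund", "my", "order"],
    ["book", "a", "health", "appointment"],
    ["refund", "not", "received"],
]
score_refund = tf_idf("refund", corpus[0], corpus)  # term occurs in 2 of 3 documents
score_order = tf_idf("order", corpus[0], corpus)    # term occurs in 1 of 3 documents
probs = softmax([2.0, 1.0, 0.1])                    # hypothetical intent logits
```

In this toy corpus, the rarer term "order" receives a higher weight than "refund", which is the behavior the inverse document frequency is meant to produce.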

Ni et al. [40] suggest two deep learning models—BiGRU-Att-CapsuleNet and RCNN—for the combined tasks of intent detection and slot filling in IoT voice interactions. Their approach, which integrates BiGRU-CRF for slot filling, enhances semantic understanding, resulting in competitive performance across multilingual datasets, thus facilitating improved natural voice interfaces in IoT environments. In addition, Majewski and Kacalak [41] analyze the concept of intelligent speech interaction between devices and human operators, focusing on systems that improve communication between humans and machines. Their discussion includes the integration of speech recognition, natural language processing, and adaptive interfaces aimed at enhancing interaction efficiency, particularly in intelligent and industrial systems that demand immediate responses. Additionally, Niezen and Eslambolchilar [42] propose a control-theoretic model that characterizes human operators in the context of medical device interaction, merging ON–OFF control mechanisms with behavior-based hybrid automata. This model, when utilized in a syringe pump interface, replicates user behavior across continuous, discrete, and fine-tuning actions, closely mirroring the results obtained from empirical laboratory investigations.

3. Materials and Methods

The methodology involves the creation and execution of a hybrid system that integrates wireless sensor networks (WSN), the Internet of things (IoT), and artificial intelligence (AI). This system is structured around four primary components: data collection, data processing, model development, and system integration. The study uses Kaggle datasets. A link to the datasets is available in [43].

3.1. Dataset Overview

The dataset contains 93 entries and 7 columns. Text represents the NLP input (customer–agent chat or review). Inbound indicates whether the message was from a customer. Created_at refers to the timestamp of the message. Currently, there are no biosignal data present; these are likely to be simulated or planned for future inclusion. No explicit anomaly labels are available—these must be inferred using unsupervised methods.

3.2. Data Preprocessing Steps

The natural language processing text preprocessing steps are as follows: delete mentions, hashtags, and URLs; transform to lowercase, tokenize, and lemmatize; and classify emotions and sentiments using a pre-trained transformer model (for instance, distilbert-base-uncased with fine-tuning for emotional analysis).
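The cleaning steps listed above can be sketched with regular expressions from the Python standard library. This is an assumption-laden illustration: the patterns and the function name `preprocess` are hypothetical, and lemmatization and transformer-based emotion classification are omitted because they require external NLP libraries.

```python
import re

# Illustrative patterns; the exact rules used in the study are not specified.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_HASHTAG_RE = re.compile(r"[@#]\w+")
TOKEN_RE = re.compile(r"[a-z0-9']+")

def preprocess(text):
    """Remove URLs, mentions, and hashtags, then lowercase and tokenize."""
    text = URL_RE.sub(" ", text)              # URLs first, since they may contain '#'
    text = MENTION_HASHTAG_RE.sub(" ", text)  # @mentions and #hashtags
    text = text.lower()
    return TOKEN_RE.findall(text)

tokens = preprocess("@ShopHelp my REFUND is late! See https://example.com/order #angry")
```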

3.3. Natural Language Processing Model Components

This component detects emotional sentiment that correlates with user frustration or satisfaction with the online products. The emotion classifier, which is based on BERT, analyzes tokenized input text through a transformer encoder consisting of 12 layers. It identifies contextual relationships and subsequently feeds the output into a fully connected dense layer.

Table 5 shows the model components of the BERT-based emotion classifier. A softmax activation function generates a probability distribution across established emotion categories. This architecture facilitates precise emotion detection from textual data, thereby aiding downstream tasks such as anomaly detection when integrated with biosignal features like heart rate and temperature.

3.4. Simulated Biosignal Data

Given that real biosignal data are not present in the file, we engage in simulation. We align these with the inbound and outbound characteristics, noting that a higher heart rate or temperature is associated with angry or urgent messages, whereas lower levels are linked to joyful or informational tones. Such characteristics are advantageous in e-commerce for recognizing stress during service interactions.

Table 6 shows the biosignal feature characteristics. The heart rate and temperature are critical biosignal features that indicate emotional states. An increase in heart rate points to stress or frustration, while temperature rises in conjunction with the emotional intensity of communication tone, speed, contextual accuracy, fitting responses, and ongoing arguments. These signals enhance the detection of emotions in text by providing a physiological context for analyzing anomalies.
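One possible way to simulate such biosignals is to draw heart rate and temperature from emotion-dependent normal distributions. The sketch below is only an illustration of that idea: the baseline values, the standard deviations, the emotion labels, and the function name `simulate_biosignals` are hypothetical and are not the values in Table 6.

```python
import random

# Hypothetical baselines per emotion label: (heart rate in bpm, temperature in deg C).
BASELINES = {
    "angry": (95.0, 37.4),
    "urgent": (90.0, 37.2),
    "informational": (72.0, 36.7),
    "joyful": (68.0, 36.6),
}

def simulate_biosignals(emotion, rng):
    """Draw one simulated (heart_rate, temperature) pair for a predicted emotion."""
    hr_mean, temp_mean = BASELINES[emotion]
    heart_rate = rng.gauss(hr_mean, 3.0)     # assumed spread of 3 bpm
    temperature = rng.gauss(temp_mean, 0.1)  # assumed spread of 0.1 deg C
    return round(heart_rate, 1), round(temperature, 2)

rng = random.Random(42)  # fixed seed so the simulation is reproducible
samples = {e: [simulate_biosignals(e, rng) for _ in range(200)] for e in BASELINES}
mean_hr = {e: sum(hr for hr, _ in rows) / len(rows) for e, rows in samples.items()}
```

With these assumed baselines, the mean simulated heart rate for angry messages is higher than for joyful ones, mirroring the association described above.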

3.5. Comparative Overview of Anomaly Detection Models

The table illustrates a detailed comparison among three anomaly detection models: isolation forest, LOF, and a deep learning autoencoder.

Table 7 shows a comparative overview of anomaly detection models. The table highlights the primary parameters, values, and functions. Isolation forest isolates anomalies through random tree splits, LOF identifies anomalies by assessing local density, and the autoencoder reconstructs the input to detect deviations. It also includes information on architecture layers and training hyperparameters, offering a comprehensive overview for performance understanding and reproducibility.

3.6. System Architecture

This section presents a conceptual synthesis of wireless sensor networks (WSN), the Internet of things (IoT), natural language processing (NLP), and deep learning (DL) architecture. The concept encompasses a thorough description of essential technologies, including sensors, communication protocols, NLP models, and deep learning networks. Moreover, a framework is proposed to delineate the flow of data, tracing its journey from sensing to the decision-making process.

Figure 2 shows the e-service system architecture, which is organized into three fundamental layers:

Perception Layer: This layer is made up of WSN nodes, including devices like temperature sensors, motion detectors, and heart rate monitors, which are responsible for capturing environmental or physiological data. In e-commerce quality of service (QoS), these inputs act as simulations of biosignals that arise from the textual tone of customer exchanges with online agents. This capability allows for the detection of customer emotions and behaviors, empowering artificial intelligence (AI) and deep learning (DL) models to personalize services in real time, boost responsiveness, and develop emotionally intelligent, context-aware user experiences that contribute to increased satisfaction and trust.
Network Layer: This layer leverages IoT communication protocols, such as MQTT, HTTP, ZigBee, and Wi-Fi, to transmit the gathered data to a central processing unit or a cloud platform. Within the context of e-commerce quality of service (QoS), it facilitates a continuous and real-time flow of data that is based on the existing items in online stores and markets. This approach takes into account the current contextual discussions, availability of items, and prompt communication. This is vital for the processing of biosignals and environmental inputs concerning current items, rather than just passing on communications. This connectivity enhances AI-driven personalization, quick decision making, and flexible service delivery, which in turn elevates customer experience, decreases latency, and upholds system reliability and responsiveness in the digital commerce sector.
Application Layer: This layer processes incoming data through AI modules and generative NLP (GNLP), interpreting user language, tone, gestures, reactions, motives, and responses, while deep learning (DL) supports intelligent decision making. In the realm of e-commerce quality of service (QoS), it analyzes customer inquiries, emotional biosignals, and behavior patterns to generate precise, personalized responses. This enhances the quality of real-time services, reduces errors, and supports adaptive, emotion-sensitive interactions, resulting in improved customer engagement, satisfaction, and trust.

3.7. Experimental Analysis

The study uses five experimental functions with datasets from [43], consisting of: the perception function, which gathers environmental data through sensors; the network function, responsible for transmitting this sensor data via IoT communication channels; the NLP processing function, which interprets and derives meaning from textual inputs for customer versus agent; the integration and feedback function, which synthesizes insights and adjusts the system based on results; and the evaluation function, which assesses system performance using metrics such as accuracy and F1-score. Collectively, these functions establish a comprehensive and intelligent pipeline for processing and enhancing e-services.

This section explains the findings in five steps, following the service system flow in Algorithm 1: sensor, communication, NLP, integration, and evaluation.

Step 1: Isolation Forest Tree for Perception Layer

This layer conveys the general perception by isolating essential keywords from customer exchanges.

We compute word frequency:

$f(w_{i}) = \frac{\text{Count of word } w_{i}}{\text{Total words}}$

(6)

Here, $f(w_{i})$ indicates the frequency of word $w_{i}$. An elevated $f(w_{i})$ leads to an increased display size, which assists in the prompt identification of significant emotions and problems, thereby preparing for a more in-depth analysis.

Figure 3a shows the perception layer. A word cloud is formed from the conversations, where the most prevalent words are rendered in a larger format. This visual aid accentuates key complaint signals without the need for in-depth reading.
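The relative word frequency in Equation (6) can be sketched as follows. The whitespace tokenization, the example messages, and the function name `word_frequencies` are illustrative assumptions rather than the procedure used to produce Figure 3a.

```python
from collections import Counter

def word_frequencies(messages):
    """Relative frequency f(w) = count(w) / total words over all messages (Eq. 6)."""
    tokens = [tok for msg in messages for tok in msg.lower().split()]
    total = len(tokens)
    counts = Counter(tokens)
    return {word: count / total for word, count in counts.items()}

# Hypothetical customer messages.
messages = ["refund not received", "refund request for broken item"]
freqs = word_frequencies(messages)
top_word, top_freq = max(freqs.items(), key=lambda kv: kv[1])
```

A word cloud renderer would then scale each word in proportion to its frequency, so "refund" would appear largest in this example.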

Step 2: Isolation Forest Tree Network Layer for quality of service

The network layer illustrates the communication timeline and quality of service between the customer and the agent.

This can be modeled as:

$M(t) = \frac{dN}{dt}$

(7)

Here, $M(t)$ signifies the volume of messages at time t, and N indicates the cumulative count of messages up to time t, so that $M(t)$ is the message rate. The peaks and troughs in $M(t)$ reveal significant emotional moments or resolutions, thus presenting a fluid representation of the conversational dynamics.

Figure 3b shows the network layer quality of service. We generate a tree that depicts the frequency of messages exchanged versus quality of service over time for each participant, highlighting who dominates the dialogue and identifying the moments of escalation.
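In practice, the derivative in Equation (7) can be approximated by counting messages in fixed-width time bins. The following sketch assumes one-minute bins and uses hypothetical timestamps; the function name `messages_per_interval` is illustrative.

```python
from collections import Counter
from datetime import datetime

def messages_per_interval(timestamps, interval_seconds=60):
    """Count messages per fixed-width time bin, a discrete stand-in for M(t) = dN/dt."""
    if not timestamps:
        return []
    start = min(timestamps)
    bins = Counter(
        int((ts - start).total_seconds() // interval_seconds) for ts in timestamps
    )
    # Include empty bins so that quiet periods show up as zeros.
    return [bins.get(i, 0) for i in range(max(bins) + 1)]

# Hypothetical message timestamps (the created_at column would play this role).
stamps = [datetime(2025, 1, 1, 10, 0, s) for s in (0, 10, 50)]
stamps.append(datetime(2025, 1, 1, 10, 2, 5))
rate = messages_per_interval(stamps)
```

Running the same function separately on inbound and outbound messages would give a per-participant view of the conversation.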

Step 3: Key Terms and Applicable Layer for Quality Services Issues

This layer classifies and quantifies the concerns expressed by the customer. We systematically label significant complaint categories (e.g., “Refund Request”, “App Bugs”) and represent them in the visualization:

$P(I_{k}) = \frac{C_{k}}{\sum_{i=1}^{n} C_{i}}$

(8)

In this framework, $C_{k}$ refers to the tally for issue k, and n is the aggregate number of issue categories.

Figure 3c shows the application layer. The visualization serves to identify which domains, such as product failures and trust issues, require immediate remediation based on the volume of mentions, thus supporting the implementation of targeted corrective actions.

Step 4: Integration Flowchart and Decision Layer

This layer depicts decision making in the form of a flowchart, demonstrating the manner in which the service agent identifies key issues; suggests resolutions (including refunds or replacements); and addresses escalations:

$P(s' \mid s, a)$

(9)

Here, we define s as the present service state, a as the action performed (for instance, issuing a refund), and $s'$ as the resulting state.

Figure 3d shows the integration and decision layer. The flowchart design aids in formulating explicit protocols that boost service efficiency and ensure consistent handling of customer complaints.
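If logged interactions are available as (state, action, next state) triples, the transition probabilities in Equation (9) could be estimated by simple counting. The sketch below assumes such a log exists; the state and action names and the function name `estimate_transitions` are hypothetical.

```python
from collections import Counter, defaultdict

def estimate_transitions(log):
    """Estimate P(s' | s, a) as relative frequencies over (s, a, s') triples."""
    counts = defaultdict(Counter)
    for state, action, next_state in log:
        counts[(state, action)][next_state] += 1
    return {
        key: {s2: c / sum(nxt.values()) for s2, c in nxt.items()}
        for key, nxt in counts.items()
    }

# Hypothetical interaction log.
log = [
    ("complaint", "refund", "resolved"),
    ("complaint", "refund", "resolved"),
    ("complaint", "refund", "escalated"),
    ("complaint", "replace", "resolved"),
]
P = estimate_transitions(log)
```

Here, issuing a refund in the complaint state leads to resolution in two of three logged cases, which is the kind of evidence a decision protocol could draw on.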

Step 5: Evaluation Layers for Quality Service Issues Confusion Matrix

The evaluation layer analyzes the accuracy of service results (refunds, complaints, escalations) by applying a confusion matrix and a classification report, which encompasses precision, recall, and F1-score metrics.

Precision:

$\text{Precision} = \frac{TP}{TP + FP}$

(10)

Recall:

$\text{Recall} = \frac{TP}{TP + FN}$

(11)

F1-score:

$F1 = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$

(12)

Here, TP is defined as true positive, FP as false positive, and FN as false negative.

Figure 3e shows the evaluation layer confusion matrix. This process verifies if the service team has effectively understood and classified the customer’s needs, leading to advancements in service quality.
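Equations (10)–(12) translate directly into code. The counts below are hypothetical and serve only to illustrate the arithmetic; zero denominators are mapped to 0.0 by assumption.

```python
def precision_recall_f1(tp, fp, fn):
    """Precision, recall, and F1 from confusion-matrix counts (Eqs. 10-12)."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1

# Hypothetical counts for one class.
p, r, f1 = precision_recall_f1(tp=8, fp=2, fn=4)
```

With these counts, precision is 0.8, recall is about 0.667, and the F1-score is about 0.727.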

4. Further Analysis

4.1. Isolation Forest Algorithm

The algorithm represents a service system by emulating sensitive data from text tone such as heart rate, temperature, and motion for users. It constructs a feature matrix, identifies anomalies through the isolation forest trees, and classifies user comments utilizing emotion prediction based on natural language processing. The outputs from the sensor data and NLP are combined to form a comprehensive dataset. Subsequently, statistical analyses, including kernel density estimation and the calculation of anomaly rates, are conducted. The performance is ultimately assessed using metrics such as accuracy, precision, recall, and F1-score, comparing predicted outcomes with actual results.

The isolation forest algorithm for the e-service system flow is optimized in seven steps. It shows the steps applied to the sensor data and the NLP outputs, which are merged to produce the final results of this study.

4.2. Isolation Forest Algorithm Application

This section presents the isolation forest algorithm analysis for the e-service system flow, including the sensor, communication, NLP, integration, and evaluation.

Body Temperature Distribution by Sentence

The presentation here depicts the distribution of body temperatures among users for each sentence. By differentiating temperatures based on user text responses, we determine and underscore the normal and abnormal ranges.

Figure 4a shows the body temperature distribution by sentence. This graphical representation assists in determining if particular emotional or situational contexts, articulated through sentences, are linked to irregular body temperature changes, thus promoting the early detection of health risks in electronic service environments.

Heart Rate Distribution by Sentence

This segment outlines the heart rate distributions that correlate with individual user sentences. By revealing how heart rates change in response to different emotional or situational cues, it allows for the swift detection of stress, anxiety, or relaxation trends.

Figure 4b shows heart rate distribution by sentence. This enables e-service systems to adapt their responses dynamically to users’ physiological states, ensuring a personalized and contextually relevant service experience.

Motion Activity by Sentence

This paragraph illustrates the frequency of motion activity, whether moving or stationary, associated with each user sentence.

Figure 4c shows motion activity by sentence. It reveals that by examining the balance between users’ physical activity and their emotional expressions, the e-service system enhances its contextual understanding—such as differentiating between the need for physical rest and emotional fatigue. This facilitates more informed and compassionate service decisions. The analysis of motion activity by sentence, utilizing NLP, AI, and DL, identifies variations in text tone associated with QoS dimensions such as responsiveness and assurance. Abrupt changes in tone (for instance, from calm to urgent) indicate user frustration or a sense of urgency, which aids in detecting service anomalies and improving real-time monitoring of customer experiences on e-commerce platforms.

NLP Confidence Scores per Sentence

This illustration demonstrates the NLP model’s confidence in categorizing the sentiment of each user statement. Predictions with high confidence contribute to the system’s reliability, whereas predictions with low confidence may necessitate fallback measures such as human intervention.

Figure 4d shows the NLP confidence scores per sentence. By visualizing confidence levels, service providers can evaluate the model’s trustworthiness in real time and modify their actions according to the degree of certainty.

NLP confidence scores for each sentence assist in evaluating the dependability of customer sentiment associated with QoS dimensions such as quality, responsiveness, and security, thereby facilitating accurate anomaly detection and enhancing service performance in e-commerce settings.
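A confidence-based fallback of the kind described above might be sketched as a simple threshold rule. The threshold of 0.7, the routing labels, and the function name `route_prediction` are assumptions for illustration; the study does not specify them.

```python
def route_prediction(label, confidence, threshold=0.7):
    """Accept confident predictions automatically; send the rest to a human.

    The threshold value is a hypothetical choice, not one reported in the study.
    """
    if confidence >= threshold:
        return ("auto", label)
    return ("human_review", label)

# Hypothetical (label, confidence) pairs from a sentiment model.
predictions = [("negative", 0.93), ("positive", 0.55)]
decisions = [route_prediction(label, conf) for label, conf in predictions]
```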

Sentiment by Sentence

This visualization classifies each sentence as either positive (+1) or negative (−1) in terms of sentiment. Representing emotions in this binary manner offers a rapid assessment of the emotional dynamics present in user communications.

Figure 4e shows sentiment by sentence. By promptly recognizing negative sentiments, the e-service system can focus on critical or urgent responses, thereby improving user safety and satisfaction.

Heart Rate vs. Temperature by Sentence

This plot illustrates the correlation between heart rate and body temperature for each individual user input. By employing distinct colors for each cluster based on the input sentences, it elucidates the influence of various emotional or situational contexts on these two critical physiological indicators.

Figure 4f shows the heart rate vs. temperature scatter plot colored by sentence. The identification of patterns or anomalies within this plot can facilitate the detection of irregularities and initiate tailored health or safety interventions.

Anomalies per Sentence

This section emphasizes the number of anomalies, defined as atypical sensor readings, identified within each sentence. It establishes a link between emotional or situational user expressions and physiological irregularities, offering essential insights for preemptive measures.

Figure 4g shows the anomalies per sentence. High levels of anomaly occurrences for a given sentence can suggest urgent necessities, allowing e-service systems to allocate immediate attention to users who may be in critical situations.

Training Data and Testing Data

The ‘heart rate vs. temperature’ training data visualization presents the distribution of sensor values, such as heart rate and body temperature, for each sentence, thereby assisting the isolation forest model in grasping normal patterns. The testing data visualization, ‘anomalies detected per sentence’, leverages this trained model to identify abnormal readings.

Figure 5A,B shows the training data and testing data. These anomalies illustrate how various emotional or environmental contexts, represented in user sentences, stray from established norms, signaling possible distress or irregularities in user behavior or physiological responses.

Classification Report

The classification report presents a thorough evaluation of a model’s performance, detailing precision, recall, F1-score, and support for each class (complaint, escalation, refund). Precision assesses the accuracy of positive predictions, while recall gauges the model’s effectiveness in identifying all relevant instances. The F1-score is defined as the harmonic mean of precision and recall.

Table 8 shows the classification report. Overall accuracy reflects the model’s correctness. The macro and weighted averages summarize the model’s performance across all classes, considering their respective support.

In this classification report, “support” refers to the actual number of samples corresponding to each keyword in accordance with our quality of service (QoS) for e-commerce, “Scenario: QoS, Predicted Tone, and Heartbeat Model Data”, illustrated in Table 2 and Table 3 for each class keyword (complaint: 2, escalation: 1, refund: 3). This support is not indicative of our dataset sample size. This metric is vital for understanding precision, recall, and F1-score, as it reveals the number of examples that each metric relies upon. A low support figure (e.g., escalation: 1) can result in metrics that are not stable. Additionally, support has an impact on both macro and weighted averages, which in turn influences the fairness and reliability of model evaluations across different class distributions.
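The effect of support on the two averages can be illustrated with the support counts stated above (complaint: 2, escalation: 1, refund: 3). The per-class F1 values in this sketch are hypothetical and are not the values in Table 8.

```python
def macro_and_weighted(scores, support):
    """Macro average (unweighted mean) and support-weighted average of per-class scores."""
    macro = sum(scores.values()) / len(scores)
    total = sum(support.values())
    weighted = sum(scores[c] * support[c] for c in scores) / total
    return macro, weighted

support = {"complaint": 2, "escalation": 1, "refund": 3}          # from the text
f1_scores = {"complaint": 1.0, "escalation": 0.0, "refund": 0.8}  # hypothetical
macro, weighted = macro_and_weighted(f1_scores, support)
```

In this example, a single misclassified escalation sample drives that class to zero, pulling the macro average down to 0.60, while the weighted average stays near 0.73 because escalation contributes only one of six samples.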

5. Isolation Forest Algorithm (IFA) vs. Isolation Forest Algorithm Trees (IFAT)

This section covers: isolation forest algorithm (IFA), isolation forest algorithm (IFA) training and testing, isolation forest algorithm trees (IFAT), isolation forest algorithm trees (IFAT) matrix, perception component matrix (PCM) training and testing, differences in IFA and IFAT for QoS-based anomaly detection, and dine metaphor.

In this section, we present the mathematical formulation of the isolation forest algorithm (IFA). It is then used to identify the differences that exist with the isolation forest algorithm tree (IFAT).

5.1. Isolation Forest Algorithm (IFA)

The isolation forest algorithm (IFA) is a method of unsupervised machine learning employed for anomaly detection. It is based on the premise that anomalies are rare and distinct, making them simpler to separate from the remaining data.

1. Standardization (Z-score Normalization)

$z_{i} = \frac{x_{i} - μ}{σ}$

(13)

where:

$z_{i}$ is the standardized value;
$μ$ is the mean of the feature;
$σ$ is the standard deviation of the feature.
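Equation (13) can be sketched with the Python standard library. The population standard deviation is used here as an assumption, since the text does not state which estimator was applied; the comment lengths are hypothetical.

```python
import statistics

def standardize(values):
    """Z-score normalization (Eq. 13) using the population standard deviation."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]  # constant feature carries no information
    return [(x - mu) / sigma for x in values]

# Hypothetical comment lengths with one unusually long comment.
comment_lengths = [12, 15, 11, 14, 90]
z = standardize(comment_lengths)
```

After standardization the feature has zero mean and unit variance, and the unusually long comment stands out as the largest z-score.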

2. Isolation Forest Anomaly Score

$s(x, n) = 2^{-\frac{E(h(x))}{c(n)}}$

(14)

$c(n) = 2H(n - 1) - \frac{2(n - 1)}{n}, \quad H(i) \approx \ln(i) + γ$

$\text{predict}(x) = \begin{cases} -1 & \text{if } s(x, n) \geq τ \\ 1 & \text{otherwise} \end{cases}$

where:

$E (h (x))$ is the expected path length of instance x;
$c (n)$ is the average path length in a binary search tree;
$γ \approx 0.577$ is the Euler–Mascheroni constant.
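As a worked illustration of the score, assume a sample size of n = 256 and two hypothetical expected path lengths (these numbers are illustrative and are not results from this study):

$c(256) \approx 2(\ln 255 + 0.5772) - \frac{2 \cdot 255}{256} \approx 12.237 - 1.992 \approx 10.245$

$E(h(x)) = 4: \quad s(x, 256) = 2^{-4 / 10.245} \approx 0.76$

$E(h(x)) = 12: \quad s(x, 256) = 2^{-12 / 10.245} \approx 0.44$

A point isolated after about four splits therefore scores well above 0.5 and would be flagged for a threshold such as τ = 0.6, whereas a point that needs about twelve splits scores below 0.5 and is treated as normal. A point with $E(h(x)) = c(n)$ scores exactly 0.5.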

3. Principal Component Analysis (PCA)

$Σ = \frac{1}{n - 1} X^{⊤} X$

(15)

$Σ v = λ v$

$X_{PCA} = X \cdot W_{k}$

where:

$Σ$ is the covariance matrix of the standardized data;
$v$ and $λ$ are the eigenvectors and eigenvalues of $Σ$ ;
$W_{k}$ is the matrix of the top k eigenvectors (here, $k = 2$ );
$X_{PCA}$ is the projection of data onto the first two principal components.
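As a small worked example, consider two standardized features with an assumed correlation of 0.8 (an illustrative value, not one measured in this study):

$Σ = \begin{pmatrix} 1 & 0.8 \\ 0.8 & 1 \end{pmatrix}, \quad λ_{1} = 1.8, \; v_{1} = \frac{1}{\sqrt{2}} (1, 1)^{⊤}, \quad λ_{2} = 0.2, \; v_{2} = \frac{1}{\sqrt{2}} (1, -1)^{⊤}$

The first principal component explains $λ_{1} / (λ_{1} + λ_{2}) = 1.8 / 2.0 = 90\%$ of the variance. A standardized sample $x = (1.0, 0.5)$ is projected to $(x \cdot v_{1}, x \cdot v_{2}) \approx (1.061, 0.354)$.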

Within e-commerce services, “standardization” ensures that attributes like comment length and access time are calibrated for fair comparison. “Isolation forest” uncovers anomalies in user behavior or system interactions, signaling possible quality or security challenges that could influence QoS. “PCA” decreases dimensionality for visualization, allowing for a clearer understanding of complex service patterns. Together, these processes bolster quality of service (QoS) by recognizing outliers, refining monitoring, and facilitating timely interventions to improve customer experience and operational efficiency.

This section helps in recognizing atypical pattern outliers in user comments, based on their content, tone, responses, and metadata, thereby facilitating a more profound analysis of behavioral trends or data quality concerns.

Figure 6 shows the isolation forest algorithm anomaly detection. The figure illustrates a two-dimensional PCA projection of high-dimensional features, including UMAP embeddings, comment length, and hour, to reveal patterns identified by the isolation forest model. Each point signifies a data sample, with colors indicating whether it is classified as normal (blue) or an anomaly (red).

Isolation Forest Algorithm (IFA) Training and Testing

The IFA constructs several binary decision trees (referred to as isolation trees). Each tree systematically divides the data by randomly picking a feature and subsequently selecting a random split value within the range of that feature. This procedure persists until either the data point is isolated or the maximum depth of the tree is attained. The length of the path from the root to a leaf is utilized to assess anomaly scores: a short path indicates a higher likelihood of being an anomaly, and a long path suggests a greater probability of being normal.
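The procedure described above can be sketched in pure Python. This is a simplified illustration under stated assumptions, not the implementation used in this study: the tree count, sample size, depth limit, synthetic data, and all function names are hypothetical choices.

```python
import math
import random

EULER_GAMMA = 0.5772156649

def c(n):
    """Average path length of an unsuccessful binary search tree lookup."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n

def build_tree(rows, depth, max_depth, rng):
    """Recursively split on a random feature at a random value within its range."""
    if depth >= max_depth or len(rows) <= 1:
        return {"size": len(rows)}
    feature = rng.randrange(len(rows[0]))
    lo = min(r[feature] for r in rows)
    hi = max(r[feature] for r in rows)
    if lo == hi:  # simplification: stop if the chosen feature is constant
        return {"size": len(rows)}
    split = rng.uniform(lo, hi)
    left = [r for r in rows if r[feature] < split]
    right = [r for r in rows if r[feature] >= split]
    return {"feature": feature, "split": split,
            "left": build_tree(left, depth + 1, max_depth, rng),
            "right": build_tree(right, depth + 1, max_depth, rng)}

def path_length(x, node, depth=0):
    """Depth at which x lands in a leaf, plus c(size) for unresolved leaves."""
    if "size" in node:
        return depth + c(node["size"])
    child = node["left"] if x[node["feature"]] < node["split"] else node["right"]
    return path_length(x, child, depth + 1)

def anomaly_scores(data, n_trees=100, sample_size=256, seed=0):
    """Score s = 2^(-E[h(x)] / c(n)); values near 1 suggest anomalies."""
    if len(data) < 2:
        raise ValueError("need at least two rows")
    rng = random.Random(seed)
    sample_size = min(sample_size, len(data))
    max_depth = math.ceil(math.log2(sample_size))
    trees = [build_tree(rng.sample(data, sample_size), 0, max_depth, rng)
             for _ in range(n_trees)]
    norm = c(sample_size)
    return [2 ** (-sum(path_length(x, t) for t in trees) / (n_trees * norm))
            for x in data]

# Hypothetical data: (heart rate, temperature) pairs plus one extreme reading.
data_rng = random.Random(1)
data = [(data_rng.gauss(70, 3), data_rng.gauss(36.6, 0.1)) for _ in range(200)]
data.append((140.0, 39.5))
scores = anomaly_scores(data)
```

In this synthetic example, the extreme reading is isolated after very few splits and therefore receives the highest score in the data set.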

Figure 7 shows the isolation forest algorithm (IFA). The isolation forest algorithm (IFA) showcases its capability in anomaly detection by effectively separating normal data points from anomalous ones across training and test datasets.

Employing only two features, it vividly presents green clusters for normal instances and red points for outliers. This graphical representation bolsters interpretability, enabling users to grasp how IFA identifies possible anomalies in customer feedback or service quality data, which supports QoS monitoring and improvement efforts.

5.2. Isolation Forest Algorithm Trees (IFAT)

Isolation Forest Algorithm Trees: These are ensembles of lightweight decision trees constructed to model and visualize anomaly detection across various quality of service (QoS) dimensions, using customer feedback (e.g., complaint, context keywords, reviews) as features.

1. Synthetic Binary Data Generation

Each element $x_{ij} \in \{0, 1\}$ is generated from a Bernoulli distribution:

$x_{ij} \sim \text{Bernoulli}(p), \quad \text{typically } p = 0.5$

(16)

Class labels $y_{i} \in \{0, 1\}$ (e.g., normal or anomalous) are also drawn from a Bernoulli distribution:

$y_{i} \sim \text{Bernoulli}(0.5)$

(17)
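Equations (16) and (17) can be sketched with the standard library as follows. The sample count of 100 and the seed are hypothetical; 18 columns are used only because 18 feedback keywords are mentioned in this study.

```python
import random

def generate_binary_dataset(n_samples, n_keywords, p=0.5, seed=0):
    """Draw keyword indicators x_ij ~ Bernoulli(p) and labels y_i ~ Bernoulli(0.5)."""
    rng = random.Random(seed)
    X = [[1 if rng.random() < p else 0 for _ in range(n_keywords)]
         for _ in range(n_samples)]
    y = [1 if rng.random() < 0.5 else 0 for _ in range(n_samples)]
    return X, y

# Each row simulates the presence (1) or absence (0) of each keyword in one comment.
X, y = generate_binary_dataset(n_samples=100, n_keywords=18)
```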

2. Decision Tree Splitting Criterion (Gini Impurity)

Given a node t, the Gini impurity is defined as:

$G(t) = 1 - \sum_{i=1}^{C} p_{i}^{2}$

(18)

where $p_{i}$ is the proportion of class i instances at node t, and C is the number of classes.

To find the best split, the algorithm minimizes the weighted sum of the Gini impurity of the child nodes:

$\text{Split Criterion} = \frac{N_{left}}{N} G(t_{left}) + \frac{N_{right}}{N} G(t_{right})$

(19)

where:

N: total samples at node t;
$N_{left}$ , $N_{right}$ : samples in the left and right child nodes.
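Equations (18) and (19) can be sketched for binary keyword features as follows. The tiny data set and the function names are hypothetical and serve only to show how the split with the lowest weighted impurity is selected.

```python
from collections import Counter

def gini(labels):
    """Gini impurity G(t) = 1 - sum_i p_i^2 for the labels at a node (Eq. 18)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def split_criterion(left_labels, right_labels):
    """Weighted Gini impurity of the two child nodes (Eq. 19)."""
    n = len(left_labels) + len(right_labels)
    return (len(left_labels) / n) * gini(left_labels) \
        + (len(right_labels) / n) * gini(right_labels)

def best_binary_feature(X, y):
    """Return (feature index, score) of the binary feature with the lowest criterion."""
    best = None
    for j in range(len(X[0])):
        left = [label for row, label in zip(X, y) if row[j] == 0]
        right = [label for row, label in zip(X, y) if row[j] == 1]
        score = split_criterion(left, right)
        if best is None or score < best[1]:
            best = (j, score)
    return best

# Hypothetical data: feature 0 separates the classes perfectly, feature 1 does not.
X = [[1, 0], [1, 1], [0, 0], [0, 1]]
y = [1, 1, 0, 0]
feature, score = best_binary_feature(X, y)
```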

3. Information Gain (Alternative Criterion)

As an alternative to Gini impurity, the information gain (IG) from splitting on attribute A is:

$IG(D, A) = H(D) - \sum_{v \in \text{values}(A)} \frac{|D_{v}|}{|D|} H(D_{v})$

(20)

where the entropy $H(D)$ is defined as:

$H(D) = -\sum_{i=1}^{C} p_{i} \log_{2}(p_{i})$

with $p_{i}$ being the proportion of class i in dataset D.
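Equation (20) and the entropy definition can be sketched as follows. The labels and attribute values are hypothetical; one attribute is fully informative and the other carries no information about the class.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy H(D) in bits for a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(values, labels):
    """IG(D, A): entropy of D minus the weighted entropy of the subsets D_v (Eq. 20)."""
    n = len(labels)
    groups = {}
    for v, label in zip(values, labels):
        groups.setdefault(v, []).append(label)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

labels = [1, 1, 0, 0]
ig_informative = information_gain([1, 1, 0, 0], labels)    # attribute matches the class
ig_uninformative = information_gain([1, 0, 1, 0], labels)  # attribute unrelated to class
```

The informative attribute yields a gain of 1 bit, while the unrelated attribute yields a gain of 0.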

The three steps are centered on the classification of e-commerce user feedback based on perceptions of quality of service (QoS) through IFAT. First, synthetic binary data are generated to simulate the presence or absence of QoS-related keywords in customer feedback. Second, decision trees are trained using subsets of these keywords to capture specific QoS dimensions, including quality, security, and loyalty. Each tree divides the data according to the Gini impurity criterion, identifying essential indicators of anomalies. Lastly, the decision boundaries aid in visualizing how specific perceptions (for example, “refund” or “defective”) influence classifications, providing a foundation for anomaly detection in the analysis of e-commerce services.

Isolation Forest Algorithm Trees (IFAT)

We present six decision trees, each illustrating a distinct quality of service (QoS) dimension: quality, responsiveness, availability, security, assurance, and loyalty.

Figure 8 represents the isolation forest algorithm trees (IFAT). These trees evaluate customer perceptions based on 18 keywords derived from feedback, including terms like “refund”, “trust”, and “malfunction”. Each keyword is systematically color-coded with a six-color palette that is consistently applied across all keywords, thereby improving interpretability. The trees, constructed using a decision tree classifier, differentiate between normal and anomalous sentiments. A consolidated legend at the bottom elucidates the keyword–color associations. This visualization facilitates an understanding of how various QoS factors affect perceived service anomalies in user feedback.

5.2.1. Isolation Forest Algorithm Trees (IFAT) Matrix

The matrix illustrates how certain terms (like “refund”, “malfunction”, and “trust”) are connected to service dimensions such as quality, responsiveness, and security. This mapping supports the detection of anomalies and the analysis of services by indicating which feedback areas are pertinent to critical QoS matters.

Figure 9 shows the isolation forest algorithm trees (IFAT) matrix for e-commerce services. The perception component matrix (PCM) analyzes customer feedback in relation to the quality of service (QoS) dimensions and is designed for use with the isolation forest algorithm trees (IFAT) in e-commerce services.
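A keyword-to-dimension matrix of this kind can be represented as a simple binary lookup table. The sketch below is illustrative only: it uses the keyword groups listed in Table 2 for four of the QoS dimensions, while the function names, the tokenization rule, and the example sentence are assumptions made for this example.

```python
import re

# Keyword groups as listed in Table 2 (four of the QoS dimensions).
QOS_KEYWORDS = {
    "quality": ["quality", "complaint", "defective", "malfunction"],
    "responsiveness": ["refund", "request", "app"],
    "availability": ["bug", "issue", "damage"],
    "security": ["uncertainty", "trust"],
}


def build_matrix(qos_keywords):
    """Return the dimension order and a keyword -> 0/1 indicator row mapping."""
    dimensions = list(qos_keywords)
    keywords = sorted({kw for kws in qos_keywords.values() for kw in kws})
    matrix = {
        kw: [int(kw in qos_keywords[d]) for d in dimensions] for kw in keywords
    }
    return dimensions, matrix


def relevant_dimensions(feedback, dimensions, matrix):
    """List the QoS dimensions touched by the keywords found in a feedback text."""
    tokens = set(re.findall(r"[a-z]+", feedback.lower()))
    hits = [0] * len(dimensions)
    for kw in tokens & set(matrix):
        hits = [h + m for h, m in zip(hits, matrix[kw])]
    return [d for d, h in zip(dimensions, hits) if h > 0]


dimensions, matrix = build_matrix(QOS_KEYWORDS)
# Hypothetical feedback sentence.
flagged = relevant_dimensions("I want a refund, the app has a bug", dimensions, matrix)
print(flagged)
```

Each row of the matrix marks the dimensions a keyword belongs to, so a feedback text can be routed to the relevant QoS dimensions before any anomaly scoring takes place.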

5.2.2. Perception Component Matrix (PCM) Training and Testing

This graphic illustrates a perception component matrix (PCM) that incorporates training and testing data derived from customer feedback keywords and their relevance to the quality of service (QoS) dimensions.

Figure 10 shows the perception component matrix (PCM) training and testing. It classifies each feedback instance as either normal (green) or anomalous (red) according to the outputs generated by the isolation forest technique. The matrix assists in identifying how perception keywords (like “bug”, “trust”, and “complaint”) influence anomaly detection in services, thereby supporting enhanced interpretability and quality monitoring through visualized associations of keywords.

6. Discussion

This section highlights the critical components of deep learning and signal processing that are employed in intelligent e-services.

Our findings, analyses, and implementations indicate that considering quality of service (QoS) in e-commerce is crucial. By integrating interactions between customers and agents through natural language processing (NLP), deep learning (DL), biosignal simulations, and artificial intelligence (AI), we observe a significant enhancement in e-commerce QoS. NLP allows agents and chatbots to comprehend and respond to customer inquiries in a natural manner. DL examines interaction patterns to refine service responses. Biosignal simulations identify customer emotions, facilitating empathy-driven engagement. AI coordinates these elements, providing real-time personalized assistance, minimizing response times, and improving customer satisfaction. This collaboration guarantees seamless, intelligent, and emotionally attuned communication between customers and agents, fostering loyalty and service excellence in digital commerce settings.

Moreover, the application of quality of service (QoS) metrics such as responsiveness, availability, and security, correlated with emotional responses, emphasized the importance of context-aware service recommendations in enhancing user satisfaction. This interdisciplinary approach, combining IoT, NLP, and DL, not only improves the quality of service delivery but also fosters a more personalized, efficient, and adaptive e-service environment.

Figure 11 illustrates customer vs. agent interactions. The implementation of isolation forest markedly boosts the accuracy, reliability, and operational efficiency of the e-service system flow, rendering it more resilient for immediate service delivery.

In real-time e-commerce quality of service (QoS), natural language processing (NLP) rapidly processes customer inquiries, deep learning (DL) predicts needs and behaviors, biosignal simulations assess emotional states for personalized responses, and artificial intelligence (AI) manages dynamic decision making. We introduce isolation forest algorithm trees (IFAT), which organize each evaluation and structured response into tree formats. IFAT provides fast, personalized, and emotionally aware service, enhancing responsiveness, reducing wait times, and increasing user satisfaction and trust in online transactions.

Importance of Isolation Forest Algorithm for E-Service System Flow (Optimized Seven Steps)

The role of the isolation forest algorithm in the e-service system flow is significant, as it adeptly uncovers unusual patterns in sensor data, which is vital for guaranteeing accurate service provision.
(1) Anomaly Detection: The isolation forest algorithm is particularly effective in identifying anomalies within sensor data, allowing for the rapid detection and management of erroneous or outlier data such as heart rate, temperature, and motion, thereby preserving the integrity of the system.
(2) Efficient Processing: It functions effectively with high-dimensional data, rendering it appropriate for extensive IoT sensor networks in electronic services, facilitating real-time identification to avert failures or disruptions.
(3) Integration with Other Data: The algorithm harmoniously combines with NLP-oriented user comment analysis, augmenting decision making by discarding outliers that might compromise the accuracy of results.
(4) Real-Time Monitoring: The isolation forest method supports immediate anomaly detection, an essential feature for dynamic e-service environments like healthcare and smart homes, where a rapid response is of utmost importance.
(5) Improved Decision Making: Precise identification of anomalies guarantees that decisions are founded on authentic user and sensor data, thereby enhancing personalized service provisions.
(6) Enhanced Service Reliability: By identifying atypical readings, the algorithm enhances the system’s reliability, thereby facilitating improved performance assessment and greater customer satisfaction.
(7) Optimized Resource Allocation: Identifying anomalies enhances resource utilization by directing focus and services towards legitimate data, thereby minimizing waste and increasing operational efficiency.
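To make the anomaly detection step concrete, the following sketch implements a small isolation forest in plain Python, following the general scheme of Liu et al. [28]: points are isolated by random axis-aligned splits, and the anomaly score is $s(x) = 2^{-E[h(x)] / c(n)}$, where $E[h(x)]$ is the mean path length and $c(n)$ is a normalizing constant. It is a simplified illustration rather than the implementation used in this study. Sub-sampling is omitted because the dataset is tiny, and the simulated heart rate and temperature readings, the injected outlier, and the seeds are assumptions.

```python
import math
import random


def c(n):
    """Normalizer: average path length of an unsuccessful BST search over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    harmonic = math.log(n - 1) + 0.5772156649  # approximation of H(n - 1)
    return 2.0 * harmonic - 2.0 * (n - 1) / n


def build_tree(points, depth, max_depth, rng):
    """Grow one isolation tree using random features and random split values."""
    if depth >= max_depth or len(points) <= 1:
        return {"size": len(points)}
    dim = rng.randrange(len(points[0]))
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    if lo == hi:  # simplification: stop if the chosen feature is constant
        return {"size": len(points)}
    split = rng.uniform(lo, hi)
    left = [p for p in points if p[dim] < split]
    right = [p for p in points if p[dim] >= split]
    return {
        "dim": dim,
        "split": split,
        "left": build_tree(left, depth + 1, max_depth, rng),
        "right": build_tree(right, depth + 1, max_depth, rng),
    }


def path_length(point, node, depth=0):
    """Depth at which the point reaches a leaf, adjusted by c(leaf size)."""
    if "split" not in node:
        return depth + c(node["size"])
    child = node["left"] if point[node["dim"]] < node["split"] else node["right"]
    return path_length(point, child, depth + 1)


def anomaly_scores(points, n_trees=100, seed=42):
    """Score each point in (0, 1); higher means easier to isolate."""
    rng = random.Random(seed)
    n = len(points)
    max_depth = math.ceil(math.log2(n))
    trees = [build_tree(points, 0, max_depth, rng) for _ in range(n_trees)]
    scores = []
    for p in points:
        mean_depth = sum(path_length(p, t) for t in trees) / n_trees
        scores.append(2.0 ** (-mean_depth / c(n)))
    return scores


# Simulated readings (assumption): (heart rate in bpm, temperature in deg C).
data_rng = random.Random(0)
readings = [
    (data_rng.uniform(70, 90), data_rng.uniform(36.5, 37.5)) for _ in range(63)
]
readings.append((180.0, 40.5))  # injected outlier, appended last

scores = anomaly_scores(readings)
most_anomalous = max(range(len(readings)), key=scores.__getitem__)
print(most_anomalous, round(scores[most_anomalous], 3))
```

The injected reading lies far from the others in both features, so most random splits separate it within one or two levels, which gives it the shortest mean path length and hence the highest score. A production system would typically rely on a maintained library implementation with sub-sampling rather than on this sketch.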

In general, the integration of NLP and deep learning into IoT communications holds the potential to revolutionize e-services, providing enhanced real-time decision-making capabilities, personalized support, and a deeper understanding of customer behavior. These technologies offer a pathway to building intelligent, context-aware systems capable of meeting the evolving needs of users and improving overall service performance. This advancement in e-service platforms will contribute to higher user satisfaction, operational efficiency, and the development of next-generation smart services.

Leveni et al. [44] present the online isolation forest, an efficient approach for anomaly detection in data streams. By incrementally adapting isolation trees, it allows for real-time detection with reduced memory and computational demands, and it achieves competitive results across diverse streaming datasets, thereby improving upon traditional isolation-based methods. Geng et al. [45] present a refined isolation forest algorithm for unsupervised anomaly detection in LiDAR SLAM localization. Their method improves localization accuracy by pinpointing anomalies in LiDAR data, which addresses the difficulties encountered in dynamic environments and improves the resilience of autonomous vehicle navigation systems. Zhang et al. [46] present a fuzz testing method for smart grid terminals based on the isolation forest algorithm. This method improves the detection of abnormal behaviors in terminal communication, thus bolstering the security and reliability of smart grid systems via effective and automated vulnerability assessment. Herreros-Martínez et al. [47] propose a hybrid method for detecting anomalies in enterprise purchasing processes, which integrates clustering techniques with the isolation forest algorithm. This method identifies atypical transactions by combining pattern clustering with isolation strategies, thereby improving detection precision, aiding fraud prevention, and optimizing business operations. Kaššaj and Peráček [48] examine the integration of mobile roaming, WiFi4EU, and smart city concepts aimed at advancing sustainable connectivity in the European Union, which promotes digital inclusion, efficient infrastructure, and better citizen services within a unified technological framework. According to [49], in the context of e-commerce and quality of service (QoS), these innovations improve personalization, accuracy of responses, and user satisfaction, leading to greater service efficiency, better decision making, and smoother interactions, thus elevating both customer experience and operational quality in the digital market landscape.

Traditional methods for anomaly detection, often based on rules or statistics, have difficulty scaling to the intricate and dynamic nature of modern e-commerce data, and they are ineffective for high-dimensional features and natural language inputs. This study proposes a hybrid model that integrates deep learning for feature extraction, NLP for semantic analysis, and the isolation forest algorithm for the identification of anomalies. Data boxes [50] serve as an essential digital resource in Czech public administration, improving secure communication and service delivery. From the perspective of e-commerce quality of service (QoS), they bolster reliability, accessibility, and security, which leads to more efficient user interactions, quicker transactions, and increased trust in digital public services, in line with current e-commerce service standards. A related study [49] analyzed the impact of NLP, AI, and ML technologies in mobile applications on digital transformation and knowledge migration. Dritsas and Trigka [51] investigate the incorporation of machine learning within intelligent networks, providing an overview of architectures, essential techniques, and practical applications. The study emphasizes the role of ML in improving adaptability, automation, and decision-making processes in contemporary network infrastructures, thereby facilitating scalable, efficient, and context-sensitive network services for modern online businesses.

The classification results shown in Table 8 demonstrate how interpretable AI solutions that leverage QoS metrics, NLP, and DL can identify distinct user intents such as complaints, escalations, and refunds. The high precision and recall, especially for vital classes, reflect the system’s efficiency in understanding customer problems. By linking feedback to QoS aspects (e.g., responsiveness or product quality), these solutions enable e-commerce platforms to respond rapidly, personalize their services, and maintain trust, while also ensuring performance transparency through interpretable metrics.

7. Conclusions

The study demonstrates how the synthesis of natural language processing (NLP), deep learning (DL), and anomaly detection algorithms can significantly improve the efficacy of e-commerce services and communication platforms.

By employing NLP models for sentiment and emotion recognition in tandem with real-time physiological sensor data—such as heart rate, temperature, and motion—the system is better equipped to interpret both verbal and non-verbal user signals. This data fusion enables adaptive, user-oriented service delivery and the optimization of quality of service (QoS) based on contextual analysis.

The application of the isolation forest algorithm within this structure facilitates robust anomaly detection, identifying irregular behavioral and emotional patterns that indicate dissatisfaction, stress, or service complications. Furthermore, the integration of wireless sensor networks (WSN) with Internet of things (IoT) technologies offers a strong infrastructure for real-time data acquisition, analysis, and automated decision making.

In summary, this research offers a unique, multi-dimensional framework for enhancing user interaction and service quality in intelligent e-commerce systems, with encouraging prospects for broader applications in customized digital services, dynamic user interfaces, and proactive service management.

7.1. Practical Implications for E-Commerce Platforms Based on Anomaly-Sentiment Integration

The table provides a concise summary of actionable insights and design guidelines for e-commerce platforms, based on findings from anomaly detection and sentiment analysis.

Table 9 lists the practical implications for e-commerce platforms based on anomaly–sentiment integration. The table focuses on strategies for early detection of issues, personalized support, improvements in user interface, content moderation, and product refinement, thus enabling developers to enhance user experience and operational responsiveness.

7.2. Limitations and Further Research

The model in question utilizes synthetic data alongside various decision trees. Its functionality is dependent on the integration of dynamic user feedback and the capability for real-time anomaly detection. Future investigations should consider the application of real datasets, the use of deeper trees or ensemble models (for instance, isolation forests), and the integration of streaming data to bolster adaptability, interpretability, and the real-time handling of anomalies in quality of service (QoS) monitoring.

Table 10 highlights the limitations and directions for future research. Future research should employ more complex tree structures or ensemble techniques (such as isolation forests), and incorporate streaming data to improve adaptability, interpretability, and the immediate anomaly response for overseeing quality of service (QoS).

Author Contributions

Conceptualization, P.M.M.; methodology, P.M.M.; software, P.M.M.; validation, P.M.M.; formal analysis, P.M.M.; investigation, P.M.M.; resources, P.M.M.; data curation, P.M.M.; writing—original draft preparation, P.M.M.; writing—review and editing, P.M.M., I.S. and T.P.-P.; visualization, P.M.M.; supervision, I.S. and T.P.-P.; project administration, I.S. and T.P.-P.; funding acquisition, P.M.M., I.S. and T.P.-P. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Excellence Initiative—Research University (IDUB) at AGH University of Krakow.

Data Availability Statement

The dataset used in the analysis was obtained from Kaggle: https://www.kaggle.com/datasets/thoughtvector/customer-support-on-twitter (accessed on 9 July 2025) [43]. The dataset’s DOI (Digital Object Identifier) is https://doi.org/10.34740/kaggle/dsv/8841.

Acknowledgments

We gratefully acknowledge the support and funding provided by the AGH University of Krakow and the IDUB administration, which made this paper possible.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

AI	Artificial intelligence
NLP	Natural language processing
DL	Deep learning
IFAT	Isolation forest algorithm tree
IFA	Isolation forest algorithm
WSN	Wireless sensor network
IoT	Internet of things
GNLP	Generative natural language processing

References

1. Aulkemeier, F.; Paramartha, M.A.; Iacob, M.E.; van Hillegersberg, J. A pluggable service platform architecture for e-commerce. Inf. Syst. e-Bus. Manag. 2016, 14, 469–489.
2. Tran, L.T.T. Managing the effectiveness of e-commerce platforms in a pandemic. J. Retail. Consum. Serv. 2021, 58, 102287.
3. Lucas, G.A.; Lunardi, G.L.; Dolci, D.B. From e-commerce to m-commerce: An analysis of the user’s experience with different access platforms. Electron. Commer. Res. Appl. 2023, 58, 101240.
4. Mah, P.M. National AI strategies. Eur. Res. Stud. J. 2024, 27, 96–115.
5. Karunaratne, T. Machine Learning and Big Data Approaches to Enhancing E-commerce Anomaly Detection and Proactive Defense Strategies in Cybersecurity. J. Adv. Cybersecur. Sci. Threat Intell. Countermeas. 2023, 7, 1–16.
6. Rane, N.; Choudhary, S.; Rane, J. Artificial intelligence, natural language processing, and machine learning to enhance e-service quality on e-commerce platforms. Intell. Mach. Learn. 2024, 4, 67–82.
7. Girimurugan, B.; Kumaresan, V.; Nair, S.G.; Kuchi, M.; Kholifah, N. AI and Machine Learning in E-Commerce Security: Emerging Trends and Practices. In Strategies for E-Commerce Data Security: Cloud, Blockchain, AI, and Machine Learning; IGI Global: Hershey, PA, USA, 2024; pp. 29–53.
8. Mudgal, A. Leveraging AI and ML for Proactive Threat Detection for E-Commerce. In Strategic Innovations of AI and ML for E-Commerce Data Security; IGI Global: Hershey, PA, USA, 2025; pp. 281–322.
9. Kalusivalingam, A.K.; Sharma, A.; Patel, N.; Singh, V. Enhancing B2B Fraud Detection Using Ensemble Learning and Anomaly Detection Algorithms. Int. J. AI ML 2022, 3.
10. Gracious, L.A.; Sudha, L.; Chitra, B.; Kaur, G.; Sathya, V.; Kabitha, P.; Subramanian, R.S. Advancing E-Commerce Security: Strategic Innovations and Future Directions in AI and ML. In Strategic Innovations of AI and ML for E-Commerce Data Security; IGI Global: Hershey, PA, USA, 2025; pp. 79–106.
11. Khurana, R.; Kaul, D. Dynamic cybersecurity strategies for ai-enhanced ecommerce: A federated learning approach to data privacy. Appl. Res. Artif. Intell. Cloud Comput. 2019, 2, 32–43.
12. Villegas-Ch, W.; Jaramillo-Alcazar, A.; Navarro, A.M.; Mera-Navarrete, A. Integrating Explainable Artificial Intelligence in Anomaly Detection for Threat Management in E-Commerce Platforms. IEEE Access 2025, 13, 29830–29846.
13. Al-Ebrahim, M.A.; Bunian, S.; Nour, A.A. Recent Machine-Learning-Driven Developments in E-Commerce: Current Challenges and Future Perspectives. Eng. Sci. 2023, 28, 1044.
14. Zhang, X.; Guo, F.; Chen, T.; Pan, L.; Beliakov, G.; Wu, J. A brief survey of machine learning and deep learning techniques for e-commerce research. J. Theor. Appl. Electron. Commer. Res. 2023, 18, 2188–2216.
15. Kalla, D. Improving E-Commerce Organization Performance Using Big Data Analytics and Artificial Intelligence. Ph.D. Thesis, Colorado Technical University, Colorado Springs, CO, USA, 2024.
16. Feng, L. Data Analysis and Prediction Modeling Based on Deep Learning in E-Commerce. Sci. Program. 2022, 2022, 1041741.
17. Yu, W.; Sun, Z.; Liu, H.; Li, Z.; Zheng, Z. Multi-level Deep Learning based e-Commerce Product Categorization. In Proceedings of the eCOM@ SIGIR, Ann Arbor, MI, USA, 12 July 2018.
18. Shankar, D.; Narumanchi, S.; Ananya, H.; Kompalli, P.; Chaudhury, K. Deep learning based large scale visual recommendation and search for e-commerce. arXiv 2017, arXiv:1703.02344.
19. Nabi, N.; Pabel, M.A.H.; Rahman, M.A.; Mozumder, M.A.S.; Al-Imran, M.; Sweet, M.M.R.; Islam, M.Z.; Miah, M.N.I.; Naznin, R.; Sharif, M.K. Unleashing Deep Learning: Transforming E-commerce Profit Prediction with CNNs. J. Bus. Manag. Stud. 2024, 6, 126–131.
20. Zhang, P. E-commerce products recognition based on a deep learning architecture: Theory and implementation. Future Gener. Comput. Syst. 2021, 125, 672–676.
21. Jha, B.K.; Sivasankari, G.; Venugopal, K. Sentiment analysis for E-commerce products using natural language processing. Ann. Rom. Soc. Cell Biol. 2021, 25, 166–175.
22. Lin, X. Sentiment analysis of e-commerce customer reviews based on natural language processing. In Proceedings of the 2020 2nd International Conference on Big Data and Artificial Intelligence and Internet of Things Engineering (ICBAIE 2020), Fuzhou, China, 12–14 June 2020; pp. 32–36.
23. Soundarapandian, R. Natural Language Processing in E-Commerce-Enhancing Customer Experience; Academic Guru Publishing House: Bhopal, India, 2024.
24. Ismail, W.S.; Ghareeb, M.M.; Youssry, H. Enhancing customer experience through sentiment analysis and natural language processing in e-commerce. J. Wirel. Mob. Netw. Ubiquitous Comput. Dependable Appl. 2024, 15, 60–72.
25. Gałka, Ł.; Karczmarek, P.; Tokovarov, M. Effective enhancement of isolation Forest method based on Minimal Spanning tree clustering. Inf. Sci. 2023, 628, 320–338.
26. Marteau, P.F.; Soheily-Khah, S.; Béchet, N. Hybrid isolation forest-application to intrusion detection. arXiv 2017, arXiv:1705.03800.
27. Cheng, Z.; Zou, C.; Dong, J. Outlier detection using isolation forest and local outlier factor. In Proceedings of the Conference on Research in Adaptive and Convergent Systems (RACS ’19), Chongqing, China, 24–27 September 2019; pp. 161–168.
28. Liu, F.T.; Ting, K.M.; Zhou, Z.H. Isolation forest. In Proceedings of the 2008 Eighth IEEE International Conference on Data Mining, Pisa, Italy, 15–19 December 2008; pp. 413–422.
29. Hariri, S.; Kind, M.C.; Brunner, R.J. Extended isolation forest. IEEE Trans. Knowl. Data Eng. 2019, 33, 1479–1489.
30. Ding, Z.; Fei, M. An anomaly detection approach based on isolation forest algorithm for streaming data using sliding window. IFAC Proc. Vol. 2013, 46, 12–17.
31. Heigl, M.; Anand, K.A.; Urmann, A.; Fiala, D.; Schramm, M.; Hable, R. On the improvement of the isolation forest algorithm for outlier detection with streaming data. Electronics 2021, 10, 1534.
32. Dzemydienė, D.; Burinskienė, A.; Čižiūnienė, K.; Miliauskas, A. Development of E-service provision system architecture based on IoT and WSNs for monitoring and management of freight intermodal transportation. Sensors 2023, 23, 2831.
33. Pathan, A.S.K.; Islam, H.K.; Sayeed, S.A.; Ahmed, F.; Hong, C.S. A framework for providing e-services to the rural areas using wireless ad hoc and sensor networks. arXiv 2007, arXiv:0712.4168.
34. Sattar, A.; Shampod, Y.A.; Ahmed, M.T.; Akter, N.; Mahmud, A. Deployment of e-services based contextual smart agro system using internet of things. Bull. Electr. Eng. Inform. 2022, 11, 414–425.
35. Arnab, A.A.; Shuvro, A.A.; Ma, K.; Leung, H. A Deep Learning Approach for a QoS Prediction System in Cellular Networks. In Proceedings of the 2023 IEEE 9th World Forum on Internet of Things (WF-IoT), Aveiro, Portugal, 12–27 October 2023; pp. 1–6.
36. Rehman, I.U.; Nasralla, M.M.; Philip, N.Y. Multilayer perceptron neural network-based QoS-aware, content-aware and device-aware QoE prediction model: A proposed prediction model for medical ultrasound streaming over small cell networks. Electronics 2019, 8, 194.
37. Vijayakumar, S.; Flynn, R.; Corcoran, P.; Murray, N. Predicting Quality of Multimedia Experience using Electrocardiogram and Respiration Signals. IEEE Access 2024, 13, 33600–33618.
38. Rubio-Drosdov, E.; Díaz-Sánchez, D.; Almenárez, F.; Arias-Cabarcos, P.; Marín, A. Seamless human-device interaction in the internet of things. IEEE Trans. Consum. Electron. 2017, 63, 490–498.
39. Söldner, R.; Rheinländer, S.; Meyer, T.; Olszowy, M.; Austerjost, J. Human–device interaction in the life science laboratory. In Smart Biolabs of the Future; Springer: Berlin/Heidelberg, Germany, 2022; pp. 83–113.
40. Ni, P.; Li, Y.; Li, G.; Chang, V. Natural language understanding approaches based on joint task of intent detection and slot filling for IoT voice interaction. Neural Comput. Appl. 2020, 32, 16149–16166.
41. Majewski, M.; Kacalak, W. Intelligent speech interaction of devices and human operators. In Software Engineering Perspectives and Application in Intelligent Systems; Springer: Cham, Switzerland, 2016; pp. 471–482.
42. Niezen, G.; Eslambolchilar, P. A human operator model for medical device interaction using behavior-based hybrid automata. IEEE Trans. Hum.-Mach. Syst. 2015, 46, 291–302.
43. Axelbrooke, S. Customer Support on Twitter. 2017. Available online: https://www.kaggle.com/datasets/thoughtvector/customer-support-on-twitter (accessed on 9 July 2025).
44. Leveni, F.; Cassales, G.W.; Pfahringer, B.; Bifet, A.; Boracchi, G. Online isolation forest. arXiv 2025, arXiv:2505.09593.
45. Geng, G.; Wang, P.; Sun, L.; Wen, H. Enhanced isolation forest-based algorithm for unsupervised anomaly detection in lidar SLAM localization. World Electr. Veh. J. 2025, 16, 209.
46. Zhang, Y.; Tian, Q.; Dong, B.; Su, Y.; Sun, Y. A Fuzz Testing Method for Smart Grid Terminals Based on the Isolation Forest Algorithm. In Proceedings of the 2025 IEEE 8th Information Technology and Mechatronics Engineering Conference (ITOEC), Chongqing, China, 14–16 March 2025; Volume 8, pp. 116–120.
47. Herreros-Martínez, A.; Magdalena-Benedicto, R.; Vila-Francés, J.; Serrano-López, A.J.; Pérez-Díaz, S.; Martínez-Herráiz, J.J. Applied Machine Learning to Anomaly Detection in Enterprise Purchase Processes: A Hybrid Approach Using Clustering and Isolation Forest. Information 2025, 16, 177.
48. Kaššaj, M.; Peráček, T. Sustainable connectivity—Integration of mobile roaming, WiFi4EU and smart city concept in the European union. Sustainability 2024, 16, 788.
49. Mah, P.M.; Skalna, I.; Pełech-Pilichowski, T.; Derlecki, T.; Mah, V.A.; Nyamka, K. Enabling Digital Transformation and Knowledge Migration: The Impact of NLP, AI, and ML in Mobile Applications; Scientific Papers of Silesian University of Technology—Organization & Management Series; Faculty of Organization and Management, Silesian University of Technology: Gliwice, Poland, 2023.
50. Dušek, J. Data Boxes as a Part of the Strategic Concept of Computerization of Public Administration in the Czech Republic. Adm. Sci. 2023, 13, 154.
51. Dritsas, E.; Trigka, M. Machine Learning in Intelligent Networks: Architectures, Techniques, and Use Cases. IEEE Access 2025, 13, 102724–102745.

Figure 1. Isolation forest algorithm trees (IFAT) for e-commerce QoS: (a) quality, (b) responsiveness, (c) availability, (d) security, (e) assurance, (f) loyalty, (g) interaction, (h) payment, (i) connectivity, (j) return policy, (k) support services, (l) customer care. Source: author’s copy.

Figure 2. E-service system architecture three fundamental layers. Source: author’s copy.

Figure 3. Five-step model explanation of findings indicated by the algorithm for the service system flow: sensor, communication, NLP, integration, and evaluation: (a) isolation forest tree for perception layer; (b) isolation forest tree network layer for quality of service; (c) key terms and applicable layer for quality service issues; (d) integration flowchart and decision layer; (e) evaluation layers for quality service issues confusion matrix. Source: author’s copy.

Figure 4. Isolation forest algorithm application: (a) body temperature distribution by sentence; (b) heart rate distribution by sentence; (c) motion activity by sentence; (d) NLP confidence scores per sentence; (e) sentiment by sentence; (f) heart rate vs. temperature by sentence; (g) anomalies per sentence. Source: author’s copy.

Figure 5. Training data (A) and testing data (B). Source: author’s copy.

Figure 6. Isolation forest algorithm anomaly detection. Source: author’s copy.

Figure 7. Isolation forest algorithm (IFA) training and testing. Source: author’s copy.

Figure 8. Isolation forest algorithm trees (IFAT). Source: author’s copy.

Figure 9. Isolation forest algorithm trees (IFAT) matrix for e-commerce services. Source: author’s copy.

Figure 10. Perception component matrix (PCM) training and testing. Source: author’s copy.

Figure 11. Customer vs. agent interactions. Source: author’s copy.

Table 1. Deep learning applications in E-services. Source: author’s copy.

Application Area	Function/Equation	Explanation
Pattern Recognition	$\hat{y} = \mathrm{softmax}(W \cdot f(x) + b)$	Classifies inputs (e.g., user activity, images, text) using CNN/RNN and softmax to produce class probabilities.
Decision Making (Reinforcement Learning)	$Q(s, a) = r + \gamma \max_{a'} Q(s', a')$	Estimates action values to select optimal actions in dynamic E-service environments.
Predictive Maintenance (LSTM Forecasting)	$h_{t} = \mathrm{LSTM}(x_{t}, h_{t-1}), \quad \hat{y}_{t} = W h_{t} + b$	Forecasts failures by learning patterns from time series data in equipment or systems.
Intelligent Automation (Hybrid Models)	$Output = \begin{cases} \text{Rule-based}, & \text{if } x_{i} \in \text{rules} \\ \text{NN-based}, & \text{otherwise} \end{cases}$	Combines rule-based logic and neural networks to automate complex E-service decisions.
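The first two expressions in Table 1 can be evaluated numerically as follows. This is a minimal sketch in which the logits stand in for the quantity $W \cdot f(x) + b$; the logits, reward, discount factor, and next-state action values are hypothetical numbers chosen for illustration.

```python
import math


def softmax(logits):
    """Convert raw scores into class probabilities (numerically stable form)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def q_target(reward, gamma, next_q_values):
    """Bellman target r + gamma * max_a' Q(s', a') for one transition."""
    return reward + gamma * max(next_q_values)


# Hypothetical logits standing in for W . f(x) + b over three classes.
probs = softmax([2.0, 1.0, 0.1])

# Hypothetical transition: reward 1.0, discount 0.9, three candidate next actions.
target = q_target(reward=1.0, gamma=0.9, next_q_values=[0.5, 2.0, 1.0])

print([round(p, 3) for p in probs], round(target, 2))
```

The softmax output sums to one and ranks the first class highest, while the Q-learning target combines the immediate reward with the discounted value of the best next action.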

Table 2. Isolation forest trees: QoS-based perception analysis. Source: author’s copy.

QoS Focus	Keywords Used	Purpose
Quality ($Q$)	quality, complaint, defective, malfunction	Detect anomalies in perceived product quality.
Responsiveness ($R$)	refund, request, app	Capture anomalies in service response delays.
Availability ($A$)	bug, issue, damage	Identify issues related to system uptime and functionality.
Security ($S$)	uncertainty, trust	Detect anomalies related to perceived system trust and security.
Quality + Security	assurance, complaint, trust	Assess patterns indicating repeated assurance or security failures.
Value Services	loyalty, value, service	Detect perceptions of value, loyalty, and overall service effectiveness.

Table 3. Core evaluation metrics for e-services. Source: author’s copy.

Metric	Formula	Interpretation
Latency	$L = t_{\text{resp}} - t_{\text{req}}$	Delay in communication
Uptime	$U = \frac{\text{Service Hours}}{\text{Total Hours}}$	Service availability
Engagement	$E = \frac{\text{Active Sessions}}{\text{Total Users}}$	Platform usage level (e-learning)
Accuracy	$\frac{TP + TN}{TP + TN + FP + FN}$	Anomaly detection or event prediction accuracy
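As a brief worked example, the four metrics in Table 3 can be computed directly from their definitions. All input values below (timestamps, hours, session counts, and confusion-matrix counts) are hypothetical and serve only to show the arithmetic.

```python
def latency(t_req, t_resp):
    """L = t_resp - t_req, in the same time unit as the inputs."""
    return t_resp - t_req


def uptime(service_hours, total_hours):
    """U = service hours / total hours."""
    return service_hours / total_hours


def engagement(active_sessions, total_users):
    """E = active sessions / total users."""
    return active_sessions / total_users


def accuracy(tp, tn, fp, fn):
    """(TP + TN) / (TP + TN + FP + FN)."""
    return (tp + tn) / (tp + tn + fp + fn)


# Hypothetical measurements for a 30-day (720 h) observation window.
L = latency(t_req=10.0, t_resp=10.25)                    # seconds
U = uptime(service_hours=718, total_hours=720)
E = engagement(active_sessions=300, total_users=1200)
acc = accuracy(tp=40, tn=50, fp=6, fn=4)

print(f"L = {L:.2f} s, U = {U:.4f}, E = {E:.2f}, accuracy = {acc:.2f}")
```

With these inputs, the latency is 0.25 s, the uptime is about 99.7%, the engagement is 0.25 active sessions per user, and the accuracy is 0.90.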

Table 4. Quote, QoS, predicted tone, and heartbeat model data. Source: author’s copy.

Quote	QoS	Predicted Tone	Heartbeat Model
The filter makes a grinding noise	Q	negative	99
The air quality meter is way off	Q	negative	99
The touch panel freezes	R	negative	100
Starts grinding like it’s chewing gravel	Q	negative	99
AQI meter constantly shows 999	Q	negative	99
Touch controls stop responding	R	negative	100
Have to unplug it to reset	A	negative	100
It’s not even a week old	A	negative	99
I don’t trust your replacements	S	negative	99
Tons of negative reviews	S	negative	99
Build quality is poor	Q	negative	100
Circuit issues, sensors failing, overheating	S	negative	100
Random shut-offs	A	negative	99
Firmware feels half-baked	Q	negative	100
App has more bugs than features	R	negative	99
Restart three times to pair with Wi-Fi	A	negative	99

Table 5. Model components of BERT-based emotion classifier.

Layer	Type
Input	Tokenized text (includes [CLS], [SEP], attention mask)
Transformer Backbone	BERT-base (12 transformer encoder layers)
Dense	Fully connected layer (for classification head)
Activation	Softmax (outputs probability distribution over emotions)

Table 6. Biosignal feature characteristics.

Feature	Value Range	Behavior
Heart Rate	60–120 bpm	Higher during stress or frustration
Temperature	36.5–38.5 °C	Rises with emotional intensity

Table 7. Comparative overview of anomaly detection models.

Model	Parameter/Layer	Value	Explanation
Isolation Forest (Baseline)	n_estimators	100	Number of trees in the forest
	max_samples	auto	Number of samples drawn per tree
	contamination	0.1	Proportion of anomalies expected
	behaviour	new	Ensures consistent behavior
	random_state	42	Ensures reproducibility
			Note: Anomalies isolate faster in fewer splits
Local Outlier Factor (LOF)	n_neighbors	20	Size of local neighborhood
	metric	euclidean	Distance measure used
	novelty	True	Enables prediction on new data
Autoencoder (Deep Learning)	Input	10–20 features	Combination of text and biosignal data
	Dense (enc1)	64/ReLU	First encoder layer
	Dense (enc2)	32/ReLU	Second encoder layer
	Bottleneck	16	Compressed latent feature representation
	Dense (dec1)	32/ReLU	First decoder layer
	Dense (dec2)	64/ReLU	Second decoder layer
	Output	Input size/Sigmoid	Reconstructs original input features
Autoencoder Hyperparameters	Loss	MSE	Mean squared error used for reconstruction loss
	Optimizer	Adam	Adaptive learning rate optimizer
	Learning Rate	0.001	Step size in gradient descent
	Epochs	50	Training cycles through the dataset
	Batch Size	16	Samples processed before updating weights
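
The note in Table 7 that anomalies are isolated in fewer splits can be illustrated with a minimal isolation forest written from scratch. This is a didactic sketch, not the authors' implementation and not scikit-learn's: every tree is built on the full toy dataset rather than on a subsample, and the data (heart rate and temperature pairs with one injected outlier) are invented. Only n_estimators = 100 and the seed 42 are taken from the table.

```python
import math
import random

def c(n):
    """Average path length of an unsuccessful binary-search-tree lookup over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    harmonic = math.log(n - 1) + 0.5772156649  # Euler-Mascheroni approximation
    return 2.0 * harmonic - 2.0 * (n - 1) / n

def build_tree(points, depth, max_depth, rng):
    """Recursively split on a random feature at a random threshold."""
    if depth >= max_depth or len(points) <= 1:
        return {"size": len(points)}
    feature = rng.randrange(len(points[0]))
    low = min(p[feature] for p in points)
    high = max(p[feature] for p in points)
    if low == high:
        return {"size": len(points)}
    split = rng.uniform(low, high)
    left = [p for p in points if p[feature] < split]
    right = [p for p in points if p[feature] >= split]
    return {
        "feature": feature,
        "split": split,
        "left": build_tree(left, depth + 1, max_depth, rng),
        "right": build_tree(right, depth + 1, max_depth, rng),
    }

def path_length(point, node, depth=0):
    """Number of splits needed to reach a leaf, adjusted for unsplit leaf size."""
    if "size" in node:
        return depth + c(node["size"])
    branch = "left" if point[node["feature"]] < node["split"] else "right"
    return path_length(point, node[branch], depth + 1)

def anomaly_scores(points, n_estimators=100, seed=42):
    """Score s = 2 ** (-E[h] / c(n)); values near 1 indicate anomalies."""
    rng = random.Random(seed)
    n = len(points)
    max_depth = math.ceil(math.log2(n))
    trees = [build_tree(points, 0, max_depth, rng) for _ in range(n_estimators)]
    scores = []
    for p in points:
        mean_depth = sum(path_length(p, t) for t in trees) / n_estimators
        scores.append(2.0 ** (-mean_depth / c(n)))
    return scores

# Toy data: 50 ordinary (heart rate, temperature) pairs plus one injected outlier.
data_rng = random.Random(0)
data = [(data_rng.gauss(70, 3), data_rng.gauss(37.0, 0.2)) for _ in range(50)]
data.append((118.0, 38.4))

scores = anomaly_scores(data)
print("most anomalous index:", scores.index(max(scores)))
```

Because the injected point lies far from the cluster on both features, a single random split often separates it, which shortens its average path and raises its score relative to the other points.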

Table 8. Classification report. Source: author’s copy.

Class	Precision	Recall	F1-Score	Support
Complaint	0.67	1.00	0.80	2
Escalation	1.00	1.00	1.00	1
Refund	1.00	0.67	0.80	3
Accuracy			0.83	6
Macro avg	0.89	0.89	0.87	6
Weighted avg	0.89	0.83	0.83	6
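
The figures in Table 8 can be checked by hand. The sketch below recomputes them from a confusion matrix that is an assumption: it is one matrix consistent with the reported precision, recall, and support values (a single Refund sample predicted as Complaint), not data taken from the paper.

```python
LABELS = ["Complaint", "Escalation", "Refund"]

# Assumed confusion matrix (rows = true class, columns = predicted class).
CONFUSION = [
    [2, 0, 0],  # true Complaint
    [0, 1, 0],  # true Escalation
    [1, 0, 2],  # true Refund: one sample predicted as Complaint
]

def per_class_metrics(cm, k):
    """Return (precision, recall, F1, support) for class index k."""
    tp = cm[k][k]
    fp = sum(cm[r][k] for r in range(len(cm))) - tp
    fn = sum(cm[k]) - tp
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1, sum(cm[k])

rows = [per_class_metrics(CONFUSION, k) for k in range(len(LABELS))]
total = sum(r[3] for r in rows)

# Accuracy = correctly classified samples / all samples.
accuracy = sum(CONFUSION[k][k] for k in range(len(LABELS))) / total
# Macro average: unweighted mean over classes.
macro = [sum(r[i] for r in rows) / len(rows) for i in range(3)]
# Weighted average: mean over classes weighted by support.
weighted = [sum(r[i] * r[3] for r in rows) / total for i in range(3)]

for label, (p, r, f, s) in zip(LABELS, rows):
    print(f"{label:<13}{p:.2f}  {r:.2f}  {f:.2f}  {s}")
print(f"{'Accuracy':<13}{accuracy:.2f}")
print(f"{'Macro avg':<13}" + "  ".join(f"{v:.2f}" for v in macro))
print(f"{'Weighted avg':<13}" + "  ".join(f"{v:.2f}" for v in weighted))
```

With only six test samples, a single misclassification moves accuracy by about 17 percentage points, so these values should be read as indicative rather than as stable estimates.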

Table 9. Practical implications for e-commerce platforms based on anomaly-sentiment integration.

Area	Actionable Insight	Design Guideline
Early Detection of Dissatisfaction	Link emotional tone with physiological anomaly detection to flag at-risk users in real time.	Implement hybrid NLP-sensor monitoring for adaptive support escalation.
Personalized Intervention Thresholds	Use adaptive behavioral baselines instead of static rules to detect unusual patterns.	Develop dynamic models that adjust thresholds based on individual history.
Emotion-Aware UX/UI Design	Identify recurring frustration paths using anomaly-emotion maps.	Redesign high-friction elements (e.g., checkout or refund navigation) using emotional heatmaps.
Content Moderation and Prioritization	Prioritize highly negative or anomalous reviews for human or automated moderation.	Integrate AI-assisted triage that flags emotionally intense comments for quick response.
Feedback Loop for Product Improvement	Cluster anomaly-linked complaint themes to guide product or policy adjustments.	Align NLP-anomaly insights with technical logs or warranty reports for targeted improvements.
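
The "Personalized Intervention Thresholds" row of Table 9 could be realized in many ways. A minimal sketch, assuming a rolling z-score over each user's own recent readings, is given below; the window size, the z threshold of 3, the floor on the standard deviation, the function name `adaptive_flags`, and the sample heart-rate series are all hypothetical choices for illustration.

```python
from collections import deque
from statistics import mean, pstdev

def adaptive_flags(readings, window=5, z_threshold=3.0, min_std=1.0):
    """Flag readings that deviate strongly from the user's own recent baseline."""
    history = deque(maxlen=window)
    flags = []
    for value in readings:
        if len(history) == window:
            baseline = mean(history)
            spread = max(pstdev(history), min_std)  # floor avoids near-zero division
            flags.append(abs(value - baseline) / spread > z_threshold)
        else:
            flags.append(False)  # not enough history yet
        history.append(value)
    return flags

# Two hypothetical users who both end on a heart rate of 110 bpm.
calm_user = [68, 70, 69, 71, 70, 72, 110]
active_user = [100, 106, 96, 108, 102, 99, 110]

print(adaptive_flags(calm_user))
print(adaptive_flags(active_user))
```

A static rule such as "heart rate above 105 bpm" would flag both users, whereas the per-user baseline flags only the reading that is unusual for that individual.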

Table 10. Limitations and future research directions of IFA and IFAT. Source: author’s copy.

Limitations	Future Research Directions
Random partitioning in IFA can result in instability and variable anomaly scoring, particularly when dealing with small or noisy datasets.	Devise advanced feature selection strategies or guided randomization techniques to enhance the stability and consistency of anomaly detection.
Employing only two features for visualization restricts the generalizability and risks oversimplifying the complexities of multi-dimensional QoS data.	Employ manifold learning techniques like t-SNE and UMAP to broaden visualization methods into higher dimensions for a more effective representation of sophisticated feedback.
Binary synthetic data could potentially overlook the fine details and complexities inherent in actual customer feedback during QoS monitoring.	Utilize more authentic, multi-valued, or continuous synthetic data simulations in the training and validation processes of models.
In unsupervised anomaly detection, decision tree splits based on Gini impurity or information gain may prove suboptimal, as the true labels are not available.	Investigate unsupervised splitting criteria or anomaly-oriented metrics designed specifically for unlabeled quality of service data.
The static model structure may struggle to adjust to the shifting patterns or concept drift associated with the evolving behavior of customers.	Investigate dynamic or adaptive isolation forest frameworks (such as online learning and reinforcement-learning-based trees) for real-time quality of service monitoring.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

