1. Introduction
The rapid growth of e-commerce sites has generated significant amounts of heterogeneous information from user interactions, transactions, and product information. Current e-commerce systems operate at large scale in a highly dynamic environment in which customer preferences shift constantly, product offerings change rapidly, and decisions have to be made under uncertainty [1].
From the data-centric point of view, the following are the basic properties of the e-commerce platform: First, the data associated with the e-commerce environment is typically massive, sparse, and high-dimensional, involving millions of users and products with limited feedback information. Second, the data is also diverse and multimodal, combining the structured information from transactions with unstructured information, such as text (e.g., queries and reviews), images (e.g., product images), and sequential information. Third, the environment is highly dynamic and non-stationary, driven by factors such as seasonality, promotional activities, and external market factors. Finally, there are many e-commerce problems that have real-time or near-real-time decision requirements, requiring models that are accurate and computationally efficient.
Earlier e-commerce platforms relied mainly on conventional machine learning approaches for applications such as recommendation, demand prediction, and customer segmentation. Although these approaches were relatively successful, they often struggle with complex non-linear relationships, long-range temporal dependencies, and high-order feature interactions, all of which are inherent in e-commerce domains. Deep Learning (DL) has consequently become the leading approach, given its ability to learn hierarchical representations directly from data and to enable end-to-end optimization of various e-commerce applications [2].
Recent advances in neural architectures—including Recurrent Neural Networks (RNNs) [
3], Convolutional Neural Networks (CNNs), Transformers, Graph Neural Networks (GNNs), and Deep Reinforcement Learning (DRL)—have fundamentally reshaped intelligent e-commerce systems. These models have demonstrated superior performance over classical approaches in a wide range of applications, such as intermittent sales forecasting [
4], customer behavior prediction [
5], and customer satisfaction estimation [
6]. Their ability to jointly model sequential, contextual, relational, and multimodal information makes them particularly well suited to the complexity of online retail environments.
Despite these significant developments, existing surveys tend to focus on specific problem domains or specific families of models—most notably in the area of recommendation systems—thus offering little guidance on how DL contributes to end-to-end decision-making in the e-commerce chain. In this evolving landscape of research on DL for e-commerce intelligence, we can conceptually identify three closely interrelated aspects: prediction, personalization, and decision intelligence.
Prediction-oriented models aim to forecast key business variables such as product demand, sales volume, customer lifetime value, churn probability, and delivery performance. Numerous studies have demonstrated that Deep Neural Networks (DNNs) consistently outperform traditional time-series and regression-based methods in these tasks, particularly under conditions of data sparsity and non-stationarity [
6]. Personalization constitutes the second major pillar of DL in e-commerce, focusing on tailoring recommendations, content, pricing, and user experiences to individual users. DL-based recommender systems have become the de facto standard in large-scale platforms, leveraging latent representations, attention mechanisms, and sequential modelling to capture complex user–item interactions [
7]. Recent work further enhances personalization through sentiment-aware and multimodal approaches that integrate textual reviews and visual product features, enabling richer and more explainable recommendations [
8].
Aside from prediction and personalization, decision intelligence has been identified as another important paradigm that incorporates DL models into a prescriptive and optimization-centric framework. It is important to distinguish between prediction models that forecast outcomes (e.g., CTR/CVR, churn risk, demand), and decision-making or prescriptive models that determine actions to be taken under a set of objectives and/or constraints.
Decision intelligence systems aim to support or automate strategic and operational decisions such as dynamic pricing, promotion planning, inventory control, and customer engagement strategies. In this context, RL/DRL is best viewed as one methodological family within decision intelligence (alongside contextual bandits, constrained optimization, and causal decision-making), because it explicitly learns action policies through interaction with environments [
9,
10]. This framing avoids conflating prediction with control and clarifies how predictive models can act as components (e.g., simulators, reward models, state estimators) inside broader decision-making pipelines.
From a theoretical point of view, e-commerce decision intelligence is inherently related to economics and consumer behavior theories. A consumer's choices depend on utility, price elasticity, consideration sets, trust, and delivery constraints. Therefore, any DL model or policy that fails to consider these factors risks learning spurious correlations from platform logs rather than the underlying behavior. Integrating economic structure, such as elasticity and consumer behavior models, can improve the interpretability, validity, and robustness of decision learning under changing market conditions.
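As a hedged illustration of what such economic structure can look like, the textbook definition of own-price elasticity of demand is given below. The symbols Q (quantity demanded), P (price), a (a scale constant), and the constant-elasticity demand form are assumptions introduced here for illustration; they are not notation taken from the surveyed works.

```latex
\varepsilon \;=\; \frac{\partial Q}{\partial P}\cdot\frac{P}{Q},
\qquad
Q(P) = a\,P^{\varepsilon}
\;\;\Longrightarrow\;\;
\log Q = \log a + \varepsilon \log P .
```

Under this assumed form, a demand model that is linear in log-price carries the elasticity as a single interpretable coefficient, which is one simple way a learned model can be constrained by economic structure.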
This survey provides a thorough overview of DL methodologies for e-commerce applications, with a focus on prediction, personalization, and decision intelligence. Unlike previous surveys that concentrate on particular tasks or model types, it takes a business decision-oriented view of DL for e-commerce, grounded in particular objectives and contexts. Specifically, this survey (i) systematically links DL models to key e-commerce functions along the entire value chain; (ii) critically evaluates state-of-the-art models with respect to their scalability, interpretability, and deployment limitations; (iii) combines prediction, personalization, and prescriptive models under the umbrella of decision intelligence; and (iv) discusses research gaps and opportunities for future research, including explainable and responsible AI, multimodal foundation models, and RL for long-term business optimization.
Figure 1 presents a unified taxonomy of deep learning methods in e-commerce, organizing the literature along three intelligence pillars—prediction, personalization, and decision intelligence—while explicitly linking application tasks, data modalities, and representative model families. This taxonomy provides a structural lens through which the subsequent sections of this survey are organized, helping to clarify the relationships between e-commerce data characteristics, learning objectives and deep learning architectures.
The remainder of this paper is organized as follows.
Section 2 introduces the characteristics of e-commerce data and the foundational DL architectures that underpin modern platforms.
Section 3 reviews DL approaches for consumer behavior prediction and demand forecasting.
Section 4 examines recommendation systems with an emphasis on personalization and multimodal modelling.
Section 5 focuses on sentiment analysis and review intelligence.
Section 6 discusses trust, security, fraud defense, and anomaly detection.
Section 7 surveys catalogue intelligence and product understanding.
Section 8 explores pricing, operations, and decision intelligence.
Section 9 discusses cross-cutting challenges and future research directions, followed by concluding remarks.
2. Deep Learning Foundations for e-Commerce
2.1. Characteristics of e-Commerce Data
E-commerce platforms generate large-scale, heterogeneous, and continuously evolving data streams that reflect complex interactions between users, products, and business processes [
11,
12,
13,
14,
15]. These data are typically high-dimensional, sparse, noisy, and temporally dependent, posing significant challenges for traditional machine learning approaches [
2,
8]. Transactional data constitute the backbone of e-commerce analytics and include purchase histories, order quantities, prices, timestamps, and payment information. Such data are inherently sparse, as users interact with only a small subset of available products, and exhibit strong temporal patterns driven by seasonality, promotions, and external events [
4]. In addition, clickstream and session data capture fine-grained user behavior, such as page views, dwell time, and navigation paths, forming sequential records that are crucial for modelling user intent and short-term preferences [
5]. Textual data, including product descriptions, customer reviews, and search queries, provide rich semantic information and are widely used for sentiment analysis, opinion mining, and explainable personalization. These data sources are often unstructured and noisy, requiring advanced Natural Language Processing (NLP) techniques for effective representation learning [
8]. Visual data, such as product images and videos, further enrich product representation and play an increasingly important role in categories where visual appearance strongly influences purchasing decisions [
2]. Moreover, e-commerce data are inherently multimodal and context-dependent, integrating information across users, items, time, and platforms. This multimodality, combined with privacy constraints and limited access to public benchmarks, significantly complicates model training, evaluation, and reproducibility [
1].
2.2. Deep Learning Architectures for e-Commerce Applications
DL forms the algorithmic foundation for current e-commerce applications, including predictive modelling, multimodal product understanding, behavior analytics, graph reasoning, decision intelligence using RL, and distributed AI [16,17,18,19,20]. Although these applications span a broad range of tasks, including recommendation, pricing, catalogue intelligence, anti-fraud analytics, and supply chain analytics, they are anchored in a common body of core concepts drawn from several DL paradigms. This section organizes these core foundations into four broad categories: Neural Representation Learning; Structured, Sequential, and Graph-Based Models; Decision Intelligence and Reinforcement Learning; and Scalability, Security, and Distributed Foundations.
Why “Standard” Architectures Become Non-Standard in e-Commerce
A common misconception in the literature is that applying transformers, graph neural networks, or reinforcement learning to e-commerce amounts to reusing generic models. In practice, e-commerce imposes a distinctive blend of data and system constraints that often requires non-trivial modifications.
Key e-commerce-specific nuances that drive methodological differentiation include:
Implicit feedback and exposure bias: Most interaction signals are implicit (clicks, views, carts) and heavily shaped by platform exposure, which can bias both supervised learning and offline evaluation.
Multi-stage retrieval–ranking pipelines: Models are deployed as cascades (candidate generation, ranking, re-ranking), so architectural choices must consider end-to-end latency budgets and cross-stage consistency.
Extreme scale and long-tail heterogeneity: Item catalogues, queries, and user intents exhibit heavy-tailed distributions; robust generalization requires long-tail-aware learning and representation sharing.
Multi-entity graphs with adversarial behavior: Interaction graphs include users, items, merchants, devices, IPs, logistics nodes, and can contain adversarial edges (fraud, collusion), requiring robust graph learning.
Decision coupling and delayed outcomes: Many KPIs (conversion, returns, satisfaction) are delayed; actions such as pricing and ranking changes feed back into future observations, creating non-stationarity.
Consequently, novelty in e-commerce deep learning models is frequently not achieved by adding new layers to the backbone models but by applying domain-specific changes to them. These changes may involve de-biasing objectives, counterfactual or off-policy evaluation, decision optimization under constraints, multimodal consistency checks for noisy catalogs, as well as infrastructure-aware compression and distillation approaches.
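To make the idea of counterfactual or off-policy evaluation under exposure bias more concrete, the following is a minimal sketch of a clipped inverse-propensity-scoring (IPS) estimator. It assumes that the logging policy's exposure probabilities were recorded alongside each interaction; the function name, argument names, and toy numbers are hypothetical and are not drawn from any of the surveyed systems.

```python
import numpy as np

def ips_estimate(rewards, logging_propensities, target_probs, clip=10.0):
    """Clipped IPS estimate of the value of a target policy from logged data.

    rewards: observed outcome per logged impression (e.g., click = 1).
    logging_propensities: probability with which the logging policy showed the item.
    target_probs: probability with which the target policy would show the same item.
    clip: upper bound on importance weights (an assumed variance-control choice).
    """
    rewards = np.asarray(rewards, dtype=float)
    p_log = np.asarray(logging_propensities, dtype=float)
    p_tgt = np.asarray(target_probs, dtype=float)
    # Re-weight each logged outcome by how much more (or less) likely
    # the target policy is to produce that exposure.
    weights = np.minimum(p_tgt / p_log, clip)
    return float(np.mean(weights * rewards))

# Toy logs (hypothetical): four impressions with recorded propensities.
value = ips_estimate(
    rewards=[1, 0, 0, 1],
    logging_propensities=[0.5, 0.5, 0.25, 0.25],
    target_probs=[0.25, 0.25, 0.5, 0.5],
)
```

Clipping trades a small bias for lower variance; production systems often combine such estimators with learned reward models, which this sketch leaves out.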
Table 1 summarizes representative deep learning model families used in modern e-commerce systems. The models are arranged in the table according to the structural features of e-commerce data that they are designed to capture, such as spatial, sequential, relational, multimodal, or decisional features. The table therefore reflects a data-driven organization of the literature, in which models are chosen according to the features of the data and the associated business objectives.
The model families in Table 1 can be broadly interpreted according to the dominant data structures they address. Convolutional Neural Networks (CNNs) mainly exploit local spatial correlation and are therefore particularly suitable for visual product understanding and image-based search. Recurrent and sequential models, including RNNs, LSTMs, and GRUs, are designed to exploit temporal dependencies and are therefore natural candidates for modelling clickstreams, user behavior, and demand time series. Transformer-based language models generalize this sequential capability through self-attention and can be applied to text data, including reviews, queries, and product descriptions.
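The self-attention operation mentioned above can be sketched in a few lines. This is a minimal single-head version of scaled dot-product attention in NumPy, assuming a short sequence of already-embedded tokens (for example, the tokens of a product query); the projection matrices are random placeholders rather than trained weights.

```python
import numpy as np

def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention.

    x: (sequence_length, d_model) token embeddings.
    w_q, w_k, w_v: (d_model, d_head) projection matrices.
    Returns the attended outputs and the attention weight matrix.
    """
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])
    # Subtract the row-wise maximum for a numerically stable softmax.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 8))  # 5 tokens, 8-dimensional embeddings (toy sizes)
w_q, w_k, w_v = (rng.normal(size=(8, 4)) for _ in range(3))
out, attn = self_attention(x, w_q, w_k, w_v)
```

Each row of `attn` sums to one, so every output position is a weighted mixture of all positions in the sequence, which is what lets the model relate distant tokens without recurrence.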
To capture relational and structural dependencies, Graph Neural Networks (GNNs) propagate information across user–item–entity graphs, which is crucial for applications such as fraud detection, cross-domain recommendation, and catalogue refinement. Reinforcement Learning (RL) and Deep Reinforcement Learning (DRL) differ fundamentally from the other model families because they address sequential decision-making under uncertainty, where actions affect future states and rewards; this makes them suited to applications such as dynamic pricing and logistics optimization. Finally, multimodal and generative representation learning models learn unified representations or generate data samples from different data sources, which supports robust personalization, data augmentation, and anomaly detection in complex e-commerce scenarios.
2.3. Neural Representation Learning
Representation learning enables models to derive meaningful representations of text, images, and other metadata directly from data [55,56,57,58]. In NLP, BERT, RoBERTa, and domain-specific transformer-derived models have proven effective for sentiment analysis, review classification, attribute extraction, and multilingual understanding [32,33,59]. Hybrid CNN-LSTM models for dissatisfaction mining [60], multilingual sentiment models [35,36], and figurative language identification models [38] further illustrate this efficacy.
In computer vision, CNN-based architectures enable product classification, image similarity representation, quality assessment, and analysis of consumer-generated images [21,22,23]. Related solutions include virtual try-on using depth information [61] and GAN-based product image synthesis [62].
Multimodal representation learning integrates various types of data, including text, images, metadata, and behavioral signals, into common representation spaces [49,50]. This benefits a range of tasks, including retrieval, search, catalogue augmentation, recommendation, and anti-fraud analysis. Attribute extraction and catalogue representation using Large Language Models (LLMs) [24,51] reflect the increasing use of LLMs for semantic representation.
2.4. Structured, Sequential, and Graph-Based Models
E-commerce data exhibit complex temporal, relational, and interaction patterns [26,42,63,64,65]. Sequential models such as LSTMs, GRUs, and transformers handle tasks including clickstream prediction, session modelling, churn prediction, and behavior forecasting [27,28,29]. Specialized e-commerce recommenders such as ITS-Rec [66] combine item textual information with sequential models.
GNNs operate on graph representations of the relations between buyers, products, attributes, and merchants. Example applications include causal intention models [40] for fraud detection, identity theft detection [39], and cross-category catalogue reasoning. Graph representation learning is also used in counterfeit goods detection, merchant risk assessment, and global product mapping.
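The way a GNN layer propagates information between connected entities can be illustrated with a single mean-aggregation step. The sketch below is a generic, simplified layer written in NumPy, not the method of any cited work; the three-node graph (one buyer linked to two products) and the identity feature and weight matrices are toy assumptions chosen so the result is easy to inspect.

```python
import numpy as np

def mean_aggregate_layer(adj, features, weight):
    """One simplified GNN layer: average self and neighbour features, project, apply ReLU.

    adj: (n, n) symmetric 0/1 adjacency matrix without self-loops.
    features: (n, d_in) node feature matrix.
    weight: (d_in, d_out) projection matrix (would be learned in practice).
    """
    a_hat = adj + np.eye(adj.shape[0])        # add self-loops
    deg = a_hat.sum(axis=1, keepdims=True)    # neighbourhood sizes
    h = (a_hat / deg) @ features @ weight     # mean over neighbourhood, then project
    return np.maximum(h, 0.0)                 # ReLU non-linearity

# Toy graph (hypothetical): node 0 is a buyer, nodes 1 and 2 are products.
adj = np.array([[0, 1, 1],
                [1, 0, 0],
                [1, 0, 0]], dtype=float)
features = np.eye(3)   # one-hot node features
weight = np.eye(3)     # identity projection, for readability only
hidden = mean_aggregate_layer(adj, features, weight)
```

After one layer the buyer's representation mixes in both products, and each product's representation mixes in the buyer; stacking layers extends this to multi-hop neighbourhoods.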
Models that integrate sequential encoders, graph reasoning, and multimodal fusion, including causal variational autoencoder-based GNN hybrids [40] and multimodal factorization systems [50], combine temporal, graph, and semantic representations into a single prediction.
2.5. Decision Intelligence and Reinforcement Learning
Decision intelligence refers to prescriptive modelling frameworks that select actions under objectives and constraints, often combining predictive components (e.g., demand models, churn models, value models) with optimization, simulation, or policy learning. Within this broader class, RL/DRL is a prominent approach when decisions are sequential and rewards are delayed.
RL is therefore at the core of many e-commerce decision-support tasks [67,68,69]. In dynamic pricing [70,71], feedback-driven policies adjust prices in response to demand elasticity and market competition. In warehouse automation, RL can optimize where goods are stored [43], how robots assemble orders [44], and where facilities are located. Microservice environments also use RL for adaptive rate control and traffic management [45]. Multi-agent RL frameworks further enable secure collaboration within IoT-based logistics and sensor environments [46], which has implications for distributed e-commerce operations. As e-commerce becomes increasingly real-time and dynamic, RL is likely to become a workhorse technology for end-to-end decision intelligence, spanning pricing, logistics, inventory, and anti-fraud strategies.
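As a deliberately simplified illustration of learning a pricing policy from feedback, the sketch below uses an epsilon-greedy multi-armed bandit over a small set of candidate prices. It is a stateless stand-in for the richer RL/DRL formulations discussed above, and the price grid, purchase probabilities, and function name are hypothetical values invented for the example.

```python
import numpy as np

def run_pricing_bandit(prices, buy_prob, rounds=2000, epsilon=0.1, seed=0):
    """Epsilon-greedy bandit that learns which price yields the most revenue.

    prices: candidate price points (the bandit's arms).
    buy_prob: simulated purchase probability at each price (unknown to the learner).
    Returns per-price pull counts and running mean revenue estimates.
    """
    rng = np.random.default_rng(seed)
    prices = np.asarray(prices, dtype=float)
    counts = np.zeros(len(prices))
    mean_revenue = np.zeros(len(prices))
    for _ in range(rounds):
        if rng.random() < epsilon:
            arm = int(rng.integers(len(prices)))   # explore a random price
        else:
            arm = int(np.argmax(mean_revenue))     # exploit the current best estimate
        sold = rng.random() < buy_prob[arm]        # simulated customer response
        revenue = prices[arm] * sold
        counts[arm] += 1
        # Incremental update of the running mean revenue for this price.
        mean_revenue[arm] += (revenue - mean_revenue[arm]) / counts[arm]
    return counts, mean_revenue

# Hypothetical demand curve: higher prices convert less often.
counts, mean_revenue = run_pricing_bandit(prices=[8, 10, 12], buy_prob=[0.9, 0.6, 0.3])
```

In this toy setup the expected revenues are 7.2, 6.0, and 3.6, so the lowest price is the best arm. A full RL treatment would add state (inventory, seasonality, competitor prices) and delayed rewards, which a bandit ignores.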
2.6. Scalability, Security, and Distributed Foundations
Global e-commerce platforms require scalable DL models that support low-latency inference and reliable operation [72,73,74]. Distributed and federated learning frameworks support regional, privacy-preserving model development while maintaining global consistency [46]. Edge and 6G-enabled IoT models move computation close to sensors, enabling near-real-time anomaly detection for logistics and deliveries [75,76].
Security-centered foundations include deep anomaly detection [77], Online-to-Offline (O2O) fraud detection [78], multimodal adversarial defense [24,79], and vision-based image authenticity verification [21].
Scalability issues also push forward research on model compression, distillation, memory-efficient transformers, and distributed inference approaches, all of which are needed for real-world applications on global microservice architectures.
Figure 2 summarizes the methodological landscape of deep learning in e-commerce through a layered taxonomy that connects data modalities, model families, application tasks, and system-level objectives. The diagram emphasizes how distinct data types, such as behavioral logs, transactional records, user-generated content, relational graphs, and operational context, naturally align with specific deep learning paradigms, including sequential transformers, embedding-based models, GNNs, multimodal learning, Self-Supervised Learning (SSL) and RL, as well as explainable and trust-aware models. These methods collectively enable a broad spectrum of e-commerce tasks, ranging from forecasting and personalization to fraud detection, catalogue intelligence, and decision optimization, ultimately supporting objectives related to efficiency, revenue, risk management, and responsible AI.
Table 2 summarizes the relationships between e-commerce data modalities and deep learning model families.
Table 3 provides a comparative overview of the main deep learning architectures used in e-commerce, highlighting how their structural properties align with different data modalities and application requirements.
3. Consumer Behavior Prediction and Demand Forecasting
Accurate consumer behavior prediction and demand forecasting constitute foundational components of intelligent e-commerce systems, directly influencing recommendation quality, inventory management, pricing strategies, and customer retention [
14]. With the increasing availability of large-scale, high-frequency interaction data, DL has significantly reshaped how online platforms model user intent, purchasing likelihood, satisfaction, and future demand. Unlike traditional statistical or shallow machine learning approaches, deep neural models are capable of capturing non-linear dependencies, long-term temporal patterns, and complex interactions across heterogeneous data sources. Recent advances in this domain can be broadly categorized into three complementary directions: (i) behavioral modelling and user intent prediction, (ii) demand forecasting and market dynamics, and (iii) multimodal, cross-border, and causal behavior analysis.
3.1. Behavioral Modelling and User Intent Prediction
Behavioral modelling focuses on understanding how users interact with e-commerce platforms across browsing, searching, and purchasing stages [
14,
17,
80,
81,
82]. Deep sequential architectures, such as RNNs, Gated Recurrent Units (GRUs), and transformer-based models, have proven particularly effective in capturing session-level and long-term behavioral dependencies. For instance, CrossNet [
27] introduces a hybrid neural architecture for cross-domain shopping behavior prediction, enabling the transfer of behavioral knowledge across platforms. Similarly, ITS-Rec [
66] integrates item textual information into sequential recommendation models, improving intent inference by combining interaction histories with semantic product features.
User retention and churn prediction further extend behavioral modelling by focusing on post-purchase engagement and dissatisfaction signals. Bi-directional LSTM-based churn prediction frameworks [
29] demonstrate how contextualized temporal modelling can significantly improve customer retention strategies. Hybrid approaches that fuse textual reviews, metadata, and behavioral embeddings [
28,
32] provide richer representations of user sentiment and engagement, enabling early identification of dissatisfaction and churn risk. Studies on dissatisfaction detection using review analytics [
60] further highlight the importance of post-purchase feedback in understanding behavioral trajectories.
Importantly, behavioral modelling must account for linguistic and cultural diversity. Multilingual sentiment analysis frameworks for Bangla and Arabic e-commerce data [
35,
36] demonstrate that user intent modelling cannot rely solely on English-centric datasets. Finally, graph-based and causal learning approaches offer deeper interpretability by modelling relational and decision-driven behavior. Explainable causal variational autoencoders combined with equivariant graph NNs [
40] enable the identification of structural causes behind purchase decisions, moving beyond correlation-based inference.
3.2. Demand Forecasting and Market Dynamics
Demand forecasting remains critical for pricing, procurement, inventory planning, and supply chain optimization in e-commerce [
3,
83,
84,
85,
86]. DL models such as LSTMs, GRUs, and transformers have shown strong performance in capturing seasonality, trend shifts, and promotional effects across multiple time scales. A notable contribution is the deep learning-based approach for forecasting intermittent online sales proposed by Ahmadov and Helo [
4], which addresses sparsity and volatility, two common challenges in e-commerce demand data.
Hybrid forecasting architectures that combine temporal signals with operational features have also gained traction. Akshara et al. [
30] demonstrate how DL can improve product fulfillment forecasting by integrating order patterns with logistics constraints. Beyond micro-level demand signals, macroeconomic-aware forecasting models [
31] incorporate socio-economic indicators, weather conditions, and market trends, highlighting the importance of external context in long-term demand prediction.
Domain-specific demand forecasting applications further illustrate the versatility of deep learning. For example, skincare product demand modelling [
53] leverages deep representations to capture preference heterogeneity in niche markets. Demand forecasts also directly inform dynamic pricing strategies. Sequential recommender systems that incorporate price dynamics [
71] demonstrate how demand elasticity and competitive shifts can be exploited for revenue optimization. RL-based pricing models [
9,
10] further extend forecasting by jointly optimizing future demand and long-term revenue through policy learning.
3.3. Multimodal, Cross-Border, and Causal Behavior Analysis
Recent research increasingly emphasizes multimodal and cross-border behavior analysis as e-commerce platforms expand globally and incorporate richer content [
20,
87,
88,
89,
90]. Multimodal learning frameworks integrate text, images, attributes, and user-generated multimedia to enhance behavioral prediction. For instance, Sun and Liu [
49] propose an explainable multimodal model that combines visual and textual signals to predict image helpfulness and product sales. Attribute-centric large language models for product categorization [
51] further enrich behavioral representations by extracting fine-grained semantic attributes.
Cross-border e-commerce introduces additional complexity due to linguistic, cultural, and economic heterogeneity. DL embeddings for cross-border customer satisfaction prediction [
91] reveal significant regional variations in purchasing behavior. Studies on cross-market recommendation and localization [
92] demonstrate that globally deployable models must be sensitive to cultural norms and language-specific cues to remain effective.
Finally, causal modelling has emerged as a promising direction for understanding the mechanisms underlying consumer behavior. Causal graph-based approaches [
40] enable the disentanglement of cause–effect relationships such as price changes leading to demand shifts or dissatisfaction driving churn. These models offer significant advantages for decision intelligence, as they support counterfactual reasoning and policy evaluation rather than purely predictive accuracy.
In summary, consumer behavior prediction and demand forecasting have evolved from isolated predictive tasks into complex, multi-dimensional learning problems encompassing sequential modelling, multimodal fusion, cross-border representation learning, and causal reasoning. DL architectures now enable richer and more actionable insights into user intent, purchasing dynamics, and market behavior. These advances provide a critical foundation for downstream e-commerce applications, including recommendation, pricing, logistics, and retention, thereby positioning predictive intelligence as a cornerstone of data-driven decision-making in modern digital commerce.
4. Recommendation Systems
Recommendation systems constitute a central intelligence layer in modern e-commerce platforms, enabling product discovery, personalization, user engagement, and revenue optimization. Early recommender systems relied primarily on static heuristics or shallow collaborative filtering techniques, which often struggled to model complex user preferences, contextual dependencies, and rapidly evolving catalogues. The advent of DL has fundamentally transformed recommendation engines into dynamic, data-driven, and context-aware systems capable of jointly modelling user intent, item semantics, and global interaction patterns. Recent research advances in this domain can be broadly organized into three interrelated research directions: (i) modelling user preferences and interaction signals, (ii) multimodal and graph-based recommendation, and (iii) personalization, evaluation, and cross-border recommendation.
4.1. Modelling User Preferences and Interaction Signals
Modelling user preferences lies at the core of neural recommendation systems and focuses on learning latent representations that capture user interests and item characteristics from interaction data [
11,
64]. DL approaches extend classical collaborative filtering by incorporating representation learning, sequential modelling, and hybrid architectures. Neural collaborative filtering and representation-based matrix factorization methods have laid the groundwork for learning expressive user–item embeddings, while more recent sequential models leverage GRUs, LSTMs, and transformers to capture temporal dynamics in browsing, clicking, and purchasing behavior [
2].
Sequential recommendation models have demonstrated substantial performance gains by explicitly modelling session-level and long-term interaction dependencies. For example, CrossNet [
27] introduces a hybrid neural architecture for cross-domain recommendation that transfers behavioral knowledge across platforms, while ITS-Rec [
66] integrates item textual information into sequential transformers to enhance intent-aware recommendations. Similarly, GRUs and BiLSTM-based frameworks [
28,
29] effectively capture temporal preference evolution and have been successfully applied to both recommendation and churn-aware personalization tasks.
Beyond interaction signals, behavior-driven recommenders increasingly incorporate sentiment and post-purchase feedback to enrich user preference profiles. Models leveraging sentiment embeddings [
32], dissatisfaction signals extracted from reviews [
60], and transformer-based sentiment representations [
33] demonstrate that affective cues significantly improve recommendation relevance. Domain-specific recommenders, for skincare products [
53], lifestyle goods [
93], and tourism platforms [
94], further validate that incorporating domain knowledge and side information leads to more accurate and interpretable recommendations.
4.2. Multimodal and Graph-Based Recommendation
As e-commerce content becomes increasingly rich, multimodal recommendation has emerged as a key research direction [
55,
95]. Multimodal systems integrate visual, textual, and attribute-level information to overcome sparsity in interaction data and improve item representation quality. Image-based recommendation approaches employ CNNs and vision transformers to capture visual similarity and aesthetic features, supporting applications such as outfit recommendation and visual search [
21,
22,
23].
Textual and attribute-based representations learned using BERT-style models and large language models enable fine-grained semantic understanding of product descriptions and attributes [
24,
51]. Fusion architectures that combine visual, textual, and structured attributes, often through cross-attention mechanisms or shared latent spaces, have shown superior performance in complex recommendation scenarios [
49,
50].
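The cross-attention style of fusion mentioned above can be illustrated with a minimal sketch. It assumes a single text embedding acting as the query over a few image-region embeddings; the function name `cross_attention_fuse`, the toy vectors, and the omission of learned projection matrices are simplifications made here and are not taken from the cited models.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attention_fuse(text_vec, image_regions):
    """Fuse one text embedding with several image-region embeddings.

    The text vector is the query; each image region serves as both key
    and value (learned projections are omitted in this sketch).
    Returns (fused_vector, attention_weights).
    """
    dim = len(text_vec)
    scale = math.sqrt(dim)
    # Scaled dot-product score between the query and every region.
    scores = [sum(q * k for q, k in zip(text_vec, region)) / scale
              for region in image_regions]
    weights = softmax(scores)
    # Attention-weighted sum of the region vectors (the visual context).
    context = [sum(w * region[i] for w, region in zip(weights, image_regions))
               for i in range(dim)]
    # Concatenate the text vector with its visual context.
    return text_vec + context, weights

# Toy example: the text embedding aligns with the second image region.
text = [0.0, 1.0, 0.0, 0.0]
regions = [[1.0, 0.0, 0.0, 0.0],
           [0.0, 2.0, 0.0, 0.0],
           [0.0, 0.0, 1.0, 0.0]]
fused, attn = cross_attention_fuse(text, regions)
```

In this toy case the second region receives the largest attention weight, so the fused representation is dominated by the image area that matches the text.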
GNNs further enhance recommendation systems by explicitly modelling relational structures among users, items, brands, categories, and merchants. By propagating information along interaction graphs, GNNs capture higher-order dependencies that are difficult to model with sequence-based approaches alone. Causal GNN frameworks [
40] provide interpretable insights into decision-making processes, while relational graph models [
39] are particularly effective in multi-merchant and cross-category environments. Hybrid recommenders that integrate multimodal learning with graph-based representations represent a promising direction for holistic preference modelling.
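To make the idea of propagating information along an interaction graph concrete, the following sketch applies plain mean aggregation over a tiny user-item graph. It is an illustration only: the graph, the node names, and the `propagate` helper are hypothetical, and practical GNNs add learned weights, normalization, and non-linearities.

```python
def propagate(embeddings, neighbors):
    """One round of mean-aggregation message passing.

    Each node's new embedding is the average of its neighbours' current
    embeddings (no learned weights or non-linearity in this sketch).
    """
    updated = {}
    for node, nbrs in neighbors.items():
        dim = len(embeddings[node])
        updated[node] = [sum(embeddings[n][i] for n in nbrs) / len(nbrs)
                         for i in range(dim)]
    return updated

# Hypothetical bipartite interaction graph: u1 - i1 - u2 - i2.
neighbors = {
    "u1": ["i1"],
    "u2": ["i1", "i2"],
    "i1": ["u1", "u2"],
    "i2": ["u2"],
}

# One-hot initial embeddings make it easy to see whose signal arrives where.
nodes = ["u1", "u2", "i1", "i2"]
layer0 = {n: [1.0 if j == k else 0.0 for j in range(4)]
          for k, n in enumerate(nodes)}

layer1 = propagate(layer0, neighbors)
layer2 = propagate(layer1, neighbors)
layer3 = propagate(layer2, neighbors)
```

After one round, `u1` only reflects the item it interacted with. After two rounds it carries signal from `u2`, who shares that item, and after three rounds it carries signal from `i2`, an item `u1` never touched. This is the higher-order dependency that a purely sequential model of `u1`'s own history would not see.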
4.3. Personalization, Evaluation, and Cross-Border Recommendation
Personalization becomes increasingly challenging as e-commerce platforms expand across geographic regions, languages, and cultures [
87,
96]. Cross-border recommendation requires models that can generalize across heterogeneous markets while remaining sensitive to regional preferences. Studies on cross-market recommendation and customer satisfaction prediction [
91,
92] demonstrate that consumer behavior varies substantially across cultures, necessitating adaptive and localized recommendation strategies. Multilingual sentiment analysis frameworks [
35,
36] further enable culturally aware personalization by aligning cross-lingual representations of user feedback.
Evaluation practices play a critical role in the deployment of recommendation systems. While offline ranking metrics such as NDCG, HR@k, and MRR remain standard, they often fail to capture real-world business impact. Consequently, online evaluation through A/B testing, measured by engagement lift, conversion rate, and retention, is increasingly emphasized. Domain-specific evaluation frameworks [
93] and multimodal evaluation methodologies [
49] provide more informative assessments for specialized recommendation scenarios.
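For reference, the offline ranking metrics named above can be computed as in the following sketch. It assumes the common leave-one-out protocol with a single held-out relevant item per user, in which case the ideal DCG equals 1; the user names, item IDs, and function names are illustrative.

```python
import math

def hit_rate_at_k(ranked_items, target, k):
    """1.0 if the held-out target appears in the top-k list, else 0.0."""
    return 1.0 if target in ranked_items[:k] else 0.0

def ndcg_at_k(ranked_items, target, k):
    """NDCG@k with a single relevant item, so the ideal DCG is 1."""
    for position, item in enumerate(ranked_items[:k]):
        if item == target:
            return 1.0 / math.log2(position + 2)
    return 0.0

def reciprocal_rank(ranked_items, target):
    """1 / rank of the target (1-indexed), or 0.0 if it is absent."""
    for position, item in enumerate(ranked_items):
        if item == target:
            return 1.0 / (position + 1)
    return 0.0

def evaluate(recommendations, held_out, k):
    """Average HR@k, NDCG@k and MRR over all users."""
    users = list(held_out)
    n = len(users)
    hr = sum(hit_rate_at_k(recommendations[u], held_out[u], k) for u in users) / n
    ndcg = sum(ndcg_at_k(recommendations[u], held_out[u], k) for u in users) / n
    mrr = sum(reciprocal_rank(recommendations[u], held_out[u]) for u in users) / n
    return {"HR@k": hr, "NDCG@k": ndcg, "MRR": mrr}

# Hypothetical ranked lists and held-out purchases.
recommendations = {
    "alice": ["p3", "p1", "p7"],
    "bob": ["p2", "p9", "p4"],
}
held_out = {"alice": "p1", "bob": "p4"}
metrics = evaluate(recommendations, held_out, k=2)
```

Here the held-out item for `bob` sits at rank 3, so it contributes to MRR but not to HR@2 or NDCG@2, which shows how the choice of cutoff changes what the offline numbers reward.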
Finally, fairness, robustness, and interpretability have emerged as essential considerations in modern recommender systems. Explainable recommendation models leveraging attention mechanisms and attribute-level reasoning [
49] improve transparency and user trust. Causal assessment frameworks [
40] enable counterfactual reasoning, while robustness studies addressing adversarial manipulation and fake review detection [
24,
79] help ensure reliability in real-world deployments.
In summary, DL has transformed recommendation systems into multimodal, context-aware, and globally adaptive engines capable of modelling complex user preferences and item semantics. Advances in sequential modelling, multimodal representation learning, and graph-based reasoning have significantly improved recommendation accuracy and interpretability. At the same time, emerging research on cross-border personalization, robust evaluation, and fairness-aware design is enabling recommendation systems to scale responsibly across global e-commerce ecosystems. These developments position deep learning-based recommendation systems as a cornerstone of intelligent, decision-driven digital commerce.
5. Sentiment, Review Intelligence, and User Opinion Modelling
User-generated content, including textual reviews, ratings, images, and multimedia opinions, constitutes a critical source of information for consumer trust, product understanding, and platform credibility in e-commerce ecosystems. Beyond reflecting satisfaction levels, reviews encode nuanced emotional states, subjective preferences, cultural norms, and implicit decision signals. Recent advances in DL have substantially improved the ability to interpret affective tone, figurative language, visual sentiment, and cross-lingual expressions at scale. This section synthesizes key developments in e-commerce-oriented sentiment analysis and opinion modelling across three interrelated research directions: (i) linguistic and semantic sentiment modelling, (ii) multimodal and attribute-enhanced review intelligence, and (iii) multilingual, deceptive, and cross-cultural opinion analysis.
5.1. Linguistic and Semantic Modelling of Sentiment
Transformer-based architectures, including BERT, RoBERTa, and domain-adapted variants, now form the dominant paradigm for textual sentiment analysis in e-commerce platforms [
97,
98]. These models excel at capturing contextual polarity shifts, emotional subtleties, and long-range dependencies present in consumer reviews. Hybrid architectures that combine transformers with CNNs or recurrent layers further enhance robustness by jointly modelling local lexical patterns and sequential dependencies. Representative examples include CNN–BiLSTM frameworks for dissatisfaction analysis [
60], BERT–BiGRU sentiment pipelines [
32], and adaptive semantic embeddings tailored to review corpora [
33].
Beyond explicit polarity detection, recent work addresses more challenging linguistic phenomena such as sarcasm, figurative language, and implicit sentiment. Hybrid DL and contextual feature-based approaches have demonstrated effectiveness in identifying figurative expressions commonly used in stylized or emotionally charged reviews [
38]. Weak supervision, attention mechanisms, and category-aware fine-tuning strategies further improve generalization across product types, price ranges, and user segments. Domain-adaptive sentiment studies in skincare [
53] and lifestyle products [
93] highlight the importance of incorporating product semantics and category-specific language into sentiment representations.
5.2. Multimodal and Attribute-Enhanced Review Understanding
As user opinions increasingly include images and structured feedback, multimodal sentiment analysis has emerged as a critical extension of text-centric models [
99,
100]. Multimodal review intelligence integrates textual, visual, and metadata features to capture richer signals of satisfaction, authenticity, and trustworthiness. Vision-based sentiment analysis systems evaluate image quality, visual appeal, and implicit emotional cues embedded in user-uploaded images [
21,
23]. These approaches are particularly valuable for fashion, lifestyle, and experiential products where visual perception strongly influences sentiment.
Attribute extraction and product understanding models further enhance sentiment interpretability by grounding opinions in fine-grained product features. Large language models and attribute-aware transformers [
24,
51] link review content to structured attributes, enabling precise identification of dissatisfaction sources and preference drivers. Such attribute-grounded sentiment modelling supports explainability, targeted quality improvement, and feature-level personalization.
Graph-based opinion modelling introduces an additional relational perspective by connecting users, products, attributes, and sentiment expressions. Causal graph frameworks [
40] reveal latent factors influencing consumer opinions, while graph-based clustering improves aggregation of sentiment across merchants, categories, and markets. These approaches enable higher-level reasoning over opinion ecosystems rather than isolated reviews.
5.3. Multilingual, Deceptive and Cross-Cultural Opinion Modelling
The globalization of e-commerce necessitates sentiment models that generalize across languages, cultures, and regional expression norms [
101,
102]. Multilingual sentiment analysis frameworks address challenges such as code-switching, translation ambiguity, and culturally specific emotional expressions. Recent studies demonstrate effective cross-lingual sentiment classification using multilingual transformers and cascaded domain-based pipelines for low-resource languages [
35,
36,
37].
Deceptive opinion detection represents another critical challenge, as fake or manipulated reviews can significantly distort consumer perception. Contemporary approaches employ chaos-enhanced CNNs [
24], hybrid transformer–LSTM architectures [
25], and large language model-assisted fraud detection [
79]. Complementary signals, such as review helpfulness [
49] and metadata-driven anomaly detection [
78], further improve robustness against adversarial behavior.
Cross-cultural sentiment modelling reveals substantial variation in rating distributions, linguistic expression, and satisfaction thresholds across regions. Studies on cross-border customer satisfaction [
91,
103] emphasize the need for region-aware sentiment normalization, culturally adaptive embeddings, and global emotion lexicons. Addressing these challenges remains an open research direction for truly global opinion intelligence systems.
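One simple reading of region-aware sentiment normalization is to express each rating relative to the rating distribution of its own region. The sketch below uses per-region z-scores as an assumed baseline; it is not the method of the cited studies, and the region labels and ratings are made up for illustration.

```python
from statistics import mean, pstdev

def normalize_by_region(reviews):
    """Convert raw star ratings to z-scores within each region.

    `reviews` is a list of (region, rating) pairs. Regions whose ratings
    have zero variance map to 0.0.
    """
    by_region = {}
    for region, rating in reviews:
        by_region.setdefault(region, []).append(rating)
    # Per-region mean and population standard deviation.
    stats = {r: (mean(v), pstdev(v)) for r, v in by_region.items()}
    normalized = []
    for region, rating in reviews:
        mu, sigma = stats[region]
        z = (rating - mu) / sigma if sigma > 0 else 0.0
        normalized.append((region, rating, z))
    return normalized

# Hypothetical data: region "A" rates generously, region "B" rates strictly.
reviews = [("A", 5), ("A", 5), ("A", 4), ("A", 5),
           ("B", 3), ("B", 4), ("B", 2), ("B", 3)]
result = normalize_by_region(reviews)
```

After normalization, a 4-star rating from region "A" falls below its regional norm while a 4-star rating from region "B" falls above it, so the same raw score no longer implies the same satisfaction level.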
In summary, DL has elevated sentiment analysis and opinion modelling from basic polarity classification to holistic review intelligence frameworks that integrate linguistic nuance, multimodal perception, attribute grounding, and cross-cultural awareness. These advances directly support improved recommendation quality, fraud detection, customer experience management, and global personalization. As e-commerce platforms increasingly rely on opinion-driven decision intelligence, sentiment analysis emerges as a foundational component for trustworthy, explainable, and globally scalable digital commerce systems.
6. Trust, Security, and Anomaly Detection in e-Commerce
Trust and security represent foundational pillars of modern e-commerce ecosystems, underpinning consumer confidence, transactional integrity, and platform sustainability. E-commerce systems must continuously defend against a broad spectrum of threats, including fraudulent reviews, abusive or compromised accounts, counterfeit products, financial irregularities, logistics manipulation, and cyber–physical attacks on interconnected infrastructures. Traditional rule-based security mechanisms struggle to scale under the volume, heterogeneity, and adaptive nature of such threats. In response, DL has emerged as a core enabling technology for intelligent threat detection, behavioral risk modelling, and large-scale monitoring across digital and physical commerce layers. This section synthesizes recent advances across three tightly coupled research directions: (i) review integrity and content trustworthiness, (ii) behavioral anomaly detection and fraud risk modelling, and (iii) cyber–physical security and logistics integrity.
6.1. Review Integrity, Deception Detection, and Content Trustworthiness
User-generated content, including reviews, ratings, images, and multimedia feedback, plays a decisive role in purchase decisions but remains highly vulnerable to manipulation and coordinated fraud [
104,
105]. Deep learning-based deception detection has significantly improved robustness against forged reviews, spam campaigns, and group-based attacks. Recent approaches demonstrate that chaos-enhanced CNN classifiers [
24], large language model (LLM)-based reasoning frameworks [
79], and hybrid ConvRoBERTa–LSTM architectures [
25] achieve substantial gains over classical classifiers. Complementary fuzzy NN models [
106] further enhance detection in noisy or ambiguous review environments.
Multilingual and bidialectal e-commerce platforms introduce additional complexity, requiring models that capture language-specific sentiment markers and deceptive patterns. Domain-adapted frameworks for Arabic [
35] and Bangla [
36] demonstrate that culturally and linguistically aware modelling is essential for maintaining global platform trust. Beyond explicit deception detection, trust calibration mechanisms leverage transformer-based sentiment consistency checks [
32,
33] and multimodal review helpfulness estimation [
49] to assess post-publication reliability.
Visual authenticity verification further strengthens content trustworthiness by detecting manipulated or misleading images uploaded in reviews. Image-based verification models [
23], combined with attribute extraction and consistency checks [
51], support early detection of tampering, misrepresentation, and counterfeit presentation, thereby protecting both consumers and merchants.
6.2. Behavioral Anomaly Detection, Fraudulent Activity, and Networked Risk Modelling
Beyond content-level threats, e-commerce platforms face continuous exposure to behavioral anomalies such as account takeovers, bot-driven abuse, coupon exploitation, merchant collusion, and abnormal transaction patterns [
107,
108]. Deep learning-based anomaly detection frameworks employ LSTMs and transformers to model temporal behavior in event logs and interaction streams, enabling early detection of deviations from normative user or merchant behavior [
77].
GNNs play a critical role in uncovering coordinated and relational fraud. Graph-based risk models identify collusive merchants, shared identities, and cross-platform abuse patterns by reasoning over interaction networks [
39]. More recently, causal variational autoencoder-based GNN architectures [
40] have introduced explainability into anomaly detection by isolating latent causes behind suspicious behaviors rather than relying solely on correlation.
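The relational intuition behind graph-based risk models can be shown without a neural network. The sketch below is a non-learned stand-in, not a GNN: it links accounts that share identifiers and reports connected components above a size threshold. The account names, identifier strings, and the `min_size` parameter are hypothetical.

```python
from collections import defaultdict

def shared_identity_clusters(account_identifiers, min_size=3):
    """Group accounts linked, directly or transitively, by shared identifiers."""
    # Invert the mapping: identifier -> accounts that use it.
    by_identifier = defaultdict(set)
    for account, identifiers in account_identifiers.items():
        for ident in identifiers:
            by_identifier[ident].add(account)

    # Two accounts are adjacent if they share at least one identifier.
    adjacency = defaultdict(set)
    for accounts in by_identifier.values():
        for account in accounts:
            adjacency[account].update(accounts - {account})

    # Depth-first search to collect connected components.
    seen, clusters = set(), []
    for start in account_identifiers:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacency[node] - component)
        seen |= component
        if len(component) >= min_size:
            clusters.append(sorted(component))
    return clusters

# Hypothetical merchants with device and payment identifiers.
accounts = {
    "m1": {"device:a", "card:1"},
    "m2": {"device:a"},
    "m3": {"card:1", "device:b"},
    "m4": {"device:c"},
    "m5": {"device:b"},
}
clusters = shared_identity_clusters(accounts)
```

Merchants `m2` and `m5` share no identifier directly, yet they land in the same cluster through `m1` and `m3`. A GNN generalizes this kind of multi-hop linkage by learning which paths and attributes are actually indicative of collusion.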
Financial and transactional risk modelling further extends anomaly detection into mission-critical domains. DL approaches to financial fraud [
109], online-to-offline merchant verification [
78], and cross-border refund irregularity prediction [
91] highlight the necessity of adaptive, data-driven security models. Multilingual sentiment cues and deceptive language patterns [
37,
60] enrich behavioral risk scores, while RL-based rate limiting for microservices [
45] safeguards platform infrastructure in real time.
Product authenticity detection also intersects with behavioral security. CNN-based authenticity scoring [
21] and multimodal consistency analysis [
49,
51] enable identification of counterfeit or altered goods, strengthening catalogue integrity and consumer safety.
6.3. Cyber–Physical Security, IoT Anomalies, and Logistics Integrity
E-commerce logistics increasingly operate as complex cyber–physical systems (CPSs), integrating warehouses, autonomous robots, IoT sensors, vehicles, edge devices, and next-generation communication networks [
72,
110]. These systems are exposed to adversarial threats such as sensor spoofing, route manipulation, infrastructure sabotage, and coordinated CPS fraud. DL has proven effective for anomaly detection in IoT-enabled commerce environments [
76], sensor placement optimization [
111], and distributed logistics intelligence supported by 6G networks [
75].
RL-driven warehouse control systems [
43] and automated batching and routing algorithms [
44] introduce new attack surfaces, making robustness and out-of-distribution detection critical research challenges. At the macro level, deep learning-based forecasting and risk analytics [
52,
112] enable early detection of supply chain disruptions, cross-border irregularities, and anomalous logistics flows.
Trust reinforcement mechanisms further integrate multimodal product verification [
21], cross-border metadata alignment [
113], and graph-based merchant risk scoring [
39]. Together, these approaches establish multi-layered protection across digital, transactional, and physical commerce infrastructures.
In summary, DL has become the backbone of trust, security, and anomaly detection in e-commerce ecosystems. Advances in deception detection, behavioral risk modelling, graph-based fraud reasoning, and cyber–physical security enable comprehensive protection across content, transaction, infrastructure, and logistics layers. As threats grow increasingly adaptive and cross-border in nature, future research must emphasize robustness, explainability, and global scalability to maintain trust in large-scale, intelligent e-commerce platforms.
7. Catalogue Intelligence and Product Understanding
Catalogue intelligence constitutes a core pillar of modern e-commerce platforms, directly influencing product discovery, search relevance, recommendation accuracy, pricing strategies, and global personalization. As online catalogues scale to millions of heterogeneous items across markets and languages, traditional rule-based or manually curated approaches become infeasible. In this context, DL has emerged as a dominant paradigm for enabling automated product understanding, supporting representation learning, attribute extraction, quality assessment, and multimodal reasoning over catalogue data. This section reviews recent advances in catalogue intelligence, organizing the literature into three interrelated research directions: (i) multimodal product representation and understanding, (ii) attribute extraction, enrichment, and semantic normalization, and (iii) catalogue structuring, product quality assessment, and consistency modelling.
7.1. Multimodal Product Representation and Understanding
Contemporary e-commerce platforms rely on rich multimodal product information, including textual descriptions, images, and contextual metadata [
56,
88]. DL models enable the construction of unified latent representations that jointly capture semantic, visual, and contextual aspects of products. Vision-based approaches employ CNNs and Vision Transformer (ViT) architectures to model product appearance, visual quality, and fine-grained attributes, demonstrating strong performance in visual search, recommendation, and quality evaluation tasks [
21,
22,
61]. In parallel, image-based verification frameworks detect manipulated or low-quality user-uploaded images through deep similarity scoring and anomaly detection mechanisms, contributing to catalogue trustworthiness [
23].
Textual product understanding has been significantly advanced through transformer-based language models such as BERT and RoBERTa, which extract semantic information from titles, descriptions, and specifications [
32,
33]. To bridge visual and textual modalities, multimodal fusion models integrate image and text embeddings using cross-attention mechanisms or shared co-embedding spaces, enabling more expressive and context-aware product representations [
49,
50]. Beyond pairwise fusion, graph-based approaches further enhance multimodal catalogue intelligence by modelling relational structures among products. GNNs support reasoning over product co-occurrence, similarity relationships, and hierarchical taxonomies, facilitating improved categorization and inference across brands and categories [
39,
40].
7.2. Attribute Extraction, Enrichment, and Semantic Normalization
Accurate and complete attribute information is essential for effective search, filtering, comparison, and analytics. Missing or inconsistently represented attributes remain a major bottleneck in large-scale catalogues. DL has substantially improved attribute extraction through transformer-based and hybrid multimodal models capable of mapping unstructured textual content to structured product schemas. Large language model-based frameworks achieve scalable and precise attribute extraction by aligning free-form descriptions with predefined attribute ontologies [
51]. Complementarily, vision-based attribute extraction using convolutional and attention-based architectures enables fine-grained recognition of visual characteristics directly from product images [
21].
Semantic normalization addresses inconsistencies arising from heterogeneous naming conventions across sellers, regions, and languages. Cross-border catalogue integration benefits from multilingual attribute extraction and normalization frameworks that harmonize product metadata across markets [
36,
113]. Furthermore, sentiment and opinion modelling at the attribute level enhances catalogue enrichment by linking customer feedback to specific product features, enabling more nuanced quality assessment and user insight extraction [
24,
60]. Recent hybrid systems combine multimodal embeddings, graph-based co-occurrence statistics, and rule-aware transformers to support downstream catalogue operations such as category mapping, deduplication, facet generation, and SKU-level expansion.
7.3. Catalogue Structuring, Product Quality Assessment, and Consistency Modelling
Beyond representation and enrichment, catalogue intelligence requires mechanisms for ensuring structural integrity and internal consistency at scale. GNNs and causal reasoning frameworks play a central role in catalogue structuring by enabling deduplication, product grouping, catalogue inference, and the detection of irregular or anomalous items [
40]. Consistency modelling techniques identify mismatches across modalities, such as discrepancies between images, titles, descriptions, and extracted attributes, thereby improving catalogue reliability [
49,
51].
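A cross-modal consistency check of this kind can be sketched as a similarity threshold between image and text embeddings. The example assumes both embeddings already live in a shared space produced by some co-embedding model; the vectors, SKU identifiers, and the 0.5 threshold are illustrative choices rather than values from the cited work.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def flag_inconsistent(listings, threshold=0.5):
    """Return IDs of listings whose image and title embeddings disagree."""
    return [item_id
            for item_id, (image_vec, text_vec) in listings.items()
            if cosine(image_vec, text_vec) < threshold]

# Hypothetical (image embedding, title embedding) pairs in a shared space.
listings = {
    "sku-1": ([0.9, 0.1, 0.0], [0.8, 0.2, 0.1]),  # image and title agree
    "sku-2": ([0.0, 0.1, 0.9], [0.9, 0.1, 0.0]),  # image and title diverge
}
flagged = flag_inconsistent(listings)
```

In practice the threshold would be calibrated per category on labelled mismatches, since some categories naturally show weaker image-text alignment than others.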
Authenticity verification systems leverage CNN-based image classifiers to detect forged, misleading, or manipulated product images, contributing to trust and fraud prevention [
21]. In addition, product quality assessment frameworks evaluate image clarity, relevance, and informativeness to optimize product page presentation and user experience [
23]. Cross-market catalogue management introduces further complexity due to linguistic, cultural, and seller-specific variations. Multilingual sentiment indicators [
35,
36] and cross-market product alignment techniques [
103] support consistent catalogue quality and representation across global e-commerce ecosystems.
Overall, deep learning-based catalogue intelligence underpins a wide range of e-commerce functionalities, including search, recommendation, pricing, trust, and global personalization. Recent advances in multimodal representation learning, scalable attribute enrichment, and graph-based catalogue reasoning highlight a shift toward more autonomous, consistent, and globally adaptable catalogue systems. As catalogues continue to grow in size and diversity, future research must emphasize robustness, explainability, and cross-market generalization to sustain high-quality product understanding in large-scale e-commerce platforms.
8. Pricing, Operations, and Decision Intelligence
Pricing optimization, fulfillment, and large-scale operational decision-making constitute critical dimensions of competitiveness in modern e-commerce ecosystems. Platforms must continuously adapt prices, inventory allocation, logistics execution, and infrastructure behavior in response to volatile demand, competitive pressure, and operational uncertainty. DL and RL have emerged as foundational technologies for modelling such complex, dynamic environments, enabling data-driven decision intelligence across pricing, warehousing, logistics, and platform reliability. This section synthesizes recent advances in three tightly coupled research directions: (i) dynamic pricing and elasticity learning, (ii) warehouse optimization and logistics intelligence, and (iii) operational forecasting, anomaly detection, and microservice reliability.
8.1. Dynamic Pricing, Elasticity Learning, and Revenue Optimization
Dynamic pricing remains a central challenge in e-commerce, requiring accurate modelling of demand elasticity, competition, and temporal demand fluctuations. DL models increasingly integrate multi-source pricing signals, including historical price trajectories, competitor prices, and market attributes extracted from textual descriptions and user behavior. Sequential architectures such as LSTM and GRU networks are widely used for price and demand forecasting, demonstrating improved stability and responsiveness to temporal patterns [
30]. Incorporating macroeconomic and market-level indicators further enhances forecasting robustness, particularly for long-term trend estimation [
31].
Beyond supervised forecasting, RL has gained prominence as a decision-theoretic framework for revenue optimization under uncertainty. Reinforcement learning-based pricing controllers learn adaptive pricing policies through continuous interaction with demand feedback, effectively capturing nonlinear elasticity effects and delayed reward structures [
9,
70]. Cross-border pricing models highlight the importance of cultural and regional heterogeneity in elasticity learning, showing that pricing strategies trained in one market may not generalize directly to others [
91,
114]. Collectively, these studies indicate a shift from static or myopic pricing toward adaptive, long-horizon pricing intelligence.
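The idea of learning a pricing policy from demand feedback can be illustrated with a deliberately reduced setting. The sketch below is an epsilon-greedy bandit over three candidate prices, which is a simplification of the RL controllers discussed above: it has no state and no delayed rewards. The linear demand model in `simulate_purchase`, the price grid, and all parameter values are assumptions made for this example.

```python
import random

def simulate_purchase(price, rng):
    """Hypothetical demand: purchase probability falls linearly with price."""
    probability = max(0.0, 1.0 - price / 20.0)
    return rng.random() < probability

def learn_price(prices, steps=20000, epsilon=0.2, seed=0):
    """Epsilon-greedy search for the price with the highest average revenue."""
    rng = random.Random(seed)
    revenue_estimate = {p: 0.0 for p in prices}
    pulls = {p: 0 for p in prices}
    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: try a random price.
            price = rng.choice(prices)
        else:
            # Exploit: use the price with the best revenue estimate so far.
            price = max(prices, key=lambda p: revenue_estimate[p])
        reward = price if simulate_purchase(price, rng) else 0.0
        pulls[price] += 1
        # Incremental update of the running mean revenue for this price.
        revenue_estimate[price] += (reward - revenue_estimate[price]) / pulls[price]
    best = max(prices, key=lambda p: revenue_estimate[p])
    return best, revenue_estimate

best_price, estimates = learn_price([4.0, 10.0, 16.0])
```

Under the assumed demand curve, expected revenue per visitor is 3.2 at a price of 4, 5.0 at 10, and 3.2 at 16, so the learner should settle on the middle price. If the slope of the curve differed by market, the same procedure would converge to a different price, which is consistent with the observation that policies trained in one market may not transfer directly to another.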
8.2. Warehouse Optimization, Robotics, and Logistics Intelligence
Efficient warehousing and logistics operations are indispensable for meeting customer expectations in large-scale e-commerce. RL has been successfully applied to warehouse storage optimization, enabling adaptive allocation strategies that minimize travel distance and improve picking efficiency in dynamic environments [
43]. In robotic fulfillment systems, DRL and hyper-heuristic routing strategies optimize order batching and robot coordination, significantly improving throughput in high-density warehouses [
44].
Beyond warehouse interiors, DL supports broader logistics intelligence through IoT-enabled sensing and multimodal forecasting. Models for logistics node layout optimization [
111], anomaly-resilient sensor networks [
76], and low-latency routing over 6G-enabled infrastructures [
75] enhance operational visibility and responsiveness. Demand-aware logistics forecasting frameworks facilitate anticipatory inventory allocation and distribution planning [
112], while deep learning-based supply chain models support cross-border import forecasting under uncertainty [
52]. These advances collectively enable logistics systems to transition from reactive execution to proactive, predictive coordination.
8.3. Operational Forecasting, Anomaly Detection, and Microservice Reliability
At the platform level, e-commerce operations rely on complex software ecosystems comprising microservices, financial pipelines, and delivery coordination systems. Accurate forecasting and early anomaly detection are therefore essential for maintaining service reliability and preventing operational disruptions. Deep learning-based anomaly detection frameworks analyze service logs, transactional streams, and operational metrics to identify fraudulent behavior, system abuse, and failure patterns [
77]. In O2O commerce scenarios, semi-supervised anomaly detection models address cross-domain risks such as courier fraud, geographic spoofing, and irregular routing behavior [
78].
Operational forecasting models, ranging from sequential encoders to multimodal predictors, support procurement planning, resource allocation, autoscaling, and rate control [
30,
31]. RL further extends decision intelligence to infrastructure control, with adaptive microservice traffic regulators preventing overload and cascading failures in large-scale e-commerce services [
45]. In cross-border settings, forecasting frameworks that incorporate regulatory, customs, and regional demand factors enable more reliable estimation of delays, inventory requirements, and service-level risks [
52,
91].
In summary, DL and RL now form the computational backbone of pricing, logistics, and operational decision intelligence in e-commerce. By integrating multimodal forecasting, adaptive optimization, IoT-enabled sensing, and anomaly-aware control mechanisms, modern platforms achieve higher levels of efficiency, robustness, and scalability. Future research directions point toward unified decision-making architectures that jointly optimize pricing, logistics, and platform reliability under shared uncertainty, reinforcing the role of AI as a strategic enabler of end-to-end e-commerce operations.
9. Cybersecurity, Fraud Defense, and Platform Reliability
With the exponential growth of e-commerce platforms, the associated attack surface has expanded to encompass digital accounts, payment systems, logistics networks, interlinked services, and large-scale supply chains. Ensuring cybersecurity, fraud defense, and platform reliability is thus a fundamental requirement for maintaining consumer trust, safeguarding financial integrity, and sustaining scalable operations. DL approaches have emerged as a core technological foundation in this space, enabling nuanced detection of deceptive behavior, real-time anomaly identification, and adaptive security for distributed infrastructures. This section synthesizes recent developments in securing e-commerce environments across three tightly connected domains: (i) content integrity and review fraud detection, (ii) behavioral and transactional anomaly modelling, and (iii) systemic reliability for microservices, IoT, and cyber–physical logistics.
9.1. Content Integrity, Review Fraud Detection, and Multimodal Trust Signals
User-generated content such as textual reviews, ratings, and uploaded images plays a pivotal role in shaping purchase decisions and platform reputation. However, this content is vulnerable to manipulation, including fake reviews, deceptive imagery, and coordinated spam campaigns. Traditional keyword-based or rule-based filters are increasingly insufficient at scale because adversaries adapt their language and combine text with manipulated imagery.
Recent DL methods have significantly improved detection accuracy by leveraging rich learned representations. Chaos-enhanced CNNs, such as those proposed by Fan et al. [
24], introduce structured noise perturbations and hierarchical feature extraction to enhance robustness against forged text patterns. Hybrid architectures that combine transformer encodings with recurrent structures, exemplified by ConvRoBERTa–LSTM frameworks [
25], effectively capture both long-range context and sub-sequence dynamics present in deceptive content. Large language models (LLMs) have also been leveraged for assisted classification, offering contextual reasoning capabilities that go beyond surface lexical features [
79].
Content trustworthiness must also adapt to multilingual and multicultural contexts. Domain-agnostic detection frameworks for languages such as Arabic [
35] and Bangla [
36] demonstrate that language-specific features and script complexities must be considered to maintain global coverage. Beyond textual authenticity [
115], multimodal verification systems that combine linguistic cues with visual signals (e.g., image relevance, manipulation artifacts) contribute to more comprehensive assessments of review credibility. For example, multimodal helpfulness scoring models integrate textual tone and image content to predict trustworthiness and user influence [
49], while image authenticity frameworks built on convolutional vision models help detect tampered photographs in review streams [
23]. Attribute extraction models [
51] further enhance trust signals by identifying mismatches between product descriptions, structured attributes, and user-submitted media.
Collectively, these contributions illustrate a shift from unimodal text classification to multimodal and global content verification systems [
116] capable of coping with linguistic diversity, adversarial manipulation, and complex media.
9.2. Behavioral Anomaly Detection, Transactional Fraud, and Network Threat Modelling
Moving beyond content integrity, it is equally critical to detect anomalous behaviors that signal fraud or compromise at the interaction and transaction levels. DL models excel at capturing temporal patterns, relational dependencies, and latent anomalies embedded in high-volume event logs and transactional sequences.
Sequence anomaly detection using LSTM and transformer architectures enables identification of abnormal navigation, purchase timing, or payment sequences that deviate from learned normative behaviors [
77]. These models consider temporal continuity and pattern shifts, facilitating early flagging of suspicious activities such as bot-like navigation, rapid order placements, or session hijacking.
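The scoring principle behind such detectors, flagging sessions that are unlikely under a model of normal behavior, can be sketched with a much simpler stand-in. The example below replaces the LSTM or transformer with a smoothed first-order Markov model over page events and scores each session by its average negative log-likelihood; the event names and training sessions are invented for illustration.

```python
import math
from collections import defaultdict

def fit_transitions(sessions, smoothing=1.0):
    """Estimate smoothed first-order transition probabilities from normal sessions."""
    counts = defaultdict(lambda: defaultdict(float))
    vocab = set()
    for session in sessions:
        vocab.update(session)
        for prev, nxt in zip(session, session[1:]):
            counts[prev][nxt] += 1.0

    def prob(prev, nxt):
        # Additive smoothing; the extra slot reserves mass for unseen events.
        total = sum(counts[prev].values()) + smoothing * (len(vocab) + 1)
        return (counts[prev][nxt] + smoothing) / total

    return prob

def anomaly_score(session, prob):
    """Average negative log-likelihood per transition; higher is more unusual."""
    steps = list(zip(session, session[1:]))
    if not steps:
        return 0.0
    return -sum(math.log(prob(a, b)) for a, b in steps) / len(steps)

# Hypothetical sessions treated as normal behavior.
normal_sessions = [
    ["home", "search", "product", "cart", "checkout"],
    ["home", "product", "cart", "checkout"],
    ["home", "search", "product", "search", "product", "cart"],
]
prob = fit_transitions(normal_sessions)

typical = anomaly_score(["home", "search", "product", "cart"], prob)
bot_like = anomaly_score(["checkout", "checkout", "checkout", "checkout"], prob)
```

The repeated-checkout session receives a higher score than the typical browsing path because none of its transitions were observed in training. A neural sequence model plays the same role as `prob` here, but conditions on the full history and on features such as timing rather than only the previous event.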
GNNs offer complementary relational modelling capabilities, allowing platforms to identify multi-hop collusion between buyers, sellers, or intermediaries. By representing interactions as graph structures, GNN-based models can uncover subtle patterns of coordinated behaviors that are not detectable via isolated sequential analysis [
39]. Extending this idea, causal GNN–VAE hybrids [
40] incorporate latent causal reasoning, enabling not only detection but also explanation of the generative mechanisms underlying coordinated fraud.
Transactional anomaly detection is a further important dimension, particularly for refund manipulation, payment irregularities, and order routing abuse. Approaches tailored to financial fraud combine deep representations with domain constraints to surface anomalies in payment flows, refund patterns, and cross-border refund abuse [
109]. Online-to-Offline merchant anomaly frameworks further integrate geospatial, behavioral, and service usage signals to detect patterns such as courier fraud, geographic spoofing, and abnormal routing [
78]. Cross-lingual sentiment and behavior modelling [
37,
60] enriches these systems by incorporating feedback cues that correlate with fraudulent intent, particularly in global marketplaces.
These advances illustrate a transition from isolated anomaly detectors to integrated behavioral and relational threat models that capture temporal, structural, and contextual irregularities in e-commerce ecosystems.
9.3. Platform Reliability, IoT/6G Security, and Microservice Hardening
E-commerce platforms are increasingly built upon distributed architectures comprising microservices, IoT sensors, edge devices, and next-generation communication networks such as 6G. While this infrastructure enhances scalability and responsiveness, it also introduces new classes of vulnerabilities at the systems level. Ensuring reliability and security in these environments requires DL approaches that can operate on telemetry, system logs, and multimodal sensor streams.
At the microservice layer, RL has been applied to real-time traffic shaping, rate limiting, and reliability hardening. RL-based rate limiters, such as those introduced by Li et al. [
45], dynamically adjust service access policies to defend against distributed abuse and ensure quality of service (QoS) under adversarial load. Sequence-based anomaly detection frameworks can monitor distributed service logs to identify throughput anomalies or sudden failures indicative of ongoing attacks [
77].
IoT security remains a critical concern due to the proliferation of connected sensors and devices in warehouses, logistics hubs, and last-mile operations. Deep anomaly detection models designed for IoT telemetry can flag irregular communication patterns, sensor spoofing attempts, and data integrity threats [
76]. Optimized sensor placement strategies [
111] further improve monitoring coverage and reduce blind spots.
In logistics networks enabled by high-speed 6G connectivity, DL supports irregular flow detection, route anomaly modelling, and transboundary incident analysis [
75]. Coupled with deep forecasting models that capture expected flow patterns [
52,
112], these systems can surface unusual deviations that may signal supply chain attacks, illicit rerouting, or customs evasion.
Finally, multimodal platform protection integrates content-level authenticity checks (e.g., vision-based verification [
21]) with system-level consistency modelling [
49,
51] to ensure that product catalogues, logistics statuses, and service health indicators align with trusted expectations. Such comprehensive integration supports both operational reliability and security assurance across digital and cyber–physical layers.
DL has emerged as a critical foundation for securing e-commerce platforms at multiple layers, from content authenticity and behavioral threat detection to systemic resilience in distributed infrastructures. The surveyed works collectively demonstrate the maturity of deep models in detecting deception, capturing anomalous patterns, and reinforcing platform reliability. However, evolving attack vectors, cross-border threats, and adversarial manipulation continue to challenge existing techniques, highlighting the need for research that emphasizes robustness, interpretability, and scalable deployment in real-world settings.
10. Global e-Commerce, Cultural Adaptation, and Multilingual Intelligence
The globalization of e-commerce has fundamentally transformed digital marketplaces into culturally diverse, multilingual, and regulation-dependent ecosystems. Modern platforms operate across national borders, accommodating heterogeneous consumer preferences, linguistic diversity, socio-cultural norms, and country-specific legal constraints. These complexities challenge conventional data-driven models, which often assume homogeneous user behavior and language uniformity. DL has emerged as a key enabler of global e-commerce intelligence by supporting culturally adaptive personalization, cross-lingual content understanding, and scalable global operations. This section synthesizes recent advances in global e-commerce modelling across three interconnected dimensions: cross-market consumer behavior and cultural adaptation, multilingual and cross-lingual content intelligence, and global logistics and regulatory adaptation.
10.1. Cross-Market Consumer Behavior and Cultural Adaptation
Consumer behavior in global marketplaces is shaped by regional norms, cultural values, economic conditions, and purchasing rituals, leading to substantial variation in satisfaction patterns, evaluation distributions, and decision-making processes. Empirical studies on cross-border e-commerce reveal that identical products may elicit markedly different browsing trajectories, price sensitivities, and purchase triggers across countries [
91]. DL models trained on region-specific embeddings have demonstrated improved predictive accuracy by capturing these latent cultural and behavioral signals [
103].
Behavior forecasting and customer satisfaction modelling become particularly complex in international settings, where temporal factors such as regional holidays, promotional calendars, and cultural events interact with pricing strategies and consumer sentiment. Deep behavioral prediction frameworks [
28,
29] incorporate contextual signals to improve churn prediction, retention analysis, and satisfaction forecasting across regions. Similarly, dynamic pricing models based on recurrent and RL architectures [
9,
70,
71] account for regional price elasticity, local competition, and purchasing power parity, enabling more adaptive and market-aware pricing policies.
Sentiment analysis and dissatisfaction detection further illustrate the importance of cultural adaptation. Linguistic expressions of dissatisfaction, irony, or politeness vary significantly across cultures, limiting the effectiveness of generic sentiment classifiers. Hybrid sentiment modelling approaches [
60] explicitly incorporate regional language patterns and culturally grounded semantics, enhancing the reliability of dissatisfaction forecasting in global review streams.
A notable advancement in this domain is cross-border customer representation learning, where global embeddings integrate regional priors, cultural sentiment features, and market-specific category preferences. Such approaches [
103] enable unified recommendation and ranking systems that preserve personalization accuracy while maintaining scalability across heterogeneous markets. Collectively, these works highlight a shift from region-agnostic behavioral modelling toward culturally adaptive DL frameworks that respect behavioral diversity without fragmenting platform architectures.
10.2. Multilingual, Cross-Lingual, and Multimodal International Understanding
Global e-commerce platforms must process vast volumes of multilingual content, including reviews, product descriptions, customer inquiries, and search queries. DL has substantially advanced multilingual sentiment analysis and opinion mining by enabling models to handle code-switching, dialectal variations, and culturally specific figurative language. Transformer-based architectures trained on multilingual corpora [
35,
36] demonstrate strong performance across low-resource and morphologically complex languages, while hybrid linguistic–contextual models improve figurative language detection in product reviews [
38].
Cross-lingual opinion mining further enhances global trust mechanisms by aligning sentiment signals across languages, enabling consistent review interpretation and trust scoring regardless of linguistic origin [
37]. These capabilities are critical for cross-border marketplaces, where translated or multilingual reviews often inform purchasing decisions.
Beyond textual understanding, global catalogue intelligence requires accurate cross-lingual attribute extraction and semantic alignment. Deep models for multilingual entity and attribute extraction [
51,
113] bridge inconsistencies between localized catalogues, enabling coherent product representations across markets. Multimodal learning further strengthens international understanding by combining textual descriptions with product images and visual attributes [
49,
50], supporting global search, retrieval, and recommendation functionalities.
Fraud and deception detection in global markets also benefit from multilingual intelligence. Fake review detection frameworks increasingly leverage cross-lingual semantic cues and multilingual embeddings to identify culturally distinct manipulation strategies [
24,
79]. These models outperform monolingual baselines by capturing subtle linguistic patterns and deception tactics that vary across regions. Overall, advances in multilingual and multimodal DL enable language-agnostic yet culturally sensitive content intelligence, a prerequisite for trustworthy global e-commerce.
10.3. Global Logistics, Cross-Border Fulfillment, and Regulatory Adaptation
Cross-border logistics introduces additional layers of uncertainty related to customs procedures, international transportation networks, regional holidays, and geopolitical constraints. Deep learning-based forecasting models address these challenges by integrating time-series data with contextual and multimodal signals. Predictive frameworks for delivery times, demand fluctuations, and trade flows [
31,
52] improve resilience against disruptions and enhance operational planning in international fulfillment networks. Sustainable logistics and green supply chain optimization further extend these models by incorporating environmental constraints and regulatory goals [
112,
117].
The global nature of logistics systems also amplifies cyber–physical risks. IoT-enabled anomaly detection models monitor sensor data, shipment trajectories, and operational telemetry to identify irregularities in global supply chains [
76]. Optimized sensor placement strategies [
111] enhance visibility across distributed logistics nodes, while DL frameworks operating over 6G-enabled networks [
75] support real-time monitoring and adaptive routing in large-scale international operations. RL-based optimizers for warehouses and routing [
43,
44] further adapt logistics decisions to regional constraints and infrastructure heterogeneity.
Regulatory adaptation represents an additional challenge for global platforms, as data governance, fairness requirements, and explainability standards vary widely across jurisdictions. Deep learning-based governance frameworks [
118] emphasize transparency, compliance, and adaptability, enabling AI systems to conform to country-specific regulations while maintaining consistent performance. These approaches highlight the growing importance of regulation-aware AI architectures in global e-commerce.
In summary, DL has become a foundational technology for global e-commerce by enabling culturally adaptive consumer modelling, multilingual content intelligence, and resilient cross-border operations. The surveyed literature demonstrates that innovations in cross-market behavior modelling, multilingual and multimodal understanding, and global logistics forecasting collectively support scalable and trustworthy international platforms. Nevertheless, persistent challenges related to cultural bias, regulatory heterogeneity, and interpretability underscore the need for future research on globally robust, transparent, and ethically grounded DL systems.
11. Evaluation, Benchmarking, and Model Governance in e-Commerce
The rapid adoption of DL in e-commerce has intensified the need for rigorous, multidimensional evaluation frameworks that extend beyond conventional machine learning metrics. Unlike static prediction tasks, e-commerce systems operate in highly dynamic, competitive, and globally distributed environments, where model performance directly affects revenue, customer experience, fairness, and regulatory compliance. Consequently, evaluation and benchmarking practices must jointly consider algorithmic accuracy, business impact, robustness under real-world constraints, and adherence to governance principles. This section reviews recent advances and challenges in evaluation methodologies, benchmarking infrastructures, and model governance frameworks that aim to ensure trustworthy, scalable, and compliant e-commerce AI systems.
11.1. Evaluation Metrics, Real-World Constraints, and Methodological Challenges
Traditional evaluation metrics such as Accuracy, F1-score, AUC, NDCG, and HR@k remain foundational for assessing predictive and ranking performance, yet they provide only a partial view of system effectiveness in real e-commerce environments. These metrics are typically computed under static assumptions and offline settings, failing to capture the temporal dynamics of user behavior, seasonality effects, data sparsity, geographic non-stationarity, and strategic interactions with competing platforms. As a result, there is growing recognition that evaluation protocols must be adapted to the operational realities of e-commerce systems.
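As an illustration of the ranking metrics named above, the following minimal sketch computes HR@k and a binary-relevance NDCG@k for a single user. The function names, item identifiers, and the choice of binary relevance are assumptions made for this example and are not taken from any surveyed work.

```python
import math

def hit_rate_at_k(ranked_items, relevant_items, k):
    """Return 1.0 if any relevant item appears in the top-k, else 0.0."""
    top_k = ranked_items[:k]
    return 1.0 if any(item in relevant_items for item in top_k) else 0.0

def ndcg_at_k(ranked_items, relevant_items, k):
    """Binary-relevance NDCG@k: DCG of the ranking divided by the ideal DCG."""
    dcg = sum(
        1.0 / math.log2(rank + 2)  # rank is 0-based, so position 1 gets log2(2)
        for rank, item in enumerate(ranked_items[:k])
        if item in relevant_items
    )
    # The ideal ranking places all relevant items (up to k) at the top.
    ideal_hits = min(len(relevant_items), k)
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0

# Hypothetical recommendation list and held-out relevant items for one user.
ranked = ["p3", "p7", "p1", "p9"]
relevant = {"p1", "p5"}

hr = hit_rate_at_k(ranked, relevant, k=3)
ndcg = ndcg_at_k(ranked, relevant, k=3)
print(f"HR@3 = {hr:.1f}, NDCG@3 = {ndcg:.4f}")
```

In practice these per-user values are averaged over all evaluated users. Both metrics are computed on a fixed offline snapshot, which is exactly the static assumption discussed above.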
Recent studies on sequential recommendation [
27,
66], multimodal review and content intelligence [
49], and dynamic pricing mechanisms [
71] highlight the importance of temporal-aware and context-sensitive evaluation strategies. In practice, platforms increasingly rely on business-oriented metrics such as conversion lift, engagement uplift, customer retention, return-rate reduction, cross-selling impact, and fulfillment efficiency to quantify real-world value creation. These metrics bridge the gap between algorithmic performance and organizational objectives, yet they introduce additional methodological complexity due to delayed feedback and confounding effects.
Robust evaluation is particularly critical for high-risk models such as sentiment-driven churn prediction [
32,
60], multimodal attribute extraction systems [
51], and fraud detection frameworks [
24,
79]. Such models must be evaluated not only for predictive accuracy but also for robustness against adversarial manipulation, multilingual noise, domain shift, and long-tail behaviors. To address these challenges, advanced evaluation workflows increasingly combine offline benchmarking with online experimentation, counterfactual inference, and simulation-based analysis [
40]. These hybrid approaches allow researchers and practitioners to assess causal impact, stability, and risk under realistic deployment conditions.
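One common building block for counterfactual inference on logged data is the inverse propensity scoring (IPS) estimator, sketched below. It is shown only as an example of the general idea, not as the method of any cited study. The log format, the `always_show_b` policy, and the propensity values are hypothetical, and the estimator assumes that the logging propensities are known and non-zero for every action the target policy can take.

```python
def ips_estimate(logs, target_policy):
    """Estimate the average reward of target_policy from logged interactions.

    logs: list of (context, action, reward, logging_propensity) tuples, where
          logging_propensity is the probability with which the logging policy
          chose the logged action.
    target_policy: function (context, action) -> probability of that action.
    """
    total = 0.0
    for context, action, reward, propensity in logs:
        # Reweight each logged reward by how much more (or less) likely the
        # target policy is to take the logged action than the logging policy.
        weight = target_policy(context, action) / propensity
        total += weight * reward
    return total / len(logs)

def always_show_b(context, action):
    """Hypothetical deterministic target policy that always shows variant B."""
    return 1.0 if action == "B" else 0.0

# Hypothetical logs collected under a uniform random policy over {A, B}.
logs = [
    ("u1", "A", 1.0, 0.5),
    ("u2", "B", 0.0, 0.5),
    ("u3", "B", 1.0, 0.5),
    ("u4", "A", 0.0, 0.5),
]

value = ips_estimate(logs, always_show_b)
print(value)
```

Plain IPS can have high variance when propensities are small, which is one reason such estimates are usually combined with online experimentation rather than used alone.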
11.2. Benchmarking, Dataset Quality, and Reproducibility Limitations
Benchmarking DL models in e-commerce is inherently constrained by the proprietary nature of platform data, domain heterogeneity, and linguistic diversity. Publicly available datasets often lack critical characteristics observed in production environments, such as strategic user manipulation, catalogue volatility, cross-border inconsistencies, and authentic fraud patterns. As a result, strong offline performance does not necessarily translate into operational effectiveness.
Benchmarking catalogue intelligence and recommendation systems increasingly requires multimodal datasets that integrate text, images, structured attributes, and behavioral logs [
21,
49,
50]. However, existing benchmarks often provide limited modality coverage or simplified interaction structures. Similarly, multilingual and cross-lingual datasets [
35,
36] remain narrow in scope, constraining meaningful comparisons across languages and cultural contexts. Graph-based datasets for behavioral modelling and fraud detection [
39,
78] further illustrate the tension between realism and accessibility, as such data are rarely released due to privacy and security concerns.
Reproducibility is additionally challenged by the reliance on proprietary features, unavailable operational signals (e.g., logistics routing data, warehouse telemetry, or competitor pricing), and platform-specific engineering choices. Domain-specific evaluation studies—such as skincare recommendation [
53] or lifestyle-oriented personalization [
93]—underscore the urgent need for standardized evaluation protocols that reflect real-world complexity while remaining accessible to the research community. Promising directions include the development of multimodal retail benchmarks, controlled synthetic data generation, and privacy-preserving data-sharing mechanisms that balance realism with reproducibility.
11.3. Model Governance, Fairness, Accountability, and Regulatory Compliance
As e-commerce platforms scale globally, model governance has emerged as a central concern encompassing fairness, transparency, accountability, safety, and regulatory compliance. AI-driven decisions increasingly influence product visibility, pricing, creditworthiness, and fraud enforcement, necessitating governance frameworks that ensure responsible and explainable behavior across jurisdictions. Explainable AI techniques applied to recommendation, sentiment analysis, and fraud detection [
33,
40,
49] provide mechanisms for interpreting model decisions and auditing their impacts.
Fairness challenges are particularly pronounced in cross-market personalization [
103], cultural sentiment prediction [
91], and multilingual recommendation systems [
35]. Bias may arise from uneven data availability, culturally specific expressions, or historical behavioral imbalances, leading to systematic disparities across regions or user groups. Mitigation strategies include fairness-aware training objectives, region-calibrated models, explainable risk assessments, and disparity analysis across demographic and geographic segments.
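Disparity analysis across segments can be as simple as computing the same metric per group and inspecting the gap. The sketch below assumes a hypothetical list of (region, hit) records, where hit is 1 if the recommendation was relevant; the region labels and the use of the maximum gap as a summary statistic are illustrative choices.

```python
from collections import defaultdict

def rate_by_group(records):
    """Compute the mean of a binary outcome per group.

    records: iterable of (group, outcome) pairs with outcome in {0, 1}.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for group, outcome in records:
        hits[group] += outcome
        totals[group] += 1
    return {group: hits[group] / totals[group] for group in totals}

def max_disparity(rates):
    """Largest gap between the best-served and worst-served group."""
    return max(rates.values()) - min(rates.values())

# Hypothetical evaluation records: (region, whether the top-k list contained a hit).
records = [
    ("EU", 1), ("EU", 1), ("EU", 0), ("EU", 1),
    ("SEA", 1), ("SEA", 0), ("SEA", 0), ("SEA", 0),
]

rates = rate_by_group(records)
gap = max_disparity(rates)
print(rates, gap)
```

A large gap does not by itself establish unfairness, since groups may differ in data volume or base rates, but it indicates where region-calibrated models or fairness-aware objectives may be worth examining.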
Regulatory compliance further shapes governance practices, particularly with respect to data privacy, algorithmic transparency, and explainability mandates. Platforms are increasingly required to document model behavior, training data usage, and decision logic to satisfy regional regulations. Operational governance mechanisms include continuous monitoring, concept drift detection, controlled A/B experimentation, and safety constraints for RL systems [
43,
45]. In logistics and cyber–physical domains, governance extends to IoT-based anomaly detection [
76], counterfeit detection [
21], and supply-chain risk modelling, emphasizing the convergence of technical robustness and organizational accountability.
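For the continuous monitoring and drift detection mentioned above, one widely used heuristic is the Population Stability Index (PSI), which compares the binned distribution of a feature or model score between a reference window and a current window. The sketch below is a minimal version under stated assumptions: the bin edges and sample values are hypothetical, values outside the edges are not assigned to any bin, and the alert threshold is left to the operator.

```python
import math

def bin_proportions(values, bin_edges, eps=1e-6):
    """Share of values per bin; the last bin includes its right edge."""
    counts = [0] * (len(bin_edges) - 1)
    for v in values:
        for i in range(len(counts)):
            is_last = i == len(counts) - 1
            in_bin = bin_edges[i] <= v < bin_edges[i + 1]
            if in_bin or (is_last and v == bin_edges[-1]):
                counts[i] += 1
                break
    # Floor empty bins at eps so the logarithm below stays finite.
    return [max(c / len(values), eps) for c in counts]

def population_stability_index(reference, current, bin_edges):
    """PSI = sum over bins of (cur - ref) * ln(cur / ref)."""
    ref = bin_proportions(reference, bin_edges)
    cur = bin_proportions(current, bin_edges)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref, cur))

# Hypothetical order values in a reference week and in the current week.
edges = [0, 25, 50, 75, 100]
reference = [10, 20, 30, 40, 50, 60, 70, 80]
current = [50, 60, 70, 80, 90, 90, 95, 100]

psi_same = population_stability_index(reference, reference, edges)
psi_shift = population_stability_index(reference, current, edges)
print(psi_same, psi_shift)
```

PSI only detects changes in marginal distributions; changes in the relationship between inputs and outcomes require monitoring of labelled performance as well.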
Evaluation, benchmarking, and model governance together provide the framework for trustworthy, scalable, and compliant e-commerce AI on the global stage. Continued development in these areas is needed to ensure that DL frameworks remain fair, trustworthy, and compliant with global standards.
To provide context for e-commerce evaluation practices, Table 4 summarizes the most commonly used datasets in deep learning-based e-commerce research, ranging from review data to multimodal catalogues, and highlights their evaluation scope, modality coverage, and representative studies.
12. Challenges and Limitations of Deep Learning in e-Commerce
Despite the remarkable advances of DL across prediction, personalization, and decision intelligence in e-commerce, significant challenges remain that limit the robustness, scalability, and global applicability of current solutions. As modern e-commerce platforms evolve into highly dynamic, multimodal, and cross-border ecosystems, limitations related to data quality, generalization, robustness, operational reliability, interpretability, and regulatory compliance become increasingly pronounced. This section synthesizes the major open challenges into three interrelated themes: (i) data, modelling, and generalization limitations, (ii) robustness, security, and reliability challenges, and (iii) operational, ethical, and cross-border constraints.
12.1. Data, Modelling, and Generalization Limitations
DL models are inherently data-intensive, yet e-commerce data are often noisy, sparse, heterogeneous, and temporally nonstationary. User behavior is influenced by seasonality, promotions, external events, and rapidly shifting preferences, resulting in frequent distribution drift that undermines model generalization. Cold-start scenarios for new users, products, and merchants persist as a fundamental challenge, even with advanced representation learning and pretraining strategies.
The growing reliance on multimodal inputs further exacerbates data-related limitations. Product catalogues frequently contain inconsistent or incomplete textual descriptions, low-quality images, missing attributes, and biased or manipulated reviews. While transformer-based language models and vision architectures (CNNs and ViTs) have significantly improved representation learning, their robustness under domain transfer, out-of-distribution scenarios, and low-resource languages remains limited. Sequential and session-based models based on RNNs or transformers similarly struggle to generalize across markets, product categories, and user segments with distinct behavioral dynamics.
Graph-based modelling introduces additional vulnerabilities. Recommendation, fraud detection, and merchant analytics increasingly depend on GNNs operating over interaction, co-purchase, or trust graphs. However, incomplete connectivity, noisy edges, and evolving graph structures can significantly degrade GNN performance. RL models applied to pricing, promotion optimization, and warehouse control face further difficulties due to sparse rewards, delayed feedback, partial observability, and the need for safe exploration in complex, real-world environments.
12.2. Robustness, Security, and Reliability Challenges
E-commerce platforms are adversarial by nature, making robustness and security central concerns for deployed DL systems. Fake reviews, manipulated images, coordinated collusive behaviors, bot-generated interactions, and phishing attacks actively target text, vision, and graph-based models. Even state-of-the-art detection systems can be circumvented through adversarial perturbations, linguistic obfuscation, or rapidly evolving fraud strategies.
Beyond explicit attacks, DL models are highly sensitive to distribution shifts arising from changes in user preferences, inventory availability, pricing strategies, and competitive landscapes. Models trained on historical data may experience rapid performance degradation when deployed in live environments. These risks are amplified in microservice-based architectures, where inference latency, cascading failures, and partial outages can compromise real-time personalization and decision-making pipelines.
Cyber–physical components of e-commerce logistics, including IoT sensors, automated warehouses, and route optimization systems, introduce additional attack surfaces. Sensor spoofing, data integrity violations, connectivity disruptions, and adversarial control policies threaten system reliability. Addressing these challenges requires advances in deep anomaly detection, secure model serving, fault-tolerant RL, and continuous monitoring frameworks capable of detecting drift and anomalous behaviors in real time.
12.3. Operational, Ethical, and Cross-Border Limitations
Deploying DL on a worldwide scale raises complex operational and ethical challenges. The regulatory landscape for data privacy, algorithmic transparency, and accountability varies across jurisdictions, which constrains model design, training data management, and deployment practices. At the same time, e-commerce requires decision support systems that offer not only good predictive accuracy but also transparency and risk awareness.
12.3.1. Interpretability and Explainability Gaps
A major drawback of existing state-of-the-art e-commerce models is their lack of interpretability. Many expressive models, including transformers, multimodal fusion models, and graph neural networks, cannot clearly explain why a product was recommended to a user, what influenced the price set for a product, or why a transaction was flagged as fraudulent.
From a deployment point of view, interpretability is not only a user-facing requirement, as in the case of explainability in recommender systems, but also an operational requirement for internal stakeholders. Explainable model design, explainability techniques, and feature/attribution auditing therefore emerge as critical elements of responsible deployment, especially when models affect product visibility, pricing, refunds, or account sanctions.
12.3.2. Uncertainty Quantification and Decision Risk
E-commerce environments are inherently uncertain, and they are characterized by demand shocks, distributional shifts, strategic user behavior, inventory volatility, and data quality concerns. However, many deep learning pipelines used for decision-making in practice today do not output calibrated uncertainty estimates and instead output a single number or score. This has important implications for high-stakes decision-making, where the optimal action depends not only on the predicted outcome but also on the model’s confidence.
Quantifying uncertainty supports more informed and safer decision-making by enabling risk-aware ranking, cautious pricing updates, and automation. It is also necessary for offline policy evaluation in reinforcement learning-based optimization, where distribution shifts and unobserved confounding can lead to overly optimistic performance estimates.
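Whether a model's scores are calibrated can be checked with the expected calibration error (ECE), which compares predicted probabilities with observed outcome frequencies within bins. The following sketch assumes a binary outcome (for example, purchase or no purchase), equal-width bins, and hypothetical predictions; it illustrates the diagnostic only and is not a procedure taken from the surveyed works.

```python
def expected_calibration_error(probs, labels, n_bins=10):
    """Binary ECE: weighted mean gap between predicted and observed rates.

    probs: predicted probabilities of the positive outcome, each in [0, 1].
    labels: observed outcomes in {0, 1}.
    """
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        # Equal-width bins; p == 1.0 is placed in the last bin.
        idx = min(int(p * n_bins), n_bins - 1)
        bins[idx].append((p, y))

    n = len(probs)
    ece = 0.0
    for bucket in bins:
        if not bucket:
            continue
        avg_prob = sum(p for p, _ in bucket) / len(bucket)
        observed_rate = sum(y for _, y in bucket) / len(bucket)
        ece += (len(bucket) / n) * abs(avg_prob - observed_rate)
    return ece

# Hypothetical purchase predictions: the model is overconfident at 0.9.
probs = [0.9, 0.9, 0.9, 0.9, 0.1, 0.1]
labels = [1, 1, 0, 0, 0, 0]

ece = expected_calibration_error(probs, labels, n_bins=5)
print(round(ece, 4))
```

A low ECE indicates that a score of, say, 0.9 corresponds to roughly a 90% observed rate, which is the property a risk-aware pricing or ranking rule would rely on.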
12.3.3. Causal Mechanism Identification and Counterfactual Reasoning
Another evident gap is the limited ability of predictive models to explain causal mechanisms, which restricts their usefulness for decision-making. Many models achieve very high predictive performance on user behavior and outcomes. However, even such models cannot determine whether a particular intervention, such as a price change, a promotion, a ranking change, or a moderation action, actually produces an observed effect or merely coincides with it. Uplift models, counterfactual reasoning, and causal mechanism identification are therefore important complements to deep learning models in e-commerce.
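The distinction between prediction and intervention effect can be made concrete with the simplest uplift estimate: the difference in conversion rates between randomly treated and untreated users. The sketch below assumes a hypothetical randomized promotion experiment; without randomization, this difference would mix the effect of the promotion with pre-existing differences between the groups.

```python
def average_uplift(outcomes):
    """Difference in conversion rate between treated and control users.

    outcomes: list of (treated, converted) pairs, where treated is a bool
              and converted is 0 or 1. Assumes random treatment assignment.
    """
    treated = [converted for is_treated, converted in outcomes if is_treated]
    control = [converted for is_treated, converted in outcomes if not is_treated]
    return sum(treated) / len(treated) - sum(control) / len(control)

# Hypothetical experiment: 5 users received a promotion, 5 did not.
outcomes = [
    (True, 1), (True, 1), (True, 1), (True, 0), (True, 0),
    (False, 1), (False, 1), (False, 0), (False, 0), (False, 0),
]

uplift = average_uplift(outcomes)
print(round(uplift, 2))
```

Uplift models extend this idea by estimating how the treatment effect varies with user and product features, so that interventions can be targeted where they change behavior.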
Fairness and bias also remain persistent problems, arising from imbalanced data coverage, cultural differences in sentiment expression, and behavioral biases inherent in the data. Cross-border personalization is likewise difficult because of differences in purchasing habits, rating standards, language usage, and culture. In multilingual models, translation noise, morphological complexity, and the lack of labeled data for certain languages pose further challenges.
Scalability is a further real-world limitation. Deep learning models require substantial computing resources, and practitioners must balance model complexity against efficiency, energy consumption, and cost-effectiveness.
While DL has achieved significant success in e-commerce applications, a number of challenges persist in the areas of data quality, robustness, scalability, fairness, and global applicability. Overcoming these challenges will involve research in unified multimodal models, culturally intelligent AI, trustworthy AI governance, anomaly detection, and efficient architectures with global applicability.
13. Open Research Directions
DL is increasingly becoming a foundational component of modern e-commerce ecosystems, enabling intelligent decision-making across recommendation, pricing, sentiment analysis, fraud detection, logistics, and customer experience optimization. Despite substantial progress, the rapid expansion of e-commerce platforms in scale, modality, and geographic reach exposes persistent limitations in current AI systems, particularly in scalability, robustness, personalization, and global adaptability. These limitations, highlighted throughout the preceding sections on modelling, trust, evaluation, and operational challenges, motivate the need for a new generation of research directions. This section outlines future research prospects structured around three complementary pillars: unified multimodal foundation models for e-commerce intelligence, culturally adaptive and cross-border learning, and trustworthy, efficient, and sustainable AI systems.
13.1. Unified Multimodal and Knowledge-Enhanced Foundation Models
A major opportunity for future breakthroughs lies in the development of universal, large-scale foundation models specifically designed for e-commerce environments. While existing multimodal solutions for catalogue intelligence [
49,
50,
51], sentiment representation [
32,
33], and visual verification tasks [
21,
23] have demonstrated strong performance, they remain largely task-centric and loosely coupled. These fragmented approaches limit knowledge transfer across domains and restrict generalization under distributional shifts.
Future foundation models for e-commerce should be designed to jointly encode heterogeneous information sources and operational constraints. Key design goals include:
Integrating multimodal content such as reviews, descriptions, images, and videos;
Encoding structured relationships from product, merchant, and user graphs [
39,
40];
Combining causal reasoning mechanisms with transformer-based architectures to support decision-aware inference;
Grounding predictions in explicit attributes, taxonomies, and metadata [
51];
Supporting cross-task generalization for recommendation, search, ranking, demand forecasting, and fraud detection.
Beyond representation learning, knowledge graphs and symbolic reasoning offer promising directions for improving robustness and interpretability by injecting domain structure, product hierarchies, and causal dependencies into learned models. Moreover, integrating RL within foundation model architectures could enable end-to-end decision intelligence for pricing, logistics, and warehousing tasks [
43,
44,
70], allowing models to move beyond prediction toward adaptive, policy-driven optimization under uncertainty.
13.2. Economics- and Consumer-Behavior-Informed Deep Learning
A major theoretical gap in much of the current e-commerce DL literature is the limited integration of economic structure and consumer behavior theory into modelling choices and evaluation. Many systems optimize predictive loss functions on logged interactions, yet e-commerce outcomes are generated by strategic interactions among consumers, sellers, and platform mechanisms (ranking, pricing, promotion, and trust enforcement). Purely correlation-driven learning can therefore misestimate true demand response, overfit exposure bias, and produce brittle policies when market conditions or platform rules change.
A promising direction is to combine deep representations with theory-informed components, for example: (i) incorporating utility or choice-based constraints when modelling preference and substitution; (ii) learning demand and price-response models with explicit elasticity structure to support pricing decisions [70, 71, 114]; (iii) integrating causal identification to separate preference from exposure and to estimate the incremental effect of interventions [40]; and (iv) using decision-making frameworks (e.g., constrained optimization or RL) whose reward definitions and constraints align with economic objectives and consumer welfare rather than proxy metrics alone. More generally, grounding modelling assumptions in established behavioral theory can improve transparency and strengthen the external validity of reported gains.
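As one illustration of item (ii), the sketch below fits a constant-elasticity demand curve, q = a * p^e, by least squares in log-log space. The functional form, the function names, and the synthetic data are assumptions chosen for illustration and are not taken from the surveyed works. A fit on logged observational data does not by itself identify a causal price response, which is the concern raised in item (iii).

```python
import math


def fit_constant_elasticity(prices, quantities):
    """Fit log(q) = log(a) + e * log(p) by ordinary least squares.

    Returns (a, e), where e is the constant price elasticity of demand.
    """
    xs = [math.log(p) for p in prices]
    ys = [math.log(q) for q in quantities]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    elasticity = sxy / sxx
    scale = math.exp(mean_y - elasticity * mean_x)
    return scale, elasticity


def predict_demand(scale, elasticity, price):
    """Predicted demand under the assumed form q = a * p**e."""
    return scale * price ** elasticity


# Synthetic, noise-free observations generated with a = 1000 and e = -1.5
# (illustrative values only, not real market data).
prices = [5.0, 8.0, 10.0, 12.0, 20.0]
quantities = [1000.0 * p ** -1.5 for p in prices]

scale_hat, elasticity_hat = fit_constant_elasticity(prices, quantities)
```

In a deep model, the same structure could be kept by letting a network produce the scale and elasticity terms from product and context features, so that the learned price response remains interpretable.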
13.3. Culturally Adaptive, Cross-Border, and Multilingual Intelligence
As e-commerce platforms increasingly operate across national, linguistic, and cultural boundaries, future AI systems must explicitly account for regional heterogeneity. Linguistic diversity, culturally specific expressions of sentiment, regulatory fragmentation, and market-dependent behavioral norms pose fundamental challenges to globally deployed models. Recent advances in multilingual sentiment analysis and deception detection [35, 36, 37], cross-market consumer modelling [91, 103], and multilingual attribute extraction [113] provide encouraging results but also reveal substantial gaps in robustness and fairness.
Key open research challenges in this domain include:
Learning culturally calibrated sentiment and dissatisfaction representations that account for regional norms and expression styles [60];
Aligning behavioral representations across heterogeneous markets, platforms, and product ecosystems;
Handling translation ambiguity, code-switching, and sparse supervision in low-resource languages;
Supporting region-specific price elasticity modelling and demand forecasting [31, 71];
Adapting catalogue structures and product taxonomies to local regulations, compliance standards, and consumer expectations.
In addition to consumer-facing intelligence, cross-border logistics and supply-chain modelling present important research opportunities. Models capable of learning transport delays, customs processes, sustainability constraints, and geopolitical factors can significantly improve forecasting accuracy and operational resilience [52, 112, 117]. Future work should aim to develop globally coherent models that remain locally adaptive, balancing scalability with sensitivity to regional constraints.
13.4. Trustworthy, Efficient, and Sustainable AI for e-Commerce
Trust, resilience, and sustainability are emerging as defining requirements for next-generation e-commerce AI systems. Fraudulent behavior, adversarial manipulation, and coordinated attacks remain persistent and evolving threats [24, 77, 79], underscoring the need for advances in adversarial robustness, anomaly detection, and secure graph inference. At the same time, explainability and governance infrastructures must mature to ensure transparency and accountability in recommendation, pricing, fraud prediction, and sentiment analysis [33, 49, 118].
Toward a Cost–Benefit and Sustainability Accounting Framework
Rather than treating “sustainable AI” as a standalone goal, it is useful to frame sustainability as one dimension of a multi-objective cost–benefit analysis. In e-commerce, model selection should be justified not only by offline performance but also by net business benefit under resource, latency, and sustainability constraints. A comprehensive evaluation therefore balances (i) benefits such as conversion and retention uplift, fraud-loss reduction, and improved service reliability against (ii) costs such as training and inference compute, energy consumption, carbon emissions, latency effects, engineering complexity, and operational risk.
Table 5 summarizes a practical set of metrics that can be reported consistently to support transparent trade-off analysis across model families and deployment options.
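The cost–benefit view can be sketched as a constrained selection problem: among candidate models that satisfy latency and emissions budgets, choose the one with the highest net benefit. The `Candidate` fields, the budgets, and all numbers below are hypothetical placeholders, not values reported in this survey or in Table 5.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Candidate:
    name: str
    monthly_benefit: float   # e.g. conversion uplift plus avoided fraud loss
    monthly_cost: float      # training, inference, and engineering, same currency
    p95_latency_ms: float    # serving latency at the 95th percentile
    monthly_co2e_kg: float   # estimated emissions for training and inference


def net_benefit(c: Candidate) -> float:
    """Benefit minus cost, in the same currency unit."""
    return c.monthly_benefit - c.monthly_cost


def select_model(candidates: List[Candidate],
                 latency_budget_ms: float,
                 co2e_budget_kg: float) -> Optional[Candidate]:
    """Return the feasible candidate with the highest net benefit, or None."""
    feasible = [
        c for c in candidates
        if c.p95_latency_ms <= latency_budget_ms
        and c.monthly_co2e_kg <= co2e_budget_kg
    ]
    if not feasible:
        return None
    return max(feasible, key=net_benefit)


# Hypothetical candidates for illustration only.
candidates = [
    Candidate("large_transformer", 120_000.0, 60_000.0, 180.0, 16_000.0),
    Candidate("distilled_ranker", 100_000.0, 20_000.0, 45.0, 3_200.0),
    Candidate("linear_baseline", 70_000.0, 5_000.0, 10.0, 400.0),
]
best = select_model(candidates, latency_budget_ms=100.0, co2e_budget_kg=5_000.0)
```

Treating latency and emissions as hard constraints is only one possible design. They could instead be priced into the objective, which makes the trade-off continuous but requires agreeing on conversion rates between units.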
Efficiency and scalability remain critical bottlenecks for real-world deployment. Large transformer-based and graph-centric models often conflict with the stringent latency, cost, and energy constraints of global e-commerce platforms. Future research should prioritize:
From a sustainability perspective, environmental goals such as carbon reduction, energy efficiency, and optimized transportation can be incorporated into model design, training, evaluation, and deployment decisions through explicit reporting of trade-offs (e.g., Table 5). In practice, this means (i) reporting energy consumption and CO2e emissions for both training and inference, (ii) favoring inference efficiency and serving optimizations when accuracy improvements do not provide a net benefit, (iii) using distillation, pruning, and retrieval/ranking cascades to shift computation to cheaper stages, and (iv) using carbon-aware scheduling and region-aware deployment where possible.
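A rough way to produce the energy and emissions figures in item (i), and to apply the region-aware choice in item (iv), is sketched below. It assumes that energy can be approximated as average power draw times runtime times a data-center overhead factor (PUE), and that emissions are energy times grid carbon intensity. The function names, region names, and numbers are illustrative assumptions, and measured values should be preferred where available.

```python
def energy_kwh(avg_power_watts, hours, pue=1.0):
    """Approximate energy use: average power (W) x runtime (h) x PUE overhead."""
    return avg_power_watts / 1000.0 * hours * pue


def co2e_kg(energy, grid_kg_co2e_per_kwh):
    """Approximate emissions: energy (kWh) x grid carbon intensity (kg CO2e/kWh)."""
    return energy * grid_kg_co2e_per_kwh


def lowest_carbon_region(intensity_by_region):
    """Pick the region whose grid has the lowest carbon intensity."""
    return min(intensity_by_region, key=intensity_by_region.get)


# Hypothetical training job: 8 accelerators at 300 W each for 50 hours, PUE 1.2.
training_energy = energy_kwh(avg_power_watts=8 * 300.0, hours=50.0, pue=1.2)

# Hypothetical grid intensities in kg CO2e per kWh.
regions = {"region_a": 0.40, "region_b": 0.10}
chosen_region = lowest_carbon_region(regions)
training_co2e = co2e_kg(training_energy, regions[chosen_region])
```

The same two functions can be applied to inference by using the serving fleet's average power draw and the reporting period in hours.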
As reinforcement learning becomes more prominent in pricing, warehousing, and logistics, sustainability must also be integrated with safety and governance. This requires uncertainty-aware policies, explicit constraint satisfaction, and human-in-the-loop review of high-impact or low-confidence decisions, so that costly mistakes do not undermine economic and environmental objectives.
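One simple way to combine these safeguards is a gate in front of the policy: a proposed action is applied automatically only when the policy is confident, the change is small, and hard constraints hold; otherwise it is escalated to a human reviewer. The sketch below uses a pricing update as the example. The thresholds, the confidence score, and the function name are hypothetical, and how confidence is obtained (for example, from an ensemble or a calibrated model) is left open.

```python
def route_price_update(current_price, proposed_price, confidence, *,
                       min_confidence=0.8, max_relative_change=0.10,
                       price_floor=0.0):
    """Decide whether a proposed price is applied or sent for human review.

    Returns ("apply", proposed_price) or ("review", current_price).
    """
    relative_change = abs(proposed_price - current_price) / current_price
    needs_review = (
        confidence < min_confidence                # low-confidence case
        or relative_change > max_relative_change   # high-impact change
        or proposed_price < price_floor            # hard constraint violated
    )
    if needs_review:
        # Keep the current price until a human reviewer decides.
        return "review", current_price
    return "apply", proposed_price


# Illustrative calls with made-up prices and confidence scores.
small_confident_change = route_price_update(10.0, 10.5, confidence=0.95)
large_change = route_price_update(10.0, 13.0, confidence=0.95)
uncertain_change = route_price_update(10.0, 10.2, confidence=0.50)
```

Keeping the current price while a decision is under review is a conservative default; a deployment might instead fall back to a rule-based price.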
Looking ahead, deep learning in e-commerce will be shaped by the confluence of unified multimodal foundation models, culturally adaptive global intelligence, and safe, efficient, and sustainable AI systems. Addressing the open challenges above is essential to ensure that deep learning not only improves the economic efficiency of the global digital marketplace but also promotes fairness, trust, and sustainability.
14. Conclusions
DL has firmly established itself as a foundational technology for modern e-commerce, enabling intelligent behavior modelling, multimodal product understanding, personalized recommendation, dynamic pricing, logistics optimization, and large-scale fraud and anomaly detection. In this survey, we synthesized recent advances across representation learning, sequential and graph reasoning, reinforcement learning, catalogue intelligence, sentiment analysis, and cyber–physical security, illustrating how DL permeates the entire e-commerce value chain, from content integrity and consumer trust to supply-chain forecasting and cross-border personalization.
By organizing the literature across modelling paradigms, application domains, and operational constraints, this survey provides a unifying perspective on how diverse DL techniques collectively support scalable, data-driven commerce. In particular, it highlights the increasing convergence of multimodal learning, graph-based inference, and decision-oriented reinforcement learning as core enablers of intelligent and adaptive e-commerce systems.
Despite these advances, substantial challenges remain. E-commerce models must operate under extreme scale, heterogeneous and noisy multimodal data, continuous distribution shifts, and persistent adversarial pressure. Cross-border deployments introduce additional complexity in the form of cultural variability, linguistic diversity, and heterogeneous regulatory requirements. Catalogue inconsistencies across sellers and regions, evolving security threats, and the safety and explainability limitations of reinforcement learning systems further constrain real-world deployment. Moreover, benchmarking and reproducibility remain difficult due to the predominance of proprietary datasets, while offline evaluation metrics often fail to reflect true business impact.
Looking ahead, the evolution of e-commerce intelligence will be shaped by unified multimodal foundation models, culturally adaptive and multilingual representations, energy-efficient and low-latency architectures, and trustworthy, governance-aligned AI systems. Advances in knowledge-enhanced reasoning, causal inference, robust anomaly detection, and sustainable logistics optimization are expected to drive the next wave of innovation. RL will play an increasingly central role in pricing, routing, and warehouse automation, while model governance frameworks will be essential to ensure fairness, accountability, transparency, and regulatory compliance.
As global digital commerce continues to expand in scale and complexity, DL is likely to remain a primary analytical and decision-making engine that enables platforms to deliver personalized, transparent, and resilient services. The research directions identified in this survey outline a path toward more intelligent, secure, and sustainable e-commerce ecosystems capable of meeting the demands of a rapidly evolving digital economy.