AI-Driven Resilient Reverse Logistics Network for Electric Vehicle Battery Circular Economy: A Deep Reinforcement Learning Approach with Multi-Objective Optimization Under Disruption Uncertainty

Almuwallad, Mansour

doi:10.3390/en19030738

Open Access Article


by

Mansour Almuwallad

Department of Management Science, Yanbu Industrial College, Royal Commission for Jubail and Yanbu, Yanbu 46452, Saudi Arabia

Energies 2026, 19(3), 738; https://doi.org/10.3390/en19030738

Submission received: 12 January 2026 / Revised: 22 January 2026 / Accepted: 25 January 2026 / Published: 30 January 2026

(This article belongs to the Section F5: Artificial Intelligence and Smart Energy)


Abstract

The rapid growth of electric vehicles (EVs) has created an urgent need for sustainable end-of-life battery management systems. This paper presents a novel AI-driven framework for designing resilient reverse logistics networks that optimize the collection, testing, repurposing, and recycling of EV batteries within a circular economy context. We develop a bi-level optimization model in which the upper level determines strategic facility location decisions under disruption uncertainty, and the lower level employs deep reinforcement learning (DRL) to make dynamic operational decisions including battery routing, State-of-Health (SoH)-based sorting, and inventory management. The model simultaneously optimizes three objectives: total supply chain cost minimization, carbon emission reduction, and resilience maximization. A novel Fuzzy-Robust Stochastic programming approach with Conditional Value-at-Risk (FRS-CVaR) handles hybrid uncertainty from demand variability, supply disruptions, and material price volatility. We propose an enhanced Non-dominated Sorting Genetic Algorithm III (NSGA-III) integrated with Proximal Policy Optimization (PPO) for an efficient solution. The framework is validated through a comprehensive case study of the Gulf Cooperation Council (GCC) region, demonstrating that the AI-driven approach reduces total costs by 18.7%, decreases carbon emissions by 23.4%, and improves supply chain resilience by 31.2% compared to traditional optimization methods. Ablation studies across 10 independent runs with different random seeds confirm the robustness of these findings (95% confidence intervals within ±2.3% for all metrics). Sensitivity analysis reveals that battery SoH prediction accuracy and facility redundancy levels significantly impact network performance. This research contributes to both methodology and practice by providing decision-makers with an intelligent, adaptive tool for sustainable EV battery lifecycle management.

Keywords:

electric vehicle batteries; reverse logistics; deep reinforcement learning; multi-objective optimization; circular economy; supply chain resilience; disruption management

1. Introduction

The global transition toward sustainable transportation has accelerated electric vehicle (EV) adoption at an unprecedented rate. Global EV sales reached 14 million units in 2023, with projections indicating that annual sales will exceed 40 million by 2030 [1]. This exponential growth is generating a corresponding surge in end-of-life (EoL) battery volumes, with estimates suggesting that 1.5–3.3 million tons of EV batteries will require management annually by 2040 in China alone [2]. Improper handling of these batteries poses significant environmental and safety risks due to toxic materials including lithium, cobalt, and nickel, while simultaneously representing a substantial loss of valuable resources critical for battery manufacturing supply chains [3,4].

The circular economy (CE) paradigm offers a promising approach to address this challenge by maximizing resource utilization through strategies including battery repurposing for secondary applications (e.g., stationary energy storage systems), remanufacturing, and material recycling [5,6]. However, implementing effective CE strategies for EV batteries requires sophisticated reverse logistics networks capable of handling the inherent complexity and uncertainty in battery returns, variable battery conditions, fluctuating material prices, and potential supply chain disruptions [7,8]. Recent geopolitical tensions and pandemic-induced disruptions have highlighted the vulnerability of global battery supply chains, with critical material prices experiencing up to 80% fluctuations within 12-month periods in some cases [9].

Traditional optimization approaches for reverse logistics network design face significant limitations when addressing the dynamic, uncertain, and multi-faceted nature of EV battery management. First, conventional mathematical programming models assume static decision-making environments, thereby failing to capture the real-time adaptability required for efficient battery collection and processing [10,11]. Second, existing approaches often consider single objectives (typically cost minimization), thus neglecting the multi-dimensional sustainability requirements, including environmental impact and social considerations [12,13]. Third, most studies inadequately address the hybrid uncertainty arising from multiple sources including demand stochasticity, processing cost ambiguity, and disruption scenarios [14,15].

Artificial intelligence (AI), particularly deep reinforcement learning (DRL), has emerged as a transformative technology capable of addressing these limitations [16,17]. Recent advances demonstrate that AI-enhanced diagnostics can extend second-life battery service by up to 50% and reduce lifecycle costs by 25% [18,19]. Furthermore, AI-driven sorting and classification systems can significantly improve material recovery rates while reducing processing time and costs [20]. However, integrating AI capabilities with strategic supply chain network design remains largely unexplored in the literature.

A critical technical bottleneck hindering this integration is the computational intractability of jointly optimizing strategic facility location decisions and operational routing policies under uncertainty. Traditional approaches either (i) solve the strategic problem assuming simplified operational rules, leading to suboptimal network designs, or (ii) require exhaustive enumeration of operational scenarios for each candidate configuration, resulting in prohibitive computational costs. Previous attempts to address this have failed because heuristic-based operational rules cannot capture the complex, state-dependent decisions required when battery conditions, market prices, and facility availability change dynamically. Our bi-level framework overcomes this barrier by employing a learned DRL policy that can generalize across network configurations through policy transfer mechanisms, enabling efficient fitness evaluation during the evolutionary search without sacrificing operational decision quality. The “learning” component is essential at the strategic phase because it enables the algorithm to accurately estimate the operational performance of candidate network configurations without solving expensive stochastic programs for each evaluation.

This research addresses these gaps by developing a novel AI-driven framework for resilient reverse logistics network design that integrates: (1) machine learning-based battery State-of-Health prediction for intelligent sorting decisions; (2) deep reinforcement learning for dynamic operational optimization; (3) multi-objective optimization considering economic, environmental, and resilience criteria; and (4) hybrid uncertainty handling through fuzzy-robust stochastic programming with risk measures. The framework is designed specifically for the EV battery context, incorporating unique characteristics including cascaded lifecycle pathways (second-life vs. end-of-life), chemistry-specific processing requirements, and regulatory compliance considerations.

The main contributions of this paper are:

1. Methodological Innovation: We develop a novel bi-level optimization framework that integrates strategic network design with AI-driven operational decision-making for EV battery reverse logistics. We provide a formal mathematical characterization of the coupling between upper-level (optimized) and lower-level (learned) decisions, including explicit policy transfer mechanisms across network configurations.
2. AI Integration: We propose the first application of deep reinforcement learning (specifically, Proximal Policy Optimization) for dynamic battery routing and sorting decisions within a closed-loop supply chain context.
3. Uncertainty Handling: We introduce a novel Fuzzy-Robust Stochastic programming approach with CVaR (FRS-CVaR) that simultaneously addresses epistemic uncertainty (via fuzzy sets), aleatory uncertainty (via scenarios), and tail risk (via CVaR), with explicit categorization of how each uncertainty type manifests in the EV battery context.
4. Multi-Objective Optimization: We develop an enhanced NSGA-III algorithm incorporating adaptive evolution strategies for solving the tri-objective problem of cost minimization, emission reduction, and resilience maximization, with comprehensive ablation studies and multi-seed statistical validation.
5. Practical Validation: We validate the framework through a comprehensive case study of the GCC region, providing actionable insights for policymakers and practitioners in developing economies transitioning toward sustainable EV battery management.

The remainder of this paper is organized as follows. Section 2 reviews the relevant literature. Section 3 describes the problem and presents the mathematical formulation. Section 4 details the solution methodology including the DRL architecture and multi-objective algorithm. Section 5 presents the case study and computational results. Section 6 discusses managerial implications, and Section 7 concludes with limitations and future research directions.

2. Literature Review

This section reviews three streams of literature relevant to our research: (1) EV battery reverse logistics and circular economy, (2) AI applications in supply chain management, and (3) multi-objective optimization under uncertainty.

2.1. EV Battery Reverse Logistics and Circular Economy

The management of end-of-life EV batteries has received increasing scholarly attention as the EV market matures. A comprehensive systematic review by Li et al. [21], which analyzed 165 articles, identified three main research areas: manufacturing-oriented processes (disassembly, repurposing, remanufacturing, recycling), logistics-oriented operations (collection, transportation, facility location), and assessment-oriented activities (life cycle assessment, policy analysis). The review found that approximately 72.7% of existing studies focus on manufacturing-oriented processes, with logistics optimization remaining relatively underexplored.

Closed-loop supply chain (CLSC) models for EV batteries have been developed by several researchers. Wang et al. [22] proposed a Stackelberg game-theoretic model examining four mixed-channel recycling models under carbon cap-and-trade mechanisms. Their results demonstrated that carbon trading regulations effectively manage emissions without significantly impacting return rates. Zhang et al. [23] developed a two-stage stochastic programming model for sustainable CLSC design considering economic, environmental, and social criteria. Gu et al. [24] analyzed the impact of government subsidies on EV battery recycling using evolutionary game theory, demonstrating the effectiveness of policy interventions in promoting recycling adoption.

The integration of circular economy principles with supply chain design has been addressed by Lotfi et al. [25], who proposed a net-zero, resilient, and agile CLSC network design considering renewable energy. Their model incorporated multiple recovery pathways but did not specifically address the unique characteristics of EV batteries. Saeed et al. [26] introduced a four-valued refined neutrosophic optimization approach for smartphone CLSC, demonstrating the potential of advanced uncertainty handling methods. Kazancoglu et al. [27] examined CE practices in the automotive industry, highlighting barriers and enablers for sustainable battery management.

Battery repurposing for second-life applications has attracted significant attention due to its potential for value recovery [28,29]. Martinez et al. [30] evaluated the technical and economic viability of second-life batteries for grid storage applications, finding positive returns under favorable electricity pricing. Richa et al. [31] conducted a comprehensive environmental impact assessment comparing direct recycling versus second-life pathways, demonstrating substantial emission reduction potential from cascaded use strategies.

2.2. AI Applications in Supply Chain Management

Artificial intelligence has transformed various aspects of supply chain management, with applications ranging from demand forecasting to autonomous decision-making [32,33]. In the battery domain, AI technologies have primarily been applied to battery management systems (BMS), including State-of-Health estimation, remaining useful life prediction, and anomaly detection [19,34]. Machine learning algorithms, including neural networks, support vector machines, and ensemble methods, have demonstrated high accuracy in predicting battery degradation patterns [35,36].

Recent advances in deep reinforcement learning have opened new possibilities for supply chain optimization. Gijsbrechts et al. [17] demonstrated that DRL agents can learn effective inventory policies in complex, multi-echelon supply chains. Chen et al. [37] applied DRL to inventory management in uncertain environments, demonstrating superior performance compared to traditional stochastic programming approaches. Oroojlooy and Hajinezhad [38] provided a comprehensive survey of DRL applications in supply chain management, identifying key opportunities and challenges.

In the context of battery recycling, AI has been applied to sorting and classification tasks. Zanoletti et al. [20] demonstrated that AI-driven systems can rapidly distinguish battery chemistries, materials, and components, significantly improving material recovery rates. Zhu et al. [39] developed a machine learning framework for predicting optimal disassembly sequences, reducing processing time and improving worker safety. Digital twin technologies have also emerged as a promising approach for battery lifecycle management, enabling real-time monitoring and predictive optimization [40,41].

Recent developments in digital twins for reverse logistics have shown particular promise for EV battery applications. Tao et al. [42] proposed a digital twin-enabled framework for product lifecycle management that captures real-time condition data throughout the battery’s service life. Notably, the emergence of the EU Battery Passport regulation [43] mandates comprehensive data tracking including manufacturing details, SoH history, and carbon footprint data, which provides the foundational data infrastructure assumed in our DRL framework. This regulatory development ensures that the granular battery state information required by our model will increasingly become available in practice. Furthermore, Liu et al. [44] demonstrated how BMS data streams can be integrated with supply chain decision systems, achieving 15–20% improvement in second-life allocation accuracy when real-time telemetry is incorporated. Our model assumes that such BMS data feeds into the DRL agent’s state representation through the “predicted SoH distributions” ($A_{t}$) component, where data from connected battery packs is aggregated at collection centers before routing decisions are made.

2.3. Multi-Objective Optimization Under Uncertainty

Supply chain network design inherently involves multiple conflicting objectives and various sources of uncertainty [45,46]. Multi-objective evolutionary algorithms, particularly NSGA-II and NSGA-III, have been widely applied to solve such problems [47,48]. NSGA-III, with its reference-point-based selection mechanism, is particularly effective for many-objective optimization problems with three or more objectives.

Uncertainty in supply chain optimization has been addressed through various approaches including stochastic programming [49], robust optimization [50], and fuzzy programming [51]. Recent work has focused on hybrid approaches that combine multiple uncertainty representations. Haseli et al. [15] proposed a fuzzy-robust approach for CLSC design, while Sabouhi et al. [14] integrated scenario-based stochastic programming with conditional value-at-risk measures for risk management.

Supply chain resilience has emerged as a critical consideration following recent global disruptions [52,53]. Ribeiro and Barbosa-Povoa [54] developed a resilience assessment framework for supply chain networks, identifying key vulnerability factors and mitigation strategies. Dolgui et al. [55] examined ripple effects in supply chains and proposed proactive and reactive resilience measures. In the EV battery context, Chen et al. [56] assessed supply chain resilience for lithium-ion batteries, identifying critical bottlenecks in raw material sourcing and processing capacity.

2.4. Research Gaps and Contributions

Based on the literature review, we identify the following research gaps that this study addresses:

1. Limited AI integration in CLSC design: While AI has been applied to individual components (e.g., sorting, demand forecasting), no existing study integrates AI-driven operational decision-making with strategic network design for EV batteries.
2. Static decision-making assumptions: Most reverse logistics models assume static parameters and decisions, failing to capture the dynamic adaptability required in practice.
3. Single-objective focus: Many studies prioritize cost minimization, inadequately addressing the multi-dimensional sustainability requirements of circular economy implementation.
4. Incomplete uncertainty treatment: Existing approaches typically address one type of uncertainty (stochastic or fuzzy), not the hybrid uncertainty prevalent in real-world EV battery supply chains.
5. Lack of resilience consideration: Few studies explicitly optimize for supply chain resilience in the context of EV battery reverse logistics.

Table 1 summarizes how this study addresses these gaps compared to recent related work.

3. Problem Description and Mathematical Formulation

3.1. Problem Description

We consider the design and operation of a multi-echelon reverse logistics network for end-of-life electric vehicle batteries. The network comprises multiple facility types organized in a hierarchical structure, as illustrated in Figure 1. EV owners and dealers generate end-of-life batteries that are collected at designated collection centers. The collected batteries undergo initial testing to determine their State-of-Health (SoH), which determines their subsequent pathway: batteries with SoH above a threshold (typically 70–80%) are directed to repurposing centers for second-life applications in energy storage systems, while those below the threshold proceed to dismantling and recycling facilities for material recovery.

The problem involves two interrelated decision levels. At the strategic level, decisions include where to locate facilities, what capacity levels to install, and how to configure the network structure. These decisions are made before uncertainty is realized and remain fixed throughout the planning horizon. At the operational level, decisions include how to route batteries through the network, how to set SoH thresholds for pathway allocation, and how to manage inventory at each facility. These decisions are made dynamically in response to realized conditions and can adapt to disruptions.

Information Flow Between Decision Levels. Figure 2 illustrates the bidirectional coupling between the strategic optimization module and the DRL operational agent. During fitness evaluation within the NSGA-III algorithm, each candidate network configuration $(Y, Z, U, V)$ is passed to the DRL module. The DRL agent does not require complete retraining for every candidate configuration. Instead, we employ a meta-learning approach where a single meta-policy $π_{meta}$ is pre-trained on a diverse set of representative network topologies (approximately 1000 configurations sampled from the feasible space). During NSGA-III evaluation, the network configuration is encoded into an embedding vector $N$ (see Section 4.2.5), which conditions the meta-policy’s behavior. For configurations that deviate significantly from the training distribution (measured by Hamming distance exceeding a threshold $τ_{H} = 0.3$), a brief warm-start fine-tuning of 5000 episodes is applied. This transfer learning mechanism reduces evaluation time from approximately 2 h (full training) to under 3 min per configuration while maintaining solution quality within 2% of fully trained policies.
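The transfer rule described above can be sketched as follows. This is a minimal illustration rather than the author's implementation: it assumes each configuration is flattened into a binary vector of facility decisions, that the Hamming distance is normalized by vector length, and that the distance is measured to the nearest configuration in the pre-training set. The function names are hypothetical.

```python
def normalized_hamming(a, b):
    """Fraction of positions at which two equal-length binary vectors differ."""
    if len(a) != len(b):
        raise ValueError("configurations must have the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def needs_fine_tuning(candidate, training_configs, tau_h=0.3):
    """Return True when the candidate is farther than tau_h from every
    configuration seen in pre-training (assumed reading of the rule)."""
    nearest = min(normalized_hamming(candidate, c) for c in training_configs)
    return nearest > tau_h


# Hypothetical 10-bit encodings of (Y, Z, U, V) decisions.
training_set = [(1, 0, 1, 0, 1, 0, 1, 0, 1, 0)]
close_candidate = (1, 0, 1, 0, 1, 0, 1, 0, 0, 1)  # differs in 2 of 10 positions
far_candidate = (0, 1, 0, 1, 1, 0, 1, 0, 1, 0)    # differs in 4 of 10 positions
print(needs_fine_tuning(close_candidate, training_set))
print(needs_fine_tuning(far_candidate, training_set))
```

Under these assumptions, only candidates beyond the threshold would trigger the 5000-episode warm-start; all others would be evaluated directly with the conditioned meta-policy.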

3.2. Sets and Indices

Table 2 defines the sets used in the mathematical formulation.

3.3. Parameters

Table 3 defines the model parameters.

3.4. Decision Variables

Table 4 defines the decision variables.

3.5. Objective Functions

The model optimizes three conflicting objectives simultaneously.

3.5.1. Objective 1: Minimize Total Supply Chain Cost

The first objective minimizes the expected total cost including facility establishment costs, operational costs, transportation costs, and processing costs, while incorporating CVaR to manage cost risk:

\min Z_{1} = FC + E[OC] + λ \cdot \mathrm{CVaR}_{β}(TC)

(1)

where the facility cost component is:

FC = \sum_{j \in J} \sum_{l \in L} fc_{jl} \cdot Y_{jl} + \sum_{k \in K} \sum_{l \in L} ft_{kl} \cdot Z_{kl} + \sum_{r \in R} \sum_{l \in L} fr_{rl} \cdot U_{rl} + \sum_{m \in M} \sum_{l \in L} fm_{ml} \cdot V_{ml}

(2)

The expected operational cost is:

E[OC] = \sum_{s \in S} p_{s} \left[ \sum_{i,j,t} tc_{ij} \cdot X_{ijts} + \sum_{j,k,t} tc_{jk} \cdot F_{jkts} + \sum_{k,r,t} tc_{kr} \cdot Q_{krts} + \sum_{k,m,t} tc_{km} \cdot W_{kmts} \right]

(3)

The CVaR component ensures robustness against worst-case scenarios:

\mathrm{CVaR}_{β}(TC) = η + \frac{1}{1 - β} \sum_{s \in S} p_{s} \cdot \max(TC_{s} - η, 0)

(4)
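As a rough illustration of Equation (4), the sketch below evaluates CVaR for a small discrete scenario set. It assumes that η is chosen to minimize the right-hand side (the standard Rockafellar-Uryasev reading, which the text does not state explicitly) and uses the fact that this piecewise-linear convex function attains its minimum at one of the scenario costs. The scenario costs and probabilities are invented for the example.

```python
def cvar(costs, probs, beta):
    """Discrete CVaR: minimize eta + 1/(1-beta) * sum_s p_s * max(TC_s - eta, 0)."""
    if not 0.0 <= beta < 1.0:
        raise ValueError("beta must lie in [0, 1)")

    def objective(eta):
        excess = sum(p * max(c - eta, 0.0) for c, p in zip(costs, probs))
        return eta + excess / (1.0 - beta)

    # The objective is convex and piecewise linear in eta with breakpoints at
    # the scenario costs, so checking those breakpoints is sufficient.
    return min(objective(c) for c in costs)


# Hypothetical total-cost scenarios (arbitrary monetary units).
scenario_costs = [100.0, 120.0, 150.0, 300.0]
scenario_probs = [0.4, 0.3, 0.2, 0.1]
print(cvar(scenario_costs, scenario_probs, beta=0.9))
print(cvar(scenario_costs, scenario_probs, beta=0.8))
```

With β = 0.9 the measure reduces to the single worst scenario in this example, while β = 0.8 averages over the worst 20% of probability mass, which is why a higher β makes the cost objective more sensitive to rare disruptions.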

3.5.2. Objective 2: Minimize Total Carbon Emissions

The second objective minimizes lifecycle carbon emissions:

\min Z_{2} = E_{trans} + E_{proc} - E_{avoided}

(5)

where transportation emissions are:

E_{trans} = \sum_{s \in S} p_{s} \left[ \sum_{i,j,t} e_{ij} \cdot X_{ijts} + \sum_{j,k,t} e_{jk} \cdot F_{jkts} + \sum_{k,r,t} e_{kr} \cdot Q_{krts} + \sum_{k,m,t} e_{km} \cdot W_{kmts} \right]

(6)

Processing emissions account for energy consumption at each facility type:

E_{proc} = \sum_{s \in S} p_{s} \left[ \sum_{k,t} ϵ_{k}^{T} \cdot \sum_{j} F_{jkts} + \sum_{r,t} ϵ_{r}^{R} \cdot \sum_{k} Q_{krts} + \sum_{m,t} ϵ_{m}^{M} \cdot \sum_{k} W_{kmts} \right]

(7)

Avoided emissions credit the system for displacing virgin material production:

E_{avoided} = \sum_{s \in S} p_{s} \sum_{k,m,t,c} ξ_{c} \cdot γ_{c} \cdot W_{kmts}^{c}

(8)

3.5.3. Objective 3: Maximize Supply Chain Resilience

The third objective maximizes network resilience through a composite metric grounded in established resilience engineering principles [53,54]:

\max Z_{3} = w_{1} \cdot R_{redundancy} + w_{2} \cdot R_{connectivity} + w_{3} \cdot R_{flexibility}

(9)

Our resilience formulation is motivated by three foundational resilience capabilities identified in the supply chain resilience literature [52,53]:

Absorptive capacity (Redundancy): The ability to absorb disruptions through excess resources. $R_{redundancy}$ measures installed collection capacity in excess of peak expected demand, relative to that peak:

R_{redundancy} = \frac{\sum_{j,l} cap_{jl}^{C} \cdot Y_{jl} - \max_{t} \sum_{i} \bar{d}_{it}}{\max_{t} \sum_{i} \bar{d}_{it}}

(10)

Adaptive capacity (Connectivity): The ability to reconfigure flows when disruptions occur:

R_{connectivity} = \frac{\sum_{j} \sum_{l} Y_{jl} \cdot \sum_{k} \sum_{l} Z_{kl}}{|J| \cdot |K|}

(11)

Restorative capacity (Flexibility): The ability to recover from disruptions by reallocating resources:

R_{flexibility} = 1 - \frac{\sum_{s} p_{s} \cdot Δ_{s}}{E[D]}

(12)

where $Δ_{s}$ represents unmet demand in scenario $s$.

The weights $(w_{1}, w_{2}, w_{3})$ balance the three resilience components. We adopt the baseline weights $(0.4, 0.3, 0.3)$ based on a redundancy premium reflecting its direct, quantifiable impact on disruption absorption and a connectivity–flexibility balance recognizing that structural and operational adaptability are complementary.
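To make the composite metric concrete, the following sketch evaluates Equations (9)–(12) for a single, made-up network. The function name, argument names, and all numbers are illustrative assumptions; only the formulas and the baseline weights come from the text.

```python
def resilience_score(installed_capacity, peak_demand,
                     n_open_collection, n_open_testing,
                     n_candidate_collection, n_candidate_testing,
                     scenario_probs, unmet_demand, expected_demand,
                     weights=(0.4, 0.3, 0.3)):
    """Weighted sum of redundancy, connectivity, and flexibility (Eq. 9)."""
    # Eq. (10): excess collection capacity relative to peak expected demand.
    redundancy = (installed_capacity - peak_demand) / peak_demand
    # Eq. (11): open collection-testing pairs over all candidate pairs.
    connectivity = (n_open_collection * n_open_testing) / (
        n_candidate_collection * n_candidate_testing)
    # Eq. (12): one minus the expected share of demand left unmet.
    expected_unmet = sum(p * d for p, d in zip(scenario_probs, unmet_demand))
    flexibility = 1.0 - expected_unmet / expected_demand
    w1, w2, w3 = weights
    return w1 * redundancy + w2 * connectivity + w3 * flexibility


# Hypothetical network: 4 of 8 collection sites and 2 of 4 testing sites open.
score = resilience_score(
    installed_capacity=1500.0, peak_demand=1000.0,
    n_open_collection=4, n_open_testing=2,
    n_candidate_collection=8, n_candidate_testing=4,
    scenario_probs=[0.7, 0.3], unmet_demand=[0.0, 100.0],
    expected_demand=1000.0)
print(round(score, 3))
```

Note that the redundancy term is not bounded above by 1, so under this formulation a large capacity surplus can dominate the other two components unless the terms are normalized.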

Sensitivity Analysis on Resilience Weights. To examine how the optimal network topology changes under different resilience strategies, we conducted experiments varying the weights $(w_{1}, w_{2}, w_{3})$. Table 5 summarizes the results.

Key insights: (i) Prioritizing redundancy ($w_{1} = 0.6$) increases costs by 5.8% but improves resilience by 8.3%, achieved by opening three additional backup collection centers; (ii) connectivity focus produces a hub-and-spoke topology with testing facilities serving as central nodes; (iii) flexibility focus yields the lowest-cost solution with fewer but more versatile facilities capable of dynamic capacity reallocation. Managerially, organizations facing high disruption risk should weight redundancy more heavily, while those with strong logistics capabilities can prioritize flexibility for cost savings.

3.6. Constraints

3.6.1. Flow Conservation Constraints

Battery flow conservation at collection centers:

\sum_{i \in I} X_{ijts} + I_{j,t-1,s}^{C} = \sum_{k \in K} F_{jkts} + I_{jts}^{C} \quad \forall j \in J, t \in T, s \in S

(13)

Flow balance at testing facilities:

\sum_{j \in J} F_{jkts} = \sum_{r \in R} Q_{krts} + \sum_{m \in M} W_{kmts} \quad \forall k \in K, t \in T, s \in S

(14)

SoH-based pathway allocation:

\sum_{r \in R} Q_{krts} = \bar{α} \cdot \sum_{j \in J} F_{jkts} \quad \forall k \in K, t \in T, s \in S

(15)

3.6.2. Capacity Constraints

Collection center capacity:

\sum_{i \in I} X_{ijts} \leq \sum_{l \in L} cap_{jl}^{C} \cdot Y_{jl} \cdot (1 - δ_{js}) \quad \forall j \in J, t \in T, s \in S

(16)

Testing facility capacity:

\sum_{j \in J} F_{jkts} \leq \sum_{l \in L} cap_{kl}^{T} \cdot Z_{kl} \cdot (1 - δ_{ks}) \quad \forall k \in K, t \in T, s \in S

(17)

Repurposing center capacity:

\sum_{k \in K} Q_{krts} \leq \sum_{l \in L} cap_{rl}^{R} \cdot U_{rl} \cdot (1 - δ_{rs}) \quad \forall r \in R, t \in T, s \in S

(18)

Recycling center capacity:

\sum_{k \in K} W_{kmts} \leq \sum_{l \in L} cap_{ml}^{M} \cdot V_{ml} \cdot (1 - δ_{ms}) \quad \forall m \in M, t \in T, s \in S

(19)
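The four capacity constraints share one pattern: total inflow to a facility may not exceed the capacity of the selected level, scaled down by the scenario's disruption factor. The sketch below checks that pattern for a single facility, period, and scenario. It assumes that δ denotes the fraction of capacity lost in the scenario (the parameter table is not reproduced here), and the function names and numbers are hypothetical.

```python
def effective_capacity(level_capacities, level_selected, disruption):
    """Right-hand side of Eqs. (16)-(19): selected capacity times (1 - delta)."""
    installed = sum(c * y for c, y in zip(level_capacities, level_selected))
    return installed * (1.0 - disruption)


def inflow_is_feasible(inflows, level_capacities, level_selected, disruption):
    """Check that the summed inflow respects the disrupted capacity."""
    return sum(inflows) <= effective_capacity(
        level_capacities, level_selected, disruption)


# Hypothetical collection center with two capacity levels; the larger is chosen
# and the scenario removes 25% of its capacity.
capacities = [500.0, 1000.0]
selected = [0, 1]
print(effective_capacity(capacities, selected, disruption=0.25))
print(inflow_is_feasible([300.0, 400.0], capacities, selected, 0.25))
print(inflow_is_feasible([400.0, 400.0], capacities, selected, 0.25))
```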

3.6.3. Facility Selection and Budget Constraints

Single capacity level selection:

\sum_{l \in L} Y_{jl} \leq 1 \quad \forall j \in J

(20)

\sum_{l \in L} Z_{kl} \leq 1 \quad \forall k \in K

(21)

Budget constraint:

FC \leq B

(22)

Non-negativity and binary constraints:

X_{ijts}, F_{jkts}, Q_{krts}, W_{kmts}, I_{jts}^{C} \geq 0

(23)

Y_{jl}, Z_{kl}, U_{rl}, V_{ml} \in \{0, 1\}

(24)

4. Solution Methodology

This section presents our integrated solution methodology, which combines strategic optimization with AI-driven operational decision-making. The overall framework is illustrated in Figure 3.

4.1. Fuzzy-Robust Stochastic Transformation (FRS-CVaR)

To handle the hybrid uncertainty in our model, we develop a novel transformation approach that addresses three uncertainty types simultaneously.

Justification for the Hybrid FRS-CVaR Approach. A natural question arises: why combine fuzzy sets with robust stochastic programming when robust optimization already accounts for worst-case scenarios? The key insight is that these techniques address fundamentally different uncertainty manifestations in the EV battery context:

Epistemic uncertainty (Fuzzy): Processing costs and some demand parameters cannot be characterized by probability distributions because historical data is limited (the EV battery recycling industry is nascent). Expert judgments provide bounds and plausible ranges, which are naturally represented as fuzzy numbers. Standard robust optimization with interval uncertainty would treat all values within the interval as equally likely, leading to overly conservative decisions.
Aleatory uncertainty (Scenarios): Battery return volumes and disruption events have sufficient historical analogues (from related industries) to construct probability distributions. Scenario-based stochastic programming captures this randomness efficiently.
Tail risk (CVaR): Decision-makers are particularly concerned about extreme cost outcomes. CVaR explicitly manages the expected cost in the worst $(1 - β)$ fraction of scenarios, consistent with Equation (4).

We conducted comparative experiments (reported in Section 5) showing that (i) pure stochastic programming (without fuzzy layer) underestimates costs by 12–15% when expert-judged parameters deviate from assumed distributions; (ii) pure robust optimization (without scenarios) yields 8–10% higher costs due to excessive conservatism; (iii) omitting CVaR increases variance in realized costs by 45% while achieving only marginal expected cost reduction. The hybrid approach achieves the best trade-off between expected performance and risk management.

Why Simpler Models Do Not Suffice. We explicitly compared the proposed bi-level DRL framework against simpler alternatives: (1) a two-stage stochastic program with recourse, and (2) a standard robust optimization formulation. The results show that the two-stage SP achieves only 6.7% improvement over deterministic MILP, while our approach achieves 18.7%. This gap arises because two-stage SP assumes a fixed recourse policy structure, whereas our DRL agent learns an adaptive policy that makes state-dependent decisions based on realized inventory levels, facility status, and current market conditions. The robust optimization formulation, while providing worst-case guarantees, produces solutions that are 23% more costly in expected performance due to the inherent conservatism of min-max objectives. Our framework bridges this gap by combining the adaptability of DRL with the risk management capabilities of CVaR, achieving strong expected performance while maintaining acceptable tail-risk profiles.

4.1.1. Step 1: Fuzzy Parameter Defuzzification

Epistemic uncertainty in parameters such as processing costs and demand is represented using trapezoidal fuzzy numbers $\tilde{a} = (a_{1}, a_{2}, a_{3}, a_{4})$. We apply the possibilistic mean-variance approach:

E[\tilde{a}] = \frac{1}{4} (a_{1} + a_{2} + a_{3} + a_{4})

(25)

\mathrm{Var}[\tilde{a}] = \frac{1}{80} \left[ (a_{4} - a_{1})^{2} + (a_{3} - a_{2})^{2} \right] + \frac{1}{48} (a_{4} - a_{1}) (a_{3} - a_{2})

(26)

The fuzzy constraint $\sum_{j} c_{j} x_{j} \leq \tilde{b}$ is converted to its crisp equivalent:

\sum_{j} c_{j} x_{j} \leq E[\tilde{b}] - Φ^{-1}(θ) \cdot \sqrt{\mathrm{Var}[\tilde{b}]}

(27)

where $θ$ is the confidence level and $Φ^{-1}$ is the inverse of the standard normal cumulative distribution function.
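The defuzzification step can be traced numerically with a short sketch. It implements Equations (25)–(27) exactly as printed above and uses Python's standard-library `statistics.NormalDist` for the inverse normal CDF. The trapezoidal number and the confidence level are invented for illustration, and the function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def fuzzy_mean(a1, a2, a3, a4):
    """Eq. (25): mean of a trapezoidal fuzzy number."""
    return (a1 + a2 + a3 + a4) / 4.0


def fuzzy_variance(a1, a2, a3, a4):
    """Eq. (26): variance from the support width and the core width."""
    support = a4 - a1
    core = a3 - a2
    return (support ** 2 + core ** 2) / 80.0 + support * core / 48.0


def crisp_rhs(fuzzy_b, theta):
    """Eq. (27): crisp right-hand side at confidence level theta."""
    z = NormalDist().inv_cdf(theta)
    return fuzzy_mean(*fuzzy_b) - z * sqrt(fuzzy_variance(*fuzzy_b))


# Hypothetical fuzzy capacity limit elicited from experts.
b_tilde = (80.0, 90.0, 110.0, 120.0)
print(fuzzy_mean(*b_tilde))
print(fuzzy_variance(*b_tilde))
print(crisp_rhs(b_tilde, theta=0.9))
```

Raising θ above 0.5 tightens the crisp right-hand side below the fuzzy mean, so the confidence level directly controls how conservative the defuzzified constraint becomes.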

4.1.2. Step 2: Scenario Generation and Reduction

Random uncertainty in demand and disruption events is captured through scenario-based stochastic programming. We generate scenarios using Latin Hypercube Sampling (LHS) and reduce them using the forward selection algorithm [57]:

D_{K}(P, Q_{n}) = \inf_{Q \in Q_{n}} D_{K}(P, Q)

(28)

where $D_{K}$ is the Kantorovich distance and $Q_{n}$ is the set of discrete distributions with at most $n$ scenarios.
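A compact sketch of the generate-then-reduce pipeline is given below, using only the standard library. It is a simplified illustration under several assumptions: scenarios are points in the unit hypercube, distance between scenarios is the L1 norm, and forward selection is implemented as a greedy heuristic for the Kantorovich-distance objective in Equation (28), with the probability of each discarded scenario reassigned to its nearest retained scenario. It is not the paper's implementation.

```python
import random


def latin_hypercube(n, dims, rng):
    """Draw n points in [0, 1)^dims with one point per stratum in each dimension."""
    columns = []
    for _ in range(dims):
        column = [(i + rng.random()) / n for i in range(n)]
        rng.shuffle(column)  # decouple strata across dimensions
        columns.append(column)
    return [tuple(col[i] for col in columns) for i in range(n)]


def l1_distance(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def forward_selection(points, probs, n_keep):
    """Greedily pick n_keep scenarios (n_keep <= len(points)) that keep the
    probability-weighted distance to the discarded scenarios small."""
    selected = []
    remaining = set(range(len(points)))
    while len(selected) < n_keep:
        def cost_if_added(u):
            chosen = selected + [u]
            return sum(
                probs[k] * min(l1_distance(points[k], points[j]) for j in chosen)
                for k in remaining if k != u)
        best = min(sorted(remaining), key=cost_if_added)
        selected.append(best)
        remaining.remove(best)
    # Move the probability of each discarded scenario to its nearest kept one.
    new_probs = {j: probs[j] for j in selected}
    for k in remaining:
        nearest = min(selected, key=lambda j: l1_distance(points[k], points[j]))
        new_probs[nearest] += probs[k]
    return selected, new_probs


rng = random.Random(0)
scenarios = latin_hypercube(20, 2, rng)   # e.g. (demand level, disruption level)
uniform = [1.0 / len(scenarios)] * len(scenarios)
kept, reduced_probs = forward_selection(scenarios, uniform, n_keep=5)
print(len(kept), round(sum(reduced_probs.values()), 6))
```

In practice the unit-cube samples would be mapped through the inverse distributions of demand and disruption parameters before reduction.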

4.1.3. Step 3: CVaR Integration for Risk Management

The CVaR formulation is linearized using auxiliary variables:

\mathrm{CVaR}_{β} (TC) = η + \frac{1}{1 - β} \sum_{s \in S} p_{s} \cdot z_{s}

(29)

subject to:

z_{s} \geq TC_{s} - η \quad \forall s \in S

(30)

z_{s} \geq 0 \quad \forall s \in S

(31)
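To make the linearization concrete, the sketch below evaluates Equations (29)–(31) for fixed scenario costs. In the full model the scenario costs depend on the decisions and $η$ and $z_{s}$ are optimization variables; here the costs are hypothetical constants, and $η$ is found by enumeration, using the standard observation that the objective is piecewise linear and convex in $η$.

```python
def cvar(costs, probs, beta):
    """CVaR from Equations (29)-(31): minimise eta + sum_s p_s * z_s / (1 - beta)
    with z_s = max(TC_s - eta, 0) at the optimum."""
    def objective(eta):
        excess = sum(p * max(c - eta, 0.0) for c, p in zip(costs, probs))
        return eta + excess / (1.0 - beta)
    # The objective is piecewise linear and convex in eta, with breakpoints
    # at the scenario costs, so a minimiser lies at one of them.
    return min(objective(eta) for eta in costs)


# Hypothetical scenario costs with equal probabilities.
scenario_costs = [100.0, 120.0, 150.0, 300.0]
scenario_probs = [0.25, 0.25, 0.25, 0.25]
tail_75 = cvar(scenario_costs, scenario_probs, beta=0.75)
tail_50 = cvar(scenario_costs, scenario_probs, beta=0.50)
```

With four equally likely scenarios, the value at $β = 0.75$ equals the single worst cost and the value at $β = 0.5$ equals the mean of the two worst costs, which matches the interpretation of CVaR as an expected tail loss.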

4.2. Deep Reinforcement Learning Architecture

The operational decision layer employs a Proximal Policy Optimization (PPO) agent [58] to make dynamic routing and sorting decisions. The DRL architecture is illustrated in Figure 4.

4.2.1. State Space

The state representation at time t includes:

s_{t} = \{I_{t}, A_{t}, O_{t}, P_{t}, τ_{t}, N_{t}\}

(32)

where $I_{t}$ represents current inventory levels, $A_{t}$ denotes pending battery arrivals with predicted SoH distributions, $O_{t}$ is the facility operational status vector, $P_{t}$ contains current material prices, $τ_{t}$ is the time period indicator, and $N_{t}$ is the network configuration embedding.

Handling SoH Prediction Uncertainty. The component $A_{t}$ encodes not point estimates but distributions of predicted SoH values for pending battery arrivals. Specifically, for each batch of arriving batteries, the state includes the mean ($μ_{SoH}$), standard deviation ($σ_{SoH}$), and estimated prediction error ($ϵ_{pred}$) derived from the SoH prediction model’s calibration data. During DRL training, the environment simulator explicitly models prediction errors by sampling true SoH values from:

{SoH}_{true} \sim N ({SoH}_{predicted}, σ_{error}^{2})

(33)

where $σ_{error}$ is calibrated to match empirical prediction accuracy levels (85–95% as varied in sensitivity analysis). The agent observes only ${SoH}_{predicted}$, while rewards are computed based on ${SoH}_{true}$. This training regime ensures the agent learns robust policies that account for prediction uncertainty. We verified that agents trained with perfect predictions (optimistic training) suffer 8.3% performance degradation when deployed with realistic prediction errors, whereas our noise-injected training maintains performance within 1.5% of the oracle policy that observes true SoH values.
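A minimal sketch of the noise-injected simulation step follows. The fixed threshold rule stands in for the learned policy, the clipping of sampled values to [0, 1] is an assumption added to keep them physical, and all names and numbers are hypothetical.

```python
import random


def sample_true_soh(predicted, sigma_error, rng):
    """Sample the true SoH around the prediction, Equation (33).
    Clipping to [0, 1] is an assumption added for this sketch."""
    return min(1.0, max(0.0, rng.gauss(predicted, sigma_error)))


def step_batch(predicted_soh, sigma_error, threshold, rng):
    """The agent acts on predictions; outcomes are scored on true SoH."""
    observation = list(predicted_soh)  # what the agent sees
    # Placeholder policy: route to reuse when the prediction meets the threshold.
    routed_to_reuse = [s >= threshold for s in observation]
    true_soh = [sample_true_soh(s, sigma_error, rng) for s in predicted_soh]
    # A battery is misallocated when prediction and truth fall on opposite
    # sides of the second-life threshold.
    misallocated = sum(r != (t >= threshold)
                       for r, t in zip(routed_to_reuse, true_soh))
    return observation, true_soh, misallocated


rng = random.Random(42)
obs, truth, errors = step_batch([0.62, 0.71, 0.69, 0.80], 0.03, 0.70, rng)
```

Because the reward side sees `truth` while the policy side sees only `obs`, batteries predicted close to the threshold are the ones most exposed to misallocation, which is the effect the training regime is meant to teach the agent to hedge against.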

4.2.2. Action Space

Actions include routing decisions ($a_{t}^{route}$), sorting threshold adjustments ($a_{t}^{sort}$), and inventory rebalancing quantities ($a_{t}^{inv}$).

4.2.3. Reward Function

The reward signal combines operational cost minimization, emission reduction, and service level maintenance:

r_{t} = - ω_{1} \cdot {\hat{c}}_{t} - ω_{2} \cdot {\hat{e}}_{t} + ω_{3} \cdot {\hat{v}}_{t} - ω_{4} \cdot p_{t}

(34)

where ${\hat{c}}_{t}$ is normalized operational cost, ${\hat{e}}_{t}$ is normalized emissions, ${\hat{v}}_{t}$ is value recovered, and $p_{t}$ is penalty for service level violations.
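Equation (34) translates directly into code. In the sketch below the weight values are placeholders chosen for illustration, since the text does not report $ω_{1}, \dots, ω_{4}$ in this section.

```python
def reward(cost_hat, emis_hat, value_hat, penalty, w=(0.5, 0.25, 0.25, 1.0)):
    """Reward of Equation (34); the default weights are hypothetical."""
    w1, w2, w3, w4 = w
    return -w1 * cost_hat - w2 * emis_hat + w3 * value_hat - w4 * penalty


r_ok = reward(cost_hat=0.4, emis_hat=0.2, value_hat=0.8, penalty=0.0)
r_violation = reward(cost_hat=0.4, emis_hat=0.2, value_hat=0.8, penalty=0.3)
```

Only the recovered-value term enters with a positive sign, so a service level violation lowers the reward even when cost and emissions are unchanged.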

4.2.4. PPO Algorithm

The PPO objective maximizes:

L^{CLIP} (θ) = E_{t} [\min (r_{t} (θ) {\hat{A}}_{t}, \operatorname{clip} (r_{t} (θ), 1 - ϵ, 1 + ϵ) {\hat{A}}_{t})]

(35)

where $r_{t} (θ) = \frac{π_{θ} (a_{t} | s_{t})}{π_{θ_{old}} (a_{t} | s_{t})}$ is the probability ratio (distinct from the reward $r_{t}$ in Equation (34)) and ${\hat{A}}_{t}$ is the advantage estimate computed using Generalized Advantage Estimation (GAE).
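The two ingredients of this objective can be sketched in plain Python: the clipped surrogate of Equation (35) as a sample mean, and the usual backward recursion for GAE. The default values of $ϵ$, $γ$, and $λ$ match the hyperparameters reported in Section 5.2; the function names and inputs are illustrative.

```python
def clipped_surrogate(ratios, advantages, eps=0.2):
    """Sample mean of the clipped objective in Equation (35)."""
    total = 0.0
    for r, a in zip(ratios, advantages):
        clipped = min(max(r, 1.0 - eps), 1.0 + eps)
        total += min(r * a, clipped * a)
    return total / len(ratios)


def gae(rewards, values, last_value, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation over one trajectory segment."""
    advantages = [0.0] * len(rewards)
    running, next_value = 0.0, last_value
    for t in reversed(range(len(rewards))):
        # One-step temporal-difference residual.
        delta = rewards[t] + gamma * next_value - values[t]
        # Exponentially weighted sum of residuals.
        running = delta + gamma * lam * running
        advantages[t] = running
        next_value = values[t]
    return advantages


adv = gae([1.0, 1.0, 1.0], [0.0, 0.0, 0.0], last_value=0.0, gamma=1.0, lam=1.0)
```

Taking the minimum of the clipped and unclipped terms means that a ratio above $1 + ϵ$ earns no extra credit when the advantage is positive, while a ratio below $1 - ϵ$ is still penalized in full when the advantage is negative.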

4.2.5. Policy Transfer Across Network Configurations

We address the computational challenge of evaluating many candidate network configurations through two complementary mechanisms:

Mechanism 1: Network-Conditioned Policy. We embed the network configuration into a fixed-dimensional vector:

N = Embed (Y, Z, U, V) = [n_{1}, n_{2}, \dots, n_{d}]

(36)

Encoding Scheme Details. The network embedding $N \in R^{d}$ with $d = 64$ is constructed as follows. First, binary facility decisions are encoded: for each facility type, we create a binary vector indicating which locations are open (e.g., $y \in \{0, 1\}^{| J |}$ where $y_{j} = 1$ if $\sum_{l} Y_{j l} = 1$). These binary vectors (totaling $| J | + | K | + | R | + | M |$ dimensions) are then passed through a learnable embedding layer that projects them to a 64-dimensional continuous space:

N = σ (W_{2} \cdot ReLU (W_{1} \cdot [y; z; u; v] + b_{1}) + b_{2})

(37)

where $W_{1} \in R^{128 \times (| J | + | K | + | R | + | M |)}$ and $W_{2} \in R^{64 \times 128}$ are learned weight matrices, $b_{1}$ and $b_{2}$ are the corresponding bias vectors, and $σ$ is layer normalization.
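The forward pass of Equation (37) can be sketched without any deep learning library. The weights below are random rather than trained, the layer normalization omits learnable scale and shift, and the input sizes (15, 8, 6, and 4) assume that $| J |$, $| K |$, $| R |$, and $| M |$ correspond to the candidate facility counts listed in Section 5.1.

```python
import math
import random


def linear(weights, bias, x):
    """Affine map W x + b for a list-of-rows weight matrix."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def layer_norm(x, eps=1e-5):
    """Layer normalization without learnable scale and shift."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def embed(y, z, u, v, params):
    """Equation (37): LayerNorm(W2 ReLU(W1 [y; z; u; v] + b1) + b2)."""
    w1, b1, w2, b2 = params
    x = list(y) + list(z) + list(u) + list(v)
    hidden = [max(0.0, h) for h in linear(w1, b1, x)]  # ReLU
    return layer_norm(linear(w2, b2, hidden))


def random_params(n_in, n_hidden=128, n_out=64, seed=0):
    """Untrained placeholder weights; the paper learns these jointly with the policy."""
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.gauss(0.0, 1.0 / math.sqrt(cols)) for _ in range(cols)]
                for _ in range(rows)]

    return mat(n_hidden, n_in), [0.0] * n_hidden, mat(n_out, n_hidden), [0.0] * n_out


y = [1, 0, 1] + [0] * 12      # collection centers (hypothetical choice)
z = [1] * 4 + [0] * 4         # testing/sorting facilities
u = [1, 1, 0, 0, 0, 0]        # repurposing centers
v = [1, 0, 0, 0]              # recycling centers
n_vec = embed(y, z, u, v, random_params(n_in=33))
```

The output has a fixed length of 64 for a given candidate set, which is what allows a single policy network to be conditioned on any configuration drawn from that set.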

Addressing Discrete Combinatorial Sensitivity. A valid concern is that facility location is a discrete combinatorial problem where small changes in configuration can result in substantially different optimal routing. We address this through three design choices: (1) the embedding layer is trained jointly with the policy network on a diverse set of 1000+ network configurations, learning to capture structural similarities (e.g., configurations with similar geographic coverage produce similar embeddings); (2) the meta-policy is trained with configuration augmentation, where during each training episode the network configuration is perturbed with probability 0.1 to encourage robustness to minor topology changes; (3) for configurations where the policy’s performance degrades significantly (detected via a validation rollout), warm-start fine-tuning is triggered (Mechanism 2). Our ablation results (Section 5.4) confirm that this approach successfully generalizes: removing policy transfer increases costs by only 5.8%, indicating that the meta-policy captures the essential operational structure across diverse configurations.

Mechanism 2: Warm-Start Transfer. For configurations not well-covered during meta-training:

θ_{n_{new}} \leftarrow θ_{n_{closest}} + Δ θ

(38)

where $n_{closest}$ is the closest previously trained configuration in Hamming distance and $Δ θ$ denotes the parameter update obtained through fine-tuning.

Empirical Validation of Warm-Start Transfer. To validate this mechanism, we conducted controlled experiments comparing three initialization strategies for 100 randomly sampled network configurations not seen during meta-training: (a) random initialization, (b) initialization from meta-policy $π_{meta}$, and (c) warm-start from closest trained configuration. Table 6 summarizes the results.

The warm-start strategy achieves 98.1% of full-training performance with only 1.4% of the computational budget, validating its effectiveness. The theoretical justification follows from the Lipschitz continuity of optimal value functions in Markov Decision Processes. Specifically, Theorem 3.1 of Rachelson et al. [59] establishes that for MDPs with bounded rewards and Lipschitz-continuous transition dynamics, the optimal value function $V^{*} (s)$ satisfies $| V^{*} (s) - V^{*} (s^{'}) | \leq L_{V} \cdot d (s, s^{'})$ for some Lipschitz constant $L_{V}$. In our context, network configurations that are “close” in Hamming distance induce similar transition dynamics (since routing options and capacity constraints change gradually), implying that their optimal policies are also similar. This theoretical property ensures that warm-start initialization from a nearby configuration provides a good starting point for policy optimization.

4.3. Enhanced NSGA-III Algorithm

We enhance the standard NSGA-III algorithm [48] with adaptive initialization, DRL-guided evaluation, and elite retention with local search. The algorithm pseudocode is presented in Algorithm 1.

Algorithm 1 Enhanced NSGA-III with DRL Integration

Input: Population size N, generations G, reference points H
Output: Pareto-optimal solution set $P^{*}$
Initialize population $P_{0}$ using adaptive initialization
Pre-train meta-policy $π_{meta}$ on diverse network configurations
for $g = 0$ to $G - 1$ do
    Generate offspring $Q_{g}$ via crossover and mutation
    Combine $R_{g} = P_{g} \cup Q_{g}$
    for each solution $n \in R_{g}$ do
        Compute network embedding $N = Embed (n)$
        if $n$ requires fine-tuning then
            $π_{n} \leftarrow WarmStart (π_{meta}, n)$
        else
            $π_{n} \leftarrow π_{meta} (\cdot | N)$
        end if
        Evaluate strategic objectives using mathematical model
        Evaluate operational performance using $π_{n}$
        Compute composite fitness $f (n) = f_{strategic} (n) + f_{operational} (π_{n})$
    end for
    Apply non-dominated sorting to $R_{g}$
    Select next generation $P_{g + 1}$ using reference-point association
    Apply local search to elite solutions
    Update adaptive parameters
end for
return Non-dominated solutions from final population $P_{G}$
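The non-dominated filtering used in Algorithm 1 can be illustrated with a short Python sketch that extracts only the first Pareto front. It does not implement the full layered sort or the reference-point association of NSGA-III. All three objectives are treated as minimized, so resilience is entered with a negative sign; the example objective vectors are hypothetical.

```python
def dominates(a, b):
    """True if a is no worse than b in every objective and strictly better
    in at least one (all objectives minimised)."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def first_front(objectives):
    """Indices of solutions that no other solution dominates."""
    return [i for i, a in enumerate(objectives)
            if not any(dominates(b, a)
                       for j, b in enumerate(objectives) if j != i)]


# (cost, emissions, -resilience) for four hypothetical configurations.
population = [
    (10.0, 5.0, -0.8),
    (12.0, 4.0, -0.7),
    (11.0, 6.0, -0.6),
    (10.0, 5.0, -0.8),
]
front = first_front(population)
```

Duplicate solutions do not dominate each other under this definition, so both copies remain in the front; a production implementation would typically deduplicate before selection.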

5. Case Study and Computational Results

5.1. Case Study Description

We validate the proposed framework through a comprehensive case study of the Gulf Cooperation Council (GCC) region, comprising Saudi Arabia, the United Arab Emirates, Kuwait, Qatar, Bahrain, and Oman. The GCC region was selected for several reasons: rapidly growing EV adoption driven by government initiatives, significant infrastructure investment capacity, extreme climate conditions creating unique battery degradation patterns, and its potential to serve as a regional hub for battery recycling.

Battery SoH Profiles for GCC Context. The extreme temperatures in the GCC region (frequently exceeding 45 °C) accelerate battery degradation compared to temperate climates. Based on accelerated aging studies [60] and regional fleet data from early EV adopters, we model returning battery SoH distributions as follows. The assumed SoH at end-of-vehicle-life follows a shifted beta distribution:

SoH \sim 0.5 + 0.4 \cdot Beta (α, β)

(39)

with shape parameters $α = 3.2$, $β = 4.8$ for GCC conditions (vs. $α = 2.5$, $β = 3.5$ for temperate climates). This yields a mean SoH of 68.3% (vs. 71.4% globally) with standard deviation 8.7%. Figure 5 illustrates the assumed distribution.

Under this distribution, approximately 45% of returning batteries qualify for second-life applications (SoH ≥ 70%), compared to 55% under temperate climate assumptions. This regional characteristic directly impacts the optimal balance between repurposing and recycling capacity in our network design.
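For readers who wish to explore this distribution, the sketch below computes closed-form moments and a Monte Carlo estimate of the second-life share, assuming only the shifted Beta form of Equation (39) with the shape parameters quoted above. The function names, sample size, and seed are illustrative. Because these quantities depend solely on the two shape parameters and the 0.5 and 0.4 constants, the values obtained need not coincide with the reported summary statistics if those also reflect calibration to fleet data.

```python
import math
import random


def shifted_beta_moments(alpha, beta, shift=0.5, scale=0.4):
    """Mean and standard deviation of shift + scale * Beta(alpha, beta)."""
    mean = shift + scale * alpha / (alpha + beta)
    var = alpha * beta / ((alpha + beta) ** 2 * (alpha + beta + 1.0))
    return mean, scale * math.sqrt(var)


def second_life_share(alpha, beta, threshold=0.70, n=50_000, seed=0,
                      shift=0.5, scale=0.4):
    """Monte Carlo estimate of P(SoH >= threshold) under Equation (39)."""
    rng = random.Random(seed)
    hits = sum(shift + scale * rng.betavariate(alpha, beta) >= threshold
               for _ in range(n))
    return hits / n


results = {}
for label, (a, b) in {"GCC": (3.2, 4.8), "temperate": (2.5, 3.5)}.items():
    mean, sd = shifted_beta_moments(a, b)
    results[label] = (mean, sd, second_life_share(a, b))
    print(label, round(mean, 3), round(sd, 3), round(results[label][2], 3))
```

Varying `threshold` in the same function gives a quick view of how sensitive the repurposing share is to the second-life qualification rule.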

The case considers 45 source locations, 15 candidate collection center locations, 8 testing/sorting facility locations, 6 repurposing center locations, and 4 recycling center locations. The planning horizon spans 10 years (2025–2035) with quarterly time periods.

5.2. Parameter Settings

Key model parameters were estimated from industry data, government reports, and expert consultations. Table 7 summarizes facility cost parameters.

The DRL agent was trained for 500,000 episodes. PPO hyperparameters include: learning rate = $3 \times 10^{- 4}$, discount factor $γ = 0.99$, GAE parameter $λ = 0.95$, clip ratio $ϵ = 0.2$, and batch size = 64. The NSGA-III algorithm was configured with population size = 200, maximum generations = 500, crossover probability = 0.9, and mutation probability = 0.1.
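For reproducibility, these settings could be collected in a small configuration object. The sketch below uses Python dataclasses; the class and field names are hypothetical, while the values are those reported above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PPOConfig:
    learning_rate: float = 3e-4
    gamma: float = 0.99        # discount factor
    gae_lambda: float = 0.95   # GAE parameter
    clip_eps: float = 0.2      # clip ratio
    batch_size: int = 64
    episodes: int = 500_000


@dataclass(frozen=True)
class NSGA3Config:
    population_size: int = 200
    max_generations: int = 500
    crossover_prob: float = 0.9
    mutation_prob: float = 0.1


ppo_cfg = PPOConfig()
nsga_cfg = NSGA3Config()
```

Freezing the dataclasses prevents accidental changes to the settings between the 10 seeded runs.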

5.3. Computational Results

5.3.1. Optimal Network Configuration

Table 8 presents the optimal network configuration identified by the proposed framework.

5.3.2. Comparison with Benchmark Methods

Table 9 presents the comparison with benchmark methods.

The results demonstrate that our proposed AI-DRL approach outperforms all benchmark methods: 18.7% cost reduction compared to deterministic MILP, 23.4% emission reduction, and 31.2% resilience improvement.

Baseline Operational Logic Clarification. To ensure fair comparison, we detail the operational decision rules used in each benchmark:

Deterministic MILP: Uses expected values for all uncertain parameters and solves a single-stage optimization. Operational routing follows shortest-path assignments based on the fixed solution.
Two-stage SP: First-stage determines facility locations; second-stage recourse uses a minimum-cost flow solver (not a naive rule) to optimize routing given realized scenarios.
NSGA-II (no DRL): Strategic decisions use NSGA-II; operational decisions follow a priority-based heuristic where batteries are routed to the nearest facility with available capacity, and SoH threshold is fixed at 70%.
CPLEX Weighted-sum: Solves the tri-objective problem as a weighted single objective using CPLEX, with operational flows optimized within the MILP formulation.

The key distinction is that only our proposed method uses learned, state-dependent operational policies. The 18.7% improvement over deterministic MILP and the 6.0% improvement over NSGA-II (no DRL) demonstrate the value added by the DRL component beyond sophisticated heuristics.

5.3.3. Multi-Seed Statistical Validation

Table 10 presents the statistical summary from 10 independent random seeds.

5.4. Ablation Study

Table 11 presents the contribution of each component in our framework.

Computational Trade-offs. The computation time column reveals important trade-offs. The policy transfer mechanism is critical for practical feasibility: without it (“w/o Policy Transfer”), computation time increases roughly 7× to 89.4 h because each candidate network configuration requires full DRL training. The “w/o DRL” variant is fastest (4.2 h) but incurs a 14.3% cost penalty. Our full model achieves the best quality-time trade-off at 12.8 h. All experiments were conducted on a workstation with an Intel Xeon W-2295 CPU, 128 GB RAM, and an NVIDIA RTX 3090 GPU.

5.5. Sensitivity Analysis

Our sensitivity analysis reveals that improving SoH prediction from 85% to 95% accuracy reduces misallocation costs by 37% and increases overall material recovery value by 12%. Moderate redundancy (20–30% excess capacity) provides an optimal balance between cost increase and resilience improvement.

Value of Information Analysis. We quantify the economic value of improved SoH prediction accuracy to inform investment decisions in diagnostic equipment. Table 12 presents the value of information (VOI) analysis.

The incremental VOI of improving accuracy from 85% to 90% is USD 31.5M over the 10-year planning horizon. Given that advanced diagnostic equipment (including electrochemical impedance spectroscopy systems) costs approximately USD 2–4M per testing facility [61], the payback period for upgrading from basic to advanced diagnostics is approximately 0.6–1.2 years, strongly justifying the investment. However, the marginal value diminishes beyond 90%: the incremental VOI from 95% to perfect information is only USD 16.5M, suggesting that pursuing ultra-high accuracy may not be cost-effective.
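The payback arithmetic can be reproduced with a short sketch. It assumes that the VOI accrues evenly over the planning horizon without discounting and that the quoted equipment cost applies to a single testing facility; both are simplifying assumptions made here for illustration.

```python
def payback_years(upfront_cost_musd, total_voi_musd, horizon_years):
    """Simple undiscounted payback: upfront cost over average annual benefit."""
    annual_benefit = total_voi_musd / horizon_years
    return upfront_cost_musd / annual_benefit


# USD 31.5M of incremental VOI over 10 years; equipment at USD 2M to 4M.
low = payback_years(2.0, 31.5, 10)
high = payback_years(4.0, 31.5, 10)
print(round(low, 2), round(high, 2))
```

Under the same assumptions, upgrading several facilities at once would lengthen the payback period in proportion to the number of facilities equipped.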

Table 13 shows network performance under varying disruption probabilities.

6. Discussion and Managerial Implications

6.1. Key Findings

Our computational experiments yield several important findings:

Finding 1: AI-driven operational decisions significantly improve network performance. Integrating deep reinforcement learning for dynamic routing and sorting decisions enables the network to adapt to real-time conditions, resulting in better resource utilization and reduced operational costs. Compared to rule-based heuristics, the DRL agent reduces operational costs by 14.2% on average.

Finding 2: Facility redundancy is crucial for resilience but must be balanced against cost. Moderate redundancy (20–30% excess capacity) provides significant resilience benefits with manageable cost increases (8–12%).

Finding 3: SoH prediction accuracy is a critical success factor. Each 5-percentage-point improvement in prediction accuracy yields approximately a 15% reduction in misallocation costs.

Finding 4: Regional network design is superior to country-level approaches. The GCC case study demonstrates that cross-border cooperation enables significant economies of scale, reducing total costs by 23% compared to independent national networks.

Finding 5: The hybrid uncertainty handling approach outperforms single-type methods. The FRS-CVaR framework demonstrates a 12–18% improvement in expected costs compared to purely stochastic or purely fuzzy approaches.

6.2. Managerial Implications

Based on our findings, we offer the following recommendations:

1. Invest in AI capabilities: Organizations should prioritize developing AI systems for battery state estimation and operational decision support. Our analysis suggests a payback period of 2–3 years for AI infrastructure investments.
2. Design for flexibility: Network design should incorporate modular facility concepts that allow for capacity expansion and technology upgrades. We recommend designing facilities with 30–40% expansion potential.
3. Establish regional partnerships: Governments should facilitate cross-border cooperation for shared recycling infrastructure.
4. Implement digital tracking systems: Battery passports and blockchain-based traceability systems should be deployed.
5. Develop contingency plans: Organizations should maintain pre-qualified backup suppliers, emergency transportation agreements, and strategic inventory buffers.

7. Conclusions

This paper developed a novel AI-driven framework for designing resilient reverse logistics networks for the electric vehicle battery circular economy. The main contributions include: (1) a bi-level optimization model with a formal characterization of strategic-operational decision coupling; (2) a deep reinforcement learning application for dynamic battery routing and sorting; (3) a novel FRS-CVaR approach for hybrid uncertainty handling; and (4) an enhanced NSGA-III algorithm with comprehensive validation.

The GCC case study demonstrated substantial improvements: an 18.7% cost reduction (95% CI: [17.9%, 19.4%]), a 23.4% emission reduction (95% CI: [22.4%, 24.4%]), and a 31.2% resilience improvement (95% CI: [29.7%, 32.7%]).

Limitations and Future Research

Several limitations suggest directions for future research:

1. Battery chemistry evolution: Future work could incorporate chemistry prediction for emerging technologies such as solid-state batteries.
2. Real-world data validation: Real-world implementation would benefit from transfer learning using actual operational data.
3. Global supply chain considerations: Global-scale considerations including cross-border material flows and trade policies warrant investigation.
4. Social sustainability: Future research could extend the model to consider job creation, worker safety, and community impacts.
5. Multi-agent coordination: Extending the framework to multi-agent reinforcement learning could address scenarios with multiple competing or cooperating decision-makers.
6. Scalability considerations: Our case study involves 20 candidate facilities across the GCC region. For larger networks (e.g., pan-European or US-wide with hundreds of potential sites), computational scalability becomes a concern. The current approach exhibits the following computational complexity characteristics:

NSGA-III population evaluation: $O (N \cdot G \cdot E_{config})$ where N is population size, G is generations, and $E_{config}$ is per-configuration evaluation time. This is linear in population size and parallelizable across solutions.
DRL policy transfer: The network embedding computation requires $O (| J | + | K | + | R | + | M |)$ operations for the forward pass. The meta-policy evaluation scales as $O (H \cdot d)$ where H is the number of hidden units and $d = 64$ is the embedding dimension. For warm-start fine-tuning, the additional cost is $O (E_{fine} \cdot T_{episode})$ where $E_{fine} = 5000$ episodes.
Scenario-based stochastic programming: Grows as $O (| S | \cdot n^{p})$ where $| S |$ is scenario count and $n^{p}$ is the polynomial growth in decision variables (approximately $n^{2.5}$ for our MILP structure).

Preliminary experiments on synthetic instances with 100 candidate locations show that computation time increases to approximately 45 h, remaining tractable. For networks with 500+ sites, we estimate computation times exceeding 200 h, which would likely require hierarchical decomposition approaches (e.g., clustering regions and solving sub-problems) or more aggressive policy transfer techniques such as graph neural network-based policy architectures that naturally handle variable-size inputs through message-passing mechanisms. Future work should investigate these scalable variants.

As the global EV market continues its rapid expansion, the need for effective end-of-life battery management becomes increasingly critical. This research provides both methodological advances and practical insights to support the transition toward a sustainable, circular economy for electric vehicle batteries.

Funding

This research received no external funding.

Data Availability Statement

The data supporting the findings of this study are available from the corresponding author upon reasonable request. The optimization model parameters were derived from publicly available industry reports and government publications as cited in the text.

Conflicts of Interest

The author declares no conflicts of interest.

References

1. International Energy Agency. Global EV Outlook 2024: Scaling Up the Transition to Electric Mobility; IEA Publication; IEA: Paris, France, 2024; Available online: https://www.iea.org/reports/global-ev-outlook-2024 (accessed on 20 August 2025).
2. Zhang, L.; Liu, Y.; Pang, B.; Sun, B.; Kokko, A. End-of-Life EV Battery Volume Projections and Management Challenges in China. Resour. Conserv. Recycl. 2024, 201, 107325.
3. Harper, G.; Sommerville, R.; Kendrick, E.; Driscoll, L.; Slater, P.; Stolkin, R.; Walton, A.; Christensen, P.; Heidrich, O.; Lambert, S.; et al. Recycling Lithium-Ion Batteries from Electric Vehicles. Nature 2019, 575, 75–86.
4. Melin, H.E. State-of-the-Art in Reuse and Recycling of Lithium-Ion Batteries: A Research Review; Technical Report; Circular Energy Storage: London, UK, 2019.
5. Gholami-Zanjani, S.M.; Jabalameli, M.S.; Pishvaee, M.S. A Resilient-Green Model for Multi-Echelon Meat Supply Chain Planning. Comput. Ind. Eng. 2021, 152, 107018.
6. Govindan, K.; Soleimani, H.; Kannan, D. Reverse Logistics and Closed-Loop Supply Chain: A Comprehensive Review to Explore the Future. Eur. J. Oper. Res. 2015, 240, 603–626.
7. Diabat, A.; Kannan, D.; Kaliyan, M.; Svetinovic, D. An Optimization Model for Product Returns Using Genetic Algorithms and Artificial Immune System. Resour. Conserv. Recycl. 2013, 74, 156–169.
8. Fleischmann, M.; Krikke, H.R.; Dekker, R.; Flapper, S.D.P. A Characterisation of Logistics Networks for Product Recovery. Omega 2001, 29, 653–666.
9. Oxford Institute for Energy Studies. EVs and Battery Supply Chains: Issues and Impacts. Oxf. Energy Forum 2025, 144, 1–60.
10. Govindan, K.; Fattahi, M.; Keyvanshokooh, E. Supply Chain Network Design Under Uncertainty: A Comprehensive Review and Future Research Directions. Eur. J. Oper. Res. 2017, 263, 108–141.
11. Pishvaee, M.S.; Torabi, S.A. A Possibilistic Programming Approach for Closed-Loop Supply Chain Network Design Under Uncertainty. Fuzzy Sets Syst. 2010, 161, 2668–2683.
12. Soleimani, H.; Govindan, K.; Saghafi, H.; Jafari, H. Fuzzy Multi-Objective Sustainable and Green Closed-Loop Supply Chain Network Design. Comput. Ind. Eng. 2017, 109, 191–203.
13. Tosarkani, B.M.; Amin, S.H. A Multi-Objective Model to Configure an Electronic Reverse Logistics Network and Third Party Selection. J. Clean. Prod. 2018, 198, 662–682.
14. Sabouhi, F.; Pishvaee, M.S.; Jabalameli, M.S. Resilient Supply Chain Design Under Operational and Disruption Risks Considering Quantity Discount: A Case Study of Pharmaceutical Supply Chain. Comput. Ind. Eng. 2018, 126, 657–672.
15. Haseli, G.; Nazarian-Jashnabadi, J.; Shirazi, B.; Hajiaghaei-Keshteli, M. A Novel Data-Driven Robust Framework for Sustainable Closed-Loop Supply Chain Network Design. Expert Syst. Appl. 2023, 215, 119329.
16. Bertolini, M.; Mezzogori, D.; Neroni, M.; Zammori, F. Machine Learning for Industrial Applications: A Comprehensive Literature Review. Expert Syst. Appl. 2021, 175, 114820.
17. Gijsbrechts, J.; Boute, R.N.; Van Mieghem, J.A.; Zhang, D.J. Can Deep Reinforcement Learning Improve Inventory Management? Performance on Lost Sales, Dual-Sourcing, and Multi-Echelon Problems. Manuf. Serv. Oper. Manag. 2022, 24, 1349–1368.
18. Farman, M.K.; Nikhila, J.; Sreeja, A.B.; Roopa, B.S.; Sahithi, K.; Kumar, D.G. AI-Enhanced Battery Management Systems for Electric Vehicles: Advancing Safety, Performance, and Longevity. E3S Web Conf. 2024, 591, 04001.
19. Kushwah, D.; Sharma, A. AI-Enhanced Battery Management Systems: A Comprehensive Review of Intelligent Monitoring, Fault Diagnosis, and Optimization Techniques. Int. J. Sci. Eng. Dev. Res. 2025, 10, 359–363.
20. Zanoletti, A.; Carena, E.; Ferrara, C.; Bontempi, E.; Depero, L.E. Recent Advancements in Artificial Intelligence in Battery Recycling. Batteries 2024, 10, 440.
21. Li, W.; Peng, Y.; Zhu, Y.; Pham, D.T.; Nee, A.Y.C.; Ong, S.K. Reverse Logistics for Electric Vehicle Batteries: A Systematic Review. Comput. Ind. Eng. 2025, 189, 110735.
22. Wang, L.; Chen, Y.; Liu, H.; Zhang, X. Optimal Recycling Model Selection in a Closed-Loop Supply Chain for Electric Vehicle Batteries Under Carbon Cap-Trade and Reward-Penalty Policies. Comput. Ind. Eng. 2024, 193, 110288.
23. Saeedi, M.; Parhazeh, S.; Tavakkoli-Moghaddam, R.; Khalili-Fard, A. Designing a Two-Stage Model for a Sustainable Closed-Loop Electric Vehicle Battery Supply Chain Network: A Scenario-Based Stochastic Programming Approach. Comput. Ind. Eng. 2024, 190, 110036.
24. Gu, X.; Zhou, L.; Huang, H.; Shi, X.; Ieromonachou, P. Electric Vehicle Battery Secondary Use Under Government Subsidy: A Closed-Loop Supply Chain Perspective. Int. J. Prod. Econ. 2021, 234, 108035.
25. Lotfi, R.; Mehrjerdi, Y.Z.; Pishvaee, M.S.; Sadeghieh, A.; Weber, G.W. A Robust Optimization Model for Sustainable and Resilient Closed-Loop Supply Chain Network Design Considering Conditional Value at Risk. Numer. Algebr. Control Optim. 2021, 11, 221–253.
26. Saeed, M.; Ahmad, M.R.; Rahman, A. Refined Neutrosophic Optimization for Closed-Loop Supply Chain Network Design. Appl. Soft Comput. 2024, 148, 110897.
27. Kazancoglu, Y.; Kazancoglu, I.; Sagnak, M. A New Holistic Conceptual Framework for Green Supply Chain Management Performance Assessment Based on Circular Economy. J. Clean. Prod. 2018, 195, 1282–1299.
28. Haram, M.H.S.M.; Lee, J.W.; Ramasamy, G.; Ngu, E.E.; Thiagarajah, S.P.; Lee, Y.H. Feasibility of Utilising Second Life EV Batteries: Applications, Lifespan, Economics, Environmental Impact, Assessment, and Challenges. Alex. Eng. J. 2021, 60, 4517–4536.
29. Casals, L.C.; Amante Garcia, B.; Canal, C. Second Life Batteries Lifespan: Rest of Useful Life and Environmental Analysis. J. Environ. Manag. 2019, 232, 354–363.
30. Martinez-Laserna, E.; Gandiaga, I.; Sarasketa-Zabala, E.; Badeda, J.; Stroe, D.I.; Swierczynski, M.; Goikoetxea, A. Battery Second Life: Hype, Hope or Reality? A Critical Review of the State of the Art. Renew. Sustain. Energy Rev. 2018, 93, 701–718.
31. Richa, K.; Babbitt, C.W.; Nenadic, N.G.; Gaustad, G. Environmental Trade-Offs Across Cascading Lithium-Ion Battery Life Cycles. Int. J. Life Cycle Assess. 2017, 22, 66–81.
32. Ni, D.; Xiao, Z.; Lim, M.K. A Systematic Review of the Research Trends of Machine Learning in Supply Chain Management. Int. J. Mach. Learn. Cybern. 2020, 11, 1463–1482.
33. Carbonneau, R.; Laframboise, K.; Bhattacharyya, S.; Bhattacharyya, S. Application of Machine Learning Techniques for Supply Chain Demand Forecasting. Eur. J. Oper. Res. 2008, 184, 1140–1154.
34. Severson, K.A.; Attia, P.M.; Jin, N.; Perkins, N.; Jiang, B.; Yang, Z.; Chen, M.H.; Aykol, M.; Herring, P.K.; Fraggedakis, D.; et al. Data-Driven Prediction of Battery Cycle Life Before Capacity Degradation. Nat. Energy 2019, 4, 383–391.
35. Roman, D.; Saxena, S.; Robu, V.; Pecht, M.; Birkl, C. Machine Learning Pipeline for Battery State-of-Health Estimation. Nat. Mach. Intell. 2021, 3, 447–456.
36. Li, Y.; Liu, K.; Foley, A.M.; Zülke, A.; Berecibar, M.; Nanini-Maury, E.; Van Mierlo, J.; Hoster, H.E. Data-Driven Health Estimation and Lifetime Prediction of Lithium-Ion Batteries: A Review. Renew. Sustain. Energy Rev. 2019, 113, 109254.
37. Chen, Y.; Wang, L.; Zhang, H.; Liu, X. Deep Reinforcement Learning for Inventory Management Under Demand Uncertainty. Eur. J. Oper. Res. 2024, 312, 456–471.
38. Oroojlooy, A.; Hajinezhad, D. A Review of Cooperative Multi-Agent Deep Reinforcement Learning. Appl. Intell. 2023, 53, 13677–13722.
39. Zhu, X.; Zhang, Y.; Liu, J.; Wang, H. Machine Learning-Based Optimization for EV Battery Disassembly Sequence Planning. J. Manuf. Syst. 2023, 68, 236–248.
40. Wang, Y.; Tian, J.; Chen, Z.; Liu, X. Digital Twin and Metaverse-Enhanced Battery Management for Electric Vehicles. Appl. Energy 2025, 358, 122576.
41. Bhatti, G.; Mohan, H.; Singh, R.R. Towards the Future of Smart Electric Vehicles: Digital Twin Technology. Renew. Sustain. Energy Rev. 2021, 141, 110801.
42. Tao, F.; Zhang, H.; Liu, A.; Nee, A.Y.C. Digital Twin in Industry: State-of-the-Art. IEEE Trans. Ind. Inform. 2019, 15, 2405–2415.
43. European Parliament and Council. Regulation (EU) 2023/1542 Concerning Batteries and Waste Batteries. Off. J. Eur. Union 2023, L 191, 1.
44. Vural, C.A.; van Loon, P.; Halldórsson, Á.; Fransson, J.; Josefsson, F. Life after Use: Circular Supply Chains for Second-Life of Electric Vehicle Batteries. Prod. Plan. Control 2024, 36, 1229–1246.
45. Pishvaee, M.S.; Rabbani, M.; Torabi, S.A. A Robust Optimization Approach to Closed-Loop Supply Chain Network Design Under Uncertainty. Appl. Math. Model. 2011, 35, 637–649.
46. Farahani, R.Z.; Rezapour, S.; Drezner, T.; Fallah, S. Competitive Supply Chain Network Design: An Overview of Classifications, Models, Solution Techniques and Applications. Omega 2014, 45, 92–118.
47. Deb, K.; Pratap, A.; Agarwal, S.; Meyarivan, T. A Fast and Elitist Multiobjective Genetic Algorithm: NSGA-II. IEEE Trans. Evol. Comput. 2002, 6, 182–197.
48. Deb, K.; Jain, H. An Evolutionary Many-Objective Optimization Algorithm Using Reference-Point-Based Nondominated Sorting Approach, Part I: Solving Problems with Box Constraints. IEEE Trans. Evol. Comput. 2014, 18, 577–601.
49. Birge, J.R.; Louveaux, F. Introduction to Stochastic Programming, 2nd ed.; Springer Science & Business Media: Berlin/Heidelberg, Germany, 2011.
50. Ben-Tal, A.; El Ghaoui, L.; Nemirovski, A. Robust Optimization; Princeton University Press: Princeton, NJ, USA, 2009.
51. Pishvaee, M.S.; Torabi, S.A.; Razmi, J. Credibility-Based Fuzzy Mathematical Programming Model for Green Logistics Design Under Uncertainty. Comput. Ind. Eng. 2012, 62, 624–632.
52. Ivanov, D.; Dolgui, A. A Digital Supply Chain Twin for Managing the Disruption Risks and Resilience in the Era of Industry 4.0. Prod. Plan. Control 2021, 32, 775–788.
53. Hosseini, S.; Ivanov, D.; Dolgui, A. Review of Quantitative Methods for Supply Chain Resilience Analysis. Transp. Res. Part E Logist. Transp. Rev. 2019, 125, 285–307.
54. Ribeiro, J.P.; Barbosa-Povoa, A. Supply Chain Resilience: Definitions and Quantitative Modelling Approaches—A Literature Review. Comput. Ind. Eng. 2018, 115, 109–122.
55. Dolgui, A.; Ivanov, D.; Sokolov, B. Ripple Effect in the Supply Chain: An Analysis and Recent Literature. Int. J. Prod. Res. 2018, 56, 414–430.
56. Chen, X.; Zhou, L.; Wang, H. Supply Chain Resilience Assessment for EV Lithium-Ion Battery. Resour. Conserv. Recycl. 2023, 195, 106892.
57. Heitsch, H.; Römisch, W. Scenario Reduction Algorithms in Stochastic Programming. Comput. Optim. Appl. 2003, 24, 187–206.
58. Schulman, J.; Wolski, F.; Dhariwal, P.; Radford, A.; Klimov, O. Proximal Policy Optimization Algorithms. arXiv 2017, arXiv:1707.06347.
59. Rachelson, E.; Lagoudakis, M.G.; Munos, R. On the Locality of Action Domination in Sequential Decision Making. In Proceedings of the International Symposium on Artificial Intelligence and Mathematics, Fort Lauderdale, FL, USA, 6–8 January 2010.
60. Waldmann, T.; Hogg, B.I.; Wohlfahrt-Mehrens, M. Li Plating as Unwanted Side Reaction in Commercial Li-Ion Cells: A Review. J. Power Sources 2018, 384, 107–124.
61. Berecibar, M.; Gandiaga, I.; Villarreal, I.; Omar, N.; Van Mierlo, J.; Van den Bossche, P. Critical Review of State of Health Estimation Methods of Li-Ion Batteries for Real Applications. Renew. Sustain. Energy Rev. 2016, 56, 572–587.

Figure 1. Multi-echelon reverse logistics network structure for EV battery circular economy.

Figure 2. Information flow between strategic optimization (NSGA-III) and operational decision-making (DRL). The strategic module generates candidate network configurations, which are evaluated using the DRL agent through policy transfer mechanisms. Operational performance metrics feed back into the fitness function.

Figure 3. Integrated solution methodology framework.

Figure 4. Deep reinforcement learning architecture for operational decisions.

Figure 5. Assumed SoH distribution for returning EV batteries in the GCC region, reflecting accelerated degradation due to high ambient temperatures. The dashed line indicates the 70% threshold for second-life qualification.
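
The 70% second-life threshold in Figure 5 implies a simple routing rule at the testing facility. The following is a minimal Python sketch of that rule, assuming batteries at or above the threshold go to repurposing and the rest go to recycling. The function name, the treatment of a battery at exactly 70%, and the sample SoH values are illustrative assumptions, not taken from the case study.

```python
SECOND_LIFE_THRESHOLD = 0.70  # dashed line in Figure 5


def route_battery(soh: float, threshold: float = SECOND_LIFE_THRESHOLD) -> str:
    """Return the downstream facility type for a tested battery.

    Assumption: SoH is a fraction in [0, 1], and a battery exactly at the
    threshold still qualifies for second-life use.
    """
    if not 0.0 <= soh <= 1.0:
        raise ValueError("SoH must be a fraction in [0, 1]")
    return "repurposing" if soh >= threshold else "recycling"


# Hypothetical test results for four returned batteries.
sample_soh = [0.82, 0.70, 0.69, 0.55]
routes = [route_battery(s) for s in sample_soh]
print(routes)
```

In the model's notation, the first outcome corresponds to a testing-to-repurposing flow and the second to a testing-to-recycling flow.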

Table 1. Comparison with recent related studies.

Study	AI/ML	Dynamic	Multi-Obj.	Hybrid Unc.	Resilience	EV Battery
Wang et al. [22]	–	–	✓	–	–	✓
Zhang et al. [23]	–	–	✓	Stochastic	–	✓
Lotfi et al. [25]	–	–	✓	Fuzzy	✓	–
Li et al. [21]	–	–	–	–	–	✓
Haseli et al. [15]	–	–	✓	Fuzzy-Robust	–	–
Gijsbrechts et al. [17]	DRL	✓	–	Stochastic	–	–
Chen et al. [56]	–	–	–	–	✓	✓
This study	DRL + ML	✓	✓	FRS-CVaR	✓	✓

Table 2. Sets and indices.

Symbol	Description
I	Set of source locations (EV owners, dealers), indexed by i
J	Set of candidate collection center locations, indexed by j
K	Set of candidate testing/sorting facility locations, indexed by k
R	Set of candidate repurposing center locations, indexed by r
M	Set of candidate recycling center locations, indexed by m
T	Set of time periods in the planning horizon, indexed by t
S	Set of disruption scenarios, indexed by s
L	Set of facility capacity levels (small, medium, large), indexed by l
C	Set of battery chemistries (NMC, LFP, NCA), indexed by c

Table 3. Parameters.

Symbol	Description
$\tilde{d}_{it}$	Fuzzy demand for battery collection at source i in period t (tons)
$fc_{jl}$	Fixed cost of establishing collection center j with capacity level l (USD)
$ft_{kl}$	Fixed cost of establishing testing facility k with capacity level l (USD)
$fr_{rl}$	Fixed cost of establishing repurposing center r with capacity level l (USD)
$fm_{ml}$	Fixed cost of establishing recycling center m with capacity level l (USD)
$cap_{jl}^{C}$	Capacity of collection center j with level l (tons/period)
$cap_{kl}^{T}$	Capacity of testing facility k with level l (tons/period)
$cap_{rl}^{R}$	Capacity of repurposing center r with level l (tons/period)
$cap_{ml}^{M}$	Capacity of recycling center m with level l (tons/period)
$tc_{ij}$	Unit transportation cost from source i to collection center j (USD/ton)
$\widetilde{pc}_{k}$	Fuzzy unit processing cost at testing facility k (USD/ton)
$\alpha_{c}$	Proportion of chemistry c batteries suitable for second-life
$e_{ij}$	Carbon emission per ton transported from i to j (kg CO₂/ton)
$\xi_{c}$	Material recovery rate for chemistry c (%)
$p_{s}$	Probability of scenario s occurring
$\delta_{js}$	Disruption indicator for collection center j in scenario s
$\beta$	Confidence level for CVaR calculation (typically 0.95)
$\lambda$	Risk-aversion parameter

Table 4. Decision variables.

Symbol	Description
$Y_{jl}$	Binary: 1 if collection center j is established with capacity level l
$Z_{kl}$	Binary: 1 if testing facility k is established with capacity level l
$U_{rl}$	Binary: 1 if repurposing center r is established with capacity level l
$V_{ml}$	Binary: 1 if recycling center m is established with capacity level l
$X_{ijts}$	Flow of batteries from source i to collection center j in period t, scenario s
$F_{jkts}$	Flow from collection center j to testing facility k in period t, scenario s
$Q_{krts}$	Flow from testing facility k to repurposing center r in period t, scenario s
$W_{kmts}$	Flow from testing facility k to recycling center m in period t, scenario s
$I_{jts}^{C}$	Inventory at collection center j at end of period t, scenario s
$\eta$	Value-at-Risk auxiliary variable for CVaR calculation
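
To illustrate how the auxiliary variable η, the confidence level β, and the scenario probabilities p_s interact, the sketch below evaluates CVaR over a discrete scenario set using the standard Rockafellar-Uryasev form, CVaR_β = min over η of η + (1/(1 − β)) Σ_s p_s max(L_s − η, 0). This is a generic illustration in Python, not the paper's full FRS-CVaR model; the function name and the scenario losses and probabilities are hypothetical.

```python
def cvar_discrete(losses, probs, beta=0.95):
    """Return (eta, cvar) for a discrete loss distribution.

    The objective is piecewise linear and convex in eta with breakpoints at
    the scenario losses, so it is enough to evaluate it at each loss value.
    """
    if len(losses) != len(probs):
        raise ValueError("losses and probs must have the same length")
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("scenario probabilities must sum to 1")
    if not 0.0 <= beta < 1.0:
        raise ValueError("beta must lie in [0, 1)")

    def objective(eta):
        # Probability-weighted loss in excess of eta, scaled by the tail mass.
        excess = sum(p * max(loss - eta, 0.0) for loss, p in zip(losses, probs))
        return eta + excess / (1.0 - beta)

    best_eta = min(losses, key=objective)  # a minimizing eta is a beta-level VaR
    return best_eta, objective(best_eta)


# Hypothetical scenario costs (USD M) and probabilities p_s.
scenario_losses = [100.0, 200.0, 1000.0]
scenario_probs = [0.5, 0.4, 0.1]
eta, cvar = cvar_discrete(scenario_losses, scenario_probs, beta=0.9)
print(f"eta = {eta}, CVaR = {cvar:.1f}")
```

With β = 0.9 the worst 10% of probability mass sits entirely on the 1000 scenario, so the CVaR equals that scenario's loss. Inside an optimization model, η would be a decision variable and the max term would be linearized with one nonnegative auxiliary variable per scenario.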

Table 5. Sensitivity analysis on resilience objective weights.

Strategy	$(w_{1}, w_{2}, w_{3})$	Cost (USD M)	Resilience	# Facilities	Topology Change
Baseline	(0.4, 0.3, 0.3)	1502	0.72	20	–
Redundancy-focused	(0.6, 0.2, 0.2)	1589	0.78	23	+3 backup sites
Connectivity-focused	(0.2, 0.6, 0.2)	1534	0.70	22	Hub-and-spoke
Flexibility-focused	(0.2, 0.2, 0.6)	1478	0.68	18	Multi-sourcing

Table 6. Warm-start transfer validation: performance after 5000 fine-tuning episodes.

Initialization Strategy	Avg. Reward	Convergence Episodes	Gap to Full Training
Random initialization	$623 \pm 87$	>50,000	26.4%
Meta-policy $\pi_{\text{meta}}$	$789 \pm 34$	12,400	6.9%
Warm-start (closest config.)	$831 \pm 21$	4200	1.9%
Full training (500 k episodes)	$847 \pm 23$	300,000	Baseline
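
The last column of Table 6 can be reproduced from the average rewards, assuming the gap is defined as the relative shortfall against the fully trained policy (the table does not state the definition explicitly). A short Python check under that assumption, with hypothetical variable names:

```python
# Average rewards from Table 6 (means only; the spreads are ignored here).
full_training_reward = 847.0
avg_reward = {
    "Random initialization": 623.0,
    "Meta-policy": 789.0,
    "Warm-start (closest config.)": 831.0,
}

# Assumed definition: gap = (full - strategy) / full, in percent.
gap_pct = {
    name: 100.0 * (full_training_reward - reward) / full_training_reward
    for name, reward in avg_reward.items()
}

for name, gap in gap_pct.items():
    print(f"{name}: {gap:.2f}% below full training")
```

Under this definition the computed gaps agree with the reported 26.4%, 6.9%, and 1.9% to within about 0.1 percentage point.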

Table 7. Facility cost parameters by capacity level (million USD).

Facility Type	Small	Medium	Large	Capacity Range (t/yr)
Collection Center	2.0	5.5	15.0	5000–25,000
Testing Facility	5.0	12.0	25.0	8000–30,000
Repurposing Center	10.0	25.0	40.0	6000–20,000
Recycling Center	20.0	50.0	80.0	10,000–35,000

Table 8. Optimal network configuration for GCC region.

Facility Type	Number	Capacity (t/yr)	Investment (USD M)	Locations
Collection Centers	9	95,000	67.5	SAU(4), UAE(3), KWT(1), QAT(1)
Testing Facilities	5	85,000	82.0	SAU(2), UAE(2), OMN(1)
Repurposing Centers	4	48,000	124.0	SAU(2), UAE(1), BHR(1)
Recycling Centers	2	42,000	145.0	SAU(1), UAE(1)
Total	20	–	418.5	–

Table 9. Performance comparison of solution methods.

Method	Cost (USD M)	CO₂ (kt)	Resilience	Time (h)	Cost Reduction vs. Baseline
Deterministic MILP	1847	892	0.42	2.3	Baseline
Two-stage SP	1723	856	0.51	8.7	6.7%
NSGA-II (no DRL)	1612	798	0.58	15.2	12.7%
CPLEX Weighted-sum	1589	824	0.55	24.6	14.0%
Proposed (AI-DRL)	1502	683	0.72	12.8	18.7%
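
The percentages in the last column of Table 9 follow from the cost column, taking the deterministic MILP as the baseline. The Python sketch below recomputes them and applies the same formula to CO₂ emissions for the proposed method; the variable and function names are illustrative.

```python
# (cost in USD M, CO2 in kt) per method, from Table 9.
results = {
    "Deterministic MILP": (1847.0, 892.0),
    "Two-stage SP": (1723.0, 856.0),
    "NSGA-II (no DRL)": (1612.0, 798.0),
    "CPLEX Weighted-sum": (1589.0, 824.0),
    "Proposed (AI-DRL)": (1502.0, 683.0),
}
base_cost, base_co2 = results["Deterministic MILP"]


def reduction_pct(baseline: float, value: float) -> float:
    """Relative reduction against the baseline, in percent."""
    return 100.0 * (baseline - value) / baseline


cost_reduction = {m: reduction_pct(base_cost, c) for m, (c, _) in results.items()}
co2_reduction = {m: reduction_pct(base_co2, e) for m, (_, e) in results.items()}

for method in results:
    print(f"{method}: cost -{cost_reduction[method]:.1f}%, CO2 -{co2_reduction[method]:.1f}%")
```

For the proposed method this gives an 18.7% cost reduction and a 23.4% emission reduction, matching the headline figures in the abstract.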

Table 10. Multi-seed statistical validation (10 independent runs).

Metric	Mean	Std. Dev.	Min	Max	95% CI
Total Cost (USD M)	1502	18.7	1471	1538	[1489, 1515]
CO₂ Emissions (kt)	683	12.4	662	705	[674, 692]
Resilience Index	0.72	0.018	0.69	0.75	[0.707, 0.733]
Cost Reduction (%)	18.7	1.01	16.7	20.4	[17.9, 19.4]
Emission Reduction (%)	23.4	1.39	21.0	25.7	[22.4, 24.4]
Resilience Improvement (%)	31.2	2.13	27.4	34.9	[29.7, 32.7]
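
The confidence intervals for the first three rows of Table 10 are consistent with a two-sided Student-t interval over the 10 runs. The Python sketch below assumes that construction (the table does not state which interval was used), treats the reported standard deviation as the sample standard deviation, and uses the tabulated critical value t ≈ 2.262 for 9 degrees of freedom. Names are illustrative.

```python
import math

T_CRIT_975_DF9 = 2.262  # two-sided 95% Student-t critical value, 9 degrees of freedom
N_RUNS = 10             # independent seeds reported in Table 10


def t_interval(mean, std_dev, n=N_RUNS, t_crit=T_CRIT_975_DF9):
    """Return (lower, upper) bounds of mean +/- t * s / sqrt(n)."""
    half_width = t_crit * std_dev / math.sqrt(n)
    return mean - half_width, mean + half_width


# (mean, standard deviation) pairs from Table 10.
table10 = {
    "Total Cost (USD M)": (1502.0, 18.7),
    "CO2 Emissions (kt)": (683.0, 12.4),
    "Resilience Index": (0.72, 0.018),
}
intervals = {name: t_interval(m, s) for name, (m, s) in table10.items()}

for name, (lower, upper) in intervals.items():
    print(f"{name}: [{lower:.3f}, {upper:.3f}]")
```

After rounding, these reproduce the reported intervals [1489, 1515], [674, 692], and [0.707, 0.733].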

Table 11. Ablation study: Component contribution analysis (10-seed averages with 95% CI).

Configuration	Cost (USD M)	CO₂ (kt)	Resilience	$\Delta$ Cost	Time (h)
Full Model (Proposed)	1502 ± 12	683 ± 8	0.72 ± 0.01	Baseline	12.8
w/o DRL (rule-based ops)	1716 ± 21	751 ± 14	0.65 ± 0.02	+14.3%	4.2
w/o Policy Transfer	1589 ± 31	702 ± 18	0.70 ± 0.02	+5.8%	89.4
PPO → A2C	1547 ± 19	695 ± 11	0.71 ± 0.01	+3.0%	14.1
PPO → DQN	1623 ± 27	718 ± 16	0.68 ± 0.02	+8.1%	18.6
w/o Fuzzy (point estimates)	1578 ± 24	698 ± 13	0.70 ± 0.01	+5.1%	11.2
w/o CVaR ($\lambda = 0$)	1534 ± 18	689 ± 10	0.67 ± 0.02	+2.1%	10.9
w/o Scenarios (expected value)	1689 ± 34	742 ± 19	0.58 ± 0.03	+12.5%	3.8
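
The cost penalties in Table 11 can be recomputed from the mean costs, assuming each penalty is the relative increase over the full model. The Python sketch below does so and ranks the ablated configurations by cost impact; the names are illustrative and the ± spreads are ignored.

```python
FULL_MODEL_COST = 1502.0  # USD M, mean over 10 seeds

# Mean cost (USD M) of each ablated configuration, from Table 11.
ablation_cost = {
    "w/o DRL (rule-based ops)": 1716.0,
    "w/o Policy Transfer": 1589.0,
    "PPO -> A2C": 1547.0,
    "PPO -> DQN": 1623.0,
    "w/o Fuzzy (point estimates)": 1578.0,
    "w/o CVaR (lambda = 0)": 1534.0,
    "w/o Scenarios (expected value)": 1689.0,
}

# Assumed definition: delta = (ablated - full) / full, in percent.
delta_pct = {
    name: 100.0 * (cost - FULL_MODEL_COST) / FULL_MODEL_COST
    for name, cost in ablation_cost.items()
}

# Largest cost penalty first.
ranked = sorted(delta_pct.items(), key=lambda item: item[1], reverse=True)
for name, delta in ranked:
    print(f"{name}: +{delta:.2f}%")
```

The recomputed values agree with the table to within about 0.1 percentage point. Removing the DRL operational layer and removing scenario-based planning carry the two largest cost penalties.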

Table 12. Value of information: SoH prediction accuracy vs. economic outcomes (USD).

Accuracy	Misalloc. Cost	Recovery Value	Net Benefit	Incremental VOI
85% (Basic)	$47.2 M	$312.4 M	Baseline	–
90%	$38.1 M	$334.8 M	+$31.5 M	$31.5 M
95% (Advanced)	$29.7 M	$349.9 M	+$55.0 M	$23.5 M
100% (Oracle)	$24.3 M	$361.2 M	+$71.5 M	$16.5 M
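
The incremental value of information in Table 12 is the difference in net benefit between consecutive accuracy levels. A brief Python sketch of that differencing, taking the net-benefit column as given (variable names are illustrative):

```python
# Net benefit relative to the 85% baseline, in USD M, from Table 12.
accuracy_levels = [85, 90, 95, 100]
net_benefit = [0.0, 31.5, 55.0, 71.5]

# Incremental VOI: gain from moving up one accuracy level.
incremental_voi = [
    later - earlier for earlier, later in zip(net_benefit, net_benefit[1:])
]

for level, gain in zip(accuracy_levels[1:], incremental_voi):
    print(f"up to {level}% accuracy: +{gain:.1f} USD M")

# Each further step of accuracy adds less than the previous one.
diminishing = all(a > b for a, b in zip(incremental_voi, incremental_voi[1:]))
print("diminishing returns:", diminishing)
```

The increments shrink from 31.5 to 23.5 to 16.5 USD M, so each additional step in SoH prediction accuracy yields a smaller gain than the one before.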

Table 13. Sensitivity to disruption probability.

Disruption Prob.	Cost (USD M)	CO₂ (kt)	Resilience	Service Level (%)
5% (Low)	1456	671	0.68	98.5
10% (Base)	1502	683	0.72	97.2
15% (Medium)	1578	698	0.75	95.8
20% (High)	1689	724	0.78	93.4


Share and Cite

MDPI and ACS Style

Almuwallad, M. AI-Driven Resilient Reverse Logistics Network for Electric Vehicle Battery Circular Economy: A Deep Reinforcement Learning Approach with Multi-Objective Optimization Under Disruption Uncertainty. Energies 2026, 19, 738. https://doi.org/10.3390/en19030738

