1. Introduction
The global transition toward sustainable transportation has accelerated electric vehicle (EV) adoption at an unprecedented rate. Global EV sales reached 14 million units in 2023, with projections indicating that annual sales will exceed 40 million by 2030 [1]. This exponential growth is generating a corresponding surge in end-of-life (EoL) battery volumes, with estimates suggesting that 1.5–3.3 million tons of EV batteries will require management annually by 2040 in China alone [2]. Improper handling of these batteries poses significant environmental and safety risks due to toxic materials including lithium, cobalt, and nickel, while simultaneously representing a substantial loss of valuable resources critical for battery manufacturing supply chains [3,4].
The circular economy (CE) paradigm offers a promising approach to address this challenge by maximizing resource utilization through strategies including battery repurposing for secondary applications (e.g., stationary energy storage systems), remanufacturing, and material recycling [5,6]. However, implementing effective CE strategies for EV batteries requires sophisticated reverse logistics networks capable of handling the inherent complexity and uncertainty in battery returns, variable battery conditions, fluctuating material prices, and potential supply chain disruptions [7,8]. Recent geopolitical tensions and pandemic-induced disruptions have highlighted the vulnerability of global battery supply chains, with critical material prices experiencing up to 80% fluctuations within 12-month periods in some cases [9].
Traditional optimization approaches for reverse logistics network design face significant limitations when addressing the dynamic, uncertain, and multi-faceted nature of EV battery management. First, conventional mathematical programming models assume static decision-making environments, thereby failing to capture the real-time adaptability required for efficient battery collection and processing [10,11]. Second, existing approaches often consider single objectives (typically cost minimization), thus neglecting the multi-dimensional sustainability requirements, including environmental impact and social considerations [12,13]. Third, most studies inadequately address the hybrid uncertainty arising from multiple sources including demand stochasticity, processing cost ambiguity, and disruption scenarios [14,15].
Artificial intelligence (AI), particularly deep reinforcement learning (DRL), has emerged as a transformative technology capable of addressing these limitations [16,17]. Recent advances demonstrate that AI-enhanced diagnostics can extend second-life battery service by up to 50% and reduce lifecycle costs by 25% [18,19]. Furthermore, AI-driven sorting and classification systems can significantly improve material recovery rates while reducing processing time and costs [20]. However, integrating AI capabilities with strategic supply chain network design remains largely unexplored in the literature.
A critical technical bottleneck hindering this integration is the computational intractability of jointly optimizing strategic facility location decisions and operational routing policies under uncertainty. Traditional approaches either (i) solve the strategic problem assuming simplified operational rules, leading to suboptimal network designs, or (ii) require exhaustive enumeration of operational scenarios for each candidate configuration, resulting in prohibitive computational costs. Previous attempts to address this have failed because heuristic-based operational rules cannot capture the complex, state-dependent decisions required when battery conditions, market prices, and facility availability change dynamically. Our bi-level framework overcomes this barrier by employing a learned DRL policy that can generalize across network configurations through policy transfer mechanisms, enabling efficient fitness evaluation during the evolutionary search without sacrificing operational decision quality. The “learning” component is essential at the strategic phase because it enables the algorithm to accurately estimate the operational performance of candidate network configurations without solving expensive stochastic programs for each evaluation.
This research addresses these gaps by developing a novel AI-driven framework for resilient reverse logistics network design that integrates: (1) machine learning-based battery State-of-Health prediction for intelligent sorting decisions; (2) deep reinforcement learning for dynamic operational optimization; (3) multi-objective optimization considering economic, environmental, and resilience criteria; and (4) hybrid uncertainty handling through fuzzy-robust stochastic programming with risk measures. The framework is designed specifically for the EV battery context, incorporating unique characteristics including cascaded lifecycle pathways (second-life vs. end-of-life), chemistry-specific processing requirements, and regulatory compliance considerations.
The main contributions of this paper are:
1. Methodological Innovation: We develop a novel bi-level optimization framework that integrates strategic network design with AI-driven operational decision-making for EV battery reverse logistics. We provide a formal mathematical characterization of the coupling between upper-level (optimized) and lower-level (learned) decisions, including explicit policy transfer mechanisms across network configurations.
2. AI Integration: We propose the first application of deep reinforcement learning (specifically, Proximal Policy Optimization) for dynamic battery routing and sorting decisions within a closed-loop supply chain context.
3. Uncertainty Handling: We introduce a novel Fuzzy-Robust Stochastic programming approach with CVaR (FRS-CVaR) that simultaneously addresses epistemic uncertainty (via fuzzy sets), aleatory uncertainty (via scenarios), and tail risk (via CVaR), with explicit categorization of how each uncertainty type manifests in the EV battery context.
4. Multi-Objective Optimization: We develop an enhanced NSGA-III algorithm incorporating adaptive evolution strategies for solving the tri-objective problem of cost minimization, emission reduction, and resilience maximization, with comprehensive ablation studies and multi-seed statistical validation.
5. Practical Validation: We validate the framework through a comprehensive case study of the GCC region, providing actionable insights for policymakers and practitioners in developing economies transitioning toward sustainable EV battery management.
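To make the tri-objective setting concrete, the following is a minimal sketch of the Pareto-dominance test that any NSGA-style algorithm relies on, assuming each candidate design is summarized by a (cost, emissions, resilience) triple. The function names and the candidate values are hypothetical and purely illustrative; this is not the enhanced NSGA-III procedure itself.

```python
from typing import List, Tuple

# Objective vector: (cost, emissions, resilience). Cost and emissions are
# minimized; resilience is maximized, so it is negated before comparison.
Objectives = Tuple[float, float, float]


def to_minimization(obj: Objectives) -> Tuple[float, float, float]:
    """Convert the mixed min/max vector into a pure minimization vector."""
    cost, emissions, resilience = obj
    return (cost, emissions, -resilience)


def dominates(a: Objectives, b: Objectives) -> bool:
    """True if `a` is no worse than `b` on every objective and better on one."""
    a_min, b_min = to_minimization(a), to_minimization(b)
    no_worse = all(x <= y for x, y in zip(a_min, b_min))
    strictly_better = any(x < y for x, y in zip(a_min, b_min))
    return no_worse and strictly_better


def pareto_front(points: List[Objectives]) -> List[Objectives]:
    """Keep the points that no other point dominates."""
    return [p for p in points
            if not any(dominates(q, p) for q in points if q != p)]


# Hypothetical candidate designs (illustrative numbers, not case-study results).
candidates = [
    (100.0, 50.0, 0.80),
    (110.0, 45.0, 0.85),
    (120.0, 55.0, 0.75),  # worse than the first design on all three objectives
]
front = pareto_front(candidates)
```

In this toy example the first two designs trade cost against emissions and resilience, so both remain on the front, while the third is dominated and dropped.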
The remainder of this paper is organized as follows. Section 2 reviews the relevant literature. Section 3 describes the problem and presents the mathematical formulation. Section 4 details the solution methodology including the DRL architecture and multi-objective algorithm. Section 5 presents the case study and computational results. Section 6 discusses managerial implications, and Section 7 concludes with limitations and future research directions.
5. Case Study and Computational Results
5.1. Case Study Description
We validate the proposed framework through a comprehensive case study of the Gulf Cooperation Council (GCC) region, comprising Saudi Arabia, the United Arab Emirates, Kuwait, Qatar, Bahrain, and Oman. The GCC region was selected for several reasons: rapidly growing EV adoption driven by government initiatives, significant infrastructure investment capacity, extreme climate conditions creating unique battery degradation patterns, and its potential to serve as a regional hub for battery recycling.
Battery SoH Profiles for GCC Context. The extreme temperatures in the GCC region (frequently exceeding 45 °C) accelerate battery degradation compared to temperate climates. Based on accelerated aging studies [60] and regional fleet data from early EV adopters, we model the SoH distribution of returning batteries as follows. The assumed SoH at end-of-vehicle-life follows a shifted beta distribution with shape parameters $\alpha$, $\beta$ calibrated for GCC conditions (vs. a separate pair of values for temperate climates). This yields a mean SoH of 68.3% (vs. 71.4% globally) with standard deviation 8.7%.
Figure 5 illustrates the assumed distribution.
Under this distribution, approximately 45% of returning batteries qualify for second-life applications (SoH ≥ 70%), compared to 55% under temperate climate assumptions. This regional characteristic directly impacts the optimal balance between repurposing and recycling capacity in our network design.
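As an illustration of how such a distribution translates into a second-life share, the sketch below fits a shifted beta by the method of moments to the reported mean (68.3%) and standard deviation (8.7%) and estimates the fraction of batteries with SoH ≥ 70% by Monte Carlo sampling. The support bounds `SOH_MIN` and `SOH_MAX` are assumptions introduced here, not values from the case study, so the resulting share need not match the reported figure of approximately 45%.

```python
import random

# Reported summary statistics for the GCC case (percent SoH).
MEAN_SOH = 68.3
STD_SOH = 8.7
SECOND_LIFE_THRESHOLD = 70.0

# ASSUMPTION: support of the shifted beta; these bounds are hypothetical.
SOH_MIN, SOH_MAX = 40.0, 100.0


def fit_shifted_beta(mean, std, lo, hi):
    """Method-of-moments shape parameters for a beta rescaled to [lo, hi]."""
    m = (mean - lo) / (hi - lo)        # mean mapped to [0, 1]
    v = (std / (hi - lo)) ** 2         # variance mapped to [0, 1]
    common = m * (1.0 - m) / v - 1.0
    if common <= 0:
        raise ValueError("variance too large for a beta on this support")
    return m * common, (1.0 - m) * common


def sample_soh(alpha, beta, lo, hi, rng):
    """Draw one SoH value from the shifted beta on [lo, hi]."""
    return lo + (hi - lo) * rng.betavariate(alpha, beta)


alpha, beta = fit_shifted_beta(MEAN_SOH, STD_SOH, SOH_MIN, SOH_MAX)
rng = random.Random(0)  # fixed seed for reproducibility
samples = [sample_soh(alpha, beta, SOH_MIN, SOH_MAX, rng)
           for _ in range(100_000)]
sample_mean = sum(samples) / len(samples)
second_life_share = (sum(s >= SECOND_LIFE_THRESHOLD for s in samples)
                     / len(samples))
```

Changing the assumed support shifts the estimated share, which is one way to see how sensitive the repurposing-versus-recycling balance is to the tail of the SoH distribution.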
The case considers 45 source locations, 15 candidate collection center locations, 8 testing/sorting facility locations, 6 repurposing center locations, and 4 recycling center locations. The planning horizon spans 10 years (2025–2035) with quarterly time periods.
5.2. Parameter Settings
Key model parameters were estimated from industry data, government reports, and expert consultations.
Table 7 summarizes facility cost parameters.
The DRL agent was trained for 500,000 episodes. PPO hyperparameters include the learning rate, the discount factor $\gamma$, the GAE parameter $\lambda$, the clip ratio $\epsilon$, and a batch size of 64. The NSGA-III algorithm was configured with population size = 200, maximum generations = 500, crossover probability = 0.9, and mutation probability = 0.1.
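For reproducibility, settings of this kind are often collected in a single configuration object. The sketch below records only the values stated above; the class and field names are hypothetical, and hyperparameters whose values are not stated here are deliberately left unset rather than guessed.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class PPOConfig:
    training_episodes: int = 500_000
    batch_size: int = 64
    # Values not stated in this section are left unset rather than guessed.
    learning_rate: Optional[float] = None
    discount_factor: Optional[float] = None
    gae_lambda: Optional[float] = None
    clip_ratio: Optional[float] = None

    def missing(self) -> List[str]:
        """Names of hyperparameters that still need to be filled in."""
        return [name for name, value in vars(self).items() if value is None]


@dataclass(frozen=True)
class NSGA3Config:
    population_size: int = 200
    max_generations: int = 500
    crossover_probability: float = 0.9
    mutation_probability: float = 0.1

    @property
    def max_fitness_evaluations(self) -> int:
        # One evaluation per individual per generation
        # (the initial population is ignored in this rough count).
        return self.population_size * self.max_generations


ppo = PPOConfig()
nsga3 = NSGA3Config()
```

Under these settings the evolutionary search performs on the order of 200 × 500 = 100,000 fitness evaluations, which is why the cost of each individual evaluation matters for overall run time.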
5.3. Computational Results
5.3.1. Optimal Network Configuration
Table 8 presents the optimal network configuration identified by the proposed framework.
5.3.2. Comparison with Benchmark Methods
Table 9 presents the comparison with benchmark methods.
The results demonstrate that our proposed AI-DRL approach outperforms all benchmark methods: 18.7% cost reduction compared to deterministic MILP, 23.4% emission reduction, and 31.2% resilience improvement.
Baseline Operational Logic Clarification. To ensure fair comparison, we detail the operational decision rules used in each benchmark:
Deterministic MILP: Uses expected values for all uncertain parameters and solves a single-stage optimization. Operational routing follows shortest-path assignments based on the fixed solution.
Two-stage SP: First-stage determines facility locations; second-stage recourse uses a minimum-cost flow solver (not a naive rule) to optimize routing given realized scenarios.
NSGA-II (no DRL): Strategic decisions use NSGA-II; operational decisions follow a priority-based heuristic where batteries are routed to the nearest facility with available capacity, and SoH threshold is fixed at 70%.
CPLEX Weighted-sum: Solves the tri-objective problem as a weighted single objective using CPLEX, with operational flows optimized within the MILP formulation.
The key distinction is that only our proposed method uses learned, state-dependent operational policies. The 18.7% improvement over deterministic MILP and the 6.0% improvement over NSGA-II (no DRL) demonstrate the value added by the DRL component beyond sophisticated heuristics.
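The priority-based rule used in the NSGA-II (no DRL) benchmark can be sketched as follows, assuming a simplified setting in which each battery goes directly to either a repurposing or a recycling facility. The `Facility` class, the `route_battery` function, and the example data are hypothetical illustrations of a fixed-threshold, nearest-with-capacity rule, not the benchmark implementation.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

SOH_THRESHOLD = 70.0  # fixed threshold from the benchmark description (percent)


@dataclass
class Facility:
    name: str
    kind: str            # "repurposing" or "recycling"
    capacity_left: int   # remaining battery slots in the current period


def route_battery(soh: float, distances: Dict[str, float],
                  facilities: List[Facility]) -> Optional[str]:
    """Send a battery to the nearest facility of the required kind with capacity."""
    kind = "repurposing" if soh >= SOH_THRESHOLD else "recycling"
    eligible = [f for f in facilities
                if f.kind == kind and f.capacity_left > 0]
    if not eligible:
        # ASSUMPTION: the handling of a full network is not specified;
        # this sketch simply reports that no assignment was possible.
        return None
    chosen = min(eligible, key=lambda f: distances[f.name])
    chosen.capacity_left -= 1
    return chosen.name


# Hypothetical example: one source, three facilities.
facilities = [
    Facility("R1", "repurposing", 1),
    Facility("R2", "repurposing", 5),
    Facility("C1", "recycling", 5),
]
distances = {"R1": 10.0, "R2": 25.0, "C1": 40.0}

first = route_battery(82.0, distances, facilities)   # nearest repurposing site
second = route_battery(75.0, distances, facilities)  # R1 is now full
third = route_battery(61.0, distances, facilities)   # below threshold
```

The rule ignores market prices, downstream congestion, and SoH prediction error, which are the state variables a learned policy can condition on.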
5.3.3. Multi-Seed Statistical Validation
Table 10 presents the statistical summary from 10 independent random seeds.
5.4. Ablation Study
Table 11 presents the contribution of each component in our framework.
Computational Trade-offs. The computation time column reveals important trade-offs. The policy transfer mechanism is critical for practical feasibility: without it (“w/o Policy Transfer”), computation time increases roughly 7× to 89.4 h because each candidate network configuration requires full DRL training. The “w/o DRL” variant is fastest (4.2 h) but incurs a 14.3% cost penalty. Our full model achieves the best quality-time trade-off at 12.8 h. All experiments were conducted on a workstation with an Intel Xeon W-2295 CPU, 128 GB RAM, and an NVIDIA RTX 3090 GPU.
5.5. Sensitivity Analysis
Our sensitivity analysis reveals that improving SoH prediction from 85% to 95% accuracy reduces misallocation costs by 37% and increases overall material recovery value by 12%. Moderate redundancy (20–30% excess capacity) provides an optimal balance between cost increase and resilience improvement.
Value of Information Analysis. We quantify the economic value of improved SoH prediction accuracy to inform investment decisions in diagnostic equipment.
Table 12 presents the value of information (VOI) analysis.
The incremental VOI of improving accuracy from 85% to 90% is USD 31.5M over the 10-year planning horizon. Given that advanced diagnostic equipment (including electrochemical impedance spectroscopy systems) costs approximately USD 2–4M per testing facility [61], the payback period for upgrading from basic to advanced diagnostics is approximately 0.6–1.2 years, strongly justifying the investment. However, the marginal value diminishes beyond 90%: the incremental VOI from 95% to perfect information is only USD 16.5M, suggesting that pursuing ultra-high accuracy may not be cost-effective.
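The payback estimate can be checked with simple arithmetic. The sketch below assumes an undiscounted payback in which the 85%-to-90% incremental VOI accrues evenly over the horizon and is compared with the equipment cost of a single testing facility; both are simplifying assumptions made here, and the function name is hypothetical. Under these assumptions the result lands close to the reported range.

```python
INCREMENTAL_VOI_MUSD = 31.5        # 85% -> 90% accuracy, whole horizon
HORIZON_YEARS = 10                 # planning horizon
EQUIPMENT_COST_MUSD = (2.0, 4.0)   # cost range per testing facility


def simple_payback_years(cost_musd: float, total_benefit_musd: float,
                         horizon_years: int) -> float:
    """Undiscounted payback: cost divided by the average annual benefit."""
    annual_benefit = total_benefit_musd / horizon_years
    return cost_musd / annual_benefit


payback_range = tuple(
    simple_payback_years(cost, INCREMENTAL_VOI_MUSD, HORIZON_YEARS)
    for cost in EQUIPMENT_COST_MUSD
)
```

Equipping several testing facilities at once, or discounting future benefits, would lengthen the payback period relative to this simple estimate.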
Table 13 shows network performance under varying disruption probabilities.
6. Discussion and Managerial Implications
6.1. Key Findings
Our computational experiments yield several important findings:
Finding 1: AI-driven operational decisions significantly improve network performance. Integrating deep reinforcement learning for dynamic routing and sorting decisions enables the network to adapt to real-time conditions, resulting in better resource utilization and reduced operational costs. Compared to rule-based heuristics, the DRL agent reduces operational costs by 14.2% on average.
Finding 2: Facility redundancy is crucial for resilience but must be balanced against cost. Moderate redundancy (20–30% excess capacity) provides significant resilience benefits with manageable cost increases (8–12%).
Finding 3: SoH prediction accuracy is a critical success factor. Each 5% improvement in prediction accuracy yields approximately a 15% reduction in misallocation costs.
Finding 4: Regional network design is superior to country-level approaches. The GCC case study demonstrates that cross-border cooperation enables significant economies of scale, reducing total costs by 23% compared to independent national networks.
Finding 5: The hybrid uncertainty handling approach outperforms single-type methods. The FRS-CVaR framework demonstrates a 12–18% improvement in expected costs compared to purely stochastic or purely fuzzy approaches.
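For readers less familiar with the tail-risk component, the sketch below computes Conditional Value-at-Risk (CVaR) for a discrete set of cost scenarios using the standard Rockafellar-Uryasev form, VaR plus the scaled expected excess above VaR. It illustrates only the CVaR term; the fuzzy and robust components of FRS-CVaR are not represented, and the scenario costs and probabilities are hypothetical.

```python
from typing import Sequence


def cvar(costs: Sequence[float], probs: Sequence[float], alpha: float) -> float:
    """CVaR at confidence level alpha for a discrete cost distribution."""
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must lie in [0, 1)")
    if len(costs) != len(probs) or abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("costs and probs must align and probs must sum to 1")

    ordered = sorted(zip(costs, probs))  # ascending by cost

    # Value-at-Risk: smallest cost whose cumulative probability reaches alpha.
    cumulative = 0.0
    var = ordered[-1][0]
    for cost, p in ordered:
        cumulative += p
        if cumulative >= alpha - 1e-12:
            var = cost
            break

    # Expected excess above VaR, scaled by the tail probability.
    excess = sum(p * max(cost - var, 0.0) for cost, p in ordered)
    return var + excess / (1.0 - alpha)


# Hypothetical scenario costs (e.g., in USD millions) with equal probability.
scenario_costs = [100.0, 120.0, 150.0, 300.0]
scenario_probs = [0.25, 0.25, 0.25, 0.25]

expected_cost = cvar(scenario_costs, scenario_probs, 0.0)   # plain mean
cvar_75 = cvar(scenario_costs, scenario_probs, 0.75)        # worst 25% tail
```

With alpha = 0 the measure reduces to the expected cost, while higher alpha values weight the disruption-heavy scenarios more strongly.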
6.2. Managerial Implications
Based on our findings, we offer the following recommendations:
1. Invest in AI capabilities: Organizations should prioritize developing AI systems for battery state estimation and operational decision support. Our analysis suggests a payback period of 2–3 years for AI infrastructure investments.
2. Design for flexibility: Network design should incorporate modular facility concepts that allow for capacity expansion and technology upgrades. We recommend designing facilities with 30–40% expansion potential.
3. Establish regional partnerships: Governments should facilitate cross-border cooperation for shared recycling infrastructure.
4. Implement digital tracking systems: Battery passports and blockchain-based traceability systems should be deployed.
5. Develop contingency plans: Organizations should maintain pre-qualified backup suppliers, emergency transportation agreements, and strategic inventory buffers.
7. Conclusions
This paper developed a novel AI-driven framework for designing resilient reverse logistics networks for the electric vehicle battery circular economy. The main contributions include: (1) a bi-level optimization model with a formal characterization of strategic-operational decision coupling; (2) a deep reinforcement learning application for dynamic battery routing and sorting; (3) a novel FRS-CVaR approach for hybrid uncertainty handling; and (4) an enhanced NSGA-III algorithm with comprehensive validation.
The GCC case study demonstrated substantial improvements: an 18.7% cost reduction (95% CI: [17.9%, 19.4%]), a 23.4% emission reduction (95% CI: [22.4%, 24.4%]), and a 31.2% resilience improvement (95% CI: [29.7%, 32.7%]).
Limitations and Future Research
Several limitations suggest directions for future research:
1. Battery chemistry evolution: Future work could incorporate chemistry prediction for emerging technologies such as solid-state batteries.
2. Real-world data validation: Real-world implementation would benefit from transfer learning using actual operational data.
3. Global supply chain considerations: Global-scale considerations including cross-border material flows and trade policies warrant investigation.
4. Social sustainability: Future research could extend the model to consider job creation, worker safety, and community impacts.
5. Multi-agent coordination: Extending the framework to multi-agent reinforcement learning could address scenarios with multiple competing or cooperating decision-makers.
6. Scalability considerations: Our case study involves 20 candidate facilities across the GCC region. For larger networks (e.g., pan-European or US-wide with hundreds of potential sites), computational scalability becomes a concern. The current approach exhibits the following computational complexity characteristics:
NSGA-III population evaluation: $O(N \cdot G \cdot T_{\mathrm{eval}})$, where $N$ is the population size, $G$ is the number of generations, and $T_{\mathrm{eval}}$ is the per-configuration evaluation time. This is linear in population size and parallelizable across solutions.
DRL policy transfer: Each evaluation requires a forward pass to compute the network embedding, followed by a meta-policy evaluation whose cost scales with the number of hidden units $H$ and the embedding dimension. Warm-start fine-tuning adds a further cost proportional to the number of fine-tuning episodes.
Scenario-based stochastic programming: The cost grows with the scenario count and polynomially with the number of decision variables in our MILP structure.
Preliminary experiments on synthetic instances with 100 candidate locations show that computation time increases to approximately 45 h, remaining tractable. For networks with 500+ sites, we estimate computation times exceeding 200 h, which would likely require hierarchical decomposition approaches (e.g., clustering regions and solving sub-problems) or more aggressive policy transfer techniques such as graph neural network-based policy architectures that naturally handle variable-size inputs through message-passing mechanisms. Future work should investigate these scalable variants.
As the global EV market continues its rapid expansion, the need for effective end-of-life battery management becomes increasingly critical. This research provides both methodological advances and practical insights to support the transition toward a sustainable, circular economy for electric vehicle batteries.