Abstract
During pandemic emergencies, demand for relief supplies in affected areas surges abruptly and evolves randomly and dynamically, creating a severe mismatch between supply and demand. Ensuring timely and reliable supply therefore requires robust decision-making under risk. This study addresses a stochastic multi-objective location-routing problem (LRP) that simultaneously considers demand uncertainty and travel-time variability. A multi-scenario stochastic programming model is developed with three objectives: minimizing total system cost, minimizing total waiting time, and minimizing a composite conditional value-at-risk (CVaR–Rcomp) that captures tail risks under extreme scenarios. A novel regret-based risk mechanism unifies the temporal and cost dimensions, enabling joint evaluation of both uncertainties within a single framework. To solve this high-dimensional problem, a reinforcement-learning-enhanced NSGA-III (RL-NSGAIII) is proposed, in which Q-learning generates high-quality initial solutions that accelerate convergence and improve population diversity. Case studies demonstrate that the proposed method outperforms traditional evolutionary algorithms in convergence efficiency and Pareto-solution quality while effectively revealing potential risk blind spots. The results provide quantitative decision support and robust-optimization insights for emergency logistics networks operating under uncertainty.
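As a hypothetical illustration of the Q-learning seeding step mentioned above (a minimal sketch under assumed settings, not the paper's implementation; all function names, state/action definitions, and parameters are illustrative), a tabular Q-learning agent can learn a single-vehicle route over a travel-time matrix and then perturb the learned route to seed an evolutionary population:

```python
import random

def q_learning_route(dist, episodes=300, alpha=0.1, gamma=0.9, eps=0.2, seed=0):
    """Learn a depot-to-all-customers route via tabular Q-learning.
    State: current node; action: next unvisited node; reward: negative travel time.
    (Illustrative assumption: node 0 is the depot.)"""
    rng = random.Random(seed)
    n = len(dist)
    Q = [[0.0] * n for _ in range(n)]
    for _ in range(episodes):
        node, unvisited = 0, set(range(1, n))
        while unvisited:
            # epsilon-greedy action selection over unvisited nodes
            if rng.random() < eps:
                nxt = rng.choice(sorted(unvisited))
            else:
                nxt = max(unvisited, key=lambda j: Q[node][j])
            reward = -dist[node][nxt]
            best_next = max((Q[nxt][j] for j in unvisited - {nxt}), default=0.0)
            Q[node][nxt] += alpha * (reward + gamma * best_next - Q[node][nxt])
            node = nxt
            unvisited.discard(nxt)
    # Greedy rollout of the learned policy yields the seed route.
    route, node, unvisited = [0], 0, set(range(1, n))
    while unvisited:
        node = max(unvisited, key=lambda j: Q[node][j])
        route.append(node)
        unvisited.discard(node)
    return route

def seed_population(route, size=10, seed=1):
    """Build an initial population from the learned route plus swap mutations,
    trading off seed quality against population diversity."""
    rng = random.Random(seed)
    pop = [route[:]]
    while len(pop) < size:
        ind = route[:]
        i, j = rng.sample(range(1, len(ind)), 2)  # keep depot (index 0) fixed
        ind[i], ind[j] = ind[j], ind[i]
        pop.append(ind)
    return pop
```

In a full RL-NSGAIII pipeline, such seeded permutations would replace part of the random initial population before NSGA-III's non-dominated sorting and reference-direction selection begin.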