A Multi-Agent Emergency Material Allocation Approach Based on a Markov Decision Process Under Demand Uncertainty for Sustainable Disaster Response
Abstract
1. Introduction
- How can the assessment process increase the number of people rescued by improving the information required for allocation decisions?
- How can the assessment agent and the allocation agent be effectively coordinated within a highly uncertain, time-sensitive operational environment?
2. Problem Definition and Model Description
2.1. Problem Description
2.2. Markov Decision Process Formulation
2.2.1. State Variable
2.2.2. Decision Variable
2.3. Markov Decision Process-Based Recursive Model
3. Solution Approach
3.1. Agent Collaboration in the Proposed Model
3.2. Feasibility Enforcement
3.3. D3QN-PER: Neural Architecture
- Bellman optimality and Q-network approximation
- Dueling architecture and Double DQN target
- Prioritized Experience Replay (PER)
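Algorithm 1: D3QN-PER Training Procedure

Input: discount factor $\gamma$; replay buffer capacity $N$; mini-batch size $B$; learning rate; target update interval; feasibility masking rules (Equations (7)–(10), served-zone exclusion rule, route feasibility conditions); PER parameters ($\alpha$, $\beta$).

Output: learned DMA policy $\pi(s) = \arg\max_{a} Q(s, a; \vartheta)$, with the maximum taken over feasible actions.

1. Initialize the online dueling Q-network $Q(\cdot\,; \vartheta)$ and the target network $Q(\cdot\,; \vartheta^{-})$.
2. Initialize the prioritized replay buffer $\mathcal{D}$ with capacity $N$.
3. For each training episode (disaster scenario) do
4. Initialize the state $s_0$ and the corresponding augmented state.
5. For $t = 0, 1, \dots, T - 1$ do
6. The AT provides field intelligence; the DMA updates the state $s_t$ accordingly.
7. Construct the feasible action set $\mathcal{A}(s_t)$ via constraints (7)–(10) and the remaining feasibility masking rules.
8. Select action $a_t$: with probability $\varepsilon$, sample uniformly from $\mathcal{A}(s_t)$;
9. otherwise, set $a_t = \arg\max_{a \in \mathcal{A}(s_t)} Q(s_t, a; \vartheta)$.
10. The ST executes the prescribed allocation; the AT executes routing if applicable; compute the reward $r_t$ and the next state $s_{t+1}$.
11. Construct the next augmented state by updating the state components and sampling new demands.
12. Store the transition $(s_t, a_t, r_t, s_{t+1})$ in $\mathcal{D}$ with maximum priority $p_{\max}$.
13. Sample a mini-batch of $B$ transitions from $\mathcal{D}$ using the priority-based sampling probabilities.
14. For each sampled transition $j$ do
15. Compute the TD target $y_j$ using Equation (21).
16. Compute the TD error and update the priority: $\delta_j = y_j - Q(s_j, a_j; \vartheta)$; $p_j = |\delta_j| + \epsilon_p$, where $\epsilon_p > 0$ is a small constant.
17. Compute the IS weight $w_j$ using Equation (25).
18. Update $\vartheta$ by minimizing the IS-weighted loss function.
19. At every target update interval, synchronize the target network: $\vartheta^{-} \leftarrow \vartheta$.
20. End for
21. End for
22. Return the learned policy $\pi(s) = \arg\max_{a} Q(s, a; \vartheta)$.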
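
The core numerical steps of Algorithm 1 can be sketched in plain Python. This is a minimal illustration, not the authors' implementation: Equations (21) and (25) are not reproduced in this section, so the sketch assumes the standard Double DQN target and the standard PER importance-sampling weight from the literature, and every function name below is hypothetical. Q-values are passed in as plain lists in place of network outputs.

```python
import random


def dueling_q(value, advantages):
    """Dueling aggregation: Q(s, a) = V(s) + A(s, a) - mean of A(s, .)."""
    mean_adv = sum(advantages) / len(advantages)
    return [value + a - mean_adv for a in advantages]


def masked_argmax(q_values, feasible):
    """Index of the largest Q-value, searching feasible actions only."""
    return max(feasible, key=lambda a: q_values[a])


def select_action(q_values, feasible, epsilon, rng):
    """Epsilon-greedy selection restricted to the feasible set (steps 8-9)."""
    if rng.random() < epsilon:
        return rng.choice(feasible)
    return masked_argmax(q_values, feasible)


def double_dqn_target(reward, q_next_online, q_next_target,
                      feasible_next, gamma, done):
    """Assumed standard Double DQN target: the online network picks the
    next action, the target network evaluates it (step 15)."""
    if done or not feasible_next:
        return reward
    best = masked_argmax(q_next_online, feasible_next)
    return reward + gamma * q_next_target[best]


def per_priority(td_error, eps=1e-6):
    """Priority p = |delta| + small constant (step 16)."""
    return abs(td_error) + eps


def per_sampling_probs(priorities, alpha):
    """Sampling probability proportional to priority ** alpha (step 13)."""
    scaled = [p ** alpha for p in priorities]
    total = sum(scaled)
    return [s / total for s in scaled]


def importance_weights(probs, beta):
    """Assumed standard IS weights (N * P) ** (-beta), scaled so the
    largest weight equals 1 (step 17)."""
    n = len(probs)
    raw = [(n * p) ** (-beta) for p in probs]
    top = max(raw)
    return [w / top for w in raw]


if __name__ == "__main__":
    rng = random.Random(0)
    q = dueling_q(1.0, [1.0, 2.0, 3.0])
    action = select_action(q, feasible=[0, 1], epsilon=0.0, rng=rng)
    print(q, action)
```

Masking is applied both when acting and when forming the target, so an infeasible allocation is never selected and never contributes a bootstrap value.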
4. Case Study
4.1. Study Area and Data Settings
4.2. Results and Analysis
4.2.1. Performance Analysis and Contribution of the AT Agent
4.2.2. Performance Analysis of Multi-Agent Coordination
4.2.3. Solution Stability Analysis
4.2.4. Sensitivity Analysis
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Appendix A

| Town | Total Population | Instability Ratio | Vulnerable Population |
|---|---|---|---|
| QLT | 2300 | 0.17 | 391 |
| CT | 3000 | 0.15 | 450 |
| TT | 2500 | 0.15 | 375 |
| NT | 1800 | 0.16 | 288 |
| GT | 1100 | 0.16 | 176 |
| QDT | 2900 | 0.15 | 435 |
| XGT | 10,000 | 0.12 | 1200 |
| ZT | 3169 | 0.13 | 411.97 |
| MT | 2900 | 0.15 | 435 |
| LT | 6906 | 0.16 | 1104.96 |
| QT | 3400 | 0.14 | 476 |
| XT | 5400 | 0.14 | 756 |
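
Each value in the last column of the table equals the town's total population multiplied by its instability ratio. The short sketch below reproduces that arithmetic from the tabulated inputs; the dictionary and function names are illustrative only.

```python
# (total population, instability ratio) per town, copied from the table above.
TOWNS = {
    "QLT": (2300, 0.17), "CT": (3000, 0.15), "TT": (2500, 0.15),
    "NT": (1800, 0.16), "GT": (1100, 0.16), "QDT": (2900, 0.15),
    "XGT": (10000, 0.12), "ZT": (3169, 0.13), "MT": (2900, 0.15),
    "LT": (6906, 0.16), "QT": (3400, 0.14), "XT": (5400, 0.14),
}


def vulnerable_population(total_population, instability_ratio):
    """Vulnerable population = total population x instability ratio."""
    return round(total_population * instability_ratio, 2)


vulnerable = {town: vulnerable_population(*row) for town, row in TOWNS.items()}

if __name__ == "__main__":
    for town, value in vulnerable.items():
        print(f"{town}: {value}")
```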
Appendix B
| Failure Probability | Start Zone | End Zone | Distance (km) | Time (min) |
|---|---|---|---|---|
| 0 | QLT | CT | 20.1 | 33 |
| 0 | QLT | TT | 21.7 | 34 |
| 0 | QLT | NT | 53.9 | 77 |
| 0 | QLT | GT | 42.8 | 41 |
| 0.15 | QLT | QDT | 150.4 | 203 |
| 0 | QLT | XGT | 30.8 | 36 |
| 0 | QLT | ZT | 83.0 | 111 |
| 0 | QLT | MT | 120.1 | 128 |
| 0 | QLT | LT | 115.5 | 120 |
| 0 | QLT | QT | 105.4 | 107 |
| 0 | QLT | XT | 116.8 | 117 |
| 0 | CT | TT | 21.7 | 27 |
| 0 | CT | NT | 44.5 | 58 |
| 0 | CT | GT | 62.8 | 63 |
| 0.1 | CT | QDT | 119.6 | 168 |
| 0 | CT | XGT | 50.8 | 58 |
| 0.1 | CT | ZT | 171.5 | 178 |
| 0.05 | CT | MT | 140.0 | 149 |
| 0.1 | CT | LT | 135.5 | 165 |
| 0 | CT | QT | 125.3 | 128 |
| 0.05 | CT | XT | 136.8 | 138 |
| 0 | TT | NT | 55.4 | 72 |
| 0 | TT | GT | 64.5 | 65 |
| 0.1 | TT | QDT | 130.5 | 171.9 |
| 0 | TT | XGT | 52.3 | 60 |
| 0.1 | TT | ZT | 173.1 | 180 |
| 0.05 | TT | MT | 142 | 153 |
| 0.05 | TT | LT | 137.0 | 140 |
| 0 | TT | QT | 129 | 126.9 |
| 0.05 | TT | XT | 138.3 | 139 |
| 0 | NT | GT | 96.5 | 99 |
| 0.05 | NT | QDT | 106.7 | 157 |
| 0 | NT | XGT | 84.6 | 92 |
| 0.1 | NT | ZT | 162.8 | 172 |
| 0.05 | NT | MT | 135.1 | 152 |
| 0.05 | NT | LT | 126.8 | 134 |
| 0.05 | NT | QT | 120.5 | 131 |
| 0 | NT | XT | 108.4 | 117 |
| 0.15 | GT | QDT | 149.3 | 206 |
| 0 | GT | XGT | 25.8 | 29 |
| 0 | GT | ZT | 108.9 | 116 |
| 0 | GT | MT | 77.4 | 88 |
| 0 | GT | LT | 72.8 | 79 |
| 0 | GT | QT | 66 | 62.7 |
| 0 | GT | XT | 74.1 | 77 |
| 0.15 | QDT | XGT | 137.5 | 200 |
| 0.15 | QDT | ZT | 241.4 | 281 |
| 0.15 | QDT | MT | 213.6 | 260 |
| 0.15 | QDT | LT | 205.3 | 243 |
| 0.15 | QDT | QT | 199.1 | 240 |
| 0.15 | QDT | XT | 186.9 | 226 |
| 0.15 | XGT | ZT | 134.5 | 146 |
| 0 | XGT | MT | 103.3 | 116 |
| 0 | XGT | LT | 98.5 | 107 |
| 0 | XGT | QT | 88.3 | 94 |
| 0 | XGT | XT | 99.7 | 105 |
| 0 | ZT | MT | 62.2 | 75 |
| 0 | ZT | LT | 36 | 39 |
| 0 | ZT | QT | 47.1 | 53 |
| 0 | ZT | XT | 54.6 | 57 |
| 0 | MT | LT | 26.3 | 40 |
| 0 | MT | QT | 15.7 | 27 |
| 0 | MT | XT | 26.9 | 37 |
| 0 | LT | QT | 11.1 | 15 |
| 0 | LT | XT | 18.6 | 18 |
| 0 | QT | XT | 12.3 | 14 |
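
The link table can be used as a lookup when timing a route. The sketch below converts the tabulated minutes to hours for the route QLT → CT → XGT → MT → LT → QDT, whose per-leg travel times also appear in the results tables. It assumes links are symmetric, since each zone pair is listed only once; that symmetry, the subset of links included, and all names in the code are assumptions made for illustration.

```python
# Travel times in minutes for a few links copied from the table above.
# frozenset keys make the lookup direction-independent (assumed symmetry).
LINK_MINUTES = {
    frozenset(("QLT", "CT")): 33,
    frozenset(("CT", "XGT")): 58,
    frozenset(("XGT", "MT")): 116,
    frozenset(("MT", "LT")): 40,
    frozenset(("QDT", "LT")): 243,
}


def leg_hours(start, end):
    """Travel time of one link in hours."""
    return LINK_MINUTES[frozenset((start, end))] / 60.0


def route_hours(route):
    """Per-leg travel times in hours for consecutive zones on a route."""
    return [leg_hours(a, b) for a, b in zip(route, route[1:])]


route = ["QLT", "CT", "XGT", "MT", "LT", "QDT"]
legs = route_hours(route)
total = sum(legs)

if __name__ == "__main__":
    for (a, b), h in zip(zip(route, route[1:]), legs):
        print(f"{a} -> {b}: {h:.4f} h")
    print(f"total travel time: {total:.4f} h")
```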
Appendix C
| Parameter | Value | Explanation |
|---|---|---|
| Replay buffer capacity | 10,000,000 | Maximum number of transitions stored in the replay buffer |
| Mini-batch size B | 64 | Number of transitions sampled per gradient update step |
| Learning rate | 0.00025 | Step size for neural network parameter optimization |
| PER priority exponent | 0.6 | Controls degree of prioritization in experience sampling |
| Maximum episodes | 100,000 (1200) | Number of training episodes |
| Target update | 400 | Frequency of target network parameter synchronization |
| Exploration rate | 1.0 | Probability of random action selection |
Appendix D













| Notation | Definition |
|---|---|
| Sets, Indices, and Parameters | |
| I | Set of demand zones, indexed by |
| K | Set of medical supply points, indexed by |
| S | All feasible states |
| | Set of actions at decision epoch in state |
| | Zone list defining the material allocation sequence for disaster zones |
| | Route link, , |
| | Index of disaster zones |
| | Travel time from zone to zone |
| | Service time at zone |
| | Travel time from zone to zone |
| | Completion time at zone |
| | Performance factor |
| | Nominal demand in zone (vulnerable population in zone ) |
| | Actual demand in zone (actual injured population in zone ) |
| | Discount factor |
| State variables | |
| | Consists of two components: material status and disaster zone request status, at decision epoch |
| | The status of supply point k, indicating which disaster zones can be serviced at decision epoch |
| | The total available materials from supply point k at decision epoch |
| | Characteristics of the disaster zone request at decision epoch |
| | Location of a disaster zone at decision epoch |
| | Demand level of the corresponding zone at decision epoch |
| | Probability distribution; explanatory variables |
| | Whether the disaster zone is being served at time , If , it means the zone is not being served at decision epoch |
| Decision variables | |
| | Indices of actions, at decision epoch |
| | Whether disaster zone chooses route z for the material need assessment of the next disaster zone () or not (), at decision epoch |
| | Quantity of materials allocated from supply point k to disaster zone , at decision epoch |
| Function | |
| | Value function of states at time |
| | The reward function |
| | The rescue effect of taking feasible action |
| | Survival function describing the effectiveness of relief assessment |
| | Approximation function (Q-value), represented by the number of people rescued |
| | Decision function |
| | The optimal value of the next state |
| Zone | Service Time (h) | Service Time (h) | Route (→) | Travel Time (h) | Completion Time (h) | Completion Time (h) |
|---|---|---|---|---|---|---|
| QLT | 0.9775 | 2.6067 | QLT → CT | 0.5500 | 0.9775 | 2.6067 |
| CT | 1.1250 | 3.0000 | CT → XGT | 0.9667 | 2.6525 | 6.1567 |
| XGT | 1.0875 | 2.9000 | XGT → MT | 1.9333 | 4.7067 | 10.0234 |
| MT | 3.00 | 8.0000 | MT → LT | 0.6667 | 9.6400 | 19.9567 |
| LT | 1.0875 | 2.9000 | LT → QDT | 4.0500 | 11.3942 | 23.5234 |
| QDT | 2.7624 | 7.3664 | QDT → QDT | 0.00 | 18.2066 | 34.9398 |
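
The numbers in the table above follow a simple recursion: each value in the sixth and seventh columns equals the previous row's value plus the previous row's travel time (fifth column) plus the current row's value in the second or third column, respectively. Reading columns two and three as per-zone service times and columns six and seven as cumulative completion times is an inference from this arithmetic, not a statement taken from the paper, and the function name is hypothetical.

```python
def completion_times(service, travel):
    """Cumulative completion time per zone along a route.

    service[i]: time spent at zone i (hours).
    travel[i]:  travel time from zone i to zone i + 1 (hours).
    """
    times = []
    clock = 0.0
    for i, s in enumerate(service):
        if i > 0:
            clock += travel[i - 1]  # drive from the previous zone
        clock += s                  # then serve the current zone
        times.append(round(clock, 4))
    return times


# Values copied from the table (zones QLT, CT, XGT, MT, LT, QDT).
travel = [0.5500, 0.9667, 1.9333, 0.6667, 4.0500]
service_col2 = [0.9775, 1.1250, 1.0875, 3.00, 1.0875, 2.7624]
service_col3 = [2.6067, 3.0000, 2.9000, 8.0000, 2.9000, 7.3664]

col6 = completion_times(service_col2, travel)
col7 = completion_times(service_col3, travel)

if __name__ == "__main__":
    print(col6)
    print(col7)
```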
| Zone | Scenario 1 (with AT): AT Route | Scenario 1 (with AT): ST Satisfaction (%) | Scenario 1 (with AT): Q-Value | Scenario 2 (without AT): AT Route | Scenario 2 (without AT): ST Satisfaction (%) | Scenario 2 (without AT): Q-Value |
|---|---|---|---|---|---|---|
| QLT | 7 | 70 | 343 | — | 63 | 235 |
| CT | 4 | 80 | 297 | — | 80 | 247 |
| XGT | 9 | 60 | 339 | — | 36 | 311 |
| MT | 1 | 40 | 754 | — | 40 | 298 |
| LT | 11 | 74 | 337 | — | 58 | 703 |
| QDT | 5 | 64 | 229 | — | 73 | 267 |
| Scenario | Method | No. of People Rescued | Completion Time (h) |
|---|---|---|---|
| DR1 | D3QN-PER | 6173.98 | 64.69 |
| DR1 | DQN | 5500.01 | 66.03 |
| DR1 | Q-learning | 5290.14 | 67.00 |
| DR1 | Myopic policy | 4874.20 | 64.09 |
| DR2 | D3QN-PER | 5524.09 | 67.49 |
| DR2 | DQN | 5300.14 | 68.67 |
| DR2 | Q-learning | 5199.14 | 70.45 |
| DR2 | Myopic policy | 4549.25 | 72.60 |
| Scenario | Method | Mean | Std Dev | CV (%) | Min | Max | IQR |
|---|---|---|---|---|---|---|---|
| DR1 | D3QN-PER | 6233 | 143 | 2.3 | 5924 | 6483 | 172 |
| DR1 | DQN | 5557 | 283 | 5.1 | 5070 | 5981 | 476 |
| DR1 | Q-learning | 5284 | 256 | 4.8 | 4619 | 5632 | 339 |
| DR1 | Myopic policy | 4973 | 447 | 9.0 | 4132 | 5701 | 627 |
| DR2 | D3QN-PER | 5592 | 165 | 3.0 | 5235 | 5881 | 199 |
| DR2 | DQN | 5361 | 304 | 5.7 | 4839 | 5816 | 511 |
| DR2 | Q-learning | 5192 | 277 | 5.3 | 4471 | 5570 | 368 |
| DR2 | Myopic policy | 4658 | 489 | 10.5 | 3737 | 5454 | 686 |
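
The CV (%) column is the standard deviation divided by the mean, expressed as a percentage. The sketch below recomputes it from the tabulated Mean and Std Dev values as a consistency check; the function and variable names are illustrative only.

```python
def coefficient_of_variation(mean, std_dev):
    """Coefficient of variation in percent, rounded to one decimal place."""
    return round(100.0 * std_dev / mean, 1)


# (scenario, method, mean, std dev) copied from the stability table.
ROWS = [
    ("DR1", "D3QN-PER", 6233, 143),
    ("DR1", "DQN", 5557, 283),
    ("DR1", "Q-learning", 5284, 256),
    ("DR1", "Myopic policy", 4973, 447),
    ("DR2", "D3QN-PER", 5592, 165),
    ("DR2", "DQN", 5361, 304),
    ("DR2", "Q-learning", 5192, 277),
    ("DR2", "Myopic policy", 4658, 489),
]

cv = {(s, m): coefficient_of_variation(mu, sd) for s, m, mu, sd in ROWS}

if __name__ == "__main__":
    for key, value in cv.items():
        print(key, value)
```

A lower CV indicates that a method's outcomes vary less relative to its average across runs.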
| AT | Routing Paths (Disaster Zones Sequences) |
|---|---|
| AT = 3 | AT 1: ZT-LT-MT-XT; AT 2: XGT-CT-TT-QDT; AT 3: QLT-GT-QT-NT |
| AT = 4 | AT 1: QLT-TT-NT; AT 2: XGT-CT-QDT; AT 3: GT-LT-QT; AT 4: ZT-MT-XT |
| AT = 5 | AT 1: QLT-GT-QT; AT 2: TT-NT; AT 3: XGT-LT-XT; AT 4: ZT-MT; AT 5: CT-QDT |
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Huang, L.; Hou, J. A Multi-Agent Emergency Material Allocation Approach Based on a Markov Decision Process Under Demand Uncertainty for Sustainable Disaster Response. Sustainability 2026, 18, 5539. https://doi.org/10.3390/su18115539

