CAHT: A Constraint-Aware Heterogeneous Transformer for Real-Time Multi-Robot Task Allocation in Warehouse Environments
Abstract
1. Introduction
- (1) Dynamic feasibility masking (addressing limitation b): Hard constraints are enforced directly in the assignment decoder’s probability computation by setting infeasible robot–task scores to negative infinity before softmax normalization. In the ablation study, removing this mechanism raises the constraint violation rate by about 75 percentage points and the objective value by 213%, which indicates that constraint satisfaction in heterogeneous MRTA cannot be learned from data alone and must be structurally enforced [14].
- (2) Spatial-bias Transformer encoding for heterogeneous entities (addressing limitation a): The standard self-attention mechanism is augmented with a learned spatial proximity bias, enabling distance-dependent modeling of robot–task interactions. Combined with type-specific input embeddings that distinguish robot categories, this design supports representation learning across heterogeneous entity types without explicit graph construction.
- (3) End-to-end assignment and sequencing (addressing limitation c): CAHT jointly produces task-to-robot assignments via a bilinear attention decoder and per-robot task execution orders via a GRU-based autoregressive decoder, eliminating the need for separate optimization stages.
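The masking step described in contribution (1) can be illustrated with a minimal sketch. The function name `masked_softmax`, the toy scores, and the feasibility flags are assumptions chosen for illustration and are not taken from the CAHT implementation; the sketch only shows why a score of negative infinity yields an assignment probability of exactly zero.

```python
import math


def masked_softmax(scores, feasible):
    """Softmax over one task's robot scores with infeasible robots masked out.

    scores   : raw robot-task compatibility scores (hypothetical values)
    feasible : booleans, True where the robot is allowed to take the task
    """
    # Infeasible pairs are set to -inf, so exp() maps them to probability 0.
    masked = [s if ok else -math.inf for s, ok in zip(scores, feasible)]
    top = max(masked)
    if top == -math.inf:
        # No feasible robot at all: return zeros instead of dividing by zero.
        return [0.0] * len(scores)
    # Subtracting the maximum keeps exp() numerically stable.
    exps = [math.exp(m - top) for m in masked]
    total = sum(exps)
    return [e / total for e in exps]


# Toy example: the second robot is infeasible for this task.
scores = [2.0, 1.0, 0.5]
feasible = [True, False, True]
probs = masked_softmax(scores, feasible)
print(probs)
```

Because the mask is applied before normalization, the remaining probability mass is redistributed over the feasible robots only, so sampling or taking the argmax can never select a masked pair.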
2. Related Work
2.1. Multi-Robot Task Allocation
2.2. Vehicle Routing with Heterogeneous Fleets
2.3. Neural Combinatorial Optimization
3. Methodology
3.1. Problem Formulation
3.2. Model Architecture
3.2.1. Heterogeneous Input Embedding
3.2.2. Spatial-Bias Transformer Encoder
3.2.3. Constraint-Aware Assignment Decoder
3.2.4. Autoregressive Sequencing Decoder
3.3. Two-Stage Training
3.3.1. Stage I: Supervised Pretraining
3.3.2. Stage II: Reinforcement Learning Fine-Tuning
3.4. Inference
3.5. Model Complexity
4. Results and Discussion
4.1. Experimental Setup
4.1.1. Dataset
4.1.2. Baselines and Metrics
4.2. Solution Quality and the Speed–Quality Trade-Off
Supervised vs. Reinforcement Learning Variants
4.3. Ablation Study: Why Dynamic Masking Is the Key Innovation
4.4. Cross-Scale Generalization
4.5. Latency Profiling
4.6. Online Rolling-Horizon Evaluation
4.7. Limitations
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Abbreviations
References
- Keith, R.; La, H.M. Review of autonomous mobile robots for the warehouse environment. arXiv 2024, arXiv:2406.08333.
- Zhen, L.; Tan, Z.; de Koster, R.; He, X.; Wang, S.; Wang, H. Optimizing warehouse operations with autonomous mobile robots. Transp. Sci. 2025, 59, 1130–1152.
- Msala, Y.; Oussama, H.; Talea, M.; Aboulfatah, M. A novel method for enhancing warehouse operations using heterogeneous robotic systems for autonomous pick-and-deliver tasks. EAI Endorsed Trans. AI Robot. 2025, 4, 1–13.
- Shakeri, Z.; Benfriha, K.; Varmazyar, M.; Talhi, E.; Quenehen, A. Production scheduling with multi-robot task allocation in a real industry 4.0 setting. Sci. Rep. 2025, 15, 1795.
- Choi, B.; Kim, M.; Kim, H. An optimization framework for allocating and scheduling multiple tasks of multiple logistics robots. Mathematics 2025, 13, 1770.
- Metz, L.; Mutzel, P.; Niemann, T.; Schürmann, L.; Stiller, S.; Tillmann, A.M. Delay-resistant robust vehicle routing with heterogeneous time windows. Comput. Oper. Res. 2024, 164, 106553.
- Voigt, S. A review and ranking of operators in adaptive large neighborhood search for vehicle routing problems. Eur. J. Oper. Res. 2025, 322, 357–375.
- Liu, S.; Sun, J.; Duan, X.; Liu, G. Parallel adaptive large neighborhood search based on Spark to solve VRPTW. Sci. Rep. 2024, 14, 23809.
- Darvariu, V.-A.; Hailes, S.; Musolesi, M. Graph reinforcement learning for combinatorial optimization: A survey and unifying perspective. arXiv 2024, arXiv:2404.06492.
- Chung, K.T.; Lee, C.K.M.; Tsang, Y.P. Neural combinatorial optimization with reinforcement learning in industrial engineering: A survey. Artif. Intell. Rev. 2025, 58, 130.
- Berto, F.; Hua, C.; Park, J.; Luttmann, L.; Ma, Y.; Bu, F.; Wang, J.; Ye, H.; Kim, M.; Choi, S.; et al. RL4CO: An extensive reinforcement learning for combinatorial optimization benchmark. In Proceedings of the KDD 2025, Toronto, ON, Canada, 3–7 August 2025.
- Fang, H.; Song, Z.; Weng, P.; Ban, Y. INViT: A generalizable routing problem solver with invariant nested view Transformer. arXiv 2024, arXiv:2402.02317.
- Gao, C.; Shang, H.; Xue, K.; Li, D.; Qian, C. Towards generalizable neural solvers for vehicle routing problems via ensemble with transferrable local policy. In Proceedings of the IJCAI-24, Jeju, Republic of Korea, 3–9 August 2024; pp. 6914–6922.
- Bi, J.; Ma, Y.; Zhou, J.; Song, W.; Cao, Z.; Wu, Y.; Zhang, J. Learning to handle complex constraints for vehicle routing problems. arXiv 2024, arXiv:2410.21066.
- Sioud, R.; Bamoumen, M.; Hamani, N. A novel model for multi-robot task assignment in smart warehouses. In IN4PL 2024; CCIS 2373; Springer: New York, NY, USA, 2025; pp. 343–353.
- Mozhdehi, A.; Mohammadizadeh, M.; Wang, Y.; Sun, S.; Wang, X. EFECTIW-ROTER: Deep reinforcement learning approach for solving heterogeneous fleet and demand VRPTW. In Proceedings of the ACM SIGSPATIAL 2024, Atlanta, GA, USA, 29 October–1 November 2024; pp. 17–28.
- Kim, B.S.; Mozhdehi, A.; Wang, Y.; Sun, S.; Wang, X. Clustering-based enhanced ant colony optimization for multi-trip VRP with heterogeneous fleet and time windows. In Proceedings of the IWCTS’24, Atlanta, GA, USA, 29 October 2024; pp. 46–55.
- Boualamia, H.; Metrane, A.; Hafidi, I.; Mellouli, O. A new adaptation mechanism of the ALNS algorithm using reinforcement learning. Oper. Res. Forum 2025, 6, 105.
- Ye, H.; Wang, J.; Liang, H.; Cao, Z.; Li, Y.; Li, F. GLOP: Learning global partition and local construction for solving large-scale routing problems in real-time. In Proceedings of the AAAI-24, Vancouver, BC, Canada, 20–27 February 2024; Volume 38, pp. 20284–20292.
- Zheng, Z.; Zhou, C.; Tong, X.; Yuan, M.; Wang, Z. UDC: A unified neural divide-and-conquer framework for large-scale combinatorial optimization problems. Adv. Neural Inf. Process. Syst. 2024, 37, 6081–6125.



| Scale | N | M | Train | Aug. | Test | ALNS Obj. | CVR% | TW% |
|---|---|---|---|---|---|---|---|---|
| S | 5 | 50 | 300 | 1200 | 50 | 1001.3 | 0.7 | 99.3 |
| M | 10 | 100 | 300 | 1200 | 50 | 1781.5 | 1.9 | 98.1 |
| L | 15 | 150 | 300 | 1200 | 50 | 2575.4 | 3.5 | 96.5 |
| XL | 20 | 200 | – | – | 50 | 3431.4 | 4.6 | 95.4 |

(a) Scale S (N = 5, M = 50)

| Method | Obj. ↓ | Gap (%) | CVR% ↓ | TW% ↑ | Makespan | Time (ms) |
|---|---|---|---|---|---|---|
| ALNS (30 s) | 1001.3 ± 98.5 | 0.0 | 0.7 ± 1.6 | 99.3 ± 1.6 | 820.8 ± 129.7 | 2115.6 |
| Nearest Greedy | 1276.5 ± 189.1 | +27.5 | 2.0 ± 2.6 | 98.0 ± 2.6 | 810.8 ± 214.0 | 0.5 |
| OR-Tools (10 s) | 1297.1 ± 241.7 | +29.5 | 25.4 ± 11.0 | 79.4 ± 5.7 | 926.2 ± 266.3 | 10,001.5 |
| POMO | 2080.7 ± 758.8 | +107.8 | 99.1 ± 2.7 | 73.4 ± 10.0 | 1449.8 ± 244.6 | 7.8 |
| POMO+Repair | 1188.7 ± 211.2 | +18.7 | 6.3 ± 4.8 | 93.7 ± 4.8 | 937.4 ± 140.0 | 9.2 |
| CAHT (SL) | 1111.9 ± 152.2 | +11.0 | 3.5 ± 3.0 | 96.5 ± 3.0 | 961.2 ± 171.6 | 20.2 |
| CAHT (SL+RL) | 1131.2 ± 170.9 | +13.0 | 4.9 ± 3.4 | 95.1 ± 3.4 | 954.4 ± 164.2 | 21.2 |

(b) Scale M (N = 10, M = 100)

| Method | Obj. ↓ | Gap (%) | CVR% ↓ | TW% ↑ | Makespan | Time (ms) |
|---|---|---|---|---|---|---|
| ALNS (30 s) | 1781.5 ± 128.7 | 0.0 | 1.9 ± 1.7 | 98.1 ± 1.7 | 919.2 ± 113.1 | 3001.8 |
| Nearest Greedy | 2222.6 ± 236.5 | +24.8 | 2.6 ± 2.2 | 97.4 ± 2.2 | 949.7 ± 192.2 | 1.7 |
| OR-Tools (10 s) | 2191.1 ± 282.5 | +23.0 | 26.7 ± 8.8 | 79.2 ± 4.7 | 992.1 ± 242.9 | 10,003.2 |
| POMO | 10,664.3 ± 2865.0 | +498.6 | 98.4 ± 5.4 | 45.8 ± 8.2 | 2566.1 ± 321.8 | 16.8 |
| POMO+Repair | 1995.4 ± 188.7 | +12.0 | 3.7 ± 2.4 | 96.3 ± 2.4 | 890.3 ± 138.0 | 19.5 |
| CAHT (SL) | 1947.9 ± 184.7 | +9.3 | 4.0 ± 2.3 | 96.0 ± 2.3 | 1075.2 ± 150.2 | 52.5 |
| CAHT (SL+RL) | 1997.1 ± 199.4 | +12.1 | 4.9 ± 2.2 | 95.1 ± 2.2 | 1070.9 ± 148.9 | 50.8 |

(c) Scale L (N = 15, M = 150)

| Method | Obj. ↓ | Gap (%) | CVR% ↓ | TW% ↑ | Makespan | Time (ms) |
|---|---|---|---|---|---|---|
| ALNS (30 s) | 2575.4 ± 173.0 | 0.0 | 3.5 ± 2.0 | 96.5 ± 2.0 | 981.3 ± 113.3 | 3006.1 |
| Nearest Greedy | 3109.5 ± 263.1 | +20.7 | 2.6 ± 1.8 | 97.4 ± 1.8 | 1018.7 ± 206.2 | 3.9 |
| OR-Tools (10 s) | 3174.6 ± 455.3 | +23.3 | 28.4 ± 5.7 | 79.0 ± 4.8 | 1067.6 ± 233.4 | 10,008.2 |
| POMO | 29,024.2 ± 4580.9 | +1027.0 | 99.6 ± 0.6 | 33.0 ± 5.5 | 3672.4 ± 311.5 | 26.5 |
| POMO+Repair | 2952.4 ± 257.3 | +14.6 | 3.3 ± 1.7 | 96.7 ± 1.7 | 998.7 ± 203.7 | 33.1 |
| CAHT (SL) | 2757.7 ± 240.1 | +7.1 | 4.4 ± 2.5 | 95.8 ± 2.0 | 1133.8 ± 134.7 | 90.7 |
| CAHT (SL+RL) | 2867.0 ± 269.4 | +11.3 | 5.7 ± 2.2 | 94.3 ± 2.2 | 1146.7 ± 144.1 | 93.3 |

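
The Gap (%) column can be reproduced from the mean objectives. The sketch below assumes the gap is the relative difference to the ALNS objective at the same scale, which is inferred from the 0.0 entry in the ALNS row and is consistent with the reported values; the helper name `gap_percent` is hypothetical.

```python
def gap_percent(objective, reference):
    """Relative gap of `objective` to `reference`, in percent."""
    return 100.0 * (objective - reference) / reference


# Mean objectives copied from panel (a); ALNS is taken as the reference.
alns_obj = 1001.3
methods = {
    "Nearest Greedy": 1276.5,
    "POMO+Repair": 1188.7,
    "CAHT (SL)": 1111.9,
    "CAHT (SL+RL)": 1131.2,
}

# Round to one decimal place, matching the precision used in the table.
gaps = {name: round(gap_percent(obj, alns_obj), 1) for name, obj in methods.items()}
for name, gap in gaps.items():
    print(f"{name}: {gap:+.1f}%")
```
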
| Variant | Obj. ↓ | ΔObj% | CVR% ↓ | TW% ↑ | Time (ms) |
|---|---|---|---|---|---|
| CAHT Full | 1997.1 ± 199.4 | +0.0 | 4.9 ± 2.2 | 95.1 ± 2.2 | 51.1 |
| w/o RL | 1947.9 ± 184.7 | −2.5 | 4.0 ± 2.3 | 96.0 ± 2.3 | 49.0 |
| w/o dynamic masking | 6250.0 | +213.0 | 79.9 | 70.1 | 62.0 |
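
The figures for the variant without dynamic masking can be checked directly from the rows above. The worked computation assumes that ΔObj% is the relative change in mean objective with respect to CAHT Full and that the change in CVR is a difference in percentage points; both readings are inferred from the table values rather than stated in it.

```latex
\Delta\mathrm{Obj}\% = \frac{6250.0 - 1997.1}{1997.1} \times 100 \approx 213.0,
\qquad
\Delta\mathrm{CVR} = 79.9 - 4.9 = 75.0 \ \text{percentage points}
```
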

| Test Scale | Obj. ↓ | ΔObj% | CVR% ↓ | TW% ↑ | Setting |
|---|---|---|---|---|---|
| S (5, 50) | 1131.2 ± 170.9 | +13.0 | 4.9 ± 3.4 | 95.1 ± 3.4 | In-distr. |
| M (10, 100) | 1997.1 ± 199.4 | +12.1 | 4.9 ± 2.2 | 95.1 ± 2.2 | In-distr. |
| L (15, 150) | 2867.0 ± 269.4 | +11.3 | 5.7 ± 2.2 | 94.3 ± 2.2 | In-distr. |
| XL (20, 200) | 3797.7 ± 332.3 | +10.7 | 6.3 ± 2.0 | 93.7 ± 2.0 | Zero-shot |

| Scale | Embed. (ms) | Encoder (ms) | Assign (ms) | Seq. (ms) | Total (ms) |
|---|---|---|---|---|---|
| S | 0.18 | 2.81 | 13.01 | 7.44 | 23.44 |
| M | 0.19 | 5.57 | 35.74 | 14.55 | 56.04 |
| L | 0.23 | 9.79 | 68.14 | 21.92 | 100.08 |
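
The latency breakdown can be turned into per-stage shares to see where inference time is spent. This is a small sketch over the values in the table; the unit (milliseconds) and the helper name `stage_shares` are assumptions.

```python
# Per-stage latency copied from the table (unit assumed to be milliseconds).
# Each row is (Embed., Encoder, Assign, Seq., Total).
stages = ("Embed.", "Encoder", "Assign", "Seq.")
latency = {
    "S": (0.18, 2.81, 13.01, 7.44, 23.44),
    "M": (0.19, 5.57, 35.74, 14.55, 56.04),
    "L": (0.23, 9.79, 68.14, 21.92, 100.08),
}


def stage_shares(row):
    """Return each stage's share of the reported total, in percent."""
    *parts, total = row
    return [round(100.0 * part / total, 1) for part in parts]


shares = {
    scale: dict(zip(stages, stage_shares(row)))
    for scale, row in latency.items()
}
for scale, share in shares.items():
    print(scale, share)
```

On these numbers the assignment decoder's share of total latency rises from roughly 55% at scale S to roughly 68% at scale L.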

| Method | Comp. | TW% ↑ | Wait (s) | Throughput | Solution Time (ms) |
|---|---|---|---|---|---|
| Nearest Greedy | 93 | 98.9 | 5.0 | 18.60 | 0.1 |
| ALNS (1 s) | 93 | 93.5 | 5.0 | 18.60 | 304.9 |
| POMO | 93 | 30.1 | 5.0 | 18.60 | 1.5 |
| CAHT | 93 | 61.3 | 5.0 | 18.60 | 3.9 |

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Gong, S.; Varlamov, O. CAHT: A Constraint-Aware Heterogeneous Transformer for Real-Time Multi-Robot Task Allocation in Warehouse Environments. Algorithms 2026, 19, 312. https://doi.org/10.3390/a19040312

