On the Use of Biased-Randomized Transformers as Data-Driven Heuristics for Agile Optimization
Abstract
1. Introduction
2. Related Work
3. Training Transformers with BRAs
4. Biased-Randomization of Trained Transformers
5. Illustrative Case Studies
5.1. Case Study 1: Closed TOP
5.2. Case Study 2: Bi-Dimensional Knapsack Problem
6. Computational Experiments
6.1. Computational Experiments for the TOP
6.2. Computational Experiments for the 2KP
7. Analysis of the Results
7.1. Results for the TOP
7.2. Results for the 2KP
7.3. Comparing Results Across Problems
8. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
1. Toth, P.; Vigo, D. Vehicle Routing: Problems, Methods, and Applications; Society for Industrial and Applied Mathematics: Philadelphia, PA, USA, 2014.
2. Salhi, S.; Thompson, J. An overview of heuristics and metaheuristics. In The Palgrave Handbook of Operations Research; Salhi, S., Boylan, J., Eds.; Springer International Publishing: Cham, Switzerland, 2022; pp. 353–403.
3. Wu, Y.; Song, W.; Cao, Z.; Zhang, J.; Lim, A. Learning improvement heuristics for solving routing problems. IEEE Trans. Neural Netw. Learn. Syst. 2021, 33, 5057–5069.
4. Bengio, Y.; Lodi, A.; Prouvost, A. Machine learning for combinatorial optimization: A methodological tour d’horizon. Eur. J. Oper. Res. 2021, 290, 405–421.
5. Guerrero, A.; Juan, A.A.; Garcia-Sanchez, A.; Pita-Romero, L. Optimizing maintenance of energy supply systems in city logistics with heuristics and reinforcement learning. Mathematics 2024, 12, 3140.
6. Hottung, A.; Tierney, K. Neural large neighborhood search for routing problems. Artif. Intell. 2022, 313, 103786.
7. Resende, M.G.; Ribeiro, C.C. Optimization by GRASP; Springer: Berlin/Heidelberg, Germany, 2016.
8. Bamoumen, M.; Elfirdoussi, S.; Ren, L.; Tchernev, N. An efficient GRASP-like algorithm for the multi-product straight pipeline scheduling problem. Comput. Oper. Res. 2023, 150, 106082.
9. Fernandez, S.A.; Carvalho, M.M.; Silva, D.G. A hybrid metaheuristic algorithm for the efficient placement of UAVs. Algorithms 2020, 13, 323.
10. Bresina, J. Heuristic-biased stochastic sampling. In Proceedings of the Thirteenth National Conference on Artificial Intelligence, Portland, OR, USA, 4–8 August 1996; pp. 271–278.
11. Martí, R.; Lozano, J.A.; Mendiburu, A.; Hernando, L. Multi-start methods. In Handbook of Heuristics; Springer: Berlin/Heidelberg, Germany, 2025; pp. 211–230.
12. Gajula, V.; Rajathy, R. An agile optimization algorithm for vitality management along with fusion of sustainable renewable resources in microgrid. Energy Sources Part A Recovery Util. Environ. Eff. 2020, 42, 1580–1598.
13. Peyman, M.; Copado, P.J.; Tordecilla, R.D.; Martins, L.d.C.; Xhafa, F.; Juan, A.A. Edge computing and IoT analytics for agile optimization in intelligent transportation systems. Energies 2021, 14, 6309.
14. Zhou, M.; Lin, X.; Liang, Y. Agile optimization framework: A framework for tensor operator optimization in neural network. Future Gener. Comput. Syst. 2024, 161, 432–444.
15. Liu, T.; Wang, Y.; Sun, J.; Tian, Y.; Huang, Y.; Xue, T.; Li, P.; Liu, Y. The role of transformer models in advancing blockchain technology: A systematic survey. Eng. Appl. Artif. Intell. 2026, 163, 112968.
16. Juan, A.A.; Faulin, J.; Ferrer, A.; Lourenço, H.R.; Barrios, B. MIRHA: Multi-start biased randomization of heuristics with adaptive local search for solving non-smooth routing problems. Top 2013, 21, 109–132.
17. Wang, F.; He, Q.; Li, S. Solving combinatorial optimization problems with deep neural network: A survey. Tsinghua Sci. Technol. 2024, 29, 1266–1282.
18. Chung, K.T.; Lee, C.K.; Tsang, Y.P. Neural combinatorial optimization with reinforcement learning in industrial engineering: A survey. Artif. Intell. Rev. 2025, 58, 130.
19. Cappart, Q.; Chételat, D.; Khalil, E.B.; Lodi, A.; Morris, C.; Veličković, P. Combinatorial optimization and reasoning with graph neural networks. J. Mach. Learn. Res. 2023, 24, 1–61.
20. Joshi, C.K.; Cappart, Q.; Rousseau, L.M.; Laurent, T. Learning the travelling salesperson problem requires rethinking generalization. Constraints 2022, 27, 70–98.
21. Angioni, D.; Archetti, C.; Speranza, M.G. Neural combinatorial optimization: A tutorial. Comput. Oper. Res. 2025, 182, 107102.
22. Berto, F.; Hua, C.; Zepeda, N.G.; Hottung, A.; Wouda, N.A.; Lan, L.; Park, J.; Tierney, K.; Park, J. RouteFinder: Towards foundation models for vehicle routing problems. Trans. Mach. Learn. Res. 2025, 2025.
23. Kool, W.; van Hoof, H.; Welling, M. Attention, learn to solve routing problems! In Proceedings of the International Conference on Learning Representations (ICLR), New Orleans, LA, USA, 6–9 May 2019.
24. Chi, M.; Pang, W.; Wu, X.; Zhao, P.; Li, Y.; Wang, T.; Qian, J.; Xiao, Y.; Wang, L.; Zhou, Y. A generalized neural solver based on LLM-guided heuristic evoluation framework for solving diverse variants of vehicle routing problems. Expert Syst. Appl. 2026, 296, 128876.
25. Fang, Z.; Wang, D.; Chen, J.; Wang, J.; Zhang, Z. UCPO: A universal constrained combinatorial optimization method via preference optimization. In Proceedings of the AAAI Conference on Artificial Intelligence, Singapore, 20–27 January 2026; Volume 40, pp. 36900–36908.
26. Ma, L.; Hao, X.; Zhou, W.; He, Q.; Zhang, R.; Chen, L. A hybrid neural combinatorial optimization framework assisted by automated algorithm design. Complex Intell. Syst. 2024, 10, 8233–8247.
27. Guan, Q.; Cao, H.; Jia, L.; Yan, D.; Chen, B. Synergetic attention-driven transformer: A deep reinforcement learning approach for vehicle routing problems. Expert Syst. Appl. 2025, 274, 126961.
28. Bi, J.; Ma, Y.; Zhou, J.; Song, W.; Cao, Z.; Wu, Y.; Zhang, J. Learning to handle complex constraints for vehicle routing problems. Adv. Neural Inf. Process. Syst. 2024, 37, 93479–93509.
29. Toenshoff, J.; Ritzert, M.; Wolf, H.; Grohe, M. Graph neural networks for maximum constraint satisfaction. Front. Artif. Intell. 2021, 3, 580607.
30. da Costa, P.R.d.O.; Rhuggenaath, J.; Zhang, Y.; Akcay, A. Learning 2-opt heuristics for the traveling salesman problem via deep reinforcement learning. PMLR 2020, 129, 465–480.
31. Xiao, P.; Zhang, Z.; Chen, J.; Wang, J.; Zhang, Z. Neural combinatorial optimization for robust routing problem with uncertain travel times. Adv. Neural Inf. Process. Syst. 2024, 37, 134841–134867.
32. Wang, Y.; Liang, X. Application of reinforcement learning methods combining graph neural networks and self-attention mechanisms in supply chain route optimization. Sensors 2025, 25, 955.
33. Ammouriova, M.; Guerrero, A.; Tsertsvadze, V.; Schumacher, C.; Juan, A.A. Using reinforcement learning in a dynamic team orienteering problem with electric batteries. Batteries 2024, 10, 411.
34. Guerrero, A.; Escoto, M.; Ammouriova, M.; Men, Y.; Juan, A.A. Using transformers and reinforcement learning for the team orienteering problem under dynamic conditions. Mathematics 2025, 13, 2313.
35. Yan, D.; Guan, Q.; Ou, B.; Yan, B.; Cao, H. Graph-driven deep reinforcement learning for vehicle routing problems with pickup and delivery. Appl. Sci. 2025, 15, 4776.
36. Chen, L.; Lu, K.; Rajeswaran, A.; Lee, K.; Grover, A.; Laskin, M.; Abbeel, P.; Srinivas, A.; Mordatch, I. Decision transformer: Reinforcement learning via sequence modeling. Adv. Neural Inf. Process. Syst. 2021, 34, 15084–15097.
37. Williams, R.J. Simple statistical gradient-following algorithms for connectionist reinforcement learning. Mach. Learn. 1992, 8, 229–256.
38. Gu, Y.; Cheng, Y.; Chen, C.P.; Wang, X. Proximal policy optimization with policy feedback. IEEE Trans. Syst. Man Cybern. Syst. 2021, 52, 4600–4610.
39. Mısır, M.; Gunawan, A.; Vansteenwegen, P. Algorithm selection for the team orienteering problem. In Proceedings of the European Conference on Evolutionary Computation in Combinatorial Optimization (Part of EvoStar); Springer: Berlin/Heidelberg, Germany, 2022; pp. 33–45.
40. Palomo-Martínez, P.J.; Salazar-Aguilar, M.A.; Albornoz, V.M. Formulations for the orienteering problem with additional constraints. Ann. Oper. Res. 2017, 258, 503–545.
41. Pisinger, D.; Toth, P. Knapsack problems. In Handbook of Combinatorial Optimization: Volume 1–3; Springer: Berlin/Heidelberg, Germany, 1998; pp. 299–428.
42. Wiher, G.; Meister, C.; Cotterell, R. On decoding strategies for neural text generators. Trans. Assoc. Comput. Linguist. 2022, 10, 997–1012.
43. Zhu, Y.; Li, J.; Li, G.; Zhao, Y.; Jin, Z.; Mei, H. Hot or cold? Adaptive temperature sampling for code generation with large language models. In Proceedings of the AAAI Conference on Artificial Intelligence, Vancouver, BC, Canada, 26–27 February 2024; Volume 38, pp. 437–445.
44. Zhao, W.X.; Zhou, K.; Li, J.; Tang, T.; Wen, J.R. Decoding and Deployment. In Large Language Models; Springer: Berlin/Heidelberg, Germany, 2025; pp. 259–301.
45. Panadero, J.; Juan, A.A.; Bayliss, C.; Currie, C. Maximising reward from a team of surveillance drones: A simheuristic approach to the stochastic team orienteering problem. Eur. J. Ind. Eng. 2020, 14, 485–516.
46. Nessari, S.; Tavakkoli-Moghaddam, R.; Bakhshi-Khaniki, H.; Bozorgi-Amiri, A. A hybrid simheuristic algorithm for solving bi-objective stochastic flexible job shop scheduling problems. Decis. Anal. J. 2024, 11, 100485.






| Research Line | Main Idea | Representative Methods |
|---|---|---|
| Classical metaheuristics | Constructive and improvement heuristics based on deterministic or stochastic search rules | GRASP, tabu search, simulated annealing, biased-randomized heuristics [11,16] |
| Representation learning for CO | Encoding combinatorial structure via graphs or attention mechanisms to parameterize policies | GNN-based models and attention architectures for routing and SAT-like problems [19,20,23] |
| RL-based combinatorial optimization | Sequential decision-making trained via reward maximization for constructive or improvement heuristics | Pointer networks, actor-critic routing, Decision Transformer-style policies [23,24] |
| Hybrid learning and metaheuristics | Integration of learned policies within classical search procedures for improved exploration | Neural LNS, learned improvement heuristics, preference-optimization frameworks [6,25,26] |
| Robust and adaptive optimization | Learning under uncertainty, dynamics, and distribution shifts in problem instances | Stochastic routing, dynamic VRP, robust neural policies [31,32,33] |
| Proposed approach (BRT) | Combination of transformer modeling with biased-randomization for controlled exploration | Probabilistic decoding over learned transformer policies with heuristic-guided diversification |
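
The "probabilistic decoding over learned transformer policies" in the last row, together with the strategy names used as column headers in the results tables (BR-Geo, temperature, top-k, top-p), can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes the trained model outputs one score (logit) per feasible next element, that BR-Geo ranks candidates by score and picks a rank from a geometric distribution with parameter `beta`, and that the other three strategies follow their usual definitions from text generation. All function names and the example logits are hypothetical.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Turn scores into probabilities; temperature > 1 flattens them."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_index(probs, rng):
    """Inverse-CDF sampling of a position from a discrete distribution."""
    u = rng.random()
    acc = 0.0
    for i, p in enumerate(probs):
        acc += p
        if u < acc:
            return i
    return len(probs) - 1  # guard against floating-point round-off


def ranked_indices(scores):
    """Candidate indices sorted from best to worst score."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


def greedy(logits):
    """Deterministic decoding: always take the top-scored candidate."""
    return ranked_indices(logits)[0]


def br_geometric(logits, beta, rng):
    """Assumed BR-Geo rule: geometric choice over the ranked candidates."""
    ranked = ranked_indices(logits)
    u = 1.0 - rng.random()  # in (0, 1], so log(u) is always defined
    pos = int(math.log(u) / math.log(1.0 - beta)) % len(ranked)
    return ranked[pos]


def temperature_sampling(logits, temperature, rng):
    """Sample from the softmax distribution at the given temperature."""
    return sample_index(softmax(logits, temperature), rng)


def top_k_sampling(logits, k, rng):
    """Sample only among the k best candidates, renormalized."""
    kept = ranked_indices(logits)[:k]
    probs = softmax([logits[i] for i in kept])
    return kept[sample_index(probs, rng)]


def top_p_sampling(logits, p, rng):
    """Sample from the smallest ranked prefix whose mass reaches p."""
    probs = softmax(logits)
    kept, mass = [], 0.0
    for i in ranked_indices(probs):
        kept.append(i)
        mass += probs[i]
        if mass >= p:
            break
    return kept[sample_index([probs[i] / mass for i in kept], rng)]


if __name__ == "__main__":
    rng = random.Random(0)
    logits = [2.0, 1.0, 0.5, 0.1, -1.0, -2.0]  # hypothetical model scores
    print("greedy:", greedy(logits))
    print("BR-Geo (0.5):", br_geometric(logits, 0.5, rng))
    print("temperature (5):", temperature_sampling(logits, 5.0, rng))
    print("top-k (5):", top_k_sampling(logits, 5, rng))
    print("top-p (0.9):", top_p_sampling(logits, 0.9, rng))
```

With `beta = 0.5`, the geometric rule selects the top-ranked candidate about half of the time and spreads the remaining probability over lower ranks. This shows how a single parameter can control the amount of diversification around the greedy choice.
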

| Problem | MILP | Transformer | BR-Geo (0.5) | BR-Native | Temperature (5) | Top-k (5) | Top-p (0.9) |
|---|---|---|---|---|---|---|---|
| P1 | 11.72 | 11.46 | 11.68 | 11.46 | 11.68 | 11.46 | 11.46 |
| P2 | 6.28 | 6.28 | 6.28 | 6.28 | 6.28 | 6.28 | 6.28 |
| P3 | 7.71 | 7.32 | 7.69 | 7.32 | 7.69 | 7.32 | 7.32 |
| P4 | 7.47 | 7.06 | 7.06 | 7.06 | 7.06 | 7.06 | 7.06 |
| P5 | 8.73 | 8.19 | 8.73 | 8.19 | 8.73 | 8.19 | 8.19 |
| P6 | 10.81 | 10.71 | 10.71 | 10.71 | 10.71 | 10.71 | 10.71 |
| P7 | 9.30 | 9.30 | 9.30 | 9.30 | 9.30 | 9.30 | 9.30 |
| P8 | 11.04 | 10.61 | 10.99 | 10.75 | 10.99 | 10.75 | 10.61 |
| P9 | 11.20 | 10.92 | 11.20 | 10.94 | 11.10 | 10.92 | 10.94 |
| P10 | 11.54 | 11.52 | 11.52 | 11.52 | 11.54 | 11.52 | 11.52 |
| P11 | 16.55 | 16.55 | 16.55 | 16.55 | 16.55 | 16.55 | 16.55 |
| P12 | 19.85 | 18.46 | 19.18 | 19.36 | 19.37 | 18.68 | 18.48 |
| P13 | 26.21 | 22.96 | 24.43 | 24.52 | 24.58 | 23.90 | 24.35 |
| P14 | 16.26 | 16.03 | 16.18 | 16.18 | 16.26 | 16.03 | 16.03 |
| P15 | 14.32 | 10.89 | 13.16 | 11.85 | 13.12 | 12.35 | 11.44 |
| P16 | 17.63 | 17.45 | 17.63 | 17.63 | 17.63 | 17.63 | 17.63 |
| Average | 12.91 | 12.23 | 12.64 | 12.48 | 12.66 | 12.42 | 12.37 |
| Avg. Gap (%) | – | 4.51 | 1.65 | 3.23 | 1.60 | 3.86 | 3.44 |
| Avg. Time (s) | 68.46 | 0.08 | 0.08 | 0.08 | 0.08 | 0.08 | 0.08 |
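
The "Avg. Gap (%)" row can be checked against the per-instance values. The sketch below assumes the gap of each instance is the relative shortfall with respect to the MILP value, averaged over the 16 instances. The table does not state this definition explicitly, but under this assumption the Transformer and BR-Geo (0.5) columns agree with the reported 4.51 and 1.65 to within rounding. The function and variable names are illustrative.

```python
# Objective values copied from the table above (instances P1 to P16).
milp = [11.72, 6.28, 7.71, 7.47, 8.73, 10.81, 9.30, 11.04,
        11.20, 11.54, 16.55, 19.85, 26.21, 16.26, 14.32, 17.63]
transformer = [11.46, 6.28, 7.32, 7.06, 8.19, 10.71, 9.30, 10.61,
               10.92, 11.52, 16.55, 18.46, 22.96, 16.03, 10.89, 17.45]
br_geo = [11.68, 6.28, 7.69, 7.06, 8.73, 10.71, 9.30, 10.99,
          11.20, 11.52, 16.55, 19.18, 24.43, 16.18, 13.16, 17.63]


def average_gap(reference, values):
    """Mean of the per-instance percentage gaps to the reference.

    Assumed definition: gap_i = 100 * (reference_i - value_i) / reference_i.
    """
    gaps = [100.0 * (r - v) / r for r, v in zip(reference, values)]
    return sum(gaps) / len(gaps)


if __name__ == "__main__":
    print(f"Transformer avg. gap: {average_gap(milp, transformer):.2f}%")
    print(f"BR-Geo (0.5) avg. gap: {average_gap(milp, br_geo):.2f}%")
```

Averaging the per-instance gaps differs from taking the gap between the two "Average" entries, because instances with small objective values weigh as much as those with large ones.
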

| N Obj. | Transf. Gap | Transf. N Opt. | BR-Geo (dyn) Gap | BR-Geo (dyn) N Opt. | BR-Native Gap | BR-Native N Opt. |
|---|---|---|---|---|---|---|
| 20 | 4.27 | 0 | 0.00 | 10 | 0.01 | 9 |
| 35 | 2.34 | 1 | 0.21 | 6 | 0.35 | 6 |
| 50 | 2.86 | 0 | 0.18 | 3 | 0.21 | 4 |
| 70 | 0.53 | 1 | 0.07 | 6 | 0.00 | 10 |
| 85 | 1.57 | 0 | 0.10 | 4 | 0.14 | 5 |
| 100 | 1.09 | 0 | 0.26 | 4 | 0.28 | 1 |
| Average | 2.11 | 0.33 | 0.14 | 5.5 | 0.16 | 5.8 |

Avg. Time: Transf. 0.08; BR-Geo (dyn) 0.09; BR-Native 0.09.

© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Juan, A.A.; Guerrero, A.; Escoto, M.; Panadero, J.; Garcia-Sanchez, A.; Resende, M.G.C. On the Use of Biased-Randomized Transformers as Data-Driven Heuristics for Agile Optimization. Information 2026, 17, 504. https://doi.org/10.3390/info17050504

