A Modular ROS–MARL Framework for Cooperative Multi-Robot Task Allocation in Construction Digital Environments
Abstract
1. Introduction
- Framework: We develop a modular MARL–ROS framework for cooperative multi-robot task allocation, with interchangeable modules for environment modelling, task representation, robot interfaces, and learning.
- Formulation: We cast multi-robot task allocation as a MAPPO-based centralized-training, decentralized-execution problem and specify the state, observation, reward, and optimization structure in a robotics-compatible way.
- Digital testbed: We instantiate the framework in a simplified 2D Flatland benchmark and on TurtleBot3 robots, demonstrating an end-to-end pipeline from digital environment description and MARL training to decentralized execution.
- Evaluation: We compare the learned MAPPO policy with non-learning baselines in simulation and analyze the sim-to-real performance gap, illustrating how the framework can be used as a construction-oriented digital testbed.
2. Literature Review
2.1. MRTA Definitions and Solutions
- (a)
- Behavior-based approaches are simple to implement and perform well in dynamic environments; however, they lack adaptability to evolving conditions and may fail to achieve globally optimal task allocation.
- (b)
- Utility-based approaches provide precise and efficient task distribution through defined utility functions but require detailed system modeling, and their computational demands increase sharply with the number of tasks and agents.
- (c)
- Optimization-based approaches are capable of finding globally optimal solutions and offer strong adaptability, yet they are computationally intensive and often depend on accurate models that may not capture real-world variability.
- (d)
- Learning-based approaches enable adaptability and continuous improvement through experience, supporting effective task management after sufficient training. Nonetheless, they require large datasets and may exhibit unstable performance during early training stages.
- (e)
- Consensus- and cooperation-based approaches emphasize coordinated task distribution and conflict resolution, improving collective performance. However, they rely on continuous inter-agent communication, which becomes challenging in large-scale or bandwidth-limited systems.
2.2. MRTA Solved by MARL
2.3. Research Gaps
- (1)
- Simulation Environments Setup
- (2)
- MARL algorithms
- (3)
- Multi-agent collaboration
3. Methodology
3.1. Work Packages
3.1.1. Work Package 1: Communication Link
3.1.2. Work Package 2: Simulation Environments
3.1.3. Work Package 3: Scenario Definition
3.1.4. Work Package 4: MARL Algorithm
3.2. Data Flow
4. Case Study: Small-Scale Collaborative Mobile Robots
4.1. Work Package 1: Communication Link
4.2. Work Package 2: Simulation Environments
4.3. Work Package 3: Scenario Definition
Algorithm 1. Pseudocode implementation of the iterative learning process for robot task assignment

// Initialization and definition of parameters
Initialize robots
Initialize Global Assigned Tasks as [], an empty list
Initialize Global Reached Tasks as [], an empty list
Set Simulation Time and Time Goal Reached
Define Action Space for each robot
// Begin iterative training steps
for each episode do
    // Action assignment
    for each robot do
        Assign new goal from Action Space based on current state and policy
    end for
    // Execution
    for each robot do
        Attempt to reach assigned goal within the environment
        Record execution success or failure
    end for
    // State update
    Update the state of the environment
    Track changes and update Simulation Time
    // Observation state reporting
    Process and report the state as neural network input
    Update Observation State with new environment and robot statuses
    // Reward calculation
    for each robot do
        Calculate reward based on task completion, logic, and schedule adherence
    end for
    // Episode check
    if all goals are reached or timeout is reached then
        Conclude the episode
    else
        Increment step count and continue with the next step in the episode
    end if
    // Reset and error handling
    if episode concluded or connection error then
        if connection error then
            Terminate the simulator to prevent error propagation
        else
            Reset robots to initial positions for the next episode
        end if
    end if
end for
// Dynamic configuration and adaptation
Adjust parameters for agent count, efficiency, and pick/place durations as needed
Modify logic and weights governing task sequences for various scenarios
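The iterative loop of Algorithm 1 can be sketched in Python. This is a minimal illustrative mock, not the authors' implementation: the robot representation, the stochastic success model, the per-step timing, and the `run_episode`/`policy` names are all hypothetical placeholders.

```python
# Illustrative mock of the Algorithm 1 loop; task labels follow the paper's
# A1..B4 naming, but the success model and timing are invented for the sketch.
import random

ACTION_SPACE = ["A1", "A2", "A3", "A4", "B1", "B2", "B3", "B4"]
MAX_STEPS = 50          # safety cap on steps per episode
TIME_LIMIT = 120.0      # episode timeout (s), matching the paper's setting

def run_episode(robots, policy):
    assigned, reached = [], []   # Global Assigned / Reached Tasks
    sim_time = 0.0
    for step in range(MAX_STEPS):
        # Action assignment: each robot gets a goal from the shared policy
        for r in robots:
            goal = policy(r, assigned)
            assigned.append((r["id"], goal))
            # Execution: attempt navigation; success is stochastic here
            if random.random() > 0.1:
                reached.append(goal)
            # State update: advance the simulation clock
            sim_time += random.uniform(1.0, 3.0)
        # Episode check: all goals reached, or timeout
        if set(ACTION_SPACE) <= set(reached) or sim_time > TIME_LIMIT:
            break
    return assigned, reached, sim_time
```

A random policy such as `lambda r, a: random.choice(ACTION_SPACE)` reproduces the "no training" baseline behavior described in Section 4.5.2.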
- Publishing Actions: Nodes broadcast the actions generated by the algorithm at each time step t. These actions are transmitted through dedicated ROS topics (e.g., ), where i denotes the specific robot instance. The published message represents the goal position toward which the robot’s navigation stack autonomously maneuvers.
- Subscribing to Observations: Nodes simultaneously subscribe to observation topics that provide information about the robot’s state and the surrounding environment. These data serve as neural network inputs, informing decision-making and reward computation.
- Goal Command: Nodes publish to a goal topic to issue action commands to the corresponding robot.
- State and Environment Data: Nodes subscribe to /scan and /odom to obtain spatial and motion data, ensuring situational awareness and collision avoidance.
- Simulation Time: Subscribing to /sim_time synchronizes training steps with the simulation clock.
- Idle Time: The /idle_time topic logs periods of robot inactivity, helping assess efficiency.
- Task Management: Nodes publish task allocations to a dedicated topic and subscribe to a task-completion topic to monitor task completion order, providing essential data for evaluating task sequencing and logical consistency.
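As a concrete illustration of this topic layout, a small helper can build the per-robot topic names. The `/robot_i` namespace and the exact topic strings are assumptions for illustration; the paper's own (elided) topic names are not reproduced here.

```python
# Hypothetical per-robot topic map; the "/robot_i" namespace and exact topic
# names are illustrative assumptions, not the paper's actual configuration.
def robot_topics(i):
    ns = f"/robot_{i}"
    return {
        "goal": f"{ns}/move_base_simple/goal",  # published goal commands
        "scan": f"{ns}/scan",                   # subscribed LiDAR scans
        "odom": f"{ns}/odom",                   # subscribed odometry
        "sim_time": "/sim_time",                # shared simulation clock
        "idle_time": "/idle_time",              # logged inactivity periods
    }
```

In rospy, these strings would be passed to `rospy.Publisher` and `rospy.Subscriber`, e.g. `rospy.Publisher(robot_topics(0)["goal"], PoseStamped, queue_size=1)` with `PoseStamped` from `geometry_msgs.msg`.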
4.4. Work Package 4: MARL Algorithm (MAPPO Under CTDE)
- State Observation: Data Flow 6 extracts the environment’s state from the communication link, comprising sensory readings and internal status data. These observations serve as the neural network inputs that represent the agents’ understanding of their surroundings.
- Training Outputs: Updated parameters, including value functions and policy adjustments, are communicated back to the network. This iterative feedback loop—action, evaluation, and update—forms the core of the MARL mechanism.
4.4.1. CTDE Architecture
4.4.2. PPO-Based Policy Objective
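Since this subsection covers the PPO-based policy objective, a framework-free sketch of the clipped surrogate may help. The helper names (`clipped_surrogate`, `mappo_loss`) are illustrative, `eps` defaults to the paper's clipping parameter of 0.20, and the per-agent averaging is a plausible aggregation rather than the authors' exact formulation.

```python
# Framework-free sketch of the PPO clipped surrogate used under CTDE.
def clipped_surrogate(ratio, advantage, eps=0.20):
    """One-term PPO objective: min(r*A, clip(r, 1-eps, 1+eps)*A)."""
    clipped = max(1.0 - eps, min(ratio, 1.0 + eps))
    return min(ratio * advantage, clipped * advantage)

def mappo_loss(ratios, advantages, eps=0.20):
    # Centralized training averages the per-agent surrogates; execution stays
    # decentralized because each actor only needs its own local observation.
    terms = [clipped_surrogate(r, a, eps) for r, a in zip(ratios, advantages)]
    return -sum(terms) / len(terms)  # negated: optimizers minimize the loss
```

The clip keeps each policy update inside a trust region: a ratio of 1.5 with positive advantage is credited only as 1.2, so large policy jumps gain nothing.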
4.4.3. Value Function Loss and Entropy Regularization
4.4.4. Overall Multi-Agent Objective
4.4.5. Training Hyperparameters
4.5. Simulation Results
4.5.1. Simulation Results Reporting
4.5.2. Non-Training Baseline
4.5.3. Dynamic Task Allocation Training (Two-Robot Case)
4.5.4. Dynamic Task Allocation Training (Four-Robot Case)
4.5.5. Scalability Analysis
- (i)
- recovers near-optimal, logic-consistent allocations in the small-team case,
- (ii)
- adapts its task-distribution strategy when the team size is doubled from two to four robots without any change to the underlying learning architecture, and
- (iii)
- maintains cooperative, fully decentralized execution under higher agent densities while achieving substantial performance gains over uncoordinated baselines and robust behavior under stochastic navigation times.
4.6. Real-World Verification
Real-World Results and Discussion
- 1.
- Reduced realized speed (commanded vs. effective). Although commanded speeds were capped in simulation, logs show that the effective linear speed on the TurtleBot3 was consistently lower due to actuator limits and floor friction: 0.19 m/s on hardware versus 0.22 m/s in simulation, an approximately 13% decrease. In corridors, safety inflation and tighter turns further depress average speed.
- 2.
- Navigational hesitation from perception and localization. The robots frequently paused to reassess their trajectories due to sensor noise and minor localization errors, resulting in extended idle periods and reduced overall efficiency. LiDAR noise, beam dropouts, and small heading jitter triggered micro-stops and replans (safety checks, oscillation damping) that are largely absent in deterministic simulation runs. Concretely, each episode exhibited 3–6 pauses with a median duration of 0.5–1.2 s, alongside an average of 1.5 replan events. The measured LiDAR dropout rate was ~2–4%, with processing latency around 40–70 ms; these effects cumulatively inflated idle time.
- 3.
- Environmental irregularities not captured in the BIM→2D model. Despite controlled conditions, the laboratory environment introduced surface variations and minor obstacles that were not represented in the simulated BIM-based model. Minor floor unevenness, glossy reflections, and small transient obstacles (tripods, bags, cables) caused conservative obstacle inflation and detours that do not exist in the Flatland world.
5. Limitations
6. Conclusions and Outlook
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Arents, J.; Greitans, M. Smart industrial robot control trends, challenges and opportunities within manufacturing. Appl. Sci. 2022, 12, 937. [Google Scholar] [CrossRef]
- Piyush, P.; Mohamed, E.; Gabriella, S.K. Identifying the Challenges to Adopting Robotics in the US Construction Industry. J. Constr. Eng. Manag. 2021, 147, 05021003. [Google Scholar] [CrossRef]
- Bloss, R. Collaborative robots are rapidly providing major improvements in productivity, safety, programing ease, portability and cost while addressing many new applications. Ind. Robot. Int. J. 2016, 43, 463–468. [Google Scholar] [CrossRef]
- Feng, C.; Xiao, Y.; Willette, A.; McGee, W.; Kamat, V.R. Vision guided autonomous robotic assembly and as-built scanning on unstructured construction sites. Autom. Constr. 2015, 59, 128–138. [Google Scholar] [CrossRef]
- Pedersen, M.R.; Nalpantidis, L.; Andersen, R.S.; Schou, C.; Bøgh, S.; Krüger, V.; Madsen, O. Robot skills for manufacturing: From concept to industrial deployment. Robot. Comput. Manuf. 2016, 37, 282–291. [Google Scholar] [CrossRef]
- Hosseini, M.R.; Martek, I.; Zavadskas, E.K.; Aibinu, A.A.; Arashpour, M.; Chileshe, N. Critical evaluation of off-site construction research: A Scientometric analysis. Autom. Constr. 2018, 87, 235–247. [Google Scholar] [CrossRef]
- Khamis, A.; Hussein, A.; Elmogy, A. Multi-robot task allocation: A review of the state-of-the-art. In Cooperative Robots and Sensor Networks; Springer International Publishing: Cham, Switzerland, 2015; pp. 31–51. [Google Scholar]
- Badreldin, M.; Hussein, A.; Khamis, A. A comparative study between optimization and market-based approaches to multi-robot task allocation. Adv. Artif. Intell. 2013, 2013, 56524. [Google Scholar] [CrossRef]
- Parker, L.E. Task-oriented multi-robot learning in behavior-based systems. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS ’96, Osaka, Japan, 4–8 November 1996. [Google Scholar]
- Chakraa, H.; Guérin, F.; Leclercq, E.; Lefebvre, D. Optimization techniques for Multi-Robot Task Allocation problems: Review on the state-of-the-art. Robot. Auton. Syst. 2023, 168, 104492. [Google Scholar] [CrossRef]
- Clifton, J.; Laber, E. Q-learning: Theory and applications. Annu. Rev. Stat. Appl. 2020, 7, 279–301. [Google Scholar] [CrossRef]
- Cai, Q.; Pan, L.; Tang, P. Generalized deterministic policy gradient algorithms. arXiv 2018, arXiv:1807.03708. [Google Scholar]
- Mahadevan, S.; Theocharous, G. Optimizing Production Manufacturing Using Reinforcement Learning. In Proceedings of the Eleventh International FLAIRS Conference, 1998. Available online: https://cdn.aaai.org/FLAIRS/1998/FLAIRS98-072.pdf (accessed on 4 May 2024).
- Silver, D.; Hubert, T.; Schrittwieser, J.; Antonoglou, I.; Lai, M.; Guez, A.; Lanctot, M.; Sifre, L.; Kumaran, D.; Graepel, T.; et al. A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play. Science 2018, 362, 1140–1144. [Google Scholar] [CrossRef]
- Kober, J.; Bagnell, J.A.; Peters, J. Reinforcement learning in robotics: A survey. Int. J. Rob. Res. 2013, 32, 1238–1274. [Google Scholar] [CrossRef]
- Isele, D.; Rahimi, R.; Cosgun, A.; Subramanian, K.; Fujimura, K. Navigating Occluded Intersections with Autonomous Vehicles Using Deep Reinforcement Learning. In Proceedings of the 2018 IEEE International Conference on Robotics and Automation (ICRA), Brisbane, QLD, Australia, 21–25 May 2018. [Google Scholar]
- Wang, N.; Zhou, W.; Tian, Q.; Hong, R.; Wang, M.; Li, H. Multi-cue Correlation Filters for Robust Visual Tracking. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 4844–4853. [Google Scholar] [CrossRef]
- Xu, X.; Garcia de Soto, B. Reinforcement learning with construction robots: A review of research areas, challenges and opportunities. In Proceedings of the International Symposium on Automation and Robotics in Construction (ISARC), Bogotá, Colombia, 13–15 July 2022. [Google Scholar] [CrossRef]
- Conde, M. Organization based multiagent architecture for distributed environments. Doctoral dissertation, Universidad de Salamanca, Salamanca, Spain, 2010. [Google Scholar]
- Edmondson, J.; Schmidt, D. Multi-agent distributed adaptive resource allocation (MADARA). Int. J. Commun. Netw. Distrib. Syst. 2010, 5, 229–245. [Google Scholar] [CrossRef]
- Ontanon, S.; Synnaeve, G.; Uriarte, A.; Richoux, F.; Churchill, D.; Preuss, M. A survey of real-time strategy game AI research and competition in StarCraft. IEEE Trans. Comput. Intell. AI Games 2013, 5, 293–311. [Google Scholar] [CrossRef]
- Turner, C.J.; Oyekan, J.; Stergioulas, L.; Griffin, D. Utilizing industry 4.0 on the construction site: Challenges and opportunities. IEEE Trans. Ind. Inform. 2021, 17, 746–756. [Google Scholar] [CrossRef]
- Zhu, X.; Xu, J.; Ge, J.; Wang, Y.; Xie, Z. Multi-task multi-agent reinforcement learning for real-time scheduling of a dual-resource flexible job shop with robots. Processes 2023, 11, 267. [Google Scholar] [CrossRef]
- Chipade, V.S. Collaborative Task Allocation and Motion Planning for Multi-Agent Systems in the Presence of Adversaries. Doctoral Dissertation, University of Michigan, Ann Arbor, MI, USA, 2022. [Google Scholar]
- Gerkey, B.P.; Matarić, M.J. A formal analysis and taxonomy of task allocation in multi-robot systems. Int. J. Robot. Res. 2004, 23, 939–954. [Google Scholar] [CrossRef]
- Nunes, E.; Manner, M.; Mitiche, H.; Gini, M. A taxonomy for task allocation problems with temporal and ordering constraints. Robot. Auton. Syst. 2017, 90, 55–70. [Google Scholar] [CrossRef]
- Calzavara, M.; Faccio, M.; Granata, I. Multi-objective task allocation for collaborative robot systems with an Industry 5.0 human-centered perspective. Int. J. Adv. Manuf. Technol. 2023, 128, 297–314. [Google Scholar] [CrossRef]
- Gmytrasiewicz, P.J.; Doshi, P. A framework for sequential planning in multi-agent settings. J. Artif. Intell. Res. 2005, 24, 49–79. [Google Scholar] [CrossRef]
- Choudhury, S.; Gupta, J.; Kochenderfer, M.; Sadigh, D.; Bohg, J. Dynamic multi-robot task allocation under uncertainty and temporal constraints. Auton. Robot. 2022, 46, 231–247. [Google Scholar] [CrossRef]
- Robu, V. Market-based task allocation and control for distributed logistics. In Proceedings of the Fourth International Joint Conference on Autonomous Agents and Multiagent Systems, Utrecht, The Netherlands, 25–29 July 2005; p. 1383. [Google Scholar]
- Liu, L.; Shell, D. Optimal market-based multi-robot task allocation via strategic pricing. In Proceedings of the Robotics: Science and Systems Conference, Berlin, Germany, 24–28 June 2013. [Google Scholar]
- Tang, F.; Parker, L.E. A complete methodology for generating multi-robot task solutions using ASyMTRe-D and market-based task allocation. In Proceedings of the 2007 IEEE International Conference on Robotics and Automation, Rome, Italy, 10–14 April 2007; pp. 3351–3358. [Google Scholar]
- Hussein, A.; Khamis, A. Market-based approach to Multi-robot Task Allocation. In Proceedings of the 2013 International Conference on Individual and Collective Behaviors in Robotics (ICBR), Sousse, Tunisia, 15–17 December 2013; pp. 69–74. [Google Scholar]
- Parker, L.E. L-ALLIANCE: Task-oriented multi-robot learning in behavior-based systems. Adv. Robot. 1996, 11, 305–322. [Google Scholar] [CrossRef]
- Seenu, N.; Kuppan Chetty, R.M.; Ramya, M.M.; Janardhanan, M.N. Review on state-of-the-art dynamic task allocation strategies for multiple-robot systems. Ind. Rob. 2020, 47, 929–942. [Google Scholar]
- Liu, F.; Liang, S.; Xian, X. Multi-robot task allocation based on utility and distributed computing and centralized determination. In Proceedings of the 27th Chinese Control and Decision Conference (CCDC), Qingdao, China, 23–25 May 2015. [Google Scholar]
- Mazdin, P.; Barcis, M.; Hellwagner, H.; Rinner, B. Distributed task assignment in multi-robot systems based on information utility. In Proceedings of the 2020 IEEE 16th International Conference on Automation Science and Engineering (CASE), Hong Kong, China, 20–21 August 2020. [Google Scholar]
- Shelkamy, M.; Elias, C.M.; Mahfouz, D.M.; Shehata, O.M. Comparative analysis of various optimization techniques for solving multi-robot task allocation problem. In Proceedings of the 2020 2nd Novel Intelligent and Leading Emerging Sciences Conference (NILES), Giza, Egypt, 24–26 October 2020. [Google Scholar]
- Majumder, A.; Majumder, A.; Bhaumik, R. Teaching–learning-based optimization algorithm for path planning and task allocation in multi-robot plant inspection system. Arab. J. Sci. Eng. 2021, 46, 8999–9021. [Google Scholar] [CrossRef]
- Park, B.; Kang, C.; Choi, J. Cooperative Multi-Robot Task Allocation with Reinforcement Learning. NATO Adv. Sci. Inst. Ser. E Appl. Sci. 2021, 12, 272. [Google Scholar] [CrossRef]
- Kim, I.; Morrison, J.R. Learning based framework for joint task allocation and system design in stochastic multi-UAV systems. In Proceedings of the 2018 International Conference on Unmanned Aircraft Systems (ICUAS), Dallas, TX, USA, 12–15 June 2018; pp. 324–334. [Google Scholar]
- Jin, L.; Li, S.; La, H.M.; Zhang, X.; Hu, B. Dynamic task allocation in multi-robot coordination for moving target tracking: A distributed approach. Automatica 2019, 100, 75–81. [Google Scholar] [CrossRef]
- Bischoff, E.; Meyer, F.; Inga, J.; Hohmann, S. Multi-robot task allocation and scheduling considering cooperative tasks and precedence constraints. In Proceedings of the 2020 IEEE International Conference on Systems, Man, and Cybernetics (SMC), Toronto, ON, Canada, 11–14 October 2020. [Google Scholar]
- Liu, X.F.; Lin, B.C.; Zhan, Z.H.; Jeon, S.W.; Zhang, J. An efficient ant colony system for multi-robot task allocation with large-scale cooperative tasks and precedence constraints. In Proceedings of the 2021 IEEE Symposium Series on Computational Intelligence (SSCI), Orlando, FL, USA, 4–7 December 2021. [Google Scholar]
- Alitappeh, R.J.; Jeddisaravi, K. Multi-robot exploration in task allocation problem. Appl. Intell. 2022, 52, 2189–2211. [Google Scholar] [CrossRef]
- Liu, Z.; Chen, B.; Zhou, H.; Koushik, G.; Hebert, M.; Zhao, D. MAPPER: Multi-Agent Path Planning with Evolutionary Reinforcement Learning in Mixed Dynamic Environments. In Proceedings of the IROS 2020 International Conference on Intelligent Robots and Systems, Las Vegas, NV, USA, 25–29 October 2020. [Google Scholar]
- Agrawal, A.; Bedi, A.; Manocha, D. RTAW: An Attention Inspired Reinforcement Learning Method for Multi-Robot Task Allocation in Warehouse Environments. arXiv 2023, arXiv:2209.05738. [Google Scholar] [CrossRef]
- Lee, D.; Lee, S.; Masoud, N.; Krishnan, M.; Li, V.C. Digital twin-driven deep reinforcement learning for adaptive task allocation in robotic construction. Adv. Eng. Inform. 2022, 53, 101710. [Google Scholar] [CrossRef]
- Metelli, A.M. Configurable Environments in Reinforcement Learning: An Overview. In Special Topics in Information Technology; Piroddi, L., Ed.; Springer International Publishing: Cham, Switzerland, 2022; pp. 101–113. [Google Scholar]
- Beattie, C.; Leibo, J.Z.; Teplyashin, D.; Ward, T.; Wainwright, M.; Küttler, H.; Lefrancq, A.; Green, S.; Valdés, V. DeepMind Lab. arXiv 2016, arXiv:1612.03801. [Google Scholar] [CrossRef]
- Brockman, G.; Cheung, V.; Pettersson, L.; Schneider, J.; Schulman, J.; Tang, J.; Zaremba, W. OpenAI Gym. arXiv 2016, arXiv:1606.01540. [Google Scholar] [CrossRef]
- Abioye, S.O.; Oyedele, L.O.; Akanbi, L.; Ajayi, A.; Delgado, J.M.D.; Bilal, M.; Akinade, O.O.; Ahmed, A. Artificial intelligence in the construction industry: A review of present status, opportunities and future challenges. J. Build. Eng. 2021, 44, 103299. [Google Scholar] [CrossRef]
- Liu, S.; Liu, P. Benchmarking and optimization of robot motion planning with motion planning pipeline. Int. J. Adv. Manuf. Technol. 2021, 118, 949–961. [Google Scholar] [CrossRef]
- Kayhan, B.M.; Yildiz, G. Reinforcement learning applications to machine scheduling problems: A comprehensive literature review. J. Intell. Manuf. 2023, 34, 905–929. [Google Scholar] [CrossRef]
- de Woillemont, P.L.P.; Labory, R.; Corruble, V. Automated Play-Testing through RL Based Human-Like Play-Styles Generation. Proc. AAAI Conf. Artif. Intell. Interact. Digit. Entertain. 2022, 18, 146–154. [Google Scholar] [CrossRef]
- Brito, B.; Everett, M.; How, J.P.; Alonso-Mora, J. Where to go next: Learning a subgoal recommendation policy for navigation in dynamic environments. IEEE Robot. Autom. Lett. 2021, 6, 4616–4623. [Google Scholar] [CrossRef]
- Xie, J.; Ge, F.; Cui, T.; Wang, X. A virtual test and evaluation method for fully mechanized mining production system with different smart levels. Int. J. Coal Sci. Technol. 2022, 9, 41. [Google Scholar] [CrossRef]
- Conway, B.A. A Survey of Methods Available for the Numerical Optimization of Continuous Dynamic Systems. J. Optim. Theory Appl. 2012, 152, 271–306. [Google Scholar] [CrossRef]
- Gazi, V.; Passino, K.M. Swarm Stability and Optimization; Springer Science & Business Media: Berlin/Heidelberg, Germany, 2011. [Google Scholar]
- Green, S.A.; Billinghurst, M.; Chen, X.; Chase, J.G. Human-Robot Collaboration: A Literature Review and Augmented Reality Approach in Design. Int. J. Adv. Robot. Syst. 2008, 5, 1–18. [Google Scholar] [CrossRef]
- Fjeldstad, D.; Snow, C.C.; Miles, R.E.; Lettl, C. The architecture of collaboration. Strateg. Manag. J. 2012, 33, 734–750. [Google Scholar] [CrossRef]
- Al-Hamadani, M.N.A.; Fadhel, M.A.; Alzubaidi, L.; Harangi, B. Reinforcement Learning Algorithms and Applications in Healthcare and Robotics: A Comprehensive and Systematic Review. Sensors 2024, 24, 2461. [Google Scholar] [CrossRef] [PubMed]
- Bommasani, R. On the Opportunities and Risks of Foundation Models. arXiv 2021, arXiv:2108.07258. [Google Scholar] [CrossRef]
- Choi, H.; Crump, C.; Duriez, C.; Elmquist, A.; Hager, G.; Han, D.; Hearl, F.; Hodgins, J.; Jain, A.; Leve, F.; et al. On the use of simulation in robotics: Opportunities, challenges, and suggestions for moving forward. Proc. Natl. Acad. Sci. USA 2021, 118, e1907856118. [Google Scholar] [CrossRef] [PubMed]
- You, H.; Zhou, T.; Zhu, Q.; Ye, Y.; Du, E.J. Embodied AI for Dexterity-Capable Construction Robots: Dexbot Framework. Adv. Eng. Inform. 2024, 62, 102572. [Google Scholar] [CrossRef]
- Silver, T.; Chitnis, R. PDDLGym: Gym Environments from PDDL Problems. arXiv 2020, arXiv:2002.06432. [Google Scholar] [CrossRef]
- Gomes, G.; Vidal, C.A.; Cavalcante-Neto, J.B.; Nogueira, Y.L. A modeling environment for reinforcement learning in games. Entertain. Comput. 2022, 43, 100516. [Google Scholar] [CrossRef]
- Jonassen, D.H.; Rohrer-Murphy, L. Activity theory as a framework for designing constructivist learning environments. Educ. Technol. Res. Dev. 1999, 47, 61–79. [Google Scholar] [CrossRef]
- Perel, M.; Elkin-Koren, N. BLACK BOX TINKERING: Beyond transparency in algorithmic enforcement. SSRN Electron. J. 2016, 69, 181. [Google Scholar]
- Yu, C.; Velu, A.; Vinitsky, E.; Gao, J.; Wang, Y.; Bayen, A.; Wu, Y. The Surprising Effectiveness of PPO in Cooperative, Multi-Agent Games. ICLR 2022. arXiv 2021, arXiv:2103.01955. [Google Scholar]
- Flatland. A 2D Robot Simulator for ROS. Github. Available online: https://github.com/avidbots/flatland (accessed on 11 November 2025).
- Grossberg, S. Recurrent neural networks. Sch. J. 2013, 8, 1888. [Google Scholar] [CrossRef]
| Problem Definition | Application Category | Possible Solutions |
|---|---|---|
| Single-task robots (ST) | Multiple traveling salesman problem (mTSP) | Market-Based Approaches |
| Multi-task robots (MT) | Vehicle Routing Problem (VRP) | Behavior-Based Approaches |
| Single-robot tasks (SR) | Location routing problem (LRP) | Utility-Based Approaches |
| Multi-robot tasks (MR) | Job scheduling problem (JSP) | Optimization-Based Approaches |
| Instantaneous assignment (IA) | Linear assignment problem (LAP) | Learning-Based Approaches |
| Time-extended assignment (TA) | | Consensus- and Cooperation-Based Approaches |
| Possible Solutions | Pros | Cons | References |
|---|---|---|---|
| Market-Based Approaches | Flexibility; Scalability | Overhead; Instability | [31,32,33] |
| Behavior-Based Approaches | Simplicity; Robustness | Inflexibility; Limited efficiency | [9,34] |
| Utility-Based Approaches | Optimality; Precision | Complexity; Computation cost | [35,36,37] |
| Optimization-Based Approaches | Comprehensive; Adaptable | Computational demand; Rigidity | [8,10,38] |
| Learning-Based Approaches | Adaptive learning; Generalization | Initial learning curve; Limited predictability | [39,40,41] |
| Consensus- and Cooperation-Based Approaches | Conflict resolution; Cooperative | Communication requirements; Complexity | [42] |
| Work Package | Characteristics |
|---|---|
| Simulation Environment | Simulator: ROS Flatland. Data Flow 1: 2D laboratory environment reconstruction from BIM. Data Flow 2: Multi-robot setup using multiple TurtleBot3 platforms for testing collaborative navigation. |
| Scenario Definition | Defines the multi-agent task allocation problem and corresponding robot actions. Data Flow 3: Generation of navigation sequences for coordinated robot movement. Data Flow 4: Definition of the iterative learning process and continuous update of task logic. |
| Communication Link | ROS middleware enables inter-agent and environment communication through publishers and subscribers. Data Flow 5: Exchange of task commands, state data, and control feedback among ROS nodes. Data Flow 6: Bidirectional data exchange for MARL training, state observations as inputs and policy updates as outputs. |
| Algorithm | Reinforcement learning algorithm: MAPPO. Data Flow 6: Processing observation-state inputs and producing training outputs for policy refinement. Data Flow 7: Decision command processing using centralized training and decentralized execution to maximize group performance. |
| Simulation-to-Reality Translation | Data Flow 8: Transfers optimized policies and decision strategies from the simulation environment to real-world robots, ensuring consistency and reliability between virtual training and physical execution. |
| External Scripts | Format | Functionalities | ROS Packages Used |
|---|---|---|---|
| Environment Model | .SDF | The environmental model that robots work in | /Gazebo; /map_server; /Rviz |
| Robot Model | .URDF | Robot model to be imported into the ROS for simulation | /amcl/; /move_base; /robot_state_publisher; /joint_state_publisher; |
| Scheduler | .XML | A fundamental schedule for robotic tasks | /rl_task_allocation_manager |
| RL Training Script | Python | The training method to find the optimal policy of robotic task allocation strategy within the scope of defined constraints and rewards | /rl_manager; /rl_task_allocation_manager; |
| State | Formula Representation |
|---|---|
| Step number | |
| Initial Position | |
| Robot ID | |
| Current Task and Old Task | |
| Current and Old Goals | |
| Robot Status | |
| Task Category | |
| Navigation Duration | |
| Step Duration | |
| Observations Space | Format | Category | Usage |
|---|---|---|---|
| Global assigned tasks | A list of assigned tasks | Global | Allocation strategy |
| Task status | Task finished list: (Bool) True or false | Global | Track completion |
| Global reached goals | A list of tasks reached with orders and time | Global | Check logic |
| Logic correctness | Logic correct flag: (Bool) True or False | Global | Check logic |
| Episode duration | A list of duration of steps | Global | Optimization |
| Idle time | After reaching the goal, time is used to wait for the other robot to finish in a step. | Local | Use ratio |
| Navigation time | The time to reach the goal in a step | Local | Path efficiency |
| Collision | Whether there is a collision: (Bool) True or False | Local | Fatal error |
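The observation space in the table above can be illustrated as a simple builder that splits global (shared) and local (per-robot) fields. The function, dictionary keys, and input structure here are hypothetical, chosen to mirror the table rather than the authors' code.

```python
# Hypothetical observation builder mirroring the observation-space table;
# all names are illustrative. Global fields are shared across agents,
# local fields are specific to one robot.
def build_observation(global_state, robot_state):
    return {
        # Global part: allocation strategy, logic checks, optimization
        "assigned_tasks": list(global_state["assigned"]),
        "task_finished":  [t in global_state["reached"]
                           for t in global_state["assigned"]],
        "reached_goals":  list(global_state["reached"]),
        "logic_correct":  global_state["logic_ok"],
        "step_durations": list(global_state["durations"]),
        # Local part: use ratio, path efficiency, fatal-error flag
        "idle_time":       robot_state["idle"],
        "navigation_time": robot_state["nav"],
        "collision":       robot_state["collision"],
    }
```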
| Hyperparameter | Symbol | Value | Description |
|---|---|---|---|
| Discount factor | γ | 0.98 | Weighting of future rewards in return estimation |
| PPO clipping parameter | ϵ | 0.20 | Trust-region bound on policy ratio (epo_eps) |
| Entropy coefficient | – | 0.01 | Weight of entropy bonus (entropy_factor) |
| Value loss coefficient | – | 5.0 | Weight of value-function loss (value_factor) |
| Policy/value learning rate | – | | Adam step size for actor and critic (learning_rate) |
| Learning-rate decay factor | – | 0.95/10,000 steps | Multiplicative LR decay and interval (learning_rate_decay, learning_rate_decay_steps) |
| Batch size | – | 1000 | Number of samples per PPO update (batch_size) |
| Minibatch size | – | 256 | Size of SGD minibatches (mini_batch_size) |
| PPO epochs per update | – | 10 | Passes over the collected batch (epochs) |
| Control frequency | – | 5 Hz | Environment update rate (HZ) |
| Episode duration | – | 120 s | Maximum simulated episode time (episode_duration) |
| Total environment steps | – | 70,000 | Target number of steps per training run (num_env_steps) |
| Number of parallel threads | – | 8 | Parallel environments used for data collection (threads) |
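For reference, the hyperparameters from the table above can be collected into a single configuration dictionary using the parameter names the table reports. The dictionary itself is an illustrative convenience, not the authors' file format, and the learning-rate value is not given in the table, so it is left as None.

```python
# Hyperparameters from the table above, keyed by the reported code names.
MAPPO_CONFIG = {
    "gamma": 0.98,                    # discount factor
    "epo_eps": 0.20,                  # PPO clipping parameter
    "entropy_factor": 0.01,           # entropy bonus weight
    "value_factor": 5.0,              # value-function loss weight
    "learning_rate": None,            # value not reported in the table
    "learning_rate_decay": 0.95,      # multiplicative LR decay factor
    "learning_rate_decay_steps": 10_000,
    "batch_size": 1000,               # samples per PPO update
    "mini_batch_size": 256,           # SGD minibatch size
    "epochs": 10,                     # passes over each collected batch
    "HZ": 5,                          # control frequency (Hz)
    "episode_duration": 120,          # max simulated episode time (s)
    "num_env_steps": 70_000,          # steps per training run
    "threads": 8,                     # parallel data-collection envs
}
```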
| Algorithms and Scenario | Duration (avg-s) | Task Assigned R0 | Task Assigned R1 | Task Assigned R2 | Task Assigned R3 |
|---|---|---|---|---|---|
| No training for 2 robots | 98.75 | Random (average task allocation count 4.6) | | | |
| GA 2 robots with logic | 40.2 | ‘A1’, ‘A2’, ‘B2’, ‘B1’ | ‘A3’, ‘A4’, ‘B3’, ‘B4’ | NA | NA |
| No training for 4 robots | 112.27 | Random (average task allocation count 2.6) | | | |
| GA 4 robots with logic | 43.37 | ‘B3’, ‘B4’ | ‘B2’, ‘B1’ | ‘A3’, ‘A4’ | ‘A2’, ‘A1’ |
| Agent # | No Training (s) | GA (s) | MAPPO (s) |
|---|---|---|---|
| 2 | 98.75 | 40.2 | 42 |
| 4 | 112.27 | 43.37 | 55 |
| Algorithms and Scenario | Duration (s) | Task Assigned R0 | Task Assigned R1 |
|---|---|---|---|
| No training 2 robots | 98.75 | random | random |
| GA 2 robots with logic | 40.2 | ‘A1’, ‘A2’, ‘B2’, ‘B1’ | ‘A3’, ‘A4’, ‘B3’, ‘B4’ |
| MARL 2 robots | 42 | ‘A1’, ‘A2’, ‘B2’, ‘B1’ | ‘A3’, ‘A4’, ‘B3’, ‘B4’ |
| Real-world 2D Nav | 60 | ‘A1’, ‘A2’, ‘B2’, ‘B1’ | ‘A3’, ‘A4’, ‘B3’, ‘B4’ |
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Share and Cite
Xu, X.; Prieto, S.A.; García de Soto, B. A Modular ROS–MARL Framework for Cooperative Multi-Robot Task Allocation in Construction Digital Environments. Buildings 2026, 16, 539. https://doi.org/10.3390/buildings16030539

