Review

Optimizing Coalition Formation Strategies for Scalable Multi-Robot Task Allocation: A Comprehensive Survey of Methods and Mechanisms

by
Krishna Arjun
,
David Parlevliet
*,
Hai Wang
and
Amirmehdi Yazdani
School of Engineering and Energy, Murdoch University, Perth, WA 6150, Australia
*
Author to whom correspondence should be addressed.
Robotics 2025, 14(7), 93; https://doi.org/10.3390/robotics14070093
Submission received: 1 May 2025 / Revised: 14 June 2025 / Accepted: 27 June 2025 / Published: 2 July 2025
(This article belongs to the Section AI in Robotics)

Abstract

In practical applications, the utilization of multi-robot systems (MRS) is extensive and spans various domains such as search and rescue operations, mining operations, agricultural tasks, and warehouse management. The surge in demand for MRS has prompted extensive exploration of Multi-Robot Task Allocation (MRTA). Researchers have devised a range of methodologies to tackle MRTA problems, aiming to achieve optimal solutions, yet there remains room for further enhancements in this field. Among the complex challenges in MRTA, the identification of an optimal coalition formation (CF) solution stands out as one of the NP-hard (nondeterministic polynomial-time hard) problems. CF pertains to the effective coordination and grouping of agents or robots for efficient task execution, achieved through optimal task allocation. In this context, this paper delivers a succinct overview of dynamic task allocation and CF strategies. It conducts a comprehensive examination of diverse strategies employed for MRTA. The analysis encompasses the advantages, disadvantages, and comparative assessments of these strategies with a focus on CF. Furthermore, this study introduces a novel classification system for prominent task allocation methods and compares these methods with simulation analysis. The fidelity and effectiveness of the proposed CF approach are substantiated through comparative assessments and simulation studies.

1. Introduction

In the era marked by the emergence of Industry 4.0, there has been a remarkable increase in the demand for autonomous systems and Artificial Intelligence. The field of robotics has experienced widespread popularity across various sectors of society. As a result, the concept of multi-robot systems (MRS) has gained significant attention, surpassing that of individual robots. A multi-robot system comprises a group of robots working together to achieve a predetermined objective. A term related to MRS is the multi-agent system (MAS), which consists of a collective of agents collaborating to accomplish a specific goal [1]. The key distinction between MRS and MAS lies in the former involving physical robots as agents, whereas the latter incorporates agents represented as software entities. MRS offer numerous advantages over individual robots:
(a)
MRS enables parallel task execution, leading to accelerated goal attainment.
(b)
Heterogeneity in robot capabilities can be accommodated within MRS.
(c)
MRS effectively handles tasks distributed across large spatial domains.
(d)
Inherent robustness in fault tolerance is a characteristic feature of MRS.
The taxonomy of MRS is divided into two main categories: homogeneous assemblies, consisting of robots with similar capabilities, and heterogeneous assemblies, comprising robots with varying proficiencies. Additionally, this classification extends to cooperative ensembles, where robots collaborate to achieve goals, and competitive configurations, where robots compete for dominance. In recent times, there has been a significant increase in the demand for MRS in practical applications [2]. This growing demand has led to the integration of MRS into various domains, including mining, warehouse logistics, agricultural field operations, and even recreational and entertainment sectors. This widespread adoption of MRS spans numerous sectors globally [3], as shown in Figure 1. As this trend continues, addressing the existing challenges inherent to MRS becomes increasingly critical.
MRS pose a range of intricate challenges across technological, logistical, and operational spheres, as depicted in Figure 2. Key challenges include coordinating robot actions to avoid collisions or redundancy, ensuring reliable communication, and allocating tasks efficiently [10]. Moreover, addressing faults, maintaining accurate environmental perception, and managing shared resources while upholding privacy, security, and ethical human-robot interactions are significant hurdles. Energy efficiency is crucial, especially for battery-powered robots. These challenges collectively render the development and operation of multi-robot systems multifaceted.
Task allocation stands out as particularly crucial among these challenges due to its pivotal role in optimizing resource utilization, enhancing system efficiency, and facilitating effective collaboration among robots. Efficient task allocation in multi-robot systems offers various benefits. Firstly, it optimizes resource usage, leading to cost savings and improved system performance. Secondly, by ensuring tasks are executed effectively and promptly, task allocation enhances overall system efficiency. Thirdly, it facilitates scalability, enabling seamless integration of additional robots or tasks without compromising efficiency [11]. Moreover, task allocation contributes to system robustness by allowing adaptation to dynamic environments and changing task demands. Lastly, it promotes collaboration among robots, reducing conflicts and redundancy while enhancing system coordination.
This article is a revised and expanded version of a paper entitled “Analyzing multi-robot task allocation and coalition formation methods: A comprehensive study,” which was presented at the 1st International Conference on Advanced Robotics, Control, and Artificial Intelligence (ARCAI), Australia, December 2024 [12]. The rest of this paper offers a concise taxonomy of various strategies utilized in MRTA and CF. Section 2 describes and reviews MRTA and CF. In Section 3, detailed insights into each strategy are provided along with a comparative analysis of different approaches within each strategy, and a higher-level comparison of strategies is conducted based on relevant factors. Section 4 focuses on algorithm formulation and presents simulation results, while Section 5 serves as the discussion segment, summarizing the comprehensive review and highlighting the key findings. Section 6 gives our conclusion.

2. MRTA and CF

2.1. Multi-Robot Task Allocation (MRTA)

Task allocation within a multi-robot system involves determining which robots should perform specific tasks to achieve overarching system objectives, aiming for coordinated team behavior. While some systems, like certain biologically inspired robotic setups, exhibit coordinated team behavior through local interactions among team members and the environment, known as implicit or emergent coordination [13], others rely on explicit or intentional cooperation.
Achieving truly optimal solutions in MRTA remains a significant challenge due to a combination of theoretical complexity and real-world constraints. At its core, MRTA is an NP-hard combinatorial optimization problem, where the number of possible task-to-robot assignments grows exponentially with the size of the team and task set. This makes exhaustive search or brute-force optimization computationally impractical, particularly in dynamic or time-sensitive environments. Additionally, real-world applications introduce uncertainty through factors such as fluctuating network conditions, robot hardware limitations, sensor noise, and unexpected environmental changes, all of which can disrupt carefully planned task allocations. Another key challenge lies in the heterogeneity of both robots and tasks; balancing diverse capabilities, energy constraints, and spatial considerations further complicates the search for optimality. Decentralized approaches, though more scalable and robust, often lack global knowledge, which can result in suboptimal, locally greedy decisions. Moreover, MRTA often involves multi-objective optimization, such as minimizing travel distance, maximizing task quality, and ensuring balanced workload distribution, which leads to trade-offs where improving one metric may degrade another. Communication limitations, task dependencies, real-time decision-making needs, and the lack of unified metrics for success across heterogeneous teams all contribute to the difficulty [13]. As a result, most modern MRTA methodologies prioritize robustness, scalability, and adaptability over pure optimality. In conclusion, while optimal MRTA solutions are theoretically desirable, the practical demands of real-world deployment necessitate a shift toward heuristic, learning-based, or market-inspired strategies that provide good-enough performance under diverse and dynamic conditions.
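To give a concrete sense of this combinatorial growth (our illustration, not drawn from the cited studies): the number of ways to partition $n$ robots into disjoint coalitions is the Bell number $B_n$, which satisfies the recurrence
$$B_{n+1} = \sum_{k=0}^{n} \binom{n}{k} B_k, \qquad B_{10} = 115{,}975, \qquad B_{20} \approx 5.2 \times 10^{13},$$
so even a modest team of 20 robots admits tens of trillions of candidate coalition structures before the task assignments within each coalition are even considered.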
To address the challenge of meeting deadlines in multi-robot task assignment, a recent study [14] proposed distributed performance impact algorithms that allow each robot to independently compute its initial task allocation. The initial assignment is further refined using local search strategies, resulting in both decoupled and coupled task assignment variants. These algorithms were shown to outperform traditional methods by maximizing mission success and reducing total service time, especially in time-critical scenarios.
A recent study also introduced a co-evolutionary multi-population genetic algorithm that efficiently balances multiple objectives in MRTA problems, demonstrating improved performance in cases with vehicle-specific constraints and limited target reachability. It also introduced a target merging strategy and a longest-path-based algorithm to efficiently handle unreachable targets and minimize the number of required vehicles. By incorporating marginal-cost-based insertion principles, the proposed methods ensure bounded travel times while maintaining solution quality. Together, these approaches enable scalable and effective task allocation in constrained multi-vehicle environments [15].
Another recent study [16] proposed an efficient hybrid multi-population genetic algorithm to address the multi-UAV task assignment problem in consumer electronics logistics, where UAVs must deliver items to various locations while operating under capacity and time constraints. The work integrates advanced genetic operators with multiple local search strategies such as improved 2-opt, 1-opt, and interchange mechanisms to enhance both solution diversity and convergence speed. By allowing individual chromosomes to generate offspring independently and adapting local search based on chromosome similarity thresholds, the algorithm significantly outperforms conventional methods like CMGA and MMA.
The study reported in [17] proposed a categorization framework for MRTA problems based on three axes. The first axis distinguishes between single-task (ST) robots, capable of executing only one task at a time, and multi-task (MT) robots, capable of handling multiple tasks simultaneously. The second axis categorizes tasks as Single-Robot (SR) tasks, requiring one robot for completion, or Multi-Robot (MR) tasks, necessitating the involvement of multiple robots. The third axis differentiates between Instantaneous Assignment (IA) problems, which involve immediate task allocation without consideration for future assignments, and Time-Extended Assignment (TA) problems, where robots are allocated tasks based on a predetermined schedule encompassing both current and future allocations [17]. The study delineates the MRTA quandary into eight distinct typologies known as the two-level task allocation taxonomy (iTax) classification, as illustrated in Figure 3 [18]. This scheme rests on the foundation of interdependent resources and constraints. Employing the iTax framework, the MRTA conundrum was segmented into four distinct categories: those devoid of dependencies, those governed by scheduled dependencies, those subjected to cross-scheduled dependencies, and finally, those characterized by complex dependencies [19].
The limitations of the iTax have driven the exploration of alternative strategies like auction-based and optimization-based approaches to tackle MRTA problems. iTax’s complexity arises from its division of task allocation into high-level and low-level processes, potentially complicating system design and coordination. Scalability issues may arise with iTax as the number of robots and tasks increases, leading to inefficiencies in allocation and coordination. Additionally, iTax may yield suboptimal solutions due to the separation of high-level decisions from low-level motion planning considerations. Consequently, auction-based approaches, inspired by auction dynamics, and optimization-based approaches, relying on mathematical formulations, have emerged as alternatives. Auction-based approaches enable dynamic task allocation through robots bidding based on their capabilities, while optimization-based approaches offer systematic solutions considering resource utilization and task efficiency. These alternatives aim to address iTax’s limitations and enhance MRTA problem-solving effectiveness.
Considering factors such as solution optimality, allocation timing, and problem-specific constraints, two crucial task allocation strategies have been developed: auction-based approaches [20] and optimization-based approaches [21]. Auction-based approaches draw inspiration from the dynamics of auction-bid systems observed in societal contexts. In this approach, robots submit bids reflecting their current state values to compete for tasks. The task is then assigned to the robot offering the most suitable bid, representing a merit-based selection process. Conversely, optimization approaches involve devising mathematical solutions to tackle task allocation challenges. Additionally, behavior-based methodologies [22] have garnered considerable scholarly attention for their ability to adapt to the dynamic behavior of systems. More recently, innovative dynamic task allocation strategies have emerged to address complex constraints associated with various uncertain conditions. The growing significance of Artificial Intelligence (AI) and learning-based approaches has become prominent in this domain [23]. Researchers are increasingly turning their attention to these methods, motivated by their potential to provide comprehensive, autonomous, and optimal solutions to MRTA challenges. This trend reflects the evolving landscape of the field.
MRTA optimizes system performance by efficiently distributing tasks among multiple robots, crucial for various applications such as resolving complex challenges like the Multiple Traveling Salesman Problems (mTSP), Vehicle Routing Problems (VRP), Job Scheduling Problems (JSP), Team Orienting Problems (TOP), and Dial-a-Ride Problems (DARP). In mTSP, multiple robots solve a variant of the Traveling Salesman Problem, optimizing routes to minimize travel distance and costs [24]. VRP involves efficiently routing tasks among robots, resembling delivery vehicles, considering constraints like capacities and time windows [25]. JSP focuses on scheduling tasks among robots to minimize makespan, critical for industrial automation and warehouse management [26]. TOP dynamically forms and optimizes teams of robots to accomplish tasks collaboratively, crucial for search and rescue missions [27]. DARP coordinates multiple robots to provide transportation services, relevant for ride-sharing and public transportation systems, emphasizing the optimization of passenger routing and vehicle allocation [28].
Coalition formation (CF) remains a crucial but often under-addressed aspect in a wide array of multi-robot task allocation problems, such as mTSP, VRP, JSP, TOP, and DARP. These problems involve multiple robots or agents tasked with optimizing various objectives. For instance, in mTSP, the challenge is to have multiple salesmen (robots) efficiently visit a set of locations while minimizing the total route length. The establishment of coalitions can facilitate the equitable distribution of tasks, leading to a reduction in the overall travel distance. In VRP, the objective is to deliver goods to multiple customers while minimizing transportation costs. Here, coalitions play a vital role in route and resource sharing to achieve cost-effective solutions. Similarly, in JSP, multiple robots must execute a variety of tasks while optimizing time and resource utilization. The formation of coalitions aids in workload balancing and reduces the makespan. In TOP, the focus is on organizing robots into teams to accomplish specific tasks effectively. Coalitions are instrumental in the formation of teams that maximize task performance. Lastly, DARP involves multiple robots providing transportation services to passengers with pickup and drop-off requests. In this context, coalitions help in passenger sharing and route optimization.
In conclusion, CF is an indispensable aspect of addressing the challenges posed by MRTA across various domains. This paper reviews MRTA algorithms, assesses their support for CF, and seeks to determine the most effective methods for achieving optimal CF.

2.2. Coalition Formation (CF)

In the realm of robotics or multi-agent systems, the assembly of separate groups of robots or agents, each assigned specific tasks, constitutes coalition formation, as presented in Figure 4 [29]. Task allocation, within this framework, arises as a fundamental aspect of CF, involving the assignment of tasks to robots based on their capabilities. Optimal CF requires maximizing overall team performance while minimizing task completion time and resource consumption [30].
Various algorithms aimed at facilitating efficient collaboration among robots through CF have been devised. Examples include the Contract Net Protocol (CNP) [31], Consensus-Based Bundle Algorithm (CBBA) [32], Task Allocation Via Iterative Bargaining (TAIB) [33], Stable Marriage Algorithm (SMA) [34], Dynamic Distributed Coalition Formation (DDCF), as well as the Merge-and-Split Approaches, Combinatorial Auctions, Core Selecting Combinatorial Auction (CCA), and the Genetic Algorithm (GA) for CF [35]. These algorithms incorporate negotiation mechanisms, enabling robots to form coalitions based on their individual competencies and preferences.
In the context of the CNP, tasks are disseminated to participating robots by a central coordinator. Interested robots then submit bids for tasks, and the coordinator selects winning bids to form coalitions. CBBA involves robots engaging in communication and negotiations through a consensus-based framework, facilitating deliberation on coalition assignments and task allocation. In TAIB, robots negotiate with other entities to assess potential coalitions, iteratively refining preferences during CF. The SMA entails robots (males) proposing to potential coalitions (females), with coalitions accepting or rejecting proposals based on preferences. DDCF utilizes a distributed negotiation approach, enabling robots to adapt to changes in team composition. The Merge-and-Split algorithm allows existing coalitions to merge or split based on evolving task requirements and robot capabilities. Combinatorial auctions offer bundles of tasks or resources for bidding, with the auctioneer assigning winning combinations to maximize efficiency. CCA aims to determine stable and efficient task allocations among coalitions. Lastly, in GA, robot capabilities and preferences are represented as genes, subject to selection, crossover, and mutation operations to converge towards an optimal coalition structure [36].
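As an illustration of the auction mechanics shared by several of these protocols, the following minimal Python sketch mimics a single CNP-style round as described above; the bid model and all names are our own placeholders, not the protocol's normative interface.

# Illustrative single round of a Contract Net Protocol (CNP)-style auction;
# the capability/distance bid model below is hypothetical, not part of CNP itself.
def cnp_round(task, robots, bid):
    bids = {r: bid(r, task) for r in robots}   # interested robots submit bids
    winner = max(bids, key=bids.get)           # coordinator selects the winning bid
    return winner, bids[winner]

capability = {"r1": 0.9, "r2": 0.4, "r3": 0.7}
distance = {"r1": 12.0, "r2": 3.0, "r3": 5.0}
# Bids favor capable robots that are close to the task.
winner, value = cnp_round("t1", capability.keys(), lambda r, t: capability[r] / distance[r])
print(winner, round(value, 3))                 # r3 wins: 0.7/5.0 = 0.14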
The domain of multi-robot CF presents various challenges, including managing uncertainty, partial observability [37], non-stationarity [38], communication disruptions, the heterogeneity of robot capabilities, and task interdependencies. Developing efficient coalition algorithms is a significant task, often requiring substantial computational resources and specialized expertise. Maintaining coordination within coalition-based interactions in dynamic environments can be complex, with synchronization of actions emerging as a crucial concern. Additionally, multiple coalitions may converge to undertake similar or overlapping tasks, potentially leading to inefficiencies [39,40]. Nevertheless, CF enables robots to share resources such as energy, computational power, and communication bandwidth, enhancing resource utilization [41]. This sharing facilitates flexibility in team configuration adaptation in response to changing environmental conditions or task requirements. Such adaptive capability enhances system resilience against unforeseen challenges and improves overall robustness.

3. MRTA Classification

Figure 5 presents a categorization of strategies pertaining to MRTA. Within this taxonomy, four overarching classifications are discernible: behavior-based, market-based, optimization-based, and learning-based. Each of these classifications introduces a collection of prominent and well-established methodologies to address MRTA challenges, representative of standard paradigms. While certain researchers are modifying or extending these established approaches, others are integrating aspects of these methodologies to engender hybrid strategies.

3.1. Behavior-Based MRTA

Behavior-based MRTA represents a distinctive strategy characterized by its entirely reactive response to dynamic problem scenarios, as suggested by its nomenclature. This approach is structured upon a dual-tier architecture, comprising a foundational lower level and a supervisory stratum [42]. The lower level includes tasks such as navigation, obstacle avoidance, and task-switching, while the supervisory level handles task identification and inter-robot communication within the team. To enhance resilience, a problem-specific layer augments the lower-level behavior [43]. Figure 5 illustrates that behavior-based MRTA encompasses four methodological approaches: ALLIANCE, vacancy chain scheduling, Broadcast of Local Eligibility (BLE), and Automated Synthesis of Multi-Robot Task Solutions through Software Reconfiguration (ASyMTRe) [13].

3.1.1. ALLIANCE

The ALLIANCE architecture, known for its behavior-based and fully distributed principles, enables MRS to adapt to diverse circumstances by organizing behaviors tailored to high-level tasks [44,45]. Key features include robot impatience and acquiescence within coalitions, which impact collaboration by prioritizing individual goals or yielding excessively, respectively. Balancing these behaviors is crucial for effective collaboration [46,47,48]. ALLIANCE supports task allocation, resource sharing, and decision-making among dynamically assembled coalitions, enhancing communication and information exchange for optimized performance [49,50]. Attributes include distributed decision-making, reactive behavior, emergent coalition behavior, robustness, scalability, and adaptability.

3.1.2. Vacancy Chain Scheduling

The Vacancy Chain System (VCS) models scheduling as a cascade effect created through job promotions, akin to bureaucratic structures. This method addresses scheduling intricacies and group dynamics using microscopic and macroscopic approaches [51]. While effective in spatial domains, VCS’s performance varies with individual robot capabilities. Its strength lies in its versatility across problems, requiring minimal problem-specific information. Key attributes include distributed decision-making, reactive behavior, emergent system behavior, robustness against robot unavailability, scalability, and adaptability through scheduling algorithm adjustments [52,53].

3.1.3. Broadcast of Local Eligibility (BLE)

The Broadcast of Local Eligibility (BLE) mechanism compares a task’s local eligibility with the highest eligibility among similar behaviors in other robots, following the Port Arbitrated Behavior (PAB) paradigm [54]. BLE operates by suppressing peer behaviors when a robot’s local eligibility surpasses others, signaling its claim to tasks. In the absence of inhibitory interactions, other robots assume task responsibility. BLE’s scalability depends on communication bandwidth, with attributes including distributed decision-making, reactive behavior, emergent behavior, robustness against failures, scalability, adaptability, and local communication for sharing eligibility information among neighbors [55,56].

3.1.4. Automated Synthesis of Multi-Robot Task Solutions Through Software Reconfiguration (ASyMTRe)

ASyMTRe automatically reconfigures schema connections across and within robots by linking environmental, perceptual, and motor control schemas, facilitating effective multi-robot behaviors aligned with team objectives [57]. It addresses task-related challenges in diverse robot teams, enabling the synthesis of new task solutions and sharing sensory information among networked robots to assist less capable ones [58]. ASyMTRe exhibits distributed decision-making, reactive behavior, emergent behavior, robustness, scalability, and adaptability, allowing robots to autonomously make decisions, adjust to environmental changes, and withstand failures [59].
As previously mentioned, ALLIANCE, vacancy chain scheduling, BLE, and ASyMTRe are four key algorithms falling under the umbrella of behavior-based MRTA. Despite their shared behavior-based nature, they exhibit distinct attributes, advantages, and disadvantages. Table 1 and Table 2, below, present a comprehensive comparative analysis of these algorithms based on their characteristics. The comparison of these attributes underscores ALLIANCE’s proficiency in effectively managing CF in MRS.

3.2. Market-Based MRTA

The foundational concept of market-based MRTA draws inspiration from the principles of auctions and bidding. Various market-based methodologies are crafted by adapting the core auction-bidding procedure with the incorporation of incentives and penalties. Figure 6 illustrates the fundamental genesis of market-based MRTA.
Market-oriented strategies focus on utility functions, quantifying agents’ capacity to evaluate their interest in particular tasks for potential exchange. In MRTA systems, these functions clarify the alignment between a robot’s skills and task requirements [26]. Market-based approaches in multi-robot task allocation exhibit distinctive attributes that enhance their efficacy. By centering on utility functions, agents can gauge preferences for tasks, facilitating efficient task trading. Auction mechanisms introduce competitive bidding, motivating agents to bid based on perceived task values [60]. Incentive-driven behavior, through rewards and penalties, ensures optimal task allocation and fosters cooperation. Decentralized decision-making empowers agents to autonomously engage in task assignments. The adaptability of market-based strategies allows responses to changing conditions by adjusting utility functions and allocation mechanisms. These methods are flexible, accommodating various tasks and heterogeneous robot capabilities [61]. Through bid submissions and communication, agents share information, enhancing decision-making. The scalability of market-based systems effectively manages large-scale multi-robot scenarios, resulting in robust allocations that maximize overall utility, contributing to resource-efficient task distribution. Collectively, these attributes equip market-based approaches to effectively address multi-robot task allocation challenges.

3.2.1. RACHNA

RACHNA, an ascending auction-based task allocation protocol that allows task preemption, introduces market-based approaches to CF in competitive environments but faces challenges like unnecessary task reassignments, impacting performance due to switching overhead [62]. Designed for CF, RACHNA addresses heuristic-driven methods’ limitations by allowing tasks to compete for robot resources and autonomously adjusting sensor values based on supply and demand, without constraints on coalition size [63,64]. It frames CF as a multi-unit combinatorial auction, enabling efficient management of CF through bids on combinations of goods [65].

3.2.2. KAMARA (KAMRO’s Multi-Agent Robot Architecture)

KAMARA combines centralized and distributed frameworks for the Karlsruhe Autonomous Mobile Robot (KAMRO). This architecture enhances control and integration across diverse aspects of distributed intelligent robot systems and their individual components, accommodating various forms of cooperation among interconnected agents, such as closed kinematic chains or camera-manipulator couplings [66,67].

3.2.3. MURDOCH

MURDOCH involves an Auctioneer and Bidders in a swift task allocation method through first-price, one-round auctions, assigning tasks to the highest bidder based on computed bid values reflecting agents’ capabilities [68]. It introduces a robust MRTA algorithm adept at handling robot malfunctions and communication breakdowns, allowing task reassignment and facilitating collaboration among heterogeneous robots [69]. The Auctioneer initiates auctions upon task arrival, selects winning bidders, and monitors task execution, while bidders compute utility metrics, submit bids, and execute tasks upon winning, using a publish-subscribe communication model for coordination [70,71,72,73].

3.2.4. M+

M+ is a distributed multi-robot cooperation scheme integrating mission planning, task refinement, and cooperative mechanisms from the Contract Net Protocol framework within the M+ unified system, designed for integration using the LAAS Architecture [74,75]. It includes M+ task allocation, cooperative response to contingencies, and task execution operations, refining and assigning tasks through negotiation mechanisms. The cooperative reaction activity handles task execution failures by updating the world state, facilitating information exchange, overseeing (re)planning, and coordinating assistance, while the execution function manages task control and synchronization among robots [76,77].

3.2.5. TraderBots

The TraderBots methodology leverages market economies’ benefits for effective multi-robot coordination in dynamic scenarios through decentralized decision-making and task allocation within coalitions, demonstrated in simulations and physical implementations [78,79,80]. It excels in CF by employing a market-based framework, facilitating efficient task allocation based on robots’ capabilities and task compatibility, adapting to dynamic environments, and fostering information exchange among robots for informed decision-making. TraderBots is robust against failures and communication disruptions, showcasing versatility and scalability for diverse multi-robot scenarios, proving its practicality for effective CF [81].
All these methodologies fall within the realm of market-based task allocation and are founded on the common principle of employing market mechanisms to distribute tasks among robots. However, they diverge in terms of bid generation methods and the precise algorithms employed to optimize the task allocation process. Table 3, presented below, illustrates the comparison of five market-based approaches according to their distinctive attributes.

3.3. Optimization-Based MRTA

Optimization-driven methods strive to address the MRTA challenge by framing it as an optimization dilemma, seeking the best task-to-robot assignment. These methods primarily fall into two categories: conventional optimization methods and evolutionary optimization strategies.
Traditional optimization methods encompass mathematical frameworks, among which are Mixed Integer Linear Programming (MILP) and Quadratic Programming (QP), frequently employed. In contrast, evolutionary optimization approaches such as Particle Swarm Optimization (PSO), Ant Colony Optimization (ACO), GA, and Simulated Annealing (SA) have gained popularity [82]. While MILP and QP are favored for their potential to identify globally optimal solutions, their computational complexity can escalate with larger problem scales [83]. On the other hand, PSO, ACO, GA, and SA serve as metaheuristic optimization methods, capable of managing multiple objectives and non-linear constraints, although they might converge to local optima.

3.3.1. Traditional Optimization

MILP and QP are prevalent traditional methods for solving multi-robot task allocation challenges. MILP involves linear objective functions and constraints, while QP deals with non-linear formulations. A significant challenge with MILP is its scalability due to increased complexity and variable quantities [84]. In contrast, QP controllers are becoming benchmarks for managing complex objectives in robots with multiple joints, such as humanoids, proving effective in position control, torque regulation, multi-robot coordination, and force management [85].
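To make the MILP framing concrete, a minimal formulation of the simplest single-task-robot, single-robot-task assignment variant (our sketch, with $c_{ij}$ denoting an assumed cost of robot $i$ executing task $j$) is
$$\min_{x} \sum_{i=1}^{N}\sum_{j=1}^{M} c_{ij}\, x_{ij} \quad \text{s.t.} \quad \sum_{i=1}^{N} x_{ij} = 1 \;\; \forall j, \qquad \sum_{j=1}^{M} x_{ij} \le 1 \;\; \forall i, \qquad x_{ij} \in \{0, 1\},$$
where $x_{ij} = 1$ assigns task $j$ to robot $i$. Richer variants add coupling constraints among tasks and robots, which is precisely where the scalability issues noted above arise.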

3.3.2. Evolutionary Optimization

PSO operates as a population-centric search algorithm inspired by social behavior, iteratively exploring optimization problems within a designated search domain, making it suitable for robots with limited capabilities [86,87,88,89].
ACO, introduced by Marco Dorigo, emulates ant foraging behavior and finds applications in routing and scheduling, functioning as a probabilistic strategy for identifying favorable pathways across graphs within swarm intelligence methodologies [90,91,92,93,94].
GA explores optimization challenges through competitive selection, recombination, and mutation within a population of solutions, generating novel solutions biased towards favorable regions of the solution space [95].
SA mimics physical annealing transformations in solids, gradually reducing temperature to converge towards the most favorable state, using temperature-dependent random factors to escape local minima and reach lower energy configurations [96,97,98].
Optimization-based MRTA approaches: Methods like PSO, ACO, GA, SA, LP, and QP exhibit diverse attributes shaping their problem-solving dynamics.
  • PSO seeks global optima while navigating exploration and exploitation trade-offs.
  • ACO emphasizes pheromone-guided exploration and solution construction.
  • GA balances diversity through mutation and convergence via crossover.
  • SA transitions from high-temperature exploration to low-temperature exploitation.
  • LP targets linear relationships, and QP handles quadratic ones, both optimized for resource allocation.
Each approach’s distinct characteristics equip them to excel in various problem domains, addressing the intricacies of task allocation scenarios. A comparison of these approaches, considering various attributes, is presented in Table 4 and Table 5 below.

3.4. Learning-Based MRTA

Learning is fundamental to constructing solutions, representing a progression where an AI program enhances its understanding by observing its environment. Technically, AI learning mechanisms involve processing input-output pairs to deduce patterns for a designated function, enabling the anticipation of outputs for novel inputs. This paradigm allows robots to dynamically improve their task allocation strategies through learning from data and experience. By leveraging machine learning techniques, these systems adapt and optimize decisions over time, effectively navigating complex and changing environments [99]. This approach enhances resource utilization, coordination, and overall efficiency in multi-robot systems by allowing them to autonomously refine task allocation behaviors based on real-world interactions and performance feedback.

Machine Learning

Machine Learning, a subset of AI, enables machines to replicate intelligent human actions. It includes four main types: supervised, semi-supervised, unsupervised, and reinforcement learning [100], as shown in Figure 7. These techniques allow computers to learn from data, facilitating informed decision-making and predictions. In multi-robot task allocation, machine learning enhances decision-making and adaptability, enabling dynamic optimization of task allocation strategies based on data-driven insights and historical performance [101]. By learning from interactions between robots, their environment, and tasks, multi-robot systems improve resource utilization, coordination, and system efficiency, enhancing scalability, flexibility, and autonomy in addressing task allocation challenges.
ML and AI offer powerful avenues for improving the efficiency of solving MRTA problems. Firstly, learning-based models such as reinforcement learning can enable robots to adaptively learn task allocation strategies based on experience, especially in dynamic or uncertain environments. Supervised learning can be employed to predict optimal task assignments based on historical data, reducing computation time during online decision-making. Deep learning architectures can process high-dimensional sensory inputs and environmental data to inform allocation decisions more intelligently. Additionally, imitation learning allows robots to learn efficient task allocation behaviors by observing expert demonstrations [102,103]. AI techniques can also support coalition formation by identifying optimal groupings of heterogeneous robots for complex tasks through clustering or utility-based reasoning. Furthermore, ML models can be integrated with traditional optimization algorithms to guide search processes more efficiently or to predict promising regions of the solution space.
Supervised Learning trains algorithms on labeled data to predict specific outputs, iteratively refining models to discern patterns connecting input and output labels. In multi-robot task allocation, supervised learning maps behaviors to optimal allocations, using historical data to predict suitable assignments for new scenarios, optimizing resource utilization and system efficiency [104,105].
Semi-supervised Learning combines supervised and unsupervised learning, leveraging both labeled and unlabeled data to enhance model performance and generalization. In multi-robot task allocation, it blends techniques to optimize resource utilization even with limited labeled data [106].
Unsupervised Learning operates without explicit guidance, uncovering patterns in data without predefined labels, grouping similar data points together to encapsulate datasets more compactly. In multi-robot task allocation, unsupervised learning distills insights from unstructured datasets, improving resource allocation strategies [107].
Reinforcement Learning involves models making sequential decisions to achieve objectives within unpredictable settings, continuously interacting with the environment, and receiving rewards or penalties to adapt and refine behaviors over time. It is promising for optimizing resource utilization and improving collaboration in multi-robot task allocation systems [108,109,110,111,112].
Machine learning encompasses methodologies including supervised, semi-supervised, unsupervised, and reinforcement learning, each offering distinct advantages and applications. Supervised learning excels in prediction tasks, semi-supervised learning enhances understanding with limited labels, unsupervised learning reveals insights from unlabeled data, and reinforcement learning excels in sequential decision-making settings. Each approach uniquely contributes to machine learning challenges, catering to diverse problem domains and objectives. A comparison of these approaches, considering various attributes, is presented in Table 6 and Table 7 below.
In our simulation, we are using reinforcement learning with the epsilon-greedy algorithm for CF. The epsilon-greedy algorithm is a commonly used strategy in reinforcement learning that balances exploration and exploitation. In our simulation, this algorithm allows the robots to make decisions about which subtasks to pursue based on their learned Q-values while still allowing for some randomness to discover potentially better actions. The action a chosen by a robot (agent) is determined by
$$a = \begin{cases} \text{random action} & \text{with probability } \varepsilon \\ \arg\max_{a'} Q(s, a') & \text{with probability } 1 - \varepsilon \end{cases}$$
$Q(s, a)$ is the action-value function (Q-value) for state $s$ and action $a$, and $\varepsilon$ is the exploration probability. After taking action $a$ and observing a reward $r$ from the environment, the Q-value is updated according to the Q-learning rule.
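The Q-learning rule referred to above is the standard temporal-difference update
$$Q(s, a) \leftarrow Q(s, a) + \alpha \big[ r + \gamma \max_{a'} Q(s', a') - Q(s, a) \big],$$
where $\alpha$ is the learning rate, $\gamma$ the discount factor, and $s'$ the next state. A minimal Python sketch of the resulting epsilon-greedy agent (sizes and hyperparameters are illustrative, not our simulation's exact settings) is:

import numpy as np

n_states, n_actions = 5, 5                      # e.g., five subtasks
Q = np.zeros((n_states, n_actions))
alpha, gamma, epsilon = 0.1, 0.9, 0.1           # learning rate, discount, exploration rate
rng = np.random.default_rng(0)

def choose_action(state):
    # Epsilon-greedy: explore with probability epsilon, otherwise exploit argmax Q.
    if rng.random() < epsilon:
        return int(rng.integers(n_actions))     # random action (exploration)
    return int(np.argmax(Q[state]))             # greedy action (exploitation)

def update(state, action, reward, next_state):
    # Standard Q-learning temporal-difference update.
    td_target = reward + gamma * np.max(Q[next_state])
    Q[state, action] += alpha * (td_target - Q[state, action])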

3.5. Comparison with Different MRTA Approaches

In order to attain an optimal solution for the MRTA problem, it is imperative to take into account a range of critical factors. Effectively addressing these factors is crucial for obtaining the most favorable results in solving the MRTA problem. The comparative analysis presented in Table 8 highlights the diverse strengths and limitations of various CF methods, including behavior-based, market-based, optimization-based, and learning-based approaches. Each method exhibits distinct characteristics regarding scalability, complexity, optimality, flexibility, and robustness, influencing their applicability in real-world scenarios. Behavior-based methods, while suitable for small to moderate systems, often struggle with optimality and flexibility, primarily relying on local communication and simple task handling. Conversely, market-based methods introduce a more adaptable framework capable of accommodating complex tasks and heterogeneous robot populations, although they may encounter challenges with uncertainty and task reallocation due to their negotiation-based mechanisms.
Optimization-based approaches demonstrate superior performance in achieving optimal solutions, particularly in large systems; however, they may grapple with the complexities of multiple decision variables, impacting their robustness. Learning-based methods, particularly those utilizing reinforcement learning, emerge as a promising alternative, offering significant adaptability and improved performance through experience-based learning. Their capacity to handle dynamic environments and complex tasks positions them as a viable option for future research endeavors in CF.
Despite their individual advantages, it is crucial to consider that these methods may not perform optimally in isolation. A hybrid approach, integrating elements from multiple methodologies, could enhance overall performance by leveraging the strengths of each while mitigating their weaknesses. For instance, combining learning-based algorithms with optimization techniques may yield robust solutions capable of adapting to changing environments and task requirements. Future studies should explore these hybrid methodologies, focusing on their scalability and efficiency in dynamic, real-world scenarios, thereby contributing to the advancement of multi-robot systems and CF strategies.

4. Simulation and Results

In our simulation studies, we use a laptop manufactured by Dell Inc., headquartered in Round Rock, TX, USA, with an Intel Core i7 processor and 16 GB of RAM to run Python (version: 3.11.4)-based programs and simulations. The software environment includes necessary libraries such as NumPy (version: 1.25.2), Matplotlib (version: 3.7.3), PyTorch (version: 2.7.1), and custom Python scripts to visualize robot movements and coalitions. The simulations do not involve real hardware robots; instead, we focus on the behavior and coalition dynamics of virtual robots modeled in the software. All robot movements and task assignments appear in real time through graphical plots generated by the Python 3.11.4 simulation, providing a clear visualization of the coalition-based task allocation process.
We consider a team of $N$ robots, each denoted as robot $i \in \{1, 2, 3, \ldots, N\}$, with positions represented by $x_i \in \mathbb{R}^2$ in the two-dimensional space. Similarly, we have objects labeled $l \in \{1, 2, 3, \ldots, M\}$, whose positions, velocities, and desired positions are represented as $z_l \in \mathbb{R}^2$, $v_l \in \mathbb{R}^2$, and $z_l^d \in \mathbb{R}^2$, respectively. Robot $i$ can observe neighboring robots $j \in N_i$ and objects $l \in L_i$. These observed robots and objects are selected based on their proximity to robot $i$, specifically, the $K$ nearest neighbors. We have established the following assumptions for this scenario:
  • Robots are aware of the values of M (the total number of objects) and N (the total number of robots).
  • Robots possess knowledge of both the current and desired positions of all M objects.
  • Robots are capable of communicating with each other as needed.
  • We assume that all robots are operating within a workspace where communication between robots is feasible.
We simulate a scenario with a set of 50 robots, denoted as $R = \{r_1, r_2, r_3, \ldots, r_{50}\}$, and a task $T = \{t_1, t_2, t_3, t_4, t_5\}$ divided into five subtasks, in a $100 \times 100$ m warehouse. The assignment of robots to these subtasks is based on their capabilities and the specific requirements of each subtask. Although the robots are physically identical, they differ in their capabilities, categorized into three groups: high, medium, and low.
Each robot is randomly assigned temporal capabilities for the subtasks, and the optimal CF is determined using various algorithms. We consider the distance of each robot from each subtask and the time required for it to reach the nearest subtask. Since distance and time are directly related, each robot selects the nearest subtask to ensure efficient CF.
In this study, we evaluate the top-performing algorithms within each category. Specifically, we selected four algorithms: ALLIANCE (behavior-based), M+ (market-based), PSO (optimization-based), and Reinforcement Learning (learning-based). Our simulations involved running these algorithms through a series of tests using identical parameters, with each simulation consisting of 100 iterations.
After 100 iterations, we obtained 50 convergence matrices, each representing the distances of the 50 robots to the nearest subtask over the iterations. The convergence analysis assesses the speed and reliability with which the sequence of approximations, generated by the iterative method, converges toward the actual solution. The convergence matrix plays a crucial role in this analysis.
Then we use all 50 convergence matrices for each algorithm to capture a range of distances, spanning from the closest to the farthest robot-subtask configurations. Within these matrices, we analyzed the average distance and the time required for individual robots to reach convergence with their respective subtasks. Subsequently, we compared the results from all four algorithms to draw conclusions from our findings. The following subsections discuss the results of these algorithms.
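The bookkeeping behind these matrices can be sketched as follows (names such as robot_positions are our placeholders, not the simulation's actual code):

import numpy as np

n_robots, n_iters = 50, 100
rng = np.random.default_rng(1)
robot_positions = rng.uniform(0, 100, size=(n_robots, 2))
subtask_positions = rng.uniform(0, 100, size=(5, 2))

convergence = np.zeros((n_robots, n_iters))      # row: robot, column: iteration
for t in range(n_iters):
    for i in range(n_robots):
        d = np.linalg.norm(subtask_positions - robot_positions[i], axis=1)
        convergence[i, t] = d.min()              # distance to the nearest subtask
    # ...robot_positions would be updated here by the algorithm under test...

avg_distance_per_iter = convergence.mean(axis=0) # averaged across the team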

4.1. Behavior-Based: Alliance Architecture

The ALLIANCE architecture can be mathematically described in terms of robot capabilities, CF, and robot movement. Robot movement towards a subtask is governed by updating the robot’s position based on the direction vector $d$, which points toward the subtask, scaled by the robot’s velocity $v$. This is mathematically represented as
$$\text{NewPosition}_r = \text{Position}_r + \frac{d}{\lVert d \rVert}\, v$$
Each robot’s capability for a given subtask is denoted as $C_{r,s} \in [0, 1]$, where $C_{r,s}$ represents the capability of robot $r$ to perform subtask $s$. CF occurs when a robot’s capability for a particular subtask is greater than zero, formally expressed as $\text{coalition}(s) = \{\, r \mid C_{r,s} > 0 \,\}$, where $\text{coalition}(s)$ is the set of robots suitable for subtask $s$.
The implementation of the ALLIANCE architecture in our simulation involves key concepts such as robot capabilities, CF, and autonomous movement. Each robot is assigned distinct capabilities for different subtasks, which are randomly generated and stored in a robot_capabilities matrix. These capabilities represent the temporal suitability of each robot to perform a given subtask, based on sensor inputs or predefined characteristics.
CF is driven by these capabilities, where robots with non-zero capability values for a specific subtask form a coalition. In the code, this process is executed by appending robot IDs to the respective coalitions if their capability exceeds zero.
Once the coalitions are formed, the robots autonomously navigate toward the assigned subtasks. The movement is computed by calculating the direction vector between a robot’s current position and the position of the closest subtask.
direction = (closest_subtask_pos[0] - robot_pos[0], closest_subtask_pos[1] - robot_pos[1])
This expression computes the direction vector from a robot’s current position to the position of its closest, most suitable subtask by subtracting the robot’s x and y coordinates from the corresponding coordinates of that subtask’s position. The resulting vector indicates the direction the robot should move in order to approach the optimal subtask, based on criteria such as proximity or task relevance. This directional information is fundamental for navigation and task allocation, helping the robot align its movement towards the assigned goal efficiently.
This dynamic process ensures that each robot continuously updates its movement based on its capability and proximity to the task, reflecting the behavior-based control mechanisms inherent in the ALLIANCE architecture.
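Putting these pieces together, a condensed Python sketch of the loop just described (parameter values such as the per-iteration speed v are assumptions, not the exact simulation settings) is:

import numpy as np

n_robots, n_subtasks, v = 50, 5, 1.0             # v: per-iteration speed (assumed)
rng = np.random.default_rng(2)
positions = rng.uniform(0, 100, size=(n_robots, 2))
subtasks = rng.uniform(0, 100, size=(n_subtasks, 2))
robot_capabilities = rng.uniform(0, 1, size=(n_robots, n_subtasks))

# Coalition formation: robots with non-zero capability for subtask s join coalition(s).
coalitions = {s: [r for r in range(n_robots) if robot_capabilities[r, s] > 0]
              for s in range(n_subtasks)}

for _ in range(100):                             # 100 iterations, as in the simulation
    for r in range(n_robots):
        dists = np.linalg.norm(subtasks - positions[r], axis=1)
        direction = subtasks[int(np.argmin(dists))] - positions[r]
        norm = np.linalg.norm(direction)
        if norm > 1e-9:
            positions[r] += direction / norm * v # new_pos = pos + d/|d| * v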
The initial simulation employed the ALLIANCE architecture with a fleet of 50 robots of differing capabilities, randomly assigned as high, medium, or low. Robots were directed towards five subtasks, selected based on proximity. Over 100 iterations, all 50 robots were effectively allocated to the nearest subtasks, promoting active participation and rapid task accessibility, illustrating efficient CF with the ALLIANCE algorithm. Figure 8 displays the CF outcome after 100 iterations.
The alliance architecture facilitates the convergence of multiple robots into a single point to form a coalition through flexible and decentralized coordination. Robots autonomously organize into alliances based on shared objectives, capabilities, and task compatibility. This adaptability allows robots to dynamically identify suitable partners with aligned interests and abilities, leading to a common goal or convergence point. Consequently, the alliance architecture promotes efficient cooperation among diverse robots, enabling effective collaboration in achieving shared tasks or converging to specific points in their environment, fostering robust and scalable multi-robot systems.

4.2. Market-Based: M+ Algorithm

In our simulation, the M+ algorithm is applied to create optimal CF for robots tasked with subtasks in a warehouse environment. The temporal capabilities of robots, stored in a matrix, are used to rank robots for each subtask. The top-performing robots are selected to form coalitions based on their capabilities. Coalition size is determined by a square root function, simulating optimal task distribution. Once assigned, the robots navigate toward their designated subtasks by calculating the shortest distance and updating their positions iteratively. This code implementation reflects the core mechanics of the M+ algorithm, focusing on distributed decision-making and market-based CF.
Each robot $r$ is assigned a capability score $C_{r,s} \in [0, 1]$ for each subtask $s$. This score represents the robot’s ability to effectively contribute to a specific subtask, and it plays a critical role in determining the robot’s eligibility to join a coalition for that subtask. In our implementation, the robot capabilities are represented by a matrix where each element is randomly generated using the following expression:
$$C_{r,s} = \mathrm{random.uniform}(0.1,\ 1)$$
where $C_{r,s}$ denotes the capability of robot $r$ for subtask $s$, and its value is uniformly drawn from the interval [0.1, 1], ensuring variability in robot performance across subtasks.
$$\text{coalition\_size} = \min\big(N,\ \lceil \sqrt{N} \rceil\big)$$
The algorithm selects robots with the highest capabilities to form a coalition for each subtask. The size of the coalition is determined by a function of the total number of robots $N$, which approximates the optimal number of robots per coalition. The coalition size is computed as $\lceil \sqrt{N} \rceil$, reflecting the square-root rule commonly used in distributed task allocation.
For each subtask s , the robots are ranked based on their capability scores C r , s and the top-performing robots are selected to form the coalition:
$$\text{coalition}(s) = \{\, r \mid C_{r,s} \geq \text{threshold} \,\}$$
In our code, the coalitions are formed by selecting the top $\lceil \sqrt{N} \rceil$ robots from the sorted list of robots based on their capabilities.
Once the coalitions are formed, each robot in a coalition is assigned a specific subtask and moves toward the location of the subtask. The distance d between robot r and subtask s is calculated as
$$d = \sqrt{(x_s - x_r)^2 + (y_s - y_r)^2}$$
The movement of each robot is governed by minimizing the Euclidean distance between the robot’s current position $(x_r, y_r)$ and the target subtask’s position $(x_s, y_s)$.
The algorithm iteratively updates the robot’s position by moving it along the vector direction towards the subtask, ensuring that the robot gradually reduces its distance to the target subtask.
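A compact Python sketch of this M+-style coalition step (helper names and the ceiling rounding of $\sqrt{N}$ are our assumptions) is:

import math
import numpy as np

n_robots, n_subtasks = 50, 5
rng = np.random.default_rng(3)
C = rng.uniform(0.1, 1.0, size=(n_robots, n_subtasks))          # C[r, s] = random.uniform(0.1, 1)
coalition_size = min(n_robots, math.ceil(math.sqrt(n_robots)))  # square-root rule: min(50, 8) = 8

coalitions = {}
for s in range(n_subtasks):
    ranked = np.argsort(C[:, s])[::-1]                          # rank robots by capability for s
    coalitions[s] = ranked[:coalition_size].tolist()            # top performers join the coalition

# Each member then moves toward its subtask by reducing the Euclidean distance d,
# using the same normalized-direction position update as in the ALLIANCE sketch above.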
In the simulation with the M+ algorithm, we employ the same 50 robots with varying capabilities, randomly assigned as high, medium, or low. These robots are tasked with moving towards five subtasks, with their selection based on their current proximity to the subtasks. We conduct a total of 100 iterations, and the conclusive outcome is presented in Figure 9.
A communication simulation is performed during the execution of the M+ algorithm, wherein a total of 250 message transmission attempts are recorded. Of these, 237 messages are successfully delivered, while 13 are lost, resulting in a packet loss rate of 5%. The simulated message latency is fixed at 50.0 ms, representing moderate communication delays commonly encountered in decentralized multi-robot systems operating in real-world environments. Despite a measurable degree of message loss, the system exhibits a high message delivery rate, suggesting a satisfactory level of communication robustness under constrained network conditions.
The selection of 50 ms latency and 5% packet loss as simulation thresholds is intentional and grounded in the need to emulate realistic wireless communication scenarios, such as those found in outdoor or semi-structured domains. These values reflect typical conditions in Wi-Fi-based or ad hoc mesh networks operating under moderate traffic loads, where intermittent packet drops and variable transmission delays are expected. The chosen parameters aim to strike a balance between ideal and severely degraded communication, thereby facilitating an evaluation of algorithmic resilience without introducing unrealistic network unreliability. In this context, a 5% packet loss rate is sufficient to reveal the algorithm’s tolerance to communication imperfections, while a 50 ms latency is significant enough to influence coordination timing without critically impairing performance.
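The communication model can be reproduced with a simple Bernoulli loss process at fixed latency (a sketch of the stated parameters, not the simulation's actual code):

import numpy as np

latency_ms, loss_prob, attempts = 50.0, 0.05, 250
rng = np.random.default_rng(4)

delivered = int(np.sum(rng.random(attempts) >= loss_prob))  # each attempt drops w.p. 5%
lost = attempts - delivered                                 # expectation: ~12.5 of 250
print(f"delivered={delivered}, lost={lost}, loss_rate={lost / attempts:.1%}, latency={latency_ms} ms")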
M+ depends heavily on continuous inter-robot communication for the processes of task bidding, utility evaluation, and consensus formation. As such, its performance is inherently tied to the reliability and efficiency of the underlying communication network. Unlike optimization-based algorithms such as PSO or learning-based strategies like RL, which often rely on centralized updates or individual policy execution, M+ is especially vulnerable to message delays and packet loss. The communication simulation thus provides crucial insights into the operational resilience and practical applicability of the M+ algorithm in realistic, decentralized robotic systems.
The difference between alliance architecture and M+ in CF, concerning their ability to converge to a common point or task, lies in their fundamental mechanisms. Alliance architecture emphasizes a decentralized, self-organizing approach where agents form alliances based on shared interests and capabilities. Convergence in this context relies on the agents’ ability to autonomously identify suitable partners and form alliances, gradually moving the system towards a desired task or point of convergence through distributed decision-making. On the other hand, M+ utilizes mathematical programming to globally optimize the allocation of tasks to agents, explicitly defining an objective function. Convergence in M+ occurs through the solution of this optimization problem, aiming for an efficient task allocation. While alliance architecture offers flexibility in dynamic environments, M+ may excel in optimizing task allocations but could be less adaptable to changing circumstances due to its reliance on a fixed optimization framework.

4.3. Optimization-Based: PSO Algorithm

In PSO, each “particle” represents a potential solution: in our case, the position of each robot in the warehouse. The initial positions of robots are randomly generated within the dimensions of the warehouse:
$$\mathbf{x}_i^{\,0} = \left( x_i^{\,0},\; y_i^{\,0} \right), \qquad i = 1, 2, \dots, N$$
$N$ is the number of robots, and $\mathbf{x}_i^{\,0}$ represents the initial position of robot $i$ in the 2D space. The velocities of the particles are also initialized randomly, and the goal of the algorithm is to iteratively update these velocities and positions to optimize the assignment of robots to subtasks. The objective function that our PSO is optimizing is the ratio of each robot's capability for a subtask to its distance from that subtask. The score $S_i$ of each particle (robot) is defined as
$$S_i = \sum_{s=1}^{M} \frac{C_{i,s}}{d_{i,s}}$$
$C_{i,s}$ is the capability of robot $i$ for subtask $s$, $d_{i,s}$ is the Euclidean distance between robot $i$ and subtask $s$, and $M$ is the total number of subtasks. The score is maximized when robots with higher capabilities are closer to their respective subtasks. Each robot maintains its personal best position $P_i$, i.e., the position at which it achieved its highest score so far:
$$P_i(t) = x_i\left(\tau^{*}\right), \qquad \tau^{*} = \underset{\tau \le t}{\arg\max}\; S_i(\tau)$$
There is also a global best position $g$ that corresponds to the highest score achieved by any robot across the swarm:
$$g(t) = P_{i^{*}}(t), \qquad i^{*} = \underset{i \in \{1, \dots, N\}}{\arg\max}\; S_i(t)$$
In the PSO algorithm, the velocity and position of each particle, representing a robot, are updated through three key components: inertia, cognitive, and social terms. The inertia term maintains the particle's current momentum, ensuring it continues in its present direction; its contribution to the velocity update is $w\, v_i(t)$, where $w$ is the inertia weight. The cognitive term directs the particle towards its personal best position $P_i(t)$ by adjusting its velocity based on the difference between its current position and its best-known position, represented as $c_1 r_1 \left( P_i(t) - x_i(t) \right)$, where $c_1$ is the cognitive weight and $r_1$ is a random scalar between 0 and 1. The social term steers the particle towards the global best position $g(t)$ using $c_2 r_2 \left( g(t) - x_i(t) \right)$, where $c_2$ is the social weight and $r_2$ is a random scalar. The overall velocity update combines these components as
$$v_i(t+1) = w\, v_i(t) + c_1 r_1 \left( P_i(t) - x_i(t) \right) + c_2 r_2 \left( g(t) - x_i(t) \right),$$
and the particle's position is subsequently updated using $x_i(t+1) = x_i(t) + v_i(t+1)$.
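Putting these update rules together, a compact sketch of the PSO loop for the warehouse scenario is shown below. The parameter values ($w$, $c_1$, $c_2$), the capability encoding, and the workspace bounds are assumptions chosen for demonstration, not the exact configuration of our experiments.

```python
import numpy as np

rng = np.random.default_rng(0)
N, M = 50, 5                               # robots (particles), subtasks
W, C1, C2 = 0.7, 1.5, 1.5                  # inertia, cognitive, social weights

pos = rng.uniform(0, 100, (N, 2))          # initial positions x_i(0)
vel = rng.uniform(-1, 1, (N, 2))           # initial velocities
cap = rng.choice([1.0, 2.0, 3.0], (N, M))  # capabilities C_{i,s}
tasks = rng.uniform(0, 100, (M, 2))        # subtask locations

def score(p):
    """S_i = sum_s C_{i,s} / d_{i,s} for every robot position in p."""
    d = np.linalg.norm(p[:, None, :] - tasks[None, :, :], axis=2) + 1e-9
    return (cap / d).sum(axis=1)

pbest, pbest_score = pos.copy(), score(pos)
for _ in range(100):
    s = score(pos)
    improved = s > pbest_score                        # update personal bests
    pbest[improved], pbest_score[improved] = pos[improved], s[improved]
    g = pbest[np.argmax(pbest_score)]                 # global best position
    r1, r2 = rng.random((N, 1)), rng.random((N, 1))
    vel = W * vel + C1 * r1 * (pbest - pos) + C2 * r2 * (g - pos)
    pos = pos + vel                                   # x_i(t+1) = x_i(t) + v_i(t+1)
```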
In this simulation, we applied the PSO algorithm to a fleet of 50 robots with diverse capabilities, randomly classified as high, medium, or low. The robots were tasked with approaching five subtasks, with selection based on their relative distances. Over 100 iterations, the results indicate that the PSO algorithm prioritized the first subtask, allocating all robots towards its completion, as shown in Figure 10. This outcome highlights PSO’s strength in converging on near-optimal solutions; however, our objective is to accelerate task completion by enabling simultaneous engagement across all subtasks to minimize total task time.
PSO differs from the Alliance Architecture and the M+ algorithm in CF, particularly in its approach to converging on a common point or task. PSO is a heuristic optimization technique inspired by the collective behavior of birds or fish in a swarm. In CF, PSO models agents as particles exploring a solution space to find the optimal task allocation. While the Alliance Architecture and M+ emphasize decentralized coordination and explicit optimization, PSO uses a population-based search mechanism. It does not form traditional coalitions but directs agents (particles) towards optimal solutions. PSO’s convergence is driven by the optimization process, with particles adjusting their positions based on their experiences and those of their neighbors. This stochastic exploration can effectively find near-optimal solutions, especially in continuous solution spaces, but it may lack the transparency and adaptability of the Alliance Architecture or M+ in multi-agent systems with evolving goals and capabilities.

4.4. Learning-Based: Reinforcement Learning

In the proposed reinforcement learning framework for multi-robot CF, the epsilon-greedy algorithm is utilized to strike a balance between exploration and exploitation during the decision-making process. Each robot maintains a Q-value table $Q(s, a)$, where $s$ represents the current state of the robot and $a$ represents the actions associated with available subtasks. To select an action, the robot employs the epsilon-greedy strategy: it explores a random action with probability $\varepsilon$ and exploits the action with the highest Q-value with probability $1 - \varepsilon$. This dual approach facilitates the discovery of optimal strategies while ensuring that previously identified effective actions are also utilized.
The Q-value updates are performed following the Q-learning rule, allowing the robots to refine their strategies based on the rewards received from the environment. Specifically, after executing an action and observing a reward, the robot updates its Q-value for the taken action using the following formula:
$$Q(s, a) \leftarrow Q(s, a) + \alpha \left[ r + \gamma \max_{a'} Q(s', a') - Q(s, a) \right]$$
Here, α represents the learning rate, which controls the extent to which new information influences the existing Q-values, while γ denotes the discount factor, balancing immediate versus future rewards. This method allows for an effective adaptation of the robots’ actions in pursuit of subtasks, enhancing their cooperative performance in the CF process.
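A minimal tabular sketch of this epsilon-greedy Q-learning scheme is given below; the state/action sizes and hyperparameter values are illustrative placeholders rather than the exact configuration used in our experiments.

```python
import numpy as np

rng = np.random.default_rng(0)
N_STATES, N_ACTIONS = 10, 5        # e.g., 5 actions = 5 candidate subtasks
ALPHA, GAMMA, EPS = 0.1, 0.9, 0.1  # learning rate, discount, exploration rate

Q = np.zeros((N_STATES, N_ACTIONS))

def select_action(s):
    """Epsilon-greedy: explore with probability EPS, otherwise exploit."""
    if rng.random() < EPS:
        return int(rng.integers(N_ACTIONS))
    return int(np.argmax(Q[s]))

def update(s, a, r, s_next):
    """Q(s,a) <- Q(s,a) + alpha * [r + gamma * max_a' Q(s',a') - Q(s,a)]."""
    Q[s, a] += ALPHA * (r + GAMMA * Q[s_next].max() - Q[s, a])
```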
In our simulation employing the Reinforcement Learning (RL) algorithm, we utilized the same 50 robots with varying capabilities, randomly assigned as high, medium, or low. These robots were tasked with moving towards five subtasks, selecting based on their proximity to the subtasks. We conducted a set of 100 iterations, with the results displayed in Figure 11a–e. A notable limitation of RL is its prolonged learning duration, hindering convergence within the initial 100 iterations.
To evaluate the feasibility of the RL-based coalition formation strategy, we measure the inference time of the trained neural network used during greedy action selection within the ε-greedy policy implemented in PyTorch 2.7.1. On a standard CPU, the model achieves an average inference time of approximately 0.126075 milliseconds per decision step. This latency is substantially lower than the typical control cycle duration in robotic systems, which generally ranges from 30 to 50 milliseconds. These results suggest that the RL model is computationally efficient and well-suited for real-time deployment, even in CPU-only environments without requiring GPU acceleration. The policy network architecture is deliberately designed to be lightweight and free of recurrent or sequential layers, enabling fast, parallelizable computation. This design choice enhances the scalability of the model, making it capable of supporting an increasing number of robots and subtasks without introducing significant computational overhead.
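The timing methodology can be approximated with the sketch below. The feed-forward architecture shown is an assumed lightweight stand-in (the exact layer sizes are not material to the measurement), and it illustrates why such a policy stays well inside a 30 to 50 ms control cycle on a CPU.

```python
import time
import torch
import torch.nn as nn

# Assumed lightweight feed-forward policy: no recurrent or sequential layers.
policy = nn.Sequential(nn.Linear(12, 64), nn.ReLU(), nn.Linear(64, 5))
policy.eval()

state = torch.randn(1, 12)                     # placeholder state vector
with torch.no_grad():
    policy(state)                              # warm-up pass
    t0 = time.perf_counter()
    for _ in range(10_000):
        action = policy(state).argmax(dim=1)   # greedy action selection
    elapsed_ms = (time.perf_counter() - t0) / 10_000 * 1e3
print(f"mean inference time: {elapsed_ms:.4f} ms per decision step")
```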
Moreover, our benchmark findings reaffirm that the RL model remains highly efficient under real-time constraints. With an average inference time well below 1.2 milliseconds on a CPU, the approach is clearly viable for resource-constrained robotic platforms, such as embedded systems or mobile robots operating without access to dedicated GPUs. This demonstrates the model’s robustness and applicability in distributed multi-robot task allocation scenarios where responsiveness and computational economy are critical.
RL differs significantly from alliance architecture, M+, and PSO algorithms in CF due to its agent-centric, trial-and-error learning paradigm. RL agents make decisions based on learned policies, and convergence occurs through individual agent learning. While alliance architecture and M+ emphasize decentralized coordination and explicit CF, RL agents act autonomously to maximize cumulative rewards, often without explicit coalition considerations. Conversely, PSO aims to converge on optimal solutions collectively. RL introduces adaptability and autonomy at the individual agent level, allowing agents to dynamically adapt to changing circumstances and goals. The choice between these approaches depends on factors such as problem complexity, collaboration requirements, desired autonomy, and adaptability in the multi-robot system.
From the results depicted in Figure 11, it is evident that RL converges towards a coalition structure similar to that produced by the alliance architecture, distributing robots across all subtasks rather than collapsing onto one, which underscores its value for parallel task allocation.

4.5. Statistical Analysis

4.5.1. Quantitative Results and Statistical Comparison of Alliance, M+, PSO, and RL

Figure 12 and Figure 13 depict behavioral line charts for a set of 50 robots. These charts illustrate the utilization of the Alliance, M+, PSO, and reinforcement learning algorithms. Figure 12 displays blue lines indicating the average distance traveled by each robot, and Figure 13 shows the average time taken to achieve convergence over 100 iterations. Analyzing the simulation results and statistical data reveals that parallel task allocation utilizing all available resources is achieved by only two algorithms: Alliance and RL. However, it is worth noting that the reinforcement learning algorithm, due to its learning attributes, takes more time to accomplish the task compared to the alliance approach. Yet, as the number of robots increases, the performance of the alliance architecture starts to deteriorate, and this is when reinforcement learning shines. If the time constraints associated with reinforcement learning can be mitigated, it would emerge as the optimal approach for CF and multi-robot task allocation.
In the analysis of CF algorithms, the four approaches, Alliance (behavior-based), M+ (market-based), PSO (optimization-based), and reinforcement learning (learning-based), exhibit distinct methodologies and effectiveness in promoting collaborative behavior among autonomous agents. The Alliance algorithm emphasizes behavioral dynamics, enabling robots to form coalitions based on mutual capabilities and situational awareness. In contrast, the M+ algorithm leverages market mechanisms to allocate tasks, optimizing efficiency through competitive bidding among agents. The PSO approach, rooted in swarm intelligence, focuses on optimizing coalition structures based on collective experience and positioning, promoting swift convergence towards optimal solutions. While both the Alliance and RL methods yield promising results in CF, the reinforcement learning approach emerges as a robust framework for future studies. This is primarily due to its adaptive learning capabilities, which allow robots to continuously improve their decision-making processes through trial and error, leading to enhanced performance in dynamic environments.
Despite the preliminary findings indicating effective results from the Alliance and RL algorithms, the inclusion of reinforcement learning is particularly compelling for future investigations. It offers the potential for self-optimization and adaptability, accommodating varying task demands and environmental conditions that may not be fully captured by the other methods. Moreover, the comparative analysis conducted through convergence metrics of 50 robots showcases the efficiency of these algorithms, highlighting RL's capability to converge toward optimal coalition structures over time. Such insights pave the way for further exploration of RL in CF, potentially leading to more resilient and flexible robotic systems capable of adapting to complex and evolving scenarios.

4.5.2. Analysis of Convergence Time in Coalition Formation Algorithms Using ANOVA

To evaluate the temporal efficiency of various coalition formation strategies, we analyze the convergence time (measured in iterations) for four algorithms: Alliance, M+, PSO, and RL. The box plot comparison in Figure 14 visually illustrates the distribution and variance of convergence time across 50 robots per algorithm. Among the four, PSO consistently achieves the fastest convergence, with a mean of 12.27 iterations and low variance (standard deviation (std) = 10.99), indicating its strong ability to quickly lead agents to task completion. In contrast, the RL and M+ algorithms exhibit the highest mean convergence times, at 53.24 and 51.32 iterations, respectively, with broader interquartile ranges and several outliers, suggesting less consistent performance in dynamic scenarios.
The Alliance algorithm demonstrates a moderate performance, with a mean convergence time of 24.88 iterations and a standard deviation of 13.22, showing a more stable convergence compared to M+ and RL but slower than PSO. While Alliance does not optimize convergence speed aggressively, it provides a relatively balanced trade-off between speed and robustness. The M+ algorithm, although market-driven and theoretically scalable, shows substantial variance in convergence time (std = 22.15), likely due to its dependency on decentralized bidding and communication overhead, which can delay coordination.
A one-way ANOVA test is conducted to determine whether the differences in convergence time across the four algorithms are statistically significant. The analysis yields an F-statistic of 116.93 and a p-value below 0.0001, confirming that at least one algorithm's mean convergence time differs significantly from the others. This strongly supports the observation that algorithm choice significantly impacts convergence efficiency in multi-robot coalition formation. The high F-value reflects a large variance between group means compared to within-group variance, suggesting that performance differences are substantial rather than random.
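For reference, the test itself reduces to a single SciPy call, as in the sketch below; the arrays are synthetic placeholders drawn from the reported means and standard deviations (the RL standard deviation is assumed for illustration, since it is not listed above).

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Placeholder samples: 50 per-robot convergence times per algorithm.
alliance = rng.normal(24.88, 13.22, 50)
m_plus   = rng.normal(51.32, 22.15, 50)
pso      = rng.normal(12.27, 10.99, 50)
rl       = rng.normal(53.24, 20.00, 50)   # std assumed for illustration

f_stat, p_value = stats.f_oneway(alliance, m_plus, pso, rl)
print(f"F = {f_stat:.2f}, p = {p_value:.4g}")
```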
From a practical standpoint, while PSO appears to offer the fastest convergence, its limitations, such as converging all robots to the same subtask, can hinder task coverage and overall mission success. On the other hand, RL and M+ may take longer to converge but offer potential advantages in learning-based coordination and flexibility under real-world conditions. Alliance, though heuristic, balances speed and structure. Therefore, the choice of algorithm should be guided not only by convergence time but also by the nature of the task, the requirement for decentralized decision-making, and the environment's dynamics.

4.5.3. Computation Cost Analysis of MRTA Strategies

Table 9 reports the CPU time required over 100 iterations by four MRTA strategies: Alliance, M+, PSO, and RL. Among these, the PSO algorithm demonstrates the lowest computational cost, with a CPU time of just 0.051 s. This suggests that PSO is highly efficient in terms of processing time, making it a favorable choice for real-time or resource-constrained robotic applications. In addition to the total CPU time over 100 iterations, we also measure the computational cost per iteration for the PSO algorithm, which is found to be as low as 0.000597 s. This highlights PSO's remarkable efficiency in terms of computational speed. However, despite its low CPU cost, PSO demonstrates significant limitations in the context of coalition-based multi-robot task allocation.
A key drawback observed is that PSO tended to direct all robots toward a single subtask during convergence, resulting in the remaining subtasks being neglected. This centralized convergence behavior undermines the fundamental goal of parallel task execution in multi-robot systems. In scenarios involving multiple simultaneous subtasks, PSO fails to form optimal coalitions that would allow tasks to be completed concurrently. Consequently, while PSO appears computationally efficient, it is ineffective in scenarios requiring distributed coalition formation, leading to suboptimal task completion times and reduced overall system efficiency.
In contrast, the Alliance-based approach consumes the highest CPU time at 289.7 s, indicating significant computational overhead. This can be attributed to its behavior-based architecture, which involves complex coordination mechanisms and constant monitoring of robot states and task eligibility. The M+ strategy, while significantly faster than Alliance (70.57 s), still requires more processing than PSO and RL, reflecting the computational burden of market-based bidding and task negotiation. Meanwhile, the RL method stands between the two extremes, with a CPU time of 26.35 s, which is reasonable considering the learning and decision-making processes involved.
Overall, these results show that while PSO is the most lightweight in terms of computation, other methods like RL offer a trade-off between speed and adaptability. The choice of algorithm should therefore balance computational efficiency with the specific task complexity and adaptability needs of the MRTA system.

4.5.4. Scalability Comparison of RL and Alliance Algorithms

Following the above comparisons and results, both Alliance and RL-based approaches demonstrate notable competency in addressing the MRTA problem. While Alliance exhibits higher CPU time relative to RL in our evaluated scenario, it is important to acknowledge that RL may also become sensitive to increased system complexity. To derive a more comprehensive understanding of their overall performance, a scalability analysis is conducted, assessing how each algorithm handles larger team sizes. This comparison provides a more informed basis for evaluating which method offers a more robust and efficient solution under varying operational scales, as shown in Table 10.
The scalability of coalition formation algorithms is critical when applied to large-scale multi-robot systems. The results demonstrate that the RL-based approach consistently achieves significantly lower average final distances across all tested team sizes (10 to 100 robots) compared to the Alliance algorithm. For instance, with 10 robots, the RL algorithm achieves a final distance of 11.49 units versus 36.94 units for Alliance. This performance gap remains evident as team size increases, with RL achieving just 12.20 units for 100 robots while Alliance reaches 40.16 units. These results indicate that RL maintains efficient spatial convergence and task allocation regardless of swarm size, showcasing its robustness and adaptability.
In addition to better task convergence, the RL-based approach exhibits superior computational efficiency as the number of robots increases. The CPU time required by the RL model grows slowly and remains consistently lower than that of Alliance. For instance, at 100 robots, RL completes processing in 0.0158 s compared to 0.0437 s for Alliance, nearly a threefold difference. Overall, the results clearly show that the RL-based method scales more gracefully in both task convergence and CPU usage. As team size increases, Alliance suffers from growing coordination complexity, leading to higher distances and longer CPU times due to its reliance on emergent behavior and decentralized rule execution. In contrast, RL generalizes well across different scales, maintaining decision quality and computational efficiency. These findings highlight the RL framework's suitability for real-time deployment in large-scale multi-robot systems, where quick, coordinated decision-making is essential.
Enhancing the scalability of MRTA solutions remains a critical aspect of developing efficient and robust MRS, irrespective of the specific algorithmic framework employed. From our perspective, several strategies hold promise in addressing this challenge. Hierarchical decomposition and task clustering can partition large-scale MRTA problems into smaller, more tractable sub-problems, facilitating more efficient allocation. The use of heuristics, metaheuristic methods, and approximation techniques can offer rapid, near-optimal solutions without the computational burden of exact methods. Incorporating capability-aware filtering can significantly reduce the task-robot pairing search space, while employing communication-efficient architectures such as peer-to-peer or publish-subscribe models can mitigate network congestion in large teams. Moreover, strategies like coalition formation for complex tasks and dynamic task reallocation contribute to greater system adaptability and robustness. Hybrid approaches that integrate multiple allocation paradigms may further enhance performance. Overall, advancing the scalability of MRTA frameworks represents a valuable and timely direction for future research.

5. Discussion

In this study, we adopt a simplified kinematic motion model where each robot moves at a constant speed of 1 unit per time step toward its assigned subtask using direct vector-based navigation. This abstraction allows us to isolate and evaluate the effectiveness of the task allocation and coalition formation algorithms without the additional complexity of low-level dynamics or obstacle avoidance.
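Concretely, the motion model reduces to one unit-length step along the normalized direction vector per time step, as in the following sketch; the helper name and scenario values are illustrative.

```python
import numpy as np

def step_towards(position, target, speed=1.0):
    """Advance one time step at constant speed along the straight line to target."""
    direction = target - position
    distance = np.linalg.norm(direction)
    if distance <= speed:                 # close enough: snap onto the target
        return target.copy()
    return position + speed * direction / distance

robot = np.array([0.0, 0.0])
task = np.array([3.0, 4.0])
for _ in range(10):
    robot = step_towards(robot, task)
print(robot)                              # reaches (3, 4) after five unit steps
```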
While we acknowledge that real-world mobile robots are subject to motion constraints such as maximum velocity, acceleration limits, turning radius, and potential for collisions, our goal in this phase is to validate the high-level decision-making logic. Collision avoidance, speed regulation, and motion constraints are identified as future extensions of this work. These aspects will be integrated in subsequent stages using a more realistic simulation environment, specifically, ROS2 Humble with Gazebo employing TurtleBot3 platforms. Furthermore, the extended framework will be validated on physical robots equipped with onboard motion control systems and safety mechanisms to ensure reliable and safe deployment in real-world conditions.
Upon reviewing the literature, it becomes evident that learning-based methods hold the potential to adapt and attain a truly optimal solution for the MRTA problem. Each of the other method families, including behavior-based, market-based, and optimization-based approaches, may excel in certain aspects while falling short in others. Learning-based approaches, by contrast, can offer solutions to the MRTA problem even when dealing with large teams. When algorithms are analyzed against benchmark formulations such as mTSP, VRP, JSP, TOP, and DARP, most do not scale well to large numbers of robots or locations and may struggle with real-time adaptation to dynamic environments; learning-based algorithms are the notable exception.
Learning-based MRTA approaches are adept at tackling exceedingly intricate problems that conventional techniques struggle to address. Previous research has underscored the competitive nature of reinforcement learning in achieving long-term outcomes in this context. Furthermore, it possesses the capability to rectify errors that may have arisen during the training phase and can adapt when confronted with an insufficient quantity of training data, drawing insights from its experiences. Notably, one of the primary strengths of reinforcement learning lies in its ability to balance exploration and exploitation. Exploration involves the testing of novel ideas to discover potentially superior solutions, while exploitation entails leveraging the strategies that have previously proven to be effective. The primary challenge associated with learning-based MRTA is the time required for training: achieving a high number of reward points necessitates a substantial number of trials, which can be time-intensive.
An intriguing avenue for addressing the limitations of learning-based MRTA is to combine these approaches with one or more alternative methods that can mitigate their drawbacks. Such hybrid approaches hold the promise of yielding viable solutions to the MRTA problem. Additionally, more intricate learning techniques, such as deep reinforcement learning or deep neural networks, have the potential to enhance performance. Moreover, there exists ample room for future research and development in the realm of learning-based MRTA, offering exciting prospects for further advancements.
Future research will primarily focus on addressing these challenges, particularly partial observability and non-stationarity in CF, using a learning-based approach.

6. Conclusions

This study conducted a comprehensive evaluation of four coalition formation strategies: Alliance, M+, Particle Swarm Optimization (PSO), and Reinforcement Learning (RL) within the framework of MRTA. Using simulation-based experiments, the performance of each algorithm was assessed across several key metrics, including convergence time, average convergence distance, and computational cost. The findings indicate that although PSO exhibited the fastest convergence behavior, it failed to ensure effective task distribution across multiple subtasks. The M+ algorithm, while market-based, lacked structured coordination and showed high sensitivity to communication disruptions. In contrast, the Alliance approach provided better structural organization in task allocation but was hindered by greater computational overhead and limited scalability when applied to larger robot teams.
Among the evaluated approaches, the RL-based strategy emerged as the most promising in terms of adaptability and scalability. It maintained low inference latency, efficient task convergence, and significantly lower CPU time, even as the team size increased. Furthermore, statistical analysis using ANOVA validated the significance of observed differences in performance across algorithms. The RL model also proved viable for real-time deployment, operating within the bounds of typical robotic control cycles without requiring GPU acceleration.
CF plays a critical role in enhancing the efficiency and effectiveness of MRTA systems. Through an in-depth examination of existing strategies, we highlight that while various methods such as behavior-based, market-based, optimization-based, and learning-based approaches offer viable solutions, each comes with its own advantages and limitations. The comparative analysis emphasizes that no single method fits all scenarios; rather, the effectiveness of a CF strategy is heavily dependent on the nature of the task, the capabilities of the robots, and the dynamic environment in which they operate. Our simulation results affirm that strategic grouping of robots based on proximity, capability, and task requirements leads to more reliable and faster convergence toward optimal task execution.
In conclusion, this paper not only provides a structured overview of current MRTA approaches with a focus on CF but also proposes a classification system to better understand and compare these methodologies. The introduced simulation-based evaluation framework further substantiates the potential of the proposed CF approach in solving complex, real-world MRTA problems. Future work may explore hybrid models that dynamically switch between strategies based on environmental changes and robot feedback, paving the way for more adaptive, scalable, and intelligent multi-robot systems.

Author Contributions

Conceptualization, K.A.; methodology, K.A.; formal analysis, K.A.; investigation, K.A., D.P., A.Y. and H.W.; resources, K.A., D.P., A.Y. and H.W.; writing—original draft preparation, K.A.; writing—review and editing, K.A., D.P., A.Y. and H.W.; visualization, K.A.; supervision, K.A., D.P., A.Y. and H.W.; project administration, K.A., D.P., A.Y. and H.W.; funding acquisition, K.A., D.P., A.Y. and H.W. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by a Higher Degree by Research (HDR) scholarship awarded by Murdoch University, located in Murdoch, Perth, Australia.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Gautam, A.; Mohan, S. A review of research in multi-robot systems. In Proceedings of the 2012 IEEE 7th International Conference on Industrial and Information Systems (ICIIS), Chennai, India, 6–9 August 2012. [Google Scholar]
  2. Parker, L. Distributed intelligence: Overview of the field and its application in multi-robot systems. J. Phys. Agents 2008, 2, 5–14. [Google Scholar] [CrossRef]
  3. Arai, T.; Parker, L. Editorial: Advances in multi-robot systems. IEEE Trans. Robot. Autom. 2003, 18, 655–661. [Google Scholar] [CrossRef]
  4. Ahmad, A.; Walter, V.; Petráček, P.; Petrlík, M.; Báča, T.; Žaitlík, D.; Saska, M. Autonomous aerial swarming in GNSS-denied environments with high obstacle density. In Proceedings of the 2021 IEEE International Conference on Robotics and Automation (ICRA), Xi’an, China, 30 May–5 June 2021; Available online: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=9561284/ (accessed on 13 May 2024).
  5. Team CERBERUS Wins the DARPA Subterranean Challenge. Autonomous Robots Lab. Available online: https://www.autonomousrobotslab.com/ (accessed on 13 May 2024).
  6. Autonomous Mobile Robots (AMR) for Factory Floors: Key Driving Factors. 2021. Roboticsbiz. Available online: https://roboticsbiz.com/autonomous-mobile-robots-amr-for-factory-floors-key-driving-factors/ (accessed on 13 May 2024).
  7. Different Types of Robots Transforming the Construction Industry. 2020. Roboticsbiz. Available online: https://roboticsbiz.com/different-types-of-robots-transforming-the-construction-industry/ (accessed on 13 June 2024).
  8. Robocup Soccer Small-Size League. Robocup. Available online: https://robocupthailand.org/services/robocup-soccer-small-size-league/ (accessed on 13 May 2024).
  9. Robots in Agriculture and Farming. 2022. Cyber-Weld Robotic System Integrators. Available online: https://www.cyberweld.co.uk/robots-in-agriculture-and-farming/ (accessed on 13 May 2024).
  10. Roldán Gómez, J.; Barrientos, A. Special issue on multi-robot systems: Challenges, trends, and applications. Appl. Sci. 2021, 11, 11861. [Google Scholar] [CrossRef]
  11. Khamis, A.; Hussein, A.; Elmogy, A. Multi-robot task allocation: A review of the state-of-the-art. In Cooperative Robots and Sensor Networks 2015; Springer: Cham, Switzerland, 2015; pp. 31–51. [Google Scholar]
  12. Arjun, K.; Parlevliet, D.; Wang, H.; Yazdani, A. Analyzing multi-robot task allocation and coalition formation methods: A comparative study. In Proceedings of the 2024 International Conference on Advanced Robotics, Control, and Artificial Intelligence, Perth, Australia, 9–12 December 2024. [Google Scholar]
  13. Gerkey, B.P.; Mataric, M.J. Multi-robot task allocation: Analyzing the complexity and optimality of key architectures. In Proceedings of the 2003 IEEE International Conference on Robotics and Automation (Cat. No.03CH37422), Taipei, Taiwan, 19 September 2003. [Google Scholar]
  14. Bai, X.; Li, C.; Zhang, B.; Wu, Z.; Ge, S.S. Efficient performance impact algorithms for multirobot task assignment with deadlines. IEEE Trans. Ind. Electron. 2024, 71, 14373–14382. [Google Scholar] [CrossRef]
  15. Bai, X.; Yan, W.; Ge, S.S. Efficient task assignment for multiple vehicles with partially unreachable target locations. IEEE Internet Things J. 2021, 8, 3730–3742. [Google Scholar] [CrossRef]
  16. Bai, X.; Jiang, H.; Li, C.; Ullah, I.; Al Dabel, M.M.; Bashir, A.K.; Wu, Z.; Sam, S. Efficient hybrid multi-population genetic algorithm for multi-UAV task assignment in consumer electronics applications. IEEE Trans. Consum. Electron. 2025, 1–12. [Google Scholar] [CrossRef]
  17. Gerkey, B.; Mataric, M. A formal analysis and taxonomy of task allocation in multi-robot systems. Int. J. Robot. Res. 2004, 23, 939–954. [Google Scholar] [CrossRef]
  18. Korsah, G.; Stentz, A.; Dias, M. A comprehensive taxonomy for multi-robot task allocation. Int. J. Robot. Res. 2013, 32, 1495–1512. [Google Scholar] [CrossRef]
19. Saravanan, S.; Ramanathan, K.C.; Ramya, M.M.; Janardhanan, M.N. Review on state-of-the-art dynamic task allocation strategies for multiple-robot systems. Ind. Robot. Int. J. Robot. Res. Appl. 2020, 47, 929–942. [Google Scholar]
  20. Wen, X.; Zhao, Z.G. Multi-robot task allocation based on combinatorial auction. In Proceedings of the 2021 9th International Conference on Control, Mechatronics and Automation (ICCMA), Esch-sur-Alzette, Luxembourg, 11–14 November 2021. [Google Scholar]
21. Chakraa, H.; Guérin, F.; Leclercq, E.; Lefebvre, D. Optimization techniques for multi-robot task allocation problems: Review on the state-of-the-art. Robot. Auton. Syst. 2023, 168, 104492. [Google Scholar] [CrossRef]
  22. Dos Reis, W.P.N.; Lopes, G.L.; Bastos, G.S. An arrovian analysis on the multi-robot task allocation problem: Analyzing a behavior-based architecture. Robot. Auton. Syst. 2021, 144, 103839. [Google Scholar] [CrossRef]
  23. Agrawal, A.; Bedi, A.; Manocha, D. RTAW: An attention inspired reinforcement learning method for multi-robot task allocation in warehouse environments. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), London, UK, 29 May–2 June 2023; pp. 1393–1399. [Google Scholar]
  24. Cheikhrouhou, O.; Khoufi, I. A comprehensive survey on the multiple traveling salesman problem: Applications, approaches and taxonomy. Comput. Sci. Rev. 2021, 40, 100369. [Google Scholar] [CrossRef]
  25. Deng, P.; Amirjamshidi, G.; Roorda, M. A vehicle routing problem with movement synchronization of drones, sidewalk robots, or foot-walkers. Transp. Res. Procedia 2020, 46, 29–36. [Google Scholar] [CrossRef]
  26. Sun, Y.; Chung, S.-H.; Wen, X.; Ma, H.-L. Novel robotic job-shop scheduling models with deadlock and robot movement considerations. Transp. Res. Part E Logist. Transp. Rev. 2021, 149, 102273. [Google Scholar] [CrossRef]
27. Santana, K.A.; Pinto, V.P.; Souza, D.A.d.; Torres, J.L.O.; Teles, I.A.G. (Eds.) New GA Applied Route Calculation for Multiple Robots with Energy Restrictions; EasyChair: Manchester, UK, 2020. [Google Scholar]
  28. Jorgensen, R.; Larsen, J.; Bergvinsdottir, K. Solving the dial-a-ride problem using genetic algorithms. J. Oper. Res. Soc. 2007, 58, 1321–1331. [Google Scholar] [CrossRef]
  29. Hussein, A.; Khamis, A. Market-based approach to multi-robot task allocation. In Proceedings of the 2013 International Conference on Individual and Collective Behaviors in Robotics (ICBR), Sousse, Tunisia, 15–17 December 2013. [Google Scholar]
  30. Ramanathan, K.C.; Singaperumal, M.; Nagarajan, T. Cooperative formation planning and control of multiple mobile robots. In Mobile Robots—Control Architectures, Bio-Interfacing, Navigation, Multi Robot Motion Planning and Operator Training; IntechOpen Limited: London, UK, 2011. [Google Scholar]
  31. Aziz, H.; Pal, A.; Pourmiri, A.; Ramezani, F.; Sims, B. Task allocation using a team of robots. Curr. Robot. Rep. 2022, 3, 227–238. [Google Scholar] [CrossRef]
  32. Zitouni, F.; Harous, S.; Maamri, R. A distributed approach to the multi-robot task allocation problem using the consensus-based bundle algorithm and ant colony system. IEEE Access 2020, 8, 27479–27494. [Google Scholar] [CrossRef]
  33. Rauniyar, A.; Muhuri, P.K. Multi-robot coalition formation problem: Task allocation with adaptive immigrants based genetic algorithms. In Proceedings of the 2016 IEEE International Conference on Systems, Man, and Cybernetics (SMC), Budapest, Hungary, 9–12 October 2016. [Google Scholar]
  34. Kraus, S. Negotiation and cooperation in multi-agent environments. Artif. Intell. 1997, 94, 79–97. [Google Scholar] [CrossRef]
  35. Tosic, P.; Ordonez, C. Distributed protocols for multi-agent coalition formation: A negotiation perspective. In Proceedings of the 8th International Conference, AMT 2012, Macau, China, 4–7 December 2012. [Google Scholar]
  36. Selseleh Jonban, M.; Akbarimajd, A.; Hassanpour, M. A combinatorial auction algorithm for a multi-robot transportation problem. In Proceedings of the 3rd International Conference on Machine Learning and Computer Science (IMLCS’2014), Dubai, United Arab Emirates, 5–6 January 2014. [Google Scholar]
  37. Capitan, J.; Spaan, M.T.J.; Merino, L.; Ollero, A. Decentralized multi-robot cooperation with auctioned POMDPs. Int. J. Robot. Res. 2013, 32, 650–671. [Google Scholar] [CrossRef]
  38. Hernandez-Leal, P.; Kaisers, M.; Baarslag, T.; Munoz de Cote, E. A Survey of Learning in Multiagent Environments: Dealing with Non-Stationarity; Cornell University: New York, NY, USA, 2017. [Google Scholar]
  39. Vig, L.; Adams, J.A. Multi-robot coalition formation. IEEE Trans. Robot. 2006, 22, 637–649. [Google Scholar] [CrossRef]
  40. Guerrero, J.; Oliver, G. Multi-robot coalition formation in real-time scenarios. Robot. Auton. Syst. 2012, 60, 1295–1307. [Google Scholar] [CrossRef]
  41. Rizk, Y.; Awad, M.; Tunstel, E. Cooperative heterogeneous multi-robot systems: A Survey. ACM Comput. Surv. 2019, 52, 1–31. [Google Scholar] [CrossRef]
  42. Ramanathan, K.C.; Singaperumal, M.; Nagarajan, T. Behaviour based planning and control of leader follower formations in wheeled mobile robots. Int. J. Adv. Mechatron. Syst. 2010, 2, 281. [Google Scholar]
  43. Schillinger, P.; Bürger, M.; Dimarogonas, D. Simultaneous task allocation and planning for temporal logic goals in heterogeneous multi-robot systems. Int. J. Robot. Res. 2018, 37, 818–838. [Google Scholar] [CrossRef]
  44. Parker, L.E. ALLIANCE: An architecture for fault tolerant multirobot cooperation. IEEE Trans. Robot. Autom. 1998, 14, 220–240. [Google Scholar] [CrossRef]
  45. Parker, L.E. ALLIANCE: An architecture for fault tolerant, cooperative control of heterogeneous mobile robots. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS’94), Munich, Germany, 12–16 September 1994. [Google Scholar]
  46. Parker, L. L-ALLIANCE: A Mechanism for Adaptive Action Selection in Heterogeneous Multi-Robot Teams; Oak Ridge National Laboratory: Oak Ridge, TN, USA, 1996. [Google Scholar]
  47. Parker, L. Evaluating success in autonomous multi-robot teams: Experiences from ALLIANCE architecture implementations. J. Exp. Theor. Artif. Intell. 2000, 13, 95–98. [Google Scholar] [CrossRef]
  48. Parker, L. On the design of behavior-based multi-robot teams. Adv. Robot. 2002, 10, 547–578. [Google Scholar] [CrossRef]
  49. Parker, L.E. Task-oriented multi-robot learning in behavior-based systems. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems. IROS ’96, Osaka, Japan, 8 November 1996. [Google Scholar]
  50. Mesterton-Gibbons, M.; Gavrilets, S.; Gravner, J.; Akçay, E. Models of coalition or alliance formation. J. Theor. Biol. 2011, 274, 187–204. [Google Scholar] [CrossRef]
  51. Dahl, T.; Matarić, M.; Sukhatme, G. Multi-robot task allocation through vacancy chain scheduling. Robot. Auton. Syst. 2009, 57, 674–687. [Google Scholar] [CrossRef]
  52. Lerman, K.; Galstyan, A.; Martinoli, A.; Ijspeert, A.J. A macroscopic analytical model of collaboration in distributed robotic systems. Artif. Life 2001, 7, 375–393. [Google Scholar] [CrossRef]
  53. Jia, X.; Meng, M.Q.H. A survey and analysis of task allocation algorithms in multi-robot systems. In Proceedings of the 2013 IEEE International Conference on Robotics and Biomimetics (ROBIO), Shenzhen, China, 12–14 December 2013. [Google Scholar]
  54. Werger, B.; Mataric, M. Broadcast of local eligibility: Behavior-based control for strongly cooperative robot teams. In Proceedings of the Fourth International Conference on Autonomous Agents, Barcelona, Spain, 3–7 June 2000. [Google Scholar]
  55. Werger, B.; Mataric, M. Broadcast of local eligibility for multi-target observation. In Distributed Autonomous Robotic Systems 4; Springer: Tokyo, Japan, 2000; pp. 347–356. [Google Scholar]
  56. Faigl, J.; Kulich, M.; Preucil, L. Goal assignment using distance cost in multi-robot exploration. In Proceedings of the 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, Algarve, Portugal, 7–12 October 2012; pp. 3741–3746. [Google Scholar]
  57. Tang, F.; Parker, L. ASyMTRe: Automated synthesis of multi-robot task solutions through software reconfiguration. In Proceedings of the 2005 IEEE International Conference on Robotics and Automation, Barcelona, Spain, 18–22 April 2005; pp. 1501–1508. [Google Scholar]
  58. Fang, T.; Parker, L.E. Coalescent multi-robot teaming through ASyMTRe: A formal analysis. In Proceedings of the ICAR ‘05. Proceedings, 12th International Conference on Advanced Robotics, Piscataway, NJ, USA, 18–20 July 2005. [Google Scholar]
  59. Fang, G.; Dissanayake, G.; Lau, H. A behaviour-based optimisation strategy for multi-robot exploration. In Proceedings of the IEEE Conference on Robotics, Automation and Mechatronics, Singapore, 1–3 December 2004; Volume 2, pp. 875–879. [Google Scholar]
  60. Trigui, S.; Koubaa, A.; Cheikhrouhou, O.; Youssef, H.; Bennaceur, H.; Sriti, M.-F.; Javed, Y. A distributed market-based algorithm for the multi-robot assignment problem. Procedia Comput. Sci. 2014, 32, 1108–1114. [Google Scholar] [CrossRef]
  61. Badreldin, M.; Hussein, A.; Khamis, A. A comparative study between optimization and market-based approaches to multi-robot task allocation. Adv. Artif. Intell. 2013, 2013, 256524. [Google Scholar] [CrossRef]
  62. Service, T.C.; Sen, S.D.; Adams, J.A. A simultaneous descending auction for task allocation. In Proceedings of the 2014 IEEE International Conference on Systems, Man, and Cybernetics (SMC), San Diego, CA, USA, 5–8 October 2014. [Google Scholar]
  63. Vig, L.; Adams, J. A Framework for Multi-Robot Coalition Formation. In Proceedings of the 2nd Indian International Conference on Artificial Intelligence, Pune, India, 20–22 December 2005; pp. 347–363. [Google Scholar]
  64. Vig, L.; Adams, J. Market-based multi-robot coalition formation. In Distributed Autonomous Robotic Systems 7; Springer: Tokyo, Japan, 2007; pp. 227–236. [Google Scholar]
  65. Vig, L.; Adams, J.A. Coalition formation: From software agents to robots. J. Intell. Robot. Syst. 2007, 50, 85–118. [Google Scholar] [CrossRef]
  66. Lueth, T.; Längle, T. Task description, decomposition, and allocation in a distributed autonomous multi-agent robot system. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS’94), Munich, Germany, 12–16 September 1994. [Google Scholar]
  67. Längle, T.; Lueth, T.; Rembold, U. A distributed control architecture for autonomous robot systems. In Modelling and Planning for Sensor Based Intelligent Robot Systems; World Scientific Publishing Co Pte Ltd.: Singapore, 1995; pp. 384–402. [Google Scholar]
  68. Gerkey, B.P.; Mataric, M.J. Sold!: Auction methods for multirobot coordination. IEEE Trans. Robot. Autom. 2002, 18, 758–768. [Google Scholar] [CrossRef]
69. Guidotti, C.F.; Baião, A.T.; Bastos, G.S.; Leite, A.H.R. A MURDOCH-based ROS package for multi-robot task allocation. In Proceedings of the 2018 Latin American Robotic Symposium, 2018 Brazilian Symposium on Robotics (SBR) and 2018 Workshop on Robotics in Education (WRE), Joao Pessoa, Brazil, 6–10 November 2018. [Google Scholar]
  70. Lagoudakis, M.G.; Markakis, E.; Kempe, D.; Keskinocak, P.; Kleywegt, A.; Koenig, S.; Tovey, C.; Meyerson, A.; Jain, S. Auction-based multi-robot routing. In Proceedings of the 2005 International Conference on Robotics: Science and Systems I, Cambridge, MA, USA, 8–11 June 2005. [Google Scholar]
  71. Lin, L.; Zheng, Z. Combinatorial bids based multi-robot task allocation method. In Proceedings of the 2005 IEEE International Conference on Robotics and Automation, Barcelona, Spain, 18–22 April 2005. [Google Scholar]
  72. Sheng, W.; Yang, Q.; Tan, J.; Xi, N. Distributed multi-robot coordination in area exploration. Robot. Auton. Syst. 2006, 54, 945–955. [Google Scholar] [CrossRef]
  73. Gerkey, B.; Matari, M. MURDOCH: Publish/subscribe task allocation for heterogeneous agents. In Proceedings of the National Conference on Artificial Intelligence, AAAI 2000 (Student Abstract), Austin, TX, USA, 30 July–3 August 2000. [Google Scholar]
  74. Alami, R.; Fleury, S.; Herrb, M.; Ingrand, F.; Robert, F. Multi-robot cooperation in the MARTHA project. IEEE Robot. Autom. Mag. 1998, 5, 36–47. [Google Scholar] [CrossRef]
  75. Botelho, S.; Alami, R. M+: A scheme for multi-robot cooperation through negotiated task allocation and achievement. In Proceedings of the 1999 IEEE International Conference on Robotics and Automation, Detroit, MI, USA, 10–15 May 1999; Volume 2, pp. 1234–1239. [Google Scholar]
  76. Botelho, S.C.; Alami, R. A multi-robot cooperative task achievement system. In Proceedings of the 2000 ICRA, Millennium Conference, IEEE International Conference on Robotics and Automation, Symposia Proceedings (Cat. No.00CH37065), San Francisco, CA, USA, 24–28 April 2000. [Google Scholar]
77. Smith, R.G. The contract net protocol: High-level communication and control in a distributed problem solver. IEEE Trans. Comput. 1980, C-29, 1104–1113. [Google Scholar] [CrossRef]
  78. Dias, M.B.; Zlot, R.; Zinck, M.; Gonzalez, J.P.; Stentz, A. A Versatile Implementation of the Traderbots Approach for Multirobot Coordination; Carnegie Mellon University: Pittsburgh, PA, USA, 29 June 2018. [Google Scholar] [CrossRef]
  79. Dias, M.B.; Stentz, A. A Free Market Architecture for Distributed Control of a Multirobot System; Computer Science, Engineering: Dalian, China, 2000. [Google Scholar]
  80. Zlot, R.; Stentz ADias, M.B.; Thayer, S. Multi-robot exploration controlled by a market economy. In Proceedings of the 2002 IEEE International Conference on Robotics and Automation (Cat. No.02CH37292), Washington, DC, USA, 11–15 May 2002. [Google Scholar]
  81. Dias, M.B.; Stentz, A. Traderbots: A New Paradigm for Robust and Efficient Multirobot Coordination in Dynamic Environments; Carnegie Mellon University: Pittsburgh, PA, USA, 2004. [Google Scholar]
  82. Hussein, A.; Marín-Plaza, P.; García, F.; Armingol, J.M. Hybrid optimization-based approach for multiple intelligent vehicles requests allocation. J. Adv. Transp. 2018, 2018, 2493401. [Google Scholar] [CrossRef]
83. Shelkamy, M.; Elias, C.M.; Mahfouz, D.M.; Shehata, O.M. Comparative analysis of various optimization techniques for solving multi-robot task allocation problem. In Proceedings of the 2020 2nd Novel Intelligent and Leading Emerging Sciences Conference (NILES), Giza, Egypt, 24–26 October 2020. [Google Scholar]
  84. Atay, N.; Bayazit, B. Mixed-Integer Linear Programming Solution to Multi-Robot Task Allocation Problem; Washington University in St Louis: St Louis, MO, USA, 2006. [Google Scholar]
  85. Bouyarmane, K.; Vaillant, J.; Chappellet, K.; Kheddar, A. Multi-Robot and Task-Space Force Control with Quadratic Programming. 2017. [Google Scholar]
  86. Pugh, J.; Martinoli, A. Inspiring and modeling multi-robot search with particle swarm optimization. In Proceedings of the 2007 IEEE Swarm Intelligence Symposium, Washington, DC, USA, 1–5 April 2007. [Google Scholar]
  87. Imran, M.; Hashim, R.; Khalid, N.E.A. An overview of particle swarm optimization variants. Procedia Eng. 2013, 53, 491–496. [Google Scholar] [CrossRef]
  88. Li, X.; Ma, H.X. Particle swarm optimization based multi-robot task allocation using wireless sensor network. In Proceedings of the 2008 International Conference on Information and Automation, Hunan, China, 20–23 June 2008. [Google Scholar]
89. Nedjah, N.; Mendonça, R.; Mourelle, L. PSO-based distributed algorithm for dynamic task allocation in a robotic swarm. Procedia Comput. Sci. 2015, 51, 326–335. [Google Scholar] [CrossRef]
  90. Pendharkar, P.C. An ant colony optimization heuristic for constrained task allocation problem. J. Comput. Sci. 2015, 7, 37–47. [Google Scholar] [CrossRef]
  91. Dorigo, M.; Birattari, M.; Stützle, T. Ant colony optimization: Artificial ants as a computational intelligence technique. IEEE Comput. Intell. Mag. 2006, 1, 28–39. [Google Scholar] [CrossRef]
  92. Agarwal, M.; Agrawal, N.; Sharma, S.; Vig, L.; Kumar, N. Parallel multi-objective multi-robot coalition formation. Expert Syst. Appl. 2015, 42, 7797–7811. [Google Scholar] [CrossRef]
  93. Dorigo, M.; Caro, G.D. Ant colony optimization: A new meta-heuristic. In Proceedings of the 1999 Congress on Evolutionary Computation-CEC99 (Cat. No. 99TH8406), Washington, DC, USA, 6–9 July 1999. [Google Scholar]
  94. Wang, J.; Gu, Y.; Li, X. Multi-robot task allocation based on ant colony algorithm. J. Comput. 2012, 7, 2160–2167. [Google Scholar] [CrossRef]
  95. Jianping, C.; Yimin, Y.; Yunbiao, W. Multi-robot task allocation based on robotic utility value and genetic algorithm. In Proceedings of the 2009 IEEE International Conference on Intelligent Computing and Intelligent Systems, Shanghai, China, 20–22 November 2009; pp. 256–260. [Google Scholar]
  96. Haghi Kashani, M.; Jahanshahi, M. Using simulated annealing for task scheduling in distributed systems. In Proceedings of the 2009 International Conference on Computational Intelligence, Modelling and Simulation, Brno, Czech Republic, 7–9 September 2009; pp. 265–269. [Google Scholar]
  97. Mosteo, A.; Montano, L. Simulated Annealing for Multi-Robot Hierarchical Task Allocation with Flexible Constraints and Objective Functions. In Proceedings of the Workshop on Network Robot Systems: Toward Intelligent Robotic Systems Integrated with Environments, IROS, Beijing, China, 9–15 October 2006. [Google Scholar]
  98. Chakraborty, S.; Bhowmik, S. Job shop scheduling using simulated annealing. In Proceedings of the 1st International Conference on Computation & Communication Advancement, West Bengal, India, 11–12 January 2013. [Google Scholar]
  99. Elfakharany, A.; Yusof, R.; Ismail, Z. Towards multi robot task allocation and navigation using deep reinforcement learning. J. Phys. Conf. Ser. 2020, 1447, 012045. [Google Scholar] [CrossRef]
  100. Dahl, T.; Mataric, M.; Sukhatme, G. A machine learning method for improving task allocation in distributed multi-robot transportation. In Complex Engineered Systems; Springer: Berlin/Heidelberg, Germany, 2004. [Google Scholar]
  101. Wang, Y.; Silva, C. A machine-learning approach to multi-robot coordination. Eng. Appl. Artif. Intell. 2008, 21, 470–484. [Google Scholar] [CrossRef]
  102. Yu, Y.; Tang, Q.; Jiang, Q.; Fan, Q. A deep reinforcement learning-assisted multimodal multiobjective bilevel optimization method for multirobot task allocation. IEEE Trans. Evol. Comput. 2025, 29, 574–588. [Google Scholar] [CrossRef]
  103. Zhang, Z.; Jiang, X.; Yang, Z.; Ma, S.; Chen, J.; Sun, W. Scalable multi-robot task allocation using graph deep reinforcement learning with graph normalization. Electronics 2024, 13, 1561. [Google Scholar] [CrossRef]
  104. Cunningham, P.; Cord, M.; Delany, S. Supervised Learning; Springer: Berlin/Heidelberg, Germany, 2008; pp. 21–49. [Google Scholar]
105. Sermanet, P.; Lynch, C.; Hsu, J.; Levine, S. Time-contrastive networks: Self-supervised learning from multi-view observation. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Honolulu, HI, USA, 21–26 July 2017. [Google Scholar]
  106. Teichman, A.; Thrun, S. Tracking-based semi-supervised learning. Int. J. Robot. Res. 2012, 31, 804–818. [Google Scholar] [CrossRef]
  107. Xu, J.; Zhu, S.; Guo, H.; Wu, S. Automated labeling for robotic autonomous navigation through multi-sensory semi-supervised learning on big data. IEEE Trans. Big Data 2021, 7, 93–101. [Google Scholar] [CrossRef]
  108. Bousquet, O.; Luxburg, U.; Rätsch, G. Advanced lectures on machine learning. In ML Summer Schools 2003, Canberra, Australia, 2–14 February 2003, Tübingen, Germany, 4–16 August 2003, Revised Lectures; Springer: Berlin/Heidelberg, Germany, 2004. [Google Scholar]
  109. Arel, I.; Liu, C.; Urbanik, T.; Kohls, A. Reinforcement learning-based multi-agent system for network traffic signal control. Intell. Transp. Syst. IET 2010, 4, 128–135. [Google Scholar] [CrossRef]
  110. Verma, J.; Ranga, V. Multi-robot coordination analysis, taxonomy, challenges and future scope. J. Intell. Robot. Syst. 2021, 102, 10. [Google Scholar] [CrossRef] [PubMed]
  111. Wang, Y.; Silva, C.W.D. Multi-robot box-pushing: Single-agent Q-learning vs. team Q-learning. In Proceedings of the 2006 IEEE/RSJ International Conference on Intelligent Robots and Systems, Beijing, China, 9–13 October 2006. [Google Scholar]
  112. Guo, H.; Meng, Y. Dynamic correlation matrix based multi-Q learning for a multi-robot system. In Proceedings of the 2008 IEEE/RSJ International Conference on Intelligent Robots and Systems, Nice, France, 22–26 September 2008. [Google Scholar]
Figure 1. Real-world applications of MRS [4,5,6,7,8,9].
Figure 2. Challenges of MRS.
Figure 3. iTax: a two-level task allocation taxonomy [15].
Figure 4. Concept of coalition formation.
Figure 5. Classification chart for MRTA strategies.
Figure 6. A basic model of market-based MRTA.
Figure 7. Structure of machine learning.
Figure 8. Coalition formation using Alliance.
Figure 9. Coalition formation using M+.
Figure 10. Coalition formation using PSO.
Figure 11. CF using RL: (a) initial positions; (b) CF using RL in 25 iterations; (c) CF using RL in 51 iterations; (d) CF using RL in 80 iterations; (e) CF using RL in 100 iterations.
Figure 12. Average convergence distance of 50 robots using Alliance, M+, PSO, and RL.
Figure 13. Average convergence time of 50 robots using Alliance, M+, PSO, and RL.
Figure 14. Convergence time comparison of Alliance, M+, PSO, and RL algorithms.
Table 1. Efficiency and trade-offs of various approaches in behavior-based MRTA.

| Algorithm | Efficiency | Advantages | Disadvantages |
|---|---|---|---|
| Alliance | High | Scalable; adaptable to dynamic environments; provides a higher degree of stability in coalitions | Requires effective communication and coordination |
| Optimal allocation | Medium | Low communication overhead; stable coalitions | Limited scalability; sensitive to changes in the team |
| Cooperation | Medium to high | Efficient; distributed; low communication | Tends to form smaller coalitions |
| Vacancy Chain | High | Adaptive, efficient task allocation | Requires sophisticated negotiation mechanisms |
Table 2. Comparison of different algorithms under behavior-based MRTA.

| Characteristics | Alliance | Vacancy Chain | BLE | ASyMTRe |
|---|---|---|---|---|
| Homogeneous/heterogeneous | Heterogeneous | Heterogeneous | Heterogeneous | Heterogeneous |
| Optimal allocation | Guarantees optimal allocation | Guarantees (minimal) | Does not guarantee | Guarantees (minimal) |
| Cooperation | Strongly cooperative | Weak cooperation | Strongly cooperative | Strongly cooperative |
| Communication | Strong | Limited | Strong | Limited |
| Hierarchy | Fully distributed | Not fully distributed | Fully distributed | Not fully distributed |
| Task reassignment | Possible (through coalition reconfiguration) | Possible (via vacancy announcement) | Possible (based on dynamic eligibility) | Possible (based on genetic optimization) |
Table 3. Comparison of different algorithms under market-based MRTA.

| Characteristics | RACHNA | KAMARA | MURDOCH | M+ | TraderBots |
|---|---|---|---|---|---|
| Market type | Negotiation-based | Market-based | Market-based | Negotiation-based | Auction-based |
| Bidding method | Uses a genetic algorithm to optimize bids | Bids based on utility functions that consider the cost and quality of the task | Bids based on a simple cost function | Forms coalitions to bid on tasks together | Bids based on a reinforcement learning algorithm |
| Homogeneous/Heterogeneous | Heterogeneous | Heterogeneous | Heterogeneous | Heterogeneous | Homogeneous |
| Fault tolerance | Not fault-tolerant | Fault-tolerant | Not fault-tolerant | Fault-tolerant | Fault-tolerant |
| Optimal allocation | Can guarantee, depending on the fitness function | Can guarantee, based on the utility function | Not guaranteed | Can guarantee, based on the coalition formation algorithm | Can guarantee |
| Cooperation | Cooperative | Cooperative | Strong cooperation | Cooperative | Strongly cooperative |
| Communication | Limited (global communication) | Limited (local communication) | Strong (global communication) | Strong (local communication) | Strong (local communication) |
| Hierarchy | Distributed | Hybrid | Distributed (loosely coupled) | Fully distributed | Combination of distributed and centralized |
| Task reassignment | Not possible | Possible | Not possible | Possible | Possible |
| Complexity | Moderate | Moderate | Simple | High | High |
| Cost | Moderate | Moderate | Low | High | High |
| Scalability | Limited | Highly scalable | Limited | Highly scalable | Highly scalable |
| Coalition formation | Yes | Possible | Yes, dynamically adaptable | Yes, dynamically adaptable | Yes |
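The following minimal sketch illustrates the single-round, lowest-bid auction that underlies MURDOCH-style market-based allocation in Table 3. The Euclidean travel-cost bid and the robot positions are illustrative assumptions chosen for brevity, not the published protocol.

```python
import math

def bid(robot_pos, task_pos):
    # Assumed cost function: Euclidean travel distance to the task site.
    return math.dist(robot_pos, task_pos)

def auction(task_pos, robots):
    # Auctioneer broadcasts the task; every idle robot submits a cost bid,
    # and the lowest-cost bidder wins the task.
    bids = {name: bid(pos, task_pos) for name, pos in robots.items()}
    winner = min(bids, key=bids.get)
    return winner, bids[winner]

robots = {"r1": (0.0, 0.0), "r2": (4.0, 3.0), "r3": (1.0, 1.0)}
winner, cost = auction((2.0, 2.0), robots)
print(f"task awarded to {winner} at cost {cost:.2f}")
```

Coalition-forming variants such as M+ extend this idea by letting groups of robots submit joint bids on tasks that no single robot can serve alone.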
Table 4. Comparative overview of optimization-based approaches.

| Approach | Optimization Technique | Advantages | Disadvantages |
|---|---|---|---|
| PSO [75,76,77,78] | Swarm intelligence | Easy implementation; efficient for continuous problems; can handle multiple objectives; can handle non-linear constraints; fast convergence | Premature convergence; weak local search ability; requires extensive parameter tuning |
| ACO [79,80,81,82,83] | Swarm intelligence | Rapid discovery of reasonable solutions; efficient for TSP and other discrete problems; can handle non-linear constraints; can search for global optima; can handle dynamic environments | Premature convergence; weak local search ability; probability distribution changes across iterations; not effective for continuous problems; high computational cost |
| GA [84,85] | Evolutionary | Exchanges information via crossover and mutation; efficient for continuous problems; can search for global optima; can handle multiple objectives | Premature convergence; weak local search ability; high computational effort; problems can be difficult to encode |
| SA [86,87,88] | Stochastic | Rapid discovery of solutions; applicable to large data sets; can handle dynamic environments; can escape local optima | May require a large number of iterations to reach an optimal solution; does not guarantee an optimal solution; slow convergence; high computational cost |
| MILP [72,73] | Mathematical programming | Efficient for problems with linear and non-linear constraints; can find global optima; can handle multiple objectives | High computational effort; may not handle dynamic environments; may not scale well to large problems |
| QP [74] | Mathematical programming | Efficient for problems with non-linear constraints | High computational effort; restricted in some problem settings |
Table 5. Comparison of different algorithms under optimization-based MRTA.

| Characteristics | Particle Swarm Optimization (PSO) / Ant Colony Optimization (ACO) / Genetic Algorithm (GA) / Simulated Annealing (SA) | Mixed-Integer Linear Programming (MILP) | Quadratic Programming (QP) |
|---|---|---|---|
| Fault tolerance | Robust to individual robot failures but not to system-wide failures | Not inherently fault-tolerant | Not inherently fault-tolerant |
| Optimal allocation | May converge to local optima; can handle multiple objectives | Can find globally optimal solutions, but computational complexity grows with problem size | Can find globally optimal solutions, but computational complexity grows with problem size |
| Scalability | Can handle large problems efficiently but requires extensive parameter tuning | Suited to small- and medium-sized problems | Can handle large problems efficiently but requires extensive parameter tuning |
| Task reassignment | Handled by updating the objective functions and constraints | Handled by updating the objective functions and constraints | Handled by updating the objective functions and constraints |
| Coalition formation | Handled by adding appropriate terms to the objective functions and constraints | Handled by adding appropriate terms to the objective functions and constraints | Handled by adding appropriate terms to the objective functions and constraints |
| Complexity | Can handle complex optimization problems with non-linearities and multiple objectives | Can handle linear and non-linear constraints | Can handle linear and non-linear constraints |
| Cost | Can be less expensive than MILP and QP but requires extensive parameter tuning | Can be expensive due to computational complexity | Can be expensive due to computational complexity |
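As an illustration of the swarm-intelligence entries in Tables 4 and 5, the sketch below applies a standard PSO loop to a toy coalition-assignment problem: each particle dimension carries one continuous value per robot, and rounding that value selects the task site (coalition) the robot joins. The encoding, inertia and acceleration coefficients, and travel-distance cost function are assumptions chosen for brevity, not a specific published MRTA formulation.

```python
import math
import random

ROBOTS = [(random.uniform(0, 10), random.uniform(0, 10)) for _ in range(6)]
TASKS = [(2.0, 2.0), (8.0, 7.0)]      # two task sites to form coalitions around
W, C1, C2 = 0.7, 1.5, 1.5             # assumed inertia and acceleration weights

def fitness(x):
    # Total travel distance if robot i joins the task indexed by round(x[i]).
    return sum(math.dist(ROBOTS[i], TASKS[int(round(v)) % len(TASKS)])
               for i, v in enumerate(x))

swarm = [[random.uniform(0, len(TASKS) - 1) for _ in ROBOTS] for _ in range(20)]
vel = [[0.0] * len(ROBOTS) for _ in range(20)]
pbest = [p[:] for p in swarm]
gbest = min(pbest, key=fitness)[:]

for _ in range(100):
    for k in range(len(swarm)):
        for d in range(len(ROBOTS)):
            r1, r2 = random.random(), random.random()
            vel[k][d] = (W * vel[k][d]
                         + C1 * r1 * (pbest[k][d] - swarm[k][d])
                         + C2 * r2 * (gbest[d] - swarm[k][d]))
            swarm[k][d] += vel[k][d]
        if fitness(swarm[k]) < fitness(pbest[k]):
            pbest[k] = swarm[k][:]
            if fitness(pbest[k]) < fitness(gbest):
                gbest = pbest[k][:]

print("robot -> coalition:",
      {i: int(round(v)) % len(TASKS) for i, v in enumerate(gbest)})
```

The rounding step is one simple way to discretize PSO for an inherently combinatorial allocation problem; it keeps the update rule standard but is also a source of the premature-convergence behavior noted in Table 4.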
Table 6. Comparison between different machine learning approaches.

| Factors | Supervised Learning | Unsupervised Learning | Semi-Supervised Learning | Reinforcement Learning |
|---|---|---|---|---|
| Fault tolerance | Low; sensitive to label errors as it relies on labeled training data | Low; may handle noise and outliers better as it does not require labels | More fault-tolerant, by leveraging both labeled and unlabeled data | Medium, through exploration-exploitation trade-offs |
| Optimal allocation | Can achieve optimal allocation by learning from labeled data to map inputs to correct outputs | No; mostly aims to discover patterns and relationships in the data | Partially; may require more specialized approaches | Partially; may require more specialized approaches |
| Scalability | High, but may face challenges due to the need for labeled data and computational complexity | High, as it does not require labeled data | High, by utilizing both labeled and unlabeled data | Medium to high; may face challenges due to computational complexity |
| Task reassignment | Difficult; not inherently designed for it and may require additional mechanisms | Yes; it naturally clusters data into groups | Yes; can leverage both labeled and unlabeled data to handle task reassignment | Yes; well equipped to handle task reassignment in sequential decision-making problems |
| Coalition formation | Not specifically tailored; may need additional considerations and adaptations | Same as supervised learning | Same as supervised and unsupervised learning | Possible where agents make sequential decisions in coalition formation tasks |
| Complexity | Lower, but depends on the specific algorithm and techniques used | Same as supervised learning | Same as supervised and unsupervised learning | Higher, due to the need to learn policies for sequential decision-making |
| Cost | Lower for simple models, but varies with model complexity and data size | Same as supervised learning | Same as supervised and unsupervised learning | High, due to the learning and exploration process |
Table 7. Advantages and disadvantages of different machine learning approaches.

| Method | Efficiency | Advantages | Disadvantages |
|---|---|---|---|
| Supervised learning | Low-Medium | Well established and widely used; effective for classification and regression tasks | Requires labeled training data; limited ability to handle new or unseen data |
| Semi-supervised learning | Low-Medium | Combines labeled and unlabeled data; can leverage both supervised and unsupervised learning | Limited labeled data may still lead to suboptimal performance; sensitive to the quality of unlabeled data |
| Unsupervised learning | Low-Medium | Useful for clustering and data exploration tasks; requires no labeled training data | May not produce interpretable results; difficult to evaluate performance without ground truth |
| Reinforcement learning | Medium-High | Suitable for sequential decision-making tasks; can learn from trial and error | Computationally expensive; requires significant exploration to discover optimal policies; may suffer from convergence issues |
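To ground the reinforcement learning entries in Tables 6 and 7, the sketch below runs tabular Q-learning on a toy coalition-formation choice: agents sequentially join one of two capacity-limited coalitions, and the reward favors the coalition that still has room, so the learned policy balances coalition sizes. The state, action, and reward model are illustrative assumptions, not the formulation used in the simulation study reported here.

```python
import random

ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1  # assumed learning rate, discount, exploration
CAPACITY = 3
Q = {}  # (state, action) -> value; state = (size of coalition A, size of coalition B)

def choose(state):
    if random.random() < EPSILON:
        return random.choice([0, 1])   # explore
    return max([0, 1], key=lambda a: Q.get((state, a), 0.0))  # exploit

for _ in range(2000):                  # episodes
    sizes = [0, 0]
    for _ in range(2 * CAPACITY):      # place six agents one at a time
        state = tuple(sizes)
        action = choose(state)
        reward = 1.0 if sizes[action] < CAPACITY else -1.0  # penalize overfilling
        sizes[action] += 1
        nxt = tuple(sizes)
        best_next = max(Q.get((nxt, a), 0.0) for a in (0, 1))
        old = Q.get((state, action), 0.0)
        Q[(state, action)] = old + ALPHA * (reward + GAMMA * best_next - old)

print("preferred coalition when sizes are (3, 0):",
      max([0, 1], key=lambda a: Q.get(((3, 0), a), 0.0)))
```

Even this toy example exhibits the trade-offs in Table 7: the policy emerges only after many exploratory episodes, which is the computational cost that distinguishes RL from the other learning paradigms.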
Table 8. Performance comparison of different MRTA strategies.

| Factors | Behavior-Based Methods | Market-Based Methods | Optimization-Based Methods | Learning-Based Methods |
|---|---|---|---|---|
| Scalability | Scalable for small- to moderate-sized systems | Scalable for small- to moderate-sized systems | Scalable for large systems | Can scale to large and complex systems |
| Complexity | Can handle simple to moderately complex tasks | Can handle complex tasks and heterogeneous robots | Can handle complex tasks and constraints | Can handle complex tasks, constraints, and heterogeneous robots |
| Optimality | May not always achieve optimality | Can achieve Pareto efficiency under certain conditions | Can achieve optimality under certain conditions | Can achieve optimality under certain conditions, though a good allocation is not guaranteed every time |
| Flexibility | Limited flexibility to adapt to new tasks or situations | Flexible and adaptable to changing market conditions | May be flexible, depending on the optimization method used | Flexible and adaptable to changing environments |
| Robustness | May be robust to some degree of uncertainty or failures | Can be robust to some degree of market uncertainty and failures | May not be robust to uncertainty or failures | Can improve robustness by learning from experience and failures |
| Communication | Local communication among neighboring robots | Repeated broadcasting of winner-robot details after bidding | Local communication among neighboring robots | Local/global communication |
| Objective function | Single/multiple objectives; implicit or ad hoc | Single/multiple objectives; optimization | Single/multiple objectives; mathematical | Single/multiple objectives; learned from data |
| Coordination type | Centralized/distributed | Centralized/distributed | Centralized/distributed | Decentralized |
| Task reallocation method | Heuristic rule-based searching / Bayesian Nash equilibrium | Iterative auctioning | Iterative searching and allocation | Reinforcement learning |
| Uncertainty handling techniques | Game theory / probabilistic predictive modeling | Iterative auctioning | Difficult to handle uncertainty | Adaptive models |
| Constraints | Can be handled collectively | Auctions can be difficult to conduct | Complex and difficult to solve due to multiple decision variables | Varies with the learning algorithm |
| Computational cost | Higher than optimization-based strategies | Lower than optimization-based strategies | Higher than market-based strategies | High; needs a large amount of data |
| Coalition formation | Low efficiency, as the approach relies on local rules without a global optimization perspective | Moderate efficiency, due to negotiation and market mechanisms | High efficiency, through global optimization | Moderate efficiency, as it relies on learning and adaptive algorithms |
| Task reallocation | Limited ability to reallocate tasks dynamically, as it relies on predefined rules | Efficient reallocation, due to negotiation and market mechanisms | Efficient reallocation, due to optimization algorithms and centralized coordination | Adaptive, due to learning algorithms and flexible decision-making |
| Collision avoidance | Limited capability due to the lack of sophisticated coordination mechanisms | Effective, due to price-based mechanisms and negotiation | Effective, due to optimized task allocation and coordination | Adaptive, due to learning and sensor-based approaches |
| Dynamic decision-making | Limited adaptability due to rule-based, reactive characteristics | Limited adaptability, as it relies on predefined market rules | Flexible, due to mathematical optimization and modeling | Flexible, through adaptive learning algorithms |
| Temporal constraints | Limited support due to a lack of coordinated decision-making | Moderate support, through negotiation and market mechanisms | Strong support, through optimization techniques and advanced scheduling algorithms | Strong support, through learning and scheduling algorithms |
Table 9. CPU time comparison of different MRTA strategies.

| Algorithm | CPU Time for 100 Iterations (s) |
|---|---|
| Alliance | 289.7019 |
| M+ | 70.5721 |
| PSO | 0.051051 |
| RL | 26.3469 |
Table 10. Scalability metrics of Alliance vs. RL in MRTA.

| Team Size | Alliance Average Final Distance | RL Average Final Distance | Alliance CPU Time (s) | RL CPU Time (s) |
|---|---|---|---|---|
| 10 | 36.94 | 11.49 | 0.0031 | 0.0014 |
| 25 | 49.54 | 12.69 | 0.0081 | 0.0028 |
| 50 | 29.04 | 10.64 | 0.0172 | 0.0064 |
| 75 | 36.62 | 9.31 | 0.0286 | 0.0098 |
| 100 | 40.16 | 12.20 | 0.0437 | 0.0158 |
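For readers wishing to reproduce metrics of the kind reported in Tables 9 and 10, the sketch below shows one way such measurements can be gathered: average final robot-to-task distance after allocation, and CPU time measured around the allocation call. The nearest-task allocator is a hypothetical stand-in used only to make the harness runnable; the Alliance and RL implementations benchmarked in this study are not reproduced here.

```python
import math
import random
import time

def nearest_task_allocator(robots, tasks):
    # Stand-in allocator: each robot simply joins its nearest task site.
    return [min(tasks, key=lambda t: math.dist(r, t)) for r in robots]

def run_trial(team_size, tasks):
    robots = [(random.uniform(0, 50), random.uniform(0, 50))
              for _ in range(team_size)]
    start = time.perf_counter()
    assignment = nearest_task_allocator(robots, tasks)
    cpu = time.perf_counter() - start          # wall-clock time of the allocation call
    avg_final = sum(math.dist(r, t)
                    for r, t in zip(robots, assignment)) / team_size
    return avg_final, cpu

tasks = [(10.0, 10.0), (40.0, 40.0)]
for n in (10, 25, 50, 75, 100):                # team sizes used in Table 10
    dist, cpu = run_trial(n, tasks)
    print(f"team={n:3d}  avg final distance={dist:6.2f}  cpu={cpu:.6f}s")
```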