Electronics
  • Article
  • Open Access

15 December 2024

Research on Real-Time Multi-Robot Task Allocation Method Based on Monte Carlo Tree Search

1
School of Computer and Communication Engineering, Pujiang Institute, Nanjing Tech University, Nanjing 210037, China
2
School of Computer Science and Engineering, Southeast University, Nanjing 211102, China
*
Author to whom correspondence should be addressed.
This article belongs to the Special Issue AI Applications of Multi-Agent Systems

Abstract

Task allocation is an important problem in multi-robot systems, particularly in dynamic and unpredictable environments such as offshore oil platforms, large-scale factories, or disaster response scenarios, where high change rates, uncertain state transitions, and varying task demands challenge the predictability and stability of robot operations. Traditional static task allocation strategies often struggle to meet the efficiency and responsiveness demands of these complex settings, while optimization heuristics, though improving planning time, exhibit limited scalability. To address these limitations, this paper proposes a task allocation method based on the Monte Carlo Tree Search (MCTS) algorithm, which leverages the anytime property of MCTS to achieve a balance between fast response and continuous optimization. Firstly, the centralized adaptive MCTS algorithm generates preliminary solutions and monitors the state of the robots in minimal time. It utilizes dynamic Upper Confidence Bounds for Trees (UCT) values to accommodate varying task dimensions, outperforming the heuristic Multi-Robot Goal Assignment (MRGA) method in both planning time and overall task completion time. Furthermore, the parallelized distributed MCTS algorithm reduces algorithmic complexity and enhances computational efficiency through importance sampling and parallel processing. Experimental results demonstrate that the proposed method significantly reduces computation time while maintaining task allocation performance, decreasing the variance of planning results and improving algorithmic stability. Our approach enables more flexible and efficient task allocation in dynamically evolving and complex environments, providing robust support for the deployment of multi-robot systems.

1. Introduction

Offshore oil and gas facilities are typically located in remote and isolated areas, and the unique nature of these locations presents significant challenges for human operators. Introducing fully autonomous robot systems is an effective way to address these issues [1]. Robots are able to perform tasks in these harsh environments, reducing reliance on human workers, lowering operating costs, and improving safety. In recent years, the potential of autonomous robots for offshore operations has been preliminarily explored and applied [2,3].
The preliminary research work [4,5] has set some benchmarks to demonstrate the effectiveness of autonomous vehicles participating in intervention tasks in marine environments. These studies typically focus on simple, single-task autonomous execution, such as inspection and monitoring. However, the operation and maintenance of offshore oil facilities require a higher level of autonomy, involving a series of complex tasks such as long-term facility inspections, systematic maintenance work, and responding to emergencies.
The traditional solution is to equip a single robot with all the hardware and software components required to perform complex tasks [6], and a major issue with this approach is its high cost. In order to adapt to various possible tasks, the design of these robots often needs to make trade-offs, resulting in insufficient performance on certain specific tasks. In addition, some specific tasks may only be applicable to certain types of robots. For example, vertical inspection of pipelines may only be performed by specially designed aircraft. Therefore, using multiple platforms optimized for specific tasks, which collaborate to achieve common goals [6,7], has become a more ideal solution. This heterogeneous multi-agent approach not only provides stronger robustness for the entire system but also expands the scope and flexibility of tasks. For example, ground robots can be responsible for inspecting and maintaining ground structures, while aerial robots can perform high-altitude operations or remote monitoring. This multi-robot collaboration method is theoretically feasible but still faces some challenges in practical applications.
Current multi-agent planning schemes often struggle to generate optimal task allocation plans [8], and low planning quality and computational efficiency are common issues in current multi-agent planning schemes. To address these issues, this study designed a robot group consisting of several ground and aerial robots equipped with different sensory systems that must perform regular supervision and control tasks in the oil drilling environment, which requires vehicles to move between different buildings and floors. Figure 1 shows an oil drilling scene supervised by a robot team. In this situation, task execution must consider the availability of robots to achieve goals in different areas of the oil drilling platform, the capabilities of individual robots, and the distance of task target points.
Figure 1. Simulation of oil drilling rig scene with robot operation.
Generally speaking, existing methods address only limited task complexity and diversity, making it difficult to evaluate planner performance in highly constrained domains. Recently, the heuristic task allocation strategy MRGA [9] has proposed a more computationally efficient task allocation scheme. This method assigns tasks based on robot functionality, redundant sensor systems, the spatial distribution of targets, and task execution time, avoiding the need to evaluate a large number of possible assignments. However, its neglect of real-time task dynamics in exchange for computational speed limits, to some extent, its scalability.
As a powerful decision algorithm, MCTS has demonstrated outstanding performance in multiple fields, especially in gaming and optimization problems. MCTS can efficiently find approximately optimal solutions by simulating and evaluating a large number of possible paths. However, despite the significant progress brought by MCTS, its complexity and computational cost remain major challenges, especially in high-dimensional and dynamic environments. Researchers therefore continue to explore ways to improve and optimize MCTS, enhancing its adaptability and computational efficiency so that it can play a greater role in a wider range of application scenarios.
This paper studies the control technology of offshore oil drilling robots based on multi-agent coordination using MCTS. Through abstract modeling of the actual working process, distributed optimization techniques are used to improve algorithm efficiency. The characteristics of the MCTS algorithm are exploited to achieve fast start-up and continuous optimization, which, to some extent, compensates for the poor timeliness and weak scalability of previous methods. This has important theoretical significance and application value for solving the task allocation problem of offshore oil extraction platform robots.
The contributions of this paper are two-fold:
  • We utilize the anytime property of MCTS and introduce a fast-response adaptation mechanism into the centralized MCTS algorithm, which can generate preliminary solutions in a very short time and monitor each robot’s status in real time. Before a robot needs further action instructions, the current optimal solution is transmitted, so the task allocation results are continuously optimized. Meanwhile, to address the failure of fixed UCT coefficients across different task dimensions, this paper introduces dynamic UCT values to enhance the algorithm’s adaptability to any task dimension;
  • In large-scale task environments, due to the exponential growth of action options in the initial state, the variance of the planning results of the MCTS method significantly increases. To address this issue, we propose a strategy based on Distributed Monte Carlo Tree Search (Dec-MCTS). By effectively reducing algorithm complexity through importance sampling and combining parallel processing to optimize the iterative process, the computational efficiency in large-scale task scenarios has been greatly improved, and the optimal preliminary results have been generated in the shortest possible time.
In the rest of this paper, we review related work in Section 2, introduce the multi-robot task allocation model, and formulate the problem in Section 3. We propose an MCTS-based multi-robot task allocation method in Section 4. Finally, we evaluate our solution in Section 5 and conclude in Section 6.

3. Model and Problem Formulation

The task allocation problem for heterogeneous multi-agent systems at sea concerns how to effectively assign multiple tasks to a heterogeneous robot cluster deployed on an offshore oil platform according to specific capability requirements, including perception and motion. The core of the problem is to ensure that the allocation scheme fully exploits the unique abilities of each robot, thereby completing the overall mission efficiently. To this end, the allocation mechanism must consider not only the time each robot needs to complete different types of tasks but also the distances between task locations and the corresponding travel-time costs, so as to optimize the operational efficiency of the entire system. The method therefore places special emphasis on minimizing a cost function based on the time cost of different robots completing different tasks and the travel time between task locations.
In the real environment of offshore platforms, machine maintenance requires a large number of sensors and actuators. This article considers six tasks that require different abilities: temperature inspection, pressure inspection, observation, photography, valve inspection, and valve manipulation. Observation tasks involve exploring specific task points and can be performed by unmanned aerial vehicles (UAVs) or Husky ground robots. The valve inspection task requires the ground robot to recognize the status of the valve (open/closed), which requires the robot to be in a specific operating position. The image acquisition task involves using drones to capture structural images from different perspectives. The tasks of pressure and temperature checks are associated with collecting sensor data through ground robots. The task of manipulating valves involves the ground robot changing the opening/closing status of the valves. The basic actions in the process include data communication, navigation, and charging, which support the robot’s mobility and long-term autonomy. The task objectives are closely related to the tasks that the robot system can implement. The task planner is responsible for generating a series of actions to achieve a given set of tasks.
Table 1 shows the list of domain functions possessed by each robot and the estimated time required to perform operations related to that function. The duration of basic function actions depends on the distance d between targets, energy e, and charging rate chr. This article assumes that all actions are deterministic and have complete platform information. The planner generates a plan for assigning targets to the robot queue based on domain constraints, and robots acting on the platform can perform actions simultaneously.
Table 1. Robot capabilities and expected task duration.
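As a rough illustration of the capability model behind Table 1, the following sketch pairs each platform with a capability set and placeholder durations. The capability sets, duration values, and speeds here are illustrative assumptions, not the paper's actual Table 1 entries; only the dependence of navigation on distance d and of charging on energy e and rate chr follows the text.

```python
# Illustrative capability/duration model (placeholder values, not Table 1's).
CAPABILITIES = {
    "husky": {"observe", "valve_inspect", "valve_manipulate",
              "pressure_check", "temperature_check"},
    "uav":   {"observe", "photograph"},
}

# Fixed operation times in seconds (assumed for illustration).
TASK_DURATION = {
    "observe": 30.0, "photograph": 45.0, "valve_inspect": 60.0,
    "valve_manipulate": 90.0, "pressure_check": 40.0, "temperature_check": 40.0,
}

SPEED = {"husky": 1.0, "uav": 3.0}  # m/s, assumed

def travel_time(robot_type: str, d: float) -> float:
    """Navigation duration as a function of inter-target distance d."""
    return d / SPEED[robot_type]

def charge_time(e: float, chr_: float) -> float:
    """Charging duration from remaining energy fraction e at charging rate chr."""
    return max(0.0, (1.0 - e) / chr_)

def can_do(robot_type: str, task: str) -> bool:
    """Capability check: a task is feasible only for platforms that possess it."""
    return task in CAPABILITIES[robot_type]
```

In this shape, the planner can query feasibility and cost per (robot, task) pair before committing an assignment.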
The offshore oil platform consists of the platform body and four relatively independent floors. On this platform, maintenance tasks are performed by four Husky ground robots and two UAVs. The operations considered in this article include autonomous navigation, data communication, charging, observation, image capture, and valve inspection. All operations are collision-free and planned using semantic roadmaps to ensure kinematic feasibility. We analyze in detail the capability requirements of the different tasks and, before task allocation, establish a set of robot capabilities tailored to those needs in order to optimize resource allocation and utilization.
In this study, we use an MDP to model and analyze the task allocation problem. Specifically, we define a state s that includes the following elements: the robot set R = {r_1, r_2, …, r_n}; the set T = {t_1, t_2, …, t_n} of times consumed by each robot’s assigned tasks; and the set G = {g_1, g_2, …, g_s} of tasks not yet assigned to any robot. In addition, RC = {rc_1, rc_2, …, rc_n} denotes the capability group of each robot, while GC = {gc_1, gc_2, …, gc_s} is the target capability group, i.e., the capability required to complete each task. The three-dimensional position of each task is given by the coordinate set P = {p_1, p_2, …, p_s}. Under this framework, each robot r ∈ R may possess multiple capabilities, while each task g ∈ G typically requires only one specific capability. Since the distance between any two task points on the map is known, the time for a robot to complete any task can be computed from Table 1; this time comprises the travel cost from the robot’s last task location to the new task location plus the operation time required at the new task.
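The state s defined above might be organized as follows in code. The field names are our own, and the straight-line Euclidean travel model is an assumption: the paper only states that inter-point distances are known.

```python
from dataclasses import dataclass, field

@dataclass
class State:
    """MDP state s for the allocation problem (names follow Section 3)."""
    robots: list              # R = {r_1, ..., r_n}
    elapsed: list             # T = {t_1, ..., t_n}: time consumed per robot
    pending: set              # G: tasks not yet assigned
    robot_caps: list          # RC: capability group of each robot
    task_cap: dict            # GC: the single capability each task requires
    task_pos: dict            # P: 3-D coordinates of each task
    robot_pos: list = field(default_factory=list)  # last task location per robot

def task_cost(state: State, i: int, g, op_time: float, speed: float = 1.0) -> float:
    """Travel time from robot i's last location to task g, plus operation time.
    Straight-line distance is an assumption for illustration."""
    x0, y0, z0 = state.robot_pos[i]
    x1, y1, z1 = state.task_pos[g]
    d = ((x1 - x0) ** 2 + (y1 - y0) ** 2 + (z1 - z0) ** 2) ** 0.5
    return d / speed + op_time
```

A transition then adds `task_cost` to the robot's entry in `elapsed` and removes g from `pending`.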
In each round of MDP decision-making, an action a in state s refers to the collective action of all robots in that round. When action a is taken in state s, each robot’s accumulated execution time increases by the sum of the travel cost from its last task location to its newly assigned task location and the operation time of the new task. After the update, the selected tasks are removed from the pending task set, yielding the new state s′. Note that the number of possible actions a varies across states s with the sizes of the task set and the robot team. As these scales increase, the number of possible actions grows exponentially. This growth reflects the complexity and difficulty of the problem, especially in multi-agent systems and highly dynamic environments, where the design and optimization of the task allocation strategy are particularly critical.
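To make the exponential growth of the joint action space concrete, a small counting sketch helps. Treating a joint action as each robot taking a distinct pending task (an upper bound that ignores capability constraints, which is our simplification) gives a falling-factorial count:

```python
from math import perm

def joint_action_count(num_robots: int, num_pending: int) -> int:
    """Upper bound on joint actions in a state: each robot takes a distinct
    pending task. Capability constraints would only reduce this count."""
    return perm(num_pending, min(num_robots, num_pending))

# Growth with task scale for a 6-robot team (the paper's largest setting):
# joint_action_count(6, 16) = 5,765,760
# joint_action_count(6, 28) = 271,252,800  (roughly 47x more options)
```

This is why a poor initial allocation becomes so hard to repair at large task scales: the very first branching already spans hundreds of millions of options.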

5. Experimental Evaluation

5.1. Experimental Setup

To simulate the real environment, this article uses four Husky ground robots and two UAVs matching the real scene. They are tested in large-scale environments with task scales of 16, 20, 24, and 28 [14]. Each robot can synchronize with the agent by sending its own status messages during importance-sampling updates or by receiving messages from the agent at any time. The experiments use random task arrangements; each configuration is run 20 times and the results are averaged.
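The 20-run averaging protocol could be harnessed roughly as below; `allocator` is a placeholder for any allocation routine under test, and the random task geometry is an assumption made for illustration.

```python
import random
import statistics

def run_trials(allocator, num_tasks: int, trials: int = 20, seed: int = 0):
    """Repeat the experiment with random task arrangements and average,
    mirroring the 20-run protocol above. `allocator` is any function
    mapping a task list to a scalar result (e.g., makespan)."""
    rng = random.Random(seed)
    results = []
    for _ in range(trials):
        # Assumed geometry: tasks scattered over a 100 m x 100 m x 10 m volume.
        tasks = [(rng.uniform(0, 100), rng.uniform(0, 100), rng.uniform(0, 10))
                 for _ in range(num_tasks)]
        results.append(allocator(tasks))
    return statistics.mean(results), statistics.pstdev(results)
```

Reporting the standard deviation alongside the mean also exposes the planning-variance effect the paper discusses.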
To validate the PED-MCTS algorithm proposed in this paper, it was compared with the following benchmark algorithms: the heuristic method MRGA (labeled MRGA in the result figures); the centralized adaptive MCTS algorithm of Section 4 (labeled D-UBMCTS); and the parallelized decentralized MCTS algorithm (labeled PED-MCTS).

5.2. Experiment Results

5.2.1. Experimental Results of Central Adaptive MCTS Algorithm

Figure 6 shows the results of three random runs with four robots and eighteen tasks. These results reveal that, in multi-agent task allocation, the algorithm behaves distinctively when dynamic UCT values are used. As the number of iterations increases, the D-UBMCTS algorithm shows a strong willingness to explore, which is particularly prominent with dynamic UCT values. Compared with traditional static UCT values, D-UBMCTS adjusts its exploration strategy based on the current exploration state and the information gathered during iteration, further refining known well-performing solutions while continuing to explore unexplored regions.
Figure 6. Comparison of results between traditional MCTS algorithm and dynamic D-UBMCTS algorithm.
In addition, the results in Figure 6 also indicate that the introduction of dynamic UCT values significantly improves the performance of the algorithm in long-term operation in the context of multi-agent task allocation. This not only means that the algorithm can make further improvements based on the initial good solution but also indicates that the algorithm has the ability to jump out of local optima and explore potential better solutions. This characteristic is particularly important for dealing with complex task allocation problems, as such problems often have multiple feasible solutions rather than a single optimal solution. By adjusting the exploration coefficient and considering the fluctuation of task reward values, the new formula can encourage the algorithm to focus on high-reward actions while also actively exploring those actions that have not been fully explored. This method not only helps to prevent the algorithm from getting stuck in local optima too early but also increases the possibility of finding the global optimum.
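The paper's exact dynamic UCT formula is not reproduced in this excerpt. One plausible reading, sketched below, scales the exploration coefficient by the observed spread of task reward values so that exploration stays meaningful across task dimensions; the spread-scaling rule is our assumption, not the authors' published formula.

```python
import math

def dynamic_uct(child_value: float, child_visits: int,
                parent_visits: int, reward_spread: float) -> float:
    """UCT score with a dynamic exploration coefficient.
    ASSUMPTION: the coefficient is scaled by the spread (max - min) of
    rewards observed so far, one common way to adapt exploration to the
    reward scale of the current task dimension."""
    if child_visits == 0:
        return float("inf")          # always expand unvisited children first
    c = max(reward_spread, 1e-6) / math.sqrt(2)
    exploit = child_value / child_visits
    explore = c * math.sqrt(math.log(parent_visits) / child_visits)
    return exploit + explore

def select(children, parent_visits: int, reward_spread: float) -> int:
    """children: list of (total_value, visits) pairs; returns index of best."""
    scores = [dynamic_uct(v, n, parent_visits, reward_spread)
              for v, n in children]
    return scores.index(max(scores))
```

Because the exploration term tracks the reward scale, a high-reward child no longer drowns out under-visited siblings, which matches the "jump out of local optima" behavior described above.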

5.2.2. Experimental Results of Parallelized Distributed MCTS Algorithm

(1)
The Influence of Capability Scarcity Coefficient k on Task Execution Time
Figure 7 clearly illustrates the impact of the decentralized MCTS strategy on task completion time at different coefficient k values. In this figure, the increasing coefficient k reflects the exploration of strategy adjustment. This article examines the performance of the strategy in two different scenarios, 12 tasks and 20 tasks completed by four robots. Analyzing these results reveals some conclusions.
Figure 7. Final results with different k values added to the cap Bonus function.
Specifically, in the more complex scenario of four robots and 20 tasks, overall task completion time decreases as the coefficient k increases, i.e., as the agents place greater emphasis on completing individual tasks. This indicates that, with many tasks, increasing k can motivate each agent to complete its preferred tasks more efficiently, thereby accelerating overall completion. However, once k rises above 0.2, the trend reverses: overemphasizing individual task completion can cause robots to miss the global optimum. In other words, the robots become short-sighted, focusing on quickly completing immediate tasks rather than on global completion efficiency, so overall performance declines.
This phenomenon is more pronounced in the simplified scenario of four robots and 12 tasks, where the strategy performs best with k set to 0.1, revealing that small biases in individual agent behavior can significantly affect the global solution when the task scale is small. When the number of tasks is small, each task has a relatively large impact on the overall outcome, so any bias that overly favors individual tasks may reduce global efficiency. Therefore, for smaller-scale tasks, this article recommends setting k slightly lower than for larger task scales to avoid over-optimizing local solutions at the expense of overall optimality.
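The interaction between the coefficient k and the capability bonus can be illustrated with a toy scoring rule. The exact bonus term in the paper's cap Bonus function is not given in this excerpt, so `cap_bonus` and the candidate values below are assumptions used only to show how k tips the choice.

```python
def blended_reward(team_reward: float, cap_bonus: float, k: float) -> float:
    """Score for a candidate assignment: global team reward plus a
    k-weighted individual capability bonus. ASSUMPTION: the bonus rewards
    a robot for taking tasks its own capability covers."""
    return team_reward + k * cap_bonus

def pick_task(candidates, k: float):
    """candidates: list of (task, team_reward, cap_bonus); returns best task."""
    return max(candidates, key=lambda c: blended_reward(c[1], c[2], k))[0]

# With k = 0, the globally best task wins; a larger k tips the choice toward
# the robot's individually preferred task, the short-sightedness Section 5
# reports once k exceeds ~0.2 (values below are illustrative).
candidates = [("global_best", 1.00, 0.0), ("own_preference", 0.90, 0.8)]
```

Sweeping k over such candidates reproduces the qualitative U-shape in Figure 7: a small bonus helps, a large one hurts.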
(2)
The impact of the number of CPU cores on experimental results
This experiment adopts a gradual expansion method: the number of CPU cores used by the PED-MCTS system is increased from a single core up to 20 cores to simulate different parallel computing configurations. For each core count, the number of iterations completed in a 5 s run was recorded and compared with the single-core result to evaluate how performance changes as the system’s parallel processing capability expands. In this way, the performance improvement from additional processor cores can be observed directly.
The experimental results are summarized in Figure 8 and indicate that, as the number of CPU cores increases, the computational efficiency of the system grows approximately linearly. The experiment used a high-performance Core i5-13600K processor with 24 MB of L3 cache running at its default 3.5 GHz frequency. Thanks to Intel’s Hyper-Threading technology, its 14 physical cores can execute up to 20 computing processes simultaneously, significantly increasing the potential for parallel computing.
Figure 8. Number of iterations increases approximately linearly with the increase in CPU cores.
Figure 9 shows that the ratio of the extra time for multiple processes to that of a single process increases with the number of processes; in particular, the number of simulations does not grow linearly as processes are added. In a pure root-parallelization setting, each process performs an independent search and there is no obvious contention in resource scheduling, so the number of simulations should, in theory, grow linearly with the number of processes. However, without an effective inter-process communication mechanism, processes cannot share their simulation results in time, and their search spaces overlap, which is particularly evident in the later stages of simulation. As the number of processes grows, the contribution of each additional process to the total reward therefore gradually decreases, reflecting the marginal effect of parallel processing.
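The overlap effect behind Figure 9 can be mimicked with a toy coverage model: if each process samples lines of play independently, the expected number of distinct lines covered grows sublinearly in the number of processes. Uniform sampling is our simplifying assumption here; the real search is guided, but the diminishing marginal coverage is the same effect.

```python
def expected_unique(n_states: float, per_process: int, processes: int) -> float:
    """Expected distinct lines of play covered when `processes` independent
    searches each draw `per_process` uniform samples from n_states options.
    Toy model of root parallelization without communication."""
    return n_states * (1.0 - (1.0 - 1.0 / n_states) ** (per_process * processes))

# Marginal contribution of each added process shrinks monotonically:
gains = [expected_unique(1000, 200, p) - expected_unique(1000, 200, p - 1)
         for p in range(1, 6)]
```

Each new process re-derives some lines the others already explored, so its net contribution to coverage, and hence to total reward, keeps falling.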
Figure 9. The time difference ratio of multiple processes relative to a single process.
(3)
Experimental results under different task scales
Figure 10 presents a clear visual comparison between the distributed PED-MCTS and the centralized D-UBMCTS in terms of operational continuity. The distributed method exhibits a clear trend of continuous optimization, indicating its ability to refine solutions continuously. In contrast, the centralized approach progresses in stages: its improvements are significant but discontinuous after each update cycle.
Figure 10. Comparison of PED-MCTS, D-UBMCTS, and heuristic MRGA algorithms under different task sizes.
This significant difference stems from the core communication mechanism of distributed systems: in this system, the communication of each robot may bring critical new information, which is obtained in real time from any single agent, without relying on centralized and unified information updates from all agents. This mechanism allows distributed systems to quickly respond and integrate new data without waiting for the entire network to synchronize, thereby achieving more stable and continuous global optimization. Every communication between intelligent agents is not just a simple transmission of information but an active part of the continuous optimization process of global solutions. Therefore, the search results for each update stage are not only more accurate but also reflect the latest data and scenario changes, ensuring the timeliness and relevance of solutions.
Through this approach, PED-MCTS is able to continuously adjust and improve its strategies to cope with complex and changing environmental conditions and task requirements, significantly outperforming the lag and adaptability issues that may arise with centralized approaches in large-scale and dynamic environments. This makes distributed MCTS an efficient and flexible strategy, particularly suitable for application scenarios that require rapid response and frequent decision updates.
In large-scale task scenarios, with all six robots executing tasks simultaneously, the efficiency and effectiveness of the centralized Monte Carlo search method visibly degrade in the initial allocation as the task scale grows. This is mainly because the number of potential actions for the initial state s increases exponentially with task scale, so the initial action allocation becomes decisive for the entire solution: a poor initial allocation is extremely difficult to repair through later adjustments. Therefore, for large-scale tasks, the PED-MCTS algorithm is significantly more stable and produces better results than the centralized method.
By increasing the communication frequency and information exchange volume in distributed systems, each agent can access more global information in a timely manner, which not only effectively improves the search process but also optimizes the search results. This mechanism ensures that as each communication stage is completed, the overall search results will gradually converge toward a better solution. This gradual optimization process is a major advantage of the PED-MCTS algorithm, especially in dynamic and complex task environments that require rapid adaptation and response.
Obviously, through this high-frequency communication and information sharing, PED-MCTS can more effectively synchronize and integrate data and strategies from various agents, thereby demonstrating significant advantages in global optimization problems. This optimization not only improves the efficiency of task execution but also greatly enhances the system’s adaptability and response speed to environmental changes, ensuring the accuracy and efficiency of decision-making in complex environments.

6. Conclusions and Future Work

This study explores the task allocation problem of multi-robot systems in complex dynamic environments, with a particular focus on highly unpredictable offshore oil platform scenarios. Given that traditional static task allocation strategies cannot meet the efficiency and response speed requirements on site, this paper proposes a dynamic task allocation method that combines the MCTS algorithm. This method utilizes the dynamic UCT value and reward optimization of the D-UBMCTS algorithm to achieve fast response and continuous task allocation optimization, effectively improving the initial efficiency and continuous optimization ability of task allocation. Compared with the traditional heuristic method MRGA, the D-UBMCTS algorithm improves the efficiency and adaptability of task processing by optimizing the task execution process in real time within the existing planning time.
In addition, the PED-MCTS method was introduced to enhance the robustness and stability of the system by replacing the centralized allocation process with a distributed independent computing communication process. Distributed processing not only reduces task complexity but also further optimizes time complexity and improves algorithm execution efficiency through importance sampling and parallel optimization schemes. The PED-MCTS method significantly reduces the computation time while ensuring the quality of the solution, improving the processing speed and efficiency of large-scale applications.
In terms of future work, although distributed computing has significantly reduced algorithm complexity and improved efficiency, the system still faces challenges, such as uncontrollable variance in algorithm results, which may affect the stability and predictability of the algorithm. The current model is not yet applicable to all situations, and further adjustments and optimizations are needed for specific environments and conditions. Future research will focus on improving the stability of algorithms, exploring new strategies to better control the variance of algorithm outputs, and enhancing the applicability and predictability of models in various environments.

Author Contributions

The authors confirm contribution to the paper as follows: study conception and design: H.Z.; data collection: Y.S.; analysis and interpretation of results: H.Z. and F.Z.; draft manuscript preparation: H.Z. and Y.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by (1) 2023 Jiangsu University Philosophy and Social Science Research Project (2023SJYB0687); (2) Pujiang College of Nanjing University of Technology, Top Priority Education Reform Project in 2022 (2022JG001Z); (3) Natural Science Key Cultivation Project of Pujiang College of Nanjing University of Technology (njpj2022-1-06). Key project of natural science, Nanjing Tech University Pujiang Institute (NJPJ2024-1-01).

Data Availability Statement

The raw data supporting the conclusions of this article will be made available by the authors upon request.

Acknowledgments

Special thanks to the reviewers of this paper for their valuable feedback and constructive suggestions, which greatly contributed to the refinement of this research.

Conflicts of Interest

The authors declare that they have no conflicts of interest to report regarding the present study.

References

  1. Patrick, B.; Charles, L.; Magali, B. Hybrid planning and distributed iterative repair for multi-robot missions with communication losses. Auton. Robot. 2020, 44, 505–531. [Google Scholar]
  2. Èric, P.; Paola, A.; Katrin, S. A Digital Twin for Human-Robot Interaction. In Proceedings of the ACM/IEEE International Conference on Human-Robot Interaction, Daegu, Republic of Korea, 11–14 March 2019; IEEE: New York, NY, USA; p. 372. [Google Scholar]
  3. Zhu, L.; Peng, P.; Lu, Z.; Tian, Y. MetaVIM: Meta Variationally Intrinsic Motivate Reinforcement Learning for Decentralized Traffic Signal Control. IEEE Trans. Knowl. Data Eng. 2023, 35, 11570–11584. [Google Scholar] [CrossRef]
  4. Xu, B.; Wang, Y.; Wang, Z.; Jia, H.; Lu, Z. Hierarchically and Cooperatively Learning Traffic Signal Control. In Proceedings of the AAAI Conference on Artificial Intelligence, Vancouver, BC, Canada, 2–9 February 2021; pp. 669–677. [Google Scholar]
  5. Fioretto, F.; Pontelli, E.; Yeoh, W. Distributed constraint optimization problems and applications: A survey. J. Artif. Intell. Res. 2018, 61, 623–698. [Google Scholar] [CrossRef]
  6. Zhao, W.; Ye, Y.; Ding, J.; Wang, T.; Wei, T.; Chen, M. Ipdalight: Intensity-and phase duration-aware traffic signal control based on reinforcement learning. J. Syst. Archit. 2022, 123, 102374. [Google Scholar] [CrossRef]
  7. Jiang, Q.; Qin, M.; Shi, S.; Sun, W.; Zheng, B. Multi-agent reinforcement learning for traffic signal control through universal communication method. arXiv, 2022; arXiv:2204.12190. [Google Scholar] [CrossRef]
  8. Wu, N.; Li, D.; Xi, Y. Distributed Weighted Balanced Control of Traffic Signals for Urban Traffic Congestion. IEEE Trans. Intell. Transp. Syst. 2018, 20, 3710–3720. [Google Scholar] [CrossRef]
  9. Goldstein, R.; Smith, S. Expressive real-time intersection scheduling. In Proceedings of the AAAI Conference on Artificial Intelligence, New Orleans, LA, USA, 2–7 February 2018; pp. 6177–6185. [Google Scholar]
  10. Cashmore, M.; Fox, M.; Larkworthy, T.; Long, D.; Magazzeni, D. AUV mission control via temporal planning. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), Hong Kong, China, 31 May–5 June 2014; pp. 6535–6541. [Google Scholar]
  11. Cashmore, M.; Fox, M.; Long, D.; Long, D.; Magazzeni, D. ROSPlan: Planning in the robot operating system. In Proceedings of the International Conference on Automated Planning and Scheduling, Jerusalem, Israel, 7–11 June 2015; pp. 333–341. [Google Scholar]
  12. Anthony, T.; Tian, Z.; Barber, D. Thinking fast and slow with deep learning and tree search. In Proceedings of the 31st International Conference on Neural Information Processing Systems (NeurIPS), Long Beach, CA, USA, 4–9 December 2017; pp. 5366–5376. [Google Scholar]
  13. Hertle, A.; Bernhard, N. Efficient auction based coordination for distributed multi-agent planning in temporal domains using resource abstraction. In Proceedings of the Joint German/Austrian Conference on Artificial Intelligence, Berlin, Germany, 24–28 September 2018; pp. 86–98. [Google Scholar]
  14. Carreno, Y.; Pairet, È.; Petillot, Y.; Petrick, R.P. Task allocation strategy for heterogeneous robot teams in offshore missions. In Proceedings of the 19th International Conference on Autonomous Agents and MultiAgent Systems, Visual, 9–13 May 2020; pp. 225–230. [Google Scholar]
  15. Bogue, R. Robots in the offshore oil and gas industries: A review of recent developments. Industrial Robot. Int. J. Robot. Res. Appl. 2020, 47, 1–6. [Google Scholar]
  16. Chen, S.; Andrejczuk, E.; Irissappane, A.A.; Zhang, J. ATSIS: Achieving the ad hoc teamwork by subtask inference and selection. In Proceedings of the International Joint Conferences on Artificial Intelligence (IJCAI), Macao, China, 10–16 August 2019; pp. 172–179. [Google Scholar]
  17. Kurzer, K.; Zhou, C.; Zöllner, J.M. Decentralized cooperative planning for automated vehicles with hierarchical Monte Carlo tree search. In Proceedings of the Intelligent Vehicles Symposium (IV), Suzhou, China, 26–30 June 2018; pp. 529–536. [Google Scholar]
  18. Kurzer, K.; Engelhorn, F.; Zöllner, J.M. Decentralized cooperative planning for automated vehicles with continuous monte carlo tree search. In Proceedings of the 21st International Conference on Intelligent Transportation Systems (ITSC), Maui, HI, USA, 4–7 November 2018; pp. 452–459. [Google Scholar]
  19. Silver, D.; Huang, A.; Maddison, C.J.; Guez, A.; Sifre, L.; Van Den Driessche, G.; Schrittwieser, J.; Antonoglou, I.; Panneershelvam, V.; Lanctot, M.; et al. Mastering the game of Go with deep neural networks and tree search. Nature 2016, 529, 484–489. [Google Scholar] [CrossRef] [PubMed]
  20. Silver, D.; Schrittwieser, J.; Simonyan, K.; Antonoglou, I.; Huang, A.; Guez, A.; Hubert, T.; Baker, L.; Lai, M.; Bolton, A.; et al. Mastering the game of Go without human knowledge. Nature 2017, 550, 354–359. [Google Scholar] [CrossRef] [PubMed]
  21. Coulom, R. Efficient selectivity and backup operators in Monte-Carlo tree search. In Proceedings of the International Conference on Computers and Games, Turin, Italy, 29–31 May 2006; pp. 72–83. [Google Scholar]
  22. Gelly, S.; Silver, D. Combining online and offline knowledge in UCT. In Proceedings of the 24th International Conference on Machine Learning, Corvallis, OR, USA, 20–24 June 2007; pp. 273–280. [Google Scholar]
  23. Van den Broeck, G.; Driessens, K.; Ramon, J. Monte-Carlo tree search in poker using expected reward distributions. In Proceedings of the Advances in Machine Learning: First Asian Conference on Machine Learning, Nanjing, China, 2–4 November 2009; pp. 367–381. [Google Scholar]
  24. Gabor, T.; Peter, J.; Phan, T.; Meyer, C.; Linnhoff-Popien, C. Subgoal-based temporal abstraction in Monte Carlo tree search. In Proceedings of the International Joint Conferences on Artificial Intelligence (IJCAI), Macao, China, 10–16 August 2019; pp. 5562–5568. [Google Scholar]
  25. Nguyen, K.Q.; Thawonmas, R. Monte carlo tree search for collaboration control of ghosts in ms. pac-man. IEEE Trans. Comput. Intell. AI Games 2012, 5, 57–68. [Google Scholar] [CrossRef]
  26. Pepels, T.; Winands, M.H.M.; Lanctot, M. Real-time monte carlo tree search in ms pac-man. IEEE Trans. Comput. Intell. AI Games 2014, 6, 245–257. [Google Scholar] [CrossRef]
  27. Tesauro, G.; Rajan, V.; Segal, R. Bayesian inference in monte-carlo tree search. arXiv 2012, arXiv:12033519. [Google Scholar] [CrossRef]
  28. Eysenbach, B.; Salakhutdinov, R.; Levine, S. Search on the Replay Buffer: Bridging Planning And Reinforcement Learning. In Proceedings of the Conference on Neural Information Processing Systems (NeurIPS), Vancouver, BC, Canada, 8–14 December 2019; pp. 15246–15257. [Google Scholar]
  29. Santos, B.S.; Bernardino, H.S. Game state evaluation heuristics in general video game playing. In Proceedings of the 17th Brazilian Symposium on Computer Games and Digital Entertainment (SBGames), Foz do Iguaçu, Brazil, 29 October–1 November 2018; pp. 14701–14709. [Google Scholar]
  30. Li, L.; Dong, P.; Wei, Z.; Yang, Y. Automated knowledge distillation via monte carlo tree search. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Paris, France, 2–3 October 2023; pp. 17413–17424. [Google Scholar]
