MDPI - Publisher of Open Access Journals

30 pages, 8264 KiB

Open Access Article

BoxesZero: An Efficient and Computationally Frugal Dots-and-Boxes Agent

by Xuefen Niu, Qirui Liu, Wei Chen, Yujiao Zheng and Zhanggen Jin

Entropy 2025, 27(3), 285; https://doi.org/10.3390/e27030285 - 9 Mar 2025

Viewed by 860

In recent years, deep reinforcement learning (DRL) has made significant progress in the field of games. A prime example is AlphaZero, which, despite its formidable capabilities, deters many from exploring its potential because of its demand for substantial computational resources. In this paper, we introduce BoxesZero, a computationally frugal Dots-and-Boxes agent that can achieve a high level of performance using relatively few computational resources. BoxesZero utilizes a novel and insightful training approach called “backward training”, which starts by training from high-reward states near the end of the game and gradually trains earlier stages of the game. It also incorporates domain knowledge of Dots-and-Boxes, such as endgame theorems, to accelerate the Monte Carlo Tree Search (MCTS) process. Furthermore, we extend the existing endgame theorems (which only cover long chains) to encompass scenarios with 1-chains and 2-chains and provide the corresponding proofs; we refer to these as the extended endgame theorems. BoxesZero reaches a high level of playing strength much faster than AlphaZero, substantially improving the model’s learning efficiency. With carefully tuned parameters and limited GPU resources, BoxesZero surpasses the strongest open-source Dots-and-Boxes agents, PRsboxes and DabbleBoxes. Experimental results demonstrate that BoxesZero achieves an Elo rating comparable to AlphaZero’s in significantly less time. In addition, BoxesZero won the championship in the Dots-and-Boxes category of the 2024 Chinese Computer Game Competition.

(This article belongs to the Section Information Theory, Probability and Statistics)
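
The “backward training” idea can be illustrated with a small curriculum schedule. This is a minimal sketch, not the authors' implementation: it assumes a 5 × 5-box board (60 lines, each drawn line counted as one step), a linear schedule, and a hypothetical helper `backward_curriculum` that returns, for each training stage, the window of move indices from which self-play games may start.

```python
def backward_curriculum(total_moves: int, num_stages: int) -> list[range]:
    """Return one window of allowed start positions (move indices) per stage.

    Stage 1 starts self-play only from positions near the end of the game;
    each later stage moves the earliest allowed start toward move 0, so the
    final stage trains on full games. The linear schedule is an assumption.
    """
    if total_moves < 1 or num_stages < 1:
        raise ValueError("total_moves and num_stages must be positive")
    windows = []
    for stage in range(1, num_stages + 1):
        # Earliest start index shrinks linearly from near the end down to 0.
        earliest = total_moves - (total_moves * stage) // num_stages
        windows.append(range(earliest, total_moves))
    return windows


if __name__ == "__main__":
    # Hypothetical 5x5-box board: 6 rows x 5 horizontal lines
    # + 6 columns x 5 vertical lines = 60 lines in total.
    for window in backward_curriculum(60, 4):
        print(window.start, window.stop)  # 45 60, then 30 60, 15 60, 0 60
```

Starting from late positions means the terminal reward is only a few moves away, so the value estimates for those states can become reliable before training moves on to earlier positions.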

26 pages, 12992 KiB

Open Access Article

AlphaRouter: Bridging the Gap Between Reinforcement Learning and Optimization for Vehicle Routing with Monte Carlo Tree Searches

by Won-Jun Kim, Junho Jeong, Taeyeong Kim and Kichun Lee

Entropy 2025, 27(3), 251; https://doi.org/10.3390/e27030251 - 27 Feb 2025

Viewed by 1265

Abstract

Deep reinforcement learning (DRL) as a routing problem solver has shown promising results in recent studies. However, an inherent gap exists between computationally driven DRL and optimization-based heuristics: while a DRL algorithm for a certain problem can solve several similar problem instances, traditional optimization algorithms focus on optimizing the solution to one specific problem instance. In this paper, we propose AlphaRouter, an approach that solves routing problems while bridging the gap between reinforcement learning and optimization. Tailored to routing problems, our approach first proposes attention-enabled policy and value networks: a policy network that produces a probability distribution over all possible nodes and a value network that produces the expected distance from any given state. We then modify Monte Carlo tree search (MCTS) for routing problems and selectively combine it with these networks. Our experiments demonstrate that the combined approach is promising, yielding better solutions than the original reinforcement learning (RL) approaches without MCTS and performing comparably to classical heuristics.

(This article belongs to the Special Issue Information-Theoretic Methods in Data Analytics)

21 pages, 4319 KiB

Open Access Article

Research on Real-Time Multi-Robot Task Allocation Method Based on Monte Carlo Tree Search

by Huiying Zhang, Yule Sun and Fengzhi Zheng

Electronics 2024, 13(24), 4943; https://doi.org/10.3390/electronics13244943 - 15 Dec 2024

Viewed by 1175

Abstract

Task allocation is an important problem in multi-robot systems, particularly in dynamic and unpredictable environments such as offshore oil platforms, large-scale factories, or disaster response scenarios, where high change rates, uncertain state transitions, and varying task demands challenge the predictability and stability of robot operations. Traditional static task allocation strategies often struggle to meet the efficiency and responsiveness demands of these complex settings, while optimization heuristics, though improving planning time, exhibit limited scalability. To address these limitations, this paper proposes a task allocation method based on the Monte Carlo Tree Search (MCTS) algorithm, which leverages the anytime property of MCTS to achieve a balance between fast response and continuous optimization. Firstly, the centralized adaptive MCTS algorithm generates preliminary solutions and monitors the state of the robots in minimal time. It utilizes dynamic Upper Confidence Bounds for Trees (UCT) values to accommodate varying task dimensions, outperforming the heuristic Multi-Robot Goal Assignment (MRGA) method in both planning time and overall task completion time. Furthermore, the parallelized distributed MCTS algorithm reduces algorithmic complexity and enhances computational efficiency through importance sampling and parallel processing. Experimental results demonstrate that the proposed method significantly reduces computation time while maintaining task allocation performance, decreasing the variance of planning results and improving algorithmic stability. Our approach enables more flexible and efficient task allocation in dynamically evolving and complex environments, providing robust support for the deployment of multi-robot systems.

(This article belongs to the Special Issue AI Applications of Multi-Agent Systems)
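
The selection step that UCT values drive can be sketched as follows. This is the standard UCB1-for-trees rule with a fixed exploration constant `c`, offered only as a reference point: the abstract does not specify how the paper's dynamic UCT values differ from it, and the names `ChildStats`, `uct_score` and `select_child` are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class ChildStats:
    visits: int          # N(child): times this child has been selected
    total_value: float   # W(child): sum of simulation rewards backed up here


def uct_score(child: ChildStats, parent_visits: int, c: float) -> float:
    """UCB1 applied to trees: mean reward plus an exploration bonus."""
    if child.visits == 0:
        return math.inf  # always try unvisited children first
    exploitation = child.total_value / child.visits
    exploration = c * math.sqrt(math.log(parent_visits) / child.visits)
    return exploitation + exploration


def select_child(children: list[ChildStats], c: float = math.sqrt(2)) -> int:
    """Return the index of the child with the highest UCT score."""
    parent_visits = sum(ch.visits for ch in children)
    return max(range(len(children)),
               key=lambda i: uct_score(children[i], parent_visits, c))


if __name__ == "__main__":
    children = [ChildStats(10, 8.0), ChildStats(2, 1.0)]
    print(select_child(children, c=0.0))  # 0: pure exploitation (0.8 > 0.5)
    print(select_child(children))         # 1: bonus favours the rarely tried child
```

The constant `c` is the natural place where a dynamic variant could adapt the balance between exploiting well-rated assignments and exploring rarely tried ones.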

19 pages, 4387 KiB

Open Access Article

Satellite Autonomous Mission Planning Based on Improved Monte Carlo Tree Search

by Zichao Li, You Li and Rongzheng Luo

Symmetry 2024, 16(8), 1039; https://doi.org/10.3390/sym16081039 - 13 Aug 2024

Cited by 1 | Viewed by 1678

Abstract

This paper investigates satellite mission planning, with the aim of improving its timeliness so that plans can respond rapidly to changes. First, a satellite dynamics model and a mission planning model are established. An improved Monte Carlo tree search (Improved-MCTS) algorithm is then proposed, which combines Monte Carlo tree search with a state uncertainty network (State-UN) to reduce the time spent exploring nodes. (At the MCTS selection stage, node exploration refers to the algorithm’s decision between choosing nodes that have already been visited (exploitation) and nodes that have not yet been visited (exploration).) The results show that this algorithm outperforms the ant colony optimization (ACO) algorithm and the asynchronous advantage actor-critic (A3C) algorithm in terms of both profit and convergence speed. (Here, each observation task is given a weight between 0 and 1, and each planned task receives a profit that is assigned at the initial moment.)

(This article belongs to the Section Engineering and Materials)

23 pages, 2643 KiB

Open Access Article

An Efficient Optimization of the Monte Carlo Tree Search Algorithm for Amazons

by Lijun Zhang, Han Zou and Yungang Zhu

Algorithms 2024, 17(8), 334; https://doi.org/10.3390/a17080334 - 1 Aug 2024

Cited by 1 | Viewed by 1657

Abstract

Amazons is a computerized board game with complex positions that are highly challenging for humans. In this paper, we propose an efficient optimization of the Monte Carlo tree search (MCTS) algorithm for Amazons, fusing the ‘Move Groups’ strategy and the ‘Parallel Evaluation’ optimization strategy (MG-PEO). Specifically, we explain the high efficiency of the Move Groups strategy by defining a new criterion: the winning convergence distance. We also highlight the strategy’s potential issue of falling into a local optimum and propose that the Parallel Evaluation mechanism can compensate for this shortcoming. Moreover, we conducted rigorous performance analysis and experiments. Performance analysis results indicate that the MCTS algorithm with the Move Groups strategy can improve the playing ability of the Amazons game by 20–30 times compared to the traditional MCTS algorithm. The Parallel Evaluation optimization further enhances the playing ability of the Amazons game by 2–3 times. Experimental results show that the MCTS algorithm with the MG-PEO strategy achieves a 23% higher game-winning rate on average compared to the traditional MCTS algorithm. Additionally, the MG-PEO Amazons program proposed in this paper won first prize in the Amazons Competition at the 2023 China Collegiate Computer Games Championship & National Computer Games Tournament.

(This article belongs to the Special Issue Algorithms for Games AI)

18 pages, 2966 KiB

Open Access Article

Beyond Trial and Error: Lane Keeping with Monte Carlo Tree Search-Driven Optimization of Reinforcement Learning

by Bálint Kővári, Bálint Pelenczei, István Gellért Knáb and Tamás Bécsi

Electronics 2024, 13(11), 2058; https://doi.org/10.3390/electronics13112058 - 25 May 2024

Cited by 2 | Viewed by 1227

Abstract

In recent years, Reinforcement Learning (RL), which requires neither specific training data nor explicit mathematical model identification, has excelled in the realm of autonomous vehicle control. Particularly in the context of lane keeping, a diverse set of reward strategies yields a spectrum of realizable policies. Nevertheless, the challenge lies in discerning the optimal behavior that maximizes performance. Traditional approaches entail exhaustive trial-and-error training across all conceivable reward functions, a process notorious for being time-consuming and costly. In contrast to these conventional methodologies, Monte Carlo Tree Search (MCTS) enables the quality of a reward function to be predicted through Monte Carlo simulations, thereby eliminating the need for exhaustive training on all available reward functions. The findings obtained from MCTS simulations can then be leveraged to selectively train only the most suitable RL models. By altering the training pipeline in this way, the approach alleviates the resource-heavy nature of traditional RL processes. This paper validates the theoretical framework concerning this unique property of the Monte Carlo Tree Search algorithm, emphasizing its generality by highlighting its cross-algorithmic and cross-environmental capabilities while also showcasing its potential to reduce training costs.

(This article belongs to the Special Issue Advancements in Cross-Disciplinary AI: Theory and Application—2nd Edition)

20 pages, 1302 KiB

Open Access Article

Enhancing Autonomous Underwater Vehicle Decision Making through Intelligent Task Planning and Behavior Tree Optimization

by Dan Yu, Hongjian Wang, Xu Cao, Zhao Wang, Jingfei Ren and Kai Zhang

J. Mar. Sci. Eng. 2024, 12(5), 791; https://doi.org/10.3390/jmse12050791 - 8 May 2024

Cited by 3 | Viewed by 2093

Abstract

The expansion of underwater scenarios and missions highlights the crucial need for autonomous underwater vehicles (AUVs) to make informed decisions. Developing an efficient decision-making framework is therefore vital to enhance productivity in executing complex tasks within tight time constraints. This paper examines task planning and reconstruction within the AUV control decision system to enable intelligent completion of intricate underwater tasks. Behavior trees (BTs), originally introduced in the computer game programming community, offer a structured approach to organizing the switching structure of a hybrid dynamical system (HDS). In this research, an intelligent search algorithm, MCTS-QPSO (Monte Carlo tree search and quantum particle swarm optimization), is proposed to strengthen the AUV’s capacity to plan complex task decision control systems. By effectively integrating BTs, the algorithm addresses the time-consuming manual design of control systems. By assessing a predefined set of subtasks and actions together with the complex task scenario, a reward function is formulated for MCTS to identify the optimal subtree set. The QPSO algorithm is then used for subtree integration, treating it as an optimal path search problem from the root node to the leaf node. This process optimizes the search subtree, thereby enhancing the robustness and security of the control architecture. To speed up search and algorithm convergence, this paper proposes reducing the search space by pre-grouping conditions and states within the behavior tree. The efficacy and superiority of the proposed algorithm are validated through security and timeliness evaluations of the BT, along with comparisons with other algorithms for the automatic design of AUV decision control behavior trees. Finally, simulations on a multi-AUV complex task platform corroborate the algorithm’s effectiveness, showcasing its practical applicability and efficiency in real-world underwater scenarios.

(This article belongs to the Special Issue Unmanned Marine Vehicles: Perception, Planning, Control and Swarm)

11 pages, 1413 KiB

Open Access Article

Model for Hydrogen Production Scheduling Optimisation

by Vitalijs Komasilovs, Aleksejs Zacepins, Armands Kviesis and Vladislavs Bezrukovs

Modelling 2024, 5(1), 265-275; https://doi.org/10.3390/modelling5010014 - 19 Feb 2024

Cited by 1 | Viewed by 1912

Abstract

This article presents a model for optimising the scheduling of hydrogen production processes, addressing the growing demand for efficient and sustainable energy sources. The study focuses on integrating advanced scheduling techniques to improve the overall performance of the hydrogen electrolyser. The proposed model uses constraint programming and satisfiability (CP-SAT) techniques to systematically analyse complex production schedules, considering factors such as production unit capacities, resource availability and energy costs. By incorporating real-world constraints, such as fluctuating energy prices and the availability of renewable energy, the optimisation model aims to improve overall operational efficiency and reduce production costs. CP-SAT was applied to achieve more efficient control of the electrolysis process. The scheduling task was optimised over a 24 h period with time resolutions of 1 h and 15 min. The performance of the proposed CP-SAT model was then compared with that of the Monte Carlo Tree Search (MCTS)-based model developed in our previous work. CP-SAT performed better but has several limitations. The model’s response to changes in input parameters was also analysed.

15 pages, 1610 KiB

Open Access Article

De Novo Drug Design Using Transformer-Based Machine Translation and Reinforcement Learning of an Adaptive Monte Carlo Tree Search

by Dony Ang, Cyril Rakovski and Hagop S. Atamian

Pharmaceuticals 2024, 17(2), 161; https://doi.org/10.3390/ph17020161 - 27 Jan 2024

Cited by 6 | Viewed by 13745

Abstract

The discovery of novel therapeutic compounds through de novo drug design represents a critical challenge in the field of pharmaceutical research. Traditional drug discovery approaches are often resource-intensive and time-consuming, leading researchers to explore innovative methods that harness the power of deep learning and reinforcement learning techniques. Here, we introduce a novel drug design approach called drugAI that leverages the Encoder–Decoder Transformer architecture in tandem with Reinforcement Learning via a Monte Carlo Tree Search (RL-MCTS) to expedite the process of drug discovery while ensuring the production of valid small molecules with drug-like characteristics and strong binding affinities towards their targets. We successfully integrated the Encoder–Decoder Transformer architecture, which generates molecular structures (drugs) from scratch, with RL-MCTS, which serves as the reinforcement learning framework. RL-MCTS combines the exploitation and exploration capabilities of a Monte Carlo Tree Search with the machine translation of a transformer-based Encoder–Decoder model. This dynamic approach allows the model to iteratively refine its drug candidate generation process, ensuring that the generated molecules adhere to essential physicochemical and biological constraints and effectively bind to their targets. The results from drugAI demonstrate the effectiveness of the proposed approach across various benchmark datasets, showing a significant improvement in both the validity and drug-likeness of the generated compounds compared to two existing benchmark methods. Moreover, drugAI ensures that the generated molecules exhibit strong binding affinities to their respective targets. In summary, this research highlights the real-world applications of drugAI in drug discovery pipelines, potentially accelerating the identification of promising drug candidates for a wide range of diseases.

(This article belongs to the Special Issue Structural and Computational-Driven Molecule Design in Drug Discovery)

20 pages, 3917 KiB

Open Access Article

Collaborative Cost Multi-Agent Decision-Making Algorithm with Factored-Value Monte Carlo Tree Search and Max-Plus

by Nii-Emil Alexander-Reindorf and Paul Cotae

Games 2023, 14(6), 75; https://doi.org/10.3390/g14060075 - 17 Dec 2023

Viewed by 2267

Abstract

In this paper, we describe the Factored Value MCTS Hybrid Cost-Max-Plus algorithm, a collection of decision-making algorithms (centralized, decentralized, and hybrid) for a multi-agent system in a collaborative setting that considers action costs. Our proposed algorithm consists of two steps. In the first step, each agent uses the Monte Carlo Tree Search (MCTS) algorithm to search for the best individual actions with the lowest cost. Each agent’s most promising actions are then chosen and presented to the team. In the second step, the Hybrid Cost Max-Plus method is used for joint action selection. The Hybrid Cost Max-Plus algorithm improves on the well-known centralized and distributed Max-Plus algorithm by incorporating the cost of actions in agent interactions. The Max-Plus algorithm employs the Coordination Graph framework, which exploits agent dependencies to decompose the global payoff function into a sum of local terms. The proposed Factored Value MCTS-Hybrid Cost-Max-Plus method is online, anytime, distributed, and scalable in the number of agents and their interactions. Our contribution competes with state-of-the-art methodologies and algorithms by leveraging the locality of agent interactions for planning and acting with MCTS and Max-Plus algorithms.
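
The joint-action step can be illustrated with Max-Plus message passing on a coordination graph. This is the generic Max-Plus algorithm, not the authors' Hybrid Cost Max-Plus: as a simplifying assumption, each agent's action cost is subtracted from its local (unary) payoff, and the function name `max_plus` and the dictionary-based data layout are hypothetical. On tree-structured graphs such as a chain, the messages converge to the exact optimum.

```python
def max_plus(actions, unary, pairwise, iterations=10):
    """Approximate the joint action maximising sum(unary) + sum(pairwise).

    actions:  agent -> list of candidate actions
    unary:    agent -> {action: local payoff, already net of action cost}
    pairwise: (i, j) -> {(a_i, a_j): payoff}, one entry per graph edge
    """
    neighbors = {agent: set() for agent in actions}
    for i, j in pairwise:
        neighbors[i].add(j)
        neighbors[j].add(i)

    def edge_payoff(i, j, a_i, a_j):
        # Edges are stored once; look them up in either orientation.
        if (i, j) in pairwise:
            return pairwise[(i, j)][(a_i, a_j)]
        return pairwise[(j, i)][(a_j, a_i)]

    # messages[(i, j)][a_j]: i's estimate of the payoff j gains by playing a_j.
    messages = {(i, j): {a: 0.0 for a in actions[j]}
                for i in actions for j in neighbors[i]}
    for _ in range(iterations):
        updated = {}
        for i, j in messages:
            msg = {
                a_j: max(unary[i][a_i] + edge_payoff(i, j, a_i, a_j)
                         + sum(messages[(k, i)][a_i]
                               for k in neighbors[i] if k != j)
                         for a_i in actions[i])
                for a_j in actions[j]
            }
            # Subtract the mean so messages stay bounded on cyclic graphs.
            mean = sum(msg.values()) / len(msg)
            updated[(i, j)] = {a: v - mean for a, v in msg.items()}
        messages = updated  # synchronous (parallel) update

    # Each agent picks the action with the best local payoff plus incoming messages.
    return {
        i: max(actions[i], key=lambda a, i=i: unary[i][a]
               + sum(messages[(k, i)][a] for k in neighbors[i]))
        for i in actions
    }


if __name__ == "__main__":
    acts = {"A": [0, 1], "B": [0, 1], "C": [0, 1]}
    cost = {0: 0.0, 1: 1.0}  # hypothetical: action 1 costs 1
    unary = {agent: {a: -cost[a] for a in (0, 1)} for agent in acts}
    # Neighbours earn 2 only if both take action 1.
    joint = {(x, y): 2.0 if x == y == 1 else 0.0 for x in (0, 1) for y in (0, 1)}
    print(max_plus(acts, unary, {("A", "B"): joint, ("B", "C"): joint}))
    # {'A': 1, 'B': 1, 'C': 1}: coordination outweighs the individual costs
```

In the usage example the global payoff of all agents choosing action 1 is 2 + 2 - 3 = 1, which beats the all-zero joint action; raising the cost of action 1 to 3 flips every agent to action 0.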

22 pages, 6913 KiB

Open Access Article

Multi-UAV Urban Logistics Task Allocation Method Based on MCTS

by Zeyuan Ma and Jing Chen

Drones 2023, 7(11), 679; https://doi.org/10.3390/drones7110679 - 17 Nov 2023

Cited by 15 | Viewed by 2841

Abstract

Unmanned aerial vehicles (UAVs) open new possibilities for efficient and rapid transportation in urban logistics distribution, where task allocation is a significant issue. In urban logistics systems, the energy status of UAVs is a critical factor in ensuring mission fulfillment. While extensive literature addresses the energy consumption of UAVs during tasks, the feasibility of energy replenishment must also be addressed, and it introduces additional uncertainty into task allocation. This paper addresses multi-task allocation while considering both the energy consumption and the energy replenishment of UAVs, ensuring that tasks can be accomplished while reducing energy consumption. It proposes uniform distribution K-means to achieve balanced multi-task grouping. Building on Monte Carlo tree search (MCTS), a task-allocation-oriented MCTS method is proposed that improves the selection and simulation processes of MCTS: multiple trees collaborate in node selection, and historical simulation information is recorded to guide subsequent simulations toward better results. Finally, the optimality of the proposed method was validated by comparing it with other relevant MCTS methods in several randomized experiments.
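
Balanced task grouping can be illustrated with a capacity-constrained variant of k-means. This is a generic sketch, not the paper's uniform distribution K-means: the greedy nearest-unfilled-cluster assignment, the cap of ceil(n / k) tasks per group, and the function name `balanced_kmeans` are all assumptions.

```python
import math
import random


def balanced_kmeans(points, k, iterations=20, seed=0):
    """Group task locations into k clusters of at most ceil(n / k) tasks each."""
    n = len(points)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of points")
    capacity = math.ceil(n / k)
    centroids = random.Random(seed).sample(list(points), k)
    labels = [None] * n
    for _ in range(iterations):
        # Every (distance, point, cluster) pair, closest first.
        pairs = sorted((math.dist(p, c), i, j)
                       for i, p in enumerate(points)
                       for j, c in enumerate(centroids))
        labels = [None] * n
        sizes = [0] * k
        for _, i, j in pairs:
            # Greedily give each point its nearest cluster that is not yet full.
            if labels[i] is None and sizes[j] < capacity:
                labels[i] = j
                sizes[j] += 1
        # Move each centroid to the mean of its members.
        for j in range(k):
            members = [points[i] for i in range(n) if labels[i] == j]
            if members:
                centroids[j] = tuple(sum(axis) / len(members)
                                     for axis in zip(*members))
    return labels


if __name__ == "__main__":
    tasks = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    labels = balanced_kmeans(tasks, 2)
    print([labels.count(j) for j in range(2)])  # [3, 3]: groups are balanced
```

Because the total capacity k * ceil(n / k) is at least n, the greedy pass always assigns every task, and no group can exceed its cap.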

22 pages, 4369 KiB

Open Access Article

Yard Space Allocation Algorithm for Unloading Containers at Marine Terminals

by Xingyu Wang, Ning Zhao and Chao Mi

J. Mar. Sci. Eng. 2023, 11(11), 2109; https://doi.org/10.3390/jmse11112109 - 4 Nov 2023

Cited by 1 | Viewed by 2971

Abstract

Container unloading efficiency is the operational bottleneck for most traditional container terminals. Addressing the intricate challenges of space allocation in container yards during ship unloading, this study focuses on real-time, dynamic decision-making needs that existing planning methods do not meet. To tackle this, the article introduces a novel model for container space allocation that aims to maximize the “attractiveness” of yard spaces. The model accounts for key considerations such as the allocation of container handling equipment resources, the rate at which container handling equipment traverses the yard, and container handling equipment operations across containers. A novel Monte Carlo tree search (MCTS)-based algorithm is developed to solve this multi-objective problem. The algorithm’s efficacy is rigorously tested via numerical experiments on practical engineering examples, in which it outperforms existing approaches such as UCT-MCTS, AMAF-MCTS, and manual scheduling plans. This research not only provides a more dynamic and efficient method for yard space allocation but also offers empirical evidence of its practicality and effectiveness.

(This article belongs to the Special Issue Advances in Marine Logistics, Shipping, and Ports)

16 pages, 2837 KiB

Open Access Article

A Conflict Resolution Strategy at a Taxiway Intersection by Combining a Monte Carlo Tree Search with Prior Knowledge

by Dong Sui, Hanping Chen and Tingting Zhou

Aerospace 2023, 10(11), 914; https://doi.org/10.3390/aerospace10110914 - 26 Oct 2023

Cited by 3 | Viewed by 1702

Abstract

With the escalating complexity of surface operations at large airports, the conflict risk for taxiing aircraft has increased correspondingly. Air Traffic Controllers (ATCOs) usually issue route, speed and holding instructions to resolve conflicts. In this paper, we introduce a conflict resolution framework that incorporates prior knowledge by integrating a Multi-Layer Perceptron (MLP) neural network into the Monte Carlo Tree Search (MCTS) approach. The neural network is trained to learn the waiting-time allocation strategy extracted from actual aircraft taxiing trajectory data. The action probability distribution generated by the neural network is then embedded into the MCTS algorithm as a heuristic evaluation function to guide the search toward the optimal conflict resolution strategy. Experimental results show an average conflict resolution rate of 96.8% across different conflict scenarios, and the taxiing time required to resolve conflicts is reduced by an average of 42.77% compared with the taxiing time in actual airport surface operations.

(This article belongs to the Section Air Traffic and Transportation)
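
One common way to let a network's action probabilities guide MCTS is the PUCT selection rule popularised by AlphaZero, sketched below. It is shown only as an illustration under that assumption: the abstract does not say which selection rule the authors use, and `puct_select` and the default `c_puct = 1.5` are hypothetical choices.

```python
import math


def puct_select(priors, visits, values, c_puct=1.5):
    """Pick the action maximising Q(s, a) + U(s, a) (AlphaZero-style PUCT).

    priors[a]: network probability P(s, a) for action a
    visits[a]: visit count N(s, a)
    values[a]: total backed-up value W(s, a)
    """
    # Using max(total, 1) lets the priors rank actions even before any visit.
    sqrt_total = math.sqrt(max(sum(visits), 1))
    best_action, best_score = None, -math.inf
    for a, (p, n, w) in enumerate(zip(priors, visits, values)):
        q = w / n if n > 0 else 0.0            # mean value; 0 for unvisited
        u = c_puct * p * sqrt_total / (1 + n)  # prior-weighted exploration
        if q + u > best_score:
            best_action, best_score = a, q + u
    return best_action


if __name__ == "__main__":
    priors = [0.1, 0.7, 0.2]
    print(puct_select(priors, [0, 0, 0], [0.0, 0.0, 0.0]))        # 1: prior decides
    print(puct_select(priors, [50, 50, 50], [45.0, 10.0, 10.0]))  # 0: value decides
```

Early in the search the prior term dominates, so the network's preferred action is tried first; as visit counts grow, the bonus shrinks and the backed-up values take over.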

16 pages, 4798 KiB

Open Access Article

Development of an MCTS Model for Hydrogen Production Optimisation

by Vitalijs Komasilovs, Aleksejs Zacepins, Armands Kviesis, Kaspars Ozols, Arturs Nikulins and Kaspars Sudars

Processes 2023, 11(7), 1977; https://doi.org/10.3390/pr11071977 - 30 Jun 2023

Cited by 2 | Viewed by 1774

Abstract

Hydrogen has the potential to revolutionize the energy industry due to its clean-burning and versatile properties. It is the most abundant element in the universe and can be produced through a variety of methods, including electrolysis. The widespread adoption of hydrogen faces various challenges, including the high cost of production; it is therefore important to optimise the production processes. This research focuses on the development of models for optimising hydrogen production based on various external factors and parameters. Models based on electricity prices are developed and compared across different market situations. To run hydrogen production more effectively, renewable energy sources should be used for the production process; adding a solar power component makes the outcome of the economic evaluation model more positive. The Monte Carlo tree search (MCTS) algorithm is adapted to effectively control the electrolysis process. MCTS schedule optimisation was performed for a 24 h time horizon using two time-resolution settings: 1 h and 15 min. The results demonstrate the potential of the MCTS algorithm for finding good schedules for water electrolyser devices while taking variable environmental factors into account. Although MCTS with a 15 min resolution yields mathematically better results, it requires more computational power to solve the decision tree.
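
A quick count shows why the 15 min resolution is costlier to search. Assuming, purely for illustration, that the electrolyser is either on or off in each slot (the abstract does not state the action set), the number of candidate schedules grows exponentially with the number of slots; `schedule_space` is a hypothetical helper.

```python
def schedule_space(horizon_hours: int, resolution_minutes: int, levels: int = 2):
    """Return (number of time slots, number of possible schedules).

    Assumes the electrolyser picks one of `levels` operating states per slot
    (levels=2 means simply on/off), so every slot multiplies the tree size.
    """
    total_minutes = horizon_hours * 60
    if total_minutes % resolution_minutes != 0:
        raise ValueError("resolution must divide the horizon evenly")
    slots = total_minutes // resolution_minutes
    return slots, levels ** slots


if __name__ == "__main__":
    for minutes in (60, 15):
        slots, size = schedule_space(24, minutes)
        print(f"{minutes:>2} min: {slots} slots, {size:.3e} schedules")
    # 60 min: 24 slots, 1.678e+07 schedules
    # 15 min: 96 slots, 7.923e+28 schedules
```

Quadrupling the number of slots raises the exponent fourfold, which is why a finer resolution can find better schedules yet needs far more search effort within the same time budget.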

18 pages, 3604 KiB

Open Access Article

Model-Based Reinforcement Learning Method for Microgrid Optimization Scheduling

by Jinke Yao, Jiachen Xu, Ning Zhang and Yajuan Guan

Sustainability 2023, 15(12), 9235; https://doi.org/10.3390/su15129235 - 7 Jun 2023

Cited by 4 | Viewed by 2830

Abstract

Due to the uncertainty and randomness of clean energy, microgrid operation is often prone to instability, which calls for a robust and adaptive optimization scheduling method. In this paper, a model-based reinforcement learning algorithm is applied to the optimal scheduling problem of microgrids. During training, the currently learned networks assist Monte Carlo Tree Search (MCTS) in accumulating game histories, which are then used to update the network parameters, yielding optimal microgrid scheduling strategies and a simulated model of the environment dynamics. We establish a microgrid environment simulator that includes Heating Ventilation Air Conditioning (HVAC) systems, Photovoltaic (PV) systems, and Energy Storage (ES) systems. The simulation results show that operating the microgrid in either islanded or grid-connected mode does not affect the training effectiveness of the algorithm. After 200 training steps, the algorithm avoids the penalty for exceeding the bus-voltage red line, and after 800 training steps, training converges and the loss values of the value and reward networks converge to 0, showing good effectiveness. This demonstrates that the proposed algorithm can be applied to the optimization scheduling problem of microgrids.

(This article belongs to the Special Issue Renewable and Sustainable Energy Systems: Architecture, Methodology and Technology)

Search Results (31)