Article

Multi-Agent Reinforcement Learning with Two-Layer Control Plane for Traffic Engineering

Department of Computing Systems and Automation, Faculty of Computational Mathematics and Cybernetics, Lomonosov Moscow State University, 119991 Moscow, Russia
* Author to whom correspondence should be addressed.
Mathematics 2025, 13(19), 3180; https://doi.org/10.3390/math13193180
Submission received: 10 August 2025 / Revised: 26 September 2025 / Accepted: 30 September 2025 / Published: 3 October 2025

Abstract

The article presents a new method for multi-agent traffic flow balancing, based on the MAROH multi-agent optimization method. Unlike MAROH, the agent’s control plane is built on the principles of human decision-making and consists of two layers. The first layer provides autonomous decision-making from the agent’s accumulated experience: representatives of previously encountered states together with the actions the agent knows to take in them. The second layer enables the agent to make decisions in unfamiliar states. A state is considered familiar if it is close, under a chosen proximity metric, to a state the agent has already encountered. The article explores variants of state proximity metrics and various ways to organize the agent’s memory. It is shown experimentally that an agent with the proposed two-layer control plane, SAMAROH-2L, outperforms an agent with a single-layer control plane: it makes decisions faster, and inter-agent communication is reduced by 1% to 80%, depending on the selected similarity threshold, relative to the simultaneous-action method SAMAROH, and by 80% to 96% relative to MAROH.
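To make the two-layer decision rule concrete, the following is a minimal Python sketch of the control flow the abstract describes: the agent acts from memory when the current state lies within a similarity threshold of a stored state representative (layer 1), and otherwise defers to the coordinated multi-agent procedure (layer 2). All identifiers, the Euclidean proximity metric, and the memory layout here are illustrative assumptions, not the authors’ actual implementation.

```python
import numpy as np

class TwoLayerAgent:
    """Hypothetical sketch of a two-layer control plane.

    Layer 1: autonomous decisions from accumulated experience, i.e.,
    stored (state representative, action) pairs, used whenever the
    current state is "familiar" under a proximity metric.
    Layer 2: fallback to a coordinated multi-agent decision procedure
    (e.g., a MAROH-style step) for unfamiliar states.
    """

    def __init__(self, similarity_threshold: float):
        self.similarity_threshold = similarity_threshold
        # Memory of experienced state representatives and chosen actions.
        self.memory: list[tuple[np.ndarray, int]] = []

    def distance(self, a: np.ndarray, b: np.ndarray) -> float:
        # Placeholder proximity metric; the paper explores several variants.
        return float(np.linalg.norm(a - b))

    def act(self, state: np.ndarray) -> int:
        # Layer 1: reuse the action of the nearest stored representative
        # if it is within the similarity threshold (no communication).
        if self.memory:
            rep, action = min(self.memory,
                              key=lambda m: self.distance(state, m[0]))
            if self.distance(state, rep) <= self.similarity_threshold:
                return action
        # Layer 2: unfamiliar state -- defer to the coordinated
        # multi-agent procedure and memorize the outcome for reuse.
        action = self.layer2_decide(state)
        self.memory.append((state.copy(), action))
        return action

    def layer2_decide(self, state: np.ndarray) -> int:
        raise NotImplementedError("coordinated multi-agent decision goes here")
```

Under this reading, lowering the similarity threshold makes layer 1 fire less often (fewer memory reuses, more coordinated decisions), which is consistent with the abstract’s observation that the inter-agent communication savings vary with the selected threshold.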
Keywords: traffic engineering; multi-agent reinforcement learning; traffic load balancing

Share and Cite

MDPI and ACS Style

Stepanov, E.; Smeliansky, R.; Garkavy, I. Multi-Agent Reinforcement Learning with Two-Layer Control Plane for Traffic Engineering. Mathematics 2025, 13, 3180. https://doi.org/10.3390/math13193180

AMA Style

Stepanov E, Smeliansky R, Garkavy I. Multi-Agent Reinforcement Learning with Two-Layer Control Plane for Traffic Engineering. Mathematics. 2025; 13(19):3180. https://doi.org/10.3390/math13193180

Chicago/Turabian Style

Stepanov, Evgeniy, Ruslan Smeliansky, and Ivan Garkavy. 2025. "Multi-Agent Reinforcement Learning with Two-Layer Control Plane for Traffic Engineering" Mathematics 13, no. 19: 3180. https://doi.org/10.3390/math13193180

APA Style

Stepanov, E., Smeliansky, R., & Garkavy, I. (2025). Multi-Agent Reinforcement Learning with Two-Layer Control Plane for Traffic Engineering. Mathematics, 13(19), 3180. https://doi.org/10.3390/math13193180

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
