Scalable Wireless Sensor Network Control Using Multi-Agent Reinforcement Learning †
Abstract
1. Introduction
- Mass Neural Network (Mass NN): Approximates the population-level PDFs of agents’ tracking errors and transmission power.
- Critic Neural Network (Critic NN): Estimates the value function, which quantifies tracking accuracy and QoS performance.
- Actor Neural Network (Actor NN): Learns the optimal control input for navigation and transmission power adjustment in real time.
- The decentralized co-optimization problem is formulated for MWSNs as two interconnected Mean Field Games: one for optimal navigation and one for transmission power control. The MFG framework effectively mitigates the “Curse of Dimensionality” associated with large-scale multi-agent systems.
- The data-driven Actor–Critic–Mass (ACM) reinforcement learning algorithm is developed to learn the optimal solution of the MWSN control online, enabling real-time implementation in uncertain and dynamic environments.
- The proposed MWSN algorithm is fully decentralized and requires no inter-agent communication, which makes it highly scalable and communication-efficient for large populations of mobile agents.
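The three networks listed above interact in a simple loop: the Actor proposes a control, the Critic scores it, and the Mass NN supplies the population term that couples agents without any message passing. As a minimal, illustrative sketch only (linear approximators on a toy 1-D tracking problem; every name, cost term, and update rule below is a hypothetical simplification, not the paper's actual equations):

```python
import random

random.seed(0)

def feats(x):
    # simple polynomial features shared by all three approximators
    return [1.0, x, x * x]

def dot(w, phi):
    return sum(wi * pi for wi, pi in zip(w, phi))

# All three "networks" are linear function approximators here -- a toy
# stand-in for the actual NNs described in the paper.
W_actor  = [0.0, 0.0, 0.0]   # Actor NN:  local state -> control input
W_critic = [0.0, 0.0, 0.0]   # Critic NN: local state -> cost-to-go estimate
W_mass   = [0.0, 0.0, 0.0]   # Mass NN:   local state -> population-density estimate

alpha, gamma = 1e-2, 0.95
x = 1.0  # tracking error of one representative agent

for _ in range(300):
    phi = feats(x)
    u = dot(W_actor, phi)
    mass = dot(W_mass, phi)
    # running cost couples the agent's own error to the mass (population) term
    cost = x * x + 0.1 * u * u + 0.05 * mass
    x_next = 0.9 * x - 0.1 * u + 0.005 * random.gauss(0.0, 1.0)
    # Critic NN: temporal-difference update on the value estimate
    td = cost + gamma * dot(W_critic, feats(x_next)) - dot(W_critic, phi)
    W_critic = [w + alpha * td * p for w, p in zip(W_critic, phi)]
    # Actor NN: descend the critic's estimate of the cost-to-go
    W_actor = [w - alpha * (0.2 * u + 0.1 * dot(W_critic, phi)) * p
               for w, p in zip(W_actor, phi)]
    # Mass NN: track an empirical proxy for the population density
    W_mass = [w + alpha * (x * x - mass) * p for w, p in zip(W_mass, phi)]
    x = x_next

print(f"final tracking error: {abs(x):.3f}")
```

Because the mass term enters only through a locally estimated density, each agent can run this loop independently, which is the property that removes the need for inter-agent communication.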
2. Related Work
3. Preliminaries
3.1. Mean Field Game Theory
3.2. The ACM Structure
4. Problem Formulation
4.1. Scenario 1: Optimal Navigation Formulation
4.2. Scenario 2: Optimal Transmission Power Allocation Formulation
5. Mean Field Type Control
5.1. Mean Field Power Allocation Games and Solution Learning
5.2. Mean Field Optimal Navigation Games
5.3. Convergence of Neural Network Weights
Algorithm 1: ACM for Mean Field Navigation (Scenario 1) and Power Control (Scenario 2)
6. Simulations
Simulation Setup
7. Results and Analysis
8. Discussion
9. Conclusions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
| ACM | Actor–Critic–Mass |
| AGV | Automated Guided Vehicle |
| AI | Artificial Intelligence |
| AUV | Autonomous Underwater Vehicle |
| FPK | Fokker–Planck–Kolmogorov |
| HJB | Hamilton–Jacobi–Bellman |
| MAS | Multi-Agent System |
| MFG | Mean Field Game |
| MWSN | Mobile Wireless Sensor Network |
| NN | Neural Network |
| PDE | Partial Differential Equation |
| PDF | Probability Density Function |
| POI | Point of Interest |
| PUA | Parallel Update Algorithm |
| QoS | Quality of Service |
| SINR | Signal-to-Interference-plus-Noise Ratio |
| SLAM | Simultaneous Localization and Mapping |
| UAV | Unmanned Aerial Vehicle |
| UCRL | Upper-Confidence Reinforcement Learning |
| UUB | Uniformly Ultimately Bounded |
| V2X | Vehicle-to-Everything |
References
- Liu, L.; Zheng, Z.; Zhu, S.; Chan, S.; Wu, C. Virtual-Mobile-Agent-Assisted Boundary Tracking for Continuous Objects in Underwater Acoustic Sensor Networks. IEEE Internet Things J. 2024, 11, 9171–9183. [Google Scholar] [CrossRef]
- Huang, P.; Zeng, L.; Chen, X.; Luo, K.; Zhou, Z.; Yu, S. Edge Robotics: Edge-Computing-Accelerated Multirobot Simultaneous Localization and Mapping. IEEE Internet Things J. 2022, 9, 14087–14102. [Google Scholar] [CrossRef]
- Fernández-Jiménez, F.J.; Dios, J.R.M.d. A Robot–Sensor Network Security Architecture for Monitoring Applications. IEEE Internet Things J. 2022, 9, 6288–6304. [Google Scholar] [CrossRef]
- Lee, J.S.; Jiang, H.T. An Extended Hierarchical Clustering Approach to Energy-Harvesting Mobile Wireless Sensor Networks. IEEE Internet Things J. 2021, 8, 7105–7114. [Google Scholar] [CrossRef]
- Su, Y.; Guo, L.; Jin, Z.; Fu, X. A Mobile-Beacon-Based Iterative Localization Mechanism in Large-Scale Underwater Acoustic Sensor Networks. IEEE Internet Things J. 2021, 8, 3653–3664. [Google Scholar] [CrossRef]
- Wang, D.; Chen, H.; Lao, S.; Drew, S. Efficient Path Planning and Dynamic Obstacle Avoidance in Edge for Safe Navigation of USV. IEEE Internet Things J. 2024, 11, 10084–10094. [Google Scholar] [CrossRef]
- Ma, C.; Li, A.; Du, Y.; Dong, H.; Yang, Y. Efficient and scalable reinforcement learning for large-scale network control. Nat. Mach. Intell. 2024, 6, 1006–1020. [Google Scholar] [CrossRef]
- Huang, M.; Caines, P.E.; Charalambous, C.D. Stochastic power control for wireless systems: Classical and viscosity solutions. In Proceedings of the 40th IEEE Conference on Decision and Control (Cat. No. 01CH37228), Orlando, FL, USA, 4–7 December 2001; Volume 2, pp. 1037–1042. [Google Scholar]
- Kafetzis, D.; Vassilaras, S.; Vardoulias, G.; Koutsopoulos, I. Software-defined networking meets software-defined radio in mobile ad hoc networks: State of the art and future directions. IEEE Access 2022, 10, 9989–10014. [Google Scholar] [CrossRef]
- Zhou, Z.; Xu, H. Decentralized Adaptive Optimal Tracking Control for Massive Multi-agent Systems: An Actor-Critic-Mass Algorithm. In Proceedings of the 58th IEEE Conference on Decision and Control, Nice, France, 11–13 December 2019. [Google Scholar]
- Zhou, Z.; Xu, H. Decentralized Adaptive Optimal Control for Massive Multi-agent Systems Using Mean Field Game with Self-Organizing Neural Networks. In Proceedings of the 58th IEEE Conference on Decision and Control, Nice, France, 11–13 December 2019. [Google Scholar]
- Guéant, O.; Lasry, J.M.; Lions, P.L. Mean field games and applications. In Paris-Princeton Lectures on Mathematical Finance 2010; Springer: Berlin/Heidelberg, Germany, 2011; pp. 205–266. [Google Scholar]
- Lasry, J.M.; Lions, P.L. Mean field games. Jpn. J. Math. 2007, 2, 229–260. [Google Scholar] [CrossRef]
- Prag, K.; Woolway, M.; Celik, T. Toward data-driven optimal control: A systematic review of the landscape. IEEE Access 2022, 10, 32190–32212. [Google Scholar] [CrossRef]
- Huang, M.; Caines, P.E.; Malhamé, R.P. Large-population cost-coupled LQG problems with nonuniform agents: Individual-mass behavior and decentralized ε-Nash equilibria. IEEE Trans. Autom. Control 2007, 52, 1560–1571. [Google Scholar] [CrossRef]
- Huang, M.; Sheu, S.; Sun, L. Mean field social optimization: Feedback person-by-person optimality and the dynamic programming equation. In Proceedings of the 2020 59th IEEE Conference on Decision and Control (CDC), Jeju, Republic of Korea, 14–18 December 2020. [Google Scholar]
- Cardaliaguet, P.; Porretta, A. An Introduction to Mean Field Game Theory; Springer: Berlin/Heidelberg, Germany, 2021; pp. 1–158. [Google Scholar]
- Liu, M.; Zhao, L.; Lopez, V.; Wan, Y.; Lewis, F.; Tseng, H.E.; Filev, D. Game-Theoretic Decision-Making for Autonomous Driving; CRC Press: Boca Raton, FL, USA, 2025; pp. 236–272. [Google Scholar]
- Wei, X.; Zhao, J.; Zhou, L.; Qian, Y. Broad Reinforcement Learning for Supporting Fast Autonomous IoT. IEEE Internet Things J. 2020, 7, 7010–7020. [Google Scholar] [CrossRef]
- Liang, S.; Wang, X.; Huang, J. Actor–Critic Reinforcement Learning Algorithms for Mean Field Games in Continuous Time, State, and Action Spaces. arXiv 2024, arXiv:2401.00052. [Google Scholar] [CrossRef]
- Angiuli, A.; Subramanian, J.; Perolat, J.; Carpentier, A.; Geist, M.; Pietquin, O. Deep Reinforcement Learning for Mean Field Control and Games. arXiv 2023, arXiv:2309.10953. [Google Scholar]
- Bogunovic, I.; Pirotta, M.; Rosolia, U. Safe-M3-UCRL: Safe Mean-Field Multi-Agent Reinforcement Learning under Global Constraints. In Proceedings of the 23rd International Conference on Autonomous Agents and Multiagent Systems (AAMAS), Auckland, New Zealand, 6–10 May 2024; pp. 973–981. [Google Scholar]
- Zaman, A.; Ratliff, L.; Mesbahi, M. Robust Multi-Agent Reinforcement Learning via Mean-Field Games. In Proceedings of the 41st International Conference on Machine Learning (ICML), Vienna, Austria, 21–27 July 2024. [Google Scholar]
- Jiang, Y.; Xu, K.; Wu, Y.; Zhang, M. A Survey of Fully Decentralized Multi-Agent Reinforcement Learning. arXiv 2024, arXiv:2306.02766. [Google Scholar]
- Gabler, L.; Scheller, S.; Albrecht, S.V. Decentralized Actor–Critic Reinforcement Learning for Cooperative Tasks with Sparse Rewards. Front. Robot. AI 2024, 11, 1229026. [Google Scholar]
- Gu, Y. Centralized training with hybrid execution in multi-agent reinforcement learning via predictive observation imputation. Artif. Intell. 2025, 348, 104404. [Google Scholar] [CrossRef]
- Alam, S.; Khan, M.; Zhang, W. Actor–Critic Frameworks for UAV Swarm Networks: A Survey. Drones 2025, 9, 153. [Google Scholar] [CrossRef]
- Xu, C.; Li, P.; Sun, X. Mean-Field Multi-Agent Reinforcement Learning for UAV-Assisted V2X Communications. arXiv 2025, arXiv:2502.01234. [Google Scholar]
- Emami, N.; Joo, C.; Kim, S.C. Age of Information Minimization Using Multi-Agent UAVs Based on AI-Enhanced Mean Field Resource Allocation. IEEE Trans. Wirel. Commun. 2024, 73, 13368–13380. [Google Scholar] [CrossRef]
- Mostofi, Y.; Malmirchegini, M.; Ghaffarkhah, A. Estimation of communication signal strength in robotic networks. In Proceedings of the 2010 IEEE International Conference on Robotics and Automation, Anchorage, AK, USA, 3–7 May 2010; pp. 1946–1951. [Google Scholar]
- Malmirchegini, M.; Mostofi, Y. On the spatial predictability of communication channels. IEEE Trans. Wirel. Commun. 2012, 11, 964–978. [Google Scholar] [CrossRef]
- Charalambous, C.D.; Menemenlis, N. Stochastic models for long-term multipath fading channels and their statistical properties. In Proceedings of the 38th IEEE Conference on Decision and Control (Cat. No. 99CH36304), Phoenix, AZ, USA, 7–10 December 1999; Volume 5, pp. 4947–4952. [Google Scholar]
- Huang, M.; Malhamé, R.P.; Caines, P.E. Stochastic power control in wireless communication systems: Analysis, approximate control algorithms and state aggregation. In Proceedings of the 42nd IEEE International Conference on Decision and Control (IEEE Cat. No. 03CH37475), Maui, HI, USA, 9–12 December 2003; Volume 4, pp. 4231–4236. [Google Scholar]
- Huang, M.; Caines, P.E.; Malhamé, R.P. Individual and mass behaviour in large population stochastic wireless power control problems: Centralized and Nash equilibrium solutions. In Proceedings of the 42nd IEEE International Conference on Decision and Control (IEEE Cat. No. 03CH37475), Maui, HI, USA, 9–12 December 2003; Volume 1, pp. 98–103. [Google Scholar]
- Aziz, M.; Caines, P.E. Computational investigations of decentralized cellular network optimization via mean field control. In Proceedings of the 53rd IEEE Conference on Decision and Control, Los Angeles, CA, USA, 15–17 December 2014; pp. 5560–5567. [Google Scholar]
- Aziz, M.; Caines, P.E. A mean field game computational methodology for decentralized cellular network optimization. IEEE Trans. Control Syst. Technol. 2016, 25, 563–576. [Google Scholar] [CrossRef]
- Haenggi, M.; Ganti, R.K. Interference in large wireless networks. Found. Trends Netw. 2009, 3, 127–248. [Google Scholar] [CrossRef]
- Baccelli, F.; Błaszczyszyn, B. Stochastic geometry and wireless networks: Volume II applications. Found. Trends Netw. 2010, 4, 1–312. [Google Scholar] [CrossRef]
- Jiang, Y.; Fan, J.; Chai, T.; Lewis, F.L.; Li, J. Tracking control for linear discrete-time networked control systems with unknown dynamics and dropout. IEEE Trans. Neural Netw. Learn. Syst. 2017, 29, 4607–4620. [Google Scholar] [CrossRef] [PubMed]
- Nourian, M.; Caines, P.E. ϵ-Nash mean field game theory for nonlinear stochastic dynamical systems with major and minor agents. SIAM J. Control Optim. 2013, 51, 3302–3331. [Google Scholar] [CrossRef]
- Alpcan, T.; Başar, T.; Srikant, R.; Altman, E. CDMA uplink power control as a noncooperative game. Wirel. Netw. 2002, 8, 659–670. [Google Scholar] [CrossRef]
- El Jamous, Z.; Davaslioglu, K.; Sagduyu, Y.E. Deep reinforcement learning for power control in next-generation wifi network systems. In Proceedings of the MILCOM 2022—2022 IEEE Military Communications Conference (MILCOM), Rockville, MD, USA, 28 November–2 December 2022; pp. 547–552. [Google Scholar]
- Choi, H.; Kim, T.; Lee, S.; Choi, H.S.; Yoo, N. Energy-Efficient Dynamic Enhanced Inter-Cell Interference Coordination Scheme Based on Deep Reinforcement Learning in H-CRAN. Sensors 2024, 24, 7980. [Google Scholar] [CrossRef]
- Soltani, P.; Eskandarpour, M.; Ahmadizad, A.; Soleimani, H. Energy-Efficient Routing Algorithm for Wireless Sensor Networks: A Multi-Agent Reinforcement Learning Approach. arXiv 2025, arXiv:2508.14679. [Google Scholar]
- Wu, Y.; Wu, J.; Huang, M.; Shi, L. Mean-field transmission power control in dense networks. IEEE Trans. Control Netw. Syst. 2020, 8, 99–110. [Google Scholar] [CrossRef]
- Zhang, H.; Lu, C.; Tang, H.; Wei, X.; Liang, L.; Cheng, L.; Ding, W.; Han, Z. Mean-field-aided multiagent reinforcement learning for resource allocation in vehicular networks. IEEE Internet Things J. 2022, 10, 2667–2679. [Google Scholar] [CrossRef]
- Zhou, Z.; Qian, L.; Xu, H. Decentralized multi-agent reinforcement learning for large-scale mobile wireless sensor network control using mean field games. In Proceedings of the 2024 33rd International Conference on Computer Communications and Networks (ICCCN), Kailua-Kona, HI, USA, 29–31 July 2024; pp. 1–6. [Google Scholar]
| Parameter | Value |
|---|---|
| N | 1000 |
| Workspace size | |
| Neural network type | 2 hidden layers, 64 neurons, ReLU |
| Learning rates | |
| Area | Time (s) | Average Tracking Error (m) | Tracking Error Percentage (%) |
|---|---|---|---|
| 1 | 0.00 | 84.28 | 1.80 |
| 1 | 2.22 | 8.02 | 0.17 |
| 1 | 4.44 | 10.29 | 0.27 |
| 1 | 6.67 | 1.04 | 0.03 |
| 1 | 8.89 | 3.21 | 0.09 |
| 1 | 11.11 | 2.98 | 0.09 |
| 1 | 13.33 | 4.58 | 0.16 |
| 1 | 15.56 | 5.31 | 0.18 |
| 1 | 17.78 | 3.22 | 0.12 |
| 1 | 20.00 | 1.67 | 0.07 |
| 2 | 0.00 | 82.03 | 2.39 |
| 2 | 2.22 | 14.90 | 0.41 |
| 2 | 4.44 | 3.21 | 0.11 |
| 2 | 6.67 | 3.74 | 0.14 |
| 2 | 8.89 | 1.42 | 0.05 |
| 2 | 11.11 | 0.75 | 0.03 |
| 2 | 13.33 | 1.68 | 0.08 |
| 2 | 15.56 | 3.03 | 0.13 |
| 2 | 17.78 | 3.63 | 0.16 |
| 2 | 20.00 | 0.71 | 0.04 |
| 3 | 0.00 | 106.73 | 2.87 |
| 3 | 2.22 | 7.03 | 0.20 |
| 3 | 4.44 | 3.15 | 0.10 |
| 3 | 6.67 | 3.62 | 0.12 |
| 3 | 8.89 | 1.68 | 0.06 |
| 3 | 11.11 | 4.10 | 0.15 |
| 3 | 13.33 | 4.02 | 0.16 |
| 3 | 15.56 | 4.31 | 0.18 |
| 3 | 17.78 | 4.31 | 0.19 |
| 3 | 20.00 | 1.37 | 0.06 |
| 4 | 0.00 | 166.29 | 3.37 |
| 4 | 2.22 | 22.69 | 0.41 |
| 4 | 4.44 | 1.92 | 0.05 |
| 4 | 6.67 | 7.16 | 0.18 |
| 4 | 8.89 | 5.18 | 0.12 |
| 4 | 11.11 | 7.95 | 0.20 |
| 4 | 13.33 | 2.14 | 0.06 |
| 4 | 15.56 | 9.61 | 0.28 |
| 4 | 17.78 | 4.85 | 0.15 |
| 4 | 20.00 | 5.54 | 0.19 |
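Read across areas, the table shows the same convergence pattern: the tracking-error percentage starts at roughly 1.8–3.4% and stays below 0.5% from the first update onward. That claim can be checked directly against the tabulated percentages (transcribed from the table above):

```python
# Tracking-error percentage over time for each area, transcribed from the table.
pct = {
    1: [1.80, 0.17, 0.27, 0.03, 0.09, 0.09, 0.16, 0.18, 0.12, 0.07],
    2: [2.39, 0.41, 0.11, 0.14, 0.05, 0.03, 0.08, 0.13, 0.16, 0.04],
    3: [2.87, 0.20, 0.10, 0.12, 0.06, 0.15, 0.16, 0.18, 0.19, 0.06],
    4: [3.37, 0.41, 0.05, 0.18, 0.12, 0.20, 0.06, 0.28, 0.15, 0.19],
}

for area, series in pct.items():
    assert series[0] > 1.5                    # large initial error in every area
    assert all(p < 0.5 for p in series[1:])   # below 0.5% after the first step

print("all four areas converge below 0.5% after one update")
```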
© 2025 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Zhou, Z. Scalable Wireless Sensor Network Control Using Multi-Agent Reinforcement Learning. Electronics 2025, 14, 4445. https://doi.org/10.3390/electronics14224445