Multi-Agent DRL-Based Resource Scheduling and Energy Management for Electric Vehicles
Abstract
1. Introduction
1.1. Related Work
1.2. Contribution and Structure
- The cross-network wireless resource allocation and energy management problem is formulated as a mixed-integer nonlinear program that accounts for the dynamics of EVs and the uncertainty of renewable energy. The objective is to jointly minimize the total delay and energy consumption of the EVs and the total energy consumption of the charging stations, achieved by optimizing offloading decisions in the information network together with charging decisions and charging rates in the energy network (a sketch of the decision variables appears after this list).
- To solve this problem, it is reformulated as a Markov Decision Process (MDP), and a Multi-Agent Deep Deterministic Policy Gradient (MADDPG) algorithm is employed in which each EV acts as an agent, jointly optimizing EV-assisted charging station edge computing and charging schedules. Through offline training of the MADDPG model, each EV can make real-time charging station association and resource allocation decisions, maximizing overall system utility via enhanced scheduling exploration and experience sampling strategies.
- Extensive numerical simulations validate the effectiveness of the proposed method, demonstrating significant improvements in task performance and reductions in energy consumption and delay across varying numbers of EVs. The proposed scheme is compared against several DRL algorithms: DDPG under a fully centralized mechanism (FC-DDPG), DDPG under a fully decentralized mechanism (FD-DDPG), and an Actor–Critic baseline. MADDPG outperforms all of them in terms of EV energy consumption, delay, and charging station energy consumption, while achieving faster convergence and better long-term utility.
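To make the formulation concrete, the following minimal Python sketch lays out the decision variables and the weighted objective. The array names, the dimensions (taken from the simulation setup), and the weights are illustrative assumptions, not the paper's notation.

```python
import numpy as np

# Illustrative dimensions taken from the simulation setup:
# N = 8 EVs, K = 4 charging stations, T = 40 time slots.
N, K, T = 8, 4, 40

# Decision variables of the mixed-integer nonlinear program
# (array names are our own, not the paper's notation):
#   x[n, k, t] -- binary offloading decision in the information network
#   a[n, k, t] -- binary EV-to-station association in the energy network
#   r[n, t]    -- continuous charging rate of EV n
x = np.zeros((N, K, T), dtype=int)
a = np.zeros((N, K, T), dtype=int)
r = np.zeros((N, T))

def objective(ev_delay, ev_energy, cs_energy, w=(1.0, 1.0, 1.0)):
    """Weighted cost to be minimized: total EV delay and energy plus total
    charging-station energy. The weights are placeholders; the paper's
    exact trade-off coefficients are not reproduced here."""
    return w[0] * ev_delay.sum() + w[1] * ev_energy.sum() + w[2] * cs_energy.sum()
```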
2. System Model and Problem Formulation
2.1. Communication Model
2.2. Computation Model
2.3. EV Charging Model
2.4. Problem Formulation
3. MADDPG-Based EV Scheduling Algorithm
3.1. DEC-POMDP Framework
- Environmental State Space: the global environmental state of the system at each time slot t, covering the conditions of all EVs and charging stations.
- Observation Space: the local observation available to each EV agent at time slot t, i.e., the portion of the environmental state that the EV can perceive.
- Action space: as the agent, each EV makes decisions at every time slot: whether to associate with a charging station for charging and, once associated, whether to compute the offloaded task and at what charging rate. The action of each EV n therefore comprises three variables: a discrete binary offloading factor, the association variable between the n-th EV and the k-th charging station at time slot t, and the continuous charging rate of EV n.
- Reward function: to maximize the system’s overall utility, the reward at each time slot t is designed to align with this objective, rewarding lower delay and lower energy consumption (see the sketch following this list).
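As a concrete (hedged) illustration of these DEC-POMDP elements, the snippet below encodes a per-EV action and a reward aligned with the system utility. Field names and weights are our own, not the paper's notation.

```python
from dataclasses import dataclass

@dataclass
class EVAction:
    """Action of EV agent n at time slot t, mirroring the action-space
    description above. Field names are illustrative, not the paper's
    notation."""
    offload: int        # discrete binary offloading factor
    station: int        # index k of the associated charging station
    charge_rate: float  # continuous charging rate of EV n

def ev_reward(delay_n, energy_n, cs_energy_k, w=(1.0, 1.0, 1.0)):
    # Reward aligned with the overall system utility: lower delay and lower
    # energy consumption yield a higher reward. The weighting is assumed.
    return -(w[0] * delay_n + w[1] * energy_n + w[2] * cs_energy_k)
```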
3.2. Preliminary of DDPG Algorithm
3.3. MADDPG Framework for EVs
Algorithm 1 Multi-Agent Deep Deterministic Policy Gradient (MADDPG). At each training step, the agents execute their current policies, observe the joint reward and the new state, store the transition tuple in the replay buffer R, and advance to the next time slot; the stored experience is later sampled to update the actor and critic networks.
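A minimal PyTorch sketch of the centralized-training step at the heart of this algorithm is given below. It assumes each agent object carries `actor`, `critic`, their target copies, and optimizers (`actor_opt`, `critic_opt`); it follows the standard MADDPG update rather than the paper's exact implementation (only the default discount factor 0.999 is taken from the simulation parameters).

```python
import torch
import torch.nn as nn

def maddpg_update(agents, batch, gamma=0.999, tau=0.01):
    """One centralized-training step. `batch` holds per-agent lists of
    observations, actions, rewards, and next observations sampled from the
    shared replay buffer R. Every critic sees the joint observation-action
    (centralized training); every actor sees only its own observation
    (decentralized execution). `tau` is the soft update factor."""
    obs, acts, rews, next_obs = batch
    joint_obs = torch.cat(obs, dim=-1)
    joint_next = torch.cat(next_obs, dim=-1)
    next_acts = [ag.target_actor(o) for ag, o in zip(agents, next_obs)]
    for i, ag in enumerate(agents):
        # Critic update: minimize the TD error against the target networks.
        with torch.no_grad():
            q_next = ag.target_critic(torch.cat([joint_next, *next_acts], dim=-1))
            y = rews[i] + gamma * q_next
        q = ag.critic(torch.cat([joint_obs, *acts], dim=-1))
        critic_loss = nn.functional.mse_loss(q, y)
        ag.critic_opt.zero_grad(); critic_loss.backward(); ag.critic_opt.step()
        # Actor update: ascend the critic's value of agent i's own action,
        # holding the other agents' sampled actions fixed.
        acts_i = list(acts)
        acts_i[i] = ag.actor(obs[i])
        actor_loss = -ag.critic(torch.cat([joint_obs, *acts_i], dim=-1)).mean()
        ag.actor_opt.zero_grad(); actor_loss.backward(); ag.actor_opt.step()
        # Soft-update the target networks with factor tau.
        for tgt, src in ((ag.target_actor, ag.actor), (ag.target_critic, ag.critic)):
            for tp, sp in zip(tgt.parameters(), src.parameters()):
                tp.data.mul_(1.0 - tau).add_(tau * sp.data)
```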
4. Experimental Results
4.1. Simulation Setup
- DDPG under fully centralized mechanism (FC-DDPG): The system jointly determines the task offloading decisions and computation resource allocations by employing a DDPG agent. This agent processes global system states as inputs, with the utility function acting as the reward mechanism.
- DDPG under fully decentralized mechanism (FD-DDPG): under a purely decentralized mechanism, each EV acts as an intelligent agent running its own DDPG instance and independently determines its task offloading decisions and computation resource allocations from local observations, with the utility function as its reward. This enables decentralized decision-making without centralized training, a property important for scalable and efficient resource management in vehicular networks.
- Actor–Critic-based algorithm (Actor–Critic): to benchmark the proposed MADDPG under the CTDE mechanism, we implement an Actor–Critic-based algorithm, a continuous-action-space RL method applied to the same computation offloading problem; the sketch after this list contrasts what each scheme's critic conditions on.
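The key structural difference between these baselines and MADDPG is the information each critic conditions on. A minimal sketch, assuming per-agent observation and action tensor lists (the function and argument names are our own illustration):

```python
import torch

def critic_input(mechanism, obs, acts, i):
    """Contrast of what each scheme's critic conditions on; `obs` and
    `acts` are per-agent tensor lists and `i` is the agent index."""
    if mechanism == "FC-DDPG":
        # One centralized agent: global state and joint action throughout.
        return torch.cat(obs + acts, dim=-1)
    if mechanism == "FD-DDPG":
        # Each EV agent trains on its own local observation and action only.
        return torch.cat([obs[i], acts[i]], dim=-1)
    if mechanism == "MADDPG":
        # CTDE: joint view during training; execution still uses local actors.
        return torch.cat(obs + acts, dim=-1)
    raise ValueError(f"unknown mechanism: {mechanism}")
```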
4.2. Convergence Performance
4.3. System Performance Comparison
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
| Symbol | Description | Symbol | Description |
|---|---|---|---|
| K | Total number of charging stations in the V2G system. | N | Maximum number of EVs in the V2G system. |
| | Spectral density of the additive white Gaussian noise. | | Average path loss. |
| | Euclidean distance between EV n and charging station k. | | Path loss exponent. |
| | Transmit power of charging station k. | | Number of CPU cycles needed to process a single bit of task input. |
| | Task drop loss. | B | Bandwidth of all channels. |
| | Channel gain. | | Data volume of the computation tasks generated by the charging station. |
| | Offloading factor. | | Association between the n-th EV and the k-th CS at time slot t. |
| | Offloading policy. | | CPU frequency of the charging station. |
| | Execution delay of CS computation at time slot t. | | Energy for executing CS computations. |
| | Delay to offload data of charging station k for EV computing. | | Energy for charging station k to offload tasks to EV n at time slot t. |
| | Execution delay of the n-th EV's computation at time slot t. | | CPU frequency of the EVs for each cycle. |
| | Energy for executing EV computations. | | Idle-state maintenance power of charging station k. |
| | Total energy of charging station k for EV computation. | | Total energy of charging station k. |
| | Charging rate of EV n. | | Charging rate of charging station k. |
| | Intensity of solar radiation. | | Wind speed. |
| | Photovoltaic generation rate. | | Power generated by wind power. |
| | Rate of renewable energy production. | | Renewable energy stored in the battery at charging station k. |
| | Rate at which charging station k purchases energy from the grid. | | Battery energy state of the k-th charging station. |
| | Energy withdrawn from the battery by charging station k. | | Energy demand of charging station k. |
| | Power requirement of the n-th EV. | | Total latency of EV n. |
System Model Parameters

| Parameter | Value |
|---|---|
| Number of electric vehicles N | 8 |
| Number of charging stations K | 4 |
| Time period T | 320 s |
| Number of time slots | 40 |
| Task deadline | [5, 10] s |
| EV speed V | 10 m/s |
| Transmission bandwidth B | 5 MHz |
| Channel gain | −50 dB |
| Noise power | −100 dBm |
| Transmission power | 0.5 W |
| CPU cycles per bit | 1000 cycles/bit |
| Computational capability of charging stations | 0.6 GHz |
| Computational capability of electric vehicles | 1.2 GHz |

MADDPG Parameters

| Parameter | Value |
|---|---|
| Discount factor | 0.999 |
| Batch size | 112 |
| Replay buffer size | |
| Actor network learning rate | |
| Critic network learning rate | |
| Exploration constant | |
| Soft update factor | |