# An Optimal Scheduling Strategy of a Microgrid with V2G Based on Deep Q-Learning


## Abstract


## 1. Introduction

- (1) A V2G mathematical model that accounts for the mobility of EVs and the randomness of user charging behavior is proposed. The user charging time distribution model, charging demand model, EV state-of-charge (SOC) dynamic model and travel location model are established together, so that the agent can observe the charging/discharging situation in an EV station and derive the station's overall output power.
- (2) A microgrid optimization scheduling strategy based on Deep Q-learning is proposed. The strategy learns online and copes better with the randomness of renewable resources. Meanwhile, the agent, equipped with experience replay, can be trained to adapt to the nonlinear influence caused by the mobility of EVs and the periodicity of user behavior, which makes the strategy feasible and effective for the optimal scheduling of microgrids with renewable resources and EVs.

## 2. The Mathematical Model Construction of Microgrid with EVs

#### 2.1. The V2G Model of EVs

The state of charge $SOC_0$ of the electric vehicle when it returns to the charging station can be obtained. The return time of different EVs within a day and the corresponding charging time are also important components of the V2G model of the EV station.

where $\mu_L$ and $\sigma_L$ represent the mean and variance, respectively, which are determined by the type of user behavior.

where $\mu_s$ and $\sigma_s$ represent the mean and variance, respectively, which are likewise determined by the type of user behavior.
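The two distributions are described here only through their mean and variance parameters. As a rough illustration, the sketch below draws a daily mileage and a charging start time for each EV in a fleet. It assumes that subscript $L$ refers to daily mileage with a log-normal distribution and subscript $s$ to the charging start time with a normal distribution; these distribution families, the function name `sample_fleet` and all numeric values are assumptions made for illustration, not values from the paper.

```python
import random


def sample_fleet(n, mu_L, sigma_L, mu_s, sigma_s, seed=0):
    """Draw (daily mileage in km, charging start hour) for n EVs.

    Assumptions (not from the source): mileage is log-normal, start
    time is normal. Python's random functions take a standard
    deviation, so pass the square root of a variance if needed.
    """
    rng = random.Random(seed)
    fleet = []
    for _ in range(n):
        mileage = rng.lognormvariate(mu_L, sigma_L)  # always positive
        start = rng.gauss(mu_s, sigma_s) % 24.0      # wrapped to [0, 24)
        fleet.append((mileage, start))
    return fleet


# Hypothetical parameter values, chosen only to make the sketch runnable.
fleet = sample_fleet(5, mu_L=3.2, sigma_L=0.9, mu_s=17.6, sigma_s=3.4)
for mileage, start in fleet:
    print(f"mileage = {mileage:6.1f} km, start = {start:5.2f} h")
```

Using a seeded `random.Random` instance keeps the draw reproducible, which helps when comparing scheduling algorithms on the same fleet.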

When the state of charge reaches $SOC_m$, the driving process expected by the user after the EV leaves the charging station can be satisfied. Therefore, according to the daily driving mileage, the time needed to charge the battery to $SOC_m$ after the EV enters the station can be calculated as $T_c$:

where $P_c$ is the charging power, $Q_{100}$ is the energy consumption per 100 km, $W_{total}$ is the full battery capacity of the EV, and $W_m$ is the battery energy of the EV when the state of charge is at $SOC_m$.

The EV leaves the station at time $T_{leave}$. Clearly, $\Delta T \ge T_c$. Therefore, $\Delta T$ and $T_{leave}$ satisfy the following formula.

where $T_{enter}$ is the time at which the EV enters the station to charge.
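The quantities defined above can be tied together in a short calculation. The sketch below assumes that energy use is linear in distance (distance × $Q_{100}$ / 100), that $T_c$ is the energy gap to $SOC_m$ divided by $P_c$, and that $T_{leave} = T_{enter} + \Delta T$ with $\Delta T \ge T_c$. The function names and the example numbers are hypothetical; only the 5 kW charging power matches the parameter table of this article.

```python
def soc_on_arrival(soc_depart, distance_km, q_100, w_total):
    """SOC when the EV returns, assuming energy use is linear in distance."""
    used_kwh = distance_km * q_100 / 100.0
    return max(0.0, soc_depart - used_kwh / w_total)


def charging_plan(t_enter, soc_0, soc_m, w_total, p_c, dwell_h):
    """Return (T_c, T_leave) in hours.

    T_c is the time needed to charge from SOC_0 to SOC_m at power p_c.
    The stay dwell_h (Delta T) must be at least T_c.
    """
    w_0 = soc_0 * w_total            # energy on arrival, kWh
    w_m = soc_m * w_total            # energy at SOC_m, kWh
    t_c = max(0.0, (w_m - w_0) / p_c)
    if dwell_h < t_c:
        raise ValueError("dwell time must be at least T_c")
    return t_c, t_enter + dwell_h


# Hypothetical EV: 30 kWh battery, 15 kWh per 100 km, 60 km trip.
soc_0 = soc_on_arrival(soc_depart=0.6, distance_km=60.0, q_100=15.0, w_total=30.0)
t_c, t_leave = charging_plan(t_enter=18.0, soc_0=soc_0, soc_m=0.5,
                             w_total=30.0, p_c=5.0, dwell_h=10.0)
print(f"SOC_0 = {soc_0:.2f}, T_c = {t_c:.2f} h, T_leave = {t_leave:.1f} h")
```

Times are kept as hours counted from midnight of the arrival day, so a value above 24 simply means the EV leaves on the following day.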

When the EV has been charged to $SOC_m$, or its state of charge is already greater than $SOC_m$ when entering the station, the EV can participate in the load distribution optimization scheduling of the microgrid. That is, it can be discharged when the microgrid encounters peak power consumption, and this discharge will not bring the EV's state of charge below $SOC_m$. When the state of charge of the EV reaches $SOC_{max}$, the EV is no longer charged, in order to protect battery life. At this point, the EV automatically stops charging (maintaining $SOC_{max}$) or discharges.

Each EV $i$ has an entry time $T_{enter,i}$ and a necessary charging time $\Delta T_i$. After its charge reaches $SOC_m$, it automatically participates in scheduling or continues charging according to the load status of the microgrid, and it leaves the charging station at $T_{leave,i}$. The specific scheduling process is shown in Figure 3. EV1 enters the station at time $T_1$ with a state of charge below $SOC_m$, so it starts charging; it participates in feed-in scheduling from time $T_3$ until $T_4$, at which time its state of charge is greater than $SOC_m$. EV2 enters the station at time $T_2$ with a state of charge above $SOC_m$, so it can immediately participate in dispatching when the microgrid is at peak load, until $T_4$. EV3 has a very low battery level when entering the station, so it charges continuously until time $T_5$.

Suppose that at time $t$ the station holds vehicles that can be discharged but not charged ($SOC = SOC_{max}$), $j$ vehicles that can be charged but not discharged ($SOC < SOC_m$), and the remaining vehicles, which can be both charged and discharged ($SOC_m < SOC < SOC_{max}$). The boundary of the overall charging power of the EV station at time $t$ is then given by (7):
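As a rough numerical illustration of such a boundary, the sketch below counts how many vehicles may charge and how many may discharge from their SOC values and multiplies by the per-vehicle power. It is a simplified assumption, not a reproduction of (7): charging is treated as optional for every vehicle below $SOC_{max}$, any must-charge requirement is ignored, and the function name `station_power_bounds` is hypothetical. The 5 kW and 2.5 kW values follow the parameter table of this article.

```python
def station_power_bounds(socs, soc_m, soc_max, p_ch, p_dis):
    """Return (lower, upper) bound of station power in kW.

    Positive values mean the station charges (draws power); negative
    values mean it discharges (feeds power back to the microgrid).
    """
    # A vehicle may charge while it is below SOC_max.
    n_chargeable = sum(1 for s in socs if s < soc_max)
    # A vehicle may discharge only while it is strictly above SOC_m.
    n_dischargeable = sum(1 for s in socs if s > soc_m)
    return -n_dischargeable * p_dis, n_chargeable * p_ch


# Four hypothetical vehicles: two below SOC_m, one in between, one full.
bounds = station_power_bounds([0.2, 0.6, 0.95, 0.4],
                              soc_m=0.5, soc_max=0.95, p_ch=5.0, p_dis=2.5)
print(bounds)  # (-5.0, 15.0)
```

An agent choosing an EV station power outside this interval would be requesting an infeasible action, so the bounds are a natural place to clip or penalize actions.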

#### 2.2. The Optimal Dispatching Model of Microgrid

where $P_L$ is the load disturbance power, $P_{wt}$ is the wind disturbance power, $P_{pv}$ is the photovoltaic generation power, $P_{MT}$ is the power variation of the micro-gas turbine (MT), $\Delta P_{EV}$ is the power variation of the EVs and $P_e$ is the power exchanged between the microgrid and the large grid.

#### 2.2.1. Objective Function

where $C_{gas}$ represents the cost of purchasing natural gas for the micro-gas turbine, $C_e$ represents the cost of purchasing and selling electricity in the interaction between the microgrid and the power grid, and $C_{EV}$ represents the cost of purchasing and selling electricity through the charging and discharging of EVs.

where $P_{MT}$ represents the output of the micro-gas turbine at time $t$, $\eta$ represents the conversion efficiency of the micro-gas turbine, $q_{NG}$ represents the low calorific value of natural gas and $c_{gas}$ represents the gas purchase cost coefficient of the micro-gas turbine. $P_{buy,e}$ and $P_{sell,e}$ represent the power purchased and sold between the microgrid and the large grid, and $e_b$ and $e_s$ represent the cost coefficients of purchasing and selling electricity. $\Delta P_{EV}^{+}$ and $\Delta P_{EV}^{-}$ represent the charging power and discharging power of the electric vehicle charging station, and $c_b$ and $c_s$ represent the cost coefficients of charging and discharging.
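To make the role of each coefficient concrete, the following sketch evaluates the three cost terms for a single time step. It assumes a common textbook form: gas cost $c_{gas} P_{MT} \Delta t / (\eta\, q_{NG})$, grid cost as purchases minus sales, and EV cost as payments for discharging minus income from charging, seen from the microgrid's side. These forms, the sign convention and every numeric value except $\eta = 0.85$ are assumptions for illustration; the function name `step_cost` is hypothetical.

```python
def step_cost(p_mt, p_buy_e, p_sell_e, p_ev_charge, p_ev_discharge, *,
              eta, q_ng, c_gas, e_b, e_s, c_b, c_s, dt=1.0):
    """Return the cost terms of one time step as a dict.

    Powers are in kW and dt in hours. Negative values are income.
    The formulas below are assumed forms, not the paper's equations.
    """
    gas = c_gas * p_mt * dt / (eta * q_ng)               # fuel for the MT
    grid = (e_b * p_buy_e - e_s * p_sell_e) * dt         # buy minus sell
    ev = (c_s * p_ev_discharge - c_b * p_ev_charge) * dt  # pay EVs minus EV payments
    return {"gas": gas, "grid": grid, "ev": ev, "total": gas + grid + ev}


# Hypothetical operating point and prices.
cost = step_cost(p_mt=85.0, p_buy_e=10.0, p_sell_e=0.0,
                 p_ev_charge=20.0, p_ev_discharge=0.0,
                 eta=0.85, q_ng=10.0, c_gas=0.5,
                 e_b=0.2, e_s=0.1, c_b=0.1, c_s=0.15)
print({k: round(v, 2) for k, v in cost.items()})
```

Under this sign convention a negative EV term means the station earned more from charging vehicles than it paid for discharging them, which is one way to read the negative V2G costs reported later in the results table.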

#### 2.2.2. Constraints

- (1) Power balance constraints:
  $$\begin{array}{l}{P}_{wt}(t)+{P}_{pv}(t)+{P}_{MT}(t)+{P}_{buy,ev}(t)+{P}_{buy,e}(t)\\ =L(t)+{P}_{sell,ev}(t)+{P}_{sell,e}(t)\end{array}$$
  where $P_{wt}(t)$ and $P_{pv}(t)$ represent the output power of wind and photovoltaic generation in period $t$, and $L(t)$ represents the load in period $t$.
- (2) Micro-gas turbine operating constraints:
  $$\begin{array}{l}-{R}_{d}\Delta t\le {P}_{MT}(t)-{P}_{MT}(t-\Delta t)\le {R}_{u}\Delta t\\ {P}_{MT,\mathrm{min}}\le {P}_{MT}(t)\le {P}_{MT,\mathrm{max}}\end{array}$$
  where $P_{MT}$ represents the output power of the micro-gas turbine, $R_d$ and $R_u$ represent the downward and upward ramp rates of the micro-gas turbine, and $P_{MT,\mathrm{min}}$ and $P_{MT,\mathrm{max}}$ represent its lower and upper output limits.
- (3) Grid interaction power constraints:
  $$\begin{array}{l}0\le {P}_{\mathrm{sell},\mathrm{e}}(t)\le {P}_{\mathrm{ex},\mathrm{max}}\\ 0\le {P}_{\mathrm{buy},\mathrm{e}}(t)\le {P}_{\mathrm{ex},\mathrm{max}}\\ {P}_{\mathrm{sell},\mathrm{e}}(t)\times {P}_{\mathrm{buy},\mathrm{e}}(t)=0\end{array}$$
- (4) EV station constraints:
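Constraints (1) to (3) can be checked mechanically for any candidate dispatch. The sketch below evaluates them for one time step; the dictionary keys, the function name `check_constraints` and the example numbers are hypothetical, and the EV station constraints are left out because their expression is not given in this text. The 1000 kW upper limit follows the MT capacity in the parameter table.

```python
def check_constraints(p, p_mt_prev, limits, dt=1.0, tol=1e-6):
    """Check power balance, MT ramp/range and grid exchange limits.

    p holds the powers of the current step in kW; p_mt_prev is the MT
    output of the previous step. Returns a dict of booleans.
    """
    supply = p["wt"] + p["pv"] + p["mt"] + p["buy_ev"] + p["buy_e"]
    demand = p["load"] + p["sell_ev"] + p["sell_e"]
    ramp = p["mt"] - p_mt_prev
    return {
        "balance": abs(supply - demand) <= tol,
        "ramp": -limits["r_d"] * dt <= ramp <= limits["r_u"] * dt,
        "mt_range": limits["mt_min"] <= p["mt"] <= limits["mt_max"],
        # Buying and selling at the same time is not allowed.
        "grid": (0 <= p["sell_e"] <= limits["ex_max"]
                 and 0 <= p["buy_e"] <= limits["ex_max"]
                 and p["sell_e"] * p["buy_e"] == 0),
    }


limits = {"r_d": 100.0, "r_u": 100.0, "mt_min": 0.0,
          "mt_max": 1000.0, "ex_max": 200.0}
step = {"wt": 100.0, "pv": 50.0, "mt": 300.0, "buy_ev": 0.0, "buy_e": 50.0,
        "load": 450.0, "sell_ev": 50.0, "sell_e": 0.0}
report = check_constraints(step, p_mt_prev=250.0, limits=limits)
print(report)
```

Returning one flag per constraint, rather than a single boolean, makes it easy to attach a separate penalty to each violation in a reward function.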

## 3. A Microgrid Dispatch Model Based on Deep Reinforcement Learning

#### 3.1. Theory of Reinforcement Learning Algorithms

At each time step $t$, the agent selects an action $a_t$ based on the current state $s_t$, as shown in Figure 6. After choosing an action at time $t$, the agent receives a scalar reward $r_{t+1}$ and finds itself in a new state $s_{t+1}$, which depends on the current state and the chosen action.

_{t}is shown in (17):

where $\pi^{*}$ is the policy that yields the largest cumulative reward in the long run:
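For orientation, the standard textbook forms of these ideas can be written in a few lines. The sketch below computes a discounted return and applies the classical tabular Q-learning update $Q(s,a) \leftarrow Q(s,a) + \alpha\,[r + \gamma \max_{a'} Q(s',a') - Q(s,a)]$. This is the generic rule, not necessarily the exact formulation used in this article; the state and action labels and the values of $\alpha$ and $\gamma$ are hypothetical.

```python
from collections import defaultdict


def discounted_return(rewards, gamma):
    """Sum of rewards, each discounted by gamma per step into the future."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g


def q_update(q, s, a, r, s_next, actions, alpha, gamma):
    """Apply one tabular Q-learning update and return the new Q(s, a)."""
    best_next = max(q[(s_next, b)] for b in actions)
    q[(s, a)] += alpha * (r + gamma * best_next - q[(s, a)])
    return q[(s, a)]


actions = ("charge", "idle", "discharge")   # hypothetical action labels
q = defaultdict(float)                      # unseen pairs start at 0.0
new_value = q_update(q, "peak", "discharge", 1.0, "off_peak",
                     actions, alpha=0.5, gamma=0.9)
print(new_value)                              # 0.5
print(discounted_return([1.0, 1.0, 1.0], 0.5))  # 1.75
```

Deep Q-learning replaces the table `q` with a neural network that maps a state to one value per action, but the target `r + gamma * best_next` keeps the same shape.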

#### 3.2. Design of Optimal Scheduling Strategy for Microgrid Based on Deep Q-Learning

- (1) State space:

- (2) Action space:

- (3) Reward function:

_{d}is the penalty term coefficient.

#### 3.3. Neural Network Structure

#### 3.4. The Flow Diagram of Deep Q-learning Algorithm

## 4. Simulation Results

$10^{6}$, the network learning rate $\alpha$ is 0.0001, and the Adam optimizer is used to update the network weights. Training runs for $5 \times 10^{5}$ iterations. The simulation model is built and verified in Python on an Intel Core i7-10700 CPU.
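The training settings quoted above can be collected in a small configuration object, together with the experience replay memory that the agent relies on. The sketch below is a minimal illustration using only the standard library: the learning rate and iteration count come from the text, while the buffer capacity, batch size and the class names `TrainConfig` and `ReplayBuffer` are assumptions.

```python
import random
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    learning_rate: float = 1e-4         # alpha, from the text
    training_iterations: int = 500_000  # 5 x 10^5, from the text
    buffer_capacity: int = 10_000       # assumed value
    batch_size: int = 32                # assumed value


class ReplayBuffer:
    """Fixed-size memory of transitions; the oldest entries are dropped first."""

    def __init__(self, capacity, seed=0):
        self._data = deque(maxlen=capacity)
        self._rng = random.Random(seed)

    def push(self, state, action, reward, next_state, done):
        self._data.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        """Draw a uniform random batch without replacement."""
        return self._rng.sample(list(self._data), batch_size)

    def __len__(self):
        return len(self._data)


config = TrainConfig()
buffer = ReplayBuffer(capacity=3)
for step in range(5):                   # push more than the capacity
    buffer.push(step, 0, 0.0, step + 1, False)
batch = buffer.sample(2)
print(len(buffer), [transition[0] for transition in batch])
```

Sampling uniformly from past transitions breaks the correlation between consecutive time steps, which is the usual motivation for experience replay in Deep Q-learning.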

#### 4.1. Case1: Analysis of Electric Vehicle Mobility and User Behavior Habits

$SOC_m$. At this time, the controllable capacity of the station reaches its peak. It starts to decrease gradually at 4:00, and a large number of EVs leave the charging station at 8:00, which makes the controllable capacity of the station plummet.

#### 4.2. Case2: Energy Dispatching Results of a Microgrid

**Figure 11.** Microgrid scheduling results based on the PSO algorithm. (**a**) Microgrid scheduling results when the EV output is divided into charging and discharging. (**b**) Microgrid scheduling results when the EV output is treated as a whole.

**Figure 12.** Microgrid scheduling results based on the Deep Q-learning algorithm. (**a**) Microgrid scheduling results when the EV output is divided into charging and discharging. (**b**) Microgrid scheduling results when the EV output is treated as a whole.

| Algorithm | Total Operating Cost (USD) | Gas Cost (USD) | V2G Cost (USD) | Grid-Connected Cost (USD) | Calculation Time |
|---|---|---|---|---|---|
| PSO | 814.57 | 825.34 | −169.84 | 159.07 | 7 min 23 s |
| Deep Q-learning | 801.07 | 897.70 | −92.79 | −3.84 | 0.05 s |
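The figures in the table can be cross-checked with a few lines of arithmetic. The sketch below verifies that the gas, V2G and grid-connected costs add up to the reported total for each algorithm and computes the saving of Deep Q-learning relative to PSO. The numbers are copied from the table; the variable names are only illustrative.

```python
results = {
    "PSO": {"total": 814.57, "gas": 825.34, "v2g": -169.84, "grid": 159.07},
    "Deep Q-learning": {"total": 801.07, "gas": 897.70, "v2g": -92.79, "grid": -3.84},
}

# Each total should equal the sum of its three components (to the cent).
for name, r in results.items():
    parts = r["gas"] + r["v2g"] + r["grid"]
    assert abs(parts - r["total"]) < 0.005, name

saving = results["PSO"]["total"] - results["Deep Q-learning"]["total"]
saving_pct = 100.0 * saving / results["PSO"]["total"]
print(f"{saving:.2f} USD ({saving_pct:.2f} %)")  # 13.50 USD (1.66 %)
```

The component breakdown also shows where the saving comes from: Deep Q-learning spends more on gas and earns less from V2G than PSO, but it avoids almost all of the grid purchase cost.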

## 5. Conclusions

- As a mobile energy storage component with V2G capability, EVs can participate effectively in the dispatching control of the microgrid, providing a more flexible dispatching scheme for its stable operation.
- Compared with traditional algorithms, Deep Q-learning, with its online learning ability, can better adapt to the strong nonlinear effects caused by the mobility of EVs, the randomness of user behavior and renewable resources, based on the experience accumulated during training. The total operating cost of the microgrid under Deep Q-learning was 801.07 USD with a calculation time of 0.05 s, while under the PSO algorithm the total operating cost was 814.57 USD with a calculation time of 7 min 23 s. Deep Q-learning therefore outperformed the PSO algorithm in total operating cost, grid-connected cost and calculation time; the micro-turbine output and the V2G interaction are compared in detail in Section 4.2.



**Figure 2.** The charging/discharging constraint boundary of EVs. (**a**) The constraint boundary of EVs when $SOC_0 < SOC_m$. (**b**) The constraint boundary of EVs when $SOC_0 > SOC_m$.

| User Type | Travel Chain | Proportion (%) |
|---|---|---|
| 1 | R→C→R | 52.8 |
| 2 | R→P→R | 24.1 |
| 3 | R→C→P→R | 23.1 |

| Unit | Parameter | Meaning | Value |
|---|---|---|---|
| MT | $\eta$ | generation efficiency | 0.85 |
| MT | $P_{MT}$ | capacity of MT | 1000 kW |
| EV | $P_{ch}$ | charge power for EV | 5 kW |
| EV | $P_{dis}$ | discharge power for EV | 2.5 kW |


© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Wen, Y.; Fan, P.; Hu, J.; Ke, S.; Wu, F.; Zhu, X.
An Optimal Scheduling Strategy of a Microgrid with V2G Based on Deep Q-Learning. *Sustainability* **2022**, *14*, 10351.
https://doi.org/10.3390/su141610351
