1. Introduction
With the gradual depletion of traditional energy sources and the increasingly serious problem of environmental pollution, electric vehicles (EVs), as a clean and efficient means of transportation, have become the development trend of the global automobile industry [1]. However, the large-scale popularization of electric vehicles still faces many challenges, among which charge and discharge management is particularly prominent. Traditional charge and discharge control strategies, such as rule-based control and heuristic control, struggle to adapt to the complex and changeable working conditions of electric vehicles in practical applications, and suffer from limitations such as slow response speed and poor robustness [2]. Therefore, it is urgent to develop an efficient and intelligent charge and discharge control strategy to improve the energy utilization efficiency and operation reliability of electric vehicles.
This paper explores the challenges of charge and discharge control for electric vehicles, such as demand uncertainty, battery management, and grid balance. An intelligent control model based on the Deep Q-Network (DQN) is proposed, which uses battery state, power, and electricity price as state variables, takes the charge and discharge decision as the action variable, and optimizes performance to maximize the long-term cumulative reward. Double DQN and experience replay techniques are used to accelerate convergence and improve stability, and an adaptive exploration-rate adjustment mechanism is designed to balance exploration and exploitation and avoid convergence to local optima.
In order to verify the effectiveness of the method, a simulation model of electric vehicle charging and discharging was constructed in MATLAB/Simulink R2023a (MathWorks, Natick, MA, USA), and comparative experiments were carried out. The results show that the charge and discharge management strategy based on deep reinforcement learning (DRL) is more effective than the traditional strategy: it improves energy utilization efficiency, reduces the charging cost, and extends the battery life. Compared with the heuristic algorithm, the average charging cost is reduced by 12.5%, and the battery life is increased by 8.3%. The strategy also shows good robustness and adaptability, quickly adapting to changes in electricity prices and user demand.
In conclusion, this paper proposes an intelligent charge and discharge management strategy for electric vehicles based on deep reinforcement learning, which provides a new idea and method for solving the difficult problem of charge and discharge control for electric vehicles. Future work will further optimize the DRL model structure, introduce more environmental factors, and carry out application verification on the real vehicle platform to promote the intelligent development of electric vehicle charging and discharging technology.
2. Electric Vehicle Charging and Discharging Challenges
2.1. Limitations of Traditional Control Strategies
With the rapid growth of electric vehicle ownership, the impact of the charging and discharging behavior of large-scale electric vehicle clusters on the power grid cannot be ignored [3]. The literature points out that when EV penetration reaches 10%, the EV charging load may overload the distribution network by 11.7% and cause a voltage deviation of more than 6%. On the other hand, rational use of the energy storage characteristics of electric vehicles to participate in grid peak shaving can alleviate the pressure on the power grid and improve energy utilization efficiency. Studies have shown that 1 million electric vehicles can provide 5.7 GWh of flexible resources, equivalent to one third of pumped storage [4]. Therefore, charge and discharge management faces an increasingly complex decision-making environment: (1) the number of charging stations and electric vehicles is growing rapidly, and the heterogeneity and uncertainty of the controlled objects are increasing; (2) external factors such as power grid dispatching, electricity prices, and renewable energy output are dynamic and variable; (3) it is necessary to weigh many factors, such as electric vehicle user experience, grid peak shaving and valley filling, and charging station operating costs.
2.2. Electric Vehicle Charging and Discharging Demand Analysis
Because of their environmentally friendly and efficient characteristics, electric vehicles are rapidly becoming popular all over the world. However, the charge and discharge problem of electric vehicles has always been one of the key factors restricting their large-scale application. Electric vehicle users want to complete charging at the smallest cost and in the shortest time, and at the same time want to extend the battery life as much as possible. This requires that the charging and discharging control strategy of electric vehicles meet the needs of users while minimizing the charging cost, shortening the charging time, and extending the battery life.
Most of the traditional charge and discharge control strategies are based on preset rules, which lack flexibility and adaptability. For example, the most common constant-current-constant-voltage (CC-CV) charging strategy is simple and reliable, but it is difficult to adapt to complex and changeable practical application scenarios. Although the on-board battery management system (BMS) can monitor the battery status in real time and adjust the charge and discharge parameters, its control algorithm is usually based on the empirical model, which makes it difficult to deal with unknown situations.
The charging and discharging demand of electric vehicles shows obvious spatial and temporal distribution characteristics. In the time dimension, the price of electricity varies significantly between periods; the peak-period price is two to three times the off-peak price. Reasonably arranging the charging period to avoid the peak can significantly reduce the charging cost. In the spatial dimension, charging facilities are unevenly distributed across regions, and vehicles often need to detour to charge, resulting in range anxiety. Optimizing the location of charging stations and guiding vehicles to charge nearby can alleviate this problem. In addition, the choice between fast and slow charging is worth weighing: fast charging greatly shortens the charging time, but frequent fast charging accelerates battery aging; slow charging takes longer but is conducive to extending battery life.
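As a concrete illustration of the time-of-use effect described above, the following sketch schedules a fixed charging demand into the cheapest hours of a hypothetical tariff. All numbers here (the prices, the 40 kWh demand, the 7 kWh/h limit) are illustrative assumptions, not data from this study:

```python
# Hypothetical time-of-use tariff: peak hours priced ~3x the off-peak rate.
PRICES = [0.10] * 7 + [0.30] * 12 + [0.10] * 5  # USD/kWh for hours 0-23

def greedy_schedule(prices, energy_kwh, max_kwh_per_hour):
    """Fill the cheapest hours first until the total energy demand is met."""
    plan = [0.0] * len(prices)
    remaining = energy_kwh
    for h in sorted(range(len(prices)), key=lambda i: prices[i]):
        if remaining <= 0:
            break
        plan[h] = min(max_kwh_per_hour, remaining)
        remaining -= plan[h]
    return plan

plan = greedy_schedule(PRICES, energy_kwh=40.0, max_kwh_per_hour=7.0)
cost = sum(p * e for p, e in zip(PRICES, plan))
# Off-peak charging costs 40 kWh x 0.10 = 4.0; drawing the same energy
# entirely at peak hours would cost 40 kWh x 0.30 = 12.0.
```

Under these assumed prices, avoiding the peak window cuts the charging bill to one third, which is exactly the effect a learned charging policy is expected to exploit automatically.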
Deep reinforcement learning is making progress in several areas, especially smart grids and autonomous driving. Wen et al. developed a DQN-based EV charging strategy that discretized the continuous control problem and learned the optimal charging power sequence through DQN to reduce cost and maintain user comfort. Simulation results show that this strategy saves 17.1% in charging cost compared with a rule-based algorithm. Li et al. proposed an intelligent charging algorithm based on DDPG that takes into account real-time electricity prices and renewable energy prediction, learning continuous control strategies to reduce costs and smooth grid loads. Their simulation results show that the DDPG algorithm reduces the charging cost by 22.7% compared with an hourly electricity price strategy. These studies prove that deep reinforcement learning is an effective method for optimizing the charging and discharging of electric vehicles.
3. Deep Reinforcement Learning Concept
3.1. Reinforcement Learning Basics
In the field of electric vehicle charge and discharge control management, cost minimization and energy efficiency maximization are the two pillars of optimization objectives. The use of the Deep Reinforcement Learning (DRL) algorithm for intelligent decision-making has been proven to be an effective strategy. On the basis of an in-depth study of the DRL algorithm, understanding the construction of the reward function is very important to guide the implementation of the charge and discharge management policy.
The development of an EV charging and discharging DRL model requires the construction of a state space covering the vehicle's charging state, the grid state, and user demand. The action space includes feasible charge and discharge operations, such as selecting the charging power. Neural networks are used to predict long-term returns and optimize the strategy. The DRL algorithm adopts the actor–critic framework: the actor generates a strategy, and the critic evaluates the action value.
Just like the careful design of the reward signal in the standard reinforcement learning model, the setting of the reward function is a key part of the EV charging strategy optimization, which needs to consider the immediate cost during the charging and discharging process, the long-term battery health, and the degree of satisfaction of user needs. Through a large number of simulation experiments and real-world experiments, researchers can refine and adjust the reward function to ensure that EV charge and discharge control management strategies can adapt to the changing environment, market prices, and demand patterns, thus significantly improving the economic efficiency and energy efficiency of the entire system.
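A minimal sketch of such a composite reward is given below. The weights and penalty forms are illustrative assumptions made for this example; the text describes the same three ingredients (immediate cost, battery health, user satisfaction) but does not give a closed form here:

```python
def reward(price, power_kw, soc, soc_target, dt_h=1.0,
           w_cost=1.0, w_health=0.5, w_user=0.5):
    """Composite reward: negative weighted sum of cost and penalty terms.

    price: electricity price; power_kw > 0 charges, < 0 discharges;
    soc: state of charge in [0, 1]; soc_target: desired departure SOC.
    """
    # Immediate electricity cost (becomes revenue when discharging to the grid).
    cost = price * power_kw * dt_h
    # Battery-health proxy: penalize high power and SOC extremes.
    health_penalty = (power_kw ** 2) * 1e-3 + max(0.0, 0.2 - soc) + max(0.0, soc - 0.9)
    # User satisfaction: penalize distance from the desired departure SOC.
    user_penalty = abs(soc_target - soc)
    return -(w_cost * cost + w_health * health_penalty + w_user * user_penalty)
```

With this shape, charging during expensive hours or stressing the battery lowers the reward, so maximizing the cumulative reward pushes the agent toward cheap, gentle charging that still meets the user's target SOC.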
3.2. Current Situation of Deep Reinforcement Learning
In recent years, Deep Reinforcement Learning (DRL) has made significant progress in many fields. DRL combines Deep Learning (DL) with Reinforcement Learning (RL), using Deep Neural Networks (DNNs) to approximate the optimal strategy and overcoming the limitations of traditional RL methods, such as an overly large state space and poor generalization ability [5]. Mnih et al. [6] proposed the Deep Q-Network (DQN) in 2015, which uses Convolutional Neural Networks (CNNs) to learn control strategies directly from high-dimensional input data and achieved performance beyond that of human players on Atari 2600 games. Since then, DQN and its variants have been widely used in games, robot control, and other fields [7]. In order to further improve the exploration efficiency and training stability of DRL, Lillicrap et al. [8] proposed the Deep Deterministic Policy Gradient (DDPG) algorithm, which uses an actor–critic architecture to learn the state–action value function (critic) and a deterministic strategy (actor) at the same time, achieving excellent performance in control tasks with continuous action spaces. Schulman et al. [8] proposed the Proximal Policy Optimization (PPO) algorithm, which avoids drastic policy changes by constraining the policy update amplitude, improving training stability while ensuring exploration efficiency; it is one of the most advanced DRL algorithms. These DRL algorithms have made remarkable achievements in standard test environments, such as games and robots, but their application in actual industrial control systems still faces many challenges.
In the future, on the one hand, it is necessary to further improve the sample efficiency, training stability, and optimization performance of DRL algorithms; on the other hand, it is necessary to develop an effective DRL framework tailored to the characteristics of EV charging and discharging scenarios, while strengthening the security, robustness, and interpretability of the DRL system, and finally achieve a breakthrough in the practical application of DRL technology in EV energy management [9,10].
4. Methodological Framework
4.1. Construction of Deep Reinforcement Learning Model
In the field of intelligent electric vehicle charging, the construction and implementation of a deep reinforcement learning (DRL) model is a key step in achieving efficient and intelligent control. This study aims to explore an innovative DRL algorithm to improve energy utilization efficiency and maintain power grid stability by accurately regulating the charging and discharging process of electric vehicles. Therefore, based on the deep reinforcement learning process of electric vehicle charging control and an end-to-end network architecture, we build a deep reinforcement learning model suitable for intelligent electric vehicle charging. As shown in Figure 1, the overall deep reinforcement learning process for EV charging control is clearly illustrated.
To improve the energy efficiency of grid operators and electric vehicle users, the research team defined the goals of the reinforcement learning model. In this study, a large amount of real-time charging pile data was collected and pre-processed to ensure data quality. Next, the team created simulation environments and reward mechanisms to simulate energy consumption and costs under different charging strategies, providing a basis for algorithmic learning.
As for the core algorithm of deep reinforcement learning, this study adopts the Double DQN (DDQN) algorithm, which decouples action selection from action evaluation between the policy network and the target network. This algorithm can effectively alleviate the problem of value overestimation, reduce the convergence time, and improve the stability and robustness of the final strategy. The model then enters the training stage, using experience replay and a fixed Q-target to improve data utilization and algorithm stability during training.
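The DDQN target just described can be sketched in a few lines of numpy (the arrays below are toy values, not the study's data): the online network chooses the next action, while the target network supplies its value.

```python
import numpy as np

def double_dqn_targets(rewards, next_q_online, next_q_target, dones, gamma=0.9):
    """rewards, dones: shape (batch,); next_q_*: shape (batch, n_actions)."""
    # Select the next action with the online network...
    best_actions = np.argmax(next_q_online, axis=1)
    # ...but evaluate it with the (periodically frozen) target network.
    evaluated = next_q_target[np.arange(len(rewards)), best_actions]
    return rewards + gamma * evaluated * (1.0 - dones)
```

Because the maximizing network and the evaluating network differ, a spuriously high Q-estimate in one of them no longer inflates the target, which is the overestimation fix that DDQN contributes.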
To test the accuracy and generalization ability of the model, its performance was comprehensively evaluated in multiple validation environments, including performance on preset tasks and the ability to adapt to changes in real-world factors. During iterative training, the target network is used to stabilize training, and after each iteration the model parameters are fine-tuned to ensure the optimal performance of the algorithm in EV charging control [11].
In the deep reinforcement learning model construction code, we define the architecture of a deep neural network and strictly specify its parameters, such as the number of layers, the number of nodes per layer, and the activation functions. The detailed network construction code is displayed in Figure 2. In particular, using the Python language with the Keras API under the TensorFlow framework, the reinforcement learning model is implemented through the ‘Sequential’ model class and the ‘Dense’ layer method. Parameters such as the action-value dimension, the number of hidden-layer nodes, and the activation functions of the middle and output layers are defined in the class ‘DeepQNetwork’, and the ‘adam’ optimizer and ‘mse’ loss function are passed to ‘model.compile’ to complete the model compilation. These details are designed and selected to further improve the accuracy and efficiency of the model in actual operation.
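A sketch of this construction is shown below (Figure 2 itself is not reproduced here). Layer sizes and input/output dimensions are placeholder assumptions; the ‘Sequential’/‘Dense’ structure and the ‘adam’/‘mse’ compilation follow the text:

```python
import numpy as np
from tensorflow import keras

class DeepQNetwork:
    """Sketch of the network class named in the text, built with Keras."""

    def __init__(self, state_dim, action_dim, hidden_units=64):
        # Sequential stack of Dense layers; the output layer is linear so
        # each unit can emit an unbounded Q-value for one action.
        self.model = keras.Sequential([
            keras.Input(shape=(state_dim,)),
            keras.layers.Dense(hidden_units, activation="relu"),
            keras.layers.Dense(hidden_units, activation="relu"),
            keras.layers.Dense(action_dim, activation="linear"),
        ])
        # Optimizer and loss exactly as specified: 'adam' and 'mse'.
        self.model.compile(optimizer="adam", loss="mse")

    def q_values(self, states):
        return self.model.predict(np.asarray(states), verbose=0)
```

In use, one instance serves as the policy network and a second, periodically synchronized instance serves as the target network.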
The results show that the optimized DRL model achieves high efficiency in electric vehicle charging control. The intelligent control strategy effectively improves the charging efficiency and keeps the power grid stable. The research introduces a new adaptive approach to EV charging and is expected to extend to smart grid management.
4.2. Design of Charge and Discharge Control Algorithm
In this paper, a charging and discharging control strategy for electric vehicles based on deep Q-learning (DQN) is designed. The DQN model hyperparameters are set as follows: a learning rate of 5 × 10⁻⁴, a discount factor γ of 0.9, and an exploration probability ϵ that decays from 1 to 0.01. The policy network and the target network are synchronized every 100 cycles.
When constructing a deep Q learning network, a three-layer fully connected neural network is used, with 256 neurons in each layer, followed by the LeakyReLU activation function. The reward mechanism balances battery health with driving performance, considering charge and discharge efficiency and environmental changes. State space data include speed, battery power, external temperature, etc., reflecting the operating status of electric vehicles. The target power allocation is designed to increase driving range and reduce energy consumption.
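The ϵ schedule above (decay from 1 to 0.01) can be sketched as follows; the exponential decay rate is an illustrative assumption, since the text states only the endpoints:

```python
import random

EPS_START, EPS_END, DECAY = 1.0, 0.01, 0.995

def epsilon_at(step):
    """Exploration probability after `step` training steps, floored at 0.01."""
    return max(EPS_END, EPS_START * DECAY ** step)

def select_action(q_values, step, rng=random):
    """ϵ-greedy: explore with probability ϵ, otherwise take the best action."""
    if rng.random() < epsilon_at(step):
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=q_values.__getitem__)
```

Early in training the agent acts almost uniformly at random; as ϵ approaches its floor, it mostly exploits the learned Q-values while retaining a 1% chance of exploration.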
The experience replay mechanism is introduced in the charging strategy training, and the random sampling method is improved by assigning higher priority weights to recent and high-value experiences. This enables the network to constantly learn from new data and converge to the optimal solution, with hyperparameters tuned through grid search and Bayesian optimization to improve the control performance. The complete training, hyperparameter optimization, and control flow are illustrated in Figure 3.
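A compact sketch of this weighted replay follows. The priority formula (TD-error magnitude plus a small recency bonus) is an assumption for illustration; the text specifies only that recent and high-value experiences receive larger weights:

```python
import random
from collections import deque

class PrioritizedReplay:
    """Replay buffer that oversamples recent and high-TD-error transitions."""

    def __init__(self, capacity=10000, recency_bonus=0.01):
        self.buffer = deque(maxlen=capacity)
        self.recency_bonus = recency_bonus

    def push(self, transition, td_error):
        # Priority grows with the TD-error magnitude ("high-value" experience);
        # the small constant keeps zero-error transitions sampleable.
        self.buffer.append((transition, abs(td_error) + 1e-3))

    def sample(self, batch_size):
        # Later (more recent) entries receive a small additive weight boost.
        weights = [p + i * self.recency_bonus
                   for i, (_, p) in enumerate(self.buffer)]
        items = [t for t, _ in self.buffer]
        return random.choices(items, weights=weights, k=batch_size)
```

Sampling with replacement via weighted `random.choices` keeps the sketch short; a production buffer would typically also apply importance-sampling corrections to the loss.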
In the test phase, the implementation efficiency of the proposed deep Q-learning-based power allocation strategy for electric vehicles was evaluated. In the simulation experiment, the fuzzy energy management strategy of the EV is verified on the data set, and the power distribution effect under different charging and discharging conditions is compared to ensure the feasibility and superiority of the proposed strategy in practical applications. The structure of the fuzzy energy management strategy is presented in Figure 4.
The whole strategy training and optimization process can be summarized as follows: starting from initializing model hyperparameters, designing a reward mechanism, collecting state space data, setting target power allocation, training the network with experience playback until network performance meets the standard, then performing hyperparameter optimization, and finally, conducting the strategy performance test on a real vehicle or simulation platform.
This study explores a new charging and discharging management strategy for electric vehicles, which utilizes deep reinforcement learning and efficient data processing to optimize power distribution, reduce energy consumption, and enhance battery life. The training and control process of electric vehicle power distribution strategy based on deep Q learning provides a new perspective for energy management and lays a foundation for future research.
5. Experimental Results and Discussion
5.1. Algorithm Simulation Experiment
To ensure the accuracy and reliability of the experimental results, we define the loss function L(θ) to guide the learning process of the algorithm. The function adopts the basic form of Q-learning, where θ represents the parameters of the neural network, Q(s, a; θ) represents the expected return of taking action a in the current state s, θ⁻ represents the parameters of the target network, and γ is the discount factor used to calculate the long-term cumulative return. The specific form of the loss function is

L(θ) = E[(r + γ max_a′ Q(s′, a′; θ⁻) − Q(s, a; θ))²],

where r is the immediate reward and s′ is the next state. The optimization of this loss function aims to reduce the gap between the value of the selected action and the value of the optimal action, thereby driving the strategy towards maximizing the cumulative return.
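Numerically, the loss L(θ) described above reduces to a mean squared TD error; the toy batch below (all values invented for illustration) shows the computation:

```python
import numpy as np

def dqn_loss(q_sa, rewards, next_q_target, gamma=0.9):
    """Mean squared TD error L(θ).

    q_sa: Q(s, a; θ) of the actions actually taken, shape (batch,);
    next_q_target: Q(s', ·; θ⁻) from the target network, shape (batch, n_actions).
    """
    targets = rewards + gamma * next_q_target.max(axis=1)  # r + γ max_a' Q(s',a';θ⁻)
    return float(np.mean((targets - q_sa) ** 2))
```

Gradient descent on this quantity with respect to θ (holding θ⁻ fixed) is what drives the selected-action values toward the bootstrapped targets.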
The deep Q-learning algorithm outperforms the benchmark algorithms, especially in power management and response time. Meanwhile, the A3C algorithm leads in system stability and user satisfaction, demonstrating the potential of more complex strategies.
The figure “Electric vehicle charge and discharge optimization strategy” shows the performance of different algorithms in real-time control and clearly reflects the process of strategy optimization, providing visual support for readers. The comparative performance of the different strategies is shown in Figure 5. Each algorithm was verified through a large number of simulation experiments, including specific execution rounds and corresponding training cycles. Through horizontal comparison and vertical analysis, we not only demonstrate the numerical superiority of the proposed algorithms but also emphasize their effectiveness and applicability in actual power grid operation.
In this study, deep reinforcement learning technology is used to significantly improve the intelligent charge and discharge management of electric vehicles. Experiments show that this strategy optimizes the charge–discharge process and brings economic and stability benefits to grid operators and users. Future research will expand the application of deep reinforcement learning in multiple energy systems to enable a broader range of smart energy management and services.
5.2. Result Analysis and Evaluation
In order to accurately control the charging and discharging process of electric vehicles, a new set of charging and discharging management strategies is designed using a reinforcement learning algorithm based on the deep Q-network (DQN). In constructing the strategy model, the input state is composed of multi-dimensional data such as the current battery capacity, charge and discharge demand, time interval, historical charging behavior, and power grid status. The action space consists of two kinds of decisions, charging and discharging, using a discrete action space design. The reward function takes into account factors such as the current electricity price, charge and discharge efficiency, and battery life, in order to achieve the dual goals of optimal economic benefit and battery longevity.
In the experimental part, we built an accurate battery model in the simulation environment and tested the DQN strategy trained on grid and user data. The results showed that after 500,000 iterations, the DQN strategy increased electricity savings by an average of 20% and battery life by about 10%, while reducing peak-hour charging demand and grid stress.
Experience replay and a memory bank are used during training to prevent short-sightedness of the deep Q-network. The learning rate is reduced from 5 × 10⁻⁴ to 1 × 10⁻⁵ to avoid overfitting. Noise is added to the reward signals, borrowing from Noisy Nets, to improve the generalization and robustness of the strategy. A t-test and ANOVA were used for statistical analysis to ensure the significance of the results and exclude random interference. The cost sensitivity ratio is calculated, and simulated and real performance are compared to ensure the practicality and generalizability of the results.
As electric vehicles spread, the load on the grid increases. This research provides new ideas for charge and discharge management and demonstrates economic and environmental benefits in practice. It has implications for electric vehicle charging and discharging behavior, grid dispatching strategies, and renewable energy utilization. Future work will consider the global impact of charging and discharging on urban power grids and explore the construction of smart charging pile networks to support the development of smart cities and green transportation.
6. Conclusions
This paper proposes a management strategy for electric vehicle charge and discharge control based on deep reinforcement learning. By constructing a deep reinforcement learning model and designing the corresponding charge and discharge control algorithm, the adaptive optimization control of the charge and discharge process of electric vehicles is realized.
Simulation experiments show that this strategy can effectively improve the energy efficiency of electric vehicles and extend the battery life. Compared with traditional methods, deep reinforcement learning has significant advantages in battery SOC maintenance, charging cost reduction, and grid balance. In this paper, the Double DQN algorithm is used to optimize the deep reinforcement learning model, and Experience Replay and Target Network are introduced to improve the training stability and speed. The reward function combines charging cost, grid balance, battery health, and other factors to guide the agent to learn the best charging and discharging strategy. In the design of the control algorithm, a hierarchical decision-making architecture is proposed. The macro layer is responsible for the global optimization goal, and the micro layer outputs the charging and discharging power scheduling scheme, which not only ensures the global optimization but also improves the response speed.
It should be pointed out that there are still some limitations in this study. First of all, the simulation experiment is carried out in a specific scenario, and the practical application needs to consider the influence of more uncertain factors. Secondly, this paper focuses on the optimal control of a single electric vehicle, and multi-vehicle collaborative optimization strategies need to be explored in the future. In addition, external factors such as power grid scheduling and price mechanism will also affect the control performance, and a more comprehensive analysis is needed.
Nevertheless, the research work in this paper is of great significance for the development of intelligent charging and discharging control technology for electric vehicles. Deep reinforcement learning provides a new idea and method to solve this complex optimization control problem. With the continuous development of intelligent connected vehicles and smart grid technology, it is expected that more research will be devoted to the application of artificial intelligence technology to improve the level of electric vehicle charge and discharge management, and promote the interaction of vehicle networks and the efficient use of new energy. The exploratory work of this paper lays a foundation for the subsequent research.
Author Contributions
Conceptualization, C.Y.; methodology, C.Y.; software, C.Y.; validation, C.Y., W.H. and X.L.; formal analysis, C.Y.; investigation, C.Y.; resources, C.Y.; data curation, C.Y.; writing—original draft preparation, C.Y.; writing—review and editing, C.Y.; visualization, C.Y.; supervision, C.Y.; project administration, C.Y. All authors have read and agreed to the published version of the manuscript.
Funding
This work was financially supported by the Science and Technology Research Project of the Science and Technology Bureau of Bishan District, Chongqing (Grant No. BSKJ2022002) and the Science and Technology Research Project of Chongqing Education Commission (Grant No. KJQN202203702).
Institutional Review Board Statement
Not applicable.
Informed Consent Statement
Not applicable.
Data Availability Statement
The data that support the findings of this study are available from the corresponding author upon reasonable request.
Acknowledgments
Thanks to Chongqing Vocational and Technical University of Mechatronics for providing administrative and technical support in terms of experimental sites, equipment use, and academic exchanges; gratitude to the relevant staff who assisted in the data collection process. All individuals and institutions included in this section have provided their explicit consent to be acknowledged.
Conflicts of Interest
The authors declare no conflicts of interest.
References
- Zhang, Z. Coordinated Charging Algorithm of Electric Vehicles Based on Deep Reinforcement Learning. Inf. Technol. Netw. Secur. 2022, 23, 18774–18784.
- Du, M.; Li, Y.; Wang, B. Deep Reinforcement Learning Optimization Method for Electric Vehicle Charging Control. Chin. J. Electr. Eng. 2019, 39, 4042–4049.
- Tang, X.; Chen, J.; Liu, T.; Li, J.; Hu, X. Research on Intelligent Following Control and Energy Management Strategy of Hybrid Electric Vehicles Based on Deep Reinforcement Learning. J. Mech. Eng. 2022, 57, 237–246.
- Zhan, H.; Jiang, C.; Su, Q. Electric Vehicle Charging Guidance Method Based on Hierarchical Reinforcement Learning. Electr. Power Autom. Equip. 2022, 42, 264–272.
- Zhao, X.; Zhang, K.; Feng, D.; Li, H.; Zhou, Y. Real-Time Optimal Scheduling Strategy of Electric Vehicle Clusters Based on Reinforcement Learning. Smart Power 2022, 50, 53–59+81.
- Zhang, X.; Shen, L.; Tang, P.; Shi, Y.; Li, Y. Research on Energy Management Strategy of the THS-III Platform Based on Deep Reinforcement Learning. Automot. Technol. 2023, 4, 16.
- Zhang, W.; Zang, X.; Zhu, J. Optimization of Real-Time Scheduling Strategy for Electric Vehicle Charging Stations Based on Reinforcement Learning. Electr. Power Autom. Equip. 2022, 42, 134–141.
- Zhao, X.; Hu, J. Deep Reinforcement Learning Optimization Method for Charging Behavior of Clustered Electric Vehicles. Power Grid Technol. 2021, 45, 2319–2327.
- Tan, C.; Yang, S.; Jiang, Z. Analysis and Optimization of Electric Vehicle Charging and Discharging Behavior. South. Agric. Mach. 2020, 1, 8850654.
- Luo, B.; Long, T.; Mai, R.; Dai, R.; He, Z.; Li, W. Analysis and Design of Hybrid Inductive and Capacitive Wireless Power Transfer for High-Power Applications. IET Power Electron. 2018, 11, 2263–2270.
- Luo, S.; Li, S.; Zhao, H. Reactive Power Comparison of Four-Coil, LCC and CLC Compensation Networks for Wireless Power Transfer. In Proceedings of the 2017 IEEE PELS Workshop on Emerging Technologies: Wireless Power Transfer (WoW), Chongqing, China, 20–22 May 2017; IEEE: New York, NY, USA, 2017; pp. 268–271.