Reinforcement Learning for Fair and Efficient Charging Coordination for Smart Grid
Abstract
1. Introduction
- Formulation as a Markov Decision Process (MDP): We formulate the home battery management problem in an SG environment as an MDP, which allows the RL agent to make optimal charging decisions based on the current system state and past actions, without requiring perfect knowledge of future events. This formulation enables the RL agent to learn effective control strategies in a dynamic and uncertain SG environment.
- Single-Agent Multi-Environment RL for Home Battery Management: We propose a novel approach utilizing a single RL agent for simultaneously optimizing the charging of multiple home batteries in an SG. This approach treats the problem as a single-agent multi-environment scenario, enabling the agent to learn optimal charging policies while considering the unique characteristics of each home battery system, and aiming to achieve the following objectives:
  - Enhanced Grid Stability: Our RL-based approach is designed to prevent grid overloads by learning to optimize charging schedules and avoid excessive power demand from batteries, leading to a more stable and reliable power grid.
  - Fair Energy Allocation: We incorporate a fairness criterion into the RL agent’s decision-making process. This ensures that all customers receive fair energy distribution, preventing significant disparities in the state of charge (SoC) levels of their home batteries.
  - Guaranteed Customer Satisfaction: The RL agent prioritizes maintaining battery levels above a specified threshold. This ensures customer satisfaction by preventing battery depletion and guaranteeing that energy needs are consistently met.
- Extensive Evaluations and Experiments: We have conducted extensive experiments using a real dataset to evaluate our proposed RL system. Our experimental results demonstrate significant improvements in fairness, power efficiency, and customer satisfaction compared to traditional optimization methods, highlighting the potential of RL for optimizing smart grid operations and energy management systems.
2. Related Works and Motivations
2.1. Related Works
2.1.1. Home Energy Management Systems
2.1.2. Microgrid Management System
2.1.3. EVs’ Charging Control
2.2. Motivation
- Adaptability to Real-Time Changes: Traditional methods produce static solutions that fail to adapt to real-time changes in energy consumption and generation patterns. Reinforcement learning continuously learns and adjusts policies based on new data, ensuring optimal performance in dynamic environments.
- Handling Uncertainty: The variability and uncertainty in renewable energy generation and consumption patterns pose significant challenges for traditional methods. Reinforcement learning can manage this uncertainty by optimizing policies that account for the stochastic nature of the environment.
- Scalability: As the number of batteries and the complexity of the system increase, traditional optimization methods struggle to find optimal solutions. Reinforcement learning scales effectively with system complexity, making it suitable for large-scale applications.
3. Preliminaries: Key Concepts and Problem Formulation
3.1. Reinforcement Learning (RL)
3.1.1. Distinction from Traditional Optimization Techniques
3.1.2. Optimization in RL
- At one extreme, the agent relies solely on past knowledge and ignores the newly acquired information from the latest interaction (pure exploitation).
- At the other extreme, the agent completely disregards previous knowledge and focuses entirely on exploring new actions in the current state (pure exploration). A minimal sketch of balancing these two extremes is shown below.
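To make the trade-off concrete, the following is a minimal ε-greedy sketch. It is illustrative only: PPO, used later in this work, balances exploration through its stochastic policy rather than ε-greedy, and the starting value and floor of ε below are assumptions; only the 0.998 decay and 600 episodes match the hyperparameters reported in the experimental setup.

```python
import random

def epsilon_greedy_action(q_values, epsilon):
    """With probability epsilon pick a random action (explore);
    otherwise pick the action with the highest estimated value (exploit)."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])

# Decaying epsilon gradually shifts the balance from exploration to exploitation.
epsilon, min_epsilon = 1.0, 0.05          # starting value and floor are assumptions
for episode in range(600):                # 600 episodes and 0.998 decay as reported in Section 6.1
    epsilon = max(min_epsilon, epsilon * 0.998)
```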
3.2. Deep Learning (DL)
- Convolutional Neural Networks (CNNs) excel at tasks that involve recognizing patterns in grid-like data, making them well-suited for image and video analysis.
- Feed-forward Neural Networks (FFNNs) are general-purpose networks where information flows in one direction, from the input layer to the output layer through hidden layers. They can be applied to a wide range of problems.
- Recurrent Neural Networks (RNNs) are specifically designed to handle sequential data, where the output depends not only on the current input but also on past inputs. This makes them well-suited for tasks like language processing and time series forecasting.
3.3. Proximal Policy Optimization (PPO)
Algorithm 1: Proximal Policy Optimization [37]
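The core of Algorithm 1 is the clipped surrogate objective of Schulman et al. [37]. The sketch below is a minimal PyTorch-style implementation of that loss term only (value and entropy terms omitted); the clip range of 0.2 is the commonly used default, not necessarily the value used in this work.

```python
import torch

def ppo_clipped_loss(new_logp, old_logp, advantages, clip_eps=0.2):
    """Clipped surrogate objective from PPO (Schulman et al., 2017).
    new_logp/old_logp: log-probabilities of the taken actions under the current
    and the data-collecting policy; advantages: estimated advantages A_t."""
    ratio = torch.exp(new_logp - old_logp)                  # r_t(theta) = pi_theta / pi_theta_old
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    return -torch.min(unclipped, clipped).mean()            # maximize surrogate -> minimize its negative
```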
4. Proposed Scheme
4.1. System Architecture
- Communities: Each community comprises multiple houses with solar panels generating electricity from the sun. The excess energy generated is stored in batteries for later use. If a battery stores more energy than the consumer needs, the excess energy is sold to the electricity company.
- Net Meters: Each house has a net meter that records the amount of power generated, consumed, and stored. These meters send data to the control centers and receive power allocation instructions.
- Control Centers: Control centers are responsible for monitoring and managing the power distribution within each community. They receive real-time data from the net meters and make decisions regarding power allocation to maintain optimal operation of the grid.
- Power Grid: The power grid supplies additional power to the communities when the renewable energy generated is insufficient. It also receives excess power from the houses when their generation exceeds consumption.
4.2. Problem Objectives
- Minimize Overall Power Consumption: Minimizing overall power consumption offers significant benefits. Grid operators benefit from reduced operational costs due to lower energy generation requirements, which translates to potential delays in expensive grid infrastructure upgrades. This promotes efficient grid management and long-term power sustainability.
- Maintain Battery Level within Thresholds: Maintaining customer battery levels within pre-defined thresholds is crucial for both the grid and its customers. This ensures grid stability by allowing the grid to draw from stored energy during peak demand periods, reducing reliance solely on power generation. This can potentially delay the need for additional power plants, saving resources and promoting grid sustainability. For customers, maintaining adequate battery levels guarantees uninterrupted service and the ability to power their homes and appliances reliably, meeting their essential energy needs.
- Minimize Discrepancies in Battery Levels (Fairness): Fairness in power distribution, minimizing discrepancies among customer battery levels, is essential for both the grid and its customers. This promotes efficient utilization of available grid capacity by ensuring power is distributed fairly. It avoids overloading specific grid sectors and prevents situations where some customers face power shortages while others have surplus stored energy. This fosters overall grid stability and efficient resource management. From the customer’s perspective, fair charging power allocation builds trust in the grid management system. Customers can be confident they receive a fair share of available resources, ensuring reliable and consistent power availability.
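Taken together, the three objectives above can be folded into the return maximized by the RL agent. The following formulation is a hedged illustration: the weights w_1, w_2, w_3, the allocated powers p_i^t, and the component symbols r_grid, r_fair, r_cust are placeholders, with the actual reward components defined in Section 4.3; only the discount factor gamma corresponds to a reported hyperparameter.

```latex
% Illustrative RL objective; weights and reward symbols are placeholders.
\max_{\pi}\;\mathbb{E}_{\pi}\!\left[\sum_{t=0}^{T}\gamma^{t}
  \big(w_{1}\,r_{\mathrm{grid}}^{t} + w_{2}\,r_{\mathrm{fair}}^{t} + w_{3}\,r_{\mathrm{cust}}^{t}\big)\right]
\quad\text{s.t.}\quad \sum_{i=1}^{N} p_{i}^{t} \le P_{\mathrm{max}}^{t},\qquad p_{i}^{t} \ge 0,
```

where p_i^t is the power allocated to customer i at time step t and P_max^t is the maximum power available for charging.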
4.3. Markov Decision Process (MDP)-Based Reinforcement Learning (RL) Approach
- Agent: The RL agent is responsible for deciding the optimal power allocation actions based on current battery levels and other relevant states.
- Environment: The environment represents the SG and includes the state of each customer’s current battery level and the maximum power available from the grid.
- State (s): The state, denoted by s^t, provides a comprehensive snapshot of the current energy situation in the SG at a given time step t. It consists of the following key elements:
  - s^t: Represents the state of the environment at a given time step t.
  - P_max^t: Denotes the maximum power available for charging from the grid at time step t. This value defines the operational constraint on power allocation, ensuring the agent does not exceed grid capacity. The agent cannot allocate more power in total than what is available from the grid (P_max^t). This constraint ensures the agent’s decisions are feasible within the physical limitations of the system and guides the agent to stay within the permissible power budget, distributing power efficiently among all customers.
  - SoC_i^t: Indicates the current state of charge (battery level) of each customer i at time step t. A higher SoC value for a customer indicates less immediate need for power allocation, allowing the agent to prioritize customers with lower levels. Knowing the SoC of all customers provides a holistic view of the system and helps avoid neglecting critical needs.
- Action (a): The action, denoted by a, represents the power allocation decisions made by the RL agent for each customer’s battery at a given time step t. The action for each customer is a value between (0,1), and the action space comprises all power allocation combinations whose total does not exceed the maximum available power (P_max^t). Each action element corresponds to the amount of power allocated to customer i during the current time step t. The agent aims to select an action that maximizes the expected cumulative reward over time, considering the previously defined objectives. By effectively selecting actions within the action space, the RL agent can learn an optimal policy for power allocation, achieving the desired balance between power saving, fairness, and customer satisfaction in the SG environment.
- State Transition Function (P(s′|s,a)): The transition function defines the probability of transitioning from state s to state s′ given action a. This function captures the dynamics of the SG and determines how the state evolves based on the agent’s actions and the underlying environment. The transition probabilities are influenced by factors such as power consumption patterns, renewable energy generation, and battery characteristics.
- Reward (r): The reward function, denoted by r, provides feedback to the RL agent based on the chosen actions and their impact on the system’s objectives. The reward function is designed to capture the trade-offs between power saving, customer satisfaction, and fairness. It includes the following components (a hedged code sketch of all three appears after this list):
  - Grid Reward (r_grid): A penalty is applied to actions that result in consuming more power than the maximum power allocated for charging, and the reward increases as the gap between the maximum power allocated for charging and the energy consumed for charging widens. This component aims to prevent exceeding the maximum power and to minimize the amount of power used for charging the batteries.
  - Fairness Reward (r_fair): A reward component that incentivizes actions leading to equal battery levels among customers, promoting fairness in power distribution and preventing disparities in battery levels. It is computed from the coefficient of variation (CV) of the customers’ battery levels (Equation (15)), where CV = 0 indicates an equal distribution of battery levels and CV = 1 indicates the greatest disparity. We negate CV and add one so that the reward lies in the range (0,1): a value of 1 corresponds to equal battery levels (CV = 0), while a value of 0 corresponds to the greatest disparity (CV = 1). This shift normalizes the component to the range (0,1), consistent with the other reward components, where 0 is the minimum and 1 is the maximum of each component.
  - Customer Reward (r_cust): This reward component expresses customer satisfaction by targeting adequate battery levels for each customer, ensuring that energy needs are met. It is a piecewise function of the SoC with the following segments:
    - Fixed Penalty (SoC < TH_low): When the SoC is below the low threshold TH_low, the reward is a fixed penalty. This penalty discourages low battery levels, ensuring that customer satisfaction remains a priority by avoiding situations where batteries are critically low.
    - Exponential Segment (TH_low ≤ SoC < TH_med): As the SoC increases from TH_low to the medium threshold TH_med, the reward follows an exponential curve, starting from 0 at TH_low and rising to its value at TH_med. This segment provides a smooth and increasing incentive as the battery charge increases, reflecting an improved customer experience.
    - Linear Segment (TH_med ≤ SoC < TH_high): When the SoC is between TH_med and TH_high, the reward increases linearly from its value at TH_med to its maximum at TH_high. This linear increment incentivizes maintaining battery levels within this range, which is considered optimal for balancing power supply and customer satisfaction.
    - Fixed Reward (SoC ≥ TH_high): Once the SoC exceeds TH_high, the reward is fixed at its maximum value, so there is no incentive to overcharge the batteries above TH_high. We want the agent to keep the SoC between TH_med and TH_high.
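To make the reward structure concrete, the following Python sketch implements one plausible version of the three components described above. The threshold values, the penalty magnitude, the intermediate reward level, the equal weighting, and the exact functional forms are assumptions for illustration only; the paper’s own equations (e.g., Equation (15) for the CV) define the actual shapes.

```python
import numpy as np

# Assumed illustrative constants; the actual thresholds, penalty, and weights may differ.
TH_LOW, TH_MED, TH_HIGH = 0.2, 0.5, 0.8   # SoC thresholds as fractions of battery capacity
PENALTY = -1.0                             # fixed penalty for critically low SoC

def grid_reward(p_max, p_used):
    """Higher reward for leaving more of the charging budget unused; penalize exceeding it."""
    if p_used > p_max:
        return -1.0
    return (p_max - p_used) / p_max        # in (0,1): 1 = no power drawn, 0 = full budget used

def fairness_reward(soc):
    """1 - CV of the battery levels, clipped to (0,1): 1 = equal SoCs, 0 = greatest disparity."""
    soc = np.asarray(soc, dtype=float)
    cv = soc.std() / (soc.mean() + 1e-8)
    return float(np.clip(1.0 - cv, 0.0, 1.0))

def customer_reward(soc_i):
    """Piecewise reward for a single customer's SoC, mirroring the segments above."""
    if soc_i < TH_LOW:
        return PENALTY                                              # fixed penalty
    if soc_i < TH_MED:                                              # exponential rise from 0 to 0.5 (assumed level)
        return (np.exp(soc_i - TH_LOW) - 1.0) / (np.exp(TH_MED - TH_LOW) - 1.0) * 0.5
    if soc_i < TH_HIGH:                                             # linear rise from 0.5 to 1
        return 0.5 + 0.5 * (soc_i - TH_MED) / (TH_HIGH - TH_MED)
    return 1.0                                                      # fixed reward above TH_HIGH

def total_reward(p_max, p_used, soc):
    """Equal-weight combination of the three components (the weighting is an assumption)."""
    r_cust = float(np.mean([customer_reward(s) for s in soc]))
    return grid_reward(p_max, p_used) + fairness_reward(soc) + r_cust
```

The piecewise segments are continuous at the thresholds (0 at TH_low, 0.5 at TH_med, 1 at TH_high in this sketch), which avoids abrupt reward jumps during training.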
5. Dataset Preparation
5.1. Data Source
5.2. Data Preprocessing
- Outlier Removal: We eliminated anomalous measurements potentially caused by malfunctioning equipment. This is a standard practice for enhancing the training efficacy of machine learning models.
- Net Metering: For each house, we computed the net difference between the power consumption readings and the solar panel generation readings. This essentially simulates having a single meter that reflects the net power usage by the house (consumption minus generation).
- Data Resampling: We down-sampled the data from half-hourly readings to hourly readings (24 readings per day). This reduction in data granularity helps preserve customer privacy.
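A minimal pandas sketch of these three preprocessing steps is given below. The file name, column names, and the z-score outlier rule are assumptions for illustration; the actual Ausgrid dataset layout and the outlier criterion used in this work may differ.

```python
import pandas as pd

# File and column names are assumptions; adapt to the actual dataset layout.
df = pd.read_csv("ausgrid_half_hourly.csv", parse_dates=["timestamp"])

# 1) Outlier removal: drop readings far outside the typical range (simple z-score rule as an example).
for col in ["consumption_kwh", "generation_kwh"]:
    z = (df[col] - df[col].mean()) / df[col].std()
    df = df[z.abs() < 5]

# 2) Net metering: net usage = consumption minus solar generation for each house.
df["net_kwh"] = df["consumption_kwh"] - df["generation_kwh"]

# 3) Resampling: aggregate half-hourly readings to hourly readings (24 per day) per house.
hourly = (
    df.set_index("timestamp")
      .groupby("house_id")["net_kwh"]
      .resample("1h")
      .sum()
      .reset_index()
)
```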
5.3. Data Visualization
6. Performance Evaluation
- Scenario 1: P_max is equal to the average consumption. In this scenario, P_max is set to the combined average consumption of all 10 customers. This serves as a baseline to understand how closely customer loads can adhere to the overall power limit.
- Scenario 2: P_max is slightly above the average consumption. This scenario investigates how the system copes with a limited surplus of available power compared to typical customer demand.
- Scenario 3: P_max is significantly above the average consumption. This scenario explores how the system manages customer loads when there is a substantial amount of excess power available.
6.1. Experimental Setup
- Ensure System Stability: By restricting the sum of actions to be within the capacity limits, we prevent drawing excessive power, thereby maintaining the overall stability and safety of the grid.
- Facilitate Learning: By providing a well-defined action space, the agent can focus on exploring viable solutions, accelerating the learning process, and improving the convergence of the training algorithm.
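One plausible way to enforce the capacity constraint described above is to project the policy output onto the feasible set before applying it, as in the sketch below. It assumes each raw action is a fraction of P_max in (0,1) and rescales the allocations whenever their total would exceed the available power; the projection used in the actual implementation may differ.

```python
import numpy as np

def allocate_power(raw_actions, p_max):
    """Map raw per-customer actions in (0,1) to power allocations whose total
    never exceeds the available grid power p_max (one plausible projection)."""
    raw = np.clip(np.asarray(raw_actions, dtype=float), 0.0, 1.0)
    total = raw.sum()
    if total > 1.0:              # rescale so the fractions sum to at most 1
        raw = raw / total
    return raw * p_max           # each customer's share of the available power

# Example: three customers, 10 kW available; the total is rescaled to exactly 10 kW.
print(allocate_power([0.9, 0.7, 0.6], p_max=10.0))
```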
6.2. Performance Evaluation Metrics
6.2.1. Capacity Utilization
6.2.2. Battery Threshold Violations
6.2.3. Fairness
6.2.4. Cumulative Reward over the Training Process
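The sketch below gives one plausible realization of the first three metrics from logged episode data (the cumulative reward is simply the sum of per-step rewards). The exact definitions, in particular the critical threshold value, follow the paper’s equations; th_low = 0.2 here is an assumption.

```python
import numpy as np

def capacity_utilization(power_used, p_max):
    """Average fraction of the available charging budget actually used per time step."""
    return float(np.mean(np.asarray(power_used) / np.asarray(p_max)))

def threshold_violations(soc_log, th_low=0.2):
    """Count time steps in which any customer's SoC falls below the critical threshold."""
    soc_log = np.asarray(soc_log)                 # shape: (time_steps, customers)
    return int((soc_log < th_low).any(axis=1).sum())

def fairness(soc_log):
    """Average (1 - CV) of customer battery levels over an episode: 1 = perfectly fair."""
    soc_log = np.asarray(soc_log, dtype=float)
    cv = soc_log.std(axis=1) / (soc_log.mean(axis=1) + 1e-8)
    return float(np.mean(np.clip(1.0 - cv, 0.0, 1.0)))
```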
6.3. Scenario 1: P_max Is Equal to Average Consumption
- Battery Levels Below Critical Threshold: The batteries drop below the critical threshold (TH_low), as shown in Figure 8d–g for different customers. This occurs because the available power is insufficient to serve all customers adequately.
- Fairness as a Secondary Priority: Due to the limited power supply, the agent does not prioritize fairness in battery level distribution, as shown in Figure 8c. Instead, the primary focus is preventing the SoC from dropping below the critical threshold to avoid substantial penalties.
- Adaptive Power Utilization: The agent learns to strategically allocate more power to maintain all SoC levels above TH_low, as shown in Figure 8b. This demonstrates the agent’s ability to adapt its power usage dynamically to prevent critical battery levels despite the limited power availability.
- Reward Optimization: Although P_max is not sufficient for all customers, the agent successfully increases the cumulative reward across episodes, as shown in Figure 8a. This indicates that the RL agent effectively learns a robust policy for action selection under constrained power conditions, optimizing the overall system performance over time.
6.4. Scenario 2: P_max Is Slightly above Average Consumption
- Improved Battery Levels: The batteries drop below the critical threshold (TH_low) much less frequently than in Scenario one, as shown in Figure 9d–g for different customers. The modest surplus in available power allows the agent to better manage the battery levels, ensuring that they remain above the critical threshold more consistently.
- Enhanced Fairness: The agent achieves higher fairness in battery level distribution compared to Scenario one, as shown in Figure 9c. With slightly more power available, the agent can allocate resources more fairly among customers, reducing discrepancies in battery levels.
- Efficient Power Allocation: The agent learns to allocate less power compared to Scenario one while still maintaining sufficient battery levels for all customers, as shown in Figure 9b. This indicates that the RL agent is capable of optimizing power usage more effectively when there is a small surplus.
- Higher Cumulative Reward: The total cumulative reward is higher in this scenario compared to Scenario one due to the increased fairness, more savings in power, and higher SoC levels across episodes, as shown in Figure 9a. This demonstrates that the RL agent not only prevents critical battery levels but also enhances overall system performance by promoting fairness and efficient power utilization.
6.5. Scenario 3: P_max Is Significantly above Average Consumption
- Rare Critical Battery Levels: As shown in Figure 10d–g, the SoC mostly remains above the critical threshold (TH_low), with occasional drops below it. These drops stem from the consumption patterns of certain customers and are typically from very high SoC levels to low levels, reflecting sudden changes in individual consumption.
- Enhanced Fairness: The agent achieves significantly higher fairness in power distribution compared to the two previous scenarios as shown in Figure 10c. The ample power supply allows for fair energy distribution among all customers, minimizing discrepancies in battery levels.
- Optimized Power Utilization: The agent effectively learns to allocate power efficiently, maintaining high SoC levels across all customers while preventing wasteful energy usage as shown in Figure 10b.
- Highest Cumulative Reward: The overall cumulative reward is the highest among all scenarios, indicating the agent’s successful learning and adaptation to the optimal power allocation policy, as shown in Figure 10a.
6.6. Discussion
7. Conclusions and Future Work
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Abbreviations
| Abbreviation | Definition |
| --- | --- |
| SG | Smart Grid |
| SM | Smart Meter |
| AMI | Advanced Metering Infrastructure |
| SO | System Operator |
| ML | Machine Learning |
| DL | Deep Learning |
| RL | Reinforcement Learning |
| DRL | Deep Reinforcement Learning |
| EVs | Electric Vehicles |
| DERs | Distributed Energy Resources |
| CNN | Convolutional Neural Network |
| FFNNs | Feed-forward Neural Networks |
| RNNs | Recurrent Neural Networks |
| PSO | Particle Swarm Optimization |
| ADMM | Alternating Direction Method of Multipliers |
| MPC | Model-Predictive Control |
| MDP | Markov Decision Process |
| DR | Demand Response |
| HVAC | Heating, Ventilation, and Air Conditioning |
| HEMS | Home Energy Management Systems |
| MARL | Multi-Agent Reinforcement Learning |
| DDPG | Deep Deterministic Policy Gradient |
| CommNet | Communication Neural Network |
| LSTM | Long Short-Term Memory |
| MADDPG | Multi-Agent Deep Deterministic Policy Gradient |
| PPO | Proximal Policy Optimization |
References
- Ibrahem, M.I.; Nabil, M.; Fouda, M.M.; Mahmoud, M.M.; Alasmary, W.; Alsolami, F. Efficient privacy-preserving electricity theft detection with dynamic billing and load monitoring for AMI networks. IEEE Internet Things J. 2020, 8, 1243–1258.
- Takiddin, A.; Ismail, M.; Nabil, M.; Mahmoud, M.M.; Serpedin, E. Detecting electricity theft cyber-attacks in AMI networks using deep vector embeddings. IEEE Syst. J. 2020, 15, 4189–4198.
- Ibrahem, M.I.; Badr, M.M.; Fouda, M.M.; Mahmoud, M.; Alasmary, W.; Fadlullah, Z.M. PMBFE: Efficient and privacy-preserving monitoring and billing using functional encryption for AMI networks. In Proceedings of the 2020 International Symposium on Networks, Computers and Communications (ISNCC), Montreal, QC, Canada, 20–22 October 2020; pp. 1–7.
- Ibrahem, M.I.; Badr, M.M.; Mahmoud, M.; Fouda, M.M.; Alasmary, W. Countering presence privacy attack in efficient AMI networks using interactive deep-learning. In Proceedings of the 2021 International Symposium on Networks, Computers and Communications (ISNCC), Dubai, United Arab Emirates, 31 October–2 November 2021; pp. 1–7.
- Yao, L.; Lim, W.H.; Tsai, T.S. A real-time charging scheme for demand response in electric vehicle parking station. IEEE Trans. Smart Grid 2016, 8, 52–62.
- Arias, N.B.; Sabillón, C.; Franco, J.F.; Quirós-Tortós, J.; Rider, M.J. Hierarchical optimization for user-satisfaction-driven electric vehicles charging coordination in integrated MV/LV networks. IEEE Syst. J. 2022, 17, 1247–1258.
- Xu, Z.; Su, W.; Hu, Z.; Song, Y.; Zhang, H. A hierarchical framework for coordinated charging of plug-in electric vehicles in China. IEEE Trans. Smart Grid 2015, 7, 428–438.
- Malisani, P.; Zhu, J.; Pognant-Gros, P. Optimal charging scheduling of electric vehicles: The co-charging case. IEEE Trans. Power Syst. 2022, 38, 1069–1080.
- Saner, C.B.; Trivedi, A.; Srinivasan, D. A cooperative hierarchical multi-agent system for EV charging scheduling in presence of multiple charging stations. IEEE Trans. Smart Grid 2022, 13, 2218–2233.
- Chen, W.; Wang, J.; Zhang, T.; Li, G.; Jin, Y.; Ge, L.; Zhou, M.; Tan, C.W. Exploring symmetry-induced divergence in decentralized electric vehicle scheduling. IEEE Trans. Ind. Appl. 2023, 60, 1117–1128.
- Wang, H.; Shi, M.; Xie, P.; Lai, C.S.; Li, K.; Jia, Y. Electric vehicle charging scheduling strategy for supporting load flattening under uncertain electric vehicle departures. J. Mod. Power Syst. Clean Energy 2022, 11, 1634–1645.
- Afshar, S.; Disfani, V.; Siano, P. A distributed electric vehicle charging scheduling platform considering aggregators coordination. IEEE Access 2021, 9, 151294–151305.
- Zhao, T.; Ding, Z. Distributed initialization-free cost-optimal charging control of plug-in electric vehicles for demand management. IEEE Trans. Ind. Inform. 2017, 13, 2791–2801.
- Akhavan-Rezai, E.; Shaaban, M.F.; El-Saadany, E.F.; Karray, F. Online intelligent demand management of plug-in electric vehicles in future smart parking lots. IEEE Syst. J. 2015, 10, 483–494.
- Kang, Q.; Feng, S.; Zhou, M.; Ammari, A.C.; Sedraoui, K. Optimal load scheduling of plug-in hybrid electric vehicles via weight-aggregation multi-objective evolutionary algorithms. IEEE Trans. Intell. Transp. Syst. 2017, 18, 2557–2568.
- Chen, J.; Huang, X.; Tian, S.; Cao, Y.; Huang, B.; Luo, X.; Yu, W. Electric vehicle charging schedule considering user’s charging selection from economics. IET Gener. Transm. Distrib. 2019, 13, 3388–3396.
- Ceusters, G.; Rodríguez, R.C.; García, A.B.; Franke, R.; Deconinck, G.; Helsen, L.; Nowé, A.; Messagie, M.; Camargo, L.R. Model-predictive control and reinforcement learning in multi-energy system case studies. Appl. Energy 2021, 303, 117634.
- Zhao, X.; Liang, G. Optimizing electric vehicle charging schedules and energy management in smart grids using an integrated GA-GRU-RL approach. Front. Energy Res. 2023, 11, 1268513.
- Wang, K.; Wang, H.; Yang, Z.; Feng, J.; Li, Y.; Yang, J.; Chen, Z. A transfer learning method for electric vehicles charging strategy based on deep reinforcement learning. Appl. Energy 2023, 343, 121186.
- Hossain, M.B.; Pokhrel, S.R.; Vu, H.L. Efficient and private scheduling of wireless electric vehicles charging using reinforcement learning. IEEE Trans. Intell. Transp. Syst. 2023, 24, 4089–4102.
- Verschae, R.; Kawashima, H.; Kato, T.; Matsuyama, T. Coordinated energy management for inter-community imbalance minimization. Renew. Energy 2016, 87, 922–935.
- Zhou, K.; Cheng, L.; Wen, L.; Lu, X.; Ding, T. A coordinated charging scheduling method for electric vehicles considering different charging demands. Energy 2020, 213, 118882.
- Chang, T.H.; Alizadeh, M.; Scaglione, A. Real-time power balancing via decentralized coordinated home energy scheduling. IEEE Trans. Smart Grid 2013, 4, 1490–1504.
- Pinto, G.; Piscitelli, M.S.; Vázquez-Canteli, J.R.; Nagy, Z.; Capozzoli, A. Coordinated energy management for a cluster of buildings through deep reinforcement learning. Energy 2021, 229, 120725.
- Kaewdornhan, N.; Srithapon, C.; Liemthong, R.; Chatthaworn, R. Real-Time Multi-Home Energy Management with EV Charging Scheduling Using Multi-Agent Deep Reinforcement Learning Optimization. Energies 2023, 16, 2357.
- Real, A.C.; Luz, G.P.; Sousa, J.; Brito, M.; Vieira, S. Optimization of a photovoltaic-battery system using deep reinforcement learning and load forecasting. Energy AI 2024, 16, 100347.
- Cai, W.; Kordabad, A.B.; Gros, S. Energy management in residential microgrid using model predictive control-based reinforcement learning and Shapley value. Eng. Appl. Artif. Intell. 2023, 119, 105793.
- Zhang, Z.; Wan, Y.; Qin, J.; Fu, W.; Kang, Y. A Deep RL-Based Algorithm for Coordinated Charging of Electric Vehicles. IEEE Trans. Intell. Transp. Syst. 2022, 23, 18774–18784.
- Zhang, W.; Liu, H.; Xiong, H.; Xu, T.; Wang, F.; Xin, H.; Wu, H. RLCharge: Imitative Multi-Agent Spatiotemporal Reinforcement Learning for Electric Vehicle Charging Station Recommendation. IEEE Trans. Knowl. Data Eng. 2023, 35, 6290–6304.
- Zhang, Y.; Yang, Q.; An, D.; Li, D.; Wu, Z. Multistep Multiagent Reinforcement Learning for Optimal Energy Schedule Strategy of Charging Stations in Smart Grid. IEEE Trans. Cybern. 2023, 53, 4292–4305.
- Fu, L.; Wang, T.; Song, M.; Zhou, Y.; Gao, S. Electric vehicle charging scheduling control strategy for the large-scale scenario with non-cooperative game-based multi-agent reinforcement learning. Int. J. Electr. Power Energy Syst. 2023, 153, 109348.
- Sultanuddin, S.; Vibin, R.; Kumar, A.R.; Behera, N.R.; Pasha, M.J.; Baseer, K. Development of improved reinforcement learning smart charging strategy for electric vehicle fleet. J. Energy Storage 2023, 64, 106987.
- Paudel, D.; Das, T.K. A deep reinforcement learning approach for power management of battery-assisted fast-charging EV hubs participating in day-ahead and real-time electricity markets. Energy 2023, 283, 129097.
- El-Toukhy, A.T.; Badr, M.M.; Mahmoud, M.M.E.A.; Srivastava, G.; Fouda, M.M.; Alsabaan, M. Electricity Theft Detection Using Deep Reinforcement Learning in Smart Power Grids. IEEE Access 2023, 11, 59558–59574.
- Kumari, A.; Tanwar, S. A reinforcement-learning-based secure demand response scheme for smart grid system. IEEE Internet Things J. 2021, 9, 2180–2191.
- Lu, R.; Hong, S.H.; Yu, M. Demand response for home energy management using reinforcement learning and artificial neural network. IEEE Trans. Smart Grid 2019, 10, 6629–6639.
- Sutton, R.S.; Barto, A.G. Reinforcement Learning: An Introduction; MIT Press: Cambridge, MA, USA, 2018.
- Dong, S.; Xia, Y.; Peng, T. Network abnormal traffic detection model based on semi-supervised deep reinforcement learning. IEEE Trans. Netw. Serv. Manag. 2021, 18, 4197–4212.
- Lopez-Martin, M.; Carro, B.; Sanchez-Esguevillas, A. Application of deep reinforcement learning to intrusion detection for supervised problems. Expert Syst. Appl. 2020, 141, 112963.
- Kokar, M.M.; Reveliotis, S.A. Reinforcement learning: Architectures and algorithms. Int. J. Intell. Syst. 1993, 8, 875–894.
- Nguyen, T.T.; Reddi, V.J. Deep reinforcement learning for cyber security. IEEE Trans. Neural Netw. Learn. Syst. 2021, 34, 3779–3795.
- Schulman, J.; Wolski, F.; Dhariwal, P.; Radford, A.; Klimov, O. Proximal policy optimization algorithms. arXiv 2017, arXiv:1707.06347.
- Available online: https://www.ausgrid.com.au/Industry/Our-Research/Data-to-share (accessed on 20 December 2023).
- Blonsky, M.; McKenna, K.; Maguire, J.; Vincent, T. Home energy management under realistic and uncertain conditions: A comparison of heuristic, deterministic, and stochastic control methods. Appl. Energy 2022, 325, 119770.
- Maatug, F. Anomaly Detection of Smart Meter Data. Master’s Thesis, University of Stavanger, Stavanger, Norway, 2021.
- Badr, M.M.; Ibrahem, M.I.; Mahmoud, M.; Fouda, M.M.; Alsolami, F.; Alasmary, W. Detection of false-reading attacks in smart grid net-metering system. IEEE Internet Things J. 2021, 9, 1386–1401.
| Specification | Value |
| --- | --- |
| Capacity | 13.5 kWh |
| Usable Capacity | 13.5 kWh |
| Voltage | 350–450 V |
| Efficiency | 90% round trip |
| Power Output | 5 kW continuous, 7 kW peak |
| Parameter | Value |
| --- | --- |
| Number of Episodes | 600 |
| Discount Factor | 0.8 |
| Learning Rate | 0.01 |
| Epsilon Decay | 0.998 |