A Deep Reinforcement Learning Approach for Energy Management in Low Earth Orbit Satellite Electrical Power Systems
Abstract
1. Introduction
1.1. Typical Approaches for Satellite EPSs
1.2. Motivation and Contribution
- Sequential decision-making: The energy management task involves sequential decision-making under uncertainty and dynamic environmental conditions (e.g., varying solar exposure and power profiles). As a DRL technique, DQN is well-suited for such problems where the goal is to learn optimal policies over time, rather than static mappings between input and output as in supervised learning [20,31].
- Lack of labeled data: Supervised learning approaches (including LSTM- or CNN-based solutions) require large, labeled datasets representing optimal energy management decisions. Such datasets are typically unavailable or infeasible to generate at scale for highly dynamic satellite environments [32]. In contrast, DQN-based agents learn through interaction with a simulated environment, making them more flexible and scalable in this context [33].
- Online adaptability and robustness: DQN-trained agents can also adapt to new operating conditions without requiring manual relabeling or retraining, offering robustness to modeling uncertainties and non-stationary environments, both of which are common in space operations [34].
- Discrete action space suitability: The energy management problem for LEO satellite EPSs can be effectively modeled using a discrete action space, as shown in Section 3, making DQN particularly well-suited to this setting without requiring the complexities of continuous control methods. This design choice strikes a good trade-off between expressiveness and implementability, which is critical for embedded systems aboard LEO satellites with limited computational and memory resources [6,32].
- Computational efficiency: DQN is lightweight in both training and inference, which is a crucial advantage when targeting real-time deployment aboard resource-constrained satellite platforms [34]. Methods like TD3 and SAC generally involve a higher computational overhead and memory consumption [20,21].
- Demonstrated effectiveness: Despite its relative simplicity, as reported in Section 4, the proposed DQN-based solution achieved very competitive results in extensive simulation studies, providing good adaptability and performance across multiple operational scenarios.
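To make the discrete-action setting above concrete, the sketch below shows the epsilon-greedy action selection at the core of a DQN agent. It is a minimal, illustrative example only: the state layout (battery SoC, PV power, payload power), the five discrete commands, and the linear Q-function standing in for a trained neural network are all assumptions, not the design actually used in this paper.

```python
import numpy as np

# Minimal sketch of DQN-style epsilon-greedy control over a discrete action
# space. State layout, action set, and the linear Q-function used in place
# of a trained neural network are illustrative assumptions only.

rng = np.random.default_rng(seed=0)

N_ACTIONS = 5   # hypothetical discrete commands (e.g., duty-cycle steps)
STATE_DIM = 3   # hypothetical state: (battery SoC, PV power, payload power)

# Stand-in for the trained Q-network: a fixed linear map state -> Q-values.
W = rng.normal(size=(N_ACTIONS, STATE_DIM))

def q_values(state: np.ndarray) -> np.ndarray:
    """Return the estimated Q(s, a) for every discrete action a."""
    return W @ state

def select_action(state: np.ndarray, epsilon: float) -> int:
    """Epsilon-greedy policy: random action with probability epsilon,
    otherwise the action with the highest estimated Q-value."""
    if rng.random() < epsilon:
        return int(rng.integers(N_ACTIONS))
    return int(np.argmax(q_values(state)))

state = np.array([0.8, 120.0, 90.0])        # illustrative (SoC, PV W, load W)
greedy = select_action(state, epsilon=0.0)  # pure exploitation
```

With epsilon = 0 the agent always exploits its Q-estimates; during training, epsilon is annealed from a maximum toward a minimum value, as recalled in Section 4.2.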
1.3. Paper Organization and Notation
2. The EPS Environment
2.1. Photovoltaic Panel
2.2. DC-DC Converter
2.3. Battery Model
2.4. Payload Power Profiles
2.5. Controller
3. The Proposed DQN-Based Framework for LEO Satellite EPSs
3.1. Brief on Reinforcement Learning and DQN Algorithm
3.2. Designing State Space, Action Space, and Reward Function
4. Numerical Simulations and Results
4.1. Simulation Set-Up and Parameter Configuration
4.2. The Learning Process
- ε_max is the initial (maximum) exploration probability,
- ε_min is the final (minimum) exploration probability,
- λ is the exponential decay rate,
- t is the current step number, incremented after each action.
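The quantities above combine in the standard exponential annealing schedule for the exploration probability. The sketch below is a minimal illustration; the default parameter values are assumptions, not the settings used in the simulations.

```python
import math

def epsilon(step: int,
            eps_max: float = 1.0,
            eps_min: float = 0.05,
            decay: float = 1e-3) -> float:
    """Exponentially annealed exploration probability:
        eps(step) = eps_min + (eps_max - eps_min) * exp(-decay * step)
    Default values are illustrative, not those used in the paper."""
    return eps_min + (eps_max - eps_min) * math.exp(-decay * step)

# The schedule starts at eps_max and decays monotonically toward eps_min,
# shifting the agent from exploration to exploitation as training proceeds.
```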
4.3. Performance Validation and Generalization Capabilities of the Trained DQN Agent
4.4. A Comparison with MPPT Control Technique
4.5. Multi-Orbit Scenario
5. Conclusions and Outlook
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Abbreviations
CNN | Convolutional Neural Network
DDPG | Deep Deterministic Policy Gradient
DQN | Deep Q-Network
DRL | Deep Reinforcement Learning
DTE | Direct Energy Transfer
EPS | Electrical Power System
LEO | Low Earth Orbit
LSTM | Long Short-Term Memory
MDP | Markov Decision Process
MPP | Maximum Power Point
MPPT | Maximum Power Point Tracking
NN | Neural Network
PPO | Proximal Policy Optimization
PPT | Peak Power Tracking
PV | Photovoltaic
RL | Reinforcement Learning
SA | Solar Array
SAC | Soft Actor–Critic
SoC | State of Charge
TD3 | Twin-Delayed Deep Deterministic Policy Gradient
References
- Capannolo, A.; Silvestrini, S.; Colagrossi, A.; Pesce, V. Chapter Four-Orbital dynamics. In Modern Spacecraft Guidance, Navigation, and Control; Pesce, V., Colagrossi, A., Silvestrini, S., Eds.; Elsevier: Amsterdam, The Netherlands, 2023; pp. 131–206.
- Low Earth Orbit (LEO) Satellite Feasibility Report; Technical Report; Washington State Department of Commerce: Washington, DC, USA, 2023.
- Jung, J.; Sy, N.V.; Lee, D.; Joe, S.; Hwang, J.; Kim, B. A Single Motor-Driven Focusing Mechanism with Flexure Hinges for Small Satellite Optical Systems. Appl. Sci. 2020, 10, 7087.
- Yang, Y.; Mao, Y.; Ren, X.; Jia, X.; Sun, B. Demand and key technology for a LEO constellation as augmentation of satellite navigation systems. Satell. Navig. 2024, 5, 11.
- Knap, V.; Vestergaard, L.K.; Stroe, D.I. A Review of Battery Technology in CubeSats and Small Satellite Solutions. Energies 2020, 13, 4097.
- Tipaldi, M.; Legendre, C.; Koopmann, O.; Ferraguto, M.; Wenker, R.; D’Angelo, G. Development strategies for the satellite flight software on-board Meteosat Third Generation. Acta Astronaut. 2018, 145, 482–491.
- Nardone, V.; Santone, A.; Tipaldi, M.; Liuzza, D.; Glielmo, L. Model checking techniques applied to satellite operational mode management. IEEE Syst. J. 2018, 13, 1018–1029.
- Zoppi, M.; Tipaldi, M.; Di Cerbo, A. Cross-model verification of the electrical power subsystem in space projects. Measurement 2018, 122, 473–483.
- Faria, R.P.; Gouvêa, C.P.; Vilela de Castro, J.C.; Rocha, R. Design and implementation of a photovoltaic system for artificial satellites with regulated DC bus. In Proceedings of the 2017 IEEE 26th International Symposium on Industrial Electronics (ISIE), Edinburgh, UK, 19–21 June 2017; pp. 676–681.
- Mokhtar, M.A.; ElTohamy, H.A.F.; Elhalwagy Yehia, Z.; Mohamed, E.H. Developing a novel battery management algorithm with energy budget calculation for low Earth orbit (LEO) spacecraft. Aerosp. Syst. 2024, 7, 143–157.
- Erickson, R.W.; Maksimović, D. Fundamentals of Power Electronics, 3rd ed.; Springer: Cham, Switzerland, 2020.
- Patel, M.R. Spacecraft Power Systems; CRC Press: Boca Raton, FL, USA, 2004.
- Mostacciuolo, E.; Baccari, S.; Sagnelli, S.; Iannelli, L.; Vasca, F. An Optimization Approach for Electrical Power System Supervision and Sizing in Low Earth Orbit Satellites. IEEE Access 2024, 12, 151864–151875.
- Mostacciuolo, E.; Iannelli, L.; Sagnelli, S.; Vasca, F.; Luisi, R.; Stanzione, V. Modeling and power management of a LEO small satellite electrical power system. In Proceedings of the 2018 European Control Conference (ECC), Limassol, Cyprus, 12–15 June 2018; pp. 2738–2743.
- Yaqoob, M.; Lashab, A.; Vasquez, J.C.; Guerrero, J.M.; Orchard, M.E.; Bintoudi, A.D. A Comprehensive Review on Small Satellite Microgrids. IEEE Trans. Power Electron. 2022, 37, 12741–12762.
- Khan, O.; Moursi, M.E.; Zeineldin, H.; Khadkikar, V.; Al Hosani, M. Comprehensive design and control methodology for DC-powered satellite electrical subsystem based on PV and battery. IET Renew. Power Gener. 2020, 14, 2202–2210.
- Mostacciuolo, E.; Vasca, F.; Baccari, S.; Iannelli, L.; Sagnelli, S.; Luisi, R.; Stanzione, V. An optimization strategy for battery charging in small satellites. In Proceedings of the 2019 European Space Power Conference, Juan-les-Pins, France, 30 September–4 October 2019; pp. 1–8.
- Maddalena, E.T.; Lian, Y.; Jones, C.N. Data-driven methods for building control—A review and promising future directions. Control Eng. Pract. 2020, 95, 104211.
- Yaghoubi, E.; Yaghoubi, E.; Khamees, A.; Razmi, D.; Lu, T. A systematic review and meta-analysis of machine learning, deep learning, and ensemble learning approaches in predicting EV charging behavior. Eng. Appl. Artif. Intell. 2024, 135, 108789.
- Arulkumaran, K.; Deisenroth, M.P.; Brundage, M.; Bharath, A.A. Deep Reinforcement Learning: A Brief Survey. IEEE Signal Process. Mag. 2017, 34, 26–38.
- Nguyen, T.T.; Nguyen, N.D.; Nahavandi, S. Deep Reinforcement Learning for Multiagent Systems: A Review of Challenges, Solutions, and Applications. IEEE Trans. Cybern. 2020, 50, 3826–3839.
- Baccari, S.; Tipaldi, M.; Mariani, V. Deep Reinforcement Learning for Cell Balancing in Electric Vehicles with Dynamic Reconfigurable Batteries. IEEE Trans. Intell. Veh. 2024, 9, 6450–6461.
- Joshi, A.; Tipaldi, M.; Glielmo, L. Multi-agent reinforcement learning for decentralized control of shared battery energy storage system in residential community. Sustain. Energy Grids Netw. 2025, 41, 101627.
- Brescia, E.; Pio Savastio, L.; Di Nardo, M.; Leonardo Cascella, G.; Cupertino, F. Accuracy of Online Estimation Methods of Stator Resistance and Rotor Flux Linkage in PMSMs. IEEE J. Emerg. Sel. Top. Power Electron. 2024, 12, 4941–4955.
- Yu, L.; Qin, S.; Zhang, M.; Shen, C.; Jiang, T.; Guan, X. A Review of Deep Reinforcement Learning for Smart Building Energy Management. IEEE Internet Things J. 2021, 8, 12046–12063.
- Wu, J.; Wei, Z.; Li, W.; Wang, Y.; Li, Y.; Sauer, D.U. Battery Thermal- and Health-Constrained Energy Management for Hybrid Electric Bus Based on Soft Actor-Critic DRL Algorithm. IEEE Trans. Ind. Inform. 2021, 17, 3751–3761.
- Wu, J.; Huang, C.; He, H.; Huang, H. Confidence-aware reinforcement learning for energy management of electrified vehicles. Renew. Sustain. Energy Rev. 2024, 191, 114154.
- Guo, C.; Wang, X.; Zheng, Y.; Zhang, F. Real-time optimal energy management of microgrid with uncertainties based on deep reinforcement learning. Energy 2022, 238, 121873.
- Zheng, K.; Jia, X.; Chi, K.; Liu, X. DDPG-Based Joint Time and Energy Management in Ambient Backscatter-Assisted Hybrid Underlay CRNs. IEEE Trans. Commun. 2023, 71, 441–456.
- Mnih, V.; Kavukcuoglu, K.; Silver, D.; Rusu, A.A.; Veness, J.; Bellemare, M.G.; Graves, A.; Riedmiller, M.; Fidjeland, A.K.; Ostrovski, G.; et al. Human-level control through deep reinforcement learning. Nature 2015, 518, 529–533.
- Bertsekas, D. Reinforcement Learning and Optimal Control; Athena Scientific: Nashua, NH, USA, 2019.
- Furano, G.; Meoni, G.; Dunne, A.; Moloney, D.; Ferlet-Cavrois, V.; Tavoularis, A.; Byrne, J.; Buckley, L.; Psarakis, M.; Voss, K.O.; et al. Towards the Use of Artificial Intelligence on the Edge in Space Systems: Challenges and Opportunities. IEEE Aerosp. Electron. Syst. Mag. 2020, 35, 44–56.
- Sutton, R.S.; Barto, A.G. Reinforcement Learning: An Introduction; MIT Press: Cambridge, MA, USA, 2018.
- Tipaldi, M.; Iervolino, R.; Massenio, P.R. Reinforcement learning in spacecraft control applications: Advances, prospects, and challenges. Annu. Rev. Control 2022, 54, 1–23.
- Baccari, S.; Vasca, F.; Mostacciuolo, E.; Iannelli, L.; Sagnelli, S.; Luisi, R.; Stanzione, V. A characterization system for LEO satellites batteries. In Proceedings of the 2019 European Space Power Conference (ESPC), Juan-les-Pins, France, 30 September–4 October 2019; pp. 1–6.
- Edpuganti, A.; Khadkikar, V.; Moursi, M.S.E.; Zeineldin, H.; Al-Sayari, N.; Al Hosani, K. A Comprehensive Review on CubeSat Electrical Power System Architectures. IEEE Trans. Power Electron. 2022, 37, 3161–3177.
- Triple-Junction Solar Cell for Space Applications (CTJ30). 2020. Available online: https://www.cesi.it/app/uploads/2020/03/Datasheet-CTJ30-1.pdf (accessed on 13 July 2025).
- MP 176065 Rechargeable Li-ion Prismatic Cell. Available online: https://docs.rs-online.com/32de/0900766b8056b890.pdf (accessed on 13 July 2025).
- Jalil, M.F.; Khatoon, S.; Nasiruddin, I.; Bansal, R. Review of PV array modelling, configuration and MPPT techniques. Int. J. Model. Simul. 2022, 42, 533–550.
- Chaibi, Y.; Salhi, M.; El-Jouni, A.; Essadki, A. A new method to extract the equivalent circuit parameters of a photovoltaic panel. Sol. Energy 2018, 163, 376–386.
- Farmann, A.; Sauer, D.U. A study on the dependency of the open-circuit voltage on temperature and actual aging state of lithium-ion batteries. J. Power Sources 2017, 347, 1–13.
- Mostacciuolo, E.; Iannelli, L.; Baccari, S.; Vasca, F. An interlaced co-estimation technique for batteries. In Proceedings of the 2023 31st Mediterranean Conference on Control and Automation (MED), Limassol, Cyprus, 26–29 June 2023; pp. 73–78.
- Joshi, A.; Tipaldi, M.; Glielmo, L. A belief-based multi-agent reinforcement learning approach for electric vehicle coordination in a residential community. Sustain. Energy Grids Netw. 2025, 43, 101790.
- Subudhi, B.; Pradhan, R. A comparative study on maximum power point tracking techniques for photovoltaic power systems. IEEE Trans. Sustain. Energy 2012, 4, 89–98.
- Schirone, L.; Granello, P.; Massaioli, S.; Ferrara, M.; Pellitteri, F. An Approach for Maximum Power Point Tracking in Satellite Photovoltaic Arrays. In Proceedings of the 2024 International Symposium on Power Electronics, Electrical Drives, Automation and Motion (SPEEDAM), Ischia, Italy, 19–21 June 2024; IEEE: New York, NY, USA, 2024; pp. 788–793.
- Balal, A.; Murshed, M. Implementation and comparison of Perturb and Observe, and Fuzzy Logic Control on Maximum Power Point Tracking (MPPT) for a Small Satellite. J. Soft Comput. Decis. Support Syst. 2021, 8, 14–18.
- Mason, K.; Grijalva, S. A review of reinforcement learning for autonomous building energy management. Comput. Electr. Eng. 2019, 78, 300–312.
- Bollipo, R.B.; Mikkili, S.; Bonthagorla, P.K. Hybrid, optimal, intelligent and classical PV MPPT techniques: A review. CSEE J. Power Energy Syst. 2021, 7, 9–33.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Baccari, S.; Mostacciuolo, E.; Tipaldi, M.; Mariani, V. A Deep Reinforcement Learning Approach for Energy Management in Low Earth Orbit Satellite Electrical Power Systems. Electronics 2025, 14, 3110. https://doi.org/10.3390/electronics14153110