Optimization Solution for Unit Power Generation Plan Based on the Integration of Constraint Identification and Deep Reinforcement Learning
Abstract
1. Introduction
2. A Joint Optimization Model Considering the Output Limitations of Thermal Power Units and the Fluctuation Risks of Wind and Solar Energy
2.1. Joint Optimization Model
2.1.1. Unit Combination Optimization
2.1.2. Backup Decision Optimization
2.2. Risk Measurement Based on CVaR
3. Model Solution Based on Deep Reinforcement Learning and Constraint Identification
3.1. The Constraint Identification Method Based on the SDAE
3.2. Markov Decision Process Modeling
3.3. Optimal Strategy Solution
3.4. Solution Architecture
4. Case Analysis
4.1. Introduction to Practical Examples
4.2. Uncertainty Handling
4.3. Comparison of Model Training Effects
4.4. Analysis of Optimization Results Comparison
4.5. Sensitivity Analysis of Different Risk Penalty Systems
5. Conclusions
5.1. Summary of Work
5.2. Limitations and Future Work
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Abbreviations
| Abbreviation | Full Name |
|---|---|
| PPO | Proximal Policy Optimization |
| SDAE | Stacked De-Noising Auto-Encoder |
| AE | Auto-Encoder |
| CVaR | Conditional Value at Risk |
| LHS | Latin Hypercube Sampling |
| DRL | Deep Reinforcement Learning |
| MLP | Multi-Layer Perceptron |
| MDP | Markov Decision Process |
| DDPG | Deep Deterministic Policy Gradient |
| MILP | Mixed-Integer Linear Programming |

| Model | Accuracy | Recall | Performance Decline |
|---|---|---|---|
| SDAE | 98.8% | 97.5% | <2% |
| Non-de-noising AE | 93.2% | 91.4% | ~10% |
| Shallow AE | 89.5% | 85.7% | ~15% |
| MLP | 90.7% | 88.2% | ~12% |

| Method | Total Cost (CNY 10,000) | Risk Cost (CNY 10,000) | Training Time (h) | Calculation Time (s) |
|---|---|---|---|---|
| PPO + SDAE | 6775.39 | 6521.23 | 3.3 | 2.77 |
| PPO | 6771.62 | 6521.21 | 14 | 16.39 |
| DDPG | 6765.62 | 6515.61 | 9 | 127.54 |
| MILP (Gurobi) | 6768.45 | 6510.78 | -- | 15,839.86 |
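
The comparison in the table above can be restated as ratios. The following is a minimal sketch that copies the total costs and calculation times from the table and derives the speedup of PPO + SDAE over each other method, together with its total-cost gap relative to the MILP (Gurobi) result. The dictionary layout and the helper names `speedup` and `relative_cost_gap` are illustrative assumptions, not part of the original work.

```python
# Values copied from the method comparison table above.
# Costs are in units of CNY 10,000; calculation times are in seconds.
methods = {
    "PPO + SDAE": {"total_cost": 6775.39, "calc_time_s": 2.77},
    "PPO": {"total_cost": 6771.62, "calc_time_s": 16.39},
    "DDPG": {"total_cost": 6765.62, "calc_time_s": 127.54},
    "MILP (Gurobi)": {"total_cost": 6768.45, "calc_time_s": 15839.86},
}


def speedup(baseline: str, candidate: str = "PPO + SDAE") -> float:
    """How many times faster the candidate solves than the baseline (hypothetical helper)."""
    return methods[baseline]["calc_time_s"] / methods[candidate]["calc_time_s"]


def relative_cost_gap(candidate: str, reference: str = "MILP (Gurobi)") -> float:
    """Total-cost difference of the candidate relative to the reference, as a fraction."""
    ref_cost = methods[reference]["total_cost"]
    return (methods[candidate]["total_cost"] - ref_cost) / ref_cost


if __name__ == "__main__":
    for name in methods:
        if name != "PPO + SDAE":
            print(f"PPO + SDAE vs {name}: {speedup(name):.1f}x faster")
    gap = relative_cost_gap("PPO + SDAE")
    print(f"Total-cost gap vs MILP (Gurobi): {gap:.4%}")
```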

| Scheme | Total Cost (CNY 10,000) | Running Cost (CNY 10,000) | Standby Cost (CNY 10,000) | Shutoff Power (MWh) | Load Shedding Power (MWh) | Risk Cost (CNY 10,000) |
|---|---|---|---|---|---|---|
| 1 | 7222.76 | 6578.82 | 521.6 | 900 | 100 | 7406.27 |
| 2 | 6914.74 | 6596.78 | 406.08 | 200 | 0 | 6782.56 |
| 3 | 6890.24 | 6603.26 | 452.72 | 100 | 0 | 6641.63 |
| 4 | 6775.39 | 6612.94 | 423.51 | 0 | 0 | 6383.81 |

| Risk Factor | Running Cost (CNY 10,000) | Standby Cost (CNY 10,000) | Risk Cost (CNY 10,000) | Total Cost (CNY 10,000) |
|---|---|---|---|---|
| 0 | 6578.82 | 643.94 | 7406.27 | 7222.76 |
| 0.1 | 6442.37 | 664.82 | 7129.36 | 7109.41 |
| 0.2 | 6411.54 | 608.32 | 7002.15 | 7016.32 |
| 0.3 | 6520.97 | 513.97 | 6713.97 | 6938.65 |
| 0.4 | 6612.93 | 423.51 | 6383.81 | 6775.39 |
| 0.5 | 6980.47 | 498.15 | 6295.34 | 6886.98 |
| 0.6 | 7422.99 | 539.24 | 6175.63 | 6950.27 |
| 0.7 | 8646.20 | 602.56 | 6021.23 | 7129.49 |
| 0.8 | 11,584.86 | 634.99 | 5896.45 | 7401.13 |
| 0.9 | 21,273.33 | 699.13 | 5628.26 | 7622.68 |
| 1 | 6578.82 | 643.94 | 7406.27 | 7222.76 |
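
The sensitivity table above can be scanned programmatically for the risk factor with the lowest total cost. The sketch below copies the risk factor, risk cost, and total cost columns from the table. The tuple layout and variable names are illustrative assumptions. The row for risk factor 1 is entered exactly as tabulated, where it carries the same values as the row for risk factor 0, so it is left out of the trend check.

```python
# (risk factor, risk cost, total cost), copied from the sensitivity table above.
# Costs are in units of CNY 10,000.
sensitivity = [
    (0.0, 7406.27, 7222.76),
    (0.1, 7129.36, 7109.41),
    (0.2, 7002.15, 7016.32),
    (0.3, 6713.97, 6938.65),
    (0.4, 6383.81, 6775.39),
    (0.5, 6295.34, 6886.98),
    (0.6, 6175.63, 6950.27),
    (0.7, 6021.23, 7129.49),
    (0.8, 5896.45, 7401.13),
    (0.9, 5628.26, 7622.68),
    (1.0, 7406.27, 7222.76),  # as tabulated; same values as the risk factor 0 row
]

# Risk factor whose row has the smallest total cost.
best_factor, best_risk_cost, best_total = min(sensitivity, key=lambda row: row[2])

# Check whether the risk cost falls steadily as the risk factor grows from 0 to 0.9.
risk_costs = [risk for factor, risk, _ in sensitivity if factor < 1.0]
risk_cost_decreasing = all(a > b for a, b in zip(risk_costs, risk_costs[1:]))

if __name__ == "__main__":
    print(f"Lowest total cost {best_total} at risk factor {best_factor}")
    print(f"Risk cost strictly decreasing over 0 to 0.9: {risk_cost_decreasing}")
```

On the tabulated values, the minimum total cost falls at risk factor 0.4, which matches the total cost reported for PPO + SDAE and for Scheme 4 in the earlier tables.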
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Li, D.; Zhang, L.; Mi, N.; Zhong, H. Optimization Solution for Unit Power Generation Plan Based on the Integration of Constraint Identification and Deep Reinforcement Learning. Processes 2025, 13, 3778. https://doi.org/10.3390/pr13123778

