Application of a Gradient Descent Continuous Actor-Critic Algorithm for Double-Side Day-Ahead Electricity Market Modeling
Abstract
1. Introduction
1.1. Background and Motivation
1.2. Literature Review and Main Contributions
- (1)
- No participant (GenCO or DisCO) knows the cost or revenue functions of any of its rivals;
- (2)
- No participant knows the ongoing or historical strategies of its rivals in the daily day-ahead market, nor the true probability distribution functions of those strategies;
- (3)
- The only common information published by the ISO after bidding and market clearing are completed each day is the MCP of each time interval of the next day. Each participant is notified by the ISO only of its own production or consumption schedule for each time interval of the next day;
- (4)
- Each participant can adjust its bidding strategy within a continuous interval of values, and the MCP likewise varies within a continuous interval of values over time.
2. Agent-Based Double-Side Day-Ahead Electricity Market Model
2.1. Participants’ Bidding Model
2.2. Market Clearing Model
2.3. Agent Learning Mechanism
- (1)
- Transaction day: On a transaction day T (T = 1, 2, …), since the market is assumed to be cleared on a day-ahead, single (negotiation) time-interval basis, every GenCO or DisCO bids only one supply or demand function for the single time interval of the next day, using the MCP information calculated and published by the ISO on transaction day T−1 (for transaction day T).
- (2)
- State variable: The historical MCP information calculated and published by the ISO on transaction day T−1 constitutes the value of the state variable on transaction day T.
- (3)
- Action variable: On transaction day T, the bidding strategy of GenCO i or DisCO j constitutes the value of its action variable.
- (4)
- Iteration: We consider each transaction day as one iteration.
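The four mapping rules above can be summarized as a minimal environment interface. The sketch below is illustrative only: the `clearing` function is a stand-in for the ISO's market-clearing model of Section 2.2 (whose equations are not reproduced in this extract), the reward is a placeholder for the profit-based rewards of Equations (5) and (6), and all class and parameter names are our own.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

@dataclass
class DayAheadEnv:
    """One agent's view of the day-ahead market as an MDP.

    State  : the MCP published by the ISO on day T-1.
    Action : the agent's bidding-strategy value for day T,
             restricted to a continuous interval [k_min, k_max].
    One call to step() corresponds to one transaction day (one iteration).
    """
    clearing: Callable[[float], float]  # action -> next MCP (stand-in for the ISO)
    k_min: float = 1.0
    k_max: float = 3.0
    mcp: float = 20.0                   # hypothetical initial MCP

    def step(self, action: float) -> Tuple[float, float]:
        """Clip the action to its feasible interval, clear the market,
        and return (reward, next_state)."""
        a = min(max(action, self.k_min), self.k_max)
        next_mcp = self.clearing(a)
        # Placeholder reward; the paper instead uses each agent's profit.
        reward = -abs(next_mcp - 20.0)
        self.mcp = next_mcp
        return reward, next_mcp

# Toy usage with a stand-in clearing rule: higher strategy values raise the MCP.
env = DayAheadEnv(clearing=lambda k: 10.0 + 5.0 * k)
r, s = env.step(2.0)
```

A learning agent then interacts with `env.step` once per transaction day, observing yesterday's MCP as its state.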
3. Methodology
3.1. Policy Search
3.2. Introduction of the Gradient Descent Continuous Actor-Critic Algorithm
- (1)
- Input: the feature extraction function, the discount factor γ (0 ≤ γ ≤ 1), the step-length parameter series, and the parameters σ and m.
- (2)
- Initialize linear parameter vectors θ0 and ω0.
- (3)
- Repeat (for each episode)
- Initialize the state x0 randomly
- Repeat (for each time step T = 0, 1, 2, … in the episode)
- Choose and implement an action from state xT, then observe the immediate reward rT and the next state xT+1;
- Update the critic parameter vector ωT and the actor parameter vector θT by gradient descent on the resulting temporal-difference error;
- Until xT+1 is terminal
- Until the desired number of episodes has been searched.
- (4)
- Output: the converged parameter vectors, and V*(x), A*(x).
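The update formulas in step (3) are not reproduced in this extract, so the following is a generic sketch of a gradient-descent continuous actor-critic with linear function approximation and a Gaussian exploration policy, which is the standard construction such algorithms follow, not a transcription of the paper's exact equations. The feature map `phi`, the step lengths `alpha_w` and `alpha_theta`, and the toy one-step task (whose optimal action is 0.5) are all assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def phi(x):
    """Feature extraction; a single constant feature suffices for the toy task."""
    return np.array([1.0])

gamma = 0.5                         # discount factor (0 <= gamma <= 1)
sigma = 0.3                         # std of the Gaussian exploration policy
alpha_w, alpha_theta = 0.05, 0.01   # critic / actor step lengths
w = np.zeros(1)                     # critic parameters: V(x) = w . phi(x)
theta = np.zeros(1)                 # actor parameters: mean action mu(x) = theta . phi(x)

for episode in range(20000):
    x = rng.uniform(0.0, 1.0)       # initial state
    f = phi(x)
    mu = theta @ f
    a = rng.normal(mu, sigma)       # sample a continuous action
    r = -(a - 0.5) ** 2             # toy reward, maximized at a = 0.5
    # One-step episode: the terminal value is 0, so the TD error is r - V(x).
    delta = r - w @ f
    w += alpha_w * delta * f                                 # critic: TD update
    theta += alpha_theta * delta * (a - mu) / sigma**2 * f   # actor: policy-gradient update

mu_star = theta @ phi(0.0)          # learned mean action, should approach 0.5
```

Because both the state and the action stay continuous and only the linear parameter vectors are stored, the memory and per-step cost grow with the number of features rather than with any discretization of the state-action space.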
3.3. The Proposed Market Procedure
- (1)
- Input: the feature extraction functions for GenCO i and for DisCO j, the discount factor γ (0 ≤ γ ≤ 1), the step-length parameter series for GenCO i and for DisCO j, and the parameters σ and m.
- (2)
- Initialize the linear parameter vectors for GenCO i and for DisCO j.
- (3)
- T = 0.
- (4)
- Initialize the bidding strategies for GenCO i and for DisCO j randomly, and calculate x1 = MCP0 through Equations (1), (3), and (7)–(12).
- (5)
- Repeat (for each time step T = 1, 2, …, TN)
- GenCO i chooses and implements an action from state xT = MCPT−1, then observes the immediate reward rgi,T using Equation (5) and the next state xT+1 generated by Equations (1), (3), and (7)–(12);
- DisCO j chooses and implements an action from state xT = MCPT−1, then observes the immediate reward rdj,T using Equation (6) and the next state xT+1 generated by Equations (1), (3), and (7)–(12);
- GenCO i updates its critic and actor parameter vectors by gradient descent;
- DisCO j updates its critic and actor parameter vectors by gradient descent;
- (6)
- Output: for GenCO i, the converged parameter vectors and Vgi*(x), Agi*(x); for DisCO j, the converged parameter vectors and Vdj*(x), Adj*(x).
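The clearing computation invoked in steps (4) and (5) (Equations (1), (3), and (7)–(12)) is not reproduced in this extract. Purely as an illustration, the sketch below assumes the quadratic cost/revenue structure suggested by the parameter tables in Section 4.1 (cost ai·P² + bi·P for GenCOs, revenue cj·P² + dj·P for DisCOs) and linear marginal-cost/marginal-revenue bids scaled by each agent's strategy value k, and then finds the MCP by bisection on the aggregate supply-demand balance. All of these modeling choices are assumptions, not the paper's stated clearing model.

```python
import numpy as np

# GenCO parameters (a_i, b_i, Pmax_i) and DisCO parameters (c_j, d_j, Pmax_j)
# taken from the tables in Section 4.1.
gencos = [(0.046, 14, 210), (0.074, 10, 600), (0.062, 12, 200),
          (0.043, 25, 520), (0.031, 20, 250), (0.064, 20, 400)]
discos = [(-0.052, 25, 250), (-0.034, 25, 250), (-0.031, 20, 300),
          (-0.054, 25, 300), (-0.013, 20, 300)]

def supply(price, ks):
    """Total quantity offered at `price`, assuming each GenCO bids
    k_i * (2 a_i P + b_i), i.e. P_i(price) = (price / k_i - b_i) / (2 a_i)."""
    return sum(np.clip((price / k - b) / (2 * a), 0, pmax)
               for (a, b, pmax), k in zip(gencos, ks))

def demand(price, ks):
    """Total quantity demanded at `price` (c_j < 0, so quantity falls with price)."""
    return sum(np.clip((price / k - d) / (2 * c), 0, pmax)
               for (c, d, pmax), k in zip(discos, ks))

def clear(kg, kd, lo=0.0, hi=100.0, iters=60):
    """Bisect for the MCP at which aggregate supply meets aggregate demand."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if supply(mid, kg) < demand(mid, kd):
            lo = mid          # excess demand: raise the price
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Marginal-cost bidding (all strategy values k = 1):
mcp = clear([1.0] * 6, [1.0] * 5)
```

Each learning step of the market procedure would call such a clearing routine once, with every agent's current strategy value, to produce the next day's MCP (the next state).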
4. Simulation and Discussions
4.1. Data and Assumptions
4.2. Simulation Result and Comparative Analysis
- (1)
- After the same number of iterations (3000 training iterations plus 500 decision-making iterations), GenCO 1's profit in Scenario 2 (1.4030) is higher than its profit in Scenario 1 (1.3353). This indicates that a participant can earn more profit by bidding with the proposed GDCAC reinforcement-learning model than with the traditional Q-learning model under the same conditions (namely, the same parameter values, number of iterations, and adaptive learning mechanism of the other participants);
- (2)
- If we ignore externalities, the total social welfare of the electricity market equals the sum of all participants' profits. After the same number of iterations, the social welfare in Scenario 3 is higher than in Scenario 2, and that in Scenario 2 is higher than in Scenario 1. This indicates that as the number of participants using the proposed GDCAC reinforcement-learning model to bid in the market increases, the total social welfare also increases;
4.3. Sensitivity Analysis
- (1)
- There is no monotonic relationship between social welfare and the number of training iterations, which may be caused by system noise during the training process. Therefore, in market simulations with the proposed GDCAC reinforcement-learning model, how to find the globally optimal number of training iterations, i.e., the one yielding the highest social welfare, may be a new topic for study.
- (2)
- Social welfare increases as the MSE between all participants' strategy values and 1 decreases. When every participant's strategy value equals 1, each participant bids at its marginal cost or marginal revenue, which corresponds to perfect competition and the highest welfare. Therefore, how to design a double-side electricity market mechanism, especially for China, that pursues higher resource-allocation efficiency by means of the proposed GDCAC reinforcement-learning market model may be another new topic for study.
5. Conclusions
- (1)
- The proposed GDCAC reinforcement-learning market model requires no common knowledge of any participant's cost or revenue function, of any participant's strategy probability distribution function, of the market's MCP probability distribution function, or of any participant's scheduling results, all of which must, to a greater or lesser extent, be assumed known to every participant in most game-based models.
- (2)
- The proposed GDCAC reinforcement-learning market model can handle problems with continuous state and action sets without suffering from the 'curse of dimensionality', which cannot be overcome with traditional table-based reinforcement-learning algorithms. The proposed model is therefore more suitable and feasible for simulating a practical double-side day-ahead electricity market, in which both the state (MCP) and action (each participant's bidding strategy) sets are continuous.
- (3)
- Because the time complexity of the GDCAC reinforcement-learning algorithm is only O(n), the proposed model can be used in large-scale electricity-market simulations with many participants competing with each other simultaneously, which can hardly be achieved with game-based models or table-based reinforcement-learning models.
- (4)
- The simulation results show that a participant using the proposed model earns more profit than one not using it. Meanwhile, if every market participant adopts the proposed model simultaneously, the Nash-equilibrium outcome of the electricity market yields higher social welfare, very close to the situation in which every participant bids at its marginal cost or marginal revenue.
Acknowledgments
Author Contributions
Conflicts of Interest
| Participants | ai (10³ RMB yuan/MW²) | bi (10³ RMB yuan/MW) | kgi,min | kgi,max | Pgi,min (MW) | Pgi,max (MW) |
|---|---|---|---|---|---|---|
| GenCO1 | 0.046 | 14 | 1.0 | 3.0 | 0 | 210 |
| GenCO2 | 0.074 | 10 | 1.0 | 3.0 | 0 | 600 |
| GenCO3 | 0.062 | 12 | 1.0 | 3.0 | 0 | 200 |
| GenCO4 | 0.043 | 25 | 1.0 | 3.0 | 0 | 520 |
| GenCO5 | 0.031 | 20 | 1.0 | 3.0 | 0 | 250 |
| GenCO6 | 0.064 | 20 | 1.0 | 3.0 | 0 | 400 |
| Participants | cj (10³ RMB yuan/MW²) | dj (10³ RMB yuan/MW) | kj,min | kj,max | Pdj,min (MW) | Pdj,max (MW) |
|---|---|---|---|---|---|---|
| DisCO1 | −0.052 | 25 | 0 | 1.0 | 0 | 250 |
| DisCO2 | −0.034 | 25 | 0 | 1.0 | 0 | 250 |
| DisCO3 | −0.031 | 20 | 0 | 1.0 | 0 | 300 |
| DisCO4 | −0.054 | 25 | 0 | 1.0 | 0 | 300 |
| DisCO5 | −0.013 | 20 | 0 | 1.0 | 0 | 300 |
| Scenarios | Participants | State Set (RMB yuan/MWh) | Action Set | | γ | | | σ | m |
|---|---|---|---|---|---|---|---|---|---|
| Scenario 1 | Gen1 | {X1, X2, …, X20} | {Ug1, Ug2, …, Ug20} | 0.1 | 0.5 | - | - | - | - |
| | Gen2 | {X1, X2, …, X20} | {Ug1, Ug2, …, Ug20} | 0.1 | 0.5 | - | - | - | - |
| | Gen3 | {X1, X2, …, X20} | {Ug1, Ug2, …, Ug20} | 0.1 | 0.5 | - | - | - | - |
| | Gen4 | {X1, X2, …, X20} | {Ug1, Ug2, …, Ug20} | 0.1 | 0.5 | - | - | - | - |
| | Gen5 | {X1, X2, …, X20} | {Ug1, Ug2, …, Ug20} | 0.1 | 0.5 | - | - | - | - |
| | Gen6 | {X1, X2, …, X20} | {Ug1, Ug2, …, Ug20} | 0.1 | 0.5 | - | - | - | - |
| | Dis1 | {X1, X2, …, X20} | {Ud1, Ud2, …, Ud20} | 0.1 | 0.5 | - | - | - | - |
| | Dis2 | {X1, X2, …, X20} | {Ud1, Ud2, …, Ud20} | 0.1 | 0.5 | - | - | - | - |
| | Dis3 | {X1, X2, …, X20} | {Ud1, Ud2, …, Ud20} | 0.1 | 0.5 | - | - | - | - |
| | Dis4 | {X1, X2, …, X20} | {Ud1, Ud2, …, Ud20} | 0.1 | 0.5 | - | - | - | - |
| | Dis5 | {X1, X2, …, X20} | {Ud1, Ud2, …, Ud20} | 0.1 | 0.5 | - | - | - | - |
| Scenario 2 | Gen1 | [10, 34] | [1, 3] | - | 0.5 | 0.1 | 0.1 | 4 | 1 |
| | Gen2 | {X1, X2, …, X20} | {Ug1, Ug2, …, Ug20} | 0.1 | 0.5 | - | - | - | - |
| | Gen3 | {X1, X2, …, X20} | {Ug1, Ug2, …, Ug20} | 0.1 | 0.5 | - | - | - | - |
| | Gen4 | {X1, X2, …, X20} | {Ug1, Ug2, …, Ug20} | 0.1 | 0.5 | - | - | - | - |
| | Gen5 | {X1, X2, …, X20} | {Ug1, Ug2, …, Ug20} | 0.1 | 0.5 | - | - | - | - |
| | Gen6 | {X1, X2, …, X20} | {Ug1, Ug2, …, Ug20} | 0.1 | 0.5 | - | - | - | - |
| | Dis1 | {X1, X2, …, X20} | {Ud1, Ud2, …, Ud20} | 0.1 | 0.5 | - | - | - | - |
| | Dis2 | {X1, X2, …, X20} | {Ud1, Ud2, …, Ud20} | 0.1 | 0.5 | - | - | - | - |
| | Dis3 | {X1, X2, …, X20} | {Ud1, Ud2, …, Ud20} | 0.1 | 0.5 | - | - | - | - |
| | Dis4 | {X1, X2, …, X20} | {Ud1, Ud2, …, Ud20} | 0.1 | 0.5 | - | - | - | - |
| | Dis5 | {X1, X2, …, X20} | {Ud1, Ud2, …, Ud20} | 0.1 | 0.5 | - | - | - | - |
| Scenario 3 | Gen1 | [10, 34] | [1, 3] | - | 0.5 | 0.1 | 0.1 | 4 | 1 |
| | Gen2 | [10, 34] | [1, 3] | - | 0.5 | 0.1 | 0.1 | 4 | 1 |
| | Gen3 | [10, 34] | [1, 3] | - | 0.5 | 0.1 | 0.1 | 4 | 1 |
| | Gen4 | [10, 34] | [1, 3] | - | 0.5 | 0.1 | 0.1 | 4 | 1 |
| | Gen5 | [10, 34] | [1, 3] | - | 0.5 | 0.1 | 0.1 | 4 | 1 |
| | Gen6 | [10, 34] | [1, 3] | - | 0.5 | 0.1 | 0.1 | 4 | 1 |
| | Dis1 | [10, 34] | [0.3, 1] | - | 0.5 | 0.1 | 0.1 | 4 | 1 |
| | Dis2 | [10, 34] | [0.3, 1] | - | 0.5 | 0.1 | 0.1 | 4 | 1 |
| | Dis3 | [10, 34] | [0.3, 1] | - | 0.5 | 0.1 | 0.1 | 4 | 1 |
| | Dis4 | [10, 34] | [0.3, 1] | - | 0.5 | 0.1 | 0.1 | 4 | 1 |
| | Dis5 | [10, 34] | [0.3, 1] | - | 0.5 | 0.1 | 0.1 | 4 | 1 |
| Participants | Gen1 | Gen2 | Gen3 | Gen4 | Gen5 | Gen6 | Dis1 | Dis2 | Dis3 | Dis4 | Dis5 | Sum |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Profits (Scenario 1) | 1.3353 | 1.4581 | 1.3870 | 1.1777 | 1.8035 | 0.6384 | 1.2555 | 1.9202 | 1.4146 | 1.0530 | 1.4774 | 14.9207 |
| Profits (Scenario 2) | 1.4030 | 1.5142 | 1.3622 | 1.1502 | 1.7634 | 0.6217 | 1.2828 | 1.9620 | 1.4529 | 1.0759 | 1.5414 | 15.1297 |
| Profits (Scenario 3) | 1.4952 | 1.6809 | 1.5363 | 1.3604 | 2.0445 | 0.7442 | 1.1860 | 1.7686 | 1.3464 | 0.9963 | 1.4221 | 15.5809 |
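Taking total social welfare as the sum of all participants' profits (as in Section 4.2), the Sum column and the scenario ordering can be verified directly from the rows of the table above:

```python
# Per-participant profits (Gen1-Gen6, Dis1-Dis5) from the table above.
scen1 = [1.3353, 1.4581, 1.3870, 1.1777, 1.8035, 0.6384,
         1.2555, 1.9202, 1.4146, 1.0530, 1.4774]
scen2 = [1.4030, 1.5142, 1.3622, 1.1502, 1.7634, 0.6217,
         1.2828, 1.9620, 1.4529, 1.0759, 1.5414]
scen3 = [1.4952, 1.6809, 1.5363, 1.3604, 2.0445, 0.7442,
         1.1860, 1.7686, 1.3464, 0.9963, 1.4221]

# Social welfare = sum of all participants' profits (externalities ignored).
welfare = [round(sum(s), 4) for s in (scen1, scen2, scen3)]
# Reproduces the Sum column: 14.9207 < 15.1297 < 15.5809
```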
| Training Iterations | Gen1 | Gen2 | Gen3 | Gen4 | Gen5 | Gen6 | Dis1 | Dis2 | Dis3 | Dis4 | Dis5 | Sum |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1000 | 1.5441 | 1.7012 | 1.5932 | 1.4133 | 2.0992 | 0.7905 | 1.1380 | 1.7204 | 1.2790 | 0.9585 | 1.3209 | 15.5583 |
| 2000 | 1.4744 | 1.6543 | 1.4986 | 1.2826 | 1.9602 | 0.7249 | 1.2252 | 1.8838 | 1.3884 | 1.0241 | 1.4600 | 15.5765 |
| 3000 | 1.4952 | 1.6809 | 1.5363 | 1.3604 | 2.0445 | 0.7442 | 1.1860 | 1.7686 | 1.3464 | 0.9963 | 1.4221 | 15.5809 |
| 4000 | 1.5686 | 1.6696 | 1.4343 | 1.3713 | 2.1683 | 0.7986 | 1.1329 | 1.6471 | 1.2650 | 0.9255 | 1.2714 | 15.2526 |
| 5000 | 1.4315 | 1.6855 | 1.5455 | 1.3334 | 2.0525 | 0.7525 | 1.1844 | 1.8216 | 1.2907 | 1.0006 | 1.3901 | 15.4883 |
| Training Iterations | Gen1 | Gen2 | Gen3 | Gen4 | Gen5 | Gen6 | Dis1 | Dis2 | Dis3 | Dis4 | Dis5 | MSE |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1000 | 1.1117 | 1.0986 | 1.1723 | 1.0576 | 1.0976 | 1.0308 | 0.9555 | 0.9398 | 0.9828 | 0.9477 | 0.9225 | 0.0854 |
| 2000 | 1.0719 | 1.1116 | 1.1238 | 1.0693 | 1.0494 | 1.0877 | 0.9661 | 0.9243 | 0.9518 | 0.8928 | 0.9645 | 0.0797 |
| 3000 | 1.0185 | 1.0186 | 1.0343 | 1.0791 | 1.0771 | 1.0291 | 0.9571 | 0.9640 | 0.9552 | 0.9471 | 0.9392 | 0.0491 |
| 4000 | 1.1068 | 1.1478 | 1.1146 | 1.0257 | 1.0332 | 1.0508 | 0.9145 | 0.9592 | 0.9623 | 0.9813 | 0.9294 | 0.0968 |
| 5000 | 1.0706 | 1.3096 | 1.1995 | 1.0471 | 1.0427 | 1.1625 | 0.9285 | 0.9928 | 0.9721 | 0.9899 | 0.9447 | 0.0881 |
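The MSE column above appears to be computed as the square root of the mean squared deviation of the eleven strategy values from 1; recomputing it from the printed rows reproduces the reported values to within about 10⁻³ (the small residual is consistent with the strategy values themselves being rounded):

```python
def strategy_mse(values, target=1.0):
    """Root of the mean squared deviation of the strategy values from `target`,
    which matches the MSE column of the table above."""
    return (sum((v - target) ** 2 for v in values) / len(values)) ** 0.5

# Strategy values (Gen1-Gen6, Dis1-Dis5) for two rows of the table above.
row_1000 = [1.1117, 1.0986, 1.1723, 1.0576, 1.0976, 1.0308,
            0.9555, 0.9398, 0.9828, 0.9477, 0.9225]
row_3000 = [1.0185, 1.0186, 1.0343, 1.0791, 1.0771, 1.0291,
            0.9571, 0.9640, 0.9552, 0.9471, 0.9392]
```

The 3000-iteration row has the smallest deviation from 1 and, consistently with Section 4.3, the highest social welfare.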
© 2016 by the authors; licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC-BY) license (http://creativecommons.org/licenses/by/4.0/).
Share and Cite
Zhao, H.; Wang, Y.; Guo, S.; Zhao, M.; Zhang, C. Application of a Gradient Descent Continuous Actor-Critic Algorithm for Double-Side Day-Ahead Electricity Market Modeling. Energies 2016, 9, 725. https://doi.org/10.3390/en9090725