Article

A Bidding Strategy for Power Suppliers Based on Multi-Agent Reinforcement Learning in Carbon–Electricity–Coal Coupling Market

School of Electric Power Engineering, South China University of Technology, Guangzhou 510640, China
*
Author to whom correspondence should be addressed.
Energies 2025, 18(9), 2388; https://doi.org/10.3390/en18092388
Submission received: 3 April 2025 / Revised: 22 April 2025 / Accepted: 27 April 2025 / Published: 7 May 2025

Abstract

The deepening operation of the carbon emission trading market has reshaped the cost–benefit structure of the power generation side. When submitting market quotations, power suppliers must account not only for the conventional power generation cost but also for the superimposed impact of carbon quota accounting on operating income. This creates a multi-time-scale collaborative decision-making problem under the interaction of the carbon market, power market, and coal market. This paper focuses on the multi-market-coupling decision optimization problem of thermal power suppliers and proposes a collaborative bidding decision framework based on the multi-agent deep deterministic policy gradient (MADDPG) algorithm. Firstly, to address the time-scale differences among the markets, a decision-making cycle coordination scheme for the carbon–electricity–coal coupling market is proposed. Secondly, upper- and lower-level optimization models for the bidding decisions of power suppliers are constructed. Then, based on the MADDPG algorithm, the multi-generator bidding scenario is simulated to solve for the optimal multi-generator bidding strategy in the carbon–electricity–coal coupling market. Finally, multi-scenario simulations based on the IEEE-5 node system show that the model can effectively analyze the differential influence of the multi-market structure on the bidding strategy of power suppliers, verifying the superiority of the algorithm in convergence speed and revenue optimization.

1. Introduction

Under the background of the establishment of the dual-carbon strategic goal, the construction process of the carbon trading market has been fully accelerated. At the same time, under the background of a high proportion of renewable energy grid connections, the correlation of price fluctuation between different markets has been enhanced, and the cross-influence of policy constraints has intensified. This presents a market pattern of multi-market coordinated operation, such as the electricity spot market, carbon trading market, and upstream coal market [1]. Power suppliers are faced with the problem of collaborative optimization of decision making in the multi-side market trading environment. The traditional single-market bidding strategy has been difficult to adapt to the bidding decision-making needs of power suppliers under the new power market system [2]. It is urgent to analyze the impact of multi-dimensional market structure on bidding clearing results and explore the coordination mechanism of the bidding decision making of power suppliers under the carbon–electricity–coal coupling environment so as to optimize the allocation of resources in the power market and improve the operating income of power suppliers [3].
The complexity of the market mechanism promotes the evolution of a bidding strategy from single-market equilibrium analysis to multi-market-coupling modeling. Early research on bidding strategies has mainly focused on the electricity spot market. Based on the short-term bidding game model, reference [4] verified the effectiveness of the marginal cost pricing mechanism of thermal power plants and clarified the bidding optimization path under the equilibrium of supply and demand in the spot market. With the promotion of the carbon trading mechanism, some scholars have studied the bidding strategy of electricity–carbon dual-market linkage. Reference [5] constructed a carbon–electricity collaborative demand-response model and revealed the marginal adjustment effect of carbon price fluctuation on generator output decision. By introducing a carbon tax constraint into the day-ahead market, the literature [6] realized the revenue co-optimization of thermal power unit energy and reserve service. In reference [7], the fuel procurement market was included in the analysis framework, and the risk hedging mechanism under the synergistic effect of fuel cost and electricity price fluctuation was clarified. However, the above research was limited to a single market or a specific dual-coupling market, and it is difficult to reflect the dynamic transmission effect of multi-market constraints on the bidding strategy of power suppliers, which may lead to the strategic loss risk of power suppliers in the real multi-market-coupling environment.
To solve for the bidding strategies of power suppliers, agent-based algorithms have become a powerful research tool because they can model and simulate both the dynamic evolution of the market and the bidding behavior of power suppliers [8]. From the perspective of technological evolution, value-function-based reinforcement learning has progressed from basic algorithms to deep architectures. In early research, Q-learning was used to construct bidding game models over discrete state–action spaces [9,10], but it struggled to represent high-dimensional market states. The deep Q-network (DQN) algorithm addresses the generalization problem of Q-learning in high-dimensional state spaces by introducing a deep neural network and an experience replay mechanism [11,12]. The double deep Q-network (DDQN) further decouples action selection from value evaluation, which effectively alleviates the overestimation bias when solving for the multi-generator bidding equilibrium [13,14]. Policy-based methods open up a new path through continuous-action space modeling: the deep deterministic policy gradient (DDPG) algorithm and its improved variants map the market state directly to continuous bidding actions, which removes the discrete-decision restriction of traditional value-function methods [15,16,17]. However, in a multi-market-coupling environment, existing algorithms have difficulty capturing cross-market dynamic interactions and face the high complexity of solving the multi-agent game equilibrium. This limits their strategy generalization ability and decision accuracy, and further research on strategy stability and convergence efficiency in continuous-action spaces is still needed.
To address these problems, this paper proposes a bidding decision optimization model for power suppliers based on the multi-agent deep deterministic policy gradient (MADDPG) algorithm in a carbon–electricity–coal multi-market-coupling environment. Firstly, the time-scale differences of decision making in the carbon, electricity, and coal markets are analyzed, and a coordination mechanism for the decision-making cycle of the carbon–electricity–coal coupling market is established. Secondly, an upper-level day-ahead electricity spot market bidding model is established with the goal of maximizing the profit of power suppliers, and a lower-level day-ahead electricity spot market clearing model is constructed. Then, based on the MADDPG algorithm, the competition among multiple generators in the power market is simulated. Finally, in the simulation environment of the IEEE-5 node system [18], thermal power producers are taken as the research object, and several examples are set up. The examples reveal the “low-carbon priority” characteristic of power system scheduling under carbon trading, show how considering the fuel market in the bidding process improves the market competitiveness of power producers, and verify that the proposed decision optimization model can bring higher benefits to power producers.

2. Carbon–Electricity–Coal Coupling Market Decision-Making Cycle Coordination

2.1. Analysis of Decision-Making Cycle of Power Suppliers Participating in Multi-Sided Market

Under the background of multi-policy coordination and control, such as power market reform and carbon trading, the operation of power suppliers faces the multi-objective coordination challenge of coordinating fuel cost control, power revenue optimization, and carbon quota balance: (1) Based on the power demand forecast, the monthly coal procurement plan of the upstream coal market is formulated, and its procurement cost directly affects the bidding strategy of the power market through the cost transmission mechanism. (2) By participating in the day-ahead bidding of the downstream power market to obtain the time-sharing power generation plan, the output adjustment will change the actual carbon emission intensity. (3) In the horizontal carbon trading market, power suppliers obtain the pre-allocation carbon quota at the beginning of the period, and at the end of the period, it is adjusted retroactively according to the actual power generation [5]. Through the cross-cycle policy design, the carbon market and power production form a feedback loop. The operation mode, decision-making cycle, management system, and policy direction of the coal market, the power market, and the carbon trading market are not the same, but carbon emissions is a transmission medium throughout the whole chain of fuel procurement, power production, and quota clearing, resulting in a coupling relationship between the upstream and downstream markets and the carbon trading market. The specific coupling relationship is shown in Figure 1.
The coupling among the power market, the coal market, and the carbon trading market confronts power suppliers with a multi-time-scale coordinated bidding problem. On the one hand, to shield the cost of electricity production from uncertain factors such as transportation delays and fuel price fluctuations, power suppliers need to predict the coal price trend and sign the monthly fuel procurement agreement before the monthly cycle begins, whereas most of the electricity is determined through the day-ahead market, which clears on the day before the operating day. This creates a time mismatch between fuel procurement and power decision making. On the other hand, the generation plan formed by day-ahead bidding must be implemented immediately, but the final accounting of carbon quotas is delayed until the end of the monthly cycle, which separates the timing of market clearing from that of policy compensation. These time-scale differences across the month-ahead, day-ahead, and month-after stages require power suppliers to coordinate fuel costs, power benefits, and policy risks under asymmetric timing constraints, which greatly increases the complexity of operation coordination. To this end, this paper proposes a decision-making cycle coordination mechanism for the carbon–electricity–coal coupling market that coordinates the multi-time-scale operation decisions of power suppliers in a complex market environment.

2.2. Carbon–Electricity–Coal Coupling Market Decision-Making Cycle Coordination Model

2.2.1. Coordination of Monthly Coal Market Cost Decision Making

At present, the electricity spot market is a game with incomplete information. Generators can rarely obtain the unit quotation information of other generators, and quoting at marginal cost rarely yields the maximum profit. Therefore, power suppliers need to reduce the upstream power generation cost as much as possible and adopt appropriate pricing strategies to obtain higher profits. Taking coal-fired power plants as an example, most day-ahead electricity spot market decisions use a unit power generation cost fitted by a quadratic curve as the decision reference [7]:
$$
Cost^{G}_{g,t} = a_{\lambda} P_{g,t}^{2} + b_{\lambda} P_{g,t} + c_{\lambda} \tag{1}
$$
where $P_{g,t}$ is the bid-winning power of generator $g$ in period $t$, and $a_{\lambda}$, $b_{\lambda}$, and $c_{\lambda}$ are the quadratic, linear, and constant coefficients of the generation cost, respectively.
However, this traditional cost accounting method, which fits a quadratic curve to historical data, not only carries a fixed fitting error but also has difficulty reflecting price changes in the upstream coal market in real time [19]. To fully exploit the cost advantage of generators and convert it into competitiveness in downstream power market bidding, this paper estimates the power generation cost from the standard coal consumption per unit of generation and the daily updated average unit price of standard coal fed into the furnace, so that the generation cost is updated dynamically with the coal price. The specific method is as follows:
$$
Cost^{G}_{g,t} = k_{T}\, p_{d}\, \psi_{t}\, P_{g,t} \tag{2}
$$
where $k_T$ is the cost conversion coefficient; $p_d$ is the average unit price of standard coal fed into the furnace; $p_d \psi_t$ is the unit fuel cost of power generation; and $\psi_t$ is the standard coal consumption per unit of generation of the unit in period $t$, in g/kWh. Because the real-time coal consumption of power production equipment is difficult to measure directly, the unit coal consumption of power generation is usually fitted experimentally with a quadratic function:
$$
\psi_{t} = a_{T} P_{g,t}^{2} + b_{T} P_{g,t} + c_{T} \tag{3}
$$
where $a_T$, $b_T$, and $c_T$ are the quadratic, linear, and constant coefficients of the standard coal consumption curve of the generation unit, respectively.
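The coal-price-linked cost of Equations (2) and (3) can be sketched in a few lines. The coefficient values below are hypothetical and serve only to exercise the functions. The value `K_T = 1e-3` is likewise an assumption: it is the unit conversion that applies if output is in MW, coal consumption in g/kWh, and the coal price in yuan per tonne, which yields a cost in yuan per hour.

```python
def unit_coal_consumption(p_mw, a_t, b_t, c_t):
    """Equation (3): standard coal consumption psi_t (g/kWh) at output p_mw."""
    return a_t * p_mw ** 2 + b_t * p_mw + c_t


def generation_cost(p_mw, coal_price, k_t, a_t, b_t, c_t):
    """Equation (2): Cost = k_T * p_d * psi_t * P."""
    psi = unit_coal_consumption(p_mw, a_t, b_t, c_t)
    return k_t * coal_price * psi * p_mw


# Hypothetical curve coefficients and conversion factor (assumptions, not from the paper).
A_T, B_T, C_T = 0.0004, -0.3, 350.0
K_T = 1e-3

# The same output becomes more expensive when the daily coal price rises.
cost_low = generation_cost(200.0, 900.0, K_T, A_T, B_T, C_T)
cost_high = generation_cost(200.0, 1000.0, K_T, A_T, B_T, C_T)
print(cost_low, cost_high)
```

Because the coal price enters as a plain multiplier, updating `coal_price` each day is enough to refresh the cost curve; the fitted coefficients stay fixed.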

2.2.2. Coordination of Monthly Carbon Trading Market Decision Cycle

The assessed carbon trading cost of a power supplier is affected by its monthly electricity consumption and the original carbon quota at the beginning of the contract performance period. The carbon market is settled monthly, which does not match the day-ahead clearing cycle of the power market. This paper considers the factors that link the decision cycles of the carbon trading market and the power market [20] and determines how the carbon trading policy simulation is connected to the power market.
The carbon trading center calculates the initial carbon quota and pre-issues the carbon quota. The regional power dispatching center calculates the initial free carbon quota of the generators based on the baseline method and realizes the low-carbon clearing of the electric–carbon coupling market by combining the carbon emission intensity of the generators of each power supplier with the daily quotation information of the generators. The initial carbon emission quota, carbon emission, and carbon quota trading volume of generator g are calculated as shown in Equation (4):
$$
\begin{aligned}
M^{T}_{g,t} &= P_{g,t}\,\delta_{T} \\
M^{G}_{g,t} &= P_{g,t}\,\delta_{G} \\
M_{g,t} &= P_{g,t}\,(\delta_{G} - \delta_{T})
\end{aligned} \tag{4}
$$
where $M^{T}_{g,t}$ is the initial carbon quota of generator $g$, and $\delta_T$ is the carbon emission quota per unit of power generation. $M^{G}_{g,t}$ is the carbon emission of generator $g$ during period $t$, and $\delta_G$ is the carbon emission coefficient per unit of power generation, that is, the carbon emission intensity. $M_{g,t}$ is the carbon quota trading volume of generator $g$.
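As a small illustration of Equation (4), the sketch below computes the quota, the emission, and the resulting trading volume for one period. The function name and the numerical values are hypothetical. A positive trading volume means the generator emits more than its quota, and a negative one means it holds a surplus.

```python
def carbon_position(p_mwh, delta_g, delta_t):
    """Equation (4): return (quota, emission, trading volume) for one period."""
    quota = p_mwh * delta_t       # M^T: free quota from the baseline method
    emission = p_mwh * delta_g    # M^G: actual emission at intensity delta_g
    return quota, emission, emission - quota


# Hypothetical values: 100 MWh cleared, intensity above the baseline.
quota, emission, volume = carbon_position(100.0, delta_g=0.9, delta_t=0.8)
print(quota, emission, volume)
```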
Carbon trading costs depend on carbon emissions. Traditional carbon trading cost calculations lack a differentiated mechanism for rewarding emission reductions and penalizing excess emissions, which is likely to leave power suppliers with little incentive to reduce emissions and to cause market regulation to fail. This paper therefore proposes a stepped-interval improvement of the carbon trading cost to guide the emission reduction behavior of power suppliers accurately and to allocate resources efficiently. The improved stepped carbon trading model is shown in Equation (5).
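$$
Cost^{\mathrm{CO_2}}_{g,t} =
\begin{cases}
p_{\mathrm{CO_2}} (1 + 3\upsilon)(M_{g,t} + 3l^{-}), & -4l^{-} < M_{g,t} < -3l^{-} \\
p_{\mathrm{CO_2}} (1 + 2\upsilon)(M_{g,t} + 2l^{-}), & -3l^{-} < M_{g,t} < -2l^{-} \\
p_{\mathrm{CO_2}} M_{g,t}\,\theta, & -l^{-} < M_{g,t} < l^{+} \\
p_{\mathrm{CO_2}} \left[k + (k-1)\theta\right] l^{+} + p_{\mathrm{CO_2}} (1 + k\theta)(M_{g,t} - k l^{+}), & k l^{+} < M_{g,t} \le (k+1) l^{+},\ k = 1, 2, 3, \ldots
\end{cases} \tag{5}
$$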
where $Cost^{\mathrm{CO_2}}_{g,t}$ is the carbon trading cost of generator $g$; $p_{\mathrm{CO_2}}$ is the carbon price; $\upsilon$ and $\theta$ are the reward coefficient and penalty coefficient of carbon trading, respectively; and $l^{+}$ and $l^{-}$ are the lengths of the positive and negative carbon trading intervals, respectively.

2.2.3. Day-Ahead Decision-Making Process of Carbon–Electricity–Coal Coupling Market

Based on the coal market cost guidance and the carbon trading volume calculation model, the decision-making cycles of the carbon–electricity–coal coupling market are coordinated so that decisions in all three markets are reflected in the day-ahead electricity spot market. This helps power suppliers optimize their power market bids according to the generation cost and the carbon transaction cost so as to achieve the maximum profit. The market organization process based on carbon–electricity–coal multi-sided market coupling constructed in this paper is shown in Figure 2.

3. Bidding Decision Optimization Model of Power Suppliers in Carbon–Electricity–Coal Coupling Market

3.1. Bidding Decision Optimization Model of Power Suppliers

In order to focus on the optimization of quotation strategy of power suppliers in the day-ahead power market, this paper temporarily does not consider the impact of market transaction process on the error of user-side load declaration and instead only considers the day-ahead electricity energy market in the power spot market.
In this paper, it is assumed that the generators owned by the power suppliers are all thermal generators. To maximize the total generation profit, a power supplier purchases coal ahead of each month, adjusts the maximum unit output within the month, calculates the carbon trading cost in real time, and dynamically updates its bidding decisions. The generation profit of a power supplier in the multi-market-coupling environment can be expressed as follows:
$$
\max_{\varphi_{g,d},\,[a_{g,t},\,P^{o}_{g,t}]} F_{g,d} = \sum_{t=1}^{T} \left[ \rho_{g,t} P_{g,t} - Cost^{\mathrm{CO_2}}_{g,t}(P_{g,t}) - Cost^{G}_{g,t}(P_{g,t}) \right] \tag{6}
$$
where $\varphi_{g,d}$ is the maximum unit output reported for day $D$; $a_{g,t}$ and $P^{o}_{g,t}$ are the bidding curve parameters of the unit; $F_{g,d}$ is the total generation profit on day $D$; $T$ is the bidding cycle of day $D$, and $t$ is the bidding period; $\rho_{g,t}$ and $P_{g,t}$ are the nodal marginal clearing price and the cleared power in period $t$, respectively; $Cost^{\mathrm{CO_2}}_{g,t}$ is the carbon transaction cost; and $Cost^{G}_{g,t}$ is the cost of power generation.
Generators generally design their quotation strategy in the “stepped multi-stage quotation” mode. In this paper, the stepped quotation is parameterized by the quotation coefficient $a_{g,t}$:
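$$
\begin{aligned}
& P_{\min} \le P^{b}_{g,t} \le P_{\max}, \quad b \in B = \{1, 2, 3\} \\
& P_{g,t} = \sum_{b=1}^{B} P^{b}_{g,t} \\
& p^{1}_{g,t} = p_{g,t} \\
& p^{b}_{g,t} = (1 + a_{g,t})\, p_{g,t}, \quad b \in \{2, 3\}
\end{aligned} \tag{7}
$$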
where $P^{b}_{g,t}$ is the bid-winning electricity quantity of generator $g$ in capacity segment $b$ during period $t$. The callable capacity of the generator is divided into three segments. To ensure that the unit is successfully cleared, the quotation coefficient of the first segment is set to 1. The quotation for the second and third capacity segments $[P^{b-1}_{g,t}, P^{b}_{g,t}]$ is $p^{b}_{g,t}$, which is composed of the unit power generation cost $p_{g,t}$ of the generator and the quotation coefficient $a_{g,t}$.
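The price side of Equation (7) can be illustrated with a minimal sketch. The function name is hypothetical, and the default bounds on the coefficient are the limits used in the case study of Section 5 (−0.2 and 0.5); how the capacity itself is split into three segments is not specified here and is left out.

```python
def stepped_quotation(unit_cost, a, a_min=-0.2, a_max=0.5):
    """Equation (7): prices of the three capacity segments for one period."""
    if not a_min <= a <= a_max:
        raise ValueError("quotation coefficient outside the allowed range")
    first = unit_cost                # segment 1 is quoted at unit cost
    marked_up = (1.0 + a) * unit_cost  # segments 2 and 3 share the coefficient a
    return [first, marked_up, marked_up]


print(stepped_quotation(300.0, 0.5))
```

Note that a negative coefficient quotes the second and third segments below the first one, which is permitted by the stated bounds.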

3.2. Day-Ahead Electricity Spot Market Clearing Model

The dispatching agency usually constructs the power market clearing model from the data declared by the generation side, with the objective of maximizing social welfare. The generator start–stop plan is determined with the security-constrained unit commitment (SCUC) model, and the cleared electricity is allocated with the security-constrained economic dispatch (SCED) model. Finally, the nodal marginal price is determined from the power transfer distribution factors and the Lagrange multipliers of the power balance and power flow constraints. The specific clearing model is as follows [20]:
The SCUC model determines the generator start–stop combination:
$$
\min \sum_{g=1}^{N} \sum_{t=1}^{T} \left( \sum_{b=1}^{B} p^{b}_{g,t} P^{b}_{g,t} + C^{U}_{g,t} + C^{K}_{g,t} \right) \tag{8}
$$
where $P^{b}_{g,t}$ is the bid power of generator $g$ in segment $b$ during period $t$, $p^{b}_{g,t}$ is the quotation of the corresponding power segment, $C^{U}_{g,t}$ is the start-up cost of the generator, and $C^{K}_{g,t}$ is the no-load cost, that is, the cost when the unit load power is 0.
The power balance and start–stop constraint are as follows:
$$
\sum_{g=1}^{N} P_{g,t}\, u_{g,t} = P_{D,t} \tag{9}
$$
where ug,t represents the operating state of generator g during the time period t; PD,t is the system power load demand in time period t.
The unit power upper and lower limit constraints are as follows:
$$
P_{g,\min} \le P_{g,t} \le P_{g,\max} \tag{10}
$$
where Pg,max is the maximum output of the generator g, and Pg,min is the minimum output of the generator g.
The unit ramping constraints are as follows:
$$
\begin{aligned}
P_{g,t} - P_{g,t-1} &\le \Delta P^{U}_{g}\, u_{g,t-1} + P_{g,\min}(u_{g,t} - u_{g,t-1}) + P_{g,\max}(1 - u_{g,t}) \\
P_{g,t-1} - P_{g,t} &\le \Delta P^{D}_{g}\, u_{g,t-1} - P_{g,\min}(u_{g,t} - u_{g,t-1}) + P_{g,\max}(1 - u_{g,t-1})
\end{aligned} \tag{11}
$$
where $\Delta P^{U}_{g}$ is the maximum ramp-up rate of generator $g$, and $\Delta P^{D}_{g}$ is the maximum ramp-down rate.
Consider branch power flow constraints:
$$
\begin{aligned}
F_{l} - F^{M}_{l} &\le 0 \\
-F_{l} - F^{M}_{l} &\le 0
\end{aligned} \tag{12}
$$
where $F_l$ is the power flow of branch $l$, and $F^{M}_{l}$ is the power flow limit of branch $l$.
Given the generator start–stop combination determined by the SCUC model, the SCED model calculates the clearing power of each generator. Its objective function, which expresses the maximum economic benefit of the ISO as the minimum total purchase cost, is as follows:
$$
\min \sum_{g=1}^{N} \sum_{t=1}^{T} \sum_{b=1}^{B} p^{b}_{g,t} P^{b}_{g,t} \tag{13}
$$
The constraints of the economic dispatch model are the power balance and output limit constraints, the ramping constraints, and the line flow constraints:
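$$
\begin{aligned}
& \sum_{g=1}^{N} P_{g,t} = P_{D,t} \\
& P_{g,\min} \le P_{g,t} \le P_{g,\max} \\
& P_{g,t} - P_{g,t-1} \le \Delta P^{U}_{g} \\
& P_{g,t-1} - P_{g,t} \le \Delta P^{D}_{g} \\
& F_{l} - F^{M}_{l} \le 0 \\
& -F_{l} - F^{M}_{l} \le 0
\end{aligned} \tag{14}
$$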
After the SCUC and SCED market clearing, each power generator will obtain the clearing price and the clearing electricity. The clearing price is calculated by the marginal node price in this paper:
$$
\rho_{k,t} = \lambda_{t} - \sum_{l=1}^{L} \left( \tau^{\max}_{l,t} - \tau^{\min}_{l,t} \right) G_{l-k} \tag{15}
$$
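where $\lambda_t$ is the Lagrange multiplier of the power balance constraint, $\tau^{\max}_{l,t}$ and $\tau^{\min}_{l,t}$ are the Lagrange multipliers of the two power flow limit constraints of branch $l$, and $G_{l-k}$ is the power transfer distribution factor between node $k$, where the generator is located, and branch $l$.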
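When no branch limit binds, all $\tau$ terms in Equation (15) vanish and every node sees the same price $\lambda_t$. For that special case only, the clearing reduces to a merit-order stack, which the following sketch illustrates. It is a deliberate simplification and not the SCUC/SCED model above: unit commitment, minimum output, ramping, and network constraints are all ignored, and the bid data are hypothetical.

```python
from typing import Dict, List, Tuple

Bid = Tuple[str, float, float]  # (generator name, price, quantity in MW)


def clear_uncongested(bids: List[Bid], demand_mw: float) -> Tuple[Dict[str, float], float]:
    """Single-period merit-order clearing without network or ramping limits."""
    dispatch: Dict[str, float] = {}
    remaining = demand_mw
    clearing_price = 0.0
    # Accept the cheapest segments first until demand is met.
    for name, price, quantity in sorted(bids, key=lambda bid: bid[1]):
        if remaining <= 0.0:
            break
        accepted = min(quantity, remaining)
        dispatch[name] = dispatch.get(name, 0.0) + accepted
        remaining -= accepted
        clearing_price = price  # price of the last (marginal) accepted segment
    if remaining > 1e-9:
        raise ValueError("offered capacity is insufficient to meet demand")
    return dispatch, clearing_price


# Hypothetical segment bids from two generators.
bids = [("G1", 300.0, 100.0), ("G1", 330.0, 100.0), ("G2", 310.0, 150.0)]
dispatch, price = clear_uncongested(bids, demand_mw=220.0)
print(dispatch, price)
```

In this example the cheapest G1 segment is taken in full, G2 covers the remainder, and the marginal G2 segment sets the uniform price.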

4. Multi-Generator Bidding Strategy Based on MADDPG Algorithm

4.1. Bidding Strategy Design of Multi-Generator Market

In order to simulate the bidding decision-making process of multi-generators participating in the power market, the generator is modeled as an agent, and the bidding behavior of the generator is described as a Markov game (MG) process. The specific definition is as follows:
Environment: This paper constructs the trading environment of the electricity spot market according to the implementation rules of spot electricity energy trading in the Guangdong power market (2022). The power suppliers formulate their bidding strategies as three-segment bids, and the ISO runs the market clearing model to obtain the clearing price and cleared electricity of each power supplier and feeds this private information back to the corresponding power supplier.
State: The state space of the agent is set to the load information published by the ISO in the current period and the private information of the previous period, that is, the clearing electricity quantity and the clearing electricity price. Among them, the load of each period is directly published by the ISO, and the private information is calculated and fed back to the agent by the ISO through the clearing program.
Action: The action of each agent is set to the quotation coefficient $a_{g,t}$ of the second and third segments of the quotation curve.
Reward: The reward of the agent is defined by Equation (6); that is, the agent calculates the reward with the goal of maximizing the profit of the power supplier and updates the network parameters.
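The state, action, and reward definitions above can be written down as a small sketch. The class and function names are hypothetical. The sketch also assumes that the policy network emits a raw value in [−1, 1] (for example from a tanh output layer), which is then rescaled to the coefficient bounds used in Section 5; the source does not state how the bounds are enforced.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    """State of one agent: published load plus last period's private feedback."""
    load_mw: float
    prev_cleared_mw: float
    prev_price: float


def scale_action(raw: float, low: float = -0.2, high: float = 0.5) -> float:
    """Map a raw policy output in [-1, 1] to a quotation coefficient in [low, high]."""
    clipped = max(-1.0, min(1.0, raw))
    return low + 0.5 * (clipped + 1.0) * (high - low)


def period_reward(price: float, cleared_mw: float,
                  generation_cost: float, carbon_cost: float) -> float:
    """One-period term of Equation (6): revenue minus carbon and generation cost."""
    return price * cleared_mw - carbon_cost - generation_cost


obs = Observation(load_mw=800.0, prev_cleared_mw=120.0, prev_price=310.0)
print(obs, scale_action(0.0), period_reward(350.0, 100.0, 30000.0, 2000.0))
```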

4.2. Multi-Agent Deep Deterministic Policy Gradient Algorithm

Multi-agent deep reinforcement learning (MADRL) has gradually been applied to the bidding problem of multi-generator markets with complex market rules and stochastic participant strategies because of its ability to simulate multiple market players. This paper combines MADRL with the DDPG algorithm and uses a MADDPG algorithm based on the “centralized training, decentralized execution” framework. Deep neural networks are used to learn the deterministic policy gradient and to solve for the bidding strategies of multiple generators participating in the market.
The policy function $\mu(s \mid \theta^{\mu})$ and the Q-value function $Q(s, a \mid \theta^{Q})$ are fitted by separate deep neural networks. The parameter of the policy function $\mu(s \mid \theta^{\mu})$ is $\theta^{\mu}$, and the parameter of the value function $Q(s, a \mid \theta^{Q})$ is $\theta^{Q}$ [21].
$$
Q^{\mu}(s_{t}, a_{t}) = \mathbb{E}\left[ r_{t} + \gamma\, Q^{\mu}(s_{t+1}, a_{t+1}) \right] \tag{16}
$$
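$$
\nabla_{\theta^{\mu}} J(\mu_{g}) = \mathbb{E}_{s,a}\left[ \nabla_{\theta_{g}} \mu_{g}(a_{g} \mid s_{g})\, \nabla_{a_{g}} Q^{\mu}_{g}(s, a) \big|_{a_{g} = \mu(s_{g})} \right] \tag{17}
$$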
where $\mu(s_g)$ represents the current policy function, and $s$ and $a$ represent the joint state and joint action vectors of multiple agents, respectively; that is, $s = (s_1, s_2, \ldots, s_N)$ and $a = (a_1, a_2, \ldots, a_N)$.
As the number of agents in the environment increases, policy convergence becomes more difficult. This paper improves the stability of the algorithm with a main–target network architecture: the main network generates actions and evaluates returns in real time, while the target network lags behind the main network and gradually tracks its parameters through linear interpolation. The loss function of the value function main network is defined as follows:
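$$
\begin{aligned}
L(\theta^{Q}) &= \mathbb{E}_{s,a,r,s'}\left[ \left( Q(s, a \mid \theta^{Q}) - y \right)^{2} \right] \\
y &= r_{t} + \gamma\, Q'(s', a' \mid \theta^{Q'}) \big|_{a'_{g} = \mu'(s'_{g})}
\end{aligned} \tag{18}
$$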
where $\theta^{Q}$ is the main network parameter of the value function, $r_t$ is the reward fed back by the environment to the agent, $\gamma$ is the discount coefficient, $\theta^{Q'}$ is the target network parameter of the value function, and $Q'(s', a' \mid \theta^{Q'})$ is the output of the value function target network. $\mu'(s'_g)$ is the target network of the policy function, $s'_g$ represents the state vector at the next moment, and $a'_g$ represents the action vector at the next moment.
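The arithmetic of Equation (18) can be checked on a toy batch. In the sketch below the Q-values are plain numbers supplied by hand; in the actual algorithm they would come from the main and target value networks. All names and values are hypothetical.

```python
def td_targets(rewards, next_q_values, gamma):
    """y = r_t + gamma * Q'(s', a') for each sampled transition."""
    return [r + gamma * q_next for r, q_next in zip(rewards, next_q_values)]


def critic_loss(q_values, targets):
    """Mean squared error between main-network Q-values and the targets y."""
    errors = [(q - y) ** 2 for q, y in zip(q_values, targets)]
    return sum(errors) / len(errors)


# Toy batch of two transitions.
targets = td_targets(rewards=[1.0, 2.0], next_q_values=[10.0, 20.0], gamma=0.5)
loss = critic_loss(q_values=[5.0, 10.0], targets=targets)
print(targets, loss)
```

The expectation in Equation (18) is approximated here by the batch mean, which is the usual practice when sampling from an experience pool.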
The parameters of the policy function main network and the value function main network are updated by Equation (19).
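$$
\begin{aligned}
\theta^{\mu} &= \theta^{\mu} + lr_{a}\, \nabla_{\theta^{\mu}} J \\
\theta^{Q} &= \theta^{Q} - lr_{q}\, \nabla_{\theta^{Q}} L(\theta^{Q})
\end{aligned} \tag{19}
$$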
where $lr_a$ and $lr_q$ represent the learning rates of the policy network and the value network, respectively. The target network parameters of the policy function and of the value function are soft-updated by Equation (20), where $\tau$ is the update rate.
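$$
\begin{aligned}
\theta^{\mu'} &= \tau\, \theta^{\mu} + (1 - \tau)\, \theta^{\mu'} \\
\theta^{Q'} &= \tau\, \theta^{Q} + (1 - \tau)\, \theta^{Q'}
\end{aligned} \tag{20}
$$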
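The soft update of Equation (20) is a per-parameter linear interpolation. A minimal sketch follows, with the parameters represented as flat lists of floats rather than network tensors; the function name and numbers are hypothetical.

```python
def soft_update(target_params, main_params, tau):
    """Equation (20): theta' = tau * theta + (1 - tau) * theta'."""
    return [tau * main + (1.0 - tau) * target
            for main, target in zip(main_params, target_params)]


target = [0.0, 2.0]
main = [1.0, 4.0]
target = soft_update(target, main, tau=0.5)
print(target)
```

A small update rate keeps the target network close to its previous value, so the targets used in the value loss change slowly between training steps.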

4.3. Multi-Generator Bidding Strategy Based on MADDPG

The bidding strategy solving framework based on MADDPG is shown in Figure 3. Each generator, acting as an independent agent, first observes the current state $s_t$ and formulates a strategy $\pi(a_t \mid s_t)$. The ISO collects the bidding information reported by each agent for market clearing and feeds back the clearing results to each agent. The MADDPG-based generator bidding strategy optimization algorithm is as follows:
(1) Initialize the parameters of the main value function network, the target value function network, the main policy function network, and the target policy function network as $\theta^{Q}$, $\theta^{Q'}$, $\theta^{\mu}$, and $\theta^{\mu'}$, respectively. Set the total number of iterations to M and the number of time steps per iteration to T.
(2) Initialize the training iteration counter k = 0, and set the experience pool overflow flag.
(3) Randomly initialize the state of each agent, and set the time step of the current iteration to t = 0.
(4) According to the current state $s_t$, output the action of each agent through the main network of the policy function, $a_t = (a_{1,t}, a_{2,t}, \ldots, a_{N,t})$. The ISO calculates the clearing price and clearing power of each agent with the market clearing program. Each agent calculates its own instant reward according to Equation (6). Then, update the state of the agents for the next period, $s_{t+1} = (s_{1,t+1}, s_{2,t+1}, \ldots, s_{N,t+1})$.
(5) Store the experiences $(s_t, a_t, r_t, s_{t+1})$ in the experience pool and number them.
(6) Check whether the current experience number is greater than the experience pool overflow flag. If so, randomly select Batch size samples and execute the training function in step 7. Otherwise, go directly to step 8.
(7) Update the main network parameters of the policy function and of the value function according to Equation (19). After the update, soft-update the target network parameters of the policy function and of the value function according to Equation (20).
(8) Let t = t + 1, and determine whether the time step exceeds T; if so, go to the next step; otherwise, return to step 4.
(9) Let k = k + 1, and determine whether the number of iterations exceeds the total number of iterations M. If so, end the iteration process, output the final clearing price and clearing power of each agent, and save the parameters of the policy function model; otherwise, return to step 3.
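The control flow of steps (2) to (9) can be sketched independently of any neural network library. In the sketch below the policy, the market clearing, and the learning update are passed in as callables, and the demonstration uses trivial stubs, so it illustrates only the loop structure and the experience pool logic. All names (`ExperiencePool`, `train`, `overflow_flag`, and the stub functions) are hypothetical.

```python
import random
from collections import deque


class ExperiencePool:
    """Fixed-capacity store of (s_t, a_t, r_t, s_{t+1}) tuples."""

    def __init__(self, capacity):
        self._items = deque(maxlen=capacity)  # oldest experiences are dropped first

    def store(self, transition):
        self._items.append(transition)

    def sample(self, batch_size):
        return random.sample(list(self._items), batch_size)

    def __len__(self):
        return len(self._items)


def train(reset, act, market_step, learn, iterations, horizon,
          overflow_flag, batch_size, capacity=100_000):
    """Steps (2)-(9); returns the pool and the number of learning updates."""
    pool = ExperiencePool(capacity)
    updates = 0
    for _ in range(iterations):                               # steps (2) and (9)
        state = reset()                                       # step (3)
        for _ in range(horizon):                              # step (8)
            action = act(state)                               # step (4): policy output
            reward, next_state = market_step(state, action)   # step (4): clearing
            pool.store((state, action, reward, next_state))   # step (5)
            # Step (6): learn only once the pool holds enough experiences.
            if len(pool) > overflow_flag and len(pool) >= batch_size:
                learn(pool.sample(batch_size))                # step (7)
                updates += 1
            state = next_state
    return pool, updates


# Stub callables, for illustration only.
batch_sizes = []
pool, updates = train(
    reset=lambda: 0.0,
    act=lambda state: 0.1,
    market_step=lambda state, action: (1.0, state + 1.0),
    learn=lambda batch: batch_sizes.append(len(batch)),
    iterations=2, horizon=5, overflow_flag=3, batch_size=2,
)
print(len(pool), updates)
```

In a full implementation, `learn` would apply Equations (19) and (20) to the sampled batch, and `market_step` would call the clearing model of Section 3.2.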

5. Results and Discussion

Through typical experimental cases, this paper studies the influence of the multi-market structure on the bidding strategy of power suppliers and verifies the effectiveness and superiority of the MADDPG algorithm in solving the bidding decision optimization model of multiple power suppliers in the carbon–electricity–coal coupling environment. The bidding game behavior of multiple power suppliers is simulated on the IEEE-5 node system. The system topology is shown in Figure 4a. The 24 h load curve and the parameters of the generator set are from the PJM website [22]. The 24 h load curve is shown in Figure 4b. Each node is configured with a power supplier, and each power supplier has one thermal generator. The upper limit of the bidding coefficient is set to 0.5, and the lower limit is set to −0.2. The learning rate of the critic network in the MADDPG algorithm is set higher than that of the actor network, at $1 \times 10^{-3}$ and $1 \times 10^{-4}$, respectively. The purpose is to provide accurate Q-value estimates through fast critic updates and to ensure the stability of the strategy through conservative actor updates. The system is assumed to be free of congestion. The technical parameters of the generator are shown in Table A1 of Appendix A, the carbon emission information of the generator is shown in Table A2, and the network structure and hyperparameters of the MADDPG algorithm are shown in Table A3 [23,24].

5.1. Feasibility and Effectiveness Analysis of MADDPG-Based Generator Equilibrium Game

In this section, example 1 is set to verify the effectiveness of the proposed MADDPG algorithm for solving the multi-generator bidding decision optimization model. The example setting does not consider the factors of carbon trading market and coal market. Each generator formulates the bidding strategy based on the supply cost curve, and the independent system operator (ISO) performs the market clearing calculation.
Figure 5a shows the convergence characteristics of the cumulative reward of each agent in the multi-generator equilibrium game process based on MADDPG. Figure 5b,c show the evolution of the bidding coefficients and the winning electricity of the generators at t = 5 h, a period with lower load. Figure 5d shows the iteration process of the system marginal price at the selected time point. As shown in Figure 5a, the returns oscillate significantly in the initial training stage because the agents are still building up knowledge of the market by exploring new bidding strategies. As strategic experience accumulates, the return shows an increasing trend and eventually stabilizes. As the bidding coefficient of each generator gradually converges, the output of the generators and the market clearing price gradually stabilize.
To evaluate the superiority of the proposed MADDPG algorithm, this study compares it with the multi-agent double deep Q-network (MADDQN), proximal policy optimization (PPO), and independent Q-learning (IQL) algorithms. The moving average of the generators' total reward under each algorithm is shown in Figure 6.
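For readers who want to reproduce this kind of smoothed curve, a moving average of per-episode total rewards can be computed as in the sketch below. The window length is not stated in this section, so it is left as a parameter; the function name is illustrative.

```python
from typing import List, Sequence


def moving_average(rewards: Sequence[float], window: int) -> List[float]:
    """Simple trailing moving average over a sequence of episode rewards."""
    if window < 1 or window > len(rewards):
        raise ValueError("window must be between 1 and len(rewards)")
    running = sum(rewards[:window])
    averaged = [running / window]
    for i in range(window, len(rewards)):
        # Slide the window: add the newest reward, drop the oldest one.
        running += rewards[i] - rewards[i - window]
        averaged.append(running / window)
    return averaged


if __name__ == "__main__":
    print(moving_average([1.0, 2.0, 3.0, 4.0, 5.0], window=2))
```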
In terms of convergence, the MADDPG and MADDQN algorithms converge at about 600 iterations, whereas the PPO and IQL algorithms converge after 300 iterations; that is, MADDPG and MADDQN converge more slowly than PPO and IQL. However, MADDPG and MADDQN show a more robust, progressive learning curve, which is consistent with the dynamic balance characteristics of the multi-agent interactive environment in the power market. Although the PPO and IQL algorithms converge faster initially, their revenue curves fluctuate noticeably, reflecting the limited sensitivity of traditional reinforcement learning methods to market microstructure when dealing with a continuous action space.
In terms of stability, thanks to the collaborative optimization of the actor–critic dual-network architecture, the MADDPG and PPO algorithms keep the revenue curve stable in the long-term game, with revenue fluctuations within 2000 yuan. The MADDQN algorithm works on a discrete action space, and its total revenue still fluctuates by up to 5000 yuan after the model stabilizes, which confirms the importance of continuous-action modeling for real-time bidding in the power market. In terms of optimization results, the optimal total revenue obtained by the MADDPG algorithm is 369,975.39 yuan, which is 9.15%, 10.07%, and 17.49% higher than that of the IQL, PPO, and MADDQN algorithms, respectively, a significant advantage in total revenue.
It can be seen that the MADDPG algorithm proposed in this paper shows high superiority in the calculation efficiency of game equilibrium and the accuracy of strategy convergence, which verifies its theoretical value as the optimal solution algorithm for the bidding decision model of power suppliers.

5.2. Bidding Decision of Power Suppliers Considering Carbon–Electricity–Coal Coupling Market

In this section, examples 2 and 3 are set to study the bidding decision-making behavior of multi-generators in the market-coupling scenario. Example 2 assumes that all generators participate in the power market and the carbon trading market, and the influence of the carbon trading market on bidding strategy of multi-generators is simulated and analyzed.
Figure 7a shows the equilibrium results of the generator output and the clearing price in the electricity–carbon coupling market. Figure 7b compares the total carbon emissions of the generators over the 24 periods. Table 1 compares example 2 with example 1 in terms of power generation revenue, carbon emissions, carbon transaction costs, power generation costs, and total revenue.
Figure 7a shows that the carbon trading mechanism significantly changes the bidding strategy of generators through the cost internalization effect. Low-emission generators (such as generator 4) gain a cost advantage from carbon emission quota savings: with similar marginal power generation costs, the bid-winning electricity of generator 4 increases from 859.51 MWh to 1073.66 MWh, an increase of 24.91%. High-emission units (e.g., generator 1) are subject to rigid carbon emission costs, which weakens their price competitiveness and reduces their output share.
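The quoted growth rate is a plain relative change, which the short check below recomputes from the two reported values. The helper name is illustrative, and the result agrees with the reported figure to within rounding.

```python
def relative_change_percent(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in percent."""
    if before == 0:
        raise ValueError("before must be non-zero")
    return (after - before) / before * 100.0


if __name__ == "__main__":
    # Bid-winning electricity of generator 4 without and with carbon trading (MWh).
    print(round(relative_change_percent(859.51, 1073.66), 2))
```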
Figure 7b shows that after considering the carbon trading market, the overall trend of total carbon emissions decreases significantly. Table 1 shows that carbon trading restructures the operating cost structure of market players—the purchase cost of carbon emission rights becomes an independent cost item, which reduces the total revenue of generators to a certain extent and promotes cost allocation to low-carbon attributes. This cost reconstruction is directly reflected in the bidding strategy: low-carbon generators hedge operating costs through carbon asset returns and enhance market competitiveness, while high-emission generators are forced to accept lower output levels due to cost rigidity. The system scheduling shows the characteristics of “low-carbon priority”. The carbon emission intensity is negatively correlated with the output probability, indicating that the carbon price signal has been deeply integrated into the market clearing mechanism. The essence of this structural adjustment is the reshaping of the marginal cost function of generators by carbon trading price signals, which promotes the formation of an endogenous linkage mechanism between market clearing prices and carbon emission costs.
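The cost internalization described above can be made concrete with a small sketch. It assumes that the per-MWh carbon cost of a unit is the carbon price multiplied by the gap between its emission intensity δG and a free-allocation benchmark δT, and that the generation cost has the form aP² + bP; both forms are assumptions for illustration, since this section does not restate the model equations. Parameter values are read from Tables A1 and A2.

```python
P_CO2 = 120.0  # carbon price in CNY/t (Table A2)

# Cost and emission parameters for one high- and one low-emission unit.
UNITS = {
    "G1": {"a": 0.36, "b": 360.0, "delta_g": 0.975, "delta_t": 0.575},
    "G4": {"a": 0.39, "b": 360.0, "delta_g": 0.6, "delta_t": 0.575},
}


def carbon_adder(p_co2: float, delta_g: float, delta_t: float) -> float:
    """Assumed carbon cost per MWh: price times emissions above the benchmark."""
    return p_co2 * (delta_g - delta_t)


def marginal_cost(unit: str, output_mw: float, with_carbon: bool) -> float:
    """Marginal cost 2*a*P + b, optionally including the carbon adder."""
    u = UNITS[unit]
    cost = 2.0 * u["a"] * output_mw + u["b"]
    if with_carbon:
        cost += carbon_adder(P_CO2, u["delta_g"], u["delta_t"])
    return cost


if __name__ == "__main__":
    for name in UNITS:
        print(name,
              round(marginal_cost(name, 100.0, with_carbon=False), 2),
              round(marginal_cost(name, 100.0, with_carbon=True), 2))
```

Under these assumptions the adder is about 48 CNY/MWh for generator 1 and about 3 CNY/MWh for generator 4, so at 100 MW generator 1 is slightly cheaper before carbon costs are included and clearly more expensive afterwards, which matches the "low-carbon priority" pattern described in the text.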
In example 3, generator 4 additionally considers the coal market in its bidding decision. By comparison with generator 4 in example 2, which does not consider the coal market, the moderating effect of coal market cost transmission on its bidding strategy is analyzed. Figure 8a compares the output and power generation revenue of generator 4 in each period across the scenarios, and Figure 8b compares the total carbon emissions over the whole period under each market coupling. Table 2 shows the income results of generators participating in the carbon–electricity–coal coupling market.
When generator 4 introduces the dynamic cost accounting mechanism of the coal market into its power market quotation model, its bid-winning electricity increases significantly, from 1073.66 MWh to 1362.83 MWh. By tracking fuel price fluctuations in real time, this strategy passes the unit fuel cost of power generation through to the bidding function and directly reduces the marginal cost of electricity, thus forming a dynamic cost advantage in the market.
Comparing the income results for the carbon–electricity–coal coupling market in Table 2 with those for the other market conditions shows that the increased output of generator 4, a low-emission generator, produces a double benefit. On the one hand, the additional generation from low-emission generators directly reduces the need for output from high-emission generators, so the overall carbon emission intensity of the power system decreases and the total carbon emissions fall from 4334.51 t to 4263.37 t. The change in the overall carbon emission level is shown in Figure 8b. On the other hand, the reduction in total carbon emissions has a chain reaction through the carbon quota allocation mechanism: the improved marginal emission intensity of the system not only makes the use of carbon quotas more efficient but also reduces the total control cost of the carbon emission trading market, and the cost of carbon trading decreases from 162,147.91 yuan to 153,611.72 yuan. This synergy of environmental and economic benefits arises from the endogenous effect of the market mechanism. The share of power generation that generator 4 obtains through its cost advantage expands, which improves its individual operating income while objectively promoting the low-carbon transformation of the power system, forming a positive cycle of “cost optimization driving market selection, market structure change guiding emission reduction, and emission reduction effect feeding back into market efficiency”.
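The figures quoted from Table 2 can be cross-checked with a few lines of arithmetic. The sketch below assumes that gross income equals generation revenue minus carbon trading cost minus power-production cost; this identity is inferred from the column names rather than stated in this section, and the reported values satisfy it to within about one yuan. It also computes the relative reductions in emissions and carbon trading cost between the electricity–carbon and carbon–electricity–coal scenarios.

```python
# Rows of Table 2: (generation revenue, carbon emissions [t],
# carbon trading cost, power-production cost, reported gross income); money in yuan.
TABLE_2 = {
    "Power market": (2_379_427.51, 4431.80, 0.0, 2_009_452.54, 369_975.63),
    "Electricity-carbon market": (2_500_657.82, 4334.51, 162_147.91,
                                  2_013_965.78, 324_544.24),
    "Carbon-electricity-coal coupling market": (2_503_293.88, 4263.37, 153_611.72,
                                                2_018_970.71, 330_711.45),
}


def gross_income(revenue: float, carbon_cost: float, production_cost: float) -> float:
    """Assumed identity: revenue minus carbon trading and production costs."""
    return revenue - carbon_cost - production_cost


def percent_reduction(before: float, after: float) -> float:
    """Reduction from `before` to `after`, in percent of `before`."""
    return (before - after) / before * 100.0


if __name__ == "__main__":
    for name, (rev, _, carbon, prod, reported) in TABLE_2.items():
        print(name, round(gross_income(rev, carbon, prod) - reported, 2))
    ec = TABLE_2["Electricity-carbon market"]
    cec = TABLE_2["Carbon-electricity-coal coupling market"]
    print(round(percent_reduction(ec[1], cec[1]), 2))  # emissions
    print(round(percent_reduction(ec[2], cec[2]), 2))  # carbon trading cost
```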

6. Conclusions

The results of this paper are applicable to the transitional electricity market with thermal power as the main body. In this paper, by constructing the decision-making framework for power suppliers to participate in the bidding of carbon–electricity–coal coupling market, the decision-making time-scale differences of the coal market, day-ahead power market, and carbon trading market were analyzed, and the decision-making cycle coordination mechanism of the carbon–electricity–coal market was established to realize the coordinated decision making of the multi-sided coupling market. The upper-level bidding model was established to maximize the revenue of power suppliers. The lower-level clearing model was established based on security-constrained unit commitment and security-constrained economic dispatch. Based on the MADDPG algorithm, the competition scenario of multi-power suppliers in the coupling environment of the carbon–electricity–coal market was simulated, and the influence of a multi-market structure on the bidding strategy of power suppliers was analyzed. The following conclusions were obtained through simulation analysis:
(1) The results show that the proposed MADDPG algorithm-based multi-generator bidding strategy can effectively simulate the dynamic process of the bidding decision making of multi-generators. The performance of the MADDPG algorithm was compared with MADDQN, PPO, and IQL algorithms. The results show that the convergence performance of the MADDPG algorithm is better, and the optimized decision benefit is higher than that of other algorithms;
(2) The coupling environment of the carbon–electricity–coal market has a significant impact on the bidding strategy of power suppliers. The carbon trading market reshapes the cost structure of power generation through cost internalization, which reduces the total carbon emissions while also reducing the overall profit margin of the power generation side. Low-emission units rely on carbon quota income to form a competitive advantage, whereas high-emission units are subject to cost constraints and lose output share. Considering the coal market in the bidding process, so that the power generation cost is updated in real time, enhances the market competitiveness of power suppliers, gives them additional market advantages, and improves their cleared electricity and profit. The bidding decision optimization model of power suppliers in the carbon–electricity–coal coupling market as proposed in this paper will encourage power suppliers to upgrade and promote the energy conservation and emission reduction of generating units.

Author Contributions

Conceptualization, Z.L.; methodology, C.L.; software, B.W.; validation, Q.H.; writing—original draft preparation, C.L.; writing—review and editing, X.Z.; supervision, Z.L. and B.W. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Key Research and Development Program of China (2022YFB2403503).

Data Availability Statement

The data presented in this study are available on request from the corresponding author due to privacy.

Acknowledgments

Zhiwei Liao thanks Chengjin Li, Xiang Zhang, Qiyun Hu, and Bowen Wang for the valuable discussions and their helpful advice with this paper.

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A

Table A1. Technical coefficient of generator.
| Unit Number | a (CNY/MWh²) | b (CNY/MWh) | Pg,min (MW) | Pg,max (MW) | Ramp Rate (MW/min) | kT |
|---|---|---|---|---|---|---|
| G1 | 0.36 | 360 | 10 | 120 | 50 | - |
| G2 | 0.42 | 360 | 10 | 130 | 50 | - |
| G3 | 0.33 | 345 | 10 | 150 | 50 | - |
| G4 | 0.39 | 360 | 10 | 110 | 50 | 1.5 |
| G5 | 0.48 | 330 | 10 | 150 | 50 | - |
Table A2. Generator unit carbon emission parameters.
| Unit Number | pCO2 (CNY/t) | δG (t/MWh) | δT (t/MWh) |
|---|---|---|---|
| G1 | 120 | 0.975 | 0.575 |
| G2 | 120 | 0.6 | 0.575 |
| G3 | 120 | 0.975 | 0.575 |
| G4 | 120 | 0.6 | 0.575 |
| G5 | 120 | 0.975 | 0.575 |
Table A3. Parameter setting of MADDPG algorithm.
| Parameters of MADDPG | Critic Network | Actor Network |
|---|---|---|
| Number of neural network layers | 3 | 3 |
| Batch size | 50 | 50 |
| Hidden layer activation function | ReLU | ReLU |
| Output layer activation function | ReLU | Sigmoid |
| Buffer size | 50,000 | 50,000 |
| Discount factor | 0.95 | 0.95 |
| Learning rate | 1 × 10⁻³ | 1 × 10⁻⁴ |
| Number of rounds | 2000 | 2000 |
| Soft update rate | 0.01 | 0.01 |
| Number of agents | 5 | 5 |
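To show how the soft update rate in Table A3 is typically used, the sketch below applies the standard target-network rule θ′ ← τθ + (1 − τ)θ′ to flat lists of weights. It is framework-free and purely illustrative: the dictionary keys and function name are hypothetical, and the paper's actual implementation is not shown in this section.

```python
from typing import List, Sequence

# Hyperparameters copied from Table A3 (dictionary keys are illustrative).
MADDPG_CONFIG = {
    "critic_lr": 1e-3,
    "actor_lr": 1e-4,
    "batch_size": 50,
    "buffer_size": 50_000,
    "discount_factor": 0.95,
    "soft_update_rate": 0.01,
    "num_agents": 5,
    "num_rounds": 2000,
}


def soft_update(target: Sequence[float], online: Sequence[float],
                tau: float = MADDPG_CONFIG["soft_update_rate"]) -> List[float]:
    """Move each target weight a fraction tau toward the online weight."""
    if len(target) != len(online):
        raise ValueError("target and online weights must have the same length")
    return [tau * w + (1.0 - tau) * t for t, w in zip(target, online)]


if __name__ == "__main__":
    target = [0.0, 1.0]
    online = [1.0, 1.0]
    for _ in range(1000):
        target = soft_update(target, online)
    print([round(t, 4) for t in target])
```

With τ = 0.01 the target weights track the online weights slowly, which is the usual way of keeping the critic's bootstrap targets stable during training.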

References

  1. Xie, K.; Liu, D.; Li, Z.; Sun, T.; Pang, B.; Liu, S.; Zhang, X. Multi-dimensional Collaborative Electricity Market System for New Power System. Autom. Electr. Power Syst. 2024, 48, 2–12.
  2. Shang, J.; Jiang, X.; Xiao, D.; Li, Z.; Yin, S.; Gao, J. Marginal Cost Comparison Bidding Mode with Deep Fusion of Marginal Pricing and Economic Dispatching. Autom. Electr. Power Syst. 2024, 48, 185–193.
  3. Peng, C.; Yi, T.; Sun, H.; Chen, S. Power Generator Balanced Bidding Based on Multi-agent Deep Deterministic Strategy. Power Syst. Technol. 2023, 47, 4229–4239.
  4. Zhao, E.; Wang, H.; Lin, H. Ladder Bidding Strategy of Thermal Power Enterprises According to Evolutionary Game in Spot Market. Electr. Power Constr. 2020, 41, 68–77.
  5. Zheng, L.; Zhou, B.; Chung, C.; Li, J.; Cao, Y.; Zhao, Y. Coordinated Operation of Multi-energy Systems With Uncertainty Couplings in Electricity and Carbon Markets. IEEE Internet Things J. 2024, 11, 24414–24427.
  6. Wang, Y.; Qiu, J.; Tao, Y.; Zhao, J. Carbon-oriented operational planning in coupled electricity and emission trading markets. IEEE Trans. Power Syst. 2020, 35, 3145–3157.
  7. Liao, Z.; Zheng, G.; Xie, X.; Wang, B.; Zhang, W. Two-stage Decision Model for coal-fired Power Plant Based on Upstream and Downstream Market Linkage. Proc. CSEE 2024, 44, 3036–3046.
  8. Feng, H.; Yang, Z.; Zheng, Y.; Ye, F.; Zhang, X.; Shi, X. Intelligent Agent Based Bidding Simulation Method for Multi-input Decision Factors of Power Suppliers. Autom. Electr. Power Syst. 2018, 42, 72–77.
  9. Wu, J.; Li, C.; Guan, X.; Gao, F. Unit Constraints Considered Genco’s Bidding Strategies in Hour-ahead Electricity Market. Proc. CSEE 2008, 16, 72–78.
  10. Wang, J.; Wu, J.; Kong, X. Multi-agent simulation for strategic bidding in electricity markets using reinforcement learning. CSEE J. Power Energy Syst. 2021, 9, 1051–1065.
  11. Gong, K.; Wang, X.; Deng, H.; Jiang, C.; Ma, J.; Fang, L. Deep Reinforcement Learning Based Optimal Energy Storage System Operation of Photovoltaic Power Stations With Energy Storage in Power Market. Power Syst. Technol. 2022, 46, 3365–3377.
  12. Ye, Y.; Qiu, D.; Sun, M.; Papadaskalopoulos, D.; Strbac, G. Deep reinforcement learning for strategic bidding in electricity markets. IEEE Trans. Smart Grid 2019, 11, 1343–1355.
  13. Gao, Y.; Li, Y.; Cao, R. Simulation of Generators’ Bidding Behavior Based on Multi-agent Double DQN. Power Syst. Technol. 2020, 44, 4175–4183.
  14. Liu, D.; Gao, Y.; Wang, W.; Dong, Z. Research on bidding strategy of thermal power companies in electricity market based on multi-agent deep deterministic policy gradient. IEEE Access 2021, 9, 81750–81764.
  15. Xu, D.; Hu, X.; Hu, F.; Cha, Y.; Zhang, C.; Yu, Y.; Zhao, Y. Strategic Bidding of Price-quantity Pairs in Electricity Market Based on Deep Reinforcement Learning. Power Syst. Technol. 2024, 48, 3278–3286.
  16. Ren, K.; Liu, J.; Liu, X.; Nie, Y. Reinforcement Learning-Based Bi-Level strategic bidding model of Gas-fired unit in integrated electricity and natural gas markets preventing market manipulation. Appl. Energy 2023, 336, 120813.
  17. Liang, Y.; Guo, C.; Ding, Z.; Hua, H. Agent-based modeling in electricity market using deep deterministic policy gradient algorithm. IEEE Trans. Power Syst. 2020, 35, 4180–4192.
  18. Wang, B.; Li, C.; Ban, Y.; Zhao, Z.; Wang, Z. A two-tier bidding model considering a multi-stage offer carbon joint incentive clearing mechanism for coupled electricity and carbon markets. Appl. Energy 2024, 368, 123497.
  19. Jiang, W.; Wu, J.; Feng, W.; Duan, X.; Tang, H.; Wu, L. Bilateral Game Model of Power Supply and Demand Sides with Incomplete Information in Day-ahead Electricity Market. Autom. Electr. Power Syst. 2019, 43, 18–24+75.
  20. Lu, Z.; Shang, N.; Zhang, Y.; Chen, Z.; Yang, X.; Li, P. Nash-Stackelberg Game Model for Power Generation Enterprises Participating in Capacity Market. Autom. Electr. Power Syst. 2023, 47, 94–102.
  21. Yang, Y.; Ji, T.; Jing, Z. Selective learning for strategic bidding in uniform pricing electricity spot market. CSEE J. Power Energy Syst. 2021, 7, 1334–1344.
  22. PJM Interconnection. Markets Database Dictionary. Available online: https://pjm.com (accessed on 5 October 2024).
  23. Yuan, J.; Yang, M.; Liu, N.; Zhang, C.; Huang, S. Bidding Strategy of Generation Companies Based on Multi-agent Deep Deterministic Policy Gradient Algorithm Under Incomplete Information. Power Syst. Technol. 2022, 46, 4832–4844.
  24. Rashedi, N.; Mohammad, A.T.; Hamed, K. Markov game approach for multi-agent competitive bidding strategies in electricity market. IET Gener. Transm. Distrib. 2016, 10, 3756–3763.
Figure 1. Coupling relationship of carbon–electricity–coal market.
Figure 2. Market organization process based on carbon–electricity–coal multi-sided market coupling.
Figure 3. The bidding strategy framework of power suppliers in carbon–electricity–coal coupling market based on MADDPG algorithm.
Figure 4. (a) IEEE-5 standard node system topology diagram; (b) 24 h load curve diagram.
Figure 5. (a) Average returns of MADDPG for solving the equilibrium model of generators; (b) the bidding coefficient game process of each generator when t = 5 h; (c) the output game process of each generator when t = 5 h; (d) iterative convergence of system marginal price.
Figure 6. Performance comparison of multiple algorithms.
Figure 7. (a) Bidding results of generators in electricity–carbon coupling market; (b) the full cycle change trend of total carbon emissions.
Figure 8. (a) The winning power and revenue of generator 4 in different scenarios; (b) total carbon emissions over the whole period.
Table 1. Comparison of bidding clearing results in electricity–carbon coupling market.
| Market | Generation Revenue (yuan) | Carbon Emissions (t) | Carbon Trading Cost (yuan) | Power-Production Cost (yuan) | Gross Income (yuan) |
|---|---|---|---|---|---|
| Power market | 2,379,427.51 | 4431.80 | 0 | 2,009,452.54 | 369,975.63 |
| Electricity–carbon market | 2,500,657.82 | 4334.51 | 162,147.91 | 2,013,965.78 | 324,544.24 |
Table 2. Comparison of income results in carbon–electricity–coal coupling market.
| Market | Generation Revenue (yuan) | Carbon Emissions (t) | Carbon Trading Cost (yuan) | Power-Production Cost (yuan) | Gross Income (yuan) |
|---|---|---|---|---|---|
| Power market | 2,379,427.51 | 4431.80 | 0 | 2,009,452.54 | 369,975.63 |
| Electricity–carbon market | 2,500,657.82 | 4334.51 | 162,147.91 | 2,013,965.78 | 324,544.24 |
| Carbon–electricity–coal coupling market | 2,503,293.88 | 4263.37 | 153,611.72 | 2,018,970.71 | 330,711.45 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Liao, Z.; Li, C.; Zhang, X.; Hu, Q.; Wang, B. A Bidding Strategy for Power Suppliers Based on Multi-Agent Reinforcement Learning in Carbon–Electricity–Coal Coupling Market. Energies 2025, 18, 2388. https://doi.org/10.3390/en18092388
