An Agent-Based Bidding Simulation Framework to Recognize Monopoly Behavior in Power Markets

Abstract: Although many countries prefer deregulated power markets as a means of containing power costs


Introduction
For many years, the power industry has been dominated by large utilities with the authority to manage all generation, transmission, and distribution activities within their operating scope [1]. Such utilities are known as vertically integrated. A state-franchised utility holds a monopoly covering sales and control of the transmission network in its area of operation in order to generate and transmit electricity there. When a residential seller or an independent power producer (IPP) is geographically restricted, the utility is the only buyer [2].
The market structure is slowly transitioning to a more competitive form, the so-called deregulated power market (DPM), owing to high demand growth coupled with inefficient system management [3,4]. A DPM encourages electricity suppliers to operate competitively and allows consumers to choose a preferred power supplier. The United States was the first country to implement electricity deregulation using independent system operators (ISOs), such as ISO New England, NYISO in New York, and PJM, a regional transmission organization serving the combined electricity markets of Pennsylvania, New Jersey, and Maryland [5]. Europe is another region where electricity deregulation has been deployed; the added competition has weakened the traditional monopoly of power utilities, with a positive impact on the electricity market [6,7]. Experience has shown that appropriate reform programs in the power sector invariably improve efficiency in terms of cost, service quality, and reliability [8]. A DPM permits competition among market participants when electricity is traded, but permitting competition does not guarantee that it occurs. A monopoly erodes the competitive advantages and positive effects of a DPM: it reduces competition and may generate unreasonable pricing, harming both the efficiency of the power system and the interests of consumers [9][10][11]. Market power is the main anti-competitive practice that may hinder competition in the power sector, particularly in power generation [12]. Market power exists in a restructured power system when any single generation company can influence market pricing or power supply; it can be defined as the ability of a seller or group of sellers to push prices above competitive levels, control total output, or exclude competitors from the relevant market for an extended period of time [12]. It reduces competitiveness, quality, and technological development. For example, generators with global market power can manipulate the marginal (spot) price and the locational marginal price (LMP) under transmission congestion, as in the Power Pool of England and Wales [13]. When there is a monopoly in the power market, consumer demand always requires the unit capacity of at least one particular power producer, which means that the owner of that capacity can keep raising its price, or its claimed boundary cost, to a very high level.
Monopoly recognition for preventing monopoly behavior has attracted researchers' attention for over ten years, and multiple models have been created to recognize the existence of monopolies [14][15][16]. For example, Yen-Yu Lee et al. [17] proposed an indicator, the transmission-constrained residual supply index (TCRSI), and generalized the RSI to meshed networks. However, such indicators are not comprehensive enough and only consider unidirectional power flow. As another example, Peng Wang et al. [18] define must-run generation (MRG) as the minimum capacity that a generator must supply to the system load, taking generation and transmission constraints into account; they argue that when a generator has monopoly potential, its MRG value is greater than zero. These existing studies mainly treat the 'Must-Buy Section' as the condition for the existence of a monopoly, so 'Must-Buy Section' recognition serves as the monopoly-recognition function. However, a 'Must-Buy Section' is only a potential condition for monopoly: its absence does not guarantee the absence of monopoly behavior. Thus, a more exact method to recognize monopoly behavior is required.
In recent years, the public demand to understand market mechanisms and how market participants are affected by their outcomes has encouraged the use of simulation models and tools. Experience has demonstrated the sustainability and efficiency of agent-based technology in simulating market behavior in the power market [19]. An agent-based system is a platform that provides agents with the ability to analyze the negotiation context and allows players to automatically adjust their strategic bidding behavior in the market [20,21].
At present, the combination of agent-based simulation and reinforcement learning has become a popular research field for analyzing the behavior of participants in power auction markets [22][23][24][25]. In a study of the electricity market, Gong Li and Jing Shi simulated the bidding behavior of suppliers using the Roth-Erev learning algorithm [22]; their results show that improving the accuracy of wind forecasts can help increase the net revenue of wind power companies. Viehmann et al. used a Q-learning algorithm to analyze suppliers' optimal bidding strategies, showing that prices rise with additional information about the supply-demand ratio only when the number of participants is limited and there is a large asymmetry in size [23]. Ye et al. used the deep Q-learning (DQL) algorithm to study the strategic bidding behavior of producers in the power market [24]. Mohtavipour and Mehdi Jabbari Zideh proposed an iterative collusive strategy search method to detect collusive strategies in a prisoner's-dilemma game with a collusive equilibrium; their results show that collusion in transmission-congested networks can provide participants with additional profit opportunities compared to uncongested networks [25]. Nevertheless, these studies did not consider whether a monopoly reduces competition in the power grid, nor did they further study agents' behavior-strategy choices when a potential monopoly exists in the market.
In this study, we propose an agent-based bidding simulation framework for monopoly identification in the electricity market. In this framework, each electricity producer is modeled as an agent that schedules a bidding strategy to gain more benefit from electricity trading. The bidding strategy of each agent is selected by a Q-learning strategy based on the effect of historical strategies, which can easily be replaced by any other preferred strategy. With this framework, monopoly behavior can be recognized by non-converging, continually rising bid prices; if the bidding prices of all agents converge into a stable range, the monopoly behavior has been removed. Unlike traditional 'Must-Buy Section'-based methods, which only judge whether a monopoly exists, this framework can also provide a market equilibrium point reflecting the behaviors of all agents. Moreover, the effect of behavioral-constraint policies (such as an upper price boundary) can easily be simulated in this framework, a result that 'Must-Buy Section'-based monopoly recognition models cannot provide.

The Description of the Proposed Simulation Framework
The typical logic flow of power market trading is as follows. In the power market, the power generation companies and suppliers submit quotation curves to the ISO, which obtains the bidding results of each participating party through a scheduling optimization algorithm. The ISO then calculates the LMPs and feeds the winning bid results back to the corresponding power producers [26]. The power producers obtain their winning bid results and calculate their profits. By analyzing historical bidding information and profits, the power producers constantly adjust their quotation curves to obtain greater income in the next bidding round. The adjustment strategies of different power producers differ, and when developing strategies their market information is not exactly the same. The profit of each power producer is tied not only to its own bidding price but also to the offers of the others. The producers' decision-making must be based on the transaction model and algorithm of the trading center (ISO), because a slight strategic change by any company can affect the strategies and earnings of the other companies. In a competitive power market, there is a strong correlation between the behavior of the different players and their profits, and the interests of each participant also depend on the market behavior of the other participants in the decision-making process. Figure 1 shows the simulation framework approximating this logic flow.
Figure 1 shows the proposed agent-based bidding simulation framework. It is assumed that the demand is deterministic, that the power producers own individual generators, and that they all bid in the day-ahead market, aiming to maximize their profits with the bidding strategies that best represent their expectations. The scheduling model is an hour-by-hour scheduling optimization that schedules the generation capacity of the power producers at different times of the following day. Before participating in the nth bid, each power producer must submit its bid data to an ISO. The bidding data include the unit's quotation curve, generating capacity constraints, startup and shutdown costs, and operating ramping constraints. After collecting the bidding data, the ISO runs the economic dispatch algorithm of the security-constrained unit commitment (SCUC), feeds back the unit output plan of each participating power producer, and calculates the LMPs [26]. Each power producer obtains the information about its own winning capacity, which supports its bidding strategy in the next bidding iteration. In the power market, power producers with monopoly behavior will gradually increase their bidding price, owing to their monopoly capacity, thereby reducing the market's competitiveness. In addition, an agent can fully explore and continue to function autonomously in an environment after being provided with pre-designed behaviors. Therefore, to make the bidding strategies and decisions of the power producers smarter, this study assumes that each participating power producer is autonomous and models it as an agent that adjusts its bidding strategy according to its own historical bidding information to maximize its own profits.


The Quote Curve
The unit data submitted by the participating power producers to the ISO are called bidding data; they indicate the relationship between the output and price of each power producer's units, as well as the constraints on the units' output. Equation (1) introduces the quote curve:

G_i,t = α_i,t (A_i P_i,t² + B_i P_i,t + C_i). (1)
In Equation (1), P_i,t is the output of the ith unit at time period t; A_i, B_i, and C_i are the parameters of the generation cost of the ith unit; G_i,t is the generation cost of the ith unit at time period t; and α_i,t is the bidding coefficient of the ith unit at time period t.
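As a concrete sketch, the quote curve can be evaluated in a few lines. This assumes the quadratic cost form with a multiplicative bidding coefficient given in Equation (1); all numerical values below are illustrative, not taken from the paper's test systems.

```python
def quote_price(p, a, b, c, alpha):
    """Bid-adjusted generation cost for output p (MW).

    Assumes the quadratic form G = alpha * (a*p^2 + b*p + c) of
    Equation (1); a, b, c are the cost parameters A_i, B_i, C_i and
    alpha is the bidding coefficient alpha_i,t."""
    return alpha * (a * p**2 + b * p + c)

# A producer raising its bidding coefficient raises its whole curve:
cost_true = quote_price(100.0, 0.01, 20.0, 500.0, 1.0)   # alpha = 1
bid = quote_price(100.0, 0.01, 20.0, 500.0, 1.2)          # alpha = 1.2
```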

The SCUC Model
After receiving the bidding data from the power producers, the ISO obtains the output plan of each unit through the scheduling optimization algorithm. The SCUC model includes unit commitment (UC) and transmission network security checks as the base case [27,28]. Equations (2)-(7) present the SCUC model. Equation (2) is the objective function of power system unit commitment: NG is the number of generation units; NT is the number of time periods per day; I_i,t is the commitment state of unit i at time period t (binary: 1 means unit i is on at time period t, and 0 means it is off); and SU_i,t and SD_i,t are the startup and shutdown costs of the ith unit at time period t, respectively. Equation (3) reflects the supply-demand balancing constraint: D_t is the total load of the power grid at time t, and Loss_t is the transmission loss of the power grid at time t. Equation (4) represents the constraint on the unit generation capacity boundary: P_min and P_max are the minimum and maximum generation capacities of the unit, respectively. Equation (5) represents the ramp rate limits of the units, where UR_i and DR_i are the ramp-up and ramp-down rates of the ith unit, respectively, and UP_i and DP_i are the initial ramp-up and ramp-down rates of the ith unit, respectively. Equation (6) represents the load flow boundary constraint [29]: SF is the shifting factor matrix, KP is the unit correlation matrix, KD is the load correlation matrix, D_j,t is the size of the jth load at time period t, and PL_max is the upper boundary for power transmission on the transmission line.
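The full SCUC is a mixed-integer program; as a rough illustration of the dispatch step only, the following sketch solves a single-period economic dispatch with linear bid prices using SciPy. The binary commitment states, ramp limits, and shift-factor line constraints of Equations (2)-(7) are omitted, and all numbers are illustrative.

```python
from scipy.optimize import linprog

# Simplified single-period economic dispatch: minimize total bid cost
# subject to a demand-balance equality and per-unit capacity bounds.
bid_price = [22.0, 30.0, 45.0]       # $/MWh for units 1..3 (illustrative)
p_min = [10.0, 10.0, 10.0]           # MW lower limits (P_min)
p_max = [200.0, 150.0, 100.0]        # MW upper limits (P_max)
demand = 300.0                       # total load D_t

res = linprog(c=bid_price,
              A_eq=[[1.0, 1.0, 1.0]], b_eq=[demand],
              bounds=list(zip(p_min, p_max)),
              method="highs")
dispatch = res.x                     # cheapest units are loaded first
```

With these numbers the cheapest unit is dispatched to its 200 MW limit, the second covers 90 MW, and the most expensive unit stays at its 10 MW minimum.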

The LMP Calculation
The LMP is defined as the marginal cost of supplying the next increment of electrical energy at a specific bus while considering generation and transmission constraints. The electricity market cannot be uniformly cleared when there is congestion in the transmission system; instead, the market is cleared at the bus level, and the bus clearing price is called the LMP. Physically, the LMP is the cost of supplying the next MW of load at a specific location after considering the costs associated with generation, transmission, and losses [30]. That is, the LMP is the sum of the generation marginal cost, the transmission congestion cost, and the cost of supplying marginal losses, although the cost of losses is usually small. Equations (7) and (8) present a typical solution model of the LMP.
The LMP is determined from the solution of the optimal power flow in the SCUC. In Equation (7), C_i,t is the marginal generation cost of the ith unit at time period t, ΔL is a price-taking incremental bus load, λ is the Lagrange multiplier corresponding to the demand constraint, π^T and π^−T are the Lagrange multipliers corresponding to the power flow constraints, and μ_p^−T and μ_p^T are the Lagrange multipliers corresponding to the maximum and minimum generation constraints, respectively. According to the definition of the LMP, it can be calculated using Equation (8).
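Since the LMP is the cost of supplying the next MW of load, it can also be approximated numerically by re-solving the dispatch with a small load increment, without extracting the Lagrange multipliers. A minimal sketch under an illustrative three-unit, single-bus setup (no congestion or losses, so the LMP is uniform and equals the marginal unit's bid):

```python
from scipy.optimize import linprog

def dispatch_cost(demand, prices=(22.0, 30.0, 45.0),
                  p_min=(10.0, 10.0, 10.0), p_max=(200.0, 150.0, 100.0)):
    """Minimum total bid cost of serving `demand` MW (illustrative units)."""
    res = linprog(c=list(prices),
                  A_eq=[[1.0, 1.0, 1.0]], b_eq=[demand],
                  bounds=list(zip(p_min, p_max)),
                  method="highs")
    return res.fun

# Finite-difference LMP: extra cost of one more MW of load.
lmp = dispatch_cost(301.0) - dispatch_cost(300.0)   # marginal unit's price
```

In a networked model with congestion, the same perturbation applied at different buses yields different values, which is exactly the bus-level clearing described above.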

The Profit of Power Producers
After the SCUC scheduling optimization, the power producers can calculate their profits based on the bidding results.
In Equation (9), Pro_i,t is the profit of the ith unit at time t. This value is passed to the Q-learning model as its evaluation metric.

The Q-Learning-Based Modelling of Bidding Strategy Making
In the proposed framework, the bidding strategy-making module is a critical part of the entire simulation. This module approximates the market behavior of each power producer, representing its decision-making process based on historical bidding information. Here, a Q-learning-based model is selected as a typical decision-making process because Q-learning is a value-based reinforcement learning algorithm: it finds the optimal action strategy by analyzing historical bidding behavior and its effects. The profit and bidding coefficient of the power producer's nth bid are input into the agent model. The sub-state bidding coefficient that is more beneficial to the producer is determined through the agent's bidding strategy and used for the (n+1)th market bid. Figure 2 introduces the Q-learning-based model for the bidding strategy-making module of Figure 1.



An Overview of the Q-Learning Algorithm
Q-learning is a form of model-free reinforcement learning that works by learning an action-value function giving the expected return of taking a given action in a given state and following a policy thereafter. It provides agents with the capacity to learn to act optimally in Markovian domains by experiencing the consequences of their actions [31]. With the application of the optimal Bellman operator, the state-action value can be obtained through the value iteration in Equation (10):

Q(s, a) ← Q(s, a) + α [R + γ max_a′ Q(s′, a′) − Q(s, a)]. (10)
Q-learning comprises the learning agent, environment, states, actions, and rewards. In Equation (10), to implement Q-learning, consider s = [s_1, s_2, s_3, ..., s_n] as the set of states of the learning agent, a = [a_1, a_2, a_3, ..., a_m] as the set of actions that the learning agent can execute, R as the reward or punishment resulting from executing action a in state s, and α as the learning rate, which is typically set between zero and one. If α is close to zero, previously learned knowledge is weighted more heavily, whereas if it is close to one, newly acquired information becomes more relevant. In other words, setting α to zero prevents the Q-table from being updated and therefore prevents any learning, while setting it to a high value, such as 0.9, enables rapid learning. γ denotes the discount factor, also between zero and one, which indicates the extent to which the agent's decision-making is influenced by future reward expectations: when γ is close to zero, only the current reward is considered, and as γ approaches one, the future reward is given more weight than the immediate reward. Q(s, a) denotes the total cumulative reward gained by the learning agent, and max_a′ Q(s′, a′) is the maximum Q value over all possible actions a′ in the next state s′. Using Equation (10), an updated Q-table, shown in Figure 2, is produced.
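The Bellman backup of Equation (10) can be sketched in a few lines. The dict-of-dicts storage for the Q-table is an implementation choice of this sketch; the default alpha and gamma match the parameter values reported later in the paper.

```python
def q_update(Q, s, a, reward, s_next, alpha=0.9, gamma=0.1):
    """One Bellman backup as in Equation (10):
    Q(s,a) <- Q(s,a) + alpha * (R + gamma * max_a' Q(s',a') - Q(s,a)).
    Q maps state -> {action: value}."""
    best_next = max(Q[s_next].values())
    Q[s][a] += alpha * (reward + gamma * best_next - Q[s][a])
    return Q[s][a]

# Example: a reward of 10 in state "s0" backs up part of the best
# value reachable from the next state "s1".
Q = {"s0": {"up": 0.0, "hold": 0.0}, "s1": {"up": 1.0, "hold": 2.0}}
q_update(Q, "s0", "up", 10.0, "s1")
```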

Action Selection
Arbitrary states of Q-learning are uniquely determined by the bidding coefficient and the time (s_i,t = (α_i,t, t)). The agent acts before the next bid in the market; therefore, the selection strategy is based on historical transaction information, including the data corresponding to the current bidding coefficient and time.
The action taken by the agent is one of three major choices: the agent executes an action at a quotation below, equal to, or higher than the previous transaction price in the market environment. In this study, the agents learn an action-value function that provides the expected bidding price by selecting an action with an ε-greedy policy in a given state. Figure 3 shows the concept of the ε-greedy policy. ε is the exploration rate of Q-learning, generally set between 0 and 1. Every time an agent chooses an action, a random number, rand, is generated and compared with ε. When rand < ε, the agent selects the action with the largest Q value in the corresponding state; otherwise, it randomly selects an action. This ensures that Q-learning retains the ability to break through when it encounters a soft boundary.
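A sketch of this selection rule follows. Note that the paper's convention inverts the textbook ε-greedy policy: here rand < ε triggers the greedy (exploiting) choice, so with the ε = 0.95 used later the agent is greedy 95% of the time.

```python
import random

def select_action(Q, s, eps=0.95):
    """Epsilon-greedy selection with the paper's convention:
    draw rand ~ U(0,1); if rand < eps, exploit (largest Q value in
    state s); otherwise explore with a uniformly random action.
    Q maps state -> {action: value}."""
    actions = list(Q[s].keys())
    if random.random() < eps:
        return max(actions, key=lambda a: Q[s][a])
    return random.choice(actions)

# With eps=1.0 the choice is always the greedy action.
Q = {0: {0: 1.0, 1: 5.0, 2: 3.0}}
greedy = select_action(Q, 0, eps=1.0)
```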


Reward Calculation and Update Q Value
After the agent determines an action, it transitions from the current state to another state, so the sub-state bidding coefficient can be obtained from the current bidding coefficient.
In Equation (11), α′_i,t is the sub-state bidding coefficient, A_m is the value obtained by performing the mth action, and X is the bidding price boundary of the power producer.
The update of the bidding coefficient is accompanied by an update of the power producer's quotation curve. The updated quotation curve is then sent to the electricity market. After the market is cleared, the power producer obtains a set of time-series profit values as feedback. Therefore, the reward from the environment is R_i,t = Pro_i,t, and the Q value in any state is updated using the Bellman equation in Equation (12) [32]; based on this formula, the Q-table update for the NT periods can be completed:

Q(s_i,t, a) ← Q(s_i,t, a) + α [R_i,t + γ max_a′ Q(s′_i,t, a′) − Q(s_i,t, a)]. (12)

The Simulation Execution Process
A full detailed procedure for simulating the Q-learning agents in the proposed bidding simulation framework is as follows:
Initialize the Q value Q(s, a) for all state-action pairs.
Repeat (for each episode):
  For each time step t:
    Given state s_i,t, take action a based on the ε-greedy policy.
    Obtain reward R_i,t and reach the new state s′_i,t.
    Update Q(s, a) using Equation (12).
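The steps above can be sketched as a single-agent training loop. Several simplifications are assumptions of this sketch, not the paper's design: the state is reduced to the time index (the paper's state also includes the bidding coefficient), the action values and coefficient bounds are illustrative, and market clearing is stubbed out; in the full framework the reward would come from the SCUC dispatch and LMP calculation described earlier.

```python
import random

ACTIONS = [-0.05, 0.0, 0.05]        # decrease / maintain / increase alpha
LR, GAMMA, EPS = 0.9, 0.1, 0.95     # learning rate, discount, exploration
NT = 24                             # time periods per day

Q = {(t, i): 0.0 for t in range(NT) for i in range(len(ACTIONS))}

def market_reward(t, coeff):
    """Placeholder profit signal standing in for Pro_i,t."""
    return -abs(coeff - 1.2) + 0.01 * t

coeff = 1.0
for episode in range(200):
    for t in range(NT):
        # Paper's epsilon-greedy convention: rand < eps -> exploit.
        if random.random() < EPS:
            a = max(range(len(ACTIONS)), key=lambda i: Q[(t, i)])
        else:
            a = random.randrange(len(ACTIONS))
        coeff = min(max(coeff + ACTIONS[a], 0.5), 2.0)  # assumed boundary X
        r = market_reward(t, coeff)
        t_next = min(t + 1, NT - 1)
        best_next = max(Q[(t_next, i)] for i in range(len(ACTIONS)))
        # Equation (12)-style Bellman backup.
        Q[(t, a)] += LR * (r + GAMMA * best_next - Q[(t, a)])
```

In the full multi-agent simulation, every producer runs this loop concurrently, and the rewards couple the agents through the shared market clearing.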



Background
This paper selects a typical 9-bus system and a 33-bus system to verify the feasibility of the proposed scheme. Figure 4 shows the structure of both systems. Each system contains two scenarios: one with a monopoly and one without. The monopoly verification is rechecked with traditional methods in Table 1, where Gen. No. 2 stands for the No. 2 power producer. A generator of a system holds a monopoly when its result value is less than one for the TCRSI and greater than zero for the MRG. Table 1 lists the monopoly power producers in the 9-bus and 33-bus systems, together with the MRG and TCRSI values of the power producers in the two scenarios. The details of both systems are introduced in Appendix A. Table 2 introduces the Q-learning-related parameters. Three actions represent the agent's possible strategies: decreasing, maintaining, and increasing the value of the bidding coefficient. The exploration rate is set to 0.95. Following the literature, the discount factor (γ) is set to 0.1 and the learning rate (α) to 0.9 [33].
In this section, two scenarios are selected for the case simulation analysis. In Scenario 1, there is a certain time at which the combined maximum generation capacity of two of the power producers cannot meet the load demand. In Scenario 2, the maximum generation capacity of any two power producers can meet the load demand at any time. The SCUC problem is formulated using YALMIP in MATLAB and solved using GUROBI 9.5.1. The agents in each scenario of the 9-bus system and the 33-bus system executed 150,000 and 67,000 bidding sets, respectively, and output simulation results every 24 h.

The Result Analysis of Both Systems
Figure 5 shows the comparison of the results of Scenarios 1 and 2 for the 9-bus system, and Figure 6 shows the same comparison for the 33-bus system. The following features can be observed in these two comparisons.

1. Power producers with low generation costs have the greatest advantage during the early stages of the simulation. In an environment of rising quotations, their profits increase rapidly, and they can occupy most of the market share over the long-term simulation. In the 9-bus example in Figure 5a-d, power producer 1 occupied a large market share over the long-term simulation, accounting for more than half of the total share; its profit grew faster than that of the other power producers in the initial stage and reached a higher level. In the 33-bus example in Figure 6a-d, power producers 1 and 6 had lower costs than the others, so their profits grew rapidly at the beginning of the simulation, and each occupied roughly one-third of the total share over the whole simulation.

2. Power producers with low generation costs must find suitable bidding strategies through long-term games with the other power producers to bring profits to their companies. As shown in Figures 5c,d and 6c,d, low-cost power producers, such as producer 1 in the 9-bus system and producers 1 and 6 in the 33-bus system, show large fluctuations in profit over the long-term market simulation. If they cannot find an appropriate bidding strategy, they may win little or no bid volume, leading to lower profits. It is therefore very important to find an appropriate bidding strategy.

3.
The agent always chooses the strategy that increases its payoff. In the Scenario 1 simulation, where a structural monopoly exists and the grid must purchase power from a fixed seller to maintain supply-demand balance, the agent keeps raising its quotation during the simulation and its profit keeps growing. Other sellers in the market then raise their quotations as well, the sellers' incomes fail to converge, the LMP rises continuously, and the market collapses. The owner of the must-buy capacity thus keeps raising its price, or its claimed marginal cost, to a very high level, and producers may bid maliciously to drive prices up. As shown in Figures 5e,f and 6e,f, the average electricity price in Scenario 1 is much higher than that in Scenario 2 and shows a continuous upward trend.

4.
In the Scenario 2 simulations, any generation section of any power producer can be replaced by another producer. Before the optimal bidding strategy is found, the electricity price rises: as shown in Figures 5f and 6f, the average price climbs at the beginning of the simulation, by approximately one-sixth of the initial price in the 9-bus system and approximately one-fifth in the 33-bus system. Through strategy selection among the agents, however, the parties quickly converge, the market reaches a balance, and it becomes a highly competitive electricity market in which each producer finds its optimal quotation strategy.
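The contrast between observations 3 and 4 suggests a simple operational test: in the monopoly-prone scenario the average-price series never settles, while in the competitive scenario it flattens out. A hedged sketch of such a convergence check follows; the window size and tolerance are arbitrary illustrative choices, not values from the paper.

```python
def prices_converge(prices, window=20, tol=0.01):
    """Heuristic convergence check on a simulated average-price series.

    Returns True when the relative change between the means of the last two
    windows falls below `tol`; a continuously rising (monopoly-like) series
    never satisfies this, while a series that levels off does.
    """
    if len(prices) < 2 * window:
        return False
    recent = sum(prices[-window:]) / window
    earlier = sum(prices[-2 * window:-window]) / window
    return abs(recent - earlier) / earlier < tol

# Illustrative series (invented numbers, not simulation output):
rising = [100 + 2 * t for t in range(60)]                     # keeps climbing
flat = [100 + 2 * t for t in range(20)] + [140] * 40          # levels off
```

A market monitor running the proposed framework could apply a test of this kind to the simulated LMP series to flag scenarios whose prices fail to converge.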

5.
The rate of successfully winning generation capacity is higher in the monopoly case than in the case without monopoly. This can be seen from the degree of fluctuation in Figures 5a,b and 6a,b: higher fluctuation in market share indicates that even competitors with little market power still obtain chances to win larger generation capacity, which represents a fairer power-market environment. In other words, competition in the case without monopoly is much fiercer.

6.
Price fluctuation is smaller in the case with monopoly than in the case without monopoly. As shown in Figures 5e,f and 6e,f, after a certain period of simulation, the average price in the case without monopoly fluctuates sharply within a range owing to fierce competition among the power producers. The fiercer competition in the case without monopoly brings more risk to the winning capacity, so power producers must lower their prices frequently when their winning capacity decreases.
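Observations 5 and 6 both rest on comparing fluctuation levels between the two cases. One simple way to quantify this, sketched here with invented numbers rather than the paper's data, is the standard deviation of a series after a burn-in period:

```python
from statistics import pstdev

def fluctuation(series, burn_in=0):
    """Fluctuation of a simulated series (market share or average LMP),
    measured as the population standard deviation after a burn-in period."""
    return pstdev(series[burn_in:])

# Illustrative market-share series (assumed values, not simulation output):
competitive = [0.4, 0.6, 0.3, 0.7, 0.5, 0.2, 0.8]     # shares swing widely
monopoly = [0.62, 0.60, 0.61, 0.63, 0.62, 0.61, 0.60]  # one producer dominates
```

Applied to the simulated series behind Figures 5 and 6, a metric of this kind would make the "degree of fluctuation" comparison quantitative rather than visual.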
Energies 2023, 16, 434 13 of 19


Result of Hidden Monopoly Recognition and Effect of Price Boundary
To verify the advantages of the proposed scheme over traditional methods, another case, on a 6-bus system, is selected for simulation. Figure 7 shows this 6-bus system, and its data are given in Appendix B. For this system, traditional 'Must-Buy Section'-based models, such as TCRSI and MRG, recognize no monopoly. However, the simulation result in Figure 8a shows that monopoly behavior exists: bidding prices rise continuously. The reason for this interesting phenomenon is that the two power plants naturally form a grouping monopoly. The bid-winning rule of the power market in the simulation behaves like a search for a minimum-total-cost solution subject to constraints, so the winning capacity of each generator depends on the relative relationship among the plants' prices rather than on their absolute values. For example, if the two generation plants in the 6-bus system both raise their prices by 10 USD/kWh simultaneously, the winning capacity of each plant does not change. When both plants raise their bidding prices simultaneously, they therefore observe unchanged winning capacities, and this signal encourages them to keep raising their bids, which performs like monopoly behavior. Although the two plants do not communicate with each other, they naturally act as a group with simultaneous behavior and form a kind of grouping monopoly. This phenomenon will seldom occur when there are many competitors, because behavioral randomness makes simultaneous behavior unlikely; however, in cases with small generation groups, such as the case here, the randomness is small and the probability of a grouping monopoly increases. Moreover, this situation cannot be recognized by traditional 'Must-Buy Section' methods. To prevent this phenomenon, one possible remedy is to impose a price boundary as a constraint that limits the monopoly behavior. The proposed framework can easily simulate the effect of such a price boundary. Figure 8b shows the result: once the restriction is imposed, the bidding coefficients of power producers 1 and 2 can no longer rise indefinitely; the bidding coefficient of power producer 1 stabilizes at approximately 1.4, and that of power producer 2 at approximately 1.2.
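The grouping-monopoly mechanism hinges on clearing depending only on relative prices. This can be illustrated with a much-simplified merit-order clearing rule; it is a stand-in for the constrained minimum-total-cost dispatch described in the text (it ignores network constraints), and the names and numbers are invented.

```python
def clear_market(bids, demand):
    """Dispatch the cheapest bids first until demand is met.

    Preserves the key property discussed in the text: winning capacities
    depend only on the relative order of prices, not their absolute values.
    """
    dispatch, remaining = {}, demand
    for name, price, capacity in sorted(bids, key=lambda b: b[1]):
        take = min(capacity, remaining)
        dispatch[name] = take
        remaining -= take
    return dispatch

bids = [("Plant1", 30.0, 80.0), ("Plant2", 40.0, 80.0)]
shifted = [(n, p + 10.0, c) for n, p, c in bids]  # both plants raise bids by 10 USD/kWh

# Both plants raising prices together leaves every winning capacity unchanged,
# which is exactly the signal that sustains the grouping monopoly:
assert clear_market(bids, 100.0) == clear_market(shifted, 100.0)
```

A price boundary breaks this incentive simply by making bids above the cap inadmissible, which is consistent with the bidding coefficients in Figure 8b stabilizing instead of rising indefinitely.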

Conclusions
This study proposes an agent-based bidding simulation framework to recognize monopolies in power markets. With the proposed framework, the evolving bidding behavior of electricity producers and the resulting changes in the LMP of a long-term electricity market can be simulated to determine whether the market is operating healthily. The results of the numerical study show that in a power market with monopoly potential, the profits of the power producers do not converge and the market price becomes unacceptable, whereas in a power market without monopoly potential, the power producers remain in competition and the market stays active and healthy. For the structural design of electricity markets, the proposed framework can thus serve as a market-monitoring tool for assessing the health of market operations.
Compared to traditional monopoly recognition methods, the proposed method has three main advantages. First, the framework reveals the details of the market operation's evolution quantitatively: it not only detects the existence of a monopoly but also shows the entire course of the monopoly process under a given behavior model. Second, the framework can easily simulate different bidding-strategy behaviors by modifying the behavior module alone, without changing the other modules, and different market environments by changing only the power market scheme module; this modularity allows the framework to be used widely in power market analyses at low modelling and programming cost. Third, the framework can reveal more monopoly cases than traditional 'Must-Buy Section'-based methods.
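The modularity claimed as the second advantage can be pictured with a hypothetical skeleton; all class and function names here are ours, not the paper's, and each piece is deliberately trivial so that swapping one module leaves the others untouched.

```python
class FixedBidder:
    """A trivially simple bidding-behavior module; a Q-learning bidder could
    be swapped in here without touching the rest of the simulator."""

    def __init__(self, name, price, capacity):
        self.name, self.price, self.capacity = name, price, capacity

    def bid(self):
        return self.price

def merit_order(bids, demand):
    """A minimal power market scheme module: cheapest-first clearing."""
    dispatch, remaining = {}, demand
    for name, price, capacity in sorted(bids, key=lambda b: b[1]):
        dispatch[name] = min(capacity, remaining)
        remaining -= dispatch[name]
    return dispatch

class MarketSimulator:
    """Couples a set of behavior modules to one market scheme module."""

    def __init__(self, agents, clearing_fn):
        self.agents, self.clearing_fn = agents, clearing_fn

    def step(self, demand):
        bids = [(a.name, a.bid(), a.capacity) for a in self.agents]
        return self.clearing_fn(bids, demand)

sim = MarketSimulator([FixedBidder("G1", 30.0, 50.0), FixedBidder("G2", 25.0, 50.0)], merit_order)
```

Replacing `merit_order` with an LMP-based clearing function, or `FixedBidder` with a learning agent, changes one constructor argument and nothing else, which is the low-cost reuse the text describes.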
This method also has limitations. The first is that the full simulation costs more computational time and resources than traditional analytical methods. Additionally, the computational cost increases significantly when large-scale systems are simulated.


Figure 2 .
Figure 2. The typical Q-Learning-based model for the bidding strategy making module from Figure 1.


Figure 3.
Figure 3. The introduction of the ε-greedy policy.

Figure 4.
Figure 4. The 9-bus system and the 33-bus system used in the numerical study (B stands for bus, L for load, and G for generator).

Figure 5.
Figure 5. (a) The market share of all power producers in Scenario 1. (b) The market share of all power producers in Scenario 2. (c) The profits of all power producers in Scenario 1. (d) The profits of all power producers in Scenario 2. (e) The average LMP of all buses in Scenario 1. (f) The average LMP of all buses in Scenario 2.
Figure 6.

Figure 7.
Figure 7. The 6-bus system used in the numerical study (B stands for bus, L stands for load, and G stands for generator).


Figure 8.
Figure 8. (a) The bidding coefficients of each power producer with no price boundary. (b) The bidding coefficients of each power producer with a price boundary.
Author Contributions: Y.H.: Supervision, Methodology, Investigation, and Software. S.G.: Conceptualization, Software, and Formal Analysis. Y.W.: Software and Writing-Review and Editing. Y.Z.: Software and Validation. W.Z.: Validation and Writing-Original Draft Preparation. F.X.: Supervision, Project Administration, and Mathematical Modeling. C.S.L.: Code Debugging, Visualization, and Formal Analysis. A.F.Z.: Visualization and Data Curation. All authors have read and agreed to the published version of the manuscript.


Table 1 .
Introduction of the scenarios for both systems.

Table A4 .
Data of the base load in the 33-bus system.

Table A5 .
Bidding parameters of power producers in the 33-bus system.