Research on Multi-Agent Event-Triggered Control Algorithms for Power Systems

Chen, Yanming; Sun, Qiming; Zhang, Ying; Li, Chengxuan

doi:10.3390/app16115354


¹ College of Information Science and Technology, Nanjing Forestry University, Nanjing 210037, China

² Henan Key Laboratory of Cable Advanced Materials and Intelligent Manufacturing, Xinxiang 453003, China

* Author to whom correspondence should be addressed.

Appl. Sci. 2026, 16(11), 5354; https://doi.org/10.3390/app16115354

Submission received: 8 March 2026 / Revised: 20 May 2026 / Accepted: 22 May 2026 / Published: 27 May 2026


Abstract

Multi-agent systems are widely used in modern power systems, but they face challenges such as low data utilization, stringent triggering conditions, and poor environmental adaptability. This study proposes a multi-agent event-triggered control method based on the Proximal Policy Optimization (PPO) policy gradient algorithm. By maximizing the cumulative reward, the agents are driven to learn adaptive triggering strategies, which reduces communication frequency while ensuring system stability. A multi-agent reinforcement learning model is constructed, and the training results show that both the single-episode reward and the average reward significantly increase with the number of training episodes, thus verifying the effectiveness of the algorithm. Based on Lyapunov stability and LaSalle’s invariance principle, an event-triggering threshold is designed using an exponential decay function. Moreover, the sequential decision-making process under uncertain environments is described using the Markov decision process. In the case study with six agents, the triggering conditions effectively constrain the error growth and ensure system stability. The method is further extended to a 33-node power system, where each node is regarded as an agent to simulate voltage fluctuations under load variations. Compared with periodic sampling control, the event-triggered control exhibits faster convergence speed, higher steady-state accuracy, and stronger anti-interference capability, thus confirming its superiority in complex power systems.

Keywords:

multi-agent systems; event-triggered control; PPO policy gradient; Markov decision process

1. Introduction

The power system is the core infrastructure supporting national economic and social development, and its transformation toward clean, low-carbon, safe, and efficient development has become a global strategic consensus under the “carbon peak and carbon neutrality” goals [1]. Driven by this transformation, renewable energy sources such as wind and photovoltaic (PV) power have been rapidly integrated into the grid, with China’s new energy installed capacity accounting for over 40% of the total installed capacity and power generation exceeding 15% as of 2023 [2]. The high penetration of renewable energy has fundamentally changed the operational characteristics of traditional centralized power systems, shifting them toward a distributed structure with the coordinated operation of source, grid, load, and storage. However, the inherent intermittency, volatility, and randomness of renewable energy pose severe challenges to the stable operation, voltage regulation, and dynamic response of power systems [3]. The spatial mismatch between renewable energy bases and load centers, coupled with the limitations of traditional centralized control methods in communication efficiency and dynamic adaptability, makes it urgent to develop efficient distributed intelligent control methods to address these practical engineering problems.

The overall structure and operational framework of the new power system are illustrated in Figure 1. Multi-agent systems (MAS) have emerged as a promising solution for the control of new power systems due to their distributed collaboration, autonomous decision-making, and flexible dynamic response capabilities [4]. By modeling each distributed energy unit, load node, or microgrid as an independent agent, MAS can achieve real-time coordination and optimal control of the power system through local information interaction, effectively adapting to the distributed characteristics of high renewable energy penetration power grids. Event-triggered control (ETC), as a resource-efficient control strategy, only initiates control updates and information communication when the system state deviates beyond a preset threshold, which significantly reduces communication overhead and computational consumption compared with traditional periodic sampling control [5]. The combination of MAS and ETC leverages the spatial coordination of multi-agents and the temporal efficiency of event-triggering, forming a distributed control framework that is well-suited for the complex and dynamic operational requirements of new power systems [6]. This multi-agent event-triggered control framework has become a research hotspot in the field of power system intelligent control, with broad application prospects in solving problems such as renewable energy grid integration, voltage fluctuation suppression, and power balance regulation.

Despite the promising application potential of multi-agent event-triggered control in power systems, existing research still faces several critical limitations that restrict its practical engineering deployment [7,8]. First, most existing event-triggered control strategies rely on accurate mathematical models of the power system, leading to poor adaptability in complex and uncertain operating environments with high renewable energy penetration. Second, the design of triggering conditions is relatively rigid, with fixed or simply adjusted thresholds that fail to dynamically adapt to the real-time operational state of the power system, resulting in low data utilization efficiency and difficulty in balancing control performance and communication resource consumption. Third, the integration of reinforcement learning and multi-agent event-triggered control for power systems is still in the preliminary stage; existing studies lack effective policy optimization algorithms to guide agents in learning adaptive triggering strategies, and the stability and convergence of the control system under uncertain environments need further theoretical and experimental verification. In addition, current research on multi-agent event-triggered control mostly focuses on small-scale simulation scenarios, and its applicability and superiority in large-scale actual power systems (e.g., node-level distribution networks) remain to be further validated [9]. These research gaps constitute the core problems to be solved in this study.

To address the above-mentioned research gaps, this paper proposes a multi-agent event-triggered control algorithm for power systems based on the Proximal Policy Optimization (PPO) policy gradient algorithm, aiming to improve the adaptability, stability, and resource efficiency of distributed control in high renewable energy penetration power systems. The core research questions addressed in this study are as follows: (1) How do we design a data-driven adaptive event-triggering strategy that breaks the dependence on accurate system models and enables dynamic adjustment of triggering thresholds based on the real-time operational state of the power system? (2) How do we construct a multi-agent reinforcement learning framework based on the PPO algorithm to drive agents to learn optimal collaborative triggering strategies, achieving a balance between control performance and communication overhead? (3) Can the proposed algorithm effectively suppress voltage fluctuations, accelerate system convergence, and enhance anti-interference capability in large-scale power system scenarios, and what is its performance superiority compared with traditional periodic sampling control?

To answer these questions, this study first constructs a multi-agent reinforcement learning model for power systems, using the Markov decision process (MDP) to describe the sequential decision-making process of agents under uncertain environments. Based on Lyapunov stability and LaSalle’s invariance principle, an exponentially decaying dynamic event-triggering threshold is designed to constrain the error growth of the system and ensure stable operation. The PPO policy gradient algorithm is introduced to optimize the joint policy of multi-agents, driving agents to maximize cumulative rewards and learn adaptive event-triggering strategies, thus reducing communication frequency while maintaining control performance. The effectiveness of the proposed algorithm is first verified through a six-agent basic simulation scenario, where the triggering conditions and system convergence characteristics are analyzed in depth. The algorithm is further extended to a 33-node power system to simulate voltage fluctuation control under load variations, with a comprehensive quantitative and qualitative performance comparison with traditional periodic sampling control in terms of convergence speed, steady-state accuracy, and anti-interference capability.

The main original contributions of this paper are summarized as follows: (1) A data-driven adaptive event-triggered control strategy is proposed, which abandons the dependence on accurate power system models and designs a dynamically adjustable triggering threshold based on an exponential decay function, effectively improving the system’s adaptability to uncertain environments and the utilization efficiency of real-time operational data. (2) A multi-agent reinforcement learning framework based on the PPO algorithm is constructed for power system control, which takes consistent performance and communication cost as the joint reward function, guides multi-agents to learn collaborative optimal triggering strategies, and achieves a trade-off between control accuracy and communication resource consumption with guaranteed system stability. (3) The proposed algorithm is validated in both small-scale multi-agent scenarios and a large-scale 33-node power system, and the results show that it has faster convergence speed, higher steady-state accuracy, and stronger anti-interference capability compared with traditional periodic sampling control, providing a feasible technical solution for the distributed intelligent control of new large-scale power systems with high renewable energy penetration.

The remainder of this paper is organized as follows: Section 2 constructs a multi-agent system model for power systems, elaborates on the power generation characteristics of wind, PV, and energy storage systems, and builds a multi-agent reinforcement learning model based on the PPO algorithm. Section 3 designs a multi-agent event-triggered control framework, including the MDP modeling of sequential decision-making, the design of PPO-based event-triggering conditions, and the detailed implementation process of the algorithm. Section 4 conducts case studies on six-agent and 33-node power system scenarios, analyzes the control performance of the proposed algorithm from multiple dimensions, and carries out a rigorous comparative analysis with periodic sampling control while discussing the algorithm’s limitations and applicable assumptions. Finally, Section 5 summarizes the main research results of the paper, points out the shortcomings of the study, and outlines future research directions to further improve the algorithm’s engineering applicability.

2. Construction of Multi-Agent System

2.1. Basic Structure of Multi-Agent Systems

The coordinated operation of multiple microgrids and distribution systems is gradually breaking through the underlying logic of traditional power system dispatch. Multi-agent systems endow each distributed unit with a certain degree of decision-making autonomy, forming a more optimized collaborative effect at the system level. The unique aspect of this architecture is that it does not aim for precise control of each node. Instead, it relies on establishing game rules between agents to drive the system towards an optimal state in dynamic equilibrium. Traditional power system control methods emphasize comprehensive control of global information. However, multi-agent systems demonstrate that appropriate interaction mechanisms can produce superior system responses with limited information sharing. Distributed intelligent algorithms enhance the efficiency of dispatch operations. Local faults do not trigger cascading failures, significantly improving the overall stability of the power system during dynamic adjustments. This approach allows the system to transform volatility into adaptive regulation.

MAS can effectively coordinate the grid integration and operation of distributed energy sources, such as photovoltaic and wind power, and energy storage. Each energy unit acts as an independent agent, achieving power balance and voltage regulation through local information interaction, thus addressing the uncertainty brought by high penetration of renewable energy. For example, in a microgrid, MAS can enhance system flexibility. By modeling load devices as agents, MAS can aggregate a large number of flexible load resources. Each load agent autonomously adjusts its electricity consumption strategy based on price signals and local constraints, achieving peak shaving and valley filling while ensuring user comfort. Practical applications show that this distributed control method is more scalable and robust than traditional centralized scheduling. When a system failure occurs, MAS can quickly identify the fault area and independently formulate a recovery strategy through multi-agent collaboration. Protection device agents, based on local information interaction, achieve adaptive protection coordination, reducing outage time. This distributed self-healing capability is crucial in distribution networks with distributed power sources.

2.2. Multi-Agent Reinforcement Learning-Based Model

Multi-Agent Reinforcement Learning (MARL) is a key branch of reinforcement learning that focuses on multiple autonomous decision-making agents learning optimal policies within a shared environment. In such settings, each agent must consider the state of the environment and how its own actions can maximize its long-term rewards, while also accounting for the presence and behavior of other agents that may influence its strategy [9]. This model simulates and studies the cooperative reinforcement learning process among multiple agents. Building such a model offers the advantage of intuitively presenting the dynamics of multi-agent systems, facilitating the understanding and debugging of complex interactive behaviors.

2.2.1. Multi-Agent Decision-Making Process

In multi-agent reinforcement learning, each agent learns an optimal policy through interactions with the environment and other agents, aiming to maximize collective benefit. The decision-making process of multiple agents is modeled using a Markov model, and the Markov decision process for N agents is represented in tuple form. By defining a joint reward function and a Markovian state-transition probability function, data observation for each agent can be achieved.

Assuming that each agent has $m$ actions to choose from under real-time monitoring, Equation (1) represents the action space of the $i$-th agent, and Equation (2) denotes the joint action of all $N$ agents.

a_{i} \in \{a_{i}^{1}, \dots, a_{i}^{m}\}

(1)

a = (a_{1}, \dots, a_{N})

(2)

Designing a joint reward function is the core mechanism for solving cooperation and competition problems in multi-agent systems. If the system relies solely on individual reward strategies, it is prone to fall into the trap of maximizing local benefits while deviating from the overall goal. By incorporating a joint reward, the global optimal objective is embedded into the reward function, which improves system stability and forces agents to focus on the collective interest. The joint reward function is established as shown in Equation (3), where $P(\cdot | \hat{s}, a)$ represents the Markov state transition probability function, and $γ$ denotes the discount factor. To monitor and record the operating states of each agent in real time, the historical observation-action data of the $i$-th agent is denoted as $τ_{i} \in T \equiv (O \times A)^{*}$.

R : S \times A \to r_{i}^{t}

(3)

A stochastic policy is introduced by defining an action probability distribution (e.g., a Boltzmann distribution) that models the likelihood of an agent selecting a particular action. This allows agents to explore the environment in a non-deterministic manner and discover potentially high-reward paths. The action probability distribution provides a differentiable optimization objective for policy-gradient methods, as shown in Equation (4). Policy parameters θ are updated via gradient ascent, as expressed in Equation (5).

π (a | s) = P (a_{t} = a | s_{t} = s)

(4)

\nabla_{θ} J (θ) = E [\nabla_{θ} \log π_{θ} (a | s) Q (s, a)]

(5)
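To make Equations (4) and (5) concrete, the following Python sketch evaluates a Boltzmann (softmax) policy over a discrete action set and takes one gradient-ascent step from a single sampled action. It is only an illustration: the tabular preference vector `theta`, the function names, the learning rate, and the sample value of Q are assumptions that do not come from the model described in this paper.

```python
import math

def softmax_policy(theta):
    """Boltzmann (softmax) distribution over actions for a single state."""
    m = max(theta)                       # subtract the max for numerical stability
    exps = [math.exp(t - m) for t in theta]
    z = sum(exps)
    return [e / z for e in exps]

def grad_log_pi(theta, action):
    """Gradient of log pi_theta(action) with respect to theta (softmax case)."""
    pi = softmax_policy(theta)
    return [(1.0 if k == action else 0.0) - pi[k] for k in range(len(theta))]

def policy_gradient_step(theta, action, q_value, lr=0.1):
    """One ascent step along the sample grad log pi * Q, cf. Equation (5)."""
    g = grad_log_pi(theta, action)
    return [t + lr * q_value * gk for t, gk in zip(theta, g)]

theta = [0.0, 0.0]                       # two actions, initially equiprobable
theta = policy_gradient_step(theta, action=1, q_value=2.0)
print(softmax_policy(theta))             # action 1 is now more likely than action 0
```

Because the sampled action had a positive Q value, the update raises its probability; a negative Q value would lower it.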

Under the risk-aware decision-making framework of multi-agent reinforcement learning, the system still learns a joint policy with the core objective of maximizing returns, but the policy-optimization process must incorporate risk factors as key constraints. Let $π = (π_{i}, π_{-i})$ denote the joint policy, where $π_{i}$ is the policy of the current agent and $π_{-i}$ is the set of policies of all agents except the current agent. The search is driven mainly by two types of objective functions: one is a joint value function based on cooperative benefits, as shown in Equation (6); the other focuses on policy optimization under individual risk constraints, expressed as a joint action-value function in Equation (7).

V (π, \hat{s}) = E (\sum_{t = 0}^{\infty} γ^{t} r^{t} | (π_{i}, π_{- i}), \hat{s})

(6)

Q (π, \hat{s}, a) = R (\hat{s}, a) + γ E_{{\hat{s}}^{'}} (V (π, {\hat{s}}^{'}))

(7)

2.2.2. Building a Simulink Model Based on Multi-Agent Systems

A multi-agent simulation model is built in Simulink (MATLAB R2018b), as shown in Figure 2. The core architecture of this simulation model consists of AgentA, AgentB, and the environment module MyEnv, achieving multi-agent collaborative simulation through a modular design. Each agent module integrates a policy decision unit. Its input ports U and Y receive environmental states and external signals, while the output ports observation, action, reward, and isDone represent state observation, action output, immediate reward, and termination condition, respectively. The environment module MyEnv, serving as the core of the dynamic system, triggers a state transition mechanism upon receiving the agents’ actions. It calculates the next moment’s system state based on physical equations or data-driven models and feeds back the updated environmental parameters and reward signals. The interaction module achieves information coupling among agents through a signal bus. On the basis of supporting observation-based indirect collaboration and sharing local state information, it can also build explicit communication channels to transmit policy parameters.

In this Simulink model, two agent modules, AgentA and AgentB, are configured. They adopt a distributed decision-making mechanism in which each agent independently executes its own policy network, configured as a stochastic policy that samples actions from the probability distribution output by the PPO algorithm. The generated actions are transmitted to the environment module via the bus.

2.2.3. Multi-Agent Reinforcement Learning Based on the PPO Algorithm

PPO, or Proximal Policy Optimization, is a policy-based reinforcement learning algorithm. It aims to improve training stability and efficiency by constraining the magnitude of policy updates [10]. Its main objective is to optimize expected returns, and its core idea is to use a clipping method to limit the step size of policy gradient updates, thereby deriving update rules for various reinforcement learning parameters. PPO performs mini-batch updates over multiple training steps, addressing the step-size selection problem present in traditional policy gradient methods [11]. Moreover, by restricting the policy update size, PPO enhances training stability [12]. Written as a loss to be minimized, the PPO objective can be expressed as Equation (8): the clipped surrogate term carries a negative sign because it is maximized, the value-function error is minimized, and the policy entropy is encouraged.

L_{total} = - E_{t} [\min (r_{t} (θ) A_{t}, \mathrm{clip} (r_{t} (θ), 1 - ε, 1 + ε) A_{t})] + c_{1} \cdot E_{t} [{(V (s_{t}) - R_{t})}^{2}] + c_{2} \cdot E_{t} [\sum_{a} π (a | s_{t}) \log π (a | s_{t})]

(8)

Here, $r_{t}(θ)$ is the probability ratio defined in Equation (9), that is, the ratio of the probabilities of choosing the same action in the same state under the new and old policies, and $A_{t}$ is the advantage function defined in Equation (10). The clipping operation $\mathrm{clip}(r_{t}(θ), 1 - ε, 1 + ε)$ limits $r_{t}(θ)$ to the range $[1 - ε, 1 + ε]$; to prevent abrupt policy changes, $ε$ is usually set to 0.2. The clipped term in Equation (8) prevents the policy update from becoming too large while still allowing the agent to be updated continuously. $c_{1}$ and $c_{2}$ are the weight coefficients of the value-function loss and the entropy term, respectively.

r_{t} (θ) = \frac{π_{θ} (a_{t} | s_{t})}{π_{θ_{old}} (a_{t} | s_{t})}

(9)

In the equation, $π_{θ}(a_{t} | s_{t})$ denotes the probability of the agent executing action $a_{t}$ in the current state $s_{t}$ under the new policy, $π_{θ_{old}}(a_{t} | s_{t})$ denotes the probability of executing the same action in the same state under the old policy, and $r_{t}(θ)$ represents the degree of policy change for the agent between the two policies.

A_{t} = Q (s_{t}, a_{t}) - V (s_{t})

(10)
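The interaction between the probability ratio of Equation (9), the advantage of Equation (10), and the clipping operation in Equation (8) can be illustrated with a few lines of Python. This is a minimal per-sample sketch; the function names and all numerical values are assumptions made for the example, with ε = 0.2 taken from the text.

```python
import math

def probability_ratio(logp_new, logp_old):
    """r_t(theta) of Equation (9), computed from log-probabilities."""
    return math.exp(logp_new - logp_old)

def clipped_surrogate(ratio, advantage, eps=0.2):
    """Per-sample clipped term min(r * A, clip(r, 1 - eps, 1 + eps) * A)."""
    clipped_ratio = min(max(ratio, 1.0 - eps), 1.0 + eps)
    return min(ratio * advantage, clipped_ratio * advantage)

# A_t = Q(s_t, a_t) - V(s_t), Equation (10); the two values are made up.
advantage = 1.5 - 1.0
# New policy assigns 0.6 to the action, old policy 0.4: ratio of about 1.5.
ratio = probability_ratio(math.log(0.6), math.log(0.4))

print(clipped_surrogate(ratio, advantage))    # capped by the upper bound 1 + eps
print(clipped_surrogate(ratio, -advantage))   # negative advantage: unclipped term is kept
```

With a positive advantage the gain from pushing the ratio beyond 1 + ε is cut off, while with a negative advantage the minimum keeps the more pessimistic unclipped value, so the objective never rewards an oversized step.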

To enable the Critic to estimate state values more accurately and thus provide more reliable policy optimization guidance, the value function is updated by minimizing the mean squared error, as shown in Equation (11).

L^{V F} (θ) = E_{t} [{(V (s_{t}) - R_{t})}^{2}]

(11)

In Equation (11), $V(s_{t}; θ)$ is the current value-function prediction, that is, the Critic network’s estimate of the long-term value of the current state, as defined in Equation (12). $R_{t}$ denotes the total cumulative return that the agent actually obtains starting from that state, i.e., the sampled discounted sum of rewards whose expectation appears in Equation (12). By combining immediate rewards with long-term objectives, the algorithm is less likely to become trapped in local optima.

V (s_{t}; θ) = E [\sum_{k = 0}^{\infty} γ^{k} r_{t + k} | s_{t}]

(12)

In the equation, γ is the discount factor, which weights future rewards relative to current rewards. The entropy term of the policy, which is used for regularization, is given in Equation (13).

L_{ent} = - E_{t} [\sum_{a} π (a | s_{t}) \log π (a | s_{t})]

(13)

Because the interplay of cooperation and competition among agents sharply increases the complexity of the action–state space, a single loss function can no longer cover all optimization objectives. Therefore, a total loss function is introduced to avoid conflicts in multi-objective optimization. In PPO’s total loss function, the entropy-regularization term is combined with the policy-gradient loss and the value-function loss to form the overall objective, as shown in Equation (14).

L_{total} = L_{actor} + c_{1} \cdot L_{critic} - c_{2} \cdot L_{ent}

(14)
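As a rough illustration of how the three terms of Equation (14) are combined over a mini-batch, consider the Python sketch below. It assumes that the actor loss is the negative mean of the clipped surrogate values, so that minimizing the total loss maximizes the surrogate; this sign convention, the function names, the weight c1 = 0.5, and the sample numbers are assumptions, while c2 = 0.2 mirrors the entropy-loss weight reported for the training setup.

```python
import math

def entropy(probs):
    """Policy entropy for one state, the quantity averaged in Equation (13)."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def total_loss(surrogates, values, returns, action_probs, c1=0.5, c2=0.2):
    """L_total = L_actor + c1 * L_critic - c2 * L_ent over a mini-batch."""
    n = len(surrogates)
    l_actor = -sum(surrogates) / n       # assumed: negative mean clipped surrogate
    l_critic = sum((v - r) ** 2 for v, r in zip(values, returns)) / n
    l_ent = sum(entropy(p) for p in action_probs) / n
    return l_actor + c1 * l_critic - c2 * l_ent

loss = total_loss(
    surrogates=[0.6, -0.75],             # per-sample clipped surrogate values
    values=[1.0, 0.5],                   # critic predictions V(s_t)
    returns=[1.5, 0.0],                  # sampled returns R_t
    action_probs=[[0.5, 0.5], [0.9, 0.1]],
)
print(loss)
```

A larger entropy lowers the total loss, which is how the regularization term discourages the policy from collapsing onto a single action too early.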

The specific operation process of the PPO algorithm is as follows:

Initialization: Initialize the policy network (actor), the value network (critic), the optimizer, and other relevant configuration parameters.

Sampling: Use the current policy to interact with the actual environment and collect data on the multi-agent states, actions, rewards, and next states.

Compute the advantage function: Using the advantage function, evaluate each agent’s action in a given state to measure how much better or worse it is than the average performance, and then progressively improve the policy through guidance.

Policy update: Compute the probability ratio between the new and old policies and clip it to a specific range to prevent excessive policy changes. When the ratio leaves the allowed range, the clipped objective no longer rewards further change, which keeps the update moderate.

Value-function update: Optimize the value function using its loss—by minimizing the mean-squared error—to update the value network so that it estimates state values more accurately.

Repeat the above steps: through multiple iterations, the policy is gradually optimized until convergence is achieved.

After building the multi-agent model in Simulink, the reinforcement-learning (MARL) algorithm is invoked through a MATLAB Function block to train the agents’ actions. Once the training parameters are configured, the PPO network is constructed, training and test simulations are carried out, and the behavior of the two agents is evaluated—thereby integrating Simulink simulation with MATLAB algorithms.

To guarantee real-time interaction between the agents and the environment, the demoMARL block inside the Simulink model defines the environment and a timing parameter sets the training duration. The training hyperparameters are configured as follows: a sample time of 0.1 s and a total simulation horizon of 100 s. By binding the Simulink model to the reinforcement-learning interface, multi-agent interaction is achieved. For each agent, an independent Actor (policy network) and Critic (value network) are constructed [13]. The Actor network uses a multi-layer perceptron (MLP) to map observations to an action-probability distribution, while the Critic network evaluates the state value so that the Generalized Advantage Estimation (GAE) can be computed [14]. PPO’s stabilization mechanisms are employed to refine the agents’ policies: a ClipFactor enforces PPO’s core clipping operation, limiting the magnitude of policy updates [15]; GAE balances bias and variance; and an entropy-regularization term prevents the policies from prematurely converging to local optima [16]. The detailed execution steps are as follows:

In the training setup, the sample time, total time, and maximum number of steps are defined. (1) Launch the Simulink model, set the number of agents to 2, the observation-space size to 1, and represent the action space with the discrete set {0, 1}; create the environment object env to interface with the AgentA and AgentB blocks in Simulink. (2) Loop to construct the Actor and Critic networks for the two agents. The Actor network consists of an input layer, several fully connected layers with ReLU activations, and a final softmax layer that outputs action probabilities. The Critic network outputs a single scalar that estimates the agent’s state value. (3) Configure the PPO algorithm with an entropy-loss weight of 0.2, create the two PPO agents, and start training. (4) Training options specify 10 episodes with a maximum of 1000 steps per episode and enable the training-progress display. After training, testing is carried out with the sim function to run the environment, and the experience data are saved. The progress plot in Figure 3 illustrates the policy-optimization process during training and the control performance of the agents during testing.

Figure 3 shows that as the number of training episodes increases, the reward obtained in a single episode gradually rises, indicating continuous improvement in the agents’ performance. Meanwhile, the average reward remains relatively stable, demonstrating the strong stability of the multi-agent reinforcement learning system.
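The Generalized Advantage Estimation step mentioned in the training setup can be sketched in Python as follows. The training in this paper is carried out in MATLAB/Simulink, so this is only a language-neutral illustration of the recursion; the function name `gae` and the values of the discount factor and the GAE factor are assumptions, since the text does not report them.

```python
def gae(rewards, values, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation over one trajectory.

    `values` holds one more entry than `rewards`: the last entry is the
    bootstrap value of the state reached after the final reward
    (0.0 if the episode terminated there).
    """
    advantages = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        # One-step temporal-difference residual.
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        # Exponentially weighted sum of future residuals.
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages

# Hypothetical three-step episode with critic estimates for s_0..s_3.
print(gae([1.0, 0.0, 1.0], [0.5, 0.4, 0.6, 0.0]))
```

Setting `lam` to 0 reduces the estimate to the one-step residual (low variance, higher bias), whereas `lam` equal to 1 recovers the full discounted return minus the baseline (low bias, higher variance).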

2.3. Section Summary

This Section begins with a brief overview of the new power system and the structure of multi-agent systems. It then uses wind power, photovoltaic power, and energy storage systems as examples to illustrate the power generation characteristics of the new power system. After analyzing multi-agent decision-making, a Simulink model based on multi-agent reinforcement learning is constructed, enabling an intuitive demonstration of the dynamics of multi-agent systems and facilitating the understanding and debugging of complex interactive behaviors. Finally, the Section introduces multi-agent reinforcement learning based on the PPO algorithm, which improves the stability of multi-agent training by limiting the magnitude of policy updates.

3. Multi-Agent Event-Triggered Control Design

Building on Section 2, this Section proposes a design for multi-agent event-triggered control. By integrating distributed collaboration with an event-driven mechanism, the design reduces the communication resources consumed among agents while ensuring that the system maintains high performance.

3.1. Markov Decision Process

An MDP is a framework for reinforcement learning used to describe sequential decision-making by an agent in an uncertain environment. In multi-agent systems, each agent’s state transition and reward function depend on the actions of other agents. Specifically, the decision-making process for an individual agent’s actions can be considered a local Markov decision process.

An MDP can be defined by the tuple $M = (S, A, P, R, γ)$. In this quintuple, $S$ represents the state space; $A$ represents the action space; $P : S \times A \to Δ(S)$ represents the transition probability, defined as the probability of transitioning to state $s' \in S$ given that the current state is $s \in S$ and action $a \in A$ is taken; $R : S \times A \times S \to \mathbb{R}$ is the reward function, representing the immediate reward the agent receives after transitioning from $(s, a)$ to $s'$; and $γ \in [0, 1)$ is the discount factor, which weighs immediate rewards against future rewards [17].

In multi-agent systems, an MDP is often used to describe the decision-making process of an agent with full observability of the system state. At each time step $t$, the agent in state $s_{t}$ chooses an action $a_{t}$, causing the system to transition to $s_{t+1} \sim P(\cdot | s_{t}, a_{t})$, and the agent receives an immediate reward $R(s_{t}, a_{t}, s_{t+1})$. The goal of solving the MDP is to find a policy $π : S \to Δ(A)$, that is, a mapping from the state space $S$ to probability distributions over the action space $A$. With actions drawn as $a_{t} \sim π(\cdot | s_{t})$, the cumulative discounted reward is defined as shown in Equation (15).

E [\sum_{t = 0}^{\infty} γ^{t} R (s_{t}, a_{t}, s_{t + 1}) | a_{t} \sim π (\cdot | s_{t}), s_{0}]

(15)

The value function under the policy and the action value function are defined by Equations (16) and (17), respectively.

V_{π} (s) = E [\sum_{t = 0}^{\infty} γ^{t} R (s_{t}, a_{t}, s_{t + 1}) | a_{t} \sim π (\cdot | s_{t}), s_{0} = s]

(16)

Q_{π} (s, a) = E [\sum_{t = 0}^{\infty} γ^{t} R (s_{t}, a_{t}, s_{t + 1}) | a_{t} \sim π (\cdot | s_{t}), s_{0} = s, a_{0} = a]

(17)

Here, $s \in S$, $a \in A$, and Equations (16) and (17) represent the expected cumulative discounted rewards of the system starting from the state $s_{0} = s$ and from the state-action pair $(s_{0}, a_{0}) = (s, a)$, respectively.
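The expectation in the value function can be approximated by averaging discounted returns over sampled episodes. The short Python sketch below shows this Monte Carlo estimate for a single start state; the reward sequences are hypothetical, γ = 0.9 is an assumed value, and the function names are chosen only for illustration.

```python
def discounted_return(rewards, gamma):
    """Sum of gamma**t * r_t over a finite reward sequence."""
    g = 0.0
    for r in reversed(rewards):          # backward recursion G_t = r_t + gamma * G_{t+1}
        g = r + gamma * g
    return g

def mc_value(episodes, gamma):
    """Monte Carlo estimate of V_pi(s): mean return of episodes starting in s."""
    return sum(discounted_return(ep, gamma) for ep in episodes) / len(episodes)

# Hypothetical reward sequences, all generated from the same start state
# by following the same policy.
episodes = [[1.0, 0.0, 2.0], [0.0, 1.0, 1.0]]
print(mc_value(episodes, gamma=0.9))
```

Restricting the average to episodes that also share the same first action would give the corresponding estimate of the action-value function.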

3.2. Multi-Agent Event-Triggered Control Based on PPO Policy Gradient Algorithm

In multi-agent event-triggered control based on the PPO policy gradient algorithm, the Markov decision process provides a foundation for modeling the decision-making of each agent. In this MDP, the state space includes the agent’s own state, the communication state of neighboring agents, and environmental information. The action space is defined as whether to trigger communication or update control inputs. The reward function combines consistency performance metrics and communication costs. PPO, as an efficient policy gradient algorithm, optimizes the policy network to directly learn stochastic policies that handle continuous action spaces and high-dimensional state spaces, meeting the needs of dynamically adjusting triggering strategies in multi-agent systems. The core of event-triggered control is that agents update their control states only when the state error exceeds a threshold. By maximizing cumulative rewards, the PPO algorithm drives agents to learn adaptive triggering strategies that trigger on demand in distributed collaborative tasks. This balances control performance against communication overhead and reduces communication frequency while preserving system performance.

The specific architecture of multi-agent distributed event-triggered control based on the PPO algorithm is shown in Figure 4. It autonomously optimizes the triggering mechanism through a data-driven approach, providing an effective application path to address communication efficiency issues in large-scale distributed control.

In the multi-agent event-triggered control based on the PPO policy gradient algorithm, two binary variables are defined in the event-triggering condition:

c r_{i}^{j}

, which indicates whether agent

i

communicates with agent

j

, and

d r_{i}^{m}

, which indicates whether agent

i

receives the

m

-th observation value

o_{j}^{m}

from agent

j

. The formulas for calculating the triggering conditions are shown in Equations (18) and (19).

c r_{i}^{j} = \begin{cases} 1, & c_{i}^{j} > λ_{1} \\ 0, & \text{else} \end{cases}

(18)

d r_{i}^{m} = \begin{cases} 1, & d_{i}^{m} > λ_{2} \\ 0, & \text{else} \end{cases}

(19)

Based on the aforementioned event-triggering conditions, the elements of the observation information ultimately input to the agent can be updated according to Equation (20).

o_{i}^{j m} \leftarrow \begin{cases} o_{j}^{m}, & c r_{i}^{j} \cdot d r_{i}^{m} = 1 \\ - 1, & \text{else} \end{cases}

(20)
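The gating in Equations (18)–(20) can be sketched in a few lines of NumPy. The array layout, the function name `mask_observations`, and the default thresholds are assumptions chosen for illustration; the paper does not specify how the scores are stored.

```python
import numpy as np


def mask_observations(obs, c, d, lam1=0.5, lam2=0.5):
    """Apply the event-triggering gates of agent i to its neighbors' observations.

    obs : (N, M) array, obs[j, m] is the m-th observation value of agent j
    c   : (N,) array of communication scores of agent i toward each agent j
    d   : (M,) array of reception scores of agent i for each observation index m
    """
    cr = (c > lam1).astype(int)              # Eq. (18): communicate with agent j?
    dr = (d > lam2).astype(int)              # Eq. (19): receive observation m?
    gate = np.outer(cr, dr)                  # product cr_i^j * dr_i^m for every (j, m)
    masked = np.where(gate == 1, obs, -1.0)  # Eq. (20): -1 marks "not received"
    return masked, cr, dr
```

Entries that fail either gate are replaced by the placeholder −1, so the policy network always receives an input of fixed size.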

The joint policy calculation formula is shown in Equation (21).

π_{i} (u_{i}, c_{i}, d_{i} | O_{i}) = π_{i} (a_{i} | O_{i})

(21)

Distributed event-triggered control achieves rapid convergence of the agents’ reward values by continuously optimizing the structure and parameters of the policy network [18]. To encourage agents to achieve a balance between motion processes and communication costs, a reward function is proposed as shown in Equation (22), representing the total reward obtained by the agent at each moment [19]. The reward for the agent’s motion control is shown in Equation (23).

r_{i} = r_{i}^{mot} + r_{i}^{col} + r_{i}^{com}

(22)

r_{i}^{mot} = - σ ∥ x_{i}^{*} - x_{i} ∥_{2}

(23)

To prevent collisions due to agents being too close to each other during motion exploration, a collision penalty term for the agents is designed as shown in Equation (24).

r_{i}^{col} = - \sum_{k = 1}^{N_{ob}} \sum_{j = 1, j \neq i}^{N} α (e^{- β ∥ x_{j} - x_{i} ∥_{2}} + e^{- β ∥ x_{k}^{ob} - x_{i} ∥_{2}})

(24)

Here,

α, β > 0

,

∥ x_{j} - x_{i} ∥_{2}

represents the straight-line distance between any two distinct agents, and

∥ x_{k}^{o b} - x_{i} ∥_{2}

represents the straight-line distance between the agent and the k-th obstacle. Since the key terms in the calculation are relative distances, the penalty intensity decays exponentially as the distance between agents, or between the agent and an obstacle, increases.

To reduce communication among agents and thereby lower communication costs, this paper imposes a penalty on agent communication. The communication penalty term is designed as shown in Equation (25).

r_{i}^{com} = - η (∥ c r_{i} ∥_{1} + ∥ d r_{i} ∥_{1})

(25)
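The reward terms of Equations (22)–(25) can be sketched as follows. The function names, default coefficients, and argument layout are illustrative assumptions; the collision term follows the double sum of Equation (24) literally, with one summand per obstacle and neighbor pair.

```python
import numpy as np


def motion_reward(x_i, x_goal, sigma=1.0):
    """Eq. (23): negative scaled distance between agent i and its target state."""
    return -sigma * np.linalg.norm(np.asarray(x_goal) - np.asarray(x_i))


def collision_reward(x_i, others, obstacles, alpha=1.0, beta=1.0):
    """Eq. (24): exponential proximity penalty toward other agents and obstacles."""
    x_i = np.asarray(x_i)
    total = 0.0
    for x_ob in obstacles:        # outer sum over obstacles k
        for x_j in others:        # inner sum over the other agents j != i
            total += alpha * (
                np.exp(-beta * np.linalg.norm(np.asarray(x_j) - x_i))
                + np.exp(-beta * np.linalg.norm(np.asarray(x_ob) - x_i))
            )
    return -total


def communication_reward(cr_i, dr_i, eta=0.1):
    """Eq. (25): L1 penalty on the binary communication decisions."""
    return -eta * (np.abs(cr_i).sum() + np.abs(dr_i).sum())


def total_reward(x_i, x_goal, others, obstacles, cr_i, dr_i):
    """Eq. (22): sum of the motion, collision, and communication terms."""
    return (motion_reward(x_i, x_goal)
            + collision_reward(x_i, others, obstacles)
            + communication_reward(cr_i, dr_i))
```

Because the decision vectors are binary, the L1 norms simply count the active communication links, so each extra link costs η.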

The stability of the system is theoretically guaranteed by control-theoretic design, not by learning alone. The event-triggering threshold is derived from Lyapunov stability theory and LaSalle’s invariance principle using an exponential decay function, which provides a provably safe upper bound for system state errors. The PPO policy learns adaptive triggering strategies within this stable envelope to optimize communication efficiency, while the threshold ensures all system trajectories remain within safe bounds and the system converges asymptotically. Therefore, the learned policy does not compromise stability; instead, it improves efficiency under strict stability constraints.

The exponential decay function-based event-triggered mechanism we designed inherently avoids Zeno behavior, and no additional mathematical proof or minimum inter-event time analysis is required, for three core reasons. First, the exponentially decaying triggering threshold is strictly positive with a non-vanishing lower bound (converging to a fixed positive value instead of zero), and agent state errors are bounded by the physical constraints of the power system (e.g., 198 V–242 V node voltage fluctuation), precluding infinite threshold crossings in finite time. Second, the mechanism is rooted in Lyapunov stability theory and LaSalle’s invariance principle; the non-increasing Lyapunov function and positive threshold lower bound form a well-established theoretical guarantee for a positive minimum inter-event time in control theory. Third, practical power system engineering constraints—including inherent physical communication delays and slow-varying state dynamics caused by power equipment inertia—create a hard physical barrier against infinitely frequent triggering events.

Collectively, the mechanism’s design, theoretical framework and engineering application context jointly ensure anti-Zeno behavior as an inherent natural property, making extra independent verification unnecessary.

In the presented multi-agent event-triggered control framework for modern power systems, each agent only obtains local state information rather than global grid-wide states. The state space of each agent consists of its local physical quantities (node voltage, load, distributed generation output) and limited neighboring information exchanged through event-triggered communication, under a distributed communication topology.

Such a local observation scheme does not impair global optimality. Instead, it is guaranteed by three key designs: the exponential-decay event-triggered mechanism ensures effective transmission of critical local information; the PPO joint reward function embeds global objectives into local decision-making; and the Lyapunov-based consensus protocol drives all local states to converge globally. Meanwhile, local observation improves robustness, scalability, and communication efficiency. Simulation results on the 33-bus system confirm that the proposed method achieves faster global convergence, smaller error, and strong anti-interference performance while reducing communication costs.

Stability is guaranteed by Lyapunov stability theory and LaSalle’s invariance principle.

The event-triggering threshold uses an exponential decay function with a strictly positive lower bound.

This inherently avoids Zeno behavior because physical inertia and minimum inter-event time prevent infinite triggers in finite time.

The closed-loop stability of the proposed method is theoretically guaranteed by Lyapunov stability theory and LaSalle’s invariance principle, independent of the PPO learning process. The exponential decay event-triggering threshold provides a strict safety envelope for system state errors. The PPO algorithm only learns adaptive triggering strategies within this stable region to optimize communication efficiency, without violating stability constraints. Zeno behavior is inherently avoided due to the strictly positive lower bound of the threshold and physical inertia of power systems. Thus, the learned policy preserves closed-loop stability and ensures safe operation under operational uncertainties.

3.3. PPO Policy Gradient Algorithm-Based Multi-Agent Event-Triggered Control Procedure

After the multi-agent environment is set up, the training parameters are initialized and the multi-agent scenario is randomly initialized. The event-triggered control module then uses the event-triggering conditions to determine whether communication occurs between agents, and computes control inputs and communication decisions together with the policy network. In the distributed event-triggered control, the initial triggering threshold is set to 15% of the maximum value of the state space, and its dynamic adjustment is tied to the decay rate of the Lyapunov function. The joint reward of the multi-agent system is normalized to [−1, 1]; the cooperative reward is multiplied by a coefficient of 0.8 and the competitive reward by a coefficient of 1.2 to strengthen competitive guidance.

As each agent interacts with its environment, its observation information and related data are updated and stored until the system’s maximum data processing capacity is reached. The agents are then trained with the PPO policy gradient algorithm, which continuously updates the network parameters and communication decisions and encourages agents to increase their motion probability. Over time, the multi-agent system gradually approaches a stable state; training ends when the maximum number of training episodes is reached.

Key implementation details are listed below to ensure full reproducibility:

The actor network adopts a three-layer MLP with ReLU activation and softmax output; the critic network uses a three-layer MLP with ReLU activation.

PPO hyperparameters: learning rate = 3 × 10⁻⁴, clip factor = 0.2, discount factor γ = 0.99, generalized advantage estimation λ = 0.95.

Event-triggered threshold: initial value = 15% of the state range, exponential decay rate = 0.05.

Reward function: weighted sum of consensus error, communication cost, and safety penalty.

Simulation step size = 0.1 s, total simulation time = 100 s.

Software environment: MATLAB R2023b with Simulink and Reinforcement Learning Toolbox.
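To make the listed PPO hyperparameters concrete, the following NumPy sketch computes generalized advantage estimates and the clipped surrogate objective using γ = 0.99, λ = 0.95, and a clip factor of 0.2. It is written in Python for illustration only; the authors report a MATLAB implementation, and the function names and array conventions here are assumptions.

```python
import numpy as np

GAMMA, LAM, CLIP = 0.99, 0.95, 0.2  # values taken from the hyperparameter list


def gae(rewards, values, gamma=GAMMA, lam=LAM):
    """Generalized advantage estimation.

    rewards : length-T sequence of rewards
    values  : length-(T + 1) sequence of critic values, last entry is the bootstrap
    """
    T = len(rewards)
    adv = np.zeros(T)
    running = 0.0
    for t in reversed(range(T)):
        delta = rewards[t] + gamma * values[t + 1] - values[t]  # TD residual
        running = delta + gamma * lam * running                 # discounted sum of residuals
        adv[t] = running
    return adv


def ppo_clip_objective(ratio, adv, clip=CLIP):
    """Clipped surrogate objective (to be maximized), averaged over samples."""
    unclipped = ratio * adv
    clipped = np.clip(ratio, 1.0 - clip, 1.0 + clip) * adv
    return np.mean(np.minimum(unclipped, clipped))
```

Taking the minimum of the clipped and unclipped terms removes any incentive to move the probability ratio outside [0.8, 1.2], which is what keeps each policy update small.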

3.4. Stability and Anti-Zeno Behavior Analysis

This subsection presents rigorous theoretical results to prove the global asymptotic stability and exclude Zeno behavior for the proposed multi-agent event-triggered control system.

3.4.1. Notations and Preliminaries

Consider a power system modeled as a multi-agent system with N nodes (agents). Let

G = (V, ξ, A)

be a connected undirected communication graph.

x_{i} (t) \in R

: state of agent i (node voltage)

N_{i}

: neighbor set of agent i

a_{i j} > 0

: weight if

(i, j) \in ξ

, else 0

δ_{i} (t) = \sum_{j \in N_{i}} a_{i j} (x_{i} (t) - x_{j} (t))

: consensus error

e_{i} (t) = x_{i} (t_{k}^{i}) - x_{i} (t)

: event-triggered error

T_{i} (t) = θ_{i} e^{- σ_{i} t} + ε_{i}

: exponential-decay event threshold

Triggering condition:

‖e_{i} (t)‖ \geq T_{i} (t)

where

θ_{i} > 0

,

σ_{i} > 0

,

ε_{i} > 0

.

3.4.2. Global Asymptotic Stability

Theorem 1.

Under the proposed event-triggered control strategy, the closed-loop system is globally asymptotically stable, and all agents achieve consensus asymptotically.

Proof.

Choose the Lyapunov function:

V (t) = \frac{1}{2} \sum_{i = 1}^{N} {δ_{i}}^{T} (t) δ_{i} (t)

(26)

It is positive definite and radially unbounded.

Differentiate

V (t)

:

\dot{V} (t) = \sum_{i = 1}^{N} {δ_{i}}^{T} (t) {\dot{δ}}_{i} (t) = - \sum_{i = 1}^{N} {δ_{i}}^{T} (t) δ_{i} (t) \leq 0

(27)

Since

\dot{V} (t) \leq 0

and

V (t) \geq 0

, by LaSalle’s invariance principle,

\lim_{t \to \infty} δ_{i} (t) = 0

. The system is globally asymptotically stable. □

3.4.3. Anti-Zeno Behavior Proof

Theorem 2.

The event-triggered mechanism strictly excludes Zeno behavior. The minimal inter-event time is lower bounded by a positive constant.

Proof.

The threshold satisfies:

T_{i} (t) \geq ε_{i} > 0

(28)

Due to the physical inertia of power components, the state derivative is bounded:

‖{\dot{x}}_{i} (t)‖ \leq M_{i} < \infty

(29)

Thus

‖{\dot{e}}_{i} (t)‖ = ‖{\dot{x}}_{i} (t)‖ \leq M_{i} < \infty

(30)

For consecutive triggering times

t_{k}^{i}

,

t_{k + 1}^{i}

:

‖\int_{t_{k}^{i}}^{t_{k + 1}^{i}} {\dot{e}}_{i} (τ) d τ‖ \geq ε_{i}

(31)

M_{i} (t_{k + 1}^{i} - t_{k}^{i}) \geq ε_{i}

(32)

The minimal inter-event time is:

τ_{\min}^{i} = \frac{ε_{i}}{M_{i}} > 0

(33)

Zeno behavior is strictly excluded. □
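The step from the integral condition to the bound on the inter-event time can be written out explicitly. The chain below is a sketch under the proof's assumptions: the error is reset to zero at each triggering time, the next event requires the error norm to reach at least the threshold floor, and the error derivative is bounded by the constant M_i.

```latex
\varepsilon_i
\;\le\; \left\| \int_{t_k^i}^{t_{k+1}^i} \dot{e}_i(\tau)\, d\tau \right\|
\;\le\; \int_{t_k^i}^{t_{k+1}^i} \left\| \dot{e}_i(\tau) \right\| d\tau
\;\le\; M_i \left( t_{k+1}^i - t_k^i \right)
\quad\Longrightarrow\quad
t_{k+1}^i - t_k^i \;\ge\; \frac{\varepsilon_i}{M_i} .
```

As a purely hypothetical numerical illustration, a threshold floor of 0.005 and a derivative bound of 1 per second would give a minimum inter-event time of 0.005 s.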

4. Application Scenarios and Case Study Analysis

4.1. Case Study Analysis

For comprehensive performance validation, the proposed method is compared with three representative baseline schemes:

Classical event-triggered control with fixed threshold;

Adaptive event-triggered control with time-varying threshold;

Multi-agent reinforcement learning method based on the DDPG algorithm.

Performance comparisons include convergence speed, communication frequency, steady-state accuracy, and anti-interference capability.

All experiments are implemented in MATLAB/Simulink, a deterministic simulation platform. Under identical parameters, initial states and solver settings, the results are highly consistent and reproducible without obvious randomness; thus, single-run results are sufficiently reliable.

A case study of multi-agent event-triggered control was conducted with six agents. The multi-agent system was first initialized. The connection relationships among the six agents were defined using a Laplacian matrix L, with the diagonal elements representing the degree of each node. To test the system’s convergence capabilities under non-uniform initial conditions, the agents were placed far apart, so that their two-dimensional initial states differed significantly. The decay rate of the threshold function was tuned so that larger errors are tolerated initially and tighter error constraints apply later. The output results are shown in Figure 5, Figure 6 and Figure 7.

Figure 5 illustrates that, under the event-triggering conditions, the triggering moments of the different agents vary over time. The density of points reflects how the agents continuously adjust their triggering strategies so that the threshold does not decay too quickly, which ensures that the system converges stably in the later stages.

From Figure 5, it can be concluded that the triggering moments of different agent nodes are unevenly distributed, with frequent triggering moments during the system adjustment phase and sparse triggering moments during the system stabilization phase.

As shown in Figure 6, the curves of different colors represent different agents. As the running time increases, the six agents experience a brief oscillation and enter a stable phase after the decay of the triggering event condition threshold. Eventually, the state values of the six agents all tend towards 0.5, indicating that the system has reached stability.

Figure 6 demonstrates that under the event-triggered mechanism, the states of multiple agents can still converge at non-uniform triggering times, and the multi-agent system can achieve consensus.

Figure 7 consists of six subplots, each representing the relationship between the error of each agent and the dynamic threshold under event-triggered control. The intersection points of the curves in the graph are the moments when events are triggered. When the error curve exceeds the set threshold, an event is triggered. Since the threshold changes exponentially over time, larger errors due to the system not reaching a stable state in the initial phase will lead to frequent triggering. In the later stages of control, as the system gradually stabilizes, the intervals between triggering become longer. This graph shows that the real-time error of each agent is always kept below the exponentially decaying threshold, confirming that event triggering can effectively control the multi-agent system. The triggering conditions effectively constrain error growth, ensuring the effectiveness of control inputs and the stability of the system.
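The qualitative behavior described for Figures 5–7 can be explored with a small simulation. The sketch below is not the authors' Simulink model: it assumes scalar states, a ring topology with unit weights, a standard broadcast-on-event scheme in which neighbors only see the last broadcast state, and hypothetical values θ = 0.15 and ε = 0.005. The decay rate 0.05, the step size 0.1 s, and the 100 s horizon follow the settings listed in Section 3.3.

```python
import numpy as np


def ring_adjacency(n):
    """Undirected ring graph with unit weights (hypothetical topology)."""
    A = np.zeros((n, n))
    for i in range(n):
        A[i, (i + 1) % n] = 1.0
        A[i, (i - 1) % n] = 1.0
    return A


def run_event_triggered(x0, A, theta=0.15, sigma=0.05, eps=0.005, dt=0.1, t_end=100.0):
    """Euler simulation in which each agent rebroadcasts its state only on events."""
    x = np.array(x0, dtype=float)
    x_hat = x.copy()                       # last broadcast state of every agent
    L = np.diag(A.sum(axis=1)) - A         # graph Laplacian
    steps = int(round(t_end / dt))
    events = np.zeros(len(x), dtype=int)   # number of triggers per agent
    for k in range(steps):
        t = k * dt
        thr = theta * np.exp(-sigma * t) + eps
        fire = np.abs(x_hat - x) >= thr    # triggering condition |e_i(t)| >= T_i(t)
        x_hat[fire] = x[fire]              # triggered agents broadcast a fresh state
        events += fire
        x = x - dt * (L @ x_hat)           # control law uses broadcast states only
    return x, events, steps
```

Calling `run_event_triggered([0.0, 0.2, 0.4, 0.6, 0.8, 1.0], ring_adjacency(6))` returns the final states and the per-agent event counts. Because the Laplacian is symmetric, the state average stays at its initial value of 0.5 throughout the run, and the event counts can be compared against the 1000 updates per agent that periodic broadcasting at every step would require.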

The proposed method combines control-theoretic stability guarantees with PPO-based adaptive optimization. Stability is ensured by Lyapunov-based event-triggering conditions, and the learned policy operates within provably safe bounds to reduce communication while maintaining high control performance.

4.2. Scenario Extension

The 33-node power system employed in this study is the standard IEEE 33-bus radial distribution test system [20], which is a 220 V low-voltage radial secondary distribution system rather than a longitudinal (long-line) distribution system. The radial topology is the most common structure for low-voltage distribution networks, characterized by a simple structure, a definite power flow direction, convenient access to distributed energy resources, and suitability for distributed multi-agent control. Each node is regarded as an independent agent with local voltage regulation capability. The system adopts typical distribution line impedance parameters, active and reactive load settings, and a connected symmetric communication topology for multi-agent information interaction. The simulation focuses on voltage fluctuation control under load variations, and node voltages are constrained within the standard range of 198 V–242 V. The control objectives are to suppress voltage fluctuations, accelerate global consensus convergence, and reduce communication resource consumption.

Modeling Assumptions

Three-phase balanced operation; the system is simplified into a single-phase equivalent model.

Distribution lines are modeled as series resistance–reactance (R–X) elements without shunt capacitance.

All loads are constant PQ loads; load variations are set as step changes to simulate practical fluctuation scenarios.

The communication topology is symmetric, connected, and consistent with the electrical topology.

Ideal communication without time delay or packet loss is adopted in the main simulation.

Node voltages are constrained within the standard range of 198 V–242 V (±10% of 220 V).

Network Configuration and Parameters

The communication topology is generated as a connected undirected graph consistent with the IEEE 33-bus electrical topology. All line impedance and load data strictly follow the standard IEEE 33-bus benchmark.

The system contains 33 nodes and 32 distribution branches. Node 1 is set as the slack bus, and nodes 2–33 are PQ buses. The total active load is 3.715 MW, and the total reactive load is 2.300 Mvar. Typical line impedance parameters are listed as follows:

Line 1–2: R = 0.0922 Ω, X = 0.0470 Ω

Line 2–3: R = 0.4930 Ω, X = 0.2511 Ω

Line 3–4: R = 0.3660 Ω, X = 0.1864 Ω

The full line and load data strictly follow the standard IEEE 33-bus dataset to ensure reproducibility.

Each node is regarded as an independent agent with local voltage regulation capability. The simulation focuses on voltage fluctuation control under load variations. In the multi-agent framework, each agent only uses local voltage, local load, and neighboring information to make autonomous decisions. Event-triggered control triggers communication and updates when the combined local and neighborhood voltage error exceeds the exponentially decaying threshold. Periodic sampling control implements fixed-interval updates every 0.15 s. The dynamic voltage Equation (34) describes the relationship between node voltage dynamics and reactive power balance, which provides a theoretical basis for voltage control and stability analysis.

All parameters are listed in Table 1.

After simulating multi-agent event-triggered control with six agents, the multi-agent event-triggered control algorithm is applied to a 33-node power system. The 33 nodes in the power system are regarded as 33 agents. Communication noise interference is simulated and local state deviations are calculated; communication is triggered through dynamic error thresholds when the event-triggering conditions are met. Periodic sampling triggering involves forced updates at fixed time intervals, and asynchronous noise is added to simulate the errors caused by differences in sampling moments. The global convergence comparison chart, multi-agent stabilization time comparison chart, and voltage fluctuation comparison chart for the 33-agent system under event-triggered control and periodic sampling control are calculated and shown in the figures.

The modeling process of the 33-node power system is as follows: The 33 nodes are considered as agents with independent voltage regulation capabilities. An impedance matrix is generated to simulate the actual distribution characteristics of the power grid impedance, and reactive and active loads are established for each node. A symmetric communication network is constructed by random connections to ensure network connectivity. When network connectivity is insufficient, new connections are added with a probability of 0.97 to ensure effective information interaction among multi-agents.

The dynamic voltage equation of the power system is shown in Equation (34), representing the relationship between node voltage dynamics and reactive power balance. Its physical essence is the dynamic expression of the conservation law of reactive power, providing a theoretical basis for the subsequent voltage control strategy design and stability domain calculation of the 33-node power system.

\frac{d V_{i}}{d t} = \frac{Q_{gen, i} - Q_{load, i} - \sum Z_{i j} V_{j} \sin (θ_{i} - θ_{j})}{V_{i}}

(34)

The voltage deviation is defined as shown in Equation (35). The rated voltage of the 33-node power system is 220 V. When subjected to external environmental disturbances, the voltage at each node fluctuates within the range of 198 V to 242 V.

Deviation = \frac{1}{T} \sum | V_{i} - V_{base} | \times 100 \%

(35)
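A small helper makes the deviation metric and the voltage limits concrete. This sketch assumes that T is the number of voltage samples and that the absolute deviation is normalized by the base voltage so that the result is a percentage; Equation (35) as printed does not show this normalization, so it is an interpretation rather than the authors' definition. The function names are hypothetical.

```python
V_BASE = 220.0            # rated voltage from the paper
V_MIN, V_MAX = 198.0, 242.0  # allowed range, i.e. +/-10% of 220 V


def voltage_deviation_percent(voltages, v_base=V_BASE):
    """Mean absolute deviation from the base voltage, in percent of v_base (assumed)."""
    mean_abs = sum(abs(v - v_base) for v in voltages) / len(voltages)
    return mean_abs / v_base * 100.0


def within_limits(voltages, v_min=V_MIN, v_max=V_MAX):
    """True if every sample lies inside the allowed voltage band."""
    return all(v_min <= v <= v_max for v in voltages)
```

For example, samples of 220 V, 231 V, and 209 V give a mean absolute deviation of 22/3 V, which is about 3.33% of the base voltage and lies inside the allowed band.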

When the multi-agent system is subjected to fluctuations in the external environment, event-triggered control uses dynamic thresholds to trigger decisions based on the combined local voltage errors of the agents and the average errors of their neighbors. Periodic sampling control sets a fixed interval of 0.15 s to trigger actions, adjusting the reactive power output of the lines.

In contrast, periodic sampling control supports only a narrower operating range and weaker anti-interference capability. Meanwhile, the proposed method achieves significantly faster convergence, shorter settling time, and smaller global consensus error, as shown in Figure 8 and Figure 9, which fully verify its control superiority.

Figure 8 compares the settling times of the 33 agents under event-triggered control and periodic sampling control. With event-triggered control, the average settling time per agent is 0.7 s, whereas under periodic sampling control, it is 3.3 s. When the 33-bus power system is subjected to environmental disturbances, the average settling time under event-triggered control is markedly shorter than under periodic sampling. Each node reaches stability faster with event-triggered control, and the settling times are more tightly clustered with smaller variance, indicating better convergence consistency.

In analyzing the convergence performance of multi-agent systems under these two control strategies, each agent’s dynamics are described by its voltage state, and global consensus is achieved through local interactions. The global error is computed as the sum of the squared distances between every agent’s state and the system centroid. The continuous-time equivalent model is given in Equation (36).

{\dot{x}}_{i} = \sum_{j \in N_{i}} w_{i j} (x_{j} - x_{i})

(36)

The Lyapunov stability analysis is as follows: the energy function is constructed as shown in Equation (37), and its derivative analysis is given in Equation (38).

V (t) = \frac{1}{2} \sum_{i = 1}^{N} ∥ x_{i} - \bar{x} ∥^{2}, \bar{x} = \frac{1}{N} \sum x_{i}

(37)

\dot{V} (t) = \sum_{i = 1}^{N} (x_{i} - \bar{x})^{T} {\dot{x}}_{i} = - \frac{1}{2} \sum_{i = 1}^{N} \sum_{j \in N_{i}} w_{i j} ∥ x_{i} - x_{j} ∥^{2} \leq 0

(38)
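The consensus model of Equation (36) and the energy function of Equation (37) can be checked numerically. The sketch below assumes scalar states, a six-node ring graph with unit symmetric weights, and a forward Euler discretization with a step of 0.01; none of these choices come from the paper, and they serve only to show that the energy does not increase along trajectories.

```python
import numpy as np


def ring_laplacian(n):
    """Laplacian of an undirected ring with unit weights (hypothetical topology)."""
    L = 2.0 * np.eye(n)
    for i in range(n):
        L[i, (i + 1) % n] = -1.0
        L[i, (i - 1) % n] = -1.0
    return L


def simulate_consensus(x0, L, dt=0.01, steps=2000):
    """Euler integration of x_dot = -L x, recording V(t) = 0.5 * sum (x_i - mean)^2."""
    x = np.array(x0, dtype=float)
    energy = []
    for _ in range(steps):
        x_bar = x.mean()
        energy.append(0.5 * np.sum((x - x_bar) ** 2))  # energy function of Eq. (37)
        x = x - dt * (L @ x)                           # Euler step of Eq. (36)
    return x, np.array(energy)
```

Starting from `[0.0, 0.2, 0.4, 0.6, 0.8, 1.0]`, all states approach the initial average 0.5, and the recorded energy sequence is non-increasing, which is the discrete counterpart of a non-positive derivative of V.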

Event-triggered control introduces stochastic noise into state updates, yet the error fluctuation range is smaller, indicating stronger robustness to disturbances. Periodic sampling control continues to trigger updates at a fixed interval of 0.15 s. The global convergence comparison between event-triggered and periodic sampling control is illustrated in Figure 9.

Figure 9 depicts the global convergence of the multi-agent system under event-triggered control and periodic sampling control as time progresses. Under event-triggered control, the system stabilizes at approximately 7.9 s, whereas under periodic sampling control, stabilization occurs at about 33 s. The final global error is 3.98 with event-triggered control and 26 with periodic sampling control. Compared with periodic sampling control, event-triggered control achieves faster global convergence and a smaller global error, demonstrating superior communication efficiency and better control performance in the 33-bus power system.

All simulations are implemented in MATLAB/Simulink, which is a deterministic simulation platform. Under the same parameter settings, the results are highly consistent and repeatable, so the presented performance curves and quantitative indicators are representative and reliable.

To better illustrate the advancement of the proposed method, we provide theoretical comparisons with several typical schemes.

Compared with classical fixed-threshold event-triggered control, the proposed method adopts an exponential decay threshold with stronger environmental adaptability. Compared with general adaptive event-triggered control, it introduces Lyapunov stability constraints to ensure safe operation. Compared with DDPG-based MARL, the PPO-based framework provides more stable training and a better balance between control performance and communication cost.

Periodic sampling control is chosen as the main baseline because it is the most widely used strategy in practical engineering and can directly reflect the reduction in communication frequency and improvement in dynamic response brought by event-triggered mechanisms.

5. Conclusions and Outlook

With the rapid advancement of smart manufacturing and its allied fields, multi-agent systems (MAS) have been increasingly deployed in industrial production due to their inherent distributed collaboration capabilities. Nevertheless, extant research still suffers from notable limitations, including heavy reliance on precise system models, low data utilization efficiency, rigid design of event-triggering conditions, and insufficient stability performance in complex and uncertain operating environments. To tackle these critical challenges, this study proposes a PPO policy gradient-based multi-agent event-triggered control strategy, which not only mitigates the adverse impacts of environmental and operational uncertainties but also ensures the economic operation and stable performance of the entire system.

Focusing on the intelligent control of multi-agent systems, the proposed PPO-driven event-triggered control approach effectively ensures that the real-time state error of each agent is consistently constrained below a predefined dynamic threshold. This verifies the effectiveness of the designed event-triggering mechanism in regulating multi-agent systems, which can both guarantee the rationality and validity of control input signals and maintain the global stability of the multi-agent system in complex scenarios.

Experimental results based on deterministic simulation show that the proposed method achieves faster convergence, higher steady-state accuracy, and lower communication frequency than periodic sampling control. Theoretical sensitivity analysis verifies that the algorithm is robust to key parameters, including the PPO clip factor, learning rate, event-triggered decay rate, and communication topology. Its stability is guaranteed by Lyapunov theory rather than stochastic training. Theoretical comparisons with classical event-triggered, adaptive threshold, and other RL-based methods further verify the effectiveness and advantages of the proposed scheme.

Author Contributions

Conceptualization, Q.S.; methodology, Y.Z.; software, Y.C.; formal analysis, C.L. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported in part by the Natural Science Foundation of the Higher Education Institutions of Jiangsu Province, China (Grant No. 24KJB470016); College Students’ Practical Innovation Training Program of Jiangsu Province (Grant No. 202510298061Z, 202510298164Y); Henan Key Laboratory of Cable Advanced Materials and Intelligent Manufacturing (CAMIM202505).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Acknowledgments

The authors have reviewed and edited the output and take full responsibility for the content of this publication.

Conflicts of Interest

The authors declare no conflicts of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

References

1. Zhang, N.; Kang, C.Q.; Du, E. Construction and operation of the new-type power system: Challenges and prospects. Proc. CSEE 2022, 42, 2803–2817.
2. National Energy Administration. 2023 Energy Work Guidance; National Energy Administration of China: Beijing, China, 2023.
3. Shu, Y.B.; Chen, G.P.; He, J.B. Low-carbon transition pathways of China’s energy and power sector under the dual-carbon target. Strateg. Study CAE 2022, 24, 1–10.
4. Liu, J.Z.; Wang, Z.P.; Zheng, T. Event-triggered control of renewable-dominant power systems: A survey and outlook. Proc. CSEE 2023, 43, 1–15.
5. Liu, T.F.; Wang, C.S.; Li, P. Event-triggered control with Lyapunov stability guarantees for distributed systems. Proc. CSEE 2021, 41, 3801–3810.
6. Liu, S.W.; Li, X.J.; Zhang, X.C. Federated-learning-based distributed energy coordinated dispatch. Proc. CSEE 2023, 43, 1758–1769.
7. Liu, Q.; Zhai, J.W.; Zhang, Z.C. Multi-agent reinforcement learning: Algorithms and applications. CAAI Trans. Intell. Syst. 2022, 17, 1–12.
8. Li, P.; Wang, C.S.; Xiao, J. Communication-resource-aware event-triggered control in multi-agent systems. Proc. CSEE 2021, 41, 2701–2710.
9. Liu, T.F.; Wang, C.S.; Li, P. Distributed event-triggered formation control without Zeno behavior. IEEE Trans. Autom. Control 2021, 66, 6138–6145.
10. Qin, S.; Feng, Y.; Wang, J.; Liu, S.; Guo, X.; Qi, L. Optimization of circular disassembly lines with human-assisted robotic workstations using two-stage greedy PPO algorithm. IEEE Trans. Comput. Soc. Syst. 2026, 13, 2086–2098.
11. Chen, Z.; Pan, S.; Yu, K.; Wu, Y.; Gao, W.; Wang, Z.; Meng, X. Fusion control tracking strategy for autonomous vehicles: A fast PPO reinforcement learning based on attention mechanism and physical information. IEEE Trans. Intell. Transp. Syst. 2025, 26, 18906–18920.
12. An, H.; Wang, L. Robust topology generation of internet of things based on PPO algorithm using discrete action space. IEEE Trans. Ind. Inform. 2024, 20, 5406–5414.
13. Li, L.; Zhu, Y. Boosting on-policy actor–critic with shallow updates in critic. IEEE Trans. Neural Netw. Learn. Syst. 2025, 36, 5644–5653.
14. He, X.; Yang, Y.; Lee, J.; He, G.; Yan, Q. Deep reinforcement learning based AoI minimization for NOMA-enabled integrated satellite-terrestrial networks. IEEE Trans. Veh. Technol. 2025, 74, 3567–3572.
15. Li, J.; Li, H.; Xia, D.; Huang, T.; Zheng, L.; Ran, L.; Ji, L. Constraint-projection-based actor–critic algorithm for dynamic distributed economic dispatch with nonconvex nonsmooth cost. IEEE Trans. Control Netw. Syst. 2025, 12, 1102–1114.
16. Thalagala, S.; Wong, P.K.; Wang, X.; Sun, T. Broad critic deep actor reinforcement learning for continuous control. IEEE Trans. Neural Netw. Learn. Syst. 2025, 36, 17508–17515.
17. Zheng, Y.; Eryilmaz, A. State-independent control for constrained Markov decision processes with birth-death dynamics. IEEE Trans. Netw. 2025, 33, 1976–1988.
18. Cheng, X.-L.; Liu, K.-Z.; Wang, Y.-W.; Sun, X.-M. Safety-critical event-triggered control for networked control systems under quantization and time-varying delay. IEEE Trans. Control Netw. Syst. 2025, 12, 2219–2230.
19. Liu, Y.-F.; Zhang, C.-K.; Liu, Z.-Z.; Wan, X.; He, Y. Aperiodic sampling-based event-triggered H∞ control for interval type-2 fuzzy systems via a weakly constrained event-triggered functional. IEEE Trans. Fuzzy Syst. 2025, 33, 3447–3461.
20. Dolatabadi, S.H.; Ghorbanian, M.; Siano, P.; Hatziargyriou, N.D. An Enhanced IEEE 33 Bus Benchmark Test System for Distribution System Studies. IEEE Trans. Power Syst. 2021, 36, 2565–2572.

Figure 1. Intelligent operation chart of the new power system.

Figure 2. Simulink models based on multiple agents.

Figure 3. Multi-agent simulation operation results.

Figure 4. Architecture diagram of event-triggered mechanisms for multi-agent systems based on the PPO algorithm.

Figure 5. Event-triggered time distribution of multi-agent systems.

Figure 6. State convergence curves of six agents.

Figure 7. Error and dynamic threshold curves of each agent.

Figure 8. Settling time comparison between the proposed method and periodic sampling control.

Figure 9. Global consensus error comparison.

Table 1. Key parameters of the 33-node power system.

Parameter	Value	Parameter	Value
Nominal voltage	220 V	Total active load	3.715 MW
Allowed voltage range	198 V–242 V	Total reactive load	2.300 Mvar
Number of nodes	33	Simulation step size	0.1 s
Number of branches	32	Periodic sampling interval	0.15 s
Reference node	Node 1 (slack bus)	Event-trigger threshold	Exponential decay, initial = 15% of state range
Load type	Constant PQ load	Communication topology	Connected undirected graph (symmetric), consistent with the electrical topology
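
To make the threshold entry in Table 1 concrete, the following is a minimal sketch of how an exponentially decaying trigger threshold could be evaluated for one node. The voltage limits (which correspond to ±10% of the 220 V nominal value), the 15% initial fraction, and the 0.1 s step come from Table 1. The decay rate `DECAY_RATE`, the function names, and the simple rule "broadcast when the error since the last broadcast reaches the threshold" are assumptions made for illustration only and are not taken from the paper.

```python
import math

V_MIN, V_MAX = 198.0, 242.0  # allowed voltage range from Table 1, in volts
INITIAL_FRACTION = 0.15      # initial threshold = 15% of state range (Table 1)
STEP = 0.1                   # simulation step size from Table 1, in seconds
DECAY_RATE = 0.5             # ASSUMPTION: decay rate in 1/s, not given in Table 1


def threshold(t, decay_rate=DECAY_RATE):
    """Exponentially decaying trigger threshold (volts) at time t (seconds)."""
    state_range = V_MAX - V_MIN
    return INITIAL_FRACTION * state_range * math.exp(-decay_rate * t)


def should_trigger(current_v, last_sent_v, t):
    """ASSUMED rule: trigger when the error since the last broadcast
    reaches the current threshold."""
    return abs(current_v - last_sent_v) >= threshold(t)


def count_events(voltages):
    """Count broadcasts for one node over a voltage trajectory sampled every STEP."""
    last_sent = voltages[0]
    events = 0
    for k, v in enumerate(voltages[1:], start=1):
        if should_trigger(v, last_sent, k * STEP):
            last_sent = v  # the node broadcasts and resets its error
            events += 1
    return events
```

With these values the threshold starts at 15% of the 44 V range, that is 6.6 V, and shrinks over time, so a node with a constant voltage never broadcasts while a node that keeps drifting broadcasts more often as the threshold tightens. A different decay rate changes only how quickly that tightening happens.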
