Dynamic Model of Collaboration in Multi-Agent System Based on Evolutionary Game Theory

Multi-agent collaboration is of great importance for reducing the frequency of errors in message communication and enhancing the consistency of exchanged information. This study explores the evolutionary decision process and stable strategies among the agents of a multi-agent system, namely, followers, leaders, and loners, involved in collaboration based on evolutionary game theory (EGT). The main elements that affect the strategies are discussed, and a 3D evolution model is established. The evolutionary stability strategy (ESS) and the stability conditions are then analyzed. Numerical results obtained through MATLAB simulation show that leaders play an important role in exchanging information with other agents, accepting agents' state information, and sending messages to agents. Moreover, when followers are positive about receiving and feeding back messages, implementing message communication is profitable for the system, and high positivity can accelerate the exchange of information. At the behavioral level, reducing costs can strengthen the punishment for impeding the exchange of information and improve the positivity of collaboration, thereby facilitating evolutionary convergence toward the ideal state. Finally, the EGT results reveal that improving the possibility of collaboration between loners and others and increasing the rewards promote the implementation of message communication, which encourages leaders to send all messages, improves the feedback positivity of followers, and reduces the hindering degree of loners.


Introduction
A multi-agent system is an important branch of distributed artificial intelligence in which several independent agents are adopted to achieve common goals. These agents have the autonomous ability to coordinate with each other. In multi-agent systems, research on collaboration control has mainly involved tracking [1][2][3][4], formation [5][6][7], swarm [8][9][10], rendezvous [11], distributed filtering [12], and consistency [13,14]. Collaboration consistency means that the states of all agents tend toward the same value; it specifies the rules for interacting and transmitting information when agents cooperate with other agents and describes the process of information exchange between each agent and the others. When agents are able to deal with unpredictable situations and suddenly changing environments, the effectiveness of collaboration is reflected in reaching consensus on goals as the environment changes. Therefore, the agreement of multi-agents to achieve common goals is a primary condition for collaborative control.
In previous works, collaboration consistency was first applied to the problem of fusing uncertain information from multiple sensors in 1974 [15]. In the subsequent years, Borkar et al. [16,17] studied synchronous asymptotic consistency, which was adopted to investigate the decisions of distributed systems in the field of control theory. In 1995 [18], Vicsek et al. proposed a classical model, that is, a dispersion system of multi-agents moving in a plane, to simulate the phenomenon of particles presenting coherent behavior. Through the introduction of graph theory and matrix theory in 2003 [19], Jadbabaie explained the theory of consistency and found that the sets of agents' neighbors vary over time in the system. Subsequently, R. Olfati-Saber et al. [20][21][22] described a theoretical framework to address the consistency problem for dynamic systems. In 2010 [23], researchers studied the consistency and synchronization of multi-agent systems in complex networks. Over the last few decades, researchers have explored this collaboration from different aspects. Some focused on controlling groups of autonomous mobile vehicles to implement centralized and decentralized collaboration control [24]. Two basic leader-follower controllers were proposed to allow the followers to maintain a relative position and avoid collisions in front of obstacles. Different from other studies on leader-follower approaches, a recent article suggested that the orientation deviations of leader-followers be explicitly expressed in the model to successfully solve collaboration control when the agents move backward [25].
In previous studies, the agents' consistency has been investigated with simple integrators, whereas agents are complex in practical engineering applications. Such simplified models are not in line with the conditions of actual applications in complex and changeable environments.
In recent years, with the continuous efforts of researchers, the consistency of static and dynamic networks has been adopted in various fields to satisfy practical applications. In terms of the consistency of collaboration in formation control, the leader-follower strategy [26,27] designates some agents as leaders and the others as followers who track the position and direction of the leaders at a certain distance. Some researchers [28] investigated a leader-follower formation control model based on uncertain nonholonomic-wheeled mobile robots. They also showed that the leaders' signal can be smooth, feasible, or nonfeasible. Adopting the estimated states of a leader, they transformed formation errors into external oscillator states in an augmented system, whose additional control parameters overcome actuation difficulties and reduce formation errors. One article [29] addressed the problem of formation control based on the leader-follower model in 3D space, exploring the persistent excitation of the desired formation to achieve the exponential stabilization of the actual formation in terms of shape and scale. In general, designing these controllers to realize and describe the collaboration of agents is easy. However, considering the operating capability of different agents is difficult. Ignoring the perspective of global programming limits the effectiveness of the collaboration, which can be resolved by a distributed coordination approach. Then, for the consistency of collaboration, researchers [30,31] have investigated swarming motility in various networks. Tanner and Jadbabaie [32] proved the stability of swarm control and proposed a new consistency protocol to analyze the stability properties of mobile agents and stabilize their inter-agent distances, adopting decentralized, nearest-neighbor interaction rules for exchanging information. These switches introduce discontinuities into the control laws.
Nonsmooth analysis is used to accommodate arbitrary switching of the network of interactions. The main result shows that, regardless of switching, convergence to a common velocity vector is guaranteed as long as the network remains connected. Moreover, collaboration based on the evolutionary game has been analyzed thoroughly in small-world and scale-free networks [33][34][35]. Meanwhile, researchers have described the consistency of fixed and switched topologies in multi-agent systems [36][37][38], where each agent is a universal linear dynamic system or a linear model of a nonlinear network. Thus, a unified framework for complex networks is set up.
In summary, the existing studies only consider leader-follower interactions. In reality, environmental factors play an indispensable role in the exchange of information. Therefore, the interactions of three stakeholders involved in the collaboration should be investigated.
Hence, in response to this discussion, the process of evolutionary decision and stable strategies among three stakeholders, namely, followers, leaders, and loners, involved in the collaboration of a multi-agent system based on evolutionary game theory (EGT) is demonstrated. The main elements that affect the strategies of the agents are discussed, and the 3D replicator evolution equation is established to obtain the evolutionary stability strategy (ESS). Stable conditions are acquired through the theory of Lyapunov stability. The reasonability of the proposed mechanism is confirmed by simulation experiments. This research may help agents make optimal decisions and may provide theoretical guidance for agents to implement collaboration and adapt to complex environments. The contributions of this study are presented as follows: (1) We establish a tripartite dynamic evolution model of followers, leaders, and loners for collaboration. Different from previous game models, which involved only the two stakeholders of leaders and followers, this model effectively investigates the influence of factors and the exchange of information among the three stakeholders.
(2) The main parameters of the strategies are involved in feedback, sending, and receiving messages for three parties, namely, the followers, leaders, and loners, respectively; these parameters are analyzed in the simulation discussion. Moreover, other influential factors, including the degree of positivity and the possibility of interaction, are discussed in the game model. We figure out the evolutionary stable strategies (ESS) of agents under different stability conditions and scenarios.
(3) The simulation results indicate that when the possibility of collaboration between loners and others is improved and when the rewards are increased, the implementation of message communication can be promoted to encourage leaders to send all messages, improve the positivity of feedback for followers, and reduce the hindering degree of loners.
(4) Finally, conclusions are obtained and policy implications are put forward to offer guidance for actual applications.
The remainder of this paper is organized as follows: We describe the evolutionary game model in Section 2. Then, Section 3 illustrates the equilibrium points and the stability analysis. In Section 4, the simulation results and discussion are presented. Finally, Section 5 provides our conclusions and policy implications.

Model
In this section, the dynamic collaboration model based on evolutionary game theory is proposed. Then, the payoff matrix of the agents is obtained according to the parameters of the agents' behavior. In addition, the tripartite replication dynamic equation is derived.

Descriptions and Notes of the Parameters in a Multi-Agent System
In a multi-agent system, each agent can work by itself or in an environment and interact with other agents. Thus, mutually independent agents deal with complex problems in a coordinated approach to achieve a common goal. However, agents may be disturbed by external factors in a hostile environment when completing tasks, thereby resulting in the failure of normal communication. We assume that the interferential factors are described by the loners' behavior. As shown in Figure 1, in the multi-agent system model, different agents perform their own various tasks. Initially, leaders send messages to followers and loners. Then, the followers decide whether to provide feedback to the leaders after receiving messages while sending messages to the loners. Subsequently, the loners can select whether or not to receive messages. If the loners receive messages, then their destructive power decreases as they communicate with each other. Otherwise, their destructive power increases during the exchange of information in a changing environment.
All agents have the right to select their own decisions in the communication process. Therefore, the set of strategies for followers is {feedback, not feedback}. Regardless of whether all or partial messages are sent by the leaders, the followers receive messages and obtain payoffs. Then, the followers decide whether or not the processed information is fed back to the leaders. They can obtain rewards when feeding back to the leaders. Otherwise, they obtain nothing and incur no cost.

Figure 1. Multi-agent system model.
For leaders, the set of strategies is {all messages, partial messages}. They obtain payoffs P L1 with the cost of C L1 when sending all messages. When sending partial messages, they obtain payoffs P L2 under the cost of C L2 . Leaders can gain rewards R L as the feedback messages are received. We assume that P L1 is greater than P L2 and that C L1 is more than C L2 .
In terms of loners, the strategy set is {receive, not receive}. P z1 represents the payoff of loners receiving messages from followers with the cost of C z1 . The loners obtain payoffs P z2 when messages from the leaders are received under the cost of C z2 . R represents the payoff of unsuccessfully receiving messages. Interactive rewards I f and I L can be obtained as loners interact with the followers and leaders, respectively.
Other parameters and notes are described as follows: P f 1 represents the payoffs of receiving messages for followers at the cost of C f 1 . P f 2 indicates the payoffs that followers obtain as they send messages to loners with the cost of C f 2 . We adopt parameters α and β to describe the degrees of positivity for feedback and receiving, respectively. p represents the probability of successful sending, and γ is the interactive possibility of agents communicating with others. For interactive rewards, we stipulate that the value of I f is the same as that of I L . The specific parameters and notes are presented in Table 1.

Table 1. Parameter descriptions and notes.

P f 1 , C f 1 : Profits and costs of followers receiving messages, respectively (P f 1 > 0, C f 1 > 0).
P f 2 , C f 2 : Profits and costs of sending messages to loners, respectively.
R f , C f : Rewards and costs of followers sending feedback messages to leaders, respectively.
α : Positive degree of feedback to leaders (0 < α < 1).
β : Positive degree of receiving messages (0 < β < 1).
λ : Proportion of messages sent by leaders (λ = 1 indicates all messages; 0 < λ < 1 represents partial messages).
p : Probability of sending messages successfully (0 < p < 1).
P L1 , C L1 : Profits and costs of leaders sending all messages, respectively.
P L2 , C L2 : Profits and costs of leaders sending partial messages, respectively.
R L , C L : Rewards and costs of receiving feedback messages from followers, respectively (R L > 0, C L > 0).
P z1 , C z1 : Profits and costs of loners receiving messages from followers, respectively.
P z2 , C z2 : Profits and costs of loners receiving messages from leaders, respectively.
R : Profits of receiving messages unsuccessfully.
γ : Possibility of interaction when loners receive messages successfully (0 < γ < 1).
I f , I L : Rewards of interacting with followers and leaders, respectively.

Payoff Matrix of Agents
The proportions of strategies in the agents' population are denoted as follows: x (0 ≤ x ≤ 1) is the probability of followers feeding back messages, so the probability of nonfeedback is 1 − x. For leaders, y (0 ≤ y ≤ 1) represents the proportion of sending all messages, and 1 − y indicates the proportion of sending partial messages. In terms of loners, assuming that the probability of receiving messages is z, 1 − z denotes the proportion of not receiving messages. The corresponding payoff matrix is shown in Table 2.

Table 2. Payoff matrix of followers, leaders, and loners.
In accordance with the different strategies decided by the agents, the corresponding payoffs can be obtained. Fop, Lep, and Lop represent the payoffs of followers, leaders, and loners, respectively; their specific expressions follow from the payoff matrix in Table 2.

Replication Dynamic Equation of Agents
The expected revenue can be obtained from the payoff matrix. Let E x represent the expected payoff of followers when they feed back messages, and let E 1−x represent the expected payoff of nonfeedback; Ē x denotes the average expected payoff of the followers. Hence, the replicator dynamic equation of the followers is

F(x) = dx/dt = x(E x − Ē x ) = x(1 − x)(E x − E 1−x ).

Leaders obtain the expected payoff E y when they send all messages; similarly, E 1−y is the expected payoff of sending partial messages, and Ē y is the average expected payoff. Therefore, the dynamic equation of the leaders can be expressed as

F(y) = dy/dt = y(E y − Ē y ) = y(1 − y)(E y − E 1−y ).

As loners receive messages, they obtain the expected payoff E z ; similarly, E 1−z indicates the expected payoff of not receiving, and Ē z is the average expected payoff. The dynamic equation of the loners can be expressed as

F(z) = dz/dt = z(E z − Ē z ) = z(1 − z)(E z − E 1−z ).

Finally, the 3D dynamic equations of the system are given jointly by the three replicator dynamic equations of the followers, leaders, and loners.
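As an illustration, the 3D replicator system can be sketched in Python. The payoff differences `d_f`, `d_l`, and `d_z` below are simplified, state-independent stand-ins assembled from the stability expressions used later in this paper (αR f − C f for followers, p(P L1 − P L2 ) + (C L2 − C L1 ) for leaders, and λ[β(P z1 + P z2 ) − (C z1 + C z2 )] − R + γ(I f + I L ) for loners); the full expected payoffs depend on (x, y, z), so this is a hedged sketch rather than the exact system.

```python
import numpy as np

# Hedged sketch of the 3D replicator dynamics. The advantages d_f, d_l, d_z
# are simplified, constant stand-ins for the expected-payoff differences.
def replicator_rhs(state, params):
    """Return (dx/dt, dy/dt, dz/dt) for followers, leaders, and loners."""
    x, y, z = state
    d_f = params["alpha"] * params["R_f"] - params["C_f"]
    d_l = (params["p"] * (params["P_L1"] - params["P_L2"])
           + (params["C_L2"] - params["C_L1"]))
    d_z = (params["lam"] * (params["beta"] * (params["P_z1"] + params["P_z2"])
                            - (params["C_z1"] + params["C_z2"]))
           - params["R"] + params["gamma"] * (params["I_f"] + params["I_L"]))
    return np.array([x * (1 - x) * d_f,
                     y * (1 - y) * d_l,
                     z * (1 - z) * d_z])
```

With the Scenario 8 parameter values, for example, d_f = 0.5 · 20 − 5 = 5 and d_l = 0.5 · 10 − 2 = 3 are both positive, so in this reduction the feedback and full-sending shares grow at any interior state.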
According to the method of Friedman [39,40], x is an evolutionarily stable strategy if F(x) = 0 and F′(x) < 0. For convenient calculation, Jacobian matrix analysis is adopted to study the stability of the system. The Jacobian matrix of the system is composed of the partial derivatives of F(x), F(y), and F(z) with respect to x, y, and z. Initially, the equilibrium points are substituted into the Jacobian matrix to obtain the eigenvalues. Then, whether each equilibrium point is stable is judged according to Table 3. Specifically, the eigenvalue A 1 of p 9 and p 10 is obtained in the same manner.

Table 3. Stability analysis of equilibrium points.
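A numerical version of this eigenvalue test can be prototyped as follows. The helper names `jacobian` and `classify_corners` are ours, and the Jacobian is approximated by central differences (the paper derives it analytically); a pure-strategy equilibrium is labeled a candidate ESS when every eigenvalue has a negative real part, per Lyapunov's indirect method.

```python
import numpy as np
from itertools import product

def jacobian(f, point, eps=1e-6):
    """Central-difference approximation of the Jacobian of f at point."""
    point = np.asarray(point, dtype=float)
    n = point.size
    J = np.zeros((n, n))
    for j in range(n):
        step = np.zeros(n)
        step[j] = eps
        J[:, j] = (f(point + step) - f(point - step)) / (2 * eps)
    return J

def classify_corners(f):
    """Label each pure-strategy point (x, y, z) in {0, 1}^3 by stability."""
    labels = {}
    for pt in product([0.0, 1.0], repeat=3):
        eig = np.linalg.eigvals(jacobian(f, pt))
        labels[pt] = "ESS" if np.all(eig.real < 0) else "unstable/saddle"
    return labels
```

For the simplified dynamics s′ = s(1 − s)·d with constant advantages d, the corner that sets every component with d > 0 to 1 and every component with d < 0 to 0 comes out as the unique ESS.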

Simulation Results and Discussion
The replicator dynamic (RD) equation and the evolutionarily stable strategy (ESS) constitute the core concepts of evolutionary game theory. They describe the dynamic process of convergence to the steady state of the evolutionary game. In the RD equations, the time step t enters through the derivatives of the dynamic system of followers, leaders, and loners. Simulation experiments are carried out with different parameters to demonstrate their influence on the convergence rate under the constraint conditions of the ESS. The length of time is set to 30. Owing to the agents' different abilities, leaders with high comprehensive ability and loners with weak cooperation ability are in the minority. At the initial time of the dynamic evolution, that is, t = 0, we assume that the proportion of followers x equals 0.5, the proportion of leaders y equals 0.3, and the proportion of loners z equals 0.2. The evolutionary results are denoted in Figure 2.
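For reference, this setup can be reproduced with a simple forward-Euler integration. This is a sketch under the same simplification as before: `d_f`, `d_l`, and `d_z` are constant payoff advantages, which is our illustrative assumption rather than the paper's full model (the paper integrates the complete equations in MATLAB).

```python
import numpy as np

def evolve(d_f, d_l, d_z, state0=(0.5, 0.3, 0.2), t_end=30.0, dt=0.01):
    """Integrate ds_i/dt = s_i (1 - s_i) d_i from the paper's initial state."""
    state = np.array(state0, dtype=float)
    d = np.array([d_f, d_l, d_z], dtype=float)
    traj = [state.copy()]
    for _ in range(round(t_end / dt)):
        state += dt * state * (1 - state) * d
        traj.append(state.copy())
    return np.array(traj)
```

With all three advantages positive, the proportions approach 1 well within the 30-unit horizon; with all three negative, they decay to 0, matching the two limiting scenarios discussed in this section.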

Scenarios of Different Parameters with Constraint Conditions in the Equilibrium
For αR f − C f < 0, p(P L1 − P L2 ) + (C L2 − C L1 ) < 0, and λ[β(P z1 + P z2 ) − (C z1 + C z2 )] < R − γ(I f + I L ), we find that the cost of tasks completed together is higher than the payoffs in the multi-agent system, thereby leading the probabilities of feedback, sending, and receiving to tend to zero over time. If the leaders do not send messages, then the followers and loners will not receive and feed back messages. This phenomenon is not conducive to the collaboration and interaction of the system.

Figure 3. Under the condition of αR f − C f < 0, p(P L1 − P L2 ) + (C L2 − C L1 ) < 0, and λ[β(P z1 + P z2 ) − (C z1 + C z2 )] > R − γ(I f + I L ), the payoffs of receiving P z1 , P z2 and the possibility of interaction γ are improved for loners, compared with their initial values. Hence, the probability of receiving tends to 1 for loners. Therefore, the willingness to receive messages is enhanced as the payoffs of receiving and the possibility of interaction are improved. When the loners are willing to receive messages, their damage is reduced; thus, the ability to collaborate is enhanced in the multi-agent system.

For the condition of αR f − C f > 0 and λ[β(P z1 + P z2 ) − (C z1 + C z2 )] < R − γ(I f + I L ), the positivity of feedback α is improved for followers, compared with the initial values. Hence, the probability of the feedback strategy tends to 1 for followers. That is, the followers can track leaders and share information with each other in real time, strengthening their ability to cooperate with each other in the multi-agent system. The possibility that agents are influenced by others with the power to destroy collaboration is reduced; thus, the ability to collaborate is enhanced in the system.

For the condition of αR f − C f > 0, p(P L1 − P L2 ) + (C L2 − C L1 ) > 0, and λ[β(P z1 + P z2 ) − (C z1 + C z2 )] < R − γ(I f + I L ), the probability of successful sending p and the payoffs P L1 , P L2 are improved for leaders, compared with the parameter values at point p 4 (1, 0, 0). Hence, the probability of the sending strategy also tends to 1; that is, sending accurate messages is a prerequisite of successful communication in the multi-agent system.

For the condition of αR f − C f > 0 and λ[β(P z1 + P z2 ) − (C z1 + C z2 )] > R − γ(I f + I L ), the positivity of feedback α is improved for followers, and the payoff of unsuccessful receiving R decreases, compared with the parameter values at p 2 (0, 0, 1). Hence, the probability of the feedback strategy also tends to 1 for the followers; that is, the followers can track leaders and share information with each other in real time, strengthening the ability of agents to cooperate with each other. The damage of loners is reduced owing to the decrease in R; thus, the ability to collaborate is enhanced in the multi-agent system. Figure 6 shows the difference in graphs.
For the next scenario, the positivity of feedback α and the rewards R f are decreased for the followers, compared with the parameter values at point p 5 (1, 1, 0). Meanwhile, the payoffs of sending P L1 , P L2 are reduced for leaders. Hence, the probability of the feedback strategy tends to 0, and the convergence speed of the receiving strategy declines. In this scenario, the leaders do not send messages on time, and the followers are inactive in feedback, resulting in delayed collaboration and tracking errors in the multi-agent system.

For the condition of αR f − C f < 0, p(P L1 − P L2 ) + (C L2 − C L1 ) < 0, and λ[β(P z1 + P z2 ) − (C z1 + C z2 )] > R − γ(I f + I L ), the payoffs of receiving P z1 , P z2 are improved for the loners, compared with the parameter values at point p 6 (0, 1, 0). Hence, the probability of the receiving strategy also tends to 1. In this scenario, the loners can receive messages and promote the probability of interaction, reducing the destructive possibility of tracking to cooperate in the multi-agent system.

Scenario 8
In terms of the stability conditions of p 8 (1, 1, 1), the parameter values are λ = 0.5, α = 0.5, β = 0.2, γ = 0.1, p = 0.5, R f = 20, C f = 5, P L1 = 20, P L2 = 10, C L1 = 3, C L2 = 1, R L = 20, C L = 5, P z1 = 20, P z2 = 20, C z1 = 1, C z2 = 1, I f = 15, I L = 15, R = 10. The probability of strategies remains unchanged. The simulation results are shown in Figure 9. Under the conditions of αR f − C f > 0, p(P L1 − P L2 ) + (C L2 − C L1 ) > 0, and λ[β(P z1 + P z2 ) − (C z1 + C z2 )] > R − γ(I f + I L ), the payoffs of feedback R f and the positivity of feedback α are improved for the followers, compared with the parameter values at point p 8 (1, 1, 1). Hence, the probability of the feedback strategy also tends to 1. In this scenario, the convergence speed of the loners decreases and that of the followers and leaders increases, enhancing the communication and cooperation capabilities of target tracking in the multi-agent system.

Impacts of Different Parameters on the Evolutionary Results
This analysis indicates that point p 8 (1, 1, 1) is the ideal ESS among the eight equilibrium points. Although the initial values of the parameters do not affect the evolutionary results, they can affect the speed of convergence. Subsequently, we investigate the effects of parameters such as the proportion of messages λ, the positivity of feedback α and of receiving β, the possibility of interaction γ, and the probability of successful sending p on mutual cooperation. The evolutionary results for λ, α, β, γ, and p are shown in the following sections.

Influence of Parameter λ on Dynamic Evolution
Leaders send messages with a certain proportion, which can affect the accuracy of communication throughout the system. Sending all messages provides the basic guarantee for target tracking. On the contrary, lack of information leads to the deviation in tracking and affects the feedback of followers. Hence, exploring the effect of parameter λ on followers, leaders, and loners is necessary. The other parameters are set as follows: α = 0.5, β = 0.2, γ = 0.1, p = 0.5, R f = 20, C f = 5, P L1 = 20, P L2 = 10, C L1 = 3, C L2 = 1, R L = 20, C L = 5, P z1 = 20, P z2 = 20, C z1 = 1, C z2 = 1, I f = 15, I L = 15, R = 10. When λ is 0.2, 0.5, and 0.8, the simulation results are as shown in Figure 10.
For the followers, as λ increases, the probability of feedback remains unchanged, but the convergence speed increases under different proportions of feedback. Sending all messages can enhance the leaders' performance and motivate the effectiveness of the followers' feedback. With the increase in λ, the probability and convergence speed of sending are unchanged, and λ does not affect the probability and convergence speed of receiving for the loners.

Influence of Parameter α on Dynamic Evolution
When the positivity of feedback α of the followers increases, the probability of feedback builds up from 0 to 1; thus, α can affect the followers' selection of strategy. However, the probability of sending is not affected by the increase in α as it approaches 0. The convergence speed of sending is improved when the value of α increases. For the loners, with the increase in α, the probability of receiving z remains unchanged, and z tends to 0 under different α. Thus, the positivity of feedback evidently affects the dynamic evolution.

Influence of Parameter β on Dynamic Evolution
The positivity of receiving messages in a certain proportion can affect the accuracy of communication throughout the system. In fact, if messages are received by the leaders and loners, then the basic guarantee for target tracking is provided. On the contrary, a lack of received information leads to deviation in tracking and affects the feedback of followers. Hence, exploring the effect of parameter β on followers, leaders, and loners is necessary. As β increases, the probability of feedback remains unchanged, but the probabilities of sending y and receiving z increase from 0 to 1. Receiving feedback messages and transmitting messages can enhance the system's performance and motivate the effectiveness of feedback, sending, and receiving. With the increase in β, the probability and the convergence speed of sending of the followers are unchanged. In terms of loners and leaders, β does not affect the convergence and the speed of receiving and sending. Figure 13 illustrates the impact of parameter γ on the evolutionary results for different agents in the multi-agent system. Other parameters are assumed as follows: λ = 0.5, α = 0.5, β = 0.2, p = 0.5, R f = 20, C f = 5, P L1 = 20, P L2 = 10, C L1 = 3, C L2 = 1, R L = 20, C L = 5, P z1 = 20, P z2 = 20, C z1 = 1, C z2 = 1, I f = 15, I L = 15, R = 10. The values of γ are set to 0.2, 0.5, and 0.8, and the simulation results are shown in the following section.

Influence of Parameter γ on Dynamic Evolution
When the possibility of interaction γ increases, the probability of feedback and sending remains unchanged. We found that γ can affect the selection of strategy for loners. The probability of receiving increases from 0 to 1 with the increase in γ, and its convergence speed also increases. The dynamic evolution of collaboration is enhanced with the increase in interactive possibilities in multi-agent systems. In summary, as the possibility of interaction γ increases, the positivity of receiving messages is enhanced. Therefore, this condition is favorable for the agents when communicating with each other.

Influence of Parameter p on Dynamic Evolution
The probability of successful sending in different proportions can affect the accuracy of communication throughout the system. Hence, exploring the effect of parameter p on the system is necessary. The other parameters are λ = 0.5, α = 0.5, β = 0.2, γ = 0.1, R_f = 20, C_f = 5, P_L1 = 20, P_L2 = 10, C_L1 = 3, C_L2 = 1, R_L = 20, C_L = 5, P_z1 = 20, P_z2 = 20, C_z1 = 1, C_z2 = 1, I_f = 15, I_L = 15, R = 10. When p is 0.2, 0.5, and 0.8, the simulation results are shown in Figure 14. With the increase in p, the probability of receiving z remains unchanged, but the probability of sending y increases from 0 to 1, and the convergence speed of feedback and sending is improved. In fact, if the messages are sent successfully by the leaders, then the basic guarantee for target tracking is provided; on the contrary, a lack of sent information leads to deviation in tracking and affects the feedback. As p increases, the convergence speed of feedback increases for the followers. In terms of the loners, p does not affect the convergence speed of receiving. In summary, as the probability of successful sending p increases, the positivity of feedback and sending messages is enhanced, which facilitates communication between agents.
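The effect of p on convergence speed can likewise be illustrated with a hedged one-dimensional sketch. The dynamic below is an assumption for illustration only (p simply scales the leaders' payoff advantage of sending), not the paper's actual replication dynamic equation; it reproduces the qualitative finding that a larger p speeds up the convergence of y toward 1.

```python
# Hypothetical sketch: how the success probability p of sending changes
# the convergence speed of the leaders' "send all messages" strategy y.
# The payoff advantage (p - 0.1) is a placeholder, not the paper's equation.

def time_to_converge(p, dt=0.01, threshold=0.95, max_steps=100_000):
    y = 0.3                                  # initial probability of sending
    for step in range(max_steps):
        y += dt * y * (1 - y) * (p - 0.1)    # dy/dt = y(1-y)(p - 0.1)
        if y >= threshold:
            return step * dt                 # time at which y crosses 0.95
    return float("inf")                      # no convergence on this horizon

if __name__ == "__main__":
    for p in (0.2, 0.5, 0.8):
        print(f"p={p}: convergence time ~ {time_to_converge(p):.1f}")
```

Running the sweep shows the convergence time shrinking monotonically as p grows, consistent with the faster convergence of sending observed in Figure 14.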

Conclusions and Policy Enlightenment
In summary, we have demonstrated the evolution of collaboration based on evolutionary games. This study initially develops the model of different strategies, different parameters, and interaction in a multi-agent system of followers, leaders, and loners. Subsequently, after setting up the replication dynamic equations of the different roles, equilibrium points are obtained to confirm the constraint conditions of the evolutionarily stable strategy. Then, the research focuses on the influence of strategies and parameters on the dynamic evolutionary results of collaboration in different scenarios. The simulation results indicate that to obtain optimal results, followers should feed back messages to leaders positively while receiving messages from the leaders and transmitting messages to the loners; the leaders should send all messages to the followers and loners; and the loners should receive the messages from the followers and leaders. In fact, the results show that the leaders play an important role in the collaboration. If all messages are sent by the leaders successfully, then they provide a basic guarantee for the accurate exchange of information. On the contrary, failing to send all information leads to a deviation in tracking and an impact on feedback. In a multi-agent system, collaboration is key to ensuring that all agents harmoniously form a unified whole in the expected manner.
In addition, according to the simulation results, we found that the consistency of collaboration is in an optimal state when the stakeholders agree to achieve a common goal of exchanging information. That is, leaders send all messages, followers feed back messages to leaders in time, and loners receive messages positively, as shown in Figure 9. This result demonstrates that our proposed model is reasonable and effective.
This study elucidates some policy implications of the model realized by the followers, leaders, and loners for collaboration in evolutionary games. The probability of each strategy is affected by its obtained payoffs and costs in the system. Therefore, promoting the agents' payoffs is necessary to enhance the positivity of interaction while decreasing the costs of communicating with each other. For the followers, the payoffs of receiving and transmitting should be increased to motivate the positivity of feeding back messages; accordingly, improving the rewards of feedback in the communication process between followers and leaders is reasonable. For the leaders, sending all messages is necessary to provide the basis of mutual communication. On the contrary, if only partial messages are sent, then integral communication is affected in the multi-agent system. Meanwhile, reducing the costs of sending can enhance interaction with other agents, and the probability of successfully sending messages should be improved to provide a basic guarantee for collaboration, ensuring that all agents receive messages. For the loners, increasing the possibility of interaction with followers and leaders is most important for reducing hindrance to the system. That is, as much interaction as possible between loners and others is a good decision, ensuring that all agents harmoniously form a unified whole in the expected manner. Then, through cooperation between agents, the basic capabilities of each agent are improved, and their social behavior can be further understood from the interaction of the agents. In summary, in a dynamic and open environment, agents with different goals must coordinate their goals and resources. During conflicts between resources and goals, if the coordination of agents fails to reach a better situation, then a deadlock occurs, leaving the agents unable to carry out their next step of work.
On the contrary, if all agents can reach an agreement on collaboration in a multi-agent system, then the exchange of information is enhanced, thereby improving cooperation.