Coordination of Multiple Autonomous Agents Using Naturally Generated Languages in Task Planning

Abstract: Language plays a prominent role in the activities of human beings and other intelligent creatures, and one of its most important functions is communication. Inspired by this, we attempt to develop a novel language for cooperation between artificial agents. The language generation problem has been studied earlier in the context of evolutionary games in computational linguistics. In this paper, we take a different approach by formulating it in the computational model of rationality in a multi-agent planning setting. This paper comprises three main parts. First, we present a language generation problem that is connected to state abstraction and introduce several properties of such languages. Second, we give the sufficient and necessary conditions of a valid abstraction with proofs and develop an efficient algorithm to construct the languages, in which several words are generated naturally. The sentences composed of these words can be used by agents to regulate their behaviors during task planning. Finally, we conduct several experiments to evaluate the benefits of the languages in a variety of scenarios of a path-planning domain. The empirical results demonstrate that our languages lead to reductions in both communication cost and behavior restriction.


Introduction
Compared with single-agent systems, multi-agent systems are distributed in time, space, and function, and have several advantages in task applicability, execution efficiency, and system robustness [1]. Real-world applications of multi-agent systems include logistics [2], construction [3], search and rescue [4], warehouse automation [5], infrastructure placement [6], computer animation [7], etc. Due to the lack of complete knowledge, agents usually need to exchange their states, actions, or goals to collectively carry out system tasks. Consequently, a communication language or protocol must be predefined to convey cooperative strategies when designing multi-agent systems.
Two methods are commonly used to construct languages for agent communication. One is to design a certain artificial language for agents [8,9]. The other is to let agents communicate in natural languages [10,11]. Most studies on multi-agent planning and distributed control use the former method to exchange messages. The latter approach helps human partners understand the behavior of agents. However, agents must then maintain two different internal representations, their own and that of humans, which can be counterproductive. In fact, it is not necessary for agents to use human languages in situations where only artificial agents exist. In this paper, we try to create the agents' own languages, which can be used to coordinate them. The languages are not predefined case by case but are naturally generated based on the abstraction of the agents' states in the environment.
Our work is motivated by a fundamental question from agent coordination: what kind of information do agents need to communicate? We set out to answer this question by considering a particular situation in which communication is only available in the task planning phase. In this case, our language generation problem is connected to the work on the origin and evolution of natural languages, which has been studied earlier in evolutionary and computational linguistics [29,30]. Most prior works studied language evolution in the context of evolutionary games [31], for example, the talking heads experiment [32]. In particular, a study [33] suggests a relation between natural languages and a hidden planning language that preexists in the mind. While our work addresses a similar problem, we take a step further by formulating the language generation problem in the computational model of rationality. It is worth noting that language evolution is quite a complicated problem that researchers from many disciplines continue to study. In this work, we study the language generation problem in a limited cooperative multi-agent planning setting and hope that our work can shed some light on it.
We have studied the language generation problem for planning agents from different perspectives. The general idea and framework for language construction based on state abstraction were introduced in [34]. In this paper, we extend the formulation to a multi-agent setting and present a different kind of language. Additionally, the work [35] studies the minimal language generation problem for optimal planning; however, that language is constructed and abstracted based on predefined perception symbols. The language in this work is naturally generated and depends only on the planning domain.

Multi-Agent Planning Model
For simplicity, we consider a planning setting with multiple agents that act with complete information and deterministic actions. The multi-agent planning problem can be modeled as M = (O, A, I, G), an extension of the STRIPS model [36], where O is a finite set of propositional variables and A is the set of joint actions of the agents. Each action a ∈ A is given by its preconditions Pre(a) ⊆ O, add effects Add(a) ⊆ O, and delete effects Del(a) ⊆ O. The task of a planning domain is specified by (I, G), where I ⊆ O and G ⊆ O denote the initial state and the goal condition, respectively.
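The model above can be sketched in a few lines of code. The following is a minimal illustration of STRIPS-style action applicability and application; all class and function names are ours for illustration, not from the paper:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Action:
    name: str
    pre: frozenset     # Pre(a) ⊆ O: propositions that must hold
    add: frozenset     # Add(a) ⊆ O: propositions made true
    delete: frozenset  # Del(a) ⊆ O: propositions made false

def applicable(state, action):
    """An action is applicable when its preconditions hold in the state."""
    return action.pre <= state

def apply_action(state, action):
    """Successor state: remove delete effects, then add add effects."""
    assert applicable(state, action)
    return (state - action.delete) | action.add

# Toy example: a single "move" action between two locations.
move = Action("move-1-2", frozenset({"at-1"}), frozenset({"at-2"}), frozenset({"at-1"}))
s0 = frozenset({"at-1"})
s1 = apply_action(s0, move)
assert s1 == frozenset({"at-2"})
```

In the multi-agent setting of the paper, each `Action` would be a joint action and each state a joint state of all agents.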
Given a planning problem, a solution, or plan, is a sequence of states connected by the agents' actions that leads the initial state to a goal state. Generally, the cost of actions is measured by a cost function, and a solution is optimal if it incurs the lowest action cost.

Assumptions
In this work, we make the following assumptions:
• Agents are cooperative, so they can jointly carry out a task that cannot be done by a single agent; they are also rational and perform the task at the least cost;
• Observation and communication are only available before task execution, and agents perform the task synchronously at each step;
• All agents understand the constructed language.
For example, suppose two agents, agent 1 and agent 2, are assigned a task for which there are only two optimal plans, p A and p B . Without communication, agent 1 may act following plan p A while agent 2 acts according to plan p B . As a result, they execute the task sub-optimally or fail.

Definition 2 (Language). Given the state set S of a multi-agent planning domain M, where s ∈ S is the joint state of the agents, a language for the domain is a tuple L = (W, R), where W is a set of words and R is the concatenation operator, denoted "#", which combines words. Each word w ∈ W denotes an abstraction of joint states and is a subset of S.
A sentence is a sequence of words combined by operators. In this paper, we only consider the concatenation operator, so a sentence comprising n words has the form w1#w2#...#wn. For planning agents, to avoid potential conflicts or improve team performance, a sentence sent as a communicative message is treated as a constraint on the agents' actions. Since agents are rational, the receiver agents choose an optimal plan that is consistent with the communicated sentence. Please note that in this work, the plans specified by a sentence refer to the optimal plans that satisfy the constraint of the sentence, and "a plan" always means a joint optimal plan.
We give an intuitive example of the language defined above. For a multi-agent task instance (Si, Sg), there are three optimal plans:

P1: Si, S11, S12, S13, S14, Sg
P2: Si, S21, S22, S23, S24, Sg
P3: Si, S31, S32, S33, S34, Sg

where RC is present between P1 and P2 or P3, and no RC exists between P2 and P3. One valid language is L = {W1 = (S14, S22, S31), W2 = (S11, S23, S33)}. For instance, the sentence W2#W1 means that the agents must be in a grounding state of W2 at some moment and in a grounding state of W1 at a later moment of task execution. Thus, the state sequence of plans specified by W2#W1 has the form

Si, ..., S11 or S23 or S33, ..., S14 or S22 or S31, ..., Sg

where "..." denotes an omitted state sequence that may be empty. Here, only P1 is consistent with this state sequence, i.e., W2#W1 specifies P1. Similarly, the sentence W1#W2 specifies a plan set comprising P2 and P3.
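The "consistent with a sentence" relation can be made precise with a small sketch: a plan satisfies a sentence if it passes through a grounding state of each word, in the given order. The function name and state encodings below are illustrative:

```python
def consistent(plan, sentence):
    """Check whether a plan (state sequence) satisfies a sentence
    (list of words, each word a set of grounding states), in order."""
    i = 0
    for word in sentence:
        # advance until the plan reaches a grounding state of this word
        while i < len(plan) and plan[i] not in word:
            i += 1
        if i == len(plan):
            return False
        i += 1  # later words must be matched at strictly later steps
    return True

# The three optimal plans and language from the example above.
P1 = ["Si", "S11", "S12", "S13", "S14", "Sg"]
P2 = ["Si", "S21", "S22", "S23", "S24", "Sg"]
P3 = ["Si", "S31", "S32", "S33", "S34", "Sg"]
W1 = {"S14", "S22", "S31"}
W2 = {"S11", "S23", "S33"}

assert consistent(P1, [W2, W1])          # W2#W1 specifies P1 ...
assert not consistent(P2, [W2, W1])      # ... and excludes P2
assert consistent(P2, [W1, W2]) and consistent(P3, [W1, W2])
```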

Definition 3 (Language Generation Problem, LGP). Given a multi-agent planning model M, find a language L that can be used to resolve the RC for any given task.

Definition 4 (Sentence Generation Problem, SGP). Given a multi-agent planning model M, a language L, and a task, find a sentence that specifies a plan set without RC.
Specifically, the LGP concerns how to construct a language, and the SGP concerns how to use the language to remove RC. As shown in Figure 1, words are constructed based on the states and tasks of the environment in which the agents act. This optimization process runs offline. After that, given the words and a task, sentences are generated online by agents to communicate with their teammates. Next, we list a few language properties that may be required:
• Optimality: a language is optimal if the plans it specifies are also optimal.
• Completeness: a language is complete if it can specify a plan set that includes any plan.
• Minimality: a language is minimal if its number of words is the smallest.
• Globality: a language is global if it can describe whole plan sequences.
• Locality: a language is local if it only expresses partial specifications of plan sequences.
In our prior work, we studied the generation of complete global languages [34] and of minimal optimal languages [35]. In this work, we generate languages that are complete and local. In Section 5, we compare the performances of these languages.
The goal of this paper is to construct languages that could help agents to optimally accomplish tasks. In detail, we try to design an efficient algorithm to obtain valid state abstractions for multi-agent planning domains.

Approach
We know that a language is related to an abstraction of states. A language is useful only when it can distinguish the plans for which RC exists. In this section, we introduce the language generation and communication processes. First, the sufficient and necessary conditions for valid abstraction are provided with the proofs. Second, we propose an efficient algorithm to obtain such abstractions. Finally, we introduce the coordination process of languages.

Conditions
Theorem 1. The state abstraction for a given domain is valid if and only if the following holds: for any task where RC is present between plans p1 and p2, let b1 and b2 be the abstracted word sequences of p1 and p2, respectively. Then the following four conditions must be true: (1) b1 is not void; (2) b2 is not void; (3) b1 is not a subsequence of b2; (4) b2 is not a subsequence of b1. Proof of Theorem 1. Sufficiency: by the definition of sentences, the two plan sets specified by sentences b1 and b2 have no elements in common when the four conditions above hold, i.e., b1 and b2 can specify plans that do not introduce RC. Necessity: if b1 and b2 can separately specify p1 and p2, then b1 and b2 are two different sentences and do not express p2 and p1, respectively. By contradiction, if b1 (b2) could express p2 (p1), then b1 (b2) would have to be equal to, or a partial specification of, the word sequence b2 (b1) of p2 (p1), which violates the conditions above. Hence the conditions follow.
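The conditions of Theorem 1 admit a direct mechanical check. The sketch below is our reconstruction from the proof (the condition list and all names are ours); words are compared as opaque identifiers:

```python
def is_subsequence(short, long):
    """True if `short` appears in `long` in order (not necessarily contiguously)."""
    it = iter(long)
    return all(any(w == x for x in it) for w in short)

def valid_abstraction(b1, b2):
    """Check the four conditions of Theorem 1 for the abstracted word
    sequences b1, b2 of an RC plan pair: both are non-void, and neither
    contains the other as a subsequence (equality is a special case)."""
    return (len(b1) > 0 and len(b2) > 0
            and not is_subsequence(b1, b2)
            and not is_subsequence(b2, b1))

assert valid_abstraction(["W5", "W2"], ["W6", "W3"])
assert not valid_abstraction(["W3"], ["W6", "W3"])  # W3 is a subsequence
assert not valid_abstraction([], ["W6"])            # void sequence
```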
From the theorem, if we want to construct a useful language, we must ensure that the above conditions are satisfied during the language generation.

General Idea
Since the language is used to specify all optimal plans for RC tasks, it should be able to describe every plan sequence. Therefore, we first seek a smaller state set that could distinguish all plan pairs in which RC is present. Then, we generate a language by abstracting the states of the set while ensuring the conditions in Theorem 1 are true.

Algorithm
The language generation process consists of four procedures: finding plan pairs, finding the state set, finding state pairs, and word generation.
Finding plan pairs: For each task t, we find the optimal plans by a modified A* algorithm. The standard A* algorithm [37] stops expanding nodes when the minimum cost estimate of the explored nodes is equal to or greater than the cost of the goal. The modified A* algorithm continues the search until the estimate exceeds the goal cost, so that all optimal plans are found. For any two plans, we put them into the plan-pair set P s if RC is present between them.
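The enumeration of all optimal plans can be sketched as follows. With a zero heuristic, the modified A* above reduces to a uniform-cost search that is not cut off at the first goal; the two-phase formulation below (shortest distances first, then collection of all cost-optimal paths) is our illustrative equivalent, and `neighbors` is a hypothetical callback yielding (successor, cost) pairs:

```python
import heapq

def all_optimal_plans(start, goal, neighbors):
    """Enumerate every optimal plan from start to goal."""
    # Phase 1: optimal cost to every reachable state (Dijkstra).
    dist = {start: 0}
    heap = [(0, start)]
    while heap:
        d, s = heapq.heappop(heap)
        if d > dist.get(s, float("inf")):
            continue
        for t, c in neighbors(s):
            if d + c < dist.get(t, float("inf")):
                dist[t] = d + c
                heapq.heappush(heap, (d + c, t))
    best = dist[goal]
    # Phase 2: follow only "tight" edges (dist[t] == dist[s] + c);
    # every tight path reaching the goal has exactly the optimal cost.
    plans, stack = [], [(start, [start], 0)]
    while stack:
        s, path, d = stack.pop()
        if s == goal and d == best:
            plans.append(path)
            continue
        for t, c in neighbors(s):
            if dist.get(t, float("inf")) == d + c:
                stack.append((t, path + [t], d + c))
    return plans

# Tiny graph fragment with two equal-cost routes from 'a' to 'd'.
G = {"a": [("b", 1), ("c", 1)], "b": [("d", 1)], "c": [("d", 1)], "d": []}
assert sorted(all_optimal_plans("a", "d", lambda s: G[s])) == \
    [["a", "b", "d"], ["a", "c", "d"]]
```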
Finding state set: With P s , we need to find a state set S m that can specify any plan pair of P s . To achieve this, at least one distinguishing state must appear in each of the two plan sequences. Here, we do not intend to find the minimal language, so we only need a state set that is approximately the smallest. First, we initialize the state set S m as void. Then, for each plan pair (x, y) in P s , we remove their common states. For each plan, if no state in its remaining sequence is an element of S m , we add the first such state of the plan to S m ; otherwise, we move on to the next plan pair.
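This greedy step can be sketched as follows; "first state" is read here as the first state remaining after common states are removed, and all names are illustrative:

```python
def find_state_set(plan_pairs):
    """Greedy approximation of the 'finding state set' step: for each RC
    plan pair, drop shared states and ensure each plan contributes at
    least one distinguishing state to S_m."""
    S_m = set()
    for x, y in plan_pairs:
        common = set(x) & set(y)
        for plan in (x, y):
            distinct = [s for s in plan if s not in common]
            if not distinct:
                continue  # plans identical up to common states
            if not any(s in S_m for s in distinct):
                S_m.add(distinct[0])  # add the first distinguishing state
    return S_m

pairs = [(["Si", "S11", "Sg"], ["Si", "S21", "Sg"])]
assert find_state_set(pairs) == {"S11", "S21"}
```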
Finding state pairs: State pairs denote two states that cannot be abstracted as the same word. First, we initialize the state-pair set C s as void. For each plan pair (x, y) in P s , we remove the states that do not appear in S m and obtain the reduced plan sequences. If the two reduced sequences have different lengths, we cut the longer one into several subsequences whose length equals that of the shorter one. Then, we check whether any two states at the same position of the two sequences already appear in C s ; if not, we add the first such state pair to C s .
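One plausible reading of this step can be sketched as follows, where the longer reduced sequence is cut into sliding windows of the shorter sequence's length; all names are illustrative:

```python
def find_state_pairs(plan_pairs, S_m):
    """Sketch of the 'finding state pairs' step: states paired in C_s
    must not be abstracted into the same word."""
    C_s = set()
    for x, y in plan_pairs:
        rx = [s for s in x if s in S_m]   # reduced sequences
        ry = [s for s in y if s in S_m]
        if len(rx) > len(ry):
            rx, ry = ry, rx
        # align the shorter sequence with each window of the longer one
        for off in range(len(ry) - len(rx) + 1):
            window = ry[off:off + len(rx)]
            already = any((a, b) in C_s or (b, a) in C_s
                          for a, b in zip(rx, window))
            if not already:
                C_s.add((rx[0], window[0]))  # constrain the first pair
    return C_s

pairs = [(["S11", "S12"], ["S21", "S22"])]
assert find_state_pairs(pairs, {"S11", "S12", "S21", "S22"}) == {("S11", "S21")}
```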
Word generation: Under the restriction of the state pairs of C s , a greedy CSP (Constraint Satisfaction Problem) solver is used to assign the states of S m to several state sets such that no pair of states within a set appears in C s . We define each state set as a word, and a language is thereby constructed.
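The greedy assignment is equivalent to greedy graph coloring, with C_s as the conflict edges and each color class becoming a word. A minimal sketch (names are illustrative):

```python
def generate_words(S_m, C_s):
    """Greedy CSP sketch of the word-generation step: assign each state
    of S_m to the first word containing no conflicting state from C_s,
    opening a new word when none fits."""
    conflicts = {}
    for a, b in C_s:
        conflicts.setdefault(a, set()).add(b)
        conflicts.setdefault(b, set()).add(a)
    words = []
    for s in sorted(S_m):          # deterministic greedy order
        for w in words:
            if not (w & conflicts.get(s, set())):
                w.add(s)           # no conflict: join this word
                break
        else:
            words.append({s})      # no compatible word: create a new one
    return words

words = generate_words({"S11", "S21", "S31"}, {("S11", "S21")})
assert len(words) == 2  # S11 and S21 must land in different words
```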
The pseudo-code of the algorithm is presented in Algorithm 1.

Algorithm 1 Language Generation Process
1: procedure FINDING PLAN PAIRS
2: for each task t
3: Get optimal plans P of t
4: if P(i) and P(j) introduce RC then Put (P(i), P(j)) in P s
5: procedure FINDING STATE SET
6: for plan pairs (x, y) ∈ P s
7: Remove common states of plans x and y
8: if not all states of x or y appear in S m then Put a state of x or y in S m
9: Get reduced sequences R(x), R(y) of x, y
10: if R(x) and R(y) have equal length then Put (R(x), R(y)) in E s
11: elseif R(x) is longer than R(y) then
12: Get the subsequences SR(x) whose length equals that of R(y); Put (SR(x), R(y)) in E s
13: procedure FINDING STATE PAIRS
14: for sequence pairs (e1, e2) ∈ E s
15: for each step i ∈ |e1|
16: if state pair (e1(i), e2(i)) ∈ C s then break
17: elseif i = |e1| then Put (e1(i), e2(i)) in C s
18: procedure WORD GENERATION
19: for state s ∈ S m
20: for word w ∈ W
21: if s is not a member of F(w) then Abstract s as w; Add s c into F(w) for all (s, s c ) ∈ C s
22: elseif w = W(|W|) then Abstract s as a new word w n ; Put w n in W

Theorem 2. Given a multi-agent planning domain, the languages generated by the algorithm are optimal and complete.
Proof of Theorem 2. For any RC plan pair p 1 and p 2 , let b 1 and b 2 be the two abstracted word sequences, respectively. From the language generation process, b 1 and b 2 are clearly not void. If b 1 and b 2 have equal length, the first words of the two sequences differ, so b 1 does not equal b 2 . If the lengths differ, the shorter sequence does not equal any subsequence of the longer one, so neither of b 1 and b 2 contains the other. Therefore, the four conditions are always satisfied. More specifically, the plan set specified by b 1 must include p 1 and exclude p 2 , and vice versa, i.e., the languages are complete; and since the languages are generated upon optimal plans, they are also optimal.
Please note that we do not consider the semantics of words in this work. However, features representing the relationship between the agents and the environment can be defined according to the application requirements of the languages, such as the distance between the agents or whether the agents are close to their goals. The word function, which describes the mapping from states to words, can then be obtained by classical clustering methods (e.g., the CLARANS algorithm [38]).

Language Communication
The coordination process between agents using our language can be described as follows: First, an optimal plan for the current task is found by a coordinator. Depending on the framework of the multi-agent system, the coordinator can be an agent or a control station.
Second, the sentence of the plan can be generated based on the language, and is then sent to other agents as coordination information. In the sentence generation, the states are abstracted as the words that they belong to. It can be seen that there always exists a sentence that could express the plan.
Third, the receiver agents choose their actions under the constraints of the sentence. Consequently, the task is finished without conflicts between the agents. If no sentence is generated, communication is not required and the agents can act freely.
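The receiver side of this protocol can be sketched as follows; `consistent` repeats the sentence-satisfaction check from the language definition, and all names are illustrative:

```python
def consistent(plan, sentence):
    """True if the plan passes a grounding state of each word, in order."""
    i = 0
    for word in sentence:
        while i < len(plan) and plan[i] not in word:
            i += 1
        if i == len(plan):
            return False
        i += 1
    return True

def choose_plan(optimal_plans, sentence):
    """Receiver-side selection sketch: keep only optimal plans consistent
    with the communicated sentence, then follow any one of them."""
    candidates = [p for p in optimal_plans if consistent(p, sentence)]
    return candidates[0] if candidates else None

plans = [["Si", "S11", "S14", "Sg"], ["Si", "S21", "S24", "Sg"]]
assert choose_plan(plans, [{"S11"}]) == ["Si", "S11", "S14", "Sg"]
```

Since every receiver filters its optimal plans through the same sentence, all agents end up acting within a conflict-free plan set.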
Please note that this work mainly focuses on generating coordination messages rather than obtaining task solutions. The automated planning methods [39] can be applied to find agents' plan based on the sentences.

Experiments and Results
In this section, we conduct several simulation experiments to verify the performance of the algorithm and the advantages of the languages in a grid-world domain. First, a simple navigation example is provided to illustrate language generation and coordination. Then, we compare our languages with the languages in [34,35] from several aspects. Finally, we apply the algorithm to more scenarios with different settings.

Coordination Example
A path-planning problem in a grid-world domain involving two agents, agent b and agent r , is shown in Figure 2. The numbered white cells are reachable for the agents, and the black cells are obstacles. In each step, an agent can move to an adjacent cell or remain where it is. Agents are not allowed to occupy the same cell or to swap cells in the same step. Given target points, the goal of the agents is to arrive at the points with the least time and energy cost. As mentioned before, we assume that observation and communication are only available during the planning phase. We apply the algorithm to the environment with different numbers of RC tasks. The size of the language and the computation time of the algorithm, as the number of RC tasks increases, are denoted by the blue and red curves in Figure 3, respectively. We observe that the number of language words does not always grow as more RC tasks are included, since the generated language can also apply to other RC tasks. The language of the environment involving all RC tasks is L = {W 1 , W 2 , W 3 , W 4 , W 5 , W 6 }. The states of each word are shown in Table 1, where state S x,y indicates that agent b is at point x and agent r is at point y. For a planning instance, the tasks of agent b and agent r are to move from points 1 and 9 to points 3 and 4, respectively. Assume that each time step and each movement incur a cost of 1.
Thus, there are seven optimal plans for the two agents:

P 1 : S 1,9 , S 1,8 , S 4,7 , S 5,6 , S 6,2 , S 7,6 , S 8,5 , S 9,4 , S 3,4 ; B 1 : W 5 #W 2
P 2 : S 1,9 , S 4,8 , S 4,7 , S 5,6 , S 6,2 , S 7,6 , S 8,5 , S 9,4 , S 3,4 ; B 2 : W 5 #W 2
P 3 : S 1,9 , S 4,8 , S 5,7 , S 5,6 , S 6,2 , S 7,6 , S 8,5 , S 9,4 , S 3,4 ; B 3 : W 5 #W 2
P 4 : S 1,9 , S 4,9 , S 5,8 , S 6,7 , S 7,11 , S 8,7 , S 9,6 , S 3,5 , S 3,4 ; B 4 : W 6 #W 3
P 5 : S 1,9 , S 4,8 , S 5,8 , S 6,7 , S 7,11 , S 8,7 , S 9,6 , S 3,5 , S 3,4 ; B 5 : W 6 #W 3
P 6 : S 1,9 , S 4,8 , S 5,7 , S 6,7 , S 7,11 , S 8,7 , S 9,6 , S 3,5 , S 3,4 ; B 6 : W 6 #W 3
P 7 : S 1,9 , S 4,8 , S 5,7 , S 6,11 , S 7,11 , S 8,7 , S 9,6 , S 3,5 , S 3,4

In this case, the agents may follow any plan if they do not communicate. However, they conflict with each other during task execution if one agent chooses a plan from the first three and its partner chooses a plan from the last four. In other words, RC is present for the task. Based on the language L, we generate the sentences of the seven plans, shown after each plan above. Sentence W 5 #W 2 expresses the agents' preference of plan and specifies a no-RC plan set comprising P 1 , P 2 , and P 3 . Similarly, sentence W 6 #W 3 specifies a plan set comprising P 4 , P 5 , and P 6 , and sentence W 3 specifies P 4 , P 5 , P 6 , and P 7 . Thus, the potential conflicts between the agents can be resolved by communicating one of these sentences.

Languages Comparison
To measure the performance of the languages constructed in this work, we compare our languages with languages generated by the method in [34], denoted as CGL (complete global language), and the method in [35], denoted as MOL (minimal optimal language), in terms of message lengths, specified plans, and computation time. Figure 4 shows the test example, in which there are 10 task points, marked as blue stars, that need to be continually visited by two agents. For this environment, there are a total of 8100 tasks (a task is specified by a pair of agents' states <initial, goal>, 8100 = 10 * 9 * 10 * 9), of which 350 tasks introduce RC. The numbers of language words generated by our approach, CGL, and MOL are 6, 7, and 2, respectively. For the sake of contrast, we randomly generate 100 RC tasks and record the relevant data. Figures 5-7 show the length of the communicative messages, the number of specified optimal plans, and the time cost of sentence generation, respectively. The red, purple, and green curves represent the results achieved by our method, CGL, and MOL, respectively. In Figure 5, the blue curve denotes the number of states in the plans. As we can see, the coordination messages generated by our algorithm are considerably shorter than those of the other communication methods, which greatly reduces the communication burden of the agents. Figure 6 shows that our sentences can specify many available plans for agents to follow, which gives them more flexibility in executing tasks. Although the language constructed in this work is not minimal, our approach has nearly comparable benefits to those of MOL in this respect. Furthermore, as can be seen in Figure 7, the computation costs of sentence generation using our method are much lower than those of MOL. As with CGL, the complexity of the SGP in our method is almost equivalent to that of the task planning problem.
In general, the results show that our languages offer considerable advantages compared to CGL and MOL in these criteria.

Different Scenarios
We also apply our algorithm to the path-planning problem in 30 grid-world scenarios. For each scenario, we separately run 500 RC tasks in the settings of two, three, and four agents. Figures 8 and 9 show the number of agents' joint states and the number of our language words used in these tasks, respectively. The number of words is far smaller than the number of states, which means that agents require less effort to understand the messages generated by our approach. To assess the benefits of the languages in terms of communication cost and behavioral flexibility, we compute the coordination sentences and specified plans for all task plans. Bars in Figure 10 represent the average decrement of the length of sentences compared with plan sequences: the communication costs decline by 60% on average using the languages. Bars in Figure 11 indicate the average number of plans specified by the sentences. From these results, we can see that the constructed languages are quite useful for multi-agent coordination. It should be noted that the classical planning problem is known to be PSPACE-complete [40]. The LGP is considered to be more difficult, and finding a minimal language is PSPACE-hard [34]. It is challenging to solve the LGP involving more agents. In this work, we only apply our method to the path-planning problem with a small number of agents, and a more efficient method remains to be explored as the problem size increases. Fortunately, the LGP can be addressed offline in a centralized manner. Generating a communication sentence for agents' coordination is much simpler than language generation and is no harder than finding an optimal plan.

Conclusions
In this paper, we provided a new coordination approach using simple languages for multi-agent systems. The languages are not manually defined but are naturally generated via state abstraction. The benefits of the languages include specifying optimal plans, giving agents more behavioral flexibility, and reducing communication costs. The experimental results confirmed that efficient languages for task execution can be constructed based on agents' own internal representation in the computational model of rationality. They also suggest that the LGP in multi-agent planning domains may provide a perspective for creating a "natural language" for autonomous agents.
In future work, we intend to investigate the LGP involving more agents. There are multiple ways in which scalability can be improved. One method is to introduce communication between the agents during plan execution, which breaks plans into plan segments and essentially reduces the size of the planning problems. A second method is to study strategies for pairwise coordination between the agents using the language constructed for two agents. Another interesting direction is to construct languages that work in varying environments. A possible solution is to divide a new environment into subspaces that are isomorphic to the original environment where the language was constructed, and to augment the language when this cannot be done.

Acknowledgments:
The authors thank the anonymous reviewers for their thoughtful comments on this paper.

Conflicts of Interest:
The authors declare no conflict of interest.

Abbreviations
The following abbreviations are used in this paper: