Learning and Planning Based on Merged Experience from Multiple Situations for a Service Robot

Abstract: For a service robot, learning appropriate behaviours to acquire task knowledge and deliberation in various situations is essential, but existing methods do not support merging plan-based activity experiences from multiple situations in the same task. In this paper, an abstract method is introduced to integrate the empirical activity schemas of multiple situations, and a novel algorithm is presented to learn an activity schema with abstract methods. Furthermore, a novel planner called the schema-based optimized planner (SBOP) is developed based on the learned activity schema, in which action-merging optimization and partially backtracking techniques are adopted. A simulation on a PR2 robot and a physical experiment are conducted to validate the proposed method. The results show that the robot can automatically generate a plan to recover from failure using the novel learning and planning method, given that the experienced exception has been merged into the activity schema. Owing to the improved autonomy, the proposed SBOP exhibits increased efficiency in dealing with tasks containing loops and multiple activity schema instances. This research presents a novel solution for merging activity experiences from multiple situations and generating an intelligent and efficient plan that can adapt to a dynamically changing environment.


Introduction
Today, an increasing number of robots are developed for service in daily human life, and these robots are expected to be more flexible and efficient. Moreover, robots working in the real world need to adapt to dynamically changing human-inhabited environments. Thus, approaches to learning appropriate behavior to acquire task knowledge and deliberation in various situations are essential [1]. Early work attempted to address high-level task learning from human demonstration based on experience-based planning domains (EBPDs) [2]. The task models refer to activity schemas composed of action sequences or operation plans and the features of objects involved in the operators. Furthermore, the activity schemas are used to ground the task vocabulary transmitted to the robot by the human user and to guide and speed up plan search. However, the approach abstracted the activity schema directly from the operators, without a middle layer to learn different action sequences for certain task classes, so it does not support merging plan-based activity experiences from multiple situations in the same task.
Numerous studies have focused on obtaining robot activity experiences and conceptualizing the experiences in Web Ontology Language (OWL) ontology [3][4][5]. However, the problem of dynamically storing acquired concepts in the robot's ontology has not been addressed to date. The problem of reinterpreting past episodes in terms of a new ontology [4,6] is under investigation by several groups; however, a perfect solution has not yet been developed. Generalization is the first step in conceptualizing a plan-based robot activity experience. Seabra presents several deductive and inductive transformation methods for the principles that explain the success of a failure-recovery strategy from experienced episodes [7,8]. Recently, conceptualizing activity experiences in the form of activity schemata that can guide a planner towards a solution has attracted much attention from academia [7,9]. Following this idea, Vahid proposed a conceptualization approach based on deductive generalization, abstraction and feature extraction [10]. Then, a goal inference procedure was added to activity schemas, and the inferred goal is useful in solving different instances of the same problems [2]. In one study [11], the approach was extended to robot experiences containing loops. However, the learned activity schemas are fixed and separated from each other, which makes it difficult to switch to another activity schema when the currently applied one fails.
Hierarchical task network (HTN) planning is an approach that is adopted extensively for automated planning. Numerous studies focus on learning the applicability conditions from given structural traces of HTNs. Early studies, such as those utilizing the CaMeL algorithm [12] and the DInCAD algorithm [13], assumed complete observability of the intermediate states of the plans and a given complete action model. Approaches such as the Icarus algorithm [14] and the HTN-MAKER algorithm [15] have been developed to learn the decomposition methods for HTNs. It is difficult to apply such approaches in real-world situations because the observed plan traces lack complete and consistent information. Hankz presented HTN-learner to learn the HTN method preconditions given a set of partially observed decomposition trees [16]. This method is inefficient for scenarios with complex tasks and uncertain environments, in which the number of predicates that need to be identified is large.
A plan solution guided by the activity schema is an effective means of using learned task knowledge. The Schema-Based Planner (SBP), an adapted A* heuristic search planner, was proposed to exploit learned activity schemas as heuristics to make plans in similar situations in experience-based planning domains [17], and it has been applied to solving problem instances of the same task with varying sets of objects [11]. However, using a plan generated by SBP to handle other situations, such as an environment that changes dynamically during task execution, has not been reported until now. Furthermore, the solution for planning a task that contains loops and multiple activity schema instances is not optimal.
Performing complex tasks in a dynamically changing environment involves multi-task planning and replanning. Zhang [18,19] expanded the model of HTN planning with action cliques and proposed an optimization method to improve task execution efficiency. In the same work, when the protected preconditions or protected states fail, the execution monitor triggers partially backtracking replanning. Helen introduced an enhanced robotic planning framework that improved robots' ability to operate in dynamically changing environments [20]. However, these methods rely on the definition and description of a complete task decomposition method, and there is no evidence that they support abstracted planning experience.
In this paper, we present a novel learning and planning method that integrates the empirical activity decomposition schemas of different situations and generates optimal HTN plans through a partially backtracking algorithm and merging optimization. The robot system first gathers the robot activity experience of the taught task. Secondly, the system learns the activity schema and combines it with the historical activity schemas. The process consists of three steps: extracting the predicates that make up the abstract method preconditions; determining the structure of the new activity schema, which consists of abstract methods (see Definition 2); and merging the new activity schema with the historical activity schema by changing its structure. Finally, to improve the task planning and execution efficiency when generating a task plan based on an activity schema, an action-merging optimization solver and a partially backtracking solver are added to the planner. The proposed learning and planning method is demonstrated on the Robustness by Autonomous Competence Enhancement (RACE) platform [21], and on our robot through two task scenarios: "deal with obstacles in serving a coffee to a guest" and "collect and deliver drink boxes to a counter". The results demonstrate that our method enables robots to better organize and utilize the various task experiences previously learned to solve the planning and optimization problems of similar tasks in a changing environment.
In this paper, we make the following contributions:
• We introduce abstract methods to integrate the empirical activity schemas of multiple situations, allowing the robot to adapt task plans in a dynamically changing environment by selecting the branches that correspond to the learned activity schema from a similar situation.
• We present a novel algorithm that learns an activity schema with abstract methods for scenarios with complex tasks.
• We present an approach for generating HTN plans based on an integrated empirical activity schema with merging optimization, which solves tasks with loops in the plan layer.
• We present a replanning algorithm that partially backtracks to the parent node layer by layer in the activity schema tree.
• We demonstrate the proposed algorithms with simulation and physical experiments in two scenarios.
The paper is structured as follows. We first present the related work and the overall learning and planning architecture in Section 1. Section 2 proposes learning algorithms for multiple activity schemas with abstract methods, and a planning algorithm with optimization. Finally, the respective experiments in simulations of restaurant scenarios and on a physical robot in the delivery domain are demonstrated and compared with state-of-the-art methods in Section 3, and the conclusion and discussion of future studies are summarized in Section 4.

Overall Framework
In this section, we present the modified components of the modular RACE from the learning and planning perspective as the red components in Figure 1. The central component is the working memory. The basic data type written to the working memory is the fluent [21]. The user interface is used to send a planning goal to the working memory, which triggers the task planner, and to teach the robot how to perform a complex task. The experience extractor continuously observes the history of fluents in the working memory to detect and extract potentially relevant experiences, including the sequence of applied actions, the methods with names and arguments, and user feedback. Taking the recorded plan-based robot activity experiences and the learned activity schema as input, the activity schema learner constructs and records an activity schema with abstract methods into the activity schema memory. The developed task planner is composed of three components: a schema-based planner, a merging optimization solver and a partially backtracking solver. The merging optimization solver is employed to find an optimal solution to a given plan generated by the schema-based planner. The planner creates a plan as a sequence of ground actions and writes the plan into the working memory. The execution manager receives the plan, dispatches the planned actions to the robot, and records success or failure information into the working memory. If an exception happens, the planner replans using the partially backtracking solver. The activity schema learner and task planner are addressed in this paper.
Figure 1. An overview of the learning and planning system.

Representation
In this section, we propose or modify some definitions in experience-based planning domains (EBPDs). The representations of actions can be hierarchical, as in HTNs [22]. The main concepts in this paper are defined below:
Definition 1. A method: m = (h, L_1 T_1, L_2 T_2, . . ., L_n T_n), in which h is the method's head, including the name and an argument list for m; L_i is a predicate list for the preconditions of the ith branch; and T_i is a subtask list. Based on methods, the planner knows how nonprimitive tasks can be decomposed into other tasks.
There is a complete definition of methods and operators in the complete task knowledge domain, where the methods are used as "middle layers" to guide the decomposition of a task under multiple conditions. Inspired by this notion, to merge plan-based activity experiences from multiple situations in the same task, we define abstract methods that serve as middle layers in the merged activity schema.

Definition 2.
An abstract method: am = [h, (presub_1)(sub_1), (presub_2)(sub_2), . . ., (presub_n)(sub_n)], where h is the head of the abstract method, (presub_i)(sub_i) is the ith branch of am, presub_i is a predicate list for the preconditions of the ith branch, and sub_i is a list of ordered subtasks composed of abstract operators or other abstract methods.

Definition 3.
An activity schema is a triple, ma = (t, AM, Ω), where t is a task head, AM is the set of abstract methods that decompose t, and Ω is an enriched abstract plan that achieves t. Here, ma is a task model learned from an experience.
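To make Definitions 1-3 concrete, the three concepts can be sketched as plain data structures. This is only an illustration in Python; the class and field names below are ours, not the RACE implementation's:

```python
from dataclasses import dataclass, field

@dataclass
class Method:
    """Definition 1: a head h plus branches (L_i, T_i), where the predicate
    list L_i guards the subtask list T_i."""
    head: str                                     # name and argument list, e.g. "serve(?guest)"
    branches: list = field(default_factory=list)  # [(L_i, T_i), ...]

@dataclass
class AbstractMethod:
    """Definition 2: like a method, but each sub_i may contain abstract
    operators or other abstract methods (the 'middle layer')."""
    head: str
    branches: list = field(default_factory=list)  # [(presub_i, sub_i), ...]

@dataclass
class ActivitySchema:
    """Definition 3: ma = (t, AM, Omega)."""
    task_head: str          # t, the task head
    abstract_methods: list  # AM, the abstract methods decomposing t
    enriched_plan: list     # Omega, the enriched abstract plan achieving t
```

The nesting mirrors the three-layer structure learned later: a schema holds abstract methods, and each abstract method's branches hold ordered subtask lists.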

Solution 2: The preconditions or structures of the abstract methods MA' that compose the activity schema ma' can be detected and adjusted according to the function Merge(ma, ma') in Algorithm 1 in Section 2.3.2.
Problem 3: How to find an optimal plan for a learned task with different objects. Problem 3 is formalized as: (π, tasktree, plantree) = P(D, P, ma), where π is an optimal plan solution, tasktree is a task tree [19], plantree is a plan tree [19], P is a task planning problem, and D is an experience-based planning domain [2].
Solution 3: Planning based on activity schema with merging optimization as the function SBOP(D, P, ma) in Algorithm 2 in Section 2.4.1.
Problem 4: How to replan automatically to recover from the failure that is due to an exception that has been merged in the activity schema. Problem 4 is formalized as: (π, tasktree', plantree') = REP(S, failedAction, ma, tasktree, plantree), where tasktree' is the updated task tree, plantree' is the updated plan tree, S is the current world state, and failedAction is the failed action.
Solution 4: Planning based on activity schema with partially backtracking technique as the function replanning_pb(S, failedAction) in Algorithm 3 in Section 2.4.2.

Learn Activity Schema with Abstract Methods
Our goal is to generalize and conceptualize the collected robot activity experience with activity schema, so we need to extract the preconditions and structures of the abstract methods that comprise the activity schema.

Relevant Predicates Extraction
The first problem we try to solve is how to extract the predicate list of an abstract method from the robot's experience. The number of predicates in a robot system with complex scenes and tasks is large, but the predicates in preconditions of a method are limited. We found that most predicates in a precondition of a method can be grouped into two categories. The first category contains one of the subtask's arguments and the second category contains another argument of a predicate belonging to the first category.
In the property graph for a concept, nodes within a semantic distance of 2 are considered relevant [23]. Given that the goal of a task is to change the state of some objects, the objects used as arguments of the taught task and of the preconditions and effects of actions are considered the key arguments. The predicates that connect an argument of a subtask with a key argument within a two-step relationship are considered relevant predicates and extracted as the precondition of the taught task (an activity schema or an abstract method). In the system, all predicates follow one of two forms: (Instance class instance_name) and (property instance_name filler). The former states the existence of an instance of the class type, and the latter represents the properties of that instance.
Let L_1 be the predicates following the first form, and let L_2 be the predicates following the second form. The relationship from one argument x to another argument y is calculated by the number of second-form predicates connecting them, and the sign of the relationship is determined by the direction. However, if the number of connecting predicates is 2 and the intermediate argument is a reified relation (e.g., at, on), the distance is considered to be 1 (as in Formula (3)). Additionally, a predicate in the form (Instance class instance_name) that contains an argument x is regarded as distance 0. According to these rules, the relevant predicates for a branch b of an abstract method are computed as follows, where t_i is the ith subtask in b; p and q are predicates; τ, τ_1 and τ_2 are qualitative timestamps; A is the set of arguments of a task t; B is the set of key arguments; F_i(t) is the collection of predicates that connect x with y with distance i; and R is the set of arguments that are reified relations.
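The distance rules above can be sketched as a graph search. This is a simplification under our own naming (the paper's actual computation also involves qualitative timestamps, which we omit): second-form predicates define edges between arguments, and stepping onto a reified-relation instance costs nothing, so a two-predicate chain through it counts as distance 1:

```python
from collections import defaultdict, deque

def build_graph(fluents):
    """Fluents take the form ("Instance", class, name) or (prop, name, filler);
    only the second form contributes edges between arguments."""
    g = defaultdict(set)
    for p, a, b in fluents:
        if p != "Instance":
            g[a].add(b)
            g[b].add(a)
    return g

def semantic_distance(g, x, y, reified):
    """Breadth-first distance between arguments; hopping onto a reified
    relation instance (e.g., an 'at' or 'on' instance) is free."""
    seen, frontier = {x}, deque([(x, 0)])
    while frontier:
        node, d = frontier.popleft()
        if node == y:
            return d
        for nxt in g[node]:
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, d + (0 if nxt in reified else 1)))
    return float("inf")

def relevant_predicates(fluents, key_args, reified=frozenset()):
    """Keep fluents with an argument within semantic distance 2 of a key argument."""
    g = build_graph(fluents)
    keep = []
    for p, a, b in fluents:
        args = [b] if p == "Instance" else [a, b]
        if any(semantic_distance(g, x, k, reified) <= 2
               for x in args for k in key_args):
            keep.append((p, a, b))
    return keep
```

For example, with fluents (at mug1 at1) and (at at1 table1), mug1 is at distance 1 from the key argument table1 because the intermediate argument at1 is a reified relation, so all predicates about mug1 are kept, while an unconnected instance is dropped.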
To support similarity-based planning, the relevant predicates are also used as features for each abstract operator via the same calculation as mentioned above. In the RACE system, propositions in the form of first-order binary predicates with qualitative time stamps are called fluents, and are gathered by an experience extractor.

Determine the Structure of the Abstract Method
In our system, the robot has basic action knowledge consisting of methods and operators. In general, there are two cases that the robot needs to learn from: (1) the robot faces a new task for the first time, and a sequence of subtasks is provided by a human teacher; (2) an existing task fails because the learned activity schema cannot be applied, and a subtask sequence is provided to recover from the failure situation.
For the purpose of formulating general concepts from a single training example and domain knowledge, deductive generalization is employed [8]. To improve the adaptability and flexibility of the conceptual experience (activity schema), an abstract method is used as the middle layer. The learning algorithm is elaborated in Algorithm 1, which goes through the following steps:
1. Generalize a robot activity experience e using explanation-based generalization (EBG), as in [2].
2. The generalized operators are abstracted and translated into enriched abstract operators Ω [2] with features.
3. In the first case, the taught subtask sequence M can be directly transformed into a corresponding abstract method sequence AM (line 3). The name and arguments of each abstract method are the same as those of the corresponding subtask (lines 5-6), and each abstract method consists of a sequence of corresponding abstract operators (line 7). In this way, the initial three-layer structure of an activity schema is built: the schema layer, the abstract method layer and the abstract operator layer.
4. In the second case, the corresponding abstract method sequence AM is obtained using the previous method. If AM' (the original abstract method sequence) is a proper subset of AM, the complement of AM' in AM is extracted as a free abstract method and generalized as AMF (line 13). The generalization process replaces the observed constants in the arguments of abstract methods, operators and their preconditions with variables. If AM does not contain the same sequence as AM', AM is created as a new branch of the activity schema ma under the preconditions F(AM) (line 15), and the preconditions of the other branches are adjusted as noted in line 16. Otherwise (AM is the same as AM'), the preconditions of AM and AM' are merged, and the abstract operator sequence Ω_i under each am_i is checked. Three potential cases are noted: (1) if an abstract operator sequence sub_k' identical to Ω_i exists in the branches of the abstract method in ma', their preconditions are combined by intersection (line 20); (2) if a branch sub_k' that is a subset of Ω_i exists, the complement of sub_k' in Ω_i is extracted as a free abstract method and generalized as amf; and (3) otherwise, Ω_i is regarded as a novel branch and added to am_i under the preconditions F(Ω_i) (line 24), and the preconditions of the other branches are adjusted as noted in line 26.
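The case analysis in step 4 can be sketched as follows. This is a deliberately simplified model of Merge(ma, ma') under our own naming (a schema is a dict of branches plus a free-abstract-method list, and generalization is a placeholder that drops observed constants):

```python
def subsequence_complement(am_old, am_new):
    """If am_old occurs in order within am_new, return the extra elements of
    am_new (candidate free abstract methods); otherwise return None."""
    extra, j = [], 0
    for x in am_new:
        if j < len(am_old) and x == am_old[j]:
            j += 1
        else:
            extra.append(x)
    return extra if j == len(am_old) else None

def generalize(task):
    """Placeholder generalization: replace observed constant arguments
    with a variable (the real process generalizes preconditions too)."""
    return task.split("(")[0] + "(?x)"

def merge(schema, am_new, preconds):
    """Simplified Merge(ma, ma'): extend an activity schema with a newly
    observed abstract-method sequence and its extracted preconditions."""
    for branch in schema["branches"]:
        extra = subsequence_complement(branch["AM"], am_new)
        if extra == []:                 # identical sequence: intersect preconditions
            branch["pre"] &= preconds
            return schema
        if extra:                       # AM' subset of AM: extract free abstract methods
            schema["AMF"] += [generalize(m) for m in extra]
            return schema
    schema["branches"].append({"pre": preconds, "AM": am_new})  # novel branch
    return schema
```

On the paper's example, merging the sub-scenario 2 sequence (which extends the original one by wait_unblock(mast1)) yields a generalized free abstract method, while a sequence that does not contain the original one becomes a new branch.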

Planning Based on Activity Schema with Merging Optimization
To find an optimal solution to a task planning problem P in a given domain D, we developed a schema-based optimized planner (SBOP), which is based on the learned activity schema and adopts action-merging optimization and partially backtracking techniques.
The SBOP is presented in Algorithm 2. SBOP begins by selecting an activity schema ma with respect to a given task t. After instantiation, a global task tree rooted at 'tasktree' is created to store the task tree, and a global plan tree rooted at 'plantree' is created to store the action chains that implement the task goal [19].
Given that the execution of the lowest-level actions changes the state of the world and affects the judgment of the preconditions of subsequent nodes, the check state of the current node depends on the check states of its child nodes. The search algorithm is designed as depth-first: only when all the child nodes have been checked is the next node in the same branch checked, and when all the nodes in the current branch have been checked, the algorithm returns to the parent node to check the parent's next node.
We set three values for the check state of an abstract method instance node: "0" indicates that the nodes in its branch have not been checked; "1" indicates that all the nodes in one of its branches have been checked; and "2" indicates that it is being checked. The abstract method am is checked layer by layer according to the structure of the selected activity schema. In this process, the current world state is substituted into the preconditions of each branch in am to check whether sub_i is applicable in a state S (sub_i is applicable in S only if there is a binding θ that binds all variables in the corresponding task's arguments [12]; see line 19). If there is such a branch, the subtasks sub_iθ of activenode are added to tasktree. Then, the check state of activenode is set to "2", and the nodes in sub_iθ are checked one by one (lines 21-22). If such a branch does not exist, the unsatisfied preconditions are published. If activenode is an instance of an abstract operator, then S is updated to S' by applying the active action to the transition function. If S' is valid, the current action a is added to plantree and tasktree (line 30).
The condition for terminating a loop in this check is that a branch of the current activity schema has been traversed, that is, state(ma) = 1. Since a task is likely to contain multiple subtasks that conform to the same schema, it is also necessary to determine whether the search should be terminated by checking whether the goal proposition is satisfied in S'. Finally, the planner stops and returns tasktree and plantree.
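The depth-first check with the 0/1/2 states can be sketched as follows. This is our simplification, not the SBOP code: nodes are dicts, a branch is a (precondition, subtasks) pair, and operators carry a transition function that returns the new state or None on failure:

```python
# Check states of an abstract method instance node in the depth-first search:
UNCHECKED, DONE, CHECKING = 0, 1, 2

def check(node, state, plan):
    """Depth-first check (simplified): operators apply the transition
    function; methods try each branch's preconditions against the current
    state and recurse into the branch's subtasks before moving on."""
    if node["kind"] == "operator":
        s2 = node["apply"](state)          # transition function: S -> S' or None
        if s2 is not None:
            plan.append(node["name"])      # add action a to the plan
        return s2
    node["check"] = CHECKING
    for pre, subtasks in node["branches"]:
        if pre(state):                     # sub_i applicable in state S
            s = state
            for sub in subtasks:           # children checked before siblings
                s = check(sub, s, plan)
                if s is None:
                    return None
            node["check"] = DONE           # one full branch has been checked
            return s
    return None                            # no applicable branch: report failure
```

Because a child operator's application updates the state before the next sibling is checked, later preconditions are always evaluated against the world as it would be after the earlier actions, which is the property the paragraph above motivates.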
To improve the execution efficiency, the whole plan is subject to merging optimization: the action cliques [19] are generated by selecting actions layer by layer of the plan tree in a top-down manner until the final plan is obtained (line 34). The optimization is modelled as a maximum-weight clique problem (MWCP) [24] based on graph theory. Finally, the optimized plan is taken as the output. When execution fails, replanning is activated to recover from the failure. If the failure is due to an exception that has been merged into the activity schema, the recovery will be successful. The process is presented in Algorithm 3.
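The MWCP step can be illustrated with a brute-force solver. The encoding below is our assumption for illustration (a node is a candidate action merge, an edge means two merges are compatible, and the weight is the cost saved); the paper's exact clique construction follows [19,24], and real MWCP solvers scale far better than this exhaustive search:

```python
from itertools import combinations

def max_weight_clique(nodes, edges, weight):
    """Brute-force maximum-weight clique; workable only for the small
    compatibility graphs built per plan-tree layer (illustration only)."""
    best, best_w = [], 0
    edge = {frozenset(e) for e in edges}
    for r in range(1, len(nodes) + 1):
        for cand in combinations(nodes, r):
            # a clique requires every pair of chosen nodes to be connected
            if all(frozenset(p) in edge for p in combinations(cand, 2)):
                w = sum(weight[n] for n in cand)
                if w > best_w:
                    best, best_w = list(cand), w
    return best, best_w
```

Selecting the maximum-weight clique then corresponds to picking the mutually compatible set of merges that saves the most execution cost at that layer.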
The failed action in the tasktree can be obtained by Action2Task(), which maps actions from the plantree to the tasktree according to a dictionary (line 2). Replanning is conducted for the task tree under the parent node failedParent of the failed action. We first translate failedParent to the corresponding abstract method am in the activity schema and check whether a branch presub_i in am exists that is applicable in state S (lines 5-6). If no such branch is identified, we check whether there exists an amf_i in AMF applicable in state S. If not, the parent node of failedParent continues to be translated and checked (line 11). If an amf_i exists, the children of the branch amf_iθ are added to the node failedParent in the task tree (line 13). If a presub_i exists, the first abstract operator ω_f under sub_i whose feature with the qualitative timestamp "at_goal" (refer to [2]) is satisfied should be identified, and the children of failedParent in the task tree are replaced by the nodes of the branch sub_iθ after ω_fθ (line 16). After that, the plan solution can be found according to SBOP. Figure 2 presents the basic principle.
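The layer-by-layer backtracking can be sketched as follows. This is a stripped-down model of Algorithm 3 under assumed names: the task tree is a parent dictionary, each node's alternative branches and the free abstract methods are (precondition, subtasks) pairs, and the function returns where to resume planning:

```python
def replanning_pb(state, failed_action, parent, branches_of, free_methods):
    """Sketch of partially backtracking replanning: climb from the failed
    action's parent toward the root until some branch or free abstract
    method is applicable in the current state S."""
    node = parent[failed_action]
    while node is not None:
        for pre, subtasks in branches_of.get(node, []):
            if pre(state):                 # some presub_i holds: replace children
                return node, subtasks
        for pre, subtasks in free_methods:
            if pre(state):                 # a free abstract method amf_i applies
                return node, subtasks
        node = parent.get(node)            # backtrack one layer up the task tree
    return None, None                      # no merged experience covers this failure
```

The returned node and subtask list mark the subtree that SBOP then replans; only when no experienced branch matches the current state does the search climb another layer, which keeps replanning local to the failed part of the task tree.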

Demonstration and Evaluation
The proposed approach is demonstrated in two different scenarios: "deal with obstacles in serving a coffee to a guest" for learning and planning based on activity schema with partially backtracking technique, and "collect and deliver drink boxes to a counter" for merging optimization of a plan.

3.1. Demonstration 1: Simulation in Scenario "Deal with Obstacles in Serving a Coffee to a Guest"

Learning Abstract Methods and Robot Activity Schema
To allow the user to provide instructions to the robot, we use a command-line interface that supports the following three types of commands: "achieve" is used to specify a task to be performed, "abort" is used to abort the requested task by the user, and "teach_task" is used to teach the robot a new task. Before performing or learning a new task, the operators and methods such as drive_robot, put_object, grasp_object and so on have been defined in JSHOP2 syntax.
We employed three sub-scenarios on the PR2 platform in a simulated restaurant environment to teach the "serve-a-coffee" task [25,26]. In the first sub-scenario, we teach the robot to serve a coffee to a guest from the right without any obstacle as shown in Figure 3a. In the second sub-scenario, an obstacle is present, which is a person in the manipulation area blocking the path to the south of Table 1; the robot was taught to wait for a while until the person leaves the region (as shown in Figure 3b). In the third sub-scenario, another type of obstacle, namely an extension table, blocks the manipulation area. The robot was taught to detour to the other side of Table 1 and place the coffee to the left of the guest (as shown in Figure 3c).
The learned activity schemas without the abstract method are presented in Figure 4, in which ma_1, ma_2 and ma_3 represent the learned activity schemas in sub-scenarios 1 through 3, respectively. In ma_2, the abstract operator sequence in green is the part newly learned from sub-scenario 2, which is used to handle the problem of being blocked by a person. In ma_3, the abstract operator sequence in orange is the part newly learned from sub-scenario 3 to handle the case of being blocked by a table. By contrast, the structure of the learned activity schema with abstract methods (denoted as ma) is presented in Figure 5. The key components of ma are shown in Figure 6.
Figure 4. The activity schemas learned from the three sub-scenarios. The abstract operator sequences of ma1, ma2 and ma3 are presented from left to right. Each abstract operator is represented by a rectangular box, in which "ax" is its abbreviation. The green part indicates a sequence in ma2 that differs from ma1. The orange part indicates a sequence in ma3 that differs from ma1.
Of note, ma1, ma2 and ma3 have been integrated into ma owing to the middle layer of abstract methods. The initial three-layer structure of ma is learned from the first sub-scenario, and its abstract method sequence is recorded as AM' (line 8 in Listing 1). The sequence of abstract methods AM2 extracted from sub-scenario 2 is: [drive(part1, pmaec1), grasp(mug1, rarm1), drive(pmaec1, pmast1), wait_unblock(mast1), put(mug1, pawrt1)]. Clearly, AM' ⊂ AM2, so the complement of AM' in AM2, wait_unblock(mast1), is extracted as a free abstract method and generalized as wait_unblock(?manArea) (lines 15-17 in Listing 1). Because the sequence of abstract methods extracted from sub-scenario 3 did not contain AM' as a subsequence, it was created as a new branch of ma under the preconditions F(AM) (lines 10-12 in Listing 1), and the preconditions of the other branch were adjusted (line 6 in Listing 1).
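The merging rule above can be sketched in a few lines of Python. This is a minimal illustration of the two cases (subsequence match versus new branch), not the paper's implementation; the function names and the dictionary layout of the schema are our own assumptions.

```python
def is_subsequence(short, long):
    """True if all methods of `short` appear in `long`, in order."""
    it = iter(long)
    return all(m in it for m in short)

def merge_into_schema(schema, new_seq, preconditions):
    """Merge a newly extracted abstract-method sequence into an activity
    schema (a dict mapping branch preconditions to method sequences).
    Illustrative sketch, not the paper's code.

    Case 1: an existing branch is a subsequence of the new experience,
            so the extra methods become free abstract methods.
    Case 2: no branch matches, so the new sequence opens a new branch
            under its own preconditions (as with sub-scenario 3)."""
    for branch_pre, base_seq in schema.items():
        if is_subsequence(base_seq, new_seq):
            free = [m for m in new_seq if m not in base_seq]
            return {"branch": branch_pre, "free_methods": free}
    schema[preconditions] = new_seq
    return {"branch": preconditions, "free_methods": []}
```

With AM' = [drive, grasp, drive, put] already stored, feeding AM2 yields wait_unblock(mast1) as the single free abstract method, while an unmatched sequence such as the one from sub-scenario 3 is stored as a new branch.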

Planning Based on Activity Schema with Abstract Methods
To assess the performance of the planning algorithm based on activity schema with abstract methods, two types of obstacles, a table and a person, were both added in sub-scenario 4. The position of the table was the same as in the learning scenario, but the person was moved to the other side of table 1. The robot was not aware of the presence of either obstacle at the beginning. The execution processes of the plans generated by SBOP and SBP are presented in Figure 6.
Initially, a plan was generated following Algorithm 2. When the action with the highest percentage of verified features in a17 (move(pmast1, mast1)) was to be performed, the robot found that mast1 was occupied by a table and could not complete the action. The failed execution was published to the working memory, and when the planner received the fault information, replanning was activated to recover from the failure. failedParent, the parent node of the failed action in the task tree, was translated to the corresponding abstract method put(mug1, pawrt1) in the activity schema, and no branch whose preconditions were applicable in state S was identified. The parent node of failedParent was then translated to its corresponding abstract method sequence, [drive(part1, pmaec1), grasp(mug1, rarm1), drive(pmaec1, pmast1), put(mug1, pawrt1)] (as shown in Listing 1), and the robot found that its preconditions (condition 1 in Listing 1) were not applicable in state S either. The parent node serve_a_coffee(guest1) was checked next, and the preconditions of its second branch (condition 2) were applicable in state S. The first abstract operator a36 (move(pmaec1, pmant1)) whose feature with qualitative timestamp "at_goal" was satisfied under the second branch was identified; this was the best recovery position for the plan. The children of serve_a_coffee(guest1) in the task tree were then updated from a36 following Algorithms 2 and 3, and a new optimal plan was generated after the merging optimization of Algorithm 2.
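The partial-backtracking search described above — climb the task tree from the failed action until an ancestor has a schema branch whose preconditions hold in the current state — can be sketched as follows. The tree and branch representations are illustrative assumptions, not the paper's data structures.

```python
def backtrack_replan(failed_action, state, branches, parent_of):
    """Partial-backtracking sketch: starting at the failed action, walk up
    the task tree; at each ancestor, test the preconditions of every branch
    of its abstract method against the current state S.  The first ancestor
    with an applicable branch is the recovery position.  All names here are
    illustrative."""
    node = parent_of.get(failed_action)
    while node is not None:
        for precondition, recovery_seq in branches.get(node, []):
            if precondition(state):          # branch applicable in state S
                return node, recovery_seq    # best recovery position found
        node = parent_of.get(node)           # no applicable branch: climb up
    return None, None                        # fall back to planning from scratch
```

In the scenario of the text, put(mug1, pawrt1) has no applicable branch, so the search climbs to serve_a_coffee(guest1), whose second branch is applicable and supplies the recovery sequence starting at a36.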
When the robot encountered the person blocking the path towards table 1, the free abstract method wait_unblock(?manArea) was identified and instantiated to wait_unblock(mant1) through the binding "mant1". The plan solution was then successfully generated following Algorithm 3. The simulation result shows that, using our replanning method and the activity schema learned with abstract methods, the robot recovers from both failures automatically.
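The instantiation step — substituting a concrete object for the ?manArea variable of a free abstract method — amounts to a variable binding. A minimal sketch, assuming a simple string representation of method terms (the paper does not specify one):

```python
import re

def bind_free_method(template, binding):
    """Instantiate a free abstract method such as 'wait_unblock(?manArea)'
    by replacing every '?variable' with its value from a binding taken
    from the current world state.  Representation is an assumption for
    illustration only."""
    return re.sub(r"\?\w+", lambda m: binding[m.group(0)], template)
```

For example, bind_free_method("wait_unblock(?manArea)", {"?manArea": "mant1"}) produces "wait_unblock(mant1)", matching the binding described in the text.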
By contrast, when using the traditional SBP and the activity schema learned without abstract methods, the plan would fail once a17 and a37 failed, because SBP cannot update the task problem and replan automatically; the instructor has to designate the activity schema to guide the planning. This process is detailed below.
When the robot was to perform the action with the highest percentage of verified features in a17 (move(pmast1, mast1)) according to plan π1, generated based on ma1, it could not complete the action because an unexpected obstacle, a table, occupied mast1. Since SBP cannot update the task problem and replan automatically, the subsequent actions in π1 could not be executed once the current action failed, and π1 was abandoned because the target state was unreachable. A request was therefore made to the instructor, who was required to guide the robot. The instructor assigned the current activity schema to ma3 according to the updated world state, and SBP searched for a new plan, π2, based on ma3. The first action to be performed was move_base(pmast1, pmant1), which had the highest percentage of verified features in a36 (move(pmaec1, pmant1)); that is, the robot should place the coffee cup on the left side of the guest (the north side of table 1).
When the action with the highest percentage of verified features in a37 (move(pmant1, mant1)) was to be performed, the robot found that mant1 was occupied by a person and could not complete the action. Again, since SBP could not update the task problem and replan automatically, the follow-up actions could not be performed once the current action failed; π2 failed because the target state could not be reached, and a request was made to the instructor once more. The instructor assigned the current activity schema to ma2 according to the updated world state, and SBP searched for a new plan, π3, based on ma2. The first action to be performed was wait_unblock(mant1), which had the highest percentage of verified features in a27 (wait_unblock(mast1)); that is, the robot should wait until the person left. Finally, the task was completed.
The performance of SBP and of our planner SBOP was evaluated in terms of autonomy, adaptability and efficiency. In terms of autonomy, SBP requires more than two intervention instructions during task execution, whereas SBOP requires none and can recover from failures automatically. In terms of adaptability, SBOP can switch among multiple learned activity schemas during execution and generate the corresponding plans by checking their preconditions, a capability SBP lacks. In terms of efficiency, even though the plan-generating time for the first recovery with SBOP is greater than SBP's time to check the preconditions of the abstract methods, the total time for recovering the plan from failure with SBOP is 59 s shorter, as shown in Table 1.

In the physical experiment, the environment was set with three tables (tables 1-3) and a counter (table 4) distributed in a laboratory (as shown in Figure 7). The task for the robot was to move drink boxes from the three tables to the counter. Inbot is a mobile robot equipped with an arm and a gripper, together with sufficient sensors for autonomous navigation and manipulation. Similar to the PR2, the Inbot robot had learned an activity schema for collecting a beverage box, including several abstract methods such as grasp_object, put and move_object.
In the experiment, the action sequences generated by SBOP and SBP are presented in three colours in Figure 8. The actions in blue are common to the two plans; the actions in orange appear in the SBP-generated plan but were removed in the SBOP-generated plan; and the actions in green are components of the SBP-generated plan that were merged in the SBOP-generated plan.

Figure 8. The action sequences generated by SBOP and SBP. The actions in blue represent the same components in the two plans; the actions in orange represent components in the SBP-generated plan that were removed in the SBOP-generated plan; the actions in green represent components of the SBP-generated plan that were merged in the SBOP-generated plan.
SBOP initially generated the same plan as SBP. Then, actions with independence or allowance relations were identified and placed in the same action clique. For example, since actions ob4 and kt11 exhibited independence relation, and kt11 and p14 exhibited allowance relation, the three actions could be placed in the same action clique AC1 to complete the task of transferring drink box 1 to the counter in the order: (ob4, kt11, p14). If multiple drink boxes were on the robot tray, several action cliques, which were independent of each other, would be placed in a larger action clique AC. For example, when AC1 (ob4, kt11, p14) and AC2 (ob4, kt22, pt24) were placed in action clique AC, the repetitive action ob4 would be merged to leave only one in AC.
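The clique construction above can be sketched as follows. The greedy grouping and list-based representation are our own simplifications for illustration; the paper's actual construction is not reproduced here.

```python
def build_clique(actions, compatible):
    """Greedily place actions into one action clique: an action joins the
    clique only if it has an independence or allowance relation (modelled
    here by the `compatible` predicate) with every action already in it.
    Order is preserved.  Illustrative sketch only."""
    clique = []
    for a in actions:
        if all(compatible(a, b) for b in clique):
            clique.append(a)
    return clique

def combine_cliques(*cliques):
    """Place mutually independent cliques into a larger clique, merging a
    repeated action (such as a shared ob4) so it appears only once."""
    combined = []
    for clique in cliques:
        for a in clique:
            if a not in combined:
                combined.append(a)
    return combined
```

Combining AC1 (ob4, kt11, p14) and AC2 (ob4, kt22, pt24) this way keeps a single ob4 in the larger clique, as described in the text.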
Then, in the optimization modeled as an MWCP, the plan with the maximum time saving had the fewest actions, with the robot collecting the maximum number of boxes at a time. During this process, two ob4 actions were deleted. Actions m14 and m42 were replaced by action m12, since m12 had the same preconditions as m14 and the same effects as m42 at a lower time cost; actions m24 and m43 were replaced by action m23 accordingly.
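The pairwise replacement step can be sketched as a single pass over the plan. The `shortcuts` table, which records that a pair such as (m14, m42) may be replaced by the cheaper m12, is an assumed representation for illustration; deciding which replacements are valid is what the MWCP model resolves in the paper.

```python
def merge_moves(plan, shortcuts):
    """One step of the merging optimization: a pair of consecutive move
    actions is replaced by a single move that has the first action's
    preconditions and the second's effects at a lower time cost, as with
    m14 + m42 being replaced by m12.  `shortcuts` maps such pairs to
    their cheaper replacement; all names are illustrative."""
    merged, i = [], 0
    while i < len(plan):
        if i + 1 < len(plan) and (plan[i], plan[i + 1]) in shortcuts:
            merged.append(shortcuts[(plan[i], plan[i + 1])])
            i += 2                          # both original moves consumed
        else:
            merged.append(plan[i])
            i += 1
    return merged
```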
The execution process with planner SBOP is shown in Figure 9, and the performance comparison between SBOP and SBP is presented in Table 2. Although the plan-generating time of SBOP is 18.4% longer than that of SBP, the execution time is 17.4% shorter; in total, the robot saves 178 s in completing all the tasks with SBOP, indicating the improved efficiency of SBOP compared with SBP.


Discussion and Conclusions
In this paper, the abstract method is proposed as a middle layer of the activity schema to integrate the empirical activity schemas of multiple situations; it allows a robot to adjust task plans and adapt to a dynamically changing environment by selecting the corresponding branches. The method for extracting the relevant predicates for the preconditions of an abstract method proves simple and practical for scenarios with complex tasks and an uncertain environment, and it can serve as a general method for determining the preconditions of methods when training examples are insufficient. The arguments and names of abstract methods can be obtained directly from the previously defined methods. Thus, the approach reduces the difficulty of learning and makes the knowledge of the existing methods effective not only in the observed planning results but across the entire planning domain.
The developed planner SBOP supports task plan merging optimization and task replanning with a partial backtracking technique, which clearly provides better results than the state-of-the-art methods in terms of autonomy and adaptability, as shown in the simulation and physical experiment results. The integration of activity schema-based planning and the backtracking technique enables the robot to adjust the current plan in a changing environment and search the experienced activity schema for coping strategies for similar situations, thus demonstrating increased adaptability. Compared with the traditional partial backtracking technique, SBOP depends on the action relations and the hierarchical structure of the task decomposition provided by the abstract methods, rather than on a complete definition and description of the task. The integration of activity schema-based planning and merging optimization allows the generated plan to escape a possibly suboptimal state and provide an optimal solution. To the best of our knowledge, the proposed learning and planning method is a novel solution of how to merge activity experiences from multiple situations and generate an intelligent and efficient plan that can adapt to a dynamically changing environment.

Computational load and time are significant concerns for system learning and optimization, and we considered three aspects of efficiency. First, the efficiency of an activity schema-based planner improves as the number of task arguments increases [2], so the instructor needs to provide sufficient task arguments to guarantee planning efficiency. Because SBOP adds abstract methods with additional task arguments, the activity schema with abstract methods usually carries more task arguments than one without, and is therefore more efficient. Second, using the merged activity schema, SBOP lets the robot automatically generate the corresponding plan when encountering a similar situation, whereas without abstract methods the instructor must specify the activity schema for SBP in that situation; automatic planning is more efficient. Third, we introduced a partial backtracking technique in the replanning phase of SBOP, which not only realizes automatic replanning but also improves efficiency compared with methods that require replanning from scratch.
However, this research has several limitations. The generalization performance remains to be improved, the efficiency of replanning is still relatively low even with partial backtracking, and issues remain for real-time replanning. One possible direction is to retain a few currently invalid supplementary plans that can be switched to valid plans under minimal environmental change, by designing an alternative layer in the planner [27].