Article

Generating Function Reallocation to Handle Contingencies in Human–Robot Teaming Missions: The Cases in Lunar Surface Transportation

Yan Fu, Wen Guo, Haipeng Wang, Shuqi Xue and Chunhui Wang
1 School of Mechanical Engineering and Science, Huazhong University of Science and Technology, Wuhan 430072, China
2 School of Intelligent Manufacturing, Jiangsu College of Engineering and Technology, Nantong 226006, China
3 National Key Laboratory of Human Factors Engineering, Astronaut Research and Training Center of China, Beijing 100094, China
* Authors to whom correspondence should be addressed.
Appl. Sci. 2023, 13(13), 7506; https://doi.org/10.3390/app13137506
Submission received: 19 May 2023 / Revised: 20 June 2023 / Accepted: 22 June 2023 / Published: 25 June 2023
(This article belongs to the Special Issue Ergonomics and Human Factors in Transportation Systems)

Abstract

On lunar missions, efficient and safe transportation of human–robot systems is essential to the success of human exploration and scientific endeavors. Given that transportation constitutes a bottleneck for numerous typical lunar missions, it is appealing to investigate which function allocation strategies can generate optimal task implementation paths for robots with low human workload when the situation changes. Thus, this paper presents a novel approach to dynamic human–robot function allocation explicitly designed for team transportation in lunar missions. The proposed dynamic allocation framework aims to optimize human–robot collaboration by responding to existing and potential contingencies. First, a fitness concept model is designed to quantify the factors that motivate the functional adaptation of each agent in dynamic lunar mission scenarios. A hierarchical reinforcement learning (HRL) algorithm with two layers is then employed for decision-making and optimization of human–robot function allocation. Finally, the validity of the proposed framework and algorithm is verified by a series of human–robot function allocation experiments in a simulated environment that mimics lunar transportation scenarios, and the performance is compared with that of other algorithms. In the future, path-planning algorithms can be incorporated into the proposed framework to improve the adaptability and efficiency of human–robot function allocation in lunar missions.

1. Introduction

As the exploration of space continues to evolve, the Moon has emerged as a promising destination for scientific research and human settlement. In lunar missions, the transportation of personnel, equipment and resources plays a crucial role in facilitating exploration and enabling the successful execution of mission objectives. Advances in autonomous technologies have the potential to increase the ability to work on large-scale operations and minimize human risk through the use of autonomous rovers [1]. However, there is also the risk that if an autonomous system fails, the astronauts would have to spend valuable time rescuing the rover or taking over the tasks, which is considered a difficult failure to solve in an early stage lunar mission. The challenges and complexities associated with lunar transportation necessitate innovative solutions to optimize efficiency, ensure safety, and enhance mission productivity. One promising avenue for addressing these challenges lies in the collaboration between humans and robots [2]. By leveraging the complementary capabilities of both humans and robots, a dynamic human–robot function allocation system can be developed to optimize the allocation of transportation tasks, leading to increased efficiency and improved adaptability to the variability of complex environments. This is especially the case in the resource-intensive, technologically risky and environmentally unstructured world of lunar surface operations [3].
In multi-human–robot lunar surface operations (LSO), astronauts are able to drive lunar rovers. When not driven by an astronaut, some vehicles might also function independently as robots, either teleoperated from the cabin or operated with supervised autonomy [4]. However, as the complexity of the environment and the task size grow, the efficiency of function allocation drops significantly. Due to the specificity of lunar mission scenarios, few studies have constructed human–robot task allocation methods for dynamic lunar missions. Fortunately, Thomas et al. proposed a task assignment method for lunar missions involving multiple agents that matches capabilities with specific functions, enabling a suitable assignment plan based on task decomposition and helping to address the unpredictable and changing nature of the lunar environment [5]. Okubo et al. leveraged an action graph to enhance the allocation of tasks and path planning for robots in dynamic environments, leading to faster and more efficient performance [6]. However, the aforementioned studies do not focus on optimizing lunar transportation, and the dynamic function allocation is still based on the predefined functions of each agent, whose capabilities will vary in different task scenarios and during the process of task implementation.
Machine learning is extremely promising for solving dynamic task allocation [7]. A series of human–robot function allocation algorithms have been proposed, such as the market-based auction algorithm [8,9], particle swarm optimization algorithm [10,11], greedy algorithm [12], heuristic algorithm [13], queen bee-assisted genetic algorithm [14], and simulation-based learning techniques [15,16]. Liu et al. introduced a task allocation approach that takes into account both capacity and time, with the objective of enhancing the efficiency of task allocation and accommodating current tasks as they arise [17]. Lyu et al. suggested an approach that takes into account transportation time and collision avoidance, aiming to reduce the number of robots required [18]. Chen et al. suggested a task allocation algorithm that optimizes the time required to complete the task [19]. Zitouni et al. optimized the task allocation algorithm by maximizing the number of tasks conducted [20]. Al-Hussaini et al. suggested methods that take into account the effect of possible mission uncertainties on both the robot and the task, intending to generate task reallocation suggestions automatically [21]. Considering the heterogeneity of robots, Tai et al. studied the different states of the robot, categorizing them as normal, delayed, or capable of recovering from a delay [22]. Huang et al. analyzed the capabilities of various types of robots to assess the range of tasks they could perform [23]. Wang et al. combined the terrain and task adaptabilities of humans and heterogeneous robots to obtain a more accurate method of quantifying search and rescue ability [24]. The above studies shed light on addressing the dynamic challenge of assigning tasks and planning paths in the presence of contingencies. However, they ignore how environmental changes bring about corresponding variations in agent capabilities. In addition, such methods cannot be efficient in the LSO context when the specificity of the lunar environment and terrain is not carefully considered and the various influences on lunar transport are not accurately quantified.
To address the above issues, we propose an approach for dynamic human–robot function allocation in unstructured lunar environments. First, we develop a set of three fitness functions to measure the adaptability of agents in lunar surface operations, covering distance, task, and environment fitness. The degree of collaboration between humans and robots is also introduced to determine how closely they work together. Then, a human–robot function allocation scheme is constructed using a two-layer HRL algorithm. The optimization objective of the first layer is to maximize the task and environment fitness, which aims to obtain the astronaut type, robot type, and level of human–robot collaboration that best fit the task requirements. The optimization objective of the second layer is to maximize the distance fitness so that each task can be allocated to a designated astronaut and robot. Finally, the transportation scenario of a sequence of lunar surface operations, encompassing ground exploration, failed robot rescue, material transport, and return of lunar surface samples, is used as a case scenario to validate how well the human–robot function allocation approach works. The experimental findings demonstrate that the proposed task allocation algorithm is capable of significantly enhancing the efficiency of task allocation. In the face of complex LSO tasks, only some of the most common scenarios and task execution subjects are considered in this paper. To expand the application area of the proposed approach, it would be beneficial to consider additional types of subjects and tasks in future work. In addition, we plan to incorporate path planning algorithms into the proposed framework to enhance the adaptability and effectiveness of assigning human–robot functions during lunar missions.
As shown in Figure 1, the following sections outline the paper’s contents. In Section 2, we explain the procedure for the human–robot function allocation strategy, which consists of defining and computing the fitness and using the HRL algorithm. We state the experimental design in Section 3 and analyze the results in Section 4. Finally, we conclude the work of this paper and suggest potential avenues for future improvement in Section 5.

2. Methods

This section presents the dynamic human–robot function allocation model, including its problem context, algorithmic framework, function definitions, and specific parameter settings. The algorithm is presented in the context of coping with existing or potential contingencies in the unstructured environment of multi-agent lunar surface operations (LSO) tasks. A two-layer HRL algorithm is used together with fitness-based metrics that measure the level of adaptation of agents to tasks, environments, and distances.

2.1. Formulation of the Problem

Figure 2 shows the typical transportation scenarios and agents of the LSO mission. The environment contains unpredictable terrain and uncertain lighting, in which agents are required to perform transportation for a variety of LSO tasks, such as ground exploration, robot rescue, material transport, and lunar surface sampling. Different agents have different task and environment adaptabilities, including astronauts, a small exploration robot, a large rescue robot, a material transport robot, and a lunar rover. There are two types of LSO mission execution: the first is primarily teleoperation, where the astronaut monitors or directly operates the robot from the capsule, and the other is performed directly by the astronaut, who exits the capsule and drives the lunar rover on the lunar surface. There are three typical lunar surface terrains (powdery, rocky, and undulating), in addition to the case where the terrain is unknown. When allocating tasks between humans and robots, it is crucial to take into account the nature of the task, its location, and the flexibility of different agents to cope with changing environmental conditions, with the ultimate goal of adapting task assignment plans in real time and on demand.

2.2. Overview of the Algorithm for Dynamic Human–Robot Function Allocation

Figure 3 shows the flow of the dynamic human–robot function allocation for LSO task transportation. First, a formalized description of the transport tasks and agents of the lunar surface operations is made; the LSO agents include astronauts and robots. Then, based on this description and the unpredictable characteristics of the environment, the values of the three fitness functions are calculated. Finally, the dynamic human–robot function allocation is implemented using a two-layer HRL algorithm, which is optimized with the Q-learning algorithm. Time steps are taken into account, and the task allocation results are iterated continuously by real-time computation; in turn, the state of agents and tasks is influenced by the task allocation results. Thus, the HRL algorithm can output, in real time, a task allocation scheme that best matches the input of the dynamically changing environment.
The LSO agents include astronauts and robots with distinct perception, prediction, and manipulation capabilities. The LSO agents are defined as
$H_i^1 = ( p_{i1}^1, e_{i1}^1, t_{i1}^1 ), \ldots, H_i^H = ( p_{i1}^H, e_{i1}^H, t_{i1}^H ); \quad R_j^1 = ( p_{j1}^1, e_{j1}^1, t_{j1}^1 ), \ldots, R_j^R = ( p_{j1}^R, e_{j1}^R, t_{j1}^R ), \quad i \in [1, I^*], \; j \in [1, J^*] \qquad (1)$
In Equation (1), $H_i$ and $R_j$ refer to the specific features of the astronaut and lunar robot, respectively. The variables $p_*^*$, $e_*^*$, and $t_*^*$ indicate the position, environment type, and transportation related to LSO tasks for each astronaut and robot. Additionally, $I^*$ and $J^*$ represent the total numbers of astronauts and robots of the $*$th kind.
The tasks are defined as
$T_m = ( p_m^t, d_m^t, t_m^t ), \quad m \in [1, M] \qquad (2)$
The total number of tasks is represented by $M$. The position, environment, and task specifications of the $m$th task are referred to as $p_m^t$, $d_m^t$, and $t_m^t$, respectively.
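To make the notation concrete, the following minimal Python sketch shows one possible way to encode the agent and task tuples of Equations (1) and (2). The class and field names (Agent, Task, busy, and so on) are illustrative assumptions rather than the authors' implementation.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass
class Agent:
    kind: str                       # e.g., "astronaut_in_capsule", "small_exploration_robot"
    position: Tuple[float, float]   # p: location on the lunar surface
    environment: int                # e: environment type the agent currently occupies (see Table 2)
    transport: str                  # t: transportation capability relevant to LSO tasks
    busy: bool = False              # whether the agent is currently executing a task

@dataclass
class Task:
    position: Tuple[float, float]   # p_m: task location
    environment: int                # d_m: environment type at the task site
    task_type: str                  # t_m: "exploration" | "rescue" | "transport" | "sampling"
```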

2.3. Definition and Calculation of the Fitness

Task fitness, environment fitness, and distance fitness are functions that quantify the adaptability of the agent to the LSO mission, the lunar surface environment, and the transport distance.

2.3.1. Task Fitness

Task fitness is calculated as follows:
$TF(a_{i,j}, t_m) = W_{i,j} \times L_m \qquad (3)$
The capabilities of the $i$th astronaut and $j$th lunar robot to perceive, make decisions, and operate are denoted by $W_{i,j}$, while the abilities required for the $m$th task are represented by $L_m$.
There is a sequence of LSO missions that encompasses ground exploration, robot rescue, material transport, and lunar surface samples. In the ground exploration task, the agent is supposed to collect environmental data and plan a moving path while avoiding obstacles. In robot rescue tasks, agents locate, inspect, repair, and move malfunctioning robots. In material transport tasks, agents are supposed to lift, move, and lower material. In the lunar surface sample task, the agent is supposed to plan the cut trajectory and perform the cut operation.
In Table 1, the capabilities of agents for different LSO tasks can be found. The symbol "✓" represents that the agent possesses the corresponding ability, while the symbol "×" represents that the agent lacks it. The LSO tasks are rigorously categorized to provide a clear understanding of the fitness calculations for the tasks. The robots involved are small exploration robots, large rescue robots, material transport robots, and lunar rovers. The crew consists of astronauts inside and outside the capsule.
The normalization of task fitness is calculated as
$TF(a_{i,j}, t_m) = \dfrac{TF(a_{i,j}, t_m)}{\max_k TF(a_{i,j}, t_m)} \qquad (4)$
To quantify the task fitness, a normalizing computation with specific rules is used. The task fitness has a range of 0.0 to 1.0. A score of 0 indicates that the agent is not sufficiently adaptive to the task, while a score of 1 indicates that the agent is sufficiently adaptive.
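As an illustration of Equations (3) and (4), the short sketch below treats W and L as binary capability vectors over the abilities listed in Table 1, so that the raw task fitness counts matched abilities and is then normalized by the maximum over candidate human–robot pairs. The specific vectors are hypothetical examples, and the vector interpretation of Equation (3) is an assumption made for illustration.

```python
import numpy as np

# Hypothetical capability vectors (1 = has ability, 0 = lacks it); entries are illustrative only.
W_agent_pair = np.array([1, 1, 1, 0, 1])   # combined abilities of astronaut i and robot j
L_task = np.array([1, 1, 0, 0, 1])         # abilities required by task m

def task_fitness(W: np.ndarray, L: np.ndarray) -> float:
    """Raw task fitness as the inner product of capability and requirement vectors (Equation (3))."""
    return float(W @ L)

def normalize(raw_scores: np.ndarray) -> np.ndarray:
    """Normalize raw fitness values to [0, 1] by the maximum over candidates (Equation (4))."""
    return raw_scores / raw_scores.max()

# Scores of several candidate human-robot pairs; the best-suited pair gets a normalized fitness of 1.0.
raw = np.array([task_fitness(W_agent_pair, L_task), 2.0, 1.0])
print(normalize(raw))
```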
Figure 4 shows the results of the task fitness calculations for different agents. It can be found that the task fitness is low for all four task types, and collaboration between astronauts and robots could significantly improve task fitness. For ground exploration tasks, any combination of astronaut–robot collaborations has a higher task fitness than a single robot. For the robot rescue task, the highest fitness is calculated by combining the astronauts in the capsule and the large rescue robot. For material transport tasks, small exploration robots and large rescue robots have the lowest task fitness. In contrast, the task fitness of other robot–astronaut combinations is relatively high. Only the lunar rover or astronaut–rover collaboration has a relatively high fitness for lunar surface sample tasks.

2.3.2. Environment Fitness

The concept of environment fitness refers to the ability of the robot to adapt to the lunar environment.
$EF(ro, te, mp, lt) = P(ro, te, mp, lt) \cdot S(ro, te, mp, lt) + V(ro, te, mp, lt) \qquad (5)$
$EF(ro, te, mp, lt)$ represents the environment fitness of the robot $ro$ in an environment with terrain type $te$, map condition $mp$, and light condition $lt$. Here, $te$ represents the type of topography, which may be one of four types: powdery, rocky, undulating, and unknown. $mp$ represents the degree of unpredictability of the map of the task region, which may be one of two types: the case where only the coordinates are known but no detailed map is available, and the case where the detailed map with the coordinates is known. $lt$ is the intensity of the light, divided into two types, dark and bright.
The passability, stationarity, and speed of the robot $ro$ in the environment with terrain type $te$ are denoted by $P(ro, te, mp, lt)$, $S(ro, te, mp, lt)$, and $V(ro, te, mp, lt)$, respectively. It is worth noting that $EF(ro, te, mp, lt)$ is calculated using Equation (5) only if $te$ is known. The parameters are defined and computed as follows.
The passability $P(ro, te, mp, lt)$ is defined as 1.0 when the robot $ro$ is able to pass through the terrain $te$ successfully under the map condition $mp$ and light condition $lt$, and as 0.0 when the robot $ro$ is not able to pass through $te$ at all.
The stationarity $S(ro, te, mp, lt)$ is calculated through the Sperling equation, which represents the stability of the robot while it is moving:
$S(ro, te, mp, lt) = 2.7 \sqrt[10]{a^3 f^5 F_{cf}} = 0.896 \sqrt[10]{\dfrac{j^3}{f} F_{cf}} \qquad (6)$
where $a$ is the vibration amplitude of the moving robot, $j$ is the corresponding acceleration, $f$ is the frequency of vibration, and $F_{cf}$ is the frequency correction factor.
$F_{cf} = \begin{cases} 0.8 f^2, & f = 0.5 \sim 5.4\ \text{Hz} \\ 650 / f^2, & f = 5.4 \sim 26\ \text{Hz} \\ 1, & f > 26\ \text{Hz} \end{cases} \qquad (7)$
The velocity $V(ro, te, mp, lt)$ is defined as the maximum speed at which each robot can move on the specific terrain.
If the type of terrain is unknown, all terrain types that might occur (powdery, rocky, and undulating) are considered. If the combination of astronaut and robot can adapt to (i.e., successfully pass through) all of these types, $EF(ro, te, mp, lt)$ is defined as 1.0; otherwise, it is defined as 0.0.
$EF(ro, te, mp, lt) = \dfrac{EF(ro, te, mp, lt)}{\max_k EF(ro, te, mp, lt)} \qquad (8)$
The environment fitness is normalized in the same way as the task fitness and takes values from 0.0 to 1.0. The robot $ro$ is considered not able to adapt to the lunar environment at all when $EF(ro, te, mp, lt)$ is 0.0, and fully able to adapt to the environment when $EF(ro, te, mp, lt)$ is 1.0.
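A possible implementation of the environment fitness computation, following Equations (5) to (7) as reconstructed above, is sketched below. The combination P·S + V is taken at face value from Equation (5), the amplitude is assumed to be given in centimetres as in the standard Sperling formulation, and the numeric inputs are placeholders.

```python
def sperling_correction(f: float) -> float:
    """Frequency correction factor F_cf of the Sperling index (Equation (7))."""
    if f <= 5.4:
        return 0.8 * f ** 2
    elif f <= 26.0:
        return 650.0 / f ** 2
    return 1.0

def sperling_index(a_cm: float, f_hz: float) -> float:
    """Sperling ride index from vibration amplitude a (cm) and frequency f (Hz), Equation (6)."""
    return 2.7 * (a_cm ** 3 * f_hz ** 5 * sperling_correction(f_hz)) ** 0.1

def environment_fitness(passable: bool, a_cm: float, f_hz: float, v_max: float) -> float:
    """Raw environment fitness EF = P * S + V for a known terrain type (Equation (5))."""
    P = 1.0 if passable else 0.0
    S = sperling_index(a_cm, f_hz)
    return P * S + v_max

# Placeholder inputs for one robot on one terrain type.
print(environment_fitness(passable=True, a_cm=0.5, f_hz=3.0, v_max=1.2))
```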
To make the statistical graphics more concise, the 13 environment types are numbered using numbers 1–13, the correspondence of which is shown in Table 2.
Figure 5 shows the environment fitness of different robots in various lunar environments. It can be found that only the small exploration robot has a relatively high environment fitness in unknown terrain. The environment fitness of the different robots is higher in powdery terrain than in rocky and undulating terrain; the robots tend to be more adaptable in bright conditions than in dark conditions, and tend to pass faster and more steadily in environments where a detailed map is available than in environments where only the coordinates are known. Compared to the other three types of robots, the environment fitness of the small exploration robot is more consistently high across different types of environments.

2.3.3. Distance Fitness

The distance fitness quantifies how well an agent is suited to a task in terms of how far the agent must travel to reach it. It is calculated using the following formula:
$DF(a_{i,j}, t_m) = \min\left( D_H(i, m), D_R(j, m) \right) \qquad (9)$
The distance adaptabilities of the $i$th astronaut and the $j$th lunar robot to the $m$th mission are represented by $D_H(i, m)$ and $D_R(j, m)$, respectively. Agents with higher fitness values are deemed more suitable for the task. When an astronaut in the capsule is selected, the distance fitness is calculated using the robot's moving distance, and the degree of human–robot collaboration ($HRC_d$) is used to describe how closely the astronaut and robot collaborate. When an astronaut outside the capsule is selected, the distance fitness is calculated using the larger of the moving distances of the astronaut and the robot. The detailed calculation of the mobility distance fitness is as follows:
$\begin{cases} D_{H_{in}}(i_{in}, m) = \dfrac{HRC_d(i_{in}, m)}{\max HRC_d(i_{in}, m)}, & i_{in} \in [1, I_{in}] \\ D_{H_{out}}(i_{out}, m) = 1 - \dfrac{D_{H_{out}}(i_{out}, m)}{\max\left( D_{H_{out}}(1, m), \ldots, D_{H_{out}}(I_{out}, m) \right)} \\ D_R(j, m) = 1 - \dfrac{D_R(j, m)}{\max\left( D_R(1, m), \ldots, D_R(J, m) \right)} \end{cases} \qquad (10)$
The degree of human–robot collaboration $HRC_d$ takes values from 0.0 to 1.0. The lower the $HRC_d$, the higher the autonomy of the robot. When $HRC_d$ is 0.0, the robot is considered capable of performing the task entirely on its own, and the astronaut does nothing but confirm the results of the operation. When $HRC_d$ is 1.0, the robot is not considered capable of performing the task on its own at all, and the astronaut must teleoperate the robot throughout the entire mission.
To calculate the distance traveled by an astronaut or robot outside the capsule during a mission, we use the straight-line distance between the agent and the mission location, since detailed maps of the unstructured lunar surface are not available.
To account for detours around obstacles that the straight-line calculation does not capture, the dynamic HRL algorithm is used in this study. The algorithm continuously computes the task distance in real time, accounting for changes in time and in the robot's position, allowing moving, computation, and planning to proceed simultaneously. Additional details on the algorithm can be found in the following subsection.
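The sketch below illustrates the distance fitness computation of Equations (9) and (10) under the stated straight-line assumption; the function names and the way candidate values are passed in are illustrative choices, not the authors' code.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]

def straight_line(p: Point, q: Point) -> float:
    """Straight-line distance, used because no detailed map of the region is available."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

def distance_fitness_outside(astro: Point, robot: Point, task: Point,
                             peer_distances: List[float]) -> float:
    """Astronaut outside the capsule: the larger of the astronaut's and robot's travel distances
    is normalized over all candidates and inverted, so closer pairs score higher (Eqs. (9)-(10))."""
    d = max(straight_line(astro, task), straight_line(robot, task))
    return 1.0 - d / max(peer_distances + [d])

def distance_fitness_inside(hrcd: float, peer_hrcd: List[float]) -> float:
    """Astronaut in the capsule: the fitness follows the degree of human-robot collaboration
    HRCd rather than the astronaut's own travel distance (Equation (10))."""
    return hrcd / max(peer_hrcd + [hrcd])

# Placeholder positions and candidate values for illustration.
print(distance_fitness_outside((0, 0), (1, 1), (4, 3), peer_distances=[6.0, 8.0]))
print(distance_fitness_inside(0.3, peer_hrcd=[0.6, 0.9]))
```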

2.4. Hierarchical Reinforcement Learning Algorithm

Reinforcement learning is considered an efficient approach to solving optimal allocation problems [25]. By learning to operate at different levels of temporal abstraction, hierarchical reinforcement learning (HRL) can address the scaling problem of reinforcement learning. Considering its advantages in handling data complexity, efficient data utilization, and learning performance, this paper uses a two-layer HRL algorithm to solve the optimal human–robot function allocation problem [26,27].
It is important to note that the position and state of the agent and task will change as the agent performs the task. To address this issue, the proposed method re-allocates tasks in real-time at each point in time with the goal of updating the task allocation scheme to adapt to shifts in the agent and task states. The initial value of time is set to 0 before the human–robot function allocation starts, and the initial state of the agent and the task is fed into the HRL algorithm. After the human–robot function allocation begins, the algorithm will collect the input information at fixed intervals, which includes the status of agents (whether agents are performing a task or not), the position of agents, and the status of tasks. Then, the algorithm could calculate the combination of the astronaut, robot, and the HRCd between them (if the astronaut is in the capsule, the HRCd should be considered), which has the highest fitness, based on the input information of the current point in time. At the same time, as the task is executed, the state of the agent and the task changes at the next point in time, and the updated input computes a fresh task allocation scheme to suit the current situation. As time changes, the input and output of the algorithm are continuously updated, forming a dynamic loop for the entire HRL algorithm, which may assist astronauts in supervising the transport of LSO tasks.
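A schematic view of this dynamic loop is given below. The environment and solver interfaces (env, hrl_solver) and the one-second re-planning interval are hypothetical placeholders used only to show how observation, re-allocation, and dispatch alternate over time.

```python
import time

REPLAN_INTERVAL_S = 1.0   # assumed re-planning period (placeholder)

def dynamic_allocation_loop(env, hrl_solver):
    """Re-allocate human-robot functions at fixed intervals until all tasks are finished."""
    t = 0.0
    allocation = None
    while not env.all_tasks_done():
        state = env.observe()                 # agent positions, busy flags, task statuses
        allocation = hrl_solver.solve(state)  # layer 1: agent types + HRCd; layer 2: specific agents
        env.apply(allocation)                 # dispatch agents; their states change as tasks execute
        time.sleep(REPLAN_INTERVAL_S)
        t += REPLAN_INTERVAL_S
    return allocation
```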
The first layer of HRL builds M Q-tables in one-to-one correspondence with the tasks and aims to compute the agent that best matches each mission. The second layer of the HRL algorithm takes the first layer's output as its input and builds another Q-table using the Q-learning algorithm, with the goal of computing the agent that best matches all tasks. The first layer is optimized based on task and environment fitness, and the second layer is optimized based on distance fitness.

2.4.1. HRL Algorithm-Layer One

To ensure efficient and effective mission execution, the first layer of HRL involves computing the most appropriate crew and robot based on mission and environment requirements. This is achieved by using a Q-learning algorithm with the goal of maximizing the task and environment fitness for each task. To calculate the sum of the task and environment fitness, the function of reward R m 1 ( s 1 , a 1 ) is proposed.
$R_m^1(s^1, a^1) = \alpha \, TF(a_{i,j}, t_m) + \beta \, EF(a_{i,j}, t_m) \qquad (11)$
The task fitness of agent $a^1$ for the $m$th task is denoted by $TF(a_{i,j}, t_m)$, while $EF(a_{i,j}, t_m)$ represents the environment fitness of agent $a^1$ for the $m$th task. The total number of tasks is $M$, and $a^1$ refers to the combination of the $i$th type of astronaut and the $j$th type of robot for human–robot collaboration. The coefficients $\alpha$ and $\beta$ are also defined.
$a^1 = h_i \cup r_j, \quad i \in [1, I], \; j \in [1, J] \qquad (12)$
The variables I and J represent the entire number of astronauts and lunar robots, respectively.
$s^1 = \{ H, R \} \qquad (13)$
The variable $s^1$ denotes the state, comprising both the astronaut ($H$) and the lunar robot ($R$). The agent in this context is described as:
$a^1 = \begin{cases} h_i & \text{if } s^1 = H \\ r_j & \text{if } s^1 = R \end{cases} \qquad (14)$
In Q-learning, the temporal-difference method is used to update the Q-values. The update formula is as follows:
$Q^1(s^1, a^1) = Q^1(s^1, a^1) + \varepsilon \, \Delta Q^1(s^1, a^1) \qquad (15)$
The Q value of agent $a^1$ in state $s^1$ is represented by $Q^1(s^1, a^1)$. This value is updated by adding the increment $\Delta Q^1(s^1, a^1)$ multiplied by the learning rate $\varepsilon$.
$\Delta Q^1(s^1, a^1) = R_m^1(s^1, a^1) + \delta \max_{A^1} Q^1(s^{1\prime}, A^1) - Q^1(s^1, a^1) \qquad (16)$
To compute the increment, we select the highest value of $Q^1$ for the next state $s^{1\prime}$, multiply it by the decay factor $\delta$, add the present reward, and subtract the present value of $Q^1$.
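The layer-one update of Equations (15) and (16) is ordinary tabular Q-learning. A minimal sketch is shown below, using the learning rate of 0.4 and decay factor of 0.6 quoted in Section 3.1; the state and action indexing is an assumption for illustration.

```python
import numpy as np

EPSILON = 0.4   # learning rate (epsilon in the paper)
DELTA = 0.6     # decay factor

def q_update(Q: np.ndarray, s: int, a: int, reward: float, s_next: int) -> None:
    """Temporal-difference update of a layer-one Q-table for one task (Equations (15)-(16))."""
    td_target = reward + DELTA * Q[s_next].max()
    Q[s, a] += EPSILON * (td_target - Q[s, a])

# Example: a Q-table over 2 states (H, R) and, say, 6 candidate agent types.
Q1 = np.zeros((2, 6))
q_update(Q1, s=0, a=3, reward=0.8, s_next=1)   # reward = weighted task + environment fitness (Eq. (11))
print(Q1)
```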

2.4.2. HRL Algorithm-Layer Two

In the first layer, the types of astronauts and robots suitable for each specific LSO task are determined. This output is used as the input of the second layer, which aims to compute the best match between astronauts and robots for each task by maximizing the distance fitness. We design a reward function that calculates the overall fitness score for all tasks by taking into account their distance fitness values and weights.
$R^2(s^2, a^2) = \sum_{m=1}^{M} DF(a^2, t_m) \qquad (17)$
$DF(a^2, t_m)$ denotes the distance fitness of agent $a^2$ for task $t_m$. Additionally, the state of the second layer is denoted as $s^2$.
$s^2 = \{ t_1, \ldots, t_m \} \qquad (18)$
The agents could be any possible combination of astronaut and robot.
$a^2 = \begin{cases} \left\{ h_1^1, \ldots, h_{I_1}^1 \right\} \cup \left\{ r_1^1, \ldots, r_{J_1}^1 \right\}, & s^2 = t_1 \\ \quad \vdots \\ \left\{ h_1^m, \ldots, h_{I_m}^m \right\} \cup \left\{ r_1^m, \ldots, r_{J_m}^m \right\}, & s^2 = t_m \end{cases} \qquad (19)$
In the $m$th task, $h_i^m$ and $r_j^m$ represent the selected $i$th astronaut and $j$th lunar robot. The second layer is updated by the same formula as the first layer, as follows:
$Q^2(s^2, a^2) = Q^2(s^2, a^2) + \varepsilon \, \Delta Q^2(s^2, a^2) \qquad (20)$
$\Delta Q^2(s^2, a^2) = R^2(s^2, a^2) + \delta \max_{A^2} Q^2(s^{2\prime}, A^2) - Q^2(s^2, a^2) \qquad (21)$
In this formula, ε and δ refer to the learning rate and value of attenuation, respectively, of the second layer.
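For the second layer, the reward of Equation (17) simply sums the distance fitness of the chosen astronaut–robot pair over all tasks. A minimal sketch under assumed task identifiers is given below.

```python
def layer_two_reward(assignment: dict, distance_fitness: dict) -> float:
    """assignment maps each task m to a chosen (astronaut, robot) pair;
    the reward is the sum of the corresponding distance fitness values (Equation (17))."""
    return sum(distance_fitness[(pair, m)] for m, pair in assignment.items())

# Hypothetical example with two tasks and two candidate pairs.
assignment = {"task_E": ("H1_out", "Re1"), "task_R": ("H1_in", "Rr2")}
df = {(("H1_out", "Re1"), "task_E"): 0.9, (("H1_in", "Rr2"), "task_R"): 0.7}
print(layer_two_reward(assignment, df))   # 1.6
```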

3. Experiments

There are two comparison experiments designed as follows:
In the first experiment, the HRL, RL, DRL, and HDRL algorithms are each used for human–robot function allocation with the same number of agents. In the second experiment, the four types of algorithms are used for task allocation with varying numbers of agents. The efficiency of task allocation is the metric used for comparison, and the experiments are designed as follows.

3.1. Experiment 1

The parameters of the HRL algorithm are designed as follows:
The coordinates of agents and tasks are randomly arranged in the task area. The learning rate is 0.4, and the attenuation value is 0.6 in the HRL algorithm. It is worth noting that the coordinates of each agent change with its movement. Figure 6 shows the numbers of optimization iterations for the first and second layers of the HRL algorithm.
To demonstrate the proposed approach’s effectiveness, the RL algorithm is chosen as a representative of the traditional human–robot function allocation algorithm, and the same computation is performed with the same number of agents to compare the task assignment efficiency of the two algorithms.
The reward function of the RL algorithm is as follows:
$R_m(s, a) = \alpha \, TF(a, t_m) + \beta \, EF(a, t_m) + \theta \, DF(a, t_m) \qquad (22)$
In the Q-learning algorithm, the state representing the astronaut and the robot is denoted by s , while the agent representing the human–robot collaborative combination is denoted by a . There are three coefficients defined as α , β , and θ .
$s = \{ H, R \}, \quad a = h_m^i \cup r_n^j, \quad i \in [0, I], \; j \in [0, J] \qquad (23)$
In the above equation, the variables $I$ and $J$ denote the total numbers of astronauts and robots, respectively. The variable $h_m^i$ represents the $m$th astronaut belonging to the $i$th type, whereas $r_n^j$ denotes the $n$th lunar robot belonging to the $j$th type. The update formula is as follows:
$Q(s, a) = Q(s, a) + \delta \, \Delta Q(s, a) \qquad (24)$
$\Delta Q(s, a) = R(s, a) + \theta \max_{A} Q(s', A) - Q(s, a) \qquad (25)$
In this scenario, $s$ and $s'$ denote the current state and the next state, respectively; $a$ represents the agent involved, and $Q(s, a)$ is the Q value. Meanwhile, $\delta$ and $\theta$ stand for the learning rate and attenuation value of RL, respectively.
The DRL and HDRL algorithms are constructed by combining RL and HRL algorithms with a three-layer neural network with the number of neurons in the three layers being 10, 15 and 10, respectively.
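The paper specifies only the hidden-layer widths of this network. A plausible PyTorch sketch of such a value network is shown below; the input and output dimensions, activation functions, and the use of PyTorch itself are assumptions.

```python
import torch.nn as nn

def build_value_network(state_dim: int, action_dim: int) -> nn.Sequential:
    """Three fully connected layers with 10, 15, and 10 neurons, as described for DRL/HDRL."""
    return nn.Sequential(
        nn.Linear(state_dim, 10), nn.ReLU(),
        nn.Linear(10, 15), nn.ReLU(),
        nn.Linear(15, 10), nn.ReLU(),
        nn.Linear(10, action_dim),
    )
```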
Table 3 shows the settings of the agents and tasks in Experiment 1. The experiment involves a total of 24 agents, comprising four astronauts inside the capsule ($H_1^{in}$–$H_4^{in}$), four astronauts outside the capsule ($H_1^{out}$–$H_4^{out}$), four small exploration robots ($R_{e1}$–$R_{e4}$), four large rescue robots ($R_{r1}$–$R_{r4}$), four material transport robots ($R_{t1}$–$R_{t4}$), and four lunar rovers ($R_{l1}$–$R_{l4}$). There are four types of tasks set up in the experiment: ground exploration (point E), robot rescue (point R), material transport (point T), and lunar surface samples (point S).
The proposed HRL and RL, DRL, and HDRL algorithms are used to compute the optimal human–robot function allocation scheme for the number of agents and tasks set in Table 3, respectively, and their task allocation efficiency is compared by the number of iterations of the algorithms.

3.2. Experiment 2

In Experiment 2, different numbers of agents are set. The detailed designs of the four types of algorithms are the same as in Experiment 1. Table 4 shows the design of the number of agents in Experiment 2. The number of each agent type (astronaut and robot) is set to 3, 4, 5, 6, and 7, respectively. The robustness of the proposed method is verified by comparing how much the number of iterations of HRL and of the other algorithms shifts when computing the optimal human–robot combination with different numbers of agents.

4. Results

This section shows the experimental results of the comparison experiments designed in the previous section, including the task assignment efficiency of each algorithm with different numbers of agents, and the computational efficiency of each algorithm with the same number of agents. The algorithm types include HRL (used in the model of this paper), RL, DRL, and HDRL. The effectiveness and superiority of the proposed human–robot allocation method is validated by two comparison experiments.

4.1. Comparison with Other Algorithms under the Same Agent Number

Figure 6 shows the number of iterations based on the proposed HRL algorithm. Two lines are shown in the figure: the orange one shows the second layer’s optimization and the blue line shows the first layer’s optimization. It is evident that the first layer of the algorithm converges around 100 iterations and the second layer converges around 220 iterations. Eventually, the fitness function stabilizes around 1.0.
In Figure 7, one can observe the iteration curves of RL and DRL for the same computational case. Remarkably, the reward function of RL converges to approximately 1.0 after around 12,500 iterations, while the reward function of DRL converges to approximately 1.0 after around 18,000 iterations. Both algorithms have the same final stationary value of the fitness function as HRL. Nonetheless, their ergodic counts are significantly higher than those of HRL.
In Figure 8, there are two iteration curves representing the HDRL algorithm for the same computational situation. In particular, the blue curve represents the optimization of the first layer and the orange curve that of the second layer. Of interest is that the reward functions of the two layers of HDRL converge to approximately 1.0 after 580 and 780 iterations, respectively. Although the fitness function converges to the same value, the number of iterations is still higher for this algorithm than for HRL.
To sum up, the proposed HRL improves the efficiency of human–robot function allocation by no less than 98.24%, 98.78% and 71.79%, respectively, compared to the RL, DRL and HDRL algorithms for the same number of agents. It can be demonstrated that the proposed approach could effectively improve the efficiency of task allocation compared to traditional algorithms.

4.2. Comparison with Other Algorithms under Different Agent Numbers

Figure 9 presents the comparison of the efficiency of four types of algorithms for human–robot function allocation as the number of agents varies. When comparing the iteration number between HRL and the alternative algorithms for different numbers of agents, we consistently find that the iteration number for HRL is always significantly less than the others. Furthermore, the iteration number of HRL is able to remain relatively stable to a higher degree as the number of agents increases, compared to alternative algorithms.
When the number of agents of each type is 3, 4, 5, 6, and 7, the number of iterations of HRL is 930, 1040, 1250, 1650, and 1960, respectively. Meanwhile, the number of iterations for RL is 6000, 10,000, 16,000, 22,000, and 31,000, respectively; for DRL it is 12,000, 20,000, 28,000, 38,000, and 52,000; and for HDRL it is 5000, 9000, 12,000, 16,000, and 25,000.
For different numbers of agents, the proposed HRL is at least 90.49% more computationally efficient than regular RL, at least 94.89% more efficient than DRL, and at least 88.26% more efficient than the HDRL algorithm. This shows that the proposed approach is, in general, highly robust compared to other traditional algorithms.
In summary, a series of simulation experiments and calculations have been performed in randomly varying and complex environments, with agents planning while moving and with varying numbers of agents, to verify that the proposed method is effective and robust. Combining the results of Experiments 1 and 2, the proposed method not only efficiently increases the computational efficiency compared to regular RL, DRL, and HDRL algorithms, but also remains highly robust as the number of agents changes.

5. Conclusion

In this paper, an approach for allocating human–robot functions dynamically in LSO missions using the hierarchical reinforcement learning algorithm has been proposed. First, functions that quantify an agent's adaptability in unstructured environments, including task, environment, and distance fitness, are designed. Next, an HRL algorithm consisting of two layers is designed to accurately compute the optimal human–robot function allocation scheme in real time, taking into account the mobility of the agents and the unpredictability of the lunar surface. Finally, we validate the effectiveness of the proposed approach by comparing different algorithms with the same or different numbers of agents. The main conclusions of the paper are summarized below.
  • Compared to RL, DRL, and HDRL algorithms, the proposed method improves the task allocation efficiency by approximately 98.24%, 98.78%, and 71.79%, respectively, for the same number of agents;
  • When the number of agents is varied, the proposed method improves the task allocation efficiency by about 90.49%, 94.89%, and 88.26% compared to the RL, DRL, and HDRL algorithms, respectively. This demonstrates the better robustness of the proposed approach when the number of agents varies.
In future work, the path planning algorithm will be integrated based on the parameter optimization of the task assignment method, with the aim of enhancing the adaptability and efficiency of the human–robot function allocation during the lunar mission.

Author Contributions

Y.F.: Conceptualization, Formal analysis, Investigation, Methodology, Project administration. W.G.: Conceptualization, Investigation, Writing—original draft, Writing—review and editing. H.W.: Investigation, Methodology, Writing—review and editing. S.X.: Investigation, Writing—review and editing. C.W.: Conceptualization, Methodology, Investigation. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by Aerospace Medical Research Fund [No: HYZHXM03009] on the research of Adaptive Human-Robot Collaboration Method based on Cognitive Engineering, and by National Laboratory of Human Factors Engineering Stable Support Fund [No: GJSD22004] on the research of Key Technology and System Implementation of Human-Robot Collaboration in Planet Exploration.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are available in the article.

Conflicts of Interest

The authors declare that they have no competing interests.

References

  1. Timman, S.; Landgraf, M.; Haskamp, C.; Lizy-Destrez, S.; Dehais, F. Effect of time-delay on lunar sampling tele-operations: Evidences from cardiac, ocular and behavioral measures. Appl. Ergon. 2023, 197, 103910. [Google Scholar] [CrossRef] [PubMed]
  2. Cheng, Y.; Yuhui, G.; Rui, Z.; XiaoFeng, C.; Jun, S.; Peng, L. Automatic Planning Method of Space-Ground Integrated Tele-Operation for Unmanned Lunar Exploration Rovers. In Advances in Guidance, Navigation and Control; Springer: Singapore, 2023; pp. 3644–3655. [Google Scholar] [CrossRef]
  3. Reviews [review of two books]. IEEE Ann. Hist. Comput. 2008, 30, 104–105. [CrossRef]
  4. Elfes, A.; Weisbin, C.R.; Hua, H.; Smith, J.H.; Mrozinski, J.; Shelton, K. The HURON Task Allocation and Scheduling System: Planning Human and Robot Activities for Lunar Mis-Sions. In Proceedings of the 2008 World Automation Congress, Waikoloa, HI, USA, 28 September–2 October 2008; pp. 1–8. [Google Scholar]
  5. Thomas, G.; Howard, A.M.; Williams, A.B.; Moore-Alston, A. Multi-Robot Task Allocation in Lunar Mission Construction Scenarios. In Proceedings of the 2005 IEEE International Conference on Systems, Man and Cybernetics, Waikoloa, HI, USA, 12 October 2005. [Google Scholar] [CrossRef]
  6. Okubo, T.; Takahashi, M. Multi-Agent Action Graph Based Task Allocation and Path Planning Considering Changes in Environment. IEEE Access 2023, 11, 21160–21175. [Google Scholar] [CrossRef]
  7. Schleif, F.-M.; Biehl, M.; Vellido, A. Advances in machine learning and computational intelligence. Neurocomputing 2009, 72, 1377–1378. [Google Scholar] [CrossRef]
  8. Eijyne, T.; G, R.; G, P.S. Development of a task-oriented, auction-based task allocation framework for a heterogeneous multirobot system. Sadhana 2020, 45, 1–13. [Google Scholar] [CrossRef]
  9. Otte, M.; Kuhlman, M.J.; Sofge, D. Auctions for multi-robot task allocation in communication limited environments. Auton. Robot. 2019, 44, 547–584. [Google Scholar] [CrossRef]
  10. Zhu, Z.; Tang, B.; Yuan, J. Multirobot task allocation based on an improved particle swarm optimization approach. Int. J. Adv. Robot. Syst. 2017, 14, 172988141771031. [Google Scholar] [CrossRef] [Green Version]
  11. Lim, C.P.; Jain, L.C. Advances in Swarm Intelligence; Springer: Cham, Switzerland, 2009; pp. 1–7. [Google Scholar] [CrossRef]
  12. Farinelli, A.; Iocchi, L.; Nardi, D. Distributed on-line dynamic task assignment for multi-robot patrolling. Auton. Robot. 2016, 41, 1321–1345. [Google Scholar] [CrossRef]
  13. Nagarajan, T.; Thondiyath, A. Heuristic based Task Allocation Algorithm for Multiple Robots Using Agents. Procedia Eng. 2013, 64, 844–853. [Google Scholar] [CrossRef] [Green Version]
  14. Sundaram, E.; Gunasekaran, M.; Krishnan, R.; Padmanaban, S.; Chenniappan, S.; Ertas, A.H. Genetic algorithm based reference current control extraction based shunt active power filter. Int. Trans. Electr. Energy Syst. 2020, 31, e12623. [Google Scholar] [CrossRef]
  15. Chi, W.; Agrawal, J.; Chien, S.; Fosse, E.; Guduri, U. Optimizing Parameters for Uncertain Execution and Rescheduling Robustness. Int. Conf. Autom. Plan. Sched. 2021, 29, 501–509. [Google Scholar] [CrossRef]
  16. Hu, H.-C.; Smith, S.F. Learning Model Parameters for Decentralized Schedule-Driven Traffic Control. Proc. Thirtieth Int. Conf. Autom. Plan. Sched. 2020, 30, 531–539. [Google Scholar] [CrossRef]
  17. Liu, Z.; Wang, H.; Chen, W.; Yu, J.; Chen, J. An Incidental Delivery Based Method for Resolving Multirobot Pairwised Transportation Problems. IEEE Trans. Intell. Transp. Syst. 2016, 17, 1852–1866. [Google Scholar] [CrossRef]
  18. Lyu, X.-F.; Song, Y.-C.; He, C.-Z.; Lei, Q.; Guo, W.-F. Approach to Integrated Scheduling Problems Considering Optimal Number of Automated Guided Vehicles and Conflict-Free Routing in Flexible Manufacturing Systems. IEEE Access 2019, 7, 74909–74924. [Google Scholar] [CrossRef]
  19. Chen, X.; Zhang, P.; Du, G.; Li, F. A distributed method for dynamic multi-robot task allocation problems with critical time constraints. Robot. Auton. Syst. 2019, 118, 31–46. [Google Scholar] [CrossRef]
  20. Zitouni, F.; Maamri, R.; Harous, S. FA–QABC–MRTA: A solution for solving the multi-robot task allocation problem. Intell. Serv. Robot. 2019, 12, 407–418. [Google Scholar] [CrossRef]
  21. Al-Hussaini, S.; Gregory, J.M.; Gupta, S.K. Generating Task Reallocation Suggestions to Handle Contingencies in Human-Supervised Multi-Robot Missions. IEEE Trans. Autom. Sci. Eng. 2023, 1. [Google Scholar] [CrossRef]
  22. Tai, R.; Wang, J.; Chen, W. A prioritized planning algorithm of trajectory coordination based on time windows for multiple AGVs with delay disturbance. Assem. Autom. 2019, 39, 753–768. [Google Scholar] [CrossRef]
  23. Nie, Z.; Chen, K.-C. Hypergraphical Real-Time Multirobot Task Allocation in a Smart Factory. IEEE Trans. Ind. Inform. 2021, 18, 6047–6056. [Google Scholar] [CrossRef]
  24. Wang, H.; Li, S.; Ji, H. Fitness-Based Hierarchical Reinforcement Learning for Multi-human-robot Task Allocation in Complex Terrain Conditions. Arab. J. Sci. Eng. 2022, 48, 7031–7041. [Google Scholar] [CrossRef]
  25. Plaat, A.; Kosters, W.; Preuss, M. High-accuracy model-based reinforcement learning, a survey. Artif. Intell. Rev. 2023, 1–33. [Google Scholar] [CrossRef]
  26. Alpdemir, M.N. A Hierarchical Reinforcement Learning Framework for UAV Path Planning in Tactical Environments. Turk. J. Sci. Technol. 2023, 18, 243–259. [Google Scholar] [CrossRef]
  27. Pateria, S.; Subagdja, B.; Tan, A.-H.; Quek, C. Hierarchical Reinforcement Learning: A Comprehensive Survey. ACM Comput. Surv. 2021, 54, 1–35. [Google Scholar] [CrossRef]
Figure 1. Flow chart of this article.
Figure 2. Typical scenarios and agents of LSO mission.
Figure 3. The flow of the dynamic human–robot function allocation.
Figure 4. Task fitness of different agents.
Figure 5. Environment fitness of different robots in various environments.
Figure 6. The result of the human–robot function allocation based on the HRL algorithm.
Figure 7. The calculation results based on the RL and DRL algorithm.
Figure 8. The result of the human–robot function allocation based on the HDRL algorithm.
Figure 9. Comparison of the allocation efficiency of each algorithm for different numbers of agents.
Table 1. The agents’ ability in LSO tasks.
Astronaut in the Capsule | Astronaut outside the Capsule | Small Exploration Robot | Large Rescue Robot | Material Transport Robot | Lunar Rover
Ground Exploration
Terrain Detection × × × ×
Path Planning × × ×
Obstacle Avoidance ×
Robot Rescue
Positioning Robot × ×
Trouble Shooting × × ×
Moving Robot × × ×
Material Transportation
Lifting Material × × ×
Moving Material × × × ×
Lowering Material × × ×
Lunar Surface Samples
Trajectory Planning × × ×
Cutting Operation × × × ×
Table 2. Numbering of the 13 environmental types.
ID | Environment Type
1 | unknown terrain
2 | powdery, no-coordinates, dark
3 | powdery, no-coordinates, bright
4 | powdery, coordinates, dark
5 | powdery, coordinates, bright
6 | rocky, no-coordinates, dark
7 | rocky, no-coordinates, bright
8 | rocky, coordinates, dark
9 | rocky, coordinates, bright
10 | undulating, no-coordinates, dark
11 | undulating, no-coordinates, bright
12 | undulating, coordinates, dark
13 | undulating, coordinates, bright
Table 3. Experimental setting of multi-agent human–robot function allocation.
Item | Type | Number
Astronaut | Astronauts in the Capsule | 4
Astronaut | Astronauts outside the Capsule | 4
Robot | Small Exploration Robot | 4
Robot | Large Rescue Robot | 4
Robot | Material Transport Robot | 4
Robot | Lunar Rover | 4
Task | Ground Exploration | 1
Task | Robot Rescue | 1
Task | Material Transport | 1
Task | Lunar Surface Samples | 1
Table 4. Experimental setting of experiment 2.
Item | Type | Number (five settings)
Astronaut | Astronauts in the Capsule | 3, 4, 5, 6, 7
Astronaut | Astronauts outside the Capsule | 3, 4, 5, 6, 7
Robot | Small Exploration Robot | 3, 4, 5, 6, 7
Robot | Large Rescue Robot | 3, 4, 5, 6, 7
Robot | Material Transport Robot | 3, 4, 5, 6, 7
Robot | Lunar Rover | 3, 4, 5, 6, 7
Task | Ground Exploration | 1, 1, 1, 1, 1
Task | Robot Rescue | 1, 1, 1, 1, 1
Task | Material Transport | 1, 1, 1, 1, 1
Task | Lunar Surface Samples | 1, 1, 1, 1, 1

