1. Introduction
In recent years, the role of robots in industrial settings has been undergoing continuous change. In response to the production demands of customized small-batch products, there is a growing need for greater flexibility in production lines. Consequently, robots are required to exhibit enhanced flexibility, reduced maintenance costs, and the capability for rapid reconfiguration and deployment [1]. Assembly tasks, as a crucial component of industrial production, significantly impact overall manufacturing productivity and quality [2]. By combining the complementary advantages of human intelligence and adaptability with the precision, repeatability, and strength of robots, more complex and precise assembly tasks can be achieved [3].
In HRC assembly scenarios, the role of robots has evolved from performing repetitive and fixed actions to inferring the motions of human operators, thereby enhancing the effectiveness and overall performance of the tasks [4]. HRC systems should fully leverage the strengths of both parties. Collaborative robots possess advantages such as high strength, durability, precision, and freedom from fatigue, which can greatly enhance production efficiency [5]. However, in certain stages of collaborative tasks, the flexibility of collaborative robots may be limited in the face of more complex scenarios and operational requirements [6]. On the other hand, human collaborators exhibit flexibility, adaptability, creativity, and problem-solving abilities, enabling them to quickly adapt to new process sequences. However, the repetitive and fixed actions involved in these tasks can lead to fatigue and impose a significant burden on the physical well-being of workers [7].
HRC assembly tasks encompass a class of intricate and continuous sequential tasks, wherein humans and robots coexist in the same working environment to accomplish the assembly of specific components. These assembly tasks can be decomposed into simpler individual tasks in a hierarchical manner [8]. Moreover, the scope of these tasks extends beyond a single objective to be fulfilled within the workspace. To attain more efficient and natural multitask assembly, the robot requires the following capabilities: (1) the ability to comprehend multiple tasks, enabling the classification and assessment of ongoing tasks based on object affordance information within the workspace [9]; (2) the ability to infer the operational intentions of human collaborators [10], allowing the robot to deduce the assembly sequence selected by human collaborators from observable information in the shared workspace; and (3) the ability to plan assembly tasks, in which motion planning is guided by object affordance information and the operational intentions of human collaborators. This planning enhances both parties’ assembly efficiency and enables parallel execution of multiple assembly tasks.
Inferring the intentions of human collaborators is crucial within a shared workspace [11]. Human behaviors introduce uncertainty to the entire system. In previous studies [12,13], human actions were constrained to fixed action sequences, strictly following pre-defined planning processes, and deviating from these fixed task sequences was prohibited during online execution. In [14], the researchers predefined the assembly tasks and their execution order. However, the behaviors exhibited by human collaborators during task completion differ from those of highly automated robots. As highlighted in [15,16], robots can adjust their movements based on the preferences of human collaborators. On the other hand, robots can enhance their autonomous decision-making capabilities by observing the behaviors of their collaborators [17].
To address these challenges, it is essential to plan the optimal assembly routes for different assembly scenarios. Given that non-single-task assembly significantly increases the complexity of the task space, optimization of assembly sequences must be performed based on topological, geometric, and technical constraints. The contributions of this paper are outlined as follows:
- (1)
To represent HRC assembly tasks, a task information-based assembly graph system is proposed. The serial and parallel relationships between different tasks are identified. Based on these relationships, mathematical expressions for human and robot behavioral levels are established.
- (2)
An HRC assembly dataset is established based on a simulation system, enabling the generation of high-quality assembly sequences. Concurrently, an evaluation method based on explicit trajectories and implicit assembly information is developed. This method is used to simulate human assembly preferences and quantitatively assess the quality of assembly sequences.
- (3)
A two-stage MLP-LSTM network is employed to realize sequence generation and evaluation for new assembly scenarios, thereby achieving optimal assembly sequence prediction and HRC task planning under different initial assembly environments.
2. Related Work
In traditional manufacturing, assembly is a time-consuming and energy-intensive process [18]. In conventional industrial robot production scenarios, robots are often confined within fences and tasked with repetitive and fixed operations [19]. In unstructured environments, robots require reprogramming, resulting in significant reductions in production efficiency [20]. With advancements in robot intelligence, it has become possible for robots and humans to collaborate in shared workspaces to accomplish common tasks [5]. Currently, there have been numerous studies on strategies for HRC assembly tasks [21,22]. The key elements in such research include the representation of complex task information and the recognition of human intentions during the assembly process.
2.1. HRC Task Representation
Numerous studies have proposed modeling approaches for expressing HRC task representations. Homem de Mello and Sanderson [23] were the first to represent assembly plans using AND/OR graphs, which represent assembly tasks as directed acyclic graphs. This approach enables hierarchical representation of complex assembly tasks and provides the ability to search for feasible assembly sequences. Similar research [24] decomposed assembly tasks into individual subtasks using a hierarchical structure and represented the tasks based on the dependencies between the subtasks. In [25], assembly tasks were modeled as a three-layer framework using AND/OR graphs, and a cost function was computed offline to select assembly strategies. However, the uncertainties in online robot perception and motion were not addressed, and the robot merely performed passive assembly actions. In recent research, AND/OR graphs have also been applied in multi-agent environments for HRC [26,27]. Optimal assembly sequences can be searched offline [28] and online [27], but AND/OR graphs face challenges in modeling the temporal dimension of parallel assembly tasks. In [29], Casalino et al. modeled the unpredictability of human behavior in HRC using a Partially Controlled Time Petri Net (TPN). Additionally, behavior trees have been frequently mentioned as a method for task representation in HRC tasks, and they have been used in industrial environments to create and monitor task plans for industrial robots [30]; however, such robots struggle to adapt to flexible and changing human behavior. For human–robot assembly tasks, ref. [31] proposed a generic task allocation method based on Hierarchical Finite State Machines (HFSMs). The developed framework first decomposes the main task into subtasks modeled as state machines. Based on considerations of capabilities, workload, and performance estimation, the task allocator assigns the subtasks to human or robotic agents. In recent years, Large Language Models (LLMs) have demonstrated breakthrough performance in the field of contextual semantic understanding [32]. In particular, in HRC scenarios, their core value as high-level semantic task planners has been fully verified [33].
However, as complex assembly tasks are characterized by uncertainty, complexity, and diversity, expressing them via graphical means allows for the assimilation of highly unstructured information into computable models. This provides a framework for obtaining optimized task sequences and constitutes the most fundamental step in realizing HRC assembly tasks.
2.2. Human Intention Recognition for HRC
The HRC process aims to achieve efficient assembly tasks. This requires the robot to infer the behavior of human collaborators within the shared workspace. A key focus in this inference process is recognizing human intentions. Prior research on human intention recognition models can be categorized into two main approaches: explicit representation and implicit representation of task-related human intentions.
Explicit human action behavior representation primarily involves sensing human motion information through sensors. In [30], a robot employed a methodology to capture different human motion patterns and unmodeled dynamics for more accurate human action perception. Andrianakos et al. implemented automatic monitoring of human assembly operations using visual sensors and machine learning techniques [31]. Natural interaction in HRC can be achieved through gesture recognition and human re-identification, enabling the robot to accurately perceive human attention and infer their intentions [34,35]. Wearable motion sensors provide another approach. In [36], Gaussian Mixture Models and Gaussian Mixture Regression were used to model action templates from inertial datasets obtained from sensors worn by operators, and online pattern matching algorithms were applied to detect and recognize meaningful actions performed by the operator. Unlike directly acquiring human motion information, implicit representation methods primarily express human intentions through the collection of task information and control of the assembly process. In [37], human intentions were modeled as a continuous sequence of assembly states that the operator considers from the current time onwards, which is translated graphically as various paths in an assembly state graph. Ravichandar et al. [38] proposed a novel approach to infer human intentions by representing them as the motion goals to be reached, using a neural network-based approximate Expectation Maximization algorithm and online model learning. Lin et al. [39] treated human intentions as hidden variables and predicted human assembly efficiency using Hidden Semi-Markov Models (HSMMs). With the development of deep learning, the HRiRT model proposed in [40] achieves human intention recognition (hand force estimation) by integrating the Transformer architecture. Similarly, ref. [41] adopts the encoder module of the Transformer, leveraging its self-attention mechanism to capture long-sequence dependencies in motion trajectories, applied to the key step of human intention recognition. In addition, ref. [42] infers human intention targets by aggregating neighborhood information through Graph Neural Networks (GNNs) and generates semantic robot action plans. The method in [43] uses a shared GNN encoder to enable the model to simultaneously learn the latent space representations of three tasks (action recognition, action prediction, and motion prediction), thus realizing an end-to-end multi-task inference framework.
Explicit human intention recognition methods are commonly used in multitask assembly scenarios. However, these methods exhibit limitations in real-time performance and efficiency. Human intentions typically revolve around specific goals. Hence, integrating task information with the current environmental data becomes a critical factor in effectively discerning human intentions.
3. Methodology
3.1. Problem Statement
This study aims to generate real-time HRC task graphs for assembly scenarios, with the specific workflow illustrated in Figure 1. First, a mathematical representation of the task information is constructed from domain knowledge to generate an offline graph of feasible assembly paths. However, due to the diversity of feasible task sequences, it is necessary to select the optimal assembly sequence, for which a two-stage MLP-LSTM network is employed in this paper. Nevertheless, the involvement of human collaborators introduces uncertainty into the HRC system, thus requiring further optimization of the assembly sequence based on human intentions.
3.2. Offline Task Graph for Human–Robot Collaborative Assembly
The proposed task representation framework in this paper, as depicted in Figure 2, enables the robot to comprehend task information through a task graph, thereby avoiding the repetition or erroneous execution of tasks that could lead to a decrease in assembly efficiency. Particularly in the case of multi-task assembly operations, where the assembly sequence is not unique, human operators possess an advantage in understanding the assembly of multiple tasks.
The proposed behavior graph in this paper is a type of directed acyclic graph based on object affordance information and task information. The object affordance set for task a is expressed as O_a = {o_1, o_2, …, o_n}, where o_i represents the affordance of part p_i in task a. The set of object affordance information comprises the names of components, the quantity of components, and the respective physical locations of each component. These factors serve as criteria to determine the feasible task types at the current stage. The task information is represented as a collection of assembly part sequences, denoted as T = {T_1, T_2, …, T_m}, where each T_j = (p_1, p_2, …) represents the assembly sequence of an individual task and each p_i denotes a specific component involved in the assembly process. In the behavior graph, each vertex V represents a collection of different components, while the edges E between vertices represent the progress of assembly tasks. We employ Algorithm 1 to construct the offline part of the behavior graph.
| Algorithm 1: Offline Behavior Graph Construction Method (pseudocode, 19 steps) |
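Since the pseudocode of Algorithm 1 did not survive extraction, the following is a minimal Python sketch of the offline behavior-graph construction described above. The task definitions are hypothetical ordered part lists, vertices are sets of already-assembled (task, part) pairs, and edges record the part added at each step; it is a sketch under these assumptions, not the authors' exact procedure.

```python
# Hypothetical task definitions: each task is an ordered list of parts
# (the internal assembly order of a single task must be respected).
TASKS = {
    "dumbbell": ["rod", "plate_large", "plate_small", "cap"],
    "tower":    ["base", "block_1", "block_2", "top"],
}

def build_behavior_graph(tasks):
    """Offline behavior-graph construction (sketch of Algorithm 1).

    Vertices are frozensets of already-assembled (task, part) pairs; an edge
    (u, v, part) means assembling `part` advances the state from u to v.
    Only the next unassembled part of each task is feasible at any vertex,
    so every root-to-leaf path is a feasible assembly sequence.
    """
    start = frozenset()
    vertices, edges = {start}, []
    frontier = [start]
    while frontier:
        state = frontier.pop()
        for task, parts in tasks.items():
            done = sum(1 for p in parts if (task, p) in state)
            if done < len(parts):                    # task not finished yet
                nxt = (task, parts[done])            # next feasible part
                new_state = state | {nxt}
                edges.append((state, new_state, nxt))
                if new_state not in vertices:
                    vertices.add(new_state)
                    frontier.append(new_state)
    return vertices, edges

if __name__ == "__main__":
    V, E = build_behavior_graph(TASKS)
    print(f"{len(V)} vertices, {len(E)} edges")
```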
In comparison to the behavior graph of a single task, the behavior graph of multiple tasks exhibits a significant difference in the number of feasible paths. However, within the set of feasible paths, there exist samples that do not conform to the assembly rules, thereby affecting the overall effectiveness of the assembly graph. Algorithm 2 is therefore proposed to eliminate candidate assembly paths that do not comply with the specified assembly rules. Based on the specified assembly order of each component within the subtasks, a logical distance value d_i is defined for each subtask, resulting in the assembly value set D = {d_1, d_2, …, d_n} for the entire assembly sequence. The logical distance d of a path is defined as the sum of the logical distances of the individual subtasks within that path. Paths that strictly adhere to the feasible assembly rules have a logical distance of 0 and are thus selected as the rule-compliant candidate paths. The path selection logic in Algorithm 2 is designed to extract the greatest common subset of the feasible paths; by comparing logical distances, it reduces the number of heuristic path evaluations and simplifies the subsequent generation of assembly path sequences.
| Algorithm 2: Path Screening Method Based on Logical Distance (pseudocode) |
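Likewise, a minimal sketch of the logical-distance screening idea behind Algorithm 2, assuming the logical distance of a path is the summed deviation of its per-task part order from the prescribed subtask order, and that only distance-0 paths are retained; this distance definition is an illustrative assumption consistent with the description above.

```python
def logical_distance(path, tasks):
    """Logical distance of a candidate path (sketch of Algorithm 2).

    `path` is a list of (task, part) steps.  For each subtask, count how many
    positions of the observed part order disagree with the prescribed order;
    the path distance is the sum over subtasks, so a fully rule-compliant
    path has distance 0.
    """
    dist = 0
    for task, parts in tasks.items():
        observed = [p for t, p in path if t == task]
        dist += sum(1 for o, r in zip(observed, parts) if o != r)
    return dist

def screen_paths(paths, tasks):
    """Keep only candidate paths whose logical distance is zero."""
    return [p for p in paths if logical_distance(p, tasks) == 0]

# Usage sketch: `paths` would be enumerated from the behavior graph above,
# e.g. paths = [[("tower", "base"), ("dumbbell", "rod"), ...], ...]
```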
An assembly state graph for the task can then be generated. Unlike the behavior graph, in which each node represents a part to be assembled, the assembly state graph depicts changes in task states during the assembly process. Each node is a vector whose dimension equals the number of tasks, reflecting basic assembly logic and serving as a mapping of the part space to the task space. As illustrated in Figure 3, nodes in dark blue represent tasks that must be performed by humans.
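As a small illustration of this part-space-to-task-space mapping, the following hypothetical helper converts a behavior-graph vertex into a state-graph node vector whose dimension equals the number of tasks (reusing the illustrative TASKS dictionary from the earlier sketch).

```python
def state_vector(assembled, tasks):
    """Map a behavior-graph vertex (part space) to a state-graph node (task space).

    Each entry counts how many parts of the corresponding task are assembled,
    so the node dimension equals the number of tasks.
    """
    return [sum(1 for p in parts if (task, p) in assembled)
            for task, parts in tasks.items()]

# e.g. frozenset({("tower", "base")}) maps to [0, 1]
# when the task order is ("dumbbell", "tower").
```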
3.3. Two-Stage MLP-LSTM Training Network
The HRC assembly task graph contains the feasible task sequences for assembly. However, variations in the initial positions of parts lead to differences in initial assembly states, rendering many feasible sequences non-optimal. To obtain the optimal assembly sequence, this paper proposes a two-stage MLP-LSTM training network.
The quality of assembly sequences is influenced by the coupling of multiple factors (time, load, collaboration, and spatial pose), making it difficult to fully characterize using analytical rules. A labeled dataset is therefore constructed to enable the model to learn nonlinear scoring mappings through supervised learning, thereby enhancing its generalization to unseen scenarios. Since collecting motion information in actual HRC scenarios consumes substantial resources, this paper uses simulation data to build a dataset of HRC assembly graphs. As shown in Figure 4, human assembly intent consists of two parts: explicit expression via human operation trajectories, which reflects humans’ part selection preferences during assembly, and implicit expression via Human–Robot assembly information (e.g., task allocation ratio, assembly time ratio, assembly distance ratio). In Figure 4, blue represents the assembly information of human collaborators, while pink represents that of the robot. This information reflects humans’ preferences regarding the overall performance of the assembly process. Expert scoring of high-quality assembly sequences is performed based on these two parts.
Based on high-quality assembly sequences in the database, this paper employs a temporal network to learn scene geometry and prior constraints, generating executable and relatively efficient assembly sequences for different assembly scenarios. The specific implementation workflow of the sequence generation network is illustrated in Figure 5. The generator consists of three main components: a Scene Encoder, Pre-Multi-Head Attention (Pre-MHA) with temporal modeling (Bi-LSTM), and Post-Multi-Head Attention with normalization (Post-MHA + Residual + LN). Two parallel prediction heads are connected to output the component and executor categories.
Scene features consist of two categories: fine-grained position features and spatial relationship features. The former counts the number of instances of each component category and records normalized coordinates and presence indicators for up to K instances according to a fixed upper limit K, thereby establishing a comparable fixed-length representation. The latter encodes the geometric layout across components by calculating cross-category neighbor pairwise distances, relative displacements, and category indices. These two types of features are concatenated as the model input, enabling the network to perceive both multi-instance distributions and cross-category geometric correlations.

The scene feature s is projected into a global embedding g via an MLP; g is expanded to the sequence length and combined with a learnable positional encoding to form P; P is then added to tokens composed of the previous-time-step (component/executor) embeddings to obtain X. The temporal representation Y is derived through multi-head self-attention and a multi-layer bidirectional LSTM, followed by lightweight self-attention, a residual connection, and layer normalization to yield U. Finally, the parallel linear heads, the Part Head and the Operator Head, project U to the component vocabulary dimension and the operator vocabulary dimension, respectively, generating the component and executor logits. During inference, autoregressive decoding is adopted: at each step, sampling is performed over the predicted component and executor distributions, and an assembly order and quantity constraint mask is applied so that only feasible component candidates are allowed at the current step. Probabilistic selection is then conducted over these candidates to ensure that the sequence complies with the process order. For human-exclusive steps, the executor head is forced to select the human, guaranteeing that task allocation adheres to the rules.
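A condensed PyTorch sketch of the generator pipeline described above (scene-encoder MLP, pre-MHA, Bi-LSTM, post-MHA with residual connection and LayerNorm, and parallel part/operator heads). All layer sizes, token conventions, and the omission of the inference-time constraint mask are illustrative assumptions rather than the exact implementation.

```python
import torch
import torch.nn as nn

class SequenceGenerator(nn.Module):
    """Sketch of the generator: scene MLP -> pre-MHA -> Bi-LSTM -> post-MHA
    with residual + LayerNorm -> parallel part / operator heads."""

    def __init__(self, scene_dim, n_parts, n_ops, d=128, max_len=16):
        super().__init__()
        self.scene_mlp = nn.Sequential(nn.Linear(scene_dim, d), nn.ReLU(), nn.Linear(d, d))
        self.pos = nn.Parameter(torch.zeros(max_len, d))       # learnable positional encoding
        self.part_emb = nn.Embedding(n_parts + 1, d)            # +1 for a start token
        self.op_emb = nn.Embedding(n_ops + 1, d)
        self.pre_mha = nn.MultiheadAttention(d, num_heads=4, batch_first=True)
        self.bilstm = nn.LSTM(d, d // 2, num_layers=2, bidirectional=True, batch_first=True)
        self.post_mha = nn.MultiheadAttention(d, num_heads=4, batch_first=True)
        self.ln = nn.LayerNorm(d)
        self.part_head = nn.Linear(d, n_parts)
        self.op_head = nn.Linear(d, n_ops)

    def forward(self, scene, prev_parts, prev_ops):
        # scene: (B, scene_dim); prev_parts / prev_ops: (B, T) previous-step token indices
        B, T = prev_parts.shape
        g = self.scene_mlp(scene).unsqueeze(1).expand(B, T, -1)   # global scene embedding
        x = g + self.pos[:T] + self.part_emb(prev_parts) + self.op_emb(prev_ops)
        x, _ = self.pre_mha(x, x, x)
        y, _ = self.bilstm(x)                                     # temporal representation Y
        u, _ = self.post_mha(y, y, y)
        u = self.ln(u + y)                                        # residual + LayerNorm -> U
        return self.part_head(u), self.op_head(u)                 # part / operator logits
```

During inference, the generated logits would be masked with the assembly order and quantity constraints described above before sampling; that masking step is omitted here for brevity.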
Based on the assembly sequences generated by the aforementioned sequence generation network, we propose an LSTM-based sequence scoring network to evaluate the execution quality of assembly sequences. This network adopts a dual-branch architecture: a positional encoding branch and a sequence modeling branch, which ultimately fuse features via a multi-layer perceptron (MLP) and output a scalar score.
The input to the network is a joint state set of part positions and assembly sequences. The position vector has dimension N × k (with N denoting the number of part categories and k the coordinate dimension), and the sequence index matrix contains one row per assembly step, each row holding two indices: the part index and the executor index, where the executor is either the human collaborator or the robot.

The part embedding function maps discrete part indices to continuous high-dimensional vector representations, e_t^part, the part embedding vector at time t. The executor embedding function maps executor types to embedding vectors, e_t^op, the executor embedding vector at time t. For each time step t in the sequence, the part embedding and executor embedding are concatenated to form a fused representation z_t = [e_t^part; e_t^op]. The fused sequence features are fed into an LSTM network for temporal modeling, H = LSTM(z_1, …, z_T), where the LSTM can be a bidirectional structure (with an output dimension of twice the hidden size) and H denotes the hidden state sequence of the entire sequence. The final hidden state of the LSTM is used as the global representation of the entire assembly sequence. Feature refinement is performed on this sequence representation using layer normalization and a linear transformation. Nonlinear transformations via several fully connected layers then yield the predicted score, with mean squared error (MSE) adopted as the regression loss. The scoring network is trained on the scored sequences from the database so that high-scoring generated sequences can be obtained.
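A corresponding PyTorch sketch of the scoring network, assuming the embedding dimensions reported later in the experiments (64 for parts, 8 for executors), a single-layer LSTM with 128 hidden units, LayerNorm plus a linear refinement, and a two-layer MLP head fused with the flattened position vector; the names and the exact fusion point are assumptions.

```python
import torch
import torch.nn as nn

class SequenceScorer(nn.Module):
    """Sketch of the scoring network: part/executor embeddings -> LSTM ->
    LayerNorm + linear refinement -> fuse with scene position vector -> MLP -> score."""

    def __init__(self, n_parts, n_ops, pos_dim, d_part=64, d_op=8, d_h=128):
        super().__init__()
        self.part_emb = nn.Embedding(n_parts, d_part)
        self.op_emb = nn.Embedding(n_ops, d_op)
        self.lstm = nn.LSTM(d_part + d_op, d_h, num_layers=1, batch_first=True)
        self.refine = nn.Sequential(nn.LayerNorm(d_h), nn.Linear(d_h, d_h))
        self.head = nn.Sequential(
            nn.Linear(d_h + pos_dim, 256), nn.ReLU(), nn.Dropout(0.1),
            nn.Linear(256, 128), nn.ReLU(), nn.Linear(128, 1))

    def forward(self, parts, ops, pos):
        # parts / ops: (B, T) index sequences; pos: (B, pos_dim) flattened part positions
        z = torch.cat([self.part_emb(parts), self.op_emb(ops)], dim=-1)
        h, _ = self.lstm(z)
        g = self.refine(h[:, -1])              # final hidden state as global summary
        return self.head(torch.cat([g, pos], dim=-1)).squeeze(-1)

# Training sketch: criterion = nn.MSELoss(); loss = criterion(model(p, o, x), scores)
```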
4. Case Study
The method proposed in this paper is validated in HRC assembly scenarios. To verify the effectiveness of the method under reasonable environmental constraints, the following assumptions are made for generalized assembly scenarios.
Assumption 1: Unlike [40], where the types and positions of all assembly components are predefined, this paper simulates a maximally generalized assembly environment. Thus, the positions of task parts are not fixed, and the robot needs to perceive environmental information and object affordances before performing tasks.
Assumption 2: To verify the effectiveness of the method in a multi-task environment, the number of operators and robots in the assembly environment constructed in this paper is strictly limited. This paper discusses how to complete multi-task assembly in a one-to-one robot-operator assembly system.
Assumption 3: Each individual assembly task discussed in this paper has a specific assembly sequence, which must follow certain constraint orders to ensure the smooth completion of the task. Corresponding assembly sequences have been predefined in the knowledge base; however, there is no explicit assembly order between parallel tasks.
Assumption 4: Due to the limitations of the robot’s end effector, certain assembly steps can only be completed by human collaborators. This paper incorporates this factor as an action constraint in HRC to plan robot actions.
Assumption 5: This paper aims to improve the efficiency of HRC in a multi-task environment while ensuring safety in collaboration. It is assumed that when there is an overlap between human motion trajectories and robot action trajectories, the robot will adopt a strategy to avoid the movement path of human collaborators.
The assembly tasks studied in this paper are illustrated in Figure 6 and mainly include the assembly of two types of objects: a dumbbell assembly and a tower assembly. Each component of the assembly can be assigned to either the robot or the human collaborator. The assembly difficulty of the two tasks is set to be similar, aiming to reduce the variability of different operators across different tasks.
The dumbbell-shaped assembly task involves 3 types of objects and 6 components, with the components featuring the following characteristics:
- (1)
Fully sequential assembly: There is a size relationship between components, which must be assembled in descending order of size.
- (2)
Three pairs of identical components: During assembly, the selectable components may not be unique.
- (3)
Symmetric assembly: Mirror image action sequences exist.
- (4)
Human-exclusive assembly for one component: Due to assembly orientation constraints, one component must be assembled by a human.
The other assembly task is to complete the assembly of a tower-shaped object. This type of object assembly is designed to involve 3 types of objects and 5 components, with the components having the following characteristics:
- (1)
Incomplete sequential assembly: The positions of components can be swapped.
- (2)
One type of identical components: There exists a category of identical components.
- (3)
Asymmetric assembly: It can only be assembled in a specific order.
- (4)
Human-collaborator-exclusive final assembly: The assembly of the last component must be completed by a human collaborator.
5. Experiments
In multi-task assembly scenarios, during the assembly of subtasks there exist states in which tasks interact with each other, while both parties must still strictly follow the assembly sequence of each subtask. This process is referred to as relative sequential assembly, which gives the assembly process multiple selectable options. As shown in Figure 7, mixed assembly involves a large number of feasible paths.
The real robot platform includes a RealSense D435i depth camera and a UR5 collaborative robot. However, since implementing the robot’s shaft-hole assembly operation is not the main research content of this paper, the experiment uses the PyBullet simulation environment for sequence simulation.
In this paper, 400 high-quality assembly sequences under different assembly scenarios are generated through dynamic-constraint algorithm simulation. By incorporating domain knowledge and balancing assembly information such as human assembly trajectories and HRC resource allocation, 20 distinct and relatively optimal assembly paths are generated for each scenario. Two skilled operators and two unskilled operators then score each path from multiple dimensions, considering both explicit human movement trajectories and implicit collaborative information (e.g., assembly time and the proportion of human idle time).
Since labeling costs increase exponentially with the number of scenarios, the study adopts a part position drift method for data augmentation—by applying slight drifts to the position of each part and its corresponding score. Ultimately, 12,000 optimized assembly paths and their respective scores are formed.
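A minimal sketch of the part-position drift augmentation, assuming a small Gaussian drift on each part's coordinates; whether the score itself is also perturbed is not fully specified in the text, so the sketch simply carries each score over unchanged.

```python
import numpy as np

def augment_scenario(positions, score, n_aug=30, sigma=0.01, rng=None):
    """Part-position drift augmentation (sketch).

    positions: (N, k) array of part coordinates for one scored scenario.
    Each augmented copy adds a small Gaussian drift to every part while the
    expert score is carried over, cheaply multiplying the labeled samples.
    """
    rng = rng or np.random.default_rng()
    return [(positions + rng.normal(0.0, sigma, positions.shape), score)
            for _ in range(n_aug)]

# Repeated over the scored paths, this expands the labeled data toward the
# 12,000 optimized assembly paths reported above.
```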
In the sequence generation stage, we adopt an LSTM generation model conditioned on scene features. The model first encodes scene embeddings using a two-layer feedforward network (with ReLU as the activation function and a hidden layer dimension of 128). Subsequently, the scene representation is concatenated with autoregressive token embeddings and input into a two-layer LSTM for sequence modeling (128 hidden units, 2 layers, and a dropout rate of 0.2). The training configuration includes the AdamW optimizer with reduce-on-plateau (ROP) learning-rate scheduling, a batch size of 16, and a maximum of 100 epochs. A linear mapping produces the conditional distribution over the vocabulary, which is used to incrementally sample assembly sequences. In the sequence scoring stage, we use an LSTM regressor that jointly models assembly sequence embeddings and position features to score the quality of candidate sequences. The model consists of a single-layer LSTM with 128 hidden units, whose input is formed by concatenating discrete embeddings of parts and executors (with dimensions of 64 and 8, respectively). The LSTM output, after LayerNorm and a linear transformation, is jointly input with the scene position vector into a two-layer feedforward network (hidden layer dimensions of 256 and 128, ReLU activation, and a dropout rate of 0.1) to generate a scalar score. For the training configuration, the Adam optimizer is used (with a learning rate of 1 × 10⁻³ and a weight decay of 1 × 10⁻⁴), along with a batch size of 128 and 100 training epochs. The experiments were run on an NVIDIA GeForce RTX 4060 GPU using Python and the PyTorch 2.5.1 deep learning framework. The loss curves of the scoring model training are shown in Figure 8, where blue represents the MSE on the training set and orange the MSE on the validation set. The model loss converged by epoch 40. Since the number of parts used in our experiment is not large, the training cost of the sequence generation network is relatively low. For an 11-step HRC sequence, the generative model takes 13.5 s per epoch, with a total training time of 8.1 min; the scoring model takes 1.32 s per epoch, with a total training time of 2.2 min. The average time for a single sequence inference is 378.8 ms, and scoring one sequence takes 0.95 ms, meaning that about 2.6 sequences can be generated and evaluated per second.
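For reference, a minimal training-loop sketch for the scoring stage under the reported configuration (Adam, learning rate 1 × 10⁻³, weight decay 1 × 10⁻⁴, batch size 128, 100 epochs, MSE loss); the tensor inputs and the SequenceScorer sketched earlier are assumptions for illustration.

```python
from torch import nn, optim
from torch.utils.data import DataLoader, TensorDataset

def train_scorer(model, parts, ops, pos, scores, epochs=100, batch_size=128):
    """Training sketch for the scoring network under the reported configuration.

    parts / ops: long tensors of shape (M, T); pos: float tensor (M, pos_dim);
    scores: float tensor (M,) with the expert scores from the database.
    """
    loader = DataLoader(TensorDataset(parts, ops, pos, scores),
                        batch_size=batch_size, shuffle=True)
    opt = optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-4)
    criterion = nn.MSELoss()
    for _ in range(epochs):
        for p, o, x, y in loader:
            opt.zero_grad()
            loss = criterion(model(p, o, x), y)   # MSE regression loss
            loss.backward()
            opt.step()
    return model
```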
We validated the proposed method on 100 regenerated scenarios and compared it with several baseline search methods, including Greedy Search [38], Depth-First Search (DFS) [39], and Breadth-First Search (BFS) [40].
First, we conducted simulation comparisons based on the optimal sequences generated by each method. As shown in Figure 9, in terms of comprehensive quality, a weighted combination of three sequence structural features, namely balance (40%, requiring the Human–Robot work ratio to be close to 0.5), collaboration efficiency (30%, requiring the role-switching frequency to lie within a reasonable range), and task continuity (30%, requiring that the maximum continuous task duration for the same executor not be excessively long), MLP_Optimal significantly outperforms the other methods. Specifically, the average comprehensive quality of MLP_Optimal is 0.8846, which is higher than that of DFS (0.8572), Baseline (0.8500), GREEDY_LOAD (0.8268), and GREEDY_TIME (0.7558). This indicates that MLP_Optimal has greater advantages in the overall optimization of balanced task allocation, a reasonable switching rhythm, and continuity constraints.
In addition, from the perspective of Human–Robot division balance (with an ideal ratio of 0.5), the average human work ratio of MLP_Optimal is 0.4610. Its deviation from the ideal ratio (0.0390) is smaller than that of Baseline (0.0482) and significantly lower than that of DFS (0.1364) and the two Greedy strategies (0.1580/0.2246). The above results demonstrate that while improving the overall quality of sequences, MLP_Optimal better maintains the balance of Human–Robot workload, and thus has higher usability and potential robustness. Meanwhile, in Scenario 51, the human ratio of MLP_Optimal is 0.61, which deviates significantly from the ideal ratio of 0.5. Additionally, the maximum number of consecutive human tasks is 6, resulting in humans performing tasks for extended periods. Such failure cases mainly stem from the unbalanced spatial layout of the scenario and the model’s insensitivity to local constraints.
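A hedged sketch of how the weighted comprehensive-quality score described above could be computed from an executor sequence; the weights follow the text (0.4/0.3/0.3), while the normalization of each sub-score and the thresholds are illustrative assumptions, not the exact scoring rule.

```python
def comprehensive_quality(executors, w=(0.4, 0.3, 0.3),
                          max_run_limit=4, switch_range=(0.2, 0.6)):
    """Weighted quality of an executor sequence (illustrative sketch).

    executors: list like ["human", "robot", ...] giving who performs each step.
    balance       -> human work ratio close to 0.5             (weight 0.4)
    collaboration -> role-switch frequency in a target range   (weight 0.3)
    continuity    -> longest same-executor run not too long    (weight 0.3)
    """
    n = len(executors)
    human_ratio = executors.count("human") / n
    balance = 1.0 - 2.0 * abs(human_ratio - 0.5)

    switches = sum(a != b for a, b in zip(executors, executors[1:])) / max(n - 1, 1)
    lo, hi = switch_range
    collab = 1.0 if lo <= switches <= hi else max(0.0, 1.0 - abs(switches - (lo + hi) / 2))

    run = max_run = 1
    for a, b in zip(executors, executors[1:]):
        run = run + 1 if a == b else 1
        max_run = max(max_run, run)
    continuity = 1.0 if max_run <= max_run_limit else max(0.0, 1.0 - (max_run - max_run_limit) / n)

    return w[0] * balance + w[1] * collab + w[2] * continuity
```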
Since we need to compare the differences between different sequences, this paper also adopts the Kendall rank correlation coefficient [41] and the Spearman rank correlation coefficient [42] to evaluate the performance of each method. By comparing against the optimal sequences evaluated by human experts, as shown in Figure 10, the proposed two-stage generation method achieves the best performance on both the Kendall and Spearman coefficient metrics. Due to the addition of speed noise in the simulation process, the assembly time of each assembly step is roughly similar, resulting in overall low correlation coefficients. However, compared with other algorithms, the method in this paper can still clearly learn the pattern of humans selecting optimal assembly paths.
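A small sketch of this rank-agreement evaluation with SciPy, comparing a method's scores for candidate sequences against the expert scores; variable names are illustrative.

```python
from scipy.stats import kendalltau, spearmanr

def rank_agreement(predicted_scores, expert_scores):
    """Kendall tau and Spearman rho between a method's sequence scores
    and the expert-assigned scores for the same candidate sequences."""
    tau, _ = kendalltau(predicted_scores, expert_scores)
    rho, _ = spearmanr(predicted_scores, expert_scores)
    return tau, rho

# e.g. rank_agreement(model_scores, expert_scores) per evaluation scenario,
# then average the coefficients over the 100 regenerated scenarios.
```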
6. Conclusions
HRC assembly plays a crucial role in the field of intelligent manufacturing. However, different initial assembly scenarios give rise to a variety of complex feasible assembly paths. Generating optimized assembly sequences is a key step in improving assembly quality and the assembly experience of human collaborators. To reasonably arrange the work of humans and robots in a shared space, task planning needs to be carried out based on human preferences and intentions. This study established an assembly simulation database, and through a two-stage MLP-LSTM sequence generation and scoring network, learned human preferences in HRC assembly tasks. Optimized assembly sequences were generated according to different initial assembly scenarios, balancing Human–Robot resource allocation and overall task efficiency. This study has clear engineering application value. The proposed sequence encoding and MLP methods are sensitive to key constraints such as task allocation ratio, collaborative switching, and continuity, enabling rapid migration to larger-scale assembly lines without modifying production hardware. We suggest integrating weak supervision and self-supervision strategies: using simulation-generated feasible sequences and process rules to automatically generate pseudo-labels, combining active learning to select a small number of high-value scenarios for manual annotation, and reusing existing model capabilities through transfer learning/incremental learning. This approach can control the demand for manual annotation within a manageable range.
Nevertheless, this study also has several limitations: (1) The method relies on a high-quality assembly simulation database, and the cost of data collection for real HRC tasks is very high. (2) A mechanism for dynamically tracking task status and adjusting the assembly sequence in real time has not been established. (3) The evaluation of sequence quality relies heavily on manual work, and the optimal sequence obtained through the final evaluation may not necessarily be the globally optimal one. (4) The ability to generalize to new assembly tasks is insufficient, requiring prior knowledge of task information and assembly constraints. (5) Restricted by sensor performance, visual sensors are prone to part pose recognition deviations under complex lighting conditions (e.g., direct glare, backlighting) or in occluded scenarios. (6) The operational flexibility of robot end-effectors is limited and cannot fully match the fine movements of human hands; moreover, the safety distance setting in the HRC space limits operational efficiency, making it difficult to balance safety and assembly speed. Future research can be conducted in the following aspects:
First, a “hybrid dataset augmentation framework” integrating real-scene data and virtual simulation data will be constructed. Leveraging transfer learning and data augmentation techniques, the limitation of high collection costs for real data will be addressed, while research on multi-agent collaborative systems will be conducted.
Second, real-time perception and dynamic planning modules will be introduced. Combined with visual sensors and force feedback devices, part status, human operation trajectories, and robot workload during the assembly process will be captured. Through the Action Chunk Transformer framework, dynamic adjustment of assembly sequences will be realized.
Finally, a general assembly sequence generation model based on large language models (LLMs) will be explored. In the pre-training phase, the model will learn the common constraints of different assembly tasks. When confronted with new tasks, only a small amount of task descriptions (e.g., part types, assembly objectives) need to be input to quickly generate feasible sequences, which assists the model in reasoning and decision-making in unknown scenarios and significantly enhances its generalization capability.