Article

Inner–Outer Loop Intelligent Morphology Optimization and Pursuit–Evasion Control for Space Modular Robot

1 Institute of Logistics Science and Engineering, Shanghai Maritime University, Shanghai 201306, China
2 National Innovation Institute of Defense Technology, Chinese Academy of Military Science, Beijing 100071, China
3 DFH Satellite Co., Ltd., Beijing 100094, China
* Authors to whom correspondence should be addressed.
Actuators 2025, 14(5), 234; https://doi.org/10.3390/act14050234
Submission received: 27 March 2025 / Revised: 29 April 2025 / Accepted: 5 May 2025 / Published: 8 May 2025
(This article belongs to the Special Issue Actuators in Robotic Control—3rd Edition)

Abstract

This paper proposes an inner–outer loop computational framework to address the morphology optimization and pursuit–evasion control problem for space modular robots. First, a morphological design space that accounts for the functional characteristics of the different modules is designed. An elite genetic algorithm then evolves morphologies within this space, and a proximal policy optimization algorithm controls the space modular robot with each evolved morphology. A comprehensive morphological assessment considering symmetry, centrality, module cost, and average cumulative reward is proposed to evaluate each morphology, and the assessment result serves as the fitness for evolution. In addition, by implementing the algorithm on the JAX framework for parallel computing, the computational efficiency is significantly enhanced, allowing the entire optimization process to complete within 17.3 h. Comparative simulation results verify the effectiveness and superiority of the proposed computational framework.

1. Introduction

Owing to their ability to reconfigure adaptively according to task environments, modular robots serve as an ideal platform for performing complex tasks. They have garnered increasing attention for their potential in environment exploration, object transportation, emergency rescue, and other fields [1,2,3,4]. Space pursuit–evasion problems, in turn, have been extensively studied as fundamental problems in fields such as search and rescue [5,6,7]. Compared with conventional, monolithic robots, modular robots promise to be versatile and resilient in the face of complex tasks [8], attributes that could enhance their efficiency in space pursuit–evasion tasks. However, research specifically addressing the application of modular robots to space pursuit–evasion tasks is still lacking, and designing modular robots for such complex tasks remains a challenging problem.
With the advancement of computational design, a variety of optimization methods have been applied to the morphology optimization and control problems of modular robots [9,10]. For morphology optimization, some studies employ genetic algorithms (GA), particle swarm optimization (PSO), Bayesian optimization (BO), and other optimization algorithms to optimize the kinematics, topology, and locomotion of robots [11,12,13,14]. With the increasing complexity of modular robots and tasks, recent work on morphology optimization primarily focuses on developing different morphological encoding schemes for various modular robots and tasks. In [15], the hierarchy of articulated three-dimensional rigid parts is directly encoded as a directed graph of nodes and connections. In [16], a direct encoding scheme is used to evolve the morphology of reconfigurable organisms in simulation and is further applied to the design of practical, completely biological machines. A powerful indirect encoding scheme called CPPN-NEAT is tested on a variety of morphological evolution tasks in [17]; the results show that this encoding can create complex regularities such as symmetry and repetition. A graph grammar is proposed in [14] to express possible arrangements of physical robot assemblies, and a robot with components including different joint types, links, and wheels is then encoded indirectly as a sequence of grammar rules. Overall, direct encodings ensure a complete morphological design space but struggle to consistently produce regular morphologies and coordinated behaviors, whereas indirect encodings facilitate the emergence of regularity and variation in morphology but restrict the morphological design space [17]. It is therefore crucial to design encoding schemes tailored to the requirements of the task and the characteristics of the modular robot.
Controlling robots with complex locomotor systems is difficult, as successful and efficient movement requires the precise coordination of numerous parameters [18]. Reinforcement learning (RL) is currently the mainstream approach to the control of robots with complex morphologies, and it has increasingly been integrated with large language models (LLMs) and deep neural networks (DNNs) to enhance decision-making and adaptability [19,20]. In [21], a computational framework named Deep Evolutionary Reinforcement Learning (DERL) is proposed to evolve and control robots learning locomotion and manipulation tasks in complex environments with different terrains. In [22], the REINFORCE, soft actor-critic, and proximal policy optimization (PPO) algorithms are adopted to simultaneously optimize the discrete and continuous parameters of the morphology and controller, thereby achieving an efficient search over various robot designs. In particular, the PPO algorithm is applied in [23] to train and guide a pursuer to approach an evader intelligently. Beyond PPO, a highly efficient co-optimization scheme combining control policy pre-training based on meta-reinforcement learning with gradient-based optimization of variable hardware parameters was proposed in [24] to optimize robotic hand morphology. In [25], a multi-objective collaborative deep reinforcement learning algorithm based on the deep deterministic policy gradient (DDPG) is proposed to address poor robustness and height limitations in bipedal robot jumping. Moreover, external disturbances and actuator failures are critical challenges in robotic control [26], and reinforcement learning has been employed to address them. Two high-accuracy trajectory tracking control policies for a quadrotor unmanned aerial vehicle (UAV) subject to external disturbances are developed in [27] using reinforcement learning, effectively balancing control cost and control performance. In [28], an optimized intelligent control scheme based on adaptive dynamic programming is proposed to control a UAV exposed to disturbances and actuator failures. However, most of these studies focus on generating basic behaviors such as swimming, crawling, and manipulation [21,22,29]. This simplification of control objectives shifts research toward optimizing the fundamental capabilities of modular robots while neglecting comprehensive task-oriented control. Some studies have begun to address the control of modular robots for complex tasks [30,31], but research in this area remains limited.
For modular robots with vast morphological design spaces, achieving convergence in evolution and learning is challenging. Common strategies in prior work include constraining the morphological design space or finding optimal parameters for a fixed, hand-designed morphology [32,33,34]. Several other approaches exploit GPUs and parallel computing to accelerate the algorithms, thereby supporting large-scale evolution and learning [35,36,37]. Among them, JAX is a framework designed for high-performance computing and machine learning that significantly enhances execution speed through just-in-time (JIT) compilation [38,39].
Inspired by the above findings, an intelligent morphology optimization and pursuit–evasion control approach is proposed for space modular robots. The proposed approach evolves the morphology and trains the space modular robot efficiently, ultimately achieving an approximately globally optimal combination of morphology and strategy. The main contributions are summarized as follows:
(1) An inner–outer loop computational framework alternating between the evolution of morphology and the training of modular robots is built for the space pursuit–evasion task for the first time. The framework is implemented on JAX to achieve efficient parallel computation, ultimately yielding an approximately globally optimal combination of morphology and strategy for the space pursuit–evasion task.
(2) In the outer loop, a morphological evolutionary algorithm based on the elite GA (EGA) is developed to optimize the morphology of the modular robot. Unlike most studies on modular robots, which focus on modules with straightforward functions [21,22], this study investigates a modular robot with various practical module functions. Accordingly, a morphological design space that considers the various module functional characteristics and a corresponding morphological encoding scheme are designed. Additionally, a comprehensive morphological assessment is proposed to guide morphological evolution and ensure that the evolved morphology has good structural and control performance.
(3) In the inner loop, a pursuit–evasion control approach based on the PPO algorithm is proposed for the space modular robot to pursue free-floating and maneuvering evaders. By introducing a fuel consumption penalty into the reward function, the learned pursuit strategy is near fuel-optimal, and the proposed approach effectively balances pursuit performance and control cost.

2. Model and Problem Statement

2.1. Model of a Space Modular Robot

To investigate morphology optimization and pursuit–evasion control for a space modular robot with different functional modules, we introduce a morphological design space $\mathcal{M}$, which contains physically realistic morphologies that can learn pursuit strategies in complex task environments. Each morphology $M \in \mathcal{M}$ is a structure composed of various functional modules interconnected via rigid face-to-face connections, as shown in Figure 1. The module types include a control module (blue), auxiliary module (green), energy supply module (yellow), propulsion module (red), and basic connection module (grey). A body-fixed frame $\mathcal{F}_B\{O_B X_B Y_B Z_B\}$ is established with its origin $O_B$ at the center of mass of the space modular robot, as illustrated in Figure 1a. The way one module connects to another can then be represented as $\pm X_B$, $\pm Y_B$, $\pm Z_B$, where $+X_B$ denotes connecting one module to another in the $+X_B$ direction.
Because the propulsion module is the actuator driving the locomotion of the space modular robot in three-dimensional space, it must be connected to at least one energy supply module and cannot be connected to by any other module. Meanwhile, each energy supply module can be connected to by at most one propulsion module. Additionally, each morphology must include at least one control module and one auxiliary module, with no volume collisions between modules. Beyond these conditions, which are necessary to ensure engineering practicality, no additional constraints are imposed on the morphological design space, yielding an extremely expressive design space with minimal priors and restrictions.
A robot morphology $M$ can be represented as a multiway tree $G$ via direct encoding, as shown in Figure 1b. The modules of the morphology are regarded as the nodes of the multiway tree, and the connections between two modules are regarded as the edges between the corresponding nodes. The edges possess six possible weights ($\pm 1, \pm 2, \pm 3$), corresponding to the six connection methods; the nodes possess five possible attributes (0, 1, 2, 3, 4), corresponding to the five module functions. The root node is fixed as the control module, so its attribute is always 0. For a multiway tree $G$, denote the sets of nodes, node attributes, edges, and edge weights as $V$, $F$, $E$, and $D$, respectively, so that $G = \{V, F, E, D\}$. Thus, the robot morphology $M$ in Figure 1a can be encoded as
$$
\begin{aligned}
G &= \{V, F, E, D\} \\
V &= \{0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10\} \\
F &= \{0, 4, 2, 4, 1, 2, 3, 3, 1, 2, 3\} \\
E &= \{(0,1), (1,0), (0,2), (2,0), (0,3), (3,0), (0,4), (4,0), (1,5), (5,1), \\
&\qquad\; (5,6), (6,5), (2,7), (7,2), (4,8), (8,4), (4,9), (9,4), (9,10), (10,9)\} \\
D &= \{1, 2, 1, 2, 3, 2, 1, 1, 1, 3\}
\end{aligned}
\tag{1}
$$
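To make the encoding concrete, the following sketch stores the tree of Equation (1) in a plain Python container. The `MorphologyGraph` class and its field names are a hypothetical illustration, not the authors' implementation; each bidirectional edge pair of $E$ is stored once as a (parent, child) tuple, and the edge weights are listed as printed above.

```python
from dataclasses import dataclass

@dataclass
class MorphologyGraph:
    """Container for the multiway-tree encoding G = {V, F, E, D}.
    Module functions: 0 control, 1 auxiliary, 2 energy supply,
    3 propulsion, 4 basic connection."""
    nodes: list       # V: node indices; node 0 (the root) is the control module
    functions: list   # F: module function attribute per node
    edges: list       # E: (parent, child) pairs, one per connection
    weights: list     # D: connection method per edge, from {+-1, +-2, +-3}

# The example morphology of Eq. (1).
G_example = MorphologyGraph(
    nodes=list(range(11)),
    functions=[0, 4, 2, 4, 1, 2, 3, 3, 1, 2, 3],
    edges=[(0, 1), (0, 2), (0, 3), (0, 4), (1, 5),
           (5, 6), (2, 7), (4, 8), (4, 9), (9, 10)],
    weights=[1, 2, 1, 2, 3, 2, 1, 1, 1, 3],
)
```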
To model the dynamics of the space modular robot, the Earth-centered inertial coordinate frame $\mathcal{F}_I\{O_I X_I Y_I Z_I\}$ is established, as illustrated in Figure 2. Let $x_i = [r_i^T, \dot{r}_i^T]^T$ be the state vector, where $r_i \in \mathbb{R}^3$ is the position relative to $\mathcal{F}_I$ and expressed in $\mathcal{F}_I$. The subscript $i = p, e$ denotes the pursuer (space modular robot) and the evader, respectively. Based on Newton's law of gravitation and Newton's second law of motion, and treating the Earth as a homogeneous sphere, the equations of motion of the space modular robot and the evader in the reference frame $\mathcal{F}_I$ are given by
$$\dot{x}_i = f_i(x_i) + g_i u_i \tag{2}$$
where $u_i \in \mathbb{R}^3$ is the control thrust relative to $\mathcal{F}_I$ and expressed in $\mathcal{F}_I$; $f_i(x_i)$ and $g_i$ are given by
$$f_i(x_i) = \begin{bmatrix} \dot{r}_i \\ -\dfrac{\mu}{\|r_i\|^3}\, r_i \end{bmatrix}, \qquad g_i = \begin{bmatrix} 0_{3\times 3} \\ \dfrac{1}{m_i}\, I_{3\times 3} \end{bmatrix} \tag{3}$$
where $m_i$ is the mass and $\mu$ is the gravitational constant of the Earth. Neglecting the impact of fuel consumption on mass, the mass $m_p$ depends solely on the number of modules composing the space modular robot and is modeled as
$$m_p = k n \tag{4}$$
where $k$ is the mass of one module and $n$ is the total number of modules. Additionally, the control thrust $u_p = [u_{px}, u_{py}, u_{pz}]^T$ is constrained by
$$|u_{px}| \le T_{\max}, \quad |u_{py}| \le T_{\max}, \quad |u_{pz}| \le T_{\max}, \qquad T_{\max} = T_s \cdot \max(q^+, q^-) \tag{5}$$
where $T_{\max}$ is the maximum thrust of the space modular robot; $T_s$ is the maximum thrust that a single propulsion module can generate; and $q^+ = [b_{x1}, b_{y1}, b_{z1}]^T$ and $q^- = [b_{x2}, b_{y2}, b_{z2}]^T$, where $b_{x1}, b_{x2}, b_{y1}, b_{y2}, b_{z1}, b_{z2}$ are the numbers of propulsion modules whose operative faces are arranged along $\pm X_B$, $\pm Y_B$, $\pm Z_B$, respectively (the operative face of a propulsion module is opposite to its connection face). Meanwhile, the total thrust the space modular robot can generate is limited by
$$\int \|u_p\|\, \mathrm{d}t \le I_{sp}\, m_{\text{fuel}}\, g_0 \tag{6}$$
where $I_{sp}$ is the specific impulse of the fuel; $g_0$ is the standard gravitational acceleration; and $m_{\text{fuel}}$ is the mass of fuel carried by the space modular robot, which depends on the number of energy supply modules $n_{\text{energy}}$, i.e.,
$$m_{\text{fuel}} = m_f\, n_{\text{energy}} \tag{7}$$
where $m_f$ is the mass of fuel carried by one energy supply module.
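As a minimal illustration of Equations (2)–(7), the JAX sketch below evaluates the two-body dynamics, the per-axis thrust limit, and the impulse budget. The function names are assumptions; the numerical values of $T_s$, $m_f$, and $I_{sp}$ come from Table 1, and the elementwise maximum over the module counts follows the reconstruction of Equation (5).

```python
import jax.numpy as jnp

MU = 3.986004418e14   # Earth's gravitational parameter [m^3/s^2]

def state_derivative(x, u, m):
    """Eq. (2): x = [r, r_dot]; two-body gravity plus thrust acceleration."""
    r, v = x[:3], x[3:]
    a = -MU / jnp.linalg.norm(r) ** 3 * r + u / m
    return jnp.concatenate([v, a])

def thrust_limits(q_plus, q_minus, T_s=50.0):
    """Eq. (5): per-axis thrust limit from the counts of opposing
    propulsion modules (T_s = 50 N from Table 1)."""
    return T_s * jnp.maximum(q_plus, q_minus)

def total_impulse_budget(n_energy, m_f=0.47, I_sp=260.0, g0=9.80665):
    """Eqs. (6)-(7): impulse budget from the fuel carried by the
    energy supply modules (m_f and I_sp from Table 1)."""
    return I_sp * m_f * n_energy * g0
```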

2.2. Problem Statement

Denote the comprehensive morphological assessment (Section 3.2) of the morphology $M$ with policy parameters $\theta$ as $MA(M, \theta)$, and treat it as the objective of the morphology optimization. The morphology optimization problem can then be formulated as
$$\max_{M, \theta} \; J = MA(M, \theta) \quad \text{s.t.} \quad M \in \mathcal{M}, \;\; \theta \in \mathbb{R}^{N_\theta} \tag{8}$$
where $J$ is the objective to be maximized and $N_\theta$ is the number of policy parameters.
In practical space pursuit–evasion tasks, the evader has a certain perception range and initiates evasive maneuvers only when the pursuer enters this range. Space pursuit–evasion tasks are therefore typically divided into two phases according to whether the pursuer has entered the perception range of the evader. This paper focuses on the stage within the perception range, illustrated in Figure 2. The pursuer's objective is to approach the evader through maneuvering. In view of the distance between them, the criterion for completing the space pursuit–evasion task can be defined as
$$d_{\text{safe}} < d < d_{\text{work}} \tag{9}$$
where $d_{\text{work}}$ is the minimum distance that must be satisfied for completing the task; $d_{\text{safe}}$ is the safe distance the pursuer needs to maintain from the evader; and $d = \|r_p - r_e\|$ is the distance between the pursuer and the evader. The pursuer maneuvers by executing impulsive thrusts at a fixed impulse interval $p$. The pursuit–evasion control problem can be stated as follows: for a pursuer with a specific morphology, design a pursuit strategy that approaches the evader until the completion criterion in Equation (9) is satisfied while minimizing fuel consumption.

3. Inner–Outer Loop Intelligent Morphology Optimization and Pursuit–Evasion Control

To improve the intelligence and autonomy of the space modular robot during the pursuit–evasion task, an inner–outer loop computational framework is built to achieve intelligent morphology optimization and pursuit–evasion control for the space modular robot based on the EGA [40] and PPO [41], as shown in Figure 3.
At the initialization of the alternation between morphological evolution and training, the outer evolutionary loop generates an initial population of $P$ individuals, each with a random morphology $M \in \mathcal{M}$ composed of $n_{\text{init}}$ modules. The initial population then undergoes lifetime learning in the inner reinforcement learning loop, and the result of the comprehensive morphological assessment determines its fitness. After initialization, the inner–outer loop computation begins with individual selection: a total of $G$ tournaments [42] are conducted to generate the elite population, each involving a random group of 4 individuals (a minimal sketch of this selection step follows below). Crossover and mutation operations are then applied to the elite population to produce elite offspring, which undergo lifetime learning and comprehensive morphological assessment to evaluate their fitness. Finally, the population is updated by incorporating the elite offspring and eliminating the individuals with the lowest fitness, maintaining the population size at $P$. The inner and outer loops thus form a closed-loop system in which evolution and learning proceed in an alternating fashion. The following subsections detail the core genetic mechanisms and the learning algorithm.
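As a concrete illustration of the selection step just described, the sketch below runs the tournaments over a fitness array; the function name and array layout are assumptions, while the group size of 4 follows the text.

```python
import jax
import jax.numpy as jnp

def tournament_select(key, fitness, num_tournaments, group_size=4):
    """Run `num_tournaments` tournaments: each randomly groups `group_size`
    individuals and keeps the index of the fittest one as an elite."""
    n = fitness.shape[0]
    keys = jax.random.split(key, num_tournaments)
    groups = jnp.stack([jax.random.choice(k, n, (group_size,), replace=False)
                        for k in keys])                     # (T, 4) index groups
    best = jnp.argmax(fitness[groups], axis=1)              # winner within each group
    return groups[jnp.arange(num_tournaments), best]        # elite indices
```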

3.1. Crossover and Mutation

(1) Crossover
In Section 2, the robot morphology is represented as a multiway tree using direct encoding. Consider the crossover operation as a transformation $M_c : (G_{p1}, G_{p2}) \to G_c$, which acts on two different trees $G_{p1}$ and $G_{p2}$ (parents) to produce a new tree $G_c$ (child). In this process, two nodes are selected as crossover points by a random number generator (RNG), and the child is produced by crossing subtrees of the parents at the crossover points. Figure 4a provides a concrete example, with the red nodes marking the crossover points. Denote the probability of crossover occurrence, i.e., the crossover rate, as $r_c$. Each multiway tree in the population on which crossover does not occur is preserved directly as a child.
(2) Mutation
Mutation operations act on a child produced by crossover and include function mutation and connection method mutation. The two have the same probability of occurrence and are never applied simultaneously to the same child. Define the probability of mutation occurrence, i.e., the mutation rate, as $r_m$.
Function mutation: randomly select a module (excluding the control module) and randomly change its module function to another one. In the implementation, an RNG is used to select a leaf node and a function number (the root node is fixed as the control module, so it is excluded from the candidates). Considering the function mutation operation as a transformation $M_f : G_c(V, E, f_0) \to G_c(V, E, f_{\text{mut}})$, the computation can be represented as
$$f_{nc\_mut}(v_i) = \mathrm{RNG}\big(\{1, 2, 3, 4\} \setminus \{f_{nc}(v_i)\}\big) \tag{10}$$
where $f_{nc}(v_i)$ and $f_{nc\_mut}(v_i)$ are the node attributes of the leaf node $v_i$ before and after mutation; $v_i = \mathrm{RNG}(V \setminus \{v_{\text{root}}\})$ is the selected leaf node; $v_{\text{root}}$ is the root node of $G_c$; and the function $\mathrm{RNG}(S)$ denotes the random selection of an element from the set $S$ using the RNG. Figure 4b provides a concrete example.
Connection method mutation: randomly select a module other than the control module, then randomly change its connection method to another one. Considering the connection method mutation operation as a transformation $M_d : G_c(V, E, d_0) \to G_c(V, E, d_{\text{mut}})$, the computation can be represented as
$$con\_mut(v_i) = \mathrm{RNG}\big(\{\pm 1, \pm 2, \pm 3\} \setminus \{con(v_i)\}\big) \tag{11}$$
where $con(v_i)$ and $con\_mut(v_i)$ are the weights of the edge with $v_i$ as its child node before and after mutation. Figure 4c provides a concrete example.
During crossover and mutation, some offspring may inevitably fail to meet the necessary conditions in Section 2.1. Therefore, in the actual implementation, a validity check is performed on each offspring generated by the crossover and mutation of two parents. If an offspring fails to meet the necessary conditions, the two parents undergo crossover and mutation again until a valid offspring is generated.
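The two mutation operators of Equations (10) and (11), together with the validity-retry loop, might be sketched as follows. The code reuses the hypothetical `MorphologyGraph` container from Section 2.1, `is_valid` stands in for the necessary conditions of Section 2.1, and the mutation-rate gating is omitted for brevity.

```python
import random

FUNCTIONS = [1, 2, 3, 4]               # non-control module functions
CONNECTIONS = [1, -1, 2, -2, 3, -3]    # connection methods (+-X_B, +-Y_B, +-Z_B)

def function_mutation(G):
    """Eq. (10): re-draw the function attribute of a random non-root node."""
    v = random.choice(G.nodes[1:])     # the root (control module) is excluded
    G.functions[v] = random.choice([f for f in FUNCTIONS if f != G.functions[v]])
    return G

def connection_mutation(G):
    """Eq. (11): re-draw the connection method of a random edge."""
    e = random.randrange(len(G.edges))
    G.weights[e] = random.choice([d for d in CONNECTIONS if d != G.weights[e]])
    return G

def valid_offspring(parent1, parent2, crossover, is_valid):
    """Retry crossover + mutation until the child satisfies Section 2.1."""
    while True:
        child = crossover(parent1, parent2)
        child = random.choice([function_mutation, connection_mutation])(child)
        if is_valid(child):
            return child
```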

3.2. Comprehensive Morphological Assessment

Considering symmetry, centrality, module cost, and the average cumulative reward from the inner reinforcement learning loop, a comprehensive morphological assessment is proposed. The assessment result serves as the fitness of a morphology, determining its retention or elimination in the outer evolutionary loop. To ensure good pursuit–evasion control performance of the space modular robot, the evolved morphology should possess high levels of symmetry and centrality. Symmetry and centrality are defined as
$$f_s(M) = \frac{x_s + y_s + z_s}{3n} \tag{12}$$
$$f_c(M) = -\frac{1}{3n} \sum_{i=1}^{n} \|X_i - X_c\| \tag{13}$$
where $x_s$, $y_s$, $z_s$ are the numbers of modules symmetric about $X_B$, $Y_B$, $Z_B$, respectively; $X_i$ is the coordinate of module $i$ in the body-fixed frame $\mathcal{F}_B$; and $X_c$ is the coordinate of the morphology's center of mass in $\mathcal{F}_B$. In practical engineering missions, different modules have different usage costs, and the number of modules assigned to a robot is limited. It is therefore necessary to consider module cost as an optimization metric to indirectly optimize the module configuration of the robot. The module cost is defined as
$$f_m(M) = -\sum_{i=0}^{4} n_i c_i \tag{14}$$
where $n_i$ and $c_i$ are the number and cost of each functional module type, with the subscript $i = 0, 1, 2, 3, 4$ denoting the control module, auxiliary module, energy supply module, propulsion module, and basic connection module, respectively. The comprehensive morphological assessment rule is then constructed as
$$MA(M, \theta_M) = w_1 f_s + w_2 f_c + w_3 f_m + w_4 F(M, \theta_M) \tag{15}$$
where $w_1, w_2, w_3, w_4$ are positive constant parameters, and $F(M, \theta_M)$ denotes the average cumulative reward attained by the space modular robot over the last 100 episodes of its lifetime learning. This metric reflects the learning ability and performance of a space modular robot with a specific morphology in achieving the task objective.
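A direct transcription of Equations (12)–(15) could look as follows. The weights and per-type module costs are hypothetical placeholders, the center of mass is taken as the mean module coordinate (all modules share the same mass $k$), and the signs of the centrality and cost terms follow the reconstruction above.

```python
import jax.numpy as jnp

def morphological_assessment(n_sym, coords, counts, avg_reward,
                             w=(1.0, 1.0, 1.0, 1.0),
                             costs=jnp.ones(5)):
    """Eqs. (12)-(15) with hypothetical weights w and module costs.

    n_sym:      (x_s, y_s, z_s) counts of modules symmetric about each body axis
    coords:     (n, 3) module coordinates in the body-fixed frame
    counts:     (5,) number of modules per function type
    avg_reward: mean cumulative reward over the last 100 episodes
    """
    n = coords.shape[0]
    f_s = sum(n_sym) / (3 * n)                                       # Eq. (12)
    x_c = coords.mean(axis=0)                                        # center of mass
    f_c = -jnp.sum(jnp.linalg.norm(coords - x_c, axis=1)) / (3 * n)  # Eq. (13)
    f_m = -jnp.dot(counts, costs)                                    # Eq. (14)
    return w[0]*f_s + w[1]*f_c + w[2]*f_m + w[3]*avg_reward          # Eq. (15)
```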

3.3. Proximal Policy Optimization Algorithm

The PPO algorithm is applied as the reinforcement learning algorithm to achieve pursuit–evasion control for the space modular robot. We use an architecture (Figure 5) in which the action probability distribution $\pi_\theta(a_t | s_t)$ and the state-value estimate $\hat{V}^{\pi_\theta}(s_t)$ are generated by a single neural network with parameters $\theta$. The action $a_{\pi_\theta}(s_t) \in \mathbb{R}^3$ is obtained by sampling from $\pi_\theta(a_t | s_t)$ and serves as the control signal of the controller, which yields
$$u_p(t) = T_{\max}\, a_{\pi_\theta}(s_t) \tag{16}$$
The objective to be maximized by PPO is defined as [41]
$$L_t^{PPO}(\theta) = \hat{\mathbb{E}}_t\big[L_t^{CLIP}(\theta) - \alpha L_t^{VF}(\theta) + \beta S[\pi_\theta](s_t)\big] \tag{17}$$
where $\alpha$ and $\beta$ are constant parameters; $L_t^{CLIP}$ is the clipped surrogate (advantage) objective; $L_t^{VF}$ is the estimation error of the state-value function; and $S[\pi_\theta](s_t)$ is an entropy term reflecting the stochasticity of the policy. To improve computational efficiency and training stability, multiple identical task environments are created for parallel training. The space modular robot interacts with the task environments by applying the action $a_{\pi_\theta}(s_t)$, after which the environments transition to a new state $s_{t+1}$; the space modular robot and the evader are treated as parts of the environment. During interaction, the state $s$, reward $r$, action $a_{\pi_\theta}$, state value $\hat{V}^{\pi_\theta}$, and action probability $\pi_\theta(a|s)$ at each time step are stored in the replay buffer as a batch of data. Once the buffer reaches the batch size, its data are used for the subsequent network parameter updates, which are performed by adaptive moment estimation (Adam).
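For reference, the objective of Equation (17) can be written for one batch as below, negated for gradient descent; the clip range `eps` and the weights `alpha` and `beta` are hypothetical values, not the paper's settings.

```python
import jax.numpy as jnp

def ppo_loss(ratio, adv, value, value_target, entropy,
             eps=0.2, alpha=0.5, beta=0.01):
    """Eq. (17): clipped surrogate minus value error plus entropy bonus.
    `ratio` is pi_new(a|s) / pi_old(a|s) for each sample in the batch."""
    l_clip = jnp.mean(jnp.minimum(ratio * adv,
                                  jnp.clip(ratio, 1.0 - eps, 1.0 + eps) * adv))
    l_vf = jnp.mean((value - value_target) ** 2)
    return -(l_clip - alpha * l_vf + beta * entropy)  # minimized by Adam
```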
In the implementation for the space pursuit–evasion task, the network input comprises the position error vector $r_p - r_e$, the velocity of the pursuer $\dot{r}_p$, and the velocity of the evader $\dot{r}_e$. The neural network has two hidden layers of 64 nodes each, with tanh activation throughout. Orthogonal initialization is used for the network weights to ensure stability throughout the training process.
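A sketch of this shared actor-critic network in Flax, the neural-network library commonly paired with JAX, is given below. The Gaussian-policy head with a learned log standard deviation and the per-layer initialization gains are assumptions; the two 64-node tanh hidden layers and the orthogonal weight initialization follow the text.

```python
import jax
import jax.numpy as jnp
import flax.linen as nn

class ActorCritic(nn.Module):
    """Two tanh hidden layers of 64 units with orthogonal initialization;
    outputs an action distribution and a state-value estimate."""
    act_dim: int = 3

    @nn.compact
    def __call__(self, obs):
        h = obs
        for _ in range(2):
            h = nn.tanh(nn.Dense(64, kernel_init=nn.initializers.orthogonal())(h))
        mean = nn.Dense(self.act_dim,
                        kernel_init=nn.initializers.orthogonal(0.01))(h)
        log_std = self.param("log_std", nn.initializers.zeros, (self.act_dim,))
        value = nn.Dense(1, kernel_init=nn.initializers.orthogonal(1.0))(h)
        return mean, log_std, value.squeeze(-1)

# Observation: [r_p - r_e, v_p, v_e] -> a 9-dimensional input vector.
model = ActorCritic()
params = model.init(jax.random.PRNGKey(0), jnp.zeros(9))
```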

3.4. Reward Function of the Space Pursuit–Evasion Task

The reward function plays a crucial role in guiding the space modular robot to complete the pursuit–evasion task, i.e., it drives the robot state toward satisfying the completion criterion (9). On this basis, considering the environmental boundary and fuel consumption, the reward at each time step is defined as
$$r_t = r_{\text{success}} - r_b + \omega_1 \Delta d - \omega_2 \|u_p\| \tag{18}$$
where $\omega_1$ and $\omega_2$ are constant coefficients. The third term is a continuous reward, in which $\Delta d$ is the pursuer-evader distance at the previous time step minus that at the current time step. The fourth term is a continuous punishment; it drives the algorithm to optimize fuel consumption throughout the task while still ensuring task completion, so the trained robot accomplishes the task with reduced fuel consumption, which holds practical engineering significance. The first and second terms are the terminal reward and punishment, respectively, defined as
$$r_{\text{success}} = \begin{cases} 100, & \text{Equation (9) is satisfied} \\ 0, & \text{otherwise} \end{cases} \tag{19}$$
$$r_b = \begin{cases} 150, & \|r_p\| \ge b_{\max} \\ 150, & \|r_p\| \le b_{\min} \\ 0, & \text{otherwise} \end{cases} \tag{20}$$
where $b_{\max}$ and $b_{\min}$ are two constants; the position of the robot must remain within these boundaries while performing the task. Once the terminal reward or punishment is triggered, the current episode is immediately terminated. For the task environment discussed in this paper, establishing environmental boundaries is necessary: during the initial training phase, the space modular robot is prone to maneuvering to excessively high or low positions from which the task cannot be completed within the limited time and fuel. Setting environmental boundaries allows for the early termination of such episodes, thereby enhancing overall training efficiency.
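The per-step reward of Equations (18)–(20) can be sketched branchlessly as follows; the boundary radii `b_min` and `b_max` and the weights `w1` and `w2` are unspecified in the text and appear here as hypothetical arguments, while the task distances come from Table 1.

```python
import jax.numpy as jnp

D_SAFE, D_WORK = 100.0, 1000.0   # task distances from Table 1 [m]

def step_reward(d_prev, d, r_p, u_p, b_min, b_max, w1=1.0, w2=1.0):
    """Per-step reward of Eqs. (18)-(20), written branchlessly for jit."""
    success = jnp.logical_and(d > D_SAFE, d < D_WORK)          # criterion (9)
    r_norm = jnp.linalg.norm(r_p)
    out = jnp.logical_or(r_norm >= b_max, r_norm <= b_min)     # boundary violation
    reward = (100.0 * success                                  # terminal reward (19)
              - 150.0 * out                                    # boundary punishment (20)
              + w1 * (d_prev - d)                              # closing-distance shaping
              - w2 * jnp.linalg.norm(u_p))                     # fuel-consumption penalty
    done = jnp.logical_or(success, out)                        # episode termination
    return reward, done
```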

3.5. Outline of the Inner–Outer Loop Computational Framework Implementation

The pseudocode for the inner–outer loop computational framework is shown in Algorithm 1.
Remark 1. 
The proposed inner–outer loop computational framework is inspired by DERL [21]. It considers various practical module functions and connection methods within the morphological encoding scheme, enabling its application to a wider range of modular robot design scenarios than DERL. In the outer evolutionary loop, it employs the EGA to evolve the morphology of the space modular robot, which helps preserve advantageous genes and significantly improves the global exploration capability and convergence speed of the algorithm. Moreover, the proposed framework uses the result of the comprehensive morphological assessment as the fitness for morphological evolution; by taking engineering factors into account, it carries greater practical engineering significance. Additionally, the framework adopts reinforcement learning to control the space modular robot without the need for a dynamics model, making it applicable to scenarios where accurate robot modeling is challenging. By comparison, certain graph-based morphology optimization frameworks adopt model-based control algorithms such as model predictive control (MPC), which require an accurate dynamics model of the modular robot [14,30].
Algorithm 1: Inner–Outer Loop Computational Framework
1: Initialize the population $P_0 = \{G_0^1, G_0^2, \ldots, G_0^P\}$
2: for $i = 1$ to $P$ do
3:    Run an inner reinforcement learning for $G_0^i$
4:    $F_0^i \leftarrow$ the average cumulative reward of the reinforcement learning
5:    $f_{s,0}^i, f_{c,0}^i, f_{m,0}^i \leftarrow$ right-hand sides of (12)–(14)
6:    $MA_0^i \leftarrow$ right-hand side of (15)
7: end for
8: Fitness $F_0 \leftarrow \{MA_0^1, MA_0^2, \ldots, MA_0^P\}$
9: for $j = 1$ to $G$ do
10:    $E_j \leftarrow$ the elites $\{G_j^1, G_j^2, \ldots, G_j^{groups}\}$ selected from $P_{j-1}$ using the tournament algorithm
11:    $O_j \leftarrow$ the offspring $\{G_j^1, G_j^2, \ldots, G_j^{groups}\}$ generated by applying crossover and mutation to $E_j$
12:    for $k = 1$ to $groups$ do
13:       Run an inner reinforcement learning for $G_j^k$
14:       $F_j^k \leftarrow$ the average cumulative reward of the reinforcement learning
15:       $f_{s,j}^k, f_{c,j}^k, f_{m,j}^k \leftarrow$ right-hand sides of (12)–(14)
16:       $MA_j^k \leftarrow$ right-hand side of (15)
17:    end for
18:    $F_j \leftarrow F_{j-1} + \{MA_j^1, MA_j^2, \ldots, MA_j^{groups}\}$
19:    $P_j \leftarrow P_{j-1} + O_j$
20:    for $l = 1$ to $groups$ do
21:       Delete $\arg\min_{G \in P_j} MA(G)$, with $MA$ values taken from $F_j$, from $P_j$
22:    end for
23: end for

4. Numerical Simulation

To verify the performance of the proposed inner–outer loop computational framework, comparative simulation experiments are implemented using different space pursuit–evasion task environments and optimization algorithms on a server equipped with two Intel Xeon Silver 4210R Processors (Intel, Santa Clara, CA, USA). The experimental framework is implemented on JAX to enable parallel computation on CPUs, which significantly accelerates the computation speed. Additionally, the implementation of PPO is based on PureJaxRL [43].
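The speedup pattern is to JIT-compile one environment step and vectorize it across identical environments with `vmap`; in the toy sketch below, a double integrator stands in for the orbital dynamics of Equation (2), and the environment count is a hypothetical value.

```python
import jax
import jax.numpy as jnp

NUM_ENVS = 64  # hypothetical number of identical parallel task environments

def env_step(state, action, dt):
    """Placeholder transition: a toy double integrator standing in for Eq. (2)."""
    r, v = state[:3], state[3:]
    v = v + dt * action
    return jnp.concatenate([r + dt * v, v])

# vmap vectorizes the step over the environment axis; jit compiles it once.
batched_step = jax.jit(jax.vmap(env_step, in_axes=(0, 0, None)))

states = jnp.zeros((NUM_ENVS, 6))
actions = jnp.ones((NUM_ENVS, 3))
next_states = batched_step(states, actions, 0.1)   # shape (64, 6)
```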

4.1. Configuration of Task and Algorithm

In the initialization phase, the position of the evader is randomly initialized on a spherical surface centered at $O_I$ with radius $R$, while the space modular robot is randomly initialized around the evader at a distance of $d_p$ (the perception radius). The core parameters of the space pursuit–evasion task are listed in Table 1, and the core hyper-parameters of the EGA and PPO are listed in Table 2. To ensure that the task performance of the space modular robot is influenced solely by its morphology, offspring do not inherit the neural network parameters of their parents during the evolutionary process. Additionally, the total number of steps for all inner-loop reinforcement learning runs is kept the same.

4.2. Simulation in Different Space Pursuit–Evasion Task Environments

Considering the state of the evader as part of the space pursuit–evasion task environment, the environments can be classified into the free-floating evader environment (FEEnv) and the maneuvering evader environment (MEEnv), according to whether the evader adopts a maneuvering strategy. During the pursuit–evasion process, the free-floating evader does not perform any maneuvers, i.e., $u_e = 0$, while the maneuvering evader adopts the following evasion strategy
$$u_e = \frac{r_e - r_p}{\|r_e - r_p\|}\, T_{\text{maxe}} \tag{21}$$
where $T_{\text{maxe}}$ is the maximum thrust of the maneuvering evader.
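For completeness, the evasion law of Equation (21) amounts to thrusting at full magnitude directly away from the pursuer; in the sketch below, the function name is hypothetical and $T_{\text{maxe}} = 100$ N comes from Table 1.

```python
import jax.numpy as jnp

T_MAXE = 100.0  # maximum evader thrust from Table 1 [N]

def evader_thrust(r_e, r_p):
    """Eq. (21): thrust at full magnitude along the unit vector
    pointing from the pursuer to the evader."""
    d = r_e - r_p
    return T_MAXE * d / jnp.linalg.norm(d)
```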
The entire optimization and learning process in each environment takes 17.3 h, producing 816 morphologies along with their pursuit strategies. Figure 6 shows the optimal (highest-fitness) individuals from generations 0 (initial population), 25, and 50 in the two environments. Because symmetry and centrality are considered in the fitness function, the optimal morphology in the final population is more regular than in the initial population; in particular, the optimal morphology in the final population under MEEnv exhibits pronounced symmetry and centrality. Additionally, owing to the incorporation of module cost in the fitness function, the morphologies in both environments evolved toward lighter configurations, gradually shedding redundant propulsion and auxiliary modules to reduce module cost while still completing their tasks. Figure 7 illustrates the fitness changes of the populations in both environments throughout the evolution. The fitness of the populations converges stably during the evolutionary process. The average fitness values of the initial populations in both environments are below zero, indicating that a significant number of individuals in the initial populations have poor morphologies and are unable to complete the task through fixed-step reinforcement learning. After 50 generations of evolution, however, low-fitness individuals in both environments are eliminated, and the overall fitness of the populations reaches a higher level. Furthermore, the overall fitness of the population in MEEnv is lower than that in FEEnv, suggesting that the complexity of the environment affects the evolution of morphology: within the same number of evolutionary generations, individuals evolved in a relatively simpler environment tend to attain higher fitness values.
To better illustrate the pursuit–evasion process, a free-floating object initialized 50 km from the evader, along the direction from the evader to $O_I$, is introduced as a reference, transforming the absolute positions of the pursuer and evader into positions relative to this reference. Defining the position of the reference as $r_{\text{ref}}$, the relative positions of the pursuer and evader are $r_p - r_{\text{ref}}$ and $r_e - r_{\text{ref}}$. Figure 8 illustrates the results of inner-loop reinforcement learning for the optimal morphologies in the initial population (Figure 6a,d) and the final population (Figure 6c,f) in the two environments. As seen in Figure 8a, the pursuer with the optimal morphology in the initial population can approach the evader and complete the space pursuit–evasion task in FEEnv, but its pursuit trajectory is quite unstable; such maneuvering leads to excessive fuel consumption and is impractical for real-world engineering, so this morphology and its control strategy leave significant room for optimization. Furthermore, as shown in Figure 8c, the pursuer with the optimal morphology in the initial population fails to complete the task in MEEnv, never closing the gap with the evader to the required distance throughout the task period. In contrast, as clearly shown in Figure 8b,d, after the inner–outer loop optimization the pursuers with the optimal morphologies in the final populations complete the space pursuit–evasion task effectively, approaching the evader within the constraints of limited fuel and time while maintaining a stable pursuit process. This demonstrates the effective optimization of the proposed computational framework. Figure 9 shows that the cumulative reward of the two individuals exhibits an increasing trend throughout the reinforcement learning process, and, as shown in Figure 10, the fuel required by the two individuals to complete the task shows a decreasing trend because fuel consumption is considered in the reward function.

4.3. Simulation in Different Optimization Frameworks

Furthermore, to highlight the superiority of the proposed inner–outer loop computational framework, the following two optimization frameworks are employed for comparison. (1) The traditional genetic algorithm (GA) is used as the outer-loop evolution algorithm while the inner-loop reinforcement learning algorithm remains unchanged (denoted GA+PPO). During the evolutionary process, each individual in the population generates offspring directly through the crossover and mutation operators; the hyper-parameters of the traditional GA and its operators remain the same as those of the EGA. (2) Morphologies are searched randomly in the outer loop while the inner-loop reinforcement learning algorithm remains unchanged (denoted Random+PPO). This framework explores the entire design space in a fully stochastic way by assigning a random morphology to each space modular robot, meaning that changes in morphology are random and no longer depend on fitness.
MEEnv is set as the task environment for the comparative experiment. Figure 11 shows the optimal individuals from generations 0 (initial population), 25, and 50 generated by the different optimization frameworks. Compared with the proposed framework, the GA+PPO framework guides the morphology toward simpler structures, yet the comprehensive morphological assessment results of its optimal morphologies are lower than those of the proposed framework. In contrast to the first two frameworks, the Random+PPO framework generates highly random morphologies without a clear evolutionary trend, and the assessment results of its optimal morphologies are significantly lower than those of the other two frameworks. Figure 12 illustrates the fitness changes of the populations generated by the different optimization frameworks throughout the evolution. The fitness of the populations optimized by the proposed framework and by GA+PPO converges stably during evolution, whereas that of the Random+PPO framework shows no convergence. Both the average and the highest fitness of the population optimized by the proposed framework exceed those of the other two frameworks. Overall, the proposed framework demonstrates a strong ability to optimize morphology within the vast morphological design space.

5. Conclusions

The intelligent morphology optimization and pursuit–evasion control approach for space modular robots is investigated in this paper, and an inner–outer loop computational framework based on an elite genetic algorithm and proximal policy optimization is proposed. In the proposed framework, morphological evolution and reinforcement learning execute alternately as the outer and inner loops, respectively, to compute an approximately globally optimal combination of morphology and pursuit strategy for the space pursuit–evasion task. Considering the various module functional characteristics, a morphological design space and a corresponding morphological encoding scheme are designed in the outer evolutionary loop, and a comprehensive morphological assessment is proposed to guide morphological evolution. In the inner reinforcement learning loop, a reward function that accounts for fuel consumption is designed to guide the space modular robot with a specific morphology toward a near fuel-optimal pursuit strategy. Finally, comparative simulations demonstrate that the proposed computational framework handles pursuit–evasion tasks with evaders of varying maneuverability effectively and possesses superior optimization and control performance with lower fuel consumption.
In the future, an intelligent evasion strategy will be considered for the evader, or a self-play approach will be used where the control strategies of both the pursuer and the evader are trained by the same reinforcement learning algorithm. Additionally, space obstacles and environmental disturbances will be incorporated to establish more realistic task environments with environmental uncertainties. This will improve the intelligence of the resulting morphology and control strategy. Moreover, Pareto-based multi-objective optimization algorithms (e.g., NSGA-II) will be adopted in the outer evolutionary loop to obtain the Pareto-optimal morphologies for specific tasks.

Author Contributions

Conceptualization, W.L.; methodology, W.L.; software, W.L.; validation, W.L.; formal analysis, L.M.; investigation, F.F.; resources, P.G.; writing—original draft preparation, W.L. and B.L.; writing—review and editing, W.L. and B.L.; supervision, P.G. and B.L. All authors have read and agreed to the published version of the manuscript.

Funding

This work is supported by the National Natural Science Foundation of China (Grant Number 62473249) and the Natural Science Foundation of Shanghai (Grant Number 23ZR1426600).

Data Availability Statement

The data used to support the findings of this study are included within the article.

Conflicts of Interest

Author Fei Feng was employed by the company DFH Satellite Co., Ltd. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

1. Han, N.; Luo, J.; Zong, L. Cooperative Game Method for On-Orbit Substructure Transportation Using Modular Robots. IEEE Trans. Aerosp. Electron. Syst. 2021, 58, 1161–1175.
2. Han, N.; Luo, J.; Zheng, Z. Robust Coordinated Control for On-Orbit Substructure Transportation under Distributed Information. Nonlinear Dyn. 2021, 104, 2331–2346.
3. Wharton, P.; You, T.L.; Jenkinson, G.P.; Diteesawat, R.S.; Le, N.H.; Hall, E.C.; Garrad, M.; Conn, A.T.; Rossiter, J. Tetraflex: A Multigait Soft Robot for Object Transportation in Confined Environments. IEEE Robot. Autom. Lett. 2023, 8, 5007–5014.
4. Hu, Q.; Dong, E.; Sun, D. Soft Modular Climbing Robots. IEEE Trans. Robot. 2023, 39, 399–416.
5. Gerkey, B.P.; Thrun, S.; Gordon, G. Visibility-Based Pursuit-Evasion with Limited Field of View. Int. J. Robot. Res. 2006, 25, 299–315.
6. Tovar, B.; LaValle, S.M. Visibility-Based Pursuit-Evasion with Bounded Speed. Int. J. Robot. Res. 2008, 27, 1350–1360.
7. Gurumurthy, V.; Mohanty, N.; Sundaram, S.; Sundararajan, N. An Efficient Reinforcement Learning Scheme for the Confinement Escape Problem. Appl. Soft Comput. 2024, 152, 111248.
8. Post, M.A.; Yan, X.T.; Letier, P. Modularity for the Future in Space Robotics: A Review. Acta Astronaut. 2021, 189, 530–547.
9. Stroppa, F.; Majeed, F.J.; Batiya, J.; Baran, E.; Sarac, M. Optimizing Soft Robot Design and Tracking with and without Evolutionary Computation: An Intensive Survey. Robotica 2024, 42, 2848–2884.
10. Nadizar, G.; Medvet, E.; Ramstad, H.H.; Nichele, S.; Pellegrino, F.A.; Zullich, M. Merging Pruning and Neuroevolution: Towards Robust and Efficient Controllers for Modular Soft Robots. Knowl. Eng. Rev. 2022, 37, 1–27.
11. Hiller, J.; Lipson, H. Automatic Design and Manufacture of Soft Robots. IEEE Trans. Robot. 2012, 28, 457–466.
12. Atia, M.G.B.; Mohammad, A.; Gameros, A.; Axinte, D.; Wright, I. Reconfigurable Soft Robots by Building Blocks. Adv. Sci. 2022, 9, 2203217.
13. Ghoreishi, S.F.; Sochol, R.D.; Gandhi, D.; Krieger, A.; Fuge, M. Bayesian Optimization for Design of Multi-Actuator Soft Catheter Robots. IEEE Trans. Med. Robot. Bionics 2021, 3, 725–737.
14. Zhao, A.; Xu, J.; Konaković-Luković, M.; Hughes, J.; Spielberg, A.; Rus, D.; Matusik, W. RoboGrammar: Graph Grammar for Terrain-Optimized Robot Design. ACM Trans. Graph. 2020, 39, 1–16.
15. Sims, K. Evolving 3D Morphology and Behavior by Competition. Artif. Life 1994, 1, 353–372.
16. Kriegman, S.; Blackiston, D.; Levin, M.; Bongard, J. A Scalable Pipeline for Designing Reconfigurable Organisms. Proc. Natl. Acad. Sci. USA 2020, 117, 1853–1859.
17. Cheney, N.; MacCurdy, R.; Clune, J.; Lipson, H. Unshackling Evolution: Evolving Soft Robots with Multiple Materials and a Powerful Generative Encoding. ACM SIGEVOlution 2014, 7, 11–23.
18. Mintchev, S.; Floreano, D. Adaptive Morphology: A Design Principle for Multimodal and Multifunctional Robots. IEEE Robot. Autom. Mag. 2016, 23, 42–54.
19. Zhang, C.; Chen, J.; Li, J.; Peng, Y.; Mao, Z. Large Language Models for Human–Robot Interaction: A Review. Biomim. Intell. Robot. 2023, 3, 100131.
20. Mao, Z.; Kobayashi, R.; Nabae, H.; Suzumori, K. Multimodal Strain Sensing System for Shape Recognition of Tensegrity Structures by Combining Traditional Regression and Deep Learning Approaches. IEEE Robot. Autom. Lett. 2024, 9, 10050–10056.
21. Gupta, A.; Savarese, S.; Ganguli, S.; Li, F. Embodied Intelligence via Learning and Evolution. Nat. Commun. 2021, 12, 5721.
22. Koike, R.; Ariizumi, R.; Matsuno, F. Simultaneous Optimization of Discrete and Continuous Parameters Defining a Robot Morphology and Controller. IEEE Trans. Neural Netw. Learn. Syst. 2024, 35, 13816–13829.
23. Geng, Y.; Yuan, L.; Guo, Y.; Tang, L.; Huang, H. Impulsive Guidance of Optimal Pursuit with Conical Imaging Zone for the Evader. Aerosp. Sci. Technol. 2023, 142, 108604.
24. Yang, B.; Jiang, L.; Wu, W.; Zhen, R. Evolving Robotic Hand Morphology through Grasping and Learning. IEEE Robot. Autom. Lett. 2024, 9, 8475–8482.
25. Tao, C.; Li, M.; Cao, F.; Gao, Z.; Zhang, Z. A Multiobjective Collaborative Deep Reinforcement Learning Algorithm for Jumping Optimization of Bipedal Robot. Adv. Intell. Syst. 2024, 6, 2300352.
26. Li, B.; Gong, W.; Yang, Y.; Xiao, B. Appointed-Fixed-Time Observer-Based Sliding Mode Control for a Quadrotor UAV under External Disturbances. IEEE Trans. Aerosp. Electron. Syst. 2023, 59, 7281–7294.
27. Liu, H.; Li, B.; Xiao, B.; Ran, D.; Zhang, C. Reinforcement Learning-Based Tracking Control for a Quadrotor Unmanned Aerial Vehicle under External Disturbances. Int. J. Robust Nonlinear Control 2023, 33, 10360–10377.
28. Li, B.; Liu, H.; Ahn, C.K.; Gong, W. Optimized Intelligent Tracking Control for a Quadrotor Unmanned Aerial Vehicle with Actuator Failures. Aerosp. Sci. Technol. 2024, 144, 108803.
29. Wang, T.; Zhou, Y.; Fidler, S.; Ba, J. Neural Graph Evolution: Towards Efficient Automatic Robot Design. In Proceedings of the International Conference on Learning Representations, New Orleans, LA, USA, 6–9 May 2019.
30. Duan, K.; Suen, C.W.K.; Zou, Z. Robot Morphology Evolution for Automated HVAC System Inspections Using Graph Heuristic Search and Reinforcement Learning. Autom. Constr. 2023, 153, 104956.
31. Jing, G.; Tosun, T.; Yim, M.; Kress-Gazit, H. Accomplishing High-Level Tasks with Modular Robots. Auton. Robots 2018, 42, 1337–1354.
32. Luck, K.S.; Amor, H.B.; Calandra, R. Data-Efficient Co-Adaptation of Morphology and Behaviour with Deep Reinforcement Learning. In Proceedings of the Conference on Robot Learning, Virtual, 16–18 November 2020; Volume 100, pp. 854–869.
33. Schaff, C.; Yunis, D.; Chakrabarti, A.; Walter, M.R. Jointly Learning to Construct and Control Agents Using Deep Reinforcement Learning. In Proceedings of the International Conference on Robotics and Automation, Montreal, QC, Canada, 20–24 May 2019; pp. 9798–9805.
34. Ha, D. Reinforcement Learning for Improving Agent Design. Artif. Life 2019, 25, 352–365.
35. Huang, B.; Cheng, R.; Li, Z.; Jin, Y.; Tan, K.C. EvoX: A Distributed GPU-Accelerated Framework for Scalable Evolutionary Computation. IEEE Trans. Evol. Comput. 2024.
36. Makoviychuk, V.; Wawrzyniak, L.; Guo, Y.; Lu, M.; Storey, K.; Macklin, M.; Hoeller, D.; Rudin, N.; Allshire, A.; Handa, A.; et al. Isaac Gym: High Performance GPU-Based Physics Simulation for Robot Learning. In Proceedings of the 35th Conference on Neural Information Processing Systems Datasets and Benchmarks Track (Round 2), Virtual, 6–14 December 2021.
37. Rudin, N.; Hoeller, D.; Reist, P.; Hutter, M. Learning to Walk in Minutes Using Massively Parallel Deep Reinforcement Learning. In Proceedings of the Conference on Robot Learning, Auckland, New Zealand, 14–18 December 2022; pp. 91–100.
38. Bradbury, J.; Frostig, R.; Hawkins, P.; Johnson, M.J.; Leary, C.; Maclaurin, D.; Necula, G.; Paszke, A.; VanderPlas, J.; Wanderman-Milne, S.; et al. JAX: Composable Transformations of Python+NumPy Programs. 2018. Available online: http://github.com/jax-ml/jax (accessed on 4 May 2025).
39. Shishir, M.I.R.; Tabarraei, A. Multi-Materials Topology Optimization Using Deep Neural Network for Coupled Thermo-Mechanical Problems. Comput. Struct. 2024, 291, 107218.
40. Goldberg, D.E. Genetic Algorithms in Search, Optimization, and Machine Learning; Addison-Wesley Longman Publishing Co., Inc.: Boston, MA, USA, 1989.
41. Schulman, J.; Wolski, F.; Dhariwal, P.; Radford, A.; Klimov, O. Proximal Policy Optimization Algorithms. arXiv 2017, arXiv:1707.06347.
42. Goldberg, D.E.; Deb, K. A Comparative Analysis of Selection Schemes Used in Genetic Algorithms. In Foundations of Genetic Algorithms; Elsevier: Amsterdam, The Netherlands, 1991; Volume 1, pp. 69–93.
43. Lu, C.; Kuba, J.; Letcher, A.; Metz, L.; Schroeder de Witt, C.; Foerster, J. Discovered Policy Optimisation. Adv. Neural Inf. Process. Syst. 2022, 35, 16455–16468.
Figure 1. The morphology of a space modular robot. (a) Overview. (b) Multiway tree.
Figure 2. Space pursuit–evasion task.
Figure 3. Inner–outer loop computational framework.
Figure 4. Crossover and mutation for the morphology. (a) Crossover. (b) Function mutation. (c) Connection method mutation.
Figure 5. Architecture of the PPO algorithm.
Figure 6. The optimal morphologies obtained through optimization in FEEnv (top) and MEEnv (bottom). (a) From generation 0 with $MA = 169.0$. (b) From generation 25 with $MA = 273.6$. (c) From generation 50 with $MA = 318.0$. (d) From generation 0 with $MA = 156.1$. (e) From generation 25 with $MA = 145.5$. (f) From generation 50 with $MA = 200.7$.
Figure 7. Fitness of populations in FEEnv and MEEnv.
Figure 8. Relative pursuit–evasion trajectories. (a) Trajectories with the morphology of Figure 6a in FEEnv. (b) Trajectories with the morphology of Figure 6c in FEEnv. (c) Trajectories with the morphology of Figure 6d in MEEnv. (d) Trajectories with the morphology of Figure 6f in MEEnv.
Figure 9. Average cumulative reward of the two optimal individuals in FEEnv and MEEnv.
Figure 10. Fuel consumption of the two optimal individuals in FEEnv and MEEnv.
Figure 11. The optimal morphologies obtained using the proposed framework (top), GA+PPO (middle), and Random+PPO (bottom). (a) From generation 0 with $MA = 156.1$. (b) From generation 25 with $MA = 145.5$. (c) From generation 50 with $MA = 200.7$. (d) From generation 0 with $MA = 156.1$. (e) From generation 25 with $MA = 151.2$. (f) From generation 50 with $MA = 187.1$. (g) From generation 0 with $MA = 156.1$. (h) From generation 25 with $MA = 135.0$. (i) From generation 50 with $MA = 139.1$.
Figure 12. Fitness of populations obtained using different optimization frameworks.
Table 1. Parameters of the space pursuit–evasion task.

| Parameter | Value |
| --- | --- |
| $m_e$ | 150 kg |
| $k$ | 10 kg |
| $T_s$ | 50 N |
| $I_{sp}$ | 260 s |
| $m_f$ | 0.47 kg |
| $d_{\text{safe}}$ | 100 m |
| $d_{\text{work}}$ | 1000 m |
| $p$ | 100 s |
| $d_p$ | 50,000 m |
| $R$ | 6,971,393 m |
| $T_{\text{maxe}}$ | 100 N |

Table 2. Hyper-parameters of the algorithm.

| Parameter | Value |
| --- | --- |
| Population size | 16 |
| Generations | 50 |
| Tournament groups | 16 |
| Crossover rate | 0.8 |
| Mutation rate | 0.5 |
| Total steps | 800,000 |
| Learning rate | 0.001 |
| Epochs | 10 |
| Batch size | 2048 |
