A Fuzzy Logic Reinforcement Learning Control with Spring-Damper Device for Space Robot Capturing Satellite

Zhu, An; Ai, Haiping; Chen, Li

doi:10.3390/app12052662


by An Zhu ¹,², Haiping Ai ¹,²,* and Li Chen ²

¹ School of Energy and Mechanical Engineering, Jiangxi University of Science and Technology, Nanchang 330013, China

² School of Mechanical Engineering and Automation, Fuzhou University, Fuzhou 350116, China

* Author to whom correspondence should be addressed.

Appl. Sci. 2022, 12(5), 2662; https://doi.org/10.3390/app12052662

Submission received: 30 December 2021 / Revised: 7 February 2022 / Accepted: 25 February 2022 / Published: 4 March 2022

(This article belongs to the Special Issue Mechanisms and Robotics in Astronautic and Deep Space Exploration)


Abstract: To prevent the joints from being damaged by the impact force when a space robot captures a satellite, a spring-damper device (SDD) is added between the joint motor and the manipulator. The device not only absorbs and dissipates impact energy, but also, together with a suitably designed compliance control strategy, limits the impact force to a safe range. First, the dynamic models of the space robot and target satellite systems before capture are established using the Lagrange equation for dissipative systems and the Newton-Euler equation, respectively. The impact effect is then analyzed, and the dynamic equation of the hybrid system is obtained by combining Newton's third law, momentum conservation, and the kinematic geometric relationship. To realize buffer compliance stabilization control of the hybrid system, a reinforcement learning (RL) control strategy based on a fuzzy wavelet network is proposed. The controller consists of a performance measurement unit (PMU), an associative search network (ASN), and an adaptive critic network (ACN). Finally, the stability of the system is proved by the Lyapunov theorem, and both the impact resistance of the SDD and the effectiveness of the buffer compliance control strategy are verified by numerical simulation.

Keywords: space robot; spring-damper device; buffer compliance control; fuzzy wavelet network; reinforcement learning

1. Introduction

Space exploration is of great significance to resource exploration, meteorological observation, navigation, and positioning, so a large number of satellites are launched into space every year. Inevitably, a small fraction of these satellites fail to enter their intended orbit or are damaged in orbit. If such satellites can be recovered, the cost of space exploration will be greatly reduced. At present, it is feasible to use a space robot to complete the capture task, which has therefore become one of the research hot spots of space exploration [1,2,3,4,5,6,7,8]. Generally, a capture operation can be divided into four stages: (1) the space robot observes the target satellite; (2) a pre-capture stage, such as deceleration and detumbling control of the target satellite; (3) contact and collision between the space robot and the target satellite; and (4) stabilization control of the closed-chain hybrid system. In the third stage, the impact force can easily damage the joints of the space robot, and in the fourth stage, the impact effect can make the hybrid system unstable. These two stages are therefore the focus of this study.

For the third stage, Cheng et al. [9] analyzed the dynamic evolution of a dual-arm space robot capturing a spinning satellite and discussed the stabilization control of the unstable closed-chain composite system. Uyama et al. [10] presented impedance-based contact control of a free-flying space robot using a compliant wrist for non-cooperative satellite capture. An open-loop impedance control law based on a contact dynamics model was introduced to realize the desired coefficient of restitution defined between the manipulator hand of the space robot and a contact point on the free-flying target. Dimitrov et al. [11] established the contact dynamic model of a space robot capturing a satellite and analyzed the momentum exchange during capture. Yoshida et al. [12] studied the collision dynamics and kinematics of a space robot capturing a satellite based on momentum conservation. Dong et al. [13] approximated the elastic deformation of a flexible manipulator using the assumed mode method, and then analyzed the collision dynamics of a space robot capturing a satellite using the impulse-momentum relation. It is worth noting that these studies focus mainly on collision dynamics and do not address the protection of the joints. If the joints are not protected during the capture operation, they may be damaged by the impact force, which could in turn damage the space robot itself.

A series elastic actuator (SEA) is usually added between the joint motor and the manipulator of ground robots to prevent the joints from being damaged by the impact force when the robot collides with the external environment [14,15,16,17,18,19,20,21]. However, a space robot has no fixed base, and it works in both microgravity and a vacuum. If an SEA is added to the joints of a space robot (see our previously published work [3,22]), the flexible vibration caused by the SEA is difficult to suppress. In that work, the controller was therefore divided into a fast subsystem controller and a slow subsystem controller, where the slow subsystem controller realizes trajectory tracking and the fast subsystem controller suppresses the flexible vibration. This makes the controller more complex, and the joint motor needs to provide additional torque to suppress the flexible vibration. Sometimes this torque exceeds the load capacity of the motor. To cope with this problem, this paper adds a spring-damper device (SDD) between the joint motor and the manipulator. Compared with the SEA, the SDD not only absorbs and dissipates the impact energy, but also quickly attenuates the flexible vibration.

For the fourth stage, Liu et al. [23] used impedance control to stabilize the hybrid system after space robot captures satellite. Huang et al. [24] designed a reconfigurable control system for attitude takeover control of target spacecraft, which considers the changes of mass properties and reaction wheels’ configuration. Luo et al. [25] proposed a robust inertia free attitude takeover control scheme with guaranteed prescribed performance for post capture combined spacecraft with consideration of unmeasurable states, unknown inertial property, and external disturbance torque. Gangapersaud et al. [26] presented a novel detumbling strategy for realizing detumbling of a non-cooperative, tumbling target by a space robot without prior knowledge of the target’s inertial parameters (mass, inertia tensor, location of center of mass). However, the impact effect is not considered in the above control schemes. When the impact effect exceeds a certain value, the hybrid system will lose stability, which makes it difficult to realize stability control.

In recent years, intelligent algorithms have been increasingly applied to the control of robot systems [27,28,29]. Since the target satellite has a large initial velocity, the hybrid system after capture is in a seriously unstable state (e.g., involving overturning and rotation), and it is difficult for traditional control methods to stabilize it. Actor-critic reinforcement learning (RL) control can optimize itself through trial and error and interaction with a dynamic environment, and has strong environmental adaptability [30,31,32,33,34]. Therefore, an RL control scheme based on a fuzzy wavelet network is proposed to stabilize the hybrid system. The controller consists of a performance measurement unit (PMU), an associative search network (ASN) and an adaptive critic network (ACN). The controller obtains the primary reinforcement signal through the PMU, and then uses the ACN to collect the primary reinforcement signal and generate the enhancement signal, which is used to adjust the dominant strategy function of the ASN to realize stability control of the hybrid system. The mechanism of the proposed reinforcement learning can be found in [35]: in the conditioning analogy, the food is the primary reinforcement signal and the bell is the secondary reinforcement signal. To maintain a stronger effect, the food and the bell are paired to generate the enhancement signal. Compared with the RL in our previously published work [3], this paper enhances the primary reinforcement signal into an enhancement signal containing more information, which ultimately makes the stabilization control of the hybrid system more stable.

This paper is organized as follows. In Section 2, the structure of the SDD and the buffer compliance strategy are described. In Section 3, the hybrid system dynamic model is obtained, and the impact effect and impact force are analyzed. An RL control strategy based on a fuzzy wavelet network is designed in Section 4. The simulation results are given in Section 5. Finally, the conclusions are summarized in Section 6.

2. Structure of the SDD and Buffer Compliance Strategy

2.1. Structure of the SDD

Compared with our previously published work [3], dampers are added. The structure of the SDD is shown in Figure 1. The spring absorbs impact energy, and the damper provides a damping force in real time to suppress flexible vibration. To describe the resistance of the motor and manipulator more realistically, equivalent dampers are added to each of them in the model (they are not physical dampers), and the connection mode of the SDD is shown in Figure 2. In Figure 1, $K_{si}$ and $D_{ti}$ $(i = 1, 2, \ldots, n)$ are the stiffness of the torsion spring and the damping coefficient of the rotary damper, and $D_{mi}$ and $D_{Li}$ $(i = 1, 2, \ldots, n)$ are the damping coefficients of the equivalent dampers at the motor and manipulator, respectively.

2.2. Buffer Compliance Strategy

In the third stage, the end-effector of the space robot collides with the target satellite, and the impact energy is quickly buffered and dissipated by the SDD as it is transmitted toward the motor rotors, which protects the joints. In the fourth stage, due to the impact effect, an instantaneous impact torque is generated when the motors are turned on. If this torque exceeds the limit of a joint while the motor remains on, the joint will be damaged. It is therefore necessary to set a torque threshold according to the ultimate torque that the joints can withstand. When the instantaneous impact torque is detected to exceed this threshold, the motor is turned off. The spring then provides elasticity to mitigate the impact torque on the joints, and the damper quickly dissipates energy to suppress the flexible vibration. However, if only a single threshold is set, the motors will be switched on and off frequently, which can easily damage them. The buffer compliance strategy proposed in this paper therefore sets two thresholds, one for turning the motor off and one for turning it back on. When the impact torque of a joint exceeds the turn-off threshold, the motor is turned off; when the SDD has reduced the impact torque to the turn-on threshold, the motor is turned on again.
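The two-threshold switching rule described above behaves like a hysteresis switch. The following is a minimal sketch of that logic for a single joint, not the authors' implementation; the function names, the torque samples, and the threshold values are hypothetical and chosen only for illustration.

```python
def update_motor_state(motor_on, impact_torque, off_threshold, on_threshold):
    """Return the new on/off state of one joint motor.

    The motor is switched off when the torque magnitude exceeds
    off_threshold and switched back on only once the magnitude has
    dropped to on_threshold or below; otherwise the state is kept.
    """
    if on_threshold >= off_threshold:
        raise ValueError("on_threshold must be below off_threshold")
    magnitude = abs(impact_torque)
    if motor_on and magnitude > off_threshold:
        return False
    if not motor_on and magnitude <= on_threshold:
        return True
    return motor_on


def simulate_switching(torques, off_threshold, on_threshold, motor_on=True):
    """Apply the two-threshold rule to a sequence of torque samples."""
    states = []
    for torque in torques:
        motor_on = update_motor_state(motor_on, torque, off_threshold, on_threshold)
        states.append(motor_on)
    return states


if __name__ == "__main__":
    # Hypothetical torque samples and thresholds (N*m), for illustration only.
    samples = [5.0, 12.0, 9.0, 7.0, 3.0, 6.0]
    print(simulate_switching(samples, off_threshold=10.0, on_threshold=4.0))
```

With a single threshold at 10, the samples 12 and 9 would already toggle the motor off and on again; the gap between the two thresholds is what keeps the motor off until the SDD has attenuated the torque.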

3. Dynamic Modeling and Impact Analysis

The structure of the space robot with SDD and the target satellite system is shown in Figure 3. $O_{0}$, $O_{s}$ and $O_{i}$ $(i = 1, 2, \ldots, n)$ are the centroids of the carrier, the satellite, and the joints, respectively. $P$ and $P'$ are the capture points on the space manipulator and on the satellite, respectively. $xOy$ is the inertial coordinate system moving with the orbit, $x_{0} O_{0} y_{0}$ and $x_{s} O_{s} y_{s}$ are the coordinate systems fixed at the centroids of the carrier and the satellite, respectively, and $x_{i} O_{i} y_{i}$ $(i = 1, 2, \ldots, n)$ is the coordinate system fixed at the centre of the $i$th joint. The parameters of the space robot and satellite are defined as follows: $m_{0}$, $m_{i}$ $(i = 1, 2, \ldots, n)$ and $m_{s}$ are the masses of the carrier, the manipulator, and the satellite, respectively; $I_{i}$, $I_{mi}$ $(i = 1, 2, \ldots, n)$ and $I_{s}$ are the moments of inertia of the manipulator, the motor rotor, and the satellite, respectively; $d_{0}$, $d_{i}$ $(i = 1, 2, \ldots, n)$ and $d_{s}$ are the distances from $O_{0}$ to $O_{1}$, from the centre of the joint to the manipulator, and from the satellite centroid to its end, respectively. $L_{i}$ $(i = 1, 2, \ldots, n)$ is the length of the manipulator, and $θ_{0}$, $θ_{i}$, $θ_{s}$, and $θ_{mi}$ $(i = 1, 2, \ldots, n)$ are the angles of the carrier attitude, the manipulator, the satellite attitude, and the motor rotor, respectively.

For the space robot with SDD shown in Figure 3, the total kinetic energy in the pre-contact phase is:

T_{r} = \frac{1}{2} \sum_{i = 0}^{n} (m_{i} {\dot{r}}_{i}^{T} {\dot{r}}_{i} + I_{i} ω_{i}^{2}) + \frac{1}{2} \sum_{j = 1}^{n} (I_{m j} ω_{m j}^{2})

(1)

where $r_{i}$ is the position vector of the manipulator's mass centre, and $ω_{i}$ and $ω_{mj}$ are the angular velocities of the manipulator and the motor rotor, respectively.

If the microgravity is ignored, the potential energy of space robot comes from the spring in SDD:

U_{r} = \frac{1}{2} \sum_{i = 1}^{n} [k_{s i} {(θ_{m i} - θ_{i})}^{2}]

(2)

Because the SDD is added to the joints, the space robot is a nonpotential system, and a dissipation function must be included:

ϑ_{r} = \sum_{i = 1}^{n} [D_{m i} {\dot{θ}}_{m i}^{2} + D_{t i} {({\dot{θ}}_{m i} - {\dot{θ}}_{i})}^{2} + D_{L i} {\dot{θ}}_{i}^{2}]

(3)

Following [36], the Lagrange equation of a dissipative system is:

\frac{d}{d t} (\frac{\partial L_{r}}{\partial {\dot{q}}_{r}}) - \frac{\partial L_{r}}{\partial q_{r}} + \frac{\partial ϑ_{r}}{\partial {\dot{q}}_{r}} = Q

(4)

With the Lagrangian $L_{r} = T_{r} - U_{r}$, combining Equations (1)–(4) gives the model of the space robot system before capture:

\begin{cases} M_{r} (q_{r}) {\ddot{q}}_{r} + (H_{r} (q_{r}, {\dot{q}}_{r}) + D_{L}) {\dot{q}}_{r} = τ_{r} + J_{r}^{T} F_{P} \\ I_{m} {\ddot{q}}_{m} + D_{mc} {\dot{q}}_{m} + τ_{c} = τ_{m} \\ K_{s} (q_{m} - q_{c}) + D_{tc} ({\dot{q}}_{m} - {\dot{q}}_{c}) = τ_{c} \end{cases}

(5)

where $M_{r} \in R^{(n + 3) \times (n + 3)}$ is the symmetric positive definite inertia matrix of the space robot, $H_{r} \in R^{(n + 3) \times (n + 3)}$ is the matrix containing the Coriolis and centrifugal forces, $D_{L} \in R^{(n + 3) \times (n + 3)}$ is the augmented equivalent damper coefficient matrix of the manipulator, $D_{mc} \in R^{n \times n}$ is the equivalent damper coefficient matrix of the motor, and $D_{tc} \in R^{n \times n}$ is the damper coefficient matrix of the SDD. $q_{r} = {[x_{0}, y_{0}, θ_{0}, q_{c}^{T}]}^{T}$ is the generalized coordinate vector of the space robot system, with $q_{c} = {[θ_{1}, θ_{2}, \ldots, θ_{n}]}^{T}$ and $q_{m} = {[θ_{m1}, θ_{m2}, \ldots, θ_{mn}]}^{T}$. $τ_{r} = {[τ_{B}^{T}, τ_{0}, τ_{c}^{T}]}^{T}$, where $τ_{B} = {[0, 0]}^{T}$, $τ_{0}$ is the input torque of the carrier, $τ_{c} = {[τ_{1}, τ_{2}, \ldots, τ_{n}]}^{T}$ is the joint input torque, and $τ_{m} = {[τ_{m1}, τ_{m2}, \ldots, τ_{mn}]}^{T}$ is the motor output torque. $I_{m} = diag (I_{m1}, I_{m2}, \ldots, I_{mn})$ is the inertia matrix of the motor rotors, and $K_{s} = diag (k_{s1}, k_{s2}, \ldots, k_{sn})$ is the stiffness matrix of the springs. $J_{r} \in R^{3 \times (n + 3)}$ is the Jacobian matrix of the space robot, and $F_{P} \in R^{3 \times 1}$ is the force acting on the end-effector of the space robot.
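The third line of Equation (5) is the coupling torque transmitted through the SDD. As a minimal sketch, assuming diagonal stiffness and damping matrices (one spring and one damper per joint), it can be evaluated joint by joint. The function names and the joint states in the example are hypothetical; the stiffness and damping values are those listed for the simulation in Section 5.1.

```python
def sdd_torque(k_s, d_t, theta_m, theta, theta_m_dot, theta_dot):
    """Torque transmitted through one spring-damper device.

    One row of the third line of Equation (5):
    tau_c = k_s * (theta_m - theta) + d_t * (theta_m_dot - theta_dot)
    """
    return k_s * (theta_m - theta) + d_t * (theta_m_dot - theta_dot)


def sdd_torques(k_s, d_t, q_m, q_c, q_m_dot, q_c_dot):
    """Apply sdd_torque to every joint (diagonal K_s and D_tc assumed)."""
    return [
        sdd_torque(k, d, tm, t, tmd, td)
        for k, d, tm, t, tmd, td in zip(k_s, d_t, q_m, q_c, q_m_dot, q_c_dot)
    ]


if __name__ == "__main__":
    # Stiffness and damping from Section 5.1; the joint states are hypothetical.
    k_s = [1000.0, 1000.0]
    d_t = [1146.0, 1146.0]
    q_m, q_c = [0.53, 1.05], [0.52, 1.05]          # rotor and link angles (rad)
    q_m_dot, q_c_dot = [0.0, 0.0], [0.01, 0.0]     # rotor and link rates (rad/s)
    print(sdd_torques(k_s, d_t, q_m, q_c, q_m_dot, q_c_dot))
```

The spring term reacts to the angle difference between rotor and link, while the damper term reacts to their relative rate, which is how the device both stores and dissipates impact energy.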

The model of satellite system before capture is obtained by the Newton–Euler equation:

M_{s} {\ddot{q}}_{s} = J_{s}^{T} F_{P^{'}}

(6)

where $M_{s} \in R^{3 \times 3}$ is the symmetric positive definite inertia matrix of the satellite, $q_{s} = {[x_{s}, y_{s}, θ_{s}]}^{T}$ is the generalized coordinate vector of the satellite system, $x_{s}$ and $y_{s}$ are the satellite centroid coordinates, $J_{s} \in R^{3 \times 3}$ is the Jacobian matrix of the satellite, $F_{P^{'}} \in R^{3 \times 1}$ is the force acting on the satellite, and $F_{P} + F_{P^{'}} = 0$.

After the capture operation, the velocities of the end-effector of the space robot and of the capture point on the satellite are equal:

S_{P} (t_{0} + Δ t) = S_{P^{'}} (t_{0} + Δ t)

(7)

where $Δ t$ is the duration of the collision, $S_{P} = {[{\dot{x}}_{p}, {\dot{y}}_{p}, ω_{2}]}^{T}$, and $S_{P^{'}} = {[{\dot{x}}_{p^{'}}, {\dot{y}}_{p^{'}}, {\dot{θ}}_{s}]}^{T}$. Writing $S_{P} = J_{r} {\dot{q}}_{r}$ and $S_{P^{'}} = J_{s} {\dot{q}}_{s}$ and differentiating Equation (7) with respect to time gives the acceleration of the satellite after the collision:

{\ddot{q}}_{s} (t_{0} + Δ t) = J_{s}^{- 1} J_{r} {\ddot{q}}_{r} (t_{0} + Δ t) + J_{s}^{- 1} ({\dot{J}}_{r} - {\dot{J}}_{s} J_{s}^{- 1} J_{r}) {\dot{q}}_{r} (t_{0} + Δ t)

(8)

Combining the law of momentum conservation with Equations (5) and (6), we can get the following equation:

\begin{cases} M_{r} [{\dot{q}}_{r} (t_{0} + Δ t) - {\dot{q}}_{r} (t_{0})] = J_{r}^{T} f_{P} \\ M_{s} [{\dot{q}}_{s} (t_{0} + Δ t) - {\dot{q}}_{s} (t_{0})] = J_{s}^{T} f_{P^{'}} \end{cases}

(9)

where $f_{P} = \int_{t_{0}}^{t_{0} + Δ t} F_{P} d t$ and $f_{P^{'}} = \int_{t_{0}}^{t_{0} + Δ t} F_{P^{'}} d t$ are the impact impulses, and $f_{P} + f_{P^{'}} = 0$.

The impact effect can be obtained by Equations (7) and (9):

{\dot{q}}_{r} (t_{0} + Δ t) = A^{- 1} [M_{r} {\dot{q}}_{r} (t_{0}) + B {\dot{q}}_{s} (t_{0})]

(10)

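where $A = M_{r} + B J_{s}^{- 1} J_{r}$ and $B = J_{r}^{T} {(J_{s}^{T})}^{- 1} M_{s}$. Approximating the impact force by the impulse averaged over the collision time, the impact force follows from Equations (9) and (10), where $E$ is the identity matrix and ${(\cdot)}^{+}$ denotes the pseudo-inverse:

F_{P} = \frac{{(J_{r}^{T})}^{+} M_{r} [(A^{- 1} M_{r} - E) {\dot{q}}_{r} (t_{0}) + A^{- 1} B {\dot{q}}_{s} (t_{0})]}{Δ t}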

(11)
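As a sanity check on Equation (10), consider the scalar special case in which the masses and Jacobians are plain numbers. With $J_{r} = J_{s} = 1$ the formula reduces to the velocity after a perfectly inelastic collision. The sketch below is an illustration under that simplifying assumption, not the full multi-body computation; the function names and numerical values are hypothetical.

```python
def post_impact_velocity(m_r, m_s, v_r, v_s, j_r=1.0, j_s=1.0):
    """Scalar form of Equation (10): robot velocity just after impact."""
    b = j_r * m_s / j_s        # B = J_r^T (J_s^T)^{-1} M_s
    a = m_r + b * j_r / j_s    # A = M_r + B J_s^{-1} J_r
    return (m_r * v_r + b * v_s) / a


def impact_impulses(m_r, m_s, v_r, v_s, j_r=1.0, j_s=1.0):
    """Impulses on the robot and on the satellite from Equation (9)."""
    v_r_plus = post_impact_velocity(m_r, m_s, v_r, v_s, j_r, j_s)
    v_s_plus = j_r * v_r_plus / j_s            # velocity matching, Equation (7)
    f_p = m_r * (v_r_plus - v_r) / j_r         # first line of Equation (9)
    f_p_prime = m_s * (v_s_plus - v_s) / j_s   # second line of Equation (9)
    return f_p, f_p_prime


if __name__ == "__main__":
    # Hypothetical scalar masses (kg) and pre-impact velocities (m/s).
    v_plus = post_impact_velocity(m_r=120.0, m_s=50.0, v_r=0.0, v_s=0.1)
    f_p, f_p_prime = impact_impulses(120.0, 50.0, 0.0, 0.1)
    print(v_plus, f_p, f_p_prime)
```

The two impulses returned by `impact_impulses` are equal and opposite, which is the scalar counterpart of $f_{P} + f_{P^{'}} = 0$.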

The hybrid system dynamic model can be obtained from Equations (5), (6) and (8) together with $F_{P} + F_{P^{'}} = 0$:

\begin{bmatrix} M_{rs 11} & M_{rs 12} \\ M_{rs 21} & M_{rs 22} \end{bmatrix} \begin{bmatrix} \ddot{X} \\ {\ddot{q}}_{h} \end{bmatrix} + \begin{bmatrix} H_{rs 11} + D_{L 11} & H_{rs 12} + D_{L 12} \\ H_{rs 21} + D_{L 21} & H_{rs 22} + D_{L 22} \end{bmatrix} \begin{bmatrix} \dot{X} \\ {\dot{q}}_{h} \end{bmatrix} = \begin{bmatrix} τ_{B} \\ τ_{h} \end{bmatrix}

(12)

where $M_{rs} = M_{r} + B J_{s}^{- 1} J_{r}$, $H_{rs} = H_{r} + B J_{s}^{- 1} ({\dot{J}}_{r} - {\dot{J}}_{s} J_{s}^{- 1} J_{r})$, $q_{h} = {[θ_{0}, q_{c}^{T}]}^{T}$, $X = {[x_{0}, y_{0}]}^{T}$, and $τ_{h} = {[τ_{0}, τ_{c}^{T}]}^{T}$.

The dynamic model of fully controllable hybrid system can be obtained from Equation (12):

\begin{cases} M_{h} {\ddot{q}}_{h} + H_{h} {\dot{q}}_{h} + D_{Lh} {\dot{q}}_{h} = τ_{h} \\ I_{m} {\ddot{q}}_{m} + D_{mc} {\dot{q}}_{m} + τ_{c} = τ_{m} \\ K_{s} (q_{m} - q_{c}) + D_{tc} ({\dot{q}}_{m} - {\dot{q}}_{c}) = τ_{c} \end{cases}

(13)

where $D_{Lh} = D_{L 22}$, $M_{h} = M_{rs 22} - M_{rs 21} M_{rs 11}^{- 1} M_{rs 12}$, and $H_{h} = H_{rs 22} - M_{rs 21} M_{rs 11}^{- 1} H_{rs 12}$.

4. Design of Controller

The controller consists of a PMU, an ASN, and an ACN. If the primary reinforcement signal is used directly to design the control torque, stability control can easily fail. Therefore, the controller obtains the primary reinforcement signal through the PMU and then uses the ACN to construct a signal that is more informative than the primary reinforcement signal alone; this signal is used to tune the ASN and realize stability control of the hybrid system. The control scheme based on fuzzy logic RL is shown in Figure 4.

Property 1. The matrices ${\dot{M}}_{h} (q_{h})$ and $H_{h} (q_{h}, {\dot{q}}_{h})$ are such that ${\dot{M}}_{h} - 2 H_{h}$ is skew-symmetric, i.e., $\frac{1}{2} z^{T} {\dot{M}}_{h} (q_{h}) z = z^{T} H_{h} (q_{h}, {\dot{q}}_{h}) z$ for an arbitrary column vector $z \in R^{n + 1}$.

Assumption 1. The matrices $M_{h} (q_{h})$ and $H_{h} (q_{h}, {\dot{q}}_{h})$ are bounded such that $‖H_{h} (q_{h}, {\dot{q}}_{h}) {\dot{q}}_{h}‖ \leq λ_{H} ‖{\dot{q}}_{h}‖$ and $λ_{m} \leq ‖M_{h} (q_{h})‖ \leq λ_{M}$, where $λ_{m}$, $λ_{M}$ and $λ_{H}$ are positive constants.

The position error and performance measurement signal are defined as follows:

e = q_{d} - q_{h}

(14)

S = \dot{e} + Λ e

(15)

where $Λ = diag (Λ_{1}, Λ_{2}, \ldots, Λ_{n + 1})$ is a diagonal matrix.

The primary reinforcement signal can be obtained from Equations (14) and (15):

r (t) = L σ (t)

(16)

where $L = diag (\frac{1}{ρ_{1}}, \frac{1}{ρ_{2}}, \ldots, \frac{1}{ρ_{n + 1}})$ with $ρ_{i} > 0$ $(i = 1, 2, \ldots, n + 1)$, $σ (t) = {[σ_{1} (s_{1} (t)), σ_{2} (s_{2} (t)), \ldots, σ_{n + 1} (s_{n + 1} (t))]}^{T}$, and $σ_{i} (s_{i} (t)) = \frac{e^{ρ_{i} s_{i} (t)} - e^{- ρ_{i} s_{i} (t)}}{e^{ρ_{i} s_{i} (t)} + e^{- ρ_{i} s_{i} (t)}}$ $(i = 1, 2, \ldots, n + 1)$. Then,

M_{h} \dot{r} = - (H_{h} + D_{Lh}) r - τ_{h} + χ (x)

(17)

where $χ (x) = M_{h} [{\ddot{q}}_{d} + Λ \dot{e} - diag (σ^{2}) (\ddot{e} + Λ \dot{e})] + H_{h} ({\dot{q}}_{d} - \dot{e} + r)$ is the uncertain term of the system and $x = {[q_{h}, {\dot{q}}_{h}, {\ddot{q}}_{h}, q_{d}, {\dot{q}}_{d}, {\ddot{q}}_{d}]}^{T}$ is its input vector.
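The function $σ_{i}$ in Equation (16) is the hyperbolic tangent of $ρ_{i} s_{i}$, so each element of the primary reinforcement signal is bounded by $1 / ρ_{i}$ in magnitude and is close to $s_{i}$ for small errors. The sketch below computes the PMU quantities of Equations (14)–(16) element by element, assuming a diagonal $Λ$; the function names are hypothetical.

```python
import math


def performance_signal(q_d, q_h, q_d_dot, q_h_dot, lam):
    """S = e_dot + Lambda * e with e = q_d - q_h (Equations (14) and (15)).

    lam holds the diagonal entries of Lambda.
    """
    return [
        (qdd - qhd) + l * (qd - qh)
        for qd, qh, qdd, qhd, l in zip(q_d, q_h, q_d_dot, q_h_dot, lam)
    ]


def primary_reinforcement(s, rho):
    """r_i = tanh(rho_i * s_i) / rho_i (Equation (16))."""
    return [math.tanh(p * si) / p for si, p in zip(s, rho)]


if __name__ == "__main__":
    # Hypothetical desired and actual states for a three-coordinate system.
    s = performance_signal(
        q_d=[1.0, 0.5, 1.0], q_h=[1.2, 0.5, 0.0],
        q_d_dot=[0.0, 0.0, 0.0], q_h_dot=[0.1, 0.0, -5.0],
        lam=[5.0, 5.0, 5.0],
    )
    print(s, primary_reinforcement(s, rho=[2.0, 2.0, 2.0]))
```

Smaller values of $ρ_{i}$ widen the range over which $r_{i}$ follows $s_{i}$ almost linearly before it saturates.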

To eliminate the influence of the uncertain term on control accuracy, the ASN, built on a fuzzy wavelet neural network, is used to estimate it. The estimate of the uncertain term by the wavelet neural network is assumed to be:

\hat{χ} = {\hat{W}}_{S}^{T} ψ (x, \hat{l}, \hat{ϖ})

(18)

where ${\hat{W}}_{S} = [{\hat{w}}_{S 1}, {\hat{w}}_{S 2}, \ldots, {\hat{w}}_{S m}]$ is the estimate of the weight matrix.

Based on this, the control signal is designed as follows:

τ_{h} = K_{r} r + \hat{χ} + τ_{H}

(19)

where $K_{r} \in R^{(n + 1) \times (n + 1)}$ is a positive definite diagonal matrix and $τ_{H} = \frac{1}{k_{d}} r$ with $k_{d} > 0$.
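Equation (19) combines a feedback term, the ASN estimate of the uncertain term, and the robust term $τ_{H}$. A minimal element-wise sketch follows, assuming a diagonal $K_{r}$ and treating the estimate as a given input (the fuzzy wavelet network itself is not modelled here); the function name and numbers are hypothetical.

```python
def control_torque(r, chi_hat, k_r, k_d):
    """tau_h = K_r r + chi_hat + r / k_d (Equation (19)).

    k_r holds the diagonal entries of K_r; chi_hat is the ASN estimate
    of the uncertain term, supplied by the caller.
    """
    if k_d <= 0:
        raise ValueError("k_d must be positive")
    return [k * ri + ci + ri / k_d for ri, ci, k in zip(r, chi_hat, k_r)]


if __name__ == "__main__":
    # Hypothetical reinforcement signal, estimate, and gains.
    print(control_torque(r=[0.5, -0.2], chi_hat=[1.0, 0.0], k_r=[10.0, 20.0], k_d=2.0))
```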

Combining Equations (17) and (19) gives:

M_{h} \dot{r} = - (K_{r} + H_{h} + D_{Lh}) r + \tilde{χ} (x) - τ_{H}

(20)

where $\tilde{χ} = χ - \hat{χ}$.

The primary reinforcement signal can be enhanced as follows:

r_{s} (t) = r (t) + Γ W_{C}^{* T} (t) ψ^{*} (t)

(21)

where $W_{C}^{*}$ is the ideal weight matrix, $ψ^{*}$ is the ideal regression matrix, $Γ = diag (r_{s 1}, r_{s 2}, \ldots, r_{s, n + 1})$, and $r_{s i}$ is the $i$th element of $r_{s}$.

It is assumed that the enhancement signal can be approximated by ACN as follows:

{\hat{r}}_{s} (t) = r (t) + Γ {\hat{W}}_{C}^{T} (t) \hat{ψ} (t)

(22)

where ${\hat{W}}_{C} = [{\hat{w}}_{C 1}, {\hat{w}}_{C 2}, \ldots, {\hat{w}}_{C m}]$ is the estimate of the weight matrix, and $\hat{ψ}$ is the estimate of the regression matrix. Expanding $\tilde{ψ} = ψ^{*} - \hat{ψ}$ in a Taylor series gives:

\tilde{ψ} = \left. \begin{bmatrix} {(\frac{\partial ψ_{1}}{\partial ϖ})}^{T} \\ \vdots \\ {(\frac{\partial ψ_{N}}{\partial ϖ})}^{T} \end{bmatrix} \right|_{ϖ = \hat{ϖ}} \tilde{ϖ} + \left. \begin{bmatrix} {(\frac{\partial ψ_{1}}{\partial l})}^{T} \\ \vdots \\ {(\frac{\partial ψ_{N}}{\partial l})}^{T} \end{bmatrix} \right|_{l = \hat{l}} \tilde{l} + ξ

(23)

where $\tilde{ϖ} = ϖ^{*} - \hat{ϖ}$ and $\tilde{l} = l^{*} - \hat{l}$. Equation (23) can be rewritten as:

\tilde{ψ} = α^{T} \tilde{ϖ} + β^{T} \tilde{l} + ξ

(24)

Assumption 2. The ideal weights $w_{C k}^{*}$ and $w_{S k}^{*}$, the centre value $l_{j}^{*}$, and the width $ϖ_{j}^{*}$ of the fuzzy wavelet neural network are bounded such that $0 < ‖w_{C k}^{*}‖ \leq b_{w c k}$, $0 < ‖w_{S k}^{*}‖ \leq b_{w s k}$, $0 < ‖ϖ_{j}^{*}‖ \leq b_{ω j}$, and $0 < ‖l_{j}^{*}‖ \leq b_{l j}$, for $k = 1, 2, \ldots, m$ and $j = 1, 2, \ldots, N$.

Assumption 3. The Taylor remainder $ξ$ and the partial derivatives $\partial ψ_{j} / \partial ϖ$ and $\partial ψ_{j} / \partial l$ are bounded such that $‖ξ‖ \leq b_{ξ}$, $‖\partial ψ_{j} / \partial ϖ‖ \leq b_{η j}$, and $‖\partial ψ_{j} / \partial l‖ \leq b_{κ j}$.

Combined with Equation (24), the output of ACN can be expressed as follows:

{\hat{W}}_{C}^{T} \hat{ψ} = - {\tilde{W}}_{C}^{T} (\hat{ψ} - α^{T} \hat{ϖ} - β^{T} \hat{l}) - {\hat{W}}_{C}^{T} (α^{T} \tilde{ϖ} + β^{T} \tilde{l}) - υ_{1}

(25)

where $υ_{1} = W_{C}^{* T} (ψ^{*} - α^{T} ϖ^{*} - β^{T} l^{*} - ξ) - {\hat{W}}_{C}^{T} (α^{T} ϖ^{*} + β^{T} l^{*})$ and ${\tilde{W}}_{C} = W_{C}^{*} - {\hat{W}}_{C}$. Then, for the approximation error of the ASN:

χ - {\hat{W}}_{S}^{T} \hat{ψ} = Ξ_{χ} + W_{S}^{* T} \tilde{ψ} + {\tilde{W}}_{S}^{T} \hat{ψ}

(26)

where ${\tilde{W}}_{S} = W_{S}^{*} - {\hat{W}}_{S}$ and $Ξ_{χ} = χ - W_{S}^{* T} ψ^{*}$.
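The decomposition in Equation (26) can be checked by adding and subtracting $W_{S}^{* T} ψ^{*}$. The short derivation below assumes, as in Equation (18), that the ASN estimate of the uncertain term is ${\hat{W}}_{S}^{T} \hat{ψ}$, and uses only the definitions of $Ξ_{χ}$, $\tilde{ψ}$ and ${\tilde{W}}_{S}$ given in the text:

```latex
\begin{aligned}
\chi - \hat{W}_{S}^{T}\hat{\psi}
  &= \left(\chi - W_{S}^{*T}\psi^{*}\right) + W_{S}^{*T}\psi^{*} - \hat{W}_{S}^{T}\hat{\psi} \\
  &= \Xi_{\chi} + W_{S}^{*T}\left(\tilde{\psi} + \hat{\psi}\right) - \hat{W}_{S}^{T}\hat{\psi} \\
  &= \Xi_{\chi} + W_{S}^{*T}\tilde{\psi} + \left(W_{S}^{*} - \hat{W}_{S}\right)^{T}\hat{\psi} \\
  &= \Xi_{\chi} + W_{S}^{*T}\tilde{\psi} + \tilde{W}_{S}^{T}\hat{\psi}
\end{aligned}
```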

Combined with Equation (26), Equation (20) can be rewritten as:

M_{h} \dot{r} = - (K_{r} + H_{h} + D_{Lh}) r + Ξ_{χ} + W_{S}^{* T} \tilde{ψ} + {\tilde{W}}_{S}^{T} \hat{ψ} - τ_{H}

(27)

The adaptive laws of the ASN and ACN are designed as follows:

{\dot{\hat{w}}}_{S k} = k_{w s k} ({\hat{r}}_{s k} + {\hat{r}}_{s k} {\hat{w}}_{C k}^{T} \hat{ψ}) (\hat{ψ} - I_{s k} \frac{{\hat{w}}_{S k} {\hat{w}}_{S k}^{T} \hat{ψ}}{{‖{\hat{w}}_{S k}‖}^{2}})

(28)

{\dot{\hat{w}}}_{C k} = - k_{w c k} {\hat{r}}_{s k} ({\hat{w}}_{S k}^{T} \hat{ψ}) ((\hat{ψ} - α^{T} \hat{ϖ} - β^{T} \hat{l}) - I_{c k} \frac{{\hat{w}}_{C k} {\hat{w}}_{C k}^{T} (\hat{ψ} - α^{T} \hat{ϖ} - β^{T} \hat{l})}{{‖{\hat{w}}_{C k}‖}^{2}})

(29)

\dot{\hat{ϖ}} = - k_{ω} (A {\hat{W}}_{C} Γ {\hat{W}}_{S}^{T} \hat{ψ} - I_{ω} \frac{\hat{ϖ} {\hat{ϖ}}^{T} (A {\hat{W}}_{C} Γ {\hat{W}}_{S}^{T} \hat{ψ})}{{‖\hat{ϖ}‖}^{2}})

(30)

\dot{\hat{l}} = - k_{l} (B {\hat{W}}_{C} Γ {\hat{W}}_{S}^{T} \hat{ψ} - I_{l} \frac{\hat{l} {\hat{l}}^{T} (B {\hat{W}}_{C} Γ {\hat{W}}_{S}^{T} \hat{ψ})}{{‖\hat{l}‖}^{2}})

(31)

where $k_{w s k}$, $k_{w c k}$, $k_{ω}$ and $k_{l}$ are positive constants, and the values of $I_{s k}$, $I_{c k}$, $I_{ω}$ and $I_{l}$ are as follows:

I_{s k} = \begin{cases} 0, & \text{if } ‖{\hat{w}}_{S k}‖ < b_{w s k} \text{ or } (‖{\hat{w}}_{S k}‖ = b_{w s k} \text{ and } ({\hat{r}}_{s k} + {\hat{r}}_{s k} {\hat{w}}_{C k}^{T} \hat{ψ}) {\hat{w}}_{S k}^{T} \hat{ψ} \leq 0) \\ 1, & \text{if } ‖{\hat{w}}_{S k}‖ = b_{w s k} \text{ and } ({\hat{r}}_{s k} + {\hat{r}}_{s k} {\hat{w}}_{C k}^{T} \hat{ψ}) {\hat{w}}_{S k}^{T} \hat{ψ} > 0 \end{cases}

(32)

I_{c k} = \begin{cases} 0, & \text{if } ‖{\hat{w}}_{C k}‖ < b_{w c k} \text{ or } (‖{\hat{w}}_{C k}‖ = b_{w c k} \text{ and } {\hat{r}}_{s k} ({\hat{w}}_{S k}^{T} \hat{ψ}) {\hat{w}}_{C k}^{T} (\hat{ψ} - α^{T} \hat{ϖ} - β^{T} \hat{l}) \geq 0) \\ 1, & \text{if } ‖{\hat{w}}_{C k}‖ = b_{w c k} \text{ and } {\hat{r}}_{s k} ({\hat{w}}_{S k}^{T} \hat{ψ}) {\hat{w}}_{C k}^{T} (\hat{ψ} - α^{T} \hat{ϖ} - β^{T} \hat{l}) < 0 \end{cases}

(33)

I_{ω} = \begin{cases} 0, & \text{if } ‖\hat{ϖ}‖ < b_{ϖ} \text{ or } (‖\hat{ϖ}‖ = b_{ϖ} \text{ and } {\hat{ϖ}}^{T} (A {\hat{W}}_{C} Γ {\hat{W}}_{S}^{T} \hat{ψ}) \geq 0) \\ 1, & \text{if } ‖\hat{ϖ}‖ = b_{ϖ} \text{ and } {\hat{ϖ}}^{T} (A {\hat{W}}_{C} Γ {\hat{W}}_{S}^{T} \hat{ψ}) < 0 \end{cases}

(34)

I_{l} = \begin{cases} 0, & \text{if } ‖\hat{l}‖ < b_{l} \text{ or } (‖\hat{l}‖ = b_{l} \text{ and } {\hat{l}}^{T} (B {\hat{W}}_{C} Γ {\hat{W}}_{S}^{T} \hat{ψ}) \geq 0) \\ 1, & \text{if } ‖\hat{l}‖ = b_{l} \text{ and } {\hat{l}}^{T} (B {\hat{W}}_{C} Γ {\hat{W}}_{S}^{T} \hat{ψ}) < 0 \end{cases}

(35)
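The indicator functions in Equations (32)–(35) act as a projection: when an estimate sits on the boundary of its admissible ball and the raw update points outward, the radial component of the update is removed so that the norm does not grow. The sketch below illustrates this for the ASN weight law of Equation (28), with the scalar factor $k_{w s k} ({\hat{r}}_{s k} + {\hat{r}}_{s k} {\hat{w}}_{C k}^{T} \hat{ψ})$ lumped into one hypothetical argument `gain`; the function names and the boundary tolerance `tol` are assumptions made for illustration.

```python
import math


def dot(u, v):
    """Inner product of two equally long lists."""
    return sum(a * b for a, b in zip(u, v))


def projected_update(w_hat, psi_hat, gain, bound, tol=1e-12):
    """Right-hand side of a projection-type adaptive law.

    The raw update is gain * psi_hat. If w_hat lies on the boundary
    ||w_hat|| = bound and the raw update points outward
    (w_hat^T raw > 0), the indicator is 1 and the component of the raw
    update along w_hat is subtracted; otherwise the raw update is used.
    """
    raw = [gain * p for p in psi_hat]
    norm_sq = dot(w_hat, w_hat)
    on_boundary = math.sqrt(norm_sq) >= bound - tol
    outward = dot(w_hat, raw) > 0.0
    if not (on_boundary and outward):
        return raw
    radial = dot(w_hat, raw) / norm_sq
    return [r - radial * w for r, w in zip(raw, w_hat)]


if __name__ == "__main__":
    # Hypothetical weight on the boundary of a ball of radius 5.
    w_hat = [3.0, 4.0]
    dw = projected_update(w_hat, psi_hat=[1.0, 0.0], gain=2.0, bound=5.0)
    print(dw, dot(w_hat, dw))
```

On the boundary the projected update is orthogonal to the weight vector, so the norm of the estimate cannot increase beyond the bound; this is the property used in the stability proof to sign the indicator terms.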

Theorem 1.

For the hybrid system dynamic model of Equation (13), suppose that Assumptions 1 to 3 hold. Then the error evaluation signal of Equation (15), the control signal of Equation (19), the reinforcement signals of Equations (16) and (21), and the update laws of the fuzzy wavelet neural network of Equations (28)–(31) ensure that the trajectory tracking error $e$ converges to zero asymptotically.

Proof of Theorem 1.

Introducing the Lyapunov function:

V = \frac{1}{2} r^{T} M_{h} r + \frac{1}{2} \sum_{k = 1}^{m} \frac{1}{k_{w s k}} {\tilde{w}}_{S k}^{T} {\tilde{w}}_{S k} + \frac{1}{2} \sum_{k = 1}^{m} \frac{1}{k_{w c k}} {\tilde{w}}_{C k}^{T} {\tilde{w}}_{C k} + \frac{1}{2 k_{ω}} {\tilde{ω}}^{T} \tilde{ω} + \frac{1}{2 k_{l}} {\tilde{l}}^{T} \tilde{l}

(36)

Taking the time derivative of Equation (36) gives:

\dot{V} = r^{T} M_{h} \dot{r} + \frac{1}{2} r^{T} {\dot{M}}_{h} r - \sum_{k = 1}^{m} \frac{1}{k_{w s k}} {\tilde{w}}_{S k}^{T} {\dot{\hat{w}}}_{S k} - \sum_{k = 1}^{m} \frac{1}{k_{w c k}} {\tilde{w}}_{C k}^{T} {\dot{\hat{w}}}_{C k} - \frac{1}{k_{ω}} {\tilde{ω}}^{T} \dot{\hat{ω}} - \frac{1}{k_{l}} {\tilde{l}}^{T} \dot{\hat{l}}

(37)

Combined with Property 1 and Equations (27)–(29), Equation (37) can be rewritten as:

\begin{matrix} \dot{V} = & r^{T} (Ξ_{χ} + W_{S}^{* T} \tilde{ψ}) - r^{T} τ_{H} - {(W_{S}^{* T} \hat{ψ})}^{T} Γ {\hat{W}}_{C}^{T} \hat{ψ} + {({\hat{W}}_{S}^{T} \hat{ψ})}^{T} Γ {\hat{W}}_{C}^{T} \hat{ψ} - \\ \sum_{k = 1}^{m} (r_{s k} + r_{s k} {\hat{w}}_{C k}^{T} \hat{ψ}) I_{s k} \frac{{\tilde{w}}_{S k}^{T} {\hat{w}}_{S k} {\hat{w}}_{S k}^{T} \hat{ψ}}{{‖{\hat{w}}_{S k}‖}^{2}} - r^{T} K_{DLr} r - \frac{1}{k_{ω}} {\tilde{ω}}^{T} \dot{\hat{ω}} - \frac{1}{k_{l}} {\tilde{l}}^{T} \dot{\hat{l}} + \\ \sum_{k = 1}^{m} r_{s k} ({\hat{w}}_{S k}^{T} \hat{ψ}) ({\tilde{w}}_{C k}^{T} (\hat{ψ} - α^{T} \hat{ϖ} - β^{T} \hat{l}) - I_{c k} \frac{{\tilde{w}}_{C k}^{T} {\hat{w}}_{C k} {\hat{w}}_{C k}^{T} (\hat{ψ} - α^{T} \hat{ϖ} - β^{T} \hat{l})}{{‖{\hat{w}}_{C k}‖}^{2}}) \end{matrix}

(38)

where $K_{DLr} = K_{r} + D_{Lh}$. Substituting Equations (26), (30) and (31) into Equation (38) gives:

\begin{matrix} \dot{V} = & - {(W_{S}^{* T} \hat{ψ})}^{T} Γ {\hat{W}}_{C}^{T} \hat{ψ} - {({\hat{W}}_{S}^{T} \hat{ψ})}^{T} Γ [{\tilde{W}}_{C}^{T} (\hat{ψ} - α^{T} \hat{ϖ} - β^{T} \hat{l}) + {\hat{W}}_{C}^{T} (α^{T} \tilde{ϖ} + β^{T} \tilde{l}) + Ξ_{χ}] - \\ r^{T} K_{DLr} r + r^{T} (Ξ_{χ} + W_{S}^{* T} \tilde{ψ}) - r^{T} τ_{H} + \sum_{k = 1}^{m} (r_{s k} + r_{s k} {\hat{w}}_{C k}^{T} \hat{ψ}) I_{s k} \frac{{\tilde{w}}_{S k}^{T} {\hat{w}}_{S k} {\hat{w}}_{S k}^{T} \hat{ψ}}{{‖{\hat{w}}_{S k}‖}^{2}} + \\ \sum_{k = 1}^{m} r_{s k} ({\hat{w}}_{S k}^{T} \hat{ψ}) ({\tilde{w}}_{C k}^{T} (\hat{ψ} - α^{T} \hat{ϖ} - β^{T} \hat{l}) - I_{c k} \frac{{\tilde{w}}_{C k}^{T} {\hat{w}}_{C k} {\hat{w}}_{C k}^{T} (\hat{ψ} - α^{T} \hat{ϖ} - β^{T} \hat{l})}{{‖{\hat{w}}_{C k}‖}^{2}}) + \\ {\tilde{ϖ}}^{T} ((A {\hat{W}}_{C} Γ {\hat{W}}_{S}^{T} \hat{ψ} - I_{ω} \frac{\hat{ω} {\hat{ω}}^{T} (A {\hat{W}}_{C} Γ {\hat{W}}_{S}^{T} \hat{ψ})}{{‖\hat{ϖ}‖}^{2}}) + {\tilde{l}}^{T} (B {\hat{W}}_{C} Γ {\hat{W}}_{S}^{T} \hat{ψ} - I_{l} \frac{\hat{l} {\hat{l}}^{T} (B {\hat{W}}_{C} Γ {\hat{W}}_{S}^{T} \hat{ψ})}{{‖\hat{l}‖}^{2}}) \end{matrix}

(39)

Based on Assumption 2, Assumption 3 and Equations (32)–(35), the following inequalities hold:

I_{s k} (r_{s k} + r_{s k} {\hat{w}}_{C k}^{T} \hat{ψ}) {\tilde{w}}_{S k}^{T} {\hat{w}}_{S k} {\hat{w}}_{S k}^{T} \hat{ψ} / {‖{\hat{w}}_{S k}‖}^{2} \leq 0

r_{s k} ({\hat{w}}_{S k}^{T} \hat{ψ}) (I_{c k} {\tilde{w}}_{C k}^{T} {\hat{w}}_{C k} {\hat{w}}_{C k}^{T} (\hat{ψ} - α^{T} \hat{ϖ} - β^{T} \hat{l}) / {‖{\hat{w}}_{C k}‖}^{2}) \geq 0

I_{ω} ({\tilde{ϖ}}^{T} \hat{ω} {\hat{ω}}^{T} (A {\hat{W}}_{C} Γ {\hat{W}}_{S}^{T} \hat{ψ}) / {‖\hat{ϖ}‖}^{2}) \geq 0

I_{l} ({\tilde{l}}^{T} \hat{l} {\hat{l}}^{T} (B {\hat{W}}_{C} Γ {\hat{W}}_{S}^{T} \hat{ψ}) / {‖\hat{l}‖}^{2}) \geq 0

Then

\begin{array}{l} \dot{V} \leq - r^{T} K_{DLr} r + r^{T} (Ξ_{χ} + W_{S}^{* T} \tilde{ψ}) - r^{T} τ_{H} + {(W_{S}^{* T} \hat{ψ})}^{T} Γ {\hat{W}}_{C}^{T} \hat{ψ} - {({\hat{W}}_{S}^{T} \hat{ψ})}^{T} Γ υ_{1} \\ \leq - r^{T} K_{DLr} r + r^{T} υ - r^{T} τ_{H} \end{array}

(40)

where

υ = Ξ_{χ} + W_{S}^{* T} \tilde{ψ} + diag (W_{S}^{* T} \hat{ψ}) {\hat{W}}_{C}^{T} \hat{ψ} - diag ({\hat{W}}_{S}^{T} \hat{ψ}) υ_{1}

.

Equation (40) can be rewritten as:

\begin{array}{l} \dot{V} \leq - \frac{1}{2} r^{T} [2 K_{DLr} + (\frac{2}{k_{d}} - \frac{1}{γ^{2}}) E] r - \frac{1}{2} {(\frac{1}{γ} r - γ υ)}^{T} (\frac{1}{γ} r - γ υ) + \frac{1}{2} γ^{2} υ^{T} υ \\ \leq - \frac{1}{2} r^{T} Q r + \frac{1}{2} γ^{2} υ^{T} υ \end{array}

(41)

where $Q = 2 K_{DLr} + (2 / k_{d} - 1 / γ^{2}) E$ is a positive definite diagonal matrix and $E$ is the identity matrix.
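The step from Equation (40) to Equation (41) is a completion of the square in the cross term $r^{T} υ$. A short worked version, using $τ_{H} = r / k_{d}$ from Equation (19) and an arbitrary constant $γ > 0$:

```latex
\begin{aligned}
r^{T}\upsilon
  &= -\tfrac{1}{2}\left(\tfrac{1}{\gamma} r - \gamma \upsilon\right)^{T}
      \left(\tfrac{1}{\gamma} r - \gamma \upsilon\right)
     + \tfrac{1}{2\gamma^{2}}\, r^{T} r
     + \tfrac{1}{2}\gamma^{2}\, \upsilon^{T}\upsilon , \\
-\,r^{T} K_{DLr}\, r + r^{T}\upsilon - \tfrac{1}{k_{d}}\, r^{T} r
  &= -\tfrac{1}{2}\, r^{T}\!\left[\,2K_{DLr}
       + \left(\tfrac{2}{k_{d}} - \tfrac{1}{\gamma^{2}}\right) E\right] r
     - \tfrac{1}{2}\left\| \tfrac{1}{\gamma} r - \gamma \upsilon \right\|^{2}
     + \tfrac{1}{2}\gamma^{2}\, \upsilon^{T}\upsilon .
\end{aligned}
```

Dropping the non-positive squared-norm term gives the second inequality in Equation (41). With $k_{d} = 2 γ^{2}$ the bracket reduces to $2 K_{DLr}$.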

Integrating Equation (41) over $[0, T]$ gives:

V (T) - V (0) \leq - \frac{1}{2} \int_{0}^{T} r^{T} Q r d t + \frac{1}{2} γ^{2} \int_{0}^{T} υ^{T} υ d t

(42)

Then

\begin{array}{l} \int_{0}^{T} r^{T} Q r d t \leq r {(0)}^{T} M_{h} r (0) + \sum_{k = 1}^{m} \frac{1}{k_{w s k}} {\tilde{w}}_{S k}^{T} (0) {\tilde{w}}_{S k} (0) + \sum_{k = 1}^{m} \frac{1}{k_{w c k}} {\tilde{w}}_{C k}^{T} (0) {\tilde{w}}_{C k} (0) + \\ \frac{1}{k_{ω}} {\tilde{ω}}^{T} (0) \tilde{ω} (0) + \frac{1}{k_{l}} {\tilde{l}}^{T} (0) \tilde{l} (0) + γ^{2} \int_{0}^{T} υ^{T} υ d t \end{array}

(43)

Following references [37,38], Equation (43) can be written in the following form:

\int_{0}^{T} {‖r‖}_{Q}^{2} d t \leq γ^{2} \int_{0}^{T} {‖υ‖}^{2} d t + β_{H}

(44)

where

β_{H} = r {(0)}^{T} M_{c} r (0) + \sum_{k = 1}^{m} \frac{1}{k_{w s k}} {\tilde{w}}_{S k}^{T} (0) {\tilde{w}}_{S k} (0) + \sum_{k = 1}^{m} \frac{1}{k_{w c k}} {\tilde{w}}_{C k}^{T} (0) {\tilde{w}}_{C k} (0) + \frac{1}{k_{ω}} {\tilde{ω}}^{T} (0) \tilde{ω} (0) + \frac{1}{k_{l}} {\tilde{l}}^{T} (0) \tilde{l} (0)

.

Following references [37,38], it can be shown that

υ

is bounded under Assumption 1 and Assumption 2. Furthermore,

\sup_{υ \in L_{2} [0, T]} {‖Q^{1 / 2} r‖}_{L 2 T} / {‖υ‖}_{L 2 T} \leq γ

when the initial states of the hybrid system are all zero. From the definition of Q in Equation (41), the system is stable in the sense of

H_{\infty}

control theory when

k_{d} = 2 γ^{2}

, and it can be seen that with the decrease of

γ

,

‖r‖

will also decrease. □
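The role of the condition k_{d} = 2 γ^{2} can be checked numerically. The following is a minimal sketch, not the authors' code: it evaluates the diagonal of Q from Equation (41) for a diagonal K_{DLr}. The value γ = 0.03 is the one used in Section 5.2, while the K_{DLr} entries and the alternative k_{d} = 1 are hypothetical placeholders chosen only for illustration.

```python
def q_diagonal(k_dlr_diag, k_d, gamma):
    """Diagonal of Q = 2*K_DLr + (2/k_d - 1/gamma**2)*E for a diagonal K_DLr."""
    shift = 2.0 / k_d - 1.0 / gamma**2
    return [2.0 * k + shift for k in k_dlr_diag]


def is_positive_definite(diag):
    """A diagonal matrix is positive definite iff every diagonal entry is > 0."""
    return all(d > 0.0 for d in diag)


gamma = 0.03                     # attenuation level from Section 5.2
k_dlr = [150.0, 150.0, 150.0]    # hypothetical gains, not taken from the paper

# With k_d = 2*gamma**2 the bracketed term vanishes, so Q = 2*K_DLr.
q_matched = q_diagonal(k_dlr, k_d=2.0 * gamma**2, gamma=gamma)

# With a much larger k_d the -1/gamma**2 term dominates for small gamma.
q_loose = q_diagonal(k_dlr, k_d=1.0, gamma=gamma)

print(q_matched, is_positive_definite(q_matched))
print(q_loose, is_positive_definite(q_loose))
```

With k_{d} = 2 γ^{2} the positive definiteness of Q depends only on K_{DLr}, which is why γ can then be decreased without affecting the sign of Q.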

5. Simulation Results

5.1. Simulation of Impact Resistance of the SDD

The two-link space robot (

n = 2

) and satellite systems in Figure 3 are used for simulation analysis. The initial position and velocity of the space robot are

q (0) = {[1.75, 0.52, 1.05]}^{T}

rad,

\dot{q} (0) = {[0, 0, 0]}^{T}

rad/s. The actual parameters of the space robot and satellite systems are as follows:

m_{0} = 100 kg

,

m_{1} = m_{2} = 10 kg

,

m_{s} = 50 kg

,

L_{0} = 1 m

,

L_{1} = L_{2} = 2 m

,

d_{1} = d_{2} = 1 m

,

d_{s} = 0.5 m

,

I_{0} = 64 kg \cdot m^{2}

,

I_{1} = I_{2} = 3.5 kg \cdot m^{2}

,

I_{m 1} = I_{m 2} = 0.05 kg \cdot m^{2}

,

I_{s} = 12.5 kg \cdot m^{2}

,

k_{s 1} = k_{s 2} = 1000 N / rad

,

D_{m 1} = D_{m 2} = 28.65 N \cdot s / rad

,

D_{t 1} = D_{t 2} = 1146 N \cdot s / rad

,

D_{L 1} = D_{L 2} = 28.65 N \cdot s / rad

.

To verify the impact resistance of the SDD during the third stage, capture simulations were run with the space robot system with and without the SDD, for satellites with different initial velocities. The simulation results are shown in Table 1.

As can be seen from Table 1, for every satellite velocity tested the SDD significantly reduces the impact torque at the joints, with a maximum reduction of 54.14%. The SDD can therefore be considered to protect the joints effectively during the third stage.
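The percentage reductions in Table 1 follow directly from the two torque columns. The short sketch below recomputes them from the tabulated values; the function and variable names are illustrative only.

```python
# (initial satellite velocity, max torque without SDD, max torque with SDD), torques in N*m
TABLE_1 = [
    ((0.1, 0.1, 0.15), 226.68, 133.27),
    ((0.1, 0.0, 0.0), 78.38, 38.78),
    ((0.0, 0.1, 0.0), 34.17, 15.67),
    ((0.0, 0.0, 0.15), 117.33, 66.09),
]


def percentage_reduction(without_sdd, with_sdd):
    """Relative drop in peak impact torque, in percent."""
    return 100.0 * (without_sdd - with_sdd) / without_sdd


reductions = [round(percentage_reduction(a, b), 2) for _, a, b in TABLE_1]
print(reductions)
print(max(reductions))
```

The recomputed values agree with the last column of Table 1, and the largest one is the 54.14% quoted in the text.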

5.2. Buffer Compliance Control Strategy Performance Simulation

To show the buffer compliance control performance of the proposed controller, simulations are carried out for the stabilization control phase. The actual parameters of the system are as follows:

Λ = diag (2, 2, 2)

,

ρ_{i} = 1.5 (i = 1, 2, 3)

,

K_{r} = diag (150, 150, 150)

,

γ = 0.03

,

k_{w c k} = 3

,

k_{w s k} = 3

,

k_{ϖ} = 2

,

k_{l} = 2

. The position of the hybrid system after capture is

q (t_{0}) = {[{84.61}^{\circ}, {11.10}^{\circ}, {25.65}^{\circ}]}^{T}

. The initial position and velocity of the space robot are the same as in Section 5.1, the initial velocity of the satellite is

{\dot{q}}_{s} (t_{0}) = {[0.1 m / s, 0.1 m / s, 0.15 rad / s]}^{T}

, and the desired state of the hybrid system is

q_{d} = {[{100}^{\circ}, {30}^{\circ}, {60}^{\circ}]}^{T}

.
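Because Section 5.1 gives the configuration in radians and this section gives it in degrees, a unit conversion helps when comparing the states. The sketch below converts q(0) to degrees and computes the offset between the post-capture state and the desired state. The sign convention (actual minus desired) is an assumption made for illustration.

```python
import math

q0_rad = [1.75, 0.52, 1.05]        # pre-capture configuration from Section 5.1, rad
q_t0_deg = [84.61, 11.10, 25.65]   # configuration right after capture, degrees
q_d_deg = [100.0, 30.0, 60.0]      # desired configuration, degrees

# Pre-capture configuration expressed in degrees.
q0_deg = [math.degrees(q) for q in q0_rad]

# Offset the controller has to remove after capture (assumed convention: q - q_d).
error_deg = [q - qd for q, qd in zip(q_t0_deg, q_d_deg)]
error_rad = [math.radians(e) for e in error_deg]

print([round(q, 2) for q in q0_deg])
print([round(e, 2) for e in error_deg])
print([round(e, 4) for e in error_rad])
```

The pre-capture configuration works out to roughly 100.3°, 29.8° and 60.2°, that is, close to the desired state, while the post-capture offsets are about 15.4°, 18.9° and 34.4° in magnitude.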

In order to highlight the advantages of the SDD over the SEA, the SEA structure of reference [3] is used for comparative analysis. In reference [3], to suppress the flexible vibration introduced by the SEA, the controller is divided into a fast subsystem controller and a slow subsystem controller: the slow subsystem controller realizes trajectory tracking and the fast subsystem controller suppresses the flexible vibration. The simulation results are shown in Figure 5, Figure 6 and Figure 7.

It can be seen from Figure 5 and Figure 6 that, under the same control parameters, adding the SEA requires a greater output torque than adding the SDD at both joint 1 and joint 2, which means that the SEA requires a joint motor with greater load capacity. Figure 7 shows that without the fast subsystem controller, the joint flexible vibration is difficult to suppress and the joint angle fails to reach the desired state.

Assume that the maximum impact torque the joint actuators can bear while running is

150 N \cdot m

. In order to protect the joint actuators, a buffer compliance control strategy that actively shuts down and restarts the actuators (named the switching strategy) is adopted. The shutdown torque threshold is

120 N \cdot m

, and the startup torque threshold is

10 N \cdot m

. The simulation results are shown in Figure 8, Figure 9, Figure 10 and Figure 11.
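The two thresholds define a hysteresis band. The following is a minimal sketch of such a switching rule, not the authors' implementation: it assumes the thresholds are compared against the magnitude of the joint impact torque, that the motor starts in the on state, and it uses a made-up torque history. The precise rule is the one given in Section 2.2.

```python
def switch_signal(torques, shutdown=120.0, startup=10.0):
    """Return 1 (motor on) or 0 (motor off) for each torque sample, in N*m."""
    motor_on = True  # assumption: the motor is running when the stage begins
    signal = []
    for tau in torques:
        magnitude = abs(tau)
        if motor_on and magnitude >= shutdown:
            motor_on = False   # torque reached the shutdown threshold
        elif not motor_on and magnitude <= startup:
            motor_on = True    # torque decayed to the startup threshold
        signal.append(1 if motor_on else 0)
    return signal


def count_shutdowns(signal):
    """Count on-to-off transitions, treating the motor as on before the first sample."""
    previous = [1] + signal[:-1]
    return sum(1 for prev, cur in zip(previous, signal) if prev == 1 and cur == 0)


# Hypothetical torque samples, for illustration only.
history = [50.0, 125.0, 80.0, 8.0, 60.0, -130.0, 5.0, 20.0]
signal = switch_signal(history)
print(signal, count_shutdowns(signal))
```

Between the two thresholds the motor keeps its previous state, which prevents rapid toggling when the torque hovers near a single limit. Lowering `shutdown` to 80.0 gives the setting of the second simulation group.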

It can be seen from Figure 8 that the joint motor enters the stable output state after four shutdowns. Figure 9 shows that the RL controller continuously outputs reinforcement signals while the base attitude angle and joint angles have not reached the desired position, and that the reinforcement signal goes to zero once they reach it. Figure 10 shows that the ACN and ASN are optimized through interaction with the environment and finally bring the hybrid system to a stable state. Figure 11 shows that the joint impact torque is kept within the torque the joints can bear, so the buffer compliance strategy achieves a compliant capture operation.

Considering that the impact torque that the joint can bear decreases as the service years of the space robot increase, the second group of simulations sets the shutdown torque threshold to

80 N \cdot m

, and the startup torque threshold is

10 N \cdot m

. The simulation results are shown in Figure 12, Figure 13, Figure 14 and Figure 15.

Figure 12 shows that the joint motor enters the stable output state after seven shutdowns. As can be seen from Figure 12, Figure 13, Figure 14 and Figure 15, even when the shutdown torque threshold is lowered, the joint impact torque is still kept within a safe range, and the RL controller can still complete the calm control of the unstable hybrid system by outputting the reinforcement signal. This means that the SDD protects the space robot joints effectively and plays an important role in prolonging the service life of the space robot.

6. Conclusions

In this paper, in order to protect the joints of a space robot from impact damage while capturing a satellite, an SDD is added between the joint motor and the manipulator, and a buffer compliance strategy that matches the SDD is given. The dynamic model of the hybrid system is derived, and the impact effect and impact force during the third stage are calculated. To realize the stabilization control of the system, an RL control based on a fuzzy wavelet neural network is proposed.

In the third stage of the capture operation, a huge impact torque is generated at the motor joint. Adding the SDD between the motor and the manipulator realizes rapid unloading of the impact force, and reduces the maximum impact torque by up to 54.14%.

In the fourth stage of the capture operation (i.e., with the buffer compliance strategy designed to match the SDD), the joints' impact torque can be limited to a safe range. The shutdown threshold of the buffer compliance strategy can be set flexibly, which protects the joints of space robots with different service years.

Author Contributions

Conceptualization, A.Z.; methodology, A.Z. and H.A.; software, A.Z. and L.C.; investigation, A.Z. and H.A.; writing—original draft preparation, A.Z.; writing—review and editing, A.Z. and H.A.; supervision, H.A. and L.C.; funding acquisition, H.A. and L.C. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Natural Science Foundation of China (Grant No. 51741502, 11372073), Science and Technology Project of the Education Department of Jiangxi Province (Grant No. GJJ200864), Jiangxi University of Science and Technology PhD Research Initiation Fund (Grant No. 205200100514).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Fu, X.; Ai, H.; Chen, L. Repetitive learning sliding mode stabilization control for a flexible-base, flexible-link and flexible-joint space robot capturing a satellite. Appl. Sci. 2021, 11, 8077.
2. He, J.; Zheng, H.; Gao, F.; Zhang, H. Dynamics and control of a 7-DOF hybrid manipulator for capturing a non-cooperative target in space. Mech. Machine Theory 2019, 140, 83–103.
3. Ai, H.; Zhu, A.; Wang, J.; Yu, X.; Chen, L. Buffer Compliance Control of Space Robots Capturing a Non-Cooperative Spacecraft Based on Reinforcement Learning. Appl. Sci. 2021, 11, 5783.
4. Meng, Q.; Liang, J.; Ma, O. Identification of all the inertial parameters of a non-cooperative object in orbit. Aerosp. Sci. Technol. 2019, 91, 571–582.
5. Liu, X.; Li, H.; Chen, Y.; Cai, G. Dynamics and control of space robot considering joint friction. Acta Astronaut. 2015, 111, 1–18.
6. Li, M.; Guo, W.; Lin, R.; Wu, C. An efficient motion generation method for redundant humanoid robot arms based on motion continuity. Adv. Robot. 2018, 32, 1185–1196.
7. Fu, X.; Ai, H.; Chen, L. Integrated Fixed Time Sliding Mode Control for Motion and Vibration of Space Robot with Fully Flexible Base–Link–Joint. Appl. Sci. 2021, 11, 11685.
8. Liu, X.; Li, H.; Chen, Y.; Cai, G.; Wang, X. Dynamics and control of capture of a floating rigid body by a spacecraft robotic arm. Multibody Syst. Dyn. 2015, 33, 315–332.
9. Cheng, J.; Chen, L. Mechanical analysis and calm control of dual-arm space robot for capturing a satellite. Chin. J. Theor. Appl. Mech. 2016, 48, 832–842. (In Chinese)
10. Uyama, N.; Hirano, D.; Nakanishi, H.; Nagaoka, K.; Yoshida, K. Impedance-based contact control of a free-flying space robot with respect to coefficient of restitution. IEEE/SICE Int. Symp. Syst. Integr. 2012, 1196–1201.
11. Dimitrov, D.; Yoshida, K. Utilization of the bias momentum approach for capturing a tumbling satellite. IEEE/RSJ Int. Conf. Intell. Robot. Syst. 2004, 3333–3338.
12. Yoshida, K.; Nakanishi, H.; Ueno, H.; Inaba, N. Dynamics control and impedance matching for robotic capture of a non-cooperative satellite. Adv. Robot. 2004, 2, 175–198.
13. Dong, Q.; Chen, L. Composite control of robust stabilization and adaptive vibration suppression of flexible space manipulator capturing a satellite. Robot 2014, 36, 342–348. (In Chinese)
14. Sariyildiz, E.; Chen, G.; Yu, H. An acceleration-based robust motion controller design for a novel series elastic actuator. IEEE Trans. Ind. Electron. 2016, 63, 1900–1910.
15. Li, X.; Pan, Y.; Chen, G.; Yu, H. Continuous tracking control for a compliant actuator with two-stage stiffness. IEEE Trans. Autom. Sci. Eng. 2018, 15, 57–66.
16. Lin, Y.; Chen, Z.; Yao, B. Decoupled torque control of series elastic actuator with adaptive robust compensation of time-varying load-side dynamics. IEEE Trans. Ind. Electron. 2019, 67, 5604–5614.
17. Irmscher, C.; Woschke, E.; May, E.; Daniel, C. Design, Optimization and testing of a compact, inexpensive elastic element for series elastic actuators. Med. Eng. Phys. 2018, 52, 84–89.
18. Keppler, M.; Lakatos, D.; Ott, C.; Albu-Schaffer, A. Elastic structure preserving (EPS) control for compliantly actuated robots. IEEE Trans. Robot. 2018, 34, 317–335.
19. Yang, T.; Sun, N.; Fang, Y. Adaptive fuzzy control for a class of mimo underactuated systems with plant uncertainties and actuator deadzones: Design and experiments. IEEE Trans. Cybern. 2021, 99, 1–14.
20. Sun, L.; Li, M.; Wang, M.; Yin, W.; Sun, N.; Liu, J. Continuous finite-time output torque control approach for series elastic actuator. Mech. Syst. Signal Processing 2020, 139, 105853.
21. Wang, M.; Sun, L.; Yin, W.; Dong, S.; Liu, J. Continuous robust control for series elastic actuator with unknown payload parameters and external disturbances. IEEE/CAA J. Autom. Sin. 2017, 4, 620–627.
22. Ai, H.; Chen, L. Buffer and compliant dynamic surface control of space robot capturing satellite based on compliant mechanism. Chin. J. Theor. Appl. Mech. 2020, 52, 975–984. (In Chinese)
23. Liu, S.; Wu, L.; Lu, Z. Impact dynamics and control of a flexible dual-arm space robot capturing an object. Appl. Math. Comput. 2007, 185, 1149–1159.
24. Huang, X.; Duan, G. Attitude control and structure robust control allocation for combined spacecraft. Control. Theory Appl. 2018, 35, 1447–1457. (In Chinese)
25. Luo, J.; Wei, C.; Dai, H.; Yin, Z. Robust inertia-free attitude takeover control of post capture combined spacecraft with guaranteed prescribed performance. ISA Trans. 2018, 74, 28–44.
26. Gangapersaud, R.; Liu, G.; De Ruiter, A. Detumbling of a non-cooperative target with unknown inertial parameters using a space robot. Adv. Space Res. 2019, 63, 3900–3915.
27. Sands, T. Development of Deterministic Artificial Intelligence for Unmanned Underwater Vehicles (UUV). J. Mar. Sci. Eng. 2020, 8, 578.
28. Ai, H.; Chen, L. Passivity-based neural network H∞ avoidance compliant control of space robot capturing spacecraft. Opt. Precis. Eng. 2020, 28, 717–726. (In Chinese)
29. Chen, Z.; Gao, F. Time-optimal trajectory planning method for six-legged robots under actuator constraints. Proc. Inst. Mech. Eng. Part C J. Mech. Eng. Sci. 2019, 233, 095440621983307.
30. Hu, Y.; Si, B. A reinforcement learning neural network for robotic manipulator control. Neural Comput. 2018, 30, 1983–2004.
31. Li, Z.; Liu, J.; Huang, Z.; Peng, Y.; Pu, H.; Ding, L. Adaptive impedance control of human-robot cooperation using reinforcement learning. IEEE Trans. Ind. Electron. 2017, 64, 8013–8022.
32. Leottau, D.; Ruiz-Del-Solar, J.; Babuška, R. Decentralized reinforcement learning of robot behaviors. Artif. Intell. 2018, 256, 130–159.
33. Kobayashi, T. Student-t policy in reinforcement learning to acquire global optimum of robot control. Appl. Intell. 2019, 49, 4335–4347.
34. Tai, L.; Liu, M. Mobile robots exploration through cnn-based reinforcement learning. Robot. Biomim. 2016, 3, 24.
35. Lee, C.C. A self-learning rule-based controller employing approximate reasoning and neural net concepts. Int. J. Intell. Syst. 1991, 6, 71–93.
36. Bustamante, C.; Gadea, E.; Horsfield, A.; Todorov, T.; González Lebrero, M.; Scherlis, D. Dissipative equation of motion for electromagnetic radiation in quantum dynamics. Phys. Rev. Lett. 2021, 126, 087401.
37. Chang, Y.; Chen, B. A nonlinear adaptive H∞ tracking control design in robotic systems via neural networks. IEEE Trans. Control. Syst. Technol. 1997, 5, 13–29.
38. Lin, C.; Wang, S. An adaptive H∞ controller design for bank-to-turn missiles using ridge Gaussian neural networks. IEEE Trans. Neural Netw. 2004, 15, 1507–1516.

Figure 1. Structure of the SDD. (a) 3D structure of SDD, (b) Modeling of the SDD.

Figure 2. Connection mode of the SDD.

Figure 3. Space robot with SDD and target satellite systems.

Figure 4. The control block diagram based on fuzzy logic Reinforcement Learning. (a) The overall control block diagram; (b) The ACN and ASN unit.

Figure 5. 1st joint motor output torque.

Figure 6. 2nd joint motor output torque.

Figure 7. Joint angle trajectory without the fast subsystem controller.

Figure 8. Switch signal of motor (

F_{C 1} = 120 N \cdot m, F_{O 1} = 10 N \cdot m

).

Figure 9. Reinforcement signal (

F_{C 1} = 120 N \cdot m, F_{O 1} = 10 N \cdot m

).

Figure 10. Base and joint angle (

F_{C 1} = 120 N \cdot m, F_{O 1} = 10 N \cdot m

).

Figure 11. Joint impact torque (

F_{C 1} = 120 N \cdot m, F_{O 1} = 10 N \cdot m

).

Figure 12. Switch signal of motor (

F_{C 2} = 80 N \cdot m, F_{O 2} = 10 N \cdot m

).

Figure 13. Reinforcement signal (

F_{C 2} = 80 N \cdot m, F_{O 2} = 10 N \cdot m

).

Figure 14. Base and joint angle (

F_{C 2} = 80 N \cdot m, F_{O 2} = 10 N \cdot m

).

Figure 15. Joint impact torque (

F_{C 2} = 80 N \cdot m, F_{O 2} = 10 N \cdot m

).

Table 1. SDD impact resistance at different initial velocities of spacecraft.

Initial Velocity of Satellite/ (m/s, m/s, rad/s)	Max Impact Torque without SDD/ (N·m)	Max Impact Torque with SDD/ (N·m)	Percentage Reduction/ (%)
[0.1, 0.1, 0.15]^T	226.68	133.27	41.21
[0.1, 0, 0]^T	78.38	38.78	50.52
[0, 0.1, 0]^T	34.17	15.67	54.14
[0, 0, 0.15]^T	117.33	66.09	43.67

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
