A Fuzzy Logic Reinforcement Learning Control with Spring-Damper Device for Space Robot Capturing Satellite

Abstract: To prevent the joints of a space robot from being damaged by the impact force when it captures a satellite, a spring-damper device (SDD) is added between the joint motor and the manipulator. The device not only absorbs and dissipates impact energy but also, through a suitably designed compliance control strategy, limits the impact force to a safe range. First, the dynamic models of the space robot and the target satellite before capture are established using a Lagrange function based on dissipation theory and the Newton-Euler method, respectively. The impact effect is then analyzed, and the dynamic equation of the hybrid system is obtained by combining Newton's third law, momentum conservation, and the kinematic geometric relationship. To realize buffer compliance stability control of the hybrid system, a reinforcement learning (RL) control strategy based on a fuzzy wavelet network is proposed. The controller consists of a performance measurement unit (PMU), an associative search network (ASN), and an adaptive critic network (ACN). Finally, the stability of the system is proved by the Lyapunov theorem, and both the impact resistance of the SDD and the effectiveness of the buffer compliance control strategy are verified by numerical simulation.


Introduction
Space exploration is of great significance to resource exploration, meteorological observation, navigation, and positioning, so a large number of satellites are launched into space every year. Inevitably, a small fraction of these satellites fail to enter their intended orbit or are damaged in orbit. If such satellites can be recovered, the cost of space exploration will be greatly reduced. At present, it is feasible to use a space robot to complete the capture task, which has therefore become one of the research hot spots of space exploration [1][2][3][4][5][6][7][8]. Generally, a capture operation can be divided into four stages: (1) the space robot observes the target satellite; (2) a pre-operation stage before capture, such as deceleration and detumbling control of the target satellite; (3) contact and collision between the space robot and the target satellite; and (4) stabilization control of the closed-chain hybrid system. In the third stage the impact force can easily damage the joints of the space robot, and in the fourth stage the impact effect makes the hybrid system unstable. These two stages are therefore the focus and the main challenge of this study.
For the third stage, Cheng et al. [9] analyzed the dynamic evolution of a dual-arm space robot capturing a spinning satellite and discussed calm control of the unstable closed-chain composite system. Uyama et al. [10] presented impedance-based contact control of a free-flying space robot that uses a compliant wrist for non-cooperative satellite capture. An open-loop impedance control law based on a contact dynamics model is introduced to realize the desired coefficient of restitution defined between the manipulator […] Compared with our previously published work [3], this paper enhances the primary reinforcement signal into an enhancement signal containing more information, which ultimately makes the calm control of the hybrid system more stable.
This paper is organized as follows. In Section 2, the structure of the SDD and the buffer compliance strategy are described. In Section 3, the hybrid system dynamic model is obtained, and the impact effect and impact force are analyzed. An RL control strategy based on a fuzzy wavelet network is designed in Section 4. The simulation results are given in Section 5. Finally, the conclusions are summarized in Section 6.

Structure of the SDD
Compared with our previously published work [3], dampers are added. The structure diagram of the SDD is shown in Figure 1. The spring absorbs impact energy, and the damper provides a damping force in real time to suppress flexible vibration. To describe the resistance of the motor and the manipulator more realistically, an equivalent damper is added to each of them (these are modeling elements; no physical dampers are present), and the connection mode of the SDD is shown in Figure 2. In Figure 1, $K_{si}$ and $D_{ti}$ $(i = 1, 2, \cdots, n)$ are the stiffness of the torsion spring and the damping coefficient of the rotary damper, and $D_{mi}$ and $D_{Li}$ $(i = 1, 2, \cdots, n)$ are the damping coefficients of the equivalent dampers at the motor and at the manipulator, respectively.
Appl. Sci. 2022, 12, x FOR PEER REVIEW 3 of 17
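As a rough illustration of how the torsion spring and the rotary damper act together, the sketch below computes the torque that an SDD would pass from the motor rotor to the link of a single joint. The linear law `tau = k_s * (theta_m - theta_l) + d_t * (omega_m - omega_l)`, the function name, and all numerical values are assumptions made for illustration; they are not taken from the paper.

```python
def sdd_torque(theta_m, theta_l, omega_m, omega_l, k_s, d_t):
    """Torque transmitted from motor rotor to link through a linear spring-damper.

    Assumed law (not from the paper):
        tau = k_s * (theta_m - theta_l) + d_t * (omega_m - omega_l)
    Angles in rad, angular rates in rad/s, k_s in N*m/rad, d_t in N*m*s/rad.
    """
    spring_part = k_s * (theta_m - theta_l)   # elastic restoring torque
    damper_part = d_t * (omega_m - omega_l)   # viscous torque opposing relative motion
    return spring_part + damper_part


# Hypothetical numbers: the rotor leads the link by 0.02 rad while the link
# is catching up at 0.5 rad/s, so the damper torque offsets the spring torque.
tau = sdd_torque(theta_m=0.10, theta_l=0.08, omega_m=0.0, omega_l=0.5,
                 k_s=500.0, d_t=20.0)
print(f"transmitted torque: {tau:.6f} N*m")
```

With these made-up values the spring and damper contributions nearly cancel, which is the situation in which the damper is removing energy from the spring oscillation.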

Buffer Compliance Strategy
In the third stage, the end-effector of the space robot collides with the target satellite, and the impact energy is quickly buffered and dissipated by the SDD as it is transmitted towards the motor rotors, which protects the joints. In the fourth stage, because of the impact effect, an instantaneous impact torque is generated when the motors are turned on. If this torque exceeds the joint limit while the motor is still running, the joint will be damaged. It is therefore necessary to set a torque threshold according to the ultimate torque that the joints can withstand. When the instantaneous impact torque is detected to exceed this threshold, the motor is turned off. The spring then provides elasticity to mitigate the impact torque on the joints, and the damper quickly dissipates energy to suppress the flexible vibration. However, if only a single threshold is used, the motors will be switched on and off frequently, which can easily damage them. The buffer compliance strategy proposed in this paper therefore sets two thresholds, one for turning the motor off and one for turning it on. When the impact torque of a joint exceeds the turn-off threshold, the motor is turned off; when the SDD has reduced the impact torque to the turn-on threshold, the motor is turned on again.
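The two-threshold rule described above is a form of hysteresis switching and can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names and the torque samples are hypothetical, while the 120 N·m and 10 N·m thresholds mirror the values used later in the simulations.

```python
def motor_states(torques, off_threshold, on_threshold, start_on=True):
    """Return the motor on/off state for each torque sample.

    The motor is switched off when |torque| exceeds off_threshold and is
    switched back on only once |torque| has dropped to on_threshold or below.
    """
    if on_threshold >= off_threshold:
        raise ValueError("on_threshold must be below off_threshold")
    states = []
    motor_on = start_on
    for tau in torques:
        magnitude = abs(tau)
        if motor_on and magnitude > off_threshold:
            motor_on = False          # protect the joint
        elif not motor_on and magnitude <= on_threshold:
            motor_on = True           # SDD has unloaded the joint
        states.append(motor_on)
    return states


def count_shutdowns(states, start_on=True):
    """Count on -> off transitions in a sequence of motor states."""
    shutdowns = 0
    previous = start_on
    for state in states:
        if previous and not state:
            shutdowns += 1
        previous = state
    return shutdowns


# Hypothetical joint torque samples in N*m.
samples = [20, 130, 90, 40, 8, 60, 125, 50, 9, 30]
states = motor_states(samples, off_threshold=120, on_threshold=10)
print(states, count_shutdowns(states))
```

Because the motor stays off between the two thresholds, torque values such as 90 or 40 N·m do not cause any switching, which is what avoids the frequent toggling of a single-threshold rule.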

Dynamic Modeling and Impact Analysis
The structure of the space robot with SDD and the target satellite is shown in Figure 3. $O_0$, $O_s$ and $O_i$ $(i = 1, 2, \cdots, n)$ are the centroids of the carrier, the satellite, and the joints, respectively. $P$ and $P'$ are the acquisition point of the space manipulator and the acquisition point of the satellite, respectively. $xOy$ is the inertial coordinate system moving with the orbit; $x_0O_0y_0$ and $x_sO_sy_s$ are the coordinate systems fixed at the centroids of the carrier and the satellite, respectively; and $x_iO_iy_i$ $(i = 1, 2, \cdots, n)$ is the coordinate system fixed at the centre of the $i$th joint. The parameters of the space robot and the satellite are defined as follows: $m_0$, $m_i$ $(i = 1, 2, \cdots, n)$ and $m_s$ are the masses of the space robot carrier, the manipulator, and the satellite, respectively; $I_i$, $I_{mi}$ $(i = 1, 2, \cdots, n)$ and $I_s$ are the moments of inertia of the manipulator, the motor rotor, and the satellite, respectively; $d_0$, $d_i$ $(i = 1, 2, \cdots, n)$ and $d_s$ are the distances from $O_0$ to $O_1$, from the centre of the joint to the manipulator, and from the satellite centroid to its end, respectively. $L_i$ $(i = 1, 2, \cdots, n)$ is the length of the manipulator, and $\theta_0$, $\theta_i$, $\theta_s$, and $\theta_{mi}$ $(i = 1, 2, \cdots, n)$ are the angles of the carrier attitude, the manipulator, the satellite attitude, and the motor rotor, respectively.
For the space robot with SDD (see Figure 3), the total kinetic energy in the pre-contact phase is given by Equation (1), where $r_i$ is the position vector of the manipulator's mass centre, and $\omega_i$ and $\omega_{mj}$ are the angular velocities of the manipulator and the motor rotor, respectively. If microgravity is ignored, the potential energy of the space robot comes only from the springs in the SDD (Equation (2)). Because the SDD is added at the joints, the space robot is a non-potential system, so a dissipation function must be included (Equation (3)). Following reference [36], the Lagrange equation of a dissipative system is written as Equation (4). Combining Equations (1)-(4) gives the model of the space robot system before capture, Equation (5), whose terms include the matrix containing the Coriolis and centrifugal forces, the Jacobian matrix of the space robot, and $F_P \in \mathbb{R}^{3\times 1}$, the force acting on the end-effector of the space robot.
The model of the satellite before capture is obtained from the Newton-Euler equation, Equation (6), where $M_s \in \mathbb{R}^{3\times 3}$ is the symmetric positive definite inertia matrix of the satellite, $q_s = [x_s, y_s, \theta_s]^T$ is the generalized coordinate of the satellite, $x_s$ and $y_s$ are the coordinates of the satellite centroid, $J_s \in \mathbb{R}^{3\times 3}$ is the Jacobian matrix of the satellite, $F_{P'} \in \mathbb{R}^{3\times 1}$ is the force acting on the satellite, and $F_P + F_{P'} = 0$.
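For orientation, energy terms of the kind described above typically take the following form for a planar multibody system with a linear torsion spring and viscous dampers at each joint. This is a sketch under stated assumptions, not a reproduction of the paper's Equations (1)-(4): the summation ranges, the carrier inertia $I_0$, the relative-angle form of the spring and rotary-damper terms, the absolute-rate form of the equivalent dampers, and the symbols $\mathcal{L}$, $D$, $q$ and $Q$ are all assumed here.

```latex
\begin{align*}
T &= \frac{1}{2}\sum_{i=0}^{n}\left(m_i\,\dot{r}_i^{\mathsf T}\dot{r}_i + I_i\,\omega_i^{2}\right)
   + \frac{1}{2}\sum_{j=1}^{n} I_{mj}\,\omega_{mj}^{2}, \\
U &= \frac{1}{2}\sum_{i=1}^{n} K_{si}\left(\theta_{mi}-\theta_i\right)^{2}, \\
D &= \frac{1}{2}\sum_{i=1}^{n}\left[ D_{ti}\left(\dot\theta_{mi}-\dot\theta_i\right)^{2}
   + D_{mi}\,\dot\theta_{mi}^{2} + D_{Li}\,\dot\theta_i^{2}\right], \\
Q &= \frac{d}{dt}\!\left(\frac{\partial \mathcal{L}}{\partial \dot q}\right)
   - \frac{\partial \mathcal{L}}{\partial q}
   + \frac{\partial D}{\partial \dot q},
   \qquad \mathcal{L} = T - U .
\end{align*}
```

Here $D$ is a Rayleigh-type dissipation function, so its derivative with respect to the generalized velocities contributes the damping torques that a purely conservative Lagrangian would omit.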
After the capture operation, the velocities of the end-effector of the space robot and of the satellite satisfy the constraint in Equation (7), where $\Delta t$ is the duration of the collision and $S_P = [\,\cdots$ […] From Equation (7), the acceleration of the satellite after the collision is obtained as Equation (8).
Combining the law of conservation of momentum with Equations (5) and (6) gives Equation (9), in which the impact impulses $f_P$ and $f_{P'}$ are the time integrals of $F_P$ and $F_{P'}$ over the collision, and $f_P + f_{P'} = 0$. The impact effect is obtained from Equations (7) and (9) as Equation (10), and the impact force is obtained from Equations (9) and (10) as Equation (11). The hybrid system dynamic model, Equation (12), follows from Equations (5), (6) and (8), and the dynamic model of the fully controllable hybrid system, Equation (13), is obtained from Equation (12).
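The role of momentum conservation and of the equal and opposite impulses can be seen in a deliberately simplified one-dimensional example. The sketch below assumes a perfectly plastic capture of two point masses, which is far simpler than the multibody model of the paper; the function names, masses, velocities and collision time are hypothetical.

```python
def capture_1d(m_robot, v_robot, m_sat, v_sat):
    """Perfectly plastic 1-D capture: both bodies share one velocity afterwards.

    Returns (common velocity, impulse on the robot, impulse on the satellite).
    """
    v_common = (m_robot * v_robot + m_sat * v_sat) / (m_robot + m_sat)
    impulse_on_robot = m_robot * (v_common - v_robot)
    impulse_on_sat = m_sat * (v_common - v_sat)
    return v_common, impulse_on_robot, impulse_on_sat


def average_force(impulse, collision_time):
    """Mean contact force over the collision, F_avg = impulse / dt."""
    if collision_time <= 0.0:
        raise ValueError("collision_time must be positive")
    return impulse / collision_time


# Hypothetical values: a 200 kg robot at rest captures a 50 kg satellite
# drifting at 0.1 m/s; the contact is assumed to last 0.01 s.
v, f_p, f_p_prime = capture_1d(200.0, 0.0, 50.0, 0.1)
print(v, f_p, f_p_prime, average_force(f_p, 0.01))
```

The two impulses sum to zero, the one-dimensional counterpart of $f_P + f_{P'} = 0$, and a shorter collision time gives a proportionally larger mean force, which is why a compliant element that lengthens the contact reduces the load on the joints.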

Design of Controller
The controller consists of a PMU, an ASN, and an ACN. If the primary reinforcement signal is used directly to design the control torque, stability control easily fails. Therefore, the controller obtains the primary reinforcement signal through the PMU and then uses the ACN to construct a signal that is more informative than the primary reinforcement alone; this signal tunes the ASN to realize stability control of the hybrid system. The control scheme based on fuzzy logic RL is shown in Figure 4.
[…] $q_h)z$, where $z \in \mathbb{R}^{n+1}$ is an arbitrary column vector.
The position error $e$ between $q_{dh}$ and $q_h$ (Equation (14)) and the performance measurement signal $S = \dot{e} + \Lambda e$ (Equation (15)) are defined first. The primary reinforcement signal is then obtained from Equations (14) and (15) as Equation (16), where $L = \mathrm{diag}\!\left(\dfrac{1}{e^{\rho_i s_i(t)} + e^{-\rho_i s_i(t)}}\right)$ $(i = 1, 2, \cdots, n + 1)$. Equation (17) then follows, where $\chi$ is the uncertain term of the system.
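To make the roles of the error signals concrete, the sketch below computes a filtered error of the form $S = \dot{e} + \Lambda e$ and passes it through a bounded squashing function to obtain a per-channel reinforcement value. The sign convention `e = q_desired - q`, the choice of `tanh` as the squashing function, and the joint values are assumptions made for illustration; the source text does not preserve the exact form of Equations (14) and (16). The gains `lam = 2` and `rho = 1.5` are the values listed in the simulation section.

```python
import math


def filtered_error(e, e_dot, lam):
    """S_i = e_dot_i + lam_i * e_i for a diagonal gain given as a list."""
    return [ed + l * ei for ei, ed, l in zip(e, e_dot, lam)]


def primary_reinforcement(s, rho):
    """Bounded per-channel signal r_i = tanh(rho_i * s_i) (assumed form)."""
    return [math.tanh(r * si) for si, r in zip(s, rho)]


# Hypothetical desired and actual coordinates (rad) and rates (rad/s).
q_des, q = [0.0, 0.5, -0.5], [0.1, 0.3, -0.4]
qd_des, qd = [0.0, 0.0, 0.0], [0.02, -0.01, 0.0]

e = [d - a for d, a in zip(q_des, q)]          # assumed sign convention
e_dot = [d - a for d, a in zip(qd_des, qd)]
s = filtered_error(e, e_dot, lam=[2.0, 2.0, 2.0])
r = primary_reinforcement(s, rho=[1.5, 1.5, 1.5])
print(s, r)
```

Whatever squashing function is used, the point of the bounded signal is that large tracking errors saturate rather than grow without limit, while the signal is zero exactly when the filtered error is zero.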
To eliminate the influence of the uncertain term on control accuracy, the ASN uses a wavelet neural network to estimate it. The estimate of the uncertain term by the wavelet neural network is assumed to take the form of Equation (18), where $\hat{W}_S = [\hat{w}_{S1}, \hat{w}_{S2}, \cdots, \hat{w}_{Sm}]$ is the estimate of the weight matrix. Based on this, the control signal is designed as in Equation (19). Combining Equations (17) and (19) gives Equation (20), where $\tilde{\chi} = \chi - \hat{\chi}$.
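The idea of approximating the uncertain term as a weighted sum of wavelet basis functions, $\hat{\chi} = \hat{W}^T \psi(x)$, can be sketched as below. This is an illustration only: the Mexican-hat mother wavelet, the product form of the basis, the centers, dilations and weights are assumptions, and the fuzzy rule layer of the fuzzy wavelet network in the paper is omitted.

```python
import math


def mexican_hat(z):
    """Mexican-hat mother wavelet (one common choice, assumed here)."""
    return (1.0 - z * z) * math.exp(-0.5 * z * z)


def wavelet_features(x, centers, dilations):
    """psi_k(x) = product over inputs j of mexican_hat((x_j - c_kj) / d_kj)."""
    features = []
    for c_k, d_k in zip(centers, dilations):
        value = 1.0
        for x_j, c, d in zip(x, c_k, d_k):
            value *= mexican_hat((x_j - c) / d)
        features.append(value)
    return features


def estimate(weights, psi):
    """chi_hat = W_hat^T psi; weights holds one row per node, one column per output."""
    n_out = len(weights[0])
    return [sum(row[j] * p for row, p in zip(weights, psi)) for j in range(n_out)]


# Hypothetical two-node network with two inputs and two outputs.
x = [0.0, 1.0]
centers = [[0.0, 1.0], [1.0, 1.0]]
dilations = [[1.0, 1.0], [1.0, 1.0]]
w_hat = [[2.0, -1.0], [5.0, 5.0]]

psi = wavelet_features(x, centers, dilations)
print(psi, estimate(w_hat, psi))
```

The first node is centred on the input and responds fully, while the second sits one dilation away, at a zero crossing of the Mexican hat, and contributes nothing; this locality is the property that makes wavelet bases attractive for approximating fast-changing terms.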
The primary reinforcement signal can be enhanced as in Equation (21), where $W_C^*$ is the ideal weight matrix, $\psi^*$ is the ideal regression matrix, $\Gamma = \mathrm{diag}(r_{s1}, r_{s2}, \cdots, r_{s,n+1})$, and $r_{si}$ is an element of $r_s$.
Combined with Equation (24), the output of the ACN can be expressed as in Equations (25) and (26), where $\tilde{W}_S = W_S^* - \hat{W}_S$ and $\Xi_\chi = \chi - W_S^{*T}\psi^*$. Combined with Equation (26), Equation (20) can be rewritten as Equation (27). The adaptive laws of the ASN and the ACN are designed as in Equations (28)-(31), the first of which has the form $\dot{w}_{Sk} = k_{wsk}(r_{sk} + \cdots)$, where $k_{wsk}$, $k_{wck}$, $k_\omega$ and $k_l$ are positive constants, and the values of $I_{sk}$, $I_{ck}$, $I_\omega$ and $I_l$ are given in Equations (32)-(35).
Theorem 1. For the hybrid system dynamic model in Equation (13), suppose that Assumptions 1 to 3 hold. Then the error evaluation signal in Equation (15), the control signal in Equation (19), the reinforcement signals in Equations (16) and (21), and the update laws of the fuzzy wavelet neural network in Equations (28)-(31) ensure that the trajectory tracking error $e$ converges to zero asymptotically.
Proof of Theorem 1. Introduce the Lyapunov function in Equation (36) and take its time derivative.
From references [37,38], it is known that $\upsilon$ is bounded under Assumptions 1 and 2 and, further, that $\sup \|r\|_{L_{2T}} / \|\upsilon\|_{L_{2T}} \le \gamma$ when the initial states of the hybrid system are all zero. According to the definition of $Q$ in Equation (41), the system is stable in the sense of $H_\infty$ control theory when $k_d = 2\gamma^2$, and $r$ decreases as $\gamma$ decreases.
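The bound above is a statement about the ratio of truncated $L_2$ norms of two signals. As a hedged numerical illustration, the sketch below estimates such a ratio from sampled data with a rectangle rule; the function names and the sample signals are hypothetical, and a single pair of signals only gives a lower bound on the supremum, so it can contradict a claimed $\gamma$ but cannot prove it.

```python
import math


def truncated_l2_norm(samples, dt):
    """Rectangle-rule approximation of sqrt(integral_0^T x(t)^2 dt)."""
    return math.sqrt(sum(x * x for x in samples) * dt)


def empirical_gain(output, disturbance, dt):
    """Ratio of truncated L2 norms for one recorded output/disturbance pair."""
    norm_d = truncated_l2_norm(disturbance, dt)
    if norm_d == 0.0:
        raise ValueError("disturbance has zero energy")
    return truncated_l2_norm(output, dt) / norm_d


# Hypothetical records: a unit disturbance and a small output over 1 s.
dt = 0.01
disturbance = [1.0] * 100
output = [0.02] * 100

gain = empirical_gain(output, disturbance, dt)
gamma = 0.03  # value listed among the simulation parameters
print(gain, gain <= gamma)
```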
To verify the impact resistance of the SDD during the third stage, capture simulations were carried out on satellites with different velocities, using the space robot system with and without the SDD. The simulation results are shown in Table 1. As Table 1 shows, for the different satellite velocities the SDD significantly reduces the impact torque at the joints, by up to 54.14%. The SDD can therefore be considered to protect the joints well during the third stage.
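The reduction figure quoted above is a relative difference between the peak joint torque without and with the SDD. A minimal helper for that arithmetic is sketched below; the function name and the torque values in the example are hypothetical and are not entries of Table 1.

```python
def reduction_percent(torque_without, torque_with):
    """Relative reduction of peak torque, in percent of the no-SDD value."""
    if torque_without <= 0.0:
        raise ValueError("torque_without must be positive")
    return 100.0 * (torque_without - torque_with) / torque_without


# Hypothetical peak torques in N*m (not taken from the paper).
print(reduction_percent(200.0, 100.0))
```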

Buffer Compliance Control Strategy Performance Simulation
To show the buffer compliance control performance of the proposed controller, simulations are carried out for the stable control phase. The parameters of the system are as follows: $\Lambda = \mathrm{diag}(2, 2, 2)$, $\rho_i = 1.5$ $(i = 1, 2, 3)$, $K_r = \mathrm{diag}(150, 150, 150)$, $\gamma = 0.03$, $k_{wck} = 3$, $k_{wsk} = 3$, $k_\omega = 2$, $k_l = 2$. The position of the hybrid system after capture is […] To highlight the advantages of the SDD over the series elastic actuator (SEA), the SEA structure of reference [3] is used for comparative analysis. In reference [3], to suppress the flexible vibration introduced by the SEA, the controller is divided into a fast subsystem controller and a slow subsystem controller: the slow subsystem controller realizes trajectory tracking, and the fast subsystem controller suppresses the flexible vibration. The simulation results are shown in Figures 5-7.
It can be seen from Figures 5 and 6 that, under the same control parameters, the SEA requires a larger output torque than the SDD at both joint 1 and joint 2, which means that the SEA demands a higher load capacity from the joint motor. Figure 7 shows that, without the fast subsystem controller, the flexible vibration of the joints is difficult to suppress and the joint angles fail to reach the desired state.
Assume that the maximum impact torque the joint actuators can bear while running is 150 N·m. To protect the joint actuators, a buffer compliance control strategy that actively switches the actuators off and on (the switching strategy) is adopted. The shutdown torque threshold is 120 N·m, and the startup torque threshold is 10 N·m. The simulation results are shown in Figures 8-11.
It can be seen from Figure 8 that the joint motor enters a stable output state after four shutdowns. Figure 9 shows that the RL controller continuously outputs reinforcement signals while the base attitude angle and the joint angles have not reached their desired positions, and that the reinforcement signal goes to zero once they do. Figure 10 shows that the ACN and the ASN are optimized through interaction with the environment and finally bring the hybrid system to a stable state. Figure 11 shows that the joint impact torque is kept within the bearable limit and that the buffer compliance strategy achieves a compliant capture operation.
Because the impact torque that a joint can bear decreases as the space robot ages in service, the second group of simulations sets the shutdown torque threshold to 80 N·m and the startup torque threshold to 10 N·m. The simulation results are shown in Figures 12-15. Figure 12 shows that the joint motor enters a stable output state after seven shutdowns. As can be seen from Figures 12-15, even when the shutdown torque threshold is lowered, the joint impact torque is still kept within a safe range, and the RL controller can still complete the calm control of the unstable hybrid system by outputting the reinforcement signal. This indicates that the SDD protects the joints of the space robot effectively and helps to prolong its service life.

Conclusions
In this paper, to protect the joints of a space robot from impact damage while capturing a satellite, an SDD is added between the joint motor and the manipulator, and a buffer compliance strategy matched to the SDD is given. The dynamic model of the hybrid system is derived, and the impact effect and impact force during the third stage are calculated. To realize stabilization control of the system, an RL control scheme based on a fuzzy wavelet neural network is proposed.
In the third stage of the capture operation, a large impact torque is generated at the motor joint. Adding the SDD between the motor and the manipulator unloads the impact force rapidly and reduces the impact torque by up to 54.14%.
In the fourth stage of the capture operation, the SDD combined with the buffer compliance strategy designed for it limits the impact torque of the joints to a safe range. The shutdown threshold of the buffer compliance strategy can be set flexibly, so the joints of space robots with different service years can be protected.
