Active Collision Avoidance for Robotic Arm Based on Artificial Potential Field and Deep Reinforcement Learning

Abstract: To address the local minimum issue commonly encountered in active collision avoidance based on the artificial potential field (APF), this paper presents a novel algorithm that integrates APF with deep reinforcement learning (DRL) for robotic arms. First, to improve the training efficiency of DRL for the collision avoidance problem, Hindsight Experience Replay (HER) was enhanced by adjusting the positions of obstacles, resulting in Hindsight Experience Replay for Collision Avoidance (HER-CA). Subsequently, a robotic arm collision avoidance action network model was trained based on the Twin Delayed Deep Deterministic Policy Gradient (TD3) and HER-CA methods. Further, a full-body collision avoidance potential field model of the robotic arm was established based on the artificial potential field. Lastly, the trained action network model was used to guide APF in real-time collision avoidance planning. Comparative experiments between HER and HER-CA were conducted: the model trained with HER-CA improves the average success rate of the collision avoidance task by about 10% compared to the model trained with HER. A collision avoidance simulation was also conducted on the rock drilling robotic arm, confirming the effectiveness of the guided APF method.


Introduction
In tunnel construction, workers often face an extreme working environment. This poses a great challenge for traditional tunnel construction methods, but it also provides broad scope for intelligent construction in tunnel engineering [1]. Meeting the complex requirements of the desired tasks involves addressing a critical issue in intelligent construction: collision avoidance for rock drilling robotic arms. Specifically, real-time collision avoidance path planning is needed for a rock drilling robotic arm that must evade two dynamic obstacles. The main problem to be resolved is how to ensure real-time collision avoidance planning while guaranteeing that the planned actions allow the robotic arm to reach the target position, not merely to avoid obstacles.
The drilling robotic arm possesses seven degrees of freedom (DOF), and motion planning becomes computationally very expensive as the number of DOFs and obstacles grows [2]. Existing algorithms mainly focus on planning a new collision-free path [3], which is not appropriate for real-time collision avoidance. The artificial potential field method and deep learning approaches are therefore considered to solve this problem. The artificial potential field (APF) method, a local algorithm, can effectively generate full-body collision avoidance actions. Deep reinforcement learning (DRL) emerges as a promising approach, with its trained action models capable of identifying global optima. However, each of these methods has a problem with real-time collision avoidance planning:

1. The artificial potential field method can fall into local minima [4].

2. DRL training is inefficient because of the need to avoid multiple obstacles.
To realize collision avoidance, a myriad of studies have been conducted, among which research on the artificial potential field (APF) emerges as a significant direction [5][6][7][8][9]. To solve the collision avoidance problem of the end-effector, APF converts the distance between the end-effector of the robotic arm and the obstacle into a repulsive force [10,11]. Because many scenarios require that no part of the robotic arm collides with an obstacle, whole-body collision avoidance has been considered in some studies. Li et al. [12] estimate the distance between the obstacle and the robot skeleton in real time using deep visual perception, thereby establishing a multi-joint repulsive force model. There is also research on robots without visual recognition: control points are set on the robotic arm, full-body collision avoidance is performed by calculating the repulsive force from each control point to the obstacle, and local minima are avoided by virtual obstacles [13,14]. However, avoiding local minima through virtual obstacles requires the arm to first enter the local minimum position; only after the position is identified as a local minimum is the virtual obstacle set to drive the robotic arm away. This is very inefficient for real-time motion planning. Generally speaking, there is almost no artificial potential field-based method that can provide real-time path planning while avoiding local minimum positions.
Reinforcement learning provides a new solution to this problem due to its global planning capability. In the study conducted by Cao et al. [15], a Deep Deterministic Policy Gradient (DDPG) algorithm is used for collision avoidance training of a three-degree-of-freedom robotic arm to solve the collision problem under static obstacles. Li et al. [16] combined APF and DDPG to process obstacle avoidance trajectory planning in phases: the artificial potential field method drives the arm toward the target, and DDPG dominates training during the obstacle avoidance phase, which improves the obstacle avoidance ability of mobile robotic arms in narrow passages and under obstacle constraints. For collision avoidance motion planning of robotic arms, the exploration space is large, rewards are sparse, and the learning efficiency of the agent is relatively low. Andrychowicz et al. [17] were the first to propose Hindsight Experience Replay (HER), which enables an agent to learn efficiently from samples with sparse, binary rewards by setting up virtual targets. HER is widely used in path planning research, and scholars add HER to their methods to improve training [18,19]. Further research has built on HER, and improved algorithms have been proposed with some success. HER is considered to produce target bias in multi-target learning because it changes the likelihood of the sampled transitions and trajectories used in training, so bias-corrected HER (BHER) was proposed to correct the bias [20]. The Hindsight Goal Ranking (HGR) method has been proposed to further improve learning efficiency [21]. Although HER works well for reach-target problems with binary rewards, it is not optimized for the collision avoidance problem.
To grapple effectively with these challenges, this paper presents an innovative action-based obstacle avoidance method that accomplishes active robotic arm collision prevention, guided by the potential field forces in tandem with DRL. The main contributions of this paper are as follows.

1. A DRL-guided method for collision avoidance in a simplified robotic arm model is proposed. The local minima problem of APF is solved by DRL guidance, while the use of a simplified robotic arm reduces the difficulty of DRL training.

2. The Hindsight Experience Replay for Collision Avoidance (HER-CA) algorithm is proposed to improve the training effect. The algorithm allows the agent to learn collision avoidance from collision experience.

3. A full-body collision avoidance model for the rock drilling robotic arm is established based on the artificial potential field.
The rest of the paper is structured as follows. Section 2 first presents a simplified model of the robotic arm, followed by a method based on the Twin Delayed Deep Deterministic Policy Gradient (TD3) and HER-CA. Section 3 presents a full-body collision avoidance model based on APF. Section 4 reports experiments comparing HER-CA with Hindsight Experience Replay (HER), together with a simulation of robotic arm collision avoidance. Section 5 concludes.

Simplified Robotic Arm
In this study, the robotic arm is a 7-DOF rock-drilling robotic arm with five slewing joints and two telescoping joints, developed by Gengli Machinery in Luoyang, China. The first three joints and the last three joints are each in a spherical coordinate configuration, connected by the fourth slewing joint. Figure 1 shows the coordinate structure of the robotic arm, and Table 1 gives its D-H parameters, where q_1-q_7 are the robotic arm joint values.
The primary function of the deep reinforcement learning (DRL) method is to train an action model that helps the robotic arm navigate around obstacles. In this scenario, only general directional obstacle avoidance is needed, rather than meticulous precision avoidance. Analyzing the structure of the rock drilling arm, its main body can be simplified as the upper arm paired with two segments of the propelling beam, represented as a cylinder Φ with the red line serving as the axis. The offset angle from Φ to the direction of the beam is denoted as θ_m, as illustrated in Figure 2.
The obstacle coordinates for cylinder Φ in joint space are determined from the coordinates of the obstacle point in the operational space. Let the coordinates of the obstacle in the operation space be (x, y, z); the coordinates of the obstacle point in the joint space of the big arm are then computed, where q_1obs, q_2obs are the positions of the obstacle points for the first two joints. The final simplified joint-space obstacle points are obtained by rotating Φ through the slewing joint angle q_4:

x_obs = q_1obs + θ_m cos(q_4)  (3)
y_obs = q_2obs + θ_m sin(q_4)  (4)

where x_obs, y_obs are the coordinate values of the joint obstacle points after the simplification of the robotic arm.
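The simplification above can be sketched in code. The mapping below is an illustration, not the paper's implementation: the spherical-coordinate relation used for q_1obs and q_2obs is an assumption (the paper defines them from (x, y, z), but the corresponding equations are not reproduced here), while the final offset by θ_m rotated through the slewing joint angle q_4 follows the equations above.

```python
import math

def joint_space_obstacle(x, y, z, q4_deg, theta_m):
    # Assumed spherical-coordinate mapping of the operation-space obstacle
    # (x, y, z) onto the first two joints; the paper's exact equations for
    # q_1obs and q_2obs are not shown here.
    q1_obs = math.degrees(math.atan2(y, x))
    q2_obs = math.degrees(math.atan2(z, math.hypot(x, y)))
    # Offset by the cylinder angle theta_m, rotated through the
    # fourth (slewing) joint angle q4.
    q4 = math.radians(q4_deg)
    x_obs = q1_obs + theta_m * math.cos(q4)
    y_obs = q2_obs + theta_m * math.sin(q4)
    return x_obs, y_obs
```

With θ_m = 0 the mapping reduces to the assumed angular position of the obstacle as seen by the first two joints.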

Describe Environment
TD3 was first proposed by Fujimoto et al. [22] and is an actor-critic deep reinforcement learning method. The algorithm mainly addresses problems in the DDPG algorithm: TD3 learns two Q-functions instead of one and takes the more conservative (i.e., smaller) Q-value estimate as the target value, and TD3's policy update is delayed relative to the Q-value update. Both measures help to reduce the overestimation bias known from Deep Q-Learning [23]. The model is trained using the TD3 algorithm, and the environment is described below.
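The clipped double-Q target at the heart of TD3 can be sketched as follows. This is a minimal illustration, not the authors' implementation; the noise parameters are generic TD3 defaults, and the target networks are stand-in callables.

```python
import numpy as np

rng = np.random.default_rng(0)

def td3_target(reward, done, next_state, actor_t, q1_t, q2_t,
               gamma=0.99, noise_std=0.2, noise_clip=0.5, act_limit=1.0):
    """Compute the TD3 target value for one transition.

    actor_t, q1_t, q2_t stand in for the target actor and the two target
    critics (plain callables rather than neural networks here).
    """
    # Target policy smoothing: perturb the target action with clipped noise.
    a = actor_t(next_state)
    noise = np.clip(rng.normal(0.0, noise_std, size=np.shape(a)),
                    -noise_clip, noise_clip)
    a = np.clip(a + noise, -act_limit, act_limit)
    # Clipped double-Q: take the smaller of the two critics' estimates,
    # which counteracts the overestimation bias of a single Q-function.
    q_min = min(q1_t(next_state, a), q2_t(next_state, a))
    return reward + gamma * (1.0 - done) * q_min
```

The delayed policy update is the third ingredient: the actor and the target networks are updated only once for every few critic updates.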
In the context of the simplified robotic arm described above, this obstacle avoidance planning issue can be treated as planning a path in a two-dimensional space. In this research, two obstacle points exist. Each episode begins by randomly locating the starting point, target point, and the two obstacle points. The state space s_t is then designed accordingly, where S represents the entire state space; the remaining symbols are described in Table 2.
The action space a_t is defined using ∆x as the x-coordinate variation and ∆y as the y-coordinate variation, with A representing the entire action space. A binary reward function for arrivals and collisions is designed, and the step length is added as a penalty so that the agent tries to take the shortest path:

r_t(a_t, s_t) = r_step + r_collision + r_reach  (10)

where r_step is the step reward function and k_step is its reward coefficient; r_collision is the collision reward function and C_collision is the reward value applied when a collision occurs; r_reach is the arrival reward function, C_reach is the arrival reward value, and C_thr is the range of arrival determination.
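A sketch of this reward in code. The coefficient values below are illustrative placeholders, not the values used in the paper; the 3° collision and 2° arrival thresholds come from the experimental setup in Section 4.

```python
import math

K_STEP = 0.1         # illustrative step-length penalty coefficient k_step
C_COLLISION = -10.0  # illustrative collision reward value C_collision
C_REACH = 10.0       # illustrative arrival reward value C_reach
C_THR = 2.0          # arrival-determination range (degrees)
D_COLLISION = 3.0    # collision-determination range (degrees)

def reward(pos, prev_pos, goal, obstacles):
    """r_t = r_step + r_collision + r_reach (Eq. 10)."""
    r_step = -K_STEP * math.dist(pos, prev_pos)           # shortest-path penalty
    collided = any(math.dist(pos, o) < D_COLLISION for o in obstacles)
    r_collision = C_COLLISION if collided else 0.0        # binary collision term
    r_reach = C_REACH if math.dist(pos, goal) < C_THR else 0.0  # binary arrival term
    return r_step + r_collision + r_reach
```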

Hindsight Experience Replay for Collision Avoidance
Hindsight Experience Replay (HER) is an experience replay mechanism for improving the training efficiency of agents in sparse, binary reward settings. The basic idea is that every experience of an agent still contains valuable information even if it fails to reach the goal. By treating a hindsight goal from failed experiences as the original goal, HER enables the agent to receive rewards frequently.
To adapt HER to the collision problem, path points are still treated as target points as in the HER algorithm, while the obstacle points in collision experiences are moved. The new obstacle point is placed at a random location, so with high probability the experience becomes collision-free. In this way, the agent learns collision avoidance from failed collision avoidance experiences, accelerating deep reinforcement learning training and improving the training effect, as shown in Figure 3.
There are two ways to move obstacles away in HER-CA: obstacle reset and obstacle move away. Obstacle reset places an obstacle at a random location in the space, as if resetting the environment; obstacle move away shifts the obstacle by distances x_random, y_random, where x_random, y_random are random numbers. The two strategies are compared later in the experiments.
In HER-CA, the data recorded in each game is saved as OL and the data of a single step as OS. We use the future strategy [17] for goal conversion, i.e., a random step after the current position is converted into the goal position, and obstacle resets are used to generate possible collision-free experiences. See Algorithm 1 for a more formal description. A number of games are played, and the data from each game is added to the data pool PL.

4:  for turn ← 1 to MAX_t do
5:    Randomly select a game OL from PL.
6:    Randomly select a step OS from OL.
7:    Get the future step coordinates x_future, y_future.
8:    if OS collision then
10:     Limit x_obsi and y_obsi to 0 to 90. Recalculate OS's reward value.
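A minimal sketch of the HER-CA obstacle relabeling. The field names and the move-away range are hypothetical, and the reward recomputation is left as a callback; only the core idea — relocating obstacles in collision experiences and recomputing the reward — follows the algorithm.

```python
import random

def her_ca_relabel(step, reward_fn, mode="move_away", space_hi=90.0, max_move=45.0):
    """If the sampled step ended in a collision, relocate each obstacle so the
    stored experience (with high probability) becomes collision-free, then
    recompute its reward -- the core idea of HER-CA."""
    s = dict(step)
    if s["collision"]:
        new_obs = []
        for (ox, oy) in s["obstacles"]:
            if mode == "reset":
                # Obstacle reset: re-place the obstacle anywhere in the space.
                ox, oy = random.uniform(0, space_hi), random.uniform(0, space_hi)
            else:
                # Obstacle move away: shift by random distances x_random, y_random.
                ox += random.uniform(-max_move, max_move)
                oy += random.uniform(-max_move, max_move)
            # Limit coordinates to the joint-space range 0..90 (Algorithm 1).
            new_obs.append((min(max(ox, 0.0), space_hi),
                            min(max(oy, 0.0), space_hi)))
        s["obstacles"] = new_obs
        s["reward"] = reward_fn(s)  # recalculate the step's reward value
    return s
```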

Attractive Force
The attractive force guides the rock drilling robotic arm toward the desired target position. The target pose of the arm is determined by both the position and attitude of its endpoint, represented as a rotation matrix T_goal. To simplify the computation of the APF, we calculate the target joint angle vector Q_goal of the robotic arm by solving the inverse kinematics with an iterative method. The joint attractive force (or moment) vector is then computed from the difference between the current joint angle vector of the robotic arm and the target angle vector Q_goal, as in (11):

M_att = K_att ⊙ tan⁻¹(K_r_att ⊙ (Q_goal − Q_cur))  (11)

where K_att is the vector of attractive scaling factors and the elements of K_r_att are the inverses of the corresponding elements of K_att. The inverse tangent function tan⁻¹ in (11) limits the maximum value of the attractive force (or moment), preventing excessive speed when the robotic arm is far from the target position.
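As a sketch, reading the garbled equation (11) as an elementwise arctangent of the joint error — our interpretation, not a verbatim reproduction:

```python
import numpy as np

def attractive_moment(q_cur, q_goal, k_att, k_r_att):
    """Joint attractive moment: the arctangent saturates the joint error,
    bounding the moment so the arm does not move too fast far from the goal.
    k_att scales the output; k_r_att (the elementwise inverse of k_att in
    the paper) scales the input error."""
    return k_att * np.arctan(k_r_att * (q_goal - q_cur))
```

For a large joint error the moment approaches k_att · π/2 regardless of distance, which is exactly the speed-limiting behavior described above.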

Control Point Repulsion
The repulsive force moves the arm away from obstacles. To achieve full-body collision avoidance, the main structure of the arm is regarded as a collection of multiple control points (as shown in Figure 4), whose positions are derived from the forward kinematics of the robot. When an obstacle enters the radius of the repulsive field of a control point (as shown in Figure 5), the obstacle generates a repulsive force F_rep on the control point, where F_rep is the repulsive force vector, r is the distance between the obstacle and the control point, R_field is the radius within which the repulsive field acts, and α_f is the direction vector from the obstacle to the control point. The quantity g is a weighted average of the distances of the joint positions from the target position, and K_f is the proportionality coefficient of the repulsive force in the operation space. When obstacles are near the target position, the repulsive force can exceed the attractive force, causing a target-inaccessibility problem; the factor tan⁻¹(K_g × g) makes the repulsive force decrease as the arm approaches the target, solving this problem.
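The behavior described above can be sketched as follows. The exact repulsion law is not recoverable from the text, so the magnitude below — a classic 1/r − 1/R_field falloff multiplied by the arctan goal-distance attenuation — is an assumption consistent with the quantities the paper defines.

```python
import numpy as np

def repulsive_force(obstacle, control_point, r_field, k_f, k_g, g):
    """Repulsion exerted by an obstacle on one control point (a sketch).

    g is the weighted-average distance of the joints from the target; the
    arctan factor shrinks the repulsion near the goal, avoiding the
    target-inaccessibility problem."""
    diff = control_point - obstacle
    r = np.linalg.norm(diff)
    if r >= r_field:
        return np.zeros(3)            # obstacle outside the repulsive field
    alpha_f = diff / r                # unit vector from obstacle to control point
    magnitude = k_f * (1.0 / r - 1.0 / r_field) * np.arctan(k_g * g)
    return magnitude * alpha_f
```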

Joint Space Repulsion
To operate a robotic arm, producing joint forces (or moments) is crucial; the repulsive force vector is therefore converted into robotic arm joint forces (or moments) by the Jacobian method.
The end-effector pose of the robotic arm (position and orientation) can be regarded as a vector p = [x, y, z, α, β, γ]ᵀ, where x, y, z are the end-effector spatial position coordinates and α, β, γ are the spatial attitude angles, and the joint values can be regarded as a vector q = [q_1, ..., q_7]ᵀ. In principle, any element of p can be calculated from q, so

p = f(q).  (14)

Taking the full differential of the left side of (15) and the partial derivatives with respect to q_1 to q_7 on the right side gives a matrix relation: the vector on the left represents the change in end-effector pose, denoted ∆P; the 6 × 7 matrix on the right is the Jacobian matrix, denoted J; and the rightmost vector represents the change in the joints, denoted ∆Q. Equation (15) can thus be abbreviated as

∆P = J ∆Q.  (16)

In the differential setting, this equation maps joint velocities to end-effector velocities. By numbering the control points, with n the joint number and m the ordering of the control points, the Jacobian matrix J_nm of each control point can be derived. Taking the time step ∆t as one, velocity is represented by force in this paper. The three components of the vector F_rep are used as the first three components of ∆P; since the repulsive force produces no torque at the control point, the last three elements of ∆P, representing the angular velocity, are set to zero. From ∆P, the joint-space repulsive force (or moment) vector can then be calculated,
where M_rep signifies the joint repulsive force (or moment) vector, pinv stands for the pseudo-inverse of the matrix, and K_rep denotes the vector of repulsion scale factors.
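A sketch of this conversion, where J_nm is the 6 × 7 control-point Jacobian and the elementwise scaling by K_rep is our reading of the text:

```python
import numpy as np

def joint_repulsion(j_nm, f_rep, k_rep):
    """Map an operation-space repulsive force at a control point to joint
    space through the pseudo-inverse of that point's Jacobian."""
    # Linear force in the first three components of Delta-P; the repulsive
    # force exerts no torque at the control point, so the angular part is zero.
    delta_p = np.concatenate([f_rep, np.zeros(3)])
    return k_rep * (np.linalg.pinv(j_nm) @ delta_p)
```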

Guidance
The target-unreachability issue is addressed during the computation of the repulsive force. The pre-trained action model is employed to resolve the local minima problem and direct the movement of the robotic arm.
The target values of the first two joints, the simplified joint-space obstacle points, and the current joint values of the robotic arm are fed into the action model to obtain the next position Q_mod for the two joints. Q_mod is then used as the target value for the first two joints to form a temporary target position Q_target, which guides the robotic arm by generating an attractive force at the joints.
Since the action model does not perform precise obstacle avoidance and only provides directional guidance, fewer training steps are necessary, and guidance can take place at short intervals. However, the proximity of the temporary target position may result in slow guidance. To expedite the process, the maximum attractive moment is applied when the arm is far from the final target,
where M_att1 represents the attractive moment value of the first joint, M_att2 the attractive moment value of the second joint, and C_mattm the maximum attractive moment.
The rapid alteration of the temporary target position during the guidance process may precipitate a sudden change in the attractive moment, potentially resulting in a shock.
To circumvent such a situation, a limit is imposed on the rate of change of the attractive moment, where M_attold1, M_attold2 are the attractive moment values at the previous time step, and C_ali and C_alr are constants used to adjust the function.
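Since the paper's exact smoothing function (with constants C_ali and C_alr) is not reproduced here, the sketch below shows the simplest version of the idea: clamping the per-step change of the attractive moment to a maximum slew rate.

```python
def limit_moment_change(m_new, m_old, max_delta):
    """Rate-limit the attractive moment so a sudden jump of the temporary
    target position cannot shock the arm (a hypothetical clamp; the paper's
    C_ali/C_alr smoothing is not recoverable from the text)."""
    delta = m_new - m_old
    if delta > max_delta:
        return m_old + max_delta
    if delta < -max_delta:
        return m_old - max_delta
    return m_new
```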
At the end of Algorithm 2, L_step is used as a step factor: it multiplies the combined force (or moment) M to obtain the joint action quantity ∆Q.
…:  Get the position of the obstacle points OP.
7:  Get the current joint value Q_cur.
…:  Generate s_t as input to the action model, and take a step to get Q_mod.
10: Use Q_target to calculate M_att.
12: Amplify M_att1 and M_att2 to maximum. Attractive limits are imposed on M_att1 and M_att2.
17: Calculate the coordinates of the control points CP.
…
28: ∆Q ← M ⊙ L_step
    end while

Experiments
In this section, we implement three experiments to validate the methodology presented in this paper. First, a comparative analysis is performed between HER and HER-CA to assess the efficacy of each algorithm under diverse parameters. The environment was defined with OpenAI's Gym library, and network training was carried out with the PyTorch library. The remaining experiments simulate robotic arm collision avoidance, observing and evaluating the arm's ability to evade static as well as dynamic obstacles. MATLAB and CoppeliaSim served as the platforms for these simulations.

Comparison of HER and HER-CA
First, we configured the environment. The joint limits of the first two joints of the robotic arm were set to ±45° with a 45° offset applied, creating a two-dimensional joint space where both dimensions range from 0 to 90°. During environment initialization, the starting point, obstacle points, and target point are randomly positioned within the joint space. A path is judged to be a collision when it comes within 3° of an obstacle, and a reach when it comes within 2° of the target point. Exploration used the actions given by the action model plus Gaussian noise, and the maximum number of steps per game was five. Both the action model and the value model use a fully connected neural network with 3 hidden layers of 256 neurons each. The action model uses the ReLU activation function, its output is processed with the Tanh function, and its learning rate is 3 × 10⁻⁵. The value model uses the ReLU activation function with a learning rate of 3 × 10⁻⁴. All networks use the Adam optimizer.
Each epoch plays 200 games, stores the data from each game in the data pool, and then performs 20 network training steps. Each training step randomly samples 50 steps from the data pool, where each step has an 80% probability of being processed by HER or HER-CA. In addition, a soft update of the network model is performed at the end of each epoch.
Data were acquired by testing saved action network models: each action model was saved every 10 epochs, and the success rate of each model was measured by running 5000 games with it.
The initial experiment evaluated the effectiveness of obstacle resetting versus obstacle moving away within the HER-CA method. As depicted in Figure 6, the move away strategy slightly outperforms the resetting method, exhibiting a marginally higher average and peak success rate. Subsequent experiments compared HER and HER-CA, with setups involving one obstacle and two obstacles for model training; the results are shown in Figure 7, which indicates that the model trained by HER-CA exhibits an average success rate improvement of about 10% over the model trained by HER. We used the success tests over the full training process to calculate the average success rate, and the success rate over the 5000 games at the end of training was used for interval estimation to calculate the confidence level. The average success rate improvement of HER-CA in the one-obstacle case was 9.4%, with a 71.1% confidence level that the improvement exceeds 5%. The average success rate improvement of HER-CA in the two-obstacle case was 10.1%, with an 88.5% confidence level that the improvement exceeds 5%.

HER-CA produced a smaller improvement in the one-obstacle case than in the two-obstacle case. The reason may be that a single obstacle is inherently easier to avoid, so HER alone can also be trained somewhat efficiently.
In the third experiment, we tested the performance of HER-CA on a 3D path-finding task. The space is a cube with a side length of 45, in which obstacles and target points were randomly generated with coordinates from 3 to 45. The performance of HER-CA and HER was tested for the two-obstacle and three-obstacle cases, respectively. A decaying learning rate was used, with an initial learning rate of 5 × 10⁻⁶ for the action model and 5 × 10⁻⁵ for the value model, halved every 20,000 epochs. The remaining parameters were the same as in the previous experiment.
The experimental results are shown in Figure 8. In the two-obstacle case, the average success rate of HER-CA improves by 20.1% compared to HER, with an 82.7% confidence level that the improvement exceeds 10%. In the three-obstacle case, HER-CA reaches a maximum success rate of 90.9%, while HER could not be trained successfully, with a maximum success rate of only 29.7% at epoch 9000.

Static Obstacle Avoidance
In this experiment, the robotic arm must navigate from its initial position to the target position while avoiding two stationary obstacles. The task information of the robotic arm and the definition of the obstacles are given in Table 3. Comparative experiments using the plain APF method were also conducted. As shown in Figure 9, when using only the APF method for collision avoidance, the robotic arm tries to move directly toward the target position; it gets stuck in a local minimum position and cannot reach the target point.

Spherical Obstacle and Robot Information

Start joint value of the robot arm: [0°, 0°, 0 mm, 0°, 0°, 0°, 0 mm]
Goal position and attitude of the end-effector: [8680 mm, 4506 mm, 1700 mm, 0°, 0°, 0°]

However, our guided APF method enables the arm to circumvent the two obstacles and reach its destination, as shown in Figure 10. The closest proximity of the entire robotic arm to the obstacles is shown in Figure 11. A collision could potentially occur when this minimum distance falls below 150 mm; in this experiment, the shortest distance remains above this threshold throughout, with a minimum of 161 mm at 4.6 s.

Dynamic Obstacle Avoidance
This experiment verifies the effectiveness of the proposed active collision avoidance method in bypassing dynamic obstacles to reach the target position. The obstacles were positioned to move into the path of the robotic arm, while the arm had to traverse from the starting position to the same final goal position as in the previous experiment without collision. The initial positions of the two obstacles are on the upper left side of the beam, moving towards the lower right at different speeds; the obstacle information is given in Table 4. As shown in Figure 12, the robotic arm initially attempts to pass underneath the obstacles. When the obstacles move down and block the path, it changes direction and moves upwards, passing through the gap between the two obstacles. In the end, the robotic arm successfully navigates around the dynamic obstacles and reaches its target position smoothly. The change in each joint angle is shown in Figure 13. The closest proximity between the robotic arm and the moving obstacles during this process is 200.5 mm, occurring at 3 s, as shown in Figure 14.

Conclusions
This paper proposed the HER-CA method, which improves the success rate over models trained with the HER method by about 10%. A method for robotic arm collision avoidance was then developed that combines deep reinforcement learning for guidance with APF for full-body collision avoidance. The algorithm enables the robotic arm to behave more intelligently during dynamic collision avoidance, autonomously finding a path that avoids local minimum positions in real time, as shown in Figure 12.

Figure 1. The structure and coordinate system of the robot.

Algorithm 1 (continued):
15: One network training using OS's data.
…:  Soft update of the network model.

Figure 3. The figure on the left shows the original experience; the figure on the right shows the transformed experience.

Figure 4. Schematic diagram of the control points of the rock drilling robotic arm.

Figure 5. Schematic diagram of repulsive forces generated at control points.

Algorithm 2 Guided APF
1: while Run do
2:   if T_goal changed then
3:     Calculate the joint angle Q_goal.
4: …

Figure 6. Comparison plot of training success rates for different collision-free experience generation methods. Blue represents obstacle reset, red represents obstacle removal, and the shaded areas represent one standard deviation.

Figure 7. Comparison plot of training success rates for different numbers of obstacles, where blue and green correspond to HER-CA and HER, respectively.

Figure 8.

Figure 9. The process of real-time collision avoidance movement using only the APF method. Panels 1 to 4 show the robotic arm from the beginning to the end of the simulation.

Figure 10. Robotic arm motion process using our guided APF method. The green arrow represents the direction of the temporary target pose given by the action model. Panels 1 to 4 show the robotic arm from the beginning to the end of the simulation.

Figure 11. The closest distance between the robot and the obstacles. The red dotted line is the distance limit.

Table 4.

Figure 12. Robotic arm motion process using our guided APF method. The green arrow represents the direction of the temporary target pose given by the action model. Panels 1 to 6 show the robotic arm from the beginning to the end of the simulation.

Figure 13. The change in the value of each joint.

Figure 14. The closest distance between the robot and the obstacles. The red dotted line is the distance limit.

Table 2. Description of state space elements.

Table 3. Simulation information for robotic arms and obstacles.