Assisted Operation of a Robotic Arm Based on Stereo Vision for Positioning Near an Explosive Device

This document presents an assisted operation system for a robotic arm that positions it near an explosive device selected by the user on screen through the camera views. Two non-converging cameras mounted on the robotic arm in a camera-in-hand configuration provide the three-dimensional (3D) coordinates of the tracked object, using a 3D reconstruction technique together with the continuously adaptive mean shift (CAMSHIFT) algorithm for object tracking and feature matching. The inverse kinematics of the robot is used to place the end effector close to the explosive so that the operator can grasp the grenade more easily. The inverse kinematics is implemented in its geometric form, which reduces the computational load. Tests conducted with various explosive devices verified the effectiveness of the system in moving the robotic arm to the desired position.


Introduction
Increasingly, robots are being used to carry out activities in various disciplines that are normally performed by humans, such as assistance in surgery [1], assembly plants [2], domestic activities [3], and others. However, robots used in dangerous environments, such as rescuing people [4,5], handling radioactive elements [6], space exploration [7,8], or deactivation of explosives [9,10], have greater prominence. In addition, the development of an explosive ordnance disposal (EOD) robot location system with enhanced features is being advanced as part of the ongoing project [11,12]. To achieve this, robots must have advanced mobility and manipulation skills that allow the operator to perform tasks easily and quickly [13,14]. Currently, the common way to control these arms is through buttons or a joystick, and because these robots perform repetitive tasks, they generate stress for the operator [15,16]. This stress arises because the operator tries to reach an object with the robotic arm without a clear reference of the distance to that object, and there is additional pressure from knowing that explosive devices are being handled [17].
The vision system provides three-dimensional information on the location of the object to be manipulated with the help of the two-dimensional location in the image that the operator provides through a touch screen, thus forming an assisted operation system. In Nadarajah's article [18], the vision systems used in robot soccer are described in a general way. First, the positioning of cameras in parallel configuration is described; these are used in robotic soccer systems for both the federation of international robot-soccer associations (FIRA) and RoboCup. Machine vision is classified into three types: omnidirectional, binocular/stereo, and monocular. Subsequently, the image processing algorithms and their advantages and disadvantages are explained. One of the algorithms that stands out is continuously adaptive mean shift (CAMSHIFT), an algorithm used to follow a moving object. For the particular case of stereo vision, the distance is estimated with the stereo calibration of the cameras: the intrinsic and extrinsic parameters of the cameras are obtained, and then the distance of the objects captured in both cameras is calculated. In the paper by Zhao [19], a foldable manipulator applied to a five-degrees-of-freedom (5-DOF) EOD robot is presented, and the Denavit-Hartenberg (D-H) method is used to introduce a virtual joint in order to establish the direct kinematics model of the manipulator. In this way, they demonstrate that a 5-DOF robotic arm is adequate to perform this task. In [20], a system was developed that controls a robotic arm to grab an object through stereo vision in parallel configuration and with a fixed camera (the camera is not mounted on the robotic arm but placed on a turret that has a view of the arm). In addition, object tracking is achieved through distance estimation by the triangulation method. The system is verified with several operations described in the document.
In another article [21], the triangulation method is also incorporated through a stereo vision system, which grabs an object with a robotic arm using the CAMSHIFT algorithm, which provides better tracking. A stereo vision system placed on a robotic arm in an eye-in-hand configuration [22], together with a target selection system through a touch screen [23,24], could provide an interesting solution to the problem operators face when bringing the robotic arm close to a specific location, without generating a stress load.
This article presents a system that controls the movement of a robotic arm in order to grab an explosive device using non-convergent stereo vision, as part of the multimodal system developed for this project [25]. First, the police officer of the explosive disposal unit (UDEX, by its acronym in Spanish) selects the explosive device to be reached through the proposed user interface (UI). The coordinates (X, Y, Z) of the target are then calculated: the Z coordinate by the two-camera configuration and the triangulation method, and X and Y by perspective relations. Subsequently, the CAMSHIFT algorithm maintains the tracking of the object during the movement of the arm and, at the same time, detects the corresponding characteristic (center of mass of the object) in both images. The advantage of this proposal is that autonomous detection of some characteristic of the object to be manipulated is no longer necessary. The possibility of false positives due to disturbances such as shadows or excess or lack of lighting, among others, is eliminated, making the system robust and useful in field applications [26]. Finally, the position of the target is sent to the inverse kinematics block of the arm, previously derived using geometric techniques that reduce the computational cost. The assistance system is evaluated from the point of view of usability and user experience using the NASA-TLX (NASA task load index) [27] and the SUS (system usability scale) [28] evaluation methods, to verify that this proposal reduces operator stress levels.
The study focused on the operator assistance system, using design techniques and procedures related to vision system configuration, camera and robot calibration, and system performance analysis. The rest of the document is structured as follows: Section 2 presents the materials and methodology of the proposed system; this section is composed of the design and explanation of the interface, mathematical analysis of the stereo cameras, and control of the robotic arm. Section 3 explains the experimental results and presents discussion. Finally, conclusions and future work are presented in Section 4.

Materials and Methods
In Figure 1, the block diagram of the proposed assisted operation system is shown. First, the UDEX agent selects, through the user interface, the explosive device to be reached with the robotic arm. This information is sent to the developed algorithm, which calculates, via inverse kinematics, the angles that the robotic arm must move. The stereo cameras send the captured frames for image processing to determine the estimated distance of the object, a step that is essential for the algorithm to work correctly. Finally, the angle values are sent to the robot and its movement is carried out. Figure 2 shows two images of the proposed system: in the first, the UDEX squad agent selects the target on the screen so that the proposed algorithm can operate; in the second, the algorithm has blurred the background with a Gaussian filter to estimate the distance to the grenades selected by the agent more quickly and accurately. The detailed development of this algorithm can be found in a previously published article [29]. The architecture of this system, which moves the robotic arm through stereo vision, is shown in Figure 3. It consists of five modules: user interface and assistance, stereo vision analysis, tracking algorithm, manipulator kinematics, and control of the robotic arm.

User Interface and Support (UI)
The proposed UI integrates the functions required for this system [29]. In this document, the functions related to the distance estimation of the explosive device are described; the UI is displayed in Figure 4.

Positioning
In the button panel on the lower left side of the UI, three options are displayed that allow the operator to place the arm in the best position (in front and center of the explosive device); the distance to the object is estimated using triangulation.

Operator Image Adjustment
The other range of options is found in the lower right area of the UI; the operator has the freedom to manipulate the characteristics of the cameras independently (left and right) to achieve similar characteristics between the two cameras. These features are: Brightness, Hue, Contrast, Camera Gain, Saturation, Exposure, and Zoom. In case an inadequate configuration is obtained, it is possible to return to the original configuration using the "Default" button located at the top of the panel.

Target Selection
Finally, the operator can select the target via the "FRAME" button, which provides the option to follow the selected object and estimate its distance; see Figure 4.

Stereo Vision Analysis
The stereo-vision-based distance estimation [30] is shown in Figure 5, in which the stereo camera is composed of two cameras. The points $O_{c1}$ and $O_{c2}$ are the optical centers of both cameras; $T$ is the baseline (distance between the centers of the cameras); and $f$ is the focal length of the lens. The point $P$ represents the object in the real world, and $Z$ is the distance between $P$ and the stereo cameras [29]. Using two cameras in a stereo configuration gives the operator a reference for the depth at which the object is located, which would not be possible with a single camera [31]. To estimate the distance from the object to the base of the cameras, it is necessary to calculate the disparity between frames; see Figure 6. The coordinates $(X, Y, Z)$ are given by [32]:

$$X = \frac{x_l Z}{f}, \qquad Y = \frac{y_l Z}{f}, \qquad Z = \frac{fT}{d}, \qquad (1)$$

where $(x_l, y_l)$ are the image coordinates of the point in the left camera and $d$ is the disparity (difference of the corresponding coordinates in both images):

$$d = x_l - x_r.$$

From the third equation of (1), it follows that the greater the distance of the object $P$, the smaller the disparity, and vice versa: an inverse proportionality relationship. The procedure is detailed in Algorithm 1.
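The triangulation relations above can be sketched in a few lines of code. This is a minimal illustration of Equation (1) for parallel cameras; the focal length, baseline, and pixel coordinates below are invented values, not the calibrated parameters of the cameras used in this work.

```python
def triangulate(xl, yl, xr, f, T):
    """Return (X, Y, Z) of a point seen at (xl, yl) in the left image and at
    x-coordinate xr in the right image, for parallel cameras with baseline T
    and focal length f (all in consistent units, e.g. pixels and mm)."""
    d = xl - xr                  # disparity
    if d == 0:
        raise ValueError("zero disparity: point at infinity")
    Z = f * T / d                # depth is inversely proportional to disparity
    X = xl * Z / f               # lateral position by perspective relation
    Y = yl * Z / f
    return X, Y, Z

# illustrative values: f = 800 px, baseline T = 60 mm, 40 px of disparity
X, Y, Z = triangulate(xl=120.0, yl=40.0, xr=80.0, f=800.0, T=60.0)
```

With these numbers the point is recovered at a depth of 1200 mm, and halving the disparity would double the estimated depth, which is the inverse-proportionality relationship noted in the text.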

Object Tracking Algorithm
The CAMSHIFT algorithm is based on the MeanShift algorithm; the disadvantage of MeanShift is that its region of interest (ROI) has a fixed size. When the target object gets closer to the lens, the object in the image becomes larger and the effect of the fixed ROI is small; however, when the target object is far from the lens, the object in the image becomes smaller, and the small proportion of the object within the ROI makes tracking unstable and causes errors in judgment [25]. The CAMSHIFT tracking algorithm has the ability to adjust the search box on every frame: it uses the position of the centroid and the zeroth-order moment of the search window in the previous frame to set the location and dimensions of the search window for the next frame [33]. Figure 7 shows the flow diagram of the CAMSHIFT algorithm [34,35].

Robotic Arm Control
The study of the direct kinematic problem of a robot can be carried out by different methods; a commonly used one is based on the Denavit-Hartenberg (D-H) parameters [36]. It is a systematic method, best suited for modeling serial manipulators, and it was used to develop the kinematic model of this robot due to its versatility and its ability to model any number of joints and links of a serial manipulator. Figure 8 shows the schematic design of the 5-DOF robotic arm from which the D-H parameters in Table 1 were extracted. The most common manipulators have 3, 4, or 6 degrees of freedom; the more degrees of freedom, the more flexible the manipulator, but also the more difficult it is to control [19]. For that reason, a 5-DOF robotic arm was chosen. Table 2 shows the lengths of the links of the robot used in testing this system. The calculations of the arm motion matrices are shown below, where T represents the position and orientation of the end effector.
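The way the motion matrices are assembled from the D-H table can be sketched as follows. Each row of the table yields one homogeneous transform, and the end-effector pose T is their product; the two-link planar parameters at the bottom are placeholders for illustration, not the values of Table 1.

```python
import numpy as np

def dh_transform(theta, d, a, alpha):
    """Standard D-H homogeneous transform for one joint/link."""
    ct, st = np.cos(theta), np.sin(theta)
    ca, sa = np.cos(alpha), np.sin(alpha)
    return np.array([
        [ct, -st * ca,  st * sa, a * ct],
        [st,  ct * ca, -ct * sa, a * st],
        [0.0,      sa,       ca,      d],
        [0.0,     0.0,      0.0,    1.0],
    ])

def forward_kinematics(dh_rows):
    """Chain the per-link transforms; returns the 4x4 end-effector pose T."""
    T = np.eye(4)
    for theta, d, a, alpha in dh_rows:
        T = T @ dh_transform(theta, d, a, alpha)
    return T

# placeholder example: two planar links of length 1, joints at 0 and 90 deg
T = forward_kinematics([(0.0,       0.0, 1.0, 0.0),
                        (np.pi / 2, 0.0, 1.0, 0.0)])
```

For this toy configuration the translation column of T places the effector at (1, 1, 0), the expected tip position of a two-link planar arm with those joint angles.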
The geometric solution of the inverse kinematics is an intuitive method that requires figures that model the Euclidean space of the robot. Given this need, the top view of the robotic arm is presented in Figure 9.
Looking at Figure 9, the equation for joint θ_1 is obtained. To solve joint θ_3, the kinematic decoupling method is used; the position point γ_m of the robot wrist is calculated from the constant orientation γ of the end effector, as shown in Figure 10. After applying the decoupling method and solving [37], the equation of motion for θ_3 is obtained. To determine the value of joint θ_2, the values of the angles φ and ϕ must first be determined; their equations follow from Figure 10. Due to the restrictions of θ_2, the robot only has the elbow-down configuration, and θ_2 is obtained from Figure 10. Angle θ_4 is obtained from its geometric relation by clearing θ_4. Finally, to determine the value of joint θ_5, an equality is used that keeps the end effector coordinated with the robot base.
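The closed-form, law-of-cosines reasoning applied above to θ_2 and θ_3 can be illustrated on a two-link planar arm. This is a generic sketch of geometric inverse kinematics with the elbow-down branch selected, under assumed link lengths; it is not the full 5-DOF solution of the paper.

```python
import numpy as np

def ik_2link(x, y, l1, l2):
    """Geometric IK for a 2-link planar arm: return (theta1, theta2)
    reaching (x, y), choosing the elbow-down branch."""
    c2 = (x * x + y * y - l1 * l1 - l2 * l2) / (2 * l1 * l2)  # law of cosines
    if not -1.0 <= c2 <= 1.0:
        raise ValueError("target out of reach")
    theta2 = -np.arccos(c2)                  # elbow-down: negative elbow angle
    k1 = l1 + l2 * np.cos(theta2)
    k2 = l2 * np.sin(theta2)
    theta1 = np.arctan2(y, x) - np.arctan2(k2, k1)
    return theta1, theta2

t1, t2 = ik_2link(1.2, 0.5, l1=1.0, l2=1.0)

# forward check: the recovered angles must reproduce the target point
fx = np.cos(t1) + np.cos(t1 + t2)
fy = np.sin(t1) + np.sin(t1 + t2)
```

Because the solution is closed-form, each target requires only a handful of trigonometric evaluations, which is the computational-cost advantage of the geometric approach over iterative numerical IK.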

Evaluation Methodology
In order to validate this proposal, tests were carried out with police officers from the UDEX. At the beginning of the tests, the participants were given a general description of the assistance system, in addition to instruction on the correct manipulation of the robotic arm and the user interface. Before starting each test, the participants had 5 min to familiarize themselves with the robot, the buttons of the classic system, the developed interface, and the developed assistance system. After the participants completed their training, tests of arm approach to the target, in this case explosive devices, were performed. Each of the agents had 5 min to complete as many successful attempts as possible in the task of bringing the arm to the target region. A task was considered successfully completed when the gripper got close to the explosive. The test scenario conditions were normal, with average room brightness. After the participants performed the robot handling test, they were provided with a sheet with the NASA-TLX test [27] and the SUS questionnaire [28]. To end the tests, each participant was interviewed to confirm that they had successfully completed each questionnaire. Figure 11 shows the three steps described above.

Performance of the Proposed Algorithm
In this article, the Dobot Magician training robotic arm was used, and the two webcams were Xiaomi CMSXJ22A, with a resolution of 1280 × 720 pixels. Instead of using the clamp of the arm, it was decided to use a support for the stereo cameras, manufactured in-house, which is shown in Figure 12. This replacement was decided due to the limited space in the end effector of the Dobot Magician. Prior to testing, the cameras were calibrated using a 9 × 8 calibration pattern with a grid size of 30 × 30 mm. The Matlab Calibration Toolbox [38] was used, applying the Zhang method [39]. Once the intrinsic parameters of the cameras were known through this prior calibration, it was possible to start tracking the object. First, the operator selected the object to follow in the user interface. Once this was done, the algorithm obtained the coordinates of the object in the plane of each of the two images (coordinates in pixels) in order to calculate the target depth and the tracking frame. The coordinates of the object in the left camera were converted to the position of the object with respect to the center of the camera (mounted on the arm end effector) by applying the mathematics developed in Section 2.2. Subsequently, the 3D position of the object in real-world measurements served as input to the inverse kinematics of the arm so that it could move to that position. When the arm reached the position, the tracking stopped and the operator was notified of the successful movement; if it failed to arrive, the movement continued while the object continued to be tracked. Verification of the estimated distance of the tracked object was achieved by comparing the estimated measurement with the actual measurement of the object. Figure 13 shows a graph of the accuracy achieved by progressively placing the object at different distances, with an average accuracy of 99.18% when comparing the estimated distance versus the real distance.
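An average accuracy figure like the 99.18% reported above can be computed as the mean of per-sample relative accuracies against ground truth. The sketch below illustrates the calculation only; the measurements are invented placeholders, not the paper's data.

```python
import numpy as np

# ground-truth distances and hypothetical stereo estimates, in mm
real = np.array([300.0, 400.0, 500.0, 600.0])
est  = np.array([297.5, 402.0, 495.0, 606.0])

# per-sample accuracy (%): 100 * (1 - relative error)
accuracy = 100.0 * (1.0 - np.abs(est - real) / real)
mean_accuracy = accuracy.mean()
```

With these placeholder numbers the mean accuracy comes out near 99.2%, showing how individual relative errors on the order of 1% aggregate into the headline figure.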
The method proposed in this article has been compared with other distance estimation methods [40,41]; a more detailed explanation of this process can be found in [29]. The comparison uses data from the extraction method of the author Yu [34] and from the method of the author Xiu [35].

Figure 13. Accuracy graph of the estimated distances.
The targets tracked in this experiment were real explosives, provided by the UDEX squad of the Peruvian National Police. Specifically, a military hand grenade and a type 322 mortar grenade were tracked; these explosives have had the most presence in attacks in Arequipa city [42]. They are shown in Figures 14 and 15, respectively. Table 3 shows the values obtained by bringing the end effector to different desired positions, together with the corresponding error values for 15 tracking runs. The differences between the real coordinates and the coordinates reached were very small, with an average error of less than 2.64%, this error being smaller in Z (depth). The tracking sequence of the explosive device by the Dobot is shown in Figure 17.

Figures 18 and 19 show the results obtained from the tests carried out using the NASA-TLX and SUS methods. These graphs show the scores obtained by the 15 participants for each of the two robot control systems: the traditional control system (robot control by means of joysticks and buttons) and the proposed assistance system (robot control through the graphical interface). In Figure 18, the average of each of the six categories evaluated is presented: mental demand, physical demand, temporal demand, performance, effort, and degree of frustration. In general, the assistance system developed presented a lower workload in all six categories, most notably in frustration and in mental and temporal demands. In this evaluation, a value near 20 indicates a high workload, which is unpleasant for the operator. Figure 19 shows the degree of usability and workload (stress generation) of each of the two systems for each participant during the experiment. The colored background of this graph shows three different scoring areas: light red for poor usability (SUS score < 50), light yellow for good usability (85 > SUS score ≥ 50), and light green for excellent usability (SUS score ≥ 85). Table 4 shows a summary of the results of both evaluations for the two compared systems. Statistical parameters such as the mean, standard deviation, and standard error are used to obtain more reliable values. In the NASA-TLX column, the average value of the traditional method of buttons and joysticks is X W T = 15.13, a considerably high value, whereas the value of the proposed method is X W P = 8.25, making it clear that it is comfortable for operators. In the SUS column, it can be seen that the average score of the proposed system is X S P = 82.51, much higher than the score obtained for the joysticks-and-buttons system, which is X S T = 46.65. The proposed assistance system is therefore considered a good, user-friendly interface.
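For reference, a SUS score such as the 82.51 reported above is computed from the ten Likert-scale items of the questionnaire: odd items contribute (rating − 1), even items contribute (5 − rating), and the sum is scaled by 2.5 to a 0-100 range. The responses below are invented for illustration.

```python
def sus_score(ratings):
    """Compute the System Usability Scale score from ten 1-5 Likert
    responses (item 1 first). Result is on a 0-100 scale."""
    total = 0
    for i, r in enumerate(ratings, start=1):
        total += (r - 1) if i % 2 == 1 else (5 - r)  # odd vs even item scoring
    return total * 2.5

# hypothetical participant: favourable on odd items, low on even (negative) items
score = sus_score([4, 2, 5, 1, 4, 2, 5, 1, 4, 2])
```

This hypothetical participant scores 85.0, which would fall in the light-green "excellent usability" band of Figure 19.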

Discussion
In general, the assistance system proposed in this document is a good method to be applied in EOD handling. The system evaluated with the NASA-TLX method presents very good results in all categories. The stress load is clearly reduced compared to the traditional method of robotic arm control, which includes a lower execution time to perform the task of reaching the objective in the proposed system compared with the reference system. These data are related to the number of successes in the tests carried out, as the proposed system reduces human error to a minimum. The operator found it easier to manipulate the robot through our user interface. According to the data obtained from the SUS evaluation, the proposed assistance system has better results in usability, that is, it is very easy to use and understand its operation; this confirms that the developed system is a good interface.
In general, through the experiences of the users evaluated, the developed interface allows the manipulation of the robot to be easier and the assistance system considerably reduces the stress load of the operator. The handling of explosive devices is more efficient and safer when using this system, improving the experience of UDEX agents.

Conclusions and Future Work
In this document, a robotic arm control system was presented in which two non-converging stereo cameras provide the 3D coordinates of an explosive device selected by the operator. The triangulation method was used to determine the depth of the explosive device relative to the location of the cameras, and the CAMSHIFT algorithm was used to track it. After the coordinates were sent to the robotic arm's inverse kinematics block, the arm was able to reach the location of the explosive, successfully completing the test. The tests show that the system estimates the distance of the explosive device with an accuracy of 99.18%, a higher percentage than in other related work. The results of both evaluations prove that the proposed assistance system is better than the traditional robot handling system for EOD tasks. The proposed method reduces the stress load by 18% and, moreover, the success rate is very high.
Future work will focus on improving detection accuracy and implementing a robust detection method that can overcome the illumination intensity variations present in an outdoor environment.