Development of a Stereo Vision Measurement System for a 3D Three-Axial Pneumatic Parallel Mechanism Robot Arm

In this paper, a stereo vision 3D position measurement system for a three-axial pneumatic parallel mechanism robot arm is presented. The system aims to measure the 3D trajectories of the end-effector of the robot arm. To track the end-effector, a circle detection algorithm is used to detect the desired target, and the sum of absolute differences (SAD) algorithm is used both to track the moving target and to search for the corresponding target location along the conjugate epipolar line in the stereo pair. After camera calibration, both the intrinsic and extrinsic parameters of the stereo rig are obtained, so the images can be rectified according to the camera parameters. Through epipolar rectification, the stereo matching process is reduced to a horizontal search along the conjugate epipolar line. Finally, the 3D trajectories of the end-effector are computed by stereo triangulation. The experimental results show that the proposed stereo vision 3D position measurement system can successfully track and measure the fifth-order polynomial trajectory and the sinusoidal trajectory of the end-effector of the three-axial pneumatic parallel mechanism robot arm.


Introduction
Many manufacturing processes use robots to perform various tasks, including welding, assembly, pick-and-place, and defect inspection. All these tasks require knowledge of the relative location between the robot's end-effector and the desired target. One of the best-known techniques for determining three-dimensional location information is stereo vision. Stereo vision systems usually consist of two or more imaging devices together with a PC or another microprocessor. Owing to its low cost, easy maintenance, reliability, and non-contact measurement, stereo vision has become a popular research topic and has been applied in industrial automation, autonomous vehicles, augmented reality, medicine, and transportation [1][2][3][4].
A three-axial pneumatic parallel mechanism robot arm developed by the NTU-AFPC Lab [5] was the test rig in this study. Its end-effector follows the desired trajectories through nonlinear servo control of the positions of three rod-less pneumatic cylinders. However, the kinematic model of the test rig has multiple solutions, so the actual trajectory of the end-effector cannot be determined from the measured positions of the three pneumatic actuators alone. A common remedy is to measure the angular displacement of each joint with angular sensors and to estimate the end-effector trajectory through the robot's forward kinematics. Compared with this joint-angle method, stereo vision offers non-contact sensing and direct measurement of the end-effector. Moreover, the structure of the parallel mechanism used in this study makes it very difficult to fit angular sensors at its joints.
To reconstruct a 3D scene from stereo images using binocular cues, the disparities of corresponding points in the stereo pair must be known. Solving the stereo correspondence problem is therefore the most important stage of 3D reconstruction. Several published studies [6][7][8][9] have addressed this problem. The most popular method uses the epipolar constraint [10] to reduce the stereo matching region from an area to a straight line.
For an uncalibrated stereo rig (i.e., one whose intrinsic and extrinsic parameters are unknown) [11][12][13], the fundamental matrix F usually has to be computed to express the epipolar constraint on the uncalibrated stereo pairs. In contrast, a calibrated stereo rig with known intrinsic and extrinsic parameters can use so-called epipolar rectification [14,15], which transforms each stereo pair so that corresponding epipolar lines become parallel and lie on the same horizontal rows; this reduces the stereo matching region to a single horizontal row. This paper adopts the epipolar rectification algorithm of [15], which is compact and clear and is accompanied by MATLAB code for research reference. Because the algorithm assumes a calibrated stereo rig, camera calibration must be performed first.
The camera calibration in this paper was performed with the Camera Calibration Toolbox for MATLAB [16] developed by Bouguet. The main initialization phase of Bouguet's calibration is inspired by Zhang [17], who uses a chessboard calibration pattern to obtain both the intrinsic and extrinsic camera parameters, and Bouguet's intrinsic camera model is inspired by Heikkilä and Silvén [18], whose model adds two extra distortion coefficients to Zhang's intrinsic camera model and thus allows more precise stereo rectification. Once stereo rectification is done, the correspondence problem for the desired target reduces to a search along a single image row.
The Hough transform and its circle variant, the Circle Hough Transform (CHT) [19], are among the best-known methods for detecting lines and circles in an image. The algorithm maps each edge point of the edge map into parameter space and accumulates the resulting votes in an accumulator array, which can be displayed as an output image; the accumulator cell with the most votes (i.e., the pixel with the highest gray value) gives the detected circle. If the circle radius is unknown, the parameter space is three-dimensional, which demands a large amount of computation; this is the major drawback for real-time applications. Kimme et al. [20] first suggested using the edge direction to reduce the computational requirements of the CHT, so that only an arc, rather than a full circle, has to be plotted in the accumulator space. Minor and Sklansky [21] and Faez et al. [22] then proposed plotting a line along the edge direction, which reduces the parameter space to two dimensions. Scaramuzza et al. [23] developed an algorithm that rejects non-arc segments (e.g., isolated points, noise, angular points, and straight segments) and plots lines in the direction of the arc concavity, which gives a more precise estimate of the circle location.
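The direction-based voting idea described above, in which each edge point plots a line along its edge direction into a two-dimensional accumulator, can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not Scaramuzza's Pixel-to-Pixel algorithm: edge points and their gradient directions are assumed to be given already, votes are cast on both sides of each edge point because the edge polarity is assumed unknown, and the names `vote_centers` and `peak` are hypothetical.

```python
import math

def vote_centers(edges, width, height, r_min, r_max):
    """Accumulate circle-center votes in a 2-D (x, y) accumulator.

    edges: iterable of (x, y, theta); theta is the edge (gradient) direction
    in radians. Each edge point votes along the straight line through it in
    the gradient direction, at every distance r_min..r_max and on both sides,
    because the edge polarity (dark-on-light or light-on-dark) is not known.
    """
    acc = [[0] * width for _ in range(height)]
    for x, y, theta in edges:
        dx, dy = math.cos(theta), math.sin(theta)
        for r in range(r_min, r_max + 1):
            for sign in (1, -1):
                cx = int(round(x + sign * r * dx))
                cy = int(round(y + sign * r * dy))
                if 0 <= cx < width and 0 <= cy < height:
                    acc[cy][cx] += 1
    return acc

def peak(acc):
    """Return the (x, y) accumulator cell holding the most votes."""
    _, x, y = max((v, x, y) for y, row in enumerate(acc) for x, v in enumerate(row))
    return x, y

# Synthetic edge points: 72 sub-pixel points on a circle of radius 10
# centered at (30, 25), with gradient directions pointing radially outward.
angles = [k * 2 * math.pi / 72 for k in range(72)]
edges = [(30 + 10 * math.cos(a), 25 + 10 * math.sin(a), a) for a in angles]
acc = vote_centers(edges, width=60, height=50, r_min=5, r_max=15)
print(peak(acc))  # (30, 25)
```

Because each edge point votes along a single line rather than a full circle, the accumulator stays two-dimensional however wide the radius range is; the radius itself can be recovered afterwards, for example from the distances between the detected center and the supporting edge points.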
Stereo vision tracking involves two major procedures: motion tracking and stereo matching [24]. The location of the desired target in the reference image (e.g., the left image) is tracked by a motion tracking algorithm, and a stereo matching algorithm then finds the corresponding location of the target in the other image (e.g., the right image).
Motion tracking algorithms fall into two types: feature-based tracking [25,26] and region-based tracking [27][28][29]. Feature-based tracking follows partial features of the target; for example, the Canny edge detector [30] is often used to extract edge features of the target, and the SUSAN corner detector [31] to extract its corner points. Region-based tracking uses a template (block), chosen by the user or by image recognition, to track the target. Once the template is decided, the algorithm computes the correlation between the template and a designated region in the current frame. The most commonly used correlation criteria are the sum of absolute differences (SAD) and the sum of squared differences (SSD). References [28,29] proposed template update strategies that solve the "drifting" problem caused by environmental influences (e.g., lighting conditions or object occlusion) during motion tracking.
Stereo matching methods can roughly be divided into two categories: local methods and global methods [32]. Although global methods, such as those based on dynamic programming [33], are less sensitive than local methods to locally ambiguous regions (e.g., occluded regions or regions with uniform texture), they require more computation [34]. Block matching [35] is the best-known local method because of its efficiency and simplicity of implementation. In block matching, the reference block determined during motion tracking is used to search for the stereo correspondence with a matching criterion such as SSD or SAD. Once stereo matching is done, the locations of the target in both stereo images are known, and hence so is the disparity of the target. The depth of the target can then be calculated by triangulation.
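For a rectified stereo pair, the triangulation step reduces to similar triangles. The sketch below assumes ideal rectified pinhole cameras with a shared focal length f in pixels, a shared principal point (u0, v0), zero skew, and the right camera displaced along +X by the baseline b, so that the depth is Z = f·b/d with disparity d = x_l − x_r. The function name and the numbers are hypothetical illustrations; only the baseline of roughly 70 mm mirrors the rig described in this paper.

```python
def triangulate_rectified(x_l, y_l, x_r, f, baseline, u0, v0):
    """Return (X, Y, Z) of a matched point in the rectified left-camera frame.

    Assumes a rectified pair with a common focal length f (pixels), a common
    principal point (u0, v0), zero skew, and the right camera displaced by
    `baseline` along the +X axis, so that disparity d = x_l - x_r > 0.
    """
    d = x_l - x_r
    if d <= 0:
        raise ValueError("disparity must be positive for a point in front of the rig")
    Z = f * baseline / d       # depth from similar triangles
    X = (x_l - u0) * Z / f     # back-project the left pixel at depth Z
    Y = (y_l - v0) * Z / f
    return X, Y, Z

# Hypothetical values: f = 800 px, baseline = 70 mm, disparity = 40 px.
print(triangulate_rectified(360, 240, 320, f=800, baseline=70.0, u0=320, v0=240))
# (70.0, 0.0, 1400.0)
```

Since Z is inversely proportional to d, a one-pixel matching error costs more depth accuracy for distant targets than for near ones, which is one reason the depth measurements are later corrected against a reference.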
This paper aims to develop a stereo vision system that serves as a sensor for measuring, in real time, the 3D trajectories of the end-effector of the three-axial pneumatic parallel mechanism robot arm. Real-time stereo tracking is therefore required so that the stereo measurement stays as synchronized as possible with the end-effector's motion. The Multimedia Extension (MMX) technology [6] is used in this paper to reduce the computational cost of stereo tracking. In addition, the stereo depth estimate is calibrated against the measurements of a linear encoder on a pneumatic cylinder moving along a straight line, so that the accuracy of the stereo vision system can be verified. To this end, a test rig is set up to implement the developed stereo vision strategies for measuring the end-effector of the three-axial pneumatic parallel mechanism robot arm.

System Setup
The system combines the stereo vision measuring method with the three-axial pneumatic parallel mechanism robot arm to measure the 3D trajectories of the end-effector. Because of the robot's structural design, the position of its end-effector is very difficult to measure directly in practical experiments. In principle it can be calculated through the kinematics from the position sensors of the three linear actuators, but the kinematics of the end-effector has multiple solutions, so its exact xyz coordinates are difficult to determine. This paper therefore proposes a stereo vision method to measure the absolute xyz coordinates of the end-effector after the image coordinates have been calibrated. The stereo vision measurement system is divided into two parts, an offline preparation stage and an online measuring stage, as shown in Figure 1. The offline preparation stage consists of camera calibration and the calculation of the transformation matrices for epipolar rectification, and it requires a series of images of the calibration pattern. After each camera of the stereo rig has been calibrated independently, the projection matrices and radial distortion coefficients of the left and right cameras are used to compute the transformation matrices and to rectify the distorted images.
In the online measuring stage, the transformation matrices and radial distortion coefficients calculated in the offline preparation stage are loaded for image rectification. First, the desired target is detected in the rectified left image by circle detection, and the reference block for stereo tracking is defined.
Once these preparations are complete, the real-time 3D measurement can be executed. The target locations in both the left and right images are tracked by stereo tracking, and the estimated 3D world coordinates of the target are computed through stereo triangulation. Figure 2 shows the scheme of the stereo vision measurement system.

Three-Axial Pneumatic Parallel Mechanism Robot Arm
The stereo rig in this study, shown in Figure 3, consists of two identical CCD cameras equipped with camera lenses, with a baseline of approximately 7 cm. Figure 3 also shows the three-axial pneumatic parallel mechanism robot arm developed by the NTU-AFPC Lab [5]. The end-effector is the target measured by the stereo vision measurement system.
In terms of image quality and noise immunity, CCD image sensors outperform CMOS image sensors, so CCD cameras are used in this paper [36]. The camera is a TOSHIBA TELI CS8550Di, which supports both progressive and interlaced area scan; progressive area scan is used here for real-time operation. The detailed specifications of the camera are shown in Table 1. Because an analog CCD camera outputs analog signals, an image acquisition device is needed to digitize them for further processing and analysis on a PC or another processor. The National Instruments IMAQ PCI-1410 image acquisition card is used in this paper. It has 16 MB of onboard SDRAM that temporarily stores images being transferred to the PCI bus, and three independent onboard DMA (direct memory access) controllers that improve its performance. Its intensity resolution reaches 10 bits/pixel, and an 8-bit pixel format is supported in software. Table 1 shows its detailed specifications.

Image Rectification
Given a pair of stereo images, the problem of finding pixels or objects in one image that correspond to the same pixels or objects in the other image is called the correspondence problem. It is difficult to solve because of object occlusion, lens distortion, the aperture problem, and homogeneous regions in the stereo pair [39]. Image rectification is introduced to make it easier.
Rectifying stereo images means finding a transformation of each image plane such that pairs of conjugate epipolar lines become collinear and parallel to the horizontal image axis. Because the search for corresponding points is then restricted to the horizontal lines of the rectified images, rectification makes the correspondence problem easier. The following sections introduce the stereo image rectification methods applied in this paper.

Camera Model
Rectifying the stereo images requires knowledge of the camera model and its parameters. Figure 4 depicts the pinhole camera model, the simplest camera model describing the mathematical relationship between 3D world coordinates and image plane coordinates. R is the image plane (or retinal plane), centered on the principal point P; F is the focal plane, centered on the optical center C. The two planes are parallel and separated by the focal length f. The straight line passing through the principal point P and the optical center C is called the optical axis.
The mapping between a three-dimensional point in the world reference frame, $\mathbf{w} = [x, y, z, 1]^{\top}$ in homogeneous coordinates, and its image $\mathbf{m} = [u, v, s]^{\top}$ in the image frame can be formulated as:

$$\mathbf{m} = \mathbf{P}\,\mathbf{w} \tag{1}$$

In Equation (1), the homogeneous transformation matrix $\mathbf{P}$, also called the perspective projection matrix (PPM), can be regarded as the combination of two transformations: the extrinsic parameters $\mathbf{T}_e$ and the intrinsic parameters $\mathbf{T}_i$. Therefore, the homogeneous image coordinates $\mathbf{m}$ can be written as:

$$\mathbf{m} = \mathbf{T}_i\,\mathbf{T}_e\,\mathbf{w}$$

The extrinsic parameters $\mathbf{T}_e$ define the position and orientation of the camera reference frame with respect to the world reference frame through a rotation $\mathbf{R}$ and a translation $\mathbf{t}$:

$$\mathbf{T}_e = \begin{bmatrix} \mathbf{R} & \mathbf{t} \\ \mathbf{0}^{\top} & 1 \end{bmatrix}$$

The intrinsic parameters $\mathbf{T}_i$ describe the optical characteristics and internal geometry of the camera, mapping coordinates in the camera reference frame to pixel coordinates in the image frame:

$$\mathbf{T}_i = \begin{bmatrix} \alpha & \gamma & u_0 & 0 \\ 0 & \beta & v_0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix} \tag{5}$$

In Equation (5), $\alpha = f/k_0$ and $\beta = f/k_1$ are the focal lengths in horizontal and vertical pixels, respectively ($f$ is the focal length in millimeters, and $(k_0, k_1)$ are the pixel dimensions in millimeters), $(u_0, v_0)$ are the coordinates of the principal point, and $\gamma$ is the skew factor that models non-orthogonal u and v axes.
Since $(u, v, s)$ is homogeneous, the pixel coordinates are retrieved by dividing by the scale factor $s$: $u' = u/s$ and $v' = v/s$.
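The projection chain of the pinhole model, building the PPM from intrinsic and extrinsic parts and then dividing by s, can be checked numerically. This plain-Python sketch assumes the standard forms T_i = [[α, γ, u0, 0], [0, β, v0, 0], [0, 0, 1, 0]] (3 × 4) and T_e = [[R, t], [0ᵀ, 1]] (4 × 4); the helper names and the numeric camera values are hypothetical.

```python
def matmul(A, B):
    """Plain-Python matrix product of nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def ppm(alpha, beta, gamma, u0, v0, R, t):
    """Perspective projection matrix P = T_i T_e (3 x 4)."""
    T_i = [[alpha, gamma, u0, 0.0],
           [0.0,   beta,  v0, 0.0],
           [0.0,   0.0,  1.0, 0.0]]
    T_e = [list(R[0]) + [t[0]],
           list(R[1]) + [t[1]],
           list(R[2]) + [t[2]],
           [0.0, 0.0, 0.0, 1.0]]
    return matmul(T_i, T_e)

def project(P, w):
    """Map a 3-D world point w = (x, y, z) to pixel coordinates (u', v')."""
    m = matmul(P, [[w[0]], [w[1]], [w[2]], [1.0]])   # homogeneous (u, v, s)
    u, v, s = m[0][0], m[1][0], m[2][0]
    return u / s, v / s                              # divide by the scale factor

# Hypothetical camera: alpha = beta = 800 px, no skew, principal point (320, 240),
# camera frame coincident with the world frame.
R = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
P = ppm(800.0, 800.0, 0.0, 320.0, 240.0, R, [0.0, 0.0, 0.0])
print(project(P, (100.0, 50.0, 1000.0)))  # (400.0, 280.0)
```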
The camera model derived above is the simple pinhole model, which does not account for lens distortion. To correct radially distorted images, the lens distortion model implemented by Devernay et al. [37] is added to the camera model. As shown in Equations (6) and (7), an infinite polynomial series is used to model the radial distortion:

$$x_u = x_d\left(1 + \kappa_1 r_d^{2} + \kappa_2 r_d^{4} + \cdots\right) \tag{6}$$

$$y_u = y_d\left(1 + \kappa_1 r_d^{2} + \kappa_2 r_d^{4} + \cdots\right) \tag{7}$$

where $\kappa_1$ and $\kappa_2$ are the first- and second-order radial distortion parameters, $r_d = \sqrt{x_d^{2} + y_d^{2}}$ is the distorted radius, $(x_d, y_d)$ are the distorted camera coordinates, and $(x_u, y_u)$ are the undistorted camera coordinates. Eliminating the higher-order terms of Equations (6) and (7) gives Equations (9) and (10), respectively:

$$x_u = x_d\left(1 + \kappa_1 r_d^{2} + \kappa_2 r_d^{4}\right) \tag{9}$$

$$y_u = y_d\left(1 + \kappa_1 r_d^{2} + \kappa_2 r_d^{4}\right) \tag{10}$$

If the camera has been calibrated, that is, its intrinsic parameters are known, the radial distortion parameters and the distorted camera coordinates can be computed by Equation (11), and the radial distortion correction can then be achieved through Equations (12) and (13).
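Because this polynomial model maps distorted coordinates directly to undistorted ones, correcting a single pixel needs no iteration. The sketch below applies the series truncated after the fourth-order term, x_u = x_d(1 + κ1 r_d² + κ2 r_d⁴) and likewise for y. It assumes zero skew and that the distorted camera coordinates are the pixel offsets from the principal point divided by the focal lengths α and β; that normalization is an assumption and not necessarily this paper's Equation (11). The function name and the parameter values are hypothetical.

```python
def undistort_pixel(u, v, alpha, beta, u0, v0, k1, k2):
    """Correct radial distortion of one pixel with the truncated two-term model.

    k1, k2 play the roles of kappa_1 and kappa_2. Assumptions: zero skew, and
    (x_d, y_d) are the pixel offsets from (u0, v0) normalized by alpha, beta.
    """
    x_d = (u - u0) / alpha                   # distorted, normalized camera coordinates
    y_d = (v - v0) / beta
    r_d2 = x_d * x_d + y_d * y_d             # squared distorted radius
    factor = 1.0 + k1 * r_d2 + k2 * r_d2 * r_d2
    x_u, y_u = x_d * factor, y_d * factor    # truncated polynomial model
    return alpha * x_u + u0, beta * y_u + v0  # back to pixel coordinates

# The principal point is unaffected by radial distortion.
print(undistort_pixel(320, 240, 800.0, 800.0, 320.0, 240.0, 0.1, 0.01))  # (320.0, 240.0)
```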

Camera Calibration
Camera calibration is the process of finding the intrinsic and extrinsic parameters of a camera. Knowledge of these parameters is essential for stereo rectification. The Camera Calibration Toolbox for MATLAB is adopted in this paper to find the parameters of the stereo rig.
To obtain precise camera parameters, between 5 and 20 images of the calibration pattern are taken simultaneously by both cameras from different distances and angles. Figure 5 shows the chessboard images taken by the left and right cameras.
After the stereo images of the calibration pattern have been acquired, the calibration tool computes the intrinsic and extrinsic parameters of each camera.

Epipolar Geometry
As shown in Figure 6, when the baseline $C_l C_r$ is contained in both focal planes, that is, when both image planes ($R_l$ and $R_r$) are parallel to the baseline, the epipoles ($E_l$ and $E_r$) are at infinity and the epipolar lines, drawn in blue on the two image planes, are all horizontal.
In this special case, called the standard setting, corresponding epipolar lines are the same horizontal rows, with the same y coordinate, in both images, so point correspondences are searched along these rows, which simplifies the computation of stereo correspondences. In Figure 6, the left and right images of each of three arbitrary 3D points lie on the same horizontal epipolar line.
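The standard setting can be verified numerically: when two identical cameras share the same orientation and their optical centers differ only along the X axis, a 3D point projects onto the same image row in both views, and the remaining horizontal offset is the disparity f·b/z. The sketch below assumes R = I for both cameras, a hypothetical focal length and principal point, and a 70 mm baseline; the function name is illustrative.

```python
def project_aligned(w, center, f, u0, v0):
    """Pinhole projection for a camera whose axes are aligned with the world axes (R = I)."""
    x, y, z = (w[i] - center[i] for i in range(3))
    return f * x / z + u0, f * y / z + v0

f, u0, v0 = 800.0, 320.0, 240.0
c_l, c_r = (0.0, 0.0, 0.0), (70.0, 0.0, 0.0)   # optical centers differ along X only
for w in [(150.0, -80.0, 1200.0), (-40.0, 30.0, 900.0), (10.0, 120.0, 1500.0)]:
    u_l, v_l = project_aligned(w, c_l, f, u0, v0)
    u_r, v_r = project_aligned(w, c_r, f, u0, v0)
    # Same row in both images; the horizontal offset equals f * baseline / z.
    print(v_l == v_r, round(u_l - u_r, 3))
```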

Epipolar Rectification
As mentioned, the standard setting greatly reduces the computation of stereo correspondences, but it cannot be achieved exactly with real cameras. However, if the camera calibration parameters are known, this problem can be overcome through epipolar rectification. This paper adopts the rectification algorithm presented by Fusiello et al. [15].
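The stereo rig can be calibrated with Bouguet's stereo calibration tool, so the intrinsic and extrinsic parameters of both the left and right cameras are known. Writing the old PPMs of Equation (1) in block form,

$$\mathbf{P}_{ol} = \left[\mathbf{Q}_{ol} \mid \mathbf{q}_{ol}\right], \qquad \mathbf{P}_{or} = \left[\mathbf{Q}_{or} \mid \mathbf{q}_{or}\right] \tag{14}$$

the coordinates of the left optical center $\mathbf{c}_l$ and the right optical center $\mathbf{c}_r$ can be determined as:

$$\mathbf{c}_l = -\mathbf{Q}_{ol}^{-1}\mathbf{q}_{ol}, \qquad \mathbf{c}_r = -\mathbf{Q}_{or}^{-1}\mathbf{q}_{or} \tag{15}$$

The new PPMs keep the optical centers and share the same orientation and intrinsic parameters:

$$\mathbf{P}_{nl} = \mathbf{T}_{ni}\begin{bmatrix} \mathbf{R} & -\mathbf{R}\mathbf{c}_l \\ \mathbf{0}^{\top} & 1 \end{bmatrix}, \qquad \mathbf{P}_{nr} = \mathbf{T}_{ni}\begin{bmatrix} \mathbf{R} & -\mathbf{R}\mathbf{c}_r \\ \mathbf{0}^{\top} & 1 \end{bmatrix}$$

where the new intrinsic parameters $\mathbf{T}_{ni}$ are the same for both new PPMs and can be chosen arbitrarily, and the optical centers $\mathbf{c}_l$ and $\mathbf{c}_r$ are computed from the old PPMs by Equation (15). The rotation matrix $\mathbf{R}$ is also the same for both new PPMs and is specified by means of its row vectors:

$$\mathbf{R} = \begin{bmatrix} \mathbf{r}_1^{\top} \\ \mathbf{r}_2^{\top} \\ \mathbf{r}_3^{\top} \end{bmatrix}$$

The row vectors of $\mathbf{R}$ are the X, Y, and Z axes, respectively, of the camera reference frame, expressed in world coordinates.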
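According to Fusiello's algorithm, the row vectors of $\mathbf{R}$ can be calculated as follows:
1. The new X axis is parallel to the baseline:
$$\mathbf{r}_1 = \frac{\mathbf{c}_l - \mathbf{c}_r}{\lVert \mathbf{c}_l - \mathbf{c}_r \rVert}$$
2. The new Y axis is orthogonal to X (mandatory) and to $\mathbf{k}$:
$$\mathbf{r}_2 = \frac{\mathbf{k} \times \mathbf{r}_1}{\lVert \mathbf{k} \times \mathbf{r}_1 \rVert}$$
3. The new Z axis is orthogonal to X and Y (mandatory):
$$\mathbf{r}_3 = \mathbf{r}_1 \times \mathbf{r}_2$$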
In the calculation of the new Y axis, k is an arbitrary unit vector that is not parallel to the new X axis; the cross product then guarantees that the new Y axis is orthogonal to the new X axis. To make the new Y axis orthogonal to both the new X axis and the old Z axis, k is set to the unit vector of the old Z axis.
To rectify the left and right images, the mapping relationships between the old PPMs and the new PPMs of the left and right images need to be known. Let us consider the left image as the example here.
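For a 3D point $\mathbf{w}$ that appears in both the left and right images, the old left perspective projection $\mathbf{m}_{ol}$ and the new left perspective projection $\mathbf{m}_{nl}$ can be expressed as:

$$\mathbf{m}_{ol} \simeq \mathbf{P}_{ol}\,\mathbf{w}, \qquad \mathbf{m}_{nl} \simeq \mathbf{P}_{nl}\,\mathbf{w} \tag{18}$$

where $\simeq$ denotes equality up to a scale factor. Because rectification does not move the optical center, writing $\mathbf{P}_{ol} = [\mathbf{Q}_{ol} \mid \mathbf{q}_{ol}]$ and $\mathbf{P}_{nl} = [\mathbf{Q}_{nl} \mid \mathbf{q}_{nl}]$, Equation (18) can be expressed in parametric form as:

$$\mathbf{w} = \mathbf{c}_l + \lambda_o\,\mathbf{Q}_{ol}^{-1}\mathbf{m}_{ol}, \qquad \mathbf{w} = \mathbf{c}_l + \lambda_n\,\mathbf{Q}_{nl}^{-1}\mathbf{m}_{nl}$$

Therefore:

$$\mathbf{m}_{nl} \simeq \mathbf{Q}_{nl}\,\mathbf{Q}_{ol}^{-1}\,\mathbf{m}_{ol} \tag{20}$$

From Equation (20), the rectification transformation of the left image is $\mathbf{T}_l = \mathbf{Q}_{nl}\mathbf{Q}_{ol}^{-1}$ and, analogously, that of the right image is $\mathbf{T}_r = \mathbf{Q}_{nr}\mathbf{Q}_{or}^{-1}$. The transformations $\mathbf{T}_l$ and $\mathbf{T}_r$ are applied to the original left and right images, respectively, to obtain the rectified images; Figure 7 illustrates this transformation. Bilinear interpolation is used because the pixels of the rectified images generally map to non-integer positions in the original images. In this paper, image rectification comprises radial distortion correction [37] and epipolar rectification [15]. Since both require the camera parameters, Bouguet's Camera Calibration Toolbox for MATLAB [16] is used to obtain them. The results of radial distortion correction and epipolar rectification are shown in Figure 8, and Table 2 lists the intrinsic and extrinsic parameters of the stereo rig.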
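Applying a rectification transformation such as T_l to an image is usually done by inverse mapping: each pixel of the rectified image is mapped back through the inverse transformation to a generally non-integer position in the original image, where the intensity is sampled by bilinear interpolation. The sketch below assumes the inverse 3 × 3 transformation (called `H_inv` here, a hypothetical name) has already been computed, represents images as nested lists of gray values indexed as img[y][x], and fills pixels that map outside the original image with 0.

```python
import math

def bilinear(img, x, y):
    """Bilinearly interpolate img at the non-integer position (x, y); 0 outside."""
    h, w = len(img), len(img[0])
    if x < 0 or y < 0 or x > w - 1 or y > h - 1:
        return 0.0
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    ax, ay = x - x0, y - y0
    top = (1 - ax) * img[y0][x0] + ax * img[y0][x1]
    bottom = (1 - ax) * img[y1][x0] + ax * img[y1][x1]
    return (1 - ay) * top + ay * bottom

def warp_inverse(img, H_inv, out_w, out_h):
    """Build the rectified image by mapping each output pixel back through H_inv."""
    out = [[0.0] * out_w for _ in range(out_h)]
    for yn in range(out_h):
        for xn in range(out_w):
            u = H_inv[0][0] * xn + H_inv[0][1] * yn + H_inv[0][2]
            v = H_inv[1][0] * xn + H_inv[1][1] * yn + H_inv[1][2]
            s = H_inv[2][0] * xn + H_inv[2][1] * yn + H_inv[2][2]
            out[yn][xn] = bilinear(img, u / s, v / s)   # homogeneous -> pixel position
    return out

img = [[0, 10, 20], [30, 40, 50], [60, 70, 80]]
shift_half = [[1, 0, 0.5], [0, 1, 0], [0, 0, 1]]        # hypothetical half-pixel shift
print(warp_inverse(img, shift_half, 3, 3))
```

Mapping backwards from the output, rather than forwards from the input, guarantees that every rectified pixel receives exactly one value and leaves no holes.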

Target Detection
When a circular object moves in 3D space, its radius in the image is unknown. Although the radius can be measured in pixels in a given image, it must either be re-measured whenever the apparent radius changes, or the target has to be kept in the vertical plane to maintain the same radius. Both options are inconvenient and sacrifice generality. Hence, the "Pixel-to-Pixel" circle detection algorithm developed by Scaramuzza et al. [23] is adopted in this paper. The experimental results of target detection are shown in Figure 9; the detection algorithm is applied to the rectified left image. Figure 9(a) shows the target to be detected, Figure 9(b) is the result of applying Laplacian edge detection to Figure 9(a), and Figure 9(c) shows the detection result.
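The edge map in Figure 9(b) comes from Laplacian edge detection. The sketch below shows only that preprocessing step, using the common 4-neighbor 3 × 3 Laplacian kernel and a simple magnitude threshold to mark edge pixels; the kernel choice, the threshold, and the function names are assumptions, and the sketch does not reproduce the Pixel-to-Pixel circle detector itself.

```python
def laplacian(img):
    """4-neighbor discrete Laplacian of a gray image; border pixels are left at 0."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            out[y][x] = (img[y - 1][x] + img[y + 1][x] + img[y][x - 1] + img[y][x + 1]
                         - 4 * img[y][x])
    return out

def edge_map(img, threshold):
    """Mark pixels whose Laplacian magnitude reaches the threshold as edge pixels (1)."""
    return [[1 if abs(v) >= threshold else 0 for v in row] for row in laplacian(img)]

step = [[0, 0, 100, 100, 100] for _ in range(3)]   # vertical step edge
print(edge_map(step, 50))  # [[0, 0, 0, 0, 0], [0, 1, 1, 0, 0], [0, 0, 0, 0, 0]]
```

The Laplacian responds with opposite signs on the two sides of an edge, so the step is marked by a two-pixel band; a stricter detector would locate the zero crossing between them instead of thresholding the magnitude.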

Stereo Tracking
Block matching is one of the best-known methods for motion tracking and stereo matching because it is easy to implement and computationally light. The sum of squared differences (SSD) and the sum of absolute differences (SAD) are the commonly used matching criteria for block matching. Because the SSD squares the intensity differences, it imposes a heavier computational burden than the SAD. For real-time stereo tracking, SAD-based stereo tracking is therefore used:

$$\mathrm{SAD}(x, y) = \sum_{i=-n}^{n}\sum_{j=-n}^{n}\left|f_k(x+i,\,y+j) - R(i,\,j)\right| \tag{23}$$

Equation (23) expresses the SAD matching criterion, where $f_k$ is the $k$-th frame, $R$ is the selected reference block indexed from its center, $(x, y)$ is the candidate block center, and $(2n + 1) \times (2n + 1)$ is the size of the reference block.
After circle detection has located the object in the previous frame, a reference block of size (2n + 1) × (2n + 1) centered on this location is created and stored in memory, and the best match (i.e., the position with the minimum SAD score) is searched for within the search window of the current frame. Once the best match is found, the current location $(x_L, y_L)$ of the object in the left image is known; in addition, the reference block is replaced by the best match so that the search adapts on the next frame. Figure 10 illustrates this block matching process. Because the left and right images have been rectified, the reference block only needs to be searched horizontally along the epipolar line of the right image. The search criterion is the same as in motion tracking; that is, the best match is the location with the minimum SAD score.

Figure 11. The stereo matching.
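The motion-tracking loop just described, an exhaustive SAD search inside a window around the previous position followed by replacing the reference block with the best match, can be sketched as follows. Images are nested lists of gray values indexed as img[y][x]; the window half-size m, the helper names, and the tie-breaking rule (the first minimum found wins) are assumptions. For a 25 × 25 reference block, n = 12.

```python
def sad(img, cx, cy, ref, n):
    """SAD between ref and the (2n+1) x (2n+1) block of img centered at (cx, cy)."""
    total = 0
    for j in range(-n, n + 1):
        for i in range(-n, n + 1):
            total += abs(img[cy + j][cx + i] - ref[j + n][i + n])
    return total

def extract_block(img, cx, cy, n):
    """Copy the (2n+1) x (2n+1) block of img centered at (cx, cy)."""
    return [row[cx - n:cx + n + 1] for row in img[cy - n:cy + n + 1]]

def track(frame, ref, prev_xy, n, m):
    """Find ref near prev_xy; return the new center and the updated reference block.

    Candidate centers lie within +/- m pixels of prev_xy and keep the block inside the image.
    """
    h, w = len(frame), len(frame[0])
    px, py = prev_xy
    best = None
    for cy in range(max(py - m, n), min(py + m, h - 1 - n) + 1):
        for cx in range(max(px - m, n), min(px + m, w - 1 - n) + 1):
            score = sad(frame, cx, cy, ref, n)
            if best is None or score < best[0]:   # keep the first minimum found
                best = (score, cx, cy)
    if best is None:
        raise ValueError("search window does not fit inside the frame")
    _, bx, by = best
    # Template update: the best match becomes the reference for the next frame.
    return (bx, by), extract_block(frame, bx, by, n)

# Toy frames: a single bright pixel moves from (5, 5) to (7, 6).
f0 = [[0] * 20 for _ in range(20)]
f0[5][5] = 255
ref = extract_block(f0, 5, 5, 1)
f1 = [[0] * 20 for _ in range(20)]
f1[6][7] = 255
pos, new_ref = track(f1, ref, (5, 5), n=1, m=3)
print(pos)  # (7, 6)
```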
When the best match has been found, the corresponding location of the object in the right image is obtained. Figure 11 shows the stereo matching process. In Equation (24), the SAD score measures the similarity between the reference block $R_l$ of the left image and the right image $I_r$:

$$\mathrm{SAD}(x_r) = \sum_{i=-n}^{n}\sum_{j=-n}^{n}\left|I_r(x_r+i,\,y_L+j) - R_l(i,\,j)\right|, \qquad d_{\min} \le x_L - x_r \le d_{\max} \tag{24}$$

The search for the best match runs over all candidates within the allowable disparity range $[d_{\min}, d_{\max}]$. Figure 12(a–c) shows the stereo tracking results at three arbitrary target positions, A, B, and C. The reference block is 25 × 25 pixels, and the search window in the left image is 50 × 50 pixels. For stereo matching in the right image, the horizontal scan-line initially spans 20 ≤ $x_r$ ≤ 360. Once the first match is found, the row size of the horizontal scan-line becomes …
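On rectified images, the stereo search of Equation (24) is a one-dimensional scan along a single row. The sketch below evaluates the SAD between the left reference block and right-image blocks centered at x_r = x_L − d on row y_L for every disparity d in [d_min, d_max], skipping candidates that would fall off the image; the function name and the toy images in the usage are hypothetical.

```python
def match_along_row(right, ref, x_l, y, n, d_min, d_max):
    """Search row y of the rectified right image for the left reference block.

    ref: (2n+1) x (2n+1) block taken around (x_l, y) in the left image.
    Returns (x_r, d) for the candidate with the minimum SAD score.
    """
    w = len(right[0])
    best = None
    for d in range(d_min, d_max + 1):
        x_r = x_l - d                        # same row: only x changes after rectification
        if x_r - n < 0 or x_r + n > w - 1:
            continue                         # block would leave the image
        score = 0
        for j in range(-n, n + 1):
            for i in range(-n, n + 1):
                score += abs(right[y + j][x_r + i] - ref[j + n][i + n])
        if best is None or score < best[0]:
            best = (score, x_r, d)
    if best is None:
        raise ValueError("no valid candidate in the disparity range")
    return best[1], best[2]

ref = [[0, 0, 0], [0, 255, 0], [0, 0, 0]]    # block around (15, 5) in the left image
right = [[0] * 30 for _ in range(12)]
right[5][9] = 255                            # same feature appears at x = 9 on row 5
print(match_along_row(right, ref, 15, 5, n=1, d_min=0, d_max=12))  # (9, 6)
```

Compared with the two-dimensional window of motion tracking, the rectified search costs only (d_max − d_min + 1) SAD evaluations per frame, which is what makes real-time stereo matching practical here.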

Measurement Correction
There are three major types of error in a correlation-based stereo system: foreshortening error, misalignment error, and systematic error [38]. To correct the measurement errors caused by these sources, the stereo system must be calibrated. Figure 13 shows the measurement calibration method used in this paper, in which a one-centimeter grid paper and a custom-made mechanism are used to calibrate the measurement results. Table 3 lists the depth measurement error at each corresponding $X_W$ coordinate; blank entries indicate that the target was outside the field of view of the stereo rig.
The standard deviations of the depth errors at each $X_W$ coordinate are small; that is, the depth errors at each $X_W$ coordinate are close to their mean. The depth error at each $X_W$ can therefore be represented by its average. The average depth measurement errors listed in Table 3 are plotted in Figure 14.
Since the average depth measurement error varies approximately linearly, linear regression is used to model it. The MATLAB Curve Fitting Toolbox is used to compute the coefficients of the depth error model and the fitting residuals. Figure 14 shows the average depth measurement error and its linear approximation, and Equation (25) gives the fitted linear model of the depth error $e$ as a function of $X_W$:

$$e(X_W) = p_1 X_W + p_2 \tag{25}$$

where the coefficients $p_1$ and $p_2$ are obtained from the curve fitting. Table 4 shows that, after correction with the linear error model, the average depth error at each corresponding $X_W$ is reduced to below 1.65 mm and the largest error is reduced to 3.45 mm.

Figure 15 shows the frame assignment of the stereo vision system, where $\{X_C, Y_C, Z_C\}$ is the camera frame and $\{X, Y, Z\}$ is the end-effector frame. Since the stereo vision system measures positions with respect to the camera frame, the results must be transformed into the end-effector frame. The pose relationship between the two frames is defined by the homogeneous transformation matrix of Equation (26):

$${}^{C}\mathbf{T}_{E} = \begin{bmatrix} \mathbf{R} & \mathbf{p} \\ \mathbf{0}^{\top} & 1 \end{bmatrix} \tag{26}$$

where $\mathbf{R}$ is the orientation of the end-effector frame relative to the camera frame and $\mathbf{p}$ is its translation vector.
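The linear depth-error model of Equation (25) amounts to an ordinary least-squares line fit followed by subtracting the predicted error from each measurement. The sketch below reproduces that idea in plain Python rather than with the MATLAB Curve Fitting Toolbox; it assumes the error is defined as measured depth minus true depth, and the sample values are synthetic placeholders, not the data of Table 3. The function names are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit ys ~ p1 * xs + p2."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    p1 = sxy / sxx
    return p1, my - p1 * mx

def correct_depth(z_measured, x_w, p1, p2):
    """Subtract the modeled error, assuming error = measured depth - true depth."""
    return z_measured - (p1 * x_w + p2)

# Synthetic, illustrative values only (not the data of Table 3).
x_w = [0.0, 10.0, 20.0, 30.0]
mean_err = [1.0, 2.0, 3.0, 4.0]
p1, p2 = fit_line(x_w, mean_err)
print(p1, p2)
print(correct_depth(502.5, 15.0, p1, p2))
```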

Trajectory Measurement Experiment
In Equation (26), the translation vector is the origin of the end-effector frame expressed in the camera frame. All the experimental results shown in the following sections are transformed with Equation (26) into coordinates consistent with the end-effector frame.
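Expressing a camera-frame measurement in the end-effector frame means inverting the homogeneous transformation: if the pose of the end-effector frame in the camera frame is a rotation R and a translation p, so that x_C = R x_E + p, then x_E = Rᵀ(x_C − p). The sketch below implements that inverse for a single point; R, p, and the function name are hypothetical, since the actual values depend on the rig's frame assignment.

```python
def camera_to_end_effector(x_c, R, p):
    """Invert x_C = R x_E + p for one point: x_E = R^T (x_C - p)."""
    d = [x_c[i] - p[i] for i in range(3)]
    return [sum(R[j][i] * d[j] for j in range(3)) for i in range(3)]   # R^T d

# Hypothetical pose: end-effector frame rotated 90 degrees about Z_C,
# with its origin at (10, 0, 500) mm in the camera frame.
R = [[0, -1, 0],
     [1,  0, 0],
     [0,  0, 1]]
p = [10.0, 0.0, 500.0]
print(camera_to_end_effector([8.0, 1.0, 503.0], R, p))  # [1.0, 2.0, 3.0]
```

Using the transpose instead of a general matrix inverse relies on R being a proper rotation, which holds for any homogeneous transformation of the form in Equation (26).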

Fifth Order Polynomial Trajectory
The desired end-effector trajectories in the Z direction are a fifth-order polynomial trajectory with a 100 mm stroke in 3 s and a fifth-order polynomial trajectory with a 200 mm stroke in 5 s. Figure 16 shows the stereo vision measurements for the 100 mm stroke. For both strokes, the measured X and Y coordinates remain within ±2 mm, and the measured Z coordinates show that the end-effector reaches the desired stroke profile. Figure 17 compares the desired trajectory with the stereo vision measurements; the end-effector follows the desired trajectory well.
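The polynomial coefficients are not listed here; a common choice for a rest-to-rest move, assumed in this sketch, imposes zero velocity and zero acceleration at both ends, which gives z(t) = S(10τ³ − 15τ⁴ + 6τ⁵) with τ = t/T. The sketch evaluates that profile for the 100 mm, 3 s case; treat it as an illustration of a fifth-order trajectory, not as the exact command used in the experiments. The function name is hypothetical.

```python
def quintic(t, T, stroke):
    """Rest-to-rest fifth-order polynomial position at time t (assumed boundary conditions)."""
    if t <= 0.0:
        return 0.0
    if t >= T:
        return float(stroke)
    tau = t / T
    # z = S (10 tau^3 - 15 tau^4 + 6 tau^5): zero velocity and acceleration at both ends
    return stroke * (10 * tau**3 - 15 * tau**4 + 6 * tau**5)

profile = [quintic(0.5 * k, 3.0, 100.0) for k in range(7)]   # t = 0, 0.5, ..., 3.0 s
print([round(z, 2) for z in profile])
```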

Sinusoidal Trajectory
The desired Z-direction trajectory of the end-effector in this section is a fifth-order polynomial trajectory with a 150 mm stroke for t ≤ 4 s, followed by a sinusoidal trajectory for 4 s ≤ t ≤ 20 s with an amplitude of 50 mm and an angular frequency of 1 rad/s. Figure 18 shows the stereo vision measurements of the sinusoidal trajectory, and Figure 19 compares the desired trajectory with the stereo measurements. The measured X and Y coordinates remain within ±4 mm, and the difference between the desired sinusoidal trajectory and the measurements is approximately ±4 mm at the peaks, which is attributed to the systematic error of the stereo vision system or to vibration of the end-effector during trajectory tracking.
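The composite command can be sketched by switching from the quintic rise to a sinusoid at t = 4 s. The exact sinusoid is not given in the text, so this sketch assumes it is centered on the 150 mm end point and starts with zero phase, z(t) = 150 + 50 sin(t − 4) mm, and reuses the rest-to-rest quintic assumed for the rise. Under that assumption the position is continuous at t = 4 s but the velocity jumps from 0 to 50 mm/s, so the real command may be phased differently. The function name is hypothetical.

```python
import math

def z_desired(t):
    """Desired Z position (mm) at time t (s) for the assumed composite trajectory."""
    if t <= 4.0:
        tau = max(t, 0.0) / 4.0            # quintic rise 0 -> 150 mm over 4 s
        return 150.0 * (10 * tau**3 - 15 * tau**4 + 6 * tau**5)
    # Assumed form: amplitude 50 mm, angular frequency 1 rad/s, zero phase at t = 4 s.
    return 150.0 + 50.0 * math.sin(1.0 * (t - 4.0))

samples = [z_desired(0.1 * k) for k in range(201)]   # t = 0 .. 20 s
print(round(min(samples[41:]), 1), round(max(samples[41:]), 1))
```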

Conclusions
This paper proposes a stereo vision 3D position measurement system for a three-axial pneumatic parallel mechanism robot arm. The system serves to measure the 3D trajectories of the end-effector of the robot arm. To track the end-effector, a circle detection algorithm is used to detect the desired target, and the SAD algorithm is used both to track the moving target and to search for the corresponding target location along the conjugate epipolar line in the stereo pair. After camera calibration, both the intrinsic and extrinsic parameters of the stereo rig are obtained, so the images can be rectified according to the camera parameters. Through epipolar rectification, the stereo matching process is reduced to a horizontal search along the conjugate epipolar line. Finally, the 3D trajectories of the end-effector were computed by stereo triangulation.
The stereo calibration, image rectification, circle detection, and stereo tracking results of this paper were presented graphically. In the practical stereo vision measurement experiments, the Z-direction measurement error was corrected first, and the corrected results show that the maximum average Z-direction error can be reduced to 2.18 mm. After this correction, the end-effector of the three-axial pneumatic parallel mechanism robot arm was commanded to track the fifth-order polynomial trajectory and the sinusoidal trajectory, and both trajectories were successfully tracked and measured by the stereo vision 3D position measurement system developed in this paper. Several directions for future work are suggested. A fisheye lens could be used to broaden the field of view. In addition, calibrating the stereo vision system with a position sensor such as a laser range finder, which requires fusing laser and stereo vision data, should yield more reliable and more accurate measurements.