A Multi-Camera Rig with Non-Overlapping Views for Dynamic Six-Degree-of-Freedom Measurement

Large-scale measurement plays an increasingly important role in intelligent manufacturing. However, existing instruments offer operators little real-time, immersive interaction with the measurement. In this paper, an immersive positioning and measuring method based on augmented reality is introduced. An inside-out vision measurement approach using a multi-camera rig with non-overlapping views is presented for dynamic six-degree-of-freedom measurement. Active LED markers provide a flexible and robust solution for complex manufacturing sites. The space resection adjustment principle is described, and measurement errors are simulated. An improved Nearest Neighbor method is employed for feature correspondence. The proposed tracking method is verified by experiments, which show good performance.


Introduction
In recent years, there has been growing interest in intelligent manufacturing [1] of large-scale equipment, in applications such as airplane assembly [2], shipbuilding [3], and spacecraft inspection [4]. As one of the key technologies in intelligent manufacturing, large-scale measurement [5] plays a crucial part in improving product quality and working efficiency. Large-scale measuring instruments are expected to provide adaptive and flexible services to end-users and to enable a highly integrated human-machine manufacturing system. However, popular measuring instruments such as the laser tracker [6], the total station [7], and the indoor Global Positioning System (iGPS) [8] have problems with portability and flexibility, especially in narrow spaces. In practice, it is challenging to measure complex components efficiently and accurately in a narrow ship or spacecraft cabin using any of these instruments. Moreover, operating personnel have no access to real-time visual measuring results because they cannot interact with the measuring instruments, which makes it harder for them to engage with the measurement environment. Augmented Reality (AR) [9] is a human-machine interaction tool that combines virtual objects with the real environment in a seamless way, thus offering an effective solution for large-scale measurement.
Against this background, an immersive human-machine-environment interactive positioning and measuring method is proposed. By integrating global positioning and local measuring, the three-dimensional coordinates of the measured objects can be obtained in the global coordinate system. Then, based on AR, the measuring results and auxiliary information are accurately overlaid onto the measured object in real time using a projector, which enhances the user's interactive and immersive experience. With the immersive positioning and measuring helmet (see Figure 1), operating personnel have their hands free to carry out assembly and inspection work. The whole system is highly integrated, portable, and functionally rich. The immersive positioning and measuring method can therefore improve working efficiency through AR-assisted guidance, and it reflects the developing trend of large-scale measurement in intelligent manufacturing.
In order to obtain accurate measuring results and merge virtual information accurately with the real object, a high-accuracy global positioning and tracking method is required [10]. The major task of tracking is to determine the position and orientation of the helmet in real time, that is, dynamic six-degree-of-freedom (6-DOF) measurement [11]. A number of technologies have been proposed for indoor positioning [12], such as magnetic [13], inertial [14], ultrasound [15], and vision [16] measurement. However, complex working environments, portability, and accuracy requirements pose challenges to these methods. Magnetic measurement has a small operating range and is prone to distortion. Inertial measurement has poor accuracy because errors accumulate with time. Ultrasound is severely affected by obstacles, so it is unsuitable for manufacturing sites.
Compared with the above methods, vision measurement can realize pixel accuracy and large-scale multi-target tracking with excellent flexibility and convenience, which shows great advantages in industrial manufacturing.

Vision-based 6-DOF measurement methods can be classified into two categories: outside-in measurement [17] and inside-out measurement [18]. In outside-in measurement, cameras are installed in the working environment and markers are fixed on the moving target. Images of the markers are taken by the cameras to calculate the position and orientation of the target. The OptiTrack system [19] developed by NaturalPoint is a representative outside-in system, with a positional error of less than 0.3 mm and a rotational error of less than 0.05°. However, installing multiple cameras in large-scale environments is costly, and it also makes exact synchronization difficult. By contrast, inside-out measurement uses cameras mounted on the tracked object to take images of markers in the working environment, which makes it more flexible and easier to extend. Simultaneous Localization and Mapping (SLAM) [20], a research focus of autonomous robot navigation, relies on sequences of images to recognize the robot's location and the surrounding environment. However, the computational load of image correspondence is particularly high, and this view-based approach can hardly meet the accuracy requirements. To improve the accuracy of reference points, retro-reflective targets [21] are used for indoor positioning. Nonetheless, these systems lack robustness, especially under varying illumination. Active markers have therefore also been used, and the HiBall tracking system, which uses LED panels, is one of the most successful examples. The HiBall tracking system [22] achieves absolute errors of 0.5 mm and 0.03° in a 4.5 m × 8.5 m room. However, it is quite difficult to install the LED panels at industrial sites, which hinders the application of this system.
To better meet the needs of dynamic 6-DOF measurement for immersive positioning and measuring in manufacturing sites, this paper presents an inside-out measuring method using a multi-camera rig with non-overlapping views. The multi-camera rig is mounted on the integrated helmet for global tracking; it enlarges the field of view and reduces the impact of occlusion. From images of the cooperative LED markers deployed in the surrounding environment, the position and orientation of the helmet are determined through a space resection adjustment method based on collinearity equations. As the LED markers are interchangeable with 38.1 mm spherical targets, their three-dimensional coordinates can be accurately obtained with a laser tracker or an industrial photogrammetry system. Furthermore, a Nearest Neighbor (NN) method [23] combined with motion information is adopted to match image points to LED markers under dynamic conditions. The remainder of this paper is structured as follows: Section 2 describes the configuration of the multi-camera rig and the design of the cooperative object; Section 3 presents the dynamic 6-DOF measurement principle, including the space resection adjustment method and the feature correspondence method; measurement error is simulated in Section 4, and experiments are carried out in Section 5; finally, conclusions are provided in Section 6.

System Hardware
As depicted in Figure 2a, the multi-camera rig consists of one control circuit board and three compact CMOS cameras, which are mounted on a rigid 3D-printed connector. Each camera provides a resolution of 1280 × 960 pixels at a frame rate of up to 54 Hz, and the pixel size is 3.75 µm × 3.75 µm. As the field of view of each camera is 34.5° with a 6 mm lens, the angle between two neighboring cameras is set to 35°. With this design, the views of two neighboring cameras do not overlap, so the multi-camera rig covers a larger visible range and is less affected by occlusion. The control circuit board is programmed to synchronize clocks and gather images, and it also transmits data to the computer via Ethernet. Consequently, the multi-camera rig is well suited to global tracking, being lightweight and reliable. The active LED markers are designed as control points to cope with complicated industrial environments. A red LED with a wavelength of 660 nm is installed at the center of a spherical target (see Figure 2b). The LED marker is aligned accurately using a TESA-VISIO 300 video measuring machine. Compared with passive markers, active LED markers are less sensitive to illumination, and they also provide high contrast and sharp edges. Moreover, this active LED marker is interchangeable with the 38.1 mm spherically mounted retroreflector (SMR) of a laser tracker.

Camera Model
On the basis of the pinhole model, a more complete camera model that includes principal point offset and lens distortion is introduced for high-accuracy vision measurement. For convenience, the symmetrical plane of the image plane is analyzed. As shown in Figure 3, a spatial object point $P(X_p, Y_p, Z_p)$ is projected to $p(x_p, y_p)$ on the image plane through the perspective center $O_c$. In the camera coordinate system ($O_c$-$X_cY_cZ_c$), the $X_c$ and $Y_c$ axes are parallel to the $x$ and $y$ axes of the image coordinate system, respectively, and the $Z_c$ axis lies along the optical axis. On account of lens installation errors, there is an offset between the principal point $(x_0, y_0)$ and the center of the image $O$. Hence, the image point coordinates after principal point correction are expressed in Equation (1) as $x_c = (u_p - u_0)\,dx$ and $y_c = (v_p - v_0)\,dy$, where $(u_p, v_p)$ and $(u_0, v_0)$ are the pixel coordinates of point $p$ and of the principal point, respectively, and $(dx, dy)$ are the pixel separations. Besides radial and tangential lens distortion, affine and non-orthogonality deformations also cause image point offsets. The distortion is generalized in Equation (2), where $(\Delta x, \Delta y)$ denote the correction values for errors in the image plane, $x_c = x_p - x_0$ and $y_c = y_p - y_0$ are the image point coordinates after principal point correction, $r = \sqrt{x_c^2 + y_c^2}$ is the radial distance from the image point to the optical axis, $(k_1, k_2, k_3)$ are the radial distortion coefficients (generally only the first three are considered), $(p_1, p_2)$ are the tangential distortion coefficients (the third tangential coefficient is generally ignored), and $(b_1, b_2)$ are the affine and non-orthogonality coefficients.
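The camera model above can be sketched in code. The following minimal Python sketch converts pixel coordinates to principal-point-corrected image coordinates and evaluates a distortion correction. The function names and the exact arrangement of the distortion terms (a commonly used radial, tangential, and affine form) are assumptions made for illustration; Equation (2) may arrange the terms differently, and real coefficients come from calibration.

```python
def pixel_to_image(u, v, u0, v0, dx, dy):
    """Principal-point-corrected image coordinates (x_c, y_c).

    (u, v) and (u0, v0) are pixel coordinates of the point and of the
    principal point; (dx, dy) are the pixel separations (e.g. in mm).
    """
    return (u - u0) * dx, (v - v0) * dy


def distortion_correction(xc, yc, k=(0.0, 0.0, 0.0), p=(0.0, 0.0), b=(0.0, 0.0)):
    """Correction values (delta_x, delta_y) for an image point (x_c, y_c).

    Assumed form (hypothetical arrangement of terms):
      radial:     k1*r^2 + k2*r^4 + k3*r^6, scaled by x_c or y_c
      tangential: p1, p2
      affine / non-orthogonality: b1, b2 (applied to delta_x only)
    """
    r2 = xc * xc + yc * yc                      # r^2, with r the radial distance
    radial = k[0] * r2 + k[1] * r2 ** 2 + k[2] * r2 ** 3
    delta_x = (xc * radial
               + p[0] * (r2 + 2.0 * xc * xc) + 2.0 * p[1] * xc * yc
               + b[0] * xc + b[1] * yc)
    delta_y = (yc * radial
               + p[1] * (r2 + 2.0 * yc * yc) + 2.0 * p[0] * xc * yc)
    return delta_x, delta_y
```

With the 3.75 µm pixel size given in Section 2, a point 100 pixels to the right of the principal point maps to x_c = 0.375 mm under this sketch.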
The collinearity equations are given by Equation (3) based on the camera model above, where $c$ is the principal distance, $\mathbf{X}_0 = [X_0, Y_0, Z_0]^T$ are the coordinates of the perspective center in the object coordinate system, and $R$ is the rotation from object coordinates into image coordinates, defined in Equation (4) by three independent rotation angles $\theta$, $\phi$, $\kappa$ about the axes $X_c$, $Y_c$, $Z_c$. The collinearity equations are thus functions of the six degrees of freedom $(X_0, Y_0, Z_0, \theta, \phi, \kappa)$ of the camera.
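For orientation, one common way of writing collinearity equations with the symbols defined above is sketched below. The sign convention, the side on which the correction terms $(\Delta x, \Delta y)$ appear, and the order in which the three elementary rotations are multiplied are assumptions here; Equations (3) and (4) may use a different convention.

```latex
\begin{aligned}
x_c + \Delta x &= c\,\frac{r_{11}(X - X_0) + r_{12}(Y - Y_0) + r_{13}(Z - Z_0)}
                          {r_{31}(X - X_0) + r_{32}(Y - Y_0) + r_{33}(Z - Z_0)},\\[4pt]
y_c + \Delta y &= c\,\frac{r_{21}(X - X_0) + r_{22}(Y - Y_0) + r_{23}(Z - Z_0)}
                          {r_{31}(X - X_0) + r_{32}(Y - Y_0) + r_{33}(Z - Z_0)},\\[4pt]
R = (r_{mn}) &= R_Z(\kappa)\,R_Y(\phi)\,R_X(\theta).
\end{aligned}
```

Here $(X, Y, Z)$ are the object coordinates of a marker and $r_{mn}$ are the elements of $R$; each observed marker contributes two such equations.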

Feature Points Extraction Method
Using active LED markers as feature points, images with high contrast are acquired. The light spot of an LED marker is shown in Figure 4. Under this condition, the squared centroid method is adopted for sub-pixel image processing, which achieves high extraction accuracy. The squared centroid method uses the squared gray value as the weight within the processing window:
$$x_m = \frac{\sum_{x}\sum_{y} x\, f^2(x, y)}{\sum_{x}\sum_{y} f^2(x, y)}, \qquad y_m = \frac{\sum_{x}\sum_{y} y\, f^2(x, y)}{\sum_{x}\sum_{y} f^2(x, y)} \qquad (5)$$
Here $(x_m, y_m)$ are the coordinates of the centroid, and $f(x, y)$ is the gray value at pixel position $(x, y)$. The squared centroid method is computationally fast and easy to implement.
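As an illustration of the squared centroid, the following minimal Python sketch computes the centroid of a small processing window stored as a nested list. The function name and the window representation are assumptions made for this example; selecting the window around each light spot is outside its scope.

```python
def squared_centroid(window, x_offset=0, y_offset=0):
    """Squared-gray-value-weighted centroid (x_m, y_m) of a processing window.

    window is a 2D list of gray values indexed as window[row][col];
    (x_offset, y_offset) is the pixel position of window[0][0] in the image.
    """
    sum_w = sum_x = sum_y = 0.0
    for row, line in enumerate(window):
        for col, gray in enumerate(line):
            weight = float(gray) ** 2          # the weight is the squared gray value
            sum_w += weight
            sum_x += weight * (x_offset + col)
            sum_y += weight * (y_offset + row)
    if sum_w == 0.0:
        raise ValueError("window contains no signal")
    return sum_x / sum_w, sum_y / sum_w
```

For the one-row window `[[0, 10, 20]]`, the squared weights are 0, 100, and 400, so the centroid lies at x = 1.8, closer to the brightest pixel than the plain gray-value centroid (about 1.67). This stronger weighting of bright pixels is why the method suits high-contrast LED spots.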

Space Resection Adjustment Method
In practical measurement, a set of LED markers is deployed in the environment, as shown in Figure 5. The three-dimensional coordinates of each LED marker in the global object coordinate system $O$-$XYZ$ are obtained in advance. The interior orientation parameters and the spatial relationships between the three cameras are also calibrated in advance. In Equation (6), $(R_{21}, T_{21})$ and $(R_{23}, T_{23})$ are the rotation matrices and translation vectors relating the coordinate system of camera-2 to those of camera-1 and camera-3, respectively. Meanwhile, we assume that the coordinate system of the multi-camera rig $O_s$-$X_sY_sZ_s$ is identical to the coordinate system $O_2$-$X_2Y_2Z_2$ of camera-2. Therefore, the reprojection error equations for each visible LED marker are established on the basis of the collinearity equations, where $j$ is the serial number of the visible LED marker and $i$ ($i = 1, 2, 3$) denotes the camera that observes the $j$-th marker. The six degrees of freedom of camera-2 $(R_2, \mathbf{X}_{20})$ with respect to the global object coordinate system are expressed by Equation (8). Substituting Equation (8) into Equation (6), the six degrees of freedom of camera-1 $(R_1, \mathbf{X}_{10})$ and camera-3 $(R_3, \mathbf{X}_{30})$ can be expressed in terms of $(R_2, \mathbf{X}_{20})$. In addition, as an orthonormal matrix, the rotation matrix $R_2$ satisfies the orthonormality constraint equations. Consequently, there are only six unknown parameters, and the solution requires at least three LED markers that do not lie on a common straight line. A non-linear optimization algorithm is used to calculate $(R_2, \mathbf{X}_{20})$, and an objective function based on the reprojection error is established as Equation (12) using the Lagrange multiplier method [24], where $n$ is the number of visible LED markers and $\lambda$ is the Lagrange multiplier. With its fast convergence and strong robustness, the Levenberg-Marquardt (LM) algorithm [25] is employed for this optimization problem. To help the optimization converge to the global optimum, the initial value is calculated using the EPnP algorithm [26]. Finally, the six degrees of freedom $(X_s, Y_s, Z_s, \theta_s, \phi_s, \kappa_s)$ of the multi-camera rig are derived from $R_2$ and $\mathbf{X}_{20}$.
Figure 5. Measurement layout.
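The step of expressing the poses of camera-1 and camera-3 through the pose of camera-2 can be sketched as a short derivation. It assumes one particular convention, namely that a point with camera-2 coordinates $\mathbf{X}_2$ has camera-$i$ coordinates $R_{2i}\mathbf{X}_2 + \mathbf{T}_{2i}$, and that camera coordinates follow from object coordinates as $R(\mathbf{X} - \mathbf{X}_0)$. The calibration may be stated in the opposite direction, in which case the expressions change accordingly.

```latex
\begin{aligned}
\mathbf{X}_2 &= R_2\,(\mathbf{X} - \mathbf{X}_{20}), \qquad
\mathbf{X}_i = R_{2i}\,\mathbf{X}_2 + \mathbf{T}_{2i}, \quad i \in \{1, 3\},\\[4pt]
\mathbf{X}_i &= R_{2i} R_2\,(\mathbf{X} - \mathbf{X}_{20}) + \mathbf{T}_{2i}
              = R_{2i} R_2 \left[\mathbf{X} - \left(\mathbf{X}_{20}
                - R_2^{T} R_{2i}^{T}\,\mathbf{T}_{2i}\right)\right],\\[4pt]
\Rightarrow\quad R_i &= R_{2i} R_2, \qquad
\mathbf{X}_{i0} = \mathbf{X}_{20} - R_2^{T} R_{2i}^{T}\,\mathbf{T}_{2i}.
\end{aligned}
```

Under this convention every camera pose depends only on $(R_2, \mathbf{X}_{20})$ and the fixed calibration, which is why markers seen by any of the three cameras constrain the same six unknowns.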

Feature Points Correspondence Method
Because an image contains only a few feature points, and they look almost exactly the same, matching them to the corresponding LED markers under dynamic conditions is a considerable challenge. The NN method is used for feature matching: it searches for the nearest point pairs in two images, and each point pair represents the same LED marker. To improve robustness, motion information of the multi-camera rig is incorporated. The motion state vector $S_k$ of the multi-camera rig at time $t_k$ consists of its position and orientation together with its velocities $(v_{xk}, v_{yk}, v_{zk})$ and angular velocities $(\omega_{xk}, \omega_{yk}, \omega_{zk})$. Considering that users' movements are normally slow, the multi-camera rig is assumed to move with constant velocities during the time $\Delta t$ between two adjacent frames. Hence, the state $S_k$ of the multi-camera rig can be predicted from $S_{k-1}$, and the positions and orientations of the three cameras can then also be predicted. Based on Equation (3), we project the LED markers onto the image plane and calculate the image coordinates of these predicted feature points. Next, we find the nearest point pairs between the predicted image and the real image using the NN method, where the distance between two image points is the Euclidean distance between their image coordinates. To avoid mismatching caused by occlusion and image noise, the ratio of the shortest distance to the second-shortest distance is checked. Furthermore, a reciprocity check is employed to remove outliers. The following steps are performed for feature point matching.

• For a point $P_{r,i}$ on the real image, we calculate the distances from $P_{r,i}$ to all the points on the predicted image and select its nearest neighbor $P_{p,j}$, which has the shortest distance. If the ratio of the shortest distance to the second-shortest distance is less than the threshold $\lambda$, we continue to the next step. If not, we remove the point $P_{r,i}$ as an outlier.
• We calculate the distances from $P_{p,j}$ to all the points on the real image. Then we check whether $P_{r,i}$ has the shortest distance, and whether the ratio of the shortest distance to the second-shortest distance is less than $\lambda$. When both criteria are fulfilled, the nearest point pair $(P_{r,i}, P_{p,j})$ is accepted as correct.
• By repeating the above process, we complete the feature point matching (see Figure 6). The value of $\lambda$ is set based on the deployment of the LED markers.
[Figure 6 legend: the shortest distance; the second-shortest distance; correct matching.]
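The prediction and matching steps above can be sketched as follows. This is a minimal Python sketch under stated assumptions: the function names are hypothetical, the pose is advanced by adding velocity times $\Delta t$ to each component (treating the small angle increments as additive), and the image distance is taken as Euclidean. The projection of the markers with the predicted camera poses is not shown.

```python
import math


def predict_pose(pose, velocity, dt):
    """Constant-velocity prediction of (X, Y, Z, theta, phi, kappa) after dt."""
    return [p + v * dt for p, v in zip(pose, velocity)]


def _two_nearest(point, candidates):
    """Index of the nearest candidate, its distance, and the second-shortest distance."""
    dists = sorted((math.dist(point, c), idx) for idx, c in enumerate(candidates))
    best_d, best_idx = dists[0]
    second_d = dists[1][0] if len(dists) > 1 else math.inf
    return best_idx, best_d, second_d


def _passes_ratio(best_d, second_d, ratio):
    # Equivalent to best_d / second_d < ratio, without dividing by zero.
    return best_d < ratio * second_d


def match_points(real, predicted, ratio=0.3):
    """Return (i, j) index pairs: real[i] matched to predicted[j].

    A pair is kept only if the ratio test passes in both directions and
    the two points are each other's nearest neighbor (reciprocity check).
    """
    matches = []
    if not real or not predicted:
        return matches
    for i, p_real in enumerate(real):
        j, d1, d2 = _two_nearest(p_real, predicted)
        if not _passes_ratio(d1, d2, ratio):
            continue                      # ambiguous: treat real[i] as an outlier
        i_back, b1, b2 = _two_nearest(predicted[j], real)
        if i_back == i and _passes_ratio(b1, b2, ratio):
            matches.append((i, j))
    return matches
```

In this sketch a real image point with no clearly nearest predicted point is simply left unmatched, which mirrors the outlier removal described in the first step.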

Dynamic Measurement Process
To accomplish continuous tracking, system initialization has to be performed first. During initialization, the multi-camera rig remains stationary and feature point correspondence is completed manually. Once the initial state is determined, the 6-DOF of the rig can be calculated in real time in the measurement field. The complete measurement process is shown in Figure 7.

Measurement Error Simulation
Given the space resection adjustment method described in the previous section, measurement errors mainly arise from calibration errors of the interior orientation parameters, calibration errors of the spatial relationship parameters, and position errors of the LED markers, which include machining errors and measuring errors. Although measurement accuracy also depends on the focal length and on the number and distribution of LED markers [27], these factors are not discussed in this paper.
Using the Monte Carlo simulation technique, the 6-DOF measurement errors are analyzed. The deployment of the multi-camera rig and 15 markers is shown in Figure 8; in this setup, each camera observes 5 non-coplanar markers. The parameters of the multi-camera rig are set according to the system hardware design described in Section 2. After normally distributed noise is added, the root mean square (RMS) errors between the simulated 6-DOF values and the true values are calculated. The following simulations are conducted to study the impact of the above factors on the measurement errors. The sample size is set to $10^4$ in each simulation.
Firstly, the impact of the marker position error is studied. Normally distributed position noise is added to each marker along the three axes, with a standard deviation varying from 0 mm to 0.5 mm. As shown in Figure 9, the RMS error of the 6-DOF increases linearly with the marker position error. When a realistic error of 0.2 mm is assumed for the markers, the three-dimensional position of the multi-camera rig is computed to an accuracy of about 0.5 mm.
Since the calibration errors of the interior parameters are directly reflected in the errors of the image points, normally distributed image point noise is added. In the simulation, the noise is varied up to 0.2 pixel, corresponding to 0.75 µm. Figure 10 illustrates a linear relationship between the 6-DOF measurement error and the image point error. With an image point error of 0.1 pixel (0.375 µm), the angular error is less than 0.01°.
As part of the spatial relationship error, the rotation error between cameras is added to evaluate its influence on the 6-DOF measurement. The noise level is varied from 0° to 0.01°, and the corresponding measurement error is depicted in Figure 11. There is a clear linear trend for all six degrees of freedom. It can also be observed that the rotation angle about the X axis is computed more robustly than the other two rotation angles.
Finally, the effect of the translation error between cameras is investigated. The relative position noise is varied up to 0.5 mm. As seen in Figure 12, the 6-DOF measurement error again shows a linear relationship to the translation calibration error [28]. Furthermore, another simulation is carried out to compare the measurement accuracy of a single camera and the multi-camera rig in the same setup (see Figure 8). With a focal length of 2.4 mm, the single camera covers almost as wide a view as the multi-camera rig and observes all 15 markers. Normally distributed noise, including marker position noise (0.1 mm), image point noise (0.1 pixel), and spatial relationship noise (only for the multi-camera rig), is then added to simulate the 6-DOF measurement error. The spatial relationship noise is composed of the rotation noise (0.005°) and the translation noise (0.1 mm) between cameras, which are typical values for the calibration of non-overlapping cameras. The results in Table 1 show that the multi-camera rig gives a higher accuracy than the single camera.
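The structure of such a Monte Carlo error simulation can be sketched as follows. This is a toy illustration, not the simulation used in this section: the space resection solver is replaced by a stand-in estimator (the centroid of the marker coordinates), and all names are hypothetical. Only the loop of adding normally distributed noise and accumulating the RMS error against the noise-free result reflects the procedure described above.

```python
import math
import random


def monte_carlo_rms(estimator, true_inputs, sigma, samples=10_000, seed=0):
    """RMS error of each estimator output under Gaussian input noise.

    estimator maps a list of 3D points to a list of output values;
    sigma is the standard deviation of the noise added to every coordinate.
    """
    rng = random.Random(seed)
    truth = estimator(true_inputs)               # noise-free reference values
    sq_sum = [0.0] * len(truth)
    for _ in range(samples):
        noisy = [[c + rng.gauss(0.0, sigma) for c in point] for point in true_inputs]
        estimate = estimator(noisy)
        for k, (e, t) in enumerate(zip(estimate, truth)):
            sq_sum[k] += (e - t) ** 2
    return [math.sqrt(s / samples) for s in sq_sum]


def centroid(points):
    """Stand-in estimator (assumption): mean position of the points."""
    n = len(points)
    return [sum(p[k] for p in points) / n for k in range(3)]
```

For this stand-in estimator with five markers and a noise level of 0.2 mm, the RMS error of each output approaches 0.2 / √5 ≈ 0.089 mm, and doubling the noise doubles the RMS error. A linear dependence of this kind is what Figures 9 to 12 report for the actual 6-DOF solution.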

Experiment
Before the experiments, the multi-camera rig is fixed to a helmet. Interior orientation parameters and spatial relationship parameters of three cameras are calibrated in a large-scale spatial photogrammetric test field. Then the following experiments are conducted to evaluate the performance of the proposed method.

Static Measurement Experiment
The static measurement experiment is conducted in a 5 m × 5 m × 3 m measurement field (see Figure 13). Ten LED markers are deployed, and their spatial coordinates are measured using a Leica AT901 laser tracker.

Measurement Repeatability
The helmet is randomly placed at ten different positions in the measurement field, and five images are captured at each position. All feature points are extracted, and the pixel coordinate repeatability of each point is shown in Figure 14. The results show that the extraction precision of the feature points along either axis is better than 0.01 pixel. Then the 6-DOF $(X_0, Y_0, Z_0, \theta, \phi, \kappa)$ of the multi-camera rig are calculated, and the 6-DOF measurement repeatability at each position is analyzed. As shown in Figure 15, the standard deviations of the positions along the global coordinate axes are less than 0.5 mm, and the standard deviations of the three rotations are less than 0.01°.

Distance Measurement
The helmet is mounted on a motorized translation stage with a travel range of 1000 mm (see Figure 13), and the straightness error of the translation stage is less than 0.02 mm. The translation stage is placed in the measurement field and set to travel 900 mm each time. The positions of the multi-camera rig are obtained before and after translation, so the travel distance $D$ can be calculated as the Euclidean distance between the two positions. In addition, an SMR is fixed on the helmet so that the laser tracker can provide accurate travel distances as reference values. Nine sets of results are acquired with the translation stage placed at nine different positions and directions, and the measurement errors of the travel distances are shown in Table 2. Using the multi-camera rig, the RMS error of distance measurement is 0.383 mm.
In order to evaluate the performance of dynamic measurement, the operating speed is tested with an implementation in C++ (Visual Studio 2013) on a laptop with an Intel(R) Core(TM) i7-6700HQ CPU at 2.60 GHz and 8 GB RAM. The test is conducted with ten LED markers, and the maximum time consumption is shown in Figure 16. A single measurement takes approximately 33.9 ms, of which feature extraction accounts for 85%.
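The distance evaluation can be sketched in a few lines of Python. The function names and the sample numbers in the usage note are illustrative assumptions; they are not the values of Table 2.

```python
import math


def travel_distance(start, end):
    """Euclidean distance between rig positions before and after translation."""
    return math.dist(start, end)


def rms_error(measured, reference):
    """Root mean square of the differences between measured and reference values."""
    if not measured or len(measured) != len(reference):
        raise ValueError("measured and reference must be non-empty and of equal length")
    total = sum((m - r) ** 2 for m, r in zip(measured, reference))
    return math.sqrt(total / len(measured))
```

For example, hypothetical measured distances of 900.3 mm and 899.6 mm against references of 900.0 mm give an RMS error of about 0.354 mm.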

6-DOF Measurement
As shown in Figure 17, the helmet and the Leica T-Mac are both mounted on a three-axis turntable to assess the dynamic 6-DOF measurement accuracy. In this experimental setup, the T-Mac has a rotational accuracy of 0.01° and a positional accuracy of about 30 µm. The spatial relationship between the two devices remains constant, no matter how the turntable rotates. Ten LED markers are deployed about five meters in front of the turntable, and their three-dimensional coordinates are measured using a Leica AT901 laser tracker. As a result, the positions and orientations of the helmet and the T-Mac are unified in the laser tracker coordinate system $O$-$XYZ$.
The turntable is set to rotate 20°, 15°, and 10° about its outer, middle, and inner axes, respectively, at an angular velocity of 5°/s, and then it returns to the starting position. In the feature matching process, the NN method is applied with the threshold $\lambda = 0.3$. The multi-camera rig and the T-Mac are triggered synchronously at 20 Hz, and their motion trajectories are shown in Figure 18.
A test is carried out to validate the proposed feature correspondence method. Here, the sampling interval is altered to simulate different angular velocities of the turntable. For example, to simulate an angular velocity of 10°/s, every second image is selected, so that half of the obtained images are used. Feature correspondence between adjacent images is then performed with and without motion prediction. The numbers of image mismatches at different angular velocities are listed in Table 3. The results indicate that motion prediction helps to identify correct correspondences.
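The subsampling used to emulate faster rotation can be sketched as follows. The function names are hypothetical; the 20 Hz trigger rate and the 5°/s turntable velocity are taken from the experiment description.

```python
def subsample(frames, step):
    """Keep every step-th frame, starting with the first."""
    if step < 1:
        raise ValueError("step must be at least 1")
    return frames[::step]


def simulated_angular_velocity(base_velocity_deg_s, step):
    """Angular velocity emulated by keeping every step-th frame."""
    return base_velocity_deg_s * step


def rotation_per_frame(velocity_deg_s, frame_rate_hz):
    """Rotation between two adjacent frames, in degrees."""
    return velocity_deg_s / frame_rate_hz
```

At 20 Hz and 5°/s, adjacent frames differ by 0.25° of rotation. Keeping every second frame doubles this to 0.5°, which matches a turntable rotating at 10°/s recorded at the original frame rate.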
Moreover, the six degrees of freedom of the helmet in the T-Mac coordinate system $O_T$-$X_TY_TZ_T$ at each triggering moment are also acquired (see Figure 19). Based on the 600 sets of data obtained, the standard deviations of the six degrees of freedom are listed in Table 4. The standard deviations of dynamic measurement are slightly larger than those of static measurement. This is probably caused by the time synchronization error in triggering the multi-camera rig and the T-Mac, which needs to be further verified.

Table 4. Standard deviations of dynamic six-degree-of-freedom (6-DOF) measurement.

Parameter                     Standard Deviation
Angle about X_T axis (°)      0.0137
Angle about Y_T axis (°)      0.0140
Angle about Z_T axis (°)      0.0147
X_T position (mm)             0.829
Y_T position (mm)             0.640
Z_T position (mm)             0.834

Conclusions
In this paper, a multi-camera rig with excellent portability and high reliability is presented for dynamic 6-DOF measurement. The multi-camera rig increases the overall field of view significantly, while measurement accuracy is maintained by space resection adjustment. The LED markers offer a flexible and robust solution for complex manufacturing sites. An improved Nearest Neighbor method is employed for feature correspondence under dynamic conditions. The proposed global tracking method is validated by simulations and experiments, which demonstrate good static and dynamic measurement performance.
Considering that the proposed feature matching method is suited to slow-moving conditions, an inertial measurement unit (IMU) will be incorporated in future research. With its high measuring frequency, an IMU provides accurate short-term compensation of position and orientation for vision measurement. Meanwhile, vision measurement can effectively correct the drift error of the IMU. Therefore, visual-inertial tracking is a promising method for dealing with fast and intricate movements.