Wand-Based Calibration of Unsynchronized Multiple Cameras for 3D Localization

Three-dimensional (3D) localization plays an important role in visual sensor networks. However, the frame rate and flexibility of existing vision-based localization systems are limited by the use of synchronized multiple cameras. To address this limitation, this paper develops an indoor 3D localization system based on unsynchronized multiple cameras. First, we propose a calibration method for unsynchronized perspective/fish-eye cameras based on timestamp matching and pixel fitting, using a wand under general motions. With the multi-camera calibration result, we then design a localization method for the unsynchronized multi-camera system based on the extended Kalman filter (EKF). Finally, extensive experiments were conducted to demonstrate the effectiveness of the established 3D localization system. The obtained results provide valuable insights into the camera calibration and 3D localization of unsynchronized multiple cameras in visual sensor networks.


Introduction
Currently, multi-camera localization is used in many fields, e.g., the currently hot field of autonomous driving [1,2]. However, in many real-world scenarios such as swarm formations and mobile robots, multiple cameras often constitute an unsynchronized multi-camera localization system (UMCLS) without additional synchronization processing [3]. The main challenge posed by unsynchronized cameras is that each scene point is captured at a different instant by each camera. This induces triangulation errors because the two image rays either intersect at an incorrect location or simply do not intersect at all [4]. Therefore, unsynchronization brings large errors to traditional multi-camera calibration and localization algorithms, which are designed on the basis of synchronized cameras. This paper addresses the challenge posed by unsynchronized cameras by using timestamp matching and pixel fitting.
For a UMCLS, the first step is to perform multi-camera calibration, which aims to accurately compute the intrinsic parameters (principal point, lens distortion, etc.) and the extrinsic parameters (rotation matrix and translation vector between the camera coordinate system and the reference coordinate system) of each camera. Multi-camera calibration is the basis of 3D localization, since the calibration results are subsequently used during the process of 3D localization. The localization accuracy of a UMCLS is directly determined by the calibration accuracy of the multiple cameras, so the process of multi-camera calibration is very important. According to the dimension of the calibration object, existing multi-camera calibration methods can be roughly divided into five kinds: methods based on 3D calibration objects [5,6], methods based on 2D calibration objects [7,8], methods based on 1D calibration objects [9], methods based on point objects [10], and self-calibration methods [11][12][13][14][15][16]. Note that methods based on 1D calibration objects can quickly and easily complete the calibration of multiple cameras without being affected by occlusion [17], so this paper chooses to use the 1D calibration method.
Until now, most multi-camera calibration methods have been developed for synchronized cameras, where synchronization is controlled by a hardware trigger. However, in many cases, the camera frame rates are different, the cameras work asynchronously, and the acquired image sequences are naturally not matched. Therefore, image synchronization and camera calibration are usually carried out simultaneously for multiple unsynchronized cameras. From the perspective of the calibration object, existing calibration methods for unsynchronized cameras can be generally classified into two kinds: methods based on point objects [18][19][20] and methods based on 3D calibration objects [21]. As mentioned above, 1D calibration methods are quite suitable for calibrating multiple cameras, and a practical 1D calibration method needs to be designed for a UMCLS.
On the other hand, 3D localization is a crucial function for unsynchronized multiple cameras, and there are some related research works. For example, Benrhaiem et al. [22] proposed a temporal offset-invariant 3D reconstruction method to solve the problem of camera unsynchronization. Their approach only deals with stereo cameras based on the epipolar geometry. Piao and Sato [23] proposed a method to achieve the synchronization of multiple cameras and compute the epipolar geometry from uncalibrated and unsynchronized cameras. In particular, they used the affine invariance on the frame numbers of camera images to find the synchronization. Considering that 3D localization needs to meet the real-time requirement in practice, the acquired unsynchronized multi-camera images should be quickly processed. Therefore, a fast and high-precision feature point localization method needs to be designed for a UMCLS.
In this paper, in order to solve the unsynchronization problem, we propose a time synchronization scheme and a wand-based calibration method to complete the calibration of multiple perspective/fish-eye cameras. Then, we propose an EKF-based unsynchronized 3D localization method. Finally, we built a real UMCLS and verified its performance by using a flapping-wing micro air vehicle (FWAV) to accomplish fixed-height experiments. The main contributions of this study are as follows: (1) For unsynchronized perspective or fish-eye cameras, a wand-based multi-camera calibration method is proposed by using timestamp matching and pixel fitting, and the calibration result was verified by real experiments.
(2) An EKF-based 3D localization algorithm is proposed for unsynchronized perspective or fish-eye cameras.
(3) An actual UMCLS was built, and the performance of the system was evaluated through the feature point reconstruction experiments and fixed-height experiments of an FWAV.
The remainder of this paper is organized as follows. The problem formulation, the designed calibration algorithm, and the designed 3D localization algorithm are introduced in Section 2. Section 3 presents the results and a discussion of the calibration and fixed-height experiments. Section 4 concludes this paper.

General Camera Model
In order to broaden the applicability of the proposed methods to different cameras, this paper adopts the general camera model proposed by Kannala et al. [24], which is suitable for perspective cameras as well as fish-eye cameras. As shown in Figure 1, a real fish-eye lens does not completely follow the conventional perspective model. The radially symmetric part of the projection takes the general form

r(θ) = k1·θ + k2·θ^3 + k3·θ^5 + k4·θ^7 + k5·θ^9, (1)

where θ is the angle between the principal axis and the incident light and r is the distance between the image point and the principal point. Even powers are removed in order to extend r to the negative semi-axis as an odd function, and the odd powers span the set of continuous odd functions. Note that the first five terms already have enough degrees of freedom to approximate different projection curves. Therefore, the radially symmetric part of the camera model only contains the first five parameters k1, k2, k3, k4, k5. Let F be the mapping from the incident light to standardized image coordinates:

F(Φ) = r(θ)·(cos φ, sin φ)^T, (2)

where r(θ) contains the projection model of the first five terms of Equation (1) and Φ = (θ, φ) is the direction of the incident light. For a real lens, the values of the parameters ki make r(θ) monotonically increase over the interval [0, θmax], where θmax is the maximum angle of view. Therefore, when calculating the inverse of F, the roots of the ninth-order polynomial can be found numerically and, then, the real root between 0 and θmax can be selected.
Assuming that the pixel coordinate system is orthogonal, we can obtain the pixel coordinates [u, v]^T:

[u, v]^T = [mu·x + u0, mv·y + v0]^T, (x, y)^T = F(Φ), (3)

where [u0, v0]^T is the center point, that is, the pixel coordinate corresponding to the center of the imaging plane, and mu and mv are the numbers of pixels per unit distance in the horizontal and vertical directions, respectively. By combining (2) and (3), we obtain the forward camera model:

m = [mu·r(θ)·cos φ + u0, mv·r(θ)·sin φ + v0]^T, (4)

where m = [u, v]^T. The P9(k1, k2, k3, k4, k5, u0, v0, mu, mv) camera model is used in this paper. The nine parameters consist of the five parameters of the radially symmetric part, the center point, and the numbers of pixels per unit distance in the horizontal and vertical directions.
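As an illustration, the forward model above can be sketched in a few lines of code. The coefficient values in the usage comment are placeholders for demonstration, not calibrated parameters:

```python
import numpy as np

def project(X, k, u0, v0, mu, mv):
    """Project a 3D point X in the camera frame to pixel coordinates
    under the P9 model: five radial coefficients k = (k1..k5), center
    point (u0, v0), and pixel densities (mu, mv)."""
    x, y, z = X
    theta = np.arctan2(np.hypot(x, y), z)   # angle from the principal axis
    phi = np.arctan2(y, x)                  # azimuth of the incident ray
    # r(theta): odd ninth-order polynomial, Equation (1)
    t = theta
    r = k[0]*t + k[1]*t**3 + k[2]*t**5 + k[3]*t**7 + k[4]*t**9
    # standardized image coordinates scaled to pixels, Equations (2)-(4)
    u = mu * r * np.cos(phi) + u0
    v = mv * r * np.sin(phi) + v0
    return u, v

# A point on the principal axis maps to the center point:
# project((0, 0, 1), (1, 0, 0, 0, 0), 320, 240, 100, 100) -> (320.0, 240.0)
```

With k1 = 1 and k2..k5 = 0, the model reduces to the equidistant projection r = θ, which is the usual initialization for a fish-eye lens.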

Problem Formulation
As shown in Figure 2, suppose we use two wired cameras and one wireless camera in our unsynchronized multi-camera system, and their camera coordinate systems are denoted O ci -X ci Y ci Z ci (i = 0, 1, 2). The world coordinate system O w -X w Y w Z w is determined manually (generally, on the ground). In this paper, our main work was to first calibrate the internal and external parameters of the unsynchronized cameras and obtain the conversion relationship between each camera coordinate system O ci -X ci Y ci Z ci and the world coordinate system O w -X w Y w Z w ; then, to complete the 3D localization of the feature points by using the unsynchronized cameras and accomplish real-time tracking of the feature points; and, finally, to build a UMCLS to complete the fixed-height control of an FWAV to verify the system performance. The flowchart of the proposed multi-camera calibration method and the EKF-based 3D localization algorithm is shown in Figure 3.

Calibration Algorithm
This section is the core of the methods used to solve the unsynchronization problem. The idea is that the acquired camera images are pre-processed, i.e., synchronized, so that the unsynchronized multi-camera calibration based on the 1D wand can be completed. In the following, the two-camera and multi-camera cases are treated separately for image synchronization, and corresponding processing methods are designed for the loss of marker points.

Pre-Processing for Two Cameras
As shown in Figure 4, since the frame rates of the two cameras are different, we obtain two different image sequences. Suppose that Cam1, with the low-frame-rate image sequence, is used as the benchmark; we hope to obtain the image frames matching the image sequence of Cam1 from the image sequence of Cam2. Assume that the frame rates of Cam1 and Cam2 are M fps and N fps, respectively. As shown in Figure 5, the two line segments represent the shooting time sequences of the two cameras, and the end points on each line segment represent the times when the camera captures the images. ∆t0 represents the time delay between the first frames of the two acquired image sequences. Since the image sequence of Cam1, with the low frame rate, is selected as the reference, the image sequence information of Cam1 is completely retained: where i is the serial number of the image, denoting that it is the ith image obtained by Cam1. Cam1[i] represents the coordinate information of all the marker points of the ith image frame. Since the 1D wand used in this paper (see Figure 6) has three markers, Cam1[i] denotes the pixel coordinates of the three markers. Cam1_new[i] represents the new image information of the ith image frame after pre-processing. Assuming that the frame rates of the two cameras remain unchanged, the following formula can be obtained: where ⌊ ⌋ means rounding down, that is, obtaining the largest integer no greater than the number in it, and ∆t represents the time ratio of the two image frames in Cam2 corresponding to the ith image frame of Cam1. Then, we use the interpolation fitting method to obtain: where i is the serial number of the image and Cam2_new[i] represents the new image information of the ith image frame after processing. Note that the above equations are derived under ideal conditions. It is necessary to ensure that the frame rate of each camera is stable and that the initial ∆t0 can be accurately measured. However, these two conditions are difficult to guarantee in practice, and the method relies too heavily on the accuracy of the initial parameters. Once a certain value has errors or changes, it has a great impact on the subsequent calculations, so that the synchronization effect cannot be achieved. We therefore simplified the method by stamping every image frame from each camera with a timestamp.
As shown in Equation (8), t is the timestamp corresponding to the N t th frame in Cam1. We search the images captured by Cam2 for two adjacent frames with timestamps satisfying t 1 < t ≤ t 2 , where t 1 and t 2 correspond to the timestamps of the Nth image frame Cam2[N] and the (N + 1)th image frame Cam2[N + 1], respectively. Although each image frame needs to be timestamped, the time error does not accumulate. In addition, since the timestamp is added when an image is received at the ground station, the unsynchronized built-in clocks of the cameras are not an issue. However, the time delay of transmitting the image data is unstable for each camera, and this problem is addressed in the subsequent optimization of the camera parameters.
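The timestamp-matching and pixel-fitting step described above can be sketched as follows. The linear interpolation of marker pixel coordinates and the array shapes are our illustrative choices; the paper's exact fitting formulas are given in Equations (5)-(8):

```python
import numpy as np

def match_and_fit(t_ref, stamps2, frames2):
    """For a reference timestamp t_ref from Cam1, find the two adjacent
    Cam2 frames with t1 < t_ref <= t2 and linearly interpolate the
    marker pixel coordinates between them.

    stamps2: sorted 1D array of Cam2 timestamps.
    frames2: list of (3, 2) arrays, pixel coords of the three wand markers.
    """
    # index of the first Cam2 timestamp >= t_ref
    j = np.searchsorted(stamps2, t_ref, side='left')
    if j == 0 or j >= len(stamps2):
        return None  # t_ref falls outside the Cam2 sequence
    t1, t2 = stamps2[j - 1], stamps2[j]
    w = (t_ref - t1) / (t2 - t1)  # interpolation weight in (0, 1]
    return (1.0 - w) * frames2[j - 1] + w * frames2[j]
```

For example, a reference timestamp exactly halfway between two Cam2 frames yields the average of their marker coordinates, and a timestamp outside the Cam2 sequence is rejected rather than extrapolated.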

Pre-Processing for Multiple Cameras
Following the synchronization method for two cameras, we select the camera with the lowest frame rate as the reference for multiple cameras, so as to obtain, for each of the other cameras, a new image sequence matching the image sequence of the reference camera.
As shown in Equation (9), t is the timestamp corresponding to the N t th image frame of the reference camera. We search the images captured by the other cameras for two adjacent frames with timestamps satisfying t n1 < t ≤ t n2 , where t n1 and t n2 correspond to the nth camera's N n th frame Camn[N n ] and (N n + 1)th frame Camn[N n + 1], respectively.

Loss of Marker
When a marker is occluded or exceeds the camera's field of view, the camera cannot capture the information of that marker. Only when all three markers are captured by the same camera do we record the marker coordinates; otherwise, the image frame is regarded as an invalid frame during the calibration process.
In Equation (8), for the timestamp t of the reference camera sequence, even if the condition t 1 < t ≤ t 2 is met, it is possible that both adjacent frames are invalid. In that case, we need to look for valid frames before t 1 or after t 2 . The formula for the case of invalid frames is shown in Equation (10), where Cam2[N t 1 ] represents the image frame of Cam2 corresponding to the valid timestamp t 1 and N t 1 represents the sequence number of the original image sequence of Cam2 at t 1 ; Cam2[N t 2 ] is defined analogously. This is done to obtain as much image data as possible for camera calibration. However, if the matched timestamps differ too much, the fitted result will have large errors, so we set the rule: where M is the frame rate of the current camera.

Multi-Camera Calibration Method
After the above pre-processing is performed on the image sequences acquired by the unsynchronized multi-camera system, the 1D-wand-based multi-camera calibration method proposed in our previous work [25] can be used.
First, the initialization of the cameras' internal parameters needs to be completed. By taking pictures of a calibration board with each camera, the method in [24] can be used to complete the calibration of the internal parameters and obtain the initial internal parameters of each camera. Then, stereo camera calibration is performed. Given five or more corresponding feature points, the essential matrix can be calculated using the 5-point random sample consensus (RANSAC) algorithm [26]. Thus, the initial value of the cameras' external parameters (R 01 , T 01 ) can be obtained through singular-value decomposition [27]. It should be noted that T 01 is the normalized translation vector, and the real translation vector needs to be obtained subsequently. Then, the internal and external parameters of the cameras are nonlinearly optimized. Let A r j , B r j , C r j be the reconstructed space coordinates of A, B, C in the jth image frame, respectively. Since the three points A, B, C lie on the calibration wand, the error function can be obtained, where (r 1 , r 2 , r 3 ) T ∈ R 3 is another form of the rotation matrix obtained through the Rodrigues formula [27] and (t x , t y , t z ) T ∈ R 3 is the translation vector.
Based on Equation (12), we can obtain the final objective function, which can be solved by using the Levenberg-Marquardt method [27].
The above solution can be refined by bundle adjustment, which involves both the camera parameters and the 3D space points. Using the bundle adjustment method not only optimizes the measurement error, but also handles the problems that cannot be solved during time synchronization. For example, when the calibration wand is waved, a perfectly uniform linear motion cannot be guaranteed, and the timestamps of the acquired images are affected by fluctuations in the transmission delay. These problems can be well handled by bundle adjustment.
Since the 3D space points A j , B j , and C j are collinear, they satisfy the following relation: where L 1 represents the actual length of AB and n j = (sin ϕ j cos θ j , sin ϕ j sin θ j , cos ϕ j ) T denotes the unit vector of the calibration wand. In order to improve the accuracy of the camera model, three more parameters (k 1 , k 2 , k 3 ) are added to each camera, and their initial values are set to zero. Thus, the final camera parameters involved in the optimization are: Let the function P i (x ′ , M) (i = 0, 1) denote the projection of the 3D point M onto the ith camera image plane under the parameters x ′ . The final optimization problem, as shown in Equation (16), is solved by using the sparse Levenberg-Marquardt method [28]. Based on the stereo calibration results, the internal and external parameters of the multi-camera system can be obtained when the number of cameras is more than two (see [25] for details).
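The wand constraints that drive this optimization can be sketched as a residual function: the reconstructed marker distances should match the known wand lengths, and the three markers should remain collinear. The specific length values below are illustrative assumptions, not the dimensions of the wand used in the paper:

```python
import numpy as np

# Illustrative wand geometry (mm): marker spacings A-B, B-C, A-C.
# These values are assumptions for demonstration only.
L_AB, L_BC, L_AC = 300.0, 200.0, 500.0

def wand_residuals(A, B, C):
    """Residuals for one frame of reconstructed wand markers A, B, C:
    deviation of the three pairwise distances from the known lengths,
    plus a collinearity term (the cross product vanishes when the
    three points lie on a straight line)."""
    d = lambda p, q: np.linalg.norm(np.asarray(p) - np.asarray(q))
    collin = np.linalg.norm(np.cross(np.subtract(B, A), np.subtract(C, A)))
    return np.array([d(A, B) - L_AB, d(B, C) - L_BC, d(A, C) - L_AC, collin])
```

Stacking these residuals over all frames and feeding them to a sparse Levenberg-Marquardt solver (together with the reprojection terms) gives the kind of objective minimized in Equation (16); a perfectly reconstructed wand yields an all-zero residual vector.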

Three-Dimensional Localization Algorithm
In this section, we propose a real-time tracking algorithm for each feature point based on the extended Kalman filter, which is often used for multi-sensor information fusion, e.g., of an IMU and cameras. The state model adopts the traditional linear model: where A ∈ R 6×6 represents a block-diagonal matrix with each block [1, T s ; 0, 1] (T s denotes the sampling time) and γ k = (0, γ k 1 , 0, γ k 2 , 0, γ k 3 ) T ∈ R 6 models the motion uncertainties.
The measurement model for multiple cameras adopts the model presented in our previous study [29]: where α i ∈ R 9 denotes the internal parameters of the ith camera, and R c i w and T c i w represent the rotation matrix and translation vector from the ith camera coordinate system to the world coordinate system, respectively. Note that α i , R c i w , and T c i w can be obtained by using the calibration algorithm in Section 2.2.
Since the frame rate of each camera is different, we denote the frame rate of the ith camera as FR i . For an actual unsynchronized multi-camera system, this paper selects the reciprocal of the maximum frame rate as the sampling time, that is, T s = 1 / max i FR i . The sampling time selected in this way can meet the requirements of actual filtering and also reduces the computational burden. In this paper, the prediction equation of the EKF is: The correction equation of the EKF is: where P k,k−1 is the prior variance of the estimation error, P k,k is the posterior variance of the estimation error, and K k is the Kalman gain matrix at step k.
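The constant-velocity state model and the EKF prediction/correction steps above can be sketched as follows. The noise covariances Q and R and the identity measurement model in the usage below are illustrative placeholders; in the actual system, the measurement function is the camera projection model of Section 2.1 and H is its Jacobian:

```python
import numpy as np

def make_cv_model(frame_rates):
    """Constant-velocity state model for state [x, vx, y, vy, z, vz];
    the sampling time is the reciprocal of the highest camera frame rate."""
    Ts = 1.0 / max(frame_rates)
    blk = np.array([[1.0, Ts], [0.0, 1.0]])
    A = np.kron(np.eye(3), blk)  # block-diagonal 6x6, one block per axis
    return A, Ts

def ekf_predict(x, P, A, Q):
    """Prediction step: prior state and prior error covariance P_{k,k-1}."""
    return A @ x, A @ P @ A.T + Q

def ekf_update(x_pred, P_pred, z, h, H, R):
    """Correction step with measurement z, model h(x), and Jacobian H."""
    S = H @ P_pred @ H.T + R
    K = P_pred @ H.T @ np.linalg.inv(S)        # Kalman gain K_k
    x = x_pred + K @ (z - h(x_pred))
    P = (np.eye(len(x)) - K @ H) @ P_pred      # posterior covariance P_{k,k}
    return x, P
```

For the camera set used later in the paper (70 Hz, 80 Hz, and 110 Hz), this choice gives T s = 1/110 s, so the filter runs at the fastest camera's rate and slower cameras simply contribute corrections less often.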

System Construction and Experimental Design
As shown in Figure 7, the UMCLS has three cameras. Cam0 and Cam1 communicate with the ground computer through a switch. Cam2 communicates with the ground computer through a router. The two wires connected to Cam2 supply power to the camera and to the infrared light source around it, respectively. The frame rates of Cam0 and Cam1 can be adjusted, and the frame rate of Cam2 is fixed at 110 Hz. In this way, the UMCLS can be used to study both the influence of the asynchrony caused by mixing wired and wireless (WiFi) connections and the influence of different camera frame rates.
Note that all the cameras used in this paper are smart cameras, which pre-process the acquired images and extract the center coordinates of each feature point, so the output of each camera is the processed center coordinate information of the feature points. The proposed methods are suitable for three or more cameras; we simply took three cameras as an example. The proposed methods are also applicable to outdoor scenarios, but feature detection will be challenging in complex outdoor environments.
A sample image of the actual system is shown in Figure 7. We used three CatchBest CZE130MGEHD cameras, equipped with three AZURE-0420MM lenses (the focal length is 4 mm, and the field of view is 77.32°). Each camera has an infrared light-emitting diode (LED) light source and an infrared pass filter (850 nm wavelength), as shown in Figure 8. The image resolution is 640 px × 480 px. The ground computer used to process the data was a laptop with a 2.70 GHz AMD Ryzen 7 4800H processor and 16 GB of RAM. The experimental design is now introduced. We first needed to complete the multi-camera calibration for the UMCLS and verify the quality of the calibration results through the root mean square (RMS) of the reprojection error of each camera. Secondly, after multi-camera calibration, a reconstruction experiment was performed by using some reflective spheres. The accuracy of the 3D localization function is illustrated by calculating the errors between the reconstructed distances and the actual distances. Finally, we used the system to complete the fixed-height flight task of an FWAV.
The system diagram of the fixed-height flight mission is shown in Figure 9. We fixed a reflective ball on the FWAV. The 3D coordinates of the reflective sphere were reconstructed through the system. The required control command was then calculated by a PID controller, and the control signal was sent to the FWAV through a wireless serial port module. After receiving the control signal, the FWAV adjusts its motor speed and thereby accomplishes the closed-loop control of the fixed-height task. Figure 10 shows the FWAV used in this paper. It has three main components: the reflective sphere, the motor, and the control board.
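The height loop described above can be sketched with a textbook PID controller; the gains and update rate used below are illustrative, not the values tuned for the paper's FWAV:

```python
class PID:
    """Minimal PID height controller. Inputs and outputs are scalar:
    target/measured height in mm in, motor control command out."""

    def __init__(self, kp, ki, kd, ts):
        self.kp, self.ki, self.kd, self.ts = kp, ki, kd, ts
        self.integral = 0.0
        self.prev_err = None  # no derivative term on the first step

    def step(self, target_mm, measured_mm):
        err = target_mm - measured_mm
        self.integral += err * self.ts
        deriv = 0.0 if self.prev_err is None else (err - self.prev_err) / self.ts
        self.prev_err = err
        return self.kp * err + self.ki * self.integral + self.kd * deriv
```

At each filter step, the reconstructed height of the reflective sphere would be fed in as `measured_mm`, and the returned command sent to the FWAV control board over the wireless serial link.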

Calibration Experiment
We directly verified the proposed camera calibration method through real experiments.As shown in Figure 6, three infrared reflective balls A, B, C were placed on the 1D calibration wand.The distances between them satisfy the following.
First, we performed calibration experiments on the stereo cameras. By changing the frame rates of the cameras, we compared the performance of the method proposed in this paper with that of the method in [9]. Each experiment was performed three times. The results are shown in Tables 1-3. It can be seen from Table 1 that the proposed method can deal with the unsynchronized problem caused by the different frame rates of the cameras, while the method in [9] cannot. The results in Tables 2 and 3 show that the proposed method can solve not only the unsynchronized problem caused by different frame rates, but also that caused by different communication delays.
Then, we adopted the above three cameras to perform multi-camera calibration by using the proposed method. We kept the frame rates of Cam0 and Cam1 identical and, by changing their frame rates, conducted two sets of experiments. The calibration results are shown in Tables 4 and 5. Since the method in [9] cannot deal with the unsynchronized problem, the calibration results of the three cameras were not compared across the different methods. It can be seen from Tables 4 and 5 that, as the discrepancy of the camera frame rates decreases, the reprojection errors of the cameras generally decrease as well, which is in line with the actual situation. Note that the reprojection errors of Cam0 are relatively large in Table 4. This is probably because the delay of network transmission has a considerable influence on the multi-camera calibration results.

Localization Experiment
After multi-camera calibration, we evaluated the accuracy of the 3D localization by reconstructing the 3D coordinates of the three reflective spheres. Specifically, we calculated and compared the reconstructed distances and the actual distances. We evaluated the reconstruction accuracy for the multi-camera calibration results under different circumstances, and the results are shown in Table 6. It was found that, although the frame rates of the cameras varied, the reconstruction errors did not change obviously. Note that the maximum reconstruction error was less than 20 mm, which can meet the requirements of many experiments. We then investigated the performance of the UMCLS established in this paper through a fixed-height experiment of the FWAV shown in Figure 10. The reflective sphere fixed on the FWAV can be reconstructed through the UMCLS to obtain its position information in real time. The control quantity was calculated by the PI control method, and the control signal was sent to the FWAV control board to realize the closed-loop control. Note that three cameras were used in the experiment, of which two cameras (with frame rates of 70 Hz and 80 Hz) communicated with the ground computer through a network cable and the remaining camera (with a frame rate of 110 Hz) communicated with the ground computer through WiFi. We completed two fixed-height experiments at 1200 mm and 1400 mm, and the results are shown in Figures 11 and 12, respectively. It can be concluded that the established UMCLS performed well in the fixed-height experiments, and both control errors were kept within 20 mm.
Note that only three cameras were adopted in this paper, so the coverage area of the 3D localization was quite limited. In the future, more cameras can be used to increase the coverage area so that the FWAV can perform more tasks.

Conclusions
In this paper, a wand-based calibration method was proposed to calibrate the internal and external parameters of unsynchronized multiple cameras. Compared with traditional calibration methods, the superiority of the proposed calibration method is that it can solve the unsynchronized problem caused by different camera frame rates and communication delays. Combined with the extended Kalman filter, a 3D localization algorithm for unsynchronized multiple cameras was proposed. In addition, an actual UMCLS was established with three smart cameras, and the practicality of the localization system was verified through fixed-height control experiments of an FWAV. This paper is helpful for researchers who want to build a UMCLS for 3D localization by using inexpensive and off-the-shelf cameras.

Figure 3. Flowchart of the proposed camera calibration and 3D localization methods.

Figure 8. Image of the camera used in the experiments.

Figure 9. System diagram of the fixed-height flight task for the flapping-wing micro air vehicle.

Table 6. Reconstruction errors of three calibration results.