Article

Wand-Based Calibration of Unsynchronized Multiple Cameras for 3D Localization

Sujie Zhang and Qiang Fu
1 Tianjin College, University of Science and Technology Beijing, Tianjin 301830, China
2 School of Intelligence Science and Technology, University of Science and Technology Beijing, Beijing 100083, China
3 Institute of Artificial Intelligence, University of Science and Technology Beijing, Beijing 100083, China
4 Key Laboratory of Intelligent Bionic Unmanned Systems, Ministry of Education, University of Science and Technology Beijing, Beijing 100083, China
* Author to whom correspondence should be addressed.
Sensors 2024, 24(1), 284; https://doi.org/10.3390/s24010284
Submission received: 7 December 2023 / Revised: 29 December 2023 / Accepted: 31 December 2023 / Published: 3 January 2024
(This article belongs to the Special Issue Sensors and Techniques for Indoor Positioning and Localization)

Abstract: Three-dimensional (3D) localization plays an important role in visual sensor networks. However, the frame rate and flexibility of existing vision-based localization systems are limited by the use of synchronized multiple cameras. To address this issue, this paper focuses on developing an indoor 3D localization system based on unsynchronized multiple cameras. First of all, we propose a calibration method for unsynchronized perspective/fish-eye cameras based on timestamp matching and pixel fitting by using a wand under general motions. With the multi-camera calibration result, we then design a localization method for the unsynchronized multi-camera system based on the extended Kalman filter (EKF). Finally, extensive experiments were conducted to demonstrate the effectiveness of the established 3D localization system. The obtained results provide valuable insights into the camera calibration and 3D localization of unsynchronized multiple cameras in visual sensor networks.

1. Introduction

Multi-camera localization is currently used in many fields, e.g., the rapidly developing field of autonomous driving [1,2]. However, in many real-world scenarios such as swarm formations and mobile robots, multiple cameras often constitute an unsynchronized multi-camera localization system (UMCLS) without additional synchronization processing [3]. The main challenge posed by unsynchronized cameras is that each scene point is captured at a different instant by each camera. This induces triangulation errors because the two image rays either intersect at an incorrect location or simply do not intersect at all [4]. Therefore, the lack of synchronization introduces large errors into traditional multi-camera calibration and localization algorithms, which are designed on the basis of synchronized cameras. This paper addresses the challenge posed by unsynchronized cameras by using timestamp matching and pixel fitting.
For a UMCLS, the first step is to perform multi-camera calibration, which aims to accurately compute the intrinsic parameters (principal point, lens distortion, etc.) and the extrinsic parameters (rotation matrix and translation vector between the camera coordinate system and the reference coordinate system) of each camera. Multi-camera calibration is the basis of 3D localization since the calibration results are subsequently used during the 3D localization process. The localization accuracy of a UMCLS is directly determined by the calibration accuracy of the cameras, so multi-camera calibration is very important. According to the dimension of the calibration object, existing multi-camera calibration methods can be roughly divided into five kinds: methods based on 3D calibration objects [5,6], methods based on 2D calibration objects [7,8], methods based on 1D calibration objects [9], methods based on point objects [10], and self-calibration methods [11,12,13,14,15,16]. Note that methods based on 1D calibration objects can quickly and easily complete the calibration of multiple cameras without being affected by occlusion [17], so this paper chooses the 1D calibration approach.
To date, most multi-camera calibration methods have been developed for synchronized cameras, where synchronization is controlled by a hardware trigger. However, in many cases, the camera frame rates are different, the cameras work asynchronously, and the acquired image sequences are naturally not matched. Therefore, image synchronization and camera calibration are usually carried out simultaneously for multiple unsynchronized cameras. From the perspective of the calibration object, existing calibration methods for unsynchronized cameras can generally be classified into two kinds: methods based on point objects [18,19,20] and methods based on 3D calibration objects [21]. As mentioned above, 1D calibration methods are quite suitable for calibrating multiple cameras, and a practical 1D calibration method needs to be designed for a UMCLS.
On the other hand, 3D localization is a crucial function for unsynchronized multiple cameras, and there are some related research works. For example, Benrhaiem et al. [22] proposed a temporal offset-invariant 3D reconstruction method to solve the problem of camera unsynchronization. Their approach only deals with stereo cameras based on the epipolar geometry. Piao and Sato [23] proposed a method to achieve the synchronization of multiple cameras and compute the epipolar geometry from uncalibrated and unsynchronized cameras. In particular, they used the affine invariance on the frame numbers of camera images to find the synchronization. Considering that 3D localization needs to meet the real-time requirement in practice, the acquired unsynchronized multi-camera images should be quickly processed. Therefore, a fast and high-precision feature point localization method needs to be designed for a UMCLS.
In this paper, in order to solve the problem of unsynchronization, we propose a time synchronization scheme and a wand-based calibration method to complete the calibration of multiple perspective/fish-eye cameras. Then, we propose an EKF-based unsynchronized 3D localization method. Finally, we built a real UMCLS and verified its performance by using a flapping-wing micro air vehicle (FWAV) to accomplish fixed-height experiments. The main contributions of this study are as follows:
(1) For unsynchronized perspective or fish-eye cameras, a wand-based multi-camera calibration method is proposed by using timestamp matching and pixel fitting, and the calibration result was verified by real experiments.
(2) An EKF-based 3D localization algorithm is proposed for unsynchronized perspective or fish-eye cameras.
(3) An actual UMCLS was built, and the performance of the system was evaluated through the feature point reconstruction experiments and fixed-height experiments of an FWAV.
The remainder of this paper is organized as follows. The problem formulation, designed calibration algorithm, and designed 3D localization algorithm are introduced in Section 2. Section 3 presents the results and a discussion of the calibration and fixed-height experiments. Section 4 is the conclusion of this paper.

2. Materials and Methods

2.1. Preliminaries and Problem Formulation

2.1.1. General Camera Model

In order to broaden the applicability of the proposed methods to different cameras, this paper adopts the general camera model proposed by Kannala et al. [24], which is suitable for perspective cameras as well as fish-eye cameras. As shown in Figure 1, a real fish-eye lens does not exactly follow the conventional perspective model. A general form of the radial projection is:
$r(\theta) = k_1 \theta + k_2 \theta^3 + k_3 \theta^5 + k_4 \theta^7 + k_5 \theta^9 + \cdots$   (1)
where $\theta$ is the angle between the principal axis and the incident ray and $r$ is the distance between the image point and the principal point. Even powers are omitted so that $r$ can be extended to the negative semi-axis as an odd function, and the odd powers span the set of continuous odd functions. Note that the first five terms already provide enough degrees of freedom to approximate different projection curves. Therefore, the radially symmetric part of the camera model only contains the first five parameters $k_1, k_2, k_3, k_4, k_5$.
Let F be the mapping from the incident light to standardized image coordinates:
$\begin{bmatrix} x \\ y \end{bmatrix} = r(\theta) \begin{bmatrix} \cos\varphi \\ \sin\varphi \end{bmatrix} = \mathcal{F}(\Phi)$   (2)
where $r(\theta)$ contains the projection model of the first five terms of Equation (1) and $\Phi = (\theta, \varphi)$ is the direction of the incident ray. For a real lens, the values of the parameters $k_i$ make $r(\theta)$ monotonically increase over the interval $[0, \theta_{\max}]$, where $\theta_{\max}$ is the maximum angle of view. Therefore, when calculating the inverse of $\mathcal{F}$, the roots of the ninth-order polynomial can be found numerically and, then, real roots between $0$ and $\theta_{\max}$ can be selected.
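To make this inversion concrete, the following is a minimal sketch (not the authors' implementation) of the backward projection step: the ninth-order polynomial from Equation (1) is solved numerically with NumPy and a real root in $[0, \theta_{\max}]$ is selected. The function name and the tolerance used to detect real roots are illustrative assumptions.

```python
# Minimal sketch of inverting r(theta): solve the ninth-order polynomial of
# Eq. (1) numerically and keep a real root in [0, theta_max]. Illustrative only.
import numpy as np

def invert_radial(r, k, theta_max):
    """Recover theta from the radial distance r for coefficients k = (k1, ..., k5)."""
    k1, k2, k3, k4, k5 = k
    # k5*t^9 + k4*t^7 + k3*t^5 + k2*t^3 + k1*t - r = 0, highest degree first
    coeffs = [k5, 0.0, k4, 0.0, k3, 0.0, k2, 0.0, k1, -r]
    roots = np.roots(coeffs)
    real = roots[np.abs(roots.imag) < 1e-9].real   # keep (numerically) real roots
    valid = real[(real >= 0.0) & (real <= theta_max)]
    return valid.min() if valid.size else None
```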
Assuming that the pixel coordinate system is orthogonal, we can obtain the pixel coordinates $[u, v]^T$:
$\begin{bmatrix} u \\ v \end{bmatrix} = \begin{bmatrix} m_u & 0 \\ 0 & m_v \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} + \begin{bmatrix} u_0 \\ v_0 \end{bmatrix}$   (3)
where $[u_0, v_0]^T$ is the center point, that is, the pixel coordinate corresponding to the center of the imaging plane, and $m_u$ and $m_v$ are the numbers of pixels per unit distance in the horizontal and vertical directions, respectively.
By combining Equations (2) and (3), we obtain the forward camera model:
$\mathbf{m} = \mathcal{P}_c(\Phi)$   (4)
where $\mathbf{m} = [u, v]^T$. The $\mathcal{P}_9$ camera model with parameters $(k_1, k_2, k_3, k_4, k_5, u_0, v_0, m_u, m_v)$ is used in this paper. The nine parameters consist of the five parameters of the radially symmetric part, the center point, and the numbers of pixels per unit distance in the horizontal and vertical directions.
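The forward model of Equations (1)-(4) can be summarized by the following minimal sketch; it is not the authors' code, and the parameter values in the usage example are illustrative assumptions only.

```python
# Minimal sketch of the P9 forward camera model of Eqs. (1)-(4).
import numpy as np

def project_p9(theta, phi, k, mu, mv, u0, v0):
    """Map an incident-ray direction (theta, phi) to pixel coordinates [u, v]."""
    k1, k2, k3, k4, k5 = k
    # Radially symmetric projection, first five terms of Eq. (1)
    r = k1*theta + k2*theta**3 + k3*theta**5 + k4*theta**7 + k5*theta**9
    # Standardized image coordinates, Eq. (2)
    x, y = r*np.cos(phi), r*np.sin(phi)
    # Pixel coordinates, Eq. (3)
    return np.array([mu*x + u0, mv*y + v0])

# Usage example with hypothetical parameters (roughly an equidistant fish-eye lens)
pixel = project_p9(theta=0.5, phi=0.3, k=(1.0, -0.05, 0.002, 0.0, 0.0),
                   mu=320.0, mv=320.0, u0=320.0, v0=240.0)
```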

2.1.2. Problem Formulation

As shown in Figure 2, suppose we use two wired cameras and one wireless camera in our unsynchronized multi-camera system, and their camera coordinate systems are $O_{c_i}\text{-}X_{c_i}Y_{c_i}Z_{c_i}$ ($i = 0, 1, 2$). The world coordinate system $O_w\text{-}X_w Y_w Z_w$ is determined manually (generally, on the ground). In this paper, our main work is to first calibrate the internal and external parameters of the unsynchronized cameras and obtain the conversion relationship between each camera coordinate system $O_{c_i}\text{-}X_{c_i}Y_{c_i}Z_{c_i}$ and the world coordinate system $O_w\text{-}X_w Y_w Z_w$; then, to complete the 3D localization of the feature points by using the unsynchronized cameras and accomplish real-time tracking of the feature points; and, finally, to build a UMCLS that completes the fixed-height control of an FWAV to verify the system performance. The flowchart of the proposed multi-camera calibration method and the EKF-based 3D localization algorithm is shown in Figure 3.

2.2. Calibration Algorithm

This section presents the core of our method for solving the unsynchronization problem. The idea is that the acquired camera images are pre-processed, i.e., synchronized, so that the unsynchronized multi-camera calibration based on the 1D wand can be completed. In the following, image synchronization is handled separately for the two-camera and multi-camera cases, and a corresponding processing method is designed for the loss of marker points.

2.2.1. Pre-Processing for Two Cameras

As shown in Figure 4, since the frame rates of the two cameras are different, their image sequences do not match frame by frame. Suppose that Cam1, with the lower frame rate, is used as the benchmark; we want to obtain, from the image sequence of Cam2, the image frames matching the image sequence of Cam1. Assume that the frame rates of Cam1 and Cam2 are M fps and N fps, respectively. As shown in Figure 5, the two line segments represent the shooting time sequences of the two cameras, and the points on each line segment represent the times at which the camera captures images. $\Delta t_0$ represents the time delay between the first frames of the two acquired image sequences. Since the image sequence of Cam1, with the lower frame rate, is selected as the reference, the image sequence information of Cam1 is completely retained:
$\mathrm{Cam1\_new}[i] = \mathrm{Cam1}[i]$   (5)
where i is the serial number of the image, denoting that it is the ith image obtained by Cam1. $\mathrm{Cam1}[i]$ represents the coordinate information of all the marker points in the ith image frame. Since the 1D wand used in this paper (see Figure 6) has three markers, $\mathrm{Cam1}[i]$ denotes the pixel coordinates of the three markers. $\mathrm{Cam1\_new}[i]$ represents the new image information of the ith image frame after pre-processing.
Assuming that the frame rates of the two cameras remain unchanged, the following formula can be obtained:
$\Delta t = \left(\frac{i}{M} + \Delta t_0\right) \times N - \left\lfloor \left(\frac{i}{M} + \Delta t_0\right) \times N \right\rfloor$   (6)
where $\lfloor \cdot \rfloor$ means rounding down, that is, taking the largest integer not greater than its argument, and $\Delta t$ represents the fractional position in time of the ith image frame of Cam1 between the two corresponding image frames of Cam2. Then, we use the interpolation fitting method to obtain:
$\mathrm{Cam2\_new}[i] = (1 - \Delta t) \times \mathrm{Cam2}\left[\left\lfloor \left(\tfrac{i}{M} + \Delta t_0\right) \times N \right\rfloor\right] + \Delta t \times \mathrm{Cam2}\left[\left\lfloor \left(\tfrac{i}{M} + \Delta t_0\right) \times N \right\rfloor + 1\right]$   (7)
where i is the serial number of the image and $\mathrm{Cam2\_new}[i]$ represents the new image information of the ith image frame after processing. Note that the above equations are derived under ideal conditions: the frame rate of each camera must be stable, and the initial delay $\Delta t_0$ must be accurately measured. These two conditions are difficult to guarantee in practice, and the equations rely heavily on the accuracy of these initial parameters; once a value contains errors or changes, the subsequent calculations are strongly affected and synchronization cannot be achieved. Therefore, we simplified the method by stamping every image frame from each camera with a timestamp.
$\mathrm{Cam2\_new}[N_t] = \dfrac{(t_2 - t) \times \mathrm{Cam2}[N] + (t - t_1) \times \mathrm{Cam2}[N+1]}{t_2 - t_1}$   (8)
As shown in Equation (8), t is the timestamp corresponding to the $N_t$th frame in Cam1. We search the images captured by Cam2 for two adjacent frames with timestamps satisfying $t_1 < t \le t_2$, where $t_1$ and $t_2$ correspond to the timestamps of the Nth image frame $\mathrm{Cam2}[N]$ and the (N+1)th image frame $\mathrm{Cam2}[N+1]$, respectively.
Although each image frame needs to be timestamped, the time error does not accumulate. In addition, since the timestamp is added when an image is received at the ground station, the method does not depend on the cameras' built-in clocks, which are unsynchronized. However, the time delay of transmitting the image data is unstable for each camera; this problem is handled in the subsequent optimization of the camera parameters.
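As an illustration of this timestamp-based pixel fitting, the following is a minimal sketch under the assumption that each camera yields a time-ordered list of (timestamp, markers) pairs, where markers is a (3, 2) array of the three wand-marker pixel coordinates; the function and variable names are illustrative, not the authors' implementation.

```python
# Minimal sketch of the timestamp matching and pixel fitting of Eq. (8).
import numpy as np

def synchronize_to_reference(ref_frames, other_frames):
    """Interpolate 'other_frames' at the timestamps of the reference camera.

    ref_frames, other_frames: lists of (t, markers), sorted by timestamp,
    where markers is a (3, 2) ndarray of wand-marker pixel coordinates.
    Returns a list of (t, fitted_markers) matched to the reference sequence.
    """
    synced = []
    for t, _ in ref_frames:
        # Find adjacent frames of the other camera with t1 < t <= t2
        for (t1, m1), (t2, m2) in zip(other_frames, other_frames[1:]):
            if t1 < t <= t2:
                w = (t - t1) / (t2 - t1)              # linear weight of Eq. (8)
                synced.append((t, (1 - w) * m1 + w * m2))
                break
    return synced
```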

2.2.2. Pre-Processing for Multiple Cameras

Following the synchronization method for two cameras, the camera with the lowest frame rate is selected as the reference for multiple cameras, and new image sequences are obtained for the other cameras that match the image sequence of the reference camera:
$\mathrm{Cam}n\_\mathrm{new}[N_t] = \dfrac{(t_{n2} - t) \times \mathrm{Cam}n[N_n] + (t - t_{n1}) \times \mathrm{Cam}n[N_n + 1]}{t_{n2} - t_{n1}}$   (9)
As shown in Equation (9), t is the timestamp corresponding to the $N_t$th image frame of the reference camera. We search the images captured by the other cameras for two adjacent frames with timestamps satisfying $t_{n1} < t \le t_{n2}$, where $t_{n1}$ and $t_{n2}$ correspond to the nth camera's $N_n$th frame image information $\mathrm{Cam}n[N_n]$ and $(N_n + 1)$th frame image information $\mathrm{Cam}n[N_n + 1]$, respectively. For the case in which some of the matched frames are invalid (discussed in Section 2.2.3), the corresponding formula is:
$\mathrm{Cam2\_new}[N_t] = \dfrac{(t_2 - t) \times \mathrm{Cam2}[N_{t_1}] + (t - t_1) \times \mathrm{Cam2}[N_{t_2}]}{t_2 - t_1}$   (10)

2.2.3. Loss of Marker

When a marker is occluded or moves out of the camera's field of view, the camera cannot capture the information of that marker. The marker coordinates are recorded only when all three markers are captured by the same camera; otherwise, the image frame is regarded as an invalid frame during the calibration process.
In Equation (8), for the timestamp t of the reference camera sequence, even if the condition $t_1 < t \le t_2$ is met, it is possible that the two adjacent frames are not both valid. In that case, we need to look for valid frames before $t_1$ or after $t_2$. The formula for the case of invalid frames is given in Equation (10), where $\mathrm{Cam2}[N_{t_1}]$ represents the image frame of Cam2 corresponding to the valid timestamp $t_1$ and $N_{t_1}$ represents the sequence number of the original image sequence of Cam2 at $t_1$; $\mathrm{Cam2}[N_{t_2}]$ and $N_{t_2}$ are defined analogously.
This is carried out to obtain as much image data as possible for camera calibration. However, if the matched timestamps differ too much, the fitted result will have large errors, so we set the rule:
$\left| t_1 - t \right| \le \dfrac{2}{M}, \qquad \left| t_2 - t \right| \le \dfrac{2}{M}$   (11)
where M is the frame rate of the current camera.

2.2.4. Multi-Camera Calibration Method

After the above pre-processing is performed on the image sequences acquired by the unsynchronized multi-camera system, the 1D-wand-based multi-camera calibration method proposed in our previous work [25] can be used.
First, the initialization of each camera's internal parameters needs to be completed. By taking pictures of a calibration board with each camera, the method in [24] can be used to complete the calibration of the internal parameters and obtain the initial internal parameters $(k_1^i, k_2^i, m_u^i, m_v^i, u_0^i, v_0^i)$ of each camera.
Then, stereo camera calibration is performed. Given five or more corresponding feature points, the essential matrix can be calculated using the 5-point random sample consensus (RANSAC) algorithm [26]. The initial values of the cameras' external parameters $(R_{01}, \bar{T}_{01})$ can then be obtained through singular-value decomposition [27]. It should be noted that $\bar{T}_{01}$ is the normalized translation vector, and the true translation vector still needs to be recovered. Next, the internal and external parameters of the cameras are nonlinearly optimized. Let $A_j^r$, $B_j^r$, $C_j^r$ be the reconstructed space coordinates of the markers A, B, C in the jth image frame, respectively. Since the three points A, B, C lie on the calibration wand, the error functions can be obtained:
$g_{1,j}(\mathbf{x}) = L_1 - \lVert A_j^r - B_j^r \rVert, \quad g_{2,j}(\mathbf{x}) = L_2 - \lVert B_j^r - C_j^r \rVert, \quad g_{3,j}(\mathbf{x}) = L - \lVert A_j^r - C_j^r \rVert$   (12)
where $\mathbf{x} = \left( k_1^0, k_2^0, m_u^0, m_v^0, u_0^0, v_0^0, k_1^1, k_2^1, m_u^1, m_v^1, u_0^1, v_0^1, r_1, r_2, r_3, t_x, t_y, t_z \right)^T \in \mathbb{R}^{18}$. Note that $(r_1, r_2, r_3)^T \in \mathbb{R}^3$ is another form of the rotation matrix obtained through the Rodrigues formula [27] and $(t_x, t_y, t_z)^T \in \mathbb{R}^3$ is the translation vector.
Based on Equation (12), we can obtain the final objective function:
$\mathbf{x}^{*} = \arg\min_{\mathbf{x}} \sum_{j=1}^{N} \left( g_{1,j}^2(\mathbf{x}) + g_{2,j}^2(\mathbf{x}) + g_{3,j}^2(\mathbf{x}) \right)$   (13)
which can be solved by using the Levenberg–Marquardt method [27].
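To illustrate how the objective of Equations (12) and (13) can be minimized, the following is a minimal sketch using SciPy's Levenberg-Marquardt solver; 'reconstruct_abc' is a hypothetical placeholder for the triangulation of the three markers from the pixel observations of one frame under the parameter vector x, and the wand lengths match Section 3.2.1. This is a sketch under those assumptions, not the authors' implementation.

```python
# Minimal sketch of the wand-length refinement of Eqs. (12)-(13).
import numpy as np
from scipy.optimize import least_squares

L1, L2, L = 130.0, 260.0, 390.0   # wand segment lengths in mm (Section 3.2.1)

def wand_residuals(x, frames, reconstruct_abc):
    """Stack g1, g2, g3 of Eq. (12) over all frames.

    x:               18-dimensional parameter vector of Section 2.2.4
    frames:          per-frame pixel observations of markers A, B, C
    reconstruct_abc: callable (x, frame) -> (A, B, C) reconstructed 3D points
    """
    res = []
    for frame in frames:
        A, B, C = reconstruct_abc(x, frame)
        res += [L1 - np.linalg.norm(A - B),
                L2 - np.linalg.norm(B - C),
                L  - np.linalg.norm(A - C)]
    return np.asarray(res)

# With x0 from the intrinsic initialization and the 5-point RANSAC step:
# sol = least_squares(wand_residuals, x0, args=(frames, reconstruct_abc), method='lm')
```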
The above solution can be refined by bundle adjustment, which involves both the camera parameters and the 3D space points. Bundle adjustment not only reduces the measurement error, but also compensates for effects that cannot be removed by the time synchronization alone. For example, the motion of the calibration wand cannot be guaranteed to be perfectly uniform and linear while it is waved, and the timestamps of the acquired images are affected by fluctuations of the transmission delay. These problems are well handled by bundle adjustment.
Since the 3D space points $A_j$, $B_j$, and $C_j$ are collinear, they satisfy the following relation:
$B_j = f_B(A_j, \phi_j, \theta_j) = A_j + L_1 \cdot \mathbf{n}_j, \quad C_j = f_C(A_j, \phi_j, \theta_j) = A_j + L \cdot \mathbf{n}_j$   (14)
where $L_1$ represents the actual length of AB and $\mathbf{n}_j = (\sin\phi_j \cos\theta_j, \sin\phi_j \sin\theta_j, \cos\phi_j)^T$ denotes the unit direction vector of the calibration wand. In order to improve the accuracy of the camera model, three more parameters $(k_3, k_4, k_5)$ are added to each camera, and their initial values are set to zero. So, the final camera parameters involved in the optimization are:
$\mathbf{x} = \left( k_1^0, k_2^0, m_u^0, m_v^0, u_0^0, v_0^0, k_3^0, k_4^0, k_5^0, k_1^1, k_2^1, m_u^1, m_v^1, u_0^1, v_0^1, k_3^1, k_4^1, k_5^1, r_1, r_2, r_3, t_x, t_y, t_z \right)^T \in \mathbb{R}^{24}$   (15)
Let the function $P_i(\mathbf{x}, M)$ ($i = 0, 1$) denote the projection of the 3D point M onto the ith camera image plane under the parameters $\mathbf{x}$. The final optimization problem, shown in Equation (16), is solved by using the sparse Levenberg–Marquardt method [28]. Based on the stereo calibration results, the internal and external parameters of the multi-camera system can be obtained when the number of cameras is more than two (see [25] for details).
$\min_{\mathbf{x}, A_j, \phi_j, \theta_j} \sum_{i=0}^{1} \sum_{j=1}^{N_1} \left( \left\lVert a_{ij} - P_i(\mathbf{x}, A_j) \right\rVert^2 + \left\lVert b_{ij} - P_i(\mathbf{x}, f_B(A_j, \phi_j, \theta_j)) \right\rVert^2 + \left\lVert c_{ij} - P_i(\mathbf{x}, f_C(A_j, \phi_j, \theta_j)) \right\rVert^2 \right)$   (16)
where $a_{ij}$, $b_{ij}$, and $c_{ij}$ denote the observed image points of the markers A, B, C in the ith camera for the jth frame.
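The collinearity parametrization of Equation (14) can be sketched as follows: each frame's wand is described by marker A and two direction angles, from which B and C follow, which is what keeps the bundle adjustment of Equation (16) consistent with the wand geometry. The names below are illustrative assumptions, not the authors' code.

```python
# Minimal sketch of the collinearity parametrization of Eq. (14).
import numpy as np

L1, L = 130.0, 390.0   # wand lengths AB and AC in mm

def wand_points(A, phi, theta):
    """Return (A, B, C) from marker A and the wand direction angles (phi, theta)."""
    n = np.array([np.sin(phi) * np.cos(theta),
                  np.sin(phi) * np.sin(theta),
                  np.cos(phi)])                  # unit direction of the wand, Eq. (14)
    return A, A + L1 * n, A + L * n
```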

2.3. Three-Dimensional Localization Algorithm

In this section, we propose a real-time tracking algorithm for each feature point based on the extended Kalman filter, which is often used for multi-sensor information fusion, e.g., fusing an IMU and cameras. The state model adopts the traditional linear model:
$\mathbf{x}_k = A \mathbf{x}_{k-1} + \boldsymbol{\gamma}_k$   (17)
where $A \in \mathbb{R}^{6 \times 6}$ is a block-diagonal matrix with each block equal to $\begin{bmatrix} 1 & T_s \\ 0 & 1 \end{bmatrix}$ ($T_s$ denotes the sampling time) and $\boldsymbol{\gamma}_k = \left( 0, \gamma_k^1, 0, \gamma_k^2, 0, \gamma_k^3 \right)^T \in \mathbb{R}^6$ models the motion uncertainties. The measurement model for multiple cameras adopts the model presented in our previous study [29]:
$\mathbf{z}_k = g\left( \mathbf{x}_k, \boldsymbol{\alpha}_i, R_w^{c_i}, T_w^{c_i} \right) + \mathbf{v}_k$   (18)
where $\mathbf{z}_k = \left[ \mathbf{z}_k^{c_1}, \ldots, \mathbf{z}_k^{c_N} \right]^T \in \mathbb{R}^{2N}$ (N is the number of cameras), $g(\cdot) = \left[ g^{c_1}(\cdot), \ldots, g^{c_N}(\cdot) \right]^T \in \mathbb{R}^{2N}$, $\mathbf{x}_k = \left[ X_k, V_{x,k}, Y_k, V_{y,k}, Z_k, V_{z,k} \right]^T \in \mathbb{R}^6$, $\boldsymbol{\alpha}_i \in \mathbb{R}^9$ is the vector of internal parameters of the ith camera, $\mathbf{v}_k = \left[ \mathbf{v}_k^{c_1}, \ldots, \mathbf{v}_k^{c_N} \right]^T \in \mathbb{R}^{2N}$ is the measurement noise, and $R_w^{c_i}$, $T_w^{c_i}$ represent the rotation matrix and translation vector from the ith camera coordinate system to the world coordinate system, respectively. Note that $\boldsymbol{\alpha}_i$ and $R_w^{c_i}$, $T_w^{c_i}$ can be obtained by using the calibration algorithm in Section 2.2.
Since the frame rate of each camera may be different, denote the frame rate of the ith camera by $\mathrm{FR}_i$. For an actual unsynchronized multi-camera system, this paper selects the reciprocal of the maximum frame rate as the sampling time, that is,
$T_s = \dfrac{1}{\max\left( \mathrm{FR}_1, \mathrm{FR}_2, \ldots, \mathrm{FR}_N \right)}$   (19)
The sampling time selected in this way meets the requirements of actual filtering and also reduces the computational burden. In this paper, the prediction equations of the EKF are:
$\hat{\mathbf{x}}_{k,k-1} = A \hat{\mathbf{x}}_{k-1,k-1}, \qquad P_{k,k-1} = A P_{k-1,k-1} A^T + Q_{k-1}$   (20)
The correction equations of the EKF are:
$H_k = \left. \dfrac{\partial G(\mathbf{x})}{\partial \mathbf{x}} \right|_{\mathbf{x} = \hat{\mathbf{x}}_{k,k-1}}, \quad K_k = P_{k,k-1} H_k^T \left( R_k + H_k P_{k,k-1} H_k^T \right)^{-1}, \quad \hat{\mathbf{x}}_{k,k} = \hat{\mathbf{x}}_{k,k-1} + K_k \left( \mathbf{z}_k - G(\hat{\mathbf{x}}_{k,k-1}) \right), \quad P_{k,k} = P_{k,k-1} - K_k H_k P_{k,k-1}$   (21)
where $P_{k,k-1}$ is the prior covariance of the estimation error, $P_{k,k}$ is the posterior covariance of the estimation error, $K_k$ is the Kalman gain matrix at step k, and $G(\cdot)$ denotes the stacked measurement function of Equation (18).
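The prediction and correction steps of Equations (20) and (21) can be sketched as follows, assuming a constant-velocity state $[X, V_x, Y, V_y, Z, V_z]$ and treating the stacked camera measurement function and its Jacobian as user-supplied callables (they depend on the calibrated camera parameters). This is a generic EKF sketch under those assumptions, not the authors' implementation; the frame rates in the example match those used in Section 3.2.3.

```python
# Minimal sketch of the EKF cycle of Eqs. (20)-(21) for 3D localization.
import numpy as np

def make_A(Ts):
    """Block-diagonal state transition matrix with blocks [[1, Ts], [0, 1]] (Eq. (17))."""
    blk = np.array([[1.0, Ts], [0.0, 1.0]])
    return np.kron(np.eye(3), blk)

def ekf_step(x_est, P, z, A, Q, R, g, g_jac):
    """One EKF cycle: predict with the linear model, correct with measurement z."""
    # Prediction, Eq. (20)
    x_pred = A @ x_est
    P_pred = A @ P @ A.T + Q
    # Correction, Eq. (21)
    H = g_jac(x_pred)                        # Jacobian of the measurement function
    S = R + H @ P_pred @ H.T
    K = P_pred @ H.T @ np.linalg.inv(S)      # Kalman gain
    x_new = x_pred + K @ (z - g(x_pred))
    P_new = P_pred - K @ H @ P_pred
    return x_new, P_new

# Sampling time from the highest camera frame rate, Eq. (19)
Ts = 1.0 / max(70.0, 80.0, 110.0)
A = make_A(Ts)
```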

3. Results and Discussion

3.1. System Construction and Experimental Design

As shown in Figure 7, the UMCLS has three cameras. Cam0 and Cam1 communicate with the ground computer through a switch, and Cam2 communicates with the ground computer through a router. The two wires connected to Cam2 supply power to the camera and to the infrared light source around it, respectively. The frame rates of Cam0 and Cam1 can be adjusted, while the frame rate of Cam2 is fixed at 110 Hz. In this way, the UMCLS can be used to study the unsynchronization caused by mixing wired and wireless (WiFi) connections, as well as the influence of different camera frame rates.
Note that all the cameras used in this paper are smart cameras, which pre-process the acquired image and extract the center coordinate of each feature point, so the output of each camera is the processed center coordinate information of the feature points. The proposed methods are suitable for three or more cameras; we simply took three cameras as an example. The proposed methods are also applicable to outdoor scenarios, although feature detection will be challenging in complex outdoor environments.
A sample image of the actual system is shown in Figure 7. We used three CatchBest CZE130MGEHD cameras equipped with three AZURE-0420MM lenses (the focal length is 4 mm, and the field of view is 77.32°). Each camera has an infrared light-emitting diode (LED) light source and an infrared pass filter (850 nm wavelength), as shown in Figure 8. The image resolution is 640 px × 480 px. The ground computer used to process the data was a laptop with a 2.70 GHz AMD Ryzen 7 4800H processor and 16 GB of RAM.
The experimental design is now introduced. We first needed to complete the multi-camera calibration of the UMCLS and verify the quality of the calibration results through the root mean square (RMS) of the reprojection error of each camera. Secondly, after multi-camera calibration, a reconstruction experiment was performed by using some reflective spheres. The accuracy of the 3D localization function is illustrated by calculating the errors between the reconstructed distances and the actual distances. Finally, we used the system to complete the fixed-height flight task of an FWAV.
The system diagram of the fixed-height flight mission is shown in Figure 9. We fixed a reflective ball on the FWAV, and the 3D coordinates of the reflective sphere were reconstructed through the system. The required control command was then calculated by a PID controller, and the control signal was sent to the FWAV through the wireless serial port module. After receiving the control signal, the FWAV adjusts the motor speed and thereby accomplishes the closed-loop control of the fixed-height task. Figure 10 shows the FWAV used in this paper. There are three main components: the reflective sphere, the motor, and the control board.
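A minimal sketch of this closed loop is given below, assuming a simple discrete PID on the reconstructed height Z; the class name, gains, and sampling time are hypothetical placeholders rather than the controller actually flown in the experiments.

```python
# Minimal sketch of the fixed-height control loop described above.
class HeightPID:
    def __init__(self, kp, ki, kd, Ts):
        self.kp, self.ki, self.kd, self.Ts = kp, ki, kd, Ts
        self.integral, self.prev_err = 0.0, 0.0

    def update(self, z_ref, z_measured):
        """Return the motor-speed command for one control step."""
        err = z_ref - z_measured
        self.integral += err * self.Ts
        deriv = (err - self.prev_err) / self.Ts
        self.prev_err = err
        return self.kp * err + self.ki * self.integral + self.kd * deriv

# Example: 1200 mm height reference, updated at the EKF sampling rate
controller = HeightPID(kp=0.8, ki=0.1, kd=0.05, Ts=1.0 / 110.0)
command = controller.update(z_ref=1200.0, z_measured=1180.0)
```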

3.2. Experimental Results

3.2.1. Calibration Experiment

We directly verified the proposed camera calibration method through real experiments. As shown in Figure 6, three infrared reflective balls A, B, C were placed on the 1D calibration wand. The distances between them satisfy:
$L_1 = \lVert AB \rVert = 130\ \mathrm{mm}, \quad L_2 = \lVert BC \rVert = 260\ \mathrm{mm}, \quad L = \lVert AC \rVert = 390\ \mathrm{mm}$
Firstly, we performed calibration experiments on pairs of cameras. By changing the frame rates of the cameras, we compared the performance of the proposed method with that of the method in [9]. Each experiment was performed three times, and the results are shown in Table 1, Table 2 and Table 3.
It can be found from Table 1 that the proposed method can deal with the unsynchronization caused by the different frame rates of the cameras, while the method in [9] cannot. The results in Table 2 and Table 3 show that the proposed method can solve not only the unsynchronization caused by different frame rates, but also that caused by different communication delays.
Then, we adopted the above three cameras to perform multi-camera calibration by using the proposed method. We kept the frame rates of Cam0 and Cam1 identical and conducted two sets of experiments by changing their common frame rate. The calibration results are shown in Table 4 and Table 5.
Since the method in [9] cannot deal with the unsynchronized problem, the calibration results of the three cameras were not compared for the different methods. It can be found from Table 4 and Table 5 that, as the discrepancy of the camera frame rates decreases, the reprojection errors of these cameras generally decrease as well, which is in line with the actual situation. Note that the reprojection errors of Cam0 are relatively large in Table 4. This is probably because the delay of network transmission has a considerable influence on the multi-camera calibration results.

3.2.2. Localization Experiment

After multi-camera calibration, we evaluated the accuracy of the 3D localization by reconstructing the 3D coordinates of the three reflective spheres. The specific implementation was to calculate and compare the reconstructed distances and the actual distances. We evaluated the reconstruction accuracy for the multi-camera calibration results under different circumstances, and the results are shown in Table 6. It was found that, although the frame rates of multiple cameras varied, the reconstruction errors did not obviously change. Note that the maximum reconstruction error was less than 20 mm, which can meet the requirements of many experiments.

3.2.3. Fixed-Height Experiment

We investigated the performance of the UMCLS established in this paper through a fixed-height experiment of the FWAV shown in Figure 10. The reflective sphere fixed on the FWAV can be reconstructed through the UMCLS to obtain its position information in real time. The control quantity was calculated by the PI control method, and the control signal was sent to the FWAV control board to realize the closed-loop control. Note that three cameras were used in the experiment, of which two cameras (with frame rates of 70 Hz and 80 Hz) communicated with the ground computer through a network cable and the remaining camera (with a frame rate of 110 Hz) communicated with the ground computer through WiFi. We completed two fixed-height experiments at 1200 mm and 1400 mm, and the results are shown in Figure 11 and Figure 12, respectively. It can be concluded that the established UMCLS performed well in the fixed-height experiments, and both control errors remained within 20 mm.
Note that only three cameras were adopted in this paper and the coverage area of 3D localization was quite limited. In the future, more cameras can be used so that the coverage area is increased and the FWAV could perform more tasks.

4. Conclusions

In this paper, a wand-based calibration method was proposed to calibrate the internal and external parameters of unsynchronized multiple cameras. Compared with the traditional calibration methods, the superiority of the proposed calibration method is that it can solve the unsynchronized problem caused by different camera frame rates and communication delay. Combined with the extended Kalman filter, a 3D localization algorithm for unsynchronized multiple cameras was proposed. In addition, an actual UMCLS was established with three smart cameras, and the practicality of the localization system was verified through fixed-height control experiments of an FWAV. This paper is helpful for researchers who want to build a UMCLS for 3D localization by using inexpensive and off-the-shelf cameras.

Author Contributions

Conceptualization, S.Z.; methodology, S.Z.; software, S.Z.; validation, S.Z.; investigation, Q.F.; resources, Q.F.; data curation, S.Z.; writing—original draft preparation, S.Z. and Q.F.; writing—review and editing, S.Z. and Q.F.; supervision, Q.F. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Natural Science Foundation of China under Grant 62173031, the Tianjin Education Commission Scientific Research Program Project under Grant 2021KJ066, the Interdisciplinary Research Project for Young Teachers of USTB (Fundamental Research Funds for the Central Universities) under Grant FRF-IDRY-22-029, and the Beijing Top Discipline for Artificial Intelligent Science and Engineering, University of Science and Technology Beijing.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data are contained within the article.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Zeng, Y.; Hu, Y.; Liu, S.; Ye, J.; Han, Y.; Li, X.; Sun, N. Rt3d: Real-time 3-d vehicle detection in lidar point cloud for autonomous driving. IEEE Robot. Autom. Lett. 2018, 3, 3434–3440.
2. Deng, H.; Fu, Q.; Quan, Q.; Yang, K.; Cai, K.-Y. Indoor multi-camera-based testbed for 3-D tracking and control of UAVs. IEEE Trans. Instrum. Meas. 2019, 69, 3139–3156.
3. Sivrikaya, F.; Yener, B. Time synchronization in sensor networks: A survey. IEEE Netw. 2004, 18, 45–50.
4. Huang, Y.; Xue, C.; Zhu, F.; Wang, W.; Zhang, Y.; Chambers, J.A. Adaptive recursive decentralized cooperative localization for multirobot systems with time-varying measurement accuracy. IEEE Trans. Instrum. Meas. 2021, 70, 1–25.
5. Puig, L.; Bastanlar, Y.; Sturm, P.; Guerrero, J.J.; Barreto, J. Calibration of central catadioptric cameras using a DLT-like approach. Int. J. Comput. Vis. 2011, 93, 101–114.
6. Du, B.; Zhu, H. Estimating fisheye camera parameters using one single image of 3D pattern. In Proceedings of the 2011 International Conference on Electric Information and Control Engineering, Wuhan, China, 15–17 April 2011; pp. 1–4.
7. Song, L.; Wu, W.; Guo, J.; Li, X. Survey on camera calibration technique. In Proceedings of the 2013 5th International Conference on Intelligent Human-Machine Systems and Cybernetics, Hangzhou, China, 26–27 August 2013; pp. 389–392.
8. Mei, C.; Rives, P. Single view point omnidirectional camera calibration from planar grids. In Proceedings of the 2007 IEEE International Conference on Robotics and Automation, Rome, Italy, 10–14 April 2007; pp. 3945–3950.
9. Deng, H.; Yang, K.; Quan, Q.; Cai, K.-Y. Accurate and flexible calibration method for a class of visual sensor networks. IEEE Sens. J. 2019, 20, 3257–3269.
10. Zhou, T.; Cheng, X.; Lin, P.; Wu, Z.; Liu, E. A general point-based method for self-calibration of terrestrial laser scanners considering stochastic information. Remote Sens. 2020, 12, 2923.
11. Faugeras, O.D.; Luong, Q.T.; Maybank, S.J. Camera self-calibration: Theory and experiments. In Proceedings of the Second European Conference on Computer Vision, Santa Margherita Ligure, Italy, 19–22 May 1992; pp. 321–334.
12. Maybank, S.J.; Faugeras, O.D. A theory of self-calibration of a moving camera. Int. J. Comput. Vis. 1992, 8, 123–151.
13. Hartley, R.I. Self-calibration of stationary cameras. Int. J. Comput. Vis. 1997, 22, 5–23.
14. Sang, D.M. A self-calibration technique for active vision systems. IEEE Trans. Robot. Autom. 1996, 12, 114–120.
15. Kwon, H.; Park, J.; Kak, A.C. A new approach for active stereo camera calibration. In Proceedings of the 2007 IEEE International Conference on Robotics and Automation, Rome, Italy, 10–14 April 2007; pp. 3180–3185.
16. Khan, A.; Aragon-Camarasa, G.; Sun, L.; Siebert, J.P. On the calibration of active binocular and RGBD vision systems for dual-arm robots. In Proceedings of the 2016 IEEE International Conference on Robotics and Biomimetics (ROBIO), Qingdao, China, 3–7 December 2016; pp. 1960–1965.
17. Zhang, Z. Camera calibration with one-dimensional objects. IEEE Trans. Pattern Anal. Mach. Intell. 2004, 26, 892–899.
18. Velipasalar, S.; Wolf, W.H. Frame-level temporal calibration of video sequences from unsynchronized cameras. Mach. Vis. Appl. 2008, 19, 395–409.
19. Noguchi, M.; Kato, T. Geometric and timing calibration for unsynchronized cameras using trajectories of a moving marker. In Proceedings of the 2007 IEEE Workshop on Applications of Computer Vision (WACV’07), Austin, TX, USA, 21–22 February 2007; pp. 1–6.
20. Matsumoto, H.; Sato, J.; Sakaue, F. Multiview constraints in frequency space and camera calibration from unsynchronized images. In Proceedings of the 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, San Francisco, CA, USA, 13–18 June 2010; pp. 1601–1608.
21. Kim, J.-H.; Koo, B.-K. Convenient calibration method for unsynchronized camera networks using an inaccurate small reference object. Opt. Express 2012, 20, 25292–25310.
22. Benrhaiem, R.; Roy, S.; Meunier, J. Achieving invariance to the temporal offset of unsynchronized cameras through epipolar point-line triangulation. Mach. Vis. Appl. 2016, 27, 545–557.
23. Piao, Y.; Sato, J. Computing epipolar geometry from unsynchronized cameras. In Proceedings of the 14th International Conference on Image Analysis and Processing (ICIAP 2007), Modena, Italy, 10–14 September 2007; pp. 475–480.
24. Kannala, J.; Brandt, S.S. A generic camera model and calibration method for conventional, wide-angle, and fish-eye lenses. IEEE Trans. Pattern Anal. Mach. Intell. 2006, 28, 1335–1340.
25. Fu, Q.; Quan, Q.; Cai, K.-Y. Calibration of multiple fish-eye cameras using a wand. IET Comput. Vis. 2015, 9, 378–389.
26. Nister, D. An efficient solution to the five-point relative pose problem. IEEE Trans. Pattern Anal. Mach. Intell. 2004, 26, 756–770.
27. Hartley, R.; Zisserman, A. Multiple View Geometry in Computer Vision; Cambridge University Press: Cambridge, UK, 2004.
28. Lourakis, M.I.A. Sparse non-linear least squares optimization for geometric vision. In Proceedings of the 11th European Conference on Computer Vision, Heraklion, Crete, Greece, 5–11 September 2010; pp. 43–56.
29. Fu, Q.; Zheng, Z.-L. A robust pose estimation method for multicopters using off-board multiple cameras. IEEE Access 2020, 8, 41814–41821.
Figure 1. Fish-eye camera model.
Figure 2. System schematic.
Figure 3. Flowchart of the proposed camera calibration and 3D localization methods.
Figure 4. Unsynchronized camera image sequences.
Figure 5. Timestamp line segment.
Figure 6. The 1D calibration wand.
Figure 7. System construction.
Figure 8. Image of the camera used in the experiments.
Figure 9. System diagram of fixed-height flight task for the flapping-wing micro air vehicle.
Figure 10. Flapping-wing micro air vehicle.
Figure 11. Fixed-height experiment of 1200 mm.
Figure 12. Fixed-height experiment of 1400 mm.
Table 1. RMS reprojection errors of Cam0 (40 Hz) and Cam1 (70 Hz).
Experiment Number    Proposed (Pixel, Pixel)    [9] (Pixel, Pixel)
1                    (1.308, 1.731)             (>100, >100)
2                    (0.811, 1.048)             (>100, >100)
3                    (1.285, 1.560)             (>100, >100)

Table 2. RMS reprojection errors of Cam0 (60 Hz) and Cam2 (110 Hz).
Experiment Number    Proposed (Pixel, Pixel)    [9] (Pixel, Pixel)
1                    (0.836, 1.098)             (>100, >100)
2                    (3.404, 3.340)             (>100, >100)
3                    (3.636, 4.346)             (>100, >100)

Table 3. RMS reprojection errors of Cam1 (80 Hz) and Cam2 (110 Hz).
Experiment Number    Proposed (Pixel, Pixel)    [9] (Pixel, Pixel)
1                    (1.891, 2.032)             (>100, >100)
2                    (0.684, 0.751)             (>100, >100)
3                    (0.821, 0.794)             (>100, >100)

Table 4. RMS reprojection errors of Cam0 (60 Hz), Cam1 (60 Hz), and Cam2 (110 Hz).
Experiment Number    Cam0 (Pixel)    Cam1 (Pixel)    Cam2 (Pixel)
1                    7.780           2.427           2.743
2                    6.382           4.734           4.445
3                    5.329           2.405           2.650

Table 5. RMS reprojection errors of Cam0 (90 Hz), Cam1 (90 Hz), and Cam2 (110 Hz).
Experiment Number    Cam0 (Pixel)    Cam1 (Pixel)    Cam2 (Pixel)
1                    3.850           3.385           2.936
2                    5.314           1.066           1.143
3                    4.247           2.975           2.539
Table 6. Reconstruction errors of three calibration results.
Experiment Number    60-60-110 Hz (mm)    75-75-110 Hz (mm)    90-90-110 Hz (mm)
1                    7.9496               12.1173              8.2264
2                    1.3425               18.5684              1.6776
3                    4.3363               5.6703               16.1708
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
