

This paper describes the development of a modified Kalman filter to integrate a multi-camera vision system and a strapdown inertial navigation system (SDINS) for tracking a hand-held moving device in slow or nearly static applications over extended periods of time. In this algorithm, the changes in position and velocity are estimated and then added to the previous estimates of the position and velocity, respectively. The experimental results of the hybrid vision/SDINS design show that the position error of the tool tip in all directions is about one millimeter RMS. The proposed Kalman filter also removes the effect of the gravitational force from the state-space model. As a result, the error caused by uncertainty in the gravitational force is eliminated and the estimated position is smoother and ripple-free.

It is well known that inertial navigation sensors drift. The inertial sensor drift has two components: bias stability and bias variability. Because these components enter the double integration used to compute position, the output of an Inertial Navigation System (INS) becomes unreliable after a short time, causing unavoidable drift in the orientation and position estimates. Removing this drift requires that the inertial sensors be aided by other resources or technologies, such as Global Positioning Systems (GPS) [

Kalman filtering is a common data fusion technique and a powerful method for improving output estimation and reducing the effect of sensor drift. Although sensor integration is generally based on Kalman filtering, different types of Kalman filters are being developed in this area [

In the past, three-dimensional attitude representations were applied, but these representations are singular or discontinuous for certain attitudes [

In aided inertial motion tracking applications, the state variables of a Kalman filter usually take one of two forms: first, the sensed engineering quantities, that is, acceleration, velocity, and attitude; or second, the errors in these quantities.

A Kalman filter that operates on the error states is called an indirect or a complementary Kalman filter. The optimal estimates of the errors are then subtracted from the sensed quantities to obtain the optimal estimates. Since the 1960s, the complementary Kalman filter has become the standard method of integrating non-inertial with inertial measurements in aviation and missile navigation. This method requires dynamic models for both the navigation variable states and the error states [

This research develops an EKF that estimates the changes in the state variables. The currently estimated changes are then added to the previous estimates of the position and velocity, respectively. Following the general equations of the SDINS, the constant gravitational force term is removed from the resulting equations, so the error due to uncertainty in the gravitational force is eliminated.

Inertial navigation systems are typically employed to provide the present position and heading of a moving vehicle with respect to a known, fixed reference frame. An inertial navigation system localizes the vehicle by measuring the linear and angular components of its motion with inertial sensors, given the initial position and attitude.

Since Microelectromechanical System (MEMS) techniques make it possible to manufacture miniature inertial sensors inexpensively, Strapdown Inertial Navigation Systems (SDINS) have been developed for new applications such as medicine [

In SDINS, MEMS-based inertial sensors are mounted rigidly on the body of a moving object [

Motion analysis of a moving rigid body provides a set of equations determining its trajectory, speed, and attitude. Since the Inertial Measurement Unit (IMU) measures the inertial components at its mounting point, the acceleration and velocity of other points on the body can be computed relative to it. As shown in

As the tool rotates and translates, the body reference frame shown in the figure moves with it. The acceleration of point B is related to that of point A on the same rigid tool by

a_B = a_A + α_b × r_{B/A} + ω_b × (ω_b × r_{B/A}) + 2 ω_b × (v_{B/A})_b + (a_{B/A})_b,

where ω_b and α_b are the angular velocity and angular acceleration of the tool in the body frame, r_{B/A} is the position of B relative to A, and (v_{B/A})_b and (a_{B/A})_b are the relative velocity and acceleration measured in the body frame.

Since both point A and point B are fixed on the tool, the relative acceleration and velocity of point B with respect to A are zero, and
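As an illustration, the rigid-body relation above, with the relative-motion terms removed, can be sketched as follows (a minimal sketch; the function name and argument conventions are assumptions, with all vectors expressed in the body frame):

```python
import numpy as np

def accel_at_tip(a_A, omega, alpha, r_BA):
    """Acceleration of the tool tip (point B) from the IMU reading at
    point A, for two points fixed on the same rigid tool.  All inputs
    are 3-vectors in the body frame; the relative velocity and
    acceleration terms vanish because B does not move along the tool."""
    return a_A + np.cross(alpha, r_BA) + np.cross(omega, np.cross(omega, r_BA))
```

For a pure steady rotation, only the centripetal term ω × (ω × r) contributes to the tip acceleration.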

In order to transform the acceleration of the tool tip from the body frame into the North-East-Down (NED) frame [

The transformation matrix from the body frame to the navigation frame can be expressed in terms of the unit quaternion q = [q_1 q_2 q_3 q_4]^T.

Therefore

Moreover, gravity compensation is required since the accelerometers measure the local gravitational force. As a result, the acceleration of point B with respect to the navigation frame is calculated as:

a_B^n = C_b^n a_B^b + g^n, where g^n = [0 0 g]^T is the local gravity vector.
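The frame transformation and gravity compensation can be sketched as follows, assuming a scalar-last unit quaternion and an NED frame with gravity along the +z (down) axis (function names are hypothetical):

```python
import numpy as np

def quat_to_dcm(q):
    """Rotation matrix C_b^n from a unit quaternion q = [q1, q2, q3, q4]
    (vector part first, scalar last -- one common SDINS convention)."""
    q1, q2, q3, q4 = q
    return np.array([
        [1 - 2*(q2**2 + q3**2), 2*(q1*q2 - q3*q4),     2*(q1*q3 + q2*q4)],
        [2*(q1*q2 + q3*q4),     1 - 2*(q1**2 + q3**2), 2*(q2*q3 - q1*q4)],
        [2*(q1*q3 - q2*q4),     2*(q2*q3 + q1*q4),     1 - 2*(q1**2 + q2**2)],
    ])

def accel_nav(f_b, q, g=9.81):
    """Rotate the body-frame specific force f_b into the NED frame and
    add the local gravity vector g^n = [0, 0, g] (down is positive z)."""
    return quat_to_dcm(q) @ f_b + np.array([0.0, 0.0, g])
```

With the tool at rest and level, the accelerometers read f_b = [0, 0, −g], and the compensated navigation-frame acceleration is zero, as expected.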

As a result, the state space equations of the system can be finalized as:
ṗ^n = v^n
v̇^n = C_b^n a^b + g^n

The navigation frame and the body frame are shown in the corresponding figure. Since the Earth's rotation rate is on the order of 10^{−5} rad/s [

In this research, a vision system is proposed that includes four CCD cameras located on an arc to expand the field of view, see

The calibration of the vision system provides the intrinsic and extrinsic parameters of the cameras [

The camera lens distortion causes both radial and tangential displacements [, with the squared radial distance of an image point from the optical center given by r^2 = x^2 + y^2.

Considering two vectors α and β as the radial and tangential distortion factors of a camera, the distortions can be calculated as [
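A sketch of such a distortion model, assuming the common Brown–Conrady form with two radial (α) and two tangential (β) coefficients (the paper's exact parameterization may differ in ordering):

```python
import numpy as np

def distort(x, y, alpha, beta):
    """Apply radial (alpha = [a1, a2]) and tangential (beta = [b1, b2])
    lens distortion to normalized image coordinates (x, y), following
    the Brown-Conrady model."""
    r2 = x * x + y * y
    radial = 1 + alpha[0] * r2 + alpha[1] * r2 * r2   # radial scaling
    dx = 2 * beta[0] * x * y + beta[1] * (r2 + 2 * x * x)  # tangential shift
    dy = beta[0] * (r2 + 2 * y * y) + 2 * beta[1] * x * y
    return x * radial + dx, y * radial + dy
```

With all coefficients zero the mapping is the identity, which is a quick sanity check when fitting the calibration parameters.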

Consequently, the projection of each point in the world coordinate system onto the image plane is obtained, where f_1 and f_2 denote the focal-length factors of the lens. In fact, f_1 and f_2 are related to the focal length and the dimensions of the pixels, d_px and d_py.

According to the camera model obtained above, the undistorted image coordinates (x_u, y_u) can be computed.

Applying the parameter estimation method [

In order to localize the tool tip, edge detection and boundary extraction must be applied to every frame from each camera. Obtaining the edge of the tool tip requires a thresholding technique: each pixel is detected as an edge if its gradient is greater than the threshold. In this paper, the threshold is chosen such that the boundary pixels of the tool tip are detected as edge positions. Since the tool tip spans only a few pixels, an adaptive thresholding technique is applied to remove the noise pixels around the tool tip as much as possible. For this purpose, a masking window is chosen around the initial guess of the position of the tool tip. Then, a fixed threshold is chosen that selects pixels whose values are above 80% of the values of all pixels in the image. If the boundary detection technique can identify the boundary of the tool tip, the threshold selection is appropriate; otherwise, the previous threshold is reduced by 5%, and this procedure runs recursively until a proper threshold is found. Afterwards, the opening morphological operation followed by the closing operation is applied to simplify and smooth the shape of the tool tip. Finally, the boundary of the tool tip is detected and extracted using the eight-connected neighbors technique.
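The adaptive threshold search can be sketched as follows; here a simple pixel-count check stands in for the paper's boundary-detection test, and the function and parameter names are hypothetical:

```python
import numpy as np

def adaptive_threshold(window, start_frac=0.80, step=0.05, min_pixels=5):
    """Search for an intensity threshold inside the masking window
    around the tool tip.  Start at `start_frac` of the window's
    intensity range and lower it by `step` until enough candidate
    pixels survive (`min_pixels` stands in for the boundary check)."""
    lo, hi = float(window.min()), float(window.max())
    frac = start_frac
    while frac > 0.0:
        thresh = lo + frac * (hi - lo)
        mask = window >= thresh
        if mask.sum() >= min_pixels:   # boundary assumed detectable
            return thresh, mask
        frac -= step                   # reduce the threshold and retry
    return lo, window >= lo            # fall back to the whole window
```

In the full pipeline the returned mask would then be cleaned up with morphological opening and closing before boundary extraction.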

The integrated navigation technique employs two or more independent sources of navigation information with complementary characteristics to achieve an accurate, reliable, and low-cost navigation system.

Integration of SDINS and vision system using EKF.

Typically, an Extended Kalman Filter (EKF) is applied by combining two independent estimates of a nonlinear variable [

Since the measurements are provided at discrete intervals of time, it is appropriate to express the system model in the form of discrete difference equations:

Therefore, the two sets of equations involving the prediction and updating of the state of the system are defined as:
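In generic form, the prediction and update steps can be sketched as follows (a minimal sketch with standard EKF notation; the paper's specific model functions f and h are not reproduced here):

```python
import numpy as np

def ekf_predict(x, P, f, F, Q):
    """Prediction: propagate the state through the (possibly nonlinear)
    model f with Jacobian F, and grow the covariance by Q."""
    return f(x), F @ P @ F.T + Q

def ekf_update(x, P, z, h, H, R):
    """Update: correct the prediction with measurement z, observation
    model h with Jacobian H, and measurement noise covariance R."""
    y = z - h(x)                        # innovation
    S = H @ P @ H.T + R                 # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)      # Kalman gain
    x = x + K @ y
    P = (np.eye(len(x)) - K @ H) @ P
    return x, P
```

The gain K weights the correction by the relative confidence in the prediction and the measurement, which is what lets the vision fixes rein in the SDINS drift between camera frames.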

According to the preceding equations, the estimated changes are added to the previous estimates at each time step.

As a consequence, the computation of the velocity is independent of the gravitational force in the new state-space model. In fact, the error caused by an inaccurate value of the gravitational force is completely eliminated.

The inertial sensor noise is theoretically modeled as a zero-mean Gaussian random process. In practice, the average of the noise is not exactly zero. Owing to the properties of Gaussian random processes, the discrete difference of a zero-mean Gaussian random process is also Gaussian with a much smaller actual mean, while its variance is twice that of the original process. As a result, the drift caused by the input noise is reduced and smooth positioning is expected.
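The variance-doubling and bias-cancellation properties can be checked numerically (a sketch using a synthetic, arbitrarily chosen constant bias of 0.03):

```python
import numpy as np

rng = np.random.default_rng(0)
bias = 0.03                                  # hypothetical small sensor bias
n = bias + rng.normal(0.0, 1.0, 1_000_000)   # biased Gaussian noise samples
d = np.diff(n)                               # discrete difference of the process

# The constant bias cancels in the difference, so the mean of d is far
# smaller than the bias, while the variance of d is about twice the
# variance of the original samples.
```

This is why operating on changes in the state variables, rather than the states themselves, suppresses the constant component of the sensor bias at the cost of a higher (but zero-mean) noise variance.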

The equation of the INS with the state vector x = [p v q]^T is:

Subsequently, the transition matrix [

By considering the change vector Δ = [Δ_1 Δ_2 Δ_3]^T,

Substituting

Therefore:

As a result of Equation (106):

Because the vision system as the measurement system provides the position of the tool tip, velocity can be computed by knowing the present and the previous position at each time step:
v_k = (p_k − p_{k−1}) / T_v, where T_v is the sampling period of the vision system.
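Deriving the velocity measurement from two consecutive vision fixes is then a simple finite difference (the helper name is hypothetical):

```python
import numpy as np

def vision_velocity(p_now, p_prev, dt):
    """Velocity measurement from two consecutive vision-system position
    fixes taken dt seconds apart."""
    return (np.asarray(p_now) - np.asarray(p_prev)) / dt
```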

This section presents the experimental hardware setup and the results of applying the proposed EKF. The experimental hardware includes a 3DM-GX1 IMU from MicroStrain, an IDS Falcon Quattro PCIe frame grabber from IDS Imaging Development Systems, and four surveillance IR-CCD cameras. The IMU contains three rate gyros and three accelerometers with a sampling rate of 100 Hz and a noise density of 3.5 °/

All cameras are connected through the frame grabber to a PC; the grabber includes four parallel video channels able to capture images from four cameras simultaneously at a sampling rate of 20 fps. Since the multi-camera vision system is used as a measurement system, the camera calibration procedure must be performed first. The intrinsic and extrinsic parameters of each camera are listed in

Once the calibration is completed, the vision system is ready to track the tool and measure the position of the tool tip by applying image processing techniques.

It should be mentioned that a predesigned path is printed on a 2D plane and is traced by the tool tip during its movement on the plane, in order to compare the performance of the proposed EKF with that of the conventional EKF reported in [

Sensor fusion techniques allow us to estimate the state variables of the system at the sampling rate of the sensor with the highest measurement rate. In this experiment, the sampling rates of the cameras and the inertial sensors are 20 fps and 100 Hz, respectively. As a result of sensor fusion, the measurement rate of the proposed integrated system is 100 Hz.

The classical EKF is applied in both switch and continuous modes. In the switch mode, the estimates of the state variables are corrected whenever a measurement from the vision system is available; otherwise, the states are estimated based on the SDINS alone. In order to reduce the computational complexity of the image processing algorithms, sensor fusion allows the sampling rate of the vision system to be reduced to 10 fps and 5 fps. As illustrated in

Although it is shown in

The position resulting from the switch EKF is ripply due to the position drift in the SDINS, and the ripples are amplified as the measurement rate of the cameras decreases. The position estimated by the proposed EKF is smooth and ripple-free, and this method reduces the errors of the entire system with respect to the predesigned path. As a result, the proposed EKF is suitable for higher measurement rates, while the continuous EKF is recommended for lower sampling rates. Although the inertial sensor errors resulting from noise and the common motion-dependent errors are compensated, the remaining errors cause position estimation error in the integrated system. In addition, video tracking errors contribute to the position estimation error as well.

This paper describes the use of an EKF to integrate a multi-camera vision system and inertial sensors. Sensor fusion techniques allow estimation of the state variables at the sampling rate of the sensor with the highest measurement rate. This helps to reduce the sampling rate of the sensors with high computational load.

The classical EKF is designed for nonlinear dynamic systems such as the strapdown inertial navigation system. The performance of the classical EKF is reduced by lowering the sampling rate of the cameras. When the sampling rate of the cameras is reduced, the rate of updating decreases and the system must rely more on the inertial sensors output for estimating the position. Because of the drift in the SDINS, the position error increases.

The modified EKF is proposed to obtain position estimates with less error. Furthermore, it removes the effect of the gravitational force from the state-space model; in fact, the error resulting from inaccuracy in the evaluation of the gravitational force is eliminated. In addition, the estimated position is smooth and ripple-free. However, the proposed EKF is less convincing at lower measurement rates. The error of the estimated position results from inertial sensor errors, uncompensated common motion-dependent errors, attitude errors, video tracking errors, and unsynchronized data.

Hand-held tool and assigned reference frames.

Experimental setup for the multi-camera vision system.

Camera imaging model.

Tool tip tracking by Camera #1.

Estimated position by applying different estimation method: continuous EKF (left), Switch EKF (center), and proposed EKF (right); when the sampling rate of the cameras is 16 fps.

Estimated position by applying different estimation method: continuous EKF (left), Switch EKF (center), and proposed EKF (right); when the sampling rate of the cameras is 5 fps.

Intrinsic and extrinsic parameters.

Camera 1 | Camera 2 | Camera 3 | Camera 4
X: 400.69 pixels | X: 398.51 pixels | X: 402.00 pixels | X: 398.74 pixels
X: 131.12 pixels | X: 152.74 pixels | X: 144.77 pixels | X: 136.90 pixels
_{r,x} _{r,y} _{t,x} _{t,y} | _{r,x} _{r,y} _{t,x} _{t,y} | _{r,x} _{r,y} _{t,x} _{t,y} | _{r,x} _{r,y} _{t,x} _{t,y}
1.552265 | 0.4686021 | 0.6128003 | 1.537200
729.4870 mm | 385.2578 mm | −61.1933 mm | −365.5847 mm

Positions estimated by different estimation methods are compared with the position estimated by the multi-camera vision system.

16 fps | 0.9854 | 0.1779 | 1.0076 | 0.7851 | 0.4320 | 0.1386 |

10 fps | 1.0883 | 0.3197 | 1.2147 | 0.8343 | 0.5658 | 0.2149 |

5 fps | 1.4730 | 1.5173 | 1.3278 | 0.8755 | 0.7257 | 0.8025 |