An Enhanced Fusion Strategy for Reliable Attitude Measurement Utilizing Vision and Inertial Sensors

Abstract: In this paper, we present a radial basis function (RBF) and cubature Kalman filter (CKF) based enhanced fusion strategy for vision and inertial integrated attitude measurement, which addresses the sampling frequency discrepancy and the resulting attitude divergence. First, the multi-frequency problem of the integrated system and the reason for attitude divergence are analyzed. Second, the filter equation and attitude differential equation are constructed to calculate attitudes separately in the time series, depending on whether both visual and inertial data are available or only inertial data. Third, attitude errors between the inertial and visual measurements are sent to the input layer of the RBF network for training. After this, through the activation function of the hidden layer, the errors are transferred to the output layer for the weighted sum, and the training model is established. To overcome the divergence inherent in a multi-frequency system, the well-trained RBF, which outputs the attitude errors, is utilized to compensate the attitudes calculated from pure inertial data. Finally, semi-physical simulation experiments under different scenarios are performed to validate the effectiveness and superiority of the proposed scheme in accurate attitude measurement and enhanced anti-divergence capability.


Introduction
Attitude measurement has received considerable attention from the research community due to its significant importance in navigation, positioning, tracking and control [1][2][3][4], where measurement precision and stability are also important requirements. The main purpose of attitude measurement is to dynamically obtain the spatial attitude angles of the measured object using one or more sensors installed on it, with applications in engineering measurement, military systems, and the control and localization of robots [5][6][7].
However, the development of an effective attitude measurement strategy that performs well while ensuring stability and precision is still a significant challenge due to the following important reasons. (i) The single sensor has limited performance due to its own measurement characteristics. (ii) To use an integrated system, the data fusion algorithm, such as the filtering method, can only operate when at least two types of data are available. That is, there is always a sampling frequency discrepancy among heterogeneous sensors. All the aforementioned factors pose multiple handicaps in implementing a high-performance attitude measurement system.
For common single measurement techniques [8,9], such as inertial measurement and visual measurement, their applications are fairly limited due to their inherent measurement characteristics. Focusing on the low cost and widely used micro-electromechanical systems (MEMS) inertial sensor [10,11], its sampling frequency is high, but the measurement errors accumulate quickly with time due to bias error drift and scale factor instability [12][13][14]. The vision measurement can provide relatively stable results. However, compared to the inertial sensor, it has a significantly lower frequency, which may not be sufficient for fast moving object tracking. Thus, the single attitude measurement technique may fail to meet the practical attitude measurement requirements, which promote the development of the fusion measurement system. Considering the advantages and limitations of a single sensor, a vision sensor and an inertial sensor are frequently deployed together where data are fused to enhance measurement performance by fully utilizing the complementary properties of each sensor to overcome the shortcomings inherent with a single sensor. For attitude measurement, the combination of vision and inertial sensors [15] can enhance the accuracy, fault detection and confidence in decision making.
To fuse data from heterogeneous sensors, such as a vision sensor and an inertial sensor, a common strategy is the nonlinear Kalman filter (KF) [16], such as the extended KF (EKF) [17], unscented KF (UKF) [18,19], and cubature KF (CKF) [20]. In general applications, the EKF relies only on the first-order linearization of the nonlinear system models. Thus, the EKF may provide particularly poor performance if the dynamic systems are highly nonlinear. To better treat the nonlinearity, the UKF has been developed to address nonlinear state estimation in the context of control theory. Unlike the conventional EKF, the UKF can obtain second-order accuracy of the posterior mean and covariance. Under the Bayesian filtering framework, the CKF provides a new direction to realize the nonlinear filter [21]. The CKF is preferred over the UKF for its more stable performance. Therefore, the CKF is used here to fuse inertial and visual data, and its performance is compared with those of other filter methods.
Actually, in a vision and inertial integrated system, the attitude measurement performance does not only depend on the selection of the filtering methods. Other factors may also need to be considered. Among them, the multi-frequency is an important problem in theory and practice. As the sampling frequency of the inertial sensor is high and that of visual sensor is low, an inconsistent sampling frequency always exists in a vision and inertial integrated attitude measurement system [22]. Correspondingly, a degradation of measurement performance is unavoidable in the intersampling of slow vision data. To ensure precision and convergence, special approaches to correct pure inertial attitude errors may need to be developed.
To avoid the attitude divergence created by the attitude differential equation when depending only on inertial data during the intersampling of slow visual data, the traditional single-frequency data fusion method needs to be improved. Nowadays, deep learning algorithms are widely used in signal trend prediction to enhance the performance of data fusion in different applications. An artificial neural network [23] has been used to enhance attitude accuracy, exploiting its strong machine learning capability. Long short-term memory [24] has been employed to fuse inertial and vision data, providing high precision for attitude measurement. These methods focus on reducing the attitude error by training the neural network while the visual data are available. However, when the visual data are unavailable, these methods cannot ensure high accuracy of attitude measurement due to the differences in sampling frequency. A convolutional neural network-support vector machine method [25] has been used to remove faults in data fusion, which is effective. A convolutional neural network [26] has been proposed to improve the accuracy of data fusion, which is robust and reliable. These methods can effectively fuse data, but they have some limitations in multi-frequency integrated attitude measurements.
To achieve a reliable attitude in a vision and inertial integrated system during visual data unavailability, especially when the interval lasts for a period of time, a data fusion methodology based on the radial basis function (RBF) and CKF for enhancing the performance of attitude measurements is developed. The algorithm can reduce the attitude divergence caused by the inconsistent sampling frequency between inertial and visual data. Combining CKF and RBF can fully utilize the advantages of these two techniques. In this way, the problem of divergence resulting from the sampling frequency discrepancy is ameliorated.
The rest of the paper is organized as follows. Section 2 explains the problem formulation and system modeling. The core algorithms of the enhanced fusion strategy based on RBF and CKF are given in Section 3. Experimental testing and results are discussed in Section 4. Finally, Section 5 concludes the article.

Multi-Frequency Problem in Vision and Inertial Heterogeneous Sensor Integrated System
For a heterogeneous-sensor integrated system, there is always a sampling frequency discrepancy, namely the multi-frequency problem: the sampling frequencies of different sensors are generally inconsistent. The vision and inertial integrated attitude measurement system is constructed from two heterogeneous sensors, namely a MEMS-based gyroscope and a camera. The sampling frequency of the camera is often several or dozens of Hertz, while that of the gyroscope is hundreds of Hertz. There is a considerable sampling frequency discrepancy between them, as shown in Figure 1.

Figure 1. Sampling frequency discrepancy between visual and inertial data.
Furthermore, the multi-frequency problem can lead to divergence in the attitude results of the vision and inertial integrated measurement system. When both visual and inertial data are available, CKF deals with the nonlinear filter problem by fusing them together. Inertial measurements are incorporated in the filter loop as inputs, and visual attitudes are calculated out of the filter loop as the observation vectors. Thus, the nonlinear filter model consists of the nonlinear state model and the linear observation model [27]:

x_k = f(x_{k-1}, u_{k-1}) + w_{k-1},
z_k = H_k x_k + v_k,

where x_k is the state vector and u_{k-1} is the control input from the high-frequency angular velocity of the gyroscope. z_k is the observation vector from the low-frequency visual measurements. H_k is the measurement matrix; w_{k-1} and v_k are independent zero-mean Gaussian white process and measurement noises with covariances Q_{k-1} and R_k, respectively. When both vision and inertial data are present, normal filtering can fuse the data and output the optimal attitude estimate. However, during the intersampling of slow vision data, there is still a significant amount of inertial gyroscope data. To ensure a high-frequency attitude output, the attitude differential equation is used to update the attitude angles:

q̇(t) = (1/2) Ω(ω) q(t),   q_k = [I + (T/2) Ω(ω_k)] q_{k-1},

where ω is the angular velocity, Ω(ω) is its 4 × 4 skew-symmetric matrix, q is the rotation quaternion parameter, T is the sampling period, and I is a unit matrix with proper dimensions. When the updated attitude is obtained by solving the above formula in an integral operation, the measurement errors accumulate quickly with time due to the bias error drift of the MEMS gyroscope. There are two reasons for this. First, at the current manufacturing level, a general MEMS sensor has considerable bias and noise in its output signal. Second, the integral operation means that the bias and noise inevitably accumulate with time. These two factors make it difficult to guarantee convergence during the intersampling of visual data, especially when the interval lasts for a period of time.
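The quaternion attitude update described above can be sketched in code. The following minimal Python example (an illustration, not the paper's implementation) propagates a quaternion with the standard first-order discrete form q_k = (I + (Δt/2)Ω(ω)) q_{k-1} and renormalizes at each step:

```python
import numpy as np

def omega_matrix(w):
    """4x4 skew-symmetric matrix Omega(w) for the quaternion kinematics
    q_dot = 0.5 * Omega(w) @ q, with q stored as [qw, qx, qy, qz]."""
    wx, wy, wz = w
    return np.array([
        [0.0, -wx, -wy, -wz],
        [wx,  0.0,  wz, -wy],
        [wy, -wz,  0.0,  wx],
        [wz,  wy, -wx,  0.0],
    ])

def propagate(q, w, dt):
    """One first-order update q_k = (I + dt/2 * Omega(w)) q_{k-1},
    renormalized to counter the drift introduced by the integration."""
    q = (np.eye(4) + 0.5 * dt * omega_matrix(w)) @ q
    return q / np.linalg.norm(q)

# Rotate at pi/2 rad/s about z for 1 s, starting from the identity quaternion.
q = np.array([1.0, 0.0, 0.0, 0.0])
for _ in range(1000):
    q = propagate(q, np.array([0.0, 0.0, np.pi / 2]), dt=1e-3)
# q should now be close to [cos(pi/4), 0, 0, sin(pi/4)]
```

Even with renormalization, any constant gyro bias added to ω would integrate into a steadily growing attitude error, which is exactly the divergence mechanism discussed above.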

System Modeling
The established integrated attitude measurement system consists of a vision module and an inertial module, as displayed in Figure 2. In the inertial module, the built-in gyroscope of the IMU is used to output the angular velocity. Due to the inconsistent sampling frequency between the inertial and visual data, the multi-frequency problem exists in the data fusion of the integrated attitude measurement system. Without visual data, the observation noise tends towards infinity and the Kalman gain approaches zero. The integrated attitude system is thus forced into a pure inertial mode when vision data are unavailable, which may result in divergence during the intersampling of slow visual data. To enhance the fusion performance, a data fusion methodology based on CKF and RBF is developed. When the visual data are available, CKF is employed to fuse the inertial data and visual data. At the same time, the RBF model is trained on the attitude errors between the inertial and visual data. During the intersampling of vision data, the trained RBF predicts the attitude errors, which are used to compensate the attitude calculated from the inertial data. The specific process of the proposed RBF and CKF scheme is discussed in the next section.
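The collapse of the Kalman gain when visual data are absent can be seen in a scalar toy example (the values of P, H, and R below are illustrative only):

```python
# Scalar toy example: Kalman gain K = P H^T (H P H^T + R)^-1 with H = 1, P = 1.
P, H = 1.0, 1.0
for R in (0.1, 10.0, 1e6, 1e12):  # growing observation noise ~ missing vision data
    K = P * H / (H * P * H + R)
    print(f"R = {R:g}  ->  K = {K:.3e}")
# As R grows, the gain tends to zero: the update ignores the (absent) vision
# measurement and the filter degenerates to pure inertial prediction.
```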

Basic Filter Process of CKF Algorithm
For multi-frequency fusion issues, an enhanced fusion strategy based on RBF and CKF is mainly discussed in this paper. To start, the basic filter process of CKF is given. Instead of applying the linearization operation on the nonlinear equation, CKF approximates the probability density distribution of nonlinear function using a set of cubature points. Thus, this results in good estimation accuracy with an acceptable computation complexity. Furthermore, in CKF, the cubature points and weights are determined uniquely by the dimension of the state vector so that the complex process of parameter selection can be removed. The filter process is given as follows.
(1) Calculating cubature points:

ξ_i = √n [1]_i,   i = 1, 2, …, 2n,

where [1]_i denotes column i of the point set [1] = [I_n, −I_n] (the unit vectors along each coordinate axis and their negatives), and n is the dimension of the state vector.
(2) Time update: factorize P_{k−1|k−1} = S_{k−1|k−1} S^T_{k−1|k−1}, form the cubature points X_{i,k−1|k−1} = S_{k−1|k−1} ξ_i + x̂_{k−1|k−1}, and propagate them through the state model, X*_{i,k|k−1} = f(X_{i,k−1|k−1}, u_{k−1}). The estimated attitude and covariance matrix at time k can be described as:

x̂_{k|k−1} = (1/2n) Σ_{i=1}^{2n} X*_{i,k|k−1},
P_{k|k−1} = (1/2n) Σ_{i=1}^{2n} X*_{i,k|k−1} X*^T_{i,k|k−1} − x̂_{k|k−1} x̂^T_{k|k−1} + Q_{k−1}.

(3) Measurement update: Calculating cubature points:

X_{i,k|k−1} = S_{k|k−1} ξ_i + x̂_{k|k−1},   P_{k|k−1} = S_{k|k−1} S^T_{k|k−1}.

Using the measurement equation to propagate the cubature points:

Z_{i,k|k−1} = H_k X_{i,k|k−1}.

The predicted measurement and covariance matrices at time k are in the following forms:

ẑ_{k|k−1} = (1/2n) Σ_{i=1}^{2n} Z_{i,k|k−1},
P_{zz,k|k−1} = (1/2n) Σ_{i=1}^{2n} Z_{i,k|k−1} Z^T_{i,k|k−1} − ẑ_{k|k−1} ẑ^T_{k|k−1} + R_k,
P_{xz,k|k−1} = (1/2n) Σ_{i=1}^{2n} X_{i,k|k−1} Z^T_{i,k|k−1} − x̂_{k|k−1} ẑ^T_{k|k−1}.

Calculating the Kalman filter gain:

K_k = P_{xz,k|k−1} P^{−1}_{zz,k|k−1}.

The attitude estimate and the covariance estimate of the attitude error at time k are expressed as follows:

x̂_{k|k} = x̂_{k|k−1} + K_k (z_k − ẑ_{k|k−1}),
P_{k|k} = P_{k|k−1} − K_k P_{zz,k|k−1} K^T_k.
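The recursion above can be sketched compactly in Python. The state model, dimensions, and the scalar test system below are illustrative stand-ins, not the paper's actual attitude model:

```python
import numpy as np

def ckf_step(x, P, u, z, f, H, Q, R):
    """One cubature Kalman filter recursion: 2n equally weighted cubature
    points xi_i = sqrt(n)[1]_i, nonlinear time update f, linear measurement Hx."""
    n = x.size
    m = 2 * n
    xi = np.sqrt(n) * np.hstack([np.eye(n), -np.eye(n)])  # columns [1]_i

    # Time update: spread points with the Cholesky factor of P, propagate.
    S = np.linalg.cholesky(P)
    pts = S @ xi + x[:, None]
    pts = np.column_stack([f(pts[:, i], u) for i in range(m)])
    x_pred = pts.mean(axis=1)
    P_pred = pts @ pts.T / m - np.outer(x_pred, x_pred) + Q

    # Measurement update: regenerate points around the prediction.
    S = np.linalg.cholesky(P_pred)
    pts = S @ xi + x_pred[:, None]
    zs = H @ pts
    z_pred = zs.mean(axis=1)
    P_zz = zs @ zs.T / m - np.outer(z_pred, z_pred) + R
    P_xz = pts @ zs.T / m - np.outer(x_pred, z_pred)
    K = P_xz @ np.linalg.inv(P_zz)
    return x_pred + K @ (z - z_pred), P_pred - K @ P_zz @ K.T

# Toy run: a scalar "attitude" driven by a known rate u, observed directly.
f = lambda s, u: s + 0.01 * u          # stand-in state model (dt = 0.01 s)
x, P, truth = np.zeros(1), np.eye(1), 0.0
for _ in range(100):
    truth += 0.01
    x, P = ckf_step(x, P, 1.0, np.array([truth]), f,
                    np.eye(1), 1e-4 * np.eye(1), 1e-2 * np.eye(1))
```

Note that the cubature points and weights are fixed by the state dimension n alone, which is the parameter-free property mentioned above.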

RBF Algorithm
The radial basis function neural network is one of the extensively employed deep learning methods; it offers good approximation capability, fast learning convergence, and the simple structure of a feed-forward neural network. There are usually three layers in the network structure, as shown in Figure 3, where x_i denotes the ith input element. The hidden layer contains m radial basis neurons, whose basis functions respond locally to the input. φ_i(x) is the activation function in the hidden layer; when the input is close to the center of the basis function, a large output is generated. W_i indicates the weight between the hidden layer and output layer. Y is the linear weighted sum of the outputs of the hidden layer neurons.

The input layer includes a set of sensory units, which represent the signal source nodes. The hidden layer has sufficient dimension and uses the activation functions to map a low-dimensional, nonlinearly separable input to a high-dimensional, linearly separable space. The output layer expresses a linear transformation from the hidden nodes to the output space. The output can be given by:

Y = w_0 + Σ_{i=1}^{m} W_i φ_i(x),

where w_0 is the bias term and W_i is the weight between the hidden layer and output layer. The Gaussian basis function is the most frequently used radial basis function for φ_i(x), with the following form:

φ_i(x) = exp(−‖x − μ_i‖² / (2σ_i²)),

where x is the input of the network; ‖·‖ represents the Euclidean distance; μ and σ are the center and width of the Gaussian function, and these two factors are used to adjust the deviation. By analyzing the structural characteristics, the center nodes μ in the hidden layer determine the performance of the RBF network. An appropriate choice of center nodes in the hidden layer will enhance the learning effect.
Furthermore, the value of μ can be determined by different methods. The most commonly used is the k-means clustering algorithm [28], which obtains a subset of the data points as the RBF centers and can adjust the deviation of the RBF [29,30]. The specific steps of the k-means clustering algorithm are as follows:
Step 1: Preset the output precision E of the whole neural network. k nodes in the hidden layer are selected arbitrarily from the input sample X.
Step 2: M samples from X are randomly selected as the centers of the Gaussian functions in the hidden layer, and the widths of the Gaussian functions are initialized randomly as well.
Step 3: The k-means clustering algorithm is employed to average the Euclidean distance from each sample to its center node, updating the center nodes and function widths.
Step 4: RBF is constructed by using the updated central data and function width, and the weights between the hidden layer and output layer. The output Y is obtained.
Step 5: Calculate the sums of squared errors E(M) by comparing the practical output Y and theoretical output y. If E(M) < E, proceed to step 6. Otherwise, M = M + 1 and return to step 2.
Step 6: Compare E(M) and E(M − 1). If neither satisfies the accuracy requirement, set M = M − 1 and repeat steps 2 to 5. Otherwise, if E(M − 1) < E, M is set as the number of center nodes and the loop ends.
Using the above-described procedures, the center nodes of hidden layer can be determined by the k-means clustering algorithm.

Fusion Strategy Based on RBF and CKF
As the sampling frequency of the inertial sensor is high and that of the visual sensor is low, an inconsistent sampling frequency exists in the inertial and vision integrated attitude measurement system. During the intersampling of vision data, only a time update is performed and there is no measurement update. This is because the measurement error is regarded as infinite, and the Kalman gain approaches zero. The attitude measurement cannot be compensated by the slow measurement vector, and the accuracy and stability of the filtering are difficult to guarantee.

To deal with the multi-frequency issue between the inertial and visual data, a fusion strategy based on the RBF and CKF algorithms is developed to improve the performance of the integrated attitude measurement system. This research uses RBF to establish the network model by training on attitude errors while the visual data are available, and to compensate the attitude with predicted errors when visual data are absent. In the process of training, attitude errors are treated as the input of the RBF. The errors are transferred by the activation function in the hidden layer to the weighted sum of the output layer. In the process of prediction, the well-trained RBF model outputs the estimated errors.

The output attitude angles are obtained by CKF and RBF together. CKF is used to estimate the attitude by data fusion when there are both inertial and visual data. Without the vision data, the well-trained RBF and CKF are employed to predict the estimated attitude. In this way, the attitude estimation accuracy and stability can be improved. The specific process of the optimal input-output estimation relationship constructed by the RBF is shown in Figure 4.

In our study, the relationship between the past vision data and the current attitude angles is investigated to continuously provide the estimated error even during the intersampling of visual data. As shown in Figure 4a, the training process of the RBF runs while the visual sensor works well. The attitude errors are the inputs of the RBF, and the activation function of the hidden layer is used to transfer the errors to the output layer for the weighted sum. The output attitude angles of the system are corrected with the proposed data fusion algorithm. The prediction process of the RBF during the intersampling of visual data is shown in Figure 4b. The attitude error is used as an independent input of the RBF. The well-trained RBF model outputs the estimated errors, and the inertial data are corrected by the RBF prediction results to output the final attitude angles.
The reliable attitude estimation for integrated attitude measurements is achieved by the data fusion algorithm based on RBF and CKF. The proposed methodology reduces the side-effect of the inconsistent sampling frequency. Furthermore, the problem of divergence resulting from the sampling frequency discrepancy between the inertial and vision data is ameliorated, and the accuracy and reliability of the estimated attitude are both improved.
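A schematic of the overall train/predict switching logic might look as follows. The 1-D attitude model, the constant gyro bias, and the linear error model standing in for the trained RBF are all illustrative assumptions, not the paper's implementation:

```python
import numpy as np

# Schematic multi-rate fusion loop: gyro every step, vision every VISION_EVERY
# steps. While vision is present we log (time, error) pairs; in the gaps a
# model fitted to those pairs predicts the drift and compensates it.
VISION_EVERY, DT, BIAS = 10, 0.01, 0.05        # gyro bias drives the drift

truth, inertial = 0.0, 0.0
train_t, train_err, fused = [], [], []
for k in range(200):
    rate = 1.0
    truth += rate * DT
    inertial += (rate + BIAS) * DT             # biased gyro integration
    if k % VISION_EVERY == 0:                  # vision sample available
        vision = truth                         # idealized visual attitude
        train_t.append(k * DT)
        train_err.append(inertial - vision)    # error sample for training
        fused.append(vision)
    else:                                      # intersampling: predict error
        if len(train_t) >= 2:
            # stand-in for the trained RBF: linear fit of error vs. time
            a, b = np.polyfit(train_t, train_err, 1)
            fused.append(inertial - (a * k * DT + b))
        else:
            fused.append(inertial)

drift_err = abs(inertial - truth)              # uncorrected pure-inertial error
comp_err = abs(fused[-1] - truth)              # compensated output error
```

With a constant bias the error history is exactly linear in time, so the stand-in predictor removes the drift completely; the RBF plays the analogous role for the more complex error behavior of real sensors.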

Experimental Test and Comparison
A semi-physical simulation platform for attitude measurements is established, as shown in Figure 5, to evaluate the performance of the inertial and vision data fusion algorithm based on RBF and CKF. The camera is fixed onto the bracket, which is installed on the platform. Furthermore, the target and IMU are fixed on the turntable. The target is represented by four non-coplanar infrared LEDs [31][32][33][34][35][36], as shown in Figure 5. An external PC collects the image data and the angular velocity from the camera and a built-in gyroscope of IMU separately. With only four feature points, the workload of image processing and feature matching is greatly reduced, and the computational efficiency of data fusion is largely improved. In this experiment, the specifications of the gyroscope in IMU are given in Table 1.

In the experiment, to fuse the vision and inertial data, the time synchronization of the sensor fusion system is considered. To guarantee the time synchronization of the whole attitude measurement system, the key is to guarantee the time synchronization of both the visual and inertial subsystems. A program timer obtains the system time at the start of the inertial measurement and the visual measurement. The inertial system is comprised of an IMU and the vision system is formed by a camera. When the inertial sensor captures inertial data, a time stamp is attached to each datum. When the vision sensor captures one frame of a picture, the time stamp is recorded with the picture. After this, the time stamps of the inertial system and vision system are matched in the calculation process. When the difference between the inertial and visual time stamps is small enough, the corresponding inertial and vision data are regarded as synchronized. Thus, the time synchronization of the sensor fusion system is guaranteed.
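The time-stamp matching step can be sketched as a nearest-neighbor search with a tolerance; the sampling rates and the threshold below are illustrative values, not the experiment's actual settings:

```python
import bisect

def match_stamps(inertial_ts, vision_ts, max_dt=0.002):
    """Pair each vision time stamp with the nearest inertial stamp (inertial_ts
    must be sorted ascending), keeping only pairs closer than max_dt."""
    pairs = []
    for tv in vision_ts:
        i = bisect.bisect_left(inertial_ts, tv)
        candidates = inertial_ts[max(i - 1, 0):i + 1]  # neighbors of tv
        best = min(candidates, key=lambda t: abs(t - tv))
        if abs(best - tv) <= max_dt:
            pairs.append((tv, best))
    return pairs

inertial_ts = [k * 0.005 for k in range(200)]  # 200 Hz gyro time stamps
vision_ts = [k * 0.0333 for k in range(10)]    # ~30 Hz camera time stamps
pairs = match_stamps(inertial_ts, vision_ts)
```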

Normalization of Coordinate System
The multi-sensor attitude measurement system is composed of inertial and vision sensors, whose measurements are expressed in different coordinate systems. To obtain the attitude from the inertial and vision data, the basic requirement is that the two different measurements are converted into the same coordinate system. Four coordinate systems are involved: the turntable coordinate system (b system), the camera coordinate system (c system), the inertial coordinate system (i system) and the target coordinate system (t system). The specific normalization among these coordinate frames is depicted in Figure 6.
As our objective is to measure the relative angles between the body coordinate system and one stationary reference frame, such as the camera coordinate system, the attitude can be expressed by the rotation matrix R_c^b(t), where the t in parentheses indicates that the matrix is time-varying. The coordinate system normalization transforms the inertial and visual measurements into R_c^b(t). The rotation transformation from the c system to the b system can be expressed through the visual and inertial quantities as

R_c^b(t) = R_t^b R_c^t(t),
R_c^b(t) = R_i^b R_c^i(t),

where R_c^t(t) is the rotation matrix between the t system and the c system, which is measured by vision; R_c^i(t) is the rotation matrix between the i system and the c system, which is obtained from the inertial measurement; R_t^b is the rotation matrix between the t system and the b system; and R_i^b is the rotation matrix between the i system and the b system. As the IMU and the target are both fixed onto the turntable, R_t^b and R_i^b can be obtained by calibration.
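The chaining of calibrated and measured rotation matrices used in this normalization can be sketched as follows. The numerical angles and the use of simple z-axis rotations are assumptions for the example, not values from the experiment:

```python
import numpy as np

def rot_z(theta):
    """Elementary rotation about the z axis (radians)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

# R_b_t: target -> turntable rotation (from calibration; angle assumed)
R_b_t = rot_z(np.deg2rad(10.0))
# R_t_c: camera -> target rotation at time t (from vision; angle assumed)
R_t_c = rot_z(np.deg2rad(25.0))

# Chain the rotations: camera -> target -> turntable
R_b_c = R_b_t @ R_t_c

# Any product of rotation matrices is itself a proper rotation:
# orthonormal (R R^T = I) with determinant +1
assert np.allclose(R_b_c @ R_b_c.T, np.eye(3))
assert np.isclose(np.linalg.det(R_b_c), 1.0)
```

For this toy case both factors rotate about the same axis, so the chained result is simply a 35° z-rotation; in the real system the factors are general rotations but the composition rule is identical.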
The rotation transformation between two coordinate systems can be expressed by three successive rotations about the x, y and z axes through the corresponding angles α, β and θ, as displayed in Figure 7. In this way, the inertial and visual measurements are converted to express the motion of the object turntable, which prepares them for further data fusion.
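The three-rotation composition can be sketched as below. The multiplication order (x applied first, then y, then z) is one common convention and is an assumption here; the paper's convention may differ:

```python
import numpy as np

def rot_x(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[1.0, 0, 0], [0, c, -s], [0, s, c]])

def rot_y(b):
    c, s = np.cos(b), np.sin(b)
    return np.array([[c, 0, s], [0, 1.0, 0], [-s, 0, c]])

def rot_z(t):
    c, s = np.cos(t), np.sin(t)
    return np.array([[c, -s, 0], [s, c, 0], [0, 0, 1.0]])

def rotation_from_euler(alpha, beta, theta):
    """Compose the three elementary rotations: x by alpha first,
    then y by beta, then z by theta (z-y-x extrinsic convention)."""
    return rot_z(theta) @ rot_y(beta) @ rot_x(alpha)

R = rotation_from_euler(np.deg2rad(5), np.deg2rad(-3), np.deg2rad(12))
# The composition remains a proper rotation matrix
assert np.allclose(R @ R.T, np.eye(3), atol=1e-12)
```

With this convention the three angles can be recovered from the matrix elements (away from the β = ±90° singularity) as α = atan2(R[2,1], R[2,2]), β = −asin(R[2,0]) and θ = atan2(R[1,0], R[0,0]).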

Comparison Among Different Methods by Repeated Experiments with Round-trip-step-up Motion
Repeated experiments are conducted to compare the performance of the different methods. The motion trajectory of the moving object is designed as displayed in Figure 8, which we call the round-trip-step-up motion. The pitch axis moves from −20 degrees to +20 degrees in round-trip form and the yaw axis moves from −20 degrees to +20 degrees in step-up form.
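The two trajectory forms can be illustrated with a hypothetical generator; the period, step size and dwell time below are invented for the sketch, and Figure 8 defines the actual trajectory:

```python
import numpy as np

def round_trip(t, period=40.0, amplitude=20.0):
    """Triangle wave sweeping -amplitude -> +amplitude -> -amplitude."""
    phase = (t % period) / period
    tri = 1.0 - 4.0 * np.abs(phase - 0.5)   # -1 at phase 0, +1 at phase 0.5
    return amplitude * tri

def step_up(t, step=10.0, dwell=20.0, start=-20.0, stop=20.0):
    """Staircase rising from start to stop in increments of `step`,
    holding each level for `dwell` seconds."""
    level = start + step * np.floor(t / dwell)
    return np.minimum(level, stop)

t = np.arange(0.0, 100.0, 1.0)
pitch = round_trip(t)   # round-trip form for the pitch axis
yaw = step_up(t)        # step-up form for the yaw axis
```

The round-trip form exercises continuous reversal of the angular rate, while the step-up form alternates motion with stationary dwells, so the two together probe both dynamic and quasi-static filter behaviour.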

To provide convincing results, five different methods are compared, and the attitude angle errors of the pitch and yaw angles are shown in Figure 9, where the abscissa is time in seconds and the ordinate represents the pitch and yaw angle errors. Considering the overall performance of the five methods from the viewpoint of stability and precision, the proposed RBF-CKF method outperforms the EKF, RBF-EKF, CKF and back propagation neural network (BPNN) based BP-CKF methods. Moreover, during the periods when the errors of the other methods are quite large, such as 40-60 s for the pitch axis and 20-40 s and 40-60 s for the yaw axis, the RBF-CKF stays steady, which indicates its stability.
More importantly, it indicates that the RBF-CKF algorithm can maintain good convergence during the intersampling of the slow visual data.
The reasons for the above phenomena are as follows. When visual measurements are available, the attitude is calculated by normal filtering, fusing the inertial and visual data together. During the intersampling of the visual data, the EKF and CKF methods calculate the attitude from inertial data only. For the RBF-EKF and BP-CKF methods, the attitude calculated from inertial data is compensated by the RBF and BP prediction results before the final estimated attitudes are output by the EKF and CKF, respectively. Similarly, for the RBF-CKF method, the attitude angles calculated from inertial data are compensated by the RBF prediction results before the final estimated attitudes are output.
This optimal estimation by RBF-CKF is reflected in the pitch and yaw angle errors, and the performance of RBF-CKF is clearly superior to that of the other four methods.
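The compensation idea during the intersampling interval can be sketched with a minimal Gaussian-RBF regressor. The kernel width, the regularization and the linear drift model below are illustrative assumptions, not the paper's trained network:

```python
import numpy as np

def gaussian_kernel(x, centers, width):
    """Hidden-layer activations: one Gaussian basis function per center."""
    return np.exp(-((x[:, None] - centers[None, :]) ** 2) / (2 * width ** 2))

def train_rbf(t_train, err_train, width=0.5, reg=1e-6):
    """Fit the output-layer weights by regularized least squares so the
    network reproduces the observed attitude errors at the centers."""
    Phi = gaussian_kernel(t_train, t_train, width)
    w = np.linalg.solve(Phi.T @ Phi + reg * np.eye(len(t_train)),
                        Phi.T @ err_train)
    return t_train, w, width

def predict_rbf(model, t_query):
    centers, w, width = model
    return gaussian_kernel(t_query, centers, width) @ w

# Past attitude errors (inertial minus vision) observed at vision updates;
# here a slowly growing gyro-drift-like error is assumed
t_obs = np.linspace(0.0, 4.0, 9)
err_obs = 0.01 * t_obs
model = train_rbf(t_obs, err_obs)

# During the intersampling interval, subtract the predicted error from
# the attitude propagated from the gyroscope alone
t_mid = np.array([2.25])
att_inertial = np.array([10.0])              # deg, inertial-only attitude
att_comp = att_inertial - predict_rbf(model, t_mid)
```

The trained network plays the role of the RBF compensator: between vision updates it predicts the error the inertial-only propagation is expected to accumulate, keeping the output attitude from diverging until the next visual measurement arrives.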
The maximum errors and Root Mean Squared Errors (RMSE) for the different methods across five groups of experiments are shown in Table 2 and Figure 10. It can be clearly seen that RBF-CKF is more stable than the other methods. The running time of each method is shown in Table 3. Due to the complex structure of the RBFNN, together with the process of establishing the training model and predicting the estimated error, the running time of RBF-CKF is the longest among the five methods.
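The two statistics reported in Table 2 can be computed as below; the numerical series here are invented purely for illustration:

```python
import numpy as np

def error_stats(estimate, reference):
    """Maximum absolute error and RMSE between an estimated attitude
    series and the turntable reference, both in degrees."""
    err = np.asarray(estimate) - np.asarray(reference)
    return np.max(np.abs(err)), np.sqrt(np.mean(err ** 2))

# Hypothetical reference and estimated pitch angles (deg)
ref = np.array([0.0, 5.0, 10.0, 15.0, 20.0])
est = np.array([0.1, 4.8, 10.3, 15.1, 19.9])
max_err, rmse = error_stats(est, ref)
```

The maximum error captures the worst-case transient, while the RMSE summarizes the average accuracy over the run; reporting both, as the paper does, separates stability from precision.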

Performance of the Round-trip Motion
Another two repeated experiments with longer running times were conducted to confirm the performance of the proposed RBF-CKF method. The motion trajectory of the moving object is shown in Figure 11, which is a round-trip motion: the pitch and yaw axes range from 0° to +40° in round-trip form. Two groups of experiments were carried out, and the attitude angle errors of the pitch and yaw angles are shown in Figure 12. The optimal estimation of RBF-CKF is also evident in the pitch and yaw angle errors over the longer data, and the performance of RBF-CKF remains clearly superior to that of the other four methods. During the periods when the errors of the other four methods are relatively large, such as 40-60 s and 120-140 s for the pitch axis and 20-40 s and 100-120 s for the yaw axis, the RBF-CKF remains stable. Indeed, RBF-CKF possesses good convergence during the intersampling of the vision data.

The maximum errors and RMSE for the different methods across the two groups of experiments are shown in Table 4 and Figure 13. Even after lengthening the experimental data and time, the performance of the RBF-CKF method is better than that of the other four methods.

Application and Development Prospects of the Research
In our study, the attitude measurement technique is applied to a helmet tracking system, an indoor attitude measurement application requiring high accuracy. The attitude of the object is obtained by capturing four feature points installed on the moving object with a camera. This measurement method is practical and operable in indoor close-range attitude measurement and achieves high accuracy, especially when applied to the helmet tracking system.
In the next step of this research, the experimental setup will be improved by adding groups of infrared spots to the helmet to extend the number of feature points, as shown in Figure 14. In Figure 14, four feature points are considered as one group, and there are multiple groups on the helmet for attitude measurement. The final objective is to obtain highly accurate, stable, applicable and operational attitude measurements that can be applied in the helmet tracking system.

Conclusions
An enhanced data fusion strategy based on RBF and CKF has been developed for attitude estimation in a vision and inertial integrated system, with consideration of sampling frequency discrepancy and divergence. By integrating the RBF technique into the normal filtering framework of the CKF, the proposed reliable attitude measurement not only reduces the divergence inherent in the intersampling of the slow vision data, when the attitude is calculated from inertial data only, but also ensures the stability of the high-frequency output of the attitude angles. More importantly, in contrast to existing compensation techniques that rely only on the past information of a single sensor, the present scheme makes full use of the past inertial and vision information. Comparisons and extensive experiments under different scenarios illustrate that reliable attitude estimation performance and enhanced convergence capability are achieved by the proposed approach.
