Modeling and Compensation of Random Drift of MEMS Gyroscopes Based on Least Squares Support Vector Machine Optimized by Chaotic Particle Swarm Optimization

MEMS (Micro Electro Mechanical System) gyroscopes have been widely applied to various fields, but MEMS gyroscope random drift has nonlinear and non-stationary characteristics. It has attracted much attention to model and compensate the random drift because it can improve the precision of inertial devices. This paper has proposed to use wavelet filtering to reduce noise in the original data of MEMS gyroscopes, then reconstruct the random drift data with PSR (phase space reconstruction), and establish the model for the reconstructed data by LSSVM (least squares support vector machine), of which the parameters were optimized using CPSO (chaotic particle swarm optimization). Comparing the effect of modeling the MEMS gyroscope random drift with BP-ANN (back propagation artificial neural network) and the proposed method, the results showed that the latter had a better prediction accuracy. Using the compensation of three groups of MEMS gyroscope random drift data, the standard deviation of three groups of experimental data dropped from 0.00354°/s, 0.00412°/s, and 0.00328°/s to 0.00065°/s, 0.00072°/s and 0.00061°/s, respectively, which demonstrated that the proposed method can reduce the influence of MEMS gyroscope random drift and verified the effectiveness of this method for modeling MEMS gyroscope random drift.


MEMS (Micro Electro
Mechanical System) gyroscopes have the advantages of being small in size, lightweight, low cost, vibration resistant, and so on. Therefore, MEMS gyroscopes have been used widely in civil and military fields such as automobiles, UAVs (unmanned aerial vehicles) and weapon guidance systems [1,2]. However, compared to high precision laser gyroscopes or fiber optic gyroscopes, the accuracy of the MEMS gyroscope is low and the random drift has nonlinear and non-stationary characteristics due to the limitation of the current material processing technology [3][4][5]. These disadvantages make the MEMS gyroscopes still inapplicable in many high-precision fields. Therefore, it is of great significance for modeling to compensate for MEMS gyroscope random drift.
Presently, the common methods for modeling MEMS gyroscope random drift are two classes, one is the statistical modeling method represented by traditional time series analysis, and the other is the intelligence algorithm represented by ANNs (artificial neural networks) for modeling the MEMS gyroscope random drift [6][7][8]. The time series model of random drift assumes that MEMS gyroscope measurement noise is a linear combination of historical data and historical white noise, and the ARMA (auto regressive and moving average) model for drift modeling is often used [9]. However, this method assumes the random drift is stationary which will inevitably limit the range of application and the prediction accuracy of the model [3]. Some scholars put forward reducing the adverse effect of the MEMS gyroscope random drift using a wavelet analysis method, since wavelet transform can subdivide the specified signal in the frequency domain and time domain simultaneously to obtain more details. Conversely, Fourier transform can only be used in the frequency domain analysis, so wavelet transform has a better signal processing ability [10,11].
Through the development of intelligent algorithms, there are more ways to model and compensate the random drift of MEMS gyroscope. The ANNs are common intelligent algorithms, and have achieved good results in random drift modeling [12][13][14]. While ANNs can theoretically approximate any nonlinear function, there may be overfitting problems during training, and the number of nodes in the hidden layer depends on experience and lacks theoretical guidance [15,16]. The LSSVM (least squares support vector machine) proposed by Suyken is one of the most important achievements of statistical learning theory; compared with the standard SVM (support vector machine) algorithm, the LSSVM follows the same principle of SRM (structural risk minimization) and has the advantage of reducing the computational complexity, improving the calculation speed and the anti-interference ability; compared to the ANNs that have attracted much attention in the nonlinear field, this algorithm is not prone to fall into local optimization and has better generalization performance, especially if there are not enough learning samples [17][18][19]. Some scholars had proposed to use the LSSVM to predict MEMS gyroscope random drift [20,21]. However, too many data were used according to the collection order for prediction, and there was no particularly clear data construction standard and the dimension of the input vector was high, thus leading to computational complexity.
This paper proposes to process the raw data of MEMS gyroscopes with wavelet filtering and PSR (phase space reconstruction), then model the reconstructed data based on LSSVM, and using CPSO (chaotic particle swarm optimization) to optimize the parameters of LSSVM. This algorithm can be abbreviated as CPSO-LSSVM. To the best of the authors' knowledge, it is the first report on the combination of PSR and CPSO-LSSVM for modeling and compensating the random drift of MEMS gyroscope.
The purpose of this paper is to establish a more accurate MEMS gyroscope random drift model based on the CPSO-LSSVM method; to do so, the steps of building the model are described in detail, including dealing with the original data, reconstructing the data and analyzing and modeling the reconstructed data. The comparison with BP-ANN (back propagation artificial neural network) has verified that this algorithm has a better effect.
The structure of this paper is organized as follows: The principle of BP-ANN and the proposed method, that is, CPSO-LSSVM, are clearly described in Section 2; the steps of constructing an MEMS gyro drift model are discussed in Section 3; experiments and results of BP-ANN and CPSO-LSSVM for modeling the gyroscope random drift are illustrated in Section 4. Finally, Section 5 draws the conclusion and ends the paper.

The Principles of Algorithms
Artificial neural networks, ANNs, and LSSVMs (least square support vector machines) are two kinds of machine learning algorithms that can achieve nonlinear mapping. Both ANNs and LSSVMs have been applied to analyze data that are not mathematically modeled easily. The principles of two algorithms are described in this section.

Back Propagation Artificial Neural Networks
Artificial neural networks (ANNs) simulating biological neural systems have a powerful learning function. The weighted sum of inputs arriving at each neuron generates an output signal by means of an activation function [22][23][24]. The BP-ANN (back propagation artificial neural network) is one of the most employed ANN methods and has been widely used in many applications. It does not require a rigorous mathematic model and can obtain calibration parameters for data through a learning step [25]. Good flexibility makes it widely used to express nonlinear relationships in databases [26].
The BP-ANN consists of three types of layers which are the input layer, hidden layer and output layer, and its classical architecture is revealed in Figure 1. The nodes of input layer are used to select independent variables for estimation. The hidden layer is important and has an impact on the learning ability of the BP-ANN. The output layer is used to estimate the result depending on the independent input variables. learning step [25]. Good flexibility makes it widely used to express nonlinear relationships in databases [26]. The BP-ANN consists of three types of layers which are the input layer, hidden layer and output layer, and its classical architecture is revealed in Figure 1. The nodes of input layer are used to select independent variables for estimation. The hidden layer is important and has an impact on the learning ability of the BP-ANN. The output layer is used to estimate the result depending on the independent input variables.
Input layer Hidden layer Output layer The input of neurons in the hidden layer can be described as: where is the th neuron input in the hidden layer, is the connection weight between the th neuron in the input layer and the th neuron in the hidden layer, denotes the th neuron input in the input layer while is the number of neurons in the input layer, and is the threshold of the th neuron in the hidden layer.
The sigmoid function presented in (2) is the activation function of the hidden layer: The output of the neuron in the hidden layer can be formulated as (3) with the sigmoid function shown in (2): where is the th neuron output in the hidden layer, and is the same as for (1). The activation function in the output layer is the same as that in the hidden layer, and the weight and threshold of the neuron in the output layer are also constantly adjusted to approximate the desired output value. Figure 2 is the scheme diagram of the BP-ANN illustrating the neural network architecture, where is the connection weight of each node, b is a threshold, and g and f are the activation function shown in (2).
However, it must be pointed out that ANNs follow the principle of ERM (empirical risk minimization) and might result in over-fit toward the employed dataset [27]. In particular, when the amount of data is not too large, some algorithms must be applied to improve ANNs or other machine learning algorithms should be considered to avoid over-fit. The input of neurons in the hidden layer can be described as: where v i is the ith neuron input in the hidden layer, w ij is the connection weight between the ith neuron in the input layer and the jth neuron in the hidden layer, x j denotes the jth neuron input in the input layer while n 0 is the number of neurons in the input layer, and b i is the threshold of the ith neuron in the hidden layer. The sigmoid function presented in (2) is the activation function of the hidden layer: where x is the independent variable. The output of the neuron in the hidden layer can be formulated as (3) with the sigmoid function shown in (2): where x i is the ith neuron output in the hidden layer, and v i is the same as for (1). The activation function in the output layer is the same as that in the hidden layer, and the weight and threshold of the neuron in the output layer are also constantly adjusted to approximate the desired output value. Figure 2 is the scheme diagram of the BP-ANN illustrating the neural network architecture, where W is the connection weight of each node, b is a threshold, and g and f are the activation function shown in (2).
However, it must be pointed out that ANNs follow the principle of ERM (empirical risk minimization) and might result in over-fit toward the employed dataset [27]. In particular, when the amount of data is not too large, some algorithms must be applied to improve ANNs or other machine learning algorithms should be considered to avoid over-fit.

LSSVM Model with CPSO
The LSSVM, one of the new soft computing learning algorithms, was developed by Suyken [17]. The LSSVM is performed by solving a linear system of equations instead of quadratic programming as for the standard SVM case [28]. Since LSSVMs have SRM of the standard SVMs and a faster computation speed, LSSVMs have been successfully applied to pattern recognition, fault diagnosis and function estimation [29][30][31]. The principle of LSSVMs is shown in Figure 3.

Input space
Feature space Kernel function

Suppose a training dataset {
, where is the n-dimensional input vector, ∈ , and is the responding desired output, ∈ R. The training data can fit with the function represented as: where ∅( ) is a nonlinear mapping between input space and high-dimensional feature space, denotes a weight vector, and b denotes a constant offset.
where is the introduced error variable, and is an adjustable parameter. The corresponding Lagrange for Equation (5) is: The optimality condition of Equation (6) leads to the following linear system: where Ω = Φ( , ) is the kernel function and has optional kernels such as Gaussian kernel, polynomial kernel, B-spine kernel, and so on. A Gaussian kernel is usually chosen due to its simple, efficient and reliable computing power [32]. The Gaussian kernel can be written as follows: where represents the kernel parameter, and and are two independent vectors in the input space.
However, the LSSVM's parameters are uncertain for different problems and must be carefully determined because this will affect the modeling effect. It was proposed to use CPSO (chaotic particle swarm optimization) to optimize the two parameters of LSSVM. What needs to be noted is that CPSO

LSSVM Model with CPSO
The LSSVM, one of the new soft computing learning algorithms, was developed by Suyken [17]. The LSSVM is performed by solving a linear system of equations instead of quadratic programming as for the standard SVM case [28]. Since LSSVMs have SRM of the standard SVMs and a faster computation speed, LSSVMs have been successfully applied to pattern recognition, fault diagnosis and function estimation [29][30][31]. The principle of LSSVMs is shown in Figure 3.

LSSVM Model with CPSO
The LSSVM, one of the new soft computing learning algorithms, was developed by Suyken [17]. The LSSVM is performed by solving a linear system of equations instead of quadratic programming as for the standard SVM case [28]. Since LSSVMs have SRM of the standard SVMs and a faster computation speed, LSSVMs have been successfully applied to pattern recognition, fault diagnosis and function estimation [29][30][31]. The principle of LSSVMs is shown in Figure 3.

Input space
Feature space Kernel function

Suppose a training dataset {
, where is the n-dimensional input vector, ∈ , and is the responding desired output, ∈ R. The training data can fit with the function represented as: where ∅( ) is a nonlinear mapping between input space and high-dimensional feature space, denotes a weight vector, and b denotes a constant offset. The and b can be estimated through minimization of regularized risk function subjected to the equality constraint shown in (5): where is the introduced error variable, and is an adjustable parameter. The corresponding Lagrange for Equation (5) is: The optimality condition of Equation (6) leads to the following linear system: where Ω = Φ( , ) is the kernel function and has optional kernels such as Gaussian kernel, polynomial kernel, B-spine kernel, and so on. A Gaussian kernel is usually chosen due to its simple, efficient and reliable computing power [32]. The Gaussian kernel can be written as follows: where represents the kernel parameter, and and are two independent vectors in the input space.
However, the LSSVM's parameters are uncertain for different problems and must be carefully determined because this will affect the modeling effect. It was proposed to use CPSO (chaotic particle swarm optimization) to optimize the two parameters of LSSVM. What needs to be noted is that CPSO Suppose a training dataset {x i y i } n i , where x i is the n-dimensional input vector, x i ∈ R n , and y i is the responding desired output, y i ∈ R. The training data can fit with the function represented as: where ∅(x) is a nonlinear mapping between input space x and high-dimensional feature space, w denotes a weight vector, and b denotes a constant offset. The w and b can be estimated through minimization of regularized risk function subjected to the equality constraint shown in (5): where e i is the introduced error variable, and γ is an adjustable parameter. The corresponding Lagrange for Equation (5) is: The optimality condition of Equation (6) leads to the following linear system: where Ω ij = Φ x i , x j is the kernel function and has optional kernels such as Gaussian kernel, polynomial kernel, B-spine kernel, and so on. A Gaussian kernel is usually chosen due to its simple, efficient and reliable computing power [32]. The Gaussian kernel can be written as follows: where σ 2 represents the kernel parameter, and x i and x j are two independent vectors in the input space. However, the LSSVM's parameters are uncertain for different problems and must be carefully determined because this will affect the modeling effect. It was proposed to use CPSO (chaotic particle swarm optimization) to optimize the two parameters of LSSVM. What needs to be noted is that CPSO is an improved algorithm for PSO (particle swarm optimization). Similar to GA (genetic algorithms) and evolutionary programming, PSO is also a heuristic searching method, but it does not contain complex mechanisms such as crossover or mutation [33]. The PSO algorithm is fast and suitable for parallel computing. Since chaotic mapping has deterministic, ergodic and stochastic properties, chaos mapping is introduced into PSO, which is called CPSO [34,35]. During calculation, the particle moving in space is influenced by three factors, including the particle's current velocity v(t), the best point p id where the particle has arrived before and optimum point of the community p gd . Concurrently, the three factors are respectively assigned to a random weight. The velocity and position are updated on the basis of Equations (9) and (10): where c 1 and c 2 denote the learning factors, usually, c 1 = c 2 = 2; ω is the weight; r 1 , r 2 are the random numbers within [0, 1]; z denotes the current position of the particle; t is the number of iterations; n and m, respectively, represent the number of particles and the dimension of the particle. The CPSO process can be described as follows: Step 1. Chaos Initialization: A random vector defined as z = [z 1 , z 2 , . . . , z D ] is generated, D denotes the dimension of the variable that needs to be optimized. The range of each component of z is [0, 1]. Then, M components are obtained according to Equation (11).
The chaotic interval is mapped to the range of variables based on Equation (12): where b j and a j are the upper and lower limits of the optimization variables, respectively. Then, calculating the fitness value of each particle according to the objective function shown in Equation (13): where n represents the sample size of the training dataset. The output training set is y i andŷ i is the corresponding fitting result of y i . The N particles with better performance are selected as the initial solution from the initial particle swarm with M particles, and the particle velocity is generated at random.
Defining the current position of the particle is individual best, then calculating the corresponding fitness value and setting the position of the particle whose fitness value is the best is the global best point. The global best corresponds to the minimum fitness value.
Step 2. Velocity and Position Updating: The velocity and position of ith particle in the dth dimension is updated according to Equations (9) and (10), with the global best and individual best. The parameter ω is updated by the Equation (14): where ω max and ω min are the maximum and minimum values of initial weights which can be set as 0.9 and 0.1, respectively, t denotes the current number of iterations, while t max is the maximum number of iterations set.
Step 3. Individual and Global Best Updating: Calculating the fitness value of each particle and evaluating this particle according to its updated position. When the fitness value of this particle is less than that of individual or global best, then individual or global best will be replaced by the position of this particle.
Step 4. Stopping Criteria: If one stopping condition is satisfied then stop; otherwise go to Step 2. Based on descriptions above, the flow chart of CPSO-LSSVM is shown in Figure 4: Sensors 2017, 17, 2335 6 of 15 Step 3. Individual and Global Best Updating: Calculating the fitness value of each particle and evaluating this particle according to its updated position. When the fitness value of this particle is less than that of individual or global best, then individual or global best will be replaced by the position of this particle.
Step 4. Stopping Criteria: If one stopping condition is satisfied then stop; otherwise go to Step 2. Based on descriptions above, the flow chart of CPSO-LSSVM is shown in Figure 4:

Construction of MEMS Gyroscope Drift Model
The CPSO-LSSVM method is used to model the MEMS gyroscope random drift, and is compared with BP-ANN, which is widely used in nonlinear fields. The original MEMS gyroscope data unavoidably includes noise information. Therefore, the wavelet filtering method is used to eliminate noise, and the random drift of the MEMS gyroscope is obtained for constructing a model. The random drift is a chaotic time series, and it is reconstructed by PSR since it is an efficient algorithm to analyze chaotic time series; the dimension of the random time series is improved by embedding the onedimensional time series into an auxiliary phase space, thus improving the prediction accuracy [3,36]. Then, the reconstructed data is modeled and analyzed. Details of the model are expressed as follows, and the main procedures are illustrated as follows and depicted in Figure 5:

Construction of MEMS Gyroscope Drift Model
The CPSO-LSSVM method is used to model the MEMS gyroscope random drift, and is compared with BP-ANN, which is widely used in nonlinear fields. The original MEMS gyroscope data unavoidably includes noise information. Therefore, the wavelet filtering method is used to eliminate noise, and the random drift of the MEMS gyroscope is obtained for constructing a model. The random drift is a chaotic time series, and it is reconstructed by PSR since it is an efficient algorithm to analyze chaotic time series; the dimension of the random time series is improved by embedding the one-dimensional time series into an auxiliary phase space, thus improving the prediction accuracy [3,36]. Then, the reconstructed data is modeled and analyzed. Details of the model are expressed as follows, and the main procedures are illustrated as follows and depicted in Figure 5: Step 1: Wavelet Filtering. The wavelet analysis overcomes the shortcomings of Fourier analysis in signal processing and can remove the noise while preserving the signal instantaneous dynamic characteristics, so it has a wide range of applications in signal de-noising [37,38]. The original MEMS gyroscope drift contained noise components, and the mother wavelet 'db4' and the soft threshold are adopted for de-noising.
Step 2: Phase Space Reconstruction. Regarding the random time series: { ( ) , i = 1, 2,..., N, the reconstructed sequence is obtained through PSR method as follows: where represents embedding dimension and denotes delay time. The delay time window, , shown in Equation (17) is introduced as follows: .
The C-C method is used to determine optimal parameters and since this method is robust, relatively simple and not computationally demanding [39]. The following three formulas are very important for determining the parameters: Step 1: Wavelet Filtering. The wavelet analysis overcomes the shortcomings of Fourier analysis in signal processing and can remove the noise while preserving the signal instantaneous dynamic characteristics, so it has a wide range of applications in signal de-noising [37,38]. The original MEMS gyroscope drift contained noise components, and the mother wavelet 'db4' and the soft threshold are adopted for de-noising.
Step 2: Phase Space Reconstruction. Regarding the random time series: {x(i)}, i = 1, 2,..., N, the reconstructed sequence is obtained through PSR method as follows: where m represents embedding dimension and τ denotes delay time.
The delay time window, τ w , shown in Equation (17) is introduced as follows: The C-C method is used to determine optimal parameters τ w and τ since this method is robust, relatively simple and not computationally demanding [39]. The following three formulas are very important for determining the parameters: S cor (t) = ∆S(t) + |S(t)| (20) where m = 2,3,4,5, r j = iσ 2 , i = 1,2,3,4 and S m, r j , t and ∆S(m, t) are two functions whose definitions have been elaborated in the paper [39].
Looking for the first zero point of S(t) or the first local minimum of ∆S(t) to find the first locally optimal time for independence of the data gives the delay time τ. Meanwhile, by simply looking for the minimum of S cor (t) and then the delay time window, τ w can be obtained [37,39].
Step 3: Data Splitting. After the noise removal and data reconstruction, the MEMS gyroscope random drift sequence is split into the training dataset and the testing dataset. To eliminate the impact of amplitude and improve convergence performance, data normalization is needed to be done, based on Equation (21) as follows: where x max is the maximum value of the variable and x min represents the minimum value, with x max set as 1 and x min is set as −1. The range of variables is normalized within [−1, 1] after preprocessing.
Step 4: Model Construction and Test. This step requires a training dataset to train the model, and uses a testing dataset to verify the prediction accuracy of the proposed model. The algorithms of different models have been described in detail in Section 2 of this paper.
Step 5: Evaluation. The prediction error indexes, namely MAE (mean absolute error), RMSE (root mean square error), and ARE (average relative error), are applied for evaluating prediction accuracy. The application of these indices is very common, so are only briefly described here: where n is the size of prediction data, y pi denotes the predicting value, and y i represents the actual value.

Experiment Setup
The hardware utilized in the trials was an MEMS IMU (inertial measurement unit) designed by this laboratory. The MEMS IMU was placed on the three-axis turntable. Preheating the MEMS IMU lasted for 20 min, then 1500 s of data were collected, and the sampling rate was 10 Hz. The output data of the X-axis gyroscope were analyzed and modeled in this paper. The experiment environment and MEMS IMU are shown in Figure 6.

Data Preprocessing
Section 3 notes the task of data preprocessing was to perform wavelet filtering and phase space reconstruction for the collected data of MEMS gyroscope. Following deterministic error compensation, the time series of random drift was obtained. First wavelet filtering with mother wavelet 'db4' and the soft threshold was used to remove most of the noise of original data under three scales. Figure 7 shows the de-noised data contained the essential features of random drift. The next step was to reconstruct the filtered series. The C-C method was used to determine the delay time and the time window . The related equations are presented as Equations (18)- (20). Figure 8 shows the time delay τ corresponding to the first local minimum of ∆ ( ) was 10. The time window corresponding to the minimum value of ( ) was 20. According to the Equation (17), = 3 was easily obtained. Then, the reconstructed mathematical matrixes were obtained as follows:

Data Preprocessing
Section 3 notes the task of data preprocessing was to perform wavelet filtering and phase space reconstruction for the collected data of MEMS gyroscope. Following deterministic error compensation, the time series of random drift was obtained. First wavelet filtering with mother wavelet 'db4' and the soft threshold was used to remove most of the noise of original data under three scales. Figure 7 shows the de-noised data contained the essential features of random drift.

Data Preprocessing
Section 3 notes the task of data preprocessing was to perform wavelet filtering and phase space reconstruction for the collected data of MEMS gyroscope. Following deterministic error compensation, the time series of random drift was obtained. First wavelet filtering with mother wavelet 'db4' and the soft threshold was used to remove most of the noise of original data under three scales. Figure 7 shows the de-noised data contained the essential features of random drift. The next step was to reconstruct the filtered series. The C-C method was used to determine the delay time and the time window . The related equations are presented as Equations (18)- (20). Figure 8 shows the time delay τ corresponding to the first local minimum of ∆ ( ) was 10. The time window corresponding to the minimum value of ( ) was 20. According to the Equation (17), = 3 was easily obtained. Then, the reconstructed mathematical matrixes were obtained as follows: The next step was to reconstruct the filtered series. The C-C method was used to determine the delay time τ and the time window τ w . The related equations are presented as Equations (18)- (20). Figure 8 shows the time delay τ corresponding to the first local minimum of ∆S(t) was 10. The time window τ w corresponding to the minimum value of S cor (t) was 20. According to the Equation (17), m = 3 was easily obtained. Then, the reconstructed mathematical matrixes were obtained as follows: x 11 x 21 x 2 x 12 x 22 . . .

Comparing the Effects of Modeling
To verify the applicability of the proposed model, the reconstructed data (25) and (26) were normalized by using Equation (21) at first, then the data were divided into three groups. Each group then was divided into two data sets: the training dataset (80%), and the testing dataset (20%). Furthermore, the training set was further divided into an input-training set and an output-training set. Detailed descriptions were as follows.
The first group containing 5000 sets of data was used for establishment and validation of BP-ANN and CPSO-LSSVM models at first. The training dataset contained 4000 sample points, and the other 1000 sample points were for testing. The BP-ANN method, with 10 neurons in hidden layer, was used for the training dataset. The parameters of LSSVM, which had been acquired on the basis of CPSO, were γ = 29.358, = 83.162. The fitting effect of the training dataset was shown in Figure  9. The two well-trained models were used to predict the output of the testing dataset, and the predicted values were compared with the actual values, which are shown in Figure 10:

Comparing the Effects of Modeling
To verify the applicability of the proposed model, the reconstructed data (25) and (26) were normalized by using Equation (21) at first, then the data were divided into three groups. Each group then was divided into two data sets: the training dataset (80%), and the testing dataset (20%). Furthermore, the training set was further divided into an input-training set and an output-training set. Detailed descriptions were as follows.
The first group containing 5000 sets of data was used for establishment and validation of BP-ANN and CPSO-LSSVM models at first. The training dataset contained 4000 sample points, and the other 1000 sample points were for testing. The BP-ANN method, with 10 neurons in hidden layer, was used for the training dataset. The parameters of LSSVM, which had been acquired on the basis of CPSO, were γ = 29.358, σ 2 = 83.162. The fitting effect of the training dataset was shown in Figure 9.

Comparing the Effects of Modeling
To verify the applicability of the proposed model, the reconstructed data (25) and (26) were normalized by using Equation (21) at first, then the data were divided into three groups. Each group then was divided into two data sets: the training dataset (80%), and the testing dataset (20%). Furthermore, the training set was further divided into an input-training set and an output-training set. Detailed descriptions were as follows.
The first group containing 5000 sets of data was used for establishment and validation of BP-ANN and CPSO-LSSVM models at first. The training dataset contained 4000 sample points, and the other 1000 sample points were for testing. The BP-ANN method, with 10 neurons in hidden layer, was used for the training dataset. The parameters of LSSVM, which had been acquired on the basis of CPSO, were γ = 29.358, = 83.162. The fitting effect of the training dataset was shown in Figure  9. The two well-trained models were used to predict the output of the testing dataset, and the predicted values were compared with the actual values, which are shown in Figure 10: The two well-trained models were used to predict the output of the testing dataset, and the predicted values were compared with the actual values, which are shown in Figure 10: Statistical parameters, such as MAE, RMSE and ARE, depicted in Equations (22)- (24), were used to evaluate the prediction accuracy. Table 1 depicts the statistical analysis of the comparison results of the two models. Furthermore, the reproducibility of both methods was evaluated by building models for the other two groups. The results of the three groups were vividly presented in Figure 11. Comparing the prediction results of the other two groups reaches conclusions similar to that which the first group obtained.

CPSO-LSSVM
Group Ⅱ Group Ⅲ Group Ⅰ Figure 11. The statistical results of the three groups. Statistical parameters, such as MAE, RMSE and ARE, depicted in Equations (22)- (24), were used to evaluate the prediction accuracy. Table 1 depicts the statistical analysis of the comparison results of the two models. Furthermore, the reproducibility of both methods was evaluated by building models for the other two groups. The results of the three groups were vividly presented in Figure 11. Comparing the prediction results of the other two groups reaches conclusions similar to that which the first group obtained. Statistical parameters, such as MAE, RMSE and ARE, depicted in Equations (22)- (24), were used to evaluate the prediction accuracy. Table 1 depicts the statistical analysis of the comparison results of the two models. Furthermore, the reproducibility of both methods was evaluated by building models for the other two groups. The results of the three groups were vividly presented in Figure 11. Comparing the prediction results of the other two groups reaches conclusions similar to that which the first group obtained.

CPSO-LSSVM
Group Ⅱ Group Ⅲ Group Ⅰ Figure 11. The statistical results of the three groups. Based on the results above, the proposed CPSO-LSSVM method outperforms the BP-ANN. The better prediction performance may be due to the tendency for the CPSO to enjoy certainty, ergodicity and stochastic property, and to have a good global search capability; LSSVSM has the characteristics of SRM and anti-interference ability; CPSO-LSSVM combines both the advantages of CPSO and LSSVM.
The compensation of MEMS gyroscope drift then was done based on the prediction values of CPSO-LSSVM method. The anti-normalization of prediction data were used to compensate the random drift of MEMS gyroscope. Comparing the testing data in the first group before and after the compensation, the results showed that the compensative effect was remarkably effective. The standard deviation of the random drift after compensation was 0.00065 • /s, while it was 0.00354 • /s without compensation. Figure 12 shows the random drift of the MEMS gyroscope before and after the compensation. Based on the results above, the proposed CPSO-LSSVM method outperforms the BP-ANN. The better prediction performance may be due to the tendency for the CPSO to enjoy certainty, ergodicity and stochastic property, and to have a good global search capability; LSSVSM has the characteristics of SRM and anti-interference ability; CPSO-LSSVM combines both the advantages of CPSO and LSSVM.
The compensation of MEMS gyroscope drift then was done based on the prediction values of CPSO-LSSVM method. The anti-normalization of prediction data were used to compensate the random drift of MEMS gyroscope. Comparing the testing data in the first group before and after the compensation, the results showed that the compensative effect was remarkably effective. The standard deviation of the random drift after compensation was 0.00065°/s, while it was 0.00354°/s without compensation. Figure 12 shows the random drift of the MEMS gyroscope before and after the compensation. The effect of compensating the random drift using CPSO-LSSVM method for the three groups was shown in Table 2, where the evaluation index was the standard deviation of the random drift.  Table 2, it was clear that the standard deviation of the random drift had been decreased significantly after compensation. The results further demonstrated the applicability and validity of the CPSO-LSSVM method. This method was a feasible and satisfactory way to establish the model for MEMS gyroscope random drift.

Conclusions
The modeling of MEMS gyroscope random drift is a hot research topic since an accurate model is beneficial to improve the accuracy of MEMS gyroscopes. The key contribution of this paper is to reconstruct the random drift data of the MEMS gyroscope with PSR using the C-C method, and then to analyze the reconstructed data by the BP-ANN and CPSO-LSSVM methods. Based on the results and analyses, the following conclusions can be drawn: (a) Through a comparison of the results from the above two analysis methods, the statistical indicators, including MAE, RMSE, and ARE, for the testing dataset indicate that the proposed method has a better prediction precision; (b) the availability and effectiveness of CPSO-LSSVM method are demonstrated by an obvious decrease in the standard deviation of the random drift after compensation, which is shown in Table 2; (c) PSR plays an The effect of compensating the random drift using CPSO-LSSVM method for the three groups was shown in Table 2, where the evaluation index was the standard deviation of the random drift. Observing Table 2, it was clear that the standard deviation of the random drift had been decreased significantly after compensation. The results further demonstrated the applicability and validity of the CPSO-LSSVM method. This method was a feasible and satisfactory way to establish the model for MEMS gyroscope random drift.

Conclusions
The modeling of MEMS gyroscope random drift is a hot research topic since an accurate model is beneficial to improve the accuracy of MEMS gyroscopes. The key contribution of this paper is to reconstruct the random drift data of the MEMS gyroscope with PSR using the C-C method, and then to analyze the reconstructed data by the BP-ANN and CPSO-LSSVM methods. Based on the results and analyses, the following conclusions can be drawn: (a) Through a comparison of the results from the above two analysis methods, the statistical indicators, including MAE, RMSE, and ARE, for the testing dataset indicate that the proposed method has a better prediction precision; (b) the availability and effectiveness of CPSO-LSSVM method are demonstrated by an obvious decrease in the standard deviation of the random drift after compensation, which is shown in Table 2; (c) PSR plays an important role in the modeling of MEMS gyroscope random drift since it can reduce the dimension of input vector during modeling, thus reducing computation cost and complexity; (d) the CPSO can perform more powerful searching capability for parameters to construct the proposed model, and it can also be used for function optimization, data mining, and so on; (e) the combination and improvement of intelligent algorithms can lead to a better algorithm, and the CPSO-LSSVM method presented in this paper validates this point; (f) whether the modeling accuracy and compensation effectiveness of the method are affected by the different sensing mechanisms should be further studied. Considering using an algorithm to compensate for random drift of MEMS gyroscope is not enough, as well as from the structure and design standpoint to improve the accuracy of the MEMS gyroscope.
To summarize, the results show the better prediction capacity of the proposed model. It is believed that this algorithm can be regarded as a reliable method for modeling and compensating the MEMS gyroscope random drift. This method is expected to be applied in the field of north-seeking and short-time navigation based on MEMS gyroscopes. Additionally, the method proposed in this paper also implies that this method can be a new way for pedestrian step estimation, pattern recognition and many other fields. Better algorithms and more methods need to be studied further, and more work needs to be done in the future.