A MEMS IMU De-Noising Method Using Long Short Term Memory Recurrent Neural Networks (LSTM-RNN)

Microelectromechanical Systems (MEMS) Inertial Measurement Units (IMUs), each containing three orthogonal gyroscopes and three orthogonal accelerometers, have been widely used in positioning and navigation owing to gradually improved accuracy together with small size and low cost. However, the errors of a standalone Inertial Navigation System (INS) based on a MEMS IMU diverge dramatically over time, since the MEMS IMU measurements contain various nonlinear errors. Therefore, a MEMS INS is usually integrated with a Global Positioning System (GPS) receiver to provide reliable navigation solutions. The GPS receiver can generate stable and precise position and time information under open sky. However, in signal-challenged conditions, for instance dense forests, urban canyons, or mountain valleys, where the GPS signal is weak or even blocked, the GPS receiver fails to output reliable positioning information, and the integrated system degrades to a standalone INS. Many efforts have been devoted to improving the accuracy of INS, and de-noising or modelling the random errors contained in the MEMS IMU has been demonstrated to be an effective way of improving MEMS INS performance. In this paper, an Artificial Intelligence (AI) method is proposed to de-noise the MEMS IMU output signals: a popular variant of the Recurrent Neural Network (RNN), the Long Short Term Memory (LSTM) RNN, is employed to filter the MEMS gyroscope outputs, treating the signals as time series. A MEMS IMU (MSI3200, manufactured by MT Microsystems Company, Shijiazhuang, China) was employed to test the proposed method; 2 min of raw gyroscope data sampled at 400 Hz were collected and used in the test.
The results show that the standard deviation (STD) values of the three-axis gyroscope data decreased by 60.3%, 37%, and 44.6% respectively compared with the raw signals, and correspondingly the three-axis attitude errors decreased by 15.8%, 18.3% and 51.3%. Further, compared with an Auto Regressive and Moving Average (ARMA) model with fixed parameters, the STD values of the three-axis gyroscope outputs decreased by 42.4%, 21.4% and 21.4%, and the attitude errors decreased by 47.6%, 42.3% and 52.0%. The results indicate that the de-noising scheme is effective for improving MEMS INS accuracy, and that the proposed LSTM-RNN method is preferable in this application.

Generally, in the data science community, sequence prediction problems have existed for a long time in a wide range of applications, including stock price prediction, sales pattern discovery, language translation, and speech recognition [37][38][39][40][41]. Recently, a new breakthrough has occurred in this community: the Long Short Term Memory Recurrent Neural Network (LSTM-RNN) has been proposed and demonstrated to be more effective for almost all of these sequence prediction problems [37][38][39][40][41]. Compared with the conventional RNN, the LSTM-RNN introduces a "gate structure" to address long-term memory, which allows it to selectively remember information over long periods. This special structure makes it more suitable for predicting or processing time-based series data. In this paper, the LSTM-RNN is applied to de-noising MEMS IMU gyroscope raw signals; LSTM-RNNs have performed excellently in time series signal processing, for instance stock price prediction, speech signal processing, and others [37][38][39][40][41]. A MEMS Inertial Measurement Unit (IMU) manufactured by MT Microsystems Company, the MSI3200 IMU, is employed in the experiments for testing [42]. Firstly, a common ARMA model is employed to process the raw signal, with the order and parameters determined through auto-correlation and partial auto-correlation analysis; secondly, a single-layer LSTM and a multi-layer LSTM are compared for MEMS gyroscope raw signal de-noising in terms of average training loss, training time, and de-noising performance; finally, the three-axis attitude errors of the raw signals, the ARMA, and the LSTM-RNN are compared and analyzed.
The remainder of this paper is organized as follows: (1) Section 2 introduces the methods, including the Auto Regressive Moving Average (ARMA) model and the proposed LSTM-RNN; (2) Section 3 presents the experiments, results, and comparisons; (3) the discussion, conclusions, and future work follow.

Method
In this section, the conventional ARMA model, representing statistical methods, and the proposed LSTM-RNN, representing AI methods, are presented. The principles, basic equations, and information flow are briefly introduced.

ARMA Model
As illustrated in previous papers [25,26], two steps are essential for setting up an ARMA model: (1) after obtaining the raw gyroscope signals, auto-correlation and partial auto-correlation analyses are performed to characterize the noise and select a suitable time series model; (2) the parameters of the ARMA model are estimated.
The auto-correlation of a signal is a product operation of the signal and a time-shifted version of itself. Assuming r(t) is a random signal sequence, the auto-correlation can be modelled as [25,26]:

R(τ) = E[r(t) r(t + τ)]  (1)

where E(·) is the expectation operator and τ is the time delay or shift. The normalized correlation used in the partial correlation process is defined as [25,26]:

P(τ) = Σ (r(t) − E(r(t)))(r(t + τ) − E(r(t + τ))) / √( Σ (r(t) − E(r(t)))² · Σ (r(t + τ) − E(r(t + τ)))² )  (2)

The ARMA model is defined as:

z(k) = Σ_{i=1}^{p} a_i z(k − i) + Σ_{j=1}^{q} b_j ε(k − j) + ε(k)  (3)

where ε(k) is zero-mean white noise of unknown variance, p and q are the orders of the ARMA model, z(k − i) are past values of the input time series, and a_i and b_j are the model parameters. Several approaches have been published for estimating these parameters, for instance the Kalman filter and the least squares estimation method [25,26]. Generally, the auto-correlation and partial auto-correlation functions are used to decide the order of the ARMA model.
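As a concrete illustration of Equations (1) and (2), the sketch below computes the normalized sample auto-correlation and, via the Durbin-Levinson recursion, the partial auto-correlation. The function names and the white-noise check are illustrative, not part of the paper's implementation; for white noise the ACF and PACF should be near zero at all non-zero lags.

```python
import numpy as np

def acf(r, max_lag):
    """Normalized sample auto-correlation, as in Equation (2)."""
    r = np.asarray(r, dtype=float)
    d = r - r.mean()
    var = np.sum(d * d)
    return np.array([np.sum(d[: len(d) - tau] * d[tau:]) / var
                     for tau in range(max_lag + 1)])

def pacf(r, max_lag):
    """Partial auto-correlation via the Durbin-Levinson recursion."""
    rho = acf(r, max_lag)
    phi = np.zeros((max_lag + 1, max_lag + 1))
    p = np.zeros(max_lag + 1)
    p[0] = 1.0
    if max_lag >= 1:
        phi[1, 1] = rho[1]
        p[1] = rho[1]
    for k in range(2, max_lag + 1):
        num = rho[k] - np.sum(phi[k - 1, 1:k] * rho[1:k][::-1])
        den = 1.0 - np.sum(phi[k - 1, 1:k] * rho[1:k])
        phi[k, k] = num / den
        for j in range(1, k):
            phi[k, j] = phi[k - 1, j] - phi[k, k] * phi[k - 1, k - j]
        p[k] = phi[k, k]
    return p

# Sanity check on white noise: ACF is 1 at lag 0, near 0 elsewhere.
rng = np.random.default_rng(0)
noise = rng.standard_normal(4000)
a = acf(noise, 5)
```

In practice the lags at which the ACF and PACF "tail off" or cut off are what guide the choice of p and q for the ARMA model.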

LSTM-RNN Method
Long Short Term Memory (LSTM) is a popular variant of the common Recurrent Neural Network (RNN), and an RNN composed of LSTM units is often called an LSTM network. Different from the plain RNN, a new structure termed the "gate" is added to the LSTM. Commonly, an LSTM unit is composed of a cell, an "input gate", an "output gate", and a "forget gate". The basic structure of a single-layer LSTM unit is shown in Figure 1. The cell remembers values over arbitrary time intervals, and the three gates regulate and control the flow of information into and out of the cell [38][39][40]. The gates and their equations are described in detail below.

As illustrated in Figure 1, the first part of the LSTM is the "forget gate", which decides what information is thrown away from the cell state; the decision is made by a sigmoid layer called the "forget gate layer". h_{t−1} and x_t are input to the function, which outputs a value between 0 and 1 for each number in the cell state C_{t−1}. The values represent the forgetting degree of each number in the cell state: "1" represents "completely keep this" while "0" represents "completely get rid of this". The operation equation for f_t is:

f_t = σ(W_f · [h_{t−1}, x_t] + b_f)  (4)

where σ(·) is a sigmoid function, W_f is the updating weight matrix, b_f is the bias, h_{t−1} is the hidden state at time t − 1, and x_t is the input vector.

After deciding the memory of the previous hidden state, the second part is the "input gate", which decides what new information is stored in the current cell state. This gate is composed of two parts: (1) a sigmoid layer that decides which values are updated, whose output values i_t range from 0 to 1 and represent the updating degree of each number of the input; (2) a tanh layer that creates a vector of new candidate values C̃_t, which is added to the cell state after being multiplied by the decision vector i_t. The related equations are:

i_t = σ(W_i · [h_{t−1}, x_t] + b_i)  (5)
C̃_t = tanh(W_C · [h_{t−1}, x_t] + b_C)  (6)
C_t = f_t ⊙ C_{t−1} + i_t ⊙ C̃_t  (7)

where W_i and W_C are the updating weight matrices, b_i and b_C are the biases of the input gate, and ⊙ denotes element-wise multiplication.

The last part is the "output gate", which decides what is output. Similarly, a sigmoid layer outputs values o_t that decide which parts of the cell state are output; the cell state is then put through a tanh function, pushing its values to between −1 and 1, and the result is multiplied by the output of the sigmoid gate to decide the output:

o_t = σ(W_o · [h_{t−1}, x_t] + b_o),  h_t = o_t ⊙ tanh(C_t)  (8)

where W_o is the updating weight matrix, b_o is the bias of the output gate, and C_t is the cell state at time t.

Equations (4)-(8) describe the basic LSTM unit, which is just a single unit. Figure 2 presents a sequence of LSTM-RNN units in the time domain: Figure 2a is a simple description of the LSTM-RNN working flow, in which the output is decided not only by the current state but also by a long-term memory, and Figure 2b gives the details. The cell state and hidden state are carried over to the next LSTM unit, and the inner gates decide the degree to which past information is remembered. In addition, before the network is employed for prediction, a training procedure is necessary to determine the unknown parameters in the above equations [37][38][39][40].
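A minimal sketch of a single LSTM step following Equations (4)-(8) may clarify the information flow. The weights below are randomly initialised placeholders (in the paper they are learned during training), and this is not the authors' implementation:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, C_prev, W, b):
    """One LSTM unit step, Equations (4)-(8)."""
    concat = np.concatenate([h_prev, x_t])        # [h_{t-1}, x_t]
    f_t = sigmoid(W["f"] @ concat + b["f"])       # forget gate, Eq. (4)
    i_t = sigmoid(W["i"] @ concat + b["i"])       # input gate, Eq. (5)
    C_tilde = np.tanh(W["C"] @ concat + b["C"])   # candidate values, Eq. (6)
    C_t = f_t * C_prev + i_t * C_tilde            # cell state update, Eq. (7)
    o_t = sigmoid(W["o"] @ concat + b["o"])       # output gate, Eq. (8)
    h_t = o_t * np.tanh(C_t)                      # hidden state, Eq. (8)
    return h_t, C_t

hidden, n_in = 4, 1
rng = np.random.default_rng(1)
W = {k: rng.standard_normal((hidden, hidden + n_in)) * 0.1 for k in "fiCo"}
b = {k: np.zeros(hidden) for k in "fiCo"}
h, C = np.zeros(hidden), np.zeros(hidden)
for x in [0.3, -0.1, 0.2]:                        # a toy input sequence
    h, C = lstm_step(np.array([x]), h, C, W, b)
```

Because h_t = o_t ⊙ tanh(C_t), every component of the hidden state stays strictly within (−1, 1), which is the squashing behaviour described above.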

Experiments and Results
This section presents the experiments and the related analyses for evaluating the performance of the proposed LSTM-RNN method. The laboratory experiments were conducted using data collected from a MEMS IMU (MSI3200) manufactured by MT Microsystems Company, Shijiazhuang, China [29]. A photograph and the specifications of the IMU are given in Figure 3 and Table 1 respectively. The gyroscope bias stability is ≤10°/h and the angular random walk is ≤10°/√h; the accelerometer bias is 0.5 mg and its bias stability is 0.5 mg. The IMU was placed statically on a table, and the sampling frequency was 400 Hz; thus, the amount of data was 48,000 samples over 2 min. The gyroscope output unit is degree/s. The raw noisy X-axis gyroscope output is shown in Figure 4 (red line), and the bias was removed before modelling the errors; the result after bias removal is also presented in Figure 4 (blue line). Note that the program in this experiment was developed in Python with the TensorFlow package, run on an Alienware R2 PC with an Intel i7 CPU and 16 GB of RAM.
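The bias-removal step can be sketched as follows. Since the raw MSI3200 log is not available here, the data are simulated as a constant bias plus white noise (an assumption, not the real sensor record), with the sampling rate and duration matching the experiment:

```python
import numpy as np

FS = 400                      # sampling rate (Hz), as in the experiment
rng = np.random.default_rng(2)

# Simulated stand-in for 2 min of static X-axis gyroscope output (deg/s):
# a constant bias plus white noise; real data would be loaded from the log.
gyro = 0.05 + 0.002 * rng.standard_normal(FS * 120)

bias = gyro.mean()            # static data: the sample mean estimates the bias
gyro_unbiased = gyro - bias   # "data excluding bias", as in Figure 4
std_raw = gyro_unbiased.std() # the STD metric used throughout the paper
```

The same STD metric computed before and after de-noising is what yields the percentage reductions reported in the following subsections.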
The remainder of this section is divided into three parts: (1) the first part describes the results of using the ARMA method to model the errors, presenting the auto-correlation and partial auto-correlation results; the ARMA models are given according to these results, and the standard deviation (STD) values of the de-noised signals are compared with the corresponding raw gyroscope signals; (2) the second part presents the results of using the LSTM-RNN to model the errors, reporting the training time and prediction accuracy for different input vector lengths; moreover, a multi-layer LSTM-RNN is designed and compared with a single-layer LSTM-RNN in terms of training time, computation load, and performance; (3) the last part presents the comparisons between the ARMA and the LSTM-RNN, including statistical results and attitude results.

Error Modeling Using ARMA
For time series analysis, the Auto-Correlation Function (ACF) and Partial Auto-Correlation Function (PACF) are usually employed to select a proper model. As described in Section 2, the ACF and PACF are given by Equations (1) and (2); Figures 5-7 show the ACF and PACF results of the X-axis, Y-axis, and Z-axis gyroscope raw signals respectively. From Figures 5-7, it is evident that the ACF and PACF of the three-axis gyroscope signals tail off. Thus, the ARMA model is suitable for this application, and the order of the ARMA model is determined from these results; more details about parameter determination can be found in [25,26]. The ARMA models for the three-axis gyroscope signals therefore take the form of Equation (3), where z(k) is the data and ε(k) the white noise at time k. The results of the ARMA de-noising are listed in Table 2. Compared with the raw signals, the standard deviation (STD) values of the three-axis gyroscope outputs decrease by 31.1%, 20.0% and 25.0%. The results show that the ARMA model performs effectively for de-noising MEMS gyroscope raw signals. Note that the parameters and orders are fixed in this experiment.
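As a simplified illustration of the model fitting, the sketch below performs a least-squares fit of the autoregressive part only (an AR(p) model, a stand-in for the paper's full ARMA estimation, which also has the moving-average terms), and recovers known coefficients from a synthetic AR(2) process:

```python
import numpy as np

def fit_ar(z, p):
    """Least-squares fit of an AR(p) model
    z(k) = a_1 z(k-1) + ... + a_p z(k-p) + eps(k)."""
    z = np.asarray(z, dtype=float)
    # Each row of X holds the p previous samples for one target z(k).
    X = np.column_stack([z[p - i - 1 : len(z) - i - 1] for i in range(p)])
    y = z[p:]
    a, *_ = np.linalg.lstsq(X, y, rcond=None)
    return a

# Sanity check on a synthetic AR(2) process with known coefficients.
rng = np.random.default_rng(3)
z = np.zeros(20000)
for k in range(2, len(z)):
    z[k] = 0.6 * z[k - 1] - 0.2 * z[k - 2] + 0.01 * rng.standard_normal()

a_hat = fit_ar(z, 2)   # should be close to [0.6, -0.2]
```

With the orders chosen from the ACF/PACF plots, the same regression idea (extended with the ε(k − j) terms) produces the fixed-parameter ARMA models used in this experiment.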



Error Modeling Using LSTM-RNN
In the proposed LSTM-RNN method, the employed MEMS gyroscope dataset is labelled [x_1, x_2, ..., x_N], where N is the number of IMU data samples. The dataset is divided into a training part, used to build the model, and a testing part, used to verify it. The input data vector for training is defined as:

X_k = [x_k, x_{k+1}, ..., x_{k+step−1}]

and the corresponding output is defined as:

y_k = x_{k+step}

In the above equations, the variable step is the length of the input data vector for the training procedure. A suitable value of step must be identified to realize a tradeoff between training time and prediction performance. Table 3 shows the training results and the standard deviation (STD) of the prediction for the three-axis gyroscope data; in this test, 5, 10, 15, 20, and 30 were selected as candidate vector lengths. Table 4 shows the comparison results. The training dataset length is 1000, and the testing dataset length is 48,000 (2 min at a 400 Hz sampling rate). The specifications of the LSTM-RNN are presented in Table 5. The time consumption increases with the input vector length, while the STD values decrease first and then increase; hence, 20 is selected as the input vector length, the best tradeoff between STD and computation time.

As shown in Table 5, the noise is considerably decreased by the LSTM-RNN, with the STD values decreasing by 60.3%, 37%, and 44.6%. As aforementioned, the training epoch was set to 50, and a multi-layer LSTM-RNN was designed and compared with the single-layer LSTM-RNN; Table 6 shows the results. The multi-layer LSTM-RNN (two hidden layers) has a lower average training loss at epoch 50, decreasing by 55.4%, 34.2% and 32.1%. However, there is no obvious improvement in filtering performance: the STD values of the filtered data show no reduction and are even slightly higher, while the operation time is almost twice that of the single-layer LSTM-RNN.
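The construction of the training pairs above can be sketched as a sliding window (illustrative names, not the authors' code):

```python
import numpy as np

def make_windows(data, step):
    """Build (input, output) pairs: each input is `step` consecutive
    samples and the output is the sample that follows the window."""
    X = np.array([data[k : k + step] for k in range(len(data) - step)])
    y = np.array([data[k + step] for k in range(len(data) - step)])
    return X, y

series = np.arange(10.0)        # stand-in for gyroscope samples
X, y = make_windows(series, step=4)
# X[0] = [0, 1, 2, 3] predicts y[0] = 4, and so on.
```

Increasing step lengthens each training sample, which is why the training time grows with the input vector length while the prediction STD passes through an optimum.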
Figure 8 shows the training loss comparison of the single-layer and multi-layer LSTM-RNN; they reach identical accuracy at the 20th epoch. Thus, the multi-layer LSTM-RNN was also trained with 20 epochs, and Table 7 shows the results, compared with the single-layer LSTM-RNN trained for 50 epochs. The average training losses increase slightly, and the STD values are almost identical to those of the multi-layer LSTM-RNN trained for 50 epochs. However, the time consumption of the multi-layer LSTM-RNN is now lower than that of the single-layer LSTM-RNN, which results from the reduced number of training epochs. In terms of the STD values, the de-noised three-axis gyroscope outputs show a decline of 22.4%, 9.1%, and 22.6% respectively compared with the single-layer LSTM-RNN trained for 50 epochs. This decline in accuracy indicates that the multi-layer LSTM-RNN with fewer training epochs has weaker generalization ability, since the multi-layer LSTM-RNN has more parameters and therefore needs more training epochs. Thus, with the training epochs set to 20, the multi-layer network has slightly worse STD values than the single-layer LSTM-RNN.

Comparisons of ARMA and LSTM-RNN
This part presents the comparisons between the ARMA and the LSTM-RNN. Table 8 shows the STD results of the two de-noising methods. Compared with the raw signals, the STD values of the three-axis gyroscope data from the ARMA method show a 31.2%, 20.0% and 25.0% improvement, while the STD values of the single-layer LSTM-RNN de-noised signals decrease by 42.4%, 21.4% and 21.4% respectively. On the same MEMS IMU dataset, the LSTM-RNN thus shows an obvious improvement of 42.3%, 21.4% and 26.2% respectively for the three-axis gyroscope data.
Further, Figure 9 shows the attitude errors computed from the ARMA and LSTM-RNN de-noised MEMS IMU data. In Figure 9, the blue line represents the attitude errors of the raw MEMS IMU signals, the green line those of the ARMA de-noised signals, and the red line those of the designed single-layer LSTM-RNN de-noised signals. Table 9 shows the maximum attitude errors. Compared with the raw signals, the pitch, roll, and yaw angle errors decreased by 15.8%, 18.3% and 51.3% respectively: specifically, the pitch error decreases from −5.07° to −4.27°, the roll error from −1.95° to −1.60°, and the yaw error from −3.85° to −1.87°. Moreover, compared with the ARMA results, the errors of the LSTM-RNN de-noised signals decreased by a further 47.6%, 42.3% and 52.0%; to be specific, the pitch, roll, and yaw angles show an extra improvement of 2.04°, 0.66° and 0.96° over the ARMA. The ARMA employed in this experiment was operated with fixed parameters estimated from a selected part of the dataset; thus, over the testing dataset, more suitable parameters might exist that would give better performance. However, the single-layer LSTM and the ARMA were tested on the identical dataset, which suggests that the LSTM-RNN has better generalization ability in this application; this is what we believe accounts for the accuracy improvement of the single-layer LSTM-RNN over the common ARMA method. In addition, the yaw errors of the LSTM-RNN de-noised signals show an upward trend that differs from the errors of the ARMA and the raw signals; we think the principle of the LSTM-RNN may account for this, and the specifics need further investigation. Overall, the results demonstrate the effectiveness of the LSTM-RNN for de-noising MEMS gyroscope signals.
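To illustrate how gyroscope noise maps into the attitude errors compared above, the sketch below integrates a single-axis angular rate over time. This is a small-angle, single-axis simplification of the full INS attitude update (which uses 3-D quaternion or DCM mechanization), with simulated data and an idealised "de-noising" that simply halves the noise:

```python
import numpy as np

DT = 1.0 / 400.0   # 400 Hz sampling interval, as in the experiment

def integrate_rate(gyro_deg_s, dt=DT):
    """Single-axis small-angle attitude from angular rate (deg/s -> deg)."""
    return np.cumsum(gyro_deg_s) * dt

rng = np.random.default_rng(4)
true_rate = np.zeros(48000)               # static test: true angle stays 0
raw = true_rate + 0.003 * rng.standard_normal(48000)
denoised = raw * 0.5                      # idealised stand-in for a filter

err_raw = float(np.abs(integrate_rate(raw)).max())
err_dn = float(np.abs(integrate_rate(denoised)).max())
# Halving the rate noise halves the accumulated attitude error here.
```

This is the mechanism behind Table 9: reducing the gyroscope output STD directly reduces the attitude error accumulated by the integration.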


Discussion
1. In this paper, limited by the computing capacity of the employed computer, the LSTM-RNN had a limited number of layers, which might negatively influence the generalization ability of the LSTM-RNN and its long-term prediction performance.
2. Only one RNN variant, the LSTM-RNN, was employed and evaluated in this application; it would be meaningful to explore other LSTM-RNN structures that may be more suitable for MEMS IMU error modelling and de-noising.
3. The method was tested only on static data; dynamic trajectory data should be included to fully evaluate the proposed method, since the noise characteristics in dynamic environments may differ from those in static conditions.

Conclusions
This paper presented an LSTM-RNN based MEMS IMU error modelling method. A MEMS IMU (MSI3200) was employed to test the proposed method. From the comparisons, the following major conclusions were drawn: (1) the LSTM-RNN outperformed the ARMA in this application: compared with the ARMA model, the standard deviation of the single-layer LSTM-RNN de-noised signals decreased by 42.4%, 21.4% and 21.4% respectively, and the attitude errors decreased by 47.6%, 42.3% and 52.0%; (2) the multi-layer LSTM-RNN was able to reach the settled average training loss with fewer training epochs, but it did not outperform the single-layer LSTM-RNN in the standard deviation of the prediction; when the training epoch was set to 20, the multi-layer network had slightly worse prediction accuracy but less computation time than the single-layer LSTM-RNN.
In addition, we think some further details are worth investigating in the future: (1) it is meaningful to investigate deep LSTM-RNN networks, which should be trained with a large amount of data; well-trained deep LSTM-RNNs have been demonstrated to be more capable in some applications, and a deep LSTM-RNN will be implemented and presented in future work; (2) many variants of the RNN have been published and demonstrated to be effective in solving time series prediction problems; it is meaningful to further investigate and compare their performance to find neural networks more suitable for this particular application. A comparison of several popular RNN variants will be presented in future work.