Field Programmable Gate Array Based Parallel Strapdown Algorithm Design for Strapdown Inertial Navigation Systems

A new generalized optimum strapdown algorithm with coning and sculling compensation is presented, in which the position, velocity and attitude updating operations are carried out based on the single-speed structure in which all computations are executed at a single updating rate that is sufficiently high to accurately account for high frequency angular rate and acceleration rectification effects. Different from existing algorithms, the updating rates of the coning and sculling compensations are unrelated with the number of the gyro incremental angle samples and the number of the accelerometer incremental velocity samples. When the output sampling rate of inertial sensors remains constant, this algorithm allows increasing the updating rate of the coning and sculling compensation, yet with more numbers of gyro incremental angle and accelerometer incremental velocity in order to improve the accuracy of system. Then, in order to implement the new strapdown algorithm in a single FPGA chip, the parallelization of the algorithm is designed and its computational complexity is analyzed. The performance of the proposed parallel strapdown algorithm is tested on the Xilinx ISE 12.3 software platform and the FPGA device XC6VLX550T hardware platform on the basis of some fighter data. It is shown that this parallel strapdown algorithm on the FPGA platform can greatly decrease the execution time of algorithm to meet the real-time and high precision requirements of system on the high dynamic environment, relative to the existing implemented on the DSP platform.


Introduction
In a strapdown inertial navigation system (SINS), inertial sensors are rigidly attached to the vehicle, which leads to the system suffering from the highly dynamic vehicle movement environment. In addition, inertial sensors may be subject to high frequency motion as a result of body bending and engine-induced vibration. The strapdown algorithms adopted by most modern SINSs are constructed based on a general two-speed structure by which the position, velocity and attitude (PVA) updating operations are divided into two parts [1,2]: an exact moderate-speed updating routine (e.g., 50 ~ 200 Hz) typically designed to update each PVA based on the maximum angular rate and acceleration of vehicle; and a high-speed updating routine (e.g., 1 ~ 4 KHz for an aircraft INS with the positioning accuracy of less than 1 nmile/h) designed to accurately account for vibration-induced coning and sculling effects based on the anticipated vibration condition of the system. The original intention of the two-speed structure is to overcome the throughput limitations of early computer techniques, but the limitation is rapidly becoming insignificant with the continuous improvement in the performance of modern high-speed computers [3]. On the other hand, along with the fast progress of modern vehicles in ultra-high speed and other maneuvering performances, there exist more and more urgent demands to promote the navigation and control precision of the vehicles in high dynamic motion. It provides the motivation to return to a simpler single-speed structure of the strapdown algorithm in which all computations are executed at a single updating rate that is sufficiently high to accurately account for high frequency angular rate and acceleration rectification effects.
Two key compensation algorithms designed to operate in severe maneuvering and vibratory environments are critical in determining the performance of a SINS, i.e., the coning compensation that works when the vehicle's angular rate vector is rotating, and the sculling compensation that takes effect when the vehicle's angular rate or specific force acceleration vector is rotating, or when the ratio of the angular rate to specific force is not constant. Thus in order to improve the navigation accuracy of the system, particularly in the environments where the angular rate vector or the specific force vector of the vehicle is large, several algorithms have been developed for the coning and sculling compensation. A substantial number of integration algorithms have been designed for coning compensation to improve the attitude accuracy without sacrificing computer throughput [4][5][6][7][8][9][10][11]. Analogous to the coning compensation algorithm adopted in attitude updating, a number of sculling compensation algorithms have also been designed for velocity updating, and the equivalence between coning and sculling compensation algorithms is discussed in [12,13]. A detailed statement of the coning and sculling compensation algorithms is given in [1,3,[14][15][16][17]. Most algorithms for the coning and sculling compensations are based on truncated Taylor series expansion approximations for the angular rate of vehicle over updating cycles [2,3,6,7,[9][10][11]18]. The accuracy of the coning and sculling compensation algorithms is determined by the updating rate of the coning and sculling compensations and the order of the truncated Taylor series expansion for the angular rate and specific force. Generally, in order to improve the accuracy of these algorithms, the updating rate must be increased to keep track of vehicle angular and linear motions more accurately. Among these existing algorithms, however, when the sampling rates of inertial sensors remain constant, and the number of the gyro incremental angle samples for coning compensation and the number of the accelerometer incremental velocity samples for sculling compensation are selected, the updating rates of these algorithms are also determined. The increase of the updating rates results in the decrease of the number of the gyro incremental angle samples and the number of the accelerometer incremental velocity samples for the coning and sculling compensation (namely, the decrease of order of the coning and sculling compensation algorithms), which in turn reduces the accuracy of the algorithms.
Furthermore, in recent SINS applications the strapdown algorithm with coning and sculling compensations is commonly implemented in a Digital Signal Processor (DSP) platform supplemented with Field Programmable Gate Array (FPGA) for data acquisition and noise filtering. Due to the serial execution mode of the DSP, however, it cannot support an updating rate fast enough for a high-order algorithm. In order to tackle the conflict between the computation complexity of the high-order algorithm and the updating speed of the algorithm implemented on a DSP platform, Xie [19] proposed a strapdown algorithm architecture on dual DSPs and FPGA which in essence works in a parallel computation mode. Jew [20] presented a framework for designing inertial navigation systems on a single-chip FPGA, in which the strapdown algorithm is implemented by the PowerPC hardcore of the FPGA. Although to some extent these methods improved the performance of the strapdown algorithms, they all work in a serial mode, thus it did not make full use of the parallel computation characteristics of the FPGA. Some other researchers [21,22] suggested using a single-chip FPGA to implement multi-processing cores and parallel computing, but there is no any implementation scheme discussed in detail.
In this paper, a new generalized optimum strapdown algorithm with coning and sculling compensations is presented in Section 2, in which the PVA updating operations are carried out based on the single-speed structure in which all computations are executed at a single updating rate that is sufficiently high to accurately account for high frequency angular rate and acceleration rectification effects. Different from existing algorithms, the updating rates of the coning and sculling compensations are unrelated with the number of the gyro incremental angle samples and the number of the accelerometer incremental velocity samples. Then, in order to implement the new strapdown algorithm in a single chip FPGA, the parallelization of the algorithm is designed in Section 3, and its computational complexity is analyzed in Section 4. In Section 5, the performance of the proposed parallel strapdown algorithm is tested on the software platform of Xilinx ISE 12.3 and the hardware platform of FPGA device XC6VLX550T on the basis of some fighter aircraft data. The contributions of this paper are finally summarized in Section 6.

Generalized Optimum Strapdown Algorithm
In order to reduce the computational complexity and decouple the relationship between the updating rates of the coning and sculling compensations and the number of the gyro incremental angle and the accelerometer incremental velocity samples, the strapdown algorithm proposed in this section is constructed on the basis of a single-speed structure, i.e., the PVA are updated in all the intervals where and are the attitude matrixes relating the B frame to the N frame at time t n-1 and at time t n , respectively; is the direction cosine matrix that accounts for angular motion of the B frame from time t n-1 to time t n ; is the direction cosine matrix that accounts for the N frame rotation frame from time t n-1 to time t n . According to the velocity rate equation in the N frame [3], the velocity v N at the time t n can be obtained by integrating the specific forces sensed by the accelerometers, the Coriolis accelerations due to the rotations of the navigation and earth frames and the gravity, namely: where and are the velocity of the system relative to the E frame at time t n and t n-1 , respectively; Δ and Δ / are the integrated transformed specific force increment and the gravity-Coriolis velocity increment over the updating interval [t n-1 , t n ], respectively, calculated by: Considering the rotation of the navigation frame and the body frame over the updating interval [t n-1 , t n ], Δ in Equation (3a) can be expanded according to the chain rule of matrix product as follows: where: Because the variation of the position of the system is small over the updating interval [t n-1 , t n ] (for example, when the updating interval length T n is 0.0005 s, the variation of the position of the 6 Mach where N ZN u is the unit vector along the Z-axis of the navigation frame, and According to the altitude and position matrix rate equation [3], the altitude h and position matrix at the time t n can be obtained as follows: where h n and h n-1 are the altitudes at the time t n and t n-1 , respectively; Δh n is the altitude change from the time t n-1 to the time t n ; and are the position matrix at the time t n and t n-1 , respectively; is the direction cosine matrix that accounts for the navigation frame rotation relative to the Earth frame from the time t n-1 to the time t n ; and: Similar to the attitude matrix updating, in Equation (7b) can also be approximated in terms of a rotation vector as follows (accurate to second order): where n ξ is the rotation vector defining the navigation frame rotation relative to the earth frame from the time t n-1 to the time t n ; and n ζ can be approximately expressed as follows: ( ) Note that Δ should be calculated first to obtain the altitude h and the position matrix .
Considering that the change of the velocity is small over the updating interval [t n-1 , t n ], Δ can be computed based on a trapezoidal integration algorithm as follows:

Body Frame Rotation Update
The direction cosine matrix in Equation (1) is used to update the attitude matrix which accounts for the angular rate of the B frame relative to the inertial space. According to the relationship between rotation vector and direction cosine matrix, can be expressed as follows: where Φ n is the rotation vector that accounts for angular motion of the body frame from time t n-1 to time t n ; Φ n is the magnitude of Φ n In practice, Φ n can be obtained by the following rotation vector rate equation [4]: where represents the angular rate of the B frame. The last two terms in Equation (13) are non-commutative, thus have to be calculated and compensated based on the gyro incremental angles in order to improve the computation accuracy. The triple-cross-product term in Equation (13) is usually quite small and can be neglected [4]. Then under the second order accuracy, the rotation vector Φ n in Equation (12) can be approximated by the integral of Equation (13) from t n-1 to t n , i.e., where n β is defined as the coning compensation from t n-1 to t n , and: For a SINS, the coning motion is the worst working condition which will induce serious attitude errors [5][6][7]18]. In other words, if in the case of coning movement the attitude errors are made minimal by a certain algorithm, the errors in the other cases will also be minimal by the same algorithm. So in order to develop the new strapdown algorithm, it is assumed that the vehicle is undergoing a pure coning movement, defined by the following angular rate: where Ω is the frequency associated with the coning motion; a and b are the amplitudes of the coning motion.
According to Equations (15a), (15b) and (16), the coning compensation term β n has the following form: Equation (17) shows an interesting property that the coning compensation is constant over all updating cycles, regardless of the absolute time at which the updating cycle begins. It only depends on the duration of the updating cycle.
According to the concept of distance between the cross products [6,11], the cross products with equal distance behave exactly the same in a pure coning environment defined by Equation (16). The coning compensation that uses the concept will have the simplest form, the optimal accuracy and the minimum computational throughput. Taking the advantage of this property, a generalized optimum algorithm for the integral in Equation (15b) consists of the sum of all possible cross products of the gyro incremental angle samples, making up the computation over the updating interval of rotation vector, such as [9]: (18) where N is the number of gyro incremental angle samples for the calculation of the coning compensation term; α n-i (1, 2,…, N − 1) is the gyro incremental angle sample in the (n − i)-th updating cycle; k i (1, 2,…, N − 1) is the constant coefficients for the cross product of α n-i and α n . Substituting Equations (15a) and (16) into Equation (18), and expanding each terms using Taylor series, the coning compensation term over the updating interval [t n-1 , t n ] is obtained as: ( ) where λ = ΩT n ; A ij is a constant defined by: In order to derive k i (1, 2,…, N − 1) in Equation (18), expanding Equation (17) by using Taylor series yields: where C i is a constant defined by: Combining Equation (19) with Equation (21), the following simultaneous equations for constant coefficients K i , i = 1, 2,…, N − 1), can be obtained: In a matrix form, Equation (23) is equivalent to: According to Equation (24), coefficients K i can be solved as follows: where [ ] 1 i m are m-dimensional column vector and the m-by-n matrix, respectively.
Note that different from other existing algorithms, the updating rate of the proposed optimal coning compensation algorithm is independent of the number of gyro incremental angle samples in the calculation of the coning compensation. Thus, this algorithm allows the updating speed to be increased, at the same time increasing the number of gyro incremental angle samples, in order to improve the attitude accuracy of the system.

Navigation Frame Rotation Update
The direction cosine matrix in Equation (1) is used to update the attitude matrix which accounts for the angular rate N IN ω of the N frame relative to the inertial space. Similar to the computation of , according to the relationship between rotation vector and direction cosine matrix, can also be expressed in a second order form as follows: where n ζ is the rotation vector that accounts for the angular motion of the N frame from time t n-1 to time t n ; n ζ is the magnitude of n ζ .
Because the updating interval length T n is short (generally equal to 0.0005 s-0.01 s), the angular rate is small and changes slowly over the updating interval [t n-1 , t n ] (due to the small changes in velocity and position over this updating cycle). Then according to the rotation vector rate equation, n ζ can be approximated as follows: where and are the angular rates of the earth frame relative to the inertial frame and the navigation frame relative to the earth frame, respectively; is the position matrix relating the earth frame with the navigation frame; the subscript n − 1/2 indicates the midpoint of the updating interval [t n-1 , t n ].

Integrated Specific Force Increment Update
Similar to the attitude updating algorithm, the integral term Δ in Equation (4) can be formulated based on the first order approximation of as follows: where: The integrand term ( ) B t × α f in Equation (28) has the following expression [3]: Δ can also be expressed as follows: where Δ denotes the velocity rotation compensation term; Δ denotes the sculling compensation term; and: ( ) In principle, the approaches used for the coning compensation can also be applied to the sculling compensation. Similar to the optimal generalized coning compensation algorithm in Equation (18), a generalized sculling compensation algorithm that has the simplest form, the optimal accuracy and the minimum computational throughput takes the following form: where N is the number of the gyro incremental angle samples and the number of the accelerometer incremental velocity samples for the calculation of the sculling compensation term; α n-i (1, 2,…, N − 1) and υ n-i (1, 2,…, N − 1) are the gyro incremental angle and accelerometer incremental velocity in the (n − i)-th updating cycle; L i (1, 2,…, N − 1) is the constant coefficients for the cross product of α n-i with υ n and υ n-i with α n . Considering the equivalency between the coning compensation and sculling compensation [12,23], similar to Equation (25), the coefficients L i can also be calculated as follows: where A ij and C i can be calculated according to Equations (20) and (22), respectively. Note that different from other existing algorithms, the updating rate of the proposed optimal sculling compensation algorithm is also independent of the number of gyro incremental angle samples and accelerometer incremental velocity samples. Thus, this algorithm allows the updating speed to be increased, at the same time increasing the number of gyro incremental angle samples and accelerometer incremental velocity samples, in order to improve the accuracy of the related velocity updating algorithm.

Related Parameters Extrapolation Update
Because the gravity anomaly and the vertical deviation over the earth surface resulting from mass irregular and shape asymmetric distributions are generally small (the maximum value of the gravity anomaly is only tens of mgal; and the maximum value of the vertical deviation is only tens of arcs), the following approximate model of the gravity is used for most SINS applications: Then the gravity in the navigation frame can be expressed as follows: For the conventional ellipsoidal earth surface model and the E-N-U navigation frame [1,3], , and can also be expressed in the following form: with: where is the curvature matrix in the navigation frame that is a function of position over the earth; ρ ZN is the Z-axis component of ; v N is the velocity vector of the vehicle relative to the earth projected on the navigation frame; R M and R N are the radii of curvature at the earth surface in meridian and in prime vertical, respectively; h is the height; α is the wander angle; L is the latitude; R e is the equatorial radius of earth; e is the flattening of earth; C ij is the i-row and j-column component of .
By the definition of the navigation frame, the orientation of the X axis and the Y axis around the Z axis is somewhat arbitrary. So ρ ZN in Equation (38b) depends on the selection of the axes orientation of the navigation frame. The navigation frame is generally selected as a wander azimuth navigation frame for most SINSs [3]. In this case, ρ ZN is given by: Because the related parameters ( ) (namely, , , ρ ZN , v N , Δ and ) in Equations (6), (10) and (27) are all the functions of position or velocity, and the values of these parameters at the current cycle are not available, thus the ( ) n − 1/2 terms can be approximately calculated based on the linear extrapolation formula in the following form:

Strapdown Algorithm Parallelization
Although the sampling rate of inertial sensors can be up to 2 kHz or even higher, taking into account the complexity of the employed strapdown algorithm and the ability of the current processors, the updating rate of the strapdown algorithm in a serial mode is limited, usually only 200-500 Hz when implemented in a DSP. An effective way to break through the limitation of commonly used navigation computers is to implement the strapdown algorithm on a purely parallel computing platform such as FPGA, and execute the calculations in the algorithm "as concurrently as possible" to make full use of the capability of the parallel computing platform.
The strapdown algorithm proposed in Section 2 can be divided into six modules doing the following calculations severally: the body frame rotation update (M1), the integrated specific force increment update (M2), the related parameters extrapolation update and the navigation frame rotation update (M3), the attitude update (M4), the velocity update (M5) and the position update (M6), as shown in Figure 2. Among them, M1, M2 and M3 can be executed first in a parallel mode; M4 and M5 have to be executed afterwards but also in a parallel mode; finally, M6 is executed.

Module of Body Frame Rotation Update (M1)
The calculations involved in the module M1 are described by Equations (12), (14) and (18), as shown in Figure 3. Once the number of the gyro incremental angle samples N is selected, the corresponding coefficients K i of the coning compensation can firstly be determined offline by Equations (20), (22) and (25) and stored in the memory of the navigation computer for online use; secondly, the coning compensation β n and rotation vector Φ n can be successively computed according to Equations (14) and (18), respectively; finally, the direction cosine matrix that accounts for the angular motion of the B frame is obtained by Equation (12). Note that since the accuracy of the coning compensation directly determines the attitude accuracy of the system, particularly in high dynamic conditions, the coning compensation is generally designed to accurately account for the vibration induced coning effects by selecting the appropriate number of the gyro incremental angle samples.   (11), (14) and (18) can only be executed in a serial mode. From Equations (12), (14) and (18), it is shown that M1 contains the following operations of related minimum calculation particles: cross-product of vectors, product of a skew symmetric matrix with itself, addition of matrixes or vectors and calculation of sine or cosine function. The operations within these minimum calculation particles can be further concurrently processed, which will be discussed in Section 3.2.

Module of Integrated Specific Force Increment Update (M2)
The module M2 carries out the calculations defined by Equations (31), (32a) and (33), as shown in Figure 4. Similar to the coning compensation, once the number of the gyro incremental angle and accelerometer incremental velocity samples N is selected, the corresponding coefficients L i of the sculling compensation can firstly be determined offline by Equations (20), (22) and (34) and stored in the memory of the navigation computer for online use; secondly, the sculling compensation Δ and the velocity rotation compensation Δ can be successively computed according to Equations (32a) and (33); finally, the integrated specific force increment Δ that accounts for the linear motion of the B frame is calculated by Equation (31). Note that since the accuracy of the sculling compensation directly determines the velocity accuracy of the system, particularly in high dynamic conditions, the sculling compensation is generally designed to accurately account for the vibration induced sculling effects by selecting the appropriate number of the gyro incremental angle and accelerometer incremental velocity samples.   (33), it is shown that M2 contains the following operations of related minimum calculation particles: cross-product of vectors and addition of vectors. Similar to the operations within these minimum calculation particles in M1, the operations within these minimum calculation particles can be further processed concurrently, which will be discussed in Section 3.2.

Module of Related Parameters Extrapolation and Navigation Frame Rotation Update (M3)
The module M3 can be further divided into two serial-process modules: the module of related parameters extrapolation update (M31) and the module of navigation frame rotation update (M32), as shown in Figure 5. The computation tasks completed in M31 and M32 are described by Equations (26), (27) and (35-41) respectively.  Figure 5 shows that in M31, the calculation of the gravity and the curvature matrix can be firstly executed in a parallel mode according to Equations (35-39), and then the related parameters ( / , / , / , / , Δ / and / ) can also be calculated in a parallel mode according to Equation (41); in M32, Equations (26) and (27) can only be executed in a serial mode. From Equations (26), (27) and Equations (35-41), it is shown that M31 and M32 contain the following operations of related minimum calculation particles: cross-product of vectors, product of a skew symmetric matrix with its own and addition of matrixes or vectors. Similar to the operations within these minimum calculation particles in module M1, the operations within these minimum calculation particles can be further processed concurrently, which will be discussed in Section 3.2.

Module of Attitude Update (M4)
The module M4 is used to calculate Equation (1), as shown in Figure 6. From Equation (1), it is shown that M4 only contains the following operation of related minimum calculation particles: product of matrixes. Similar to the operations within these minimum calculation particles in module M1, the product operation of matrixes can be further processed concurrently, which will be discussed in Section 3.2.

Module of Velocity Update (M5)
The calculations implemented in the module M5 are defined by Equations (2), (4) and (6), as shown in Figure 7.  Figure 7 shows that the calculation of the integrated transformed specific force increment and the gravity-Coriolis velocity increment can be executed in a parallel mode according to Equations (4) and (6). From Equations (2), (4) and (6), it is shown that M5 contains the following operations of related minimum calculation particles: cross-product of vectors, product of a skew symmetric matrix with its own, product of matrixes or a matrix with a vector and addition of vectors. Similar to the operations within these minimum calculation particles in module M1, the operations within these minimum calculation particles can be further parallelized, which will be discussed in Section 3.2.

Module of Position Update (M6)
In the module M6, the computations defined by Equations (7-11) are carried out, as shown in Figure 8.  can be executed in a parallel mode according to Equations (7-10), in which Equations (7b), (9) and (10) can only be executed in a serial mode, and Equations (7a) and (8) can also only be executed in a serial mode. From Equations (7)(8)(9)(10)(11), it is shown that M6 contains the following operations of related minimum calculation particles: cross-product of vectors, product of a matrix with a vector and addition of vectors. Similar to the operations within these minimum calculation particles in module M1, the operations within these minimum calculation particles can be further parallelized, which will be discussed in Section 3.2. In order to make the updating cycle of the strapdown algorithm shortest, the maximum parallelism degree is usually used as a performance index to optimize the calculation particles involved in the modules M1-M6.
Assume that the execution time of addition (subtraction), multiplication, division, trigonometric and square root operation are defined as T A , T M , T D , T T and T S , respectively. The summation of 3-dimensional vectors, for instance by: 1 2 contains three addition operations which can be executed in a parallel mode. Thus the execution time of the addition operation for two 3-dimensional vectors is The addition of two 3-by-3 matrixes, for instance by: 1 2 contains nine addition operations which can also be executed in a parallel mode. Thus the execution time of the addition operation for two 3-by-3 matrixes is The cross-product of two 3-dimensional vectors, expressed for instance by: contains six multiplication operations and three subtraction operations. All the multiplication operations or the subtraction operations can be executed in a parallel mode, but the subtraction operations must be executed after the multiplication operations. Thus the execution time of the cross-product operation for two 3-dimensional vectors is: The product of two 3-by-3 matrixes, for instance by: contains 27 multiplication operations and 18 addition operations. The multiplication operations can be executed in a parallel mode, but the addition operations must be executed twice in a parallel mode after the multiplication operations. Thus the execution time of the product operation for two 3-by-3 matrixes is: The product of a 3-by-3 matrix with a 3-dimensional vector, which can be defined for instance by: contains nine multiplication operations and 6 addition operations. The multiplication operations can be executed in a parallel mode, but the addition operations must be executed twice in a parallel mode after the multiplication operations. Thus the execution time of the product operation for a 3-by-3 matrix with a 3-dimensional vector is: The product of a skew symmetric matrix with itself defined, for instance by: contains six multiplication operations and three addition operations. The multiplication operations or the addition operations can be executed in a parallel mode, but the addition operations must be executed after the multiplication operations. Thus the execution time of the product operation for a skew symmetric matrix with itself is: Based on the aforementioned execution time analysis of the basic computational operations involved in the modules M1-M6, we can evaluate the computational complexity of each module.

Analysis of Module M1
In the calculation of the coning compensation term β n defined by Equation (18), the N − 1 vectors cross-product operation can be executed in a parallel mode, and the summation of N − 1 vectors can also be successively executed in a parallel mode [log 1 ] times. Thus the execution time of the coning compensation is: And in the calculation of the direction cosine matrix according to Equation (12), the calculation of 2 can be executed in a parallel mode. Thus the execution time of the matrix calculation is:

T T T T T T T T T T T T T T
Based on Equations (56) and (57) and refer to Figure 3, the execution time of module M1 can be obtained as follows: Figure 5 shows that the calculation of the velocity rotation compensation and the sculling compensation can be executed in a parallel mode. According to Equation (48), the execution time of the velocity rotation compensation is:

Analysis of Module M2
Similar to the calculation of the coning compensation term β n , in the calculation of the sculling compensation term Δ defined by Equation (33), the cross-product operations of the 2(N -1) vectors can be executed in a parallel mode, and the summation of N − 1 vectors can also be successively executed in a parallel mode [log 1 ] times. Thus the execution time of the coning compensation is: Based on Equations (59)

Implementation and Simulation of Parallel Strapdown Algorithm on FPGA
The parallel strapdown algorithm proposed in Section 3 has been implemented on a FPGA platform in the structure shown in Figure 9. The data acquisition module receives the input signal (the gyro incremental angle α n , the accelerometer incremental velocity υ n and the initial alignment data Data_Init) and writes the data to the data register module in the operation controller, then notifies the operation controller to start the strapdown calculations through the signal GD_Rdy. The operation controller sends the data stored in the data register module in a parallel mode to the input registers of the floating point unit (FPU), and starts the FPU by the operation starter. The operation results ( , , h n and ) are then exported through the output module. The handshake signals ACK and Data_Rdy are used for the communication between the data acquisition module and the external module such as the initial alignment or noise filtering of inertial sensor output samples that is beyond the scope of this paper; and the operation controller can be reset by the signal RST. The state machine in the operation controller is used to control the execution of operations in an appropriate time sequence. All floating-point operations are carried out in the FPU which is composed of five arithmetic sub-units executing the operations for addition, multiplication, division, square root calculation and trigonometric calculation, respectively. Among them, the adder unit contains k floating-point adders; the multiplier unit contains l floating-point adders; the divider unit contains m floating-point adders; the square root arithmetic unit contains n floating-point adders; where different values of k, l, m and n can be selected according to the hardware resources of the selected FPGA platform.
The parallel strapdown algorithm has been simulated on the Xilinx ISE 12.3 software platform and the hardware platform of the FPGA device XC6VLX550T. The floating-point adder/subtractor, multiplier and other floating-point operations in FPU are constructed by the Xilinx IP core.
In the simulation, both the number of the gyro incremental angle samples and the number of the accelerometer incremental velocity samples for the calculation of the coning compensation and the sculling compensation are set to two (namely, N = 2), then, according to Equations (20), (22), (25) and (34), the coning compensation term and the sculling compensation term defined in Equations (18) and (33) can be expressed as follows: To demonstrate the performance of the proposed parallel strapdown algorithm, the simulation results in a typical updating interval are shown in Figure 10 as the behavioral simulation waveform graph yielded by Xilinx ISE 12.3, and listed in Table 1 where all the data are accurate to four decimal places. The updating interval length T n in the simulation is 1.0e−3 s, and the clock frequency of the FPGA is set to 10 ns. The same simulation scenarios are calculated on a MATLAB R2007a platform with the strapdown algorithm in Section 2, the results are consistent with those shown in Table 1.  (b) Figure 10 shows that the signal GD_Rdy has a pulse output at time 20.47 us, and the signal OUT_Rdy has a pulse output at time 30.66 us. This means that the start time and end time of the parallel strapdown algorithm on the FPGA platform are 20.47 us and 30.66 us, respectively. Then the execution time of this parallel strapdown algorithm is only 10.19 μs, when the clock frequency is selected as 10 ns. But the execution time of a strapdown algorithm on a DSP platform is generally in milliseconds. Thus the execution speed of parallel strapdown algorithm on the FPGA platform is much faster than the conventional algorithm on a DSP platform.
The resource utilization of the parallel strapdown algorithm on the hardware platform of the FPGA device XC6VLX550T is shown in Table 2 where Slice Registers, Slice LUTs and DSP48Es are the registers, the look-up tables and the multipliers based on the intellectual property (IP) hard-core of the Xilinx FPGA, respectively.

Conclusions
In this paper, a new generalized optimum strapdown algorithm with the coning and sculling compensation is presented, in which the PVA updating operations are carried out based on the single-speed structure in which all computations are executed at a single updating rate that is sufficiently high to accurately account for high frequency angular rate and acceleration rectification effects. Different from existing algorithms, the updating rates of the coning and sculling compensations are unrelated with the number of the gyro incremental angle samples and the number of the accelerometer incremental velocity samples. When the output sampling rate of inertial sensors remains constant, this algorithm allows increasing the updating rate of the coning and sculling compensation, yet with more numbers of gyro incremental angle and accelerometer incremental velocity in order to improve the accuracy of system. Then, in order to implement the new strapdown algorithm in a single chip FPGA, the parallelization of the algorithm is designed and its computational complexity is analyzed. The performance of the proposed parallel strapdown algorithm is tested on the software platform of Xilinx ISE 12.3 and the FPGA device XC6VLX550T hardware platform on the basis of some fighter data. It is shown that this parallel strapdown algorithm on the FPGA platform can greatly decrease the execution time of algorithm to meet the real-time and high precision requirements of system on the high dynamic environment, relative to the existing implemented on the DSP platform.