Article

Dynamic Attitude Estimation Method Based on LSTM-Enhanced Extended Kalman Filter

1 School of Mechanical Engineering, Hubei University of Technology, Wuhan 430068, China
2 Hubei Key Laboratory of Modern Manufacture Quality Engineering, Wuhan 430068, China
3 AVIC Beijing Changcheng Institute of Metrology & Measurement, Beijing 100095, China
* Author to whom correspondence should be addressed.
Appl. Sci. 2026, 16(3), 1466; https://doi.org/10.3390/app16031466
Submission received: 8 January 2026 / Revised: 29 January 2026 / Accepted: 30 January 2026 / Published: 31 January 2026

Abstract

Visual–inertial attitude estimation systems often suffer from accuracy degradation and instability when visual measurements are intermittently lost due to occlusion or illumination changes. To address this issue, this paper proposes an LSTM-EKF framework for dynamic attitude estimation under visual information loss. In the proposed method, an LSTM-based vision prediction network is designed to learn the temporal evolution of visual attitude measurements and to provide reliable pseudo-observations when camera data are unavailable, thereby maintaining continuous EKF updates. The algorithm is validated through turntable experiments, including long-term reciprocating rotation tests, continuous visual occlusion scanning experiments, and attitude accuracy evaluation experiments over an extended angular range. Experimental results show that the proposed LSTM-EKF effectively suppresses IMU error accumulation during visual outages and achieves lower RMSE compared with conventional EKF and AKF methods. In particular, the LSTM-EKF maintains stable estimation performance under a certain degree of visual occlusion and extends the effective attitude measurement range beyond the camera’s observable limits. These results demonstrate that the proposed method improves robustness and accuracy of visual–inertial attitude estimation in environments with intermittent visual degradation.

1. Introduction

Accurate and continuous six-degree-of-freedom pose measurement has been widely recognized as one of the key enabling technologies in intelligent manufacturing, industrial robot calibration, and autonomous driving. Of these six degrees of freedom, the three rotational ones define the attitude, whose measurement accuracy directly affects the precision of overall pose estimation. Since single-sensor-based schemes are susceptible to environmental disturbances and inherently limited in terms of measurement accuracy and long-term stability, multi-sensor data fusion–based attitude measurement methods have emerged as an effective solution for enhancing system robustness and measurement accuracy [1].
To improve attitude measurement accuracy, Sun et al. proposed a dual-IMU/vision integrated attitude estimation algorithm for mobile platforms, achieving fast and accurate relative attitude estimation [2]. Yan et al. developed a laser-tracking-based attitude measurement method that combines a two-dimensional position-sensitive detector with a monocular vision system [3]. Xiong et al. enhanced the accuracy of vision/IMU fusion algorithms by incorporating a real-time calibration–based multi-factor influence analysis framework together with an AEKF [4]. Overall, compared with single-sensor-based approaches, these multi-sensor fusion methods significantly improve attitude measurement accuracy. However, when information sources are missing in the measurement system—particularly when high-precision sensors fail—the measurement performance deteriorates rapidly, leading to limited system robustness.
The problem of integrating inertial navigation systems with auxiliary sensors under intermittent signal loss has been extensively studied, particularly in the context of INS/GNSS integration, where a substantial methodological foundation has been established. Early studies focused on exploiting motion constraints, such as non-holonomic constraints [5] and zero-velocity updates (ZUPT) [6], to mitigate error divergence during GNSS outages. Subsequently, adaptive filtering techniques were developed to cope with time-varying noise characteristics and model uncertainties [7].
More recently, data-driven approaches have gained increasing attention. Recurrent neural networks, including LSTM and GRU architectures, have been widely employed to predict pseudo-measurements or error states during GNSS signal outages, demonstrating significant improvements over conventional Kalman filtering methods [8,9]. These learning-based techniques effectively capture the temporal dependencies of inertial error dynamics and have been shown to enhance robustness during extended signal loss.
Despite these advances, the direct application of GNSS-based integration strategies to visual–inertial systems remains challenging. Visual signal loss differs fundamentally from GNSS outages in terms of spatial dependency, feature availability, and error propagation characteristics. Moreover, vision-based systems primarily provide relative pose constraints, making attitude error accumulation the dominant failure mode during visual occlusion. These differences motivate dedicated fusion strategies tailored to visual–inertial systems, which are the focus of this work.
KF is one of the most widely used approaches for multi-sensor data fusion. Compared with the standard KF, the EKF exhibits superior nonlinear handling capability. Unlike AKF, EKF does not impose strict requirements on initial parameter settings, and compared with the UKF, it offers lower computational complexity and better real-time performance [10,11,12]. However, the loss of measurement information sources can have a more pronounced adverse impact on EKF-based estimation performance [13].
To address the limitations of conventional EKF, recent studies have explored the integration of neural networks and deep learning techniques with traditional filtering frameworks, aiming to mitigate performance degradation caused by missing or unreliable measurement information during continuous iterative estimation [14].
Recent data-driven approaches integrate neural networks with Kalman filtering to improve state estimation accuracy by learning motion characteristics. LSTM- or BP-based methods have been shown to reduce estimation errors and enhance robustness compared with conventional filters [15,16,17,18]. However, their practical application is often limited by high computational cost, sensitivity to motion changes, and strong dependence on training data, especially under sensor degradation or visual information loss. These limitations motivate the development of an efficient learning-assisted EKF framework capable of maintaining stable and accurate estimation during visual occlusion.
In summary, existing data-driven and hybrid learning-based methods have demonstrated the potential of neural networks for improving tracking accuracy. However, most of these approaches suffer from high computational complexity, limited adaptability to abrupt motion, and strong dependence on training data and application scenarios. Motivated by these limitations, we propose an LSTM-EKF-based dynamic attitude measurement framework that integrates temporal prediction into the EKF measurement update process, enabling stable and accurate attitude estimation under prolonged visual occlusion. By using the variation in object attitude obtained from vision measurements as training samples, the proposed LSTM network provides relatively high-accuracy reference outputs when visual measurements are unavailable due to environmental disturbances. Consequently, stable and continuous dynamic attitude measurement of the target can be achieved, enhancing the stability and robustness of the attitude measurement system.
The main contributions of this article are summarized as follows:
  • An LSTM-EKF-based dynamic attitude estimation algorithm is proposed to address the degradation of estimation accuracy caused by visual information loss. By integrating temporal prediction into the EKF framework, the proposed method enhances the stability and robustness of attitude estimation under vision-degraded conditions;
  • A visual prediction LSTM network is designed to compensate for missing visual measurements, providing reliable pseudo-observations for the EKF when visual data are unavailable. This enables stable attitude estimation during a certain degree of visual occlusion and effectively extends the measurable angular range of the attitude measurement system;
  • Extensive experimental validations are conducted on a precision turntable platform, including long-term reciprocating rotation experiments, visual occlusion scanning experiments, and ablation studies. The results demonstrate that the proposed LSTM-EKF approach achieves higher accuracy and stronger robustness than EKF and AKF methods, particularly under visual occlusion.

2. Principle of Dynamic Attitude Measurement

2.1. Attitude Measurement System Components

To ensure real-time performance of the measurement system, both the IMU and the vision sensor are connected to the host computer via wired communication. The IMU is rigidly mounted on the target object. The overall configuration of the IMU/vision-based attitude measurement system is illustrated in Figure 1.
In this system, the following coordinate systems are defined. The origin of the vision coordinate system ( $O_V\_X_V Y_V Z_V$ ) is located at the optical center of the vision sensor; the $Z_V$ axis points forward along the optical axis, the $X_V$ axis points horizontally to the right, and the $Y_V$ axis points vertically downward. The origin of the cooperative target coordinate system ( $O_T\_X_T Y_T Z_T$ ) is located at the center of the cooperative target; the $Z_T$ axis points vertically toward the IMU, and the $X_T$ and $Y_T$ axes are determined by the right-hand rule. The origin of the IMU coordinate system ( $O_I\_X_I Y_I Z_I$ ) is at the sensitive center of the IMU, and the $X_I$, $Y_I$, and $Z_I$ axes are parallel to the corresponding axes of the target coordinate system.
The camera coordinate system is defined as the global reference frame. The transformation matrix R I T between the target coordinate system and the IMU coordinate system is obtained through a rotary stage-based dynamic calibration procedure.
As illustrated in Figure 2, data for the dynamic attitude measurement system are acquired by both the IMU and the camera. The target images captured by the camera are first subjected to preprocessing operations, including attitude estimation and time-axis alignment. Subsequently, the preprocessed attitude angle data are evaluated to determine whether the camera output is abnormal. When the camera-derived attitude angles are valid, an EKF is employed for state updating. In contrast, when the camera measurements are abnormal or unavailable, an LSTM–EKF fusion algorithm is activated to maintain stable system operation. The detailed implementation procedure is described as follows.
First, the EPnP [19] algorithm is used to solve the target’s attitude angles in the visual coordinate system, and then the results are transformed into the global coordinate system through a rotation matrix. The monocular camera captures an image of the target, and after processing, the pixel coordinates of the feature points on the target are obtained:
In the EPnP algorithm, a space point $P$ can be represented as a weighted sum of four control points $V_1^T, V_2^T, V_3^T, V_4^T$ in the target coordinate system:

$$P_i^T = \sum_{j=1}^{4} a_{ij} V_j^T, \quad i = 1, 2, \dots, n$$

where $P_i^T$ denotes the coordinates of the $i$-th target feature point in the target coordinate system, $n$ is the number of target feature points, and $a_{ij}$ are the weight coefficients of the control points, satisfying $\sum_{j=1}^{4} a_{ij} = 1$, with $V_j^T = [x_j^T \; y_j^T \; z_j^T]^T$.
Similarly, from the invariance of the linear relation under the Euclidean transformation, the target feature points can also be expressed in the camera coordinate system as:
$$P_i^C = \sum_{j=1}^{4} a_{ij} V_j^C, \quad i = 1, 2, \dots, n$$
The point $P_i^T$ is related to the corresponding point $P_i^C$ in the camera coordinate system by the rotation matrix $R$ and the translation vector $t$:

$$P_i^C = R P_i^T + t$$
The image coordinates of the target feature points are $(u_i, v_i)$. The camera is calibrated by Zhang's planar calibration method, with intrinsic matrix $A$, so the perspective projection model gives:

$$s_i \begin{bmatrix} u_i \\ v_i \\ 1 \end{bmatrix} = A P_i^C = \begin{bmatrix} f_x & 0 & u_x \\ 0 & f_y & v_y \\ 0 & 0 & 1 \end{bmatrix} \sum_{j=1}^{4} a_{ij} \begin{bmatrix} x_j^C \\ y_j^C \\ z_j^C \end{bmatrix}$$

where $s_i$ is the projective depth of the feature point, and $f_x$, $f_y$, $u_x$, and $v_y$ are the intrinsic parameters of the camera.
From the above formulas and the correspondences between the spatial points and the image points, a linear system of equations in the weight coefficients $a_{ij}$ can be obtained. After solving this system, the rotation matrix $R_T^C$ and the translation vector $t$ are determined by minimizing the reprojection error:

$$E = \sum_{i=1}^{n} \left\| \begin{bmatrix} u_i \\ v_i \end{bmatrix} - \begin{bmatrix} f_x X_i^C / Z_i^C + c_x \\ f_y Y_i^C / Z_i^C + c_y \end{bmatrix} \right\|^2$$
where $c_x$ and $c_y$ are the principal point coordinates of the camera. Writing $R_T^C$ as a 3 × 3 matrix:

$$R_T^C = \begin{bmatrix} r_{11} & r_{12} & r_{13} \\ r_{21} & r_{22} & r_{23} \\ r_{31} & r_{32} & r_{33} \end{bmatrix}$$
Based on the relationship between the rotation matrix and the Euler angle, the target attitude angle can be expressed as:
$$\alpha_C = \arctan\frac{r_{21}}{r_{11}}, \quad \beta_C = \arcsin r_{31}, \quad \gamma_C = \arctan\frac{r_{32}}{r_{33}}$$
Since the target is rigidly connected with the test object, the attitude angle θ C of the test object in the camera coordinate system can be obtained by the above formula:
$$\theta_C = [\alpha_C \; \beta_C \; \gamma_C]^T$$
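As a concrete illustration, the Euler-angle extraction above can be sketched in a few lines of Python. This is a minimal sketch, not the paper's implementation: the function name and list-based matrix layout are our own, and the signs follow this paper's conventions (in particular $\beta = \arcsin r_{31}$).

```python
import math

def rotation_to_euler(R):
    """Return (alpha, beta, gamma) in degrees from a 3x3 rotation matrix R,
    following the arctan/arcsin expressions used in this paper."""
    alpha = math.atan2(R[1][0], R[0][0])   # alpha = arctan(r21 / r11)
    beta = math.asin(R[2][0])              # beta  = arcsin(r31)
    gamma = math.atan2(R[2][1], R[2][2])   # gamma = arctan(r32 / r33)
    return tuple(math.degrees(a) for a in (alpha, beta, gamma))

# Example: a pure rotation of 30 degrees about the Z axis
c, s = math.cos(math.radians(30)), math.sin(math.radians(30))
Rz = [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]
print(rotation_to_euler(Rz))  # approximately (30.0, 0.0, 0.0)
```

Using `atan2` rather than a plain arctangent keeps the angles in the correct quadrant when $r_{11}$ or $r_{33}$ is negative.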
Second, the attitude angles in the IMU coordinate system are obtained by the quaternion method and then converted to the global coordinate system.
The quaternion differential equation is used to process the angular velocity data collected by the IMU gyroscope. Let the quaternion be $q = [q_0, q_1, q_2, q_3]^T$ and the angular velocity vector output by the IMU be $\omega = [\omega_x \; \omega_y \; \omega_z]^T$; the differential equation can be expressed as:

$$\frac{dq}{dt} = \frac{1}{2} \begin{bmatrix} 0 & -\omega_x & -\omega_y & -\omega_z \\ \omega_x & 0 & \omega_z & -\omega_y \\ \omega_y & -\omega_z & 0 & \omega_x \\ \omega_z & \omega_y & -\omega_x & 0 \end{bmatrix} q$$
In order to solve this differential equation, the system employs the fourth-order Runge–Kutta method [20] for numerical integration. Then, normalization is performed to obtain q t , n e w :
$$q_{t,\mathrm{new}} = \frac{q(t + \Delta t)}{\left\| q(t + \Delta t) \right\|}, \qquad \left\| q(t + \Delta t) \right\| = \sqrt{q_0^2 + q_1^2 + q_2^2 + q_3^2}$$
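The RK4 integration and normalization steps can be sketched as follows. This is an illustrative Python version under the assumption of a constant angular rate over each step; the helper names are our own.

```python
import math

def qdot(q, w):
    """Quaternion derivative dq/dt = 0.5 * Omega(w) * q for body rate w."""
    q0, q1, q2, q3 = q
    wx, wy, wz = w
    return [0.5 * (-wx * q1 - wy * q2 - wz * q3),
            0.5 * ( wx * q0 + wz * q2 - wy * q3),
            0.5 * ( wy * q0 - wz * q1 + wx * q3),
            0.5 * ( wz * q0 + wy * q1 - wx * q2)]

def rk4_step(q, w, dt):
    """One fourth-order Runge-Kutta step followed by renormalization."""
    k1 = qdot(q, w)
    k2 = qdot([qi + 0.5 * dt * ki for qi, ki in zip(q, k1)], w)
    k3 = qdot([qi + 0.5 * dt * ki for qi, ki in zip(q, k2)], w)
    k4 = qdot([qi + dt * ki for qi, ki in zip(q, k3)], w)
    q_new = [qi + dt / 6.0 * (a + 2 * b + 2 * c + d)
             for qi, a, b, c, d in zip(q, k1, k2, k3, k4)]
    n = math.sqrt(sum(x * x for x in q_new))  # keep the quaternion unit-length
    return [x / n for x in q_new]

# Example: constant rotation about Z at pi/2 rad/s for 1 s (100 steps)
q = [1.0, 0.0, 0.0, 0.0]
for _ in range(100):
    q = rk4_step(q, (0.0, 0.0, math.pi / 2), 0.01)
print([round(x, 4) for x in q])
```

After one second the quaternion should be close to a 90° rotation about Z, i.e., roughly $[\cos 45°, 0, 0, \sin 45°]$.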
Furthermore, the attitude quaternion is converted into a rotation matrix through the conversion relationship between the quaternion and the rotation matrix, and the conversion formula is as follows:
$$R_I = \begin{bmatrix} 1 - 2(q_2^2 + q_3^2) & 2(q_1 q_2 - q_0 q_3) & 2(q_1 q_3 + q_0 q_2) \\ 2(q_1 q_2 + q_0 q_3) & 1 - 2(q_1^2 + q_3^2) & 2(q_2 q_3 - q_0 q_1) \\ 2(q_1 q_3 - q_0 q_2) & 2(q_2 q_3 + q_0 q_1) & 1 - 2(q_1^2 + q_2^2) \end{bmatrix}$$
Then, according to the relationship between the target coordinate system, the inertial coordinate system and the camera coordinate system, the rotation matrix containing the attitude information is converted to the global coordinate system, which is expressed as:
$$R_I^C = R_T^C \cdot R_I^T \cdot R_I$$
Similarly, based on the relationship between the rotation matrix and the Euler angles, the IMU attitude angle $\theta_I$ is calculated as:

$$\begin{aligned} \alpha_I &= \arctan2\big( 2(q_2 q_3 + q_0 q_1),\; 1 - 2(q_1^2 + q_2^2) \big) \\ \beta_I &= \arcsin\big( 2(q_1 q_3 - q_0 q_2) \big) \\ \gamma_I &= \arctan2\big( 2(q_1 q_2 + q_0 q_3),\; 1 - 2(q_2^2 + q_3^2) \big) \end{aligned}$$

$$\theta_I = [\alpha_I \; \beta_I \; \gamma_I]^T$$
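The quaternion-to-Euler conversion above can likewise be sketched directly (a minimal illustration; the function name is our own and the sign conventions follow this paper's formulas):

```python
import math

def quat_to_euler(q):
    """Attitude angles (degrees) from a unit quaternion [q0, q1, q2, q3],
    following the arctan2/arcsin expressions in the text."""
    q0, q1, q2, q3 = q
    alpha = math.atan2(2 * (q2 * q3 + q0 * q1), 1 - 2 * (q1 * q1 + q2 * q2))
    beta = math.asin(2 * (q1 * q3 - q0 * q2))
    gamma = math.atan2(2 * (q1 * q2 + q0 * q3), 1 - 2 * (q2 * q2 + q3 * q3))
    return tuple(math.degrees(a) for a in (alpha, beta, gamma))

# A 90-degree rotation about the Z axis
qz = [math.cos(math.radians(45)), 0.0, 0.0, math.sin(math.radians(45))]
print(quat_to_euler(qz))  # approximately (0.0, 0.0, 90.0)
```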
In this way, both the camera and the IMU measurement results can be transformed into attitude angles.

2.2. Algorithm Design by LSTM-Enhanced Extended Kalman Filter

In the proposed dynamic attitude measurement system, the EKF serves as the core approach for multi-sensor data fusion. In this study, the IMU outputs are formulated as the system state vector X t = [ α I , t β I , t γ I , t ] T , while the attitude angles estimated by the camera are defined as the observation vector Z t = [ α C , t β C , t γ C , t ] T . Based on these definitions, the attitude measurement system can be modeled as a nonlinear state-space system, which is expressed as follows:
$$X_t = f(X_{t-1}, u_t) + \omega_t, \qquad Z_t = h(X_t) + v_t$$

where $X_t$ is the system state at time $t$, $Z_t$ is the system measurement at time $t$, and $\omega_t$ and $v_t$ are the process noise and measurement noise, respectively, with $\omega_t \sim \mathcal{N}(0, Q_t)$ and $v_t \sim \mathcal{N}(0, R_t)$. $u_t$ is the control input vector, and $f(X_{t-1}, u_t)$ and $h(X_t)$ are the nonlinear state transition and observation functions.
EKF primarily includes two steps: prediction and update.
Prediction Step:
$$X_{t+1|t} = f(X_t, u_t)$$

$$P_{t+1|t} = F_t P_t F_t^T + L_t Q_t L_t^T$$

where $F_t$ is the Jacobian matrix of the state transition function and $L_t$ is the process noise Jacobian matrix:

$$F_t = \frac{\partial f(X_t, u_t)}{\partial X_t}, \qquad L_t = \frac{\partial f(X_t, u_t)}{\partial \omega_t}$$
Update Step:
The Kalman Gain:
$$K_{t+1} = P_{t+1|t} H_{t+1}^T \left( H_{t+1} P_{t+1|t} H_{t+1}^T + R_t \right)^{-1}$$

where $H_{t+1}$ is the Jacobian matrix of the observation function:

$$H_t = \frac{\partial h(X_t)}{\partial X_t}$$
State Update:
$$X_{t+1} = X_{t+1|t} + K_{t+1} \left( Z_{t+1} - h(X_{t+1|t}) \right)$$
Error Covariance Update:
$$P_{t+1} = \left( I - K_{t+1} H_{t+1} \right) P_{t+1|t}$$
where I is the identity matrix.
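The predict/update cycle above can be written compactly in code. The following is an illustrative one-dimensional sketch, not the paper's three-state implementation: the function names and the toy constant-rate example are our own, but each line mirrors one of the equations above.

```python
def ekf_step(x, P, u, z, Q, R, f, h, F, H, L=1.0):
    """One EKF predict/update cycle, reduced to a single scalar state
    for readability; the structure mirrors the equations in the text."""
    # Prediction step
    x_pred = f(x, u)                            # X(t+1|t) = f(X, u)
    P_pred = F * P * F + L * Q * L              # P(t+1|t) = F P F' + L Q L'
    # Update step
    K = P_pred * H / (H * P_pred * H + R)       # Kalman gain
    x_new = x_pred + K * (z - h(x_pred))        # state update
    P_new = (1.0 - K * H) * P_pred              # covariance update
    return x_new, P_new

# Toy example: an angle advancing at a constant rate, observed directly
x, P = 0.0, 1.0
for z in (0.6, 1.1, 1.4):
    x, P = ekf_step(x, P, u=0.5, z=z, Q=0.01, R=0.1,
                    f=lambda x, u: x + u,   # constant-rate motion model
                    h=lambda x: x,          # direct angle observation
                    F=1.0, H=1.0)
print(round(x, 3), round(P, 4))
```

In the actual system the state and observation are the three attitude angles, so $F$, $H$, $K$, and $P$ become 3 × 3 matrices and the scalar products become matrix products.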
Since the LSTM network is a type of recurrent neural network with strong capability in modeling temporal sequences, it has been widely applied in fields such as target tracking, speech translation, and image caption generation. Compared with the conventional RNN, LSTM effectively alleviates the problems of vanishing gradients and exploding gradients during training. Moreover, by introducing three gating mechanisms—namely the forget gate, input gate, and output gate—LSTM is able to selectively retain and update long-term and short-term information. The architecture of the LSTM network is illustrated in Figure 3.
In Figure 3, $f_t$, $i_t$, and $o_t$ denote the outputs of the forget gate, input gate, and output gate, respectively; $C_t$ denotes the cell state; and $H_t$ denotes the hidden state, i.e., the activated output of the current cell.
The forget gate output $f_t$ controls which information is discarded from the previous cell state $C_{t-1}$:

$$f_t = \sigma\big( W_f [H_{t-1}, X_t] + b_f \big)$$
where σ is the sigmoid activation function, which can be defined as:
$$\sigma(x) = \frac{1}{1 + e^{-x}}$$
In Equation (24) and the following equations, $W_f$, $W_i$, $W_o$, and $W_g$ denote weight matrices, and $b_f$, $b_i$, $b_o$, and $b_g$ denote bias vectors.
The input gate i t determines how many inputs enter the current cell state:
$$i_t = \sigma\big( W_i [H_{t-1}, X_t] + b_i \big)$$
The candidate memory $g_t$ provides a candidate update, squashed to the range (−1, 1) by the hyperbolic tangent function:

$$g_t = \tanh\big( W_g [H_{t-1}, X_t] + b_g \big)$$
The cell state is then updated by combining the retained old memory and the gated candidate:

$$C_t = f_t \odot C_{t-1} + i_t \odot g_t$$
The output gate $o_t$ determines the output of the unit:

$$o_t = \sigma\big( W_o [H_{t-1}, X_t] + b_o \big)$$

$$H_t = o_t \odot \tanh(C_t)$$
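For illustration, a single scalar LSTM step matching the gate equations above can be written as follows. This is a toy sketch with hand-picked scalar weights; in practice the gates operate on vectors with weight matrices, as in any standard LSTM implementation.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_cell(x, h_prev, c_prev, W, b):
    """One scalar LSTM step following the gate equations in the text.
    W and b hold (weight on h_prev, weight on x) and bias for each of
    the gates 'f', 'i', 'g', 'o'."""
    f = sigmoid(W['f'][0] * h_prev + W['f'][1] * x + b['f'])   # forget gate
    i = sigmoid(W['i'][0] * h_prev + W['i'][1] * x + b['i'])   # input gate
    g = math.tanh(W['g'][0] * h_prev + W['g'][1] * x + b['g']) # candidate memory
    o = sigmoid(W['o'][0] * h_prev + W['o'][1] * x + b['o'])   # output gate
    c = f * c_prev + i * g              # C_t = f ⊙ C_{t-1} + i ⊙ g
    h = o * math.tanh(c)                # H_t = o ⊙ tanh(C_t)
    return h, c

W = {k: (0.5, 0.5) for k in 'figo'}
b = {k: 0.0 for k in 'figo'}
h, c = 0.0, 0.0
for x in (0.1, 0.2, 0.3):               # a short increasing input sequence
    h, c = lstm_cell(x, h, c, W, b)
print(h, c)
```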
Based on the above principles, the LSTM-EKF algorithm structure designed in this paper is shown in Figure 4:
Figure 4 illustrates the detailed update procedure of the EKF employed in this study, as well as the basic architecture of the LSTM network used for visual measurement prediction. As shown in Figure 4, at the EKF update instants when visual information is available, the fused attitude angles are jointly determined by the measurements from the IMU and the camera. When the camera provides normal attitude measurements, a standard EKF update is performed to achieve attitude fusion between the IMU and visual data.
In the case of abnormal or missing visual outputs, the visual attitude variations over the previous N time steps are fed into the LSTM network as an input sequence, where the value of N is determined through simulation analysis. The LSTM network outputs the predicted visual attitude increment Δ θ p r e , t from the last valid visual measurement to the current time step. By adding this predicted increment to the previous visual attitude measurement, the predicted visual attitude at the current time step is obtained, which serves as the observation vector Z p r e . This predicted observation is then incorporated into the EKF update, enabling continuous and stable attitude estimation even under visual measurement degradation.
Therefore, the system observation vector can be expressed as:
$$Z_t = \begin{cases} Z_{C,t}, & \text{if } \left\| Z_{C,t} - X_{t+1|t} \right\| < \alpha \\ Z_{pre,t}, & \text{otherwise} \end{cases}$$
where α is the switching threshold.
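The switching rule can be sketched as a small selector function. This is an illustrative version: the paper does not specify which norm is used for the residual, so a maximum-absolute-difference residual over the three angles is assumed here, and a missing camera frame is modeled as `None`.

```python
def select_observation(z_cam, z_pred_lstm, x_prior, alpha):
    """Choose the EKF observation: keep the camera measurement when it is
    consistent with the predicted state, otherwise fall back to the LSTM
    pseudo-observation. All attitude quantities are 3-element lists of
    angles in degrees; alpha is the switching threshold."""
    if z_cam is not None:
        # Max-abs residual between camera output and predicted state
        residual = max(abs(zc - xp) for zc, xp in zip(z_cam, x_prior))
        if residual < alpha:
            return z_cam, 'camera'
    return z_pred_lstm, 'lstm'

# Camera output close to the prior: keep the camera measurement
z, src = select_observation([10.1, 0.2, -0.1], [10.0, 0.0, 0.0],
                            [10.0, 0.1, 0.0], alpha=1.0)
print(src)  # camera

# Camera occluded (no output): fall back to the LSTM prediction
z, src = select_observation(None, [10.5, 0.0, 0.0],
                            [10.4, 0.1, 0.0], alpha=1.0)
print(src)  # lstm
```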
By integrating the LSTM network with the EKF, continuous dynamic attitude measurement can be achieved even under abnormal or missing camera measurements. The proposed approach significantly enhances the stability and robustness of the IMU/camera-based attitude measurement system, and effectively maintains high estimation accuracy when the visual system is affected by adverse factors such as illumination variations and occlusions.

3. Training Parameter Settings and Simulation

3.1. Dataset Construction and Network Structure Design

In Section 2, the overall structure of the proposed algorithm is defined. Within the fusion framework, the accuracy of the camera serves as the primary factor affecting the performance of the measurement system. Consequently, during visual occlusion, the accuracy of the visual prediction network directly determines the algorithm’s reliability in this period. To enhance the prediction performance of the neural networks, this section focuses on designing the network architecture.
Considering the output frequency of the camera in the dynamic attitude measurement system, an excessively short input sequence results in large prediction errors, whereas an overly long input sequence reduces the network's responsiveness to changes in complex measurement environments. Therefore, an optimal input length is sought within the range of 5 to 20 time steps. To determine the best configuration, five-fold time-series cross-validation is employed to evaluate the influence of different input lengths on network performance, as illustrated in Figure 5.
As shown in Figure 5, when the input sequence length increases from 5 to 10, both the RMSE and MAE of the network decrease steadily. Beyond a length of 10, however, RMSE and MAE exhibit only slight fluctuations around a nearly constant value, while longer input sequences significantly increase training time, which is unfavorable for handling visual occlusion at the beginning of the measurement process. An input sequence length of 16 is therefore selected, as it yields a small loss index at acceptable computational cost.
The hyperparameters of the neural network are determined using a random search strategy. Specifically, the number of layers is selected from {1, 2, 3, 4} and the number of hidden units from {16, 32, 64, 128}. The combination yielding the minimum RMSE is selected, as shown in Figure 6.
As illustrated in Figure 6, for the LSTM, the minimum RMSE of 0.128° is achieved when the network contains one LSTM layer with 64 hidden units. The LSTM incorporates a dropout layer following the LSTM layer. This dropout layer is introduced to mitigate overfitting during training on the sample set. Subsequently, a fully connected layer is employed to reduce the dimensionality of the extracted features, thereby lowering computational complexity and further alleviating overfitting. A ReLU activation function is then applied to enhance the network’s ability to model complex nonlinear relationships and to address the vanishing gradient problem. Finally, another fully connected layer is appended to match the output requirements.
The models were developed using MATLAB 2022b’s Neural Network Toolbox. The experiments were conducted on a system running Windows 11 with an AMD R9-7945HX CPU, an NVIDIA RTX 4060 GPU with 8 GB of video memory, and 16 GB of RAM. The networks were trained using the Adam optimizer, which adaptively updates the network parameters based on the first- and second-order moments of the gradients.

3.2. Performance Simulation Analysis of Fusion Algorithms

After determining the system structure, simulations are conducted to evaluate the performance of the proposed algorithm. The simulation conditions are summarized in Table 1.
The training of the neural networks in the LSTM-EKF algorithm plays a critical role in the overall system performance. Therefore, prior to initiating the simulation experiments, the network is trained using predefined parameters. The training parameters are summarized in Table 2.
After finalizing the algorithm structure and training the neural networks using the designated datasets, simulation experiments were conducted to evaluate the performance of the proposed LSTM-EKF algorithm. For comparison, the traditional AKF and standard KF algorithms were also implemented. The total simulation duration was set to 60 s. During this period, the target was assumed to undergo constant angular motion with a rotational velocity of 5°/s.
To simulate visual occlusion scenarios, camera outputs were artificially removed between 31 and 40 s, representing 10 consecutive missing vision outputs. The goal was to evaluate the robustness of the algorithm under camera failure. The simulation results are illustrated in Figure 7.
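The occlusion scenario described above can be reproduced with a short script. This is a sketch under assumed conditions: the 0.05° camera noise level and the random seed are our own choices, not values from the paper.

```python
import random

random.seed(0)
RATE = 5.0                # deg/s, constant rotation as in the simulation
CAM_NOISE = 0.05          # deg, assumed camera measurement noise (our choice)
OUTAGE = range(31, 41)    # seconds 31-40: simulated visual occlusion

truth, camera = [], []
for t in range(60):                    # 1 Hz camera samples over 60 s
    angle = RATE * t
    truth.append(angle)
    if t in OUTAGE:
        camera.append(None)            # occluded: no visual output
    else:
        camera.append(angle + random.gauss(0.0, CAM_NOISE))

missing = sum(1 for z in camera if z is None)
print(missing)  # 10 consecutive missing vision outputs
```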
As illustrated in Figure 7, the IMU-only measurements exhibit significant cumulative errors over time due to sensor drift and noise. These errors can be effectively mitigated through the fusion of IMU and vision data. During periods of visual data loss, the accuracy of both the KF and AKF degrades considerably. The AKF also exhibits a noticeable lag in measurement, attributed to disruptions in the visual input that impair the adaptive adjustment of filter parameters, resulting in model inaccuracies. Nevertheless, the AKF is capable of gradually restoring accuracy once visual information becomes available again. In contrast, the proposed LSTM-EKF algorithm maintains superior estimation accuracy throughout the occlusion period. This improvement is attributed to its integrated visual prediction and residual correction mechanisms. Consequently, the LSTM-EKF demonstrates high-precision measurement capability and effectively suppresses error divergence in the presence of visual occlusion, thereby enhancing the robustness and reliability of the measurement system.

4. Experiment

4.1. Experimental Platform

As illustrated in Figure 8, a precision two-dimensional turntable was employed to construct the experimental platform for validating the performance of the proposed dynamic attitude measurement algorithm.
In Figure 8, the IMU used in the experiment is the ADIS16488, a 10-degree-of-freedom sensor rigidly mounted on a fixed position of the cooperating target. The vision sensor utilized is the Basler acA2500-20gm (Basler AG, Ahrensburg, Germany), offering a resolution of 2590 × 2048 pixels.

4.2. Design of Experiments

To evaluate the performance of the proposed LSTM-EKF algorithm, three experiments were conducted: (1) a long-term reciprocating rotation experiment, (2) a camera occlusion scanning experiment, and (3) an attitude accuracy evaluation experiment. The goal of the long-term reciprocating rotation experiment is to verify that the LSTM-EKF algorithm can effectively suppress the divergence of IMU cumulative errors during extended measurements. In addition, by comparing with conventional AKF and KF algorithms, this test demonstrates the proposed algorithm’s ability to accurately track target attitude even under visual occlusion.
The attitude accuracy evaluation experiment assesses the angular measurement accuracy at various rotation angles. It aims to validate that, compared with the AKF algorithm, the LSTM-EKF algorithm maintains high accuracy even when the rotation exceeds the vision sensor’s measurement range.
Prior to the experiments, the total station and vision sensor were rigidly mounted at fixed positions, and the cooperative target was secured to the turntable. The IMU was installed inside the target. Subsequently, calibration was performed for both the IMU error model and the intrinsic parameters of the vision sensor. The rotation matrices between all relevant coordinate systems were determined to establish their spatial relationships. The IMU sampling frequency was set to 100 Hz, and the vision sensor was configured to sample at 1 Hz. The specific parameters of the turntable and IMU used in the experiment are shown in Table 3.
The experimental procedures are as follows:
  1. Long-Term Reciprocating Rotation Experiment:
Since complex motions can theoretically be decomposed into simple motions of different periods, this paper adopts a relatively simple uniform-speed periodic reciprocating motion to preliminarily verify the effectiveness of the method. In subsequent work, we will consider more complex motion patterns.
The turntable was configured to rotate back and forth over a ±50° range. The measurement system was activated, and the turntable executed multiple reciprocating rotations. During the experiment, the vision sensor was intermittently occluded for various durations to simulate visual signal loss. Upon completion of the predefined number of rotations, the turntable was stopped and stabilized before terminating the measurement.
  2. Camera Occlusion Scanning Experiment:
Ten sets of repeated experiments were conducted for Occlusion Scanning. All algorithms were first evaluated using unobstructed data to obtain the benchmark RMSE. Subsequently, visual occlusion segments with frame lengths ranging from 1 to 30 were introduced at identical time points across all experimental runs. The RMSE of each algorithm with respect to the ground truth was then calculated, and the experimental results were recorded for further analysis.
  3. Attitude Accuracy Evaluation Experiment:
The turntable was programmed to perform a single rotation from 0° to 80°, with measurements taken at every 5° interval. For each angle, repeated measurements were conducted. After the system initialization, the turntable was rotated to the specified angle, and the system recorded the results after stabilization was achieved.

4.3. Experimental Results and Analysis

  • Long-Term Reciprocating Rotation Experiment:
The experimental results of the long-term reciprocating rotation experiment are illustrated in Figure 9.
As shown in Figure 9, the shaded regions indicate periods of visual signal loss. The visual output is categorized into measured and predicted values, which are clearly distinguished in the figure. During the occlusion periods, the missing visual data are reconstructed using the LSTM. From left to right, the number of missing visual frames is 5, 10, and 20, respectively, and t1, t2, and t3 denote the corresponding start times of visual occlusion.
It is evident that the standalone IMU exhibits significant cumulative errors over prolonged measurements. However, the proposed LSTM-EKF algorithm effectively corrects the IMU drift, significantly enhancing the measurement accuracy. Notably, even during visual occlusion, the LSTM-EKF algorithm maintains reliable attitude tracking and measurement performance.
To further evaluate the robustness of the algorithm, the LSTM-EKF is compared with the traditional AKF and KF algorithms under varying durations of visual occlusion. The comparative results are presented in Figure 10.
As illustrated in Figure 10, the errors of the AKF and KF algorithms progressively increase once the visual input becomes unavailable. In contrast, the LSTM-EKF maintains stable performance by adaptively correcting the IMU drift during the occlusion interval through its LSTM-based prediction.
Because visual occlusion begins at a different moment in each case, the cumulative errors at time zero differ across Figure 10a–c. In Figure 10c, occlusion begins at a quasi-static moment, where the cumulative errors of AKF and KF are minimal; this partly explains why the error of our method is comparatively large within the first 11 s. Nevertheless, over the whole of Figure 10c, our method keeps the error within a bounded range for up to 20 missing frames, and its suppression of error becomes more pronounced as the AKF and KF errors gradually diverge over time.
When the duration of visual occlusion becomes prolonged, as shown in Figure 10c, the performance of the AKF and KF methods deteriorates significantly: once the visual loss exceeds 11 s, their errors rapidly diverge. This is because, when the camera is obstructed, AKF and KF degrade to updating solely with the IMU; when the motion then changes, without corrections from the camera output, the error diverges rapidly. In contrast, the proposed LSTM-EKF algorithm effectively suppresses this divergence by leveraging the visual prediction and a residual-based adaptive correction mechanism, thereby maintaining a high level of measurement accuracy even in extended occlusion scenarios.
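The divergence mechanism can be illustrated with a scalar filter: without measurement updates, the state covariance grows monotonically under process noise, whereas any (pseudo-)measurement update contracts it. This is a minimal one-dimensional sketch of the principle, not the paper's full EKF:

```python
def propagate(x, P, u, dt, q):
    """Time update: integrate gyro rate u; process noise q inflates covariance P."""
    return x + u * dt, P + q

def update(x, P, z, r):
    """Measurement update with (pseudo-)observation z and noise variance r."""
    K = P / (P + r)                       # Kalman gain
    return x + K * (z - x), (1 - K) * P

# Covariance after 20 propagate-only steps (occluded, no pseudo-observations)
# versus 20 propagate-plus-update steps (pseudo-observations available).
P_open, P_closed, x = 1e-4, 1e-4, 0.0
for _ in range(20):
    _, P_open = propagate(x, P_open, 0.0, 0.01, 1e-4)       # grows without bound
    x2, P_closed = propagate(x, P_closed, 0.0, 0.01, 1e-4)
    x, P_closed = update(x2, P_closed, 0.0, 1e-3)           # settles to a bound
```

After the loop, `P_open` has grown linearly with the number of steps, while `P_closed` has converged to a small steady-state value, mirroring the bounded-versus-divergent error behavior in Figure 10c.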
To comprehensively evaluate the performance of the three fusion algorithms, we compare both their overall measurement performance and their performance during visual occlusion. The comparison results are summarized in Table 4.
As shown in Table 4, the LSTM-EKF algorithm achieves the lowest MAE, RMSE, and ME across the entire measurement duration compared with both AKF and KF, demonstrating superior accuracy and long-term stability. Although KF slightly outperforms AKF due to the constant rotational motion of the turntable, LSTM-EKF still reduces the MAE by 19% and the RMSE and ME by approximately 29% relative to KF.
During the visual occlusion periods, the estimation errors of both AKF and KF increase significantly, while the LSTM-EKF maintains consistent accuracy. Compared with AKF, which exhibits the lower errors among the traditional methods, LSTM-EKF achieves an 85% reduction in ME and approximately 59% reductions in both RMSE and MAE. This performance gain is attributed to the LSTM's ability to accurately predict the visual measurements during vision loss. Moreover, the MAE and RMSE during occlusion remain close to those over the full sequence, confirming the algorithm's robustness. These results indicate that the proposed LSTM-EKF algorithm provides accurate and reliable attitude estimation, especially under challenging conditions such as extended visual occlusion.
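The three metrics used in Table 4 can be computed from an error sequence as follows; this is a standard formulation, with helper names of our choosing rather than from the paper:

```python
import numpy as np

def attitude_metrics(estimate, truth):
    """Return (MAE, RMSE, ME) in degrees for paired attitude sequences.

    MAE averages error magnitudes, RMSE penalizes large excursions more
    heavily, and ME is signed, so it exposes a systematic bias that the
    other two metrics would hide.
    """
    err = np.asarray(estimate, dtype=float) - np.asarray(truth, dtype=float)
    mae = float(np.mean(np.abs(err)))
    rmse = float(np.sqrt(np.mean(err ** 2)))
    me = float(np.mean(err))
    return mae, rmse, me
```

The near-zero occlusion ME of the LSTM-EKF (0.121°) alongside a comparable MAE (0.344°) indicates errors that fluctuate around zero rather than drifting in one direction.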
2. Camera Occlusion Scanning Experiment:
To evaluate the maximum effective prediction duration of the proposed LSTM network, a visual-missing-duration scanning experiment was conducted. Ten long-duration oscillation experiments were performed with the same visual occlusion onset time, and the RMSEs of the IMU-only, KF, AKF, and LSTM-EKF methods were systematically compared across the different visual-missing durations. The experimental results are presented in Figure 11.
As shown in Figure 11, the estimation accuracy of all compared methods generally degrades as the number of consecutively missing camera frames increases. Nevertheless, among all fusion strategies, the LSTM-EKF method consistently achieves the highest estimation accuracy under different visual missing durations.
When the camera measurements are missing for 1–10 consecutive frames, both the LSTM-EKF and AKF approaches exhibit significantly lower RMSEs than the conventional KF. This improvement can be attributed to the adaptive mechanism of AKF, which is capable of adjusting model and noise parameters within a short time interval to suppress error divergence. However, when the number of consecutively missing frames exceeds 10, the RMSE of AKF also begins to diverge, showing a trend similar to that of the KF.
In contrast, the LSTM-EKF method exploits the LSTM network to predict the camera outputs and provides effective pseudo-measurements to the EKF during visual outages, thereby compensating for missing measurement updates and extending the duration over which RMSE divergence is suppressed. When the number of missing frames exceeds 20, the RMSE of the LSTM-EKF method also starts to diverge. These results indicate that, within 20 consecutive missing camera frames, the proposed LSTM-EKF approach can more effectively correct the accumulated IMU errors and maintain stable EKF updates compared with the KF and AKF methods.
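The scanning protocol amounts to sweeping the gap length and re-scoring each estimator. The sketch below uses a toy hold-last-value estimator purely as a stand-in for the compared filters; the function names and the NaN convention for lost frames are our assumptions, not the paper's code:

```python
import numpy as np

def rmse(err):
    return float(np.sqrt(np.mean(np.square(err))))

def scan_occlusion(truth, camera, run_estimator, max_gap=30, onset=100):
    """Blank camera frames [onset, onset + gap) for each gap length, then score."""
    scores = []
    for gap in range(1, max_gap + 1):
        masked = camera.copy()
        masked[onset:onset + gap] = np.nan          # NaN marks a lost frame
        scores.append(rmse(run_estimator(masked) - truth))
    return scores

def hold_last(z):
    """Toy stand-in estimator: hold the last valid measurement through a gap."""
    out = z.copy()
    for i in range(1, len(out)):
        if np.isnan(out[i]):
            out[i] = out[i - 1]
    return out

truth = np.linspace(0.0, 10.0, 200)                 # a steadily rotating target
scores = scan_occlusion(truth, truth.copy(), hold_last)
```

With a moving target, the RMSE of any extrapolation-free estimator grows with gap length, which is the divergence trend Figure 11 shows for KF beyond short gaps and for AKF beyond 10 frames.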
3. Attitude Accuracy Evaluation Experiment:
Since the effective measurement range of the vision sensor is limited to 55°, visual measurements fail when the turntable rotation angle exceeds this range. The proposed LSTM-EKF algorithm leverages neural network predictions to estimate angles beyond this visual range, thereby effectively extending the measurement capability of the system. To evaluate this, an angular accuracy evaluation experiment was designed with turntable rotation angles ranging from 0° to 80°, in 5° increments, with multiple repeated measurements conducted for each group. As a representative example, the measurement results for an 80° rotation are illustrated in Figure 12.
In Figure 12, two types of colored markers indicate the visual measurement output and the predictions from the LSTM. The total measurement duration was approximately 21 s during the 80° rotation. Over this period, the IMU accumulated an error of approximately 1.3°, which continued to diverge over time. However, the errors associated with the LSTM-EKF, AKF, and KF algorithms were significantly smaller, indicating effective suppression of IMU error drift.
When the rotation angle exceeds 55°, the cooperative target moves out of the vision sensor’s observable range. During this visual loss phase, the LSTM-EKF algorithm employs its LSTM to continue correcting the system’s outputs based on learned motion patterns, thereby maintaining high measurement accuracy. The quantitative evaluation results of angular measurement accuracy are summarized in Table 5.
As shown in Table 5, the absolute error of IMU angle measurements increases with larger rotation angles. In contrast, the measurement errors of both the LSTM-EKF and AKF fusion methods are significantly lower than that of the standalone IMU. A comparison between LSTM-EKF and AKF demonstrates that LSTM-EKF consistently achieves higher angular measurement accuracy.
Although the performance of both fusion methods degrades when the target motion exceeds the visual measurement range, the LSTM-EKF algorithm maintains superior accuracy through its visual prediction network, which allows it to adaptively correct the fusion results even in the absence of camera inputs.
In the angular range of [0°, 55°], where the camera measurements are valid, the maximum absolute error of the LSTM-EKF method is 0.18°, compared to 0.29° for the AKF method. In the range of [55°, 80°], where vision is occluded, the LSTM-EKF maintains a maximum absolute error of 0.26°, while the AKF error increases to 0.38°.
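The range-wise maxima quoted above follow directly from Table 5; the snippet below transcribes the LSTM-EKF column to make the split at the 55° visual limit explicit:

```python
# LSTM-EKF absolute errors from Table 5, keyed by reference angle (degrees)
lstm_ekf = {0: 0.051, 5: 0.091, 10: 0.104, 15: 0.117, 20: 0.128, 25: 0.136,
            30: 0.157, 35: 0.160, 40: 0.157, 45: 0.150, 50: 0.151, 55: 0.182,
            60: 0.204, 65: 0.221, 70: 0.236, 75: 0.255, 80: 0.264}

VISUAL_LIMIT = 55  # vision sensor's effective measurement range (degrees)

in_range_max = max(e for a, e in lstm_ekf.items() if a <= VISUAL_LIMIT)
out_range_max = max(e for a, e in lstm_ekf.items() if a > VISUAL_LIMIT)
```

This gives 0.182° within the visual range and 0.264° beyond it, matching the rounded 0.18° and 0.26° figures in the text.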
To evaluate the performance contribution of the LSTM network in the proposed LSTM-EKF algorithm, an ablation study was conducted to analyze the impact of the LSTM network on attitude estimation accuracy in the three aforementioned experiments, using RMSE as the evaluation metric. The results are summarized in Table 6, where Experiment 1, Experiment 2, and Experiment 3 correspond to the Long-Term Reciprocating Rotation Experiment, the Camera Occlusion Scanning Experiment, and the Attitude Accuracy Evaluation Experiment, respectively.
As shown in Table 6, incorporating the LSTM significantly improves attitude estimation performance compared with the EKF algorithm alone: the RMSE is reduced by approximately 52.4% in the Long-Term Reciprocating Rotation Experiment, 21.9% in the Camera Occlusion Scanning Experiment, and 45.7% in the Attitude Accuracy Evaluation Experiment. Since the Camera Occlusion Scanning Experiment demonstrated the superior performance of the LSTM-EKF method for up to 20 consecutive missing camera frames, the ablation study reports the average RMSE across occlusion durations of 1 to 20 frames for that experiment.
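The reported reductions follow arithmetically from Table 6:

```python
# RMSE values from Table 6, in degrees: (LSTM-EKF, EKF-only) per experiment
results = {"Experiment 1": (0.463, 0.972),
           "Experiment 2": (0.753, 0.964),
           "Experiment 3": (0.140, 0.258)}

# Relative RMSE reduction contributed by the LSTM, in percent
reductions = {name: 100.0 * (ekf - lstm) / ekf
              for name, (lstm, ekf) in results.items()}
```

Rounding to one decimal place reproduces the 52.4%, 21.9%, and 45.7% figures quoted in the text.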
In summary, the proposed LSTM-EKF method effectively combines the high dynamic responsiveness of the IMU with the high-accuracy characteristics of vision sensors and the temporal prediction capabilities of LSTM neural networks. Compared to conventional vision systems, this approach significantly increases the measurement frequency while reducing the cumulative error typically associated with IMU-based systems during dynamic motion. As a result, the LSTM-EKF algorithm enables high-precision dynamic attitude estimation even under visual occlusion conditions.

5. Conclusions

To address the accuracy degradation of IMU–vision integrated attitude measurement systems under visual information loss, this paper proposed a dynamic attitude estimation method based on an LSTM-enhanced Extended Kalman Filter (LSTM-EKF). By introducing an LSTM network to predict visual measurement increments during camera outages, the proposed framework compensates for missing observations and maintains stable EKF updates, thereby effectively suppressing IMU cumulative error drift.
Experimental results obtained from a precision two-dimensional turntable demonstrate the effectiveness and robustness of the proposed approach. Compared with conventional KF and AKF methods, the LSTM-EKF consistently achieves lower estimation errors in long-term reciprocating motion and maintains reliable attitude tracking during visual occlusion. A continuous visual occlusion scanning experiment further shows that the proposed method can effectively suppress RMSE divergence for up to 20 consecutive missing frames, significantly outperforming both KF and AKF. Moreover, an attitude accuracy evaluation experiment confirms that the LSTM-EKF extends the effective measurement range beyond the vision sensor's limit, achieving a maximum absolute error of 0.26° over an 80° rotation.
An ablation study further verifies the critical role of the LSTM network in improving system performance. Compared with the EKF without LSTM assistance, the proposed method reduces RMSE by approximately 52.4%, 21.9%, and 45.7% in the long-term reciprocating rotation, continuous occlusion scanning, and attitude accuracy evaluation experiments, respectively. Overall, the proposed LSTM-EKF framework effectively integrates the high dynamic responsiveness of IMU sensors, the accuracy of vision-based measurements, and the temporal prediction capability of LSTM networks, enabling accurate and robust dynamic attitude estimation under visual occlusion. Future work will focus on optimizing neural network structures and extending the proposed framework to more complex dynamic measurement scenarios.

Author Contributions

Conceptualization, Z.G. and Z.X.; methodology, Z.X. and Z.G.; software, H.H., F.W. and S.J.; validation, Z.X. and Z.Z. (Ziyue Zhao); formal analysis, Z.X. and Z.G.; investigation, H.H.; resources, Z.G., Z.Z. (Ziyue Zhao) and Z.X.; data curation, Z.Z. (Zhongsheng Zhai) and H.H.; writing—original draft preparation, Z.G. and Z.X.; writing—review and editing, Z.Z. (Ziyue Zhao) and Z.X.; visualization, F.W.; supervision, S.J. and Z.Z. (Ziyue Zhao); project administration, H.H. and Z.Z. (Zhongsheng Zhai); funding acquisition, Z.Z. (Ziyue Zhao) and Z.Z. (Zhongsheng Zhai). All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by China-Poland Measurement and Control Technology ‘Belt and Road’ Joint Laboratory Open Project, grant number MCT202302.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The raw data supporting the conclusions of this article will be made available by the authors upon request.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

LSTM: Long Short-Term Memory
EKF: Extended Kalman Filter
LSTM-EKF: LSTM-enhanced Extended Kalman Filter
IMU: Inertial Measurement Unit
RMSE: Root Mean Square Error
AKF: Adaptive Kalman Filter
AEKF: Adaptive Extended Kalman Filter
KF: Kalman Filter
UKF: Unscented Kalman Filter
BP: Back Propagation
EPNP: Efficient Perspective-n-Point
RNN: Recurrent Neural Network
MAE: Mean Absolute Error
ReLU: Rectified Linear Unit
ME: Mean Error

Figure 1. Composition of the attitude measurement system: (a) Camera (Basler ace acA2500-20gm, Basler AG, Ahrensburg, Germany); (b) IMU (ADIS16488, Analog Devices, Inc., Wilmington, NC, USA); (c) Cooperative Target; (d) Computer.
Figure 2. System flowchart.
Figure 3. LSTM structure.
Figure 4. LSTM-EKF fusion algorithm framework.
Figure 5. The result of input sequence length time-series cross-validation.
Figure 6. The effect of different network parameters on the results.
Figure 7. Error of simulation results.
Figure 8. Experimental platform.
Figure 9. Rotation measurement results under long-term reciprocating motion.
Figure 10. Error comparison among fusion methods. (a) 5-frame visual loss. (b) 10-frame visual loss. (c) 20-frame visual loss.
Figure 11. Fusion results with changes in the number of missing frames from the camera.
Figure 12. Measurement results with 80° rotation.
Table 1. Simulation conditions.
Sensor | Parameter | Value
IMU | Gyroscope bias | 0.01°/h
IMU | Angular velocity random walk | 0.1°/√h
IMU | Sampling interval | 0.01 s
Camera | Measurement bias | 0.01°
Camera | Sampling interval | 1 s
Table 2. LSTM parameters.
Parameter | Value
Input dimensions | 3 × 16
Number of LSTM layers | 1
Number of LSTM layer cells | 64
Dataset size | 2400
Initial learning rate | 0.001
Learning rate decay interval | 100
Learning rate decay factor | 0.1
L2 regularization coefficient | 0.01
Table 3. Parameters of the turntable and IMU.
Instrument | Parameter | Value
Turntable | Rotational range | [0°, 320°]
Turntable | Rotational speed | 5°/s
Turntable | Rotational accuracy | 2″
IMU | Gyroscope bias | 0.02°/h
IMU | Angular velocity random walk | 0.25°/√h
Table 4. Performance comparison of fusion algorithms.
Segment | Metric | KF/° | AKF/° | LSTM-EKF/°
Whole | MAE | 0.490 | 0.535 | 0.396
Whole | RMSE | 0.668 | 0.698 | 0.469
Whole | ME | 0.418 | 0.518 | 0.294
Vision loss | MAE | 0.864 | 0.848 | 0.344
Vision loss | RMSE | 1.054 | 1.049 | 0.428
Vision loss | ME | 0.856 | 0.845 | 0.121
Table 5. Absolute error of different methods.
Range | Reference angle/° | IMU/° | LSTM-EKF/° | AKF/°
Within visual range | 0 | 0.122 | 0.051 | 0.077
Within visual range | 5 | 0.173 | 0.091 | 0.106
Within visual range | 10 | 0.213 | 0.104 | 0.152
Within visual range | 15 | 0.226 | 0.117 | 0.186
Within visual range | 20 | 0.276 | 0.128 | 0.191
Within visual range | 25 | 0.290 | 0.136 | 0.221
Within visual range | 30 | 0.374 | 0.157 | 0.233
Within visual range | 35 | 0.407 | 0.160 | 0.248
Within visual range | 40 | 0.526 | 0.157 | 0.225
Within visual range | 45 | 0.628 | 0.150 | 0.220
Within visual range | 50 | 0.720 | 0.151 | 0.284
Within visual range | 55 | 0.779 | 0.182 | 0.267
Out of visual range | 60 | 0.812 | 0.204 | 0.342
Out of visual range | 65 | 0.848 | 0.221 | 0.352
Out of visual range | 70 | 0.858 | 0.236 | 0.379
Out of visual range | 75 | 0.883 | 0.255 | 0.334
Out of visual range | 80 | 1.001 | 0.264 | 0.384
Table 6. Ablation experiment (RMSE).
Algorithm | Experiment 1/° | Experiment 2/° | Experiment 3/°
LSTM-EKF | 0.463 | 0.753 | 0.140
EKF only | 0.972 | 0.964 | 0.258
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
