Article

Learning-Assisted Multi-IMU Proprioceptive State Estimation for Quadruped Robots

1 Intelligent Fusion Technology, Inc., Germantown, MD 20874, USA
2 Department of Electrical and Biomedical Engineering, University of Nevada, Reno, NV 89557, USA
* Author to whom correspondence should be addressed.
These authors contributed equally to this work.
Information 2025, 16(6), 479; https://doi.org/10.3390/info16060479
Submission received: 21 April 2025 / Revised: 5 June 2025 / Accepted: 6 June 2025 / Published: 9 June 2025
(This article belongs to the Special Issue Sensing and Wireless Communications)

Abstract

This paper presents a learning-assisted approach for state estimation of quadruped robots using observations from proprioceptive sensors, including multiple inertial measurement units (IMUs). Specifically, one body IMU and four additional IMUs attached to each calf link of the robot are used, together with joint encoders, to sense the dynamics of the body and legs. The extended Kalman filter (KF) is employed to fuse the sensor data and estimate the robot's states in the world frame. To enhance the convergence of the extended KF (EKF) and circumvent the requirement for measurements from a motion capture (mocap) system or other vision systems, the right-invariant EKF (RI-EKF) is extended to employ the foot IMU measurements for enhanced state estimation, and a learning-based approach is presented to estimate the vision system measurements for the EKF. One-dimensional convolutional neural networks (CNNs) are leveraged to estimate the required measurements using only the available proprioception data. Experiments on real data from a quadruped robot demonstrate that proprioception can be sufficient for state estimation. The proposed learning-assisted approach, which does not rely on data from vision systems, achieves competitive accuracy compared to the EKF using mocap measurements and lower estimation errors than the RI-EKF using multi-IMU measurements.

1. Introduction

Accurate state estimation is critical for balance and motion planning of quadruped robots on challenging terrains [1]. The requirements of real-time implementation and high accuracy for state estimation in the real world restrict sensor and algorithm selection [2].
Exteroception and proprioception are two distinct sensory modalities that play crucial roles in state estimation for legged robots [3,4]. While exteroception has been investigated for state estimation with external environment information from sensors such as cameras [5] and lidars [6], exteroceptive sensors may be prone to environmental disturbances and perceptual limitations (e.g., occlusions, varying lighting conditions), and they produce large amounts of high-dimensional sensory data that can be computationally demanding to process in real time [7], which limits their applicability. In contrast, proprioception involves sensing the internal state of the robot using internal sensors (such as joint encoders, force sensors, and IMUs) that provide immediate feedback about the robot’s position, orientation, and internal forces in real time [3]. Proprioceptive state estimation has become a popular design choice due to its low latency, high energy efficiency, and low computational demands [7,8]. The challenges of proprioceptive state estimation lie in calibration, drift, and sensor noise [9].
The EKF is a commonly used approach for sensor fusion and state estimation in legged robots [10,11]. It integrates measurements from multiple sensors (e.g., encoders, IMUs) with the robot’s motion model to estimate the state variables while accounting for measurement noise and uncertainties. Bloesch et al. [12] presented a consistent fusion of leg kinematics and IMU data using an observability-constrained EKF for state estimation. Furthermore, Yang et al. [13] proposed online kinematic calibration for legged robots to reduce the velocity estimation errors caused by inaccurate leg-length knowledge. Then, multi-IMU proprioceptive odometry (PO) [14] was developed to improve upon KF-based PO methods that use only a single IMU. It is noted that multi-IMU PO (MIPO) focuses on PO rather than full state estimation and uses the mocap system for measuring the body yaw. However, the EKF can diverge in some challenging situations. The invariant EKF (IEKF) has been increasingly investigated to enhance the convergence of state estimation [15]. In the IEKF, the state is defined on a Lie group and the dynamics satisfy a particular group-affine property. The invariance of the estimation error with respect to a Lie group action leads to the estimation error satisfying a log-linear autonomous differential equation on the Lie algebra, allowing the design of a nonlinear state estimator with strong convergence properties. Hartley et al. [16] derived an IEKF for a system containing IMU and contact sensor dynamics, with forward kinematic correction measurements. However, few works consider the IEKF for multi-IMU proprioceptive state estimation in the context of quadruped robots. Xavier et al. [17] considered multiple IMUs distributed throughout the robot’s structure for state estimation of humanoid robots. In this work, we extend the IEKF for state estimation of a quadruped robot with a body IMU and four foot IMUs, which provides a strong benchmark.
Moreover, machine learning has been employed to address the challenges of proprioceptive state estimation. In particular, learning-based approaches have been developed for reliable contact detection and measurement estimation with reduced sensor demands under challenging scenarios. Lin et al. [18] developed a learning-based contact estimator for the IEKF to bypass the need for physical contact sensors. Buchanan et al. [19] proposed a learned inertial displacement measurement to improve state estimation in challenging scenarios where leg odometry is unreliable, such as slipping and compressible terrains. Youm et al. [20] proposed a neural measurement network (NMN) to estimate the contact probabilities of the feet and the body linear velocity for the IEKF. The inputs to the NMN consist of the current body acceleration, body angular velocity, joint angles, joint velocities, and previous positional joint targets. Moreover, Liu et al. [21] employed a physics-informed neural network in conjunction with an unscented KF using proprioceptive sensory data to enhance the state estimation process. Inspired by previous works, this paper aims to develop an accurate full-state estimator using only proprioceptive information. In particular, building upon MIPO, we use a 1D CNN learned from multi-IMU and other internal sensor data with ground-truth body orientations to estimate the body yaw, avoiding the need for mocap or a sophisticated visual-inertial-leg odometry algorithm (e.g., [22]) to obtain accurate yaw measurements/estimates. The yaw angles predicted by the CNN model are then used by the EKF to estimate the states. Furthermore, we compare this learning-assisted EKF approach with the IEKF using multi-IMU measurements.
The primary contribution of this paper lies in presenting a learning-assisted multi-IMU proprioceptive state estimation (MIPSE) framework that provides accurate full-state estimation using only proprioception. Theoretically, we extend the RI-EKF to a multi-IMU setting, deriving explicit state propagation and update equations. In addition, we propose a hybrid learning-filter fusion strategy that is both practically viable and analytically sound. The resulting learning-assisted MIPSE (LMIPSE) pipeline delivers robust state estimation, even in challenging scenarios where exteroceptive sensors are unreliable or unavailable. Our approach also addresses a critical gap in the literature by enabling precise state estimation for quadrupeds without the need for expensive external tracking systems. The remainder of the paper is organized as follows: Section 2 provides the problem formulation and related preliminaries; Section 3 presents the RI-EKF using multi-IMU measurements; Section 4 presents the proposed LMIPSE; Section 5 provides experimental results; and finally, Section 6 summarizes this paper.

2. Problem Formulation and Related Preliminaries

The states to estimate for a quadruped robot include the orientation $R \in SO(3)$, velocity $v \in \mathbb{R}^3$, and position $p \in \mathbb{R}^3$ of the body in the world frame $W$. The sensor measurements considered for updating the state estimates in this paper are from five IMUs and four joint encoders. We use $I_0$ to denote the body IMU frame that is aligned with the body frame $B$, and use $\{I_i \,|\, i = 1, 2, 3, 4\}$ to denote the frames of the IMUs attached to the legs. Additionally, $C_i$ denotes the frame of the $i$-th foot-end contact point, where $i = 1, 2, 3, 4$ correspond to the front left, front right, hind left, and hind right foot, respectively.

2.1. Measurements and System Model

Consider the IMU measurement model described by

$$\tilde{\omega}_{i,t} = \omega_{i,t} + b^{g}_{i,t} + w^{g}_{i,t}, \tag{1}$$
$$\tilde{a}_{i,t} = a_{i,t} + b^{a}_{i,t} + w^{a}_{i,t}, \tag{2}$$
$$\dot{b}^{g}_{i,t} = w^{b,g}_{i,t}, \qquad \dot{b}^{a}_{i,t} = w^{b,a}_{i,t}, \tag{3}$$

where $\tilde{\omega}_{i,t}$, $b^{g}_{i,t}$, and $w^{g}_{i,t}$ denote the measurement, bias, and white Gaussian noise of the gyroscope in the $i$-th IMU, respectively; $\tilde{a}_{i,t}$, $b^{a}_{i,t}$, and $w^{a}_{i,t}$ denote the measurement, bias, and white Gaussian noise of the accelerometer in the $i$-th IMU, respectively; Equation (3) models the IMU bias terms as random walks, where $w^{b,g}_{i,t}$ and $w^{b,a}_{i,t}$ are white Gaussian noise. Given the angular velocity $\tilde{\omega}_{0,t}$ and linear acceleration $\tilde{a}_{0,t}$ measured by the body IMU, as well as the ground contacts and the joint positions $\tilde{\alpha}_{i,t}$ from the joint encoders, the system model can be described by

$$\dot{R}_t = R_t\,(\tilde{\omega}_{0,t} - b^{g}_{0,t} - w^{g}_{0,t})_{\times}, \tag{4}$$
$$\dot{v}_t = R_t\,(\tilde{a}_{0,t} - b^{a}_{0,t} - w^{a}_{0,t}) + g, \tag{5}$$
$$\dot{p}_t = v_t, \tag{6}$$
$$\dot{d}_{i,t} = R_t\, h_R(\tilde{\alpha}_{i,t})\,(0 - w^{v}_{i,t}), \tag{7}$$

where $(\cdot)_{\times}$ denotes the skew-symmetric matrix of a vector; $g$ is the gravity vector; $d_{i,t}$ is the position of the $i$-th contact point in the world frame; $h_R(\alpha_t) = J_i\dot{\alpha} + (\omega)_{\times} r_i(\alpha)$ with $J_i = \partial r_i / \partial \alpha$ and $r_i$ denoting the location of the $i$-th contact point in the body frame. It is noted that Equation (7) assumes zero measured velocity for the contact point.
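To make the measurement model concrete, the following is a minimal sketch (numpy only; noise densities and the constant-rate test signal are illustrative assumptions, not values from the paper) of how gyroscope readings with a random-walk bias could be generated according to Equations (1) and (3).

```python
import numpy as np

def simulate_gyro(omega_true, dt, sigma_g=0.01, sigma_bg=0.001, rng=None):
    """Simulate gyroscope readings per Eqs. (1) and (3):
    measurement = true rate + random-walk bias + white noise.
    omega_true: (T, 3) true angular velocities; dt: sample period [s]."""
    rng = np.random.default_rng() if rng is None else rng
    bias = np.zeros(3)
    meas = np.zeros_like(omega_true)
    for k in range(omega_true.shape[0]):
        bias += sigma_bg * np.sqrt(dt) * rng.standard_normal(3)  # b_dot = w^{b,g}
        noise = sigma_g * rng.standard_normal(3)                  # w^g
        meas[k] = omega_true[k] + bias + noise                    # Eq. (1)
    return meas

# usage: constant 0.1 rad/s yaw rate for 2 s at 500 Hz
gyro = simulate_gyro(np.tile([0.0, 0.0, 0.1], (1000, 1)), dt=0.002)
```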

2.2. Right-Invariant EKF

The right-invariant EKF (RI-EKF) has proven consistent in orientation estimation by maintaining the structure of the orientation space and effective in drift mitigation by properly propagating uncertainties through the state space [15]. It is therefore an appropriate and strong baseline for proprioceptive state estimation. The RI-EKF for one body IMU and the contact process model with a forward kinematics measurement model has been derived in [16,23]. The prediction step and the update step of the RI-EKF are as follows.

2.2.1. Prediction Step

Assuming a zero-order hold on the input and performing Euler integration from time $t_k$ to $t_{k+1}$, the discrete dynamics are described by

$$\hat{R}_{k+1} = \hat{R}^{+}_{k}\,\exp\!\big((\tilde{\omega}_k - (\hat{b}^{g}_{0,k})^{+})\,\Delta t\big),$$
$$\hat{v}_{k+1} = \hat{v}^{+}_{k} + \hat{R}^{+}_{k}\,(\tilde{a}_k - (\hat{b}^{a}_{0,k})^{+})\,\Delta t + g\,\Delta t,$$
$$\hat{p}_{k+1} = \hat{p}^{+}_{k} + \hat{v}^{+}_{k}\,\Delta t + \tfrac{1}{2}\hat{R}^{+}_{k}\,(\tilde{a}_k - (\hat{b}^{a}_{0,k})^{+})\,\Delta t^2 + \tfrac{1}{2}g\,\Delta t^2,$$
$$\hat{d}_{k+1} = \hat{d}^{+}_{k}, \qquad \hat{b}^{g}_{0,k+1} = (\hat{b}^{g}_{0,k})^{+}, \qquad \hat{b}^{a}_{0,k+1} = (\hat{b}^{a}_{0,k})^{+},$$
$$P_{k+1} = \Phi_k P_k \Phi_k^{\top} + Q_k,$$

where $\Delta t = t_{k+1} - t_k$; $(\cdot)^{+}_{k}$ denotes the corrected estimate using all the measurements up to $t_k$ and $(\cdot)_{k+1}$ denotes the one-step-ahead prediction using the dynamic model with the corrected estimates at $t_k$; $\Phi_k = \exp_m(A_k \Delta t)$ is the discrete state transition matrix with $A_k := A|_{t = t_k}$ and

$$A = \begin{bmatrix}
0_3 & 0_3 & 0_3 & 0_3 & -\hat{R} & 0_3\\
(g)_{\times} & 0_3 & 0_3 & 0_3 & -(\hat{v})_{\times}\hat{R} & -\hat{R}\\
0_3 & I_3 & 0_3 & 0_3 & -(\hat{p})_{\times}\hat{R} & 0_3\\
0_3 & 0_3 & 0_3 & 0_3 & -(\hat{d})_{\times}\hat{R} & 0_3\\
0_3 & 0_3 & 0_3 & 0_3 & 0_3 & 0_3\\
0_3 & 0_3 & 0_3 & 0_3 & 0_3 & 0_3
\end{bmatrix}$$

being the system matrix of the linearized right-invariant error dynamics; the estimation error is $\xi = [\xi_R^{\top}, \xi_v^{\top}, \xi_p^{\top}, \xi_d^{\top}, e_{b^g_0}^{\top}, e_{b^a_0}^{\top}]^{\top}$ with $e_{b^g_0} = \hat{b}^{g}_{0} - b^{g}_{0}$ and $e_{b^a_0} = \hat{b}^{a}_{0} - b^{a}_{0}$; $Q_k \approx \Phi_k \hat{Q}\, \Phi_k^{\top}\, \Delta t$ with
$$\hat{Q} = \begin{bmatrix} \mathrm{Ad}_{\hat{X}_t} & 0_{12\times 6} \\ 0_{6\times 12} & I_6 \end{bmatrix} \mathrm{Cov}(w_t) \begin{bmatrix} \mathrm{Ad}_{\hat{X}_t} & 0_{12\times 6} \\ 0_{6\times 12} & I_6 \end{bmatrix}^{\top}$$
and
$$w_t := \mathrm{vec}\big(w^{g}_{t},\; w^{a}_{t},\; 0_{3\times 1},\; h_R(\tilde{\alpha}_t)\, w^{v}_{t},\; w^{b,g}_{0,t},\; w^{b,a}_{0,t}\big);$$
$\exp_m(\cdot)$ is the matrix exponential; $\mathrm{Ad}_X(\xi) = X \xi^{\wedge} X^{-1}$ denotes the adjoint map, where $\xi^{\wedge}$ is the element of the Lie algebra of the $n \times n$ matrix Lie group associated with $\xi \in \mathbb{R}^n$.
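As an illustration of the prediction step above, a minimal sketch of the mean propagation for the body states $(R, v, p)$ follows (numpy; function and variable names are our own, and the covariance propagation with $\Phi_k$ and $Q_k$ is omitted for brevity).

```python
import numpy as np

def skew(w):
    """Skew-symmetric matrix of a 3-vector."""
    return np.array([[0, -w[2], w[1]],
                     [w[2], 0, -w[0]],
                     [-w[1], w[0], 0]])

def exp_so3(w):
    """Rodrigues formula for the SO(3) exponential map."""
    angle = np.linalg.norm(w)
    if angle < 1e-9:
        return np.eye(3) + skew(w)
    axis = skew(w / angle)
    return np.eye(3) + np.sin(angle) * axis + (1 - np.cos(angle)) * (axis @ axis)

def predict(R, v, p, omega_m, acc_m, bg, ba, dt, g=np.array([0.0, 0.0, -9.81])):
    """One Euler-integration step of the bias-corrected IMU mean dynamics."""
    a_world = R @ (acc_m - ba) + g
    R_next = R @ exp_so3((omega_m - bg) * dt)
    v_next = v + a_world * dt
    p_next = p + v * dt + 0.5 * a_world * dt**2
    return R_next, v_next, p_next
```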

2.2.2. Update Step

The update step is as follows:

$$\hat{X}^{+} = \exp(\delta_{IMU})\,\hat{X},$$
$$b^{g+} = b^{g} + \delta_{b^g}, \qquad b^{a+} = b^{a} + \delta_{b^a},$$
$$P^{+} = (I - KH)\,P\,(I - KH)^{\top} + K N K^{\top},$$

where

$$X_t := \begin{bmatrix} R_t & v_t & p_t & d_t \\ 0_{1\times 3} & 1 & 0 & 0 \\ 0_{1\times 3} & 0 & 1 & 0 \\ 0_{1\times 3} & 0 & 0 & 1 \end{bmatrix}$$

is the embedding of the states in the matrix Lie group (without loss of generality, all further equations assume a single contact point, as the process and measurement models are identical for each contact point); $\delta := [\delta_{IMU}^{\top}, \delta_{b^g}^{\top}, \delta_{b^a}^{\top}]^{\top} = K\,\Pi\,(\hat{X} Y)$ is the correction term, with $K$ the Kalman gain and $\Pi = [I_3, 0_3]$ selecting the first three rows of the right-hand-side matrix. The Kalman gain is given by

$$N = \hat{R}\, J_v(\tilde{\alpha})\, \mathrm{Cov}(w^{\alpha})\, J_v(\tilde{\alpha})^{\top} \hat{R}^{\top} + Q_{\alpha},$$
$$H = [\,0_3,\; 0_3,\; -I_3,\; I_3,\; 0_3,\; 0_3\,],$$
$$S = H P H^{\top} + N,$$
$$K = P H^{\top} S^{-1},$$

where $Q_{\alpha}$ is the covariance matrix of the additive white Gaussian noise from the leg kinematics measurements.
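A compact sketch of this right-invariant update for a single contact is shown below (numpy; the first-order approximation of the group exponential is used for the state correction, and all names are illustrative rather than the authors' implementation).

```python
import numpy as np

def skew(w):
    return np.array([[0, -w[2], w[1]], [w[2], 0, -w[0]], [-w[1], w[0], 0]])

def ri_update(X, P, y, H, N):
    """Right-invariant measurement update for a single contact.
    X: 6x6 state matrix [R v p d; 0 I_3]; P: 18x18 error covariance;
    y: 6-dim observation vector (e.g., [h_p(alpha); 0; 1; -1]);
    H: 3x18 measurement Jacobian; N: 3x3 measurement covariance."""
    S = H @ P @ H.T + N
    K = P @ H.T @ np.linalg.inv(S)             # Kalman gain
    innov = (X @ y)[:3]                        # Pi * (X_hat * Y): first three rows
    delta = K @ innov                          # stacked correction [xi; bias terms]
    xi = delta[:12]                            # invariant part (R, v, p, d)
    dX = np.eye(6)                             # first-order exp(xi) ~ I + L_g(xi)
    dX[:3, :3] += skew(xi[:3])
    dX[:3, 3] = xi[3:6]
    dX[:3, 4] = xi[6:9]
    dX[:3, 5] = xi[9:12]
    X_new = dX @ X                             # left-multiplicative correction
    I_KH = np.eye(P.shape[0]) - K @ H
    P_new = I_KH @ P @ I_KH.T + K @ N @ K.T    # Joseph-form covariance update
    return X_new, P_new, delta[12:]            # bias corrections returned separately
```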

2.3. Multi-IMU Proprioceptive Odometry

Multiple IMUs, including body and foot IMUs, have been used with joint encoders on legged robots to achieve low-drift, long-term position and velocity estimation using an EKF. The foot IMUs address the limitations of PO with a single body IMU by enabling updates of the foot positions during the airborne phase and by accounting for rolling during contact, without the zero-velocity assumption on the leg-odometry velocity. In particular, foot IMU data were employed to update the foot velocities in the prediction step and to determine the foot contact and slip models for the measurement model [14]. The process model for MIPO is
$$\hat{x}_{k+1} = \begin{bmatrix} \hat{\theta}_{k+1}\\ \hat{v}_{k+1}\\ \hat{p}_{k+1}\\ \hat{d}_{i,k+1}\\ \dot{\hat{d}}_{i,k+1} \end{bmatrix} = \begin{bmatrix} \hat{\theta}_k + \big(\Omega(\hat{\theta}_k)\,{}^{B}\omega\big)\,\Delta t\\ \hat{v}_k + \big(R(\hat{\theta}_k)\,a_0 + g\big)\,\Delta t\\ \hat{p}_k + \hat{v}_k\,\Delta t\\ \hat{d}_{i,k} + \dot{\hat{d}}_{i,k}\,\Delta t\\ \dot{\hat{d}}_{i,k} + \big(R(\hat{\theta}_k)\,h_R(\alpha)\,a_i + g\big)\,\Delta t \end{bmatrix} \tag{21}$$

where $\Omega(\hat{\theta}_k)$ originates from the derivative of the Euler angle vector $\theta = [\theta_r, \theta_p, \theta_y]^{\top}$, with $\theta_r$, $\theta_p$, $\theta_y$ denoting the roll, pitch, and yaw angles, respectively. The measurement model is

$$h(\hat{x}_k, \alpha, \omega_i) = \begin{bmatrix} R(\hat{\theta}_k)^{\top}(\hat{d} - \hat{p}_k)\\ R(\hat{\theta}_k)^{\top}(\hat{v}_{i,k} - \hat{v}_k)\\ \dot{\hat{d}}_{i,k} - {}^{W}\omega_i \times s \end{bmatrix} \tag{22}$$

where the last term is based on the pivoting model [14]; $\times$ denotes the cross product of two vectors; ${}^{W}\omega_i = h_R(\tilde{\alpha})\,\omega_i$; using $d_0$ to denote the distance between the foot center and the foot surface and $n = R(\hat{\theta}_k)\,h_R(\tilde{\alpha})$ to denote the contact normal vector expressed in the world frame, $s(\hat{x}, \tilde{\alpha}) = d_0 \cdot n / \|n\|$ is the pivoting vector pointing from the contact point to the body center. The last term in (22) can be used in the update step only when a foot is in contact with the ground. To detect contact and slip, a statistical test based on the Mahalanobis distance is used in [14] as follows:

$$z^{\top} S_z^{-1} z < \sigma, \tag{23}$$

where $z = \dot{\hat{d}}_{i,k} - {}^{W}\omega_i \times s$ and $S_z$ is the covariance matrix of $z$ estimated by the EKF; $\sigma$ is the hypothesis-testing threshold. The foot is recognized as being in non-slipping contact if (23) is satisfied. Since the RI-EKF shows superior performance, we next present the RI-EKF approach for state estimation using multi-IMU measurements.
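A small sketch of the statistical contact/slip test in (23) is given below (numpy; the threshold value is an assumed chi-square-style constant, not the value used in the paper): a foot is treated as in non-slipping contact when the Mahalanobis distance of the residual $z$ falls below the threshold.

```python
import numpy as np

def in_contact(d_dot_hat, omega_w, s, S_z, sigma=7.81):
    """Non-slipping contact test of Eq. (23).
    d_dot_hat: estimated foot velocity (world frame); omega_w: foot angular
    velocity in the world frame; s: pivoting vector; S_z: covariance of z
    from the EKF; sigma: hypothesis-testing threshold (assumed value)."""
    z = d_dot_hat - np.cross(omega_w, s)
    m2 = float(z @ np.linalg.solve(S_z, z))    # z^T S_z^{-1} z
    return m2 < sigma
```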

3. RI-EKF Using Multi-IMU Measurements

In this section, we present the RI-EKF using multi-IMU measurements as a strong benchmark. Considering the linear velocity of the foot as a state in the system dynamics, the augmented robot state is $\bar{x} = [\theta^{\top}, v_0^{\top}, p^{\top}, d_i^{\top}, v_i^{\top}]^{\top}$, where $v_i$ denotes the linear velocity of contact foot $i$ in the world frame and $\dot{v}_{i,k} = R(\theta_k)\,h_R(\alpha)\,a_i + g$. Then, we consider

$$\bar{X}_t := \begin{bmatrix} R_t & v_{0,t} & p_t & d_{i,t} & v_{i,t} \\ 0_{1\times 3} & 1 & 0 & 0 & 0 \\ 0_{1\times 3} & 0 & 1 & 0 & 0 \\ 0_{1\times 3} & 0 & 0 & 1 & 0 \\ 0_{1\times 3} & 0 & 0 & 0 & 1 \end{bmatrix}.$$

The foot IMUs can provide estimates of the foot velocities and accelerations, even during slipping, through

$$\dot{d}_{i,t} = v_{i,t},$$
$$\dot{v}_{i,t} = R_t\, h_R(\tilde{\alpha})\,(\tilde{a}_{i,t} - b^{a}_{i,t} - w^{a}_{i,t}) + g.$$

Consequently, we have

$$\frac{d}{dt}\bar{X}_t = \begin{bmatrix} R_t(\tilde{\omega}_{0,t})_{\times} & R_t\tilde{a}_{0,t} + g & v_{0,t} & 0_{3\times 1} & R_t h_R(\tilde{\alpha})\tilde{a}_{i,t} + g \\ 0_{1\times 3} & 0 & 0 & 0 & 0 \\ 0_{1\times 3} & 0 & 0 & 0 & 0 \\ 0_{1\times 3} & 0 & 0 & 0 & 0 \\ 0_{1\times 3} & 0 & 0 & 0 & 0 \end{bmatrix} - \begin{bmatrix} R_t & v_{0,t} & p_t & d_{i,t} & v_{i,t} \\ 0_{1\times 3} & 1 & 0 & 0 & 0 \\ 0_{1\times 3} & 0 & 1 & 0 & 0 \\ 0_{1\times 3} & 0 & 0 & 1 & 0 \\ 0_{1\times 3} & 0 & 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} (w^{g}_{0,t})_{\times} & w^{a}_{0,t} & 0_{3\times 1} & h_R(\tilde{\alpha}_t)w^{v}_{t} & h_R(\tilde{\alpha})w^{a}_{i,t} \\ 0_{1\times 3} & 0 & 0 & 0 & 0 \\ 0_{1\times 3} & 0 & 0 & 0 & 0 \\ 0_{1\times 3} & 0 & 0 & 0 & 0 \\ 0_{1\times 3} & 0 & 0 & 0 & 0 \end{bmatrix}$$
$$=: f_{u_t}(\bar{X}_t) - \bar{X}_t\, \mathcal{L}_g(\bar{w}_t)$$

with $\bar{w}_t := \mathrm{vec}\big(w^{g}_{0,t},\; w^{a}_{0,t},\; 0_{3\times 1},\; h_R(\tilde{\alpha}_t)w^{v}_{t},\; h_R(\tilde{\alpha})w^{a}_{i,t}\big)$. The right-invariant error between two trajectories $\bar{X}_t$ and $\hat{\bar{X}}_t$ is defined as

$$\eta^{r}_{t} = \hat{\bar{X}}_t\,\bar{X}_t^{-1} = \exp(\bar{\xi}_t) \approx I_d + \mathcal{L}_g(\bar{\xi}_t),$$

where $\mathcal{L}_g: \mathbb{R}^{3N+9} \to \mathfrak{g}$ maps a vector to the corresponding element of the Lie algebra. The map $f_{u_t}(\cdot)$ can be shown to satisfy the group-affine property. Therefore, the right-invariant error dynamics are trajectory-independent and satisfy $\frac{d}{dt}\eta^{r}_t = f_{u_t}(\eta^{r}_t) - \eta^{r}_t f_{u_t}(I_d) + (\hat{X}_t \mathcal{L}_g(w_t)\hat{X}_t^{-1})\eta^{r}_t =: g_{u_t}(\eta^{r}_t) + \mathcal{L}_g(\hat{w}_t)\eta^{r}_t$ [15]. Considering the IMU biases,

$$\frac{d}{dt}\big(\hat{d}_{i,t} - \hat{R}_t R_t^{\top} d_{i,t}\big) \approx \xi^{v_i}_t + (\hat{d}_{i,t})_{\times}\hat{R}_t\,(w^{g}_{0,t} - \zeta^{g}_{0,t}),$$
$$\frac{d}{dt}\big(\hat{v}_{i,t} - \hat{R}_t R_t^{\top} v_{i,t}\big) \approx (g)_{\times}\xi^{R}_t + (\hat{v}_{i,t})_{\times}\hat{R}_t\,(w^{g}_{0,t} - \zeta^{g}_{0,t}) + \hat{R}_t\, h_R(\tilde{\alpha})\,(w^{a}_{i,t} - \zeta^{a}_{i,t}),$$

where $\zeta^{g}_{0,t} = \hat{b}^{g}_{0,t} - b^{g}_{0,t}$ and $\zeta^{a}_{i,t} = \hat{b}^{a}_{i,t} - b^{a}_{i,t}$. To let the invariant error satisfy the log-linear property [15], $\bar{A}_t$ is defined by $g_{u_t}(\exp(\bar{\xi})) = \mathcal{L}_g(\bar{A}_t\bar{\xi}) + O(\|\bar{\xi}\|^2)$. Using the first-order approximation $\exp(\bar{\xi}) \approx I_d + \mathcal{L}_g(\bar{\xi}_t)$, we have

$$\bar{A} = \begin{bmatrix}
0_3 & 0_3 & 0_3 & 0_3 & 0_3 & -\hat{R} & 0_3 & 0_3\\
(g)_{\times} & 0_3 & 0_3 & 0_3 & 0_3 & -(\hat{v}_0)_{\times}\hat{R} & -\hat{R} & 0_3\\
0_3 & I_3 & 0_3 & 0_3 & 0_3 & -(\hat{p})_{\times}\hat{R} & 0_3 & 0_3\\
0_3 & 0_3 & 0_3 & 0_3 & I_3 & -(\hat{d})_{\times}\hat{R} & 0_3 & 0_3\\
(g)_{\times} & 0_3 & 0_3 & 0_3 & 0_3 & -(\hat{v}_i)_{\times}\hat{R} & 0_3 & -\hat{R}\,h_R(\tilde{\alpha})\\
0_3 & 0_3 & 0_3 & 0_3 & 0_3 & 0_3 & 0_3 & 0_3\\
0_3 & 0_3 & 0_3 & 0_3 & 0_3 & 0_3 & 0_3 & 0_3\\
0_3 & 0_3 & 0_3 & 0_3 & 0_3 & 0_3 & 0_3 & 0_3
\end{bmatrix}$$

with $\bar{\xi} = [\xi_R^{\top}, \xi_v^{\top}, \xi_p^{\top}, \xi_{d_i}^{\top}, \xi_{v_i}^{\top}, \zeta_{b^g_0}^{\top}, \zeta_{b^a_0}^{\top}, \zeta_{b^a_i}^{\top}]^{\top}$. The covariance matrix of the process noise for the augmented error dynamics is $\hat{\bar{Q}} = \begin{bmatrix}\mathrm{Ad}_{\hat{\bar{X}}_t} & 0_{12\times 6}\\ 0_{6\times 12} & I_6\end{bmatrix}\mathrm{Cov}(\bar{w}_t)\begin{bmatrix}\mathrm{Ad}_{\hat{\bar{X}}_t} & 0_{12\times 6}\\ 0_{6\times 12} & I_6\end{bmatrix}^{\top}$.
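The process-noise covariance above uses the adjoint of the estimated state. A short sketch (numpy) of building the adjoint matrix for a state of the form $X = [R, b_1, \ldots, b_N; 0, I_N]$ is given below, following the standard block structure for such groups; this is a generic helper under that assumption, not code from the paper.

```python
import numpy as np

def skew(w):
    return np.array([[0, -w[2], w[1]], [w[2], 0, -w[0]], [-w[1], w[0], 0]])

def adjoint(R, vectors):
    """Adjoint matrix of X = [R, b_1..b_N; 0, I_N] acting on [xi_R; xi_1; ...; xi_N].
    First block column is [R; skew(b_i) @ R]; diagonal blocks are copies of R."""
    N = len(vectors)
    Ad = np.zeros((3 * (N + 1), 3 * (N + 1)))
    Ad[:3, :3] = R
    for i, b in enumerate(vectors, start=1):
        Ad[3 * i:3 * i + 3, :3] = skew(b) @ R
        Ad[3 * i:3 * i + 3, 3 * i:3 * i + 3] = R
    return Ad

# usage for the augmented state (v0, p, d_i, v_i):
# Ad = adjoint(R_hat, [v0_hat, p_hat, d_hat, v_i_hat])
```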
Using the measurements $\tilde{\omega}_i$ of the gyroscope on foot $i$,

$${}^{B}v_i = R_t^{\top}\big({}^{W}\omega_i(\hat{x}, \tilde{\alpha}, {}^{C_i}\omega_i) \times s(\hat{x}, \tilde{\alpha})\big),$$

where ${}^{W}\omega_i = R(\hat{\theta}_k)\, h_R(\tilde{\alpha})\,{}^{C_i}\omega_i$. ${}^{B}v_i$ can be expressed in the right-invariant observation structure $Y_t = X_t^{-1} b + V_t$ of [15] as follows:

$$\begin{bmatrix} {}^{B}v_i \\ 0 \\ 0 \\ 0 \\ -1 \end{bmatrix} = \begin{bmatrix} R_t^{\top} & -R_t^{\top} v_{0,t} & -R_t^{\top} p_t & -R_t^{\top} d_{i,t} & -R_t^{\top} v_{i,t} \\ 0_{1\times 3} & 1 & 0 & 0 & 0 \\ 0_{1\times 3} & 0 & 1 & 0 & 0 \\ 0_{1\times 3} & 0 & 0 & 1 & 0 \\ 0_{1\times 3} & 0 & 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 0_{3\times 1} \\ 0 \\ 0 \\ 0 \\ -1 \end{bmatrix} + \begin{bmatrix} \hat{R}^{\top} w^{v}_{C_i} \\ 0 \\ 0 \\ 0 \\ 0 \end{bmatrix}.$$
Then,

$$\bar{Y} = \begin{bmatrix} h_p(\tilde{\alpha}) \\ 0 \\ 1 \\ -1 \\ 0 \\ J(\alpha)\dot{\alpha} + ({}^{B}\omega)_{\times} h_p(\alpha) \\ 1 \\ 0 \\ 0 \\ -1 \\ {}^{B}v_i \\ 0 \\ 0 \\ 0 \\ -1 \end{bmatrix}, \qquad
\bar{H} = \begin{bmatrix} 0_3 & 0_3 & -I & I & 0_3 & 0_{3\times 9} \\ 0_3 & -I & 0_3 & 0_3 & I & 0_{3\times 9} \\ 0_3 & 0_3 & 0_3 & 0_3 & -I & 0_{3\times 9} \end{bmatrix},$$

$$\bar{N} = \mathrm{diag}\big\{ \hat{R}\, J_{v_i}\,\mathrm{Cov}(w^{v}_{i,t})\, J_{v_i}(\tilde{\alpha})^{\top} \hat{R}^{\top},\; \hat{R}\,\mathrm{Cov}(w^{f}_{i})\,\hat{R}^{\top},\; \hat{R}\,\mathrm{Cov}(w^{v}_{C_i})\,\hat{R}^{\top} \big\},$$

where $\mathrm{diag}\{\cdot,\cdot\}$ denotes the block-diagonal operation; $w^{f}_{i}$ incorporates the uncertainty in the encoder measurements, the kinematic model, and the effects of slip and gyroscope bias [23]; and $w^{v}_{C_i}$ represents the noise of the measured velocity for the $i$-th contact point.

4. Learning-Assisted Multi-IMU Proprioceptive State Estimation

A notable limitation of MIPO [14] arises from its reliance on yaw angle measurements provided by a motion capture system. This dependency arises because using motion capture data significantly enhances the accuracy of the state estimation model. However, solely relying on IMU measurements or EKF predictions results in poor accuracy due to the high noise levels in IMU data and the lack of a cross-referencing method to correct these measurements. This reliance on motion capture systems, which are typically unavailable in outdoor environments or general experimental setups, limits the model’s applicability. To address this limitation, our method employs a learning-based approach to predict yaw angles from available data sources, including IMUs, joint motor encoders, and foot contact sensors. This predictive capability enables the replacement of direct motion capture measurements, thus fitting seamlessly within the multi-IMU odometry framework and broadening the model’s applicability to outdoor and less controlled environments. Figure 1 shows the schematic of our proposed LMIPSE.
To train a model for predicting yaw angles online, we first collect a dataset denoted by $\mathcal{D} = \big\{ x(k) = \{\tilde{\omega}_{i,t_j}, \tilde{a}_{i,t_j}, \tilde{\alpha}_{i,t_j}, \tilde{f}_{i,t_j} \,|\, i = 0, 1, \ldots, 4;\ j = k, k-1, \ldots, k - n_{ws} + 1\},\ y(k) = \theta_y(k) \,\big|\, k = 1, \ldots, N_{ds}\big\}$, where $x(k)$ are the inputs of the model, $y(k)$ is the output the model learns to predict, $\tilde{f}_{i,t_j}$ are the foot force sensor measurements, $n_{ws}$ is the time window size, and $N_{ds}$ is the dataset size. The dataset $\mathcal{D}$ can be randomly split into a training set $\mathcal{D}_{train}$ and a testing set $\mathcal{D}_{test}$. It is noted that mocap is only needed for collecting the $\theta_y(k)$ data and is not necessary for state estimation. Many model architectures have been proposed for signal processing [24]. We consider a 1D version of ResNet [25] (ResNet1D) due to its proven capability in modeling time series data with long-term dependencies while maintaining a manageable model size. Compared with other architectures [26], ResNet1D achieves a better tradeoff between performance and training efficiency for inertial data processing, as evidenced in recent works such as [19,27]. Then, the ResNet1D (denoted by $f(x; w)$ with parameters $w$) can be trained by minimizing the following smooth L1 loss (a.k.a. Huber loss)
$$L_{\delta_{th}}\big(y - f(x; w)\big) = \begin{cases} \dfrac{1}{2}\,\big(y - f(x; w)\big)^2 & \text{for } \big|y - f(x; w)\big| \le \delta_{th},\\[4pt] \delta_{th}\,\Big(\big|y - f(x; w)\big| - \dfrac{\delta_{th}}{2}\Big) & \text{for } \big|y - f(x; w)\big| > \delta_{th}. \end{cases} \tag{36}$$
The Huber loss (36) is chosen for its robustness to outliers in the data. It transitions smoothly to a linear loss for errors larger than the threshold $\delta_{th}$; this combination of the squared loss and the absolute loss allows the Huber loss to provide the benefits of both.
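For reference, the piecewise form of (36) can be written directly as below (numpy; the threshold value and the example residuals are illustrative assumptions, not the values used in the paper).

```python
import numpy as np

def huber_loss(residual, delta_th=1.0):
    """Smooth L1 / Huber loss of Eq. (36), applied elementwise and averaged.
    residual: array of y - f(x; w); delta_th: assumed threshold."""
    r = np.abs(residual)
    quadratic = 0.5 * residual**2
    linear = delta_th * (r - 0.5 * delta_th)
    return np.mean(np.where(r <= delta_th, quadratic, linear))

# example: residuals between predicted and mocap yaw (radians)
print(huber_loss(np.array([0.02, -0.5, 3.0])))
```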
It is noted that the NN design significantly affects the generalization of data-driven models [28]. Besides directly predicting yaw angles, we consider using an NN, denoted by $f'(x; w)$, to model the mismatch between the yaw data ($\theta_y$) from the mocap system and the yaw estimates ($\hat{\theta}_{DK,y}$) obtained by body IMU dead reckoning (DK). DK was implemented using Euler integration of the angular velocities from the body IMU without any correction or external references, which results in the typical drift over time. To learn an accurate $f'(x; w)$, we first transform the dataset $\mathcal{D}$ into $\mathcal{D}' = \big\{ x(k) = \{\tilde{\omega}_{i,t_j}, \tilde{a}_{i,t_j}, \tilde{\alpha}_{i,t_j}, \tilde{f}_{i,t_j} \,|\, i = 0, 1, \ldots, 4;\ j = k, k-1, \ldots, k - n_{ws} + 1\},\ y'(k) = \delta\theta_y(k) \,\big|\, k = 1, \ldots, N_{ds}\big\}$, where $\delta\theta_y(k) = \theta_y(k) - \hat{\theta}_{DK,y}(k)$ with $\hat{\theta}_{DK,y}$ denoting the yaw angle estimated by body IMU DK. Then, we optimize $w$ in $f'$ by minimizing $L_{\delta_{th}}(y' - f'(x; w))$ on the transformed dataset $\mathcal{D}'$.
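A sketch of how the transformed dataset could be assembled is shown below (numpy; the array layout, window size, and dead-reckoning helper are illustrative assumptions): each sample stacks a window of proprioceptive channels, and the target is the mocap yaw minus the dead-reckoned yaw at the window's last step.

```python
import numpy as np

def dead_reckon_yaw(gyro_z, dt):
    """Euler-integrated yaw from the body gyro z-axis (no correction)."""
    return np.cumsum(gyro_z) * dt

def build_residual_dataset(features, yaw_mocap, gyro_z, dt, n_ws=50):
    """features: (T, C) matrix of stacked proprioceptive channels
    (IMU rates/accelerations, joint angles, foot forces).
    Targets are delta_theta_y = theta_y (mocap) - theta_y (dead reckoning)."""
    yaw_dk = dead_reckon_yaw(gyro_z, dt)
    X, y = [], []
    for k in range(n_ws - 1, features.shape[0]):
        X.append(features[k - n_ws + 1:k + 1].T)   # (C, n_ws) window ending at k
        y.append(yaw_mocap[k] - yaw_dk[k])          # correction-factor target
    return np.stack(X), np.array(y)
```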
After finding a model $f' := f'(x(k); w)$ that achieves small testing errors on $\mathcal{D}'_{test}$, we use $f'$ to predict the correction factor $\delta\theta_y$ for the yaw angle online. The predicted correction factor is then added to the yaw estimate $\hat{\theta}_{DK,y}$ from IMU DK to obtain the final corrected yaw prediction $\hat{\theta}_y$. The corrected $\hat{\theta}_y$ is used in both the prediction step and the update step of the EKF. Specifically, at time instant $k$, the prediction step is
$$\hat{x}^{-}_{k+1} = f_E(\hat{x}^{+}_{k}, u_k),$$
$$P^{-}_{k+1} = F_k P^{+}_{k} F_k^{\top} + G_k Q^{E}_{k} G_k^{\top},$$

where $F = \frac{\partial f_E}{\partial x}$ with $f_E$ denoting the state transition function in (21); $G = \frac{\partial f_E}{\partial w_E}$ with $w_E$ denoting the process noise in (21); and $Q^{E}$ is the covariance matrix of $w_E$. The update step is

$$P^{+}_{k} = \Big(\big(P^{-}_{k}\big)^{-1} + H_k^{\top} R_k^{-1} H_k\Big)^{-1},$$
$$\hat{x}^{+}_{k} = \hat{x}^{-}_{k} + P^{+}_{k} H_k^{\top} R_k^{-1}\big(\bar{y}_k - H_k \hat{x}^{-}_{k}\big),$$

where $H = \frac{\partial h}{\partial x}$ and $\bar{y}_k = y_k - h(\hat{x}^{-}_{k}) + H_k \hat{x}^{-}_{k}$.
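Putting the pieces together, a high-level sketch of one online LMIPSE cycle is given below. The filter and model interfaces (predict/update methods, the window buffer) are hypothetical placeholders for the authors' implementation; the point is only the data flow: the trained network predicts the yaw correction, which is added to the dead-reckoned yaw and supplied to the EKF in place of a mocap yaw measurement.

```python
import numpy as np

def lmipse_step(ekf, model, window, yaw_dk, u_k, y_other):
    """One LMIPSE cycle at time k.
    ekf: filter object with predict(u) and update(y) (assumed interface);
    model: trained ResNet1D returning the yaw correction for the window;
    window: (C, n_ws) buffer of recent proprioceptive data;
    yaw_dk: current dead-reckoned yaw; y_other: remaining measurements."""
    delta_yaw = float(model(window))        # predicted correction factor
    yaw_corrected = yaw_dk + delta_yaw      # corrected yaw used instead of mocap
    ekf.predict(u_k)                        # EKF prediction step
    y = np.concatenate([y_other, [yaw_corrected]])
    ekf.update(y)                           # EKF update step with corrected yaw
    return ekf.state()
```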

5. Experiments and Validation

This section validates the proposed LMIPSE approach using simulation and real data of a legged robot.

5.1. Dataset Description

The dataset utilized for this study was collected using a Unitree Go1 robot in a lab space equipped with a highly accurate mocap system, as detailed in [14]. The robot was equipped with several proprioceptive sensors, including a body MEMS IMU, twelve joint motor encoders, and four foot pressure contact sensors. An MPU9250 IMU was also mounted on each foot of the robot to capture additional inertial information.
During indoor trials, the robot operated on flat ground and moved at speeds ranging from 0.4 to 1.0 m/s. Sensor data included linear accelerations and angular velocities from the IMUs, torque and angle measurements from the joint motor encoders, and orientation data and positional data from the motion capture system. To ensure consistency, all collected data were first resampled to the same frequency, aligned to the same start and end times, and normalized such that the starting location of the robot was at the origin. These data were stored in a ROSBag file and are publicly available [14].
A key limitation of the dataset is its short duration of approximately 45 s and the simple flat terrain of the testing environment, which do little to expose the IMU data's susceptibility to drift. To address these limitations and simulate a more practical scenario, Gaussian noise and a time-integrated Gaussian drift were added to the body IMU data. This setup was intended to assess the robustness of the state estimation algorithms under conditions that more closely mimic operational environments with potential IMU data degradation.
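A sketch of the noise and drift injection described above follows (numpy; the noise levels are illustrative assumptions, not the values used in the experiments).

```python
import numpy as np

def degrade_imu(signal, dt, sigma_noise=0.05, sigma_drift=0.005, rng=None):
    """Add white Gaussian noise plus a time-integrated Gaussian drift to IMU data.
    signal: (T, 3) gyroscope or accelerometer channel; dt: sample period [s]."""
    rng = np.random.default_rng() if rng is None else rng
    noise = sigma_noise * rng.standard_normal(signal.shape)
    drift = np.cumsum(sigma_drift * np.sqrt(dt) * rng.standard_normal(signal.shape), axis=0)
    return signal + noise + drift
```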

5.2. Experimental Setup

In our study, the mocap system’s pose measurements serve as the ground truth, providing a high-fidelity benchmark for evaluating the performance of the various state estimation methods. Multi-IMU proprioceptive odometry (MIPO), which integrates multiple IMUs mounted on the robot’s feet, serves as the baseline against which the other methods are compared. Additionally, we consider body IMU dead reckoning as a further baseline, in which angular velocity measurements are integrated over time to estimate the orientation; due to IMU drift, this method typically yields less reliable results.

Impacts of NN Design

To address the limitations caused by dataset variability and enhance the model’s generalizability across different robotic movements, we implemented two CNN-based approaches based on a 1D ResNet architecture [29]. The first approach is the Multi-IMU CNN angle estimator (MI-CAE), which predicts yaw angles directly, using mocap-derived orientation measurements as ground truth. The second approach is the Multi-IMU CNN angle correction enhancer (MI-CCE), which aims to predict a correction factor for the IMU-derived yaw angles, with the ground truth being the discrepancy between the IMU dead reckoning and mocap measurements. The corrected yaw angles are then integrated into the MIPO framework to enhance state estimation. The efficacy of these approaches is subsequently evaluated by comparing the resulting position estimates against the mocap system data, thus ensuring a robust assessment of each approach’s ability to improve upon traditional IMU dead reckoning techniques in dynamic environments.
NN architecture: We used a ResNet1D as the backbone of the model for processing the sensor data. The ResNet1D structure comprises nine residual blocks, each containing a sequence of convolutional layers, batch normalization, and ReLU activation functions. The output of the model is configured to produce a single continuous value, which represents either the predicted yaw angle or the correction factor for the IMU yaw angle. Table 1 summarizes the architecture of the ResNet1D model.
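For illustration, a minimal PyTorch sketch of a 1D residual block and a small ResNet1D regression head in the spirit of Table 1 is shown below; the layer widths, kernel size, and dropout rate are placeholders and do not reproduce the exact architecture.

```python
import torch
import torch.nn as nn

class BasicBlock1D(nn.Module):
    """Conv1d -> BN -> ReLU -> Dropout -> Conv1d -> BN with a skip connection."""
    def __init__(self, channels, kernel_size=7, dropout=0.1):
        super().__init__()
        pad = kernel_size // 2
        self.conv1 = nn.Conv1d(channels, channels, kernel_size, padding=pad)
        self.bn1 = nn.BatchNorm1d(channels)
        self.conv2 = nn.Conv1d(channels, channels, kernel_size, padding=pad)
        self.bn2 = nn.BatchNorm1d(channels)
        self.drop = nn.Dropout(dropout)
        self.act = nn.ReLU()

    def forward(self, x):
        out = self.act(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(self.drop(out)))
        return self.act(out + x)                      # residual connection

class ResNet1DRegressor(nn.Module):
    """Stem convolution followed by residual blocks and a single-output head."""
    def __init__(self, in_channels, width=64, n_blocks=9):
        super().__init__()
        self.stem = nn.Conv1d(in_channels, width, kernel_size=7, padding=3)
        self.blocks = nn.Sequential(*[BasicBlock1D(width) for _ in range(n_blocks)])
        self.head = nn.Linear(width, 1)               # predicted yaw or correction

    def forward(self, x):                             # x: (batch, channels, time)
        h = self.blocks(self.stem(x))
        return self.head(h.mean(dim=-1)).squeeze(-1)  # global average pooling
```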
Training settings: From a training perspective, the model employs the Adam optimizer with an initial learning rate of 0.001 and a weight decay of 0.001 to prevent overfitting. Additionally, a ReduceLROnPlateau learning rate scheduler is integrated to adjust the learning rate based on the performance, specifically reducing the rate if no improvement in loss is observed over 10 epochs. This is complemented by using Huber Loss as the loss function. The entire model is trained for 50 epochs.
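A condensed training-loop sketch matching the settings above is given below (PyTorch; the data loader and model variables are assumed to exist and the code is a sketch, not the authors' training script).

```python
import torch
import torch.nn as nn

def train(model, train_loader, epochs=50, device="cpu"):
    """Adam with weight decay, ReduceLROnPlateau on the epoch loss, Huber loss."""
    model.to(device)
    opt = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-3)
    sched = torch.optim.lr_scheduler.ReduceLROnPlateau(opt, patience=10)
    loss_fn = nn.HuberLoss()                       # smooth L1 / Huber loss
    for epoch in range(epochs):
        total = 0.0
        for x, y in train_loader:                  # x: (B, C, n_ws), y: (B,)
            x, y = x.to(device), y.to(device)
            opt.zero_grad()
            loss = loss_fn(model(x), y)
            loss.backward()
            opt.step()
            total += loss.item()
        sched.step(total / len(train_loader))      # reduce LR if the loss plateaus
```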

5.3. Results and Discussions

5.3.1. Yaw Prediction Model

Table 2 summarizes the testing accuracy of the two CNN-based models using different training set sizes. After experimenting with various sizes, we chose to use 3000 random data points to balance performance and generalization. Furthermore, Figure 2 compares the yaw prediction results of MI-CAE and MI-CCE, demonstrating that MI-CCE achieved smaller prediction errors. Since the yaw predictions are used for EKF, we further tested the performance of the two models for state estimation.

5.3.2. Proprioceptive State Estimation

We assessed the performance of different state estimation methods under both low-noise and high-noise conditions. Low Noise Condition refers to the original IMU data measured using a high-quality IMU in a controlled lab environment. High Noise Condition is created by adding Gaussian noise and drift to the IMU measurements to mimic the unstable environment.
Table 3 summarizes the results of various state estimation methods under low IMU noise conditions. SIPO (single-IMU proprioceptive odometry) uses only the body IMU. The MIPO method [14] achieves significantly higher positioning accuracy than SIPO. However, using yaw angles estimated by the EKF rather than motion capture (mocap) measurements substantially increases the position estimation errors for both SIPO and MIPO. When the yaw from body IMU dead reckoning (DK) is used instead, the performance of MIPO is only slightly degraded relative to MIPO with mocap, but this degradation compounds over time. The MIPSE with RI-EKF reduces the estimation errors, yet still yields much larger errors than MIPO using mocap. In contrast, the MI-CAE method uses CNN-predicted yaw angles derived from the multi-IMU measurements for state estimation, resulting in a median drift of 15.23%, an RMSE of 0.30, and a maximum RSE of 0.87. The MI-CCE method improves upon MI-CAE by using a CNN to predict an angle correction factor from the multi-IMU measurements, which is then used to correct the dead-reckoned yaw before state estimation; it shows slightly lower drift percentages and error metrics, with a median drift of 14.84%, an RMSE of 0.29, and a maximum RSE of 0.80. The discrepancy in state estimation performance is smaller than the difference in yaw prediction accuracy between MI-CAE and MI-CCE, demonstrating that the prediction errors can be mitigated by the EKF.
Notably, while the MIPO method uses angle information from a motion capture system and achieves the lowest error metrics, the CNN-enhanced methods, MI-CAE and MI-CCE, exhibit similar levels of accuracy without the reliance on external motion capture systems.
Table 4 summarizes the results of various state estimation methods under high IMU noise. Our LMIPSE using CCE achieved the best results regarding all three performance measurements. Furthermore, LMIPSE is more robust to increased noise levels than MIPO with DK, since the median drift was increased only by 0.03% for LMIPSE compared to 0.25% for MIPO with DK. Furthermore, Figure 3 compares the trajectory estimation, and Figure 4 compares the state estimation results of LMIPSE and other methods. The MIPO EKF estimator, without motion capture assistance, begins to diverge significantly due to yaw estimation error. Our proposed LMIPSE method continues to accurately track the ground-truth trajectory, in contrast to the divergence observed in other methods. These experimental results show that learning-assisted correction is more resilient to drift over time, demonstrating the robustness of our proposed LMIPSE method. The runtime of our LMIPSE is approximately 3 ms (including 0.5 ms ResNet1D model inference time on an NVIDIA Jetson Orin (sourced from NVIDIA, Santa Clara, CA 95051, USA) platform and 2.5 ms MIPSE runtime on an Arduino Teensy board), which is within budget (5 ms) and satisfies the real-time requirements of many robotic platforms.

6. Conclusions

A learning-assisted EKF approach was presented for multi-IMU proprioceptive state estimation of quadruped robots. In particular, a 1D CNN-based model was trained offline to estimate the measurements required for EKF using ground-truth measurements from the mocap system and buffers of proprioception data (including IMUs and joint encoder measurements). The trained model was then used to estimate the measurements online for EKF. Experiments demonstrated that the proposed approach improved the accuracy of state estimation compared to multi-IMU state estimation using RI-EKF and proprioceptive odometry without using multiple IMUs. In future work, we will further validate the proposed LMIPSE approach using outdoor and varied-terrain experiments and will consider learning-based approaches for tuning the hyperparameters of the filters.

Author Contributions

Conceptualization, X.L., Y.B. and H.X.; methodology, Y.B. and Z.F.; software, X.L. and Y.B.; validation, X.L. and Y.B.; formal analysis, X.L. and Y.B.; investigation, P.C. and D.S.; resources, G.C.; data curation, X.L. and Y.B.; writing—original draft preparation, X.L. and Y.B.; writing—review and editing, P.C.; visualization, X.L. and Y.B.; supervision, P.C., D.S. and G.C.; project administration, D.S. and G.C.; funding acquisition, D.S. and G.C. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The raw data supporting the conclusions of this article will be made available by the authors on request.

Conflicts of Interest

Authors Xuanning Liu, Yajie Bao, Peng Cheng, Dan Shen, Zhengyang Fan, and Genshe Chen were employed by the company Intelligent Fusion Technology, Inc. The remaining author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:
IMUs      Inertial measurement units
KF        Kalman filter
EKF       Extended Kalman filter
IEKF      Invariant extended Kalman filter
RI-EKF    Right-invariant extended Kalman filter
mocap     Motion capture
CNN       Convolutional neural networks
PO        Proprioceptive odometry
SIPO      Single-IMU proprioceptive odometry
MIPO      Multi-IMU proprioceptive odometry
NMN       Neural measurement network
MIPSE     Multi-IMU proprioceptive state estimation
LMIPSE    Learning-assisted multi-IMU proprioceptive state estimation
DK        Dead reckoning
MI-CAE    Multi-IMU CNN angle estimator
MI-CCE    Multi-IMU CNN angle correction enhancer

References

  1. Mastalli, C.; Havoutis, I.; Focchi, M.; Caldwell, D.G.; Semini, C. Motion planning for quadrupedal locomotion: Coupled planning, terrain mapping, and whole-body control. IEEE Trans. Robot. 2020, 36, 1635–1648. [Google Scholar] [CrossRef]
  2. Bloesch, M. State Estimation for Legged Robots-Kinematics, Inertial Sensing, and Computer Vision. Ph.D. Thesis, ETH Zurich, Zurich, Switzerland, 2017. [Google Scholar]
  3. He, J.; Gao, F. Mechanism, actuation, perception, and control of highly dynamic multilegged robots: A review. Chin. J. Mech. Eng. 2020, 33, 1–30. [Google Scholar] [CrossRef]
  4. Nobili, S.; Camurri, M.; Barasuol, V.; Focchi, M.; Caldwell, D.; Semini, C.; Fallon, M. Heterogeneous sensor fusion for accurate state estimation of dynamic legged robots. In Proceedings of the Robotics: Science and System XIII, Cambridge, MA, USA, 12–16 July 2017. [Google Scholar]
  5. Bloesch, M.; Burri, M.; Omari, S.; Hutter, M.; Siegwart, R. Iterated extended Kalman filter based visual-inertial odometry using direct photometric feedback. Int. J. Robot. Res. 2017, 36, 1053–1072. [Google Scholar] [CrossRef]
  6. Zhao, S.; Zhang, H.; Wang, P.; Nogueira, L.; Scherer, S. Super odometry: Imu-centric lidar-visual-inertial estimator for challenging environments. In Proceedings of the 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Prague, Czech Republic, 27 September–1 October 2021; pp. 8729–8736. [Google Scholar]
  7. Fink, G.; Semini, C. Proprioceptive Sensor Fusion for Quadruped Robot State Estimation. In Proceedings of the 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Las Vegas, NV, USA, 24 October 2020–24 January 2021; pp. 10914–10920. [Google Scholar] [CrossRef]
  8. Lin, T.Y.; Li, T.; Tong, W.; Ghaffari, M. Proprioceptive invariant robot state estimation. arXiv 2023, arXiv:2311.04320. [Google Scholar]
  9. Reinstein, M.; Hoffmann, M. Dead Reckoning in a Dynamic Quadruped Robot Based on Multimodal Proprioceptive Sensory Information. IEEE Trans. Robot. 2013, 29, 563–571. [Google Scholar] [CrossRef]
  10. Ribeiro, M.I. Kalman and extended kalman filters: Concept, derivation and properties. Inst. Syst. Robot. 2004, 43, 3736–3741. [Google Scholar]
  11. Camurri, M.; Ramezani, M.; Nobili, S.; Fallon, M. Pronto: A multi-sensor state estimator for legged robots in real-world scenarios. Front. Robot. AI 2020, 7, 68. [Google Scholar] [CrossRef] [PubMed]
  12. Bloesch, M.; Hutter, M.; Hoepflinger, M.A.; Leutenegger, S.; Gehring, C.; Remy, C.D.; Siegwart, R. State estimation for legged robots-consistent fusion of leg kinematics and IMU. Robotics 2013, 17, 17–24. [Google Scholar]
  13. Yang, S.; Choset, H.; Manchester, Z. Online kinematic calibration for legged robots. IEEE Robot. Autom. Lett. 2022, 7, 8178–8185. [Google Scholar] [CrossRef]
  14. Yang, S.; Zhang, Z.; Bokser, B.; Manchester, Z. Multi-IMU Proprioceptive Odometry for Legged Robots. In Proceedings of the 2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Detroit, MI, USA, 1–5 October 2023; pp. 774–779. [Google Scholar]
  15. Barrau, A.; Bonnabel, S. The invariant extended Kalman filter as a stable observer. IEEE Trans. Autom. Control 2016, 62, 1797–1812. [Google Scholar] [CrossRef]
  16. Hartley, R.; Jadidi, M.G.; Grizzle, J.W.; Eustice, R.M. Contact-aided invariant extended Kalman filtering for legged robot state estimation. arXiv 2018, arXiv:1805.10410. [Google Scholar]
  17. Xavier, F.E.; Burger, G.; Pétriaux, M.; Deschaud, J.E.; Goulette, F. Multi-IMU Proprioceptive State Estimator for Humanoid Robots. In Proceedings of the 2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Detroit, MI, USA, 1–5 October 2023; pp. 10880–10887. [Google Scholar]
  18. Lin, T.Y.; Zhang, R.; Yu, J.; Ghaffari, M. Legged Robot State Estimation using Invariant Kalman Filtering and Learned Contact Events. In Proceedings of the Conference on Robot Learning, PMLR, Auckland, New Zealand, 14–18 December 2022; pp. 1057–1066. [Google Scholar]
  19. Buchanan, R.; Camurri, M.; Dellaert, F.; Fallon, M. Learning inertial odometry for dynamic legged robot state estimation. In Proceedings of the Conference on Robot Learning, PMLR, Auckland, New Zealand, 14–18 December 2022; pp. 1575–1584. [Google Scholar]
  20. Youm, D.; Oh, H.; Choi, S.; Kim, H.; Hwangbo, J. Legged Robot State Estimation With Invariant Extended Kalman Filter Using Neural Measurement Network. arXiv 2024, arXiv:2402.00366. [Google Scholar]
  21. Liu, Y.; Bao, Y.; Cheng, P.; Shen, D.; Chen, G.; Xu, H. Enhanced robot state estimation using physics-informed neural networks and multimodal proprioceptive data. In Proceedings of the Sensors and Systems for Space Applications XVII, SPIE, National Harbor, MD, USA, 21–25 April 2024; Volume 13062, pp. 144–160. [Google Scholar]
  22. Yang, S.; Zhang, Z.; Fu, Z.; Manchester, Z. Cerberus: Low-drift visual-inertial-leg odometry for agile locomotion. In Proceedings of the 2023 IEEE International Conference on Robotics and Automation (ICRA), London, UK, 29 May–2 June 2023; pp. 4193–4199. [Google Scholar]
  23. Teng, S.; Mueller, M.W.; Sreenath, K. Legged robot state estimation in slippery environments using invariant extended kalman filter with velocity update. In Proceedings of the 2021 IEEE International Conference on Robotics and Automation (ICRA), Xi’an, China, 30 May–5 June 2021; pp. 3104–3110. [Google Scholar]
  24. Lim, B.; Zohren, S. Time-series forecasting with deep learning: A survey. Philos. Trans. R. Soc. A 2021, 379, 20200209. [Google Scholar] [CrossRef] [PubMed]
  25. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar]
  26. Fan, Z.; Cheng, P.; Chen, H.; Bao, Y.; Pham, K.; Blasch, E.; Xu, H.; Chen, G. STAN: Spatial-Temporal Attention Based Inertial Navigation Transformer. In Proceedings of the 37th International Technical Meeting of the Satellite Division of The Institute of Navigation (ION GNSS+ 2024), Baltimore, MD, USA, 16–20 September 2024; pp. 2825–2836. [Google Scholar]
  27. Liu, W.; Caruso, D.; Ilg, E.; Dong, J.; Mourikis, A.I.; Daniilidis, K.; Kumar, V.; Engel, J. TLIO: Tight Learned Inertial Odometry. IEEE Robot. Autom. Lett. 2020, 5, 5653–5660. [Google Scholar] [CrossRef]
  28. Chen, C.; Pan, X. Deep learning for inertial positioning: A survey. IEEE Trans. Intell. Transp. Syst. 2024, 25, 10506–10523. [Google Scholar] [CrossRef]
  29. Hong, S.; Xu, Y.; Khare, A.; Priambada, S.; Maher, K.; Aljiffry, A.; Sun, J.; Tumanov, A. HOLMES: Health OnLine Model Ensemble Serving for Deep Learning Models in Intensive Care Units. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, Virtual, 6–10 July 2020; pp. 1614–1624. [Google Scholar]
Figure 1. The schematic of our proposed LMIPSE.
Figure 2. Comparison of yaw predictions.
Figure 3. Trajectory estimation using MI-CCE.
Figure 4. Comparison of state estimation results.
Table 1. Architecture of the ResNet1D model.
Layer (Type)             Output Shape         Param #
Conv1d-1                 [32, 256, 8691]      75,776
MyConv1dPadSame-2        [32, 256, 8691]      0
BatchNorm1d-3            [32, 256, 8691]      512
ReLU-4                   [32, 256, 8691]      0
Conv1d-5                 [32, 256, 8691]      327,936
MyConv1dPadSame-6        [32, 256, 8691]      0
BatchNorm1d-7            [32, 256, 8691]      512
ReLU-8                   [32, 256, 8691]      0
Dropout-9                [32, 256, 8691]      0
Conv1d-10                [32, 256, 8691]      327,936
MyConv1dPadSame-11       [32, 256, 8691]      0
BasicBlock-12            [32, 256, 8691]      0
...                      ...                  ...
Conv1d-106               [32, 1024, 544]      5,243,904
MyConv1dPadSame-107      [32, 1024, 544]      0
BasicBlock-108           [32, 1024, 544]      0
BatchNorm1d-109          [32, 1024, 544]      2048
ReLU-110                 [32, 1024, 544]      0
Linear-111               [32, 1]              1025
Table 2. Testing accuracy of CNN models for different training set sizes.
Method     Size 1k    Size 2k    Size 3k    Size 4k
MI-CAE     0.0826     0.0526     0.0464     0.0361
MI-CCE     0.0445     0.0332     0.0239     0.0236
Table 3. Summary of results for various state estimation methods under low IMU noise.
Method     Yaw       Filter     Median Drift %    RMSE    Max RSE
SIPO       Mocap     EKF        31.52             0.57    1.00
SIPO       EKF       EKF        45.19             0.95    2.12
MIPO       Mocap     EKF        14.89             0.29    0.79
MIPO       DK        EKF        15.12             0.28    0.81
MIPO       EKF       EKF        17.83             0.62    2.11
MIPSE      RI-EKF    RI-EKF     16.59             0.41    1.29
LMIPSE     CAE       EKF        15.23             0.30    0.87
LMIPSE     CCE       EKF        14.84             0.29    0.80
Table 4. Summary of results for various state estimation methods under high IMU noise.
Method     Yaw    Filter    Median Drift %    RMSE    Max RSE
SIPO       EKF    EKF       45.90             0.96    2.13
MIPO       EKF    EKF       18.30             0.66    2.22
MIPO       DK     EKF       15.37             0.31    0.98
LMIPSE     CCE    EKF       14.87             0.29    0.79
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
