Simultaneous Floating-Base Estimation of Human Kinematics and Joint Torques

The paper presents a stochastic methodology for the simultaneous floating-base estimation of the human whole-body kinematics and dynamics (i.e., joint torques, internal and external forces). The paper builds upon our former work where a fixed-base formulation had been developed for the human estimation problem. The presented approach is validated by presenting experimental results of a health subject equipped with a wearable motion tracking system and a pair of shoes sensorized with force/torque sensors while performing different motion tasks, e.g., walking on a treadmill. The results show that joint torque estimates obtained by using floating-base and fixed-base approaches match satisfactorily, thus validating the present approach.


Introduction
In physical human-robot interaction (pHRI) domains, a huge variety of applications requires robots to actively collaborate with humans. More and more frequently, robots are required to be endowed with the capability to control physical collaboration through intentional interaction with humans. The simultaneous whole-body estimation of the human kinematics (i.e., motion) and dynamics (i.e., joint torques and internal forces) is a crucial component for modeling, estimating and controlling the interaction. In assistive and rehabilitation scenarios, for instance, the demand for physical robotic assistance to humans is an ever-growing practice and the estimation is a pivotal component for creating technologies capable to help and assist humans. The importance of controlling the pHRI calls for the design of a framework in which the concurrent estimation of the human kinematics and dynamics could be exploited by the robots [1].
In general, several routes for pHRI have been explored over the years. Minimum-jerk-based methods [2,3], imitation learning [4] and retargeting techniques [5,6] are only some of the relevant examples. None of them, however, deal with the simultaneous estimation of the human kinematics and dynamics. Although real-time solutions for whole-body motion tracking are widely marketed (e.g., marker-based motion capture such as Vicon or marker-less systems such as Microsoft Kinect and Xsens wearable suit system), the dynamics estimation is still a challenging problem, especially in those scenarios for which an online estimation is a crucial requirement (e.g., health monitoring or manufacturing ergonomy).
This paper builds upon our former work [7], where a probabilistic algorithm to estimate the human whole-body fixed-base dynamics is described. We propose here a stochastic methodology for the simultaneous floating-base estimation of the human kinematics and dynamics. The estimation is computed by means of a sensor-fusion-based tool able to provide an estimation of the whole-body kinematics and dynamics of the human (torques, internal forces exchanged across joints, and external forces acting on links) by leveraging the reliability of the available measurements. The core of the algorithm has been here adapted to a new version to address the need of the floating-base estimation, in which the position and the velocity of the human base are not assumed to be known a priori. The floating-base formalism is not a novelty in the robotics research and it has been used for multiple topics for years. Inverse dynamics control for humanoids and legged robots [8], modeling and control of humanoids in dynamic environments [9], identification of humanoids inertial parameters [10,11], are only a few examples of applications. In [12,13], the formalism has been even used in the context of human-robot experiments. The human dynamics, however, has been not computed under the floating-base formalism but human motion capture data have been used to generate human-like motions to be retargeted onto floating-base humanoids.
Our experimental validation setup considered a sensorized human subject walking on a treadmill. In general, the gait analysis requires: (i) classifying the human walking state, i.e., the recursive switching pattern from right leg single support → double support → left leg single support; and (ii) defining an algorithm able to detect the pattern classification. In a recent study on the human joint muscular torques estimation during gait [14], two dynamical models have been considered separately for the legs to overcome the problem of the switching contact detection and to avoid the increasing complexity of the control algorithm for pattern classification. In this work, we decided to perform a different choice: we considered the dynamics of the human as a whole (with the pelvis as a floating base) and we developed an algorithm to detect the feet contact via additional sensors readings (force/torque sensors). In general, the algorithm constitutes a powerful and versatile tool to arbitrarily analyze all those tasks for which a switching contact condition is required without changing the base inside the algorithm.
The paper is structured as follows. Section 2 defines the mathematical notation and describes the human kinematics and dynamics modeling. Section 3 describes the steps useful to compute the kinematics and joint torques whole-body estimation for the floating-base formalism. In Section 4, the experimental setup and data analysis are described. Conclusions, limitations and several considerations on future developments are discussed in Section 5.

•
Let R and N be the set of real and natural numbers, respectively. • Let x ∈ R n denote a n-dimensional column vector, while x denotes a scalar quantity. We advise the reader to pay attention to the notation style: we define vectors, matrices with bold small and capital letters, respectively, and scalars with non-bold style. • Let |x| be the norm of the vector x.

•
Let 0 m and 1 m be the zero and identity matrices ∈ R m×m , respectively. The notation 0 m×n represents the zero matrix ∈ R m×n . • Let I be an inertial frame with z-axis pointing against the gravity (g denotes the norm of the gravitational acceleration). Let B denote the base frame, i.e., a frame attached to the base link. Let L be the generic frame attached to a link, and J the frame of a joint.  • Given a stochastic variable x, let p(x) denote its probability density and p(x|y) the conditional probability of x given the assumption that another stochastic variable y has occurred.
and Σ x = E[xx ] be the mean and covariance of x, respectively. Let x ∼ N (µ x , Σ x ) be the expression for the normal distribution of x.

Human Kinematics and Dynamics Modeling
The human is modeled as a rigid multi-body system with n ∈ N internal Degrees of Freedom (DoFs). The system is composed of N B rigid bodies, called links (denoted with L) connected by joints (denoted with J). Links are numbered from 0 to N B − 1, being 0 the floating base. Furthermore, λ(L) and µ(L) represent the parent and child links of L, respectively. The topological order is such that links L and λ(L) are coupled by joint J. The joint motion constraint is modeled with the motion freedom subspace S ∈ R 6 . We assume that all the joints have one DoF each, which implies that the joint numbering is from 1 to n = N B − 1. No parent joint is assumed for the floating base. See the overall representation in Figure 1. We also assume that none of the links have a known a priori constant pose w.r.t. I. Thus, we say that the system is floating base. More precisely, the system configuration space is a Lie group Q = R 3 × SO(3) × R n such that q = (q b , s) ∈ Q, being q b = ( I o B , I R B ) ∈ R 3 × SO(3) the pose of the base frame B w.r.t. I and s ∈ R n the joint positions vector capturing the topology of the system. The velocity of the system is represented by ν = ( I v B ,ṡ) ∈ R 6+n being I v B = ( Iȯ B , I ω B ) ∈ R 6 the velocity of B w.r.t. I (the angular velocity of the base I ω B is such that IṘ B = S( I ω B ) I R B ).ṡ ∈ R n is the joint velocities vector. If the system is interacting with the external environment by exchanging n c wrenches, the dynamics of the floating-base system can be described by adopting the Euler-Poincaré formalism ( [15], Ch. 13.5): where M ∈ R (n+6)×(n+6) is the mass matrix, h ∈ R n+6 is a vector accounting for the Coriolis effects and gravity terms, B := (0 n×6 , 1 n ) is a selector matrix, τ ∈ R n represents the joint torques, and f x k = ( f x k , m x k ) ∈ R 6 is a vector representing the external wrench acting on the system on the link that has the kth contact point, being f x k and m x k the external force and moment, respectively. The Jacobian J C k (q) ∈ R 6×(6+n) is an operator that maps the system velocity ν with the velocity v at the kth contact frame, such that with J b (q) ∈ R 6×6 and J s (q) ∈ R 6×n Jacobians being related to the base and joint configuration, respectively.

Case-Study Human Model
Our case-study human body owns N B = 67 links and n = 66 internal DoFs. The links were modeled with simple geometric shapes (parallelepiped, cylinder, and sphere) whose dimensions were estimated via inertial measurement units (IMUs) readings (i.e, Xsens motion capture system provides the position of several anatomical bony landmarks w.r.t. the origin of each link). The dynamic properties of each link (i.e., inertias and center of mass) were computed via the anthropometric data available in the literature by: (i) exploiting the relation between the total body mass and the mass of each link [16,17]; and (ii) assuming geometric approximations and homogeneous density for the links [18,19].

Simultaneous Floating-Base Estimation of Human Whole-Body Kinematics and Dynamics
In this section, we describe step-by-step the simultaneous floating-base estimation algorithm for the human whole-body kinematics and dynamics.

Offline Estimation of Sensor Position
The first objective was to develop a Universal Robot Description Format (URDF) model for the human with properties listed in Section 2.3 (see Figure 2, right). The URDF is a XML-based file format for representing the kinematics and dynamics of multi-body systems. A crucial step for the URDF generation is to identify the position of each sensor w.r.t. the attached link frame (Figure 2, left). Xsens exposes the sensor linear acceleration, the link angular velocity and acceleration, the sensor and link orientation w.r.t. the inertial frame. However, the sensor position is not provided by its framework. A procedure to estimate the sensor position by processing IMUs data was therefore adopted. The procedure is very similar to the one in [20], where it is used for humanoid robots.
More precisely, if S is the frame associated to the sensor and L is the frame of the link where the sensor is rigidly attached, then the measurement equation is such that Equation (3) can be rearranged in the following form: Given N m measurements, the position of the sensor w.r.t. the link, i.e., L o S , is the solution of the following optimization problem: being

Estimation of Human Kinematics
The objective of this section is to derive algorithms for estimating the human kinematic configuration q = (q b , s) and its derivatives.
Per each link pair [λ(L), L] coupled by the joint J , s ∈ R n was computed by solving an optimization problem using Ipopt [21]. The problem is formulated to minimize the distance between the measured, i.e., λ(L) R meas L , and the computed, i.e., λ(L) R L , relative rotations between the frames attached to the link pair, such that being the distance defined in terms of the rotation error parameterized in Euler angles, and (s J,min , s J,max ) joint limits. We refer to Equation (6) as a link-pairwise inverse kinematics (IK) problem. Joint velocities and accelerationsṡ,s ∈ R n were computed by using a weighted sum of moving windows of elements with a third-order polynomial Savitzky-Golay filtering [22]. The base pose q b was obtained via IMUs readings. The pivotal modification for the floating-base formalism deals with the computation of the velocity I v B of the floating base. It is assumed that holonomic constraints in the form of c(q) = 0 act on the system in Equation (1). In the human experimental framework, constraints occur when the system is in contact with the ground such that the feet can be considered as end-effectors with zero velocity (i.e., I v C k = 0). This yields to If RF and LF are the contact frames associated to the right and left foot, respectively, for Equation (2), we can write Equation (7) such that if only the contact in RF occurs, and if the contact occurs in LF . Nevertheless, if the system is simultaneously constrained by both feet, we need to consider the overall effect on the system,

Offline Contact Classification
We implemented an offline algorithm to detect which foot is in contact with the ground, i.e., double support state, left single support state, or right single support state (see Algorithm 1). The contact classification is determined via force/torque (FT) sensors readings and depends on a self-tuned threshold force value T f z . The threshold defines how big the area of the double support has to be considered. When a single support occurs, the algorithm is able to classify which foot is in contact with the ground by reading and comparing the FT sensors values.

Algorithm 1 Offline Feet Contact Cassification.
Require: FT sensor forces (z component) for right foot RF f z and left foot LF f z 1: procedure 2: N ← number of samples 3: Classify j as double support sample 8: else 9: if RF f z (j) > LF f z (j) then 10: Classify j as right single support sample 11: else 12: Classify j as left single support sample 13: end if 14: end if 15: end for 16: end procedure

Maximum-A-Posteriori Algorithm for Floating-Base Dynamics Estimation
The simultaneous estimation of the human kinematics and dynamics is performed by means of a Maximum-A-Posteriori (MAP) algorithm. The advantages of this algorithm are discussed in [7]. Here, the objective is to describe how the core of the algorithm was modified to fit the floating-base formalism. The main difference lies in a new representation for the acceleration. Instead of using the proper body acceleration ∈ R 6 (e.g., as in [23]), i.e., we decided to adopt the proper sensor acceleration ∈ R 6 , i.e., being X ∈ R 6×6 the adjoint transformation matrix for motion vectors. The main advantage is that the linear part of Equation (12) corresponds to Equation (3) where the frame of the accelerometer coincides with the frame L of the link (same origin and orientation). In general, several modifications were required for the floating-base formalism as follows.
• We used the new acceleration representation by exploiting the relation between Equation (11) and Equation (12), i.e., • Since we broke the univocal relation between each link and its parent joint, we redefine the serialization of all the kinematics and dynamics quantities in the vector d w.r.t. the fixed-base serialization of the same vector in Section 4 of [7], thus being In the new serialization, α g ∈ R 6 is the proper sensor acceleration of Equation (12) and f x ∈ R 6 is the external wrench acting on each link. Similarly for the joint quantities, f ∈ R 6 is the internal wrench (or joint wrench) exchanged from λ(L) to L through the joint J, whiles is the joint acceleration.
• The variable τ was removed from d. The joint torque can be obtained as a projection of the joint wrench on the motion freedom subspace, such that τ J = S J f J , for each joint J of the model.
Within this new formalism, we rewrite the equation of the acceleration propagation and the recursive Newton-Euler equations. The Newton-Euler formalism is an equivalent representation of Equation (1) (more details about this choice can be found in Section 3.3 of [24]). For the sake of simplicity, here following, we refer to L and J as compact forms for L L and J J , respectively, thus: where X * ∈ R 6×6 and × * are the adjoint transformation matrix and the dual cross product operator ∈ R 3 for force vectors, respectively. In general, these equations seem to be much more complex than the ones obtained by using the proper body acceleration in [7]. They have, however, the convenient property to be agnostic to the linear velocity of each link. This property drastically simplifies the generalization to the floating-base case, in which the linear velocity of the floating base is, in general, not available.
As already described in [24], the estimation problem can be compactly arranged in a matrix form, as follows: More in detail: • The first set of equations Y(s)d + b Y (s,ṡ) = y accounts for the sensor measurements. The number of equations depends on how many sensors are conveyed into the vector y ∈ R y and it does not depend on the number of links in the model (more than one sensor could be associated to the same link, e.g., the combination of an IMU + a FT sensor). In general, the sensor matrices are not changed within the new floating-base formalism. The only difference is that the accelerometer has a different relation with the acceleration of the body. In particular, if the frame L of a link and the frame associated to the IMU located on the same link are rigidly connected, then Similarly, for the FT sensor frames rigidly connected to the feet frames, the measurement equation is • The second set of equations D(s)d + b D (s,ṡ) = 0 represents the compact matrix form for Equations (16) and (17) given the new serialization of d in Equation (14). The matrix D ∈ R 12N B ×d is a matrix with 12N B rows and d columns, i.e., the number of rows of d in Equation (14). The matrix blocks in D for the acceleration of Equation (16) are recursively the following: The blocks in D for Newton-Euler equations related to Equation (17) are instead: All the other blocks in D are equal to 0. Unlike D, the term b D ∈ R 12N B is affected by the new representation of the acceleration w.r.t. the one in [7]. Each subterm b D L ∈ R 12 is such that The solution of the system in Equation (18) is computed in a Gaussian domain via MAP estimator. Within this framework, d and y are stochastic variables with Gaussian distributions and the problem is solved by maximizing the conditional probability of d given the measurements y, i.e., Equation (24) corresponds to the mean of the conditional probability µ d|y , such that More details on the MAP solution are provided in Appendix A.

Experimental Setup
The objective of the experiment was to test the goodness of the estimation algorithm. An experimental session was carried out at Istituto Italiano di Tecnologia (IIT), Genoa, Italy, with a healthy male subject. The participant was equipped with an Xsens wearable motion tracking system with 17 IMUs to capture the whole-body kinematics. A pair of sensorized shoes developed at IIT was used to detect the ground reaction forces. Each shoe was equipped with two six-axis FT sensors able to measure 6D wrenches (3 forces and 3 moments), as shown in Figure 3. The subject was asked to perform a set of different tasks, as listed in Table 1.  Data were recorded at 50 Hz via a YARP-based [25] framework for wearable sensors that allows synchronously collecting data coming from multiple sources (see open-source code available on Github repository https://github.com/robotology-playground/wearables). Data processing was analyzed on MathWorks MATLAB R . The MAP computation code (open-source at https://github. com/claudia-lat/MAPest) relies on the C++ based iDynTree multi-body dynamics library designed for free floating robots [26]. iDynTree is released as open source code available on Github: https:// github.com/robotology/idyntree. Furthermore, it is worth remarking that an important modification on the IK computation was here introduced w.r.t. the one in [7]. We removed the OpenSim IK toolbox dependency and we computed the whole-body joint angles with an Ipopt-based IK (see Section 3.2).
Data coming from the shoes FT sensors were analyzed to detect the feet contact, as described in Algorithm 1 in Section 3.3. The overall estimation considered thus Equation (10b) for T1, Equation (8b) for Sequence 2 of T2, and Equation (9b) for Sequence 2 of T3. In Sequence T4 on Figure 4, for instance, the algorithm detected the switching contact condition of the feet in Figure 5 and applied the proper base velocity computation. Figure 6 shows the feet contact detection for the tasks in Table 1. Eq.10b Eq.9b

Comparison between Measurement and Estimation
The primary objective of the floating-base MAP algorithm is to estimate simultaneously kinematic and dynamic quantities related to the links and the joints of the human model. The vector d in Equation (14) contains variables measurable via sensors (α g lin , f x and m x ) and variables that cannot be measured in humans (f and τ) but only estimated with the algorithm. The MAP algorithm represents the probabilistic way to estimate those quantities for which a quantitative measure does not exist. The objective is to compare the same variables (measured and estimated) to prove the goodness of the proposed algorithm. Figure 7 shows the comparison for the base linear proper sensor acceleration α g lin [ m/s 2 ] between the measurement (mean and standard deviation, in red) and the estimation via algorithm (mean, in blue), for tasks T1, T2, T3 and T4, respectively. The same comparison for the external force f x [ N] and external moment m x [ Nm] is shown in Figure 8a,b for the left foot and right foot, respectively. The validation was performed along with a Root Mean Square Error (RMSE) investigation for linear accelerations and external wrenches ( Table 2). Error range values are shown in Table 3.
It is worth remarking on the importance of the choice of the covariance matrix associated to the sensors. It can be manually tuned as a parameter of the measurement trust. In this experimental analysis, covariances were chosen in a range from 10 −6 to 10 −4 : the higher the level of sensor trust (i.e., low covariance), the lower the RMSE associated to the sensor variable.

Human Joint Torques Estimation during Gait
The floating-base MAP algorithm provides the whole-body joint torque estimation. The estimated torque does not have a measured quantity to be compared. We can trust only the estimation as a consequence of the validation analysis in Section 4.2. The algorithm becomes particularly useful when dealing with the human gait analysis. Figure 10 shows the joint torque estimations along with the joint angles for the ( Figure 10a) left leg and ( Figure 10b) right leg, respectively. We decided, here, to show only the most representative results for the walking task.

Comparison between Fixed-Base and Floating-Base Algorithms
We performed a comparison between the floating-base estimation w.r.t. the fixed-base estimation done in [24] by computing the norm of the error between the two formalisms, for tasks T1, T2 and T3. In general, the error norm was computed as the norm of the difference between each fixed-base and the floating-base estimated variable, i.e, ε(variable) = (variable estimatedFixed − variable estimatedFloating ) . Figure 11 shows the norm of the error for the base proper body linear acceleration ε(a g lin ) [ m/s 2 ] and angular acceleration ε(a g ang ) [ rad/s 2 ] of the base (i.e, the Pelvis) between the estimations with the formalisms. The proper body acceleration for the floating-base MAP estimation was obtained via Equation (13). The same comparison was performed for the overall set of external force ε( f x ) [ N] and moment ε(m x ) [ Nm] errors (Figure 12), and for the entire set of joint acceleration ε(s) [ rad/s 2 ] and torque ε(τ) [ Nm] (Figure 13) errors. Table 4 shows the mean and standard deviation of the error norms.
In addition to the analytical modifications for the floating-base formalism described in Section 3.4, there is another important advantage in using the floating-base MAP. Unlike the fixed-base estimation where it is fundamental to change the base among the tasks (left foot for tasks T1 and T3, and right foot for T2), this limitation does not exist for the floating-base algorithm, e.g., the pelvis remains the base for all the tasks. Furthermore, the floating-base formalism allows us to make up for the lack of the external force estimation that exists for the fixed-base algorithm on the link appointed as the model base.   Table 4. Mean and standard deviation of the error norm for: (i) the base proper body linear acceleration related to Figure 11; (ii) the external wrench related to Figure 12; and (iii) the joint acceleration and torque related to Figure 13, for tasks T1, T2 and T3.

A Word of Caution on the Covariances Choice
The MAP algorithm estimation depends on the covariance values chosen by the end-user. In Equations (25a) and (25b), it is visible the role of: (i) the measurements covariance Σ y ; and (ii) a covariance Σ D that, in turn, takes into account the model reliability (via covariance Σ D ) and the prior on the estimation (via covariance Σ d ) (see details in Appendix A, Equations (A5a) and (A5b)). In general, the procedural approach consists in assigning: • low values for the covariance Σ y if trusting in the sensor measurements; • low values for the model covariance Σ D for trusting the dynamic model; and • high values for the covariance Σ d , which means that the end-user does not know any a priori information on the estimation.
The combined contribution of this set of covariances affects the final estimation. The estimation vector d contains variables that are also measured (e.g., linear acceleration, external wrench, and joint acceleration) for which the covariance of the measurement Σ y plays a predominant role (a minor role is due to the covariance Σ d of the prior). A problem arises, however, when considering the angular acceleration α g ang . This variable is part of the d vector as estimation, but it is not measured. At the current stage, a way to play with the angular acceleration trust is, therefore, to tune Σ d . A forthcoming investigation will deal with the possibility of integrating the angular acceleration as part in the vector y of the measurements.

Conclusions
In this paper, we present a stochastic methodology for the simultaneous floating-base estimation of the whole-body human kinematics and dynamics (joint torques, internal forces and external forces). The novelty consists in the possibility to perform the estimation in a floating-base framework. The floating base can be arbitrarily chosen among the model links and the algorithm requires to estimate the pose and 6D velocity of the base w.r.t. the inertial frame.
The algorithm was validated by carrying out a four-task experimental session with a healthy subject equipped with a wearable motion tracking system-to capture the whole-body kinematics-and a pair of force/torque sensorized shoes. We performed the tasks in Table 1 by considering the human pelvis as the system (floating) base. This choice came particularly in handy with the walking task on the treadmill. In general, the algorithm allows analyzing all those tasks for which a switching contact condition is implicitly required (e.g., the human gait) without manually tuning the model base into the algorithm.
Current limitations of the methodology concern: (i) the human URDF generation and the estimation of sensors position w.r.t. the attached link (see Section 3.1); and (ii) the feet contact classification (see Section 3.3), as they are carried out in an offline post-processing step. The impeding objective is to develop a new online procedure to automatize the human model generation from real-time acquisitions together with a real-time algorithm to classify the feet contact. These two features will strongly improve the already existing tool for the online estimation of the human joint torques (open-source code available on Github repository https://github.com/robotology/humandynamics-estimation).

Conflicts of Interest:
The content of this publication is the sole responsibility of the authors. The European Commission or its services cannot be held responsible for any use that may be made of the information it contains.

Appendix A
Appendix A provides the reader with a description of the Maximum-A-Posteriori (MAP) estimator as a tool for the whole-body kinematics and dynamics human estimation. More details can be found in [24].
The problem in Equation (18)  The solution consists in the computation of p(y|d), p(d) and then of p(d|y).

•
The conditional probability p(y|d) is: which implicitly makes the assumption that the set of measurements equations Y(s)d + b Y (s,ṡ) = y is affected by a Gaussian noise with zero mean and covariance Σ y|d .
• Let d ∼ N (µ D , Σ D ) the normal distribution of d. The probability density p(d) is where the covariance and the mean are, respectively, In particular, covariances Σ D and Σ d account for the reliability of the model constraints and on the estimation prior, respectively, in the equation D(s)d + b D (s,ṡ) = 0 .
• To compute p(d|y), it suffices to combine Equations (A3) and (A4), with covariance matrix and mean as follows: In the Gaussian domain, the MAP solution coincides with the mean in Equation (A7b) yielding to: