# Invariant Observer-Based State Estimation for Micro-Aerial Vehicles in GPS-Denied Indoor Environments Using an RGB-D Camera and MEMS Inertial Sensors


## Abstract


## 1. Introduction

## 2. Related Work

## 3. Overview of the RGB-D Visual/Inertial Navigation System Framework

**Figure 1.** Block diagram of the RGB-D visual/inertial navigation scheme. MEMS, micro-electro-mechanical system; IMU, inertial measurement unit; MAV, micro-aerial vehicle.

## 4. Robust RGB-D Visual Odometry

#### 4.1. Robust Feature Detection and Matching

**Figure 2.** Process flow of the robust RGB-D visual odometry. ORB, oriented FAST and rotated BRIEF algorithm; OFC-ORB, optical flow-constrained ORB; RANSAC, random sample consensus algorithm.

Let $I_m$ and $I_{m+1}$ be two consecutive grayscale images captured at times $t_m$ and $t_{m+1} = t_m + \Delta t$ by the RGB-D camera. The functions $I_m(\mathbf{u})$ and $I_{m+1}(\mathbf{u})$ provide the respective grayscale intensity values of the pixel $\mathbf{u}$ at location $\mathbf{u} = [x, y]^T$ in the two images. Given a point $\mathbf{p} = [x_p, y_p]^T$ in image $I_m$, the objective of optical-flow-based tracking is to find the corresponding point $\mathbf{q} = [x_q, y_q]^T$ in $I_{m+1}$, i.e., the optical-flow disparity $\mathbf{d} = [d_x, d_y]^T$ that minimizes the following error function:

$$\epsilon (\mathbf{d})=\sum_{x={x}_{p}-{w}_{x}}^{{x}_{p}+{w}_{x}}\ \sum_{y={y}_{p}-{w}_{y}}^{{y}_{p}+{w}_{y}}{\left({I}_{m}(x,y)-{I}_{m+1}(x+{d}_{x},y+{d}_{y})\right)}^{2}\tag{1}$$

where $w_x$ and $w_y$ denote the half-sizes of the pixel window around $\mathbf{p}$ and $\mathbf{q}$. The Lucas–Kanade (LK) optical-flow approach employs a first-order linear approximation of the error function given by Equation (1), and solves the above problem using a Newton–Raphson iteration. Let $\mathbf{d}_k$ be the optical-flow calculated at the $k$th iteration step, with $\mathbf{d}_0 = \mathbf{0}$. The $(k+1)$th iteration calculates the increment $\boldsymbol{\delta}_k$ that minimizes the linearized error function, and the optical-flow at the $(k+1)$th iteration step is then obtained by:

$$\mathbf{d}_{k+1}=\mathbf{d}_{k}+\boldsymbol{\delta}_{k}$$

The iteration stops once $\Vert \boldsymbol{\delta}_{k}\Vert < \Delta\delta$, where $\Delta\delta$ is a pre-defined threshold; ideally, the estimates $\mathbf{d}_k$ converge to the optimal $\mathbf{d}$.
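
The single-window LK iteration described above can be sketched as follows; this is an illustrative implementation, not the paper's code, and the bilinear sampler, window half-size and stopping threshold are assumed choices:

```python
import numpy as np

def bilinear(img, x, y):
    """Sample img at the subpixel location (x, y) with bilinear interpolation."""
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    ax, ay = x - x0, y - y0
    return ((1 - ax) * (1 - ay) * img[y0, x0] + ax * (1 - ay) * img[y0, x0 + 1] +
            (1 - ax) * ay * img[y0 + 1, x0] + ax * ay * img[y0 + 1, x0 + 1])

def lk_track(I0, I1, p, win=7, max_iter=20, tol=1e-3):
    """Iteratively estimate the flow disparity d of the window around p from I0 to I1."""
    xs = np.arange(p[0] - win, p[0] + win + 1)
    ys = np.arange(p[1] - win, p[1] + win + 1)
    # Template window and its spatial gradients (central differences on I0;
    # near convergence these approximate the gradients of the warped I1).
    T = np.array([[bilinear(I0, x, y) for x in xs] for y in ys])
    Ix = np.array([[(bilinear(I0, x + 1, y) - bilinear(I0, x - 1, y)) / 2 for x in xs] for y in ys])
    Iy = np.array([[(bilinear(I0, x, y + 1) - bilinear(I0, x, y - 1)) / 2 for x in xs] for y in ys])
    G = np.array([[np.sum(Ix * Ix), np.sum(Ix * Iy)],
                  [np.sum(Ix * Iy), np.sum(Iy * Iy)]])   # 2x2 gradient matrix
    d = np.zeros(2)                                      # d_0 = 0
    for _ in range(max_iter):
        J = np.array([[bilinear(I1, x + d[0], y + d[1]) for x in xs] for y in ys])
        err = T - J
        delta = np.linalg.solve(G, [np.sum(err * Ix), np.sum(err * Iy)])  # delta_k
        d = d + delta                                    # d_{k+1} = d_k + delta_k
        if np.linalg.norm(delta) < tol:                  # stop: ||delta_k|| < threshold
            break
    return d
```

Running it on two analytically shifted smooth images recovers the sub-pixel shift to within a few hundredths of a pixel.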

Given an image $I$ of size $n_x \times n_y$, taken as the zeroth level of the pyramid (i.e., $I^0 = I$), the pyramid representation of $I$ can be established as follows. Define $l = 0, 1, 2, \dots, l_n$ as the levels of the pyramid, and let $I^{l-1}$ be the image of the $(l-1)$th level, of size ${n}_{x}^{l-1} \times {n}_{y}^{l-1}$; the $l$th level image $I^l$ is then obtained by low-pass filtering $I^{l-1}$ and subsampling it by a factor of two in each dimension, so that ${n}_{x}^{l}={n}_{x}^{l-1}/2$ and ${n}_{y}^{l}={n}_{y}^{l-1}/2$.

The optical-flow computation starts at the top level ($l_n$) of the pyramid, and the estimate obtained at each level is used to initialize the computation at the next finer level, yielding a progressively refined estimate. This procedure traverses the image pyramid until the original level ($l_0$) is reached.
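
The pyramid construction can be sketched with a simple 2×2 block average standing in for the low-pass filter and subsampling step (the actual filter kernel is an assumption here):

```python
import numpy as np

def build_pyramid(img, levels):
    """Build an image pyramid I^0 ... I^levels; each level halves both
    dimensions by averaging disjoint 2x2 blocks of the previous level."""
    pyr = [img.astype(float)]
    for _ in range(levels):
        prev = pyr[-1]
        h, w = prev.shape[0] // 2 * 2, prev.shape[1] // 2 * 2  # crop to even size
        down = 0.25 * (prev[0:h:2, 0:w:2] + prev[1:h:2, 0:w:2] +
                       prev[0:h:2, 1:w:2] + prev[1:h:2, 1:w:2])
        pyr.append(down)
    return pyr
```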

Denote the representation of $\mathbf{p}$ at pyramid level $l_k$ as $\mathbf{p}^k = [{x}_{p}^{k}, {y}_{p}^{k}]^T$. Following Equations (4) and (5), the relationship between $\mathbf{p}^k$ and $\mathbf{p}$ is given by:

$$\mathbf{p}^{k}=\mathbf{p}/{2}^{k}$$

Assume that $\mathbf{g}^k = [{\mathbf{g}}_{x}^{k}, {\mathbf{g}}_{y}^{k}]^T$ is the optical-flow estimate propagated from level $l_n$ down to level $l_{k+1}$, with the initial estimate at the top level $l_n$ set to zero (i.e., $\mathbf{g}^{k} = [0\ 0]^T$ for $k = n$). The level-$l_k$ images can be center-compensated using the initial estimate $\mathbf{g}^k$, and the residual optical-flow $\mathbf{d}^k$ between the compensated images ${I}_{m-1,c}^{\mathrm{k}}$ and ${I}_{m,c}^{\mathrm{k}}$ can then be estimated following the aforementioned LK-optical-flow method, which minimizes the window-based error function of Equation (1) applied to the compensated level-$l_k$ images. The estimate $\mathbf{g}^k$ is then corrected using $\mathbf{d}^k$ and propagated as the initial estimate of the computation at level $l_{k-1}$:

$$\mathbf{g}^{k-1}=2\left(\mathbf{g}^{k}+\mathbf{d}^{k}\right)$$

The above procedure continues through the pyramid down to the original level $l_0$, and the final estimate of the optical-flow $\mathbf{d}$ is given by:

$$\mathbf{d}=\mathbf{g}^{0}+\mathbf{d}^{0}$$

**Figure 3.** Pyramid implementation of the optical-flow tracking operation. LK optical-flow: Lucas–Kanade optical-flow algorithm.

**Algorithm 1.** OFC-ORB feature detection and matching.

Input | Two consecutive images $\left\{{I}^{m-1},{D}^{m-1}\right\},\left\{{I}^{m},{D}^{m}\right\}$ captured by the RGB-D camera |
---|---|
Output | Feature correspondence set S of ${I}^{m-1},{I}^{m}$ |
1 | Extract features ${P}_{{n}_{1}}^{m}=\left\{{p}_{i}^{m},i=\mathrm{1...}{n}_{1}\right\}$, ${Q}_{{n}_{1}^{\prime}}^{m-1}=\left\{{q}_{j}^{m-1},j=\mathrm{1...}{n}_{1}^{\prime}\right\}$ from ${I}^{m},{I}^{m-1}$ respectively, using the ORB detector; compute the descriptor $v({p}_{i}^{m}),v({q}_{j}^{m-1})$ for each feature of ${P}_{{n}_{1}}^{m},{Q}_{{n}_{1}^{\prime}}^{m-1}$ |
2 | Extract the depth data for each feature of ${P}_{{n}_{1}}^{m},{Q}_{{n}_{1}^{\prime}}^{m-1}$, discard features that do not have corresponding depth, sort the features of ${P}_{{n}_{1}}^{m},{Q}_{{n}_{1}^{\prime}}^{m-1}$ in ascending order of their x-coordinates, obtaining ${P}_{{n}_{2}}^{m},{Q}_{{n}_{2}^{\prime}}^{m-1}$ |
3 | Initialize the feature correspondence set $S\leftarrow \{\varnothing \}$ |
4 | for each ${p}_{i}^{m}\in {P}_{{n}_{2}}^{m}$ |
5 | Estimate the optical-flow vector ${d}_{i}^{m}$ of ${p}_{i}^{m}$ using the pyramid LK-optical-flow algorithm; calculate the corresponding point ${q}_{i}^{m-1}$ in ${I}^{m-1}$: ${q}_{i}^{m-1}={p}_{i}^{m}+{d}_{i}^{m}$ |
6 | Build the confidence sub-window ${w}_{q}^{m-1}$ of size $0.1{n}_{x}\times 0.1{n}_{y}$ centered at ${q}_{i}^{m-1}$ |
7 | Find the best match ${\widehat{q}}_{i}^{m}$ of ${p}_{i}^{m}$ in ${w}_{q}^{m-1}$, by searching for ${q}_{r}\in {w}_{q}^{m-1}$ that satisfies $r=\mathrm{arg}\mathrm{min}\Vert v({p}_{i})-v({q}_{r})\Vert $ |
8 | Build a sub-window ${w}_{p}^{m}$ centered at ${p}_{i}^{m}$ in ${I}^{m}$; find the best match ${\widehat{p}}_{i}^{m}$ of ${\widehat{q}}_{i}^{m}$ in ${w}_{p}^{m}$ |
9 | if ${\widehat{p}}_{i}^{m}={p}_{i}^{m}$ |
10 | Estimate the optical-flow vector ${\widehat{d}}_{i}^{m}$ between $\left\{{p}_{i}^{m},{\widehat{q}}_{i}^{m}\right\}$ |
11 | if $\left|{d}_{i}^{m}-{\widehat{d}}_{i}^{m}\right|<\epsilon $ |
12 | $S\leftarrow S\cup \left\{{p}_{i}^{m},{\widehat{q}}_{i}^{m}\right\}$ |
13 | end if |
14 | end if |
15 | end for |
16 | return S |
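
The flow-constrained search and mutual cross-check of Algorithm 1 (lines 5–12) can be sketched in a simplified form. The Euclidean toy descriptors and search radius below are assumptions; the actual system uses ORB binary descriptors and a $0.1{n}_{x}\times 0.1{n}_{y}$ confidence sub-window:

```python
import numpy as np

def cross_check_match(desc_p, desc_q, pred_q, pos_q, radius=5.0):
    """For each feature i of the current image, search candidates of the
    previous image that lie within `radius` of the flow-predicted position
    pred_q[i], take the best descriptor match j, and accept the pair only
    if the reverse descriptor search from j maps back to i."""
    matches = []
    for i, (d_p, q_hat) in enumerate(zip(desc_p, pred_q)):
        cand = np.where(np.linalg.norm(pos_q - q_hat, axis=1) < radius)[0]
        if cand.size == 0:
            continue  # no feature near the flow-predicted location
        j = cand[np.argmin(np.linalg.norm(desc_q[cand] - d_p, axis=1))]
        i_back = np.argmin(np.linalg.norm(desc_p - desc_q[j], axis=1))
        if i_back == i:  # mutual-consistency check
            matches.append((i, int(j)))
    return matches
```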

#### 4.2. Robust Inlier Detection and Relative Motion Estimation

$\mathbf{P}_{m-1}$, $\mathbf{Q}_m$ ($\mathbf{P}_{m-1} = \{\mathbf{p}_i\}$, $\mathbf{Q}_m = \{\mathbf{q}_i\}$, i = 1, …, n, ${p}_{i},{q}_{i}\in {\mathbb{R}}^{3}$). Given these consecutive 3D feature correspondences captured at different time steps, the MAV's relative motion from the prior to the subsequent time step can be estimated from the transformation between the two 3D point clouds. The concept of relative motion estimation is illustrated in Figure 5.

Denote the relative rotation and translation between the two point clouds as $\mathbf{R}$ and $\mathbf{t}$; the relationship of the 3D feature correspondences can then be given by:

$$\mathbf{p}_{i}=\mathbf{R}\mathbf{q}_{i}+\mathbf{t}\tag{13}$$

The optimal $\mathbf{R}$ and $\mathbf{t}$ in a least-squares sense can be obtained by minimizing the following error function:

$$e(\mathbf{R},\mathbf{t})=\sum_{i=1}^{n}{\Vert \mathbf{p}_{i}-(\mathbf{R}\mathbf{q}_{i}+\mathbf{t})\Vert}^{2}\tag{14}$$

Let $\mathbf{P}'_{m-1} = \{\mathbf{p}'_i\}$ be the 3D point cloud obtained by applying the optimal transformation $\widehat{R},\widehat{t}$ to $\mathbf{Q}_m$. At the least-squares optimum, the centroid of $\mathbf{P}'_{m-1}$ must coincide with that of the actual point cloud $\mathbf{P}_{m-1}$:

$$\widehat{t}=\overline{p}-\widehat{R}\,\overline{q}\tag{15}$$

where $\overline{p}$ denotes the centroid of $\mathbf{P}_{m-1}$ and $\overline{q}$ the centroid of $\mathbf{Q}_m$. Following Equations (13) and (14), we define the relative distances-to-centroid:

$$\Delta \mathbf{p}_{i}=\mathbf{p}_{i}-\overline{p},\quad \Delta \mathbf{q}_{i}=\mathbf{q}_{i}-\overline{q}\tag{16}$$

so that the error function reduces to:

$$e(\mathbf{R})=\sum_{i=1}^{n}{\Vert \Delta \mathbf{p}_{i}-\mathbf{R}\,\Delta \mathbf{q}_{i}\Vert}^{2}\tag{17}$$

which depends only on $\mathbf{R}$. Therefore, the above simplified least-squares problem can be solved through two primary steps: (1) find the optimal $\widehat{R}$ that minimizes the error function in Equation (17); (2) calculate $\widehat{t}$ according to Equation (15).

The relative distances-to-centroid ($\Delta\mathbf{p}_i$, $\Delta\mathbf{q}_i$) are then calculated according to Equation (16). Given $\Delta\mathbf{p}_i$, $\Delta\mathbf{q}_i$, the algorithm computes the following matrix:

$$\mathbf{H}=\sum_{i=1}^{n}\Delta {\mathbf{q}}_{i}\,\Delta {\mathbf{p}}_{i}^{T}$$

The optimal rotation $\widehat{R}$ is then obtained from the singular value decomposition (SVD) of $\mathbf{H}$:

$$\mathbf{H}=U\Sigma {V}^{T},\quad \widehat{R}=V{U}^{T}$$
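
A minimal sketch of this SVD-based least-squares step (the classic Arun/Kabsch solution, consistent with the H-matrix construction above; the reflection guard and the test data are illustrative):

```python
import numpy as np

def estimate_rigid(P, Q):
    """Least-squares rigid motion (R, t) with p_i ≈ R q_i + t,
    via the SVD of H = sum_i Δq_i Δp_i^T."""
    p_bar, q_bar = P.mean(axis=0), Q.mean(axis=0)
    dP, dQ = P - p_bar, Q - q_bar               # distances-to-centroid
    H = dQ.T @ dP                               # H = sum Δq_i Δp_i^T
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1.0, 1.0, np.linalg.det(Vt.T @ U.T)])  # guard against reflections
    R = Vt.T @ D @ U.T
    t = p_bar - R @ q_bar                       # optimal translation
    return R, t
```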

To identify the inliers among the feature correspondences of $\mathbf{P}_{m-1}$, $\mathbf{Q}_m$, let $\mathbf{p}_i$, $\mathbf{p}_j \in \mathbf{P}_{m-1}$ be two arbitrary 3D features from the point cloud $\mathbf{P}_{m-1}$ at time m − 1, and let $\mathbf{q}_i$, $\mathbf{q}_j \in \mathbf{Q}_m$ denote their corresponding matches in $\mathbf{Q}_m$ at time m. The consistency constraint on $\{\mathbf{p}_i, \mathbf{p}_j\}$ and $\{\mathbf{q}_i, \mathbf{q}_j\}$ is then given as:

$$\left|\,\Vert {\mathbf{p}}_{i}-{\mathbf{p}}_{j}\Vert -\Vert {\mathbf{q}}_{i}-{\mathbf{q}}_{j}\Vert \,\right|<\delta \tag{21}$$

which expresses the fact that a rigid-body motion preserves the relative distance between feature points when $\mathbf{p}_i$, $\mathbf{p}_j$, $\mathbf{q}_i$, $\mathbf{q}_j$ are expressed in the same global coordinate frame. By performing the consistency check on the 3D feature correspondences, the algorithm can prune out incorrect matches that do not satisfy the constraint in Equation (21) (i.e., matches whose deviation of relative distances exceeds the threshold δ). Using the consistency-check results, finding the largest set of inliers can be transformed into the problem of determining the maximum clique of a graph. This problem can then be solved by iteratively adding the feature match with the greatest degree (i.e., the feature match with the largest number of consistent matches) that is consistent with all the feature matches in the existing consistent set. After finding the set of inliers, the final relative motion is estimated using an improved RANSAC procedure, which operates by selecting consensus feature matches based on their similarity to a hypothesis and progressively refining the relative motion hypothesis based on the selected inliers [47]. Since the consistency check-based inlier detection procedure has already increased the ratio of inliers, the RANSAC procedure can generate a good estimate within only a very limited number of iterations, which significantly reduces the overall computation cost of the algorithm.

The refined consistency check first calculates the centroids of the two point clouds (line 1) and the relative distances-to-centroid ($\Delta\mathbf{p}_i$, $\Delta\mathbf{q}_i$) from each 3D feature point to its corresponding centroid (line 3). These distances-to-centroid are then used to check the consistency of each pair of feature matches: the algorithm computes the errors between the distances-to-centroid of the feature points and those of their corresponding matches (line 4), and sorts all the 3D feature points according to these consistency errors (line 6). The last ξ% (ξ = (n − n_1)/n) of feature match pairs, those with the largest consistency errors in the sorted set, are discarded (line 7). Using the distances-to-centroid information, this refined strategy exempts the consistency check from the conventional, time-consuming maximum-clique search procedure, while still ensuring a high ratio of inliers.
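
This pruning step can be sketched directly, with `keep_ratio` playing the role of 1 − ξ (the helper name and test data are illustrative):

```python
import numpy as np

def prune_by_centroid_distance(P, Q, keep_ratio=0.8):
    """Score each match by the deviation of its distance-to-centroid in the
    two clouds and keep the indices of the `keep_ratio` best pairs."""
    dp = np.linalg.norm(P - P.mean(axis=0), axis=1)
    dq = np.linalg.norm(Q - Q.mean(axis=0), axis=1)
    d = np.abs(dp - dq)                    # consistency error per match
    n_keep = int(keep_ratio * len(P))
    return np.argsort(d)[:n_keep]          # ascending: smallest errors first
```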

The RANSAC procedure begins by randomly selecting λ pairs of feature matches from Q and computing an initial motion hypothesis ($\mathbf{R}_0$, $\mathbf{t}_0$) (line 13), using the SVD-based least-squares approach described previously. The rest of the feature match pairs in Q are then checked for their compatibility with this initial hypothesis, by computing the transformation error using $\mathbf{R}_0$, $\mathbf{t}_0$ and comparing this error to the consensus error bound ε (line 15); all compatible feature match pairs are added to a consensus set $S_c$ (line 16). This operation serves to prune out outliers from Q. Once the consensus set contains a sufficient number of feature match pairs (line 19), the data in $S_c$ are used to compute a refined motion estimate ($\mathbf{R}_1$, $\mathbf{t}_1$) (line 20). To adjust the consensus threshold, the average transformation error of the feature matches in $S_c$ is calculated using ($\mathbf{R}_1$, $\mathbf{t}_1$) (lines 21–24), and this error is used as the updated consensus error bound ε in the next iteration cycle if it is lower than the current ε (line 26). The above procedure continues until the probability of finding a better solution becomes sufficiently low (i.e., the difference between the current average transformation error and the prior consensus error is negligibly small), and the current solution is then accepted as the final motion estimate ($\mathbf{R}$, $\mathbf{t}$) (line 28).

**Algorithm 2.** Consistency-RANSAC inlier detection and relative motion estimation.

Input | Two 3D point clouds with correspondences: ${P}_{m-1}\text{=}\left\{{p}_{i}\right\},{Q}_{m}\text{=}\left\{{q}_{i}\right\},i=\mathrm{1...}n$ |
---|---|
Output | Relative motion $(R,t)$ of ${P}_{m-1},{Q}_{m}$ |
1 | Calculate the centroids $\overline{p},\overline{q}$ of ${P}_{m-1},{Q}_{m}$ |
2 | for i = 1 to n |
3 | $\Delta {p}_{i}\leftarrow \Vert {p}_{i}-\overline{p}\Vert ,\ \Delta {q}_{i}\leftarrow \Vert {q}_{i}-\overline{q}\Vert $ |
4 | ${d}_{i}\leftarrow \left|\Delta {p}_{i}-\Delta {q}_{i}\right|$ |
5 | end for |
6 | Sort the point set $M=\left\{({p}_{i},{q}_{i}),i=\mathrm{1...}n\right\}$ in ascending order of ${d}_{i}$: ${M}_{\text{sorted}}\leftarrow M$ |
7 | Select the top ${n}_{1}$ pairs of feature matches: $Q\leftarrow \left\{({p}_{i},{q}_{i})|({p}_{i},{q}_{i})\in {M}_{\text{sorted}},i=\mathrm{1...}{n}_{1}\right\}$ |
8 | RANSAC initialization: $j\leftarrow 1$, $(R,t)\leftarrow (I,0)$, $\epsilon \leftarrow \infty $ |
9 | while $j<\text{MaxIteration}$ do |
10 | Initialize the sample set and the jth consensus set: ${S}^{j}\leftarrow \varnothing ,{S}_{c}^{j}\leftarrow \varnothing $, ${\epsilon}^{\prime}\leftarrow 0$ |
11 | Randomly select $\lambda $ pairs of feature matches from Q: ${S}^{j}\leftarrow \left\{({p}_{i},{q}_{i})|({p}_{i},{q}_{i})\in Q,i=\mathrm{1...}\lambda \right\}$ |
12 | ${S}_{c}^{j}\leftarrow {S}^{j}$ |
13 | Estimate $({R}_{0}^{j},{t}_{0}^{j})$ that minimizes the error function given in Equation (17), based on the SVD approach, using the features in ${S}^{j}$ |
14 | for each $({p}_{i},{q}_{i})\in Q$ AND $({p}_{i},{q}_{i})\notin {S}^{j}$ do |
15 | if $\Vert {p}_{i}-({R}_{0}^{j}{q}_{i}+{t}_{0}^{j})\Vert <\epsilon $ |
16 | ${S}_{c}^{j}\leftarrow {S}_{c}^{j}\cup \left\{{p}_{i},{q}_{i}\right\}$ |
17 | end if |
18 | end for |
19 | if $\text{size}({S}_{c}^{j})>\eta {n}_{1}$ |
20 | Re-estimate $({R}_{1}^{j},{t}_{1}^{j})$ that minimizes the error function, based on the SVD approach, using the features in the consensus set ${S}_{c}^{j}$ |
21 | for each $({p}_{i},{q}_{i})\in {S}_{c}^{j}$ do |
22 | ${\epsilon}^{\prime}\leftarrow {\epsilon}^{\prime}+\Vert {p}_{i}-({R}_{1}^{j}{q}_{i}+{t}_{1}^{j})\Vert $ |
23 | end for |
24 | ${\epsilon}^{\prime}\leftarrow {\epsilon}^{\prime}/\text{size}({S}_{c}^{j})$ |
25 | if ${\epsilon}^{\prime}<\epsilon $ |
26 | $\epsilon \leftarrow {\epsilon}^{\prime}$, $(R,t)\leftarrow ({R}_{1}^{j},{t}_{1}^{j})$ |
27 | end if |
28 | if $\left|\epsilon -{\epsilon}^{\prime}\right|<\sigma $ |
29 | break |
30 | end if |
31 | end if |
32 | $j\leftarrow j+1$ |
33 | end while |
34 | return $(R,t)$ |
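
The RANSAC stage of Algorithm 2 can be condensed into the following sketch; the sample size, consensus fraction and stopping tolerance are stand-ins for λ, η and σ, and the loop structure is simplified relative to the pseudocode above:

```python
import numpy as np

def rigid_svd(P, Q):
    """SVD-based least-squares rigid motion (R, t) with p_i ≈ R q_i + t."""
    pb, qb = P.mean(axis=0), Q.mean(axis=0)
    U, _, Vt = np.linalg.svd((Q - qb).T @ (P - pb))
    R = Vt.T @ np.diag([1.0, 1.0, np.linalg.det(Vt.T @ U.T)]) @ U.T
    return R, pb - R @ qb

def consistency_ransac(P, Q, n_sample=4, max_iter=50, min_consensus=0.5,
                       tol=1e-6, rng=None):
    """Sample a minimal set, fit (R0, t0), collect a consensus set under the
    error bound eps, refit (R1, t1) on the consensus, and shrink eps until
    the improvement becomes negligible."""
    if rng is None:
        rng = np.random.default_rng(0)
    eps, best = np.inf, (np.eye(3), np.zeros(3))
    for _ in range(max_iter):
        idx = rng.choice(len(P), size=n_sample, replace=False)
        R0, t0 = rigid_svd(P[idx], Q[idx])                  # hypothesis
        resid = np.linalg.norm(P - (Q @ R0.T + t0), axis=1)
        consensus = np.where(resid < eps)[0]                # compatible pairs
        if consensus.size > min_consensus * len(P):
            R1, t1 = rigid_svd(P[consensus], Q[consensus])  # refined estimate
            avg = float(np.mean(np.linalg.norm(
                P[consensus] - (Q[consensus] @ R1.T + t1), axis=1)))
            improve = eps - avg
            if avg < eps:
                eps, best = avg, (R1, t1)                   # tighten the bound
            if 0 <= improve < tol:
                break                                       # unlikely to improve further
    return best
```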

#### 4.3. Global Transformation of Relative Motions

The RGB-D visual odometry described above estimates the relative transformation ($\mathbf{R}$, $\mathbf{t}$) between consecutive 3D feature point clouds at different time steps. Note that in actual applications, the features are fixed in the environment while the MAV moves relative to them. Therefore, given the estimated feature transformation $\mathbf{T}$ of the feature points, the corresponding motion $\Delta\mathbf{T}$ of the MAV's current body frame with respect to the prior frame can be obtained by:

$$\Delta T={T}^{-1}$$

where $\Delta\mathbf{R}$ and $\Delta\mathbf{t}$ denote its rotation and translation components, respectively. We can derive a global representation of the MAV's pose with respect to an initial pose $\mathbf{T}_0$ using the following sequence of homogeneous transformations:

$${T}_{m}={T}_{0}\,\Delta {T}_{1}\,\Delta {T}_{2}\cdots \Delta {T}_{m}$$

where $\Delta\mathbf{T}_i$ denotes the relative transformation of the MAV's pose between successive time steps. This global motion estimate can then be used as a measurement of the MAV's state in the data fusion scheme.
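
The pose-chaining step can be sketched with plain 4×4 homogeneous matrices (helper names are illustrative):

```python
import numpy as np

def to_homogeneous(R, t):
    """Pack a rotation R and translation t into a 4x4 homogeneous transform."""
    T = np.eye(4)
    T[:3, :3], T[:3, 3] = R, t
    return T

def accumulate_pose(T0, relative_motions):
    """Chain the per-step body motions onto the initial pose:
    T_m = T0 · ΔT_1 · ΔT_2 · ... · ΔT_m."""
    T = np.array(T0, dtype=float)
    for dT in relative_motions:
        T = T @ dT
    return T
```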

## 5. Invariant Observer-Based State Estimation

#### 5.1. Review of Invariant Observer Theory

**Definition 1.** (Transformation group) A transformation group $({\varphi}_{g})_{g\in G}$ acting on the manifold M can be defined as a smooth map $\left(g,\mu \right)\in G\times M\mapsto {\varphi}_{g}(\mu )\in M$ such that ${\varphi}_{e}(\mu )=\mu $ and ${\varphi}_{{g}_{2}}\left({\varphi}_{{g}_{1}}(\mu )\right)={\varphi}_{{g}_{2}{g}_{1}}(\mu )$ for all ${g}_{1},{g}_{2}\in G$; hence each ${\varphi}_{g}$ is a diffeomorphism.

Consider a nonlinear system of the form $\dot{x}=f(x,u)$, $y=h(x,u)$, with ($\mathbf{x}$, $\mathbf{u}$, $\mathbf{y}$) ∈ X × U × Y. For the state estimation problem of systems modeled in this formulation, ($\mathbf{x}$, $\mathbf{u}$, $\mathbf{y}$) represent the state, system input and output, respectively, where $X\subset {\mathbb{R}}^{n},U\subset {\mathbb{R}}^{m},Y\subset {\mathbb{R}}^{r}$ are all smooth manifolds (the state manifold, the input manifold and the output manifold, respectively). Assuming that B = X × U is the trivial fiber bundle over the state manifold X, let ${\varphi}_{g}$: G × X → X, ${\psi}_{g}$: G × U → U and ${\rho}_{g}$: G × Y → Y be the smooth Lie group actions on the system's state, input and output manifolds, respectively, where G is the system's Lie group with the property described in Definition 1. The invariance of the system in Equation (25) under the transformation group G can be defined as:

**Definition 2.** (G-invariant and G-equivariant system) The system $\dot{x}=f(x,u)$ is G-invariant if $f\left({\varphi}_{g}(x),{\psi}_{g}(u)\right)=D{\varphi}_{g}(x)\cdot f(x,u)$ for all $g\in G$, and its output is G-equivariant if $h\left({\varphi}_{g}(x),{\psi}_{g}(u)\right)={\rho}_{g}\left(h(x,u)\right)$ for all $g\in G$.

**Theorem 1.** For a G-invariant system given by Equation (25) with G-equivariant output $\mathbf{y} = h(\mathbf{x}, \mathbf{u})$, there exists an invariant observer $\dot{\widehat{x}}=F(\widehat{x},u,y)$ that verifies the following properties:

- (a)
- $F(x,u,h(x,u))=f(x,u)$;
- (b)
- $({\phi}_{g})\cdot F(\widehat{x},u,y)=F({\phi}_{g}(\widehat{x}),{\psi}_{g}(u),{\rho}_{g}(y))$, i.e., the observer is invariant by the transformation group.

- (1)
- $w_i$ is the invariant frame. A vector field $w:X\to TX$ is G-invariant if it verifies:$${\phi}_{g}\cdot w(x)=w({\phi}_{g}(x)),\forall g\in G$$The invariant frame is defined as a set of G-invariant vector fields that form a global frame for TX; therefore, $({w}_{1}(x),{w}_{2}(x)\mathrm{...}{w}_{n}(x))$ forms a basis for T_{x}X. An invariant frame can be calculated by:$${w}_{i}(x)=({\phi}_{\gamma {(x)}^{-1}})\cdot \partial /\partial {x}_{i}=\frac{d}{d\tau}\left({\phi}_{\gamma {(x)}^{-1}}({\upsilon}_{i}\tau )\right)$$where ${\upsilon}_{i}\in {T}_{e}X$, i = 1, …, n, is a basis of ${T}_{e}X$, and γ(x) is the moving frame. Following the Cartan moving frame method, the moving frame γ(x) can be derived by solving φ_{g}(x) = c for g = γ(x), where c is a constant. In particular, one can choose c = e such that γ(x) = x^{−1}.
- (2)
- $\epsilon (\widehat{x},u,y)$ denotes the invariant output error, which is defined as follows:
**Definition 3.** (Invariant output error) The smooth map $\epsilon :(\widehat{x},u,y)\mapsto \epsilon (\widehat{x},u,y)$ is an invariant output error if it verifies the following properties:- (a)
- For any $\widehat{x},u$, the map $y\mapsto \epsilon (\widehat{x},u,y)$ is invertible;
- (b)
- For any $\widehat{x},u$, $\epsilon (\widehat{x},u,h(\widehat{x},u))=0$;
- (c)
- For any $\widehat{x},u$, $\epsilon ({\phi}_{g}(\widehat{x}),{\psi}_{g}(u),{\rho}_{g}(y))=\epsilon (\widehat{x},u,y)$.

According to the moving frame method, the invariant output error can be given by:$$\epsilon (\widehat{x},u,y)={\rho}_{\gamma (\widehat{x})}(h(\widehat{x},u))-{\rho}_{\gamma (\widehat{x})}(y)$$ - (3)
- $I(\widehat{x},u)$ is the invariant of G, which verifies:$$I({\phi}_{g}(x))=I(x)$$Following the moving frame method, the invariant $I(\widehat{x},u)$ is obtained by:$$I(\widehat{x},u):={\phi}_{\gamma (\widehat{x})}(\widehat{x})\times {\psi}_{\gamma (\widehat{x})}(u)$$
- (4)
- ${L}_{i}$ is a 1 × r observer gain matrix that depends on $I$ and ε, such that:$${L}_{i}(I(\widehat{x},u),0)=0,\text{\hspace{0.17em}}\forall \widehat{x}\in X$$

**Definition 4.** (Invariant state error) The smooth map $\eta (\widehat{x},x)$ is an invariant state error if it verifies the following properties:

- (a)
- $\eta (\widehat{x},x)$ is a diffeomorphism on X × X;
- (b)
- For any $x\in X$, $\eta (x,x)=0$;
- (c)
- $\eta ({\phi}_{g}(\widehat{x}),{\phi}_{g}(x))=\eta (\widehat{x},x)$.

**Theorem 2** [31]. For a G-invariant and G-equivariant system given by Equation (25), the state estimation error of its invariant state observer evolves as $\dot{\eta}=\Upsilon \left(\eta ,I(\widehat{x},u)\right)$. Unlike the error dynamics of a conventional nonlinear observer, which depend on the full system trajectory ($\mathbf{x}(t)$, $\mathbf{u}(t)$), the dynamics of the invariant state error depend only on the estimated invariants $I(\widehat{x},u)$. This significantly simplifies the stability analysis and the selection of observer gains.

#### 5.2. Sensor Measurement Models

The MEMS IMU module consists of a tri-axial gyroscope that provides measurements of the angular rate, $\omega_m$, a tri-axial accelerometer module that provides measurements of the acceleration, $f_m$, as well as a tri-axial magnetometer module that measures the local magnetic field vector expressed in the body-fixed frame, $y_m$; the magnetic field vector can be considered constant over a small-scale operating environment.

The gyroscope measurements $\omega_m$ can be modeled as:

$${\omega}_{m}=\omega +{b}_{\omega}+{\nu}_{\omega}$$

where $b_\omega$ denotes the constant gyroscope bias, and $\nu_\omega$ is Gaussian noise with zero mean. Similarly, the signals of the accelerometer can be modeled as:

$${f}_{m}=f+{b}_{f}+{\nu}_{f}$$

where $b_f$ represents the constant accelerometer bias, and $\nu_f$ is also a Gaussian noise vector with zero mean.

In the ground-fixed frame, the local magnetic field vector can be written as $m = [m_x,\ 0,\ m_z]^T$. Since the magnetometer is fixed to the MAV body, the magnetic readings it provides are measured in the body-fixed frame and also contain sensor noise. Denoting by $q$ the quaternion that represents the orientation of the MAV's body-fixed frame with respect to the ground-fixed frame, the magnetometer model can be given as:

$${y}_{m}={q}^{-1}*m*q+{\nu}_{m}$$

where $y_m$ is the reading of the magnetometer, and $\nu_m$ denotes Gaussian noise with zero mean.

The RGB-D visual odometry and the altimeter provide measurements of the MAV's position:

$${y}_{p}={[{p}_{x}\ {p}_{y}]}^{T}+{\nu}_{p},\quad {y}_{s}={p}_{z}+{\nu}_{s}$$

where $y_p$ and $y_s$ are the measurements of the RGB-D visual odometry and the altimeter, respectively, $p_{x,y}$ and $p_z$ denote the MAV's planar position and altitude in the global frame, and $\nu_p$, $\nu_s$ are the Gaussian white noises of the measurements.
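
The additive measurement models above can be simulated directly for sanity checks; the function name and test values below are illustrative:

```python
import numpy as np

def simulate_sensor(true_signal, bias, noise_std, n, rng):
    """Generate n tri-axial readings y = s + b + nu, with a constant bias b
    and zero-mean Gaussian noise nu, mirroring the gyro/accelerometer models."""
    return true_signal + bias + rng.normal(0.0, noise_std, size=(n, 3))
```

Averaging readings taken at rest recovers the constant bias, which is the intuition behind estimating $b_\omega$, $b_f$ as states.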

#### 5.3. RGB-D Visual/Inertial Navigation System Model

The kinematics of the MAV can be modeled as:

$$\dot{q}=\frac{1}{2}q*({\omega}_{m}-{b}_{\omega}),\quad \dot{p}=v,\quad \dot{v}=q*({f}_{m}-{b}_{f})*{q}^{-1}+\mathbf{g},\quad {\dot{b}}_{\omega}=0,\quad {\dot{b}}_{f}=0$$

where $\mathbf{g} = [0\ 0\ g]^T$ is the local gravity vector in the ground-fixed frame. The state vector chosen for observer design is $\mathbf{x} = [q\ p\ v\ {b}_{\omega}\ {b}_{f}]^T$, along with the system input $\mathbf{u} = [{\omega}_{m}\ {f}_{m}\ \mathbf{g}]^T$.

The system output consists of the magnetic measurements $y_m$ in the body-fixed frame and the measurements of the MAV's position $y_p$, $y_s$ provided by the RGB-D VO and the altimeter, both expressed in the global frame. Using the measurement models given in Equations (41) and (42), the system output is written as:

#### 5.4. Observer Design of the RGB-D Visual/Inertial Navigation System

An element ${g}_{0} = ({q}_{0},\ {p}_{0},\ {b}_{\omega,0},\ {b}_{f,0})$ denotes the group action of G with the following physical meaning: ${q}_{0}$ and ${p}_{0}$ represent constant rotations and translations in the global frame, and ${b}_{\omega,0}, {b}_{f,0}\in {\mathbb{R}}^{3}$ denote constant translations on the biases of the gyroscopes and accelerometers, respectively.

Applying the group actions ${\psi}_{g}(\mathbf{u})$ and ${\rho}_{g}(\mathbf{y})$, we can directly verify that the system dynamics are invariant and the output is equivariant under ${\varphi}_{g}$, ${\psi}_{g}$ and ${\rho}_{g}$. As a result, the existence of the invariant observer $\dot{\widehat{x}}=F(\widehat{x},u,y)$ for the system in Equations (45) and (46) can be guaranteed by Theorem 1. Using the verified invariance properties of the system dynamics and output, we can now design the invariant state observer for the RGB-D visual/inertial navigation system by following the systematic steps described in Section 5.1.

The moving frame γ(x) is derived by solving ${\varphi}_{g}(x) = c$ for g. Letting c be the identity (i.e., c = e), the group action ${\varphi}_{g}(x)$ yields the moving frame, and the invariant output error is then constructed according to Definition 3 from the output $\mathbf{y} = h(\mathbf{x}, \mathbf{u})$.

Given a basis of ${T}_{e}X$ over the state manifold $X={S}^{3}\times {R}^{3}\times {R}^{3}\times {R}^{3}\times {R}^{3}$, the invariant frame ${w}_{i}(\mathbf{x})$ is calculated through the moving-frame construction described in Section 5.1.

where ${L}_{i}$ are 1 × 3 gain matrices of the observer. Notice that the gain-related terms can be collected into matrix form: ${L}^{q}$ is a 3 × 6 matrix whose rows are formed from the row matrices ${L}_{i}^{q}$. The other terms in Equation (58) that are associated with the observer gains can be transformed in the same manner, leading to the matrix form of the invariant observer in Equation (60).

#### 5.5. Calculation of Observer Gains Based on Invariant-EKF

where $\mathbf{B}$, $\mathbf{D}$ are the input matrices of the noise vectors. The optimal state estimate $\widehat{x}$ that minimizes the estimation error $\epsilon \text{=}x-\widehat{x}$ can be obtained using the conventional continuous-time EKF procedure:

$$\dot{\widehat{x}}=f(\widehat{x},u)+K\left(y-h(\widehat{x},u)\right),\quad K=P{C}^{T}{R}^{-1},\quad \dot{P}=AP+P{A}^{T}+BQ{B}^{T}-P{C}^{T}{R}^{-1}CP\tag{65}$$

where $\mathbf{K}$ denotes the Kalman gain, $Q$ and $R$ are the covariance matrices of the process and measurement noises, and $\mathbf{A}$, $\mathbf{C}$ are the Jacobians of the process model f and measurement model h with respect to the current estimated state ($A=\partial f(x,u)/\partial x{|}_{\widehat{x}}$, $C=\partial h(x,u)/\partial x{|}_{\widehat{x}}$), respectively. According to the EKF method, we linearize the system dynamics and output model given by Equation (64) about the latest estimated state using a Taylor expansion; the linearized model supplies the matrices used to compute the gain $\mathbf{K}$, which is calculated following the procedure given by Equation (65).

For compactness, the planar position and altitude measurements $p_{x,y}$, $p_z$ and the measurement noises $\nu_p$, $\nu_s$ are merged and denoted as $p$ and $\nu_p$, respectively, in the resulting Jacobians.

The gain matrix $\mathbf{K}$ is composed of the gains of the invariant observer in Equation (60). Therefore, the observer gains L can be extracted from the matrix $\mathbf{K}$, which is updated via the procedure given by Equation (65) using the matrices $\mathbf{A}$, $\mathbf{B}$, $\mathbf{C}$, $\mathbf{D}$.
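
As a minimal, scalar illustration of the gain computation in the continuous-time EKF form of Equation (65) (toy coefficients, not the full navigation state):

```python
def riccati_gain(a, b, c, q, r, dt=1e-3, t_final=10.0):
    """Euler-integrate the scalar continuous-time Riccati equation
    P' = 2aP + b^2 q - (cP)^2 / r and return the steady-state Kalman gain
    K = Pc/r, the same update that supplies the observer gains here."""
    P = 0.0
    for _ in range(int(t_final / dt)):
        P += dt * (2 * a * P + b * b * q - (c * P) ** 2 / r)
    return P * c / r
```

For a = 0 and unit noise intensities the steady-state covariance solves $1 - P^2 = 0$, so the gain converges to 1.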

## 6. Implementation and Experimental Results

#### 6.1. Implementation Details and Experimental Scenarios

The state estimate is propagated using the system input ($\mathbf{u}$), while the full estimate $\widehat{x}$ is updated at a rate of 30 Hz, whenever the measurement from the RGB-D VO is available.

**Figure 6.** Experiment system: a prototype quadrotor MAV equipped with an RGB-D camera and a low-cost MEMS IMU module. (**a**) Prototype quadrotor MAV system; (**b**) MEMS IMU: ADIS16405; (**c**) RGB-D camera: PrimeSense Carmine 1.08.

**Figure 7.** Indoor flight test scenarios. (**a**) Scenario 1: an actual laboratory; (**b**) Scenario 2: a corridor inside a building. Both environments are without access to GPS signals.

#### 6.2. Indoor Flight Test Results

#### 6.2.1. RGB-D Visual Odometry Test Results

**Figure 8.** Experimental results of feature detection and matching. (**a**) Features extracted from images captured at time step m − 1 (**left**) and m (**right**); (**b**) feature correspondences; (**c**) depth image captured by the RGB-D camera (**left**) and optical-flow between consecutive images (**right**).

Algorithms | Average Time (ms) |
---|---|
Harris Corner | 16.6 |
SIFT | 6290.1 |
SURF | 320.5 |
OFC-ORB | 13.2 |

#### 6.2.2. State Estimation Results

**Figure 9.** State estimation results. (**a**) Attitude and velocity estimates of Scenario 1; (**b**) estimated position vs. ground truth trajectory of Scenario 1; (**c**) attitude and velocity estimates of Scenario 2; (**d**) estimated position of Scenario 2.

## 7. Conclusions and Future Work

## Acknowledgments

## Author Contributions

## Conflicts of Interest

## References

- Bouabdallah, S.; Bermes, C.; Grzonka, S.; Gimkiewicz, C.; Brenzikofer, A.; Hahn, R.; Schafroth, D.; Grisetti, G.; Burgard, W.; Siegwart, R. Towards palm-size autonomous helicopters. J. Intell. Robot. Syst.
**2011**, 61, 445–471. - Goodrich, M.A.; Cooper, J.L.; Adams, J.A.; Humphrey, C.; Zeeman, R.; Buss, B.G. Using a mini-UAV to support wilderness search and rescue: Practices for human–robot teaming. In Proceedings of 2007 IEEE International Workshop on Safety, Security and Rescue Robotics (SSRR 2007), Rome, Italy, 27–29 September 2007.
- Tomic, T.; Schmid, K.; Lutz, P.; Domel, A.; Kassecker, M.; Mair, E.; Grixa, I.L.; Ruess, F.; Suppa, M.; Burschka, D. Toward a fully autonomous UAV: Research platform for indoor and outdoor urban search and rescue. IEEE Robot. Autom. Mag.
**2012**, 19, 46–56. - Lin, L.; Roscheck, M.; Goodrich, M.; Morse, B. Supporting wilderness search and rescue with integrated intelligence: autonomy and information at the right time and the right place. In Proceedings of 24th AAAI Conference on Artificial Intelligence, Atlanta, GA, USA, 11–15 July 2010; pp. 1542–1547.
- Bachrach, A.; He, R.; Roy, N. Autonomous flight in unknown indoor environments. Int. J. Micro Air Veh.
**2009**, 4, 277–298. - Bachrach, A.; He, R.; Roy, N. Autonomous flight in unstructured and unknown indoor environments. In Proceedings of European Conference on Micro Aerial Vehicles (EMAV 2009), Delft, The Netherlands, 14–17 September 2009.
- Bachrach, A.; Prentice, S.; He, R.; Roy, N. RANGE: Robust autonomous navigation in GPS-denied environments. J. Field Robot.
**2011**, 28, 644–666. - Chowdhary, G.; Sobers, D.M., Jr.; Pravitra, C.; Christmann, C.; Wu, A.; Hashimoto, H.; Ong, C.; Kalghatgi, R.; Johnson, E.N. Self-contained autonomous indoor flight with ranging sensor navigation. J. Guid. Control Dyn.
**2012**, 29, 1843–1854. - Chowdhary, G.; Sobers, D.M.; Pravitra, C.; Christmann, C.; Wu, A.; Hashimoto, H.; Ong, C.; Kalghatgi, R.; Johnson, E.N. Integrated guidance navigation and control for a fully autonomous indoor UAS. In Proceedings of AIAA Guidance Navigation and Control Conference, Portland, OR, USA, 8–11 August 2011.
- Sobers, D.M.; Yamaura, S.; Johnson, E.N. Laser-aided inertial navigation for self-contained autonomous indoor flight. In Proceedings of AIAA Guidance Navigation and Control Conference, Toronto, Canada, 2–5 August 2010.
- Weiss, S.; Achtelik, M.W.; Lynen, S.; Chli, M.; Siegwart, R. Real-time onboard visual-inertial state estimation and self-calibration of MAVs in unknown environments. In Proceedings of 2012 IEEE International Conference on Robotics and Automation (ICRA), Saint Paul, MN, USA, 14–18 May 2012; pp. 957–964.
- Wu, A.D.; Johnson, E.N.; Kaess, M.; Dellaert, F.; Chowdhary, G. Autonomous flight in GPS-denied environments using monocular vision and inertial sensors. J. Aerosp. Comput. Inf. Commun.
**2013**, 10, 172–186. - Wu, A.D.; Johnson, E.N. Methods for localization and mapping using vision and inertial sensors. In Proceedings of AIAA Guidance, Navigation, and Control Conference, Honolulu, HI, USA, 18–21 August 2008.
- Achtelik, M.; Bachrach, A.; He, R.; Prentice, S.; Roy, N. Stereo vision and laser odometry for autonomous helicopters in GPS-denied indoor environments. Proc. SPIE
**2009**, 7332, 733219. - Achtelik, M.; Roy, N.; Bachrach, A.; He, R.; Prentice, S.; Roy, N. Autonomous navigation and exploration of a quadrotor helicopter in GPS-denied indoor environments. In Proceedings of the 1st Symposium on Indoor Flight, International Aerial Robotics Competition, Mayagüez, Puerto Rico, 21 July 2009.
- Voigt, R.; Nikolic, J.; Hurzeler, C.; Weiss, S.; Kneip, L.; Siegwart, R. Robust embedded egomotion estimation. In Proceedings of 2011 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), San Francisco, CA, USA, 25–30 September 2011; pp. 2694–2699.
- Bachrach, A.; Prentice, S.; He, R.; Henry, P.; Huang, A.S.; Krainin, M.; Maturana, D.; Fox, D.; Roy, N. Estimation, planning, and mapping for autonomous flight using an RGB-D camera in GPS-denied environments. Int. J. Robot. Res.
**2012**, 31, 1320–1343. - Leishman, R.; Macdonald, J.; McLain, T.; Beard, R. Relative navigation and control of a hexacopter. In Proceedings of 2012 IEEE International Conference on Robotics and Automation (ICRA), Saint Paul, MN, USA, 14–18 May 2012; pp. 4937–4942.
- Guerrero-Castellanos, J.F.; Madrigal-Sastre, H.; Durand, S.; Marchand, N.; Guerrero-Sanchez, W.F.; Salmeron, B.B. Design and implementation of an attitude and heading reference system (AHRS). In Proceedings of 2011 8th International Conference on Electrical Engineering Computing Science and Automatic Control (CCE), Merida, Mexico, 26–28 October 2011.
- Bonnabel, S.; Martin, P.; Salaün, E. Invariant extended Kalman filter: Theory and application to a velocity-aided attitude estimation problem. In Proceedings of Joint 48th IEEE Conference on Decision and Control and 2009 28th Chinese Control Conference (CDC/CCC 2009), Shanghai, China, 15–18 December 2009; pp. 1297–1304.
- Kelly, J.; Sukhatme, G.S. Visual-inertial sensor fusion: Localization, mapping and sensor-to-sensor self-calibration. Int. J. Robot. Res.
**2011**, 30, 56–79. - Van der Merwe, R.; Wan, E. Sigma-point Kalman filters for integrated navigation. In Proceedings of 60th Annual Meeting of the Institute of Navigation (ION), Dayton, OH, USA, 7–9 June 2004; pp. 641–654.
- Bry, A.; Bachrach, A.; Roy, N. State estimation for aggressive flight in GPS-denied environments using onboard sensing. In Proceedings of 2012 IEEE International Conference on Robotics and Automation (ICRA), Saint Paul, MN, USA, 14–18 May 2012; pp. 1–8.
- Crassidis, J.L.; Markley, F.L.; Cheng, Y. Survey of nonlinear attitude estimation methods. J. Guid. Control Dyn.
**2007**, 30, 12–28. - Achtelik, M.; Achtelik, M.; Weiss, S.; Siegwart, R. Onboard IMU and monocular vision based control for MAVs in unknown in- and outdoor environments. In Proceedings of 2011 IEEE International Conference on Robotics and Automation (ICRA), Shanghai, China, 9–13 May 2011; pp. 3056–3063.
- Boutayeb, M.; Richard, E.; Rafaralahy, H.; Souley Ali, H.; Zaloylo, G. A simple time-varying observer for speed estimation of UAV. In Proceedings of 17th IFAC World Congress, Seoul, Korea, 6–11 July 2008; pp. 1760–1765.
- Benallegue, A.; Mokhtari, A.; Fridman, L. High-order sliding-mode observer for a quadrotor UAV. Int. J. Robust Nonlinear Control
**2008**, 18, 427–440. - Madani, T.; Benallegue, A. Sliding mode observer and backstepping control for a quadrotor unmanned aerial vehicles. In Proceedings of 2007 American Control Conference, New York, NY, USA, 9–13 July 2007; pp. 5887–5892.
- Benzemrane, K.; Santosuosso, G.L.; Damm, G. Unmanned aerial vehicle speed estimation via nonlinear adaptive observers. In Proceedings of 2007 American Control Conference, New York, NY, USA, 9–13 July 2007.
- Rafaralahy, H.; Richard, E.; Boutayeb, M.; Zasadzinski, M. Simultaneous observer based sensor diagnosis and speed estimation of unmanned aerial vehicle. In Proceedings of 47th IEEE Conference on Decision and Control (CDC 2008), Cancun, Mexico, 9–11 December 2008; pp. 2938–2943.
- Bonnabel, S.; Martin, P.; Rouchon, P. Symmetry-preserving observers. IEEE Trans. Autom. Control
**2008**, 53, 2514–2526. [Google Scholar] [CrossRef][Green Version] - Bonnabel, S.; Martin, P.; Rouchon, P. Non-linear symmetry-preserving observers on lie groups. IEEE Trans. Autom. Control
**2009**, 54, 709–1713. [Google Scholar] - Mahony, R.; Hamel, T.; Pflimlin, J.-M. Nonlinear complementary filters on the special orthogonal group. IEEE Trans. Autom. Control
**2008**, 53, 1203–1218. [Google Scholar] [CrossRef][Green Version] - Martin, P.; Salaun, E. Invariant observers for attitude and heading estimation from low-cost inertial and magnetic sensors. In Proceedings of 46th IEEE Conference on Decision and Control, New Orleans, LA, USA, 12–14 December 2007; pp. 1039–1045.
- Martin, P.; Salaun, E. Design and implementation of a low-cost attitude and heading nonlinear estimator. In Proceedings of Fifth International Conference on Informatics in Control, Automation and Robotics, Signal Processing, Systems Modeling and Control, Funchal, Portugal, 11–15 May 2008; pp. 53–61.
- Martin, P.; Salaün, E. Design and implementation of a low-cost observer-based attitude and heading reference system. Control Eng. Pract.
**2010**, 18, 712–722. [Google Scholar] [CrossRef] - Bonnabel, S. Left-invariant extended Kalman filter and attitude estimation. In Proceedings of the 46th IEEE Conference on Decision and Control, New Orleans, LA, USA, 12–14 December 2007; pp. 1027–1032.
- Barczyk, M.; Lynch, A.F. Invariant extended Kalman filter design for a magnetometer-plus-GPS aided inertial navigation system. In Proceedings of the 50th IEEE Conference on Decision and Control and European Control Conference (CDC-ECC), Orlando, FL, USA, 12–15 December 2011; pp. 5389–5394.
- Barczyk, M.; Lynch, A.F. Invariant observer design for a helicopter UAV aided inertial navigation system. IEEE Trans. Control Syst. Technol.
**2013**, 21, 791–806. [Google Scholar] [CrossRef] - Cheviron, T.; Hamel, T.; Mahony, R.; Baldwin, G. Robust nonlinear fusion of inertial and visual data for position, velocity and attitude estimation of UAV. In Proceedings of the 2007 IEEE International Conference on Robotics and Automation, Roma, Italy, 10–14 April 2007; pp. 2010–2016.
- Rublee, E.; Rabaud, V.; Konolige, K.; Bradski, G. ORB: An efficient alternative to SIFT or SURF. In Proceedings of the 2011 IEEE International Conference on Computer Vision (ICCV), Barcelona, Spain, 6–13 November 2011; pp. 2564–2571.
- Rosten, E.; Drummond, T. Machine learning for high-speed corner detection. In Proceedings of Computer Vision–ECCV 2006, 9th European Conference on Computer Vision, Graz, Austria, 7–13 May 2006; Springer: Berlin, Germany, 2006; Volume 1, pp. 430–443. [Google Scholar]
- Calonder, M.; Lepetit, V.; Strecha, C.; Fua, P. Brief: Binary robust independent elementary features. In Proceedings of Computer Vision–ECCV 2010, 11th European Conference on Computer Vision, Heraklion, Crete, Greece, 5–11 September 2010; Springer: Berlin, Germany; pp. 778–792.
- Lucas, B.D.; Kanade, T. An iterative image registration technique with an application to stereo vision. Proc. IJCAI
**1981**, 81, 674–679. [Google Scholar] - Bouguet, J.Y. Pyramidal Implementation of the Affine Lucas Kanade Feature Tracker Description of the Algorithm. Available online: http://robots.stanford.edu/cs223b04/algo_affine_tracking.pdf (accessed on 20 April 2015).
- Arun, K.S.; Huang, T.S.; Blostein, S.D. Least-squares fitting of two 3-D point sets. IEEE Trans. Pattern Anal. Mach. Intell.
**1987**, 5, 698–700. [Google Scholar] [CrossRef] - Fischler, M.A.; Bolles, R.C. Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography. Commun. ACM
**1981**, 24, 381–395. [Google Scholar] [CrossRef] - Nistér, D. Preemptive RANSAC for live structure and motion estimation. Mach. Vis. Appl.
**2005**, 16, 321–329. [Google Scholar] [CrossRef] - OpenCV. Available online: http://opencv.org/ (accessed on 27 December 2014).

© 2015 by the authors; licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Li, D.; Li, Q.; Tang, L.; Yang, S.; Cheng, N.; Song, J. Invariant Observer-Based State Estimation for Micro-Aerial Vehicles in GPS-Denied Indoor Environments Using an RGB-D Camera and MEMS Inertial Sensors. *Micromachines* **2015**, *6*, 487-522.
https://doi.org/10.3390/mi6040487
