Scale Factor Estimation for Quadrotor Monocular-Vision Positioning Algorithms

Unmanned aerial vehicle (UAV) autonomous navigation requires access to translational and rotational positions and velocities. Since there is no single sensor to measure all UAV states, it is necessary to fuse information from multiple sensors. This paper proposes a deterministic estimator to reconstruct the scale factor of the position determined by a simultaneous localization and mapping (SLAM) algorithm onboard a quadrotor UAV. The position scale factor is unknown when the SLAM algorithm relies on the information from a monocular camera. Only onboard sensor measurements can feed the estimator; thus, a deterministic observer is designed to rebuild the quadrotor translational velocity. The estimator and the observer are designed following the immersion and invariance method and use inertial and visual measurements. Lyapunov’s arguments prove the asymptotic convergence of observer and estimator errors to zero. The proposed estimator’s and observer’s performance is validated through numerical simulations using a physics-based simulator.


Introduction
Nowadays, quadrotors are used in various applications thanks to their low cost, mechanical robustness, and high maneuverability. Such applications include homeland security, forest-fire control, surveillance, sea and land exploration, human search and rescue, archaeological exploration, and volcanic activity monitoring, among many others [1,2]. Many of the abovementioned applications are impractical or even dangerous for human operators; thus, autonomous navigation, control, and guidance are required.
A quadrotor can perform autonomous navigation in unknown environments when its autopilot has access to all states. However, providing access to all quadrotor states without relying on a remote computer or sensors demands a vehicle with the capacity to process and extract information from onboard sensors only. The bottleneck in measuring all quadrotor states is that there are no out-of-the-box, reliable sensors that measure every state directly. For example, the quadrotor's attitude is obtained by fusing the measurements from an inertial measurement unit (IMU). On the other hand, the global positioning system (GPS) is the primary sensor used for quadrotor positioning, but it has limitations. It fails in environments where satellite communication is degraded, called GPS-denied environments, such as water bodies and indoor spaces [3]. Furthermore, a low-cost GPS does not provide enough resolution for trajectories on a centimeter scale, and the price of a GPS with higher resolution, such as differential GPS, increases drastically.
An algorithm to fuse GPS measurements with optical flow information using a Kalman filter (KF) was proposed in [4]. Shortly after, the work reported in [5] presented an improved sensor fusion algorithm based on an extended Kalman filter (EKF) that includes the measurements from an inertial navigation system (INS). Both sensor fusion algorithms improved the position estimation for low-cost GPS but not for GPS-denied environments.
However, using an EKF in this approach makes the estimator nondeterministic, so its stability is not formally proven.
In the field of deterministic estimators, ref. [23] presents a scale estimator based on control stability. It shows that the absolute scale and control gain have a unique linear relationship. The absolute scale can be estimated by detecting self-induced oscillations and analyzing the system stability. The problem with this approach is that an adaptive control technique must be used for an online estimation, leaving out other types of controllers.
This article presents a scale factor estimator in the Cartesian plane fused with a velocity observer; both are deterministic, based on the quadrotor dynamic model, and use only onboard sensors. The set of sensors provides the quadrotor's attitude and angular velocity from an attitude and heading reference system (AHRS), the quadrotor's acceleration from the inertial measurement unit, and the scaled position from a SLAM algorithm based on a monocular camera. The scale factor estimator and the velocity observer are designed following the immersion and invariance methodology introduced in [24]. The main contributions of this work are the following: the estimator and observer are designed considering the full nonlinear quadrotor model; the Coriolis forces are not neglected; the position scale factor is reconstructed in all three dimensions; and it is formally proven, using Lyapunov arguments, that the estimator and observer errors locally asymptotically converge to zero. Numerical simulations using Gazebo are presented to support the theoretical developments. The outcomes of this paper are based on the preliminary works reported in [25,26].
The remaining parts of the paper are arranged as follows. The sensor models used are presented in Section 2, along with the quadrotor dynamics. Section 3 outlines the fundamental contribution of this paper and describes the mathematical advancements used to create the scale factor estimator. Through numerical simulations, Section 4 illustrates the performance of the estimator. Finally, Section 5 wraps up this paper with a few closing thoughts and suggestions for future work. Table 1 summarizes the notation used to introduce the quadrotor dynamic model.

Table 1. Notation used to introduce the quadrotor dynamic model.

Symbol          Variable                                                        Units
R               Rotation matrix from body to inertial coordinates               dimensionless
V^b             Translational velocity in body frame coordinates                m/s
Ω = [p q r]ᵀ    Angular velocity in body coordinates                            rad/s
M^b             Moments generated by the differential thrust and the
                reaction moments between the four rotors                        N m
m               Quadrotor mass                                                  kg
g               Gravity acceleration constant                                   m/s²
T_T             Total thrust generated by the four rotors                       N
µ               Parameter related to the aerodynamic drag force [27]            kg/s
J               Quadrotor inertia matrix                                        kg m²

In Table 1, the International System of Units is considered, and I denotes the identity matrix.

Quadrotor Dynamics
An inertial coordinate frame and a non-inertial coordinate frame (body frame) attached to the quadrotor center of gravity are needed to describe the quadrotor dynamics, see Figure 1. The following equations, expressed in mixed inertial and body coordinates, describe the translational and rotational quadrotor dynamics [26]:

m V̇^b = −Ω × m V^b − µ V^b + T_T e₃ − m g Rᵀ e₃,
Ẋ = R V^b,
J Ω̇ = −Ω × J Ω + M^b,   (1)
Ṙ = R Ω̂,

with X the inertial position, e₃ = [0 0 1]ᵀ, and Ω̂ the skew-symmetric matrix associated with Ω.

Available Sensors
It is assumed that the quadrotor carries onboard a set of sensors that provide the following measurements.

Scaled Position
The vehicle carries a monocular camera facing the horizontal plane and the computing power necessary to implement a monocular-vision algorithm that determines its scaled inertial position. Therefore, the following measurement is available:

y₁ = X_s = diag(K) X,   (2)

where X_s is the scaled position delivered by the monocular SLAM vision algorithm and K = [k_x k_y k_z]ᵀ is the dimensionless unknown scale factor on the axes X_i, Y_i, Z_i, respectively.

Remark 1.
The operator diag(A) represents a diagonal matrix whose elements are the elements of the vector A ∈ R³. For vectors A, B ∈ R³, this operator satisfies the following identities:

diag(A) B = diag(B) A,   (3)
diag(A) diag(B) = diag(diag(A) B).   (4)
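These diagonal-operator identities can be checked numerically. The sketch below uses NumPy; `diag3` is just a helper name introduced here, not notation from the paper:

```python
import numpy as np

def diag3(a):
    """diag(A): the 3x3 diagonal matrix whose entries are the vector A."""
    return np.diag(a)

A = np.array([1.0, 2.0, 3.0])
B = np.array([4.0, 5.0, 6.0])

# First identity: diag(A) B = diag(B) A (both equal the elementwise product)
assert np.allclose(diag3(A) @ B, diag3(B) @ A)
# Second identity: diag(A) diag(B) = diag(diag(A) B)
assert np.allclose(diag3(A) @ diag3(B), diag3(diag3(A) @ B))
```

Both sides of the first identity reduce to the elementwise (Hadamard) product of A and B, which is why the operator commutes in this way.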

Specific Acceleration
Commonly, quadrotors are equipped with an inertial measurement unit (IMU) that measures the Earth's magnetic field intensity, the angular velocity, and the specific acceleration in body coordinates. According to the accelerometer tutorial reported in [27], the specific acceleration measured by an accelerometer mounted on a quadrotor is given by

a^b = (1/m) F^b_T,

where a^b is the specific acceleration measured in body axes and F^b_T is the total external non-gravitational force acting on the quadrotor expressed in body axes. From the quadrotor dynamics model in Equation (1), it follows that F^b_T = T_T e₃ − µ V^b; as a result,

y₂ = a^b = (1/m)(T_T e₃ − µ V^b).

Hence, the specific acceleration is an available output; element-wise, it reads

a^b_x = −(µ/m) u,   a^b_y = −(µ/m) v,   a^b_z = (T_T − µ w)/m,   (8)

with a^b_x, a^b_y, and a^b_z the specific accelerations along the body axes and V^b = [u v w]ᵀ.
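The specific-acceleration relation a^b = (T_T e₃ − µ V^b)/m can be checked numerically. This is a minimal sketch; the mass, thrust, drag, and velocity values are illustrative placeholders, not taken from the paper:

```python
import numpy as np

# Specific acceleration of the drag-augmented quadrotor model:
#   a_b = (T_T * e3 - mu * V_b) / m
# All numbers below are illustrative placeholders.
m, mu, T_T = 1.5, 0.18, 15.0
e3 = np.array([0.0, 0.0, 1.0])
V_b = np.array([1.0, -0.5, 0.2])   # body velocity [u, v, w]

a_b = (T_T * e3 - mu * V_b) / m

# Element-wise this matches a_b_x = -(mu/m)*u, a_b_y = -(mu/m)*v,
# and a_b_z = (T_T - mu*w)/m:
assert np.allclose(a_b[0], -(mu / m) * V_b[0])
assert np.allclose(a_b[1], -(mu / m) * V_b[1])
assert np.allclose(a_b[2], (T_T - mu * V_b[2]) / m)
```

Note that the horizontal components of a^b depend only on the drag term, which is precisely what makes the later measurement of µ possible.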

Attitude and Heading Reference Systems
The device that computes the quadrotor's attitude and rotational velocity from the IMU measurements is called the attitude and heading reference system (AHRS). Assuming that the quadrotor carries an AHRS, the attitude, given by the columns of the rotation matrix, and the angular velocity Ω are available signals, where r_i ∈ S², i = 1, . . . , 3, are the columns of the transposed rotation matrix, and S² = {x ∈ R³ : ‖x‖ = 1} is the unit 2-sphere.

Vertical Speed
Through the use of a laser sensor or an ultrasonic sensor, the vertical quadrotor position can be measured, so that the vertical speed can be determined. As a result, it is presumed that this measurement is available. Finally, note that the quadrotor translational dynamics, the first equation in (1), expressed in terms of the measured states reads as

V̇^b = −Ω × V^b + a^b − g Rᵀ e₃.   (12)

Immersion and Invariance Observers
The following developments are based on Chapter 5 of [24]. Consider the following non-linear, deterministic, time-invariant system

η̇ = f₁(η, y),   ẏ = f₂(η, y),   (13)

where η ∈ R ⊂ Rⁿ and y ∈ Y ⊂ R^m are the unmeasured and measured states, respectively. It is assumed that the vector fields f₁(η, y) and f₂(η, y) are forward complete.
Definition 1. The dynamic system

d/dt η̂ = α(η̂, y),   (14)

with η̂ ∈ Rⁿ, is an observer for the unmeasured state η if there exists a mapping β : Rⁿ × R^m → Rⁿ such that the manifold

M = {(η, η̂, y) : η = β(η̂, y)}

is invariant and attractive; that is, all trajectories of (13), (14) that start in a neighbourhood of M asymptotically converge to M.
The construction of an observer of the form given in Definition 1 requires additional properties of the mapping β(η̂, y), as stated in the following result.

Theorem 1. Consider the system (13). Suppose that there exists a differentiable map β : Rⁿ × R^m → Rⁿ such that:

A1. For all η̂ and y, the matrix ∂β/∂η̂ (η̂, y) is invertible;
A2. the dynamics of the off-the-manifold error η̃ = β(η̂, y) − η has a (globally) asymptotically stable equilibrium at η̃ = 0, uniformly in η̂ and y.

Then, the system (14), with α(η̂, y) chosen so that the manifold M is invariant, is a (global) observer for (13).
The proof of this Theorem is presented in Appendix A. The result of Theorem 1 is used to design the velocity observer and the scale factor estimator.
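To make the recipe of Theorem 1 concrete, the following sketch applies it to a scalar toy problem that is not the quadrotor model: estimating the velocity v of a double integrator from the measured position x with known input u, using the linear map β(ξ, x) = ξ + γx. The off-the-manifold error then contracts at rate γ:

```python
import numpy as np

# Minimal I&I-style reduced-order observer (illustrative toy example, not
# the paper's quadrotor observer). Plant: x_dot = v, v_dot = u, with x
# measured, u known, v unmeasured. Choosing beta(xi, x) = xi + gamma*x and
#   xi_dot = u - gamma * (xi + gamma * x)
# makes the error z = beta(xi, x) - v obey z_dot = -gamma * z.
gamma = 5.0
dt, T = 1e-3, 2.0
x, v = 0.0, 1.0    # true states; v is unknown to the observer
xi = 0.0           # observer state, so v_hat(0) = gamma * x(0) = 0

for k in range(int(T / dt)):
    t = k * dt
    u = np.sin(t)                      # known input
    v_hat = xi + gamma * x             # estimate on the manifold map
    xi += dt * (u - gamma * v_hat)     # observer dynamics (Eq. (14) form)
    x, v = x + dt * v, v + dt * u      # plant dynamics (Eq. (13) form)

print(abs(xi + gamma * x - v))         # observer error, decays like exp(-gamma*t)
```

Here ∂β/∂ξ = 1 is trivially invertible (assumption A1), and the error dynamics ż = −γz is globally asymptotically stable (assumption A2), so the construction satisfies both hypotheses of Theorem 1.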

Observer and Estimator Design
This section discusses how the observer and the estimator that reconstruct V b and K, respectively, are designed from the available measurements of acceleration, scaled position, attitude, angular velocity, and vertical speed.

Observation and Estimation Problems
The observation problem is stated in the following terms. Assume that the outputs y_i, i = 1, . . . , 5, are measurable. Design two dynamic systems, namely, of the form

d/dt V̂^b = α₁(·),   d/dt K̂ = α₂(·),

where V̂^b ∈ R³ and K̂ ∈ R³, such that two functions exist, β₁ ∈ R³ and β₂ ∈ R³, depending only on the available information, for which the following identities asymptotically hold:

lim_{t→∞} β₁ = V^b,   lim_{t→∞} β₂ = K.   (17)

Velocity Observer
According to the immersion and invariance technique, the observation error is defined as

Ṽ^b = β₁(V̂^b, σ) − V^b,   (18)

with β₁ defined element-wise in terms of the observer state V̂^b and the auxiliary state σ. Equation (18) models the distance to the manifold M of Definition 1, on which the velocity V^b equals β₁(V̂^b, σ). This distance must asymptotically converge to zero to complete the observer design. Note that the output y₂ is not used directly, since the time derivative of Ṽ^b would require the computation of ẏ₂; this is the reason why the new state σ, whose dynamics is given in (20), is introduced.
The time derivative of Ṽ^b follows from (18). Substituting d/dt V̂^b, the body velocity dynamics V̇^b, and the time derivative of σ from Equations (12), (18), and (20), respectively, yields the error dynamics (22). Now, the observer state dynamics d/dt V̂^b is defined in terms of the known signals, as in (23). Substituting (23) into (22), one obtains (24). Consider the Lyapunov function ν_V = (1/2) Ṽ^bᵀ Ṽ^b to define the function β₁; its time derivative depends on the matrix (∂β₁/∂σ) H. Hence, to guarantee that the time derivative of ν_V is negative-definite, the matrix (∂β₁/∂σ) H must also be negative-definite. On the other hand, Equation (23) requires the matrix ∂β₁/∂V̂^b to be invertible. Note that selecting β₁ linear in V̂^b and σ, with γ_vx, γ_vy, and γ_vz positive gains, it follows that

ν̇_V = −Ṽ^bᵀ diag(h) Ṽ^b,

with h = [γ_vx µ/m, γ_vy µ/m, γ_vz]ᵀ and I₃ the 3 × 3 identity matrix. As a result, (∂β₁/∂σ) H is negative-definite and ∂β₁/∂V̂^b is invertible.

Scale Factor Estimator
In reference [25], the scale factor estimator needs the translational velocity as a measurable output; in this work, it is instead made available through the designed observer (27), according to (17). It is also important to note that β₂ depends on states expressed in mixed inertial and body coordinates, unlike β₁, which depends only on states expressed in body coordinates. In order to have all the states expressed in inertial coordinates, β₁ is transformed with the rotation matrix. The inertial velocity estimate is introduced as V̂^i = R β₁. Additionally, the inertial velocity observer error Ṽ^i is defined in (31).

Now, the scale factor estimation error is defined as

K̃ = β₂ − K,   (33)

with β₂ defined element-wise. The time derivative of the estimation error is given in (35). From Equations (1) and (2), it follows that Ẋ_s = diag(K) R V^b (36). By combining Equations (18) and (32), one obtains (37). Substituting (36) and (37) into (35), one obtains (38). Now, substituting K from (33) and V^i from (31), the scale factor estimation error dynamics becomes (40). The dynamics of the scale factor estimator state K̂ is defined in terms of the known signals in (41). After substituting d/dt K̂ into the scale factor estimator error dynamics (40), it follows (42). Once again, the function β₂ needs to be defined to ensure that the estimation error K̃ converges to zero with ∂β₂/∂K̂ an invertible matrix. Thus, a vector function β₂, linear in K̂, is proposed in (43), with γ_ki, i = x, y, z, the scale factor estimator gains. Replacing (43) into (42), one obtains (45), where (3), (4), and (33) have been considered. From (31), it follows (46). The following assumptions are considered to state the main result of this paper.

Assumption 1.
The following identity holds, with I ∈ R^{3×3} the identity matrix.

Assumption 2. There exist control inputs T_T and M^b such that the quadrotor states can be upper bounded; that is, ‖V^b‖ ≤ κ₀ and ‖Ω‖ ≤ κ₁ for some not-necessarily-constant κ₀ and κ₁. The notation ‖·‖ stands for the Euclidean norm of a matrix or vector.
Remark 2. Assumption 1 is a persistence-of-excitation condition; fulfilling it implies that the quadrotor must move in order to estimate the scale factor successfully. Assumption 2 means that a control loop allows the quadrotor to fly stably; consequently, the quadrotor dynamics is forward complete.
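The role of the persistence-of-excitation condition can be illustrated with a scalar sketch. The update below is a gradient-type analogue of the scale estimator, not the paper's full I&I design: the SLAM output is x_s = k·x with k unknown, and the estimate converges because the sinusoidal trajectory keeps x² active; with x ≡ 0 (hover), the estimate would not move, exactly as observed in the simulations:

```python
import numpy as np

# Scalar analogue of scale-factor estimation under persistent excitation
# (illustrative sketch; gains and scale value chosen arbitrarily).
# Measurement model: x_s = k_true * x. Gradient update on the prediction
# error: k_hat_dot = gamma * x * (x_s - k_hat * x), so the estimation
# error obeys k_tilde_dot = -gamma * x(t)^2 * k_tilde.
gamma, k_true = 2.0, 0.65
dt, T = 1e-3, 20.0
k_hat = 0.0

for i in range(int(T / dt)):
    t = i * dt
    x = 1.5 * np.cos(0.8 * t)      # exciting trajectory (Assumption 1 holds)
    x_s = k_true * x               # scaled position from the monocular algorithm
    k_hat += dt * gamma * x * (x_s - k_hat * x)

print(k_hat)   # converges toward k_true
```

If the excitation line is replaced by `x = 0.0`, the update term vanishes and `k_hat` stays at its initial value, which is the scalar picture of why the quadrotor must fly an exciting trajectory before the scale factor can be trusted.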
The following Proposition summarizes the main result of this work.

Proposition 1. Consider the quadrotor dynamics (1) with the outputs y_i, i = 1, . . . , 5, together with the velocity observer and the scale factor estimator. Under Assumptions 1 and 2, the observer error Ṽ^b and the estimation error K̃ locally asymptotically converge to zero.

The proof of this Proposition is reported in Appendix B.

Numerical Simulations
A numerical simulation study was performed on different platforms to evaluate the performance of the observer and the estimator.

Matlab-Simulink
The first simulation was performed using Matlab-Simulink in order to avoid issues such as sensor noise and external disturbances, so that the estimators could be evaluated by themselves. A program was designed to simulate a quadrotor in closed loop with the controller developed in [28], tracking a circular trajectory on the Cartesian plane and a sinusoid on the vertical axis. To fulfill Assumption 1, the desired trajectories were x_d = A cos(ωt), y_d = A sin(ωt), and z_d = A cos(ωt). For the velocity observer, the initial condition was V̂^b(0) = [−0.2 0.3 0.2]ᵀ, with a proposed µ = 0.6 and gains γ_vx = 1.2, γ_vy = 1.2, and γ_vz = 1.2. Regarding the monocular-vision positioning algorithm, any real values for the scale factor K can be used in this numerical simulation; nevertheless, in more realistic simulations, such as the ones in the next section, it is observed that the scaled position is always smaller than the real position, X_s < X, so K < 1; hence, the real values of the scale factor for this simulation were fixed at K = [0.65 0.7 0.55]ᵀ. The gains used for the estimator were γ_kx = 2.0, γ_ky = 2.0, and γ_kz = 2.0. Figure 2 shows the velocity observer error Ṽ^b for this simulation, where it can be seen that the velocity errors converge to zero correctly. Note that the velocity error on the Z axis, w̃, is always zero because the speed on this axis is measurable, (11). Figure 3 shows the scale factor estimator error K̃, which also confirms the correct convergence of the estimator. In this graph, the cascade behavior of (46) with (24) can also be seen, which means that K̃ will always converge after Ṽ^b due to the interconnection term Ψ.

Gazebo
After confirming their proper operation in a controlled setting, the observers are tested in a simulator closer to reality: Gazebo. Gazebo is an open-source 3D robotics simulator. It incorporates the open dynamics engine (ODE) as a physics engine, uses OpenGL for graphics, and provides support code for sensor simulation and actuator control. The robot operating system (ROS) and the Gazebo simulator are fully integrated.
In the previous simulation, it was easy to set simulation values for the constant µ and the scaled position X s delivered by a monocular-vision algorithm. In the Gazebo simulation, such values will have to be treated more rigorously as would be performed in an actual experiment.

Calculation of µ
µ is a term related to the drag coefficient: a positive constant representing a combination of the profile and induced drag forces on the rotors, known in the helicopter literature as "rotor drag" [27,29]. Like any other parameter of the quadrotor, such as its mass m, it must be measured before the experiments. Due to the units of this parameter, listed in Table 1, it is also known as a "mass flow rate".
From Equation (8), it can be seen that

a^b_x = −(µ/m) u,   a^b_y = −(µ/m) v.   (48)

From either of these two equations, the constant µ can be measured by having access to the accelerometer and translational velocity measurements; for example,

µ = −m a^b_x / u.

We have access to both of these measurements in Gazebo, so a circular trajectory-tracking simulation was performed to measure these states. The calculated value for the quadrotor used in Gazebo is µ = 0.18.
With µ calculated, Figure 4 shows Equation (48) evaluated on the X axis while the quadrotor follows a circular trajectory; it can be seen that the relation holds, so the calculated µ is correct. Note that the accelerometer readings a^b_x contain noise induced by the rotors.
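The µ-measurement procedure can be sketched as a least-squares fit over logged accelerometer and velocity data, which also averages out the rotor-induced noise. All values below are synthetic placeholders; only the final Gazebo value µ = 0.18 comes from the text:

```python
import numpy as np

# Least-squares estimate of the rotor-drag parameter mu from the relation
#   a_b_x = -(mu / m) * u    (Equation (48), body X axis).
# The logs below are synthetic; mass and noise level are placeholders.
rng = np.random.default_rng(0)
m, mu_true = 1.5, 0.18
t = np.linspace(0.0, 10.0, 1000)
u = 2.0 * np.sin(0.5 * t)                                       # body-velocity log
a_x = -(mu_true / m) * u + 0.02 * rng.standard_normal(t.size)   # noisy accelerometer

# Minimize sum (a_x + (mu/m) * u)^2 over mu:
mu_est = -m * np.dot(a_x, u) / np.dot(u, u)
print(mu_est)   # close to 0.18 despite the accelerometer noise
```

Using a whole trajectory instead of a single sample is what makes the fit robust to the rotor noise visible in Figure 4.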

Monocular-Vision Algorithm
As mentioned before, the designed scale factor estimator works with any computer-vision algorithm that implements a monocular camera and delivers position measurements. For this simulation, we use ORB-SLAM2 [15] due to its precision and accuracy; moreover, it is easy to install, well documented, and has ROS integration, so it is easy to add to the Gazebo environment.

Trajectory Tracking Control Using the Scale Factor Estimator
The simulation starts by flying the quadrotor, running ORB-SLAM2 on the visual information obtained by its onboard monocular camera, without position control along a diagonal trajectory in the horizontal plane, to obtain the excitation required to make the identities in (17) hold. After this step, the scale factor K is known through β₂, so the quadrotor is able to measure its actual position with y₁, (2). Then, the quadrotor follows a lemniscate trajectory autonomously (closed loop).
The low-level control of the quadrotor is handled by [30], a ROS driver for Parrot AR-Drone quadrotors that takes translational velocities as control inputs. Figure 5 shows the Gazebo environment with the SLAM algorithm running. The window on the left shows what the onboard camera sees, with the detected features (points), and the window on the right depicts the map construction.

Figures 8 and 9 show the scale factor estimation on the X and Y axes, respectively. Note that from 0 s to 6 s, the estimate remains at zero because, in that period, the quadrotor takes off and hovers for some seconds, so Assumption 1 is not fulfilled. In the period from 22 s to 60 s, some peaks appear because the quadrotor reaches the coordinates (3, 3) and (−3, −3), respectively; hence, it changes velocity abruptly to reverse its direction of movement toward the next point. These changes in velocity can also be seen in Figures 6 and 7 over the same time periods. After the open-loop routine, β₂ converges to k_x = 0.175 and k_y = 0.13. It can also be seen in Figures 8 and 9 that K̃ converges exponentially, as anticipated in (A7). From this point, knowing the scale factor K, the quadrotor is able to measure its actual position through y₁ in (2) as X = diag(K)⁻¹ X_s. Finally, using the estimated information, the quadrotor follows a lemniscate trajectory autonomously in closed loop; that is, x_d = 1.5 cos(ωt) and y_d = sin(2ωt). Figures 10 and 11 show the real position and the position measured by the quadrotor on the X and Y axes, respectively, along the lemniscate trajectory. As shown in Figure 12, the SLAM algorithm has better precision on the left part of the lemniscate because the camera is closer to the buildings, so more image features feed the algorithm. On the diagonals, the SLAM algorithm performs better when the quadrotor moves towards the buildings than when it moves away. Although the scale factor estimator recovers the actual position dimension, it does not in any sense improve the SLAM algorithm's performance.
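Once the estimates have converged, recovering the metric position from the scaled SLAM output is a per-axis division. A minimal sketch using the converged Gazebo values k_x = 0.175 and k_y = 0.13; the scaled positions fed in here are hypothetical:

```python
import numpy as np

# Recover the metric position from the scaled SLAM output, X = diag(K)^{-1} X_s,
# using the scale factors that the estimator converged to in the Gazebo run
# (k_x = 0.175, k_y = 0.13; the Z axis is measured directly).
k_hat = np.array([0.175, 0.13])
x_slam = np.array([0.35, -0.26])   # hypothetical scaled SLAM position [m]

x_metric = x_slam / k_hat          # per-axis division: diag(K)^{-1} X_s
print(x_metric)                    # -> [ 2. -2.]
```

Because the scale factor differs per axis, a single scalar correction would distort the recovered trajectory; the per-axis division is what makes the lemniscate tracking in Figures 10 and 11 possible.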
Changing the position of the house on the right in the Gazebo environment, we ran a second simulation to check whether small changes in the initial conditions of the monocular-vision algorithm influence the final value of the scale factor. In this case, the house on the right is closer to the quadrotor, as shown in Figure 13. At the end of the simulation, the calculated scale factor values were k_x = 0.232 and k_y = 0.166, confirming that the scale factor differs even in the same environment when the initial conditions change slightly.

Conclusions
This article proposed a velocity observer in cascade with a scale factor estimator. The significant contributions of this work are listed next:
• The velocity observer does not neglect the Coriolis term, offering greater accuracy in fast flights.
• The scale factor estimator allows taking advantage of all the benefits of monocular cameras, obtaining the accuracy of a stereoscopic camera without increasing the processing power.
• Lyapunov arguments prove asymptotic convergence to zero of the observer and estimator errors, and the simulations validate the correct performance of the proposed theory.
• It is illustrated that the scale factor is not the same on all axes, as some authors assume; it even differs between experiments in the same environment if the initial conditions change.
• The proposed approach allows position trajectory tracking to be performed directly from the measurements of a monocular-vision positioning algorithm, removing the limitations of a GPS or a motion capture system.
In future work, experiments will be carried out on a real quadrotor in more complex environments, combined with other kinds of computer-vision algorithms, such as person recognition or obstacle avoidance.

Conflicts of Interest:
The authors declare no conflict of interest.

Appendix B
The time derivative of the Lyapunov function (A4) along the trajectories of (A7) shows that, from Assumption 1 and by selecting Γ_k as a positive-definite matrix, the estimation error K̃ converges asymptotically to zero. The last step of the proof is to show that the interconnection term Ψ between the estimator and observer dynamics grows linearly with respect to K̃, which follows from Assumption 2.