Inertial Sensor Self-Calibration in a Visually-Aided Navigation Approach for a Micro-AUV

This paper presents a new solution for underwater observation, image recording, mapping and 3D reconstruction in shallow waters. The platform, designed as a research and testing tool, is based on a small underwater robot equipped with a MEMS-based IMU, two stereo cameras and a pressure sensor. The data given by the sensors are fused, adjusted and corrected in a multiplicative error state Kalman filter (MESKF), which returns a single vector with the pose and twist of the vehicle and the biases of the inertial sensors (the accelerometer and the gyroscope). The inclusion of these biases in the state vector permits their self-calibration and stabilization, improving the estimates of the robot orientation. Experiments in controlled underwater scenarios and in the sea have demonstrated a satisfactory performance and the capacity of the vehicle to operate in real environments and in real time.


Introduction and Related Work
In the last decade, technology applied to autonomous underwater vehicles (AUVs) has experienced a spectacular development. Scientific, industrial or security applications related to remote observation, object identification, wreck retrieval, mapping and rescuing, among others, have boosted such technological advances. AUVs are progressively becoming fundamental tools for underwater missions, minimizing human intervention in critical situations and automatizing as many procedures as possible. Despite this increasing interest and the great number of vehicles developed far and wide, their cost, size, weight and depth rate are the main factors that mostly restrict their commercialization and extensive use worldwide.
Compared to their larger counterparts, micro-AUVs are smaller, lighter and cheaper to acquire and maintain. These reasons make small AUVs a valuable tool for exploring and mapping shallow waters or cluttered aquatic environments, addressing a wide variety of applications. Conversely, the reduced size of these vehicles limits seriously the equipment that they can carry; thus, they have to be endowed with a reduced sensor set, which can meet, simultaneously, navigation and mission purposes.
The literature is very scarce in this kind of compact system. Smith et al. [1] proposed a micro-AUV to perform inspection, hydrographic surveying and object identification in shallow waters. Watson et al. [2] detail the design of a micro-AUV specially designed to work in swarms, forming mobile underwater sensor networks for monitoring nuclear storage ponds. The hull of this micro-AUV consists of a hemisphere of 150 mm in diameter, and it contains several mini acoustic sensors, a pressure sensor, a gyroscope and a magnetic compass. Wick et al. presented in [3] a miniature AUV for minefield mapping. This robot can work alone or in groups to perform cooperative activities. The dimensions of this vehicle are 60 cm in length by 9 cm in diameter, and it is equipped with a compass, a gyroscope, an accelerometer and a pressure sensor. Both vehicles ( [2,3]) have a limited data storage capacity and are mostly prepared to transmit the collected information to an external unit regularly.
Some micro-AUVs incorporate cameras inside their hull, but to the best of our knowledge, none of them have been endowed with two stereo rigs. For instance, Heriot Watt University designed mAUVe [4], a prototype micro-AUV that weighs only 3.5 kg and is equipped with an inertial unit, a pressure sensor and a camera. There are mini-AUVs currently operative that incorporate vision as an important sensor, but all of them have higher dimensions and weight (for instance, DAGON [5], REMUS [6] or SPARUS-II [7]).
In micro-AUVs, navigation, guidance and control modules are usually approached from the point of view of hydrodynamic and kinematic models, since the power of the thrusters is limited by their small volume and light weight, and they are easily influenced by waves or currents [8,9]. These models generate complex non-linear equation systems that usually depend on the structure of the vehicle (mass and shape) and on the external forces that act on it (gravity, buoyancy, hydrodynamics, thrusters and fins) [10]. As a consequence, these models have to be reconsidered if any significant modification is made to the vehicle, adding a considerable complexity to the system without producing, sometimes, the expected results [11]. Furthermore, a control system optimization with respect to its propelling infrastructure and its hydrodynamic behavior is unavoidable, especially if the energy storage and the power supply are limited [12]. An inefficient control strategy will have negative effects on the useful tool for underwater image recording, observation, environment 3D reconstruction and mosaicking in shallow waters and in aquatic environments of limited space.
The goal was to implement a vehicle fulfilling a set of key factors and requirements that differentiated it from other micro-AUVs that incorporated some of the same facilities. Fugu-C has the advantage of a micro-AUV in size and weight, but with additional strengths typical of bigger vehicles.
Its most important features are: (1) The navigation and guidance modules are based uniquely on the sensor suite. The vehicle is equipped with two stereo rigs, one IMU and a pressure sensor. This paper pays special attention to the navigation filter used in Fugu-C to fuse all of the data coming from all of the navigation sensors. The filter is a particularization of an ESKF approach and includes some strategic points present in similar solutions described in [21][22][23][24]. Similarly to [21], the state vector includes not only the pose and velocity of the vehicle, but also the biases of the inertial sensors, permitting a rapid compensation of these systematic errors accumulated in the estimates of the vehicle orientation.
As for all of these aforementioned pieces of work, this is a multiplicative filter, which means that all orientations are expressed in the quaternion space, avoiding errors due to attitude singularities. Contrary to other similar references, this filter has a general design prepared to work with vehicles with slow or fast dynamics (AUVs or unmanned aerial vehicles (UAV), respectively). Similarly to [23], the error state is reset after each filter iteration, avoiding the propagation of two state vectors and limiting the covariance variations to the error dynamics. While the other referenced ESKF-based solutions assume non-linear and sometimes complex observation models, we propose to simplify the definition of the predicted measurement errors in the Kalman update stage, applying a linearized observation function, which returns the same values contained in the error state vector. Experimental results shown in Section 5 validate our proposal. Furthermore, the implementation in C++ and its wrapper in the robot operating system (ROS, [25]) are published in a public repository for its use and evaluation [26]. (2) Two stereo cameras point to orthogonal directions, covering wider portions of the environment simultaneously. AQUASENSOR [27] is a very similar video structure for underwater imaging. It can be attached to an underwater vehicle, but contrary to Fugu-C, it has only one stereo rig connected to the same hardware structure. The spatial configuration of both cameras is very important to reconstruct, in 3D, the larger parts of the environment with less exploration time and, thus, with less effort. Furthermore, while the down-looking camera can be used for navigation and 3D reconstruction, the frontal one can be used for obstacle detection and avoidance. (3) Fugu-C can be operated remotely or navigate autonomously describing lawn-mowingpaths at a constant depth or altitude. The autonomy of the vehicle is variable, depending on the capacity of the internal hard-drive and the powering battery. (4) The software computation capacity and its internal memory outperform the other cited micro-AUVs, which mostly collect data and send them to external computers to be processed.
This work is organized as follows: Section 2 describes the Fugu-C hardware and software platforms. Section 3 details the whole visual framework and functionalities. Section 4 outlines the modular navigation architecture and the detailed design of the error state Kalman filter. Section 5 presents experimental results concerning navigation and 3D reconstruction, and finally, Section 6 extracts some conclusions about the presented work and points out some details concerning forthcoming work.

Hardware
The Fugu-C structure consists of a cylindrical, transparent, sealed housing containing all of the hardware and four propellers, two horizontal and two vertical. The dimensions of the cylinder are 212 × 209 mm, and the motor supports are 120 mm long. Despite the fact that the propeller configuration could provide the vehicle with four DOFs, surge, heave, roll and yaw, due to the vehicle dynamics, it is almost passively stable in roll and pitch. Consequently, the robot has three practical DOFs, surge, heave and yaw: (x, z, yaw). Figure 1a shows the 3D CAD model design of the vehicle, Figure 1b shows the hardware and the supporting structure, Figure 1c  Fugu-C can operate as an AUV, but it can also be teleoperated as a remotely operated vehicle (ROV) with a low-cost joystick. Its hardware includes (see Figure 2): • Two IEEE-1394a stereo cameras, one of them oriented toward the bottom, whose lenses provide a 97 • horizontal field of view (HFOV), and another one oriented toward the forward direction with a 66 • HFOV. • A 95-Wh polymer Li-Ion battery pack. With this battery, Fugu-C has an autonomy for about 3 h, depending on how much processing is done during the mission. For an autonomous mission with self-localization, the autonomy is reduced to two hours. • A PC104 board based on an Atom N450 at 1.66 GHz with a 128-GB SSD 2.5-inch hard drive.
• A three port 800 FireWire PC104 card to connect the cameras to the main computer board.
• A nano IMU from Memsense, which provides triaxial acceleration, angular rate (gyro) and magnetic field data [28]. • A power supply management card, which permits turning on/off the system, charging the internal battery and the selection between the external power supply or the internal battery. • A microcontroller-based serial motor board that manages some water leak detectors, a pressure sensor and four DC motor drivers.
• An external buoy containing a USB WiFi antenna for wireless access from external computers.
• A set of DC-DC converters to supply independent power to the different robot modules.
• Four geared, brushed DC motors with 1170 rpm of nominal speed and 0.54 kg·cm torque.
A 16-pin watertight socket provides the connectivity for wired Ethernet, external power, battery changing and a WiFi buoy. Note that not all connections are needed during an experiment. If the wired Ethernet is used (tethered ROV in use), then the buoy is not connected. Alternatively, if the buoy is connected (wireless ROV/AUV), then the wired Ethernet remains disconnected.
The autonomy of the robot when operating in AUV mode with the current type of battery has been calculated taking into account that its average power consumption at maximum processing (all sensors on, including both cameras, and all of the electronics and drivers operative) and with the motors engaged is around 44.5 W, so the battery would last (96 W·h)/(44.5 W) = 2.15 h at full charge. If it was necessary to include two LED bulbs (10 W) attached to the vehicle enclosure to operate in environments with poor illumination conditions, the internal power consumption would increase to 64.5 W and the battery would last (96 W·h)/(64.5 W) = 1.49 h.
The robot can operate in shallow waters up to 10 m in depth. However, another version for deeper waters could be made by thickening the acrylic cylinder.
Experiments showed an average speed in surge of 0.5 m/s and an average speed in heave of 1 m/s.  Figure 2. The schematic of the Fugu-C hardware connection.

Software
All of the software was developed and run using ROS middleware [25] under GNU/Linux. Using ROS makes the software integration and interaction easier, improves their maintainability and sharing and strengthens the data integrity of the whole system. The ROS packages installed in Fugu-C can be classified into three categories: internal state management functions, vision functions and navigation functions.
The internal state management function continuously checks the state of the water leak sensors, the temperature sensor and the power level. The activation of any of these three alarms induces the system to publish a ROS topic-type alarm, to terminate all running software and to shutdown the horizontal motors, while it activates the vertical ones to launch the vehicle quickly up to the surface.
The visual functions include image grabbing, stereo odometry, 3D point cloud computation and environment reconstruction, which is explained in Section 3.
The navigation functions include the MESKF for sensor fusion and pose estimation and all of the modules for twist and depth control, and it is detailed in Section 4.

Image Grabbing and Processing
In order to save transmission bandwidth, raw images provided by any of the stereo cameras are encoded using a Bayer filter and sent directly to the computer, where they are processed as follows: (1) First, RGB and gray-scale stereo images are recovered by applying a debayer interpolation.
(2) Then, these images are rectified to compensate for the stereo camera misalignment and the lens and cover distortion; the intrinsic and extrinsic camera parameters for the rectification are obtained in a previous calibration process. (3) Optionally, they are downscaled from the original resolution (1024 × 768) to a parameterizable lower rate (power of two divisions); this downsampling is done only if there is a need to monitor the images from remote computers without saturating the Ethernet communications. Normally, a compressed 512 × 384 px image is enough to pilot the vehicle as an ROV without compromising the network capacity. If the vehicle is being operated autonomously, this last step is not performed. (4) Following the rectification, disparity calculation is done using OpenCV's block matching algorithm to finally project these disparity images as 3D points. (5) Odometry is calculated from the bottom-looking camera with the LibViso2 library. In our experiments, the forward-looking camera has been used only for monitoring and obstacle detection, although it could also be used to compute the frontal 3D structure.

Visual Odometry
Stereo visual odometry is used to estimate the displacement of Fugu-C and its orientation. Many sophisticated visual odometry approaches can be found in the literature [29]. An appropriate approach for real-time applications that uses a relatively high image frame rate is the one provided by the Library for Visual Odometry 2 (LibViso2) [30]. This approach has shown good performance and limited drift in translation and rotation, in controlled and in real underwater scenarios [31].
The main advantages derived from LibViso2 and that make the algorithm suitable for real-time underwater applications are three-fold: (1) it simplifies the feature detection and tracking process, accelerating the overall procedure; in our system, the odometer can be run at 10 fps using images with a resolution of 1024 × 768 pixels; (2) a pure stereo process is used for motion estimation, facilitating its integration and/or reutilization with the module that implements the 3D reconstruction; (3) the great number of feature matches found by the library at each consecutive stereo image makes it possible to deal with high resolution images, which is an advantage to increase the reliability of the motion estimates.

Point Clouds and 3D Reconstruction
3D models of natural sea-floor or underwater installations with strong relief are a very important source of information for scientists and engineers. Visual photo-mosaics are a highly valuable tool for building those models with a high degree of reliability [32]. However, they need a lot of time to be obtained, and they are rarely applicable online. On the contrary, the concatenation of successive point clouds registered by accurate pose estimates permits building and watching the environment online and in 3D, but with a lower level of detail. Thus, photo-mosaicking and point cloud concatenation can be viewed at present as complementary instruments for optic-based underwater exploration and study.
Dense stereo point clouds computed from the disparity maps [33] are provided by Fugu-C at the same rate as the image grabber. They are accumulated to build a dense 3D model of the underwater environment, visible online if a high rate connection between the vehicle and an external computer is established. The concatenation and meshing of the partial reconstructed regions must be done according to the robot pose. Although the odometric position (x, y, z) accumulates drift during relatively long routes, the system can reconstruct, with a high degree of reliability, static objects or short-medium stereo sequences, if no important errors in orientation are accumulated. However, if the orientation of the vehicle diverges from the ground truth, especially in roll and pitch, the successive point clouds will be concatenated with evident misalignments, presenting differences in inclination between them, thus distorting the final 3D result.
In order to increase the reliability of the reconstructed areas in longer trajectories and to minimize the effect of the orientation errors in the 3D models, we have implemented a generic ESKF, which corrects the vehicle attitude estimated by the visual odometer in the three rotation axis and minimizes the effect of the gyroscope drift.

Architecture Pipeline
The navigation architecture of Fugu-C is designed at three concatenated levels: (1) a twist controller; (2) a depth controller; and (3) a motion planner.
The twist control module is, in turn, implemented in three interdependent layers, namely: • Motor layer: the orders directly sent to the motor board node, in charge of interacting with the four propellers. • Wrench layer: translational (thrust) and angular (torque) forces calculated to move the vehicle in a desired direction with a certain speed are the inputs (wrench levels) of the thruster allocation module. Wrench levels are transformed into motor levels according to the affine thruster model and through a thruster allocation matrix (TAM) [34]. • Twist layer: twist includes the linear and angular velocities. The inputs of this module are the velocity setpoint and the current velocity. This module performs a basic proportional integral derivative (PID) controller for each degree of freedom (x, y, z; roll, pitch, yaw) which outputs a wrench level, containing the necessary force and torque to reach the requested twist.
During the experiments with Fugu-C in the pool and in the sea, the PID gains were experimentally adjusted in order to obtain an adequate response of the vehicle in each scenario, avoiding instabilities, oscillations or steady-state errors. A systematic calculation of the PID gains is included in an upcoming work, but excluded in the scope of this paper. The depth module is also a basic PID that outputs a wrench level in the zaxis from two inputs: the current depth and the depth setpoint. The PID gains of these two additional modules are also tuned experimentally and online during each test.
The motion planner module is in charge of generating the twist set points for the twist controller. It can be either a joystick to control the vehicle remotely or a software module that generates automatically the needed twist, depending on the location of the goal point and the current vehicle state in terms of pose and velocity.

Overview
Measurements of inertial sensors always include a certain systematic error, the so-called bias. For example, in the case of gyros, bias is the return of the sensor when the body is rotationless. The desirable situation would be to expect a fixed bias, but unfortunately, biases tend to vary, for instance, with temperature, voltage, due to noise and over time (in gyros, the time varying error is commonly known as drift). The accumulated drift, sample after sample, generates an important deviation of the final pose estimation integrated from the INS measurements with respect to the ground truth. Some manufacturers give an estimation of these systematic errors, assuming they are constant, but in practice, they are not. Consequently, it turns out to be very difficult to compensate for their effect. The proposed navigation filter includes, in its state vector, the systematic errors of the inertial sensors, assuming, initially, that those values will be constant. The results will show how their stabilized values are bounded and small, permitting their easy compensation. The effects of this compensation are especially evident in the estimation of the vehicle orientation, as shown later in Section 5.
Representing the vehicle orientation in the quaternion space is essential in vehicles with six DOF to prevent singularities in the attitude estimation [35]. However, the quaternion 4 × 4 covariance matrix has rank three, due to the unit norm constraint, and, thus, can present singularities. One alternative to solve this problem is to represent the vehicle attitude as a quaternion and the attitude error as a three-component rotation vector [23].
The data coming from the pressure sensor, the IMU and the camera were integrated in a multiplicative (quaternion-based) error state Kalman filter (MESKF). The pose and twist outputted by this filter were, in turn, used as inputs for the motion planner and for the twist controller described above. Although Fugu-C has, at the moment, only three DOF (surge, heave and yaw), the MESKF has a generic design, permitting the attachment of as many sensors as needed and ready to work in vehicles with up to six DOF. A brief outline of this filter was firstly presented by some of the same authors in [36]. Here, an extended and detailed description is offered to the reader, while the results of its application in Fugu-C are shown in Section 5.2.
The EKF error state formulation is supported by the fact that the predicted state vector at the instant k can be expressed as the true state vector plus a certain error. BeingX k the true state vector, x k the predicted state vector and δx k the error state vector: According to Kalman, the predicted state vector at instant (k + 1) (e.g.,X k+1 ), assuming a linear system and no control variables, can be expressed as: where A is the state transition model and n represents the process noise. If Equation (2) is expressed in terms of Equation (1), then: Assuming that the vector state has no error or noise, the state transition can be expressed as: and the error transition as: Similarly to [23], the Kalman system will update the error vector, which will be added to the state vector, giving the appearance that the update is applied directly to it.

State Vector and Error State Vector
Two vectors are involved in the process, the nominal state and the error state. Let x k be the nominal state vector at time k: where p k is the position of the vehicle with respect to the inertial global frame, q k represents its attitude as a quaternion, v k is the robot linear local velocity (unlike [21], where the velocity is with reference to the global inertial frame), ω k is the angular rate, a k is the linear local acceleration and d k and b k denote the drift of the gyroscope and the bias of the accelerometer, respectively. The biases present in the acceleration and in the angular rate given by the inertial sensors are assumed to be constant. It is assumed that the nominal state does not include noise or model imperfections.
Let δx k be the error state vector at time k, including the errors caused by sensor biases and random noise: where δp k represent the error in position, δq k is the error in attitude in the form of a rotation vector, δv k is the error in the linear local velocity and δω k with δa k are the errors in the angular rate and in the linear local acceleration, respectively, which are also assumed to be constant. Errors in acceleration and in angular rate include sensor biases and noise.
An EKF framework usually consists of two steps, a prediction and an update [37]. However, this navigation filter proceeds in four steps:

Nominal State Prediction
The readings of the inertial sensors are used at this stage to estimate the relative vehicle displacement (translation and rotation) between two filter iterations. The inertial data contain a certain bias and noise, assumed to be additive and zero-mean Gaussian.
Let ω be the true vehicle angular rate given by [23]. At time k: where ω m , d and n ω denote, respectively, the sensor readings, the drift and the noise of the gyroscope. The readings of the inertial sensors are usually referenced to the body frame, and the acceleration can include the locally-referenced gravity in the vertical axis. Let us assume, without loss of generality, that the inertial sensor is approximately located in the center of the vehicle, the centripetal acceleration being negligible. At time k: where a m is the sensor reading, b is the accelerometer bias, n a is its noise and a is the real body acceleration at time t. The gravity, g L , is expressed with respect to the body frame, and R is the rotation matrix from local to global coordinates (corresponding to the quaternion q, the vehicle attitude); so R is the transposed rotation matrix of R, permitting the transform from global to local coordinates, and g = (0, 0, −g), being g = 9.81 m/s 2 . Note that v k , ω k , a k , d k and b k are referenced to the local coordinate system (e.g., the system of coordinates attached to the vehicle), and p k and q k are referenced to the inertial frame (e.g., global frame). The coordinate system conventions and notations are illustrated in Figure 4. In this particular implementation, the discrete equations that govern the nominal state are based on a general constant acceleration motion. Note that x (k−1) ≡ x (k−1)|(k−1) and that the estimated state at instant k dependent on the estimated state at instant k − 1 is denoted as where ∆t is the time interval between the current and the predicted nominal state, * represents the quaternion product and q m is the quaternion that represents the vehicle angular motion occurring during ∆t.
It is important to observe that, since the acceleration a k and the linear velocity v k are referenced to the local body frame, the term g L present in the expression of a k transforms the gravity expressed with respect to the inertial frame into the body frame. However, the accumulated position p k must be given with respect to the inertial global frame. As a consequence, the terms R (k−1) v (k−1) and R (k−1) a (k−1) present in the expression of p k denote the transformation of the velocity and the acceleration from the body frame to the inertial global frame.

Error State Prediction
The error is defined as the difference between the estimate of a certain variable and its real value. Let δx k be the estimation of the error state vector at time k: where f (x k ) is the general set of functions defined to predict the error state and k = N (0, Q k ) is the zero-mean Gaussian error of the prediction model, Q k being the model noise covariance. Let δã k , δω k , δp k , δq k , δṽ k , be the estimated errors in acceleration, angular velocity, position, attitude and linear velocity, all at time k. The discrete equations f (x k|(k−1) ) that govern the error state estimation are: where R m is the rotation matrix corresponding to ω m ∆t, also depending on time (k − 1). Note that δã k|(k−1) and δω k|(k−1) are supposed as constant, so that the filter can find the constant bias and drift of the sensors. V is the skew symmetric matrix of the velocity, in the form: I 3×3 is the three by three identity matrix and ⊗ represents the cross product. Derivation of Equation (36) can be found in [35], and derivations of Equations (32) and (33) can be found in the Appendix of this paper. As is exposed in Section 4.2.6, the values forming the error state are all reset after the nominal state has been corrected, and thereby, the values computed by Equations (32) to (36) are always zero. However, the corresponding Jacobian F k = ∂f (x k ) ∂δx k | δx k , at iteration k is different from zero. Therefore, the system covariance matrix P k can be propagated according to the Kalman equation:

Error Kalman Update
Let us denote the measurement error δz k as the difference between a sensor reading z k and its predictionz k contained in the nominal state: According to a standard EKF, the observation function h(δx k ) will be used to compute the predicted error measurement from the predicted error state. In this particular design of ESKF, and similarly to [21], the observation function is assumed to be a linear approximation and stated as h(δx k ) = H k δx k|(k−1) , H k being the identity matrix. This is the same as assuming that, by definition, the measurement error is predicted as exactly the value of the estimated error state. Then, the innovation can be written as: being δz k the vector of measurement errors. In this step, the classical Kalman update equations are used to obtain an updated error state vector (δx k ) and its covariance, now both different from zero: S k being the innovation covariance, K k the Kalman gain and R k the measurements uncertainty. Note the difference between this R k and the rotation matrix R defined in Section 4.2.3.

Nominal State Correction and Error Reset
The values obtained in the updated error state vector are used to correct the nominal state vector by Equations (45)-(51).
It is important to notice that quaternion(δq k|k ) represents the quaternion corresponding to the rotation vector δq k|k . Contrary to the rest of nominal variables, the angular velocity is decremented with its corresponding error. Equation (48) is derived from Equation (20), where the true vehicle angular rate is expressed in terms of the measurement, drift and noise: now with time indexes, and underestimating the error term, substituting d k|k with the correction Equation (50), but as there is only one measure per iteration, ω m k|k = ω m k|(k−1) , then substituting with the prediction term in Equation (26), This filter corrects the whole nominal state, including the biases of the inertial sensor, contrary to other solutions based on indirect Kalman filters that do not include sensor biases in the state vector [38], or where the acceleration is estimated externally of the filter state [19].
As indicated in [23], after this correction, all variables of the error state vector are reset to zero, and the algorithm starts again the next iteration in Step 1 with the corrected value of x k . This reset does not modify the covariance, since it does not alter the content of the Kalman filter estimates. It merely moves the data from one vector to the other, avoiding the need to propagate two state vectors and limiting the covariance variations to the error dynamics.

Measurement Delays
The different sensors that equip Fugu-C supply their data at different rates. While the operational frequency of an standard IMU can range from 100 Hz to 1 KHz, a visual odometer or a pressure sensor can work, typically, between 10 Hz and 25 Hz (depending on the capacity of the used processor).
Motion predictions are done at the IMU transmission rate, while updates must be run when one exteroceptive measurement is received. The lack of synchronism in the integration of the data coming from the diverse sensors, that is delays between the reception and the processing of the IMU or odometry messages, and delays in the execution of the prediction and update steps can introduce additional errors in the filtering process.
To assure that all of the measurements are incorporated synchronously to the filter and that every update is launched synchronously with its corresponding motion prediction, all of the sensor timestamps are referred to the same global timer. Furthermore, similarly to [22], all of the measurements are stored in FIFO buffers with their timestamp. Although every measurement can really be processed with a certain delay, the updates are executed, aligned in time with the corresponding prediction, which is the one performed with the IMU reading that has a time stamp that immediately precedes the time stamp of the currently-processed visual sample.
It is worth commenting that, although the dynamics of Fugu-C are quite slow and this problem would not really affect the vehicle control, the filter has a general design, and it is prepared also to work in vehicles with six DOF with faster dynamics. This issue is crucial, for instance, in UAVs.  Figure 5. The navigation and control architecture.
Our implementation of the MESKF and its ROS wrapper is available to the scientific community in a public repository [26]. Figure 5 shows a schematic overview of the ROS node-based navigation architecture described in this section.

Experimental Results
The experimental results are organized in experiments in a controlled environment, experiments in the sea and 3D reconstruction.

Navigation in a Controlled Environment
The first set of experiments to assess the navigation modules was carried out actuating the robot at a constant depth in a pool of 7 × 4 m. The bottom of this pool was covered with a printed digital image of a sea bottom environment, to simulate a marine-like context. In order to have a certain reference, the poster was also used to calculate the trajectory ground truth (see Figure 6). To that end, features of the online stereo images, captured from the down-looking camera, are matched with those features extracted offline from the original digital image of the poster. Afterwards, the 3D points computed from the online stereo pairs are re-projected onto the matching features of the original digital image, trying to minimize the global re-projection error. This is a particular application of the perspective-N-point (PNP) problem [39], and the result of the algorithm is the relative transformation between the current stereo image pair and the corresponding section in the original image of the poster. The algorithm assumes a flat ground and a perfect alignment of 90 • between the lens axis of both cameras and the ground surface. The precision in the computation of the ground truth under the ideal conditions is high, since the outliers are eliminated using random sample consensus (RANSAC) [40], and errors in the re-projection and optimization process are minimized. The origin of the global coordinate system is always set at the beginning of the trajectory. This point is located at the water surface (z = 0), so the depth computed from the pressure sensor is equivalent to the z > 0 axis. The initial 15 × 15 covariance matrix of the error state vector was set to zero in all of the positions, except in the diagonal, where all elements were set to small values (10 −2 ) to represent small uncertainties in the initial state vector.
Plot 7a shows the trajectory in (x,y,z) for the visual odometry (red), ground truth (blue) and for the filter estimates (green), and Plot 7c shows the same robot trajectory in the x-y plane.
Plot 7d shows the evolution of the accelerometer bias returned by the filter, and Plot 7e shows the evolution of the gyro drift. Figure 8 shows the vehicle attitude during Experiment 1, corresponding to the ground truth (Plot 8a), the visual odometry (Plot 8b) and to the filter (Plot 8c). Figure 9a,c shows, respectively, the 3D and 2D views of a sweeping trajectory run in the second experiment. Figure 9b shows the depth of the vehicle according to the visual odometry and the filter. Again, the origin of the trajectory is in the point (0, 0, 0). Plots 9d and 9e show the accelerometer bias and the gyro drift, respectively. Figure 10 shows the vehicle orientation in roll, pitch and yaw, according to the ground truth (Plot a), the visual odometry (Plot b) and the filter estimates (Plot c).

Results of both experiments point out the next conclusions:
• It can be observed from trajectories shown in Figures 7a and 9a that, although the data from the visual odometer present an important drift in z, the filter is able to correct this coordinate to a nearly constant value, using the pressure sensor information. The trajectory estimated by the filter is very close to the visual odometer, which is, in fact, much more reliable than the position estimated by integrating the inertial data. It is important to remark that the goal of the filter is not the pose correction. Since this is an approach for inertially-aided navigation, here, there are no external landmarks or loop closings used to improve the robot position further than the odometer estimates. Instead, the aim is to integrate all of the data coming from the navigation sensors in the most convenient way to, on the one hand, continuously obtain a vehicle pose that is as reliable as possible and, on the other hand, to stabilize the inertial data, compensating for their systematic biases. • Acceleration bias and gyro drift stabilize quickly to constant values (some of them close to zero), meaning a practical sensor calibration and permitting the compensation of the inertial systematic errors. The highest value in the gyro drift corresponds to approximately 0.015 rad in its x component, an imperceptible value compared with the typical drifts that standard inertial sensors and odometers are prone to reach. Since the vehicle is moving at, approximately, a constant depth, (oscillating slightly in z due to the PID vertical hysteresis), the acceleration in z is very small and so is its drift in z. • All of the trajectories estimated by the filter present, at their origin, a difference in orientation with respect to the ground truth. This difference is caused by the initial error in the angular rate. However, as this initial error is rapidly compensated for by the filter, it is not accumulated in the subsequent pose estimates. • As Fugu-C is passively stable in pitch and roll, motion in these two DOF should result in being insignificant, while rotations in yaw should reflect the heading changes at every turn. In all of the trajectories, the vehicle always moved only in surge, heave and yaw. Consequently, the values corresponding to pitch and roll should be very small. However, the values of pitch and roll estimated by the ground truth and by the visual odometer in Experiments 1 and 2, as shown in Figures 8a,b and 10a,b, present values significantly different from zero (with maxima in the odometry of 0.5 radians), which is incorrect. The ground truth also presents, erroneously, non-negligible values of roll and pitch. This is because the condition of flat ground is not completely fulfilled due to a slight inclination of the pool bottom with respect to the water surface. Furthermore, the alignment of the lens axis of the down-looking camera with the normal to the ground plane is not perfect. Conversely, the filter approaches the values of roll and pitch to zero during both trajectories, which is much closer to the vehicle's real behavior. The ground truth in these experiments performs well in (x, y), and it is a reliable reference of the vehicle trajectory in the horizontal plane. However, unfortunately, it presents some errors in the orientation estimates and in the vertical direction.

Navigation in the Sea
Other experiments were conducted in a shallow water area off the coast of Mallorca (Balearic Islands, Spain). The robot was actuated only in surge, heave and yaw, without any significant motion in roll or pitch. Fugu-C was programmed to navigate at a constant depth in four different trajectories: one L-shaped, running close to a harbor breakwater, a loop and two sweeping trajectories covering around 40 meters of length. The sea bottom in the experiment area had no significant slope and was mainly covered by rocks and algae. Local reliefs caused significant and frequent changes in the distance from the vehicle to the ground. Even so, the experiments demonstrate that the filter responds properly, compensating for the visual odometer drift with the pressure sensor data. For these experiments, there was no possibility to calculate a ground truth. Instead, several markers were put on the sea bottom to control the difference between the real markers' position and the estimated pose of the vehicle when it passed over them. All trajectories started at the same point, indicated with a marker. Nonetheless, the goal of the filter is not a pose correction, but there is the self-calibration of the inertial sensors.
Besides, taking advantage of the rosbag [25] technology offered by the ROS middleware, the results of Experiments 3, 4 and 5, obtained with our MESKF, were compared with the results obtained using a standard approach of multiplicative extended Kalman filter (MEKF). The node ekf_localization of the ROS package robot_localization [41] was used to that end. This node implements an multiplicative (orientations represented with quaternions) EKF that permits the integration of six-DOF nominal data coming from several sensors (in our case, the acceleration and the angular rate of an IMU, the pose and twist of a visual odometer and the depth given by the pressure sensor) and returns a refined state vector composed of pose and twist. In the standard EKF, the nominal variables, all affected by additive errors, are inserted in the Kalman filter, while in the MESKF, only the error state is filtered, but its result is added to the nominal state. Both systems should lead to similar results, because both include, integrate, estimate and evolve the error of the system model and the nominal data in different ways. However, the main differences between both approaches are: (1) no error variables are included in the EKF state vector; consequently, the systematic errors of the inertial sensors are not compensated for; (2) there are no significant differences in the estimated 3D robot trajectory; but, since the drift of the gyroscope is not corrected, appreciable irregularities in the returned orientations are obtained with the standard EKF. Figure 11 corresponds to an L-shaped trajectory (Experiment 3). The figure shows the trajectory in 3D and 2D, the depth, the acceleration bias, gyro drift and the robot orientation according to the visual odometer and the filter. This figure also shows the vehicle orientation returned by the ROS EKF package [41].   [41]. The length of both experiments exceeds 40 meters. Figure 14 shows four images captured during Experiments 3, 4 and 5. Figure 14d shows the artificial marker deposited at the sea bottom to indicate the starting/end point of the trajectory.
The results of these experiments lead to the next discussion: • The drift in z generated by the visual stereo odometer was corrected by including the pressure sensor data in the filter. In these experiments, the difference between the odometry and the filter estimates in the (x,y) plane is nearly zero. Some settings in the sea are slightly different than in the pool, for example the initial error state covariances and, especially, the odometry covariances (given directly by the visual odometer library and clearly influenced by the type of environment and, thus, the features found in it) and the covariances of the inertial data, which differ from those obtained in the pool, since the motion of the vehicle is conditioned by the slight sea current and the actions of the vehicle PID to compensate for this. • Trajectories given by the ROS EKF have not been printed, because they are nearly equal to the trajectories estimated by the visual odometer; similar to the odometer, the vehicle attitude estimated by the ROS EKF has non-negligible values, with some maxima and minima close to 0.5 rad, which are not adjusted to its real behavior. • The orientation of the vehicle in roll and pitch was clearly adjusted by the MESKF to values close to zero, thanks to the rapid compensation of the gyro drift. • The acceleration biases given by the MESKF approximate a constant; the desirable situation would be to obtain biases stabilized to zero for motions at constant speed; however, in the sea, accelerations in the x axis, which is the direction of progress, are prone to appear periodically, since the vehicle is constantly trying to overcome the effects of the currents by applying, from time to time, in the forward direction, a certain force given by the PID controller; this problem is not so evident in the pool, where it is much easier to keep the vehicle moving at a constant velocity without interferences.    (d) also shows the artificial marker that indicates the stating point. Figure 15 shows the evolution of the variables included in the error state vector during Experiment 4. All errors are bounded in a small range: position errors are bounded between ±10 cm; velocity errors are bounded between ±0.5 m/s (except a sample that reaches 1.5 m/s); orientation errors are bounded between approximately ±0.01 radians (0.6 • ); errors in angular rate are negligible; and errors in acceleration are bounded between approximately 0.02 rad/seg 2 and −0.015 rad/seg 2 . Values of errors in angular rate and orientation (which include biases and noise) are remarkably small, which means that, according to the previous discussions, attitudes estimated by the filter are highly reliable and much closer to reality than those given by the odometry and by the ground truth algorithm used in this work.

3D Reconstruction
Stereo video sequences grabbed in the Port of Valldemossa in the ROS-bag [25] format were played offline to simulate the process of building online the 3D map of the environment by concatenating the successive point clouds. The dense point cloud generation was performed at the same rate as the image grabber (10 frames/s), permitting the reconstruction of the environment in real time. The correction in the vehicle estimated attitude increases the precision in the assembly of these point clouds, resulting in a realistic 3D view of the scenario where the robot is moving. Figure 16a,b shows two different 3D views of the marine environment where Experiment 3 was performed. The successive point clouds were registered using the vehicle odometry pose estimates. Figure 16c,d shows two 3D views of the same environments, but registering the point clouds using the vehicle pose estimates provided by the MESKF. In all figures, the starting point and the direction of motion are indicated with a red circle and a red arrow, respectively. A marker was placed in the ground to indicate the starting/end point. Figure 17a,b shows two different 3D views built during Experiment 4, registering the successive point clouds with the vehicle odometry pose estimates. Figure 17c,d shows two 3D views of the same environments, but registering the point clouds using the vehicle pose estimates provided by the MESKF.
Again, in all figures, the starting point and the direction of motion are indicated with a red circle and a red arrow, respectively.  Figure 18 shows three images of Fugu-C navigating in this environment. In Figure 18c, the artificial marker deposited on the sea ground can be observed at the bottom of the image.
Raw point clouds are expressed with respect to the camera frame, but then, they are transformed to the global coordinates frame by composing their camera (local) coordinates with the estimated vehicle global coordinates. 3D maps of Figures 16a,b and 17a,b show clear misalignments between diverse point clouds. These misalignments are due to the values of the estimated roll, and pitch vehicle orientations are, at certain instants, significantly different from zero. As a consequence, the orientation of the point cloud has a value different from zero, causing them to be inclined and/or displaced with respect to those immediately contiguous and causing also the subsequent point clouds to be misaligned with respect to the horizontal plane. This effect is particularly evident in Figure 16a,b, where very few point clouds are parallel to the ground, most of them being displaced and oblique with respect to the ground. However, as the vehicle orientations in roll and pitch estimated by the MESKF are all approximately zero, all of the point clouds are nearly parallel to the ground plane, without any significant inclination in pitch/roll or important misalignment, and providing a highly realistic 3D reconstruction. Notice how the 3D views shown in Figures 16c,d and 17c,d coincide with the trajectories shown in Figures 11 and 12, respectively. The video uploaded in [42] shows different perspectives of the 3D map built from the dataset of Experiment 4, registering the point clouds with the odometry (at the left) and with the filter estimates (at the right). Observing the 3D reconstructions from different view points offers a better idea of how sloping, misaligned and displaced with respect to the ground, can be some of the point clouds, due to those values of roll and pitch different from zero. The improvement in the 3D map structure when using the filtered data is evident, as all of the point clouds are placed consecutively, totally aligned and parallel to the sea ground.
(a) (b) (c) Figure 18. Images (a-c) show three different views of Fugu-C navigating in the Port of Valldemossa.

Conclusions
This paper presents Fugu-C, a prototype micro-AUV especially designed for underwater image recording, observation and 3D mapping in shallow waters or in cluttered aquatic environments. Emphasis has been made on the vehicle structure, its multiple-layer navigation architecture and its capacity to reconstruct and map underwater environments in 3D. Fugu-C combines some of the advantages of a standard AUV with the characteristics of the micro-AUVs, outperforming other micro underwater vehicles in: (1) its ability to image the environment with two stereo cameras, one looking downwards and another one looking forward; (2) its computational and storage capacity; and (3) the possibility to integrate all of the sensorial data in a specially-designed MESKF that has multiple advantages.
The main benefits of using this aforementioned filter and their particularities with respect to other similar approaches are: (1) A general configuration permitting the integration of as many sensors as needed and applicable in any vehicle with six DOF. (2) It deals with two state vectors, the nominal and the error state; it represents all nominal orientations in quaternions to prevent singularities in the attitude estimation; however, the attitude errors are represented as rotation vectors to avoid singularities in the covariance matrices; the prediction model assumes a motion with constant acceleration. Extensive experimental results in controlled scenarios and in the sea have shown that the implemented navigation modules are adequate to maneuver the vehicle without oscillations or instabilities. Experiments have also evidenced that the designed navigation filter is able to compensate, online, the biases introduced by the inertial sensors and to correct errors in the vehicle z coordinate, as well as in the roll and pitch orientations estimated by a visual odometer. These corrections in the vehicle orientation are extremely important when concatenating stereo point clouds to form a realistic 3D view of the environment without any misalignment.
Furthermore, the implementation of the MESKF is available to the scientific community in a public repository [26].
Future work plans are oriented toward the following: (1) the aided inertial navigation approach presented in this paper is unable to correct the vehicle position in (x, y), since it is not using any technique to track environmental landmarks or to adjust the localization by means of closing loops; the use of stereo GraphSLAM [43] to correct the robot position estimated by the filter will be the next step, applying, afterwards, techniques for fine-point cloud registration when they present overlap; (2) obviously, the twist and depth simple PID controllers described in Section 4.1 could be changed by other, more sophisticated systems that take into account other considerations, such as external forces, hydrodynamic models and the relation with the vehicle thrusters and its autonomy; one of the points planned to be investigated in forthcoming work consists of trying to find a trade off between controlling the vehicle only with the navigation sensorial data and the incorporation of a minimal number to structural considerations.
whereã k is the estimated acceleration.
According to Equations (21) and (A1), it can be expressed as: v (k+1) =ṽ k + (R k g − a m +b k )∆t (A2) and according to Equations (1) and (A2), it can be expressed as: where all of the estimated variables have been substituted by the sum of their nominal and error values, except the rotation matrix, which has to be denoted as a product of the nominal rotation and the rotation error matrices. The rotation error matrix δR k is given by its Rodrigues formula [44], which, for small angles, can be approximated as: δR k ≈ I 3×3 + [K] x , where [K] x is a skew symmetric matrix that contains the rotation vector corresponding to δR k and I 3×3 is the 3 × 3 identity matrix.
Consequently, Equation (A3) can be expressed as: Segregating the error terms from both sides of Equation (A4), we obtain the expression to predict the error in the linear velocity: δv (k+1) = δv k + [K] x R k g∆t + δb k ∆t = δv k + [K] x g L ∆t + δb k ∆t (A5) Since [K] x contains the vector corresponding to the rotation error δR k and the acceleration bias is expressed as δa k in the error state vector, Equation (A5) can also be seen as: δv (k+1) = δv k − (g L ⊗ δq k )∆t + δa k ∆t (A6)
Assuming that errors are very small between two filter iterations, taking into account that [K] x contains the rotation vector error and representing any product by [K] x as a cross product, Equation (A12) can be re-formulated as: