Validation of Underwater Sensor Package Using Feature Based SLAM

Robotic vehicles working in new, unexplored environments must be able to locate themselves in the environment while constructing a picture of the objects in the environment that could act as obstacles that would prevent the vehicles from completing their desired tasks. In enclosed environments, underwater range sensors based off of acoustics suffer performance issues due to reflections. Additionally, their relatively high cost make them less than ideal for usage on low cost vehicles designed to be used underwater. In this paper we propose a sensor package composed of a downward facing camera, which is used to perform feature tracking based visual odometry, and a custom vision-based two dimensional rangefinder that can be used on low cost underwater unmanned vehicles. In order to examine the performance of this sensor package in a SLAM framework, experimental tests are performed using an unmanned ground vehicle and two feature based SLAM algorithms, the extended Kalman filter based approach and the Rao-Blackwellized, particle filter based approach, to validate the sensor package.


Introduction
In order for unmanned vehicles to operate with autonomy they must be able to navigate through unknown environments while avoiding obstacles. In order to accomplish this task, unmanned vehicles must use the sensors that they are equipped with to construct a picture of their surrounds, a map, and to determine where they are located within the environment. Vehicles that operate outdoors have many different types of sensors that can be used to accomplish this task. For example, satellite-based localization sensors (GPS) can provide vehicles with their location with centimeter accuracy, and obstacles in the environment can be mapped using laser-based rangefinder which can determine the distance to obstacles within millimeters. However, in some environments such as indoors, or in our case underwater, these types of sensors do not operate properly therefore they cannot be used. In underwater environments the most common types of sensors used for mapping a vehicles surrounds are based on acoustics. These sensor provide accurate sensor measurements when used in large open water environments, however, they can be expensive and are difficult to use in enclosed environments such as pools or underwater caves. To overcome the issues that arise with using sensors based on acoustics or radio waves, our goal is develop a sensor package that can be used on a small, low cost underwater vehicle designed specifically to operate in small, enclosed spaces.
In this paper, a proposed sensor suite is tested in order to verify that it can be used to complete mapping and localization tasks with the required accuracy. To verify the accuracy, two commonly used feature-based Simultaneous Localization and Mapping (SLAM) algorithms, the extended Kalman filter based solution (EKF SLAM) [1,2] and the approach that uses a Rao-Blackwellized particle filter (FastSLAM) [3], are used. These two approaches were selected as they are relatively easy to implement and there is a large amount of research available on each of the solutions. In order to make a proper comparison we mount our sensor package to a small ground vehicle and operate in inside of an enclosed indoor environment. We then compare the results of the SLAM algorithms implemented using our sensor suite to a ground truth that is generated using highly accurate sensors designed to be used in air.
One environment in which SLAM implementations are less common, as opposed to ground vehicles operating indoors and outdoors or aerial vehicles, is underwater environments. While the underwater environment is less common, there are still several notable implementations. In [4] a feature based SLAM method is developed for underwater environments. The system uses the Robust Extended Kalman Filter as opposed to the classic EKF. The authors of [5] developed a system for mapping and localizing a small exploratory Remotely Operated Vehicle (ROV) that operates in underground cistern systems. The ROV was equipped with a depth sensor, compass, and scanning sonar. A dynamic vehicle model is used to implement a FastSLAM algorithm using occupancy grids. The SLAM algorithm was not used in real-time, rather the vehicle was remotely controlled by a user and the collected data was used to determine the path that the vehicle travelled along with a map of the cistern environment after the run. An underwater SLAM approach is developed in [6] and based on the implementation of computer vision. The vehicle uses stereo cameras and a visual odometry system to estimate its motion. The authors implement the EKF and Unscented Kalman Filter (UKF) solution to the SLAM problem. However, a separate Rauch-Tung-Striebel Smoother is implemented to smooth the entire control and measurement histories. A SLAM system that uses low frequency high resolution images is developed in [7]. The system makes use of on board inertial sensors and a high resolution underwater camera to implement a SLAM approach based on the sparse extended information filter. The system is used to localize a ROV over a 3km track while mapping the RMS Titanic.
The remainder of this paper is organized as follows. In the following section some preliminary mathematical concepts are provided along with the mathematical notation used in this paper. In Section 3 the design of our prototype underwater rangefinder is discussed and the performance of the sensor is examined using experimental results. In Section 4 a visual odometry algorithm is presented that makes use of a downward facing camera to provide our unmanned underwater vehicle (UUV) with the information required to localize itself. In Section 5 our sensor suite is validated using EKF SLAM and in Section 6 they are validated using FastSLAM. Finally, concluding remarks and areas of future research are presented in Section 7.

Mathematical Preliminaries
In this section the mathematical notation used in this paper is presented along with some preliminary mathematical topics. There are three standard data types that are used throughout this paper. Scalar values are denoted by lower case italic values, for example x ∈ R denotes a scalar value which is a member of the set of real numbers R. One dimensional vectors are identified using bold lower case notation, for example x ∈ R n is a vector containing n ∈ N elements where N is the set of natural numbers. The kth element in a one dimensional vector is referenced as x(k). Two dimensional matrices are displayed using bold upper case notation, for example X ∈ R n×m is a matrix with n ∈ N rows and m ∈ N columns. An element in the matrix is referenced according to X (i, j) which represents the value stored in the matrix X in the ith row and the jth column. A row or column in a matrix are referenced by placing a ":" in the second dimension, for example the ith row of X is referenced as X(i, :).
In much of the existing literature, the SLAM problem is addressed in a probabilistic sense. In many cases we would like to estimate the probability, also referred to as the distribution, of some random variable x ∈ R and we denote the distribution of x as p (x). In many cases we use some additional information y ∈ R to tell us something about the random variable x. In this situation p (x) is referred to as the prior probability and it is all that we know about the probability of x without the inclusion of y, in many cases this is shortened to prior. The distribution of x with the inclusion of the data y is denoted p (x | y) and referred to as the posterior probability and in many cases is just referred to as the posterior.
In SLAM we are attempting to estimate the pose ξ k of our UUV and a map M k of the environment that surrounds it at some time step k > 0. In this paper we treat the world in which the UUV operates as a two dimensional plane, thus ξ k [ x k y k θ k ] T where [ x k y k ] ∈ R 2 are the horizontal and vertical position of the vehicle in some frame of reference and θ k ∈ (−π, π] is the heading of the UUV with respect to the positive horizontal axis in the frame of reference. From our use of occupancy grids, the estimated map is represented by a matrix M k ∈ R r×c where r ∈ N are the number of rows in the grid and c ∈ N are the number of columns. We assume that the environment that the UUV operates in is static, therefore during the time in which the UUV is performing the SLAM algorithm the environment does not change. Based on this assumption, to simplify the notation, the occupancy grid is denoted with M. The estimate produced by SLAM, in many cases, makes use of sensor measurements and control inputs. The set of sensor measurements z 1:k denotes the full set of sensor measurements for k > 0, thus z 1:k [z 1 , . . . , z k ] where z i ∈ R m , i = 1, 2, . . . , k and m ∈ N is the number of measurements in z i . In the same way the full set of control inputs are defined as u 1:k [u 1 , . . . , u k ] where u i ∈ R d , i = 1, 2, . . . , k and d ∈ N is the size of the control vector u i .

Camera and Laser Rangefinder
When considering unmanned vehicle applications, the ability to locate objects in the surrounding environment is important. Knowledge of the operating environment is required for complex tasks such as localization and mapping, and for simple yet fundamental tasks such as obstacle avoidance. The standard sensor used by underwater vehicles to "see" the environment is the Sound Navigation and Ranging (SONAR) sensor. SONAR sensors work well measuring distances in underwater environments, however in enclosed environments the use of SONAR can cause problems because the acoustic signal used to detect objects can bounce around the environment and be detected by the sensor multiple times. In this section the design of our prototype range finder is presented. Our sensor [8] is low cost and uses a single camera and a pair of laser line generators to measure distances. Similar sensors have been proposed in the literature, in [9,10] sensors using a single laser pointer projecting a dot are presented. These sensors are only able to measure the distance to a single location directly in front of the camera and both of the designs rely heavily on calibration routines that map the laser pointer's location in the image to a distance. Our sensor has two advantages over the previously developed sensors: first, by using laser lines as opposed to a laser point multiple distance measurements can be made and second, the calibration requirement is removed by using two laser line generators mounted in parallel at a set distance apart.

Physical Layout
The physical design of the sensor comes from the structured light approach that serves as the basis for the sensor. The sensor's physical layout can be seen in Figure 1 and is composed of two laser line generators (A and B) and a CCD camera (C). The lasers are mounted so that their generated laser lines are parallel to each other and orthogonal to the viewing axis of the camera. The result is that two parallel laser lines are projected horizontally across the image captured by the camera. The camera selected for the prototype is a Sony FCB-EX11D [11] which uses a 1/4-type CCD sensor, is equipped with a 10× optical zoom and can provide a 120x magnification when combined with the camera's digital zoom.
The laser line generators are Apinex GM-CW02L [12] which produce green lines with a wavelength of 532 nm, have a power rating < 50 mW, and a produce lines across a 90 • fan angle. The 532 nm wavelength was chosen because it has a low absorption coefficient in water ( Figure 2). Other colors have lower absorption coefficients, primarily blue and violet, however at the time the prototype was developed laser line generators producing those colors were found to be much more expensive than those that produced light with the green wavelength.

Pinhole Camera Model
The method used by our sensor to measure the distance to an object is based on the pinhole camera model [14]. The pinhole camera model relates a point in the world to its projection on the camera's focal plane. According to the pinhole camera model the light reflected off an object located in the world at p = (x, y, z) that passes through the camera's aperture located at o = (0, 0, 0) is projected onto the focal plane of the camera at q = (u, v, − f ); this relationship is illustrated in Figure 3. By examining the projection in the xz plane ( Figure 4a) and the yz plane ( Figure 4b) the relationship between p and q is given by where (x, y, z) ∈ R 3 are the components of p in the real world, f ∈ R is the focal distance of the camera being used, and (u, v) ∈ N 2 are the horizontal and vertical components of q measured in the camera's unit of measure pixels. The negative sign in Equation (1) comes from the fact that light passing through the camera's aperture is mirrored across the horizontal and vertical axis onto the camera's focal plane, which is located behind the camera's aperture as shown in Figure 3.   Equation (1) is simplified by placing the camera's focal plane in front of the aperture. Using the simplified pinhole camera model, whose projection in the xz plane is seen in Figure 5a and in the yz plane in Figure 5b, the relationship between p and q can be rewritten as

Distance Measurement Theory
Our method of measuring distances is derived from the physical configuration of our sensor, whose side view is shown in Figure 6, and Equation (2). In examining Figure 6 it can be seen that a pair of similar triangles is created between the camera's aperture and (i) the projection of the laser lines on the camera's focal plane (oab) and (ii) the location of the laser lines on an object (ocd) . By equating the two triangles, the relationship between the laser lines in world coordinates and their projection on the camera's focal plane is given byỹ whereỹ y 1 − y 2 is the physical distance that separates the laser lines,ṽ v 1 − v 2 is the distance between the laser lines on the camera's focal plane, f is the focal length of the camera, and z is the unknown distance to the object.

Laser 1
Laser 2 Focal Plane The physical distance separating the laser lines,ỹ, can be measured directly from the sensor prototype and f can be found for the camera that we are using, hence onlyṽ is needed to compute z. The distance that separates the laser lines in the captured image,ṽ, is found through an image processing algorithm, described in Section 3.4. Afterṽ has been found, the unknown distance to the obstacle is calculated using As seen in Equation (4), we must know how far apart the two laser lines are in the captured image in order to determine how far away an object is from our sensor. To accomplish this, we developed an algorithm that extracts the distance separating the two laser line from an image. An overview of the algorithm is seen in Figure 7.

Distortion Removal
The first step of the image processing algorithm removes distortions that are present in the image due to lens and manufacturing defects. These distortions prevent the acquired image from behaving as expected based on the pinhole camera model so they must be corrected before the distance calculation can be made. The distortion model that was selected [15] assumes two types of distortion, radial and tangential. The relationship between a pixel location in the image and the expected location if the camera behaved according to the pinhole camera model is given by u = u+ũ k 1 r 2 +k 2 r 4 +k 3 r 6 + p 1 r 2 +2ũ 2 +2p 2ũṽ (5) v = v+ṽ k 1 r 2 +k 2 r 4 +k 3 r 6 + p 1 r 2 +2ṽ 2 +2p 2ũṽ (6) where (u , v ) ∈ N 2 is where (u, v) ∈ N 2 would be located if the camera behaved according to the pinhole camera model and u and v are the horizontal and vertical components of the pixel location in the image. The parameters k i ∈ R, i = 1, 2, 3 are the coefficients that correspond to the radial distortion and p j ∈ R, j = 1, 2 are the coefficients that describe the tangential distortion. The variables u c ∈ N and v c ∈ N are the horizontal and vertical components of the pixel that represents the center of the camera aperture and (u c , v c ) is known as the principle point. Finally, r √ũ 2 +ṽ 2 is the Euclidian distance in pixels between (u, v) Before the distortion can be removed, the parameters k i , i = 1, 2, 3 and p j , j = 1, 2 must be found. These coefficients are computed using the Camera Calibration Toolbox for Matlab [16] which uses the algorithms described in [17] to determine the distortion coefficients along with other camera specific parameters, such as the principle point. The toolbox uses a set of calibration images which are a series of pictures of a standard checkerboard training pattern that is placed around the field of view of the camera. After the calibration images have been generated, they are loaded by the toolbox and the user selects the four outer corners of the pattern. After these corners have been selected, the toolbox finds the pattern intersections, where four of the squares on the pattern meet, in each image. Using a camera model [18] along with the physical properties of the calibration pattern, square size and number of rows and columns, the toolbox performs a Maximum Likelihood estimation of the camera parameters that minimizes the reprojection error in each of the intersection locations. After the distortion coefficients have been found, the distortion effects are removed from an image acquired by the camera using the OpenCV [19] function cv::remap() which removes the distortion by remapping each pixel in the image using the camera model and camera parameters. Once each pixel has been remapped, the new image matches what would be expected if the camera performed according the pinhole camera model and this allows us to calculate distances according to Equation (4).

Image Segmentation
The sensors described in [9,10] can only measure the distance to an object at a single point directly in front of the sensor. Our design takes advantage of laser line generators that project horizontal lines across the entire camera frame. By using laser lines instead of a single point we are able to measure distances at multiple locations. The ability to measure distances at multiple locations improves the sensor's ability to aid in mapping by providing richer information about an unknown environment, such as the shape of objects located in the environment. To calculate the distance at multiple locations, the image is broken down into small segments as seen in Figure 8. A secondary benefit of segmenting the image is that the line extraction component of the algorithm can be run on smaller images as opposed to the complete image. This provides a performance benefit because processing times are decreased when compared to the time that it would take for the algorithm to be run on the complete image.

Line Extraction
The line extraction component of the algorithm finds the location of the two laser lines in each image segment. By finding the vertical position of the two lines in each segment, the distance between the lines can be found which is the value that is needed to calculate the distance to the object in front of the camera. An overview of each of the steps used to extract the laser lines is shown in Figure 9. In the first step of the algorithm, the green color is extracted from the image. The color plane extraction converts the image from color to black and white and the components of the original image that contained the largest amounts of green have the largest values in the extracted plane; these areas correspond to white in the black and white image. The extracted plane of Figure 10a can be seen in Figure 10b.
The laser lines run horizontal across the image so the pixels in each segment column with the largest values represent the points in that column with the largest amount of green and we assume that they make up the laser line. To increase the speed at which the algorithm runs, not all columns in an image segment are examined, instead a subset of m ∈ N columns are processed. Each of the m columns are searched and the n ∈ N maxima are extracted. Each of the extracted maxima are compared to a threshold value to ensure that the value is above some minimum, this is to ensure that the selected points have a minimum amount of green in an attempt to ensure that the selected points are part of the laser line. A view of the extracted maxima for the sample image can be seen in Figure 10c where the maxima are marked with a "*". Once the maximum values for a column have been extracted the set of points are partitioned into two groups, one for each laser line. The partitioning is performed using K-Means Clustering [20]. K-Means Clustering partitions our set of mn vertical location values, v ∈ N mn , into 2 partitions (p i , i = 1, 2) by minimizing where J : N mn → R is the objective function being minimized, v(j) is vertical position of the point being checked, and c i ∈ R is the mean of the ith partition. The result of the partitioning is shown in Figure 10d where the points composing each of the two laser lines are marked with a "*" and a "+" respectively. Once the two sets of points making up each laser line have been found, the vertical position of each laser is determined by calculating the mean vertical position of each point set. The final location for each of the laser lines are displayed in Figure 10e with the dashed line representing one of the laser lines and the dotted line representing the other. Finally, the distance to the object is found using the vertical distance that separates each of the laser lines and Equation (4).

In Air Testing
Initial test of our sensor were performed in air, in an environment that was constructed in the laboratory using plywood. The purpose for performing the initial tests in air was twofold. First, it was more practical to perform the test in the laboratory environment as there was no on-site underwater facility that could be used so it was easier to make adjustments to the sensor in the laboratory. Second, by performing the test in the air, highly accurate measurement using alternative sensors could be generated for comparison purposes. For our experiments a Hokuyo UTM-30LX scanning laser range finder (LiDAR), which has an accuracy of 0.1 − 10 m ± 30 mm, 10 − 30 m ± 50 mm [21], was used for generating accurate measurements which we compared our sensors measurements against. We attached our sensor prototype and the LiDAR to a wheeled cart and moved the sensors through the test environment. The full image acquired by the camera was divided into 23 segments and a distance measurement was obtained for each segment. A comparison between the distances calculated using the sensor prototype and those measured with the LiDAR at three bearings can be seen in Figure 11.
To better understand how well the sensor prototype measured the distance to objects, an error analysis was performed. The results of the error analysis for each of the bearings can be seen in Figure 12. From the analysis it can be seen that the measurement error as a percentage of the true distance, as measured by the LiDAR, is approximately 10% of the true distance. This result means that the closer the sensor is to an object the smaller the absolute error between our measured distance and the true distance.   To illustrate how our sensor can determine shape information about an object, a pair of frames are shown in Figure 13. By examining these frames it can be seen that richer information about an obstacle, for example its shape, can be found using our design as compared to those sensors that only measure a single distance to an object.

Underwater Testing
After the sensor was tested in the lab, we constructed a waterproof enclosure for the sensor. Underwater tests were performed in an outdoor, unfiltered test tank. The sensor was placed underwater pointing toward one end of the test tank at distances of 0.6 m, 2.15 m, and 4 m. The results of these tests are shown in Figure 14. In the results our sensor is located at the origin and pointed in the positive horizontal direction. Before testing the sensor, the distortion coefficients for water were found by performing the camera calibration routine underwater. The underwater experimental results show that the sensor is capable of measuring the distance to an underwater object, with an estimated relative error close to the previously calculated 10%. From the results it can be seen that the sensor is able to determine the shape of objects in the underwater environment. In fact, the measurement at 4 m captured both of the sides and the end of the test tank and the corners of the tank are easily seen. However, there is one drawback to our design. Since our sensor must be able to see the reflected laser light in order to determine the distance to an obstacle, any environmental conditions that make it more difficult to see the laser lines, in either air or underwater, would negatively affect the performance of our sensor.

Error Analysis
By examining Equation (4) with the assumption that the laser line generators can be mounted parallel to each other, the primary source of error in the distance measurement comes from the calculation of the distance that separates the laser lines in the camera image. To see how this error affects the distance measurement an error analysis was performed. By differentiating Equation (4) the distance error is found to be where δz ∈ R represents the distance error corresponding to laser line separation error δṽ ∈ N. Equation Equation (8) can be rewritten as |δz| = fỹ v 2 |δṽ| or |δz| = z 2 fỹ |δy| (9) which shows that the absolute value of the measurement error grows quadratically with respect to the distance from the target object; this means that as the sensor moves further away from an object the affect of laser line misidentifications becomes greater. Finally, using Equation (4) we can rewrite Equation (9) as which gives us the relationship seen in Figure 12 where the error as a percentage of true distance stays constant at approximately 10% of the distance. With a sensor designed that allows our UUV to see in underwater environments, in the following section a method of allowing to determine where it is related to where it began operating will be presented.

Visual Odometry with Downward Facing Camera
In the previous section a sensor was presented that allows a UUV to see what surrounds it. To be able to operate autonomously the UUV must know what surrounds it and where it is located in the environment. To determine where they are in an environment, underwater vehicles typically use a Doppler Velocity Log (DVL) to obtain information similar to that provided by encoders on wheeled ground vehicles [22]. A DVL operates by facing downwards and bouncing an acoustic signal off of the environment's floor, using the time that it takes the signal to return to the sensor the speed of vehicle is determined. A new family of sensors such as Teledyne RD Instruments' Explorer [23] have been developed for small underwater vehicles. Unfortunately these sensors are expensive and designed to operate at a minimum distance of 0.5 m off of the floor which makes their use impractical for low cost vehicles or for a vehicle in constant contact with the floor.
We propose using a downward facing camera to provide visual odometry (VO) data for our vehicle. Downward facing cameras are quite common in many robotics applications due to their low cost and ease of use. In this section a correlation based VO algorithm is developed that makes use of a downward facing camera and the algorithm is tested using experimental data.

Visual Odometry Algorithm
The visual odometry algorithm that we developed is based on [24] which estimates vehicle translations using a downward facing camera. An overview of the complete algorithm is shown in Figure 15. Before the translations of our UUV can be calculated, the image captured by the downward facing camera must be preprocessed. The original image captured at time step k, I k,o ∈ R w×h (Figure 16a) where w ∈ N is the width and h ∈ N is the height of the image, is converted from the full color space to greyscale, I k,o → I k,bw ∈ R w×h (Figure 16b). This conversion is required for the remaining steps of the algorithm to work properly. A filter is then applied to I k,bw which serves two purposes: (i) through experimentation it was found that filtering the image made the system more robust to inconsistent lighting conditions and (ii) the filter is required because the floor of the environment in which our vehicle is intended to operate does not have a significant number of visually identifiable features when viewed without the filter. In order for the template matching approach to perform correctly, the image captured by the downward facing camera must have unique features that can be tracked. In some environments where our vehicle will operate, the floor has an almost uniform color while possessing an unique texture, similar to a poured concrete surface. This texture can be used to provide the unique features that are required and the filtering step makes the texture more apparent, as seen in Figure 16c. The filter we apply is a Laplacian [25] and is defined as where I ∈ R w×h is a matrix of intensity values that make up the image and L (I) ∈ R w×h is the Laplacian of I. Directly computing Equation (11) can be computationally intensive so we approximate the Laplacian by convolving I k,bw with a filter kernel [25]. The filter kernel selected is a 7 × 7 Laplacian approximation kernel (Table 1).
To decrease the feature tracking execution time, in the final preprocessing step, before the template matching is performed, we reduce the resolution of I k,bw through resampling. In the resampling process, I k,bw → I k,r ∈ R w r ×h r where w r < w is the width of the resampled image and h r < h is the height. By reducing the resolution of the image we reduce the amount of data that must be processed during the tracking process, thus reducing the execution time. The result of resampling can be seen in Figure 16d.
Once the image has been preprocessed, the UUV translations can be calculated. The process that we use is referred to as template matching and involves finding the location of one image in another image. The first step of the template matching algorithm involves extracting a template image, T k ∈ R w t ×h t where w t ∈ N and h t ∈ N are the width and height of the template, from I k−1,r . Our UUV moves at slow speeds so T k is extracted from the center of I k−1,r . If our vehicle was moving faster it could prove advantageous to extract T k from an alternate location that would give the best possible chance of T k being present in I k,r . An example of the extraction location can be seen in Figure 17a along with the extracted template Figure 17b. Once T k has been extracted, the next step is finding the location of T k in I k,r . The template matching process is performed by cross correlating T k with I k,r which yields where C k ∈ R w×h is the cross correlation matrix, w t h t ∈ N is the total number of pixels in T k , σ I , σ T ∈ R are the standard deviations andĪ k,r ,T k ∈ R are the mean of the pixels in I k,r and T k respectively. The maximum value of C k (Figure 18) is located at (u m , v m ) ∈ N 2 and corresponds to the center of T k in I k,r . Once T k is located in I k,r the translations of the camera attached to the vehicle can be calculated using where δu, δv ∈ N are the translation of the camera, in pixels, in the horizontal and vertical direction and (u c , v c ) ∈ N 2 is the center of I k,r . These translations are not in physical units, but rather in pixels, the camera's native units. To provide useful measurements for odometry purposes δu and δv are converted to physical units by where c ∈ R is a scaling factor, which was calculated experimentally. An image was captured from the downward facing camera and then the camera was moved a set measured distance and a second image was captured. By using the position change measured in pixels along with the position change measured in physical units, the scaling factor that relates the position change in pixels to physical units was determined.
Using δx and δy which are in the vehicle's body frame, the translations of the vehicle in a global frame of reference can be estimated. The translations in the global frame are found using an odometry model, seen in Figure 19, which assumes that the center of the vehicle is located at (x k , y k ) ∈ R 2 , k ≥ 0 and posses a global heading θ k ∈ [−π, π) , k ≥ 0. Using the vehicle translation in the body frame and a global heading provided by a compass, the translations in the global frame are given as where ∆x k , ∆y k ∈ R are the horizontal and vertical translation of the vehicle in the global frame. The global position of the vehicle is given by which when expanded using Equation (15) yields the final odometry model for the system x y y ′ x ′ θ Figure 19. Overview of the visual odometry model.

Experimental Results
In order to examine how well the visual odometry algorithm performed experiments were performed in the laboratory environment due to the practicality, as no on-site underwater testing facility exists, as well as the ability to generate an accurate baseline that we could compare the performance of our algorithm. In order to perform these tests a downward facing camera was mounted to a small ground vehicle ( Figure 20) for testing purposes along with a Hagisonic Stargazer indoor localization sensor [26] that was used to provide an experimental baseline. The vehicle was driven around a test environment logging position estimates provided by the visual odometry system as well as those provided by the Stargazer. The location estimates for an experimental run can be seen in Figure 21a and a plot of the corresponding error, using the Euclidean distance between the two estimates, is shown in Figure 21b. As seen in these results there is an error between the estimate produced by the visual odometry system and that provided by the Stargazer, which has a manufacturer reported error as low as 2 cm. The error comes from the successive build up of small errors in the visual odometry measurements. Since each position estimate is based on the previous estimate, small errors at each time step build up over time, referred to as sensor drift. As discussed in the following section, this is not going to be a problem for the localization algorithm since the obstacle measurement provided by the laser range finder, will help to correct such a drift.

Sensor Validation with EKF SLAM
In the previous two sections a pair of low cost sensors using computer vision where developed for use on an UUV and each of the sensors was tested for accuracy and performance. From these results it is believed that the sensors perform adequately for the task of localization and mapping. EKF SLAM was chosen for initial verification because it is easy to implement and it is often used as a baseline when comparing SLAM solutions due to the large amount of research upon which it is based. EKF SLAM is one of the oldest and most thoroughly researched solutions to the SLAM problem. EKF SLAM is based around several important publications that provided some of the original formulations of the SLAM problem, particularly [1,2], which have been used in the next subsection to summarize the algorithm.

The SLAM Problem
In our application, the SLAM problem can be thought of as providing an UUV with the ability to determine, when placed in an unknown environment, where in that environment it is located while building a map of the environment. The online SLAM problem, which EKF SLAM solves, attempts to estimate the pose our UUV and the map of the environment at the current time step k. In a probabilistic sense the online SLAM problem is attempting to find p (ξ k , M | u 1:k , z 1:k ) (18) where the ξ k = x k y k θ k ∈ R 2 × [−π, π) is the instantaneous pose of the vehicle, M is the map of the environment, u 1:k is the full series of controls performed by the vehicle, and z 1:k is the full series of observations collected. The SLAM problem is an example of a Markov chain and the standard solution to problems of this type is the recursive Bayesian estimator also referred to as the Bayes filter.

EKF SLAM Algorithm
The EKF is one of the earliest implementations of the Bayes filter. Before presenting the solution to the SLAM problem, a method of storing the map, M in Equation (18), must be selected. In many instances it is easiest to think of the map as a set of discrete points in an environment. Each of these discrete locations in the environment are known as landmarks and are areas of the environment that represent features that can be extracted from raw sensor data. Some common types of features that are used by robotic vehicles are walls, corners, or columns for vehicles operating in indoor environments. Using this approach the map is defined as where m i = (x i , y i ) ∈ R 2 are the two dimensional Cartesian coordinates of the ith landmark in M and i = 1, . . . , n where n ∈ N is the total number of landmarks in M. The family of SLAM solutions that represent the world using this type of map are known as feature based SLAM solutions. When landmarks are used to make up the map, an important component of SLAM is the ability to determine which landmark an observed feature represents. The feature to landmark relationship is represented using a set of correspondence values c k ∈ N m where m ∈ N is the number of observations in z k . If the observation z k (i) is generated because of the jth landmark then c k (i) = j. The EKF SLAM algorithm estimates the pose of a vehicle and the map of the environment so the full state of the system being estimated is defined as We assume that the full system behaves as where x k ∈ R 3+2n is the state of the system, z k ∈ R m is the current set of observations, and u k ∈ R c is the current control input. The function g : R 3+2n × R c → R n is the nonlinear state transition model that defines how the system evolves between time steps based on x k−1 and u k . The function h : R 3+2n → R m is the nonlinear measurement model and it describes how z k is related to x k . The variables k and δ k are additive zero mean Gaussian noise with covariances of R k and Q k respectively Using these assumptions, the EKF SLAM algorithm estimates the online SLAM posterior Equation (18) as a Gaussian distribution p (ξ k , M | u 1:k , z 1:k ) = p (x k | u 1:k , z 1:k , c 1:k wherex k ∈ R 3+2n is the mean vector of the estimate and Σ k ∈ R (3+2n)×(3+2n) is the covariance matrix that describes the uncertainty in the estimate.

Prediction
The first step of the EKF SLAM algorithm is referred to as the prediction stage and is based on the state transition model Equation (21) of the system, also referred to as the motion model, which describes how the full SLAM state evolves between time steps. The prediction step uses g (·) to incorporate u k into the estimate. A predicted mean vector is generated according tō wherex k ∈ R 3+2n is the predicted value of the mean vector. The state transition model g (·) updates the vehicle pose using the motion model of the vehicle and u k . We assume that the environment in which the vehicle operates is static so g (·) predicts the landmark locations using their estimated location at the previous time step k − 1.
The use of noisy control inputs causes uncertainty to be added to the estimate, this uncertainty is incorporated in the second phase of the prediction step. The covariance matrix prediction increase the uncertainty in the estimate according to where s Σ k ∈ R (3+2n)×(3+2n) is the predicted covariance matrix, R k ∈ R (3+2n)×(3+2n) is the covariance matrix of the state transition model, and G k ∈ R (3+2n)×(3+2n) is the Jacobian of g (·) with respect to the system state.
The first term of the covariance prediction, G k Σ k−1 G T k , propagates the uncertainty of the estimate from k − 1 to k. The second term, R k , incorporates the additional uncertainty caused by the noisy control input. The landmark predictions do not cause any additional uncertainty to be added to the system, only the control inputs add uncertainty, so R k can be defined, using the covariance matrix of the control inputs, as where M k ∈ R c×c is the covariance matrix of u k and V k ∈ R (3+2n)×c is the Jacobian of g (·) with respect to the control input.

Correction
The second step of the EKF SLAM algorithm is referred to as the correction stage. The correction stage uses the set of feature observations z k and Equation (22) to adjust the mean vector of the estimate while reducing the uncertainty contained in the covariance matrix. The mean vector correction is performed according tox where K k ∈ R (3+2n)×2m is the Kalman gain matrix. The Kalman gain matrix is a weighting matrix that creates a best estimate by defining how important the observation is when it is incorporated, based on the covariance values of the state transition model and measurement model. The Kalman gain matrix is defined as where H k ∈ R 2m×(3+2n) is the Jacobian of h (·) with respect to the system state and Q k ∈ R 2m×2m is the covariance matrix of the measurement model. The corrected covariance matrix of the estimate is generated according to where I is a 3 + 2n identity matrix.

Augmentation
An additional step present in the EKF SLAM algorithm that does not belong to the standard EKF involves the addition of new landmarks to the estimate. As a UUV moves through an unknown environment new landmarks are found as unexplored areas are visited. When these features are observed for the first time the state vector and covariance matrix must be augmented to include the new landmarks. The mean vector augmentation is given bŷ wherex + k ∈ R 5+2n is the mean with the newly observed landmark added and f : R 3+2n × R 2 → R 2 is the inverse measurement model that calculates the landmark location in the global frame based onx k and z k (i).
The augmentation of the covariance matrix is more complicated as Σ k contains the covariance matrices of the vehicle pose estimate and the landmark location estimates along with the cross covariance terms that relates each element inx k to every other element. The covariance matrix augmentation is given by where Σ + k ∈ R (5+2n)×(5+2n) is the covariance matrix following the augmentation. The matrix A k ∈ R 2×(3+2n) is defined as where F k,x ∈ R 2×3+2m is the Jacobian of f (·) with respect to the system state and it propagates the uncertainty in the estimate before augmentation into the new feature cross covariance terms. The matrix B k is defined as where F k,z ∈ R 2×2 is the Jacobian of f (·) with respect to the current observation. The matrix B k takes the current uncertainty and adds the uncertainty caused by the observation sensors to generate the full uncertainty in the location estimate for the new feature.

Feature Extraction and Data Association
EKF SLAM builds a map of the world using a set of landmarks. In order to use these landmarks, a key aspect in implementing the algorithm is developing a method of extracting features that correspond to the landmarks from raw sensor data. Almost any environmental feature can be used as a landmark, as long as it can be detected using the sensor being used. Our tests were performed in a simple rectangular room, so the corners of the room were used as features as they were simple to detect. To extract the corner location from the raw sensor data, a corner extraction algorithm was developed. The algorithm is a modified version of the Random Sample Consensus (RANSAC) algorithm [27] for line identification. In the standard RANSAC line identification algorithm, a random set of points are selected then a best fit line is generated through those points. The full set of points are compared to that line and if a minimum number of points lie on the line then that line assumed to be a true line in the sensor data. Our sensor has very few points and can be quite noisy at larger distance so randomly selecting points to create a line led to a large number of invalid walls being identified. In our algorithm a single point in the sensor data is selected at random and all points that fall within a certain distance of that point are used to generate the potential line.
Once all lines in the laser range finder data are found using the RANSAC algorithm, each line is compared to every other and if two lines have an angle between them that is larger than a minimum corner angle the intersection of those two lines is identified as a corner. An example of the corner identification is show in Figure 22. The second key component in the implementation of EKF SLAM is data association, previously discussed in Section 5.2.2. Data association involves finding which landmark inx k corresponds to each observed feature in z k . If a given observed feature in z k corresponds to a landmark inx k then the estimate is corrected using the observed feature as seen in Section 5.2.2. If the observed feature does not correspond to any landmark then the newly observed feature is used to augmentx k as described in Section 5.2.3. In our test environment the minimum distance between corners was quite large, greater than 1m, so a simple search algorithm was developed to perform the data association. At time step k, a global location is generated for each of the observed features, z k (i), i = 1, 2, . . . , m, usingx k and h (·). If the location of z k (i) is within some maximum distance of the jth landmark inx k then c k (i) = j. If no corresponding landmark is found for the ith observed feature then c k (i) = −1.

Experimental Results
In order to examine if the addition of using our laser based rangefinder along with a SLAM algorithm improved the localization performance over the use of a single downward facing camera and compass, experiments were performed in the same experimental environment using the same test platform described in Section 4.2. Just as in the previous section experiments were performed using this test vehicle due to the lack of an on-site underwater test facility along with the ability to generate an accurate baseline for comparison purposes. In order to provide a location baseline for comparison purposes, a Stargazer indoor localization sensor was also attached to the test platform. The platform was driven remotely around an indoor test environment that measured 3 m × 3.5 m while executing EKF SLAM. The final vehicle path and map are shown in Figure 23a and the position error during the run is shown in Figure 23b. The position error is calculated using the euclidian distance between the estimate and baseline. From the results it can be seen that the error in the estimate produced by EKF SLAM never exceeds 0.7 m. To illustrate how the estimate and the uncertainties change during the run a sequence of images are shown in Figure 24 that display the estimate and uncertainties.  It can be seen in the sequence that initially the uncertainty in the position estimate, represented by a 2σ covariance ellipse, is very small and the uncertainty in the first landmark estimate, which is initially observed before moving, is also small. As the vehicle moves through the environment the uncertainty in the position estimate grows due to the noisy control signals. Due to the growing position uncertainty, the uncertainty in newly observed features also increases. The SLAM algorithm limits the growth of the uncertainty while the vehicle is moving, however the true benefit of EKF SLAM can be seen at the end of the run when the robot returns near the starting position and observes the landmark it observed from the starting location. The robot had a good idea about the location of that feature so EKF SLAM uses this observation to correct all of the estimates that are maintained by the estimate. This correction, referred to as loop closure in the literature [28], not only updates the mean of the estimate for each component it also reduces the uncertainty of each estimate.

Sensor Validation with FastSLAM
In the previous section EKF SLAM was used to validate our assumption that our selected sensor suite performs well enough for underwater localization and mapping. In this section FastSLAM is used as a second validation procedure. Unlike the family of solutions that model the state estimate as a Gaussian, which EKF SLAM is a member, FastSLAM belongs to a family of solutions that do not make this assumption. This family of solutions has the advantage of removing the requirement for additive Gaussian noises to the prediction and measurement models which is advantageous as the noise present in most sensors is not Gaussian. By allowing the use of more realistic probabilistic sensor models, these non-gaussian solutions have the potential of providing more accurate estimates of the SLAM posterior. The solution we selected to implement is referred to as FastSLAM 1.0 [3] as it is the initial version of the FastSLAM algorithm, however in the remainder of this paper we will just refer to it as FastSLAM.

FastSLAM Algorithm
As opposed to the EKF solution to the SLAM problem that estimates the distribution that represents the instantaneous pose of our UUV and the map, the FastSLAM solution estimates the distribution that represents the full trajectory of the UUV and the map, this distribution is given by p (ξ 1:k , M | u 1:k , z 1:k ) By estimating the distribution that represents the full trajectory of the UUV, Equation (35) can be factored into a pair of terms that are easier to estimate. Using the property of conditional independence and making the correspondence value explicit, as with the EKF SLAM approach, the factorization of Equation (35) is given as p (ξ 1:k , M | u 1:k , z 1:k , c 1:k ) = p (ξ 1:k | u 1:k , z 1:k , c 1:k ) p (M | ξ 1:k , u 1:k , z 1:k , c 1:k ) Similarly to the EKF SLAM algorithm, the FastSLAM algorithm uses a feature based map. This map representation and the above decomposition allow us to factor Equation (35) as p (ξ 1:k , M | u 1:k , z 1:k ) = p (ξ 1:k | u 1:k , z 1:k , c 1:k ) n ∏ i=1 p (m i | ξ 1:k , u 1:k , z 1:k , c 1:k ) where the full map posterior is replaced by the product of landmark posteriors. The FastSLAM algorithm estimates the joint posterior, Equation (37), using a Rao-Blackwellized particle filter which is an example of a sampling importance resampling (SIR) [29] particle filter and it is this type of filter that forms the basis of the FastSLAM algorithm. In the FastSLAM algorithm the distribution that represents the trajectory of the UUV is estimated using a particle filter where each particle maintains its own copy of the map. The map maintained by the FastSLAM algorithm is composed of a set of n landmarks where the distribution corresponding to the estimate of each landmark is assumed to be a Gaussian distribution and estimated using an EKF. As a result each particle maintains a UUV pose estimate and n EKFs which represent the distribution corresponding to the set of landmark estimates with a mean vector and covariance matrix. The pth particle in the particle set X k is defined as wherem [p] k,j ∈ R 2 , j = 1, . . . , n and Σ [p] k,j ∈ R 2×2 , j = 1, . . . , n are the mean and covariance of the jth landmark estimate. For p particle and n landmarks the FastSLAM algorithm maintains pn EKFs each used to estimate a single landmark location.

Pose Sampling
The first step of the FastSLAM algorithm is referred to as pose sampling. In this step a set of potential particles, s X k , is generated from the set of particles, X k−1 , that resulted from the previous time step. The set of potential particles is generated by sampling a new pose for each particle in X k−1 using the probabilistic motion model of the vehicle Unlike EKF SLAM where the uncertainty introduced into the estimate is maintained by the covariance matrix, FastSLAM introduces uncertainty through the sampling process. The probabilistic motion model introduces uncertainty by adding noise to the control inputs that is distributed according to the probabilistic model of the control noise. In our implementation the control noise is assumed to be zero mean Gaussian with a covariance matrix M k ; this form of control noise is not a requirement as opposed to EKF SLAM. This is one of the advantages of FastSLAM, more realistic noise models can be used as opposed to EKF SLAM that requires the control noise to be Gaussian. The only requirement for FastSLAM is the distribution that represents the noise on the control input must be of a form that can be sampled from.
The set of potential particles generated during the pose sampling step, s X k , are distributed according to p (ξ 1:k | u 1:k , z 1:k−1 , c 1:k−1 ) which is denoted the proposal distribution. However, the proposal distribution does not match the distribution of particles that we are attempting to estimate, the first half of Equation (37), which is referred to as the target distribution and given as p (ξ 1:k | u 1:k , z 1:k , c 1:k ) It can be seen that the distribution that we sample new particle poses from does not include the current observations at time step k. The current observation is incorporated into the estimate in the third step of the algorithm, the importance weight calculation after which the set of potential particles, distributed according to Equation (40), are transformed to approximate Equation (41) during the final step of the algorithm, the resampling stage.

Feature Location Estimation
In the second step of the FastSLAM algorithm the landmark estimates maintained by each particle are updated using the current set of observations at time step k. Each particle maintains a set of n EKFs representing the particle's estimate for each landmark. The correspondence for each feature observation is calculated as described in Section 5.3 using ξ [p] k and z k . If the jth landmark is not observed at time step k, then the particle's estimate of the feature location remains unchanged m [p] k,j =m [p] k−1,j and Σ [p] k,j = Σ If the jth landmark is observed at time step k, then the landmark estimate is updated using the standard EKF update equations, more specifically the correction equations, that make use of h (·), the measurement model of the system Equation (22) from Section 5, and Q k , the covariance matrix of the measurement model. As with the EKF SLAM algorithm, we assume that the landmarks used to represent the environment are static so the predicted mean and covariance are ŝ m [p] k,j =m [p] k−1,j and s Σ where ŝ m [p] k,j ∈ R 2 is the predicted mean of the jth landmark estimate and s Σ [p] k,j ∈ R 2×2 is the predicted covariance matrix of the estimate. The Kalman gain matrix, K [p] k,j ∈ R 2×2 , is generated according to where H k ∈ R 2×2 is the Jacobian of h (·) with respect to the landmark position. The landmark estimate is corrected usingm and the covariance matrix of the landmark estimate is corrected according to where I is a 2 dimensional identity matrix.

Importance Weight Calculation
As discussed in Section 6.1.1, the set of temporary particles that is generated in the sampling process, s X k , are distributed according to Equation (40) which only includes the control input at time step k. However, the true distribution that we are attempting to estimate, Equation (41), makes use of the current control input along with the current observation and set of correspondences. To overcome the difference between the two distributions an importance weight for each particle is generated. The form of the importance weight comes from the fact that the Rao-Blackwellized particle filter is version of a SIR particle filter. From [30] when we are unable to directly sample from the distribution that we are attempting to estimate, by considering the following importance weight for each particle and particles are drawn with replacement from s X k and added to X k with a probability proportional to w k , then X k will approximate the target distribution and the quality of the approximation will improve as the number of particles increases.
From [3] the importance weight for the ith particle in s X k is the ratio of the target distribution and proposal distribution w [p] k = target distribution proposal distribution = p (ξ 1:k | u 1:k , z 1:k , c 1:k ) p (ξ 1:k | u 1:k , z 1:k−1 , c 1:k−1 ) = η p z k ξ [p] k , c k In order to calculate Equation (48), we take note that the measurement z k is also dependent on the map M. Using this information Equation (48) is expanded as The map is composed of a set of landmarks so we integrate over all landmarks and Equation (49) becomes w Using the fact that each landmark estimate is dependent on the vehicle trajectory, all feature observations, and all landmark correspondences, Equation (50) is written as This can be calculated in closed form as a Gaussian according to and where Σ [p] k−1,j is the covariance of the landmark estimate from k − 1, H k is the Jacobian of h (·), and Q k is the covariance matrix of the observation.

Resampling
The final step of FastSLAM is resampling during which p particles are drawn with replacement from s X k with a probability proportional to w [p] and added to X k . This step converts s X k which is distributed according to Equation (40) to the final particle set X k which is distributed according to Equation (41).

New Feature Addition
Similarly to the EKF SLAM algorithm, when features are observed that do not correspond to already tracked landmarks, a new landmark must be added to the map. From [31], when a new feature is observed the mean of the new landmark's estimate is initialized according tô where f : R 3 × R 2 → R 2 is the inverse of h (·) and it generates a landmark location based on a particle pose and a measurement. The covariance is initialized according to where F k,z is the Jacobian of f (·) with respect to the observation and Q k is the covariance of the measurement noise.

Feature Extraction and Data Association
Just like the EKF SLAM algorithm, the FastSLAM algorithm uses a map composed of landmarks so a feature extraction and data association method must be selected. For the following experimental results the same feature extraction and data association method used by the EKF SLAM algorithm and described in Section 5.3 was used.

Experimental Results
As with the two previous localization approaches, EKF SLAM and pure visual odometry, validation testing was performed in a laboratory environment where an accurate baseline could be generated. The compete sensor suite composed of the laser rangefinder, downward facing camera, and compass were mounted to a test vehicle and driven around a test environment, the same vehicle and environment used in Section 4.2. The generated path estimate and map can be seen in Figure 25a compared to the path estimate provided by the Stargazer. The resulting path error as a function of time can be seen in Figure 25b. Since the vehicle posterior is represented by a set of particles, the mean of the particle set is used for display purposes and calculating the error. As seen in the error plot the position error never exceeds 0.7 m. To illustrate the FastSLAM process an equally spaced sequence of frames over the entire run are displayed in Figure 26.
In Figure 26 a 2σ covariance ellipse is shown along with the mean position estimate from the particle set along with the mean position estimate of each of the landmark locations. The performance of the FastSLAM algorithm is very close to that of the EKF SLAM approach seen in Section 5.4. The uncertainty is low for both the position and landmark estimates at the beginning of the run, described by the uncertainty ellipse that is generated using the covariance of the particle set. As the vehicle moves through the environment, the uncertainty grows until the vehicle returns near the starting location and re-observes the first landmark. At this time the particles that have positions nearest to the true position of the vehicle have very large importance weights so they are selected at higher probability than those farther away from the vehicles true position. This corrects the position estimate and brings the estimate much closer to the true position while also significantly reducing the spread of the particle set.

Conclusions
In this paper a set of low cost vision based sensors were developed for use on a UUV. Our goal is to select a set of sensors that can be used by a UUV to perform underwater localization and mapping. A custom laser based range finder was developed in Section 3 and experimental results on the sensors performance were provided. A visual odometry algorithm was described in Section 4 that makes use of a downward facing camera. In Section 5 and Section 6 the pair of vision based sensors and a compass were tested to see how well they perform in localization and mapping. In Section 5, an EKF SLAM algorithm was used to validate the sensors and in Section 6 a FastSLAM algorithm was used. The results from the experimental validation show that when using a SLAM solution the selected sensors perform well, with a position error at no more than 0.7 m over the full run. As seen in Figure 27 when used with SLAM the sensor suite performs better than when using the raw sensor data alone and the final position error for the vehicle is < 0.2 m once loop closure has been performed. The initial results presented in the paper were generated using a small ground vehicle due to the fact that accurate comparative measurements could be made in order to examine the performance of our sensor suite. We believe that the results have shown that our sensor suite has the potential to generate accurate measurements in an underwater environment that will allow for a UUV to operate autonomously. By using our proposed sensor package the dynamic position of a low cost, unmanned, underwater vehicle can be known more accurately. By increasing the accuracy at which these vehicles can dynamically position themselves, the performance of underwater sensor networks [32] or communication networks [33] which make use of such vehicles can be improved. The next step in this research is to mount our sensor package to a UUV and to examine how well the sensor package operates in an underwater environment.