An Underwater Localization Method Based on Visual SLAM for the Near-Bottom Environment

Abstract: The feature matching of near-bottom visual SLAM is influenced by underwater raised sediments, resulting in tracking loss. In this paper, a novel visual SLAM system is proposed for the underwater raised sediments environment. The underwater images are first classified by a color recognition method that adds pixel location weights to reduce the interference of similar colors on the seabed. An improved adaptive median filter is then proposed to filter the classified images, using the mean value of the filter window border as the discriminant condition to retain the original features of the image. The filtered images are finally processed by the tracking module to obtain the trajectory of the underwater vehicle and the seafloor maps. Datasets of seamount areas captured in the western Pacific Ocean are processed by the improved visual SLAM system. The keyframes, mapping points, and feature point matching pairs extracted by the improved visual SLAM system increase by 5.2%, 11.2%, and 4.5%, respectively, compared with those of the ORB-SLAM3 system. The improved visual SLAM system is robust to dynamic disturbances, which is of practical value for underwater vehicles operating in near-bottom areas such as seamounts and nodules.


Introduction
Underwater inspection and surveying are important for applications such as seafloor resource exploration. Visual technology is widely used for underwater inspection and surveys. Deep-sea mining and seafloor oil and gas exploration using autonomous underwater vehicles (AUVs) are popular research topics [1][2][3]. Wang uses a robust real-time AUV self-localization method for deep-sea exploration [4]. Hong presents a portable AUV named Shark for vision-based underwater inspection [5]. Chemisky reviews close-range optical methods for the underwater oil and gas industry [6]. Stenius presents the application of AUVs to seaweed farm inspection [7]. Underwater visual localization technology such as underwater visual SLAM is one of the key technologies for underwater inspection tasks. Underwater visual SLAM localizes the trajectory of underwater vehicles and constructs the surrounding environment by feature matching and tracking in the video images captured in the underwater scene [8,9]. Underwater navigation is necessary for the operation of autonomous underwater vehicles [10]. The position of the underwater vehicle is a prerequisite for underwater navigation [11]. Because GPS information is unavailable underwater, underwater localization is difficult and has been extensively investigated. Multi-sensor information fusion is commonly applied to obtain the position of autonomous underwater vehicles. A passive inverted ultra-short baseline (piUSBL) positioning system is proposed by Wang [12], which provides accurate and instantaneous positioning for small AUVs by single-beacon tracking at low cost and power consumption. Acoustic image processing based on multi-beam bathymetric sonar data is utilized in underwater terrain-aided navigation, which is more stable when the direction error is greater than 10°, and the accuracy is approximately 50% better. At the same time, noise and scale vary in the real-time images compared with the terrain contour matching algorithm [13]. In addition to using additional sensor information to aid navigation, improved underwater navigation algorithms are also proposed for underwater localization [14][15][16]. However, accumulated errors are unavoidable in traditional integrated navigation methods, and devices such as DVL and USBL fail in near-bottom environments. Machine vision can work steadily for long periods with tiny cumulative errors over time [17,18]. Thus, underwater visual SLAM has been developed in recent years for location awareness of the underwater environment and for assisted navigation, and it has excellent potential for unmanned surveys and free cruising applications [19][20][21][22][23]. The application of underwater SLAM is widely studied. Issartel uses SLAM technology for underwater military tasks that require concealment [24]. Zhang utilizes BSLAM (Bathymetric Particle Filter SLAM), which is accurate and fast, for oceanographic surveys, demining, and seabed mapping [25]. Yang proposes a SLAM localization algorithm using forward-looking sonar for deep-sea mining [26]. Mahajan proposes a pilot aid using visual SLAM for seabed surveying applications [27].
Joshi verifies and analyzes multiple visual algorithms in the underwater environment, such as OKVIS, SVO, ROVIO, VINS-Mono, and ORB-SLAM3, based on the datasets tested [28]. The Fugu-f (Fugu flexible) system is presented to provide visual localization in submarine tasks such as navigation, surveying, and mapping under good visibility, with the main advantages of robustness and flexibility [29]. An omnidirectional and positioning-tolerant miniaturized prototype AUV docking system based on an integrated visual navigation and docking algorithm is proposed to solve planar-type docking issues in a transparent water environment [30]. Although underwater visual SLAM has been used for AUV navigation tasks, it is not feasible in all cases because of limitations in underwater image quality. A keyframe-based monocular visual odometry method is proposed that adds a re-tracking mechanism to enhance optical flow tracking, realizing robust vision-based underwater localization in turbid underwater environments [31]. However, underwater raised sediments sometimes occur in near-bottom environments. The feature matching of visual SLAM is affected by the random motion direction of the underwater raised sediments. There are few studies on this influence.
When visual SLAM is affected by the external environment, there are two main solutions: adding sensors or image pre-processing [32]. Since the role of visual SLAM is auxiliary rather than central in underwater navigation, the solution of adding sensors for visual SLAM is not adopted in underwater navigation. Image pre-processing for visual SLAM includes two parts: image classification and image denoising.
Because the raised sediment phenomenon is occasional and does not always exist, image classification can reduce the computational effort and so improve real-time performance, which is essential for visual SLAM. Image classification methods use underwater image features such as color, texture, shape, and spatial distribution to classify images. Mittal uses deep learning and image color analysis to classify underwater photographs and recognize numerous objects, such as fish, plankton, coral reefs, submarines, and the gestures of sea divers [33]. Mahmood proposes a new image feature (called ResFeats) extracted from the textures and shapes of images. ResFeats achieves state-of-the-art classification accuracies on the MLC, Benthoz15, EILAT, and RSMAS datasets compared with the traditional CNN method [34]. Lopez-Vazquez designs a pipeline that integrates spatial distribution and temporal dynamics information and performs well in mobile and sessile megafauna recognition and classification tasks [35]. The accuracy of the classification method reaches 76.18% in deep-sea videos taken at 260 m depth in the Barents Sea. As the underwater raised sediments differ noticeably in color from seawater and are strongly random in shape and spatial distribution, the color features of the underwater image are utilized for image classification. However, the color of the underwater raised sediments is similar to that of the sea floor in the near-bottom environment, which affects image classification. There are few studies on image classification based on image color features in near-bottom environments.
The images that need to be denoised are obtained through image classification. Image-denoising methods include learning-based and filtering-based methods [36]. Deep learning techniques such as convolutional neural networks (CNN) are widely utilized for learning-based image denoising [37]. Pang proposes a data augmentation technique called recorrupted-to-recorrupted (R2R) to achieve unsupervised image denoising. The method is competitive with representative supervised image-denoising methods [38]. Li proposes a self-supervised image denoising (SSID) method for real-world sRGB images to seek spatially adaptive supervision on real-world sRGB photographs [39]. Dasari uses a GAN architecture and histogram equalization to enhance underwater images, with 2186 real-world underwater images used for verification [40]. Raj uses SURF (Speeded-Up Robust Features) and SVM (Support Vector Machines) algorithms to attain maximum accuracy in underwater image classification [41]. Moghimi uses a two-step image enhancement method, which includes color correction and a convolutional neural network (CNN) with deep learning capability, to enhance underwater image quality [42].
However, underwater raised sediments are often random and unknown during underwater operations. The background also changes because of the movement of the underwater vehicle. When the operating area is unknown, there are not enough samples to train models for underwater image classification and denoising in an underwater survey task. Models trained on terrestrial data give poor results when used directly. Therefore, filtering-based methods are utilized for image denoising. There are extensive studies on filtering-based image-denoising methods [43]. Kumar proposes different median filter methods to eliminate salt-and-pepper noise [44]. Sagar presents a circular adaptive median filter (CAMF) to remove salt-and-pepper noise of varying densities from magnetic resonance imaging (MRI) images [45]. It can be seen that the primary filter-based methods are effective for salt-and-pepper noise. The underwater raised sediments are similar to salt-and-pepper noise in that they differ significantly in color from their surroundings. However, they differ greatly in size and shape. There is little research on removing underwater raised sediments through filter-based methods.
In this paper, an improved visual SLAM system is proposed to reduce the impact of underwater raised sediments on feature matching and thereby improve the robustness of visual SLAM. Pixel position weights are added to the color recognition method in HSV color space for image classification, which reduces the interference of similar colors on the seabed and improves the real-time performance of the visual SLAM. An improved adaptive median filtering method is proposed that uses the mean value of the window border pixels as the filtering criterion, which retains more of the original features of the classified images and improves the success rate of feature matching. The filtered images are finally processed by the tracking module to obtain the trajectory of the underwater vehicle and the seafloor maps. The video datasets captured in the western Pacific Ocean at 3000 m depth are processed by the improved visual SLAM system. The keyframes, mapping points, and feature point matching pairs extracted by the improved visual SLAM system increase by 5.2%, 11.2%, and 4.5%, respectively, compared with ORB-SLAM3. The improved visual SLAM system is robust to dynamic disturbances such as underwater raised sediments and water bubbles, which is of practical value for underwater vehicles operating in near-bottom areas such as seamounts, rockfall, and nodules.
This paper is organized as follows. Section 2 describes the underwater raised sediments phenomenon and proposes a novel visual SLAM system for the underwater raised sediments environment. Section 3 presents the image classification and denoising methods used for image pre-processing. Then, image classification and quality evaluation results are obtained to verify the effectiveness of the methods. Section 4 shows the experimental tests and results. Section 5 presents the conclusions of the paper.

The Visual SLAM System for Underwater Raised Sediments Environment Establishment
Underwater images taken by the autonomous underwater vehicle while working at a depth of 3000 m in the western Pacific seamount area are shown in Figure 1. These images contain no raised sediment and show the different areas encountered during the movement and operation of the vehicle. The seamount area is shown in Figure 1a, the rockfall area is shown in Figure 1b, and the nodules area is shown in Figure 1c. The images of the different underwater areas are clear and rich in texture detail. When autonomous underwater vehicles operate in the seamount area near the bottom, the localization of low-cost vehicles is generally realized by an inertial navigation system (INS) with an IMU as the core sensor, combined with integrated navigation assisted by underwater acoustic equipment such as the Doppler velocity log (DVL) and ultra-short baseline (USBL). However, the DVL has a dead band because its installation position on low-cost autonomous underwater vehicles is too close to the seabed. The USBL is disturbed by bubbles generated by the propeller. Underwater acoustic equipment therefore fails when the vehicle moves and inspects near-bottom seamount areas. Because the images taken in the underwater seamount area are clear and their texture features are sufficient, the visual localization method is utilized instead of underwater acoustic equipment to assist integrated navigation.
Due to the movement of the underwater vehicle and the disturbance of the propeller near the bottom, underwater raised sediments are sometimes generated in near-bottom areas, as shown in Figure 2. The degree of the underwater raised sediments differs between areas. The raised sediment in the seamount area is light, as shown in Figure 2a: it occupies a small part of the images, and most of the feature information is retained compared with the original images. The raised sediment in the rockfall area is medium, as shown in Figure 2b: it takes up most of the images, and most of the feature information is influenced by the raised sediments. The raised sediment in the nodule area is heavy, as shown in Figure 2c: there is fog dust in addition to line dust, and little of the feature information is retained. The impact of underwater raised sediments on visual localization techniques such as ORB-SLAM3 has two main aspects. First, the original images are obscured by the raised sediments, which reduces the feature information. Image recovery techniques are generally used to recover such characteristics, but they require accurate image information, which is often unknown in the underwater near-bottom environment. Second, the direction of motion of the raised sediments is random and differs from the motion direction of the underwater vehicle. When the amount of raised sediment increases, the number of feature point matching pairs is reduced by the directional consistency detection of the tracking module. Therefore, image pre-processing is necessary to remove raised sediments, which can increase the number of feature point matching pairs and the robustness of visual SLAM.

An improved visual SLAM system is proposed for visual localization in the underwater raised sediments environment. The block diagram of the improved underwater visual SLAM system is shown in Figure 3. The monocular video is extracted frame by frame into an image stream, and each image is classified according to whether or not it has to be pre-processed. The improved adaptive median filtering algorithm is then utilized to denoise the raised sediment images. The ORB feature points are extracted from the pre-processed images, and the matching between the initial frame and the current frame is achieved by performing monocular initialization using the homography matrix or the fundamental matrix. The initial pose is obtained by epipolar geometry. The matched feature point pairs are then projected into 3D map points by triangulation. Feature matching is then conducted by tracking the motion model or the keyframe. Then, the pose of the current frame is estimated by PnP (Perspective-n-Point), for example with DLT (Direct Linear Transform), and optimized by back-end optimization such as BA (Bundle Adjustment). Loop closure detection is not necessary, as the underwater vehicle does not return to its starting point in the underwater exploration task.
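The classify-then-filter front end described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `preprocess_stream` and the toy classifier and filter passed to it are hypothetical placeholders, and the tracking module itself is outside the sketch.

```python
from typing import Callable, Iterable, Iterator, TypeVar

Frame = TypeVar("Frame")


def preprocess_stream(
    frames: Iterable[Frame],
    needs_denoising: Callable[[Frame], bool],
    denoise: Callable[[Frame], Frame],
) -> Iterator[Frame]:
    """Yield frames for the tracking module, filtering only the flagged ones."""
    for frame in frames:
        # Frames without raised sediment bypass the filter unchanged,
        # which saves computation and keeps their original features.
        yield denoise(frame) if needs_denoising(frame) else frame


# Toy usage with hypothetical stand-ins for the classifier and the filter.
frames = [[10, 10, 200], [10, 11, 12]]
flag_bright = lambda f: max(f) > 100             # placeholder classifier
clip_bright = lambda f: [min(v, 100) for v in f]  # placeholder filter
ready = list(preprocess_stream(frames, flag_bright, clip_bright))
```

Only the first toy frame is modified; the second is passed to the tracker as captured, which mirrors the bypass path in the block diagram.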

Underwater Raised Sediments Image Pre-Processing Method
The underwater raised sediments image pre-processing method consists of image classification and image denoising, which together remove dynamic interference from underwater images. The proposed image classification method is presented in Section 3.1. The underwater images are classified by a color recognition method in HSV color space, with pixel location weights added to reduce the interference of similar colors on the seabed, to obtain the images that need to be denoised. The improved adaptive median filtering method is proposed in Section 3.2. It denoises the classified images by using the mean value of the filter window border as the discriminant condition, which reduces the disturbance of underwater raised sediments and retains the original features of the image. The quality of the filtered images is assessed in Section 3.3 through the Mean Gradient (MG), Structural Similarity (SSIM), and Peak Signal-to-Noise Ratio (PSNR) methods.
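As an illustration of one of the quality measures named above, the following Python sketch computes PSNR using its standard definition, 10·log10(MAX² / MSE). The function name `psnr`, the list-of-rows image representation, and the 8-bit default `max_value` are assumptions made for this sketch; the exact formulation used in Section 3.3 may differ.

```python
import math


def psnr(reference, filtered, max_value=255.0):
    """PSNR in dB between two equally sized grayscale images (lists of rows)."""
    squared_errors = [
        (r - f) ** 2
        for ref_row, fil_row in zip(reference, filtered)
        for r, f in zip(ref_row, fil_row)
    ]
    mse = sum(squared_errors) / len(squared_errors)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)
```

A higher PSNR indicates that the filtered image deviates less from the reference image.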

The Proposed Image Classification Method
The flowchart of the proposed image classification method is shown in Figure 4, where the dotted box indicates the proposed image classification method. The original underwater images are first converted to HSV color space. Location weight values, set according to the application scene, are added to the color recognition to reduce the interference of similar colors on the seabed. When the color recognition value of the image exceeds the discriminant threshold, the image needs to be filtered; otherwise, the original image is retained. The classified images are passed to the filter module, and the remaining images are passed directly to the tracking module.

Color spaces for images include the RGB (Red, Green, and Blue), CMY (Cyan, Magenta, and Yellow), and HSV (Hue, Saturation, and Value) color spaces, among others. The CMY color space is used to represent printed images, and digital images commonly use the RGB and HSV color spaces. The RGB color space is the most common way to describe electronic images, but parameter changes in RGB can cause significant color changes. Therefore, it is not suitable for classifying images by color. Conversely, the HSV color space can be used to classify underwater images by setting a parameter range for the color.
Images are classified based on the color and location of the underwater raised sediments to determine whether the images need to be denoised. The color of the underwater raised sediments is different from that of the water, so image classification can be realized through color recognition. However, since the color of the seabed is similar to that of the underwater raised sediments, location is also necessary to determine the color weights. Depending on the camera mounting angle and distance from the bottom, raised sediments suspended in seawater appear in the upper half of near-bottom images, not in the lower half occupied by the seafloor. Therefore, the yellow pixel weight for the seawater is ω_a, and the yellow pixel weight for the seabed is ω_b, with ω_a ≫ ω_b. The size of the photo is m × n, the pixel location is expressed by (i, j), and the location weight ω is expressed in Equation (1).

The images captured by the underwater camera are stored in RGB color space. RGB color space uses a linear combination of three color components to represent color. In color recognition, however, any color is highly correlated with all three components, so changing a color continuously is not intuitive: all three components must change to adjust the color of the image. Therefore, RGB is converted into HSV color space as expressed in Equation (2). The color component values of a pixel are R, G, and B; max is the maximum of the three color components, min is the minimum of the three color components, V is the luminance value, S is the saturation, and H is the hue. Color is highly correlated only with hue. Adjusting saturation and value results in similar colors.
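The RGB-to-HSV conversion can be sketched with Python's standard `colorsys` module, which implements the usual max/min-based conversion. The helper name `rgb_to_hsv_radians`, the assumption of 8-bit input channels, and the scaling of S and V to [0, 1] are choices made for this sketch; the paper's Equation (2) may scale the components differently.

```python
import colorsys
import math


def rgb_to_hsv_radians(r, g, b):
    """Convert 8-bit R, G, B to (H in radians, S in [0, 1], V in [0, 1])."""
    # colorsys expects floats in [0, 1] and returns the hue as a fraction of a turn.
    h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    return h * 2.0 * math.pi, s, v


# Pure yellow and a darker yellow share the same hue and differ only in V.
h1, s1, v1 = rgb_to_hsv_radians(255, 255, 0)
h2, s2, v2 = rgb_to_hsv_radians(128, 128, 0)
```

The two yellows illustrate why hue is the convenient channel for color recognition: in RGB both the R and G components change between them, whereas in HSV only the value changes.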
The original purpose of image classification is to reduce the amount of computation. The purpose of adding position weights is to increase the accuracy of color-based image classification. The position weights cannot correctly classify all images, but the ORB-SLAM3 system itself is robust to images that are not correctly classified. The range of values for the color of underwater raised sediments is set in Equation (3), where H is expressed in radians. k is a Boolean value: if the HSV value of the pixel satisfies Equation (3), k_ij is 1; otherwise, it is 0. ω is the position weight, K is the number of pixels identified after adding the position weight, and ρ is the discriminant threshold.
When K > ρ, the image is considered to need denoising; otherwise, the original image is kept.
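The decision rule can be illustrated with the Python sketch below. Several elements are assumptions rather than values from the paper: the hue, saturation, and value bounds stand in for Equation (3), the weights `w_a` and `w_b` and the half-image split stand in for Equation (1), and K is taken to be the weighted sum of k_ij over all pixels, which is one reading of the description above.

```python
import colorsys
import math

# Hypothetical sediment color range; the bounds of Equation (3) are not reproduced here.
H_RANGE = (math.radians(30), math.radians(70))
S_MIN, V_MIN = 0.3, 0.3


def location_weight(i, m, w_a=1.0, w_b=0.01):
    """Assumed split: rows in the upper half are seawater (w_a), the rest seabed (w_b)."""
    return w_a if i < m / 2 else w_b


def is_sediment_colour(rgb):
    """Boolean k for one pixel: True if its HSV value falls in the assumed range."""
    r, g, b = rgb
    h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    h_rad = h * 2.0 * math.pi
    return H_RANGE[0] <= h_rad <= H_RANGE[1] and s >= S_MIN and v >= V_MIN


def weighted_count(image):
    """K: sum of location weights over pixels whose color matches the range."""
    m = len(image)
    return sum(
        location_weight(i, m)
        for i, row in enumerate(image)
        for rgb in row
        if is_sediment_colour(rgb)
    )


def needs_denoising(image, rho):
    """Flag the image for filtering when K exceeds the threshold rho."""
    return weighted_count(image) > rho


yellow, blue = (255, 255, 0), (0, 0, 128)
image = [[yellow] * 3, [blue] * 3, [yellow] * 3, [yellow] * 3]
```

In the toy image, the three yellow pixels in the upper half dominate K, while the six yellow pixels in the lower half contribute very little, which is the intended effect of choosing ω_a ≫ ω_b.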
For each of the underwater videos of the different areas, 2000 underwater images were extracted at 15 fps. With the discriminant threshold set to ρ = 10, 15, 20, 25, and 30, the classification results are shown in Tables 1-3. The 'real' entry gives the true results used for comparison. The Need-Denoising entry gives the number of images classified as needing denoising, and the Original entry gives the number of remaining images. Accuracy is the number of correctly detected images divided by the total number of images. The image classification in the seamount area has high accuracy when ρ = 15 or 20. The real number of Need-Denoising images in the rockfall area is larger than that in the seamount area. The number of Need-Denoising images detected in the rockfall area exceeds the real number because the underwater raised sediment takes up most of the images. During the operation of an underwater vehicle, underwater raised sediments are not present at every moment, so not all images need to be denoised. The image classification method proposed in this paper can significantly reduce the computational effort and retain original image features.
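The accuracy measure defined above amounts to a simple ratio, sketched here in Python. The function name `classification_accuracy` and the Boolean label lists are hypothetical; the sample labels are made-up toy values, not data from Tables 1-3.

```python
def classification_accuracy(predicted, actual):
    """Fraction of images whose predicted label matches the true label."""
    if len(predicted) != len(actual):
        raise ValueError("label lists must have the same length")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


# Toy labels: True means the image needs denoising.
predicted = [True, False, True, True]
actual = [True, False, False, True]
accuracy = classification_accuracy(predicted, actual)
```

Repeating this computation for each candidate threshold ρ gives one accuracy value per threshold, from which a working threshold can be chosen.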

The Improved Adaptive Median Filtering Method
The flowchart of the improved adaptive median filtering method is shown in Figure 5, where the dotted box indicates the improved adaptive median filtering method. One pixel in the classified image is chosen, and the filter window length is initialized (set to 3). The mean value of the window border is compared with the mean value of the background, which is the mean value of the image. The window length is determined from the result of this comparison based on the Pauta criterion. The mean value of the window border is also used as the filter discriminant condition: it is compared with the value of the chosen pixel to decide whether to filter or not. If the pixel is filtered, it is replaced by the median value of the filter window; otherwise, the original value is retained. The same steps are then repeated for the next pixel until the end of the loop. Finally, the filtered image is passed to the tracking module.

Figure 5. Flowchart of the improved adaptive median filtering method. Block labels: window length initialization; mean value of the window border; mean value of the background; window length determination; increase filter window size; the value of the chosen pixel; the median value of the filter window; the original pixel value; next pixel.

The underwater raised sediments are small in the image, and the color is significantly different from seawater. Due to the relative motion of underwater raised sediments and underwater vehicles, there is a smear in the underwater image, as shown in Figure 2. The smear can be seen as a kind of line salt-and-pepper noise. Adaptive median filtering can effectively filter out salt-and-pepper noise of different sizes through adaptive window size. The original adaptive median filter is more suitable for granular noise. This paper proposes an improved adaptive median filter for line noise. The window size is determined according to the mean pixel value of the window border compared with the mean pixel value of the background. Then, the center point is compared with the mean pixel value of the window border to decide whether to filter or retain. If the filter is decided, the filtered pixel points are replaced by the median value within the window. The improved adaptive filtering has a better effect of denoising the line noise.
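The filtering steps described above can be sketched in Python as follows. This is a hedged illustration, not the authors' implementation: the Pauta criterion is interpreted here as a 3σ test of the border mean against the background mean, and the parameters `max_size`, `k_sigma`, and `diff_threshold`, as well as the absolute-difference filter decision, are assumptions introduced for the sketch.

```python
import statistics


def window_values(img, r, c, half):
    """Return (all values, border values) of the square window at (r, c), clipped to the image."""
    rows, cols = len(img), len(img[0])
    inside, border = [], []
    for i in range(r - half, r + half + 1):
        for j in range(c - half, c + half + 1):
            if 0 <= i < rows and 0 <= j < cols:
                inside.append(img[i][j])
                if abs(i - r) == half or abs(j - c) == half:
                    border.append(img[i][j])
    return inside, border


def border_mean_median_filter(img, max_size=7, k_sigma=3.0, diff_threshold=50.0):
    """Filter a grayscale image (list of rows, at least 2 x 2) and return a new image."""
    flat = [v for row in img for v in row]
    bg_mean = statistics.fmean(flat)   # background mean = mean of the whole image
    bg_std = statistics.pstdev(flat)
    out = [row[:] for row in img]
    for r in range(len(img)):
        for c in range(len(img[0])):
            size = 3  # initial window length
            while True:
                inside, border = window_values(img, r, c, size // 2)
                border_mean = statistics.fmean(border)
                # Assumed Pauta test: stop growing once the border looks like background.
                if abs(border_mean - bg_mean) <= k_sigma * bg_std or size + 2 > max_size:
                    break
                size += 2
            # Assumed decision rule: filter when the pixel departs from the border mean.
            if abs(img[r][c] - border_mean) > diff_threshold:
                out[r][c] = statistics.median(inside)
    return out


# Toy image: dark background (20) with a bright vertical line (250) in column 3.
noisy = [[250 if c == 3 else 20 for c in range(7)] for _ in range(7)]
clean = border_mean_median_filter(noisy)
```

In this toy case the one-pixel-wide line is replaced by the background value because each 3 × 3 window contains more background pixels than line pixels, so the window median equals the background.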

Filter window size determination
The size of the filter window should be chosen to be as small as possible to retai more feature information about the underwater images.Thus, an adaptive filter method for the selection of filter window size is applied.The window is a square, and the initia value of the window length is 3.The gray value of the center point is shown in th cell of the window, as shown in Figure 6.The gray value of the center point is 171, an Underwater-raised sediments are small in the image, and the color is significantly different from seawater.Due to the relative motion of underwater raised sediments and underwater vehicles, there is a smear in the underwater image, as shown in Figure 2. The smear can be seen as a kind of line salt-and-pepper noise.Adaptive median filtering can effectively filter out salt-and-pepper noise of different sizes through adaptive window size.The original adaptive median filter is more suitable for granular noise.This paper proposes an improved adaptive median filter for line noise.The window size is determined according to the mean pixel value of the window border compared with the mean pixel value of the background.Then, the center point is compared with the mean pixel value of the window border to decide whether to filter or retain.If the filter is decided, the filtered pixel points are replaced by the median value within the window.The improved adaptive filtering has a better effect of denoising the line noise.
The underwater raised sediments are small in the image, and their color is significantly different from that of seawater. Due to the relative motion between the raised sediments and the underwater vehicle, the sediments leave a smear in the underwater image, as shown in Figure 2. The smear can be seen as a kind of line-shaped salt-and-pepper noise. Adaptive median filtering can effectively filter out salt-and-pepper noise of different sizes by adapting the window size, but the original adaptive median filter is more suitable for granular noise. This paper proposes an improved adaptive median filter for line noise. The window size is determined by comparing the mean pixel value of the window border with the mean pixel value of the background. Then, the center point is compared with the mean pixel value of the window border to decide whether to filter or retain it. If the pixel is to be filtered, it is replaced by the median value within the window. The improved adaptive filter denoises line noise more effectively.

1. Filter window size determination
The filter window should be as small as possible so that more feature information of the underwater image is retained. Thus, an adaptive method is applied to select the filter window size. The window is a square, and the initial value of the window length $n_w$ is 3. The gray values in the window centered on the point $p_{ij}$ are shown cell by cell in Figure 6. The gray value of the center point $p_{ij}$ is 171, and the pixels of the window are sorted from smallest to largest, as shown in Figure 7. The maximum gray value $Z_{max}$ of the window is 171, the minimum gray value $Z_{min}$ of the window is 113, and the median gray value $Z_{med}$ of the window is 118. The mean gray value of the window border $w_{ij}$ is 139.5. The mean of the background grayscale value is $u_b$, and the standard deviation of the background grayscale value is $\sigma_b$. According to the Pauta criterion, if $|w_{ij} - u_b| < 3\sigma_b$, the window length $n_w$ is determined and the filtering session is entered. Otherwise, 2 is added to the window length $n_w$ and a new cycle is performed until the window length is determined. The maximum value of the window length is $n_{max}$. When $n_w = n_{max}$, the filtering session is entered directly.
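The window-length selection described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `border_mean` and `select_window_length`, the reflect padding at the image edges, and the default `n_max = 7` are hypothetical choices, since the source does not give a value for $n_{max}$ or say how image borders are handled.

```python
import numpy as np


def border_mean(window: np.ndarray) -> float:
    """Mean gray value of the outermost ring of a square window (n >= 3)."""
    n = window.shape[0]
    ring_sum = window.sum() - window[1:-1, 1:-1].sum()
    return float(ring_sum) / (4 * n - 4)


def select_window_length(img, i, j, u_b, sigma_b, n_init=3, n_max=7):
    """Grow the window by 2 until the border mean passes the 3-sigma test.

    u_b and sigma_b are the background mean and standard deviation.
    n_max = 7 is an assumed cap; the source only names the symbol n_max.
    """
    pad = n_max // 2
    # Reflect padding (an assumption) keeps windows near the edge full-sized.
    padded = np.pad(np.asarray(img, dtype=np.float64), pad, mode="reflect")
    n_w = n_init
    while n_w < n_max:
        half = n_w // 2
        r, c = i + pad, j + pad
        window = padded[r - half:r + half + 1, c - half:c + half + 1]
        if abs(border_mean(window) - u_b) < 3.0 * sigma_b:
            break  # border looks like background: window length is decided
        n_w += 2  # border still contaminated: enlarge the window
    return n_w


if __name__ == "__main__":
    # Synthetic example: flat background with a short bright horizontal line.
    img = np.full((9, 9), 100.0)
    img[4, 2:7] = 200.0
    print(select_window_length(img, 4, 4, u_b=100.0, sigma_b=5.0))
```

In this sketch the background statistics are passed in explicitly. Following the text, `u_b` could be taken as the mean of the whole image; using the whole-image standard deviation for `sigma_b` would be a further assumption.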

2. Filtering session
Mean filtering is a common method for image denoising. However, median filtering requires less computation, which benefits the real-time performance of ORB-SLAM3, and an image processed by median filtering retains more feature information than one processed by mean filtering, which improves the stability of ORB-SLAM3. Therefore, median filtering is applied for underwater image denoising. The gray value of the center point $p_{ij}$ is $g_{ij}$, the mean gray value of the window border is $u_w$ (Figure 8), and the standard deviation of the background grayscale value is $\sigma_w$. According to the Pauta criterion, if $|g_{ij} - u_w| < 3\sigma_w$, the gray value of the center point $g_{ij}$ is replaced by the median gray value of the window $Z_{med}$; otherwise, the original value $g_{ij}$ is retained.

The Filtered Image Quality Assessment
The assessment of image quality focuses on three aspects. First, whether the sharpness and contrast of the images are effectively improved is generally evaluated by the Mean Gradient (MG). Second, whether the enhanced images retain as much of the information of the original image as possible is generally evaluated by the Structural Similarity (SSIM) and the Peak Signal-to-Noise Ratio (PSNR). Finally, the denoising effect is analyzed visually by comparing the images before and after processing.
(1) Mean Gradient (MG) is used to measure the contrast of images; the larger the mean gradient, the greater the contrast.
Here, $p_{i,j}$ denotes the pixel value at position $(i, j)$, and the image size is $H \times W$.
(2) Structural Similarity (SSIM) is applied to evaluate the pixel correlation between the processed and original images.
Here, $\mu_x$ and $\mu_y$ are the means of the two images, $\sigma_x$ and $\sigma_y$ are their standard deviations, $\sigma_{xy}$ is the covariance of the two images, and $C_1$ and $C_2$ are constants.
(3) Peak Signal-to-Noise Ratio (PSNR) reflects the degree of similarity between the processed and original images and plays a role similar to that of SSIM.
Here, MSE is the Mean Square Error of the two images before and after processing, and $\epsilon$ is the number of bits per pixel, which is taken as 8.
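For reference, commonly used forms of the three metrics that are consistent with the symbols defined above are sketched here. They are assumed standard definitions rather than expressions confirmed by the source. In particular, the normalization of MG varies between papers. The symbols $x$ and $y$ denote the two images being compared.

```latex
\begin{align}
\mathrm{MG} &= \frac{1}{(H-1)(W-1)} \sum_{i=1}^{H-1}\sum_{j=1}^{W-1}
  \sqrt{\frac{(p_{i+1,j}-p_{i,j})^{2} + (p_{i,j+1}-p_{i,j})^{2}}{2}} \\
\mathrm{SSIM}(x,y) &= \frac{(2\mu_x\mu_y + C_1)(2\sigma_{xy} + C_2)}
  {(\mu_x^{2}+\mu_y^{2}+C_1)(\sigma_x^{2}+\sigma_y^{2}+C_2)} \\
\mathrm{PSNR} &= 10\log_{10}\frac{(2^{\epsilon}-1)^{2}}{\mathrm{MSE}}
\end{align}
```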
Because the raised sediments are severe in the rockfall and nodules areas, the feature matching of ORB-SLAM3 is affected, leading to initialization failure. Therefore, the effectiveness of the image pre-processing and of the enhancement of ORB-SLAM3 is validated in the seamount area.
An original color image of underwater raised sediments is shown in Figure 9a, and its grayscale image is shown in Figure 9b. The results of the traditional adaptive median filter (TAMF) and the improved adaptive median filter (IAMF) on the original grayscale image are shown in Figures 9c and 9d, respectively. The image quality assessment results for the three metrics and both filters are shown in Table 4. The improvement is calculated as the difference between the values of the two filter methods divided by the value of the traditional adaptive median filter. The image contrast (MG) is significantly improved by 106.7%, the image similarity (SSIM) by 4.6%, and the PSNR by 1.1%. Higher image similarity means that more of the original information is preserved, which can also be seen by observing the images directly. The improved adaptive median filter has a better denoising effect on line noise than the traditional adaptive median filter.
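The comparison in Table 4 can be illustrated with a short Python sketch of two of the metrics and of the improvement percentage. The function names are hypothetical, and the MG normalization used here (root of the mean squared horizontal and vertical differences, averaged over the image) is one common convention assumed for illustration, not necessarily the one used by the authors. SSIM is omitted for brevity. If scikit-image is available, its `skimage.metrics.structural_similarity` function could be used for that metric.

```python
import numpy as np


def mean_gradient(img) -> float:
    """Mean Gradient (MG): average local gray-level change, a contrast proxy."""
    p = np.asarray(img, dtype=np.float64)
    dx = p[:-1, 1:] - p[:-1, :-1]  # difference to the right-hand neighbor
    dy = p[1:, :-1] - p[:-1, :-1]  # difference to the neighbor below
    return float(np.mean(np.sqrt((dx ** 2 + dy ** 2) / 2.0)))


def psnr(reference, processed, bits: int = 8) -> float:
    """Peak Signal-to-Noise Ratio in dB for images with `bits` bits per pixel."""
    ref = np.asarray(reference, dtype=np.float64)
    out = np.asarray(processed, dtype=np.float64)
    mse = float(np.mean((ref - out) ** 2))
    if mse == 0.0:
        return float("inf")  # identical images
    peak = 2 ** bits - 1
    return float(10.0 * np.log10(peak ** 2 / mse))


def improvement_percent(iamf_value: float, tamf_value: float) -> float:
    """(IAMF - TAMF) / TAMF, expressed in percent, as described in the text."""
    return 100.0 * (iamf_value - tamf_value) / tamf_value
```

Given a grayscale image `gray` and the two filtered outputs `tamf_out` and `iamf_out` (hypothetical variable names), the MG improvement would be `improvement_percent(mean_gradient(iamf_out), mean_gradient(tamf_out))`, and the PSNR of each filter would be computed against `gray`.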

Experimental Verification and Analysis
The experimental environment is described in Section 4.1, and the experimental procedures and results are presented in Section 4.2. The datasets of seamount areas captured in the western Pacific Ocean are processed by the improved visual SLAM system. The initialization phase is recorded, which shows that the improved system initializes more readily. The keyframes, mapping points, and feature point matching pairs extracted by the improved visual SLAM system are increased by 5.2%, 11.2%, and 4.5%, respectively, compared with those of the ORB-SLAM3 system. The improved visual SLAM system is robust to dynamic disturbances such as underwater raised sediments and water bubbles, which is of practical value for underwater vehicles operating in near-bottom areas such as seamounts, rockfall, and nodules.

The Experimental Environment for the Seamount Images
The experimental environment is shown in Table 5. ORB-SLAM3, a popular visual SLAM method, is selected as the comparison system. The SLAM system runs on Ubuntu 20.04, and the image processing runs on Windows 10. The image processing software is MATLAB R2022a, and the computer CPU is a Ryzen 7.

Experimental Procedures and Results for the Seamount Images
The experimental tests were conducted using videos captured by an ROV in the seamount area at a frame rate of 15 fps, with 2000 consecutive frames forming each image dataset. ORB-SLAM3, a well-known monocular visual SLAM system, is used for comparison. The datasets are processed by the proposed visual SLAM system and by ORB-SLAM3. The initialization phase, mapping, and trajectory are recorded. The numbers of keyframes, matched point pairs, and map points are extracted. The analysis is performed according to the experimental results.
The initialization phase is recorded in Figures 10 and 11. The tests were carried out several times, and Test 1 and Test 2 are two representative tests. Under the ORB-SLAM3 system, initialization failed because of the influence of the underwater raised sediments (Figure 10): not enough feature points were matched to finish the initialization. The seamount images are not as rich in features as terrestrial environments, so initialization fails when dynamic disturbances such as water bubbles and underwater raised sediments are also present. Under the proposed system, initialization succeeded rapidly owing to the image pre-processing for the underwater raised sediments (Figure 11). Initialization under the proposed system therefore succeeds readily in the underwater raised sediments environment.
Mapping and trajectory in the initialization phase under the proposed system are recorded in Figure 12. Mapping and trajectory in the end phase under the ORB-SLAM3 system are recorded in Figure 13, and those under the proposed system in Figure 14. Representative moments, such as moment 1 and moment 2, were recorded continuously in Figures 12-14. The proposed system recovers more of the map and trajectory than the ORB-SLAM3 system, which can be seen from the numbers of keyframes and map points. Processing the underwater raised sediment images in the datasets with the proposed system increased the number of matched point pairs. More matched point pairs provide more accurate tracking, which is evidenced by the larger numbers of keyframes and mapping points obtained.
The number of match point pairs (Matches) under both systems is shown in Figure 15: Figure 15a for the proposed system and Figure 15b for the ORB-SLAM3 system. The statistical results of the match point pairs are shown in Table 6, where Max, Min, and Average are the maximum, minimum, and average numbers of match point pairs, and the improvement is calculated as the difference between the values of the two systems divided by the corresponding value of ORB-SLAM3. The proposed system decreased the impact of the underwater raised sediments on the matching phase. The average number of match point pairs under the proposed system improved by 4.5% compared with ORB-SLAM3. Because the proposed system initializes rapidly, it runs for longer on the same datasets. The improvement in match point pairs demonstrates the effectiveness of the improved system in reducing the impact of the underwater raised sediments environment.
The numbers of mapping points (MPs) and keyframes (KFs) under both systems are shown in Figures 16 and 17, respectively. The statistical results of mapping points and keyframes are shown in Tables 7 and 8, respectively. The length of available data under the proposed system is 2167, and that under ORB-SLAM3 is 1171. This means the proposed system initializes faster than ORB-SLAM3 on the same datasets captured in the underwater raised sediments environment. The number of mapping points improved by 11.5%, and the number of keyframes improved by 5.2%. More matching point pairs provide richer information for the mapping phase, resulting in more mapping points. The number of keyframes increased less because the selection of keyframes is rigorous. More mapping points and keyframes mean the proposed system is robust in the underwater raised sediments environment.
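The layout of Tables 6 to 8 (Max, Min, Average, and the improvement relative to ORB-SLAM3) can be reproduced from per-frame counts with a few lines of Python. This is only an illustrative sketch: the function names are hypothetical, and the two short lists are made-up placeholder counts, not values from the experiments.

```python
from statistics import mean


def summarize(counts):
    """Max, Min, and Average of a per-frame count series."""
    return {"Max": max(counts), "Min": min(counts), "Average": mean(counts)}


def compare_systems(proposed_counts, baseline_counts):
    """One row per statistic: proposed, baseline, and improvement in percent.

    Improvement = (proposed - baseline) / baseline, as described in the text.
    """
    proposed = summarize(proposed_counts)
    baseline = summarize(baseline_counts)
    return {
        name: {
            "proposed": proposed[name],
            "baseline": baseline[name],
            "improvement_percent":
                100.0 * (proposed[name] - baseline[name]) / baseline[name],
        }
        for name in proposed
    }


if __name__ == "__main__":
    # Placeholder match counts per frame (hypothetical, for illustration only).
    proposed_matches = [210, 250, 230, 270]
    orbslam3_matches = [200, 240, 220, 260]
    for name, row in compare_systems(proposed_matches, orbslam3_matches).items():
        print(name, row)
```

The same helper applies unchanged to mapping-point and keyframe counts, since all three tables use the same statistics.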

Conclusions
The underwater images are classified by the improved color recognition method to reduce the computation of image pre-processing and thereby improve the real-time performance of the proposed SLAM system. An improved adaptive median filter is proposed to denoise the classified images, which reduces the effect of underwater raised sediments on the feature matching of the SLAM system and improves its robustness. Compared with traditional adaptive median filtering, the MG of the filtered images is improved by 106.7%, the SSIM by 4.6%, and the PSNR by 1.1%, so more of the original features are retained. The filtered images are finally processed by the tracking module to obtain the trajectory of the underwater vehicle and the seafloor map. The datasets of video captured at a seamount in the western Pacific Ocean at 3000 m depth are processed by the improved visual SLAM system. The keyframes, mapping points, and feature point matching pairs extracted by the improved visual SLAM system are increased by 5.2%, 11.2%, and 4.5%, respectively, compared with ORB-SLAM3. The improved visual SLAM system is robust in near-bottom environments such as seamounts, rockfall, and nodules, and is less affected by dynamic disturbances such as water bubbles and underwater raised sediments.
In future work, the parameters of image classification and image denoising will be made adaptive for different underwater investigation tasks. At present, the parameters of the proposed method are set a priori from the known underwater operating environment and the mounting location of the camera, which makes it difficult to cope with an unknown underwater environment. The proposed SLAM is a loose-coupling system, in which visual localization is realized by importing the pre-processed images into the tracking methods. A tight-coupling system will be studied in future work, in which the impact of underwater raised sediments is reduced by optimizing the feature matching phase within SLAM.

Figure 1. No raised sediment environment in the underwater area. (a) The seamount area; (b) the rockfall area; and (c) the nodules area.

Figure 2. Different degrees of the raised sediments in different areas. (a) Light degree in the seamount area; (b) medium degree in the rockfall area; and (c) heavy degree in the nodules area.

Figure 3. The block diagram of the improved underwater visual SLAM system.

Figure 4. The flowchart of the proposed image classification method.

Figure 5. The flowchart of the improved adaptive median filtering method.
Figure 6. The pixel values of the window with length $n_w = 3$.

Figure 7. The sorted sequence of the window with length $n_w = 3$.

Figure 8. The mean gray value $u_w$ of the window border.

Figure 9. Improved adaptive median filtering results in the seamount area. (a) Original color image; (b) original grayscale image; (c) traditional adaptive median filtering; and (d) improved adaptive median filtering.

Figure 11. Initialization success under the proposed system. (a) Initialization success in Test 1. (b) Initialization success in Test 2.

Figure 13. Mapping and trajectory in the end phase under the ORB-SLAM3 system. (a) Moment 1. (b) Moment 2.

Figure 14. Mapping and trajectory in the end phase under the proposed system. (a) Moment 1. (b) Moment 2.

Figure 15. The number of match point pairs under both systems. (a) The proposed system. (b) ORB-SLAM3 system.

Figure 17. The number of keyframes under both systems. (a) The proposed system. (b) ORB-SLAM3 system.

Table 1. Classification results of the 2000 underwater images in the seamount area.

Table 2. Classification results of the 2000 underwater images in the rockfall area.

Table 3. Classification results of the 2000 underwater images in the nodules area.

Table 4. Image quality assessment results in the seamount area.

Table 5. Experimental test environment for the seamount images.

Table 6. Statistical results of match point pairs.

Table 7. Statistical results of mapping points.

Table 8. Statistical results of keyframes.