Bubble Plume Target Detection Method of Multibeam Water Column Images Based on Bags of Visual Word Features

Abstract: Bubble plumes, as main manifestations of seabed gas leakage, play an important role in the exploration of natural gas hydrate and other resources. Multibeam water column images have been widely used in detecting bubble plume targets in recent years because they can wholly record water column and seabed backscatter strengths. However, strong noise in multibeam water column images causes many issues in target detection, and traditional target detection methods are mainly designed for optical images and are less efficient for noise-affected sonar images. To improve the detection accuracy of bubble plume targets in water column images, this study proposes a target detection method based on bag of visual words (BOVW) features and a support vector machine (SVM) classifier. First, the characteristics of bubble plume targets in water column images are analyzed, with the conclusion that the BOVW features can well express the grayscale, texture, and shape characteristics of bubble plumes. Second, the BOVW features are constructed following the steps of point description extraction, description clustering, and feature encoding. Third, the quadratic SVM classifier is used for the recognition of target images. Finally, a procedure of bubble plume target detection in water column images is described. In the experiment using the measured data in the Strait of Georgia, the proposed method achieved 98.6% recognition accuracy of bubble plume targets in validation sets, and a 91.7% correct detection rate of the targets in water column images. Compared with other methods, the experimental results support the validity and accuracy of the proposed method, and show its potential applications in the exploration and research of ocean resources.


Introduction
As one of the most widely used types of ocean survey equipment, multibeam sonars can achieve full-coverage underwater topographic surveys and simultaneously construct seabed backscatter strength images. In recent years, with the rapid improvement of computer processing power and storage capacity, recording and processing the whole water column backscatter data using multibeam sonars has become possible [1].
Multibeam water column images have been used to investigate marine biological resources (fish/fish school [2,3], krill [4], and seabed eelgrass [5,6]), study ocean physical changes such as internal waves [7,8], find underwater shipwrecks [9,10], and detect gas clouds [11] and bubble plumes [12][13][14][15] from seabed seeps. Innangi et al. [16] proposed a method for detecting fish schools using multibeam water column images and obtained 3D shapes of various fish schools. Hughes Clarke et al. [9,17,18] studied the imaging principle, noise interference, and other fundamentals of multibeam water column imaging, which laid a foundation for subsequent research. They also used water column images to estimate the minimum depth of a shipwreck (mast peak), which could provide a safety guarantee for shipping. Marques [19] used multibeam water column images to recognize and precisely locate spherical suspended targets and monitor seabed landslide activities. Gardner et al. [20] found a bubble plume at a depth of 1830 m with a height of 1400 m by using water column images collected from a Kongsberg EM302 multibeam sonar in the coastal area of northern California. Weber et al. [21] discovered a bubble plume with a height of 1100 m using water column images from a Kongsberg EM302 in the Gulf of Mexico. These studies demonstrate the wide range of applications of multibeam water column images, especially in marine resource exploration and environmental research.
Bubble plumes are the main manifestations of seabed gas leakage. The leaked gas may be natural gas (methane), which indicates the potential storage of natural gas hydrates, or a gas mixture produced by the decomposition of dead underwater plants and animals. Therefore, bubble plumes play important roles in the exploration of important resources (such as natural gas hydrate) and in research on the underwater environment [22][23][24][25]. How to accurately and efficiently recognize and detect bubble plumes in multibeam water column images is currently an active research issue [26].
To recognize targets in sonar images, classifying the special target features has been proven as a feasible solution. Many studies have focused on detection and recognition of sonar image targets using classifiers of image features. Dobeck et al. [27] achieved automatic mine target detection and recognition in sonar images by using a k-nearest neighbor classifier of 45 extracted features, such as shape and intensity of mine targets. Tang et al. [28] studied the multi-channel texture classification algorithm by applying wavelet packet and Fourier transform on side-scan sonar images. The experimental results showed that the feature extraction method is vital for image classification. Reed et al. [29] achieved the recognition of sand slope targets in side-scan sonar images based on texture features, standard classifier, and Markov random field. Rhinelander et al. [30] combined feature extraction, edge filter, median filter, and support vector machine (SVM) classifier to achieve target detection and recognition in side-scan sonar images. Song et al. [31] studied the sonar image target segmentation method based on Markov random field and extreme learning machine. Wang [32] studied a target detection algorithm in side-scan sonar images using feature extraction method based on a neutrosophic set and diffusion maps. Moreover, a combination of various feature extraction methods (gray-level co-occurrence matrix, local binary pattern, Gabor, Tamura, multifractal spectrum, and others) and various classifiers (AdaBoost, back-propagation neural network, convolutional neural network, and others) have been applied on various targets in different sonar images.
In conclusion, numerous target feature extraction methods and classifiers have been used for target recognition and detection in sonar images. The optimal features and classifiers differ considerably across target types and sonar image types. For the bubble plume target in multibeam water column images, the optimal features and classifiers have yet to be studied. Zhao et al. [33] used Haar-like and local binary pattern features and an AdaBoost classifier to recognize gas plume targets in multibeam water column images. These features are suitable for vertical, strong-strength bubble plume targets, but the recognition accuracy for inclined and weak-strength bubble plume targets still needs to be improved.
In recent years, deep learning methods have been widely applied in the recognition and detection of sonar image targets [34,35]. Deep learning methods usually need a large number of samples and labels to ensure model accuracy and generality. However, due to the particularity of sonar data collection and the difficulty in finding underwater targets, it is difficult to guarantee a sufficient number of sonar image target samples, which degrades the accuracy of target recognition [36][37][38]. Another feasible solution is automatic sample augmentation using deep learning methods (such as generative adversarial networks) in the case of small or even zero samples [39][40][41]. However, differences remain between the augmented and real samples, and the accuracy improvement of target recognition and detection using augmented samples still needs to be studied. Therefore, in this article, we analyze the characteristics of bubble plumes, select an applicable feature extraction algorithm and a reasonable classifier, and finally realize the automatic and accurate recognition and detection of bubble plume targets in multibeam water column images.

Basic Principle of Target Detection in Multibeam Water Column Images
The multibeam echo sounder system contains multiple sub-systems, including a global navigation satellite system (GNSS), a motion reference unit (MRU), a gyro-magnetic compass, the bathymetric sonar, and other auxiliary sensors [42,43]. Multibeam sonars emit sound waves through the projector arrays and then receive the backscatter strengths at multiple angles through the hydrophone arrays. By continuously recording the backscatter strengths from the transducer to the seabed and even under the seabed, we can obtain observations of the whole water column, the seabed, and surface sediments. The bubble plumes are some of the most important targets in the water column image because they could indicate the possible storage of natural gas hydrates or the decomposition of dead underwater plants and animals. The bubble plumes usually leak from the seabed, rise to a certain height, and then dissipate. When the sound waves travel through these two different propagation media (i.e., gas and water), the backscatter strengths from the bubbles are stronger than those from the surrounding water, as shown in Figure 1.
Remote Sens. 2022, 14, x FOR PEER REVIEW
In the multibeam water column images, the bubble plume targets are different from the background noise (Figure 1B). The characteristics of the bubble plume targets in the water column images can be summarized as follows:
1. The bubble plume targets have much brighter grayscale features than the water column image background because the backscatter strengths from the gas bubbles are much stronger than those from the surrounding water;
2. The bubble plume targets have special shape features. These targets are generally plume-like or ribbon-like, which is obviously different from other water column targets (such as fish and fish schools);
3. The bubble plumes also have special orientation features. The bubble plumes usually range from the seabed to a certain height, and are usually approximately perpendicular to the seabed, but may be bent and oblique due to ocean current effects. Due to side-lobe effects, bubble plumes within the minimum slant range are easier to detect;
4. Special texture features exist around bubble plume targets. Due to the change between two different propagation media, the texture features of the bubble plume and the surrounding water are quite different.
These feature differences between the bubble plumes and the background noise are the basis for the target recognition and detection method in a multibeam water column image.
As analyzed, the bubble plume targets have special grayscale, shape, orientation, and texture features. In the following sections, the bag of visual words (BOVW) features of bubble plume targets are extracted and used to represent these features, and the SVM classifier is used for binary classification of the BOVW features. Then, the recognition and detection procedures of bubble plume targets in the water column image are introduced in detail.

BOVW Features of Multibeam Water Column Images
The BOVW features originate from the bag-of-words model used in text classification, which calculates the frequencies of important words in the text. The BOVW feature applies the same idea to image classification [44]. Images do not contain any concrete words; therefore, a visual vocabulary needs to be constructed. First, point features are extracted using feature point extraction methods, such as speeded up robust features (SURF) (Figure 2B). Then, the visual vocabulary is established by clustering these SURF descriptors (Figure 2C). Finally, the BOVW features are encoded by calculating the frequency of each visual word in the visual vocabulary (Figure 2D). These steps are shown in Figure 2 as:
Step 1. All the SURF descriptors (64-dimensional eigenvectors) of interest points from sample images are extracted by SURF detection or uniform-grid-point selection;
Step 2. Because an excessive number of SURF descriptors are extracted, the SURF descriptors of all sample images are clustered using k-means to obtain k categories;
Step 3. For any image, the SURF points are extracted using the interest point selection (Step 1) and calculated as 64-dimensional eigenvectors, then these SURF descriptors are classified into the categories obtained in Step 2;
Step 4. The occurrence frequencies of all the SURF clustering categories are calculated to form the k-dimensional BOVW feature vector.
The visual vocabulary of BOVW features needs to be constructed by clustering the extracted point features from all categories of sample images. SURF, as a scale-invariant feature, can be used as both the detector and the descriptor of interest points at any scale [45]. In multibeam water column images, SURF is also suitable for the detection and description of bubble plume target feature points [46].
The SURF method uses the Hessian matrix H(p, σ) to detect and describe the interest points p(x, y) at scale σ as
$$H(p, \sigma) = \begin{bmatrix} L_{xx}(p, \sigma) & L_{xy}(p, \sigma) \\ L_{xy}(p, \sigma) & L_{yy}(p, \sigma) \end{bmatrix}$$
where L is the convolution of the Gaussian second-order derivative with the image. As shown in Figure 3, detected SURF points from the target images and background noise are significantly different in feature scale, orientation, and distribution, which supports the effectiveness of SURF points as the visual words of water column images. However, affected by different kinds of noise, detected SURF points from different water column images could vary greatly under the same Hessian threshold. If SURF detection is used to find interest points, some features of the target and background images could be ignored. Therefore, we use the uniform-grid-point selection to ensure that all the features in the target and background images can be extracted (Figure 2B).
Figure 3. SURF differences between bubble plume target and background image in multibeam water column image (Hessian threshold was set as 250). The radii of the circles indicate the scales of the detected points, and the lines in the circles indicate the main directions of these features.
The SURF descriptors are used to describe the uniform-grid selected points in the water column images. Before feature description, the main orientation of the descriptor is set as upright because no rotation is applied in the water column images. The steps for extracting SURF descriptors are listed as follows:
1. Select the regions around key points and divide them into 4 × 4 small regions;
2. Calculate four features (dx, dy, |dx| and |dy|) of the sampling points corresponding to the Haar response in each small region;
3. Construct and normalize the 64-dimensional (4 × 4 × 4) eigenvector of each key point.
Therefore, all the SURF descriptors of the uniform-grid selected points in the image can be described using the normalized 64-dimensional eigenvectors (Figure 2B). The grid size determines the number of extracted points, which is discussed in the experimental section.
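The three descriptor steps can be illustrated with a small sketch. It is a simplified stand-in, not the SURF implementation used in the paper: plain pixel differences replace the Haar wavelet responses, no Gaussian weighting or integral image is used, and the names `grid_points`, `upright_descriptor`, and the `cell` parameter are hypothetical.

```python
import math


def grid_points(height, width, step):
    """Return (row, col) key points on a uniform grid with spacing `step`."""
    return [(r, c)
            for r in range(step // 2, height, step)
            for c in range(step // 2, width, step)]


def upright_descriptor(image, row, col, cell=2):
    """Simplified upright 64-D descriptor: 4 x 4 sub-regions, 4 sums each.

    Assumption: central pixel differences stand in for Haar responses.
    `image` is a list of rows of numeric gray values.
    """
    h, w = len(image), len(image[0])

    def px(r, c):
        # Clamp coordinates so patches near the image border stay valid.
        return image[min(max(r, 0), h - 1)][min(max(c, 0), w - 1)]

    half = 2 * cell  # the patch spans 4 * cell pixels per side
    vec = []
    for i in range(4):
        for j in range(4):
            sums = [0.0, 0.0, 0.0, 0.0]  # dx, dy, |dx|, |dy|
            for r in range(row - half + i * cell, row - half + (i + 1) * cell):
                for c in range(col - half + j * cell, col - half + (j + 1) * cell):
                    dx = px(r, c + 1) - px(r, c - 1)
                    dy = px(r + 1, c) - px(r - 1, c)
                    sums[0] += dx
                    sums[1] += dy
                    sums[2] += abs(dx)
                    sums[3] += abs(dy)
            vec.extend(sums)
    # Normalize the 4 x 4 x 4 = 64 values to unit length.
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else vec
```

A smaller grid step yields more key points per image, which is the trade-off the text refers to when it mentions the grid size.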

Clustering SURF Descriptors
The visual vocabulary of water column images is constructed based on SURF descriptors of uniform-grid selected points. For sample sets containing many images, all of the SURF descriptors (i.e., the 64-dimensional eigenvectors) of each image need to be extracted and summarized to obtain the special BOVW features that can describe the bubble plume targets. Moreover, these extracted SURF descriptors need to be clustered to limit the BOVW feature dimension. k-means, as an efficient clustering method, is suitable for clustering these SURF descriptors. The steps are the following:
1. The k-means++ algorithm is used to initialize k centroids c(j) (j = 1, …, k);
2. The Euclidean distance L_j between each SURF 64-dimensional eigenvector v and the centroids c(j) is calculated as
$$L_j = \lVert v - c(j) \rVert_2 = \sqrt{\sum_{i=1}^{64} \left( v_i - c_i(j) \right)^2}$$
3. Each eigenvector is assigned to the nearest centroid and the k centroids are recalculated;
4. Steps 2 and 3 are repeated until all the cluster assignments are stable (Figure 4B).
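As a minimal sketch of the clustering steps (k-means++ initialization, Euclidean assignment, centroid update, repetition until the assignments are stable), the following uses only the Python standard library. The function names, the `seed` and `max_iter` parameters, and the handling of empty clusters are assumptions for illustration, not details taken from the paper.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_pp_init(vectors, k, rng):
    """k-means++: pick each new centroid with probability proportional to
    its squared distance from the nearest centroid chosen so far."""
    centroids = [list(rng.choice(vectors))]
    while len(centroids) < k:
        d2 = [min(sq_dist(v, c) for c in centroids) for v in vectors]
        total = sum(d2)
        if total == 0:  # all points coincide with existing centroids
            centroids.append(list(rng.choice(vectors)))
            continue
        threshold = rng.random() * total
        acc = 0.0
        for v, d in zip(vectors, d2):
            acc += d
            if acc > threshold:
                centroids.append(list(v))
                break
    return centroids


def kmeans(vectors, k, max_iter=100, seed=0):
    """Lloyd iterations; stops when the cluster assignments are stable."""
    rng = random.Random(seed)
    centroids = kmeans_pp_init(vectors, k, rng)
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: sq_dist(v, centroids[j]))
                      for v in vectors]
        if new_labels == labels:
            break
        labels = new_labels
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels
```

In the setting described here, `vectors` would hold the 64-dimensional descriptors of all training images and the returned centroids would form the visual vocabulary.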
The clustering number k determines the size of the visual vocabulary and the dimension of the BOVW features. Considering the noise affecting water column images, an extremely small vocabulary will lead to the confusion of different features, and an extremely large vocabulary will waste computational resources. Therefore, the choice of k is important and affects the final recognition accuracy, which is discussed in the experimental section.

Image Coding Using BOVW Feature
In the previous section, a k-size visual vocabulary is obtained by clustering all the visual words (SURF eigenvectors). For a single image, the BOVW feature is the k-size vector obtained by counting the occurrence frequencies of all the visual words in the vocabulary. The histograms of visual words are shown in Figure 5.
To make BOVW features suitable for the following target recognition and detection, the BOVW feature [x_1, x_2, …, x_i, …, x_k] needs to be normalized using L2 normalization, as
$$y_i = \frac{x_i}{\sqrt{\sum_{j=1}^{k} x_j^2}}, \quad i = 1, \dots, k$$
where y = [y_1, …, y_k] is the corresponding normalized vector and k is the vocabulary size.
The SURF descriptors extracted from bubble plume targets and background noise images are obviously different in scale and orientation (Figure 3); therefore, BOVW features, as bags of these SURF descriptors, can effectively represent the grayscale, shape, orientation, and texture features of bubble plume targets. In the following sections, BOVW features are used for target recognition and detection.
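The encoding step can be sketched as follows: each descriptor votes for its nearest visual word, and the resulting k-bin histogram is L2-normalized. This is an illustrative sketch under the assumption of hard nearest-word assignment; the helper names are hypothetical.

```python
import math


def nearest_word(descriptor, vocabulary):
    """Index of the vocabulary centroid closest to the descriptor."""
    return min(range(len(vocabulary)),
               key=lambda j: sum((a - b) ** 2
                                 for a, b in zip(descriptor, vocabulary[j])))


def bovw_histogram(descriptors, vocabulary):
    """Count how often each visual word occurs among the descriptors."""
    counts = [0.0] * len(vocabulary)
    for d in descriptors:
        counts[nearest_word(d, vocabulary)] += 1.0
    return counts


def l2_normalize(x):
    """Divide by the Euclidean norm; an all-zero vector is returned as is."""
    norm = math.sqrt(sum(v * v for v in x))
    return [v / norm for v in x] if norm > 0 else list(x)


# Toy example with a 2-word vocabulary and 2-D "descriptors".
vocabulary = [[0.0, 0.0], [10.0, 10.0]]
descriptors = [[1.0, 0.0], [0.0, 1.0], [9.0, 9.0]]
feature = l2_normalize(bovw_histogram(descriptors, vocabulary))
```

Normalization makes the feature independent of the number of grid points per image, so images of slightly different sizes remain comparable.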

Bubble Plume Recognition Using Support Vector Machine
Based on the SURF descriptor differences between bubble plume target and background noise images, the recognition of bubble plume targets can be done by classification of the BOVW features. Taking the target images as positive samples and the background images as negative samples, recognition becomes binary classification. In this section, an SVM classifier is used for binary classification of the BOVW features of bubble plume target and background noise images. The overall procedure is shown in Figure 6.


Support Vector Machine
The SVM is a kernelized method, so the selection of the kernel function strongly affects its classification accuracy. To process the extracted BOVW features (k-dimensional eigenvectors), our work uses a nonlinear SVM classifier with the quadratic polynomial kernel. For any eigenvectors x_i and x_j, the quadratic polynomial kernel function κ is:
$$\kappa(x_i, x_j) = \left( 1 + x_i^{T} x_j \right)^2$$
Based on the SVM dual problem, SVM classification of the BOVW eigenvectors consists of the following steps:
1. Selection of the quadratic polynomial kernel function, where the SVM problem is converted to the convex quadratic programming problem as follows:
$$\min_{\alpha} \; \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j y_i y_j \kappa(x_i, x_j) - \sum_{i=1}^{n} \alpha_i \quad \text{s.t.} \quad \sum_{i=1}^{n} \alpha_i y_i = 0, \quad 0 \le \alpha_i \le C$$
where y_i is the class label (−1 or 1) of the training eigenvector x_i and n is the number of training samples;
2. Based on the sequential minimal optimization (SMO) algorithm, the optimal solution is
$$\alpha^{*} = \left( \alpha_1^{*}, \alpha_2^{*}, \dots, \alpha_n^{*} \right)^{T}$$
3. Selection of α_j* as one component of α*, satisfying 0 < α_j* < C (C is the hyperparameter, called box constraint, used to avoid overfitting). Then, we calculate
$$b^{*} = y_j - \sum_{i=1}^{n} \alpha_i^{*} y_i \kappa(x_i, x_j)$$
4. The kernel function is used to replace the inner product, and the quadratic SVM becomes
$$f(x) = \operatorname{sign}\left( \sum_{i=1}^{n} \alpha_i^{*} y_i \kappa(x_i, x) + b^{*} \right)$$
The hyperparameters (box constraint and feature scale) also affect the SVM classification accuracy and need to be tuned during SVM training.
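To make the kernelized decision function concrete, here is a minimal sketch that evaluates a quadratic-kernel SVM once the multipliers and bias are known. The kernel offset of 1 is an assumption (a common form of the degree-2 polynomial kernel), and the toy multipliers are hand-picked values that satisfy the dual constraints; a real model would obtain them from an SMO solver.

```python
def quadratic_kernel(x, z, offset=1.0):
    """Degree-2 polynomial kernel (offset + x . z)^2; offset=1 is assumed."""
    dot = sum(a * b for a, b in zip(x, z))
    return (offset + dot) ** 2


def decision_score(x, support_vectors, labels, alphas, bias):
    """Kernel expansion: sum_i alpha_i * y_i * k(x_i, x) + b."""
    return sum(a * y * quadratic_kernel(sv, x)
               for sv, y, a in zip(support_vectors, labels, alphas)) + bias


def predict(x, support_vectors, labels, alphas, bias):
    """Class label in {-1, 1} from the sign of the decision score."""
    return 1 if decision_score(x, support_vectors, labels, alphas, bias) >= 0 else -1


# Toy 1-D problem that is not linearly separable: the negative sample sits
# between two positive samples. The values below give score(x) = 0.5*x^2 - 1.
support_vectors = [[0.0], [2.0], [-2.0]]
labels = [-1, 1, 1]
alphas = [0.125, 0.0625, 0.0625]  # satisfy sum(alpha_i * y_i) = 0
bias = -1.0
```

The toy case shows why a quadratic kernel helps: no single threshold on x separates the classes, but a threshold on the kernel expansion does.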
After training, the SVM can be used for prediction of input images. The binary loss is a function of the class label and classification score that determines how well a binary learner classifies an image into the class. The prediction score P is calculated using the hinge binary learner loss function, which provides a negated average binary loss over the score domain (−∞, ∞). Here, y_j is the class label of the SVM binary classifier, taking values in {−1, 1}, and f(j) is the model score for input image j.
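The exact expression for P is not recoverable from this text, so the following is only a sketch of the textbook hinge loss max(0, 1 − y·f) and of a negated average over several scores. Any scaling factor that the authors' software may apply (for example a factor of 1/2) is not modeled, and both function names are hypothetical.

```python
def hinge_loss(label, score):
    """Textbook hinge loss for a label in {-1, 1} and a real-valued score."""
    return max(0.0, 1.0 - label * score)


def negated_average_loss(labels, scores):
    """Negated mean hinge loss: closer to zero means a better fit."""
    losses = [hinge_loss(y, f) for y, f in zip(labels, scores)]
    return -sum(losses) / len(losses)
```

Under this convention a confidently correct score (y·f ≥ 1) contributes zero loss, while a score on the wrong side of the margin is penalized linearly.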

Recognition Procedure of Bubble Plume Targets
To recognize the bubble plume targets in the water column images, we carefully establish the positive and negative sample sets, construct the visual vocabulary based on clustering SURF points, and encode BOVW features of each sample image. Then, the SVM binary classifier is trained and used to recognize the target sample images. The flow diagram of bubble plume target recognition is shown in Figure 7. Only training image samples are used in establishing the visual vocabulary.

Recognition Accuracy Assessment
The prediction results of the target recognition model only contain two results, namely, the bubble plume target images or background noise images; therefore, the confusion matrix can be used to describe model accuracy for binary classification, as shown in Table 1. The accuracy assessments using the confusion matrix are listed in Table 2.

TP and FN indicate that positive samples are identified as positive and negative samples, respectively, while TN and FP indicate that negative samples are identified as negative and positive samples, respectively.
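Since Tables 1 and 2 are not reproduced in this text, the sketch below computes the metrics most commonly derived from a binary confusion matrix; the paper's Table 2 may list a different set. It uses the usual convention in which FN counts positive samples predicted as negative and FP counts negative samples predicted as positive. The function name and the returned keys are hypothetical.

```python
def binary_metrics(tp, fn, fp, tn):
    """Common accuracy measures from the four confusion-matrix counts."""
    total = tp + fn + fp + tn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }
```

For example, with 8 true positives, 2 false negatives, 1 false positive, and 9 true negatives, the accuracy is 17/20 and the recall is 8/10.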

Precise Target Localization
Based on the recognition method, the flow diagram of bubble plume target detection is shown in Figure 8. A moving search window (Figure 8b) is used to traverse the water column image (Figure 8a), and the score of each window image is predicted by the recognition model based on the BOVW feature and SVM classifier. The size of the moving search window is close to the sample image size, and the moving step is set to 1/4 of the search window size. After the traversal, the bubble target is covered by many search windows (Figure 8c), and the outline rectangle of these detection windows can be extracted (the green boundary in Figure 8d). This gives the initial detection of the bubble plume target, and more precise detection results (Figure 8g) are then obtained by precise localization (Figure 8d-f).
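The window traversal and the outline rectangle can be sketched as below. The score function is passed in as a parameter because the trained BOVW and SVM model is outside the scope of this sketch; the positive-score threshold of 0 and the function names are assumptions.

```python
def sliding_windows(img_h, img_w, win_h, win_w):
    """Yield (left, top, right, bottom) windows with a step of 1/4 window size."""
    step_y = max(1, win_h // 4)
    step_x = max(1, win_w // 4)
    for top in range(0, img_h - win_h + 1, step_y):
        for left in range(0, img_w - win_w + 1, step_x):
            yield (left, top, left + win_w, top + win_h)


def initial_detection(windows, score_fn, threshold=0.0):
    """Outline rectangle of all windows scored above the threshold,
    or None when no window is classified as a target."""
    hits = [w for w in windows if score_fn(w) > threshold]
    if not hits:
        return None
    return (min(w[0] for w in hits), min(w[1] for w in hits),
            max(w[2] for w in hits), max(w[3] for w in hits))
```

Because neighboring windows overlap by 3/4 of their size, the outline rectangle is usually larger than the target itself, which is why a refinement stage follows.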
The precise localization method is intended to achieve a more accurate detection boundary by gradually shrinking the boundary range of the detection frame to obtain higher model prediction scores. The steps are as follows:
1. First, gradually shrink the left boundary to the right, calculate the prediction scores of the reduced images, and select the maximum-score position as the final left boundary;
2. Second, gradually shrink the right boundary to the left, calculate the scores of the reduced images, and select the maximum-score position as the final right boundary (the red boundary in Figure 8e);
3. Third, based on the above detection boundary, gradually shrink the top boundary to the bottom, calculate the scores of the reduced images, and select the zero-score position as the final bottom boundary;
4. Finally, gradually shrink the bottom boundary to the top, calculate the scores of the reduced images, and select the zero-score position as the final top boundary (the yellow boundary in Figure 8f).
According to the shape and orientation characteristics of the bubble plume target, the prediction score gradually increases as the initial detected left/right boundary shrinks to the right/left, but gradually decreases when the target begins to get lost in the image (after the maximum score). As to the top and bottom boundaries, the prediction score does not change significantly when the image shrinks in the top-bottom direction. However, when the image shrinks exceeding the bottom/top boundary and does not contain any targets, the score rapidly decreases to the zero value.
For the multiple-target case, independent detection areas can be distinguished and obtained based on the connectivity of detection frames. In each independent detection area, accurate detection of these targets can be achieved using the aforementioned method.
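The four localization steps can be sketched in Python as below. This is a hedged illustration under stated assumptions rather than the authors' implementation: `score_fn` is a hypothetical callable that returns the model score of a box `(left, top, right, bottom)`, the shrink step and minimum size are arbitrary, a score at or below a small `eps` is treated as the zero score, and `make_toy_score` is an invented stand-in (target area divided by box area) used only to exercise the logic.

```python
def argmax_position(positions, score_at):
    """Position with the highest score; the first one wins on ties."""
    best_pos, best_score = None, float("-inf")
    for pos in positions:
        s = score_at(pos)
        if s > best_score:
            best_pos, best_score = pos, s
    return best_pos


def first_zero_position(positions, score_at, eps=1e-6):
    """First position at which the score has dropped to (near) zero."""
    for pos in positions:
        if score_at(pos) <= eps:
            return pos
    return None


def refine_box(box, score_fn, step=8, min_size=8):
    """Shrink an initial detection box following the four steps in the text."""
    left, top, right, bottom = box
    # Step 1: move the left boundary rightwards and keep the maximum-score position.
    new_left = argmax_position(range(left, right - min_size + 1, step),
                               lambda x: score_fn((x, top, right, bottom)))
    new_left = left if new_left is None else new_left
    # Step 2: move the right boundary leftwards and keep the maximum-score position.
    new_right = argmax_position(range(right, new_left + min_size - 1, -step),
                                lambda x: score_fn((new_left, top, x, bottom)))
    new_right = right if new_right is None else new_right
    # Step 3: move the top boundary downwards; the zero-score position is the bottom.
    new_bottom = first_zero_position(range(top, bottom - min_size + 1, step),
                                     lambda y: score_fn((new_left, y, new_right, bottom)))
    # Step 4: move the bottom boundary upwards; the zero-score position is the top.
    new_top = first_zero_position(range(bottom, top + min_size - 1, -step),
                                  lambda y: score_fn((new_left, top, new_right, y)))
    return (new_left,
            top if new_top is None else new_top,
            new_right,
            bottom if new_bottom is None else new_bottom)


def make_toy_score(target):
    """Toy score (assumption): area of the target inside the box over the box area."""
    tx0, ty0, tx1, ty1 = target

    def score(box):
        l, t, r, b = box
        area = (r - l) * (b - t)
        if area <= 0:
            return 0.0
        ow = max(0, min(r, tx1) - max(l, tx0))
        oh = max(0, min(b, ty1) - max(t, ty0))
        return (ow * oh) / area

    return score
```

With the toy score, `refine_box((0, 0, 256, 256), make_toy_score((96, 64, 160, 192)))` recovers the target rectangle. A real SVM score curve is noisier, so the zero-score tolerance and step size would need tuning.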

Detection Accuracy
Intersection over Union (IoU) is a common standard for measuring the detection accuracy of image targets. IoU measures the correlation between ground truth and predicted bounding boxes. Higher correlation means higher detection accuracy.
For a ground-truth box A and a predicted box B, the IoU of A and B can be calculated as
$$\mathrm{IoU}(A, B) = \frac{|A \cap B|}{|A \cup B|}$$
Taking 0.5 as the IoU threshold, a correct detection can be defined as
$$\mathrm{IoU}(A, B) \geq 0.5$$
The correct detection rate is calculated using the correctly detected target number $N_c$ divided by the total target number $N$:
$$\text{correct detection rate} = \frac{N_c}{N}$$
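A short Python sketch of the IoU and the correct detection rate is given below. The box layout `(left, top, right, bottom)`, the function names, and the convention that a missed target is passed as `None` are assumptions made for this illustration.

```python
def iou(a, b):
    """Intersection over Union of two axis-aligned boxes (left, top, right, bottom)."""
    ax0, ay0, ax1, ay1 = a
    bx0, by0, bx1, by1 = b
    inter_w = max(0.0, min(ax1, bx1) - max(ax0, bx0))
    inter_h = max(0.0, min(ay1, by1) - max(ay0, by0))
    inter = inter_w * inter_h
    union = (ax1 - ax0) * (ay1 - ay0) + (bx1 - bx0) * (by1 - by0) - inter
    return inter / union if union > 0 else 0.0


def correct_detection_rate(truth_boxes, pred_boxes, threshold=0.5):
    """N_c / N, where pred_boxes[i] is the detection for truth_boxes[i] or None."""
    if not truth_boxes:
        return 0.0
    n_correct = sum(
        1 for t, p in zip(truth_boxes, pred_boxes)
        if p is not None and iou(t, p) >= threshold
    )
    return n_correct / len(truth_boxes)
```

The sketch assumes that detections have already been matched to ground-truth targets; how the matching is done for multiple targets in one image is not specified in the text.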

Experiments and Results
To verify the validity and performance of the proposed method, the multibeam water column data measured in the Strait of Georgia in 2012 were used for bubble target recognition and detection in the experiment, as shown in Figure 9. The multibeam sonar used was a Kongsberg EM 710 with an operating frequency of 73-81 kHz, an across-track beam aperture of approximately 130°, and a fixed beam number of 128. In the experiment, our method was implemented using MATLAB codes and built-in functions, as explained in Appendix A. The experiment consists of two parts:

1. BOVW feature extraction and training and validation of SVM classifier. During data preparation, the measured multibeam data were used to construct the water column images. The images containing bubble plume targets and those containing only background noise were extracted as positive and negative samples and distributed in the training and validation sample sets. Then, the BOVW features were extracted from these images and the SVM classifier was trained using these features;
2. Bubble target detection in water column images. Based on the recognition model using BOVW and SVM, the precise detection method of bubble plume targets was applied to detect all of the bubble plume targets in the water column images of the EM 710 multibeam sonar to prove the validity and generality of our detection method.

BOVW Feature Extraction and Classification
In this experiment, the recorded multibeam water column data (in *.wcd and *.all files) were decoded based on the EM data format. Then, the water column images were constructed using the water column data packets.

Sampling from Multibeam Water Column Images
Positive samples, which contain the bubble plume targets, and negative samples, which contain only background noise, were manually extracted and labeled to establish the sample set, as shown in Figure 10. To avoid the problem of sample imbalance, the same number of positive and negative samples (700 each) was used. The size of each sample image was unified as 256 × 256 pixels. The positive samples included partial or whole bubble plume targets in water column images. The negative samples contained various noise background images and some other targets (such as fish), which are not discussed in this study. Meanwhile, because the water column images are fan-shaped, the edge parts of the fan-shaped images were also included in the positive and negative samples to avoid the impact of edge SURF points on visual vocabulary construction.

Visual Vocabulary Construction and BOVW Feature Encoding
The visual vocabulary is the basis for encoding BOVW features. To construct the visual vocabulary, the total sample set was divided into the training and validation set in a ratio of 7:3. The visual vocabulary was constructed based on the image data of the training set to achieve the BOVW feature encoding of both the training and validation sets.
The SURF point descriptions of all the images in the training set need to be extracted with a suitable point selection method, and then all SURF points are clustered to construct the visual vocabulary. Considering the sample image size of 256 × 256 px, several SURF point extraction methods (8 × 8, 16 × 16, and 32 × 32 grids, and the SURF detector) with different vocabulary word numbers k were compared in the experiment, and the corresponding results are listed in Table 3. The training and validation accuracies were calculated using the same SVM classifier.
The training and validation accuracies of each SURF point extraction method using the quadratic SVM classifier were calculated, and the validation accuracies are summarized in Figure 11. Table 3 shows that the choice of SURF point extraction method significantly affects the accuracy of the recognition model. In the grid extraction methods, as the grid size increased, the number of extracted SURF points gradually decreased, but the model accuracy could either increase or decrease, which indicates the importance of a reasonable feature scale. The detector method discovered feature points through the SURF detection algorithm. For this method, when the vocabulary word number was set as 300, the training and validation accuracies were both less than 0.5, which showed the difficulty of finding enough feature points in noise-affected sonar images with the SURF detector.
Figure 11. Validation accuracies of the recognition model using BOVW features with different SURF point selection methods.
By comparison, the grid selection method using 32 × 32 px grids was used in the experiments, and the vocabulary word number k was set as 300.
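The effect of the grid size on the number of points, and the encoding of a sample image as a histogram of visual words, can be sketched in Python as follows. This is an illustration under assumptions, not the authors' MATLAB code: the grid is taken as one point per cell centre, the vocabulary (cluster centres) is assumed to be already available from clustering the training descriptors, the nearest word is found by squared Euclidean distance, and the histogram is normalized to sum to one.

```python
def grid_points(width, height, step):
    """One key point at the centre of every step x step cell (assumed layout)."""
    return [(x + step // 2, y + step // 2)
            for y in range(0, height - step + 1, step)
            for x in range(0, width - step + 1, step)]


def nearest_word(descriptor, vocabulary):
    """Index of the visual word closest to the descriptor (squared Euclidean)."""
    best_index, best_dist = 0, float("inf")
    for index, word in enumerate(vocabulary):
        dist = sum((d - w) ** 2 for d, w in zip(descriptor, word))
        if dist < best_dist:
            best_index, best_dist = index, dist
    return best_index


def encode_bovw(descriptors, vocabulary):
    """Normalized histogram of visual word occurrences for one image."""
    hist = [0.0] * len(vocabulary)
    for desc in descriptors:
        hist[nearest_word(desc, vocabulary)] += 1.0
    total = sum(hist)
    return [h / total for h in hist] if total else hist
```

Under this assumed layout, a 256 × 256 px sample yields 1024, 256, and 64 grid points for steps of 8, 16, and 32 px, which matches the trend described above; the exact point counts in Table 3 may differ.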

SVM Classifier Training and Validation
To train an SVM classifier, the kernel function needs to be specified first. Different classifiers were used to classify the BOVW features obtained in Section 3.1.2, and the validation accuracy, training time, and prediction speed of each classifier were computed and listed in Table 4. By comparison, the quadratic SVM classifier had the highest validation accuracy with reasonable training time and prediction speed. Therefore, the quadratic function was chosen as the SVM kernel function. After determining the kernel function, the hyperparameters, including the box constraint and feature scale, also need to be optimized. The box constraint (also called the soft margin parameter) controls the maximum penalty imposed on observations that violate the margin, which helps to prevent model overfitting. If the box constraint is larger, the SVM classifier assigns fewer support vectors, but training can take longer.
SVM classifiers are also sensitive to the feature scales. The Feature Scale hyperparameter is used to scale the feature parameters, which means that all the elements of the BOVW features are divided by the Feature Scale value. During the training process, the hyperparameters of the SVM classifier were optimized by finding the minimum model validation loss while varying the Box Constraint and Feature Scale values, as shown in Figure 12.
After training, the SVM model was used to predict the image data, and the results for some water column images are listed in Table 5. As shown in the table, the images that contained bubble plume targets (Table 5a-d) had more valid SURF points and more abundant visual words. Noise background images (Table 5f-g) usually had fewer valid SURF points, and the corresponding visual words were relatively simple. For the images in Table 5c,e, the noise backgrounds were quite similar, so the BOVW features of these two images were also similar to each other. However, obvious differences existed in the frequencies of special visual words (i.e., near index 190) between these two images, which proved the validity of the BOVW features in the recognition of the bubble plume targets. By calculation, the prediction accuracies of the training and validation sets were 0.99 and 0.98, respectively.
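To make the role of the Feature Scale concrete, the Python sketch below evaluates a quadratic kernel after dividing every feature element by the scale value and uses it in a kernel SVM decision score. The kernel form (1 + x·z)^2, the function names, and the argument layout are assumptions for illustration; the trained model, its support vectors, and its coefficients are not given in the text.

```python
def quadratic_kernel(x, z, feature_scale=1.0):
    """Second-order polynomial kernel on features divided by feature_scale."""
    xs = [v / feature_scale for v in x]
    zs = [v / feature_scale for v in z]
    dot = sum(a * b for a, b in zip(xs, zs))
    return (1.0 + dot) ** 2


def decision_score(x, support_vectors, dual_coefs, bias, feature_scale=1.0):
    """Kernel SVM score: sum of coefficient * K(support vector, x), plus the bias.

    dual_coefs are assumed to already include the class sign of each support vector.
    """
    return sum(c * quadratic_kernel(sv, x, feature_scale)
               for sv, c in zip(support_vectors, dual_coefs)) + bias
```

A positive score would be read as a bubble plume target and a negative score as background, so a larger Feature Scale flattens the kernel response and changes how strongly individual visual word frequencies influence the score.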

Automatic Detection of Bubble Plume Target in Water Column Image
After verifying BOVW features and SVM classifier in recognition of bubble plume targets, we carried out the target detection experiment of water column images based on the proposed procedure (Figure 8).
Fixed and floating anchors are typically used in the detection of the possible targets to find the initial positions, as shown in Figure 13. The usage of fixed anchors (Figure 13a) may result in missing detection of small targets or losing parts of large targets. The usage of floating anchors can effectively avoid these problems, but may result in large initial detection boxes (Figure 13c). Therefore, further precise target localization is needed, as shown in Figure 14.

As shown in Figure 14c,e, the prediction score of the image gradually increased as the detection box was narrowed in the left/right directions. When the target boundary was reached, the score reached its maximum value and then started to decrease. When the detection box reached the other boundary of the target, the score was close to zero because the image contained only background noise. Thus, the position of the maximum prediction score indicated the left and right boundaries of the target detection box. Moreover, the maximum-score position in the left/right direction should correspond to the zero-score position in the right/left direction.
After determining the left and right boundaries, the top and bottom boundaries could be further obtained. Due to the orientation of the bubble plume target, the top/bottom boundaries could not be determined by the minimum value of their own score curves, as shown in Figure 14b,f. Instead, the position of the top/bottom boundary was determined by the minimum value of the bottom/top score curves. The final detection box is shown in Figure 14d, with a prediction score of 3.49, much larger than the initial score of 0.88.
More water column images from the experimental data were processed by our detection method, and the detection results are shown in Figure 15. As shown in Figure 15, bubble plume targets of different sizes and backscatter strengths were correctly detected. Taking an IoU value of not less than 0.5 as a correct result, we calculated the correct detection rate of bubble plume targets in all the testing water column images as 91.7%. Therefore, the experimental results prove the validity and correctness of our detection method.

Discussion
In this section, our recognition and detection method of bubble plume targets in water column images was compared with other methods.

Feature and Classifier Comparison
To compare with the proposed method, various combinations of feature extraction methods and classifiers were used for the recognition of bubble plume targets in the water column images. The extracted features included the gray-level co-occurrence matrix (GLCM), Tamura, local binary pattern (LBP), histogram of oriented gradient (HOG), combinations of these features, and Haar-LBP [33]. The results are shown in Table 6. The results in Table 6 show that the GLCM features achieved an accuracy of 0.824 by combining step sizes of 1 px and 5 px, the Tamura features achieved an accuracy of 0.845 when all six features were used, and the HOG features reached an accuracy of 0.897 using 16 × 16 pixel calculation units. However, the recognition accuracies obtained with these feature descriptions of bubble plume targets still need to be improved.
The combination of LBP features and cubic SVM reached a recognition accuracy of 0.941 when using 32 × 32 pixel computing units, indicating that LBP features could effectively express the direction characteristics of bubble plume targets. By combining LBP with Haar features, the precision ratio reached 0.994, but the accuracy improved only slightly to 0.958 due to the insufficient recognition accuracy of negative samples (noise background). This result showed that the applicability of LBP directional features to noise background images needed further improvement. The method proposed in our study obtained a precision ratio of 0.993 and an accuracy of 0.986, thereby proving the advantage of BOVW features in recognizing the bubble plume targets.

Detection Result Comparison
To compare with the previous method [33], we processed the same water column image using the histogram similarity detection method and our method, and the results are shown in Figure 16. As shown in Figure 16B, the two main problems of the detection method based on histogram similarities were the fixed width of detection boxes and the missed detection of targets in low-echo-intensity areas, resulting in an IoU value of less than 0.5. By using a more reasonable detection strategy, our method avoided these two problems and obtained an IoU value of 0.85. Our detection result was very close to the manual ground-truth result. The comparison results further proved the effectiveness of our proposed detection method.

Advantage Compared with Deep Learning Methods
Deep learning methods are widely applied in the recognition and detection of various image targets, but the high accuracies of deep learning methods need to be guaranteed by a large amount of data samples and corresponding manual labels. For targets in multibeam water column images including bubble plumes, establishing a large number of sample data could be very difficult, which would limit the accuracies of deep learning recognition and detection models.
In our method, based on the characteristics of bubble plume targets, we only need to extract several hundreds of sample data to achieve high-accuracy recognition and detection results. Our method is highly suitable for bubble plume target detection in new water areas without existing sample data. For other types of water column image targets, the proposed method can also be easily retrained according to the special features of these targets.

Using SURF to Detect the Target
To discuss the ability of SURF detection on targets under noise interference in water column images, we directly used the SURF method to detect interest points in the image (Figure 17A), and the detection results are shown in Figure 17B. Moreover, with restrictions on feature direction (near left or right) and scale, the detection results in Figure 17C were obtained. In Figure 17C, the detected SURF points were all around the targets, which proved that conventional SURF with reasonable parameters can effectively extract feature points in water column images.

Ghost Targets and Targets outside the Minimum Slant Range
Besides the real bubble plume targets, ghost targets and targets outside the minimum slant range (MSR) also need attention, as shown in Figure 18.
Ghost targets do not physically exist; they are caused by side lobes and present a shape similar to that of the real target. However, because the orientations of ghost targets are quite different, the SURF features of ghost targets differ from those of real targets. Therefore, our method using BOVW features achieved a good recognition rate on ghost targets.
Our method mainly focuses on the targets inside the MSR, because the backscatter samples outside the MSR are heavily affected by side lobes. The detection of targets outside the MSR is quite a challenge and will be studied in our future work.

Application on Other Multibeam Water Column Data
The multibeam water column images measured by different sonars in different water conditions could be quite different. The multibeam water column data (Cruise ID: EX1402L3) [47] measured in the Gulf of Mexico in 2014 were chosen for this discussion, as shown in Figure 19A. The multibeam sonar was a Kongsberg EM 302 with an operating frequency of 26.5-31.7 kHz, an across-track beam aperture of 130°, and a fixed beam number of 288. Due to the different frequencies and water conditions, the extracted features of the water column image (Figure 19B) are quite different from those of the data in our experimental section. The model trained using multibeam data of the EM 710 sonar cannot be directly used for data measured using the EM 302. To recognize and detect targets in this measurement, the BOVW features and SVM model need to be re-trained. Moreover, how different sonars and water conditions affect the accuracy of our method will be our future research direction.

[Figure 18. Labels in the water column image: targets outside MSR, ghost targets, real target.]

Conclusions
Based on the characteristics of bubble plume targets in multibeam water column images, our proposed method extracts the BOVW features of the targets and uses the SVM classifier for target recognition. On this basis, we further propose a precise detection method for bubble plume targets in water column images. Through experiments on the measured data, the validity and correctness of each step in the target recognition method were verified, including sample set establishment, parameter selection of BOVW features, and SVM classifier optimization. The validation accuracy of our recognition method was 98.6%. In the target detection experiment, based on the high-accuracy recognition model, the correct detection rate of bubble plume targets in the water column images was 91.7%, and most of the detection results were close to the manually labeled ground truth. Comparison with various combinations of features and classifiers demonstrated the advantages of BOVW and SVM in target recognition. Compared with the previous detection method, the proposed method solved the existing problems and effectively improved the target detection accuracy. The proposed recognition and detection method can also be applied to more types of water column image targets, and is of significance to the exploration and research of underwater resources.

Acknowledgments: The Guangzhou Marine Geological Survey Bureau provided the experimental data in this study. The authors appreciate their support.

Conflicts of Interest:
The authors declare no conflict of interest.

Appendix A
The proposed method was implemented using MATLAB code and built-in functions. The raw multibeam data were decoded using our own MATLAB code, written according to the EM data format document. Then multibeam water column images were constructed using
