Multi-Visual Feature Saliency Detection for Sea-Surface Targets through Improved Sea-Sky-Line Detection

Abstract: To visually detect sea-surface targets, the objects of interest must be effectively and rapidly isolated from the background of sea-surface images. In contrast to traditional image detection methods, which employ a single visual feature, this paper proposes a saliency detection algorithm based on the fusion of multiple visual features after detecting the sea-sky-lines. The gradient edges of the sea-surface images are enhanced using a Gaussian low-pass filter to eliminate the effect of the image gradients pertaining to the clouds, wave points, and illumination. The potential region and points of the sea-sky-line are identified. The sea-sky-line is fitted through polynomial iterations to obtain a sea-surface image containing the target object. The saliency subgraphs of the high- and low-frequency, gradient texture, luminance, and color antagonism features are fused to obtain an integrated saliency map of the sea-surface image. The saliency target area of the sea surface is then segmented. The effectiveness of the proposed method was verified. The average detection rate and speed for the sea-sky-line detection were 96.3% and 1.05 fps, respectively. The proposed method outperformed the existing saliency models on the marine obstacle detection dataset and Singapore maritime dataset, with mean absolute errors of 0.075 and 0.051, respectively.


Introduction
With the development of computer vision technology, the role of cameras as an imaging technique is becoming increasingly important, and cameras are being widely used in unmanned ships to provide reliable information for intelligent decision-making. Ocean images can be divided into three parts: sky, sea, and sea-sky-line. The sea-sky-line is the connecting line between the sky and the sea background, visually formed by the outline and gray-scale contrast of the sea and sky backgrounds. When an object appears in a camera's field of view, it is usually near the sea-sky-line. As the distance to the target decreases, the target gradually appears within the sea-surface region. The range of target detection can be considerably reduced by extracting the sea-sky-line information and conducting offshore target detection near the sea-sky-lines. Moreover, the complexity and amount of computation in the algorithm can be reduced. Therefore, by detecting the sea-sky-lines, image segmentation can be realized. In this manner, different detection and tracking strategies can be applied to different regions, and the robustness of the detection method can be improved.

Sea-Sky-Line Detection
Sea-sky-lines can provide valuable reference information for the obstacle avoidance systems of unmanned surface vehicles (USVs), as the obstacles that threaten the safety of USVs, such as ships and rocks, are generally located below the level of the sea-sky-line. The existing methods detect sea-sky-lines using only gray, textural, or line features in an optical image. However, the background features in the images obtained by onboard cameras are complex and change continuously over time, which leads to the poor robustness of these methods [1][2][3][4].
The existing methods to detect sea-sky-lines usually involve the following steps. First, the image is pre-processed to reduce the image noise. Subsequently, the characteristics around the sea-sky-line are strengthened and extracted to determine the approximate location of the sea-sky-line. Finally, the location of the sea-sky-line is determined using a threshold method or linear fitting method. To detect the position of sea-sky-lines, researchers have attempted to use the Hough transform [5,6], random sample consensus (RANSAC) line fitting [7], Radon transform [8], and Canny methods [9]. However, these methods depend strongly on the background complexity and are thus easily affected by clouds, waves, and floating objects.
Wang et al. [5] proposed a sea-sky-line detection algorithm based on gradient saliency and region growth. The gradient saliency calculation effectively improved the characteristics of the sea-sky-line and suppressed the influence of complex sea conditions such as clouds and sea clutter. Dai et al. [10] proposed an edge detection algorithm based on local Otsu segmentation and the Hough transform, which solved the problem of poor global threshold segmentation. In addition, researchers have attempted to use information entropy and histograms [11] to strengthen the features of the region around the sea-sky-line. However, large ships on the sea may affect the gray value of the entire image, thereby affecting the results of histogram analysis methods. Moreover, the computational complexity of information entropy methods may be too high for real-time scenes, and most of these sea-sky-line methods can only be applied to infrared images. Jian et al. [12] proposed a method based on gradient smoothing and bimodal histogram analyses to improve the robustness and accuracy of sea-sky-line detection. Recently, machine learning [13,14] has been applied to achieve satisfactory results in image processing tasks. However, to implement such methods, a large number of images must be collected in advance for training. Moreover, practical application scenarios involve unpredictable factors such as insufficient light intensity, inclement weather, presence of ships, and undulations on the sea surface, and a pre-trained model may not be able to achieve satisfactory results under such conditions.

Saliency Detection
The visual attention mechanism of saliency detection is similar to that of the attention mechanism of the human eye, through which the visually "attended" region can be automatically extracted from an image or video. In recent work, saliency detection has been considered a specific task to be performed in a top-down manner by assuming the existence of prior knowledge or certain constraints regarding the scene [15]. Such task-driven approaches are particularly suited to the identification and retrieval of known objects, as such processes require knowledge learning or accumulation, which increases the complexity of the saliency detection. Consequently, visual attention mechanisms involving adaptable bottom-up algorithms are being widely examined.
Bottom-up saliency models can be classified as spatial [16,17] or spectral [18,19] models based on the domain extraction of the features. Itti et al. [20] obtained salient images by using the difference in the center-periphery of images, based on the color, intensity, and directional features of the images. Arya et al. [21] employed a double-density double-tree complex wavelet transform along with hyperpixel segmentation to realize saliency target detection. Chen et al. [22] used the spatial and time clues in the image as local constraints and developed a spatial constraint optimization model to realize video image saliency detection and global saliency optimization. Zhang et al. [23] established a saliency target detection model for a deep convolutional network and adopted a multi-scale fusion structure to obtain high-precision saliency target detection results. Wang [24] proposed a spatial-deep-learning significant target detection model for videos, which involved a full convolutional network to effectively detect the significant regions in video streams. Singh et al. [25] developed a model using a convolutional encoder-decoder to realize the significant target detection in noisy images. In general, in the case of spatial saliency models [26,27], the salient features of the objects can be effectively identified in simple scenes. However, such methods exhibit an inferior performance in complex scenes [28]. In other words, a single-image-related underlying feature cannot highlight the saliency objects in the image, and a linear combination of these features must be employed [29].
Overall, the aforementioned algorithms have achieved reasonable results in the corresponding research fields; however, it remains challenging to attain a high detection accuracy for sea-sky-lines and saliency detection in the case of complex backgrounds. To address these problems, in this work, considering the principle of bottom-up image saliency detection, a saliency detection algorithm employing multiple visual features of sea-surface images was developed, based on improved sea-sky-line detection. In this approach, the sea-sky-line in a sea-surface image is iteratively fitted through the image gradient integral curve, and the multi-vision features are fused to build the attention mechanism detection model for the sea-surface image. The predicted sea-sky-line area is extracted from the gradient integral image, and the prediction points for the sea-sky-line are identified. In particular, the sea-sky-line is fitted through polynomial iterations. In this manner, comprehensive saliency detection can be realized based on the fusion of multiple visual features. It is expected that the obtained sea-surface image saliency map can highlight the target saliency of the detected image, thereby improving the detection accuracy and realizing fast and accurate detection in sea-surface images.
The remaining paper is organized as follows. Sections 2 and 3 describe the proposed sea-sky-line detection algorithm and visual saliency detection model, respectively. Section 4 presents the results of the performed comparative experiments. Section 5 presents the concluding remarks.

Sea-Sky-Line Detection
Owing to the photorefractive reflection and absorption of the evaporating water on the sea surface, a thin mist is present at the gradual transition zone of the sea-sky-line, leading to an insignificant boundary gradient. Furthermore, certain natural environmental phenomena such as waves, clouds, and bright bands generate strong interference gradient edge features. To detect sea-sky-lines, an obvious saliency must be ensured between the sea surface and sky, which is identifiable in the overall analysis of the image. Therefore, in the proposed approach, a Gaussian low-pass filter is used to enhance the image gradient and construct an integral image of the gradient image. Subsequently, the potential region of the sea-sky-line is identified through the integration curve of the integral image block. Finally, within the potential region, the preselected points of the sea-sky-line are predicted. Polynomial iteration is performed to finally fit the sea-sky-line. In this manner, the sea surface image containing the target object can be obtained.

Smooth Filtering Gradient Image
A Gaussian low-pass filter [30] template is used to smooth the image and enhance the saliency of the gradient image. The sea-surface image f(x, y) is gradient processed to obtain the gradient image f_T(x, y) [31]. The Gaussian image fill parameters P and Q are selected as P = 2m and Q = 2n, where m and n denote the length and width of the input image, respectively. Next, the image is zero-filled to obtain an image f_p(x, y) of size P × Q. The centering function (−1)^(x+y) f_p(x, y) is applied to move the spectrum of the image f_p(x, y) to the center of the transform domain.
Subsequently, the Fourier transform is applied to obtain the spectrum image F_p(µ, ν):

F_p(µ, ν) = Σ_{x=0}^{P−1} Σ_{y=0}^{Q−1} (−1)^(x+y) f_p(x, y) e^{−j2π(µx/P + νy/Q)}.

Next, we calculate the product of the symmetric Gaussian filter function H(µ, ν) and the spectrum image:

G(µ, ν) = H(µ, ν) F_p(µ, ν), H(µ, ν) = e^{−D²(µ, ν)/(2D_0²)},

where D(µ, ν) = √((µ − µ_0)² + (ν − ν_0)²), (µ, ν) and (µ_0, ν_0) denote the coordinates of a pixel and of the center point in the image F_p(µ, ν), respectively, and D_0 is the cutoff frequency of the low-pass filter. Finally, the inverse Fourier transform of G(µ, ν) is computed; the real part is extracted and multiplied by (−1)^(x+y) to implement the inverse centering transform, yielding the smooth filtered gradient image f_h(x, y).
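As a concrete sketch of the smoothing step above, the following Python/NumPy routine pads the gradient image, centers its spectrum, applies the Gaussian transfer function, and inverts the result; the default cutoff `d0` and the P = 2m, Q = 2n padding are illustrative assumptions rather than values fixed by the paper:

```python
import numpy as np

def gaussian_lowpass_filter(grad_img, d0=50.0):
    """Frequency-domain Gaussian low-pass smoothing of a gradient image.

    Zero-pads to (P, Q) = (2m, 2n), centers the spectrum via (-1)^(x+y),
    multiplies by H = exp(-D^2 / (2 * d0^2)), and inverts the transform.
    """
    m, n = grad_img.shape
    P, Q = 2 * m, 2 * n                      # fill parameters

    fp = np.zeros((P, Q))
    fp[:m, :n] = grad_img                    # zero-padding

    x, y = np.meshgrid(np.arange(Q), np.arange(P))
    centered = fp * (-1.0) ** (x + y)        # shift spectrum to center

    Fp = np.fft.fft2(centered)

    u0, v0 = P / 2, Q / 2
    D2 = (y - u0) ** 2 + (x - v0) ** 2       # squared distance to center
    H = np.exp(-D2 / (2.0 * d0 ** 2))        # Gaussian low-pass transfer fn

    G = H * Fp
    g = np.real(np.fft.ifft2(G)) * (-1.0) ** (x + y)  # de-center
    return g[:m, :n]                         # crop back to original size
```

In practice, `d0` trades noise suppression against the sharpness of the sea-sky boundary.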

Determination of the Potential Areas for Sea-Sky-Lines
An integral image is constructed from the smoothed gradient image f_h(x, y). Using Equation (5), the integral image gives the sum of the gradient values of all the pixels in any area of the gradient image:

J(m, n) = Σ_{i=1}^{m} Σ_{j=1}^{n} f_h(i, j), (5)

where i and j denote the row and column coordinates of the gradient image f_h(x, y), respectively, and m and n denote the row and column coordinates of image J, respectively. Moreover, 1 ≤ m ≤ M and 1 ≤ n ≤ N, where M and N denote the number of rows and columns of image f_h, respectively. If the length and height of the area of the sea-sky-line are L and H, respectively, the maximum height of the external rectangular frame of the sea-sky-line is

H = L·tan θ,

where θ is the inclination of the sea-sky-line; through experiments, its maximum value is assumed to be 20°. The rectangular frame is slid from the bottom to the top along the integral image, and the gradient accumulation value S(i) within the statistical box at row i is collected into an array S:

S(i) = J(i + H, N) − J(i, N).

The largest value P in the array corresponds to the integration area that constitutes the potential region of the sea-sky-line.
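The integral-image search for the potential band can be sketched as follows; the band height `H` is treated as a fixed input here (the paper derives it from the assumed 20° maximum inclination):

```python
import numpy as np

def find_potential_band(grad_img, H):
    """Locate the horizontal band with the largest accumulated gradient.

    J is the integral image of the smoothed gradient image f_h; the sum of
    any H-row band is obtained in O(1) from J, and the band with the
    largest sum S(i) is taken as the potential sea-sky-line region.
    """
    M, N = grad_img.shape
    # integral image: J[m, n] = sum of grad_img[:m, :n]
    J = np.zeros((M + 1, N + 1))
    J[1:, 1:] = np.cumsum(np.cumsum(grad_img, axis=0), axis=1)

    best_i, best_S = 0, -np.inf
    for i in range(M - H + 1):               # slide the H-row window
        S_i = J[i + H, N] - J[i, N]          # gradient sum of rows i..i+H-1
        if S_i > best_S:
            best_i, best_S = i, S_i
    return best_i, best_i + H                # row range of potential region
```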

Iterative Fitting of the Sea-Sky-Line Curve
After identifying the potential region of the sea-sky-line, the optimal location of the sea-sky-line must be determined. Because the gradient value at the sea-sky-line position is usually higher than that at the points above and below it, the pixel with the largest gradient value in each column of the potential region must be identified. These preselected points of the sea-sky-line form the set Y = {y_i | i = 1, 2, . . . , N}. Inevitably, certain errors exist among the preselected points, owing to the influence of noise points. Therefore, to fit the sea-sky-line accurately, polynomial iterative fitting of Y must be performed to eliminate the large-error points among the preselected points and obtain the accurate position of the sea-sky-line. The specific process is as follows.
(1) An n-th order polynomial is fitted using X = {1, 2, . . . , N} and Y to obtain the fitting function f_1(x, y) and the fitted ordinate set Y′.
(2) The points whose difference between Y and Y′ exceeds the threshold σ pixels are eliminated, and the coordinate set of the preserved points is [X′; Y′].
(3) For the preserved coordinate set, the fitting and rejection processes in steps (1) and (2) are repeated until the difference between the newly fitted Y′ values and the previously fitted values is everywhere smaller than the threshold of σ pixels. The iterations are then stopped, and the sea-sky-line corresponding to the fitted curve f_n(x, y) is output.

Figure 1 shows typical images captured from a USV. The image can be split into three semantic regions that are roughly stacked, indicating that a structural relation exists between the regions. The focus is on regions ② and ③, in which obstacles may be present. After detecting the sea-sky-line, the significance detection model combining multiple visual features is used to obtain the surface objects.
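The iterative fitting loop above can be sketched as follows; `order=5` and `sigma=3` match the experimental settings reported in Section 4, and `max_iter` is an added safety bound:

```python
import numpy as np

def iterative_polyfit(y_points, order=5, sigma=3.0, max_iter=20):
    """Iteratively fit an n-th order polynomial to the preselected points.

    Points whose residual exceeds sigma pixels are rejected and the fit is
    repeated until the fitted curve changes by less than sigma everywhere.
    """
    x = np.arange(1, len(y_points) + 1, dtype=float)
    y = np.asarray(y_points, dtype=float)
    keep = np.ones_like(y, dtype=bool)
    prev_fit = None
    for _ in range(max_iter):
        coeffs = np.polyfit(x[keep], y[keep], order)
        fit = np.polyval(coeffs, x)
        if prev_fit is not None and np.max(np.abs(fit - prev_fit)) < sigma:
            break                            # converged
        keep &= np.abs(y - fit) <= sigma     # reject large-error points
        prev_fit = fit
    return coeffs, fit
```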

Significance Detection Model for the Multi-Visual Feature Fusion
Owing to the different sizes, shapes, and colors of the sea-surface targets, a single feature cannot be used to obtain a sufficiently descriptive saliency map. Therefore, the attention subgraphs of multiple visual features should be fused.
Specifically, a wavelet transform, a Gaussian filter, and a color space transform can be employed to obtain the saliency subgraphs of the high and low frequencies of the target, the gradient texture features, and the luminance and color antagonism features, respectively. The feature subgraphs can be fused using a weighted linear strategy to obtain a comprehensive salient image. Finally, the target region can be segmented through a significant region growth segmentation strategy. The process flow of the specific algorithm is shown in Figure 2.

Wavelet Transform to Extract the Frequency Saliency Subgraph
Owing to the difference in the features of sea objects, the Haar wavelet transform is used to decompose the sea-surface image F_h(x, y) to obtain the high- and low-frequency features F_g(x, y) and F_d(x, y), respectively. For the high-frequency features, the logarithm of the amplitude spectrum is taken to obtain the logarithmic spectrum L_g(F_g), which contains the high-frequency information of the image. The mean logarithmic spectrum M(L_g(F_g)) is obtained by smoothing the logarithmic spectrum with the mean template H(f), and the spectral residual S_g(F_g) is obtained by subtracting the mean spectrum from the logarithmic spectrum:

S_g(F_g) = L_g(F_g) − M(L_g(F_g)),

where M(L_g(F_g)) = H(f) * L_g(F_g), H(f) is the n × n mean filter template, and * represents the convolution operation.

According to Equation (9), we sum the spectral residual S_g(F_g) and the phase spectrum P_g(F_g) and use the inverse fast Fourier transform to obtain the wavelet-transform-related high-frequency saliency map S_XB(x, y):

S_XB(x, y) = |F^{−1}[exp(S_g(F_g) + j·P_g(F_g))]|².


Improved Gabor Filtering to Obtain the Directional Feature Saliency Subgraph
Although the edge information in the directional feature map can be extracted through traditional Gabor filtering, the overall saliency of the target is lost. In this work, to realize the directional feature extraction, an exponential function was used instead of the Gabor function:

g_θ(x, y) = exp(−(x′²/(2σ_x²) + y′²/(2σ_y²))), x′ = x cos θ + y sin θ, y′ = −x sin θ + y cos θ,

where (x, y) denote the pixel coordinates, θ ∈ {0°, 45°, 90°, 135°}, and σ_x and σ_y are the scale factors in the x and y directions, respectively. The convolution of the sea-surface image F_h(x, y) with the exponential function g_θ(x, y) yields the feature subgraphs for the different directions, O_θ(x, y). These subgraphs are linearly combined to obtain the overall directional feature map:

O(x, y) = Σ_θ O_θ(x, y).

The saliency value of the directional feature map is then calculated as in Equations (1) and (2) to obtain the directional saliency subgraph.
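The directional filtering step can be sketched as follows; the anisotropic exponential kernel below is one plausible reading of "an exponential function instead of the Gabor function" (i.e., the Gabor envelope without the sinusoidal carrier), not the paper's exact definition:

```python
import numpy as np

def directional_feature_map(img, sigma_x=4.0, sigma_y=2.0, size=15):
    """Directional feature map O(x, y) from exponential kernels.

    For each theta in {0, 45, 90, 135} deg, an anisotropic exponential
    kernel g_theta is built on rotated coordinates, the image is convolved
    with it, and the four responses O_theta are summed.
    """
    half = size // 2
    ys, xs = np.mgrid[-half:half + 1, -half:half + 1]
    O = np.zeros_like(img, dtype=float)
    for theta in np.deg2rad([0, 45, 90, 135]):
        xr = xs * np.cos(theta) + ys * np.sin(theta)   # rotated coords
        yr = -xs * np.sin(theta) + ys * np.cos(theta)
        g = np.exp(-(xr ** 2 / (2 * sigma_x ** 2) +
                     yr ** 2 / (2 * sigma_y ** 2)))
        g /= g.sum()                                   # normalize kernel
        # 'same'-size 2-D convolution via FFT on an edge-padded image
        pad_img = np.pad(img, half, mode='edge')
        resp = np.real(np.fft.ifft2(np.fft.fft2(pad_img) *
                                    np.fft.fft2(g, pad_img.shape)))
        O += resp[2 * half:, 2 * half:][:img.shape[0], :img.shape[1]]
    return O
```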

Gradient Texture Feature Saliency Subgraph
The gradient image reflects the areas of the image that involve notable variations in edge and texture. Therefore, the gradient saliency subgraph S_TD(x, y) is obtained by applying the procedure described in Section 3.1 to the gradient texture spectrum T_D(x, y).

Color Spatial Feature Saliency Subgraph
The luminance and color-antagonism channel feature images for each pixel in the sea-surface image F_h(x, y) are obtained from the R, G, and B color components of each pixel. Specifically, the luminance feature image I and the color antagonism channels R − G, R − B, and 2B − R − G are obtained with

I(i) = (r + g + b)/3,

where (r, g, b) is the color information of a single pixel, and the color feature components of each pixel are R(i) = (2r − g − b)/I(i), G(i) = (2g − r − b)/I(i), and B(i) = (2b − r − g)/I(i), respectively. The saliency subgraphs are obtained by calculating the saliency values of the feature maps of the luminance, S_I(x, y), and the color antagonism channels, S_rg(x, y), S_rb(x, y), and S_2brg(x, y).
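A sketch of the luminance and color-antagonism channel construction; the luminance definition I = (r + g + b)/3 is an assumption (the paper does not state it explicitly), and the zero-division guard is an implementation detail:

```python
import numpy as np

def color_antagonism_features(rgb):
    """Luminance and color-antagonism channel images from R, G, B.

    Builds luminance I = (r + g + b) / 3 and the broadly tuned components
    R = (2r - g - b)/I, G = (2g - r - b)/I, B = (2b - r - g)/I per pixel,
    from which the R-G, R-B and 2B-R-G antagonism maps are formed.
    """
    r = rgb[..., 0].astype(float)
    g = rgb[..., 1].astype(float)
    b = rgb[..., 2].astype(float)
    I = (r + g + b) / 3.0
    I_safe = np.where(I > 0, I, 1.0)         # avoid division by zero
    R = (2 * r - g - b) / I_safe
    G = (2 * g - r - b) / I_safe
    B = (2 * b - r - g) / I_safe
    return {'I': I, 'R-G': R - G, 'R-B': R - B, '2B-R-G': 2 * B - R - G}
```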

Fusion and Segmentation of the Multi-Visual Feature Salient Graph
Image fusion must be performed to combine the salient images of the different features. Herein, we use a normalized linear combination to obtain the comprehensive multi-visual feature saliency map S(x, y):

S(x, y) = Σ_{i=1}^{N} ω_i S_i(x, y), with Σ_{i=1}^{N} ω_i = 1,

where ω_i denotes the weight of each feature, and S_i(x, y) is the corresponding feature saliency subgraph.
In this experiment, the luminance and color feature subgraphs are assigned larger weights, and the other saliency subgraphs are assigned smaller weights. The traditional method, in which adaptive threshold segmentation is performed to obtain binary images and the target area is identified directly from the binary images, is simpler but involves a higher false detection rate. Therefore, the significant area growth strategy [32] is applied to segment the salient image and effectively extract the target image from the salient map.
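The weighted linear fusion can be sketched as follows; the per-feature weights here are placeholders, since the paper only states that the luminance and color subgraphs receive larger weights:

```python
import numpy as np

def fuse_saliency(subgraphs, weights):
    """Normalized weighted linear fusion of feature saliency subgraphs.

    S(x, y) = sum_i w_i * S_i(x, y) with sum_i w_i = 1; each subgraph is
    first normalized to [0, 1] before the weighted combination.
    """
    w = np.asarray(weights, dtype=float)
    w /= w.sum()                             # enforce sum w_i = 1
    S = np.zeros_like(subgraphs[0], dtype=float)
    for wi, Si in zip(w, subgraphs):
        rng = Si.max() - Si.min()
        if rng > 0:
            Si_n = (Si - Si.min()) / rng     # normalize subgraph to [0, 1]
        else:
            Si_n = np.zeros_like(Si, dtype=float)
        S += wi * Si_n
    return S
```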

Experiments
This section describes the experimental validation of the proposed approach. The experimental analysis was performed in two parts. In the first part, we evaluated and analyzed the sea-sky-line detection performance. In the second part, we evaluated and analyzed the performance of the multi-visual feature saliency detection for sea-surface objects and compared it with that of alternative methods. The marine obstacle detection dataset (MOOD) [33] and Singapore maritime dataset (SMD) [34] were selected for the simulation testing, with an image resolution of 640 × 480. All the experiments were performed on a desktop PC with an Intel(R) Core(TM) i5-6500 CPU (3.2 GHz) running MATLAB 2018b on a Windows 10 (64-bit) operating system.

Sea-Sky-Line Detection Performance
To validate the proposed sea-sky-line detection model, we randomly selected 150 images from the MOOD dataset and video sets from the SMD dataset to compare the sea-sky-line detection performance of the proposed method with that of the Hough algorithm [35], the gradient enhancement + Hough algorithm, and the semantic-segmentation-based obstacle image-map estimation algorithm (SSM) [36]. The order of the sea-sky-line fitting polynomial was set to 5, and the error threshold of the image pixels was σ = 3. In general, for the initial sea-sky-line scene images, the influence of factors such as waves, clouds, or a complex background on the sea-sky-line can be alleviated through gradient image smoothing. When considerable noise is present in the image, gradient smoothing can help remove the discrete noise. In addition, image pre-processing involving gradient smoothing can strengthen the boundary contrast between the sea and sky regions. The edge saliency of the region surrounding the sea-sky-line is enhanced, and the detection performance is considerably improved. Figure 3 (column 5) shows the gradient significance integral curve of the image. The area corresponding to the peak of the curve corresponds to the row coordinates of the potential region of the sea-sky-line. Furthermore, the proposed method can realize the overall classification of the potential sea-sky-line region and obtain the estimation points of the sea-sky-line. Through polynomial iterative fitting, the sea-sky-line can be fitted correctly, and the fitted curve can accurately reflect its real position. Figure 3 (column 2) shows the area of the sea-sky-line obtained using the Hough transform method. In the transformation process, the longest straight-line segment obtained is used to fit the sea-sky-line.
Under a low contrast, strong illumination, or the presence of clouds or obstacle edges in the sea, the Hough transform can produce numerous straight-line segments, thereby introducing uncertainty into the sea-sky-line detection. Therefore, its detection results are not satisfactory. The Hough transform combined with the gradient saliency enhancement algorithm can better negate the effect of the background and effectively enhance the edge of the sea-sky-line; its detection results are better than those of the single Hough transform, as shown in Figure 3 (column 3). The SSM algorithm exhibits a high detection speed; however, the Markov chain depends considerably on the edge of the previous image, leading to poor robustness. The sea-sky-line information may easily be lost in the case of jitters, as shown in Figure 3 (column 1). In comparison, the proposed method is more effective and can eliminate the effect of interferences such as clouds, waves, and sea clutter to correctly detect the position of the sea-sky-line.

Table 1 presents the average detection rate and time in the evaluation of the sea-sky-line detection performance of the four algorithms. The sea-sky-line detection rate of the proposed method is significantly higher than that of the other three algorithms, with the lowest average detection time and highest detection accuracy. These values can satisfy practical application requirements. Therefore, as described in Section 4.2, the proposed method was used to identify the sea-sky-lines and segment the sea surface images. Subsequently, an experiment was performed to compare the visual saliency detection results.

Visual Detection Performance
After realizing the sea-sky-line detection segmentation, the proposed method was compared qualitatively and quantitatively with the Random Walk with Restart on Video (RWRV) [28], Attention based on Information Maximization (AIM) [37], Spatiotemporal Attention Detection (SD) [38], Context-Aware Saliency Detection (CA) [39], Discriminative Regional Feature Integration (DRFI) [40], Spatiotemporal Cues (SC) [41], Histogram-based Contrast (HC) [42], and Frequency-Tuned Salient Region Detection (FT) [43] approaches. The parameters of all these models were set according to the publicly available code of the algorithms. Figure 4 shows the saliency maps obtained using the proposed algorithm and the other algorithms to enable a qualitative comparison. Column 1 in Figure 4 corresponds to the sea-sky-line images, with the red lines indicating the sea-sky-line. The videos (representative frames shown in Column 1) cover a variety of sea-surface targets. Column 2 shows the sea surface image obtained after sea-sky-line detection segmentation.

The following observations can be made. (1) The saliency detection results for the video frame images with a strong contrast between the target object and background (e.g., rows 1 and 2) are satisfactory. The target information in the saliency feature image is highlighted. Therefore, when a strong contrast exists between the foreground and background, the features of the target can be easily detected. (2) The performance is relatively weak in the presence of a low contrast or complex background.
The RWRV, SC, HC, SD and FT approaches are strongly influenced by the sky background and highlight the sky feature information in the saliency map, as shown in rows 3, 4, 7 and 8. Moreover, the RWRV, AIM, DRFI and FT approaches exhibit a poor robustness against the interference of sea waves. Notable features of sea waves are present in the saliency map, as shown in rows 5, 6 and 13. In the case of small target objects in the sea surface images, the target image may be lost in the saliency maps owing to the influence of the background, as in the case of the SD algorithm in rows 6 and 7. (3) The proposed algorithm can capture the foreground salient objects more faithfully in the test cases.
The target features are prominent in the saliency map. Moreover, the approach is robust against background interference from the waves and sky. For example, the proposed algorithm achieved a high performance for objects with multiple appearance colors (e.g., rows 1-4), which exhibit relatively high scene complexity. In addition, the proposed approach can detect small and distinct regions (rows 9, 10, 12, and 13).
Owing to the use of the saliency region growth strategy for the attention mechanism, the proposed approach can extract the entire saliency target, as shown in Figure 5. The saliency map generated using the proposed method is more visually consistent with the shape, size and location of the ground truth segmentation map than those generated by the other methods. The saliency map tends to highlight the outline of the object regions, and a small amount of the internal information is merged into the foreground.
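The saliency region growth strategy itself is not detailed in this excerpt; as a generic, hedged sketch of how an entire salient region can be grown from a seed pixel, one possible formulation is a flood-fill-style expansion (the 4-connectivity and fixed tolerance rule here are illustrative assumptions, not the paper's exact growth rule):

```python
import numpy as np
from collections import deque

def grow_salient_region(saliency, seed, tol=0.2):
    """Generic region growing from a seed pixel: 4-connected neighbors
    are absorbed while their saliency stays within `tol` of the seed value."""
    h, w = saliency.shape
    mask = np.zeros((h, w), dtype=bool)
    seed_val = saliency[seed]
    queue = deque([seed])
    mask[seed] = True
    while queue:
        y, x = queue.popleft()
        for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ny, nx = y + dy, x + dx
            if (0 <= ny < h and 0 <= nx < w and not mask[ny, nx]
                    and abs(saliency[ny, nx] - seed_val) <= tol):
                mask[ny, nx] = True
                queue.append((ny, nx))
    return mask
```

Growing from a local saliency maximum in this manner returns a connected mask covering the whole object region rather than only its strongest pixels, which is consistent with the behavior described above.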
To further evaluate the performance of the proposed method, we evaluated the results based on two widely used criteria, namely, the precision-recall (PR) curve and the mean absolute error (MAE) [24]. In the precision-recall analysis, the precision is defined as the percentage of salient pixels correctly assigned, and the recall corresponds to the fraction of detected salient pixels relative to the ground-truth number of salient pixels. For each saliency map, the PR curve is obtained from 256 PR pairs, generated by varying the normalized threshold from 0 to 1.
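The per-threshold PR pairs described above can be sketched as follows (a minimal illustration assuming saliency maps normalized to [0, 1] and binary ground-truth masks; this is not the authors' released code):

```python
import numpy as np

def pr_curve(saliency, gt, num_thresholds=256):
    """Compute precision-recall pairs by sweeping a threshold over a
    saliency map (values in [0, 1]) against a binary ground truth."""
    saliency = saliency.astype(np.float64)
    gt = gt.astype(bool)
    n_gt = gt.sum()
    precisions, recalls = [], []
    for t in np.linspace(0.0, 1.0, num_thresholds):
        pred = saliency >= t           # binarize at threshold t
        tp = np.logical_and(pred, gt).sum()
        n_pred = pred.sum()
        # Convention: empty prediction -> precision 1, empty GT -> recall 0.
        precisions.append(tp / n_pred if n_pred else 1.0)
        recalls.append(tp / n_gt if n_gt else 0.0)
    return np.array(precisions), np.array(recalls)
```

At threshold 0 every pixel is predicted salient (recall 1), and as the threshold approaches 1 the recall collapses toward 0, matching the behavior of the curves discussed below.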
The F-measure is a measure of the overall performance, computed as the weighted harmonic mean of the precision and recall:

F_β = ((1 + β²) × Precision × Recall) / (β² × Precision + Recall),

where we set β² = 0.3 to weight the precision. For each saliency map, we derived a sequence of F-measure values along the PR curve, with the threshold varying from 0 to 1. For further comparison, we evaluated the MAE between a continuous saliency map S and the binary ground truth G over all image/frame pixels:

MAE = (1/N) Σ_x |S(x) − G(x)|,

where N is the number of image/frame pixels. The MAE, normalized to [0, 1], estimates the degree of agreement between the saliency map and the ground truth and thus provides a direct measure of their dissimilarity. The resulting PR curves for the two datasets are shown in Figure 6a,c. The trends of the PR curves are consistent across all the methods. When the threshold approaches 1, the recall values of AIM, FT, CA and SC become extremely small and decrease to zero. The proposed method exhibits the highest performance, with a precision rate of more than 0.9, indicating that it is more precise and responsive to the salient regions. Moreover, the PR curve of the proposed approach maintains both a high recall and a high precision, dominating the curves of the other methods. In addition, the proposed saliency method achieves the highest precision rates, which demonstrates that its saliency maps respond more precisely to the actual salient information. The MAE results are presented in Figure 6b,d. The proposed method exhibits the smallest MAE, corresponding to the best performance among all the compared approaches. These findings indicate that the proposed method can realize a global optimization for salient object detection.
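Both metrics follow directly from their definitions; an illustrative implementation (not the authors' code; the β² = 0.3 weighting follows the text) might look like:

```python
import numpy as np

def f_measure(precision, recall, beta_sq=0.3):
    """Weighted harmonic mean of precision and recall (beta^2 = 0.3)."""
    denom = beta_sq * precision + recall
    return (1.0 + beta_sq) * precision * recall / denom if denom else 0.0

def mae(saliency, gt):
    """Mean absolute error between a continuous saliency map S in [0, 1]
    and a binary ground truth G, averaged over all N pixels."""
    return np.abs(saliency.astype(np.float64) - gt.astype(np.float64)).mean()
```

With β² < 1 the F-measure weights precision more heavily than recall, which is why a precise detector scores well here even at moderate recall.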

Conclusions
To address the problems of saliency target detection in maritime images, this paper proposes a saliency detection method based on improved sea-sky-line detection with multi-visual features for sea-surface images. First, the image gradient integration curve is used to estimate the potential sea-sky-line feature points, and the sea-sky-line is identified in the sea-surface image through polynomial iterative fitting. Subsequently, to process the multiple features in sea images, a saliency detection model based on multi-visual features is used to heighten the contrast between the target and the background, facilitating the saliency detection. Comparative experiments indicated that the proposed algorithm can promptly and accurately segment the sea-sky-line in images. The obtained saliency maps can efficiently extract the salient targets in sea-surface images, as indicated by the PR curves and MAEs. The proposed approach can provide guidance for object recognition and localization. Moreover, the proposed framework is highly generalizable and can be extended to other maritime image analysis problems.
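The polynomial iterative fitting step summarized above can be sketched as a robust refit loop (an illustrative sketch only; the candidate-point extraction, polynomial degree, and outlier threshold are assumptions not specified in this excerpt):

```python
import numpy as np

def fit_sea_sky_line(xs, ys, degree=1, max_iters=5, thresh=2.0):
    """Iteratively fit a polynomial y = f(x) to candidate sea-sky-line
    points, discarding outliers (e.g., clouds or wave crests) whose
    residual exceeds an adaptive cut-off, then refitting."""
    xs = np.asarray(xs, dtype=np.float64)
    ys = np.asarray(ys, dtype=np.float64)
    keep = np.ones(xs.size, dtype=bool)
    coeffs = np.polyfit(xs, ys, degree)
    for _ in range(max_iters):
        coeffs = np.polyfit(xs[keep], ys[keep], degree)
        residuals = np.abs(np.polyval(coeffs, xs) - ys)
        # Adaptive cut-off: at least `thresh` pixels, widened while the
        # current fit is still strongly perturbed by outliers.
        cutoff = max(thresh, 2.0 * np.median(residuals))
        new_keep = residuals <= cutoff
        if new_keep.all() or np.array_equal(new_keep, keep):
            break
        keep = new_keep
    return coeffs
```

A first-degree fit recovers a straight sea-sky-line even when isolated outlier points pull the initial least-squares estimate away from the true line.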
Future work will be aimed at correlating effective filters with high-level content (e.g., type of movement and object identity). In addition, the effect of white waves (e.g., waves at the stern of a ship, shown in Figure 5) on the saliency detection and segmentation should be eliminated by filtering the wave regions while maintaining the image details.