Ship Detection in Panchromatic Optical Remote Sensing Images Based on Visual Saliency and Multi-Dimensional Feature Description

Ship detection in panchromatic optical remote sensing images is faced with two major challenges, locating candidate regions from complex backgrounds quickly and describing ships effectively to reduce false alarms. Here, a practical method was proposed to solve these issues. Firstly, we constructed a novel visual saliency detection method based on a hyper-complex Fourier transform of a quaternion to locate regions of interest (ROIs), which can improve the accuracy of the subsequent discrimination process for panchromatic images, compared with the phase spectrum quaternary Fourier transform (PQFT) method. In addition, the Gaussian filtering of different scales was performed on the transformed result to synthesize the best saliency map. An adaptive method based on GrabCut was then used for binary segmentation to extract candidate positions. With respect to the discrimination stage, a rotation-invariant modified local binary pattern (LBP) description was achieved by combining shape, texture, and moment invariant features to describe the ship targets more powerfully. Finally, the false alarms were eliminated through SVM training. The experimental results on panchromatic optical remote sensing images demonstrated that the presented saliency model under various indicators is superior, and the proposed ship detection method is accurate and fast with high robustness, based on detailed comparisons to existing efforts.


Introduction
Ships are important targets of real-time monitoring and wartime attacks at sea, and their accurate and fast detection can play a key role in the analysis of enemy situations, precision guidance, and military mapping. Ships also play an irreplaceable role in rescue, the safety management of fishing vessels, and so on. However, automatic detection is prone to false alarm and missed detection due to complex interference such as shooting weather, sea surface clutter, cloud, fog occlusion, and uneven illumination. How ships can be quickly and reliably detected in and extracted from panchromatic remote sensing images (with one channel) has become a critical issue [1][2][3].
At present, the ship detection method mainly includes three main stages [4][5][6][7]: the separation of sea and land, the region of interest (ROI) location, and target feature description and false alarm elimination, and the last two steps have received an increasing amount of attention. In the stage of locating ROIs (our interest areas of the suspicious ship in the image), gray statistical features, edge detection, and visual saliency are the most popular tools. The first two kinds of algorithm are simple to implement, but when sea conditions become complicated because the wind and waves are strong, the contrast of Fourier transform is executed. Gaussian filtering of different scales is performed on the transformed result to synthesize the best saliency map. Finally, an adaptive binary segmentation of ROIs based on GrabCut (interactive foreground extraction using iterated graph cuts) is used to locate candidates. The presented saliency model has a high detection accuracy and obtains more complete target contours, regardless of the variety of scenes. In the last stage, the rotation-invariant modified LBP feature is constructed to make up for the poor discrimination of different regions caused by the lack of contrast description in the original LBP. At the same time, shape, texture, and moment-invariant features are extracted and concatenated as a feature description for SVM classification to eliminate false alarms and identify targets.
The novel method makes the following contribution: 1) A novel saliency map is proposed based on multiple features of quaternion transform, and it is shown to obtain more complete ROIs under complex sea backgrounds. The gray distribution of the saliency map is more uniform, which improves the accuracy of the detection algorithm for panchromatic optical remote sensing images.
2) In the ROI extraction stage, we propose an adaptive segmentation algorithm based on GrabCut to obtain a more accurate binary region.
3 The rest of this paper is structured as follows: Section 2 mainly introduces the method of extracting ROIs based on saliency map detection. Section 3 introduces the rotation-invariant MLBP The novel method makes the following contribution: (1) A novel saliency map is proposed based on multiple features of quaternion transform, and it is shown to obtain more complete ROIs under complex sea backgrounds. The gray distribution of the saliency map is more uniform, which improves the accuracy of the detection algorithm for panchromatic optical remote sensing images. (2) In the ROI extraction stage, we propose an adaptive segmentation algorithm based on GrabCut to obtain a more accurate binary region. (3) The rotation-invariant modified LBP (MLBP) feature is employed to make up for the contrast description of different regions. (4) Using shape, texture, and moment-invariant features to construct a strong target description operator achieves better performance in both detection accuracy and efficiency. (5) It is an efficient and convenient model for hardware transplantation and engineering applications on the basis of ensuring detection accuracy.
The rest of this paper is structured as follows: Section 2 mainly introduces the method of extracting ROIs based on saliency map detection. Section 3 introduces the rotation-invariant MLBP features and Remote Sens. 2020, 12, 152 4 of 24 other combined features for false alarm elimination. Experimental results and analysis are provided in Section 4. Section 5 reports the conclusion and possible extensions.

ROI Extraction
It is well known that the size of one panchromatic optical remote sensing image is very large, and the number and area of ships in the image are lower, resulting in redundant background information. Moreover, ship targets can be described by high-level semantics such as texture, shape, direction, and other features. Therefore, ships can be considered as significant targets on the image and use saliency map extraction to locate them in a large image. In this section, we will introduce the two stages of ROI extraction: the detection of saliency maps and the method of obtaining target candidates by binary segmentation. In the first step, a multi-frequency domain significance map model based on the improved hyper-complex frequency domain and quaternion Fourier transform based on PQFT is designed. In the second step, an adaptive segmentation is proposed to obtain binary images of saliency maps to extract some basic parameters of the ROIs and determine potential targets.

The PQFT Model
Guo proposed the PQFT in the hyper-complex frequency domain [9]. The generalized coordinated color features are used to construct the quaternion. The hyper-complex quaternion matrix is represented as follows: where µ 1 , µ 2 , and µ 3 are the imaginary units, µ 2 1 = µ 2 2 = µ 2 3 = µ 1 µ 2 µ 3 = −1, and f 0 is the real part. If the value of the real part is 0, it is called a pure quaternion.
By combining the multi-feature components of the color image, the hyper-complex quaternion is formed as follows: where R = r − (g + b)/2, G = g − (r + b)/2, B = b − (r + g)/2, and Y = (r + g)/2 − r − g /2 − b. r, g, and b are the three channels of the color image. f 0 is the motion characteristic component and is used to express the brightness difference between two adjacent frames. I(x, y, t) and I(x, y, t − 1) represent the current frame and the previous frame image in the video. f 0 = 0 when it is used for a single-frame static image. The hyper-complex Fourier transform Q[u, v] of the quaternion q(x, y) of the input image is computed, and the polar coordinate transformation form is described as follows: where . is the module of each element in the hyper-complex matrix, and Q[u, v] is the frequency domain expression of q(x, y). The amplitude spectrum A( u, v ) , phase spectrum P( u, v ) , and Eigen spectrum χ( u, v ) are calculated as follows: Remote Sens. 2020, 12, 152 5 of 24 where S and V represent the real and imaginary part of the Q[u, v], respectively. The phase spectrum is preserved, the inverse quaternion Fourier transform is carried out, and the saliency map of PQFT is obtained: where g represents the Gaussian filtering function, and Q −1 is the inverse quaternion Fourier transform. S is the saliency map. By the above calculation, the PQFT model can quickly achieve a saliency map of color video images and obtain good detection results. The PQFT is established by using two kinds of features (color and motion information) to construct the quaternion in the RGB or CIE Lab mode. However, the panchromatic remote sensing image has only one gray channel, and a single image has no motion information, which can result in a great reduction in the extraction accuracy for PQFT. The method adopts single-scale filtering and does not take into account the multi-scale phenomenon of different target sizes.

The Proposed Saliency Map Detection Method
Aiming at these addressed problems, a novel saliency map detection method is proposed based on the improved PQFT algorithm, called the modified PQFT (MPQFT). The method improves the PQFT in two aspects: one regards how the quaternion is constructed for the panchromatic remote sensing image to obtain uniform and complete target areas; the other regards how the saliency map is generated to ensure that differently sized targets can be detected at the same time. As shown in Figure 2, the proposed algorithm in the process of the MPQFT mainly consists of four steps: constructing the quaternion using different feature maps, carrying out hyper-complex Fourier transform to the quaternion, calculating multi-scale saliency maps, and generating the best saliency map. The process of the MPQFT is shown in Figure 2.
Remote Sens. 2020, 11, x FOR PEER REVIEW 5 of 23 where S and V represent the real and imaginary part of the [ ] , Q u v , respectively.
The phase spectrum is preserved, the inverse quaternion Fourier transform is carried out, and the saliency map of PQFT is obtained: where g represents the Gaussian filtering function, and 1 Q − is the inverse quaternion Fourier transform. S is the saliency map.
By the above calculation, the PQFT model can quickly achieve a saliency map of color video images and obtain good detection results. The PQFT is established by using two kinds of features (color and motion information) to construct the quaternion in the RGB or CIE Lab mode. However, the panchromatic remote sensing image has only one gray channel, and a single image has no motion information, which can result in a great reduction in the extraction accuracy for PQFT. The method adopts single-scale filtering and does not take into account the multi-scale phenomenon of different target sizes.

The Proposed Saliency Map Detection Method
Aiming at these addressed problems, a novel saliency map detection method is proposed based on the improved PQFT algorithm, called the modified PQFT (MPQFT). The method improves the PQFT in two aspects: one regards how the quaternion is constructed for the panchromatic remote sensing image to obtain uniform and complete target areas; the other regards how the saliency map is generated to ensure that differently sized targets can be detected at the same time. As shown in Figure 2, the proposed algorithm in the process of the MPQFT mainly consists of four steps: constructing the quaternion using different feature maps, carrying out hyper-complex Fourier transform to the quaternion, calculating multi-scale saliency maps, and generating the best saliency map. The process of the MPQFT is shown in Figure 2.  For the ship target itself, it usually has a large change in intensity and obvious boundaries compared with clouds, fog, islands, and other scenes. In addition, texture and spatial similarity of ships are also conducive information in the complex background. Combining the two aspects would benefit the extraction process. Therefore, in the process of constructing the quaternion, we adopted the contrast map, the Gabor filtering map, and the fusion feature map that is established through combining spatial correlation with texture feature. The construction process of these feature maps is shown in detail below.
Firstly, the contrast value of one pixel is measured by the Euclidean distance of regions R1 and R2, and the calculation formula is as follows: where p k1 and p k2 are gray values of pixels in R1 and R2, respectively. D[.] indicates the Euclidean distance. R1 and R2 are the square regions centered on the same pixel (i, j). i and j are the position of the row and column in the image. N1 and N2 are the number of pixels in R1 and R2, respectively. The size of regions R1 and R2 are 5 × 5 and 41 × 41 pixels, respectively. Both of them are empirical parameters. We c obtain the contrast map of the whole image by Equation (11). Secondly, the Gabor filtering map that is sensitive to the scale and direction of the images is introduced [24,25]. The Gabor kernel function is multiplied by a Gaussian and sine function as follows: where x = x cos θ + y sin θ, y = −x sin θ + y cos θ. λ is expressed in pixels when participating in the calculation. In general, it is less than one-fifth of the size of the input image and greater than or equal to 2. θ represents the direction of Gabor filter fringes, and ranges from 0 • to 360 • . ϕ is the phase offset that ranges from −180 • to 180 • . γ is used to adjust the elliptic aspect ratio after the Gabor transform and when γ equals 1, and the shape is approximately circular. σ is the standard deviation of the Gaussian function in the Gabor function. We selected the following parameters: λ = 2, θ = 0 • , γ = 0.5, ϕ = 0 • , and σ = 0.5. As shown in Figure 3, one example with different parameters is given.
Remote Sens. 2020, 11, x FOR PEER REVIEW 6 of 23 For the ship target itself, it usually has a large change in intensity and obvious boundaries compared with clouds, fog, islands, and other scenes. In addition, texture and spatial similarity of ships are also conducive information in the complex background. Combining the two aspects would benefit the extraction process. Therefore, in the process of constructing the quaternion, we adopted the contrast map, the Gabor filtering map, and the fusion feature map that is established through combining spatial correlation with texture feature. The construction process of these feature maps is shown in detail below.
Firstly, the contrast value of one pixel is measured by the Euclidean distance of regions R1 and R2, and the calculation formula is as follows: We c obtain the contrast map of the whole image by Equation (11). Secondly, the Gabor filtering map that is sensitive to the scale and direction of the images is introduced [25,26]. The Gabor kernel function is multiplied by a Gaussian and sine function as follows: where xx'cos=+ysiθθn , =−yx'sin+ycθθos . λ is expressed in pixels when participating in the calculation. In general, it is less than one-fifth of the size of the input image and greater than or equal to 2. θ represents the direction of Gabor filter fringes, and ranges from 0° to 360°. ϕ is the phase offset that ranges from −180° to 180°. γ is used to adjust the elliptic aspect ratio after the Gabor transform and when γ equals 1, and the shape is approximately circular. σ is the standard deviation of the Gaussian function in the Gabor function. We selected the following parameters: λ Next, a fusion feature map is established through combining spatial correlation with texture to enhance the quaternion. The lower the spatial correlation, the more likely a candidate it is. The spatial correlation of the region x to the domain ' X for the pixel (i, j) is defined as follows: where x and ' X are the square regions centered on the pixel p(i, j) and p(i, j ) N + , respectively. i and j are the row and column coordinates of the image. N is set to 5 in this paper. x and ' X are Next, a fusion feature map is established through combining spatial correlation with texture to enhance the quaternion. The lower the spatial correlation, the more likely a candidate it is. The spatial correlation of the region x to the domain X for the pixel (i, j) is defined as follows: Remote Sens. 2020, 12, 152 7 of 24 where x and X are the square regions centered on the pixel p(i, j) and p(i, j + N), respectively. I and j are the row and column coordinates of the image. N is set to 5 in this paper. x and X are areas represented by the black and blue borders in Figure 4. The sizes of them are both k × k pixels.σ x and σ x are standard deviations of the corresponding areas. E(x) and E(x ) are the mean value of the region x and X , respectively. x x represents the dot product of the gray value in the corresponding position of the two regions, and the size of x x is k × k pixels. E(x x ) is the mean value of the x x .
Remote Sens. 2020, 11, x FOR PEER REVIEW 7 of 23 areas represented by the black and blue borders in Figure 4. The sizes of them are both k × k pixels.  We then calculate the unified LBP [17] to reflect texture properties of points, line corners, and flat areas in the image.
where ( Through the above steps, we completed the construction of the feature maps. 0 f~ 3 f in Equation (1) can be assigned: 0 f is the original image, 1 f is a contrast map by Equation (11), 2 f is a Gabor transform map based on Equation (12), and 3 f is a blend feature map combining spatial correlation with the texture feature based on Equation (16). Therefore, the quaternion is Q u v is obtained by performing the hyper-complex Fourier transform in Equation (6). The amplitude spectrum  (7) and (8). At the same time, in order to enhance the significant part, the Gaussian kernel function with different scales is used to obtain the scale space of the amplitude spectrum. We then calculate the unified LBP [17] to reflect texture properties of points, line corners, and flat areas in the image.
where s(.) is a binary symbol, s(x) = 1, x > 0, and s(x) = 0, x ≤ 0. U(LBP N,R ) represents the number of exchanges between 0 and 1 on the N-bit binary number [17], g c and g n are the gray values of the central and the domain pixel, respectively. It was found that the LBP value of the background was higher than the target. The LBP map is processed as follows.
where LBP is the mean value of the LBP map, and |.| represents the absolute value. The final fusion feature is obtained by multiplying the correlation value Correlation(i, j) and the coefficient in the LBP coeff (i, j) in the corresponding position: Through the above steps, we completed the construction of the feature maps. f 0~f3 in Equation (1) can be assigned: f 0 is the original image, f 1 is a contrast map by Equation (11), f 2 is a Gabor transform map based on Equation (12), and f 3 is a blend feature map combining spatial correlation with the texture feature based on Equation (16). Therefore, the quaternion is q(x, y) is obtained by performing the hyper-complex Fourier transform in Equation (6). The amplitude spectrum A( u, v ) and phase spectrum P( u, v ) of Fourier transform is calculated by Equations (7) and (8). At the same time, in order to enhance the significant part, the Gaussian kernel function with different scales is used to obtain the scale space of the amplitude spectrum. where k is a spatial scale parameter, k = 1, . . . , K, K = log 2 min{H, W} + 1, and H and W are the height and width of the image, respectively, t 0 = 0.5. We built a series of saliency maps at different scales as follows: The best saliency map is selected by methods according to the theory of minimum information entropy, and other saliency maps are discarded [26]. However, the discarded saliency maps also contained significant information at different scales for differently sized ships. Therefore, we linearly synthesized the saliency maps according to the entropy: where H k (x) represents the information entropy of one saliency map. S min is the saliency map that the entropy is the minimum of S k . S is the saliency map we obtained. We provide one example to show the process of the quaternion construction and the result of saliency map. The image comes from the satellite GF-2 and has a 2 m resolution. We can see from Figure 5 that the proposed method achieves a better locating result when it is subjected to complex interference and that each feature map suppresses the background interference effectively. Moreover, the gray value distribution of the saliency map is relatively uniform, so the binary value leads to a more complete target contour in the process of binary segmentation. The color images should be converted into grayscale images, and the following processing is similar to gray images.
The best saliency map is selected by methods according to the theory of minimum information entropy, and other saliency maps are discarded [27]. However, the discarded saliency maps also contained significant information at different scales for differently sized ships. Therefore, we linearly synthesized the saliency maps according to the entropy: where ( ) k H x represents the information entropy of one saliency map. min S is the saliency map that the entropy is the minimum of k S . S is the saliency map we obtained.
We provide one example to show the process of the quaternion construction and the result of saliency map. The image comes from the satellite GF-2 and has a 2 m resolution. We can see from Figure 5 that the proposed method achieves a better locating result when it is subjected to complex interference and that each feature map suppresses the background interference effectively. Moreover, the gray value distribution of the saliency map is relatively uniform, so the binary value leads to a more complete target

Target Candidates Extraction
After the detection of saliency maps, saliency regions are enhanced, and the background is suppressed. In order to locate positions of candidate targets in the image, binary segmentation is needed. Furthermore, these areas include ships and false alarms. We can eliminate some simple false alarms by characteristic parameters and obtain ROI candidates. In order to extract the characteristic parameters of saliency regions, such as length and width, it is necessary to obtain binary images of saliency maps. Therefore, an adaptive segmentation based on the Otsu method and GrabCut [28,29] was executed to extract candidate target regions.
The original GrabCut algorithm is an interactive segmentation model with high precision. In this algorithm, the Gaussian mixture model (GMM) is used to model the foreground and background region of the image through annotating artificially. The foreground and background areas of the model need to be defined in advance. The foreground should include the complete target as much as possible. The rest of the image will be the background. Any point in the image corresponds to a Gaussian component of the target or background. This obviously requires too many manual inputs

Target Candidates Extraction
After the detection of saliency maps, saliency regions are enhanced, and the background is suppressed. In order to locate positions of candidate targets in the image, binary segmentation is needed. Furthermore, these areas include ships and false alarms. We can eliminate some simple false alarms by characteristic parameters and obtain ROI candidates. In order to extract the characteristic parameters of saliency regions, such as length and width, it is necessary to obtain binary images of saliency maps. Therefore, an adaptive segmentation based on the Otsu method and GrabCut [27,28] was executed to extract candidate target regions.
The original GrabCut algorithm is an interactive segmentation model with high precision. In this algorithm, the Gaussian mixture model (GMM) is used to model the foreground and background region of the image through annotating artificially. The foreground and background areas of the model need to be defined in advance. The foreground should include the complete target as much as possible. The rest of the image will be the background. Any point in the image corresponds to a Gaussian component of the target or background. This obviously requires too many manual inputs in advance. Otsu is a classical threshold segmentation algorithm that is simple to calculate, but its accuracy is low in complex scenes. Therefore, we propose an adaptive segmentation method combining the two algorithms to obtain more accurate binary regions.
(1) Initial binary segmentation: Slices of ROIs are segmented to obtain binary images by the Otsu algorithm. One example (2 m resolution) is displayed in Figure 6 to show how the algorithm is executed. Figure 6b is the binary result of the Otsu method. We can see that the target region is not complete.  Figure 6d. In this way, the foreground and the background can be obtained by expanding the external moment without manual input. (4) As regards GrabCut calculations, the GrabCut model iterates the energy model according to the input foreground and background region until the energy tends to be stable [27]. (5) As regards the final binary segmentation, the final binary segmentation image is obtained by a GarbCut model. In the saliency map, the region above the threshold is defined as the ROIs, while the other is considered to be the background. The external moment is recalculated to facilitate the subsequent application.
Remote Sens. 2020, 11, x FOR PEER REVIEW 9 of 23 in advance. Otsu is a classical threshold segmentation algorithm that is simple to calculate, but its accuracy is low in complex scenes. Therefore, we propose an adaptive segmentation method combining the two algorithms to obtain more accurate binary regions.
(1) Initial binary segmentation: Slices of ROIs are segmented to obtain binary images by the Otsu algorithm. One example (2 m resolution) is displayed in Figure 6 to show how the algorithm is executed. Figure 6b is the binary result of the Otsu method. We can see that the target region is not complete.  Figure  6d. In this way, the foreground and the background can be obtained by expanding the external moment without manual input. (4) As regards GrabCut calculations, the GrabCut model iterates the energy model according to the input foreground and background region until the energy tends to be stable [28]. (5) As regards the final binary segmentation, the final binary segmentation image is obtained by a GarbCut model. In the saliency map, the region above the threshold is defined as the ROIs, while the other is considered to be the background. The external moment is recalculated to facilitate the subsequent application.
GrabCut model  Figure 7 shows the ROI extraction of images in complex backgrounds with a 2 m resolution. We can obtain many ROIs with different sizes, including ship targets, broken cloud blocks, sea waves, and other false alarms, as shown on the second column in Figure 7. The third column is the segmentation results obtained by the adaptive segmentation. Some false alarms can be eliminated by relevant simple shape features including length, width, area, aspect ratio, and tightness. By applying simple shape analysis, large clouds, islands, and small waves can be ruled out.  Figure 7 shows the ROI extraction of images in complex backgrounds with a 2 m resolution. We can obtain many ROIs with different sizes, including ship targets, broken cloud blocks, sea waves, and other false alarms, as shown on the second column in Figure 7. The third column is the segmentation results obtained by the adaptive segmentation. Some false alarms can be eliminated by relevant simple shape features including length, width, area, aspect ratio, and tightness. By applying simple shape analysis, large clouds, islands, and small waves can be ruled out. Figure 7 shows the ROI extraction of images in complex backgrounds with a 2 m resolution. We can obtain many ROIs with different sizes, including ship targets, broken cloud blocks, sea waves, and other false alarms, as shown on the second column in Figure 7. The third column is the segmentation results obtained by the adaptive segmentation. Some false alarms can be eliminated by relevant simple shape features including length, width, area, aspect ratio, and tightness. By applying simple shape analysis, large clouds, islands, and small waves can be ruled out.

Target Discrimination
Although some sea surface background interference can be suppressed after the ROI extraction stage, there are still some false alarms, such as coastal buildings and thick clouds, due to the complexity of the sea surface. Therefore, it is necessary for us to use the feature extraction and machine learning technique to confirm real ships. In this section, we introduce the description operator MLBP combined with SVM training to eliminate false alarms and confirm real targets.

The MLBP Feature Description
The LBP operator is gray rotation-invariant [30]. However, the LBP method only considers the size relationship between the central and neighboring pixel, with no contrast relationship. Therefore, different contrast distributions have the same LBP value as shown in Figure 8. The modified LBP feature (MLBP) is proposed in this paper to solve this problem. The following process is for any pixel (i, j) in the image. Firstly, the local contrast between the central pixel and neighbor pixels is calculated:

Target Discrimination
Although some sea surface background interference can be suppressed after the ROI extraction stage, there are still some false alarms, such as coastal buildings and thick clouds, due to the complexity of the sea surface. Therefore, it is necessary for us to use the feature extraction and machine learning technique to confirm real ships. In this section, we introduce the description operator MLBP combined with SVM training to eliminate false alarms and confirm real targets.

The MLBP Feature Description
The LBP operator is gray rotation-invariant [29]. However, the LBP method only considers the size relationship between the central and neighboring pixel, with no contrast relationship. Therefore, different contrast distributions have the same LBP value as shown in Figure 8.
The modified LBP feature (MLBP) is proposed in this paper to solve this problem. The following process is for any pixel (i, j) in the image. Firstly, the local contrast between the central pixel and neighbor pixels is calculated: where I (i,j) denotes the gray value of the i-th row and the j-th column of the image. I p is the gray value of the neighbor pixel centered in the pixel (i, j), and p = 1, 2, . . . N. N represents the number of pixels in the neighbor field, and it is set to 8 in this paper. g p is a contrast value between the central pixel and one neighbor pixel.
operator MLBP combined with SVM training to eliminate false alarms and confirm real targets.

The MLBP Feature Description
The LBP operator is gray rotation-invariant [30]. However, the LBP method only considers the size relationship between the central and neighboring pixel, with The modified LBP feature (MLBP) is proposed in this paper to solve this problem. The following process is for any pixel (i, j) in the image. Firstly, the local contrast between the central pixel and neighbor pixels is calculated: Secondly, the maximum (maxC) and minimum (minC) of g p (p = 1, 2, ... N) can be found. The range of values between maxC and minC is then divided into L levels. Therefore, each contrast value corresponds to a certain level. The level of g p is calculated as follows: where p = 1, 2, ... N. . is a symbol of rounding in the small direction, maxC and minC represent the maximum and minimum contrast values, respectively. L represents the number of levels and is set to 4 based on a large number of experiments. If l p > L, then l p = L. The contrast level l p is transformed in to a binary description of 0 or 1: For the pixel (i, j), the binary descriptor MLBP is a series of numbers of 0 and 1 constructed with S p , MLBP = S p , and p = 1, 2, . . . , N. In order to obtain rotation invariance, the number of exchanges between 0 and 1 is used to describe it again, and the rotation-invariant MLBP proposed in this paper is finally obtained as follows: where U(MLBP) represents the number of exchanges between 0 and 1 for the series. MLBP riu is a rotation invariance binary descriptor. One example of the MLBP description process is given in Figure 9. The value of the central pixel is 120, N = 8, L= 4, and the size of the neighbor is 8 pixels, displayed in blue in Figure 9. The surrounding pixel values are compared with the central value to obtain the contrast map as shown in Figure 9b. The level of each neighbor pixel is calculated according to Equation (22) as shown in Figure 9c. The initial binary description of this area is 00000101, according to Equation (23), and the number of exchanges between 0 and 1 is 4. The rotation invariance binary descriptor of this area is 9.
One example of the MLBP description process is given in Figure 9. The value of the central pixel is 120, N = 8, L= 4, and the size of the neighbor is 8 pixels, displayed in blue in Figure 9. The surrounding pixel values are compared with the central value to obtain the contrast map as shown in Figure 9b. The level of each neighbor pixel is calculated according to Equation (22) as shown in Figure 9c. The initial binary description of this area is The square neighbor area is used as an example above. In the actual application, a circular area can be adopted, and the bilinear interpolation method is applied for the estimation of the value that is not at the center of the pixel. According to the method proposed in this paper, the binary description of area (a) and (b) in Figure 8 are 11101111 and 00001111, respectively. The rotationinvariant MLBP is 1 and 2. Obviously, the MLBP of different contrast distributions is different. The square neighbor area is used as an example above. In the actual application, a circular area can be adopted, and the bilinear interpolation method is applied for the estimation of the value that is not at the center of the pixel. According to the method proposed in this paper, the binary description of area (a) and (b) in Figure 8 are 11101111 and 00001111, respectively. The rotation-invariant MLBP is 1 and 2. Obviously, the MLBP of different contrast distributions is different.

SVM Training
The main aim of the SVM classification is to discriminate the real ship targets from candidates based on different features. We have already obtained the texture description in Section 3.1. In addition, we use the other 10 feature descriptions, including shape description, contrast and texture description, and invariant moment features. The shape feature includes length, width, area, and the perimeter of the minimum external moment. Furthermore, we selected three morphological features to eliminate false alarms, including length-width ratio, compactness, and rectangularity, which are calculated as follows: RH = H/W (28) where H and W are the length and width of the minimum external moment in pixels; P is the perimeter of the contour in pixels; S is the number of the connected area points in pixels. At the same time, we analyzed the texture characteristics of a large number of ships and non-ships. The texture of waves and clouds changed slowly, and that of ships varied greatly. Therefore, we introduced the histogram variance of MLBP (proposed by Section 3.1) and the correlation and contrast of the grey-level co-occurrence matrix (GLCM) [23] to strengthen the ability to distinguish ships and non-ships. The contrast and correlation of the GLCM are calculated as follows.
where P ij is an element of GLCM; µ is the average of GLCM, µ = i j i · P ij ; σ 2 is the standard deviation, The contrast reflected the degree of the depth of the texture. The lighter the texture groove, the lower the contrast, and the more blurred the visual effect is. The correlation measured the degree of similarity between the spatial GLCM elements in the row or column direction. If that value of the matrix is equal in size, the value of the correlation value is small.
We randomly extracted 20 samples (as shown in Figure 10) from the dataset for statistical analysis of the correlation and contrast of GLCM and the histogram variance of MLBP, as shown (a), (b), and (c) in Figure 11. The resolution of images from the satellites GF-2 and ZY-3 was 2 m. We can see that the characteristics of ships are distinguished from those of clouds and waves. This is because of the fact that the internal texture of ships is quite different compared with that of clouds and waves. measured the degree of similarity between the spatial GLCM elements in the row or column direction. If that value of the matrix is equal in size, the value of the correlation value is small.
We randomly extracted 20 samples (as shown in Figure 10) from the dataset for statistical analysis of the correlation and contrast of GLCM and the histogram variance of MLBP, as shown (a), (b), and (c) in Figure 11. The resolution of images from the satellites GF-2 and ZY-3 was 2 m. We can see that the characteristics of ships are distinguished from those of clouds and waves. This is because of the fact that the internal texture of ships is quite different compared with that of clouds and waves.
The Hu moment is a highly concentrated image feature with translation, rotation, and scale invariance. The first two moments M1 and M2 of Hu were used as a set of parameters to describe the characteristics of ships. We rotate the 20 ships by 5°, and the second moment of Hu changes little after the rotation, as shown (d) in Figure 11 From the above analysis, it can be seen that the feature description selected in this paper is strong and robust. We used the SVM to discriminate real ship targets from candidates based on the obtained features. Eleven features should be feature-transformed and mapped to a high-dimensional space. Here, the radial basis function (RBF) is used as the kernel function of the SVM binary classifier, The Hu moment is a highly concentrated image feature with translation, rotation, and scale invariance. The first two moments M1 and M2 of Hu were used as a set of parameters to describe the characteristics of ships. We rotate the 20 ships by 5 • , and the second moment of Hu changes little after the rotation, as shown (d) in Figure 11 From the above analysis, it can be seen that the feature description selected in this paper is strong and robust.
We used the SVM to discriminate real ship targets from candidates based on the obtained features. Eleven features should be feature-transformed and mapped to a high-dimensional space. Here, the radial basis function (RBF) is used as the kernel function of the SVM binary classifier, K(x, x ) = exp(− x − x 2 /σ 2 ),σ > 0. Before training, each feature is first normalized to the range of 0-1 to reduce the dominant role of a certain dimension feature due to its large magnitude. The RBF function has two key parameters: the penalty factor c and the kernel parameter σ. We obtained the best parameter through the cross-validation method and set c = 1 and σ = 0.7 [29]. Training images come from the satellites GF-2 and ZY-3 2 m resolution.

Subjective Visual Evaluation of Saliency Models
In order to test the performance of saliency map extraction proposed in this paper for panchromatic optical remote sensing images, four typical algorithms were compared and analyzed, and they were ITTI, SR, PQFT, and one found in [12]. The ASD and MSRA10K datasets were not used in the evaluation of various algorithms, because these two datasets are mostly color images [30]. Images with a 2 m resolution from the satellites GF-2 and ZY-3 were used to construct 300 test sets under various sea conditions such as different shooting time, conditions, and sea surface false alarms. The slice size is 200 × 200 pixels. In addition, in order to evaluate the performance of various algorithms objectively, we extracted precise ground-truth images of contours in advance. Figure 12 shows the ROI extraction in a set of typical cases, including the calm sea surface, low contrast, obvious sea clutter, and a similar texture.
It is not difficult to find through a large number of experiments that, for panchromatic images, the performance of the frequency domain method is better than the spatial domain method. The spatial domain method cannot suppress the clutter interference and is greatly affected by strong waves and clouds. The SR and PQFT show no significant differences in performance. For the second image in Figure 12c,d, the ship target is completely submerged into the background and cannot be distinguished. The performance of the method found in [12] is better than other frequency domain algorithms, but the detected target area is incomplete, and the target easily melts into the background, as shown in the third, fourth, and fifth images in Figure 12f. The method from [12] only uses gray information, so ROIs cannot be accurately detected. Compared with other algorithms, the method of the saliency map extraction proposed in this paper obtained a better effect, despite a low contrast or a complex background. The detection result was more complete and clearer. Moreover, the brightness value distribution of the target area was uniform, which also helps to detect completed targets in the binary process. This will prevent to some extent incomplete target slices caused by too much brightness or too much darkness.
We selected the PR curve and the comprehensive evaluation index F measure to evaluate the accuracy of various algorithms. Ground-truth images (manual marking) and binary images of saliency maps were recorded as G and M. In the PR curve, P and R refer to the precision and recall of the methods, respectively. Their calculation formulas are as follows: Among them, the pixels belonging to G and M are simultaneously called TP. The pixels belonging to G and not to M are called FN. The pixels belonging to neither G nor M are called FP.
Recall and precision cannot be discussed in isolation. The F metric is introduced to comprehensively evaluate the detection performance of the saliency model as follows: Remote Sens. 2020, 11, x FOR PEER REVIEW 14 of 23

Subjective Visual Evaluation of Saliency Models
In order to test the performance of saliency map extraction proposed in this paper for panchromatic optical remote sensing images, four typical algorithms were compared and analyzed, and they were ITTI, SR, PQFT, and one found in [12]. The ASD and MSRA10K datasets were not used in the evaluation of various algorithms, because these two datasets are mostly color images [31]. Images with a 2 m resolution from the satellites GF-2 and ZY-3 were used to construct 300 test sets under various sea conditions such as different shooting time, conditions, and sea surface false alarms. The slice size is 200 × 200 pixels. In addition, in order to evaluate the performance of various algorithms objectively, we extracted precise ground-truth images of contours in advance. Figure 12 shows the ROI extraction in a set .  Among them, the value of β is different, and the requirements of recall and precision are different. Combined with actual engineering requirement, this paper pays the same attention to recall and precision, β = 1.
After experimental analysis, we selected ITTI in the spatial domain and the method in [12] in the frequency domain for comparison with our method. The performance of the two algorithms was relatively good in their domains. Because the test database contains all kinds of complex, typical sea conditions, the threshold T of the binary processing increased from 0 to 255 when the PR curve was drawn. The average value of recall and precision in different cases was obtained as shown below.
It can be seen from the Figure 13 that the PR and F curves of our algorithm are obviously higher than those of the other two methods. From the PR curve, it can be seen that, when the recall rate is 80%, the precision of ITTI and the method in [12] is about 40% and 73%, respectively. The precision of our algorithm is about 84%, which is 11% higher than the method in [12]. Our algorithm has an obvious advantage in the case of complex sea conditions because it has constructed a contrast map, a Gabor map, and a spatial and texture correlation feature to facilitate the integrity of target edge detection. At the same time, MPQFT constructs a multi-scale space, which is helpful for detecting targets of different sizes. Remote Sens. 2020, 11, x FOR PEER REVIEW 16 of 23

Overall Detection Performances
Finally, we compared our overall detection method with three typical methods, and the evaluation criteria are defined as follows:

Overall Detection Performances
Finally, we compared our overall detection method with three typical methods, and the evaluation criteria are defined as follows: The We selected 500 typical panchromatic optical remote sensing images with a size of 8192 × 4096 pixels under different imaging conditions from the satellites GF-2 and ZY-3 and with a 2 m resolution. (Some images can be downloaded from the website: www.cresda.com/CN/). In addition, other panchromatic satellite images from various sea surfaces were collected and used as the testing dataset from the publicly available Google Earth service with a resolution of 2 m. These images are very large in size and we have subdivided them into 120 sub-images with 8192 × 4096 pixels. A total of 620 images contain ship targets of different sizes and shapes. We tested our approach on three groups of different sea surfaces: calm sea with little interference, a complex sea surface influenced by clouds and waves, the cloud area percentage of which is about 20%, −30%, and worse imaging conditions, the cloud area percentage of which is more than 50% and as high as 85%. The images contained ships of different types and sizes.
The proposed method was implemented in C++ with Intel (R) Core (TM) i7-4770K CPU at 3.40 GHz and 64.0 GB RAM. The objective evaluation indices of the detection results are listed in Table 1. Because these images are so large in size that they will be compressed in this paper, we provided some local parts of the original images, as shown in Figures 14 and 15. Red borders indicate the ship targets we detected, and the yellow border areas indicate ships we missed with our method. In some cases, we lost some targets, as shown in of Figure 15a,d. In Figure 15a, one ship is almost completely obscured by clouds, making it difficult to locate the target, and another one is lost because the contrast of the ship and the background is too low, making it difficult to distinguish by the human eye. In Figure 15d, more than five ships are connected and docked, making it different from the conventional training model in terms of the shape and area; therefore, the targets were not recognized. In Table 1, we can see that with the increasing amount of interference, the accuracy of our method decreases slightly from Group 1 to Group 3. As shown in Figures 14h and 15h, our method lost some ships (areas indicated by yellow dotted lines) submerged under clouds, but the average accuracy reached up to 92.8% for different sea surfaces, and the average false alarm rate was 7.2%. Considering the existence of ships with different sizes in an image, the multi-scale saliency map extraction was designed so that the method can accurately identify ships of different sizes simultaneously. Images used in the experiment had a 2 m resolution. For ships smaller than 20 m (10 pixels), detection accuracy decreased, and in this case, targets would be so small that their feature parameters were sometimes inaccurate, affecting detection results.
At the same time, we compared our method with state-of-the-art methods proposed in [7, 13,17]. Table 2 was obtained by averaging the evaluation indices under different sea conditions, showing the average performance of the four methods. Table 3 shows a comparison of four methods under different sea conditions. The precision of the methods from [13,17] is high with respect to calm sea surfaces, but for complex sea surfaces, detection performance is greatly reduced, as shown in Table 3. The method proposed in this paper is optimal for detection accuracy. On average, the detection time of our method is 1.6 s. The efficiency of our algorithm is not the highest. Compared with the method from [13], which has comparable accuracy, our efficiency is almost double.  In Table 1, we can see that with the increasing amount of interference, the accuracy of our method decreases slightly from Group 1 to Group 3. As shown in Figures 14h and 15h, our method lost some ships (areas indicated by yellow dotted lines) submerged under clouds, but the average accuracy reached up to 92.8% for different sea surfaces, and the average false alarm rate was 7.2%. Considering the existence of ships with different sizes in an image, the multi-scale saliency map extraction was designed so that the method can accurately identify ships of different sizes simultaneously. Images used in the experiment had a 2 m resolution. For ships smaller than 20 m (10 pixels), detection accuracy decreased, and in this case, targets would be so small that their feature parameters were sometimes inaccurate, affecting detection results.
At the same time, we compared our method with state-of-the-art methods proposed in [7], [13], and [17]. Table 2 was obtained by averaging the evaluation indices under different sea conditions, showing the average performance of the four methods. Table 3 shows a comparison of four methods under different sea conditions. The precision of the methods from [17] and [13] is high with respect to calm sea surfaces, but for complex sea surfaces, detection performance is greatly reduced, as shown in Table 3. The method proposed in this paper is optimal for detection accuracy. On average, the detection time of our method is 1.6 s. The efficiency of our algorithm is not the highest. Compared with the method from [13], which has comparable accuracy, our efficiency is almost double.   A linear function combining pixel and region characteristics was employed to select ship candidates in [7]. When the background was covered by clouds, the location of ROIs would fail, resulting in missed alarms. Compactness and the length-width ratio were considered to remove false alarms. The description features were not enough to distinguish between targets and background when the background became complex. The detection performance was greatly reduced. Regarding the calm sea, the method in [17] achieved the best performance. The accuracy was 98.3%, and the false and missed alarm rates were very low. However, this method only adopted intensity distinctness to find ROIs and led to omissions in complex backgrounds. In the false alarm exclusion stage, the LBP histogram features of the bow, stern, left hull, and right hull were used. The algorithm could achieve better results in distinguishing large ships from false alarms, but when the ships were small or the background became complex, the false alarm rate was higher, because there was little difference between each LBP part of a ship. In [13], the saliency map based on the multi-scale and multi-direction wavelet decomposition was detected to extract ROIs, and the pixel distribution discrimination was designed to eliminate false alarms. Compared with the two previous methods, this method showed a great improvement in accuracy and the false alarm rate. However, this method was designed for color images, and its performance is degraded when applied to panchromatic images. Moreover, the pixel distribution can only remove targets that vary widely in shape. Compared with other algorithms, the performance of our method is a great improvement. In order to improve recall and accuracy, in the extraction of the ROI stage, the MPQFT was designed for panchromatic remote sensing images. The quaternion was constructed with the full consideration of image contrast and edge structure. In addition, in order to find candidates in complex backgrounds, texture and spatial similarity were also adopted. In the latter stage, we proposed the MLBP method combined with 10 other features to reduce false alarms. Figure 14 shows our detection results in some typical sea conditions. On the whole, the algorithm achieved ideal detection results and detected ship targets under complex sea conditions. At the same time, if the sea surface is calm, our method can basically ensure that false alarms and missed detections are less than 5%.
Using the method proposed by this paper, color images should be transformed into gray-scale images for processing. We tested our approach on different sea surfaces in color images. These color images were synthesized from the three spectrums (RGB) from the satellite GF-2 and had an 8 m resolution. Some detection results for color images are shown in Figure 16. The first row contains color images. The second row contains the corresponding gray-scale images and detection results. On the whole, a good performance was achieved. The red border is the target detected by the method. However, it is generally said that the resolution of color images for remote sensing multi-spectral images is much lower than that of panchromatic images. Therefore, the accuracy of this method will be reduced for small ships. As shown in Figure 16, the yellow border is the lost target because the ship is too small in the image with an 8 m resolution. A platform of FPGA combined with DSP was built, and the method of ship detection proposed in this paper was optimized and embedded into a hardware platform. DSP used a TMS320C6678 with an 8-core processor. FPGA adopted the VIRTEX-7 chip produced by the company Xilinx. We carried out a large number of hardware tests of large-field optical remote sensing images. The image size was 8192 × 4096 pixels -2 m resolution. The processing time of one image was 242 ms on average. A platform of FPGA combined with DSP was built, and the method of ship detection proposed in this paper was optimized and embedded into a hardware platform. DSP used a TMS320C6678 with an 8-core processor. FPGA adopted the VIRTEX-7 chip produced by the company Xilinx. We carried out a large number of hardware tests of large-field optical remote sensing images. The image size was 8192 × 4096 pixels -2 m resolution. The processing time of one image was 242 ms on average. Compared with the performance of the original software method, the efficiency of the method was greatly improved, and the system has strong versatility and expansibility, which lays the foundation for practical engineering applications.

Conclusions
In this paper, we present a novel ship detection method for panchromatic optical remote sensing images consisting of saliency map extraction and target discrimination in complex backgrounds. We adopted an efficient frequency-domain model based on the hyper-complex Fourier transform of the quaternion to locate candidate regions, which made the brightness distribution of ROIs more uniform and complete and effectively reduced missed detections. Meanwhile, to determine whether the candidate target was a ship or not, multi-dimensional description features were extracted and designed by the characteristics of ships and no-ships. In addition, an improved LBP, which takes into account contrast information between pixels, was presented and provided more powerful descriptions. Finally, we built a database through actual panchromatic remote sensing images, and used SVM training to obtain a more stable model for ship confirmation. The experimental results under various sea backgrounds demonstrate that the proposed ship detection method can obtain high precision and detection robustness.
Although our method has achieved promising results, several issues remain to be further settled. With the development of satellite remote sensing, the hyper-spectral data should be fully used to construct a ship description operator, which can be combined with the advantages of visible and SAR images, making ship detection methods more robust and easily implemented in hardware.