Ship Detection for PolSAR Images via Task-Driven Discriminative Dictionary Learning

Remote Sens. 2019, 11, 769


Introduction
Ship detection with synthetic aperture radar (SAR) images is one of the important applications in the field of maritime surveillance [1]. Recently, polarimetric synthetic aperture radar (PolSAR) ship detection has received increasing attention, as polarimetric information has proved to be of great benefit to detection performance [2][3][4][5][6][7][8][9][10][11][12]. As a simple example, satisfactory results can be achieved at steep and middle (20° to 40°) incidence angles by using cross-polarization (HV) only [2]. However, co-polarization (HH or VV) may perform better at larger incidence angles [2]. PolSAR images combine the advantages of all polarimetric channels, reveal the differences in scattering characteristics between ships and clutter, and thus help to improve detection performance.
In most existing PolSAR ship detection methods, a scalar feature index is first designed to discriminate the target from clutter, and a constant false alarm rate (CFAR) operation is then conducted. The simplest feature index is the image span, which is the intensity of the PolSAR image, defined as the square of the Frobenius norm of the scattering matrix. With further research, more complicated features have been proposed, which can be roughly classified into two types. The first type is designed by enhancing the contrast between the targets of interest and clutter. Novak et al. proposed the polarimetric whitening filter (PWF) to produce a speckle-reduced image by optimally combining all the elements of the scattering matrix [3]. Yang et al. presented the generalized optimization of polarimetric contrast enhancement (GOPCE) to maximize the signal-to-clutter ratio (SCR) in the image [4]. These methods work well under high-SCR conditions. However, when the SCR decreases, they may suffer from severe performance deterioration. The other type is designed by analyzing the polarimetric scattering mechanism and introducing polarimetric parameters. Yeremy et al. implemented ship detection by using Cameron decomposition [5], while the symmetric scattering characterization method (SSCM) was developed by Touzi et al. [6]. Since methods applied to the single-look scattering matrix are generally more susceptible to speckle and increase the probability of false alarms (PFAs) for small ships, multi-look covariance or coherency matrix-based methods have been explored further. Chen et al. introduced polarization cross entropy (PCE) based on the eigen-decomposition of the polarimetric coherence matrix [7]. Moreover, the degree of polarization [8] was fully investigated for ship detection. It is true that speckle is greatly reduced by spatial ensemble averaging in these methods. Nevertheless, although the scalar feature implicitly includes the contributions of all polarimetric channels, an explicit consideration of all the polarimetric channels should provide more information, which is not fully exploited. These designed features are too simple to provide robust performance in complicated and changeable clutter conditions.
Among various detection algorithms, CFAR detectors have been widely used for their simplicity and adaptive ability [9][10][11]. With the development of superpixel algorithms, some attempts have been made to achieve ship detection by combining superpixels and CFAR detectors [12,13]. It was shown that superpixels can help to retain the target outline and suppress speckle noise. However, only simple superpixel features, such as entropy information and pixel intensity, were utilized in these methods. Simple features provide weak discrimination and depend on artificial design. In addition, the detection performance depends largely on the accuracy of statistical modeling and parameter estimation in the CFAR operation. The theoretical distributions of artificial features are usually analytically intractable, or the estimation of the distributions is extremely cumbersome.
With the development of deep learning, researchers have employed deep neural networks to achieve ship detection in PolSAR images. Zhou et al. [14] modified the faster region-based convolutional neural network (Faster-RCNN) and applied it to PolSAR ship detection. Kang et al. [15] proposed the contextual region-based convolutional neural network with multilayer fusion (CRCNN-MF) by combining contextual information, multi-scaling and the region-based convolutional neural network (RCNN). However, these methods simply process each channel of the PolSAR images separately and fuse the results at the end. As with other deep neural network methods, the heavy computational burden, unstable convergence and large number of sensitive parameters are the bottlenecks for their application.
In this paper, we propose a novel ship detection method for PolSAR images via task-driven discriminative dictionary learning (TDDDL). The superpixel is utilized as the basic processing cell. Ship detection can be viewed as a binary classification problem at the superpixel level. Task-driven dictionary learning (TDDL) methods have achieved notable success in the classification field [16,17]. To improve the discrimination between the ship and clutter, we propose to learn category-specific dictionaries for the ship and clutter. In this way, incoherence between sub-dictionaries is enhanced, producing more discriminative features. Contextual information is also considered by imposing a joint sparsity prior. The complete dictionary is trained in the TDDL framework. The proposed dictionary learning scheme is called TDDDL due to the strong discriminability of the learnt dictionary. Experimental results on synthetic images and two real-scene images show that our method outperforms all the comparative methods.
The main contributions of this paper can be summarized as follows:
(1) We propose a novel dictionary learning algorithm to obtain more discriminative features and boost the detection performance. Contextual information and incoherence constraints are both included in the algorithm.
(2) We also describe an optimization procedure for solving the sparse recovery problem with TDDDL.
(3) Different from previous methods, the proposed ship detection method based on TDDDL employs a learning-based strategy rather than artificially designed rules, and is thus more adaptive and effective. In addition, the strong discriminability of the learnt dictionary further improves detection performance.
The remainder of this paper is organized as follows. In Section 2, we present TDDDL in detail, including its formulation and optimization. In Section 3, the complete scheme of the proposed ship detection method is given. We conduct extensive experiments to evaluate the proposed method in Section 4, and conclude our work and discuss future work in Section 5.

Task-Driven Discriminative Dictionary Learning (TDDDL)
In this section, we first briefly revisit task-driven dictionary learning (TDDL) [16]. Then, we propose our TDDDL, including its formulation and optimization. We use $[Q_1; Q_2]$ to denote the vertical concatenation of two matrices with the same number of columns, and $[Q_1, Q_2]$ to denote the horizontal concatenation of two matrices with the same number of rows.

Review of TDDL
In TDDL [16], signals are represented by their sparse codes, which are then fed into a linear regression. Consider a pair of training samples $(x, y)$, where $x \in \mathbb{R}^M$ is the sum feature extracted from the PolSAR image, $y \in \mathbb{R}^K$ is a binary vector representation of the corresponding label, and $M$ and $K$ denote the dimensions of $x$ and $y$, respectively. Given some dictionary $D \in \mathbb{R}^{M \times P}$, where $P$ is the number of atoms in the dictionary $D$, $x$ can be represented as a sparse vector $\alpha(x, D) \in \mathbb{R}^P$, defined as the solution of an elastic-net problem [17]:
$$\alpha(x, D) = \arg\min_{\alpha \in \mathbb{R}^P} \frac{1}{2}\|x - D\alpha\|_2^2 + \lambda_1\|\alpha\|_1 + \frac{\lambda_2}{2}\|\alpha\|_2^2 \tag{1}$$
where $\lambda_1$ and $\lambda_2$ are the regularization parameters.
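As an illustration of the sparse coding step, the following is a minimal sketch of an elastic-net solver based on iterative soft-thresholding (ISTA). It is not the solver used by the authors; the function names `soft_threshold` and `elastic_net_code`, the iteration count, and the assumed objective $\frac{1}{2}\|x - D\alpha\|_2^2 + \lambda_1\|\alpha\|_1 + \frac{\lambda_2}{2}\|\alpha\|_2^2$ are choices made here for illustration.

```python
import numpy as np

def soft_threshold(v, tau):
    """Element-wise soft-thresholding: the proximal operator of tau * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def elastic_net_code(x, D, lam1, lam2, n_iter=500):
    """Approximate alpha(x, D) with ISTA (hypothetical helper, not the paper's solver).

    Assumed objective: 0.5*||x - D a||^2 + lam1*||a||_1 + 0.5*lam2*||a||^2
    """
    # Step size from the Lipschitz constant of the smooth part (data fit + ridge).
    step = 1.0 / (np.linalg.norm(D, 2) ** 2 + lam2)
    a = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ a - x) + lam2 * a   # gradient of the smooth part
        a = soft_threshold(a - step * grad, step * lam1)
    return a
```

With an orthonormal dictionary the result can be checked against the closed form, in which each coefficient is soft-thresholded by $\lambda_1$ and then divided by $1 + \lambda_2$.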
For the classification task, TDDL uses the sparse vector $\alpha(x, D)$ in a classical expected risk minimization formulation:
$$\min_{D, W} L(D, W, x) = f(D, W, x) + \frac{\mu}{2}\|W\|_F^2 \tag{2}$$
where $L(D, W, x)$ is the classification risk, $W$ is the parameter matrix of the classifier, $\mu$ is a classifier regularization parameter that avoids overfitting of the classifier [18], and $f(D, W, x)$ is a convex function defined as
$$f(D, W, x) = \mathbb{E}_{y,x}\left[ l_s\big(y, W, \alpha(x, D)\big) \right] \tag{3}$$
In this equation, $\mathbb{E}_{y,x}$ denotes the expectation taken relative to the probability distribution $p(x, y)$, and $l_s$ is a convex loss function that measures how well one can predict $y$ by observing $\alpha(x, D)$ given the parameter matrix $W$; it can be the square, logistic, or hinge loss from SVM [19].
The stochastic gradient descent (SGD) algorithm is used to update the dictionary $D$ and the parameter matrix $W$. The update rules are as follows:
$$D^{t+1} = D^{t} - \rho \frac{\partial L^{t}}{\partial D}, \qquad W^{t+1} = W^{t} - \rho \frac{\partial L^{t}}{\partial W} \tag{4}$$
where $t$ is the iteration index and $\rho$ is the step size. The equation for updating $W$ is straightforward since $L(D, W, x)$ is both smooth and convex with respect to $W$. We have where $T$ denotes the transposition. According to the chain rule, we have The main difficulty comes from $\partial\alpha / \partial D$, since the optimization problem in Equation (1) is not smooth [20]. Mairal et al. [16] use fixed point differentiation to solve the problem [21]. The detailed derivation of the algorithm can be found in the Appendix of Mairal et al. [16].

Formulation of TDDDL
The TDDL method provides a supervised dictionary learning framework for learning dictionaries adapted to various tasks instead of being adapted only to data reconstruction [16]. It works well for basic-level classification because the differences between categories are typically rather significant. The sparse codes of different categories are different and result in discriminative features for classification. However, when facing harsher classification tasks, where the signal-to-clutter ratio (SCR) is much lower and different categories show similar characteristics, the TDDL method suffers from severe performance deterioration. The difference can be dominated by those similar sparse codes and can even disappear at the feature encoding stage. Hence, it is desirable to find a dictionary that encodes the features of different categories with their own code-words. Such a dictionary would boost the differences between the feature representations and improve the subsequent ship detection.
To this end, we propose learning a discriminative dictionary by using a category-specific dictionary structure and imposing incoherence constraints between the sub-dictionaries. Since neighboring pixels often share the same label with high probability, contextual information is also considered via a joint sparsity prior. The complete dictionary is trained jointly with a linear classifier in the TDDL framework. We call the proposed dictionary learning scheme TDDDL, due to the strong discriminability of the learnt dictionary.
We denote the training samples within the neighborhood as $X = [x_1, x_2, \ldots, x_N] \in \mathbb{R}^{M \times N}$. The number of categories is represented as $k$, and the category-specific dictionary corresponding to the $l$th category as $D_l \in \mathbb{R}^{M \times P_l}$, where $P_l$ is the number of atoms in the sub-dictionary $D_l$. The complete dictionary is $D = [D_1, D_2, \ldots, D_k] \in \mathbb{R}^{M \times P}$. The samples $X$ can be represented by the sparse code $A(X, D) \in \mathbb{R}^{P \times N}$ by solving the following Lasso problem:
$$A(X, D) = \arg\min_{Z \in \mathbb{R}^{P \times N}} \frac{1}{2}\|X - DZ\|_F^2 + \lambda\|Z\|_{1,2} \tag{7}$$
where $\lambda$ is the regularization parameter, $\|Z\|_{1,2} = \sum_{i=1}^{P} \|Z_i\|_2$ is the $l_{1,2}$-norm of $Z$, and $Z_i \in \mathbb{R}^{1 \times N}$ is the $i$th row of $Z$. Since neighboring pixels often share the same label with high probability, joint sparsity is imposed to force the sparse codes to have a row sparsity pattern. The neighboring pixels are selected by superpixel segmentation, which will be described in detail in Section 3. Many sparse recovery techniques are able to solve Equation (7), such as sparse reconstruction by separable approximation [22], the alternating direction method of multipliers [23], and the fast iterative shrinkage-thresholding algorithm [24].
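The row-sparsity effect of the $l_{1,2}$-norm can be sketched with a plain proximal gradient iteration, in which each row of the code matrix is shrunk by its $l_2$ norm. This is only an illustrative sketch, not one of the solvers cited above; the names `row_soft_threshold` and `joint_sparse_code` and the assumed objective $\frac{1}{2}\|X - DZ\|_F^2 + \lambda\|Z\|_{1,2}$ are assumptions made here.

```python
import numpy as np

def row_soft_threshold(Z, tau):
    """Proximal operator of tau * ||Z||_{1,2}: shrink every row by its l2 norm."""
    norms = np.linalg.norm(Z, axis=1, keepdims=True)
    scale = np.maximum(1.0 - tau / np.maximum(norms, 1e-12), 0.0)
    return scale * Z

def joint_sparse_code(X, D, lam, n_iter=300):
    """Proximal gradient sketch for 0.5*||X - D Z||_F^2 + lam*||Z||_{1,2}."""
    step = 1.0 / np.linalg.norm(D, 2) ** 2     # 1 / Lipschitz constant
    A = np.zeros((D.shape[1], X.shape[1]))
    for _ in range(n_iter):
        grad = D.T @ (D @ A - X)
        A = row_soft_threshold(A - step * grad, step * lam)
    return A

def active_rows(A, tol=0.0):
    """Indices of the rows of A with non-zero l2 norm (the active set)."""
    return np.flatnonzero(np.linalg.norm(A, axis=1) > tol)
```

All pixels in the neighborhood share the same set of active atoms, since a row of the code matrix is either kept for every column or zeroed for every column.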
The effect of sparse coding in Equation (7) largely depends on the quality of the dictionary $D$, which in turn depends on the defined loss function. In TDDL [16], Mairal et al. suggested defining the loss function by the classification error, which fully utilizes the label information. To improve the dictionary quality further, we impose incoherence constraints between the two sub-dictionaries of the ship and clutter. Denote the label information corresponding to $X$ as $Y \in \mathbb{R}^{k \times N}$. Given the training data $(X, Y)$, the loss function can be formulated as follows:
$$L(D, W, X) = \|Y - WA\|_F^2 + \mu\|W\|_F^2 + \eta \sum_{l=1}^{k} \frac{1}{2P_l(P - P_l)} \|D_l^T D_{-l}\|_F^2 \tag{8}$$
where $\mu$ and $\eta$ are the regularization parameters, $W \in \mathbb{R}^{k \times P}$ contains the parameters of the linear classifier, $A \in \mathbb{R}^{P \times N}$ is given by Equation (7), and $D_{-l}$ denotes the sub-dictionaries obtained by removing $D_l$ from $D$. In Equation (8), the term $\|Y - WA\|_F^2$ describes the classification error, the term $\|W\|_F^2$ avoids overfitting of the classifier, and the term $\|D_l^T D_{-l}\|_F^2$ enforces incoherence between the category-specific sub-dictionaries. The coefficient $1/(2P_l(P - P_l))$ reduces the influence of the sub-dictionary size and makes the learnt dictionary more stable for classification, as introduced by Gao et al. [25].
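The three terms of the loss can be evaluated directly, as in the sketch below. The unit weights on the classification and regularization terms are an assumption made for this sketch, since only the incoherence coefficient $1/(2P_l(P - P_l))$ is stated explicitly in the text. The function names are hypothetical, and at least two sub-dictionaries are assumed.

```python
import numpy as np

def incoherence_penalty(sub_dicts):
    """Sum over l of ||D_l^T D_{-l}||_F^2 / (2 * P_l * (P - P_l)).

    sub_dicts: list of at least two arrays of shape (M, P_l).
    """
    P = sum(Dl.shape[1] for Dl in sub_dicts)
    total = 0.0
    for l, Dl in enumerate(sub_dicts):
        # D_{-l}: all atoms that do not belong to the l-th sub-dictionary.
        others = np.hstack([Dm for m, Dm in enumerate(sub_dicts) if m != l])
        Pl = Dl.shape[1]
        total += np.linalg.norm(Dl.T @ others, "fro") ** 2 / (2.0 * Pl * (P - Pl))
    return total

def tdddl_loss(Y, W, A, sub_dicts, mu, eta):
    """Classification error + classifier regularization + incoherence (assumed weights)."""
    fit = np.linalg.norm(Y - W @ A, "fro") ** 2
    reg = mu * np.linalg.norm(W, "fro") ** 2
    return fit + reg + eta * incoherence_penalty(sub_dicts)
```

Mutually orthogonal sub-dictionaries give a zero penalty, while sub-dictionaries that share atoms are penalized.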

Optimization Procedure
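For convenience, the loss function $L(D, W, X)$ in Equation (8) can be further represented by two parts, $L_1$ and $L_2$, which are defined as follows:
$$L_1 = \|Y - WA\|_F^2 + \mu\|W\|_F^2 \tag{9}$$
$$L_2 = \eta \sum_{l=1}^{k} \frac{1}{2P_l(P - P_l)} \|D_l^T D_{-l}\|_F^2 \tag{10}$$
Following the derivations in [16], we can show that $L(D, W, X)$ is differentiable on $D \times W$. It is simple to obtain the gradient with respect to $W$, i.e.,
$$\frac{\partial L}{\partial W} = 2(WA - Y)A^T + 2\mu W \tag{11}$$
Applying the chain rule, we can compute the gradient with respect to the dictionary $D$:
$$\frac{\partial L}{\partial D} = \frac{\partial L_1}{\partial A}\frac{\partial A}{\partial D} + \frac{\partial L_2}{\partial D} \tag{12}$$
The derivative $\partial L_1 / \partial A$ can be computed in the same way as $\partial L / \partial W$. The key point is to compute the derivative $\partial A / \partial D$, since there is no explicit expression of the sparse codes $A$ in terms of $D$. Applying fixed point differentiation [21] to Equation (7), Sun et al. derived the explicit expression of $\partial A / \partial D$, which is given in Appendix VII in [26]. Here, we give the vectorization form of the derivative of $A$ with respect to $D_{mn}$: where denoted as the active atoms of $D$, and $\Lambda$ is the active set such that
$$\Lambda = \{\, i \in \{1, \ldots, P\} : \|A_i\|_2 \neq 0 \,\} \tag{14}$$
where $A_i$ denotes the $i$th row of $A$. And $\Gamma$ is defined as where $\oplus$ is the direct sum of matrices, $i = 1, \ldots, P_\Lambda$. Combining Equations (13) and (14), the explicit form of $\partial L_1 / \partial A$ can be easily computed.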
As the other part of $\partial L / \partial D$, the derivative $\partial L_2 / \partial D$ can be rewritten as the following expression Therefore, we have Now, we conclude the derivation results as follows: where $\beta \in \mathbb{R}^{P \times N}$ is defined as and $E \in \mathbb{R}^{M \times P}$ is defined as We summarize the overall optimization for TDDDL in Algorithm 1.

Algorithm 1. Overall optimization for TDDDL.
3: Compute the active set $\Lambda$ according to Equation (14).
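The incoherence part of the dictionary gradient can be checked numerically. The sketch below covers only the two-class case (ship and clutter) and assumes the penalty $\eta \sum_l \|D_l^T D_{-l}\|_F^2 / (2P_l(P - P_l))$, which for two sub-dictionaries reduces to $\eta \|D_s^T D_c\|_F^2 / (P_s P_c)$. The closed-form gradients are derived here from that assumed form and are not taken from the paper; all function names are hypothetical.

```python
import numpy as np

def incoherence_two(Ds, Dc, eta):
    """Two-class incoherence penalty: eta * ||Ds^T Dc||_F^2 / (Ps * Pc)."""
    Ps, Pc = Ds.shape[1], Dc.shape[1]
    return eta * np.linalg.norm(Ds.T @ Dc, "fro") ** 2 / (Ps * Pc)

def incoherence_two_grad(Ds, Dc, eta):
    """Gradients of the two-class penalty with respect to Ds and Dc."""
    Ps, Pc = Ds.shape[1], Dc.shape[1]
    c = 2.0 * eta / (Ps * Pc)
    return c * Dc @ (Dc.T @ Ds), c * Ds @ (Ds.T @ Dc)

def numeric_grad(f, M, eps=1e-6):
    """Central finite-difference gradient of a scalar function f at matrix M."""
    G = np.zeros_like(M)
    for idx in np.ndindex(*M.shape):
        E = np.zeros_like(M)
        E[idx] = eps
        G[idx] = (f(M + E) - f(M - E)) / (2.0 * eps)
    return G
```

A gradient step on the dictionary would subtract the step size times these terms from the corresponding sub-dictionaries, in addition to the contribution that flows through the sparse codes.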

The Proposed Ship Detection Method
Ship detection can be viewed as a binary classification problem. Conventional methods generally perform ship detection in an unsupervised way. Supervised methods, typically deep neural network methods, offer a new perspective on ship detection. Thus, we propose a novel ship detection method via TDDDL. Figure 1 shows the main framework of the proposed ship detection method. Given a PolSAR image, superpixel segmentation is performed after filtering with a boxcar filter. The superpixel is employed as the basic processing cell. Based on the superpixel segmentation result, we train a task-driven discriminative dictionary and a linear classifier jointly via TDDDL. Then, we encode the superpixels with the learnt dictionary. Finally, we achieve ship detection with binary classification.

For the sake of completeness and readability of this article, we briefly introduce the PolSAR image data here. Considering a reciprocal target illuminated by a monostatic SAR, the polarimetric information can be described by a complex scattering vector
$$\mathbf{k} = [S_{HH}, \sqrt{2}S_{HV}, S_{VV}]^T \tag{22}$$
where $S_{HH}$, $S_{HV}$ and $S_{VV}$ denote the complex scattering coefficients. The scattering vector can be multi-look processed for the purpose of speckle reduction, which can be expressed as
$$T = \frac{1}{n}\sum_{i=1}^{n} \mathbf{k}_i \mathbf{k}_i^H \tag{23}$$
where $H$ denotes the conjugate transpose operation, and $n$ is the number of looks. The resulting matrix $T$ is called the $n$-look covariance matrix. We further represent the covariance matrix $T$ as a real vector $p \in \mathbb{R}^{9 \times 1}$, i.e.,
$$p = [T_{11}, T_{22}, T_{33}, \mathrm{Re}(T_{12}), \mathrm{Im}(T_{12}), \mathrm{Re}(T_{13}), \mathrm{Im}(T_{13}), \mathrm{Re}(T_{23}), \mathrm{Im}(T_{23})]^T \tag{24}$$
which is called the pixel element in PolSAR images.
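The construction of the pixel element can be sketched in a few lines. The function names below are hypothetical, and the looks are assumed to be supplied as an array with one complex scattering vector per row.

```python
import numpy as np

def scattering_vector(s_hh, s_hv, s_vv):
    """Complex scattering vector [S_HH, sqrt(2)*S_HV, S_VV] of a reciprocal target."""
    return np.array([s_hh, np.sqrt(2.0) * s_hv, s_vv], dtype=complex)

def multilook_covariance(looks):
    """n-look covariance matrix: the average of k_i k_i^H over the n looks.

    looks: complex array of shape (n, 3), one scattering vector per row.
    """
    looks = np.asarray(looks, dtype=complex)
    return (looks.T @ looks.conj()) / looks.shape[0]

def covariance_to_pixel_element(T):
    """Real 9-vector: three diagonal terms, then Re/Im of T12, T13 and T23."""
    return np.array([
        T[0, 0].real, T[1, 1].real, T[2, 2].real,
        T[0, 1].real, T[0, 1].imag,
        T[0, 2].real, T[0, 2].imag,
        T[1, 2].real, T[1, 2].imag,
    ])
```

Because the covariance matrix is Hermitian, its diagonal is real and the three upper off-diagonal entries determine the lower ones, so the nine real numbers carry all of its information.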

Superpixel Segmentation
Pixel-based methods utilize single-pixel information only, and not the characteristics of local regions. With the improved resolution of PolSAR images, the ship target regions show detailed structure and texture. We can expect that considering the cues of the region a pixel belongs to will benefit the decision for this pixel, since they may reveal the regional structural or textural difference between the target and clutter. Recent research has also shown that superpixels can help to retain the target outline and suppress speckle noise in target detection tasks [27,28]. In addition, since neighboring pixels often share the same label with high probability, the joint sparsity prior can be applied naturally at the superpixel level.
Superpixel segmentation methods designed for optical images cannot be applied directly to PolSAR images due to the influence of strong speckle noise. In this article, we use the simple linear iterative clustering method with boundary constraints (SLIC-BC), which was proposed by Lin et al. [29]. SLIC-BC is an adaptation of SLIC [30] with two modifications:
(1) A new distance measure is proposed, providing control over boundary adherence, homogeneity and compactness of the superpixels simultaneously.
(2) A new strategy to update the positions and intensities of superpixel seeds is proposed. Only reliable pixels within one superpixel can be used to update the superpixel seed.
We give a brief introduction to SLIC-BC here; more implementation details can be found in Lin et al. [29].
The distance measurement in SLIC-BC consists of three parts: a boundary term, a homogeneity term and a compactness term. It is defined as where $d(x, l)$ denotes the distance between pixel $x$ and the $l$th superpixel, and $d_b(x, l)$, $d_h(x, l)$ and $d_c(x, l)$ denote the boundary term, homogeneity term and compactness term of the measurement, respectively. The parameters $w_b$, $w_h$ and $w_c$ denote the weight coefficients, which are defined as $(2(d_b(x, l) + d_h(x, l) + d_c(x, l)))$, and $\alpha$ is a parameter to flexibly control the compactness of the resulting superpixels. The boundary term $d_b(x, l)$ is defined with the probability of a pixel lying on an object boundary where $C_{win}(x)$ is the collection of pixels in a $win \times win$ window with pixel $x$ as its center, $|C_{win}(x)|$ denotes the number of pixels in $C_{win}(x)$, and $g(x_i)$ denotes the gradient at pixel $x_i$. The homogeneity term $d_h(x, l)$ is defined with the Wishart distance where $\mathrm{Tr}(\cdot)$ denotes the trace of a matrix, and $Z_x$ and $C_l$ are the covariance matrices of pixel $x$ and the $l$th superpixel seed, respectively. The compactness term is defined with the Euclidean distance where $(r_x, c_x)$ and $(r_l, c_l)$ denote the coordinates of pixel $x$ and the $l$th superpixel seed in the xy plane, respectively.
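Two of the three terms can be illustrated with short helper functions. The sketch below assumes the commonly used Wishart distance $\ln|C| + \mathrm{Tr}(C^{-1}Z)$ and an unnormalized Euclidean distance between pixel and seed coordinates. The exact forms and normalizations used in SLIC-BC are given in Lin et al. [29] and may differ; the function names are hypothetical.

```python
import numpy as np

def wishart_distance(Z, C):
    """Assumed Wishart distance ln|C| + Tr(C^{-1} Z) between a pixel covariance Z
    and a seed covariance C (both Hermitian positive definite)."""
    _, logdet = np.linalg.slogdet(C)
    # Solve C X = Z instead of forming the explicit inverse of C.
    return float(logdet + np.trace(np.linalg.solve(C, Z)).real)

def spatial_distance(rc_pixel, rc_seed):
    """Euclidean distance between (row, col) coordinates of a pixel and a seed."""
    return float(np.hypot(rc_pixel[0] - rc_seed[0], rc_pixel[1] - rc_seed[1]))
```

A pixel whose covariance matches a seed closely yields a small homogeneity term, while the spatial term grows with the distance to the seed and therefore favors compact superpixels.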
In the updating strategy in SLIC-BC, the Wishart distribution is used to measure the reliability of pixels. Only reliable pixels are used to update the positions and intensities of the superpixel seeds. The positions and intensities of the superpixel seeds are updated by where $i$ is the number of iterations, $L(x)$ returns the superpixel label of pixel $x$, and $\nu$ is the reliability threshold.
Since we require target-containing superpixels with fewer background pixels, a relatively small superpixel size is preferred. In practice, the desired superpixel size is a key parameter determining the average size of superpixels. In this paper, we set the desired superpixel size to be half the ship target size.

Learning Dictionary with TDDDL
After obtaining the superpixel result, we train a dictionary and a linear classifier via TDDDL. In the training data, we denote the ship superpixel as $X^s = [x^s_1, x^s_2, \ldots, x^s_{T_s}] \in \mathbb{R}^{9 \times T_s}$, and the clutter superpixel as $X^c = [x^c_1, x^c_2, \ldots, x^c_{T_c}] \in \mathbb{R}^{9 \times T_c}$, where $T_s$ and $T_c$ are the corresponding superpixel sizes. The dictionary $D = [D^s, D^c] \in \mathbb{R}^{9 \times P}$ consists of two sub-dictionaries, where $D^s \in \mathbb{R}^{9 \times P_s}$ and $D^c \in \mathbb{R}^{9 \times P_c}$ are the sub-dictionaries corresponding to the ship and clutter, respectively. We initialize the dictionary $D$ by learning category-specific dictionaries in an unsupervised way. Concretely, we compute the category-specific sub-dictionaries by solving and min where $D^s_j$ and $D^c_j$ are the $j$th columns of $D^s$ and $D^c$, respectively. The initial classifier parameter matrix $W$ is determined based on the label information $Y$ and the initial dictionary $D$. We have where $Y^s$ and $Y^c$ denote the labels of the ship superpixels and clutter superpixels, and $[A^s, A^c]$ is the solution of Equation (7) obtained by substituting $[X^s, X^c]$ for $X$. With the initialization results of $D$ and $W$ above, we complete dictionary learning via Algorithm 1 and obtain the learnt dictionary $D$ and classifier $W$.
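One common way to initialize a linear classifier from sparse codes and labels is a ridge regression, which has a closed-form solution. The sketch below assumes that the initialization minimizes $\|Y - WA\|_F^2 + \mu\|W\|_F^2$; whether the authors use exactly this form is not confirmed by the text. The function name `init_classifier` is hypothetical.

```python
import numpy as np

def init_classifier(Y, A, mu):
    """Ridge-regression initialization W = Y A^T (A A^T + mu I)^{-1}.

    Y: (k, N) label matrix, A: (P, N) sparse codes, mu: regularization weight.
    """
    P = A.shape[0]
    gram = A @ A.T + mu * np.eye(P)          # symmetric positive definite for mu > 0
    # Solve gram @ W^T = A @ Y^T rather than inverting gram explicitly.
    return np.linalg.solve(gram, A @ Y.T).T
```

At the returned $W$ the gradient $(WA - Y)A^T + \mu W$ vanishes, which gives a simple way to verify an implementation.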

Encoding with Learnt Dictionary
With the learnt dictionary $D$, we encode an unlabeled superpixel $X^u = [x^u_1, x^u_2, \ldots, x^u_{T_u}] \in \mathbb{R}^{9 \times T_u}$ by solving the following problem:
$$A^u = \arg\min_{Z \in \mathbb{R}^{P \times T_u}} \frac{1}{2}\|X^u - DZ\|_F^2 + \lambda\|Z\|_{1,2}$$
The resulting feature $A^u$ is the sparse code of the sample $X^u$ on the learnt dictionary $D$. It is constructed in the virtual dictionary domain, while conventional features are constructed in the original image domain. Features in the image domain usually do not achieve good enough results for harsh detection tasks, because the ship and clutter are not sufficiently different in the image domain. However, the difference can be amplified in the transform domain. Moreover, the features in most previous methods are designed artificially, while the feature in the proposed method is learnt from data and is thus more adaptive. The learnt feature includes more information about the data and can reveal the polarimetric structure difference between the ship and the clutter. In addition, the proposed method performs feature extraction and threshold determination jointly. It has been shown that learning the feature and the threshold jointly is significantly better than learning them separately [26,31]. Therefore, the feature learnt in the proposed method offers, in principle, significant advantages over those of previous methods.

Binary Classification
Once the feature $A^u$ is obtained, we identify the label of each pixel of $X^u$ based on the following rule where $x^u_i$ denotes the $i$th pixel element of superpixel $X^u$, $\alpha^u_i$ denotes the $i$th column of $A^u$, and the function $\mathrm{sign}(\cdot)$ returns the sign of a real number. Since pixels within a superpixel have similar characteristics such as intensity, texture and polarimetric structure, we identify the label of superpixel $X^u$ based on the pixel labels:
$$\mathrm{identity}(X^u) = \mathrm{sign}\left( \sum_{i=1}^{T_u} \mathrm{identity}(x^u_i) \right)$$
Finally, we obtain the binary ship detection result.
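The two-stage decision (pixel labels first, then a superpixel label from the pixel labels) can be sketched as a majority vote. The sketch assumes that each pixel receives a scalar score from a weight vector `w` applied to its sparse code, with +1 for ship and -1 for clutter. The weight vector, the label convention and the tie-breaking rule are assumptions made for illustration, not details taken from the paper.

```python
import numpy as np

def pixel_labels(w, A_u):
    """Label each pixel by the sign of its score w^T alpha_i (+1 ship, -1 clutter).

    w: (P,) hypothetical weight vector, A_u: (P, T_u) sparse codes of one superpixel.
    """
    scores = w @ A_u                       # one score per pixel (column of A_u)
    return np.where(scores >= 0, 1, -1)    # a zero score is counted as +1 here

def superpixel_label(w, A_u):
    """Majority vote over the pixel labels; ties are resolved as +1 (assumption)."""
    votes = int(pixel_labels(w, A_u).sum())
    return 1 if votes >= 0 else -1
```

For example, a superpixel with two ship-like pixels and one clutter-like pixel is labeled as ship.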

Experiments and Discussions
C-band RADARSAT-2 polarimetric SLC SAR images acquired over the Tanggu port area and the Dalian port area are used for the experiments. The parameters of the images are tabulated in Table 1. The intensity images of the single polarimetric channels, the Pauli vector color-coded images and the geographic locations are shown in Figures 2 and 3. The areas R1 and R3 in Figures 2e and 3e are used for testing, with training areas selected from R2 and R4, respectively. The ground truth of the testing areas R1 and R3 is shown on the Pauli vector color-coded images in Figure 4. The strong (group S) and weak (group W) targets are marked with green rectangles and yellow circles, respectively. The ground truth definition is based on the previous work by Song et al. [31] and He et al. [32]. Strong targets are easy to detect and weak targets are difficult to detect, and in real-scene SAR images strong and weak targets often appear together. In order to compare detection performance, especially on weak targets, we separate the targets into strong (group S) and weak (group W) targets according to the relative magnitude of the target average intensity and the average intensity of its surrounding clutter. If the target average intensity is higher than the average intensity of its surrounding clutter, the target is grouped as a strong target; otherwise, it is grouped as a weak target. The surrounding clutter pixels are chosen in a window centered on the target, with the window size set to twice the average target size. Adjacent target pixels in the window are excluded as outliers from the clutter estimation based on the ground truth.
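The grouping rule can be sketched as follows. The function name, the use of boolean masks for the ground truth, and the square window centered on the target centroid are assumptions made for this illustration.

```python
import numpy as np

def group_target(intensity, target_mask, all_targets_mask, window):
    """Return "S" if the target is brighter on average than its surrounding
    clutter, else "W".

    intensity: 2-D intensity image; target_mask: boolean mask of this target;
    all_targets_mask: boolean mask of every ground-truth target pixel;
    window: side length of the square clutter window (e.g. twice the target size).
    """
    rows, cols = np.nonzero(target_mask)
    r0, c0 = int(round(rows.mean())), int(round(cols.mean()))
    half = window // 2
    win = np.zeros_like(target_mask, dtype=bool)
    win[max(r0 - half, 0):r0 + half + 1, max(c0 - half, 0):c0 + half + 1] = True
    clutter = win & ~all_targets_mask      # exclude this and adjacent targets
    target_mean = intensity[target_mask].mean()
    clutter_mean = intensity[clutter].mean()
    return "S" if target_mean > clutter_mean else "W"
```

The sketch assumes that the window contains at least one clutter pixel; otherwise the clutter mean is undefined.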
In the following, we first describe the parameter setting of the proposed method, and then give the performance evaluation on synthetic data. Finally, the performance evaluation on real-scene data is also presented. We compare the proposed method with the iterative censoring CFAR (IC-CFAR) detector [10], Variational Bayesian Inference (VBI) [31], the superpixel-level local information measurement (SLIM) detector [32], and the contextual region-based convolutional neural network with multilayer fusion (CRCNN-MF) [15].

Parameter Setting
The parameters of the comparative methods are set up according to the original papers.For instance, all the hyperparameters involved in the VBI method are set in noninformative manner to reduce their impact on the estimation of posterior distributions.Thus, both hyperparameters β 1 β 1 β 1 and β 2 in VBI are set to be 10 −6 .For the IC-CFAR method, the confidence level of a pixel being target is set as 0.02 [10].Thus, an index matrix can be obtained to label whether each pixel of the image is a potential target pixel or not.More parameter setting details can be found in [9,15,31,32].
The parameters in the proposed method include the parameters of superpixel segmentation and the parameters of TDDDL. The choice of the parameter α in superpixel segmentation depends largely on the application. When α is large, spatial proximity is more important and the resulting superpixels are more compact. When α is small, the resulting superpixels adhere more tightly to image boundaries, but have a less regular shape. For PolSAR images, α can be in the range [0.5, 20]. In this paper, we prefer superpixels with better boundary adherence. Thus, a smaller α should be employed. Figure 5 shows the superpixel segmentation results with different α. We find that boundary adherence is not guaranteed when α = 3.0, and the boundary is too complicated when α = 0.7. Empirically, the range [0.8, 1.5] is preferred. In this paper, we set the parameter α to 1.0.
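The trade-off controlled by α can be illustrated with a small sketch. The distance below is a generic SLIC-style combination of a feature term and an α-weighted, grid-normalized spatial term. It is an assumption made for illustration, not necessarily the distance used by the superpixel method in this paper, and the names `combined_distance`, `assign`, and `GRID_STEP` are hypothetical.

```python
import math

def combined_distance(feature_dist, spatial_dist, grid_step, alpha):
    """Generic SLIC-style distance (assumed form): the spatial term is
    normalized by the superpixel grid step and weighted by alpha."""
    return math.sqrt(feature_dist ** 2 + (alpha * spatial_dist / grid_step) ** 2)

def assign(candidates, grid_step, alpha):
    """Return the name of the candidate centre with the smallest combined distance."""
    best = min(
        candidates,
        key=lambda c: combined_distance(
            c["feature_dist"], c["spatial_dist"], grid_step, alpha
        ),
    )
    return best["name"]

# One pixel and two candidate superpixel centres (illustrative numbers only):
# one is spatially close but dissimilar, the other is farther but similar.
candidates = [
    {"name": "near_but_dissimilar", "feature_dist": 2.0, "spatial_dist": 3.0},
    {"name": "far_but_similar", "feature_dist": 0.5, "spatial_dist": 9.0},
]
GRID_STEP = 10.0  # hypothetical superpixel grid spacing in pixels

small_alpha_choice = assign(candidates, GRID_STEP, alpha=1.0)
large_alpha_choice = assign(candidates, GRID_STEP, alpha=20.0)
```

With the small α the pixel joins the similar but more distant centre, which favours boundary adherence; with the large α it joins the nearest centre, which favours compact superpixels.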
For the parameters in TDDDL, we use a few simple heuristics to reduce the search space, which are used in many dictionary learning methods [16,33,34]. The regularization parameter µ is fixed to $10^{-3}$ [34]. We try the parameters λ = 0.35 + 0.05j, with j ∈ {−3, −2, ..., 2, 3}. The candidate values of η are {0, 0.05, ..., 0.25, 0.3}. The detection performance versus the regularization parameters λ and η is shown in Figure 6. Based on these results, we obtain the optimal parameters in TDDDL. The parameters used in the proposed method are listed in Table 2.
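The search space described above is small enough to enumerate explicitly. The following sketch builds the candidate grid for λ and η and runs a plain grid search. The function `grid_search` and the scoring function `toy_score` are hypothetical stand-ins; in practice the score would come from training TDDDL and measuring detection performance, and the toy optimum below says nothing about the values actually selected in Table 2.

```python
from itertools import product

MU = 1e-3  # regularization parameter mu, fixed as stated in the text

# lambda = 0.35 + 0.05 * j for j in {-3, ..., 3}; rounding removes float noise.
lambdas = [round(0.35 + 0.05 * j, 2) for j in range(-3, 4)]
# eta in {0, 0.05, ..., 0.25, 0.3}.
etas = [round(0.05 * k, 2) for k in range(7)]

grid = list(product(lambdas, etas))  # all (lambda, eta) candidate pairs

def grid_search(score_fn):
    """Return the (lambda, eta) pair that maximizes score_fn(lambda, eta, mu)."""
    return max(grid, key=lambda pair: score_fn(pair[0], pair[1], MU))

def toy_score(lam, eta, mu):
    """Illustrative stand-in for a detection score; NOT the paper's criterion."""
    return -((lam - 0.35) ** 2 + (eta - 0.1) ** 2)

best_pair = grid_search(toy_score)
```

The grid contains 7 × 7 = 49 candidate pairs, so an exhaustive search stays cheap even when each evaluation requires retraining the dictionary.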

Performance Evaluation on Synthetic Data
First, we evaluate the performance of the test methods quantitatively using synthetic data. In the synthetic data, sea clutter is modeled by the K distribution, with the SCR ranging from 0 dB to 12 dB. Figure 7 shows the span images of the synthetic data and the corresponding ground truth image, where white pixels denote target pixels. To evaluate the ship detection results quantitatively, the detection probability $P_d$ and the figure of merit FoM are defined as follows:

$$P_d = \frac{n_{dt}}{n_{target}}, \qquad \mathrm{FoM} = \frac{n_{dt}}{n_{dc} + n_{target}},$$

where $n_{target}$ denotes the total number of true target pixels, and $n_{dt}$ and $n_{dc}$ are the number of correctly detected target pixels and the number of clutter pixels detected as target pixels, respectively. A higher detection probability and figure of merit imply a better detection method. Figure 8 presents the detection performance of the five test methods under different SCR conditions. All the test methods show satisfactory detection probability at relatively high SCR, while the proposed method and SLIM achieve a higher detection probability at low SCR. The figure of merit of the proposed method is higher than those of the comparative methods, which implies that the proposed method produces fewer false alarms. Therefore, we conclude that the proposed method outperforms the comparative methods on synthetic data.
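As a worked illustration of the two metrics, the sketch below computes them from a binary ground-truth mask and a binary detection mask. It assumes the commonly used pixel-level definitions $P_d = n_{dt}/n_{target}$ and $\mathrm{FoM} = n_{dt}/(n_{dc} + n_{target})$. The function name `detection_metrics` and the toy masks are hypothetical.

```python
def detection_metrics(truth, detected):
    """Compute (P_d, FoM) from two equally sized 2-D lists of 0/1 values.

    Assumed definitions: P_d = n_dt / n_target and
    FoM = n_dt / (n_dc + n_target).
    """
    n_target = n_dt = n_dc = 0
    for truth_row, det_row in zip(truth, detected):
        for t, d in zip(truth_row, det_row):
            if t:
                n_target += 1      # true target pixel
                if d:
                    n_dt += 1      # correctly detected target pixel
            elif d:
                n_dc += 1          # clutter pixel detected as target
    if n_target == 0:
        raise ValueError("ground truth contains no target pixels")
    return n_dt / n_target, n_dt / (n_dc + n_target)

# Toy masks: 4 target pixels, 3 of them detected, plus 1 false alarm.
truth = [
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
detected = [
    [0, 1, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 0, 0],
]
p_d, fom = detection_metrics(truth, detected)
```

Here $P_d = 3/4$ and $\mathrm{FoM} = 3/5$: the single false alarm lowers the figure of merit but leaves the detection probability unchanged, which is why the two metrics are reported together.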

Performance Evaluation on Real-Scene Data
Two RADARSAT-2 images acquired over Tanggu port and Dalian port are used for real-scene validation. The numbers of ships in the different groups are tabulated in Table 3. Table 4 and Figures 9-13 report the ship detection results of the proposed method and the comparative methods. In Table 4, $N_{dt} = N_{dt}^{S} + N_{dt}^{W}$ denotes the total number of detected ships, where $N_{dt}^{S}$ and $N_{dt}^{W}$ denote the numbers of detected ships belonging to group S and group W, respectively. The number of false alarms is denoted as $N_{dc}$. In Figures 9-13, the false alarms and missed targets are marked with red circles and white rectangles, respectively. In Figure 9a, over the Tanggu port area, the proposed method detects all the ship targets, including 35 strong targets and 11 weak targets, with only 1 false alarm. In Figure 9b, over the Dalian port area, the proposed method detects all 35 strong targets and 4 weak targets, with no false alarms. All comparative methods detect almost all the strong targets. The performance gaps lie in false alarms and weak target detection. For the Tanggu port area, the IC-CFAR detector detects 2 weak targets with 11 false alarms, and CRCNN-MF detects 5 weak targets with 9 false alarms. VBI and SLIM perform better than the IC-CFAR detector and CRCNN-MF on the Tanggu port image: VBI detects 6 weak targets with 5 false alarms, and SLIM detects 10 weak targets with 4 false alarms. For the Dalian port area, the proposed method, VBI and CRCNN-MF outperform the IC-CFAR detector and SLIM, producing fewer false alarms. In summary, the proposed method achieves the highest detection accuracy and the lowest false alarm rate. Therefore, we conclude that the proposed method is superior to the comparative methods.

Besides detection accuracy and false alarms, the shape preservation of the detection results is another issue. First, the detected targets should be complete. Moreover, the detected targets should not contain redundant pixels. In the detection results, we mark the broken detected targets with blue rectangles. Comparing the results in Figures 9-13, we can see that the proposed method, SLIM and CRCNN-MF produce far fewer broken detected targets than the other two methods. However, SLIM and CRCNN-MF tend to detect the sea clutter pixels around the ships that are influenced by the ships, so the targets detected by SLIM and CRCNN-MF are wider than those produced by the proposed method, the IC-CFAR detector and VBI. The proposed method mainly captures the dominant scatterers on ships and maintains the best shape-preserving ability.

Table 5. Time consumption of different methods.

Computational Cost
The proposed approach and the comparative methods are implemented in MATLAB. All the experiments are performed in MATLAB 2017b on a 64-bit Windows system with an Intel Core i7 8700 processor and 16 GB of RAM. The time consumption of the different methods is listed in Table 5. The proposed method has a testing time comparable to that of CRCNN-MF, while its training time is greatly reduced. VBI is much slower than the other methods. The main reason may be the special structure of the algorithm, in which the updates of the latent variables' expectations are highly coupled. The IC-CFAR detector is the fastest method, mainly because its initial detector is applied to the entire cross-polarization image without a sliding window. In SLIM, the sliding window is applied at the superpixel level for fast processing, but the multiscale superpixel segmentation and local information computation are time consuming. Thus, its computational complexity is moderate. In summary, the proposed method maintains low computational complexity while achieving satisfactory detection probability and figure of merit, as well as few false alarms and good shape-preserving ability.

Figure 1. The flowchart of the proposed ship detection method. The blue box is the input, the red boxes are the key processing modules, and the green boxes are the outputs.

Figure 3. RADARSAT-2 PolSAR image over the Dalian port area. (a-d) Intensity images of the HH, HV, VH and VV channels. (e) Pauli vector color-coded image, using |HH − VV|, |HV|, and |HH + VV| as red, green, and blue, respectively. The areas R3 and R4 are used for training and testing, respectively. (f) The geographic location of the Dalian port area.

Figure 4. RADARSAT-2 PolSAR images for testing. Pauli vector color-coded images over (a) R2 and (b) R4 areas. In each image, the strong (group S) and weak (group W) targets are marked with green rectangles and yellow circles, respectively.

Figure 8. Performance evaluation on synthetic data. (a) The detection probability. (b) The figure of merit.

Figure 11. Ship detection results of VBI. (a) Tanggu port. (b) Dalian port. The false alarms, missed targets and broken detected targets are marked with red circles, white rectangles and blue rectangles, respectively.

Figure 12. Ship detection results of SLIM. (a) Tanggu port. (b) Dalian port. The false alarms, missed targets and broken detected targets are marked with red circles, white rectangles and blue rectangles, respectively.

Figure 13. Ship detection results of CRCNN-MF. (a) Tanggu port. (b) Dalian port. The false alarms and missed targets are marked with red circles and white rectangles, respectively. There are no broken detected targets.

Table 1. The parameters of the PolSAR images.

Table 2. The parameters used in the proposed method.

Table 3. Ground truth of ships.

Table 4. Performance comparison of the test methods. $N_{dt}^{S}$ and $N_{dt}^{W}$ denote the numbers of detected ships belonging to group S and group W, respectively. $N_{dt}$ denotes the total number of detected ships, and $N_{dc}$ denotes the number of false alarms.