Adaptive Max-Margin One-Class Classifier for SAR Target Discrimination in Complex Scenes

Abstract: Synthetic aperture radar (SAR) target discrimination is an important stage that distinguishes targets from clutters in the radar automatic target recognition field. However, in complex SAR scenes, the performance of some traditional discriminators degrades. As an effective tool for one-class classification (OCC), the max-margin one-class classifier has attracted much attention for SAR target discrimination, as it can effectively reduce the impact of multiple clutters. However, the performance of the max-margin one-class classifier is very sensitive to the values of the kernel parameters. To solve this problem, this paper proposes an adaptive max-margin one-class classifier for SAR target discrimination in complex scenes. In a max-margin one-class classifier with a suitable kernel parameter, the distance between a sample and the classification boundary satisfies a certain geometric relationship: edge samples in the input space are transformed to the region of the kernel space close to the boundary, while interior samples in the input space are transformed to the region of the kernel space far away from the boundary. Therefore, we define the information entropy of samples in the kernel space to measure the distance between samples and the classification boundary. To automatically obtain the optimal kernel parameter of the max-margin one-class classifier, the edge and interior samples in the input space are first selected, and the parameter is then optimized by minimizing the information entropy of the interior samples while simultaneously maximizing the information entropy of the edge samples. Experimental results on synthetic datasets and measured SAR datasets validate the effectiveness of our method.


Introduction
The development of synthetic aperture radar (SAR) imaging technology has drawn great attention to the field of SAR automatic target recognition (ATR) [1,2]. The SAR ATR system usually contains three basic stages [3,4]: detection [5][6][7], discrimination [8][9][10], and recognition [11,12]. The target detection stage aims to locate the targets of interest and obtain the candidate target results, which contain the true targets and some clutters. The main task of target discrimination is to remove the false-alarm clutters from the candidate target results and reduce the burden of the recognition stage, which identifies the target type. As the second stage of SAR ATR, target discrimination plays an important role in SAR ATR systems and receives much attention in the remote sensing image processing field.
Many target discrimination methods have been developed; traditional methods mainly focus on discriminative feature extraction. In [8], Wang et al. propose a superpixel-level target discrimination method that uses a multilevel and multidomain feature descriptor to obtain discriminative features. Wang et al. [9] extract local SAR-SIFT features that are then encoded to improve category-specific performance. Moreover, Li et al. [10] develop a discrimination method by extracting the scattering center features of SAR images, which can effectively identify the targets from clutters. Although these methods [8][9][10] perform well on SAR target discrimination, they ignore the design of discriminators, which may restrict their discrimination performance in complex scenes.
Several classifiers have been proposed to solve one-class classification (OCC) problems and have been applied to discrimination. They can be categorized into three groups: statistical methods, reconstruction-based methods, and boundary-based methods. In statistical methods, the probability density function (PDF) [13] of the target samples is first estimated, and then a predefined threshold determines whether test samples are generated from this distribution. Reconstruction-based methods, such as the Auto-encoder (AE) and variational AE [14], learn a representation model by minimizing the reconstruction errors of the training samples from the target class; the reconstruction errors of test samples are then used to judge whether they belong to the target class. In boundary-based methods [15,16], a boundary is constructed with training samples only from the target class to determine the region where the target samples are located. The most well-known boundary-based method is the max-margin one-class classifier, i.e., the one-class support vector machine (OCSVM). As discussed above, statistical methods are simple and are carried out by estimating the probability density of the target sample distribution, but they rely on a large number of training samples to estimate a precise probability density, especially when the dimension of the training data is high. In addition, reconstruction-based methods are effective and explore representative features for one-class classification, but they also need sufficient training samples to learn a suitable model for the target samples. In OCSVM, the kernel transformation handles nonlinear data easily, the slack terms enable good generalization, and the sparse support vectors save much storage space; thus, OCSVM has gained significant attention for solving the OCC problem. However, the performance of the OCSVM is very sensitive to the values of the kernel parameters.
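As a quick illustration of this sensitivity, the following sketch (using scikit-learn's OneClassSVM rather than any implementation from this paper; the toy 2D dataset and the gamma grid are made-up assumptions) shows how the same training data yield very different accept/reject behavior as the Gaussian kernel width changes:

```python
# Sketch (not the paper's method): how OCSVM behavior varies with the
# Gaussian kernel width. Dataset and gamma grid are made-up assumptions.
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
targets = rng.normal(0.0, 1.0, size=(200, 2))         # target-class training samples (toy)
test_targets = rng.normal(0.0, 1.0, size=(100, 2))
test_clutter = rng.uniform(-6.0, 6.0, size=(100, 2))  # clutter spread over a wide area

for gamma in (0.01, 0.5, 50.0):                       # gamma = 1 / (2 * sigma**2)
    clf = OneClassSVM(kernel="rbf", gamma=gamma, nu=0.1).fit(targets)
    recall = float(np.mean(clf.predict(test_targets) == 1))      # targets accepted
    rejection = float(np.mean(clf.predict(test_clutter) == -1))  # clutter rejected
    print(f"gamma={gamma:6.2f}  target recall={recall:.2f}  clutter rejection={rejection:.2f}")
```

A too-small gamma (large σ) yields a loose boundary that accepts clutter, while a too-large gamma (small σ) overfits the training points and rejects genuine targets.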
Recently, several methods [17][18][19] have been developed to select suitable kernel parameters for OCSVM. First, a kernel parameter candidate set is predefined, and the value of the objective function is computed for each element in the candidate set. Next, the optimal kernel parameter is selected based on the minimum/maximum objective function values. In [17], Deng et al. propose a method referred to as SKEW for OCSVM based on the false alarm rate (FAR) and missing alarm rate (MAR). Wang et al. [18] introduce a MinSV+MaxL method for SVDD, which computes the objective function value V_L and the proportion V_S of support vectors among the training samples for each element in the parameter candidate set; the optimal parameter is determined by the maximum difference between adjacent values of V_L and V_S. Xiao et al. [19] put forward the MIES method for OCSVM based on the distance between the sample and the classification boundary. Although these methods [17][18][19] can select a suitable Gaussian kernel parameter, they suffer from two main challenges: (1) it is difficult to predefine the kernel parameter candidate set in the range (0, +∞); (2) the computational burden of these methods is large, since the objective function is evaluated for every parameter in the candidate set, especially when the candidate set contains many elements. Consequently, these kernel parameter selection methods still restrict the performance of the one-class classifier.
To address the above issues, this paper develops an adaptive max-margin one-class classifier by automatically obtaining the optimal kernel parameter, which is adaptive to the complex scenes of SAR images. The motivations of our method are as follows: (1) An adaptive max-margin one-class classifier is developed for SAR target discrimination in complex scenes, in which a suitable kernel parameter of the max-margin one-class classifier is learned based on the geometric relationship between the sample and the classification boundary, without a parameter candidate set. In this way, the proposed method can not only achieve promising discrimination performance, but also avoid the difficulty of determining the parameter candidate set and reduce the computational cost in the training stage. In detail, for the max-margin one-class classifier, the training samples in the input space are mapped to the kernel space via the kernel transformation. Then, the classification boundary is constructed in the kernel space via the samples that are closest to the classification boundary, i.e., the support vectors (SVs). As discussed in [20], with a suitable kernel parameter, the max-margin one-class classifier ensures that the edge samples in the input space are transformed to the region of the kernel space close to the classification boundary, where they are more likely to become SVs, and the interior samples in the input space are transformed to the region of the kernel space far away from the boundary, where they are unlikely to become SVs. Thus, an optimal kernel parameter for the max-margin one-class classifier can be adaptively obtained based on the above geometric relationship. (2) We define the information entropy of samples as the objective function of our method, which measures the distance between a sample and the classification boundary in the kernel space and can be automatically optimized by the gradient descent algorithm.
Specifically, the larger the entropy value of a sample, the closer the sample is to the classification boundary. The optimal kernel parameter can be learned by maximizing the information entropy of the edge samples and simultaneously minimizing the information entropy of the interior samples. In this way, the optimal kernel parameter ensures that the edge samples in the input space are projected to the area of the kernel space close to the classification boundary, while the interior samples in the input space are projected to the area of the kernel space far away from the classification boundary. Based on the above criterion, our method can obtain the optimal kernel parameter, which further contributes to the promising discrimination performance.
This paper focuses on the exploration of an adaptive max-margin one-class classifier for SAR target discrimination in complex scenes, and its main contributions are summarized as follows: (1) the geometric relationship between the sample and the classification boundary is utilized to learn a suitable kernel parameter for OCSVM without a parameter candidate set, which effectively reduces the computational cost in the training stage and ensures favorable performance for SAR target discrimination; (2) the information entropy is defined for each sample to measure the distance between a sample and the classification boundary in the kernel space, and it is adopted as the objective function of our method, which can be automatically optimized by the gradient descent algorithm.
The remainder of this paper is organized as follows: a review of max-margin one-class classifiers is given in Section 2, and the proposed method is presented in Section 3. In Section 4, experimental results on synthetic datasets and measured SAR datasets are presented. Finally, Sections 5 and 6 present the discussion and conclusions, respectively.

Max-Margin One-Class Classifier
One-class SVM is a domain-based classification method that seeks the classification hyperplane bounding the target-class samples: most of the training samples are located above the hyperplane, and the distance from the origin to the hyperplane is maximized. This maximum distance from the origin to the hyperplane is called the "maximum margin", so the OCSVM is also called the max-margin one-class classifier. The 2D illustrations of OCSVM are shown in Figure 1. The objective function of OCSVM is shown in Equation (1):

min_{w, ξ, ρ} (1/2)‖w‖² + (1/(ηN)) Σ_{i=1}^{N} ξ_i − ρ,  s.t.  wᵀ x̂_i ≥ ρ − ξ_i,  ξ_i ≥ 0,  i = 1, …, N,    (1)

with w being the normal vector of the classification hyperplane, η being a tradeoff parameter, and ξ_i the slack term. Moreover, x̂_i is represented as follows:

x̂_i = φ(x_i),    (2)

where φ(·) is the feature mapping induced by the kernel function κ(·, ·). As the most widely used feature transformation, the Gaussian kernel transformation possesses the special characteristic of similarity preservation, and its value satisfies 0 < κ(·, ·) ≤ 1. The Gaussian kernel transformation maps the data in the input space onto the unit hypersphere in the first orthant of the kernel space. The target samples are transformed to the region far away from the origin, while the clutter samples are projected into the region near the origin. For a dataset {x_i}_{i=1}^{N}, the Gaussian kernel function defines the inner product of two samples in the kernel space, and thus κ(·, ·) can be further formulated as:

κ(x_i, x_j) = φ(x_i)ᵀ φ(x_j) = exp(−‖x_i − x_j‖² / (2σ²)),    (3)

where φ(·) is the Gaussian kernel transformation without an explicit expression, and σ is the Gaussian kernel parameter. It is obvious that ‖φ(x_i)‖ = 1 for every sample. Therefore, the value of κ(x_i, x_j) measures the similarity of two samples in the kernel space. Different values of the Gaussian kernel parameter correspond to different distributions of samples in the kernel space. Considering the two extreme cases, σ → +∞ and σ → 0, we can see that κ(x_i, x_j) is very close to 1 for any pair of samples when σ → +∞; thus the cosine of the angle between two samples in the kernel space is close to 1.
In other words, all samples are mapped to the same location in the kernel space if σ → +∞. On the contrary, κ(x_i, x_j) is close to 0 for any pair of samples when σ → 0; thus the cosine of the angle between two samples in the kernel space is close to 0. Therefore, all samples are mapped to the edges of the quadrants in the kernel space if σ → 0. Since different values of the Gaussian kernel parameter correspond to different distributions of samples in the kernel space, the decision boundaries of OCSVM differ when different Gaussian kernel parameters are selected. In addition, Equation (1) can be transformed into Equation (4) via Lagrange multiplier theory:

min_α (1/2) Σ_{i=1}^{N} Σ_{j=1}^{N} α_i α_j κ(x_i, x_j),  s.t.  0 ≤ α_i ≤ 1/(ηN),  Σ_{i=1}^{N} α_i = 1.    (4)

The problem of Equation (4) can be solved with the sequential minimal optimization (SMO) algorithm [21]. Once the optimal solution α is obtained, the decision function of OCSVM is given in Equation (5):

f(x*) = Σ_{i=1}^{N} α_i κ(x_i, x*) − ρ.    (5)

The test sample x* belongs to the target class in the case of f(x*) ≥ 0; otherwise, x* belongs to the non-target class. It can be seen from Equation (5) that the parameters of the classification boundary in OCSVM are determined by the samples with coefficients α_i larger than 0, i.e., the SVs. In other words, the classification boundary of OCSVM is decided by the SVs, which is aligned with the analysis in the Introduction. According to the above analysis, this paper chooses the Gaussian kernel function for the max-margin one-class classifier.
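The two limiting cases can be checked numerically; this small sketch (the sample points are illustrative) evaluates the Gaussian kernel κ(x_i, x_j) = exp(−‖x_i − x_j‖²/(2σ²)) at very large and very small σ:

```python
# Numeric check of the two limiting cases of the Gaussian kernel
# k(xi, xj) = exp(-||xi - xj||**2 / (2 * sigma**2)) discussed above.
import numpy as np

def gaussian_kernel(xi, xj, sigma):
    return float(np.exp(-np.sum((xi - xj) ** 2) / (2.0 * sigma ** 2)))

xi, xj = np.array([0.0, 0.0]), np.array([3.0, 4.0])  # Euclidean distance 5
print(gaussian_kernel(xi, xj, sigma=1e3))   # ~1: all samples collapse to one point
print(gaussian_kernel(xi, xj, sigma=1e-2))  # ~0: samples pushed to the orthant edges
print(gaussian_kernel(xi, xi, sigma=1.0))   # exactly 1, since ||phi(x)|| = 1
```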

The Proposed Method
In this section, a detailed introduction to our method is given. In Section 3.1, the algorithm for selecting the interior and edge samples is first presented. Then, the definition of the information entropy of each sample in the kernel space is given in Section 3.2. Finally, the objective function for automatically learning the optimal kernel parameter is given in Section 3.3.

Sample Selection
First of all, our method chooses the edge and interior samples in the input space. For 2D data, we can manually select samples from a visualization of the data distribution, but manual selection is infeasible in high-dimensional space. Therefore, an algorithm that can automatically select edge and interior samples is a key step in our method.
In general, for an interior sample x_i, its nearest neighbors sit evenly on the two sides of the tangent plane passing through x_i. On the contrary, most of the nearest neighbors of an edge sample x_i sit on only one side of the tangent plane passing through x_i. Such local geometric information between the samples and their nearest neighbors can be used for the selection of interior and edge samples. We should point out that the edge sample selection idea is from article [22], while the idea of interior sample selection is further derived in this paper. To illustrate the geometric relation, Figure 2 presents the schematic of an edge sample and an interior sample, including their k-nearest neighbors, normal vectors, and tangent planes. In detail, based on the K nearest neighbors of the sample x_i, the normal vector V_i of the tangent plane passing through x_i can be approximated as follows:

V_i = Σ_{j=1}^{K} (x_ij − x_i),    (6)

where x_ij is the jth neighbor of x_i. Then, the dot products between the normal vector and the vectors from x_i to its K nearest neighbors can be computed:

θ_ij = V_iᵀ (x_ij − x_i),  j = 1, …, K.    (7)

Thus, the fraction of nonnegative dot products is calculated based on θ_ij:

l_i = |{ j : θ_ij ≥ 0 }| / K.    (8)

In Figure 2b, the nearest neighbors sit evenly on the two sides of the tangent plane for the interior sample; consequently, the value of l_i is close to 0.5. On the contrary, as shown in Figure 2a, most of the nearest neighbors of the edge sample sit on only one side of the tangent plane; therefore, the value of l_i is close to 1. In other words, the criterion for selecting the edge and interior samples is expressed as:

x_i is an edge sample if l_i ≥ 1 − γ;  x_i is an interior sample if |l_i − 0.5| ≤ η,    (9)

where γ and η are predefined parameters with small values. For the sample selection algorithm, there are three predefined parameters: K, γ, and η. When the value of η is 0, all of the nearest neighbors of a sample must sit evenly on the two sides of its tangent plane. Similarly, when the value of γ is 0, all of the nearest neighbors of a sample must be located on only one side of the tangent plane.
Such requirements for sample selection are too strict. Therefore, the requirements are loosened by setting small values for the parameters γ and η. Empirically, as discussed in reference [22], we set the range of the parameters γ and η as [0, 0.1]. As for K, its value affects the estimation accuracy of the normal vector, and a recommended value is K = 5 ln N [14], with N being the number of training samples.
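The selection procedure above can be sketched as follows; this is our reading of the normal-vector approximation and the thresholding rule described in this section, and the function name select_samples is a hypothetical one:

```python
# Sketch of the edge/interior selection rule of Section 3.1, under our
# reading of Equations (6)-(9); select_samples is a hypothetical name.
import numpy as np

def select_samples(X, K, gamma=0.05, eta=0.05):
    N = X.shape[0]
    edge, interior = [], []
    for i in range(N):
        d = np.linalg.norm(X - X[i], axis=1)
        nbrs = np.argsort(d)[1:K + 1]          # K nearest neighbors (index 0 is x_i itself)
        V = np.sum(X[nbrs] - X[i], axis=0)     # approximate normal vector (Eq. 6)
        theta = (X[nbrs] - X[i]) @ V           # dot products with neighbor vectors (Eq. 7)
        l = float(np.mean(theta >= 0))         # fraction of nonnegative products (Eq. 8)
        if l >= 1.0 - gamma:                   # neighbors on one side -> edge sample
            edge.append(i)
        elif abs(l - 0.5) <= eta:              # balanced neighbors -> interior sample
            interior.append(i)
    return edge, interior
```

Following the recommendation above, K can be set as int(5 * np.log(N)) for a training set of size N.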

Information Entropy of Samples
In the max-margin one-class classifier, the SVs are sparsely located on the decision boundary in the kernel space, while the interior samples are densely distributed inside the decision boundary. Therefore, if samples are close to the decision boundary, they are located in the low-density region, far away from most of the other samples, and more likely to become SVs [13]. On the contrary, if samples are far away from the decision boundary, they are located in the high-density region and close to most of the other samples [14].
For the samples, we can calculate the squared Euclidean distance between two samples in the kernel space as:

dis_ij = ‖φ(x_i) − φ(x_j)‖² = κ(x_i, x_i) + κ(x_j, x_j) − 2κ(x_i, x_j),    (10)

with x_i and x_j denoting two samples in the training set and κ(x_i, x_j) the kernel function. As discussed in Section 2, we choose the Gaussian kernel function for κ(x_i, x_j), and then Equation (10) can be simplified as:

dis_ij = 2(1 − κ(x_i, x_j)),    (11)

which represents the squared Euclidean distance between the samples x_i and x_j. Then, the probability of dissimilarity between x_i and x_j is defined via dis_ij:

p_ij = dis_ij / Σ_{n=1}^{N} dis_in,    (12)

with N denoting the number of samples in the training set. For a sample x_i, if it is located close to the decision boundary and far from most of the other samples x_j, most of the values {dis_ij}_{j=1}^{N} are very large and roughly equal, and thus the probabilities of dissimilarity p_ij are approximately 1/N; if x_i is far from the decision boundary and close to most of the other samples x_j, most of the values {dis_ij}_{j=1}^{N} are very small, and thus the probabilities of dissimilarity p_ij are close to 0 or 1.
Finally, the information entropy related to the kernel-space Euclidean distances is defined for the ith sample based on the probabilities of dissimilarity p_ij:

H_i = −Σ_{j=1}^{N} p_ij log₂ p_ij.    (13)

To avoid log 0 in Equation (13), we set p_ii = 1 instead of 0, so that the term p_ii log₂ p_ii = 0 does not affect the other terms p_ij log₂ p_ij (j ≠ i) in H_i. According to the property of information entropy, Equation (13) shows that, for a sample x_i close to the decision boundary, whose probabilities of dissimilarity p_ij are approximately 1/N, the entropy value H_i is very large; for a sample x_i far from the decision boundary, whose probabilities of dissimilarity p_ij are close to 0 or 1, the entropy value H_i is very small. From the above analysis, the larger the entropy value of a sample, the closer the sample is to the decision boundary. Thus, the information entropy of samples in Equation (13) can be utilized to measure the distance between the samples and the decision boundary in the kernel space.
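A minimal sketch of this entropy measure, assuming the Gaussian kernel of Section 2 (the function name sample_entropy is ours):

```python
# Sketch of the entropy measure of Equations (10)-(13), assuming the Gaussian
# kernel of Section 2; sample_entropy is a hypothetical name.
import numpy as np

def sample_entropy(X, sigma):
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    K = np.exp(-sq / (2.0 * sigma ** 2))          # Gaussian kernel matrix
    dis = 2.0 * (1.0 - K)                         # squared kernel-space distances (Eq. 11)
    p = dis / np.sum(dis, axis=1, keepdims=True)  # dissimilarity probabilities (Eq. 12)
    np.fill_diagonal(p, 1.0)                      # p_ii := 1 so that p_ii * log2(p_ii) = 0
    logs = np.log2(p, where=p > 0, out=np.zeros_like(p))
    return -np.sum(p * logs, axis=1)              # H_i (Eq. 13)
```

On a dense cloud of points, samples on the outskirts (near the boundary) receive larger H_i than samples in the high-density center, matching the analysis above.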

Objective Function of the Proposed Method
As analyzed in [19,20], for an appropriate kernel parameter, the distance between the samples and the classification boundary satisfies a certain geometric relationship for the max-margin one-class classifier, i.e., the edge samples in the input space are transformed to the region of the kernel space close to the boundary and are more likely to become SVs, while the interior samples in the input space are transformed to the region of the kernel space far away from the boundary and are unlikely to become SVs. From Section 3.2, we can see that the samples with large entropy values are close to the boundary and more likely to become SVs, while the samples with small entropy values are far away from the decision boundary and unlikely to become SVs. Therefore, for an appropriate kernel parameter, the entropy values of the edge samples are high, while the entropy values of the interior samples are low. Based on the above analysis, the optimal kernel parameter is obtained by maximizing the difference of information entropy between the edge and interior samples. The objective function of our method is shown as:

σ* = arg max_σ J(σ),  J(σ) = (1/N₁) Σ_{x_i∈C₁} H(x_i; σ) − (1/N₂) Σ_{x_i∈C₂} H(x_i; σ),    (14)

where H(x_i; σ) represents the entropy value of x_i with kernel parameter σ, C₁ and C₂ represent the sets of edge samples and interior samples, and N₁ and N₂ represent the numbers of edge samples and interior samples, respectively. By maximizing the information entropy of the edge samples and minimizing the information entropy of the interior samples, we ultimately obtain the optimal kernel parameter. The optimization of Equation (14) can be solved via the gradient descent algorithm. The gradient of Equation (14) with respect to the parameter σ, obtained by the chain rule through Equations (11)-(13), is given in Equation (15):

∂J/∂σ = (1/N₁) Σ_{x_i∈C₁} ∂H(x_i; σ)/∂σ − (1/N₂) Σ_{x_i∈C₂} ∂H(x_i; σ)/∂σ,  with  ∂H_i/∂σ = −Σ_{n=1}^{N} (log₂ p_in + 1/ln 2) ∂p_in/∂σ  and  ∂dis_ik/∂σ = −(2‖x_i − x_k‖²/σ³) exp(−‖x_i − x_k‖²/(2σ²)),    (15)

where dis_ik is the squared Euclidean distance between two samples x_i and x_k in the kernel space, and p_in is the probability of dissimilarity between x_i and x_n; the formulations of dis_ik and p_in are defined in Section 3.2, and ∂p_in/∂σ follows from Equation (12) by the quotient rule. We summarize the whole procedure of our method in Algorithm 1.
In each iteration of Algorithm 1, the kernel parameter is updated by gradient ascent, σ_new = σ_old + δ ∂J/∂σ with step size δ, and the value of the objective function is then recalculated based on Equation (14) until convergence.
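The optimization loop of Algorithm 1 can be sketched as follows; here the analytic gradient of Equation (15) is replaced by a central finite difference for brevity, and the step size delta, iteration count, and function names are illustrative assumptions:

```python
# Sketch of the optimization loop in Algorithm 1. The analytic gradient of
# Equation (15) is replaced by a central finite difference for brevity; the
# step size delta, iteration count, and function names are assumptions.
import numpy as np

def objective(sigma, H_fn, edge, interior):
    H = H_fn(sigma)                                  # entropy of every sample at this sigma
    return float(np.mean(H[edge]) - np.mean(H[interior]))  # Equation (14)

def learn_sigma(H_fn, edge, interior, sigma0=1.0, delta=0.05, iters=100, eps=1e-3):
    sigma = sigma0
    for _ in range(iters):
        grad = (objective(sigma + eps, H_fn, edge, interior)
                - objective(sigma - eps, H_fn, edge, interior)) / (2.0 * eps)
        sigma = max(sigma + delta * grad, 1e-6)      # gradient ascent on J(sigma)
    return sigma
```

Here H_fn maps a kernel parameter σ to the vector of sample entropies, and edge/interior are the index sets produced by the selection step of Section 3.1.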

Results
To validate the target discrimination performance of our method, synthetic datasets, UCI datasets, and a measured SAR dataset are used in this section. Three kernel parameter selection methods, including MIES [19], MinSV+MaxL [18], and SKEW [17], as well as a parameter learning method referred to as MD [23], are used for comparison with our method. For the parameter selection methods, the parameter candidate set is set as [0.1, 10] with an interval of 0.1. For our method, the initial kernel parameter is set as 1. Moreover, some other discriminators, including k-means clustering [24], principal component analysis (PCA) [25], minimum spanning tree (MST) [26], Self-Organizing Map (SOM) [27], Auto-encoder (AE) [14], the minimax probability machine (MPM) [28], and two-class SVM [29], are also taken as comparisons to illustrate the promising performance of our method.
In the OCC problem, the confusion matrix reflects the primary structure of the results, which is presented in Table 1.

Experiments on Synthetic Datasets

Figure 3 presents the samples of targets and clutters for two kinds of 2D synthetic datasets, which shows that the distribution of the targets differs from that of the outliers. The quantitative results are listed in Tables 2 and 3, where red bold denotes the best value and bold italic denotes the second-best result in each column. As can be seen in Figure 4, the decision boundaries of MIES and MinSV+MaxL are a little tighter than ours, so more targets fall outside the boundaries; the missing alarms are therefore more numerous and the recalls lower. In contrast, the decision boundaries of MD and SKEW are much looser, with many outliers inside the boundaries, which leads to more false alarms and lower precision. Moreover, as shown in Figure 5, for the GMM-shaped dataset, the decision boundaries of MinSV+MaxL, SKEW, and MD are loose, producing many false alarms and low precision. The quantitative results in Tables 2 and 3 also indicate the better performance of our method compared with the other methods on the toy datasets, with much higher precision, recall, F1-score, accuracy, and AUC, since our method learns a suitable kernel parameter that yields a decision boundary that is neither too tight nor too loose for the two toy datasets. To further analyze the effectiveness of our method in learning the optimal kernel parameter, we present the test AUC curves under different kernel parameters and mark the kernel parameters selected/learned by the different methods in Figure 6. As can be seen from Figure 6, our method learns the optimal kernel parameters on the curves, while the other methods select parameters either larger or smaller than the optimal solutions.
Therefore, the toy dataset results validate that our method can learn the optimal kernel parameter for the max-margin one-class classifier, which further helps it learn suitable decision boundaries and achieve promising target discrimination performance.
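The evaluation criteria reported in this section (except AUC, which requires a score ranking) are computed from the confusion-matrix entries of Table 1; a small sketch with made-up counts:

```python
# Precision, recall, F1-score, and accuracy from the Table 1 confusion-matrix
# entries; the counts below are made-up examples.
def occ_metrics(tp, fp, fn, tn):
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)                      # 1 - missing-alarm rate
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return precision, recall, f1, accuracy

print(occ_metrics(tp=90, fp=5, fn=10, tn=95))
```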

Experiments on Measured SAR Dataset
In the following, a measured SAR dataset is utilized to verify the effectiveness of our method. In the field of automatic target recognition (ATR), the OCC task for SAR images is usually referred to as SAR target discrimination. The measured SAR dataset used here is the MiniSAR dataset [30][31][32], which was collected by Sandia National Laboratories, Albuquerque, NM, USA, in 2005. The resolution of the images in the MiniSAR dataset is 0.1 m, and their size is 1638 × 2501. The MiniSAR dataset contains 20 images, from which we choose 4: 3 images for training and 1 image for testing. In Figure 7, we present the four chosen images, whose scenes are very complex. There are numerous objects in the four images, including cars, trees, buildings, grasslands, concrete grounds, roads, vegetation, a golf course, a baseball field, and so on. Among these objects, the cars are the targets of interest, and the other objects are regarded as clutters. With the visual attention-based target detection algorithm [18], chips measuring 100 × 100 are obtained from the SAR images of the MiniSAR dataset. Table 4 presents the detection results for the four SAR images in the MiniSAR data, and some chips from the MiniSAR dataset are given in Figure 8, where the target samples are shown in the first row and the clutter samples in the second row. Since only target chips are used in the training stage, the training dataset contains only 135 target chips for the kernel parameter selection/learning methods. First of all, we conduct target discrimination experiments in comparison with some kernel parameter selection/learning methods for the max-margin one-class classifier to illustrate the better performance of our adaptive method.
The precision, recall, F1-score, accuracy, and AUC results of the different parameter selection/learning methods on the MiniSAR dataset are listed in Table 5, where red bold denotes the best value and bold italic denotes the second-best result in each column. In Table 5, it is obvious that the best results on the measured dataset are obtained by our method under all criteria, which indicates that our method achieves significantly fewer false alarms and missing alarms, and thus gains much higher precision, recall, F1-score, accuracy, and AUC. In addition, in Figure 9 we further plot the test AUC curves under different kernel parameters and indicate the kernel parameters selected or learned by the different methods for this dataset, in which our method reaches the optimal value with the maximum test AUC. Therefore, we can conclude that our method learns the optimal kernel parameter for the MiniSAR dataset, which demonstrates its effectiveness for target discrimination. In SAR target discrimination, the two-class SVM [29] is also a common discriminator. Therefore, the performance of our method is compared with the two-class SVM on the MiniSAR dataset. Figure 10 shows the visualization results of our method and the two-class SVM on the test SAR image, where green boxes denote the clutter chips correctly discriminated, blue boxes denote the target chips correctly discriminated, and red boxes denote the chips wrongly discriminated. From the discrimination results in Figure 10, our method produces far fewer false alarms, fewer missing alarms, and more correctly discriminated targets, illustrating its better discrimination performance compared with the two-class SVM. To quantitatively compare the discrimination performance of our method with some other commonly used target discrimination methods, Table 6 lists the results of our method together with those of some other discriminators.
As shown in Table 6, the precision of our method is far higher than that of the other methods, since the proposed method is a one-class classifier, while the other methods are two-class methods trained with both targets and clutters. In complex SAR scenes, the SAR images contain multiple clutters. When the clutters in the training images differ from those in the test images, the performance of these methods degrades considerably. Thus, these two-class classification methods tend to classify the background clutters as targets, so the false alarms are very numerous, which leads to low precision. In addition, since these two-class classification methods learn the features of both targets and clutters, most of the targets can be correctly classified by them, so the number of missing alarms (FN) is small, which leads to high recall. Since our method is trained only with target samples, it effectively decreases the false alarms while causing some missing alarms, which leads to high precision and low recall, respectively. The F1-score is the harmonic mean of precision and recall. According to the other quantitative results in Table 6, our method performs well on the SAR dataset in complex scenes, with much higher values of F1-score, accuracy, and AUC, which comprehensively illustrates its promising target discrimination performance.  Table 7. The learned values l for an edge sample and an interior sample.

Samples              Edge Sample    Interior Sample
The learned value l  0.9355         0.5160

Model Analysis
To visualize the effect of the edge and interior sample selection algorithm in our method, we randomly choose an edge sample and an interior sample from the GMM dataset and present their k-nearest neighbors, normal vectors, and tangent planes. In Figure 11a, the near neighbors of the edge sample are mainly located on one side of its tangent plane, while Figure 11b shows that the near neighbors of the interior sample are evenly distributed on the two sides of its tangent plane, which validates the theoretical local geometric relationship between the edge and interior samples and their nearest neighbors. Moreover, the values l of the edge sample and interior sample are listed in Table 7, and they are very close to 1 and 0.5, respectively. Therefore, the results in Figure 11 and Table 7 verify the effectiveness of the sample selection algorithm. Moreover, we also take the GMM dataset as an example to analyze the effect of sample selection on the learning of the optimal kernel parameter and the boundary. Figure 12a-c shows the edge and interior samples selected by the algorithm in Section 3.1 with different parameters, in which red marks ' * ' denote the interior samples, and black marks ' ' denote the edge samples. From Figure 12a-c, we can see that the number of selected samples gradually decreases. Moreover, for the three cases with different numbers of selected samples, the corresponding curves of the objective function are presented in Figure 12d. As presented in Figure 12d, the maximums of the three objective function curves are reached at the same kernel parameter value, i.e., σ* = 1.32, which is marked with the red points. Moreover, in Figure 12a-c, the decision boundaries learned by our method are almost the same even though the number of selected edge and interior samples varies.
Therefore, within a certain range, the objective function of our method is not very sensitive to the number of selected samples and can learn the optimal kernel parameter to build the suitable decision boundary.
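The local geometric criterion discussed above can be sketched as follows. This is an illustrative implementation under our own assumptions, not the authors' exact algorithm: the local normal at each sample is estimated from the directions to its k nearest neighbors, and l is taken as the fraction of neighbors lying on one side of the resulting tangent plane (near 1 for edge samples, near 0.5 for interior samples, consistent with Table 7):

```python
import numpy as np

def edge_score(X, k=10):
    """For each sample, estimate l, the fraction of its k nearest neighbors
    on one side of the tangent plane defined by a locally estimated normal.
    l near 1 -> edge sample; l near 0.5 -> interior sample (assumed criterion)."""
    n = X.shape[0]
    # pairwise squared Euclidean distances
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    scores = np.empty(n)
    for i in range(n):
        idx = np.argsort(d2[i])[1:k + 1]       # k nearest neighbors, skipping self
        diffs = X[idx] - X[i]
        units = diffs / np.linalg.norm(diffs, axis=1, keepdims=True)
        normal = units.sum(0)                  # estimated normal direction at sample i
        # fraction of neighbors on the normal's side of the tangent plane
        scores[i] = (diffs @ normal > 0).mean()
    return scores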
In addition, we take the GMM dataset as an example to analyze the relationship between the distance from a sample to the boundary and its entropy value. Under the optimal kernel parameter, we calculate the samples' information entropy in the kernel space; in Figure 13, we indicate the M samples with the largest information entropy with black marks and the N samples with the smallest information entropy with red '*' marks.
As presented in Figure 13, the samples with large entropy values are located very close to the decision boundary, while the samples with small entropy values are mainly located at the centers of the Gaussian distributions, far from the decision boundary. To sum up, Figure 13 shows that the larger the entropy value of a sample, the closer the sample is to the boundary, which further validates the effectiveness of the defined information entropy for measuring the distance between a sample and the classification boundary.
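The peak-at-the-boundary behavior seen in Figure 13 can be illustrated with a simple analogue (this is not the paper's exact entropy definition, only a sketch of the underlying idea): the binary entropy of a sigmoid-squashed decision value is maximal exactly on the boundary and decays monotonically with distance from it.

```python
import math

def boundary_entropy(decision_value, scale=1.0):
    """Binary entropy of a sigmoid-squashed decision value (illustrative):
    maximal (ln 2) exactly on the boundary (decision_value = 0),
    decaying as the sample moves away from the boundary on either side."""
    p = 1.0 / (1.0 + math.exp(-decision_value / scale))
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log(p) - (1.0 - p) * math.log(1.0 - p)
```

Under this analogue, maximizing the entropy of edge samples while minimizing that of interior samples pushes the learned boundary through the edge samples and away from the interior ones, which is exactly the intuition behind the proposed objective.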

Computational Complexity
Moreover, computational complexity is an important criterion for measuring the practicality of a method. Table 8 compares the computational complexity of our method with that of several one-class classification methods. The computational complexity of the max-margin one-class classifier is O(N^3) [9], and that of SKEW and MinSV+MaxL is O(MN^3), where M denotes the number of elements in the kernel-parameter candidate set and N the number of training samples. The computational complexity of MIES is given in [14] as O(MN^3 + M N_SV^2 + M N_SV N_IE), where N_SV represents the number of support vectors and N_IE the number of selected edge and interior samples. In [17], the computational complexity of MD is O(N^2). In the first step of our method, the edge and interior samples are selected from the training set, at a cost of O(N^2). In the second step, the entropy values of the edge and interior samples are calculated, at a cost of O(N_IE N). Therefore, the overall computational complexity of our method is O(N^2 + N_Iter N_IE N), where N_Iter denotes the number of iterations of the gradient descent algorithm. Based on the above analysis, the second row of Table 8 lists the computational complexity of the different methods: the complexity of our method is lower than those of MIES, MinSV+MaxL, and SKEW, and higher than that of MD. Finally, the third row of Table 8 shows the computation times of the different kernel-parameter selection/learning methods on the MiniSAR dataset, where red bold denotes the best value and bold italic the second-best value per column. The computation cost of our method is much less than those of MIES, MinSV+MaxL, and SKEW, and only slightly more than that of MD.
Although the computation burden of MD is the smallest, the difference between MD and our method is negligible, while the discrimination performance of MD is far lower than that of our method on the used datasets. Therefore, considering both computation cost and discrimination performance, our method obtains the best discrimination results with a small computation burden compared with these parameter selection/learning one-class classification methods.
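To make the gap concrete, a back-of-the-envelope operation count under assumed sizes (the values of N, M, N_SV, N_IE, and N_Iter below are purely hypothetical, not taken from the experiments):

```python
# Illustrative operation counts for the complexity expressions in Table 8,
# under assumed problem sizes (all values hypothetical).
N, M = 1000, 20                 # training samples, kernel-parameter candidates
N_SV, N_IE, N_Iter = 100, 100, 50  # support vectors, selected samples, iterations

grid_search = M * N**3                              # SKEW / MinSV+MaxL: O(M N^3)
mies = M * N**3 + M * N_SV**2 + M * N_SV * N_IE     # MIES
md = N**2                                           # MD: O(N^2)
ours = N**2 + N_Iter * N_IE * N                     # proposed: O(N^2 + N_Iter N_IE N)

ratio = ours / grid_search  # ≈ 3e-4: orders of magnitude cheaper than grid search
```

Even with these rough numbers, the proposed method's cost sits close to MD's O(N^2) while avoiding the cubic term that dominates the grid-search-based methods.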

Conclusions
This paper focuses on the construction of a novel adaptive max-margin one-class classifier for SAR target discrimination in complex scenes. On the basis of the geometric relationship between a sample and the classification boundary, we define the information entropy of samples in the kernel space to measure the distance between a sample and the boundary. Thus, our method can automatically obtain the optimal kernel parameter of the max-margin one-class classifier, making it adaptive to SAR images with multiple clutters. The experiments on the synthetic datasets validate that our method effectively learns the optimal kernel parameter of the max-margin one-class classifier and thus achieves the optimal target discrimination performance. Moreover, the experiments on the measured SAR dataset further verify the effectiveness of our method for SAR target discrimination in complex scenes. In addition, the analysis of computational complexity shows the promising practicality of our method.

Conflicts of Interest:
The authors declare no conflict of interest.