SAR Target Recognition via Supervised Discriminative Dictionary Learning and Sparse Representation of the SAR-HOG Feature

Abstract: Automatic target recognition (ATR) in synthetic aperture radar (SAR) images plays an important role in both national defense and civil applications. Although many methods have been proposed, SAR ATR is still very challenging due to the complex application environment. Feature extraction and classification are key points in SAR ATR. In this paper, we first design a novel feature, which is a histogram of oriented gradients (HOG)-like feature for SAR ATR (called SAR-HOG). Then, we propose a supervised discriminative dictionary learning (SDDL) method to learn a discriminative dictionary for SAR ATR and propose a strategy to simplify the optimization problem. Finally, we propose a SAR ATR classifier based on SDDL and sparse representation (called SDDLSR), in which both the reconstruction error and the classification error are considered. Extensive experiments are performed on the MSTAR database under standard operating conditions and extended operating conditions. The experimental results show that SAR-HOG can reliably capture the structures of targets in SAR images, and SDDL can further capture subtle differences among the different classes. By virtue of the SAR-HOG feature and SDDLSR, the proposed method achieves state-of-the-art performance on the MSTAR database. Especially for the extended operating conditions (EOC) scenario "Training 17°-Testing 45°", the proposed method improves remarkably with respect to previous works.


Introduction
Automatic target recognition (ATR) is one of the important applications of synthetic aperture radar (SAR) in civilian and military fields. Although much work has been done in the past few decades [1-8], it is still a highly challenging problem. Generally, the process of SAR ATR includes four sequential stages: detection, discrimination, feature extraction and classification. In the first two stages, the potential regions of interest (ROIs) are located, and the false ROIs are removed. In the last two stages, distinctive features are extracted from ROIs, and then, the extracted features are classified. In this paper, feature extraction and classification are studied.
For good SAR ATR performance, feature extraction is a key factor, the aim of which is to capture the characteristics of targets. In high resolution SAR images, geometric structures are the important features of targets. Inspired by the simplicity and robustness of a local (statistical) feature, i.e., histogram of oriented gradients (HOG) [9], we adapt HOG to speckled SAR images by using the ratio-based gradient definition (called SAR-HOG). Experiments show that SAR-HOG can indeed depict the geometric properties of targets. In addition to obtaining the distinctive features of targets, classification is used to capture the differences among the different classes. Inspired by the effectiveness of the sparse representation classifier (SRC) [10] and dictionary learning [11], we further propose supervised discriminative dictionary learning (SDDL) to learn a discriminative dictionary by combining the merits of supervised dictionary learning (SDL) [12] and discriminative dictionary learning (DDL) [13]. In addition, in order to amplify the differences while properly suppressing the common features of the different classes, we represent the learned dictionary as the concatenation of the class-specific sub-dictionaries and a shared sub-dictionary. Meanwhile, we propose a new strategy to combine the dictionary and the parameter of the classifier to simplify the optimization procedure. Based on the proposed SDDL and SRC, we propose the SAR ATR classifier, i.e., supervised discriminative dictionary learning and the sparse representation classifier (SDDLSR). Finally, SAR-HOG and SDDLSR are applied to the public MSTAR database.
The main contributions of this paper can be summarized as follows: (1) We propose a novel local feature for SAR ATR named SAR-HOG, which can effectively capture the main structures of targets in speckled SAR images. (2) We propose SDDL to learn the discriminative dictionary for SAR ATR. This is the first work to combine SDL and DDL. Meanwhile, we propose a new strategy to simplify the optimization problem. (3) We propose a SAR ATR classifier, SDDLSR, based on SDDL and SRC, in which both the reconstruction error and the classification error are considered. (4) We perform extensive experiments to demonstrate the performance of the proposed method on the MSTAR database under the standard operating conditions (SOC) and extended operating conditions (EOC) scenarios. In addition, the performance of the proposed method with respect to the reduced size of the training set is also evaluated. Experiments show that our method achieves state-of-the-art SAR ATR performance.
The remainder of the paper is organized as follows. In Section 2, we briefly introduce the related work, including the feature extraction methods and classification methods for SAR ATR. The novel local feature SAR-HOG is detailed in Section 3. The proposed supervised discriminative dictionary learning and the whole SAR ATR algorithm are detailed in Section 4. Experimental results on the MSTAR database are given in Section 5. Section 6 concludes our work.

Work Related to Feature Extraction for SAR ATR
The present feature extraction methods for SAR images can be roughly grouped into three categories. (1) Using raw images or transformed images: Zhao et al. [1] used the raw images as the feature of targets. Srinivas et al. [2] adopted the wavelet decomposition images. Dong et al. [3] utilized the monogenic signal to capture the characteristics of the SAR image. Generally, the features represented by raw images or transformed images have a high dimension; thus, linear or nonlinear dimensionality reduction techniques are usually used. Mishra [4] compared the linear methods, i.e., principal component analysis (PCA) and linear discriminant analysis (LDA), for SAR ATR. Huang et al. [5] proposed a nonlinear method using tensor global and local discriminant embedding for dimensionality reduction in SAR ATR. (2) Using scattering center features: Zhou et al. [6] adopted scattering center features at different target poses for classification. However, an offline global scattering model is needed to establish SAR image templates. (3) Using global or local statistical features: Clemente et al. [7] used the pseudo-Zernike moments as the global statistical features for SAR ATR. Local (statistical) features arise from the small parts of an image, e.g., HOG [9], scale-invariant feature transform (SIFT) [14], etc. The local features are usually invariant to image rotation, image scaling and minor changes in viewing direction [15,16]. For optical images, local feature extraction methods usually detect key points first and then compute the local descriptors [15,16]. However, there are two problems when applying the existing local feature extraction methods to SAR ATR. On the one hand, for limited resolution SAR images, the number of target pixels is usually small, and key point detectors cannot always reliably capture the structures of targets [9]. On the other hand, the gradient computation by difference in local feature extraction is not suitable for the SAR image due to the speckle noise. So far, only a few local features have been investigated for SAR images. Dai et al. [17] adopted the multilevel local pattern histogram for SAR terrain and land-use classification. Cui et al. [18] proposed a ratio-detector-based feature extraction method for very high resolution SAR image patch indexing. Dellinger et al. [19] proposed a SIFT-like algorithm called SAR-SIFT for the registration of SAR images. These investigations indicate that the local features have great potential in SAR applications. Some reviews of local feature extraction can be found in [15,16]. Apart from the above hand-designed feature extraction methods, convolutional neural networks (CNNs) have been used for automatic feature extraction for SAR ATR [8]. However, CNNs need a large number of training samples to reach the desired performance. In short, feature extraction for SAR images is still an open question.

Work Related to Classification for SAR ATR
Classification in SAR ATR normally refers to supervised classification. Many classic classifiers have been used in SAR ATR, such as SVM [1], kNN [3], etc. The earlier mentioned dimensionality reduction methods PCA and LDA can also be taken as classifiers [4]. Recently, the sparse representation classifier proposed by Wright [10] has been successfully applied to SAR applications, such as polarimetric SAR image classification [20] and SAR ATR [3]. The success of SRC is largely guaranteed by the high redundancy and low coherency of the dictionary atoms. However, the dictionary in SRC constructed by stacking training samples may be optimal neither for solving reconstruction problems (e.g., denoising, inpainting) nor for solving discriminative problems (e.g., classification) [21]. Thus, dictionary learning [11] is used to learn a more representative and compact dictionary. Several dictionary learning methods have been proposed for reconstruction tasks, such as K-SVD [22] and online dictionary learning [23]. However, these methods are not necessarily suitable for discriminative tasks [24]. Therefore, discriminative dictionary learning methods have been proposed for solving discriminative problems. Ramirez et al. [13] proposed an incoherent dictionary learning method by introducing an incoherence term to encourage the independence of sub-dictionaries corresponding to different classes; thus, the dictionary becomes more discriminative. Gao et al. [25] further developed this method. They explicitly represented the similar atoms existing in sub-dictionaries as a shared sub-dictionary. Meanwhile, they also imposed incoherence constraints among class-specific sub-dictionaries. Although DDL can efficiently improve the performance of the dictionary, it does not use the class labels in the training set. Mairal et al. [12] proposed supervised dictionary learning by using class labels to improve the classification performance. Zhang et al. [26] proposed a method called discriminative K-SVD (DK-SVD) to jointly learn the dictionary and classifier by minimizing the summation of the reconstruction and classification errors. Inspired by DK-SVD, Jiang et al. [27] further proposed a method called label consistent K-SVD (LCK-SVD). They enforced a label consistency constraint on the dictionary and combined the dictionary and the parameters of the classifier into a single parameter to make the optimization simpler. A survey of supervised dictionary learning and sparse representation can be found in [28].

SAR-HOG
Because of the much greater aspect sensitivity of the SAR image compared with the optical image, the SAR image is more sensitive to depression angle change and the pose variation of targets [29]. Therefore, we should pay more attention to the stable SAR image pixels, which are mainly dominated by strong backscatter returns from structures with aspect insensitivity. HOG is a simple and efficient local feature. It suggests that, for capturing stable structures of targets, one should use fine-scale derivatives, fine orientation binning, a dense grid and high-quality normalized descriptor blocks [9]. Inspired by this idea, we propose a HOG-like local feature for SAR images called SAR-HOG by using the ratio-based gradient definition. SAR-HOG computation includes three steps: gradient computation; orientation binning; and normalization and feature description [9]. In the following, we detail the computation.

Gradient Computation
Original HOG uses the simplest scheme to compute the gradients, i.e., 1-D [−1, 0, 1] masks are employed to compute the gradients of the raw image without smoothing [9]. Because of the multiplicative speckle noise existing in SAR images, the gradient by difference is not a constant false alarm rate operator. It is more suitable to use the ratio instead of the difference to compute the gradients of SAR images [19]. Here, we use the simplest ratio of average (ROA) [30], i.e.,

$$R_i = \frac{M_1(i)}{M_2(i)} \tag{1}$$

where $R_i$ denotes the ratio and $M_1(i)$ and $M_2(i)$ denote the local means on opposite sides of the current pixel along direction $i$; $i = 1$ means the horizontal direction, and $i = 3$ means the vertical direction. Figure 1 shows the scheme of the ROA. The average region for $M_1(i)$ calculation is $\max[\frac{win-1}{2}, 1] \times win$ (or $win \times \max[\frac{win-1}{2}, 1]$), where $win$ denotes the odd size of the average region. Inspired by SAR-SIFT [19], we define the horizontal gradient $G_H$ and vertical gradient $G_V$ as:

$$G_H = \log(R_1), \quad G_V = \log(R_3) \tag{2}$$

Then, the gradient magnitude $G_m$ and orientation $G_\theta$ can be computed by:

$$G_m = \sqrt{G_H^2 + G_V^2}, \quad G_\theta = \operatorname{atan}\left(\frac{G_V}{G_H}\right) \tag{3}$$

where $\operatorname{atan}(\cdot)$ denotes the inverse tangent function.
It should be noted that original HOG computes gradients without smoothing, which is equivalent to computing $M_1(i)$ and $M_2(i)$ using only one pixel each, i.e., $win = 1$. However, this does not work well in practice for SAR images, mainly because of speckle noise. We find that a moderately-sized region for averaging can improve the performance.
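The ratio-based gradient described above can be illustrated with a minimal Python sketch. It is only a sketch under stated assumptions, not the authors' implementation: the function name `roa_gradient`, the choice of which side of the pixel is $M_1$ and which is $M_2$, the use of `atan2` in place of `atan`, and the absence of border handling are all assumptions made here for illustration.

```python
import math

def roa_gradient(img, r, c, win=3):
    """Ratio-of-averages gradient at pixel (r, c) of a 2D list of positive intensities.

    Assumptions: win is odd; each averaging region is win pixels long and
    max((win - 1) // 2, 1) pixels thick, on opposite sides of the pixel.
    The caller must keep the window inside the image (no border handling).
    """
    half = (win - 1) // 2      # half-length of the region along the edge
    depth = max(half, 1)       # thickness of each averaging region

    def mean(rows, cols):
        vals = [img[i][j] for i in rows for j in cols]
        return sum(vals) / len(vals)

    span_r = range(r - half, r + half + 1)
    span_c = range(c - half, c + half + 1)
    # Horizontal direction (i = 1): regions right and left of the pixel.
    m1_h = mean(span_r, range(c + 1, c + 1 + depth))
    m2_h = mean(span_r, range(c - depth, c))
    # Vertical direction (i = 3): regions below and above the pixel.
    m1_v = mean(range(r + 1, r + 1 + depth), span_c)
    m2_v = mean(range(r - depth, r), span_c)

    g_h = math.log(m1_h / m2_h)          # log-ratio horizontal gradient
    g_v = math.log(m1_v / m2_v)          # log-ratio vertical gradient
    magnitude = math.hypot(g_h, g_v)
    orientation = math.atan2(g_v, g_h)   # atan2 keeps the quadrant (assumption)
    return g_h, g_v, magnitude, orientation
```

Because the gradient is built from a ratio of local means, multiplying the whole image by a constant leaves the result unchanged, which is the property that makes it attractive under multiplicative speckle. With `win=1` each region reduces to a single pixel, matching the no-smoothing case discussed above.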

Orientation Binning
Given $G_m$ and $G_\theta$, we calculate the histogram of oriented gradients in local spatial regions (called cells) like that in [9]. Specifically, the SAR image is divided into small cells (see Figure 2). For all of the pixels within a cell, the orientations are quantized into a fixed number of angular bins, and the magnitudes are accumulated into the orientation bins. In original HOG, 6-8 pixel-wide cells and nine angular bins do best [9]. However, we find that smaller cells work better for SAR images and that the number of angular bins should be somewhat larger.
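The binning step can be sketched in Python as follows. This is an illustrative sketch rather than the paper's code: the function name `cell_histograms`, the folding of orientations into $[0, \pi)$ (unsigned gradients) and the hard assignment of each pixel to a single bin are assumptions, since the text does not specify these details.

```python
import math

def cell_histograms(magnitude, orientation, cell=8, bins=11):
    """Accumulate gradient magnitudes into per-cell orientation histograms.

    Assumptions: orientations are in radians and are folded into [0, pi);
    each pixel votes for exactly one bin (no interpolation between bins).
    Pixels that do not fill a complete cell at the border are ignored.
    """
    n_rows, n_cols = len(magnitude), len(magnitude[0])
    cells_r, cells_c = n_rows // cell, n_cols // cell
    hist = [[[0.0] * bins for _ in range(cells_c)] for _ in range(cells_r)]
    for r in range(cells_r * cell):
        for c in range(cells_c * cell):
            theta = orientation[r][c] % math.pi                # fold into [0, pi)
            b = min(int(theta / math.pi * bins), bins - 1)     # bin index
            hist[r // cell][c // cell][b] += magnitude[r][c]   # magnitude-weighted vote
    return hist
```

For example, a 64 × 64 image with `cell=8` yields an 8 × 8 grid of histograms, each with `bins` entries.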

Normalization and Feature Description
Generally, $G_m$ of the SAR image has a large dynamic range, especially for metallic targets in SAR ATR. Therefore, effective local contrast normalization is very important for good performance. Specifically, the cells are grouped into larger spatially-connected blocks (see Figure 2), and the histogram entries of the cells in each block are concatenated into a vector. Then, each vector is normalized to have unit $l_2$ norm (Equation (4)), where $v_i$ denotes the vector corresponding to the i-th block and $\varepsilon$ is a small number, which can be chosen to be 0.2 times the mean value of $\|v_i\|_2$ over all of the blocks [31]. The normalization method here is a little different from that in [9] and can better avoid poor results for blocks with uniform $G_m$. In the original HOG, 2-3 cell blocks work best [9]. In Section 5, we can see that the block size should be adjusted according to the targets in SAR images. Finally, all of the normalized vectors corresponding to the blocks are concatenated to yield the SAR-HOG descriptor. It should be noted that, in order to improve the performance, the blocks typically overlap, meaning that each cell contributes more than once to the final descriptor (see Figure 2). In the original HOG, the block overlap (stride) is half of the block size. We find that this is still suitable for the SAR image.
In Section 5, we can see that SAR-HOG can reliably capture the structures of targets in SAR images, and this local feature can directly improve SAR ATR performance.
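The block normalization described in this subsection can be sketched as below. The exact form of Equation (4) is not reproduced here, so the sketch assumes the common HOG form $v / \sqrt{\|v\|_2^2 + \varepsilon^2}$ with $\varepsilon$ set to 0.2 times the mean block norm; the function name `normalize_blocks` and the zero-denominator guard are also assumptions for illustration.

```python
import math

def normalize_blocks(blocks, eps_factor=0.2):
    """L2-normalize block vectors as v / sqrt(||v||_2^2 + eps^2).

    Assumption: eps is eps_factor times the mean l2 norm over all blocks,
    which is one reading of the rule stated in the text.
    """
    norms = [math.sqrt(sum(x * x for x in v)) for v in blocks]
    eps = eps_factor * sum(norms) / len(norms)
    normalized = []
    for v, n in zip(blocks, norms):
        denom = math.sqrt(n * n + eps * eps) or 1.0   # guard: all blocks empty
        normalized.append([x / denom for x in v])
    return normalized

def concatenate(blocks):
    """Join the normalized block vectors into one descriptor."""
    return [x for v in blocks for x in v]
```

Tying $\varepsilon$ to the average block energy means that a nearly empty block is scaled by roughly $1/\varepsilon$ rather than being blown up to unit norm, which is consistent with the stated goal of avoiding poor results for blocks with uniform $G_m$.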

SAR ATR Algorithm via Supervised Discriminative Dictionary Learning of SAR-HOG
In some SAR ATR scenarios, the targets from different classes may have similar physical structures. For example, in military vehicle SAR ATR [29], the main battle tanks T62 and T72 have similar structures, e.g., tread, turret, armor, etc. These similar structures correspond to similar image features. In this paper, we propose the SDDL method to further capture subtle differences among the different classes. In the following, we first review the sparse representation classifier and dictionary learning, then detail the proposed SDDL method and the optimization procedure and, finally, present the complete SAR ATR algorithm.

Review of the Sparse Representation Classifier and Dictionary Learning
SRC is a generic classifier and has been used in many applications. Given sufficient training samples of K classes, a dictionary $D = [X_1, X_2, \ldots, X_K] \in \mathbb{R}^{m \times N}$ is constructed by stacking the training samples, where $X_k \in \mathbb{R}^{m \times n_k}$ contains the $n_k$ training samples of the k-th class and $N = \sum_{k=1}^{K} n_k$. A test sample $x_t$ is decomposed onto $D$ by solving the following optimization problem [10]:

$$\min_{\alpha_t} \|x_t - D\alpha_t\|_2^2 + \lambda \|\alpha_t\|_1 \tag{5}$$

where $\alpha_t$ is the sparse code of $x_t$ and $\lambda$ is the regularization parameter that controls the sparsity of $\alpha_t$; $\|\cdot\|_2$ and $\|\cdot\|_1$ denote the $l_2$ norm and $l_1$ norm, respectively. Then, $x_t$ is approximated by the quantities $x_t^k = D\delta_k(\alpha_t)$, where $\delta_k(\alpha_t)$ is a vector whose nonzero entries are the entries in $\alpha_t$ that are associated with the k-th class. Finally, $x_t$ is classified by assigning it to the class that minimizes the residual error between $x_t$ and $x_t^k$ [10]:

$$\operatorname{identity}(x_t) = \arg\min_k \|x_t - D\delta_k(\alpha_t)\|_2 \tag{6}$$

In order to have a representative and compact dictionary, dictionary learning is applied to learn a dictionary from the training set, which can be formulated as follows [11]:

$$\min_{D, \{A_k\}} \sum_{k=1}^{K} \left\{ \|X_k - DA_k\|_F^2 + \lambda \psi(A_k) \right\} \tag{7}$$

where $D \in \mathbb{R}^{m \times P}$ is the learned dictionary of size $P$; $X_k \in \mathbb{R}^{m \times n_k}$ and $A_k \in \mathbb{R}^{P \times n_k}$ denote the training samples matrix of the k-th class and the corresponding sparse codes matrix, respectively; $\psi(\cdot)$ is the sparsity-inducing regularization function, for which the $l_1$ norm is usually used; and $\|\cdot\|_F$ denotes the Frobenius norm. To avoid the trivial solution, we constrain each atom to have unit norm. Generally, the size of the dictionary is much smaller than the number of samples, i.e., $P \ll N$, meaning that the learned dictionary is compact.
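The residual-based decision rule of SRC can be sketched in a few lines of Python. The sketch assumes that the sparse code has already been computed by some $l_1$ solver, which is not shown; the function name `src_classify` and the list-of-columns layout of the dictionary are illustrative choices, not part of the original method description.

```python
import math

def src_classify(x, atoms, atom_labels, alpha):
    """Assign x to the class with the smallest class-wise residual.

    atoms: list of dictionary columns, each a list of length m.
    atom_labels: class index of each atom.
    alpha: sparse code of x, assumed to come from an l1 solver (not shown).
    """
    residuals = {}
    for k in sorted(set(atom_labels)):
        # delta_k(alpha): keep only the coefficients of the class-k atoms.
        approx = [0.0] * len(x)
        for atom, label, coef in zip(atoms, atom_labels, alpha):
            if label == k:
                for i, a in enumerate(atom):
                    approx[i] += coef * a
        residuals[k] = math.sqrt(sum((xi - ai) ** 2 for xi, ai in zip(x, approx)))
    best = min(residuals, key=residuals.get)
    return best, residuals
```

With a two-atom toy dictionary `[[1.0, 0.0], [0.0, 1.0]]` labeled `[0, 1]`, the sample `[0.9, 0.1]` with code `[0.9, 0.1]` is assigned to class 0, because keeping only the class-0 coefficient leaves the smaller residual.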
The above dictionary learning method is usually used for reconstruction tasks. For discriminative tasks [24], e.g., SAR ATR, DDL can be used to learn a more discriminative dictionary. In [25], class-specific sub-dictionaries and a shared sub-dictionary are jointly learned from the training samples. Meanwhile, incoherence constraints are enforced on the sub-dictionaries. Mathematically, this dictionary learning problem is formulated as Equation (8), in which $D_0 \in \mathbb{R}^{m \times P_0}$ denotes the shared sub-dictionary, which encodes the common features among the different classes; $\{D_k\} \in \mathbb{R}^{m \times P_k}$ denote the class-specific sub-dictionaries, which encode the subtle feature differences among the different classes; and the complete dictionary is $D = [D_0, D_1, \ldots, D_K] \in \mathbb{R}^{m \times P}$. In this formulation, the first two terms represent the reconstruction error and are similar to Equation (7). The third term enforces self-incoherence on each sub-dictionary in order to make the learned dictionary stable, and the fourth term enforces the incoherence constraint among sub-dictionaries to make the dictionary more discriminative.
Although the dictionary learned by DDL is more discriminative, DDL does not use the class labels in the training set. SDL [12] jointly learns the dictionary and classifier by adding a classification error term to Equation (7), which leads to the problem in Equation (9).

Supervised Discriminative Dictionary Learning
Inspired by DDL and SDL, we combine Equations (8) and (9) to propose the supervised discriminative dictionary learning method. Specifically, we represent the dictionary as the concatenation of the class-specific sub-dictionaries $\{D_k\}$ and a shared sub-dictionary $D_0$, i.e., $D = [D_0, D_1, \ldots, D_K]$, enforce the incoherence constraints on the sub-dictionaries, adopt the square loss function to measure the classification error and omit the regularization term $\|W\|_F^2$, like that in LCK-SVD. Therefore, the dictionary, classifier and sparse codes can be obtained by solving Equation (10), where $A_k \in \mathbb{R}^{(P_0 + P_k) \times n_k}$ and $\{\gamma_k\}$ are regularization parameters.
The above problem can be solved iteratively and alternately, with (2K + 1) of the unknowns fixed each time while solving for the remaining one. However, the solution is very likely to get stuck in some local minimum. Therefore, we simplify the above objective function. It is worth taking a few moments to study the form of Equation (10), as it can provide a new perspective on the classification error. We can see that the classification error term in Equation (10) has the same form as dictionary learning (see Equation (7)): the parameter of the classifier $W$ can be taken as a dictionary, and the class labels $\{Y_k\}$ can be taken as training samples. Thus, the classification error can also be taken as a reconstruction error. It should be noted that $\{Y_k\}$ are actually class labels, and the columns of $Y_k = y_k \mathbf{1}^T$ are conventionally given by the sparsest one-of-K vector $y_k$. Here, we represent $\{y_k\}$ by a group of dense orthonormal vectors. Based on the above understanding, we further represent $W$ as the concatenation of the class-specific sub-dictionaries $\{W_k\}$ and a shared sub-dictionary $W_0$, i.e., $W = [W_0, W_1, \ldots, W_K]$, and also enforce incoherence constraints on the sub-dictionaries of $W$. Because $D$ and $W$ are learned from two training sets with different natures, it is reasonable to further enforce incoherence constraints between $D$ and $W$.
In addition, we carefully set the regularization parameters to allow a reasonable simplification. Each $\gamma_k$ is set to one, meaning that the reconstruction error term has the same weight as the classification error term. The incoherence constraint terms on $D$ and $W$ also have the same weights. Therefore, Equation (10) can be developed into the problem in Equation (11), where $W_0 \in \mathbb{R}^{K \times P_0}$ and $W_k \in \mathbb{R}^{K \times P_k}$, and Tr denotes the trace operation. The third line of Equation (11) denotes the incoherence constraints between $D$ and $W$, and this form is adopted to enable the following mathematical simplification. Then, we combine the parameters in Equation (11) by:

$$\tilde{X}_k = \begin{bmatrix} X_k \\ Y_k \end{bmatrix}, \quad \tilde{D}_0 = \begin{bmatrix} D_0 \\ W_0 \end{bmatrix}, \quad \tilde{D}_k = \begin{bmatrix} D_k \\ W_k \end{bmatrix} \tag{12}$$

where $\tilde{X}_k \in \mathbb{R}^{(m+K) \times n_k}$, $\tilde{D}_0 \in \mathbb{R}^{(m+K) \times P_0}$ and $\tilde{D}_k \in \mathbb{R}^{(m+K) \times P_k}$. The simplified form is then obtained as Equation (13), with the complete quasi-dictionary $\tilde{D} = [\tilde{D}_0, \tilde{D}_1, \ldots, \tilde{D}_K] \in \mathbb{R}^{(m+K) \times P}$. We can see that Equation (13) has the same form as Equation (8).
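The effect of combining the parameters can be checked numerically with a small sketch. It assumes that Equation (12) simply stacks each feature matrix on top of its label matrix and each sub-dictionary on top of the matching classifier block, as the stated dimensions suggest; the helper names and the toy values are hypothetical, and a one-of-K label column is used only to keep the arithmetic readable (the text uses dense orthonormal label vectors).

```python
def matmul(a, b):
    """Product of two row-major matrices given as lists of lists."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def frob_sq(a, b):
    """Squared Frobenius norm of a - b."""
    return sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb))

def stack(top, bottom):
    """Vertical concatenation [top; bottom] of two row-major matrices."""
    return top + bottom

# Toy sizes (assumed): m = 2 features, K = 2 classes, 2 atoms, 1 sample.
X = [[1.0], [2.0]]                # feature column
Y = [[1.0], [0.0]]                # label column (one-of-K, illustration only)
D = [[0.5, 0.0], [0.0, 1.0]]      # dictionary block
W = [[1.0, 0.0], [0.0, 1.0]]      # classifier block
A = [[2.0], [1.0]]                # sparse code

# Reconstruction error plus classification error, each with weight one.
separate = frob_sq(X, matmul(D, A)) + frob_sq(Y, matmul(W, A))
# The same quantity written as one reconstruction error on stacked matrices.
stacked = frob_sq(stack(X, Y), matmul(stack(D, W), A))
```

With unit weights the two quantities coincide, which is why the joint problem can be treated as a single dictionary learning problem over the quasi-dictionary.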

Optimization Procedure
The objective function in Equation (13) is not convex. We can use a method similar to that in [25] to solve the problem. The main idea is that each $A_k$, $\tilde{D}_k$ and $\tilde{D}_0$ is updated alternately while keeping all of the other variables fixed, and the updates iterate until the stopping criterion is reached. Specifically, there are three steps in each iteration, as follows.
1. Updating sparse codes $\{A_k\}$: With all of the other factors fixed in Equation (13), the k-th sparse codes matrix $A_k$ can be obtained by solving the problem in Equation (14). This problem is the standard sparse coding problem (see Equation (5)). In this paper, we use the fast implementation of the LARS algorithm [31], which is a variant for solving the Lasso.
2. Updating class-specific sub-dictionaries $\{\tilde{D}_k\}$: With the sparse codes $\{A_k\}$, the shared sub-dictionary $\tilde{D}_0$ and the other K − 1 class-specific sub-dictionaries fixed, the k-th class-specific sub-dictionary $\tilde{D}_k$ can be obtained by solving the problem in Equation (15). We use the gradient descent algorithm to minimize the objective function, and the step size is chosen according to the Armijo rule, like that in [13,25].
3. Updating the shared sub-dictionary $\tilde{D}_0$: When $\{A_k\}$ and $\{\tilde{D}_k\}$ are fixed, we can obtain $\tilde{D}_0$ by solving the problem in Equation (16). We also use the gradient descent algorithm to optimize this problem, like that in [13,25].
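The gradient descent updates in steps 2 and 3 rely on the Armijo rule for the step size. A generic backtracking sketch is given below; it is not the objective of Equations (15) or (16), which are not reproduced here, but a plain Armijo line search on an arbitrary differentiable function. The function name `armijo_step` and the constants `shrink` and `c` are assumptions chosen for illustration.

```python
def armijo_step(f, grad, x, step=1.0, shrink=0.5, c=1e-4, max_halvings=50):
    """One gradient-descent update with Armijo backtracking.

    f: objective taking a list of floats; grad: its gradient, same layout.
    Returns the new point and the accepted step size.
    """
    g = grad(x)
    g_sq = sum(gi * gi for gi in g)
    fx = f(x)
    for _ in range(max_halvings):
        candidate = [xi - step * gi for xi, gi in zip(x, g)]
        # Armijo condition: decrease of at least c * step * ||g||^2.
        if f(candidate) <= fx - c * step * g_sq:
            return candidate, step
        step *= shrink
    return x, 0.0   # no acceptable step found; keep the current point
```

For the toy objective `f(x) = sum(v * v for v in x)` starting from `[1.0, -2.0]`, the unit step overshoots and is rejected, and the halved step of 0.5 lands exactly on the minimizer. In a dictionary update, the atoms would additionally have to be renormalized after each step to respect the unit-norm constraint; that part is omitted here.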
The above iterative process stops when the number of iterations reaches the predefined value M or when the relative changes between two successive estimates of $\tilde{D}$ and $\{A_k\}$ are both less than the given constants (Equation (17)), where s denotes the s-th iteration, ∧ denotes the AND operation, and $\mathrm{thres}_D$ and $\mathrm{thres}_A$ denote the convergence thresholds, which are set to be small numerical constants. Although the solutions obtained by this algorithm are not the exact solutions, they are actually the solutions satisfying the constraints [25]. Additionally, the experiments in Section 5 also demonstrate the good performance of these solutions. Following the work of [27], once $\tilde{D}$ is obtained, the final desired dictionary $\hat{D}$ and the parameter of the classifier $\hat{W}$ can be derived from $\tilde{D}$ as given in Equation (18).
As for the initialization of the above iterative process, we initialize $\{D_k\}$ and $\{A_k\}$ with K-SVD by using the training samples from the different classes, and we initialize $D_0$ by using the training samples from all of the classes. For the initialization of $W_0$ and $\{W_k\}$, we can use the ridge regression model, like that in [27]. Alternatively, as described above, $W$ can also be taken as a dictionary, so K-SVD can be employed for initialization. In this paper, we use the latter method to initialize $W_0$ and $\{W_k\}$. The initialized $\{\tilde{X}_k\}$, $\tilde{D}_0$ and $\{\tilde{D}_k\}$ are then obtained according to Equation (12).
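The stopping rule described earlier in this subsection could be sketched as follows. Since Equation (17) is not reproduced here, the sketch assumes that the relative change is the Frobenius norm of the difference divided by the Frobenius norm of the previous estimate; the function names and the default thresholds are likewise assumptions.

```python
import math

def relative_change(current, previous):
    """||current - previous||_F / ||previous||_F for row-major matrices.

    Assumes the previous estimate is not the all-zero matrix.
    """
    diff = sum((a - b) ** 2
               for rc, rp in zip(current, previous) for a, b in zip(rc, rp))
    base = sum(b * b for rp in previous for b in rp)
    return math.sqrt(diff) / math.sqrt(base)

def should_stop(s, max_iter, d_cur, d_prev, codes_cur, codes_prev,
                thres_d=1e-4, thres_a=1e-4):
    """Stop at the iteration cap, or when the dictionary AND all codes settle."""
    if s >= max_iter:
        return True
    d_ok = relative_change(d_cur, d_prev) < thres_d
    a_ok = all(relative_change(c, p) < thres_a
               for c, p in zip(codes_cur, codes_prev))
    return d_ok and a_ok
```

The two convergence tests are combined with a logical AND, matching the description above, while the iteration cap acts as an independent exit.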
Setting the parameters by cross-validation on $\lambda$, $\{\mu_k\}$, $\{\eta_k\}$ (regularization parameters) and $P$ (size of the dictionary) would be cumbersome. Fortunately, because Equations (8) and (13) have the same form, we can refer to the method in [25] for parameter setting. For the sparse regularization parameter $\lambda$, it has been experimentally shown that good performance can be achieved when it is set to 0.3 [25]. We also set $\lambda$ to 0.3 for simplicity. In [25], $\{\mu_k\}$ and $\{\eta_k\}$ are set according to Equation (19), in which a and b are controllable parameters. Such a formulation considers the normalization of the incoherence term by the number of samples and the size of the sub-dictionary in each class. In [25], the ratio between $\eta_k$ and $\mu_k$ is simply set to two, and b is set to 0.1. We also adopt the formulation of Equation (19) and constrain $\eta_k / \mu_k = 2$; b is set by experiment, as illustrated in Section 5. As for the size of the dictionary, [25] shows that a larger class-specific sub-dictionary can lead to better performance and that a moderately-sized shared sub-dictionary leads to the desired performance. A similar rule still holds in our method. It should be noted that $P$ grows linearly with the number of classes. Therefore, we should properly adjust $P$ according to the actual SAR ATR problem. The overall optimization procedure for solving Equation (13) is summarized in Algorithm 1.
Algorithm 1: Supervised discriminative dictionary learning (SDDL).
Input: feature vectors of the ROIs; the stopping thresholds $\mathrm{thres}_D$ and $\mathrm{thres}_A$ or the maximum number of iterations M.
Initialization:
  Initialize $\{D_k\}$, $D_0$, $\{W_k\}$, $W_0$ and $\{A_k\}$ with K-SVD.
  Initialize $\{\tilde{X}_k\}$, $\tilde{D}_0$ and $\{\tilde{D}_k\}$ by Equation (12).
Repeat
  for k = 1 to K do
    Update the sparse codes $A_k$ of $\tilde{X}_k$ with the LARS algorithm (see Equation (14)).
  end for
  for k = 1 to K do
    Update each $\tilde{D}_k$ by using the gradient descent algorithm (see Equation (15)).
  end for
  Update $\tilde{D}_0$ by using the gradient descent algorithm (see Equation (16)).
Until reaching the stopping criterion (see Equation (17)).
Output: The desired dictionary $\hat{D}$ and the classifier parameter $\hat{W}$ obtained by Equation (18).

SAR ATR Algorithm
Given the training slice images of ROIs and the relevant class labels from K classes, we first extract features from the slice images. Then, we use the SDDL method to jointly learn a discriminative dictionary $\hat{D}$ and classifier $\hat{W}$ using Algorithm 1. Next, we decompose the feature vector of a test sample $x_t$ onto $\hat{D}$ to obtain its sparse code $\alpha_t$ by solving the problem of Equation (5), with the $D$ in Equation (5) replaced by $\hat{D}$. Finally, we identify $x_t$ based on $\alpha_t$ by using the decision rule in Equation (20), in which the part of the sparse code associated with the shared sub-dictionary and the k-th class-specific sub-dictionary lies in $\mathbb{R}^{P_0 + P_k}$. By comparing Equations (6) and (20), we can see that both the reconstruction error and the classification error are considered. The proposed SAR ATR algorithm via supervised discriminative dictionary learning and sparse representation is summarized in Algorithm 2. Here, the proposed method is denoted by SDDLSR.

Algorithm 2: SAR ATR via supervised discriminative dictionary learning and sparse representation (SDDLSR).
Feature extraction:
  Compute features of slice images. In this paper, SAR-HOG features are computed following the description in Section 3.
Classification:
  Learn the dictionary $\hat{D}$ and the classifier parameter $\hat{W}$ using Algorithm 1.
  Decompose the test sample $x_t$ onto $\hat{D}$ and identify it by the decision rule (see Equation (20)).
Output: The identity of $x_t$.

Experiments and Discussions
In order to verify the effectiveness and robustness of the proposed method, we perform experiments using the Moving and Stationary Target Automatic Recognition (MSTAR) public database [29]. This database is a gallery collected using an X-band HH polarization SAR with 1 ft × 1 ft resolution for multiple targets. The SAR images are captured at various depression angles (15°, 17°, 30°, 45°) over a 0°-360° range of aspect view. The sizes of the images are all around 128 × 128 pixels. In this paper, these images are cropped to 64 × 64 pixels to further avoid the influence of clutter. The performance of SAR ATR obtained by the proposed method is attributed to two factors, i.e., the SAR-HOG feature and the SDDLSR classifier. Therefore, we should demonstrate the effectiveness of the SAR-HOG feature and of SDDLSR separately.
In the following, we first detail the experiment setup and parameter setting. Then, we evaluate the effectiveness of the SAR-HOG feature by using SDDLSR as the baseline classifier in Section 5.3. In Sections 5.4 and 5.5, we demonstrate the effectiveness of the SDDLSR classifier based on the same SAR-HOG feature. Here, some classic classification methods, including SVM, kNN, SRC and the SDL method LCK-SVD, are compared to the proposed SDDLSR classifier. The performance of the proposed method with the reduced size of the training set is evaluated in Section 5.6. The time consumption is illustrated in Section 5.7. All of the experiments are performed in MATLAB on a common PC with an Intel Core i7 processor at 3.40 GHz and 8.00 GB of main memory.

Experiment Setup
Experiments are performed under standard operating conditions (SOC) and extended operating conditions (EOC) [29]. In SOC scenarios, the testing conditions are very close to the training conditions. In EOC scenarios, however, the operating conditions of testing differ from the training conditions. Therefore, EOC scenarios are closer to real-world battlefield scenarios.
In the experiments under SOC, the images of ten targets acquired at a 17° depression angle are used for training, while the images acquired at a 15° depression angle are used for testing [3,29]. The number of images for training and testing is tabulated in Table 1. It should be noted that BMP2 and T72 have several variants with small structural modifications, and these variants are denoted by different series numbers (SN). Here, only the SN_132 of T72 and SN_9563 of BMP2 acquired at a 17° depression angle are used for training. In the experiments under EOC, the EOC difference on the depression angle (denoted by EOC_d) is considered. Four targets, 2S1, BRDM2, ZSU23/4 and T72 (SN_A64), at different depression angles under Scene 1 are used [29]. Specifically, the images acquired at a 17° depression angle are used for training, and the images acquired at 30° and 45° depression angles are used for testing.
For SAR ATR, the training samples acquired from the real battlefield are usually very limited. Therefore, we further study the performance of the SAR ATR methods with a small training set. Specifically, the case "Training 17°-Testing 30°" is considered, and the training samples are gradually reduced. Each time, 50 samples per class are randomly removed from the previous training set, and the remainder is used as the current training set. For each training size, experiments are repeated 100 times to obtain the mean performance.
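The training-set reduction protocol described above can be sketched in Python. This is a hypothetical helper, not the authors' script: the function name `shrink_training_set`, the dictionary-of-lists data layout and the error raised for classes that are too small are assumptions made for illustration.

```python
import random

def shrink_training_set(samples_by_class, remove_per_class=50, rng=None):
    """Randomly drop remove_per_class samples from every class.

    samples_by_class: dict mapping a class label to a list of samples.
    rng: optional random.Random instance, so that a run can be repeated.
    """
    rng = rng or random.Random()
    reduced = {}
    for label, samples in samples_by_class.items():
        keep = len(samples) - remove_per_class
        if keep <= 0:
            raise ValueError(
                f"class {label!r} has too few samples to remove {remove_per_class}")
        # Sampling without replacement keeps a random subset of the class.
        reduced[label] = rng.sample(samples, keep)
    return reduced
```

Calling the function repeatedly on its own output reproduces the gradual reduction, and repeating the whole sequence with different random generators (100 times in the text) gives the mean performance for each training size.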

Parameter Setting
The parameters in the proposed method include the parameters of SAR-HOG and the parameters of SDDL. The setting principles have been given in Sections 3 and 4.3. The parameter setting in SAR-HOG actually depends on the SAR image resolution and the sizes of the targets in the images. Here, we further illustrate the parameter setting. The fixed parameter values are tabulated in Table 3. We use the experiment setup "Training 17°-Testing 30°" and the proposed SDDLSR method to choose the parameter values. The experiment results are recorded in Figures 3 and 4 and Tables 4-6. It should be noted that when one parameter is tested, the remaining parameters are fixed at the values in Table 3.
Figure 3a shows that properly increasing the size win of the averaging region used for computing $M_1(i)$ and $M_2(i)$ improves the recognition rate, up to a size of 11 pixels. Figure 3b shows that setting the number of bins to 11 gives the best performance. Tables 4 and 5 record the recognition rate and the time consumption as the cell and block sizes change, respectively. The items for which the corresponding recognition rates are bigger than 0.96 are in bold. We can see that the recognition rate is no less than 0.96 when the cell size is 4-8 pixels, the block size is 2-7 cells and the block size is no more than 32 pixels. In addition, although the longer SAR-HOG feature associated with smaller cell and block sizes tends to lead to a higher recognition rate, it also consumes more time. Therefore, we choose 8 × 8 pixel cells and 4 × 4 cell blocks in the experiments as a compromise between recognition rate and time consumption. Table 6 shows that setting the stride to 16 pixels (i.e., half of the block size) leads to desirable performance.
As for the parameters λ and b in SDDL, λ is set to the empirical value 0.3 [25], and Figure 4a shows that b = 0.1 is an ideal value. As for the size of the dictionary, Figure 4b shows that setting $P_k$, $k = 0, 1, \cdots, K$, to 96 works best. For the stopping criterion, we find that the performance of Algorithm 1 is satisfactory within 20 iterations. In addition, small thresholds thres_D and thres_A, e.g., $10^{-4}$, can guarantee the convergence of Algorithm 1. However, such a setting generally results in more than one hundred iterations. Therefore, we simply set the maximum number of iterations to M = 20 in the experiments.
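To make the trade-off between cell/block size and SAR-HOG feature length concrete, the sketch below counts descriptor entries under the usual HOG layout (overlapping square blocks, one histogram per cell). Two things here are assumptions rather than statements from the text: that SAR-HOG follows this standard block layout, and that the input is a square 128 × 128 pixel image chip. The function name `sar_hog_length` is hypothetical.

```python
def sar_hog_length(image_size, cell=8, block_cells=4, stride=16, bins=11):
    """Length of a HOG-style descriptor for a square image.

    image_size  : side length of the image in pixels (assumed square)
    cell        : side length of a cell in pixels
    block_cells : side length of a block in cells
    stride      : block stride in pixels
    bins        : number of orientation bins per cell histogram
    """
    block_px = cell * block_cells
    if block_px > image_size:
        raise ValueError("block size is bigger than the image size")
    blocks_per_side = (image_size - block_px) // stride + 1
    return blocks_per_side ** 2 * block_cells ** 2 * bins

# Chosen setting: 8 x 8 pixel cells, 4 x 4 cell blocks, stride of 16 pixels, 11 bins.
chosen = sar_hog_length(128)                                      # 7 * 7 * 16 * 11 = 8624
# Smaller cells and blocks (stride again half the block size) give a much longer feature.
finer = sar_hog_length(128, cell=4, block_cells=2, stride=4)      # 31 * 31 * 4 * 11 = 42284
```

Under these assumptions, the finer setting yields a feature almost five times longer, which is in line with the higher time consumption reported for smaller cell and block sizes.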

Effectiveness of SAR-HOG
The effectiveness of the SAR-HOG feature is first evaluated using the experiment setups under SOC and EOC_d. Specifically, two commonly-used features, i.e., raw intensity images [1] and wavelet decomposition images [2], are compared with the SAR-HOG feature, with SDDLSR used as the common classifier. The "intensity image" feature is self-explanatory. The "wavelet decomposition images" feature actually consists of the LL, LH and HL sub-bands (L = low, H = high) obtained after a multi-level wavelet decomposition using the 2D reverse biorthogonal wavelet [2]. The parameters used in computing SAR-HOG are listed in Table 3. SDDLSR is implemented as in Algorithm 2, the fixed parameters of which are also listed in Table 3.
The results obtained by using Algorithm 2 with the intensity image as the input feature under SOC and EOC_d are recorded in Tables 7 and 8, respectively. The results with wavelet decomposition images as the input feature under SOC and EOC_d are recorded in Tables 9 and 10, respectively. The results with SAR-HOG as the input feature under SOC and EOC_d are recorded in Tables 11 and 12, respectively. The results include the confusion matrices and the overall recognition rates. The confusion matrix is a matrix of fractions, with each column representing a possible decision class and each row representing a class that is presented to the ATR system. The diagonal elements of the confusion matrix represent the fractions of correct decisions, and the mean value of the diagonal elements is the overall recognition rate.
By comparing the results in Tables 7-12, we can see that the SAR-HOG feature leads to the best performance under SOC and EOC_d when the same classifier, SDDLSR, is used. For the SOC scenario, the recognition rates using raw intensity images and wavelet decomposition images are 0.9406 and 0.9380, compared to 0.9624 using SAR-HOG. For the EOC_d scenario "Training 17°-Testing 30°", the recognition rates using raw intensity images and wavelet decomposition images are 0.9165 and 0.9028, compared to 0.9661 using SAR-HOG. For the most challenging EOC_d scenario "Training 17°-Testing 45°", the recognition rate using SAR-HOG is 0.8086, which is 9 and 10.89 percentage points higher than the 0.7186 obtained using raw intensity images and the 0.6997 obtained using wavelet decomposition images, respectively. Thus, we can conclude that a good feature can indeed improve the SAR ATR performance, and SAR-HOG has remarkable merit. The following experiments are all based on SAR-HOG, and they further verify the effectiveness of SAR-HOG for SAR ATR.
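The confusion-matrix convention described above (rows are the true classes, columns are the decisions, entries are fractions, and the overall recognition rate is the mean of the diagonal) can be sketched as follows. The function names and the toy labels are illustrative assumptions; the sketch also assumes every class occurs at least once among the true labels.

```python
import numpy as np

def confusion_fractions(y_true, y_pred, n_classes):
    """Row-normalized confusion matrix: rows = true class, columns = decision."""
    counts = np.zeros((n_classes, n_classes))
    for t, p in zip(y_true, y_pred):
        counts[t, p] += 1
    # Each row is divided by the number of test samples of that true class.
    return counts / counts.sum(axis=1, keepdims=True)

def overall_recognition_rate(conf):
    """Mean of the diagonal elements, i.e., the average per-class accuracy."""
    return float(np.mean(np.diag(conf)))

# Toy example with two classes (four samples of class 0, two of class 1).
conf = confusion_fractions([0, 0, 0, 0, 1, 1], [0, 0, 0, 1, 1, 1], n_classes=2)
rate = overall_recognition_rate(conf)  # (0.75 + 1.0) / 2 = 0.875
```

Note that with this definition every class contributes equally, regardless of how many test images it has. The gains quoted for "Training 17°-Testing 45°" are plain differences of such rates: 0.8086 - 0.7186 = 0.0900 and 0.8086 - 0.6997 = 0.1089.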

Ten Targets' ATR under SOC
For the problem of ten targets' ATR under SOC, we take SAR-HOG as the input feature and compare the proposed SDDLSR classifier with the SVM, kNN, SRC and LCK-SVD classifiers. The parameters in computing SAR-HOG and the parameters used in SDDLSR (Algorithm 2) are listed in Table 3. The implementations of SVM, kNN, SRC and LCK-SVD are illustrated in Section 5.7. The experimental results are recorded in Tables 11 and 13-16.

Time Consumption
The time consumption in SAR ATR tasks includes two main parts: the training time and the testing time. Generally, the classifier can be trained off-line, whereas the testing time plays an important role in the real-time operation of SAR ATR. For the proposed method, the training time is the time consumption of Algorithm 1, and the testing time is the time consumption of sparse representation classification (see Equation (20)). In the proposed method, the SAR-HOG calculation and the gradient descent algorithm are written in our unoptimized MATLAB code; the LARS algorithm in Algorithm 2 is implemented by the SPAMS software [31]. For the reference methods, SVM is implemented by the well-known LIBSVM software [32]; kNN is implemented by the functions in the MATLAB Statistics and Machine Learning Toolbox; SRC is implemented by the SPAMS software; LCK-SVD is implemented by the code supplied in [27]. The time consumptions of the different methods under "Training 17°-Testing 30°" are recorded in Table 21. It can be seen that the computing time of SAR-HOG and the training time of the proposed method are acceptable, and the testing performance is satisfactory. Note that the training stage of SRC is the dictionary construction by simply stacking the training samples, while the training stage of LCK-SVD is the discriminative dictionary learning.

Conclusions
In this paper, we proposed a SAR ATR method via supervised discriminative dictionary learning and sparse representation based on the SAR-HOG feature. Firstly, we adapted a local feature named HOG to SAR images by using the ratio-based gradient definition to deal with speckle noise. Then, we combined the merits of discriminative dictionary learning and supervised dictionary learning to propose supervised discriminative dictionary learning (SDDL). Using SDDL, we obtained the discriminative dictionary and the classifier simultaneously. Finally, we sparsely decomposed the test sample onto the learned dictionary and identified the test sample based on the sum of the reconstruction error and the classification error.
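As a rough illustration of the final step summarized above, the sketch below sparsely codes a test sample over a dictionary and picks the class that minimizes the sum of a class-wise reconstruction error and a classification error. It is one plausible reading only, not the paper's Equation (20): the per-class cost, the weight `weight`, the labelling of atoms by class and the use of ISTA in place of LARS are all assumptions, and the names `ista_lasso` and `classify_by_summed_error` are hypothetical. Only λ = 0.3 is taken from the parameter setting.

```python
import numpy as np

def ista_lasso(D, x, lam=0.3, n_iter=200):
    """Sparse code of x over D for 0.5*||x - D a||^2 + lam*||a||_1 (ISTA, a stand-in for LARS)."""
    step = 1.0 / np.linalg.norm(D, 2) ** 2  # inverse Lipschitz constant of the gradient
    a = np.zeros(D.shape[1])
    for _ in range(n_iter):
        z = a - step * (D.T @ (D @ a - x))                        # gradient step
        a = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)  # soft threshold
    return a

def classify_by_summed_error(x, D, W, atom_labels, lam=0.3, weight=1.0):
    """Return (predicted class, per-class costs) for test sample x.

    D           : dictionary, one atom per column
    W           : linear classifier acting on sparse codes, shape (K, number of atoms)
    atom_labels : class index of every atom (assumed class-specific atoms)
    """
    a = ista_lasso(D, x, lam)
    n_classes = W.shape[0]
    costs = np.empty(n_classes)
    for k in range(n_classes):
        a_k = np.where(atom_labels == k, a, 0.0)   # keep only the atoms of class k
        recon_err = np.sum((x - D @ a_k) ** 2)     # reconstruction error
        y_k = np.eye(n_classes)[k]                 # one-of-K indicator vector
        class_err = np.sum((y_k - W @ a) ** 2)     # square-loss classification error
        costs[k] = recon_err + weight * class_err
    return int(np.argmin(costs)), costs
```

With an identity dictionary whose first two atoms belong to class 0 and last two to class 1, a test sample equal to the third atom is assigned to class 1, because both error terms are small only for that class.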
In order to demonstrate the effectiveness and robustness of the proposed method, we performed extensive experiments on the MSTAR database under SOC and EOC. We also used several classification methods, including SVM, kNN, SRC and LCK-SVD, as reference methods. From the experimental results, we can draw the following conclusions: (1) SAR-HOG can effectively capture the features embedded in the stable pixels corresponding to stable structures of targets; (2) SDDL can further capture the subtle differences between the different classes, and the proposed strategy can effectively simplify the optimization problem; (3) compared with the best published results (see Figure 7 and Table VII in [3]), the proposed method can obtain a higher recognition rate, especially for the EOC scenario "Training 17°-Testing 45°".
In the future, the proposed method can be extended to other applications. On the one hand, SAR-HOG can be used as an effective local feature for target detection, image indexing, SAR image registration, and so on. On the other hand, SDDL can be used to learn a representative and discriminative dictionary for discriminative tasks, and SDDLSR can be taken as a general classifier to deal with target detection (two-category classification) and SAR image classification problems.

Figure 1. Scheme of the ratio of average (ROA). (a) The ratio of the local means for the horizontal direction. (b) The ratio of the local means for the vertical direction.
$Y_k$ is the class label matrix associated with $X_k$, and $y_k$ is usually denoted by a one-of-K indicator vector; $L(\cdot)$ measures the classification error between $Y_k$ and the prediction of the classifier with model parameter $W \in \mathbb{R}^{K \times P}$ based on $A_k$, and $L(\cdot)$ can be a logistic loss function, hinge loss function or square loss function; γ and ρ are regularization parameters; the last term is a regularizer for stability reasons.

Algorithm 2: SAR ATR via SDDLSR. Input: slice images of ROIs from the training set and the testing set, and class labels in the training set.

Figure 5. Recognition rates of different methods versus the training set.

Table 1. The number of images for training and testing for the ten targets under standard operating conditions (SOC).
For EOC_d, the two test cases (30° and 45° depression angles) are denoted by "Training 17°-Testing 30°" and "Training 17°-Testing 45°", respectively. The number of images for training and testing for the four targets is tabulated in Table 2.

Table 2. The number of images for training and testing for the four targets under extended operating conditions (EOC).

Table 3. The parameters used in the proposed method.

Table 4. The recognition rate versus different (cell, block)s. denotes that the block size is bigger than the image size. The items for which recognition rates are bigger than 0.96 are in bold.

Table 5. The time consumption (min) per iteration of Algorithm 1 versus different (cell, block)s. denotes that the block size is bigger than the image size. The items for which the corresponding recognition rates in Table 4 are bigger than 0.96 are in bold.

Table 6. The recognition rate versus the stride.

Table 7. The result of supervised discriminative dictionary learning sparse representation (SDDLSR) based on the raw intensity image under SOC.

Table 8. The result of SDDLSR based on the raw intensity image under EOC_d.

Table 9. The result of SDDLSR based on the wavelet decomposition image under SOC.

Table 10. The result of SDDLSR based on the wavelet decomposition image under EOC_d.

Table 11. The result of SDDLSR based on SAR-HOG under SOC.

Table 12. The result of SDDLSR based on SAR-HOG under EOC_d.

Table 13. The result of SVM based on SAR-HOG under SOC.

Table 14. The result of kNN based on SAR-HOG under SOC.

Table 15. The result of sparse representation classifier (SRC) based on SAR-HOG under SOC.

Table 21. The time consumption of different methods (s).