SAR Target Recognition via Joint Sparse and Dense Representation of Monogenic Signal

Abstract: Synthetic aperture radar (SAR) target recognition under extended operating conditions (EOCs) is a challenging problem due to the complex application environment, especially when the training samples contain insufficient target variations or corrupted SAR images. This paper proposes a new strategy to solve these problems for target recognition. The SAR images are first characterized by multi-scale components of the monogenic signal. The generated monogenic features are decomposed to learn a class dictionary and a shared dictionary, which represent the possible intraclass variation information and the common information, respectively. Moreover, a sparse representation over the class dictionary and a dense representation over the shared dictionary are jointly employed to represent a query sample for classification. The validity of the proposed strategy, termed JMSDR, is demonstrated with multiple comparative experiments on the moving and stationary target acquisition and recognition (MSTAR) database. The experimental results indicate that JMSDR can effectively deal with the SAR target recognition task under EOCs.


Introduction
Synthetic aperture radar (SAR) is an active sensor, which has the ability to provide all-weather, day-and-night, high-resolution imagery [1]. Therefore, SAR imagery is widely applied in civilian and military fields, for example, reconnaissance, exploration, surveillance, and, especially, target recognition [2][3][4]. SAR target recognition has been studied extensively in the last several decades, but it is still one of the most challenging tasks to recognize SAR images effectively under extended operating conditions (EOCs), since targets exhibit variations in configuration, version, occlusion, etc. under EOCs [5][6][7].
The traditional target recognition method, i.e., the template-based strategy [8], is ineffective under EOCs, as slight changes in configuration or occlusion may give rise to significantly different scattering phenomenology, making it hard to quantify the similarity between the templates and the query sample. To enhance the performance of target recognition, several feature-based methods have been proposed to characterize SAR images, such as global or local structure descriptors [9][10][11], attributed scattering centers [12,13] and filter banks [3,14]. In recent years, the monogenic signal, which is a multidimensional analytic signal, has been employed to characterize SAR images due to its rotation invariance. For example, Dong et al. introduced the monogenic components to describe SAR images for target recognition by feeding them into the framework of sparse representation modeling [14]. In [15], the monogenic components at different scales are united as a region covariance matrix for classification. Zhou et al. [16] presented a scale selection model, where the specific monogenic component features are produced before classification. Zhou et al. [17] presented the feature fusion of multi-scale monogenic components by 2D canonical correlation analysis for SAR target recognition. These studies prove the ability of the sparse representation of monogenic components in SAR target recognition. However, these methods directly produce the dictionary from the training samples in sparse representation modeling; the correlated atoms in such a dictionary limit target reconstruction and discrimination.
To overcome these problems, several dictionary learning methods have been developed. Ramirez et al. developed a dictionary learning framework to reduce the reconstruction error by decreasing the coherence between the sub-dictionaries [18]. Mailhe et al. presented an incoherent K-singular value decomposition (Incoherent KSVD) method, in which the coherence is reduced by rotating the dictionary atoms for sparse representation [19]. Jiang et al. introduced a label-information-constrained dictionary to strengthen the discriminative ability of the dictionary [20]. Song et al. combined the discriminative dictionary and the classifier parameters in a joint optimization procedure [4]. To further enhance the robustness of target recognition under EOCs, Dong et al. employed sparsity and low-rank regularization to constrain the objective function for dictionary learning, as it has been verified that the low-rank representation can find the underlying structure of images in noisy signals [21]. Deng et al. proposed an extended SRC (ESRC) method, in which an auxiliary intraclass variant dictionary is learned to represent possible variations [22]. Thereafter, based on ESRC, a Superposed SRC (SSRC) method was proposed, whose dictionary is replaced by a prototype dictionary composed of the class mean matrix of each class [23]. Lai et al. [24] proposed a hybrid representation with a class-specific dictionary and a common intraclass variation dictionary, which shows good performance for face recognition. Li et al. [25] presented a coupled dictionary that could maximize the differences and reduce the effect of similarities among targets.
Although these methods achieve a certain improvement in recognition performance, they cannot guarantee a valid representation for a query sample under EOCs; for example, when the variations of some classes in the training samples are insufficient, or when the training samples suffer occlusion and noise corruption, the performance of target recognition is limited. As a useful technique for data representation, low-rank and sparse decomposition has been widely used in many fields, such as medical image reconstruction [26], face recognition [27] and maritime surveillance [28]. Recently, low-rank and sparse decomposition has been successfully applied to SAR image reconstruction and to target imaging and tracking [29]. It has been confirmed that the common information of targets in the same class forms a low-rank matrix, while the unique information of targets in different classes is sparse.
Inspired by the aforementioned works, this paper presents a target recognition strategy, i.e., joint sparse and dense representation of monogenic signal (JMSDR), to solve these problems. To capture the spatial localization and broad spectral information of SAR images, the multi-scale components of the monogenic signal are employed to produce the monogenic features. Different from directly using the monogenic features to generate a low-rank dictionary for sparse representation, a class dictionary and a shared dictionary are learned through decomposing the monogenic features. Specifically, the class dictionary focuses on the difference information among the classes, while the shared dictionary reflects the common information between different classes. For a vehicle target example, the difference information is used to distinguish different vehicle targets, such as a tank from an armored vehicle, while the common information distinguishes whether the target is a vehicle or an airplane. Moreover, a query sample is jointly represented by the sparse representation over the class dictionary and the dense representation over the shared dictionary. Hence, the query sample can be better represented with the help of the variation information of other classes when the target variations in its specific class of training samples are insufficient. Additionally, the discriminability of the learned dictionary is promoted by the low-rank constraint. Finally, the decision is made by selecting the class with the minimal reconstruction error. A block diagram of the proposed strategy is shown in Figure 1.
In the following, Section 2 first introduces the feature representation of SAR images based on the monogenic signal, and then presents the joint sparse and dense representation of monogenic signal strategy. Section 3 evaluates the validity of the proposed strategy with extensive comparative experiments under EOCs. The conclusion is drawn in Section 4.

Monogenic Signal Representation of SAR Images

Given a SAR image, the monogenic signal is obtained by band-pass filtering the image at s scales and applying the Riesz transform to each band-pass response. Denoting the ith-scale band-pass response by f_i and its two Riesz-transform components by f_{x,i} and f_{y,i}, the local amplitude, local phase and local orientation at scale i are

A_i = sqrt( f_i^2 + f_{x,i}^2 + f_{y,i}^2 ), (1)
ϕ_i = atan2( sqrt( f_{x,i}^2 + f_{y,i}^2 ), f_i ), (2)
θ_i = atan2( f_{y,i}, f_{x,i} ), (3)

where f^i_M (i ∈ {1, . . . , s}) denotes the ith-scale monogenic feature vector. It is made up of the local amplitude A_i, local phase ϕ_i and local orientation θ_i in order, i.e., f^i_M = [A_i, ϕ_i, θ_i]. The augmented monogenic feature of an image x is the concatenation across scales,

X(x) = [f^1_M, f^2_M, . . . , f^s_M]. (4)

Figure 2 illustrates an example of the monogenic components of a SAR image. The image characterized by the monogenic signal at multiple scales can be interpreted as the combination of multiple sub-bands from different information spaces. Recently, many methods have been proposed to use these synthetic or transformed sub-bands for target detection and recognition [31,32].
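As an illustration, the multi-scale monogenic feature extraction described above can be sketched in NumPy. The log-Gabor band-pass filter and its parameters (`wavelength0`, `mult`, the 0.55 bandwidth ratio) are common choices for monogenic-signal implementations, not values fixed by this paper:

```python
import numpy as np

def monogenic_features(img, scales=(1, 2, 3), wavelength0=6.0, mult=2.0):
    """Sketch: multi-scale monogenic features (amplitude, phase, orientation).

    Band-pass filtering uses log-Gabor radial filters, a common choice;
    the paper's exact filter parameters are assumptions here.
    """
    rows, cols = img.shape
    F = np.fft.fft2(img)
    # Normalized frequency grids
    u = np.fft.fftfreq(cols)[None, :]
    v = np.fft.fftfreq(rows)[:, None]
    radius = np.sqrt(u**2 + v**2)
    radius[0, 0] = 1.0                                  # avoid division by zero at DC
    # Riesz transform transfer functions
    H1 = -1j * u / radius
    H2 = -1j * v / radius
    feats = []
    for s in scales:
        wl = wavelength0 * mult ** (s - 1)              # wavelength at this scale
        # Log-Gabor radial band-pass filter centered at frequency 1/wl
        log_gabor = np.exp(-(np.log(radius * wl))**2 / (2 * np.log(0.55)**2))
        log_gabor[0, 0] = 0.0
        bp = np.real(np.fft.ifft2(F * log_gabor))       # even (band-pass) part f_i
        rx = np.real(np.fft.ifft2(F * log_gabor * H1))  # odd Riesz parts f_{x,i}, f_{y,i}
        ry = np.real(np.fft.ifft2(F * log_gabor * H2))
        A = np.sqrt(bp**2 + rx**2 + ry**2)              # local amplitude, Eq. (1)
        phase = np.arctan2(np.sqrt(rx**2 + ry**2), bp)  # local phase, Eq. (2)
        orient = np.arctan2(ry, rx)                     # local orientation, Eq. (3)
        feats.append(np.concatenate([A.ravel(), phase.ravel(), orient.ravel()]))
    return np.concatenate(feats)  # augmented feature vector X(x), Eq. (4)
```

Each image thus yields one long vector stacking amplitude, phase and orientation over all scales, which is the feature fed to the dictionary learning stage.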

Low-Rank Dictionary Learning
Let X = [x_1, x_2, . . . , x_n] ∈ R^{m×n} be n training samples; the monogenic feature matrix of the training samples generated by Equation (4) is

X(X) = [X(x_1), X(x_2), . . . , X(x_n)]. (5)

In [14], the monogenic feature matrix is directly utilized as a dictionary, and a query sample is then sparsely represented and assigned to the class for which the reconstruction error is minimal. However, such a prespecified dictionary cannot guarantee a valid representation for a query sample when the variations of some classes in the training samples are not sufficient. Additionally, it is difficult to perform well when the training samples have occlusion and noise corruption.
To overcome these problems, a class dictionary and a shared dictionary are learned to jointly represent a query sample through decomposing the monogenic features. For a vehicle target, the class dictionary focuses on the class information in the training samples, for example, whether the vehicle target is a tank or an armored vehicle. The shared dictionary pays more attention to the common information of the vehicle targets provided in the training samples, such as phase and direction. Hence, when the query sample cannot be sufficiently represented by the targets from the same class, it can be further linearly represented by the common information provided by the targets of the other classes in the shared dictionary. Therefore, the monogenic feature matrix X(X) can be decomposed as

X(X) = ΦA + ΨZ + E, (6)

where Φ is the class dictionary, A = {a_ij, i, j = 1, . . . , n} is a sparse coefficient matrix associated with the class dictionary, Ψ is the shared dictionary, Z is a coefficient matrix corresponding to the shared dictionary, and E is the error matrix. Since the class dictionary only contains components of specific classes, the sparse coefficients should coincide with the class information of the samples. Hence, the sparse coefficient a_ij ∈ A meets the following condition:

a_ij may be nonzero only when the ith atom and the jth sample belong to the same class; otherwise a_ij = 0. (7)

According to the above condition, the sparse coefficient matrix A is block-diagonal and determined by the class labels. Absorbing the fixed coefficient matrix A into the class dictionary, Equation (6) can be simplified as follows:

X(X) = Φ + ΨZ + E. (8)

Based on manifold learning theory, the samples from the same class all reside in a low-dimensional subspace [33]. Therefore, the class matrix Φ should be approximately low rank. Accordingly, ΨZ, Ψ and E are also low-rank matrices. To make the representation error E for the query sample as small as possible, the coefficient matrix Z is reasonably assumed to be not sparse but dense. Hence, the low-rank regularized sparse and dense dictionary learning model based on monogenic features is expressed as

min_{Φ,Ψ,Z,E} ||Φ||_* + λ||Ψ||_* + γ||Z||_F^2 + τ||E||_1  s.t.  X(X) = Φ + ΨZ + E, (9)

where λ, γ and τ are regularization parameters, || · ||_* represents the nuclear norm, which is a convex relaxation of the low-rank constraint on a matrix, and || · ||_F is the Frobenius norm. Since the objective model in Equation (9) is nonconvex, it is solved by three sub-optimization procedures.
(1) Update the shared dictionary Ψ. The class dictionary Φ and the dense coefficient matrix Z are fixed. The optimization problem in Equation (9) is then reduced to

min_{Ψ,E} ||Ψ||_* + η||E||_1  s.t.  X(X) − Φ = ΨZ + E, (10)

where η is a regularization parameter. The problem in Equation (10) is a classical robust principal component analysis (RPCA) problem [34]. The inexact ALM method [35,36] is utilized to solve it. To make Equation (10) separable, an auxiliary variable J is employed. Hence, Equation (10) is transformed as

min_{Ψ,E,J} ||J||_* + η||E||_1  s.t.  X(X) − Φ = ΨZ + E, Ψ = J. (11)

The augmented Lagrangian function of Equation (11) is written as

L(Ψ, E, J, Y_1, Y_2) = ||J||_* + η||E||_1 + ⟨Y_1, X(X) − Φ − ΨZ − E⟩ + ⟨Y_2, Ψ − J⟩ + (µ/2)( ||X(X) − Φ − ΨZ − E||_F^2 + ||Ψ − J||_F^2 ), (12)

where Y_1 and Y_2 are the Lagrange multipliers, and µ > 0 is a penalty parameter. By fixing Ψ and E, the closed-form solution of J is obtained by singular value thresholding,

J = U D_{1/µ}(Σ) V^T, with (U, Σ, V) = svd(Ψ + Y_2/µ), (13)

where D_ε(r) = sgn(r) max(|r| − ε, 0) is the shrinkage operator. Similarly, for updating E, the closed-form solution is the elementwise shrinkage

E = D_{η/µ}( X(X) − Φ − ΨZ + Y_1/µ ). (14)

Fixing J and E, the closed-form solution for updating Ψ is the least-squares solution

Ψ = [ (X(X) − Φ − E + Y_1/µ) Z^T + J − Y_2/µ ] (ZZ^T + I)^{−1}, (15)

which, iterated together with the multiplier updates, yields the solution of the optimization problem in Equation (12).

(2) Update the dense coefficient matrix Z. The class dictionary Φ and the shared dictionary Ψ are fixed. The optimization problem in Equation (9) can be replaced by

min_{Z,E} γ||Z||_F^2 + τ||E||_1  s.t.  X(X) − Φ = ΨZ + E. (16)

Similarly, this problem can also be solved by the inexact ALM method, whose augmented Lagrangian function is expressed as

L(Z, E, Y) = γ||Z||_F^2 + τ||E||_1 + ⟨Y, X(X) − Φ − ΨZ − E⟩ + (µ/2)||X(X) − Φ − ΨZ − E||_F^2. (17)

The procedure for solving the optimization problem in Equation (17) is summarized in Algorithm 1.

Algorithm 1
The procedure of solving the optimization problem in Equation (17).
Input: the monogenic feature matrix of the training samples X(X), class dictionary Φ, shared dictionary Ψ, parameters γ and τ, and threshold ε.
Output: dense coefficient matrix Z.
Initialization: Z = 0, E = 0, Y = 0, µ > 0.
While not converged:
1. Update Z by minimizing Equation (17) with E fixed (a regularized least-squares problem with a closed-form solution);
2. Update E by the elementwise shrinkage operator D_{τ/µ}(·);
3. Update the Lagrange multiplier Y ← Y + µ(X(X) − Φ − ΨZ − E) and increase the penalty µ;
4. Check convergence: ||X(X) − Φ − ΨZ − E||_F < ε.

(3) Update the class dictionary Φ. The shared dictionary Ψ and the dense coefficient matrix Z are fixed. The optimization problem in Equation (9) is simplified as

min_{Φ,E} ||Φ||_* + η||E||_1  s.t.  X(X) − ΨZ = Φ + E, (18)

where η is a regularization parameter. The above equation is a classical robust principal component analysis model, which can be transformed into the problem of minimizing the Lagrangian function

L(Φ, E, Y) = ||Φ||_* + η||E||_1 + ⟨Y, X(X) − ΨZ − Φ − E⟩ + (µ/2)||X(X) − ΨZ − Φ − E||_F^2. (19)

Equation (19) is an unconstrained problem; thus, it can be solved by separately updating the variables and then updating the Lagrange multiplier Y_{k+1}. Specifically, the update process is as follows:

Φ_{k+1} = U D_{1/µ}(Σ) V^T, with (U, Σ, V) = svd( X(X) − ΨZ − E_k + Y_k/µ ), (20)
E_{k+1} = D_{η/µ}( X(X) − ΨZ − Φ_{k+1} + Y_k/µ ), (21)

followed by Y_{k+1} = Y_k + µ( X(X) − ΨZ − Φ_{k+1} − E_{k+1} ). Iterating these updates until convergence yields the solution of the optimization problem in Equation (19).
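Both the Ψ- and Φ-updates reduce to RPCA-type programs solved by inexact ALM. The sketch below solves the Equation (18) form, min ||L||_* + η||E||_1 s.t. M = L + E, where M stands for the fixed residual X(X) − ΨZ and L plays the role of Φ; the initialization and penalty schedule follow standard inexact-ALM heuristics rather than the paper's exact settings:

```python
import numpy as np

def rpca_ialm(M, eta=None, rho=1.5, tol=1e-7, max_iter=500):
    """Sketch of the inexact ALM solver for the RPCA sub-problem
    min ||L||_* + eta*||E||_1  s.t.  M = L + E.
    Defaults are common RPCA heuristics, not the paper's settings."""
    m, n = M.shape
    if eta is None:
        eta = 1.0 / np.sqrt(max(m, n))        # standard RPCA weight
    norm2 = np.linalg.norm(M, 2)              # largest singular value
    Y = M / max(norm2, np.abs(M).max() / eta) # dual variable init
    mu = 1.25 / norm2                          # penalty init
    E = np.zeros_like(M)
    for _ in range(max_iter):
        # L-update: singular value thresholding (cf. Eq. (20))
        U, s, Vt = np.linalg.svd(M - E + Y / mu, full_matrices=False)
        L = (U * np.maximum(s - 1.0 / mu, 0)) @ Vt
        # E-update: elementwise soft thresholding (cf. Eq. (21))
        T = M - L + Y / mu
        E = np.sign(T) * np.maximum(np.abs(T) - eta / mu, 0)
        R = M - L - E                          # primal residual
        Y = Y + mu * R                         # multiplier update
        mu = min(mu * rho, 1e7)                # penalty schedule
        if np.linalg.norm(R, 'fro') <= tol * np.linalg.norm(M, 'fro'):
            break
    return L, E
```

The same singular-value-thresholding and shrinkage steps appear in Equations (13)-(14), so this routine illustrates the computational core of all three sub-problems.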
By sequentially solving Equations (10), (16) and (18), the shared dictionary and the class dictionary of monogenic features are acquired. The class dictionary Φ contains more category information of the targets, and the shared dictionary Ψ contains more common information. More importantly, the dense coefficient matrix Z improves the ability of the dictionary to represent query samples, compared with directly using the training samples themselves as a dictionary. Hence, the problems of insufficient variations in the training samples, occlusion and noise corruption can be alleviated. The flow diagram of low-rank dictionary learning is shown in Figure 3. The objective function is solved by three sub-optimization procedures: the variables Φ, Ψ, and Z are alternately updated by minimizing the corresponding sub-problems, and when the termination condition is met, the solution of the objective function is obtained.

Implementation of Target Recognition
The low-rank dictionary learning is presented in the previous section. Then, target classification with the learned dictionaries including the shared dictionary and the class dictionary is implemented.
Given the training samples X = [x_1, x_2, . . . , x_n] ∈ R^{m×n} and their class labels from C classes, we first compute the augmented monogenic features of the training samples by Equation (4). These features are then used to produce a class dictionary and a shared dictionary; specifically, the two learned dictionaries are obtained by solving the low-rank dictionary learning model in Equation (9). Given a query sample y, its monogenic feature is X(y). The query sample can be encoded over the class dictionary Φ and the shared dictionary Ψ as a linear combination of their atoms.
X(y) = Φα + Ψz + e, (22)

where α is a sparse coefficient vector related to the class dictionary Φ, and z is a dense coefficient vector corresponding to the shared dictionary Ψ. Generally, a sparse solution is obtained by constraining the feasible set with a sparsity constraint; here, ℓ1-norm minimization is applied to the sparse representation α. In addition, the ℓ2-norm is used to realize the dense representation z. To make the model robust to occlusion and noise corruption, the error e is minimized under the ℓ1-norm. Therefore, the joint sparse and dense representation of the query sample can be described as follows:

min_{α,z,e} ||α||_1 + κ||z||_2^2 + φ||e||_1  s.t.  X(y) = Φα + Ψz + e, (23)

where κ and φ are regularization parameters. To solve the problem in Equation (23), we also adopt the inexact ALM method. Thus, the optimization problem can be transferred into an augmented Lagrangian function.
L(α, z, e, p) = ||α||_1 + κ||z||_2^2 + φ||e||_1 + ⟨p, X(y) − Φα − Ψz − e⟩ + (µ/2)||X(y) − Φα − Ψz − e||_2^2, (24)

where µ > 0 is a penalty parameter and p is the Lagrange multiplier. By minimizing the augmented Lagrangian function, the variables are updated alternately while keeping the other variables unchanged. The iteration stops when the convergence condition is met.
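A simplified sketch of this joint coding step is given below. For brevity it replaces the ℓ1 error term of Equation (23) with an ℓ2 data fit, solved by alternating a closed-form ridge update for the dense code z with proximal-gradient (soft-thresholding) steps on the sparse code α; the paper instead uses the inexact ALM method, and the parameter values here are purely illustrative:

```python
import numpy as np

def joint_sparse_dense_code(x, Phi, Psi, kappa=0.1, phi=0.1, n_iter=300):
    """Sketch of joint coding: sparse code over the class dictionary Phi,
    dense code over the shared dictionary Psi. Solves the l2-fit surrogate
        min 0.5||x - Phi a - Psi z||^2 + kappa*||a||_1 + phi*||z||^2
    (an assumption for brevity; the paper minimizes an l1 error term)."""
    D = np.hstack([Phi, Psi])
    step = 1.0 / (np.linalg.norm(D, 2) ** 2)          # 1 / Lipschitz constant
    a = np.zeros(Phi.shape[1])
    z = np.zeros(Psi.shape[1])
    G = Psi.T @ Psi + 2 * phi * np.eye(Psi.shape[1])  # Gram matrix for z-update
    for _ in range(n_iter):
        # closed-form ridge update for the dense code z
        z = np.linalg.solve(G, Psi.T @ (x - Phi @ a))
        # proximal (soft-threshold) gradient step for the sparse code a
        r = x - Phi @ a - Psi @ z
        g = a + step * (Phi.T @ r)
        a = np.sign(g) * np.maximum(np.abs(g) - step * kappa, 0)
    return a, z
```

The alternation mirrors the ALM scheme in spirit: the dense part absorbs the common structure while the sparse part stays class-selective.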
Since the class dictionary includes the class information of the samples, the sparse representation part coincides with the class labels. Thus, the decision is made by finding the class with the minimal reconstruction error,

identity(y) = arg min_c || X(y) − Φ δ_c(α) − Ψz ||_2, (25)

where δ_c(α) keeps only the coefficients of α associated with class c and sets the rest to zero. The overall procedure of the proposed strategy is summarized as follows.
Input: training samples X = [x_1, x_2, . . . , x_n] ∈ R^{m×n} and a query sample y.
Output: class label of y.
Step 1: Compute multi-scale monogenic features of training samples {X (x i )} n i=1 .
Step 2: Compute multi-scale monogenic features of query sample X (y).
Step 3: Learn a class dictionary Φ and a shared dictionary Ψ with Equation (9).
Step 4: Compute the sparse representation and the dense representation of the query sample over the learned dictionaries, as shown in Equation (23).
Step 5: Predict the identity by seeking the minimal reconstruction error, as defined in Equation (25).
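The decision rule of Step 5 can be sketched as follows; `labels` maps each class-dictionary atom to its class, and δ_c is realized by zeroing the coefficients outside class c (the function and variable names are illustrative):

```python
import numpy as np

def classify(x, Phi, Psi, labels, a, z):
    """Sketch of the Eq. (25)-style decision rule: keep only the sparse
    coefficients of class c and pick the class whose partial reconstruction
    Phi @ delta_c(a) + Psi @ z has the minimal residual."""
    labels = np.asarray(labels)
    best_c, best_err = None, np.inf
    for c in np.unique(labels):
        a_c = np.where(labels == c, a, 0.0)   # delta_c(a): class-c coefficients only
        err = np.linalg.norm(x - Phi @ a_c - Psi @ z)
        if err < best_err:
            best_c, best_err = c, err
    return best_c
```

Note that the shared-dictionary term Ψz is kept in every class hypothesis, so the common information never biases the decision toward any particular class.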

Experiments and Discussion
This section evaluates the performance of the proposed strategy on the MSTAR database. All target image chips cover aspect angles of 0°-359° at depression angles of 15° and 17°. We cropped 64 × 64 pixel patches from the center of the original 128 × 128 pixel images to reduce the influence of the background. A series of experiments was performed under extended operating conditions, including configuration variation, version variation, partial occlusion and noise corruption. The proposed strategy JMSDR was compared with several studied methods in terms of recognition performance: MSRC [14], ESRC [22], FDDL [37], LRSDL [38], SSRC [23], and the all-convolutional network (A-ConvNets) [39]. The scale parameter of the monogenic signal was set to S = 3. The parameters of the proposed strategy JMSDR were experimentally set as κ = φ = 10.
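The center-cropping preprocessing step (128 × 128 chip to 64 × 64) described above can be sketched as:

```python
import numpy as np

def center_crop(chip, size=64):
    """Center-crop a SAR image chip (e.g. 128x128 -> 64x64) to suppress
    background clutter, as in the experimental setup."""
    r, c = chip.shape
    r0, c0 = (r - size) // 2, (c - size) // 2
    return chip[r0:r0 + size, c0:c0 + size]
```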

Preliminary Verification
The performance of the proposed strategy was first assessed on four target classes, i.e., BMP2, BTR60, T72 and T62. The images acquired at 17° and 15° depression angles were used for training and testing, respectively. The types and numbers of samples, as well as the serial numbers of the configurations, are listed in Table 1. The details of the configurations are explained in Section 3.2. Examples of the four target classes are shown in Figure 4. The recognition performance of the different methods is listed in Table 2. The recognition accuracy of the proposed strategy JMSDR is higher than that of MSRC because the discriminative ability is promoted by dictionary learning. Compared to 0.8931 for FDDL, 0.9198 for ESRC, 0.9023 for LRSDL, and 0.9267 for SSRC, the recognition results show that JMSDR outperforms these dictionary learning methods, demonstrating that the joint sparse and dense representation can represent the samples more accurately. The experimental results reveal that the joint sparse and dense representation of multi-scale monogenic components is an effective way to describe targets for classification.

Results on 10-Class Targets
To comprehensively assess the performance of the proposed strategy, we first performed an experiment on 10-class target recognition. The training samples and the test samples were collected at 17° and 15° depression angles, respectively. The specific types of two targets (BMP2 and T72) in the test samples were not covered by the training samples. The types and numbers of training and testing samples can be found in Table 3. In Table 4, we compare the recognition results of MSRC, FDDL, ESRC, LRSDL, SSRC, A-ConvNets and the proposed strategy for the 10-class targets. The best result comes from A-ConvNets, which is 1.96% better than the JMSDR method. Specifically, JMSDR attains a high recognition accuracy of 0.9356; it is fairly comparable to SSRC, and outperforms the MSRC, FDDL, ESRC and LRSDL methods by margins of 0.77%, 4.88%, 0.39% and 2.62%, respectively.

Results on Configuration Variance
Configuration variance is inevitable in SAR target recognition, as targets have physical differences and structural modifications. To evaluate the reliability and stability of JMSDR under the configuration variance EOC, the four-class problem (denoted configuration variance EOC-1), with training and testing samples chosen from BMP2, BTR60, T72 and T62, was considered first. The standard configurations of BMP2 and T72 were used for training, while the other variants were employed for testing, as described in Table 1. Table 5 shows the performance comparison between the proposed strategy JMSDR and the baseline methods. Except for a marginal difference compared with A-ConvNets, the proposed strategy JMSDR achieves better recognition performance than the other methods. Specifically, the recognition accuracies of MSRC, FDDL, ESRC, and LRSDL are all below 0.9. Table 6 gives the confusion matrix of JMSDR. From Table 6, we can see that the performance for BTR60 and T62 is satisfactory, while BMP2 and T72 are more likely to be misclassified. This is because the BMP2 and T72 samples in the training and testing sets are of different types. Compared to the other methods, JMSDR achieves better performance when the configuration variants of targets in the training samples are insufficient. We then considered another four-class configuration variance problem (denoted configuration variance EOC-2). Specifically, the training samples were chosen from BMP2, BRDM2, BTR70 and T72, and the test set comprised two variants of BMP2 and five variants of T72. The detailed information is listed in Table 7. Both configurations differed between the training images and the testing images. Figure 5 shows an example of optical and SAR images for several T72 configuration variants. Table 8 shows the overall recognition accuracies of MSRC, FDDL, ESRC, LRSDL, SSRC and the proposed strategy JMSDR. The confusion matrix of the recognition results for each method is shown in Figure 6.
In each subfigure, the row and column corresponding to the same class give the number of correct classifications. From the confusion matrices, it can be seen that BMP2 is more easily misclassified than T72. This is because the T72 targets share more common information between the testing and training samples; therefore, compared to BMP2, T72 is better represented by the training samples. Table 8 shows that MSRC obtains the worst recognition result, which reveals that the prespecified dictionary cannot deal well with configuration variance. The overall recognition accuracy of JMSDR is 0.9543, which is better than MSRC (0.9148), FDDL (0.9176), ESRC (0.9403), LRSDL (0.9190), and SSRC (0.9279), and is fairly comparable with A-ConvNets. The experimental results confirm the validity of the joint sparse and dense representation, which can deal with configuration variance even under multiple structural variants.

Results on Version Variance
A version variant refers to targets of the same class but built with different blueprints [40]. In practical applications, the deployed targets often come in versions different from those available in the training set. Therefore, the following experiment evaluated the recognition performance of the proposed strategy JMSDR under the version variant EOC. In the MSTAR public dataset, T72 has multiple versions. The official recommendation is that version SN_132 of T72 is set as the training samples, while the other versions of T72, including SN_S7, SN_A32, SN_A62, SN_A63 and SN_A64, are set as testing samples. In addition, BMP2, BRDM2 and BTR70 were included in the training dataset as interference items for target recognition. Since the versions of the targets in the training samples were insufficient, the difficulty of target recognition was increased. All training samples were collected at a 17° depression angle, while the testing samples were collected at both 15° and 17° depression angles. Table 9 lists the detailed information on the training and testing samples. An example of the optical and SAR images of the target under T72 version variants is illustrated in Figure 7. Table 8 shows that the recognition accuracy of MSRC is the lowest. This reveals that the sparse representation of monogenic components alone cannot sufficiently reconstruct the query sample when the information included in the training samples is not enough. Compared to MSRC, the performances of FDDL, ESRC, LRSDL, SSRC and A-ConvNets are significantly better. In contrast, JMSDR represents the query sample more accurately because it learns a class dictionary and a shared dictionary. The accuracy of JMSDR is 0.9967, which is 1.04%, 4.35%, 3.1%, 2.59%, and 0.96% higher than 0.9863 for FDDL, 0.9535 for ESRC, 0.9657 for LRSDL, 0.9708 for SSRC, and 0.9871 for A-ConvNets, respectively. The experimental results indicate that JMSDR can effectively deal with the SAR target recognition task with version variants.
Target  BMP2  BRDM2  BTR70  T72
S7        23      0      0   396
A32       34      5      0   533
A62       32     17      0   524

Target  BMP2  BRDM2  BTR70  T72
S7         1      0      0   418
A32        0      1      1   570
A62        0      2      1

Results on Noise Corruption
Real SAR images usually contain noise due to the inherent imaging mechanism. We added random noise to the four-class targets used in the experiment (see Table 1). Specifically, a percentage of pixels from the original images was randomly chosen and replaced with independent and identically distributed samples drawn from within the range of the image pixel values, as done in several relevant studies [3,17]. The difference, however, is that we randomly corrupted 60% of the training samples as well as the testing samples, rather than corrupting all of the test samples. In comparison, our experimental setup is more demanding.
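A sketch of this corruption model follows, assuming "within the image pixel values" means uniform sampling between the image's minimum and maximum intensities (the paper does not spell out the sampling distribution):

```python
import numpy as np

def corrupt_pixels(img, fraction, rng=None):
    """Replace a given fraction of pixels with i.i.d. values drawn
    uniformly within the image's intensity range (assumed noise model)."""
    rng = np.random.default_rng(rng)
    out = img.copy()
    n = int(round(fraction * img.size))
    idx = rng.choice(img.size, size=n, replace=False)  # distinct pixel positions
    out.flat[idx] = rng.uniform(img.min(), img.max(), size=n)
    return out
```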
Some examples of noise corruption with different noise levels are shown in Figure 9. The percentage of corrupted levels was varied from 10% to 50%. The graph in Figure 10 shows the recognition performance of JMSDR and its four competitors. The proposed strategy outperforms others for all levels of noise corruption. For up to 30% noise corruption, JMSDR correctly identifies over 80% of test samples. Even at 40% corruption, the recognition accuracy of JMSDR is about 10% higher than that of MSRC, FDDL and LRSDL. The results validate that JMSDR is robust towards noise corruption.

Results on Partial Occlusion
The target could be partially occluded by different obstacles under changing environmental conditions, such as trees or manmade buildings. Hence, it is necessary to evaluate the performance of target recognition under partial occlusion. According to the occlusion method in [41], occlusion levels from 0% to 60% were simulated by replacing the target region of each SAR image with background intensities from eight different directions. Figure 11 shows the original SAR images and the partially occluded SAR images at the 25% level from four different directions. In our experiments, we corrupted 60% of the training samples and testing samples, and their occlusion directions were different, which further increased the difficulty of target recognition.
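A simplified sketch of the occlusion simulation is given below; it handles four of the eight directions and models the background by sampling the chip's border pixels, which is an assumption for illustration rather than the exact procedure of [41]:

```python
import numpy as np

def occlude(img, level, direction=1, rng=None):
    """Overwrite `level` (0-1) of the chip from one side with background
    samples. Directions 1-4: left, right, top, bottom (the paper uses
    eight directions, including diagonals)."""
    rng = np.random.default_rng(rng)
    out = img.copy()
    r, c = img.shape
    # crude background model: sample from the chip's border pixels
    border = np.concatenate([img[0, :], img[-1, :], img[:, 0], img[:, -1]])
    w, h = int(round(level * c)), int(round(level * r))
    if direction == 1:                      # from the left
        out[:, :w] = rng.choice(border, size=(r, w))
    elif direction == 2:                    # from the right
        out[:, c - w:] = rng.choice(border, size=(r, w))
    elif direction == 3:                    # from the top
        out[:h, :] = rng.choice(border, size=(h, c))
    elif direction == 4:                    # from the bottom
        out[r - h:, :] = rng.choice(border, size=(h, c))
    return out
```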
The recognition results are shown in Figure 12. The occlusion directions for training samples and testing samples are Directions 1 and 3 in Figure 12a, respectively. In Figure 12b, the occlusion directions are employed at Direction 1 for training samples and Direction 5 for testing samples. Obviously, the performance of JMSDR is significantly better than the studied methods at all levels of occlusion. When the percentage of occlusion increases, the recognition rates of LRSDL and FDDL quickly drop, while the performance of JMSDR is relatively stable. This can be attributed to the dense representation based on the shared dictionary, as the query sample is represented by more samples which have common information. The results fully demonstrate the advantage of shared dictionary. In addition, the results prove that the proposed strategy could deal with partial occlusion.

Conclusions
To handle the extended operating conditions of SAR target recognition, a new strategy is proposed via joint sparse and dense representation of the monogenic signal. Unlike methods in which the monogenic features of training samples are directly utilized to generate a dictionary for sparse representation, a class dictionary and a shared dictionary are learned through decomposing the monogenic features. The combination of the sparse representation over the class dictionary and the dense representation over the shared dictionary provides a better representation of the query sample; hence, the query sample can better coincide with the right class. According to the results of extensive comparative experiments, some conclusions can be drawn. (1) Compared with a prespecified dictionary, the learned dictionaries, including the class dictionary and the shared dictionary, promote the discriminative ability. (2) The joint sparse representation over the class dictionary and dense representation over the shared dictionary improves the recognition accuracy. (3) The proposed strategy can deal with various target variants, including configuration variants and version variants, especially when the variants in the training samples are insufficient. (4) Even with a large amount of noise corruption and partial occlusion in the samples, the proposed strategy still achieves better recognition results, consistently exceeding the competitors.