SAR Image Recognition with Monogenic Scale Selection-Based Weighted Multi-task Joint Sparse Representation

Abstract: The monogenic signal, which is defined as a linear combination of a signal and its Riesz-transformed one, provides a great opportunity for synthetic aperture radar (SAR) image recognition. However, the very large number of components at different scales may impose too heavy a burden on onboard computation. There is great information redundancy in monogenic signals because components at some scales are less discriminative or even have a negative impact on classification. In addition, the heterogeneity of the three types of components lowers the quality of decision-making. To solve these problems, a scale selection method based on a weighted multi-task joint sparse representation is proposed. A scale selection model is designed, and the Fisher score is presented to measure the discriminative ability of the components at each scale. The components with high Fisher scores are concatenated into three component-specific features, and an overcomplete dictionary is built. Meanwhile, the scale selection model produces the weight vector. The three component-specific features are then fed into a multi-task joint sparse representation classification framework, and the final decision is made in terms of accumulated weighted reconstruction error. Experiments on the Moving and Stationary Target Acquisition and Recognition (MSTAR) dataset demonstrate the effectiveness and superiority of our method.


Introduction
Synthetic aperture radar (SAR) automatic target recognition (ATR) is becoming increasingly important as radar technology develops [1], and much research has been based on SAR images [2]. However, traditional pixel intensity cannot always be treated as a reliable feature for classification. In recent years, the monogenic signal has been applied to SAR ATR because of its ability to capture the characteristics of SAR images. The monogenic signal is a generalized extension of the analytic signal to high-dimensional spaces, first introduced by Felsberg and Sommer in 2001 [3]. Analogously to the Hilbert transform used to build the one-dimensional (1-D) analytic signal, the Riesz transform decomposes the original signal orthogonally into three components: local energy (amplitude), local phase, and local orientation. This decomposition is widely used for high-dimensional signal analysis and processing, and the decoupling strategy makes it possible to deal with many problems in image processing, especially when pixel intensity is unreliable for classification. Furthermore, the monogenic signal is able to capture broad spectral information, which is useful for SAR image recognition. The monogenic scale space was subsequently proposed to unify scale-space theory with phase-based signal analysis, so that the scale of analysis can be determined adaptively; decision-making based on such a measurement model becomes more reasonable and the recognition accuracy is improved.
In this paper, a scale selection method based on a weighted multi-task joint sparse representation (abbreviated as WTJSR) is proposed for SAR image recognition. Three components of the monogenic signal at different scales are extracted from the original SAR images; they carry rich information for target classification, but the resulting data set becomes enormous. A scale selection model based on the Fisher discrimination criterion is therefore designed: a global Fisher score is proposed to measure the discriminative ability of the components at each scale, and the higher the score, the more discriminative the components at the corresponding scale. The less discriminative scales are abandoned, and the remaining components are concatenated into three component-specific features. Meanwhile, an adaptive weight vector is provided by the scale selection model. The three component-specific features are then fed into a tri-task joint sparse representation classification framework, and the final decision is made by the accumulated weighted reconstruction error. Our contributions are as follows: (1) we introduce a novel joint sparse representation method (WTJSR) built on the components of the monogenic signal; (2) we propose a scale selection model based on the Fisher discrimination criterion to use the information contained in the monogenic signal effectively, and we establish an adaptive weight vector to account for the heterogeneity of the three component-specific features. The rest of this paper is organized as follows. Section 2 introduces SRC and MTJSRC. Section 3 introduces the monogenic signal, proposes the Fisher discrimination criterion-based monogenic scale selection, and analyzes the WTJSR. Section 4 presents several experiments, and Section 5 provides conclusions.

Related Work
This section briefly introduces two prior concepts, SRC and MTJSRC. Some necessary terms are first described to facilitate the following discussion. Assume that each image is of w × h pixels; every image is reshaped into a column vector in R^m, where m = w × h.

SRC
The sparse signal representation technique has been studied extensively over the last decade, with the development of many theoretical analysis frameworks [22,23] and effective algorithms [24]. Its applications mainly include radar imaging [25,26], image restoration [27], image classification [28,29], and pattern recognition [15,30]. The key idea of the sparse signal model is that a signal can be represented over an overcomplete basis set (dictionary) with few nonzero coefficients, and a convex relaxation strategy is generally adopted to find an optimal solution [31,32].
Let X_k = [x_{k,1}, x_{k,2}, ..., x_{k,n_k}] ∈ R^{m×n_k} be the collection of training samples from the kth class (k = 1, 2, ..., K), and let n = n_1 + n_2 + ... + n_K be the total number of training samples from all classes. Sparse representation is based on the simple assumption that a new query sample lies in the low-dimensional subspace spanned by the training samples of its own class [33-35]. Therefore, a test sample y from the kth class can be represented as

y = X_k α_k = α_{k,1} x_{k,1} + α_{k,2} x_{k,2} + ... + α_{k,n_k} x_{k,n_k},

where α_k = [α_{k,1}, α_{k,2}, ..., α_{k,n_k}]^T ∈ R^{n_k} is the vector of scalar coefficients.
As a matter of fact, the class of the test sample is initially unknown, so the test sample y is represented over the whole training set X = [X_1, X_2, ..., X_K] ∈ R^{m×n}. Based on the whole training set, the test sample can be rewritten as

y = Xα,

where α = [α_1^T, α_2^T, ..., α_K^T]^T ∈ R^n is the coefficient vector. Most of the elements of α are zero, except those associated with the training samples of the kth class. Theoretically, the target recognition problem is thus converted into solving the linear system y = Xα.
Frequently, the solution of y = Xα is not unique because m < n. A popular solver seeks the sparsest linear combination of the dictionary that represents the test sample [36]:

min_α ||α||_0   s.t.   ||y − Xα||_2 ≤ ε,    (3)

where ε is the allowed error tolerance and the l_0-norm ||α||_0 counts the number of nonzero elements. Finding the representation with minimum ||α||_0 is a combinatorial optimization problem: Equation (3) has been proven to be non-deterministic polynomial (NP) hard, and it is difficult to find an optimal solution in theory [37]. Considering the difficulty of solving large combinatorial problems, algorithms such as orthogonal matching pursuit and relaxed formulations have been proposed over the years [33,38,39]. Provided the solution is sufficiently sparse, Equation (3) can be relaxed to an l_1-norm minimization [40]:

min_α ||α||_1   s.t.   ||y − Xα||_2 ≤ ε.

In the ideal situation, all elements of α are zero except those associated with the ground-truth class. In practice, however, the recovered sparse vector has most of its large-magnitude nonzero elements associated with the ground-truth class, while small or zero values are distributed elsewhere. Hence, the minimum reconstruction error criterion is used to decide the label of the test sample y from the optimal solution α̂ [41]:

class(y) = arg min_{k∈{1,2,...,K}} ||y − X_k α̂_k||_2.
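The SRC pipeline above can be sketched in a few lines of numpy. This is a minimal illustration rather than the paper's implementation: it uses a greedy orthogonal matching pursuit (one of the solvers mentioned above) in place of an exact l_1 solver, and the dictionary and test sample are synthetic.

```python
import numpy as np

def omp(X, y, k):
    """Greedy orthogonal matching pursuit: find a coefficient vector
    alpha with at most k nonzeros such that y is approximated by X @ alpha."""
    residual, support = y.copy(), []
    for _ in range(k):
        # pick the atom most correlated with the current residual
        j = int(np.argmax(np.abs(X.T @ residual)))
        if j not in support:
            support.append(j)
        # re-fit coefficients on the selected atoms (least squares)
        coef, *_ = np.linalg.lstsq(X[:, support], y, rcond=None)
        residual = y - X[:, support] @ coef
    alpha = np.zeros(X.shape[1])
    alpha[support] = coef
    return alpha

def src_classify(X, labels, y, k=5):
    """Label y by the class whose atoms give the smallest
    reconstruction error under the sparse code alpha."""
    alpha = omp(X, y, k)
    errors = {}
    for c in np.unique(labels):
        mask = labels == c
        errors[c] = np.linalg.norm(y - X[:, mask] @ alpha[mask])
    return min(errors, key=errors.get)
```

Given a dictionary whose columns are unit-normalized training vectors and a test sample lying in the span of one class's atoms, `src_classify` recovers that class.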

MTJSRC
SRC addresses the classification problem with a single test sample or feature; MTJSRC extends SRC to multiple features. Suppose P features (modalities) are extracted from the original images. For each modality (task) index p = 1, 2, ..., P, denote A^p = [A_1^p, A_2^p, ..., A_K^p] as the pth training feature matrix, where the block A_k^p (k = 1, 2, ..., K) collects the columns associated with the kth class. A multi-task linear representation problem with P feature modalities extracted from the original data can be described as

y^p = A^p α^p,   p = 1, 2, ..., P.

Inspired by SRC, the sparse representation of each modality y^p can be obtained from the following l_1-norm optimization problem:

min_{α^p} ||α^p||_1   s.t.   ||y^p − A^p α^p||_2 ≤ ε.

The optimal solutions α̂^p, p = 1, 2, ..., P, are obtained by solving this optimization problem P times. Given all the sparse representation vectors, the minimum reconstruction error criterion accumulated over all P tasks decides the label of the original test sample y:

class(y) = arg min_{k∈{1,2,...,K}} Σ_{p=1}^{P} ||y^p − A_k^p α̂_k^p||_2.

When P = 1, SRC is recovered.
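The accumulated-error decision of MTJSRC can be sketched compactly. For brevity this sketch substitutes a plain least-squares fit per task for the per-task l_1 sparse code; the point illustrated is only the accumulation of class-wise reconstruction errors over modalities.

```python
import numpy as np

def mtjsrc_classify(tasks, labels):
    """tasks: list of (A_p, y_p) pairs, one (dictionary, test-feature)
    pair per modality. Classify by the class-wise reconstruction error
    accumulated over all tasks."""
    classes = np.unique(labels)
    total = {c: 0.0 for c in classes}
    for A, y in tasks:
        # stand-in for the per-task l1 sparse code: least-squares fit
        alpha, *_ = np.linalg.lstsq(A, y, rcond=None)
        for c in classes:
            mask = labels == c
            total[c] += np.linalg.norm(y - A[:, mask] @ alpha[mask])
    return min(total, key=total.get)
```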

Monogenic Signal
The monogenic signal is an extension of analytic signal theory and Hilbert transform theory [3]. The analytic signal is a complex-valued representation of a 1-D real-valued signal f(x):

f_A(x) = f(x) + i f_H(x),

where f_H(x) = f(x) ∗ 1/(πx) denotes the Hilbert-transformed signal. Equivalently, the analytic signal can be written in polar form via the local amplitude A(x) and the local phase φ(x):

f_A(x) = A(x) e^{iφ(x)},

where the local amplitude A(x) represents the local energy, and the local phase φ(x) changes as the local structure varies.
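The 1-D analytic signal can be computed in the frequency domain by suppressing negative frequencies and doubling positive ones (the standard FFT-based Hilbert transform); a minimal numpy sketch:

```python
import numpy as np

def analytic(f):
    """1-D analytic signal via the frequency domain: keep the DC term,
    double the positive frequencies, and zero the negative ones."""
    n = len(f)
    F = np.fft.fft(f)
    h = np.zeros(n)
    h[0] = 1
    if n % 2 == 0:
        h[n // 2] = 1          # Nyquist bin for even-length signals
        h[1:n // 2] = 2
    else:
        h[1:(n + 1) // 2] = 2
    fa = np.fft.ifft(F * h)
    return np.abs(fa), np.angle(fa)   # local amplitude, local phase
```

For a pure cosine with an integer number of cycles, the recovered local amplitude is constant at 1, as the polar form above predicts.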
In order to deal with high-dimensional signals like images and videos, the monogenic signal has been developed. The monogenic signal is built around the Riesz transform, which is the extension of the Hilbert transform from 1-D to N-D. In the 2-D case, the spatial representation of the Riesz kernel is

h_R(z) = z / (2π |z|^3),   z = (x, y)^T,

and its frequency response is

H_R(u) = −i u / |u|.

The expression of the Riesz-transformed signal in the spatial domain is

f_R(z) = (h_R ∗ f)(z),

where ∗ denotes convolution.
For an image f(z), the monogenic signal is composed of the image signal f(z) itself and its Riesz transform f_R(z) = (f_x(z), f_y(z)):

f_M(z) = f(z) + i f_x(z) + j f_y(z),

where i and j are the imaginary units and {1, i, j} forms an orthonormal basis of R^3. Similarly to the analytic signal, the original image signal is decomposed orthogonally into three components, local amplitude A, local phase φ, and local orientation θ, which can be generated as

A(z) = sqrt( f(z)^2 + f_x(z)^2 + f_y(z)^2 ),
φ(z) = atan2( sqrt( f_x(z)^2 + f_y(z)^2 ), f(z) ),
θ(z) = atan2( f_y(z), f_x(z) ).

As with the analytic signal, the local amplitude A describes the local energetic information and the local phase φ the local structural information. The major difference is that the monogenic signal has one more component, i.e., the local orientation, which describes the geometric information.
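The decomposition above can be sketched with an FFT-based Riesz transform. This is an illustrative implementation, not the paper's code; sign conventions for the Riesz frequency response vary between references, and the choice below does not affect the amplitude.

```python
import numpy as np

def monogenic(f):
    """Decompose image f into local amplitude, phase, and orientation
    via a frequency-domain Riesz transform."""
    rows, cols = f.shape
    u = np.fft.fftfreq(cols)[None, :]
    v = np.fft.fftfreq(rows)[:, None]
    radius = np.hypot(u, v)
    radius[0, 0] = 1.0            # avoid division by zero at DC
    F = np.fft.fft2(f)
    # Riesz components from the frequency responses -i*u/|w|, -i*v/|w|
    fx = np.real(np.fft.ifft2(F * (-1j * u / radius)))
    fy = np.real(np.fft.ifft2(F * (-1j * v / radius)))
    amp = np.sqrt(f**2 + fx**2 + fy**2)          # local amplitude A
    phase = np.arctan2(np.hypot(fx, fy), f)      # local phase phi
    orient = np.arctan2(fy, fx)                  # local orientation theta
    return amp, phase, orient
```

By construction the local amplitude dominates the raw intensity pointwise, since A = sqrt(f^2 + f_x^2 + f_y^2) ≥ |f|.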
In applications, signals are of finite length, so a bandpass filter is applied before the Riesz transform. The monogenic signal can then be rewritten as

f_M(z) = (h_b ∗ f)(z) + i (h_b ∗ f_x)(z) + j (h_b ∗ f_y)(z),

where h_b(z) denotes the bandpass filter. Log-Gabor filters have the ability to capture broad spectral information [42]; hence, a log-Gabor filter bank is employed in this paper. The frequency response of the log-Gabor filter can be generated as

G(ω) = exp( −(log(ω/ω_0))^2 / (2 (log(σ/ω_0))^2) ),

where ω_0 is the center frequency and σ is the scaling factor of the bandwidth. Letting s denote the scale index of the filter bank, the multiresolution monogenic signal representation can be acquired, which forms the monogenic scale space. The monogenic signal with scale S (S = 10) is shown in Figure 1.
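A radial log-Gabor bank at S scales can be sketched as follows. The parameter names (`w0`, `mult`, `sigma_ratio`) and the geometric spacing between scale center frequencies are assumptions for illustration; the paper does not specify its filter parameters.

```python
import numpy as np

def log_gabor_bank(size, S, w0=0.1, mult=2.0, sigma_ratio=0.55):
    """Radial log-Gabor frequency responses at S scales for a
    size-by-size image. w0 is the center frequency of the finest
    scale; each coarser scale divides it by `mult` (assumed)."""
    u = np.fft.fftfreq(size)[None, :]
    v = np.fft.fftfreq(size)[:, None]
    radius = np.hypot(u, v)
    radius[0, 0] = 1.0        # dummy value; the DC bin is zeroed below
    bank = []
    for s in range(S):
        wc = w0 / (mult ** s)
        G = np.exp(-(np.log(radius / wc) ** 2)
                   / (2 * np.log(sigma_ratio) ** 2))
        G[0, 0] = 0.0         # a log-Gabor filter has no DC component
        bank.append(G)
    return bank
```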
Suppose the scale parameter is S. The monogenic scale space f = {f_A, f_φ, f_θ} of local amplitude, local phase, and local orientation can be described as

f_A = {A_1, A_2, ..., A_S},   f_φ = {φ_1, φ_2, ..., φ_S},   f_θ = {θ_1, θ_2, ..., θ_S},

where the subscript indexes the scale.

Fisher Discrimination Criterion-Based Monogenic Scale Selection
As the monogenic scale-space representation above shows, the data size of the feature set f is increased by a factor of 3S compared with that of the original image data set. Such a feature set normally leads to considerable computational complexity because of its redundancy and high dimension, which makes it difficult to apply directly in a recognition system. To deal with this problem, a multi-task joint sparse representation strategy based on a scale selection model is proposed in this paper, instead of concatenating all the features at S scales together in the learning system [21,43]. The features at some scales may be less discriminative or even have a negative effect on classification; a typical example and some analysis will be given later to verify this statement.
A monogenic scale selection method based on Fisher's discrimination criterion is proposed to solve this problem. The method aims to find the most discriminative features in the monogenic scale space. As discussed before, the three components of the monogenic signal, i.e., local amplitude, local phase, and local orientation, are three different types of features, so Fisher's discrimination criterion is applied separately to the scale space of each component.
In the scale space of the amplitude features, let A_k^s = [a_{k,1}^s, ..., a_{k,n_k}^s] denote the training samples of the kth class at the sth scale, and let A^s collect the training samples of all classes at that scale. The within-class and between-class distances of the kth class, denoted Υ_W(A_k^s) and Υ_B(A_k^s), are defined, respectively, as

Υ_W(A_k^s) = Σ_{j=1}^{n_k} (a_{k,j}^s − m_k^s)^T (a_{k,j}^s − m_k^s),
Υ_B(A_k^s) = n_k (m_k^s − m^s)^T (m_k^s − m^s),

where m_k^s and m^s are the mean vectors of A_k^s and A^s, respectively:

m_k^s = (1/n_k) Σ_{j=1}^{n_k} a_{k,j}^s,   m^s = (1/n) Σ_{k=1}^{K} Σ_{j=1}^{n_k} a_{k,j}^s,

n_k is the number of training samples of the kth class, n is the total number of training samples, and (·)^T denotes the transpose of a matrix or vector. According to Fisher's linear discriminant analysis, classification accuracy is associated with the within-class and between-class distances: the within-class distance should be minimized and the between-class distance maximized in order to achieve high recognition accuracy. Inspired by previous work, the Fisher score of the kth class at the sth amplitude scale is defined as

AC_k^s = Υ_B(A_k^s) / Υ_W(A_k^s).

The matrix AC ∈ R^{S×K} of Fisher scores in the amplitude scale space can be described as

AC = [AC_k^s],   s = 1, ..., S,   k = 1, ..., K.

Each row vector of the local Fisher score matrix AC is normalized to obtain the feature weight of each class at each scale. Clearly, the larger the value of AC_k^s, the more discriminative the amplitude feature of the kth class at the sth scale; the matrix AC thus indicates how to choose the most discriminative features of each class in the scale space. The global Fisher score at scale s can be generated as

AC^s = Σ_{k=1}^{K} AC_k^s,

so that AC can be rewritten as the vector of global Fisher scores

AC = [AC^1, AC^2, ..., AC^S],

which measures how representative the features are at each scale. Similarly, the weight vectors PC and OC of the phase and orientation scale spaces are acquired, and the row vectors AC, PC, and OC are each sorted in descending order.
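The global Fisher score and the resulting scale ranking can be sketched as follows. This is a simplified illustration of the definitions above: it sums the per-class between/within ratios at each scale and keeps the V highest-scoring scales; the small epsilon guarding the denominator is an added safeguard.

```python
import numpy as np

def fisher_score(features, labels):
    """Global Fisher score of one scale: per-class ratio of
    between-class to within-class scatter, summed over classes.
    features: (n_samples, dim) array for this scale."""
    m = features.mean(axis=0)
    score = 0.0
    for c in np.unique(labels):
        Xc = features[labels == c]
        mk = Xc.mean(axis=0)
        within = np.sum((Xc - mk) ** 2)             # Y_W
        between = len(Xc) * np.sum((mk - m) ** 2)   # Y_B
        score += between / (within + 1e-12)
    return score

def select_scales(scale_features, labels, V):
    """Keep the V scales with the highest global Fisher score."""
    scores = np.array([fisher_score(f, labels) for f in scale_features])
    return np.argsort(scores)[::-1][:V], scores
```

A scale whose class means are well separated relative to the within-class spread receives a much higher score than a pure-noise scale, and is selected first.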
Supposing V is the number of selected scales in each component scale space, the chosen features of 3V scales are applied in the classification system instead of the whole monogenic scale space. Hence, the chosen scale matrix D can be generated as

D = [AD_1, ..., AD_V; PD_1, ..., PD_V; OD_1, ..., OD_V],

where the elements AD_v, PD_v, and OD_v (v = 1, ..., V) are the chosen scale indices of the amplitude, phase, and orientation scale spaces, respectively, i.e., the indices of the V largest global Fisher scores of each component. The components at the corresponding scales are retained for classification.

Classification via Tri-Task Joint Sparse Representation of Monogenic Signal
The data size is still too large for classification after scale selection. Therefore, an independent and identically distributed (IID) Gaussian random projection matrix is applied to the components of the monogenic signal (local amplitude, local phase, local orientation) at the V selected scales. The projection reduces dimension and redundancy, and each projected component is then reshaped into a vector.
Finally, the obtained vectors are concatenated to generate a component-specific feature vector:

χ_A(x) = [vec(A_{AD_1}(x)); vec(A_{AD_2}(x)); ...; vec(A_{AD_V}(x))],

and likewise χ_P(x) and χ_O(x) for the phase and orientation components, where vec(·) denotes the reshaping operation from a matrix to a vector. The generation of monogenic component-specific features based on scale selection is shown in Figure 2. After scale selection and feature concatenation, the multi-task joint sparse representation becomes a tri-task joint sparse representation. Three overcomplete dictionaries are built from the training sample set X_k = [x_{k,1}, x_{k,2}, ..., x_{k,n_k}] ∈ R^{m×n_k}. Letting the monogenic component-specific features of a training sample x_{k,j} be (χ_A(x_{k,j}), χ_P(x_{k,j}), χ_O(x_{k,j})), the three overcomplete dictionaries can be formulated as

X^1 = [χ_A(x_{1,1}), ..., χ_A(x_{K,n_K})],   X^2 = [χ_P(x_{1,1}), ..., χ_P(x_{K,n_K})],   X^3 = [χ_O(x_{1,1}), ..., χ_O(x_{K,n_K})].
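Feature generation for one component can be sketched as: project each selected scale map with the IID Gaussian matrix and concatenate. The matrix `P` and the map shapes here are illustrative, not the paper's dimensions.

```python
import numpy as np

def component_feature(scale_maps, P):
    """Concatenate the randomly projected maps of the V selected
    scales into one component-specific feature vector.
    scale_maps: list of V arrays of shape (h, w).
    P: IID Gaussian projection matrix of shape (d, h*w)."""
    return np.concatenate([P @ m.ravel() for m in scale_maps])
```

With V = 3 selected scales and a projection to d = 12 dimensions, the resulting component-specific feature has length 3 × 12 = 36.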

The test sample y can also be described by its component-specific features y^p, p = 1, 2, 3. Similarly to the multi-task joint sparse representation, the minimum reconstruction error criterion accumulated over all three tasks is used to decide the label of the original test sample:

class(y) = arg min_{k∈{1,2,...,K}} Σ_{p=1}^{3} w_p ||y^p − X_k^p α̂_k^p||_2,

where W = (w_1, w_2, w_3) denotes the weight vector. Since the three components of the monogenic signal show different characteristics, the elements of W should not be weighted equally as usual; moreover, the weight vector should adapt when the training data set changes. The value of w_p is larger when the corresponding component is more discriminative. Accordingly, the elements of the weight vector are set to the (normalized) global Fisher scores of the corresponding component-specific features provided by the scale selection model. The proposed method, i.e., monogenic scale selection-based weighted tri-task joint sparse representation classification (WTJSRC), is outlined in Algorithm 1.
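The weighted decision rule itself is a small computation once the per-task sparse codes α̂^p are available; a sketch that takes the codes as precomputed inputs (how they are solved is covered earlier):

```python
import numpy as np

def wtjsr_decision(tasks, labels, weights):
    """Weighted tri-task decision: accumulate each task's class-wise
    reconstruction error, scaled by that task's Fisher-score weight.
    tasks: list of (X_p, y_p, alpha_p) with a precomputed sparse code."""
    classes = np.unique(labels)
    total = {c: 0.0 for c in classes}
    for (X, y, alpha), w in zip(tasks, weights):
        for c in classes:
            mask = labels == c
            total[c] += w * np.linalg.norm(y - X[:, mask] @ alpha[mask])
    return min(total, key=total.get)
```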

Input: SAR image data R; original training set X ∈ R^{m×n} (with n samples) and test set Y ∈ R^{m×l} (with l samples) from K classes; the number of total scales S; the number of selected scales V.
Output: Identity ς for all test samples.
1: BEGIN
2: Acquire the monogenic signal of all the original training samples by (8) and (9), from which the V most discriminative scales are selected by the Fisher discrimination criterion;
3: Build the weight vector W = (w_1, w_2, w_3) from the global Fisher score of each component;
4: Generate the monogenic component-specific features with scale selection by (31);
5: Build three overcomplete dictionaries X^1, X^2, X^3;
6: for j = 1 to l do
7:   ς_j = arg min_{k} Σ_{p=1}^{3} w_p ||y_j^p − X_k^p α̂_k^p||_2;
8: end for
9: END

Experimental Results
The Moving and Stationary Target Acquisition and Recognition (MSTAR) public database is used to evaluate the performance of the proposed method. SAR images in the MSTAR dataset have a resolution of 0.3 m × 0.3 m with HH polarization. The azimuth angles cover 0° to 360°, with adjacent angle intervals of 1° to 2°. The SAR images are 128 × 128 pixels in size; in preprocessing, 64 × 64 patches are cropped from the center, so all SAR images used are 64 × 64 pixels. The IID Gaussian random projection matrix is used to reduce the dimension of each component from 64 × 64 to 12 × 12. For the multiresolution monogenic signal representation, the number of scales is 10 (S = 10) in this paper. To verify the effectiveness of our proposed method, the methods shown in Table 1 are studied.
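The dimension reduction described here (64 × 64 → 12 × 12 via an IID Gaussian random projection) amounts to a single matrix multiply per component. A sketch follows; the 1/√m scaling of the projection matrix is an assumption, as the paper does not state one.

```python
import numpy as np

rng = np.random.default_rng(0)
m, d = 64 * 64, 12 * 12
# IID Gaussian random projection matrix, scaled by 1/sqrt(m) (assumed)
P = rng.standard_normal((d, m)) / np.sqrt(m)

chip = rng.standard_normal((64, 64))   # stand-in for a cropped SAR chip
z = P @ chip.ravel()                   # 4096-dim -> 144-dim feature
```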
In the rest of the section, several experiments are carried out to evaluate the performance of our method proposed in this paper.

Scale Parameter Experiments
First of all, the estimation of the selected scale parameter V in the 10-scale space is considered. Since the monogenic signal is acquired from the original SAR images, it is essential to determine the optimal value of V. The dataset in Table 2 is used to compare the performance of our method for each V (from 1 to 10). As shown in Table 2, three targets (BMP2, BTR70, T72) are employed. Among them, BMP2 and T72 have several variants with small structural modifications (denoted by serial number). The training set is composed of the standard variants (Sn 9563 for BMP2, Sn c71 for BTR70, and Sn 132 for T72) at a depression angle of 17°. The determination of V is based on data set 1 by comparing the performance of our method for each V (from 1 to 10). The component-specific features vary with V, as shown in Figure 3. To remove the influence of the IID Gaussian matrix, the same Gaussian random matrix is applied to both the training and testing samples; in addition, 10 Gaussian matrices are used separately, and the final decision is made by the residual value in each class.
The recognition rate for each value of V is shown in Table 3, and the computational time for each V is shown in Figure 4.

SAR Image Classification under Standard Operating Conditions
We focus on the performance evaluation of our method under standard operating conditions (SOCs). The testing data set is a collection of images of all ten classes acquired at a depression angle of 15°, and the training data set is a collection of images of all ten classes acquired at a depression angle of 17°. Similarly to data set 1, only the variants Sn 9563 of BMP2 and Sn 132 of T72 (in bold in Table 4) are available for training. The confusion matrices under SOCs are shown in Figure 5, and the computation costs of the sparse representation-based approaches under SOCs are shown in Table 5.

SAR Image Classification under Extended Operating Conditions
Two experiments are designed to evaluate the performance of our proposed method under extended operating conditions (EOCs). First of all, three types of targets (2S1, BRDM2, and ZSU234) in the MSTAR data set at a depression angle of 30° are used as the test data set, as shown in Table 6; the corresponding confusion matrices are shown in Table 7. In another EOC test scenario, the algorithm is evaluated with respect to target configuration and version variants. Considering the several variants of BMP2 and T72, the SAR images of BMP2 (Sn 9563) and T72 (Sn 132) collected at a depression angle of 17° are used as the training data set, while the testing data set includes SAR images of BMP2 (Sn 9566, Sn c12) and T72 (Sn 812, Sn s7) collected at a depression angle of 15°. The data set under EOC-2 is shown in Table 8, and the corresponding confusion matrices are shown in Table 9.
The recognition rate of our proposed method is compared with widely cited methods, including k-nearest neighbor (KNN), and with sparse representation-based methods under EOCs, as shown in Table 10. TJSR(1) denotes the TJSR method without scale selection, while TJSR(2) represents the scale selection-based TJSR method.

Scale Parameter Analysis
From Table 3, the recognition rate increases with the value of V up to V = 7 and drops once V reaches 8. The main reason is that the features at some scales are less discriminative or even have a negative effect on classification. The result also indicates that features with low Fisher scores reduce the recognition rate of our method, which partly verifies the effectiveness of Fisher's discrimination criterion.
The computational time for each V on data set 2 is shown in Figure 4. The computation time was recorded on a personal computer with a 4-core 4.0 GHz Intel processor and 8 GB RAM; the results are the average of 30 runs. The computation time increases with V, because the dimensions of the component-specific features increase with V, as shown in Figure 3. The experimental results show that Fisher's discrimination criterion-based monogenic scale selection reduces the computational load and improves the identification accuracy, as compared with adopting the whole scale space in the classification.
Based on the above analysis, we conclude that V = 5 offers an appropriate tradeoff between the recognition rate and the computational load.

Analysis of the Recognition Rate under SOC
We first compare our proposed scale selection-based WTJSR approach with the other three sparse representation-based methods. As shown in Figure 5, the total recognition rate of our approach is 1.39% and 1.40% better than TJSR with and without scale selection, respectively, which means our scale selection and weight vector based on Fisher's discrimination criterion contribute positively to the classifier. Although the recognition rates of BTR60, 2S1, BRDM2, D7, and T62 with our approach are lower than with SRC, the overall recognition rate of our approach is 1.56% higher than SRC with three consecutive views. Since multi-view images carry more information than a single-view image, sparse representation-based approaches usually achieve higher recognition rates with multiple views; our single-view method not only closes this gap but also achieves a higher overall recognition rate. The recognition rate of scale selection-based TJSR is slightly higher than TJSR without scale selection. Since the weight vector can be acquired offline from the training data set, the less informative features are abandoned, and the computational load decreases to nearly 50% through scale selection (Figure 4). In short, scale selection-based TJSR has a smaller computational load with no loss in recognition rate compared with TJSR without scale selection.
We also compare scale selection-based WTJSR with four widely cited approaches in the ATR literature: the conditional Gaussian model (CondGauss), SVM, AdaBoost, and iterative graph thickening (IGT), each with and without the pose estimator proposed in [46]. The pose estimator significantly improves the performance of all four methods; in other words, their performance depends strongly on the accuracy of pose estimation. The performance of our proposed method is much better than the four methods without pose estimation. Even when CondGauss, SVM, and AdaBoost use a pose estimator as a preprocessing step, the recognition rate of our method is still 1.35%, 3.87%, and 1.33% higher, respectively. The recognition rate of our method is slightly lower than that of IGT with pose estimation. All the results under SOCs demonstrate the superiority of our proposed method.
In addition, we compare the computational costs of the sparse representation-based approaches. The computational cost mainly involves two parts: offline training and online testing. The cost of training is of little importance for SAR ATR because the training process can be run offline. The computational cost of TJSR and WTJSR is higher than that of SRC, because the monogenic signal has a large number of components at different scales, which imposes a heavy burden on onboard computation. The computational cost of WTJSR is lower than that of TJSR thanks to the scale selection model, which confirms the effectiveness of the scale selection model in WTJSR.

Analysis of the Recognition Rate under EOCs
The experiments under EOCs investigate the practicability of the method proposed in this paper. Under EOC-1, the overall recognition rate of the proposed WTJSR method is 14%, 12%, 8%, 5%, and 3% better than the competing methods KNN, SVM, IGT, TJSR, and scale selection-based TJSR, respectively. Under EOC-2, the proposed WTJSR method still achieves the highest recognition rate, at 90%. The improvement in recognition rate under EOCs is clearly visible, and the experimental results indicate that our method is more robust to large depression-angle variation and version variants than the competitors.

Conclusions
This paper presents a monogenic scale selection-based weighted tri-task joint sparse representation method for SAR image recognition. Our proposed approach effectively processes the huge data volume of the monogenic signal and reduces the negative effect of the less informative scales. In addition, an adaptive weight vector is derived from the scale selection model to account for the heterogeneity among the three component features of the monogenic signal.
We also evaluate the recognition rate of our method by experiments under SOCs and EOCs. The results are compared not only with state-of-the-art algorithms such as SVM, AdaBoost, CondGauss, and IGT, but also with sparse representation-based algorithms such as SRC and TJSR. The recognition rate of our method is 1.39% and 1.40% better than that of TJSR with and without scale selection, respectively. Scale selection-based TJSR has a smaller computational load but no loss in recognition rate compared with TJSR without selection. Furthermore, the weight vector based on Fisher's discrimination criterion effectively improves the recognition rate. The experimental results show the effectiveness of our method. We conclude that it is necessary to evaluate the reliability of the components of the monogenic signal at different scales, and that adaptive weighting is an important step in classification algorithms based on the monogenic signal, owing to the heterogeneity among the three component features.

Conflicts of Interest:
The authors declare no conflict of interest.