Exploiting Multi-View SAR Images for Robust Target Recognition

Abstract: The exploitation of multi-view synthetic aperture radar (SAR) images can effectively improve the performance of target recognition. However, due to the various extended operating conditions (EOCs) in practical applications, some of the collected views may not be discriminative enough for target recognition. Therefore, each of the input views should be examined before being passed through to multi-view recognition. This paper proposes a novel structure for multi-view SAR target recognition. The multi-view images are first classified by sparse representation-based classification (SRC). Based on the output residuals, a reliability level is calculated to evaluate the effectiveness of a certain view for multi-view recognition. Meanwhile, the support samples for each view selected by SRC collaborate to construct an enhanced local dictionary. Then, the selected views are classified by joint sparse representation (JSR) based on the enhanced local dictionary for target recognition. The proposed method can eliminate invalid views for target recognition while enhancing the representation capability of JSR. Therefore, the individual discriminability of each valid view as well as the inner correlation among all of the selected views can be exploited for robust target recognition. Experiments are conducted on the moving and stationary target acquisition recognition (MSTAR) dataset to demonstrate the validity of the proposed method.


Introduction
Automatic target recognition (ATR) from synthetic aperture radar (SAR) images is of great importance for its pervasive applications in both the military and civil fields [1]. Over the years, researchers have tried to find or design proper feature extraction methods and classification schemes for SAR ATR. Principal component analysis (PCA) and linear discriminant analysis (LDA) [2] are usually used for feature extraction from SAR images. Other features, such as geometrical descriptors [3], attributed scattering centers [4,5], and monogenic spectrums [6], are also applied to SAR target recognition. As for the decision engines, various classifiers, including support vector machines (SVM) [7], sparse representation-based classification (SRC) [8,9], and convolutional neural networks (CNN) [10], are employed for target recognition and have achieved promising results. Despite great effort, SAR target recognition under extended operating conditions (EOCs) [1] is still a difficult problem.
Most of the previous SAR ATR methods make use of a single-view SAR image. The single-view image is compared with the training samples to find its nearest neighbor in the manifolds spanned by the individual training classes. Due to the unique mechanism of SAR imaging, an SAR image is very sensitive to the view angle [11]. Hence, the usage of multi-view SAR images may help in the interpretation of SAR images. Actually, multi-view SAR exploitation has been successfully applied to SAR image registration [12]. As for SAR target recognition, it is predictable that the classification performance may vary with the view angle, and the usage of multi-view images will probably improve the effectiveness and robustness of SAR ATR systems. As in the case of remote sensing data fusion [13], the benefits of using multi-view SAR images can be analyzed from two aspects. On the one hand, the images from different views can provide complementary descriptions of the target. On the other hand, the inner correlation among different views is also discriminative for target recognition. Therefore, the exploitation of multiple incoherent views of a target should provide a more robust classification performance than single view classification. Several ATR algorithms based on multi-view SAR images have been proposed, which can be generally divided into three categories. The first category uses parallel decision fusion for multi-view SAR images based on the assumption that the multiple views are independent. Brendel et al. [14] analyze the fundamental benefits of aspect diversity for SAR ATR based on the experimental results of a minimum-square-error (MSE) classifier. A Bayesian multi-view classifier [15] is proposed by Brown for target classification. The results demonstrate that the use of multiple SAR images can significantly improve the ATR algorithm's performance, even with only two or three SAR views. Bhanu et al.
[16] employ scatterer locations as the basis to describe the azimuthal variance with application to SAR target recognition. The experimental results demonstrated that the correlation of SAR images can only be maintained over a small azimuth interval. Model-based approaches [17] are developed by Ettinger and Snyder by fusing multiple images of a certain target at different view angles, namely decision-level and hypothesis-level fusion. Vespe et al. [18] propose a multi-perspective target classification method, which uses function neural networks to combine multiple views of a target collected at different locations. Huan et al. [19] propose a parallel decision fusion strategy for SAR target recognition using multi-aspect SAR images based on SVM. The second category uses a data fusion strategy. Methods of this category first fuse the multi-view images to generate a new input, which is assumed to contain the discriminability of the individual views as well as their inner correlation. Then, the fused data is employed for target recognition. Two data fusion methods for multi-view SAR images are proposed by Huan et al., i.e., the PCA method and the discrete wavelet transform (DWT) fusion method [19]. The third category uses joint decision fusion. Rather than classifying the multi-view images individually and then fusing the outputs, the joint decision strategy puts all the views in a unified decision framework by exploiting their inner correlation. Zhang et al. [20] apply joint sparse representation (JSR) to multi-view SAR ATR, which can exploit the inner correlations among different views. The superiority of multiple views over a single view is quantitatively demonstrated in their experiments. In comparison, the first category neglects the inner correlation among multi-view images. Although the data fusion methods consider both individuality and inner correlation, it is hard to evaluate the discriminability loss during the fusion process.
The third category considers the inner correlation in the decision process, but the strategy may not work well when the multi-view images are not closely related.
In the literature above, multi-view recognition has been demonstrated to be much more effective than single view recognition. However, some practical restrictions on multi-view recognition are neglected. Due to the EOCs in real scenarios, some of the collected views may be severely contaminated, and it is unwise to use these views during multi-view recognition. In this paper, a novel structure for multi-view recognition is proposed. The multiple views of a target are first classified by SRC. Based on the reconstruction residuals, a principle is designed to judge whether a certain view is discriminative enough for multi-view recognition. Then, JSR is employed to classify the selected views. However, when the input views are not closely related, the joint representation over the global dictionary is not optimal due to an incorrect correlation constraint. As a remedy, the atoms selected by SRC for each of the input views are combined to construct an enhanced local dictionary. The selected atoms and their neighbors (those with nearby azimuths) are used to construct the enhanced local dictionary, which inherits the representation capability of the global dictionary for the selected views. Meanwhile, it constrains the atoms available for representation, thus avoiding the incorrect atom selections that occur over the global dictionary, especially when the multiple views are not closely related. By performing JSR on the enhanced local dictionary, both correlated and unrelated views can be represented properly. Finally, the target label is decided according to the residuals of JSR based on the minimum residual principle.
In the following, we first introduce the practical restrictions on multi-view recognition in Section 2. Then, in Section 3, the proposed structure for multi-view recognition is presented. Extensive experiments are conducted on the moving and stationary target acquisition recognition (MSTAR) dataset in Section 4, and finally, conclusions are drawn in Section 5.

Practical Restrictions on Multi-View Target Recognition
With the development of space-borne and air-borne SAR systems, the multi-view SAR images of a target are becoming accessible. However, due to the EOCs in real scenarios, some views are not discriminative enough for target recognition. Then, it is preferable to discard these views in multi-view recognition. The following depicts some typical practical restrictions on multi-view recognition.

Target Variation
The target itself may have some variations due to configuration diversity or wreckage. Figure 1 shows two serial numbers of the T72 tank, i.e., A04 and A05. A04 is equipped with fuel drums while A05 has none. When we want to classify A04 as a T72 with A05 samples as the training source, it is possible that the views which present the characteristics of the fuel drums cause much discordance with the A05 samples. In this case, these views should be discarded in multi-view recognition. Generally speaking, for a target with local variations, the views which remarkably manifest the discordant properties may impair the performance of multi-view recognition. Therefore, such views should be cautiously passed through to multi-view recognition.

Environment Variation
The target may stand on different backgrounds, such as meadows or cement roads. Furthermore, it may be occluded by trees or manmade buildings as shown in Figure 2. Figure 2a,b shows two T72 tanks occluded by trees and a manmade wall, respectively. Figure 2b,c presents two views of a T72 tank occluded by a manmade wall. As for target recognition, it is preferable that the view in Figure 2b should be used, as most of the target's characteristics can be observed. For the view angle in Figure 2c, a large proportion of the target is occluded. Thus, the image at this view can provide very limited discriminability for target recognition.


Sensor Variation
Actually, the multi-view images of a certain target may come from different sensors. Figure 3 shows three typical situations during the collection of multi-view SAR images. In Figure 3a, multiple images are captured by the same airborne sensor at consecutive azimuths. The collected multiple views in Figure 3b are from different airborne sensors at quite different view angles. Figure 3c shows the multiple views acquired by both the airborne and satellite-based platforms. Consequently, the images may have quite different view angles. Furthermore, due to the sensor variety, they may also have different resolutions and signal-to-noise ratios (SNR). As the training samples are probably collected at several fixed resolutions and SNRs, the multi-view images with different resolutions or SNRs will provide disproportionate discriminability for target recognition. Then, the views with low discriminability should not be used for multi-view recognition.

Actually, there are many more EOCs in practical applications. Consequently, as analyzed above, images from different views provide unequal discriminability for target recognition under EOCs. Therefore, for robust target recognition, the validity of the input views for multi-view recognition should be examined beforehand.

Sparse Representation-Based Classification (SRC)
With the development of compressive sensing (CS) theory, sparse signal processing over a redundant dictionary has drawn pervasive attention. By enforcing a sparsity constraint on the representation coefficients, the decision is made by evaluating which class of samples can recover the test sample with the minimum error.
Denote the dictionary constructed by training samples from C classes as A = [A_1, A_2, ..., A_C] ∈ R^(d×N), where A_i is the class-dictionary formed by the samples of the i-th class, d is the dimensionality of each atom, and N is the total number of training samples. Based on the assumption that a test sample y from class i lies in the same subspace as the training samples of that class, SRC assumes that y can be recovered from its sparse representation with respect to the global dictionary A = [A_1, A_2, ..., A_C] as follows:

x̂ = argmin_x ||x||_0  s.t.  ||y − Ax||_2^2 ≤ ε,    (1)

where x is the sparse coefficient vector and ε is the allowed error tolerance. The non-convex ℓ0-norm objective in Equation (1) is an NP-hard problem. Typical approaches for solving it either approximate the original problem with an ℓ1-norm-based convex relaxation [9] or resort to greedy schemes, such as orthogonal matching pursuit (OMP) [8]. After solving for the optimal representation x̂, SRC decides the identity of the test sample by evaluating which class of samples results in the minimum reconstruction error:

identity(y) = argmin_i r(i),  r(i) = ||y − A_i δ_i(x̂)||_2,  i = 1, 2, ..., C,    (2)

where the operator δ_i(·) extracts the coefficients corresponding to class i and r(i) denotes the reconstruction error of class i.
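To make Equations (1) and (2) concrete, the following sketch implements SRC with a greedy OMP solver over a toy dictionary; the function name, the toy data, and the fixed sparsity level are our own illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def src_classify(y, A, labels, sparsity=10):
    """SRC sketch: solve Equation (1) greedily via OMP, then apply the
    minimum-residual rule of Equation (2)."""
    d, N = A.shape
    residual = y.copy()
    support = []
    x = np.zeros(N)
    for _ in range(sparsity):
        # greedy atom selection: atom most correlated with the residual
        j = int(np.argmax(np.abs(A.T @ residual)))
        if j not in support:
            support.append(j)
        # least-squares refit of y on the current support
        x_s, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ x_s
    x[support] = x_s
    # r(i): reconstruction error using only the atoms of class i
    r = {}
    for c in sorted(set(labels)):
        mask = np.array([lab == c for lab in labels])
        r[c] = np.linalg.norm(y - A[:, mask] @ x[mask])
    return min(r, key=r.get), r
```

Here `labels[j]` gives the class of the j-th atom, so the per-class residuals realize δ_i(·) by masking the coefficient vector class by class.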

Joint Sparse Representation (JSR) for Classification
Assume there are M views of the same target, denoted Y = [y_1, y_2, ..., y_M]. The M sparse representation problems can be jointly formulated as:

x̂_m = argmin_{x_m} ||x_m||_0  s.t.  ||y_m − A x_m||_2^2 ≤ ε,  m = 1, 2, ..., M.    (3)

Collecting the coefficient vectors into a matrix X = [x_1, x_2, ..., x_M], Equation (3) can be rewritten as:

X̂ = argmin_X ||X||_0  s.t.  ||Y − AX||_F^2 ≤ ε.    (4)

In Equation (4), ||·||_F represents the Frobenius norm and ||X||_0 denotes the number of non-zero elements in X. However, the inner correlation among multiple views is not fully considered, due to the fact that ||X||_0 is decomposable over each column (view). To overcome this defect, it is assumed that the multiple views of the same target share a similar sparse pattern over the same dictionary, while the coefficient values corresponding to the same atom may differ across views. This can be achieved by solving the following optimization problem with ℓ0\ℓ2 mixed-norm regularization:

X̂ = argmin_X ||X||_{0\2}  s.t.  ||Y − AX||_F^2 ≤ ε,    (5)

where ||X||_{0\2} is the mixed norm of the matrix X, obtained by first applying the ℓ2 norm to each row of X and then applying the ℓ0 norm to the resulting vector.
To solve the problem in Equation (5), the simultaneous orthogonal matching pursuit (SOMP) [21,22] method can be used. After solving for the coefficient matrix X̂, the target label is determined by the minimum reconstruction error, the same as in SRC. JSR incorporates the inner correlation of multiple views in the joint decision framework. However, when the multiple views are not closely related, due to a large azimuth difference, environmental variations, etc., the correlation constraint in JSR may result in incorrect atom selections. Then, the classification performance will degrade.
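A minimal SOMP sketch for Equation (5) is given below; the shared-support selection uses the row-wise ℓ2 energy of the correlations, and the function name and toy usage are our own assumptions rather than the reference implementation of [21,22]:

```python
import numpy as np

def somp_classify(Y, A, labels, sparsity=10):
    """JSR sketch via SOMP: the M views (columns of Y) are forced to
    share one support over dictionary A, per the l0\\l2 mixed norm."""
    N = A.shape[1]
    M = Y.shape[1]
    R = Y.copy()
    support = []
    for _ in range(sparsity):
        # row-wise l2 energy of the correlations enforces a shared support
        j = int(np.argmax(np.linalg.norm(A.T @ R, axis=1)))
        if j not in support:
            support.append(j)
        X_s, *_ = np.linalg.lstsq(A[:, support], Y, rcond=None)
        R = Y - A[:, support] @ X_s
    X = np.zeros((N, M))
    X[support, :] = X_s
    # minimum joint reconstruction error over all views, as in SRC
    r = {}
    for c in sorted(set(labels)):
        mask = np.array([lab == c for lab in labels])
        r[c] = np.linalg.norm(Y - A[:, mask] @ X[mask, :])
    return min(r, key=r.get), r
```

The only change from the single-view OMP sketch is the atom-scoring step: correlations are aggregated across views before the argmax, which is exactly what couples the M representations.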

Structure for Multi-View Target Recognition
Due to the EOCs in real scenarios, some of the collected multiple views may provide little discriminability for target recognition. In order to build a robust multi-view recognition framework, each of the input views is first examined by SRC. Based on the residuals of the individual classes, a reliability level is calculated, which indicates the effectiveness of a certain view for multi-view recognition. The reliability level S is evaluated as the ratio of the minimum residual r(k_1) to the second minimum residual r(k_2), as in Equation (6):

S = r(k_1) / r(k_2).    (6)
It is obvious that S ranges from 0 to 1, and a lower S indicates higher reliability. By presetting a threshold T on S, only the views with higher reliability than T are passed through to multi-view recognition; a lower T means a stricter criterion for selecting a valid view. Then, the selected views are classified by JSR to exploit their possible inner correlation for target recognition. As indicated by the JSR model, when the multiple inputs are not closely related (for example, when they are acquired at quite different azimuths), the resulting coefficients of JSR may not be optimal. In order to overcome this situation, an enhanced local dictionary is built from the atoms selected by SRC for the selected views. The atoms selected for each view are combined to form a local dictionary, which contains only a few atoms. To further enhance the representation capability of the local dictionary, the samples in the original dictionary A which are closely related to these atoms are added to the local dictionary. The detailed process for constructing the enhanced local dictionary is described in Algorithm 1. In our practical application, T_θ is set to 5°, assuming that SAR images remain highly correlated within such an azimuth interval. On the one hand, the enhanced local dictionary inherits the representation capability of the global dictionary for the selected views. On the other hand, it avoids the incorrect atom selection of JSR over the global dictionary, leading to more robust decisions.
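The reliability screening of Equation (6) and the dictionary construction of Algorithm 1 can be sketched as follows; the function names, the circular azimuth distance, and the toy inputs are our own illustrative choices:

```python
import numpy as np

def reliability_level(residuals):
    """S = minimum residual / second minimum residual (Equation (6));
    S lies in (0, 1], and a lower S indicates a more reliable view."""
    r = np.sort(np.asarray(residuals, dtype=float))
    return r[0] / r[1]

def enhanced_local_dictionary(A, azimuths, selected, t_theta=5.0):
    """Algorithm 1 sketch: keep the atoms selected by SRC for the valid
    views, plus every training atom whose azimuth lies within t_theta
    degrees of a selected atom (T_theta = 5 degrees in the paper)."""
    keep = set()
    for j in selected:
        for k, az in enumerate(azimuths):
            # circular azimuth difference in degrees
            diff = abs((az - azimuths[j] + 180.0) % 360.0 - 180.0)
            if diff <= t_theta:
                keep.add(k)
    idx = sorted(keep)
    return A[:, idx], idx
```

A view passes the threshold test when `reliability_level(...)` is below T; JSR is then run over the column-reduced dictionary returned by `enhanced_local_dictionary`.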
The framework of the proposed method is shown in Figure 4. The threshold judgment based on the reliability level can effectively eliminate those views with low discriminability. The enhanced local dictionary will promote the representation capability of JSR; thus, it can still work robustly when the multiple views are not closely related. Therefore, the proposed method can improve the performance of multi-view recognition, especially when the input views contain some corrupted samples due to EOCs. In the implementation of the proposed method, OMP and SOMP are employed in SRC and JSR, respectively, for their high efficiency.

MSTAR Dataset
To test the proposed method, the MSTAR dataset is employed, which is the benchmark for SAR ATR. The SAR images are collected by X-band sensors with a 0.3 m × 0.3 m resolution. The dataset includes ten classes of military targets: BMP2, BTR70, T72, T62, BRDM2, BTR60, ZSU23/4, D7, ZIL131, and 2S1. The corresponding optical images of these targets are shown in Figure 5. SAR images at the depression angles of 15° and 17° are listed in Table 1, which cover full aspect angles from 0-360°.
In order to quantitatively evaluate the proposed method, several state-of-the-art algorithms are used as references, including single view methods and multi-view methods. The detailed descriptions of these methods are presented in Table 2. SVM and SRC are employed for the classification of single view SAR images. The MSRC, DSRC, and joint sparse representation-based classification (JSRC) methods are used for multi-view recognition. For a fair comparison, all of these methods employ Gaussian random projection for feature extraction with a dimension of 1024. In the rest of this section, we first evaluate the proposed method under standard operating conditions (SOC), i.e., a 10-class recognition problem. Afterwards, the performance is evaluated under several extended operating conditions (EOCs), namely, configuration variance, noise corruption, partial occlusion, and resolution variance. Finally, we examine the proposed method under different thresholds of the reliability level.


Preliminary Performance Verification
We first test the proposed method under SOC. Images at a 17° depression angle are used for training and images at a 15° depression angle are tested. For BMP2 and T72 with three different serial numbers, only the standard ones (Sn_9563 for BMP2 and Sn_132 for T72) are used for training (see Table 1). For each test sample, we choose M − 1 other samples at an azimuth interval of ∆θ, and then the M samples are put together for multi-view recognition. The reliability level threshold is set at T = 0.96, with M = 3 and ∆θ = 12°, and the recognition results of the proposed method are displayed as the confusion matrix in Table 3. All of the 10 targets can be recognized with a percentage of correct classification (PCC) over 97% and an overall PCC of 98.94%.
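The grouping of the M views used in this experiment (one test sample plus M − 1 companions at an azimuth interval of ∆θ) can be sketched as below; the helper name and the nearest-available-sample rule are our own assumptions about the setup:

```python
import numpy as np

def collect_views(azimuths, anchor, M=3, delta=12.0):
    """Pick M sample indices: the anchor plus, for each step m, the
    unused sample closest in (circular) azimuth to anchor + m * delta."""
    az = np.asarray(azimuths, dtype=float)
    chosen = [anchor]
    for m in range(1, M):
        target = (az[anchor] + m * delta) % 360.0
        diffs = np.abs((az - target + 180.0) % 360.0 - 180.0)
        diffs[chosen] = np.inf  # never reuse an already chosen view
        chosen.append(int(np.argmin(diffs)))
    return chosen
```

With ∆θ = 12° and M = 3, each multi-view input spans roughly 24° of azimuth around the anchor sample.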
The proposed method is compared with the reference methods in Table 4. It is clear that all of the multi-view methods achieve a notable improvement over the single view methods. Among the multi-view methods, the proposed method has the highest PCC. DSRC has the lowest PCC, probably due to the discriminability loss during the data fusion. MSRC achieves a slightly higher PCC than JSRC at the sacrifice of computational efficiency. The results demonstrate the significant superiority of the multi-view methods over the single view algorithms and the high effectiveness of the proposed method under SOC.

Effects of View Number
In this part, we evaluate the effects of the view number on the proposed method. The azimuth interval is set as ∆θ = 1°, and Table 5 summarizes the performance of all of the multi-view recognition methods at different view numbers. Except for DSRC, the multi-view methods gain improvement with the increase of the view number and share approaching PCCs, while the proposed method achieves the highest one. It is reasonable that more views provide more information for target recognition, thus improving the performance. As for DSRC, the fused result of too many samples by PCA may hardly find a best match in the dictionary, which is probably the reason for its performance degradation when the view number exceeds 5.

Robustness to Azimuthal Variation
In previous works, researchers assumed a fixed azimuth separation between the multi-view images. With the view number set as M = 3, the performance of the multi-view recognition algorithms at different fixed view steps is shown in Figure 6. The proposed method achieves the best performance, and DSRC has the lowest PCC at each view step. Comparing MSRC and JSRC, JSRC performs better when the view step is less than 15°. This is mainly because views with a smaller azimuth interval share a higher inner correlation. When the view step continues to increase, the performance of JSRC becomes worse than that of MSRC due to the weak correlation between different views. For DSRC, the fused result of multiple views with a large azimuth interval may hardly resemble the training samples of the same class. Therefore, its performance may even be inferior to that of the single view methods at large view steps.
In practice, however, the collected views will probably have unstable azimuth separations. Therefore, a further experiment is carried out on randomly selected views. For each test sample, we randomly select two other samples from the same class, and the three samples are then passed through to multi-view recognition. We repeated the test 10 times, and the average PCCs of the multi-view methods are presented in Table 6. The proposed method outperforms the others. Also, JSRC has a lower PCC than MSRC due to the unstable inner correlation among views with random azimuths.

Recognition under EOCs
In the following, the proposed method is tested under several EOCs, i.e., configuration variance, noise corruption, partial occlusion, and resolution variance. According to the experimental results in Section 4.2, we set M = 3 and ∆θ = 12° as a tradeoff between the view number and recognition performance.

Configuration Variance
In real scenarios, a specific target class may include several variants. Therefore, it is necessary to test the ATR methods under configuration variance, as the training samples may not cover all of the possible configurations. Table 7 describes the training and test sets for the experiment under configuration variance. The training set is composed of four targets, BMP2, BRDM2, BTR70, and T72, and the test set consists of only BMP2 and T72. For BMP2 and T72, the configurations used for testing are not contained in the training set. Table 8 presents the confusion matrix of the proposed method under configuration variance, and Table 9 compares the proposed method with the reference methods. It is clear that the multi-view methods perform much better than the single view ones. With the highest PCC, the proposed method is the most robust under configuration variance.

Noise Corruption
The measured SAR images may be corrupted by noise from the background and the radar system. Therefore, it is crucial that the ATR method keeps robust under noise corruption. There are actually many types of noise in SAR images; in this paper, additive complex Gaussian noise is employed as the representative, as in several related studies [23,24]. Since it is impossible to perfectly eliminate all of the noise in the original SAR images, the original MSTAR images are assumed to be noise free, and different amounts of Gaussian noise are added according to the SNR level defined in Equation (7):

SNR(dB) = 10 log10( (1 / (I·L·σ²)) Σ_{i=1}^{I} Σ_{l=1}^{L} |r(i, l)|² )  (7)

where r(i, l) is the complex frequency data of the MSTAR images, and σ² is the variance of the Gaussian noise. Figure 7 illustrates the procedure of transforming the original MSTAR images to the frequency domain. By transforming the original image using a two-dimensional (2D) inverse fast Fourier transform (FFT) and removing the zero padding and window applied during the imaging process, the frequency data are obtained as an I × L matrix. Then, Gaussian noise is added to the frequency data according to the preset SNRs, and the noise-corrupted images are obtained by transforming the noised frequency data back to the image domain. Figure 8 gives some examples of the noise-corrupted images of a BMP2 image (chip number "HB03341.000") at different SNRs.
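As a concrete illustration, the corruption step above can be sketched as follows, assuming the frequency data are available as an I × L complex NumPy array. The function name and the random generator handling are our own; the noise variance is solved from the SNR definition in Equation (7).

```python
import numpy as np

def add_complex_gaussian_noise(r, snr_db, rng=None):
    """Corrupt SAR frequency data `r` (an I x L complex array) with additive
    complex Gaussian noise at a preset SNR, where
    SNR(dB) = 10*log10( sum|r(i,l)|^2 / (I*L*sigma^2) ).
    A sketch of the corruption step described in the text."""
    rng = np.random.default_rng() if rng is None else rng
    I, L = r.shape
    signal_power = np.sum(np.abs(r) ** 2)  # total power over all frequency cells
    # Solve the SNR definition for the noise variance sigma^2
    sigma2 = signal_power / (I * L * 10.0 ** (snr_db / 10.0))
    # Complex Gaussian noise: variance split equally between real and imaginary parts
    noise = np.sqrt(sigma2 / 2.0) * (rng.standard_normal((I, L))
                                     + 1j * rng.standard_normal((I, L)))
    return r + noise
```

The noised frequency data would then be transformed back to the image domain by a 2D FFT, as in Figure 7.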

The details of the implementation of this experiment are as follows. For the multi-view methods, two of the three views are corrupted by noise. For the single view methods, we randomly corrupt 2/3 of the test samples for a fair comparison. Figure 9 shows the performance of all of the methods under different noise levels. It is clear that the multi-view methods achieve much better performance than the single view methods. It is also noticeable that DSRC performs better than MSRC and JSRC at SNRs lower than 0 dB, mainly because the PCA process in the data fusion eliminates some of the noise. With the highest PCC at each SNR, the proposed method is demonstrated to be the most effective and robust.

Partial Occlusion
The target may be occluded by obstacles, so it is meaningful to test the ATR algorithms under partial occlusion. As there are no accepted empirical models of object occlusion in SAR imagery, we consider the occlusion to occur possibly from eight different directions as in Ref. [25]. To simulate the occluded target, the target region is first segmented using the algorithm in Ref. [26]. Then, different levels of partial occlusion are simulated by occluding different proportions of the target region from the eight directions. Figure 10 shows 20 percent occluded images from eight directions.
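A minimal sketch of this simulation is given below, under the assumption that occlusion from a given direction removes the target pixels closest to the occluding edge (ranked by their projection onto the direction vector). The function and variable names are illustrative, and the occluded pixels are simply set to zero rather than to an estimated background level.

```python
import numpy as np

def occlude_target(image, target_mask, proportion, direction):
    """Occlude `proportion` of the target pixels from one of eight
    directions (`direction` in 0..7, spaced every 45 degrees).
    `target_mask` is a boolean array marking the segmented target region."""
    angle = direction * np.pi / 4.0
    d = np.array([np.cos(angle), np.sin(angle)])  # unit vector of the direction
    rows, cols = np.nonzero(target_mask)
    proj = rows * d[0] + cols * d[1]   # projection of each target pixel onto d
    order = np.argsort(proj)           # pixels nearest the occluding edge come first
    n_occ = int(round(proportion * len(order)))
    occluded = image.copy()
    occluded[rows[order[:n_occ]], cols[order[:n_occ]]] = 0.0
    return occluded
```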
The experimental implementation in this part is similar to the former one, except that the noise corruption is replaced by partial occlusion. Figures 11-14 show the results of the different methods at directions 1, 3, 5, and 7 (due to symmetry, the other directions are not presented). In each figure, subfigure (a) compares the performance of all of the methods and subfigure (b) presents a detailed comparison of the multi-view methods. Clearly, the multi-view methods gain a significant improvement over the single view methods, and the proposed method achieves the best performance under partial occlusion. Specifically, as the partial occlusion becomes more severe, the superiority of the proposed method becomes more and more significant. When an SAR image is severely occluded, it actually loses the discriminability required for target recognition; thus, it is preferable to discard it in multi-view recognition. It can also be observed from the results under noise corruption and partial occlusion that SRC achieves better performance than SVM, which is consistent with the conclusions in Ref. [9].

Resolution Variance
Due to the variation of SAR sensors, it is possible that multi-view SAR images may have some different resolutions. Figure 15 shows the SAR images of the same target at different resolutions from 0.3 m to 0.7 m.
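One common way to emulate a coarser resolution, sketched below under our own simplifying assumptions (resolution scales inversely with the retained bandwidth; the window and zero-padding handling of the actual imaging chain is omitted), is to keep only the central part of the 2D frequency support and re-image.

```python
import numpy as np

def degrade_resolution(image, keep_fraction):
    """Simulate a coarser resolution by retaining only the central
    `keep_fraction` of the image's 2D frequency support and re-imaging.
    Since resolution is inversely proportional to bandwidth, keeping,
    e.g., 0.6 of the support of a 0.3 m image yields roughly 0.5 m.
    A simplified sketch, not the paper's exact processing chain."""
    spec = np.fft.fftshift(np.fft.fft2(image))   # centered 2D spectrum
    I, L = spec.shape
    ki = int(round(I * keep_fraction / 2.0))
    kl = int(round(L * keep_fraction / 2.0))
    mask = np.zeros_like(spec)
    ci, cl = I // 2, L // 2
    mask[ci - ki:ci + ki, cl - kl:cl + kl] = 1.0  # keep the central band only
    return np.abs(np.fft.ifft2(np.fft.ifftshift(spec * mask)))
```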

With a similar experimental setup to the former two experiments, the results of the different methods are plotted in Figure 16. Compared with EOCs such as noise corruption and partial occlusion, resolution variance causes less degradation to the ATR performance. It is also noticeable that SRC achieves a much higher PCC than SVM. With the highest PCC, the proposed method enjoys the best effectiveness and robustness among all of the methods.

Recognition under Different Thresholds
One of the key parameters in the proposed method is the reliability level threshold, and determining a proper threshold is a tricky problem. In order to provide some insights into its determination, we test the proposed method at different thresholds. As shown in Figure 17, the performance under SOC improves as the threshold increases. Actually, most of the samples under SOC are discriminative for target recognition; therefore, it is preferable that all of the views are used for recognition. For EOC recognition, the highest PCCs are achieved at different thresholds for different EOCs. Specifically, for the MSTAR dataset, a relatively high threshold of 0.96 gives good performance under all conditions.
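The selection logic implied by the threshold can be sketched as follows. The reliability values are assumed to be precomputed from the SRC residuals as defined earlier in the paper, and the keep-the-best-view fallback for the case in which no view passes the threshold is our own assumption.

```python
def select_views(reliabilities, threshold):
    """Keep the indices of views whose reliability level exceeds
    `threshold`; if no view passes, fall back to the single most
    reliable view so that recognition can still proceed.
    `reliabilities` is a list of per-view reliability levels."""
    selected = [i for i, r in enumerate(reliabilities) if r > threshold]
    if not selected:
        # Fallback (our assumption): use the most reliable view alone
        selected = [max(range(len(reliabilities)), key=lambda i: reliabilities[i])]
    return selected
```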

Conclusions
This paper proposes an SAR target recognition method that exploits the information contained in multi-view SAR images. Each of the input multi-view SAR images is first examined by SRC to evaluate its validity for multi-view recognition. Then, the selected views are jointly recognized using JSR. To improve the adaptability of JSR, an enhanced local dictionary is constructed from the atoms selected by SRC. Extensive experiments are conducted on the MSTAR dataset under SOC and various EOCs, and comparisons are made with several state-of-the-art methods, including SVM, SRC, MSRC, DSRC, and JSRC. Based on the experimental results, the following conclusions can be drawn: (1) the multi-view methods significantly improve recognition performance compared with the single view methods; (2) the proposed method better exploits multi-view SAR images to improve recognition performance under SOC; and (3) the proposed method significantly enhances the robustness of an ATR system to various EOCs.