Multiview Deep Feature Learning Network for SAR Automatic Target Recognition

Abstract: Multiview synthetic aperture radar (SAR) images contain much richer information for automatic target recognition (ATR) than a single-view image. It is therefore desirable to establish a reasonable multiview ATR scheme and to design an effective ATR algorithm that thoroughly learns and extracts this classification information, so that superior SAR ATR performance can be achieved. Hence, this paper first gives a general processing framework for the multiview SAR ATR pattern, which provides an effective approach to ATR system design. Then, a new ATR method using a multiview deep feature learning network is designed based on the proposed multiview ATR framework. The proposed neural network has a multiple-input parallel topology and several distinct deep feature learning modules, with which significant classification features, namely the intra-view and inter-view features of the input multiview SAR images, are learned simultaneously and thoroughly. As a result, the proposed multiview deep feature learning network achieves excellent SAR ATR performance. Experimental results demonstrate the superiority of the proposed multiview SAR ATR method under various operating conditions.


Introduction
Synthetic aperture radar (SAR) has become an important and powerful modern microwave sensor system in both military and civilian fields [1]. Owing to its superior operational capabilities [2,3], SAR now plays a significant information acquisition role in reconnaissance and detection. In addition, SAR can obtain the electromagnetic scattering characteristics of the detected targets and scenes and acquire unique information from imaging results at microwave frequencies [4], which gives it remarkable advantages over other sensor systems.
With the improvement of the imaging capability of SAR systems, researchers have become interested not only in SAR signal processing but also in the interpretation and recognition of real-world targets from SAR images. Automatic target recognition (ATR) [5][6][7][8] has become one of the most attractive but challenging research hotspots in SAR applications. From the users' point of view, an ideal ATR system should locate the regions containing potential targets of interest in the SAR image and assign accurate category labels to those targets intelligently and efficiently [9].
The general scheme of an end-to-end SAR ATR system, proposed by researchers from MIT Lincoln Laboratory, consists of three hierarchical processing stages [10]: detection [11], discrimination [12], and classification [13]. It aims to find the regions of interest (ROIs) in the SAR imagery, screen the desired targets [14], remove false-alarm clutter, and finally assign class attributes to the SAR targets with a well-designed classifier. In order to make intelligent SAR target recognition a reality, …
Generally, multiview SAR ATR is a complex and integrated information processing procedure. To achieve outstanding multiview SAR ATR performance, two important elements are required: a valid ATR processing framework and an appropriate ATR algorithm that learns classification features from limited raw SAR samples. A reasonable processing framework is necessary for effective multiview SAR ATR, while the ATR algorithm is the key component of that framework. Hence, it is indispensable to establish a standard processing framework for multiview SAR ATR system design and then to search for an effective ATR algorithm.
In this paper, we give a general processing framework for multiview SAR ATR comprising three parts, i.e., raw multiview SAR data formation, multiview SAR data preprocessing, and multiview target recognition, which provides an effective and standard approach to multiview SAR ATR system design. Then, a novel ATR method using a multiview deep feature learning network is proposed based on this framework. The proposed deep neural network has a multiple-input parallel topology, and specific modules such as convolutional layers, convolutional gated recurrent units (ConvGRUs), weighted concatenation units (WCUs), a 3D convolutional layer, and a 3D pooling layer are embedded in it. Both the intra-view and inter-view features of the input multiview SAR images are thoroughly learned by this elaborately designed multiview deep feature learning network. Therefore, the proposed network can exploit comprehensive and significant classification information from multiview SAR images and achieve high target recognition accuracy.
The main contributions of this work compared with available SAR ATR works are as follows:
(1) We give a general processing framework for multiview SAR ATR, which can serve as a paradigm for ATR system design and future studies in this field.
(2) A multiview deep feature learning network is proposed for effective SAR ATR; this network simultaneously extracts the intra-view and inter-view features from multiview SAR images.
(3) Compared with available SAR ATR methods, the proposed deep neural network achieves excellent ATR performance under various operating conditions, even with limited raw SAR data for training sample generation.
This paper is organized as follows: a general processing framework for multiview SAR ATR is introduced in Section 2. Section 3 details the proposed SAR ATR method using a multiview deep feature learning network. Experiments are carried out in Section 4, and Section 5 gives the conclusions of our work.

Multiview SAR ATR Processing Framework
The practical implementation of SAR ATR was summarized as multistage processing by researchers from MIT Lincoln Laboratory in the last century [46], and it remains a classical and excellent SAR ATR framework. However, that ATR scheme was general and originally designed mainly for single-view SAR images. Multiview SAR data have higher dimensionality than single-view data and contain rich classification information, so they require a more sophisticated and dedicated ATR processing scheme. Therefore, based on the MIT ATR scheme, we give a general processing framework suited to multiview SAR ATR.
The framework includes three specific parts, as shown in Figure 1: raw multiview SAR data formation, multiview SAR data preprocessing, and multiview target recognition, each of which performs a clearly identifiable function. The modules of this framework are detailed as follows.

Raw Multiview SAR Data Formation
The first module in the framework acquires eligible and valid raw multiview SAR images and locates the ROIs, which pinpoints the targets of interest and reduces the computational load of the ATR system. Generally, this module contains two processing steps: multiview SAR imaging and ROI acquisition. Some SAR imaging modes, such as spotlight mode [47] and circular mode [48], can continuously observe the same scene or target and are well suited to collecting raw multiview SAR images. Then, target chips with multiple views are obtained in the ROI acquisition step, for which many target detection and discrimination methods are available.

Multiview SAR Data Preprocessing
After raw multiview SAR data formation, the multiview SAR target chips are obtained; however, some problems remain. For example, the orientation of the same target differs across the SAR chips, and the scattering information of the target in the multiview SAR images may be indistinct. In addition, sufficient training samples must be fed into the multiview SAR ATR algorithm to optimize its parameters during the training phase. However, the amount of available raw multiview SAR data is often limited in practice, which can lead to overfitting of the ATR algorithm.
The aims of multiview SAR data preprocessing are to eliminate these inconsistencies, enhance the scattering information, and augment the raw multiview SAR data for training, which correspond to the orientation correction, image enhancement, and data augmentation steps of this module, respectively. After preprocessing, the multiview SAR data are better suited to the subsequent ATR processing, and the classification information of the multiview SAR targets is easier to learn.

Multiview Target Recognition
Multiview target recognition is the back-end module of the multiview SAR ATR processing framework. It hosts the ATR algorithm, receives the multiview SAR samples from the preceding module, and assigns the most probable class label to the target. Essentially, this module learns and extracts effective classification features from the input samples and optimally partitions the feature space with hyperplanes.
There are two kinds of very important features to be learned in multiview SAR images: the intra-view feature and the inter-view feature. The intra-view feature is the inherent scattering or structural feature of the SAR target within each view, while the inter-view feature is the mutual feature across the multiview SAR image sequence, which distinguishes multiview from single-view SAR ATR. The inter-view feature itself comprises two components. When SAR observes the same target from different views, the correlated feature among the multiview image sequence, namely the temporal feature, contains intrinsic classification information. In addition, the variation feature of the multiview image sequence, i.e., the spatial feature, provides complementary discriminative information about the same target and benefits ATR. Therefore, the key point in the multiview target recognition module is to design an appropriate ATR algorithm that simultaneously learns both intra-view and inter-view classification features from multiview SAR images. After feature learning, the module assigns an accurate class attribute to the target.
In summary, the multiview SAR ATR processing framework consists of three individual but related modules with several distinct steps, through which the multiview SAR ATR problem can be handled effectively. Although each module includes specific processing steps, not every step is strictly necessary, and practitioners may adjust them in ATR practice.

Proposed Multiview SAR ATR Method
This section presents a multiview deep feature learning network for SAR ATR, which is based on the above framework and simultaneously learns both the intra-view and inter-view features of multiview SAR images. Following the multiview ATR framework, we first discuss raw multiview SAR data formation and multiview SAR data preprocessing. Then, the design of the feature learning network for multiview SAR ATR is given, and finally the configuration of the network is detailed.

Raw Data Formation
In the multiview SAR imaging pattern, the SAR sensor collects returns and obtains multiview images of a given target from different elevation and aspect angles. For simplicity, the depression angle is held constant here. Figure 2 shows the geometric model of the multiview SAR imaging process. The given view interval is denoted θ, and the view number is k > 1. The target chips with multiple views are then obtained by target detection and discrimination methods. Using these raw SAR images from different view angles, more classification information can be exploited than in the single-view pattern.

Data Preprocessing
Before the training and recognition phases of ATR, the raw multiview SAR data require several preprocessing steps: orientation correction, image enhancement, and data augmentation. Targets in SAR images are often sensitive to the view and appear differently, in both scattering characteristics and orientation, across the multiview images. To preserve the scattering information of the targets in the multiview images while reducing their orientation differences, we align the targets to the same orientation by aspect rotation with an affine transformation:
$$\begin{bmatrix} u \\ v \end{bmatrix} = \begin{bmatrix} \cos\varphi & -\sin\varphi \\ \sin\varphi & \cos\varphi \end{bmatrix} \begin{bmatrix} p \\ q \end{bmatrix}$$
where $\varphi$ is the rotation angle estimated by the SAR target aspect estimation method [49], (p, q) are the original coordinates, and (u, v) are the transformed coordinates. After target orientation correction, we employ the power-function-based gray enhancement method [50] to enhance the scattering information of the raw multiview SAR images:
$$x(u, v) = \left[\tilde{x}(u, v)\right]^{\rho}$$
where $\tilde{x}(u, v)$ is the original image, $x(u, v)$ is the SAR image after enhancement, and ρ is the enhancement factor.
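The two preprocessing equations above can be illustrated with a minimal NumPy sketch. It assumes a single-channel magnitude chip, treats u as the column index and v as the row index, rotates about the chip centre with nearest-neighbour inverse mapping, and min-max normalizes the chip to [0, 1] before applying the power function; these choices, and the function names, are illustrative assumptions rather than the authors' implementation.

```python
import numpy as np

def rotation_matrix(phi):
    """2x2 rotation R(phi) mapping (p, q) to (u, v)."""
    c, s = np.cos(phi), np.sin(phi)
    return np.array([[c, -s], [s, c]])

def correct_orientation(img, phi):
    """Rotate a 2D chip by phi (radians) about its centre.

    Inverse mapping (p, q) = R(-phi)(u, v) with nearest-neighbour sampling;
    u is the column index, v the row index (assumption). Pixels that map
    outside the chip are set to zero.
    """
    h, w = img.shape
    cu, cv = (w - 1) / 2.0, (h - 1) / 2.0
    v, u = np.mgrid[0:h, 0:w]                 # output pixel grid
    r_inv = rotation_matrix(-phi)
    du, dv = u - cu, v - cv                   # coordinates relative to centre
    p = np.rint(r_inv[0, 0] * du + r_inv[0, 1] * dv + cu).astype(int)
    q = np.rint(r_inv[1, 0] * du + r_inv[1, 1] * dv + cv).astype(int)
    inside = (p >= 0) & (p < w) & (q >= 0) & (q < h)
    out = np.zeros_like(img)
    out[inside] = img[q[inside], p[inside]]
    return out

def power_enhance(img, rho):
    """Power-function gray enhancement x = x_tilde ** rho.

    The chip is min-max normalised to [0, 1] first (assumption), so rho > 1
    suppresses weak returns relative to strong scatterers and rho < 1
    brightens weak returns.
    """
    x = img.astype(float)
    x = (x - x.min()) / (x.max() - x.min() + 1e-12)
    return x ** rho
```

Nearest-neighbour sampling keeps the sketch short; a bilinear interpolator would give smoother rotated chips.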
Next, we use a typical multiview SAR data augmentation method [45] to generate adequate training samples. Suppose $X^{(raw)} = \{X_1, X_2, \ldots, X_C\}$ is the set of raw SAR images, where the image set $X_i = \{x_1, x_2, \ldots, x_{n_i}\}$, collected by SAR at aspect angles covering 0° to 360°, belongs to class $y_i$, and the aspect angle of image $x$ is denoted $\phi(x)$. The class label set is $\{y_i \in [1, 2, \ldots, C]\}$. For a given view number k > 1, all view combinations of the SAR images of one class can be enumerated; for class $y_i$ the number of combinations is $\binom{n_i}{k}$. The training data can then be significantly augmented following the steps in Table 1, where $N_i$ is the number of multiview SAR image sequences among these combinations that fulfill the selection condition in class $y_i$, and the final multiview SAR image sequence set for training is $\mathcal{X} = \{\mathcal{X}_1, \mathcal{X}_2, \ldots, \mathcal{X}_C\}$.
Table 1. Process of multiview SAR data augmentation.
Initialization: view interval θ and view number k.
Find all view combinations of the raw SAR images from class $y_i$.
for j = 1 to $\binom{n_i}{k}$ do
    Arrange the elements of each combination as $(x_j^1, x_j^2, \ldots, x_j^k)$, …
end for
Output: multiview SAR image sequence set $\mathcal{X} = \{\mathcal{X}_1, \mathcal{X}_2, \ldots, \mathcal{X}_C\}$ for ATR algorithm training.
Figure 3 shows an example of using the data augmentation method to generate multiview SAR training samples. With three views in each view interval, the method obtains nine 3-view training samples from only six raw SAR images. As the view number and view interval increase, sufficient training samples can be obtained from a given number of raw SAR images.
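The augmentation loop of Table 1 can be sketched with `itertools.combinations`. The selection condition is not fully recoverable from this text, so the sketch assumes, hypothetically, that a combination is kept when its k aspect angles fit inside an arc no wider than the view interval θ (with wrap-around at 360°), and that each kept sequence is ordered by aspect angle along that arc. The helper names and the aspect angles in the usage line are illustrative only.

```python
from itertools import combinations

def arc_span_and_order(angles):
    """Smallest arc (degrees) containing all aspect angles, plus the indices
    of `angles` ordered along that arc (wrap-around at 360 deg handled)."""
    order = sorted(range(len(angles)), key=lambda i: angles[i] % 360.0)
    a = [angles[i] % 360.0 for i in order]
    # Gaps between neighbours on the circle; the last one wraps past 360.
    gaps = [a[m + 1] - a[m] for m in range(len(a) - 1)] + [a[0] + 360.0 - a[-1]]
    g = max(range(len(gaps)), key=gaps.__getitem__)   # largest empty gap
    start = (g + 1) % len(a)                          # arc begins after it
    return 360.0 - gaps[g], order[start:] + order[:start]

def augment_class(aspects, k, theta):
    """Enumerate all C(n_i, k) view combinations of one class and keep those
    whose aspect angles fit within theta (ASSUMED selection condition).
    Returns k-tuples of raw-image indices ordered by aspect angle."""
    sequences = []
    for combo in combinations(range(len(aspects)), k):
        span, order = arc_span_and_order([aspects[i] for i in combo])
        if span <= theta:
            sequences.append(tuple(combo[m] for m in order))
    return sequences

# Hypothetical aspect angles (degrees) of six raw images of one class.
seqs = augment_class([0, 10, 20, 30, 100, 350], k=3, theta=45)
print(len(seqs), seqs[0])  # prints: 10 (0, 1, 2)
```

The image at 100° lies too far from the others to share a 45° arc, so every kept sequence is drawn from the five images between 350° and 30°.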

Multiview Deep Feature Learning Network
The basic architecture of the proposed multiview deep feature learning network is shown in Figure 4. The network employs a parallel topology with multiview inputs, which are progressively merged and fused in different layers so that recognition information can be effectively learned and extracted from SAR images with different views. As mentioned in Section 2.3, intra-view and inter-view features are two kinds of important features that should be learned from multiview SAR images, and together they carry the complete classification information. Each branch of the proposed network consists of alternating convolutional and pooling layers, which learn the inherent classification feature of the target within each view and reduce the feature dimension. Meanwhile, a special recurrent neural network structure, the ConvGRU, is embedded in the network to extract the temporal feature, i.e., the correlated feature among the multiview SAR images. In addition, we propose a new spatial feature learning module, the WCU, to learn and fuse the classification features of the multiview SAR images from two different network branches. After all network branches are merged, the feature maps are fed into a 3D convolutional layer followed by a 3D max pooling layer to further learn the inter-view feature from the fused feature maps. Finally, the network ends with a fully connected layer, and the recognition decision is made by a softmax classifier.
From the basic architecture, it can be seen that the proposed multiview deep feature learning network is capable of learning both intra-view and inter-view classification features from multiview SAR images and benefits ATR. In the following discussions, we will detail the network layers for intra-view feature and inter-view feature learning in the multiview deep feature learning network.
The layers designed for intra-view feature learning from multiview SAR images are mainly the convolutional and pooling layers, which are detailed as follows.

Convolutional Layer
The convolutional layer is inspired by the response of biological neurons in the visual cortex to specific stimuli and learns image features well, so it can effectively learn the intra-view feature from multiview SAR images. Let $a_i^{(l-1)}$ be the ith feature map in layer l − 1 of our network; it is connected to all feature maps in layer l by convolution. Let $w_{ij}^{(l)}$ be the convolution kernel connecting $a_i^{(l-1)}$ to the jth output feature map and $b_j^{(l)}$ the bias. The forward process of the convolutional layer for each unit can be written as
$$a_j^{(l)} = \sigma\left(\sum_i a_i^{(l-1)} * w_{ij}^{(l)} + b_j^{(l)}\right)$$
where $a_j^{(l)}$ is the jth output feature map, the symbol * denotes 2D convolution, and σ(·) is the activation function. Rectified linear units (ReLUs) are used as the activation function after each convolutional layer to increase the nonlinearity of the proposed network. The ReLU can be expressed as
$$\sigma(x) = \max(0, x)$$
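The convolutional forward pass and ReLU above can be sketched directly in NumPy. This is an illustrative sketch assuming stride 1 (as in the network setup) and no padding ("valid" mode); like most deep-learning frameworks it computes cross-correlation, which differs from the strict convolution * only by a kernel flip. The function names are hypothetical.

```python
import numpy as np

def relu(x):
    """ReLU activation: sigma(x) = max(0, x), applied elementwise."""
    return np.maximum(0.0, x)

def conv_layer_forward(a_prev, w, b):
    """Forward pass a_j = relu(sum_i a_i * w_ij + b_j), stride 1, 'valid' padding.

    a_prev: (H, W, I) input feature maps a_i^(l-1)
    w:      (P, Q, I, J) kernels w_ij^(l)
    b:      (J,) biases b_j^(l)
    Computes cross-correlation, as deep-learning frameworks do; flip w along
    its first two axes to obtain a strict mathematical convolution.
    """
    H, W, _ = a_prev.shape
    P, Q, _, J = w.shape
    z = np.zeros((H - P + 1, W - Q + 1, J))
    for x in range(H - P + 1):
        for y in range(W - Q + 1):
            # Sum over kernel rows, kernel columns and all input maps i at once.
            z[x, y] = np.tensordot(a_prev[x:x + P, y:y + Q], w, axes=3)
    return relu(z + b)
```

For example, two all-ones 4 × 4 input maps convolved with all-ones 3 × 3 kernels give 18 at every position before the bias, so a bias of −10 yields 8 everywhere and a bias of −20 is clipped to 0 by the ReLU.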

Pooling Layer
A convolutional layer is often followed by a pooling layer in a deep neural network. The pooling layer selects local features from its input feature map and reduces the feature dimension, which makes it a useful complement to intra-view feature learning. Here, we use max pooling in the proposed neural network, which can be written as
$$a_j^{(l)}(x, y) = \max_{0 \le u < p_1,\; 0 \le v < p_2} a_j^{(l-1)}(x \cdot s_1 + u,\; y \cdot s_2 + v)$$
where $s_1$ and $s_2$ are the pooling strides, and $p_1$ and $p_2$ are the pooling window sizes. As the pooling window slides over the feature map and outputs the maximum within each window, the valuable local features of the SAR image within each view are effectively extracted and the feature dimension is reduced.
While the convolutional and max pooling layers learn the intra-view features of the multiview SAR images, the proposed network also contains modules for inter-view feature learning, which are described in the following.
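The max pooling equation above translates into a short NumPy sketch. The 2 × 2 stride matches the pooling stride given in the network setup; the 2 × 2 default window and the function name are assumptions for illustration.

```python
import numpy as np

def max_pool2d(a, p1=2, p2=2, s1=2, s2=2):
    """a_j^(l)(x, y) = max_{0<=u<p1, 0<=v<p2} a_j^(l-1)(x*s1 + u, y*s2 + v).

    a: (H, W, J) feature maps; windows that would overrun the border are dropped.
    """
    H, W, J = a.shape
    oh, ow = (H - p1) // s1 + 1, (W - p2) // s2 + 1
    out = np.empty((oh, ow, J))
    for x in range(oh):
        for y in range(ow):
            # Maximum over the p1 x p2 window, separately for every map j.
            out[x, y] = a[x * s1:x * s1 + p1, y * s2:y * s2 + p2].max(axis=(0, 1))
    return out
```

Applied to a 4 × 4 map holding 0 to 15 in row-major order, the four 2 × 2 windows return 5, 7, 13, and 15, halving each spatial dimension.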

Convolutional Gated Recurrent Unit
Multiview SAR images contain a temporal feature across the sequence, which can provide intrinsic classification information, so ATR performance improves if this feature is extracted well. A recurrent neural network is a special deep neural network suited to learning the correlated features of a data sequence and can therefore handle this problem. A typical recurrent structure, the gated recurrent unit (GRU), adaptively extracts temporal features by resetting and updating the flow of information inside the unit and can potentially capture dependencies within the multiview SAR image sequence. However, a classical GRU uses fully connected operations within each unit and cannot exploit the underlying structural information of the feature maps learned from SAR images. We therefore employ a GRU with convolution operations, namely the ConvGRU [51], in the proposed deep neural network. It retains the advantages of a recurrent unit but is more appropriate than the classical GRU for feature learning from multiview SAR images.
The ConvGRU block is a two-input, two-output system composed mainly of a reset gate, an update gate, and several other operations, as shown in Figure 5. When a new input arrives, the reset gate controls which features learned from previous views should be remembered, while the update gate determines how much of the new feature of the current view is retained. For a given ConvGRU, let the input feature map be $a_t$ and the state carrying the previously learned feature be $h_{t-1}$. The reset gate $r_t$ and update gate $z_t$ are computed as
$$r_t = \mathrm{sig}\left(w_{ar} * a_t + w_{hr} * h_{t-1} + b_r\right)$$
$$z_t = \mathrm{sig}\left(w_{az} * a_t + w_{hz} * h_{t-1} + b_z\right)$$
where $w_{ar}$, $w_{hr}$ and $w_{az}$, $w_{hz}$ are convolution kernels, $b_r$ and $b_z$ are their corresponding biases, and sig(·) denotes the sigmoid function, which maps its input to the interval (0, 1). The candidate hidden state of the ConvGRU is then computed as
$$\tilde{h}_t = \tanh\left(w_{ah} * a_t + w_{hh} * \left(r_t \odot h_{t-1}\right) + b_h\right)$$
where $w_{ah}$ and $w_{hh}$ are convolution kernels, $b_h$ is the bias, ⊙ denotes the Hadamard product, and tanh(·) keeps the values of the candidate hidden state within the interval (−1, 1). Finally, the current state $h_t$, which is also the output of the ConvGRU, is obtained as
$$h_t = \left(1 - z_t\right) \odot h_{t-1} + z_t \odot \tilde{h}_t$$
When the feature maps from the previous network module arrive, the ConvGRUs thus learn the temporal feature of the multiview SAR image sequence.
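The ConvGRU recurrence above can be sketched in NumPy and run over a short view sequence. This is a minimal sketch assuming zero-padded "same" convolutions (so the hidden state keeps the spatial size of the input), a zero initial state, and the candidate-state form in which the reset gate multiplies $h_{t-1}$ before the convolution; `init_params` and the 8 × 8 × 2 view features are hypothetical.

```python
import numpy as np

def conv_same(x, w):
    """Stride-1, zero-padded ('same') multichannel 2D cross-correlation.
    x: (H, W, C_in); w: (P, Q, C_in, C_out) with odd P and Q."""
    P, Q, _, c_out = w.shape
    H, W, _ = x.shape
    xp = np.pad(x, ((P // 2, P // 2), (Q // 2, Q // 2), (0, 0)))
    out = np.zeros((H, W, c_out))
    for i in range(H):
        for j in range(W):
            out[i, j] = np.tensordot(xp[i:i + P, j:j + Q], w, axes=3)
    return out

def sig(x):
    """Sigmoid, mapping inputs to (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

def init_params(c_in, c_hid, k=3, scale=0.1, seed=0):
    """Hypothetical initialiser for w_ar, w_hr, b_r, w_az, w_hz, b_z, w_ah, w_hh, b_h."""
    rng = np.random.default_rng(seed)
    p = {}
    for g in ("r", "z", "h"):
        p["w_a" + g] = scale * rng.standard_normal((k, k, c_in, c_hid))
        p["w_h" + g] = scale * rng.standard_normal((k, k, c_hid, c_hid))
        p["b_" + g] = np.zeros(c_hid)
    return p

def convgru_step(a_t, h_prev, p):
    """One ConvGRU update; returns h_t, which is also the cell output."""
    r_t = sig(conv_same(a_t, p["w_ar"]) + conv_same(h_prev, p["w_hr"]) + p["b_r"])
    z_t = sig(conv_same(a_t, p["w_az"]) + conv_same(h_prev, p["w_hz"]) + p["b_z"])
    h_cand = np.tanh(conv_same(a_t, p["w_ah"])
                     + conv_same(r_t * h_prev, p["w_hh"]) + p["b_h"])
    return (1.0 - z_t) * h_prev + z_t * h_cand

# Usage: feed three hypothetical per-view feature maps through the cell in order.
rng = np.random.default_rng(1)
views = [rng.standard_normal((8, 8, 2)) for _ in range(3)]
params = init_params(c_in=2, c_hid=4)
h = np.zeros((8, 8, 4))
for a_t in views:
    h = convgru_step(a_t, h, params)
```

Because $h_t$ is a convex combination of $h_{t-1}$ and a tanh output, the state stays inside (−1, 1) when started from zero; with all-zero parameters both gates equal 0.5 and the candidate is 0, so the state is simply halved.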

Weighted Concatenation Unit
In the proposed neural network, the inputs or feature maps are progressively merged and fused in different layers, so several concatenation operations are needed to extract classification information from different views. A traditional concatenation module simply stacks the input feature maps and treats every feature map as equally important. However, the features learned from multiview SAR images differ, and the corresponding feature maps vary across network branches. It is therefore necessary to select beneficial spatial features, focusing on important features and suppressing trivial ones from different network branches during feature learning. To this end, a new spatial feature learning module, the WCU, is designed to learn and fuse classification features from different network branches. The block diagram of the proposed WCU is shown in Figure 6. Let feature maps $a \in \mathbb{R}^{m \times n \times r}$ and $a' \in \mathbb{R}^{m \times n \times s}$ be the two inputs of the WCU. The feedforward propagation in the WCU is
$$c = \mathrm{conc}(a, a'), \qquad d = W_m \odot c$$
where $c \in \mathbb{R}^{m \times n \times (r+s)}$, conc(·, ·) denotes the concatenation operation in Figure 6, $W_m \in \mathbb{R}^{m \times n \times (r+s)}$ are the weighting maps to be learned, ⊙ denotes the Hadamard product, and $d \in \mathbb{R}^{m \times n \times (r+s)}$ are the output weighted maps of the WCU. Through this weighted concatenation, the WCU finds the optimal weighting maps during network learning and concatenates and weights the input feature maps, thereby learning and emphasizing meaningful spatial features of multiview SAR images.
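A minimal NumPy sketch of the WCU forward pass follows, together with the elementwise gradients that let $W_m$ be learned by backpropagation. The inputs a and a′ are named `a1` and `a2` here, and the gradient routine is our own derivation from $d = W_m \odot c$, not code from the paper.

```python
import numpy as np

def wcu_forward(a1, a2, w_m):
    """d = W_m ⊙ conc(a1, a2).

    a1: (m, n, r); a2: (m, n, s); w_m: (m, n, r + s) learnable weighting maps.
    Also returns c, which the backward pass needs.
    """
    c = np.concatenate([a1, a2], axis=-1)      # conc(a1, a2) along the map axis
    if w_m.shape != c.shape:
        raise ValueError(f"w_m must have shape {c.shape}, got {w_m.shape}")
    return w_m * c, c                          # Hadamard weighting

def wcu_backward(c, w_m, grad_d, r):
    """Given dL/dd, return dL/dW_m, dL/da1 and dL/da2 (r = channels of a1)."""
    grad_w = grad_d * c            # dd/dW_m = c, elementwise
    grad_c = grad_d * w_m          # dd/dc = W_m, elementwise
    return grad_w, grad_c[..., :r], grad_c[..., r:]

# Usage with small hypothetical inputs: one map from a1, two from a2.
a1 = np.ones((2, 2, 1))
a2 = 2.0 * np.ones((2, 2, 2))
w_m = np.full((2, 2, 3), 0.5)
d, c = wcu_forward(a1, a2, w_m)
grad_w, grad_a1, grad_a2 = wcu_backward(c, w_m, np.ones_like(d), r=1)
```

Because each weight multiplies exactly one element of c, the gradient with respect to $W_m$ is simply c scaled by the upstream gradient, which is what lets the unit learn to emphasize or suppress individual maps.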

3D Convolutional Layer and 3D Pooling Layer
After all the input views have been progressively merged in the network, the learned feature maps are concatenated together. Then, those concatenated feature maps will flow into a 3D convolutional layer [52] and a 3D max pooling layer, in which the inter-view feature from the fused feature maps will be further learned. In contrast to 2D convolution, the input, convolution kernel, and the output are represented as 3D tensors in the 3D convolutional layer, which can learn and fuse both the spatial and temporal features from the preceding concatenated feature maps.
Specifically, the 3D convolutional layer performs a 3D convolution on feature tensors, which can be written as
$$a_j^{(l)}(x, y, z) = \sigma\left(\sum_i \sum_{p=0}^{P-1} \sum_{q=0}^{Q-1} \sum_{r=0}^{R-1} w_{ij}^{(l)}(p, q, r)\, a_i^{(l-1)}(x + p,\, y + q,\, z + r) + b_j^{(l)}\right)$$
where $a_j^{(l)}(x, y, z)$ denotes the value of the jth output feature tensor at position (x, y, z), $w_{ij}^{(l)}$ is the 3D convolution kernel of size P × Q × R, $b_j^{(l)}$ is the bias, and σ(·) is the ReLU activation function.
A 3D convolutional layer is usually followed by a 3D pooling operation to extract local features and reduce the dimension of the feature tensor. Here, 3D max pooling is applied after the 3D convolutional layer, which can be expressed as
$$a_j^{(l)}(x, y, z) = \max_{0 \le u < p_1,\; 0 \le v < p_2,\; 0 \le w < p_3} a_j^{(l-1)}(x \cdot s_1 + u,\; y \cdot s_2 + v,\; z \cdot s_3 + w) \tag{12}$$
where $s_1$, $s_2$, and $s_3$ are the pooling strides, and $p_1$, $p_2$, and $p_3$ are the pooling window sizes.
In addition, other modules, such as dropout and the softmax classifier, are needed in the proposed network. Dropout [53] is a widely used technique for reducing overfitting in neural networks; by randomly deactivating neurons during training, it forces the network to learn robustly with different combinations of active neurons. In our deep neural network, dropout is applied after the last convolutional layer to improve generalization.
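The 3D convolution and the 3D max pooling of Equation (12) can be sketched in NumPy as follows. The sketch assumes stride 1 and no padding for the convolution and treats the third tensor axis as the depth along which the merged branch feature maps are stacked; that stacking layout and the function names are assumptions for illustration.

```python
import numpy as np

def conv3d_forward(a_prev, w, b):
    """a_j(x,y,z) = relu(sum_i sum_{p,q,r} w_ij(p,q,r) a_i(x+p, y+q, z+r) + b_j).

    a_prev: (X, Y, Z, I); w: (P, Q, R, I, J); b: (J,). Stride 1, no padding.
    """
    X, Y, Z, _ = a_prev.shape
    P, Q, R, _, J = w.shape
    out = np.zeros((X - P + 1, Y - Q + 1, Z - R + 1, J))
    for x in range(out.shape[0]):
        for y in range(out.shape[1]):
            for z in range(out.shape[2]):
                patch = a_prev[x:x + P, y:y + Q, z:z + R]   # (P, Q, R, I)
                out[x, y, z] = np.tensordot(patch, w, axes=4)
    return np.maximum(0.0, out + b)

def max_pool3d(a, p=(2, 2, 2), s=(2, 2, 2)):
    """Max over p1 x p2 x p3 windows taken with strides s1, s2, s3 (Eq. (12))."""
    X, Y, Z, J = a.shape
    ox = (X - p[0]) // s[0] + 1
    oy = (Y - p[1]) // s[1] + 1
    oz = (Z - p[2]) // s[2] + 1
    out = np.empty((ox, oy, oz, J))
    for x in range(ox):
        for y in range(oy):
            for z in range(oz):
                win = a[x * s[0]:x * s[0] + p[0],
                        y * s[1]:y * s[1] + p[1],
                        z * s[2]:z * s[2] + p[2]]
                out[x, y, z] = win.max(axis=(0, 1, 2))
    return out
```

For example, an all-ones 4 × 4 × 4 input with all-ones 3 × 3 × 3 kernels yields 27 at each of the 2 × 2 × 2 output positions, and 2 × 2 × 2 pooling reduces that to a single value per output map.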
After all the intra-view and inter-view features of the multiview SAR images have been learned, the feature maps are flattened into a feature vector that feeds a fully connected layer. The softmax classifier is then used for the final recognition:
$$P\left(y = c \mid z^{(L)}\right) = \frac{\exp\left(z_c^{(L)}\right)}{\sum_{j=1}^{C} \exp\left(z_j^{(L)}\right)}, \quad c = 1, \ldots, C$$
where $z^{(L)}$ is the input feature vector to the softmax classifier and C is the number of classes. The recognition result is the class with the maximum posterior probability.
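The softmax decision rule above can be sketched in a few lines of NumPy. Subtracting the largest input before exponentiating is a standard numerical-stability step that leaves the posteriors unchanged; note that the returned index is 0-based, whereas the text numbers classes from 1 to C.

```python
import numpy as np

def softmax(z):
    """Posterior P(y = c | z) for every class; subtracting max(z) avoids overflow."""
    z = np.asarray(z, dtype=float)
    e = np.exp(z - z.max())
    return e / e.sum()

def predict(z):
    """Recognition decision: 0-based index of the class with maximum posterior."""
    return int(np.argmax(softmax(z)))
```

For instance, `predict([1.0, 3.0, 2.0])` returns index 1, and `softmax([1000.0, 1000.0])` returns equal posteriors of 0.5 instead of overflowing.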

Cost Function and Network Training
After forward propagation, the proposed deep network compares the class label with the output inferred by the softmax classifier using the cross-entropy cost function:
$$J = -\sum_{c=1}^{C} t_c \log P\left(y = c \mid z^{(L)}\right)$$
where $t_c = 1$ if c is the true class of the sample and $t_c = 0$ otherwise. Although the proposed deep neural network has a complex structure, the training method used to minimize the cost function and optimize the trainable parameters is similar to that of common SAR ATR neural networks. The backpropagation through time algorithm [54] can be used to train the parameters of the temporal feature learning module, i.e., the ConvGRU, while the rest of the network is trained with the backpropagation algorithm. Once training is finished, the network has its optimized parameters and can effectively learn various features and accurately classify input multiview SAR images.
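The cost function above and the gradient that backpropagation starts from can be sketched as follows. Averaging the per-sample cross entropy over a minibatch is our assumption (the text gives the per-sample form and a minibatch size of 128), and labels are 0-based class indices rather than one-hot vectors.

```python
import numpy as np

def softmax_rows(z):
    """Row-wise softmax of an (N, C) array of logits."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy(z, labels):
    """Mean cross-entropy over a minibatch and its gradient w.r.t. the logits.

    z: (N, C) softmax inputs z^(L); labels: (N,) integer classes in [0, C).
    """
    n = z.shape[0]
    p = softmax_rows(z)
    loss = -np.mean(np.log(p[np.arange(n), labels]))
    grad = p.copy()
    grad[np.arange(n), labels] -= 1.0      # dJ/dz = P - t for one-hot t
    return loss, grad / n
```

The gradient P − t is why the softmax and cross entropy are usually fused: it is cheap to compute and numerically well behaved. With all-zero logits over four classes, the loss is exactly log 4.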

Experiments and Results
ATR performance of the proposed multiview deep feature learning network will be evaluated in this section. First, the network architecture setup is specified, and the multiview SAR training and testing data formation for experiments are also given. Finally, we will extensively assess the performance of the proposed multiview deep network under different SAR ATR operating conditions.

Network Architecture Setup
We use two network instances, with three and four input views, to assess the proposed deep network for SAR ATR; they are shown in Figure 7. The input SAR image size is 90 × 90 for the three-view instance and 120 × 120 for the four-view instance. The stride is 1 × 1 in each convolutional layer and 2 × 2 in each pooling layer. The dropout probability is 0.5 during training. The other hyperparameters of the proposed multiview deep network are shown in Figure 7; they were determined by statistical validation and trials.
The proposed multiview neural network is implemented in TensorFlow. The instances are trained with a minibatch size of 128 examples and a learning rate of 0.001. Their weights and biases are initialized from Gaussian distributions with zero mean and a standard deviation of 0.

Data Set
In our experiments, raw SAR images from the Moving and Stationary Target Acquisition and Recognition (MSTAR) program are used to assess the recognition performance of the proposed multiview deep neural network. The MSTAR program, supported by the U.S. Defense Advanced Research Projects Agency and the U.S. Air Force Research Laboratory, aims to develop advanced SAR ATR systems for battlefield environments [55]. It collected a large quantity of SAR images as a benchmark data set for evaluating advanced SAR ATR systems; these images were acquired near Huntsville, AL, USA, by Sandia National Laboratories using the Synthetic Aperture Radar Target Location and Recognition System. The MSTAR data set includes a series of 0.3 m × 0.3 m resolution SAR images collected with an X-band spotlight SAR sensor. These images contain different types of vehicle targets and clutter. Ten classes of targets, namely the T62 and T72 tanks, the 2S1 rocket launcher, the ZIL131 truck, the BTR70, BTR60, BRDM2, and BMP2 armored personnel carriers, the ZSU23/4 air defense unit, and the D7 bulldozer, are used in our experiments for ATR performance evaluation. Optical images of these targets and their corresponding SAR images are shown in Figure 8.
The proposed multiview deep feature learning network is tested under both the standard operating condition (SOC) and extended operating conditions (EOCs) to comprehensively evaluate its recognition performance. In Sections 4.3 and 4.4, the two instances of the proposed deep network are tested under SOC and EOC, respectively. In addition, in Section 4.5 we compare the recognition performance of the proposed deep network with several recently published and widely cited SAR ATR algorithms.

Results under SOC
In this experiment, we evaluate the recognition performance of the network instances on ten classes of typical vehicle targets under SOC. Only part of the raw SAR images at 17° depression in the MSTAR data set are selected to generate multiview SAR image sequences for network training; for each target type, the selected raw SAR images cover aspect angles from 0° to 360°. All raw SAR images at 15° depression in the data set are used to generate testing samples. The usage of raw SAR images for training and testing sample generation in this experiment is listed in Table 2.

Table 2. Raw SAR images used for training (17° depression) and testing (15° depression) sample generation under SOC.
Target Type    Training (17°), 3-View Instance    Training (17°), 4-View Instance    Testing (15°), 3-/4-View Instances
BMP2sn-9563    78                                 59                                 195
BTR70          78                                 59                                 196
T72sn-132      78                                 58                                 196
BTR60          86                                 64                                 195
2S1            100                                75                                 274
BRDM2          100                                75                                 274
D7             100                                75                                 274
T62            100                                75                                 273
ZIL131         100                                75                                 274
ZSU23/4        100                                75                                 274
Here, we use the method described in Section 3.2 to generate a large number of multiview SAR image sequences from small subsets of the MSTAR data set for deep network training. The view interval θ is 45° in both the training and testing phases. There are 48,764 and 43,533 multiview SAR image sequences at 17° depression for training the three- and four-view network instances, respectively. Testing samples are randomly selected from the multiview SAR image sequences at 15° depression: 2000 samples per class, giving 20,000 test samples for ATR performance evaluation in each SOC experiment.
The recognition results of the proposed deep networks with three and four input views are shown in Tables 3 and 4, which are presented with confusion matrices. The rows in confusion matrix represent the ground truths of the target labels, and its columns are the predicted class labels by the ATR method. It can be observed that the recognition rates of the proposed multiview deep feature learning networks with three and four views are all higher than 99.00% under SOC in the ten classes problem. From the experimental results in Tables 3 and 4, we can see that the multiview SAR images are with much classification information, and the proposed multiview deep network is able to learn both the intra-view and inter-view classification features of these multiview SAR images but only with a few raw SAR data for training samples generation. Hence, we can come to a conclusion that the designed deep network architecture based on the general processing framework can obtain satisfactory recognition performances in the SOC ATR experiment.
Part of the input testing multiview SAR samples and their corresponding output tensors in the last fully connected layer are mapped into 2D Euclidean space by the tdistributed stochastic neighbor embedding (t-SNE) [56] algorithm to illustrate the good classification performance of the multiview deep feature learning network. T-SNE is a powerful dimension reduction algorithm, which can help us study the distribution characteristics of the high-dimensional data in a visualized low-dimensional space. Figure 9 shows the results of 2D visualization example of the input testing multiview SAR samples and their corresponding outputs in the last fully connected layer of the two proposed networks. In Figure 9, the sample points with the same color belong to the same target class. Figure 9a,b illustrate the input multiview SAR samples and the corresponding output for a 3-view network instance, and Figure 9c,d show the input and output of the multiview SAR samples for the 4-view network instance, respectively.
Figure 9a,c show that the original multiview SAR samples are intermixed in the visualized space and would be difficult to classify directly. After processing by the proposed network, however, both the intra-view and inter-view classification features have been learned. Samples with the same class label cluster together, and samples from different classes separate from each other in the low-dimensional space. This makes the classes easy to distinguish and leads to satisfactory recognition results.
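For reference, the standard t-SNE formulation is given below. It shows why class clusters such as those in Figure 9b,d appear well separated. The symbols are introduced here for illustration and are not defined elsewhere in the paper. $\mathbf{x}_i$ is a high-dimensional point, either a flattened multiview input sample or a fully connected layer output. $\mathbf{y}_i$ is its 2D embedding, $N$ is the number of points, and each bandwidth $\sigma_i$ is set from a user-chosen perplexity. The embedding minimizes the Kullback-Leibler divergence $C$ between the input-space affinities $p_{ij}$ and the embedding-space affinities $q_{ij}$.

```latex
\begin{aligned}
p_{j|i} &= \frac{\exp\!\left(-\lVert \mathbf{x}_i - \mathbf{x}_j \rVert^2 / 2\sigma_i^2\right)}
               {\sum_{k \neq i} \exp\!\left(-\lVert \mathbf{x}_i - \mathbf{x}_k \rVert^2 / 2\sigma_i^2\right)},
\qquad
p_{ij} = \frac{p_{j|i} + p_{i|j}}{2N}, \\[4pt]
q_{ij} &= \frac{\left(1 + \lVert \mathbf{y}_i - \mathbf{y}_j \rVert^2\right)^{-1}}
               {\sum_{k \neq l} \left(1 + \lVert \mathbf{y}_k - \mathbf{y}_l \rVert^2\right)^{-1}},
\qquad
C = \mathrm{KL}(P \,\Vert\, Q) = \sum_{i} \sum_{j \neq i} p_{ij} \log \frac{p_{ij}}{q_{ij}}.
\end{aligned}
```

The heavy-tailed Student-t kernel in $q_{ij}$ lets moderately dissimilar points be placed far apart in 2D. t-SNE therefore preserves local neighborhoods well. Distances between separate clusters in the embedding are not necessarily meaningful.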

Results under EOC
In practice, SAR ATR performance is affected by many kinds of operating condition variations. We therefore conduct experiments to assess the ATR performance of the multiview deep network in complex test scenarios under EOC. First, we evaluate the proposed deep network on testing data with a large depression angle variation, denoted as EOC-D. The raw SAR images selected for this experiment are listed in Table 5. In this test, four types of ground targets in the MSTAR data set are selected for training sample generation: 2S1, BRDM-2, T-72sn-132, and ZSU-234, all acquired at a 17° depression angle. This yields 20,951 and 19,075 multiview SAR image sequences for training the three- and four-input-view deep network instances, respectively. Four types of targets, listed in Table 5 and acquired at a 30° depression angle, are used to generate the testing samples. From these, 2000 samples per class are randomly selected for ATR performance evaluation. The recognition results of the EOC-D test are shown in Table 6, and both the three- and four-input-view network instances obtain good recognition results. The best recognition rate of the proposed network instances exceeds 97.00%, and the recognition rates of all instances are higher than 95.00%. The training data are acquired at a single depression angle, yet the multiview deep network maintains relatively stable recognition performance under this large depression angle variation. Next, we evaluate the proposed network under target configuration and version variations.
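The sequence counts above (e.g., 20,951 training sequences) are much larger than the number of raw images because each multiview sample combines several raw images of one target. The minimal sketch below shows how such combinatorial generation can multiply the data. It assumes a hypothetical rule: every set of k images of one class whose azimuth span stays within a window forms one sequence. This rule, the record format, and `multiview_sequences` are illustrative assumptions and not necessarily the exact generation rule of the proposed framework. The sketch will not reproduce the paper's counts.

```python
from itertools import combinations
from typing import List, Sequence, Tuple

# (image identifier, azimuth angle in degrees); hypothetical record format.
Image = Tuple[str, float]


def multiview_sequences(images: Sequence[Image], k: int,
                        max_span_deg: float) -> List[Tuple[str, ...]]:
    """Enumerate k-view sequences for a single target class.

    Hypothetical rule: every set of k distinct images, ordered by azimuth,
    whose azimuth span (largest minus smallest angle) does not exceed
    max_span_deg. Wrap-around at 360 degrees is ignored for simplicity.
    """
    ordered = sorted(images, key=lambda img: img[1])
    sequences = []
    # combinations() keeps the input order, so each combo is azimuth-sorted.
    for combo in combinations(ordered, k):
        if combo[-1][1] - combo[0][1] <= max_span_deg:
            sequences.append(tuple(name for name, _ in combo))
    return sequences


imgs = [("a", 10.0), ("b", 0.0), ("c", 5.0), ("d", 40.0)]
print(multiview_sequences(imgs, 3, 15.0))       # [('b', 'c', 'a')]
print(len(multiview_sequences(imgs, 2, 15.0)))  # 3
```

With densely sampled azimuths, the number of admissible combinations grows much faster than the number of raw images. This is how a modest raw image set can yield tens of thousands of training sequences.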
In the target configuration variation (EOC-C) test, the training and testing targets differ in components such as extra fuel tanks. In the version variation (EOC-V) test, they differ in structure, for example in the rotation of the tank turret. Both conditions make accurate recognition more difficult, but both may be encountered in real applications.
In this experimental setup, four types of ground targets acquired at a 17° depression angle are selected as raw SAR images. Their types and the number of images used for each instance are listed in Table 7. This yields 14,445 and 11,380 multiview SAR image sequences for training the three- and four-input-view instances, respectively. The raw SAR images selected for generating the testing samples under EOC-C and EOC-V are listed in Table 8. We then randomly select 2000 samples for each target variant from the generated testing data for performance evaluation.
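The evaluation subsets here and in the EOC-D test are formed by randomly drawing 2000 generated samples per class or variant. The paper does not say whether the draw was seeded. The minimal sketch below assumes sampling without replacement with a fixed seed so that the subset is reproducible. The function `sample_per_class`, the variant labels, and the pool sizes are hypothetical.

```python
import random
from typing import Dict, List, Sequence, TypeVar

T = TypeVar("T")


def sample_per_class(pool: Dict[str, Sequence[T]], n: int,
                     seed: int = 0) -> Dict[str, List[T]]:
    """Draw n samples per class (or variant) without replacement.

    A seeded random.Random instance makes the evaluation subset
    reproducible. random.sample raises ValueError if a class has
    fewer than n candidates.
    """
    rng = random.Random(seed)
    return {label: rng.sample(list(items), n) for label, items in pool.items()}


# Hypothetical pools of generated test-sequence indices per variant.
pool = {"variant_A": list(range(5000)), "variant_B": list(range(2500))}
subset = sample_per_class(pool, 2000, seed=42)
print({label: len(chosen) for label, chosen in subset.items()})
# {'variant_A': 2000, 'variant_B': 2000}
```

Drawing the same number of samples for every variant keeps the confusion matrix rows balanced, so the per-row rates are directly comparable.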
The recognition results of the three- and four-input-view networks under EOC-C and EOC-V are shown in Tables 9 and 10. In these two confusion matrices, the columns correspond to the four predicted classes, and the rows denote the actual target types with configuration or version variations.

Table 7. Raw SAR image selection in the training phase under EOC-C and EOC-V.

Target         3-View Instance   4-View Instance   Depression
BMP2sn-9563    78                59                17°
BTR70          78                59                17°
T72sn-132      78                58                17°
BRDM2          100               75                17°

From Table 9, the recognition rates of the two network instances are higher than 96.00% and 97.00%, respectively. This shows that the proposed multiview network achieves excellent ATR performance when the testing targets have different configurations. Table 10 shows the recognition performance of the two network instances under the EOC-V test. The proposed network with three input views achieves a recognition rate over 96.00% in this experiment. Increasing the number of input views to four raises the recognition rate to 99.00%. These experimental results demonstrate that the proposed multiview deep feature learning network obtains outstanding recognition performance under different ATR operating conditions.

ATR Performance Comparison
In this subsection, we compare the multiview deep feature learning network with six other methods that have been widely cited or recently published in SAR ATR: adaptive boosting (AdaBoost) [29], iterative graph thickening (IGT) [20], conditional Gaussian model (CGM) [30], joint sparse representation (JSR) [41], sparse representation-based classification (SRC) [42], and a multiview deep convolutional neural network (MDCNN) [45]. AdaBoost constructs a classifier for SAR ATR as a weighted linear combination of base classifiers. IGT is a two-stage ATR framework for SAR images based on probabilistic graphical models. CGM is a SAR ATR classification method based on conditional Gaussian models. JSR and SRC are two multiview SAR ATR methods based on sparse representation theory, and MDCNN is a deep-learning-based multiview SAR ATR method.
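Two of the comparison methods are summarized above in a single phrase. The textbook forms below make those phrases concrete, but they are only standard reference forms. [29] uses a multiclass extension of AdaBoost, and the details of [42] may differ. For binary AdaBoost, $h_t$ is the $t$-th base classifier, $\epsilon_t$ its weighted training error, and $\alpha_t$ its weight in the linear combination. For SRC, $\mathbf{D}$ is a dictionary whose columns are training samples, $\mathbf{y}$ is a test sample, $\varepsilon$ is a noise tolerance, and $\delta_c(\cdot)$ keeps only the coefficients associated with class $c$. The test sample is assigned to the class with the smallest reconstruction residual.

```latex
\begin{aligned}
&\text{AdaBoost:} &&
H(\mathbf{x}) = \operatorname{sign}\!\Big(\sum_{t=1}^{T} \alpha_t h_t(\mathbf{x})\Big),
\qquad \alpha_t = \tfrac{1}{2}\ln\frac{1-\epsilon_t}{\epsilon_t}, \\[4pt]
&\text{SRC:} &&
\hat{\mathbf{s}} = \arg\min_{\mathbf{s}} \lVert \mathbf{s} \rVert_1
\ \text{ s.t. }\ \lVert \mathbf{y} - \mathbf{D}\mathbf{s} \rVert_2 \le \varepsilon,
\qquad
\hat{c} = \arg\min_{c} \big\lVert \mathbf{y} - \mathbf{D}\,\delta_c(\hat{\mathbf{s}}) \big\rVert_2 .
\end{aligned}
```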
The recognition rates under SOC and EOC for each ATR method are listed in Table 11. The results of the comparison methods are cited from the related literature [20,30,41,42,45]. The proposed multiview deep feature learning networks with three and four input views are denoted as 3-VDFLN and 4-VDFLN, respectively. In addition, the proposed deep network with a single input view, denoted as 1-VDFLN, is tested as a single-view baseline. Table 11 shows that all ATR methods achieve accuracy rates higher than 92.00% under SOC, but their performance differs across the SOC and EOC tests. Methods with multiview inputs extract more classification information from multiview SAR images. As a result, they generally achieve higher recognition rates than single-view approaches, especially under EOC tests. The comparison demonstrates that the proposed multiview deep feature learning networks outperform the six other SAR ATR methods in both SOC and EOC tests. Taken together, these experiments show the outstanding recognition capability of the proposed multiview deep feature learning network and confirm the soundness and validity of the multiview SAR ATR processing framework.

Conclusions
Establishing a sound and valid ATR framework and designing an effective ATR method are two important issues in multiview SAR ATR. This paper first presented a new processing framework for the multiview SAR ATR pattern. Based on this framework, it then proposed a novel ATR method using a multiview deep feature learning network and applied it to multiview SAR ATR. The proposed multiview deep neural network thoroughly learns two kinds of crucial classification features that exist in multiview SAR images, i.e., the intra-view and inter-view features.
Extensive experimental results have shown that the proposed multiview SAR ATR method achieves excellent recognition performance. Its recognition rates with three and four views reach 99.30% and 99.62%, respectively, under SOC in the ten-class problem. It also outperforms existing SAR ATR methods under various operating conditions, including depression angle, configuration, and version variations. These recognition capabilities also demonstrate the soundness and validity of the multiview SAR ATR processing framework given in this paper. Future research will focus mainly on designing new multiview ATR networks and testing their performance under more complex operating conditions. We will also study how to improve multiview SAR ATR performance with small training sample sets.