Target Recognition in SAR Images Using Complex-Valued Network Guided with Sub-Aperture Decomposition

Abstract: Synthetic aperture radar (SAR) images have special physical scattering characteristics owing to their unique imaging mechanism. Traditional deep learning algorithms usually extract features from real-valued SAR images in a purely data-driven manner, which may ignore some important physical scattering characteristics and sacrifice some useful target information in SAR images. This limits the improvement in performance for SAR target recognition. To take full advantage of the physical information contained in SAR images, a complex-valued network guided with sub-aperture decomposition (CGS-Net) for SAR target recognition is proposed. Because different targets have different physical scattering characteristics at different angles, sub-aperture decomposition is used to improve accuracy with a multi-task learning strategy. Specifically, the proposed method includes main and auxiliary tasks, and the performance of the main task is improved by learning and sharing useful information from the auxiliary task. Here, the main task is the target recognition task, and the auxiliary task is the target reconstruction task. In addition, a complex-valued network is used to extract the features from the original complex-valued SAR images, which effectively utilizes the amplitude and phase information in SAR images. The experimental results obtained using the MSTAR dataset illustrate that the proposed CGS-Net achieved an accuracy of 99.59% (without transfer learning or data augmentation) for the ten-class targets, which is superior to the other popular deep learning methods. Moreover, the proposed method has a lightweight network structure, which is suitable for SAR target recognition tasks because SAR images usually lack large amounts of labeled data. The experimental results obtained using the small dataset further demonstrate the excellent performance of the proposed CGS-Net.


Introduction
As an active microwave imaging sensor, synthetic aperture radar (SAR) offers a long operating range and can operate at any time of day and in any weather [1][2][3][4][5]. It plays a vital role in remote sensing. Consequently, SAR target recognition, as an important application, has become a major topic in recent research.
There are many algorithms for SAR target recognition in the current research, which mainly include two paradigms: non-deep learning and deep learning methods. Specifically, non-deep learning SAR target recognition methods generally include template matching [6] and model-based methods [7]. In the following, several of these methods and the method proposed in this study are introduced in brief.
The template matching method is a basic method for pattern recognition, which generates numerous templates from the targets in different images and recognizes the targets by matching the templates with the region of interest (ROI) area [8]. However, it not only needs many training samples to generate templates but also requires an overwhelming amount of calculations. Consequently, it is difficult to apply the method in an actual SAR recognition task.
The model-based method extracts features in a dataset from a physical or conceptual model of a target and predicts the attributes of the target under different attitudes and configurations [9,10]. It is more effective than template matching. The key to its success lies in developing accurate models and identifying relevant features. However, physical models are complicated and the simulation of a model is difficult, which seriously restricts the development of the model-based method in SAR recognition tasks.
In recent years, with the rapid development of deep learning algorithms, some deep learning methods have prevailed in the SAR image and signal processing field [11][12][13][14][15]. These deep learning methods can extract target characteristics automatically rather than manually, as is done with traditional algorithms [16][17][18][19][20][21][22]. Compared with traditional methods, current deep learning methods are more robust, accurate, and efficient [23][24][25][26][27][28]. For example, Chen et al. [29] proposed an all-convolutional neural network (CNN) for SAR target recognition, which obtained better accuracy than traditional methods. The advantages of deep learning in the SAR field were then further demonstrated [30][31][32]. However, deep learning methods typically require a large amount of labeled data for training, which restricts their application in SAR recognition tasks. To further increase the accuracy of deep learning methods in the case of small datasets, Peng et al. [33] applied a discriminator with classification for SAR target recognition, which improved the performance by adjusting the conditions of image generation and modifying the true and false discriminator. In reference [34], a Wasserstein deep convolutional generative adversarial network (W-GAN) was used for recognition, which obtained remarkable performance by improving the quality of generated images. Reference [35] introduced a task-driven domain adaptation method with transfer learning, which improved the performance of models in the case of small datasets. Subsequently, many relevant deep learning methods including DA, GAN, deep neural networks (DNNs) [36], and so on [37][38][39][40][41], have been proposed.
Although current deep learning methods have achieved some satisfactory results in SAR target recognition tasks, they always ignore physical scattering characteristics [42,43]. In contrast to natural images, the physical essence of SAR images is the coherent superposition of electromagnetic vectors after the electromagnetic waves interact with the scene or target. In the observation stage, the actual 'small antenna' of the SAR system is synthesized into an equivalent 'large antenna' to improve the imaging resolution. In fact, a SAR image is composed of multiple low-resolution echo signals with different imaging angles, which can be decomposed into multiple sub-aperture images using the sub-aperture decomposition algorithm [44,45]. Specifically, Figure 1 shows a full-aperture image and several corresponding sub-aperture images. Although the resolution of sub-aperture images is lower than that of full-aperture images, they contain abundant target features and electromagnetic information, which reflects the physical scattering characteristics of the target from different angles [46]. The scattering information for one target may be different at different angles. When the type of a target cannot be recognized from one specific angle, separable characteristics may still exist at other angles. Hence, compared with the original composed SAR images, sub-aperture images contain multi-angle target information, which may increase the possibility of distinguishing different types of targets. However, the current deep learning methods generally regard SAR images simply as grayscale images and ignore some important physical scattering characteristics. Physical scattering characteristics are important parts of SAR images and contain a lot of useful information for target recognition. Thus, it is crucial to establish a recognition method that can fully utilize the physical characteristic information in SAR images.
To make full use of the multi-angle physical scattering characteristics of SAR images, Wang et al. [47] proposed a transfer learning method with sub-aperture decomposition (SD). The SD algorithm is used to obtain sub-aperture images, which can enrich target information and improve recognition accuracy. However, original SAR images and sub-aperture images are complex-valued, and directly applying the real-valued neural network to the SAR target recognition task may potentially sacrifice some useful information for target recognition [48]. To make full use of the target information in a complex-valued SAR image, Zeng et al. [49] proposed a multi-stream complex-valued network for target recognition. Although the multi-stream strategy can extract separability characteristics effectively, it also greatly increases the calculations and parameters in the networks. This will negatively affect performance in the case of small datasets. Subsequently, Liu et al. [50] applied the multilevel attributed scattering center (M-ASC) framework for SAR target recognition, which helps enhance the generalization ability of networks. However, the process of obtaining M-ASCs is complex and parameter optimization is difficult, which seriously restricts the application in actual SAR recognition tasks.
In order to extract the target separability characteristics effectively, in this paper, a complex-valued network guided with sub-aperture images (CGS-Net) for target recognition in SAR images is proposed. A multi-task learning strategy is used in the proposed method, which combines the physical scattering characteristics of complex SAR images for target recognition. It contains main and auxiliary tasks. Specifically, the main task is the target recognition task, which is used to obtain the result of recognition. The auxiliary task is the reconstruction task. One target has different scattering information in different angles, thus, in the auxiliary task, sub-aperture decomposition is used to guide the network to extract the separability features of targets, which fully utilizes the multi-angle target information to improve the performance of the proposed method. Here, since original SAR images and sub-aperture images are complex-valued, the proposed CGS-Net has a complex-valued structure, which makes full use of amplitude and phase information available in the complex SAR data for target recognition. Significantly, the proposed method has a lightweight network structure, which may be suitable for SAR target recognition tasks due to the scarcity of large amounts of labeled data in SAR images.
The main contributions of the proposed SAR target recognition method are summarized as follows.
(1) A novel SAR target recognition method based on complex-valued networks with a multi-task learning strategy is proposed in this paper. The proposed method is not only a complex-valued network but also a multi-task learning-based SAR target recognition method. Multi-task learning can be used to improve the performance of the main task by learning and sharing useful information from the auxiliary task.
Here, the main and auxiliary tasks are contained in the proposed method. Specifically, the main task is the target recognition task, which is used to obtain the recognition results. As an auxiliary task, the reconstruction task is used to guide the model to learn the separability characteristics of targets by reconstructing the sub-aperture image. Here, a complex-valued structure is used to obtain the features from SAR images because the original SAR images are complex-valued.
(2) Multi-angle target information is mined for the SAR target recognition task using sub-aperture decomposition. Since different targets have different physical scattering characteristics at different angles, the sub-aperture images contain multi-angle target information, which increases the possibility of distinguishing different types of targets. Therefore, in this paper, sub-aperture decomposition is used to improve accuracy by guiding the model to learn the target separability characteristics.
The rest of this paper is organized as follows. The proposed CGS-Net for SAR target recognition is introduced in Section 2. Then, the experiments and analyses are discussed in Section 3. Section 4 summarizes the whole paper.

Overall CGS-Net Framework
A specific flowchart showing the CGS-Net framework is illustrated in Figure 2. It can be seen that the proposed CGS-Net mainly includes three parts: the base module, the recognition task, and the reconstruction task. In the base module, several complex-valued convolutional layers are used to extract features from SAR images. The features are used in the recognition task and the reconstruction task. Then, the recognition task is used to obtain the recognition results. Finally, the reconstruction task is used to guide the model to extract the separability features of targets by reconstructing the sub-aperture image, which takes full advantage of the information in the sub-aperture images to improve recognition performance. Notably, the reconstruction task only participates in the training stage as an assistant task. In the test stage, the final recognition results are obtained with the recognition task directly. These sub-structures are detailed in the following.

Base Module
(1) Complex-Valued Convolutional Layer
The real-valued convolution (RV-Conv) operation extracts features from amplitude only. In contrast, complex-valued convolution (CV-Conv) uses both the amplitude and phase information in complex data to extract features for target recognition. Hence, in SAR target recognition tasks, complex-valued convolution is superior to traditional real-valued convolution [48,51].
Similar to traditional RV-Conv, CV-Conv combines the convolutional operation with complex-valued arithmetic. When convolution is extended to the complex field, the element-wise multiplication and summation are performed according to the rules of complex arithmetic. To make CV-Conv easier to understand, the complex-valued features are separated into real and imaginary parts. Specifically, CV-Conv can be expressed equivalently as follows [48]:

To describe the process clearly, Figure 3 illustrates the difference between complex-valued and real-valued convolution. In Figure 3, black and red represent the real and imaginary parts of the complex-valued operation, respectively, $*$ is the convolution operator, and the kernel size is $K \times K$. In RV-Conv, $C_1$ is the number of input channels and $C_2$ is the number of output channels. In CV-Conv, the first $C_1/2$ and $C_2/2$ feature maps (black) are the real components, and the remaining feature maps (red) are the imaginary components.
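The separation of CV-Conv into real and imaginary parts described above can be illustrated with a minimal pure-Python sketch. It assumes the standard rule for multiplying complex numbers, so the real output is (real input with real kernel) minus (imaginary input with imaginary kernel), and the imaginary output is the sum of the two cross terms. The function names `conv2d` and `cv_conv2d` and the toy data are hypothetical and are not taken from the paper.

```python
def conv2d(x, w):
    """Valid 2-D cross-correlation (the 'convolution' used in deep learning)."""
    kh, kw = len(w), len(w[0])
    return [[sum(x[i + a][j + b] * w[a][b] for a in range(kh) for b in range(kw))
             for j in range(len(x[0]) - kw + 1)]
            for i in range(len(x) - kh + 1)]

def real_part(m):
    return [[v.real for v in row] for row in m]

def imag_part(m):
    return [[v.imag for v in row] for row in m]

def cv_conv2d(x, w):
    """Complex convolution built from four real-valued convolutions."""
    xr, xi, wr, wi = real_part(x), imag_part(x), real_part(w), imag_part(w)
    rr, ii = conv2d(xr, wr), conv2d(xi, wi)   # terms of the real output
    ri, ir = conv2d(xr, wi), conv2d(xi, wr)   # terms of the imaginary output
    return [[complex(rr[i][j] - ii[i][j], ri[i][j] + ir[i][j])
             for j in range(len(rr[0]))]
            for i in range(len(rr))]

# Toy complex-valued patch and kernel (illustrative values only).
x = [[1 + 2j, 3 - 1j, 0 + 1j],
     [2 + 0j, -1 + 1j, 1 + 1j],
     [0 - 2j, 1 + 0j, 2 + 2j]]
w = [[1 + 1j, 0 - 1j],
     [2 + 0j, -1 + 2j]]

split_result = cv_conv2d(x, w)
direct_result = conv2d(x, w)  # Python's built-in complex arithmetic
```

Comparing `split_result` with `direct_result` shows that four real-valued convolutions reproduce one complex-valued convolution, which is why complex layers can be built on top of real-valued convolution primitives.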
Here, it can be shown that CV-Conv has fewer parameters than RV-Conv under the same conditions (input and output channels, and kernel size). Specifically, the number of parameters of CV-Conv is expressed by the following formula: where $P$ is the number of parameters of CV-Conv. The number of parameters of RV-Conv under the same conditions is $C_1 \times C_2 \times K \times K$. CV-Conv therefore has fewer parameters than real-valued convolution, which makes it more suitable for a small dataset. In Figure 3, $*$ denotes the convolution operation; red marks the convolution with the imaginary kernel, and black marks the convolution with the real kernel.
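The parameter comparison can be checked with a small worked count. The sketch below is an assumption based on the channel split described in the text: if $C_1$ and $C_2$ count real plus imaginary feature maps, a complex layer has $C_1/2$ complex input channels and $C_2/2$ complex output channels, and each complex kernel stores a real and an imaginary $K \times K$ part. Biases are ignored, and the function names are hypothetical; this is not the paper's recovered formula.

```python
def rv_conv_params(c1, c2, k):
    """Weights of a real-valued convolution with c1 inputs and c2 outputs."""
    return c1 * c2 * k * k

def cv_conv_params(c1, c2, k):
    """Weights of a complex convolution under the same channel budget.

    Assumption: c1 and c2 count real + imaginary maps, so there are
    c1 / 2 complex input channels and c2 / 2 complex output channels,
    and every complex kernel has a real and an imaginary k x k part.
    """
    complex_in, complex_out = c1 // 2, c2 // 2
    return 2 * complex_in * complex_out * k * k

real_count = rv_conv_params(64, 128, 3)
complex_count = cv_conv_params(64, 128, 3)
```

Under these assumptions the complex layer needs half as many weights as the real-valued layer with the same number of feature maps, which is consistent with the statement that CV-Conv has fewer parameters.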
(2) Specific Structure of Base Module
The base module is used to extract features from SAR images. Inspired by ResNet [52], in this paper, the complex-valued residual structure is used in the base module. As shown in Figure 2, the base module mainly includes a complex-valued convolutional layer and four complex-valued residual modules. Each complex-valued residual module has two complex-valued convolutional layers and a shortcut layer. Compared with general deep learning target recognition networks, the proposed method has fewer parameters, for two reasons. First, the base module of the proposed method has a lightweight structure. It contains only nine complex-valued convolutional layers in total, which is far fewer than typical deep learning networks, such as ResNet, VGG [53], etc. Second, CV-Conv has fewer parameters than traditional RV-Conv. Thus, the proposed method may be suitable for SAR target recognition tasks because SAR images usually lack large amounts of labeled data.

Reconstruction Task
(1) Sub-Aperture Decomposition Algorithm
In the SAR system, SAR images are composed of low-resolution echo signals with different azimuths. The scattered echo information differs between sub-looks, and the images formed from the individual sub-looks are called sub-aperture images. They can be obtained using the sub-aperture decomposition (SD) algorithm. The sub-aperture images are related to but different from each other [45,47]. Abundant electromagnetic scattering information on ground targets is contained in the sub-aperture images, such as geometry, material, structure, etc. Figure 4 shows the specific process of the sub-aperture decomposition method. To explain the process clearly, the number of decomposed sub-aperture images is set to three here. Theoretically, a SAR image can be decomposed into any number of sub-aperture images, and the Doppler spectra of the sub-apertures may or may not overlap.
The specific procedure for generating sub-aperture images using the SD algorithm is summarized as follows.
Step 1: The Doppler spectrum along the azimuth dimension is obtained using the fast Fourier transform (FFT).
Step 2: A window-removal process is performed on the Doppler spectrum, and the Doppler spectrum is divided into three equal parts.
Step 3: The inverse FFT (IFFT) operation is performed on each of the three parts of the Doppler spectrum to obtain the final sub-aperture images.
S1-S3 in Figure 4 are three sub-aperture images generated with the SD algorithm. Notably, in the SD procedure, the original SAR image is complex-valued, and the sub-aperture images are also complex-valued. For visualization, only the real-valued image is shown in the figure.
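The three steps above can be sketched in pure Python as follows. This is a simplified illustration under several assumptions: rows of the image are taken as the azimuth dimension, a naive discrete Fourier transform stands in for the FFT to avoid external dependencies, the window-removal step is omitted, and each band is zero-padded so that the sub-aperture images keep the size of the input. The names `dft` and `sub_aperture_decompose` are hypothetical.

```python
import cmath

def dft(seq, inverse=False):
    """Naive discrete Fourier transform (stand-in for FFT / IFFT)."""
    n = len(seq)
    sign = 1 if inverse else -1
    out = [sum(v * cmath.exp(sign * 2j * cmath.pi * k * m / n)
               for m, v in enumerate(seq))
           for k in range(n)]
    return [v / n for v in out] if inverse else out

def sub_aperture_decompose(image, n_sub=3):
    """Split a complex image into n_sub sub-aperture images along azimuth.

    Assumptions: axis 0 (rows) is azimuth, no de-weighting is applied,
    and the Doppler bands do not overlap.
    """
    n_az, n_rg = len(image), len(image[0])
    # DFT bins ordered from the most negative to the most positive Doppler.
    order = [(s - n_az // 2) % n_az for s in range(n_az)]
    subs = [[[0j] * n_rg for _ in range(n_az)] for _ in range(n_sub)]
    for c in range(n_rg):
        # Step 1: Doppler spectrum of one azimuth line.
        spectrum = dft([image[r][c] for r in range(n_az)])
        for b in range(n_sub):
            # Step 2: keep only the bins of band b, zero the rest.
            band = order[b * n_az // n_sub:(b + 1) * n_az // n_sub]
            masked = [0j] * n_az
            for k in band:
                masked[k] = spectrum[k]
            # Step 3: inverse transform back to the image domain.
            column = dft(masked, inverse=True)
            for r in range(n_az):
                subs[b][r][c] = column[r]
    return subs

# Toy 6 x 2 complex image (illustrative values only).
toy_image = [[complex(r + 1, c - r) for c in range(2)] for r in range(6)]
toy_subs = sub_aperture_decompose(toy_image, n_sub=3)
```

Because the bands in this sketch partition the Doppler spectrum without overlap, summing the sub-aperture images recovers the original image, which gives a convenient sanity check for an implementation.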
(2) Guided Module
The guided module is used to guide network training so that the network fully extracts the separable features of targets and identifies different targets accurately. Owing to the special imaging characteristics of the SAR system, the sub-aperture image contains multi-angle target information. Different targets have different physical scattering characteristics at different angles, which increases the possibility of distinguishing different types of targets. Therefore, in the guided module, the sub-aperture image is used to guide the network to learn the separability characteristics of targets. Specifically, the guided module upsamples the features extracted using the base module to reconstruct a complex image. The loss between the upsampling results and the sub-aperture images is then calculated, and the parameters in the base module are updated according to both the recognition loss and the guided loss. With the guided module, the base module pays more attention to the separability characteristics of targets, which helps the network to recognize different targets efficiently.
Since the base module is a complex-valued network and the sub-aperture image itself is also complex-valued, the structure of the guided module must be complex-valued. Several complex-valued transposed convolutional layers are contained in the guided module, which can be used to reconstruct sub-images with upsampling features. Finally, the sub-aperture image is used to guide the network to learn which regions and features in the SAR image mainly determine the category of the target.

(3) Reconstruction Loss Function
Owing to the complex-valued structure of the reconstruction task, a complex-valued loss function is used to calculate the difference between the reconstruction results and sub-aperture images. Specifically, the reconstruction loss function is summarized as follows: are the kth complex-valued pixel of the reconstructed and sub-aperture image, respectively. Compared with the real-valued loss, the loss function highlights the significance of complex-valued information, and both the real and imaginary parts are processed using backpropagation simultaneously.
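Since the formula itself is not reproduced here, the following sketch shows one plausible form of a complex-valued reconstruction loss, assuming a mean squared error over complex pixels. In this form the squared modulus of each pixel difference equals the squared real error plus the squared imaginary error, so both parts contribute to the gradient. The function name `reconstruction_loss` and the choice of mean squared error are assumptions, not the paper's confirmed definition.

```python
def reconstruction_loss(reconstructed, sub_aperture):
    """Mean squared modulus of the per-pixel complex difference.

    Both arguments are flat sequences of complex pixels of equal length.
    Assumption: MSE form; the paper's exact loss may differ.
    """
    if len(reconstructed) != len(sub_aperture):
        raise ValueError("images must contain the same number of pixels")
    total = sum(abs(p - q) ** 2 for p, q in zip(reconstructed, sub_aperture))
    return total / len(reconstructed)

# Toy pixels (illustrative values only).
perfect = reconstruction_loss([1 + 1j, 2j], [1 + 1j, 2j])
phase_error = reconstruction_loss([1j], [1 + 0j])  # same amplitude, wrong phase
```

The second call illustrates the point made in the text: two pixels with identical amplitude but different phase still produce a non-zero loss, which a loss defined on amplitude alone would miss.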

Recognition Task
(1) Complex-Valued FC-Layer
In the recognition task, a complex-valued fully connected layer is used to integrate the complex features to obtain the recognition output since the input features are complex-valued. The formula for the complex-valued fully connected layer is given as follows: where $x = x_r + i x_i$ is the input neuron.

(2) Recognition Loss Function
In order to highlight the importance of the imaginary part, a complex-valued loss function is used to integrate complex information, which processes both the real and imaginary parts using backpropagation simultaneously. The specific formula for the loss function is expressed as follows: where $x_r$ and $x_i$ are the real and imaginary parts of the complex output, respectively, and $y_j$ is the label of SAR images.
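A minimal sketch of a complex-valued fully connected layer followed by a loss on both output parts is given below. The layer uses ordinary complex multiplication and addition. The loss is an assumed form, namely the sum of a softmax cross-entropy on the real outputs and one on the imaginary outputs; the paper's exact expression is not reproduced here, and the names `complex_fc`, `softmax_cross_entropy`, and `recognition_loss` are hypothetical.

```python
import math

def complex_fc(x, weights, bias):
    """Complex fully connected layer: y = W x + b with complex W, x, b."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]

def softmax_cross_entropy(logits, label):
    """Numerically stable cross-entropy of softmax(logits) against label."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[label]

def recognition_loss(output, label):
    """Assumed form: cross-entropy on real parts plus cross-entropy on imaginary parts."""
    real_logits = [z.real for z in output]
    imag_logits = [z.imag for z in output]
    return (softmax_cross_entropy(real_logits, label)
            + softmax_cross_entropy(imag_logits, label))

# Toy two-class example (illustrative values only).
features = [1 + 1j, 2 - 1j]
weights = [[1 + 0j, 0j],
           [0j, 1j]]
bias = [0j, 0j]
scores = complex_fc(features, weights, bias)
loss_for_class_1 = recognition_loss(scores, 1)
```

With this form, gradients flow through the real and imaginary outputs at the same time, which matches the stated goal of processing both parts during backpropagation.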

Specific Loss Function in the Proposed Method
Owing to the proposed main and auxiliary tasks, recognition loss and reconstruction loss are both contained in the loss function in the proposed method. Here, the specific loss function in the proposed method can be expressed as follows: where $L$ is the loss function in the proposed method and $L_r$ and $L_c$ are the recognition loss and reconstruction loss functions, respectively. Here, the specific expressions of $L_r$ and $L_c$ are displayed in Equations (3) and (5).
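For illustration, a common way of combining a main loss with an auxiliary loss in multi-task learning is a weighted sum. The form below is only a sketch: the weight $\lambda$ is a hypothetical symbol that does not appear in the text, and setting $\lambda = 1$ gives the plain sum of the two losses.

```latex
% Assumed multi-task form; \lambda is a hypothetical balancing weight.
L = L_r + \lambda L_c, \qquad \lambda > 0
```

Because the reconstruction task only participates in training, $L_c$ influences the shared base module through its gradient but plays no role at test time.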

Experimental Data
The experimental data used in this paper are from the moving and stationary target acquisition and recognition (MSTAR) dataset, which is the benchmark dataset for SAR target recognition tasks. It contains ground vehicle targets with different target types, depression angles, serial numbers, and aspect angles. Specifically, the dataset includes ten-class targets with omnidirectional coverage in the 0-360° range [29]. The samples for the ten-class targets and corresponding optical images are displayed in Figure 5. It should be noted that the experimental data used in the complex-valued networks are the original MSTAR data with complex-valued components. The data used in the real-valued networks are also processed from the original complex-valued MSTAR data. The specific information in the MSTAR dataset is shown in Table 1.

Experimental Details
All experiments are conducted with the same configuration during training. The specific configuration is as follows: the number of iterations is 20,000, the optimizer is Adam, the initial learning rate is $1 \times 10^{-3}$, and the MultiStepLR strategy is used to adjust the learning rate. The experimental platform is a personal computer with an NVIDIA RTX 2080Ti GPU and an Intel(R) Xeon(R) Silver 4210 CPU running the Ubuntu 18.04 Linux system. The deep learning framework is PyTorch 1.2.

Evaluation Criteria
To evaluate the experimental results scientifically, the following evaluation criteria are used in the experiment, which include precision, recall, F1-score, and accuracy. Specifically, the formulas for the evaluation criteria are as follows:

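$$\mathrm{Precision} = \frac{TP}{TP + FP} \tag{8}$$
$$\mathrm{Recall} = \frac{TP}{TP + FN} \tag{9}$$
$$\mathrm{F1} = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \tag{10}$$
$$\mathrm{Accuracy} = \frac{\sum_{i=1}^{C} TP_i}{\sum_{i=1}^{C} (TP_i + FN_i)} \tag{11}$$
where $C$ is the number of classes, $TP$ is the number of correctly recognized targets, $FP$ is the number of false alarms, $FN$ is the number of missed targets, and the subscript $i$ denotes the $i$th class.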
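The evaluation criteria can be computed from a confusion matrix as in the sketch below. It assumes that rows index the true class and columns index the predicted class, and it uses the standard definitions of precision, recall, F1-score, and overall accuracy. The function names and the toy matrix are hypothetical.

```python
def per_class_metrics(confusion):
    """Return (precision, recall, f1) per class from a square confusion matrix.

    Assumption: confusion[t][p] counts samples of true class t predicted as p.
    """
    n = len(confusion)
    results = []
    for c in range(n):
        tp = confusion[c][c]
        fp = sum(confusion[r][c] for r in range(n)) - tp  # predicted c, but wrong
        fn = sum(confusion[c]) - tp                       # true c, but missed
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        results.append((precision, recall, f1))
    return results

def accuracy(confusion):
    """Fraction of all samples that lie on the diagonal."""
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[c][c] for c in range(len(confusion)))
    return correct / total

# Toy two-class confusion matrix (illustrative values only).
toy_confusion = [[8, 2],
                 [1, 9]]
toy_metrics = per_class_metrics(toy_confusion)
toy_accuracy = accuracy(toy_confusion)
```

For a ten-class problem the same functions apply unchanged to a 10 x 10 matrix, giving one (precision, recall, F1) triple per class and a single overall accuracy.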

A. Comparison with Classical Recognition Methods
In order to demonstrate the effectiveness of the proposed CGS-Net, several existing, widely used, real-valued deep learning recognition methods are selected and compared with CGS-Net. Specifically, they include ResNet18, VGG16, Net4, and ResNet10. The specific experimental results are displayed in Table 2. Here, Net4 is a lightweight network, which only contains two convolutional and two fully connected layers. ResNet10 includes nine convolutional layers and a fully connected layer.
From Table 2, it is obvious that CGS-Net is superior to the real-valued networks, achieving an accuracy of more than 99.5%. This is mainly because of the following reasons. Compared with the typical real-valued convolutional networks, the proposed method utilizes physical scattering characteristics and complex information effectively. On the one hand, the complex-valued network mines the target information in complex SAR data effectively, which improves the performance of recognition. On the other hand, the guided module efficiently enhances the capacity of the model to extract the separability characteristics of different targets. In addition, SAR images usually lack sufficient labeled data. Here, the proposed method has fewer parameters than typical deep learning methods, which may be more suitable for SAR target recognition tasks.
In order to evaluate the experimental results scientifically and clearly, several evaluation criteria are used to further demonstrate the performance of the proposed method. Table 3 shows the precision, recall, and F1-score for the ten-class targets, and the confusion matrix is displayed in Figure 6. The results demonstrate that the proposed method performs well on targets of every class.

B. Comparison with Other Complex-Valued Networks
To independently demonstrate the performance of the guided module, several complex-valued methods including Complex net [48] and DH-RCCNNs [54] are selected to compare with CGS-Net. The specific recognition accuracies are compared in Table 4. Compared with other complex-valued networks, the proposed method still achieves the highest accuracy. This suggests that the proposed method can fully exploit the physical scattering characteristics. The reconstruction task can guide the network to learn how to identify the target accurately. In addition, the results further demonstrate the superior performance of CGS-Net.
To further demonstrate the performance of CGS-Net, several related works proposed in the last two years are used in the experiments, which include FEC [55], CAE [43], and A-ConvNet [56]. Table 5 shows the specific experimental results. It can be seen that the proposed method is superior to other related methods.
To demonstrate the universality and robustness of the proposed CGS-Net in the case of limited data, 40%, 50%, 60%, and 70% of the original training data are used as several new training datasets. Due to the small size of the training data, only the methods with a small number of parameters are selected to compare with the proposed method. The specific recognition accuracies are compared in Table 6. As shown in Table 6, the proposed method still has a higher accuracy than ResNet10 and the lightweight network (Net4) in the case of limited training data. Specifically, ResNet10 has the same number of layers (nine convolutional layers in the base module and one fully connected layer) as the proposed method; however, it has no guided module or complex-valued structure. The proposed method performs better than the other methods, which indicates that the guided module and complex-valued structure can improve the accuracy of SAR target recognition.
In addition, the experimental results for the different-sized training datasets further demonstrate the universality and robustness of the proposed method.

E. Ablation Experiments
In order to demonstrate the performance of each part of the proposed method, ablation experiments are conducted in this paper. The specific experimental results are shown in Table 7. As shown in Table 7, compared with ResNet10, Complex-ResNet10 has a better performance. The results demonstrate that the complex-valued base module is helpful for recognition. In addition, the proposed method has the highest accuracy, mainly because it uses physical scattering characteristics and complex information effectively. This further proves the effectiveness of the recognition task.
In order to demonstrate the influence of different sub-aperture numbers on the proposed method, a comparison experiment with different sub-aperture numbers is conducted in this paper. The experimental results are shown in Table 8. The proposed method obtains the highest accuracy when the number of sub-apertures is 3. This suggests that the optimal number of sub-apertures is 3, which may be mainly because of the following reason. Too few sub-apertures cannot provide sufficient scattering features, while too many sub-apertures may lead to low-resolution sub-aperture images.

Discussion
From the experiments in Section 3, it is obvious that the proposed CGS-Net is superior to the state-of-the-art methods. This is mainly based on the following reasons. Firstly, compared with typical real-valued convolutional networks, the proposed method utilizes physical scattering characteristics and complex information effectively. Secondly, the guided module efficiently enhances the capacity of the model to extract the separability characteristics of different targets. Finally, the proposed method has fewer parameters than typical real-valued deep learning methods. Hence, the proposed method may be more suitable for SAR target recognition tasks because SAR images usually lack sufficient labeled data. The experimental results obtained when using the small dataset further prove that the proposed CGS-Net has an excellent performance.
In addition, in the training stage, some hyper-parameters, e.g., sub-aperture number and running time, are crucial for the performance of the model. From the experiments in Section 3, it is obvious that the optimal number of sub-apertures is 3. This may be mainly because of the following reasons. Too few sub-apertures cannot provide sufficient scattering features, while too many sub-apertures may lead to low-resolution sub-aperture images.
Regarding the running time, the proposed method requires a longer operating time than classical methods because complex-valued computation is not yet mature in the deep learning framework. However, the proposed method has fewer parameters and FLOPs than typical deep learning methods. This demonstrates that the proposed method has a lightweight structure. In addition, because the scene in SAR images in target recognition tasks is usually very small compared to the large scene in detection tasks, the algorithm used for recognition can still process images with high efficiency and speed.
Although the proposed method obtained good performance in the SAR target recognition task, it also has the following limitations.
(1) The proposed method is only applicable to SAR images. The proposed method includes sub-aperture decomposition, which relies on the unique imaging mechanism of the SAR system. Therefore, the proposed method cannot be extended to other fields, such as optical remote sensing and natural images, and its application is limited.
(2) The proposed method has not been verified using a large-scale dataset. In contrast to some state-of-the-art methods, such as LW-CMDANet [57] and so on, the dataset used in this paper is complex-valued SAR data. Although the SAR image itself is complex-valued data, there is currently no public large-scale complex-valued SAR dataset comparable to ImageNet [58] in the natural image field. The complex-valued SAR data currently available are generally the MSTAR and MiniSAR datasets. Therefore, we have not verified the proposed method with a large-scale dataset.
(3) Whether the proposed method can be extended to other tasks in the SAR field has not been verified. We have not applied the proposed method to other tasks, such as target detection. Therefore, the extensibility of the proposed method has not been thoroughly explored.
Based on the above analysis, the proposed method may have the limitation that it is only suitable for the SAR field. To address the above limitations, we will further improve the proposed method in future work.

Conclusions
Traditional deep learning algorithms generally treat SAR images simply as grayscale images and usually extract features from the real-valued SAR images in a purely data-driven manner. This may ignore the physical scattering characteristics of SAR images and sacrifice some useful target information, which is a major barrier for SAR target recognition tasks and restricts the development of deep learning methods. In order to fully exploit the physical information in SAR images, a complex-valued network guided with sub-aperture decomposition for target recognition in SAR images is proposed in this paper. A multi-task learning strategy is used in the proposed method, which combines the physical scattering characteristics of complex SAR images for target recognition. Specifically, sub-aperture decomposition is used to guide the network to learn the separability characteristics of targets as an auxiliary task, which mines the multi-angle target information in the SAR images for target recognition. Here, since both the original SAR images and sub-aperture images are complex-valued, the proposed CGS-Net has a complex-valued structure, which makes full use of amplitude and phase information. The experimental results demonstrate the outstanding performance of the proposed method on the MSTAR dataset.
In future work, two directions will be investigated. One is to further improve the performance of the recognition method by combining the proposed method with transformers and semi-supervised learning, especially for complex scenes and limited data. The other is to apply the sub-aperture decomposition-guided strategy to the SAR target detection task.

Data Availability Statement:
The data presented in this study are available upon request from the corresponding author.