CNN-Based Vehicle Target Recognition with Residual Compensation for Circular SAR Imaging

Abstract: The contour thinning algorithm is an imaging algorithm for circular synthetic aperture radar (SAR) that can obtain clear target contours and has been successfully used for circular SAR (CSAR) target recognition. However, the contour thinning imaging algorithm loses some details when thinning the contour, leaving room for improvement. This paper presents an improved contour thinning imaging algorithm based on residual compensation. In this algorithm, the residual image is obtained by subtracting the contour thinning image from the traditional backprojection image. Then, the compensation information is extracted from the residual image by repeatedly applying the gravitation-based speckle reduction algorithm. Finally, the extracted compensation image is superimposed on the contour thinning image to obtain a compensated contour thinning image. The proposed algorithm is demonstrated on the Gotcha dataset. A convolutional neural network (CNN) is used to recognize the target image. The experimental results show that the image after compensation yields a higher target recognition accuracy than the image before compensation.


Introduction
Because of its all-day, all-weather imaging capability, synthetic aperture radar (SAR) has been widely used in military and civilian applications in recent years. There are different types of SAR depending on the mode of detection. One detection mode of SAR spans a large azimuth angle during data acquisition, which is called wide-angle SAR (WSAR). If the radar always illuminates the same ground area during detection, and the azimuth span is large enough that the radar's flight track forms a circle, it is called circular SAR (CSAR); CSAR is thus a special case of WSAR. Research based on SAR and CSAR includes time-frequency analysis, 2D/3D imaging, digital elevation models (DEM), target detection and recognition, etc. [1][2][3][4][5][6][7][8][9][10][11].
Research into CSAR began in the early 1990s [12][13][14]. Soumekh first proposed the imaging mode and the echo-signal time-domain model of CSAR in 1996, along with a CSAR imaging algorithm based on wavefront reconstruction [15]. Subsequently, more and more research has been carried out on CSAR. Many of the datasets used in these studies come from the Air Force Research Laboratory (AFRL), which has released several experimental and simulation datasets for WSAR and CSAR, as well as challenge problems related to them [16][17][18][19]. In addition to the datasets released by AFRL, other datasets have been used to verify whether improved imaging algorithms can increase the accuracy of target recognition.

Analysis of Contour Thinning Imaging
The contour thinning algorithm was proposed in [34] and has been used in vehicle target recognition for CSAR [53]. This algorithm is based on the backprojection algorithm. The core idea is to stretch the modulus during the projection superposition process, highlight the areas with high modulus, and suppress the areas with low modulus, so as to obtain a contour thinning image. The main steps of the algorithm are given below [34].
The echo data received by the radar are a function of the slow time n and the received frequency, and are backprojected onto an imaging scene of side length L.
Let  denote the size of the synthetic aperture of the SAR. All sub-images in the range of  are superimposed. The function ( )   is then used to stretch the modulus for each sub-aperture image. The final image I can be obtained by superimposing all the sub-aperture images.
where k1 and k2 denote the enhancement coefficient and the inhibition coefficient, respectively.
T denotes the threshold value, and Ref. [34] gives the empirical values of these parameters. Figure 1a-c shows the imaging results of the traditional backprojection algorithm for three vehicle models, namely the Chevrolet Impala LT, Mitsubishi Galant ES, and Toyota Highlander. Figure 1d-f shows the corresponding contour thinning imaging results. The sub-aperture size θ used during imaging is 10°. As can be seen from the figure, the contour thinning results are clearer than the traditional backprojection results, but detailed information is also obviously lost, as shown by the red circle callouts. The lost details correspond to the highly reflective parts at the front or rear of the vehicle, as illustrated in Figure 2. Obviously, not all vehicles have the same highly reflective contour at the front or rear, so the loss of this part is likely to reduce the recognizability of vehicle targets. The first task of this paper is therefore to restore this missing part and compensate for it in the contour thinning image.
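The stretch-and-superimpose step above can be sketched as follows. The exact stretching function of Ref. [34] is not reproduced in this section, so the thresholded linear form and the default values of T, k1, and k2 below are illustrative assumptions only:

```python
import numpy as np

def stretch_modulus(sub_img, T, k1, k2):
    """Illustrative modulus stretch for one sub-aperture image: pixels whose
    modulus exceeds threshold T are amplified by the enhancement coefficient
    k1, the rest are suppressed by the inhibition coefficient k2.
    (Assumed form; the exact function of Ref. [34] may differ.)"""
    mag = np.abs(sub_img)
    stretched = np.where(mag > T, k1 * mag, k2 * mag)
    # preserve the phase so complex sub-aperture images can still be summed
    return stretched * np.exp(1j * np.angle(sub_img))

def contour_thinning(sub_images, T=0.5, k1=2.0, k2=0.1):
    """Superimpose all stretched sub-aperture images into the final image I."""
    return np.abs(sum(stretch_modulus(s, T, k1, k2) for s in sub_images))
```

With k1 > 1 > k2, repeated superposition concentrates energy on strong scatterers, which is what produces the thinned contour.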

Figure 2.
A diagram of circular SAR imaging. The picture on the left illustrates the process of airborne SAR flying around a ground target in a circle. The image on the right is the result of imaging the echo data using a backprojection algorithm. The part marked by the red circle in the right image corresponds to the high reflection part in the front or rear of the vehicle in the left image.

Residual Compensation
To retrieve the information lost during contour thinning imaging, compensation is considered. Because the image obtained by the backprojection algorithm can well restore the scattering characteristics of the target, in this paper we refer to the image obtained by the traditional backprojection algorithm as the original image. The goal of compensation is to make the thinned contours correspond to all possible contours in the original image and to add the missing details to the thinned image. Let Iorg denote the original image and Ithin the contour thinning image. Their difference, the residual image Ires, is given by Ires(i, j) = Iorg(i, j) − Ithin(i, j), 1 ≤ i, j ≤ L, where L denotes the pixel length of the image. Obviously, the missing details are contained in the residual image Ires. How to extract effective details from Ires while eliminating noise is the problem to be studied. Suppose the function that achieves this is denoted f(·); the processed image can then be expressed as Icps = f(Ires). Once the compensation image Icps is obtained, it is superimposed on the thinning image Ithin to obtain the final compensated contour thinning image Ifin, given by Ifin = Ithin + Icps. In the compensation process, the most important step is the selection of f(·). There are many ways to obtain f(·); for example, a method based on compressed sensing can be used to obtain the best signal under certain conditions. However, in our case the main contour of the target has already been obtained, and the details we want to extract are only auxiliary information in the residual image. Therefore, considering computational complexity and cost-effectiveness, we prefer a simple image enhancement algorithm as f(·). Returning to the goal of f(·): it should preserve and enhance larger, brighter areas in the residual image while suppressing smaller, darker areas.
This is a common speckle reduction problem in SAR image processing, and several speckle reduction algorithms could be used. In this paper, an algorithm with excellent performance in different scenarios is selected: the gravitation-based speckle reduction algorithm [53,63,64] is used as f(·). The gravitation-based algorithm is calculated as in [53], where I(i, j) denotes the brightness of the pixel in the i-th row and j-th column of image I, d denotes the distance between pixels (k, l) and (i, j), R denotes the radius of the gravitational neighborhood, and m is the distance exponent. Ref. [53] shows that the empirical values of R and m are 10 and 1, respectively. In practice, to obtain different degrees of speckle reduction, f(·) can be applied several times iteratively, denoted by f^n(·). In addition, the gravitation-based speckle reduction algorithm can also be used to denoise the contour thinning image Ithin; however, this increases the computation time and can be performed optionally according to the actual situation. The processing procedure of the residual compensation imaging algorithm is shown in Algorithm 1.

Algorithm 1. Residual compensation imaging.
Input: The original image Iorg and the contour thinning image Ithin.
Step 1: Compute the residual image Ires = Iorg − Ithin.
Step 2: Apply the gravitation-based speckle reduction n times: Icps = f^n(Ires).
Step 3: Superimpose: Ifin = Ithin + Icps.
Output: The compensated contour thinning image Ifin.
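The residual compensation procedure can be sketched in Python as follows. The inner gravitation-based weighting is a simplified stand-in (an inverse-distance average over a radius-R window); the exact formula of Ref. [53] is not reproduced in this section, so treat it as an assumption:

```python
import numpy as np

def gravitational_smooth(img, R=10, m=1):
    """Simplified gravitation-inspired smoothing (an assumption, not the
    exact formula of Ref. [53]): each pixel becomes an inverse-distance
    weighted average of its radius-R neighbourhood, with exponent m."""
    ys, xs = np.mgrid[-R:R + 1, -R:R + 1]
    d = np.sqrt(ys**2 + xs**2)
    # zero weight at the centre pixel, 1/d^m elsewhere
    w = np.where(d > 0, 1.0 / np.maximum(d, 1e-12)**m, 0.0)
    pad = np.pad(img, R, mode="edge")
    out = np.empty_like(img, dtype=float)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            patch = pad[i:i + 2 * R + 1, j:j + 2 * R + 1]
            out[i, j] = np.sum(w * patch) / np.sum(w)
    return out

def residual_compensation(I_org, I_thin, n_iter=3):
    """Algorithm 1: Ires = Iorg - Ithin; Icps = f^n(Ires); Ifin = Ithin + Icps."""
    I_cps = I_org - I_thin              # residual image Ires
    for _ in range(n_iter):             # apply f(.) iteratively, n times
        I_cps = gravitational_smooth(I_cps)
    return I_thin + I_cps               # compensated image Ifin
```

Only `residual_compensation` mirrors Algorithm 1 directly; any speckle reduction function with the same signature could replace `gravitational_smooth`.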

Vehicle Target Recognition
The image after compensation contains more information than the image before compensation. However, it is not clear whether the added information is helpful for recognizing the target. In this section, a convolutional neural network (CNN) is used for vehicle target recognition. Detailed descriptions of CNNs can be found in many papers and books [65,66] and will not be repeated here. The steps of the algorithm used are briefly described as follows: (1) Linear coding and decoding are used on the images in the library to obtain patch features through training. Each image is cropped into patches of size 8 × 8, which are then expanded row by row into 64 × 1 vectors. The vectors are input into a three-layer network whose input and output layers each have 64 nodes, while the number of hidden-layer nodes is variable. The function of this network is to make the output image as close to the input image as possible by training on a large number of images. After the weight matrix converges, it can be used as the extracted features for subsequent convolution. When the number of hidden nodes is much smaller than the number of input nodes, this is also called sparse coding. (2) All images are scaled to a standard size and divided into training images and test images, with label data generated at the same time, yielding the training dataset and the test dataset. (3) The CNN is trained on the data in the training set. The network mainly comprises two convolution and pooling stages, with softmax regression at the output layer. The multiple 8 × 8 weight matrices obtained by linear coding in step (1) are convolved with images of size L × L; the size after convolution is (L − 7) × (L − 7). The error is calculated for each training pass, and the network parameters are adjusted according to a certain criterion. Training is repeated until the error is small enough or the number of training passes exceeds a threshold.
(4) The network parameters obtained after training are taken as the optimal parameters. These parameters are used to recognize the images in the test set and to calculate the accuracy. The architecture of the CNN used in this paper is shown in Figure 4.
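The size arithmetic in step (3) — an 8 × 8 kernel slid over an L × L image producing an (L − 7) × (L − 7) feature map — can be checked with a minimal "valid" convolution sketch:

```python
import numpy as np

def valid_conv2d(img, kernel):
    """'Valid' 2-D correlation: an L x L image with an 8 x 8 kernel
    yields an (L - 7) x (L - 7) feature map, as in step (3)."""
    kh, kw = kernel.shape
    H, W = img.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out
```

For example, a 100 × 100 image convolved with an 8 × 8 weight matrix gives a 93 × 93 feature map, which is then reduced by pooling.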
The softmax output layer computes p_i = exp(y_i) / Σ_j exp(y_j), where y_i denotes the output of the i-th neuron. There are many optimization algorithms for CNNs; this paper uses limited-memory Broyden-Fletcher-Goldfarb-Shanno (L-BFGS) [67,68]. In multi-class decision problems, the binary decision metrics cannot be used directly. However, the accuracy can be obtained intuitively: the number of correctly identified samples divided by the total number of samples.
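As a small sketch, the softmax output (with y_i the output of the i-th neuron) and the intuitive multi-class accuracy described above can be written as:

```python
import numpy as np

def softmax(y):
    """p_i = exp(y_i) / sum_j exp(y_j) over the output-layer neurons."""
    e = np.exp(y - np.max(y))   # subtract the max for numerical stability
    return e / e.sum()

def total_accuracy(predictions, labels):
    """Multi-class accuracy: correctly identified samples / total samples."""
    return float(np.mean(np.asarray(predictions) == np.asarray(labels)))
```

The predicted class is simply the index of the largest softmax probability, so `total_accuracy` can be applied directly to the argmax of the network outputs.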

Experimental Results
The data used in the experiments in this paper comprise a subset of the Gotcha dataset released by AFRL, namely the Target Discrimination Research subset [19]. Airborne SAR detects vehicle targets on the ground from 31 altitude orbits over a ground area approximately 5 km in diameter. In each altitude orbit, the airborne SAR flies in a circle around the ground area. In total, 56 individual targets were extracted from the large dataset.

Residual Compensation Imaging
To obtain a better region of interest, the speckle reduction effect under different numbers of iterations is compared, as shown in Figure 5. As can be seen from the figure, as the number of iterations increases, the bright areas become more concentrated and the noise is reduced. With fewer than 2 iterations, the noise is relatively large and the processed image is not suitable for direct compensation. With more than three iterations, the noise is well suppressed. However, a large number of iterations also erodes the target, so three or four iterations are appropriate. In the following experiments, three iterations were selected, that is, f^3(·). Figure 6 shows a comparison of the Iorg, Ithin, and Ifin images of the three vehicle models in Figure 1a-c. Figure 6g-i shows superimposed images of the contour thinning images and compensation images of the three vehicles. As can be seen from the figure, the compensated image adds detailed information, and its contour is closer to the original image. Ref. [34] gives an index of the contour thinning degree, denoted D(I), defined as the perimeter of all target-area pixels in a binary image divided by the area of all target-area pixels.
Similarly, data from 150 vehicles randomly selected from the Gotcha dataset are imaged with synthetic apertures of 5°, 10°, and full aperture, respectively. The imaging methods comprise traditional backprojection, contour thinning, and residual compensation, denoted Iorg, Ithin, and Ifin. For each imaging method, the contour thinning degree D(I) is calculated according to Equation (13), and the results are shown in Figure 7. As can be seen from the figure, the contour thinning degree after residual compensation is basically the same as before compensation, and the area added by compensation does not lower the contour thinning degree.
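The contour thinning degree D(I) of Equation (13) — the perimeter of all target pixels divided by their area — can be sketched as follows. The 4-neighbour exposed-edge count below is one common discrete perimeter definition and is an assumption; Ref. [34] may count the perimeter differently:

```python
import numpy as np

def contour_thinning_degree(binary):
    """D(I) = perimeter of all target pixels / area of all target pixels.
    Perimeter is counted as 4-neighbour edges exposed to the background
    (an assumed discretization; Ref. [34] may define it differently)."""
    b = np.asarray(binary, dtype=bool)
    area = b.sum()
    if area == 0:
        return 0.0
    pad = np.pad(b, 1)  # False border so edge pixels count as exposed
    perim = 0
    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        shifted = np.roll(np.roll(pad, dy, axis=0), dx, axis=1)
        # a target pixel contributes one edge per background neighbour
        perim += np.logical_and(pad, ~shifted).sum()
    return perim / area
```

Under this definition a thin one-pixel-wide contour scores higher (an isolated pixel gives D = 4) than a filled blob (a large square tends toward D ≈ 4/side), matching the intent of the index.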

Recognition Analysis
The dataset used for recognition is again the Target Discrimination Research subset of the Gotcha dataset [19]. During radar detection, some vehicles changed position, and some had doors or trunks opened. To reduce this interference, only data from stationary vehicles were selected for the recognition experiments. In total, 660 images from six models were used for the recognition experiments. The six models are the Chevrolet Impala LT, Mitsubishi Galant ES, Toyota Highlander, Chevrolet HHR LT, Pontiac Torrent, and Chrysler Town & Country, labeled in the dataset as Fcara, Fcarb, Fsuv, Mcar, Msuv, and Van, respectively. The six vehicle models were represented by 80, 81, 143, 111, 103, and 142 images, respectively.
All images were randomly divided into a training set and a test set. Let β denote the ratio of the number of images in the training set to the total number of images; the training/testing ratio is then β/(1 − β). Tables 1-3 show the total confusion matrices of all models in one experiment when β is 0.7 (a training/testing ratio of approximately 2.3). From the confusion matrices, it can be clearly seen which vehicle models are more likely to be misidentified. Vehicle target recognition in this paper is a multi-class decision problem. For each model, it can also be regarded as a binary decision problem: treat the model itself as positive and all others as negative. The total confusion matrix can thus be converted into six separate confusion matrices, shown in Tables 4-9 for the test set only. According to Equation (12), the Accuracy, Precision, Sensitivity, and Specificity of each model can be calculated, as shown in Table 10. From the data in Table 10, the recognition accuracy looks quite good. In fact, however, this is caused by simplifying the multi-class decision problem into binary decision problems: many samples that are classified as negative and identified as negative in the binary decision are actually identified incorrectly in the multi-class decision. Therefore, the accuracies in Table 10 can be considered artificially high. Let P denote the total accuracy over all models, defined as the ratio of the number of images recognized as the correct model to the total number of images; P_train, P_test, and P_all denote this accuracy on the training set, the test set, and all images, respectively. The data above are the result of only one experiment and are not representative. To analyze the recognition accuracy, subsequent experiments were repeated multiple times and averaged; in the following experiments, each point in a figure is the average of the results of ten randomized trials.
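The one-vs-rest metrics of Equation (12) can be computed from each binary confusion matrix as follows (these are the standard definitions, shown here only to make the table columns explicit):

```python
def binary_metrics(tp, fp, fn, tn):
    """Accuracy, Precision, Sensitivity (recall), and Specificity from a
    one-vs-rest binary confusion matrix: tp/fp/fn/tn are the true-positive,
    false-positive, false-negative, and true-negative counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    sensitivity = tp / (tp + fn)   # true-positive rate
    specificity = tn / (tn + fp)   # true-negative rate
    return accuracy, precision, sensitivity, specificity
```

Because the negative class dominates in a six-class one-vs-rest split, the tn term inflates the binary accuracy, which is exactly why the Table 10 values look artificially high compared with the multi-class accuracy P.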
Figure 8 shows the curves of accuracy versus the training set ratio β when the CNN uses different numbers of hidden nodes. As can be seen from Figure 8a,c, the accuracy increases as β increases. However, when β is greater than 90%, the accuracy on the test set has a large deviation. When the number of hidden nodes is sufficient and β is between 0.7 and 0.8, the accuracy on the test set is the highest, almost 90%. In Figure 8b, when the number of hidden nodes is large, the accuracy is almost 100%. However, when the number of hidden nodes is small, the accuracy on the training set decreases as β increases. This is because too few hidden nodes cannot extract accurate features; as the number of training samples increases, the extracted features become worse and the accuracy decreases. Figure 9 shows the curves of accuracy versus the number of hidden nodes in the CNN at different β. As can be seen from the figure, when the number of hidden nodes exceeds 100, the accuracy is generally stable and does not increase significantly as the number of nodes increases. When the number of hidden nodes is greater than 100, the test accuracy P_test is almost always greater than 80%, the training accuracy P_train is close to 100%, and the total accuracy P_all is greater than 90%.

Table 10. The Accuracy (%), Precision (%), Sensitivity (%), and Specificity (%) of each model.
When the number of hidden nodes is 100 and β is 0.7, the test accuracy P_test reaches its highest value of 89%. Figure 10 shows the accuracy of each model as a function of the training set ratio β, and Figure 11 shows the accuracy of each model as a function of the number of hidden nodes in the CNN. From the figures it can be seen which vehicle models are more easily recognized and which are more easily confused: model Mcar has a higher recognition accuracy, while model Fcarb has a lower recognition accuracy. Figure 12 shows examples of residual compensation imaging for the six vehicle models. As can be intuitively seen from the figure, the differences between some models are small, while others are obvious. In the above experiments, the image size was 100 × 100. Figure 13 shows a comparison of the test accuracy of the compensated image Ifin and the contour thinning image Ithin as the image size L changes.
As can be seen from Figure 13, the accuracy increases as the image size increases. When the image size is less than 40 × 40, the accuracy is very low, close to random guessing. When the image size is larger than 70 × 70, the accuracy tends to be stable, and further increasing the image size yields only limited improvement. In almost all cases, the recognition accuracy of Ifin is higher than that of Ithin. The experimental results show that the compensated image improves the recognition accuracy by about 3% on average.

Discussion
There have been many studies on CSAR imaging algorithms, most of which have focused on accurately restoring the scattering characteristics of targets. The contour thinning imaging algorithm in this paper is not aimed at accurately restoring the scattering characteristics of the target, but at enhancing the recognizability of the target. The residual compensation imaging algorithm proposed in this paper further improves and highlights the contour characteristics of the target on the basis of contour thinning. Therefore, this paper does not use the commonly used indicators such as signal-to-noise ratio and peak sidelobe ratio to evaluate the imaging results. To compare with previous work, the contour thinning degree is used to quantify the contour characteristics of the imaging results. However, the contour thinning degree is only a secondary indicator; more important is the impact of the imaging results on target recognition. Therefore, a CNN was used to test the different imaging results, and the experimental results showed that the residual compensation imaging algorithm can indeed improve the accuracy of target recognition.
At present, little research has been done on vehicle model recognition using the Gotcha dataset, and there are only a few papers on vehicle target recognition with WSAR. The earliest studies of CSAR vehicle target recognition on the Gotcha dataset were conducted by Dungan et al. They represented the vehicle image by a point set, used the Mahalanobis distance as a measure between point sets, applied algorithms such as point pattern matching and pyramid hash matching to recognize the vehicle model, and achieved a recognition accuracy of more than 95% [42,44]. However, the data used by Dungan et al. and the data in this paper are two different subsets of the Gotcha dataset: their data comprise eight groups of altitude-orbit data of the same vehicles at the same locations, whereas the data used in this paper include different vehicles, different locations, and 31 altitude orbits. Gianelli et al. used the same subset of the Gotcha dataset for vehicle target recognition research, and their proposed recognition algorithm achieved a recognition accuracy of 90% [45]. However, in their experiment they removed many flawed images and kept only 540 vehicle images for recognition, whereas this paper excludes only the images of vehicles that moved or changed, so that 660 images were actually used in the recognition experiments. Given this situation, this paper mainly analyzes the impact of the imaging results on the recognition accuracy and does not compare against the recognition accuracy of other recognition algorithms.

Conclusion
This paper presents an improved contour thinning imaging algorithm based on residual compensation for CSAR. The algorithm adds a compensation module to the contour thinning imaging algorithm, which better restores the original scattering characteristics of the target. The imaging results show that the compensated image does not reduce the contour thinning degree and contains more information than the image before compensation. To verify the influence of the residual compensation imaging algorithm on target recognition, a convolutional neural network is used to recognize vehicle targets. Experimental results on the Gotcha dataset show that the image after compensation yields a higher target recognition accuracy than the image before compensation, with an average improvement of about 3%. The residual compensation imaging algorithm proposed in this paper is simple and easy to understand; it can effectively obtain a clear and complete vehicle contour image and improve the recognizability of the target. Although the algorithm is derived from CSAR data, it can be extended to common WSAR data with detection angles of less than 360°. In future work, we will focus on the integration of imaging, focusing, and target recognition to further improve the accuracy of target recognition.