Airborne SAR Autofocus Based on Blurry Imagery Classification

Existing airborne SAR autofocus methods can be classified as parametric or non-parametric. Generally, non-parametric methods, such as the widely used phase gradient autofocus (PGA) algorithm, are only suitable for scenes with many dominant point targets, while parametric methods are, in theory, suitable for all types of scenes, but their efficiency is generally low. In practice, whether many dominant point targets are present in the scene is usually unknown, so deciding which kind of algorithm to select is not straightforward. To solve this issue, this article proposes an airborne SAR autofocus approach combined with blurry imagery classification, which improves autofocus efficiency while ensuring autofocus precision. In this approach, blurry imagery classification based on VGGNet, a typical network from the deep learning community, is embedded into the traditional autofocus framework as a preprocessing step to analyze whether dominant point targets are present in the scene. If many dominant point targets are present, the non-parametric method is used for autofocus processing; otherwise, the parametric one is adopted. The main advantage of the proposed approach is therefore that it enables automatic batch processing of all kinds of airborne measured data.


Introduction
Unlike spaceborne synthetic aperture radar (SAR) [1][2][3][4][5][6], airborne SAR is frequently affected by atmospheric turbulence, so its flight trajectory may deviate from the pre-planned straight line [7][8][9][10]. Therefore, motion compensation (MoCo) and/or autofocus processing must be combined with airborne SAR imaging [11][12][13][14]. In many cases, motion compensation based on inertial navigation system (INS) and/or global positioning system (GPS) data cannot meet the required accuracy because the aircraft may not be able to carry sufficiently high-precision INS/GPS equipment [15][16][17]. Consequently, autofocus based on the radar raw data needs to be implemented in airborne SAR imaging.
Generally, SAR autofocus methods can be classified as parametric [18][19][20] or non-parametric [21][22][23][24][25]. The main principle of parametric methods is to model the motion error as a polynomial with several parameters and then to estimate these parameters according to some criterion. The criteria mainly include contrast optimization (CO) [19], minimum entropy (ME) [20], and sharpness [26], among which the ME criterion is the most widely used. When the motion error is complex, a higher-order polynomial model is required, so parametric methods are usually inefficient when high accuracy is required. Because non-parametric methods do not need to model the motion error, their efficiency is relatively high. However, non-parametric methods usually estimate the motion error by extracting the phase or phase gradient directly from the radar data, so they require many dominant point targets in the scene; otherwise, the estimation error can become unacceptably large. In summary, the recent literature shows that current state-of-the-art autofocus methods still have a shortcoming: they cannot guarantee efficiency and accuracy at the same time.
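As an illustration of the parametric model (the paper does not state its exact form, so this is an assumption), the azimuth phase error is often written as a polynomial in azimuth time $t$ whose constant and linear terms are omitted, because a constant phase does not change the image magnitude and a linear phase only shifts the image:

$$\phi(t; A) = \sum_{i=2}^{P} a_i\, t^{i}, \qquad A = \{a_2, a_3, \dots, a_P\}$$

The search space therefore has $P - 1$ dimensions, which is why a higher-order model makes a criterion-driven search such as ME slower.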
Taking autofocus accuracy as the first priority and efficiency as the second, we should choose a non-parametric method to improve efficiency when the scene contains many dominant point targets; otherwise, we should choose a parametric one to ensure accuracy. However, in actual data processing, we usually do not know in advance whether many dominant point targets are present in the scene, so we cannot determine which autofocus algorithm should be chosen. To solve this problem, this paper proposes an airborne SAR autofocus approach combined with blurry imagery classification. Blurry imagery classification based on a typical VGGNet [27] is embedded into the traditional autofocus framework as a preprocessing step, so that the scene type can be determined automatically before autofocus processing. If no dominant point targets are present in the scene, it is regarded as the first kind of scene, and the parametric method is used for autofocus processing. Otherwise, it is regarded as the second kind of scene, and the non-parametric one is adopted.
Deep learning has recently been applied to ISAR imaging [28][29][30][31], but these state-of-the-art methods cannot be used for SAR autofocus processing because they mainly aim to enhance the performance of ISAR sparse imaging. To the best of our knowledge, there is no published report on how to integrate deep learning, including blurry imagery classification, with SAR autofocus processing.
The rest of this paper is organized as follows. In Section 2, we discuss the existing problems of current autofocus algorithms and the motivation of our approach. The proposed approach is detailed in Section 3. In Section 4, the processing results of real data are provided to validate the effectiveness of the proposed approach. The conclusion is drawn in Section 5.

Problem Formulation and Motivation
Autofocus processing is a core step of airborne SAR data processing. We summarize the applicable conditions of standard autofocus methods in Table 1. In theory, non-parametric methods, such as the dominant scatterer algorithm (DSA) and the widely used phase gradient autofocus (PGA) algorithm, can estimate any form of motion error, but they require either corner reflectors deployed in the scene in advance or many dominant point targets present in the scene. They may not be suitable for evenly distributed scenes, such as grasslands and deserts. The CO/ME algorithms among the parametric methods adopt an image-quality optimization criterion, which is suitable for all kinds of scenes in theory. However, they need polynomial modeling of the motion error and an iterative search, so their efficiency is usually low. Although the MapDrift (MD) algorithm does not need iterative processing and is usually more efficient than the CO/ME algorithms, it has difficulty estimating high-frequency motion errors. In summary, parametric and non-parametric methods each have advantages and disadvantages. Different types of scenes therefore call for different autofocus algorithms, which makes current airborne SAR data processing non-universal and prevents batch processing of airborne SAR data. Figures 1 and 2 present the autofocus results of two sets of airborne SAR data. The accuracy of the non-parametric method is low when the intensity distribution of the targets is uniform (i.e., when there are few dominant point targets) (see Figure 1a), and higher for scenes with more dominant point targets (see Figure 2a). In contrast, the parametric method achieves high accuracy for both types of scenes (see Figures 1b and 2b), but its efficiency is far lower than that of the non-parametric one.
In general, therefore, a non-parametric method is recommended in terms of both efficiency and accuracy when many dominant point targets are present in the scene. When few dominant point targets are present, a parametric method is recommended to ensure accuracy at the expense of some efficiency. In practice, the blurry imagery thus needs to be classified automatically before autofocus processing to determine which autofocus algorithm should be adopted. For this purpose, we divide blurry imagery into two categories: imagery that does not contain dominant point targets (called scene type #1 in the following) and imagery that contains dominant point targets (called scene type #2 in the following).

Autofocus Approach Based on Blurry Imagery Classification
The flowchart of the proposed autofocus approach based on blurry imagery classification is presented in Figure 3. The first step is to obtain the coarsely focused imagery through SAR imaging, with the range migration algorithm (RMA) used as the standard imaging algorithm. Next, unlike traditional autofocus algorithms, we introduce blurry imagery classification as a preprocessing step before autofocus. The classification adopts a popular deep learning approach, with the typical VGGNet [27] as the learning network. Finally, if the blurry imagery is classified as scene type #1, the ME algorithm, a parametric method, is applied for autofocus processing; if it is classified as scene type #2, a non-parametric method is adopted (the widely used PGA algorithm in this article). The proposed approach is detailed in the following.
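The flow of Figure 3 can be summarized as a small dispatch routine. The sketch below is hypothetical: `form_image`, `classify`, `me_autofocus`, and `pga_autofocus` are placeholder callables standing in for RMA imaging, the VGGNet classifier, and the ME and PGA algorithms, none of which are implemented here.

```python
from typing import Callable, Tuple

import numpy as np

# Hypothetical stage signatures (placeholders, not from the paper):
#   form_image:    raw data -> blurry complex image (e.g. RMA output)
#   classify:      blurry image -> 1 (scene type #1) or 2 (scene type #2)
#   me_autofocus:  blurry image -> focused image (parametric, ME)
#   pga_autofocus: blurry image -> focused image (non-parametric, PGA)
Stage = Callable[[np.ndarray], np.ndarray]


def autofocus_pipeline(raw: np.ndarray,
                       form_image: Stage,
                       classify: Callable[[np.ndarray], int],
                       me_autofocus: Stage,
                       pga_autofocus: Stage) -> Tuple[int, np.ndarray]:
    """Image, classify, then dispatch to ME or PGA (Figure 3 flow)."""
    blurry = form_image(raw)             # step 1: coarse imaging
    scene_type = classify(blurry)        # step 2: blurry imagery classification
    if scene_type == 1:                  # few dominant point targets
        focused = me_autofocus(blurry)   # step 3a: parametric autofocus
    elif scene_type == 2:                # many dominant point targets
        focused = pga_autofocus(blurry)  # step 3b: non-parametric autofocus
    else:
        raise ValueError(f"unexpected scene type: {scene_type}")
    return scene_type, focused
```

Passing the stages in as callables keeps the classification step independent of the particular imaging and autofocus implementations.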

Imaging Processing
Before classifying the blurry imagery, one first needs to use a standard imaging algorithm to obtain the coarsely focused imagery, namely, the blurry imagery. Standard frequency-domain imaging algorithms mainly include the range-Doppler algorithm and the chirp scaling algorithm [32], which are more efficient than wavenumber-domain algorithms but are only suitable for broadside mode or small squint angles. The RMA and the polar format algorithm (PFA) are wavenumber-domain algorithms [33,34] and are suitable for large squint angles. Because PFA relies on a planar-wavefront approximation that neglects wavefront curvature, it is generally only suitable for small-scene imaging. Therefore, RMA is adopted as the standard imaging algorithm in this article.
It should be pointed out that, in the case of large motion errors and/or large squint angles, serious defocusing in the range direction can occur in RMA imaging [33,34]. The influence of range defocusing on blurry imagery classification is not considered in this article, so the proposed method assumes broadside mode. Furthermore, if the motion error is too large, azimuth defocusing becomes so severe that the blurry imagery may be misclassified. Therefore, INS/GPS data of a certain accuracy are also needed to roughly compensate the radar raw data.

Blurry Imagery Classification
After applying the standard imaging algorithm (i.e., RMA) to the radar raw data, we obtain the blurry imagery. The imaging scenes are divided into two types, as shown in Figure 4. The purpose of this section is to classify arbitrary blurry imagery accurately, i.e., to determine to which scene type it belongs. Deep learning networks are now widely used in image classification, so we use this type of approach to classify blurry imagery. Owing to its robustness in image classification, VGGNet [27] is used as the learning network. Note that SAR imagery without autofocus processing usually suffers from some degree of defocusing, and this degree is unknown. Therefore, to increase the robustness of the classification network, images with different defocusing degrees are added to the training data, as shown in Figure 4.
In addition, the images used for network training are usually small. For example, the training images are 512 × 512 pixels, whereas actual blurry imagery may have far more pixels (e.g., 8192 × 8192). One solution is to downsample the large image to the size of the training images and then input the downsampled image into the network for classification. However, because actual large images are very complex, classifying the downsampled image directly may fail (as shown in Section 4). To solve this problem, we divide the large image into several non-overlapping small images of the same size as the training images. After all the small images are input into the classification network, each small image yields a classification result: some small images may be classified as scene type #1, while the rest are classified as scene type #2. To ensure the robustness of the algorithm, a suitable threshold must be set carefully. If the proportion of small images classified as scene type #1 exceeds this threshold, the large image is regarded as scene type #1; otherwise, it is considered scene type #2. The choice of threshold is discussed in Section 4.
Figure 4. SAR images of two kinds of scenes with different defocusing degrees. The first row corresponds to scene type #1, and the second row corresponds to scene type #2. The defocusing degree increases from left to right.
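A minimal sketch of the tiling-and-voting rule, assuming a NumPy array as input and a user-supplied per-tile classifier (`classify_tile` is a hypothetical stand-in for the trained VGGNet). The paper does not say how image borders that do not fill a whole 512 × 512 tile are handled; here they are simply discarded.

```python
from typing import Callable, List, Tuple

import numpy as np


def split_into_tiles(image: np.ndarray, tile: int = 512) -> List[np.ndarray]:
    """Cut an image into non-overlapping tile x tile sub-images.

    Border rows/columns that do not fill a whole tile are dropped
    (an assumption; the paper does not specify border handling).
    """
    n_rows, n_cols = image.shape[0] // tile, image.shape[1] // tile
    return [image[r * tile:(r + 1) * tile, c * tile:(c + 1) * tile]
            for r in range(n_rows) for c in range(n_cols)]


def classify_large_image(image: np.ndarray,
                         classify_tile: Callable[[np.ndarray], int],
                         threshold: float = 0.94,
                         tile: int = 512) -> Tuple[int, float]:
    """Vote over tiles: scene type #1 if the fraction of tiles labelled #1
    reaches the threshold, otherwise scene type #2.

    Returns (scene_type, fraction_of_type1_tiles). Counting a fraction equal
    to the threshold as type #1 is an assumption.
    """
    tiles = split_into_tiles(image, tile)
    if not tiles:
        raise ValueError("image is smaller than a single tile")
    labels = np.array([classify_tile(t) for t in tiles])
    frac_type1 = float(np.mean(labels == 1))
    return (1 if frac_type1 >= threshold else 2), frac_type1
```

For an 18432 × 2048 image with `tile=512`, `split_into_tiles` yields 36 × 4 = 144 small images.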

Autofocus Processing
If the blurry imagery is classified as scene type #1, the parametric method is applied for autofocus processing. This article uses the image-quality optimization algorithm based on the ME criterion; other criteria, such as CO or sharpness, could also be used. The motion error parameters are estimated under the ME criterion by

$$\hat{A} = \arg\min_{A} E\big(G(k,n;A)\big) \tag{1}$$

where E(·) denotes the entropy of the focused imagery G(k, n), (k, n) are the indices of the range and azimuth sampling points, and A is the set of motion error parameters to be optimized. The entropy is given by

$$E(G) = -\sum_{k}\sum_{n} \frac{|G(k,n)|^{2}}{S}\,\ln\frac{|G(k,n)|^{2}}{S}, \qquad S = \sum_{k}\sum_{n}|G(k,n)|^{2} \tag{2}$$

If the imagery is classified as scene type #2, the widely used PGA algorithm is applied. In the standard PGA algorithm, the phase gradient is estimated by the maximum likelihood (ML) estimator [35]

$$\Delta\hat{\phi}(t_n) = \arg\left\{\sum_{k=1}^{N} g_k^{*}(t_{n-1})\, g_k(t_n)\right\} \tag{3}$$

where $g_k(t_n)$ is the azimuth signal of the k-th selected range cell at azimuth time $t_n$ (after the circular shifting and windowing steps of PGA), $(\cdot)^{*}$ denotes complex conjugation, and N denotes the number of selected range cells.
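The sketch below illustrates the entropy criterion and the ML phase-gradient estimator in NumPy under simplifying assumptions: the image's azimuth axis and the azimuth phase-history domain are taken to be FFT pairs (as in spotlight processing), the ME search covers only a quadratic coefficient on a fixed grid rather than a full polynomial with a numerical optimizer, and the PGA part implements only the ML phase-gradient step and its integration, not the center-shifting, windowing, or iteration of full PGA. All function names are hypothetical.

```python
import numpy as np


def image_entropy(G: np.ndarray) -> float:
    """Image entropy: -sum p ln p with p = |G|^2 / sum(|G|^2)."""
    power = np.abs(G) ** 2
    p = power / power.sum()
    p = p[p > 0]                      # convention: 0 * ln 0 = 0
    return float(-np.sum(p * np.log(p)))


def azimuth_freq(n_az: int) -> np.ndarray:
    """Normalised azimuth frequency in [-1, 1), in FFT bin order."""
    return 2.0 * np.fft.fftfreq(n_az)


def apply_azimuth_phase(G: np.ndarray, phase: np.ndarray) -> np.ndarray:
    """Multiply the azimuth spectrum (axis 1) by exp(j*phase) and return to
    the image domain. Assumes image azimuth and phase history are FFT pairs."""
    spectrum = np.fft.fft(G, axis=1)
    return np.fft.ifft(spectrum * np.exp(1j * phase)[None, :], axis=1)


def me_quadratic(G: np.ndarray, candidates: np.ndarray):
    """Toy ME search over a quadratic coefficient a: apply the correction
    -a * u**2 and keep the candidate giving the lowest image entropy."""
    u = azimuth_freq(G.shape[1])
    entropies = [image_entropy(apply_azimuth_phase(G, -a * u ** 2))
                 for a in candidates]
    return candidates[int(np.argmin(entropies))]


def ml_phase_gradient(g: np.ndarray) -> np.ndarray:
    """ML phase-gradient estimate from N selected range cells.

    g has shape (N, M): row k is the azimuth signal g_k(t_n) of one selected
    range cell. Returns arg(sum_k conj(g_k(t_{n-1})) * g_k(t_n)), n = 1..M-1.
    """
    return np.angle(np.sum(np.conj(g[:, :-1]) * g[:, 1:], axis=0))


def integrate_gradient(dphi: np.ndarray) -> np.ndarray:
    """Phase error estimate by cumulative summation (first sample set to 0)."""
    return np.concatenate(([0.0], np.cumsum(dphi)))
```

A grid search is used only to keep the example short; with a higher-order polynomial, a gradient-based or coordinate-descent optimizer over A would normally replace it.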
It should be pointed out that the above blurry imagery classification and autofocus processing are only applicable to spotlight mode and cannot be directly applied to stripmap or other imaging modes. To make the proposed approach suitable for all modes, one can introduce the azimuth sub-aperture technique widely used in traditional autofocus processing, applying the processing flowchart shown in Figure 3 to each sub-aperture. After the motion errors of all sub-apertures are obtained, they can be integrated into the motion error of the full-aperture data, followed by MoCo and iterative processing.
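One common way to integrate sub-aperture estimates, shown here as an assumption rather than the paper's procedure, is to use overlapping sub-apertures and remove each estimate's unknown constant offset by matching it to the already-stitched phase over the overlap. The helper names are hypothetical, and the sketch ignores the linear-phase ambiguity that real PGA estimates also carry.

```python
from typing import List, Sequence, Tuple

import numpy as np


def subaperture_slices(n_az: int, sub_len: int,
                       overlap: int) -> List[Tuple[int, int]]:
    """(start, stop) indices of overlapping azimuth sub-apertures."""
    if not 0 <= overlap < sub_len <= n_az:
        raise ValueError("need 0 <= overlap < sub_len <= n_az")
    step = sub_len - overlap
    starts = list(range(0, n_az - sub_len + 1, step))
    if starts[-1] + sub_len < n_az:   # add a final window to cover the tail
        starts.append(n_az - sub_len)
    return [(s, s + sub_len) for s in starts]


def stitch_phase(n_az: int, slices: Sequence[Tuple[int, int]],
                 estimates: Sequence[np.ndarray]) -> np.ndarray:
    """Join per-sub-aperture phase estimates into a full-aperture phase.

    Each estimate is assumed correct up to an unknown constant offset, which
    is removed by matching its mean to the already-stitched phase over the
    overlap; samples that are already stitched are kept unchanged.
    """
    full = np.full(n_az, np.nan)
    for (s, e), est in zip(slices, estimates):
        est = np.asarray(est, dtype=float)
        seg = full[s:e]
        known = ~np.isnan(seg)
        if known.any():
            est = est - np.mean(est[known] - seg[known])
        full[s:e] = np.where(known, seg, est)
    return full
```

The stitched phase inherits the offset of the first sub-aperture, which is harmless because a constant phase does not affect focusing.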

Processing Results of Real Data
Here, we use the processing results of real data to verify the blurry image classification and autofocus processing.

Classification Verification
Since the image before autofocus processing is usually blurry or defocused, the classifier must handle blurry images. Before classification, VGGNet must be trained on a large amount of data, so a training dataset needs to be built first. Because the defocusing degree of blurry images is usually unknown in advance, we generate a large number of blurry images with different defocusing degrees; there are about 7000 images for both types of scenes. To achieve this, we take several well-focused SAR images and generate images with different defocusing degrees from them by introducing different phase errors in the unfocused domain. Strictly, the introduced phase error should take the form of a higher-order polynomial; however, to simplify the generation of blurry images, and considering that the quadratic phase error is the main component, we introduce only a pure quadratic phase error. Figure 4 shows part of the datasets for the two kinds of scenes with different defocusing degrees. Eighty-five percent of the generated images are randomly selected as the training set, and the remaining fifteen percent as the validation set; the images of the two scene types are extracted independently. The images in the training and validation datasets are acquired from an X-band radar system working in sliding spotlight mode, with a carrier frequency of 8.9 GHz and a resolution of 0.12 m.
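The data-generation step can be sketched as follows, assuming complex-valued, well-focused images whose azimuth axis is axis 1 and assuming that the "unfocused domain" is reached by an azimuth FFT. The quadratic coefficient `alpha` (the phase in radians at the band edge of the normalized azimuth frequency) is a hypothetical parameterization of the defocusing degree, and `split_train_val` reproduces the 85%/15% random split.

```python
from typing import List, Sequence, Tuple

import numpy as np


def add_quadratic_defocus(image: np.ndarray, alpha: float) -> np.ndarray:
    """Defocus a focused complex SAR image along azimuth (axis 1).

    A pure quadratic phase alpha * u**2 (u = normalised azimuth frequency in
    [-1, 1)) is applied in the azimuth spectrum, which stands in here for the
    paper's 'unfocused domain' (an assumption).
    """
    u = 2.0 * np.fft.fftfreq(image.shape[1])
    spectrum = np.fft.fft(image, axis=1)
    return np.fft.ifft(spectrum * np.exp(1j * alpha * u ** 2)[None, :], axis=1)


def make_defocus_series(image: np.ndarray,
                        alphas: Sequence[float]) -> List[np.ndarray]:
    """One blurred copy per quadratic coefficient (e.g. increasing alphas)."""
    return [add_quadratic_defocus(image, a) for a in alphas]


def split_train_val(n_items: int, train_frac: float = 0.85,
                    seed: int = 0) -> Tuple[np.ndarray, np.ndarray]:
    """Random train/validation split of item indices (85% / 15% by default)."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_items)
    n_train = int(round(train_frac * n_items))
    return idx[:n_train], idx[n_train:]
```

Sweeping `alpha` over increasing values gives a series like the rows of Figure 4, from mildly to severely defocused; because the phase factor has unit modulus, image energy is preserved while the peak spreads out.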
It is worth mentioning that, when producing the training data, we judge the scene type empirically. For example, no obvious dominant point targets are present in the images of the first row in Figure 4, so they are assigned to scene type #1, while dominant point targets can be seen in the images of the second row, so those are assigned to scene type #2. For training the network, the cross-entropy criterion is selected as the loss function, and the activation function is ReLU. The training hyperparameters are as follows: batch size 32, learning rate 0.0001, 10 epochs, and the Adam optimizer. Figure 5 shows the loss function and accuracy of VGGNet16 versus the training epoch. The accuracy on the training set reaches 99.5% after 10 training epochs. Finally, we input the validation set into the network to test its accuracy, which reaches 99.7%. Because networks with different numbers of layers may yield different learning results, we next compare VGGNet13 and VGGNet16.

Autofocus Verification
Next, ten different large images are used to verify the effectiveness of the proposed approach, and VGGNet13 and VGGNet16 are compared quantitatively. These ten images and the images in the training dataset are acquired from different radar systems. As shown in Tables 2 and 3, images 5, 6, 7, and 9 are obtained by a Ka-band radar operating in sliding spotlight mode, with a carrier frequency of 35 GHz, a resolution of 0.2 m, and an image size of 3584 × 512. Images 1, 2, 3, 4, and 8 are obtained by a Ku-band radar operating in spotlight mode, with a carrier frequency of 16 GHz, a resolution of 0.1 m, and an image size of 18432 × 2048. Image 10 is obtained by a Ku-band radar operating in stripmap mode, with a carrier frequency of 16 GHz, a resolution of 0.6 m, and an image size of 9557 × 1024.
The actual scene types, shown in the second row of Tables 2 and 3, can easily be determined from the PGA autofocus results of the ten large images: if an image remains defocused after PGA, it is of scene type #1; otherwise, it is of scene type #2. As mentioned previously, because the ten images are much larger than the training images, one can directly downsample them to the training size; the resulting classifications are shown in the third row of Tables 2 and 3. Images 1, 4, and 6 are incorrectly classified. Further analysis shows that images of scene type #2 are easily misclassified after downsampling, whereas images of scene type #1 are all classified correctly. This is because downsampling may weaken the dominant point targets in scene type #2.
Instead, we first divide each large image into small images of 512 × 512 pixels. Then, for each large image, all of its small images are input into the network for classification, and the outputs are analyzed statistically, as shown in the fourth and fifth rows of Tables 2 and 3. Taking image 1 and VGGNet16 as an example, among the 144 small images, 95 belong to scene type #1 and 49 belong to scene type #2. As mentioned previously, with a given threshold, we can determine to which category the large image belongs. As shown in Table 3, if the threshold is set to 98% (i.e., 98% of the small images must be judged to be of scene type #1 for the large image to be of scene type #1), images 8 and 10 are classified incorrectly (the sixth row of Table 3). If the threshold is 96%, image 10 is classified incorrectly (the seventh row of Table 3). If the threshold is set to 94%, all ten large images are classified correctly (the eighth row of Table 3). Consequently, the threshold must be set carefully.
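As a worked example of the tiling and threshold rule for image 1 (18432 × 2048 pixels, VGGNet16), the number of 512 × 512 small images and the fraction classified as scene type #1 are

$$\frac{18432}{512}\times\frac{2048}{512} = 36\times 4 = 144, \qquad p_{1} = \frac{95}{144} \approx 0.660 .$$

Because 0.660 is below 0.98, 0.96, and 0.94, image 1 is assigned to scene type #2 at every threshold considered.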
Comparing VGGNet13 with VGGNet16, when the threshold is set to 94%, image 10 is misclassified by VGGNet13, as shown in Table 2. VGGNet16 therefore achieves higher classification accuracy on these data.
Once the ten images are classified, it is known which type of method should be used for autofocus processing: the ME algorithm for scene type #1 and the PGA algorithm for scene type #2. With the threshold set to 94% and VGGNet16 adopted, all the large images are classified correctly. Under this condition, we compare the autofocus results of the traditional methods and the proposed approach for the ten large images. Figure 6 presents images 5 and 6 before and after autofocus processing, and Figure 7 presents partially enlarged views of those in Figure 6. According to Table 3, our approach classifies images 5 and 6 as scene type #1 and scene type #2, respectively. Therefore, if the PGA algorithm is applied to both images, the autofocus quality of image 5 is low, whereas both images are well focused with our approach.
Finally, we evaluate the autofocus results of the ten large images quantitatively using the entropy criterion, with the entropy function given in (2). The quantitative results are shown in Table 4. For all ten large images, the autofocus quality of our approach is no worse than that of the PGA algorithm. It is worth mentioning that, if the ME algorithm is adopted for all ten large images, the autofocus quality is high for all of them, but the efficiency is very low. Therefore, through the preprocessing step of blurry imagery classification, we can choose the appropriate autofocus algorithm for each scene type, avoiding both the poor precision of non-parametric autofocus for scene type #1 and the low efficiency of parametric autofocus for scene type #2. The proposed approach combined with blurry imagery classification thus achieves both good autofocus accuracy and high efficiency.
We also discuss the computation time. Compared with the non-parametric and parametric algorithms, the proposed method adds a preprocessing step (i.e., imagery classification), most of whose cost lies in network training. Once the network is trained, classification takes little time. Taking image 5 as an example, the computation times of PGA and ME are 126.9 s and 494.5 s, respectively, while imagery classification based on VGGNet16 takes only 2.96 s. The experimental environment is as follows: the image classification network is implemented in TensorFlow-GPU 2.3 and runs on an NVIDIA TITAN RTX GPU, while the ME and PGA methods are implemented in MATLAB R2018a and run on an Intel Xeon Gold 6234 CPU.
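For image 5, the reported timings give the relative overhead of the classification step (it ran on a GPU while PGA and ME ran on a CPU, so these ratios are only indicative):

$$\frac{2.96}{126.9} \approx 2.3\%, \qquad \frac{2.96}{494.5} \approx 0.6\%, \qquad \frac{494.5}{126.9} \approx 3.9 .$$

Since image 5 is classified as scene type #1 and is therefore processed with ME, the proposed approach spends about 2.96 + 494.5 = 497.46 s on it.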

Conclusions
In this article, a SAR autofocus approach based on blurry imagery classification is proposed. The method embeds blurry imagery classification as a preprocessing step in traditional autofocus processing. This preprocessing determines the scene type before autofocus, so that the choice between parametric and non-parametric methods can be made automatically. With this approach, the capability for batch processing of airborne SAR data is improved. The blurry imagery classification is based on VGGNet, a typical network from the deep learning community. The classification performance of VGGNet13 and VGGNet16 is compared, indicating that a deeper network can slightly improve the classification accuracy. The effectiveness of the method is verified by processing results on real airborne SAR data.
Several points need attention. First, the approach requires INS/GPS data of a certain accuracy to ensure that azimuth defocusing is not too severe. Second, the influence of range defocusing on imagery classification is not considered, so broadside mode is assumed in this article; the classification of imagery in large-squint mode needs further study. Finally, deep learning is introduced only for blurry imagery classification, and traditional autofocus methods are still used for motion parameter estimation. Deep-learning-based radar parameter estimation for SAR autofocus may be an important direction for future work.