End-to-End SAR Deep Learning Imaging Method Based on Sparse Optimization

Abstract: Synthetic aperture radar (SAR) imaging has developed rapidly in recent years. Although traditional sparse optimization imaging algorithms achieve effective results, they suffer from slow imaging speed, a large number of parameters, and high computational complexity. To address these problems, an end-to-end SAR deep learning imaging algorithm is proposed. Building on existing SAR sparse imaging algorithms, the SAR imaging model is first rewritten into an SAR complex-signal form based on the real-value model. Second, instead of arranging the two-dimensional echo data into a vector and repeatedly constructing an observation matrix, the algorithm derives the neural network imaging model from the iterative soft thresholding algorithm (ISTA) directly in the two-dimensional data domain, and then reconstructs the observation scene by stacking and unfolding multiple network layers. Finally, experiments on simulated data and on measured data of three kinds of targets verify that the proposed algorithm outperforms traditional sparse algorithms in imaging quality, imaging time, and number of parameters.


Introduction
Synthetic aperture radar (SAR) is an active sensor that uses microwaves for sensing. It can continuously observe targets of interest in all weather conditions, at all times of day, and over long distances, and it can identify camouflaged and optically concealed targets [1]. It has therefore become a principal means of military reconnaissance and intelligence acquisition [2]. Obtaining better image quality and more target characteristic information in less imaging time has always been the goal of SAR imaging technology. The traditional matched filter SAR imaging method can be regarded as an approximation of the classical least squares estimate; that is, the unstable part of the SAR imaging process is approximated, and a relatively stable solution is obtained. The consequence of this approximation is that the main lobe of the imaging result is widened and sidelobes appear [3].
Recently, SAR imaging technology based on sparse optimization theory has developed rapidly [4,5]. However, previous methods convert the two-dimensional echo data into a one-dimensional column vector before solving, which creates the following three problems [6]. First, vectorization sharply increases the dimension of the observation matrix, so the sparse regularization method needs considerable storage space and a long reconstruction time. Second, only one small observation scene can be reconstructed at a time, which limits the practical application of the one-dimensional sparse regularized observation model. Third, vectorization destroys the spatial structure and spatial correlation of the original two-dimensional SAR echo, so the sparse features of the two-dimensional signal cannot be used effectively and the reconstructed image quality degrades. Moreover, when range and azimuth are coupled, traditional compressed sensing imaging algorithms can neither derive an accurate two-dimensional observation model for a given SAR system nor correct spatial variability. In addition, sparse reconstruction algorithms are usually iterative, with high computational complexity and long run times. These factors greatly limit the use of sparse SAR imaging technology [7][8][9].
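To make the storage argument concrete, the following back-of-the-envelope sketch compares a dense vectorized observation matrix with the two-dimensional echo and scene matrices. The grid size of 1024 × 1024 and the 8 bytes per complex entry are illustrative assumptions, not values taken from the experiments in this paper.

```python
def observation_matrix_bytes(n_a, n_r, l_a, l_r, bytes_per_entry=8):
    """Storage of a dense vectorized observation matrix of size (n_a*n_r) x (l_a*l_r)."""
    rows = n_a * n_r   # length of the vectorized echo
    cols = l_a * l_r   # length of the vectorized scene
    return rows * cols * bytes_per_entry


def matrix_domain_bytes(n_a, n_r, l_a, l_r, bytes_per_entry=8):
    """Storage of the echo matrix and scene matrix kept in two-dimensional form."""
    return (n_a * n_r + l_a * l_r) * bytes_per_entry


size = 1024  # hypothetical number of samples per dimension
dense = observation_matrix_bytes(size, size, size, size)
two_d = matrix_domain_bytes(size, size, size, size)

print(f"vectorized observation matrix: {dense / 2**40:.0f} TiB")
print(f"two-dimensional echo and scene: {two_d / 2**20:.0f} MiB")
```

Under these assumptions the dense matrix alone would need terabytes of memory, while the two-dimensional data fit in a few megabytes, which is the motivation for staying in the matrix domain.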
For different signal forms, observation matrices, and signal-to-noise ratios (SNRs), the optimal values of the parameters in compressed sensing differ, and tuning them before an experiment takes considerable time. These problems of traditional compressed sensing theory can be addressed by a deep learning framework [7]. Because the sparse hypothesis model of traditional compressed sensing theory is not fully satisfied in practical applications, the deep learning method learns the structural characteristics of the signal in a data-driven way, relaxes the sparsity assumption on the original signal, and adaptively adjusts the network weights to learn the specific structure of the actual signal [10][11][12]. For example, convolutional neural networks, stacked denoising autoencoders, and other deep networks have excellent signal feature representation ability; they can accurately learn the structural features of real signals from a large number of training samples and significantly improve signal reconstruction accuracy [13][14][15]. Furthermore, the deep learning method can merge the measurement and reconstruction processes, which are designed separately in traditional compressed sensing theory, into an end-to-end framework, and can replace the traditional linear Gaussian random measurement matrix with an adaptive nonlinear measurement network, reducing the number of measurements and improving reconstruction performance through high-quality signal measurement [16]. In addition, traditional compressed sensing reconstruction algorithms cannot achieve real-time processing, which limits the breadth and depth of compressed sensing applications, whereas parallel GPU hardware keeps the run time of a neural network low [17]. Merhej et al. also verified that the multiple iterations of traditional compressed sensing reconstruction can be converted into the computation of a deep neural network to realize real-time reconstruction, which is conducive to the practical application of compressed sensing technology in image processing and other fields [18].
To solve the problems of parameter tuning, high computational complexity from multiple iterations, and one-dimensional echo data in traditional compressed sensing imaging methods, this paper proposes an end-to-end deep learning network imaging method. Following existing SAR sparse imaging algorithms, the SAR imaging model is first rewritten into an SAR complex-signal form based on the real-value model. Next, a deep learning imaging network is built on the two-dimensional sparse SAR observation model, in which the range and azimuth directions are decoupled, and the solution process of the two-dimensional model is mapped to a single-layer neural network. Observation scenes are reconstructed by stacking and extending multiple network layers. Finally, the proposed algorithm is verified on both simulated echo data and measured data, which demonstrates the feasibility and reliability of the method. The main contributions of this paper can be summarized as follows:
• To address the low imaging quality, excessive parameter settings, and difficult parameter tuning of traditional SAR sparse imaging methods, we propose a novel end-to-end SAR sparse imaging method based on a neural network.
• The algorithm performs imaging processing only in the two-dimensional data domain and derives a neural network imaging model from the iterative soft thresholding algorithm (ISTA), instead of arranging the two-dimensional echo data into a vector and repeatedly constructing an observation matrix. This greatly reduces the computational cost and makes sparse imaging of large-scale scenes possible.
• Whereas previous methods can reconstruct only simple targets from simulated data and smaller scenes, experiments on simulated data and on measured data of three kinds of targets show that our algorithm is superior to the traditional sparse algorithm in imaging quality, imaging time, and number of parameters.
The remainder of this paper is organized as follows: the SAR sparse imaging model is given in Section 2, the further derived SAR deep learning imaging model based on ISTA is given in Section 3, some simulation results are presented in Section 4, and the conclusion is given in Section 5.

SAR Sparse Imaging Model
In this section, we introduce the SAR imaging model. First, we establish the conventional SAR sparse imaging model. Most existing studies start from the underdetermined equation of compressed sensing and solve the sparse SAR model with a greedy algorithm [19][20][21] or an iterative optimization algorithm [22][23][24]. Then, we rewrite the sparse model in the form of underdetermined complex matrix equations. Finally, ISTA is selected to solve the underdetermined equations.

SAR Sparse Imaging Model
In this paper, the broadside (side-looking) mode of spaceborne SAR is used for imaging. To construct an SAR imaging model, a single point target is usually used to represent the target in the scene, as shown in Figure 1. Assuming that the transmitted signal of the SAR platform is a linear frequency modulation (LFM) signal, the baseband echo signal received by the platform can be expressed as

$$ y(\hat{t}, t_m) = \iint_{(p,q)\in\Omega} x(p,q)\, s_0\!\left[\hat{t} - \frac{2R(t_m;p,q)}{c}\right] \exp\!\left(-j\,\frac{4\pi f_c R(t_m;p,q)}{c}\right) dp\, dq + n_0(\hat{t}, t_m) \qquad (1) $$

where t̂ and t_m are the fast time and slow time, respectively. Ω is the imaging scene area, x(p, q) is the scattering coefficient of the scattering point at (p, q) in the scene, s_0[·] is the LFM signal, and R(t_m; p, q) is the slant range between the scattering point and the SAR platform at slow time t_m. c is the speed of light in vacuum and f_c is the carrier frequency. n_0(t̂, t_m) denotes the complex Gaussian noise.
Furthermore, the continuous-time echo signal can be sampled discretely and converted into the matrix Y ∈ C^(n_a × n_r), and the scene scattering coefficients can be discretized into the matrix X ∈ C^(l_a × l_r); the following SAR observation model is then obtained from Equation (1):

$$ y = \Phi x + n_0 \qquad (2) $$

where y ∈ C^(n × 1) (n = n_a × n_r) is the vectorized echo matrix, x ∈ C^(l × 1) (l = l_a × l_r) is the vectorized scene scattering coefficient matrix, and n_0 is the vectorized noise. Φ is the observation matrix obtained by discretizing Equation (1), which contains the radar window function, the slant range from the SAR platform to the target, the echo phase, and other information. Following existing research [5][6][7], we do not consider the phase matrix separately, because the amplitude information of the final imaging results is already sufficient to demonstrate the feasibility of the imaging algorithm.
For the underdetermined linear system in Equation (2), if the considered scene x is sufficiently sparse and the matrix Φ satisfies the restricted isometry property (RIP) [25,26], x can be recovered by solving an L1 optimization problem.
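For reference, the L1 problem for a model of the form y = Φx + n_0 is commonly written in one of the two standard forms below. This is the generic compressed sensing formulation rather than a formula taken from this paper; the noise bound ε and the weight λ are symbols introduced here for illustration only.

```latex
\hat{x} = \arg\min_{x} \|x\|_1 \quad \text{s.t.} \quad \|y - \Phi x\|_2 \le \varepsilon
\qquad \text{or} \qquad
\hat{x} = \arg\min_{x} \tfrac{1}{2}\,\|y - \Phi x\|_2^2 + \lambda \|x\|_1
```

The first form constrains the data misfit explicitly, while the second trades misfit against sparsity through a single weight, which is the form that iterative thresholding methods usually work with.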

SAR Complex Signal Sparse Imaging Based on a Real-Value Model
When solving the SAR sparse imaging problem, prior information and optimization algorithms are used to estimate the scattering coefficient x of multiple targets. Because SAR imaging problems are all realized in actual scenes, it is necessary to transform the SAR complex signal model into the corresponding real-value model. Therefore, Equation (2) is rewritten as

$$ y_R + j\,y_I = (\Phi_R + j\,\Phi_I)(x_R + j\,x_I) + (n_{0R} + j\,n_{0I}) \qquad (3) $$

where y_R, Φ_R, x_R, and n_0R denote the real parts of y, Φ, x, and n_0, and y_I, Φ_I, x_I, and n_0I denote the corresponding imaginary parts. Equation (3) is then separated into the corresponding system of real and imaginary equations

$$ y_R = \Phi_R x_R - \Phi_I x_I + n_{0R}, \qquad y_I = \Phi_I x_R + \Phi_R x_I + n_{0I} \qquad (4) $$

The complex observation model can thus be converted into a real-value model through the following relationship:

$$ \begin{bmatrix} y_R \\ y_I \end{bmatrix} = \begin{bmatrix} \Phi_R & -\Phi_I \\ \Phi_I & \Phi_R \end{bmatrix} \begin{bmatrix} x_R \\ x_I \end{bmatrix} + \begin{bmatrix} n_{0R} \\ n_{0I} \end{bmatrix} \qquad (5) $$

Equation (5) represents the real-value model of SAR sparse imaging. Next, imaging processing is performed only in the two-dimensional data domain instead of arranging the two-dimensional echo data into vectors and repeatedly constructing an observation matrix. Traditional SAR sparse imaging methods, including that of Ref. [27], first vectorize the two-dimensional echo matrix and then design the imaging algorithm, which inevitably increases the amount of calculation. When the imaging scene is large, the resulting vector dimension becomes so high that good imaging results are difficult to obtain. By working in the two-dimensional data domain, the calculation cost is reduced to the same order of magnitude as that of the matched filter method, which makes sparse imaging of large-scale scenes possible.
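The conversion from the complex model to the real-value model can be checked numerically. The sketch below stacks real and imaginary parts into the usual block form and confirms that it reproduces the complex product; the matrix sizes and the random test data are arbitrary assumptions chosen only for the check.

```python
import numpy as np


def complex_to_real_model(Phi, y):
    """Stack real and imaginary parts into the block form of the real-value model."""
    Phi_real = np.block([[Phi.real, -Phi.imag],
                         [Phi.imag, Phi.real]])
    y_real = np.concatenate([y.real, y.imag])
    return Phi_real, y_real


def real_to_complex_scene(x_real):
    """Recombine the stacked vector [x_R; x_I] into a complex scene vector."""
    half = x_real.shape[0] // 2
    return x_real[:half] + 1j * x_real[half:]


# Hypothetical small problem: 6 measurements, 10 unknowns, noise-free.
rng = np.random.default_rng(0)
n, l = 6, 10
Phi = rng.standard_normal((n, l)) + 1j * rng.standard_normal((n, l))
x = rng.standard_normal(l) + 1j * rng.standard_normal(l)
y = Phi @ x

Phi_real, y_real = complex_to_real_model(Phi, y)
x_real = np.concatenate([x.real, x.imag])

# The real-valued product must match the stacked complex product.
max_err = np.max(np.abs(Phi_real @ x_real - y_real))
print(Phi_real.shape, max_err)
```

Note that the real-value form doubles both dimensions of the observation matrix, which is one more reason the vectorized formulation becomes expensive for large scenes.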

Iterative Optimization of the Sparse Imaging Model Based on L1 Decoupling
For the SAR sparse real-value model in Section 2.2, we write the two-dimensional imaging model [26] as

$$ Y = \Xi_a \bullet \Theta X \bullet \Xi_r + N_0 \qquad (6) $$

where Ξ_a and Ξ_r are binary matrices that describe the downsampling strategy in the azimuth and range directions, respectively, Θ denotes the observation operator applied to the scene matrix X, and N_0 is the noise matrix. The matrix decomposition that leads from Equation (5) to Equation (6) is based on the Kronecker product, and • is the Hadamard product operator. The dimension of the output of this operation is unchanged, but the time and space complexity are greatly reduced.
For the two-dimensional imaging model in Equation (6), the considered scene can be reconstructed by solving the L1 optimization problem

$$ \hat{X} = \arg\min_{X} \left\| Y - \Xi_a \bullet \Theta X \bullet \Xi_r \right\|_F^2 + \beta \left\| X \right\|_1 \qquad (7) $$

where ‖·‖_F denotes the Frobenius norm and β is the regularization parameter, which controls the balance between data fidelity and sparsity. However, because of the azimuth-range coupling in the two-dimensional echo data domain, the observation matrix cannot be constructed directly [26], which means that the considered scene cannot be sparsely reconstructed from Equation (7) by building the observation matrix explicitly.
To reconstruct the signal, greedy algorithms or iterative thresholding algorithms are generally used. In this paper, we use ISTA for signal reconstruction.
Generally, ISTA solves the reconstruction problem in Equation (7) by iterating between the update steps of Equations (8) and (9), where X̂ and X̃ denote the sparse and non-sparse estimates of X, respectively, µ is an iteration parameter, ‖·‖_F denotes the Frobenius norm, and r is the residual. However, ISTA usually requires many iterations, and therefore many calculations, to obtain satisfactory results. The optimal transformation and related parameters are set on the basis of prior information, which is very difficult to obtain in practice. Additionally, although we replaced the original vectorized processing with two-dimensional matrix processing, the computational burden remains considerable. For example, it still takes more than 50 s to reconstruct the targets in the experiments of [26]. We therefore consider using neural networks to learn some of the imaging parameters, which can greatly reduce the amount of calculation in direct imaging.
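The iteration between a residual step, a gradient-type update, and a soft threshold can be sketched in the matrix domain as follows. This is a generic ISTA sketch under stated assumptions, not the exact update of this paper: the operator Θ is modeled as an explicit matrix `Theta`, the two binary downsampling matrices are combined into one mask, the data term carries a factor 1/2 (which only rescales β), and the step size is set to the reciprocal of the squared spectral norm of `Theta`.

```python
import numpy as np


def soft_threshold(X, tau):
    """Complex soft threshold: shrink magnitudes by tau and keep the phase."""
    mag = np.abs(X)
    return np.exp(1j * np.angle(X)) * np.maximum(mag - tau, 0.0)


def objective(Y, Theta, X, mask, beta):
    """0.5 * ||Y - mask * (Theta X)||_F^2 + beta * ||X||_1 (assumed scaling)."""
    r = Y - mask * (Theta @ X)
    return 0.5 * np.linalg.norm(r, "fro") ** 2 + beta * np.abs(X).sum()


def ista_2d(Y, Theta, mask, beta, n_iter=100):
    """Plain ISTA on the two-dimensional model, starting from a zero scene."""
    mu = 1.0 / np.linalg.norm(Theta, 2) ** 2        # step size (assumption)
    X = np.zeros((Theta.shape[1], Y.shape[1]), dtype=complex)
    history = []
    for _ in range(n_iter):
        r = Y - mask * (Theta @ X)                      # residual
        X_tilde = X + mu * (Theta.conj().T @ (mask * r))  # non-sparse estimate
        X = soft_threshold(X_tilde, mu * beta)          # sparse estimate
        history.append(objective(Y, Theta, X, mask, beta))
    return X, history


# Hypothetical test problem: six point scatterers on a 32 x 32 grid.
rng = np.random.default_rng(0)
n = 32
Theta = (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))) / np.sqrt(2 * n)
X_true = np.zeros((n, n), dtype=complex)
idx = rng.choice(n * n, size=6, replace=False)
X_true.flat[idx] = np.exp(1j * rng.uniform(0, 2 * np.pi, size=6))
xi_a = (rng.random(n) < 0.8).astype(float)   # azimuth samples kept
xi_r = (rng.random(n) < 0.8).astype(float)   # range samples kept
mask = np.outer(xi_a, xi_r)
Y = mask * (Theta @ X_true)

X_hat, history = ista_2d(Y, Theta, mask, beta=0.01)
print(history[0], history[-1])
```

With this step size the objective is non-increasing from one iteration to the next, but many iterations are typically needed, which is the cost that the learned network described next is meant to reduce.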

Construction of the Deep Learning Imaging Network
In recent years, data-driven methods based on deep learning have shown strong advantages in signal processing, especially in the field of image processing. As long as the network model is sufficiently complex, it can theoretically fit any nonlinear function [27]. Furthermore, any iterative algorithm can be expanded into the corresponding deep learning network structure. Considering that the essence of reconstructing the imaging model in Section 2 is to solve the nonlinear function problem, we naturally associate the problem of L1 iteration optimization with deep learning.
Equation (8) can be rewritten in the form of Equation (10), where F(·) denotes the nonlinear activation function, and W(Θ) and b(Θ) represent the weight and bias, respectively. The single-layer network structure of Equation (10) is basically the same as that of the deep unfolding network [28]; that is, single-layer networks with the same structure are stacked into a multilayer network, and each layer yields a reconstruction of the scattering coefficients.
At present, the sigmoid function, tanh function, and ReLU function are commonly used as deep learning activation functions [29]. Theoretically, any nonlinear function that satisfies the conditions for an activation function can be used as the activation function of SAR learning imaging. Because the SAR imaging scene is likely to be sparse, a feasible scheme is to make the output of the activation function as sparse as possible. Thus, we use the soft threshold function of ISTA as the activation function, with learnable parameter Θ^(k). Equation (10) is then expressed as an operator update layer of the network, whose learnable parameter is µ^(k). Similarly, the soft threshold activation function can be expressed as the nonlinear transformation layer of Equation (11). The learnable parameters of the nonlinear transformation layer are the nonlinear function F and the regularization parameter β. In addition to the existing neural network activation functions, adaptive activation functions related to the noise distribution of the echo data and the prior distribution of the scenes can also be designed. However, in SAR deep learning imaging, whether an activation function constructed from the prior distributions of the echo data and scenes satisfies differentiability, monotonicity, and the other conditions required of activation functions needs further study [27]. The residual layer, operator update layer, and nonlinear transformation layer constitute the single-layer topology of the SAR imaging network.
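The three-layer topology described above can be sketched as a forward pass in which every stage has its own step size and threshold. This is a minimal NumPy illustration under assumptions, not the trained network of this paper: `Theta` is an explicit matrix standing in for the observation operator, the downsampling matrices are merged into one mask, the per-layer parameters `mus` and `betas` are fixed by hand instead of being learned, and the threshold of each layer is taken as the product of its step size and regularization parameter.

```python
import numpy as np


def soft_threshold(X, tau):
    """Complex soft threshold used as the sparsity-promoting activation."""
    return np.exp(1j * np.angle(X)) * np.maximum(np.abs(X) - tau, 0.0)


def residual_layer(Y, X, Theta, mask):
    """Mismatch between the observed echo and the echo predicted from X."""
    return Y - mask * (Theta @ X)


def operator_update_layer(X, R, Theta, mask, mu):
    """Gradient-type update with a per-layer step size mu."""
    return X + mu * (Theta.conj().T @ (mask * R))


def nonlinear_transform_layer(X_tilde, tau):
    """Soft threshold activation with a per-layer threshold tau."""
    return soft_threshold(X_tilde, tau)


def unrolled_forward(Y, Theta, mask, mus, betas):
    """Stack len(mus) identical stages; return the estimate after every stage."""
    X = np.zeros((Theta.shape[1], Y.shape[1]), dtype=complex)
    layer_outputs = []
    for mu, beta in zip(mus, betas):
        R = residual_layer(Y, X, Theta, mask)
        X_tilde = operator_update_layer(X, R, Theta, mask, mu)
        X = nonlinear_transform_layer(X_tilde, mu * beta)
        layer_outputs.append(X)
    return layer_outputs


# Hypothetical scene with two scatterers and a fully sampled echo.
rng = np.random.default_rng(1)
Theta = (rng.standard_normal((16, 16)) + 1j * rng.standard_normal((16, 16))) / 4.0
X_true = np.zeros((16, 16), dtype=complex)
X_true[3, 5] = 1.0
X_true[10, 12] = 0.5j
mask = np.ones((16, 16))
Y = mask * (Theta @ X_true)

L = 8                      # number of stacked layers
mus = [0.05] * L           # would be learned in training; fixed here
betas = [0.1] * L          # would be learned in training; fixed here
outputs = unrolled_forward(Y, Theta, mask, mus, betas)
print(len(outputs), np.count_nonzero(outputs[-1]))
```

Because each stage returns its own estimate, intermediate reconstructions can be inspected layer by layer, and in a trainable implementation the lists `mus` and `betas` would become the learnable parameters.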

Training the Deep Learning Imaging Network
SAR deep learning imaging networks can be used for unsupervised training or supervised training [30] by using the known feature information of SAR target geometry size, shape, and statistical distribution. As the SAR imaging scene is unknown, it is impossible to directly use the scattering coefficient of the scene for error backpropagation.
To solve this problem, we use the scene scattering coefficient estimation obtained from the last layer of the network to multiply by the observation matrix to obtain the estimated SAR echo data, which can be compared with the original SAR echo data to realize the unsupervised training of the model.
To generate unlabeled training samples for unsupervised learning, this paper downsamples the original echo data, adds system noise at different SNRs, adds echo phase disturbances, and so on. This approach not only reduces the quantity of data and increases imaging efficiency but also improves the robustness and reliability of the algorithm. The data downsampling method is described in [6] and is not repeated here. Echo data are incomplete in most practical application scenarios; to verify the imaging ability of the SAR deep learning imaging method under this condition, the downsampling rate is defined as the ratio of the actual number of radar sampling points to the number of sampling points required by the Nyquist sampling rate.
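The sample generation steps named above (downsampling, added noise at a chosen SNR, and phase disturbance) can be sketched as follows. All specifics are assumptions made for illustration: samples are dropped uniformly at random along azimuth and range, the noise is complex white Gaussian, and the phase disturbance is a linear plus quadratic phase along azimuth with hypothetical amplitudes `a1` and `a2`. The actual downsampling scheme of [6] may differ.

```python
import numpy as np


def downsampling_masks(n_a, n_r, rate_a, rate_r, rng):
    """Binary azimuth and range selection vectors keeping a fixed fraction of samples."""
    xi_a = np.zeros(n_a)
    xi_r = np.zeros(n_r)
    xi_a[rng.choice(n_a, size=int(round(rate_a * n_a)), replace=False)] = 1.0
    xi_r[rng.choice(n_r, size=int(round(rate_r * n_r)), replace=False)] = 1.0
    return xi_a, xi_r


def add_noise(Y, snr_db, rng):
    """Add complex white Gaussian noise so that the echo has the requested SNR."""
    signal_power = np.mean(np.abs(Y) ** 2)
    noise_power = signal_power / 10 ** (snr_db / 10)
    noise = np.sqrt(noise_power / 2) * (
        rng.standard_normal(Y.shape) + 1j * rng.standard_normal(Y.shape)
    )
    return Y + noise


def add_phase_error(Y, a1, a2):
    """Apply a linear plus quadratic phase error along azimuth (rows)."""
    t = np.linspace(-1.0, 1.0, Y.shape[0])
    phase = a1 * t + a2 * t ** 2
    return Y * np.exp(1j * phase)[:, None]


def make_sample(Y, rate_a, rate_r, snr_db, a1, a2, rng):
    """One augmented, downsampled echo and the mask that produced it."""
    xi_a, xi_r = downsampling_masks(Y.shape[0], Y.shape[1], rate_a, rate_r, rng)
    mask = np.outer(xi_a, xi_r)
    Y_aug = add_phase_error(add_noise(Y, snr_db, rng), a1, a2)
    return mask * Y_aug, mask


# Placeholder echo matrix standing in for real SAR echo data.
rng = np.random.default_rng(0)
Y = rng.standard_normal((64, 64)) + 1j * rng.standard_normal((64, 64))
noisy = add_noise(Y, 20.0, rng)
measured_snr = 10 * np.log10(np.mean(np.abs(Y) ** 2) / np.mean(np.abs(noisy - Y) ** 2))
sample, mask = make_sample(Y, 0.8, 0.8, 20.0, a1=0.3, a2=0.5, rng=rng)
print(round(measured_snr, 2), int(mask.sum()))
```

Calling `make_sample` repeatedly with different SNRs, phase amplitudes, and random masks yields many distinct training echoes from one original echo without requiring any labels.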
Regarding the noise added for training, the term ‖Y − Ξ_a • ΘX • Ξ_r‖_F^2 of the SAR sparse imaging model is the data fitting term, which reflects how well the reconstructed signal fits the original signal. Because the noise we add to the system is additive white noise, the Frobenius norm can be used to express the data fitting term. To obtain a large number of training samples without changing the sparse imaging model of SAR, different Gaussian white noise realizations can be added to the original echo data. Furthermore, first-order and second-order phase disturbances with different amplitudes can be added to the SAR echo phase to generate echo data samples. The phase disturbance simulates the motion compensation error of the SAR platform, which can be used to verify the robustness of the deep learning network to imaging phase errors. Let N be the number of samples in the sample database. If the mean square error loss function is used, the cost function is the average, over the N samples, of the squared Frobenius norm of the difference between the original echo data and the echo data estimated from the output of the last network layer. The advantage of unsupervised training is that there is no need to add any label to the SAR data, which greatly reduces the training cost of the SAR learning imaging network. The overall framework of our proposed method is given in Figure 2.
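A mean square error cost of this kind can be sketched in a few lines. The sketch assumes an explicit matrix `Theta` for the observation operator and a single combined downsampling mask, both introduced here for illustration; it re-projects each scene estimate into the echo domain and averages the squared Frobenius misfit over the samples.

```python
import numpy as np


def reproject(X_hat, Theta, mask):
    """Echo predicted from a scene estimate under the assumed observation model."""
    return mask * (Theta @ X_hat)


def unsupervised_mse_loss(Y_batch, X_hat_batch, Theta, mask):
    """Average squared Frobenius misfit between observed and re-projected echoes."""
    errors = [
        np.linalg.norm(Y - reproject(X_hat, Theta, mask), "fro") ** 2
        for Y, X_hat in zip(Y_batch, X_hat_batch)
    ]
    return float(np.mean(errors))


# Hypothetical batch of four noise-free samples.
rng = np.random.default_rng(0)
Theta = rng.standard_normal((8, 8)) + 1j * rng.standard_normal((8, 8))
mask = np.ones((8, 8))
scenes = [rng.standard_normal((8, 8)) + 0j for _ in range(4)]
echoes = [reproject(X, Theta, mask) for X in scenes]

perfect = unsupervised_mse_loss(echoes, scenes, Theta, mask)
perturbed = unsupervised_mse_loss(echoes, [X + 0.1 for X in scenes], Theta, mask)
print(perfect, perturbed)
```

No ground-truth scene enters the loss, only the observed echoes, which is what makes the training unsupervised; the same property means the loss becomes less informative when many echo samples are missing.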

Experiments and Analysis
To test the feasibility of the proposed deep learning SAR sparse imaging algorithm, we verify it on both simulated data and measured data. The simulated data consist of nine different scattering points on the surface. The measured data include vehicles, ships, and planes in different scenes, taken from the MSTAR and Gaofen-3 datasets. To verify the superiority of the proposed algorithm, deep learning SAR sparse imaging is compared with the existing imaging algorithms ISTA [5] and fast ISTA [6].
The radar parameters used in this paper are shown in Table 1. For consistency, the polarization mode of SAR is taken to be VH throughout the experiments. To highlight the reliability of the algorithm, downsampled echo data are used directly during imaging processing.

Simulation Point Target Imaging Experiment
In this section, the SAR imaging model in the presence of Gaussian white noise is taken as the simulation object. The input SNR of the echo is set to 20 dB. The imaging region is discretized into a 30 × 30 grid scene that contains nine scattering points with different scattering coefficients. For the network, the initial learning rate is 0.0001, the number of epochs is 100, and the number of layers is denoted L. In actual imaging, the trained imaging observation matrix, regularization parameters, iteration step length, and other parameters are loaded directly into the imaging network, and the imaging results are output after forward propagation through the network. Because the simulated data imaging needs to be compared with the original SAR echo data, the network is first trained on fully sampled data. The training set contains 1000 samples, each with no more than five scattering points. The results of the comparative experiment are shown in Figure 3. Figure 3 shows that, when the echo data are downsampled, the imaging quality of both our proposed method and the two ISTA imaging algorithms degrades considerably. The imaging result of the traditional Range Doppler method [31] under downsampling is also shown in Figure 3b and likewise exhibits strong sidelobe interference. Because the scattering points are close together and their scattering coefficients are relatively similar, the imaging results inevitably contain sidelobes and other clutter interference. For the proposed method, the unsupervised training loss is computed by comparing the estimated echo data with the original echo data. Therefore, when a large quantity of the original data is lost, the imaging quality of the unsupervised training method cannot be improved further.
Although all methods correctly reconstruct the coordinates of the point targets, the proposed method is clearly superior in reconstruction quality. Both ISTA and fast ISTA show a large degree of defocus, and their imaging quality is significantly worse than that of our proposed method. Compared with the imaging results at L = 3, the results at L = 8 and L = 11 suppress sidelobes noticeably better. However, the result at L = 11 differs little from the result at L = 8, which shows that fewer network layers suffice for simulated point target imaging. In addition to comparing the imaging results, we computed the peak signal-to-noise ratio (PSNR), normalized mean square error (NMSE), peak sidelobe ratio (PLSR), and imaging time of the three methods; Table 2 shows the comparison. The results in Table 2 are consistent with those in Figure 3: using a small number of network layers and unsupervised training, the proposed method obtains better imaging results than the existing mainstream iterative optimization algorithms obtain with thousands of iterations. In terms of imaging speed, once the network parameters have been trained, imaging requires only a forward pass through a small number of layers, so it is faster than the many iterations of ISTA and fast ISTA.
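For readers who want to reproduce such comparisons, the three quality metrics can be computed as sketched below. The paper does not spell out its exact definitions, so common conventions are assumed here: PSNR uses the peak magnitude of the reference image, NMSE is the error energy divided by the reference energy, and the peak sidelobe ratio is measured on a one-dimensional cut through a point target, with the main lobe delimited by the first local minima on either side of the peak.

```python
import numpy as np


def psnr_db(reference, estimate):
    """Peak signal-to-noise ratio in dB, using the reference peak magnitude."""
    mse = np.mean(np.abs(reference - estimate) ** 2)
    peak = np.max(np.abs(reference))
    return 10.0 * np.log10(peak ** 2 / mse)


def nmse(reference, estimate):
    """Error energy normalized by the reference energy (linear scale)."""
    return np.sum(np.abs(reference - estimate) ** 2) / np.sum(np.abs(reference) ** 2)


def peak_sidelobe_ratio_db(profile):
    """Ratio of the highest sidelobe to the main-lobe peak on a 1-D cut, in dB."""
    mag = np.abs(profile)
    peak = int(np.argmax(mag))
    left = peak
    while left > 0 and mag[left - 1] < mag[left]:
        left -= 1                      # walk down to the first null on the left
    right = peak
    while right < len(mag) - 1 and mag[right + 1] < mag[right]:
        right += 1                     # walk down to the first null on the right
    sidelobes = np.concatenate([mag[:left], mag[right + 1:]])
    if sidelobes.size == 0:
        raise ValueError("profile contains no samples outside the main lobe")
    return 20.0 * np.log10(sidelobes.max() / mag[peak])


# Small worked example with known values.
reference = np.array([[1.0, 0.0], [0.0, 0.0]], dtype=complex)
estimate = np.array([[0.9, 0.0], [0.0, 0.0]], dtype=complex)
psnr_value = psnr_db(reference, estimate)
nmse_value = nmse(reference, estimate)

# An ideal sinc response has a first sidelobe about 13.3 dB below the peak.
pslr_value = peak_sidelobe_ratio_db(np.sinc(np.linspace(-8, 8, 1601)))
print(round(psnr_value, 2), round(nmse_value, 4), round(pslr_value, 2))
```

The sinc check is a useful sanity test: an unweighted matched filter response should give a value near -13.3 dB, and stronger sidelobe suppression shows up as a more negative number.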

Measured Target Imaging Experiment
In this section, the effectiveness of the proposed method is further verified using measured SAR data from MSTAR and the Gaofen-3 satellite. The imaging algorithms originally used to produce the MSTAR and Gaofen-3 images are traditional ones such as Range Doppler and Chirp Scaling. The selected measured SAR data include the following three observation scenes: (a) a ground observation scene, which contains a single vehicle target, (b) a sea surface observation scene, which contains several ship targets, and (c) a ground airport observation scene, which contains multiple planes. Several other vehicle targets in the MSTAR dataset could also serve as experimental objects; only two vehicle targets (T72 and ZSU-23-4) are selected here for demonstration. For the measured data, the positional relationship between the imaging scene of the simulated echo data and the radar is the same as in the previous point target simulation experiment, and the phase information of the echo data is obtained by a two-dimensional Fourier transform of the complex image data. The network parameters are the same as in the previous section, and no training is performed on the measured data. The imaging results are shown in Figures 4-9. Figures 4-9 show that, under downsampling, the proposed imaging algorithm still outperforms the two iterative optimization algorithms.
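The step of obtaining echo-domain data from complex image data by a two-dimensional Fourier transform can be illustrated with NumPy. This is only a sketch of the general idea under assumptions: a plain unnormalized 2-D FFT stands in for the actual processing chain, a random complex array stands in for a complex SAR image chip, and the 50% azimuth downsampling is a hypothetical choice.

```python
import numpy as np


def echo_from_complex_image(image):
    """Echo-domain data modeled as the 2-D FFT of a complex image (assumed relation)."""
    return np.fft.fft2(image)


def image_from_echo(echo):
    """Inverse of the assumed relation, used here as a simple reference imager."""
    return np.fft.ifft2(echo)


# Placeholder complex image chip.
rng = np.random.default_rng(0)
image = rng.standard_normal((64, 64)) + 1j * rng.standard_normal((64, 64))

echo = echo_from_complex_image(image)
recovered = image_from_echo(echo)

# Keep a random half of the azimuth lines and zero-fill the rest.
keep = np.zeros(64)
keep[rng.choice(64, size=32, replace=False)] = 1.0
zero_filled = image_from_echo(echo * keep[:, None])

print(np.max(np.abs(recovered - image)), np.linalg.norm(zero_filled - image))
```

With full sampling the inverse transform recovers the image to numerical precision, whereas zero-filling the missing lines does not, which is the situation the sparse and learned reconstructions are designed to handle.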
We selected T72 and ZSU-23-4 from the MSTAR dataset as the imaging objects. For the single vehicle target, because of the heavy downsampling, both the ISTA and fast ISTA algorithms show different degrees of defocusing, whereas the proposed method achieves better reconstruction results. Among the network depths, the refocusing of vehicle targets and the suppression of background clutter are best when L = 8. A possible reason is that when L = 3 the parameters of the imaging model are not fully learned, and when L = 11 the network overfits. For imaging ship targets at sea, we selected two offshore ship targets and one inshore ship target. Figure 6 shows that all three methods can image relatively sparse sea scenes accurately, a situation similar to that of the single vehicle target. In the offshore scene, because the background is sparse, our imaging algorithm achieves better reconstruction and suppresses clutter interference. For inshore scenes, the image quality of ISTA and fast ISTA is poor because the scenes contain a wide variety of targets: containers and bridges on shore have high scattering coefficients and strongly interfere with the accurate reconstruction of ship targets. For the inshore ship, however, Figure 7 shows that our method achieves a better imaging result, which means that it can reconstruct the target better against a complex background. Figures 8 and 9 show the imaging results for the airport scene. The airport scene is also challenging for imaging because it is more complex, with an irregular arrangement of various buildings and possibly densely distributed planes. In Figure 8, our proposed method reconstructs the plane and the adjacent building in their entirety, while ISTA and fast ISTA cannot distinguish the plane from other targets. When three planes are parked close together, as in Figure 9, the scattering points interfere with one another in the image. The experimental results show that our method can also reconstruct such densely distributed targets.
Therefore, SAR images obtained by the proposed imaging network have higher reconstruction accuracy, the target scenes are better preserved, and the background clutter is further suppressed. Because we do not change the parameters of the network, the PSNR, NMSE, and PLSR for the measured data are almost the same as the simulation results given in Table 2. Because the scenes of the imaging targets differ, the imaging time for the measured data is reported separately in Table 3.

Conclusions
In this paper, an end-to-end SAR deep learning imaging method based on a sparse optimization iterative algorithm was proposed. Following existing SAR sparse imaging algorithms, the SAR imaging model was first rewritten into an SAR complex-signal form based on the real-value model. Next, to support a deep learning imaging algorithm based on sparse optimization, the model was further written as a sparse imaging model based on L1 decoupling. Then, the deep learning imaging network was established on the two-dimensional SAR observation model, the solution process of the two-dimensional model was mapped to one layer of the neural network, and the scene scattering coefficients were solved by stacking and extending multiple network layers. The originally fixed imaging model parameters were changed into learnable network parameters. Through unsupervised training, the network learns on its own the model parameters that give the best imaging quality, which improves the universality and generalization ability of the imaging method for SAR echo data. Finally, the proposed algorithm was verified on both simulated point echo data and measured data of different targets in multiple scenes, which demonstrates the feasibility and reliability of the method.
However, the disadvantages of unsupervised training are also clear. First, it places certain requirements on the sampling rate and SNR of the radar echo. When the sampling rate is insufficient or the echo is compressed, the quantity of echo data is greatly reduced, which affects the accuracy of the cost function. Second, the prior information and known characteristics of the targets in the scene are ignored, so the imaging performance for key targets cannot be improved further. Therefore, in future research, we will focus on further improving the imaging quality and on supervised deep learning imaging algorithms. In addition, making the reconstructed image retain complete phase information and the statistical distribution of the background, so that the images obtained by these algorithms can better support important SAR applications, will also be a focus of our follow-up research.