Inverse Synthetic Aperture Radar Sparse Imaging Exploiting the Group Dictionary Learning

Abstract: Sparse imaging relies on sparse representations of the target scenes to be imaged. Predefined dictionaries have long been used to transform radar target scenes into sparse domains, but their performance is limited because hand-designed or existing transforms, e.g., the Fourier transform and the wavelet transform, are not optimal for the target scenes to be sparsified. The dictionary learning (DL) technique has been exploited to obtain sparse transforms optimized jointly with the radar imaging problem. Nevertheless, DL is usually implemented patch by patch, which ignores the relationship between patches and omits some feature information during the learning of the sparse transforms. To capture the feature information of the target scenes more accurately, we adopt the image patch group (IPG) instead of the patch as the unit of DL. An IPG is constructed from patches with similar structures. DL is performed with respect to each IPG, which is termed group dictionary learning (GDL). The group-oriented sparse representation (GOSR) and the target image reconstruction are then jointly optimized by solving an $\ell_1$-norm minimization problem exploiting the GOSR, during which a generalized Gaussian distribution hypothesis on the radar image reconstruction error is introduced to make the imaging problem tractable. Imaging results on real ISAR data show that the GDL-based imaging method outperforms the original DL-based imaging method in both imaging quality and computational speed.


Introduction
Inverse synthetic aperture radar (ISAR) can obtain high resolution images of moving targets in all weather, day and night. It is an important tool for target surveillance and recognition in non-cooperative scenarios [1]. Traditionally, ISAR imaging uses the range-Doppler (RD) type of methods. Under the assumption of a small rotational angle, the cross-range imaging is achieved by fast Fourier transform (FFT). If the targets undergo complex motion, the imaging time needs to be selected or the high-order motion needs to be compensated. The imaging results of this type of method usually suffer from sidelobe interferences.
Sparsity-driven radar imaging methods have verified that incorporating sparsity as prior information in the radar image formation process can overcome the shortcomings of the RD type of methods. These sparsity-driven imaging methods [2][3][4][5][6][7][8][9] assume that the target scene admits sparsity in a particular domain. In particular, regularization-based image formation models focus on enhancing point-based and region-based [2][3][4][5][6] image features by imposing sparsity on features of the target scene, whereas sparse transformation-based image formation models [7][8][9] represent the reflectivity fields sparsely with dictionaries by imposing sparsity on the representation coefficients over the dictionaries. Both models have been shown to offer better image reconstruction quality than traditional RD imaging methods. However, these ways of sparsifying the target scene only depict predefined image features and are not adaptive to the unknown target scenes; the performance is, therefore, limited.
In contrast to dictionaries constructed with fixed image transformations used in sparse transformation based image formation models, the dictionaries obtained by the dictionary learning (DL) technology [10][11][12][13] are generated with the prior information of the unknown target image. Thus, the learned dictionaries are adaptive to the target images to be reconstructed and can find the optimal sparse representation coefficients [14,15]. Nevertheless, the strategy of processing each target scene patch independently during the DL and sparse coding stages neglects the important feature information between the patches, such as the self-similarity information which has been proved to be very efficient for preserving image details [16][17][18][19][20] during the image formation process. Both DL and sparse coding stages are calculated with relatively expensive nonlinear estimations, e.g., orthogonal matching pursuit (OMP). These two deficiencies actually limit the improvement of the reconstruction quality and efficiency of DL-based ISAR sparse imaging, respectively.
In order to exploit the self-similarity information between patches and recover more details of the target image, we adopt the image patch group (IPG) instead of the independent patch as the unit in the DL and sparse coding stages. An IPG is constructed from patches with a similar structure. A singular value decomposition (SVD) based DL method is performed with respect to each IPG, which is termed group dictionary learning (GDL). The group-oriented sparse representation (GOSR) and the target image reconstruction are then jointly optimized by solving an $\ell_1$-norm minimization problem exploiting the GOSR, during which a generalized Gaussian distribution hypothesis [21] on the radar image reconstruction error is employed to make the imaging problem tractable. The initial idea of our work for ISAR imaging using GDL was presented in the conference paper [22].
Compared with the existing ISAR sparse imaging methods, the innovations of the proposed imaging method are as follows:
(1) The IPGs, instead of independent patches, are used as the units in the DL and sparse coding stages. The GOSRs simultaneously characterize the local sparsity of the target image and the self-similarity information between patches.
(2) A GDL method with low complexity is designed. The GDL is performed with respect to each IPG rather than the target image, using the simple SVD.
(3) An iterative algorithm combined with a soft thresholding function is developed to solve the GOSR-based $\ell_1$-norm minimization problem for target sparse imaging.
Real ISAR data are used to demonstrate the performance of the proposed GDL-based sparse imaging method. Comparisons are conducted with the greedy Kalman filtering (GKF) based sparse imaging method [9] and the on-line DL and off-line DL based sparse imaging methods [15].
The rest of this paper is organized as follows: Section 2 briefly presents the ISAR measurements model and sparse imaging model. Section 3 presents the DL-based ISAR sparse imaging methods. Section 4 elaborates the GDL-based ISAR sparse imaging method in great detail. Section 5 shows the real ISAR data imaging results and the performance analyses of our imaging method. Section 6 draws the conclusions.

Model of ISAR Measurements
We consider an ISAR imaging geometry, including a moving radar platform and a target with both translational motion and rotational motion, in an image projection plane (IPP). The radar first transmits a linear frequency modulated (LFM) pulsed waveform $p(t) = \mathrm{rect}(t/T_p)\, e^{j\pi k_a t^2} e^{j\omega_0 t}$. Here, $t$ represents the fast time, $T_p$ represents the pulse width, $k_a$ is the frequency modulation rate, $\omega_0$ is the carrier frequency of the transmitted waveform and $\mathrm{rect}(\cdot)$ is the rectangular function. The received signal from the target scene is then mixed with a reference chirp. After demodulation, range compression and higher-order motion compensation [23] of the de-chirped signal, the ISAR image formation can be formulated as a 2D inverse Fourier transform (FT) [9], given in Equation (1), where $f_t \in [-B/2, B/2]$ is the range frequency with $B$ denoting the bandwidth of the transmitted waveform, $f_s = \frac{2 f_0}{c}\Omega x$ and $\tau = \frac{2y}{c}$, $\Omega$ denotes the effective rotational vector of the target [24], $xOy$ denotes the local coordinate system centered at $O$ on the target, $A(f_s)$ is the spectrum of the amplitude modulation due to the azimuth antenna beam pattern, and $T(f_s, \tau)$ is the reflectivity distribution of the ISAR target scene to be reconstructed. $r_{mc}(s, f_t)$ is the ISAR measurement after motion compensation and range compression.
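As an illustration of the transmitted waveform defined above, the following minimal Python sketch samples the LFM pulse $p(t)$. The function name `lfm_pulse`, the numerical values, and the conversion of a carrier frequency in hertz to the angular frequency $\omega_0$ are assumptions made for this example; they do not describe the radar systems used in this paper.

```python
import numpy as np

def lfm_pulse(t, pulse_width, chirp_rate, carrier_freq_hz):
    """Sample p(t) = rect(t/Tp) * exp(j*pi*ka*t^2) * exp(j*w0*t)."""
    rect = (np.abs(t) <= pulse_width / 2).astype(float)   # rect(t / Tp)
    omega0 = 2 * np.pi * carrier_freq_hz                   # assumed: w0 = 2*pi*f0
    return rect * np.exp(1j * np.pi * chirp_rate * t**2) * np.exp(1j * omega0 * t)

# Illustrative values only (hypothetical, chosen for a readable example).
Tp = 10e-6                      # pulse width [s]
B = 400e6                       # swept bandwidth [Hz]
ka = B / Tp                     # frequency modulation rate [Hz/s]
t = np.linspace(-Tp, Tp, 4001)  # fast-time axis [s]
p = lfm_pulse(t, Tp, ka, carrier_freq_hz=0.0)   # baseband version of the pulse
```

The pulse has unit magnitude inside the window $|t| \le T_p/2$ and is zero outside it; setting the carrier to zero simply keeps the sampled phase easy to inspect.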

Sparse Imaging Model
Let $\sigma$ be the vector of the reflectivity function $T(f_s, \tau)$ and $G$ be the vector of ISAR measurements $r_{mc}$ for discrete samples of the fast-time and slow-time domains. The relationship between the ISAR measurements and the reflectivity function to be reconstructed can be modeled as a linear system of equations in matrix form [9] as follows:
$$G = H\sigma + n \qquad (2)$$
where $H$ is the observation matrix of ISAR imaging. Specifically, $H$ is a Fourier matrix formed by $H = F_C \otimes F_R$, where $F_C$ denotes the 1D Fourier transform matrix applied to the column dimension of $T(f_s, \tau)$, $F_R$ denotes the 1D Fourier transform matrix applied to the row dimension of $T(f_s, \tau)$, and $n$ is the noise vector embedded in the ISAR measurements. We assume that the numbers of samples in the range and cross-range dimensions are $N_r$ and $N_a$, respectively. $G$ and $\sigma$ are both vectors of dimension $N_r N_a$, and $H$ is an $N_r N_a \times N_r N_a$ square matrix. The vector $\sigma$ is naturally sparse, considering that the background of an ISAR image usually has relatively low reflectivity and the target to be imaged is a composition of a number of relatively strong scatterers. The target image can be reconstructed from fewer than $N_r N_a$ measurements within the theoretical framework of compressed sensing (CS), based on the following under-determined linear system of equations:
$$G_s = \Psi\sigma + n_s \qquad (3)$$
where $G_s \in \mathbb{C}^m$ is a randomly under-sampled measurement vector, $\Psi \in \mathbb{C}^{m \times n}$ with $m < n$ and $n = N_r N_a$ is the measurement matrix, which is a partial Fourier matrix obtained by $\Psi = \Theta H$, where $\Theta$ denotes the sensing matrix, and $n_s$ is the noise vector corresponding to the under-sampled measurements $G_s$. The imaging problem in Equation (3) can be formulated as the spatial-domain sparsity-constrained $\ell_1$-norm minimization model in Equation (4).
The sparse representations in the transform domains depict certain features (point-based or region-based image features) of the target of interest [7,9], thereby enhancing the imaging quality of the target scene $\sigma$.
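The structure of the observation and measurement matrices described above can be checked numerically on a very small grid. The sketch below is illustrative only: it assumes a row-major vectorization of the scene, unnormalized forward DFT matrices, and a sensing matrix $\Theta$ that simply keeps a random subset of rows; none of these conventions is specified in the text, and the helper `dft_matrix` is a hypothetical name.

```python
import numpy as np

def dft_matrix(n):
    """Unnormalized n-point DFT matrix (symmetric)."""
    return np.fft.fft(np.eye(n))

rng = np.random.default_rng(0)
Na, Nr = 6, 8                                   # tiny cross-range x range grid
T = rng.standard_normal((Na, Nr)) + 1j * rng.standard_normal((Na, Nr))

F_C = dft_matrix(Na)                            # acts along the column dimension (axis 0)
F_R = dft_matrix(Nr)                            # acts along the row dimension (axis 1)
H = np.kron(F_C, F_R)                           # (Na*Nr) x (Na*Nr) observation matrix

sigma = T.ravel()                               # row-major vectorization of the scene
G = H @ sigma                                   # noiseless full measurements

# Theta keeps m randomly chosen rows, so Psi = Theta @ H is a partial Fourier matrix.
m = (Na * Nr) // 2
kept = np.sort(rng.choice(Na * Nr, size=m, replace=False))
Theta = np.eye(Na * Nr)[kept]
Psi = Theta @ H
G_s = Psi @ sigma                               # noiseless under-sampled measurements
```

Under these conventions, `H @ sigma` coincides with the vectorized 2D FFT of the scene, which is why practical implementations usually apply FFTs instead of forming $H$ explicitly.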
Let $D \in \mathbb{C}^{n \times n}$ be a dictionary that sparsely represents $\sigma$ as follows:
$$\sigma = Dw \qquad (5)$$
where the vector $w \in \mathbb{C}^n$ is the sparse representation of $\sigma$ in the domain spanned by $D$.
Thus, the image reconstruction in Equation (4) is performed by first obtaining the sparse representation $\hat{w}$ via Equation (6) and then forming the target image via Equation (7), i.e., $\hat{\sigma} = D\hat{w}$. However, these dictionaries are hand-designed from fixed image transformations and cannot adapt to the unknown target scene to find optimal sparse representations.

DL-Based Sparse Imaging
The main idea of DL-based ISAR sparse imaging is to utilize an adaptive dictionary to sparsely represent the unknown target scenes [15]. The adaptive dictionary can be learned off-line from the previously available ISAR data or on-line from the current data to be processed; the atoms in the adaptive dictionary are generated with prior information of the unknown target scene rather than the fixed image transformations.

Off-Line DL Based Sparse Imaging
A block processing strategy is adopted to reduce the size of the training images and improve the efficiency of DL. Extracting the patches from a training image is expressed by Equation (8), where $\sigma_t \in \mathbb{C}^n$ denotes the vectorized training image, $\sigma_{tk} \in \mathbb{C}^{n_p}$ denotes the $k$th vectorized patch extracted from $\sigma_t$, $F(\cdot)$ denotes the patch extraction operator and $k = 1, 2, \ldots, N$ is the index of the patch. Given a set of patches $\{\sigma_{tk}\}_{k=1}^{N}$, the patch-based DL can be modeled as the $\ell_1$-norm minimization problem [15] in Equation (9), where $D_p \in \mathbb{C}^{n_p \times n_p}$ is the patch-based dictionary to be learned, $w_{tk}$ is the sparse representation of $\sigma_{tk}$ over $D_p$, and $T_p$ is the required sparsity level for each patch. The K-SVD algorithm is used to optimize $D_p$ and $w_{tk}$ alternately, leading to the optimal $\hat{D}_p$. Then $\hat{D}_p$, which contains the prior information of the unknown target image, is applied to the joint optimization problem in Equation (10) for reconstructing the target image, where $\sigma_k$ is the patch to be reconstructed, $w_k$ is the sparse representation of $\sigma_k$ over $\hat{D}_p$, and $\lambda$ is the regularization parameter that balances the measurement fidelity and the sparse representation. An iterative strategy is used to minimize Equation (10). In each iteration, $\hat{w}_k$ is obtained with OMP and $\sigma_k$ is reconstructed by $\hat{\sigma}_k = \hat{D}_p \hat{w}_k$; the target image $\sigma$ is then estimated by applying the conjugate gradient algorithm to the set $\{\hat{\sigma}_k\}_{k=1}^{N}$.

On-Line DL-Based Sparse Imaging
On-line DL-based sparse imaging models the DL, sparse coding and image reconstruction as the joint optimization problem [15] in Equation (11). An alternating iteration procedure is adopted to solve Equation (11): $\hat{D}_p$ and $\hat{w}_k$ are alternately solved with K-SVD, and $\sigma$ is reconstructed by applying the conjugate gradient algorithm to the set $\{\hat{\sigma}_k\}$ during each iteration.
The dictionaries offered by both off-line DL method and on-line DL method are able to find better sparse representations of the target image as compared to the fixed image transformations [15]. However, the K-SVD used for the DL inevitably requires high computational complexity. In addition, from Equations (9)-(11), it can be noticed intuitively that each patch is actually considered independently in the process of DL and sparse coding, which neglects the important feature information between similar patches in essence, such as self-similarity information.

GDL-Based Sparse Imaging
In order to rectify the above problems of DL-based sparse imaging, we adopt the IPG instead of an independent patch as the unit for DL and sparse coding with the aim of exploiting the local sparsity of target image and the self-similarity information between patches simultaneously. Each IPG is composed of patches with similar structures and is represented by the form of a matrix. An effective SVD-based DL method is performed with respect to each IPG to obtain the corresponding dictionary.

Construction of Image Patch Group
Given a vectorized image $x$, the size of $x$ equals that of $\sigma$, i.e., $x \in \mathbb{C}^n$. To explain the construction of the IPG clearly, the vectorized form of $x$ is converted to a matrix of size $\sqrt{n} \times \sqrt{n}$, as shown in Figure 1.
For each patch $x_k \in \mathbb{C}^{\sqrt{n_p} \times \sqrt{n_p}}$, denoted by the dark blue square in Figure 1, we search within the search window (red square) for its $l$ best-matched patches, which compose the image patch set $s_{x_k}$. Here, the similarity between patches is measured using a chosen similarity criterion.
Next, all the patches in $s_{x_k}$ are stacked into a matrix of size $n_p \times l$, denoted by $x_{G_k}$, which contains every patch in $s_{x_k}$ as one of its columns, as shown in Figure 2. The matrix $x_{G_k}$, which includes all patches with similar structures, is named an IPG. For simplicity, we define the construction of the IPG as follows:
$$x_{G_k} = F_{G_k}(x) \qquad (12)$$
where $F_{G_k}(\cdot)$ denotes the operator that extracts the $k$th IPG from $x$, and its transpose, denoted by $F^{T}_{G_k}(\cdot)$, puts the $k$th IPG back into its original position in the reconstructed image, padded with zeros elsewhere.
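The grouping step described above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used in this paper: the function name `build_ipg`, the default sizes, and the squared Euclidean distance used as the similarity criterion are assumptions (Section 4.3 selects the cross-correlation for the ISAR data).

```python
import numpy as np

def build_ipg(image, top_left, patch=4, window=8, l=5):
    """Stack the l patches most similar to the reference patch as columns."""
    r0, c0 = top_left
    ref = image[r0:r0 + patch, c0:c0 + patch].ravel()
    rows, cols = image.shape
    candidates = []
    # Scan every patch position inside the search window (clipped to the image).
    for r in range(max(r0 - window, 0), min(r0 + window, rows - patch) + 1):
        for c in range(max(c0 - window, 0), min(c0 + window, cols - patch) + 1):
            vec = image[r:r + patch, c:c + patch].ravel()
            dist = np.sum(np.abs(vec - ref) ** 2)        # assumed similarity criterion
            candidates.append((dist, (r, c), vec))
    candidates.sort(key=lambda item: item[0])            # best matches first
    best = candidates[:l]
    group = np.stack([vec for _, _, vec in best], axis=1)   # shape (patch*patch, l)
    positions = [pos for _, pos, _ in best]
    return group, positions

rng = np.random.default_rng(1)
img = rng.standard_normal((32, 32))
group, positions = build_ipg(img, (10, 12))
```

The returned `positions` list records where each column came from, which is the information the transpose operator needs to put the group back into the image.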
By averaging all the IPGs, the reconstruction of the whole image $x$ from the set $\{x_{G_k}\}_{k=1}^{N}$ becomes the following:
$$x = \sum_{k=1}^{N} F^{T}_{G_k}(x_{G_k}) \;./\; \sum_{k=1}^{N} F^{T}_{G_k}(B) \qquad (13)$$
where "./" denotes the element-wise division and $B$ is a matrix of size $n_p \times l$ with all elements equal to 1. Note that in our work, each patch $x_k$ is represented as a vector, and each IPG $x_{G_k}$ is represented as a matrix, as shown in Figure 3. According to the above definition, each patch $x_k$ corresponds to an IPG $x_{G_k}$. The construction of $x_{G_k}$ also explicitly exploits the self-similarity information between patches.
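The averaging step described above amounts to accumulating every patch at its original position and dividing by the number of times each pixel was covered. The following sketch illustrates this under assumed names and sizes (`aggregate_groups`, an 8 × 8 image, 4 × 4 patches); leaving uncovered pixels at zero is also a choice made for this example.

```python
import numpy as np

def aggregate_groups(groups, positions, shape, patch):
    """Average overlapping patches: summed put-back groups ./ summed put-back ones."""
    acc = np.zeros(shape, dtype=complex)     # accumulates the put-back patch values
    cnt = np.zeros(shape)                    # accumulates the put-back all-ones matrix B
    for group, pos_list in zip(groups, positions):
        for col, (r, c) in enumerate(pos_list):
            acc[r:r + patch, c:c + patch] += group[:, col].reshape(patch, patch)
            cnt[r:r + patch, c:c + patch] += 1.0
    cnt[cnt == 0] = 1.0                      # uncovered pixels stay at zero
    return acc / cnt

rng = np.random.default_rng(2)
x = rng.standard_normal((8, 8))
patch = 4
# Three hypothetical groups of two patches each; together they cover the image.
positions = [[(0, 0), (0, 4)], [(4, 0), (4, 4)], [(2, 2), (0, 0)]]
groups = [np.stack([x[r:r + patch, c:c + patch].ravel() for r, c in pos], axis=1)
          for pos in positions]
x_rec = aggregate_groups(groups, positions, x.shape, patch)
```

Because the groups here are extracted from `x` without modification, the averaged result reproduces `x`; in the imaging method the groups are first processed, so the average blends the overlapping estimates.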

ISAR Image Patch Group Based Imaging Model
Let $I$ be an initial ISAR target image obtained by directly applying the 2D FFT to the measurements $G_s$, as expressed in Equation (14). As expected, the quality of the initial target image $I \in \mathbb{C}^n$ is very poor since the pulses cannot be coherently integrated. The purpose of GDL-based imaging is to reconstruct the high-quality target image $\sigma$ from $I$.
According to the method provided in Section 4.1, we construct the ISAR IPG set $\{I_{G_k}\}_{k=1}^{N}$ from $I$, where the size of each $I_{G_k}$ is $n_p \times l$. During the construction of each IPG, the cross-correlation is selected as the criterion to measure the similarity between patches. Reconstructing $\sigma$ from $I$ can thus be modeled as in Equation (15).

Group Dictionary Learning Based Sparse Imaging
To enforce the local sparsity and the self-similarity of the target image simultaneously in a unified framework, we suppose that $I_{G_k}$ can be sparsely represented over a group dictionary $D_{G_k}$. Here, each atom of $D_{G_k}$ is a matrix of the same size as the IPG $I_{G_k}$, i.e., $n_p \times l$, and $m$ is the number of atoms in $D_{G_k}$. Different from the dictionary in patch-based DL, $D_{G_k}$ is of size $(n_p \times l) \times m$, that is, $D_{G_k} \in \mathbb{C}^{(n_p \times l) \times m}$. How to learn $D_{G_k}$ efficiently is described in detail in the next subsection.
Similar to the sparse coding process in patch-based DL, the sparse coding of each IPG over $D_{G_k}$ seeks a sparse representation $w_{G_k} \in \mathbb{C}^m$ such that $I_{G_k} \approx D_{G_k} w_{G_k}$; we refer to $w_{G_k}$ as the GOSR. Thus, the target image reconstruction model in Equation (15) can be rewritten as Equation (16). Only the measurements $G_s$ and the IPG set $\{I_{G_k}\}$ are available, whereas we need to obtain the optimal group dictionaries $\{D_{G_k}\}_{k=1}^{N}$ and the corresponding GOSRs $\{w_{G_k}\}_{k=1}^{N}$. Similar to the joint optimization model in on-line DL-based sparse imaging described in Section 3, we recast the reconstruction model in Equation (16) as the joint optimization model in Equation (17), where $T_g$ is the sparsity level of each group. The weight $\mu$ in our formulation is a positive constant that balances the measurement fidelity and the GOSRs. The first term in Equation (17) captures the quality of the sparse approximations of $\{I_{G_k}\}$ with respect to the group dictionaries $\{D_{G_k}\}$, and the second term measures the measurement fidelity. Our formulation is thus capable of designing an adaptive group dictionary for each IPG and of using that group dictionary to reconstruct the current IPG. In addition, the model in Equation (17) can typically avoid the artifacts seen in the initial image obtained in Section 4.2. All of the above is done using only the under-sampled measurements $G_s$ and the set $\{I_{G_k}\}$.
In our work, we adopt an alternating iteration strategy to minimize the joint optimization problem in Equation (17) and solve for $\{D_{G_k}\}$, $\{w_{G_k}\}$ and $\sigma$. Each iteration includes $N$ cycles, and each cycle involves two steps: learning $D_{G_k}$, and jointly optimizing $w_{G_k}$ and $I_{G_k}$. In the first step, $D_{G_k}$ is obtained by GDL while the corresponding $I_{G_k}$ and $w_{G_k}$ are fixed. In the second step, the learned $D_{G_k}$ is fixed, and $w_{G_k}$ and $I_{G_k}$ are estimated by solving an $\ell_1$-norm minimization problem. The details of these two steps are given in the following subsections.

Group Dictionary Learning
In this subsection, we show how to learn the group dictionary $D_{G_k}$ for each IPG $I_{G_k}$. On the one hand, we hope that each $I_{G_k}$ can be represented faithfully by the corresponding $D_{G_k}$. On the other hand, we hope that the representation coefficient of $I_{G_k}$ over $D_{G_k}$ is as sparse as possible. Following the patch-based DL method presented in Section 3, the GDL can be intuitively modeled as in Equation (18). Note that the form of the group dictionary is very complex, so learning it with an iterative method such as K-SVD would be time-consuming. Therefore, we do not directly use Equation (18) to learn the group dictionary for each IPG.
To obtain the group dictionary efficiently, in this paper the SVD-based GDL method is performed directly on each $I_{G_k}$. Thus, $I_{G_k}$ can be decomposed into the sum of a series of weighted rank-one matrices as follows:
$$I_{G_k} = \sum_{i=1}^{m} z_{(G_k,i)}\, u_{(G_k,i)}\, v^{H}_{(G_k,i)} \qquad (19)$$
where $z_{G_k} = [z_{(G_k,1)}; z_{(G_k,2)}; \ldots; z_{(G_k,m)}]$ is the singular value vector with the values in the set $\Delta_{G_k}$ as its elements. The left singular vector $u_{(G_k,i)} \in \mathbb{C}^{n_p}$ and the right singular vector $v_{(G_k,i)} \in \mathbb{C}^{l}$ are the columns of the unitary matrices $U_{G_k} \in \mathbb{C}^{n_p \times n_p}$ and $V_{G_k} \in \mathbb{C}^{l \times l}$, respectively, and $(\cdot)^H$ represents the Hermitian transpose operation. Each atom in $D_{G_k}$ for $I_{G_k}$ is defined as follows:
$$d_{(G_k,i)} = u_{(G_k,i)}\, v^{H}_{(G_k,i)}, \quad i = 1, 2, \ldots, m \qquad (20)$$
where $d_{(G_k,i)} \in \mathbb{C}^{n_p \times l}$. Therefore, the final adaptively learned group dictionary for $I_{G_k}$ is defined as follows:
$$D_{G_k} = [d_{(G_k,1)}, d_{(G_k,2)}, \ldots, d_{(G_k,m)}] \qquad (21)$$
Based on the above definitions, we obtain $I_{G_k} = D_{G_k} z_{G_k}$. From Equations (19)-(21), we can see that the SVD-based GDL method guarantees that all the patches in an IPG use the same group dictionary and share the same dictionary atoms. In addition, the proposed GDL is self-adaptive to each IPG $I_{G_k}$ and is quite efficient, requiring only one SVD for each IPG.
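The SVD-based construction described above can be sketched with NumPy as follows. The function name `learn_group_dictionary` and the group size are assumptions for illustration; the atoms are stored as a stack of $n_p \times l$ matrices rather than as one large dictionary array.

```python
import numpy as np

def learn_group_dictionary(ipg):
    """Return rank-one atoms d_i = u_i v_i^H and the singular value vector z."""
    U, z, Vh = np.linalg.svd(ipg, full_matrices=False)
    # atoms[i] is the outer product of column i of U with row i of Vh (= v_i^H).
    atoms = np.einsum('pi,il->ipl', U, Vh)
    return atoms, z

rng = np.random.default_rng(3)
n_p, l = 16, 5
I_G = rng.standard_normal((n_p, l)) + 1j * rng.standard_normal((n_p, l))
atoms, z = learn_group_dictionary(I_G)
I_G_rebuilt = np.tensordot(z, atoms, axes=1)    # sum_i z_i * d_i
```

A single call to `np.linalg.svd` per group yields both the atoms and the coefficients that reproduce the group exactly, which is the source of the low complexity noted above.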

Group Sparse Representation and Target Image Reconstruction
According to the second term in Equation (17), the joint optimization problem of the GOSRs and the target image can be formulated as Equation (22). Multiplying both $G_s$ and $\Psi\sigma$ by $\Psi^T$ turns Equation (22) into Equation (23), where $\sigma$ denotes the target image to be reconstructed and $I$ is the initial image of $\sigma$ defined in Section 4.2.
Equation (23) can be rewritten in regularized form by introducing a regularization parameter $\lambda$, which gives Equation (24), where the parameter $\lambda/\mu$ controls the trade-off between the first and second terms in Equation (24).
Equation (24) can be minimized iteratively by greedy pursuit or convex optimization. In each iteration, $I$ is used to reconstruct $\sigma$, and the reconstructed $\sigma$ is regarded as the new $I$ in the next iteration. Since the form of $\sum_{k=1}^{N} \|w_{G_k}\|_1$ is complicated, exactly reconstructing $\sigma$ from $I$ in this iterative manner is very difficult.
In order to reduce the difficulty of minimizing Equation (24), we perform some experiments to investigate the statistics of the error between the initial images $I^{(t)}$ and the corresponding reconstruction results $\sigma^{(t)}$ in each iteration, where $t$ is the iteration index. Since the initial images and the exact reconstruction results are not available, we use the poor-quality images obtained by the RD method with different under-sampled measurements to approximate the initial images and the reconstructed results. Concretely, the images reconstructed with 25%, 30% and 35% measurements are regarded as the approximated initial images in the 1st, 2nd and 3rd iterations, and the images obtained with 30%, 35% and 40% measurements are regarded as the reconstruction results in the 1st, 2nd and 3rd iterations.
We use the real plane data and ship data as examples. By applying the approximation described above to the motion-compensated real plane data, we can calculate the reconstruction errors $e^{(t)} = \sigma^{(t)} - I^{(t)}$ in the first three iterations, i.e., $t = 1, 2, 3$. Then, we can draw the probability density histograms for $e^{(1)}$, $e^{(2)}$ and $e^{(3)}$, as shown in Figure 4a-c, respectively. In Figure 4a, the horizontal axis denotes the range of the pixel values in the error matrix $e^{(1)}$, and the vertical axis denotes the ratio of the number of pixels whose values fall in each range to the total number of pixels. From Figure 4a, we can observe that the probability density histogram of $e^{(1)}$ can be well characterized as a generalized Gaussian distribution (the probability density function of the generalized Gaussian distribution is given in https://sccn.ucsd.edu/wiki/Generalized_Gaussian_Probability_Density_Function, accessed on 22 November 2020) with zero mean and variance $v^{(t)}$. The variance $v^{(t)}$ is estimated by Equation (25), where $n$ is the total number of pixels. Similar to the observation in Figure 4a, the probability density histograms of $e^{(2)}$ and $e^{(3)}$ shown in Figure 4b,c can also be approximated as generalized Gaussian distributions.
We also perform the approximation operation mentioned above for the motion compensated ship data. The probability density histograms of e (1) , e (2) and e (3) of ship data are shown in Figure 4d-f, which have distributions similar to those of the plane data.
Based on the statistics of the probability density histograms of the reconstruction errors during the iterations, we make a reasonable assumption that renders the minimization of Equation (24) tractable. We suppose that the elements of $e^{(t)}$ are independently distributed with zero mean and variance $v^{(t)}$. Under this assumption, for any $\varepsilon > 0$, we obtain the conclusion in Equation (26), where $\sigma_{G_k}$ denotes the IPG to be reconstructed and $P(\cdot)$ is a probability function. The coefficient $K = n_p \times l \times N$, where $n_p$ is the size of the patch, $l$ is the number of patches in an IPG, and $N$ is the number of IPGs extracted from the initial image. The detailed proof of Equation (26) is given in Appendix A.
According to the approximation in Equation (26), Equations (27) and (28) hold with probability close to 1, where $\eta = \lambda K/(\mu n)$. Note that Equation (28) can be efficiently minimized by solving $N$ joint optimization problems, each of which is expressed as Equation (29). From the definitions of $w_{G_k}$ and $z_{G_k}$, we know that $\sigma_{G_k} = D_{G_k} w_{G_k}$ and $I_{G_k} = D_{G_k} z_{G_k}$, where $z_{G_k}$ is the singular value vector defined in Section 4.4. Owing to the construction of $D_{G_k}$ in Equation (21) and the unitary property of $U_{G_k}$ and $V_{G_k}$, we obtain the relationship in Equation (30), whose detailed proof is provided in Appendix B. Substituting Equation (30) into Equation (29), Equation (29) can then be minimized by first solving for $w_{G_k}$ through Equation (31). According to Lemma 1 in [25], the closed-form solution of $w_{G_k}$ is given by Equation (32), where $\mathrm{SOT}_F(\cdot)$ denotes the soft thresholding function and "·" denotes the element-wise product. Then, the IPG in Equation (29) can be reconstructed by Equation (33), where $\hat{D}_{G_k}$ is the group dictionary obtained in Equation (21). By alternately solving Equations (21) and (29), all IPGs are sequentially recovered, and the target image is reconstructed through Equation (13).
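The second step of each cycle can be illustrated with a short sketch that shrinks the singular values of one IPG by soft thresholding and rebuilds the group from its own atoms. The function names and the threshold value `tau` are hypothetical; the exact threshold of Equation (32), which depends on $\eta$, is not reproduced here.

```python
import numpy as np

def soft_threshold(z, tau):
    """Element-wise soft thresholding: sgn(z) * max(|z| - tau, 0)."""
    return np.sign(z) * np.maximum(np.abs(z) - tau, 0.0)

def reconstruct_group(ipg, tau):
    """Shrink the singular values of one IPG and rebuild it from its own atoms."""
    U, z, Vh = np.linalg.svd(ipg, full_matrices=False)
    w = soft_threshold(z, tau)        # group-oriented sparse representation
    sigma_G = (U * w) @ Vh            # sum_i w_i * u_i v_i^H
    return sigma_G, w, z

rng = np.random.default_rng(4)
I_G = rng.standard_normal((16, 5))
sigma_G, w, z = reconstruct_group(I_G, tau=1.0)   # tau chosen arbitrarily
```

Because the atoms are built from orthonormal singular vectors, the squared Frobenius distance between the rebuilt and the original group equals the squared Euclidean distance between `w` and `z`, which is the kind of relationship that lets the group problem be solved coefficient by coefficient.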
So far, all issues in the process from under-sampled measurements to the target image reconstruction have been solved. In light of all derivations above, a detailed flow chart of the proposed algorithm for ISAR imaging using GDL is shown in Figure 5.

Experimental Results
In this section, we use real plane data and ship data sets to demonstrate the performance of the proposed GDL based ISAR imaging method. In order to evaluate the feasibility and the chief advantages of our method faithfully, the GDL ISAR imaging method is compared with the greedy Kalman filtering (GKF) imaging method [9], ISAR image patch based online dictionary learning (ONDL) imaging method and offline dictionary learning (OFDL) imaging method [15], which deal with the ISAR data in a spatial domain and transform domain adaptive to ISAR data, respectively.

Imaging Data and Parameters
The plane data were collected by a ground-based ISAR operating at C band; the bandwidth of the transmitted waveform is 400 MHz. A de-chirp processing was used for the range compression of the plane data. The ship data were collected by a shore-based X-band radar, and the bandwidth of corresponding transmitted waveform is 80 MHz.
All data sets were motion compensated by the minimum entropy based global range alignment algorithm [26] as well as the improved phase gradient algorithm (PGA) [27]. The details of the size of the raw data (S_r_data), under-sampling ratios (S_ratios) as well as the sparsity are listed in Table 1. Note that the sparsity is estimated, using the approach in [28].
All data sets used for verifying the reconstruction performance of the proposed imaging method were obtained by randomly under-sampling the corresponding motion-compensated raw data in both the range domain and the cross-range domain. For the plane data, we consider two under-sampling ratios, 25% and 50%. For the ship data, we set the under-sampling ratio to 50%, i.e., 4608 measurements, as listed in Table 1. We set the optimal parameters in the GDL-based imaging method as listed in Table 2. The detailed settings of all the parameters are discussed in Section 5.5. All the experiments are performed in MATLAB 2015b on an assembled computer with an Intel(R) Core(TM) i7-7700 CPU @ 3.60 GHz, 8 GB of memory, and a Windows 7 operating system.

Image Quality Evaluation
To provide a quantitative evaluation of the images reconstructed with the proposed imaging method, we use two types of performance evaluation indices [29]. One is the "true-value" based indices and the other is the conventional image quality indices. The "true-value" based indices assess the accuracy of position of reconstructed scatterers. The conventional indices mainly assess the visual quality of the reconstructed images.
The "true-value" based evaluation compares the original or reference image (which represents the "true-value") with the reconstructed image. Since we do not have ground-truth images of non-cooperative targets, a high-quality image reconstructed by the conventional RD method using full data is used as the reference image in our work. Thus, the metrics evaluate the performance of the proposed GDL-based imaging method as compared to the RD method. The "true-value" based evaluation uses the following indices: False Alarm (FA) and Missed Detection (MD). FA is used for assessing the scatterers that are incorrectly reconstructed. MD is used for assessing the missed scatterers.
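One simple way to compute indices of this kind is to binarize the reference and reconstructed images and count the mismatched pixels. The sketch below is an assumption-laden illustration, not the definition from [29]: the function name `fa_md`, the relative threshold, and the pixel-counting rule are all hypothetical choices.

```python
import numpy as np

def fa_md(reference, reconstructed, rel_threshold=0.1):
    """Count false-alarm and missed-detection pixels after binarization."""
    # Assumed rule: a pixel holds a scatterer if it exceeds a fraction of the peak.
    ref_mask = np.abs(reference) > rel_threshold * np.abs(reference).max()
    rec_mask = np.abs(reconstructed) > rel_threshold * np.abs(reconstructed).max()
    false_alarm = int(np.sum(rec_mask & ~ref_mask))   # present only in the reconstruction
    missed = int(np.sum(ref_mask & ~rec_mask))        # present only in the reference
    return false_alarm, missed

ref = np.zeros((4, 4))
ref[0, 0] = ref[1, 1] = 1.0
rec = np.zeros((4, 4))
rec[0, 0] = rec[2, 2] = 1.0
fa, md = fa_md(ref, rec)
```

In this toy case the reconstruction contains one spurious scatterer and misses one true scatterer.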
The conventional image quality evaluation includes the target-to-clutter ratio (TCR), image entropy (ENT) and image contrast (IC). The TCR that we use in our work is defined in Equation (34), where $(c, r)$ represents the pixel index, $\sigma(c, r)$ denotes the reconstructed value at pixel $(c, r)$ in the reconstructed image $\sigma$, and $\Omega_\tau$ and $\Omega_c$ denote the target region and clutter region in $\sigma$, respectively. We determine $\Omega_\tau$ and $\Omega_c$ by binarizing the RD image: the pixels whose values are greater than a specified threshold are assigned to $\Omega_\tau$, and all others to $\Omega_c$.
Figure 6a,b presents the full-data imaging results of the plane data and ship data, respectively, using the RD method. Figure 7 shows the imaging results for 25% measurements of the plane data, using the GKF, ONDL, OFDL and our GDL imaging methods. Figure 8 shows the imaging results for 50% measurements of the plane data. Figure 9 shows the imaging results obtained from 50% of the ship raw data, using the different imaging methods mentioned above. Note that all imaging results are displayed with the same contour level.
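The conventional indices named in this subsection can be sketched as follows. The formulas are commonly used forms and are assumptions here: the TCR as a ratio of total target energy to total clutter energy in decibels, the entropy of the normalized power image, and the contrast as the ratio of the standard deviation to the mean of the power image. They may differ from Equation (34) and from the definitions in [29].

```python
import numpy as np

def tcr_db(image, target_mask):
    """Assumed TCR: total target energy over total clutter energy, in dB."""
    power = np.abs(image) ** 2
    return float(10 * np.log10(power[target_mask].sum() / power[~target_mask].sum()))

def image_entropy(image):
    """Entropy of the normalized power image (lower means better focused)."""
    p = np.abs(image) ** 2
    p = p / p.sum()
    p = p[p > 0]                       # skip empty pixels to avoid log(0)
    return float(-np.sum(p * np.log(p)))

def image_contrast(image):
    """Standard deviation of the power image divided by its mean."""
    power = np.abs(image) ** 2
    return float(power.std() / power.mean())

img = np.array([[10.0, 1.0], [1.0, 1.0]])
mask = np.array([[True, False], [False, False]])   # hypothetical target region
tcr = tcr_db(img, mask)
```

The boolean `mask` plays the role of the target region obtained by binarizing the RD image, and its complement plays the role of the clutter region.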

Imaging Results of Real Data
Comparing Figure 7a-d, we see that many artifacts appear in the results of the GKF, ONDL, and OFDL methods. These three imaging methods cannot provide well-reconstructed images, while our GDL method reconstructs the nose, tail, and wings of the plane well, as indicated by the red and blue circles in Figure 7. This verifies the superiority of the proposed GDL-based imaging method in target shape reconstruction. The GOSR obtained with the group dictionary can account for the self-similarity information between image patches, and therefore retains the information regarding the plane shape better than the other methods considered here. Figure 8 shows the imaging results for 50% measurements of the plane data, using the imaging methods considered here. The GDL-based imaging method again provides the best results: the fewest artifacts appear in the image reconstructed by GDL, as shown in the regions indicated by the red and blue circles in Figure 8.
Figure 9 shows the imaging results of the ship target obtained by the GKF, ONDL, OFDL, and GDL methods, using 50% measurements. The ship target is reconstructed successfully by all four methods. A further comparison of the regions indicated by red circles and blue rectangles shows that the result in Figure 9d has the fewest artifacts or interferences.
We also see that there are some errors in the results of the GDL method, for example, the poor reconstruction of the nose of the plane in Figure 7. Note that the target region is reconstructed exploiting GOSRs that are calculated with Equation (32), which reflects that the quality of the GOSRs is influenced by the singular value vector of IPG and the soft thresholding. Therefore, the reasons for the poor reconstruction of the target may be that the singular value vector of IPG or the soft thresholding are not accurate enough.

Quantitative Evaluation of Image Quality
In addition to the visual comparisons of the imaging results, we also evaluate the image quality using the metrics introduced in Section 5.2. The evaluations of the imaging results are listed in Table 3. From the second and third columns in Table 3, we see that the results of our method have the smallest FA and MD, which means that our method reconstructs the positions of the target scatterers most accurately and suppresses the artifacts and sidelobes in the background well. This is consistent with the imaging results shown in Figures 7-9.
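One way to count FA and MD is sketched below. It assumes that FA counts pixels that are reconstructed as target but lie outside a reference target support, and that MD counts reference target pixels that are missing from the reconstruction; the exact definitions are those of Section 5.2, and the function name `count_fa_md` and the toy masks are hypothetical.

```python
def count_fa_md(reference_mask, recon_mask):
    """Count false-alarm and missed-detection pixels between two boolean masks."""
    false_alarms = 0
    missed = 0
    for ref_row, rec_row in zip(reference_mask, recon_mask):
        for in_reference, in_recon in zip(ref_row, rec_row):
            if in_recon and not in_reference:
                false_alarms += 1      # reconstructed where no target is expected
            elif in_reference and not in_recon:
                missed += 1            # expected target pixel not reconstructed
    return false_alarms, missed

# Toy masks: True marks a pixel classified as target.
reference_mask = [[True, False], [True, False]]
recon_mask = [[True, True], [False, False]]
fa, md = count_fa_md(reference_mask, recon_mask)
```

Under this reading, a sparser reconstruction tends to lower FA and raise MD, so the two counts need to be considered together.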
As indicated in the fourth, fifth and sixth columns in Table 3, the TCR, ENT and IC of the GDL-based imaging method show the best values. This is also consistent with the visual comparison of Figures 7-9.
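For orientation, image entropy and image contrast can be computed as in the following sketch. It assumes the widely used definitions based on the normalized pixel power (entropy) and on the ratio of the standard deviation to the mean of the pixel power (contrast); the paper's own definitions are given in Section 5.2, and the function names are hypothetical.

```python
import math

def image_entropy(image):
    """Shannon entropy of the normalized pixel power; lower means better focus."""
    power = [abs(v) ** 2 for row in image for v in row]
    total = sum(power)
    entropy = 0.0
    for p in power:
        q = p / total
        if q > 0.0:                    # 0 * log(0) is treated as 0
            entropy -= q * math.log(q)
    return entropy

def image_contrast(image):
    """Standard deviation of the pixel power divided by its mean; higher is sharper."""
    power = [abs(v) ** 2 for row in image for v in row]
    mean = sum(power) / len(power)
    variance = sum((p - mean) ** 2 for p in power) / len(power)
    return math.sqrt(variance) / mean

uniform = [[1.0, 1.0], [1.0, 1.0]]     # energy spread evenly over all pixels
focused = [[2.0, 0.0], [0.0, 0.0]]     # same total energy in a single pixel
```

With these definitions, the focused toy image has lower entropy and higher contrast than the uniform one, which matches the direction in which ENT and IC are read in Table 3.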
The last column of Table 3 presents the computing time of each imaging method considered. It can be seen that the proposed GDL-based imaging method is the fastest one among all methods. This is due to the non-iterative processing employed in the SVD during the GDL process as compared to other methods.

Discussion on the Parameter Setting
In our experiments, all parameters of the GDL-based imaging method are shown in Table 2. P_size denotes the size of the vectorized image patch, and P_step denotes the moving step of the search window on the initial image during the IPG extraction process. h × w represents the initial size of the search window, and l represents the number of similar image patches in an IPG. µ and λ are the regularization parameters. All of these parameters are set empirically as shown in Table 2; they are kept unchanged for all three data sets, except for λ, which is adjusted for each data set.

From Equations (31) and (A1), we know that the parameter λ balances the suppression of the artifacts against the preservation of the target details. If λ is too small, the artifacts cannot be fully suppressed, whereas if λ is too large, the target details may be lost. Thus, we consider the visual results and the quantitative indices of the imaging results simultaneously to explore the optimal value of λ for each type of imaging data. Figure 10 shows the variation of FA, MD, TCR, ENT, IC and computing time with different values of λ, where the other parameters (P_size, P_step, h × w, l and µ) are kept constant for the three data sets. We see that as λ increases, the values of FA and ENT decrease, while the values of MD, TCR and IC increase. Note that all metrics except MD move toward their optimal values as λ increases. A higher MD, in fact, indicates a sparser result, which means that structural details of the target may be missing, leading to a relatively poor appearance.
The target images of the three data sets with different values of λ are shown in Figures 11-13, respectively. From Figures 11b, 12b and 13b, we can see that the three data sets have the best image quality when λ equals 0.02, 0.03 and 0.035, respectively: the target images have the smallest number of artifacts and the best target shape. Furthermore, the quantitative indices of these three results are better than those of the results obtained by the other imaging methods, as shown in Table 3. Thus, the optimal values of λ for plane data 1, plane data 2 and ship data 1 can be set as 0.02, 0.03 and 0.035, respectively.
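The roles of the patch size, the search window h × w and the group size l in the IPG extraction can be illustrated with the following sketch. It assumes that patch similarity is measured by the squared Euclidean distance to a reference patch and that the search window is centred on that patch; both choices, the function names and the toy image are hypothetical and are not taken from Table 2.

```python
def extract_patch(image, top, left, side):
    """Return the side x side patch at (top, left) as a flat vector."""
    return [image[r][c] for r in range(top, top + side)
            for c in range(left, left + side)]

def build_ipg(image, ref_top, ref_left, side, win_h, win_w, l):
    """Collect the l patches most similar to the reference patch in the window."""
    rows, cols = len(image), len(image[0])
    ref = extract_patch(image, ref_top, ref_left, side)
    # Clip the search window so that every candidate patch fits in the image.
    r_lo = max(0, ref_top - win_h // 2)
    r_hi = min(rows - side, ref_top + win_h // 2)
    c_lo = max(0, ref_left - win_w // 2)
    c_hi = min(cols - side, ref_left + win_w // 2)
    scored = []
    for r in range(r_lo, r_hi + 1):
        for c in range(c_lo, c_hi + 1):
            cand = extract_patch(image, r, c, side)
            dist = sum((a - b) ** 2 for a, b in zip(ref, cand))
            scored.append((dist, (r, c), cand))
    scored.sort(key=lambda item: item[0])   # stable sort: ties keep scan order
    best = scored[:l]
    positions = [pos for _, pos, _ in best]
    # Each column of the group matrix is one vectorized patch.
    group = [[cand[i] for _, _, cand in best] for i in range(side * side)]
    return positions, group

image = [[1, 1, 0, 0],
         [1, 1, 0, 0],
         [0, 0, 1, 1],
         [0, 0, 1, 1]]
positions, group = build_ipg(image, 0, 0, side=2, win_h=4, win_w=4, l=2)
```

In this toy image the reference patch at the top-left corner is grouped with the identical patch at the bottom-right corner, so the resulting group matrix has highly correlated columns, which is the self-similarity that a group-wise SVD can exploit.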

Conclusions
In this paper, we extended the DL-based ISAR sparse imaging method and presented the GDL-based ISAR sparse imaging method. Because the GDL avoids the time-consuming iteration process, it is highly efficient. The sparse representation extracted from the IPG contains the self-similarity information between the image patches and the local sparse prior information of the target image. The self-similarity information is very helpful in preserving the target shape or contour during the imaging process. The GDL-based ISAR sparse imaging method is better than the state-of-the-art ISAR sparse imaging methods considered in this paper in both imaging quality and computation speed.

Assume that $e_i$ is independent and satisfies a distribution with $\mathcal{N}(0, v^2)$. Then, for $\forall \varepsilon > 0$, the relationship between $\|\sigma - I\|_2^2$ and $\sum_{k=1}^{N} \|\sigma_{G_k} - I_{G_k}\|_F^2$ satisfies the following property:

Proof. Based on the assumption that each $e_i$ is independent, we know that each $e_i^2$ is also independent. Since the mean $E\{e_i\} = 0$ and the variance $V\{e_i\} = v^2$, the mean of $e_i^2$ can be expressed as follows: By invoking the convergence in probability of the law of large numbers, for $\forall \varepsilon > 0$, it leads to the following: i.e., Further, let $\sigma_G$ and $I_G$ denote the concatenations of all $\sigma_{G_k}$ and $I_{G_k}$, respectively. The error between $\sigma_G$ and $I_G$ is represented by $e_G$, where each element of $e_G$ is denoted by $e_j$, $j = 1, 2, \ldots, K$, and $K = n_p \times l \times N$. Due to the assumption that each $e_j$ is independent and satisfies a distribution with $\mathcal{N}(0, v^2)$, the same manipulation as in Equation (A3) applied to $e_j^2$ yields the following: which can be rewritten as follows: From Equation (A4), we know the following: From Equation (A6), we know $\lim_{K \to \infty} P\{\beta \in (-\frac{\varepsilon}{2}, \frac{\varepsilon}{2})\} = 1$. Therefore, when $K \to \infty$, we have $-\frac{\varepsilon}{2} - \beta > -\frac{\varepsilon}{2} - \frac{\varepsilon}{2} = -\varepsilon$ and $\frac{\varepsilon}{2} - \beta < \frac{\varepsilon}{2} - (-\frac{\varepsilon}{2}) = \varepsilon$. Thus, Equation (A7) can be scaled to the following: i.e., $\lim_{n \to \infty,\, K \to \infty}$
Proof. According to the definitions of $D_{G_k}$, $w_{G_k}$ and $z_{G_k}$, we have the following: Then,
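The law-of-large-numbers argument in the proof above can be summarized in the following sketch. It is written under stated assumptions and is not the authors' exact chain of equations: $n$ is taken to be the number of elements of the error vector $e = \sigma - I$, the normalizations $1/n$ and $1/K$ and the Frobenius norm for the groups are assumed, and the last step uses the triangle inequality.

```latex
% Sketch under assumptions: the n elements e_i of e = \sigma - I and the K group
% elements e_j are independent with distribution N(0, v^2).
\begin{align*}
E\{e_i^2\} &= V\{e_i\} + \left(E\{e_i\}\right)^2 = v^2, \\
\lim_{n \to \infty} P\left\{ \left| \tfrac{1}{n}\|\sigma - I\|_2^2 - v^2 \right|
  < \tfrac{\varepsilon}{2} \right\} &= 1, \\
\lim_{K \to \infty} P\left\{ \left| \tfrac{1}{K}\sum_{k=1}^{N}
  \|\sigma_{G_k} - I_{G_k}\|_F^2 - v^2 \right| < \tfrac{\varepsilon}{2} \right\} &= 1, \\
% When both events above hold, the triangle inequality gives:
\left| \tfrac{1}{n}\|\sigma - I\|_2^2 - \tfrac{1}{K}\sum_{k=1}^{N}
  \|\sigma_{G_k} - I_{G_k}\|_F^2 \right|
&\le \left| \tfrac{1}{n}\|\sigma - I\|_2^2 - v^2 \right|
  + \left| \tfrac{1}{K}\sum_{k=1}^{N}\|\sigma_{G_k} - I_{G_k}\|_F^2 - v^2 \right|
  < \varepsilon .
\end{align*}
```

Since each of the two events has probability tending to one, so does their intersection, which is why the two normalized error terms become interchangeable in the limit.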