CIST: An Improved ISAR Imaging Method Using Convolution Neural Network

Compressive sensing (CS) has been widely used in inverse synthetic aperture radar (ISAR) imaging, because ISAR measured data are often incomplete in the cross-range direction and CS-based imaging methods can obtain high-quality imaging results from under-sampled data. However, traditional CS-based methods require parameters and sparse transforms to be defined in advance, and these are difficult to hand-craft. In addition, these methods usually incur a heavy computational cost because they operate on large matrices. In this paper, inspired by the adaptive parameter learning and rapid reconstruction of convolutional neural networks (CNNs), a novel imaging method, called the convolution iterative shrinkage-thresholding (CIST) network, is proposed for efficient sparse ISAR imaging. CIST learns the optimal parameters and sparse transforms during CNN training instead of relying on manual definitions. Specifically, CIST replaces the linear sparse transform with non-linear convolution operations. The new transform and the essential parameters are learned end-to-end across the iterations, which increases the flexibility and robustness of CIST. Compared with traditional state-of-the-art CS imaging methods, both simulated and experimental results demonstrate that the proposed CIST-based ISAR imaging method obtains high-quality imaging results while maintaining high computational efficiency: CIST-based ISAR imaging is tens of times faster than the other methods.


Introduction
Inverse synthetic aperture radar (ISAR) can image non-cooperative targets, such as aircraft, ships, and missiles, at any time of day. ISAR has therefore been widely applied in various fields, e.g., target detection and recognition, missile defense, and space surveillance [1,2]. Generally, ISAR achieves high range resolution by transmitting a wide-bandwidth signal, and high cross-range resolution through the target's relative rotational motion. Traditional ISAR imaging methods are mainly based on the Range-Doppler (RD) algorithm [3,4], i.e., the Fourier transform or matched filtering. To achieve high cross-range resolution, they require complete raw echo data; otherwise, the imaging quality in the cross-range direction degrades. However, since ISAR targets are mainly non-cooperative moving targets, the radar is likely to lose the target during observation. Therefore, an ISAR imaging method that achieves high cross-range resolution with limited data is valuable.
Compressive sensing (CS) has been successfully used to reconstruct sparse signals from limited measurements [5], so it has been widely used in sparse ISAR imaging [6]. Many CS-based ISAR imaging methods have been proposed in recent years [7][8][9][10]. Zhang et al. introduced compressed sensing into ISAR imaging, and showed that CS-based imaging methods outperform the RD types … measured experimental results and analysis are presented. In Section 5, we discuss the influence of the convolution part in CIST. The conclusions and future work are given in Section 6.

ISAR Sparse Imaging Methods
In this section, we first introduce the typical signal model of an ISAR imaging system. Subsequently, we briefly describe how the iterative shrinkage-thresholding algorithm (ISTA) works.

ISAR Signal Model
Figure 1 presents the ISAR imaging model. The non-cooperative target moves with relative motion, which includes rotational motion and translational motion. The translational motion error is assumed to be well compensated through range alignment and phase adjustment [26,27]. $R_0$ denotes the distance from the radar to the target center $O$. Suppose that the radar transmits a linear frequency modulated pulse signal $s_T$, which can be expressed as:
where $\tau$, $T$, $A_T$, $f_c$, and $\gamma$ denote the fast time, pulse repetition period, signal amplitude, carrier frequency, and chirp rate, respectively; $\mathrm{rect}(\cdot)$ denotes the unit rectangular function, as follows:
During the coherent processing interval (CPI), the rotation angle changes by $\Delta\theta(t) = \theta(t) - \theta(0)$, so the instantaneous distance $R(t)$ from $P(x, y)$ to the radar becomes approximately $R(t) \cong R_0 + x \sin\Delta\theta(t) + y \cos\Delta\theta(t)$, since the rotation angle change $\Delta\theta(t)$ is small enough during the CPI. Additionally, $\Delta\theta(t)$ can be expanded as a Taylor series:
where $\omega$ denotes the rotation rate and $\alpha$ denotes the rotation acceleration. Subsequently, the radar signal returned from $P(x, y)$ can be presented as:
where $c$, $t$, $A_R$, and $T_a$ denote the speed of light, slow time, echoed signal amplitude, and observation duration, respectively. Additionally, $t_d = 2R(t)/c$ denotes the round-trip time delay between the radar and the target. After range compression, i.e., the Fourier transform along the range direction, the echoed signal can be expressed as:
where $A$ is the signal amplitude after range compression. ISAR targets are generally moving, so $t_d$ may introduce a Doppler shift. Subsequently, we substitute Equations (3) and (4) into Equation (6):
where $\lambda$ is the wavelength, $f = 2\omega x/\lambda$ denotes the Doppler frequency, and $\beta = 2\alpha x/\lambda$ denotes the Doppler rate. Suppose that the range cell $\tau = 2(R_0 + y)/c$ contains $N$ scatterers at different cross-range locations; the returned signal in this range cell can then be expressed as:
where we have neglected the constant phase term. Because the rotational motion is assumed to be stationary in RD-type algorithms, the rotation acceleration $\alpha$ is zero, so Equation (8) can be simplified as:
After applying the cross-range Fourier transform and ignoring the constant phase term, the final ISAR imaging result can be formulated as follows:
where $f_d$ denotes the frequency-domain variable. It can be seen from Equation (10) that the cross-range resolution improves in proportion to the CPI $T_a$.
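As a rough numerical illustration of the relations above, the following sketch evaluates the Doppler frequency $f = 2\omega x/\lambda$ and the cross-range cell size that follows from a Doppler resolution of $1/T_a$. The function names and any rotation rate, scatterer position, or CPI passed to them are hypothetical choices for illustration and are not taken from the paper.

```python
C = 299_792_458.0  # speed of light in m/s


def wavelength(fc_hz):
    """Wavelength for a carrier frequency given in Hz."""
    return C / fc_hz


def doppler_frequency(omega, x, lam):
    """Doppler frequency f = 2*omega*x/lambda of a scatterer at cross-range x.

    omega is the rotation rate in rad/s (assumed constant, i.e. alpha = 0).
    """
    return 2.0 * omega * x / lam


def cross_range_resolution(omega, t_a, lam):
    """Cross-range cell size implied by a Doppler resolution of 1/T_a.

    Setting f = 1/T_a in f = 2*omega*x/lambda and solving for x gives
    x = lambda / (2 * omega * T_a), so a longer CPI yields a finer cell.
    """
    return lam / (2.0 * omega * t_a)
```

Doubling `t_a` halves the returned cell size, which is the dependence on the CPI that the text describes.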
However, ISAR targets are usually non-cooperative, so the CPI is greatly limited, which leads to low cross-range resolution for the RD imaging algorithm. As a result, ISAR CS imaging with limited data becomes more significant and practical. When the CPI is very short and noise is taken into account, the echoed signal from a single point after range compression in Equation (9) can be rewritten as:
where $M$ is the total number of scattering centers, with $M < N$ because the observation time is shorter and some scattering centers are lost; $n(t)$ denotes independent and identically distributed complex Gaussian noise. Equation (11) can be formulated in matrix form as follows:
where $\mathbf{w} \in \mathbb{C}^{N}$, $\mathbf{n} \in \mathbb{C}^{M}$, and $\mathbf{s} \in \mathbb{C}^{M}$ denote the weight vector, Gaussian noise, and observed data, respectively. The time and frequency resolutions are defined as $\Delta t$ and $\Delta f_d$. Supposing that the pulse repetition frequency is $f_r$, then $\Delta t = 1/f_r$ and $\Delta f_d = f_r/N$. Accordingly, the matrix $\mathbf{H} \in \mathbb{C}^{M \times N}$ can be presented as follows:
where $\phi_{m,n} = \exp(-j2\pi \cdot n\Delta t \cdot m\Delta f_d)$, $0 \le n < N$, $0 \le m < M$. After the Fourier transform along the cross-range direction, i.e., cross-range compression, the ISAR imaging result is as follows:
where $\mathbf{F} \in \mathbb{C}^{N \times N}$ is the cross-range Fourier transform matrix. Equation (14) shows the linear relationship between the imaging result and the input echo data, which is crucial for constructing the CS-based ISAR imaging model.
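To make the structure of $\mathbf{H}$ concrete, the sketch below builds it entry by entry from $\phi_{m,n} = \exp(-j2\pi \cdot n\Delta t \cdot m\Delta f_d)$. Because $\Delta t \cdot \Delta f_d = 1/N$, the result is the first $M$ rows of an unnormalized $N$-point DFT matrix. The function name `build_h` and the zero-based index ranges are assumptions made for this illustration.

```python
import numpy as np


def build_h(n_total, m_obs, f_r):
    """Build H with entries exp(-j*2*pi * n*dt * m*dfd).

    n_total: N, m_obs: M (M < N), f_r: pulse repetition frequency in Hz.
    Indices are assumed to run over 0 <= m < M and 0 <= n < N.
    """
    dt = 1.0 / f_r            # slow-time sampling interval
    dfd = f_r / n_total       # Doppler frequency resolution
    n = np.arange(n_total)
    m = np.arange(m_obs)
    # outer(m*dfd, n*dt)[m, n] = m*n/N, independent of f_r
    return np.exp(-2j * np.pi * np.outer(m * dfd, n * dt))
```

The pulse repetition frequency cancels in the product $\Delta t \cdot \Delta f_d$, so `build_h` returns the same matrix for any positive `f_r`.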

ISTA Sparse Imaging
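In general, given the fine ISAR image $\mathbf{x} \in \mathbb{C}^{N}$ in the cross-range direction, the linear measurements $\mathbf{y} \in \mathbb{C}^{M}$ with $M < N$, and the measurement matrix $\boldsymbol{\Phi} \in \mathbb{C}^{M \times N}$, the CS-based ISAR imaging model can be presented as: $\mathbf{y} = \boldsymbol{\Phi}\mathbf{x} + \mathbf{n}$.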
Specifically, $\mathbf{y}$ denotes the measurements in the data domain (they can be regarded as the echoed data); the measurement matrix is constructed as $\boldsymbol{\Phi} = \mathbf{D}\mathbf{F}$, where $\mathbf{D} \in \mathbb{C}^{M \times N}$ and $\mathbf{F} \in \mathbb{C}^{N \times N}$ denote the down-sampling matrix and the Fourier transform matrix, respectively. To obtain the imaging result $\mathbf{x}$ in Equation (15), the regularized minimization under CS theory can be presented as follows:
where $\mathbf{x} \in \mathbb{C}^{N}$ denotes the ISAR scene to be imaged, $\lambda$ denotes the regularization coefficient, and $\boldsymbol{\Psi}\mathbf{x}$ denotes the transform coefficients of $\mathbf{x}$ with respect to the sparse transform $\boldsymbol{\Psi} \in \mathbb{C}^{M \times N}$. The sparsity of $\boldsymbol{\Psi}\mathbf{x}$ is constrained by the $l_1$ norm [28,29]. The sparse imaging problem presented in Equation (16) can be solved with ISTA through the following iterative steps:
Here, $k$ denotes the ISTA iteration number, $\mathbf{v}^{(k)}$ denotes the residual measurement error in iteration $k$, and $\gamma$ is the step size. The last step in Equation (17), the so-called proximal mapping [30,31], can be solved efficiently with soft-thresholding shrinkage, as follows:
where $\eta_{st}(\cdot)$ denotes the soft-thresholding shrinkage function and $\lambda$ is the shrinkage threshold. The function $\eta_{st}(\cdot)$ is defined as:
We let $\mathbf{z}^{(k)}$ denote the input of the $\eta_{st}(\cdot)$ function in Equation (18):
Hence, the second part of Equation (17) can be rewritten as:
By iterating Equations (20) and (21), the traditional ISTA can obtain a satisfactory imaging result. However, it requires extensive computation, and the parameters (e.g., the threshold $\lambda$, the step size $\gamma$, and the sparse transform $\boldsymbol{\Psi}$) need to be carefully pre-defined [32] in order to obtain satisfactory results; they are not easy to hand-craft optimally.
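The ISTA iteration described above (residual, gradient step with step size $\gamma$, then soft-thresholding) can be sketched in NumPy as follows. This is a minimal illustration in the common textbook form, not the authors' implementation: it assumes the sparse transform $\boldsymbol{\Psi}$ is the identity (the scene itself is sparse), uses an orthonormally scaled DFT so that $\gamma = 1$ is a safe step size, and treats `thresh` directly as the shrinkage threshold. The sizes and the helper `demo` are hypothetical.

```python
import numpy as np


def soft_threshold(z, thresh):
    """Complex soft-thresholding: shrink magnitudes by thresh, keep the phase."""
    mag = np.abs(z)
    scale = np.maximum(mag - thresh, 0.0) / np.maximum(mag, 1e-12)
    return z * scale


def ista(y, phi, thresh, gamma=1.0, n_iter=500):
    """Plain ISTA for y = phi @ x + n with an identity sparse transform."""
    x = np.zeros(phi.shape[1], dtype=complex)
    for _ in range(n_iter):
        v = y - phi @ x                      # residual measurement error
        z = x + gamma * (phi.conj().T @ v)   # gradient step
        x = soft_threshold(z, thresh)        # proximal (shrinkage) step
    return x


def demo(seed=0, n=128, m=64, k=4):
    """Recover a k-sparse cross-range profile from m of n Fourier samples."""
    rng = np.random.default_rng(seed)
    f = np.fft.fft(np.eye(n), norm="ortho")              # F (orthonormal scaling assumed)
    rows = np.sort(rng.choice(n, size=m, replace=False))  # D: random row selection
    phi = f[rows]                                         # Phi = D F
    x_true = np.zeros(n, dtype=complex)
    support = rng.choice(n, size=k, replace=False)
    x_true[support] = (1.0 + rng.random(k)) * np.exp(2j * np.pi * rng.random(k))
    y = phi @ x_true                                      # noise-free measurements
    return x_true, ista(y, phi, thresh=0.01)
```

Even in this small case the result depends on hand-picked values of `thresh`, `gamma`, and `n_iter`, which is the tuning burden the text attributes to traditional ISTA.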

Proposed CIST-Based Imaging Method
In the proposed CIST-based ISAR sparse imaging method, the iterations of ISTA are strictly mapped to a deep network, as shown in Figure 2. Each iteration corresponds to one phase of the ISTA operation, as illustrated in Figure 3. CIST unfolds the conventional ISTA, and the parameters in CIST are set to be learnable, which means that essential parameters (e.g., $\lambda$ and $\gamma$) can reach their optimal values automatically through training. In addition, the linear transform $\boldsymbol{\Psi}$ is replaced by a more general nonlinear transform $\mathcal{T}(\cdot)$, which contains two convolution operations with a leaky rectified linear unit (LReLU) in between, as illustrated in Figure 2. In order to increase the capacity of the proposed method, the number of convolution filters is set to $N_f$ (32 by default), and the size of each convolution kernel is set to 3 × 3. Inspired by ResNet [33], a skip connection is also applied (from the start to the end of one phase, as the red line in Figure 3 shows) in order to avoid vanishing gradients.

Network Model
To map ISTA into a convolutional network, the linear transform $\boldsymbol{\Psi}$ is replaced by a nonlinear transform $\mathcal{T}(\mathbf{x}) = \mathbf{B} \otimes \mathrm{LReLU}(\mathbf{A} \otimes \mathbf{x})$, where $\otimes$ denotes the convolution operation, and $\mathbf{A}$ and $\mathbf{B}$ denote the first and second convolutions, respectively. Subsequently, Equation (16) can be rewritten as:
$\mathcal{T}(\cdot)$ is shown in Figure 2, framed by a red dotted rectangle. By solving Equation (22) with ISTA and applying the new sparse transform $\mathcal{T}(\cdot)$ to $\mathbf{x}$, we can obtain a new form of Equation (16):
In CIST, Equations (20) and (21) are mapped into a new form. Firstly, the step size $\gamma$ is allowed to vary across iterations, so the first part of CIST is as follows:
Secondly, $\mathbf{x}^{(k+1)}$ in Equation (21) is computed with the nonlinear transform $\mathcal{T}(\cdot)$, which can be presented in matrix form as:
where $\rho$ is the coefficient of the LReLU (set to 0.01 by default), and $\mathbf{A}$ and $\mathbf{B}$ can be any matrices.
… are linearly related, i.e., the linear relationship can be expressed as follows:
where $\alpha$ is a scalar that depends only on $\mathcal{T}(\cdot)$. By applying the linear relationship in Equation (26) to Equation (23), we obtain:
where $\sigma = \lambda\alpha$. Similar to Equation (21), the solution steps of Equation (23) are as follows:
Here, the step size $\gamma^{(k)}$ and the regularization parameter $\sigma^{(k)}$ are set to be variables. They take a separate learned value in every iteration, which is more flexible than the fixed values of the traditional method.
To solve for $\mathbf{x}^{(k)}$ in Equation (28), a left inverse of $\mathcal{T}(\cdot)$ is needed, so we introduce $\tilde{\mathcal{T}}(\cdot)$, which satisfies $\tilde{\mathcal{T}} \cdot \mathcal{T} = \mathbf{E}$, such that:
Equations (24) and (29) are illustrated in Figure 2. Every step of ISTA is mapped strictly onto a phase of CIST, which guarantees the feasibility of CIST. Meanwhile, the learnable parameters and transforms increase its flexibility.
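One CIST phase, as described above, can be sketched as a gradient step followed by shrinkage in the domain of the learned transform and a mapping back through its left inverse. The sketch below is a simplified, real-valued, single-channel, one-dimensional illustration: the paper uses $N_f = 32$ filters of size 3 × 3, and the skip connection is omitted here. The assumption that $\tilde{\mathcal{T}}$ has the same two-convolution structure as $\mathcal{T}$, and all function names, are hypothetical.

```python
import numpy as np


def lrelu(x, rho=0.01):
    """Leaky ReLU with negative-side slope rho."""
    return np.where(x >= 0, x, rho * x)


def transform(x, first, second, rho=0.01):
    """Two 1-D convolutions with an LReLU in between: second * LReLU(first * x)."""
    hidden = lrelu(np.convolve(x, first, mode="same"), rho)
    return np.convolve(hidden, second, mode="same")


def soft_threshold(z, sigma):
    """Real-valued soft-thresholding with threshold sigma."""
    return np.sign(z) * np.maximum(np.abs(z) - sigma, 0.0)


def cist_phase(x, y, phi, gamma, sigma, fwd, inv, rho=0.01):
    """One phase: gradient step, shrinkage in the transform domain, inverse.

    fwd and inv are (first_kernel, second_kernel) pairs standing in for the
    learned transform and its learned left inverse.
    """
    r = x + gamma * phi.T @ (y - phi @ x)                    # gradient step
    coeffs = soft_threshold(transform(r, *fwd, rho), sigma)  # shrink T(r)
    return transform(coeffs, *inv, rho)                      # map back
```

With identity kernels `[0, 1, 0]`, `rho = 1`, and `sigma = 0`, the phase reduces to a plain gradient step, which shows how the unfolded network contains the classical iteration as a special case; in training, `gamma`, `sigma`, and the kernels would be learned per phase.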

Algorithm Flow
We cascade the structure of Figure 2 to complete the network, as illustrated in Figure 3. The number of cascaded phases $P$ is set to six, which means that every input is reconstructed by the structure in Figure 2 six times. The inputs are echo data down-sampled in the cross-range direction, which have been range compressed and well motion compensated. Note that the input measurements are in the data domain, while the imaging results are in the image domain. Because sparse sampling in the cross-range direction is the more practical case in ISAR imaging, this paper focuses on data that are incomplete in the cross-range direction only.
As for the initial reconstruction $\mathbf{x}_0$, as denoted in Figure 3, we use least squares estimation to compute the initialization. Consider the label and input pairs $\{\mathbf{x}_i, \mathbf{y}_i\}$, $i = 1, 2, \cdots, N_d$, where $N_d$ is the total number of training samples. The labels and inputs can then be presented as $\mathbf{X} = [\mathbf{x}_1, \mathbf{x}_2, \cdots, \mathbf{x}_{N_d}]$ and $\mathbf{Y} = [\mathbf{y}_1, \mathbf{y}_2, \cdots, \mathbf{y}_{N_d}]$, respectively. Subsequently, the initial reconstruction $\mathbf{x}_0$ can be determined as follows:
where $\mathbf{y}$ denotes any given input. ISAR data are generally complex-valued; however, common CNN frameworks support real numbers only. As a result, the complex data and the measurement matrix need to be separated into real and imaginary parts. According to [34], the complex multiplication $\boldsymbol{\beta} = \boldsymbol{\Phi} \times \boldsymbol{\alpha}$ can be expressed as:
where $\boldsymbol{\beta}$ and $\boldsymbol{\alpha}$ are complex-valued vectors, $i$ and $j$ are the indices in the row and column directions, $\Re(\cdot)$ denotes the real part of a complex number, and $\Im(\cdot)$ denotes the imaginary part. To process real ISAR data, we decompose the complex-valued data and the measurement matrix before importing the data into the network, and recompose the output to generate the imaging results.
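The two preprocessing steps described above can be sketched as follows. The real-valued block form of a complex matrix-vector product is standard; the least-squares initialization is written in its generic form (the linear map $\mathbf{Q}$ minimizing $\|\mathbf{Q}\mathbf{Y} - \mathbf{X}\|_F$, so that $\mathbf{x}_0 = \mathbf{Q}\mathbf{y}$), which is an assumption about the exact expression used in the paper. All function names are hypothetical.

```python
import numpy as np


def complex_to_real_matrix(phi):
    """Real block form [[Re, -Im], [Im, Re]] of a complex matrix."""
    return np.block([[phi.real, -phi.imag], [phi.imag, phi.real]])


def complex_to_real_vector(v):
    """Stack real and imaginary parts into one real vector."""
    return np.concatenate([v.real, v.imag])


def real_to_complex_vector(v):
    """Inverse of complex_to_real_vector."""
    half = v.shape[0] // 2
    return v[:half] + 1j * v[half:]


def least_squares_init(labels, inputs):
    """Linear map Q minimizing ||Q @ inputs - labels||_F.

    labels has shape (N, N_d) and inputs has shape (M, N_d); each column is
    one training sample. The initial reconstruction is then x0 = Q @ y.
    """
    # Solve inputs.T @ Q.T = labels.T in the least-squares sense.
    q_t, *_ = np.linalg.lstsq(inputs.T, labels.T, rcond=None)
    return q_t.T
```

Multiplying `complex_to_real_matrix(phi)` by `complex_to_real_vector(alpha)` and recombining the halves reproduces `phi @ alpha`, so a real-valued network can work on the stacked representation without losing phase information.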

Loss Function
Given the characteristics of the input and output data, we need the relative error at every pixel of the reconstructed result; hence, the loss function for network training is designed as follows:
where $N_d$ denotes the total number of training samples, $\mathbf{x}_i^{p}$ denotes the imaging result after $p$ phases, and $\mathbf{x}_i$ denotes the corresponding label. The first part of Equation (32) denotes the error between the reconstructed signal and the label; the second part denotes the error between $\tilde{\mathcal{T}}(\mathcal{T}(\mathbf{x}_i^{p}))$ and the label, which enforces the inverse assumption $\tilde{\mathcal{T}} \cdot \mathcal{T} = \mathbf{E}$. In addition, $\mu$ is a regularization parameter, which is set to 0.01 by default. The loss function is optimized using Adaptive Moment Estimation (Adam) [35].
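The two-part loss described above can be sketched as a reconstruction term plus a $\mu$-weighted inverse-consistency term. The squared-error form and the normalization by $N_d$ are assumptions; the paper's Equation (32) may scale the terms differently. The function name and argument layout are hypothetical.

```python
import numpy as np


def cist_loss(recon, labels, roundtrip, mu=0.01):
    """Reconstruction error plus mu times the inverse-consistency error.

    recon:     network outputs after the last phase, shape (N_d, N)
    labels:    reference images, shape (N_d, N)
    roundtrip: inverse transform applied to the transform of recon, (N_d, N)
    """
    n_d = labels.shape[0]
    fidelity = np.sum(np.abs(recon - labels) ** 2) / n_d
    inverse_penalty = np.sum(np.abs(roundtrip - labels) ** 2) / n_d
    return fidelity + mu * inverse_penalty
```

The small default weight `mu = 0.01` keeps the inverse-consistency term from dominating the reconstruction error.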

Experiments
To validate the performance of the proposed method, we train the network on a large set of simulated data and then test it with both simulated data and real measured ISAR data. In addition, the results of several conventional CS methods, namely ISTA, approximate message passing (AMP), and orthogonal matching pursuit (OMP), are presented for comparison. Several metrics are also introduced to quantitatively evaluate the performance of the CIST-based imaging method and the traditional CS-based methods.

Simulated Data
To match the size of the real measured ISAR data, the simulated scene size is set to $N_r \times N_a$, where $N_r = 1024$ denotes the range dimension and $N_a = 2048$ denotes the cross-range dimension. We generate 20 scenes with random points, i.e., the total number of training samples in the cross-range dimension is $N_d = 20 \times 1024 = 20{,}480$. The other parameters of the simulated radar signal, namely the carrier frequency, bandwidth, pulse width, and pulse repetition frequency, are set to 10 GHz, 600 MHz, 100 µs, and 200 Hz, respectively. The measurement matrix is constructed as $\boldsymbol{\Phi} = \mathbf{D}\mathbf{F}$, where $\mathbf{F} \in \mathbb{C}^{n \times n}$ is the Fourier transform matrix and $\mathbf{D} \in \mathbb{C}^{m \times n}$ is a random down-sampling matrix, e.g., $m = 512$ and $n = 2048$ for a down-sampling rate of 25%. In addition, Gaussian white noise is added to the echo data so that the signal-to-noise ratio (SNR) is 20 dB, in order to simulate a noisy environment. All of the inputs and labels are divided into real and imaginary parts before training and recomposed after reconstruction, as described in Section 3.2.
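The construction of $\boldsymbol{\Phi} = \mathbf{D}\mathbf{F}$ and the addition of noise at a prescribed SNR can be sketched as follows. The orthonormal scaling of $\mathbf{F}$, the exact rescaling of the noise realization, and the function names are assumptions of this illustration; calling `make_measurement_matrix(2048, 512, rng)` would correspond to the 25% setting in the text.

```python
import numpy as np


def make_measurement_matrix(n, m, rng):
    """Phi = D F: keep m randomly chosen rows of an n-point DFT matrix."""
    f = np.fft.fft(np.eye(n), norm="ortho")               # F (orthonormal scaling assumed)
    rows = np.sort(rng.choice(n, size=m, replace=False))  # rows kept by D
    return f[rows], rows


def add_noise(signal, snr_db, rng):
    """Add complex white Gaussian noise scaled to the requested SNR in dB."""
    noise = rng.standard_normal(signal.shape) + 1j * rng.standard_normal(signal.shape)
    p_signal = np.mean(np.abs(signal) ** 2)
    p_noise = np.mean(np.abs(noise) ** 2)
    # Rescale this noise realization so the measured SNR equals snr_db.
    scale = np.sqrt(p_signal / (10.0 ** (snr_db / 10.0)) / p_noise)
    return signal + scale * noise
```

Sorting the selected row indices keeps the retained pulses in their original slow-time order, which makes the down-sampled echo easier to inspect.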
The details of the training process are as follows. In the training of CIST, the parameters $\sigma$ and $\gamma$ are treated as trainable parameters and are initialized to 0.02 and 0.002, respectively. The iteration number is set to six and the mini-batch size is set to 64. The Adam optimizer with a learning rate of 0.0001 is used for training.
In order to quantitatively evaluate the performance of the different CS-based ISAR imaging methods, we introduce several metrics: normalized mean square error (NMSE), false alarm (FA), image entropy (ENT), and target-to-clutter ratio (TCR), where NMSE and FA take the high-resolution result in Figure 4c as the reference. Note that the evaluation results are computed after normalization. TCR is defined as follows:
where $S$ denotes the sum over the whole simulated imaging result and $S_t$ denotes the target area in it. The target area is defined as the valid area in the labeled imaging result, which can be determined by a threshold. FA is defined as:
where the function $\mathrm{Num}(\cdot)$ denotes the length of its input, $S_t$ denotes the target area in the imaging result, and $\oplus$ is the exclusive OR operation. Note that the target area $S_t$ is determined from the high-resolution ISAR imaging result (referred to as the label), as shown in Figure 4c. In addition, the computation times were measured on a platform with an Intel Core i7-7700K @ 4.20 GHz and an Nvidia 1080 Ti.
We use a model of an F35 aircraft as simulated data; the full echo data and the high-resolution ISAR image are given in Figure 4. In addition, we validate the performance of CIST with different down-sampling ratios and SNRs. In this experiment, simulated echo data with down-sampling rates of 40%, 20%, and 12.5% are considered, and the SNR of the echo data is set to 20 dB and 0 dB, respectively. Figures 5 and 6 give the imaging results of the four methods under the higher and lower SNR, respectively, and Tables 1 and 2 present the corresponding quantitative results.
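Three of the evaluation metrics named above (NMSE, ENT, and TCR) can be sketched with commonly used definitions. These are assumptions and may differ in detail from the paper's own equations: NMSE is the squared error normalized by the reference energy, ENT is the Shannon entropy of the normalized image power, and TCR is the target-to-clutter energy ratio in dB. FA is not sketched because its exact definition cannot be recovered from the text.

```python
import numpy as np


def nmse(recon, ref):
    """Squared error normalized by the energy of the reference image."""
    return np.sum(np.abs(recon - ref) ** 2) / np.sum(np.abs(ref) ** 2)


def image_entropy(img):
    """Shannon entropy of the normalized image power (lower means sharper)."""
    power = np.abs(img) ** 2
    p = power / power.sum()
    p = p[p > 0]                      # skip empty pixels to avoid log(0)
    return float(-np.sum(p * np.log(p)))


def target_to_clutter_ratio(img, target_mask):
    """Energy inside the target mask over energy outside it, in dB."""
    power = np.abs(img) ** 2
    return 10.0 * np.log10(power[target_mask].sum() / power[~target_mask].sum())
```

A well-focused image with a clean background concentrates its energy in few pixels inside the target mask, which lowers the entropy and raises the TCR; this is the direction of improvement reported for CIST in the tables.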
In Figures 5 and 6, the first column gives the echo data at the different random down-sampling rates; the second, third, fourth, and fifth columns give the ISAR imaging results of ISTA, AMP, OMP, and CIST, respectively; the first, second, and third rows correspond to random down-sampling rates of 40%, 20%, and 12.5%, respectively. As shown in Figure 5, compared with the traditional methods, the proposed CIST obtains high-quality ISAR imaging results with a cleaner background. In addition, as the down-sampling ratio decreases, the imaging results of the traditional ISTA become worse, while the results of CIST remain satisfactory. Furthermore, Table 1 gives the quantitative evaluation of these algorithms. Among the four methods, the proposed CIST-based ISAR imaging method obtains the lowest RNMSE, highest TCR, lowest entropy, and lowest FA in most cases; the exception is the down-sampling ratio of 20%, where OMP obtains slightly lower entropy than CIST. Figure 6 and Table 2 give the results under a stricter condition, in which the SNR is only 0 dB. From the imaging results, it can be seen that AMP, OMP, and CIST achieve satisfactory results, whereas the traditional ISTA shows high side lobes. However, there are many 'ghost' artifacts in the results of AMP and OMP, while CIST has the best-focused image and the cleanest background. As demonstrated in Table 2, CIST achieves the lowest RNMSE, ENT, and FA and the highest TCR in most cases, which indicates that CIST is more robust than the other three algorithms. In addition, while the other algorithms need tens or even hundreds of seconds for ISAR imaging, CIST takes less than one second to achieve satisfactory results.
Under different down-sampling ratios and SNRs, the proposed CIST-based ISAR imaging method obtains the highest-quality imaging results among the four CS-based imaging methods, which confirms its robustness and efficiency. Most importantly, CIST takes less than one second to obtain a satisfactory ISAR imaging result for a data size of 1024 × 2048, while the traditional algorithms generally need tens of seconds or even thousands of seconds.

Measured Data
In order to test the network's performance under realistic conditions, we use two groups of real measured ISAR echoes of an aircraft (Yak-42) as test data (named data I and data II). The Yak-42 data were collected by a ground-based radar operating at C-band with a bandwidth of 400 MHz. Each full data set consists of 2048 pulses in cross-range, and each pulse contains 1024 samples. Note that the echo data have been range compressed and well motion compensated. The high-resolution ISAR imaging results achieved by the RD algorithm with the full data are presented in Figure 7. The range-compressed data are fed into CIST after random down-sampling to ratios of 40%, 20%, and 12.5%.
Figure 8 gives the imaging results of the four algorithms for data I at the different down-sampling ratios. The first column shows the input echo data at the different down-sampling ratios; the other four columns present the imaging results. It can be seen that, as the sampling ratio decreases, the imaging quality of ISTA, AMP, and OMP degrades noticeably. Specifically, ISTA and AMP lose the weakly reflecting parts of the target, and OMP has the highest side lobes. On the other hand, the results of CIST maintain a relatively complete target as well as a clean background. When the sampling ratio is as low as 12.5%, the results of ISTA and AMP are almost unusable, while CIST can still achieve a satisfactory imaging result, which indicates the robustness of CIST.
Because the true image of the target, which RNMSE and FA require, is not available for measured data, we use only TCR and ENT as the quantitative criteria for the different methods. From the evaluation results of data I in Table 3, CIST achieves the highest TCR and lowest ENT at sampling ratios of 40% and 20%. In the special case where the ratio is 12.5%, the results of ISTA and AMP have the highest TCR and lowest ENT. However, their imaging results lack some parts of the target, i.e., the wings and fuselage contain only the stronger points and miss some weak points (around cross-range 1010-1040 and range 350-500), which leads to evaluation results that are only superficially the best. When these misleading results are set aside, CIST still has better evaluation results than OMP. In addition, while the conventional methods generally take more than 30 s for the imaging process, CIST takes less than one second (tens of times faster). Therefore, the measured-data experiments show the robustness and high computational efficiency of CIST.
Figure 9 gives the imaging results of data II. It can be seen that the images obtained by ISTA, AMP, and OMP become defocused as the sampling ratio decreases, while CIST maintains fine imaging quality and a clean background under all conditions, which indicates the superior performance of the proposed CIST imaging method. Furthermore, Table 4 gives the numerical evaluation of data II. It shows that CIST reaches the lowest ENT and highest TCR at every down-sampling ratio. Most importantly, CIST takes around 0.9 s for target imaging, which is much faster than the other algorithms, which take over 25 s at best and up to two minutes to complete. The better imaging quality and shorter computation time indicate the superior performance and high efficiency of the proposed CIST-based ISAR imaging method.

Effectiveness of Convolution Layer
A suitable sparse transform is one of the key issues in CS problems. The convolution layer in CIST plays an essential role as the sparse transform. Candès and Tao proved that the restricted isometry property (RIP) is a sufficient condition for perfect reconstruction [5]. For a given measurement matrix $\boldsymbol{\Phi}$ and a constant $\delta_k \in (0, 1)$, it should obey:
for all $k$-sparse signals $\mathbf{x}$. However, verifying whether the measurement matrix $\boldsymbol{\Phi}$ satisfies the RIP condition is NP-hard. Hence, the coherence $\mu(\boldsymbol{\Phi})$ is a more commonly used criterion, which is defined as follows:
where $\boldsymbol{\chi}_i$ denotes the $i$th column of $\boldsymbol{\Phi}$. Conventional CS imaging methods generally use the Fourier transform, the discrete cosine transform (DCT), the wavelet transform [36], etc. as the sparse transform. The signal should be sparse enough under the chosen transform to be reconstructed accurately [37,38].
To be specific, the sparsity $K$ of a signal that can be accurately reconstructed under $l_1$-regularization should satisfy:
where $\mu(\boldsymbol{\Phi})$ denotes the coherence of the measurement matrix $\boldsymbol{\Phi}$. Therefore, a fixed sparse transform relies on prior information and is not suitable for different types of data.
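The coherence criterion can be checked numerically. The sketch below computes the mutual coherence in its usual form, the largest normalized inner product between two distinct columns, and evaluates the classical coherence-based sparsity bound $K < \frac{1}{2}\left(1 + 1/\mu(\boldsymbol{\Phi})\right)$. Taking this textbook bound as the condition referred to above is an assumption, and the function names are hypothetical.

```python
import numpy as np


def mutual_coherence(phi):
    """Largest absolute normalized inner product between distinct columns."""
    cols = phi / np.linalg.norm(phi, axis=0, keepdims=True)
    gram = np.abs(cols.conj().T @ cols)
    np.fill_diagonal(gram, 0.0)       # ignore each column paired with itself
    return float(gram.max())


def sparsity_bound(mu):
    """Classical coherence bound: recovery is guaranteed for K < (1 + 1/mu) / 2."""
    return 0.5 * (1.0 + 1.0 / mu)
```

A lower coherence gives a larger admissible sparsity level, which is why the pairing of measurement matrix and sparse transform matters for reconstruction.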
One of the advantages of CIST is the convolution-based sparse transform, which is a crucial improvement over conventional ISTA. It is self-adaptive and learnable according to the characteristics of the data. To validate the effectiveness of the convolution layer, we compare CIST with the learned ISTA (LISTA) network [39] proposed by Gregor and LeCun, based on which we construct a simplified version of CIST, so that the only difference between them is the existence of the convolution layers. We train CIST and LISTA under the same conditions, where the step size, regularization parameter, iteration number, and learning rate are initialized as $\gamma_0 = 0.002$, $\sigma_0 = 0.02$, 6, and 0.0001, respectively. The training data are simulated echo signals at a down-sampling rate of 20%. Figure 10 gives the NMSE over the training epochs. It can be seen that CIST has the lower NMSE throughout the training process. In particular, when training has just started, CIST reaches a much lower NMSE (around one tenth smaller) than LISTA. At the end of training, the NMSE of CIST is still on average one tenth smaller than that of LISTA. In addition, CIST converges faster: its NMSE reaches the lowest point after 20 epochs, whereas LISTA needs around 30 epochs. Figure 11 shows the Yak-42 imaging results of CIST and LISTA. LISTA loses most of the target, while CIST retains fine imaging quality. As a result, we believe that the lower NMSE during training and the better imaging result of CIST demonstrate the effectiveness of the convolution-based sparse transform in CIST.

Prospect of Network-Based ISAR Sparse Imaging Methods
ISAR plays a crucial role in the detection and recognition of moving targets, but non-cooperative targets can be lost during observation. Accordingly, CS-based sparse ISAR imaging methods are valuable. Conventional CS imaging methods face two main obstacles: low computational efficiency and manually defined parameters. The heavy computational cost limits the real-time application of ISAR CS imaging to a large extent. Some essential parameters can greatly affect the imaging quality, so they need to be defined carefully, which usually takes several trials. Network-based ISAR imaging methods are highly promising for overcoming these limitations. Firstly, they generally have higher computational efficiency once they are well trained. For instance, CIST can obtain imaging results of fine quality in much less time than conventional CS imaging methods, which can meet the demand for real-time processing. Secondly, the parameters and the sparse transform are set to be learnable, which means that they can reach their optimal values through training. In contrast, to obtain a fine imaging result, we had to tune the parameters of conventional ISTA several times, and every attempt took tens of seconds. In addition, as discussed above in Section 5.1, the convolution-based sparse transform alone makes a great difference under the same conditions.
In a nutshell, network-based ISAR sparse imaging methods have higher computational efficiency and are more flexible for imaging moving targets.

Conclusions and Future Work
In this paper, we proposed a CIST-based ISAR imaging method. Because CIST combines the advantages of the convolutional neural network and traditional ISTA, it can learn the essential parameters automatically in an end-to-end manner. In addition, CIST replaces the linear sparse transform with nonlinear convolution operations, which makes it more flexible and better suited to ISAR imaging of non-cooperative targets with under-sampled or incomplete data. Furthermore, CIST takes less than one second to image an ISAR scene of size 1024 × 2048, which is dozens of times faster than the other three conventional algorithms. Experimental results based on both simulated and measured data indicate that, compared with state-of-the-art traditional CS-based methods, the proposed method obtains results of sound quality while maintaining high computational efficiency. In addition, considering that AMP is an improved version of ISTA (with faster convergence and better reconstruction) and that CIST has shown its advantages over the three conventional algorithms evaluated (ISTA, AMP, and OMP), developing a convolution-based version of AMP will be our future work.