Article

Hyper-ISTA-GHD: An Adaptive Hyperparameter Selection Framework for Highly Squinted Mode Sparse SAR Imaging

1
National Key Laboratory of Microwave Imaging Technology, Beijing 100094, China
2
Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100094, China
3
School of Electronic, Electrical and Communication Engineering, University of Chinese Academy of Sciences, Beijing 100049, China
4
Institute of Remote Sensing Satellite, China Academy of Space Technology, Beijing 100048, China
*
Author to whom correspondence should be addressed.
Remote Sens. 2026, 18(2), 369; https://doi.org/10.3390/rs18020369
Submission received: 28 October 2025 / Revised: 27 December 2025 / Accepted: 20 January 2026 / Published: 22 January 2026

Highlights

What are the main findings?
  • A novel training-free framework, Hyper-ISTA-GHD, is proposed for sparse SAR imaging, which automatically and adaptively optimizes both the regularization parameter and step size during reconstruction.
  • The integration of a high-order phase compensation into the observation operator successfully extends the applicability of the sparse imaging method to challenging highly squinted SAR configurations.
What are the implications of the main findings?
  • The proposed method achieves high-precision, rapid imaging for highly squinted SAR, significantly enhancing image quality and robustness against noise without manual parameter tuning or reliance on training datasets.
  • This framework offers a highly generalizable and efficient imaging solution applicable to diverse SAR modes and scenes, overcoming critical limitations of existing deep unfolding networks and conventional sparse methods.

Abstract

The highly squinted mode, as an operational configuration of synthetic aperture radar (SAR), fulfills specific remote sensing demands. Under equivalent conditions, it requires a higher pulse repetition frequency (PRF) than the side-looking mode yet produces inferior imaging quality, which constrains its widespread application. Applying the sparse SAR imaging method to highly squinted SAR systems can enhance imaging quality while simultaneously reducing PRF requirements and widening the swath. Hyperparameters in sparse SAR imaging critically influence reconstruction quality and computational efficiency, making hyperparameter optimization (HPO) a persistent research focus. Inspired by HPO techniques in deep unfolding networks (DUNs), we modified the iterative soft-thresholding algorithm (ISTA) employed in fast sparse SAR reconstruction based on approximate observation operators. Our adaptation enables adaptive regularization-parameter tuning during the iterations while accelerating convergence. To improve the robustness of this enhanced algorithm on realistic, noisy SAR echoes, we integrated hypergradient descent (HD) to automatically adjust the ISTA step size after the regularization parameter converges, thereby mitigating overfitting. The proposed method, named Hyper-ISTA-GHD, adaptively selects regularization parameters and step sizes, achieving high-precision, rapid imaging for highly squinted SAR. Owing to its training-free iterative minimization framework, the approach generalizes better than existing DUN methods and applies broadly across diverse SAR imaging modes and scene characteristics. Simulations show that the hyperparameters selected by the proposed method, and the resulting reconstructions, closely match the optima found by traditional methods across different signal-to-noise ratios and sampling rates, while requiring only one-tenth of their computation time. Comparative experiments on generalization performance show that the proposed method generalizes significantly better than DUNs in extremely sparse scenarios.

1. Introduction

Squint synthetic aperture radar (squint SAR) is a special SAR imaging mode in which the beam direction is not orthogonal to the platform motion direction [1,2,3]. Squint SAR is mainly used to improve the imaging capability of radar systems; it can image the area ahead of or behind the platform, enhancing the acquisition of wide images [4,5]. Unlike the range migration in side-looking SAR echoes, which is dominated by quadratic coupling terms, the squint SAR echo signal has significant linear range migration and a high-order coupling phase; squint SAR signal processing therefore requires improved imaging algorithms. Gordon W. Davidson and other researchers proposed improved chirp scaling (CS) algorithms for squint SAR imaging [6,7,8,9], while range–Doppler (R-D), ωk, and back-projection (BP) algorithms are also used in squint SAR imaging [10,11,12,13]. Highly squinted SAR refers to the situation where the impact of the echo's high-order phase error on imaging cannot be ignored. Because this depends on the radar band, the angle threshold corresponding to the highly squinted mode also differs; for C-band SAR, the threshold is about 30° [9]. Recently, researchers have made considerable progress in highly squinted SAR imaging [14,15,16,17]. However, highly squinted SAR usually requires a higher pulse repetition frequency (PRF), which results in a larger amount of echo data and a smaller imaging bandwidth. These properties limit the widespread engineering application of the highly squinted mode.
Given the sparsity of the echo signal in SAR, compressed sensing methods have been widely adopted for sparse SAR reconstruction [18,19,20]. Sparse SAR imaging can exploit the sparsity of the signal to sample below the Nyquist rate and still reconstruct without distortion. It works effectively in multiple SAR modes, including stripmap SAR, ScanSAR, spotlight SAR, and TOPS SAR [21,22,23]. Despite the great success achieved by sparse reconstruction (SR) in SAR imaging applications, it suffers from several drawbacks. As a regularization framework, the sparse SAR imaging model tries to balance fidelity to the data against prior knowledge to obtain a stable solution. This stability is ensured through a scalar parameter, called the regularization parameter or hyperparameter, and selecting this parameter is a fundamental problem within a regularization framework. In addition, in iterative methods using gradient descent, such as the iterative soft-thresholding algorithm (ISTA) [24], approximate message passing (AMP) [25], and the alternating direction method of multipliers (ADMM) [26], choosing a proper step size (learning rate) can be difficult: a step size that is too small leads to painfully slow convergence, while one that is too large can hinder convergence and cause the loss function to fluctuate around the minimum or even diverge.
In order to achieve better performance in sparse SAR imaging, research on hyperparameter optimization (HPO), such as the regularization parameter and step size mentioned above, has become a hot topic [27,28,29]. HPO is the process of systematically searching for the best combination of hyperparameters for a machine learning model to achieve optimal performance. Hyperparameters, unlike model parameters learned during training, are predefined settings that control the learning process, such as learning rate, number of layers in a neural network, or regularization strength. The goal is to find values for these hyperparameters that maximize a model’s accuracy, generalization, or other performance metrics. Common techniques include grid search, which exhaustively tests predefined hyperparameter combinations; random search, which samples hyperparameters randomly within specified ranges; and more advanced methods like Bayesian optimization, which uses probabilistic models to guide the search efficiently. Other approaches include gradient-based optimization for differentiable hyperparameters and evolutionary algorithms inspired by natural selection. Challenges include balancing computational costs with search thoroughness and avoiding overfitting during optimization. Researchers have proposed some HPO methods, but these methods face two bottlenecks when applied to sparse SAR imaging methods based on approximate observations: first, many methods are based on smooth convex optimization problems, but the sparse SAR imaging model is nonsmooth, so these methods cannot be applied; second, many methods rely on various properties of the measurement matrix, such as the Hessian matrix or the matrix eigenvalues, but after using the approximate observation operator to replace the measurement matrix in the model, the above intermediate variables cannot be obtained, and they are also not applicable. 
Therefore, HPO for the sparse SAR imaging model must rely on a method that overcomes both of these obstacles.
Batu and Cetin [27] once applied Stein's unbiased risk estimator (SURE) [30], generalized cross-validation (GCV) [31], and the L-curve [32] to regularization parameter selection in sparse SAR imaging. SURE and GCV are based on statistical considerations, and the L-curve is based on graphical tools, which means these methods have high computational complexity and increase the time cost of sparse SAR imaging. Recently, the advent of deep unfolding networks (DUNs) has opened new possibilities for solving inverse problems in SAR imaging, particularly through their hybrid architecture that synergizes physical models with data-driven learning [33,34,35,36,37,38]. Chen et al. conducted a rigorous analysis [39] of the learned ISTA (LISTA) [33], subsequently developing enhanced variants including analytic LISTA (ALISTA) [40] and Hyper-LISTA [41]. The advantage of these DUN-based LISTA architectures lies in incorporating the hyperparameters of ISTA (such as regularization parameters and step sizes) as network parameters, which are learned through training to achieve optimal values on the training dataset. Despite their theoretical advantages, the practical deployment of DUN-based LISTAs in highly squinted SAR imaging remains substantially constrained by three fundamental limitations: (1) suboptimal performance when processing real SAR echoes with noise, (2) prohibitively high computational complexity in deriving precise observation matrices under squint conditions, and (3) inherent generalization errors when applying pre-trained networks to different radar parameters or scene characteristics.
Another distinct category of HPO problems involves methodological investigations into the selection of step sizes (learning rates). Constant step sizes and constant decay, or exponential decay, are step sizes often used in practice. While this type of step size selection method can ensure the convergence of the algorithm, the convergence speed is generally not the fastest. Barzilai and Borwein proposed the Barzilai–Borwein (BB) method to adjust the step size of gradient descent algorithms [42,43]. The BB step size achieves a faster convergence speed but may fall into a suboptimal solution if the threshold is set unreasonably. Baydin proposed to use the hypergradient descent (HD) method to adaptively learn the optimal step size of gradient descent [44,45,46,47]. HD can quickly find the optimal step size of the current optimization problem without significantly increasing the computational complexity and storage capacity. Recently, HD has become a hot topic in the study of HPO, and a series of research results have been achieved around this method [48,49,50,51].
Inspired by Hyper-LISTA and HD architectures, this work proposes an enhanced ISTA variant, designated as Hyper-ISTA-GHD, specifically optimized for highly squinted SAR imaging. Experimental validation through simulated and empirical datasets demonstrates that Hyper-ISTA-GHD adaptively adjusts model regularization parameters, accelerates convergence rates, and mitigates overfitting risks, whose systematic integration enables high-resolution, rapid imaging in highly squinted SAR configurations. Furthermore, the proposed methodology exhibits transferability across multiple SAR imaging modalities, demonstrating marked advantages over classic methods and DUN frameworks in computational efficiency and generalization capability.
The contributions of this article can be summarized as follows.
  • Pioneering application of Hyper-ISTA-GHD to fast sparse SAR imaging reconstruction: This study presents the first integration of Hyper-ISTA-GHD into sparse SAR imaging methods, proposing a novel framework that incorporates an approximate observation operator. This advancement enables automatic regularization parameter tuning and significantly accelerates convergence speed.
  • First-time adoption of hypergradient descent for adaptive Hyper-ISTA-GHD step size optimization: The introduction of hypergradient descent to dynamically adjust Hyper-ISTA-GHD step sizes represents an innovation that substantially enhances algorithm performance under low-SNR conditions in practical SAR echo scenarios, addressing critical stability challenges.
  • Innovative incorporation of high-order compensation phase into sparse SAR observation operators: By embedding high-order phase compensation into the approximate observation model, this work extends the applicability of sparse SAR imaging methodologies to configurations with larger squint angles, overcoming critical limitations in existing approaches.
The rest of this article is organized as follows. Section 2 introduces the highly squinted SAR imaging principle and the sparse SAR imaging framework. Section 3 presents the two modified ISTAs we propose: Hyper-ISTA and Hyper-ISTA-GHD. In Section 4, we quantitatively analyze the performance improvement of the proposed method over existing methods and evaluate its performance on practical data. Section 6 presents the conclusions. Finally, Section 7 discusses the limitations of the proposed method and gives some possible improvements.

2. Signal Model and Related Method

2.1. Highly Squinted SAR Signal Model

According to past studies [9,14], the highly squinted SAR imaging geometry model is shown in Figure 1.
The SAR platform moves along the X-axis with velocity V_r. X′ is an auxiliary line passing through the target P(x_p, R_0) and parallel to the X-axis. Point O is the platform position at azimuth time zero, and O′ is the azimuth center of the footprint. OO′ and SQ are vectors along the antenna pointing direction, whose angle away from the X-axis is (π/2 − θ_s), where θ_s is the squint angle. R_0 = |OO′| is the closest slant range between the target P and the platform trajectory along the squint direction, while R(t_a; R_0) is the instantaneous slant range between the flight platform and the target at azimuth time t_a. t_p represents the moment when the beam center of the squint SAR illuminates target P, and x_p = V_r t_p is the azimuth distance between the azimuth footprint center O′ and the target P.
Assuming that the radar transmits a linear frequency modulated (LFM) pulse, the demodulated radar signal s ( τ , t a ; R 0 ) received from the target P ( x p ; R 0 ) is given by
$$s(\tau, t_a; R_0) = W_R\!\left[\tau - \frac{2R(t_a;R_0)}{c}\right] W_A(t_a) \exp\!\left[-j\frac{4\pi f_c}{c}R(t_a;R_0)\right] \exp\!\left[j\pi K_r\!\left(\tau - \frac{2R(t_a;R_0)}{c}\right)^{2}\right]$$
where τ is the fast time, t a is the slow time, f c is the carrier frequency, K r is the range frequency modulation rate, and c is the speed of light. W R and W A are the range and azimuth envelopes, respectively.
According to the Law of Cosines in Δ S P Q shown in Figure 1, the instantaneous slant range R ( t a , R 0 ) can be given by the following equation:
$$R(t_a; R_0) = \sqrt{R_0^2 + V_r^2 (t_a - t_p)^2 - 2 R_0 V_r (t_a - t_p)\cos\!\left(\tfrac{\pi}{2} - \theta_s\right)}.$$
Expanding Equation (2) at t a = t p to its Taylor series and manipulating it, we have
$$\begin{aligned} R(t_a; R_0) &\approx R_0 - V_r (t_a - t_p)\sin\theta_s + \frac{V_r^2 \cos^2\theta_s}{2R_0}(t_a - t_p)^2 + \frac{V_r^3 \sin\theta_s \cos^2\theta_s}{2R_0^2}(t_a - t_p)^3 + \cdots \\ &= R_0 - \frac{\lambda}{2} f_{\mathrm{dc}} (t_a - t_p) - \frac{\lambda}{4} f_{\mathrm{dr}} (t_a - t_p)^2 - \frac{\lambda}{12} \dot{f}_{\mathrm{dr}} (t_a - t_p)^3 + \cdots \end{aligned}$$
where
$$f_{\mathrm{dc}} = \frac{2 V_r}{\lambda}\sin\theta_s, \qquad f_{\mathrm{dr}} = -\frac{2 V_r^2}{\lambda R_0}\cos^2\theta_s, \qquad \dot{f}_{\mathrm{dr}} = -\frac{6 V_r^3 \sin\theta_s \cos^2\theta_s}{\lambda R_0^2}.$$
In Equation (3), λ = c/f_c is the wavelength, and (λ/2)f_dc(t_a − t_p), (λ/4)f_dr(t_a − t_p)², and (λ/12)ḟ_dr(t_a − t_p)³ are called the linear range walk (LRW), range cell curvature, and cubic range migration terms, respectively.
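As a sanity check on Equations (2)–(4), the third-order expansion can be compared numerically against the exact Law-of-Cosines range. This is a minimal sketch: the function names and the orbit-like numbers in the usage below are illustrative, and the signs of f_dr and its derivative are taken so that Equation (3) holds as an identity.

```python
import math

def slant_range_exact(u, r0, v_r, theta_s):
    # Law of Cosines, Eq. (2); u = t_a - t_p
    return math.sqrt(r0**2 + (v_r * u)**2 - 2.0 * r0 * v_r * u * math.sin(theta_s))

def slant_range_taylor(u, r0, v_r, theta_s, wavelength):
    # Third-order expansion of Eq. (3) via the Doppler parameters of Eq. (4)
    f_dc = 2.0 * v_r * math.sin(theta_s) / wavelength
    f_dr = -2.0 * v_r**2 * math.cos(theta_s)**2 / (wavelength * r0)
    f_dr_dot = -6.0 * v_r**3 * math.sin(theta_s) * math.cos(theta_s)**2 / (wavelength * r0**2)
    return (r0 - wavelength / 2.0 * f_dc * u
               - wavelength / 4.0 * f_dr * u**2
               - wavelength / 12.0 * f_dr_dot * u**3)
```

For spaceborne-like values (R_0 = 600 km, V_r = 7600 m/s, θ_s = 40°), the third-order expansion tracks the exact range to well below a centimeter over a fraction of a second, and the wavelength cancels out of the range history, as it must.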
The validity of range history approximations depends critically on the imaging geometry. For broadside mode, the slant range history follows the Pythagorean theorem, where a second-order Taylor expansion is generally sufficient for accurate focusing. In small-squint mode, the geometry shifts to the Law of Cosines. While this introduces a significant linear range walk term compared to the broadside case, the cubic and higher-order phase terms typically remain negligible. Therefore, a second-order expansion—accounting for both the linear walk and the range curvature—remains a valid approximation for imaging. However, in highly squinted mode, this assumption breaks down. The large squint angle causes the cubic and higher-order phase terms in the Taylor expansion to become substantial. Ignoring these terms leads to significant defocusing. Consequently, precise focusing for highly squinted SAR requires specific analysis and compensation for these higher-order phases, which simpler second-order models fail to capture. The extended nonlinear chirp scaling (ENLCS) algorithm proposed by An et al. [14] can compensate for these high-order terms well and achieve good imaging results under full sampling conditions. However, the high PRF required by high-squint SAR leads to a small mapping bandwidth, and the electronic scanning antenna pattern under high-squint conditions is severely deformed, resulting in high side lobes. In this paper, we use the sparse SAR imaging method to suppress the high side lobes generated by the non-ideal antenna pattern of high-squint SAR under undersampling conditions to achieve high-precision imaging.

2.2. Sparse SAR Imaging Method

According to the theory of sparse microwave imaging [18], we express the imaging process as a linear model:
y = Φ x + n
where x = vec(X) ∈ C^(N×1), X ∈ C^(N_p×N_q) is the backscattering coefficient matrix of the two-dimensional imaging scene, and N = N_p × N_q; y = vec(Y) ∈ C^(M×1), Y ∈ C^(N_a×N_r) is the two-dimensional echo matrix, and M = N_a × N_r; Φ ∈ C^(M×N) is the measurement matrix of the SAR system.
The sparse microwave imaging process involves solving a regularization problem; that is, to minimize the objective function
$$\hat{x} = \arg\min_{x}\left\{ \|y - \Phi x\|_2^2 + \sum_{i=1}^{K} \lambda_i\, p_i(x) \right\}$$
where x̂ denotes the imaging result, ‖·‖₂ denotes the ℓ₂-norm, and λ₁, λ₂, …, λ_K denote the regularization parameters controlling the penalty terms p₁(·), p₂(·), …, p_K(·), respectively.
To facilitate the derivation, the ℓ₁-norm is used as the penalty term, and Equation (6) becomes a least absolute shrinkage and selection operator (LASSO) regression
$$\hat{x} = \arg\min_{x}\left\{ \|y - \Phi x\|_2^2 + \lambda \|x\|_1 \right\}$$
which can be solved using ISTA
$$\begin{aligned} \hat{z}^{(k+1)} &= \hat{x}^{(k)} + \mu\, \Phi^{T}\big(y - \Phi \hat{x}^{(k)}\big) \\ \hat{x}^{(k+1)} &= \mathrm{sign}\big(\hat{z}^{(k+1)}\big) \odot \max\big(|\hat{z}^{(k+1)}| - \lambda\mu,\ 0\big) \end{aligned}$$
where k is the number of iterations, ẑ^(k) = vec(Z) ∈ C^(N×1) (with Z ∈ C^(N_p×N_q)) is an intermediate variable, μ is the step size of the gradient descent operation, sign(·) is the sign function, and ⊙ is the Hadamard product.
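The recursion of Equation (8) can be sketched on a toy real-valued LASSO problem. This is an illustrative implementation only (the names `ista` and `soft_threshold` are ours, and a small dense array stands in for the SAR measurement matrix):

```python
import numpy as np

def soft_threshold(z, t):
    # Soft-thresholding: shrink magnitudes by t, zeroing entries whose magnitude is below t
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def ista(y, Phi, lam, mu, n_iter=50):
    # Plain ISTA for the LASSO of Eq. (7), matching the two updates of Eq. (8)
    x = np.zeros(Phi.shape[1])
    for _ in range(n_iter):
        z = x + mu * Phi.T @ (y - Phi @ x)   # gradient step
        x = soft_threshold(z, lam * mu)      # shrinkage step
    return x
```

For convergence, μ is chosen small enough relative to the spectrum of ΦᵀΦ; with an orthonormal Φ, μ = 1 suffices and the iteration reaches its fixed point immediately.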

2.3. Approximate Observation (AO)

However, the computing power and storage requirements are huge due to the scale of the exact measurement matrix Φ, and the computational efficiency of the proposed SAR imaging model would be greatly reduced by using it. To overcome this problem, and to better correct the high-order phase and coupling phase in the highly squinted SAR echo, the ENLCS algorithm is adopted to construct the approximate measurement operator as in [21]. Let I_ENLCS(·) and G_ENLCS(·) represent the matched filter (MF; imaging) operator and inverse MF (IMF; echo-generating) operator, respectively. These operators are defined as follows:
$$I_{\mathrm{ENLCS}}(\cdot) = F_a^{-1}\Big\{ F_a\Big\{ F_a^{-1}\Big\{ F_r^{-1}\Big\{ F_a F_r\{\cdot\}\, H_{\mathrm{LRWC}}\, H_{\mathrm{BRC}} \Big\}\, H_{\mathrm{IF}} \Big\}\, H_{\mathrm{ANLCS}} \Big\}\, H_{\mathrm{AC}} \Big\}$$
$$G_{\mathrm{ENLCS}}(\cdot) = F_r^{-1} F_a^{-1}\Big\{ F_r\Big\{ F_a\Big\{ F_a^{-1}\Big\{ F_a\{\cdot\}\, H_{\mathrm{AC}}^{H} \Big\}\, H_{\mathrm{ANLCS}}^{H} \Big\}\, H_{\mathrm{IF}}^{H} \Big\}\, H_{\mathrm{BRC}}^{H}\, H_{\mathrm{LRWC}}^{H} \Big\}$$
where F a , F r , F a 1 , and F r 1 denote the fast Fourier transform (FFT) and inverse FFT (IFFT) operators along the azimuth direction and range direction, respectively, and H LRWC , H BRC , H IF , H ANLCS , and H AC are the phase terms for linear range walk correction, bulk second range compression, fourth-order azimuth filter, fourth-order chirp scaling factor, and azimuth compression, respectively, whose expressions are detailed in [14], and are not repeated here.
In this way, the exact measurement matrix Φ is approximately represented by Φ̃ ≜ G_ENLCS(·), whose conjugate transpose is Φ̃ᴴ ≜ I_ENLCS(·), which means that Φ̃ᴴΦ̃ = I. The measurement equation in Equation (5) can then be approximately written in matrix form as
$$Y \approx \tilde{\Phi} X + N = G(X) + N.$$
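The key property Φ̃ᴴΦ̃ = I can be checked numerically with a toy operator pair in which a single unit-magnitude phase screen in the 2-D frequency domain stands in for the full ENLCS cascade of Equations (9) and (10); `make_operator_pair` is an illustrative name, not part of the method.

```python
import numpy as np

def make_operator_pair(H):
    # H is a unit-magnitude phase screen in the 2-D frequency domain, a toy
    # stand-in for the ENLCS phase cascade; I_op is the exact adjoint of G_op.
    def G_op(X):   # echo-generating (IMF) operator, cf. Eq. (10)
        return np.fft.ifft2(H * np.fft.fft2(X))
    def I_op(Y):   # imaging (MF) operator, cf. Eq. (9)
        return np.fft.ifft2(np.conj(H) * np.fft.fft2(Y))
    return I_op, G_op
```

Because |H| = 1, applying I_op after G_op recovers the scene exactly (up to floating-point error), mirroring Φ̃ᴴΦ̃ = I.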
By introducing the MF and IMF operators, the sparse highly squinted SAR imaging method solved by ISTA is shown in Algorithm 1. The algorithm needs two input hyperparameters, λ and μ. Both require empirical values based on prior information, such as scene sparsity and echo signal-to-noise ratio (SNR), and must be adjusted for different imaging scenes. This limits the practical application of sparse SAR imaging methods in highly squinted SAR.
Algorithm 1 ISTA for sparse highly squinted SAR Imaging
Input: Y, I_ENLCS(·), G_ENLCS(·), λ, μ, I_max, ϵ

Initialization: X̂^(0) = 0, r^(0) = 0, k = 1

Iterations:
     While k < I_max and r^(k) > ϵ
          Z^(k+1) = X̂^(k) + μ · I_ENLCS( Y − G_ENLCS(X̂^(k)) )
          X̂^(k+1) = sign(Z^(k+1)) ⊙ max(|Z^(k+1)| − λμ, 0)
          r^(k) = ‖X̂^(k+1) − X̂^(k)‖₂² / ‖X̂^(k)‖₂²
     End

Output: X̂^(k+1)

2.4. Barzilai–Borwein (BB) Method

The Barzilai–Borwein (BB) method is a special gradient method that often performs better than general gradient methods [42]. Formally, the descent direction of the BB method is still the negative gradient direction −∇f(x_k) at the point x_k, but the step size μ_k is not directly given by a line search. Consider the format of the gradient descent method:
$$x_{k+1} = x_k - \mu_k \nabla f(x_k)$$
The μ k selected by the BB method is the solution to one of the following two optimal problems:
$$\min_{\mu}\ \big\|\mu\, g_{k-1} - s_{k-1}\big\|^2, \qquad \min_{\mu}\ \big\|g_{k-1} - \mu^{-1} s_{k-1}\big\|^2,$$
where s_{k−1} ≜ x_k − x_{k−1} and g_{k−1} ≜ ∇f(x_k) − ∇f(x_{k−1}).
It is easy to verify that the solutions are
$$\mu^{k}_{\mathrm{BB1}} \triangleq \frac{(s_{k-1})^{T} g_{k-1}}{(g_{k-1})^{T} g_{k-1}}, \qquad \mu^{k}_{\mathrm{BB2}} \triangleq \frac{(s_{k-1})^{T} s_{k-1}}{(s_{k-1})^{T} g_{k-1}}$$
Therefore, two iterative formats of the BB method can be obtained:
$$x_{k+1} = x_k - \mu^{k}_{\mathrm{BB1}} \nabla f(x_k), \qquad x_{k+1} = x_k - \mu^{k}_{\mathrm{BB2}} \nabla f(x_k).$$
For general problems, the calculated step size may be too large or too small, so it is necessary to truncate the step size to upper and lower bounds; that is, select 0 < μ_m < μ_M so that
$$\mu_m \le \mu_k \le \mu_M.$$
The selection of the upper and lower truncation bounds is very important: inappropriate bounds may cause the algorithm to oscillate, fail to converge, or converge slowly. It is therefore necessary to select a simple and stable step-size update algorithm, which we discuss in the next section.
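The two BB step sizes of Equation (14), with the truncation of Equation (16), can be sketched as follows. This is a minimal illustration (`bb_step` and the default bound values are our choices, not the paper's):

```python
import numpy as np

def bb_step(x_prev, x_curr, g_prev, g_curr, mu_min=1e-8, mu_max=1e8):
    # BB1 and BB2 step sizes of Eq. (14), truncated to [mu_min, mu_max] as in Eq. (16)
    s = x_curr - x_prev          # iterate difference s_{k-1}
    g = g_curr - g_prev          # gradient difference g_{k-1}
    bb1 = float(s @ g) / float(g @ g)
    bb2 = float(s @ s) / float(s @ g)
    return float(np.clip(bb1, mu_min, mu_max)), float(np.clip(bb2, mu_min, mu_max))
```

For a quadratic objective ½xᵀAx, both step sizes are Rayleigh-quotient-like and fall between the reciprocals of the extreme eigenvalues of A, which is why they approximate curvature without a line search.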

3. Sparse Imaging Method of Highly Squinted SAR Based on Hyper-ISTA-GHD

3.1. Hyper-ISTA-AO and Hyper-ISTA-BB

Inspired by Chen et al. [41], three principal modifications have been incorporated into Algorithm 1: (1) integration of momentum terms in gradient descent iterations, (2) implementation of support selection mechanisms within soft-thresholding operations, and (3) adaptive updating of hyperparameters. For notational compactness, the identifiers I ENLCS ( · ) and G ENLCS ( · ) are systematically abbreviated as I and G , respectively, throughout subsequent formulations.

3.1.1. Momentum Term

As is well known in the optimization community, adding a momentum term can accelerate many iterative algorithms. To avoid extra iteration overhead, we add a momentum term to the gradient descent formula in Algorithm 1 for k > 1 :
$$\hat{Z}^{(k+1)} = \hat{X}^{(k)} + \mu^{(k)}\, I\big(Y - G(\hat{X}^{(k)})\big) + \mu_m^{(k)}\big(\hat{X}^{(k)} - \hat{X}^{(k-1)}\big).$$
The momentum term effectively enhances convergence acceleration; however, excessive μ m settings may induce oscillatory instability that prevents algorithmic convergence. The principled selection strategy for μ m will be methodically presented in subsequent methodological developments.
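Equation (17) amounts to a one-line update. The sketch below uses illustrative names, with `resid_img` standing for the back-projected residual I(Y − G(X̂⁽ᵏ⁾)):

```python
import numpy as np

def momentum_step(x_curr, x_prev, resid_img, mu, mu_m):
    # Gradient step with momentum, Eq. (17); resid_img plays the role of
    # I(Y - G(X^(k))), and (x_curr - x_prev) is the momentum direction.
    return x_curr + mu * resid_img + mu_m * (x_curr - x_prev)
```

With μ_m = 0, this reduces to the plain gradient step of Algorithm 1.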

3.1.2. Soft–Hard (SH) Threshold

As common thresholding operations, the soft threshold and hard threshold have their own characteristics and uses. Both set elements whose magnitudes are below the threshold to 0; for elements whose magnitudes exceed the threshold, the soft-threshold operation shrinks the magnitude by the threshold, while the hard threshold leaves the value unchanged. As a result, soft thresholding suppresses noise better but cannot maintain the target amplitude (it decreases), and vice versa. It is therefore natural to develop a new thresholding operation that combines the advantages of both.
The SH thresholding operator is improved with support selection: at the k-th iteration, a certain percentage of the entries with the largest magnitudes are trusted as the "true support" and are not passed through thresholding. Specifically, the i-th element of η_{T^(k)}^{p^(k)}(x) is defined as
$$\Big[\eta_{T^{(k)}}^{p^{(k)}}(x)\Big]_i = \begin{cases} x_i, & x_i > T^{(k)},\ i \in S^{p^{(k)}}(x) \\ x_i - T^{(k)}, & x_i > T^{(k)},\ i \notin S^{p^{(k)}}(x) \\ 0, & \text{otherwise} \\ x_i + T^{(k)}, & x_i < -T^{(k)},\ i \notin S^{p^{(k)}}(x) \\ x_i, & x_i < -T^{(k)},\ i \in S^{p^{(k)}}(x) \end{cases}$$
where T^(k) = λ^(k)μ^(k) and S^{p^(k)}(x) contains the indices of the p^(k) largest-magnitude elements of x ∈ R^n:
$$S^{p^{(k)}}(x) = \Big\{ i_1, i_2, \ldots, i_{p^{(k)}} \ \Big|\ |x_{i_1}| \ge |x_{i_2}| \ge \cdots \ge |x_{i_{p^{(k)}}}| \ge \cdots \ge |x_{i_n}| \Big\}.$$
With p^(k) = 0, the operator reduces to soft-thresholding, and with p^(k) = n it reduces to hard-thresholding; the operator η_{T^(k)}^{p^(k)}(x) is thus a balance between the two. When k is small, p^(k) is usually chosen as a small fraction of n and the operator tends not to trust the signal support, and vice versa.
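A minimal real-valued sketch of the operator η (the name `sh_threshold` is ours; ties in the magnitude sort are broken arbitrarily, consistent with Equation (19)):

```python
import numpy as np

def sh_threshold(x, thresh, p):
    # Soft-hard thresholding with support selection, Eq. (18): the p
    # largest-magnitude entries are trusted as support and hard-thresholded;
    # all other entries are soft-thresholded.
    soft = np.sign(x) * np.maximum(np.abs(x) - thresh, 0.0)
    hard = np.where(np.abs(x) > thresh, x, 0.0)
    out = soft.copy()
    if p > 0:
        support = np.argsort(np.abs(x))[-p:]   # indices of the p largest magnitudes
        out[support] = hard[support]
    return out
```

Setting p = 0 recovers pure soft-thresholding and p = n recovers pure hard-thresholding, matching the limiting cases stated above.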

3.1.3. Adaptive HPO

Based on ref. [41], we design the following instance-adaptive parameter formulas, which account for the fact that the recovered signal X̂^(k) is not yet accurate when k is small:
$$\mu^{(k)} = 1$$
$$\lambda^{(k)} = c_1 \big\| I\big(Y - G(\hat{X}^{(k)})\big) \big\|_1$$
$$\mu_m^{(k)} = c_2 \big\| \hat{X}^{(k)} \big\|_0$$
$$p^{(k)} = c_3 \min\!\left( \log\frac{\|I(Y)\|_1}{\big\|I\big(Y - G(\hat{X}^{(k)})\big)\big\|_1},\ n \right),$$
where c 1 > 0 , c 2 > 0 , c 3 > 0 are proper parameters.
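The schedules of Equations (20)–(23) can be sketched as below, with the operator outputs I(Y) and I(Y − G(X̂⁽ᵏ⁾)) passed in precomputed. The name `hpo_schedule` and the guard against a zero residual are our own additions:

```python
import numpy as np

def hpo_schedule(resid_img, y_img, x_hat, c1, c2, c3):
    # Instance-adaptive hyperparameters of Eqs. (20)-(23).
    # resid_img : I(Y - G(X_hat)), the back-projected residual
    # y_img     : I(Y), the matched-filter image of the raw echo
    n = x_hat.size
    mu = 1.0                                           # Eq. (20): fixed in stage one
    lam = c1 * np.abs(resid_img).sum()                 # Eq. (21)
    mu_m = c2 * np.count_nonzero(x_hat)                # Eq. (22)
    ratio = np.abs(y_img).sum() / max(np.abs(resid_img).sum(), 1e-12)
    p = int(min(c3 * np.log(ratio), n))                # Eq. (23)
    return mu, lam, mu_m, max(p, 0)
```

As the iterations proceed and the residual shrinks, λ decays while p grows toward n, so the threshold loosens and more of the support is trusted, matching the behavior described above.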
We call the ISTA based on an approximate observation operator that incorporates the above three improvements Hyper-ISTA-AO. Work by our peers and ourselves has shown that this improved algorithm performs poorly when the data dimension is large and the data are noisy, because the algorithm treats the noise in the data as part of the learning target, resulting in overfitting. One idea for solving this overfitting problem, borrowed from machine learning, is to reduce the step size quickly once the algorithm reaches initial convergence. The BB method given in the previous section can be used to update the step size; we call this combination Hyper-ISTA(-AO)-BB. However, the BB method is very prone to oscillation, which motivates the search for a better step-size selection method.

3.2. Generalized Hypergradient Descent (GHD)

For the hyperparameter μ^(k) in Equation (20), we introduce the hypergradient descent (HD) method proposed by Baydin [45] for adaptive optimization. The most basic form of HD can be derived from regular gradient descent as follows. Given an objective function f and the previous iterate θ^(k−1), gradient descent evaluates the gradient ∇f(θ^(k−1)) and moves against it to arrive at the updated parameters
$$\theta^{(k)} = \theta^{(k-1)} - \mu \nabla f\big(\theta^{(k-1)}\big)$$
where μ is the step size (learning rate). In addition to this update rule, we would like to derive an update rule for the step size μ itself. We assume that the optimal value of μ does not change much between two consecutive iterations, so that the update rule for the previous step can be used to optimize μ in the current one. To this end, we compute ∂f(θ^(k−1))/∂μ, the partial derivative of the objective f at the previous step with respect to the step size μ. Noting that θ^(k−1) = θ^(k−2) − μ∇f(θ^(k−2)), i.e., the result of the previous update step, and applying the chain rule, we obtain
$$\frac{\partial f\big(\theta^{(k-1)}\big)}{\partial \mu} = \nabla f\big(\theta^{(k-1)}\big)^{T} \cdot \frac{\partial\big(\theta^{(k-2)} - \mu \nabla f(\theta^{(k-2)})\big)}{\partial \mu} = -\nabla f\big(\theta^{(k-1)}\big)^{T} \cdot \nabla f\big(\theta^{(k-2)}\big)$$
which allows us to compute the needed hypergradient with a simple dot product and the memory cost of only one extra copy of the original gradient. Using this hypergradient, we construct a higher-level update rule for the step size as
$$\mu^{(k)} = \mu^{(k-1)} - \beta \frac{\partial f\big(\theta^{(k-1)}\big)}{\partial \mu} = \mu^{(k-1)} + \beta\, \nabla f\big(\theta^{(k-1)}\big)^{T} \nabla f\big(\theta^{(k-2)}\big)$$
introducing β as the hypergradient step size; we call this the additive rule of HD. For this gradient descent, it is usually better to set
$$\beta = \frac{\beta_\mu\, \mu^{(k-1)}}{\big\|\nabla f(\theta^{(k-1)})\big\|\, \big\|\nabla f(\theta^{(k-2)})\big\|}$$
with β_μ a normalized hypergradient step size,
so the rule of HD can also be
$$\mu^{(k)} = \mu^{(k-1)}\left(1 + \beta_\mu \frac{\nabla f\big(\theta^{(k-1)}\big)^{T} \nabla f\big(\theta^{(k-2)}\big)}{\big\|\nabla f(\theta^{(k-1)})\big\|\, \big\|\nabla f(\theta^{(k-2)})\big\|}\right).$$
We call this rule the multiplicative rule of HD. One of the practical advantages of this multiplicative rule is that the multiplicative adaptation is, in general, faster than the additive adaptation. We then modify Equation (24) to use the sequence μ ( k ) to become
$$\theta^{(k)} = \theta^{(k-1)} - \mu^{(k)} \nabla f\big(\theta^{(k-1)}\big)$$
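On a smooth toy objective, the multiplicative rule of Equation (28) followed by the update of Equation (29) looks as follows. The name `hd_multiplicative` and the small-denominator guard are our own; the step size grows when consecutive gradients align and shrinks when they oppose:

```python
import numpy as np

def hd_multiplicative(grad_f, theta0, mu0=0.01, beta=0.02, n_iter=100):
    # Gradient descent whose step size follows the multiplicative HD rule,
    # Eq. (28), before each parameter update of Eq. (29).
    theta = np.asarray(theta0, dtype=float)
    mu, g_prev = mu0, None
    for _ in range(n_iter):
        g = grad_f(theta)
        if g_prev is not None:
            denom = max(np.linalg.norm(g) * np.linalg.norm(g_prev), 1e-12)
            mu *= 1.0 + beta * float(g @ g_prev) / denom   # Eq. (28)
        theta = theta - mu * g                             # Eq. (29)
        g_prev = g
    return theta, mu
```

On f(θ) = ½‖θ‖², the rule steadily inflates the step size while the gradients stay aligned and deflates it after an overshoot, so the iteration self-stabilizes without manual tuning.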
The HD method exhibits very good performance on smooth convex optimization problems. However, the sparse SAR imaging problem based on the ℓ₁-norm is a nonsmooth convex optimization problem, and the HD method cannot be applied to it directly. Therefore, a slight modification of HD is needed to make it applicable to our problem.
First, define the generalized gradient as [52]
$$G^{(k)} = \theta^{(k)} - \theta^{(k-1)}.$$
Then, substituting the generalized gradient into the multiplicative rule, we obtain
$$\mu^{(k)} = \mu^{(k-1)}\left(1 + \beta_\mu \frac{\big(G^{(k-1)}\big)^{T} G^{(k-2)}}{\big\|G^{(k-1)}\big\|\, \big\|G^{(k-2)}\big\|}\right).$$
We call this deformation the generalized HD (GHD). GHD is applicable to nonsmooth optimization problems, so it can be used to update the step size of Hyper-ISTA-AO above. We call this combination Hyper-ISTA(-AO)-GHD.
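Combining ISTA with the GHD rule of Equations (30) and (31) can be sketched on a toy problem with an identity measurement matrix (illustrative names throughout; the real method uses the ENLCS operator pair in place of Φ):

```python
import numpy as np

def soft(z, t):
    # Soft-thresholding operator
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def ista_ghd(y, Phi, lam, mu0=0.5, beta=0.01, n_iter=60):
    # ISTA whose step size follows the GHD rule of Eq. (31); the generalized
    # gradient of Eq. (30) is the difference of successive iterates, so the
    # rule applies even though the objective is nonsmooth.
    x = np.zeros(Phi.shape[1])
    mu, G_prev, G_prev2 = mu0, None, None
    for _ in range(n_iter):
        if G_prev is not None and G_prev2 is not None:
            denom = max(np.linalg.norm(G_prev) * np.linalg.norm(G_prev2), 1e-12)
            mu *= 1.0 + beta * float(G_prev @ G_prev2) / denom   # Eq. (31)
        x_new = soft(x + mu * Phi.T @ (y - Phi @ x), lam * mu)
        G_prev2, G_prev = G_prev, x_new - x                      # Eq. (30)
        x = x_new
    return x, mu
```

With an orthonormal Φ, the ISTA fixed point is the soft-thresholded matched-filter image regardless of the step size, so the GHD adaptation changes the convergence path (accelerating it while steps stay aligned) but not the solution.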

3.3. Overall Framework of Hyper-ISTA-GHD

The iterative process of Hyper-ISTA-GHD can be divided into two stages. In the early stage, to adjust the regularization parameter and make the reconstruction converge quickly, a momentum term is introduced into the gradient descent step; the momentum step size and regularization parameter are adaptively updated while the step size is kept unchanged. Once the regularization parameter converges, its update is stopped and the momentum term is removed; at the same time, hypergradient descent is introduced to start adjusting the step size, which avoids overfitting in noisy scenes. The threshold operation based on support selection runs through both stages. See Algorithm 2 for the specific steps and Figure 2 for the flowchart.
Algorithm 2 Hyper-ISTA-GHD
Input: Y, I, G, c₁, c₂, c₃, β_μ, I_max, ϵ

Initialization: X̂^(0) = 0, μ^(0) = 1, r^(0) = 0, k = 1

Iterations:
     While k < I_max and r^(k) > ϵ
         Update λ^(k) via Equation (21) (until λ converges)
         Update μ_m^(k) via Equation (22) (until λ converges)
         Update p^(k) via Equation (23)
         Update μ^(k) via Equation (31) (after λ converges)
         Update Z^(k+1) via Equation (17)
         X̂^(k+1) = η_{T^(k)}^{p^(k)}( Z^(k+1) )
         r^(k) = ‖X̂^(k+1) − X̂^(k)‖₂² / ‖X̂^(k)‖₂²
         If (λ has converged)
              μ_m^(k) = 0
         End
     End

Output: X ^ ( k + 1 )
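The two-stage control flow of Algorithm 2 can be sketched on a toy problem. The code below is a minimal, runnable Python illustration, not the authors' implementation: the equation-specific updates (21)–(23), the momentum term, and the support-selection threshold are replaced by plain soft thresholding; only the stage switch and a multiplicative GHD step-size adaptation are shown, and all names, constants, and the step-size cap are our own assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sparse problem: y = A x + noise (a stand-in for the SAR operator G)
n, m, k = 64, 128, 5
A = rng.standard_normal((n, m)) / np.sqrt(n)
x_true = np.zeros(m)
x_true[rng.choice(m, k, replace=False)] = rng.standard_normal(k)
y = A @ x_true + 0.01 * rng.standard_normal(n)

def soft(z, t):                      # soft-threshold operator
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

L = np.linalg.norm(A, 2) ** 2        # Lipschitz constant of the gradient
mu, lam, beta = 1.0 / L, 0.05, 0.05  # step size, reg. parameter, HD rate
stage2_from = 10                     # iteration at which lambda is "frozen"
x = np.zeros(m)
d_prev = None

for it in range(100):
    grad = A.T @ (A @ x - y)
    x_new = soft(x - mu * grad, mu * lam)
    d = x_new - x                    # generalized gradient (up to scaling)
    # Stage 2: multiplicative GHD adaptation of the step size
    if it >= stage2_from and d_prev is not None:
        den = np.linalg.norm(d) * np.linalg.norm(d_prev)
        if den > 0:
            mu = min(mu * (1 + beta * (d @ d_prev) / den), 1.5 / L)  # cap: our safeguard
    d_prev = d
    x = x_new
    if np.linalg.norm(d) < 1e-8:     # stop on stagnation
        break

nmse = np.sum((x - x_true) ** 2) / np.sum(x_true ** 2)
```

The stability cap on `mu` is our addition; in the paper the step size is governed entirely by the GHD rule of Equation (31).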

3.4. Complexity and Convergence

Hyper-ISTA-GHD introduces almost no additional matrix operations, so its computational complexity is $O(N_a \times N_r)$, the same as ISTA. In contrast, methods such as SURE, GCV, and the L-curve require $N_\lambda$ samplings of the regularization parameter, each of which entails a complete ISTA run, so their complexity is $O((N_\lambda + 1) \times N_a \times N_r)$. Since these three methods achieve approximately equal accuracy, we select only the L-curve, which has slightly lower complexity, for the comparative experiments.
Recent research shows that HD can achieve local superlinear convergence, which depends on smoothness and strong convexity [53]. GHD, which applies to nonsmooth problems, has only recently been proposed, and its convergence properties and conditions remain to be rigorously established. Our work focuses only on the empirical convergence behavior of this method in practice.

4. Simulation and Experiment

4.1. Experiment Settings

4.1.1. Simulation Scene

To demonstrate the effectiveness of the regularization parameter and step size selection method we proposed, simulated spaceborne highly squinted SAR echoes were constructed according to the parameters in Table 1 for comparative experiments.
To ensure reproducible performance comparisons across algorithms, all simulations were executed on a standardized platform comprising an Intel Core i9-12900HX processor with 32 GB system memory. Algorithm implementations utilized MATLAB 2024a’s native computational framework, deliberately excluding third-party acceleration toolboxes to maintain baseline performance metrics.

4.1.2. Evaluation Indicators

In terms of highly squinted sparse SAR imaging, impulse response width (IRW), peak sidelobe ratio (PSLR), and integrated sidelobe ratio (ISLR) are used to measure the imaging accuracy of different algorithms.
IRW is the 3 dB width of the main lobe in the azimuth and/or range direction. It quantifies spatial resolution, with a narrower IRW indicating better resolution.
PSLR is the ratio (in dB) between the highest sidelobe amplitude and the main lobe peak amplitude. Lower PSLR reduces false target risks:
$$\mathrm{PSLR} = 20 \log_{10} \left( \frac{\text{Peak Sidelobe Level}}{\text{Main Lobe Peak}} \right) \quad (\mathrm{dB})$$
ISLR is the ratio (in dB) of the total power in the sidelobes to the power in the main lobe. Lower ISLR indicates less energy leakage:
$$\mathrm{ISLR} = 10 \log_{10} \left( \frac{\int_{\text{sidelobes}} |g(x)|^{2} \, dx}{\int_{\text{main lobe}} |g(x)|^{2} \, dx} \right) \quad (\mathrm{dB})$$
where g ( x ) is the impulse response function.
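The two sidelobe metrics can be computed directly from a sampled impulse response. The helper below is our own illustrative code, not the paper's: the main lobe is delimited by a simple walk to the first nulls on either side of the peak, and the example reproduces the familiar ≈ −13.26 dB PSLR of an ideal sinc response.

```python
import numpy as np

def sidelobe_metrics(g):
    """PSLR and ISLR (dB) of a 1-D impulse response sampled in `g`.

    The main lobe is taken between the first nulls on either side of the
    peak of |g|^2; everything outside counts as sidelobes.
    """
    p = np.abs(g) ** 2
    i0 = int(np.argmax(p))
    left = i0
    while left > 0 and p[left - 1] < p[left]:       # walk down to the left null
        left -= 1
    right = i0
    while right < len(p) - 1 and p[right + 1] < p[right]:  # right null
        right += 1
    main = p[left:right + 1]
    side = np.concatenate([p[:left], p[right + 1:]])
    pslr = 10 * np.log10(side.max() / p[i0])        # amplitude ratio in dB
    islr = 10 * np.log10(side.sum() / main.sum())   # energy ratio in dB
    return pslr, islr

# Example: an ideal sinc response
x = np.linspace(-8, 8, 4001)
pslr, islr = sidelobe_metrics(np.sinc(x))
```

Note that `10 * log10` of the power ratio equals `20 * log10` of the amplitude ratio used in the PSLR definition above.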
In terms of HPO, the evaluation framework comprises two principal aspects: (i) systematic examination of regularization parameters’ effects on reconstruction fidelity through quantitative metrics, including peak signal-to-noise ratio (PSNR) and normalized mean squared error (NMSE), and (ii) comprehensive assessment of convergence dynamics via NMSE iteration trajectories paired with computational time measurements to evaluate step size configurations.
PSNR serves as a standard benchmark in digital imaging applications, primarily quantifying the fidelity of reconstructed visual data relative to reference sources. This metric has become an essential criterion for assessing compression algorithm efficacy, with established utilization spanning multimedia signal processing, computer vision systems, and image restoration frameworks.
The definition of PSNR is based on the mean squared error (MSE). The formulas are as follows:
To calculate MSE, we use:
$$\mathrm{MSE} = \frac{1}{m \cdot n} \sum_{i=1}^{m} \sum_{j=1}^{n} \left[ I(i,j) - K(i,j) \right]^{2}$$
where I ( i , j ) is the pixel value of the original image, K ( i , j ) is the pixel value of the reconstructed or compressed image, and m and n are the height and width of the image, respectively.
To calculate PSNR, the formula is:
$$\mathrm{PSNR} = 10 \log_{10} \left( \frac{\max(|x_{\mathrm{ref}}|)^{2}}{\mathrm{MSE}} \right)$$
where max ( | x ref | ) denotes the peak magnitude value of the reference (ground truth) SAR image, and MSE represents the mean squared error between the reconstructed image and the reference.
NMSE constitutes a fundamental benchmarking framework for evaluating signal estimation systems, mathematically characterizing the discrepancy between source signals and their reconstructed counterparts. This measure has been widely adopted in inverse problem analysis as a standardized methodology for quantifying estimation precision in computational reconstruction paradigms.
The formula for NMSE is given by
$$\mathrm{NMSE} = \frac{\mathrm{MSE}}{\mathrm{Var}(I)}$$
where
$$\mathrm{Var}(I) = \frac{1}{m \cdot n} \sum_{i=1}^{m} \sum_{j=1}^{n} \left[ I(i,j) - \mu_{I} \right]^{2}$$
and $\mu_{I}$ denotes the mean pixel value of $I$.
NMSE normalizes the reconstruction error by the variance of the source signal, yielding a dimensionless metric that decreases monotonically with reconstruction accuracy and, for meaningful reconstructions, typically lies within [0, 1]. Such normalization facilitates cross-dataset interpretability in inverse problem solutions while maintaining scale-invariant performance benchmarking across heterogeneous signal recovery paradigms.
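The PSNR and NMSE definitions above translate directly into code. The helper below is our illustrative version (the function name is ours), with the peak taken from the reference image as in the PSNR formula:

```python
import numpy as np

def psnr_nmse(ref, rec):
    """PSNR (dB) and NMSE of a reconstruction against a reference image,
    following the definitions above (peak taken from the reference)."""
    ref = np.abs(np.asarray(ref, dtype=float))
    rec = np.abs(np.asarray(rec, dtype=float))
    mse = np.mean((ref - rec) ** 2)          # per-pixel mean squared error
    psnr = 10 * np.log10(ref.max() ** 2 / mse)
    nmse = mse / np.var(ref)                 # normalization by reference variance
    return psnr, nmse
```

For complex SAR images the magnitudes are compared, matching the use of $|x_{\mathrm{ref}}|$ in the PSNR definition.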
We clarify the statistical nature of the metrics as follows.
  • Average Performance: The values for PSNR, NMSE, and time represent the average performance calculated over the entire test dataset (10 held-out images). This aggregation ensures that the reported metrics reflect the generalizability of the method rather than the result of a specific best-case scenario.
  • Runtime Definition: The “Time (s)” column indicates the average wall-clock inference time per image.
    For deep learning methods (including the proposed framework), this refers to the forward pass duration for a single image on the GPU, excluding data loading overhead.
    For traditional algorithms (implemented in MATLAB), this refers to the execution time measured via standard timing functions (tic/toc) for reconstructing a single image on the CPU.

4.2. Simulation Under Different Squint Angles

The parameters in Table 1 are used to construct the highly squinted SAR point target echo, and the ENLCS algorithm and the sparse reconstruction method based on the ENLCS operator (SR-ENLCS) are used to image it. The imaging results are shown in Figure 3, and the indicators are shown in Table 2.
It can be seen that as the squint angle increases, the azimuth resolution of both algorithms gradually deteriorates. ENLCS keeps the various indicators within a low range for squint angles of 20–50°. On this basis, the proposed method further improves the resolution and significantly reduces PSLR and ISLR. The simulation results verify the effectiveness of the high-squint sparse SAR imaging method based on high-order phase compensation for imaging point targets under the given parameter conditions.

4.3. Comparison Simulation of HPO Methods

The scene is set with $3 \times 3$ ideal point targets with $\theta_s = 50^{\circ}$. To observe the impact of $\lambda$ values on reconstruction accuracy, especially the accuracy of weak targets, the energy of these point targets is set to decrease from the upper right to the lower left of the scene, with a 3 dB difference in energy between any two adjacent points. The simulation scene target setup is shown in Figure 4.
In order to compare the performance of the proposed method with other HPO methods, seven comparative experiments were set up under each set of parameters, namely, matched filtering (MF), sparse SAR imaging with λ too small, sparse SAR imaging with λ too large, L-curve [32], Hyper-ISTA-AO, Hyper-ISTA-BB, and Hyper-ISTA-GHD.

4.3.1. Variable SNR

The simulation results for different SNRs are shown in Figure 5, and the quantitative metrics are presented in Table 3.
Observing the experimental results, it is evident that the matched filtering method achieves the shortest computation time but lacks noise suppression capability. Inappropriate manual regularization parameter settings lead to either insufficient noise suppression or over-suppression of weak targets. While the L-curve method can select optimal regularization parameters, its computational time remains excessively long. Hyper-ISTA adaptively adjusts the regularization parameter during the iterations, approaching near-optimal values. Benefiting from support selection, it outperforms ISTA in preserving strong targets, yet struggles to converge to better PSNR and NMSE under noisy conditions. Hyper-ISTA-GHD effectively adjusts step sizes after regularization parameter convergence, preventing metric rebound in noisy environments. It achieves accurate reconstruction of targets with intensity differences exceeding −10 dB while attaining optimal metrics. Notably, Hyper-ISTA-GHD accomplishes this without significant increases in memory consumption, computational complexity, or runtime: its computation time only slightly exceeds that of classical ISTA while remaining substantially lower than that of the L-curve method.

4.3.2. Variable PRF

The simulation results for different PRFs are shown in Figure 6, and the quantitative metrics are presented in Table 4.
Unlike the previous section, PRF reduction decreases the target peak energy under constant noise power. This occurs because sub-Nyquist PRF causes azimuth ambiguity, dispersing target energy from the mainlobe to the ambiguous regions. As PRF decreases, although the L-curve method identifies optimal regularization parameters, the amplitude reduction from soft-threshold iterations prevents target-noise separation. In contrast, Hyper-ISTA-GHD maintains strong scatterers’ energy through support selection while suppressing noise via soft-threshold iterations. Even when PRF reduction-induced azimuth ambiguity severely degrades target amplitude, Hyper-ISTA-GHD effectively distinguishes targets from noise. Simulation results confirm Hyper-ISTA-GHD’s robustness under undersampling conditions, demonstrating its advantage in enhancing mapping width through PRF reduction—a crucial requirement for highly squinted SAR imaging.

4.4. Analysis

To investigate the performance differences between Hyper-ISTA and Hyper-ISTA-GHD, Figure 7 shows NMSE convergence curves for ISTA, Hyper-ISTA, and Hyper-ISTA-GHD under identical simulation conditions, and Figure 8 displays the adaptive regularization parameter adjustment curves. In the initial convergence stage, the momentum term accelerates parameter adjustment so that the regularization parameter matches the optimal value obtained by the L-curve within 3–4 iterations. In the secondary convergence stage, ISTA and Hyper-ISTA, which lack step size adaptation, overfit under noise pollution, resulting in a decrease in reconstruction quality. In contrast, Hyper-ISTA-BB and Hyper-ISTA-GHD use gradient information to dynamically adjust the iteration step size, suppressing overfitting and maintaining a low NMSE level throughout. Between the two, the BB algorithm converges more slowly than GHD due to oscillation, whereas GHD drives the step size to converge quickly and smoothly. The combination of these techniques makes Hyper-ISTA-GHD effective and robust under low-SNR and undersampling conditions.
Figure 8 further shows that the proposed method can converge the regularization parameter to the optimal parameter obtained by the L-curve method under different SNR conditions and different sampling rates, proving the effectiveness and reliability of the proposed method under different conditions.
To demonstrate the robustness of the proposed framework and justify the default settings for the coefficients defined in Equations (21)–(23), we performed a sensitivity analysis on $c_1$ and $c_2$, as shown in Figure 9.
  • Sensitivity of $c_1$ (Gradient Scaling): The coefficient $c_1$ acts as an empirical scaling factor to align the magnitude of the gradient-based estimation with the optimal regularization parameter derived from the L-curve. Our experiments show that the model maintains stable performance within a reasonable range around the chosen default value.
  • Sensitivity of $c_2$ (Acceleration Factor): The coefficient $c_2$ controls the acceleration of convergence during the early stages of the unfolding network. Variations in $c_2$ primarily affect the convergence speed rather than the final reconstruction accuracy.
  • Insensitivity of $c_3$: Extensive experiments indicate that the model is highly insensitive to variations in $c_3$ over a wide range. Consequently, to reduce model complexity without compromising performance, we have fixed $c_3 = 1$ in our final implementation.

5. Real Data Experiments

5.1. Data and Training Instructions

In the experiments, we employed the FAIR-CSAR-V1.0 dataset. FAIR-CSAR-V1.0 utilizes Gaofen-3 (GF-3) SAR complex images as its data source. The images in the dataset possess a nominal resolution of 1 m and cover 32 global regions, including airports, oil refineries, ports, and river channels. Based on this dataset, we constructed two specialized datasets: ocean scenes and land scenes. Each dataset contains 160 SAR complex images of size 512 × 512 pixels along with their corresponding echoes. From each dataset, 150 images were randomly selected for training, and the remaining 10 were reserved for testing. The batch size, number of epochs, and learning rate were set to 10, 300, and 0.001, respectively.
Since real raw echoes with large squint angles are difficult to obtain directly with perfectly matched ground truth, we simulate high-squint echoes from these broadside images. The transformation relies on the approximate observation operator G and its inverse, which conceptually encapsulate the ENLCS algorithm and its reverse process, parameterized by the SAR imaging geometry (specifically the squint angle). The detailed procedure for generating high-squint raw echoes is as follows:
  • Step 1: Zero-Padding. The original broadside SLC image is first zero-padded in the frequency domain. This step is crucial to prevent aliasing artifacts during the subsequent domain transformations.
  • Step 2: Inverse Operator Application. We configure the inverse observation operator $G^{-1}$ with the target squint angle (e.g., $30^{\circ}$). This operator is applied to the padded broadside image to transform it from the image domain back into the raw echo domain.
  • Step 3: Cropping. The resulting simulated echo data is then cropped to the original dimensions to emulate the collected raw data size.
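The three-step echo synthesis above can be sketched as follows. Since the ENLCS-based inverse operator is not reproduced here, a plain 2-D inverse FFT stands in for $G^{-1}$ so the pipeline runs end to end; the function names, pad factor, and placeholder operator are all our assumptions for illustration.

```python
import numpy as np

def inverse_observation(img, squint_deg):
    """Placeholder for the ENLCS-based inverse operator G^{-1}.

    The real operator reverses the ENLCS focusing chain for the given
    squint geometry; a plain 2-D inverse FFT stands in here (the squint
    angle is ignored by this stub) so the pipeline is runnable.
    """
    return np.fft.ifft2(img)

def simulate_squint_echo(slc, pad_factor=2, squint_deg=30.0):
    """Sketch of the three-step high-squint echo synthesis described above."""
    na, nr = slc.shape

    # Step 1: zero-pad the 2-D spectrum to guard against aliasing
    spec = np.fft.fftshift(np.fft.fft2(slc))
    padded = np.zeros((pad_factor * na, pad_factor * nr), dtype=complex)
    r0, c0 = (pad_factor - 1) * na // 2, (pad_factor - 1) * nr // 2
    padded[r0:r0 + na, c0:c0 + nr] = spec
    img = np.fft.ifft2(np.fft.ifftshift(padded))

    # Step 2: map the padded image back to the raw echo domain
    echo = inverse_observation(img, squint_deg)

    # Step 3: crop back to the collected raw data size
    return echo[:na, :nr]

echo = simulate_squint_echo(np.ones((8, 8), dtype=complex))
```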
While the raw data is sourced from FAIR-CSAR-V1.0 and GF-3, establishing reliable "ground truth" labels (reference images) is critical for training and quantitative evaluation. Since the optimal regularization parameter is unknown a priori for real data, we adopted a strategy based on the classical L-curve method combined with an exhaustive search to generate these labels. The specific process for generating the reference label $x^{*}$ and the corresponding optimal parameter $\lambda^{*}$ is as follows:
  • Dense Sampling: For each training sample, we perform a dense grid search over a wide range of potential regularization parameters λ .
  • Optimal Selection: We calculate the curvature of the L-curve for each sampled $\lambda$. The value corresponding to the maximum curvature point is identified as the optimal parameter, denoted as $\lambda^{*}$.
  • Reconstruction: Using this optimal $\lambda^{*}$, we execute the standard ISTA until convergence to obtain the high-quality sparse reconstruction result $x^{*}$.
We utilize this $x^{*}$ as the reference label. Although this exhaustive search process is computationally prohibitive for practical imaging applications—necessitating the efficient framework proposed in this paper—it provides the rigorous, high-quality benchmarks required for network training and validation.
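The label-generation recipe (dense λ sweep, L-curve curvature, ISTA reconstruction) can be sketched as below. This is a simplified illustration on a toy problem, not the authors' exact procedure: plain ISTA is used, the curvature is a finite-difference estimate, and all names and constants are our own.

```python
import numpy as np

def ista(A, y, lam, n_iter=200):
    """Plain ISTA for min 0.5||Ax - y||^2 + lam ||x||_1."""
    L = np.linalg.norm(A, 2) ** 2
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        z = x - (A.T @ (A @ x - y)) / L
        x = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)
    return x

def l_curve_label(A, y, lams):
    """Dense lambda sweep + max-curvature pick (the labeling recipe above)."""
    pts = []
    for lam in lams:
        x = ista(A, y, lam)
        pts.append((np.log(np.linalg.norm(A @ x - y) + 1e-12),
                    np.log(np.linalg.norm(x, 1) + 1e-12), lam, x))
    rho = np.array([p[0] for p in pts])   # log residual norm
    eta = np.array([p[1] for p in pts])   # log solution norm
    # curvature of the (rho, eta) curve via finite differences
    dr, de = np.gradient(rho), np.gradient(eta)
    ddr, dde = np.gradient(dr), np.gradient(de)
    kappa = (dr * dde - de * ddr) / np.maximum((dr**2 + de**2) ** 1.5, 1e-12)
    best = int(np.argmax(kappa))
    return pts[best][2], pts[best][3]     # lambda*, reference label x*

rng = np.random.default_rng(1)
A = rng.standard_normal((40, 80)) / np.sqrt(40)
x_true = np.zeros(80)
x_true[[3, 17, 42]] = [1.5, -2.0, 1.0]
y = A @ x_true + 0.02 * rng.standard_normal(40)
lam_star, x_star = l_curve_label(A, y, np.geomspace(1e-3, 1.0, 15))
```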
To ensure the reproducibility of our results and clarify the training procedure, we provide a detailed description of the network configuration, data splitting strategy, and optimization settings.
  • Data Split and Independence: The FAIR-CSAR-V1.0 dataset used in our experiments consists of 160 samples. We adopted a split of 150 samples for training and 10 samples for testing. We explicitly confirm that the testing samples were strictly held out and unseen during the training phase; no hyperparameter tuning was performed on the test scenes. Given the model-driven nature of our network, which embeds physical priors, this dataset size is sufficient for robust convergence.
  • Model Complexity: A key advantage of the proposed deep unfolding framework is that it is extremely lightweight. The network contains only 30 trainable parameters in total (3 parameters per stage × 10 stages). This low complexity significantly reduces the risk of overfitting compared to large-scale black-box CNNs, thereby lessening the dependency on massive training datasets or extensive data augmentation.
  • Optimizer and Training Strategy: The network was trained using the Adam optimizer with a learning rate of 5 × 10 4 and default beta parameters ( β 1 = 0.9 , β 2 = 0.999 ). To ensure optimal convergence and prevent overfitting, we implemented an early stopping mechanism monitoring the Normalized Mean Squared Error (NMSE) on the validation set, with a patience of 20 epochs.
  • Initialization: Unlike conventional deep learning models that require random weight initialization (necessitating specific random seeds for reproducibility), our method utilizes deterministic initialization based on the physical interpretation of the ISTA algorithm. For instance, the step size parameters are initialized using the theoretical Lipschitz constant. This deterministic approach ensures consistent starting points without the need for random seeds.
To rigorously evaluate the generalization capability and transferability of the proposed model, we deliberately selected test scenarios that represent the extreme boundary conditions of SAR scene sparsity. In sparse signal recovery, the optimal regularization parameter is intrinsically linked to the sparsity level of the scene. Therefore, rather than selecting multiple scenes with intermediate characteristics, we devised a “stress test” using two contrasting extremes:
  • Ocean Ship Scenario (High Sparsity Limit): This dataset represents the upper bound of sparsity, featuring isolated strong targets against a vast, empty background.
  • Complex Land Scenario (Low Sparsity Limit): This dataset represents the lower bound, characterized by dense structures and rich texture information.
By validating the method on these opposing ends of the spectrum, we effectively bracket the problem space. As discussed in the subsequent analysis, while standard Deep Unfolding Networks (e.g., LISTA) often struggle to adapt to such distinct distributions simultaneously—typically converging to a suboptimal average—our proposed framework successfully identifies the ideal hyperparameters for both extremes. This demonstrates the model’s robustness and its ability to adaptively span the full dynamic range of potential sparsity levels without manual intervention.
For rigorous generalization comparison, three specialized datasets were constructed: ocean scenes, land scenes, and mixed land–sea scenes. LISTA networks trained on these datasets are denoted as ocean-trained LISTA, land-trained LISTA, and mix-trained LISTA, respectively. Using these pre-trained networks and our proposed method, we reconstructed squint-mode SAR echoes for both ocean and land scenes.

5.2. Results

Ocean scene imaging results are shown in Figure 10, while land scene results appear in Figure 11. The indicators are listed in Table 5. NMSE curves are shown in Figure 12.
The distinct testing of ocean and land scenarios stems from their drastic sparsity contrast, where regularization parameter selection critically depends on scene sparsity. DUN-based LISTA, being data-driven, exhibits network parameters tightly coupled with training dataset characteristics. As shown in figures and quantitative metrics, ocean-trained LISTA excels in maritime imaging but over-sparsifies land scenes, suppressing weak scattering points. Conversely, land-trained LISTA achieves optimal terrestrial reconstruction but fails to suppress noise/clutter in ocean scenes, while mix-trained LISTA delivers suboptimal performance in both scenarios. This demonstrates the inherent generalization limitations of data-driven deep unfolding networks. In contrast, our proposed training-free method achieves adaptive hyperparameter optimization (HPO) across drastically different sparsity regimes and superior reconstruction fidelity in both scenarios through self-adjusting mechanisms. Notably, NMSE curves reveal that real-data-constructed squint-mode echoes pose greater challenges than simulated data, yet our method maintains convergence within 15 iterations (comparable to typical LISTA depth) despite more tortuous convergence paths. This validates our design philosophy of “simple but efficient”—achieving state-of-the-art generalization without dataset-specific training, fundamentally addressing the performance degradation of data-driven methods in cross-scenario applications.

6. Conclusions

In this paper, we have successfully addressed the challenges associated with highly squinted mode SAR imaging, a domain often hindered by reduced imaging quality and the stringent requirements for PRF. By integrating a sparse SAR imaging model that utilizes compressed sensing techniques, we enhanced the imaging quality even under the highly squinted mode constraints.
It has been proven that Hyper-ISTA-GHD can achieve adaptive adjustment of regularization parameters, accelerate convergence, and significantly reduce the risk of overfitting, thereby ensuring robust model performance.
Through extensive simulations and empirical data experiments, our proposed methods demonstrated high precision and rapid imaging capabilities without reliance on complex neural network architectures. This approach showcases the versatility and effectiveness of sparse SAR imaging techniques across various radar modes, highlighting their potential for widespread application in remote sensing tasks.
Overall, our findings contribute significantly to the field of synthetic aperture radar imaging, providing a foundation for future research and practical implementations that can further refine highly squinted mode applications.

7. Discussion

The Hyper-ISTA-GHD proposed in this paper can be applied to various SAR working modes and various imaging algorithms by substituting the appropriate AO, and thus has broad application prospects in engineering. It is worth noting that, to accelerate convergence, the two hyperparameters of Hyper-ISTA-GHD converge only to approximate values, so the reconstruction accuracy of the algorithm could in theory be improved further. Future plans include studying chained HPO of sparse SAR imaging algorithms from the perspective of hyper-HPO.

Author Contributions

Conceptualization, T.C.; Methodology, H.G.; Validation, L.L.; Data curation, B.D.; Writing—original draft, T.C.; Writing—review & editing, B.Z.; Supervision, Y.W. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Chang, C.; Jin, M.; Curlander, J. Squint Mode SAR Processing Algorithms. In Proceedings of the 12th Canadian Symposium on Remote Sensing Geoscience and Remote Sensing Symposium, Montreal, QC, Canada, 10–14 July 1989; Volume 3, pp. 1702–1706. [Google Scholar]
  2. Curlander, J.C.; McDonough, R.N. Synthetic Aperture Radar; Wiley: New York, NY, USA, 1991; Volume 11. [Google Scholar]
  3. Amitrano, D.; Di Martino, G.; Di Simone, A.; Imperatore, P. Flood detection with SAR: A review of techniques and datasets. Remote Sens. 2024, 16, 656. [Google Scholar] [CrossRef]
  4. Moreira, A.; Huang, Y. Airborne SAR processing of highly squinted data using a chirp scaling approach with integrated motion compensation. IEEE Trans. Geosci. Remote Sens. 1994, 32, 1029–1040. [Google Scholar] [CrossRef]
  5. Luo, X.-L.; Xu, W.; Guo, L. The application of PRF variation to squint spotlight SAR. J. Radars 2015, 4, 70–77. [Google Scholar]
  6. Davidson, G.; Cumming, I.; Ito, M. An approach for improved processing in squint mode SAR. In Proceedings of the IGARSS’93—IEEE International Geoscience and Remote Sensing Symposium, Tokyo, Japan, 18–21 August 1993; pp. 1173–1175. [Google Scholar]
  7. Davidson, G.W. Image Formation from Squint Mode Synthetic Aperture Radar Data. Ph.D. Thesis, University of British Columbia, Vancouver, BC, Canada, 1994. [Google Scholar]
  8. Davidson, G.; Cumming, I.G.; Ito, M. A chirp scaling approach for processing squint mode SAR data. IEEE Trans. Aerosp. Electron. Syst. 1996, 32, 121–133. [Google Scholar] [CrossRef]
  9. Davidson, G.W.; Cumming, I. Signal properties of spaceborne squint-mode SAR. IEEE Trans. Geosci. Remote Sens. 1997, 35, 611–617. [Google Scholar] [CrossRef]
  10. Schmidt, A.R. Secondary Range Compression for Improved Range/Doppler Processing of SAR Data with High Squint. Ph.D. Thesis, University of British Columbia, Vancouver, BC, Canada, 1986. [Google Scholar]
  11. Smith, A. A new approach to range-Doppler SAR processing. Int. J. Remote Sens. 1991, 12, 235–251. [Google Scholar] [CrossRef]
  12. Jian-ping, O.; Wei, L.; Jun, Z. ω-k Imaging Algorithm for SAR with Low Height and Large Squint Angle. J. Signal Process. 2014, 30, 1. [Google Scholar]
  13. Chen, X.; Sun, G.C.; Xing, M.; Li, B.; Yang, J.; Bao, Z. Ground Cartesian back-projection algorithm for high squint diving TOPS SAR imaging. IEEE Trans. Geosci. Remote Sens. 2020, 59, 5812–5827. [Google Scholar] [CrossRef]
  14. An, D.; Huang, X.; Jin, T.; Zhou, Z. Extended Nonlinear Chirp Scaling Algorithm for High-resolution Highly Squint SAR Data Focusing. IEEE Trans. Geosci. Remote Sens. 2012, 50, 3595–3609. [Google Scholar] [CrossRef]
  15. Li, Z.; Chen, J.; Du, W.; Gao, B.; Guo, D.; Jiang, T.; Wu, T.; Zhang, H.; Xing, M. Focusing of maneuvering high-squint-mode SAR data based on equivalent range model and wavenumber-domain imaging algorithm. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2020, 13, 2419–2433. [Google Scholar] [CrossRef]
  16. Xu, X.; Su, F.; Gao, J.; Jin, X. High-squint SAR imaging of maritime ship targets. IEEE Trans. Geosci. Remote Sens. 2020, 60, 5200716. [Google Scholar] [CrossRef]
  17. Guo, Y.; Wang, P.; Men, Z.; Chen, J.; Zhou, X.; He, T.; Cui, L. A modified range Doppler algorithm for high-squint SAR data imaging. Remote Sens. 2023, 15, 4200. [Google Scholar] [CrossRef]
  18. Zhang, B.; Hong, W.; Wu, Y. Sparse microwave imaging: Principles and applications. Sci. China Inf. Sci. 2012, 55, 1722–1754. [Google Scholar] [CrossRef]
  19. Çetin, M.; Stojanović, I.; Önhon, N.Ö; Varshney, K.; Samadi, S.; Karl, W.C.; Willsky, A.S. Sparsity-driven synthetic aperture radar imaging: Reconstruction, autofocusing, moving targets, and compressed sensing. IEEE Signal Process. Mag. 2014, 31, 27–40. [Google Scholar] [CrossRef]
  20. Fan, Y.; Zhang, J.; Zhang, B.; Wu, Y. A Multi-Agent Consensus Equilibrium Perspective for Multi-feature Enhancement Sparse SAR Imaging. IEEE Geosci. Remote Sens. Lett. 2025, 22, 4010405. [Google Scholar] [CrossRef]
  21. Fang, J.; Xu, Z.; Zhang, B.; Hong, W.; Wu, Y. Fast compressed sensing SAR imaging based on approximated observation. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2013, 7, 352–363. [Google Scholar] [CrossRef]
  22. Quan, X.; Zhang, B.; Zhu, X.X.; Wu, Y. Unambiguous SAR Imaging for Nonuniform DPC Sampling: l1 Regularization Method Using Filter Bank. IEEE Geosci. Remote Sens. Lett. 2016, 13, 1596–1600. [Google Scholar] [CrossRef]
  23. Bi, H.; Zhang, B.; Zhu, X.X.; Jiang, C.; Hong, W. Extended chirp scaling-baseband azimuth scaling-based azimuth–range decouple l1 regularization for TOPS SAR imaging via CAMP. IEEE Trans. Geosci. Remote Sens. 2017, 55, 3748–3763. [Google Scholar] [CrossRef]
  24. Daubechies, I.; Defrise, M.; De Mol, C. An iterative thresholding algorithm for linear inverse problems with a sparsity constraint. Commun. Pure Appl. Math. J. Issued Courant Inst. Math. Sci. 2004, 57, 1413–1457. [Google Scholar] [CrossRef]
  25. Donoho, D.L.; Maleki, A.; Montanari, A. Message-passing algorithms for compressed sensing. Proc. Natl. Acad. Sci. USA 2009, 106, 18914–18919. [Google Scholar] [CrossRef]
  26. Boyd, S.; Parikh, N.; Chu, E.; Peleato, B.; Eckstein, J. Distributed optimization and statistical learning via the alternating direction method of multipliers. Found. Trends Mach. Learn. 2011, 3, 1–122. [Google Scholar]
  27. Batu, O.; Cetin, M. Parameter selection in sparsity-driven SAR imaging. IEEE Trans. Aerosp. Electron. Syst. 2011, 47, 3040–3050. [Google Scholar] [CrossRef]
  28. Martín-del Campo-Becerra, G.D.; Serafín-García, S.A.; Reigber, A.; Ortega-Cisneros, S. Parameter selection criteria for Tomo-SAR focusing. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2020, 14, 1580–1602. [Google Scholar] [CrossRef]
  29. Bischl, B.; Binder, M.; Lang, M.; Pielok, T.; Richter, J.; Coors, S.; Thomas, J.; Ullmann, T.; Becker, M.; Boulesteix, A.L.; et al. Hyperparameter optimization: Foundations, algorithms, best practices, and open challenges. Wiley Interdiscip. Rev. Data Min. Knowl. Discov. 2023, 13, e1484. [Google Scholar] [CrossRef]
  30. Stein, C.M. Estimation of the mean of a multivariate normal distribution. Ann. Stat. 1981, 9, 1135–1151. [Google Scholar] [CrossRef]
  31. Golub, G.H.; Heath, M.; Wahba, G. Generalized cross-validation as a method for choosing a good ridge parameter. Technometrics 1979, 21, 215–223. [Google Scholar] [CrossRef]
Figure 1. Highly squinted SAR imaging geometry model.
Figure 2. Flowchart of Hyper-ISTA-GHD.
Figure 3. Imaging result comparison at different squint angles. The left column shows the ENLCS results, and the right column shows the SR-ENLCS results. First row: θ_s = 20°; second row: θ_s = 30°; third row: θ_s = 40°; fourth row: θ_s = 50°.
Figure 4. Simulation target setting.
Figure 5. Simulation results under different SNRs. The three columns from left to right correspond to SNRs of 30 dB, 25 dB, and 20 dB, respectively.
Figure 6. Simulation results under different sampling rates. The three columns from left to right correspond to sampling rates of 75%, 50%, and 25%, respectively.
Figure 7. NMSE curves of different methods under different circumstances. (a) SNR = 30 dB, 100% Sampling, (b) SNR = 25 dB, 100% Sampling, (c) SNR = 20 dB, 100% Sampling, (d) SNR = 30 dB, 75% Sampling, (e) SNR = 30 dB, 50% Sampling, (f) SNR = 30 dB, 25% Sampling.
Figure 8. Regularization hyperparameters. (a) λ in Figure 5; (b) λ in Figure 6.
Figure 9. The impact of coefficients on algorithm results. (a) The impact of c_1 on λ. (b) The impact of c_2 on NMSE.
Figure 10. Ocean scene imaging results. (a) MF; (b) ocean-trained LISTA; (c) land-trained LISTA; (d) mix-trained LISTA; (e) Hyper-ISTA-GHD; (f) Label.
Figure 11. Land scene imaging results. (a) MF; (b) ocean-trained LISTA; (c) land-trained LISTA; (d) mix-trained LISTA; (e) Hyper-ISTA-GHD; (f) Label.
Figure 12. NMSE curves. (a) Ocean; (b) land.
Table 1. Parameters for typical spaceborne highly squinted SAR.

| Parameter | Symbol | Value |
|---|---|---|
| Carrier frequency | f_c | 9.8 GHz |
| Signal bandwidth | B_r | 150 MHz |
| Sampling rate | f_s | 180 MHz |
| Antenna size in azimuth | L_a | 2 m |
| Squint angle | θ_s | 20–50° |
| Platform height | H | 500 km |
| Equivalent speed | V_r | 7340 m/s |
| Pulse repetition frequency | PRF | 25–100% of B_a * |
| Range resolution ** | ρ_r | 1 m |
| Azimuth resolution ** | ρ_a | ≃1 m |

Notes: * B_a denotes the azimuth Doppler bandwidth, which depends on θ_s. ** Theoretical value.
Table 2. Azimuth results under different squint angles.

| θ_s | Method | IRW (m) | PSLR (dB) | ISLR (dB) |
|---|---|---|---|---|
| 20° | ENLCS | 1.290 | −13.263 | −10.248 |
| 20° | SR-ENLCS | 0.712 | −43.812 | −41.567 |
| 30° | ENLCS | 1.290 | −13.256 | −10.223 |
| 30° | SR-ENLCS | 0.712 | −42.087 | −41.765 |
| 40° | ENLCS | 1.462 | −13.237 | −10.154 |
| 40° | SR-ENLCS | 0.806 | −40.876 | −40.158 |
| 50° | ENLCS | 1.720 | −13.040 | −9.955 |
| 50° | SR-ENLCS | 0.949 | −40.670 | −40.129 |
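For readers unfamiliar with the azimuth metrics reported in Table 2, the sketch below shows one common way to estimate PSLR and ISLR from a 1-D impulse-response profile. This is an illustrative helper, not the authors' measurement code; the mainlobe is delimited by the first nulls on either side of the peak. A plain sinc response reproduces the familiar ≈ −13.26 dB PSLR seen in the ENLCS rows.

```python
import numpy as np

def pslr_islr(profile):
    """Peak and integrated sidelobe ratios (dB) of a 1-D impulse-response
    profile; the mainlobe is bounded by the first nulls around the peak."""
    p = np.abs(profile) ** 2                  # power profile
    k = int(np.argmax(p))
    left = k                                   # walk out to the left null
    while left > 0 and p[left - 1] < p[left]:
        left -= 1
    right = k                                  # walk out to the right null
    while right < len(p) - 1 and p[right + 1] < p[right]:
        right += 1
    main = p[left:right + 1]
    side = np.concatenate([p[:left], p[right + 1:]])
    pslr = 10 * np.log10(side.max() / p[k])    # peak sidelobe vs. mainlobe peak
    islr = 10 * np.log10(side.sum() / main.sum())  # sidelobe vs. mainlobe energy
    return pslr, islr

# A sinc impulse response (idealized matched-filter output) has a known
# PSLR of about -13.26 dB, matching the order of the ENLCS entries.
x = np.linspace(-20, 20, 4001)
pslr, islr = pslr_islr(np.sinc(x))
```

The very low (more negative) PSLR/ISLR of the SR-ENLCS rows reflects the sidelobe suppression typical of sparse reconstruction.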
Table 3. Simulation results under different SNRs.

| SNR | Method | λ | PSNR (dB) | NMSE | Time (s) |
|---|---|---|---|---|---|
| 30 dB | Matched filter | – | 31.529 | 13.125 | 0.023 |
| | Too small λ | 0.011 | 35.056 | 5.211 | 2.118 |
| | Too large λ | 0.174 | 47.220 | 0.235 | 2.336 |
| | L-curve | 0.044 | 51.467 | 0.125 | 14.11 |
| | Hyper-ISTA-AO | 0.042 | 51.249 | 0.133 | 2.801 |
| | Hyper-ISTA-BB | 0.060 | 53.439 | 0.074 | 3.762 |
| | Hyper-ISTA-GHD | 0.060 | 54.253 | 0.066 | 2.956 |
| 25 dB | Matched filter | – | 24.701 | 54.040 | 0.020 |
| | Too small λ | 0.045 | 33.123 | 7.219 | 2.187 |
| | Too large λ | 0.179 | 47.108 | 0.224 | 2.147 |
| | L-curve | 0.090 | 45.531 | 0.447 | 26.71 |
| | Hyper-ISTA-AO | 0.088 | 46.083 | 0.439 | 2.814 |
| | Hyper-ISTA-BB | 0.107 | 49.508 | 0.191 | 3.171 |
| | Hyper-ISTA-GHD | 0.109 | 49.767 | 0.193 | 2.941 |
| 20 dB | Matched filter | – | 19.576 | 178.294 | 0.019 |
| | Too small λ | 0.106 | 29.243 | 12.312 | 2.181 |
| | Too large λ | 0.238 | 44.324 | 0.396 | 2.231 |
| | L-curve | 0.159 | 39.027 | 1.837 | 24.45 |
| | Hyper-ISTA-AO | 0.159 | 39.452 | 1.824 | 2.912 |
| | Hyper-ISTA-BB | 0.191 | 44.732 | 0.397 | 3.223 |
| | Hyper-ISTA-GHD | 0.192 | 47.471 | 0.371 | 2.902 |
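The sensitivity to λ seen in the "too small λ" and "too large λ" rows of Table 3 can be reproduced in miniature with plain ISTA. The sketch below is a generic real-valued toy (random Gaussian operator, hypothetical sizes and λ values), not the SAR observation operator, and it uses a fixed step size rather than the adaptive schemes compared above: an undersized λ leaves noise unsuppressed, while an oversized λ over-shrinks the scene toward zero.

```python
import numpy as np

def soft(x, t):
    """Soft-thresholding: proximal operator of t * ||.||_1."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def ista(A, y, lam, mu, n_iter=200):
    """Plain ISTA for min_x 0.5||Ax - y||^2 + lam||x||_1 with fixed step mu."""
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ x - y)            # gradient of the data-fit term
        x = soft(x - mu * grad, mu * lam)   # gradient step + shrinkage
    return x

rng = np.random.default_rng(0)
A = rng.standard_normal((80, 200)) / np.sqrt(80)   # toy sensing operator
x_true = np.zeros(200)
x_true[[10, 50, 120]] = [1.0, -0.8, 0.6]           # sparse "scene"
y = A @ x_true + 0.01 * rng.standard_normal(80)
mu = 1.0 / np.linalg.norm(A, 2) ** 2               # safe step: 1/L

nmse = lambda x: np.sum((x - x_true) ** 2) / np.sum(x_true ** 2)
err_good = nmse(ista(A, y, lam=0.02, mu=mu))       # well-chosen lambda
err_big  = nmse(ista(A, y, lam=1.0, mu=mu))        # too large: kills the scene
```

Hyper-ISTA-GHD removes the need to hand-pick lam by adapting it (together with the step size) during the iterations.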
Table 4. Simulation results under different sampling rates.

| PRF/B_a | Method | λ | PSNR (dB) | NMSE | Time (s) |
|---|---|---|---|---|---|
| 75% | Matched filter | – | 28.875 | 13.316 | 0.023 |
| | Too small λ | 0.010 | 31.971 | 6.540 | 2.165 |
| | Too large λ | 0.153 | 42.446 | 0.368 | 2.247 |
| | L-curve | 0.038 | 43.680 | 0.393 | 23.01 |
| | Hyper-ISTA-AO | 0.037 | 43.511 | 0.421 | 3.027 |
| | Hyper-ISTA-BB | 0.052 | 48.655 | 0.135 | 3.301 |
| | Hyper-ISTA-GHD | 0.052 | 48.861 | 0.135 | 2.835 |
| 50% | Matched filter | – | 26.770 | 9.099 | 0.021 |
| | Too small λ | 0.008 | 30.117 | 4.670 | 2.152 |
| | Too large λ | 0.120 | 36.954 | 0.557 | 2.164 |
| | L-curve | 0.030 | 38.212 | 0.563 | 13.38 |
| | Hyper-ISTA-AO | 0.030 | 39.138 | 0.518 | 2.899 |
| | Hyper-ISTA-BB | 0.042 | 40.869 | 0.338 | 3.275 |
| | Hyper-ISTA-GHD | 0.043 | 41.582 | 0.334 | 2.828 |
| 25% | Matched filter | – | 23.638 | 4.827 | 0.020 |
| | Too small λ | 0.011 | 28.735 | 1.370 | 2.153 |
| | Too large λ | 0.044 | 31.054 | 0.716 | 2.170 |
| | L-curve | 0.022 | 30.894 | 0.714 | 12.95 |
| | Hyper-ISTA-AO | 0.022 | 31.147 | 0.692 | 2.886 |
| | Hyper-ISTA-BB | 0.031 | 31.767 | 0.635 | 3.291 |
| | Hyper-ISTA-GHD | 0.030 | 32.481 | 0.653 | 2.863 |
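The step-size adaptation behind the Hyper-ISTA-GHD rows builds on hypergradient descent: the step size is treated as a parameter and updated with the gradient of the loss with respect to it, which for plain gradient descent reduces to the inner product of successive gradients. The sketch below is a generic gradient-descent toy in that style, not the paper's complex-valued ISTA update; the function names and constants are illustrative.

```python
import numpy as np

def gd_hypergrad(grad, x0, alpha0=0.01, beta=1e-4, n_iter=100):
    """Gradient descent whose step size alpha is itself adapted online.
    The hypergradient of the loss w.r.t. alpha is -g_t . g_{t-1}, so alpha
    grows while successive gradients stay aligned and shrinks when they
    start to oppose each other (oscillation)."""
    x, alpha = np.asarray(x0, dtype=float), alpha0
    g_prev = np.zeros_like(x)
    for _ in range(n_iter):
        g = grad(x)
        alpha += beta * np.dot(g, g_prev)   # hypergradient step on alpha
        x = x - alpha * g                   # ordinary gradient step on x
        g_prev = g
    return x, alpha

# Toy quadratic f(x) = 0.5 ||x||^2 with gradient x: alpha ramps up on its
# own from a deliberately conservative initial value.
x_star, alpha_end = gd_hypergrad(lambda x: x, x0=[3.0, -2.0])
```

The appeal, as in the tables above, is that a deliberately small initial step is corrected automatically instead of being tuned by hand.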
Table 5. GF-3 real-data results.

| Scene | Method | PSNR (dB) | NMSE | Time (s) |
|---|---|---|---|---|
| Ocean | Matched filter | 16.005 | 1.851 | – |
| | Ocean-trained LISTA | 21.635 | 0.506 | 3.370 |
| | Land-trained LISTA | 16.819 | 1.535 | 3.322 |
| | Mix-trained LISTA | 17.805 | 1.223 | 3.386 |
| | Hyper-ISTA-GHD | 70.322 | 6.89 × 10⁻⁶ | 9.937 |
| Land | Matched filter | 24.512 | 0.053 | – |
| | Ocean-trained LISTA | 14.216 | 0.563 | 3.582 |
| | Land-trained LISTA | 26.591 | 0.0326 | 3.618 |
| | Mix-trained LISTA | 29.150 | 0.018 | 3.321 |
| | Hyper-ISTA-GHD | 77.357 | 2.73 × 10⁻⁷ | 10.084 |
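The PSNR and NMSE columns in Tables 3–5 follow the standard image-quality definitions. The minimal sketch below assumes the peak in PSNR is taken from the reference (label) image; the paper's exact convention may differ, so treat it as a reading aid rather than the authors' evaluation code.

```python
import numpy as np

def nmse(x_hat, x_ref):
    """Normalized mean-squared error of a reconstruction vs. a reference."""
    return np.sum(np.abs(x_hat - x_ref) ** 2) / np.sum(np.abs(x_ref) ** 2)

def psnr(x_hat, x_ref):
    """Peak signal-to-noise ratio in dB, peak taken from the reference."""
    mse = np.mean(np.abs(x_hat - x_ref) ** 2)
    peak = np.max(np.abs(x_ref))
    return 10 * np.log10(peak ** 2 / mse)

# Illustrative check on a random "image" with light additive noise.
rng = np.random.default_rng(1)
ref = rng.random((64, 64))
noisy = ref + 0.01 * rng.standard_normal((64, 64))
```

Both metrics reward the same thing (closeness to the label), which is why the Hyper-ISTA-GHD rows pair very high PSNR with very small NMSE.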
Share and Cite

MDPI and ACS Style

Chen, T.; Ding, B.; Gao, H.; Liu, L.; Zhang, B.; Wu, Y. Hyper-ISTA-GHD: An Adaptive Hyperparameter Selection Framework for Highly Squinted Mode Sparse SAR Imaging. Remote Sens. 2026, 18, 369. https://doi.org/10.3390/rs18020369
