Article

Deep Learning-Driven Atomic Norm Optimization for Accurate Downlink Channel Estimation in FDD Systems

1
School of Electronic and Communication Engineering, Shenzhen Polytechnic University, Shenzhen 518055, China
2
Tech X Academy, Shenzhen Polytechnic University, Shenzhen 518055, China
3
Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
4
School of Mechanical and Electrical Engineering, Shenzhen Polytechnic University, Shenzhen 518055, China
5
Institute for Carbon-Neutral Technology, School of Automotive and Transportation Engineering, Shenzhen Polytechnic University, Shenzhen 518055, China
*
Authors to whom correspondence should be addressed.
These authors contributed equally to this work.
Electronics 2026, 15(7), 1461; https://doi.org/10.3390/electronics15071461
Submission received: 21 January 2026 / Revised: 12 March 2026 / Accepted: 16 March 2026 / Published: 1 April 2026
(This article belongs to the Section Circuit and Signal Processing)

Abstract

In this paper, we propose a downlink (DL) channel estimation scheme for frequency-division duplex (FDD) multi-antenna orthogonal frequency-division multiplexing (OFDM) systems, leveraging atomic norm minimization (ANM) and deep neural networks (DNN). Unlike time-division duplex (TDD) systems, where uplink (UL) and DL channels are reciprocal, FDD systems do not share this reciprocity, leading to increased channel training overhead. However, both theoretical analyses and empirical evidence reveal that key channel characteristics—such as angles of arrival and departure, path delays, and the number of propagation paths—exhibit partial reciprocity between UL and DL. Building on this insight, we design a DL channel estimation scheme that exploits frequency-independent UL parameters along with estimated DL channel gains. Our method integrates ANM with DNN to enhance estimation accuracy and efficiency. Specifically, ANM formulates the estimation problem while avoiding the off-grid errors inherent in traditional grid-based methods. To further mitigate performance degradation in clustered-path channels and reduce computational complexity, we introduce a DNN-based architecture that predicts channel parameters. The DNN captures hidden relationships between received pilot signals and frequency-independent channel parameters, enabling accurate estimation with linear time complexity. During training, ANM assists in serving users, ensuring reliable performance. Once the DNN is fully trained, it takes over to balance quality of service (QoS) and latency, providing an efficient and accurate solution for DL channel estimation in FDD-OFDM systems.

1. Introduction

One of the fundamental challenges in modern wireless communication systems is channel estimation, as most beamforming and user scheduling strategies in massive multiple-input multiple-output (MIMO) systems rely on accurate channel state information (CSI) to achieve optimal performance. In time-division duplexing (TDD) mode, this issue is mitigated since uplink (UL) CSI can be directly used as downlink (DL) CSI due to channel reciprocity. However, in frequency-division duplexing (FDD) mode, channel reciprocity generally does not hold because UL and DL CSI differ significantly. A straightforward approach to address this problem is to introduce a feedback mechanism: the transmitter sends pilot signals, the receiver estimates DL CSI, and the estimated CSI is fed back to the transmitter. In most wireless systems, an orthogonal set of pilot signals is required for channel estimation. While this additional training overhead is manageable for systems with a small number of antennas, it becomes prohibitive in massive MIMO systems employing large-scale antenna arrays [1], as the vast number of orthogonal pilot signals required would consume excessive resources within the limited channel coherence time and frequency [2].
To tackle the challenges associated with FDD systems, various strategies have been explored. One common approach is to extract DL CSI from UL CSI by assuming partial reciprocity, such as similarity between the UL angle-of-arrival (AoA) and DL angle-of-departure (AoD), or assuming they are nearly identical [3]. However, fully eliminating feedback for DL CSI estimation in FDD mode typically requires the assumption that UL and DL channel gains are highly correlated or nearly identical [4]. These assumptions, however, lack rigorous theoretical justification and experimental validation [5]. In contrast, studies such as [6] have shown that UL and DL cluster gains generally differ.
Leveraging angular reciprocity, CSI estimation techniques can further exploit the sparse nature of massive MIMO channels to reduce, though not entirely eliminate, DL feedback overhead, since the UL/DL channel gains remain frequency-dependent. The assumption that propagation between the base station (BS) and user equipment (UE) follows a limited number of dominant paths is widely accepted, and recent works such as [7] have shown that this reciprocity holds in both sparse and rich-scattering channels. Frequency-independent parameters can be estimated from UL pilots and used for DL CSI estimation, requiring only minimal feedback. A widely adopted method in this context involves quantizing the spatial domain using predefined grids or codebooks, reducing feedback to grid-point indices for AoA and AoD estimation. Examples include modular quantization [8] and beam pattern training [9]. Alternatively, sensing-aided channel path estimation, such as [10,11], is more attractive because it recovers the channel paths by extracting sensing parameters, which avoids tedious beam training within the quantized angular domain.
Despite their effectiveness, these methods have inherent limitations. Grid-based quantization methods suffer from off-grid errors. Increasing the grid density improves accuracy but significantly raises computational complexity, leading to a trade-off between precision and efficiency. On the other hand, sensing-based approaches require DL channel sparsity, which is often inferred from prior knowledge or UL feedback. The former can be unreliable in dynamic environments, while the latter introduces additional training overhead.
To overcome these challenges, we propose a low-complexity, high-accuracy channel estimation framework for FDD massive MIMO systems based on atomic norm minimization (ANM). Our approach focuses on the dominant propagation path and defines an atomic set encompassing all potential paths. The channel parameters of the most significant path are extracted by solving an ANM problem. Unlike existing ANM-based channel estimation methods, e.g., [12,13], that rely on a single semidefinite optimization, our approach explicitly estimates and extracts real multi-path channels, making it well-suited for spatial diversity.
Furthermore, we observe that ANM can be interpreted as a mapping function between received signals and channel paths, which allows us to replace it with a deep neural network (DNN) to further enhance efficiency and accuracy. During DNN training, we mitigate potential gradient descent stalls by introducing a simplified weight compression algorithm. Conventional weight compression methods require retraining for each compression parameter and iterative loss function evaluation, resulting in high complexity. In contrast, our approach selects the compression parameter that maximizes the number of activated neurons, avoiding iterative loss function searches while maintaining comparable performance.
By integrating ANM with a DNN, we propose a comprehensive DL channel estimation strategy for FDD massive MIMO systems. The system initially employs ANM during DNN training and subsequently switches to the trained DNN for real-time CSI estimation, significantly reducing computational complexity and improving estimation accuracy.
The major contributions of this paper are summarized as follows:
  • We propose an ANM-based channel estimation framework for FDD massive MIMO OFDM systems. The method estimates frequency-independent channel parameters along with path gains and formulates an ANM problem solvable via off-the-shelf solvers. Unlike grid-based methods, our approach avoids off-grid errors and reduces complexity without exhaustive grid searches.
  • We introduce a DNN-assisted approach that enhances ANM-based estimation by reducing computational overhead and improving accuracy. Additionally, we propose a simplified weight compression scheme to prevent training stagnation. The DNN, trained using parameters obtained from exhaustive search, enables low-latency channel prediction post-training.
The rest of this paper is organized as follows: Section 2 formulates the channel estimation problem. Section 3 derives the ANM problem. Section 4 uses DNN to solve the ANM problem. Section 5 presents numerical results. Section 6 concludes the paper.
Notations: Throughout this paper, we use the following notations: bold uppercase letters (e.g., A) denote matrices, bold lowercase letters (e.g., a) denote vectors, 𝒜 represents a set, and α denotes a scalar. The pseudo-inverse, conjugate transpose (Hermitian), conjugate, and transpose of a matrix A are denoted as A^†, A^H, A^*, and A^T, respectively. The Kronecker product and Hadamard product of matrices A and B are represented as A ⊗ B and A ⊙ B, respectively.

2. System Model and Problem Formulation

2.1. Channel Model

In this section, we consider an OFDM-based massive MIMO system operating in FDD mode. The system consists of a base station (BS) equipped with M antennas and S subcarriers, serving multiple single-antenna user equipment (UE), a typical configuration chosen for lower hardware complexity, power consumption, and cost [14]. It is well known that the channel in massive MIMO systems is typically characterized by a limited number of propagation paths [4]. Let h_m^U(s) denote the uplink (UL) channel between the UE and the mth BS antenna on subcarrier s, given by

h_m^U(s) = Σ_{l=1}^{L^U} g_l^U e^{−j2π(Φ_{l,s}^U + Θ_{l,m}^U)},   (1)

where L^U represents the total number of propagation paths, g_l^U is the gain of the lth path, Φ_{l,s}^U accounts for the phase difference due to the path delay, and Θ_{l,m}^U represents the phase difference caused by the relative arrival-time variation of the path across the array. For simplicity, we assume that both M and S are even integers, such that m = −M/2, …, −1, 0, 1, …, M/2 and s = −S/2, …, −1, 0, 1, …, S/2.
The phase delay Φ_{l,s}^U depends on the subcarrier frequency and the path delay. We denote the difference between the UL and DL central frequencies as ΔF, with an inter-subcarrier spacing of Δf (i.e., ΔF = S Δf). Defining a delay τ_l^U for path l, which satisfies 0 ≤ τ_l^U < 1/Δf, we express Φ_{l,s}^U as

Φ_{l,s}^U = s Δf τ_l^U.   (2)

Similarly, the antenna element delay Θ_{l,m}^U is determined by the inter-element spacing of the antenna array. Assuming a uniform linear array (ULA) at the BS, Θ_{l,m}^U can be written as

Θ_{l,m}^U = (m d / λ^U) sin θ_l^U,   (3)

where d is the inter-element spacing, λ^U is the UL wavelength, and θ_l^U is the signal arrival angle of the lth path. For other array configurations, such as co-prime [15] and nested arrays [16], similar calculations apply, and their distinct properties warrant further investigation in future work.
Substituting (2) and (3) into (1), we obtain

h_m^U(s) = Σ_{l=1}^{L^U} g_l^U e^{−j2π(s Δf τ_l^U + (m d/λ^U) sin θ_l^U)}.   (4)

To represent the UL channel across all antennas, we stack h_m^U(s) for s = 1, …, S and m = 1, …, M into the vector h^U:

h^U = Σ_{l=1}^{L^U} g_l^U p(τ_l^U) ⊗ q(θ_l^U),   (5)
where

p(τ_l^U) = [e^{j2π Δf τ_l^U}, …, e^{j2π S Δf τ_l^U}]^H,   (6)

q(θ_l^U) = [e^{j2π (d/λ^U) sin θ_l^U}, …, e^{j2π (d/λ^U) M sin θ_l^U}]^H.   (7)
Since FDD systems do not exhibit full reciprocity between DL and UL channels, additional estimation and feedback steps are required. However, prior work [17] has shown that sparsity reciprocity and angular reciprocity hold, meaning that the number of propagation paths and the angles of departure (AoD) in the DL closely match the number of paths and the angles of arrival (AoA) in the UL. Letting L^D denote the number of DL propagation paths, with corresponding delays τ^D and angles θ^D, we approximate L^D ≈ L^U, τ^D ≈ τ^U, and θ^D ≈ θ^U.
From (4), we see that the parameters L^D, τ^D, θ^D, excluding the DL channel gain g^D, can be inferred from the UL channel. Therefore, estimating g^D requires an additional step. For notational simplicity, we omit the frequency-dependent superscripts and use L, τ, and θ to represent both the UL and DL counterparts in the remainder of this paper.
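As a concrete illustration of the channel model (4)-(7), the sketch below builds the stacked UL channel vector h^U from per-path steering vectors via a Kronecker product. This is a minimal numerical sketch assuming unit-indexed subcarriers and antennas and the sign conventions of (6)-(7); all function names are illustrative, not from the paper.

```python
import numpy as np

def p_delay(tau, S, delta_f):
    # Delay steering vector p(tau) over subcarriers s = 1..S, cf. Eq. (6)
    s = np.arange(1, S + 1)
    return np.exp(-2j * np.pi * s * delta_f * tau)

def q_angle(theta, M, d_over_lambda=0.5):
    # Angular steering vector q(theta) for an M-element ULA, cf. Eq. (7)
    m = np.arange(1, M + 1)
    return np.exp(-2j * np.pi * m * d_over_lambda * np.sin(theta))

def ul_channel(gains, taus, thetas, S, M, delta_f):
    # Stacked UL channel h^U = sum_l g_l * (p(tau_l) kron q(theta_l)), cf. Eq. (5)
    h = np.zeros(S * M, dtype=complex)
    for g, tau, theta in zip(gains, taus, thetas):
        h += g * np.kron(p_delay(tau, S, delta_f), q_angle(theta, M))
    return h
```

Each path contributes a rank-one Kronecker structure, which is exactly the property the atomic set in Section 3 exploits.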

2.2. Frequency-Independent Parameter Extraction

Let t(s) denote the pilot signal transmitted on subcarrier s. Given the UL channel model (4), the received signal y_m^U(s) at BS antenna m is given by

y_m^U(s) = h_m^U(s) t(s) + n_m^U(s),   (8)

where n_m^U(s) ∼ CN(0, σ²) represents Gaussian noise. Normalizing the pilot power, we set |t(s)|² = 1/M.
For DL channel estimation, both g_l^D and the frequency-independent parameters τ_l, θ_l must be determined. Stacking the received pilot signals across subcarriers and antennas, we obtain

y^U = Σ_{l=1}^{L^U} g_l^U (p(τ_l^U) ⊗ q(θ_l^U)) t + n,   (9)

where t = [t(1), …, t(S)]^T. Defining the joint function

u(τ, θ) = p(τ) ⊗ q(θ),   (10)

we rewrite (9) as

y^U = Σ_{l=1}^{L^U} g_l^U u(τ_l^U, θ_l^U) t + n.   (11)

This indicates that, for any received y^U, we need to recover τ_l^U, θ_l^U (namely τ_l, θ_l) for l = 1, …, L, without any prior knowledge of the number of propagation paths L. In this paper, this is achieved in two steps: first, estimating L, τ, θ, and then determining the DL gains g_l^D with an extra feedback mechanism.

2.3. Estimation of DL Channel

Given the same channel model and leveraging angular reciprocity, the DL channel can be expressed as

h_m^D(s) = Σ_{l=1}^{L^U} g_l^D e^{−j2π(s Δf τ_l^U + (m d/λ^D) sin θ_l^U)}.   (12)

The frequency-independent parameters τ_l and θ_l are derived from the UL channel, whereas the DL path gain g_l^D in (12) requires separate estimation and feedback; it is obtained from the received signal at the UE side. In a multi-subcarrier system, pilot signals are inserted at regular subcarrier intervals. We assume that the channel gains of adjacent subcarriers remain nearly identical, introducing minimal error [18].
To estimate g_l^D, the BS transmits pilots and informs the UE of the estimated τ_l, θ_l. These pilots are beamformed towards θ_l to maximize the signal-to-noise ratio (SNR), requiring only a single OFDM symbol frequency-division multiplexed over all θ_l. Let us assume that a pilot signal is inserted at every Kth subcarrier and that the total number of subcarriers S is divisible by K. The subcarriers containing pilot signals are indexed as s_1, …, s_I, with I = S/K.
To achieve maximum SNR, each pilot signal t_i on subcarrier s_i is beamformed along one of the L propagation paths. In a sparse massive MIMO channel, the number of paths L is typically much smaller than the number of subcarriers (i.e., L ≪ S), ensuring that the I pilot subcarriers sufficiently cover all L propagation directions when the gap K is small enough. The beamformed pilot signal t_i is given by

t_i = q(θ_ī),   (13)

where ī = i mod L. The received signal at the UE on subcarrier s_i is expressed as

y^D(s_i) = Σ_{l=1}^L g_l^D e^{−j2π(ΔF + s_i Δf) τ_l} q^H(θ_l) t_i + n^D,   (14)

for i = 1, …, I. Given the pilot signal structure and the estimated τ_l and θ_l fed forward by the BS, the UE stacks all received signals into
y^D = Q g^D + n,   (15)

where y^D ∈ C^{I×1}, g^D ∈ C^{L×1}, and Q ∈ C^{I×L} are defined as

y^D = [y^D(s_1), …, y^D(s_I)]^T,   (16)

g^D = [g_1^D, …, g_L^D]^T,   (17)

Q = [q^D(θ_1), …, q^D(θ_L)],   (18)

where the lth column q^D(θ_l) collects the effective response of path l on each pilot subcarrier:

q^D(θ_l) = [e^{−j2π(ΔF + s_1 Δf) τ_l} q^H(θ_l) t_1, …, e^{−j2π(ΔF + s_I Δf) τ_l} q^H(θ_l) t_I]^T.   (19)

To estimate g^D from (15), we apply the pseudo-inverse estimator to obtain a least squares (LS) estimate:

g^D = Q^† y^D.   (20)

So far, with (17) and (20), the DL path gains g_l^D, l = 1, …, L, can be obtained.
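To make the LS step in (20) concrete, the toy example below forms an effective pilot matrix Q and recovers the DL gains with the pseudo-inverse. The matrix Q is randomly generated here purely for illustration; in the paper it is built from the beamformed pilots and path responses of (19).

```python
import numpy as np

rng = np.random.default_rng(0)
L, I = 3, 8                        # L paths, I pilot subcarriers (I >= L)

# Stand-in for the effective pilot matrix Q of Eq. (18); random for illustration.
Q = rng.standard_normal((I, L)) + 1j * rng.standard_normal((I, L))
g_true = np.array([1.0, 0.5 - 0.2j, -0.3j])   # DL path gains g^D

y_d = Q @ g_true                   # noiseless stacked pilots, Eq. (15)
g_hat = np.linalg.pinv(Q) @ y_d    # LS estimate g^D = Q^+ y^D, Eq. (20)
```

With I ≥ L and a well-conditioned Q, the pseudo-inverse recovers the gains exactly in the noiseless case and acts as the LS estimator under noise.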

2.4. Necessity of Angular Reciprocity

As discussed in Section 1, in an FDD system, leveraging angular reciprocity significantly reduces channel training overhead. Beyond reducing overhead, we also analyze the accuracy implications of using angular reciprocity.
Our proposed approach first estimates the directional angles θ_l, the number of paths L, and the path delays τ_l, and then separately estimates the DL channel gains g_l^D via extra feedback. An alternative method would first estimate θ_l and L using traditional DOA estimation methods, then determine τ_l and g_l^D through feedback. We compare these two approaches in terms of complexity and accuracy.
Complexity: The computational complexity of the proposed grid-free scheme will be analyzed in Section 3.3. In contrast, we evaluate the complexity of conventional DOA estimation methods as a benchmark. Specifically, we consider the MUSIC [19] and ESPRIT [20] algorithms. For MUSIC, the complexity per propagation path l to estimate θ_l is O(M²S² + M³S³ + U_m V_m(M²S² − 1)), leading to an overall complexity of O(L M²S² + L M³S³ + L U_m V_m(M²S² − 1)), where U_m and V_m represent the search step sizes. Similarly, the complexity of ESPRIT is O(M²S² + M³S³ + (2M − 1)S + (2S − 1)M + 6) per path and O(L M²S² + L M³S³ + (2M − 1)LS + (2S − 1)LM + 6L) in total. As will be demonstrated later, despite only estimating the DOA θ_l, both algorithms exhibit higher complexity than the proposed grid-free method, which jointly estimates θ_l and the path delay τ_l.
Accuracy: In mmWave channels, it is widely recognized that propagation paths exhibit a very narrow angular spread, a fact confirmed by extensive practical measurements [21]. For example, measurements at 28 GHz and 73 GHz report a −10 dB average root-mean-square (RMS) lobe angular spread of 6.8° in azimuth and 6.7° in elevation for 28 GHz non-line-of-sight (NLOS) scenarios [22]. At 73 GHz in NLOS conditions, these values further reduce to 3.7° and 2.2°, respectively. In a multipath channel model, such a small angular spread demands an extremely high-resolution algorithm to distinguish paths, despite their limited number. Moreover, practical measurements indicate that path delays are generally minimal. For instance, tests in both 28 GHz and 73 GHz environments reveal that, in line-of-sight (LOS) cases, most path delays are below 2 ns, while in NLOS conditions, they extend to approximately 60 ns and 20 ns, respectively [23]. These observations highlight a critical issue: in an alternative approach where the downlink (DL) channel is obtained by first estimating the direction of arrival (DOA), followed by path delay and gain estimation, errors in DOA estimation propagate to the subsequent parameters. This occurs because the phase of the channel vector depends on both sin θ and τ, as shown in (1). In some cases, DOA estimation inaccuracies may completely obscure the true path delay. To mitigate such error accumulation, the proposed scheme jointly estimates θ and τ. The resulting atomic norm minimization (ANM) formulation generates a 2D power spectrum, where each parameter occupies a separate dimension, preventing mutual interference.

3. Parameter Estimation with Atomic Norm Minimization

In FDD mode, where partial channel reciprocity holds, the DL channel can be inferred from the fed-back DL path gains g_l^D and the frequency-independent parameters L^D, τ^D, θ^D. Thus, accurately estimating L^D, τ^D, θ^D is essential. Conventional methods rely on exhaustive searches over a densely sampled grid, assuming that these parameters lie precisely on grid points. This either introduces off-grid errors or significantly increases computational complexity, owing to the trade-off between grid resolution and search efficiency. To overcome this limitation, in this section we propose an ANM-based approach that achieves high-resolution estimation of L^D, τ^D, θ^D with reduced complexity.

3.1. The Atomic Norm Minimization Problem

For simplicity, we assume the BS employs a half-wavelength ULA, i.e., d = λ^U/2. Given θ_l ∈ (0, π), we define ν ∈ [0, 0.5) as ν = (d/λ^U) sin θ_l. Similarly, since 0 ≤ τ_l^U < 1/Δf, we introduce μ ∈ [0, 1) as μ = Δf τ. Applying these transformations to (9), we obtain

y^U = Σ_{l=1}^L g_l^U p(μ_l) ⊗ q(ν_l) + n = Σ_{l=1}^L g_l^U u_l(μ_l, ν_l) + n = U g + n,   (21)
where u_l(μ, ν) = p(μ) ⊗ q(ν). The matrix U ∈ C^{MN×L} is formed by stacking all u_l(μ_l, ν_l) for l = 1, …, L:

U = [u(μ_1, ν_1), …, u(μ_L, ν_L)],   (22)

and g ∈ C^{L×1} is constructed by stacking all g_l^U:

g = [g_1^U, …, g_L^U]^T.   (23)
Since the number of paths L is unknown, simultaneously estimating μ_l, ν_l, and g_l is challenging. Paths with small gains g_l may be buried in the noise n, making them indistinguishable in y^U. Given the channel's sparsity, we focus on identifying dominant paths by minimizing L. This problem, which seeks to recover the minimal set of contributing components, is naturally formulated as an ANM problem [24]. Compared with most existing works, which directly extract channel paths from CSI, channel paths recovered via ANM are estimated in the continuous domain, thus avoiding off-grid errors and enhancing accuracy [25].
To cast (21) as an ANM problem, we define the atomic set A for the reconstructed signal y as

A ≜ {u(μ, ν) : μ ∈ [0, 1), ν ∈ [0, 0.5)},   (24)

and the atomic norm ‖y‖_{A,0}, corresponding to the number of contributing atoms, as

‖y‖_{A,0} ≜ inf{L : y = Σ_{l=1}^L g_l^U u(μ_l, ν_l), u(μ_l, ν_l) ∈ A}.   (25)

Since solving (25) is NP-hard, we relax it by minimizing the sum of the absolute path gains, leading to

‖y‖_A ≜ inf{Σ_{l=1}^L |g_l^U| : y = Σ_{l=1}^L g_l^U u(μ_l, ν_l), u(μ_l, ν_l) ∈ A}.   (26)
To exclude paths dominated by noise, we define an error margin ϵ based on the noise power:

ϵ = σ² N Δf,   (27)

and an error vector z satisfying ‖z‖₂ ≤ ϵ. Thus, our final optimization problem is

minimize_{g, z} ‖y‖_A   subject to   y = U g + z,  ‖z‖₂ ≤ ϵ,   (28)

where the first constraint is derived from (21) and the second constraint from the noise-power threshold (27).
Problem (28) shares similarities with the direction-of-departure and direction-of-arrival (DOD-AOA) estimation problem in radar systems, for which grid-free ANM approaches have been extensively studied. For instance, ref. [13] reformulates ANM as a positive semidefinite relaxation (SDR) problem by constructing a block Toeplitz matrix, which can be efficiently solved using standard techniques. However, ref. [26] shows that such a Toeplitz construction may degrade estimation accuracy. In this work, we propose a grid-free approach by solving the dual problem of (28) and constructing a tailored Toeplitz matrix leveraging the properties of g l U .
Notably, since the number of paths L, and thus the dimensions of U (MN × L) and g (L × 1), is unknown in advance, problem (28) cannot be solved directly. To address this, we first extract the most dominant path and subtract its contribution from the received signal y. The ANM process is then executed iteratively until a predefined stopping criterion is met. Through this iterative approach, dominant paths are sequentially identified and removed, allowing the DL channel to be estimated from these dominant components thanks to its inherent sparsity. In each iteration, extracting the most dominant path (μ_1, ν_1, g_1) is feasible, as the problem has a bounded infimum for any practical y, given that at least one propagation path is always present. Therefore, we solve the problem by analyzing the values of U and g under this bounded-infimum condition. It will be demonstrated that, through this process, the dimensions of U and g can be effectively reduced.

3.2. Construction of the Dual Problem

To derive the dual problem of (28), we introduce the dual variables α ∈ C^{MN×1} and β ∈ R_+. The Lagrangian function associated with (28) is formulated as

L(y, α, β) = ‖y‖_A + Re(α^H (y − U g − z)) + β(z^H z − ϵ²).   (29)

The dual function f(α, β) is obtained by minimizing the Lagrangian (29) over g and z:

f(α, β) = inf_{g,z} L(y, α, β) = inf_{g,z} {Re(α^H y) − Re(α^H z) + β(z^H z − ϵ²) + ‖y‖_A − Re(α^H U g)}.   (30)
To solve for z, we compute the partial derivative of the Lagrangian with respect to z:

∂L(y, α, β)/∂z = −α + 2βz.   (31)

Setting (31) to zero yields the optimal z:

z_0 = α/(2β).   (32)

Substituting (32) into (30), we obtain

f(α, β)|_{z_0} = Re(α^H y) − α^H α/(2β) + β(α^H α/(4β²) − ϵ²) + inf_g {‖y‖_A − Re(α^H U g)}.   (33)
Next, differentiating with respect to β gives

∂f(α, β)/∂β = α^H α/(4β²) − ϵ².   (34)

Setting (34) to zero yields the optimal β:

β_0 = ‖α‖₂/(2ϵ).   (35)

Substituting (32) and (35) into (30) gives

f(α)|_{z_0, β_0} = Re(α^H y) − ϵ‖α‖₂ + inf_g {‖y‖_A − Re(α^H U g)}.   (36)
To maximize (36) over α, we examine inf_g {‖y‖_A − Re(α^H U g)}. By (26), we have

‖y‖_A − Re(α^H U g) = Σ_{l=1}^L |g_l^U| − Re(α^H U g) = Σ_{l=1}^L (|g_l^U| − Re((α^H U)_l g_l^U)) = Σ_{l=1}^L |g_l^U| (1 − |(U^H α)_l| cos ϕ_l),   (37)

where (α^H U)_l is the lth entry of α^H U and ϕ_l is the angle between (U^H α)_l and g_l^U.
Proposition 1. 
‖y‖_A − Re(α^H U g) has a finite infimum only if either |(α^H U)_l| = 1 or the path gain g_l^U → 0.
Proof. 
Since we have

|g_l^U| (1 − |(U^H α)_l| cos ϕ_l) ≥ |g_l^U| (1 − |(U^H α)_l|),   (38)

when |(α^H U)_l| > 1, the lower bound of (37) tends to −∞ as |g_l^U| grows, implying that the infimum does not exist. Similarly, when |(α^H U)_l| < 1 and |g_l^U| → ∞, the lower bound of (37) approaches +∞, so the infimum is attained only at a vanishing path gain.
Thus, to ensure a bounded, nontrivial infimum, we must have |(α^H U)_l| = 1. Under this condition, maximizing (36) over α leads to the following optimization problem:

maximize_α Re(α^H y) − ϵ‖α‖₂   subject to   |(U^H α)_l| = 1, l = 1, …, L.   (39)
This problem can be reformulated into a structured optimization problem by imposing a Toeplitz matrix constraint.    □
Proposition 2. 
There exists a Hermitian matrix D ∈ C^{MN×MN} such that for any U_i ∈ A, we have U_i^H D U_i = 1, where U_i represents the ith column of U and the corresponding row of U^H.
Proof. 
See Appendix A.    □
Using the Schur complement, we obtain

[ D  α ; α^H  1 ] ⪰ 0.   (40)

Thus, the unit-amplitude constraint in problem (39) is reformulated as

maximize_{α, D ∈ H} Re(α^H y) − ϵ‖α‖₂   subject to   [ D  α ; α^H  1 ] ⪰ 0,   (41)

where H is the set of all Hermitian matrices D ∈ C^{MN×MN} satisfying the constraints in (A4). The optimization problem (41) can be solved using off-the-shelf solvers, such as the SDPT3 toolbox for Matlab [27]. After obtaining the optimal α from (41), we define P(μ, ν) as the power spectrum of u(μ, ν) = p(μ) ⊗ q(ν) over the domain μ ∈ [0, 1) and ν ∈ [0, 0.5):

P(μ, ν) = u^H(μ, ν) α α^H u(μ, ν).   (42)
By locating the highest peak of P(μ, ν) within μ ∈ [0, 1) and ν ∈ [0, 0.5), we estimate μ_1 and ν_1 for the most dominant UL channel path. The corresponding path gain g_1^U is then computed as

g_1^U = u^H(μ_1, ν_1) y^U / ‖u(μ_1, ν_1)‖₂².   (43)

Once the first set of UL channel parameters (g_1^U, μ_1, ν_1) is determined, we define the residual signal as

y_r^U = y^U − g_1^U u(μ_1, ν_1).   (44)

By replacing y^U with y_r^U, we iteratively apply the same ANM approach to extract the next set of UL channel parameters (g_2^U, μ_2, ν_2). Repeating this process for all paths, we obtain (g_l^U, μ_l, ν_l) for l = 1, …, L. Since the exact number of paths L is unknown, we monitor the residual signal power ‖y_r^U‖₂ after each iteration and compare it with the noise margin ϵ. The process terminates when

‖y_r^U‖₂ < ϵ,   (45)

indicating that the residual signal is dominated by noise, making further extraction unnecessary.
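The iterative peel-off procedure of (43)-(45) can be sketched as follows. Note that the dominant-path search here scans a coarse grid as an illustrative stand-in for the grid-free ANM spectrum peak of (42), which would require solving the SDP (41); everything else follows the residual update and stopping rule above. All names are illustrative.

```python
import numpy as np

def atom(mu, nu, S, M):
    # u(mu, nu) = p(mu) kron q(nu) in normalized frequencies, cf. Eq. (21)
    p = np.exp(-2j * np.pi * np.arange(1, S + 1) * mu)
    q = np.exp(-2j * np.pi * np.arange(1, M + 1) * nu)
    return np.kron(p, q)

def extract_paths(y, S, M, eps, grid=(64, 64), max_paths=10):
    """Iteratively peel off dominant paths from y until the residual
    norm drops below the noise margin eps, cf. Eqs. (43)-(45)."""
    mus = np.linspace(0.0, 1.0, grid[0], endpoint=False)
    nus = np.linspace(0.0, 0.5, grid[1], endpoint=False)
    paths, r = [], y.astype(complex).copy()
    for _ in range(max_paths):
        if np.linalg.norm(r) < eps:              # stopping rule, Eq. (45)
            break
        best, best_val = None, -1.0
        for mu in mus:                           # coarse stand-in for the
            for nu in nus:                       # ANM spectrum peak, Eq. (42)
                val = np.abs(atom(mu, nu, S, M).conj() @ r)
                if val > best_val:
                    best_val, best = val, (mu, nu)
        u = atom(*best, S, M)
        g = (u.conj() @ r) / np.linalg.norm(u) ** 2   # path gain, Eq. (43)
        r = r - g * u                                  # residual, Eq. (44)
        paths.append((g, *best))
    return paths
```

For a single on-grid path the procedure recovers the gain and frequencies in one iteration and then stops, since the residual collapses to numerical noise.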

3.3. Complexity Analysis

The proposed ANM approach consists of two main stages: solving the optimization problem (41) and identifying peaks in the power spectrum (42). In the first stage, constructing the matrix D requires O(MS) operations, followed by solving (41) using SDPT3, which has a computational complexity of O((1/2)M²S² + 2MS³), as outlined in [28]. For the second stage, searching for peaks in the power spectrum demands O(p_m p_n MS) operations, where p_m and p_n represent the numbers of search steps for μ and ν, respectively. Additionally, evaluating the stopping criterion requires O(MS). Given that there are L paths, the iterative procedure is expected to run L times. A summary of the complexity analysis is presented in Table 1, while a comparison with existing message passing techniques as well as conventional methods (i.e., MUSIC and ESPRIT) is presented in Table 2, where N_it denotes the number of iterations and M = 256, S = 512, L = 3, N_it = 5 are considered.

4. Parameter Estimation with DNN

Table 1 indicates that the most computationally intensive stage is solving problem (41). This step can be further optimized by adopting a lower-complexity approach. Additionally, when multiple propagation paths are closely spaced, the ANM approach struggles to distinguish them, as observed in the numerical results. Therefore, replacing this procedure with a method that offers both lower complexity and higher resolution would be highly beneficial. This motivates us to develop a deep neural network (DNN)-based architecture in this section for fast and accurate channel parameter prediction from the received signal y^U.
In recent years, DNNs have gained significant attention and have demonstrated remarkable success in various domains, including image recognition, natural language processing, and autonomous driving [31]. One of the key advantages of DNN architectures is their ability to provide predictions with nearly linear time complexity once trained, making them well-suited for modern telecommunication systems that require low-latency processing. The DNN framework consists of multiple layers, each acting as a nonlinear mapping function. Given the input y^U and ϵ, the expected output (μ_1, ν_1) can be expressed as

(μ_1, ν_1) = f_n(f_{n−1}(⋯ f_1(y^U, ϵ))),   (46)

where f_n represents the mapping function of the nth layer. Once μ_1 and ν_1 are determined, g_1^U is computed using (43), thereby identifying the first propagation path. Subsequently, the same procedure as in (44) is applied iteratively for the remaining paths until the stopping criterion in (45) is met.

4.1. Exhaustive Search for Training Data Set

Before deploying the DNN for channel parameter prediction, it must first be trained in a supervised manner. This requires obtaining the desired output (μ̃_1, ν̃_1) for any given y^U and noise power ϵ. To achieve this, we adopt the conventional 2D-grid-based approach, where the 2D range μ ∈ [0, 1) and ν ∈ [0, 0.5) is discretized into γ_1 × γ_2 grid points. Let B denote the set of all grid points, given by

B = {(m/γ_1, n/(2γ_2)) : m = 1, …, γ_1; n = 1, …, γ_2}.   (47)

The estimated parameters (μ̃_1, ν̃_1) are obtained via an exhaustive search:

(μ̃_1, ν̃_1) = arg max_{(μ,ν) ∈ B} |u^H(μ, ν) y^U|² / ‖u(μ, ν)‖₂²,   (48)

and the corresponding channel gain g̃_1^U is computed using (43).
The input–output pairs (y^U, ϵ) and (μ̃_1, ν̃_1) form the training dataset for supervised DNN training. Since training occurs offline and computational complexity is not a primary concern, γ_1 and γ_2 can be set sufficiently large to enhance the precision of the estimated parameters (g̃_1^U, μ̃_1, ν̃_1).
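A sketch of this label-generation step: the exhaustive 2D search of (48) over the grid B in (47). The atom construction mirrors (21); the grid sizes and function names are illustrative assumptions.

```python
import numpy as np

def atom(mu, nu, S, M):
    # u(mu, nu) = p(mu) kron q(nu) in normalized frequencies, cf. Eq. (21)
    p = np.exp(-2j * np.pi * np.arange(1, S + 1) * mu)
    q = np.exp(-2j * np.pi * np.arange(1, M + 1) * nu)
    return np.kron(p, q)

def grid_label(y, S, M, gamma1=128, gamma2=128):
    """Exhaustive search of Eq. (48) over the grid B of Eq. (47),
    returning the training label (mu_1~, nu_1~) for one sample y."""
    best, best_val = (0.0, 0.0), -1.0
    for m in range(1, gamma1 + 1):
        for n in range(1, gamma2 + 1):
            mu, nu = m / gamma1, n / (2 * gamma2)
            u = atom(mu, nu, S, M)
            val = np.abs(u.conj() @ y) ** 2 / np.linalg.norm(u) ** 2
            if val > best_val:
                best_val, best = val, (mu, nu)
    return best
```

Because this runs offline, the quadratic cost in γ_1 γ_2 is acceptable, which is exactly the point made above.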

4.2. Architecture of the Proposed DNN

The proposed DNN architecture, illustrated in Figure 1, consists of three types of layers: input, hidden, and output layers.
The input layer corresponds to Re(y^U), Im(y^U), and ϵ: the received signal y^U is decomposed into its real and imaginary components, Re(y^U) and Im(y^U), respectively. As noted in [32], normalizing the inputs often improves learning efficiency and convergence speed. Following the approach in [33], normalization is applied in the pre-processing phase to preserve critical hidden features. The normalization factor Δ_n is defined as the ℓ2-norm of the real input values:

Δ_n = ‖Re(y^U)‖₂,   (49)

where it is worth noting that the Frobenius norm of a vector is simply a special case of the ℓ2-norm, but the ℓ2-norm is more consistent with vector-operation conventions. All real and imaginary inputs are then divided by Δ_n.
For the hidden layers, the rectified linear unit (ReLU) function, defined as ReLU(a) = max(0, a), is employed as the activation function. ReLU ensures nonnegative data flow and has been demonstrated to enhance training efficiency [34].
The output layer contains two neurons corresponding to μ1 and ν1. Only predictions satisfying μ ∈ [0, 1] and ν ∈ [0, 0.5] are considered valid. The channel gain g1^U is computed using (43), while predictions outside the valid range or with gains below ϵ are marked as invalid. The output regulation is summarized as follows:

g1^U = 0, if μ ∉ [0, 1], or ν ∉ [0, 0.5], or |g1^U| < ϵ;
g1^U, otherwise.

If g1^U = 0, the DNN has failed to provide a valid prediction, necessitating further training. Invalid outputs are discarded until a valid prediction is obtained.
The loss function, Loss(μ1, ν1), is defined as the sum of the squared l2-norm distances between the network output (μ1, ν1) and the values (μ̃1, ν̃1) obtained via exhaustive search:

Loss(μ1, ν1) = ‖μ1^U − μ̃1^U‖₂² + ‖ν1^U − ν̃1^U‖₂², if g1^U ≠ 0.
The proposed model is a multilayer perceptron (MLP): a fully connected feedforward network with multiple hidden layers. Each neuron in one layer is connected to every neuron in the subsequent layer, with data flowing in the forward direction. The input vector x0 is defined as

x0 = [Re{y^U}, Im{y^U}, ϵ]^T ∈ ℝ^{(2MN+1)×1}.
Denoting the output of the ith hidden layer by x_i, the transformation performed at layer i is given by

x_{i+1} = f(W_i^T x_i + b_i),

where W_i and b_i are the weight matrix and bias vector of the ith hidden layer, updated during training.
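A minimal sketch of the forward pass and output regulation follows. The layer sizes are purely illustrative, and a simple max-magnitude normalization stands in for the paper's Δn factor (an assumption for this sketch):

```python
import numpy as np

def relu(a):
    # ReLU(a) = max(0, a), applied elementwise
    return np.maximum(0.0, a)

def build_input(y, eps):
    """Stack [Re(y); Im(y); eps] into x0 of length 2MN+1.
    The max-magnitude normalization below is an assumed stand-in
    for the paper's Delta_n factor."""
    re, im = np.real(y), np.imag(y)
    delta = np.max(np.abs(re))
    delta = delta if delta > 0 else 1.0
    return np.concatenate([re / delta, im / delta, [eps]])

def forward(x0, weights, biases):
    """x_{i+1} = f(W_i^T x_i + b_i): ReLU hidden layers followed by
    a linear two-neuron output giving (mu_1, nu_1)."""
    x = x0
    for W, b in zip(weights[:-1], biases[:-1]):
        x = relu(W.T @ x + b)
    return weights[-1].T @ x + biases[-1]

def regulate(mu, nu, g, eps):
    # Output regulation: zero the gain for out-of-range or weak predictions.
    if not (0.0 <= mu <= 1.0) or not (0.0 <= nu <= 0.5) or abs(g) < eps:
        return 0.0
    return g
```

A zero return from `regulate` marks an invalid prediction, matching the g1^U = 0 rule in the text.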

4.3. Simplified LM-WC for Training

During the training stage, the weight and bias vectors for each hidden layer are updated by minimizing the loss function (51). In general, training algorithms fall into two categories: first-order methods, such as conventional gradient descent and error back-propagation [35], and second-order methods, including Newton’s method and the well-known Levenberg–Marquardt (LM) algorithm [36]. It is widely recognized that second-order algorithms typically achieve faster convergence, making them preferable for complex scenarios. In this work, we adopt the LM algorithm for training.
However, a common challenge in DNN training is the occurrence of diminishing gradients in certain hidden-layer neurons, causing the learning process to stall, an issue known as the flat-spot problem [37]. This phenomenon has been frequently observed in our numerical experiments. First-order algorithms can mitigate this issue by learning additional parameters to normalize neuron activations. However, this approach is not practical for the LM algorithm, as it involves computing a large Jacobian matrix [38]. To address this issue, ref. [38] proposed the LM with weight compression (LM-WC) algorithm, which adjusts neuron outputs to ensure they remain within the activation function's active region. This compression effectively modifies the inputs to the network's activation function, guiding them toward an activated state, as illustrated in Figure 2. By applying weight compression, the likelihood of successful training increases significantly.
The LM-WC algorithm effectively mitigates the flat-spot problem with high probability but introduces an additional compression parameter that must be determined. Consequently, it can only function properly after this parameter has been appropriately identified. In [38], the parameter is chosen to minimize the loss function, which requires re-training the network. Turning to our proposed DNN with the ReLU activation function, we observe that the compression process can be further simplified. If certain weight vectors remain unchanged for an extended period, the corresponding neurons are considered ‘stuck’. When the number of stuck neurons becomes sufficiently large, we compress the weight vector so that the maximum possible number of stuck neurons get activated (i.e., their ReLU inputs become positive). Given the original weight vector w , the compressed weight vector w c is defined as
w_c = c I w,

where c is the weight compression vector and I is an identity matrix of appropriate size. For a hidden layer with M outputs and N neurons, w and c can be further expressed as

w = [w_1, …, w_N],  c = [c_1, …, c_N],

so that each neuron's weight vector w_n is scaled by the corresponding compression factor c_n.
The value of c_n is computed as

c_n = Ω M / ( ‖w_n‖ ∑_{m=1}^{M} a_m ),
where Ω > 0 is the compression parameter, and a_m represents the mth output of the ReLU activation function. Instead of tuning Ω during training to minimize error, as done in [38], our simplified LM-WC algorithm selects Ω such that the number of positive a_m values is maximized. That is, we determine Ω by solving

Ω = arg max_{Ω ∈ ℝ} M(Ω),

where M(Ω) ∈ [0, M] denotes the number of positive a_m values after applying the compression parameter Ω in (56).
Since the exact range of Ω is unknown, an exhaustive search is impractical. Instead, we adopt an alternative approach to obtain a near-optimal solution. Beginning with an initial value of Ω, we iteratively multiply and divide Ω by a constant factor ρ, compare M(Ωρ) and M(Ω/ρ), and record the larger one. The process continues by evaluating M(Ωρ²) and M(Ω/ρ²), updating the best value accordingly, and stops at stage s when both M(Ωρ^s) and M(Ω/ρ^s) are smaller than the current maximum M(Ω_opt). At this point, weight compression is performed using Ω = Ω_opt. A summary of the proposed simplified LM-WC algorithm is presented in Algorithm 1. The initial Ω should be chosen near the active region of the activation function, and the step size ρ should be only slightly greater than one. The values Ω = 1 and ρ = 1.1, tested in [38], demonstrate strong performance in our simulations.
Algorithm 1 Simplified LM-WC algorithm
Require: Compression step size ρ, initial Ω
Ensure: Near-optimal Ω_opt
1: Index s = 1, current optimal O = 0
2: while M(Ω/ρ^s) > O or M(Ωρ^s) > O do
3:     if M(Ω/ρ^s) > M(Ωρ^s) then
4:         O = M(Ω/ρ^s), Ω_opt = Ω/ρ^s
5:     else
6:         O = M(Ωρ^s), Ω_opt = Ωρ^s
7:     end if
8:     s = s + 1
9: end while
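Algorithm 1 can be transcribed almost line for line. Here `count_active` is a caller-supplied stand-in for M(Ω), e.g. a routine counting neurons whose ReLU inputs turn positive after compression with the given Ω:

```python
def search_omega(count_active, omega=1.0, rho=1.1):
    """Simplified LM-WC parameter search (Algorithm 1).
    count_active(omega) plays the role of M(Omega): the number of
    neurons activated after compression with parameter omega."""
    s, best, omega_opt = 1, 0, omega
    while True:
        down = count_active(omega / rho ** s)
        up = count_active(omega * rho ** s)
        if down <= best and up <= best:
            break  # both directions are worse: stop at the current optimum
        if down > up:
            best, omega_opt = down, omega / rho ** s
        else:
            best, omega_opt = up, omega * rho ** s
        s += 1
    return omega_opt, best
```

With a unimodal M(Ω), the geometric steps ρ^s bracket the peak quickly, which is why a ρ only slightly above one works well.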

4.4. The Prediction Stage

The proposed DNN architecture requires a supervised learning dataset, consisting of input samples (y^U, ϵ) and their corresponding desired outputs (μ̃1, ν̃1), obtained via an exhaustive search. Given an input (y^U, ϵ), we perform an exhaustive search to determine (g̃1^U, μ̃1, ν̃1). Using these values in (44), we compute the residual signal y_r^U. Setting y_r^U as the new input y^U, another round of exhaustive search is conducted to extract the next (g̃1^U, μ̃1, ν̃1). By repeating this process iteratively, we can generate up to L training samples from a single received signal y^U.
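The peeling loop that turns one received signal into up to L training samples can be sketched as follows. The least-squares gain and the ϵ-based stopping rule are assumptions standing in for Eqs. (43) and (45), and `detect`/`atom` abstract the grid search and atom construction of Section 4.1:

```python
import numpy as np

def peel_paths(y, eps, detect, atom, L=3):
    """Generate (input, label) pairs from one received signal.
    detect(y) -> (mu, nu): any single-path estimator, e.g. the
    exhaustive grid search of Sec. 4.1.
    atom(mu, nu) -> steering vector u(mu, nu)."""
    samples = []
    for _ in range(L):
        mu, nu = detect(y)
        u = atom(mu, nu)
        # Least-squares gain of the detected path (assumed form of Eq. (43)).
        g = (u.conj() @ y) / (np.linalg.norm(u) ** 2)
        if abs(g) < eps:  # assumed stopping criterion, cf. Eq. (45)
            break
        samples.append(((y.copy(), eps), (mu, nu)))
        y = y - g * u  # residual signal y_r, cf. Eq. (44)
    return samples
```

Each iteration subtracts the strongest detected path, so a single L-path signal yields up to L labeled samples.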
In a real-world system, these two approaches can be integrated as follows. Initially, when the DNN is untrained, the base station (BS) employs the atomic norm minimization (ANM) method to estimate the downlink (DL) channel and serve the user equipment (UE). Concurrently, the DNN undergoes offline training using received signals and channel parameters estimated via a high-resolution exhaustive search. After a certain training period, we evaluate the DNN’s readiness by monitoring the probability of invalid predictions (where g 1 U = 0 ). If this probability falls below a predefined threshold, the system can switch to the DNN-based approach for serving UE. Once the DNN is deployed, it remains crucial to periodically validate its performance using the ANM approach. If the probability of invalid predictions exceeds the threshold, the system should revert to the ANM method while continuing to refine the DNN with new training data.
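The switching rule above can be sketched as a small controller. The 5% threshold and the sliding-window length are illustrative assumptions, not values from the paper:

```python
class EstimatorController:
    """Track the DNN invalid-prediction rate (g = 0 outputs) over a
    sliding window and switch between ANM and DNN serving modes.
    Threshold and window size are assumed for illustration."""

    def __init__(self, threshold=0.05, window=1000):
        self.threshold, self.window = threshold, window
        self.history = []      # 1 = invalid prediction, 0 = valid
        self.mode = "ANM"      # serve with ANM until the DNN is ready

    def record(self, prediction_valid):
        self.history.append(0 if prediction_valid else 1)
        if len(self.history) > self.window:
            self.history.pop(0)
        invalid_rate = sum(self.history) / len(self.history)
        # Serve with the DNN only while its invalid rate stays low;
        # otherwise revert to ANM and keep refining the DNN.
        self.mode = "DNN" if invalid_rate < self.threshold else "ANM"
        return self.mode
```

In deployment, `record` would be called once per validated prediction, with ANM used as the periodic ground-truth check described above.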

5. Numerical Results

We evaluate the performance of the proposed approach in a multi-user wideband mmWave MIMO-OFDM system in this section. The carrier frequency is set to f_c = 28 GHz, with a total bandwidth of f_s = 8 GHz and S = 512 subcarriers. The BS is equipped with M = 256 antennas, arranged with half-wavelength inter-element spacing. Eight single-antenna UEs are uniformly distributed over the angular range [−π/2, π/2]. The sparse mmWave channel for each UE is generated following the configuration in [39]. Each UE has L = 3 propagation paths, with channel gains g_l ∼ CN(0, 1). The angles of arrival are distributed as θ_l ∼ U(−π/6, π/6), while the propagation delays follow τ_l ∼ U(0, 2 × 10⁻⁸) s. All simulation results are averaged over 3000 independent trials.
To assess the accuracy of channel estimation, we adopt the normalized mean square error (NMSE) metric, computed as

NMSE = E{ (1/N) ∑_{n=1}^{N} ‖h − h_e‖₂² / ‖h‖₂² },

where h represents the actual channel, and h_e is the estimated channel.
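The empirical NMSE is straightforward to compute over a batch of trials:

```python
import numpy as np

def nmse(h_list, he_list):
    """Empirical NMSE: average of ||h - h_e||^2 / ||h||^2 over trials."""
    vals = [np.linalg.norm(h - he) ** 2 / np.linalg.norm(h) ** 2
            for h, he in zip(h_list, he_list)]
    return float(np.mean(vals))
```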

5.1. Performance of the Atomic Norm Minimization Approach

We first evaluate the performance of the proposed ANM approach by measuring NMSE against SNR. Two scenarios are considered: (a) the propagation paths for a single UE are evenly distributed within the angular spreading range θ_l ∼ U(−π/6, π/6), and (b) the propagation paths for a single UE are clustered within a narrow angular spreading range of 4°, as considered in [40]. In the latter case, if the central angle of arrival for UE p is θ_p, then the directional angles of its three paths are θ_p − 4°, θ_p, and θ_p + 4°, respectively. For comparison, we use the unified scheme from [40] as a lower-bound benchmark, which reduces training overhead but sacrifices some performance due to the use of basis vectors for channel estimation. The linear minimum mean square error (LMMSE) scheme serves as the upper-bound benchmark, where each antenna is assigned an orthogonal pilot signal to ensure maximum accuracy. Additionally, we compare the proposed scheme with the Newtonized orthogonal matching pursuit (NOMP) scheme [41].
Figure 3 presents the NMSE performance of the proposed scheme alongside the benchmark methods under different channel conditions. The results show that our approach achieves competitive performance compared to both the unified scheme [40] and the NOMP scheme [41]. The main performance loss in the unified scheme stems from the reduced number of inserted pilot signals and the limited basis used for channel estimation. For the NOMP approach, two primary sources of performance loss exist: first, its initial grid-based search introduces off-grid errors, especially when using a low over-sampling rate (only two, as in [41]); second, its Newtonized step descending search may converge to a local rather than a global optimum. In contrast, our proposed approach effectively mitigates both off-grid errors and local optima, except when identifying peaks in the power spectrum (42). Fortunately, peak detection over a given spectrum is a much simpler task than the iterative search and refinement process used in the NOMP approach, leading to lower overall error. Numerical results further reveal that all methods tend to achieve better NMSE performance when propagation paths are evenly distributed. This is because, in the clustered channel model where paths are confined within a narrow angular range, accurately extracting their directions becomes more challenging. Although our approach outperforms NOMP in clustered channels, its performance degradation is slightly more pronounced. This is due to the ANM algorithm’s tendency to minimize the number of detected paths, making it more sensitive to scenarios where paths are closely spaced. This sensitivity is a key motivation for introducing the DNN-based approach in this work.
Additionally, we evaluate the bit error rate (BER) performance of the proposed approach and benchmark methods. Simulations employ quadrature phase shift keying (QPSK) modulation with maximum likelihood (ML) detection. The results, shown in Figure 4, indicate that both the NOMP and proposed approaches achieve performance comparable to the LMMSE method. Moreover, consistent with our NMSE observations, the proposed ANM approach performs better when propagation paths are evenly distributed than when they are closely clustered.

5.2. Performance of the DNN-Based Approach

To assess the effectiveness of the proposed DNN, we conduct supervised training using data obtained from an exhaustive search over a high-resolution grid. The DNN is designed as a medium-sized network comprising one input layer, three fully connected hidden layers, and one output layer. The hidden layers contain 128, 256, and 384 neurons, respectively. Training and evaluation are performed on a personal computer equipped with an Intel® Core™ i7-11700K processor (Intel Corporation, Santa Clara, CA, USA) and 32 GB of RAM, using Matlab R2020b (The MathWorks Inc., Natick, MA, USA). For the training process, we compare three algorithms: the standard Levenberg–Marquardt (LM) algorithm, our proposed simplified LM-WC method (introduced in Section 4.3), and the conventional scaled conjugate gradient algorithm.
For training, we generate a total of 25,600 samples of y as inputs, with the corresponding (μ, ν, g) as desired outputs. The dataset consists of 12,800 samples from the sparse channel model (Scenario a) and 12,800 samples from the clustered channel model (Scenario b), following the channel configuration described in Section 5. The training process is conducted with the following hyperparameters: a batch size of 64, 120 epochs, and a total of 48,000 iterations, with a learning rate of 0.1.
Figure 5 compares the training accuracy of three algorithms, LM, our modified LM-WC, and the conventional gradient descent algorithm, in terms of their training loss. It can be observed that the second-order algorithms, LM and LM-WC, converge significantly faster than the first-order gradient descent algorithm. Additionally, as indicated in the figure, a flat-spot problem arises. To address this, the LM-WC algorithm applies weight compression when the average gradient changes by less than 5% over five consecutive epochs. By implementing our simplified LM-WC algorithm, the training process exits the flat spot approximately three epochs earlier. This demonstrates that the proposed LM-WC effectively mitigates training stalls while maintaining nearly the same performance. Figure 6 illustrates the average gradient during DNN training using different algorithms. When weight compression is applied, the gradient improves almost immediately, further confirming that LM-WC can boost the gradient when it becomes trapped in a flat spot. However, while LM-WC accelerates training convergence, its overall performance in terms of loss function and gradient remains nearly identical to that of conventional LM.
We then compare the training results of the proposed simplified LM-WC algorithm with two benchmarks: conventional LM and gradient descent. The training process allows for a maximum of 200 epochs to ensure all algorithms converge to their final error level while also passing the validation and testing data. The number of epochs and the total training time required for all testing data to be successfully verified are recorded. To provide a comprehensive evaluation of LM-WC, we consider different values of the learning rate γ and compression step size ρ in Algorithm 1. Table 3 summarizes the results, where LM-WC(γ, ρ) in the algorithm column denotes the simplified LM-WC algorithm with learning rate γ and compression step size ρ. The results indicate that, in general, LM-WC requires fewer epochs to converge than conventional LM but spends more time per epoch due to the additional weight compression operations. This trade-off makes LM-WC better suited to high-speed computational environments or scenarios with severe flat-spot issues. Additionally, a smaller learning rate γ results in more epochs and longer convergence time but tends to yield better performance. Meanwhile, a smaller compression step size ρ enhances the precision of the weight compression, which can further improve results, though at the cost of increased computation time for optimizing the compression parameter Ω.
Finally, we utilize the trained DNN to perform ANM-based channel parameter estimation and assess its effectiveness. Since offline training is used and training time is not a primary concern, the DNN can be trained for an extended period to enhance prediction accuracy. In this study, we adopt a DNN trained with the optimal WC parameters listed in Table 3, specifically LM-WC (0.01,1.1), for our DNN-aided ANM. The results, presented in Figure 7 and Figure 8, compare the proposed ANM with LMMSE and ANM-only benchmarks. As seen in the NMSE comparison in Figure 7, the DNN-ANM approach significantly outperforms the conventional ANM and achieves performance nearly on par with the LMMSE method. Notably, in scenarios with clustered paths—where many algorithms struggle—the DNN-ANM method excels. Since channel parameter extraction can be viewed as a pattern recognition problem, where distinct paths must be identified, the DNN-ANM effectively captures the subtle variations among clustered paths. This leads to superior performance compared to traditional mathematical approaches such as the benchmark NOMP.

6. Conclusions

In this work, we propose two approaches for channel parameter estimation in FDD systems: the Atomic Norm Minimization (ANM) approach and the Deep Neural Network (DNN)-based approach. The ANM method constructs an atomic set based on the phases of propagation paths and sequentially extracts each path by solving a dual minimization problem. This method effectively mitigates the off-grid mismatching error present in conventional grid-based techniques. To address potential path confusion in clustered channel environments and reduce computational complexity, we employ a DNN trained to perform parameter extraction. Additionally, we introduce a simplified weight compression algorithm to overcome training stalls. These two approaches can be used in a complementary manner: the ANM method remains operational while the DNN is training, and once the DNN is deployed, the ANM approach can serve as a validation tool for its predictions. Simulation results demonstrate that the ANM approach outperforms conventional methods, while the DNN-based approach further improves estimation accuracy, achieving performance close to that of the LMMSE method. The DNN training requires 25,600 samples (about 10 h of offline training), and its performance degrades significantly when the sample size is insufficient. In the future, few-shot learning (e.g., meta-learning) can be used to reduce the demand for training data.

Author Contributions

Conceptualization, K.X. and C.W.; methodology, K.X., S.L. and C.H.; software, K.X. and D.W.; validation, D.Z., R.J., H.R. and Z.J.; formal analysis, K.X. and X.C.; investigation, W.W.; resources, C.W.; data curation, S.L. and C.H.; writing—original draft preparation, K.X.; writing—review and editing, C.W., S.L. and K.X.; visualization, K.X.; supervision, C.W.; project administration, C.W.; funding acquisition, C.W. K.X. and S.L. contributed equally to this work and share first authorship. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported in part by the National Natural Science Foundation of China (NSFC) under grant No. 62301337; the Scientific Research Startup Fund of Shenzhen Polytechnic University under grant Nos. 6023271042K, 6024330002K, 6024210083K, 6023312046K, 6025310049K, 6025310020K, and 6024310032K; and the Shenzhen Science and Technology Program under grants GJHZ20220913144203007 and GJHZ20240218113900001.

Data Availability Statement

Data are contained within the article.

Acknowledgments

The authors would like to thank the anonymous reviewers for their constructive comments, which greatly improved the quality of this paper.

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A. Proof of the Existence of Matrix D

We define the power spectrum with u(μ, ν) = p(μ) ⊗ q(ν) as follows:

(p(μ) ⊗ q(ν))^H D (p(μ) ⊗ q(ν)) = ∑ [ (p(μ) ⊗ q(ν))* (p(μ) ⊗ q(ν))^T ⊙ D ] = ∑ [ u(μ, ν)* u(μ, ν)^T ⊙ D ],

where ⊙ denotes the Hadamard (elementwise) product and ∑ denotes the summation of all elements.
For convenience, we define the matrix G = u(μ, ν)* u(μ, ν)^T. As shown in (6) and (7), all elements of p(μ) and q(ν) are exponential functions, and the same holds for all elements of G. The matrix G ∈ C^{MN×MN} is both Hermitian and two-fold Toeplitz. To facilitate further analysis, we partition G into M × M blocks as follows:

G = [ G_0, G_1, …, G_{M−1};
      G_1^H, G_0, …, G_{M−2};
      ⋮, ⋮, ⋱, ⋮;
      G_{M−1}^H, G_{M−2}^H, …, G_0 ],
where each block G_i ∈ C^{N×N}, i = 0, …, M−1, is a Toeplitz matrix, with the main-diagonal elements of G_0 all equal to 1. Similarly, the Hermitian matrix D is partitioned into block matrices as follows:

D = [ D_{0,0}, D_{0,1}, …, D_{0,M−1};
      D_{0,1}^H, D_{0,0}, …, D_{0,M−2};
      ⋮, ⋮, ⋱, ⋮;
      D_{0,M−1}^H, D_{0,M−2}^H, …, D_{0,0} ].
We define the matrix T_j as

T_j = ∑_{i=0}^{M−1−j} D_{i,i+j}, j = 0, …, M−1,

subject to the constraints

∑ diag(T_0, 0) = 1,
∑ diag(T_0, k) = 0, k = 1, …, N−1,
∑ diag(T_j, k) = 0, k = 1−N, …, N−1, j = 1, …, M−1,

where diag(T, k) represents the kth diagonal of T, either above (k > 0) or below (k < 0) the main diagonal (k = 0 corresponds to the main diagonal itself), and the operator ∑ denotes the summation of all elements. For blocks D_{i,j} satisfying the constraints in (A4), the following condition holds:

∑ (G ⊙ D) = 1,

which ensures that (p(μ) ⊗ q(ν))^H D (p(μ) ⊗ q(ν)) = 1.

References

  1. Andrews, J.G.; Buzzi, S.; Choi, W.; Hanly, S.V.; Lozano, A.; Soong, A.C.K.; Zhang, J.C. What Will 5G Be? IEEE J. Sel. Areas Commun. 2014, 32, 1065–1082. [Google Scholar] [CrossRef]
  2. Marzetta, T.L. Noncooperative Cellular Wireless with Unlimited Numbers of Base Station Antennas. IEEE Trans. Wirel. Commun. 2010, 9, 3590–3600. [Google Scholar] [CrossRef]
  3. Li, K.; Li, Y.; Cheng, L.; Luo, Z.Q. Enhancing Multi-Stream Beamforming Through CQIs for 5G NR FDD Massive MIMO Communications: A Tuning-Free Scheme. IEEE Trans. Wirel. Commun. 2024, 23, 17508–17521. [Google Scholar] [CrossRef]
  4. Wang, A.; Yin, R.; Wei, G. Spatial-Sampling-Based Spectrum Aliasing Analysis and Antenna Array Structure Optimization for Massive MIMO Systems. IEEE Internet Things J. 2025, 12, 21618–21629. [Google Scholar] [CrossRef]
  5. Li, M.; Han, Y.; Lu, Z.; Jin, S.; Zhu, Y.; Wen, C.K. Keypoint Detection Empowered Near-Field User Localization and Channel Reconstruction. IEEE Trans. Wirel. Commun. 2025, 24, 5664–5677. [Google Scholar] [CrossRef]
  6. Xu, H.; Zhang, J.; Tang, P.; Tian, L.; Wang, Q.; Liu, G. An Empirical Study on Channel Reciprocity in TDD and FDD Systems. IEEE Open J. Veh. Technol. 2024, 5, 108–124. [Google Scholar] [CrossRef]
  7. Lu, Q.; Li, M.; Han, Y.; Jin, S. Learning-Based Rich Scattering Channel Estimation for U6G FDD XL-MIMO Systems. IEEE Commun. Lett. 2025, 29, 2969–2973. [Google Scholar] [CrossRef]
  8. Liao, J.; Vehkalahti, R.; Pllaha, T.; Han, W.; Tirkkonen, O. Modular CSI Quantization for FDD Massive MIMO Communication. IEEE Trans. Wirel. Commun. 2023, 22, 8543–8558. [Google Scholar] [CrossRef]
  9. Tan, J.; Wang, J.; Song, J. Angle-Domain Partition Beam Pattern-Based Beam Training in Sub-THz Extremely Large-Scale Antenna Array Communication Systems. IEEE Trans. Broadcast. 2025, 71, 741–755. [Google Scholar] [CrossRef]
  10. Qing, C.; Liu, Z.; Hu, W.; Zhang, Y.; Cai, X.; Du, P. LoS Sensing-Based Channel Estimation in UAV-Assisted OFDM Systems. IEEE Wirel. Commun. Lett. 2024, 13, 1320–1324. [Google Scholar] [CrossRef]
  11. Qing, C.; Hu, W.; Liu, Z.; Ling, G.; Cai, X.; Du, P. Sensing-Aided Channel Estimation in OFDM Systems by Leveraging Communication Echoes. IEEE Internet Things J. 2024, 11, 38023–38039. [Google Scholar] [CrossRef]
  12. Li, J.; Da Costa, M.F.; Mitra, U. Joint Localization and Orientation Estimation in Millimeter-Wave MIMO OFDM Systems via Atomic Norm Minimization. IEEE Trans. Signal Process. 2022, 70, 4252–4264. [Google Scholar] [CrossRef]
  13. Groll, H.; Gerstoft, P.; Hofer, M.; Blumenstein, J.; Zemen, T.; Mecklenbräuker, C.F. Scatterer Identification by Atomic Norm Minimization in Vehicular mm-Wave Propagation Channels. IEEE Access 2022, 10, 102334–102354. [Google Scholar] [CrossRef]
  14. He, S.; Wang, J.; Huang, Y.; Ottersten, B.; Hong, W. Codebook-Based Hybrid Precoding for Millimeter Wave Multiuser Systems. IEEE Trans. Signal Process. 2017, 65, 5289–5304. [Google Scholar] [CrossRef]
  15. Liu, Z.; Ma, B.; Liu, J.; Yang, K.; Wang, Y. Joint DOA-Range Estimation for Coherent Signals Exploiting Moving Time-Modulated Frequency Diverse Coprime Array. IEEE Signal Process. Lett. 2025, 32, 3186–3190. [Google Scholar] [CrossRef]
  16. Patra, R.K. A Novel Third-Order Nested Array for DOA Estimation with Increased Degrees of Freedom. IEEE Signal Process. Lett. 2025, 32, 1475–1479. [Google Scholar] [CrossRef]
  17. Imtiaz, S.; Dahman, G.S.; Rusek, F.; Tufvesson, F. On the directional reciprocity of uplink and downlink channels in Frequency Division Duplex systems. In Proceedings of the 2014 IEEE 25th Annual International Symposium on Personal, Indoor, and Mobile Radio Communication (PIMRC), Washington, DC, USA, 2–5 September 2014; IEEE: Piscataway, NJ, USA, 2014. [Google Scholar] [CrossRef]
  18. Series, P. Propagation data and prediction methods for the planning of indoor radiocommunication systems and radio local area networks in the frequency range 900 MHz to 100 GHz. In Recommendation ITU-R; Electronic Publication: Middlesex, UK, 2012; pp. 1238–1247. [Google Scholar]
  19. Zhang, X.; Xu, L.; Xu, L.; Xu, D. Direction of Departure (DOD) and Direction of Arrival (DOA) Estimation in MIMO Radar with Reduced-Dimension MUSIC. IEEE Commun. Lett. 2010, 14, 1161–1163. [Google Scholar] [CrossRef]
  20. Li, J.; Zhang, X.; Jiang, D. DOD and DOA estimation for bistatic coprime MIMO radar based on combined ESPRIT. In Proceedings of the 2016 CIE International Conference on Radar (RADAR), Guangzhou, China, 10–13 October 2016; IEEE: Piscataway, NJ, USA, 2016; pp. 1–4. [Google Scholar] [CrossRef]
  21. Rappaport, T.S.; MacCartney, G.R.; Samimi, M.K.; Sun, S. Wideband Millimeter-Wave Propagation Measurements and Channel Models for Future Wireless Communication System Design. IEEE Trans. Commun. 2015, 63, 3029–3056. [Google Scholar] [CrossRef]
  22. Samimi, M.K.; Rappaport, T.S. 3-D statistical channel model for millimeter-wave outdoor mobile broadband communications. In Proceedings of the 2015 IEEE International Conference on Communications (ICC), London, UK, 8–12 June 2015; IEEE: Piscataway, NJ, USA, 2015; pp. 2430–2436. [Google Scholar] [CrossRef]
  23. MacCartney, G.R.; Samimi, M.K.; Rappaport, T.S. Exploiting directionality for millimeter-wave wireless system improvement. In Proceedings of the 2015 IEEE International Conference on Communications (ICC), London, UK, 8–12 June 2015; IEEE: Piscataway, NJ, USA, 2015; pp. 2416–2422. [Google Scholar] [CrossRef]
  24. Chandrasekaran, V.; Recht, B.; Parrilo, P.A.; Willsky, A.S. The Convex Geometry of Linear Inverse Problems. Found. Comput. Math. 2012, 12, 805–849. [Google Scholar] [CrossRef]
  25. Prasobh Sankar, R.S.; Deepak, B.; Chepuri, S.P. Joint Communication and Radar Sensing with Reconfigurable Intelligent Surfaces. In Proceedings of the 2021 IEEE 22nd International Workshop on Signal Processing Advances in Wireless Communications (SPAWC), Lucca, Italy, 27–30 September 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 471–475. [Google Scholar] [CrossRef]
  26. Tang, W.G.; Jiang, H.; Pang, S.X. Grid-Free DOD and DOA Estimation for MIMO Radar via Duality-Based 2D Atomic Norm Minimization. IEEE Access 2019, 7, 60827–60836. [Google Scholar] [CrossRef]
  27. Tütüncü, R.H.; Toh, K.C.; Todd, M.J. Solving semidefinite-quadratic-linear programs using SDPT3. Math. Program. 2003, 95, 189–217. [Google Scholar] [CrossRef]
  28. Toh, K.C.; Todd, M.J.; Tütüncü, R.H. SDPT3—A MATLAB software package for semidefinite programming, version 1.3. Optim. Method. Softw. 1999, 11, 545–581. [Google Scholar] [CrossRef]
  29. Li, M.; Zhang, S.; Ge, Y.; Gao, F.; Fan, P. Joint Channel Estimation and Data Detection for Hybrid RIS Aided Millimeter Wave OTFS Systems. IEEE Trans. Commun. 2022, 70, 6832–6848. [Google Scholar] [CrossRef]
  30. Sun, P.; Dong, M.; Guo, Q.; Cui, J.; Yu, H.; Liu, F. Low Complexity Channel Estimation Based on UAMP for Orthogonal Time Frequency Space Systems. IEEE Commun. Lett. 2025, 29, 2208–2212. [Google Scholar] [CrossRef]
  31. Sun, H.; Chen, X.; Shi, Q.; Hong, M.; Fu, X.; Sidiropoulos, N.D. Learning to Optimize: Training Deep Neural Networks for Interference Management. IEEE Trans. Signal Process. 2018, 66, 5438–5453. [Google Scholar] [CrossRef]
  32. Wang, L.; Zhang, X.; Su, H.; Zhu, J. A Comprehensive Survey of Continual Learning: Theory, Method and Application. IEEE Trans. Pattern Anal. Mach. Intell. 2024, 46, 5362–5383. [Google Scholar] [CrossRef]
  33. Alkhateeb, A.; Alex, S.; Varkey, P.; Li, Y.; Qu, Q.; Tujkovic, D. Deep Learning Coordinated Beamforming for Highly-Mobile Millimeter Wave Systems. IEEE Access 2018, 6, 37328–37348. [Google Scholar] [CrossRef]
  34. Mao, Q.; Hu, F.; Hao, Q. Deep Learning for Intelligent Wireless Networks: A Comprehensive Survey. IEEE Commun. Surv. Tut. 2018, 20, 2595–2621. [Google Scholar] [CrossRef]
  35. Rumelhart, D.E.; Hinton, G.E.; Williams, R.J. Learning representations by back-propagating errors. Cogn. Model. 1988, 5, 533–536. [Google Scholar]
  36. Hagan, M.T.; Menhaj, M.B. Training feedforward networks with the Marquardt algorithm. IEEE Trans. Neural Netw. 1994, 5, 989–993. [Google Scholar] [CrossRef]
  37. Vitela, J.E.; Reifman, J. Premature saturation in backpropagation networks: Mechanism and necessary conditions. Neural Netw. 1997, 10, 721–735. [Google Scholar] [CrossRef]
  38. Smith, J.S.; Wu, B.; Wilamowski, B.M. Neural Network Training With Levenberg–Marquardt and Adaptable Weight Compression. IEEE Trans. Neural Netw. Learn. Syst. 2019, 30, 580–587. [Google Scholar] [CrossRef]
  39. Gao, Z.; Hu, C.; Dai, L.; Wang, Z. Channel Estimation for Millimeter-Wave Massive MIMO with Hybrid Precoding over Frequency-Selective Fading Channels. IEEE Commun. Lett. 2016, 20, 1259–1262. [Google Scholar] [CrossRef]
  40. Xie, H.; Gao, F.; Zhang, S.; Jin, S. A Unified Transmission Strategy for TDD/FDD Massive MIMO Systems With Spatial Basis Expansion Model. IEEE Trans. Veh. Technol. 2017, 66, 3170–3184. [Google Scholar] [CrossRef]
  41. Han, Y.; Hsu, T.H.; Wen, C.K.; Wong, K.K.; Jin, S. Efficient Downlink Channel Reconstruction for FDD Multi-Antenna Systems. IEEE Trans. Wirel. Commun. 2019, 18, 3161–3176. [Google Scholar] [CrossRef]
Figure 1. Architecture of the proposed DNN (schematic diagram).
Figure 2. Weight compression against a flat spot.
Figure 3. NMSE performance comparison for the proposed ANM approach. (a) Paths evenly distributed. (b) Paths clustered [40,41].
Figure 4. BER performance comparison for the proposed ANM approach. (a) Paths evenly distributed. (b) Paths clustered [40,41].
Figure 5. Training accuracy comparison of different algorithms.
Figure 6. Average gradient comparison of different algorithms.
Figure 7. MMSE performance comparison for the DNN-based approach.
Figure 8. BER performance comparison for the DNN-based approach.
Table 1. Complexity of ANM.

Stage                       | Complexity (FLOPs)
----------------------------|---------------------------
Check stopping criterion    | O(LMS)
Build matrix D              | O(LMS)
Solve (41) with SDPT3       | O(12 M^2 S^2 + 2 M S^3)
Find peaks in the spectrum  | O(L p_m p_n M S)
Overall complexity          | O(M^2 S^2 + M S^3)
Table 2. Comparison of Complexity with Message Passing Techniques.

Method                       | Complexity                          | FLOPs
-----------------------------|-------------------------------------|-----------
Proposed ANM                 | O(M^2 S^2 + M S^3)                  | 5.2 × 10^10
Joint Data Detection [29]    | O(N_it M^2 S^3)                     | 4.6 × 10^13
Unitary Approximation [30]   | O(N_it M^2 S^3 + M^3 S^2)           | 6.6 × 10^13
MUSIC                        | O(M^2 L + N_it S^3 + M^3 S^2)       | 7.5 × 10^13
ESPRIT                       | O(M^2 L + M^3 + M S^2 + S^3)        | 1.6 × 10^13
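The relative ordering in Table 2 can be sanity-checked by evaluating the asymptotic expressions directly. The sketch below uses assumed illustrative dimensions (M antennas, S subcarriers, L grid points, N_it iterations), not the paper's simulation settings, and omits the constant factors that big-O notation hides:

```python
# Illustrative evaluation of the asymptotic costs in Table 2.
# M, S, L, N_it are assumed example values, not the paper's; constants omitted.
M, S, L, N_it = 128, 256, 64, 50

anm     = M**2 * S**2 + M * S**3                # proposed ANM
joint   = N_it * M**2 * S**3                    # joint data detection [29]
unitary = N_it * M**2 * S**3 + M**3 * S**2      # unitary approximation [30]
music   = M**2 * L + N_it * S**3 + M**3 * S**2  # MUSIC
esprit  = M**2 * L + M**3 + M * S**2 + S**3     # ESPRIT

for name, cost in [("ANM", anm), ("Joint [29]", joint), ("Unitary [30]", unitary),
                   ("MUSIC", music), ("ESPRIT", esprit)]:
    print(f"{name:12s} ~ {cost:.2e} FLOPs")
```

With these (hypothetical) dimensions, the ANM expression is several orders of magnitude below the iterative message-passing schemes, because it carries no N_it factor on its dominant term.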
Table 3. Training Result.

Algorithm          | Epochs | Time (s) | Final MSE
-------------------|--------|----------|------------
Gradient Descent   | 134.53 | 1.2      | 1.13 × 10^−7
LM                 | 97.79  | 29.6     | 1.43 × 10^−11
LM-WC (0.1, 1.1)   | 75.58  | 76.2     | 1.44 × 10^−11
LM-WC (0.1, 1.5)   | 78.38  | 52.0     | 1.77 × 10^−11
LM-WC (0.01, 1.1)  | 83.09  | 56.5     | 2.23 × 10^−12
LM-WC (0.01, 1.5)  | 86.79  | 53.7     | 3.14 × 10^−12
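The LM rows in Table 3 refer to Levenberg–Marquardt training [38], whose core update is a damped Gauss–Newton step, Δw = −(JᵀJ + μI)⁻¹Jᵀe. The following is a minimal sketch of that step on a toy linear least-squares problem; it is not the paper's training code, and `lm_step`, the dimensions, and the damping value `mu` are all assumptions for illustration:

```python
import numpy as np

def lm_step(J, e, w, mu=1e-3):
    """One Levenberg-Marquardt update: w <- w - (J^T J + mu*I)^(-1) J^T e."""
    H = J.T @ J + mu * np.eye(w.size)  # damped approximate Hessian
    return w - np.linalg.solve(H, J.T @ e)

# Toy residual r(w) = A w - b, whose Jacobian is simply A.
rng = np.random.default_rng(0)
A = rng.standard_normal((20, 5))
b = rng.standard_normal(20)

w = np.zeros(5)
for _ in range(50):
    e = A @ w - b          # current residual
    w = lm_step(A, e, w)   # damped Gauss-Newton update

print(float(np.mean((A @ w - b) ** 2)))  # approaches the least-squares MSE
```

The damping μ interpolates between gradient descent (large μ) and Gauss–Newton (small μ); the weight-compression variant (LM-WC) additionally rescales weights to escape the flat spots illustrated in Figure 2.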

Share and Cite

Xu, K.; Li, S.; Huang, C.; Wu, D.; Wei, C.; Zhang, D.; Jin, R.; Ren, H.; Ji, Z.; Chen, X.; et al. Deep Learning-Driven Atomic Norm Optimization for Accurate Downlink Channel Estimation in FDD Systems. Electronics 2026, 15, 1461. https://doi.org/10.3390/electronics15071461

