Attention-Based AFDM Channel Estimation Network Using Diagonal Reconstruction

Yin, Jiale; Xu, Shangzhi; Li, Zhipeng

doi:10.3390/electronics15050957


by Jiale Yin, Shangzhi Xu and Zhipeng Li *

State Key Laboratory of High-Speed Maglev Transportation Technology, Department of Information and Communication Engineering, Tongji University, Shanghai 201804, China

* Author to whom correspondence should be addressed.

Electronics 2026, 15(5), 957; https://doi.org/10.3390/electronics15050957

Submission received: 25 January 2026 / Revised: 18 February 2026 / Accepted: 19 February 2026 / Published: 26 February 2026

(This article belongs to the Section Microwave and Wireless Communications)


Abstract

Affine Frequency Division Multiplexing (AFDM) has been proposed for future high-mobility communication scenarios. However, existing AFDM channel estimation methods suffer significant performance degradation under fractional Doppler conditions due to path energy dispersion. To address this issue, we propose a deep learning network that adaptively learns path energy dispersion through a 1D processing module and a Transformer block, based on the diagonal reconstruction of the AFDM effective channel matrix. The 1D processing module employs convolutions with different kernel sizes to extract pilot features, and the Transformer block models varying energy dispersion patterns. The proposed method requires neither prior knowledge of the number of paths nor the assumption of distinct path delays. Simulation results demonstrate that at a Signal-to-Noise Ratio (SNR) of 25 dB, the proposed method achieves up to a 4 dB gain in Normalized Mean Square Error (NMSE) and a 6 dB improvement in Bit Error Rate (BER) over existing traditional methods under fractional Doppler conditions.

Keywords: affine frequency division multiplexing; deep learning; channel estimation

1. Introduction

Next-generation wireless communication systems (6G) are envisioned to encompass high-mobility communication scenarios, such as high-speed railway networks, vehicle-to-everything communications, unmanned aerial vehicles, and low-earth-orbit satellite constellations [1]. In the current 4G and 5G standards, Orthogonal Frequency Division Multiplexing (OFDM) serves as the dominant waveform. However, its inherent advantages are severely compromised in high-mobility environments. The rapid relative motion between the transmitter and receiver induces significant Doppler shifts, destroying the orthogonality among subcarriers and leading to severe Inter-Carrier Interference (ICI) [2,3]. Consequently, the research community has shifted focus towards designing new waveforms capable of achieving full diversity in doubly dispersive channels.

To address the challenges posed by high-mobility environments, Orthogonal Time Frequency Space (OTFS) modulation was introduced [4], which transforms the time-varying fading channel into a quasi-static 2D channel to achieve full diversity [5,6]. However, the 2D nature of OTFS entails high pilot overhead. As a promising alternative, Affine Frequency Division Multiplexing (AFDM) [7] has recently emerged. AFDM achieves full diversity with lower complexity by adjusting the chirp parameters based on the channel’s delay-Doppler profile. Moreover, as a one-dimensional modulation scheme, AFDM achieves channel estimation with lower guard-symbol overhead than OTFS [8].

Given its potential, AFDM has garnered widespread attention, with recent studies expanding its applicability across various domains, including Index Modulation (IM) [9], multi-user uplink schemes [10], Reconfigurable Intelligent Surfaces (RISs) [11], and Integrated Sensing and Communication (ISAC) [12,13]. However, the foundation of all the aforementioned domains is the availability of accurate Channel State Information (CSI). While several conventional channel estimation schemes have been proposed, they exhibit significant susceptibility to fractional Doppler shifts and inter-Doppler interference (IDoI), leading to severe degradation in estimation accuracy (detailed in Section 2).

To perform AFDM channel estimation under fractional-Doppler conditions, we propose a novel deep learning network based on the Transformer architecture. Our main contributions are outlined as follows:

1. A 1D feature processing module is applied to capture pilot features with varying kernel sizes: small kernels for distinct paths and large kernels for smeared energy clusters.
2. A self-attention module, based on Vision Transformer (ViT) blocks, is integrated to act as a global energy aggregator. This allows the network to effectively suppress path ambiguity by correlating and gathering the dispersed energy to implicitly model path parameters.
3. Compared to conventional methods, the proposed method requires neither prior knowledge of the number of paths nor the assumption of distinct path delays, and it achieves lower NMSE and BER with an efficient architecture.

The paper is organized as follows. In Section 2, the related works are reviewed. In Section 3, the AFDM system model is described, and the embedded pilot channel estimation method is discussed. In Section 4, the details of the proposed module, data collection, and training are described. In Section 5, comparative experiments and the complexity of different methods are considered. Finally, the conclusion is drawn in Section 6.

2. Related Works

2.1. Traditional Channel Estimation in AFDM

Several channel estimation (CE) schemes have been proposed. An embedded pilot-aided scheme was introduced in [14], utilizing guard bands to isolate the pilot from data symbols, thereby mitigating pilot-data interference. Exploiting the unique structure of AFDM, the Embedded Pilot-Aided Diagonal Reconstruction (EPA-DR) method [15] showed that the full AFDM effective channel matrix can be reconstructed from a single column using a deterministic transformation factor. Other approaches, such as the Embedded Pilot-Aided Approximated Maximum Likelihood (EPA-AML) [16], estimate channel parameters through a brute-force search over Doppler shifts. Additionally, Compressed Sensing (CS)-based methods, including Orthogonal Matching Pursuit (OMP) and off-grid estimation, have been proposed [17,18].

Despite their commendable performance in ideal settings, these methods exhibit significant susceptibility to fractional Doppler shifts and inter-Doppler interference (IDoI), both of which lead to a severe degradation in estimation accuracy. EPA-DR relies on hard thresholds to detect paths. When fractional Doppler occurs, the energy of a single path spreads beyond its principal location. Because fixed thresholds lack the flexibility to adapt to varying Doppler shifts, spreading energy is often filtered out, leading to a significant loss of channel information. EPA-AML operates on the a priori assumption of distinct path delays and a known number of propagation paths. However, when paths share the same delay, IDoI significantly impairs the estimation accuracy of delay and Doppler parameters. This limitation inherently compromises the practical viability of the method in complex scattering environments.

2.2. Deep Learning-Based Channel Estimation in OTFS

The severe path energy dispersion and interference induced by fractional Doppler shifts are not unique to AFDM; they also pose significant challenges in OTFS systems. To overcome the limitations of traditional algorithms in these environments, Deep Learning (DL) has been applied for OTFS channel estimation. For instance, some approaches formulate the task as an image recovery problem, utilizing video denoising networks to exploit correlations among multiple coarse channel estimates for enhanced accuracy [19]. Deep Residual Attention Network (DRAN) employs cascaded attention mechanisms to serve as adaptive filters that explicitly suppress interference from data and noise around embedded pilots [20]. Inspired by the structural similarity between the effective channel matrices of OTFS and AFDM [21], these successes demonstrate that DL models offer a powerful alternative for AFDM.

2.3. Motivation

In AFDM channel estimation, each position in the truncated received signal inherently corresponds to a specific delay-Doppler pair. Under integer-Doppler conditions, channel parameters can be efficiently estimated by simply detecting the presence of a signal at these positions. However, when fractional Doppler shifts occur, the energy of a single path disperses infinitely across the domain, leading to severe path ambiguity that renders direct parameter extraction infeasible.

Motivated by this physical mechanism, our network first expands the dimension of the received signal to mitigate inter-path interference and alleviate initial path ambiguity. Subsequently, the received signal sequence is divided into patches, effectively grouping specific positions of the received signal that inherently correspond to delay-Doppler pairs. By processing these position-specific segments, the patch embedding layer implicitly models the underlying path parameters from the dispersed energy within each patch. The self-attention mechanism is then employed to effectively combat path ambiguity by capturing global correlations among the dispersed energy components. Finally, the network leverages the inherent diagonal reconstructability of the AFDM effective channel by utilizing transposed convolutions to simulate this reconstruction behavior. The network seamlessly maps the implicitly modeled path parameters back into the full AFDM effective channel matrix.

2.4. Notation and Normalization

To maintain consistency and clarity throughout the system model, the key physical variables and their normalized discrete-time counterparts are summarized in Table 1. We define the bandwidth as B (Hz) and the total number of subcarriers as N. The sampling interval (the time between consecutive samples) is defined as $T_{s} = 1 / B$ (seconds). Consequently, the total duration of one AFDM symbol (excluding the prefix) is $T_{sym} = N T_{s}$ (seconds). The subcarrier spacing, which also represents the Doppler resolution, is given by $Δf = 1 / T_{sym} = B / N$ (Hz).

Physical parameters, such as the path delay $τ_{i}$ (seconds) and the Doppler shift $f_{i}$ (Hz), are normalized to these fundamental units. Specifically, the normalized delay $l_{i} = τ_{i} / T_{s}$ is assumed to be an integer based on the high-bandwidth sampling assumption, and the normalized Doppler shift $v_{i} = f_{i} / Δf = f_{i} N T_{s}$ includes both an integer component $α_{i}$ and a fractional component $a_{i}$.
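The normalization above can be illustrated with a short numerical sketch. The function name `normalize_path` and the example values (B = 1.92 MHz, N = 128, an 18 kHz Doppler shift) are hypothetical and chosen only for illustration; they are not the paper's simulation settings. The fractional part is placed in (-1/2, 1/2], matching the range used later in Section 3.1.2.

```python
import math

def normalize_path(bandwidth_hz, n_subcarriers, delay_s, doppler_hz):
    """Map a physical (delay, Doppler) pair to normalized AFDM indices."""
    t_s = 1.0 / bandwidth_hz                  # sampling interval T_s = 1 / B
    delta_f = bandwidth_hz / n_subcarriers    # Doppler resolution = B / N
    l = round(delay_s / t_s)                  # normalized delay, assumed integer
    v = doppler_hz / delta_f                  # normalized Doppler shift
    alpha = math.ceil(v - 0.5)                # integer component
    a = v - alpha                             # fractional component in (-1/2, 1/2]
    return l, v, alpha, a

# Hypothetical example: 1.92 MHz bandwidth, 128 subcarriers (15 kHz spacing)
l, v, alpha, a = normalize_path(1.92e6, 128, 2 / 1.92e6, 18e3)
print(l, round(v, 3), alpha, round(a, 3))
```

With these assumed values the delay spans two sampling intervals, and the 18 kHz shift corresponds to one full Doppler bin plus a fractional offset of about 0.2 bins, which is the kind of off-grid component that causes the energy dispersion discussed in this paper.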

3. Review of the AFDM System Model and Channel Estimation

In this section, the AFDM system model and the embedded pilot-aided channel estimation method are briefly introduced. The block diagram of AFDM systems is demonstrated in Figure 1.

3.1. System Model of AFDM

In the following, we consider a Single-Input, Single-Output (SISO) AFDM system. The modulation and demodulation schemes, along with the channel model of AFDM, are presented.

3.1.1. Modulation

All transmit symbols are defined in the Discrete Affine Fourier Transform (DAFT) domain. Let $x = [x[0], \dots, x[N-1]]^{T} \in \mathbb{A}^{N \times 1}$ denote the vector of N phase shift keying (PSK)/quadrature amplitude modulation (QAM) symbols in the DAFT domain, which includes pilots, data, and guard symbols, where $\mathbb{A}$ represents the modulation alphabet. Following the serial-to-parallel (S/P) operation, the discrete-time domain representation of the signal $s \in \mathbb{C}^{N \times 1}$ is obtained via the Inverse Discrete Affine Fourier Transform (IDAFT), which can be specifically formulated as

s [n] = \sum_{m = 0}^{N - 1} x [m] \cdot \frac{1}{\sqrt{N}} e^{j 2 π (c_{1} n^{2} + c_{2} m^{2} + \frac{n m}{N})}

(1)

where n and m denote the time-domain and DAFT-domain indices, respectively, and N represents the number of subcarriers. The AFDM signal has a bandwidth $B = \frac{1}{T_{s}}$ and a subcarrier spacing (Doppler resolution) $Δf = \frac{B}{N} = \frac{1}{N T_{s}}$. $c_{1}$ and $c_{2}$ are adjustable parameters of AFDM, where $c_{1}$ determines the slope of the chirp. To achieve the optimal diversity order for the given delay-Doppler profile of the environment [16], $c_{2}$ can be an arbitrary irrational number or a rational number sufficiently smaller than $\frac{1}{2N}$, and $c_{1}$ should be tuned as follows

c_{1} = \frac{2 (α_{\max} + k_{v}) + 1}{2 N}

(2)

where $k_{v}$ denotes the number of additional guard symbols between the pilot and data, and $α_{\max}$ represents the maximum normalized integer Doppler shift, which will be further explored later in this paper. The modulation of AFDM in Equation (1) can then be rewritten in matrix form as

s = A^{H} x = Λ_{c_{1}}^{H} F^{H} Λ_{c_{2}}^{H} x

(3)

where $A = Λ_{c_{2}} F Λ_{c_{1}} \in \mathbb{C}^{N \times N}$ represents the DAFT transform matrix, $F$ represents the N-point discrete Fourier transform (DFT) matrix with entries $e^{-j 2π m n / N} / \sqrt{N}$, $m, n \in \{0, \dots, N-1\}$, and $Λ_{c}$ is a frequency adjustment diagonal matrix given by

Λ_{c} = diag ({[1, e^{- j 2 π c}, \dots, e^{- j 2 π c {(N - 1)}^{2}}]}^{T}) .

(4)

A chirp-periodic prefix (CPP) should be added to $s$ before transmission; this prefix, analogous to the cyclic prefix (CP) in OFDM systems, aims to mitigate inter-symbol interference caused by multipath propagation. After parallel-to-serial conversion, the signal is transmitted in the time domain.
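The modulation step can be sketched in a few lines of standard-library Python. This is a minimal illustration of Equations (1) and (3), not the authors' implementation: the function names are hypothetical, indices run from 0 to N-1, and the CPP is omitted. The example values (N = 8, c_1 = 5/16, c_2 = 0.001) are assumptions chosen only to keep the example small.

```python
import cmath
import math

def daft_matrix(n, c1, c2):
    """DAFT matrix A = L_c2 F L_c1, with
    A[m][k] = exp(-j*2*pi*(c2*m^2 + m*k/n + c1*k^2)) / sqrt(n)."""
    scale = 1.0 / math.sqrt(n)
    return [[scale * cmath.exp(-2j * math.pi * (c2 * m * m + m * k / n + c1 * k * k))
             for k in range(n)]
            for m in range(n)]

def afdm_modulate(x, c1, c2):
    """IDAFT of Eq. (1): s = A^H x (time-domain samples, CPP not included)."""
    n = len(x)
    a = daft_matrix(n, c1, c2)
    return [sum(a[m][k].conjugate() * x[m] for m in range(n)) for k in range(n)]

def afdm_demodulate(r, c1, c2):
    """DAFT: y = A r, mapping time-domain samples back to the DAFT domain."""
    n = len(r)
    a = daft_matrix(n, c1, c2)
    return [sum(a[m][k] * r[k] for k in range(n)) for m in range(n)]

# Hypothetical example: N = 8, alpha_max = 1, k_v = 1 -> c1 = 5 / 16
x = [1 + 1j, -1 + 1j, 0, 0, 1 - 1j, 0, -1 - 1j, 0.5]
s = afdm_modulate(x, c1=5 / 16, c2=0.001)
y = afdm_demodulate(s, c1=5 / 16, c2=0.001)
print(max(abs(a - b) for a, b in zip(x, y)))  # close to zero for an ideal channel
```

Because A is unitary, demodulating the unmodified transmit samples recovers the DAFT-domain symbols up to floating-point error, which is a convenient sanity check before a channel is inserted between the two steps.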

3.1.2. Channel Model

The impulse response of a doubly selective fading channel at time n and delay l can be represented as

g_{n} (l) = \sum_{i = 1}^{P} h_{i} e^{- j 2 π f_{i} n} δ (l - τ_{i} / T_{s}),

(5)

where P is the number of propagation paths; $δ(\cdot)$ is the Dirac delta function; and $h_{i}$, $f_{i}$, and $τ_{i}$ are the complex path coefficient, Doppler shift, and delay of the i-th propagation path, respectively.

The discrete-time received signal $r = [r[0], \dots, r[N-1]]^{T} \in \mathbb{C}^{N \times 1}$, after propagating through the time-varying multipath channel, is characterized as follows

r [n] = \sum_{i = 1}^{P} h_{i} s [n - l_{i}] e^{- j \frac{2 π}{N} v_{i} n} + w [n],

(6)

where $w[n]$ represents the additive white Gaussian noise (AWGN), modeled by a complex normal distribution $\mathcal{CN}(0, N_{0})$, and $N_{0}$ is the variance of the AWGN, which corresponds to the noise power. Furthermore, $v_{i}$ denotes the Doppler shift associated with the i-th propagation path, normalized by the subcarrier spacing. Correspondingly, $l_{i}$ denotes the i-th path delay, normalized by the sampling interval $T_{s}$. More specifically, the normalized parameters for the i-th path are defined as

l_{i} = \frac{τ_{i}}{T_{s}}, v_{i} = f_{i} N T_{s},

(7)

where $l_{i} \in [0, l_{max}]$ is the normalized delay, with $l_{max}$ denoting the maximum delay spread of the channel. The normalized Doppler shift $v_{i} \in [-v_{max}, v_{max}]$ can be decomposed into an integer component $α_{i} \in [-α_{max}, α_{max}]$ and a fractional component $a_{i} \in (-1/2, 1/2]$, i.e., $v_{i} = α_{i} + a_{i}$.

After discarding the CPP, the received signal $r$ in Equation (6) can be rewritten in matrix form as

r = H s + w

(8)

where $w \sim \mathcal{CN}(0, N_{0} I)$, while the time-domain channel matrix $H$ is given by $H = \sum_{i=1}^{P} h_{i} Γ_{CPP_{i}} Δ_{f_{i}} Π^{l_{i}}$. Here, $Π$ represents the forward cyclic-shift matrix:

Π = {[\begin{matrix} 0 & \dots & 0 & 1 \\ 1 & \dots & 0 & 0 \\ ⋮ & ⋱ & ⋱ & ⋮ \\ 0 & \dots & 1 & 0 \end{matrix}]}_{N \times N}

(9)

where $Δ_{f_{i}} ≜ diag(e^{-j 2π f_{i} n}, n = 0, 1, \dots, N-1)$ and $Γ_{CPP_{i}}$ is an $N \times N$ diagonal matrix:

Γ_{CPP_{i}} = diag \left( \begin{cases} e^{-j 2π c_{1} (N^{2} - 2 N (l_{i} - n))} & n < l_{i} \\ 1 & n \geq l_{i} \end{cases}, \; n = 0, \dots, N-1 \right)

(10)

3.1.3. Demodulation

At the receiver side, the DAFT-domain demodulated symbols are obtained as

y = Λ_{c_{2}} F Λ_{c_{1}} r = H_{eff} x + \tilde{w}

(11)

where the effective channel matrix is defined as $H_{eff} ≜ A H A^{H}$, and the transformed noise vector is given by $\tilde{w} = A w$. Given that $A$ is a unitary operator, the statistical characteristics of $\tilde{w}$ remain identical to those of the original noise vector $w$.

3.1.4. Input–Output Relation

To explicitly characterize the individual contribution of each propagation path, (11) can be reformulated as follows:

y = \sum_{i = 1}^{P} h_{i} {\tilde{H}}_{i} x + \tilde{w}

(12)

where $\tilde{H}_{i} ≜ A Γ_{CPP_{i}} Δ_{f_{i}} Π^{l_{i}} A^{H}$. It can be shown that

\tilde{H}_{i}[p, q] = \frac{1}{N} e^{j \frac{2π}{N} (N c_{1} l_{i}^{2} - q l_{i} + N c_{2} (q^{2} - p^{2}))} \sum_{n=0}^{N-1} e^{-j \frac{2π}{N} (p - q + v_{i} + 2 N c_{1} l_{i}) n}

(13)

The summation term in (13) varies significantly depending on whether the normalized Doppler shift takes an integer value. In light of this, the following two conditions are investigated.

(1) Integer Doppler Shifts: When the Doppler shift is restricted to an integer value (i.e., $a_{i} = 0$), the normalized shift simplifies to $v_{i} = α_{i}$. Under this condition, the summation in (13) equals N when $p - q + α_{i} + 2 N c_{1} l_{i}$ is a multiple of N and vanishes otherwise. The i-th sub-channel matrix therefore exhibits a sparse structure, where its elements $\tilde{H}_{i}[p, q]$ are non-zero only at specific indices, as expressed by

\tilde{H}_{i}[p, q] = \begin{cases} e^{j \frac{2π}{N} (N c_{1} l_{i}^{2} - q l_{i} + N c_{2} (q^{2} - p^{2}))} & q = (p + loc_{i})_{N} \\ 0 & \text{otherwise} \end{cases}

(14)

where $loc_{i} ≜ (α_{i} + 2 N c_{1} l_{i})_{N}$ and $(\cdot)_{N}$ denotes the modulo-N operation.

(2) Fractional Doppler Shifts: It is evident from Figure 2 that under fractional normalized Doppler shifts, the structure of the equivalent path matrix deviates from its original form of P diagonals. The summation in (13) no longer vanishes off the diagonals, so energy leaks into every position around them, yet the dominant portion of the energy is still confined to the neighborhood of the diagonals.
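The contrast between the two cases can be checked numerically by evaluating Equation (13) directly. The sketch below is illustrative only: the helper names are hypothetical, and the parameters (N = 16, c_1 = 5/32, c_2 = 0.001, delay l = 1) are assumed values rather than the paper's configuration.

```python
import cmath
import math

def path_matrix_entry(p, q, n, c1, c2, l, v):
    """Evaluate the single-path entry of Eq. (13) by direct summation."""
    phase = cmath.exp(2j * math.pi / n
                      * (n * c1 * l * l - q * l + n * c2 * (q * q - p * p)))
    theta = p - q + v + 2 * n * c1 * l
    total = sum(cmath.exp(-2j * math.pi / n * theta * k) for k in range(n))
    return phase * total / n

def row_magnitudes(p, n, c1, c2, l, v):
    """Magnitudes of row p of the single-path matrix, for q = 0..n-1."""
    return [abs(path_matrix_entry(p, q, n, c1, c2, l, v)) for q in range(n)]

# Assumed parameters: N = 16, c1 = 5/32 (so 2*N*c1*l = 5 for l = 1)
integer_row = row_magnitudes(0, 16, 5 / 32, 0.001, l=1, v=1.0)
fractional_row = row_magnitudes(0, 16, 5 / 32, 0.001, l=1, v=1.4)

print([q for q, m in enumerate(integer_row) if m > 1e-9])   # single non-zero column
print([round(m, 3) for m in fractional_row])                # energy in every column
```

For the integer case only the column q = (p + α + 2 N c_1 l) mod N carries energy, whereas for v = 1.4 every column is non-zero and the largest entries sit next to that same column. This is the dispersion pattern that a fixed detection threshold tends to truncate.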

3.2. Embedded Pilot-Aided Channel Estimation

From Equation (11), $H_{eff}$ has only $Q + 1$ non-zero elements per column under the integer-Doppler condition, where $Q = (l_{max} + 1)(2(α_{max} + k_{v}) + 1) - 1$. As a result, Q guard symbols are placed on each side of the pilot symbol, between the pilot and the data symbols, so that channel estimation can be performed with less interference from the data symbols. In this paper, a single pilot is placed in each transmit frame, and the frame is arranged as follows:

x[n] = \begin{cases} x_{pilot}, & n = Q \\ 0, & 0 \leq n < Q \text{ or } Q < n \leq 2Q \\ x_{data}, & \text{otherwise} \end{cases}

(15)
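The pilot placement in Equation (15) can be sketched as follows. The function names are hypothetical, and the values l_max = 2 and α_max = 1 are assumptions made for illustration; with k_v = 1 they give Q = 14, so that the truncated observation has Q + 1 = 15 samples.

```python
def guard_length(l_max, alpha_max, k_v):
    """Q = (l_max + 1) * (2 * (alpha_max + k_v) + 1) - 1."""
    return (l_max + 1) * (2 * (alpha_max + k_v) + 1) - 1

def build_frame(n, q, pilot, data):
    """Arrange one DAFT-domain frame following Eq. (15):
    zeros on [0, Q), pilot at Q, zeros on (Q, 2Q], data on (2Q, N)."""
    n_data = n - (2 * q + 1)
    if n_data < 0 or len(data) != n_data:
        raise ValueError(f"expected {n_data} data symbols, got {len(data)}")
    frame = [0j] * (2 * q + 1) + [complex(d) for d in data]
    frame[q] = complex(pilot)
    return frame

# Assumed values: l_max = 2, alpha_max = 1, k_v = 1, N = 128
q = guard_length(l_max=2, alpha_max=1, k_v=1)
frame = build_frame(128, q, pilot=10.0, data=[1 + 0j] * (128 - 2 * q - 1))
print(q, len(frame), frame[q])
```

Under these assumptions 29 of the 128 positions are spent on the pilot and its guards, leaving 99 positions for data.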

Due to the structure of $H_{eff}$, the $Q + 1$ elements of the received frame that are most relevant to the transmitted pilot are used for channel estimation. These symbols can be expressed as

y_{E} = H_{eff, E} x_{E} + {\tilde{w}}_{E}

(16)

where the subscript E stands for channel estimation. Define the selection matrices $T_{x} = [I_{N}]_{ind_{x}}$ and $T_{y} = [I_{N}]_{ind_{y}}$, which extract specific rows or columns from a matrix, with $ind_{x} = [0 : 2Q]$ and $ind_{y} = [(α_{max} + k_{v}) : Q + (α_{max} + k_{v})]$. Then $x_{E} = T_{x} x$, $y_{E} = T_{y} y$, and $H_{eff, E} = T_{y} H_{eff} T_{x}^{H}$.

In [16], the path parameters $l_{i}$, $f_{i}$, and $h_{i}$, collected in $θ$, are estimated by minimizing the following metric, which performs well if the paths have different delays.

l (y_{E} ∣ θ, x_{E}) = {∥y_{E} - H_{eff, E} x_{E}∥}^{2} .

(17)

According to the diagonal reconstructability [15] of the AFDM channel matrix, the whole $H_{eff}$ can be reconstructed “diagonally” as long as an arbitrary column of $H_{eff}$ is acquired, which can be expressed as

H [{(m + 1)}_{N}, {(m^{'} + 1)}_{N}] = T (l, m, m^{'}) H [m, m^{'}]

(18)

where $T(l, m, m^{'})$ is the transform factor between $H[m, m^{'}]$ and $H[(m + 1)_{N}, (m^{'} + 1)_{N}]$, which depends only on the location and the delay. In our proposed scheme, the transformation factor can be adaptively learned through network training rather than being a fixed constant value as in [15].
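To see why such a factor depends only on the location and the delay, the single-path entry in Equation (13) can be compared at two consecutive positions along a diagonal. The following is a sketch under stated assumptions (a single path, indices away from the modulo-N wrap-around), not the exact factor defined in [15]. Since the summation term in (13) depends only on $p - q$, it cancels in the ratio, leaving the phase prefactor:

```latex
\begin{aligned}
\frac{\tilde{H}_{i}[p+1,\, q+1]}{\tilde{H}_{i}[p,\, q]}
&= \exp\!\Big( j \tfrac{2\pi}{N} \big[ -l_{i}
   + N c_{2} \big( (q+1)^{2} - q^{2} - (p+1)^{2} + p^{2} \big) \big] \Big) \\
&= e^{-j 2\pi l_{i} / N} \; e^{\, j 4\pi c_{2} (q - p)} .
\end{aligned}
```

The first factor is set by the delay $l_{i}$ and the second by the diagonal offset $q - p$, which is the location of the path. When several paths with different delays overlap at the same position, a single closed-form factor no longer applies to their sum, which is one motivation for letting the network learn the mapping.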

4. Proposed Scheme

In this section, a deep learning-based channel estimation scheme with an attention network is proposed to extract the pilot features from the received signal and then reconstruct $\hat{H}_{eff}$.

4.1. 1D Process Module

The 1D process module is designed to suppress interference from data and noise and to extract pilot features across different scales. As shown in Figure 3, this module consists of one linear layer, three 1D convolutions with kernel sizes of 1, 3, and 5, and one convolutional layer.

The input of this module, $Y_{E}$, can be denoted as

Y_{E} = [y_{E, R}, y_{E, I}] \in \mathbb{R}^{2 \times (Q + 1) \times 1}

(19)

where $y_{E, R}$ and $y_{E, I}$ represent the real and imaginary parts, respectively, of the truncated received signal $y_{E}$, and Q is the number of guard symbols between the pilot and the data symbols, as defined in Section 3.2.

The input $Y_{E}$ is first projected into a higher-dimensional feature space through a linear layer to mitigate the inter-path interference in the original signal, thereby enhancing the discriminability of different paths for subsequent processing stages. To adaptively capture diverse channel impulse patterns in the DAFT domain, the signal is subsequently passed through three convolutional layers with varying kernel sizes to extract multi-scale pilot features.

Specifically, the small-scale kernels are designed to maintain high-resolution localization for paths with integer Doppler shifts or low-velocity scenarios, where the path energy remains highly concentrated. Conversely, the large-scale kernels are employed to capture the extended energy dispersion patterns characteristic of high-mobility scenarios with path energy leakage. This multi-scale architecture allows the 1D processing module to effectively gather dispersed energy across different Doppler spread levels.

All convolutional layers in this paper are designed to preserve the spatial dimensions of the feature maps while modifying only the channel dimensions. The dimensional configurations of the linear layers and the channel settings of the convolutional layers are illustrated in Figure 3. The output of the 1D process module can be expressed as follows:

S_{1D} = f_{1D}(Y_{E}) \in \mathbb{R}^{8 \times 32 \times 1}

(20)

4.2. Transformer-Based Module

In this module, the self-attention mechanism [22] is employed to capture path energy dispersion under varying fractional Doppler conditions. By exploiting the inherent diagonal reconstruction properties of the AFDM effective channel [15], the proposed module completes high-precision channel estimation. As shown in Figure 3, $S_{1D}$ first undergoes segmentation with different patch sizes. Then, these patches are linearly projected into a sequence of token embeddings and augmented with standard positional encodings to retain sequential information. Subsequently, the multi-head self-attention (MHA) mechanism within the Transformer blocks computes global correlations across all tokens. This global receptive field enables the model to capture long-range dependencies and aggregate dispersed path energy across the entire sequence.

As illustrated in Figure 4, the Transformer block begins with a LayerNorm layer, followed by a multi-head self-attention layer that incorporates a residual connection. A second LayerNorm layer and an MLP are then applied, with an additional residual connection to facilitate gradient flow.

The architecture of the MHA is shown in Figure 5. Notably, the multi-head mechanism linearly projects the input into multiple independent representation subspaces, allowing different heads to concurrently capture diverse energy dispersion patterns. With Q, K, and V denoting the query, key, and value matrices (Q here is unrelated to the guard length in Section 3.2) and d the key dimension, the attention output is computed as follows.

Attention (Q, K, V) = Softmax (\frac{Q K^{T}}{\sqrt{d}}) V

(21)
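Equation (21) can be written out for a single head in plain Python. This is a minimal sketch of standard scaled dot-product attention using nested lists; it omits the learned projections, the multi-head split, and batching that a full implementation would include, and the function names are illustrative.

```python
import math

def softmax(row):
    """Numerically stable softmax over one list of scores."""
    peak = max(row)
    exps = [math.exp(value - peak) for value in row]
    total = sum(exps)
    return [e / total for e in exps]

def attention(q, k, v):
    """Scaled dot-product attention, Eq. (21).
    q: list of query rows, k: list of key rows, v: list of value rows."""
    d = len(k[0])  # key dimension used for scaling
    scores = [[sum(qi * ki for qi, ki in zip(q_row, k_row)) / math.sqrt(d)
               for k_row in k]
              for q_row in q]
    weights = [softmax(row) for row in scores]
    return [[sum(w * v_row[j] for w, v_row in zip(w_row, v))
             for j in range(len(v[0]))]
            for w_row in weights]

# A zero query attends uniformly, so the output is the mean of the value rows.
print(attention([[0.0, 0.0]], [[1.0, 0.0], [0.0, 1.0]], [[1.0, 2.0], [3.0, 4.0]]))
```

Each output token is a weighted average of all value rows, which is what allows a token holding part of a path's leaked energy to draw on every other position in the sequence.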

At the end of the module, a linear projection and a reshape operation are applied after the Transformer blocks to map the extracted features into a higher dimension and reshape them into 2D structural representations. These representations then undergo progressive upsampling via Transposed Conv2D layers. Meanwhile, subsequent convolutional layers are utilized to adaptively reweight the upsampled features, thereby fine-tuning the transformation factors across different columns to ensure accurate reconstruction of the effective channel matrix. The detailed step-by-step operations and exact dimensional transitions of this entire process are summarized in Algorithm 1.

Algorithm 1 Forward Pass of the Proposed AFDM Channel Estimation Network

Require: Truncated received signal $X \in \mathbb{R}^{B \times 2 \times 15}$
Ensure: Estimated effective channel matrix $\hat{H}_{eff} \in \mathbb{R}^{B \times 2 \times 128 \times 128}$
1: % Phase 1: 1D Feature Processing
2: $X_{1} \leftarrow \mathrm{Linear}_{15 \to 32}(X)$ {Initial dimension expansion, Shape: $B \times 2 \times 32$}
3: $X_{2} \leftarrow \mathrm{1D\text{-}process}(X_{1})$ {Shape: $B \times 8 \times 32$}
4: % Phase 2: Vision Transformer (ViT) Blocks
5: {Three parallel ViT branches processing the same 1D features}
6: $F_{v1} \leftarrow \mathrm{ViT\_1D}(X_{2}, p = 4, upsample = 2)$ {Branch 1, Shape: $B \times 8 \times 64 \times 64$}
7: $F_{v2} \leftarrow \mathrm{ViT\_1D}(X_{2}, p = 2, upsample = 1)$ {Branch 2, Shape: $B \times 16 \times 32 \times 32$}
8: $F_{v3} \leftarrow \mathrm{ViT\_1D}(X_{2}, p = 1, upsample = 0)$ {Branch 3, Shape: $B \times 32 \times 16 \times 16$}
9: % Phase 3: Upsample
10: $U_{3} \leftarrow \mathrm{ConvTranspose2D}_{k = 4, s = 2}(F_{v3})$ {Shape: $B \times 16 \times 32 \times 32$}
11: $C_{2} \leftarrow \mathrm{Concat}(U_{3}, F_{v2}, \dim = 1)$ {Shape: $B \times 32 \times 32 \times 32$}
12: $U_{2} \leftarrow \mathrm{ConvTranspose2D}_{k = 4, s = 2}(C_{2})$ {Shape: $B \times 8 \times 64 \times 64$}
13: $C_{1} \leftarrow \mathrm{Concat}(U_{2}, F_{v1}, \dim = 1)$ {Shape: $B \times 16 \times 64 \times 64$}
14: $U_{1} \leftarrow \mathrm{ConvTranspose2D}_{k = 4, s = 2}(C_{1})$ {Shape: $B \times 8 \times 128 \times 128$}
15: % Phase 4: Final Feature Refinement
16: $M_{out} \leftarrow \mathrm{Conv2D}_{1 \times 1}(U_{1})$ {Channel-wise refinement, Shape: $B \times 8 \times 128 \times 128$}
17: $\hat{H}_{eff} \leftarrow \mathrm{Conv2D}_{3 \times 3}(M_{out})$ {Final output mapping, Shape: $B \times 2 \times 128 \times 128$}
18: return $\hat{H}_{eff}$
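The shape transitions in Phase 3 of Algorithm 1 can be verified with simple arithmetic. The sketch below uses the standard output-size relation for a transposed convolution with unit dilation. The padding value of 1 is an assumption: Algorithm 1 specifies only the kernel size and stride, and a padding of 1 is one setting under which k = 4, s = 2 exactly doubles the spatial size. The function names are hypothetical.

```python
def conv_transpose_out(size, kernel=4, stride=2, padding=1, output_padding=0):
    """Output length of a transposed convolution with dilation 1.
    padding=1 is an assumed value, not stated in Algorithm 1."""
    return (size - 1) * stride - 2 * padding + kernel + output_padding

def decoder_shapes():
    """Trace (channels, spatial size) through steps 10-14 of Algorithm 1."""
    f_v3, f_v2, f_v1 = (32, 16), (16, 32), (8, 64)   # ViT branch outputs
    u3 = (16, conv_transpose_out(f_v3[1]))           # step 10
    assert u3[1] == f_v2[1], "spatial sizes must match before concatenation"
    c2 = (u3[0] + f_v2[0], u3[1])                    # step 11: channel concat
    u2 = (8, conv_transpose_out(c2[1]))              # step 12
    assert u2[1] == f_v1[1], "spatial sizes must match before concatenation"
    c1 = (u2[0] + f_v1[0], u2[1])                    # step 13: channel concat
    u1 = (8, conv_transpose_out(c1[1]))              # step 14
    return {"U3": u3, "C2": c2, "U2": u2, "C1": c1, "U1": u1}

print(decoder_shapes())
```

The channel counts after each transposed convolution (16, 8, 8) are taken from Algorithm 1; only the spatial sizes are computed. Each stage doubles the resolution, from 16 to 32 to 64 to 128, which matches the N = 128 effective channel matrix.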

The key settings of each Transformer block are shown in Table 2, where patch size determines the division of input patches, and depth denotes the number of stacked layers within each Transformer block. The parameter dim represents the feature dimension of both the input and output of the Transformer block, and heads specifies the number of attention heads. The mlp-dim corresponds to the hidden dimension of the MLP illustrated in Figure 4. In addition, linear-output refers to the output dimension of a linear layer applied to expand the feature dimension from dim, serving as a preparatory step for the subsequent transformation into a 2D matrix.

4.3. Data Collection and Training

All datasets are generated in MATLAB R2023a to provide a controlled environment for performance evaluation. The number of propagation paths P is uniformly distributed between 1 and 5. The delay for each path is randomly chosen from a uniform distribution within $[0, l_{max}]$. In practical wideband systems, the sampling frequency is sufficiently high that the actual physical delays can be accurately approximated as integer multiples of the sampling interval, so the path delays are set as integers, consistent with the settings in [7,15,16].

The channel follows a uniform power-delay profile (PDP), where the complex path coefficients $h_{i}$ are generated as independent complex Gaussian random variables with zero mean and a variance of 1.

The Doppler shift of each path is generated using $v_{i} = α_{max} \cos(θ_{i})$, where $θ_{i}$ is uniformly distributed over $[-π, π]$ to simulate realistic mobility.
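The sampling procedure described above can be sketched in Python as follows. The paper's datasets are generated in MATLAB, so this is only an illustrative re-expression of the stated distributions; the function name `sample_channel` and the default values are hypothetical.

```python
import math
import random

def sample_channel(l_max, alpha_max, max_paths=5, rng=random):
    """Draw one channel realization as a list of (gain, delay, doppler) tuples."""
    paths = []
    for _ in range(rng.randint(1, max_paths)):            # P uniform on {1, ..., 5}
        delay = rng.randint(0, l_max)                      # integer delay in [0, l_max]
        theta = rng.uniform(-math.pi, math.pi)             # angle uniform on [-pi, pi]
        doppler = alpha_max * math.cos(theta)              # v_i = alpha_max * cos(theta_i)
        gain = complex(rng.gauss(0.0, math.sqrt(0.5)),     # CN(0, 1): each real
                       rng.gauss(0.0, math.sqrt(0.5)))     # component has variance 1/2
        paths.append((gain, delay, doppler))
    return paths

# Assumed values l_max = 2 and alpha_max = 1, with a fixed seed for repeatability
for gain, delay, doppler in sample_channel(2, 1, rng=random.Random(0)):
    print(delay, round(doppler, 3), gain)
```

Because delays are drawn independently, two paths can share the same delay while differing in Doppler shift, so the generated data naturally include the IDoI cases that EPA-AML assumes away.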

The system signal-to-noise ratio (SNR) is defined as $\frac{E\{|x(n)|^{2}\}}{σ^{2}}$, where $σ^{2}$ denotes the noise power and $E\{\cdot\}$ is the expectation operator. The pilot power satisfies $10 \log_{10} \frac{|x_{p}|^{2}}{E\{|x(n)|^{2}\}} = 20$ dB, and other settings can be found in Table 3. The training, validation, and test datasets are generated at each SNR from 0 dB to 25 dB in 5 dB steps, containing 80,000, 20,000, and 20,000 samples per SNR level, respectively. An independent model is trained for each SNR level to ensure optimal estimation performance across different noise conditions. The network is trained with the mean squared error (MSE) as the loss function:

L_{MSE} = \frac{1}{N^{2}} \sum_{i = 1}^{N} \sum_{j = 1}^{N} {|H_{eff} [i, j] - {\hat{H}}_{eff} [i, j]|}^{2}

(22)
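The SNR and pilot-power definitions given earlier in this subsection translate into concrete numbers as follows. This is a small worked example assuming unit average data-symbol power; the function names are illustrative.

```python
import math

def noise_variance(snr_db, signal_power=1.0):
    """Noise power for a given SNR in dB: sigma^2 = E{|x|^2} / 10^(SNR/10)."""
    return signal_power / (10 ** (snr_db / 10))

def pilot_amplitude(pilot_gain_db=20.0, signal_power=1.0):
    """Pilot magnitude |x_p| such that 10*log10(|x_p|^2 / E{|x|^2}) equals the gain."""
    return math.sqrt(signal_power * 10 ** (pilot_gain_db / 10))

# Unit-power data symbols (an assumption): a 20 dB pilot has amplitude 10
print(pilot_amplitude())
for snr_db in range(0, 30, 5):          # 0 dB to 25 dB in 5 dB steps
    print(snr_db, noise_variance(snr_db))
```

A 20 dB pilot therefore carries 100 times the average data-symbol power, which helps explain why the pilot response remains identifiable in the truncated observation even at low SNR.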

The proposed network is implemented using the PyTorch 2.6.0 framework on a desktop workstation equipped with a 10th Gen Intel(R) Core(TM) i9-10900K CPU (Intel Corporation, Santa Clara, CA, USA) and an NVIDIA RTX 2080 GPU (NVIDIA Corporation, Santa Clara, CA, USA). The network is trained from scratch for 300 epochs with a batch size of 128. To optimize the network, we employ the Adam optimizer with an initial learning rate of 0.001. Regarding internal configurations, we use GELU as the nonlinear activation function throughout the network, along with Layer Normalization (LayerNorm) in the ViT blocks. The network weights are initialized using PyTorch’s default methods (e.g., Kaiming initialization for convolutional layers). Furthermore, the exact input dimension shaping and the specific channel transitions across all intermediate layers are detailed in Algorithm 1. The entire training process takes a few hours.

5. Simulation and Results

In this section, we consider a SISO-AFDM system with N = 128 subcarriers. One pilot is inserted at the Q-th subcarrier, the arrangement of pilot and data symbols follows Section 3.2, and $k_{v} = 1$. For a fair comparison, we adopt the same doubly dispersive channel model as [15,16,18], with the parameters generated in the manner described in Section 4.3. The remaining AFDM parameters can be found in Table 3.

5.1. Simulation Results

In accordance with the configurations adopted in prior studies, the Doppler grid resolution for the OMP scheme is 0.2 [17], and the Doppler grid resolution for the EPA-AML is 0.1 [17]. To quantify the accuracy of various channel estimation techniques, the Normalized Mean Squared Error (NMSE) is employed as the primary performance metric, which is defined as follows:

NMSE = \frac{{∥H_{eff} - {\hat{H}}_{eff}∥}^{2}}{{∥H_{eff}∥}^{2}}

(23)
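Equation (23) can be computed as in the sketch below. It assumes the norm is the Frobenius norm taken over all matrix entries, which the text does not state explicitly, and it represents matrices as nested lists of complex numbers for simplicity. The function names are illustrative.

```python
import math

def nmse(h_true, h_est):
    """NMSE of Eq. (23), assuming the Frobenius norm over all entries."""
    error = sum(abs(t - e) ** 2
                for row_t, row_e in zip(h_true, h_est)
                for t, e in zip(row_t, row_e))
    reference = sum(abs(t) ** 2 for row_t in h_true for t in row_t)
    return error / reference

def nmse_db(h_true, h_est):
    """NMSE expressed in dB, as typically plotted against SNR."""
    return 10 * math.log10(nmse(h_true, h_est))

# Toy 2x2 example: the estimate misses one of two unit-magnitude entries
h = [[1 + 0j, 0j], [0j, 1j]]
h_hat = [[1 + 0j, 0j], [0j, 0j]]
print(nmse(h, h_hat), round(nmse_db(h, h_hat), 2))
```

In this toy case half of the channel energy is missed, giving an NMSE of 0.5, or about -3 dB. Dropping leaked energy through a hard threshold raises the NMSE in the same way.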

Figure 6 and Figure 7 illustrate the NMSE performance of the EPA-DR, OMP, EPA-AML, and the proposed network. In Figure 6, EPA-DR performs worst, primarily because its fixed-threshold mechanism fails to accommodate the path energy dispersion induced by fractional Doppler shifts. While all methods experience varying degrees of performance degradation when IDoI is introduced, as shown in Figure 7, the impact on EPA-AML is particularly pronounced because it is derived under the assumption that IDoI is absent. In contrast, our proposed method consistently achieves superior NMSE performance. This result underscores the advantages of data-driven adaptive strategies in modeling fractional Doppler effects in AFDM channel estimation. Furthermore, as shown in Figure 7, the comparative results involving a CNN-based scheme and an ablation study without the ViT blocks further validate the architectural effectiveness.

To further evaluate the channel estimation algorithms, the Minimum Mean Square Error (MMSE) equalizer [23] is employed for signal detection. As shown in Figure 8, our proposed method achieves the best BER performance among the compared methods in the presence of IDoI, very close to that obtained with ideal CSI. Although the BER curves of all evaluated methods are similar in the low-SNR region, our approach achieves a 6 dB improvement over the OMP algorithm at 25 dB SNR.

To verify the robustness and generalization of the model, we conducted experiments under different parameter settings. As illustrated in Figure 9a, we evaluate the NMSE performance with varying numbers of propagation paths. Although the training set contains only channels with one to five paths, the proposed model still achieves strong performance in an unseen six-path scenario, which demonstrates its generalization across diverse channel conditions. Furthermore, Figure 9b investigates the impact of pilot power bias. The results indicate that the model remains resilient, maintaining strong performance even when the power deviation reaches 2 dB.

5.2. Complexity Analysis

Unlike traditional methods, our approach directly outputs the effective channel matrix in an end-to-end manner. Therefore, the complexity of our model can be considered in two parts: localized feature extraction and channel reconstruction. In the feature extraction stage, the Transformer blocks process the effective region of the received signal of size Q, with $N_{p} \propto Q$ tokens and the Transformer block feature dimension D (listed as dim in Table 2). The linear layers within the Transformer blocks and the feed-forward network contribute a complexity of $O (Q D^{2})$, and the self-attention mechanism has complexity $O (Q^{2} D)$. Consequently, the complexity of the feature extraction stage is $O (Q D^{2} + Q^{2} D)$. The subsequent reconstruction stage employs transposed convolutions to upsample these features into the final $N \times N$ estimation matrix. These convolution operations result in a complexity of $O (N^{2})$.
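
The scaling behavior described above can be made concrete with a small operation-count sketch. The snippet only evaluates the leading-order terms Q D^2, Q^2 D, and N^2 with all constant factors dropped, so the numbers are not FLOP counts of the actual network. The values of Q and N below are hypothetical; D = 64 matches one of the dim entries in Table 2.

```python
def transformer_ops(q: int, d: int) -> dict:
    """Leading-order operation counts for the feature extraction stage.

    q: number of tokens (proportional to the effective region size Q)
    d: Transformer feature dimension D
    Constant factors (heads, depth, MLP ratio) are deliberately ignored.
    """
    linear = q * d * d     # linear layers and feed-forward network: O(Q D^2)
    attention = q * q * d  # self-attention: O(Q^2 D)
    return {"linear": linear, "attention": attention, "total": linear + attention}


def reconstruction_ops(n: int) -> int:
    """Leading-order count for producing the N x N output matrix: O(N^2)."""
    return n * n


# Hypothetical token counts; D = 64 as in Table 2.
for q in (32, 64, 128):
    ops = transformer_ops(q, d=64)
    print(q, ops["linear"], ops["attention"], ops["total"])

print(reconstruction_ops(256))  # hypothetical N = 256
```

The two feature extraction terms are equal when Q = D, and the quadratic self-attention term dominates once Q exceeds D, which is why the size of the effective region matters more than the feature dimension for long pilot regions.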

Table 4 compares the complexity of the proposed method with that of the other methods in terms of feature extraction and reconstruction, together with the execution time for one sample. Here, $N_{O}$ represents the column dimension of the measurement matrix, T is the number of iterations, and V represents the Doppler resolution. The proposed model has 246.7 k parameters and requires 109 M FLOPs, which is a competitive scale compared to deep learning-based OTFS estimation frameworks [20]. Although the computational overhead of the proposed data-driven approach exceeds that of conventional methods, dedicated hardware effectively bridges this latency gap, so that the execution time remains comparable to that of traditional methods. However, practical deployment must be evaluated against the strict operational boundaries of mobile devices, including limited battery capacity, thermal constraints, and restricted memory bandwidth. Mobile CPUs struggle with the massively parallel computations required by our model, which leads to latency bottlenecks, whereas GPUs are inherently optimized for such tasks. Therefore, dedicated hardware, such as mobile GPUs or Neural Processing Units (NPUs), is needed.

6. Conclusions

In this paper, we propose a deep learning network based on the Transformer architecture and the diagonal reconstruction property of the AFDM effective channel matrix to improve AFDM channel estimation performance under fractional Doppler shifts. By integrating ViT blocks with a 1D feature processing module, the proposed network effectively captures dispersed path energy patterns without requiring prior knowledge of the number of paths or the assumption of distinct path delays. It achieves up to 4 dB and 6 dB improvements in NMSE and BER, respectively, over existing traditional methods, which makes it well suited to high-speed scenarios that require highly accurate estimation of the effective channel matrix and can rely on dedicated hardware. The model’s rationality, feasibility, and robustness are validated through comparative experiments, ablation studies, and evaluations under different parameter settings. However, this study has some limitations. First, the current results have not been validated on measured channels, so the model’s performance under real hardware imperfections remains uncertain. Second, the model sacrifices sensing capability, as it does not explicitly estimate path parameters (such as delays and Doppler shifts). Instead, the network implicitly models these parameters to directly reconstruct the effective channel matrix.

In future work, we will focus on two main directions. First, we plan to incorporate lightweight attention mechanisms and model compression methods, such as network pruning, to reduce the number of model parameters and accelerate inference. Second, since the feature extraction stage implicitly models path parameters, we aim to explore an end-to-end communication paradigm. By designing a unified network for joint channel estimation and signal detection, we can potentially eliminate the need for traditional standalone detectors.

Author Contributions

Conceptualization, J.Y. and Z.L.; methodology, J.Y. and S.X.; validation, J.Y. and S.X.; data curation, J.Y. and S.X.; investigation, Z.L. and S.X.; writing—review and editing, J.Y. and Z.L. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Natural Science Foundation of China under Grant No. 61773290 and by the Fundamental Research Funds for the Central Universities (Grant No. 22120230311).

Data Availability Statement

Data are contained within the article. The original contributions presented in this paper are included in the article. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Saad, W.; Bennis, M.; Chen, M. A Vision of 6G Wireless Systems: Applications, Trends, Technologies, and Open Research Problems. IEEE Netw. 2020, 34, 134–142.
2. Wang, T.; Proakis, J.G.; Masry, E.; Zeidler, J.R. Performance degradation of OFDM systems due to Doppler spreading. IEEE Trans. Wirel. Commun. 2006, 5, 1422–1432.
3. Yih, C.H. BER Analysis of OFDM Systems Impaired by DC Offset and Carrier Frequency Offset in Multipath Fading Channels. IEEE Commun. Lett. 2007, 11, 842–844.
4. Hadani, R.; Rakib, S.; Tsatsanis, M.; Monk, A.; Goldsmith, A.J.; Molisch, A.F.; Calderbank, R. Orthogonal Time Frequency Space Modulation. In Proceedings of the 2017 IEEE Wireless Communications and Networking Conference (WCNC), San Francisco, CA, USA, 19–22 March 2017; pp. 1–6.
5. Li, S.; Yuan, J.; Yuan, W.; Wei, Z.; Bai, B.; Ng, D.W.K. Performance Analysis of Coded OTFS Systems Over High-Mobility Channels. IEEE Trans. Wirel. Commun. 2021, 20, 6033–6048.
6. Hadani, R.; Rakib, S.; Molisch, A.F.; Ibars, C.; Monk, A.; Tsatsanis, M.; Delfeld, J.; Goldsmith, A.; Calderbank, R. Orthogonal Time Frequency Space (OTFS) modulation for millimeter-wave communications systems. In Proceedings of the 2017 IEEE MTT-S International Microwave Symposium (IMS), Honolulu, HI, USA, 4–9 June 2017; pp. 681–683.
7. Bemani, A.; Ksairi, N.; Kountouris, M. AFDM: A Full Diversity Next Generation Waveform for High Mobility Communications. In Proceedings of the 2021 IEEE International Conference on Communications Workshops (ICC Workshops), Montreal, QC, Canada, 14–23 June 2021; pp. 1–6.
8. Luo, Q.; Xiao, P.; Liu, Z.; Wan, Z.; Thomos, N.; Gao, Z.; He, Z. AFDM-SCMA: A Promising Waveform for Massive Connectivity Over High Mobility Channels. IEEE Trans. Wirel. Commun. 2024, 23, 14421–14436.
9. Tao, Y.; Wen, M.; Ge, Y.; Li, J.; Basar, E.; Al-Dhahir, N. Affine Frequency Division Multiplexing With Index Modulation: Full Diversity Condition, Performance Analysis, and Low-Complexity Detection. IEEE J. Sel. Areas Commun. 2025, 43, 1041–1055.
10. Tao, Y.; Wen, M.; Ge, Y.; Mao, T.; Tang, Y.; Doosti-Aref, A. Affine Frequency Division Multiple Access Based on DAFT Spreading for Next-Generation Wireless Networks. IEEE Trans. Wirel. Commun. 2025, 25, 4626–4641.
11. Ranasinghe, K.R.R.; Sandoval, I.A.M.; Rou, H.S.; Abreu, G.T.F.D.; Alexandropoulos, G.C. Doubly-Dispersive MIMO Channels with Stacked Intelligent Metasurfaces: Modeling, Parametrization, and Receiver Design. IEEE Trans. Wirel. Commun. 2025, 25, 3801–3817.
12. Zhang, F.; Wang, Z.; Mao, T.; Jiao, T.; Zhuo, Y.; Wen, M.; Xiang, W.; Chen, S.; Karagiannidis, G.K. AFDM-Enabled Integrated Sensing and Communication: Theoretical Framework and Pilot Design. IEEE J. Sel. Areas Commun. 2025, 44, 310–324.
13. Zhu, J.; Tang, Y.; Liu, F.; Zhang, X.; Yin, H.; Zhou, Y. AFDM-Based Bistatic Integrated Sensing and Communication in Static Scatterer Environments. IEEE Wirel. Commun. Lett. 2024, 13, 2245–2249.
14. Yin, H.; Tang, Y. Pilot Aided Channel Estimation for AFDM in Doubly Dispersive Channels. In Proceedings of the 2022 IEEE/CIC International Conference on Communications in China (ICCC), Foshan, China, 11–13 August 2022; pp. 308–313.
15. Yin, H.; Wei, X.; Tang, Y.; Yang, K. Diagonally Reconstructed Channel Estimation for MIMO-AFDM With Inter-Doppler Interference in Doubly Selective Channels. IEEE Trans. Wirel. Commun. 2024, 23, 14066–14079.
16. Bemani, A.; Ksairi, N.; Kountouris, M. Affine Frequency Division Multiplexing for Next Generation Wireless Communications. IEEE Trans. Wirel. Commun. 2023, 22, 8214–8229.
17. Cao, R.; Zhong, Y.; Lyu, J.; Wang, D.; Fu, L. AFDM Channel Estimation in Multi-Scale Multi-Lag Channels. In Proceedings of the GLOBECOM 2024—2024 IEEE Global Communications Conference, Cape Town, South Africa, 8–12 December 2024; pp. 1569–1574.
18. Yang, F.; Luo, S.; Wu, L.; Song, D.; Lin, R.; Xie, S. An AFDM off-grid Channel Estimation Based on Sparse Bayesian Learning. In Proceedings of the 2024 IEEE 24th International Conference on Communication Technology (ICCT), Yichang, China, 18–20 October 2024; pp. 1565–1569.
19. Jing, L.; Wang, Q.; He, C.; Zhang, X. A Learned Denoising-Based Sparse Adaptive Channel Estimation for OTFS Underwater Acoustic Communications. IEEE Wirel. Commun. Lett. 2024, 13, 969–973.
20. Qi, S.; Wang, Q.; Ma, Z. Deep Residual Attention Network for OTFS Channel Estimation. IEEE Trans. Veh. Technol. 2025, 74, 9834–9839.
21. Rou, H.S.; de Abreu, G.T.F.; Choi, J.; González G., D.; Kountouris, M.; Guan, Y.L.; Gonsa, O. From Orthogonal Time–Frequency Space to Affine Frequency-Division Multiplexing: A comparative study of next-generation waveforms for integrated sensing and communications in doubly dispersive channels. IEEE Signal Process. Mag. 2024, 41, 71–86.
22. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, L.; Polosukhin, I. Attention Is All You Need. arXiv 2017, arXiv:1706.03762.
23. Bemani, A.; Ksairi, N.; Kountouris, M. Low Complexity Equalization for AFDM in Doubly Dispersive Channels. In Proceedings of the ICASSP 2022—2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Singapore, 23–27 May 2022; pp. 5273–5277.

Figure 1. AFDM block diagram.

Figure 2. Structure of the AFDM effective channel matrix under fractional Doppler.

Figure 3. Structure of the network.

Figure 4. Structure of the Transformer block.

Figure 5. Structure of MHA.

Figure 6. NMSE performance of different methods in the absence of IDoI.

Figure 7. NMSE performance of different methods in the presence of IDoI.

Figure 8. BER performance of different methods under fractional Doppler with IDoI.

Figure 9. NMSE performance under different parameter settings: (a) NMSE performance of the proposed method with different numbers of paths; (b) NMSE performance of the proposed method with different pilot power biases.

Table 1. Summary of key notations and units.

Variable	Description	Unit/Scale
B	System bandwidth	Hz
N	Number of subcarriers	Dimensionless
$T_{s}$	Sampling interval	Seconds (s)
$T_{s y m}$	AFDM symbol duration ( $N T_{s}$ )	Seconds (s)
$Δ f$	Subcarrier spacing/Doppler resolution	Hz
$τ_{i}$	Physical path delay	Seconds (s)
$f_{i}$	Physical Doppler shift	Hz
$l_{i}$	Normalized path delay ( $τ_{i} / T_{s}$ )	Samples (Integer)
$v_{i}$	Normalized Doppler shift ( $f_{i} / Δ f$ )	Dimensionless
$c_{1}, c_{2}$	AFDM chirp parameters	Dimensionless

Table 2. Parameters of the Transformer block.

Parameter	ViT-Third	ViT-Second	ViT-First
patch size	1	2	4
depth	2	2	2
dim	64	48	48
heads	8	8	8
mlp-dim	64	64	64
linear-output	256	256	256

Table 3. Parameters of the AFDM system.

Parameter	Value
Carrier frequency $f_{c}$ (GHz)	28
Constellation	16-QAM
Max normalized Doppler	1
Max normalized delay	2
Bandwidth (MHz)	2
Sampling frequency	2 × Bandwidth
Guard symbols	$(l_{max} + 1) (2 (α_{max} + k_{v}) + 1) - 1$
$k_{v}$	1
Max UE speed (km/h)	600
Number of paths (p)	1–5

Table 4. Complexity and computational efficiency comparison.

Method	Params	Latency (ms)	Theoretical Complexity
Proposed	246.7 k	19.3/9.7 ^†	$O (Q D^{2} + Q^{2} D) + O (N^{2})$
EPA-DR	-	8.4	$O (Q) + O (N)$
OMP	-	12.3	$O (T^{3} + T N_{O} Q + T^{2} Q) + O (N^{2})$
EPA-AML	-	18.9	$O (Q V^{P}) + O (N^{2})$

^† Latency is the execution time for one sample. For the proposed method, the two values are the CPU and GPU (RTX 2080) latencies, respectively. The proposed model is implemented in Python 3.10.9 (GPU/CPU) and the traditional baselines in MATLAB R2023a (CPU).

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
