Article

Cross-Domain OTFS Detection via Delay–Doppler Decoupling: Reduced-Complexity Design and Performance Analysis †

1 State Key Laboratory of ISN, Xidian University, Xi’an 710071, China
2 Electrical Engineering and Computer Science Department, Technische Universität Berlin, 10587 Berlin, Germany
* Author to whom correspondence should be addressed.
† This article is a revised and expanded version of a paper entitled “Reduced-complexity cross-domain iterative detection for OTFS modulation via delay-Doppler decoupling”, which was presented at the IEEE 24th International Workshop on Signal Processing Advances in Wireless Communications (SPAWC), Shanghai, China, 25–28 September 2023.
Entropy 2025, 27(10), 1062; https://doi.org/10.3390/e27101062
Submission received: 10 September 2025 / Revised: 3 October 2025 / Accepted: 10 October 2025 / Published: 13 October 2025

Abstract

In this paper, a reduced-complexity cross-domain iterative detection scheme for orthogonal time frequency space (OTFS) modulation is proposed that exploits channel properties in both the time and delay–Doppler (DD) domains. Specifically, we first show that in the time-domain effective channel, the path delay only introduces interference among samples in adjacent time slots, while the Doppler shift becomes a phase term that does not affect the channel sparsity. This observation indicates that the effects of delay and Doppler can be decoupled and treated separately. The resulting “band-limited” matrix structure further motivates us to apply a reduced-size linear minimum mean square error (LMMSE) filter to eliminate the effect of delay in the time domain, while exploiting the cross-domain iteration to minimize the effect of Doppler, noting that time and Doppler are a Fourier dual pair. Furthermore, we apply eigenvalue decomposition to the reduced-size LMMSE estimator, which makes the computational complexity independent of the number of cross-domain iterations and thus significantly reduces it. The bias evolution and variance evolution are derived to evaluate the average mean square error (MSE) performance of the proposed scheme, which shows that the proposed estimators suffer from only negligible estimation bias in both the time and DD domains. In particular, the state (MSE) evolution is compared with performance bounds to verify the effectiveness of the proposed scheme. Simulation results demonstrate that the proposed scheme achieves almost the same error performance as the optimal detection at a reduced complexity.

1. Introduction

Future wireless networks are envisioned to accommodate many emerging applications, such as low-earth orbit (LEO) satellites and unmanned aerial vehicles (UAVs), where the signal is inevitably transmitted over complex and challenging high-mobility channel scenarios [1,2]. In such scenarios, both time dispersion and frequency dispersion occur simultaneously. The popular orthogonal frequency division multiplexing (OFDM), known for its ability to combat time dispersion well, suffers from severe performance degradation due to inter-carrier interference (ICI) and carrier frequency offset (CFO) caused by Doppler shifts [3].
The recently proposed orthogonal time frequency space (OTFS) modulation has been shown to be a good solution for signal transmission over such challenging high-mobility channels [4,5]. Unlike OFDM, which modulates information symbols in the time–frequency (TF) domain, OTFS multiplexes the information symbols in the delay–Doppler (DD) domain. This allows the appealing DD domain channel properties, including quasi-static, separable, and sparse properties [6,7], to be fully exploited, which in turn facilitates the design of channel estimation and equalization. More importantly, OTFS can potentially achieve full channel diversity [8,9,10,11], which ensures more robust performance than the currently deployed OFDM in challenging transmission scenarios.
The promising performance of OTFS relies on advanced equalization to handle the DD domain interference caused by multipath transmissions. Conventional linear equalizers, such as zero forcing (ZF) and minimum mean square error (MMSE), are widely applied in OTFS due to their simple implementation. Considering the high complexity of the matrix inversion in linear equalization, several works have proposed low-complexity implementations of ZF/MMSE for different OTFS systems, such as [12,13,14,15,16]. However, linear equalization cannot achieve optimal performance. As a classical nonlinear equalization scheme, maximum-likelihood sequence estimation (MLSE) is optimal, but it usually requires prohibitively high detection complexity and cannot be directly applied to practical systems. Thus, the design of reduced-complexity detection for OTFS has attracted much attention. For instance, a message passing (MP) algorithm based on maximum a posteriori probability (MAP) was proposed in [17], where the DD domain inter-symbol interference (ISI) is Gaussian-approximated to reduce detection complexity and damping is introduced to improve the convergence. Nevertheless, the convergence performance is still limited by the 4-cycles in the factor graph (FG). To address this, an improved MP detector for OTFS was proposed in [18], which constructs an FG of girth 6 and then directly applies an exact sum-product algorithm (SPA) with complexity linear in the constellation size. Furthermore, many improved MP-based detection algorithms have been proposed, such as hybrid MAP and parallel interference cancellation [19] and Gaussian approximate MP [20]. In particular, a variational Bayes framework for OTFS detection was introduced in [21], which effectively mitigates the performance degradation caused by the short cycles of the probabilistic graphical model in the MP algorithm.
Note that most OTFS detection schemes, including the aforementioned ones, operate in the DD domain. However, when the time and frequency resources for OTFS are limited, the DD domain channel matrix can be dense due to insufficient delay and Doppler resolution and, consequently, DD domain detection may suffer from high complexity [22]. As a special type of MP algorithm with lower complexity, unitary approximate message passing (UAMP) was proposed in [23]. By exploiting the circulant or sparse structure of the channel matrix, it achieves a promising error performance with an efficient implementation in rich scattering environments or with insufficient Doppler resolution. To obtain these special structures, ideal bi-orthogonal waveforms are assumed. With more practical waveforms, null symbols or cyclic prefixes (CPs) are frequently inserted into the OTFS signal block, which leads to significant overhead. Due to the limitations of single-domain detection, many studies have investigated joint multi-domain detectors. The cross-domain iterative detection proposed in [24] was a preliminary attempt to solve this issue by performing detection in both the time and DD domains via iterative processing, which can achieve almost the same error performance as ML detection even in the presence of fractional Doppler shifts. The cross-domain iterative detection is motivated by the unitary transformation between the time and DD domains, which ensures that the detection error in one domain is principally orthogonal to that in the other domain. Thus, it allows cross-domain iterations for signal detection without introducing error propagation. Subsequently, ref. [25] systematically analyzed three cross-domain iterative detection models between the frequency domain and the DD domain and evaluated their convergence behavior, computational complexity, and error-rate performance, thereby providing valuable theoretical guidance for the design of cross-domain detection schemes. However, in these cross-domain iterative detection algorithms, a full-size linear minimum mean square error (LMMSE) filter is adopted in the time domain, which, as we will show later, does not fully exploit the advantages of cross-domain iteration.
In this paper, we propose a novel cross-domain iterative detection scheme for OTFS with reduced complexity. The major motivation is that the effects of delay and Doppler can be decoupled and thus treated separately. In particular, the rationale behind our work is that the path delay only introduces interference among samples in adjacent time slots, while the Doppler shift behaves as a phase term that does not affect the sparsity of the time-domain OTFS effective channel. Consequently, the time-domain channel matrix has a “band-limited” structure. Based on this, we propose a reduced-size estimator in the time domain to eliminate the effect of delay, while relying on cross-domain iterations to minimize the effect of Doppler. Such an iterative scheme is motivated by the fact that time and Doppler are a Fourier dual pair, so the effect of Doppler can be minimized by iteratively exchanging the extrinsic information between the time and DD domains. Furthermore, we provide a detailed performance analysis of the proposed scheme by studying the bias evolution and the state evolution, and we compare the results with performance bounds to further verify the effectiveness of the proposed scheme. The detection complexity is also discussed. The main contributions of this paper are summarized as follows.
  • We derive the time domain and DD domain vectorized input–output relations for OTFS transmissions using the discrete Zak transform (DZT). Based on this, we analyze the properties of the time-domain effective channel matrix and reveal that the effects of delay and Doppler can be decoupled and treated separately. Furthermore, we propose a reduced-complexity cross-domain iterative detection that applies a reduced-size LMMSE in the time domain and a simple symbol-by-symbol detection in the DD domain and iteratively exchanges the extrinsic information via the unitary transformation. In particular, we further apply eigenvalue decomposition (EVD) to the proposed reduced-size LMMSE estimator, which makes the computational complexity independent of the number of cross-domain iterations, thus significantly reducing the complexity.
  • We derive the average MSE of the proposed algorithm, which shows that the error performance of the estimator is characterized by the covariance of the observation and the bias of the estimator. We show that the proposed estimators suffer from only negligible estimation bias in both time and DD domains. We further derive the state evolution for the unbiased estimation case and the theoretical performance bounds under treating-the-interference-as-noise (TIN) and genie-aided strategies. More importantly, we show that the TIN bound and genie-aided bound converge to each other after a sufficient number of cross-domain iterations. The proposed reduced-size LMMSE estimator also aligns well with the mechanism of the cross-domain iterative detection.
  • We investigate the converged error performance of the proposed algorithm by focusing on the effective DD domain signal-to-noise ratio (SNR) under the TIN and genie-aided strategies. We show that the upper bound of the effective SNR under the TIN strategy converges to that under the genie-aided strategy with the iteration of the cross-domain detection. Furthermore, we show that the effective SNR of the proposed algorithm can theoretically approach the maximum receiver SNR with a sufficient number of iterations for a given fading channel. This also demonstrates that the proposed algorithm can achieve near-optimal performance with sufficient iterations.
  • We evaluate the MSE performance and the error performance of the proposed algorithm by numerical simulations. The numerical results agree with our analysis and demonstrate a near-optimal error performance.
Notations: $\mathbf{F}_N$ and $\mathbf{F}_N^{\mathrm{H}}$ denote the discrete Fourier transform (DFT) matrix and inverse DFT (IDFT) matrix of size $N \times N$, respectively; $\mathbf{I}_N$ denotes the identity matrix of size $N \times N$; $\otimes$ denotes the Kronecker product operator; $(\cdot)^{\mathrm{H}}$ denotes the Hermitian transpose; $(\cdot)^{\mathrm{T}}$ denotes the transpose; $(\cdot)^{*}$ denotes conjugation; $\mathrm{diag}\{\cdot\}$ denotes the diagonal matrix; $\delta(\cdot)$ is the Dirac delta function.

2. System Model

2.1. Background on OTFS Transmissions

Without loss of generality, let us consider the transceiver structure of OTFS transmissions using the discrete Zak transform (DZT) in Figure 1. Let $M$ be the number of delay bins (sub-carriers) and $N$ be the number of Doppler bins (time slots). Let $T$ be the duration of each time slot; correspondingly, the sub-carrier spacing is $1/T$. A length-$MN$ DD domain information symbol vector $\mathbf{x}$, whose entries are selected uniformly from a constellation alphabet $\mathcal{A} = \{a_1, \ldots, a_Q\}$, is passed through the inverse DZT (IDZT) module, resulting in the time-domain transmitted signal $\mathbf{s}$, i.e.,
$$\mathbf{s} = \left(\mathbf{F}_N^{\mathrm{H}} \otimes \mathbf{I}_M\right)\mathbf{x}. \tag{1}$$
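As a minimal numerical sketch of (1), the following NumPy snippet applies the Kronecker-structured IDZT to a random symbol vector and checks that the same result is obtained by an IDFT along the Doppler axis. The sizes, the QPSK alphabet, and the variable names are illustrative assumptions, not taken from the paper; the only structural assumption is that $\mathbf{x}$ is stacked as $N$ consecutive blocks of $M$ delay samples, as implied by $\mathbf{F}_N^{\mathrm{H}} \otimes \mathbf{I}_M$.

```python
import numpy as np

M, N = 4, 3  # delay bins, Doppler bins (small illustrative sizes)
rng = np.random.default_rng(0)

# Unitary N-point DFT matrix F_N, so that F_N^H is the IDFT matrix.
F_N = np.fft.fft(np.eye(N)) / np.sqrt(N)

# Random QPSK symbols as a stand-in for the DD-domain vector x (length MN).
x = (rng.choice([-1, 1], M * N) + 1j * rng.choice([-1, 1], M * N)) / np.sqrt(2)

# Eq. (1): s = (F_N^H kron I_M) x, using the explicit MN x MN matrix.
s_kron = np.kron(F_N.conj().T, np.eye(M)) @ x

# Same result without forming the large matrix: reshape x to N x M
# (row = Doppler index, column = delay index) and apply a unitary IDFT
# along the Doppler axis.
s_fft = np.fft.ifft(x.reshape(N, M), axis=0, norm="ortho").reshape(-1)

print(np.allclose(s_kron, s_fft))
```

Because the transform is unitary, the energy of `s_kron` equals that of `x`, which is the property the cross-domain iteration relies on later.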
By applying the reduced-cyclic-prefix (reduced-CP) structure [26], the resultant time-domain transmitted signal can be represented as
$$\tilde{\mathbf{s}} = \mathbf{A}_{\mathrm{CP}}\,\mathbf{s}, \tag{2}$$
where $\mathbf{A}_{\mathrm{CP}} = \left[\mathbf{G}_{\mathrm{CP}}, \mathbf{I}_{MN}\right]^{\mathrm{T}}$ is the block-wise CP addition matrix of size $(MN + L_{\mathrm{CP}}) \times MN$, and $\mathbf{G}_{\mathrm{CP}}$ of size $MN \times L_{\mathrm{CP}}$ consists of the last $L_{\mathrm{CP}}$ columns of the identity matrix $\mathbf{I}_{MN}$. Without loss of generality, the CP length $L_{\mathrm{CP}}$ is selected to be no less than the maximum delay index, which will be introduced later. After adding the CP, the symbol $\tilde{s}[n]$ is transmitted by the $T_s$-orthogonal shaping pulse $p(t)$, yielding
$$s(t) = \sum_{n=0}^{MN + L_{\mathrm{CP}} - 1} \tilde{s}[n]\, p(t - nT_s). \tag{3}$$
Consider a $P$-path linear time-varying (LTV) wireless channel given by
$$h(\tau, \nu) = \sum_{i=1}^{P} h_i\, \delta(\tau - \tau_i)\, \delta(\nu - \nu_i), \tag{4}$$
where $h_i$ is the fading coefficient of the $i$-th path, following a complex Gaussian distribution with zero mean and variance $1/(2P)$ per real dimension (uniform power profile), and $\tau_i \in [0, T)$ and $\nu_i \in [0, 1/T)$ represent the delay and Doppler shifts associated with the $i$-th path, respectively. In particular, we consider the discretized delay and Doppler indices defined by $l_i + \iota_i = \tau_i M / T$ and $k_i + \kappa_i = \nu_i N T$, where $0 \le l_i \le M - 1$ and $0 \le k_i \le N - 1$ are the corresponding integer indices of delay and Doppler for the $i$-th path, while $\iota_i \in (-0.5, 0.5]$ and $\kappa_i \in (-0.5, 0.5]$ represent the fractional parts of the delay shift and the Doppler shift, respectively.
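To make the index convention concrete, the sketch below splits a normalized delay $\tau_i M / T$ and a normalized Doppler $\nu_i N T$ into an integer index and a fractional part in $(-0.5, 0.5]$. The helper name `split_index` and all numerical values (15 kHz sub-carrier spacing, the path delay, and the Doppler shift) are hypothetical and chosen only for illustration.

```python
import math

def split_index(value):
    """Split a normalized shift into an integer index and a fraction in (-0.5, 0.5]."""
    integer = math.ceil(value - 0.5)
    return integer, value - integer

M, N = 64, 16            # delay and Doppler bins (illustrative)
T = 1 / 15e3             # slot duration for an assumed 15 kHz sub-carrier spacing
tau, nu = 2.3e-6, 2.0e3  # assumed path delay [s] and Doppler shift [Hz]

l, iota = split_index(tau * M / T)   # l + iota = tau * M / T
k, kappa = split_index(nu * N * T)   # k + kappa = nu * N * T

print(l, round(iota, 3), k, round(kappa, 3))
```

Using `ceil(value - 0.5)` rather than `round` keeps the fractional part in the half-open interval $(-0.5, 0.5]$, so a value exactly halfway between two bins is assigned a fraction of $+0.5$.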
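The transmitted time-domain signal $s(t)$ passes through the above LTV channel. At the receiver, the corresponding received time-domain signal $r(t)$ is given by
$$\begin{aligned} r(t) &= \iint h(\tau, \nu)\, s(t - \tau)\, e^{j 2\pi \nu (t - \tau)}\, \mathrm{d}\tau\, \mathrm{d}\nu + n(t) \\ &= \sum_{i=1}^{P} \sum_{n=0}^{MN + L_{\mathrm{CP}} - 1} h_i\, e^{j 2\pi \nu_i (t - \tau_i)}\, \tilde{s}[n]\, p(t - nT_s - \tau_i) + n(t), \end{aligned} \tag{5}$$
where $n(t)$ is the additive white Gaussian noise (AWGN) process with one-sided power spectral density (PSD) $N_0$. By applying a matched filter with the pulse $p^{*}(t)$ to $r(t)$, we obtain
$$\tilde{r}[m] = \int r(t)\, p^{*}(t - mT_s)\, \mathrm{d}t = \sum_{i=1}^{P} \sum_{n=0}^{MN + L_{\mathrm{CP}} - 1} g_{m,n}^{(i)}\, \tilde{s}[n] + \tilde{n}[m] \tag{6}$$
for $m = 0, \ldots, MN + L_{\mathrm{CP}} - 1$, where $g_{m,n}^{(i)}$ denotes the effective time-domain channel coefficient between the $n$-th transmitted symbol $\tilde{s}[n]$ and the $m$-th received symbol $\tilde{r}[m]$ over the $i$-th resolvable path. In (6), the coefficient $g_{m,n}^{(i)}$ can be expressed as
$$g_{m,n}^{(i)} \triangleq h_i\, e^{j 2\pi n \nu_i T_s}\, A_p^{*}\big((n - m) T_s + \tau_i,\ \nu_i\big), \tag{7}$$
where $A_p(\tau, \nu)$ denotes the ambiguity function of the pulse $p(t)$ with respect to delay $\tau$ and Doppler $\nu$, given by
$$A_p(\tau, \nu) \triangleq \int p(t)\, p^{*}(t - \tau)\, e^{-j 2\pi \nu (t - \tau)}\, \mathrm{d}t. \tag{8}$$
Arranging (6) into a length-$(MN + L_{\mathrm{CP}})$ vector $\tilde{\mathbf{r}}$, we have
$$\tilde{\mathbf{r}} = \sum_{i=1}^{P} \mathbf{G}_i\, \tilde{\mathbf{s}} + \tilde{\mathbf{n}}, \tag{9}$$
where $\mathbf{G}_i$ of size $(MN + L_{\mathrm{CP}}) \times (MN + L_{\mathrm{CP}})$ is the time-domain effective channel matrix for the $i$-th resolvable path, whose $(m, n)$-th element is $g_{m,n}^{(i)}$. After performing CP removal with a block-wise CP removal matrix $\mathbf{R}_{\mathrm{CP}}$ composed of the last $MN$ rows of the identity matrix $\mathbf{I}_{MN + L_{\mathrm{CP}}}$, the time-domain vectorized input–output relation for OTFS transmission is given by
$$\mathbf{r} = \sum_{i=1}^{P} \mathbf{R}_{\mathrm{CP}}\, \mathbf{G}_i\, \mathbf{A}_{\mathrm{CP}}\, \mathbf{s} + \mathbf{R}_{\mathrm{CP}}\, \tilde{\mathbf{n}}, \tag{10}$$
where $\mathbf{n} = \mathbf{R}_{\mathrm{CP}} \tilde{\mathbf{n}}$ denotes the time-domain effective noise vector with zero mean and a one-sided power spectral density of $N_0$. Finally, the DD domain received symbol vector $\mathbf{y}$ is obtained by performing the DZT on $\mathbf{r}$, written as
$$\mathbf{y} = \sum_{i=1}^{P} \left(\mathbf{F}_N \otimes \mathbf{I}_M\right) \mathbf{H}_{\mathrm{T}}^{(i)} \left(\mathbf{F}_N^{\mathrm{H}} \otimes \mathbf{I}_M\right) \mathbf{x} + \left(\mathbf{F}_N \otimes \mathbf{I}_M\right) \mathbf{R}_{\mathrm{CP}}\, \tilde{\mathbf{n}}, \tag{11}$$
where $\mathbf{H}_{\mathrm{T}}^{(i)} = \mathbf{R}_{\mathrm{CP}} \mathbf{G}_i \mathbf{A}_{\mathrm{CP}}$ is the time-domain effective channel matrix for the $i$-th resolvable path.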
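The role of the CP addition and removal matrices in $\mathbf{R}_{\mathrm{CP}} \mathbf{G}_i \mathbf{A}_{\mathrm{CP}}$ can be illustrated with a small sketch. It builds $\mathbf{A}_{\mathrm{CP}}$ and $\mathbf{R}_{\mathrm{CP}}$ exactly as defined above and, for simplicity, assumes a hypothetical two-tap channel with integer delays and no Doppler, so that $\mathbf{G}$ is a plain linear-convolution matrix. Under these simplifying assumptions the effective matrix is circulant; with Doppler, each tap would additionally carry a phase term as in (7).

```python
import numpy as np

MN, L_cp = 8, 2                       # block length and CP length (illustrative)
taps = {0: 1.0, 2: 0.5j}              # assumed delay-index -> gain map, no Doppler

I = np.eye(MN)
G_cp = I[:, -L_cp:]                   # last L_cp columns of I_MN
A_cp = np.hstack([G_cp, I]).T         # CP addition, (MN + L_cp) x MN
R_cp = np.eye(MN + L_cp)[-MN:, :]     # CP removal, MN x (MN + L_cp)

# Linear-convolution channel acting on the CP-extended block:
# entry [m, m - d] carries the gain of the tap with delay d.
G = sum(g * np.eye(MN + L_cp, k=-d) for d, g in taps.items())

H_T = R_cp @ G @ A_cp                 # effective MN x MN channel after CP removal

# Reference: circulant matrix with the same taps (cyclic delay by d samples).
H_circ = sum(g * np.roll(I, d, axis=0) for d, g in taps.items())

print(np.allclose(H_T, H_circ))
```

The check only holds because the largest delay index (2) does not exceed `L_cp`, which mirrors the requirement that the CP be no shorter than the maximum delay index.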

2.2. Properties of the Time-Domain Effective Channel Matrix

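According to (10), the time-domain effective channel matrix $\mathbf{H}_{\mathrm{T}}$ after adding and removing CPs can be characterized as
$$\mathbf{H}_{\mathrm{T}} = \sum_{i=1}^{P} \mathbf{R}_{\mathrm{CP}}\, \mathbf{G}_i\, \mathbf{A}_{\mathrm{CP}}, \tag{12}$$
whose size is $MN \times MN$. As shown in (7), the path delay introduces time-domain ISI, whose length is defined as $L_{\mathrm{ISI}}$. Noticing that $0 \le l_i \le M - 1$ and $-0.5 < \iota_i \le 0.5$, the effective maximum length of ISI is less than $M$. Furthermore, the maximum interference length can still be approximately limited to $M - 1$ when fractional delays are considered, since the interference outside the maximum interference length is so weak as to be almost negligible. Thus, the interference induced by path delay is restricted to adjacent time slots [27]. Following the conventional OFDM setup, we partition $\mathbf{s}$ and $\mathbf{r}$ into $N$ sub-blocks, each containing $M$ samples, i.e., $\mathbf{r} = \left[\mathbf{r}_0^{\mathrm{H}}, \mathbf{r}_1^{\mathrm{H}}, \ldots, \mathbf{r}_{N-1}^{\mathrm{H}}\right]^{\mathrm{H}}$ and $\mathbf{s} = \left[\mathbf{s}_0^{\mathrm{H}}, \mathbf{s}_1^{\mathrm{H}}, \ldots, \mathbf{s}_{N-1}^{\mathrm{H}}\right]^{\mathrm{H}}$, respectively, as shown in Figure 2. Thus, the $i$-th received block $\mathbf{r}_i$ can have interference from only the $(i-1)$-th block $\mathbf{s}_{i-1}$ in the time domain. It should be noted that the first block $\mathbf{s}_0$ is interfered with by the last block $\mathbf{s}_{N-1}$ due to the appended reduced CP [28]. According to this sub-block structure, $\mathbf{H}_{\mathrm{T}}$ can be rewritten as
$$\mathbf{H}_{\mathrm{T}} = \begin{bmatrix} \mathbf{H}_{\mathrm{T}}^{0,0} & \mathbf{0} & \cdots & \mathbf{0} & \mathbf{H}_{\mathrm{T}}^{0,1} \\ \mathbf{H}_{\mathrm{T}}^{1,1} & \mathbf{H}_{\mathrm{T}}^{1,0} & \ddots & \mathbf{0} & \mathbf{0} \\ \mathbf{0} & \mathbf{H}_{\mathrm{T}}^{2,1} & \mathbf{H}_{\mathrm{T}}^{2,0} & \ddots & \vdots \\ \vdots & \ddots & \ddots & \mathbf{H}_{\mathrm{T}}^{N-2,0} & \mathbf{0} \\ \mathbf{0} & \cdots & \mathbf{0} & \mathbf{H}_{\mathrm{T}}^{N-1,1} & \mathbf{H}_{\mathrm{T}}^{N-1,0} \end{bmatrix}, \tag{13}$$
where $\mathbf{H}_{\mathrm{T}}^{i,0}$, $i = 0, \ldots, N-1$, of size $M \times M$ are the diagonal blocks of $\mathbf{H}_{\mathrm{T}}$, and $\mathbf{H}_{\mathrm{T}}^{i,1}$, $i = 0, \ldots, N-1$, of size $M \times M$ are the first sub-diagonal blocks of $\mathbf{H}_{\mathrm{T}}$, representing the inter-block interference from the $(i-1)$-th transmitted block to the $i$-th received block [28].
Furthermore, we notice that the Doppler effect behaves like a phase term in $\mathbf{H}_{\mathrm{T}}$, which does not affect the channel sparsity. The above observation suggests that the time-domain effective channel matrix $\mathbf{H}_{\mathrm{T}}$ has a “band-limited” structure, where each row has only a limited number of non-zero elements.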
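The band-limited structure can be checked numerically under a simplified model. The sketch below is not the paper's pulse-shaped channel in (7); it assumes integer delays and a commonly used discrete approximation in which each path contributes a cyclic shift (delay) multiplied by a diagonal phase ramp (Doppler). The path parameters are hypothetical. It then verifies that only the diagonal blocks and the first (cyclic) sub-diagonal blocks are non-zero, and that fractional Doppler does not add non-zero entries.

```python
import numpy as np

M, N = 4, 4
MN = M * N
# Assumed paths: (gain, integer delay index l < M, Doppler index k + kappa).
paths = [(1.0, 0, 0.0), (0.6j, 1, 1.3), (-0.4, 3, -2.25)]

n = np.arange(MN)
H_T = np.zeros((MN, MN), dtype=complex)
for h, l, k in paths:
    shift = np.roll(np.eye(MN), l, axis=0)               # cyclic delay by l samples
    phase = np.diag(np.exp(2j * np.pi * k * n / MN))     # Doppler: unit-modulus phase ramp
    H_T += h * shift @ phase

# Split H_T into an N x N grid of M x M blocks: blocks[i, j] maps s_j to r_i.
blocks = H_T.reshape(N, M, N, M).transpose(0, 2, 1, 3)
active = np.abs(blocks).sum(axis=(2, 3)) > 0

# Expected pattern: block (i, i) and block (i, (i - 1) mod N) only.
expected = np.eye(N, dtype=bool) | np.roll(np.eye(N, dtype=bool), 1, axis=0)
nonzeros_per_row = np.count_nonzero(H_T, axis=1)

print(np.array_equal(active, expected), nonzeros_per_row.max())
```

In this toy model every row has exactly as many non-zero entries as there are paths, regardless of the Doppler values, which is the sparsity property the reduced-size estimator exploits.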
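According to the above analysis, we reformulate the block-wise input–output relation in the time domain as
$$\mathbf{r}_i = \mathbf{H}_{\mathrm{T}}^{i,0}\, \mathbf{s}_i + \mathbf{H}_{\mathrm{T}}^{i,1}\, \mathbf{s}_{[i-1]_N} + \mathbf{n}_i, \tag{14}$$
$i = 0, \ldots, N-1$, where $[\cdot]_N$ denotes the mod-$N$ operation. Note that both $\mathbf{r}_i$ and $\mathbf{r}_{[i+1]_N}$ contain information about $\mathbf{s}_i$ due to the path delay. Thus, it is convenient to write
$$\begin{bmatrix} \mathbf{r}_i \\ \mathbf{r}_{[i+1]_N} \end{bmatrix} = \begin{bmatrix} \mathbf{H}_{\mathrm{T}}^{i,0} \\ \mathbf{H}_{\mathrm{T}}^{[i+1]_N,1} \end{bmatrix} \mathbf{s}_i + \begin{bmatrix} \mathbf{H}_{\mathrm{T}}^{i,1} & \mathbf{0} \\ \mathbf{0} & \mathbf{H}_{\mathrm{T}}^{[i+1]_N,0} \end{bmatrix} \begin{bmatrix} \mathbf{s}_{[i-1]_N} \\ \mathbf{s}_{[i+1]_N} \end{bmatrix} + \begin{bmatrix} \mathbf{n}_i \\ \mathbf{n}_{[i+1]_N} \end{bmatrix}, \tag{15}$$
which can be further written as
$$\tilde{\mathbf{r}}_i = \mathbf{H}_{\mathrm{A}}^{i}\, \mathbf{s}_i + \mathbf{H}_{\mathrm{B}}^{i}\, \tilde{\mathbf{s}}_i + \tilde{\mathbf{n}}_i. \tag{16}$$
In (16), $\tilde{\mathbf{r}}_i \in \mathbb{C}^{2M \times 1}$ is the observation at the receiver side corresponding to $\mathbf{s}_i$; $\mathbf{H}_{\mathrm{A}}^{i} \in \mathbb{C}^{2M \times M}$ is the effective observation matrix, characterizing the interference pattern related to $\mathbf{s}_i$; $\mathbf{H}_{\mathrm{B}}^{i} \in \mathbb{C}^{2M \times 2M}$ is the effective interference matrix, characterizing the additional interference from other transmitted sub-blocks; $\tilde{\mathbf{s}}_i \in \mathbb{C}^{2M \times 1}$ is the interfering signal vector; and $\tilde{\mathbf{n}}_i \in \mathbb{C}^{2M \times 1}$ is the corresponding noise vector.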
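The stacking step from (14) to (16) can be sketched as follows. Random matrices stand in for the blocks $\mathbf{H}_{\mathrm{T}}^{i,0}$ and $\mathbf{H}_{\mathrm{T}}^{i,1}$, noise is omitted, and the function name `stacked_model` is a hypothetical helper introduced only for this illustration.

```python
import numpy as np

M, N = 3, 4
rng = np.random.default_rng(1)

def crandn(*shape):
    """Complex Gaussian test data (hypothetical helper)."""
    return rng.standard_normal(shape) + 1j * rng.standard_normal(shape)

H0 = crandn(N, M, M)   # stand-ins for the diagonal blocks H_T^{i,0}
H1 = crandn(N, M, M)   # stand-ins for the sub-diagonal blocks H_T^{i,1}
s = crandn(N, M)       # transmitted sub-blocks s_0, ..., s_{N-1}

# Eq. (14) without noise: r_i = H_T^{i,0} s_i + H_T^{i,1} s_{(i-1) mod N}.
r = np.stack([H0[i] @ s[i] + H1[i] @ s[(i - 1) % N] for i in range(N)])

def stacked_model(i):
    """Return (r_tilde, H_A, H_B, s_tilde) of Eq. (16) for sub-block i."""
    nxt, prv = (i + 1) % N, (i - 1) % N
    Z = np.zeros((M, M))
    H_A = np.vstack([H0[i], H1[nxt]])            # how s_i enters r_i and r_{i+1}
    H_B = np.block([[H1[i], Z], [Z, H0[nxt]]])   # interference from s_{i-1} and s_{i+1}
    r_tilde = np.concatenate([r[i], r[nxt]])
    s_tilde = np.concatenate([s[prv], s[nxt]])
    return r_tilde, H_A, H_B, s_tilde

r_t, H_A, H_B, s_t = stacked_model(0)
print(np.allclose(r_t, H_A @ s[0] + H_B @ s_t))
```

Note that for `i = 0` the previous block wraps around to `s[N - 1]`, which reflects the CP-induced coupling between the first and last sub-blocks.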
The above discussion naturally motivates us to design a time-domain estimator/detector that exploits the effective channel matrix structure, and this is presented in the following section.

3. Cross-Domain Iterative Detection for OTFS Modulation via Delay–Doppler Decoupling

Based on (16), we propose a reduced-size LMMSE estimator to eliminate the delay interference, which has a much lower complexity than the full-size LMMSE estimator adopted in [24]. However, such an estimator cannot fully minimize the effect of Doppler. In [24], the authors have shown that the cross-domain iteration can effectively detect the OTFS signal by iteratively exchanging the extrinsic information between the time domain and the DD domain, because time and Doppler are a Fourier dual pair. Following this idea, we adopt the cross-domain iteration to minimize the Doppler interference, as will be discussed later. For clarity, we assume perfect knowledge of channel state information (CSI), including the number of resolvable paths, path delays, Doppler shifts, and fading coefficients. In practice, the DD domain channel remains roughly constant within a stationarity region [7,11,28]. Consequently, acquiring CSI in the DD domain is generally feasible, even in high-mobility scenarios.

3.1. Reduced-Size LMMSE Estimator in the Time Domain

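Based on (16), the block-wise LMMSE estimation matrix $\mathbf{W}_{\mathrm{MMSE}}^{i}$ for the $i$-th transmitted block $\mathbf{s}_i$ can be obtained as
$$\mathbf{W}_{\mathrm{MMSE}}^{i} = \mathbf{C}_{\mathbf{s}_i}^{\mathrm{a,T}} \left(\mathbf{H}_{\mathrm{A}}^{i}\right)^{\mathrm{H}} \left( \mathbf{H}_{\mathrm{A}}^{i}\, \mathbf{C}_{\mathbf{s}_i}^{\mathrm{a,T}} \left(\mathbf{H}_{\mathrm{A}}^{i}\right)^{\mathrm{H}} + \mathbf{H}_{\mathrm{B}}^{i}\, \mathbf{C}_{\tilde{\mathbf{s}}_i}^{\mathrm{a,T}} \left(\mathbf{H}_{\mathrm{B}}^{i}\right)^{\mathrm{H}} + N_0 \mathbf{I}_{2M} \right)^{-1}, \tag{17}$$
where $\mathbf{C}_{\mathbf{s}_i}^{\mathrm{a,T}}$ and $\mathbf{C}_{\tilde{\mathbf{s}}_i}^{\mathrm{a,T}}$ are the a priori covariance matrices of $\mathbf{s}_i$ and $\tilde{\mathbf{s}}_i$, initialized as $\mathbf{I}_M$ and $\mathbf{I}_{2M}$ for the first iteration, respectively. Furthermore, the a posteriori estimation output $\mathbf{m}_{\mathbf{s}_i}^{\mathrm{p,T}}$ of $\mathbf{s}_i$ is given by
$$\mathbf{m}_{\mathbf{s}_i}^{\mathrm{p,T}} = \mathbf{m}_{\mathbf{s}_i}^{\mathrm{a,T}} + \mathbf{W}_{\mathrm{MMSE}}^{i} \left( \tilde{\mathbf{r}}_i - \mathbf{H}_{\mathrm{B}}^{i}\, \mathbf{m}_{\tilde{\mathbf{s}}_i}^{\mathrm{a,T}} - \mathbf{H}_{\mathrm{A}}^{i}\, \mathbf{m}_{\mathbf{s}_i}^{\mathrm{a,T}} \right), \tag{18}$$
where $\mathbf{m}_{\mathbf{s}_i}^{\mathrm{a,T}}$ and $\mathbf{m}_{\tilde{\mathbf{s}}_i}^{\mathrm{a,T}}$ are the a priori mean vectors of $\mathbf{s}_i$ and $\tilde{\mathbf{s}}_i$ with sizes $M \times 1$ and $2M \times 1$, respectively. Note that the above LMMSE estimator applies successive interference cancellation (SIC) to eliminate the interference from $\tilde{\mathbf{s}}_i$ according to the a priori information from the previous iteration.
The a posteriori covariance matrix $\mathbf{C}_{\mathbf{s}_i}^{\mathrm{p,T}}$ of $\mathbf{s}_i$ is given by
$$\mathbf{C}_{\mathbf{s}_i}^{\mathrm{p,T}} = \mathbf{C}_{\mathbf{s}_i}^{\mathrm{a,T}} - \mathbf{W}_{\mathrm{MMSE}}^{i}\, \mathbf{H}_{\mathrm{A}}^{i}\, \mathbf{C}_{\mathbf{s}_i}^{\mathrm{a,T}}. \tag{19}$$
Note that $\mathbf{C}_{\mathbf{s}_i}^{\mathrm{a,T}}$ should be a diagonal matrix due to the independent and identically distributed (i.i.d.) assumption of the transmitted symbols, which is independent of channel impairments such as fractional Doppler shifts and waveform distortions. As such, we discard the non-diagonal entries of $\mathbf{C}_{\mathbf{s}_i}^{\mathrm{p,T}}$ (treated as zeros) for further processing [24]. The extrinsic covariance matrix $\mathbf{C}_{\mathbf{s}_i}^{\mathrm{e,T}}$ and mean $\mathbf{m}_{\mathbf{s}_i}^{\mathrm{e,T}}$ associated with $\mathbf{s}_i$ from the LMMSE estimation can be written as
$$\mathbf{C}_{\mathbf{s}_i}^{\mathrm{e,T}} = \left( \left(\mathbf{C}_{\mathbf{s}_i}^{\mathrm{p,T}}\right)^{-1} - \left(\mathbf{C}_{\mathbf{s}_i}^{\mathrm{a,T}}\right)^{-1} \right)^{-1} \tag{20}$$
and
$$\mathbf{m}_{\mathbf{s}_i}^{\mathrm{e,T}} = \mathbf{C}_{\mathbf{s}_i}^{\mathrm{e,T}} \left( \left(\mathbf{C}_{\mathbf{s}_i}^{\mathrm{p,T}}\right)^{-1} \mathbf{m}_{\mathbf{s}_i}^{\mathrm{p,T}} - \left(\mathbf{C}_{\mathbf{s}_i}^{\mathrm{a,T}}\right)^{-1} \mathbf{m}_{\mathbf{s}_i}^{\mathrm{a,T}} \right), \tag{21}$$
respectively.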
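A compact sketch of (17)–(21) for a single sub-block is given below. It stores the diagonal covariance matrices as vectors of variances, keeps only the diagonal of (19) as described above, and uses random matrices and BPSK-like symbols as stand-ins for real channel blocks and data. The function name `lmmse_block`, the helper `crandn`, and all numerical values are assumptions made for illustration.

```python
import numpy as np

def lmmse_block(r_t, H_A, H_B, m_a, v_a, m_at, v_at, N0):
    """Eqs. (17)-(21) for one sub-block, with diagonal covariances stored as vectors."""
    HAv = H_A * v_a                                    # H_A C_a (scales columns)
    Sigma = HAv @ H_A.conj().T + (H_B * v_at) @ H_B.conj().T + N0 * np.eye(len(r_t))
    W = np.linalg.solve(Sigma, HAv).conj().T           # Eq. (17); Sigma is Hermitian
    m_p = m_a + W @ (r_t - H_B @ m_at - H_A @ m_a)     # Eq. (18)
    v_p = v_a - np.real(np.diag(W @ HAv))              # diagonal of Eq. (19)
    v_e = 1.0 / (1.0 / v_p - 1.0 / v_a)                # Eq. (20)
    m_e = v_e * (m_p / v_p - m_a / v_a)                # Eq. (21)
    return m_p, v_p, m_e, v_e

def crandn(rng, *shape):
    """Complex Gaussian test data (hypothetical helper)."""
    return rng.standard_normal(shape) + 1j * rng.standard_normal(shape)

# Illustrative first-iteration call: zero a priori means, unit a priori variances.
M, N0 = 4, 0.1
rng = np.random.default_rng(0)
H_A, H_B = crandn(rng, 2 * M, M), crandn(rng, 2 * M, 2 * M)
s_i = rng.choice([-1.0, 1.0], M) + 0j          # desired sub-block (BPSK stand-in)
s_t = rng.choice([-1.0, 1.0], 2 * M) + 0j      # interfering sub-blocks
r_t = H_A @ s_i + H_B @ s_t + np.sqrt(N0 / 2) * crandn(rng, 2 * M)

m_p, v_p, m_e, v_e = lmmse_block(
    r_t, H_A, H_B,
    np.zeros(M, complex), np.ones(M),
    np.zeros(2 * M, complex), np.ones(2 * M), N0)
print(v_p, v_e)
```

In the first iteration the interference from $\tilde{\mathbf{s}}_i$ is effectively treated as noise; in later iterations the a priori means and variances passed in would come from the DD-domain stage.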
According to (17), the computational complexity is dominated by the matrix inversion, which is in the order of $\mathcal{O}\left((2M)^3\right)$. Since $N$ sub-blocks need to be detected, the overall computational complexity of the proposed reduced-size LMMSE estimator per execution is $\mathcal{O}\left((2M)^3 N\right)$. In (13), we consider a one-sided ISI of length $L_{\mathrm{ISI}}$, since the path delay is not less than zero. Thus, $\mathbf{H}_{\mathrm{T}}^{i,0}$ is a lower triangular matrix, while $\mathbf{H}_{\mathrm{T}}^{i,1}$ is a strictly upper triangular matrix. Consequently, both $\mathbf{H}_{\mathrm{A}}^{i} \mathbf{C}_{\mathbf{s}_i}^{\mathrm{a,T}} \left(\mathbf{H}_{\mathrm{A}}^{i}\right)^{\mathrm{H}}$ and $\mathbf{H}_{\mathrm{B}}^{i} \mathbf{C}_{\tilde{\mathbf{s}}_i}^{\mathrm{a,T}} \left(\mathbf{H}_{\mathrm{B}}^{i}\right)^{\mathrm{H}}$ are banded matrices with halfwidth $L_{\mathrm{ISI}} + 1$. Therefore, we can utilize LU decomposition [29] to further reduce the complexity of the inverse operation in (17) to $\mathcal{O}\left(2M\left[2\left(L_{\mathrm{ISI}} + 1\right)^2 + 2\left(2L_{\mathrm{ISI}} + 1\right)\right]\right)$. Nevertheless, when the ISI length is large, the computational complexity of the LU-based reduced-size LMMSE scheme approaches that of the reduced-size LMMSE scheme. Interestingly, by investigating (17), we find that only the a priori information $\mathbf{C}_{\mathbf{s}}^{\mathrm{a,T}}$ of the proposed reduced-size LMMSE estimator is updated in each cross-domain iteration. Note that the diagonal entries of the diagonal matrix $\mathbf{C}_{\mathbf{s}}^{\mathrm{a,T}}$ tend to the same value due to the law of large numbers when $MN$ tends to infinity. These observations motivate us to use the eigenvalue decomposition (EVD) to minimize the computation of (17) in each cross-domain iteration. We refer to this scheme as EVD-based reduced-size LMMSE, which will be described in detail in the following subsection.
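To give a feel for the orders of magnitude, the short calculation below compares the leading-order cost of one dense $MN \times MN$ inversion with that of $N$ inversions of size $2M \times 2M$. It assumes that a full-size LMMSE filter is implemented by a direct dense inversion with cubic cost and ignores constant factors; the values of $M$ and $N$ and the helper `cubic_cost` are illustrative.

```python
def cubic_cost(size, count=1):
    """Leading-order operation count for `count` dense inversions of a size x size matrix."""
    return count * size ** 3

M, N = 32, 16
full_size = cubic_cost(M * N)          # one MN x MN inversion (assumed full-size LMMSE)
reduced = cubic_cost(2 * M, count=N)   # N inversions of size 2M x 2M (reduced-size LMMSE)
ratio = full_size / reduced            # equals N**2 / 8

print(full_size, reduced, ratio)
```

Under these assumptions the saving per execution scales as $N^2/8$, so it grows quickly with the number of Doppler bins.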

3.2. EVD-Based Reduced-Size LMMSE Estimator in the Time Domain

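We assume that the a priori covariance matrix $\mathbf{C}_{\mathbf{s}}^{\mathrm{a,T}}$ is a diagonal matrix whose diagonal entries converge to the same value, i.e.,
$$\mathbf{C}_{\mathbf{s}}^{\mathrm{a,T}} = \frac{1}{MN} \mathrm{Tr}\left(\mathbf{C}_{\mathbf{s}}^{\mathrm{e,DD}}\right) \mathbf{I}_{MN} = \alpha_{n_{\mathrm{iter}}} \mathbf{I}_{MN}. \tag{22}$$
This assumption is consistent with the law of large numbers and causes only negligible performance loss when $MN$ is sufficiently large, especially in the high-SNR regime [24]. Thus, substituting $\mathbf{C}_{\mathbf{s}_i}^{\mathrm{a,T}} = \alpha_{n_{\mathrm{iter}}} \mathbf{I}_M$ and $\mathbf{C}_{\tilde{\mathbf{s}}_i}^{\mathrm{a,T}} = \alpha_{n_{\mathrm{iter}}} \mathbf{I}_{2M}$ into (17), we obtain
$$\mathbf{W}_{\mathrm{MMSE}}^{i} = \left(\mathbf{H}_{\mathrm{A}}^{i}\right)^{\mathrm{H}} \left( \mathbf{H}_{\mathrm{A}}^{i} \left(\mathbf{H}_{\mathrm{A}}^{i}\right)^{\mathrm{H}} + \mathbf{H}_{\mathrm{B}}^{i} \left(\mathbf{H}_{\mathrm{B}}^{i}\right)^{\mathrm{H}} + \frac{N_0}{\alpha_{n_{\mathrm{iter}}}} \mathbf{I}_{2M} \right)^{-1}. \tag{23}$$
Furthermore, we define
$$\boldsymbol{\Phi}^{i} = \left( \mathbf{H}_{\mathrm{A}}^{i} \left(\mathbf{H}_{\mathrm{A}}^{i}\right)^{\mathrm{H}} + \mathbf{H}_{\mathrm{B}}^{i} \left(\mathbf{H}_{\mathrm{B}}^{i}\right)^{\mathrm{H}} + \frac{N_0}{\alpha_{n_{\mathrm{iter}}}} \mathbf{I}_{2M} \right)^{-1} \tag{24}$$
and
$$\boldsymbol{\Psi}^{i} = \mathbf{H}_{\mathrm{A}}^{i} \left(\mathbf{H}_{\mathrm{A}}^{i}\right)^{\mathrm{H}} + \mathbf{H}_{\mathrm{B}}^{i} \left(\mathbf{H}_{\mathrm{B}}^{i}\right)^{\mathrm{H}}. \tag{25}$$
Evidently, (25) is a Hermitian matrix of size $2M \times 2M$, which can be decomposed by EVD into
$$\boldsymbol{\Psi}^{i} = \mathbf{U}^{i} \boldsymbol{\Lambda}^{i} \left(\mathbf{U}^{i}\right)^{\mathrm{H}}. \tag{26}$$
Specifically, $\mathbf{U}^{i}$ of size $2M \times 2M$ is a unitary matrix whose $j$-th column is the $j$-th eigenvector of $\boldsymbol{\Psi}^{i}$, and $\boldsymbol{\Lambda}^{i}$ is a diagonal matrix whose diagonal elements are the corresponding eigenvalues, i.e., $\boldsymbol{\Lambda}^{i} = \mathrm{diag}\{\lambda_1^{i}, \ldots, \lambda_j^{i}, \ldots, \lambda_{2M}^{i}\}$. Consequently, (24) can be simplified as
$$\boldsymbol{\Phi}^{i} = \mathbf{U}^{i} \left( \boldsymbol{\Lambda}^{i} + \frac{N_0}{\alpha_{n_{\mathrm{iter}}}} \mathbf{I}_{2M} \right)^{-1} \left(\mathbf{U}^{i}\right)^{\mathrm{H}} = \mathbf{U}^{i} \tilde{\boldsymbol{\Lambda}}^{i} \left(\mathbf{U}^{i}\right)^{\mathrm{H}}, \tag{27}$$
where
$$\tilde{\boldsymbol{\Lambda}}^{i} = \left( \boldsymbol{\Lambda}^{i} + \frac{N_0}{\alpha_{n_{\mathrm{iter}}}} \mathbf{I}_{2M} \right)^{-1} \tag{28}$$
$$\phantom{\tilde{\boldsymbol{\Lambda}}^{i}} = \mathrm{diag}\left\{ \frac{1}{\lambda_1^{i} + N_0/\alpha_{n_{\mathrm{iter}}}}, \ldots, \frac{1}{\lambda_{2M}^{i} + N_0/\alpha_{n_{\mathrm{iter}}}} \right\} \tag{29}$$
is updated in each cross-domain iteration with negligible computational complexity compared to that of the matrix inversion operation.
The computational complexity of the proposed EVD-based reduced-size LMMSE estimator mainly depends on the EVD of (25), which is generally in the order of $\mathcal{O}\left((2M)^3\right)$, consistent with that of the matrix inversion operation in the reduced-size LMMSE estimator. However, the EVD only needs to be performed in the first iteration and can then be retained for subsequent cross-domain iterations. Thus, the computational complexity of the proposed EVD-based reduced-size LMMSE scheme does not accumulate as the number of cross-domain iterations increases and remains $\mathcal{O}\left((2M)^3 N\right)$ overall, which significantly reduces the computational complexity of the entire cross-domain iterative detection.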
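The reuse of a single EVD across iterations can be sketched as follows. Random matrices stand in for $\mathbf{H}_{\mathrm{A}}^{i}$ and $\mathbf{H}_{\mathrm{B}}^{i}$, and the function names `w_evd` and `w_direct` as well as the values of $\alpha$ are hypothetical. The snippet computes the filter in (23) once through the eigen-decomposition of (25) and once by explicit inversion, for several values of $\alpha$.

```python
import numpy as np

M = 4
rng = np.random.default_rng(2)
H_A = rng.standard_normal((2 * M, M)) + 1j * rng.standard_normal((2 * M, M))
H_B = rng.standard_normal((2 * M, 2 * M)) + 1j * rng.standard_normal((2 * M, 2 * M))
N0 = 0.1

# Eqs. (25)-(26): one-off EVD of the Hermitian matrix Psi.
Psi = H_A @ H_A.conj().T + H_B @ H_B.conj().T
lam, U = np.linalg.eigh(Psi)
UhH = U.conj().T  # reused in every iteration

def w_evd(alpha):
    """Eq. (23) via Eqs. (27)-(29): only a diagonal rescaling depends on alpha."""
    Phi = (U / (lam + N0 / alpha)) @ UhH   # U diag(1 / (lam + N0/alpha)) U^H
    return H_A.conj().T @ Phi

def w_direct(alpha):
    """Eq. (23) by explicit inversion, for comparison."""
    return H_A.conj().T @ np.linalg.inv(Psi + (N0 / alpha) * np.eye(2 * M))

for alpha in (1.0, 0.3, 0.05):   # a priori variance shrinking over iterations
    print(alpha, np.allclose(w_evd(alpha), w_direct(alpha)))
```

Only the element-wise division by `lam + N0 / alpha` changes between iterations; the cubic-cost call to `np.linalg.eigh` is made once per sub-block.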

3.3. Cross-Domain Iterative Detection for OTFS

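The extrinsic information $\mathbf{C}_{\mathbf{s}_i}^{\mathrm{e,T}}$ and $\mathbf{m}_{\mathbf{s}_i}^{\mathrm{e,T}}$ obtained from the LMMSE estimation is passed to the DD domain as shown in Figure 3. According to the relationship between the DD domain and the time domain, the DD-domain a priori information, $\mathbf{C}_{\mathbf{s}}^{\mathrm{a,DD}}$ and $\mathbf{m}_{\mathbf{x}}^{\mathrm{a,DD}}$, is given by [24]
$$\mathbf{C}_{\mathbf{s}}^{\mathrm{a,DD}} = \mathbf{C}_{\mathbf{s}}^{\mathrm{e,T}} \tag{30}$$
and
$$\mathbf{m}_{\mathbf{x}}^{\mathrm{a,DD}} = \left(\mathbf{F}_N \otimes \mathbf{I}_M\right) \mathbf{m}_{\mathbf{s}}^{\mathrm{e,T}}, \tag{31}$$
where $\mathbf{C}_{\mathbf{s}}^{\mathrm{e,T}} = \mathrm{diag}\left\{ \mathrm{diag}\{\mathbf{C}_{\mathbf{s}_0}^{\mathrm{e,T}}\}, \ldots, \mathrm{diag}\{\mathbf{C}_{\mathbf{s}_{N-1}}^{\mathrm{e,T}}\} \right\}$ and $\mathbf{m}_{\mathbf{s}}^{\mathrm{e,T}} = \left[ \left(\mathbf{m}_{\mathbf{s}_0}^{\mathrm{e,T}}\right)^{\mathrm{T}}, \ldots, \left(\mathbf{m}_{\mathbf{s}_{N-1}}^{\mathrm{e,T}}\right)^{\mathrm{T}} \right]^{\mathrm{T}}$ are the extrinsic covariance matrix and mean of $\mathbf{s}$, respectively.
In the DD domain, the detection can be conducted in a simple symbol-by-symbol manner, e.g., Algorithm 2 in [24], where the corresponding a posteriori mean $\mathbf{m}_{\mathbf{x}}^{\mathrm{p,DD}}$ and covariance matrix $\mathbf{C}_{\mathbf{x}}^{\mathrm{p,DD}}$ of the DD domain OTFS symbol $\mathbf{x}$ are then passed back to the time domain for calculating the extrinsic information. Based on (1), the corresponding a posteriori mean $\mathbf{m}_{\mathbf{s}}^{\mathrm{p,DD}}$ and covariance matrix $\mathbf{C}_{\mathbf{s}}^{\mathrm{p,DD}}$ of $\mathbf{s}$ are given by
$$\mathbf{m}_{\mathbf{s}}^{\mathrm{p,DD}} = \left(\mathbf{F}_N^{\mathrm{H}} \otimes \mathbf{I}_M\right) \mathbf{m}_{\mathbf{x}}^{\mathrm{p,DD}} \tag{32}$$
and
$$\mathbf{C}_{\mathbf{s}}^{\mathrm{p,DD}} = \left(\mathbf{F}_N^{\mathrm{H}} \otimes \mathbf{I}_M\right) \mathbf{C}_{\mathbf{x}}^{\mathrm{p,DD}} \left(\mathbf{F}_N \otimes \mathbf{I}_M\right), \tag{33}$$
respectively. Then, the extrinsic information of $\mathbf{s}$ in terms of the covariance matrix and mean can be obtained as [24]
$$\mathbf{C}_{\mathbf{s}}^{\mathrm{e,DD}} = \left( \left(\mathbf{C}_{\mathbf{s}}^{\mathrm{p,DD}}\right)^{-1} - \left(\mathbf{C}_{\mathbf{s}}^{\mathrm{a,DD}}\right)^{-1} \right)^{-1} \tag{34}$$
and
$$\mathbf{m}_{\mathbf{s}}^{\mathrm{e,DD}} = \mathbf{C}_{\mathbf{s}}^{\mathrm{e,DD}} \left( \left(\mathbf{C}_{\mathbf{s}}^{\mathrm{p,DD}}\right)^{-1} \mathbf{m}_{\mathbf{s}}^{\mathrm{p,DD}} - \left(\mathbf{C}_{\mathbf{s}}^{\mathrm{a,DD}}\right)^{-1} \mathbf{m}_{\mathbf{s}}^{\mathrm{e,T}} \right), \tag{35}$$
respectively. Next, the extrinsic information is fed back to the time-domain LMMSE estimator for the next iteration. Specifically, the a priori covariance matrix and mean of $\mathbf{s}$ are updated to $\mathbf{C}_{\mathbf{s}}^{\mathrm{a,T}} = \mathbf{C}_{\mathbf{s}}^{\mathrm{e,DD}}$ and $\mathbf{m}_{\mathbf{s}}^{\mathrm{a,T}} = \mathbf{m}_{\mathbf{s}}^{\mathrm{e,DD}}$. The details of the proposed low-complexity cross-domain iterative detection for OTFS are given in Algorithm 1.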
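The return path from the DD domain to the time domain in (32)–(35) can be sketched as below. The a posteriori DD-domain statistics are random placeholders rather than the output of an actual symbol-by-symbol detector, the covariance matrices are treated as diagonal, and all names and values are assumptions made for illustration. The sketch also shows that the diagonal of (33) is, for each delay index, the average of the DD-domain variances over the Doppler axis.

```python
import numpy as np

M, N = 4, 8
rng = np.random.default_rng(3)
F = np.kron(np.fft.fft(np.eye(N)) / np.sqrt(N), np.eye(M))   # F_N kron I_M (unitary)

v_x_p = rng.uniform(0.01, 0.1, M * N)                # assumed DD a posteriori variances
m_x_p = rng.standard_normal(M * N) + 1j * rng.standard_normal(M * N)

m_s_p = F.conj().T @ m_x_p                           # Eq. (32)
C_s_p = F.conj().T @ np.diag(v_x_p) @ F              # Eq. (33)

# The diagonal of C_s_p is, for each delay index, the Doppler-average of v_x_p.
v_s_p = np.real(np.diag(C_s_p))
v_avg = np.tile(v_x_p.reshape(N, M).mean(axis=0), N)

v_s_a = np.full(M * N, 0.5)                          # assumed diag of C_s^{a,DD} = C_s^{e,T}
m_s_eT = rng.standard_normal(M * N) + 0j             # assumed time-domain extrinsic mean
v_s_e = 1.0 / (1.0 / v_s_p - 1.0 / v_s_a)            # Eq. (34), diagonal entries
m_s_e = v_s_e * (m_s_p / v_s_p - m_s_eT / v_s_a)     # Eq. (35)

print(np.allclose(v_s_p, v_avg), v_s_e.min() > 0)
```

The averaging effect of the unitary transform is one way to see why the time-domain a priori variances tend toward a common value, which the EVD-based estimator assumes.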
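Algorithm 1 Low-Complexity Cross-Domain Iterative Detection for OTFS
 1: Input: $\mathbf{r}$, $\mathbf{H}_{\mathrm{T}}$, $N_0$, and maximum number of iterations $L_{\max}$
 2: Initialization: Set $\mathbf{C}_{\mathbf{s}}^{\mathrm{a,T}} = \mathbf{I}_{MN}$, $\mathbf{m}_{\mathbf{s}}^{\mathrm{a,T}} = \mathbf{0}_{MN}$, $n_{\mathrm{iter}} = 1$
 3: while $n_{\mathrm{iter}} \le L_{\max}$ do
 4:     for $i = 0, 1, \ldots, N-1$ do
 5:         Compute the time-domain LMMSE estimator matrix $\mathbf{W}_{\mathrm{MMSE}}^{i}$ by (17) or (23).
 6:         Compute the a posteriori information $\mathbf{m}_{\mathbf{s}_i}^{\mathrm{p,T}}$ based on (18) and $\mathbf{C}_{\mathbf{s}_i}^{\mathrm{p,T}}$ based on (19).
 7:         Compute the extrinsic information $\mathbf{C}_{\mathbf{s}_i}^{\mathrm{e,T}}$ based on (20) and $\mathbf{m}_{\mathbf{s}_i}^{\mathrm{e,T}}$ based on (21).
 8:     end for
 9:     Compute the DD domain a priori information $\mathbf{C}_{\mathbf{s}}^{\mathrm{a,DD}}$ based on (30) and $\mathbf{m}_{\mathbf{x}}^{\mathrm{a,DD}}$ based on (31).
10:     Perform the symbol-by-symbol detection for DD domain symbols according to Algorithm 2 in [24].
11:     Compute the a posteriori information of the time-domain signal $\mathbf{s}$, i.e., $\mathbf{m}_{\mathbf{s}}^{\mathrm{p,DD}}$ based on (32) and $\mathbf{C}_{\mathbf{s}}^{\mathrm{p,DD}}$ based on (33).
12:     Compute the extrinsic information $\mathbf{C}_{\mathbf{s}}^{\mathrm{e,DD}}$ based on (34) and $\mathbf{m}_{\mathbf{s}}^{\mathrm{e,DD}}$ based on (35).
13:     Refresh the time-domain a priori information $\mathbf{C}_{\mathbf{s}}^{\mathrm{a,T}} = \mathbf{C}_{\mathbf{s}}^{\mathrm{e,DD}}$ and $\mathbf{m}_{\mathbf{s}}^{\mathrm{a,T}} = \mathbf{m}_{\mathbf{s}}^{\mathrm{e,DD}}$.
14:     $n_{\mathrm{iter}} = n_{\mathrm{iter}} + 1$
15: end while
16: Output: DD domain estimated signal $\hat{\mathbf{x}}$.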

4. Performance Analysis

In this section, we analyze the MSE performance and error performance of the proposed reduced-size cross-domain iterative detection algorithm.

4.1. Average MSE Analysis

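Let us consider the a priori mean $\mathbf{m}_{\mathbf{x}}^{\mathrm{a,DD}}$ of the DD domain detector. Recalling (31), $\mathbf{m}_{\mathbf{x}}^{\mathrm{a,DD}}$ is related to the extrinsic mean output of the time-domain detector, i.e., $\mathbf{m}_{\mathbf{s}}^{\mathrm{e,T}}$. We rewrite $\mathbf{m}_{\mathbf{s}_i}^{\mathrm{e,T}}$ in (21) element-wise as follows:
$$\begin{aligned} m_{\mathbf{s}_i}^{\mathrm{e,T}}[k] &= v_{\mathbf{s}_i}^{\mathrm{e,T}}[k] \left( \frac{m_{\mathbf{s}_i}^{\mathrm{p,T}}[k]}{v_{\mathbf{s}_i}^{\mathrm{p,T}}[k]} - \frac{m_{\mathbf{s}_i}^{\mathrm{a,T}}[k]}{v_{\mathbf{s}_i}^{\mathrm{a,T}}[k]} \right) \\ &= \frac{1}{\dfrac{1}{v_{\mathbf{s}_i}^{\mathrm{p,T}}[k]} - \dfrac{1}{v_{\mathbf{s}_i}^{\mathrm{a,T}}[k]}} \left( \frac{m_{\mathbf{s}_i}^{\mathrm{p,T}}[k]}{v_{\mathbf{s}_i}^{\mathrm{p,T}}[k]} - \frac{m_{\mathbf{s}_i}^{\mathrm{a,T}}[k]}{v_{\mathbf{s}_i}^{\mathrm{a,T}}[k]} \right) \\ &= \frac{m_{\mathbf{s}_i}^{\mathrm{p,T}}[k]\, v_{\mathbf{s}_i}^{\mathrm{a,T}}[k] - m_{\mathbf{s}_i}^{\mathrm{a,T}}[k]\, v_{\mathbf{s}_i}^{\mathrm{p,T}}[k]}{v_{\mathbf{s}_i}^{\mathrm{a,T}}[k] - v_{\mathbf{s}_i}^{\mathrm{p,T}}[k]}, \end{aligned} \tag{36}$$
with respect to the index $k$. In (36), $v_{\mathbf{s}_i}^{\mathrm{e,T}}[k]$, $v_{\mathbf{s}_i}^{\mathrm{p,T}}[k]$, and $v_{\mathbf{s}_i}^{\mathrm{a,T}}[k]$ represent the $k$-th diagonal elements of $\mathbf{C}_{\mathbf{s}_i}^{\mathrm{e,T}}$, $\mathbf{C}_{\mathbf{s}_i}^{\mathrm{p,T}}$, and $\mathbf{C}_{\mathbf{s}_i}^{\mathrm{a,T}}$, respectively. According to (18), we define
$$m_{\mathbf{s}_i}^{\mathrm{p,T}}[k] = m_{\mathbf{s}_i}^{\mathrm{a,T}}[k] + \tilde{m}_{\mathbf{s}_i}^{\mathrm{T}}[k], \tag{37}$$
where $\tilde{m}_{\mathbf{s}_i}^{\mathrm{T}}[k]$ is the $k$-th element of the vector
$$\tilde{\mathbf{m}}_{\mathbf{s}_i}^{\mathrm{T}} = \mathbf{W}_{\mathrm{MMSE}}^{i} \left( \tilde{\mathbf{r}}_i - \mathbf{H}_{\mathrm{B}}^{i}\, \mathbf{m}_{\tilde{\mathbf{s}}_i}^{\mathrm{a,T}} - \mathbf{H}_{\mathrm{A}}^{i}\, \mathbf{m}_{\mathbf{s}_i}^{\mathrm{a,T}} \right). \tag{38}$$
Substituting (37) into (36), we further obtain
$$\begin{aligned} m_{\mathbf{s}_i}^{\mathrm{e,T}}[k] &= \frac{\left( m_{\mathbf{s}_i}^{\mathrm{a,T}}[k] + \tilde{m}_{\mathbf{s}_i}^{\mathrm{T}}[k] \right) v_{\mathbf{s}_i}^{\mathrm{a,T}}[k] - m_{\mathbf{s}_i}^{\mathrm{a,T}}[k]\, v_{\mathbf{s}_i}^{\mathrm{p,T}}[k]}{v_{\mathbf{s}_i}^{\mathrm{a,T}}[k] - v_{\mathbf{s}_i}^{\mathrm{p,T}}[k]} \\ &= m_{\mathbf{s}_i}^{\mathrm{a,T}}[k] + \frac{v_{\mathbf{s}_i}^{\mathrm{a,T}}[k]}{v_{\mathbf{s}_i}^{\mathrm{a,T}}[k] - v_{\mathbf{s}_i}^{\mathrm{p,T}}[k]}\, \tilde{m}_{\mathbf{s}_i}^{\mathrm{T}}[k]. \end{aligned} \tag{39}$$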
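The algebra leading from (36) to (39) can be checked numerically with arbitrary values. In the sketch below, the variances, means, and LMMSE correction terms are random placeholders; the only constraint imposed is that each a posteriori variance is smaller than the corresponding a priori variance, so that the extrinsic variance is positive.

```python
import numpy as np

rng = np.random.default_rng(4)
K = 6
v_a = rng.uniform(0.5, 1.0, K)            # a priori variances
v_p = v_a * rng.uniform(0.1, 0.9, K)      # a posteriori variances, smaller than v_a
m_a = rng.standard_normal(K) + 1j * rng.standard_normal(K)
m_tilde = rng.standard_normal(K) + 1j * rng.standard_normal(K)
m_p = m_a + m_tilde                       # Eq. (37)

# Eq. (36), first form: extrinsic mean from the a posteriori and a priori statistics.
v_e = 1.0 / (1.0 / v_p - 1.0 / v_a)
m_e_36 = v_e * (m_p / v_p - m_a / v_a)

# Eq. (39): a priori mean plus a rescaled LMMSE correction.
m_e_39 = m_a + v_a / (v_a - v_p) * m_tilde

print(np.allclose(m_e_36, m_e_39))
```

The scaling factor `v_a / (v_a - v_p)` is always larger than one here, which shows that removing the a priori contribution amplifies the LMMSE correction term.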
Vectorizing (39), we have
m s i e , T = m s i a , T + C ˜ s i T m ˜ s i T ,
where C ˜ s i T = Δ diag v s i a , T k v s i a , T k v s i p , T k for k = 0 , , M 1 . Arranging m s i e , T , i = 0 , , N 1 , into a length- M N vector m s e , T , we can obtain
m s e , T = m s a , T + C ˜ s T m ˜ s T ,
where C ˜ s T = diag C ˜ s 0 T , , C ˜ s N 1 T and m ˜ s T = m ˜ s 0 T T , , m ˜ s N 1 T T T . Let us focus on the form of m ˜ s T , given by
m ˜ s T = W H A s W H A m s a , T + W H B s ˜ W H B m s ˜ a , T + W n ˜ ,
where W = diag W MMSE 0 , , W MMSE N 1 with size M N × 2 M N , H A = diag H A 0 , , H A N 1 with size 2 M N × M N , H B = diag H B 0 , , H B N 1 with size 2 M N × 2 M N , s ˜ = s ˜ 0 H , , s ˜ N 1 H H with size 2 M N × 2 M N , m s ˜ a , T = m s ˜ 0 a , T H , , m s ˜ N 1 a , T H H with size 2 M N × 1 , and n ˜ = n ˜ 0 H , , n ˜ N 1 H H with size 2 M N × 1 . According to (31), (38), and (42), we can rewrite m x a , DD as
m x a , DD = F N I M m s a , T + C ˜ s T m ˜ s T = F N I M C ˜ s T W H A s + I M N C ˜ s T W H A m s a , T + C ˜ s T W H B s ˜ m s ˜ a , T + C ˜ s T W n ˜ = x + F N I M C ˜ s T W H A F N H I M I M N x + F N I M I M N C ˜ s T W H A m s a , T + F N I M C ˜ s T W H B s ˜ m s ˜ a , T + F N I M C ˜ s T W n ˜ = x + F N I M C ˜ s T W H A F N I M s + F N I M I M N C ˜ s T W H A m s a , T + F N I M C ˜ s T W H B s ˜ m s ˜ a , T + F N I M C ˜ s T W n ˜
Investigating (43), we find that the bias of $\mathbf{m}_{x}^{a,\mathrm{DD}}$, i.e., $\mathbb{E}[\mathbf{m}_{x}^{a,\mathrm{DD}}] - \mathbf{x}$, approaches zero when $\mathbb{E}[\mathbf{m}_{s}^{a,\mathrm{T}}] = \mathbf{s}$. This condition for unbiased estimation in the DD domain depends on the accuracy of the a priori mean $\mathbf{m}_{s}^{a,\mathrm{T},(l)}$ of the time-domain estimator in the current $l$-th iteration, which is related not only to the output of the DD-domain estimator in the $(l-1)$-th iteration but also to the accuracy of the DD-domain outputs in all previous iterations. Indeed, according to (35), $\mathbf{m}_{s}^{a,\mathrm{T},(l)}$ depends on $\mathbf{m}_{s}^{e,\mathrm{T},(l-1)}$, $\mathbf{C}_{s}^{a,\mathrm{DD},(l-1)}$, $\mathbf{m}_{s}^{p,\mathrm{DD},(l-1)}$, $\mathbf{m}_{x}^{p,\mathrm{DD},(l-1)}$, $\mathbf{C}_{s}^{p,\mathrm{DD},(l-1)}$, and $\mathbf{C}_{x}^{p,\mathrm{DD},(l-1)}$, which are updated recursively over the iterations. Therefore, the quality of the proposed algorithm depends on the qualities of the time-domain and DD-domain estimators and can be improved by recursive iterations. This analysis will be further demonstrated in the numerical results.
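As a quick sanity check on the algebra leading from (36) to (39), the following minimal Python sketch evaluates both forms of the extrinsic mean on random toy values and compares them. The function names, the vector length, and the random inputs are illustrative assumptions and are not taken from the paper.

```python
import numpy as np

def extrinsic_mean_direct(m_post, v_post, m_pri, v_pri):
    """Extrinsic mean in the first form of (36): v_e * (m_p / v_p - m_a / v_a)."""
    v_ext = 1.0 / (1.0 / v_post - 1.0 / v_pri)
    return v_ext * (m_post / v_post - m_pri / v_pri)

def extrinsic_mean_simplified(m_tilde, v_post, m_pri, v_pri):
    """Extrinsic mean in the simplified form of (39)."""
    return m_pri + v_pri / (v_pri - v_post) * m_tilde

# Toy inputs (hypothetical values, chosen only for this check).
rng = np.random.default_rng(0)
M = 8
m_pri = rng.standard_normal(M) + 1j * rng.standard_normal(M)
m_tilde = rng.standard_normal(M) + 1j * rng.standard_normal(M)
v_pri = rng.uniform(0.5, 1.0, M)
v_post = v_pri * rng.uniform(0.1, 0.9, M)  # a posteriori variance below the a priori one
m_post = m_pri + m_tilde                   # definition (37)

direct = extrinsic_mean_direct(m_post, v_post, m_pri, v_pri)
simplified = extrinsic_mean_simplified(m_tilde, v_post, m_pri, v_pri)
max_gap = np.max(np.abs(direct - simplified))
print("largest difference between the two forms:", max_gap)
```

The two forms should agree up to floating-point rounding, since (39) follows from (36) by substituting (37).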
Without loss of generality, we will characterize the error performance of the proposed algorithm by tracking the average MSEs in both the time domain and the DD domain. We define the average MSEs of the inputs of the time-domain estimator and the DD-domain estimator in the l-th iteration as
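$$ \mathrm{MSE}_{\mathrm{T}}^{(l)} \triangleq \frac{1}{MN}\,\mathbb{E}\Big[\big(\mathbf{m}_{s}^{a,\mathrm{T},(l)} - \mathbf{s}\big)^{\mathrm{H}}\big(\mathbf{m}_{s}^{a,\mathrm{T},(l)} - \mathbf{s}\big)\Big] \tag{44} $$
and
$$ \mathrm{MSE}_{\mathrm{DD}}^{(l)} \triangleq \frac{1}{MN}\,\mathbb{E}\Big[\big(\mathbf{m}_{x}^{a,\mathrm{DD},(l)} - \mathbf{x}\big)^{\mathrm{H}}\big(\mathbf{m}_{x}^{a,\mathrm{DD},(l)} - \mathbf{x}\big)\Big], \tag{45} $$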
respectively. Based on the properties of the expectation operation, (44) can be further derived as
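$$ \begin{aligned} \mathrm{MSE}_{\mathrm{T}}^{(l)} &= \frac{1}{MN}\,\mathbb{E}\Big[\big(\mathbf{m}_{s}^{a,\mathrm{T},(l)} - \mathbb{E}[\mathbf{m}_{s}^{a,\mathrm{T},(l)}] + \mathbb{E}[\mathbf{m}_{s}^{a,\mathrm{T},(l)}] - \mathbf{s}\big)^{\mathrm{H}}\big(\mathbf{m}_{s}^{a,\mathrm{T},(l)} - \mathbb{E}[\mathbf{m}_{s}^{a,\mathrm{T},(l)}] + \mathbb{E}[\mathbf{m}_{s}^{a,\mathrm{T},(l)}] - \mathbf{s}\big)\Big] \\ &= \frac{1}{MN}\,\mathbb{E}\Big[\big(\mathbf{m}_{s}^{a,\mathrm{T},(l)} - \mathbb{E}[\mathbf{m}_{s}^{a,\mathrm{T},(l)}]\big)^{\mathrm{H}}\big(\mathbf{m}_{s}^{a,\mathrm{T},(l)} - \mathbb{E}[\mathbf{m}_{s}^{a,\mathrm{T},(l)}]\big)\Big] + \frac{1}{MN}\,\mathbb{E}\Big[\big(\mathbb{E}[\mathbf{m}_{s}^{a,\mathrm{T},(l)}] - \mathbf{s}\big)^{\mathrm{H}}\big(\mathbb{E}[\mathbf{m}_{s}^{a,\mathrm{T},(l)}] - \mathbf{s}\big)\Big] \\ &= \frac{1}{MN}\,\mathrm{Tr}\Big\{\mathbf{C}_{\mathbf{m}_{s}^{a,\mathrm{T},(l)}}\Big\} + \frac{1}{MN}\,\Big\|\mathrm{bias}\big(\mathbf{m}_{s}^{a,\mathrm{T},(l)}, \mathbf{s}\big)\Big\|^{2}, \end{aligned} \tag{46} $$
where
$$ \mathbf{C}_{\mathbf{m}_{s}^{a,\mathrm{T},(l)}} = \mathbb{E}\Big[\big(\mathbf{m}_{s}^{a,\mathrm{T},(l)} - \mathbb{E}[\mathbf{m}_{s}^{a,\mathrm{T},(l)}]\big)\big(\mathbf{m}_{s}^{a,\mathrm{T},(l)} - \mathbb{E}[\mathbf{m}_{s}^{a,\mathrm{T},(l)}]\big)^{\mathrm{H}}\Big] \tag{47} $$
denotes the covariance matrix of $\mathbf{m}_{s}^{a,\mathrm{T},(l)}$, and $\mathrm{bias}\big(\mathbf{m}_{s}^{a,\mathrm{T},(l)}, \mathbf{s}\big) = \mathbb{E}[\mathbf{m}_{s}^{a,\mathrm{T},(l)}] - \mathbf{s}$ denotes the bias vector between $\mathbf{m}_{s}^{a,\mathrm{T},(l)}$ and $\mathbf{s}$. Similar to (46), (45) can be reformulated as
$$ \mathrm{MSE}_{\mathrm{DD}}^{(l)} = \frac{1}{MN}\,\mathrm{Tr}\Big\{\mathbf{C}_{\mathbf{m}_{x}^{a,\mathrm{DD},(l)}}\Big\} + \frac{1}{MN}\,\Big\|\mathrm{bias}\big(\mathbf{m}_{x}^{a,\mathrm{DD},(l)}, \mathbf{x}\big)\Big\|^{2}. \tag{48} $$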
Based on (46) and (48), the characterization of the error performance of the proposed algorithm is divided into two parts, namely the covariance of the observation of $\mathbf{z}$ (or $\mathbf{x}$) and the bias between that observation and $\mathbf{z}$ (or $\mathbf{x}$) itself. In the following subsections, we analyze in detail the impact of the observation covariance and of the estimator bias on the error performance, respectively.
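The split of the average MSE into a covariance term and a squared-bias term in (46) and (48) can be illustrated numerically. The sketch below is a toy example only: the "observations" are a fixed QPSK vector plus a deliberate offset and Gaussian noise (all hypothetical choices), and expectations are replaced by sample averages over the trials.

```python
import numpy as np

rng = np.random.default_rng(1)
n_dim, n_trials = 16, 20000

# Fixed toy target vector (unit-energy QPSK) and a deliberately biased noisy observation of it.
s = (rng.choice([-1, 1], n_dim) + 1j * rng.choice([-1, 1], n_dim)) / np.sqrt(2)
offset = 0.05 * np.ones(n_dim)  # hypothetical bias, for illustration only
noise = 0.3 * (rng.standard_normal((n_trials, n_dim))
               + 1j * rng.standard_normal((n_trials, n_dim))) / np.sqrt(2)
m = s + offset + noise          # one observation per row

# Left-hand side: average MSE per entry.
mse = np.mean(np.sum(np.abs(m - s) ** 2, axis=1)) / n_dim

# Right-hand side: trace of the sample covariance plus squared norm of the sample bias.
mean_m = m.mean(axis=0)
cov_trace = np.mean(np.sum(np.abs(m - mean_m) ** 2, axis=1))
bias_sq = np.sum(np.abs(mean_m - s) ** 2)
decomposed = (cov_trace + bias_sq) / n_dim

print("MSE:", mse, "covariance + bias^2:", decomposed)
```

With sample means in place of expectations, the cross term vanishes exactly, so the two quantities coincide up to rounding.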

4.2. Bias Analysis by Monte Carlo Method

In this subsection, we explore the evolution of the average biases of the time-domain and DD-domain estimators. Using the Monte Carlo method, we recursively track the average biases of the two domains in each iteration, as follows. First, we fix a time-domain effective channel matrix, as shown in (12), and a signal-to-noise ratio (SNR). Then, in each Monte Carlo trial, the DD-domain transmitted symbol vector $\mathbf{x}$ is regenerated and transmitted according to the system model described in Section 2.1. During the transmission process, the DD-domain transmitted signal $\mathbf{x}$ and the time-domain transmitted signal $\mathbf{z}$ are retained for the subsequent bias calculation. At the receiver, the proposed algorithm is executed multiple times to obtain the mean of the observation, i.e., $\mathbb{E}[\mathbf{m}_{s}^{a,\mathrm{T}}]$ and $\mathbb{E}[\mathbf{m}_{s}^{a,\mathrm{DD}}]$. Finally, the average bias performance is obtained by performing sufficiently many Monte Carlo trials as described above.
Figure 4 illustrates the Monte Carlo simulation results of the squared estimation bias under a given channel realization, where the SNR is $E_s/N_0 = 12$ dB. The simulation parameters are as follows: $P = 4$, $M = 32$, $N = 16$, QPSK, the channel coefficients $0.27 + 0.35i$, $0.17 + 0.01i$, $0.56 - 0.33i$, $0.31 - 0.56i$, the delay indices $0, 1, 4, 2$, and the Doppler indices $1.40, 2.98, 1.96, 3.66$. The results are averaged over 1000 independent Monte Carlo trials, ensuring statistical stability of the estimated bias curves. As can be observed from the figure, the squared estimation biases of both the time-domain estimator and the DD-domain estimator decrease with the number of iterations, eventually converging to stable values. Moreover, the simulation results indicate that the residual estimation bias in both domains becomes negligible after sufficient iterations, thereby confirming the effectiveness and robustness of the proposed algorithm.

4.3. Variance Analysis by State Evolution

Recall that the above estimation bias analysis shows that the proposed estimators suffer from only negligible estimation bias in both domains. Thus, with the assumption of unbiased estimation, (46) can be rewritten as
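$$ \mathrm{MSE}_{\mathrm{T}}^{(l)} = \frac{1}{MN}\,\mathrm{Tr}\Big\{\mathbf{C}_{\mathbf{m}_{s}^{a,\mathrm{T},(l)}}\Big\} = \frac{1}{MN}\,\mathrm{Tr}\Big\{\mathbf{C}_{s}^{a,\mathrm{T},(l)}\Big\}, \tag{49} $$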
where the MSE of the residual errors in the time domain is determined by the input a priori variance of the time-domain estimator. This also indicates that it is of great significance to track the evolution of the a priori variance of each domain, where the error performance of the proposed algorithm can be characterized by recursively calculating the asymptotic MSE performance.
Without loss of generality, we define the average a priori variance of the inputs to the time-domain estimator and the DD-domain estimator in the l-th iteration as
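$$ v_{s}^{a,\mathrm{T},(l)} \triangleq \frac{1}{MN}\,\mathrm{Tr}\big\{\mathbf{C}_{s}^{a,\mathrm{T}}\big\} = \frac{1}{MN}\,\mathrm{Tr}\big\{\mathbf{C}_{s}^{e,\mathrm{DD}}\big\} \tag{50} $$
and
$$ v_{s}^{a,\mathrm{DD},(l)} \triangleq \frac{1}{MN}\,\mathrm{Tr}\big\{\mathbf{C}_{s}^{a,\mathrm{DD}}\big\} = \frac{1}{MN}\,\mathrm{Tr}\big\{\mathbf{C}_{s}^{e,\mathrm{T}}\big\}, \tag{51} $$
respectively. Under unbiased estimation, the two states can be viewed as the asymptotic average MSEs of the inputs in the $l$-th iteration. In the following, we characterize the asymptotic MSE performance of the proposed detection scheme by state evolution.
Assume that the main diagonal entries of $\mathbf{C}_{s}^{a,\mathrm{T}}$ and $\mathbf{C}_{s}^{a,\mathrm{DD}}$ are all equal to $v_{s}^{a,\mathrm{T},(l)}$ and $v_{s}^{a,\mathrm{DD},(l)}$, respectively, in the $l$-th iteration. Then, $v_{s}^{a,\mathrm{DD},(l)}$ can be represented as [24]
$$ v_{s}^{a,\mathrm{DD},(l)} = \left(\frac{1}{v_{s}^{p,\mathrm{T},(l)}} - \frac{1}{v_{s}^{a,\mathrm{T},(l)}}\right)^{-1}, \tag{52} $$
according to (20). Substituting $\mathbf{C}_{s}^{a,\mathrm{T}} = v_{s}^{a,\mathrm{T},(l)}\mathbf{I}_{MN}$ into (19), the average a posteriori variance, i.e., the average of the main diagonal entries of $\mathbf{C}_{s}^{p,\mathrm{T}}$, can be obtained as
$$ v_{s}^{p,\mathrm{T},(l)} = v_{s}^{a,\mathrm{T},(l)} - \frac{\big(v_{s}^{a,\mathrm{T},(l)}\big)^{2}}{MN}\sum_{i=0}^{N-1}\mathrm{Tr}\Big\{(\mathbf{H}_A^{i})^{\mathrm{H}}\Big(v_{s}^{a,\mathrm{T},(l)}\mathbf{H}_A^{i}(\mathbf{H}_A^{i})^{\mathrm{H}} + v_{s}^{a,\mathrm{T},(l)}\mathbf{H}_B^{i}(\mathbf{H}_B^{i})^{\mathrm{H}} + N_0\mathbf{I}_{2M}\Big)^{-1}\mathbf{H}_A^{i}\Big\}. \tag{53} $$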
Based on (52) and (53), the state evolution from the state v s a , T l to the state v s a , DD l can be obtained.
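According to (34), the update of the state $v_{s}^{a,\mathrm{T},(l+1)}$ can be given by
$$ v_{s}^{a,\mathrm{T},(l+1)} = \left(\frac{1}{v_{s}^{p,\mathrm{DD},(l)}} - \frac{1}{v_{s}^{a,\mathrm{DD},(l)}}\right)^{-1}, \tag{54} $$
where
$$ v_{s}^{p,\mathrm{DD},(l)} \triangleq \frac{1}{MN}\,\mathrm{Tr}\big\{\mathbf{C}_{s}^{p,\mathrm{DD}}\big\} = \lim_{MN \to \infty}\frac{1}{MN}\,\mathrm{Tr}\big\{\mathbf{C}_{x}^{p,\mathrm{DD}}\big\}. \tag{55} $$
We define the average of the main diagonal entries of $\mathbf{C}_{x}^{p,\mathrm{DD}}$ by [24]
$$ v_{x}^{p,\mathrm{DD},(l)} = \mathbb{E}\Big[\big|x - \mathbb{E}[x \mid x + \xi]\big|^{2}\Big] = \mathrm{MSE}\big(\eta_{\mathrm{DD}}^{(l)}\big), \tag{56} $$
where $x$ is an arbitrary DD-domain OTFS symbol and $\xi$ is a complex AWGN sample with zero mean and variance $1/\eta_{\mathrm{DD}}$. In (56), $\eta_{\mathrm{DD}}^{(l)}$ denotes the effective SNR for the DD domain in the $l$-th iteration, i.e., the ratio between the DD-domain OTFS signal energy and the average a priori variance of the inputs to the DD-domain estimator, given by
$$ \eta_{\mathrm{DD}}^{(l)} = \frac{1}{v_{s}^{a,\mathrm{DD},(l)}}, \tag{57} $$
where the DD-domain OTFS signal energy is normalized, i.e., $\mathbb{E}\big[|x|^{2}\big] = 1$. Thus, we have
$$ v_{s}^{p,\mathrm{DD},(l)} = v_{x}^{p,\mathrm{DD},(l)}. \tag{58} $$
According to (54), (56), and (58), we can obtain the state evolution from the state $v_{s}^{a,\mathrm{DD},(l)}$ to the state $v_{s}^{a,\mathrm{T},(l+1)}$. In the above state evolution, we use the Monte Carlo method to obtain $\mathrm{MSE}\big(\eta_{\mathrm{DD}}^{(l)}\big)$.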
By iteratively updating the MSE state according to (52) and (54), the state evolution can then be derived.
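The recursion formed by (52) to (58) can be sketched in a few lines of Python. The sketch below is illustrative only: the block matrices are random placeholders rather than the structured effective-channel blocks of the paper, the constellation is assumed to be unit-energy QPSK, and the function names, block sizes, noise level, and sample counts are hypothetical. The QPSK posterior mean is the standard tanh expression applied per real dimension.

```python
import numpy as np

def qpsk_mmse(eta, rng, n_samples=20000):
    """Monte Carlo estimate of MSE(eta) in (56) for unit-energy QPSK."""
    x = rng.choice([-1.0, 1.0], size=(n_samples, 2)) / np.sqrt(2)  # I/Q parts
    noise_std = np.sqrt(0.5 / eta)                                 # per real dimension
    y = x + noise_std * rng.standard_normal((n_samples, 2))
    x_hat = np.tanh(np.sqrt(2) * eta * y) / np.sqrt(2)             # E[x | y] per dimension
    return float(np.mean(np.sum((x - x_hat) ** 2, axis=1)))

def v_post_time(v_a, H_A, H_B, N0):
    """Average a posteriori variance of the time-domain estimator, following (53)."""
    N, two_M, M = H_A.shape
    acc = 0.0
    for HA, HB in zip(H_A, H_B):
        R = v_a * (HA @ HA.conj().T + HB @ HB.conj().T) + N0 * np.eye(two_M)
        acc += np.trace(HA.conj().T @ np.linalg.solve(R, HA)).real
    return v_a - v_a ** 2 / (M * N) * acc

def state_evolution(H_A, H_B, N0, n_iter=5, seed=0):
    """Track (v_a_DD, v_a_T) over iterations via (52), (57), (56)/(58), and (54)."""
    rng = np.random.default_rng(seed)
    v_a_T, history = 1.0, []          # start from the normalized symbol energy
    for _ in range(n_iter):
        v_p_T = v_post_time(v_a_T, H_A, H_B, N0)
        v_a_DD = 1.0 / (1.0 / v_p_T - 1.0 / v_a_T)    # (52)
        eta = 1.0 / v_a_DD                            # (57)
        v_p_DD = qpsk_mmse(eta, rng)                  # (56) and (58)
        v_a_T = 1.0 / (1.0 / v_p_DD - 1.0 / v_a_DD)   # (54)
        history.append((v_a_DD, v_a_T))
    return history

# Toy setup: random placeholder blocks (NOT the structured blocks H_A^i, H_B^i of the paper).
rng = np.random.default_rng(42)
M, N, N0 = 8, 4, 0.1

def crandn(*shape):
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

H_A = crandn(N, 2 * M, M) / np.sqrt(2 * M)
H_B = 0.3 * crandn(N, 2 * M, 2 * M) / np.sqrt(2 * M)
history = state_evolution(H_A, H_B, N0)
for it, (v_dd, v_t) in enumerate(history):
    print(f"iteration {it}: v_a_DD = {v_dd:.4f}, next v_a_T = {v_t:.4f}")
```

For the TIN and genie-aided bounds discussed in the next subsection, only `v_post_time` would change, with the interference term replaced as in (61) or dropped as in (63).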

4.4. MSE Boundary Analysis

In addition to the derived state evolution, we further apply bounding techniques to discuss the insights of the proposed scheme. Note that the proposed scheme adopts the SIC for time-domain estimation. Therefore, depending on whether SIC can fully eliminate the interference, the MSE performance can be bounded by applying the treating-interference-as-noise (TIN) strategy (corresponding to the worst-case scenario) and the genie-aided strategy (corresponding to the best-case scenario).

4.4.1. TIN Strategy

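TIN is a well-known interference management technique that performs well when the interference is sufficiently weak. Applying TIN regardless of the interference strength therefore corresponds to the worst-case scenario of the proposed scheme. Using the TIN strategy, the block-wise LMMSE estimation matrix $\mathbf{W}_{\mathrm{MMSE}}^{i}$ for the $i$-th transmitted block $\mathbf{s}_i$ becomes
$$ \mathbf{W}_{\mathrm{MMSE}}^{i} = \mathbf{C}_{s_i}^{a,\mathrm{T}}(\mathbf{H}_A^{i})^{\mathrm{H}}\Big(\mathbf{H}_A^{i}\mathbf{C}_{s_i}^{a,\mathrm{T}}(\mathbf{H}_A^{i})^{\mathrm{H}} + \lambda_{\mathrm{ISI}}^{i}\mathbf{I}_{2M} + N_0\mathbf{I}_{2M}\Big)^{-1}, \tag{59} $$
where $\lambda_{\mathrm{ISI}}^{i} = \frac{1}{2M}\,\mathrm{Tr}\big\{\mathbf{H}_B^{i}\mathbf{C}_{\tilde{s}_i}^{a,\mathrm{T}}(\mathbf{H}_B^{i})^{\mathrm{H}}\big\}$ denotes the average interference energy of the $i$-th block. Correspondingly, the a posteriori estimation output $\mathbf{m}_{s_i}^{p,\mathrm{T}}$ of $\mathbf{s}_i$ is replaced by
$$ \mathbf{m}_{s_i}^{p,\mathrm{T}} = \mathbf{m}_{s_i}^{a,\mathrm{T}} + \mathbf{W}_{\mathrm{MMSE}}^{i}\big(\tilde{\mathbf{r}}_i - \mathbf{H}_A^{i}\mathbf{m}_{s_i}^{a,\mathrm{T}}\big). \tag{60} $$
The a posteriori covariance matrix $\mathbf{C}_{s_i}^{p,\mathrm{T}}$, the extrinsic covariance matrix $\mathbf{C}_{s_i}^{e,\mathrm{T}}$, and the extrinsic mean $\mathbf{m}_{s_i}^{e,\mathrm{T}}$ of $\mathbf{s}_i$ can still be calculated by (19), (20), and (21), respectively. The remaining steps are the same as those shown in Algorithm 1. In the state evolution, the key term $v_{s}^{p,\mathrm{T},(l)}$ is modified to
$$ v_{s}^{p,\mathrm{T},(l)} = v_{s}^{a,\mathrm{T},(l)} - \frac{\big(v_{s}^{a,\mathrm{T},(l)}\big)^{2}}{MN}\sum_{i=0}^{N-1}\mathrm{Tr}\Big\{(\mathbf{H}_A^{i})^{\mathrm{H}}\Big(v_{s}^{a,\mathrm{T},(l)}\mathbf{H}_A^{i}(\mathbf{H}_A^{i})^{\mathrm{H}} + \lambda_{\mathrm{ISI}}^{i}\mathbf{I}_{2M} + N_0\mathbf{I}_{2M}\Big)^{-1}\mathbf{H}_A^{i}\Big\}. \tag{61} $$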
Based on (52) and (54), the average MSE via state evolution for the TIN case is thereby obtained.

4.4.2. Genie-Aided Strategy

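This strategy assumes no inter-block interference in the time domain. We use this baseline as the performance upper bound (i.e., a lower bound on the MSE) in our analysis. The block-wise LMMSE estimation matrix $\mathbf{W}_{\mathrm{MMSE}}^{i}$ for the $i$-th transmitted block $\mathbf{s}_i$ is changed to
$$ \mathbf{W}_{\mathrm{MMSE}}^{i} = \mathbf{C}_{s_i}^{a,\mathrm{T}}(\mathbf{H}_A^{i})^{\mathrm{H}}\Big(\mathbf{H}_A^{i}\mathbf{C}_{s_i}^{a,\mathrm{T}}(\mathbf{H}_A^{i})^{\mathrm{H}} + N_0\mathbf{I}_{2M}\Big)^{-1}. \tag{62} $$
The corresponding a posteriori mean $\mathbf{m}_{s_i}^{p,\mathrm{T}}$ and covariance matrix $\mathbf{C}_{s_i}^{p,\mathrm{T}}$ are calculated by (60) and (19), respectively. The extrinsic covariance matrix $\mathbf{C}_{s_i}^{e,\mathrm{T}}$ and the extrinsic mean $\mathbf{m}_{s_i}^{e,\mathrm{T}}$ of $\mathbf{s}_i$ are calculated by (20) and (21), respectively. The key term $v_{s}^{p,\mathrm{T},(l)}$ is modified to
$$ v_{s}^{p,\mathrm{T},(l)} = v_{s}^{a,\mathrm{T},(l)} - \frac{\big(v_{s}^{a,\mathrm{T},(l)}\big)^{2}}{MN}\sum_{i=0}^{N-1}\mathrm{Tr}\Big\{(\mathbf{H}_A^{i})^{\mathrm{H}}\Big(v_{s}^{a,\mathrm{T},(l)}\mathbf{H}_A^{i}(\mathbf{H}_A^{i})^{\mathrm{H}} + N_0\mathbf{I}_{2M}\Big)^{-1}\mathbf{H}_A^{i}\Big\}. \tag{63} $$
Therefore, according to Algorithm 1 and the above variance analysis via state evolution, we obtain the best-case MSE performance, i.e., a lower bound on the MSE, under the genie-aided strategy.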
It should be noted that both TIN and genie-aided bounds are of theoretical significance, as they together indicate whether the residual interference in the time domain can be minimized by the cross-domain iteration. As we will demonstrate in the numerical results part, the TIN bound and genie-aided bound will converge to each other after a sufficient number of cross-domain iterations. This suggests that the adopted reduced-size LMMSE estimator aligns well with the mechanism of the cross-domain iterative detection, where only the interference caused by the delay needs to be considered for the time-domain estimation, while the interference caused by Doppler can be resolved naturally by the cross-domain iteration.

4.5. Converged Error Performance Analysis

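In this subsection, we investigate the converged error performance by focusing on the effective DD-domain SNR under the TIN and genie-aided strategies. Observing (61) and (63), we find that $\mathbf{G}_A^{i} \triangleq \mathbf{H}_A^{i}(\mathbf{H}_A^{i})^{\mathrm{H}}$ is a Hermitian matrix. Applying eigenvalue decomposition, we obtain $\mathbf{G}_A^{i} = \mathbf{U}_A^{i}\boldsymbol{\Lambda}_A^{i}(\mathbf{U}_A^{i})^{\mathrm{H}}$, where $\mathbf{U}_A^{i}$ is a unitary matrix and $\boldsymbol{\Lambda}_A^{i}$ is a diagonal matrix containing the eigenvalues of $\mathbf{G}_A^{i}$ in descending order, i.e., $\boldsymbol{\Lambda}_A^{i} = \mathrm{diag}\{\lambda_A^{i,0}, \ldots, \lambda_A^{i,j}, \ldots, \lambda_A^{i,2M-1}\}$. Note that $\mathbf{G}_A^{i}$ has only $M$ non-zero eigenvalues. Thus, we have the following Lemma.
Lemma 1 
(Time-Domain a Posteriori Variance). The time-domain a posteriori variance $v_{s}^{p,\mathrm{T},(l)}$ under the TIN strategy can be simplified to
$$ v_{s}^{p,\mathrm{T},(l)} = v_{s}^{a,\mathrm{T},(l)} - \frac{v_{s}^{a,\mathrm{T},(l)}}{MN}\sum_{i=0}^{N-1}\sum_{j=0}^{M-1}\frac{v_{s}^{a,\mathrm{T},(l)}\lambda_A^{i,j}}{v_{s}^{a,\mathrm{T},(l)}\lambda_A^{i,j} + \lambda_{\mathrm{ISI}}^{i} + N_0}. \tag{64} $$
Similarly, letting $\lambda_{\mathrm{ISI}}^{i} = 0$, the time-domain a posteriori variance $v_{s}^{p,\mathrm{T},(l)}$ under the genie-aided strategy can be obtained, i.e.,
$$ v_{s}^{p,\mathrm{T},(l)} = v_{s}^{a,\mathrm{T},(l)} - \frac{v_{s}^{a,\mathrm{T},(l)}}{MN}\sum_{i=0}^{N-1}\sum_{j=0}^{M-1}\frac{v_{s}^{a,\mathrm{T},(l)}\lambda_A^{i,j}}{v_{s}^{a,\mathrm{T},(l)}\lambda_A^{i,j} + N_0}. \tag{65} $$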
Proof. 
The proof is given in Appendix A. □
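Lemma 1 replaces the matrix inversion in the trace form (61) with a sum over the non-zero eigenvalues of the Gram matrix of each block. The following Python sketch compares the two expressions numerically on random placeholder blocks; the block sizes, the variance value, and the interference level are hypothetical and serve only to illustrate the identity.

```python
import numpy as np

rng = np.random.default_rng(2)
M, N, N0, v_a = 6, 3, 0.1, 0.7
lam_isi = 0.2  # hypothetical interference energy, taken equal for all blocks

trace_sum, eig_sum = 0.0, 0.0
for _ in range(N):
    # Random 2M x M placeholder for one block H_A^i (not the structured channel block).
    H = (rng.standard_normal((2 * M, M))
         + 1j * rng.standard_normal((2 * M, M))) / np.sqrt(4 * M)
    G = H @ H.conj().T                                   # Hermitian, rank at most M
    R = v_a * G + (lam_isi + N0) * np.eye(2 * M)
    trace_sum += np.trace(H.conj().T @ np.linalg.solve(R, H)).real
    lam = np.linalg.eigvalsh(G)[::-1][:M]                # the M largest eigenvalues
    eig_sum += np.sum(v_a * lam / (v_a * lam + lam_isi + N0))

v_p_trace = v_a - v_a ** 2 / (M * N) * trace_sum  # trace form, as in (61)
v_p_eig = v_a - v_a / (M * N) * eig_sum           # eigenvalue form, as in (64)
print("trace form:", v_p_trace, "eigenvalue form:", v_p_eig)
```

Because the eigenvalues do not depend on the iteration index, they can be computed once and reused, which is what makes the per-iteration cost of the eigenvalue form independent of the number of cross-domain iterations.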
To obtain further insights, we apply Jensen's inequality together with the assumption that different resolvable paths have different delay indices, i.e., $l_i \neq l_j$, $\forall i \neq j$, $1 \leq i, j \leq P$. (This assumption is realistic because resolvable paths usually originate from geographically distinct reflectors. Thus, different resolvable paths are generally unlikely to have exactly the same delay [30]. In particular, in sparse reflector environments, it is even less likely that different paths share the same delay [24].) We can then derive a lower bound on the DD-domain a priori variance $v_{s}^{a,\mathrm{DD},(l)}$, as shown in the following Theorem.
Theorem 1 
(Lower Bound of $v_{s}^{a,\mathrm{DD},(l)}$). The DD-domain a priori variance $v_{s}^{a,\mathrm{DD},(l)}$ under the TIN strategy is lower-bounded by
$$ v_{s}^{a,\mathrm{DD},(l)} \geq \frac{\frac{v_{s}^{a,\mathrm{T},(l)}}{2}\|\mathbf{h}\|^{2} + N_0}{\|\mathbf{h}\|^{2}}, \tag{66} $$
where $\lambda_{\mathrm{ISI}}^{i} = \frac{v_{s}^{a,\mathrm{T},(l)}}{2}\|\mathbf{h}\|^{2}$ represents the energy of the interference terms; $\mathbf{h} = [h_1, \ldots, h_P]^{\top}$ denotes the channel coefficients vector; and $\|\cdot\|^{2}$ denotes the square of the Euclidean norm of a vector.
Similarly, for the genie-aided strategy, we have
$$ v_{s}^{a,\mathrm{DD},(l)} \geq \frac{N_0}{\|\mathbf{h}\|^{2}}. \tag{67} $$
Proof. 
The proof is given in Appendix B. □
Based on Theorem 1, we can obtain the upper bound of the DD-domain effective SNR $\eta_{\mathrm{DD}}^{(l)}$, as shown in the following corollaries.
Corollary 1 
(The Upper Bound of $\eta_{\mathrm{DD}}^{(l)}$ Under the TIN Strategy). The DD-domain effective SNR $\eta_{\mathrm{DD}}^{(l)}$ under the TIN strategy is upper-bounded by
$$ \eta_{\mathrm{DD}}^{(l)} \leq \frac{\|\mathbf{h}\|^{2}}{\frac{v_{s}^{a,\mathrm{T},(l)}}{2}\|\mathbf{h}\|^{2} + N_0}. \tag{68} $$
Proof. 
The proof can be derived from (57) and (66). □
Corollary 2 
(The Upper Bound of $\eta_{\mathrm{DD}}^{(l)}$ Under the Genie-Aided Strategy). The DD-domain effective SNR $\eta_{\mathrm{DD}}^{(l)}$ under the genie-aided strategy is upper-bounded by
$$ \eta_{\mathrm{DD}}^{(l)} \leq \frac{\|\mathbf{h}\|^{2}}{N_0}. \tag{69} $$
Proof. 
The proof can be derived from (57) and (67). □
Compared with the TIN strategy, the difference in the genie-aided strategy is that there is no inter-block interference in the time domain, i.e., $\lambda_{\mathrm{ISI}}^{i} = 0$ in (66). As the time-domain detector and the DD-domain detector iterate, $\lambda_{\mathrm{ISI}}^{i}$ gradually tends to 0. As shown in Corollaries 1 and 2, the upper bound of $\eta_{\mathrm{DD}}^{(l)}$ under the TIN strategy therefore converges to that under the genie-aided strategy as the cross-domain detection iterates. Furthermore, for a given fading channel, the effective DD-domain SNR of the proposed algorithm can theoretically approach the maximum receiver SNR with a sufficient number of iterations. This indicates that the proposed algorithm can theoretically approach the error performance of MLSE given a sufficient number of iterations. The cross-domain detection algorithm proposed in [24] also approaches the maximum receiver SNR with sufficient iterations, which shows that the proposed low-complexity algorithm has almost the same performance as that in [24] under sufficient iterations.
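The convergence of the TIN bound in Corollary 1 to the genie-aided bound in Corollary 2 can be seen by evaluating (68) for a shrinking time-domain a priori variance. In the Python sketch below, the channel coefficient vector and the noise level are hypothetical values chosen only for illustration, and the function names are not from the paper.

```python
import numpy as np

def eta_bound_tin(v_a_T, h, N0):
    """Upper bound (68) on the effective DD-domain SNR under the TIN strategy."""
    h2 = np.sum(np.abs(h) ** 2)
    return h2 / (0.5 * v_a_T * h2 + N0)

def eta_bound_genie(h, N0):
    """Upper bound (69) on the effective DD-domain SNR under the genie-aided strategy."""
    return np.sum(np.abs(h) ** 2) / N0

# Hypothetical channel coefficients and noise level (Es/N0 = 12 dB with unit symbol energy).
h = np.array([0.5 + 0.1j, 0.3 - 0.4j, 0.2 + 0.2j, 0.1 - 0.3j])
N0 = 10 ** (-12 / 10)

genie = eta_bound_genie(h, N0)
for v in (1.0, 0.1, 0.01, 0.0):
    tin = eta_bound_tin(v, h, N0)
    print(f"v_a_T = {v:5.2f}: TIN bound = {tin:.3f}, genie-aided bound = {genie:.3f}")
```

The two bounds coincide exactly when the a priori variance is zero, which mirrors the argument that the residual interference energy vanishes as the iterations proceed.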

5. Numerical Results

We present the numerical results of the proposed schemes in this section. As an example, we consider $P = 4$, $M = 64$, $N = 32$, and QPSK. The maximum delay and Doppler indices are 10 and 5, respectively. In the figures, “Proposed 1” refers to the cross-domain iterative detection scheme based on the reduced-size LMMSE, while “Proposed 2” denotes the cross-domain iterative detection scheme employing the EVD-based reduced-size LMMSE. For comparison, “Full-size” corresponds to the cross-domain iterative detection scheme based on the full-size LMMSE proposed in [24].
The state (MSE) evolution performance of the proposed scheme at an SNR of $E_s/N_0 = 12$ dB is given in Figure 5, where the actual MSE, the MSE evolution results, and the derived TIN and genie-aided bounds are presented in comparison to the results obtained from [24]. In the simulation, the channel coefficients, Doppler indices, and delay indices are $0.02 - 0.09i$, $0.40 + 0.73i$, $0.03 + 0.45i$, $0.15 - 0.43i$; $4.82, 3.23, 1.38, 2.47$; and $0, 8, 4, 6$, respectively. As shown in Figure 5, our derived state evolution provides a good prediction of the actual MSE performance, where both the actual MSE and the MSE derived from the state evolution first decrease and then saturate at MSEs around $9 \times 10^{-5}$ after sufficient iterations. Furthermore, we can observe that the derived state evolution matches perfectly with both the TIN and genie-aided bounds, which verifies the correctness of our derivation. Finally, we notice that the proposed scheme exhibits only a marginal MSE loss compared to the scheme in [24] in early iterations, both numerically and theoretically, and this loss becomes negligible with an increased number of iterations.
Building upon the previous discussion of the time-domain state evolution in Figure 5, we now turn our attention to the DD domain. Figure 6 presents the state evolution of the proposed scheme (“Proposed 1”), as well as its convergence trajectory, under the same channel parameters as in Figure 5 but at an SNR of $E_s/N_0 = 10$ dB. For clarity, the figure also includes the DD-domain error-state lower bound corresponding to the genie-aided strategy, as established in Theorem 1. In Figure 6, the horizontal and vertical axes represent the pairs of input–output MSEs under unbiased estimation in either the time or DD domain. The results demonstrate that the error states of the proposed scheme match well with the derived genie-aided lower bound, thereby confirming the validity of both our theoretical derivation and the proposed state evolution framework. Moreover, Figure 6 also shows the convergence trajectory of the proposed detector (depicted as the unlabeled dashed line). As shown, the algorithm converges to a stable MSE of approximately 0.0899 within three or four iterations. In general, under practical SNR ranges of 5–15 dB, the algorithm typically converges within three to five iterations. It is also worth noting that, at higher receive SNRs, the convergence point further decreases, leading to significantly lower MSE levels. This highlights the fact that the proposed scheme not only tracks the genie-aided bound but also approaches near-optimal detection performance under sufficiently high SNR conditions, thereby providing a favorable trade-off between complexity and performance.
In Figure 7, we compare the bit error rate (BER) performance of the proposed schemes with that of the full-size LMMSE baseline [24]. In the simulations, we consider integer delay and fractional Doppler, as fractional Doppler better reflects practical high-mobility scenarios. As observed from the figure, Proposed 1 suffers from a noticeable performance degradation with one iteration compared to the scheme in [24] at high SNRs due to the imperfect SIC adopted in the scheme in the early iterations. However, we notice that Proposed 1 shows roughly the same performance as the scheme in [24] with five iterations, and both their results converge to the optimal performance obtained by using the MP algorithm [19] with only integer delay and Doppler indices. Therefore, we observe that Proposed 1 enjoys a near-optimal performance with reduced complexity. Specifically, the proposed cross-domain iterative detection scheme based on the reduced-size LMMSE achieves approximately a 99% reduction in computational complexity compared with the conventional full-size LMMSE, while maintaining comparable detection performance. This demonstrates a favorable trade-off between computational efficiency and performance.
In addition to the BER performance shown in Figure 7, we evaluate the pragmatic capacity of the proposed reduced-size LMMSE scheme for QPSK signaling. Figure 8 presents the pragmatic capacity after five iterations. As observed, the capacity increases with SNR and approaches the theoretical maximum of 2 bits/s/Hz at a high SNR, indicating that the proposed algorithm not only achieves near-optimal detection performance but also maintains high spectral efficiency. This further highlights the practical benefits of the reduced computational complexity, enabling efficient real-time implementation without compromising throughput.
Figure 9 further illustrates the performance of the two proposed schemes, where the parameters are set to M = 32 and N = 16 . It can be observed that, as the SNR increases, Proposed 2 exhibits inferior performance compared with Proposed 1. This performance degradation arises from two main factors. On the one hand, the EVD method relies on the asymptotic assumption that M N , which is not fully satisfied in the finite-size simulation setting. On the other hand, the computation of extrinsic information involves inversion, which is prone to numerical instability, particularly in high-SNR regimes. Nevertheless, the observed performance loss remains acceptable when considering the overall computational complexity. In addition, Figure 9 also presents a BER performance comparison of Proposed 1 under fractional delay against its performance under integer delay. The results show that the proposed scheme continues to perform effectively under fractional delay, highlighting its effectiveness, robustness, and applicability in more general and practical channel conditions.

6. Conclusions

In this paper, we studied OTFS transmission using the discrete Zak transform and derived vectorized input–output relations in both the time and delay–Doppler domains, revealing that delay and Doppler effects can be decoupled. Based on this, we proposed a reduced-complexity cross-domain iterative detection algorithm that combines a reduced-size LMMSE estimator in the time domain with symbol-by-symbol detection in the DD domain, where extrinsic information is exchanged via a unitary transformation. By applying eigenvalue decomposition, the LMMSE complexity becomes independent of the number of iterations. Analytical and numerical results show that the estimator bias is negligible, the TIN and genie-aided bounds converge with sufficient iterations, and the effective DD-domain SNR approaches the maximum receiver SNR. Overall, the proposed scheme achieves near-optimal error performance with reduced computational complexity.

Author Contributions

Conceptualization, M.L., S.L. and B.B.; Methodology, M.L., S.L. and G.C.; Software, M.L.; Validation, S.L.; Formal analysis, M.L. and S.L.; Writing—original draft, M.L.; Writing—review & editing, M.L., S.L., B.B. and G.C.; Funding acquisition, S.L., B.B. and G.C. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported in part by the National Natural Science Foundation of China under Grant 62171356, and in part by the National Key Research and Development Program of China under Grant 2021YFA1000500. It was also supported by the European Union’s Horizon 2020 Research and Innovation Program under MSCA Grant No. 101105732—DDComRad, and by the Bundesministerium für Bildung und Forschung (BMBF) Germany in the program of “Souverän. Digital. Vernetzt.” Joint project 6G Research and Innovation Cluster (6G-RIC), project identification number: 16KISK030. The APC was funded by the Technical University of Berlin (TU Berlin) through the MDPI Institutional Open Access Program (IOAP).

Data Availability Statement

The data presented in this study are available on request from the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

Appendix A. Derivation for (64)

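According to (61) and $\mathrm{Tr}\{\mathbf{A}\mathbf{B}\} = \mathrm{Tr}\{\mathbf{B}\mathbf{A}\}$, we have
$$ v_{s}^{p,\mathrm{T},(l)} = v_{s}^{a,\mathrm{T},(l)} - \frac{\big(v_{s}^{a,\mathrm{T},(l)}\big)^{2}}{MN}\sum_{i=0}^{N-1}\mathrm{Tr}\Big\{\Big(v_{s}^{a,\mathrm{T},(l)}\mathbf{H}_A^{i}(\mathbf{H}_A^{i})^{\mathrm{H}} + \lambda_{\mathrm{ISI}}^{i}\mathbf{I}_{2M} + N_0\mathbf{I}_{2M}\Big)^{-1}\mathbf{H}_A^{i}(\mathbf{H}_A^{i})^{\mathrm{H}}\Big\}. \tag{A1} $$
Substituting $\mathbf{H}_A^{i}(\mathbf{H}_A^{i})^{\mathrm{H}} = \mathbf{U}_A^{i}\boldsymbol{\Lambda}_A^{i}(\mathbf{U}_A^{i})^{\mathrm{H}}$ into (A1), we get (A2), i.e.,
$$ \begin{aligned} v_{s}^{p,\mathrm{T},(l)} &= v_{s}^{a,\mathrm{T},(l)} - \frac{\big(v_{s}^{a,\mathrm{T},(l)}\big)^{2}}{MN}\sum_{i=0}^{N-1}\mathrm{Tr}\Big\{\Big(v_{s}^{a,\mathrm{T},(l)}\mathbf{U}_A^{i}\boldsymbol{\Lambda}_A^{i}(\mathbf{U}_A^{i})^{\mathrm{H}} + \lambda_{\mathrm{ISI}}^{i}\mathbf{I}_{2M} + N_0\mathbf{I}_{2M}\Big)^{-1}\mathbf{U}_A^{i}\boldsymbol{\Lambda}_A^{i}(\mathbf{U}_A^{i})^{\mathrm{H}}\Big\} \\ &= v_{s}^{a,\mathrm{T},(l)} - \frac{\big(v_{s}^{a,\mathrm{T},(l)}\big)^{2}}{MN}\sum_{i=0}^{N-1}\mathrm{Tr}\Big\{\big((\mathbf{U}_A^{i})^{\mathrm{H}}\big)^{-1}\Big(v_{s}^{a,\mathrm{T},(l)}\boldsymbol{\Lambda}_A^{i} + \lambda_{\mathrm{ISI}}^{i}\mathbf{I}_{2M} + N_0\mathbf{I}_{2M}\Big)^{-1}(\mathbf{U}_A^{i})^{-1}\mathbf{U}_A^{i}\boldsymbol{\Lambda}_A^{i}(\mathbf{U}_A^{i})^{\mathrm{H}}\Big\} \\ &= v_{s}^{a,\mathrm{T},(l)} - \frac{\big(v_{s}^{a,\mathrm{T},(l)}\big)^{2}}{MN}\sum_{i=0}^{N-1}\mathrm{Tr}\Big\{\Big(v_{s}^{a,\mathrm{T},(l)}\boldsymbol{\Lambda}_A^{i} + \lambda_{\mathrm{ISI}}^{i}\mathbf{I}_{2M} + N_0\mathbf{I}_{2M}\Big)^{-1}\boldsymbol{\Lambda}_A^{i}\Big\} \\ &= v_{s}^{a,\mathrm{T},(l)} - \frac{v_{s}^{a,\mathrm{T},(l)}}{MN}\sum_{i=0}^{N-1}\sum_{j=0}^{M-1}\frac{v_{s}^{a,\mathrm{T},(l)}\lambda_A^{i,j}}{v_{s}^{a,\mathrm{T},(l)}\lambda_A^{i,j} + \lambda_{\mathrm{ISI}}^{i} + N_0}. \end{aligned} \tag{A2} $$
This completes the proof of Lemma 1.

Appendix B. Derivation for (66)

Observing (64), we define the function $f(\lambda) = \frac{v\lambda}{v\lambda + \eta}$ with respect to $\lambda$, where $v \geq 0$ and $\eta > 0$. Since its second-order derivative, i.e., $f''(\lambda) = -\frac{2v^{2}\eta}{(v\lambda + \eta)^{3}}$, is never positive, the considered function is concave. According to Jensen's inequality, (A2) can be bounded as
$$ v_{s}^{p,\mathrm{T},(l)} \geq v_{s}^{a,\mathrm{T},(l)} - \frac{v_{s}^{a,\mathrm{T},(l)}}{N}\sum_{i=0}^{N-1}\frac{\frac{v_{s}^{a,\mathrm{T},(l)}}{M}\sum_{j=0}^{M-1}\lambda_A^{i,j}}{\frac{v_{s}^{a,\mathrm{T},(l)}}{M}\sum_{j=0}^{M-1}\lambda_A^{i,j} + \lambda_{\mathrm{ISI}}^{i} + N_0}, \tag{A3} $$
where the lower bound becomes tighter as $v_{s}^{a,\mathrm{T},(l)}$ decreases and equality is achieved when $v_{s}^{a,\mathrm{T},(l)} = 0$. Note that $\sum_{j=0}^{M-1}\lambda_A^{i,j} = M\|\mathbf{h}\|^{2}$ and $\lambda_{\mathrm{ISI}}^{i} = \mathrm{Tr}\big\{\mathbf{H}_B^{i}\mathbf{C}_{\tilde{s}_i}^{a,\mathrm{T}}(\mathbf{H}_B^{i})^{\mathrm{H}}\big\}/2M = v_{s}^{a,\mathrm{T},(l)}\|\mathbf{h}\|^{2}/2$, $\forall i \in \{0, 1, \ldots, N-1\}$, where $\mathbf{h} = [h_1, \ldots, h_P]^{\top}$ denotes the channel coefficients vector and $\|\cdot\|^{2}$ denotes the square of the Euclidean norm of a vector. Thus, (A3) is rewritten as
$$ v_{s}^{p,\mathrm{T},(l)} \geq v_{s}^{a,\mathrm{T},(l)} - \frac{v_{s}^{a,\mathrm{T},(l)}}{N}\sum_{i=0}^{N-1}\frac{v_{s}^{a,\mathrm{T},(l)}\|\mathbf{h}\|^{2}}{v_{s}^{a,\mathrm{T},(l)}\|\mathbf{h}\|^{2} + \frac{v_{s}^{a,\mathrm{T},(l)}}{2}\|\mathbf{h}\|^{2} + N_0} = v_{s}^{a,\mathrm{T},(l)} - \frac{\big(v_{s}^{a,\mathrm{T},(l)}\big)^{2}\|\mathbf{h}\|^{2}}{v_{s}^{a,\mathrm{T},(l)}\|\mathbf{h}\|^{2} + \frac{v_{s}^{a,\mathrm{T},(l)}}{2}\|\mathbf{h}\|^{2} + N_0}. \tag{A4} $$
Substituting (A4) into (52), we arrive at
$$ \begin{aligned} v_{s}^{a,\mathrm{DD},(l)} &= \left(\frac{1}{v_{s}^{p,\mathrm{T},(l)}} - \frac{1}{v_{s}^{a,\mathrm{T},(l)}}\right)^{-1} \geq \left(\frac{1}{v_{s}^{a,\mathrm{T},(l)} - \frac{(v_{s}^{a,\mathrm{T},(l)})^{2}\|\mathbf{h}\|^{2}}{v_{s}^{a,\mathrm{T},(l)}\|\mathbf{h}\|^{2} + \frac{v_{s}^{a,\mathrm{T},(l)}}{2}\|\mathbf{h}\|^{2} + N_0}} - \frac{1}{v_{s}^{a,\mathrm{T},(l)}}\right)^{-1} \\ &= \frac{v_{s}^{a,\mathrm{T},(l)} - \frac{(v_{s}^{a,\mathrm{T},(l)})^{2}\|\mathbf{h}\|^{2}}{v_{s}^{a,\mathrm{T},(l)}\|\mathbf{h}\|^{2} + \frac{v_{s}^{a,\mathrm{T},(l)}}{2}\|\mathbf{h}\|^{2} + N_0}}{\frac{v_{s}^{a,\mathrm{T},(l)}\|\mathbf{h}\|^{2}}{v_{s}^{a,\mathrm{T},(l)}\|\mathbf{h}\|^{2} + \frac{v_{s}^{a,\mathrm{T},(l)}}{2}\|\mathbf{h}\|^{2} + N_0}} = \frac{\frac{v_{s}^{a,\mathrm{T},(l)}}{2}\|\mathbf{h}\|^{2} + N_0}{\|\mathbf{h}\|^{2}}. \end{aligned} \tag{A5} $$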
This completes the proof of Theorem 1.
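The Jensen step in this appendix can be checked numerically. The Python sketch below draws arbitrary positive eigenvalues, rescales them so that each block satisfies the sum constraint used above, evaluates (64) and (52), and compares the result with the bound (66). The eigenvalue distribution, the value of the squared channel norm, and all sizes are hypothetical and only illustrate the inequality.

```python
import numpy as np

rng = np.random.default_rng(3)
M, N, N0, v_a = 16, 4, 0.1, 0.8
h2 = 1.0  # hypothetical value of the squared channel norm ||h||^2

# Arbitrary positive eigenvalues, rescaled so that each block sums to M * ||h||^2.
lam = rng.uniform(0.2, 1.8, size=(N, M))
lam *= h2 / lam.mean(axis=1, keepdims=True)

lam_isi = 0.5 * v_a * h2                      # interference energy used in Theorem 1
v_p = v_a - v_a / (M * N) * np.sum(v_a * lam / (v_a * lam + lam_isi + N0))  # (64)
v_a_dd = 1.0 / (1.0 / v_p - 1.0 / v_a)        # (52)
bound = (0.5 * v_a * h2 + N0) / h2            # right-hand side of (66)

print("v_a_DD:", v_a_dd, "lower bound:", bound)
```

When all eigenvalues within a block are equal, the Jensen step holds with equality and the computed variance meets the bound.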

References

  1. You, X.; Wang, C.X.; Huang, J.; Gao, X.; Zhang, Z.; Wang, M.; Huang, Y.; Zhang, C.; Jiang, Y.; Wang, J.; et al. Towards 6G wireless communication networks: Vision, enabling technologies, and new paradigm shifts. Sci. China Inf. Sci. 2021, 64, 5–78.
  2. Wei, Z.; Li, S.; Yuan, W.; Schober, R.; Caire, G. Orthogonal Time Frequency Space Modulation–Part I: Fundamentals and Challenges Ahead. IEEE Commun. Lett. 2023, 27, 4–8.
  3. Wang, T.; Proakis, J.; Masry, E.; Zeidler, J. Performance degradation of OFDM systems due to Doppler spreading. IEEE Trans. Wirel. Commun. 2006, 5, 1422–1432.
  4. Hadani, R.; Rakib, S.; Tsatsanis, M.; Monk, A.; Goldsmith, A.J.; Molisch, A.F.; Calderbank, R. Orthogonal Time Frequency Space Modulation. In Proceedings of the 2017 IEEE Wireless Communications and Networking Conference (WCNC), San Francisco, CA, USA, 19–22 March 2017; pp. 1–6.
  5. Gaudio, L.; Colavolpe, G.; Caire, G. OTFS vs. OFDM in the Presence of Sparsity: A Fair Comparison. IEEE Trans. Wirel. Commun. 2022, 21, 4410–4423.
  6. Groll, H.; Zöchmann, E.; Pratschner, S.; Lerch, M.; Schützenhöfer, D.; Hofer, M.; Blumenstein, J.; Sangodoyin, S.; Zemen, T.; Prokeš, A.; et al. Sparsity in the Delay-Doppler Domain for Measured 60 GHz Vehicle-to-Infrastructure Communication Channels. In Proceedings of the 2019 IEEE International Conference on Communications Workshops (ICC Workshops), Shanghai, China, 20–24 May 2019; pp. 1–6.
  7. Hlawatsch, F.; Matz, G. Wireless Communications over Rapidly Time-Varying Channels; Academic Press: Cambridge, MA, USA, 2011.
  8. Surabhi, G.D.; Augustine, R.M.; Chockalingam, A. On the Diversity of Uncoded OTFS Modulation in Doubly-Dispersive Channels. IEEE Trans. Wirel. Commun. 2019, 18, 3049–3063.
  9. Zhang, H.; Huang, X.; Zhang, J.A. Comparison of OTFS Diversity Performance over Slow and Fast Fading Channels. In Proceedings of the 2019 IEEE/CIC International Conference on Communications in China (ICCC), Changchun, China, 11–13 August 2019; pp. 828–833.
  10. Raviteja, P.; Hong, Y.; Viterbo, E.; Biglieri, E. Effective Diversity of OTFS Modulation. IEEE Wirel. Commun. Lett. 2020, 9, 249–253.
  11. Li, S.; Yuan, J.; Yuan, W.; Wei, Z.; Bai, B.; Ng, D.W.K. Performance Analysis of Coded OTFS Systems Over High-Mobility Channels. IEEE Trans. Wirel. Commun. 2021, 20, 6033–6048.
  12. Tiwari, S.; Das, S.S.; Rangamgari, V. Low complexity LMMSE Receiver for OTFS. IEEE Commun. Lett. 2019, 23, 2205–2209.
  13. Long, F.; Niu, K.; Dong, C.; Lin, J. Low Complexity Iterative LMMSE-PIC Equalizer for OTFS. In Proceedings of the ICC 2019—2019 IEEE International Conference on Communications (ICC), Shanghai, China, 20–24 May 2019; pp. 1–6.
  14. Surabhi, G.D.; Chockalingam, A. Low-Complexity Linear Equalization for OTFS Modulation. IEEE Commun. Lett. 2020, 24, 330–334.
  15. Zou, T.; Xu, W.; Gao, H.; Bie, Z.; Feng, Z.; Ding, Z. Low-Complexity Linear Equalization for OTFS Systems with Rectangular Waveforms. In Proceedings of the 2021 IEEE International Conference on Communications Workshops (ICC Workshops), Montreal, QC, Canada, 14–23 June 2021; pp. 1–6.
  16. Surabhi, G.D.; Chockalingam, A. Low-complexity Linear Equalization for 2 × 2 MIMO-OTFS Signals. In Proceedings of the 2020 IEEE 21st International Workshop on Signal Processing Advances in Wireless Communications (SPAWC), Atlanta, GA, USA, 26–29 May 2020; pp. 1–5.
  17. Raviteja, P.; Phan, K.T.; Hong, Y.; Viterbo, E. Interference Cancellation and Iterative Detection for Orthogonal Time Frequency Space Modulation. IEEE Trans. Wirel. Commun. 2018, 17, 6501–6515.
  18. Gaudio, L.; Kobayashi, M.; Caire, G.; Colavolpe, G. On the Effectiveness of OTFS for Joint Radar Parameter Estimation and Communication. IEEE Trans. Wirel. Commun. 2020, 19, 5951–5965.
  19. Li, S.; Yuan, W.; Wei, Z.; Yuan, J.; Bai, B.; Ng, D.W.K.; Xie, Y. Hybrid MAP and PIC Detection for OTFS Modulation. IEEE Trans. Veh. Technol. 2021, 70, 7193–7198.
  20. Xiang, L.; Liu, Y.; Yang, L.L.; Hanzo, L. Gaussian Approximate Message Passing Detection of Orthogonal Time Frequency Space Modulation. IEEE Trans. Veh. Technol. 2021, 70, 10999–11004.
  21. Yuan, W.; Wei, Z.; Yuan, J.; Ng, D.W.K. A Simple Variational Bayes Detector for Orthogonal Time Frequency Space (OTFS) Modulation. IEEE Trans. Veh. Technol. 2020, 69, 7976–7980.
  22. Liu, M.; Li, S.; Wei, Z.; Bai, B.; Caire, G.; Ng, D.W.K. Near Optimal Hybrid Digital-Analog Beamforming for mmWave Point-to-Point MIMO Transmissions Using OTFS Waveforms. IEEE Trans. Commun. 2025, 73.
  23. Yuan, Z.; Liu, F.; Yuan, W.; Guo, Q.; Wang, Z.; Yuan, J. Iterative Detection for Orthogonal Time Frequency Space Modulation with Unitary Approximate Message Passing. IEEE Trans. Wirel. Commun. 2022, 21, 714–725.
  24. Li, S.; Yuan, W.; Wei, Z.; Yuan, J. Cross Domain Iterative Detection for Orthogonal Time Frequency Space Modulation. IEEE Trans. Wirel. Commun. 2022, 21, 2227–2242.
  25. Chong, R.; Li, S.; Wei, Z.; Matthaiou, M.; Ng, D.W.K.; Caire, G. Cross-Domain Iterative Detection for OTFS Transmission with Frequency Domain Equalization. IEEE Trans. Commun. 2025, 73.
  26. Raviteja, P.; Hong, Y.; Viterbo, E.; Biglieri, E. Practical Pulse-Shaping Waveforms for Reduced-Cyclic-Prefix OTFS. IEEE Trans. Veh. Technol. 2019, 68, 957–961.
  27. Liu, M.; Li, S.; Bai, B.; Caire, G. Reduced-Complexity Cross-Domain Iterative Detection for OTFS Modulation via Delay-Doppler Decoupling. In Proceedings of the 2023 IEEE 24th International Workshop on Signal Processing Advances in Wireless Communications (SPAWC), Shanghai, China, 25–28 September 2023; pp. 546–550.
  28. Hong, Y.; Thaj, T.; Raviteja, P. Delay-Doppler Communications: Principles and Applications; Academic Press: Cambridge, MA, USA, 2022.
  29. Walker, D.W.; Aldcroft, T.; Cisneros, A.; Fox, G.C.; Furmanski, W. LU decomposition of banded matrices and the solution of linear systems on hypercubes. In Proceedings of the Third Conference on Hypercube Concurrent Computers and Applications, Pasadena, CA, USA, 19–20 January 1988; Association for Computing Machinery: New York, NY, USA, 1989; pp. 1635–1655.
  30. Chong, R.; Li, S.; Yuan, J.; Ng, D.W.K. Achievable Rate Upper-Bounds of Uplink Multiuser OTFS Transmissions. IEEE Wirel. Commun. Lett. 2022, 11, 791–795.
Figure 1. The transceiver structure of DZT-based OTFS transmissions.
Figure 2. Brief illustration of the interference pattern between the time-domain transmitted vector s and the received vector r.
Figure 3. The block diagram of the considered cross-domain iterative receiver with the time-domain reduced-size LMMSE.
Figure 4. Evaluation of estimation bias for the proposed scheme.
Figure 5. Comparison of time-domain state (MSE) evolution performance of the proposed scheme and the scheme in [24], as well as the derived bounds.
Figure 6. State (MSE) evolution performance and convergence trajectory of the proposed scheme. The red dashed line represents the convergence trajectory of the proposed detector.
Figure 7. Comparison of BER performance of the proposed scheme (“Proposed 1”), the scheme in [24] (“Full-size”), and the optimal detection in [19] (“Optimal”).
Figure 8. The pragmatic capacity of the proposed scheme (“Proposed 1”).
Figure 9. Comparison of BER performance of the proposed schemes.

Share and Cite

MDPI and ACS Style

Liu, M.; Li, S.; Bai, B.; Caire, G. Cross-Domain OTFS Detection via Delay–Doppler Decoupling: Reduced-Complexity Design and Performance Analysis. Entropy 2025, 27, 1062. https://doi.org/10.3390/e27101062

Chicago/Turabian Style

Liu, Mengmeng, Shuangyang Li, Baoming Bai, and Giuseppe Caire. 2025. "Cross-Domain OTFS Detection via Delay–Doppler Decoupling: Reduced-Complexity Design and Performance Analysis" Entropy 27, no. 10: 1062. https://doi.org/10.3390/e27101062
