Low-Latency Marine-Based OTFS Echo Parameter Estimation Enabled by AI

Hussain, Khurshid; Yoo, Jeseon

doi:10.3390/s25237104

by

Khurshid Hussain

and

Jeseon Yoo

^*

Ocean Climate Prediction Center, Marine Natural Disaster Research Department, Korea Institute of Ocean Science and Technology, Busan 49111, Republic of Korea

^*

Author to whom correspondence should be addressed.

Sensors 2025, 25(23), 7104; https://doi.org/10.3390/s25237104

Submission received: 27 October 2025 / Revised: 7 November 2025 / Accepted: 19 November 2025 / Published: 21 November 2025

(This article belongs to the Topic AI-Driven Wireless Channel Modeling and Signal Processing)

Abstract

We propose an end-to-end pipeline for Orthogonal Time–Frequency Space (OTFS) sensing that integrates deterministic signal processing with a Machine-Learning (ML) inference stage. The pipeline first generates a complex delay–Doppler grid via standard Symplectic Fast Fourier Transform (SFFT)-based OTFS reception. We then employ an 'oracle' Ground-Truth (GT) association process to deterministically label signal peaks, extracting their complex gain ($α$) and absolute indices $(m, n)$ to deduce physical targets (range, radial velocity). These oracle-aligned labels are used to train a Random-Forest (RF) classifier. The RF model learns to map normalized $33 \times 33$ complex patches, centered on signal peaks, to their corresponding target parameters. On an 80/20 split of 10,000 samples, the classifier achieved an accuracy of 0.966, a macro-F1 score of 0.965, and a macro Receiver Operating Characteristic–Area Under the Curve (ROC–AUC) of 0.998. Notably, when tested on held-out scenes, the model's derived range and velocity predictions achieved 100% coincidence with the GT, while amplitude and phase corresponded in 89% of instances. This hybrid oracle-and-ML approach provides an effective method for precise target extraction in OTFS-based sensing systems.

Keywords:

AI; delay–Doppler; integrated sensing and communications; OTFS; parameter estimation

1. Introduction

Future wireless systems must deliver both data communication and situational awareness (sensing) as twin outputs, necessitating shared waveforms, bandwidth, and latency budgets. While the fundamental principles of radar ensure reliable range and velocity extraction, modern embedded deployments impose tighter constraints on computing resources and power consumption. Practical OTFS sensing must also cope with inter-path interference (IPI), leakage, and fractional delay/Doppler while remaining real-time. Part of the appeal of OTFS is that it models propagation in the delay–Doppler (DD) domain, where target energy is sparse and physically interpretable [1,2,3,4,5,6]. Previous work has covered parts of this problem: DD-domain IPI cancellation (DDIPIC) for joint estimation [7]; channel estimation with exclusive or embedded pilots [8,9]; and Bayesian/ML estimators, namely sparse Bayesian learning (SBL), modified maximum likelihood estimation (M-MLE), and two-step estimation (TSE), which handle fractional effects with varying assumptions and complexity [10,11,12].

Spectrum congestion and mutual coupling between antennas create substantial self- and cross-interference in joint communication–sensing, driving the need for intelligent cancellation under tight power and latency budgets. To unify these objectives, OTFS was first developed to manage the radar–communication waveform mismatch, alongside developments in pulse shaping, channel estimation, and detection [1,9,13,14,15,16]. Despite this, many OTFS pipelines still treat delay and Doppler as integer-valued or depend on extensive refinements, which makes them susceptible to fractional DD, leakage, and clutter [17,18,19,20]. Traditional high-resolution techniques such as Multiple Signal Classification (MUSIC, a noise-subspace method) require eigen-decompositions and dense matrix–vector operations that hinder real-time use [21,22]. Fast Fourier Transform/Fast Coherent Cross-Correlation (FFT/FCCC) range-Doppler maps (RDMs) with Cell-Averaging Constant False Alarm Rate (CA-CFAR) detection are efficient but biased towards single-target scenes and hence need millimeter-level precision; generating and scanning the 2D RDM is also expensive [23,24]. Virtual cyclic prefix (VCP) simplifies the task by working in the Fourier domain, but does not solve the problem of label fidelity in supervised sensing [25]. Atomic-norm OTFS with the Alternating Direction Method of Multipliers (ADMM) and two-dimensional MUSIC (2D-MUSIC) can achieve very high resolution at considerable computational and modeling cost [26]. Noting these gaps and building on DD-domain IPI cancellation (DDIPIC) [27], we propose strict, physics-aware oracle labeling combined with a Random-Forest learner applied to $33 \times 33$ DD patches. This results in GT-consistent supervision and millisecond-class inference.

Traditional and full sensing pipelines in ISAC typically follow either (i) a 2D-FFT + CFAR chain, which is hardware-friendly but sensitive to leakage, near–far masking, and non-uniform clutter thresholds [28,29,30,31,32]; or (ii) subspace/super-resolution methods (e.g., MUSIC/ESPRIT), which offer fine resolution under clean modeling assumptions but incur higher computational/latency costs and brittleness to colored clutter or snapshot scarcity [33,34,35,36]. In contrast, our approach operates natively in the OTFS delay–Doppler (DD) grid and evaluates only small peak-centered patches with a lightweight classifier, preserving the physics of the DD lattice while avoiding global 2D searches and heavy threshold sweeps. This yields millisecond-class inference, robust bin recovery for range/velocity, and strong tolerance to fractional-bin leakage and non-uniform clutter. Radar principles support reliable range/velocity extraction, and OTFS provides a physically interpretable DD-domain model with demonstrated tools for IPI cancellation, pilot-aided channel estimation, and fractional-handling estimators (as reviewed above with their cited works). Despite these advances, many pipelines assume integer DD bins or rely on heavy refinements, leaving label fidelity under fractional DD/leakage unresolved and introducing runtime/computational bottlenecks in real-time deployments (as detailed above in the cited literature).

Our proposed formulation yields Ground-Truth-consistent supervision and millisecond-class inference latency. The model achieves an accuracy of 0.966, a macro-F1 of 0.965, a macro ROC–AUC of 0.998, and a macro average precision of 0.972 on a balanced 10k corpus (80/20 split). Critically, in held-out scenes, the derived delay and Doppler bins, and consequently range and velocity, match the GT in 100% of cases, while amplitude and phase agree in 89% of instances. This formulation closes the labeling and runtime gaps while preserving the physical interpretability and deployability of the OTFS sensing system. In this work, the oracle association is used only in simulation to generate unambiguous labels and to validate lattice indexing/units. At deployment, the pipeline remains GT-free: the receiver detects DD peaks (CFAR + NMS), extracts normalized complex patches, and the trained model infers $(m, n)$ (hence range/velocity) directly from data. Beyond synthetic validation, we outline a concrete maritime field program with unchanged runtime (detector → patch → inference), multi-source ground truth, and robustness tests under diverse sea states.

2. Marine OTFS Transmitter/Reception

2.1. Marine OTFS Transmitter

The OTFS lattice is defined on $Z_{M} \times Z_{N}$ with $M$ delay bins and $N$ Doppler/time bins. Throughout this work, we use a lattice of size $(M, N)$, a sampling rate of $F_{s}$ MHz, and a cyclic prefix (CP) length of $N_{CP}$ samples (fixed per profile). Indices are $m \in \{0, \dots, M - 1\}$ (delay/subcarriers) and $n \in \{0, \dots, N - 1\}$ (Doppler/OFDM symbols). Unitary DFTs are assumed so that $F_{K}^{H} F_{K} = I$, ensuring exact energy preservation between domains. Symbol allocation is governed by a binary mask $M_{data} \in \{0, 1\}^{M \times N}$ (1 = data, 0 = pilot). We reserve a DC pilot at the origin so that $X_{DD}[0, 0] = 1$ (equivalently $S_{TF}[0, 0] = 1$) for absolute phase/timing reference. Embedded pilots are placed on a 2-D comb with spacings $(P_{m}, P_{n})$ (CLI-controlled, typically $P_{m} \geq 2$, $P_{n} \geq 2$). Every $k$-th frame (CLI flag -pilot-period $= k$) is declared pilot-only with $M_{data} \equiv 0$ to enable dense channel snapshots; all other frames are mixed data+pilot according to $M_{data}$. For data bins $D = \{(m, n) : M_{data}[m, n] = 1\}$, the number of modulated bits is $N_{bits} = b |D|$ with $b = \log_{2} M_{Q}$ for $M_{Q} \in \{64,\}$. We support uncoded operation ($u \in \{0, 1\}^{N_{bits}}$) as well as a classical convolutional forward error correction (FEC) code with generator polynomials $(171, 133)$ and constraint length $K = 7$ when enabled. Symbols are normalized to unit average power so that $E[|\hat{s}|^{2}] = 1$, keeping SNR accounting consistent across figures and tables. The clean transmit signal is shown in magnitude (dB) and in phase in Figure 1 and Figure 2, respectively.

Data and pilots are first placed on the delay–Doppler grid $X_{DD} \in C^{M \times N}$ according to $M_{data}$. The time–frequency waveform $S_{TF}$ is produced via the unitary ISFFT (IFFT across delay, FFT across Doppler):

$$S_{TF} = F_{N} (F_{M}^{H} X_{DD}) . \tag{1}$$

Equation (1) maps physically sparse DD content into a TF lattice that is well-conditioned for CP-OFDM synthesis. Using unitary transforms guarantees that total symbol energy is unchanged from $X_{DD}$ to $S_{TF}$. Each time–frequency (TF) column $S_{TF}(:, n)$ is converted to the time domain by an $M$-point IFFT.

A cyclic prefix of $N_{CP}$ samples is then prepended to each of the $N$ OFDM symbols, yielding a frame of length $L_{frame} = N (M + N_{CP})$ complex samples. The CP protects against inter-symbol interference under bounded delay spread and maintains the circular convolution structure assumed by OTFS. Figure 1 displays the transmit TF magnitude (in dB) for a mixed data+pilot frame with $(M, N) = (1024, 4096)$, $F_{s} = 200$ MHz, and unit-power symbol normalization. Pilot tones appear on a regular 2-D comb with spacings $(P_{m}, P_{n})$ at fixed amplitude (0 dB reference), while data bins occupy the complement defined by $M_{data}$. The flat spectral floor outside pilot/data locations confirms the absence of spurious emissions and validates that ISFFT + CP-OFDM preserves power exactly (no unintended scaling). This panel is used downstream as the reference for receiver TF equalization quality. Figure 2 shows the transmit TF phase (degrees). Pilot phases follow the chosen reference (e.g., $0^{\circ}$ or ZC-like patterns when configured), while data phases reflect Quadrature Amplitude Modulation (QAM) symbol arguments after unit-power normalization. The clean, piecewise-constant appearance at pilot coordinates demonstrates coherent phase control, which is critical for accurate channel estimation and carrier frequency offset/common phase error (CFO/CPE) diagnostics at the receiver.
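The transmit chain described above (pilot comb mask, ISFFT, per-symbol IFFT, CP insertion) can be sketched in NumPy as follows. This is a minimal illustration rather than the authors' implementation: the function names, the toy lattice size, and the use of QPSK data symbols are assumptions made for brevity, and the transform axes follow the convention of Equation (1).

```python
import numpy as np

def build_data_mask(M, N, Pm, Pn):
    """Binary mask M_data: 1 = data, 0 = pilot on a (Pm, Pn) comb that includes DC."""
    mask = np.ones((M, N), dtype=np.uint8)
    mask[::Pm, ::Pn] = 0
    return mask

def isfft(X_dd):
    """Unitary ISFFT with the axis convention of Eq. (1):
    IFFT along delay (axis 0), FFT along Doppler (axis 1)."""
    return np.fft.fft(np.fft.ifft(X_dd, axis=0, norm="ortho"), axis=1, norm="ortho")

def otfs_modulate(X_dd, n_cp):
    """ISFFT -> M-point IFFT per TF column -> prepend CP (requires n_cp >= 1)."""
    S_tf = isfft(X_dd)
    s_time = np.fft.ifft(S_tf, axis=0, norm="ortho")   # column n = OFDM symbol n
    with_cp = np.vstack([s_time[-n_cp:, :], s_time])   # shape (M + n_cp, N)
    return with_cp.T.reshape(-1)                        # symbols placed back to back

# Toy sizes (assumed for illustration; the paper uses M = 1024, N = 4096).
M, N, n_cp = 16, 8, 4
rng = np.random.default_rng(0)
mask = build_data_mask(M, N, Pm=4, Pn=2)
qpsk = (rng.choice([-1, 1], (M, N)) + 1j * rng.choice([-1, 1], (M, N))) / np.sqrt(2)
X_dd = np.where(mask == 1, qpsk, 1.0 + 0.0j)           # unit-amplitude pilots
frame = otfs_modulate(X_dd, n_cp)

# Frame length N (M + N_CP) and energy preservation under the unitary ISFFT.
print(frame.size == N * (M + n_cp))
print(np.isclose(np.sum(np.abs(isfft(X_dd)) ** 2), np.sum(np.abs(X_dd) ** 2)))
```

Using `norm="ortho"` on every transform is what makes the energy check hold without extra scaling factors.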

2.2. Marine OTFS Receiver

The receiver ingests a length-$L_{frame}$ complex baseband vector per frame and first performs CP removal and an $M$-point FFT per OFDM symbol to reconstruct $Y_{TF} \in C^{M \times N}$. If the capture length $L$ deviates from $L_{frame} = N (M + N_{CP})$ (e.g., buffered captures), the implementation infers the effective symbol period $\hat{L}_{sym}$ and CP length $\hat{N}_{CP}$ from $L / N$ to remain robust. Known TF pilots (at the $(P_{m}, P_{n})$ grid and DC) are used for per-tone channel estimation; embedded frames interpolate between pilots, while pilot-only frames provide dense snapshots for diagnostics and holdover. The DD snapshot $Y_{DD}$ is obtained via the unitary SFFT (FFT across delay, IFFT across Doppler):

$$Y_{DD} = F_{M} (F_{N}^{H} Y_{TF}) . \tag{2}$$

Equation (2) inverts Equation (1) up to channel distortion and noise, re-concentrating target energy into sparse neighborhoods in the DD grid. Using unitary transforms ensures energy consistency between TF and DD, enabling quantitative comparisons of noise floors and pilot SNR. Figure 3 and Figure 4 present the receive TF magnitude (dB) and the receive TF phase (degrees), respectively, under additive noise and fractional delay/Doppler. Relative to the transmit reference, pilot tones are attenuated and phase-rotated by the channel; their SNR determines the reliability of per-tone channel estimates and the interpolation accuracy in data regions. The observed noise pedestal and any slight spectral tilt reflect the configured SNR and front-end windowing; these factors directly influence equalizer choice in the communication stack and detection thresholds in the sensing stack.
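As a sanity check on the claim that Equation (2) inverts Equation (1), the following sketch pushes a random DD grid through an ideal (noise-free, channel-free) transmit and receive chain and confirms that the grid is recovered. The helper names and toy sizes are hypothetical, and the CP length is inferred from the capture length divided by $N$, as described in the text.

```python
import numpy as np

def isfft(X_dd):
    """Eq. (1) convention: IFFT along delay (axis 0), FFT along Doppler (axis 1)."""
    return np.fft.fft(np.fft.ifft(X_dd, axis=0, norm="ortho"), axis=1, norm="ortho")

def sfft(Y_tf):
    """Eq. (2) convention: FFT along delay (axis 0), IFFT along Doppler (axis 1)."""
    return np.fft.fft(np.fft.ifft(Y_tf, axis=1, norm="ortho"), axis=0, norm="ortho")

def modulate(X_dd, n_cp):
    """Reference transmitter, used here only to produce a test frame."""
    s_time = np.fft.ifft(isfft(X_dd), axis=0, norm="ortho")
    return np.vstack([s_time[-n_cp:, :], s_time]).T.reshape(-1)

def demodulate(frame, M, N):
    """Infer the CP length from len(frame) / N, strip it, FFT each symbol, apply the SFFT."""
    l_sym, rem = divmod(frame.size, N)
    if rem != 0 or l_sym < M:
        raise ValueError("capture length is not N * (M + N_cp)")
    n_cp = l_sym - M
    symbols = frame.reshape(N, l_sym)[:, n_cp:]          # (N, M) after CP removal
    Y_tf = np.fft.fft(symbols.T, axis=0, norm="ortho")    # (M, N) TF grid
    return sfft(Y_tf), n_cp

# Toy lattice (assumed sizes); an ideal channel means Y_DD should equal X_DD.
M, N = 16, 8
rng = np.random.default_rng(1)
X_dd = rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N))
Y_dd, n_cp_hat = demodulate(modulate(X_dd, n_cp=4), M, N)
print(n_cp_hat, np.allclose(Y_dd, X_dd))
```

With a real channel, `Y_dd` would instead contain the transmitted grid convolved with the DD channel response plus noise, which is the input to the peak detection stage.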

Figure 4 shows the receive TF phase (degrees). Pilot phases now exhibit a common phase error (CPE) and potential linear trends (residual CFO/TO) that are subsequently corrected using pilot-based estimators. Spatial smoothness across subcarriers is a practical indicator of well-behaved oscillators and modest phase noise; strong ripples would signal impaired synchronization or frequency selectivity requiring tighter interpolation or refined pilot density. Across transmitter and receiver, we keep $(M, N) = (1024, 4096)$, $F_{s} = B = 200$ MHz, unit-power QAM, a DC reference at $(0, 0)$, a configurable pilot comb with spacings $(P_{m}, P_{n})$, and a pilot-only period of $k$ frames when enabled. Frames have length $L_{frame} = N (M + N_{CP})$ samples. For context, we reference two canonical baselines: (i) 2D-FFT with CFAR detection, and (ii) subspace/super-resolution estimators (e.g., MUSIC/ESPRIT). Our pipeline differs by restricting computation to local peak-centered DD patches rather than scanning the full grid or forming large covariance scans. Where relevant, we report complexity/latency and recovery behavior relative to these baselines under similar OTFS settings. The pipeline is intentionally CPU-centric. FFT/SFFT dominate compute and are handled by mature vectorized libraries; pilot-aided equalization and CFAR/NMS are single-pass, cache-friendly operations; and the learning head is peak-only and bounded. With double-buffering, each frame is processed steadily as the next arrives, avoiding deep queues or accelerator dependencies. This design keeps latency predictable on small marine computers while preserving accuracy.

3. Parameter Extraction

The parameter extraction module bridges the physical environment of the marine scene with the digital OTFS lattice while maintaining a strictly blind evaluation protocol. All test-time detections are produced solely from received data, with no access to ground-truth coordinates or coefficients. The smallest resolvable delay and Doppler intervals are obtained from the sampling frequency and frame length. The delay resolution $Δτ$ represents the time spacing per delay bin, while the Doppler resolution $Δf_{D}$ denotes the frequency spacing per Doppler bin:

$$Δτ = \frac{1}{F_{s}}, \qquad Δf_{D} = \frac{1}{N T_{sym}} . \tag{3}$$

Equation (3) determines how each discrete grid coordinate in the delay–Doppler plane maps to a physical range and radial velocity. For the present configuration, these correspond to approximately $Δτ = 5$ ns ($\approx 0.75$ m two-way range resolution, since $R = cτ/2$) and $Δf_{D} \approx 47.7$ Hz (yielding $\approx 0.76$ m/s velocity resolution). Each fractional delay–Doppler index $(m_{f}, n_{f})$ obtained from the receiver is rounded to its nearest integer bin:

$$m = \mathrm{round}(m_{f}) \bmod M, \qquad n = \mathrm{round}(n_{f}) \bmod N .$$

The corresponding physical quantities (delay $τ$, Doppler frequency $f_{D}$, range $R$, and velocity $v$) are then recovered through

$$τ(m) = m Δτ, \qquad f_{D}(n) = n Δf_{D}, \qquad R(m) = \frac{c}{2} τ(m), \qquad v(n) = \frac{λ}{2} f_{D}(n) . \tag{4}$$

Equation (4) ensures that every DD bin has a deterministic and physically interpretable range and velocity. These mappings form the basis for quantitative sensing evaluation. The DD stack $H_{DD} \in C^{F \times M \times N}$ is indexed consistently so that, for each frame $i$, $ℜ\{h_{i}[m, n]\}$, $ℑ\{h_{i}[m, n]\}$, $|h_{i}[m, n]|$, and $∠h_{i}[m, n]$ can be directly compared against oracle references. For performance analysis, each detection is later associated with the nearest ground-truth entry within fixed range/velocity tolerances $(ε_{R}, ε_{v})$, using a one-to-one matching procedure. The magnitude and phase of the complex coefficient $α = α_{ℜ} + j α_{ℑ}$ are retrieved as $|α| = \sqrt{α_{ℜ}^{2} + α_{ℑ}^{2}}$ and $∠α = \mathrm{atan2}(α_{ℑ}, α_{ℜ})$, respectively. The simulation performs strict consistency checks on $(M, N)$ and frame counts before analysis to maintain determinism and reproducibility. The system automatically compares the estimated $(\hat{m}, \hat{n})$, $(\hat{R}, \hat{v})$ against the corresponding ground truth $(m_{gt}, n_{gt})$, $(R_{gt}, v_{gt})$. A perfect match occurs when both indices and physical quantities coincide, i.e., $(\hat{m}, \hat{n}) = (m_{gt}, n_{gt})$ and $(\hat{R}, \hat{v}) = (R_{gt}, v_{gt})$. Phase and amplitude agreements are evaluated separately, forming the statistical basis for Figure 5, which illustrates the quantitative accuracy of the parameter extraction process.
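The rounding rule and the mappings of Equations (3) and (4) amount to a few lines of arithmetic. The sketch below is illustrative only: the function name is hypothetical, the lattice constants are taken from the configuration quoted in the text ($F_{s} = 200$ MHz, $M = 1024$, $N = 4096$, $f_{c} = 9.4$ GHz), and the Doppler index is kept unsigned as in Equation (4).

```python
import numpy as np

C = 3.0e8  # propagation speed in m/s

def bins_to_physical(m_f, n_f, M, N, d_tau, d_fd, wavelength):
    """Round fractional DD indices to integer bins (wrapped modulo M, N)
    and map them to range and radial velocity via Eq. (4)."""
    m = int(np.round(m_f)) % M          # np.round rounds exact halves to even
    n = int(np.round(n_f)) % N
    tau = m * d_tau                     # delay in seconds
    f_d = n * d_fd                      # Doppler in Hz
    return m, n, 0.5 * C * tau, 0.5 * wavelength * f_d

# Configuration quoted in the text.
M, N, F_s, f_c = 1024, 4096, 200e6, 9.4e9
T_sym = M / F_s                         # 5.12 microseconds
d_tau = 1.0 / F_s                       # 5 ns per delay bin
d_fd = 1.0 / (N * T_sym)                # about 47.7 Hz per Doppler bin
wavelength = C / f_c

m, n, R, v = bins_to_physical(100.3, 20.6, M, N, d_tau, d_fd, wavelength)
print(m, n, round(R, 3), round(v, 3))
```

Each delay bin corresponds to $c Δτ / 2 = 0.75$ m of range, so bin 100 maps to 75 m.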

The bars represent the proportion of correctly matched delay–Doppler bins, physical ranges, and velocities relative to their ground-truth values. The figure shows that the bin, range, and velocity parameters achieve an exact 100% correspondence, indicating that the lattice indexing and physical mapping are perfectly aligned with the ground-truth metadata. In contrast, the amplitude and phase exhibit an 89% match rate, an expected deviation due to residual channel noise and fractional Doppler leakage. This confirms that while structural localization (in terms of range and velocity) is entirely reliable, fine-scale amplitude and phase recovery are limited by the inherent SNR of the received signal. The figure validates the deterministic and physics-consistent nature of the extraction algorithm across all evaluated frames. Range and radial velocity are derived from integer $(m, n)$ bins (or their continuous refinements), which are robust to global complex rotations and mild leakage; hence their perfect agreement with ground truth. In contrast, absolute complex amplitude and phase are sensitive to a per-frame complex gain (CPE), fractional-bin leakage under the chosen window, and multipath superposition. Our audit, therefore, reports amplitude/phase as secondary diagnostics with strict tolerances, while range/velocity remain the primary physics-anchored key performance indicators (KPIs).

Comparison and Recovery of Bins, Range, Velocity, Amplitude, and Phase: This module verifies that the receiver's DD estimates are consistent with GT and then recovers physically meaningful amplitude and phase per target. It operates in two stages: bin/range/velocity verification, followed by amplitude/phase recovery. Given a GT fractional coordinate $(m_{f}, n_{f})$ for a wave (or anomaly), we form the nearest integer OTFS indices $m_{gt} = \mathrm{round}(m_{f}) \bmod M$ and $n_{gt} = \mathrm{round}(n_{f}) \bmod N$. The receiver produces detected peaks $(\hat{m}, \hat{n})$ obtained solely from the delay–Doppler (DD) magnitude map after non-maximum suppression and CFAR thresholding. Agreement is scored by three checks:

$$\mathrm{bins\_exact} : (\hat{m} = m_{gt}) \land (\hat{n} = n_{gt}),$$

$$\mathrm{R\_exact} : |\hat{R} - R_{gt}| \leq ε_{R},$$

$$\mathrm{v\_exact} : |\hat{v} - v_{gt}| \leq ε_{v},$$

where $ε_{R}$ (e.g., in meters) and $ε_{v}$ (m/s) are application-level tolerances. For each detected wave, the record includes $\{m_{gt}, n_{gt}, \hat{m}, \hat{n}, R_{gt}, \hat{R}, v_{gt}, \hat{v}, H_{dd}, α\}$, with $H_{dd}$ the complex DD coefficient at $(\hat{m}, \hat{n})$ and $α$ the GT scatter coefficient when provided. Per frame $f$, we select a valid tap set $K_{f} = \{k : |α_{k}| \geq α_{min}, |H_{k}| \geq H_{min}\}$, using user thresholds $α_{min}$ and $H_{min}$. The key step is a physics-aware complex-gain calibration. Optionally, after inference, an analysis-only calibration can estimate a per-frame complex gain $\hat{g}_{f}$ using Equation (5) to study residual front-end bias.

$$\hat{g}_{f} = \frac{\sum_{k \in K_{f}} α_{k}^{*} H_{k}}{\sum_{k \in K_{f}} |α_{k}|^{2}} . \tag{5}$$

Equation (5) captures residual front-end scaling/CPE common to the frame. After calibration, each tap forms a dimensionless per-tap residual

$$r_{k} = \frac{H_{k}}{\hat{g}_{f} α_{k}}, \tag{6}$$

whose magnitude and phase, $|r_{k}|$ and $∠r_{k}$, quantify amplitude ratio and phase error, respectively (ideal target: $|r_{k}| = 1$, $∠r_{k} = 0$). To mitigate systematic trends across the DD grid, the residual phase can be approximated by a plane $∠r_{k} \approx ϕ_{0} + a m_{k} + b n_{k}$ (-phase-plane $\in \{$none, ls, best3$\}$), and removed to yield $\tilde{r}_{k} = r_{k} \exp\{- j (ϕ_{0} + a m_{k} + b n_{k})\}$. Similarly, residual amplitude can be log-linear detrended, $\log |\tilde{r}_{k}| \approx c_{0} + c_{m} m_{k} + c_{n} n_{k}$ (-amp-plane $\in \{$none, ls, best3$\}$), producing the final, detrended $\hat{r}_{k}$. These steps separate global bias (e.g., windowing, slight CFO/TO) from tap-level fidelity.
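The calibration of Equation (5), the residual of Equation (6), and a least-squares phase-plane removal can be sketched as below. This is a hedged illustration, not the authors' code: the function names are hypothetical, only a plain least-squares plane fit is shown (how the `best3` option works is not specified in the text), and the fit assumes residual phases small enough that no phase unwrapping is needed.

```python
import numpy as np

def calibrate_gain(alpha, H):
    """Least-squares per-frame complex gain of Eq. (5): sum(conj(a) H) / sum(|a|^2)."""
    return np.vdot(alpha, H) / np.vdot(alpha, alpha).real

def residuals(alpha, H, g):
    """Dimensionless per-tap residual of Eq. (6)."""
    return H / (g * alpha)

def remove_phase_plane(r, m_idx, n_idx):
    """Fit angle(r) ~ phi0 + a*m + b*n by least squares and de-rotate.
    Assumes wrapped phases stay well inside (-pi, pi), i.e. no unwrapping."""
    A = np.column_stack([np.ones(len(r)), m_idx, n_idx]).astype(float)
    coef, *_ = np.linalg.lstsq(A, np.angle(r), rcond=None)
    return r * np.exp(-1j * (A @ coef)), coef

# Synthetic taps (assumed values) with a known gain and a known phase plane.
rng = np.random.default_rng(2)
m_idx = np.array([0, 10, 20, 30, 40, 50])
n_idx = np.array([5, 0, 45, 12, 33, 8])
alpha = rng.standard_normal(6) + 1j * rng.standard_normal(6)
g_true = 0.8 * np.exp(1j * 0.3)

g_hat = calibrate_gain(alpha, g_true * alpha)
r = residuals(alpha, g_true * alpha, g_hat)
r_plane = np.exp(1j * (0.1 + 0.01 * m_idx - 0.005 * n_idx))
r_tilde, coef = remove_phase_plane(r_plane, m_idx, n_idx)
print(np.isclose(g_hat, g_true), np.allclose(r, 1.0), np.round(coef, 4))
```

The same design matrix can be reused for the log-amplitude plane by regressing `np.log(np.abs(r_tilde))` instead of the phase.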

Per tap, we check amplitude and phase against thresholds $A_{thr}$ and $Φ_{thr}$:

$$\mathrm{amp\_ok}_{k} : \big| |\hat{r}_{k}| - 1 \big| < A_{thr},$$

$$\mathrm{phase\_ok}_{k} : |∠\hat{r}_{k}| < Φ_{thr},$$

$$\mathrm{both\_ok}_{k} : \mathrm{amp\_ok}_{k} \land \mathrm{phase\_ok}_{k} .$$

A frame $f$ passes if at least -frame-pass-k taps satisfy both_ok. Robustness can be improved with an optional drop-worst refit (-drop-worst $> 0$): remove the single most deviant tap by the max of normalized amp/phase errors, recompute the planes, and re-test (the dropped tap can be excluded from acceptance if configured). For angular dispersion, we compute the circular statistics of $θ_{k} = ∠\hat{r}_{k}$: $C = \frac{1}{|K_{f}|} \sum_{k} \cos θ_{k}$, $S = \frac{1}{|K_{f}|} \sum_{k} \sin θ_{k}$, the mean resultant length $R = \sqrt{C^{2} + S^{2}}$ (not to be confused with range), and the circular standard deviation circ_std_deg $= \sqrt{- 2 \ln R} \cdot (180 / π)$. We also report robust percentiles $P_{95}(\big| |\hat{r}_{k}| - 1 \big|)$ and $P_{95}(|θ_{k}|)$ to summarize tail behavior at a glance.
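The per-tap acceptance tests and the circular statistics above can be sketched as follows. The function names and threshold values are assumptions chosen for illustration; the mean resultant length is clipped at 1 to guard against floating-point rounding before the logarithm.

```python
import numpy as np

def tap_checks(r_hat, a_thr, phi_thr_rad):
    """Return boolean arrays (amp_ok, phase_ok, both_ok) for detrended residuals."""
    amp_ok = np.abs(np.abs(r_hat) - 1.0) < a_thr
    phase_ok = np.abs(np.angle(r_hat)) < phi_thr_rad
    return amp_ok, phase_ok, amp_ok & phase_ok

def circ_std_deg(theta):
    """Circular standard deviation sqrt(-2 ln R) in degrees.
    Returns inf when the mean resultant length R is exactly zero."""
    C = np.mean(np.cos(theta))
    S = np.mean(np.sin(theta))
    R = min(np.hypot(C, S), 1.0)        # clip rounding overshoot above 1
    with np.errstate(divide="ignore"):
        return float(np.degrees(np.sqrt(-2.0 * np.log(R))))

# Four example taps: two good, one with an amplitude error, one with a phase error.
r_hat = np.array([1.0, 1.02 * np.exp(1j * 0.01), 0.5, np.exp(1j * 1.0)])
amp_ok, phase_ok, both_ok = tap_checks(r_hat, a_thr=0.05, phi_thr_rad=0.1)
frame_pass_k = 2                         # hypothetical value of -frame-pass-k
print(both_ok, both_ok.sum() >= frame_pass_k)
print(np.percentile(np.abs(np.abs(r_hat) - 1.0), 95))
print(circ_std_deg(np.array([0.01, -0.01])))
```

For tightly clustered phases the circular standard deviation is close to the ordinary standard deviation in degrees, which is why it is a convenient single-number summary.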

The pipeline combines (i) oracle-aligned bin verification (strict DD index and physical tolerance checks with $ε_{R}, ε_{v}$) and (ii) physics-aware LS calibration (Equation (5)) followed by plane detrending and robust acceptance using $(A_{thr}, Φ_{thr})$. This yields millisecond-class diagnostics that separate global front-end bias from true per-target fidelity, producing stable, interpretable indicators across frames and SNRs. All thresholds $(α_{min}, H_{min}, A_{thr}, Φ_{thr})$ and controls (-phase-plane, -amp-plane, -frame-pass-k, -drop-worst) are exposed so practitioners can trade strictness for yield depending on deployment constraints. Oracle association is a simulation-time labeler that aligns detected peaks with known targets for (i) lattice/units verification and (ii) label generation. In deployment, no oracle exists: we run CFAR+NMS, crop RMS/phase-normalized complex patches with wrap-around indexing, and infer $(m, n)$ using the trained model. When real data lacks GT, we fine-tune with self-supervision: track-gated pseudo-labels from Kalman Filter/Interacting Multiple Model (KF/IMM) trackers, optionally with Joint Probabilistic Data Association or Multiple Hypothesis Tracking (JPDA/MHT); confidence thresholds; and temporal-consistency checks, plus domain adaptation (phase rotations, SNR, and fractional-bin jitter) and few-shot real-data tuning when available.

4. AI Implementation

4.1. Patch and Label Preparation

The goal of this stage is to transform complex delay–Doppler (DD) radar data into compact, learning-ready patches that preserve the key physical properties of each reflection while being normalized for neural-network training. The input consists of the complex stack $Y_{DD} \in C^{F \times M \times N}$, representing $F$ frames with $M$ delay bins and $N$ Doppler bins, together with oracle or estimated coordinates $(\mathrm{fid}, m, n)$ that identify each target. The physical constants include the carrier frequency $f_{c}$, the Doppler spacing $Δf_{D}$, and the OTFS symbol duration $T_{sym}$. Hyperparameters such as the patch size $P$ (set by -patch), the search radius $r$ (set by -peak-radius), and the minimum amplitude threshold $A_{min}$ (set by -min-peak-mag) control the spatial resolution and quality of the extracted data.

The delay–Doppler domain captures the scattering behavior of each surface component or anomaly through localized complex peaks. However, these peaks can shift slightly due to noise, fractional Doppler, or windowing effects. To train a neural model that can learn robust physical features, each instance must be spatially aligned, phase-referenced, and amplitude-normalized. The proposed patch-based method achieves this by (i) finding the true energy peak near the expected location, (ii) extracting a fixed neighborhood around it, (iii) normalizing the magnitude and phase to eliminate global biases, and (iv) providing both physical (range, velocity) and complex (amplitude, phase) targets as labels. This combination allows the AI to learn mappings that are invariant to scale and global phase yet sensitive to local structure. For every candidate coordinate $(m_{c}, n_{c})$ from ground truth or receiver output, the algorithm searches within a square window $W = [m_{c} - r : m_{c} + r]_{M} \times [n_{c} - r : n_{c} + r]_{N}$, wrapped modulo $(M, N)$ to handle boundary effects. The local maximum of the magnitude $|Y_{f_{t}}[m, n]|$ inside $W$ defines the refined peak position:

$$(m^{⋆}, n^{⋆}) = \arg\max_{(m, n) \in W} |Y_{f_{t}}[m, n]|, \qquad A^{⋆} = |Y_{f_{t}}[m^{⋆}, n^{⋆}]| .$$

A sample is kept only if $A^{⋆} \geq A_{min}$; this ensures that each patch represents a physically meaningful reflection rather than low-energy clutter. Around the detected peak $(m^{⋆}, n^{⋆})$, a complex patch $\mathbf{P} \in C^{P \times P}$ is extracted by symmetric padding with wrap-around indexing. Because radar returns vary in absolute power and carrier phase, each patch is normalized both in magnitude and reference phase. The root-mean-square (RMS) amplitude and the mean phasor are calculated as

$$\mathrm{RMS} = \sqrt{\frac{1}{P^{2}} \sum_{i, j} |\mathbf{P}[i, j]|^{2} + ε}, \qquad ϕ_{ref} = \arg\left(\frac{1}{P^{2}} \sum_{i, j} \mathbf{P}[i, j]\right) . \tag{7}$$

The normalized real-valued tensor for learning is then formed from in-phase and quadrature channels with unit energy and zero global phase:

$$X = \frac{1}{\mathrm{RMS}} \begin{bmatrix} ℜ\{\mathbf{P} e^{- j ϕ_{ref}}\} \\ ℑ\{\mathbf{P} e^{- j ϕ_{ref}}\} \end{bmatrix} \in R^{2 \times P \times P} . \tag{8}$$

This normalization makes patches comparable across frames and sea states.

At the central bin $(m^{⋆}, n^{⋆})$, the DD coefficient $h_{0} = Y_{f_{t}}[m^{⋆}, n^{⋆}]$ represents the local reflection amplitude and phase. Its normalized logarithmic amplitude and phase difference relative to $ϕ_{ref}$ form the regression targets:

$$\mathrm{log\_amp} = \log\left(\max\left\{\frac{|h_{0}|}{\mathrm{RMS}}, 10^{-12}\right\}\right), \qquad Δϕ = \mathrm{wrap}_{(-π, π]}(\arg h_{0} - ϕ_{ref}),$$

with the phase represented as $(\mathrm{sinp}, \mathrm{cosp}) = (\sin Δϕ, \cos Δϕ)$ to avoid discontinuities at $\pm π$. This dual representation provides stable supervision for continuous learning. To associate each patch with measurable quantities, the physical parameters are derived using the radar propagation constants. Let the propagation speed be $c = 2.9979 \times 10^{8}$ m/s, the wavelength $λ = c / f_{c}$, and the delay resolution $Δτ = T_{sym} / M$. Then the Doppler frequency, radial velocity, and range corresponding to the detected peak are

$$f_{D} = n_{cent} Δf_{D}, \qquad v_{r} = \frac{λ}{2} f_{D}, \qquad R = \frac{c}{2} (m^{⋆} Δτ), \tag{9}$$

where $n_{cent}$ is the signed Doppler index in $[- N / 2, N / 2)$. Hence, each sample contains both signal-domain features ($X$) and physics-domain labels $(v_{r}, R)$. This patch-based extraction integrates radar physics with deep learning in a self-consistent way. By centering on the physical energy peak and normalizing for RMS power and reference phase, the network learns spatial–spectral patterns that correspond to actual scattering behavior rather than arbitrary phase offsets. The approach maintains the original OTFS lattice structure, ensuring compatibility with sensing and communication stages. Furthermore, using compact $P \times P$ neighborhoods (e.g., $P = 11$) drastically reduces dimensionality while preserving fine structure around each target. The search radius $r$ allows tolerance to fractional Doppler shifts, and the threshold $A_{min}$ filters out noise. This method provides an interpretable bridge between raw complex radar signals and the feature representations required by convolutional or transformer networks. The computational cost per patch is $O((2 r + 1)^{2} + P^{2})$, and for $N_{s}$ total samples, $O(N_{s} (P^{2} + r^{2}))$. In practice, the pipeline can extract tens of thousands of patches in seconds on a GPU, enabling large-scale supervised learning while retaining full physical traceability. At deployment, the estimator consumes only detection-centered patches from the received DD grid; no ground truth is available or used. The learned stage is lightweight and supports millisecond-class inference. When real data lacks GT, pseudo-label fine-tuning proceeds in four steps: (1) detect peaks via CFAR+NMS; (2) gate detections by tracker predictions (Kalman/IMM) with Mahalanobis and innovation thresholds, requiring stability over $K$ frames; (3) accept only high-confidence detections (SNR $\geq γ$, temporal consistency, small residuals) as pseudo-labels; (4) fine-tune with sample weights proportional to confidence, stopping early if a held-out weak set degrades.
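The peak refinement, wrap-around patch extraction, and the normalization of Equations (7) and (8) described in this subsection can be sketched in NumPy as follows. The function names and the toy grid are assumptions made for illustration; the code mirrors the stated formulas and is not the authors' implementation.

```python
import numpy as np

def refine_peak(Y, m_c, n_c, r):
    """Largest-magnitude bin inside the (2r+1) x (2r+1) window, wrapped modulo (M, N)."""
    M, N = Y.shape
    rows = np.arange(m_c - r, m_c + r + 1) % M
    cols = np.arange(n_c - r, n_c + r + 1) % N
    window = np.abs(Y[np.ix_(rows, cols)])
    i, j = np.unravel_index(np.argmax(window), window.shape)
    return int(rows[i]), int(cols[j]), float(window[i, j])

def extract_patch(Y, m_star, n_star, P, eps=1e-12):
    """P x P patch (P odd) around the peak, normalized per Eqs. (7)-(8),
    plus the targets log_amp, sin(dphi), cos(dphi)."""
    M, N = Y.shape
    half = P // 2
    rows = np.arange(m_star - half, m_star - half + P) % M
    cols = np.arange(n_star - half, n_star - half + P) % N
    patch = Y[np.ix_(rows, cols)]
    rms = np.sqrt(np.mean(np.abs(patch) ** 2) + eps)
    phi_ref = np.angle(np.mean(patch))
    rotated = patch * np.exp(-1j * phi_ref)
    X = np.stack([rotated.real, rotated.imag]) / rms      # shape (2, P, P)
    h0 = Y[m_star, n_star]
    log_amp = float(np.log(max(np.abs(h0) / rms, 1e-12)))
    dphi = np.angle(h0 * np.exp(-1j * phi_ref))           # wrapped to (-pi, pi]
    return X, log_amp, float(np.sin(dphi)), float(np.cos(dphi))

# Toy 8 x 8 grid with one reflection placed at the lattice edge (assumed values).
Y = np.zeros((8, 8), dtype=complex)
Y[0, 7] = 3.0 * np.exp(1j * 1.0)
A_min = 0.5                                               # hypothetical -min-peak-mag
m_star, n_star, A_star = refine_peak(Y, m_c=7, n_c=0, r=1)
if A_star >= A_min:
    X, log_amp, sinp, cosp = extract_patch(Y, m_star, n_star, P=3)
    print((m_star, n_star), X.shape, round(log_amp, 4), round(cosp, 4))
```

The example places the candidate coordinate on the opposite side of the lattice edge from the true peak to show that modulo indexing recovers it without special boundary handling.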

4.2. Dataset Generation and Scene Configuration

A physically grounded marine ISAC dataset was generated by integrating a JONSWAP-Gerstner ocean surface synthesizer with an OTFS radar front end. Each simulation frame represents a $1\ \mathrm{km} \times 1\ \mathrm{km}$ ocean patch containing $N_{wave} = 10$ stochastic wave components and $N_{anom} = 4$ labeled anomalies. The anomalies correspond to three representative classes: rogue peaks, whitecap breakers, and floating debris. All random draws share a fixed seed for reproducibility, guaranteeing identical environmental conditions across radar–AI experiments. Each frame's sea state follows JONSWAP-based stochastic distributions for the principal parameters: the significant wave height $H_{s}$ ranges from 0.5 to 3.0 m, the spectral peak period $T_{p}$ varies between 4 and 12 s, the peakedness factor $γ$ lies within 1–7, and the wind speed at 10 m altitude $U_{10}$ spans 5–15 m/s. These values represent realistic marine conditions ranging from calm seas to moderately rough states. Individual wave periods are perturbed around $T_{p}$ using a log-normal spread to emulate natural irregularity, and each component's energy contribution is shaped by the JONSWAP spectral envelope. The spatial positions $(x_{i}, y_{i})$ of the waves are uniformly distributed within $\pm 500$ m around the radar, while the radial distance $r_{i} = \sqrt{x_{i}^{2} + y_{i}^{2}}$ and angle $θ_{i}$ determine the propagation geometry.
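A seeded scene sampler in the spirit of this configuration might look like the sketch below. Several details are assumptions and are not taken from the paper: the sea-state parameters are drawn uniformly within the stated ranges (the text only says the distributions are JONSWAP-based), the log-normal spread uses an arbitrary $σ = 0.1$, and the function and key names are hypothetical.

```python
import numpy as np

def sample_scene(rng, n_wave=10, half_extent_m=500.0, period_sigma=0.1):
    """Draw one frame's sea state and wave geometry.
    Uniform parameter draws and period_sigma are illustrative assumptions."""
    sea_state = {
        "Hs_m": rng.uniform(0.5, 3.0),      # significant wave height
        "Tp_s": rng.uniform(4.0, 12.0),     # spectral peak period
        "gamma": rng.uniform(1.0, 7.0),     # JONSWAP peakedness factor
        "U10_mps": rng.uniform(5.0, 15.0),  # wind speed at 10 m altitude
    }
    xy = rng.uniform(-half_extent_m, half_extent_m, size=(n_wave, 2))
    r = np.hypot(xy[:, 0], xy[:, 1])                      # radial distance
    theta = np.arctan2(xy[:, 1], xy[:, 0])                # bearing angle
    periods = sea_state["Tp_s"] * rng.lognormal(0.0, period_sigma, size=n_wave)
    return sea_state, xy, r, theta, periods

# A fixed seed makes the draw repeatable across experiments.
scene_a = sample_scene(np.random.default_rng(42))
scene_b = sample_scene(np.random.default_rng(42))
print(scene_a[0] == scene_b[0], np.array_equal(scene_a[1], scene_b[1]))
```

Passing the generator in explicitly, rather than seeding a global state, keeps each frame's draw reproducible even when frames are generated in a different order.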

4.3. System Architecture

The monostatic radar operates in the X-band at a carrier frequency of

f_{c} = 9.4

GHz, corresponding to a wavelength of

λ_{r} = c / f_{c} \approx 0.032

m. The total system bandwidth and sampling rate are both

B = F_{s} = 200

MHz, enabling fine delay resolution. The OTFS lattice is defined as

Z_{M} \times Z_{N}

with

M = 1024

delay bins and

N = 4096

Doppler/time bins. These parameters yield a subcarrier spacing of

Δ f = B / M = 195.3125

kHz and an OTFS symbol duration

T_{sym} = 1 / Δ f = 5.12 μ s

. The total frame duration is

T_{frame} = N \times T_{sym} = 0.02097

s, producing a Doppler spacing of

Δ ν = 1 / T_{frame} = 47.68

Hz. The resulting delay resolution is

Δ τ = 1 / F_{s} = 5

ns (equivalent to 0.75 m in range), and the velocity resolution is

Δ v = (λ_{r} / 2) Δ ν \approx 0.76

m/s. These values establish a high-fidelity sensing grid well-suited for maritime scenarios. All simulations use this lattice and these parameters. The full architecture is shown in Figure 6, and Table 1 summarizes the principal sea-state parameters for multiple frames.
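The lattice quantities quoted in this subsection follow directly from the carrier frequency, bandwidth, and grid size. The short sketch below recomputes them as a consistency check; the function name `lattice_parameters` and the dictionary keys are illustrative choices, not part of the original pipeline.

```python
def lattice_parameters(fc_hz=9.4e9, bandwidth_hz=200e6, M=1024, N=4096, c=3.0e8):
    """Recompute the OTFS lattice quantities from the stated system settings."""
    wavelength = c / fc_hz              # lambda_r = c / f_c
    delta_f = bandwidth_hz / M          # subcarrier spacing, B / M
    t_sym = 1.0 / delta_f               # OTFS symbol duration
    t_frame = N * t_sym                 # total frame duration
    delta_nu = 1.0 / t_frame            # Doppler spacing
    delta_tau = 1.0 / bandwidth_hz      # delay resolution, 1 / F_s with F_s = B
    return {
        "wavelength_m": wavelength,
        "delta_f_hz": delta_f,
        "t_sym_s": t_sym,
        "t_frame_s": t_frame,
        "delta_nu_hz": delta_nu,
        "delta_tau_s": delta_tau,
        "range_res_m": c * delta_tau / 2.0,          # two-way propagation
        "vel_res_mps": (wavelength / 2.0) * delta_nu,
    }

params = lattice_parameters()
for name, value in params.items():
    print(f"{name:>14s} = {value:.6g}")
```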

The range of

H_{s}

from 0.8–2.6 m reflects variation in overall wave energy, while

T_{p}

decreases from 11.1 s to 4.6 s, marking the transition from long-period swells to short, choppy waves. The spectral shape factor

γ

varies from 2.0 to 6.6, indicating changes in spectral peakedness, and

U_{10}

spans 6–11 m/s, representing moderate wind conditions. Each frame consistently includes 10 primary waves and 4 anomalies, ensuring balanced scene complexity. The generator can be extended to an arbitrary number of frames and wave components per frame, which we leave for future work. Together with Figure 7 and Figure 8, these results indicate that the dataset encompasses a broad spectrum of marine environments suitable for validating OTFS-based sensing and learning frameworks.

Each ocean component, positioned at range

r_{i}

and moving with radial velocity

v_{r, i}

, contributes a propagation delay

τ_{i}

and Doppler shift

f_{d, i}

defined as:

τ_{i} = \frac{2 r_{i}}{c}, f_{d, i} = \frac{2 v_{r, i}}{λ_{r}}, | α_{i} | \propto \frac{\sqrt{{RCS}_{i}}}{r_{i}^{2}},

(10)

where

c = 3 \times 10^{8}

m/s is the speed of light. The pair

(τ_{i}, f_{d, i})

determines the delay and Doppler coordinates of each reflection, which are then discretized into fractional OTFS bins

(m_{i}^{(f)}, n_{i}^{(f)})

. This mapping connects the physical ocean geometry directly to the radar’s delay–Doppler representation, forming the ground truth for both classical and AI-based sensing. Each anomaly produces a single complex reflection coefficient

α_{i}

that depends on its radar cross-section (RCS) and distance. The received amplitude decreases quadratically with range, while the RCS controls the overall signal strength. The magnitude of each tap follows the proportionality in Equation (10), and the phase

∠ α_{i}

is uniformly distributed in

[- π, π)

. This relationship ensures realistic amplitude attenuation and random phase diversity, consistent with the physical radar scattering model.
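As a worked illustration of Equation (10), the sketch below maps one reflector to its delay, Doppler shift, fractional bin coordinates, and complex tap. It assumes that the fractional bins are obtained by dividing the delay and Doppler shift by the lattice resolutions of Section 4.3, and it sets the proportionality constant of the amplitude law to one; the function name `reflector_to_tap` is hypothetical. Wrapping of negative Doppler values onto the lattice is not handled here.

```python
import numpy as np

C = 3.0e8                              # speed of light [m/s]
WAVELENGTH = C / 9.4e9                 # lambda_r [m]
DELTA_TAU = 1.0 / 200e6                # delay resolution [s]
DELTA_NU = 200e6 / (1024 * 4096)       # Doppler spacing 1 / T_frame [Hz]

def reflector_to_tap(r_m, v_r_mps, rcs, rng):
    """Map range, radial velocity and RCS to (tau, f_d, m_frac, n_frac, alpha)."""
    tau = 2.0 * r_m / C                    # two-way propagation delay
    f_d = 2.0 * v_r_mps / WAVELENGTH       # Doppler shift
    m_frac = tau / DELTA_TAU               # fractional delay bin (assumed mapping)
    n_frac = f_d / DELTA_NU                # fractional Doppler bin (assumed mapping)
    magnitude = np.sqrt(rcs) / r_m ** 2    # |alpha| up to a constant, set to 1 here
    phase = rng.uniform(-np.pi, np.pi)     # uniform random phase
    alpha = magnitude * np.exp(1j * phase)
    return tau, f_d, m_frac, n_frac, alpha

rng = np.random.default_rng(0)
tau, f_d, m_frac, n_frac, alpha = reflector_to_tap(300.0, 3.0, 1.0, rng)
print(f"tau = {tau:.3e} s, f_d = {f_d:.2f} Hz")
print(f"fractional bins: m = {m_frac:.3f}, n = {n_frac:.3f}")
print(f"|alpha| = {abs(alpha):.3e}")
```

For a reflector at 300 m moving at 3 m/s, the delay lands on an integer bin while the Doppler coordinate is fractional, which is the kind of off-grid case the later leakage discussion refers to.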

Figure 7 illustrates a representative

1 km \times 1 km

sea surface generated under JONSWAP statistics with

H_{s}

m,

T_{p}

s,

γ

, and

U_{10}

m/s. The color scale encodes instantaneous surface elevation, highlighting the spatially varying energy distribution. The superimposed symbols denote the three anomaly types—rogue peaks (sharp crests), whitecaps (foam-topped waves), and floating debris (low-profile reflectors). These anomalies differ in both geometric height and effective RCS, resulting in distinctive delay–Doppler signatures. The figure confirms that the simulator reproduces realistic sea morphology and anomaly separability, forming a robust physical basis for subsequent OTFS sensing evaluation.

Figure 8 shows the spatial and statistical distribution of all simulated components across the dataset. Each point represents a wave or anomaly characterized by range

r_{i}

, azimuth

θ_{i}

, velocity

v_{r, i}

, and reflection strength

| α_{i} |

. The spread across both spatial and Doppler dimensions demonstrates broad coverage of marine dynamics, ensuring that the dataset spans calm to turbulent conditions. The anomalies occupy separable clusters in the delay–Doppler domain, validating that the generator preserves both spectral realism and class distinctness—critical properties for training AI-based radar sensing models.

4.4. Balancing and Oversampling

The learning set consists of complex, phase-normalized patches

x \in R^{N \times 2 \times P \times P}

(two real channels: in-phase and quadrature), paired with per-sample indices

(m, n) \in Z^{N}

and regression labels

(\mathrm{log\_amp}, sinp, cosp) \in R^{N}

. Optional aligned vectors include the frame ID fid, radial velocity

v_{r}

[m/s], range R [m], and per-patch

RMS

. Additional bookkeeping keys (e.g.,

\hat{m}, \hat{n}, wave_id

) are preserved and passed through unchanged. Delay–Doppler data are inherently imbalanced: slow targets outnumber fast ones, near-range returns dominate far-range, and large-RCS anomalies are rarer than background waves. Training on such skewed data leads to biased models that overfit frequent regimes and underperform on rare but operationally important cases. Our balancing pipeline fixes this by (i) stratifying the dataset along physically meaningful axes (velocity, delay, amplitude/phase), (ii) oversampling under-represented strata to a user-specified target size, and (iii) adding phase-equivariant augmentation that respects the radar physics. The result is a training distribution that is both diverse and faithful to operational edge cases. Samples

I = {0, \dots, N - 1}

are partitioned into quantile bins along three axes: velocity

v_{r}

, delay index m, and amplitude/phase. Let

B_{V}, B_{M}, B_{A}, B_{ϕ}

denote the number of bins for velocity, delay, log-amplitude, and phase, respectively. Bin edges are computed by empirical quantiles (with a uniform fallback for ties or missing values):

e_{i}^{(v)} = Quantile (v_{r}, i / B_{V}), e_{i}^{(m)} = Quantile (m, i / B_{M}),

(11)

e_{i}^{(a)} = Quantile (\mathrm{log\_amp}, i / B_{A}), {\tilde{e}}_{j}^{(ϕ)} = - π + j \frac{2 π}{B_{ϕ}} .

(12)

Here, the instantaneous phase is represented continuously as

(sinp, cosp)

, while

{\tilde{e}}^{(ϕ)}

provides a coarse partition for balancing. This stratification guarantees coverage across slow/fast, near/far, weak/strong, and different phase regimes. Given a total target count T per group (e.g., per axis) and a per-bin minimum

m_{0}

, we allocate samples to each nonempty bin b (with population

s_{b}

) either uniformly across nonempty bins or proportionally to their sizes. The policy is exposed via CLI flags, ensuring users can bias toward rare bins (uniform allocation) or preserve dataset shape (proportional). Sampling is with replacement inside bins to avoid exhausting rare strata. The union across axes (velocity V, delay R, and amplitude/phase O) forms the final index set

S = S_{V} \cup S_{R} \cup S_{O}

. Radar measurements are equivariant to global phase rotations. We exploit this by drawing K random phases

θ_{k, u} \sim U [- π, π]

for each selected index

k \in S

and rotating both the inputs and phase labels with the

2 \times 2

rotation matrix

R (θ) = [\begin{matrix} cos θ & - sin θ \\ sin θ & cos θ \end{matrix}] .

(13)

Applied to the two input channels and to the label pair

(cosp, sinp)

, this yields

[\begin{matrix} x_{ℜ}^{'} \\ x_{ℑ}^{'} \end{matrix}] = R (θ) [\begin{matrix} x_{ℜ} \\ x_{ℑ} \end{matrix}], [\begin{matrix} {cosp}^{'} \\ {sinp}^{'} \end{matrix}] = R (θ) [\begin{matrix} cosp \\ sinp \end{matrix}] .
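The rotation above can be sketched with NumPy as follows. The function name `rotate_phase` and the array layout (channel 0 real, channel 1 imaginary) are assumptions made for this illustration; the arithmetic is the 2 × 2 rotation of Equation (13) applied per sample.

```python
import numpy as np

def rotate_phase(x, cosp, sinp, theta):
    """Rotate patches x of shape (N, 2, P, P) and labels (cosp, sinp) by theta (N,)."""
    c, s = np.cos(theta), np.sin(theta)
    c3, s3 = c[:, None, None], s[:, None, None]   # broadcast over the P x P patch
    x_re, x_im = x[:, 0], x[:, 1]
    # R(theta) applied to the (real, imaginary) channel pair.
    x_rot = np.stack([c3 * x_re - s3 * x_im, s3 * x_re + c3 * x_im], axis=1)
    # The same rotation applied to the (cosp, sinp) label pair.
    cosp_rot = c * cosp - s * sinp
    sinp_rot = s * cosp + c * sinp
    return x_rot, cosp_rot, sinp_rot

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 2, 3, 3))
phi = rng.uniform(-np.pi, np.pi, size=4)     # original tap phases
theta = rng.uniform(-np.pi, np.pi, size=4)   # random global rotations
x_rot, cosp_rot, sinp_rot = rotate_phase(x, np.cos(phi), np.sin(phi), theta)
```

Rotating the channel pair is equivalent to multiplying the complex patch by exp(jθ), so per-pixel magnitudes are untouched and the label phase advances by θ.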

Crucially, amplitude (log_amp) and physics descriptors (

v_{r}, R

) are unchanged by global phase, preserving label correctness. This augmentation improves generalization without violating radar invariances. After augmentation, the raw size is

N_{raw} = (K + 1) | S |

. If a global cap is requested (-max-total

= N_{max}

) and

N_{raw} > N_{max}

, we uniformly subsample to

N_{max}

and apply a joint shuffle over all arrays to preserve alignment of inputs, labels, and metadata. This guarantees reproducible runs and bounded memory. Quantile ties fall back to uniform bin edges on

[min, max]

; the last bin is right-inclusive to avoid boundary loss. Sampling is with replacement within a bin to ensure adequate representation of rare strata. If

v_{r}

is unavailable, we skip velocity partitioning and balance over the remaining axes. All operations are stratified by split (train/val/test) to prevent leakage; the augmentation factor K is applied to train only. Binning is

O (N)

, targeted sampling is linear in the selected counts, and the phase rotations cost

O (K S P^{2})

for

S = | S |

patches of size

P \times P

. In practice, with moderate P (e.g.,

P = 11

) and

K \in [2, 8]

, the pipeline is fast and easily GPU-parallelizable.
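The stratification and oversampling steps described in this subsection can be sketched for a single axis as follows. This is an illustrative reading of the text rather than the authors' code: the function names `quantile_bins`, `allocate`, and `oversample` are hypothetical, and rounding quotas up with `ceil` is a choice made here. Repeating the procedure per axis and merging the selected indices would give the final index set.

```python
import numpy as np

def quantile_bins(values, n_bins):
    """Assign values to quantile bins, falling back to uniform edges on ties."""
    values = np.asarray(values, dtype=float)
    edges = np.quantile(values, np.linspace(0.0, 1.0, n_bins + 1))
    if np.any(np.diff(edges) <= 0):   # tied quantiles: uniform edges on [min, max]
        edges = np.linspace(values.min(), values.max(), n_bins + 1)
    # Searching only the interior edges keeps the maximum in the last bin,
    # which makes that bin right-inclusive.
    return np.searchsorted(edges[1:-1], values, side="right")

def allocate(bin_idx, total, min_per_bin, policy="uniform"):
    """Per-bin sample quotas over nonempty bins, uniform or proportional."""
    bins, sizes = np.unique(bin_idx, return_counts=True)
    if policy == "uniform":
        quota = np.full(len(bins), total / len(bins))
    else:
        quota = total * sizes / sizes.sum()
    counts = np.maximum(np.ceil(quota).astype(int), min_per_bin)
    return dict(zip(bins.tolist(), counts.tolist()))

def oversample(bin_idx, quotas, rng):
    """Draw indices with replacement inside each bin so rare strata are not exhausted."""
    picks = [rng.choice(np.flatnonzero(bin_idx == b), size=k, replace=True)
             for b, k in quotas.items()]
    return np.concatenate(picks)

rng = np.random.default_rng(0)
v_r = rng.normal(0.0, 2.0, size=1000)            # toy radial velocities [m/s]
bins = quantile_bins(v_r, n_bins=4)
quotas = allocate(bins, total=400, min_per_bin=10, policy="uniform")
selected = oversample(bins, quotas, rng)
print(quotas, selected.shape)
```

Uniform allocation gives every nonempty bin the same quota and therefore favors rare regimes, while the proportional policy keeps the original shape of the distribution subject to the per-bin minimum.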

4.5. Fast Random-Forest Baseline

This baseline establishes a fast, interpretable machine-learning benchmark for the parameter-recovery task. Each complex patch

x \in R^{N \times 2 \times P \times P}

(real–imaginary channels) is associated with a discrete label

y = m \in {0, \dots, M - 1}

, the true delay-bin index corresponding to its range cell. The objective is to predict m directly from the spatial–spectral texture of the patch. Every patch is flattened into a d-dimensional feature vector where

d = 2 P^{2}

. The full design matrix becomes

X = reshape (x, N \times d) .

(14)

Using a fixed random seed

σ

, a reproducible permutation

π_{σ}

divides the data into training and validation subsets with

n_{tr} = ⌊ ρ N ⌋

training samples and

n_{va} = N - n_{tr}

validation samples, where

ρ

is the train fraction. This ensures deterministic benchmarking across runs. A Random-Forest classifier is trained with

n_{est}

trees, maximum depth

d_{max}

, and minimum split size

n_{split}

, using the Gini-impurity criterion:

\hat{h} \leftarrow RF (n_{est}, d_{max}, n_{split}; seed = σ), \hat{h} . fit (X_{T}, y_{T}) .

(15)

Each tree

T_{j}

produces a class vote, and the final decision is obtained by majority rule:

{\hat{y}}_{i} = mode {T_{j} (X_{i})}_{j = 1}^{n_{est}}, i \in V .

This ensemble averaging suppresses overfitting and provides fast, non-parametric classification with complexity

O (n_{est} n_{split} n_{tr} log n_{tr})

during training and

O (n_{est} \cdot depth)

per-sample at inference. The RF model offers a physics-agnostic baseline that captures the statistical texture of amplitude–phase patterns in the delay–Doppler domain without relying on deep-network training or large GPUs. It is particularly useful for rapid diagnostics: by inspecting feature importance across flattened patch dimensions, one can identify which spatial–spectral regions contribute most to delay classification. Moreover, its deterministic random-seed initialization makes it ideal for reproducible benchmarking before introducing neural architectures. Performance is summarized using accuracy, macro-averaged precision, recall, and F1-score:

{Prec}_{macro} = \frac{1}{C} \sum_{c} \frac{{TP}_{c}}{{TP}_{c} + {FP}_{c}}, {Rec}_{macro} = \frac{1}{C} \sum_{c} \frac{{TP}_{c}}{{TP}_{c} + {FN}_{c}}, F 1_{macro} = \frac{1}{C} \sum_{c} \frac{2 {Prec}_{c} {Rec}_{c}}{{Prec}_{c} + {Rec}_{c}} .

Here C is the number of classes (delay bins). Macro averaging weights every class equally, so these metrics give a balanced evaluation across all range categories, including rare far-range bins.

We adopt a Random-Forest classifier on small, peak-centered, phase-normalized DD patches. This choice reflects our study constraints: moderate data scale, CPU-class millisecond latency, and the need for deterministic, reproducible behavior. Because upstream steps (oracle alignment, RMS scaling, reference-phase removal) already encode the core physics, RF provides a low-variance, fast, and stable decision rule without large training overhead or accelerator dependency. Higher-capacity deep models are not pursued here, as they would require expanded data, heavier tuning, and a distinct deployment pathway that is outside the present scope. Figure 9, Figure 10 and Figure 11 collectively evaluate the RF baseline trained on the balanced and phase-augmented dataset.
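The baseline described in this subsection can be sketched with scikit-learn as follows. This is a minimal illustration on synthetic toy patches, not the authors' training script: the function name `rf_baseline`, the default hyperparameters, and the toy data are assumptions. Note also that scikit-learn's `RandomForestClassifier.predict` averages the trees' class-probability estimates, which can differ slightly from the strict majority vote written above.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def rf_baseline(x, y, train_frac=0.8, seed=0, n_est=50, max_depth=None, min_split=2):
    """Flatten patches, split with a seeded permutation, and fit a Random Forest."""
    n = x.shape[0]
    X = x.reshape(n, -1)                                # Eq. (14): N x d with d = 2 P^2
    perm = np.random.default_rng(seed).permutation(n)   # reproducible permutation
    n_tr = int(np.floor(train_frac * n))                # n_tr = floor(rho * N)
    tr_idx, va_idx = perm[:n_tr], perm[n_tr:]
    clf = RandomForestClassifier(
        n_estimators=n_est,
        max_depth=max_depth,
        min_samples_split=min_split,
        criterion="gini",
        random_state=seed,
    )
    clf.fit(X[tr_idx], y[tr_idx])
    return clf, va_idx, clf.predict(X[va_idx])

# Toy data: three hypothetical classes whose patch mean depends on the label.
rng = np.random.default_rng(1)
y = rng.integers(0, 3, size=400)
x = rng.normal(size=(400, 2, 5, 5)) + 2.0 * y[:, None, None, None]
clf, va_idx, y_hat = rf_baseline(x, y)
accuracy = np.mean(y_hat == y[va_idx])
print(f"validation accuracy on toy data: {accuracy:.3f}")
```

Fixing both the permutation seed and the forest's `random_state` is what makes repeated runs return identical predictions.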

Figure 9 summarizes the macro-averaged metrics—accuracy, precision, recall, and F1-score—computed across all C delay classes

m \in {0, \dots, M - 1}

, showing that the proposed quantile-bin balancing (velocity

B_{V}

, delay

B_{M}

, amplitude–phase

B_{A}, B_{ϕ}

) and targeted oversampling (T,

m_{0}

) significantly improve the recall and F1 of rare bins such as far-range or high-Doppler targets. The gains indicate that the model now generalizes more uniformly across the OTFS lattice, rather than favoring dense near-range regimes.
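For reference, the macro-averaged metrics reported in Figure 9 can be computed from predictions with a few lines of NumPy. The sketch follows the definitions of macro precision, recall, and F1 given in Section 4.5; the function name `macro_metrics` is illustrative, and scoring a class with an empty denominator as zero is a convention chosen here rather than one stated in the paper.

```python
import numpy as np

def macro_metrics(y_true, y_pred, n_classes):
    """Return accuracy and macro-averaged precision, recall, and F1."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    prec = np.zeros(n_classes)
    rec = np.zeros(n_classes)
    f1 = np.zeros(n_classes)
    for c in range(n_classes):
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        # Classes with an empty denominator score 0 (assumed convention).
        prec[c] = tp / (tp + fp) if tp + fp > 0 else 0.0
        rec[c] = tp / (tp + fn) if tp + fn > 0 else 0.0
        denom = prec[c] + rec[c]
        f1[c] = 2.0 * prec[c] * rec[c] / denom if denom > 0 else 0.0
    accuracy = float(np.mean(y_true == y_pred))
    return accuracy, prec.mean(), rec.mean(), f1.mean()

y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
acc, p_macro, r_macro, f1_macro = macro_metrics(y_true, y_pred, n_classes=3)
print(acc, p_macro, r_macro, f1_macro)
```

Because each class contributes equally to the average, a rare far-range bin with poor recall lowers the macro score as much as a densely populated near-range bin would.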

Figure 10 presents the multi-class Precision–Recall (PR) curves, where the macro Average Precision (AP) reaches

0.972

, confirming excellent ranking of positives above negatives even when the curves exhibit small oscillations due to finite validation size. The high AP demonstrates that the classifier maintains robust ordering of correct classes across thresholds, validating that normalization (RMS, reference phase) and balanced sampling reduce bias from SNR variations.

Finally, Figure 11 displays the corresponding multi-class ROC curves, where most traces lie near the top-left corner, demonstrating strong separability of delay bins. Compared with an unbalanced baseline, the ROC curves shift upward after phase-equivariant augmentation (K random rotations), which enforces invariance to global carrier phase while exposing the model to diverse amplitude–phase orientations. Together, these figures confirm that the RF baseline trained on balanced and phase-augmented data achieves stable, physics-consistent performance, offering a reliable benchmark for subsequent deep-learning models.

5. Discussion

The results indicate that the proposed Marine OTFS–AI system’s quantitative performance aligns with practical sensing needs. The strong correlation between ground truth and the estimated range–velocity pairs confirms that the physical modeling is correct, while the AI refinement improves reliability under noise and multipath. Our evaluation includes quantitative metrics (RMSE, precision, recall, F1) and targeted figure interpretations. The AI module complements physics-based OTFS sensing by learning fine-grained amplitude–phase patterns in the DD grid. While the OTFS front end yields interpretable delay

τ

and Doppler

f_{D}

, slight inaccuracies from fractional bins or multipath can hinder direct analytical extraction. Trained on normalized complex patches

Y_{DD} \in C^{M \times N}

, the model captures associations between DD intensity/phase curvature and the physical parameters

(R, v)

. At inference, it consumes only received DD data—preventing any GT leakage—and maintains a pipeline suitable for dynamic marine environments.

Interpreting amplitude/phase vs. range/velocity: The gap between amplitude/phase agreement (89%) and range/velocity agreement (100%) has four root causes: (i) per-frame complex ambiguity (CPE/gain) leaves

(m, n)

intact but rotates/scales taps; (ii) fractional delay–Doppler yields 2-D leakage, so the central-sample complex value depends on the window and nearby clutter; (iii) multipath superposition alters the complex coefficient more than the bin index; (iv) noise and phase wrapping are non-linear in angle. Implication: This does not indicate a fundamental limitation of OTFS sensing or of our pipeline; it reflects the limited identifiability of absolute complex coefficients under small residuals. Mitigations (analysis/calibration only, no added runtime cost): (1) pilot- or LS-based frame-wise complex-gain/CPE fit; (2) 2-D fractional-bin interpolation (e.g., quadratic/Quinn/Jacobsen-style) to reduce leakage bias; (3) window selection sweep (Hann/Kaiser) for leakage vs. resolution; (4) neighbor-aware debiasing in dense scenes; (5) robust reporting using the 95th-percentile amplitude ratio and circular phase error with SNR-aware pass bands. These steps raise complex-tap fidelity while preserving the deployed detector → patch → inference path.
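Two of the mitigations listed above, quadratic fractional-bin interpolation and circular phase error, can be sketched as follows. This is an illustrative example only: it applies a three-point parabola fit independently along each axis, treats the lattice as cyclic at the edges, and uses hypothetical function names; Quinn- or Jacobsen-style estimators would replace `quadratic_peak_offset`.

```python
import numpy as np

def quadratic_peak_offset(y_m1, y_0, y_p1):
    """Sub-bin offset of the vertex of a parabola through three neighboring samples."""
    denom = y_m1 - 2.0 * y_0 + y_p1
    if denom == 0:
        return 0.0                       # flat neighborhood: no refinement
    return 0.5 * (y_m1 - y_p1) / denom

def refine_peak_2d(mag, m, n):
    """Refine an integer peak (m, n) on a magnitude grid, one axis at a time."""
    M, N = mag.shape
    dm = quadratic_peak_offset(mag[(m - 1) % M, n], mag[m, n], mag[(m + 1) % M, n])
    dn = quadratic_peak_offset(mag[m, (n - 1) % N], mag[m, n], mag[m, (n + 1) % N])
    return m + dm, n + dn

def circular_phase_error(phi_est, phi_ref):
    """Phase difference wrapped to the principal interval, in radians."""
    return np.angle(np.exp(1j * (np.asarray(phi_est) - np.asarray(phi_ref))))

# Toy surface with a separable quadratic peak between grid points.
ii, jj = np.meshgrid(np.arange(32), np.arange(32), indexing="ij")
mag = 10.0 - (ii - 10.3) ** 2 - (jj - 19.8) ** 2
m_hat, n_hat = refine_peak_2d(mag, 10, 20)
print(m_hat, n_hat)
print(circular_phase_error(np.pi - 0.1, -np.pi + 0.1))
```

The wrapped error treats phases just below π and just above −π as close neighbors, which a plain subtraction would report as nearly 2π apart.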

Conventional ISAC sensing relies on (i) 2D-FFT with CFAR thresholding, valued for hardware efficiency but sensitive to leakage, near–far masking, and non-uniform clutter; and (ii) subspace/super-resolution estimators (e.g., MUSIC/ESPRIT), which can surpass grid resolution yet demand higher compute and robust covariance modeling. Our formulation differs by operating natively in the OTFS DD grid and confining computation to localized peak-centered patches with lightweight classification, thereby avoiding costly full-grid searches and retaining millisecond-class inference. The resulting OTFS CFAR → patch → RF pipeline remains near-linear in lattice size, requires no task-specific hardware IP, and achieves perfect bin-level localization (hence exact range/velocity) on held-out synthetic scenes. We do not claim universal superiority over super-resolution methods in minimum-separation limits at very high SNR. Our objective is complementary: physics-anchored, millisecond-class parameter extraction with explicit hooks for fractional/leakage calibration and a fairness protocol (SNR sweeps, close-spacing resolution curves, latency and footprint tables with confidence intervals).

Several limitations remain. (i) Sim → real shift can reduce accuracy; augmentation, domain adaptation, and few-shot fine-tuning help but may not remove all shift. (ii) Weak-label noise from CFAR/track pseudo-labels is mitigated by kinematic gating, temporal agreement, and robust losses. (iii) Amplitude/phase are more sensitive than delay/Doppler to global phase/CPE and fractional DD; frame-wise complex-gain calibration and phase-plane detrending provide a means of auditing. (iv) Dense scenes with overlapping returns may require multi-frame association (JPDA/MHT) and/or higher pilot density. (v) Multi-frame logic is used during training/fine-tuning; runtime inference remains lightweight (ms-class). On constrained platforms, we preserve timing by (a) streaming rather than batch solves, (b) avoiding full-grid global optimizations, and (c) budgeting the number of peaks handed to the AI. If resources tighten, the system gracefully trades patch count for headroom without altering the detector's core behavior. The result is a robust CPU-only implementation suitable for boats and buoys that value reliability, low power, and ease of maintenance.
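The peak-budgeting idea mentioned above can be sketched as follows. This is a hypothetical illustration, not the deployed code: the function names `budget_peaks` and `extract_patch`, the cyclic wrap at the lattice edges, the use of the center sample as the phase reference, and the small constant guarding the RMS division are all assumptions made here.

```python
import numpy as np

def budget_peaks(dd, peaks, max_peaks):
    """Keep only the max_peaks strongest detections (rows of peaks are (m, n))."""
    mags = np.abs(dd[peaks[:, 0], peaks[:, 1]])
    order = np.argsort(-mags)[:max_peaks]
    return peaks[order]

def extract_patch(dd, m, n, P):
    """Cut a P x P patch around (m, n), remove the center phase, and RMS-normalize."""
    h = P // 2
    rows = np.arange(m - h, m + h + 1) % dd.shape[0]    # cyclic wrap (assumed)
    cols = np.arange(n - h, n + h + 1) % dd.shape[1]
    patch = dd[np.ix_(rows, cols)]
    patch = patch * np.exp(-1j * np.angle(patch[h, h]))  # reference-phase removal
    rms = np.sqrt(np.mean(np.abs(patch) ** 2))
    patch = patch / (rms + 1e-12)                         # RMS scaling
    return np.stack([patch.real, patch.imag])             # shape (2, P, P)

rng = np.random.default_rng(0)
dd = 0.01 * (rng.normal(size=(64, 64)) + 1j * rng.normal(size=(64, 64)))
peaks = np.array([[5, 7], [20, 63], [40, 10]])
dd[5, 7] = 3.0 * np.exp(1j * 0.4)
dd[20, 63] = 1.0 * np.exp(-1j * 2.0)
dd[40, 10] = 2.0
kept = budget_peaks(dd, peaks, max_peaks=2)
patches = np.stack([extract_patch(dd, m, n, P=5) for m, n in kept])
print(kept.tolist(), patches.shape)
```

Capping `max_peaks` bounds the per-frame inference cost, since the classifier only ever sees a fixed number of small patches regardless of how cluttered the full grid is.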

RF’s capacity ceiling mainly affects continuous amplitude/phase fidelity, which is more sensitive to SNR and fractional leakage than discrete bin recovery. As we scale to real-world scenes, lightweight deep heads (with pruning/quantization) can be investigated for amplitude/phase while preserving CPU-class latency; this is orthogonal to the present contribution, which focuses on deterministic oracle labeling and interpretable, low-variance inference. We will validate the system in X-band (9.4 GHz) with the same OTFS lattice (M = 1024, N = 4096, CP = 1024,

B = F_{s} = 200

MHz) to preserve one-to-one lattice mapping. Data will be collected across harbor, coastal, and offshore sites spanning Douglas sea states 1–5. Ground truth will combine wave buoys (Hs, Tp), fixed corner reflectors (range anchors), Automatic Identification System (AIS) vessel tracks, and Global Navigation Satellite System/Inertial Measurement Unit (GNSS/IMU) tags on a maneuvering workboat; shore cameras and Uncrewed Aerial Vehicles (UAVs) provide qualitative cross-checks. Runtime remains detector → patch → inference (ms-class). Optional fine-tuning uses weak/self-supervision: track-gated pseudo-labels (Kalman/IMM; optional JPDA/MHT), Mahalanobis/innovation gating, temporal stability over K frames, confidence weighting, and early stopping on a small vetted subset.

6. Conclusions

We presented a physics-anchored OTFS sensing pipeline with a lightweight learning head that operates on peak-centered delay–Doppler patches. Using oracle association only in simulation for unambiguous labels—and a GT-free detector → patch → inference path at deployment—the system achieves millisecond-class runtime in our experiments on a CPU-centric setup while preserving interpretability. On a balanced 10k corpus (80/20 split), the learned head reached 0.966 accuracy, 0.965 macro-F1, and 0.998 macro ROC–AUC. In held-out scenes, discrete delay/Doppler bins—and hence range and radial velocity—matched ground truth at 100%, with 89% agreement for amplitude/phase. These results confirm exact lattice mapping and reliable recovery of the primary sensing KPIs (range/velocity) at low computational cost. Limitations primarily affect continuous amplitude/phase due to residual CPE, fractional-bin leakage, and multipath. We treat these as secondary diagnostics and, when needed, apply analysis-only calibrations (frame-wise complex-gain fits, phase-plane detrending, fractional-bin interpolation) without altering the deployed runtime. Planned work focuses on field validation in X-band (9.4 GHz) with the same OTFS lattice across Douglas 1–5 sea states using multi-source ground truth, alongside weak/self-supervised fine-tuning (track-gated pseudo-labels with Kalman/IMM; optional JPDA/MHT). Overall, the OTFS CFAR → patch → RF design delivers physics-consistent, deployable sensing with exact bin recovery and predictable latency, providing a strong baseline for marine ISAC applications.

Author Contributions

Conceptualization, methodology, K.H. and J.Y.; software, formal analysis, K.H.; investigation, resources, J.Y.; data curation, K.H.; writing—original draft preparation, K.H.; writing—review and editing, K.H.; supervision, J.Y.; Analysis, J.Y.; project administration, J.Y.; funding acquisition, J.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the Korea Institute of Marine Science & Technology Promotion (KIMST) funded by the Ministry of Oceans and Fisheries, Korea (RS-2021-KS211502, Establishment of the Ocean Research Station in the Jurisdiction Zone and Convergence Research; RS-2022-KS221620, Development of Simulation Technology for Maritime Spatial Policy).

Data Availability Statement

All the data are available in the paper. The code and other related data will be provided on request.

Acknowledgments

The author acknowledges the supervision of Jeseon Yoo.

Conflicts of Interest

There is no conflict of interest.

Figure 1. Wave visualization of OTFS signal in TF (dB) Domain.

Figure 2. Wave visualization of OTFS signal in TF (degree).

Figure 3. Noisy Rx Signal in TF domain (dB).

Figure 4. Noisy Rx Signal in TF domain (degree).

Figure 5. Match rate of different parameters.

Figure 6. Full end-to-end view of sensing and AI.

Figure 7. Visualization of a generated marine scene.

Figure 8. Distribution of waves and anomalies in the dataset.

Figure 9. Macro-averaged performance of the RF baseline after balancing and augmentation.

Figure 10. Multi-class Precision–Recall (PR) curves of the RF baseline.

Figure 11. Multi-class ROC curves of the RF baseline showing consistent separability across range classes.

Table 1. Per-frame sea-state summary with wave/anomaly counts.

U_{10}

denotes wind speed at 10 m.

Frame	$H_{s}$ [m]	$T_{p}$ [s]	$γ$	$U_{10}$ [m/s]	Waves/Anoms
0	0.84	11.10	6.60	9.46	10/4
1	2.17	9.76	5.70	8.21	10/4
⋮	⋮	⋮	⋮	⋮	⋮
9	2.62	4.66	2.05	6.10	10/4

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
