1. Introduction
In contemporary Intelligent Transportation Systems (ITS), the exchange of real-time information is fundamental. In this context, Vehicle-to-Infrastructure (V2I) and Vehicle-to-Vehicle (V2V) communication schemes must be efficiently integrated. However, these links typically operate over Doubly Selective Channels (DSCs), which arise from multipath propagation, environmental scatterers, and the high mobility of vehicles, thereby posing significant challenges to the design and performance of communication systems [1, 2, 3, 4].
In V2V environments, vehicular motion induces significant Doppler spreads that lead to the loss of orthogonality among Orthogonal Frequency Division Multiplexing (OFDM) subcarriers, making Inter-Carrier Interference (ICI) particularly pronounced [5, 6, 7, 8]. As a result, the induced ICI degrades the effectiveness of OFDM-based V2V systems and hinders fundamental receiver tasks, such as channel estimation, data detection, and error correction, which are required to mitigate its effects. These tasks typically entail high computational complexity, exceeding
, thereby complicating their real-time implementation and operation. These limitations have motivated considerable research interest in developing approaches that reduce computational complexity while improving performance compared to conventional equalization and data detection schemes [9, 10, 11].
Linear detectors, including Least Squares (LS) and Linear Minimum Mean Square Error (LMMSE), have been extensively investigated [12, 13]. Although these schemes are attractive due to their simplicity and suitability for hardware implementation, their Bit-Error Rate (BER) performance is often limited compared to nonlinear methods. This performance degradation is mainly attributed to the fact that linear detectors perform symbol detection without exploiting the discrete structure of the signal constellation [4, 14], which leads to suboptimal decisions, particularly in V2V scenarios characterized by low signal-to-noise ratios. Despite these limitations, their low computational complexity and reduced processing requirements continue to make them appealing candidates for practical systems, thereby highlighting the need for improved schemes that preserve these advantages while enhancing performance.
Nonlinear detectors exploit the finite symbol alphabet to improve performance. Common examples include Ordered Successive Interference Cancellation (OSIC) [15], Decision Feedback Equalization (DFE) [16, 17], and Maximum Likelihood (ML) detection [14], which typically achieve lower BER than LMMSE; however, both linear and nonlinear methods experience significant performance degradation in vehicular channels due to ICI. Although nonlinear detectors provide strong performance, this is often achieved at the expense of high computational complexity, typically exceeding
, which hinders real-time implementation. To alleviate this complexity, suboptimal ML variants restrict the search to a lower-dimensional latent space or perform candidate sequence pruning.
Neural Networks (NNs) constitute a natural bridge between linear and nonlinear approaches by offering a favorable trade-off between performance and computational complexity [18, 19, 20]. On the one hand, nonlinear activations enable the modeling of complex decision boundaries, leading to significant performance gains; on the other hand, the underlying operations largely reduce to matrix-vector products, which are highly amenable to hardware acceleration and efficient real-time execution. Within this framework, NNs are built from two fundamental blocks: fully connected layers and convolutional layers. Convolutional layers are especially attractive because they reuse the same weights and focus on local regions, allowing for faster inference with a lower computational cost. In the context of equalizer design, this efficiency translates into MobileNetV3-based architectures that achieve low latency, good scalability, and strong performance, outperforming linear NNs. Moreover, MobileNetV3 incorporates lightweight activation functions such as hard-swish and hard-sigmoid, along with depthwise-separable convolutions, which drastically reduce the number of parameters and operations compared to standard convolutions [21], further reinforcing its suitability for real-time vehicular applications.
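As a rough illustration of why these building blocks are cheap, the sketch below implements hard-sigmoid and hard-swish in their usual piecewise-linear form, ReLU6(x + 3)/6 and x · ReLU6(x + 3)/6, and compares the weight count of a standard convolution with that of a depthwise-separable one. The layer sizes in the last line (3 × 3 kernel, 16 input and 32 output channels) are illustrative assumptions, not values taken from this work.

```python
def relu6(x: float) -> float:
    """Clamp x to the interval [0, 6]."""
    return min(max(x, 0.0), 6.0)


def hard_sigmoid(x: float) -> float:
    """Piecewise-linear approximation of the sigmoid: ReLU6(x + 3) / 6."""
    return relu6(x + 3.0) / 6.0


def hard_swish(x: float) -> float:
    """Piecewise-linear approximation of swish: x * hard_sigmoid(x)."""
    return x * hard_sigmoid(x)


def standard_conv_params(k: int, c_in: int, c_out: int) -> int:
    """Weights of a k x k standard convolution (bias ignored)."""
    return k * k * c_in * c_out


def separable_conv_params(k: int, c_in: int, c_out: int) -> int:
    """Weights of a k x k depthwise conv followed by a 1 x 1 pointwise conv."""
    return k * k * c_in + c_in * c_out


# Assumed example sizes: the ratio equals 1/c_out + 1/k^2.
ratio = separable_conv_params(3, 16, 32) / standard_conv_params(3, 16, 32)
```

For these assumed sizes the separable layer needs about 14% of the weights of the standard one, and both activations use only comparisons, additions, and one scaling.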
Neural networks have recently been integrated into OFDM receiver stages in an effort to increase detection accuracy while maintaining practical deployment viability. Most NN-based methods for rapidly time-varying multipath channels focus primarily on channel estimation [22, 23, 24]. While others, such as ComNet, explicitly combine estimation and equalization [25], none of these cutting-edge techniques takes advantage of the channel matrix to perform a dedicated preprocessing step prior to the network input.
Contribution and Article Structure
This work addresses the problem of low-complexity symbol detection in OFDM-based V2V communication systems operating over doubly dispersive channels, where inter-carrier interference severely degrades performance. The main contributions of this paper are summarized as follows:
MobileNetV3-based neural equalization for V2V links: A lightweight convolutional neural network based on MobileNetV3 is proposed for M-PSK detection in OFDM systems affected by ICI, adapting an architecture originally designed for mobile devices to the vehicular communications context.
Statistics-informed ICI-aware preprocessing: A channel-prediction-based preprocessing stage is introduced to mitigate ICI prior to neural equalization, relieving the network from directly suppressing interference and allowing it to focus on detection.
Signal-model-guided phase distortion compensation: The signal model proposed during the preprocessing stage compensates for Doppler-induced phase distortions, thereby enabling MobileNetV3 to perform detection by efficiently exploiting phase–data correlations.
Favorable performance–complexity trade-off: The proposed receiver achieves a lower bit-error rate than conventional linear, nonlinear, and recent neural-based approaches, while maintaining computational complexity close to linear least-squares detection.
Overall, the proposed framework provides an efficient and scalable detection solution for ICI-impaired V2V OFDM systems, bridging the gap between the high performance of nonlinear methods and the low complexity of conventional linear receivers.
The remainder of this paper is organized as follows. Section 2 presents the system model of the proposed OFDM framework. Section 3 describes the proposed receiver architecture based on MobileNetV3 for OFDM signal detection. Section 4 analyzes the computational complexity and presents the performance results of the proposed receiver. Additionally, Section 5 provides a discussion of the experimental results. Finally, the conclusions of this work are drawn in Section 6.
2. System Model
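The model includes the OFDM modulation of the 802.11p standard. In this context, the signal $y[n]$ at the receiver side in complex baseband, following synchronization and cyclic prefix (CP) removal, is denoted by:

$$y[n] = \sum_{l=0}^{L-1} h[n,l]\, x[\langle n-l \rangle_N] + w[n], \qquad n = 0, \ldots, N-1, \tag{1}$$

where, using $\langle \cdot \rangle_N$ to denote $N$-modulus indexing, the transmitted signal at the $n$-th sample is denoted as $x[n]$. The channel's impulse response at time $n$ for the $l$-th preceding sample is represented as $h[n,l]$, where $L$ denotes the number of channel taps; whereas $w[n]$ signifies the complex Additive White Gaussian Noise (AWGN) characterized by zero mean and variance $\sigma_w^2 = N_0$, where $N_0$ denotes the noise power spectral density. The circular convolution, as delineated in (1), between the impulse response $h[n,l]$ and $x[n]$, can be reformulated in a matrix-vector representation as follows:

$$\mathbf{y} = \mathbf{H}\mathbf{x} + \mathbf{w}, \tag{2}$$

where $\mathbf{H}$ denotes an $N \times N$ matrix, indexed by $0, \ldots, N-1$ in each dimension, with its elements formulated from the Channel Impulse Response (CIR) coefficients as follows:

$$[\mathbf{H}]_{n,m} = h[n, \langle n-m \rangle_N], \tag{3}$$

for $n, m \in \{0, \ldots, N-1\}$. The CIR is taken to be zero whenever $l \geq L$. After CP removal, the received OFDM symbol in the frequency domain is obtained by applying the normalized Discrete Fourier Transform (DFT):

$$\mathbf{Y} = \mathbf{F}\mathbf{y} = \mathbf{F}\mathbf{H}\mathbf{F}^H\mathbf{X} + \mathbf{W} = \mathbf{G}\mathbf{X} + \mathbf{W}, \tag{4}$$

where $\mathbf{Y}$, $\mathbf{X}$, and $\mathbf{W}$ are the DFTs of $\mathbf{y}$, $\mathbf{x}$, and $\mathbf{w}$, respectively. The channel frequency matrix is $\mathbf{G} = \mathbf{F}\mathbf{H}\mathbf{F}^H$, with $\mathbf{F}$ denoting the normalized DFT matrix. The data vector $\mathbf{X}$ contains $M$-PSK symbols, which can be written in terms of magnitude $A$ and phases $\theta_n$ as:

$$X[n] = A e^{j\theta_n}, \tag{5}$$

where $X[n]$ denotes the $n$-th transmitted complex value in the discrete frequency domain. In a typical OFDM receiver with ICI cancellation, the estimate of $\mathbf{G}$ is used to perform data detection via LMMSE, e.g., in the form (where $\lambda$ is the noise-regularization term equivalent to $\sigma_w^2$ for unit-power symbols):

$$\hat{\mathbf{X}}_{\mathrm{LMMSE}} = \left(\mathbf{G}^H\mathbf{G} + \lambda\mathbf{I}\right)^{-1}\mathbf{G}^H\mathbf{Y}. \tag{6}$$

This structure is taken as a reference for the proposed NN-based detector to compute the MobileNetV3 input vector:

$$\mathbf{Z} = \mathbf{G}^H\mathbf{Y}, \tag{7}$$

where $(\cdot)^H$ denotes the Hermitian transpose. Here, $\mathbf{Z}$ denotes the matched-filter preprocessed received vector, obtained as $\mathbf{Z} = \mathbf{G}^H\mathbf{Y}$ and used as the NN input. The operation $\mathbf{G}^H\mathbf{Y}$ corresponds to a Matched-Filter (MF) preprocessing step with respect to the effective frequency-domain channel $\mathbf{G}$, since it correlates the received vector with the conjugate channel response. The matched filter does not cancel interference and therefore does not suppress ICI completely.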
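The construction of the time-domain channel matrix and its frequency-domain counterpart can be sketched as follows. This is a minimal pure-Python sketch under assumed toy values: the names `time_channel_matrix` and `frequency_channel_matrix`, the two-tap channel, and the Doppler rotation of 0.05 cycles per sample are all illustrative and not taken from the paper. It writes H for the time-domain matrix, F for the normalized DFT matrix, and G = F H F^H for the frequency-domain channel.

```python
import cmath


def dft_matrix(N):
    """Normalized DFT matrix F with entries exp(-j*2*pi*k*n/N) / sqrt(N)."""
    return [[cmath.exp(-2j * cmath.pi * k * n / N) / N ** 0.5 for n in range(N)]
            for k in range(N)]


def hermitian(A):
    """Conjugate transpose of a nested-list matrix."""
    return [[A[r][c].conjugate() for r in range(len(A))] for c in range(len(A[0]))]


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def matvec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def time_channel_matrix(h, N):
    """H[n][m] = h[n][(n - m) mod N]; taps beyond the last delay index are zero."""
    L = len(h[0])
    H = [[0j] * N for _ in range(N)]
    for n in range(N):
        for m in range(N):
            delay = (n - m) % N
            if delay < L:
                H[n][m] = h[n][delay]
    return H


def frequency_channel_matrix(h, N):
    """G = F H F^H."""
    F = dft_matrix(N)
    return matmul(matmul(F, time_channel_matrix(h, N)), hermitian(F))


def off_diagonal_energy(G):
    """Sum of squared magnitudes of all off-diagonal entries (the ICI terms)."""
    return sum(abs(G[r][c]) ** 2 for r in range(len(G)) for c in range(len(G)) if r != c)


N = 8
# Time-invariant taps: the channel is circulant, so G is diagonal (no ICI).
h_static = [[1.0 + 0j, 0.5 + 0j] for _ in range(N)]
# Second tap rotates within the symbol (assumed toy Doppler): G is non-diagonal.
h_moving = [[1.0 + 0j, 0.5 * cmath.exp(2j * cmath.pi * 0.05 * n)] for n in range(N)]
G_static = frequency_channel_matrix(h_static, N)
G_moving = frequency_channel_matrix(h_moving, N)
```

Comparing `off_diagonal_energy(G_static)` with `off_diagonal_energy(G_moving)` shows how time variation within one OFDM symbol spreads energy off the diagonal, which is the ICI the receiver has to handle.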
In doubly selective channels, the frequency-domain channel matrix $\mathbf{G}$ is generally non-diagonal, so $\mathbf{G}^H\mathbf{G}$ contains off-diagonal terms that represent residual ICI. Nevertheless, applying $\mathbf{G}^H$ partially compensates channel-induced phase distortions and tends to tighten the received constellation, as illustrated in Figure 1. Overall, this step reduces the learning burden on the neural network and enables a more streamlined architecture with fewer layers, while still leaving residual ICI that can limit performance at high SNR.
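A small worked example may help to see both effects. Writing G for the frequency-domain channel matrix (a symbol assumed here), the noise-free matched-filter output is G^H G X. If G is diagonal with entries g_k, the output reduces to |g_k|^2 X_k, so the channel phase is removed exactly. Once off-diagonal terms appear, the Gram matrix G^H G is no longer diagonal and residual ICI remains. The 3 × 3 matrices and the helper names below are illustrative assumptions.

```python
import cmath


def hermitian(A):
    """Conjugate transpose of a nested-list matrix."""
    return [[A[r][c].conjugate() for r in range(len(A))] for c in range(len(A[0]))]


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def matvec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def matched_filter(G, Y):
    """Matched-filter output G^H Y."""
    return matvec(hermitian(G), Y)


def gram_ici_ratio(G):
    """Off-diagonal to diagonal energy of the Gram matrix G^H G."""
    gram = matmul(hermitian(G), G)
    size = len(gram)
    diag = sum(abs(gram[i][i]) ** 2 for i in range(size))
    off = sum(abs(gram[r][c]) ** 2 for r in range(size) for c in range(size) if r != c)
    return off / diag


# Assumed toy channels: one purely diagonal, one with small ICI terms of 0.1.
gains = [cmath.rect(0.8, 1.0), cmath.rect(1.2, -2.0), cmath.rect(0.5, 0.3)]
G_diag = [[gains[r] if r == c else 0j for c in range(3)] for r in range(3)]
G_ici = [[G_diag[r][c] if r == c else 0.1 + 0j for c in range(3)] for r in range(3)]

# Three QPSK symbols, noise-free reception over the diagonal channel.
X = [cmath.exp(1j * cmath.pi / 4), cmath.exp(3j * cmath.pi / 4), cmath.exp(-1j * cmath.pi / 4)]
Z_diag = matched_filter(G_diag, matvec(G_diag, X))
```

In the diagonal case the phases of `Z_diag` coincide with those of `X`, while `gram_ici_ratio(G_ici)` is strictly positive, which is the residual interference left for later stages.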
3. Deep Learning MobileNetV3
CNNs effectively capture spatial correlations; however, their computational cost increases significantly with the number of feature extraction channels. MobileNetV3 addresses this limitation through the use of depthwise separable convolutions, enabling a more computationally efficient architecture.
In this work, MobileNetV3 is adapted for M-PSK detection in OFDM systems operating over doubly dispersive channels. Since standard CNN implementations operate in the real domain, the complex channel matrix $\mathbf{G}$ is first transformed into its polar representation. The magnitude and phase components are then arranged as two input channels forming the tensor $\mathbf{T} \in \mathbb{R}^{2 \times N \times N}$. Specifically,

$$\mathbf{M} = \lvert \mathbf{G} \rvert, \qquad \boldsymbol{\Phi} = \angle\mathbf{G}. \tag{8}$$

The input tensor is constructed by stacking these components along the channel dimension; i.e.,

$$\mathbf{T} = [\mathbf{M};\, \boldsymbol{\Phi}]. \tag{9}$$

The complete end-to-end processing chain, including preprocessing, network inference, and postprocessing, is illustrated in Figure 2 and described in detail in Algorithms 1 and 2. The network output is defined as

$$\boldsymbol{\varphi} = f_{\Theta}(\mathbf{T}) \tag{10}$$

and is projected onto the complex unit circle to generate the phasor $e^{j\boldsymbol{\varphi}}$. This phasor acts as a learned phase correction term and is applied elementwise, via the Hadamard operator ⊙, to the Matched-Filter (MF) preprocessing output $\mathbf{Z}$ defined in (7). The angular component of the resulting compensated signal provides the final phase estimate, which is expressed as:

$$\hat{\boldsymbol{\theta}} = \angle\left(e^{j\boldsymbol{\varphi}} \odot \mathbf{Z}\right). \tag{11}$$
The training objective does not aim to directly regress the network output $\boldsymbol{\varphi}$, as this variable serves only as an intermediate latent representation produced by the network. Instead, the optimization is formulated over the final phase estimate defined in (11), which directly impacts the symbol decision process. In this way, the learning procedure is explicitly aligned with the system performance metric.
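The two-channel input tensor and the phase-correction step described above can be sketched as follows. The normalization used here (magnitude divided by its peak value, phase divided by π) is an assumption made for illustration, since the exact scaling is specified only in the training configuration; the function names are likewise hypothetical.

```python
import cmath
import math


def build_input_tensor(G):
    """Stack normalized magnitude and phase of G into a 2 x N x N nested list."""
    peak = max(abs(g) for row in G for g in row) or 1.0  # avoid division by zero
    magnitude = [[abs(g) / peak for g in row] for row in G]          # in [0, 1]
    phase = [[cmath.phase(g) / math.pi for g in row] for row in G]   # in [-1, 1]
    return [magnitude, phase]


def phase_estimate(phi, Z):
    """Angle of exp(j * phi_n) * Z_n for every subcarrier n."""
    return [cmath.phase(cmath.exp(1j * p) * z) for p, z in zip(phi, Z)]
```

For instance, `phase_estimate([-0.2], [cmath.rect(2.0, 0.9)])` rotates a sample whose phase is 0.9 rad back by 0.2 rad; the magnitude of the matched-filter output does not affect the estimate.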
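Specifically, the goal is to minimize the discrepancy between the estimated phase $\hat{\boldsymbol{\theta}}$ and the true phase $\boldsymbol{\theta}$, where $\boldsymbol{\theta}$ denotes the ground-truth phase of the transmitted symbol vector. To achieve this, it is necessary to define a loss function that properly quantifies this discrepancy in the angular domain.
A straightforward approach would be to employ a direct Mean-Squared Error (MSE) between angles,

$$\mathcal{L}_{\mathrm{MSE}} = \frac{1}{N}\sum_{n=1}^{N}\left(\hat{\theta}_n - \theta_n\right)^2; \tag{12}$$

however, this metric is not appropriate due to the inherent $2\pi$ periodicity of phase. When $\hat{\theta}_n$ and $\theta_n$ lie on opposite sides of the wrapping boundary (e.g., near $-\pi$ and $\pi$), direct subtraction may yield an artificially large error, even though the phases are physically close. To properly account for the circular nature of phase, one could define the wrapped angular difference as

$$\Delta_n = \operatorname{atan2}\!\left(\sin(\hat{\theta}_n - \theta_n),\, \cos(\hat{\theta}_n - \theta_n)\right) \tag{13}$$

and compute the mean-squared wrapped phase error over a batch of size $N$ as

$$\mathcal{L}_{\mathrm{wrap}} = \frac{1}{N}\sum_{n=1}^{N}\Delta_n^2. \tag{14}$$

However, in the proposed approach, a closely related reformulation in the complex domain is adopted. Since $e^{j\theta}$ represents a unit phasor on the complex circle, the phase discrepancy can be measured as the Euclidean distance between the corresponding unit phasors. Under this formulation, the loss function is defined as:

$$\mathcal{L} = \frac{1}{N}\sum_{n=1}^{N}\left\lvert e^{j\hat{\theta}_n} - e^{j\theta_n}\right\rvert^2. \tag{15}$$

Expanding the squared norm in the complex plane yields

$$\left\lvert e^{j\hat{\theta}_n} - e^{j\theta_n}\right\rvert^2 = 2 - 2\cos(\hat{\theta}_n - \theta_n), \tag{16}$$

so that the loss can be equivalently expressed as:

$$\mathcal{L} = \frac{2}{N}\sum_{n=1}^{N}\left(1 - \cos(\hat{\theta}_n - \theta_n)\right). \tag{17}$$

This formulation is inherently $2\pi$-periodic and directly penalizes the circular distance between phases, avoiding ambiguities associated with angular discontinuities. For small phase differences, $2 - 2\cos\Delta_n \approx \Delta_n^2$, so (17) agrees with (14) near the optimum.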
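The difference between the three error measures discussed above can be checked numerically. The following sketch compares the naive angular MSE, the wrapped MSE, and the unit-phasor loss on a pair of phases that lie on opposite sides of the ±π boundary. The function names and the test phases are illustrative choices.

```python
import cmath
import math


def wrap(delta: float) -> float:
    """Map an angle difference to the interval (-pi, pi]."""
    return math.atan2(math.sin(delta), math.cos(delta))


def naive_mse(est, ref):
    """Mean squared difference of raw angles (ignores periodicity)."""
    return sum((a - b) ** 2 for a, b in zip(est, ref)) / len(est)


def wrapped_mse(est, ref):
    """Mean squared wrapped angle difference."""
    return sum(wrap(a - b) ** 2 for a, b in zip(est, ref)) / len(est)


def phasor_loss(est, ref):
    """Mean squared Euclidean distance between the unit phasors."""
    return sum(abs(cmath.exp(1j * a) - cmath.exp(1j * b)) ** 2
               for a, b in zip(est, ref)) / len(est)


def cosine_loss(est, ref):
    """Same loss written as 2 - 2*cos(difference)."""
    return sum(2.0 - 2.0 * math.cos(a - b) for a, b in zip(est, ref)) / len(est)


# Two phases 0.02 rad apart, but on opposite sides of the wrapping boundary.
est = [math.pi - 0.01]
ref = [-math.pi + 0.01]
```

Here `naive_mse(est, ref)` is close to (2π)², whereas `wrapped_mse` and `phasor_loss` are both about 4 × 10⁻⁴, reflecting that the two phases are physically close.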
Given the phase-aware loss defined above, we can now describe how the proposed receiver is trained. Algorithm 1 details the end-to-end procedure used to generate training frames, apply the Matched-Filter (MF) preprocessing, form the real-valued channel representation $\mathbf{T}$, and forward it through MobileNetV3 to obtain the phase-correction vector $\boldsymbol{\varphi}$. The correction is mapped to a unit phasor $e^{j\boldsymbol{\varphi}}$ and applied elementwise to the MF output, yielding the phase estimate $\hat{\boldsymbol{\theta}}$ in (11). The wrapped phase discrepancy between $\hat{\boldsymbol{\theta}}$ and the ground truth $\boldsymbol{\theta}$ is then evaluated with the proposed loss and used to update the network weights via backpropagation. Algorithm 2 follows the same pipeline at test time but omits the loss computation and weight updates.
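| Algorithm 1 Training of the proposed MobileNetV3 receiver |
| Input: Number of frames $N_f$; subcarriers $N$; channel matrix $\mathbf{G}$; constellation $\mathcal{S}$; SNR. |
| Output: Trained weights $\Theta$. |
| 1: | for $i = 1$ to $N_f$ do |
| 2: | Draw symbols $\mathbf{X} \in \mathcal{S}^{N}$. |
| 3: | Generate noise $\mathbf{W}$ according to SNR. |
| 4: | Receive $\mathbf{Y} = \mathbf{G}\mathbf{X} + \mathbf{W}$. |
| 5: | MF preprocessing: $\mathbf{Z} = \mathbf{G}^H\mathbf{Y}$. |
| 6: | Build network input from $\mathbf{G}$: |
| | $\mathbf{M} = \mathrm{norm}(\lvert \mathbf{G} \rvert)$ | ▹ magnitude channel normalized |
| | $\boldsymbol{\Phi} = \mathrm{norm}(\angle\mathbf{G})$ | ▹ phase channel normalized |
| | $\mathbf{T} = [\mathbf{M};\, \boldsymbol{\Phi}]$. |
| 7: | Forward pass: $\boldsymbol{\varphi} = f_{\Theta}(\mathbf{T})$. |
| 8: | Apply phase correction: $\tilde{\mathbf{Z}} = e^{j\boldsymbol{\varphi}} \odot \mathbf{Z}$. |
| 9: | Predict phases $\hat{\boldsymbol{\theta}} = \angle\tilde{\mathbf{Z}}$ and target $\boldsymbol{\theta} = \angle\mathbf{X}$. |
| 10: | Wrapped phase error: $\boldsymbol{\Delta} = \hat{\boldsymbol{\theta}} - \boldsymbol{\theta}$. |
| | $\boldsymbol{\Delta} \leftarrow \operatorname{atan2}(\sin\boldsymbol{\Delta}, \cos\boldsymbol{\Delta})$. |
| 11: | Loss $\mathcal{L} = \frac{1}{N}\sum_{n}\Delta_n^2$. |
| 12: | Update $\Theta$ using backpropagation. |
| 13: | end for |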
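The data-generation part of the training loop (drawing symbols, adding noise, receiving, and matched filtering) can be sketched without any deep learning framework. The sketch assumes unit-power M-PSK symbols with phases 2πm/M and defines the SNR as the ratio of symbol power to noise variance, so the noise variance is 10^(-SNR/10); both conventions, as well as the function names, are assumptions made for illustration. The forward pass, loss evaluation, and weight update are omitted.

```python
import cmath
import math
import random


def draw_mpsk(n, M, rng):
    """Return (indices, unit-power M-PSK symbols exp(j*2*pi*m/M))."""
    idx = [rng.randrange(M) for _ in range(n)]
    return idx, [cmath.exp(2j * cmath.pi * m / M) for m in idx]


def awgn(n, snr_db, rng):
    """Complex Gaussian noise with total variance 10^(-snr_db/10)."""
    sigma = math.sqrt(10 ** (-snr_db / 10) / 2)  # per real dimension
    return [complex(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma)) for _ in range(n)]


def make_frame(G, M, snr_db, rng):
    """One training frame: matched-filter output Z, target phases, symbol indices."""
    n = len(G)
    idx, X = draw_mpsk(n, M, rng)
    W = awgn(n, snr_db, rng)
    # Received vector Y = G X + W.
    Y = [sum(G[r][c] * X[c] for c in range(n)) + W[r] for r in range(n)]
    # Matched filter Z = G^H Y.
    Z = [sum(G[r][c].conjugate() * Y[r] for r in range(n)) for c in range(n)]
    target = [cmath.phase(x) for x in X]
    return Z, target, idx
```

A call such as `make_frame(G, 4, 15.0, random.Random(0))` would produce one QPSK frame at the 15 dB training SNR for a given channel matrix `G`.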
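| Algorithm 2 Inference mode of the proposed MobileNetV3 receiver |
| Input: Trained weights $\Theta$; channel matrix $\mathbf{G}$; received vector $\mathbf{Y}$. |
| Output: Detected symbols $\hat{\mathbf{X}}$. |
| 1: | Build network input from $\mathbf{G}$: |
| | $\mathbf{M} = \mathrm{norm}(\lvert \mathbf{G} \rvert)$, $\boldsymbol{\Phi} = \mathrm{norm}(\angle\mathbf{G})$, |
| | $\mathbf{T} = [\mathbf{M};\, \boldsymbol{\Phi}]$. |
| 2: | MF preprocessing: $\mathbf{Z} = \mathbf{G}^H\mathbf{Y}$. |
| 3: | Forward pass: $\boldsymbol{\varphi} = f_{\Theta}(\mathbf{T})$. |
| 4: | Apply phase correction: $\tilde{\mathbf{Z}} = e^{j\boldsymbol{\varphi}} \odot \mathbf{Z}$. |
| 5: | Phase estimate: $\hat{\boldsymbol{\theta}} = \angle\tilde{\mathbf{Z}}$. |
| 6: | Estimate $\hat{\mathbf{X}}$ by mapping $\hat{\boldsymbol{\theta}}$ to the nearest constellation phase. |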
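The inference steps can be sketched end to end with a stand-in for the trained network. In the sketch below, `model` is any callable that maps the two-channel tensor to one phase correction per subcarrier; `zero_correction` is a hypothetical placeholder that applies no correction. The input normalization and the constellation convention (phases 2πm/M with no offset) are assumptions for illustration.

```python
import cmath
import math


def detect_mpsk(G, Y, model, M):
    """Inference sketch: build input, matched filter, correct phase, slice to M-PSK."""
    n = len(G)
    # Two-channel real input (assumed normalization: peak magnitude, phase / pi).
    peak = max(abs(g) for row in G for g in row) or 1.0
    T = [[[abs(g) / peak for g in row] for row in G],
         [[cmath.phase(g) / math.pi for g in row] for row in G]]
    # Matched filter Z = G^H Y.
    Z = [sum(G[r][c].conjugate() * Y[r] for r in range(n)) for c in range(n)]
    # Forward pass of the (stand-in) network: one phase correction per subcarrier.
    phi = model(T)
    # Apply the unit phasor and take the angle of the corrected samples.
    theta_hat = [cmath.phase(cmath.exp(1j * p) * z) for p, z in zip(phi, Z)]
    # Map each phase to the index of the nearest constellation phase 2*pi*m/M.
    step = 2 * math.pi / M
    return [round(t / step) % M for t in theta_hat]


def zero_correction(T):
    """Placeholder for the trained network: no phase correction."""
    return [0.0] * len(T[0])
```

With a diagonal, noise-free channel the matched filter alone already restores the transmitted phases, so `detect_mpsk(G, Y, zero_correction, 4)` returns the transmitted QPSK indices; a trained network would replace `zero_correction` to handle residual distortions.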
Rather than explicitly learning to cancel ICI entirely, the preprocessing stage partially mitigates ICI through an initial matched-filter step. By reducing the complexity of the interference, the network can focus on correcting residual distortions, primarily due to noise, by leveraging phase consistency among the received signals.
The effectiveness of this preprocessing approach is demonstrated in Figure 1:
Before preprocessing (blue points): Symbols are widely scattered due to ICI and Gaussian noise, limiting accurate identification.
After preprocessing (orange points): Symbols are clustered around the QPSK constellation points, suggesting a substantial reduction in interference. This simplifies the learning task for the MobileNetV3 correction.
For reproducibility, Table 1 details the dataset construction and stored tensors used in training: the transmitted 4-QAM (QPSK) symbols, the matched-filtered noisy received vectors, the per-frame channel matrices, and the binary LOS/NLOS indicator that controls how channel realizations are selected across frames, together with the fixed SNR and modulation order. In particular, LOS and NLOS channel snapshots are alternated (toggled) during dataset generation to expose the network to both propagation conditions; however, this indicator is used only for bookkeeping and analysis and is not provided as an input feature. Therefore, the proposed model is not explicitly aware of whether the current received frame corresponds to LOS or NLOS, and it must learn a unified mapping that generalizes across both channel regimes.
Table 2 complements this by listing all learning-related settings for the proposed MobileNetV3 receiver, including the backbone choice, tensor shapes, wrapped phase-MSE objective, optimizer and learning rate, batch/epoch schedule, validation protocol, numerical precision, and magnitude/phase normalization used to build the network input.
4. Results and Experiments
The MobileNetV3-based equalizer was assessed in simulation for computational complexity ($\mathcal{O}(\cdot)$), BER, and Block-Error Rate (BLER). Performance was evaluated against OFDM systems utilizing both linear and nonlinear detectors. A WINNER II V2V channel model was employed to simulate a doubly selective environment [26]. All performance experiments were conducted under ideal channel state information conditions. The complete channel and system configurations are detailed in Table 3.
The proper dataset SNR for the network training was determined by running a series of simulations under several fixed and mixed SNR values. The results in Figure 3 show that training with a wider range of SNR values (or with mixed-SNR combinations) does not improve the final BER performance. In contrast, training only at 15 dB achieves a comparable BER while converging faster and requiring fewer computational resources.
The Big-$\mathcal{O}$ benchmark chart (Figure 4) presents the computational complexity of several equalization algorithms applied to OFDM signals. The x-axis represents the frame size $N$, i.e., the number of symbols, while the y-axis measures operations in terms of real products on a logarithmic scale. The red dashed line marks the value of $N$ associated with the 802.11p standard parameters [27]. Key points follow:
The complexity of the OSIC approach grows significantly as N increases; nevertheless, in contrast to prior observations, it exhibits the lowest complexity among the evaluated strategies. Meanwhile, the growth of NearML stabilizes as N increases. Notably, NearML exhibits higher complexity than LMMSE.
MobileNetV3-Small complexity: For an $N \times N$ input with 2 channels, each depthwise-separable block at layer ℓ with cumulative stride $s_\ell$ costs $(N/s_\ell)^2\,(k_\ell^2 C_{\mathrm{in}} + C_{\mathrm{in}} C_{\mathrm{out}})$ MACs, where $k_\ell$ is the kernel size and $C_{\mathrm{in}}$, $C_{\mathrm{out}}$ are the input/output channel counts. Since MobileNetV3 uses fixed channel widths and small kernels (e.g., $3 \times 3$), the complexity grows mainly with the spatial size $N^2$. Early downsampling ($s_\ell > 1$) reduces the spatial size of feature maps (e.g., $N/2 \times N/2$ for stride 2), so later layers operate on fewer pixels and the total operations drop significantly. Thus MobileNetV3 scales roughly with $\mathcal{O}(N^2)$ but is nearly constant for a fixed N (e.g., 48), unlike fully connected layers whose cost can grow quadratically with input size [21].
LMMSE bears the highest complexity, scaling as $\mathcal{O}(N^3)$ due to the need to invert an $N \times N$ matrix.
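The scaling behaviour described in the key points above can be made concrete with a small operation counter. The sketch uses the usual cost model for a depthwise-separable block, (N/s)² · (k² · C_in + C_in · C_out) multiply-accumulates, and a bare N³ count for matrix inversion with constant factors dropped. The channel widths (16 and 24) and the 3 × 3 kernel are assumed values for illustration and do not reproduce the exact MobileNetV3-Small configuration.

```python
def separable_block_macs(n: int, stride: int, k: int, c_in: int, c_out: int) -> int:
    """MACs of one depthwise-separable block on an (n/stride) x (n/stride) feature map."""
    side = n // stride
    return side * side * (k * k * c_in + c_in * c_out)


def cubic_inversion_ops(n: int) -> int:
    """Order-of-magnitude cost of inverting an n x n matrix (constants dropped)."""
    return n ** 3


# Assumed example: N = 48, 3 x 3 kernel, 16 input and 24 output channels.
full_res = separable_block_macs(48, 1, 3, 16, 24)
half_res = separable_block_macs(48, 2, 3, 16, 24)
```

Under these assumptions a stride of 2 cuts the block cost by a factor of four, and doubling N multiplies the block cost by four but the inversion count by eight, which is the gap between quadratic and cubic growth.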
Figure 5 (BER) and Figure 6 (BLER) compare the proposed MobileNetV3-based receiver (PhaseNet, Figure 2) with ComNet, LMMSE, OSIC, and NearML across various SNR values in an OFDM system. In the context of LTE, BLER is defined as the ratio of the number of erroneous blocks to the total number of received blocks. This metric is calculated using a Cyclic Redundancy Check (CRC) evaluation, where each transport block has an attached CRC. At the receiver side, the transport block is considered successfully decoded if the CRC calculated by the receiver matches the CRC sent by the transmitter. In terms of BLER, MobileNetV3 behaves similarly to the classical methods and does not show a significant improvement. However, its ability to be parallelized makes it a suitable approach for real-time implementation.
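The two metrics can be illustrated with a short sketch. It counts bit errors directly for the BER and uses a CRC check per block for the BLER. The 32-bit CRC from Python's `zlib` module is used here only as a stand-in for the CRC attached to a transport block; the block size and helper names are assumptions for illustration.

```python
import zlib


def attach_crc(payload: bytes) -> bytes:
    """Append a 32-bit CRC to the payload (zlib.crc32 as a stand-in CRC)."""
    return payload + zlib.crc32(payload).to_bytes(4, "big")


def crc_ok(block: bytes) -> bool:
    """Recompute the CRC at the receiver and compare it with the received one."""
    payload, received = block[:-4], block[-4:]
    return zlib.crc32(payload).to_bytes(4, "big") == received


def bit_errors(a: bytes, b: bytes) -> int:
    """Number of differing bits between two equal-length byte strings."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


def ber_and_bler(sent, received):
    """BER over all bits and BLER as the fraction of blocks failing the CRC."""
    total_bits = sum(8 * len(s) for s in sent)
    ber = sum(bit_errors(s, r) for s, r in zip(sent, received)) / total_bits
    bler = sum(not crc_ok(r) for r in received) / len(received)
    return ber, bler
```

A single flipped bit in one of four 12-byte blocks gives a BER of 1/384 but a BLER of 1/4, which shows how much more sensitive the block metric is to isolated errors.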
Low SNR Range (5–15 dB): In this range, MobileNetV3 has lower BER than classical methods. The BER for MobileNetV3 decreases significantly, reaching values below by 10 dB, whereas methods such as OSIC, LMMSE, and NearML remain closer to .
Moderate SNR Range (15–20 dB): As the SNR increases, the BER of MobileNetV3 decreases more slowly than that of the traditional methods from around 17 dB. ComNet shows only limited improvement, with a slow reduction in BER and an almost flat trend. By 20 dB, the MobileNetV3 BER approaches , which is higher than that of the classical methods.
High SNR Range (20–25 dB): At higher SNR values, the performance of MobileNetV3 flattens and stabilizes around , while classical methods continue to decrease and achieve better results under noise. ComNet shows a similar stabilization but with a noticeable offset of about one order of magnitude, reaching BER values near . Traditional detectors such as LMMSE and OSIC achieve slightly lower error rates, although the gap becomes minimal in this regime. Neural networks demonstrate their limitations beyond the trained SNR range but remain competitive with classical approaches.
We also extended the evaluation scope to higher-order constellations. Figure 7 and Figure 8 report the BER and BLER performance, respectively, when the same receiver architecture is tested on QPSK, 8-PSK, and 16-QAM.
As the constellation order increases, constellation points become more densely packed for a fixed average symbol energy. This reduces the effective separation between decision regions, making symbol detection more sensitive to additive noise as well as to residual phase and amplitude errors. As a result, for the same SNR, higher-order constellations generally exhibit higher symbol error rates and, under Gray mapping, higher BER.
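The loss of separation can be quantified through the minimum distance between constellation points at unit average symbol energy: 2 sin(π/M) for M-PSK and √(6/(M − 1)) for square M-QAM. The short sketch below evaluates both standard expressions; the function names are illustrative.

```python
import math


def psk_min_distance(M: int) -> float:
    """Minimum distance of unit-energy M-PSK: 2 * sin(pi / M)."""
    return 2.0 * math.sin(math.pi / M)


def square_qam_min_distance(M: int) -> float:
    """Minimum distance of square M-QAM with unit average energy: sqrt(6 / (M - 1))."""
    return math.sqrt(6.0 / (M - 1))


d_qpsk = psk_min_distance(4)
d_8psk = psk_min_distance(8)
d_16qam = square_qam_min_distance(16)
```

The resulting distances are roughly 1.41 for QPSK, 0.77 for 8-PSK, and 0.63 for 16-QAM, so each step up in constellation order leaves less room for noise and residual phase error.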
This trend is consistent with our results: compared to QPSK, the 8-PSK BER curve is shifted upward by approximately two orders of magnitude on average, and moving from 8-PSK to 16-QAM introduces roughly one additional order of magnitude. The BLER curves follow the same ordering, indicating that the network maintains the same relative behavior when evaluated on blocks. This is expected because BLER treats a block as incorrect if it contains even a single bit error.
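The link between the two metrics can be made explicit under a simplifying assumption. If bit errors were independent with probability BER, a block of B bits would be received without error with probability (1 − BER)^B, so BLER = 1 − (1 − BER)^B. Errors in a real fading channel are correlated, so the sketch below is only a rough guide and not the model used to produce the reported curves.

```python
def bler_from_ber(ber: float, block_bits: int) -> float:
    """BLER under the (assumed) independent-bit-error model: 1 - (1 - BER)^B."""
    return 1.0 - (1.0 - ber) ** block_bits
```

For example, with a BER of 10⁻³ and a 1000-bit block this model already gives a BLER of about 0.63, which is why BLER curves sit well above the corresponding BER curves.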
5. Discussion
MobileNetV3 is an efficient neural network architecture designed to achieve fast inference by relying primarily on lightweight convolutional operations. Its depthwise separable convolution strategy is particularly well suited for embedded and real-time applications, as it significantly reduces computational cost while maintaining strong feature extraction capability. In this work, we explore a new application of MobileNetV3 in a regression setting, where the objective is to estimate a latent representation of the channel as a single vector. Conceptually, this can be interpreted as extracting a diagonal-like summary of the channel matrix, while simultaneously incorporating the most informative features from the entire channel.
Beyond the architectural efficiency of MobileNetV3, the novelty of the proposed receiver lies in how channel knowledge is exploited and how the network output is integrated into the classical OFDM chain. Instead of directly estimating transmitted symbols or learning a full equalizer, the network processes the complex channel matrix by forming a real-valued two-channel tensor (magnitude and phase) and regresses a phase-correction vector. The model is trained with an MSE objective that minimizes the discrepancy between the corrected phase and the ideal phase. During inference, the phase-correction vector is mapped to a unit-magnitude phasor and applied elementwise to the matched-filter output (which already partially mitigates ICI), yielding the final phase estimate. This structured design targets residual phase distortions after model-based preprocessing, rather than replacing the detection or equalization pipeline.
Furthermore, a matched-filter preprocessing stage is explicitly introduced prior to the neural network to partially mitigate channel-induced distortions and tighten the received constellation, thereby reducing the learning burden of the network. In contrast to end-to-end architectures that implicitly learn estimation and equalization jointly, the proposed approach enforces a structured decomposition: the channel matrix is explicitly used to construct the preprocessing stage and the network input representation, while training is performed using a wrapped phase-MSE criterion to properly handle phase periodicity. This model-aware design differentiates the proposed receiver from existing NN-based OFDM detectors such as ComNet.
One limitation of the current design is that the network can only be trained over a limited SNR range, and it does not incorporate temporal context from previous OFDM frames. As a consequence, residual inter-carrier interference cannot be fully eliminated when the interference originates from adjacent V2V frames. Nevertheless, the model effectively reduces BER levels over low SNR values and various channel impairments within the considered operating range.
In terms of BLER performance, the network behaves similarly to the classical baseline methods. This is a promising result, as it indicates that the network is capable of learning and reproducing the behavior of a traditional signal processing method. It is also worth noting that neural networks are highly parallelizable, even on embedded platforms. With current state-of-the-art hardware accelerators such as NPUs, FPGAs, and lightweight AI inference engines, deploying MobileNet-based architectures in practical V2V systems is becoming increasingly feasible.
6. Conclusions
In this paper, a low-complexity detection scheme for M-PSK symbols in OFDM systems operating over doubly dispersive V2V channels is proposed. The MobileNetV3-based detector emerges as a suitable solution for mitigating the effects of severe ICI induced by Doppler spread in vehicular communication links. Unlike classical linear and nonlinear equalizers, which rely on computationally expensive matrix inversions or iterative interference cancellation, the proposed receiver introduces a lightweight preprocessing stage based on the signal model that compensates for Doppler-induced phase distortions. This preprocessing enables the neural network to focus on symbol detection by exploiting phase-data correlations rather than directly suppressing ICI. The proposed MobileNetV3-based receiver is particularly effective in the low-to-mid SNR regime, where ICI degrades the performance of conventional OFDM systems. Simulation results show that the proposed receiver improves BER performance compared to linear and nonlinear detectors, as well as ComNet, in the low-SNR regime (below 15 dB). At high SNR values, a mild performance saturation is observed, which is consistent with the absence of temporal memory in the network architecture. Nevertheless, this trade-off is justified by the reduced computational complexity of the proposed solution.
Overall, the proposed receiver distinguishes itself from existing equalization and neural-based schemes through its compact architecture, inherent parallelism, and reduced computational burden. These characteristics make the MobileNetV3-based approach a strong candidate for real-time V2V receivers and hardware-constrained platforms such as FPGA- or ASIC-based embedded systems.