Carrier Diversity Incorporation to Low-Complexity Near-ML Detection for Multicarrier Systems over V2V Radio Channel

Inter-carrier interference (ICI) in vehicle-to-vehicle (V2V) orthogonal frequency division multiplexing (OFDM) systems is a common problem that makes data detection a demanding task. Mitigation of the ICI in V2V systems has been addressed in the past with linear and non-linear iterative receivers; however, the former require a high number of iterations to achieve good performance, while the latter do not exploit the channel’s frequency diversity. In this paper, a transmission and reception scheme for low-complexity data detection in doubly selective, highly time-varying channels is proposed. The technique couples discrete Fourier transform spreading with non-linear detection in order to collect the available channel frequency diversity and achieve performance close to that of the optimal maximum likelihood (ML) detector. When compared with iterative LMMSE detection, the proposed system achieves a lower bit error rate (BER) while reducing the computational cost to roughly one third when using 48 subcarriers; in an OFDM system with 512 subcarriers, the computational cost is reduced by two orders of magnitude.


Introduction
The development of vehicle-to-vehicle (V2V) wireless communications has experienced a boom in recent years due to its main applications for traffic control and road safety, such as: reducing traffic in main avenues, collision prevention, autonomous vehicle development, remote tracking of vehicles, etc. Different measurement campaigns and channel sounding in the V2V environment [1,2] confirm the existence of high Doppler spread frequencies (above 600 Hz), which causes inter-carrier interference (ICI) to be one of the main problems that affect the receivers' performance and greatly complicate the estimation and data detection tasks.
The specific problem of channel parameter estimation at the receiver is further complicated in V2V systems because the ICI also affects the integrity of the pilot subcarriers required to carry out the channel estimation properly. Among the works that tackle this problem, the most relevant are [3,4]; these works propose an iterative receiver with a channel estimator based on a two-dimensional basis expansion model (2D-BEM). However, one problem with these approaches is that they only reconstruct the channel variations at an OFDM frame resolution, omitting the temporal variation of the channel within one OFDM symbol. In addition, these works employ a linear data detection scheme that is unable to mitigate ICI in highly time-variant channels. Furthermore, the iterative receivers in [3,4] require at least 5 iterations to deliver acceptable performance in terms of bit error rate (BER). In [5], channel estimation and tracking are performed by resolving dominant multipath components.
Approaches [6][7][8] substantially reduce the computational complexity required for data detection. This is achieved by approximating the original signal model with a reduced signal model in the frequency domain, based on band equalization, where the channel matrix is truncated to keep only a small number of bands. The approximate observation model described in [6][7][8] does not include the channel frequency diversity, which causes linear detection to achieve lower performance than detection carried out on the complete observation model. Furthermore, the approximate model described in [6][7][8] is not compatible with non-linear detectors when the channel frequency diversity is included.
Works [9,10] present systems with data estimators suitable for counteracting the distortions produced by the ICI, achieving better performance than conventional receivers. However, they modify the physical layer of the 802.11p standard in order to introduce additional training sequences, which decreases the spectral efficiency of the system and makes them incompatible with the 802.11p standard [11]. They also use channel estimators with an observation window that covers a large number of OFDM symbols, increasing both the memory required for their implementation and the system's latency.
Since data detection is the process with the greatest computational complexity in the receiver, the practical feasibility of any system designed to work in real time in V2V channels depends mainly on the order of complexity required by the detection task. The computational complexity of the optimal maximum likelihood (ML) detector is $O(N_D \Omega^{N_D})$ [12], where $N_D$ is the number of data subcarriers and $\Omega$ the constellation size. This complexity makes real-time implementation less viable than for linear detectors, whose complexity is bounded by $O(N_D^3)$. Recently, a series of detectors have been proposed that have lower complexity than the ML detector while maintaining similar performance. The spherical detector (SD) [13], which is based on tree-search algorithms, has been applied to multicarrier systems [14], yielding lower complexity than ML detection [15], as expected. Some non-linear detection schemes are based on the M-algorithm and the QR decomposition of the channel matrix [16,17]. Ordered successive interference cancellation (OSIC) has been proposed recently in [18]. However, all the aforementioned non-linear detectors are applied to a system model that does not include the channel frequency diversity, and they do not exploit an approximation of the band channel matrix during its QR decomposition.
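To give a feel for the gap between the two complexity orders quoted above, the following minimal sketch evaluates them numerically. The values used (48 data subcarriers and a 4-point constellation) are illustrative assumptions, the function names are hypothetical, and all constant factors hidden by the O(·) notation are dropped.

```python
def ml_search_size(n_data: int, constellation_size: int) -> int:
    """Exhaustive ML search order: N_D * |Omega|**N_D (constants dropped)."""
    return n_data * constellation_size ** n_data


def linear_detector_cost(n_data: int) -> int:
    """Cubic bound N_D**3 quoted for linear detectors (constants dropped)."""
    return n_data ** 3


# Illustrative values only: 48 data subcarriers and a 4-point constellation.
N_DATA, OMEGA = 48, 4
print(f"ML search order   : {ml_search_size(N_DATA, OMEGA):.3e}")
print(f"linear (cubic)    : {linear_detector_cost(N_DATA):.3e}")
```

Under these assumptions the exhaustive search is larger than the cubic bound by more than twenty orders of magnitude, which is why the text treats real-time ML detection as impractical.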
An important factor that impacts the multicarrier system performance is the frequency selectivity of the channel. This characteristic of the broadband channel makes it difficult to recover some signals degraded by a deep fading of the channel. Because the non-linear data detection is carried out consecutively, the channel's selectivity causes degradation in performance of the non-linear detection, even in conditions of high signal-to-noise ratio (SNR). The works mentioned above [3,4] tackle this problem through the inclusion of a stage for channel coding, with the consequent loss of spectral efficiency. In recent works [19][20][21], it is observed that low-density parity-check (LDPC) codes have a better performance on doubly selective channels compared to convolutional codes and turbo coding. In LDPC coded transmission schemes, the system parameters are usually optimized towards a particular scenario to achieve capacity-approaching performance. However, due to high mobility in V2V communications, several channel scenarios can be experienced within the narrow time window. In order to maintain the performance of the LDPC encoded transmission schemes, the system parameters must be individually optimized for different scenarios, which translates into an increase in the computational complexity of the system making real-time implementation difficult. Additionally, at the signal level, no schemes have been incorporated to effectively take advantage of the frequency selectivity in the form of diversity to achieve better performance in the non-linear detection without significantly affecting the system's spectral efficiency.

Objectives and Contributions
This paper proposes a new detection algorithm that yields performance similar to that of ML with low computational complexity. A data precoding scheme is proposed that efficiently exploits the channel's frequency diversity without affecting the system's spectral efficiency. The proposed detector offers two main advantages:
• It assimilates the linear data precoding stage so that the operation is obtained with equivalent channel matrices that maintain the band structure. This makes it possible to extract the channel diversity, which significantly improves performance while demanding low complexity;
• It uses an improved search order that enables the decoder to find the optimal solution in fewer iterations. Furthermore, the detector's maximum search size can be set to a number of operations that will not have a significant impact on performance.
Finally, an iterative reception structure is proposed that makes it possible to eliminate ICI among data subcarriers. The performance achieved with this technique after one iteration comes close to that obtained in conditions of exact knowledge of the channel.

Abbreviations and Acronyms
Lower (upper) case letters refer to vectors (matrices); $[\cdot]^T$, $(\cdot)^H$, $(\cdot)_N$ and $[\cdot]_T$ are the transpose, Hermitian, circular shift modulo $N$ and band truncation operators, respectively; $(\cdot)_k$ refers to the k-th OFDM symbol being considered. The subscripts $(\cdot)_P$ and $(\cdot)_D$ denote the versions of a vector sampled in the pilot and data positions; in the case of matrices, they denote the versions sampled in the rows and columns of the pilot and data positions.

Organization
This paper is organized as follows: The signal model used is described in Section 2. The proposed reception system with the incorporation of frequency dispersion is addressed in Section 3. The low complexity non-linear detection proposed is described in Section 4. The complete block structure of the transmitter and receiver is presented in Section 5. Computational complexity analysis is addressed in Section 6. The simulation results are presented in Section 7. Finally, the conclusions are stated in Section 8.

System Model
The frame structure in 802.11p consists of a preamble that contains 10 symbols, each one lasting 1.6 µs, found at the beginning of the frame and used to synchronize the system. Subsequently, two long training symbols are transmitted, each one lasting 6.4 µs, used for fine synchronization and channel estimation. The remaining part of the frame, which has a variable length, is used to transmit the payload. Depending on the modulation and coding scheme used, IEEE 802.11p permits data transmission speeds ranging from 3 to 27 Mbps [22]. The physical layer of 802.11p, shown in Figure 1, uses 64 subcarriers per OFDM symbol, including 48 data subcarriers, 4 pilot subcarriers located at the indexes {−21, −7, 7, 21}, 11 virtual subcarriers, and one subcarrier for the direct current (DC) component. Let $x_k[n]$ be the k-th transmitted OFDM symbol of $N_b = N + N_g$ samples, where $N$ and $N_g$ are the number of subcarriers and the length of the cyclic prefix (CP), respectively. Assuming that the CP is long enough to absorb the channel impulse response (CIR), the k-th received symbol $y_k[n]$ after removing the CP can be expressed in its complex baseband representation as:
$$y_k[n] = \sum_{l=0}^{L-1} h[n,l]\, x_k[(n-l)_N] + w[n] \tag{1}$$
where $n = \{0, ..., N-1\}$, $l = \{0, ..., L-1\}$, $L$ denotes the CIR length, $h[n,l]$ is the CIR for the k-th block at the n-th time instant for an impulse introduced $l$ samples earlier, and $w[n]$ is circularly symmetric additive white Gaussian noise (AWGN), with zero mean and variance $\sigma_w^2 = N_0/2$. The circular convolution between the CIR and $x_k[n]$ can be rewritten in matrix form as:
$$y_k = H_k x_k + w_k \tag{2}$$
where $y_k$, $x_k$ and $w_k$ stack the $N$ samples of the corresponding sequences. In addition, $H_k$ is an $N \times N$ matrix, indexed $0, 1, ..., N-1$ on each dimension, whose elements are formed with the CIR coefficients as follows:
$$[H_k]_{n,n'} = h[n, (n-n')_N] \tag{3}$$
where $n, n' = \{0, 1, ..., N-1\}$ and the CIR is assumed to be zero for $(n-n')_N > L-1$.
The OFDM symbol received in the frequency domain (FD) is obtained by multiplying both sides of (2) by the matrix of the normalized discrete Fourier transform (DFT):
$$[F]_{p,n} = \frac{1}{\sqrt{N}}\, e^{-j 2\pi p n / N}, \quad p, n = \{0, 1, ..., N-1\}$$
which gives the result:
$$u_k = F H_k x_k + z_k \tag{5}$$
where $u_k$ is the DFT of $y_k$ and $z_k$ is the DFT of the noise sequence. Since matrix $F$ is unitary, Equation (5) can be expressed as:
$$u_k = F H_k F^H F x_k + z_k = G_k s_k + z_k \tag{6}$$
where $s_k = F x_k$ is the OFDM symbol transmitted in the frequency domain, consisting of the data vector $\chi$ with $N_D$ data symbols, the pilot vector $s_P$ with $N_P$ pilot symbols, and $N_G$ guard symbols. $G = F H F^H$ is the channel frequency matrix (CFM). When the Doppler spread is insignificant, $G$ is a diagonal matrix and the system is ICI-free. In V2V environments, due to the high mobility of both the transmitter and the receiver, the combined effect of the Doppler shift of each individual received path results in significant Doppler dispersion. This causes the CIR to become time varying within an OFDM symbol, so that matrix $G$ has energy in the components outside the main diagonal, giving rise to ICI. An example of such a dispersed channel matrix is shown in Figure 2.
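The relationship between the time-domain channel matrix and the channel frequency matrix can be checked numerically. The sketch below is only an illustration under stated assumptions: the 3-tap profile, the per-tap normalized Doppler shifts and the block size N = 8 are hypothetical values, and each tap is modeled as a pure complex rotation, which is far simpler than a realistic V2V channel. It builds H from a CIR table `cir[n][l]`, forms G = F H F^H with a normalized DFT matrix, and measures the energy outside the main diagonal.

```python
import cmath


def dft_matrix(n):
    """Normalized (unitary) n-point DFT matrix."""
    scale = n ** -0.5
    return [[scale * cmath.exp(-2j * cmath.pi * p * q / n) for q in range(n)]
            for p in range(n)]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def hermitian(a):
    return [[a[i][j].conjugate() for i in range(len(a))] for j in range(len(a[0]))]


def time_domain_channel(cir):
    """cir[n][l] is tap l at time n; returns H with H[n][n'] = h[n, (n - n') mod N]."""
    n_samples = len(cir)
    h = [[0j] * n_samples for _ in range(n_samples)]
    for n in range(n_samples):
        for l, tap in enumerate(cir[n]):
            h[n][(n - l) % n_samples] += tap
    return h


def channel_frequency_matrix(cir):
    """G = F H F^H."""
    f = dft_matrix(len(cir))
    return matmul(matmul(f, time_domain_channel(cir)), hermitian(f))


def off_diagonal_energy(g):
    return sum(abs(g[i][j]) ** 2
               for i in range(len(g)) for j in range(len(g)) if i != j)


N = 8
TAPS = [1.0, 0.5, 0.25]        # hypothetical 3-tap power profile
DOPPLER = [0.3, -0.2, 0.1]     # hypothetical per-tap normalized Doppler shifts

# Time-invariant channel: the same taps at every time instant.
static = [[complex(t) for t in TAPS] for _ in range(N)]
# Time-varying channel: each tap rotates within the OFDM symbol.
varying = [[t * cmath.exp(2j * cmath.pi * nu * n / N) for t, nu in zip(TAPS, DOPPLER)]
           for n in range(N)]

print("static  :", off_diagonal_energy(channel_frequency_matrix(static)))
print("varying :", off_diagonal_energy(channel_frequency_matrix(varying)))
```

For the static case the off-diagonal energy is zero up to rounding error, since H is circulant and is therefore diagonalized by the DFT; for the time-varying case it is clearly non-zero, which is the ICI discussed in the text.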

Channel Estimation
The reduced number of pilots in a single OFDM symbol and the fact that they experience ICI complicate the task of estimating the time-varying CIR. To counteract this, the observation model of the channel estimator is extended to a sliding window that includes adjacent OFDM symbols as follows: where the superscript of each variable represents the position relative to the current k symbol. In order to carry out the channel estimation with this observation model, the algorithm proposed in [23] is used, where BEM is applied so as to obtain a compact representation of the CIR in the interval of the three OFDM symbols as follows: } are the functions that expand the time domain and the delay time, respectively. Given that in V2V scenarios the Doppler and delay dispersion presents statistics within a very diverse set, the discrete prolate spheroidal sequences (DPSS) are used as base functions since they optimally concentrate the energy in a finite time and bandwidth window. The modeling error for this representation in subspace is concentrated in the term ε[n, l].
To determine the number of functions needed in each of the CIR domains, we use the approach proposed in [24]: where · denotes the upward rounding operator, F S is the system's bandwidth, τ max is the maximum time delay spread and f D is the maximum frequency Doppler spread. The information from the BEM of channel in the frequency and frequency Doppler domains are found compactly in the doubly-indexed matrix: with B k r,q n,n and the BEM coefficients given by: Substituting the channel for its BEM in (6) yields the expression: Annexing this representation of the channel to the Equation (7) and considering only the positions where the transmitted and received pilots are found, yields the following: where: the subscript P refers to the sub-sample of the vectors and matrices in the rows and columns corresponding to the pilots' position. For reasons of simplification in the notation, the indexed variable i = q + M D (r − 1) was used, where 0 ≤ r ≤ M τ and 0 ≤ q ≤ M D − 1.
P is the vector that concentrates the noise contributions, modeling error and intersymbol interference in order to simplify the expressions.
It is assumed that the receiver has matrix $\Lambda$ and the received vector $u_P$, such that the channel's estimated coefficient vector can be obtained by the least-squares (LS) algorithm [23]:
$$\hat{\rho} = (\Lambda^H \Lambda)^{-1} \Lambda^H u_P$$
Once these coefficients are obtained, any of the representations, such as the time-varying impulse response and the channel transfer function, can be calculated directly by computing the weighted sum of the base functions. In this way, the frequency-Doppler and frequency response matrix can be calculated by using the expression:

DFT Dispersion
The V2V channel selectivity makes OFDM systems susceptible to detection errors because the instantaneous power of some subcarriers can be low due to deep fading, which makes it difficult to detect the transmitted data. To counteract this problem, precoding by dispersion in frequency (Discrete Fourier Transform Spreading: DFTS) is used in this work because it uniformly distributes each symbol's energy over the entire bandwidth and because its implementation, by means of the Fast Fourier Transform (FFT), is low in complexity. This operation can be represented formally on the transmitter side as follows:
$$s_D = F_D\, \chi$$
That is, the elements transmitted in frequency in the data positions are built by applying the Fourier matrix to the data symbols in vector $\chi$. The elements of the Fourier matrix are determined by:
$$[F_D]_{d,d'} = \frac{1}{\sqrt{N_D}}\, e^{-j 2\pi d d' / N_D}$$
where $d, d' = \{0, 1, ..., N_D - 1\}$. Because the data detection will only be carried out in the data subcarrier indexes, the original signal model in (4) is reduced to the following expression:
$$u_D = G_D F_D\, \chi + G_{D,P}\, s_P + z_D$$
where $u_D$ and $z_D$ are the received signal vector and the noise vector, respectively, each in the positions of the data subcarriers. The matrix $G_D$ is obtained by taking the rows and columns of the data carrier positions. The term $G_{D,P}\, s_P$ represents the interference that the pilot carriers cause on the data carriers.
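The energy-spreading property of the precoder can be illustrated with a few lines of code. The sketch below is a minimal example, assuming a normalized DFT as the spreading matrix and a toy block of 8 data positions; the function names `spread` and `despread` are hypothetical. A direct matrix product is used for clarity, whereas a practical transmitter would use an FFT.

```python
import cmath


def dft_matrix(n):
    """Normalized n-point DFT matrix, used here as the spreading matrix F_D."""
    scale = n ** -0.5
    return [[scale * cmath.exp(-2j * cmath.pi * d * e / n) for e in range(n)]
            for d in range(n)]


def spread(chi):
    """Transmitter side: s_D = F_D chi."""
    n = len(chi)
    f = dft_matrix(n)
    return [sum(f[d][e] * chi[e] for e in range(n)) for d in range(n)]


def despread(s):
    """Receiver side: chi = F_D^H s_D (F_D is unitary)."""
    n = len(s)
    f = dft_matrix(n)
    return [sum(f[d][e].conjugate() * s[d] for d in range(n)) for e in range(n)]


# A single active data symbol: after spreading, every subcarrier carries
# the same share of its energy.
chi = [1 + 1j] + [0j] * 7
s_d = spread(chi)
print([round(abs(x), 3) for x in s_d])
print([complex(round(x.real, 3), round(x.imag, 3)) for x in despread(s_d)])
```

Because each data symbol is present on every subcarrier with equal magnitude, a deep fade on one subcarrier removes only a fraction of the energy of each symbol rather than one symbol entirely.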

Non-Linear Detection on DFTS-OFDM System
The main contribution of this paper is the efficient integration of DFTS precoding into the non-linear data detection process in the receiver. This has to be emphasized: one of the problems when using DFTS is the difficulty of applying non-linear detection algorithms in the receiver while maintaining low computational complexity, since the equivalent channel matrix $G_D F_D$ does not conserve a band structure. One solution to this issue is to find an operator that, upon being applied to the received signal, re-establishes the band structure of the equivalent channel matrix. In this sense, the inverse Fourier transform $F_D^H$ is used to complete the Cramer-Loève operator in the $G_D$ channel matrix. In mathematical terms, after the contribution of the known pilots $G_{D,P}\, s_P$ has been removed, the vector received in the data positions is obtained as:
$$v = F_D^H (u_D - G_{D,P}\, s_P) = K \chi + z_D \tag{26}$$
where
$$K = F_D^H G_D F_D$$
is the equivalent channel matrix after applying the inverse Fourier transform to the received symbols in the data positions with linear precoding. In order to simplify the notation, the transformed noise vector $F_D^H z_D$ keeps the name $z_D$, since an orthonormal transform does not affect its statistics. Notice that the correlation characteristics and quasi-band structure of matrix $G_D$ imply that matrix $K$ also possesses a quasi-band structure after the transformation, as shown in Figure 3. This structure in particular allows detection algorithms to reduce their computational complexity while performing close to the ML detector in terms of BER.

Maximum Likelihood Detection Criteria
Data detection using the maximum likelihood criterion can be determined from Equation (26) by finding the vector $\chi$ that minimizes the following metric:
$$\hat{\chi} = \arg\min_{\chi \in \Omega^{N_D}} \| v - K \chi \|^2 \tag{27}$$
where $\Omega$ is the constellation used for data modulation and $\Omega^{N_D}$ is the set containing all possible combinations of the symbols transmitted through the $N_D$ data subcarriers. This data detection method is highly computationally complex due to the exhaustive calculation of all the Euclidean distances needed to estimate the vector $\chi$. In order to reduce the complexity required by the detection, a truncation can be applied to the matrix $K$, keeping $B$ non-zero diagonals. Their number is determined using the following equation:
$$B = 2\lambda + 1 + 2\lambda_c \tag{28}$$
where the $B$ diagonals are distributed in 3 bands: one band is formed by the main diagonal and $2\lambda$ adjacent diagonals, the second band is formed by the $\lambda_c$ diagonals located in the upper-right corner, and the third band is formed by the $\lambda_c$ diagonals located in the lower-left corner. The rest of the diagonals are truncated to zero. The structure of the truncated $K$ matrix is shown in Figure 3.
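The three-band truncation described above can be sketched as a mask over the matrix indices. This is a minimal illustration, assuming that the central band holds the main diagonal plus λ diagonals on each side and that each corner band holds the λ_c outermost diagonals; the function names are hypothetical. The example uses N_D = 48, λ = 8 and λ_c = 5, the values reported later in the simulation parameters.

```python
def band_mask(n, lam, lam_c):
    """True where an element survives the truncation of an n x n matrix."""
    mask = [[False] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            offset = j - i                       # diagonal index of element (i, j)
            central = abs(offset) <= lam         # main diagonal and 2*lam neighbours
            corner = abs(offset) >= n - lam_c    # upper-right / lower-left corners
            mask[i][j] = central or corner
    return mask


def truncate(k, lam, lam_c):
    """Return a copy of k with every element outside the three bands set to zero."""
    mask = band_mask(len(k), lam, lam_c)
    return [[k[i][j] if mask[i][j] else 0j for j in range(len(k))]
            for i in range(len(k))]


def kept_diagonals(n, lam, lam_c):
    """Number of distinct diagonals that keep at least one element."""
    mask = band_mask(n, lam, lam_c)
    return len({j - i for i in range(n) for j in range(n) if mask[i][j]})


print(kept_diagonals(48, 8, 5))   # 2*8 + 1 + 2*5 diagonals survive
```

The corner bands are kept because the frequency-domain leakage is circular: a subcarrier at one edge of the block interferes with subcarriers at the opposite edge.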

ICI Mitigation
As in related works [3,23,25], an iterative receiver is proposed in which the estimated data are reused to perform interference cancellation on the data subcarriers, attaining an improved channel estimate in the next iteration. Mathematically, the signal used for detection in the it-th iteration can be described recursively as:
$$u_{k,it} = u_k - [\hat{G}_k]_T\, s_{k,it-1}$$
where the sub-index $it = \{0, 1, 2, ...\}$ denotes the iteration number; $u_{k,it}$ is the k-th received signal at iteration $it$, $[\hat{G}_k]_T$ is the estimated channel matrix with null elements in the main diagonal, and $s_{k,it-1}$ is the symbol vector rebuilt from the data estimated in the previous iteration.
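One cancellation step of this kind can be sketched as follows. The example is a simplified illustration, not the receiver of this paper: it assumes a perfectly known channel matrix and a perfect previous decision, and the 3 x 3 values and the function name `cancel_ici` are hypothetical.

```python
def cancel_ici(u, g_hat, s_prev):
    """Subtract the off-diagonal (ICI) part of g_hat applied to the previous estimate."""
    n = len(u)
    return [u[r] - sum(g_hat[r][c] * s_prev[c] for c in range(n) if c != r)
            for r in range(n)]


# Hypothetical 3-subcarrier example with a dominant diagonal and weak leakage.
G = [[1.0 + 0j, 0.2j, 0.05],
     [0.1, 0.9 + 0j, 0.2j],
     [0.05j, 0.1, 1.1 + 0j]]
s = [1 + 0j, -1 + 0j, 1j]
u = [sum(G[r][c] * s[c] for c in range(3)) for r in range(3)]   # noise-free reception

cleaned = cancel_ici(u, G, s)
print(cleaned)   # each entry reduces to G[r][r] * s[r]
```

With imperfect decisions or channel estimates the cancellation is only partial, which is why such a step is normally repeated over several iterations.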

Low Complexity Non-Linear Detection
The solution in (27) of the proposed signal model (26), although optimal, is not practical for implementation due to its high computational complexity. This section describes a methodology for non-linear detection of linearly precoded data; suitable for the signal model described in (26). The detection process consists of two stages; first, the sorted QR decomposition of the channel matrix K is performed. Subsequently, with the help of this decomposition, the proposed non-linear detectors of low computational complexity are executed. Each of the above algorithms is now described.

Sorted QR Decomposition
QR decomposition is used to obtain the matrices $Q$ and $R$ from the precoded channel matrix $K$. This decomposition must comply with the following relationship:
$$K P = Q R \tag{30}$$
where $Q$ is an orthonormal matrix that fulfils the property $Q^H Q = I$, $R$ is an upper triangular matrix, and $P$ is a permutation matrix whose reordering depends on the signal-to-interference ratio. This decomposition can be accomplished using different methods; in this article, the method based on Givens unit rotations is used [26]. In each step of the orthonormalization process that determines the $Q$ matrix, the columns are permuted so that the resulting rows of the $R$ matrix are ordered according to their signal-to-interference ratio. The quasi-band structure of the $K$ matrix is exploited in order to reduce, as much as possible, the number of Givens rotations required for the calculation of the QR decomposition.
To accomplish the mentioned decomposition according to the zero-forcing (ZF) criterion, the following extended matrix is defined: to which a sequence of unit rotations is applied. When the minimum mean square error (MMSE) criterion is used, the extended matrix is constructed including the noise statistics as follows: The advantage of using unit rotations in the orthogonalization process to obtain the QR decomposition is that they conserve the energy of all the elements of the original $K$ matrix, bounding the dynamic range of all the variables used in the process. This characteristic makes it easier to implement this method in real-time devices using fixed-point arithmetic. To simplify the notation in the successive process of Givens rotations, the following notation: is defined. Each Givens rotation is described by a matrix $\Theta_j$, which is calculated to cancel a non-zero element of the matrix $X^{(j-1)}$ obtained in the previous iteration. Therefore, the process of generating the upper triangular matrix $R$ requires a sequence of Givens rotations to be applied to the expanded matrices (31) and (32) as follows: which ultimately yields for the ZF criterion; for the MMSE criterion the following representation is obtained: The detailed description of the Sorted QR algorithm is given in [27], with the only difference being that the rotation is omitted when a null element is found in the truncated channel matrix $K$.
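The two properties highlighted above, energy conservation and the skipping of rotations on null elements, can be observed in a small experiment. The sketch below is a plain, unsorted Givens triangularization and not the Sorted QR algorithm of [27]: it omits the column permutations, the extended matrices and the accumulation of Q. The function names and the tridiagonal test matrix are hypothetical.

```python
import math


def givens(a, b):
    """Return (c, s) so that [[conj(c), conj(s)], [-s, c]] maps (a, b) to (r, 0)."""
    r = math.hypot(abs(a), abs(b))
    if r == 0.0:
        return 1.0 + 0j, 0j
    return a / r, b / r


def rotate_rows(row_p, row_q, c, s):
    """Apply the unitary 2 x 2 rotation to a pair of rows."""
    new_p = [c.conjugate() * x + s.conjugate() * y for x, y in zip(row_p, row_q)]
    new_q = [-s * x + c * y for x, y in zip(row_p, row_q)]
    return new_p, new_q


def triangularize(matrix):
    """Zero the sub-diagonal part with Givens rotations; return (R, rotations used)."""
    x = [[complex(v) for v in row] for row in matrix]
    n_rows, n_cols = len(x), len(x[0])
    used = 0
    for col in range(min(n_rows, n_cols)):
        for row in range(n_rows - 1, col, -1):
            if x[row][col] == 0:
                continue                      # null element: rotation omitted
            c, s = givens(x[col][col], x[row][col])
            x[col], x[row] = rotate_rows(x[col], x[row], c, s)
            x[row][col] = 0j                  # remove rounding residue
            used += 1
    return x, used


def energy(m):
    return sum(abs(v) ** 2 for row in m for v in row)


BANDED = [[2, 1, 0, 0],
          [1, 2, 1, 0],
          [0, 1, 2, 1],
          [0, 0, 1, 2]]
R, rotations = triangularize(BANDED)
print("rotations used:", rotations, "of", 4 * 3 // 2, "for a full matrix")
print("energy before/after:", energy(BANDED), round(energy(R), 9))
```

For this tridiagonal input only the three sub-diagonal elements require a rotation, while a full 4 x 4 matrix would need six; the total energy of the matrix is unchanged by the rotations, which is the property that keeps the dynamic range bounded.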

Two Approaches for Non-Linear Data Detection
Once the preprocessing of the received signal, and the QR decomposition of the estimated channel matrix, have been completed, the next step is to perform data detection. For accomplishing this task, this work considered the following two non-linear detectors:

OSIC Detection
The ordered successive interference cancellation (OSIC) algorithm is an effective method for cancelling the ICI. This method is suitable for the proposed model in Equation (26) combined with the QR decomposition described above, allowing sub-optimal detection of the data with very low computational complexity. Substituting this decomposition in Equation (26) results in:
$$v = Q R P^T \chi + z_D$$
Pre-multiplying both sides of this equation by $Q^H$ leads to the following system of equations:
$$\tilde{v} = R \tilde{\chi} + z_D$$
where $\tilde{v} = Q^H v$, and $\tilde{\chi} = P^T \chi$ is the vector of the precoded data, sorted in decreasing order of energy. The noise vector $z_D$ maintains its statistics because $Q$ is unitary. Due to the triangular structure of $R$, the elements of the vector $\tilde{v}$ can be expressed individually as:
$$[\tilde{v}]_a = [R]_{a,a} [\tilde{\chi}]_a + \sum_{b=a+1}^{N_D} [R]_{a,b} [\tilde{\chi}]_b + [z_D]_a$$
where the notations $[\cdot]_a$ and $[\cdot]_{a,b}$ indicate the a-th element of the vector and the b-th element of the a-th row of the matrix, respectively. In this way, each data symbol can be detected iteratively using the following expression:
$$[\hat{\tilde{\chi}}]_a = \mathcal{Q}\left\{ \frac{[\tilde{v}]_a - \sum_{b=a+1}^{N_D} [R]_{a,b} [\hat{\tilde{\chi}}]_b}{[R]_{a,a}} \right\}$$
where $\mathcal{Q}\{\cdot\}$ is a decision operator that maps its argument to the closest point in the constellation $\Omega$ used by the transmitter. Assuming that the previous decisions are correct at each iteration, the interference of the previously detected symbols can be subtracted from the current symbol to be detected.
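The successive detection over the triangular system can be sketched in a few lines. This is a minimal illustration that assumes R and the rotated observation are already available; the normalized 4-QAM alphabet, the 3 x 3 matrix and the function names are hypothetical choices made for the example.

```python
# Normalized 4-QAM alphabet (illustrative choice).
QAM4 = [complex(re, im) / 2 ** 0.5 for re in (-1, 1) for im in (-1, 1)]


def slicer(value, constellation=QAM4):
    """Decision operator: nearest constellation point."""
    return min(constellation, key=lambda point: abs(value - point))


def osic_detect(r, v_tilde, constellation=QAM4):
    """Back-substitution with a hard decision at every level, last row first."""
    n = len(v_tilde)
    chi = [0j] * n
    for a in range(n - 1, -1, -1):
        # Interference of the symbols already decided (columns b > a).
        interference = sum(r[a][b] * chi[b] for b in range(a + 1, n))
        chi[a] = slicer((v_tilde[a] - interference) / r[a][a], constellation)
    return chi


# Hypothetical upper-triangular system, noise-free.
R = [[2.0, 0.5, 0.1j],
     [0.0, 1.5, -0.3],
     [0.0, 0.0, 1.0]]
sent = [QAM4[0], QAM4[3], QAM4[1]]
v_tilde = [sum(R[a][b] * sent[b] for b in range(3)) for a in range(3)]

print(osic_detect(R, v_tilde) == sent)
```

Each level needs only one division and one inner product over the non-zero part of a row of R, so a banded R keeps the per-symbol cost low. The weakness of the scheme is error propagation: a wrong decision at one level contaminates all the levels detected after it.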

Near ML Detection
The ML detection using QR decomposition can be reformulated in the following way:
$$\hat{\tilde{\chi}} = \arg\min_{\tilde{\chi} \in \Omega^{N_D}} \| \tilde{v} - R \tilde{\chi} \|^2 \tag{43}$$
The search for the ML solution of the vector $\tilde{\chi}$ based on the criterion established in Equation (43) can be represented graphically by the tree shown in Figure 4. At the n-th level there are $q = N_c^{N_D - n + 1}$ possible candidates for $[\tilde{\chi}]_n$, where $n = \{N_D, N_D - 1, \cdots, 1\}$. The total metric needed to determine the vector with the minimum distance in accordance with Equation (43) is defined as:
$$d(\tilde{\chi}) = \sum_{n=1}^{N_D} \left| [\tilde{v}]_n - \sum_{b=n}^{N_D} [R]_{n,b} [\tilde{\chi}]_b \right|^2 \tag{44}$$
where $\tilde{\chi} = [\tilde{\chi}_1, \tilde{\chi}_2, \cdots, \tilde{\chi}_{N_D}]$ corresponds to the trajectory formed by the nodes at the $n$ levels of the tree. The selected trajectory minimizes the distance calculated in Equation (44). The symbol vector up to the n-th level is defined as:
$$\tilde{\chi}^{(n)} = \left[ [\tilde{\chi}]_n, [\tilde{\chi}]_{n+1}, \cdots, [\tilde{\chi}]_{N_D} \right]$$
with length $N_D - n + 1$. Adjusting Equation (44) to calculate the partial metric for this vector, the modification is defined as:
$$d_n(\tilde{\chi}^{(n)}) = d_{n+1}(\tilde{\chi}^{(n+1)}) + \left| [\tilde{v}]_n - \sum_{b=n}^{N_D} [R]_{n,b} [\tilde{\chi}]_b \right|^2, \quad d_{N_D+1} = 0 \tag{47}$$
where $d_n(\tilde{\chi}^{(n)})$ is the accumulated branch metric of the node $[\tilde{\chi}]_n$ that has $[\tilde{\chi}]_{N_D}, \ldots, [\tilde{\chi}]_{n+1}$ as its predecessor nodes. This distance represents the sum of all the branch metrics from the root (level $N_D$) to the node indicated at the n-th level. The QRD-M algorithm, based on the QR decomposition, reaches a BER performance similar to that of the ML detector [27]. The QRD-M algorithm is a transversal tree search algorithm. At the n-th level, the algorithm maintains $M$ possible candidates before selecting the symbol. Subsequently, the decision on the vector $\tilde{\chi}$ is made once the $N_D$ levels have been processed. The search process in the QRD-M algorithm, used to detect the symbols, is run sequentially and is initiated at the last level ($n = N_D$). The algorithm calculates the metric $d_n$, defined in Equation (47), for all the possible values of $\chi_i \in \Omega$. The distances and the nodes are then arranged in ascending order, and only the $M$ nodes with the least distance are maintained while the rest are discarded.
The same procedure is used for the next level and continues until the first level ($n = 1$) is processed. The performance of the QRD-M algorithm depends on the established value of $M \le N_c$; a higher value makes it more probable that the optimal branch is included among the branches selected at the n-th level. In the search process at each level, the tree extends to $p = M N_c$ branches, and their corresponding Euclidean distances are calculated in order to select the $M$ surviving branches at that level, as illustrated in Figure 5. This value is much lower than the number of branches $q = N_c^{N_D - n + 1}$ required by the ML algorithm. Additionally, in our proposal, we introduce heuristics so that the number of surviving branches $M$ per level can be adapted. This value is adjusted during the search process run at each level of the tree. With this modification, the detector complexity is variable, and for a high signal-to-noise ratio the savings in computational complexity are significant, while the detector's performance remains close to that of the ML detector in terms of BER.
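The fixed-M version of the search described above can be sketched as follows. This is a generic QRD-M illustration with a constant number of survivors and does not include the adaptive heuristics proposed in this work; the 4-QAM alphabet, the 3 x 3 triangular matrix and the function name are hypothetical.

```python
QAM4 = [complex(re, im) / 2 ** 0.5 for re in (-1, 1) for im in (-1, 1)]


def qrd_m_detect(r, v_tilde, m, constellation=QAM4):
    """Breadth-first tree search keeping the m best partial paths per level."""
    n = len(v_tilde)
    # Each survivor is (accumulated metric, symbols for indices level..n-1).
    survivors = [(0.0, [])]
    for level in range(n - 1, -1, -1):
        candidates = []
        for metric, tail in survivors:
            for symbol in constellation:
                path = [symbol] + tail            # path[i] belongs to index level + i
                estimate = sum(r[level][level + i] * path[i] for i in range(len(path)))
                branch = abs(v_tilde[level] - estimate) ** 2
                candidates.append((metric + branch, path))
        candidates.sort(key=lambda item: item[0])  # ascending accumulated distance
        survivors = candidates[:m]                 # discard all but the m best
    return survivors[0][1]


R = [[2.0, 0.5, 0.1j],
     [0.0, 1.5, -0.3],
     [0.0, 0.0, 1.0]]
sent = [QAM4[2], QAM4[0], QAM4[3]]
v_tilde = [sum(R[a][b] * sent[b] for b in range(3)) for a in range(3)]

print(qrd_m_detect(R, v_tilde, m=2) == sent)
```

With m = 1 the search keeps a single path and behaves like successive cancellation, while a value of m large enough to keep every partial path reproduces the exhaustive ML search; intermediate values trade complexity for the probability of retaining the optimal branch.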
Exploiting the structure of the $R$ matrix, the estimation at the last level $N_D$ indicates the symbol of the constellation $\Omega$ with the least distance $[d]_\beta$. The symbol search then continues for $\tilde{\chi}_1, \cdots, \tilde{\chi}_{N_D - 1}$ using the QRD-M algorithm described previously, assuming that $[\tilde{\chi}]_{N_D} = \chi^{(1)}_{N_D}$. The search result is stored in the vector $\tilde{\chi}^{(1)} = [\chi^{(1)}_1, \cdots, \chi^{(1)}_{N_D}]$, with a distance calculated using
$$d^{(1)} = \| \tilde{v} - R \tilde{\chi}^{(1)} \|^2$$
where the superscripts of $d^{(1)}$ and $\chi^{(1)}$ correspond to the assumption $[\tilde{\chi}]_{N_D} = \chi^{(1)}_{N_D}$. A new search is run in the tree with $[\tilde{\chi}]_{N_D} = \chi^{(2)}_{N_D}$ only if the following condition is met: $[d]_2 < d^{(1)}$ (after the first phase, $d^{(\beta^*)} = d^{(1)}$). If the condition is not met, the new search is not run, since it would start with an initial distance greater than $d^{(\beta^*)}$. The complexity of the search for the optimal vector $\tilde{\chi}$ in the tree can be reduced further by calculating the partial distance metric up to the n-th level defined in Equation (47): if this partial metric already exceeds $d^{(\beta^*)}$, the search is cancelled at that level and restarted with the next phase, assuming that $[\tilde{\chi}]_{N_D} = \chi^{(\beta+1)}_{N_D}$. The algorithm finalizes the search for the optimal vector $\tilde{\chi}$ when $d^{(\beta^*)} \le [d]_\beta$. A complete description of the Near ML V2V algorithm that includes the previously described procedure is presented in Algorithm 1.

Computational Structure
The architecture of the transmitter and the receiver are summarized in Figures 6 and 7, respectively. The transmitter conserves the structure of a conventional OFDM system for the standard 802.11p. The only difference is the DFTS block, applied to the data before the OFDM modulation is completed. The receiver is composed of four main stages. First, the conventional OFDM demodulation is executed with the help of the FFT block, followed by the demapping of the OFDM symbols. The second stage consists of estimating the channel with the use of the pilot symbols. The channel estimator block has the channel matrices G and K as output, necessary to carry out the ICI mitigation and data estimation, respectively. In the third stage, the D T truncation is carried out once the B bands of the matrix K are selected. Afterward, the QR decomposition of the truncated matrix K under the ZF or MMSE criterion is carried out. Next, the vectorṽ is calculated by multiplying v and Q H . The following step is to carry out either OSIC or Near-ML non-linear detection. At the final stage, bit error correction is performed on the estimated vectorχ. The estimated vector enters a feedback loop where ICI mitigation is performed for the data subcarriers. This process reduces the channel estimation error and improves the detection of the data. The process is repeated until the number of iterations configured in the receiver is executed.

Computational Complexity
The computational complexity is first presented for the QR decomposition; it is given in terms of O(·) and verified by counting the number of complex operations, where Givens rotations are implemented using conventional arithmetic. The parameters considered for the system are: As can be observed in Table 1, the complexity obtained in this proposal, in terms of complex operations, is quadratic with respect to the number of bands $B$ and linear with respect to the number of data subcarriers $N_D$. Systems with similar orders of complexity can be found in [8,14], but the work proposed in this paper provides a significant reduction in complexity with respect to the conventional linear MMSE criterion, which is reported in the literature as $O(N_D^3)$. A system that uses the real model and LDL decomposition with a complexity similar to $O(B^2 N_D)$ was proposed in [8] for an OFDM system in doubly dispersive channels. However, that work does not improve the BER performance when compared to conventional MMSE detection. In the case of the OSIC and Near ML detectors, the complex operations needed to detect the $N_D$ symbols transmitted by the system were assessed. The number of required complex operations was counted as a function of the signal-to-noise ratio, both for the ZF criterion and for the MMSE criterion. The results of these simulations are presented in Figure 8. Figure 8 shows the complexity, in terms of complex operations, required by both proposed detectors. For the OSIC detector, using the MMSE criterion in the calculation of the QR decomposition significantly reduces the complexity of the data detection task. For the Near ML detector, neither of the two criteria for the QR decomposition affects the complexity of finding the optimal vector. The complexity of the OSIC detector is constant and does not depend on the receiver's SNR.
This approach presents the lowest complexity, but its performance is not the best, as shown in Figure 9. For the Near ML detector, Figure 8 shows that the complexity tends towards a low constant value from an SNR of 15 dB onward, where it is very close to the complexity of the OSIC detector. Its BER performance, on the other hand, is quite close to that of the ML detector, as discussed in detail in the following section.

Table 2 summarizes the complexity required by the most representative detection algorithms used in doubly selective channels (DSC), to give the reader an overview of the computational cost of the compared approaches. Table 2 shows that the computational cost of Near ML detection is close to that of Sorted OSIC detection. LMMSE detection requires approximately three times the computational cost of OSIC and Near ML detection. Compared with Full-ML detection, the proposed algorithms reduce the cost by a factor on the order of $10^{25}$.
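As a rough cross-check of these orders, the following sketch compares only the leading terms $B^2 N_D$ and $N_D^3$, using $B = 27$ and the two system sizes mentioned in the abstract. It ignores all constants and lower-order terms, so it is an order-of-magnitude illustration under that assumption rather than a reproduction of Tables 1 and 2; the function name is hypothetical.

```python
def leading_order_costs(n_d, b):
    """Leading-order operation counts, constants and lower terms ignored."""
    banded_qr = b ** 2 * n_d   # O(B^2 N_D): banded QR-based detection
    lmmse = n_d ** 3           # O(N_D^3): conventional LMMSE detection
    return banded_qr, lmmse

ratios = {}
for n_d in (48, 512):
    banded_qr, lmmse = leading_order_costs(n_d, b=27)
    ratios[n_d] = lmmse / banded_qr
    print(f"N_D = {n_d}: LMMSE / banded QR = {ratios[n_d]:.1f}")
# N_D = 48:  ratio of about 3.2
# N_D = 512: ratio of about 359.6
```

With these leading terms alone, the ratio is close to three for 48 data subcarriers and exceeds two orders of magnitude for 512, which is consistent with the figures quoted in the text.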

Simulation Results
The numerical results presented below were obtained from a simulator implemented in Matlab-Simulink, replicating simulation environments compatible with the 802.11p link [11]. The blocks that describe the internal signal processing algorithms of the transmitter and the iterative receiver were discussed in Section 5.
The 802.11p standard uses a bandwidth of BW = 10 MHz and a cyclic prefix of CP = 16 samples, so the system can absorb a maximum CIR duration of 1.6 µs in the V2V channel. Of the 64 subcarriers that compose an OFDM symbol, 48 are used for data transmission. This allows the use of eight modulation schemes, which support transmission rates between 3 and 27 Mbps, and a frame length of $L_F = 37$ OFDM symbols.
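The timing figures above follow from simple arithmetic, sketched below. The sketch assumes that the sampling rate equals the 10 MHz bandwidth and that the lowest and highest rates correspond to BPSK with rate 1/2 coding and 64-QAM with rate 3/4 coding, as in the 802.11 OFDM physical layer; these two endpoint schemes are not stated in this section and are included only for illustration.

```python
BW_HZ = 10e6    # channel bandwidth, taken as the sampling rate
N_FFT = 64      # subcarriers per OFDM symbol
N_CP = 16       # cyclic prefix length in samples
N_DATA = 48     # data subcarriers

t_sample = 1 / BW_HZ                         # 0.1 us per sample
cp_duration = N_CP * t_sample                # 1.6 us guard interval
symbol_duration = (N_FFT + N_CP) * t_sample  # 8 us per OFDM symbol

def data_rate_mbps(bits_per_subcarrier, code_rate):
    """Coded data rate for one modulation and coding scheme."""
    bits = N_DATA * bits_per_subcarrier * code_rate
    return bits / symbol_duration / 1e6

lowest = data_rate_mbps(1, 1 / 2)    # BPSK, rate 1/2 (assumed endpoint)
highest = data_rate_mbps(6, 3 / 4)   # 64-QAM, rate 3/4 (assumed endpoint)
print(f"CP: {cp_duration * 1e6:.1f} us, "
      f"rates: {lowest:.0f} to {highest:.0f} Mbps")
```

The same function gives 6 Mbps for 4-QAM with rate 1/2 coding.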
In this article, we present results using the following parameters: a 4-QAM modulation scheme, $N_c = 4$ for the Near ML search tree, convolutional coding with constraint length $L_c = 7$ and code rate $R_c = 1/2$, and $B = 27$ main diagonals, with $\lambda = 8$ and $\lambda_c = 5$. The channel was implemented using the filtered method explained in [28]. The channel power delay profile (PDP) consists of six uncorrelated paths defined by $p(\tau) = \delta(\tau - m\tau_0)\, e^{-\tau/(10\tau_0)}$, where $0 < \tau < 1.6$ µs, $\tau_0 = 0.1$ µs and $m = \{0, 1, \ldots, \}$. The parameters used to generate the V2V channel are reported in [3], where the V2V channel models consider vehicle velocities of $v = 100$ km/h. The number of OFDM symbols transmitted for each evaluated SNR level is $1.6 \times 10^5$.

Figure 10 shows the BER vs. $E_b/N_0$ in a non-line-of-sight (NLOS) vehicular scenario. To emulate this scenario, a Rayleigh fading channel was considered, with an exponentially decreasing power delay profile with root mean square (RMS) delay $\tau_{RMS} = 0.4$ µs and $f_D = 1$ kHz. This model is similar to the one called "RTV-Expressway" described in [2]. The dotted lines show the system performance with linear detectors, using the pilot assignment shown in Figure 1 for the iterative estimation of the channel. The solid line shows the performance of the proposed system with Near ML non-linear detection. The tests show that the linear detection approach needs at least three iterations to avoid an error floor, whereas the proposed Near ML non-linear detection needs no iteration to achieve this. The proposed Near ML detection clearly outperforms the LMMSE detection approach proposed in [3]. To quantify more clearly the performance of Near ML and OSIC detection with respect to DFTS-OFDM-LMMSE detection, tests were carried out without the convolutional encoder.
Additionally, both OSIC and Near ML detection were evaluated using QR decomposition under the ZF and MMSE criteria. Figure 9 shows the BER vs. SNR comparison of the proposed algorithms that include frequency dispersion. With the exception of ZF-OSIC detection, the proposed algorithms perform better than LMMSE detection with an ideal channel. In particular, above SNR = 15 dB, the MMSE-OSIC suboptimal detection and the MMSE-Near ML detection outperform the LMMSE detection by 2.5 and 5 dB, respectively. It is important to mention that neither proposed detector exhibits the undesirable error floor.

Figure 11 shows the performance of the proposed iterative receiver. With just two iterations, the proposed receiver achieves a BER performance similar to that of the ideal channel. This is a substantial decrease with respect to the receivers reported in the state of the art, which require at least five iterations to achieve a performance similar to the ICI-free condition.

Conclusions
This article has presented a low-complexity receiver that both extracts frequency diversity and efficiently mitigates ICI in V2V communication systems. It was shown that reception in challenging high-mobility environments can be achieved with the proposed scheme while keeping the computational complexity manageable. The proposal consists of an efficient treatment of the data frequency dispersion, a time-varying channel estimation using the two-dimensional basis expansion model, and sub-optimal non-linear detectors. The results show a performance close to that of the ML algorithm. Furthermore, with only two iterations and low complexity, the proposed system performed as well as or better than, in terms of BER, other approaches presented as the state of the art, confirming its capacity to exploit the available frequency diversity in doubly selective channels.

Data Availability Statement:
The data presented in this study are available on request from the corresponding author.