Article

V2X Wireless Technology Identification Using Time–Frequency Analysis and Random Forest Classifier

COSYS-LEOST, University Gustave Eiffel, IFSTTAR, F-59650 Villeneuve d’Ascq, France
* Author to whom correspondence should be addressed.
Sensors 2021, 21(13), 4286; https://doi.org/10.3390/s21134286
Submission received: 7 May 2021 / Revised: 18 June 2021 / Accepted: 21 June 2021 / Published: 23 June 2021
(This article belongs to the Special Issue Signal Processing in Radar and Wireless Communication Systems)

Abstract

Signal identification is of great interest for various applications such as spectrum sharing and interference management. A typical signal identification system can be divided into two steps: a feature vector is first extracted from the received signal, and a classification algorithm then makes a decision based on its observed values. Some existing techniques show good performance, but they are either sensitive to the noise level or computationally complex. In this paper, a machine learning algorithm is proposed for the identification of vehicular communication signals. The feature vector is made up of Instantaneous Frequency (IF) features obtained from a time–frequency (TF) analysis. Its dimension is then reduced using the Singular Value Decomposition (SVD) technique, before being fed into a Random Forest classifier. Simulation results show the relevance and the low complexity of IF features compared to existing cyclostationarity-based ones. Furthermore, we found that the same accuracy can be maintained regardless of the noise level. The proposed framework thus provides a more accurate, robust and less complex V2X signal identification system.

1. Introduction

Intelligent Transport Systems (ITS) play a significant role in improving road safety and optimizing traffic management. They rely on advanced wireless technologies to share a large amount of data collected from hundreds of embedded sensors. These information exchanges are referred to as Vehicle-to-Everything (V2X) communications, and they encompass all the communications between a vehicle and its environment [1].
Two major wireless technologies have emerged to ensure this connectivity. On the one hand, ITS-G5 has been developed by the European Telecommunications Standards Institute (ETSI), based on the IEEE 802.11p access layer. It is an extension of the general WiFi standard optimized for vehicular environments [2]. On the other hand, Cellular Vehicle-to-Everything (C-V2X) communications were introduced by the Third Generation Partnership Project (3GPP) in Release 14 of the Long-Term Evolution (LTE) standard [3], then extended in Release 16 with the introduction of 5G New Radio (NR) [4].
The coexistence of ITS-G5 and C-V2X technologies will help satisfy the specific requirements of transport services in terms of latency, reliability and coverage. However, several challenges will arise, since both operate in the 5.9 GHz band. One solution to facilitate this coexistence and avoid interference consists of detecting and identifying the wireless technology, then dynamically selecting the appropriate transmission channel. The efficiency of spectrum usage therefore relies on the ability of the ITS station to accurately identify the received signal [5].
Signal identification has been an active research topic over the last two decades. In the context of cognitive radio, the classification of digitally modulated signals has been addressed in several studies [6,7,8,9,10]. The authors of [7,8] exploit statistics derived from the instantaneous features of the incoming signals, whereas the algorithms proposed in [9,10] are based, respectively, on the statistical moments and cumulants of these signals. Another study [11] aims to distinguish single-carrier modulated signals from Orthogonal Frequency Division Multiplexing (OFDM) signals based on their cyclostationarity. This property has attracted considerable interest in the research community and has also been employed to identify standard signals, such as the Global System for Mobile communication (GSM) versus LTE in [12], and Worldwide Interoperability for Microwave Access (WiMAX) versus LTE in [13].
All the above-mentioned methods belong to the feature-based statistical approach. It consists of extracting explicit features from the received signal, then passing them through a classification algorithm that makes a decision based on their observed values [14]. This decision-making step is mostly based either on the analysis of the probability distribution function of the feature vectors or on the Euclidean distance between their prescribed and estimated values. Both have been shown to be simple to implement, with near-optimal performance. However, they are sensitive to the noise level and/or require a priori information on the received signal [6]. Moreover, manually set decision parameters, such as thresholds, make it difficult to systematically adapt these techniques whenever a new wireless technology emerges.
Recently, deep learning techniques have developed rapidly and made great strides in the signal identification field. For example, convolutional neural networks are arguably the most popular architecture for both modulation and wireless technology recognition [15,16]. Although this approach performs well in different applications and allows simple feature pre-processing or even the use of raw data, it requires large-scale training data, resulting in high implementation costs [17]. Moreover, the availability of datasets for wireless communications is one of the biggest challenges for researchers. As a result, Machine Learning (ML) techniques, such as Support Vector Machine (SVM) and Random Forest, have been widely used in related studies [18,19,20]. Combined with simulation-based data generation, they have shown promising results with small datasets.
The aim of this paper is thus to exploit the power of ML techniques to identify ITS-G5, LTE-V2X and NR-V2X signals in an Additive White Gaussian Noise (AWGN) channel. Wireless technology identification is a well-studied field, but the vehicular context has not been considered in existing studies. The proposed approach addresses three main issues: the confusion between two close technologies such as LTE and NR; the sensitivity of accuracy to the noise level; and high computational complexity. The first step is the extraction of the feature vector by performing a time–frequency (TF) analysis on the received signal. This analysis consists of decomposing the signal into Intrinsic Mode Functions (IMF) and then computing their Instantaneous Frequency (IF). This combination brings out the local and unique characteristics of signals. To represent the raw features with fewer dimensions while preserving their most important structure, we also apply the Singular Value Decomposition (SVD) technique. The obtained feature vector can then be fed into any classifier for the decision-making step. In this study, we use the random forest classifier because of its simplicity.
To demonstrate the superiority of its performance, we compared different classification metrics of the proposed technique with those of the SVM classifier used with Spectral Correlation Function (SCF) features [20]. The accuracy of a cyclostationarity-based technique proposed in [12] was also evaluated to show the limitations of the statistical approach.
The rest of the paper is organized as follows: Section 2 reviews some signal pre-processing techniques that are relevant to our study. An overview of the considered V2X signals is presented in Section 3 along with their physical layer parameters. Section 4 presents the proposed identification algorithm based on instantaneous frequency features and the random forest classifier. After a description of the data generation process, the obtained results are evaluated in Section 5, where the confusion matrix and other classification metrics are compared with those of the cyclostationarity-based approach. Section 6 concludes this work and proposes some future research directions.

2. Background

In this section, we provide a review of some common pre-processing techniques, which will be used later in the proposed algorithm and the comparative study.

2.1. Cyclostationarity

A signal $x(t)$ is considered to be second-order cyclostationary if its second-order statistics exhibit hidden periodicities in time. Its autocorrelation function $R_x(t,\tau)$ can thus be expressed as [12]:
$$R_x(t,\tau) = E\left\{x\left(t+\frac{\tau}{2}\right)x^*\left(t-\frac{\tau}{2}\right)\right\}, \tag{1}$$
where $\tau$ denotes the time delay and $E\{\cdot\}$ the statistical expectation.
Applying a Fourier series expansion to Equation (1), the $T_0$-periodic function $R_x(t,\tau)$ can be represented as:
$$R_x(t,\tau) = \sum_{\alpha} R_x^{\alpha}(\tau)\, e^{j2\pi\alpha t}, \tag{2}$$
where $\alpha = m/T_0$, $m \in \mathbb{Z}$, are the cyclic frequencies, and the Fourier coefficients
$$R_x^{\alpha}(\tau) = \frac{1}{T_0}\int_{-T_0/2}^{T_0/2} R_x(t,\tau)\, e^{-j2\pi\alpha t}\, dt \tag{3}$$
are referred to as the Cyclic Autocorrelation Function (CAF).
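As an illustration of Equation (3), the following Python sketch estimates the CAF of a sampled signal at one lag and one cycle frequency. It assumes a hypothetical, noise-free CP-OFDM test signal (64 subcarriers, 16-sample cyclic prefix, QPSK); because the cyclic prefix repeats part of each symbol, the CAF at a lag of one useful symbol length is nonzero at multiples of one over the full symbol length. The function names and parameters are illustrative, not the authors' implementation.

```python
import numpy as np


def cyclic_autocorrelation(x, alpha, lag):
    """Discrete CAF estimate at cycle frequency alpha (cycles/sample) and integer lag.

    Uses the one-sided product x[n] * conj(x[n + lag]); this equals the symmetric
    form of Equation (1) with tau = -lag, shifted in time by lag / 2, so the
    estimate differs from it only by a phase factor.
    """
    x = np.asarray(x)
    n = np.arange(len(x) - lag)
    prod = x[: len(x) - lag] * np.conj(x[lag:])
    return np.mean(prod * np.exp(-2j * np.pi * alpha * n))


def toy_cp_ofdm(num_symbols, n_fft=64, n_cp=16, seed=0):
    """Hypothetical CP-OFDM test signal: unit-power QPSK on every subcarrier."""
    rng = np.random.default_rng(seed)
    bits = rng.integers(0, 2, size=(num_symbols, n_fft, 2))
    qpsk = ((2 * bits[..., 0] - 1) + 1j * (2 * bits[..., 1] - 1)) / np.sqrt(2)
    symbols = np.fft.ifft(qpsk, axis=1) * np.sqrt(n_fft)  # unit average sample power
    # Prepend the last n_cp samples of each symbol as its cyclic prefix.
    return np.concatenate([symbols[:, -n_cp:], symbols], axis=1).ravel()


x = toy_cp_ofdm(500)
symbol_len = 64 + 16  # full symbol length (useful part + CP) in samples
caf_cyclic = abs(cyclic_autocorrelation(x, 1 / symbol_len, lag=64))  # a cycle frequency
caf_off = abs(cyclic_autocorrelation(x, 0.5 / symbol_len, lag=64))   # not a cycle frequency
print(caf_cyclic, caf_off)
```

With these assumed parameters, `caf_cyclic` should be clearly nonzero, since 1/80 cycles/sample is a cycle frequency of the signal, whereas `caf_off` should be close to zero.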
In the frequency domain, the signal $x(t)$ is characterized by its cyclic spectrum $S_x^{\alpha}(f)$, also known as the Spectral Correlation Function (SCF). It is defined as the Fourier transform of the CAF and is given by [21]:
$$S_x^{\alpha}(f) = \int_{-T_0/2}^{T_0/2} R_x^{\alpha}(\tau)\, e^{-j2\pi f\tau}\, d\tau. \tag{4}$$
The SCF cannot be estimated directly using Equation (4) because of its high computational complexity. Therefore, an efficient cyclic spectral analysis algorithm, called the Fast Fourier Transform (FFT) Accumulation Method (FAM), has been proposed to reduce this complexity. The first step consists of computing the complex demodulates $R_T(n,f)$ using a sliding $N$-point FFT as follows:
$$R_T(n,f) = \sum_{k=-N/2}^{N/2} a(k)\, x(n-k)\, e^{-j2\pi f(n-k)T_s}, \tag{5}$$
where $a(n)$ is a Hamming window of length $T = NT_s$, and $T_s$ is the sampling period. In the next step, the $N$-point FFT is hopped over the data in blocks of size $L$. Then, the product between the complex demodulates and their conjugates, weighted by a data-tapering window $g_c$, is time-smoothed by a second FFT of length $P$. Hence, the SCF estimate obtained by FAM is expressed as [21]:
$$S_{x_T}^{\alpha_i + q\Delta\alpha}(nL, f) = \sum_{k} R_T(kL, f)\, R_T^*(kL, f)\, g_c(n-k)\, e^{-j2\pi kq/P}. \tag{6}$$
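The two FFT stages of FAM can be sketched in NumPy as follows. This is a minimal sketch under stated assumptions: Hamming windows for both $a(\cdot)$ and $g_c(\cdot)$, $N = 32$ and $L = 8$, and, following the usual FAM formulation, demodulates taken at two channel frequencies $f_k$ and $f_l$ whose difference gives the coarse cycle frequency $\alpha_i$. The final mapping of the outputs onto the $(f, \alpha)$ grid is omitted, and `fam_products` is a hypothetical name.

```python
import numpy as np


def fam_products(x, n_fft=32, hop=8):
    """Minimal sketch of the two FFT stages of the FFT Accumulation Method.

    Returns scf[k, l, q]: the time-smoothed product of the complex demodulates at
    channel frequencies f_k = k / n_fft and f_l = l / n_fft (cycles/sample, modulo 1),
    i.e. the SCF estimate at cycle frequency alpha = f_k - f_l + q / (P * hop),
    where P is the number of hops. Mapping onto the final (f, alpha) grid is omitted.
    """
    x = np.asarray(x, dtype=complex)
    starts = np.arange(0, len(x) - n_fft + 1, hop)  # N-point FFT hopped by L samples
    blocks = np.stack([x[s:s + n_fft] for s in starts]) * np.hamming(n_fft)  # window a(.)
    demod = np.fft.fft(blocks, axis=1)  # first FFT stage (channelizer)
    # Reference the exponential to absolute time n - k, as in Equation (5).
    demod *= np.exp(-2j * np.pi * np.outer(starts, np.arange(n_fft)) / n_fft)
    # Products R_T(rL, f_k) * conj(R_T(rL, f_l)) for all channel pairs: shape (P, N, N).
    prods = demod[:, :, None] * np.conj(demod[:, None, :])
    taper = np.hamming(len(starts))[:, None, None]         # data-tapering window g_c
    scf = np.fft.fft(prods * taper, axis=0) / len(starts)  # second FFT, length P
    return np.transpose(scf, (1, 2, 0))


rng = np.random.default_rng(1)
n_samples = 32 + 8 * 511  # gives P = 512 hops
noise = rng.standard_normal(n_samples) + 1j * rng.standard_normal(n_samples)
scf = fam_products(noise)
psd = np.diagonal(scf[:, :, 0])  # k == l and q == 0: alpha = 0, a PSD estimate
```

Entries with $k = l$ and $q = 0$ correspond to $\alpha = 0$ and reduce to a windowed power spectral density estimate, which offers a quick sanity check of the implementation.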

2.2. Time–Frequency Analysis

Time–frequency analysis is effective for analyzing non-stationary signals and exploring their time-varying characteristics. One common technique is the Hilbert–Huang transform, a two-step transform proposed by Huang et al. in 1998 [22]. The first step, called Empirical Mode Decomposition (EMD), decomposes any complicated signal $x(t)$ into a linear superposition of $K$ Intrinsic Mode Function (IMF) components $c_i(t)$ $(i = 1, \dots, K)$, which contain the local characteristics of the original signal at different time scales. Therefore, the signal $x(t)$ can be written as [23]:
$$x(t) = \sum_{i=1}^{K} c_i(t) + r_K(t), \tag{7}$$
where $r_K(t)$ is the residue and represents the average trend of the signal.
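Equation (7) can be illustrated with a deliberately simple EMD sketch in Python: sifting with cubic-spline envelopes (SciPy's `CubicSpline`), a fixed number of sifting iterations instead of a formal stopping criterion, and the end samples added to both extrema sets as a crude boundary treatment. These choices, the function names and the two-tone test signal are assumptions; dedicated EMD libraries use more careful stopping and boundary rules. The sketch works on real-valued signals, so an I/Q signal would have to be handled, for instance, component by component or with a complex EMD variant.

```python
import numpy as np
from scipy.interpolate import CubicSpline


def _mean_envelope(h):
    """Mean of the upper and lower cubic-spline envelopes, or None if too few extrema."""
    d = np.diff(h)
    maxima = np.where((d[:-1] > 0) & (d[1:] <= 0))[0] + 1
    minima = np.where((d[:-1] < 0) & (d[1:] >= 0))[0] + 1
    if len(maxima) < 2 or len(minima) < 2:
        return None
    last = len(h) - 1
    # Crude boundary handling (assumption): pin both envelopes to the end samples.
    maxima = np.concatenate(([0], maxima, [last]))
    minima = np.concatenate(([0], minima, [last]))
    n = np.arange(len(h))
    upper = CubicSpline(maxima, h[maxima])(n)
    lower = CubicSpline(minima, h[minima])(n)
    return (upper + lower) / 2


def emd(x, max_imfs=3, n_sift=10):
    """Simplified EMD: x equals imfs.sum(axis=0) + residue up to rounding."""
    residue = np.asarray(x, dtype=float).copy()
    imfs = []
    for _ in range(max_imfs):
        if _mean_envelope(residue) is None:  # residue is (nearly) monotonic: stop
            break
        h = residue.copy()
        for _ in range(n_sift):  # fixed number of sifting iterations
            m = _mean_envelope(h)
            if m is None:
                break
            h = h - m
        imfs.append(h)
        residue = residue - h
    return np.array(imfs), residue


n = np.arange(1000)
fast = np.sin(2 * np.pi * 0.05 * n)
slow = np.sin(2 * np.pi * 0.005 * n)
imfs, residue = emd(fast + slow)
```

For the two-tone test signal, the first IMF should closely follow the faster tone away from the edges, and the IMFs plus the residue reconstruct the input as in Equation (7).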
The second step consists of applying the Hilbert transform to each IMF component $c_i(t)$ and constructing the analytic signal $z_i(t)$ defined as:
$$z_i(t) = c_i(t) + j\tilde{c}_i(t) = a_i(t)\, e^{j\phi_i(t)}, \tag{8}$$
where $\tilde{c}_i(t)$ is the Hilbert transform of $c_i(t)$, expressed as the Cauchy principal value:
$$\tilde{c}_i(t) = \frac{1}{\pi}\, \mathrm{p.v.}\!\int_{-\infty}^{+\infty} \frac{c_i(\tau)}{t-\tau}\, d\tau. \tag{9}$$
Thus, a non-stationary complex signal $x(t)$ can be expressed by a time-dependent function $Z(\omega,t)$ as follows [14]:
$$Z(\omega,t) = \mathrm{Re}\sum_{i=1}^{K} a_i(t)\exp\left(j\int \omega_i(t)\, dt\right), \tag{10}$$
where
$$a_i(t) = \sqrt{c_i^2(t) + \tilde{c}_i^2(t)} \tag{11}$$
is the instantaneous amplitude and
$$\omega_i(t) = \frac{d\phi_i(t)}{dt}, \quad \phi_i(t) = \tan^{-1}\frac{\tilde{c}_i(t)}{c_i(t)} \tag{12}$$
are the instantaneous frequency and phase, respectively.
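Equations (8), (11) and (12) translate almost directly into code. The sketch below assumes a real-valued, uniformly sampled IMF and uses SciPy's `scipy.signal.hilbert`, which returns the analytic signal $c_i + j\tilde{c}_i$ computed via the FFT; the phase is obtained with the quadrant-aware `np.angle` and unwrapped before numerical differentiation. The function name and the 50 Hz test tone are illustrative assumptions.

```python
import numpy as np
from scipy.signal import hilbert


def instantaneous_features(c, fs):
    """Instantaneous amplitude, phase and angular frequency of one real IMF c_i."""
    z = hilbert(c)                   # analytic signal c_i + j*Hilbert{c_i}, Equation (8)
    amplitude = np.abs(z)            # a_i(t), Equation (11)
    phase = np.unwrap(np.angle(z))   # phi_i(t): quadrant-aware arctangent, unwrapped
    omega = np.gradient(phase) * fs  # omega_i(t) = d(phi_i)/dt in rad/s, Equation (12)
    return amplitude, phase, omega


fs = 1000.0
t = np.arange(1000) / fs
amplitude, phase, omega = instantaneous_features(np.cos(2 * np.pi * 50 * t), fs)
if_hz = omega / (2 * np.pi)  # instantaneous frequency in Hz
```

For the pure tone, the instantaneous amplitude is constant and `if_hz` stays at 50 Hz; for an IMF of a real V2X signal, both would vary over time.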

3. Signal Model

In this section, the OFDM signal model used for ITS-G5, LTE-V2X and NR-V2X is introduced. More specifically, we present the frame structure and the physical layer parameters of these three standards, as they have a direct impact on the time–frequency features and the periodic behavior of the signals, needed for identification purposes.
Assuming that an OFDM symbol consists of $N_c$ subcarriers at frequencies $f_0, f_1, \dots, f_{N_c-1}$ separated by $\Delta f$, the baseband-equivalent transmitted signal $x(t)$ is given by:
$$x(t) = a\sum_{k}\sum_{n=0}^{N_c-1} s_{n,k}\, e^{j2\pi f_n(t - T_{cp} - kT_s)}\, g(t-kT_s), \tag{13}$$
where $a = \sqrt{E_s/N_c}$ is the amplitude factor, with $E_s$ representing the signal power. $s_{n,k}$ denotes the transmitted symbol on the $n$-th subcarrier in the $k$-th symbol period. These symbols are assumed to be independent and identically distributed (i.i.d.) random variables drawn from an $M$-ary Quadrature Amplitude Modulation (QAM) constellation. $T_s$ is the symbol period given by $T_s = T_u + T_{cp}$, with $T_u = 1/\Delta f$ denoting the useful symbol duration and $T_{cp}$ the length of the Cyclic Prefix (CP). The function $g(t)$ is the pulse-shaping filter.
Therefore, the baseband-equivalent received signal affected by the AWGN channel is expressed as:
$$y(t) = a\sum_{k}\sum_{n=0}^{N_c-1} s_{n,k}\, e^{j2\pi f_n(t - T_{cp} - kT_s)}\, g(t-kT_s) + n(t), \tag{14}$$
where $n(t)$ denotes zero-mean white Gaussian noise with variance $\sigma_n^2$.
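A discrete-time version of Equations (13) and (14) can be sketched in a few lines of NumPy. The sketch assumes a rectangular pulse $g$, square M-QAM data on all $N_c$ subcarriers (no null or pilot subcarriers), $E_s = 1$, and an SNR defined as the ratio of the average signal sample power to $\sigma_n^2$; `ofdm_awgn` is a hypothetical helper and not the MATLAB generator used in this study.

```python
import numpy as np


def ofdm_awgn(num_symbols, n_c=64, n_cp=16, m_qam=16, snr_db=10.0, seed=0):
    """Discrete-time sketch of Equations (13)-(14) with a rectangular pulse g.

    Assumptions: all n_c subcarriers carry square M-QAM data, E_s = 1, and the
    SNR is the ratio of the average signal sample power to sigma_n^2.
    """
    rng = np.random.default_rng(seed)
    m = int(round(np.sqrt(m_qam)))
    levels = np.arange(-(m - 1), m, 2)  # e.g. -3, -1, 1, 3 for 16-QAM
    s = (rng.choice(levels, (num_symbols, n_c))
         + 1j * rng.choice(levels, (num_symbols, n_c)))      # i.i.d. symbols s_{n,k}
    s = s / np.sqrt(2 * np.mean(levels.astype(float) ** 2))  # unit average energy
    # ifft divides by n_c; the sqrt(n_c) factor leaves a = 1/sqrt(n_c) = sqrt(E_s/n_c).
    x = np.fft.ifft(s, axis=1) * np.sqrt(n_c)
    x = np.concatenate([x[:, -n_cp:], x], axis=1).ravel()   # prepend cyclic prefix
    sigma2 = np.mean(np.abs(x) ** 2) / 10 ** (snr_db / 10)  # noise variance sigma_n^2
    noise = np.sqrt(sigma2 / 2) * (rng.standard_normal(x.size)
                                   + 1j * rng.standard_normal(x.size))
    return x, x + noise


x, y = ofdm_awgn(200, snr_db=10.0)
measured_snr_db = 10 * np.log10(np.mean(np.abs(x) ** 2) / np.mean(np.abs(y - x) ** 2))
```

The complex noise is split equally between its real and imaginary parts, so that its total variance equals $\sigma_n^2$.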

3.1. ITS-G5

The physical layer of ITS-G5 is based on IEEE 802.11p, a modified version of the IEEE 802.11a standard. The main difference is that the subcarrier spacing and bandwidth are halved, which results in a symbol duration twice as long. The cyclic prefix duration is also doubled, which allows us to compensate for larger delay spreads and makes it more suitable for vehicular environments [24].
The IEEE 802.11p frame consists of three main fields. The first field lasts 32 µs and is called the preamble. It is used for channel assessment before transmission and for signal detection at the receiver side. The second element of the frame is the signal field and consists of one OFDM symbol. It is intended to indicate the data rate, packet length and modulation scheme of the transmitted signal. The last element is the data field, which has a variable number of OFDM symbols. It contains data, tail and padding bits [2].
For OFDM transmission, a total of 64 subcarriers is used. The 0th (DC) subcarrier and the 11 central subcarriers (guard band) are null. The subcarriers with indices 7, 21, 43 and 57 (i.e., ±7 and ±21 relative to the DC subcarrier) carry pilot symbols, and the remaining 48 carry data [25]. The OFDM symbol lasts 8 µs and the subcarrier spacing is 156.25 kHz, leading to a raw bandwidth of 10 MHz. ITS-G5 supports a wide range of modulation schemes, from Binary Phase Shift Keying (BPSK) to 64-QAM [26].
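The numerology quoted above is internally consistent, as this short Python check shows. It only re-derives the stated values; the comparison with IEEE 802.11a assumes its standard 20 MHz channel with the same 64-point FFT.

```python
# ITS-G5 (IEEE 802.11p) numerology, re-derived from the values given in the text.
n_fft = 64
bandwidth_hz = 10e6
delta_f = bandwidth_hz / n_fft       # subcarrier spacing: 156.25 kHz
t_useful = 1 / delta_f               # useful symbol duration T_u = 1/delta_f: 6.4 us
t_symbol = 8e-6                      # OFDM symbol duration stated above
t_cp = t_symbol - t_useful           # cyclic prefix duration: 1.6 us
preamble_symbols = 32e-6 / t_symbol  # the 32 us preamble spans 4 symbol durations

# Subcarrier budget: DC + 11 guard subcarriers are null, 4 carry pilots.
n_null, n_pilot = 1 + 11, 4
n_data = n_fft - n_null - n_pilot    # 48 data subcarriers

# IEEE 802.11a (20 MHz channel, same 64-point FFT): spacing doubled, durations halved.
delta_f_11a = 20e6 / n_fft           # 312.5 kHz

print(delta_f, t_useful, t_cp, n_data, delta_f_11a)
```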

3.2. LTE-V2X

LTE-V2X supports 10 MHz and 20 MHz channels. Each channel is divided into subframes, Resource Blocks (RBs) and subchannels. A subframe is 1 ms long, as is the transmission time interval. It consists of 14 OFDM symbols with a normal cyclic prefix. The symbols with indices 3, 6, 9 and 12 are used for channel estimation and carry Demodulation Reference Signals (DMRS); the last symbol is used as a guard period for Tx-Rx timing adjustment, and the remaining symbols carry data [4].
A resource block represents the smallest unit of frequency resources and is made up of 12 subcarriers of 15 kHz spacing (total of 180 kHz). A combination of RBs in the same subframe is referred to as a subchannel in LTE-V2X, and each subchannel may have a different number of RBs [27].
Within the same subframe, a subchannel is used to transmit Transport Blocks (TB) over the physical sidelink shared channel, and Sidelink Control Information (SCI) over the physical sidelink control channel. A TB contains user data and must be transmitted with its associated SCI. The SCI carries information, including the modulation and coding scheme, that is needed to decode the user data. It is always sent using the Quadrature Phase Shift Keying (QPSK) modulation scheme, whereas a TB can also use 16-QAM [28].

3.3. NR-V2X

3GPP Release 16 defines the first specifications for the NR-V2X sidelink. It supports the same numerology and frequency bands as the NR uplink/downlink, but only the CP-OFDM waveform is used. In the first Frequency Range (FR1), channel bandwidths of up to 100 MHz are allowed, with subcarrier spacings from 15 kHz to 60 kHz, compared with up to 400 MHz in the second (FR2), where the subcarrier spacing can reach 120 kHz. Four modulation schemes are available, namely QPSK, 16-QAM, 64-QAM and 256-QAM [4].
The frame structure of 5G-NR allows flexible configurations to enable novel V2X use cases. Similar to LTE, the frame length is fixed to 10 ms and is divided into ten equally sized subframes. The subframe is further subdivided into slots, depending on the used numerology. Each slot has 14 OFDM symbols, forming a typical transmission unit [29].
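The slot structure for the different NR numerologies follows from the subcarrier spacing $\Delta f = 15 \cdot 2^{\mu}$ kHz. The sketch below assumes a normal cyclic prefix (14 symbols per slot) and is only illustrative; note that $\mu = 0$ coincides with the 15 kHz spacing of LTE-V2X, one of the shared parameters that make the two technologies hard to separate.

```python
def nr_numerology(mu):
    """Timing parameters of NR numerology mu with a normal cyclic prefix."""
    scs_khz = 15 * 2 ** mu                     # subcarrier spacing (kHz)
    slots_per_subframe = 2 ** mu               # a 1 ms subframe holds 2^mu slots
    slot_ms = 1.0 / slots_per_subframe         # slot duration (14 OFDM symbols)
    slots_per_frame = 10 * slots_per_subframe  # 10 ms frame = 10 subframes
    useful_symbol_us = 1e3 / scs_khz           # T_u = 1 / delta_f (microseconds)
    return scs_khz, slots_per_subframe, slot_ms, slots_per_frame, useful_symbol_us


for mu in range(4):  # 15, 30, 60 and 120 kHz
    print(mu, nr_numerology(mu))
```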
Unlike in LTE, the reference signals of 5G-NR are configurable in time and frequency. Indeed, the DMRS, used by the receiver to produce channel estimates for data demodulation on the physical channels, consists of a front-loaded DMRS symbol mapped at the beginning of the data channel, optionally followed by 0–3 additional DMRS symbols. Each configuration aims to find the best tradeoff between channel estimation accuracy and DMRS overhead [30].
Very low latency and minimal interference with other signals are achieved with mini-slot transmission, which consists of transmitting the physical channel and its DMRS over a fraction (2, 4 or 7 symbols) of the slot.

4. Signal Identification

In this section, we detail the steps of the proposed algorithm for identifying the communication signal received by an ITS station. The pipeline is depicted in Figure 1. First, we describe the feature extraction process and the SVD technique; then we present the random forest classifier used for decision making; finally, we define the classification metrics used for performance evaluation.

4.1. Feature Vector Extraction

The feature extraction procedure starts by applying empirical mode decomposition to the received signal $y(t)$ using Equation (7). The obtained IMF components represent the original signal in different frequency bands, from high to low frequency. The first few IMFs are the most significant, as they have the largest energy and contain the most important information of the I/Q signal. Therefore, the instantaneous frequencies of the first $K$ IMFs are extracted using Equation (12). The value of $K$ depends on the signal length and complexity; in practice, it is usually set between three and five [31,32].
Given that $N$ is the length of the signal of interest $y(t)$, the feature vector made up of $K$ IFs has $KN$ elements, leading to a high-dimensional dataset. Consequently, dimensionality reduction is required to reduce the overall execution time and thus improve the classification model performance. In this context, singular value decomposition is one of the most popular and efficient dimensionality reduction techniques in machine learning. It comes from linear algebra and consists of decomposing an $m \times n$ matrix $\mathbf{M}$ into three matrices $\mathbf{U}$, $\boldsymbol{\Sigma}$ and $\mathbf{V}$ as follows:
$$\mathbf{M} = \mathbf{U}\boldsymbol{\Sigma}\mathbf{V}^T, \tag{15}$$
where $\mathbf{U}$ and $\mathbf{V}$ are orthogonal matrices of dimensions $m \times m$ and $n \times n$, respectively, and $\boldsymbol{\Sigma}$ is an $m \times n$ diagonal matrix. The nonzero diagonal entries $\sigma_i$ $(i = 1, \dots, r)$ of $\boldsymbol{\Sigma}$ are positive real values listed in descending order. They are the singular values of $\mathbf{M}$, and $r$ is its rank [33].
By applying the SVD algorithm to the previously constructed feature vector, the most important structure in the raw data is preserved while its dimension is reduced to $1 \times K$. Therefore, the time–frequency feature vector used for signal identification is given by:
$$\mathbf{S} = [\sigma_1(\omega), \dots, \sigma_K(\omega)]^T, \tag{16}$$
where $\sigma_i(\omega)$ is the singular value related to the instantaneous frequency of the $i$-th IMF.
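A minimal NumPy sketch of the SVD-based reduction, assuming the $K$ instantaneous-frequency sequences of one signal are stacked as the rows of a $K \times N$ matrix (random placeholder values are used here). Because $K < N$, the matrix has at most $K$ nonzero singular values, which form the feature vector of Equation (16); each singular value summarizes the whole matrix rather than a single row. The helper name is hypothetical.

```python
import numpy as np


def if_singular_values(if_matrix):
    """Singular values of a K x N matrix whose i-th row is the IF of the i-th IMF."""
    return np.linalg.svd(if_matrix, compute_uv=False)  # K values, descending order


rng = np.random.default_rng(0)
K, N = 3, 2048
if_matrix = rng.standard_normal((K, N))  # placeholder for the K instantaneous-frequency rows
S = if_singular_values(if_matrix)        # feature vector of length K
```

Only the singular values are needed, so `compute_uv=False` avoids forming $\mathbf{U}$ and $\mathbf{V}$.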

4.2. Random Forest Classifier

After the feature vector is generated, it is fed into the classification model to determine the class to which the signal belongs. The model selection should take into consideration both accuracy and complexity. The random forest classifier has been reported to be one of the most effective off-the-shelf methods in machine learning, working well for a wide range of problems [34].
This method consists of building an ensemble (forest) of decision trees. Each tree provides a classification result, and the forest chooses the class with the most votes as the overall output [35]. Random Forest increases the diversity of the trees by growing them from different training data subsets created through bootstrap aggregating (bagging) [36]. The implementation steps of a random forest classifier can thus be summarized as follows [23]:
  • Building the individual trees of the forest using algorithms such as C4.5 or CART.
  • Randomly sampling the original training dataset with replacement (bootstrap sampling) in order to create an in-bag subset for each tree.
  • Randomly selecting a set of features to construct the nodes and leaves of each tree.
  • Selecting the root node of the tree, which represents the attribute (feature) with the highest Information Gain (IG).
  • Splitting the training data at the root node into subsets for every possible value of the attribute. Then, at each node, the split is performed if the IG is positive; otherwise, the node becomes a leaf node. The information gain of splitting the training dataset $Y$ into subsets $Y_i$ is given by:
    $$IG = E(Y) - \sum_i \frac{\mathrm{size}(Y_i)}{\mathrm{size}(Y)}\, E(Y_i); \quad E(Y_i) = -\sum_{j=1}^{J} p_j \log_2(p_j), \tag{17}$$
    where $J$ is the number of signal classes and $p_j$ the proportion of class $j$ in the subset $Y_i$ ($E(Y)$ is computed in the same way over the whole set $Y$).
  • Repeating this tree-growing process at each node, using the subset that reaches the branch and the remaining attributes, until all attributes have been selected. The most frequent signal class among the training samples reaching a leaf is the classification output of the tree.
It is worth mentioning that injecting randomness into both bagging and feature selection increases the stability and accuracy of classification, decreases the sensitivity to noise in the data, and reduces the correlation among the trees [35].
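The information gain of Equation (17) can be illustrated with a small Python sketch for a discrete attribute that splits the data into one subset per attribute value, as in the steps above. The helper names and the toy labels are assumptions; library implementations such as scikit-learn's decision trees instead evaluate binary threshold splits on continuous features.

```python
import numpy as np


def entropy(labels):
    """E(Y) = -sum_j p_j log2 p_j over the classes present in `labels`."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))


def information_gain(labels, attribute):
    """IG of splitting `labels` into one subset per distinct value of a discrete attribute."""
    labels, attribute = np.asarray(labels), np.asarray(attribute)
    weighted = sum(
        np.mean(attribute == v) * entropy(labels[attribute == v])  # size(Y_i)/size(Y) * E(Y_i)
        for v in np.unique(attribute)
    )
    return entropy(labels) - weighted


y = np.array([0, 0, 1, 1, 2, 2])  # class labels (0, 1, 2)
informative = np.array(["a", "a", "b", "b", "c", "c"])
useless = np.array(["a", "b", "a", "b", "a", "b"])
print(information_gain(y, informative), information_gain(y, useless))
```

The attribute that separates the three classes perfectly yields the maximum gain of $\log_2 3 \approx 1.585$ bits, whereas the attribute that leaves every subset with the same class mix yields zero gain, so it would not be used for splitting.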

4.3. Classification Metrics

To assess the performance of the proposed technique, we use the three metrics most commonly used for classification problems: precision ($\Pi$), recall ($\Psi$) and $F_1$-score. The precision indicates how many of the results classified as positive are actually positive. The recall indicates how many of the actual positives are correctly identified. The $F_1$-score is an overall measure of the accuracy of the classifier and is the harmonic mean of precision and recall. These metrics are given by [21]:
$$\Pi = \frac{\xi}{\xi+\upsilon}, \quad \Psi = \frac{\xi}{\xi+\mu}, \quad F_1\text{-score} = \frac{2\times\Pi\times\Psi}{\Pi+\Psi}, \tag{18}$$
where $\xi$, $\upsilon$ and $\mu$ denote the numbers of true positives, false positives and false negatives, respectively.
In addition, we define the accuracy $P$ as the probability that the classifier predicts the correct label:
$$P = \Pr(\hat{\chi}_l = \chi_l), \quad l = 0, 1, 2, \tag{19}$$
where $\chi_l$ and $\hat{\chi}_l$ denote the label arrays of the received and predicted signals, respectively, and the index $l = 0, 1, 2$ represents the classes ITS-G5, LTE-V2X and NR-V2X, respectively.
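The metrics of Equations (18) and (19) can be computed from a confusion matrix, as in the following NumPy sketch. The helper name and the toy predictions are assumptions; scikit-learn's `precision_recall_fscore_support` with `average=None` should give the same per-class values.

```python
import numpy as np


def classification_metrics(y_true, y_pred, n_classes=3):
    """Confusion matrix, per-class precision/recall/F1 (Eq. (18)) and accuracy (Eq. (19))."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    cm = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(cm, (y_true, y_pred), 1)  # rows: true class, columns: predicted class
    tp = np.diag(cm)                    # xi: true positives
    fp = cm.sum(axis=0) - tp            # upsilon: false positives
    fn = cm.sum(axis=1) - tp            # mu: false negatives
    precision = tp / np.maximum(tp + fp, 1)  # guard against never-predicted classes
    recall = tp / np.maximum(tp + fn, 1)
    denom = precision + recall
    f1 = np.divide(2 * precision * recall, denom, out=np.zeros_like(denom), where=denom > 0)
    accuracy = tp.sum() / cm.sum()      # fraction of correctly predicted labels
    return cm, precision, recall, f1, accuracy


# Labels 0, 1, 2 stand for ITS-G5, LTE-V2X and NR-V2X (toy predictions).
cm, precision, recall, f1, accuracy = classification_metrics(
    [0, 0, 1, 1, 2, 2], [0, 1, 1, 1, 2, 0])
```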

5. Performance Evaluation

The aim of this section is to evaluate the performance of the proposed identification technique and to compare it with that of existing cyclostationarity-based techniques. We first describe the process used to generate the vehicular signals and the resulting feature vectors used to train and test the two classifiers; we then present simulation results comparing the performance metrics of both approaches.

5.1. Dataset Generation

The vehicular communication signals dataset used in this study is a synthetic dataset generated using MATLAB [37]. It contains feature vectors extracted from ITS-G5, LTE-V2X and NR-V2X signals along with their respective labels. For each label (signal type), the simulations are performed at 15 different Signal-to-Noise Ratio (SNR) levels ranging from −10 dB to 18 dB in 2 dB steps, with the same number of signals at each level. As a result, the dataset covers a total of 4500 signals for the three wireless technologies (i.e., 100 signals per technology and SNR level), whose parameters are summarized in Table 1 and stem from the possible configurations described in Section 3. An example of each signal type received at SNR = 10 dB is depicted in Figure 2 in both the time and frequency domains.
For the feature extraction step, we consider two feature types. The first is the SCF of the generated signals, estimated by Equation (6) and used as input to the SVM classifier [20]. As in the original study, the length of the feature vector is set to 1 × 16,385, leading to a dataset of dimension 4500 × 16,386 (including the label column). The second is the singular values of the IFs, given by Equation (16) and fed into the random forest classifier, as shown in Figure 1. We set K, the number of IMFs, to three, the lowest value in the usual range, which yields a feature vector of three elements and a dataset of dimension 4500 × 4.

5.2. Simulation Results

5.2.1. Data Analysis

To better understand the resulting datasets, we visualize the feature vectors of the three considered signal types in two-dimensional space using the t-Distributed Stochastic Neighbour Embedding (t-SNE) technique. This method visualizes high-dimensional data by giving each sample a location in a two- or three-dimensional space, while preserving the neighbourhood structure (similarities) between samples [38].
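The visualization step can be reproduced with scikit-learn's `TSNE`, as in the following sketch. The perplexity and random seed are assumed values, and the clustered placeholder features only stand in for the SCF or IF datasets.

```python
import numpy as np
from sklearn.manifold import TSNE

# Placeholder feature vectors: 100 samples per class, 3 features each (assumed values).
rng = np.random.default_rng(0)
labels = np.repeat([0, 1, 2], 100)
features = rng.standard_normal((300, 3)) + 5.0 * np.eye(3)[labels]

embedding = TSNE(n_components=2, perplexity=30.0, random_state=0).fit_transform(features)
# embedding[:, 0] and embedding[:, 1] can be scatter-plotted, coloured by `labels`.
```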
As can be seen in Figure 3, the t-SNE representations of both datasets cluster the three signal types into distinct groups. However, comparing the two graphs shows, first, that the SCF feature makes the separation harder and will consequently require a more complex classifier such as SVM. Moreover, samples that do not belong to any of the formed clusters, or that are superimposed on each other, may increase the confusion among signals, unlike the IF feature samples, where almost no confusion can be seen. Therefore, the proposed feature vector addresses the first issue of signal identification, namely reducing the confusion between signals with similar PHY layer parameters. A more in-depth analysis of these preliminary results is performed in the next subsection through confusion matrices and the previously defined classification metrics.

5.2.2. Performance Analysis

The proposed approach, consisting of IF features combined with the random forest classifier, is evaluated alongside the approach based on SCF features and the SVM classifier. Each dataset is shuffled and split into training and test sets containing 3000 and 1500 samples, respectively. The input data is then normalized and scaled, which is a crucial step to alleviate the effect of SNR variations, especially for distance-based classifiers such as SVM. Implementation and evaluation are conducted with the open-source Scikit-learn library [39].
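The evaluation protocol described above can be sketched with scikit-learn as follows. The synthetic, well-separated features only stand in for the IF (or SCF) datasets, and the hyperparameters (100 trees with the entropy criterion, an RBF-kernel SVM, a stratified split) are assumptions rather than the settings used in the study.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Hypothetical stand-in for the 4500 x 3 feature matrix (labels 0, 1, 2 as in Equation (19)).
rng = np.random.default_rng(0)
y = np.repeat([0, 1, 2], 1500)
centers = np.array([[0.0, 0.0, 0.0], [6.0, 0.0, 0.0], [0.0, 6.0, 0.0]])
X = centers[y] + rng.standard_normal((y.size, 3))

# Shuffle and split into 3000 training and 1500 test samples.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=1500, shuffle=True, stratify=y, random_state=0)

models = {
    "random_forest": make_pipeline(
        StandardScaler(),
        RandomForestClassifier(n_estimators=100, criterion="entropy", random_state=0)),
    "svm": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
}
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)  # the scaler is fitted on the training set only
    scores[name] = accuracy_score(y_test, model.predict(X_test))
print(scores)
```

Scaling has no effect on the tree-based random forest, but it matters for the SVM, whose RBF kernel depends on Euclidean distances between samples.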
The confusion matrices of the two considered techniques are depicted in Figure 4. They show that the SCF features provide slightly worse performance than the IF features when dealing with LTE-V2X and NR-V2X signals. Indeed, the two technologies share many PHY layer parameters, as seen in Section 3. Since the SCF reveals the hidden periodicities within the signals, which are caused by, among others, the symbol period and the cyclic prefix duration, this similarity decreases the distance between samples and prevents the SVM algorithm from correctly identifying those signals.
With SCF features, the identification rates of LTE-V2X and NR-V2X are 96% and 91%, respectively. This difference stems from the fact that the NR-V2X standard has more configurations, and thus more dispersed SCF values, than the LTE-V2X standard. Therefore, the boundary placed by the SVM classifier to identify NR-V2X signals is less accurate than that for LTE-V2X signals. On the other hand, the random forest classifier increases the identification rate to 99% for both signals, because the IF features reduce the confusion between them. Their relevance comes from the ability of intrinsic mode functions and instantaneous frequency to bring out the local time–frequency characteristics of the signals.
When it comes to the 802.11p technology, its unique characteristics make it more distinguishable, and both cyclostationarity and time–frequency based features can be used to identify ITS-G5 signals with an accuracy of 100%.
The precision, recall and F1-score of the three signal types for both techniques are summarized in Table 2. A simple comparison shows that the classification results are in line with the previous t-SNE analysis. However, these metrics only represent the global performance of the algorithms over a wide range of SNRs and do not reflect the impact of this parameter on the identification accuracy.
In order to investigate this relationship, the two classifiers are trained and tested on signals of each SNR level separately. Figure 5 depicts the accuracy variation of both techniques, along with that obtained by implementing the classification algorithm in [12], taken as an example of a comparison with the statistical approach.
The results show that the cyclostationarity-based features are more sensitive to the noise level than the proposed IF features. For instance, the SCF-based SVM classifier exceeds 90% accuracy at −4 dB, reaches its best performance, 100%, at −2 dB and then remains constant up to 18 dB. This behavior can be explained by the decrease in SCF amplitudes at low SNRs: the differences between the cyclostationarity properties of C-V2X signals with similar configurations become harder to discern, leading to more classification errors. Similarly, the statistical approach, which compares the CAF estimates of the considered signals at their cyclic frequencies with a threshold determined by setting the probability of false alarm to 0.1, shows the poorest performance, and its accuracy strongly depends on the SNR. It reaches a maximum of 82% at 6 dB and then fluctuates around 80% at higher SNRs. In contrast, the accuracy of the proposed model is almost stable at 100% for all SNR values, owing to the insensitivity of the time–frequency features to the noise level.
This discussion demonstrates why the proposed ML approach represents a better choice for fulfilling the high requirements of vehicular applications in terms of accuracy, and why cyclostationarity-based features cannot maintain the same level of performance regardless of SNR value.

5.2.3. Complexity Analysis

So far, the proposed technique outperforms the SCF with SVM technique in terms of classification accuracy. However, the computational complexity is another important parameter that needs to be explored in order to make the optimal choice of features.
By applying the FFT accumulation method described in Section 2, the computational complexity of SCF estimation is given by $O\big(2N[4 + 2\log_2(N) + 4N + 2N + N\log_2(4NN)]\big)$, where $N$ is the signal length [20]. Keeping only the highest-order terms, the overall time complexity is $O(N^2)$. Therefore, the SCF-based SVM technique is computationally expensive, although its classification performance is relatively good.
On the other hand, extracting the IF features involves empirical mode decomposition, the Hilbert transform and singular value decomposition, whose computational complexities are $\mathcal{O}(KN)$ [40], $\mathcal{O}(N\log_2 N)$ [41] and $\mathcal{O}(K^2N)$ [42], respectively. Therefore, the overall time complexity of the proposed technique is as low as $\mathcal{O}(N\log_2 N)$. Moreover, the small size of the reduced dataset, 4500 × 4, considerably shortens the training time of the classifier.
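The last two stages of the IF feature extraction can be sketched in NumPy as follows. This is a minimal illustration, not the authors' code: the EMD stage is omitted (in the described pipeline the Hilbert transform presumably operates on the intrinsic mode functions it produces), the input is assumed to be a real-valued sequence such as one IMF, and the helper names `analytic_signal`, `instantaneous_frequency` and `svd_reduce` are hypothetical. The Hilbert transform is computed with the standard FFT construction, which is where its $\mathcal{O}(N\log_2 N)$ cost comes from, and the dimensionality reduction is a plain truncated SVD without mean-centring.

```python
import numpy as np


def analytic_signal(x):
    """Analytic signal of a real sequence via the FFT (discrete Hilbert transform).

    Zeroes the negative-frequency bins and doubles the positive ones; cost O(N log N).
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    weights = np.zeros(n)
    weights[0] = 1.0                   # DC bin is kept once
    if n % 2 == 0:
        weights[n // 2] = 1.0          # Nyquist bin is kept once
        weights[1:n // 2] = 2.0
    else:
        weights[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(np.fft.fft(x) * weights)


def instantaneous_frequency(x, fs):
    """IF in Hz from the derivative of the unwrapped phase (returns N - 1 values)."""
    phase = np.unwrap(np.angle(analytic_signal(x)))
    return np.diff(phase) * fs / (2.0 * np.pi)


def svd_reduce(features, k=4):
    """Project each row of an (n_signals x n_features) matrix onto the top-k right
    singular vectors, i.e. return U_k * S_k of a truncated SVD (no mean-centring)."""
    features = np.asarray(features, dtype=float)
    _, _, vt = np.linalg.svd(features, full_matrices=False)
    return features @ vt[:k].T
```

As a usage example, stacking one IF vector per received signal into a matrix and calling `svd_reduce(..., k=4)` would yield an n × 4 matrix of the kind quoted above (4500 × 4), which could then be fed to a random forest such as scikit-learn's `RandomForestClassifier` [39]. Whether the original features are centred before the SVD is not stated, so the choice here is an assumption.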

6. Conclusions

In this study, we proposed an ML-based technique for identifying V2X communication signals without any prior information. It combines robust features derived from time–frequency analysis with a random forest classifier.
First, we presented the models of the three considered signals together with their physical layer parameters. A comparison of these parameters showed that LTE-V2X and NR-V2X have similar properties, in particular those related to signal periodicity. Their instantaneous frequency was therefore extracted to distinguish between them and then passed through the SVD algorithm to reduce its dimensionality.
With the random forest classifier, the results demonstrated the effectiveness of our approach and the superiority of IF as a distinctive feature over the cyclostationarity-based features used in many existing studies. Moreover, the comparison with the statistical approach indicated that the latter is unsuitable for identifying signals with similar cyclic frequencies (CFs) and is highly dependent on the SNR.
In future work, the performance of the proposed identification technique could be assessed on vehicular signals affected by multipath fading channels. The proposed technique could also be applied in real-world scenarios such as dynamic spectrum access or jamming signal detection.

Author Contributions

Conceptualization, C.S. and F.E.; methodology, C.S.; software, C.S.; validation, F.E.; formal analysis, C.S.; investigation, C.S.; resources, F.E.; data curation, C.S.; writing—original draft preparation, C.S.; writing—review and editing, C.S.; visualization, C.S.; supervision, F.E.; project administration, F.E.; funding acquisition, F.E. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the ELSAT 2020-OS4 ORIO project, which is co-financed by the European Union through the European Regional Development Fund, the French state and the Hauts-de-France Region.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are available on request from the corresponding author.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:
AWGN: Additive White Gaussian Noise
BPSK: Binary Phase Shift Keying
CAF: Cyclic Autocorrelation Function
CP: Cyclic Prefix
DMRS: Demodulation Reference Signal
EMD: Empirical Mode Decomposition
FAM: FFT Accumulation Method
FFT: Fast Fourier Transform
GSM: Global System for Mobile Communications
IF: Instantaneous Frequency
IMF: Intrinsic Mode Function
ITS: Intelligent Transport Systems
LTE: Long-Term Evolution
ML: Machine Learning
NR: New Radio
OFDM: Orthogonal Frequency Division Multiplexing
QAM: Quadrature Amplitude Modulation
QPSK: Quadrature Phase Shift Keying
RB: Resource Block
SCF: Spectral Correlation Function
SCI: Sidelink Control Information
SNR: Signal-to-Noise Ratio
SVD: Singular Value Decomposition
SVM: Support Vector Machine
TB: Transport Block
TF: Time–Frequency
t-SNE: t-Distributed Stochastic Neighbour Embedding
V2X: Vehicle-to-Everything
WiMAX: Worldwide Interoperability for Microwave Access

References

  1. Kiela, K.; Barzdenas, V.; Jurgo, M.; Macaitis, V.; Rafanavicius, J.; Vasjanov, A.; Kladovscikov, L.; Navickas, R. Review of V2X–IoT Standards and Frameworks for ITS Applications. Appl. Sci. 2020, 10, 4314.
  2. ETSI. ITS-G5 Access layer specification for Intelligent Transport Systems operating in the 5 GHz frequency band. In EN 302 663-V1.3.1-Intelligent Transport Systems (ITS); Technical Report; ETSI: Sophia Antipolis, France, 2019.
  3. ETSI. Evolved Universal Terrestrial Radio Access (E-UTRA) and Evolved Universal Terrestrial Radio Access Network (E-UTRAN); Overall description; Stage 2 (3GPP TS 36.300 version 14.2.0 Release 14). In TS 136 300-V14.2.0-LTE; Technical Report; ETSI: Sophia Antipolis, France, 2017.
  4. ETSI. 5G; Overall description of Radio Access Network (RAN) aspects for Vehicle-to-everything (V2X) based on LTE and NR (3GPP TR 37.985 version 16.0.0 Release 16). In TR 137 985-V16.0.0-LTE; Technical Report; ETSI: Sophia Antipolis, France, 2020.
  5. Choi, J.; Marojevic, V.; Dietrich, C.B.; Reed, J.H.; Ahn, S. Survey of Spectrum Regulation for Intelligent Transportation Systems. IEEE Access 2020, 8, 140145–140160.
  6. Dobre, O.; Abdi, A.; Bar-Ness, Y.; Su, W. Survey of automatic modulation classification techniques: Classical approaches and new trends. IET Commun. 2007, 1, 137.
  7. Hu, Y.Q.; Liu, J.; Tan, X.H. Digital modulation recognition based on instantaneous information. J. China Univ. Posts Telecommun. 2010, 17, 52–90.
  8. Moser, E.; Moran, M.K.; Hillen, E.; Li, D.; Wu, Z. Automatic modulation classification via instantaneous features. In Proceedings of the IEEE National Aerospace Electronics Conference, NAECON, Dayton, OH, USA, 15–19 June 2015; Institute of Electrical and Electronics Engineers Inc.: Piscataway, NJ, USA, 2016; pp. 218–223.
  9. Le Martret, C.J.; Boitea, D.M. Modulation classification by means of different orders statistical moments. In Proceedings of the IEEE Military Communications Conference MILCOM, Monterey, CA, USA, 3–5 November 1997; Volume 3, pp. 1387–1391.
  10. Swami, A.; Sadler, B.M. Hierarchical digital modulation classification using cumulants. IEEE Trans. Commun. 2000, 48, 416–429.
  11. Zhang, Q.; Dobre, O.A.; Eldemerdash, Y.A.; Rajan, S.; Inkol, R. Second-order cyclostationarity of BT-SCLD signals: Theoretical developments and applications to signal classification and blind parameter estimation. IEEE Trans. Wirel. Commun. 2013, 12, 1501–1511.
  12. Karami, E.; Dobre, O.A.; Adnani, N. Identification of GSM and LTE signals using their second-order cyclostationarity. In Proceedings of the Conference Record-IEEE Instrumentation and Measurement Technology Conference, Pisa, Italy, 11–14 May 2015; Institute of Electrical and Electronics Engineers Inc.: Piscataway, NJ, USA, 2015; pp. 1108–1112.
  13. Al-Habashna, A.; Dobre, O.A.; Venkatesan, R.; Popescu, D.C. Second-order cyclostationarity of mobile WiMAX and LTE OFDM signals and application to spectrum awareness in cognitive radio systems. IEEE J. Sel. Top. Signal Process. 2012, 6, 26–42.
  14. Al-Nuaimi, D.H.; Hashim, I.A.; Zainal Abidin, I.S.; Salman, L.B.; Mat Isa, N.A. Performance of Feature-Based Techniques for Automatic Digital Modulation Recognition and Classification—A Review. Electronics 2019, 8, 1407.
  15. Shi, Y.; Davaslioglu, K.; Sagduyu, Y.E.; Headley, W.C.; Fowler, M.; Green, G. Deep Learning for RF Signal Classification in Unknown and Dynamic Spectrum Environments. In Proceedings of the 2019 IEEE International Symposium on Dynamic Spectrum Access Networks (DySPAN), Newark, NJ, USA, 11–14 November 2019.
  16. Bitar, N.; Muhammad, S.; Refai, H.H. Wireless technology identification using deep convolutional neural networks. In Proceedings of the IEEE International Symposium on Personal, Indoor and Mobile Radio Communications, PIMRC, Montreal, QC, Canada, 8–13 October 2017; Institute of Electrical and Electronics Engineers Inc.: Piscataway, NJ, USA, 2018; pp. 1–6.
  17. Li, X.; Dong, F.; Zhang, S.; Guo, W. A Survey on Deep Learning Techniques in Wireless Signal Recognition. Wirel. Commun. Mob. Comput. 2019.
  18. Wang, L.X.; Ren, Y.J. Recognition of digital modulation signals based on high order cumulants and support vector machines. In Proceedings of the 2009 Second ISECS International Colloquium on Computing, Communication, Control, and Management, CCCM 2009, Sanya, China, 8–9 August 2009; Volume 4, pp. 271–274.
  19. Wang, X.; Gao, Z.; Fang, Y.; Yuan, S.; Zhao, H.; Gong, W.; Qiu, M.; Liu, Q. A signal modulation type recognition method based on kernel PCA and random forest in cognitive network. In Intelligent Computing Methodologies; Springer International Publishing: Berlin/Heidelberg, Germany, 2014; Volume 8589, pp. 522–528.
  20. Tekbiyik, K.; Akbunar, O.; Ekti, A.R.; Gorcin, A.; Karabulut Kurt, G. Multi-dimensional wireless signal identification based on support vector machines. IEEE Access 2019, 7, 138890–138903.
  21. Tekbıyık, K.; Akbunar, O.; Ekti, A.R.; Görçin, A.; Kurt, G.K.; Qaraqe, K.A. Spectrum Sensing and Signal Identification with Deep Learning based on Spectral Correlation Function. arXiv 2020, arXiv:2003.08359.
  22. Huang, N.E.; Shen, Z.; Long, S.R.; Wu, M.C.; Shih, H.H.; Zheng, Q.; Yen, N.C.; Tung, C.C.; Liu, H.H. The empirical mode decomposition and the Hilbert spectrum for nonlinear and non-stationary time series analysis. Proc. R. Soc. Math. Phys. Eng. Sci. 1998, 454, 903–995.
  23. Fraiwan, L.; Lweesy, K.; Khasawneh, N.; Wenz, H.; Dickhaus, H. Automated sleep stage identification system based on time–frequency analysis of a single EEG channel and random forest classifier. Comput. Methods Programs Biomed. 2012, 108, 10–19.
  24. Anwar, W.; Franchi, N.; Fettweis, G. Physical layer evaluation of V2X communications technologies: 5G NR-V2X, LTE-V2X, IEEE 802.11bd, and IEEE 802.11p. In Proceedings of the IEEE Vehicular Technology Conference, Honolulu, HI, USA, 22–25 September 2019; Institute of Electrical and Electronics Engineers Inc.: Piscataway, NJ, USA, 2019.
  25. Sattiraju, R.; Wang, D.; Weinand, A.; Schotten, H.D. Link Level Performance Comparison of C-V2X and ITS-G5 for Vehicular Channel Models. In Proceedings of the 2020 IEEE 91st Vehicular Technology Conference (VTC2020-Spring), Antwerp, Belgium, 25–28 May 2020.
  26. Bazzi, A.; Cecchini, G.; Menarini, M.; Masini, B.M.; Zanella, A. Survey and Perspectives of Vehicular Wi-Fi versus Sidelink Cellular-V2X in the 5G Era. Future Internet 2019, 11, 122.
  27. Fan, Y.; Liu, L.; Dong, S.; Zhuang, L.; Qiu, J.; Cai, C.; Song, M. Network Performance Test and Analysis of LTE-V2X in Industrial Park Scenario. Wirel. Commun. Mob. Comput. 2020, 2020, 1–12.
  28. Mannoni, V.; Berg, V.; Sesia, S.; Perraud, E. A Comparison of the V2X Communication Systems: ITS-G5 and C-V2X. In Proceedings of the IEEE Vehicular Technology Conference (VTC) Spring 2019, Kuala Lumpur, Malaysia, 28 April–1 May 2019.
  29. Bagheri, H.; Noor-A-Rahim, M.; Liu, Z.; Lee, H.; Pesch, D.; Moessner, K.; Xiao, P. 5G NR-V2X: Towards Connected and Cooperative Autonomous Driving. arXiv 2020, arXiv:2009.03638.
  30. Skiribou, C.; Elbahhar, F.; Elassali, R. DMRS-based channel estimation for railway communications in tunnel environments. Veh. Commun. 2021, 29, 100340.
  31. Jouny, I. Target recognition using scattering features extracted with EMD. In Proceedings of the 2014 IEEE Radar Conference, Cincinnati, OH, USA, 19–23 May 2014; pp. 0126–0129.
  32. Cheng, J.; Yu, D.; Yang, Y. A fault diagnosis approach for gears based on IMF AR model and SVM. EURASIP J. Adv. Signal Process. 2008, 2008, 1–7.
  33. Tanwar, S.; Ramani, T.; Tyagi, S. Dimensionality reduction using PCA and SVD in big data: A comparative case study. In International Conference on Future Internet Technologies and Trends; Springer: Berlin/Heidelberg, Germany, 2018; Volume 220, pp. 116–125.
  34. Louppe, G. Understanding Random Forests: From Theory to Practice. Ph.D. Thesis, University of Liège, Liège, Belgium, 2014.
  35. Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32.
  36. Breiman, L. Bagging Predictors; Technical Report; Springer: Berlin/Heidelberg, Germany, 1994.
  37. MATLAB, version 8.1.0.604 (R2013a); The MathWorks Inc.: Natick, MA, USA, 2013.
  38. Van Der Maaten, L.; Hinton, G. Visualizing Data using t-SNE. J. Mach. Learn. Res. 2008, 9, 2579–2605.
  39. Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blondel, M.; Prettenhofer, P. Scikit-learn: Machine Learning in Python. J. Mach. Learn. Res. 2011, 12, 2825–2830.
  40. Wang, Y.H.; Yeh, C.H.; Young, H.W.V.; Hu, K.; Lo, M.T. On the computational complexity of the empirical mode decomposition algorithm. Phys. A Stat. Mech. Appl. 2014, 400, 159–167.
  41. Bilato, R.; Maj, O.; Brambilla, M. An algorithm for fast hilbert transform of real functions. Adv. Comput. Math. 2014, 40, 1159–1168.
  42. Li, X.; Wang, S.; Cai, Y. Tutorial: Complexity analysis of Singular Value Decomposition and its variants. arXiv 2019, arXiv:1906.12085.
Figure 1. Pipeline of the proposed signal identification system.
Figure 2. Time and frequency domain representation of received (a) ITS-G5 (b) LTE-V2X and (c) NR-V2X signals at SNR = 10 dB.
Figure 3. t-SNE representation of (a) SCF and (b) IF features.
Figure 4. Confusion matrix of (a) SCF with SVM and (b) IF with random forest techniques.
Figure 5. Classification accuracy with respect to SNR.
Table 1. PHY layer parameters of generated vehicular communication signals.
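| Parameter | ITS-G5 | LTE-V2X | NR-V2X |
|---|---|---|---|
| Bandwidth | 10 MHz | {10, 20} MHz | {10, 20, 50} MHz |
| Subcarrier spacing | 156.25 kHz | 15 kHz | {15, 30} kHz |
| FFT size | 64 | {1024, 2048} | {512, 1024, 2048} |
| CP size (samples) | 16 | {72, 144} | {36, 72, 144} |
| QAM order | {4, 16, 64} | {4, 16} | {4, 16} |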
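The entries of Table 1 are linked by the usual OFDM relations: the sampling rate is $f_s = \Delta f \cdot N_{\mathrm{FFT}}$, the useful symbol duration is $T_u = 1/\Delta f$ and the cyclic prefix lasts $N_{\mathrm{CP}}/f_s$. The short Python sketch below applies them to one configuration per technology. Because the table lists FFT and CP sizes as independent sets, the pairings used here (a 1024-point FFT with a 72-sample CP for LTE-V2X, and 30 kHz spacing with a 512-point FFT and a 36-sample CP for NR-V2X) are illustrative assumptions, and `OfdmNumerology` is a hypothetical helper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OfdmNumerology:
    name: str
    scs_hz: float  # subcarrier spacing in Hz
    n_fft: int     # FFT size
    n_cp: int      # cyclic prefix length in samples

    @property
    def fs_hz(self) -> float:
        """Sampling rate implied by the FFT size and subcarrier spacing."""
        return self.scs_hz * self.n_fft

    @property
    def useful_symbol_s(self) -> float:
        """Useful (CP-free) OFDM symbol duration."""
        return 1.0 / self.scs_hz

    @property
    def cp_s(self) -> float:
        """Cyclic prefix duration."""
        return self.n_cp / self.fs_hz

    @property
    def symbol_s(self) -> float:
        """Total OFDM symbol duration including the cyclic prefix."""
        return self.useful_symbol_s + self.cp_s


# FFT/CP pairings for LTE-V2X and NR-V2X are assumptions, not given by Table 1.
CONFIGS = [
    OfdmNumerology("ITS-G5", 156.25e3, 64, 16),
    OfdmNumerology("LTE-V2X (assumed 1024/72)", 15e3, 1024, 72),
    OfdmNumerology("NR-V2X (assumed 30 kHz, 512/36)", 30e3, 512, 36),
]

for cfg in CONFIGS:
    print(f"{cfg.name}: fs = {cfg.fs_hz / 1e6:.2f} MHz, "
          f"Tu = {cfg.useful_symbol_s * 1e6:.2f} us, CP = {cfg.cp_s * 1e6:.2f} us, "
          f"symbol = {cfg.symbol_s * 1e6:.2f} us")
```

Under these assumptions, the ITS-G5 symbol lasts 8 µs (6.4 µs plus a 1.6 µs CP), while both C-V2X configurations run at the same 15.36 MHz sampling rate. With 15 kHz spacing, a 1024-point FFT and a 72-sample CP, an NR-V2X symbol would have exactly the LTE-V2X timing.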
Table 2. Performance metrics of random forest and support vector machine classifiers.
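| Model | Signal | Precision | Recall | F1-score |
|---|---|---|---|---|
| Proposed IF with RF | ITS-G5 | 1 | 1 | 1 |
| | LTE-V2X | 0.99 | 0.99 | 0.99 |
| | NR-V2X | 0.99 | 0.99 | 0.99 |
| | Average | 0.99 | 0.99 | 0.99 |
| SCF with SVM [20] | ITS-G5 | 1 | 1 | 1 |
| | LTE-V2X | 0.91 | 0.96 | 0.94 |
| | NR-V2X | 0.96 | 0.91 | 0.93 |
| | Average | 0.96 | 0.96 | 0.96 |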
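Table 2 reports the usual per-class metrics: precision TP/(TP + FP), recall TP/(TP + FN) and the F1-score as their harmonic mean. The Average rows are consistent with an unweighted (macro) mean over the three classes. A minimal pure-Python sketch of these computations from a confusion matrix follows; the function names are hypothetical and the matrix in the usage lines is invented for illustration, not taken from Figure 4.

```python
from typing import Dict, Sequence


def per_class_metrics(cm: Sequence[Sequence[int]], labels: Sequence[str]) -> Dict[str, Dict[str, float]]:
    """Precision, recall and F1 per class from a confusion matrix.

    cm[i][j] counts samples whose true class is labels[i] and predicted class is labels[j].
    """
    n = len(labels)
    out = {}
    for k, name in enumerate(labels):
        tp = cm[k][k]
        fp = sum(cm[i][k] for i in range(n)) - tp  # predicted k, truly another class
        fn = sum(cm[k][j] for j in range(n)) - tp  # truly k, predicted another class
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        out[name] = {"precision": precision, "recall": recall, "f1": f1}
    return out


def macro_average(metrics: Dict[str, Dict[str, float]]) -> Dict[str, float]:
    """Unweighted mean of each metric over all classes."""
    keys = ("precision", "recall", "f1")
    return {m: sum(c[m] for c in metrics.values()) / len(metrics) for m in keys}


labels = ["ITS-G5", "LTE-V2X", "NR-V2X"]
cm_example = [[100, 0, 0], [0, 90, 10], [0, 5, 95]]  # hypothetical counts
metrics = per_class_metrics(cm_example, labels)
print(macro_average(metrics))
```

In practice, scikit-learn's `precision_recall_fscore_support(y_true, y_pred, average="macro")` returns the same macro-averaged quantities directly from label vectors. Note that per-class values recomputed from the rounded precision and recall in Table 2 may differ from the printed F1 in the last digit.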
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Share and Cite

MDPI and ACS Style

Skiribou, C.; Elbahhar, F. V2X Wireless Technology Identification Using Time–Frequency Analysis and Random Forest Classifier. Sensors 2021, 21, 4286. https://doi.org/10.3390/s21134286

Chicago/Turabian Style

Skiribou, Camelia, and Fouzia Elbahhar. 2021. "V2X Wireless Technology Identification Using Time–Frequency Analysis and Random Forest Classifier" Sensors 21, no. 13: 4286. https://doi.org/10.3390/s21134286