Hierarchical Classification Method for Radio Frequency Interference Recognition and Characterization in Satcom

Abstract: The Quality of Service (QoS) and security of Satellite Communication (Satcom) are crucial, as Satcom plays a significant role in a wide range of applications, such as direct broadcast satellite, earth observation, navigation, and government/military systems. Therefore, it is necessary to ensure that transmissions are not corrupted, particularly in the presence of challenges such as Radio Frequency Interference (RFI), which is of primary concern for the efficiency of communications. The security of a wireless communication system can be improved using a robust RFI detection method, which could, in turn, lead to an effective mitigation process. This paper presents a new method to recognize received signal characteristics using a hierarchical classification in a multi-layer perceptron (MLP) neural network. The considered characteristics are the signal modulation and the type of RFI. In the experiments, a real-time video stream transmitted over a direct broadcast satellite link is utilized with four modulation types, namely, QPSK, 8APSK, 16APSK, and 32APSK. Moreover, it is assumed that the communication signal can be combined with one of three significant types of interference, namely, Continuous Wave Interference (CWI), Multiple CWI (MCWI), and Chirp Interference (CI). In addition, two robust feature selection techniques have been developed to select more informative features, which improves the classification precision. Furthermore, the robustness of the trained classifiers in predicting unknown signals is assessed at different Signal to Noise Ratios (SNRs).


Introduction
In an optimal radio communication system, the design objective is to allow users to share a medium with minimum or no interference [1]. RFI is one of the most critical issues facing Satcom since it corrupts radio communication networks, disrupting the transmitter channel and the reception capacity of the Signal of Interest (SoI) [2]. Consequently, robust RFI recognition and characterization through effective real-time monitoring is a critical capability [3], since civilian non-intentional RFI could compromise industry revenues, and military interference may cause mission failure and put lives in danger [4]. In this context, it is essential to deploy effective jamming detection methods in both civil and military applications. The jamming detection technologies need to be adapted to the kind of RFI such that their efficiencies are maximized [4][5][6]. Designing a highly accurate detection technique, which can deal with different jamming types under severe channel distortions, represents an interesting avenue for research.

Related Works
The main focus of this study is determining the type of received signal and its modulation in a Satcom scenario based on a supervised Machine Learning (ML) classification using MLP. ML methods have been used in different research studies for RFI classification and Automatic Modulation Classification (AMC) because of their flexibility in data processing and the accuracy of the designed models. In [12], an Artificial Neural Network (ANN) is proposed for jammer detection in wideband radios. To this end, spectral correlation is used as the feature extraction technique. According to the obtained results, the proposed technique performed efficiently at SNR values down to −3 dB.
In [16], three robust ML-based classifiers, namely MLP, Support Vector Machine (SVM), and Random Forest (RF), have been deployed to detect jamming attacks in the 5G wireless network. According to the results, RF has slightly more precise classification performance. Reference [17] presents an efficient MLP-based approach to recognize one of the jamming attacks known as Denial of Service (DoS). Moreover, an unsupervised correlation-based feature selection method using the Pearson Correlation Coefficient (PCC) is used to select more informative features. The authors in [18] present a robust MLP technique combined with a Genetic Algorithm (GA)-based feature selection to detect various intrusions such as Remote to Local (R2L), User to Root (U2R), and DoS attacks. In [19], an efficient MLP design has been developed to recognize different intrusions. In [20], an efficient MLP is proposed to determine whether or not a DoS attack exists in the wireless network.
In our previous work [21], we developed an MLP in different learning modes, namely Stochastic Gradient Descent (SGD), full-batch, and mini-batch. As the results show, the proposed technique can precisely classify four received signals, including the SoI and a combination of the SoI with three other jamming signals. Moreover, we deployed Principal Component Analysis (PCA) to select more appropriate features to optimize the classification process.
In this work, we intend to propose a solution that determines not only the type of received signal but also its modulation through AMC. AMC is a significant procedure for present and next-generation communication networks and facilitates the demodulation process at the receiver side [22]. Therefore, modulation recognition is an intermediate step between signal detection and demodulation. AMC is a system that automatically identifies the modulation type of the received signal [23], and the technology is widely used in various applications, such as dynamic spectrum management [24] and interference identification in CR [23].
In [25], a robust hierarchical classification based on MLP has been presented to recognize the modulation types of communication signals. As was thoroughly discussed in [14], AMC can be implemented following either a likelihood-based or a feature-based scenario. The disadvantages of a likelihood-based approach may include high computational complexity and sensitivity to impairments, such as phase and frequency offsets. Feature-based AMC may be comparatively more efficient [23] as it leverages robust extracted features such as instantaneous amplitude, phase, and frequency [26], cyclostationary features [27], higher-order cumulants [28], and spectral correlation features [22]. In [23,29], the authors investigated the recognition of different modulation types in a digital video broadcasting scenario based on higher-order cumulants and MLP.
Higher-order cumulant features for AMC are the primary concern of this study, inspired by [23], which proposes an approach based on higher-order cumulants of wavelet coefficients and MLP. The approach in [23] also applies PCA to select the more informative features and accelerate the classification process. The results indicate that this method precisely classifies various modulation types with an accuracy above 99% for SNRs ranging from −4 to 4 dB [23]. In [30], an AMC approach based on Decision Trees (DT) and higher-order cumulants is proposed. Additionally, different designs of DT, such as Fine Tree (FT), Medium Tree (MT), and Coarse Tree (CT), are also thoroughly analyzed. In [31], the author has applied SVM as the classifier and the 4th-order cyclic cumulant feature to classify three modulation types, namely ASK, BPSK, and QPSK, assuming that the channel SNR is in the range of −10 to 10 dB. The experimental result shows that the 4th-order cyclic cumulant feature has an efficient discrimination capability in both non-noisy and noisy channels [31].

Proposed Methodology
The focus of this research study is on recognizing and characterizing the received radio frequency signals. To this end, a hierarchical classification design is proposed in which the first level deals with signal type recognition, and the second one automatically classifies the modulation type. Figure 1 presents the overview of the set-up, configuration, and framework of this study. It includes four main steps, namely data acquisition, feature extraction, feature selection, and classification. Each step is presented in detail in the rest of this section.

Data Acquisition
As explained in detail in [21], the dataset created and used in this study is extracted from a real-time video stream, which is modulated and processed by GNU Radio. GNU Radio is a free and open-source software development toolkit that provides signal processing blocks to implement software radios. The SoI is transmitted using a Universal Software Radio Peripheral (USRP-N210) [32], which is an enhanced version of the USRP that includes a larger FPGA. In GNU Radio, the modulation type and amplitude of the transmitted signal can be easily adjusted [33].
A Satcom emulator [34] (RTLogic) is used for modeling a real-time communication channel. The programmatic control of the channel simulator is facilitated over an Ethernet connection using a control protocol or an optional plugin to the System Tool-Kit (STK) software. The channel simulator produces IF/RF signals that reproduce the signal behavior for any scenario, such as various satellite orbits around Earth [35]. The Kratos STK plugin provides real-time, phase-continuous control of the channel simulator when playing STK scenarios [34]. Furthermore, the generated jamming signals are transmitted using a NanoBee modem and are combined with the SoI by an additive combiner [21]. Finally, the combined signal is received by a MegaBee modem [21].
A summary of the dataset characteristics is presented in Table 1. The generated dataset consists of 300 samples for each class of modulation type, namely Quadrature Phase-Shift Keying (QPSK) and 8/16/32 Amplitude and Phase-Shift Keying (APSK).

Feature Extraction
Because the distinguishing characteristics of the RF signal differ between the two classification phases (RFI classification and characterization), this study uses a separate feature extraction method for "RFI classification" and for "modulation type recognition", as follows:

Features for RFI Classification:
For the RFI classification phase, inspired by [21], six features are extracted from each received signal x of size 1 by n: the mean, standard deviation, skewness, Real-Signal Kurtosis (RSK), average power, and average power of the wavelet coefficients (the 4th-level approximation and the details of levels 1 to 4):
• Mean and standard deviation (σ): computed over the n samples of x.
• Skewness (I/Q): the skewness of the signal is computed as in [36].
• Real-Signal Kurtosis (RSK): this feature has been proposed by the author of [37]. The kurtosis of the In-phase and Quadrature (I/Q) components of the signal is computed, and the RSK is obtained by averaging the two kurtosis values.
• Average power: according to [38], the average power of each signal over its length is the mean of its squared sample magnitudes, $P = \frac{1}{n}\sum_{i=1}^{n} |x_i|^2$ (Equation (6)).
• Average power of the wavelet coefficients: First, each observation (x) is decomposed up to 4 levels (in this case, levels higher than 4 are less informative) using the 10th-order Daubechies wavelet (db10). Notably, different types of wavelet have been tested, and the best classification result was obtained using db10. Second, the average power of the 4th-level approximation and of the details of levels 1 to 4 is computed by applying Equation (6).
Therefore, the size of each feature vector for the RFI classification phase is 1 by 10.
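The statistical features listed above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses the textbook population definitions (normalization by n), takes the I and Q components as two real-valued lists, and omits the wavelet-based features, which would require a wavelet library. All function names are hypothetical.

```python
import math

def mean(x):
    return sum(x) / len(x)

def std(x):
    # Population standard deviation (normalization by n is an assumption).
    m = mean(x)
    return math.sqrt(sum((v - m) ** 2 for v in x) / len(x))

def standardized_moment(x, order):
    # E[(x - mean)^order] / std^order
    m, s = mean(x), std(x)
    return sum((v - m) ** order for v in x) / (len(x) * s ** order)

def skewness(x):
    return standardized_moment(x, 3)

def kurtosis(x):
    return standardized_moment(x, 4)

def real_signal_kurtosis(i_part, q_part):
    # RSK: average of the kurtosis of the I and Q components.
    return 0.5 * (kurtosis(i_part) + kurtosis(q_part))

def average_power(i_part, q_part):
    # Mean squared magnitude of the complex samples I + jQ.
    return sum(a * a + b * b for a, b in zip(i_part, q_part)) / len(i_part)

# Toy I/Q record (hypothetical data, four samples).
i_part = [1.0, -1.0, 1.0, -1.0]
q_part = [1.0, 1.0, -1.0, -1.0]
rsk = real_signal_kurtosis(i_part, q_part)
power = average_power(i_part, q_part)
```

In a full pipeline, the five wavelet-power entries would be appended to these scalars to form the 1 by 10 vector.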

Features for Modulation Type Recognition:
For modulation type recognition, higher-order cumulant-based features are computed for the AMC process. In mathematics, moments are employed to describe the shape of a probability distribution. The pth-order moment with q conjugations for a received signal x[n] at time step n is defined as [39]:
$M_{pq} = E\left[x[n]^{p-q}\,(x^{*}[n])^{q}\right]$ (7)
where E[·] and (·)* are the expectation operator and the complex conjugate, respectively. Equation (7) can be approximated by the sample average over N samples [29]:
$M_{pq} \approx \frac{1}{N}\sum_{n=1}^{N} x[n]^{p-q}\,(x^{*}[n])^{q}$
In this work, the magnitudes of the 2nd-, 4th-, and 6th-order [21] cumulants are computed from these moments to distinguish between QPSK, 8APSK, 16APSK, and 32APSK, where C and M denote the cumulant and the moment, respectively, and the indices denote their order.
Notably, different orders of cumulant-based features were tested, and the best results were achieved using the above orders (|C21|, |C42| and |C63|).
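As an illustration of how such features can be estimated, the sketch below computes sample moments and then |C21|, |C42|, and |C63|. The moment-to-cumulant expansions used here are the ones commonly quoted in the AMC literature and are an assumption: the exact expressions used in this study may differ. Function names are hypothetical.

```python
import math

def moment(x, p, q):
    # Sample estimate of M_pq = E[x^(p-q) * conj(x)^q].
    return sum(v ** (p - q) * v.conjugate() ** q for v in x) / len(x)

def cumulant_features(x):
    m20, m21, m22 = moment(x, 2, 0), moment(x, 2, 1), moment(x, 2, 2)
    m41, m42, m43 = moment(x, 4, 1), moment(x, 4, 2), moment(x, 4, 3)
    m63 = moment(x, 6, 3)
    # Commonly quoted expansions (assumed, not taken from the source).
    c21 = m21
    c42 = m42 - abs(m20) ** 2 - 2 * m21 ** 2
    c63 = (m63 - 9 * m21 * m42 + 12 * m21 ** 3
           - 3 * m20 * m43 - 3 * m22 * m41 + 18 * m20 * m21 * m22)
    return abs(c21), abs(c42), abs(c63)

# Noise-free, unit-power QPSK constellation used as a sanity check.
s = 1 / math.sqrt(2)
qpsk = [complex(s, s), complex(-s, s), complex(-s, -s), complex(s, -s)]
c21_mag, c42_mag, c63_mag = cumulant_features(qpsk)
```

For the ideal QPSK constellation these magnitudes evaluate to 1, 1, and 4, which matches the theoretical values usually tabulated for that modulation.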

Feature Selection Techniques
In supervised classification, it is desirable to minimize the number of features to speed up the training and classification processes [40]. However, using fewer characteristic features may come at the expense of classification accuracy [40]; the trick in dimensionality reduction is to trade some precision for simplicity [15]. This study presents a comparative study of two feature selection techniques, namely PCA and GA. These techniques are chosen for their low noise sensitivity, low memory requirements, and high efficiency in the training process.

Principal Component Analysis (PCA)
In [41], PCA has been presented as an efficient feature selection method with lower computational complexity than other approaches, such as colony optimization. PCA as a feature selection approach is implemented as follows [21].
First, the mean of each feature set is calculated, and then the mean is subtracted from each value. This is a crucial step to ensure that the first principal component describes the direction of the maximum variance [15]. The second step is calculating the covariance matrix, which represents the linear dependency between each pair of features. The third step is the calculation of the eigenvectors and eigenvalues of the covariance matrix to determine the principal components of the data. The eigenvectors associated with the highest eigenvalues are considered to be the principal components. The main focus of PCA is to put the maximum possible information in the direction of the first component, then the maximum remaining information in the second one, and so on. If a feature is less significant in the principal components, it can also be interpreted as being less informative in the original space [15].
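The three steps above (centering, covariance, eigen-decomposition) can be sketched as follows. This is a minimal illustration that recovers only the first principal direction, using power iteration in place of a full eigen-solver; the normalization by n − 1 and all function names are assumptions.

```python
import math

def center(data):
    # Step 1: subtract the per-feature mean from every observation.
    n, d = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(d)]
    return [[row[j] - means[j] for j in range(d)] for row in data]

def covariance(centered):
    # Step 2: sample covariance matrix of the centered features.
    n, d = len(centered), len(centered[0])
    return [[sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(d)]
            for a in range(d)]

def first_component(cov, iterations=200):
    # Step 3 (partial): dominant eigenpair by power iteration.
    d = len(cov)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(c * c for c in w))
        v = [c / norm for c in w]
    cov_v = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
    eigenvalue = sum(v[a] * cov_v[a] for a in range(d))
    return eigenvalue, v

# Toy data lying on the line y = 2x (hypothetical values).
data = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]
eigenvalue, direction = first_component(covariance(center(data)))
```

Because the second feature has the larger weight in the recovered direction, it would be ranked as the more informative one under the interpretation given above.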

Genetic Algorithm (GA)
As shown in Figure 2, a GA starts with an initial population, which is a subset of all the possible solutions (also known as individuals) to the given problem [42]. Each individual has a set of genes, represented by a string of zeros and ones [42]. To evaluate the quality of a solution, a fitness value is assigned to each individual. Various strategies can then be applied to select the best individuals, known as parents [42]. After the parent selection step, variation operations such as mutation and crossover [42] are applied to generate new offspring [42]. Finally, these offspring replace the existing individuals in the population, and the process is repeated until the stopping criterion is reached.
As shown in Figure 3, the GA technique is adapted to this study through the following steps:
• Binary representation: In the feature selection step, each element of the feature vector may or may not be selected. Therefore, a binary string is used to represent the selected and discarded features. In the given string, each 1 indicates that the feature at the corresponding index is chosen, while each 0 marks a discarded feature [42].
• Population model: In a steady-state GA, one or more offspring are generated in each iteration, and they replace one or more individuals in the population [42].
• Population initialization: This study considers a random initialization, in which k bits of the string are randomly set to 1. In other words, k features are chosen out of the n-dimensional space of features [42].
• Parent selection: The "Rank Selection" approach is used for selecting parents since the individuals in the population have very close fitness values. In this work, the fitness value is the classification accuracy, which is computed from the number of True Detections (N_TD) and the number of False Detections (N_FD) [21]: $\text{Accuracy} = \frac{N_{TD}}{N_{TD} + N_{FD}} \times 100\%$. Each individual in the population is ranked according to its fitness, and the parent selection depends on the rank of each individual, not on the fitness value itself [42].
• Mutation: Swap mutation is deployed, in which two random positions on the chromosome are selected and their values are interchanged. It should be noted that in this application the crossover is equivalent to the mutation: exchanging the elements of two selected parents at the chosen indices (single-point or other crossover methods) while preserving the number of 1s in the solution has the same effect as the mutation process [49].
• Survival selection: The survival policy determines which individuals are kept in the next generation. This is a crucial step, as it must ensure that the fitter individuals are not removed from the population while diversity is maintained. In this work, a fitness-based selection is used, in which the children tend to replace the least fit individuals in the population [49].
• Termination criterion: The algorithm is terminated when a set number of generations is reached.
Figure 4 shows the variation in the best fitness values during 100 generations for RFI classification using five features. The classification accuracy improves over the 100 generations, up to 99.81%. Moreover, the best fitness value corresponds to the chromosome [0010001111], which indicates that the more informative features are the standard deviation of the received signal and the average power of the wavelet detail coefficients (D1 to D4) using the Daubechies 10 (db10) wavelet.
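The GA steps listed above can be sketched as a small steady-state loop. This is a hedged illustration, not the authors' implementation: the fitness here is a hypothetical toy function (`toy_fitness`) standing in for the MLP classification accuracy, its "informative" index set simply reuses the positions of the 1s in the chromosome reported above, and the population size and seed are arbitrary.

```python
import random

def random_chromosome(n, k, rng):
    # Random initialization: exactly k of the n bits are set to 1.
    bits = [1] * k + [0] * (n - k)
    rng.shuffle(bits)
    return bits

def rank_select(population, fitness, rng):
    # Rank selection: the selection weight grows with rank, not raw fitness.
    ranked = sorted(population, key=fitness)  # worst first
    weights = list(range(1, len(ranked) + 1))
    return rng.choices(ranked, weights=weights, k=1)[0]

def swap_mutation(chromosome, rng):
    # Swap two random positions; the number of 1s is preserved.
    child = chromosome[:]
    a, b = rng.sample(range(len(child)), 2)
    child[a], child[b] = child[b], child[a]
    return child

def run_ga(fitness, n=10, k=5, pop_size=20, generations=100, seed=0):
    rng = random.Random(seed)
    population = [random_chromosome(n, k, rng) for _ in range(pop_size)]
    for _ in range(generations):  # termination: fixed generation count
        child = swap_mutation(rank_select(population, fitness, rng), rng)
        # Fitness-based survival: the child may replace the least fit individual.
        worst = min(range(pop_size), key=lambda i: fitness(population[i]))
        if fitness(child) >= fitness(population[worst]):
            population[worst] = child
    return max(population, key=fitness)

def toy_fitness(chromosome):
    # Hypothetical stand-in for classification accuracy.
    informative = {2, 6, 7, 8, 9}
    return sum(chromosome[i] for i in informative)

best = run_ga(toy_fitness)
```

In the setting of this study, `toy_fitness` would be replaced by a routine that trains the classifier on the selected features and returns its accuracy, which is what makes the GA approach computationally expensive.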

Classification
As shown in Figure 1, the classification phase has a hierarchical design with two levels. The first level is a global search using one classifier to detect the type of received signal (SoI, SoI + CWI, SoI + MCWI, and SoI + CI). At the second level, a localized search is deployed for each type of signal to recognize the modulation type. To this end, we used the same MLP-based classifier design proposed in our previous work [21], trained using the 10-fold cross-validation technique.
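The two-level dispatch described above can be sketched as follows. The classifiers here are hypothetical stubs standing in for the trained MLPs; only the control flow (one global classifier, then one modulation classifier per signal type) reflects the text.

```python
SIGNAL_TYPES = ["SoI", "SoI + CWI", "SoI + MCWI", "SoI + CI"]

def classify_hierarchically(rfi_features, amc_features,
                            rfi_classifier, amc_classifiers):
    # Level 1: global search over the four received-signal types.
    signal_type = rfi_classifier(rfi_features)
    # Level 2: localized search with the classifier tied to that type.
    modulation = amc_classifiers[signal_type](amc_features)
    return signal_type, modulation

# Hypothetical stubs; real classifiers would be trained models.
def stub_rfi_classifier(features):
    return SIGNAL_TYPES[1] if features[0] > 0.5 else SIGNAL_TYPES[0]

def stub_amc_classifier(features):
    return "QPSK" if features[0] < 1.5 else "16APSK"

amc_bank = {name: stub_amc_classifier for name in SIGNAL_TYPES}
result = classify_hierarchically([0.9], [1.0], stub_rfi_classifier, amc_bank)
```

One consequence of this design is that an error at the first level routes the sample to the wrong modulation classifier, so the first-level accuracy bounds the accuracy of the whole chain.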
Cross-validation is a statistical approach used to evaluate the ability of ML-based models on unseen data. Generally, it is deployed in applied ML to compare and select a model for a given predictive modeling problem [52]. Its benefits are that it is easy to understand, simple to implement, and yields skill estimates that generally have a lower bias than other methods, such as a simple train/test split [52]. The procedure has a key parameter, k, that refers to the number of groups into which a given data sample is split; hence, the procedure is often called k-fold cross-validation [52]. The implementation steps of k-fold cross-validation are as follows:
Step 1: Shuffle the dataset randomly and split it into k groups.
For each unique group:
Step 2: Take the group as a holdout or test dataset.
Step 3: Take the remaining groups as a training dataset.
Step 4: Fit a model on the training set and evaluate it on the test set.
Step 5: Retain the classification accuracy and discard the model.
Finally, the model's performance is evaluated by the average of all the obtained classification accuracy values.
Each observation in the dataset is assigned to an individual group and stays in that group during the training procedure. Therefore, each sample is used in the holdout set once and used to train the model k − 1 times [52]. Choosing an appropriate value for k is crucial, since a poorly chosen value may result in a misrepresentation of the model's skill, such as a score with a high variance or bias [52]. This study uses k = 10 because the experimental results showed a low bias in the estimated model skill. The value k = 10 is also widespread in applied ML tasks.
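The k-fold procedure described above can be sketched in plain Python. The model here is a hypothetical one-dimensional threshold classifier on toy data, used only so that the loop is runnable; in this study the fitted model would be the MLP.

```python
import random

def k_fold_indices(n_samples, k, seed=0):
    # Step 1: shuffle the indices and split them into k groups.
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]

def cross_validate(samples, labels, k, fit, score, seed=0):
    accuracies = []
    for held_out in k_fold_indices(len(samples), k, seed):
        held = set(held_out)  # Step 2: this group is the test set
        train = [i for i in range(len(samples)) if i not in held]  # Step 3
        model = fit([samples[i] for i in train], [labels[i] for i in train])
        # Steps 4 and 5: evaluate, keep the accuracy, discard the model.
        accuracies.append(score(model,
                                [samples[i] for i in held_out],
                                [labels[i] for i in held_out]))
    return sum(accuracies) / k

def fit_threshold(xs, ys):
    # Hypothetical toy model: midpoint between the two class means.
    mean0 = sum(x for x, y in zip(xs, ys) if y == 0) / ys.count(0)
    mean1 = sum(x for x, y in zip(xs, ys) if y == 1) / ys.count(1)
    return (mean0 + mean1) / 2

def accuracy(threshold, xs, ys):
    hits = sum((x > threshold) == bool(y) for x, y in zip(xs, ys))
    return hits / len(xs)

# Toy, well-separated data (hypothetical values).
samples = list(range(10)) + list(range(100, 110))
labels = [0] * 10 + [1] * 10
mean_accuracy = cross_validate(samples, labels, 10, fit_threshold, accuracy)
```

Each index appears in exactly one fold, so every sample is held out once and used for training in the other k − 1 iterations.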

Results and Discussion
This section presents and analyses the performance of the proposed algorithms for both RFI and modulation classification. All the simulations have been performed with MATLAB (Version R2019b) on a computer with a Core i5-5257U CPU operating at 2.70 GHz and 8 GB of RAM. Moreover, the dataset used for the classification has been generated at an AWGN power of −140 dBm, which corresponds to an SNR of approximately 9 dB. As shown in Table 2, the power of the received signals (dBm/Hz) is measured using a signal analyzer in a 3.84 MHz Bandwidth (BW). The Jamming to Signal Ratio (JSR) is computed as the ratio of the jamming power to the SoI power, i.e., the difference between the two powers when both are expressed in dB. Accordingly, the measured JSR for CWI, MCWI, and CI is 7 dB, 5 dB, and 8 dB, respectively.

RFI Classification Results
The performance of the MLP approach developed in [21] is considered for RFI classification. Therefore, a two-layer MLP is used with 10 and 4 neurons in the input and output layers, respectively. Moreover, the logarithmic sigmoid and linear functions are used as the hidden and output layer activation functions [21]. Table 3 presents the details of the designed MLP using different Hidden Layer Neurons (HLN) and Batch Sizes (BS). The training process is performed using 10-fold cross-validation. As the results show, increasing the HLN can slightly improve the accuracy. Moreover, due to the high variety of classes in the dataset, the online learning mode, in which the network's key parameters are updated after each sample, performs better. Therefore, the most accurate result has been achieved using HLN = 30 and online learning mode (BS = 1), with a precision of 99.58%.
Table 4 illustrates the results of applying the proposed MLP + GA approach for RFI classification, using different Numbers of Features (NoF) with HLN = 30. According to the results, the highest accuracy, 99.97%, is obtained with BS = 1 and NoF = 8. Also, the classification precision increases as more features are used.
Table 5 demonstrates the result of deploying MLP + PCA for the first classification phase using various numbers of features and learning modes, namely online learning (BS = 1), mini-batch, and full-batch. As can be seen, the classification accuracy reached 97.05% with only five features.
Table 6 illustrates the MLP-based classification results for modulation type recognition. The number of neurons in the input, hidden, and output layers is 3, 30, and 4, respectively. The results show that in the case of no jamming, the average AMC accuracy is 87%. In the presence of jamming signals, the classification performance is degraded locally by about 3%, 16%, and 36% for CWI, CI, and MCWI, respectively. Each AMC classifier refers to a specific class of the received signal: AMC1, AMC2, AMC3, and AMC4 refer to SoI, SoI + CWI, SoI + MCWI, and SoI + CI, respectively.
Table 7 demonstrates the effect of GA-based feature selection on the second phase of classification to recognize the modulation types. As is shown, MLP + GA can classify the four modulation types for each received signal using only one feature, with a precision of 90.83%, 86.66%, 61.66%, and 76.67% for AMC1 to AMC4, respectively.
Table 8 illustrates the effect of PCA-based feature selection on classifying the modulation types. As is shown, MLP + PCA using only one feature achieves an accuracy of 86%, 89.77%, 51.11%, and 67.66% for AMC1 to AMC4, respectively. The learning process is accelerated using PCA because the classifier is trained with fewer features.
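For readers who want to see the shape of the network used in the RFI classification experiments, the sketch below runs one forward pass through a 10-30-4 MLP with a log-sigmoid hidden layer and a linear output layer. The weights are random placeholders (an assumption for illustration only), so the predicted class carries no meaning; training is not shown.

```python
import math
import random

def logsig(z):
    # Logarithmic sigmoid activation of the hidden layer.
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, w_hidden, b_hidden, w_out, b_out):
    hidden = [logsig(sum(w * v for w, v in zip(row, x)) + b)
              for row, b in zip(w_hidden, b_hidden)]
    # Linear output layer: one score per signal type.
    outputs = [sum(w * h for w, h in zip(row, hidden)) + b
               for row, b in zip(w_out, b_out)]
    return hidden, outputs

rng = random.Random(0)
n_in, n_hidden, n_out = 10, 30, 4  # sizes taken from the text
# Placeholder weights; a trained network would supply these.
w_hidden = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hidden)]
b_hidden = [rng.uniform(-1, 1) for _ in range(n_hidden)]
w_out = [[rng.uniform(-1, 1) for _ in range(n_hidden)] for _ in range(n_out)]
b_out = [rng.uniform(-1, 1) for _ in range(n_out)]

x = [rng.uniform(-1, 1) for _ in range(n_in)]  # stand-in feature vector
hidden, outputs = forward(x, w_hidden, b_hidden, w_out, b_out)
predicted_class = max(range(n_out), key=lambda c: outputs[c])
```

With BS = 1 (online learning), the weights would be updated after each such forward pass and its corresponding backward pass.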

Comparative Classification Results
This section includes a summary of the results for RFI classification and characterization. Figure 5 demonstrates the comparative analysis of deploying the proposed approaches for the first-level classification to recognize the type of received signals. All three approaches reach an accuracy of more than 90%, while MLP + GA with five features slightly outperforms the other techniques. To compare the computational cost of the approaches, we used the "tic" and "toc" functions in MATLAB to measure the consumed training time. As the results show, MLP + GA is computationally more expensive and therefore has a comparatively longer computation time, while MLP + PCA is the most efficient one.

Prediction Phase Results
In this section, the robustness of the proposed techniques is thoroughly analyzed in detecting unseen data generated at different AWGN powers ranging from −140 to −125 dBm. The performance of the classifier varies depending on the noise level and the existence of jammers. In detecting the type of the received signal, the MLP trained using 10-fold cross-validation has a higher generalization accuracy on unseen data generated at the same noise level as the training data. Accordingly, the prediction accuracy degrades as the AWGN power increases. Table 9 shows the results of evaluating the trained classifier's performance in recognizing the type of received signal at different AWGN powers. Tables 10-13 show the effect of noise on the trained classifiers in predicting the type of new data. As is shown, the two main factors affecting the prediction accuracy are the type of jammer and the noise power. In particular, in the presence of MCWI and CI, the classifiers cannot perform precisely.

Conclusions
In this study, a novel hierarchical classifier has been proposed to facilitate RFI classification and characterization. The proposed classifier not only classifies the type of the received signal but also determines its modulation. Moreover, three robust approaches have been developed, namely MLP + 10-fold cross-validation, MLP + GA, and MLP + PCA. The results confirm that for the RFI classification phase, the classification accuracy reaches 99.81% using the MLP + GA technique, depending on the chosen batch size. In general, MLP + GA performs more precisely in both RFI classification and modulation type recognition. Moreover, the results confirm that PCA-based feature selection is more efficient than GA in terms of computation time and computational complexity. In future studies, we intend to look at deep learning-based classification techniques using raw received data to avoid complicated feature extraction and selection steps.

Materials
The raw RFI dataset is available at https://zenodo.org/record/3819586#315.XriaSGhKh3h. In this dataset, the Signal of Interest (SoI) is a real-time video stream that is transmitted using DVB-S2 standards in four modulation types (QPSK and 8/16/32 APSK). Furthermore, this SoI is combined with three well-known jamming signals, namely Continuous Wave Interference (CWI), Multiple CWI (MCWI), and Chirp Interference (CI). This dataset includes 300 samples per modulation type for each type of signal. Therefore, there are 4800 observations in the dataset (300 samples × 4 modulation types × 4 signal types), and each sample is a vector of size 1 by 32488 (8 ms) at sample frequency 40 Hz. Also, the AWGN power is −140 dBm, which is approximately equal to SNR = 9 dB.