Hierarchical Classification Method for Radio Frequency Interference Recognition and Characterization

Abstract: Satellite communication (Satcom) relies on artificial geostationary satellites to facilitate a wide range of telecommunications. Ensuring its quality of service (QoS) and security is crucial in government and military applications. The most challenging situation for efficient Satcom is a radio frequency interference (RFI) environment. Thus, it is necessary to ensure that transmissions are not corrupted, or at least to sense the quality of the spectrum. This paper presents a new method to recognize received signal characteristics using hierarchical classification with a multi-layer perceptron (MLP) neural network. We consider the signal modulation and the type of RFI as the characteristics of a real-time video stream transmitted over a direct broadcast satellite link. Four modulation types are investigated in this study. Moreover, the combination of the communication signal with various kinds of interference, and the effect of each combination on the classification method, are analyzed. In addition, two feature selection techniques have been developed to reduce the dimensionality of the dataset, which helps optimize the classification process. The results show that the Genetic Algorithm (GA) slightly outperforms Principal Component Analysis (PCA) for feature selection. Furthermore, the robustness of the proposed techniques in detecting unknown signals is assessed at different signal-to-noise ratios (SNRs).


Introduction
In an optimal radio communication system, the design objective is to allow users to share a medium with minimum or no interference [1]. RFI is one of the most critical issues facing Satcom since it corrupts radio communication networks, disrupting the transmitter channel and the reception of the Signal of Interest (SoI) [2]. Robust RFI recognition and characterization through effective real-time monitoring is therefore a critical capability [3], since unintentional civilian RFI can compromise industry revenues, and military interference may cause mission failure and put lives in danger [4]. In this context, it is essential to deploy effective anti-jamming methods in both civil and military applications. Anti-jamming technologies need to be adapted to the kind of RFI so that their efficiency is maximized [4]-[6]. Designing a highly accurate detection technique that can deal with different jamming types under severe channel distortions represents an interesting avenue for research [5]. Several spectrum sensing techniques, such as Cyclo-Stationary Feature Detection (CFD) [7], Energy Detection (ED) [8], and matched filtering-based detection [9], have been proposed for RFI monitoring in Cognitive Radio (CR).
Automatic Modulation Classification (AMC) is a system that automatically identifies the modulation type of the received signal [22], and the technology is widely used in various applications, such as dynamic spectrum management [23] and interference identification in CR [22]. As discussed thoroughly in [32], AMC can be implemented using either a likelihood-based or a feature-based approach. The disadvantages of a likelihood-based approach may include high computational complexity and sensitivity to impairments such as phase and frequency offsets.
Feature-based AMC may be comparatively more efficient [22], as it leverages robust extracted features such as instantaneous amplitude, phase, and frequency [33], cyclostationary features [25], higher-order cumulants [26], and spectral correlation features [34]. In [22] and [27], the authors investigated the recognition of different modulation types in a digital video broadcasting scenario based on higher-order cumulants and an MLP. Higher-order cumulant features for AMC are the primary concern of this study, inspired by [22], which proposes an approach based on higher-order cumulants of wavelet coefficients and an MLP. That work also applies PCA to select the more informative features in order to accelerate the classification process, and its results indicate that the method classifies various modulation types with an accuracy above 99% at SNRs from −4 to 4 dB. In [28], an AMC approach based on a decision tree (DT) and higher-order cumulants is proposed. Different DT designs, namely Fine Tree (FT), Medium Tree (MT), and Coarse Tree (CT), are also thoroughly analyzed. In [29], the author applied a support vector machine (SVM) as the classifier and the 4th-order cyclic cumulant as the feature to classify three modulation types (ASK, BPSK, and QPSK), assuming that the channel noise is in the range of −10 to 10 dB. The experimental results show that the 4th-order cyclic cumulant feature discriminates efficiently in both non-noisy and noisy channels.

Proposed Methodology
The focus of this research study is on recognizing and characterizing the received radio frequency signals. To this end, a hierarchical classification design is proposed in which the first level deals with signal type recognition, and the second one automatically classifies the modulation type. Figure 1 presents the overview of the set-up, configuration, and framework of this study. It includes four main steps, namely, data acquisition, feature extraction, feature selection, and classification. Each step is presented in detail in the rest of this section.
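The two-level design described above can be illustrated with a minimal sketch. It assumes that each level is available as a plain callable; the function names, the toy rules, and the label strings are hypothetical stand-ins for the trained classifiers and are not taken from the original implementation.

```python
from typing import Callable, Dict, Sequence, Tuple

# Type aliases for readability; every name in this sketch is hypothetical.
Features = Sequence[float]
Classifier = Callable[[Features], str]


def classify_hierarchically(
    rfi_features: Features,
    amc_features: Features,
    signal_type_classifier: Classifier,
    modulation_classifiers: Dict[str, Classifier],
) -> Tuple[str, str]:
    """Level 1 picks the signal type; level 2 picks the modulation."""
    # Level 1: one global classifier decides which kind of signal was received.
    signal_type = signal_type_classifier(rfi_features)
    # Level 2: the classifier dedicated to that signal type decides the modulation.
    modulation = modulation_classifiers[signal_type](amc_features)
    return signal_type, modulation


# Toy stand-ins for trained models (fixed rules, for illustration only).
def toy_signal_type(features: Features) -> str:
    return "SoI" if features[0] < 0.5 else "SoI+CWI"


toy_modulation: Dict[str, Classifier] = {
    "SoI": lambda f: "QPSK" if f[0] < 1.0 else "16APSK",
    "SoI+CWI": lambda f: "8APSK",
}

print(classify_hierarchically([0.2], [0.7], toy_signal_type, toy_modulation))
```

Keeping one second-level classifier per signal type means that each modulation classifier only has to cope with a single interference condition.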

Data acquisition
As explained in detail in [30], the dataset created and used in this study is extracted from a real-time video stream, which is modulated and processed by GNU Radio. GNU Radio is a free and open-source software development toolkit that provides signal processing blocks to implement software radios. The SoI is transmitted using a Universal Software Radio Peripheral (USRP-N210) [37], which is an enhanced version of the USRP that includes a larger FPGA. This allows users to move additional functionality into the FPGA, increasing the maximum processing capability in both communication directions while offering potential improvements in processing latency. In GNU Radio, the modulation type and amplitude of the transmitted signal can be easily adjusted [38].
A Satcom emulator [39] (RTLogic) is used to model a real-time communication channel. Programmatic control of the channel simulator is provided over an Ethernet connection using a control protocol or an optional plugin to the System Tool-Kit (STK) software. The channel simulator produces IF/RF signals that reproduce the signal behavior of a given scenario. The Kratos STK plugin provides real-time, phase-continuous control of the channel simulator when playing STK scenarios [39]. The generated jamming signals are transmitted using a NanoBee modem and are combined with the SoI by an additive combiner. Finally, the combined signal is received by a MegaBee modem.
A summary of the dataset characteristics is presented in Table 1. The generated dataset consists of 300 samples for each modulation class, namely quadrature phase-shift keying (QPSK) and 8/16/32 asymmetric phase-shift keying (APSK). Any digital modulation scheme uses a finite number of distinct signals (4, 8, 16, and 32 in this paper) to represent digital data. QPSK and APSK use a finite number of phases, each assigned a unique pattern of binary digits. Each continuous time-series signal has a length of 8 ms.

Feature extraction
Because the distinguishing characteristics of the RF signal differ between the two classification phases (RFI classification and characterization), this study uses two feature extraction methods, one for "RFI classification" and one for "modulation type recognition," as follows:
• Features for RFI classification: Following [30], the six extracted feature types for the RFI classification phase are the mean, standard deviation, skewness, Real-Signal Kurtosis (RSK), average power, and the average power of the wavelet coefficients (the 4th approximation and the details from the 1st to the 4th order). Because the wavelet average power is computed for five sets of coefficients, the size of each feature vector for the RFI classification phase is 1 by 10. The importance of these features is explained in detail in [30].
• Features for modulation type recognition: For modulation type recognition, higher-order cumulant-based features are computed for the AMC process. In statistics, moments are used to describe the probability distribution of a random variable. The p-th order moment with q conjugations of a received signal x[n] at time step n is defined as [41]:
$$M_{pq} = E\left[x[n]^{p-q}\,(x[n]^{*})^{q}\right] \quad (1)$$
where E(.) and []* are the expectation operator and the complex conjugate, respectively. Eq. 1 can be approximated by the sample average over N received samples [27]:
$$M_{pq} \approx \frac{1}{N}\sum_{n=1}^{N} x[n]^{p-q}\,(x[n]^{*})^{q} \quad (2)$$
In this work, the magnitudes of the 2nd-, 4th-, and 6th-order cumulants [27] are computed from these moments to distinguish between QPSK, 8APSK, 16APSK, and 32APSK. Here C and M denote cumulant and moment, respectively, and the subscripts denote their order and number of conjugations.
Notably, different orders of cumulant-based features were tested, and the best results were achieved using the above orders (|C21|, |C42|, and |C63|).
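As an illustration of the cumulant features named above, the following sketch estimates |C21|, |C42|, and |C63| from sample moments. The moment-to-cumulant expressions used here are the standard ones for a zero-mean complex signal whose third-order moments vanish; they are an assumption on our part, since the exact expressions used in this study are not reproduced in this text. The function names are hypothetical.

```python
import math
from typing import Sequence, Tuple


def moment(x: Sequence[complex], p: int, q: int) -> complex:
    """Sample estimate of M_pq = E[x^(p-q) * conj(x)^q]."""
    return sum(v ** (p - q) * v.conjugate() ** q for v in x) / len(x)


def cumulant_features(x: Sequence[complex]) -> Tuple[float, float, float]:
    """Return (|C21|, |C42|, |C63|) for a zero-mean complex sequence."""
    m20, m21 = moment(x, 2, 0), moment(x, 2, 1)
    m41, m42 = moment(x, 4, 1), moment(x, 4, 2)
    m63 = moment(x, 6, 3)
    c21 = m21
    c42 = m42 - abs(m20) ** 2 - 2 * m21 ** 2
    # Assumed standard form; uses M22 = conj(M20) and M43 = conj(M41).
    c63 = (m63 - 9 * m21 * m42 + 12 * m21 ** 3
           - 3 * m20 * m41.conjugate() - 3 * m20.conjugate() * m41
           + 18 * abs(m20) ** 2 * m21)
    return abs(c21), abs(c42), abs(c63)


# Ideal, noise-free, unit-power QPSK with all four symbols equally represented.
qpsk = [complex(i, q) / math.sqrt(2) for i in (1, -1) for q in (1, -1)] * 250
print(cumulant_features(qpsk))  # close to (1, 1, 4), the theoretical QPSK values
```

For real captures, the signal would typically be mean-removed and power-normalized first, so that the features reflect the constellation shape rather than the received power.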

Feature selection techniques
In supervised classification, reducing the number of features speeds up the training and classification processes [42]. However, using fewer features may come at the expense of classification accuracy [42]; the trick in dimensionality reduction is to trade some precision for simplicity [43]. This study compares two feature selection techniques, namely PCA and GA. These techniques are used because of their low noise sensitivity, low memory requirements, and high efficiency in the training process.

Principal Component Analysis (PCA)
This technique has been used in a wide range of computer science applications for feature extraction [43]. PCA projects a dataset from many correlated coordinates onto fewer uncorrelated coordinates called principal components, preserving as much information as possible [43]. Moreover, in [44], PCA is presented as an efficient feature selection method with lower computational complexity than other approaches, like colony optimization [44]. As shown in Algorithm 1, PCA for feature extraction is implemented in four steps. First, the mean of each feature set is calculated, and the mean is then subtracted from each value. This step is crucial to ensure that the first principal component describes the direction of maximum variance [46]. The second step is calculating the covariance matrix, which represents the linear dependency between each pair of features. The third step is the calculation of the eigenvectors and eigenvalues of the covariance matrix to determine the principal components of the data. The eigenvectors associated with the highest eigenvalues are taken as the principal components. The main focus of PCA is to put the maximum possible information in the direction of the first component, then the maximum remaining information in the second one, and so on. If a component is less significant for feature extraction, it can also be interpreted as being less informative in the original space [46]. The implementation steps of PCA have been fully explained in [30].

Genetic Algorithm (GA)
GA is a search-based optimization technique inspired by genetics and natural selection [56]. GA has been widely used not only for feature selection but also for optimizing the parameters of an Artificial Neural Network (ANN), such as its weights [42], [47]-[55].
As shown in Figure 2, a GA starts with an initial population, which is a subset of all possible solutions (also known as individuals) to the given problem [56]. Each individual has a set of genes, represented by a string of zeros and ones [56]. To evaluate the quality of a solution, a fitness value is assigned to each individual. Various strategies can then be applied to select the best individuals, known as parents [56]. After parent selection, variation operations such as mutation and crossover are applied to generate new off-spring [56]. Finally, the off-spring replace existing individuals in the population, and the process is repeated until the stopping criterion is reached. To adapt the GA technique to this study, as presented in the pseudo-code of Algorithm 2, the following steps are proposed:
• Binary representation: In the feature selection step, each element of the feature vector may or may not be selected. Therefore, a binary string is used to represent the selected and discarded features. In this string, each 1 indicates that the feature at the corresponding index is chosen, while each 0 marks a discarded feature [56].
• Population model: In the steady-state GA model, one or more off-spring are generated in each iteration, and they replace one or more individuals in the population [56].
• Population initialization: This study considers a random initialization, in which k bits of the string are randomly set to 1. In other words, k features are chosen out of the n-dimensional space of features [56].
• Parent selection: The "Rank Selection" approach is utilized for selecting parents since the individuals in the population have very close fitness values. In this study, the fitness function is the classification accuracy, which is defined as the number of correct detections divided by the total number of detections. Therefore, each individual in the population is ranked according to their fitness, and the parent selection depends on the rank of each individual, not on the fitness value [56].
• Mutation: Swap mutation is used, in which two random positions on the chromosome are selected and their values are interchanged. It should be noted that in this application the crossover acts in the same way as the mutation: when two parents are selected and the elements at the selected indices are exchanged (by single-point or another crossover method) while preserving the number of 1s in the solution, the operation mimics the mutation process [53].
• Survival selection: The survival policy determines which individuals are kept in the next generation. This is a crucial step, as it must ensure that the fitter individuals are not removed from the population while diversity is maintained. In this work, a fitness-based selection is used, in which the children tend to replace the least fit individuals in the population [53].
• Termination criterion: The algorithm is terminated when a set number of generations is reached.

Algorithm 2
    ...
        Replace the individual with the minimum fitness value with the new off-spring
    end if
    Save the individual with the maximum fitness value
end for
return the individual with the maximum fitness value as the best feature set

Figure 3 shows the variation in the best fitness value during 100 generations for RFI classification using five features. The classification accuracy improves over the 100 generations, reaching up to 99.81%. Moreover, the best fitness value corresponds to the chromosome [0010001111], which indicates that the more informative features are the standard deviation of the received signal and the average power of the 4th-order wavelet detail coefficients (D1 to D4) using 10 dB mother wavelet.
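The GA steps listed above (binary chromosome with k ones, rank selection, swap mutation, replacement of the least fit individual, and a fixed number of generations) can be sketched as follows. This is a minimal illustration and not the authors' Algorithm 2: the fitness function is a hypothetical toy in place of the MLP classification accuracy, and the rule "replace the worst individual only if the child is fitter" is our assumption about the condition that is not legible in the pseudo-code.

```python
import random
from typing import Callable, List

Chromosome = List[int]
Fitness = Callable[[Chromosome], float]


def random_chromosome(n: int, k: int, rng: random.Random) -> Chromosome:
    """Binary string of length n with exactly k ones (k selected features)."""
    ones = set(rng.sample(range(n), k))
    return [1 if i in ones else 0 for i in range(n)]


def rank_select(population: List[Chromosome], fitness: Fitness,
                rng: random.Random) -> Chromosome:
    """Rank selection: the pick probability depends on rank, not raw fitness."""
    ranked = sorted(population, key=fitness)       # least fit first
    weights = list(range(1, len(ranked) + 1))      # rank 1 .. N
    return rng.choices(ranked, weights=weights, k=1)[0]


def swap_mutation(parent: Chromosome, rng: random.Random) -> Chromosome:
    """Swap two random positions; the number of ones is preserved."""
    child = parent[:]
    i, j = rng.sample(range(len(child)), 2)
    child[i], child[j] = child[j], child[i]
    return child


def select_features(n: int, k: int, fitness: Fitness, pop_size: int = 20,
                    generations: int = 100, seed: int = 0) -> Chromosome:
    rng = random.Random(seed)
    population = [random_chromosome(n, k, rng) for _ in range(pop_size)]
    best = max(population, key=fitness)
    for _ in range(generations):
        child = swap_mutation(rank_select(population, fitness, rng), rng)
        worst = min(range(pop_size), key=lambda i: fitness(population[i]))
        if fitness(child) > fitness(population[worst]):   # assumed condition
            population[worst] = child
        best = max(population + [best], key=fitness)      # keep the fittest
    return best


# Hypothetical toy fitness: fraction of "informative" indices that are selected.
INFORMATIVE = {1, 6, 7, 8, 9}


def toy_fitness(chromosome: Chromosome) -> float:
    return sum(chromosome[i] for i in INFORMATIVE) / len(INFORMATIVE)


best = select_features(n=10, k=5, fitness=toy_fitness)
print(best, toy_fitness(best))
```

In the setting of this study, `toy_fitness` would be replaced by a function that trains and evaluates the classifier on the selected feature columns, which is what makes the GA comparatively expensive.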

Classification
As shown in Figure 1, the classification phase has a hierarchical design with two levels. The first level is a global search using one classifier to detect the type of the received signal (SoI, CWI, MCWI, and CI). A localized search is then deployed for each type of signal to recognize the modulation type. To this end, we used the same MLP-based classifier design proposed in our previous work [30], trained using the 10-fold cross-validation technique. Cross-validation is a statistical approach used to evaluate the ability of ML-based models on unseen data. Generally, it is deployed in applied ML to compare and select a model for a given predictive modeling problem [40]. The benefits of this technique are that it is easy to understand, simple to implement, and results in skill estimates that generally have a lower bias than other methods, such as a simple train/test split [40]. The procedure has a key parameter, k, that refers to the number of groups into which a given data sample is split; hence, the procedure is often called k-fold cross-validation [40]. Algorithm 3 illustrates the implementation steps of k-fold cross-validation as follows:

Algorithm 3: k-fold cross-validation
    Split the dataset into k groups
    for each group do
        Take the group as a holdout or test data set
        Take the remaining groups as a training data set
        Fit a model on the training set and evaluate it on the test set
        Retain the evaluation score and discard the model
    end for
    Summarize the skill of the model using the sample of model evaluation scores
    return Evaluation score

As shown in Algorithm 3, each observation in the dataset is assigned to an individual group and stays in that group during the training procedure. Therefore, each sample is used in the holdout set once and used to train the model k − 1 times [40]. Choosing a suitable value for k is crucial, since a poorly chosen value may result in a misrepresentative estimate of the model's skill, such as a score with high variance or high bias [40].
This study uses k = 10, because experiments with this value produced estimates of model skill with low bias. The value k = 10 is also widespread in applied ML tasks.
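The k-fold procedure described above can be sketched in a few lines. This is a generic illustration, assuming a hypothetical one-dimensional toy dataset and a trivial threshold classifier in place of the MLP; the function names are ours.

```python
import random
from statistics import mean
from typing import Callable, List, Sequence


def make_folds(n_samples: int, k: int, seed: int = 0) -> List[List[int]]:
    """Shuffle the sample indices and split them into k disjoint groups."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def k_fold_scores(samples: Sequence, labels: Sequence, fit: Callable,
                  score: Callable, k: int = 10, seed: int = 0) -> List[float]:
    """Hold out each group once; train on the remaining k - 1 groups."""
    folds = make_folds(len(samples), k, seed)
    scores = []
    for held_out in folds:
        held = set(held_out)
        train = [i for fold in folds for i in fold if i not in held]
        model = fit([samples[i] for i in train], [labels[i] for i in train])
        scores.append(score(model, [samples[i] for i in held_out],
                            [labels[i] for i in held_out]))
        # The model is discarded here; only the score is retained.
    return scores


# Toy problem: label is 1 when the sample is at least 0.5.
samples = [i / 100 for i in range(100)]
labels = [int(s >= 0.5) for s in samples]


def fit_threshold(xs: Sequence[float], ys: Sequence[int]) -> float:
    """Toy 'model': the midpoint between the two class means."""
    class0 = [x for x, y in zip(xs, ys) if y == 0]
    class1 = [x for x, y in zip(xs, ys) if y == 1]
    return (mean(class0) + mean(class1)) / 2


def accuracy(threshold: float, xs: Sequence[float], ys: Sequence[int]) -> float:
    """Number of correct detections divided by the total number of detections."""
    return mean(int((x >= threshold) == bool(y)) for x, y in zip(xs, ys))


scores = k_fold_scores(samples, labels, fit_threshold, accuracy, k=10)
print(len(scores), mean(scores))
```

The summary statistic (here the mean of the ten scores) is the estimate of model skill that is reported.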

Experimental Results
This section presents and analyzes the performance of the proposed algorithms for both RFI and modulation classification. All simulations were performed in MATLAB (Version R2019b) on a computer with a Core i5-5257U CPU operating at 2.70 GHz and 8 GB of RAM. The dataset used for classification was generated at an AWGN power of −140 dBm, with an SNR of approximately 9 dB. As shown in Table 2, the power of the received signals (dBm/Hz) is measured using a signal analyzer in a 3.84 MHz bandwidth (BW). The Jamming to Signal Ratio (JSR) is computed as
$$\mathrm{JSR} = \mathrm{Power}_{\mathrm{SoI}} - \mathrm{Power}_{\mathrm{Jamming}} \quad (6)$$
Therefore, the measured JSR for CWI, MCWI, and CI is 7 dB, 5 dB, and 8 dB, respectively.
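Because the powers in Eq. 6 are expressed in logarithmic units, the subtraction corresponds to a power ratio in linear units. The short sketch below illustrates this with hypothetical power values (they are not the Table 2 measurements) and follows the sign convention of Eq. 6, SoI power minus jamming power.

```python
import math


def dbm_to_mw(power_dbm: float) -> float:
    """Convert a power level in dBm to milliwatts."""
    return 10 ** (power_dbm / 10)


def jsr_db(power_soi_dbm: float, power_jamming_dbm: float) -> float:
    """JSR as defined in Eq. 6: SoI power minus jamming power, in dB."""
    return power_soi_dbm - power_jamming_dbm


# Hypothetical example values, chosen only to illustrate the arithmetic.
soi_dbm, jammer_dbm = -60.0, -67.0
ratio_db = 10 * math.log10(dbm_to_mw(soi_dbm) / dbm_to_mw(jammer_dbm))
print(jsr_db(soi_dbm, jammer_dbm), round(ratio_db, 6))
```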

RFI Classification results
The MLP approach developed in [30] is used for RFI classification. Accordingly, a two-layer MLP is used with 10 and 4 neurons in the input and output layers, respectively. The logarithmic sigmoid and linear functions are used as the hidden- and output-layer activation functions. Table 3 presents the details of the designed MLP using different numbers of Hidden Layer Neurons (HLN) and Batch Sizes (BS). Training is performed using 10-fold cross-validation. As the results show, the most accurate result is achieved using HLN = 30 and online learning mode (BS = 1), with a precision of 99.58%. Table 4 presents the results of applying the proposed GA + MLP approach to RFI classification using different Numbers of Features (NoF) with HLN = 30. According to the results, the highest accuracy, 99.97%, is obtained with BS = 1 and NoF = 8. The classification precision also increases as more features are used. Table 5 presents the results of deploying PCA + MLP for the first classification phase using various numbers of features and learning modes, namely online learning (BS = 1), mini-batch, and full-batch. As can be seen, the classification accuracy reaches 97.05% with only five features.
Table 6 presents the MLP-based classification results for modulation type recognition. The numbers of neurons in the input, hidden, and output layers are 3, 30, and 4, respectively. The results show that with no jamming, the average AMC accuracy is 87%. In the presence of jamming signals, the classification performance is degraded locally, by about 3%, 16%, and 36% for CWI, CI, and MCWI, respectively.
Preprints (www.preprints.org) | NOT PEER-REVIEWED | Posted: 23 May 2020 | doi:10.20944/preprints202005.0356.v1
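To make the network shape used for RFI classification concrete, the sketch below runs a forward pass through an untrained MLP with 10 inputs, 30 log-sigmoid hidden neurons, and 4 linear outputs. The random weights, the input vector, and the order of the class labels are hypothetical; the sketch shows only the structure, not the trained model or the training procedure.

```python
import math
import random
from typing import List, Sequence, Tuple

Layer = Tuple[List[List[float]], List[float]]   # (weights, biases)
SIGNAL_CLASSES = ["SoI", "CWI", "MCWI", "CI"]   # label order is an assumption


def logsig(z: float) -> float:
    """Logarithmic sigmoid activation."""
    return 1.0 / (1.0 + math.exp(-z))


def init_layer(n_in: int, n_out: int, rng: random.Random) -> Layer:
    """Small random weights and zero biases (untrained)."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


def affine(x: Sequence[float], layer: Layer) -> List[float]:
    """Weighted sum plus bias for every neuron in the layer."""
    weights, biases = layer
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, biases)]


def forward(x: Sequence[float], hidden: Layer, output: Layer) -> List[float]:
    """Log-sigmoid hidden layer followed by a linear output layer."""
    hidden_out = [logsig(z) for z in affine(x, hidden)]
    return affine(hidden_out, output)


rng = random.Random(0)
hidden = init_layer(10, 30, rng)    # 10 RFI features -> 30 hidden neurons
output = init_layer(30, 4, rng)     # 30 hidden neurons -> 4 signal classes
scores = forward([0.1] * 10, hidden, output)
predicted = SIGNAL_CLASSES[scores.index(max(scores))]
print(len(scores), predicted)
```

The modulation classifiers have the same structure with 3 inputs (the cumulant features) instead of 10.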

AMC phase results
Each AMC classifier refers to a specific class of the received signal: AMC1, AMC2, AMC3, and AMC4 refer to SoI, SoI + CWI, SoI + MCWI, and SoI + CI, respectively. Table 7 shows the effect of GA-based feature selection on the second phase of classification, which recognizes the modulation types. As shown, MLP + GA classifies the four modulation types for each received signal using only one feature, with a precision of 90.83%, 86.66%, 61.66%, and 76.67% for AMC1 to AMC4, respectively. Table 8 shows the effect of PCA-based feature selection on classifying the modulation types. As shown, MLP + PCA using only one feature reaches an accuracy of 86%, 89.77%, 51.11%, and 67.66% for AMC1 to AMC4, respectively. The learning process is accelerated using PCA because the classifier is trained with fewer features.
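The reduction to a single PCA feature can be sketched with NumPy, following the steps described in the PCA subsection: mean removal, covariance matrix, and eigen-decomposition. The final projection step is our assumption of the fourth step, and the three-column toy data is a hypothetical stand-in for the three cumulant features.

```python
import numpy as np


def pca_project(features: np.ndarray, n_components: int):
    """Project the rows of `features` onto the leading principal components."""
    # Step 1: subtract the mean of each feature column.
    centered = features - features.mean(axis=0)
    # Step 2: covariance matrix between features (columns are variables).
    covariance = np.cov(centered, rowvar=False)
    # Step 3: eigen-decomposition; eigh is suited to symmetric matrices.
    eigenvalues, eigenvectors = np.linalg.eigh(covariance)
    order = np.argsort(eigenvalues)[::-1][:n_components]
    # Step 4 (assumed): project onto the components with the largest eigenvalues.
    projected = centered @ eigenvectors[:, order]
    return projected, eigenvalues[order]


# Hypothetical toy data: 200 samples, 3 features, two of them strongly correlated.
rng = np.random.default_rng(0)
base = rng.normal(size=(200, 1))
toy = np.hstack([base,
                 2 * base + 0.1 * rng.normal(size=(200, 1)),
                 rng.normal(size=(200, 1))])

reduced, variances = pca_project(toy, n_components=1)
print(reduced.shape)  # (200, 1)
```

The variance of the projected column equals the largest eigenvalue, which is what "maximum possible information in the first component" means in practice.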

Comparative classification results
This section summarizes the results for RFI classification and characterization. Figure 4 compares the proposed approaches for the first-level classification, which recognizes the type of the received signal. All three approaches reach accuracies of more than 90%, while MLP + GA with five features slightly outperforms the other techniques. Figure 5 shows the results of the proposed classifiers for the second classification process, which distinguishes four modulation types (QPSK, 8APSK, 16APSK, and 32APSK). As the results show, in the no-jamming case, the highest accuracy, 90.83%, is achieved using MLP + GA (NoF = 1). For AMC2, MLP + PCA (NoF = 1) outperforms the other techniques with a precision of 89.77%. In the presence of the two other jammers, MCWI and CI, MLP + GA performs more precisely. Figure 6 shows the computation times of the presented techniques for RFI classification. GA + MLP is computationally more expensive, and as a result its computation time is comparatively longer, while MLP + PCA is the most efficient.

Prediction phase results
In this section, the robustness of the proposed techniques is analyzed in detecting unseen data generated at different AWGN powers ranging from −140 to −125 dBm. The performance of the classifier varies depending on the noise level and the presence of jammers. In detecting the type of the received signal, the MLP trained using 10-fold cross-validation generalizes best to unseen data generated at the same noise level as the training data; accordingly, the prediction accuracy degrades as the AWGN power increases. Tables 10 to 13 show the effect of noise on the trained classifiers in predicting the type of new data. As shown, the two main factors affecting the prediction accuracy are the type of jammer and the noise power. In particular, in the presence of MCWI and CI, the classifiers do not perform precisely.

Conclusion
In this study, a novel hierarchical classifier has been proposed to facilitate RFI classification and characterization. The proposed classifier not only classifies the type of the received signal but also determines its modulation. Moreover, three approaches have been developed, namely MLP with 10-fold cross-validation, MLP + GA, and MLP + PCA. The results confirm that for the RFI classification phase, the classification accuracy reaches 99.81% using the MLP + GA technique, depending on the chosen batch size. In general, MLP + GA performs more precisely in both modulation type determination and RFI classification. In addition, the results confirm that PCA feature selection is more efficient than GA in terms of computation time and computational complexity. In future studies, we intend to investigate deep learning-based classification techniques that use raw received data in order to avoid complicated feature extraction and selection steps.
Author Contributions: The overall study was supervised by R.J.L.; methodology, software, and preparation of the original draft were carried out by S.U.; review and editing were carried out by N.N.; the results were analyzed and validated by R.J.L. and N.N. All authors have read and agreed to the published version of the manuscript.