Radio Frequency Fingerprinting for Frequency Hopping Emitter Identiﬁcation

: In a frequency hopping spread spectrum (FHSS) network, the hopping pattern plays an important role in user authentication at the physical layer. However, recently, it has been possible to trace the hopping pattern through a blind estimation method for frequency hopping (FH) signals. If the hopping pattern can be reproduced, the attacker can imitate the FH signal and send the fake data to the FHSS system. To prevent this situation, a non-replicable authentication system that targets the physical layer of an FHSS network is required. In this study, a radio frequency ﬁngerprinting-based emitter identiﬁcation method targeting FH signals was proposed. A signal ﬁngerprint (SF) was extracted and transformed into a spectrogram representing the time–frequency behavior of the SF. This spectrogram was trained on a deep inception network-based classiﬁer, and an ensemble approach utilizing the multimodality of the SFs was applied. A detection algorithm was applied to the output vectors of the ensemble classiﬁer for attacker detection. The results showed that the SF spectrogram can be effectively utilized to identify the emitter with 97% accuracy, and the output vectors of the classiﬁer can be effectively utilized to detect the attacker with an area under the receiver operating characteristic curve of 0.99.


Introduction
The most important task in user authentication of a wireless communication system is to identify the emitter information of RF signals. A common way to confirm the emitter information, that is, the emitter ID, is to decode the address field of the medium access control (MAC) frame [1]. However, under this digitized information-based authentication process on a MAC layer, an attacker can possess the address information and imitate it as an authenticated user. To prevent this weakness, a physical layer authentication process, namely radio frequency (RF) fingerprinting, has been studied in recent years [2].
RF fingerprinting is an identification technique that utilizes a signal fingerprint (SF) to identify the unique emitter source of an RF signal. In the manufacturing of RF components inside an emitter, process tolerance is inevitable. These tolerances affect subtle differences in the features of the emitted RF signal. Because these process tolerances are not reproducible, an SF can act as the fingerprint of an emitter. It can also be utilized as a non-replicable authentication key to identify the authenticated user [3].
RF fingerprinting can be used to distinguish SFs in RF signals [4][5][6][7][8]. A conventional approach is to design a handcrafted feature from the SFs based on domain knowledge. In [4], the statistical moment and entropy were calculated from spectrograms of the transient signal to identify Bluetooth devices. In [5], statistical moments were calculated from current FH signal before it is passed to the MAC layer. The RFEI method can achieve system enhancement by being applied to the user authentication process. As the key of the RFEI method, that is, the SF, is generated by the process tolerances during the manufacturing process, the attacker cannot reproduce it. By detecting these attackers based on the SFs, non-replicable authentication systems can be achieved wherein the receiver can reject FH signals even if an attacker knows the hopping pattern and the digital key. The RFEI method consists of four steps: SF extraction (SFE, Section 3.1), time-frequency feature extraction (TFFE, Section 3.2), user emitter classification (UEC, Section 3.3), and attacker emitter detection (AED, Section 3.4). As a preprocessing step, the target hop signal is down-converted to the baseband based on the hopping pattern known to the receiver. The baseband hop signal is passed to the SFE step to extract the analog SFs, i.e., rising transient (RT), steady state (SS), and falling transient (FT) signals are extracted. The SF is provided to the TFFE step to transform the SF into the time-frequency domain, i.e., the spectrogram. The spectrogram is provided to the UEC stage to train and test the spectrogram on a custom deep inception network (DIN)-based classifier. In addition, the ensemble approach is applied to exploit the multimodality of the analog SFs. Finally, the classifier output vector is provided to the AED step in which a detection algorithm is applied to detect the FH signal of the attacker. The novelties of this study are that (1) RF fingerprinting methods were evaluated targeting for FH signals, (2) the ensemble approach was applied to utilize the multimodality of SFs, and (3) the RFEI framework was employed to identify users and detect attackers simultaneously.
The RFEI algorithm was evaluated on a few SFs and ensemble-based approaches. The algorithm compares to well-designed baselines inspired by recent approaches described in the RF fingerprinting literature [4,5,7,8]. The experiments were performed using an actual FH dataset to evaluate the reliability of the algorithm. The results confirm that the proposed DIN classifier could improve the emitter ID identification accuracy by more than 1% compared to the baseline (Section 5.1). In addition, the multimode SF ensemble approach proved to be the most effective, achieving the best results with 97.0% identification accuracy for the seven FHSS emitters (Section 5.2). Regarding the detection performance, the classifier output vector of the outliers exhibited a much lower value than those of the training sample. By utilizing these differences, the detector based on the DIN-based ensemble classifier can improve the area under the receiver operating characteristic curve (AUROC) from 0.97 to 0.99 compared to the baseline. This result indicates that the classifier output vectors can effectively be used to detect the attacker signal input (Section 5.4). The RFEI method consists of four steps: SF extraction (SFE, Section 3.1), time-frequency feature extraction (TFFE, Section 3.2), user emitter classification (UEC, Section 3.3), and attacker emitter detection (AED, Section 3.4). As a preprocessing step, the target hop signal is down-converted to the baseband based on the hopping pattern known to the receiver. The baseband hop signal is passed to the SFE step to extract the analog SFs, i.e., rising transient (RT), steady state (SS), and falling transient (FT) signals are extracted. The SF is provided to the TFFE step to transform the SF into the time-frequency domain, i.e., the spectrogram. The spectrogram is provided to the UEC stage to train and test the spectrogram on a custom deep inception network (DIN)-based classifier. In addition, the ensemble approach is applied to exploit the multimodality of the analog SFs. Finally, the classifier output vector is provided to the AED step in which a detection algorithm is applied to detect the FH signal of the attacker. The novelties of this study are that (1) RF fingerprinting methods were evaluated targeting for FH signals, (2) the ensemble approach was applied to utilize the multimodality of SFs, and (3) the RFEI framework was employed to identify users and detect attackers simultaneously.
The RFEI algorithm was evaluated on a few SFs and ensemble-based approaches. The algorithm compares to well-designed baselines inspired by recent approaches described in the RF fingerprinting literature [4,5,7,8]. The experiments were performed using an actual FH dataset to evaluate the reliability of the algorithm. The results confirm that the proposed DIN classifier could improve the emitter ID identification accuracy by more than 1% compared to the baseline (Section 5.1). In addition, the multimode SF ensemble approach proved to be the most effective, achieving the best results with 97.0% identification accuracy for the seven FHSS emitters (Section 5.2). Regarding the detection performance, the classifier output vector of the outliers exhibited a much lower value than those of the training sample. By utilizing these differences, the detector based on the DIN-based ensemble classifier can improve the area under the receiver operating characteristic curve (AUROC) from 0.97 to 0.99 compared to the baseline. This result indicates that the classifier output vectors can effectively be used to detect the attacker signal input (Section 5.4).
The remainder of this study is organized as follows. The problem formulation is presented in Section 2. The details of the RFEI method are described in Section 3, and the baseline algorithms are explained in Section 4. The results, a discussion, and other details of the experiments are described in Section 5. The conclusion is presented in Section 6.
The remainder of this study is organized as follows. The problem formulation is presented in Section 2. The details of the RFEI method are described in Section 3, and the baseline algorithms are explained in Section 4. The results, a discussion, and other details of the experiments are described in Section 5. The conclusion is presented in Section 6.

Frequency Hopping Signals of Frequency Hopping Spread Spectrum Network
In this study, we consider an FHSS network in which K FH signals are observed in a single receiver. To consider the ability of attackers to imitate FH signals similar to those of an authenticated user, we assume that the h th hopping times of the k th FH signals k h t have the same value, that is, the FH signals hop simultaneously. An example of an FHSS network with the two different FH signals is presented in Figure 2.
From Equation (1), all the K FH signals simultaneously observed by a single receiver can be defined as follows  A single FH signal is defined as follows where x k (t) is the FH signal emitted by the th emitter, a k is the amplitude and is the hopping frequency of the kth FH signal x k (t), and ϕ k (t) is the phase difference modulated by the kth message signal m k (t). When the message signal is modulated with frequency modulation (FM), the phase difference is defined as follows From Equation (1), all the K FH signals simultaneously observed by a single receiver can be defined as follows where y(t) is the observed RF signal and n(t) is the additive white Gaussian noise (AWGN) present in the channel environment. The FH signal is observed during the observation time T. During this time, a total of H hops are observed. Within a single hop duration of the hth hop signal, t h ≤ t < t h+1 , the hopping frequency f k (t) is held constant at f k h , denoting the hth hopping frequency of the kth FH signal. Thus, Equations (1) and (3) can, respectively, be reformulated as follows where x k h (t) is the hth hop signal of the kth FH signal and y h (t) is the observed RF signal during the hth hop duration, t h ≤ t < t h+1 , where a total of K hop signals exist.

User Authentication in Frequency Hopping Spread Spectrum Networks
In an FHSS network, the core process for user authentication can be performed in two steps: (1) determining whether or not the appropriate hopping frequency is measured, and (2) determining whether or not the header information of the MAC frame is correct.
Because we assume that the attacker can reproduce the predefined hopping pattern .., f k H , the imitated FH signal will display the same hopping frequency pattern. The imitated FH signal will be demodulated and passed through the MAC layer, that is, Step 1 is disabled. The process of inspecting the address field in the MAC header remains. However, because this address information has been digitized, the attacker can possess and imitate this address field. If an attacker sends an address field similar to an authenticated emitter, there is no way to detect and prevent it. Therefore, the emitter identification process based only on header information of the MAC frame is not sufficient to reject the imitated FH signal.

Emitter Identification Based User Authentication in Frequency Hopping Spread Spectrum Networks
We propose a non-replicable authentication system that operates on the physical layer of the FHSS network presented in Figure 3 and Algorithm 1. By adding the emitter identification framework within the authentication process, we can achieve an enhanced physical layer authentication system for the FHSS network by verifying (1) whether or not the appropriate hopping frequency is measured, (2) whether the emitter ID of the current FH signal is an authenticated user or attacker, and (3) whether or not the header information of the MAC frame is correct.
In this study, our target was to evaluate the RFEI framework for the FH signals corresponding to Step 2 of Algorithm 1. We intended to develop an algorithm to estimate the emitter ID from the baseband FH signal such that where s k h (t) is the baseband hop signal down-converted from the hop signal x k h (t) and k is the emitter ID estimated from the RFEI algorithm F RFEI .  Algorithm 1. Non-replicable authentication system for the physical layer of the FHSS network.

Input:
The observed RF signal ( ) y t Figure 3. Block diagram of the RFEI-based non-replicable authentication system.
As the receiver knows the hopping frequency, f k h , the target hop signal, x k h (t) can be extracted from the observed FH signal, y h (t). This approach is reasonable as the FH signal must be demodulated to an intermediate frequency (IF) or baseband and passed to the MAC layer to decode the digital data modulated by the message signal, m k (t). The SFs are non-replicable differences dependent on the manufacturing process of the emitter. Therefore, the SFs are independent of the hopping frequency and should be in the baseband of the hop signal, s k h (t).
Algorithm 1. Non-replicable authentication system for the physical layer of the FHSS network.

Input:
The observed RF signal y(t) For each hop duration, t h ≤ t < t h+1 do: Step1: Extract and down-convert the target hop signal x k h (t) to the baseband hop signal s k h (t) from the observed signal y h (t) based on a predefined hopping pattern f k h . If RFEI is activated do: Step 2-1: Estimate the emitter ID based on the RFEI algorithm on (7) Step 2-2: Pass the hop signal x k h (t) when the emitter ID k is an authenticated emitter ID. Step 2-3: Reject the hop signal x k h (t) when the emitter ID k is an attacker's emitter ID.
Step 3: Send all passed baseband hop signals s k h (t) to the next step, i.e., the MAC frame inspection.

Output:
The authenticated baseband signal x k (t).

Proposed RF Fingerprinting-Based Emitter Identification Method
The RFEI algorithm is implemented as follows.
• SF extraction: An SF is an RF signal that contains feature information for emitter ID identification. It can be any signal involved in the demodulation process for communication. However, the SF used in this study focused on analog SF, i.e., RT, SS, and FT signals. • Time-frequency feature extraction: A feature is a set of values containing physical measurements that can ensure robust classification. Any feature having a physical meaning can be applied from statistical moments to a raw preamble signal. In this study, a spectrogram of the SF was considered. • User emitter classification: Classification is a decision process in which an emitter ID can be estimated from an input feature. A classifier was trained and tested on a large set of extracted features. Subsequently, the emitter ID was estimated from the classifier output vector. In this study, we consider a discriminative classifier model from a support vector machine (SVM) to a DIN-based ensemble classifier. • Attacker emitter detection: This detection process enables the classifier to search whether the input feature has been trained for the classifier. The difference between the classifier output characteristics of the trained and outlier samples can be utilized. In this study, a simple but effective threshold based approach was applied.
The RFEI method can be formulated as a classification problem using the following expression y = F RFEI (s) (8) where s = [s(T s ), s(2T s ), ..., s(NT s )] ∈ C N×1 is a baseband hop signal sampled by the sampling period T s . The vector representation of the signal is now used in this study for convenience. Further, N is the length of a complex-valued baseband hop signal, F RFEI is a mapping function from the signal space to the ID space referencing the RFEI algorithm, and y ∈ R C×1 is the output vector of the algorithm containing the emitter ID information, where C is the number of emitters trained on the algorithm.

Signal Fingerprint Extraction
The SF can be defined as any subtle differences in the demodulation and decoding of the FH signal, which can uniquely identify the emitter ID. However, in this study, our objective was to identify the emitter ID before passing through the MAC layer. Thus, we targeted the analog SF that could pass the physical layer in the form of RT, SS, and FT signals. We represent them by s SF = g SF (s) (9) where g SF is the extraction function of the SF, and s SF ∈ C N SF ×1 is the SF selected from a set of possible lists, that is, SF ∈ {RT, SS, FT}. Here, N SF is the length of the SFs. Based on the definition of the SF signal in [6], the RT signal is defined as an increasing RF signal that increases from the noise level to the designed level. The SS signal is defined as a region of the RF signal that contains a modulated signal with a designed energy level, and the FT signal is defined as an inverse case of the RT signal, decreasing the RF signal from the designed energy level to the noise level.
For accurate extraction, the extraction procedure is structured based on the energy variation of the SFs. For the windowed vector s n = s[i + (n − 1)/2 × W E : i + (n + 1)/2 × W E ] with the extraction window size W E and its L 2 norm energy E n , the detection rule for the transient signals can be expressed as follows where δ is the threshold value for detecting the energy variance and T RT and T FT are the detected time indices for the RT and FT signals, respectively. A sliding window method is applied to monitor the energy variation of the incoming signal, which is then used to detect the RT and FT signals. The RT signal is detected as a signal in which the L 2 -norm energy of the window is increased by 10% or more. The FT signal is defined as a decreasing case. After detecting the RT and FT signals, the SS signal can be defined as the signal between the RT and FT signals using the following definitions: The extraction results for the SFs are presented in Figure 4.

Time-Frequency Feature Extraction
The next step is to design a feature from the SF. The purpose of this step is to transform the SF domain into a specific feature domain in which the physical measurements between different emitters could be well distinguished. In conventional approaches [4][5][6], the designed handcrafted features are calculated from signal characteristics of the SFs. In this case, the goal is to obtain a feature domain that can ensure robust classification results. However, in more recent approaches [7,8], the purpose of this step is slightly modified. The SFs are transformed into domains that can express the signal characteristics of the SFs, and the identification of a feature domain that can ensure robust classification is entrusted to the classification step based on a deep learning-based classifier. The relevant

Time-Frequency Feature Extraction
The next step is to design a feature from the SF. The purpose of this step is to transform the SF domain into a specific feature domain in which the physical measurements between different emitters could be well distinguished. In conventional approaches [4][5][6], the designed handcrafted features are calculated from signal characteristics of the SFs. In this Appl. Sci. 2021, 11, 10812 8 of 26 case, the goal is to obtain a feature domain that can ensure robust classification results. However, in more recent approaches [7,8], the purpose of this step is slightly modified. The SFs are transformed into domains that can express the signal characteristics of the SFs, and the identification of a feature domain that can ensure robust classification is entrusted to the classification step based on a deep learning-based classifier. The relevant procedure is expressed as follows where q SF is the transform function for the designed feature domain, SF are the sizes of the frequency and time indices, respectively, of the spectrogram transformed from the SF.
In this study, the time-frequency distribution of the FH signals, that is, the spectrogram, was analyzed. The spectrogram is a well-known time-frequency analysis method used to visualize the variation of the frequency components calculated from nonstationary signals [20]. The feature design strategy used in this study requires analysis of the power density behavior of the SFs in the time-frequency domain. The key idea of the FHSS system is that the carrier frequency of the FH signal hops within a predefined frequency range. Therefore, the signal characteristics must be implied in the distribution of the time-frequency domains.
A discrete-time short-time Fourier transform (STFT) is applied to compute the spectrogram of the SFs. With the sliding window w[n] with a size of W STFT , the STFT of the SFs can be calculated as follows where m = 1, 2, ..., K t SF is the time sampling point along the time axis and p = 1, 2, ..., K f SF is the frequency sampling point along the frequency axis. We set N SF as a sufficiently large value. Next, the power density behavior of the spectrogram can be represented as the magnitude squared of the STFT such that The spectrogram results are presented in Figure 5.

User Emitter Classification
The third step is to identify the emitter ID from the designed feature. The goal is to design a classification algorithm that can learn spectrograms for robust classification results. Owing to recent research in the field of deep learning, deep neural networks are well known for their abilities to extract spatial or temporal features with nonlinear computational capabilities [21]. Thus, we aimed to construct a deep learning-based classifier to train the spectrogram of the SFs. The classification process can be obtained using where f Classifier is the deep learning-based classification algorithm, and the output vector

User Emitter Classification
The third step is to identify the emitter ID from the designed feature. The goal is to design a classification algorithm that can learn spectrograms for robust classification results. Owing to recent research in the field of deep learning, deep neural networks are well known for their abilities to extract spatial or temporal features with nonlinear computational capabilities [21]. Thus, we aimed to construct a deep learning-based classifier to train the spectrogram of the SFs. The classification process can be obtained using where f Classi f ier is the deep learning-based classification algorithm, and the output vector y implies the emitter ID information k.

Base Classifier: Deep Inception Network Classifier
There are two main blocks to construct the custom deep learning-based classifier: a residual block [22] and an inception block [23]. The residual block is designed to enable flexible training as the depth of the network increases. In the case of the inception block, the main purpose is to filter out input features with different receptive field sizes. Details of the architecture and design strategies of the main blocks are described in Appendix A.
The spectrogram consists of physical measurements calculated from the SF signals. It represents the power densities of the SFs along the time-frequency axes. Thus, the subtle differences exhibited by the SFs can be anywhere on the time-frequency axes of the spectrogram, and the size of the features can be varied. To train these SFs, we aimed to filter the spectrogram on multiple scales in the temporal and spatial domains by applying inception blocks to construct a custom deep learning classifier.
We utilized the inception-A and reduction-A blocks to construct the base classifier: the DIN classifier. The inception-A and reduction-A blocks are the basic blocks for constructing the Inception-v4 models [24]. The role of the inception-A block is to filter the input features with multiple receptive field sizes and concatenate them as the filter axis, thereby expanding its dimensions. The role of the reduction-A block is to downsize the feature map on the grid side, that is, the time-frequency axes of the spectrogram. It can effectively manage the number of weights inside the classifier, similar to the pooling layer.
We adopted the inception-A and reduction-A blocks, as shown in Figure 6. The structures of the blocks are the same as defined in [24]. However, the filter sizes N F of the sublayers were set to 32 and 64, adjusted by the experiments. Batch normalization [25] and rectified linear unit activation units were applied immediately after every convolutional layer. The inception-A block was applied twice to expand the filter axis, and the reduction-A block was applied once to re-size the feature map on the grid axis. We applied these block sequences twice, adjusted by heuristic experiments. The total structure of the DIN classifier is provided in Table 1.    [24] and (b) the Reduction-A block in [24].
Finally, we obtained the deep learning classification framework, as in Equation (15). From the M training samples in S = [s 1 , s 2 , ..., s M ] and output samples Y = [y 1 , y 2 , ..., y M ], the cross-entropy loss, was applied such that loss function can be expressed as follows where c k is the true label of sample y i with the kth emitter ID, y i [c j ] is the jth element of output sample y i . Based on the cross-entropy losses, the Adam optimizer [26] is utilized to update the weights of our DIN classifier. After finishing the training of the DIN classifier, the emitter ID of input sample y i can be estimated as follows where p(c l ; s i,SF ) is the probability that the emitter ID of the input sample is c l , which can be defined as the softmax output of sample y i . In this probability, the estimated emitter ID k is defined as the maximum probability that input samples will be included in a particular emitter ID c j (see the Equation (18)).

Ensemble Approach for Multimodal Signal Fingerprints
The ensemble approach is a well-known method that ensures better generalization performance of classification models [27]. It combines the results of multiple base classifiers trained on the same training dataset and makes a final decision based on these results. Stacking is a combined method that uses the final model to combine the outputs of the base model [27]. It is useful when multimodal features are present in applications such as video signal processing where audio, video, and text segments exist simultaneously [28].
It was reported that multiple SFs, that is, the RT, SS, and FT signals, can be considered as multimodal features for an accurate RF fingerprinting model [6]. To utilize the multimodality features of the SFs, we adapted the stacking ensemble approach to the DIN model as presented in Figure 7. The SFs s SF were extracted from hop signal s in Equation (10). These SFs can act as independent features for emitter identification. Thus, each of the SFs, i.e., RT, SS, and FT, is assumed to be independent of the others. For the ensemble approach, the probability that the emitter ID is c l can be defined as follows p(c l ; s) = ∏ SF∈{RT,SS,FT} p c j ; s SF . (19) performance of classification models [27]. It combines the results of multiple base classifiers trained on the same training dataset and makes a final decision based on these results. Stacking is a combined method that uses the final model to combine the outputs of the base model [27]. It is useful when multimodal features are present in applications such as video signal processing where audio, video, and text segments exist simultaneously [28]. It was reported that multiple SFs, that is, the RT, SS, and FT signals, can be considered as multimodal features for an accurate RF fingerprinting model [6]. To utilize the multimodality features of the SFs, we adapted the stacking ensemble approach to the DIN model as presented in Figure 7. The SFs SF s were extracted from hop signal s in Equation (10).
These SFs can act as independent features for emitter identification. Thus, each of the SFs, i.e., RT, SS, and FT, is assumed to be independent of the others. For the ensemble approach, the probability that the emitter ID is l c can be defined as follows According to the DIN classifier trained on the RT, FT, and SS signals presented in Section 3.3.1, the final decision was performed by a linear combination of each base classifier (i.e., DIN classifier) such that According to the DIN classifier trained on the RT, FT, and SS signals presented in Section 3.3.1, the final decision was performed by a linear combination of each base classifier (i.e., DIN classifier) such that

Attacker Emitter Detection
The last step of the RFEI method is an outlier detection step implemented to detect the imitated FH signal. An outlier is a sample included in specific emitter IDs that is not considered during training. In this study, the imitated FH signal was the outlier. This step is aimed at detecting the differences in the classifier output characteristics between the outputs of the classifier when the trained and outlier samples are input. This objective can be achieved by comparing the classifier outputs [29][30][31], exposing the outliers during the training step to magnify the differences between the trained and outlier samples [32,33], and analyzing the likelihood of the inputs from a generative adversarial network [34,35].
The proposed outlier detection scheme is presented in Figure 8. We considered the outlier detection framework proposed in [30]. Temperature scaling [36] and the opposite application of an adversarial attack [37] have been reported to be effective in detecting outlier samples. After preprocessing the input sample, outliers can be detected when the maximum probability of the output vector is lower than the threshold. The key idea of this approach is that the output vector of the outlier represents a much smaller value than the output vector of the trained sample.
The proposed outlier detection scheme is presented in Figure 8. We considered the outlier detection framework proposed in [30]. Temperature scaling [36] and the opposite application of an adversarial attack [37] have been reported to be effective in detecting outlier samples. After preprocessing the input sample, outliers can be detected when the maximum probability of the output vector is lower than the threshold. The key idea of this approach is that the output vector of the outlier represents a much smaller value than the output vector of the trained sample. Utilizing this approach, we constructed the outlier detector to alert the signal input when the imitated FH signal was input by performing two steps: (1) calibration of the output vector of the classifier by a temporal scaling factor, s T , and (2) comparison of the maximum probability of the output vector to the outlier detection threshold, λ . In this study, opposite application of the adversarial attack was not performed because a small perturbation of the input sample may affect the SFs, defined as subtle differences in the FH signal. Mathematically, the temporal scaling process was applied to Equation (17) such that Utilizing this approach, we constructed the outlier detector to alert the signal input when the imitated FH signal was input by performing two steps: (1) calibration of the output vector of the classifier by a temporal scaling factor, T s , and (2) comparison of the maximum probability of the output vector to the outlier detection threshold, λ. In this study, opposite application of the adversarial attack was not performed because a small perturbation of the input sample may affect the SFs, defined as subtle differences in the FH signal.
Mathematically, the temporal scaling process was applied to Equation (17) In the case of the ensemble approach, the probability in Equation (19) Based on the scaled output probability, the detection rule for the outlier sample can be defined as follows where p(c out ; s, T s , λ) is the probability that the current input sample is an outlier. This detection rule is a binary classifier with trained class c train and outlier class c out . Thus, parameters T s and λ were optimized experimentally based on the minimum false positive rate (i.e., the part of the actual outliers that were misdetected as trained samples, FPR) when the true positive rate (i.e., the part of the actual trained samples that were detected as trained samples, TPR) was higher than 95%. The final version of the algorithm used for our proposed RFEI process is presented in Algorithm 2. Step 1: (Extract the target SF) while do: Detect the transient signal with Equation (10). Extract the target SF s SF with Equation (11).
Step 2: (Calculate the spectrogram) Calculate the spectrogram s Feature of the SF with Equation (13) with respect to the BOI, BW BOI .
Step 3: (Perform emitter classification) i + W Ext. < length(s) Estimate the emitter IDs from the decision rule using either the base classifier (18) or ensemble approaches in Equation (20).
Step 4: (Perform outlier detection) Scale the output vector for temporal scaling factor T s with Equation (22) and detect the outliers with Equation (23) Output: Return the authenticated baseband hop signals s k h (t)

Baseline Algorithms for RF Fingerprinting Method
In this study, for performance comparison, three other baseline methods were carefully designed and implemented based on algorithms from the literature [4,5,7,8].
Before describing the details, we note that the signal preprocessing steps, such as preamble extraction [5] and signal difference calculation after signal decoding [7], are not covered in this study. The goal of this study was to identify the emitter ID in the physical layer of the FHSS network. Therefore, we focused on analog SFs that can be obtained from the physical layer of the system. To this end, all baseline SFs were set to RT, SS, and FT, and the feature extraction and classification processes were designed to reflect the approaches in the literature.

Baseline 1: Statistical Moments Based RF Fingerprinting
The first baseline aims to reflect the conventional RF fingerprinting approaches based on handcrafted features. It was designed for statistical moments of the SFs, similar to that in [4,5].
The SF extraction process was the same as that of the proposed method described in Section 3.1.
For feature extraction, the SFs were segmented using N seg. . Because the RT and FT signals were too short to be segmented, segmentation was applied only to the SS signal.
where s SF|n is the nth segment of SF. For each segmented SF, a total of six sub-features were considered. The instantaneous amplitude, phase, and frequency, described in [5], were calculated as sub-features, and the time, frequency, and time-frequency axes of the spectrogram, identified as good features in [4], were applied as sub-features. Subsequently, the statistical moments (i.e., mean m, variance σ 2 , skewness γ, and kurtosis κ) and entropy H were calculated for each sub-feature.
where N stats SF was the size of the statistic moments vector. For classification, a linear SVM from [4] was applied. Random forest or multi-class AdaBoost from [5] and linear discriminant analysis from [4] were also investigated. We compared these algorithms when applied to our FH signal dataset, and the linear SVM showed the best classification results.

Baseline 2: Raw Signal-Based RF Fingerprinting
The second baseline aims to reflect the recent methods of RF fingerprinting based on raw signal processing. It was designed to train raw SF signals directly in the ensemble approaches of the deep learning classifiers described in [7].
As described at the beginning of Section 4, the SF extraction process was the same as that of the proposed method described in Section 3.1.
For feature extraction, the SFs were segmented using N seg. in Equation (24). The core idea of this approach was to train the raw signals in the ensemble classifiers, and the RT and FT were also segmented in this case. The feature vectors of each segment were set to a two-channel I/Q vector s Feature|s SF|n ∈ R N raw SF ×2 such that where N raw SF is the size of each segment s SF|n . For the ensemble classification approach, the base classifier was set to a one-dimensional CNN as an identification network for outdoor data in [4]. After training each base classifier using segmented feature s Feature|s SF|n , classification was performed using an ensemble approach, as in [7] k = argmax c j ∈C ∏ n∈N seg. p c j ; s Feature|s SF|n (28)

Baseline 3: Spectrogram-Based RF Fingerprinting
The third baseline aims to reflect the recent approach in [8], which is based on the SF spectrogram. As described in [8], the author trained the Hilbert spectrum of the received hop signal in a residual unit-based deep learning classifier. To reflect this approach in baseline 3, the algorithm was designed to train an SF spectrogram directly in the residualbased deep learning classifier.
The SF extraction and feature extraction processes were the same as those of the proposed method described in Sections 3.1 and 3.2.
For classification, the classifier structure was set to the residual-based deep learning classifier described in [8]. After training the classifier, classification was performed using Equation (18).

Experimental Results and Discussion
This section describes the experimental investigation of the emitter identification performance of the proposed RF fingerprinting method. Before discussing the results, several experimental setups are discussed. A custom DA system was set up for our experiments, as shown in Figure 9. The DA system consisted of a high-speed digitizer and a Raid-0 configuration with six SSD disk drives. The digitizer, PX14400, supports sampling rates of up to 400 MHz with a 14-bit analog-to-digital converter resolution, resulting in a streaming rate of 0.7 GB/s for real-time data acquisition. With write speeds of up to 1.6 GB/s in our Raid-0 configuration, the DA system can acquire data in real-time streaming.

Experimental Results and Discussion
This section describes the experimental investigation of the emitter identification performance of the proposed RF fingerprinting method. Before discussing the results, several experimental setups are discussed. A custom DA system was set up for our experiments, as shown in Figure 9. The DA system consisted of a high-speed digitizer and a Raid-0 configuration with six SSD disk drives. The digitizer, PX14400, supports sampling rates of up to 400 MHz with a 14-bit analog-to-digital converter resolution, resulting in a streaming rate of 0.7 GB/s for realtime data acquisition. With write speeds of up to 1.6 GB/s in our Raid-0 configuration, the DA system can acquire data in real-time streaming. We collected FH signals from a real experiment to determine the reliability of the algorithm. Seven FHSS devices were used to experiment. Each device utilized the same hopping rate for secure voice communication. The FH signal was frequency-modulated, We collected FH signals from a real experiment to determine the reliability of the algorithm. Seven FHSS devices were used to experiment. Each device utilized the same hopping rate for secure voice communication. The FH signal was frequency-modulated, and the carrier frequency was set to hops in the very high frequency range. The exact hopping rate and frequency range will not be disclosed owing to security issues. The FHSS device was connected under laboratory environmental conditions. The FH signal was acquired at a 400 MHz sampling rate and stored as raw FH data in the DA system.
Target hop extraction and down-conversion were performed on the stored raw training FH data. Because we assumed the predefined hopping pattern to be known, an energy detection approach was applied to the exact hopping frequency f k h and the target hop samples x k h were extracted from the observed RF signal y. Subsequently, the hop sample was down-converted to the baseband using a decimation factor of 20, i.e., 20M sample rate baseband hop signals s k h were acquired. These were stored as baseband FH training data in the DA system. This down-conversion approach is reasonable because the FH signals were also demodulated to the IF or baseband to decode the digital data modulated by the message signal m k (t) as in Equation (2). As the SFs depend on the component characteristics of the emitter, the SFs also should exist in the baseband hop signal, s k h . Another set of FH signals was acquired to prepare an outlier dataset. Two more FHSS devices were recruited, and the FH signals were acquired on different dates compared with those of the training dataset. The emitter specifications were the same as those of the training emitter. However, in this experiment, the FH signal was down-converted to baseband and stored as outlier FH data with a sampling rate of 2.34 MHz. For fair comparison, the sampling rate of the signal was resampled using the Fourier-domain based sampling rate conversion method, which can improve the accuracy and computational cost compared to the time domain-based method [38]. These outlier data were considered only in the outlier detection experiment described in Section 5.5.
An average of 168 hop FH signals were obtained for each training emitter, and an average of 310 hop FH signals were obtained for each outlier emitter; a total of 1796 samples from nine emitters were obtained. The details are presented in Table 2.
The results were obtained using the experimental setup as follows. For the training and testing datasets, the FH dataset was partitioned according to a 7:3 ratio; a total of 823 samples were trained, and a total of 353 samples were tested from seven emitters. In the outlier detection experiment, the test dataset for training emitters and the outlier dataset for outlier emitters were considered; a total of 353 samples from seven training emitters were tested, and a total of 620 outlier samples from two outlier emitters were tested. All the results were tested 10 times, and the average performance was presented.
The experiments were conducted with an Intel i7-6850K CPU unit and an NVIDIA Titan RTX GPU unit. The dataset generation task in Figure 9 was performed using MATLAB 2018a, and all RF fingerprinting algorithms were implemented in Python 3.6 with PyTorch 1.6.0. The other implemented parameters of the experiments are described in Appendix B.

Emitter Identification Accuracy
We firstly investigated the emitter identification performance of the proposed RFEI algorithm and the baselines. All algorithms were applied to all SFs, and the mean and standard deviation of the experimental values were investigated. The results are listed in Table 3.  [4,5]. **: (Baseline 2) raw signal approach in [7]. ***: (Baseline 3) spectrogram and residual block-based approach in [8]. † : (Proposed) spectrogram, DIN classifier, and ensemblebased approach in the proposed method. Table 3 demonstrates the efficiency of the proposed RFEI algorithm showing that the proposed algorithm for identifying the emitter ID based on the SS spectrogram and DIN base classifier performs with an accuracy of 95.3%, which is better than other baseline algorithms. In addition, the ensemble approach of RT, FT, and SS based on the proposed algorithm yielded an accuracy of 97.0%, demonstrating its efficiency with a higher identification accuracy than other baseline algorithms.
In terms of the SF efficiency, the results show that the SS signal is the most effective SF, as it is more accurate than the RT and FT signal-based results. In addition, in terms of the efficiencies of the feature extraction and classification approaches, the spectrogram feature is effective for representing the differences in the SF for each emitter. The most effective means of identifying the emitter ID in the FH signals is to ensemble the multimodal SFs, i.e., the RT, FT, and SS, trained by a DIN.
The emitter identification performance at SNRs is shown in Figure 10.  29) where N and σ 2 n are the length and variance of the noise signal n(t), respectively.
In terms of the SF efficiency, the results show that the SS signal is the most effective SF, as it is more accurate than the RT and FT signal-based results. In addition, in terms of the efficiencies of the feature extraction and classification approaches, the spectrogram feature is effective for representing the differences in the SF for each emitter. The most effective means of identifying the emitter ID in the FH signals is to ensemble the multimodal SFs, i.e., the RT, FT, and SS, trained by a DIN.
The emitter identification performance at SNRs is shown in Figure 10. The AWGN signal ( ) n t can be artificially added to the received hop signal s as follows Figure 10. Emitter identification accuracy at different signal-to-noise ratios (SNRs). Figure 10. Emitter identification accuracy at different signal-to-noise ratios (SNRs).
We found that the classification accuracy obtained by applying the proposed method to the SS signal was nearly 3% above baselines 1 and 2 and at least 1% above baseline 3 over the entire range of SNRs. In addition, the ensemble approach of the proposed algorithm can improve the accuracy by more than 1% compared to the proposed method. In particular, applying the proposed method to the SS signal at 20 dB SNR, which is the typical operating SNR of the FHSS network [39], yielded an accuracy of more than 95.0%. For the ensemble approach, the identification accuracy was measured to be greater than 96.4%, making it the most effective algorithm. These accuracies are higher than those of baseline 1 (87.5%), baseline 2 (89.8%), and baseline 3 (92.5%).
The validity of the proposed algorithm was verified again. At low SNR, the accuracy of baseline 1 decreases dramatically, whereas the other algorithms maintain their accuracies. Baseline 2, baseline 3, and the proposed method work well when applied to the SS signal, even at low SNRs. However, the proposed method outperforms the baselines, and the ensemble approaches outperform the other algorithms at all SNRs. These findings imply that a deep learning-based classifier at baselines 2 and 3 can learn the differences in the SFs for RF fingerprinting, but our proposed algorithm (i.e., using the spectrogram and DIN classifier) with the ensemble approach is more effective than the baselines.
The confusion matrix of the ensemble approach based on the proposed method is presented in Table 4. The confusion matrix is a specific metric for a classifier that can represent the relationship of each emitter. This matrix can be obtained by simply counting the results of the test samples with their true label information. The rows of the matrix indicate the true emitter IDs, and the columns indicate the predicted emitter IDs. The diagonal terms in the confusion matrix represent the correct classification result cases, and the off-diagonal terms represent the incorrect classification result cases. Thus, Table 4 shows that our ensemble approach based on the proposed method can identify the FH emitters with more than 94.6% accuracy without confusion between emitters.

Efficiency of the Inception Blocks
We constructed the DIN classifier based on the inception blocks. To confirm the efficiency of the inception blocks, the identification accuracy of the proposed method was compared with that of baseline 3. The difference between the proposed method and baseline 3 lies in the classifier. As in baseline 3, the classifier was set to the residual-based classifier described in [8]. Two experiments were performed for comparison. One was conducted to identify the emitter ID from the received hop signal s without the SF extraction, and the other was performed to identify the emitter ID from the ensemble approach of the SFs. The results are presented in Table 5 and Figure 11.  Table 5. Identification accuracies of the residual and inception blocks.
Appl. Sci. 2021, 11, x FOR PEER REVIEW 19 of 27 Table 5. Identification accuracies of the residual and inception blocks.   Table 5 presents the identification accuracies of the proposed algorithm and baseline 3. The identification accuracy results at different SNRs are presented in Figure 11. Both sets of results demonstrate the efficiency of the inception blocks. Table 5 reveals that the DIN-based approach can produce higher accuracies than the residual-based approach. This result is also shown in Figure 11. As the SNR changes, the accuracy of the DIN-based approach is superior to that of the residual block-based approach, except when the ensemble approach of the residual-based method overcomes the hop and DIN-based method in environments with SNRs of 20 dB or more. However, if we focused on the classifier structure, i.e., compared the performance between hops approaches or ensemble approaches, the performance of the residual network could not overcome the performance of the inception blocks. As described in Section 3.3.1, this result may stem from the fact that filtering features with different receptive field sizes can help train SFs within deep learning architectures.  Table 5 presents the identification accuracies of the proposed algorithm and baseline 3. The identification accuracy results at different SNRs are presented in Figure 11. Both sets of results demonstrate the efficiency of the inception blocks. Table 5 reveals that the DIN-based approach can produce higher accuracies than the residual-based approach. This result is also shown in Figure 11. As the SNR changes, the accuracy of the DINbased approach is superior to that of the residual block-based approach, except when the ensemble approach of the residual-based method overcomes the hop and DIN-based method in environments with SNRs of 20 dB or more. However, if we focused on the classifier structure, i.e., compared the performance between hops approaches or ensemble approaches, the performance of the residual network could not overcome the performance of the inception blocks. As described in Section 3.3.1, this result may stem from the fact that filtering features with different receptive field sizes can help train SFs within deep learning architectures.

Class Activation Map (CAM) Analysis of the DIN Classifier
We investigated the feature map of the DIN classifier to understand why the DINbased model works well. To this end, we applied a gradient-weighted CAM (GCAM) to visualize the feature map. The GCAM is a well-known feature visualization that identifies parts of the input signal that positively influence the class decision [40]. This can be achieved by back-propagating the gradient of the inference to the input layer and highlighting the input parts using positive gradient values. The details of the GCAM are described in Appendix C.
The average GCAM (AGCAM) results are presented in Figure 12. Interestingly, for each emitter classification, we found that the activated region of the AGCAM is the location at which the head and tail of the signal are located. The GCAM of the positive sample with an inference score of 0.99 or higher is shown in Figure 12b. These results show that when the classifier model correctly identifies the emitter ID, the filter maps of the model are activated similarly to the AGCAM of the target emitter. In other words, the intensity of the activated region differs from that of the AGCAM, but the shape and location of the activated region are similar to those in the AGCAM results. Conversely, the GCAM of the negative sample with an inference score of 0.30 or less is shown in Figure 12c. The results demonstrate that when the model misidentifies the emitter ID, the activated region of the filter maps is completely different from those of the target emitter and other emitters. The average GCAM (AGCAM) results are presented in Figure 12. Interestingly, for each emitter classification, we found that the activated region of the AGCAM is the location at which the head and tail of the signal are located. The GCAM of the positive sample with an inference score of 0.99 or higher is shown in Figure 12b. These results show that when the classifier model correctly identifies the emitter ID, the filter maps of the model are activated similarly to the AGCAM of the target emitter. In other words, the intensity of the activated region differs from that of the AGCAM, but the shape and location of the activated region are similar to those in the AGCAM results. Conversely, the GCAM of the negative sample with an inference score of 0.30 or less is shown in Figure 12c. The results demonstrate that when the model misidentifies the emitter ID, the activated region of the filter maps is completely different from those of the target emitter and other emitters. To verify the meaning of the activated region in Figure 12, the physical layer convergence protocol (PLCP) frame format for the FHSS network as defined in the 802.11 standard [10] is presented in Figure 13. It can be verified that the preamble field is located at the head part of the frame, and the frame body is located at the tail part of the frame.
The preamble is a sequential signal for synchronization between the transmitter and receiver. Therefore, duplicated sync sequences must be transmitted repeatedly. When the data sequence contained in the frame is identical for each emitter, only the differences in SFs remain, which is helpful in RF fingerprinting. Consequently, many researchers have applied additional preprocessing steps to extract preamble signals [5,6]. However, the proposed method based on the DIN classifier can automatically learn the preamble field without additional preprocessing steps.
In the cases of the frame body, the GCAM is activated in this region because the FH signal dataset is collected in a laboratory environment; hence, the data sequences contained in the frame body are similar to each other. This similarity of the data sequences can help identify the differences in the SFs of the emitters. Again, the proposed method can automatically learn the fields in which the emitter IDs can be identified. To verify the meaning of the activated region in Figure 12, the physical layer convergence protocol (PLCP) frame format for the FHSS network as defined in the 802.11 standard [10] is presented in Figure 13. It can be verified that the preamble field is located at the head part of the frame, and the frame body is located at the tail part of the frame.
The preamble is a sequential signal for synchronization between the transmitter and receiver. Therefore, duplicated sync sequences must be transmitted repeatedly. When the data sequence contained in the frame is identical for each emitter, only the differences in SFs remain, which is helpful in RF fingerprinting. Consequently, many researchers have applied additional preprocessing steps to extract preamble signals [5,6]. However, the proposed method based on the DIN classifier can automatically learn the preamble field without additional preprocessing steps.
In the cases of the frame body, the GCAM is activated in this region because the FH signal dataset is collected in a laboratory environment; hence, the data sequences contained in the frame body are similar to each other. This similarity of the data sequences can help identify the differences in the SFs of the emitters. Again, the proposed method can automatically learn the fields in which the emitter IDs can be identified. without additional preprocessing steps.
In the cases of the frame body, the GCAM is activated in this region because the FH signal dataset is collected in a laboratory environment; hence, the data sequences contained in the frame body are similar to each other. This similarity of the data sequences can help identify the differences in the SFs of the emitters. Again, the proposed method can automatically learn the fields in which the emitter IDs can be identified.

Outlier Detection Performance
We evaluated the outlier detection performance of the proposed algorithm and baseline 3. The experimental dataset was prepared using the test dataset for trained emitters and the outlier dataset for outlier emitters. Before executing the experiment, the detectorrelated parameters, that is, the temporal scaling factor T s and detection threshold λ, were optimized. For T s ∈ [1, 2,3,4,5,10,15,20] and 0 ≤ λ ≤ 0.3, the parameters were set to T s = 2 and λ = 0.07 for the proposed algorithm and T s = 1 and λ = 0.05 for proposed baseline 3. These values were selected by finding the minimum FPR when the TRP was higher than 95%.
As discussed in Section 3.4, the key idea of the outlier detector is based on the fact that the maximum probability of the output vectors from the outlier samples has a smaller value than the maximum probability of the output vectors from the trained samples. To verify this idea, we plotted a histogram in Figure 14 showing the maximum probabilities of the output vectors obtained from the proposed method, i.e., the output vectors of the ensemble approach in the DIN. Evidently, the maximum probability values of the outlier samples occur at positions < 0.1. Conversely, the values of trained samples mostly exist at position ≥ 0.2. These results demonstrate that the differences between the characteristics of the outliers and trained samples are easily identified and can be utilized to detect the outlier samples.

Outlier Detection Performance
We evaluated the outlier detection performance of the proposed algorithm and baseline 3. The experimental dataset was prepared using the test dataset for trained emitters and the outlier dataset for outlier emitters. Before executing the experiment, the detectorrelated parameters, that is, the temporal scaling factor s T and detection threshold λ , for proposed baseline 3. These values were selected by finding the minimum FPR when the TRP was higher than 95%.
As discussed in Section 3.4, the key idea of the outlier detector is based on the fact that the maximum probability of the output vectors from the outlier samples has a smaller value than the maximum probability of the output vectors from the trained samples. To verify this idea, we plotted a histogram in Figure 14 showing the maximum probabilities of the output vectors obtained from the proposed method, i.e., the output vectors of the ensemble approach in the DIN. Evidently, the maximum probability values of the outlier samples occur at positions 0.1 < . Conversely, the values of trained samples mostly exist at position 0.2 ≥ . These results demonstrate that the differences between the characteristics of the outliers and trained samples are easily identified and can be utilized to detect the outlier samples. We present the confusion matrices of the outlier detectors based on the proposed method and baseline 3 in Tables 6 and 7. As we optimized our parameters based on the FPR values when the TPR was higher than 95.0%, both TPRs yielded similar rates in the detection of the actual trained samples. However, in the case of the true negative ratio, which represents the actual outlier sample detection ability, the proposed method can achieve a rate of 95.6%, which is 6.6% higher than that of baseline 3 (89.0%). In other words, the proposed method can reduce the FPR from 11.0% to 4.4%. These results indicate that the DIN classifier-based approach is useful for training SF features in FH signals and can effectively detect outlier samples by using these trained features. We present the confusion matrices of the outlier detectors based on the proposed method and baseline 3 in Tables 6 and 7. As we optimized our parameters based on the FPR values when the TPR was higher than 95.0%, both TPRs yielded similar rates in the detection of the actual trained samples. However, in the case of the true negative ratio, which represents the actual outlier sample detection ability, the proposed method can achieve a rate of 95.6%, which is 6.6% higher than that of baseline 3 (89.0%). In other words, the proposed method can reduce the FPR from 11.0% to 4.4%. These results indicate that the DIN classifier-based approach is useful for training SF features in FH signals and can effectively detect outlier samples by using these trained features.  Figure 15 plots the ROC curve and compares the AUROCs. As was done for the previously presented results in Section 5, the values were averaged over 10 experiments. The ROC metric describes the relationship between the probability of detection (i.e., TPR) and the probability of a false alarm (i.e., FPR). This result can be achieved by plotting the FPR together with the TPR at different detector thresholds λ. Additionally, this ROC metric is known as the cost-benefit relationship in decision theory. Thus, when a high benefit is obtained at a low cost, i.e., when the probability of false alarms is low, high detection rates should be obtained. In other words, if the curve moves toward the upper left with a high AUROC, the model possesses strong detection ability. The results confirm that the proposed method can clearly improve the ROC curve compared with baseline 3. The AUROC also improves from 0.97 to 0.99. These results provide clear evidence that the proposed DIN-based ensemble method is more effective than the residual blockbased method.

Predicted Emitter (%) Learned Classes
Outlier Classes Actual emitter (%) Learned classes 96.8 3.2 Outlier classes 11.0 89.0 Figure 15 plots the ROC curve and compares the AUROCs. As was done for the previously presented results in Section 5, the values were averaged over 10 experiments. The ROC metric describes the relationship between the probability of detection (i.e., TPR) and the probability of a false alarm (i.e., FPR). This result can be achieved by plotting the FPR together with the TPR at different detector thresholds λ . Additionally, this ROC metric is known as the cost-benefit relationship in decision theory. Thus, when a high benefit is obtained at a low cost, i.e., when the probability of false alarms is low, high detection rates should be obtained. In other words, if the curve moves toward the upper left with a high AUROC, the model possesses strong detection ability. The results confirm that the proposed method can clearly improve the ROC curve compared with baseline 3. The AUROC also improves from 0.97 to 0.99. These results provide clear evidence that the proposed DIN-based ensemble method is more effective than the residual block-based method.

Conclusions
In this study, an RFEI method that targets the physical layers of FHSS networks was proposed with the objective of directly identifying emitter IDs from received FH signals. An analog SF extraction process, SF spectrogram features, a DIN-based classifier for emitter classification, and an outlier detector algorithm for attacker detection were proposed and applied to the target hop signals. In addition, the ensemble approach that utilized multimodality SFs was evaluated for robust classification. The results showed that the SF spectrogram extracted from the received FH signal can be effectively analyzed using the

Conclusions
In this study, an RFEI method that targets the physical layers of FHSS networks was proposed with the objective of directly identifying emitter IDs from received FH signals. An analog SF extraction process, SF spectrogram features, a DIN-based classifier for emitter classification, and an outlier detector algorithm for attacker detection were proposed and applied to the target hop signals. In addition, the ensemble approach that utilized multimodality SFs was evaluated for robust classification. The results showed that the SF spectrogram extracted from the received FH signal can be effectively analyzed using the DIN-based classifier, and the classification accuracy was improved by at least 1.00% compared with those of other baselines. In addition, the multimodal SF ensemble approach, that is, the use of RT, FT, and SS, achieved the best results with a classification accuracy of 97.0% for the seven real FHSS emitters. In addition, the inception block-based approach was more effective than the residual block-based approach owing to its filtering ability at different receptive field sizes. From the analysis of the GCAM for each FH emitter, we found that the classifier model can train the region wherein the differences in the SFs can be maximized. In addition, the outlier detection performance of the proposed method was evaluated. We found that the output characteristics of the outliers differed from those of the training samples, and this property can be used by the detector to identify attacker signals with an AUROC of 0.99.
These results support that the proposed RFEI method can identify emitter IDs of the FH signals emitted by authenticated users and can detect the existence of the FH signals emitted by attackers. Because the SFs cannot be reproduced, it is possible to configure non-replicable authentication systems in the physical layer of the FHSS network. This study focused on evaluating the RFEI method, one of the components of the overall authentication system. Our future study will consider system improvement by utilizing the GCAM to detect misclassification cases.
As another future study, we will consider the property of the outliers in the RFEI system. We believe that further distinctions of the outliers, namely the detection of multilabeled outliers, may be possible. We expect that this future consideration will help prevent the malicious application of the RFEI system, such as when eavesdroppers utilize the RFEI system. If the eavesdropper can successfully prepare the target FH sample, it can be used as a signal tracking method to decode the actual FH signal transmission. Our future study will consider the ways to prevent this malicious scenario by generating artificial outliers that can imitate authentication users. Institutional Review Board Statement: Not applicable.

Informed Consent Statement: Not applicable.
Data Availability Statement: Not applicable. Due to security issues, the FHSS datasets are not disclosed.

Conflicts of Interest:
The authors declare no conflict of interest. The funders had no role in the design of the study, the writing of the manuscript, or the decision to publish the results. However, the funders helped prepare the FHSS emitters for data collection, analysis, and interpretation.

Appendix A. Architecture and Design Strategies of the Main Blocks
Appl. Sci. 2021, 11, x FOR PEER REVIEW 24 of 27

Appendix A. Architecture and Design Strategies of the Main Blocks
(a) (b) Figure A1. Basic block for constructing the deep learning classifier used in this study: (a) the residual block [22] and (b) the inception block [23].
The custom deep learning-based classifier utilized in our study consists of two main blocks: a residual block [22] and an inception block [23]. The architecture of these blocks is shown in Figure A1.
The design strategy of the residual block is to handle the degradation problem as the network goes deeper [22]. The residual block contains skip connections between adjacent convolutional layers and helps mitigate the vanishing gradient problem. The goal of the residual network is to allow flexible training of the features as the network depth increases.
The design strategy of the inception block involves calculating features with different filter sizes in the same layer [23]. The inception block contains parallel convolutional layers with different filter sizes. The results for each layer are concatenated in the filter axis and pass through the next layer. These parallel connections can extract features with multiple receptive field sizes, which are useful when the features vary in location and size.
The spectrogram contains the physical measurements of the SF signals. It represents the power densities of the SF signals along the time-frequency axes. To train these twodimensional density behaviors of the SF signals, we aimed to filter the spectrogram on multiple filter scales in the temporal and spatial domains by applying inception blocks.

Appendix B. Implemented Parameter Settings in Experiments
The implemented parameters of the RF fingerprinting algorithms performed at our experiments are described in Table A1.  The custom deep learning-based classifier utilized in our study consists of two main blocks: a residual block [22] and an inception block [23]. The architecture of these blocks is shown in Figure A1.
The design strategy of the residual block is to handle the degradation problem as the network goes deeper [22]. The residual block contains skip connections between adjacent convolutional layers and helps mitigate the vanishing gradient problem. The goal of the residual network is to allow flexible training of the features as the network depth increases.
The design strategy of the inception block involves calculating features with different filter sizes in the same layer [23]. The inception block contains parallel convolutional layers with different filter sizes. The results for each layer are concatenated in the filter axis and pass through the next layer. These parallel connections can extract features with multiple receptive field sizes, which are useful when the features vary in location and size.
The spectrogram contains the physical measurements of the SF signals. It represents the power densities of the SF signals along the time-frequency axes. To train these twodimensional density behaviors of the SF signals, we aimed to filter the spectrogram on multiple filter scales in the temporal and spatial domains by applying inception blocks.

Appendix B. Implemented Parameter Settings in Experiments
The implemented parameters of the RF fingerprinting algorithms performed at our experiments are described in Table A1.