Securing the Insecure: A First-Line-of-Defense for Body-Centric Nanoscale Communication Systems Operating in THz Band

This manuscript presents a novel physical-layer mechanism for authentication and transmitter identification in a body-centric nanoscale communication system operating in the terahertz (THz) band. The unique characteristics of the propagation medium in the THz band render the existing techniques (say, for impersonation detection in cellular networks) inapplicable. In this work, we considered a body-centric network with multiple on-body nano-sensor nodes (some of which have been compromised) that communicate their sensed data to a nearby gateway node. We proposed to protect the transmissions on the link between the legitimate nano-sensor nodes and the gateway by exploiting the path loss of the THz propagation medium as the fingerprint/feature of the sender node to carry out authentication at the gateway. Specifically, we proposed a two-step hypothesis testing mechanism at the gateway to counter the impersonation (false data injection) attacks by malicious nano-sensors. To this end, we computed the path loss of the THz link under consideration using the high-resolution transmission molecular absorption (HITRAN) database. Furthermore, to refine the outcome of the two-step hypothesis testing device, we modeled the impersonation attack detection problem as a hidden Markov model (HMM), which was then solved by the classical Viterbi algorithm. As a by-product of the authentication problem, we performed transmitter identification (when the two-step hypothesis testing device decides no impersonation) using (i) the maximum likelihood (ML) method and (ii) the Gaussian mixture model (GMM), whose parameters are learned via the expectation–maximization algorithm. Our simulation results showed that the two error probabilities (missed detection and false alarm) were decreasing functions of the signal-to-noise ratio (SNR). Specifically, at an SNR of 10 dB with a pre-specified false alarm rate of 0.2, the probability of correct detection was almost one. We further noticed that the HMM method outperformed the two-step hypothesis testing method at low SNRs (e.g., a 10% increase in accuracy was recorded at SNR = −5 dB), as expected. Finally, it was observed that the GMM method was useful when the ground truths (the true path loss values for all the legitimate THz links) were noisy.


Introduction
Nanoscale communication systems have attracted researchers due to their promising applications in healthcare, manufacturing industries, environmental control, etc. [1]. On the other hand, body-centric communication has potential applications in healthcare, entertainment, etc. [2]. Generally, body-centric communication is classified as "off"-, "on"-, and "in"-body communication based on the communication among implanted or wearable electronic devices. In this work, we focused on the body-centric communication systems where nano sensors/devices operating in the THz band are deployed on the body of a human being.
Due to the small size of nano devices, the existing frameworks, techniques, and methods proposed for communication networks such as WiFi, 4G, etc., are not suitable for exchanging information amongst the nano devices [3]. For instance, nano devices are unable to operate at microwave bands due to their small size. They would require molecular communication and the terahertz (THz) band for operation. Additionally, in IoT devices, due to the small energy sources, the computational processing capability is limited. Therefore, it is necessary to meet the requirements for new protocols of nano devices at all layers of the protocol stack. Operating in the THz band (0.1-10 THz) is a promising solution at the physical layer (PL) [4], which makes the antenna size very small and thus suitable for exchanging information between nano devices.
Like other communication networks, body-centric nanoscale communication networks are prone to a wide range of active and passive attacks by adversaries [5]. Common attacks include eavesdropping, impersonation, denial of service (DoS), etc. Here, we investigated an impersonation attack in body-centric nanoscale communication networks. Figure 1 illustrates an impersonation attack in a smart healthcare scenario. The nano nodes are deployed on the body of a person/patient for disease diagnostics or to remotely monitor his/her health parameters. These nano devices are connected to a wearable device, which communicates the data to an outdoor network via a nano-to-micro interface. If an adversary secretly deploys its own nano machines nearby in order to impersonate the person's legitimate nodes and report false measurements to the remote health unit, an incorrect response by the nano machines or nearby doctors could have devastating consequences. Therefore, we need an authentication mechanism at the nano-to-micro interface device (wearable device) that allows data transmission (reported measurements of the nano nodes/sensors, e.g., glucose, blood pressure, etc.) from legitimate nano nodes only and blocks all malicious nodes.
In traditional communication systems, the countermeasures for such attacks are performed at the higher layers using cryptography. Despite the extensive work in the field of cryptography, the mechanism can be compromised because of its sole dependency on the predefined shared secret among the legitimate users. With recent advances in quantum computing, traditional encryption has become vulnerable to being broken, and existing crypto-based measures are not quantum secure unless the size of the secret keys increases to impractical lengths [6]. In this regard, physical layer (PL) security is a promising mechanism for future communication systems. PL security exploits the random nature of the physical medium/layer for security purposes [7].
Authentication is one of the pillars required for the security of any communication system. PL authentication is a systematic procedure that uses PL's features to provide authentication. In conventional systems, asymmetric key encryption (AKE) is typically used in the authentication phase, which is the realm of public key encryption (a crypto-based approach). Such schemes are quantum insecure and incur overhead or high computations, which not only increase the size of the device, but also consume much power. The devices fabricated for nanoscale communication are energy constrained as they incorporate a small source of energy (a battery). PL authentication has a low overhead (a simple procedure that typically includes feature estimation and testing) and is almost impossible to clone unless the devices lie on each other. Various fingerprints including RSS [8], CIR [9,10], CFR [11,12], carrier frequency offset [13], and I/Q imbalance [14] have been reported for PL authentication in conventional communication systems.
Related Work: The authors in [15] studied, for the first time, authentication using path loss (S21 parameter) in body-centric communication using millimeter waves. Regarding the security of systems operating in the THz band, we found some works [5,16–18] in the literature. The work [16] provided the first study on the security challenges faced by nanoscale communication systems, while the work [17] presented some promising applications along with the security challenges in the Internet of Nano-Things. Further, the experimental work of Jianjun et al. [5] was the first to reject the claim about security in the THz band. The claim was that the inherent narrow beamwidth of the THz link makes it secure and thus makes it impossible for a malicious node to accomplish an eavesdropping attack. In their experiments, the authors of [5] placed reflectors of different shapes between the THz transmitter and receiver. Then, with the help of secrecy capacity and blockage as performance metrics, they clearly demonstrated that eavesdropping attacks in the THz band can be easily performed.
The differences between our work and previous work are as follows: The first work [15], which studied the authentication problem in body-centric communication systems, considered millimeter-wave communication with a three-node setup. In contrast, our work considered multiple legitimate and malicious nodes operating in the THz band. The work [5] considered an eavesdropping attack in a system operating in the THz band, which was a different problem/attack than the attack we considered in our work. Next, in our previous work [18], we studied PL authentication for an in vivo nanoscale communication system whereby we utilized the path loss as the device fingerprint for a three-node system (i.e., Alice, Eve, and Bob). The difference between our previous work [18] and this work was twofold. First, the previous work was limited to the three-node system only, while in this work, the system model was comprised of multiple legitimate and malicious nodes. Second, the previous work was for an in vivo nanoscale communication system where authentication occurs at a nano node (Bob).
Contributions: For the first time, this work studied authentication at a nano-to-micro interface device (wearable device) in an on-body-centric communication system where we exploited the high-resolution transmission molecular absorption (HITRAN) database [19] for computing the path loss. For the first time, impersonation attack detection at the wearable device/receiver/Bob in multiple legitimate and malicious nano nodes operating in the THz band is performed via different mechanisms. We performed authentication by two-step hypothesis testing. We refined the output of the hypothesis testing via the hidden Markov model (HMM) with the Viterbi algorithm. We also performed transmitter identification via the maximum likelihood and Gaussian mixture model (GMM) with the expectation-maximization algorithm.
Outline: The rest of this paper is organized as follows. Section 2 provides the system model. Section 3 discusses authentication via two-step hypothesis testing. Section 4 presents the hidden Markov model to refine the output of hypothesis testing. Section 5 provides transmitter identification schemes. Section 6 presents simulation results with discussions, and Section 7 concludes the paper.

System Model
For the purposes of the simulation, we considered a square 2D map/layout in which M legitimate nodes (Alice nodes, $A_i$) and N malicious nodes (Eve nodes, $E_j$) are deployed according to the uniform distribution model, whilst a nano-to-micro interface device/receiver node, Bob, is placed at the origin, as shown in Figure 2. We assumed that the Tx nodes transmitted with a fixed/pre-specified transmit power so that the path loss can be computed by Bob. Following [20,21], the path loss of a THz link is composed of the molecular absorption loss and the spreading loss. More details of the spreading and absorption losses are given in Appendix A.
In the next section, we discuss the two-step mechanism for impersonation detection.
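As a rough illustration of how a path loss value could be obtained for a given link, the following minimal sketch assumes the commonly used free-space spreading term $(4\pi f d / c)^2$ and an exponential (Beer-Lambert) absorption term $e^{k d}$. The function names and the absorption coefficient `k_per_m` are hypothetical placeholders; in the paper, the absorption is computed from the HITRAN database, which this sketch does not do.

```python
import math

C = 299_792_458.0  # speed of light in vacuum, m/s


def spreading_loss_db(f_hz, d_m):
    """Spreading loss (4*pi*f*d/c)^2 expressed in dB (assumed free-space form)."""
    return 20.0 * math.log10(4.0 * math.pi * f_hz * d_m / C)


def absorption_loss_db(k_per_m, d_m):
    """Absorption loss exp(k*d) expressed in dB.

    k_per_m is a hypothetical absorption coefficient in 1/m; a real value
    would have to come from HITRAN line data for the medium.
    """
    return 10.0 * k_per_m * d_m * math.log10(math.e)


def path_loss_db(f_hz, d_m, k_per_m):
    """Total path loss in dB: spreading plus absorption."""
    return spreading_loss_db(f_hz, d_m) + absorption_loss_db(k_per_m, d_m)


# Example: a node 0.5 m from Bob at f = 1 THz with a placeholder coefficient.
example_loss = path_loss_db(1e12, 0.5, k_per_m=2.0)
```

Because both terms grow monotonically with distance, nodes at different distances from Bob map to different path loss values, which is what makes the path loss usable as a fingerprint.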

Authentication via Two-Step Hypothesis Testing
We assumed that the shared channel is time-slotted, whilst the transmit nodes perform channel sensing before transmitting; hence, there are no collisions. Without loss of generality, it can be assumed that $A_i$ is the legitimate node for slot k, but if $A_i$ does not transmit during this time slot, $E_j$ could transmit to Bob pretending to be an Alice node. Therefore, Bob needs to authenticate each message received on the shared channel and verify the transmitter identity (if no impersonation has been declared) in a systematic manner.
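Assume that the noisy measurement $z(k) = L + n(k)$ has been obtained at time k (for instance, by using the pulse-based method discussed in [22]), where $n(k) \sim \mathcal{N}(0, \sigma^2)$ and L is the path loss. Furthermore, in line with previous studies [18,23], we assumed that Bob has already learned the ground truth via prior training on a secure channel. The ground truth vector is denoted by $\mathbf{l} = \{L_1, \ldots, L_M\}^T$. The two-step hypothesis testing starts with the maximum likelihood (ML) hypothesis test, which selects the ground-truth entry closest to the measurement:

$$i^* = \arg\min_{i \in \{1, \ldots, M\}} |z(k) - L_i| \qquad (2)$$

Next, the binary hypothesis test works as follows:

$$H_0 \text{ (no impersonation)}: |z(k) - L_{i^*}| < \epsilon, \qquad H_1 \text{ (impersonation)}: |z(k) - L_{i^*}| > \epsilon$$

Equivalently, we have:

$$|z(k) - L_{i^*}| \underset{H_0}{\overset{H_1}{\gtrless}} \epsilon \qquad (4)$$

where $\epsilon$ is a small threshold, a design parameter. This work followed the Neyman-Pearson theorem [24], which states that, for a pre-specified $P_{fa}$, $\epsilon$ can be chosen such that $P_{md}$ is minimized. The false alarm probability of the above hypothesis tests is:

$$P_{fa} = \sum_{i=1}^{M} P_{fa|i}\, \pi(i) = 2Q\left(\frac{\epsilon}{\sigma}\right)$$

where $Q(x) = \frac{1}{\sqrt{2\pi}} \int_x^{\infty} e^{-t^2/2}\, dt$ is the complementary cumulative distribution function (ccdf) of a standard normal distribution, and $\pi(i)$ is the prior probability of $A_i$. Thus, the threshold could be computed as follows:

$$\epsilon = \sigma\, Q^{-1}\left(\frac{P_{fa}}{2}\right) \qquad (6)$$

Then, $P_{md}$ is given as:

$$P_{md} = \sum_{j=1}^{N} P_{md|j}\, \pi(j)$$

where $P_{md|j}$ is the probability that Bob decides no impersonation while $E_j$ was the sender, and $\pi(j) = \sum_{i=1}^{M} \alpha_{ij}\, \pi(i)$ is the prior probability of $E_j$. Here, $0 < \alpha_{ij} < 1$ is the fraction of slots that were originally dedicated to $A_i$, but were found idle and thus utilized by $E_j$.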
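Since $P_{md}$ is a random variable (R.V.), the expected value $\bar{P}_{md} := E(P_{md})$ is as follows:

$$\bar{P}_{md} = \sum_{j=1}^{N} \frac{\pi(j)}{\Delta} \int_{L_{\min}}^{L_{\max}} P_{md|j}\, dL_j$$

where $P_{md|j}$ denotes the missed detection probability when $E_j$ is the sender, and we assumed that the unknown path loss $L_j \sim U(L_{\min}, L_{\max})\ \forall j$ with $\Delta = L_{\max} - L_{\min}$. Next, we discuss the HMM for refining the outcomes/results of the two-step hypothesis testing.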
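The two-step mechanism described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a nearest-ground-truth ML step followed by a threshold comparison, with the threshold set from the target false alarm rate through the inverse of the Gaussian ccdf. The function names and the example ground-truth values are hypothetical.

```python
import random
from statistics import NormalDist


def threshold_from_pfa(p_fa, sigma):
    """Threshold sigma * Q^{-1}(p_fa / 2); Q^{-1}(p) is the normal quantile at 1 - p."""
    return sigma * NormalDist().inv_cdf(1.0 - p_fa / 2.0)


def two_step_test(z, ground_truth, eps):
    """Step 1: pick the closest ground-truth path loss (ML step).
    Step 2: declare impersonation if the residual exceeds eps.
    Returns (impersonation_flag, index_of_closest_ground_truth)."""
    i_star = min(range(len(ground_truth)), key=lambda i: abs(z - ground_truth[i]))
    impersonation = abs(z - ground_truth[i_star]) > eps
    return impersonation, i_star


def empirical_false_alarm(ground_truth, sigma, p_fa, trials=20000, seed=0):
    """Fraction of legitimate transmissions that are flagged as impersonation."""
    rng = random.Random(seed)
    eps = threshold_from_pfa(p_fa, sigma)
    alarms = 0
    for _ in range(trials):
        true_loss = rng.choice(ground_truth)   # a legitimate sender transmits
        z = rng.gauss(true_loss, sigma)        # noisy path loss measurement
        alarms += two_step_test(z, ground_truth, eps)[0]
    return alarms / trials
```

With well-separated (hypothetical) ground truths such as `[60.0, 70.0, 80.0]`, `sigma = 1.0`, and a target false alarm rate of 0.2, the empirical false alarm rate returned by `empirical_false_alarm` should land close to the target, which is a quick sanity check on the threshold rule.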

Hidden Markov Model-Based Approach
To refine the output of the two-step hypothesis testing, we used the HMM-based approach. More specifically, at a given time instant k, the system is in one of two states with the state space $S = \{s_0, s_1\}$. The states $s_0$ and $s_1$ imply that there is no impersonation and impersonation, respectively, at time k. However, the true state of the system is hidden; therefore, what we observe through the hypothesis test is another, observable Markov chain. The connection between the true/hidden state and the observable state is given by the emission probability matrix:

$$\mathbf{R} = \begin{bmatrix} 1 - P_{fa} & P_{fa} \\ P_{md} & 1 - P_{md} \end{bmatrix}$$

where $[\mathbf{R}]_{i,j}$ is the probability that the test decides state j while the system is in state i. The off-diagonal elements in the i-th row of $\mathbf{R}$ represent the errors made by the ML test, i.e., deciding the state as $s[k] = j$, $j \in \{0, 1\} \setminus \{i\}$, while the system was actually in state $s[k] = i$. The transition from state i to state j occurs after a fixed interval of $T = t_k - t_{k-1}$ seconds, where 1/T is the measurement rate. Assume that the system was in state $s_0$ at time k = 0, i.e., $\mathbf{x}[0] = [1, 0]^T$. Suppose that we are at time k − 1, the system is in state $s_i$, $i \in \{0, 1\}$, and we want to predict the probability vector $\mathbf{x}[k]$ at time k. To this end, we have the following transition probability matrix:

$$\mathbf{P} = \begin{bmatrix} p_{00} & p_{01} \\ p_{10} & p_{11} \end{bmatrix}$$

where $p_{ij} = \Pr(s[k] = j \mid s[k-1] = i)$, so that $\mathbf{x}[k] = \mathbf{P}^T \mathbf{x}[k-1]$. At this stage, we are done with the impersonation detection mechanisms. Next, we discuss the transmitter identification mechanisms.
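To make the refinement step concrete, the sketch below decodes the most likely sequence of hidden states from a sequence of binary hypothesis-test decisions with the Viterbi algorithm. It is a generic log-domain implementation under assumed inputs: `P` is a row-stochastic transition matrix, `R` is the emission matrix (rows indexed by the hidden state, columns by the observed decision), and `x0` is the state distribution for the first slot. The numerical values in the usage note are illustrative only.

```python
import math


def viterbi(obs, P, R, x0):
    """Most likely hidden-state path for observations obs (values 0 or 1).

    P[i][j]: probability of moving from hidden state i to hidden state j.
    R[i][y]: probability of observing decision y while in hidden state i.
    x0[i]:   probability of hidden state i in the first slot.
    """
    def log(p):
        return math.log(p) if p > 0.0 else float("-inf")

    n = len(P)
    score = [log(x0[s]) + log(R[s][obs[0]]) for s in range(n)]
    back = []  # back[t][j] = best predecessor of state j at step t + 1
    for y in obs[1:]:
        prev = score
        ptr, score = [], []
        for j in range(n):
            best = max(range(n), key=lambda i: prev[i] + log(P[i][j]))
            ptr.append(best)
            score.append(prev[best] + log(P[best][j]) + log(R[j][y]))
        back.append(ptr)
    # Trace back from the best final state.
    state = max(range(n), key=lambda s: score[s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return path[::-1]
```

For example, with a "sticky" chain `P = [[0.9, 0.1], [0.1, 0.9]]` and `R = [[0.8, 0.2], [0.3, 0.7]]`, an isolated impersonation decision in the middle of a run of no-impersonation decisions is treated as a likely false alarm and smoothed out, which is the intuition behind the accuracy gain at low SNR.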

Transmitter Identification
The transmitter identification is accomplished via two approaches: ML- and GMM-based transmitter identification.

ML-Based Approach
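In the ML-based approach, the probability of the misclassification error resulting from Equation (2) is given as:

$$P_{mc} = \sum_{i=1}^{M} P_{mc|i}\, \pi(i)$$

where $P_{mc|i} = P(\text{Bob decides } A_j,\ j \neq i \mid A_i \text{ was the sender})$. For the hypothesis test of (4), $P_{mc|i}$ is given as:

$$P_{mc|i} = 1 - \left[ Q\left(\frac{L_{l,i} - \tilde{L}_i}{\sigma}\right) - Q\left(\frac{L_{u,i} - \tilde{L}_i}{\sigma}\right) \right]$$

where $L_{l,i} = (\tilde{L}_{i-1} + \tilde{L}_i)/2$ and $L_{u,i} = (\tilde{L}_i + \tilde{L}_{i+1})/2$ are the lower and upper decision boundaries of the i-th sorted entry. Additionally, $\tilde{\mathbf{l}} = \{\tilde{L}_1, \ldots, \tilde{L}_M\} = \mathrm{sort}(\mathbf{l})$, where the operation $\mathrm{sort}(\cdot)$ sorts a vector in increasing order. For the boundary cases, i.e., i = 1 and i = M, $L_{l,1} = L_{\min}$ and $L_{u,M} = L_{\max}$, respectively.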
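The misclassification probability of the ML-based identification can also be estimated by simulation. The sketch below is a minimal Monte Carlo estimate under assumptions that are not taken from the paper: equal priors over the legitimate nodes, a nearest-ground-truth decision rule, and hypothetical ground-truth values.

```python
import random


def ml_identify(z, ground_truth):
    """Index of the ground-truth path loss closest to the measurement z."""
    return min(range(len(ground_truth)), key=lambda i: abs(z - ground_truth[i]))


def estimate_p_mc(ground_truth, sigma, trials=20000, seed=0):
    """Monte Carlo estimate of the misclassification probability."""
    rng = random.Random(seed)
    errors = 0
    for _ in range(trials):
        i = rng.randrange(len(ground_truth))    # true sender, equal priors (assumption)
        z = rng.gauss(ground_truth[i], sigma)   # noisy path loss measurement
        errors += ml_identify(z, ground_truth) != i
    return errors / trials
```

Ground truths that are closely spaced relative to the noise standard deviation give a large estimate, while widely spaced ones give an estimate near zero. This matches the observation that adding Alice nodes to a fixed area increases the misclassification probability.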

Transmitter Identification Using Gaussian Mixture Modeling
The GMM consisted of Q = M + N component densities, of which only the Q = M densities of the legitimate nodes could be trained. The GMM, in its standard form, is perfectly suited for transmitter identification. Under the GMM, the probability density function (pdf) of the (observed) mixture random variable X is the convex/weighted sum of the component pdfs:

$$f_X(x) = \sum_{q=1}^{Q} \pi_q\, \phi_q(x)$$

where each $\phi_q(x)$ is a Gaussian pdf with mean $\mu_q$ and variance $\sigma_q^2$ that satisfies $\phi_q(x) \geq 0$ and $\int_{x \in \mathbb{R}} \phi_q(x)\, dx = 1$. The weights/priors satisfy $\pi_q \geq 0$ and $\sum_{q=1}^{Q} \pi_q = 1$. The GMM has 3Q unknown parameters (Q priors, Q means, and Q variances), which were learned by applying the iterative expectation-maximization (EM) algorithm on the training data $\{x_m\}_{m=1}^{M}$. The posterior probability for each point $x_m$ in the training data (i.e., the likelihood of $x_m$ belonging to component q of the mixture) was computed as follows (j is the iteration number):

$$\gamma_{m,q}^{(j)} = \frac{\pi_q^{(j)}\, \phi_q\left(x_m; \mu_q^{(j)}, \sigma_q^{2\,(j)}\right)}{\sum_{r=1}^{Q} \pi_r^{(j)}\, \phi_r\left(x_m; \mu_r^{(j)}, \sigma_r^{2\,(j)}\right)}$$

The Q priors were updated as follows:

$$\pi_q^{(j+1)} = \frac{1}{M} \sum_{m=1}^{M} \gamma_{m,q}^{(j)}$$

The Q means were updated as follows:

$$\mu_q^{(j+1)} = \frac{\sum_{m=1}^{M} \gamma_{m,q}^{(j)}\, x_m}{\sum_{m=1}^{M} \gamma_{m,q}^{(j)}}$$

The Q (co-)variances were updated as follows:

$$\sigma_q^{2\,(j+1)} = \frac{\sum_{m=1}^{M} \gamma_{m,q}^{(j)} \left(x_m - \mu_q^{(j+1)}\right)^2}{\sum_{m=1}^{M} \gamma_{m,q}^{(j)}}$$

The iterative EM algorithm monotonically increased the objective (likelihood) function value and converged when the increase in the likelihood function value between two successive iterations became less than a preset threshold.

Figure 3 shows a flow graph of the proposed methodology. The noisy estimated measurement/path loss z(k) at slot k was fed to the two-step mechanism for impersonation detection, and the HMM was used to refine the outcomes of the two-step mechanism with the help of the transition and emission probability matrices (i.e., P and R) and the Viterbi algorithm. Transmitter identification was done via the ML and GMM approaches when no impersonation was decided.
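The EM procedure described above can be sketched for scalar path loss measurements as follows. This is a minimal one-dimensional illustration with hypothetical function names, initial values, and stopping tolerance. It uses the standard EM updates for a Gaussian mixture and then assigns a new measurement to the component with the largest posterior.

```python
import math


def gaussian_pdf(x, mu, var):
    return math.exp(-(x - mu) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def em_gmm_1d(data, means, variances, weights, tol=1e-6, max_iter=200):
    """Fit a 1-D Gaussian mixture with EM, starting from the given initial guesses."""
    means, variances, weights = list(means), list(variances), list(weights)
    n_comp = len(means)
    prev_ll = float("-inf")
    for _ in range(max_iter):
        # E-step: posterior of each component for each point, plus the
        # log-likelihood under the current parameters.
        resp, ll = [], 0.0
        for x in data:
            joint = [weights[q] * gaussian_pdf(x, means[q], variances[q])
                     for q in range(n_comp)]
            total = sum(joint)
            ll += math.log(total)
            resp.append([v / total for v in joint])
        # M-step: update priors, means, and variances.
        for q in range(n_comp):
            nq = sum(r[q] for r in resp)
            weights[q] = nq / len(data)
            means[q] = sum(r[q] * x for r, x in zip(resp, data)) / nq
            var = sum(r[q] * (x - means[q]) ** 2 for r, x in zip(resp, data)) / nq
            variances[q] = max(var, 1e-9)  # guard against a collapsing component
        if ll - prev_ll < tol:  # stop when the likelihood gain is negligible
            break
        prev_ll = ll
    return means, variances, weights


def gmm_identify(x, means, variances, weights):
    """Index of the component with the largest posterior for measurement x."""
    return max(range(len(means)),
               key=lambda q: weights[q] * gaussian_pdf(x, means[q], variances[q]))
```

Because the component means are learned from noisy training measurements rather than taken as exact ground truths, this kind of model can absorb errors in the stored ground truths, which is the situation in which the GMM approach is reported to help.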

Setup
We kept M = N = 10, $\alpha_{ij} = 0.5\ \forall j$, f = 1 THz, T = 285 K, and p = 1 atm. Both the Alice and Eve nodes were deployed according to the uniform distribution in a 1 m × 1 m area. A total of $10^5$ random realizations of the nodes' deployment (independent for the Alice and Eve nodes) were taken, and the errors were then averaged over the realizations. $P_{fa}$ and $P_{md}$ are the two well-known error probabilities in hypothesis testing. $P_{fa}$ was defined as the probability that any i-th Alice node is mistaken for one of the Eve nodes, and $P_{md}$ is the probability that any j-th Eve node is mistaken for one of the Alice nodes. Figure 4 plots the two probabilities against SNR = $1/\sigma^2$, where the improvement in the error probabilities with an increasing SNR can be seen clearly. The design parameter $\epsilon$ traded the two errors off against each other: decreasing $P_{md}$ came at the cost of an increased $P_{fa}$.

Figure 5 shows the efficacy of the HMM. At a low SNR, the HMM performed far better than hypothesis testing (HT), and at a high SNR, HT came close to the HMM. The results were produced by a Monte Carlo simulation. The total number of transmissions was kept to $10^5$ (more specifically, $10^5$ binary states ($s_0$, $s_1$) were generated), with $\epsilon = 1$, $\mathbf{P} = 0.5\, \mathbf{I}_{2 \times 2}$, where $\mathbf{I}$ is the identity matrix, and $K = 10^3$. The errors resulting from the HT and HMM methods were calculated as the number of times the predicted/estimated state was not equal to the actual state divided by the total number of transmissions. The accuracy was then computed accordingly. The entries of $\mathbf{R}$ were calculated according to $P_{fa}$ and $P_{md}$.

Figure 6 shows the receiver operating characteristic (ROC) curves for different configurations of the nodes and of the transmissions from the Eve nodes (i.e., $\alpha_{ij}$). Typically, the ROC involves two probabilities ($P_d$ and $P_{fa}$), but due to the multiple nodes in this study, we had three probabilities. For any $P_{fa}$ value, $P_{mc}$ was constant, which is obvious from Equation (13). Increasing the SNR improved both $P_d$ and $P_{mc}$. $P_{fa}$ was chosen as the independent variable and swept over the range from zero to one. Using Equation (6), the threshold $\epsilon$ was calculated for a given SNR value. Further, $P_d = 1 - P_{md}$ (the detection probability) and $P_{mc}$ were computed as averages over $10^5$ uniform realizations of the nodes' deployment. We observed that increasing the number of nodes did not affect $P_d$, but $P_{mc}$ increased with the number of Alice nodes (M). We further observed that the fewer Alice nodes remained idle during their allocated slots, the higher $P_d$ became.

$P_{mc}$ is the probability of deciding that the i-th Alice node is any Alice node other than i. $P_{mc}$ becomes an important metric when dealing with the identification of multiple nodes. Here, $P_{mc}$ resulted from both transmitter identification algorithms (ML, which is a by-product of the two-step HT-based authentication, and the GMM). As the GMM is a learning approach, it requires training data to learn its parameters. Since we assumed that no data were available for the Eve nodes, we used the GMM for transmitter identification only. Figure 7a was generated by assuming that the actual ground truths (noiseless $L_i\ \forall i$) of the Alice nodes were available for performing ML-based transmitter identification. The ML was implemented using Equation (2) with noiseless ground truths. Figure 7a shows that the two approaches performed equally. To test the efficacy of the GMM approach, we performed another experiment and plotted the results in Figure 7b. This time, we assumed that the ground truths of the Alice nodes were noisy, $L_i + n\ \forall i$ (i.e., the ground truths obtained on a secure channel also included noise or an error), and the ML-based approach was implemented using Equation (2) with these noisy ground truths. The GMM parameters were estimated on $10^4$ training data points generated from the legitimate nodes and then tested on $10^5$ data points. The error was calculated as the number of times the estimated state was not equal to the actual value divided by the total number of transmissions, for both approaches and for both cases. We observed from Figure 7b that the GMM approach performed better overall, and that its performance gain grew even further at lower SNR (higher $\sigma^2$).

Discussions
• From Figures 4 and 6, we learned that the path loss could be exploited as a fingerprint to carry out authentication in body-centric nanoscale communication systems operating in the THz band. In other words, the proposed mechanisms can be used as a first line of defense against impersonation attacks.
• The results of the proposed two-step mechanism can be improved by using an additional approach (i.e., the HMM). In particular, at a low SNR, the improvement was quite significant.
• The results in Figures 4 and 6 indicate that, under the impersonation detection problem, it is not possible to minimize both $P_{md}$ and $P_{fa}$ at the same time because of their conflicting nature. In other words, one could minimize one error type only by compromising the other error type.
• The GMM (a learning-based scheme) performed the same as our proposed two-step mechanism in transmitter identification. However, we learned that the slightly more complex GMM could produce improvements when the ground truths of the legitimate nodes are noisy.

Conclusions
This paper provided an authentication mechanism that uses the path loss as a fingerprint at the physical layer in body-centric nanoscale communication systems operating in the terahertz band. The importance of the work was motivated by an envisioned smart healthcare application of body-centric nanoscale communication systems. The complex and quantum-insecure crypto measures can be complemented by this approach, which is simple and quantum secure (i.e., no encryption or shared secret key is involved). For simulation purposes, the nodes were deployed in a 1 m × 1 m area under a uniform distribution, and air was considered as the medium among the nodes, while the path loss was calculated using the HITRAN database. From the ROC curves obtained by Monte Carlo simulation of the nodes' deployment, it was observed that, with a 20% false alarm rate, the detection probability was almost one when operating at SNR = 10 dB.

Acknowledgments: Waqas Aman would like to thank the Higher Education Commission of Pakistan (HEC) for providing him the IRSIP scholarship to travel to the University of Glasgow for his research studies.

Conflicts of Interest:
The authors declare no conflict of interest.

Abbreviations
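The following abbreviations are used in this manuscript:

Appendix A
The molecular absorption coefficient at frequency f is given by:

$$k(f) = \sum_{i,g} \frac{p}{p_0} \frac{T_0}{T}\, Q_{i,g}\, \sigma_{i,g}(f)$$

where i is the isotopologue (a molecule that differs in isotopic composition), g is the gas, p (T) is the system pressure (temperature), $p_0$ ($T_0$) is the standard pressure (temperature), $\sigma_{i,g}(f)$ is the absorption cross-section, and $Q_{i,g}$ is the molecular density given by:

$$Q_{i,g} = \frac{p}{R\,T}\, q_{i,g}\, N_A$$

where R is the gas constant, $N_A$ is the Avogadro constant, and $q_{i,g}$ is the mixing ratio for i of g. The absorption cross-section can be expressed as:

$$\sigma_{i,g}(f) = S_{i,g}\, G_{i,g}(f)$$

where the line intensity $S_{i,g}$ and line shape $G_{i,g}(f)$ parameters can be computed using data from the HITRAN database [19].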