Design and Analysis of a True Random Number Generator Based on GSR Signals for Body Sensor Networks

Today, medical equipment or general-purpose devices such as smart-watches or smart-textiles can acquire a person’s vital signs. Regardless of the type of device and its purpose, they are all equipped with one or more sensors and often have wireless connectivity. Due to the transmission of sensitive data through the insecure radio channel and the need to ensure exclusive access to authorised entities, security mechanisms and cryptographic primitives must be incorporated onboard these devices. Random number generators are one such necessary cryptographic primitive. Motivated by this, we propose a True Random Number Generator (TRNG) that makes use of the Galvanic Skin Response (GSR) signal measured by a sensor on the body. After an exhaustive analysis of both the entropy source and the randomness of the output, we can conclude that the output generated by the proposed TRNG behaves as that produced by a random variable. Moreover, its throughput is much higher than that of earlier proposals.


Introduction
The proliferation of wearable sensors has meant that medical environments are not the only ones in which the acquisition of vital signs can occur [1]. For instance, a large number of smart-watches (or sports watches) monitor several of our physiological signs throughout our daily lives, and even smart-textiles with one or more integrated sensors have appeared on the market [2]. The measured signals vary widely, ranging from signals related to the brain (e.g., Electroencephalogram (EEG)) through signals linked to the heart (e.g., Electrocardiogram (ECG) or Photoplethysmogram (PPG)) to signals related to emotions (e.g., Galvanic Skin Response (GSR)). Sensors do not usually work in isolation but form a network. When the sensors are in (e.g., a pacemaker or a neurostimulator) or around (e.g., an insulin pump or a sports watch) the body, this type of network is named Wireless Body Area Network (WBAN) [3,4]. Body Sensor Network (BSN) or Medical Body Area Network (MBAN) are other names given to these networks [5,6]. Apart from the sensors, there is a central element called the gateway, which is usually implemented by a smart-phone. Currently, the sensors do not communicate directly with each other (shortly this may happen), and negative ions in the secreted fluid. As a result, we can observe changes in skin conductance. Note that an increase in skin conductivity means a decrease in skin resistance.
The Autonomic Nervous System (ANS), which forms with the Somatic Nervous System (SNS) the Peripheral Nervous System (PNS), controls the functioning of many organs, muscles, and glands [48]. In detail, this regulation (proper behaviour of our body) is achieved by impulses from the brain (and/or spinal cord) and generated by autonomous neurons. Sweat glands are among the glands mentioned above: sweating is driven and balanced by the ANS, and we cannot consciously control it. The ANS consists of the parasympathetic and the sympathetic nervous system [49]. The former is responsible for "rest and digest". Decreased heart rate, decreased sweating, or decreased blood pressure are some effects of its activation. The latter is responsible for the body's "fight or flight" reaction. That is, it helps to protect the body and is involved in functions such as pupil dilation, increased heart rate or sweating [50]. Therefore, both systems are complementary to each other.
The recording of the GSR signal is non-invasive, and we only need two electrodes for its acquisition. The three most common placements are: (1) index and middle fingers; (2) left and right side of the palm; and (3) foot. In the market, we can find low-cost hardware platforms (e.g., BITalino or Libelium e-Health platform [51]) for the acquisition of biosignals. In Figure 1, we show an example of the electrode placement using the BITalino platform for the signal acquisition. The exosomatic method with a small constant voltage is the most common approach to measure the GSR signal. Since the voltage is constant, the skin conductance (1/resistance) values are determined by measuring the changes in the current flow between the two electrodes [52].
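As a small worked illustration of the exosomatic measurement described above, the following sketch converts a measured current into skin conductance via G = I/V. The function names and the 0.5 V excitation voltage are illustrative assumptions, not values taken from the acquisition platforms mentioned in the text.

```python
def skin_conductance_microsiemens(current_microamps: float,
                                  voltage_volts: float = 0.5) -> float:
    """Conductance G = I / V; microamperes over volts gives microsiemens.

    The 0.5 V default is an assumed excitation voltage for illustration only.
    """
    if voltage_volts <= 0:
        raise ValueError("voltage must be positive")
    return current_microamps / voltage_volts


def skin_resistance_kiloohms(conductance_microsiemens: float) -> float:
    """Resistance is the reciprocal of conductance: R[kOhm] = 1000 / G[uS]."""
    if conductance_microsiemens <= 0:
        raise ValueError("conductance must be positive")
    return 1000.0 / conductance_microsiemens
```

With a constant voltage, a rise in measured current maps directly to a rise in conductance and therefore to a drop in resistance.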

Dataset Description
The randomness test batteries (e.g., DIEHARD [53] and NIST [54]) commonly used to verify the randomness quality of a random number generator require files of several tens of megabytes. For this reason, the GSR signals used in this study come from three well-known datasets: 1.
AMIGOS dataset [57]. Forty healthy users participated in the experiment (aged 30.5 ± 9.5; 32.5% female). The participants watched short (16) and long (4) emotional videos. Three neuro-physiological signals (i.e., EEG, ECG and GSR signals) were recorded using wearable sensors. In our experiments, we discarded three files (subjects) because of their short length.
Note that we decided against acquiring our own GSR signals (e.g., using the BITalino platform) because our experimentation required long recordings from many subjects. As mentioned, in our experiments, we used signals from three datasets forming a total of 82 individuals (aged 28.5 ± 7.5; 35.8% female). Since no individuals present any severe pathology, we can then discard any bias in the output bits generated by the proposed TRNG. Furthermore, the signal acquisition process guaranteed that the GSR signals of the subjects in the dataset are statistically independent.

Methods
In our experiments, we focused exclusively on the GSR signal. We aimed to validate the hypothesis that we can extract randomness from this vital signal. The proposed procedure is summarised in Algorithm 1 and explained below. First, for the GSR signal pre-processing, we followed a similar approach with all three datasets. As a first step, the data were down-sampled to 128 Hz. Then, a low-pass filter with 60 Hz cut-off frequency was applied. As an illustrative example, Figure 2 shows three minutes of a GSR signal.
Algorithm 1 (excerpt):
    Split GSR cleaned into N-second GSR-windows (N = 4 in our experiments)
    6: for each GSR-window x^(j)(t) do
    7:     Hilbert Transform: y^(j)(t) = h(t) * x^(j)(t)
    8:     Entropy Extraction: g^(j)(t)_(0,...,7) = uint8((uint32(abs(y^(j)(t) * 10^2))) >> 24)
After cleaning the GSR signal, we needed to extract random numbers from it. For this, we divided the GSR signal into windows of N = 4 seconds to be able to capture some variability in the signal; we fixed the window size experimentally after analysing an extensive set of possible values. Secondly, we computed the Hilbert transform for each window. The Hilbert transform can be interpreted as an all-pass filter in which all positive/negative frequencies are shifted by −90/+90 degrees, respectively. Mathematically, the Hilbert transform of a real, continuous-time signal $x(t)$ is given by:
$$y(t) = h(t) * x(t) = \frac{1}{\pi}\,\mathrm{p.v.}\!\int_{-\infty}^{\infty} \frac{x(\tau)}{t-\tau}\,d\tau \qquad (1)$$
where $h(t)$ represents the Hilbert transform kernel ($h(t) = \frac{1}{\pi t}$, $t \in (-\infty, \infty)$). Finally, we extracted random bits from the Hilbert transform values using an entropy extraction algorithm. More precisely, using an accuracy of six decimal places, each value was converted to a 32-bit unsigned integer value, and then a byte was extracted from the Least Significant Bits (LSBs). This means that the proposed TRNG can generate $8 \times f_s$ bits per second, with $f_s$ being the sampling rate used.
The use of the LSBs is motivated by the fact that these positions show the most variability (formally, randomness), as confirmed by the results presented in the following sections. Mathematically, the extraction of random bits (Equation (2)) corresponds to the Entropy Extraction step of Algorithm 1. Once we specified the randomness extraction algorithm, we needed to assess the quality of the random numbers generated. For this purpose, we used the datasets introduced in Section 2.1. The reader can consult the following section for an in-depth security analysis of the proposed TRNG.
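The windowing, Hilbert transform and byte extraction steps described in this section can be sketched as follows. This is a minimal illustration under stated assumptions, not the reference implementation: it follows the prose description (scaling to six decimal places, then keeping the least significant byte of a 32-bit value), so the exact scale factor and shift of Algorithm 1 may differ. The function names are hypothetical, the discrete Hilbert transform is computed with a naive O(N²) DFT to stay dependency-free, and values are wrapped to 32 bits by masking, which is an assumption about overflow handling.

```python
import cmath
import math


def hilbert_transform(x):
    """Discrete Hilbert transform of a real sequence (naive O(N^2) DFT)."""
    n = len(x)
    spectrum = [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]
    # Build the analytic signal: keep DC (and Nyquist when n is even),
    # double the positive frequencies, zero the negative ones.
    for k in range(n):
        if k == 0 or (n % 2 == 0 and k == n // 2):
            gain = 1.0
        elif k < (n + 1) // 2:
            gain = 2.0
        else:
            gain = 0.0
        spectrum[k] *= gain
    analytic = [
        sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
        for t in range(n)
    ]
    # The imaginary part of the analytic signal is the Hilbert transform.
    return [z.imag for z in analytic]


def extract_bytes(window, scale=10**6):
    """One output byte per sample: LSB byte of |y| scaled to an integer.

    scale=10**6 (six decimal places) is an assumption taken from the prose.
    """
    out = bytearray()
    for y in hilbert_transform(window):
        value = int(abs(y) * scale) & 0xFFFFFFFF  # wrap to 32 bits (assumed)
        out.append(value & 0xFF)                  # least significant byte
    return bytes(out)


def gsr_to_bytes(signal, fs=128, window_seconds=4):
    """Split a cleaned GSR signal into full windows and extract bytes."""
    size = fs * window_seconds
    out = bytearray()
    for start in range(0, len(signal) - size + 1, size):
        out += extract_bytes(signal[start:start + size])
    return bytes(out)
```

Each sample yields one byte, so a signal sampled at f_s produces 8 × f_s bits per second, which matches the throughput stated above. A production implementation would replace the naive DFT with an FFT-based routine.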

Results
We analysed the proposal from two perspectives. Firstly, the quality of the entropy source was studied, using the NIST SP 800-90B recommendation [58]. Secondly, the randomness of the random numbers generated was examined using well-known batteries of tests, such as DIEHARDER [53] and NIST [54].

Source Entropy Analysis
A cryptographic Random Bit Generator (RBG) is composed of three components: (1) an entropy source; (2) an algorithm responsible for storing and providing bits to the target application; and (3) the procedure for combining the first two components. In a nutshell, the entropy source model consists of an analogue noise source (in our case, the GSR signal, which is first cleaned with the procedure Pre-Processing in Algorithm 1) and a digitisation algorithm (procedure GetEntropy specified in Algorithm 1 and defined by Equations (1) and (2)).
For testing the entropy of RBGs, the NIST SP 800-90B recommendation proposes ten estimators for calculating the min-entropy, including the Markov and LZ78Y estimates among others [58]. The final estimation is the minimum value of all these tests. A file of 25 million 1s and 0s was generated using the third dataset to evaluate the entropy quality of the GSR signal. In most tests (see Table 1), the entropy value was close to the optimal (1) and even for the worst case remained very high (0.935). In this particular case, the t-tuple test sets the min-entropy value. This test evaluates the frequency of pairs, triples, and so on, and estimates the entropy per sample based on these frequencies [58]. From all the above, we can conclude that the GSR signal together with the proposed digitisation algorithm seems appropriate for cryptographic solutions.
On some occasions, the entropy calculated on a very long sequence can be an overestimate, because correlated sequences might be generated after a restart. If this is the case, an attacker could cause multiple restarts of the entropy source to create an advantageous situation. The "restart" test is defined in the NIST SP 800-90B specification to evaluate this issue. To generate data for this test, the GSR source was restarted $10^3$ times, and after each restart we recorded $10^3$ consecutive values. In our case, we used the third dataset, in which the subjects were shown 20 different videos. Therefore, in our experiments, the reset of the physiological signal was simulated by exposing the subject to a different stimulation (video). Furthermore, to be even more confident, we repeated the test five times (i.e., from File-1 to File-5). As shown in Table 2, the five tests were passed successfully and confirmed that 0.94 was not an overestimate for the min-entropy.
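To give a feel for how a single SP 800-90B estimator turns symbol frequencies into a min-entropy bound, the sketch below implements the Most Common Value estimate, which is one of the ten estimators in the recommendation. It is not the t-tuple estimate that sets the minimum in our results, and the function name is our own.

```python
import math
from collections import Counter


def mcv_min_entropy(samples):
    """Most Common Value min-entropy estimate (bits per sample).

    Uses the upper end of a 99% confidence interval on the frequency of
    the most common symbol, as described in NIST SP 800-90B.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    p_hat = Counter(samples).most_common(1)[0][1] / n
    p_upper = min(1.0, p_hat + 2.576 * math.sqrt(p_hat * (1.0 - p_hat) / (n - 1)))
    return -math.log2(p_upper)
```

For a perfectly balanced bit sequence the estimate approaches, but stays slightly below, 1 bit per sample because of the confidence margin. The overall min-entropy is the minimum over all estimators, so this value alone is only an upper bound on the final figure.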

Randomness Analysis
In Algorithm 1, we included an entropy distillation process (Procedure GetEntropy) to produce randomness. After the entropy analysis, we needed to assess the randomness quality of the bits generated by the GSR-TRNG. For a first visual inspection, Figure 3 shows an 8-bit grey-scale image (512 × 512) of values generated by our TRNG. No anomalous patterns were detected, and the image behaves as the one generated by any other strong cryptographic random number generator. Several test batteries are commonly used (ENT [59], DIEHARDER [53] and NIST [54]) to analyse the randomness in depth. These tests require an input file of several hundred million bits. In our particular case, we generated a file of 30 MBytes by joining the GSR signals (signals of 84 subjects in total) of the three datasets introduced in Section 2.1.
The ENT suite [59], which is not intended for cryptographic applications, is one of the test batteries usually used first to discard weak or faulty generators without the need for additional testing. Table 3 shows the results after analysing the 30 MByte file mentioned above. The entropy and compression results indicate that the file was extremely dense in terms of information (randomness). As for the chi-square test, which is very sensitive in detecting weak generators, the results gave no indication of non-randomness. The arithmetic mean value confirmed that the proportions of ones and zeros were equal (i.e., there was no bias in the output). The serial correlation coefficient showed the high unpredictability of the bitstream: there was a low dependence between a particular bit and its predecessors.
To check for biases in the behaviour of individual subjects' signals, we performed an additional experiment by analysing them separately. Using the signal of the 37 subjects of the AMIGOS dataset, we generated a binary file of 800 KB for each of the subjects. Each of these files was analysed with the ENT suite.
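Three of the byte-level statistics reported by ENT can be approximated in a few lines, which may help when interpreting Table 3. The sketch below is our own simplified re-implementation, not the ENT tool itself, and it omits the compression, Monte Carlo and serial correlation figures.

```python
import math
from collections import Counter


def ent_statistics(data: bytes) -> dict:
    """Entropy (bits/byte), chi-square over 256 bins, and arithmetic mean."""
    if not data:
        raise ValueError("data must not be empty")
    n = len(data)
    counts = Counter(data)
    # Shannon entropy of the empirical byte distribution.
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    # Chi-square statistic against a uniform distribution over 256 values.
    expected = n / 256
    chi_square = sum(
        (counts.get(v, 0) - expected) ** 2 / expected for v in range(256)
    )
    mean = sum(data) / n
    return {"entropy": entropy, "chi_square": chi_square, "mean": mean}
```

For an ideal random byte stream, the entropy approaches 8 bits per byte, the mean approaches 127.5, and the chi-square statistic follows a chi-square distribution with 255 degrees of freedom.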
Figure 4 shows the result of the chi-square test. As shown in the figure, most values were within the optimal value (256) ± the standard deviation. We can, therefore, conclude that the different subjects behaved similarly. In other words, there were no significant differences between the bitstreams generated from the different GSR signals corresponding to each subject.
DIEHARDER [53] (a modern version of the Diehard battery) and NIST [54] are much more demanding test batteries than ENT. NIST has been designed to test RNGs that are devoted to cybersecurity solutions. DIEHARD consists of 15 tests, and the results obtained are summarised with a p-value in Table 4a. In detail, all tests were within the interval [0.025, 0.975]; note that, due to the large number of p-values calculated, it would not be uncommon for some of them to be outside this range. Apart from being distributed within the interval mentioned above (0.05 significance level), the critical condition for considering the file under analysis random is that these p-values follow a uniform distribution. We tested this hypothesis using a Kolmogorov-Smirnov test, which returned a decision that the p-values come from a uniform distribution at the 5% significance level. Therefore, we can conclude (with 95% confidence) that there were no anomalies in the analysed bitstream (30 MByte file) and that all the DIEHARD tests were successfully passed.
As mentioned above, NIST is often used in the context of cybersecurity and for formal verification of RNG designs. The NIST suite is made up of 15 tests, which examine bits, m-bit blocks or m-bit parts. Regarding the interpretation of the results, the first value corresponds to the p-value calculated for uniformity testing with the p-values obtained with a given test; the values in brackets represent the proportion of sequences passing the corresponding test.
The following equation gives the minimum number of sequences (except for the random excursion test) that must pass each test:
$$\mathrm{mpr} = K\left((1-\alpha) - 3\sqrt{\frac{\alpha(1-\alpha)}{K}}\right)$$
where α is the significance level and K is the number of sequences tested. In our particular case, α = 0.01 and K = 100, so $\mathrm{mpr} = 100 \times (0.99 - 3\sqrt{0.0099/100}) \approx 96.02$; thus, the minimum pass rate was 96. From the results in Table 4b, all the tests passed the uniformity test (p-values in the interval 0.01-0.99; α = 0.01) and the proportion test was above the mentioned threshold (mpr = 96). Furthermore, the Kolmogorov-Smirnov test confirmed the uniformity of all p-values (15 tests) at the 1% significance level. From all this, we can conclude that the bits generated by the TRNG based on GSR signals behaved as a random variable.
As an additional experiment, we analysed whether there was any relationship between the random numbers generated by each user (GSR signal). If this were the case, it would be very advantageous for an attacker, who could exploit the knowledge of one GSR signal (e.g., User-A) to predict the values of another signal (e.g., User-B). To assess this, using the 38 users of Dataset 3, we created an 800 KB file for each user. Next, we grouped the data of each file in words of different sizes (m = {8, 16, 32, 64}). For each of these word sizes, we computed the Hamming distance between all the pair combinations in the dataset ($C_{38,2}$). We show the results obtained in Figure 5.
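The pairwise comparison between users can be sketched as follows. The function names are hypothetical, and the byte strings passed in stand for the per-user output files; the sketch only illustrates how the m-bit words are compared and is not the code used to produce Figure 5.

```python
from itertools import combinations


def mean_hamming_distance(a: bytes, b: bytes, m: int) -> float:
    """Mean Hamming distance between aligned m-bit words of two streams."""
    if m <= 0 or m % 8 != 0:
        raise ValueError("word size m must be a positive multiple of 8")
    step = m // 8
    n_words = min(len(a), len(b)) // step
    if n_words == 0:
        raise ValueError("streams are shorter than one word")
    total = 0
    for i in range(n_words):
        wa = int.from_bytes(a[i * step:(i + 1) * step], "big")
        wb = int.from_bytes(b[i * step:(i + 1) * step], "big")
        total += bin(wa ^ wb).count("1")  # differing bits in this word pair
    return total / n_words


def pairwise_mean_distances(streams, m: int):
    """Mean Hamming distance for every unordered pair of user streams."""
    return [mean_hamming_distance(a, b, m) for a, b in combinations(streams, 2)]
```

With 38 streams, `pairwise_mean_distances` returns 703 values per word size. For unrelated streams with unbiased bits, each value should lie close to m/2.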
If there is no relation between the users (GSR signals), the calculated Hamming distance should follow a binomial distribution ($P(X = k) = \binom{m}{k} p^{k} (1-p)^{m-k}$; $E(X) = m\,p$ and $\sigma^{2} = m\,p\,(1-p)$), with m being the size of the words and p = 1/2, as the zeros and the ones are equally likely. In our experiments (see Figure 5), as expected, the experimental values were almost identical to the theoretical ones (i.e., a Hamming distance of 4, 8, 16 and 32, respectively). Therefore, an adversary gains no advantage in predicting the values of a user from the knowledge of other users' signals.
Apart from the randomness tests, and as a final test, we analysed the TRNG as if it were used as a generator of a ciphering sequence (s) to encrypt a plain-text (m): c = E(s, m) = s ⊕ m. In particular, using this approach, five different images (256 × 256 grayscale images), chosen randomly from the Internet, were used as inputs for the experiment. As for the ciphering sequence, bits were grouped in bytes and then regrouped into a matrix of the same size as the input images. At first glance, Figure 6 shows the histogram of one of the tested images and its histogram after encryption. As expected, the encryption made the histogram uniform. Note that, if s (image with random values) follows a uniform distribution, and s and m are chosen independently of each other, the resulting value is uniformly distributed, since we combine them with the bitwise XOR operation. This uniform distribution at the output makes it impossible for an attacker to extract any information from the original plaintext (image from the Internet in our example). Nowadays, NPCR and UACI tests are used to evaluate the strength of an image encryption technique against differential attacks [60]. In short, the first assesses the number of changing pixels and the second evaluates the changes in intensity, in both cases, between two encrypted images when the two plain images differ by one bit.
In Table 5, we summarise the results of these tests for the five examined images. Considering the thresholds given in [61], the NPCR and UACI tests were passed successfully at the 0.05 significance level (i.e., $\mathrm{NPCR}_{0.05} \geq 99.5693\%$ and $33.284\% \leq \mathrm{UACI}_{0.05} \leq 33.6447\%$).
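For readers unfamiliar with these metrics, the following sketch shows the XOR encryption step together with the usual definitions of NPCR (percentage of pixel positions that differ) and UACI (mean absolute intensity difference relative to 255) for 8-bit images stored as flat byte strings. The function names are ours, and the sketch is only an illustration of the definitions, not the evaluation code behind Table 5.

```python
def xor_encrypt(plain: bytes, keystream: bytes) -> bytes:
    """c = s XOR m, byte by byte; the same call also decrypts."""
    if len(keystream) < len(plain):
        raise ValueError("keystream must be at least as long as the plaintext")
    return bytes(p ^ s for p, s in zip(plain, keystream))


def npcr(c1: bytes, c2: bytes) -> float:
    """Number of Pixels Change Rate, in percent."""
    if len(c1) != len(c2) or not c1:
        raise ValueError("images must be non-empty and of equal size")
    return 100.0 * sum(a != b for a, b in zip(c1, c2)) / len(c1)


def uaci(c1: bytes, c2: bytes) -> float:
    """Unified Average Changing Intensity for 8-bit pixels, in percent."""
    if len(c1) != len(c2) or not c1:
        raise ValueError("images must be non-empty and of equal size")
    return 100.0 * sum(abs(a - b) for a, b in zip(c1, c2)) / (255 * len(c1))
```

For two independent, uniformly distributed 8-bit images, the expected values are about 99.61% for NPCR and 33.46% for UACI, which is why the acceptance thresholds quoted above lie close to these figures.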

Discussion and Conclusions
Today, there are many devices that monitor vital signs. These devices can be medical devices such as pacemakers or insulin pumps or general purpose devices such as sports watches or smart clothing with sensors. In any case, we have devices equipped with one or several sensors that transmit the acquired information (in most cases, wirelessly) to a central device. Although no one doubts the benefits of constant monitoring of our physiological parameters, access to these data only to authorised entities and their protection when transmitted through an insecure channel (mainly the radio channel) should be guaranteed from the design phase. Random number generators play a critical role in the design of cryptographic solutions for this purpose. Motivated by this fact, in this article, we have proposed a TRNG that benefits from a vital signal that is already being monitored by a sensor on the body. In particular, we have studied how to design a random number generator based on the GSR signal. Both the entropy source and the output randomness analysis confirm that the generated bitstreams behave as a random variable.
As shown in Algorithm 1, for the extraction of the randomness (Procedure GetEntropy), the Hilbert transform is used, which is usually used to construct the analytic signal. Mathematically, given a signal $x(t)$ and its Hilbert transform $y(t)$, the analytic signal is defined by $x_A(t) = x(t) + j\,y(t)$. In our particular case, we use only the imaginary part of the analytic signal, which corresponds to the Hilbert transform itself. The reader may be tempted to think that the extraction of the entropy could be done from the signal itself (without any transformation). However, this was the first approach that we tested, and, although the output is entropic, a simple test such as the chi-square (ENT suite) clearly shows that the bits generated are non-random. Therefore, the use of the Hilbert transform is justified. Note that the procedure for extracting random bits (see Equation (2) in Section 2.2) also plays a crucial role in our proposed TRNG.
In general terms, three elements are the main components of a TRNG: (1) noise source (GSR signal in our case); (2) digitisation algorithm; and (3) post-processing procedure (optional). In our case, we only have the first two elements since we consider that post-processing is not necessary. Among the most common post-processing techniques are bitwise XOR operations, the Von Neumann algorithm or even the use of a hash function [62,63]. The use of these techniques is mandatory when the quality (randomness) of the output is not yet at the desired level. As shown in the in-depth analysis of the randomness (see Section 3.2), our generator successfully passes all the test batteries, and that is why our proposal dispenses with this stage.
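For completeness, the Von Neumann algorithm mentioned above can be sketched in a few lines. It is shown only to illustrate what such a post-processing stage does; the proposed TRNG does not use it, and the function name is our own.

```python
def von_neumann_debias(bits):
    """Von Neumann corrector over non-overlapping bit pairs.

    A pair (0, 1) emits 0, a pair (1, 0) emits 1, and equal pairs are
    discarded. A trailing unpaired bit is ignored.
    """
    out = []
    for i in range(0, len(bits) - 1, 2):
        first, second = bits[i], bits[i + 1]
        if first != second:
            out.append(first)
    return out
```

The corrector removes bias from independent bits at the cost of throughput: even for unbiased input, on average only one output bit is produced per four input bits, which is one reason to avoid post-processing when the raw output already passes the test batteries.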
A key parameter of any cryptographic primitive is its performance. In the case of random number generators, high or moderately high throughput may be necessary for many applications. The proposed TRNG can generate 1024 bits per second (i.e., $8 \times f_s = 8 \times 128$). This performance is far superior to that achieved by other random number generators using biosignals. In this context, the cardiac signal is the most studied physiological signal for this purpose. Solutions based on Interpulse Interval (IPI) values can generate between 2 and 14 bits per second [38,64], which is far below our performance. Even modern solutions based on the wavelet transform offer a throughput three times lower [19]. Concerning the GSR signal and the recently proposed TRNG [23], its throughput is 16 times lower at best than that of our approach. We can conclude from all this that our proposal offers excellent performance to be used in cybersecurity solutions.
As shown in this article, a new generation of TRNGs based on our vital signs can be designed. Apart from the GSR signal and cardiac signals, other signals, such as the electrical activity of the brain (e.g., electroencephalogram) or the skeletal muscles (e.g., electromyogram), could be employed. Even for highly demanding applications, the combined use of various signals could give excellent results. In conclusion, just as we still have much to learn from the human body within medicine, the use of the body is even less explored for cybersecurity tasks. In addition, it is worth mentioning that the use of sensors, integrated into a wide variety of devices, plays a critical role in the acquisition of the signal at stake.