ECG-RNG: A Random Number Generator Based on ECG Signals and Suitable for Securing Wireless Sensor Networks

Wireless Sensor Networks (WSNs) are a promising technology with applications in many areas such as environmental monitoring, agriculture, the military or health care, to name but a few. Unfortunately, the wireless connectivity of the sensors opens the door to many security threats; therefore, cryptographic solutions must be included on board these devices, preferably from the design phase. In this vein, Random Number Generators (RNGs) play a critical role in security solutions such as authentication protocols or key-generation algorithms. This article proposes a novel approach based on the cardiac signal generator we all carry with us (our heart), whose signal can be recorded with medical or even low-cost sensors with wireless connectivity. In particular, random bits are extracted by means of a multi-level wavelet decomposition. The proposal has been tested with one of the largest publicly available datasets of electrocardiogram signals (202 subjects and 24 h of recording time). Regarding the assessment, the proposed True Random Number Generator (TRNG) has been tested with the most demanding batteries of statistical tests (ENT, DIEHARDER and NIST), complemented by a bias, distinctiveness and performance analysis. From this analysis, it can be concluded that the output stream of the proposed TRNG behaves as a random variable and is suitable for securing WSNs.


Introduction
We are in the era of the Internet of Things (IoT), where all kinds of devices and sensors are connected to the Internet. A wide variety of applications and sectors can benefit from this technology, but it can turn into a nightmare if security is not given a critical role [1,2]. This is especially critical in sectors such as health care, where sensors are placed in or on a subject's body and a cyberattack could have dramatic consequences. Note that the new generations of implanted medical devices (e.g., pacemakers or insulin pumps) are already equipped with wireless connectivity and can be accessed remotely [3,4]. The security risks of these medical devices have recently been scrutinized, and the results reveal security pitfalls in some commercial devices [5].

Motivation and Related Work
In the last few years, WSNs have attracted the attention of many researchers because of their great potential. These can be categorized depending on: (1) the place where the sensors are deployed (terrestrial, underground or underwater WSNs); (2) their ability to deal with multimedia data (multimedia WSNs); and (3) their ability to move around (mobile WSNs) [17]. The domains in which WSNs have been applied are very diverse. Monitoring and tracking are the two main purposes of the wide suite of applications [18]. Among the main fields of application are military, environment, industry, agriculture, urbanization, infrastructure and health. This work is framed within BANs, in which health (patient monitoring) is the star application [19]. In our particular case, the monitored vital signal is used for security purposes (random number generation); patient status monitoring can be done at the same time.
As mentioned, the security of sensors in WSNs is fundamental to the success of the IoT paradigm [20]. Cryptographic solutions must be supported on board these devices, and random number generators are one of the most commonly required cryptographic primitives. In this vein, the proposal takes advantage of the fact that some sensors record our vital signals and explores whether randomness can be extracted from physiological signals. Some authors have recently studied this topic in the context of neuronal signals [21,22]. The main limitations of these studies are the length of the recordings used and the limited portability of medical-grade Electroencephalogram (EEG) sensors.
In our case, the experiments focus on heart signals. The electrical activity of the heart can be measured by placing electrodes (e.g., three or 12 leads) on the body of the subject under analysis, and its representation is the Electrocardiogram (ECG). There are five characteristic points in the ECG: (1) the P-wave represents the depolarization of the atria; (2-4) the QRS complex represents the depolarization of the ventricles; and (5) the T-wave represents the re-polarization of the ventricles [23]. Figure 1 sketches an ECG signal and its characteristic points. For cybersecurity purposes, the time interval between two consecutive heart-beats (R-peaks, which occur when the ventricles begin to contract) has attracted the attention of many researchers in recent years [24][25][26]. This interval is commonly referred to as the Inter-Pulse-Interval (IPI). An interesting proof-of-concept work can be found in [8], where Peter et al. presented the design and implementation of an IPI-based authentication protocol. In [27,28], the authors showed how IPI-based values can be employed as cryptographic keys. In addition, ECG biometrics is a growing field in which some approaches are based on characteristic points (including R-peaks and IPIs) [29,30].
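As a minimal illustration of the IPI concept, the sketch below converts R-peak positions (sample indices) into inter-pulse intervals in milliseconds. The helper name and the 200 Hz default (the sampling rate of the dataset used later in this paper) are illustrative assumptions, and R-peak detection itself is not shown.

```python
from typing import List, Sequence


def inter_pulse_intervals(r_peaks: Sequence[int], fs: float = 200.0) -> List[float]:
    """Return IPIs in milliseconds from sorted R-peak sample indices.

    Hypothetical helper: r_peaks is assumed to come from an external
    R-peak detector applied to a signal sampled at fs Hz.
    """
    if fs <= 0:
        raise ValueError("fs must be positive")
    # Each IPI is the time between two consecutive R-peaks,
    # so n peaks yield n - 1 intervals.
    return [(b - a) * 1000.0 / fs for a, b in zip(r_peaks, r_peaks[1:])]


# R-peaks 160 samples apart at 200 Hz correspond to 800 ms (75 beats/min).
print(inter_pulse_intervals([0, 160, 320, 490]))
# [800.0, 800.0, 850.0]
```

Note that every IPI consumes two R-peaks, which is why IPI-based generators need at least two heart-beats per observation.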
In relation to random numbers, some authors have pointed out that the last four bits of IPI values are highly entropic [27,31]. Nevertheless, high entropy is a necessary but not a sufficient condition for a sequence to behave as a random variable. Table 1 shows the results obtained with the ENT suite [32] for a 10-MB file of IPI values (4 LSBs per IPI). Although the entropy value is high, the chi-square test clearly shows that this file is not random. In line with this, in [33], the randomness quality of IPI values was scrutinized in depth using 19 public datasets with healthy and unhealthy subjects. Two main conclusions were drawn from this study: (1) IPI values can generate short bit streams that behave as a random variable; and (2) large files of IPI values have poor randomness quality. In addition, random number generation based on IPI values offers very low throughput; although the throughput is doubled in [34], it remains low.
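To make the point that high entropy does not imply randomness concrete, the sketch below computes two ENT-style statistics on a byte stream: the Shannon entropy of the byte histogram and Pearson's chi-square against a uniform distribution over 256 values. It also shows one possible way to pack the 4 LSBs of integer IPI values into bytes. The function names and the nibble packing order are assumptions for illustration; this is not ENT's source code.

```python
import math
from collections import Counter
from typing import Iterable


def pack_ipi_nibbles(ipis: Iterable[int]) -> bytes:
    """Keep the 4 LSBs of each integer IPI and pack two nibbles per byte.

    Assumed packing order: the first IPI of each pair goes in the high nibble.
    """
    nibbles = [int(v) & 0x0F for v in ipis]
    if len(nibbles) % 2:
        nibbles = nibbles[:-1]  # drop an unpaired trailing nibble
    return bytes((hi << 4) | lo for hi, lo in zip(nibbles[0::2], nibbles[1::2]))


def entropy_bits_per_byte(data: bytes) -> float:
    """Shannon entropy of the byte histogram, in bits per byte (max 8.0)."""
    n = len(data)
    return -sum((c / n) * math.log2(c / n) for c in Counter(data).values())


def chi_square_bytes(data: bytes) -> float:
    """Pearson chi-square of byte counts vs. a uniform distribution over
    256 values (255 degrees of freedom)."""
    expected = len(data) / 256
    counts = Counter(data)
    return sum((counts.get(b, 0) - expected) ** 2 / expected for b in range(256))


# A stream that never uses 32 of the 256 byte values: entropy stays high,
# but the chi-square statistic is far above its expected value.
skewed = bytes(range(224)) * 40
print(round(entropy_bits_per_byte(skewed), 2), round(chi_square_bytes(skewed), 1))
# 7.81 1280.0
```

For truly random data, the chi-square statistic with 255 degrees of freedom fluctuates around 255, so a value such as 1280 is a clear rejection even though the entropy is above 7.8 bits per byte.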
For all these reasons, the designed ECG-based TRNG does not follow the IPI approach and instead exploits the rich entropy contained in the entire ECG signal.

Materials and Methods
Regarding the acquisition of the ECG signal, either medical equipment or low-cost sensors can be used for recording. The former often use a 12-lead configuration with electrodes over the chest and limbs. These recordings are very accurate, but their portability is limited, making these devices unsuitable for WSNs; this equipment is commonly used in hospitals, and the subject must be at rest. With regard to low-cost sensors, only two or three electrodes on the chest or wrists are needed to capture the ECG traces. The signal can be somewhat noisy, but portability and integration into wearable devices (e.g., smart-watches or t-shirts [35]) make these devices very appropriate for WSNs: the wearer may be performing daily-life activities; in other words, there is no need for the subject to be at rest. In our particular case, as a proof of concept, a low-cost ECG sensor (BITalino board with an ECG sensor [36]) was used to acquire the ECG records. With this sensor, the three electrodes can be placed on the chest, but also on the palms of the hands. Taking advantage of the fact that some sensors in WSNs can sense heart signals, the aim of our contribution is to extract random numbers from these signals for security purposes. Once the raw ECG signal is acquired, pre-processing and randomness extraction by wavelet decomposition can be computed at the sensor itself or at the central node of the WSN, which has greater computational and memory capabilities. Figure 2 shows all the necessary hardware, and, to facilitate the reproduction of all the results, the source code is available at the following two links: https://goo.gl/WmQiiC and https://goo.gl/TpvSQq. The signal pre-processing and randomness extraction procedures are described below.
In detail, the ECG records have been cleaned using the following procedure (pre-processing procedure in Algorithm 1). Once the DC component is removed, a band-pass filter is applied to remove two main noise sources. The lower and upper cut-off frequencies are set to 0.67 Hz (to eliminate the noise caused by respiration) and 45 Hz (to eliminate power-line noise), respectively.

Algorithm 1 ECG-RNG.
  Pre-processing:
    Remove the DC component of the raw ECG record
    Band-pass filter (0.67-45 Hz) to obtain the cleaned ECG
  Wavelet decomposition:
    Split the cleaned ECG into ECG-windows (one heart-beat per window)
    for each ECG-window(j) do
      Discrete wavelet decomposition (set parameters L and wf)
      Entropy extraction:
      for each approximation coefficient c_i do
        Fractional part extraction (z_i)
        Output the 8 LSB bits (r_i)

To squeeze random values out of the cleaned ECG trace, the following procedure (wavelet decomposition procedure in Algorithm 1) is proposed. First, the ECG record is split into windows that each contain one R-peak (one heart-beat); for each ECG record, the first and last fifty windows are discarded to guarantee that the signal is properly registered. Secondly, the approximation coefficients of each ECG window are obtained by wavelet analysis. Note that the discrete wavelet transform of a signal x[n] is computed by passing it through a low-pass filter g[n] and a high-pass filter h[n]. The filtered outputs are then sub-sampled by 2, and the process is repeated on the low-pass (approximation) branch to increase the decomposition level; that is, the number of iterations equals the desired decomposition level. The procedure is summarized in Figure 3; the reader can consult [37] for a detailed explanation. Two key parameters of the wavelet decomposition must be set, and a wide range of possibilities is studied in the following sections. On the one hand, parameter L sets the decomposition level: L = {1, 2, 3, 4} are the tested levels. On the other hand, wf sets the wavelet family (e.g., Daubechies or coiflets) used in the decomposition and determines the filters used (g[n] and h[n]).
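The windowing and multi-level decomposition steps described above can be sketched in plain Python as follows. This is a minimal sketch under explicit assumptions: window boundaries are placed at the midpoints between consecutive R-peaks (the text only states that each window contains one R-peak), borders use periodic extension, and the Haar filters (db1) stand in for the Daubechies N = 4 filters used in the paper. All names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# Haar (db1) analysis filters, a simple stand-in for the paper's db4 filters.
HAAR_LOW = (1 / math.sqrt(2), 1 / math.sqrt(2))    # g[n]
HAAR_HIGH = (1 / math.sqrt(2), -1 / math.sqrt(2))  # h[n]


def split_windows(ecg: Sequence[float], r_peaks: Sequence[int],
                  discard: int = 50) -> List[List[float]]:
    """One window per R-peak, bounded by midpoints between neighbouring
    R-peaks (assumed rule); `discard` windows are dropped at each end."""
    bounds = [0] + [(a + b) // 2 for a, b in zip(r_peaks, r_peaks[1:])] + [len(ecg)]
    windows = [list(ecg[s:e]) for s, e in zip(bounds, bounds[1:])]
    return windows[discard:len(windows) - discard]


def dwt_step(x: Sequence[float], g: Sequence[float],
             h: Sequence[float]) -> Tuple[List[float], List[float]]:
    """One analysis step: inner products with g and h at even shifts
    (filtering followed by sub-sampling by 2), with periodic extension.
    A trailing odd sample is ignored in this sketch."""
    n = len(x)
    approx = [sum(g[m] * x[(2 * k + m) % n] for m in range(len(g))) for k in range(n // 2)]
    detail = [sum(h[m] * x[(2 * k + m) % n] for m in range(len(h))) for k in range(n // 2)]
    return approx, detail


def approximation_coeffs(window: Sequence[float], level: int,
                         g: Sequence[float] = HAAR_LOW,
                         h: Sequence[float] = HAAR_HIGH) -> List[float]:
    """Repeat the step on the approximation branch `level` times (L in the text)."""
    a = list(window)
    for _ in range(level):
        if len(a) < 2:
            break
        a, _detail = dwt_step(a, g, h)
    return a


print(approximation_coeffs([1.0, 1.0, 1.0, 1.0], 1))  # two values close to sqrt(2)
```

If PyWavelets is available, `pywt.wavedec(window, 'db4', level=L)[0]` returns the level-L approximation coefficients with the paper's Daubechies setting; its default symmetric border extension means values near the window edges will differ from this periodic sketch. Likewise, the band-pass pre-processing could be approximated with `scipy.signal.butter(4, [0.67, 45], btype='bandpass', fs=200)` followed by `scipy.signal.filtfilt`, where the Butterworth type and order 4 are assumptions, since the text only gives the cut-off frequencies.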
Finally, random bits are extracted (entropy extraction procedure in Algorithm 1) using a form of quantization. More precisely, the fractional part of each coefficient is converted into a 32-bit unsigned value, and then its 8 LSBs are extracted. Mathematically, let c_i be a wavelet-decomposition coefficient and r_i an output random byte. Then,

$$z_i = \left\lfloor \left( c_i - \lfloor c_i \rfloor \right) \cdot 2^{32} \right\rfloor, \qquad r_i = z_i \bmod 2^{8}.$$

Although the proposal was initially evaluated with ECG records obtained with the BITalino board, an in-depth analysis has been carried out using a well-known public dataset recorded with three electrodes. More precisely, the E-HOL-03-0202-003 dataset, provided by the Telemetric and ECG Warehouse (THEW) of the University of Rochester (dataset available at: http://thew-project.org/index.htm), has been employed. ECG records were acquired using a three-lead pseudo-orthogonal configuration (X, Y and Z), and the sampling frequency was set to 200 Hz. The descriptive statistics of the dataset are summarized in Table 2.
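The quantization step described above (fractional part of c_i, converted to a 32-bit unsigned value, keeping its 8 LSBs) can be sketched as follows. The reading of "converted into a 32-bit unsigned value" as scaling the fractional part by 2^32 and truncating is an assumption, as is the floor-based fractional part for negative coefficients; the function name is hypothetical.

```python
import math
from typing import Iterable


def coeffs_to_bytes(coeffs: Iterable[float]) -> bytes:
    """Map each wavelet coefficient c_i to one output byte r_i.

    Assumed reading of the text: z_i = floor(frac(c_i) * 2**32) is a
    32-bit unsigned integer, and r_i keeps its 8 least significant bits.
    """
    out = bytearray()
    for c in coeffs:
        frac = c - math.floor(c)            # fractional part, in [0, 1)
        # The mask also covers the rare floating-point case frac == 1.0
        # (e.g. a tiny negative c), which would otherwise give 2**32.
        z = int(frac * 2**32) & 0xFFFFFFFF  # 32-bit unsigned value z_i
        out.append(z & 0xFF)                # 8 LSBs -> output byte r_i
    return bytes(out)


print(list(coeffs_to_bytes([3 + 5 / 2**32, 0.5 + 255 / 2**32, -0.25])))
# [5, 255, 0]
```

Because each approximation coefficient yields one byte, the number of output bytes per ECG-window equals the number of approximation coefficients produced for that window.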
This database has features that are not present in many other public datasets. First, the number of individuals (ECG records) is very large (202 subjects; in our experiments, 3 ECG records were discarded because of their very short length). Secondly, each ECG record lasts around 24 h, which is much longer than the ECG files available in many other public datasets. Finally, it is worth mentioning that the subjects were healthy, and therefore uniformity in the distribution can be assumed.

Results and Analysis
To assess the randomness quality of the output bits, three of the most common statistical test batteries for evaluating RNGs have been used: NIST [38], DIEHARDER [39] and ENT [32]. NIST is the most demanding battery and requires long files (several tens of megabytes). In our particular case, files of around 100 MB have been generated. For each subject (199 in total), ECG segments lasting between 4 and 6 h were used (the time interval was randomly chosen from the 24 h of available ECG signal), and therefore, a 0.5-MB file was obtained per subject after entropy extraction by wavelet analysis (see Section 3 for details). Finally, all the files were concatenated (assuming independent and identically distributed random variables), and the resulting file was analyzed; note that the NIST suite requires files of at least 30 MB, which would require recording a single individual for approximately 15 days. Regarding the parameters wf and L, the Daubechies family was used (with the number of vanishing moments set to N = 4), and levels 1-4 were tested.
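A quick back-of-the-envelope check of the figures above shows where "around 100 MB" and "approximately 15 days" come from. It assumes, as a simplification, that output accumulates linearly with recording time; under that assumption the 15-day figure corresponds to the 6-h end of the 4-6 h range.

```python
# Figures quoted in the text; linear accumulation over time is an assumption.
SUBJECTS = 199
MB_PER_SUBJECT = 0.5          # output file size per subject
SEGMENT_HOURS = (4.0, 6.0)    # shortest / longest ECG segment per subject
NIST_MIN_MB = 30.0            # minimum file size quoted for NIST


def recording_days(target_mb: float, segment_hours: float,
                   mb_per_segment: float = MB_PER_SUBJECT) -> float:
    """Days one person must be recorded to accumulate target_mb of output."""
    mb_per_hour = mb_per_segment / segment_hours
    return target_mb / mb_per_hour / 24.0


total_mb = SUBJECTS * MB_PER_SUBJECT
days = [recording_days(NIST_MIN_MB, h) for h in SEGMENT_HOURS]
print(f"pooled file: {total_mb} MB; one subject needs {days[0]:.1f}-{days[1]:.1f} days")
# pooled file: 99.5 MB; one subject needs 10.0-15.0 days
```

Pooling one short segment per subject is what makes a NIST-sized file feasible without multi-day recordings of a single person.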
Tables 3-5 summarize the results obtained with the NIST, DIEHARDER and ENT suites for the four configurations studied. It is noteworthy that the NIST suite is devoted to testing RNGs designed for security purposes. Table 3 shows the p-value and the proportion of sequences that pass each of the fifteen tests included in the suite. All configurations pass all the tests at the 0.005 level of significance, and it can be concluded that the output behaves as a random variable. Table 4 summarizes the p-values for each of the tests included in the DIEHARDER suite. These results were consistent with the NIST results. For a wavelet decomposition of three or four levels (the last two columns of Table 4), all tests passed. For a decomposition of one or two levels, all the tests passed except for a couple of tests that obtained a weak pass (p-value < 0.005); these are highlighted in bold in the table. Although the differences were minimal, the results indicate that a decomposition with a larger number of levels avoids the appearance of rare or weak patterns in the output. Finally, the ENT results (Table 5) were in line with all the above. In fact, contrary to the results shown in Table 1 of Section 2, all tests were comfortably passed. It is worth noting that the chi-square statistic was close to the optimal value (256).
In the subsequent subsections, the above analysis is rounded off by a bias and distinctiveness analysis. The performance of our proposal has also been analyzed and compared to previous works. Finally, some light is shed on which wavelet-family is more appropriate for the generation of random numbers.

Bias Analysis
The bias of the output stream has been analyzed for each approach. To this end, the following experiment has been carried out. For each subject (199 in total), a 0.5-MB file has been generated using the procedure described in Section 3 and analyzed with the ENT suite. Figure 4 shows a box-plot of the chi-square values. Note that the optimal value of the chi-square statistic is 256, and the greater the distance from this optimal value, the greater the bias in the data. From this analysis, it can be concluded that the fourth-level approach is the most appropriate for building a secure and robust TRNG based on ECG signals: its average value (blue circle) was the optimal one, and its spread between the first and third quartiles was the narrowest.
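The two box-plot criteria used above (an average close to the optimum, and a narrow spread between the first and third quartiles) can be computed per configuration with a short helper. This is an illustrative sketch, not the script behind Figure 4: the function name and the sample values are hypothetical, the per-subject chi-square inputs would come from running ENT on each subject's 0.5-MB file, and quartile conventions differ slightly between plotting tools.

```python
import statistics
from typing import Dict, Sequence

OPTIMAL_CHI2 = 256.0  # optimal value quoted in the text


def bias_summary(chi2_per_subject: Sequence[float]) -> Dict[str, float]:
    """Box-plot style summary of per-subject ENT chi-square values."""
    # statistics.quantiles (Python 3.8+) needs at least two values.
    q1, median, q3 = statistics.quantiles(chi2_per_subject, n=4)
    mean = statistics.fmean(chi2_per_subject)
    return {
        "mean": mean,
        "median": median,
        "iqr": q3 - q1,                           # height of the box
        "mean_offset": abs(mean - OPTIMAL_CHI2),  # distance from the optimum
    }


print(bias_summary([262.0, 250.0, 258.0, 254.0]))
```

Comparing `mean_offset` and `iqr` across the four decomposition levels reproduces, in numbers, the visual comparison made with the box-plot.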

Distinctiveness Analysis
We have tested whether the random data generated from different ECG signals (each belonging to a single individual) are distinct. If this holds, an adversary cannot use data from another individual to predict the values generated by the target. To evaluate this, as in Section 4.1, a 0.5-MB file has been generated for each individual (ECG record). Then, for each file, the data were grouped into {8, 16, 32, 64}-bit words. Figure 5 shows the distribution of the Hamming distance between all the pairs of individuals in the dataset (C(199, 2) = 19,701 pairs in our particular case). As expected, the distribution fits a binomial distribution:

$$P(X = k) = \binom{n}{k}\, p^{k} (1 - p)^{n - k}, \qquad k = 0, 1, \ldots, n,$$

where n = {8, 16, 32, 64} is the word length in bits and p = 0.5 is the probability that two corresponding bits differ.

Apart from using the ECG of a different user, the attacker may be tempted to capture ECG signals from a distance. In [40], Calleja et al. showed how IPI values (R-peaks) can be eavesdropped with a camera, without touching the target individual. Fortunately, this approach is useless against our proposal, since the whole ECG signal (P-wave, QRS complex and T-wave) is used, and there is no way to predict or capture an entire ECG signal from a distance.
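The distinctiveness experiment can be sketched as follows: split two subjects' output streams into aligned words, XOR them, count the differing bits, and accumulate a histogram over all subject pairs. The function names are hypothetical and the inputs are assumed to be the per-subject byte files described above.

```python
from itertools import combinations
from typing import Dict, List, Sequence


def word_hamming_distances(a: bytes, b: bytes, word_bits: int = 8) -> List[int]:
    """Hamming distance between aligned word_bits-bit words of two streams."""
    if word_bits % 8:
        raise ValueError("word_bits must be a multiple of 8")
    step = word_bits // 8
    n = min(len(a), len(b)) // step * step  # ignore incomplete trailing words
    return [
        bin(int.from_bytes(a[i:i + step], "big")
            ^ int.from_bytes(b[i:i + step], "big")).count("1")
        for i in range(0, n, step)
    ]


def pairwise_histogram(streams: Sequence[bytes], word_bits: int = 8) -> Dict[int, int]:
    """Histogram of word distances over all C(len(streams), 2) subject pairs."""
    hist: Dict[int, int] = {}
    for a, b in combinations(streams, 2):
        for d in word_hamming_distances(a, b, word_bits):
            hist[d] = hist.get(d, 0) + 1
    return hist


print(pairwise_histogram([b"\x00", b"\xff", b"\x0f"]))
# {8: 1, 4: 2}
```

With 199 subjects, `combinations` iterates over the 19,701 pairs mentioned above. For independent random streams the histogram for n-bit words should approach a binomial distribution with mean n/2 and variance n/4; a visible shift towards small distances would indicate that one subject's output leaks information about another's.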

Performance Analysis
Apart from the poor randomness quality of IPI-based approaches [33], their throughput is also a bottleneck. Generally, in this sort of approach, four random bits (LSBs) are extracted after observing two heart-beats [27,31]. To improve efficiency, Pirbhulal et al. recently managed to extract 16 bits per IPI value [34]. Despite these efforts, IPI-based approaches suffer from low throughput. By contrast, our approach is much more efficient, since several random bytes can be extracted per heart-beat. Table 6 summarizes the performance of existing approaches. To facilitate the interpretation of these values, Columns 3 and 4 assume a healthy individual whose heart beats between 60 and 100 times per minute (i.e., between 1 and 0.6 s per beat).
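The heart-rate assumption above translates directly into bits per second. The helper below is a hypothetical illustration that uses only the IPI figures quoted in the text (4 LSBs per two observed heart-beats); the Table 6 values are not reproduced here. How many beats each 16-bit IPI value consumes in [34] is not stated in this section, so it is left as the `beats_per_event` parameter.

```python
def bits_per_second(bits_per_event: float, beats_per_event: float, bpm: float) -> float:
    """Throughput when `bits_per_event` bits are produced every
    `beats_per_event` heart-beats at a heart rate of `bpm` beats/min."""
    return bits_per_event / beats_per_event * bpm / 60.0


# IPI baseline described in the text: 4 LSBs per two observed heart-beats.
for bpm in (60, 100):
    print(f"{bpm} bpm: {bits_per_second(4, 2, bpm):.2f} bit/s")
# 60 bpm: 2.00 bit/s
# 100 bpm: 3.33 bit/s
```

For the proposed ECG-RNG, `beats_per_event` would be 1 and `bits_per_event` would be 8 times the number of approximation coefficients obtained per ECG-window, which is why its throughput is far higher than that of IPI-based generators.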
The exact number of bits that can be extracted from an ECG-window (containing only one R-peak) depends on the heart rate of the individual. Figure 6 shows the average number of bits extracted per heart-beat for each of the 199 subjects in the dataset; the overall average of these values is the one reported in Table 6. Compared to previous proposals, the advantages of our solution are two-fold. On the one hand, only one heart-beat (ECG-window) is needed to extract random bits, whereas IPI approaches require two heart-beats, since the difference between two R-peaks is computed. On the other hand, the throughput increases drastically (by 2200% in the worst-case scenario). Therefore, our proposal is able to generate random bits at a moderately high throughput rate.

Wavelet Family Analysis
The wavelet decomposition of the ECG signal is the core of the proposed RNG. Up to this point, the experiments have been conducted using the Daubechies family (with the number of vanishing moments set to four; N = 4). For completeness, we have evaluated the RNG with other families (i.e., Haar, coiflets, symlets, discrete Meyer and biorthogonal) to determine which alternatives are the most appropriate for generating randomness. A 100-MB file has been generated in each case and then evaluated using the ENT, DIEHARDER and NIST randomness test suites. Table 7 summarizes the overall average results.
As already mentioned, Daubechies was our first choice, since it is the mother wavelet commonly used for ECG signal analysis [41][42][43]. Nevertheless, this paper explores how to extract randomness from ECG signals by multi-level wavelet decomposition, and, to the best of our knowledge, this is the first time this approach has been studied. Therefore, the choice of the most appropriate wavelet family has not been evaluated before in the context of random number generation. To give a better overall perspective of the results, Figure 7 summarizes the distribution of p-values for the tests included in the DIEHARDER and NIST suites. The biorthogonal approach can clearly be ruled out, as its p-values were far from a uniform distribution. In addition, Haar, coiflets and symlets are not recommended either, as the median of their p-values fell well below the optimal value of 0.5. For all these reasons, the use of Daubechies or discrete Meyer is recommended, since with both families the output stream behaved as a random variable.
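The two criteria used in this comparison (closeness of the p-value distribution to uniform, and a median near 0.5) can be quantified with a short helper. This is an illustrative sketch, not the analysis behind Figure 7; the function name is hypothetical. The 10-bin chi-square statistic mirrors the uniformity check that NIST SP 800-22 applies to the p-values of each test; converting it into a p-value (9 degrees of freedom) requires an incomplete-gamma routine and is omitted here.

```python
import statistics
from typing import Dict, Sequence


def pvalue_summary(pvalues: Sequence[float], bins: int = 10) -> Dict[str, float]:
    """Median of the p-values and a chi-square statistic for their uniformity.

    The statistic has bins - 1 degrees of freedom; turning it into a
    p-value is left out of this sketch.
    """
    counts = [0] * bins
    for p in pvalues:
        if not 0.0 <= p <= 1.0:
            raise ValueError("p-values must lie in [0, 1]")
        counts[min(int(p * bins), bins - 1)] += 1  # p == 1.0 joins the last bin
    expected = len(pvalues) / bins
    chi2 = sum((c - expected) ** 2 / expected for c in counts)
    return {"median": statistics.median(pvalues), "chi2": chi2}


even = [(i + 0.5) / 100 for i in range(100)]   # evenly spread p-values
low = [0.01] * 50 + [0.02] * 50                # p-values piled near zero
print(pvalue_summary(even)["chi2"], pvalue_summary(low)["chi2"])
# 0.0 900.0
```

A family whose p-values cluster in a few bins (large statistic) or whose median sits well below 0.5 would be flagged, which matches the reasoning used above to rule out some wavelet families.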

Discussion
Regarding the extraction of randomness from cardiac signals, the reader may be tempted to think that this topic has already been studied in the literature. Nevertheless, there is a key difference between IPI-based approaches, such as [31,34], and our proposal. In the former, the time difference between two R-peaks is the only information used; note that R-peaks can be read not only from an ECG record but also from a Photoplethysmography (PPG) signal. In our approach, a whole ECG trace (P-wave, QRS complex and T-wave) is needed.
As mentioned, the entire ECG signal is used to extract randomness from each ECG-window, by means of a multi-level wavelet decomposition. To the best of our knowledge, this is the first time this approach has been proposed. It is worth noting that other transform domains (e.g., Fourier or Hadamard) have also been tested, but the results were not as good as in the wavelet domain. Regarding the mother wavelet, as shown in Section 4.4, the two recommended families are Daubechies and discrete Meyer.
The experiments have been conducted with the E-HOL-03-0202-003 dataset, which contains 202 subjects recorded over a 24-h period. All the subjects in this dataset were healthy. The proposal could also have been tested with a dataset of subjects suffering from a cardiac ailment; nevertheless, this would be a more favorable scenario, since the disease itself would introduce more entropy into the ECG signal. Therefore, by using healthy subjects, the worst-case scenario for random number generation is considered.
Another critical aspect of the proposal is whether an adversary could predict the values of a target user using another user's ECG. The experiments conducted in Section 4.2 show that the attacker has no chance of success; that is, the adversary's advantage is zero. Furthermore, unlike IPI-based approaches, the proposed TRNG uses the entire ECG signal, which prevents attacks in which the heart signal is eavesdropped from a distance.
Finally, apart from randomness, throughput is a key aspect of cryptographic primitives. The proposed TRNG far exceeds its predecessors: the throughput rate (bytes/s) is multiplied by about 20 in the worst-case scenario. Despite this improvement, whether the ECG signal can be squeezed further to extract more randomness remains pending work.

Conclusions
In the last few years, the e-health sector has undergone a major transformation. The population is more concerned about its habits and health and has access to detailed information thanks to the wide variety of low-cost sensors and medical devices that monitor our vital signals and daily activities; all these sensors, together with a central gateway, make up the WSN and, more particularly, the WBAN when the sensors surround our bodies. In addition, the new generation of medical devices (e.g., pacemakers or insulin pumps) monitor physiological signals and upload these data to the hospital cloud. The doctor can not only check the status of a patient in real time, but can also re-program the device while the patient is comfortably at home. In short, the new health-tracking and IoT medical devices aim to improve the quality of life of citizens by improving our performance and habits or by treating a disease.
The benefits associated with continuous monitoring of our vital signals for medical or performance purposes are well known. Nevertheless, the situation is very risky if security is not included on board (and preferably by design) in these sensors and devices within a WSN. Therefore, sensors in a BAN that monitor our vital signals can serve a dual purpose. On the one hand, the main goal is to improve the health of the user, and any additional use must not compromise this primary goal; note that some sensors are critical for the treatment of certain medical conditions (e.g., heart attacks or epileptic seizures). On the other hand, the wireless connectivity of the devices makes the incorporation of security protection mechanisms mandatory; RNGs, such as the one designed in this article, can help in this task.
An authentication protocol is one of the most common solutions for providing an adequate security level to sensors with limited capabilities (computation, storage and energy). For this purpose, RNGs may be needed to generate the random numbers included in a cryptographic protocol or the seed(s) employed in a key-generation algorithm. As mentioned, TRNGs exploit a physical phenomenon from which they extract entropy. Based on this principle, this article explores whether randomness can be extracted from cardiac signals. In detail, a wavelet decomposition has been used to extract randomness from each ECG-window. To the best of our knowledge, this is the first time this approach has been proposed. From the analysis carried out, it is concluded that the output of the proposed ECG-based TRNG behaves as a random variable. In addition, our TRNG offers a throughput far higher than that of IPI-based approaches.
As future work, the proposal can be tested with other vital signals such as respiration, blood pressure or even an electroencephalogram. There is also room to study in depth the entropy extraction problem in a transformed domain.