Article

Representation Learning for EEG-Based Biometrics Using Hilbert–Huang Transform

1 Department of Complex Information Security of Computer Systems, Faculty of Security, Tomsk State University of Control Systems and Radioelectronics, 634000 Tomsk, Russia
2 Department of Automated Control Systems, Faculty of Control Systems, Tomsk State University of Control Systems and Radioelectronics, 634000 Tomsk, Russia
* Author to whom correspondence should be addressed.
Computers 2022, 11(3), 47; https://doi.org/10.3390/computers11030047
Submission received: 20 January 2022 / Revised: 13 March 2022 / Accepted: 17 March 2022 / Published: 20 March 2022
(This article belongs to the Special Issue Explainable Artificial Intelligence for Biometrics 2021)

Abstract

A promising approach to overcome the various shortcomings of password systems is the use of biometric authentication, in particular the use of electroencephalogram (EEG) data. In this paper, we propose a subject-independent learning method for EEG-based biometrics using Hilbert spectrograms of the data. The proposed neural network architecture treats the spectrogram as a collection of one-dimensional series and applies one-dimensional dilated convolutions over them, and a multi-similarity loss is used as the loss function for subject-independent learning. The architecture was tested on the publicly available PhysioNet EEG Motor Movement/Imagery Dataset (PEEGMIMDB), achieving a 14.63% Equal Error Rate (EER). The main advantages of the proposed approach are subject independence and suitability for interpretation via the created spectrograms and the integrated gradients method.

1. Introduction

Password-based authentication is increasingly being replaced by more reliable biometric-based authentication [1]. Biometric-based authentication uses a person’s unique biological characteristics for recognition. Some of the most commonly used biometric traits are finger or palm prints, the iris pattern, the timbre and spectral characteristics of the voice, facial images, handwritten signatures, and regular handwriting [2]. Several requirements must be met for biometrics to be applicable in a real-world setting. In particular, the biometric trait must be universal, persistent, and easy to measure, and biometric-trait-based identification systems must have high performance and recognize the identity with sufficient accuracy for practical applications [3]. Most biometric authentication systems also require the user to be physically present for authorization [4]. Arguably, the most important advantage of biometric authentication is that the user experience is usually convenient and fast [5]. Modern smartphones use fingerprint and facial recognition systems, which work fairly quickly for the end-user and partially bypass the problem of forgetting a password. Among the biometric authentication systems that have not yet become widespread are those that rely on electroencephalogram (EEG) data.
EEG-based systems currently have many advantages over traditional methods and have attracted considerable research interest [6]. At present, EEG signals cannot be easily replicated and inherently confirm that the user is alive, making them a more reliable choice for identity verification, although the possibility of EEG signals being faked or compromised still exists [7]. EEG data can be used not only for authentication, but also for other purposes (emotion recognition, sleep, and health studies). In [8], the researchers created a new automated sleep staging system based on an ensemble learning stacking model that integrates Random Forest (RF) and eXtreme Gradient Boosting (XGBoost), achieving 90.56% accuracy. In [9], EEG data from six electrodes were used to detect stroke patients with the C5.0 decision tree machine learning method, achieving 89% accuracy. In [10], a support vector machine was also used to distinguish stroke patients from healthy subjects (98% accuracy using only two electrodes, versus 95.8% accuracy achieved in [11] using electrocardiogram (ECG) data and the random tree model). EEG data can also be used for the classification of Parkinson’s Disease (PD), as shown in [12] (the authors used Discriminant Function Analysis (FDA) and achieved 62% accuracy on EEG data alone and 98.8% accuracy combining EEG and Electromyogram (EMG) data). The classification of patients vs. controls for the diagnosis of PD in [13] was performed using a 13-layer neural network (88.2% accuracy). The multifunctionality of EEG data can help improve the reliability of an EEG-based authentication system. For example, EEG data can change depending on the state and emotions of the user [14], which provides some protection in case the user is forcibly being scanned in a life-threatening situation. State-of-the-art methods (a dynamical graph convolutional neural network in [15], random forest in [16], k-NN in [17]) can classify emotions using EEG data with more than 80% accuracy [18]. Some biometric modalities, such as facial recognition, can be used for surveillance without notifying the user, but in the case of EEG data, data acquisition stops as soon as the device is removed from the head [19].
At present, there are many studies on subject recognition using EEG data and machine learning methods. The first such study was conducted by the University of Piraeus in 1999: EEG signals were collected on a single monopolar channel using a mobile EEG device and used to train a vector quantizer network, whose accuracy was 72–84% [20]. In [21], the researchers used the k-Nearest-Neighbors (k-NN) algorithm and Linear Discriminant Analysis (LDA) to classify data from twenty participants, who were asked to perform two different tasks during signal capture: a hand movement task and an imagined hand movement task. Accuracy ranged from 94.75% to 98.03%. In [22], a four-layer (two convolutional layers and two pooling layers) Convolutional Neural Network (CNN) was used. Thirty subjects were recruited for the experiment. During the first task, participants were asked to remember their faces; during the second task, participants were asked to perform 10–12 eye blinks. The accuracy of this approach was 97.6%.
EEG-based subject-dependent recognition has achieved near-perfect accuracy using a single recording session (a 3.9% EER in [22] using a CNN and eye-blinking signals coupled with EEG signals; 99.8% accuracy in [23] using LDA and k-NN). However, the systems that achieve such high accuracy are of little use in real life for two reasons:
  • Most researchers use EEG data from only one data acquisition session without considering the possibility of the signal being non-stationary;
  • These approaches work only with a fixed list of users (subject-dependent).
Some researchers have tried to study and solve the first problem described above, non-stationarity. The authors of [24] collected longitudinal EEG data (throughout a year) and found that, when only single-session data are used, classification performance may generalize over session-specific recording conditions rather than over a person's individual EEG characteristics; they achieved 90.8% Rank-1 identification accuracy over multiple sessions. Unfortunately, the collected dataset is not publicly available. In our work, we did not attempt to solve this first problem and used a dataset with only one recording session.
Regarding the second problem, subject dependency, all of the previous works produce outputs over a fixed list of subjects. In practical cases, the network should be able to handle signals it has not encountered before in order to recognize a threat. It is possible to work around the problem by building a separate classifier for each user, but this is still impractical, since training requires a fairly large amount of time. A subject-independent network has no classes at all. Instead, it takes two electroencephalogram signals, converts them into two feature vectors, and compares the distance between them against a threshold value. Recently, Reference [24] also considered the subject-independent classification approach, where classification performance was tested using the leave-one-group-out methodology (the data of one user were not present in the training fold and appeared only in the test fold) [25]. In [26], a subject-independent classifier achieved its best validation results using eyes-open (5.9% EER) and eyes-closed (7.2% EER) data (multiple sessions) and 31 s of verification-phase data. Still, that architecture relied on one-dimensional convolutions performed over downsampled time series data, and the output of the system was difficult for the average person to interpret, explain, or draw conclusions from, thus creating a new problem: the interpretability of deep learning systems.
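To make the verification decision concrete, the following sketch shows how two recordings can be compared through their embeddings and a distance threshold. The helper function, embedding callable, and threshold value are hypothetical placeholders for illustration, not code from the paper.

```python
import numpy as np

def verify(embed, eeg_a, eeg_b, threshold=0.5):
    """Decide whether two EEG recordings belong to the same person.

    `embed` maps a preprocessed recording to an L2-normalized feature vector;
    the threshold is a placeholder that would be chosen on validation data.
    """
    va, vb = embed(eeg_a), embed(eeg_b)
    cosine_distance = 1.0 - float(np.dot(va, vb))  # vectors are unit-norm
    return cosine_distance < threshold             # True = same person
```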
Which frequencies contribute the most to the system’s output and distinguish one subject’s data from another’s? To partially address this question, we propose to use Hilbert spectrograms (obtained using the Hilbert–Huang Transform and Empirical Mode Decomposition (EMD)) as the input and a publicly available dataset, the PhysioNet EEG Motor Movement/Imagery Dataset. Empirical mode decomposition with hand-crafted features has already been applied to the PhysioNet EEG Motor Movement/Imagery Dataset in [27] (95.64% accuracy in the subject-dependent scenario, where each subject receives a separately built classifier). We also propose to apply an explainable artificial intelligence method, integrated gradients [28]. Such a method can increase user confidence in the authentication system output, validate existing knowledge, question existing knowledge, and generate new assumptions [29].
In this paper, we propose a subject-independent learning method for EEG-based biometrics using Hilbert spectrograms of the data. The proposed neural network architecture treats a spectrogram as a collection of one-dimensional series and applies one-dimensional dilated convolutions over them, and a multi-similarity loss is used as the loss function for subject-independent learning. The architecture was tested on the PhysioNet EEG Motor Movement/Imagery Dataset (PEEGMIMDB) [30], achieving a 14.63% Equal Error Rate (EER). The proposed approach’s main advantage is its suitability for interpretation via Hilbert spectrograms and the integrated gradients method. The main contributions of this study are as follows:
  • The subject-independent neural network architecture for EEG-based biometrics using Hilbert spectrograms of the data as the input (trained using the multi-similarity loss);
  • The use of the integrated gradients method for the proposed architecture’s output interpretation.

2. Methodology and Proposed Solution

2.1. Dataset

The PhysioNet EEG Motor Movement/Imagery Dataset, containing 1 min and 2 min recordings of 109 people, was used [30]. Subjects performed different motor/imagery tasks (4 tasks, 2 min EEG recordings); EEG recordings were also taken in the eyes-open and eyes-closed resting states (1 min recordings).

2.2. Signal Processing

Initially, each EEG recording was a set of 64 time series (one per electrode), recorded using the BCI2000 system with a 160 Hz sampling rate. The data were divided into epochs of 5 s in duration (see Figure 1). To perform this split and to process the dataset, we used the MNE Python toolkit [31]. We used data from only 8 channels (O1, O2, P3, P4, C3, C4, F3, F4) to reduce the computational complexity, as [27] showed no significant drop in classification performance when using only those 8 channels. We also used EEG data for only the eyes-open and eyes-closed states, as these showed the best results in [26] and can be considered more practical from a consumer point of view (less time to authenticate the user, while only requiring the user to remain still and rest). After this preprocessing, the dataset had the dimensions [2616 samples, 8 channels]. Some samples were rejected due to low quality. Each sample was a time series with 801 points, so the dataset can be represented as a tensor: [2611 samples, 801 points, 8 channels].
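A minimal sketch of this epoching and channel selection with MNE is given below. The file name is a placeholder, and the channel-name cleaning assumes the trailing-dot names used in the PhysioNet EDF files; it is an illustrative example rather than the exact preprocessing script.

```python
import mne

# Placeholder file name for one eyes-open/eyes-closed run from the dataset.
raw = mne.io.read_raw_edf("S001R01.edf", preload=True, verbose=False)
raw.rename_channels(lambda name: name.strip("."))   # e.g., "C3.." -> "C3"
raw.pick_channels(["O1", "O2", "P3", "P4", "C3", "C4", "F3", "F4"])

# Fixed-length 5 s epochs; with tmin = 0 and tmax = 5 each epoch contains
# 801 samples at the 160 Hz sampling rate (both endpoints included).
events = mne.make_fixed_length_events(raw, duration=5.0)
epochs = mne.Epochs(raw, events, tmin=0.0, tmax=5.0, baseline=None, preload=True)
data = epochs.get_data()   # shape: (n_epochs, 8 channels, 801 points)
```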
To obtain the EEG signal spectrograms, we used the Hilbert–Huang Transform (HHT). In [32], it was concluded that the HHT can help eliminate noise from EEG signals, is well suited to processing brain electrical signals, and offers excellent time–frequency resolution, which makes it appropriate for analyzing non-stationary signals. In the first stage of the HHT, the signal is decomposed into empirical modes; the Hilbert transform is then applied to the selected modes of the decomposition. This approach allows an effective decomposition of non-linear and non-stationary signals, which is especially useful in the case of EEG. The transformation also requires no a priori functional basis: the basis functions are derived adaptively from the data by the empirical mode extraction procedure. An example of the EEG signal decomposition into empirical modes is shown in Figure 2.
After the instantaneous frequencies are computed from the derivatives of the phase functions obtained by the Hilbert transform, the result can be represented in time–frequency form. Given the Nyquist–Shannon sampling theorem and the 160 Hz sampling rate, we used 60 frequency bins from 0.1 Hz to 60 Hz. The resulting spectrogram had the shape [60 frequency bins, 801 points]. An example of the EEG signal transformed into a spectrogram is shown in Figure 3. In order to prevent the mode mixing problem [33], we used the masked sifting method [34], implemented in the EMD Python package [35].
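The sketch below illustrates this transform for a single-channel epoch using the EMD package. Exact function signatures differ slightly between package versions, so it should be read as an assumption-laden example rather than the exact code used in this work.

```python
import emd
import numpy as np

fs = 160                          # sampling rate, Hz
x = np.random.randn(801)          # placeholder for one 5 s single-channel epoch

imf = emd.sift.mask_sift(x)       # masked sifting -> intrinsic mode functions
IP, IF, IA = emd.spectra.frequency_transform(imf, fs, 'hilbert')

# 60 frequency bins from 0.1 Hz to 60 Hz; keep the time axis (sum_time=False).
freq_edges = np.linspace(0.1, 60, 61)
f, hht = emd.spectra.hilberthuang(IF, IA, freq_edges, sum_time=False)
print(hht.shape)                  # expected: (60 frequency bins, 801 points)
```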
The spectrograms of EEG channel data obtained in the previous step are essentially two-dimensional maps. Their two axes represent fundamentally different quantities, frequency and time, with the map values representing spectral power. Therefore, the spatial invariance that two-dimensional CNNs provide may not be suitable for our task; it is better to represent the spectrograms as a set of stacked time series for different frequency bins [36]. We therefore reshaped the data into 60 time series with 801 points each (Figure 4), stacked the time series over all channels (such a transform can easily be reversed in case we want to use the integrated gradients method), and applied min–max normalization over the (time series × channel) dimension. No further processing, such as noise removal or band-pass filtering, was applied. The resulting dataset shape was [2611 samples, 480 time series (60 time series × 8 channels), 801 points].
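A small sketch of this reshaping and normalization is shown below; the per-sample array layout and the normalization axis are assumptions made for illustration.

```python
import numpy as np

spectrograms = np.random.rand(8, 60, 801)   # placeholder: (channels, freq bins, time)

# Stack the 60 frequency-bin time series of all 8 channels: 480 series x 801 points.
stacked = spectrograms.reshape(8 * 60, 801)

# Min-max normalization of each stacked (time series x channel) row.
mins = stacked.min(axis=1, keepdims=True)
maxs = stacked.max(axis=1, keepdims=True)
normalized = (stacked - mins) / (maxs - mins + 1e-12)   # avoid division by zero
```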

2.3. Deep Learning Methods

One-dimensional dilated convolutions can be successfully used to classify time series and are more computationally efficient than LSTM blocks [37]. We propose the multichannel dilated one-dimensional convolutional network architecture described in Table 1 to generate feature vectors from the data. We used metric learning methods to map the data into an embedding space where similar data are close together and dissimilar data are far apart [38]. In general, this can be achieved using specific embedding and classification losses such as the triplet loss [39], the ArcFace loss [40], or the multi-similarity loss [41]. In this work, we used the multi-similarity loss and the PyTorch-based metric learning framework from [38].
The first convolution layer uses padding such that the input data shape is preserved (except for the channels dimension) in order to correctly process the edge values. We also used the Parametric Rectified Linear Unit (PReLU) as the activation function, because [42] showed that it can outperform the Rectified Linear Unit (ReLU) function.
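A shortened PyTorch sketch of the embedding network and training loss is given below. Only the first convolution blocks are written out, the pooling to a single time step is a simplification (the architecture in Table 1 reaches length 1 through its dilation stack), and the loss hyperparameters are the framework defaults rather than values reported in this paper.

```python
import torch
import torch.nn as nn
from pytorch_metric_learning import losses

class EEGEmbeddingNet(nn.Module):
    def __init__(self, in_series=480, embedding_dim=128):
        super().__init__()
        self.features = nn.Sequential(
            nn.Dropout(0.7),
            nn.Conv1d(in_series, 512, kernel_size=5, dilation=1, padding="same"),
            nn.BatchNorm1d(512), nn.PReLU(512),
            nn.Conv1d(512, 512, kernel_size=5, dilation=2),
            nn.BatchNorm1d(512), nn.PReLU(512),
            # ... further dilated Conv1d blocks (d = 4, 16, 32, 72, 74), see Table 1 ...
            nn.AdaptiveAvgPool1d(1),   # simplification: collapse the time axis to length 1
            nn.Flatten(),
        )
        self.head = nn.Sequential(
            nn.Linear(512, 256), nn.PReLU(256), nn.Linear(256, embedding_dim),
        )

    def forward(self, x):              # x: (batch, 480 stacked series, 801 points)
        z = self.head(self.features(x))
        return nn.functional.normalize(z, p=2, dim=1)   # l2-normalized embedding

model = EEGEmbeddingNet()
loss_func = losses.MultiSimilarityLoss()               # defaults of the framework
embeddings = model(torch.randn(4, 480, 801))
loss = loss_func(embeddings, torch.tensor([0, 0, 1, 1]))
```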

2.4. Model Interpretation

Improving the interpretability of deep models is a critical task for machine learning. One way to approach this problem is to identify the portions of the input data that contribute most to the final model output. However, existing approaches have several drawbacks, such as violating sensitivity and depending on the specific implementation of the model. Reference [28] formulated two axioms, sensitivity and implementation invariance, which the authors believe a good interpretation method must satisfy.
The sensitivity axiom means that if two images differ in exactly one pixel (and have all other pixels in common) yet give different predictions, the interpretation algorithm should assign a non-zero attribution to that pixel. The implementation invariance axiom means that the underlying implementation of the model should not affect the result of the interpretation method. The authors used these principles to develop a new attribution method called Integrated Gradients (IG).
IG starts from a baseline input (usually a completely darkened version of the input image) and gradually increases its brightness until the original image is restored. Gradients of the class scores with respect to the input pixels are computed for each of these images and averaged to obtain a global importance value for each pixel. Besides its theoretical properties, IG thus also solves another problem of vanilla gradient-based saliency: saturated gradients. Since plain gradients are local, they do not reflect the global importance of pixels, but only the sensitivity at a particular input point. By varying the image brightness and calculating gradients at different points, IG obtains a more complete picture of the importance of each pixel. In our work, we used the integrated gradients implementation from the PyTorch-based Captum framework [43] and call the output of integrated gradients an importance map. A block diagram featuring all processing steps is shown in Figure 5.
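Below is a minimal Captum sketch of how such an importance map can be produced. The tiny stand-in network, the all-zeros baseline, and the choice of attributing a single embedding unit are illustrative assumptions, not the exact configuration used in this paper.

```python
import torch
import torch.nn as nn
from captum.attr import IntegratedGradients

# Tiny stand-in for the trained embedding network (for illustration only).
model = nn.Sequential(nn.AdaptiveAvgPool1d(1), nn.Flatten(), nn.Linear(480, 128))
model.eval()

ig = IntegratedGradients(model)
sample = torch.randn(1, 480, 801)           # one stacked-spectrogram sample
baseline = torch.zeros_like(sample)         # "darkened" baseline input

# Attribute one embedding unit; n_steps controls the interpolation resolution.
attributions = ig.attribute(sample, baselines=baseline, target=0, n_steps=50)
importance_map = attributions.squeeze(0).detach().numpy()   # shape (480, 801)
```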

3. Results

3.1. Model Training

To test the architecture’s performance, we used the leave-k-groups-out validation methodology (the data of multiple users are not present in the training set and appear only in the testing set). GroupKFold (with k = 5) from the scikit-learn package [44] was used as an iterator variant with non-overlapping groups: the same group never appears in two different CV testing sets/folds (the number of distinct groups has to be at least equal to the number of folds). The folds were approximately balanced (the number of distinct groups was approximately the same in each fold). During each CV iteration, the data of 22 subjects (21 in the last fold) appeared only in the test fold. In each epoch, 10 data samples per class were randomly selected from the training fold to form batches. For model training, we used the Adam optimizer (lr = 1 × 10⁻⁴, weight_decay = 1 × 10⁻³, 500 epochs).
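A minimal sketch of this validation split is shown below; the subject labels are randomly generated placeholders, and the sample indices stand in for the full spectrogram tensor.

```python
import numpy as np
from sklearn.model_selection import GroupKFold

n_samples = 2611
sample_idx = np.arange(n_samples)                         # stand-in for the data tensor
subjects = np.random.randint(0, 109, size=n_samples)      # placeholder subject ids

gkf = GroupKFold(n_splits=5)
for train_idx, test_idx in gkf.split(sample_idx, groups=subjects):
    # Subjects in the test fold never appear in the training fold.
    assert not set(subjects[train_idx]) & set(subjects[test_idx])
    # ... train the embedding network on the training fold, evaluate on the test fold ...
```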
After training, we generated 128-unit l2-normalized feature vector representations of the input data and computed the cosine distance matrix for the generated representations. The sklearn [44] classifier CalibratedClassifierCV (using LinearSVC as a base estimator) was then used to calculate the confusion matrix over different distance thresholds. In this way, we could obtain the Equal Error Rate (EER), a metric commonly used in state-of-the-art EEG-based verification systems [45]. The EER is the point on a Detection Error Tradeoff (DET) curve where the false acceptance rate and false rejection rate are equal; in general, the lower the equal error rate, the higher the accuracy of the biometric system. The obtained EER value was 14.63%. The feature space with training fold samples is visualized in Figure 6 using the t-SNE method [46].
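The sketch below shows one common way to read the EER from pairwise cosine distances between l2-normalized embeddings. The embeddings and subject ids are random placeholders, and the calibrated LinearSVC stage used in this work is omitted for brevity.

```python
import numpy as np
from sklearn.metrics import roc_curve

emb = np.random.randn(100, 128)                           # placeholder embeddings
emb /= np.linalg.norm(emb, axis=1, keepdims=True)         # l2-normalize
subject_ids = np.random.randint(0, 10, size=100)          # placeholder subject ids

cos_dist = 1.0 - emb @ emb.T                              # pairwise cosine distances
iu = np.triu_indices(len(emb), k=1)                       # unique pairs
scores = -cos_dist[iu]                                    # higher score = more similar
labels = (subject_ids[iu[0]] == subject_ids[iu[1]]).astype(int)  # genuine vs. impostor

fpr, tpr, _ = roc_curve(labels, scores)
fnr = 1 - tpr
eer = fpr[np.nanargmin(np.abs(fnr - fpr))]                # point where FAR ~= FRR
print(f"EER: {eer:.2%}")
```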
The hardware used in this study consisted of one Nvidia Tesla T4 GPU card (320 Turing Tensor cores, 2560 CUDA cores, and 16 GB of GDDR6 VRAM), one 8-core CPU, and 64 GB of RAM. The DNN model was trained using the GPU implementation of PyTorch, while all other processing used the CPU. The Python programming language was used for the present study; in addition to the libraries already mentioned, Keras [47], NumPy [48], and Matplotlib [49] were also employed.

3.2. Model Interpretability

After training, the integrated gradients method can be applied to the model. An example output is shown in Figure 7. In our case, the integrated gradients output can be summed over the time dimension or the channel dimension. Figure 8 and Figure 9 show the integrated gradients output for spectrograms of four subjects, summed over the time dimension. Here, Channels 1–8 correspond to the (O1, O2, P3, P4, C3, C4, F3, F4) channels. It can be clearly seen which channels and frequencies were more important for the model’s feature vector output.
Figure 8 demonstrates that there was large variability within the same class and little separation between two different classes (they look alike). We can additionally sum the importance maps over the channel dimension to see which frequencies are more important for the model’s feature vector output and to distinguish the importance maps of each class more clearly (see Figure 10 and Figure 11).
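For clarity, the two reductions can be expressed as simple sums over an importance map reshaped back to its (channel, frequency, time) layout; the array below is a placeholder.

```python
import numpy as np

importance = np.random.rand(8, 60, 801)      # placeholder IG output: (channels, bins, time)

per_channel_freq = importance.sum(axis=2)    # (8 channels, 60 bins): Figures 8 and 9
per_freq = per_channel_freq.sum(axis=0)      # (60 bins): Figures 10 and 11
```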

3.3. Ablation Study

We also processed the entire dataset to obtain the importance maps and assess the importance of each channel and frequency bin for our model. Figure 12 shows that the contribution of channels P4 and C3 was very low; Figure 13 shows that the Delta (<4 Hz), Beta (16–31 Hz), and Gamma (>32 Hz) bands contributed the most. Afterward, we tested the model’s performance by leaving only one frequency range present in the dataset and zeroing all other frequency ranges in the input data (see Table 2).
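A minimal sketch of this frequency-range ablation is given below; the helper function is hypothetical and assumes 60 one-hertz bins, so a 10 Hz band maps directly to 10 consecutive bins.

```python
import numpy as np

def keep_frequency_range(sample, low_bin, high_bin):
    """Zero all frequency bins outside [low_bin, high_bin) of one spectrogram sample."""
    masked = np.zeros_like(sample)
    masked[:, low_bin:high_bin, :] = sample[:, low_bin:high_bin, :]
    return masked

sample = np.random.rand(8, 60, 801)                  # placeholder: (channels, bins, time)
only_10_20_hz = keep_frequency_range(sample, 10, 20) # keep the 10-20 Hz band only
```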

4. Discussion

The proposed architecture was tested on the publicly available PEEGMIMDB dataset, achieving a 14.63% Equal Error Rate (EER). This is a worse EER value than in [26] (Single-Session Enrollment (SSE) and Short Time Distance (STD) with deep representations and channel-specific CNN modeling achieved an 8.1% EER and a 6.8% EER for the eyes-closed and eyes-open states, respectively; the dataset used there is not publicly available), a difference that may be partly explained by the different numbers of subjects (109 in our case vs. 50 in [26]). However, our proposed approach’s main advantage is its suitability for interpretation via the created spectrograms and the integrated gradients method (we operated on spectrograms in the time–frequency domain, whereas Reference [26] operated only in the time domain). In some cases, the difference between subjects cannot be clearly seen, as in Figure 8. However, we can additionally sum the importance maps over the channel dimension to see which frequencies are more important for the model’s feature vector output and to distinguish the importance maps of each class more clearly (see Figure 10 and Figure 11).

5. Conclusions

The proposed neural network architecture treats the Hilbert spectrogram as a collection of one-dimensional series and applies one-dimensional dilated convolutions over them. A multi-similarity loss was used as the loss function for subject-independent learning. The architecture was tested on the publicly available PEEGMIMDB dataset, achieving a 14.63% Equal Error Rate (EER). Our proposed approach’s main advantage is its suitability for interpretation via the created spectrograms and the integrated gradients method (we operated on spectrograms in the time–frequency domain, whereas Reference [26] operated only in the time domain). Future work will focus on using the Hilbert holospectrum to improve system accuracy.

Author Contributions

Conceptualization, M.S. and I.K.; methodology, A.K. and A.M.; software, I.K.; validation, M.S., A.K. and E.K.; formal analysis, E.K.; investigation, M.S.; resources, E.K.; data curation, M.S.; writing—original draft preparation, M.S.; writing—review and editing, I.K.; visualization, I.K.; supervision, A.K. and A.M.; project administration, E.K.; funding acquisition, A.K. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Ministry of Science and Higher Education of Russia, Government Order for 2020–2022, Project No. FEWM-2020-0037 (TUSUR).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are openly available in the PhysioNet repository at DOI: 10.13026/C28G6P, reference number [50].

Conflicts of Interest

The authors declare no conflict of interest. The funding sponsors had no role in the design of the study; in the collection, analyses, or interpretation of the data; in the writing of the manuscript; nor in the decision to publish the results.

References

  1. Kodituwakku, S.R. Biometric Authentication: A Review. Int. J. Trend Res. Dev. 2015, 2, 113–123. [Google Scholar]
  2. Chuang, J.; Nguyen, H.; Wang, C.; Johnson, B. I Think, Therefore I Am: Usability and Security of Authentication Using Brainwaves. In Financial Cryptography and Data Security, Proceedings of the International Conference on Financial Cryptography and Data Security, Okinawa, Japan, 1–5 April 2013; Springer: Berlin/Heidelberg, Germany, 2013; Volume 7862, pp. 1–16. [Google Scholar] [CrossRef] [Green Version]
  3. Jain, A.K.; Ross, A.; Prabhakar, S. An Introduction to Biometric Recognition. IEEE Trans. Circuits Syst. Video Technol. 2004, 14, 4–20. [Google Scholar] [CrossRef] [Green Version]
  4. Hashim, M.M.; Mohsin, A.K.; Rahim, M.S.M. All-Encompassing Review of Biometric Information Protection in Fingerprints Based Steganography. In Proceedings of the 2019 3rd International Symposium on Computer Science and Intelligent Control, Amsterdam, The Netherlands, 25–27 September 2019; Association for Computing Machinery: New York, NY, USA, 2019. [Google Scholar]
  5. Yang, W.; Wang, S.; Sahri, N.M.; Karie, N.M.; Ahmed, M.; Valli, C. Biometrics for Internet-of-Things security: A review. Sensors 2021, 21, 6163. [Google Scholar] [CrossRef] [PubMed]
  6. Gui, Q.; Ruiz-Blondet, M.V.; Laszlo, S.; Jin, Z. A Survey on Brain Biometrics. ACM Comput. Surv. 2019, 51, 1–38. [Google Scholar] [CrossRef]
  7. Hartmann, K.G.; Schirrmeister, R.T.; Ball, T. EEG-GAN: Generative adversarial networks for electroencephalograhic (EEG) brain signals. arXiv 2018, arXiv:1806.01875. [Google Scholar]
  8. Satapathy, S.K.; Bhoi, A.K.; Loganathan, D.; Khandelwal, B.; Barsocchi, P. Machine learning with ensemble stacking model for automated sleep staging using dual-channel EEG signal. Biomed. Signal Process. Control 2021, 69, 102898. [Google Scholar] [CrossRef]
  9. Hussain, I.; Park, S.J. Quantitative Evaluation of Task-Induced Neurological Outcome after Stroke. Brain Sci. 2021, 11, 900. [Google Scholar] [CrossRef] [PubMed]
  10. Hussain, I.; Park, S.J. HealthSOS: Real-Time Health Monitoring System for Stroke Prognostics. IEEE Access 2020, 8, 213574–213586. [Google Scholar] [CrossRef]
  11. Hussain, I.; Park, S.J. Big-ECG: Cardiographic Predictive Cyber-Physical System for Stroke Management. IEEE Access 2021, 9, 123146–123164. [Google Scholar] [CrossRef]
  12. Paul, S.; Saikia, A.; Hussain, M.; Barua, A.R. EEG-EMG Correlation for Parkinson’s disease. Int. J. Eng. Adv. Technol. 2019, 8, 1179–1185. [Google Scholar] [CrossRef]
  13. Oh, S.L.; Hagiwara, Y.; Raghavendra, U.; Yuvaraj, R.; Arunkumar, N.; Murugappan, M.; Acharya, U.R. A deep learning approach for Parkinson’s disease diagnosis from EEG signals. Neural Comput. Appl. 2018, 32, 10927–10933. [Google Scholar] [CrossRef]
  14. Lee, Y.Y.; Hsieh, S. Classifying Different Emotional States by Means of EEG-Based Functional Connectivity Patterns. PLoS ONE 2014, 9, e95415. [Google Scholar] [CrossRef] [PubMed]
  15. Song, T.; Zheng, W.; Song, P.; Cui, Z. EEG Emotion Recognition Using Dynamical Graph Convolutional Neural Networks. IEEE Trans. Affect. Comput. 2020, 11, 532–541. [Google Scholar] [CrossRef] [Green Version]
  16. Kumaran, D.S. Using EEG-Validated Music Emotion Recognition Techniques to Classify Multi-Genre Popular Music for Therapeutic Purposes. 2018. Available online: https://digitalcommons.imsa.edu/cgi/viewcontent.cgi?article=1023&context=issf2018 (accessed on 30 November 2021).
  17. Fan, J.; Wade, J.W.; Key, A.P.; Warren, Z.E.; Sarkar, N. EEG-Based Affect and Workload Recognition in a Virtual Driving Environment for ASD Intervention. IEEE Trans. Biomed. Eng. 2018, 65, 43–51. [Google Scholar] [CrossRef]
  18. Suhaimi, N.S.; Mountstephens, J.; Teo, J. EEG-Based Emotion Recognition: A State-of-the-Art Review of Current Trends and Opportunities. Comput. Intell. Neurosci. 2020, 2020, 1–19. [Google Scholar] [CrossRef] [PubMed]
  19. Mason, J.; Dave, R.; Chatterjee, P.; Graham-Allen, I.; Esterline, A.; Roy, K. An investigation of biometric authentication in the healthcare environment. Array 2020, 8, 100042. [Google Scholar] [CrossRef]
  20. Poulos, M.; Rangoussi, M.; Alexandris, N.; Evangelou, A. On the use of EEG features towards person identification via neural networks. Med. Inf. Internet Med. 2001, 26, 35–48. [Google Scholar] [CrossRef] [Green Version]
  21. Zhi Chin, T.; Saidatul, A.; Ibrahim, Z. Exploring EEG based Authentication for Imaginary and Non-imaginary tasks using Power Spectral Density Method. IOP Conf. Ser. Mater. Sci. Eng. 2019, 557, 012031. [Google Scholar] [CrossRef]
  22. Wu, Q.; Zeng, Y.; Zhang, C.; Tong, L.; Yan, B. An EEG-Based Person Authentication System with Open-Set Capability Combining Eye Blinking Signals. Sensors 2018, 18, 335. [Google Scholar] [CrossRef] [Green Version]
  23. Kaliraman, B.; Duhan, M. A new hybrid approach for feature extraction and selection of electroencephalogram signals in case of person recognition. J. Reliab. Intell. Environ. 2021, 7, 241–251. [Google Scholar] [CrossRef]
  24. Maiorana, E.; La Rocca, D.; Campisi, P. On the permanence of EEG signals for biometric recognition. IEEE Trans. Inf. Forensics Secur. 2016, 11, 163–175. [Google Scholar] [CrossRef]
  25. Maiorana, E. EEG-Based Biometric Verification Using Siamese CNNs. In New Trends in Image Analysis and Processing—ICIAP 2019; Springer: Cham, Switzerland, 2019. [Google Scholar] [CrossRef]
  26. Maiorana, E. Learning deep features for task-independent EEG-based biometric verification. Pattern Recognit. Lett. 2021, 143, 122–129. [Google Scholar] [CrossRef]
  27. Barayeu, U.; Horlava, N.; Libert, A.; Van Hulle, M. Robust single-trial EEG-based authentication achieved with a 2-stage classifier. Biosensors 2020, 10, 124. [Google Scholar] [CrossRef] [PubMed]
  28. Sundararajan, M.; Taly, A.; Yan, Q. Axiomatic Attribution for Deep Networks. In Proceedings of the International Conference on Machine Learning, Sydney, Australia, 6–11 August 2017. [Google Scholar]
  29. Arrieta, A.B.; Díaz-Rodríguez, N.; Ser, J.D.; Bennetot, A.; Tabik, S.; Barbado, A.; García, S.; Gil-López, S.; Molina, D.; Benjamins, R.; et al. Explainable Artificial Intelligence (XAI): Concepts, Taxonomies, Opportunities and Challenges toward Responsible AI. Inf. Fusion 2020, 58, 82–115. [Google Scholar] [CrossRef] [Green Version]
  30. Schalk, G.; McFarland, D.J.; Hinterberger, T.; Birbaumer, N.; Wolpaw, J.R. BCI2000: A general-purpose brain-computer interface (BCI) system. IEEE Trans. Biomed. Eng. 2004, 51, 1034–1043. [Google Scholar] [CrossRef]
  31. Larson, E.; Gramfort, A.; Engemann, D.A.; Jaeilepp; Brodbeck, C.; Jas, M.; Brooks, T.L.; Sassenhagen, J.; Luessi, M.; King, J.R.; et al. MNE-Tools/MNE-Python: v0.24.1. 2021. Available online: https://mne.tools/stable/index.html (accessed on 30 November 2021).
  32. Zhang, L.; Wu, D.; Zhi, L. Method of removing noise from EEG signals based on HHT method. In Proceedings of the 2009 First International Conference on Information Science and Engineering, Nanjing, China, 26–28 December 2009. [Google Scholar]
  33. Souza, U.B.D.; Escola, J.P.L.; Brito, L.D.C. A survey on Hilbert–Huang transform: Evolution, challenges and solutions. Digit. Signal Process. 2022, 120, 103292. [Google Scholar] [CrossRef]
  34. Deering, R.; Kaiser, J.F. The use of a masking signal to improve empirical mode decomposition. In Proceedings of the (ICASSP’05)—IEEE International Conference on Acoustics, Speech, and Signal Processing, Philadelphia, PA, USA, 23 March 2005. [Google Scholar]
  35. Quinn, A.J.; Lopes-dos Santos, V.; Dupret, D.; Nobre, A.C.; Woolrich, M.W. EMD: Empirical Mode Decomposition and Hilbert–Huang Spectral Analyses in Python. J. Open Source Softw. 2021, 6, 2977. [Google Scholar] [CrossRef] [PubMed]
  36. Ruffini, G.; Ibañez, D.; Castellano, M.; Dubreuil-Vall, L.; Soria-Frisch, A.; Postuma, R.; Gagnon, J.F.; Montplaisir, J. Deep Learning With EEG Spectrograms in Rapid Eye Movement Behavior Disorder. Front. Neurol. 2019, 10, 806. [Google Scholar] [CrossRef] [Green Version]
  37. Yazdanbakhsh, O.; Dick, S. Multivariate Time Series Classification using Dilated Convolutional Neural Network. arXiv 2019, arXiv:1905.01697. [Google Scholar]
  38. Musgrave, K.; Belongie, S.; Lim, S.N. A Metric Learning Reality Check. In European Conference on Computer Vision; Springer: Cham, Switzerland, 2020. [Google Scholar]
  39. Hoffer, E.; Ailon, N. Deep metric learning using Triplet network. In International Workshop on Similarity-Based Pattern Recognition; Springer: Cham, Switzerland, 2015. [Google Scholar]
  40. Deng, J.; Guo, J.; Xue, N.; Zafeiriou, S. ArcFace: Additive Angular Margin Loss for Deep Face Recognition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019. [Google Scholar]
  41. Wang, X.; Han, X.; Huang, W.; Dong, D.; Scott, M.R. Multi-Similarity Loss with General Pair Weighting for Deep Metric Learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019. [Google Scholar]
  42. He, K.; Zhang, X.; Ren, S.; Sun, J. Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015. [Google Scholar]
  43. Kokhlikyan, N.; Miglani, V.; Martin, M.; Wang, E.; Alsallakh, B.; Reynolds, J.; Melnikov, A.; Kliushkina, N.; Araya, C.; Yan, S.; et al. Captum: A unified and generic model interpretability library for PyTorch. arXiv 2020, arXiv:2009.07896. [Google Scholar]
  44. Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blondel, M.; Prettenhofer, P.; Weiss, R.; Dubourg, V.; et al. Scikit-learn: Machine Learning in Python. J. Mach. Learn. Res. 2011, 12, 2825–2830. [Google Scholar]
  45. Maiorana, E.; Campisi, P. Longitudinal evaluation of EEG-based biometric recognition. IEEE Trans. Inf. Forensics Secur. 2018, 13, 1123–1138. [Google Scholar] [CrossRef]
  46. van der Maaten, L.; Hinton, G.E. Visualizing Data using t-SNE. J. Mach. Learn. Res. 2008, 9, 2579–2605. [Google Scholar]
  47. Chollet, F. Keras. 2015. Available online: https://github.com/fchollet/keras (accessed on 30 November 2021).
  48. Harris, C.R.; Millman, K.J.; van der Walt, S.J.; Gommers, R.; Virtanen, P.; Cournapeau, D.; Wieser, E.; Taylor, J.; Berg, S.; Smith, N.J.; et al. Array programming with NumPy. Nature 2020, 585, 357–362. [Google Scholar] [CrossRef] [PubMed]
  49. Hunter, J.D. Matplotlib: A 2D graphics environment. Comput. Sci. Eng. 2007, 9, 90–95. [Google Scholar] [CrossRef]
  50. Schalk, G.; McFarland, D.J.; Hinterberger, T.; Birbaumer, N.; Wolpaw, J.R. EEG Motor Movement/Imagery Dataset. 2009. Available online: https://physionet.org/content/eegmmidb/1.0.0/ (accessed on 30 November 2021).
Figure 1. Dividing data into epochs.
Figure 2. An example of the EEG empirical mode decomposition result.
Figure 3. Hilbert–Huang transform EEG spectrogram.
Figure 4. An example of reshaped spectrogram data as a collection of stacked time series for different frequency bins.
Figure 5. The proposed method framework.
Figure 6. t-SNE-projected feature space.
Figure 7. Integrated gradients method output.
Figure 8. Integrated gradients method output for different spectrograms summed over the time dimension for Subjects 1 and 2.
Figure 9. Integrated gradients method output for different spectrograms summed over the time dimension for Subjects 3 and 4.
Figure 10. Integrated gradients method output for different spectrograms summed over the time and channel dimensions for Subjects 1 and 2.
Figure 11. Integrated gradients method output for different spectrograms summed over the time and channel dimensions for Subjects 3 and 4.
Figure 12. Importance of each channel for the model output.
Figure 13. Importance of each frequency bin for the model output.
Table 1. The proposed architecture. Here, f is the number of filters, dr the dropout rate, d the dilation rate, k the kernel size, n the number of neurons, and p the padding type.

Layer | Output Shape | Trainable Parameters
Dropout(dr = 0.7) | (480, 801) | 0
Conv1d(f = 512, k = 5, d = 1, p = "same") | (512, 801) | 1,229,312
BatchNorm1d | (512, 801) | 1024
PReLU | (512, 801) | 512
Conv1d(f = 512, k = 5, d = 2) | (512, 801) | 655,616
BatchNorm1d | (512, 801) | 512
PReLU | (256, 801) | 256
Conv1d(f = 512, k = 5, d = 4) | (256, 777) | 327,936
BatchNorm1d | (256, 777) | 512
PReLU | (256, 777) | 256
Conv1d(f = 512, k = 5, d = 16) | (256, 777) | 327,936
BatchNorm1d | (256, 713) | 512
PReLU | (256, 713) | 256
Dropout(dr = 0.7) | (256, 713) | 0
Conv1d(f = 512, k = 5, d = 32) | (256, 585) | 327,936
BatchNorm1d | (256, 585) | 512
PReLU | (256, 585) | 256
Conv1d(f = 512, k = 5, d = 72) | (256, 297) | 327,936
BatchNorm1d | (256, 297) | 512
PReLU | (256, 297) | 256
Conv1d(f = 512, k = 5, d = 74) | (256, 1) | 327,936
BatchNorm1d | (256, 1) | 512
PReLU | (256, 1) | 256
FullyConnected(n = 256) | (256, 1) | 65,792
PReLU | (256) | 256
FullyConnected(n = 128) | (128, 1) | 32,896
Table 2. Accuracy of the proposed biometric method in different frequency ranges.

Frequency Range | 0–10 Hz | 10–20 Hz | 20–30 Hz | 30–40 Hz | 40–50 Hz | 50–60 Hz
Accuracy | 74.2% | 73.5% | 72.8% | 74.9% | 72.9% | 68.2%
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
