A biometric system can be described as a pattern recognition system that verifies or identifies a person based on biological features. Biometric approaches utilize a person's behavioural and physiological characteristics for this purpose. Examples of behavioural characteristics are the voice and gait, while physiological characteristics include palm prints, iris patterns, fingerprints, and deoxyribonucleic acid (DNA). Biometrics can be further classified into two categories, namely, conventional biometrics [1] and cognitive biometrics [2]. Conventional biometrics use a person's physiological and behavioural properties [3], while cognitive biometrics perform identification based on brain signals that indicate a person's cognitive and emotional state. Features extracted from these brain signals can then serve as traits for biometric identification.
Studies have shown that the electrical signals generated by the human brain have properties that are unique to each individual [4]. Electroencephalography (EEG) is therefore a biometric trait suitable for personal identification [7]. EEG signals are captured by placing electrodes on the scalp. Implementing EEG as a biometric trait offers several advantages: it is highly confidential, as it is generated by the individual brain's activity during recording, and it is difficult to imitate or duplicate. A literature review by Mohammed Abo-Zahhad et al. on the usability of EEG for biometric applications [9] concluded that there is solid evidence that EEG signals possess highly distinctive qualities that make them suitable for biometric identification.
Two types of EEG recordings are commonly used for biometric applications, namely, active-paradigm EEG and task-free-paradigm EEG. During the recording of active-paradigm EEG signals, the subject is required to perform specific tasks or is exposed to external stimuli, eliciting event-related potentials (ERPs) or visually-evoked potentials (VEPs). Several biometric applications have been developed on this basis. In the work of Hema et al. [10], an artificial neural network (ANN) was trained using features extracted from EEG signals recorded during the performance of mental tasks; the Welch algorithm was used to extract the power spectral density (PSD) of the EEG beta waves. Ferreira et al. [11] proposed an EEG-based biometric approach that trained a support vector machine (SVM); spectral power extracted from the gamma frequency band of VEP-EEG was used as the input to the SVM.
While active-paradigm EEG can provide a specific signal segment in which the brain responds to stimuli, it can be time-consuming to set up and requires the subject to follow specific guidelines. To provide a more accessible application of EEG biometrics, the task-free paradigm therefore represents a better option. Task-free-paradigm EEG is a continuous method of EEG recording that does not require external stimuli or the performance of tasks by the subject. It can be divided into the eyes-closed resting state (REC) and the eyes-open resting state (REO). For task-free-paradigm EEG acquisition, the subject is only required to close or open their eyes in a resting state; it is therefore suitable for all individuals, including those who are severely injured or bedridden. Studies in the literature have shown that both REC and REO can differentiate between individuals, thereby strengthening the usability of resting-state EEG as a biometric measure [12].
Several state-of-the-art works have made use of REC or REO for biometric identification. For example, both REC and REO were utilized as attributes by Choi et al. [12] in their biometric system. Alpha activity in the EEG was extracted as a feature, as it has been shown that alpha power becomes stronger during the eyes-closed resting state [17]. Leave-one-out cross-validation with cross-correlation (CC) was implemented to estimate the identification accuracy. It was discovered that individuals' spatio-spectral patterns of alterations in alpha activity differ from one another; thus, EEG can be employed for biometric identification. Entropy features from the gamma, beta, alpha, and theta bands of both REC and REO EEG signals were used by Thomas and Vinod [13] for individual identification by feeding these features into a Mahalanobis distance-based classifier. According to their findings, beta-band entropy had the greatest inter-subject heterogeneity; their work was further enhanced by concatenating PSD features with the entropy features. In the identification system proposed by Suppiah and Vinod [14], PSD features were extracted from both REC and REO EEG signals to train a Fisher linear discriminant classifier. De Vico Fallani et al. [20] utilized the PSD of REC and REO EEG signals to train a Bayes classifier for personal identification. Fraschini et al. [16] proposed an EEG-based biometric identification system incorporating both REC and REO using eigenvector centrality; they observed that the resting-state functional brain network provides better classification than the usual functional brain network. Di et al. [21] conducted a study using two sets of REO and REC EEG recordings taken at two-week intervals. Spectral and statistical analyses were used to extract features from the recordings, and the extracted features were used to train three separate classifiers: an SVM, linear discriminant analysis (LDA), and a Euclidean distance classifier. Their work showed that cross-time resting-state EEG can provide robust EEG identification.
The conventional design of a resting-state biometric system requires feature extraction for the training of classifiers [5]. Feature extraction is a crucial yet challenging task, as vital data must be carefully chosen in order to characterize persons for robust identification. Furthermore, because resting-state EEG signals contain fewer markers than signals recorded under external stimuli, deriving discriminative features from them is even more complicated. It is therefore desirable to automate the search for important features. To overcome the complex feature extraction process, the convolutional neural network (CNN) is one of the most common approaches in works that require automated feature extraction and classification [22].
A CNN is a machine learning approach inspired by biological topology [25]. CNNs were originally developed for image classification tasks, where they presented promising results in pixel-by-pixel analysis. They are therefore suitable for use with EEG signals, as the data points can be organized in matrix form similar to an image [27]. Multilayer perceptrons (MLPs) with several hidden layers form the foundation of a CNN; the hidden layers consist of convolutional layers, pooling layers, and the dense layers of a standard back-propagation neural network. The work by Ma et al. [28] provides a good example of a CNN architecture. Their CNN has five layers. The first convolutional layer uses weighted learnable kernels to generate a feature map from the input [29]. The output of each convolutional layer is a feature map; when characteristics of interest are found in the input, the convolutional layers learn to activate the corresponding regions of the feature maps. The features are pooled in the first pooling layer and then fed forward to the second convolutional layer, which extracts attributes from the first pooling layer's output to produce a new feature map. Next, the second pooling layer pools the features from this map. The resulting feature map is flattened before being passed to the fully connected layer. As in a standard ANN [30], the fully connected layer classifies the features into the corresponding labelled classes. Each hidden layer has learnable parameters that must undergo numerous learning and validation iterations in order to arrive at their ideal values [31].
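The conv → pool → conv → pool → fully-connected pipeline described above can be illustrated with a minimal NumPy sketch; the input size, kernel values, and class count below are illustrative toy values, not the architecture of [28]:

```python
import numpy as np

def conv2d(x, k):
    """'Valid' 2-D convolution (cross-correlation) of a single-channel input."""
    H, W = x.shape
    kh, kw = k.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return out

def avg_pool2(x):
    """2 x 2 average pooling with stride 2."""
    H, W = x.shape
    return x[:H // 2 * 2, :W // 2 * 2].reshape(H // 2, 2, W // 2, 2).mean(axis=(1, 3))

relu = lambda x: np.maximum(0.0, x)

rng = np.random.default_rng(0)
x = rng.standard_normal((28, 28))      # toy single-channel input matrix
k1 = rng.standard_normal((5, 5))       # learnable 5 x 5 kernels (random here)
k2 = rng.standard_normal((5, 5))
h = avg_pool2(relu(conv2d(x, k1)))     # conv -> ReLU -> pool: 28 -> 24 -> 12
h = avg_pool2(relu(conv2d(h, k2)))     # conv -> ReLU -> pool: 12 -> 8 -> 4
W_fc = rng.standard_normal((10, 16))   # fully connected layer: 16 features -> 10 classes
logits = W_fc @ h.flatten()            # flatten the final feature map, then classify
probs = np.exp(logits - logits.max())  # softmax over the class scores
probs /= probs.sum()
```

In a trained network, the kernels `k1`, `k2` and the weights `W_fc` would be learned through the backpropagation iterations described above rather than drawn at random.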
Current interest in implementing CNNs for EEG-based biometric applications is high, as shown in [28]. Recent works are summarized in Table 1. It can be seen that the CNN architectures in these studies are relatively shallow (no more than ten layers). However, a shallower architecture with limited convolutional layers might not be effective in extracting finer features. Fine features provide more information for the training of CNNs, making them more robust across different individuals. Furthermore, several works do not apply batch normalization, even though it has proven effective in improving the performance of CNN architectures. In addition, task-free-paradigm EEG signals remain under-explored with respect to the use of CNNs, even though they are easier and more convenient to acquire.
Therefore, the main objective of this paper is to design a task-free EEG biometric identification approach using a deep CNN architecture. Four experiments were carried out to ensure that the designed deep CNN architecture provides high identification accuracy. The proposed method is further compared with existing CNN approaches in order to investigate its identification performance. This paper is organized into four sections. The proposed CNN architecture is explained in detail in Section 2, including the dataset, pre-processing, and parameters used; the same section describes the experiments used to design the deep CNN architecture for biometric identification. The results of the experiments are discussed in Section 3. Finally, Section 4 concludes the paper with the findings of this study.
3. Results and Discussion
This section is divided into six subsections. Section 3.1 presents the results of Experiment 1, Section 3.2 provides the results of Experiment 2, Section 3.3 presents the results of Experiment 3, and Section 3.4 provides the results of Experiment 4. Section 3.5 describes the final CNN architecture, and Section 3.6 compares the performance of the proposed CNN architecture with related works.
3.1. Selection of the Optimal Number of Convolutional Layers and Type of Resting-State EEG Input
The convolutional layer is the key component that determines the performance of any CNN architecture. It extracts the important features that provide information during the training process. The number of convolutional layers affects the output feature maps that are passed into the fully connected layer; it must therefore be carefully selected to ensure effective feature extraction without losing information that is critical to obtaining a good identification result. Training and testing were carried out separately on REC, REO, and REC + REO with different numbers of convolutional layers. The convolutional layers (Conv) used in this experiment consisted of six filters with a kernel size of 5 × 5. One average pooling layer (avgpool) with a filter size of 2 × 2 was placed before a fully connected layer (FC) with 109 neurons.
A summary of this experiment is presented in Table 3. The highest identification accuracy, shown in bold, was achieved by the 8 conv + 1 avg pool + 1 FC architecture using the REO dataset. Its eighth convolutional layer extracted a 32 × 128 × 6 feature map, which contains the most information from the input. From the results, it can be seen that the REO dataset provides higher identification accuracy with all CNN architectures, suggesting that REO EEG signals are more efficient in discriminating between individuals. Furthermore, the results suggest that there is no proportional relationship between the number of convolutional layers and the identification accuracy.
3.2. Selection of the Optimal Number of Fully Connected Layers
The fully connected layer in a CNN architecture represents the feed-forward neural network, which learns the information in the feature map and classifies each individual's EEG with the correct label. The fully connected layer initiates the backpropagation learning process of the CNN architecture and confers the most accurate weights. Selecting the optimal number of fully connected layers can affect the learnability of a CNN architecture. The CNN architecture and dataset with the highest identification accuracy in Experiment 1 were used for training and testing with different numbers of fully connected layers. The results of Experiment 2 are summarized in Table 4. It can be seen that the CNN architecture with two FC layers has the highest identification accuracy, while the CNN architectures with more than two FC layers show degraded identification accuracy. One FC layer is not sufficient for learning from the final feature maps; on the other hand, using a higher number of FC layers increases the architectural complexity and carries the risk of overfitting.
3.3. Selection of the Optimal Number and Type of Pooling Layers
The pooling layer gradually lowers the spatial size of the feature map in order to reduce computational complexity. Max pooling and average pooling are the two most widely used types of pooling. Max pooling applies a max-pooling filter to subregions of the initial representation and retains the maximum value in each filter region; average pooling instead applies an averaging filter to obtain the mean value in each filter region. In Experiment 3, training and testing were carried out with different numbers of average pooling layers, with a max-pooling layer, and with different placements of the pooling layer. This experiment used the CNN architecture previously optimized through Experiments 1 and 2. The results of the experiment using one, two, and three average pooling layers are presented in Table 5, Table 6 and Table 7. In this comparison, the CNN architecture with one average pooling layer presented the highest identification accuracy. Using more than one average pooling layer can cause too much information loss due to the downsizing of the feature map. The results of this experiment suggest that placing one average pooling layer before the fully connected layer can effectively discretize the feature map while retaining important information before passing it to the fully connected layer.
The training and testing processes were repeated using a max-pooling layer, with the results summarized in Table 8, Table 9 and Table 10. The results for the CNN architecture using a max-pooling layer follow the trend of the average pooling results; however, the max-pooling layer yields slightly lower identification accuracy. This is because average pooling includes all features in the computation and propagates the result to the next layer, so that all information from the previous feature map can be used for feature mapping and for creating an informative output; it is a very generalized computation. In comparison, max pooling extracts only extreme values, which can cause information loss in the pooling process.
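The difference between the two pooling types can be seen on a toy 4 × 4 feature map (illustrative values only): max pooling keeps one extreme value per region, while average pooling blends every value into the output.

```python
import numpy as np

fmap = np.array([[0.2, 0.9, 0.1, 0.0],
                 [0.3, 0.4, 0.2, 0.1],
                 [0.0, 0.1, 0.8, 0.6],
                 [0.2, 0.3, 0.5, 0.7]])

def pool(x, reduce_fn, size=2):
    """Apply a size x size pooling filter with stride = size."""
    h, w = x.shape
    blocks = x.reshape(h // size, size, w // size, size)
    return reduce_fn(blocks, axis=(1, 3))

max_pooled = pool(fmap, np.max)   # keeps only the extreme value per 2x2 region
avg_pooled = pool(fmap, np.mean)  # averages every value per 2x2 region
```

In the top-left 2 × 2 region, for instance, max pooling keeps 0.9 and discards the other three values, whereas average pooling outputs 0.45, retaining a contribution from all four.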
3.4. Selection of the Optimal Optimizer
An optimizer plays an important role in minimizing the training error rate and ensuring that the CNN architecture converges efficiently. To select the most suitable optimizer for our application, SGD and Adam were compared using the architecture from Experiment 3; both were selected because they are among the most popular and widely used optimizers. SGD uses gradient descent to update the weights and biases in the network via backpropagation. The Adam optimizer, on the other hand, performs gradient-based optimization of stochastic objective functions, combining the benefits of two SGD advancements: the Adaptive Gradient Algorithm (AdaGrad) and Root Mean Square Propagation (RMSProp). It computes individual adaptive learning rates for the various parameters. At the same time, Adam inherits the advantage of SGD with momentum, in that the direction towards the local minima can be determined by relying on momentum.
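A single Adam update can be sketched as follows; this is a minimal NumPy rendering of the standard update rule with its usual default hyperparameters, not the authors' code. The first moment provides the momentum term inherited from SGD, and the second moment provides the per-parameter adaptive scaling inherited from RMSProp:

```python
import numpy as np

def adam_step(w, grad, m, v, t, lr=0.001, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update: momentum (m) and RMS scaling (v) with bias correction."""
    m = b1 * m + (1 - b1) * grad           # first moment: momentum direction
    v = b2 * v + (1 - b2) * grad ** 2      # second moment: per-parameter scale
    m_hat = m / (1 - b1 ** t)              # bias correction for zero initialisation
    v_hat = v / (1 - b2 ** t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v

# toy descent on the quadratic loss 0.5 * ||w||^2, whose gradient is w itself
w = np.array([1.0, -2.0])
m = v = np.zeros_like(w)
for t in range(1, 101):
    w, m, v = adam_step(w, grad=w, m=m, v=v, t=t)
```

Note how the effective step for each parameter is roughly `lr` regardless of the raw gradient magnitude; this per-parameter scaling is what the comparison in this experiment probes.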
The results of this experiment are presented in Table 11. They indicate that the CNN implementing Adam achieves the higher identification accuracy. The Adam optimizer also presents a lower standard deviation, indicating that training the CNN architecture with the Adam optimizer yields more consistent identification accuracy. According to these experimental findings, the advantages of the Adam optimizer facilitate effective learning: the design demonstrates improved performance and converges smoothly to the local minima. The learning of the architecture was improved by computing a specific learning rate for each weight and bias. The Adam optimizer was therefore chosen as the optimizer for the proposed architecture.
3.5. Proposed Convolutional Neural Network Architecture
Based on the experimental results, a deep CNN with eleven layers is proposed. The literature shows that most related works have used shallower CNN architectures. A shallow CNN architecture can cause information loss, as the feature maps extracted at the convolutional layers are not fine enough. Furthermore, several related works do not implement batch normalization, even though studies have proven that batch normalization is crucial to improving the performance of CNNs [44]. In our proposed method, batch normalization is included in the design and placed after each convolutional layer. The final CNN architecture used in this study is shown in Table 12, following the architecture's layer sequence. There are a total of eleven layers, consisting of eight convolutional layers, one average pooling layer, and two fully connected layers.
For this study, the input to the CNN is a 64 × 160 matrix. The convolutional layers are made up of six 5 × 5 filters; this small filter size was chosen to extract more precise information and orientation from the signal. The input is passed into the first convolutional layer, creating a 60 × 156 × 6 feature map. For each convolutional layer, the rectified linear unit (ReLU) is used as the activation function thanks to its sparsity and its reduced likelihood of vanishing gradients. The ReLU function's primary advantage over other activation functions is that it does not fire all of the neurons simultaneously: neurons receiving negative input values are not activated. Compared to the sigmoid and tanh functions, the ReLU function is significantly more computationally efficient, as only a small subset of neurons is active [24].
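The sparsity argument can be seen directly; with illustrative pre-activation values, ReLU zeroes every negative entry, so only a fraction of the neurons fire:

```python
import numpy as np

relu = lambda x: np.maximum(0.0, x)   # neurons with negative input are not triggered

pre_activations = np.array([-1.2, 0.5, -0.3, 2.0, -0.7])  # made-up example values
out = relu(pre_activations)           # -> [0., 0.5, 0., 2., 0.]
active_fraction = np.count_nonzero(out) / out.size        # 2 of 5 neurons active
```

Because the function is a simple threshold, it is also far cheaper to evaluate than the exponentials inside sigmoid or tanh.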
The resulting feature map proceeds through the second convolutional layer, generating a 56 × 152 × 6 feature map. The third convolutional layer then produces a 52 × 148 × 6 feature map, the fourth a 48 × 144 × 6 feature map, the fifth a 44 × 140 × 6 feature map, the sixth a 40 × 136 × 6 feature map, and the seventh a 36 × 132 × 6 feature map. The eighth convolutional layer outputs a feature map of size 32 × 128 × 6, which is passed to the average pooling layer, creating an output of 16 × 64 × 6. After being flattened, this output is sent to the fully connected layers. The fully connected layers use the softmax activation function, as it outputs a probability for each class and is therefore well suited to multi-class classification. The feature map output of each hidden layer is summarized in Table 13.
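The feature-map chain above can be verified with simple shape arithmetic, assuming unpadded 5 × 5 convolutions with unit stride and 2 × 2 average pooling with stride 2:

```python
def conv(h, w, k=5):               # unpadded k x k convolution, stride 1
    return h - k + 1, w - k + 1

def pool(h, w, s=2):               # s x s average pooling, stride s
    return h // s, w // s

shape = (64, 160)                  # input matrix: 64 x 160
trace = []
for _ in range(8):                 # eight convolutional layers
    shape = conv(*shape)
    trace.append(shape)
pooled = pool(*shape)              # one average pooling layer
flat = pooled[0] * pooled[1] * 6   # flattened length with six filters per layer
```

The trace reproduces the sequence 60 × 156 down to 32 × 128, with the pooled 16 × 64 × 6 map flattening to 6144 values for the fully connected layers.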
Seven parameters are fixed, as shown in Table 14. The learning rate is fixed at 0.001 and remains constant throughout CNN training. Batch normalisation is carried out after each convolutional layer. The mini-batch size for each training iteration is 128. A single epoch through the CNN requires 36 iterations, as one epoch covers 4578 training subsets. The number of training epochs is set at 30, a moderate value chosen to prevent the network from undergoing an excessively high number of iterations; on the other hand, too few epochs can result in underfitting of the network, as the repetitions of training would be insufficient. In this experiment, regularization was employed with a regularization factor of 0.0005 to prevent overfitting. The Adam optimizer was selected for back-propagation during training of the CNN. The Glorot initializer [45], sometimes referred to as the Xavier initializer, was used to pre-define the weights of the proposed architecture. This initializer randomly draws samples from a uniform distribution with zero mean and a variance determined by the number of input and output neurons of each layer.
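As a sketch, the Glorot/Xavier initialisation and the iteration count per epoch can be written as follows; the fan-in of 6144 is our own illustrative value (the flattened 16 × 64 × 6 feature map), and 109 is the number of subject classes:

```python
import math
import numpy as np

def glorot_uniform(fan_in, fan_out, seed=0):
    # Xavier/Glorot: uniform on [-limit, limit] with limit = sqrt(6 / (fan_in + fan_out)),
    # giving zero mean and variance 2 / (fan_in + fan_out)
    rng = np.random.default_rng(seed)
    limit = math.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_out, fan_in))

W = glorot_uniform(fan_in=6144, fan_out=109)   # e.g. first fully connected layer

# 4578 training subsets with a mini-batch size of 128:
iters_per_epoch = math.ceil(4578 / 128)        # -> 36
```

Scaling the initial weight variance by the layer's fan-in and fan-out keeps activation magnitudes comparable across the eleven layers at the start of training.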
3.6. Comparison of the Proposed CNN Architecture with State-of-the-Art CNNs
Our proposed CNN architecture is compared with existing CNN-based EEG biometric identification applications in Table 15. Referring to Table 1, it can be seen that our eleven-layer CNN architecture is deeper than the others. Table 15 shows that our proposed architecture achieves higher identification accuracy than all of the other works except those of Gui et al. [32] and Wang et al. [38]. Despite using a shallower architecture, the work by Gui et al. [32] achieves a higher identification accuracy; their work used active-paradigm EEG signals. Active-paradigm EEG signals offer an advantage over the task-free paradigm in that discriminating features are easier to detect: stimulations or tasks performed at certain time points can become useful markers that facilitate feature extraction by the convolutional layers. However, active paradigms require a longer setup, and may not be applicable for individuals who have lost their cognitive abilities.
Wang et al. [38] used a variety of EEG input types, combining active-paradigm and task-free-paradigm inputs; such an implementation inherits the advantages of both active and task-free paradigm EEG signals. Our proposed method uses only REO EEG as input. By tolerating a slight loss of identification accuracy, our proposed CNN architecture can provide high-speed identification, as the subject is only required to open their eyes.
Furthermore, in exploring the most suitable type of pooling layer, it can be observed that all of the active-paradigm works implemented max pooling and obtained high identification accuracy. However, max pooling does not perform well with task-free paradigms. The purpose of max pooling is to obtain extreme values as features from the input EEG signals; such extremes are present in active-paradigm EEG when subjects are exposed to external stimulation or are required to perform tasks, and peaks or changes in the signals can be extracted efficiently using max pooling. Our experiment shows that average pooling instead works better with task-free-paradigm signals. Average pooling pools the features together by taking the average value, which includes all of the data, so that less information is lost in the process. Ma et al. [28] implemented average pooling in their work as well; however, their work presented lower identification accuracy due to a shallower CNN architecture, which caused the final feature maps to be insufficiently fine.
For additional comparison, the activation functions and optimizers used are tabulated in Table 15, although some works did not report them. All of the works that did used ReLU as the activation function, again indicating the advantages of ReLU. As for the optimizer, Adam and SGD were the most commonly used; it was therefore worthwhile for us to conduct experiments on SGD and Adam in order to determine the better optimizer for our proposed architecture.
Overall, our proposed architecture contains eleven layers, which is deeper than any of the existing works. With a suitable number of convolutional layers, a deeper CNN architecture can extract an informative feature map, thereby improving identification accuracy. Although a shallow CNN architecture is less complex, a lower number of convolutional layers is insufficient to extract important features from task-free-paradigm EEG signals. Moreover, the comparison indicates that average pooling is more suitable for task-free-paradigm signals, while max pooling is better suited to active-paradigm signals.