MCI Detection Using Kernel Eigen-Relative-Power Features of EEG Signals

Abstract: Classification between individuals with mild cognitive impairment (MCI) and healthy controls (HC) based on electroencephalography (EEG) is considered a challenging task that must be addressed for the early detection of MCI. In this study, we proposed a novel EEG feature, the kernel eigen-relative-power (KERP) feature, for achieving high classification accuracy of MCI versus HC. First, we introduced the relative powers (RPs) between pairs of electrodes across 21 different subbands of 2-Hz width as the features, which have not been used in previous MCI-HC classification studies. Next, Fisher's class separability criterion was applied to determine the best electrode pairs (five electrodes in total) as well as the frequency subbands for extracting the most sensitive RP features. The kernel principal component analysis (kernel PCA) algorithm was then applied to extract a few more discriminating nonlinear principal components from the optimal RPs, and these components form a KERP feature vector. Results obtained from 51 participants (24 MCI and 27 HC) show that the newly introduced subband RP feature achieved better classification performance than commonly used spectral power features, including the band power, the single-electrode relative power, and the RP based on the conventional frequency bands. A leave-one-participant-out cross-validation (LOPO-CV) classification accuracy of 86.27% was achieved by the RP feature using a simple linear discriminant analysis (LDA) classifier. With the same classifier, the proposed KERP further improved the accuracy to 88.24%. Finally, cascading the KERP feature to a nonlinear classifier, the support vector machine (SVM), yielded an MCI-HC classification accuracy of 90.20% (sensitivity = 87.50% and specificity = 92.59%). The proposed method demonstrated high accuracy and high usability (only five electrodes are required), and therefore has great potential for the further development of an EEG-based computer-aided diagnosis system for the early detection of MCI.


Introduction
Mild cognitive impairment (MCI) is a state between cognitive decline with normal aging and cognitive impairments caused by dementia in which Alzheimer's disease (AD) is In addition to the left-right asymmetry, anterior-posterior difference of theta power was also found to be a potential feature for the MCI-HC classification [14]. Based on the literature, we expect that the neural activities difference between brain regions, including, but not limited to, the left-right and anterior-posterior differences, may differ strongly for the two groups of MCI and HC.
In the EEG-based major depressive disorder (MDD) studies (e.g., [34][35][36]), activity difference between two brain areas is often represented by the relative power between a pair of electrodes (referred to as relative power hereafter), which is different from the relative power used in previous MCI studies [17,18,20] (referred to as single-electrode relative power hereafter, for the sake of distinction). To our knowledge, no prior study has used the EEG power difference between electrodes (i.e., the relative power) to improve EEG-based classification of MCI versus HC. Therefore, we first extracted and analyzed the relative power features from all possible regional inter-hemispheric, intra-hemispheric, and cross-regional inter-hemispheric electrode pairs for all possible frequency subbands. Next, we determined the optimal relative power feature subset by a feature selection method based on Fisher's class separability criterion [37,38] and a top-ranked feature selection strategy [36,39]. After feature selection, we applied the kernel principal component analysis (kernel PCA) [40], which is capable of computing higher-order statistics among the original data and has shown its effectiveness in other domains of EEG applications (e.g., emotion recognition [41] and MDD detection [42]), to extract a set of more distinguishing components from the optimal relative power subset. Finally, a kernel eigen-relative-power (KERP) pattern formed by the extracted kernel PCA components is used as a novel EEG feature for the classification of MCI versus HC.
In the present study, since the proposed feature of interest was based on the spectral power of EEG, we compared the MCI-HC classification performance of the relative power feature with that of spectral power features commonly used in previous studies. To preview the results, the relative power outperformed the other types of spectral power features and achieved a subject-independent classification accuracy of ~86% by leave-one-participant-out cross validation when a simple linear discriminant analysis (LDA) classifier was used. Next, we showed that the classification accuracy can be further improved (up to 90%) by using the proposed KERP feature, i.e., by cascading kernel PCA to the optimal relative power features. Finally, the classification performance of several commonly used classifiers, including k-NN, LDA, quadratic discriminant analysis (QDA, a variant of LDA), and a support vector machine (SVM), was compared to check whether a higher accuracy could be achieved. Our results showed that the combination of KERP and SVM achieved the best classification accuracy of 90.20%.

Participants
The present study enrolled 24 participants with MCI (14 females, mean age of 70.96 ± 8.2 y/o) in the experimental group and 27 healthy individuals (17 females, mean age of 69.93 ± 4.98 y/o) in the control group. Data were collected at a Memory Clinic of a tertiary 2700-bed referral medical center. Based on the results of clinical interviews, laboratory reports, brain imaging findings, and performance on neurocognitive assessments, the diagnostic impression was made under clinical consensus by a group of board-certified healthcare providers. Recommendations from the National Institute on Aging and the Alzheimer's Association (NIA-AA) [43,44] were used as the core clinical criteria for diagnosis of MCI. Individuals in the control group were recruited via advertisement and confirmed as not having any condition for all-cause dementia listed in the NIA-AA criteria.
All healthy controls received a standardized neuropsychological assessment battery, and their results were within the normal range after adjustment for education [45]. For both groups, the exclusion criteria were (1) current major psychiatric comorbidity (clinically diagnosed in the 6 months prior to the baseline neuropsychological evaluation), (2) motor and/or sensory deficits that may confound the examination of cognitive performance, and (3) neurological diseases or conditions that may influence cognitive function. The demographics and neuropsychological data of the participants are shown in Figure 1. The study protocol was reviewed and approved by the institutional review board of Taipei Veterans General Hospital (IRB No: 2017-06-009A). Written informed consent was obtained from each participant or their legal guardian according to the Declaration of Helsinki before participating in the study.
p-values show the significance of statistical testing of the differences between the two groups. A p-value was considered significant when p < 0.01. * denotes p < 0.01.

EEG Recording and Data Collection
A 33-channel electro-cap was used to record EEG signals. The layout of the Ag/AgCl electrodes followed the international 10-20 system (see Figure 2). The ground electrode was at the forehead and was also the physical reference electrode. The impedance of the electrodes was kept below 10 kΩ by applying electrode gel. The EEG data were digitally re-referenced online to the average of the A1 and A2 potentials. For example, the re-referenced Cz signal is Cz − (A1 + A2)/2. The remaining 30 channels were all used to record EEG signals for analysis. Electrooculography (EOG) signals were monitored by electrodes attached at the right side of the right eye and above the left eye, respectively. Both EEG and EOG signals were amplified, band-pass filtered (0.5-100 Hz), and converted to digital signals with a sampling frequency of 500 Hz using an EEG amplifier (NuAmp, NeuroScan Inc). The EOG artifacts were then removed from the EEG signals using the artifact removal tool in Scan 4.5 [46] provided by NeuroScan, which is based on an EEG-VEOG covariance analysis, a linear regression procedure, and the creation and use of an LDR file to perform a point-by-point proportional subtraction of the eye blinks. Other possible artifact components, such as generic discontinuities and electromyography (EMG) signals, were further removed by using independent component analysis (ICA, Infomax, EEGLAB [47]) and the ADJUST algorithm [48]. After these preprocessing steps, the EEG data were used for analysis.
The EEG recordings include two eyes-open resting-state sessions. During each resting-state session, participants were asked to maintain fixation on a black cross presented centrally on a gray background on a personal computer's 22-inch display for 90 s, to stay relaxed, and to try not to think about anything on purpose. We thus recorded 180 s of resting-state EEG signals (90 s/round × 2 rounds) from each participant. The EEG signal per electrode was then divided into 118 overlapping time windows of 3-s length with a shift of 1.5 s. Note that between the two resting-state EEG sessions, participants performed working memory tasks for other research purposes, which are irrelevant to the current study and will not be discussed.
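The window count can be checked with a short calculation. The sketch below assumes that windows are cut within each 90-s session separately and never span the two sessions; the text does not state this, but it is the reading that reproduces the reported 118 windows. The function name `count_windows` is hypothetical.

```python
def count_windows(duration_s: float, window_s: float, step_s: float) -> int:
    """Number of full windows of length window_s, shifted by step_s, that fit in duration_s."""
    if duration_s < window_s:
        return 0
    return int((duration_s - window_s) // step_s) + 1


# Assumption: each 90-s resting-state session is segmented on its own.
per_session = count_windows(90.0, 3.0, 1.5)  # 3-s windows, 1.5-s shift
total = 2 * per_session                      # two sessions per participant
print(per_session, total)
```

Under this assumption each session contributes 59 windows, so the two sessions together give the 118 windows per electrode stated above.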

Figure 2.
Layout of the electrodes. Their positions follow the 10-20 international system. Ground (GND) electrode is at the forehead, and two references are at A1 and A2, respectively. For analysis, we divided the entire scalp area into five different regions, including frontal (brown), central (green), parietal (blue), occipital (purple), and temporal (yellow).

Feature Extraction
In this section, we introduce the procedures for the extraction of the relative power (RP) feature, the selection of optimal RP features, and the extraction of kernel eigen-relative-power (KERP) features.

Optimal Relative Power Feature Selection
Feature selection can be accomplished by embedded, wrapper, or filter approaches [49]. Embedded and wrapper approaches directly use generalization performance (often represented by cross validation accuracy) as the evaluation criterion to determine the optimal features, and thus both are classifier-specific [50]. However, their computational complexity is relatively high because the classifier's training process is involved in the entire feature selection process. This drawback is magnified as the number of feature candidates increases. By contrast, the filter approach evaluates the features directly by using predefined classifier-independent criteria, and is therefore more efficient to implement. Considering that we had a huge number of feature candidates to be evaluated (9135 RP features in total), the filter approach is more appropriate for the feature selection. Accordingly, a popular filter approach, Fisher's class separability criterion [37], was chosen for evaluating the RP features in the current study.
Given $n_f$ feature candidates, Fisher's method calculates a Fisher score (F-score) for each feature (see [38] for the calculation of the F-score). A higher F-score corresponds to a higher ratio of the between-class separability to the within-class variation. We calculated the F-scores of the 9135 RP features ($n_f = 9135$), and then ranked the features in descending order according to their F-scores. Then, we selected the top 50 ranked RP features as the candidates ($m = 50$, where $m$ denotes the maximum number of features that actually participate in the feature selection), and applied the top-n-F-score-ranked feature selection strategy [36,39] to the 50 features. For every $n$, $1 \le n \le m$, a leave-one-participant-out cross validation (LOPO-CV) procedure was applied to the top $n$ features, where an LDA classifier was employed to test the classification accuracy. In each fold of the LOPO-CV, data (n-dimensional vectors) from 50 participants were used as the training set, and the n-dimensional data from the remaining participant were used as the test data. This step was repeated until every participant's data had been used as the test data once. After the LOPO-CV procedure was finished for a specific $n$, we calculated the classification accuracy as the number of correctly classified participants divided by the total number of participants from the two groups. The best $n$ (denoted as $n_{best}$) was the one yielding the highest LOPO-CV classification accuracy across the range $1 \le n \le m$. The top $n_{best}$ RP features formed the optimal RP subset. The entire feature selection process is summarized as the four steps in Figure 3. Based on the optimal RP features, the KERP is derived as follows.
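The ranking-and-search procedure described above can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: the F-score uses a common two-class form, (difference of class means)² / (sum of class variances), which may differ from the definition in [38]; each participant is assumed to contribute a single feature vector; the LDA is a plain pooled-covariance version with equal priors; and the data are synthetic. All function names are hypothetical.

```python
import numpy as np


def fisher_scores(X, y):
    """Assumed two-class F-score per feature: (mu1 - mu0)^2 / (var1 + var0)."""
    X0, X1 = X[y == 0], X[y == 1]
    num = (X1.mean(axis=0) - X0.mean(axis=0)) ** 2
    den = X1.var(axis=0) + X0.var(axis=0) + 1e-12  # guard against zero variance
    return num / den


def lda_fit_predict(X_train, y_train, X_test):
    """Two-class LDA with pooled covariance and equal class priors."""
    X0, X1 = X_train[y_train == 0], X_train[y_train == 1]
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    Sw = (X0 - mu0).T @ (X0 - mu0) + (X1 - mu1).T @ (X1 - mu1)
    # Small ridge term added only for numerical stability (an assumption).
    Sw = Sw / (len(X_train) - 2) + 1e-6 * np.eye(X_train.shape[1])
    w = np.linalg.solve(Sw, mu1 - mu0)
    threshold = w @ (mu0 + mu1) / 2.0
    return (X_test @ w > threshold).astype(int)


def lopo_accuracy(X, y):
    """Leave-one-participant-out accuracy; one row of X per participant."""
    correct = 0
    for i in range(len(y)):
        train = np.arange(len(y)) != i
        pred = lda_fit_predict(X[train], y[train], X[i:i + 1])
        correct += int(pred[0] == y[i])
    return correct / len(y)


def select_top_n(X, y, m=50):
    """Rank features by F-score, then pick the n in 1..m with the best LOPO-CV accuracy."""
    order = np.argsort(fisher_scores(X, y))[::-1]
    m = min(m, X.shape[1])
    accs = [lopo_accuracy(X[:, order[:n]], y) for n in range(1, m + 1)]
    n_best = int(np.argmax(accs)) + 1  # first maximum, i.e., the smallest such n
    return order[:n_best], accs[n_best - 1]


# Synthetic stand-in data (NOT the study's EEG): 51 participants, 200 features.
rng = np.random.default_rng(0)
y = np.array([1] * 24 + [0] * 27)
X = rng.normal(size=(51, 200))
X[y == 1, :3] += 1.5  # make the first three features informative
selected, acc = select_top_n(X, y, m=20)
print(selected, round(acc, 4))
```

As in the text, the features are ranked once on all participants and only the subset size n is chosen by LOPO-CV; ties between equally accurate values of n are resolved here in favor of the smallest n, which is a choice made for this sketch.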

Kernel Eigen-Relative-Power (KERP) Extraction
Given a training set $\{x_i \in \mathbb{R}^n\}_{i=1}^{M}$ containing $M$ training data of both classes (MCI and HC), where each $x_i$ is an n-dimensional feature vector whose elements are the $n$ optimal RP features ($n = n_{best}$), kernel PCA maps all the training data into a higher-dimensional feature space $F$ via a nonlinear mapping $\phi: \mathbb{R}^n \to F$ [40], and then diagonalizes the covariance of the $\phi(x_i)$'s as

$$Cv = \lambda v, \qquad (2)$$

where $C = \frac{1}{M}\sum_{i=1}^{M} \phi(x_i)\phi(x_i)^T$ and $v$ is an eigenvector associated with a nonzero eigenvalue $\lambda$. All solutions $v$ lie within the span of the mapped data with associated spanning coefficients $a_i$, $i = 1, \ldots, M$, i.e., $v = \sum_{i=1}^{M} a_i \phi(x_i)$, and are therefore nonlinearly related to the RP features in the original space. Solving the problem expressed as (2) is equivalent to solving the eigenvalue problem

$$M\lambda a = Ka, \qquad (3)$$

where $K$ is the $M \times M$ kernel matrix with entries $K_{ij} = \phi(x_i)^T \phi(x_j) = k(x_i, x_j)$, and $a_k = (a_{k1}, \ldots, a_{kM})^T$ are the eigenvectors of the kernel matrix subject to the normalization constraint $\|a_k\|^2 = 1/\lambda_k$, $\forall k = 1, \ldots, M$. The first $d$ leading eigenvectors $v$ are chosen as the projection axes. For a test datum $x$, the projection of its image $\phi(x)$ onto the $p$th eigenvector $v_p$ is calculated as

$$y_p = v_p^T \phi(x) = \sum_{i=1}^{M} a_{pi}\, k(x_i, x). \qquad (4)$$

The nonlinear principal components corresponding to $\phi$ form a d-dimensional kernel eigen-relative-power (KERP) pattern $y = (y_1, \ldots, y_d)^T$, $1 \le d \le M$. The KERP feature extraction method involves two parameters: the number of chosen eigenvectors $d$ and the kernel parameter. Here, we employed the Gaussian kernel

$$k(x, y) = \exp\left(-\frac{\|x - y\|^2}{2\sigma^2}\right), \qquad (5)$$

where $\sigma$ is the parameter of the kernel function.
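A minimal NumPy sketch of this extraction step is given below. It makes several assumptions that the text does not confirm: the Gaussian kernel is taken as exp(−‖x − y‖² / (2σ²)), the mapped data are centered in the feature space (the usual kernel PCA convention when diagonalizing a covariance), and the eigenvectors are scaled by the eigenvalues of the centered kernel matrix. The names `kerp_fit` and `kerp_transform` are hypothetical, and the data are random stand-ins.

```python
import numpy as np


def gaussian_kernel(A, B, sigma):
    """Gaussian kernel matrix between the rows of A and the rows of B."""
    sq = (A ** 2).sum(axis=1)[:, None] + (B ** 2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma ** 2))


def kerp_fit(X_train, d, sigma):
    """Fit kernel PCA on the M training vectors and keep the d leading axes."""
    M = X_train.shape[0]
    K = gaussian_kernel(X_train, X_train, sigma)
    ones = np.full((M, M), 1.0 / M)
    K_c = K - ones @ K - K @ ones + ones @ K @ ones  # centering in feature space (assumed)
    lam, A = np.linalg.eigh(K_c)                     # eigenvalues in ascending order
    lam, A = lam[::-1][:d], A[:, ::-1][:, :d]        # keep the d leading eigenpairs
    if np.any(lam <= 1e-12):
        raise ValueError("d exceeds the number of nonzero eigenvalues")
    A = A / np.sqrt(lam)                             # enforce ||a_k||^2 = 1 / lambda_k
    return {"X": X_train, "K": K, "A": A, "sigma": sigma}


def kerp_transform(model, X_new):
    """Project new data onto the retained axes: y_p = sum_i a_pi k(x_i, x)."""
    K = model["K"]
    M = K.shape[0]
    K_new = gaussian_kernel(X_new, model["X"], model["sigma"])
    ones = np.full((M, M), 1.0 / M)
    ones_new = np.full((X_new.shape[0], M), 1.0 / M)
    K_new_c = K_new - ones_new @ K - K_new @ ones + ones_new @ K @ ones
    return K_new_c @ model["A"]


# Random stand-in data (NOT the study's features): 50 training vectors of 7 RP features.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(50, 7))
x_test = rng.normal(size=(1, 7))
model = kerp_fit(X_train, d=3, sigma=2.0)
Y_train = kerp_transform(model, X_train)
y_test = kerp_transform(model, x_test)
print(Y_train.shape, y_test.shape)
```

In a LOPO-CV setting, `kerp_fit` would be called on the 50 training participants of each fold and `kerp_transform` on the held-out participant, so that the test datum never influences the projection axes.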

Classification and Parameter Optimization
In the present study, the classification performances of four common classifiers, including k-NN, LDA, QDA, and SVM, were compared. Please see the authors' previous work [36] for the decision functions of these classifiers. In both LDA and QDA, the penalties for the positive and negative classes were set to 1, because we treated the two classes (MCI and HC) equally in the training. The Gaussian kernel in (5) was also used for the SVM. Therefore, among the four classifiers, only the SVM involves free parameters, namely the penalty weight $C$ and the kernel parameter $\sigma$. Likewise, among the feature extraction methods (including the KERP and the other spectral features to be compared), only the KERP involves free parameters. All the free parameters were optimized using LOPO-CV and a grid search. For the SVM, the grid points were in the set $\{(C, \sigma) \mid C = 1, 10, 100, 1000;\ \sigma = 1.05^x,\ x = \pm 1, \pm 1.5, \pm 2, \ldots, \pm 30\}$, while for the KERP the grid points to be searched were in the set $\{(d, \sigma) \mid d = 1, 2, \ldots, 20;\ \sigma = 1.05^x,\ x = \pm 1, \pm 1.5, \pm 2, \ldots, \pm 30\}$. The optimal grid point is the one resulting in the highest LOPO-CV classification accuracy.
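The two search grids can be enumerated as in the sketch below. It assumes that the elided exponents continue in steps of 0.5 up to ±30, which is how the sequence ±1, ±1.5, ±2 reads; the variable names are hypothetical.

```python
from itertools import product

# Exponents x = ±1, ±1.5, ±2, ..., ±30 (assumed step of 0.5), giving sigma = 1.05 ** x.
magnitudes = [1 + 0.5 * k for k in range(59)]  # 1.0, 1.5, ..., 30.0
exponents = [sign * v for v in magnitudes for sign in (-1, 1)]
sigmas = sorted(1.05 ** x for x in exponents)

svm_grid = list(product([1, 10, 100, 1000], sigmas))  # (C, sigma) pairs for the SVM
kerp_grid = list(product(range(1, 21), sigmas))       # (d, sigma) pairs for the KERP

print(len(sigmas), len(svm_grid), len(kerp_grid))
```

Each grid point would then be scored by its LOPO-CV accuracy, and the point with the highest accuracy retained.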

Comparing Relative Power with other Spectral Features in MCI-HC Classification
We first compared the classification performance of the relative power (RP) feature with that of other spectral power features commonly adopted in previous EEG-based MCI-HC classification studies, including the band power (BP) and the single-electrode relative power (SE-RP). Moreover, instead of using the five conventional frequency bands (CFB), namely delta (δ), theta (θ), alpha (α), beta (β), and gamma (γ), we proposed to use a 2-Hz-width subband analysis to derive the RP feature, as mentioned in Section 2.3.1. Therefore, we also compared the classification performance between the subband-based and the CFB-based RPs. The four feature types compared are thus BP (CFB), SE-RP (CFB), RP (CFB), and RP (2-Hz subband). For a fair comparison, we compared the four features' best LOPO-CV classification accuracies obtained by their corresponding optimal feature subsets, selected by Fisher's feature selection method and the top-F-score-ranked feature selection strategy. Hereafter, both "classification accuracy" and "accuracy" refer to the LOPO-CV classification accuracy. Figure 4 plots the accuracies against the number of top-F-score-ranked features for the four types of features. The optimal subset and the best electrodes for each of the four features are displayed in Figure 5.

As shown in Figure 4, for all feature types, the classification accuracy generally increases with the number of top-F-score-ranked features until reaching the maximum, and then gradually decreases. The best classification accuracies for BP (CFB), SE-RP (CFB), RP (CFB), and RP (2-Hz subband) are 70.59%, 64.71%, 80.39%, and 86.27%, respectively, and the sizes of their optimal feature subsets are 2, 9, 6, and 7, respectively. This comparison shows that the subband-based between-electrode RP feature outperforms BP by 15.68 percentage points and SE-RP by 21.56 percentage points. This reveals that the EEG power difference between electrodes (i.e., RP) is more sensitive than the single-electrode-based spectral features (i.e., BP and SE-RP) in MCI-HC classification. Moreover, Figure 4 also shows that RP (2-Hz subband) performed better than RP (CFB), which suggests that the 2-Hz subband approach is more effective than the conventional frequency bands for extracting sensitive RP features for the MCI-HC classification.

Figure 4. LOPO-CV classification accuracy against the number of top-F-score-ranked features for the four feature types. The numbers of feature candidates for BP (CFB), SE-RP (CFB), RP (CFB), and RP (2-Hz subband) are 150, 150, 2175, and 9135, respectively. The best classification accuracies for the four features are 70.59%, 64.71%, 80.39%, and 86.27%, respectively. For RP (2-Hz subband), the accuracy curve reached its maximum of 86.27% when the number of top-F-score-ranked features increased to 7, which is only a very small portion of all the RP feature candidates (7/9135 ≈ 0.08%).

According to Figure 5, three main findings are worth discussing. First, the electrodes with the best classification performance for the single-electrode features (BP and SE-RP) mainly reside over the frontal scalp region (with CP4 as the only exception). Second, all the optimal features for the between-electrode features, RP (CFB) and RP (2-Hz subband), are cross-regional and composed of different bands. Take the RP (2-Hz subband) as an example. Its best features were extracted from the four electrode pairs FP2-T6 (right frontal-right temporal), FP2-T3 (right frontal-left temporal), FP2-T5 (right frontal-left temporal), and F7-T6 (left frontal-right temporal). This finding suggests that the optimal subband-based RP features should be extracted from intra-hemispheric and cross-regional inter-hemispheric electrode pairs. Third, FP2 is the only electrode appearing in the best electrode sets for all four feature types, suggesting that the right prefrontal scalp area could be a critical region in distinguishing MCI from HC. For BP, the best two features (β and γ BPs) are both extracted from FP2. Using these two BPs from the single electrode FP2, a classification accuracy of 70.59% was achieved, which is higher than the chance level of 50%.
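The four candidate counts quoted for Figure 4 can be reproduced from the recording setup with a short calculation. The sketch assumes that the single-electrode features are counted as electrodes × bands and the between-electrode features as unordered electrode pairs × bands; this counting rule is not spelled out in the text, but it matches the reported numbers.

```python
from math import comb

n_electrodes = 30  # EEG channels used for analysis
n_cfb = 5          # conventional bands: delta, theta, alpha, beta, gamma
n_subbands = 21    # 2-Hz-wide subbands

n_pairs = comb(n_electrodes, 2)  # unordered electrode pairs

candidates = {
    "BP (CFB)": n_electrodes * n_cfb,
    "SE-RP (CFB)": n_electrodes * n_cfb,
    "RP (CFB)": n_pairs * n_cfb,
    "RP (2-Hz subband)": n_pairs * n_subbands,
}
selected_share = 7 / candidates["RP (2-Hz subband)"] * 100  # percent of candidates kept

print(n_pairs, candidates, round(selected_share, 3))
```

With 30 electrodes there are 435 pairs, so the subband-based RP has 435 × 21 = 9135 candidates, of which the seven selected features make up roughly 0.08%.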

Comparing Classification Performance between Different Scalp Regions and Frequency Bands
Here, the aim was to determine which scalp region(s) and which spectral frequency band(s) are the most sensitive for the detection of MCI. The results are listed in Table 1. The settings of this experiment are as follows. (1) For a specific frequency band and scalp region, take frontal δ BP as an example. We extracted the δ BP features from the EEG signals of the seven frontal electrodes of each participant and fed the seven δ BPs into the LDA classifier, which gave a classification accuracy of 54.90%. Then, we performed feature selection on the seven δ BPs. The selected optimal δ BPs achieved a slightly higher accuracy of 56.86%. (2) For the term "merged", take frontal BP as an example. We extracted the BPs of the five conventional frequency bands from the EEG signals of the seven electrodes over the frontal scalp region of each participant. Feeding the 35 BP features into the LDA classifier achieved an accuracy of 66.67%. After performing feature selection on the 35 BP features, we again fed the selected optimal ones into LDA for classification and obtained an accuracy of 70.59%.
According to Table 1, the accuracies given by the optimal feature subsets are higher than those obtained by using the original features in most cases. The accuracy improvement is large in some cases. For example, in the case of "β-BP" and the "entire" scalp region (entire: the β-BPs of the 30 electrodes were used as the features), when all the original 30 β-BPs were used as the features, the accuracy was only 47.06%, whereas feeding the optimal β-BP subset (a few among the 30 β-BPs) into the LDA classifier yielded a much higher accuracy of 68.63%. In the following, we discuss the other results in terms of the accuracy with feature selection.
Frontal δ and θ powers have been highlighted in previous studies. Our results showed that the most sensitive scalp regions were the central region for the BP feature (β-BP: 70.59%), the frontal region for the SE-RP feature (β-SE-RP: 70.59% and γ-SE-RP: 70.59%), and the frontal region for the RP (CFB) feature (δ-RP: 76.47%). The best classification performance achieved by the frontal δ-RP echoes the finding reported by Yener et al. [20], where frontal delta oscillations were found to be a marker for MCI detection (classification of MCI vs. HC). On the other hand, the studies by Grunwald et al. [12] and by Rossini et al. [14] indicated that participants with MCI had higher resting-state frontal θ powers on average. However, the group difference remains inconclusive: the difference is significant in [14] but not in [12]. Our result in Table 1 indicates that frontal θ-BP provides only 58.82% accuracy, suggesting that frontal θ power is not a feasible marker for an accurate detection of MCI. Finally, it can be observed from Table 1 that, among all features and scalp regions, the RP (2-Hz subband) features over the temporal region achieve the highest accuracy of 82.35%.

Comparing the Accuracies between Different Classifiers with KERP Feature
Given the best RP (2-Hz subband) feature subset, we further test the performance of the proposed KERP feature in MCI-HC classification. As shown in Figure 5, the best seven RP features are (1)

Then, we fed the KERP features into different classifiers (k-NN, LDA, QDA, SVM) to test the participant-independent classification accuracy (i.e., by LOPO-CV). For the k-NN classifier, we set k = 3. The optimal values of SVM were C = 10 and σ = 3.7975.
The accuracies of these classifiers based on the same KERP feature are listed in Table 2. Comparing the LDA-based accuracy (88.24%) in Table 2 with the highest accuracy of RP shown in Figure 4 (86.27%), we can see that the KERP feature shows better classification performance than the optimal RP feature, and therefore better than other commonly used spectral features (BP and SE-RP). Moreover, among the four classifiers, SVM gave the highest accuracy (90.20%) and k-NN performed the worst (76.47%). The comparison indicates that the combination of KERP feature and the SVM classifier can achieve a high MCI-HC classification accuracy, with the sensitivity of 87.50% (only 3 out of the 24 MCI participants were misclassified) and specificity of 92.59% (only 2 out of the 27 HC were misclassified).
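The reported sensitivity, specificity, and accuracy of the KERP and SVM combination follow directly from the misclassification counts given above, as the following short check illustrates (variable names are hypothetical).

```python
n_mci, n_hc = 24, 27           # participants per group
mci_missed, hc_missed = 3, 2   # misclassified participants per group

true_pos = n_mci - mci_missed  # MCI participants classified as MCI
true_neg = n_hc - hc_missed    # HC participants classified as HC

sensitivity = true_pos / n_mci
specificity = true_neg / n_hc
accuracy = (true_pos + true_neg) / (n_mci + n_hc)

print(f"{100 * sensitivity:.2f}% {100 * specificity:.2f}% {100 * accuracy:.2f}%")
```

With 21 of 24 MCI participants and 25 of 27 HC participants correctly classified, 46 of the 51 participants are classified correctly overall, which gives the 90.20% accuracy.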

Conclusions
As the first study to use the between-electrode relative power (RP) as an EEG feature for MCI-HC classification, we demonstrated that the RP feature can achieve a high MCI-HC classification accuracy that outperforms other commonly used spectral power features of EEG signals. Based on the optimal RP features, we further proposed a novel EEG feature, KERP, for higher detection accuracy of MCI. We showed that the KERP feature combined with an SVM classifier can achieve a classification accuracy of 90.20% with only five electrodes.
Although the current study provided promising results on the effectiveness of KERP features in MCI-HC classification, several limitations should be noted. First, the sample size of the current study was relatively small, which limits the generalizability of the results. Testing the classification performance of KERP with a larger sample will be needed to increase the generalizability. Second, although resting-state EEG recordings are easier to implement in clinical settings due to their task-free nature, it is unclear whether features extracted from resting-state EEG are stable across time. Future studies that test resting-state EEGs from two time points separated by at least a month will be needed to assess the stability of the classification performance of KERP. Last but not least, since individuals with MCI commonly suffer from notable memory declines, it will be interesting to test the classification performance of KERP features extracted from task-induced EEGs (e.g., during working memory tasks). Task-induced EEGs may contain more information than resting-state EEGs in terms of MCI detection.
On the other hand, many BCI systems have been integrated with virtual reality (VR) techniques for the purpose of improving the cognitive functions of users. Such BCI-VR applications are based on closed-loop control systems. In the forward path, these BCI-VR systems allow users to control the actuators in the virtual environment (e.g., virtual racing cars [51]) through their brain activities. In the feedback path, the BCI-VR systems quantify/assess the cognitive performance of the users (attention, for example [51]) by extracting cognitive-sensitive markers from the recorded EEG signals. In the current study, we proposed a novel EEG-based method for accurate and early detection of MCI. The proposed method, therefore, has the potential to serve as a neurofeedback approach for developing reliable BCI systems for assessing and even improving the cognitive performance of individuals with MCI. Such BCI systems can be viewed as non-pharmacological interventions, if the systems are implemented and validated in the future. Moreover, the EEG difference between different scalp locations, in fact, somewhat represents the EEG power (or the EEG amplitude's voltage in a specific frequency band) gradient from one location to