Depression Detection Using Relative EEG Power Induced by Emotionally Positive Images and a Conformal Kernel Support Vector Machine

Electroencephalography (EEG) can assist with the detection of major depressive disorder (MDD). However, the ability to distinguish adults with MDD from healthy individuals using resting-state EEG features has reached a bottleneck. To address this limitation, we collected EEG data as participants engaged with positive pictures from the International Affective Picture System. Because MDD is associated with blunted positive emotions, we reasoned that this approach would yield highly dissimilar EEG features in healthy versus depressed adults. We extracted three types of relative EEG power features from different frequency bands (delta, theta, alpha, beta, and gamma) during the emotion task and resting state. We also applied a novel classifier, called a conformal kernel support vector machine (CK-SVM), to try to improve the generalization performance of conventional SVMs. We then compared CK-SVM performance with three machine learning classifiers: linear discriminant analysis (LDA), conventional SVM, and quadratic discriminant analysis. The results from the initial analyses using the LDA classifier on 55 participants (24 MDD, 31 healthy controls) showed that the participant-independent classification accuracy obtained by leave-one-participant-out cross-validation (LOPO-CV) was higher for the EEG recorded during the positive emotion induction versus the resting state for all types of relative EEG power. Furthermore, the CK-SVM classifier achieved higher LOPO-CV accuracy than the other classifiers. The best accuracy (83.64%; sensitivity = 87.50%, specificity = 80.65%) was achieved by the CK-SVM, using seven relative power features extracted from seven electrodes. Overall, combining positive emotion induction with the CK-SVM classifier proved useful for detecting MDD on the basis of EEG signals. In the future, this approach might be used to develop a brain–computer interface system to assist with the detection of MDD in the clinic. 
Importantly, such a system could be implemented with a low-density electrode montage (seven electrodes), highlighting its practical utility.


Introduction
Major depressive disorder (MDD) is characterized by persistent sadness, hopelessness, and an inability to feel pleasure in normally enjoyable activities (i.e., anhedonia [1]). MDD is also associated with deficits in executive function [2] and memory [3], and recent depression is a risk factor for suicide [4]. Because depression is prevalent and recurrent, the World Health Organization (WHO) ranks it as a leading contributor to the global burden of disease [5]. To effectively treat MDD, a safe, objective, and convenient method for accurate diagnosis is essential. Electroencephalography (EEG) is increasingly recognized as a promising tool in this regard [6], because it is a well-established, inexpensive, and non-invasive method for assessing brain function that is more suitable for routine use than other imaging modalities, such as magnetoencephalography or functional magnetic resonance imaging [7].
Prior studies have used abnormalities in resting-state EEG spectral power to characterize individuals with MDD. Frontal alpha asymmetry [8,9], which refers to relatively greater left versus right alpha band power, is a consistent EEG marker of MDD. Furthermore, the absolute EEG power at specific channels (e.g., alpha power at C3, P3, O1, O2, F7, and T3 [10]) and the average EEG power over channels in specific regions (e.g., frontal alpha and frontal theta [11]) have also been used to distinguish adults with MDD from healthy controls. In short, EEG power may be used to detect MDD. Because MDD is heterogeneous and yields different symptom profiles across patients, the additional information provided by EEG may be very helpful for clinicians seeking to make an accurate diagnosis.
Several studies have, therefore, combined resting-state EEG power features with machine learning to classify individuals as healthy versus depressed. In a study by Hosseinifard et al. [10], classification based on the application of linear discriminant analysis (LDA) to power in the delta, theta, alpha, and beta bands achieved accuracies of 67%, 70%, 73%, and 70%, respectively. Our recent classification study [12] also obtained the best results when applied to alpha power. However, we were only able to achieve a classification accuracy of 68.98% using a k-nearest neighbor (k-NN) classifier. Thus, classification via resting-state EEG spectral power features is characterized by relatively low accuracy. The classification performance results reported in [10,12] could represent the upper limit of classification based on spectral power features, but it is likely that applying a more sophisticated classifier, such as a support vector machine (SVM), may improve accuracy. Furthermore, employing an MDD-sensitive task to elicit EEG responses that differ strongly between depressed and healthy individuals may also prove helpful [10,12]. The present study took this approach by applying a novel SVM classifier to EEG data recorded during an emotion-induction paradigm completed by adults with MDD and healthy controls.
The task used in the current study focused on the induction of positive emotions. We did not use emotionally negative stimuli because, although excessive negative emotion is a prominent characteristic of MDD, it is also commonly found in anxiety. By contrast, anhedonia (a reduced ability to derive pleasure from enjoyable experiences) is relatively specific to depression [13]. Therefore, a blunted neural response to positive stimuli may be more diagnostic of depression versus anxiety. Furthermore, anhedonia has been linked to disruption in brain reward circuitry [14][15][16], and such abnormalities may have consequences for downstream functions, such as episodic memory [17][18][19]. Therefore, abnormal responses to positive stimuli appear to be a reliable and important aspect of depressive illness. To our knowledge, however, no prior study has used positive emotion induction to improve EEG-based detection of MDD. Based on the literature, we expected that EEG signals recorded from controls and adults with MDD would differ more strongly during exposure to emotionally positive material than at rest. Therefore, we collected EEG data while healthy controls and adults with MDD attempted to maximize their emotional responses to positive images and during the resting state. Given prior work linking changes in relative power (i.e., the power difference between a pair of electrodes) to shifts in emotional experience [20], we focused the analysis on relative power features from all possible (a) regional inter-hemispheric, (b) cross-regional inter-hemispheric, and (c) intra-hemispheric electrode pairs. We then compared leave-one-participant-out cross-validation (LOPO-CV) classification performance based on EEG relative power features and determined the best relative power feature subsets for the two states considered separately.
In addition to feature extraction, MDD detector design is also critical to classification performance. Various machine-learning classifiers have been employed in different MDD studies, including the very simple k-NN and LDA, and more sophisticated classifiers like Naïve Bayes (NB), logistic regression (LR), and SVM. Two studies have compared these classifiers for the detection of MDD based on EEG signals [10,11]. A comparison of k-NN, LDA, and LR [10] showed that LR achieved the best classification accuracy. More recently, Mumtaz et al. [11] reported that SVM outperformed both LR and NB in MDD-control classification.
Although these results indicate that SVM is the best classifier for EEG-based MDD detection, its classification performance could still be improved. One strategy for doing so is based on the conformal transformation of a kernel proposed by Wu and Amari [21]. Conformally transforming a kernel enlarges the spatial resolution around the SVM's separating hyperplane in a kernel-induced feature space, thus improving the generalization ability of the SVM. This technique has recently been applied to solve problems in other domains, including EEG-based emotion recognition [22]. In the present study, we introduced this variant of SVM, the conformal kernel SVM (CK-SVM), as the MDD detector and compared it with two commonly used MDD detectors (LDA and SVM) and a variant of the LDA classifier called quadratic discriminant analysis (QDA) [23], which has not yet been tested in previous MDD studies. To preview the results, CK-SVM emerged as superior to the other classifiers for EEG-based MDD detection.

Participants
Data were collected from 24 adults with MDD (15 females, mean age: 29.7 ± 10.9 years, mean education: 16.3 years) and 31 healthy controls (17 females, mean age: 29.75 ± 9.9 years, mean education: 16.9 years) directly after completion of a source memory task that involved emotionally neutral words [24]. The two groups did not differ in age (p = 0.99), male/female ratio (p = 0.57), or years of education (p = 0.28). The following inclusion criteria were used for the MDD group: (1) endorsed symptoms consistent with a current major depressive episode [25]; (2) a Beck Depression Inventory-II (BDI-II) score ≥ 13 on the day of the EEG [26]; (3) no other DSM-IV (Diagnostic and Statistical Manual of Mental Disorders, fourth edition) Axis I psychopathology, with the exception of generalized anxiety, social anxiety, and/or specific phobia (all of which are highly comorbid with MDD); and (4) no medication use in the past two weeks (six weeks for fluoxetine, six months for neuroleptics). The controls reported no current or past DSM-IV Axis I psychopathology. As expected, the MDD group generated significantly higher BDI-II scores (mean ± S.D.: 24.96 ± 8.8) than did the controls (1.22 ± 2.01), p < 0.001; the BDI-II scores indicate that the participants in the MDD group were moderately depressed, on average. All the participants provided informed consent to a protocol approved by the Partners HealthCare Human Research Committee (PHRC), and they were compensated $25/h for their participation.

Resting-State and Emotion-Induction EEG Data Collection
The EEG recording involved a resting-state session followed by the emotion-induction session (Figure 3). All recordings were made in the Center for Depression, Anxiety, and Stress Research at McLean Hospital in Belmont, MA, USA. The resting-state session included three trials. Each trial began with a 5 s countdown, during which time each numeral (5-4-3-2-1) was displayed in black on a gray background for 1 s. This countdown was intended to focus the participants' attention for the EEG recording. The participants then maintained fixation on a centrally presented black cross displayed on a gray background for 54 s. We recorded 162 s of resting-state data (54 s/trial × 3 trials), which were subsequently divided into 27 non-overlapping EEG epochs of 6-s length.
The emotion-induction session included 27 trials. Each trial began with the same 5 s countdown used in the resting-state session. Next, an International Affective Picture System (IAPS) picture was shown for a 6-s emotion-induction period. During this period, the participants were asked to engage with the image by imagining that they (or their loved ones) were experiencing the positive event depicted in the picture. Using mental imagery in this way has been shown to effectively enhance and intensify emotional experience in numerous prior studies of emotion regulation, including studies conducted with depressed adults (e.g., [28]). At the end of each trial, the participants used the Self-Assessment Manikin procedure [29] to evaluate the induced emotional experience for valence and arousal. The emotion-induction session was self-paced to avoid fatigue. The participants could take a break whenever they wished, and they initiated each trial by pressing a button. Note that the same amount of raw EEG data (27 epochs of 6-s length) was obtained from each participant during the emotion-induction and resting-state sessions.


Apparatus, Settings, and EEG Preprocessing
The experimental protocol was programmed with PsychoPy [30], and stimuli were displayed on a 22-inch monitor controlled by a personal computer (PC). The EEG was recorded using a 128-sensor HydroCel GSN Electrical Geodesics Inc. (EGI) Net. The EEG data were referenced to the vertex (Cz), and impedances were kept below 45 kΩ whenever possible (maximum: 75 kΩ). Eye blinks and movements were monitored by horizontal and vertical bipolar electrooculography (EOG) electrodes. The EEG and EOG data were sampled at 1000 Hz and filtered with a band-pass filter (0.02–100 Hz). The preprocessing of the EEG data was performed using the EEGLAB toolbox [31]. The EEG signals were first band-pass filtered (finite impulse response filter, 0.5–50 Hz, EEGLAB). For the remaining channels, we performed independent component analysis (ICA, Infomax, EEGLAB) on the concatenated EEG data and applied the ADJUST algorithm [32] to identify and remove artifact components, including horizontal eye movements, vertical eye movements, eye blinks, and generic discontinuities.
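The study performed this filtering step in EEGLAB. As a rough illustration of an equivalent step outside MATLAB, the following sketch designs a comparable 0.5–50 Hz FIR band-pass filter with SciPy; the 10,001-tap length is an assumption chosen here to obtain a transition band narrow enough for the 0.5 Hz edge, not a value from the study.

```python
import numpy as np
from scipy import signal

FS = 1000                      # sampling rate used in the study (Hz)
LOW, HIGH = 0.5, 50.0          # band-pass edges (Hz)

# An odd tap count (assumed) gives a type-I linear-phase FIR; ~10k taps keep
# the Hamming-window transition band (~0.33 Hz) below the 0.5 Hz edge.
taps = signal.firwin(10001, [LOW, HIGH], pass_zero=False, fs=FS)

# Inspect the magnitude response at DC, in the passband, and in the stopband.
freqs = [0.0, 10.0, 200.0]
_, h = signal.freqz(taps, worN=freqs, fs=FS)
gains = np.abs(h)
```

A check of `gains` confirms that DC drift and high-frequency noise are attenuated while the EEG band passes essentially unchanged.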

Different types of relative power have been used in various EEG studies. In this study, we extracted three types of relative power and compared their classification performance (Figure 5). Type-I relative power (RP-I) is calculated with the following formula [33]:

RP-I(A, B) = [BP(A) − BP(B)] / [BP(A) + BP(B)],

where BP(A) and BP(B) denote the BPs of two different electrodes A and B in the same frequency band. Type-II relative power (RP-II) is given by [11] the following:

RP-II(A, B) = W(A) − W(B),

where W is the power within a specific band of interest (e.g., alpha) divided by the total power within the entire band of 1–45 Hz. Type-III relative power (RP-III) corresponds to the difference between the natural log-transformed BPs of two different electrodes, as follows [8]:

RP-III(A, B) = ln BP(A) − ln BP(B).

For each frequency band and each type of relative power, a total of 406 (29 × 28/2) values were extracted for each participant. These values carry information about regional inter-hemispheric (e.g., FP1-FP2), cross-regional inter-hemispheric (e.g., FP1-T4), and intra-hemispheric (e.g., FP1-O1) asymmetries. Because there are five bands, the number of relative power features of each type (RP-I, RP-II, RP-III) increases to 2030 (406 × 5) when all bands are considered. For each kind of feature (e.g., RP-I in a specific frequency band), we obtained one N-dimensional feature vector from each participant. A feature vector is called a data point or a datum in this paper.
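As a minimal sketch of the three feature types, assuming the definitions above (band powers BP(A), BP(B) for an electrode pair, and within-electrode relative power W), the extraction can be written as:

```python
import numpy as np

def rp1(bp_a, bp_b):
    # Type-I relative power: normalized band-power asymmetry of electrodes A and B.
    return (bp_a - bp_b) / (bp_a + bp_b)

def rp2(bp_a_band, bp_a_total, bp_b_band, bp_b_total):
    # Type-II relative power: difference of within-electrode relative powers W,
    # where W = band power / total power in the 1-45 Hz band.
    return bp_a_band / bp_a_total - bp_b_band / bp_b_total

def rp3(bp_a, bp_b):
    # Type-III relative power: difference of natural-log-transformed band powers.
    return np.log(bp_a) - np.log(bp_b)

# With 29 electrodes there are 29 * 28 / 2 = 406 unordered pairs per band,
# and 406 * 5 = 2030 features per RP type when all five bands are used.
n_pairs = 29 * 28 // 2
```

Each function takes scalar band powers but also works element-wise on NumPy arrays of pairwise band powers.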
The LDA classifier was employed to classify individuals as depressed or healthy. LDA finds a hyperplane as the decision boundary in the original space of patterns. For a test datum x ∈ R^N, where N is the dimension of x, the LDA decision function is given by the following:

D_LDA(x) = w^T x + b,

where w = Σ^(−1)(μ1 − μ2) and b = −(1/2) w^T (μ1 + μ2); here μ1 and μ2 are the means of class 1 and class 2, and Σ is the pooled covariance matrix of the two classes. The decision function of QDA is given by the following [23]:

D_QDA(x) = −(1/2)(x − μ1)^T Σ1^(−1)(x − μ1) + (1/2)(x − μ2)^T Σ2^(−1)(x − μ2) − (1/2) ln(|Σ1|/|Σ2|),

where Σ1 and Σ2 are the covariance matrices of class 1 and class 2, respectively. Given a training set S = {(x_i, y_i)}, i = 1, ..., L, where L is the size of the set S and y_i ∈ {−1, +1} are the class labels of the training data x_i, SVM maps the training set into a higher-dimensional feature space F via a nonlinear mapping ϕ and then finds a separating hyperplane H: w^T ϕ(x) + b = 0 that minimizes the training error and maximizes the margin of separation between classes. For a test datum x, its output can be estimated by the SVM decision function as follows:

D_SVM(x) = Σ_(x_i ∈ SV) α_i y_i k(x_i, x) + b_opt,

where 0 ≤ α_i ≤ C are Lagrange multipliers, C is the penalty weight for training error, SV denotes the set of support vectors, SV = {x_i | 0 < α_i ≤ C}, b_opt is the optimum bias of the separating hyperplane (determined according to the Kuhn-Tucker conditions of the SVM), and k(·,·) is the kernel function, which computes the inner product of two mapped data points in the space F. In this study, the Gaussian function k(x_i, x) = exp(−‖x_i − x‖²/(2σ²)) was adopted as the kernel of the SVM, where σ is the kernel parameter. The test datum x is classified as MDD if D_SVM(x) > 0; otherwise, x is classified as a control.
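A compact NumPy sketch of the pooled-covariance LDA rule described above (the threshold at zero and the toy usage are illustrative only, not the study's implementation):

```python
import numpy as np

def fit_lda(X1, X2):
    # Pooled-covariance LDA: w = Sigma^{-1}(mu1 - mu2), b = -w^T (mu1 + mu2) / 2.
    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
    n1, n2 = len(X1), len(X2)
    S = ((n1 - 1) * np.cov(X1, rowvar=False) +
         (n2 - 1) * np.cov(X2, rowvar=False)) / (n1 + n2 - 2)
    w = np.linalg.solve(np.atleast_2d(S), m1 - m2)
    b = -0.5 * w @ (m1 + m2)
    return w, b

def lda_decision(x, w, b):
    # D(x) > 0 -> class 1; otherwise class 2.
    return w @ x + b
```

On two well-separated Gaussian clouds, the fitted rule assigns new points on either side of the boundary to the correct class.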

CK-SVM
The Gaussian kernel embeds the original space R^N into an infinite-dimensional space F as a Riemannian manifold lying on a unit ball centered at the origin of F (see the authors' previous work [34] for an illustration), and the kernel-induced Riemannian metric is given by the following:

g_ij(x) = ∂²k(x, x′)/∂x_i ∂x′_j |_(x′ = x),

where x_i denotes the ith element of x. The relationship between the Riemannian distance ds and a small displacement dx,

ds² = Σ_ij g_ij(x) dx_i dx_j,

indicates how a local volume in R^N is magnified or contracted in F under the mapping ϕ.
A conformal transformation of ϕ is defined by ϕ̃(x) = Q(x)ϕ(x), where Q(x) is a real-valued conformal function. The function Q(x) can be chosen such that its value is largest in the vicinity of the images of selected data points and decreases with the distance from them. Designing such a transformation yields the conformal transformation of a primary kernel k as follows:

k̃(x, y) = Q(x)Q(y)k(x, y), (9)

where the transformed kernel k̃ is called a conformal kernel. Let a set T contain a set of data points x_i.
To magnify the spatial resolution around the images of the data points x_i in the space F, a conformal function consisting of a set of Gaussian functions can be defined as follows [21]:

Q(x) = Σ_(x_i ∈ T) exp(−‖ϕ(x) − ϕ(x_i)‖² / τ_i²), with τ_i² = (1/n) Σ_j ‖ϕ(x_j) − ϕ(x_i)‖², (10)

where the denominator term τ_i² computes the mean squared distance from the image of x_i to its n nearest neighbors ϕ(x_j). According to Equation (10), the conformal function Q(x) has the largest value at the position of the image of the set T and decays with the distance from the image of T in the feature space F. Wu and Amari [21] proposed that the set T should collect the training data points whose α_i > 0 (i.e., the support vectors), based on the fact that most support vectors lie within the margin of separation. However, some support vectors may be far from the margin. Accordingly, Liu et al. [22] defined the set T as follows:

T = {x_i | 0 < α_i ≤ C, y_i D_SVM(x_i) ≤ 1}. (11)

Because the training data points in the set T are the support vectors falling inside the separation margin, enlarging the spatial resolution around the image of the set T, defined as in Equation (11), is equivalent to increasing the spatial resolution of the separation margin in the feature space F. In this paper, we adopted the conformal function and the set T suggested in [21] and [22], respectively. Because the Gaussian function is adopted as the kernel, k(x, x) = 1 for all x, and hence ‖ϕ(x) − ϕ(x_i)‖² = 2(1 − k(x, x_i)). Thus, the conformal function expressed in Equation (10) can be simplified as follows:

Q(x) = Σ_(x_i ∈ T) exp(−2(1 − k(x, x_i)) / τ_i²), with τ_i² = (2/n) Σ_j (1 − k(x_j, x_i)). (12)

In this study, we set n = 3. The training of CK-SVM consists of two steps: first, train an SVM with a Gaussian kernel; second, retrain the SVM with the conformal kernel defined in Equation (9).
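The conformal function and the resulting conformal kernel can be sketched as follows. This is a minimal NumPy illustration of the kernel transformation only, not the full two-step CK-SVM training; the per-point scale `tau_sq` corresponds to the n-nearest-neighbor term described above, and the feature-space distance uses the Gaussian-kernel identity ‖ϕ(a) − ϕ(b)‖² = 2(1 − k(a, b)).

```python
import numpy as np

def gauss_k(x, y, sigma):
    # Primary Gaussian kernel k(x, y).
    return np.exp(-np.sum((np.asarray(x) - np.asarray(y)) ** 2) / (2 * sigma ** 2))

def tau_sq(T, i, n, sigma):
    # Mean squared feature-space distance from phi(x_i) to its n nearest neighbors.
    d2 = sorted(2 * (1 - gauss_k(T[j], T[i], sigma))
                for j in range(len(T)) if j != i)
    return sum(d2[:n]) / n

def conformal_Q(x, T, taus, sigma):
    # Q(x): sum of Gaussians centered on the images of the points in T.
    return sum(np.exp(-2 * (1 - gauss_k(x, t, sigma)) / t2)
               for t, t2 in zip(T, taus))

def conformal_kernel(x, y, T, taus, sigma):
    # Conformal kernel: k~(x, y) = Q(x) * Q(y) * k(x, y).
    return (conformal_Q(x, T, taus, sigma) *
            conformal_Q(y, T, taus, sigma) *
            gauss_k(x, y, sigma))
```

Because Q enters symmetrically, the transformed kernel remains symmetric and positive, as a valid kernel must be.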

LOPO-CV
In the present study, we performed leave-one-participant-out cross-validation (LOPO-CV) to assess the participant-independent MDD-control classification performance for each combination of feature, classifier, and session (resting and emotion induction). LOPO-CV is a technique for evaluating how well the results of a method will generalize to unseen data. In each fold of the LOPO-CV, data from 54 participants were used to train the classifier, and the N-dimensional data from the one remaining participant were used as the test data. This step was repeated until every participant's data had served as the test data once. We then recorded the classification accuracy, computed as the number of correctly classified participants divided by the total number of participants (55 folds). Here, a misclassified datum in a testing fold resulted in only a small increase in the error rate (1/55 ≈ 1.82%).
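The LOPO-CV loop can be sketched generically as follows; a nearest-class-mean classifier stands in for the study's classifiers, and any fit/predict pair could be substituted.

```python
import numpy as np

def lopo_cv(X, y, fit, predict):
    # One row of X per participant; each fold holds out exactly one participant.
    n = len(X)
    correct = 0
    for i in range(n):
        train = np.arange(n) != i
        model = fit(X[train], y[train])
        correct += int(predict(model, X[i]) == y[i])
    return correct / n          # accuracy over n folds

# Stand-in classifier: nearest class mean.
def fit_ncm(X, y):
    return {c: X[y == c].mean(axis=0) for c in np.unique(y)}

def predict_ncm(model, x):
    return min(model, key=lambda c: np.linalg.norm(x - model[c]))
```

On cleanly separated toy data, the loop reproduces the expected behavior: every held-out "participant" is classified correctly.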

Parameter Optimization
None of the three types of relative power feature extraction methods used free parameters. As for the classifiers, LDA involves no free parameters, whereas SVM and CK-SVM each have two parameters to be adjusted: the penalty weight C and the kernel parameter σ. We searched for the optimum parameters by performing the LOPO-CV procedure combined with a grid search. The parameters were searched over the following sets: C ∈ {1, 10, 20, 50, 80, 100, 120} and σ = 1.025^d, d ∈ {−100, −99, ..., 300}. Therefore, there was a total of 2807 (7 × 401) parameter grid points, and for each grid point, the LOPO-CV procedure was performed once. The optimum grid point was the one resulting in the highest classification accuracy. The results of SVM and CK-SVM reported here are the ones obtained with the optimal parameter values.
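The search space described above amounts to the following grid; the scoring step is elided, since each grid point would trigger one full LOPO-CV run.

```python
import itertools

Cs = [1, 10, 20, 50, 80, 100, 120]                 # penalty weights
sigmas = [1.025 ** d for d in range(-100, 301)]    # kernel widths, d = -100..300
grid = list(itertools.product(Cs, sigmas))         # 7 * 401 = 2807 grid points

def grid_search(score):
    # score(C, sigma) -> LOPO-CV accuracy; returns the best (C, sigma) pair.
    return max(grid, key=lambda p: score(*p))
```

Passing any scoring function picks out the grid point with the highest accuracy, mirroring the procedure in the text.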

Feature Dimension Reduction
Directly including all the features in the LOPO-CV procedure might result in overfitting. Take RP-I of the delta band as an example. A total of 406 delta-band RP-I features were extracted from each participant. If we were to include all the features in the classification, the feature dimension would be 406 (N = 406). This is obviously higher than the size of the training set (|S| = 54) in each fold of the LOPO-CV, leading to the so-called small sample size (SSS) problem. The SSS problem may have two critical consequences. First, it may lead to overfitting [35]. Second, the covariance matrix of LDA may become singular such that its inverse does not exist [22]. Although it is possible to calculate the pseudoinverse of the covariance matrix in such cases, classification performance would nevertheless degenerate. To avoid the SSS problem, feature selection must be used to reduce the feature dimension to 54 prior to LOPO-CV.
Feature selection methods can be classified into three categories: embedded, wrapper, and filter [36]. Embedded and wrapper methods rely on the performance of a specific classifier to quantify the classification ability of each feature. Examples of wrapper and embedded methods are the sequential forward selection (SFS) [37] and recursive feature elimination (RFE) [38] methods, respectively. By contrast, filter methods are independent of the classifier. One popular filter method is Fisher's class separability criterion [39]. This involves calculating the Fisher ratio (F-score) for each feature and then ranking the features by their F-scores. A higher F-score corresponds to a higher between-class to within-class variance ratio for the feature. By contrast, a lower F-score indicates that the feature is noisier. The advantage of the filter method is its low computational complexity. Therefore, for each state and each type of RP, we applied Fisher's method to select the top 54 features from the 406 single-band (delta/theta/alpha/beta/gamma) and the 2030 "all-band" RP feature candidates, respectively. Note that the objective was not to select a set of "optimum features", but simply to select 54 features to avoid the small sample size problem. Figure 6 shows the F-scores of the 2030 (all-band) RP-I, RP-II, and RP-III features extracted during the emotion-induction state.
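Under a common form of the Fisher ratio (squared difference of class means over the summed within-class variances — an assumption here, since the exact formula used in the study is given in [39]), the ranking step looks like this:

```python
import numpy as np

def fisher_scores(X, y):
    # F-score per feature: between-class separation over within-class scatter.
    X1, X2 = X[y == 1], X[y == -1]
    num = (X1.mean(axis=0) - X2.mean(axis=0)) ** 2
    den = X1.var(axis=0, ddof=1) + X2.var(axis=0, ddof=1)
    return num / den

def top_k_features(X, y, k=54):
    # Indices of the k highest-F-score features, best first.
    return np.argsort(fisher_scores(X, y))[::-1][:k]
```

A feature whose class means are well separated receives a large F-score and is ranked ahead of purely noisy features.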
The next question is how to find the most useful features among a given set of F-score-ranked features. To address this issue, Lin et al. [20] conducted two experiments based on a leave-N-feature-out scheme to select the best EEG features from 60 candidates. Their first experiment iteratively removed N F-score-ranked features one at a time and examined the effects on classification performance. In the second experiment, they again iteratively removed N features one at a time, but the removed N features were randomly selected. Their results for different values of N (e.g., 1, 5, 10, 15, and 20) showed that the top F-score-ranked features were more discriminative than the lower-ranked ones. As a result, they suggested that one may directly choose the top-N features as the optimum feature subset for emotional EEG classification, with the optimal value of N determined by a data-driven method (i.e., cross-validation). Recently, the same strategy of selecting the top-N F-score-ranked features was adopted for classifying motor imagery EEG [40].
Accordingly, in this study, we performed LOPO-CV to calculate the classification accuracy using the top-N F-score-ranked features for each state (resting and emotion induction) and each RP type, and this procedure was conducted once for every N (N = 1, 2, ..., 54). Here, we used LDA as the classifier, because a sophisticated classifier such as a nonlinear SVM may compensate for the weakness of a feature, and thus, a simple classifier is preferred for feature evaluation [41]. Also, LDA involves no free parameters, so there is no need to perform the time-consuming grid search for each N.
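Combining the F-score ranking with a cross-validated sweep over N gives the selection loop described above. This sketch uses leave-one-out accuracy with a nearest-class-mean classifier as a simple stand-in for LDA; the data and classifier are illustrative assumptions, not the study's pipeline.

```python
import numpy as np

def fscores(X, y):
    # Fisher ratio per feature (one common form).
    X1, X2 = X[y == 1], X[y == -1]
    return ((X1.mean(0) - X2.mean(0)) ** 2 /
            (X1.var(0, ddof=1) + X2.var(0, ddof=1)))

def loo_accuracy(X, y):
    # Leave-one-out accuracy of a nearest-class-mean classifier.
    hits = 0
    for i in range(len(X)):
        m = np.arange(len(X)) != i
        mu = {c: X[m][y[m] == c].mean(axis=0) for c in (-1, 1)}
        pred = min(mu, key=lambda c: np.linalg.norm(X[i] - mu[c]))
        hits += int(pred == y[i])
    return hits / len(X)

def best_top_n(X, y, n_max=54):
    # Evaluate the top-N F-score-ranked features for N = 1..n_max; return the
    # best N and its cross-validated accuracy.
    order = np.argsort(fscores(X, y))[::-1]
    accs = [loo_accuracy(X[:, order[:n]], y) for n in range(1, n_max + 1)]
    return int(np.argmax(accs)) + 1, max(accs)
```

When only a few features carry signal, the sweep recovers a small N with high cross-validated accuracy, matching the rationale for the top-N strategy.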

Statistical Analysis
In the current study, we used the Wilcoxon rank-sum test to test for a group difference (MDD vs control) in the subjective ratings of valence and arousal elicited by the IAPS pictures.
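A minimal example of such a test using SciPy's `ranksums` is shown below. The rating values are hypothetical, chosen only to illustrate the call; they are not the study's data.

```python
from scipy.stats import ranksums

# Hypothetical per-participant mean valence ratings (1-9 scale),
# illustrating a blunted response in the MDD group.
valence_mdd = [6.2, 6.5, 6.1, 6.6, 6.3, 6.4]
valence_ctl = [7.0, 6.9, 7.1, 6.8, 7.2, 6.9]

stat, p = ranksums(valence_mdd, valence_ctl)
print(f"z = {stat:.3f}, p = {p:.4f}")
```

The same call, applied to the arousal ratings, tests whether the two groups differ on that dimension.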

Results

Statistical Analysis Results of Subjective Ratings of Valence and Arousal
Figure 7 shows that the average valence of the emotional responses elicited by the pleasant pictures was significantly lower (p < 0.0005) in the MDD group (6.40 ± 0.11) than in the healthy control group (6.98 ± 0.10). By contrast, there was no group difference in the arousal ratings (MDD: 5.24 ± 0.21; control: 5.33 ± 0.21, p = 0.79). Thus, MDD selectively blunted the pleasantness, but not the overall arousal, of the emotional responses elicited by the positive pictures.

LOPO-CV Classification Results Based on LDA Classifier and Top-N-F-Score-Ranked Features
Figure 8 shows the classification accuracy obtained by each top-N F-score-ranked feature set for the three types of RP extracted from the emotion-induction state. Overall, the three accuracy curves decrease as N increases. The highest classification accuracies for the three types of RP are 80.00% (N = 6), 76.36% (N = 2), and 80.00% (N = 7), respectively; the best number of top-N F-score-ranked features (i.e., the best N) clearly differs across RP types. Table 1 further compares the MDD versus control classification accuracy between the best-N features extracted from the resting state and those extracted from the emotion-induction state under different frequency-band conditions.

As shown in Table 1, for the resting state, the classification accuracy exceeded 70% in the "all bands" condition for RP-I (74.55%), RP-II (70.91%), and RP-III (74.55%). When the classification was instead based on the emotion-induction state, these accuracies improved to 80.00%, 76.36%, and 80.00%, respectively. In fact, emotion induction outperformed the resting state for every combination of RP type and frequency band. To quantify how much the classification accuracy improved for the emotion-induction (EI) state relative to the resting state, we computed the accuracy improvement ratio (AIR), defined as follows:

AIR = [accuracy(EI) − accuracy(resting)] / accuracy(resting) × 100% (13)

As shown in Figure 9, the AIRs were all positive, revealing a consistent advantage for classification based on EEG data from the emotion-induction session over the resting state. The results in Table 1 and Figure 9 demonstrate that, compared with the resting state, the emotion-induction state induces EEG signals that differ more sharply between depressed and healthy adults. Table 1 further highlights that, for each type of RP and each recording state, the accuracy in the "all bands" condition is higher than that for any single band. Take the resting-state RP-I as an example: the accuracies of the five single bands (delta, theta, alpha, beta, and gamma) are 65.45%, 70.91%, 63.64%, 65.45%, and 61.82%, respectively, all lower than that of the "all bands" condition (74.55%). This observation indicates that combining the data from all frequency bands yields better accuracy than using the data from any single frequency band.
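Equation (13) can be expressed directly in code; the helper name `accuracy_improvement_ratio` is ours, and the values below are the "all bands" accuracies reported in Table 1.

```python
def accuracy_improvement_ratio(acc_ei, acc_rest):
    """Accuracy improvement ratio (Eq. 13), in percent."""
    return (acc_ei - acc_rest) / acc_rest * 100.0

# "All bands" LDA accuracies from Table 1 (resting -> emotion induction)
for name, rest, ei in [("RP-I", 74.55, 80.00),
                       ("RP-II", 70.91, 76.36),
                       ("RP-III", 74.55, 80.00)]:
    print(f"{name}: AIR = {accuracy_improvement_ratio(ei, rest):.2f}%")
```

For RP-I, for instance, the AIR is (80.00 − 74.55) / 74.55 × 100% ≈ 7.31%, a positive value consistent with Figure 9.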

Comparison of Top-N-F-Score-Ranked Features Across the Three Types of Relative Power During Resting State and Emotion-Induction State
Table 2 lists the best top-N features in each state and for each RP condition. Note that, for each type of RP, there is almost no overlap between the best feature sets from the resting state and the emotion-induction state. For both RP-I and RP-III, only one relative power feature (frontal theta asymmetry: theta-band FP1-FP2) is common to the resting and the emotion-induction states, and for RP-II, there is no overlap between the two states. Thus, the brain activity that distinguishes the participants with MDD from the healthy controls differs between rest and positive emotion induction. One notable difference is the shift from frontal alpha asymmetry at rest to frontal delta asymmetry during positive emotion induction: as shown in Table 2, for both RP-I and RP-III, the top feature was FP1-FP2(α) in the resting state but became FP1-FP2(δ) during the positive emotion induction.

Comparison of LOPO-CV Classification Performance Among Different Classifiers in the Emotion-Induction State
Table 3 lists the best emotion-induction LOPO-CV classification accuracies, where the relative power features of each type are the best ones listed in Table 2. Among the four classifiers, QDA performed the worst for all types of RP features, whereas CK-SVM performed the best for RP-II and RP-III; when the best RP-I features were used, SVM and CK-SVM achieved identical accuracy. These results indicate that, overall, CK-SVM outperformed SVM. Finally, the combination of the RP-III features and the CK-SVM classifier achieved the highest accuracy of 83.64%. This corresponds to the correct classification of 46 of the 55 participants: 21 of the 24 MDD participants were detected (sensitivity = 87.50%), and 25 of the 31 healthy controls were correctly classified (specificity = 80.65%).
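The reported performance figures follow directly from the confusion counts given above; the helper name `sens_spec` is ours.

```python
def sens_spec(tp, fn, tn, fp):
    """Sensitivity and specificity (as percentages) from confusion counts."""
    return 100.0 * tp / (tp + fn), 100.0 * tn / (tn + fp)

# Best CK-SVM result: 21/24 MDD detected, 25/31 controls correctly classified
sens, spec = sens_spec(tp=21, fn=3, tn=25, fp=6)
acc = 100.0 * (21 + 25) / 55
print(f"sensitivity = {sens:.2f}%, specificity = {spec:.2f}%, accuracy = {acc:.2f}%")
```

These counts reproduce the reported sensitivity of 87.50%, specificity of 80.65%, and overall accuracy of 83.64%.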

Discussion
It was easier to classify adults as depressed versus healthy using EEG recorded during a positive emotion-induction state than using resting-state EEG. This may reflect the altered response to positive emotional stimuli and rewarding experiences previously reported in depressed adults [14-16,18]. Along these lines, our behavioral data revealed that MDD blunted the pleasantness, but not the overall arousal, of the emotional responses to positive pictures. Future studies may wish to investigate correlations between the extent of positive emotional dysregulation in depressed adults and classification accuracy based on EEG data collected during emotion induction.
Although the current study used IAPS pictures to induce emotional responses, it is important to note that this is not the only publicly available database of emotional stimuli. Other public resources, such as the International Affective Digitized Sounds (IADS) and the Database for Emotion Analysis using Physiological signals (DEAP), have also been used in recent EEG studies [42,43]. Both IADS and DEAP include stimuli with positive emotional content. It is unclear whether the findings of the current study, especially the promising classification performance of the best relative power features, would generalize to emotion induction using positive stimuli selected from IADS or DEAP. Because the cortical activities that underlie emotional responses may vary somewhat with the sensory nature of the eliciting stimulus (e.g., multisensory stimuli in DEAP vs visual stimuli in IAPS), it will be important to address this issue in future work. Furthermore, emotional responses are known to vary across cultures [44,45]. Because all participants in the present study came from the United States (US), future studies should address whether the current findings apply in different settings.
The findings of this study have several implications for developing an EEG-based brain-computer interface (BCI) system for the detection of MDD. First, extracting the top seven F-score-ranked RP-III features induced by the positive emotional stimuli required only seven EEG recording sites: FP1, FP2, TP7, T6, F4, CP3, and C3 (Table 2). If these results can be replicated, the implication is that a low-density electrode montage could be used to obtain similar classification accuracies. A low-density montage can be set up quickly if a dry-electrode or non-gel-based electrode system is used (e.g., the EGI system used in this study). Even if a gel-based recording system is applied (e.g., the NuAmp made by NeuroScan Inc.), the preparation stage (including applying conductive gel to reduce the impedance of the seven electrodes) could be accomplished in about 10 min by experienced users, which would make clinical application of the BCI system feasible.
Second, the current study showed that the combination of positive emotion-induction RP-III features and a CK-SVM classifier yielded the highest classification accuracy. Therefore, this combination could provide clinicians with a useful tool for assisting with the detection of MDD. Obviously, the results of this study must be replicated before considering a transition to clinical settings. Moreover, it may be possible to improve classification performance further by using other types of EEG features that have proven useful for distinguishing between healthy and depressed individuals in prior studies, such as nonlinear features based on fractal dimension (FD) analysis (e.g., Higuchi's FD [6], correlation dimension [10], and Katz's FD [46]) and the spectral-spatial features based on the kernel eigen-filter-bank common spatial pattern (KEFB-CSP) proposed in our recent work [12]. All these features may also perform better in the positive emotion-induction state than in the resting state. Although discussion of these other features is beyond the scope of the current study, they certainly merit attention in future work.
Finally, the clinical implications of this work deserve consideration. The diagnosis of MDD is currently assigned on the basis of diagnostic interviews and self-reports. This is problematic, because individuals typically underreport their depressive symptoms [47]. This is especially true for men relative to women [48,49] and for individuals from Asian as opposed to Western cultures [50,51]. Consequently, MDD is often underdiagnosed in such individuals, which results in needless suffering. This important issue could be addressed by application of the EEG-based method described here, because it does not rely on self-reporting to detect depression. For this possibility to materialize, however, the method will need to be validated in such populations.
If this idea is pursued further, it would be valuable to simultaneously try to identify EEG signals that can predict which individuals will respond to which specific treatment. Matching treatments to patients is currently based on an inefficient trial-and-error approach: many individuals must cycle through several treatments before finding one that works for them. To combat this problem, recent studies have identified specific variables, measured behaviorally or via self-reports, that can be used to identify the optimal treatment for individuals on the first try [52,53]. MDD, however, is a heterogeneous disorder that likely involves multiple distinct pathophysiologies. These pathophysiologies may be difficult to tease apart behaviorally or by self-report, but they may respond very differently to various treatments. Because EEG is a direct measure of brain activity, it is presumably closer to the pathophysiological process, and so it may be possible to identify EEG-based signals that could be used for treatment prediction (e.g., [54]). In short, an important next step will be to use EEG signals not only to distinguish between individuals with MDD and healthy controls, but also to identify the particular intervention that is likely to work best for one depressed individual versus another.

Conclusions
The current study investigated the value of positive emotion induction as a method for eliciting EEG signals that could aid in the classification of adults as healthy versus depressed. It also compared three types of relative power features (RP-I, RP-II, and RP-III) that have been adopted in different MDD studies and introduced a variant of the SVM, the CK-SVM, as an MDD detector. There are three main findings. First, for all types of RP, the best F-score-ranked features in the resting state and the emotion-induction state were quite different. Second, for all types of RP, the LOPO-CV classification accuracy was better for EEG collected during positive emotion induction than during the resting state. Finally, the CK-SVM classifier outperformed the other three classifiers, yielding an accuracy of 83.64% with the RP-III features. All three findings were based on cross-validation results; in the future, it will be necessary to test these methods (positive emotion-induction EEG relative powers and the CK-SVM classifier) on an independent dataset. In the meantime, the application of the CK-SVM classifier to EEG data collected during a positive emotion-induction state is a promising method for classifying adults as healthy or depressed.

Figure 1. Distribution of the valence and arousal scores of the 27 selected International Affective Picture System (IAPS) pictures. Every picture has valence and arousal scores higher than five. LVHA = low valence, high arousal; HVHA = high valence, high arousal; LVLA = low valence, low arousal; and HVLA = high valence, low arousal.

Figure 2. Examples of the pictures used in this study. The three numbers in the brackets indicate the IAPS identification (ID), mean valence rating, and mean arousal rating of each picture.

Figure 3. Trial sequence for the resting and emotion-induction states. EEG = electroencephalography.

Figure 4. The 128-channel map of the HydroCel GSN. The 29 channels marked in black were used for analysis.

Figure 5. Illustration of the calculation of the three types of relative power in a specific frequency band between a pair of electrodes A and B. DFT = discrete Fourier transform; PSD = power spectral density; and BP = band power.

Figure 6. Fisher ratio (F-scores, sorted from high to low) of the 2030 (a) RP-I; (b) RP-II; and (c) RP-III features extracted during the emotion-induction state. The F-scores of the top-1 RP-I, RP-II, and RP-III features are 0.1680, 0.2311, and 0.1643, respectively. The F-score curves for the three types of relative power features decrease rapidly; the F-scores of the 54th features are 0.0779 (RP-I), 0.0762 (RP-II), and 0.0695 (RP-III), respectively. Finally, the F-score curves converge to near zero, showing that most of the 2030 relative power features are noisy.

Figure 7. Ratings of emotional valence and arousal for the 27 IAPS pictures. * denotes p < 0.05.

Figure 8. Classification accuracy obtained by each top-N F-score-ranked feature set in the emotion-induction state. The best N for the RP-I, RP-II, and RP-III features are 6, 2, and 7, where the classification accuracies are 80.00%, 76.36%, and 80.00%, respectively.

Figure 9. Emotion-induction state versus resting state accuracy improvement ratios.

µ1 and µ2 are the mean vectors of the training data of the first (positive: MDD) and the second (negative: control) classes, respectively; Σ is the N × N covariance matrix of the training data of the two classes; C12 is the penalty weight for the positive class's training error; C21 is the penalty weight for the negative class's training error; and π1 and π2 are the a priori probabilities of the positive and the negative classes, respectively. Here, we set C12 = C21 = 1. The test datum x is classified as MDD if D_LDA(x) > 0; otherwise, it belongs to the control group.
class 1; otherwise, x ∈ class 2. Both the LDA and QDA have the parameters C12 and C21; for a fair comparison, we set C12 = C21 = 1.

Table 1. A comparison of the linear discriminant analysis (LDA)-based leave-one-participant-out cross-validation (LOPO-CV) classification accuracy between the resting and the emotion-induction (EI) states under different frequency-band conditions. The number in parentheses is the best value of N.

Table 3. Comparison of the accuracy of different classifiers using data from the emotion-induction state. QDA = quadratic discriminant analysis; and CK-SVM = conformal kernel support vector machine.