Diagnosing Schizophrenia Using Effective Connectivity of Resting-State EEG Data

Schizophrenia is a serious mental illness associated with neurobiological deficits. Even though brain activities during tasks (e.g., the P300 response) are considered biomarkers to diagnose schizophrenia, brain activities at rest have the potential to show an inherent dysfunctionality in schizophrenia and can be used to understand the cognitive deficits in these patients. In this study, we developed a machine learning algorithm (MLA) based on eyes-closed resting-state electroencephalogram (EEG) datasets, which record the neural activity in the absence of any tasks or external stimuli given to the subjects, aiming to distinguish schizophrenic patients (SCZs) from healthy controls (HCs). The MLA has two steps. In the first step, symbolic transfer entropy (STE), which is a measure of effective connectivity, is applied to resting-state EEG data. In the second step, the MLA uses the STE matrix to find a set of features that can successfully discriminate SCZs from HCs. From the results, we found that the MLA could achieve a total accuracy of 96.92%, with a sensitivity of 95%, a specificity of 98.57%, a precision of 98.33%, an F1-score of 0.97, and a Matthews correlation coefficient (MCC) of 0.94 using only 10 out of 1900 STE features, which implies that the STE matrix extracted from resting-state EEG data may be a promising tool for the clinical diagnosis of schizophrenia.


Introduction
Schizophrenia is a severe neuropsychiatric disorder affecting approximately 20 million people worldwide according to the World Health Organization (WHO) report [1,2]. Schizophrenia is characterized by noticeable psychotic symptoms including hallucinations, delusions, reduction in performance, and thought disorder. Based on neuroimaging evidence on structural, functional, and effective brain connectivity, a core deficit of schizophrenia can be proposed as the failure of effective functional integration within and between brain areas [3].
Several studies have demonstrated alterations in functional connectivity (FC) in patients with schizophrenia (SCZs) in comparison to healthy controls (HCs) in response to external cognitive or sensorimotor stimulation [4][5][6]. However, resting-state electroencephalography (EEG) FC reflects the intrinsic inter-neuronal connections in specific circuits such as the default mode network (DMN) that are attenuated or interrupted during cognitive or sensorimotor tasks [7]. Therefore, investigating resting-state brain connectivity may reveal an intrinsic functional disintegration of brain regions in SCZs.
Machine learning algorithms (MLAs) have been widely used in applications related to neuroscience and psychiatry (e.g., [6,[8][9][10]). Recently, there has been an increasing number of studies that use MLAs to diagnose schizophrenia based on resting-state EEG patterns. In Table 1, we highlight the outcomes of the fourteen most recent studies in this area with the highest classification accuracy. Boostani et al. [11] extracted several features, including band power. Oh et al. [19] applied an 11-layer CNN model to differentiate resting-state EEGs of 14 SCZs and 14 HCs with deep learning. A total of 1142 EEG segments were used for each subject, where each segment consisted of 6250 time samples and 19 electrodes. Therefore, the total number of sampling points was 1142 × 6250 × 19 = 135,612,500. The most significant features were then automatically extracted by the CNN. Their proposed model achieved a classification accuracy of 81.26%. This dataset (14 SCZs and 14 HCs) has been used in four more studies to diagnose schizophrenia [20][21][22][23]. In [20], Jahmunah et al. first segmented the EEG data for each subject into segments of 25 s, obtaining 516 segments for HC and 626 segments for SCZ. They extracted 157 nonlinear features such as the largest Lyapunov exponent, Kolmogorov-Sinai entropy, Hjorth complexity and mobility, Kolmogorov complexity, bispectrum, and permutation entropy. The optimal 14 features were then selected and applied to various classifiers, where the best performance belonged to the SVM classifier with an accuracy of 92.91%. In [21], Buettner et al. used 200 power bands, each 0.5 Hz wide, as features applied to a random forest (RF) classifier. Using 499 1-min samples for all 28 subjects (375 for training and 124 for evaluation), they achieved an accuracy of 96.77%. In [22], Racz et al.
used 21 dynamic features of dynamic FC (DFC) at delta frequency bands (0.5-4 Hz) such as its entropy and multifractal properties to classify the two groups. They achieved a classification performance of 89.29% using an RF classifier. In [23], Goshvarpour et al. selected non-linear features including complexity (Cx), Higuchi fractal dimension (HFD), and Lyapunov exponent and the fusion of these features using five different combination rules (R1: summation, R2: product, R3: division, R4: weighted sum using F-values, and R5: weighted sum using information gain ratio (IGR) rules) for 19 EEG electrodes. Using the probabilistic neural network (PNN) classifier and the R3 rule features, they achieved a classification performance of 100%.
Baradits et al. [24] investigated whether abnormalities in microstates, quasi-stable electrical fields in the EEG data, can be used to classify SCZs and HCs. They used four microstates (microstate A: auditory network; microstate B: visual network; microstate C: salience network; and microstate D: fronto-parietal network) and obtained 24 features, including basic microstate features (microstate average duration, occurrence per second, and full coverage of the time, 4 × 3 = 12 features) and the microstate transition probabilities (12 features). Using the 14 out of 24 features that demonstrated significant differences between SCZ and HC and an SVM classifier, they achieved 82.7% accuracy for classifying 70 SCZs and 75 HCs. Kim et al. [25] recruited 119 SCZs and 119 HCs in their study. They obtained the source-level cortical FC network, where minimum norm estimation (MNE) was used to estimate the time series of source activity and the phase-locking value was used for calculating FC. Values of the clustering coefficient (CC) and path length for the cortical functional network were then used as selected features. Using the linear discriminant analysis (LDA) classifier, the best classification performance was 80.66% with 27 optimal features.
In eight of these studies, a small dataset of SCZs and HCs was analyzed [11,13,[18][19][20][21][22][23], which limits the power and applicability of the MLAs and deep learning algorithms. Databases with larger data samples would provide sufficient training data to adjust the model parameters more accurately and therefore increase the generalizability/reliability of the model, i.e., its performance on previously unseen data. Particularly when the selected features display significant variability, a larger training dataset is required for reliable classification performance. Furthermore, the ratio of the sample size (N_s) to the number of features (N_f) in some studies [11][12][13]16,[20][21][22][23] is much lower than the rule of 10:1, or N_f is larger than the square root of N_s; these are referred to as rules of thumb for preventing over-fitting [24]. Finally, two of these studies [17,19] used deep learning algorithms, which are more complex than traditional MLAs and therefore require more training data to be reliable. Furthermore, complex feature engineering (the process of using domain knowledge to extract features from raw data) makes the model harder to interpret [26]. Hence, only three studies [15,24,25] satisfy both a high to very high sample size (N_s > 100) and the N_f-to-N_s ratio rules; however, the classification accuracy of these studies is less than 85%.
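The two rules of thumb above are easy to check numerically. The sketch below is purely illustrative (the function name is ours, not from the cited studies): it tests whether a given sample size and feature count satisfy the 10:1 rule and the square-root rule.

```python
import math

def passes_overfitting_rules(n_samples: int, n_features: int) -> dict:
    """Check two rule-of-thumb criteria for avoiding over-fitting:
    (a) at least 10 samples per selected feature, and
    (b) no more features than the square root of the sample size."""
    return {
        "ten_to_one": n_features <= n_samples / 10,
        "sqrt_rule": n_features <= math.sqrt(n_samples),
    }

# Example: 132 subjects and 10 selected features, the setting of this study
print(passes_overfitting_rules(132, 10))
```

With 132 subjects and 10 features, both rules hold (10 ≤ 13.2 and 10 ≤ √132 ≈ 11.5), whereas 157 features for 28 subjects, as in some of the small-sample studies, violates both.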
The objective of this study is to develop a new MLA based on effective connectivity (EC) measurements to study distinguishing characteristics between schizophrenic and healthy brains using a small set of selected features. FC reflects the statistical dependencies of signals from different brain regions, as typically revealed by cross-correlation, coherency, or phase lag index measures. In contrast, EC measures the causal influence that one neural unit exerts over another, which defines the mechanisms of neuronal coupling much more precisely than FC [27].
The method to measure EC must fulfill four criteria to be useful for measuring connectivity between brain areas [28], which are (1) independence from any a priori definitions and models; (2) ability to detect strong non-linear interactions across all levels of the brain function from the mechanism of action potential generation in neurons to psychometric functions; (3) the ability to detect EC even with a wide interaction delay between the two signals, reflecting signal transmission through multiple pathways or over complex axonal networks; and (4) robustness against linear cross-talk between signals. Transfer entropy (TE), a model-free statistic that can measure the directed flow of information between two incidents, accomplishes all four of these criteria and can therefore be considered as a suitable method to measure EC [28,29]. For these reasons, TE has gained growing application in neurological science for measuring the information exchanges or understanding the EC across data modalities like EEG [28]. Moreover, it has been recently demonstrated that, for Gaussian variables, it can be estimated with linear vector autoregressive models, since it is operationally equivalent to Granger causality [30]. In this form, it has been used for the estimation of EC on real EEG data [31].
Various methods are available to estimate TE from experimental data (e.g., [32][33][34]). However, most of the methods are very sensitive to noise and need large amounts of data and parameters tuning which limit their utility. Symbolic TE (STE) [35], which estimates TE through symbolization, is a convenient, robust, and computationally efficient method to measure the flow of information in dynamic and multidimensional systems. This makes STE a promising measure of the preferred direction of information flow between brain regions.
STE has been widely applied in EEG studies, including the effects of anesthesia on information processing in the brain [36], the study of epileptic networks [37], investigating the impacts of sleep apnea-hypopnea on the EEG signal [38] and predicting response to clozapine therapy for SCZ patients using resting-state [10] and P300 activities [39]. This confirms that the STE method is a promising tool to study brain network connectivity and its alteration due to mental and neurological disorders and the use of medications. However, to the best of our knowledge, this is the first study applying STE to diagnose schizophrenia from resting-state EEG data. In this study, we investigated the impacts of schizophrenia on STE at various frequency bands.
The contribution of this paper is twofold. First, the fast and robust STE approach is used to measure the EC between brain regions for schizophrenic patients. Second, an MLA is developed based on the features extracted from these EC measures to diagnose schizophrenia. The novelty of this study is therefore in combining STE and MLAs to diagnose schizophrenia, which helps to discriminate SCZ patients from HCs with high accuracy using a small number of features and less complex traditional MLAs, relative to previous studies.

EEG Data
An experienced technician recorded 3.5-min eyes-closed resting-state EEG in a soundproof, electromagnetically shielded room using a 10-20 EEG setup with 20 electrodes (Fp1, Fp2, F7, F3, Fz, F4, F8, T3, C3, Cz, C4, T4, T5, P3, Pz, P4, T6, O1, Oz, O2), where the electrodes' locations follow the unipolar 10-20 Jasper registration scheme [41]. All the recording sessions were scheduled in the morning, and the subjects were requested to avoid smoking and consuming coffee, alcohol, and drugs before the session. The signals were notch filtered at 60 Hz and band-pass filtered between 0.5 Hz and 80 Hz during the recording, and digitized with a sampling frequency of 204.8 Hz. Figure 1 illustrates an example of EEG recordings for HC and SCZ.

Data Pre-Processing
To minimize the artifacts, we first band-pass-filtered the EEG signal with cut-off frequencies of 0.5 Hz and 50 Hz. We then used the wavelet-enhanced independent component analysis (wICA) method to detect and remove the components that were contaminated with the artifacts [42]. wICA uses the wavelet threshold to enhance artifact removal with independent components analysis and can therefore better recover the neural activities that are hidden in the artifacts.

EEG-STE
TE measures the directional information flow between two incidents (data) without assuming any particular model for them, which is especially relevant for detecting the direction of information flow in non-linear interactions with unknown structural information [28,29]. However, estimating the transition probabilities from raw data is not trivial. One solution for this issue is STE. STE transforms raw continuous-valued time series into sequences of discretized symbols, which simplifies the calculation of the probability distributions [35]. Here, we briefly describe the STE procedure.
Consider two random processes X = (x_1, x_2, ..., x_N) and Y = (y_1, y_2, ..., y_N), where x_i and y_i are the ith samples obtained from two regions of the brain. STE estimates the transfer of information between X and Y through a symbolization process. In this method, for a given i, the m amplitude values {x_i, x_{i+d}, ..., x_{i+(m-1)d}} are collected, where m is the embedding dimension, which sets the length of the segments of the random processes to be compared, and d is the time delay in samples. Each such segment is then transformed into a discretized symbol x̂_i according to the ascending rank order of its amplitude values, and the same procedure applied to Y yields ŷ_i. Knowing the two symbol sequences, x̂_i and ŷ_i, STE is then calculated as [35]

T_{Y→X} = Σ p(x̂_{i+t}, x̂_i, ŷ_i) log [ p(x̂_{i+t} | x̂_i, ŷ_i) / p(x̂_{i+t} | x̂_i) ],
where p denotes the transition probability, the sum runs over all symbols, and t denotes a time step. We used the EEGapp pipeline [43] to calculate the STE between every pair of electrodes. In this app, the 3.5-min EEG signal was first divided into 21 time segments of 10 s each, and the STE was computed for each ordered pair of the 20 electrodes (20 × 19 = 380 directed pairs) at each frequency band. Therefore, the total number of features for all 5 frequency bands is N_c = 380 × 5 = 1900 for each subject.
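As a concrete illustration, the symbolization step and the STE estimate can be sketched in pure Python. This is a simplified sketch, not the EEGapp implementation: the function names are ours, logarithms are base 2, and the defaults m = 3, d = 1, t = 1 are illustrative choices.

```python
import math
from collections import Counter

def symbolize(x, m=3, d=1):
    """Map each length-m delay vector to its ordinal (rank) pattern,
    i.e., the permutation that sorts its m amplitude values ascending."""
    n = len(x) - (m - 1) * d
    return [tuple(sorted(range(m), key=lambda k: x[i + k * d])) for i in range(n)]

def symbolic_transfer_entropy(y, x, m=3, d=1, t=1):
    """Estimate T_{Y->X} in bits: the information that Y's current symbol
    adds about X's next symbol beyond X's own current symbol."""
    sx, sy = symbolize(x, m, d), symbolize(y, m, d)
    n = min(len(sx), len(sy)) - t
    triples = Counter((sx[i + t], sx[i], sy[i]) for i in range(n))
    pairs_xx = Counter((sx[i + t], sx[i]) for i in range(n))
    pairs_xy = Counter((sx[i], sy[i]) for i in range(n))
    singles = Counter(sx[i] for i in range(n))
    te = 0.0
    for (x_next, x_now, y_now), c in triples.items():
        p_joint = c / n                                   # p(x_next, x_now, y_now)
        p_cond_xy = c / pairs_xy[(x_now, y_now)]          # p(x_next | x_now, y_now)
        p_cond_x = pairs_xx[(x_next, x_now)] / singles[x_now]  # p(x_next | x_now)
        te += p_joint * math.log2(p_cond_xy / p_cond_x)
    return te
```

For a pair of signals where x simply lags y by one sample, the estimate in the y-to-x direction is clearly larger than in the reverse direction, reflecting the true direction of information flow.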

Machine Learning
The dataset in this study consists of the STE features for all 132 subjects and their corresponding labels: label 1 for the 62 SCZs and label 2 for the 70 HCs. An MLA employs a training set consisting of labeled samples from SCZ and HC subjects to learn to predict the class of a subject. The most discriminating features, defined as features whose values differ between the SCZ and HC classes, were identified from a list of candidate features using various types of feature selection algorithms. This step is needed to avoid over-fitting, which degrades the classification performance. These selected features then define a feature space. The job of a classifier is to optimally partition the available training samples into two separate regions (i.e., an SCZ region and an HC region) in the feature space. The class of a previously unseen sample can then be determined by extracting the selected features from the sample and plotting the corresponding point in the feature space. The proximity of each subject's point to the regions of the feature space occupied by others who are known to be either SCZ or HC then determines that subject's class.
In this study, we used the Relief algorithm [44] for selecting the most discriminating features; it is noise-tolerant and robust to feature interactions. The key idea of the algorithm is to select features according to how similar their values are for neighboring samples in the same class and how different they are for neighboring samples in different classes [44]. One repeatedly noted drawback of the Relief algorithm is that it does not effectively remove feature redundancies, i.e., it selects features without considering their correlation. However, unless two features are very highly correlated (i.e., truly redundant), removing correlated features may discard useful information [45,46]. Furthermore, there is an inverse relationship between the correlation of EEG electrodes and their distance. In this study, since we used a low-density EEG setup with just 20 electrodes, the ECs between these electrodes are not highly correlated due to the long distances between them [47].
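The key idea of Relief can be sketched as follows. This is a minimal pure-Python sketch of the basic scoring rule for continuous features (the function name is ours, and the exact Relief variant used in the study may differ): each sample's nearest neighbor of the same class (hit) and of the other class (miss) contribute to a per-feature weight.

```python
def relief_scores(X, y):
    """Basic Relief scoring for continuous features: a feature is rewarded
    when its value differs for a sample's nearest miss (other class) and
    penalized when it differs for the nearest hit (same class)."""
    n, p = len(X), len(X[0])
    # Per-feature value range, used to normalize distances and updates.
    rng = [(max(r[f] for r in X) - min(r[f] for r in X)) or 1.0 for f in range(p)]

    def dist2(a, b):  # squared Euclidean distance on range-normalized features
        return sum(((a[f] - b[f]) / rng[f]) ** 2 for f in range(p))

    w = [0.0] * p
    for i in range(n):
        hit = min((j for j in range(n) if j != i and y[j] == y[i]),
                  key=lambda j: dist2(X[i], X[j]))
        miss = min((j for j in range(n) if y[j] != y[i]),
                   key=lambda j: dist2(X[i], X[j]))
        for f in range(p):
            w[f] += (abs(X[i][f] - X[miss][f]) - abs(X[i][f] - X[hit][f])) / (rng[f] * n)
    return w
```

A feature that separates the classes well receives a clearly positive score, while an uninformative feature scores near zero or below; features with positive scores are the candidates kept by the selection step described next.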
We used a newly developed consensus nested cross-validation (CN-CV) approach to avoid choosing features dominated by a few subjects [48]. CN-CV is an iterative process. First, the subjects are divided into k (here k = 5) outer folds with the same number of subjects from each class. Then, at each iteration, one particular fold is held out as the test fold and its subjects are removed from the training set. The remaining k - 1 folds are combined and divided into l (here l = 5) inner folds. At each inner fold, all features with a positive Relief score, i.e., those more likely to be relevant to classification, are taken as the selected features for that fold. The consensus (common) features across all l inner folds are then taken as the feature set for that outer fold. The iterations repeat until every outer fold has been held out once. The structure of the CN-CV algorithm is analogous to the well-known nested CV (N-CV) [49], but unlike N-CV, only feature selection is performed in each inner fold. This makes the CN-CV algorithm more computationally efficient than N-CV while selecting fewer irrelevant features [49]. We then selected, among all the features chosen by the CN-CV approach, the first N_r features that give the highest generalized classification accuracy (averaged over all k outer folds) as the final selected features.
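The CN-CV feature-selection loop can be sketched as follows. This is an illustrative pure-Python sketch under simplifying assumptions: folds are built by simple shuffling rather than class-stratified splitting, and `score_fn` stands in for the Relief scorer; the function names are ours.

```python
import random

def consensus_nested_cv(X, y, score_fn, k=5, l=5, seed=0):
    """Sketch of CN-CV feature selection: for each outer fold, score the
    features on l inner folds of the remaining subjects and keep only the
    features with a positive score in every inner fold (the consensus)."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    outer = [idx[i::k] for i in range(k)]
    per_outer_fold = []
    for test_fold in outer:
        train = [i for i in idx if i not in test_fold]
        inner = [train[i::l] for i in range(l)]
        consensus = None
        for held in inner:
            sub = [i for i in train if i not in held]
            scores = score_fn([X[i] for i in sub], [y[i] for i in sub])
            chosen = {f for f, s in enumerate(scores) if s > 0}
            consensus = chosen if consensus is None else consensus & chosen
        per_outer_fold.append(sorted(consensus))
    return per_outer_fold  # one consensus feature list per outer fold
```

Intersecting the per-fold selections is what suppresses features that look discriminating only for a few subjects: a feature survives only if every inner fold, i.e., every subsample of the training subjects, scores it positively.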
The third step is to predict the class (label) of subjects based on the selected features. Various types of classifiers are available for classifying biological signals. In this study, we compare the performance of the five most popular classifiers: Gaussian naïve Bayes (GNB), linear discriminant analysis (LDA), K-nearest neighbors (KNN), support vector machine (SVM), and random forests (RF), using MATLAB R2020a. The choice of these classifiers is based on their effectiveness and the simplicity of their implementation. Here, we briefly describe each method.
(1) Gaussian Naïve Bayes (GNB) GNB method classifies the new data based on applying Bayes' theorem with the "naive" assumption, where the features are assumed to be independent with Gaussian probability distribution. We used the GNB classifier in our study because of its simplicity and transparency in machine learning modalities [50].
(2) Linear Discriminant Analysis (LDA) The LDA classifier assumes the data samples in each class have a Gaussian distribution and that the covariance matrices of both classes are the same. As a result, the decision boundary is a linear surface, and LDA predicts the class of a new datum by estimating the probability that it belongs to each class; the class with the higher probability is assigned to the new datum. Since the discriminant function is linear, LDA may not be suitable for non-linearly separable features. Furthermore, this classifier is very sensitive to outliers [51].

(3) K-Nearest Neighbors (KNN)
KNN classifier assigns new data to a specific class if the majority of its k-nearest neighbors belong to that class within the training set. With a sufficiently high value of k and enough training data samples, KNN can produce non-linear decision boundaries.
KNN is sensitive to the feature vector dimension [52]. However, it is efficient when the dimension of the feature vector is low [53]. (4) Support Vector Machine (SVM) The SVM classifier finds a separating hyperplane that maximizes the distance (margin) between the two classes by minimizing the SVM cost function; the training samples closest to this hyperplane are known as support vectors [54]. SVM is a widely employed classifier in EEG data classification (e.g., [15,16,18,20,24]) because of its high generalization power and relatively good scalability to high-dimensional data.

(5) Random Forests (RF)
RF is an ensemble learning algorithm that combines multiple decision trees at the training stage and uses the mode of their outputs (the class that appears most often) as the final class. This powerful learning algorithm first draws N samples with replacement from the dataset (bootstrapping). It then trains each tree using a subset of the features. The randomness inserted in building the RF makes it robust to outliers in the database [55]. This method is also widely used for classification based on EEG data (e.g., [21,22]).
In this study, we used k = 5 neighbors for the KNN classifier, a Gaussian radial basis kernel function with the sequential minimal optimization technique [56] for the SVM, and 80 decision trees for the RF.
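As an illustration of the best-performing classifier, the KNN decision rule with k = 5 can be sketched in a few lines. This is a minimal sketch using Euclidean distance (the function name is ours, not the MATLAB implementation used in the study):

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, x, k=5):
    """Assign x the majority label among its k nearest training samples
    (Euclidean distance, as in the KNN classifier described above)."""
    nearest = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]
```

With enough training samples and a suitable k, the induced decision boundary is non-linear, which is consistent with KNN outperforming the linear LDA classifier on these features.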
The fourth step is evaluating the classifiers' performance. Due to the small size of our data sample, we first used the five outer folds of the CN-CV approach in this step to obtain an efficient estimate of classifiers' performances. Then, to further investigate the performance of the proposed method, we evaluated the classifiers' performances with another dataset used in studies [19][20][21][22][23] that contains 14 SCZs (7 males (50%), age: 27.9 ± 3.3 and 7 females (50%), age: 28.3 ± 4.1) and 14 HCs (7 males (50%), age: 26.8 ± 2.9 and 7 females (50%), age: 28.7 ± 3.4 years) collected by the Institute of Psychiatry and Neurology in Warsaw, Poland [57], which is available online at RepOD [58].
To evaluate the classifiers' performance, we measured the sensitivity (SCZ prediction rate, i.e., the proportion of SCZs that are correctly identified), specificity (HC prediction rate, i.e., the proportion of HCs that are correctly identified), precision (the proportion of subjects classified into the SCZ class that are correctly identified), total accuracy (the ratio of the total number of correctly identified SCZs and HCs to the total number of participants), F1-score (a measure of a test's accuracy calculated from the precision and the sensitivity, which is a better metric than total accuracy when the class distribution is imbalanced), and the Matthews correlation coefficient (MCC, a measure of the quality of a binary two-class classification) for the GNB, LDA, SVM, KNN, and RF classifiers. These evaluation parameters are given by

Sensitivity = TP / (TP + FN)
Specificity = TN / (TN + FP)
Precision = TP / (TP + FP)
Total accuracy = (TP + TN) / (TP + TN + FP + FN)
F1-score = 2 × Precision × Sensitivity / (Precision + Sensitivity)
MCC = (TP × TN - FP × FN) / √((TP + FP)(TP + FN)(TN + FP)(TN + FN))

where true positive (TP) is the number of SCZs that are correctly identified, true negative (TN) is the number of HCs that are correctly identified, false positive (FP) is the number of HCs that are misclassified into the SCZ class, and false negative (FN) is the number of SCZs that are misclassified into the HC class. MCC is more reliable than the F1-score and total accuracy in binary classification since it produces a high value only if we have high TP and TN rates and low FP and FN rates [59].
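The metrics above follow directly from the four confusion-matrix counts; the sketch below computes all of them in one pass (the function name is ours, and the counts in the example are illustrative, not results from the study):

```python
import math

def classification_metrics(tp, tn, fp, fn):
    """Compute the evaluation metrics used above from the confusion counts
    (tp/tn/fp/fn = true/false positives/negatives)."""
    sens = tp / (tp + fn)                       # SCZ prediction rate
    spec = tn / (tn + fp)                       # HC prediction rate
    prec = tp / (tp + fp)                       # proportion of predicted SCZs that are SCZs
    acc = (tp + tn) / (tp + tn + fp + fn)       # total accuracy
    f1 = 2 * prec * sens / (prec + sens)        # harmonic mean of precision and sensitivity
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {"sensitivity": sens, "specificity": spec, "precision": prec,
            "accuracy": acc, "f1": f1, "mcc": mcc}
```

Note how MCC behaves as claimed: it reaches 1.0 only for a perfect confusion matrix (fp = fn = 0), while accuracy alone can still look high when one class dominates.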

Results and Discussion
Using the Relief algorithm, Table 2 shows the N_r = 10 most discriminating features between SCZ and HC that provided the highest performance, where the second column of the table indicates the frequency band of each feature and the third column shows the brain areas whose STE-based EC is selected as a discriminating feature. For example, from the first row of the table, the first feature is the directed EC from C3 to T3 at the θ frequency band. These selected features are illustrated in Figure 2. The number of features is considerably lower than the 106 training samples at each fold, which prevents over-fitting (the ratio of features to training samples is 10/106 × 100 = 9.43%).

From Table 2, four of the selected features are from the connectivity between the occipital areas at different frequency bands (features 4-6, 10). The other features are from the left centro-temporal (features 1 and 9), frontal (feature 7), fronto-temporal (feature 8), and parieto-temporal (features 2 and 3) areas, which were also identified in previous studies. Several studies verify a significant alteration in these areas and their connections in SCZs compared to HCs. Here, we briefly describe the outcomes of some of the most recent studies. Tohid et al. [60] conducted a systematic review of the relevance of the occipital lobe to schizophrenia and found sufficient evidence supporting a decrease in the volume of the occipital lobe in SCZs. In another study, Maller et al. [61] showed that the prevalence of occipital bending is nearly three times higher among SCZs than HCs. Kawasaki et al. [62] found a significant decrease in SCZs' source activities in comparison to HCs, especially in the medial frontal area, superior temporal gyrus, and temporo-parietal junction (TPJ), using event-related potentials (ERPs) recorded in response to auditory oddball paradigms. Jalili et al. [63] applied a new form of multivariate synchronization analysis called the S-estimator to high-density resting-state EEG data of SCZs and HCs. They revealed higher synchronization across the left fronto-centro-temporal and right fronto-centro-temporo-parietal locations in SCZs than in HCs. Takahashi et al. [64] found that SCZs have a greater complexity than HCs in fronto-centro-temporal regions using multiscale entropy in resting-state EEG activity. Ohi et al. [65] acquired 3T MRI scans from SCZs and HCs.
They revealed that SCZs have significantly smaller bilateral superior temporal gyrus volumes than HCs. Pu et al. [66] found significantly smaller hemodynamic changes in SCZs than in HCs in the ventro-lateral prefrontal cortex and the anterior part of the temporal cortex (VLPFC/aTC) and dorso-lateral prefrontal cortex and frontopolar cortex (DLPFC/FPC) regions using 52-channel near-infrared spectroscopy (NIRS). Ibáñez-Molina et al. [67] used the Lempel-Ziv algorithm to assess the complexity of EEG signals in SCZs. They found a higher complexity in the resting-state EEG signals of SCZs at the right frontal area. Using a multivariable TE (MTE), Harmah et al. [68] discovered the brain dysfunction in EC for SCZs in the EEG signals of the oddball task that deteriorated in the parietal and frontal lobes. These two lobes showed more difference between SCZ and HC even during mental activity [15]. Kim et al. [25] showed that the most frequently selected features for classifying the SCZ vs. HC were from the frontal and occipital lobes. Fuentes-Claramontea et al. [69] used the functional MRI (fMRI) scanning of SCZs and HCs while performing a task with three conditions of (1) self-reflection; (2) other reflection; and (3) semantic processing. They showed a connection between alteration in the right TPJ activity and the disorder in self/other differentiation, which could be associated with psychotic symptoms of schizophrenia and affect social functioning in these patients.
Most of the selected features are at the θ and β frequency bands. An increase in θ-band activity in first-episode and chronic SCZ patients is one of the most consistent observations in schizophrenia EEG/ERP studies and can occur both locally and globally [70]. Furthermore, the EEG signals of SCZ patients show abnormal synchronization in the β and γ bands, suggesting a crucial role in cognitive deficits and other symptoms of schizophrenia [71].

Table 3 shows the training and test performance for the GNB, LDA, SVM, KNN, and RF classifiers, averaged over the five CN-CV outer folds. From Table 3, both the training and test scores are high, indicating that over-fitting has not occurred. Using the test dataset, the KNN classifier discriminates SCZ from HC with the highest averaged total classification accuracy of 96.92%, followed by the RF, GNB, LDA, and SVM classifiers with total accuracies of 95.47%, 95.44%, 95.44%, and 94.67%, respectively. Comparing the sensitivity and specificity, GNB has the highest sensitivity of 96.67%, followed by RF, KNN, SVM, and LDA with sensitivities of 95.12%, 95%, 91.92%, and 88.59%, while KNN has the highest specificity of 98.57%, followed by SVM, LDA, RF, and GNB with specificities of 97.14%, 97.14%, 95.71%, and 94.28%, respectively. With regard to precision, KNN has the highest value of 98.33%, followed by SVM, LDA, RF, and GNB with precision values of 96.92%, 96.33%, 95.48%, and 93.81%, respectively. Finally, KNN has the highest F1-score (0.97) and MCC (0.94).

For illustrative purposes, Figure 3 shows a scatter plot of the 62 SCZs (blue circles) and 70 HCs (black crosses), using kernelized principal component analysis (KPCA) with a polynomial kernel [72]. From the figure, the SCZ and HC clusters are clearly separated, which supports the hypothesis of selecting highly discriminating features.
It is worth noting that, while the selected features were highly discriminating between the two classes, no correlation was found between the values of the selected features and the symptom severity or duration of illness in the SCZ class; that is, the selected features were not closer to the HC class for patients with less severe symptoms or a shorter duration of illness.
We then evaluated the performance of the classifiers by using the selected features in Table 2 on a new dataset available at RepOD [58]. Table 4 shows the performances of the different classifiers in discriminating the 14 SCZs and 14 HCs from the RepOD dataset using 5-fold CN-CV. From Table 4, the performance of all classifiers is above 90%, and the highest performance belongs to the KNN, SVM, and RF classifiers, with a sensitivity of 95.71%, specificity of 100%, precision of 100%, total accuracy of 97.86%, F1-score of 0.98, and MCC of 0.96. This performance is higher than that of studies [19][20][21][22], even though the new dataset was not used for feature selection. This further confirms that the selected features are highly discriminating between SCZ and HC.

Conclusions
In this study, we used STE for the first time to develop an MLA to diagnose schizophrenia from resting-state EEG data. Using the Relief algorithm, we found a set of 10 discriminating features that differentiated between SCZ and HC. We then checked the classification performance first by using 5-fold CN-CV on our dataset (Table 3) and then on a new dataset available at RepOD [58] (Table 4). From Table 3, the highest accuracy belonged to the KNN classifier (sensitivity = 95%, specificity = 98.57%, precision = 98.33%, total accuracy = 96.92%, F1-score = 0.97, and MCC = 0.94), and from Table 4, the highest accuracy belonged to the KNN, SVM, and RF classifiers (sensitivity = 95.71%, specificity = 100%, precision = 100%, total accuracy = 97.86%, F1-score = 0.98, and MCC = 0.96).
We note that the performances indicated in Tables 3 and 4 are higher than typical values obtained from previous studies (Table 1) with a much lower number of features, and less complexity compared to the studies using deep learning approaches. We argue that this performance improvement was due to the effectiveness of the STE method that was employed in the present study. Furthermore, the number of SCZ and HC subjects in this study is higher than most previous studies [11][12][13][14][15][16][17][18][19][20][21][22][23], which can increase the probability of an accurate diagnosis of schizophrenia.
Finally, the selected features are mostly from the EC of occipital, frontal, parietotemporal, and centro-temporal regions that are in accordance with other research studies related to SCZ. This supports the idea that the proposed MLA can identify features from the regions that are mainly affected by SCZ and that the STE effective connectivity extracted from resting-state EEGs could contribute towards a better understanding of the underlying pathophysiology of schizophrenic illnesses.
While the number of subjects in this study was higher than in most previous studies, it is recommended that the proposed MLA be trained on a bigger dataset with a higher number of SCZ and HC subjects in the future to achieve a more reliable classification performance. The proposed MLA also has the potential to be used for differentiating between various neuropsychiatric disorders such as major depressive disorder (MDD), bipolar disorder, autism, and schizophrenia, as well as for predicting the response to the different treatments available for these diseases. Thus, further work is required to investigate disease-related alterations of EC between brain areas in neuropsychiatric disorders and conditions other than schizophrenia, and also the ability to predict the response to different treatments.

Institutional Review Board Statement: Ethical review and approval were waived for this study because it analyzed existing data in which the information was recorded in such a manner that the subjects cannot be identified, directly or through identifiers linked to them.