Identifying Brain Abnormalities with Schizophrenia Based on a Hybrid Feature Selection Technology

Featured Application: The hybrid feature selection method, which combines machine learning and traditional statistical methods, is proposed to identify the brain abnormalities of schizophrenia. The results suggest that the brain regions and connectivity in patients with schizophrenia (SZs) are disrupted compared with healthy controls (HCs), which may cause the cognitive deficits and autistic thinking in SZs. The findings support the validity of the proposed hybrid feature selection method, which is therefore promising for other kinds of medical data analysis, to enhance diagnostic ability and to support precision medicine.

Abstract: Many medical imaging data, especially magnetic resonance imaging (MRI) data, have a small sample size but a large number of features. Effectively reducing the data dimension and accurately locating biomarkers in such data are crucial for diagnosis and for precision medicine. In this paper, we propose a hybrid feature selection method based on machine learning and traditional statistical approaches and explore the brain abnormalities of schizophrenia by using functional and structural MRI data. The results show that the abnormal brain regions are mainly distributed in the supramarginal gyrus, cingulate gyrus, frontal gyrus, precuneus and caudate, and the abnormal functional connections are related to the caudate nucleus, insula and rolandic operculum. In addition, complex network analyses based on graph theory are applied to the functional connection data, and the results demonstrate that the located abnormal functional connections in the brain can distinguish schizophrenia patients from healthy controls.
The identified abnormalities in brain with schizophrenia by the proposed hybrid feature selection method show that there do exist some abnormal brain regions and abnormal disruption of the network segregation and network integration for schizophrenia, and these changes may lead to inaccurate and inefficient information processing and synthesis in the brain, which provide further evidence for the cognitive dysmetria of schizophrenia.


Introduction
Schizophrenia (SZ) is a mental disorder characterized by abnormal social behaviour and a failure to understand reality. Decades of research on brain structure and function have provided some understanding of the neurobiological mechanisms underlying its symptoms [1,2]. For example, studies on brain structure suggest that neuroanatomical alterations may underlie the clinical onset of psychotic symptoms. The findings from functional brain imaging studies support a leading hypothesis that SZ stems from disconnectivity, namely abnormal interactions

The second stage adopts wrapper methods to carry out the further feature selection process [17]. Ensemble methods use different sampling strategies to extract multiple sample sets and then apply a specific feature selection algorithm to obtain multiple feature subsets. These feature subsets are further integrated to obtain a more stable feature subset [18]. Compared with the above three methods, the performance of ensemble methods no longer depends on a single selected subset, but it is still limited because only one specific feature learner is used. Hybrid approaches combine two or more well-studied feature selection algorithms into a new strategy that exploits their complementary advantages to solve a particular problem [19,20]. A hybrid approach usually capitalizes on the advantages of its sub-algorithms and is therefore more robust than a single approach. The feature selection techniques mentioned above have been applied to many fields of dimensionality reduction analysis [21][22][23]. In addition to the above five types of feature selection methods, some traditional statistical methods, such as hypothesis testing and correlation coefficients, can also be used to reduce dimensionality.
These methods can identify features with higher distinguishing ability and thereby improve the capacity to discriminate between classes [24,25].
Motivated by the goal of identifying biomarkers of SZ that are associated with composite cognitive ability and specific cognitive domains such as attention, working memory and verbal learning, in this paper we propose a hybrid feature selection method combining machine learning and traditional statistical approaches and use it to explore the brain abnormalities of SZ. The data have 410 features from both functional and structural MRI, i.e., functional network connectivity (FNC) and source-based morphometry (SBM), for 40 patients with SZ and 46 healthy controls (HCs). Applying our method to these two datasets reveals six aberrant brain regions and 17 abnormal functional connections between the SZ group and the HC group. Both decreases and increases were found in the grey matter volume and in the connectivity of brain regions. The decreases mainly appeared in the default mode network (DMN) and salience network (SN): the grey matter volumes of the precuneus (PCUN) and caudate (CAU) were significantly reduced, as was the connectivity between these two regions and between the insula (INS) and CAU. Moreover, all connectivity involving the rolandic operculum and insula was significantly reduced [26][27][28][29][30][31]. Significantly increased grey matter volume was mainly found in the frontal gyrus (FG) and supramarginal gyrus (SMG), and four connections showed significantly increased connectivity, such as that between the middle frontal gyrus and superior occipital gyrus and that between the middle occipital gyrus and fusiform gyrus; corresponding increases have also been discussed in the literature [28,29,32]. To further confirm the significance of the selected abnormal functional connections, we also used complex network analysis.
Since the level of response activity in brain regions and the strength of functional connectivity between brain regions can reflect the degree of brain disorders, the results have the potential to provide evidence for accurate diagnosis and for precision medicine in such psychiatric diseases.

Methodology
There are many feature selection methods based on machine learning, as well as on traditional statistics. Combining the two, and in particular developing a hybrid feature selection method, is still worthy of study. In this section, we introduce a hybrid feature selection method combining three machine learning methods and three statistical methods. In addition, some graph theory is presented to verify the validity of the features selected by the proposed hybrid feature selection method.

Feature Selection with Support Vector Machine
Support vector machine based on recursive feature elimination (SVMRFE) is a multi-variable wrapper feature selection algorithm, and it can keep relevant features and remove relatively insignificant feature variables in order to achieve higher classification performance. SVMRFE was first proposed for gene selection [33], and it has been widely applied to MRI data research, text analysis and biological information processing [34][35][36].
For SVMRFE, the scoring function for each feature $i$ is defined as:

$$c_i = \omega_i^2 \tag{1}$$

where $\omega_i$ is the weight for feature $i$ as obtained from the SVM training. Thus, features that contribute the most to discriminating the two classes are those with the highest values of $|\omega_i|$, and features with small scores are generally considered as noise, redundant or irrelevant to the problem. Therefore, eliminating features with smaller scores does not bring about great changes of the optimization problem, which is the essence of the algorithm [37,38]. The SVMRFE algorithm is briefly described below.
Algorithm 1: Support vector machine based on recursive feature elimination (SVMRFE)
Input: Dataset D
Process:
1. Initialization. Let the current feature subset Current-D contain all features, and the optimal feature subset Best-D = ∅;
2. Training the classifier. Train an SVM on the training set with Current-D, and evaluate the classification accuracy on the test set;
3. Updating Current-D. Calculate the importance of each feature in Current-D by the scoring function (1), and eliminate the features with the smallest score;
4. Updating Best-D. If the accuracy rate of Current-D is greater than that of Best-D, then let Best-D = Current-D;
5. Repeat Steps 2-4 until the stop condition is satisfied.
Output: The optimal feature subset Best-D

The stopping criterion can be a desired dimensionality, a pre-specified number of iterations or a generalization of the performances, etc.
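The loop in Algorithm 1 can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the implementation used in this study: the linear SVM is trained by a simple batch subgradient descent on the hinge loss, the squared weight is used as the score, Best-D is updated before the elimination step, and the function names (`train_linear_svm`, `svm_rfe`) and hyperparameters are hypothetical.

```python
def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Fit w, b by batch subgradient descent on the L2-regularised hinge loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [lam * wj for wj in w], 0.0
        for xi, yi in zip(X, y):
            score = sum(wj * xj for wj, xj in zip(w, xi)) + b
            if yi * score < 1:  # sample violates the margin
                for j in range(d):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def accuracy(w, b, X, y):
    """Fraction of samples whose predicted sign matches the label (+1 / -1)."""
    hits = 0
    for xi, yi in zip(X, y):
        score = sum(wj * xj for wj, xj in zip(w, xi)) + b
        hits += (1 if score >= 0 else -1) == yi
    return hits / len(y)


def select(X, idx):
    """Keep only the columns listed in idx."""
    return [[row[j] for j in idx] for row in X]


def svm_rfe(X_train, y_train, X_test, y_test, n_keep=1):
    current = list(range(len(X_train[0])))  # Current-D: all features
    best, best_acc = [], -1.0               # Best-D starts empty
    eliminated = []
    while True:
        w, b = train_linear_svm(select(X_train, current), y_train)
        acc = accuracy(w, b, select(X_test, current), y_test)
        if acc > best_acc:                  # strict improvement only
            best, best_acc = list(current), acc
        if len(current) <= n_keep:          # assumed stop condition
            break
        worst = min(range(len(current)), key=lambda j: w[j] ** 2)
        eliminated.append(current.pop(worst))
    return best, eliminated


# Toy data: feature 0 separates the classes, features 1 and 2 are uninformative.
X_train = [[2, 1, 1], [2, -1, -1], [-2, 1, -1], [-2, -1, 1]]
y_train = [1, 1, -1, -1]
X_test = [[1.0, 0.5, -0.5], [-1.0, -0.5, 0.5]]
y_test = [1, -1]
best, eliminated = svm_rfe(X_train, y_train, X_test, y_test)
```

On this toy data the two uninformative features are eliminated first. Because Step 4 uses a strict inequality, subsets that only tie the best accuracy do not replace Best-D.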

Feature Selection with Random Forest
Random forest (RF) is an ensemble machine learning method using tree-type classifiers. It is built by bootstrap sampling technology and random splitting technology, and the final classification result is made by a majority vote of the trees [39,40]. Because of its excellent generalization performance, RF is also further used for feature selection [41,42].
For a given tree, let $S_0$ denote the set of input predictor data vectors and $S_j$ be the subset of the predictor data reaching node $j$ in the binary split tree. According to the performance of the current feature on node $j$, $S_j$ can be divided into two subsets, $S_j^L$ and $S_j^R$, with $S_j^L \cup S_j^R = S_j$ and $S_j^L \cap S_j^R = \emptyset$. The best split is chosen according to the mean decrease of the Gini index, which is defined as:

$$\Delta Gini(j) = Gini(j) - \frac{|S_j^L|}{|S_j|}\, Gini(j_L) - \frac{|S_j^R|}{|S_j|}\, Gini(j_R) \tag{2}$$

where $Gini(j) = 1 - \sum_{c \in C} P_c^2$ is the Gini index at node $j$, $P_c$ is the proportion of samples of class $c$ at the node, and $j_L$ and $j_R$ denote the child nodes that receive $S_j^L$ and $S_j^R$. This metric reflects the contribution of each feature to node $j$; therefore, we can get an estimate of the importance of feature $i$, the Gini importance:

$$GI(i) = \sum_{t} \sum_{j} \Delta Gini_i(j, t) \tag{3}$$

where $\Delta Gini_i(j, t)$ is the value of $\Delta Gini_i(j)$ on one tree $t$. The Gini importance of a feature indicates how large its overall discriminative value is for the classification task. We randomly chose a feature $i$, calculated its Gini importance defined in (3) and removed the features with Gini importance below that of feature $i$. This is the algorithm for feature selection with random forest by Gini importance (RFFS-GI).

In addition, with bootstrap sampling, about 1/3 of the samples are not drawn for a given tree, and they are called the out-of-bag (OOB) data [43]. The OOB data can be considered as equivalent to test data. Therefore, we can also use the classification accuracy of the random forest classifier on the OOB data as the feature separability criterion, so as to calculate the importance of each feature:

$$Imp(i) = \frac{1}{N} \sum_{t=1}^{N} \left( ooberr2_t - ooberr1_t \right) \tag{4}$$

where $ooberr1_t$ is the classification error of tree $t$ on the OOB data, $ooberr2_t$ is its classification error on the OOB data after adding noise to feature $i$, and $N$ is the number of trees in the random forest. If randomly disturbing a feature greatly increases the classification error on the OOB data, the feature can be considered to have a great influence on the classification result.
The algorithm of feature selection with random forest by the classification accuracy on the OOB data (RFFS-OOB) is briefly described below.
Algorithm 3: Feature selection with random forest by the classification accuracy on the OOB data (RFFS-OOB)
Input: Dataset D
Process:
1. Generate the random forest;
2. Calculate the feature importance based on the scoring function (4), and sort the scores;
3. Select the top-ranked features as the optimal feature subset.
Output: Optimal feature subset.
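The two random forest importance scores can be illustrated with a few lines of Python. This is a hedged sketch: it assumes the usual size-weighted form of the Gini decrease for a single split and the per-tree average of the OOB error increase, and the function names are hypothetical rather than taken from any random forest library.

```python
from collections import Counter


def gini(labels):
    """Gini index 1 - sum_c P_c^2 of the class labels reaching a node."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gini_decrease(parent, left, right):
    """Decrease of the Gini index when `parent` is split into `left` and `right`."""
    n = len(parent)
    return (gini(parent)
            - len(left) / n * gini(left)
            - len(right) / n * gini(right))


def oob_importance(oob_err, oob_err_noised):
    """Mean increase of the per-tree OOB error after perturbing one feature."""
    n_trees = len(oob_err)
    return sum(e2 - e1 for e1, e2 in zip(oob_err, oob_err_noised)) / n_trees


# A perfectly separating split of a balanced two-class node.
decrease = gini_decrease([0, 0, 1, 1], [0, 0], [1, 1])

# Hypothetical per-tree OOB errors before and after adding noise to a feature.
importance = oob_importance([0.2, 0.3], [0.4, 0.5])
```

Summing `gini_decrease` over every node and tree where a feature is used gives its Gini importance; a larger `oob_importance` means that perturbing the feature hurts the OOB accuracy more.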
In order to improve the accuracy of feature selection results for the SBM and FNC data, we used SVMRFE, RFFS-GI and RFFS-OOB, and repeated them 20 times separately, counted the frequency of the selected features by each feature selection method and integrated the optimal feature subsets.

Feature Selection Based on Statistical Methods
For classical statistical methods, the discriminative ability of a feature can be quantitatively measured by its contribution to distinguishing different classes [25,44].
The Kendall tau correlation coefficient provides a distribution-free test of independence between two variables. The Kendall tau correlation coefficient of feature $j$ can be defined as:

$$\tau_j = \frac{n_c - n_d}{n_1 \times n_2} \tag{5}$$

where $n_c$ and $n_d$ are the numbers of concordant and discordant pairs, respectively, and $n_1$ and $n_2$ are the numbers of samples in the two classes. For a pair of data $(x_{ij}, y_i)$ and $(x_{kj}, y_k)$ of feature $j$, it is a concordant pair when $\mathrm{sgn}(x_{ij} - x_{kj}) = \mathrm{sgn}(y_i - y_k)$, where $\mathrm{sgn}(\cdot)$ is the signum function (i.e., $\mathrm{sgn}(x) = -1$ for $x < 0$, $\mathrm{sgn}(x) = 0$ for $x = 0$ and $\mathrm{sgn}(x) = 1$ for $x > 0$). Correspondingly, it is a discordant pair when $\mathrm{sgn}(x_{ij} - x_{kj}) = -\mathrm{sgn}(y_i - y_k)$. The discriminative power of each feature $j$ is defined as the absolute value of its Kendall tau correlation coefficient.

The permutation test is a non-parametric test method, which is suitable for the case of a small sample size and unknown sample distribution. Assume that there are two samples $x_A$ and $x_B$, with sample means $\bar{x}_A$ and $\bar{x}_B$ and sample sizes $n_A$ and $n_B$. At first, we calculate the observed test statistic $T_{obs} = \bar{x}_A - \bar{x}_B$. Then, the two samples are merged and divided into two groups with sizes $n_A$ and $n_B$. For each division, the difference between the mean values of the two groups is calculated and recorded. The set of calculated differences is the exact distribution of the difference under the null hypothesis. Finally, the proportion of divisions whose absolute difference is greater than or equal to $|T_{obs}|$ is the p-value of the two-sided test.
By the two-sample t-test, we can also determine whether there are significant differences of each feature. The t-value of feature $j$ can be defined as:

$$t_j = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} \tag{6}$$

where $\bar{x}_1$ and $\bar{x}_2$ are the means of feature $j$ for patients and healthy controls (HCs), $s_1$ and $s_2$ are the corresponding standard deviations, and $n_1$ and $n_2$ are the corresponding sample sizes. With the Kendall tau correlation coefficient, permutation test and two-sample t-test, we can identify features with significant differences.
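The three statistics can be sketched with the Python standard library as follows. This is only an illustration under stated assumptions: pairs tied in either the feature or the label are counted as neither concordant nor discordant, the permutation test enumerates every division (feasible only for very small samples), the t-value uses the unpooled-variance form, and all function names are hypothetical.

```python
from itertools import combinations
from math import sqrt
from statistics import mean, variance


def sign(v):
    return (v > 0) - (v < 0)


def kendall_tau(x, y):
    """(n_c - n_d) / (n_1 * n_2) for one feature x and binary labels y in {0, 1}."""
    n_c = n_d = 0
    for i, k in combinations(range(len(x)), 2):
        s = sign(x[i] - x[k]) * sign(y[i] - y[k])
        n_c += s > 0  # concordant pair
        n_d += s < 0  # discordant pair (ties contribute to neither count)
    n_1 = sum(1 for label in y if label == 1)
    n_2 = len(y) - n_1
    return (n_c - n_d) / (n_1 * n_2)


def permutation_p_value(a, b):
    """Exact two-sided permutation p-value for the difference of means."""
    t_obs = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    indices = range(len(pooled))
    extreme = total = 0
    for chosen in combinations(indices, len(a)):
        chosen_set = set(chosen)
        group_1 = [pooled[i] for i in chosen]
        group_2 = [pooled[i] for i in indices if i not in chosen_set]
        total += 1
        # Small tolerance guards against floating-point round-off.
        extreme += abs(mean(group_1) - mean(group_2)) >= t_obs - 1e-12
    return extreme / total


def t_value(a, b):
    """Two-sample t-value with unpooled variances (an assumed form)."""
    return (mean(a) - mean(b)) / sqrt(variance(a) / len(a) + variance(b) / len(b))


tau = kendall_tau([1, 2, 3, 4], [0, 0, 1, 1])
p_perm = permutation_p_value([1, 2], [3, 4])
t = t_value([1, 2], [3, 4])
```

For group sizes such as 40 and 46, enumerating every division is not feasible; a common workaround, not described in the text above, is to draw a large random sample of divisions instead.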

Hybrid Feature Selection Based on Both Machine Learning and Statistical Methods
By combining the above machine learning methods and statistical methods, we propose a hybrid feature selection approach. For the machine learning methods, we summed the selection frequencies from SVMRFE, RFFS-GI and RFFS-OOB and kept the features whose total frequency was greater than a given value b as the significant feature subset. For the statistical methods, we kept the features whose absolute Kendall correlation coefficient was greater than a given value c and whose p-values for both the two-sample t-test and the permutation test were less than 0.05. Finally, we integrated the significant feature subsets from the machine learning and statistical methods to obtain the optimal feature subset. The flowchart of this hybrid feature selection procedure is shown in Figure 1. The experimental results will show that the proposed hybrid feature selection method is an effective way to combine machine learning and statistical methods.
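The combination step can be sketched as below. This is a minimal illustration, assuming that the statistical criteria are combined with a logical AND and that "integrating" the two subsets means taking their intersection, as done in the Experiments section; the function name, argument names and toy numbers are hypothetical.

```python
def hybrid_select(freq_svm, freq_gi, freq_oob, tau, p_t, p_perm, b, c, alpha=0.05):
    """Return the indices of features kept by both the ML and statistical criteria."""
    n_features = len(tau)
    # Machine learning subset: total selection frequency above the threshold b.
    ml_subset = {
        j for j in range(n_features)
        if freq_svm[j] + freq_gi[j] + freq_oob[j] > b
    }
    # Statistical subset: strong Kendall correlation and both tests significant.
    stat_subset = {
        j for j in range(n_features)
        if abs(tau[j]) > c and p_t[j] < alpha and p_perm[j] < alpha
    }
    return sorted(ml_subset & stat_subset)


# Toy example with three features and 20 repetitions per method.
selected = hybrid_select(
    freq_svm=[20, 5, 18], freq_gi=[20, 3, 19], freq_oob=[19, 2, 20],
    tau=[0.40, 0.30, 0.10], p_t=[0.01, 0.02, 0.20], p_perm=[0.01, 0.03, 0.30],
    b=55, c=0.26,
)
```

Here features 0 and 2 pass the frequency criterion and features 0 and 1 pass the statistical criteria, so only feature 0 is kept.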

Complex Network Analysis Based on Graph Theory
The data we used here are a type of MRI data, which contain information on both the regions and the functional connections of the brain. The hybrid feature selection method can be used directly to explore the disease-related abnormal brain regions and abnormal functional connections. Furthermore, since the tasks allocated to the brain are completed through coordination and cooperation between brain regions, it is necessary to study the connection networks of the brain in depth.
The analysis of complex network properties by several indexes (see Figure A1) can characterize the topological attributes of the network. For example, the clustering coefficient quantifies the functional segregation of the brain network, which reflects the ability of specialized processing to occur within densely interconnected groups of brain regions. The characteristic path length quantifies the functional integration of the brain network, which reflects the ability to rapidly combine specialized information from distributed brain regions [45]. Both global and local network efficiencies quantify the transmission capability of the brain network, i.e., the ability to transmit information between different brain regions. The main difference is that the global network efficiency concerns the whole brain network, whereas the local network efficiency concerns only local sub-networks. Thus, by complex network analysis, we can confirm the significance of the selected abnormal connection features and further explore the mechanism of SZ.
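As a rough illustration of three of these indexes, the sketch below computes the clustering coefficient, characteristic path length and global efficiency of a small graph. It assumes a binary, undirected, connected graph stored as a dictionary of neighbour sets, whereas the networks in this study are weighted; the exact formulas in Figure A1 are not reproduced here, and the function names are hypothetical.

```python
from collections import deque


def shortest_paths(adj, source):
    """Breadth-first search distances from `source` to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def clustering_coefficient(adj):
    """Average over nodes of 2 * t_i / (k_i * (k_i - 1)); nodes with k_i < 2 count as 0."""
    total = 0.0
    for nbrs in adj.values():
        k = len(nbrs)
        if k < 2:
            continue
        triangles = sum(1 for u in nbrs for v in nbrs if u < v and v in adj[u])
        total += 2 * triangles / (k * (k - 1))
    return total / len(adj)


def path_length_and_efficiency(adj):
    """Characteristic path length and global efficiency (connected graph assumed)."""
    n = len(adj)
    dist_sum = inv_sum = 0.0
    for i in adj:
        dist = shortest_paths(adj, i)
        for j in adj:
            if j != i:
                dist_sum += dist[j]
                inv_sum += 1 / dist[j]
    pairs = n * (n - 1)
    return dist_sum / pairs, inv_sum / pairs


triangle = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}
chain = {0: {1}, 1: {0, 2}, 2: {1}}
c_tri = clustering_coefficient(triangle)
l_chain, e_chain = path_length_and_efficiency(chain)
```

The fully connected triangle has maximal segregation (clustering coefficient 1), while the three-node chain has no triangles, a longer characteristic path length and a lower global efficiency.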

Experiments
In this section, based on the hybrid feature selection method and network topological analysis, we located the brain abnormalities of both regions and connections with SZ. Firstly, by the SVMRFE, RFFS-GI, RFFS-OOB, correlation coefficient and hypothesis test, the candidates of brain regions and connections associated with SZ were selected separately, and then, by the hybrid method, we could confirm the significant regions and connections of SZ. Furthermore, the complex network analysis based on graph theory was used to verify the selected abnormal connections. Ultimately, we could locate some of the abnormal brain regions and abnormal connections with SZ, which provided theoretical guidance for the rapid and accurate diagnosis of psychiatric diseases and adjuvant therapy.

Data Collection and Preprocessing
In this study, the Machine Learning for Signal Processing (MLSP) 2014 Schizophrenia classification challenge data were used. The data can be downloaded from https://www.kaggle.com/c/mlsp-2014-mri. They were collected on a 3T MRI scanner at the Mind Research Network and funded by the Centers of Biomedical Research Excellence. Image preprocessing was performed using statistical parametric mapping software (SPM, http://www.fil.ion.ucl.ac.uk/spm). Further feature extraction was done with the GIFT Toolbox (http://mialab.mrn.org/software/gift/), yielding two imaging modalities, i.e., SBM features for structural MRI and FNC features for resting-state functional MRI.
The data consisted of 40 patients with SZ and 46 HCs. A diagnosis of SZ was made by using the Structured Clinical Interview for DSM-IV (SCID; Diagnostic and Statistical Manual of Mental Disorders, DSM) [46]. Each sample had 410 features (32 for SBM and 378 for FNC). SBM features were weights of brain regions, and they indicated the concentration of grey matter in different regions of the subject's brain [47]. FNC features were the pair-wise correlation values between the time-courses of 28 brain regions and can be seen as a functional modality feature describing the subject's overall level of synchronicity between brain areas [48]. These 28 brain regions were selected according to the automated anatomical labeling (AAL) template, and they are shown in Figure A2, while the connections between the brain regions corresponding to these FNC features are shown in Figure A3.
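The feature counts above can be checked with a short calculation: 28 regions give 28 × 27 / 2 = 378 unordered region pairs, and 32 + 378 = 410. In the sketch below, the ordering of the pairs is purely illustrative and is not claimed to match the column order of the challenge data.

```python
from itertools import combinations
from math import comb

N_REGIONS = 28  # brain regions whose time-courses are correlated
N_SBM = 32      # structural (SBM) features per subject

# One FNC feature per unordered pair of regions (illustrative ordering only).
fnc_pairs = list(combinations(range(N_REGIONS), 2))
n_fnc = len(fnc_pairs)
n_features = N_SBM + n_fnc
```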

Locating the Abnormalities in Brains for SZ
For both the FNC and SBM data, we performed feature selection based on machine learning and on statistical approaches, respectively. The hybrid process then selects the key features: for the SBM data, we obtained the abnormal brain regions, and for the FNC data, the abnormal connectivities. Further, we used the brain network based on graph theory to analyse the selected abnormal connections. Figure 2 shows the whole flowchart of the procedure.

Feature Selection Results Based on Machine Learning Methods
SVMRFE, RFFS-GI and RFFS-OOB were each applied to perform feature selection on the MRI data, with each method being repeated 20 times. Since these three methods rely on classification results and the SBM and FNC data had different classification performance, we selected the features of the two types of data separately in order to obtain their key features more clearly. The frequency with which each feature was selected by the three feature selection methods is shown in Figures 3 and 4 and Figures A4-A7.
It is generally believed that if the frequency of occurrence of a feature is too low, then the feature is not significant. Therefore, we only considered features with a higher frequency to obtain the significant feature subset. Figure A8 shows the frequency distribution for frequencies greater than or equal to 50; each point in this figure gives the number of features whose frequency of occurrence is greater than or equal to x. Further, we selected features with a frequency greater than or equal to 55, which is a balance between the number of features and the frequency (the details can be found in the illustration of Figure A8).

Feature Selection Results Based on Statistical Methods
Statistical methods were utilized to screen out features with significant differences. The results of the Kendall correlation coefficient are shown in Figure 5, and the hypothesis test results are shown in Figure 6.
We selected features with hypothesis test p-values less than 0.05 and an absolute Kendall correlation coefficient greater than 0.26, which is a balance between the size of the selected feature subsets and their ability to distinguish SZ. The results are shown in Figure 7, where $\tau$ is the Kendall correlation coefficient and $p_1$ and $p_2$ are the p-values of the two-sample t-test and the permutation test, respectively.

Figure 7. Feature selection results based on statistical methods.

Feature Selection Results Based on a Hybrid Method
The key candidate features for SZ were selected by both the machine learning and the statistical methods, and the two subsets were quite similar. We adopted their intersection as the final selected feature subset, and thus obtained the abnormal brain regions from the SBM data (see Figure 8) and the abnormal functional connectivity from the FNC data (see Figure 9).

Figure 8. The selected abnormal brain regions of SZ by the hybrid method.

Segall et al. presented the relationships between the cortical maps and the brain regions described by the SBM features [47]. Figure 8 shows the brain regions selected by our method that differed between SZ and healthy controls. These abnormal brain regions were mainly distributed in the supramarginal gyrus (SMG), cingulate gyrus (CG), middle frontal gyrus (MFG), precuneus (PCUN), superior frontal gyrus (SFG) and caudate (CAU). Compared with the HC group, the SZ group had significantly reduced grey matter volumes in the CG, PCUN and CAU and significantly increased grey matter volumes in the SMG, MFG and SFG.

Figure 9. ... Figure A2), in which ML refers to machine learning methods and SM refers to statistical methods. The circular connectivity graph in the middle is a schematic map of the selected functional connections, which are listed in the fourth column of the left table. The labels in this graph correspond to the regions of interest, and the corresponding spatial maps of these regions (see [48]) are also shown in this graph. The right graph depicts the locations and connections of the selected brain regions, drawn with the BrainNet Viewer toolbox [49].

Figure 9 shows that the hybrid feature selection method proposed here discovers 17 abnormal functional connections between the SZ group and the HC group.
Furthermore, by combining this with the relationship between the connections and the regions shown in Figure A2, six connections are related to the caudate nucleus (CAU), linking it with the rolandic operculum (ROL), insula (INS), supramarginal gyrus (SMG), superior occipital gyrus (SOG), precuneus (PCUN) and median cingulate and paracingulate gyri. In addition, there were three abnormal functional connections related to the insula (i.e., with the ROL, amygdala and CAU) and four aberrant functional connections related to the rolandic operculum (i.e., with the insula, lingual gyrus, superior parietal gyrus and caudate). Among the abnormal connections discovered by our method, all connectivity involving the rolandic operculum and insula was significantly reduced, and the connectivity related to the caudate nucleus was significantly decreased except for that with the median cingulate and paracingulate gyri. We also observed significantly increased connectivity between the middle frontal gyrus and superior occipital gyrus, between the middle occipital gyrus and fusiform gyrus, and between the left and right superior parietal gyrus. In conclusion, the brain connectivity in SZ generally decreased, with a small number of connections showing increased connectivity. To show these abnormal connections more vividly, in Figure 9 we used the BrainNet Viewer toolbox to draw the precise locations of the brain regions with aberrant connections and to show the aberrant brain connectivity network in SZ [49].

Network Evaluation
Further, to support the validity of the connectivity findings obtained by the above hybrid feature selection method, we constructed a brain network based on these connections and explored its topological properties [50,51]. More specifically, we first chose the clustering coefficient (C), characteristic path length (L), global network efficiency (Eg) and local network efficiency (Eloc) as the evaluation indexes for each network. Then, we constructed weighted networks with a threshold of one for the original and the selected FNC data. Finally, these four parameters were calculated for both SZ and HCs and compared by a two-sample t-test. The p-values of these four parameters were $1.70 \times 10^{-1}$, $5.02 \times 10^{-3}$, $2.99 \times 10^{-2}$ and $4.27 \times 10^{-2}$ for the original FNC data and $6.64 \times 10^{-2}$, $3.41 \times 10^{-6}$, $5.40 \times 10^{-6}$ and $1.90 \times 10^{-2}$ after feature selection by our method. The p-values of all four parameters decreased after feature selection, which means that the differences between HCs and SZ in these parameters became more apparent, especially for the characteristic path length and the global network efficiency. This shows that HCs and SZ become clearly more distinguishable after hybrid feature selection and supports the validity of our method.

Discussion
The methods based on machine learning pay more attention to the classification accuracy, whereas the statistical methods emphasize the correlation between feature and label, which is the essential difference between the two approaches. Comparing the significant subsets selected by these two approaches, most of the biomarkers in the two subsets were the same, which means that although the two approaches have different emphases, both of them found the significant features. Further, by integrating the significant subsets of these two approaches, the significant features are double checked and finally obtained by the hybrid method proposed in this article. For example, for the data before feature selection, the p-value of the characteristic path length, referred to as L in the above section, was $5.02 \times 10^{-3}$. The p-values of L for the optimal subset I, obtained by the machine learning methods, for the optimal subset II, obtained by the statistical methods, and for the optimal subset obtained by the proposed hybrid method were $9.40 \times 10^{-6}$, $2.82 \times 10^{-5}$ and $3.41 \times 10^{-6}$, respectively. The results show that HCs and SZ became clearly distinguishable after feature selection; in particular, our method yielded a smaller p-value than either the machine learning or the statistical methods alone. In summary, the hybrid method can combine the strengths of both machine learning and statistical methods to improve the accuracy of the results, and the results of the network evaluation also confirmed this point.
Our findings are consistent with reports that the grey matter volume of the CG, PCUN and CAU is significantly reduced in SZ [26,27,52,53]. The CG is considered to be a brain region closely related to task attention, memory and affect, and it has been reported to be impaired in SZ [54]. The PCUN is the portion of the superior parietal lobule on the medial surface of each brain hemisphere, and it is often considered to play an important role in the pathogenesis of SZ [55]. The Behavioural Inhibition System (BIS) activity and Cloninger's temperamental dimension Harm Avoidance (HA) are mainly studied in connection with the anxiety trait [56,57], and research results show that BIS sensitivity and HA are negatively correlated with the regional grey matter volume in the CG and PCUN; therefore, SZ may be accompanied by the anxiety trait due to the reduction of the grey matter volume in these two regions [58]. The CAU is one of the structures that make up the dorsal striatum, which is a component of the basal ganglia. It can affect the cognitive function of patients, resulting in decreased memory ability, and may be a cause of cognitive dysfunction in SZ [59].
In our findings, most of the brain connectivity in SZ was significantly reduced, in line with the generally accepted view that functional connectivity is significantly reduced in SZ and that this reduction may impair information integration [60]. Among the abnormal connectivities, the CAU, INS and ROL were the most connected regions. The INS mainly participates in the formation of aversion, the regulation of pain, the production of depression, the regulation of cardiac activity and the planning of language [61], and these functions may be related to the affective symptoms in SZ. Moreover, many studies have found decreased connectivity in the INS, which may disrupt the functional integration of the brain [30]. The ROL is mainly involved in language, and Wu et al. suggested that reduced connectivity of the ROL increases the vulnerability of speech recognition to speech masking [62]. In addition, the ROL has been shown to be associated with hallucination [63]. It has been reported that SZ is often accompanied by motor abnormalities, and that abnormalities of the motor system are related to the abnormal functional connectivity of the CAU and CG [64]. Furthermore, the DMN, including the posterior cingulate cortex and lateral temporal cortex, and the SN, including the INS and CAU, have been shown to have abnormal connectivity in SZ [65]. The DMN is mainly related to oriented attention and self-monitoring [66], and the SN is implicated in orienting toward salient external stimuli and internal events [67]. These results indicate that the abnormal connectivity of the CAU and INS may result in cognitive deficits.
In addition to the decreases in regions and connections described above, we also found increases in the SMG, MFG and SFG, as well as increased connectivity between the MFG and superior occipital gyrus, between the median cingulate and paracingulate gyri and the CAU, between the left and right superior parietal regions, and between the middle occipital gyrus and fusiform gyrus. Corresponding conclusions have also been reported in the literature [28,29,32]. Previous research showed that the connectivity of the frontoparietal network (FPN) and DMN is significantly increased [65]. The FPN, which includes the dorsolateral prefrontal cortex and dorsolateral parietal cortex, is implicated in executive control [68], which suggests that executive control function in SZ differs from that in HCs. In conclusion, most of the abnormal brain regions and connectivity discovered by our method were related to cognition and hallucination. These abnormalities may be the reason for the cognitive deficits and autistic thinking in SZ. Moreover, our results show that, compared with HCs, the brain network of SZ does not exhibit a uniform decrease or increase, but a mixture of both. Most of the abnormal connectivity may impair information integration and transmission. Thus, our method did find abnormal regions and connectivity of the brain that were strongly related to SZ, and the results also support the effectiveness of using functional disconnectivity from neuroimaging as a biomarker for the diagnosis of mental disorders [69].

Conclusions
Using the proposed hybrid feature selection approach, which combines machine learning and traditional statistical methods, we identified abnormal brain regions and abnormal connections in the brains of SZ. The results on the SBM data showed that the abnormal brain regions of SZ were mainly distributed in the supramarginal gyrus, cingulate gyrus, middle frontal gyrus, superior frontal gyrus, precuneus and caudate. These brain regions are reported to have a strong association with SZ, and they are mainly involved in perception, thinking, emotion and mental activity. The results on the FNC data showed that most of the abnormal functional connections in the brains of SZ were related to the FPN, DMN and SN. These three networks are closely related to cognitive deficits, especially in executive control and salience processing. All of the results suggest that the brain regions and connectivity in SZ are disrupted compared with HCs, and the abnormal activity may cause the cognitive deficits and autistic thinking in SZ. In addition, the complex network analysis further verified the significance of the selected abnormal functional connections. All findings support the validity of the proposed hybrid feature selection method, and thus such a method is promising for other kinds of medical data analysis to enhance diagnostic ability.

Figure A1. Different measuring parameters of the global and local network properties. Here, t_i is the number of triangles around node i; d_ij is the shortest path length between node i and node j; C_rand and L_rand are the average clustering coefficient and characteristic path length obtained from 100 random networks with the same number of nodes and edges and the same degree distribution as the original network; σ_jk is the number of shortest paths between j and k; and σ_jk(i) is the number of shortest paths between j and k that pass through i.
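The quantities named in the Figure A1 caption can be illustrated with a minimal sketch. It assumes an unweighted, undirected graph stored as an adjacency dictionary and uses the conventional textbook definitions: the local clustering coefficient 2 t_i / (k_i (k_i - 1)), the characteristic path length as the mean of d_ij, and small-worldness as (C / C_rand) / (L / L_rand). The function names and the toy graph are hypothetical and are not taken from the paper. Generating the 100 degree-preserving random networks is not shown, so C_rand and L_rand are passed in as plain numbers.

```python
from collections import deque

def clustering_coefficient(adj, i):
    """Local clustering coefficient of node i: 2 * t_i / (k_i * (k_i - 1))."""
    neighbours = adj[i]
    k = len(neighbours)
    if k < 2:
        return 0.0
    # t_i: number of triangles around node i, i.e. edges among its neighbours
    t = sum(1 for u in neighbours for v in neighbours if u < v and v in adj[u])
    return 2.0 * t / (k * (k - 1))

def shortest_path_lengths(adj, source):
    """Breadth-first search distances d_ij from source (unweighted graph)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def characteristic_path_length(adj):
    """Mean of d_ij over all ordered pairs of distinct, connected nodes."""
    total, pairs = 0, 0
    for i in adj:
        for j, d in shortest_path_lengths(adj, i).items():
            if j != i:
                total += d
                pairs += 1
    return total / pairs if pairs else float("inf")

def small_worldness(C, L, C_rand, L_rand):
    """Conventional small-world index: (C / C_rand) / (L / L_rand)."""
    return (C / C_rand) / (L / L_rand)

# Hypothetical toy graph: triangle 0-1-2 with node 3 attached to node 2
toy = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
mean_C = sum(clustering_coefficient(toy, i) for i in toy) / len(toy)
L = characteristic_path_length(toy)
```

For real connectivity matrices, a thresholding or weighting step would be needed before these measures apply. That choice is left open here because it depends on the analysis pipeline.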
1    21  17  |  49   17  55  |  97   23  71  |  145  38  20  |  193  46  59  |  241  67  47
2    21  7   |  50   17  42  |  98   23  55  |  146  38  47  |  194  46  50  |  242  67  49
3    21  23  |  51   17  20  |  99   23  42  |  147  38  49  |  195  46  53  |  243  48  39
4    21  24  |  52   17  47  |  100  23  20  |  148  56  29  |  196  46  25  |  244  48  59
5    21  38  |  53   17  49  |  101  23  47  |  149  56  46  |  197  46  68  |  245  48  50
6    21  56  |  54   7   23  |  102  23  49  |  150  56  64  |  198  46  34  |  246  48  53
7    21  29  |  55   7   24  |  103  24  38  |  151  56  67  |  199  46  60  |  247

Figure A8. The feature frequency distribution for frequencies greater than or equal to 50. The x axis corresponds to the frequency of occurrence, and the y axis is the number of features. When the frequency is in the red range, i.e., greater than or equal to 52 and less than or equal to 56, the number of features is quite stable. Compared with other ranges, the red range offers a balance between the number of features and the frequency of occurrence, which facilitates the analysis of the abnormal functional connections and structures of the brain associated with the disease. Therefore, we selected features with a frequency greater than or equal to 55.
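The frequency-based selection described in the Figure A8 caption can be sketched as follows. This is only an illustration under stated assumptions: it supposes that feature selection is repeated over several runs, that each run returns a list of selected feature identifiers, and that a feature is kept when it appears in at least a given number of runs. The function names and the toy data are hypothetical. The threshold of 55 comes from the caption, while the number of runs is not stated in this section.

```python
from collections import Counter

def stable_features(selected_per_run, min_frequency):
    """Return the features chosen in at least `min_frequency` runs, sorted."""
    # set(run) ensures a feature is counted at most once per run
    counts = Counter(f for run in selected_per_run for f in set(run))
    return sorted(f for f, c in counts.items() if c >= min_frequency)

def features_per_threshold(selected_per_run, thresholds):
    """Number of surviving features for each candidate frequency threshold."""
    return {t: len(stable_features(selected_per_run, t)) for t in thresholds}

# Hypothetical toy data: four runs instead of the many runs used in practice
runs = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["a", "b", "d"]]
kept = stable_features(runs, 3)                       # a: 4, b: 3, c: 2, d: 1
curve = features_per_threshold(runs, [1, 2, 3, 4])
```

Plotting `curve` over a range of thresholds gives the kind of distribution shown in the figure. A flat stretch of the curve marks thresholds at which the selected subset changes little.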