An fMRI Feature Selection Method Based on a Minimum Spanning Tree for Identifying Patients with Autism

Autism spectrum disorder (ASD) is a neurodevelopmental disorder originating in infancy and childhood that may cause language barriers and social difficulties. However, in the diagnosis of ASD, current machine learning methods still face many challenges in determining the location of biomarkers. Here, we proposed a novel feature selection method based on the minimum spanning tree (MST) to seek neuromarkers for ASD. First, we constructed an undirected graph whose nodes are candidate features. At the same time, a weight calculation method considering both feature redundancy and discriminant ability was introduced. Second, we utilized the Prim algorithm to construct the MST from the initial graph structure. Third, for each node in the MST, the sum of the edge weights of all connected nodes was sorted. Then, the N features corresponding to the nodes with the N smallest sums were selected as classification features. Finally, the support vector machine (SVM) algorithm was used to evaluate the discriminant performance of the aforementioned feature selection method. Comparative experimental results show that our proposed method improves the ASD classification performance, i.e., the accuracy, sensitivity, and specificity were 86.7%, 87.5%, and 85.7%, respectively.


Introduction
Autism spectrum disorder (ASD) is a group of neurodevelopmental deficits. According to recent research reports, one in every 45 children suffers from ASD, which may cause significant social, communication, and behavioral challenges [1][2][3][4][5][6]. In terms of etiology and diagnosis, ASD is a widespread developmental disorder caused by diseases of the brain and nervous system, as well as gene mutations [7]. In general, some symptoms of ASD begin to appear gradually at approximately two years of age [8]. A noninvasive method for diagnosing ASD at an early stage may help clinicians implement interventions in a timely manner and significantly improve patients' quality of life [9,10]. In recent decades, various noninvasive brain imaging methods have been widely used in ASD studies, such as electroencephalography [11], structural magnetic resonance imaging [12], resting-state magnetic resonance imaging [13,14], and diffusion tensor imaging [15], and these methods have made outstanding contributions to extracting biomarkers that characterize disease characteristics. Recently, machine learning has been used in the field of neuroimaging because it accurately and automatically distinguishes individuals with schizophrenia from healthy people [16,17]. For ASD auxiliary diagnosis, machine learning is often utilized in three key steps: (1) constructing functional connectivity networks among regions of interest (ROIs) from resting-state functional magnetic resonance imaging (rs-fMRI) data, (2) analyzing the connectivity of brain regions and identifying differences in connectivity between patients and normal controls, and (3) building a classification model to discriminate individuals with ASD from non-ASD subjects using filtered functional connectivity features [18]. Here, the functional connections are usually represented by a Pearson's correlation coefficient matrix.
However, the high dimensionality of the functional connectivity matrix usually reduces the classification performance, particularly when sufficient neuroimaging samples are lacking. Feeding an excessive number of features into a classifier leads to the curse of dimensionality and overfitting. Thus, the selection of fMRI features is particularly important in disease classification and prediction [19].
To the best of our knowledge, research on feature selection algorithms for ASD diagnosis in the existing literature has mainly covered two approaches. One is the filter-based method, which first selects the features of the dataset and then trains the classifier with the feature subset; notably, feature selection and the subsequent classification are performed independently. For example, Guo [20] designed a two-sample t-test method to classify patients with ASD, and the accuracy rate was 70.67%. However, the t-test method is mainly applicable to small-sample data. Identification via statistical tests relies on the differences between groups and obtains the optimal feature subset based on the p-value [21][22][23][24][25]. Mladen et al. [26] introduced the Fisher score [27] to identify patients with ASD. The method analyzed MRI information along the two dimensions of function and structure, which provides a stronger representation of the information, and the classification accuracy reached 85.06%. Kang et al. [28] combined the minimum redundancy maximum relevance (mRMR) feature selection method with an SVM classifier to classify children with autism and typically developing children, and the resulting classification accuracy was 85.44%. Nevertheless, the mRMR algorithm may underestimate the effectiveness of features to a certain extent. The other approach is based on wrapping and embedding. Specifically, the wrapper method directly uses the performance of the final classifier to evaluate the combined effect of feature selection and classification, while the embedded method combines feature selection with classifier training, i.e., features are automatically selected during classifier training. For instance, Fredo et al.
[29] classified functional magnetic resonance images (fMRI) of typically developing (TD) individuals and participants with ASD using a conditional random forest model. The dimensionality of the functional connectivity (FC) matrix was reduced using conditional random forests, and the classification accuracy was 65%. Taban et al. [30] adopted an autoencoder and a single-layer perceptron to diagnose ASD and achieved an accuracy of 70.3%. To overcome the poor generalization ability of single-layer neural networks and capture the multi-granularity information hidden in fMRI data, Li et al. [31] proposed a patch-level data-expanding strategy for multi-channel convolutional neural networks to automatically identify infants at risk of ASD at an early age, and the accuracy rate reached 76.24%. The various statistical, machine learning, and deep learning methods described above all promote research into the diagnosis of ASD.
The current feature selection methods do not adequately facilitate the diagnosis of ASD. Because of the high-dimensional characteristics of fMRI data, mutually exclusive or redundant information may be included among the features. Traditional filter-based or wrapper-based methods tend to select features containing discriminative information but do not simultaneously monitor the information redundancy among features. From the perspective of feature selection, the model we construct must be able both to select discriminative features and to filter out redundant information; existing methods identify features that maximize the class difference and minimize the redundancy between features without fully considering the decoupling of fMRI features. To achieve this goal, we proposed a feature selection method based on the minimum spanning tree (MST) and applied it to the diagnosis of patients with ASD. Figure 1 shows the schematic of the proposed method, which improves the accuracy of the ASD diagnosis by selecting discriminative and decoupled fMRI features.

Figure 1. (B) After preprocessing, the brain is divided into 90 regions of interest (ROIs); (C) by averaging the BOLD activity in each ROI, a time series is extracted representing brain activity in that region; (D) using different measures of connectivity, a connectivity matrix is generated from the ROI time series, quantifying the connectivity level between individual ROIs; (E) the upper triangular elements of the connectivity matrix are flattened into a feature vector, which constitutes the original features; (F) our proposed feature selection method is applied to choose a handful of features that contribute to the highest classification accuracy; (G,H) the resulting optimal feature subset is passed to an SVM, which trains a model to identify autism spectrum disorder (ASD).
The main contributions of this paper are listed below.
(1) We constructed a minimum spanning tree from the original dataset and fully considered the effect of the feature context on classification to select the optimal feature subset.
(2) By applying the proposed feature selection method to the identification of patients with ASD, the computation of the model is simplified and the recognition accuracy is improved.
(3) We identified abnormal brain regions in patients with ASD by counting the regions with more frequent occurrences in the optimal feature subset. These abnormal brain regions provide important reference information for clinical decision-making.
The structure of the remainder of this paper is described below. In Section 2, we propose a feature selection method based on a minimum spanning tree and define the evaluation function to compare the classification performance. In Section 3, we describe our classification results and abnormal brain regions identified through experiments. We compare the classification performance of different feature selection methods and discuss the abnormal brain regions identified in patients with ASD in Section 4. Finally, in Section 5, the paper ends with conclusions.


Demographic Information
The rs-fMRI data used in this study were obtained from the ABIDE1 database [32]. The ABIDE1 database contains rs-fMRI data from 539 ASD patients and 573 normal controls (NC). This study chose rs-fMRI data that satisfied the following acquisition parameters: MRI scanner = 3.0-T Siemens, TR = 3000 ms, TE = 28 ms, time points = 120, data matrix = 64 × 64, slice thickness = 0.0 mm, flip angle = 90°, and axial slices = 34. We obtained image data from 59 ASD patients (average age 12.5 years, 8 females) and 46 age-matched normal controls (average age 12.4 years, 7 females). All data included in ABIDE were acquired with the approval of the Institutional Review Board (IRB) at each site. All fMRI images were obtained with informed consent in accordance with procedures established by the Human Subject Research Committee.
The demographic information of the 105 subjects is shown in Table 1. We used the Chi-square test and two-sample t-test to evaluate the differences in gender and age between the two groups. As shown in Table 1, no significant differences in age (p = 0.77) and gender (p = 0.81) were observed between the normal control group and the ASD group.
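The group-comparison tests mentioned above can be sketched with SciPy as follows. The age values below are synthetic placeholders, not ABIDE data; the gender counts follow the subject numbers reported above (51/8 male/female for ASD, 39/7 for NC).

```python
# Sketch of the demographic comparison: two-sample t-test on age and
# chi-square test on the gender contingency table. Ages are synthetic.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
age_asd = rng.normal(12.5, 2.0, 59)   # placeholder ages, ASD group
age_nc = rng.normal(12.4, 2.0, 46)    # placeholder ages, NC group

# Two-sample t-test on age
t_stat, p_age = stats.ttest_ind(age_asd, age_nc)

# Chi-square test on gender counts: rows = (ASD, NC), cols = (male, female)
table = np.array([[51, 8],
                  [39, 7]])
chi2, p_gender, dof, _ = stats.chi2_contingency(table)
print(p_age, p_gender)
```

A p-value above 0.05 for both tests would indicate, as in Table 1, that the two groups are matched in age and gender.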

Data Preprocessing
All resting-state fMRI images were preprocessed using the Data Processing Assistant for Resting-state fMRI (DPARSF) [33]. The preprocessing steps are listed below. (1) Removal of the first 10 time points, (2) slice timing correction, and (3) head motion realignment. In this study, all subjects had less than 2 mm of displacement and 2° of rotation in any direction. (4) Next, image standardization was performed by normalizing the corrected fMRI image to the echo planar imaging (EPI) brain template, followed by (5) spatial smoothing, (6) removing the linear trend, (7) temporal filtering, and (8) removing covariates.

Features from Function Connectivity
In this study, we used the Automatic Anatomical Labeling (AAL) [34] atlas to divide the brain into 90 ROIs. Each ROI is regarded as a node of the functional connectivity network, and the connectivity was used to represent the edges among these nodes. The average value of the time series across all voxels within a region serves as the average time series of the region [35]. Then, the Pearson correlation coefficient between each pair of ROI-averaged time series was calculated, and Fisher's r-to-z transformation was performed to measure the functional connectivity between ROIs. In this way, the functional connectivity among the 90 ROIs was modeled mathematically as a 90 × 90 connectivity matrix. Finally, we extracted 4005 (90 × 89/2) edges from the functional network as the features of each individual subject's brain for the subsequent analysis.
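A minimal sketch of this feature-extraction step, using random placeholder time series in place of real ROI signals:

```python
# Pearson correlation between ROI-averaged time series, Fisher r-to-z
# transform, and flattening of the upper triangle into a 4005-dimensional
# feature vector. The time series are random placeholders for one subject.
import numpy as np

n_rois, n_timepoints = 90, 110          # 120 volumes minus 10 discarded
rng = np.random.default_rng(0)
ts = rng.standard_normal((n_rois, n_timepoints))  # placeholder ROI signals

r = np.corrcoef(ts)                      # 90 x 90 Pearson correlation matrix
np.fill_diagonal(r, 0.0)                 # avoid infinities in the z-transform
z = np.arctanh(r)                        # Fisher r-to-z transform

iu = np.triu_indices(n_rois, k=1)        # upper triangle, excluding diagonal
features = z[iu]                         # 90 * 89 / 2 = 4005 features
print(features.shape)                    # (4005,)
```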

Feature Selection
We proposed a new feature selection algorithm based on the minimum spanning tree. First, the connectivity matrix was converted into a feature graph in which each feature (the connectivity between two ROIs) was set as a node (not an edge). The weight of each edge in the graph was defined considering both the redundancy between features and the discriminative information of the features. Second, the Prim algorithm was used to construct the minimum spanning tree, and the original 4005 features were then sorted by the minimum spanning tree. Finally, we utilized the SVM algorithm to compare the classification performance of feature subsets of different dimensions (N = 1, 2, . . . , 4005), and the top N features with the best classification accuracy were selected as the optimal feature subset.
The feature selection algorithm based on the minimum spanning tree was described in detail below.
(1) Construct an undirected weighted feature graph

A graph is a nonlinear data structure, and we used G = (V, E) to represent the feature graph, where V represents the set of all nodes in the graph and E represents the set of edges. In the feature graph, each node represents a connectivity feature, and the weight of an edge measures both the discriminative power of the features and the redundancy between them. Let V_i denote the i-th node of the graph and ω(i, j) denote the weight of the edge between nodes i and j; graph G is shown in Figure 2. In this study, we define a new weight calculation formula as follows:

ω(i, j) = |corr(i, j)| / (R(i) + R(j)), (1)

where |corr(i, j)| is the absolute value of the correlation coefficient between nodes (features) i and j, i.e., the degree of redundancy, and R(i) is a feature evaluation function based on Fisher's criterion [27], calculated as:

R(i) = [n1 (m1 − m)² + n2 (m2 − m)²] / (n1 σ1² + n2 σ2²), (2)

where m is the mean of the feature, m1 and m2 are the mean feature values in each category, σ1² and σ2² are the respective variances, and n1 and n2 are the numbers of samples in the two categories.

The correlation coefficient is a statistical measure of the direction and strength of the relationship between the changing trends of two features, and its value ranges from −1 to +1. A positive value indicates a positive correlation; otherwise, the correlation is negative. Specifically, the greater the absolute value, the stronger the correlation. In the present study, we calculated the Spearman correlation coefficient [36] as in Equation (3):

corr(i, j) = 1 − 6 Σ d_k² / (N (N² − 1)), (3)

where d_k is the difference between the ranks of the corresponding features, and N is the size of the data. By calculating the rank differences of the corresponding data, we obtain the lower and upper limits of the sum in Equation (3), i.e., 0 ≤ Σ d_k² ≤ N (N² − 1)/3.

As shown in Equation (1), the denominator (the Fisher scores) measures the discriminative ability of the two features, while the numerator (the Spearman correlation coefficient) evaluates the redundancy between them. Therefore, if two features have lower redundancy and higher discriminative power, the weight of the edge connecting them is lower, and the problem of finding the optimal feature subset is transformed into the problem of finding the subgraph with the smallest weight.
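As an illustration of this weight definition, the sketch below combines a Fisher-score function with the Spearman correlation; the function names `fisher_score` and `edge_weight` and the synthetic data are illustrative, not taken from the paper's code.

```python
# Sketch of the edge-weight computation: Spearman redundancy in the
# numerator, summed Fisher scores in the denominator (one plausible
# reading of Equation (1)). Names and data are illustrative.
import numpy as np
from scipy.stats import spearmanr

def fisher_score(x, y):
    """Fisher criterion R(i) of one feature x given binary labels y."""
    x1, x2 = x[y == 0], x[y == 1]
    n1, n2 = len(x1), len(x2)
    m = x.mean()
    num = n1 * (x1.mean() - m) ** 2 + n2 * (x2.mean() - m) ** 2
    den = n1 * x1.var() + n2 * x2.var()
    return num / den

def edge_weight(xi, xj, y):
    """Edge weight: redundancy over joint discriminative power."""
    corr, _ = spearmanr(xi, xj)
    return abs(corr) / (fisher_score(xi, y) + fisher_score(xj, y))

rng = np.random.default_rng(0)
y = np.array([0] * 20 + [1] * 20)
xi = rng.standard_normal(40) + y        # feature with some class separation
xj = rng.standard_normal(40)            # weakly informative feature
w = edge_weight(xi, xj, y)
print(w)
```

Note that a pair of highly discriminative, weakly correlated features yields a small weight, which is what the MST construction below exploits.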
(2) Build the minimum spanning tree

In a given undirected graph G = (V, E), ω(i, j) represents the weight of the edge connecting vertex i to vertex j. If an acyclic subset T of E connects all vertices such that the sum of the edge weights W(T) is minimal, then T is the minimum spanning tree of G. The minimum spanning tree is obtained using the Prim algorithm.
The Prim algorithm first takes a vertex as the initial vertex of the minimum spanning tree, then iteratively finds the edge with the lowest weight between the remaining vertices and the vertices already in the tree, and adds it to the tree. If adding an edge would create a cycle, the algorithm skips this edge and selects the next vertex. When all vertices have been added, the minimum spanning tree of the connected graph is found. The time complexity of this algorithm is O(n²), which is independent of the number of edges in the graph, making it suitable for computing the minimum spanning tree of dense graphs.
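The procedure above can be sketched in Python as follows, operating on a dense symmetric weight matrix; the three-node matrix is a toy example, not fMRI data.

```python
# Minimal sketch of Prim's algorithm on a dense weight matrix; O(n^2),
# matching the complexity stated above.
import numpy as np

def prim_mst(w):
    """Return the MST edges (i, j) of a dense symmetric weight matrix w."""
    n = len(w)
    in_tree = np.zeros(n, dtype=bool)
    in_tree[0] = True                        # start from vertex 0
    best_cost = w[0].copy()                  # cheapest edge into the tree
    best_from = np.zeros(n, dtype=int)
    edges = []
    for _ in range(n - 1):
        best_cost[in_tree] = np.inf          # never re-add tree vertices
        v = int(np.argmin(best_cost))        # closest outside vertex
        edges.append((int(best_from[v]), v))
        in_tree[v] = True
        closer = w[v] < best_cost            # update cheapest edges via v
        best_from[closer] = v
        best_cost = np.minimum(best_cost, w[v])
    return edges

w = np.array([[0., 1., 4.],
              [1., 0., 2.],
              [4., 2., 0.]])
print(prim_mst(w))                           # [(0, 1), (1, 2)]
```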
In the present study, given the 4005 candidate connectivity features, the edge weight between each pair of features was calculated using Equation (1), and the MST was built using Prim's algorithm (Algorithm 1). By constructing a minimum spanning tree, we simplified the original dense graph into a minimum-weight subgraph, greatly reducing the number of edges and obtaining vertices of different degrees. If the sum of the weights of the edges connected to a feature node is smaller, the redundancy between this feature and the other features is lower and the discriminative power of this feature for the category is higher, i.e., the feature is more important. We defined the sum of the weights of the edges connected to node i in the minimum spanning tree as SW(i), calculated as follows:

SW(i) = Σ_{j: (i, j) ∈ T} ω(i, j), j = 1, 2, . . . , N, (4)

where N represents the dimension of the feature vector, T is the edge set of the minimum spanning tree, and ω(i, j) represents the weight of the edge between nodes i and j.
By calculating the SW value of all features, we sort them in ascending order. The smaller the SW value, the higher the ranking, indicating that this feature is more important.
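The SW-based ranking can be sketched as follows; the function `rank_by_sw`, the toy MST, and its weights are illustrative.

```python
# Sum the weights of the MST edges incident to each node (SW(i)) and
# rank features in ascending order of SW. `mst_edges` would come from
# the Prim step; a tiny hand-made example is used here.
import numpy as np

def rank_by_sw(mst_edges, weights, n_features):
    """SW(i) = sum of MST edge weights touching node i; rank ascending."""
    sw = np.zeros(n_features)
    for (i, j), w in zip(mst_edges, weights):
        sw[i] += w
        sw[j] += w
    return np.argsort(sw), sw            # smaller SW = more important

# Toy MST over 4 feature nodes: 0-1 (w=0.1), 1-2 (w=0.5), 2-3 (w=0.9)
order, sw = rank_by_sw([(0, 1), (1, 2), (2, 3)], [0.1, 0.5, 0.9], 4)
print(order.tolist())                    # node 0 ranks first (SW = 0.1)
```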
To determine the dimension of the feature set, we employed a support vector machine (SVM) to compare the accuracies of different feature dimensions.
In recent years, SVM has been widely used in the field of neuroimaging-based disease recognition and has received extensive attention from researchers [37]. It is good at processing small samples of data and has the ability to process high-dimensional data, making it an excellent classifier [38,39]. Therefore, SVM was used in this study to measure the classification information of feature subsets.
We sorted all the features of the MST nodes according to the single-feature classification performance of the SVM, with the best-performing feature at the top of the feature list. Then, we used the first two features of the list to train and test the classifier and obtained the corresponding classification accuracy. In each step of the experiment, the number of features was gradually increased until all features were included in training and testing the classifier. Finally, we took the top K features with the highest classification accuracy as the optimal feature subset.
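This incremental wrapper step can be sketched as follows, assuming the features have already been sorted; the synthetic data and the selection loop are illustrative, not the paper's code.

```python
# Grow the ranked feature list one feature at a time and keep the prefix
# with the best cross-validated SVM accuracy. Synthetic data stands in
# for the ranked fMRI features.
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
y = np.array([0] * 50 + [1] * 50)
X = rng.standard_normal((100, 20))
X[:, 0] += y * 2.0                       # pretend feature 0 is top-ranked

best_acc, best_k = 0.0, 0
for k in range(1, X.shape[1] + 1):       # top-k prefix of the ranked list
    acc = cross_val_score(SVC(kernel="rbf"), X[:, :k], y, cv=5).mean()
    if acc > best_acc:
        best_acc, best_k = acc, k
print(best_k, round(best_acc, 3))
```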

Classification Method and Evaluation of Performance
After the feature selection, we used the supervised learning method SVM as the classifier to distinguish patients with ASD. SVM used the radial basis function (RBF) as a kernel and was implemented in Pycharm and Scikit-learn.
In this study, accuracy [40], sensitivity [40], and specificity [40] were used in the 5-fold cross-validation process to evaluate the model performance. The formulas used to calculate these three indicators are as follows:

Accuracy = (TP + TN) / (TP + TN + FP + FN)
Sensitivity = TP / (TP + FN)
Specificity = TN / (TN + FP)

where TP, FN, TN, and FP denote the numbers of true positives, false negatives, true negatives, and false positives, respectively.
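These three indicators can be computed directly from the confusion-matrix counts; the counts in the example are illustrative, not the paper's results.

```python
# The three evaluation metrics from confusion-matrix counts.
def metrics(tp, fn, tn, fp):
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    sensitivity = tp / (tp + fn)           # true-positive rate
    specificity = tn / (tn + fp)           # true-negative rate
    return accuracy, sensitivity, specificity

print(metrics(7, 1, 6, 1))                 # e.g. one CV fold of 15 subjects
```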

Performance of the Model
We extracted the correlation coefficient matrix from the 105 subjects (59 patients with ASD and 46 normal controls). Figure 3 shows the functional and topological brain connectivity networks of a representative ASD patient and a normal control. Next, we applied our proposed feature selection algorithm to rank the importance of the features. Then, the optimal feature subset was identified based on the SVM classifier, and the number of features that resulted in the best performance was 31. As shown in Table 2, the classification accuracy, sensitivity, and specificity of the feature selection algorithm are 86.7%, 87.5%, and 85.7%, respectively, which are clearly better than the results obtained by training the classifier with the original feature set.

The Optimal Feature Set and the Abnormal Brain Regions
According to the feature selection algorithm we proposed, the number of features in the optimal feature subset is 31. We counted the number of occurrences of the brain regions connected by these 31 features and defined the number as the weight of the brain regions. Figure 4 shows the weight map of each brain region. Table 3 shows the brain regions with more frequent occurrences: the right superior occipital gyrus, the left olfactory cortex, the right inferior frontal gyrus (opercular part), the left hippocampus, the left amygdala, and the left precentral gyrus.

Classification Effect
In this study, we used 5-fold cross-validation to assess the accuracy of the model corresponding to each feature subset. Based on the experimental results, the feature selection algorithm we proposed achieved a maximum accuracy of 86.7% when applied to the ASD diagnosis. In recent years, many researchers have proposed new feature selection algorithms and models for the diagnosis of ASD. For example, Chen [41] classified patients with ASD and TD individuals using support vector machine-recursive feature elimination (SVM-RFE) and achieved an accuracy of 66% in combination with a support vector machine. In a previous study [42], ASD was classified based on the differences in the interactions of different brain regions between patients with ASD and healthy controls. Through a nonlinear measurement of the interactions between time series pairs, the average classification accuracy ranged from 70% to 81%. A study [43] used a deep learning method based on an autoencoder to diagnose ASD with an accuracy of 70%. Research [44] proposed a recurrent neural network to integrate the phenotype data of the subjects with rs-fMRI data and extract features from them to diagnose patients with ASD, and their model reached an accuracy of 70.1%.
The classification accuracy of the above models is usually not higher than 80%. Table 4 shows a comparison of the classification accuracy of our proposed method and recently reported methods. When classifying patients with ASD, selecting the optimal feature subset from the high-dimensional original feature set not only improves the classification accuracy but also contributes to our understanding of the brain abnormalities underlying the disease. Based on the analysis described above, the proposed feature selection method based on the minimum spanning tree achieves better diagnostic accuracy.

Table 4. Accuracy comparison of the proposed method and state-of-the-art classification methods.

Comparison with Other Feature Selection Methods
We compared our proposed feature selection method with two other commonly used feature selection algorithms on the same dataset to verify its effectiveness. One is the Fisher score method, a univariate feature selection method that is often used in supervised learning. Its main premise is to identify the feature subset in the feature space that ensures the largest possible distance between data points of different categories and the smallest possible distance between data points of the same category. The other is the recursive feature elimination (RFE) method, which uses a base model for multiple rounds of training; after each round, the features with the smallest weight coefficients are eliminated, and the next round is conducted on the reduced feature set. The comparison methods and our method all adopted the same experimental settings, including the same dataset, the same preprocessing, the same original feature set, the same classifier, and the same 5-fold cross-validation, to fairly compare the effects of the three feature selection methods. Table 5 shows the experimental results obtained using the three feature selection methods, and our proposed method displayed the highest classification accuracy. This analysis further confirms that the feature selection method we proposed exhibits good performance in diagnosing ASD.

Analysis of the Brain Regions with Greater Weight
Based on the experimental results, the superior occipital gyrus, olfactory cortex, inferior frontal gyrus (opercular part), and hippocampus are the main regions showing abnormalities in patients with ASD. Detailed descriptions of the analyses of these regions are presented below.

Superior occipital gyrus
Recent studies of structural and functional brain networks in patients with ASD suggest that cognitive deficits may be attributed to abnormal connections between different brain regions (such as the superior occipital gyrus) [45][46][47][48]. Consistent with our results, the superior occipital gyrus of participants with autism showed significant abnormalities as task complexity increased during a test task, compared with the normal control group [45]. In addition, based on a voxel-wise analysis of patients with ASD and normal controls, individuals with ASD presented a significantly reduced grey matter volume in the superior occipital gyrus [49]. In our experimental results, the superior occipital gyrus appears more frequently, indicating that it plays an important role in the recognition of ASD. This result is consistent with previous studies.

Olfactory cortex
Humans must rely on the olfactory cortex network in the brain to receive information from peripheral brain regions and then transmit the information back to the hippocampus to combine various discontinuous transient factors and form episodic and working memory. The olfactory cortex plays a crucial role in translating daily experiences into lasting memories. Many studies have pointed out that the olfactory cortex has a great influence on basic cognitive processes and is associated with a variety of mental disorders [50][51][52]. The symptoms of patients with autism include abnormal smell recognition, and 5% to 46% of patients with autism have been diagnosed with epilepsy; notably, the olfactory cortex is the key to smell recognition and a structure that leads to epilepsy [53]. In addition, based on a study of structural and functional magnetic resonance imaging of the olfactory changes in patients with ASD, alterations in olfaction in patients with ASD are rooted in the primary olfactory cortex [54]. Previous studies have further confirmed our results, and the olfactory cortex contributes to the identification of patients with ASD.

Inferior frontal gyrus
The frontal lobe is the most developed part of the brain; it is responsible for thinking and calculation and is related to individual needs and emotions. Abnormalities of the inferior frontal gyrus have been reported in several ASD studies. For instance, Grezes et al. [55] and Philip et al. [56] found that activation of the inferior frontal gyrus in patients with ASD decreased during specific tasks. In another study, patients with ASD showed lower activation in the right inferior frontal gyrus than normal controls when presented with fearful faces [57]. During facial expression recognition, patients with autism show hypoactivation of the inferior frontal gyrus; this activation gradually increases with age, whereas that of the control group does not [58]. In a recent study, patients with autism often failed to make accurate social judgments about others when their verbal and nonverbal expressions were inconsistent; the researchers explored the mechanism and concluded that the underlying cause of impaired social judgment in patients with ASD was reduced brain activity in the inferior frontal gyrus [59]. The findings of these studies are consistent with our experimental results, indicating that the inferior frontal gyrus is a brain area showing abnormalities in patients with ASD.

Hippocampus
The hippocampus is located between the thalamus and the medial temporal lobe and is mainly responsible for the storage, conversion and orientation of long-term memory. In a study of memory impairment in patients with ASD, connectivity between the hippocampus and other brain areas was significantly reduced compared with normal controls, indicating that hippocampal damage is a key factor contributing to memory deficits in ASD [60]. In addition, Mackiewicz et al. [61] found that the hippocampus is a key brain area involved in emotional memory, and ASD has been associated with changes in hippocampal volume [62]. In another study, the hippocampal volume of patients with ASD was larger than that of normal controls, indicating that ASD is related to hippocampal volume [63]. Via et al. [64] found that the gray matter volume of the bilateral amygdala-hippocampal complex decreased significantly in patients with ASD. In addition, researchers extracted 11 features from the hippocampi of patients with ASD and normal controls; the two groups presented significantly different feature distributions, and the hippocampal features had a strong ability to distinguish patients with ASD from normal controls [65]. Thus, the hippocampus represents a potential biomarker for the diagnosis and characterization of ASD, consistent with our experimental results.

Conclusions
In this study, a feature selection method based on the minimum spanning tree (MST) was proposed to automatically identify patients with ASD. First, we constructed an undirected graph from the original dataset, treating the features as nodes; the weight between nodes was determined by the redundancy between features and their discriminant ability. Second, we used the Prim algorithm to construct the MST. Finally, combined with the support vector machine algorithm, we took the first N features yielding the highest classification accuracy as the optimal feature subset. Compared with other feature selection methods, our method achieved better classification performance (accuracy of 86.7%). In the present study, we used only functional connections between brain regions as features to identify patients with ASD. In future studies, we will combine additional features, such as graph theory features and dynamic connectivity features, with the feature selection method proposed here to establish a better classification model. In addition, due to the limitations of the dataset, both the ASD and NC groups may suffer from gender imbalance. As a next step, we will use more comprehensive data to address this imbalance and to detect patients at an early stage.
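The pipeline summarized above (feature nodes, redundancy/discriminability edge weights, Prim's MST, selection of the N nodes with the smallest incident-edge-weight sums) can be sketched as follows. This is a minimal illustration, not the paper's exact implementation: the concrete weight formula is a hypothetical choice here (redundancy as the absolute Pearson correlation between features, discriminant ability as the absolute correlation with the binary labels, and edge weight = redundancy / (disc_i + disc_j)); the original study may define these terms differently.

```python
import numpy as np

def mst_feature_scores(X, y, eps=1e-6):
    """Score each feature by the sum of MST edge weights incident to its node.

    X: (n_samples, n_features) data matrix; y: binary labels (0/1).
    Edge weight (an assumed form): redundancy_ij / (disc_i + disc_j + eps),
    so features that are non-redundant and discriminative get small weights.
    """
    n = X.shape[1]
    # Discriminant ability: |Pearson correlation| of each feature with y.
    disc = np.abs(np.array([np.corrcoef(X[:, j], y)[0, 1] for j in range(n)]))
    # Redundancy: |Pearson correlation| between every pair of features.
    red = np.abs(np.corrcoef(X, rowvar=False))
    W = red / (disc[:, None] + disc[None, :] + eps)
    np.fill_diagonal(W, np.inf)  # forbid self-loops

    # Prim's algorithm on the complete feature graph, starting from node 0.
    visited = np.zeros(n, dtype=bool)
    visited[0] = True
    dist = W[0].copy()              # cheapest edge from the tree to each node
    parent = np.zeros(n, dtype=int)  # tree endpoint of that cheapest edge
    scores = np.zeros(n)
    for _ in range(n - 1):
        j = int(np.argmin(np.where(visited, np.inf, dist)))
        w = W[parent[j], j]
        scores[j] += w        # an MST edge contributes to both endpoints
        scores[parent[j]] += w
        visited[j] = True
        mask = (~visited) & (W[j] < dist)
        parent[mask] = j
        dist[mask] = W[j][mask]
    return scores

def select_features(X, y, n_select):
    """Return the indices of the n_select features with the smallest sums."""
    return np.argsort(mst_feature_scores(X, y))[:n_select]
```

The selected columns `X[:, select_features(X, y, N)]` would then be passed to an SVM classifier (e.g., scikit-learn's `SVC`) to evaluate discriminant performance, as in the final step of the method.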

Conflicts of Interest
The authors declare no conflict of interest.