Two-Step Feature Selection for Identifying Developmental Differences in Resting fMRI Intrinsic Connectivity Networks

Abstract: Functional connectivity derived from functional magnetic resonance imaging (fMRI) is an effective way to assess brain architecture. There has been growing interest in its application to the study of intrinsic connectivity networks (ICNs) during different brain development stages. fMRI data are of high dimension but small sample size, so it is crucial to perform dimension reduction before pattern analysis of ICNs. Feature selection is thus used to reduce redundancy, lower the complexity of learning, and enhance interpretability. To study the varying patterns of ICNs in different brain development stages, we propose a two-step feature selection method. First, an improved support vector machine based recursive feature elimination method is utilized to study the differences of connectivity during development. To further reduce the highly correlated features, a combination of F-score and correlation score is applied. This method was then applied to the analysis of the Philadelphia Neurodevelopmental Cohort (PNC) data. The two-step feature selection was performed 20 times with random data splits, and those features that showed up consistently in the experiments were chosen as the essential ICN differences between different brain ages. Our results indicate that ICN differences exist in brain development, and they are related to task control, cognition, information processing, attention, and other brain functions. In particular, compared with children, young adults exhibit increased functional connectivity in the sensory/somatomotor network, cingulo-opercular task control network, visual network, and some other subnetworks. In addition, the connectivity in young adults decreases between the default mode network and other subnetworks such as the fronto-parietal task control network. The results are consistent with the fact that the connectivity within the brain shifts from segregation to integration as an individual grows.


Introduction
The human brain is a complex system with different regions dedicated to different functions, which are locally segregated but globally integrated to process information. At present, many brain imaging techniques, such as functional magnetic resonance imaging (fMRI), are used to characterize and quantify the brain. fMRI measures the changes in the blood oxygen level dependent (BOLD) signal, which can reveal correlations in neural activity between distant brain regions [1][2][3]. These correlations are of fundamental interest to neuroscientists for the comprehensive and noninvasive exploration of [...] network, auditory system, and executive control system, there is an increased connectivity in children compared to young adults. For subjects 10-20 years old, nodes at multiple scales, e.g., voxel-wise or regions of interest (ROIs) within the brain, were examined in [28]. The results showed that connectivity between hub and non-hub regions changed with the development from childhood to adolescence. Based on 264 ROIs identified by parcellation analysis from rs-fMRI data, functional modules in the brain were detected for subjects 8-22 years old [29][30][31]. The results indicated that the network organization is stable throughout adolescence, and cross-network integration increases with age.
In this paper, we propose a two-step feature selection method to identify the differences of ICNs during brain development. Two-step feature selection methods couple two effective feature selection methods to assemble an optimal feature set for building the prediction model [32]. Our method takes full advantage of the support vector machine recursive feature elimination (SVM-RFE), which can efficiently reduce noise and irrelevant features in the classification task. In addition, by combining the F-score with the correlation score, only the discriminative features are kept. Our results suggest that there exist many remarkable differences of ICNs with age. Among them, the significant differences in connectivity are strongly associated with task control, cognition, information processing, attention, alertness, and other functions. Compared with children, young adults experience increased connectivity between the sensory/somatomotor network and the cingulo-opercular task control network, visual network, and ventral attention network. At the same time, young adults experience reduced connectivity between the default mode network and the fronto-parietal task control network. Our discoveries coincide with many prior studies, e.g., [26,27,33-38]. Moreover, we reveal that with increasing age, most of the connections among different resting state networks become remarkably enhanced and those within resting state networks generally decline. This is consistent with the fact that the connectivity of the brain shifts from segregation to integration as one grows.
The analysis of topological properties based on graph theory further confirms the significance of the difference of ICNs between the two groups, namely, children and young adults. We chose clustering coefficient, characteristic path length, local network efficiency and global network efficiency as the evaluation criteria of the brain network. By the two-sample t-test, the p-values of these four parameters for the brain networks of the two groups are all very small. Additionally, compared with six important feature selection methods, the two-step feature selection method needs fewer than 200 features to identify the differences of ICNs during development; for the other methods, 600 or even more features are needed. The proposed method also compares favorably with the other methods in terms of redundancy degree and classification performance. This indicates that the two-step feature selection method proposed here can detect the changes of ICNs more precisely.

Methods
In this paper, we propose a two-step feature selection strategy to select features without redundancy, with the potential to improve the accuracy of prediction and identification of sparse ICN changes.
Step 1: Feature selection based on SVMRFE-CA
SVM-RFE is a wrapper feature selection algorithm that helps avoid overfitting when the data are of high dimension. It can remove insignificant features in order to achieve higher classification performance. SVM-RFE was first proposed for gene selection [39], and has been widely applied to many other fields [40][41][42].
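Suppose that in the training set $\{(x_i, y_i)\}_{i=1}^{N}$, $x_i \in \mathbb{R}^{D}$ is the $i$-th sample, $y_i \in \{+1, -1\}$ is the label of $x_i$, $N$ is the sample size, and $D$ is the dimension of each sample (thus the original dimension of the feature data is also $D$). Finding the optimal classification hyperplane of SVM is equivalent to minimizing the objective function $J$:

$$\min_{\omega, b, \xi} \; J = \frac{1}{2}\|\omega\|^{2} + C\sum_{i=1}^{N}\xi_i \quad \text{s.t.} \quad y_i(\omega^{T}x_i + b) \ge 1 - \xi_i, \;\; \xi_i \ge 0, \;\; i = 1, \dots, N, \tag{1}$$

where $2/\|\omega\|$ is defined as the margin, $\xi_i$ is the slack variable related to training errors, and $C$ is the penalty coefficient used as a trade-off between the maximum margin and the minimum training errors. By the Lagrange multiplier method, the optimal solution of (1) is

$$\omega = \sum_{i=1}^{N}\alpha_i y_i x_i, \tag{2}$$

where $\alpha_i$ is the Lagrange multiplier. Let $\omega = (\omega_1, \omega_2, \cdots, \omega_D)^{T}$. In the SVM-RFE algorithm, each $\omega_i$ is defined as the weight of feature $i$ [39].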
In [40], it was proven that $\omega_i^2 \approx |J - J_i|$, where $J_i$ represents the value of the objective function after feature $i$ is removed; i.e., $\omega_i^2$ approximates the change of the objective function when feature $i$ is eliminated. Therefore, eliminating features with small weights does not have a large impact on the objective function of the optimization problem, which is the essence of the SVM-RFE algorithm. Since $|\omega_i|$ and $\omega_i^2$ rank the features in the same order, the importance of feature $i$ can be evaluated by the score $\mathrm{Score}(i)$ of Equation (3). For SVM-RFE, the importance of each feature of the training data is calculated by (3), and the feature with the smallest score is deleted [43]. The SVM is then retrained with the remaining features, and the iterations continue until the stopping criterion is reached.
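The elimination loop described above can be illustrated with a minimal, dependency-free sketch. Everything here is an assumption made for illustration: the linear SVM is trained by plain batch subgradient descent on the soft-margin objective (not the solver used in the study), the score is taken to be $|\omega_i|$, and the function names `train_linear_svm` and `svm_rfe_ranking` are hypothetical.

```python
def train_linear_svm(X, y, C=1.0, lr=0.01, epochs=200):
    """Minimise 0.5*||w||^2 + C*sum(hinge loss) by batch subgradient descent.

    Illustrative trainer only; returns the weight vector w.
    """
    d = len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = list(w), 0.0           # gradient of the 0.5*||w||^2 term
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:                       # sample violates the margin
                for j in range(d):
                    grad_w[j] -= C * yi * xi[j]
                grad_b -= C * yi
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w


def svm_rfe_ranking(X, y):
    """Return feature indices ordered from first eliminated to last survivor."""
    remaining = list(range(len(X[0])))
    eliminated = []
    while len(remaining) > 1:
        X_sub = [[row[j] for j in remaining] for row in X]
        w = train_linear_svm(X_sub, y)
        scores = [abs(wj) for wj in w]           # assumed score: |w_j|
        worst = scores.index(min(scores))        # feature with the smallest score
        eliminated.append(remaining.pop(worst))  # drop it, then retrain
    return eliminated + remaining
```

On a toy set where feature 0 separates the classes strongly, feature 1 weakly, and feature 2 is constant, the sketch eliminates feature 2 first and keeps feature 0 to the end. Removing one feature per iteration is the simplest variant; on data with tens of thousands of features, removing a block of features per iteration would be the practical choice.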
In the SVM-RFE algorithm, once a feature is eliminated, it is not considered again in the subsequent process, even though it may have some discriminative ability. This happens especially when the data set is small but of high dimension. In order to improve the performance of SVM-RFE, we divide the training data into two parts: one for feature selection and the other for validation. The classification accuracy (CA) on the validation set is regarded as the evaluation criterion of the candidate feature subset. Then, by combining this criterion with backward deletion, the optimal feature subset can be obtained. Specifically, if a feature with a small weight is picked for removal but the CA on the validation set decreases after its removal, then this feature is reconsidered. Additionally, multiple subsampling strategies on the training data are utilized and a series of feature selection processes is performed to avoid mistakenly removing such features. Finally, the resulting series of feature subsets forms one optimal feature subset. The improved SVM-RFE method is denoted as SVMRFE-CA.
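One possible reading of the validation-guided backward deletion is sketched below. It is an interpretation, not the authors' exact procedure: `rank_fn` (returns the current features ordered from least to most important, e.g., by SVM weight) and `acc_fn` (returns the validation classification accuracy of a feature subset) are hypothetical callables supplied by the caller, and "reconsidered" is taken to mean that the feature stays in the subset.

```python
def backward_elimination_with_validation(features, rank_fn, acc_fn, min_features=1):
    """Backward deletion that keeps a feature if removing it lowers validation CA.

    rank_fn(subset) -> subset ordered from least to most important (assumed).
    acc_fn(subset)  -> validation classification accuracy of subset (assumed).
    """
    current = list(features)
    best_acc = acc_fn(current)
    while len(current) > min_features:
        removed = False
        for f in rank_fn(current):               # try the lowest-weight feature first
            trial = [g for g in current if g != f]
            acc = acc_fn(trial)
            if acc >= best_acc:                   # removal does not hurt validation CA
                current, best_acc, removed = trial, acc, True
                break
            # otherwise f is reconsidered: it remains in the subset
        if not removed:                           # every removal would lower the CA
            break
    return current, best_acc
```

In the toy check used to test this sketch, a feature with the smallest weight but a real contribution to validation accuracy survives, which is the behavior the paragraph above motivates.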
For SVMRFE-CA, the selected features are evaluated by their classification performance, so the features in the optimal feature subset are beneficial to the discrimination. However, some of them may be highly correlated and therefore redundant. In what follows, we remove such redundant features from those obtained by SVMRFE-CA and retain only the most discriminative ones as the final optimal feature subset.
Step 2: Feature selection based on F-score and the correlation score to form a new feature selection technique (FSCS)
F-score is a simple and valid feature selection method to quantify the discernibility of a candidate feature [44]. The F-score of the $i$-th feature, $F_{score}(i)$, is defined by Equation (4), where $\mu_i^{P}$ and $\mu_i^{N}$ are the means of the $i$-th feature over the positive samples and the negative samples, and $\sigma_i^{P}$ and $\sigma_i^{N}$ are the corresponding variances. In Equation (4), the numerator can be considered as an index of the distance between the centers of the two data sets, which evaluates the ability of the $i$-th feature to distinguish between the positive and negative sample sets. Since the variance of a data set indicates its degree of concentration, the denominator of Equation (4) measures how tightly the $i$-th feature clusters the samples within each class. Therefore, the larger $F_{score}(i)$ is, the more distinguishable feature $i$ is between the two sample sets.
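As a rough illustration of a score with this structure, the sketch below assumes a common Fisher-type form: the squared difference of the class means divided by the sum of the class variances. This specific form, the use of the sample variance, and the name `f_score` are assumptions for illustration and may differ from Equation (4).

```python
from statistics import mean, variance


def f_score(pos_values, neg_values):
    """Fisher-type score of one feature (assumed form, see lead-in).

    pos_values / neg_values: values of the feature over the positive and
    negative samples. Each class needs at least two samples, and the two
    variances must not both be zero.
    """
    between = (mean(pos_values) - mean(neg_values)) ** 2    # distance of class centers
    within = variance(pos_values) + variance(neg_values)    # spread inside each class
    return between / within
```

For example, `f_score([1, 2, 3], [4, 5, 6])` gives 4.5: the class means differ by 3 and each class has sample variance 1.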
In addition, we introduce the correlation score $C_{score}(i, D)$ between feature $i$ and a feature set $D$, defined by Equation (5), where $c(i, j)$ represents the Pearson correlation coefficient between features $i$ and $j$, and $|D|$ denotes the number of features in $D$. $C_{score}(i, D)$ therefore measures the degree of correlation between feature $i$ and the feature set $D$.
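A correlation score of this kind can be sketched as the mean absolute Pearson correlation between a feature and the members of a feature set. Averaging over $|D|$ and taking absolute values are assumptions made here for illustration; the helper names `pearson` and `correlation_score` are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long, non-constant sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def correlation_score(feature, feature_set):
    """Mean absolute correlation between one feature and a set of features (assumed form).

    feature: values of the candidate feature over all samples.
    feature_set: list of features, each given in the same way.
    """
    if not feature_set:
        return 0.0                                # nothing selected yet, no redundancy
    total = sum(abs(pearson(feature, other)) for other in feature_set)
    return total / len(feature_set)
```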
In order to further select the most discriminative and non-redundant features from the feature subset obtained by SVMRFE-CA, we combine the F-score and the correlation score to form a new feature selection technique, i.e., FSCS. Its evaluation criterion, FSCS(i), is defined by Equation (6), where $D_{sel}$ is the currently selected feature set. By this definition, the greater FSCS(i) is, the stronger the distinguishing ability of the candidate feature $i$ and, at the same time, the weaker the correlation between feature $i$ and the features in $D_{sel}$. The FSCS technique thus maximizes the discriminative power of feature $i$ while minimizing its overall correlation with the other selected features, which further refines the feature subset selected by SVMRFE-CA.
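The greedy selection implied by this criterion can be sketched as follows. The combined criterion is assumed here to be the difference between a feature's F-score and its mean absolute correlation with the already selected features; a quotient form would be equally plausible, so this is not a restatement of Equation (6). The function name `fscs_select` and its inputs (precomputed F-scores and a feature-by-feature correlation matrix) are hypothetical.

```python
def fscs_select(f_scores, corr, k):
    """Greedily pick k features with high F-score and low redundancy (assumed criterion).

    f_scores: list with one F-score per feature.
    corr: square matrix, corr[i][j] = Pearson correlation of features i and j.
    """
    candidates = set(range(len(f_scores)))
    selected = []
    while candidates and len(selected) < k:

        def criterion(i):
            if not selected:                      # first pick: F-score only
                return f_scores[i]
            redundancy = sum(abs(corr[i][j]) for j in selected) / len(selected)
            return f_scores[i] - redundancy       # assumed difference form

        best = max(sorted(candidates), key=criterion)
        selected.append(best)
        candidates.remove(best)
    return selected
```

With F-scores `[1.0, 0.9, 0.5]` and features 0 and 1 correlated at 0.95, the sketch picks feature 0 and then prefers the weaker but nearly uncorrelated feature 2 over the redundant feature 1.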

The Two-Step Feature Selection Method
In view of the above SVMRFE-CA and FSCS methods, we propose a two-step feature selection technique; its flowchart is shown in Figure 1. In the first step, an integration of feature subsets is produced by multiple iterations of the SVMRFE-CA algorithm with random division of the data. The frequency of each feature in these subsets is aggregated and sorted. The first n features with the highest frequency are retained, and they form an integrated feature subset which has a strong correlation with the classification performance. Based on this, the second step employs FSCS to further select the features with maximal discriminative ability and minimal overall correlation with the other candidate features. Using the two-step feature selection, we thus obtain the optimal feature subset. The two-step feature selection algorithm is described in Algorithm 1.

Algorithm 1: Two-step feature selection method
1: Input: training set D
2: Initialize: Randomly divide D into two parts, 3/4 for feature selection and 1/4 for validation. The current feature subset D_cur_1 contains all features of D, and the selected feature subset D_sel_1 = ∅. Let AR_cur_1 = 0 and AR_sel_1 = 0, where AR_cur_1 and AR_sel_1 are the classification accuracies on the validation part of the training data based on D_cur_1 and D_sel_1, respectively.
3: for t = 1:T_1
3.1: Train SVM based on D_cur_1 to get the weight w_i of each feature i, and calculate Score(i) by Equation (3).
[...] Score(i).
3.4: Repeat Steps 3.1 to 3.3 until the stopping condition is satisfied (the stopping criterion can be a pre-specified number T_1 of iterations or a desired generalization performance).
4: end for
5: Repeat Steps 2 to 4 M times. Count the total frequency of every feature in all D_sel_1 obtained over the M runs, sort the features by frequency, and get the current optimal feature subset D_opt_1.
6: Initialize: [...]
7.3: Repeat Steps 7.1 to 7.2 until the termination condition is reached (the termination condition can be a pre-specified number T_2 of iterations or a given threshold of FSCS(k)).
8: Output: Final optimal feature subset D_opt = D_sel_2.

Results on Philadelphia Neurodevelopmental Cohort (PNC) Data
In this section, we apply the proposed two-step feature selection method to study the differences of functional connectivity (FC) during brain development. The flowchart of investigating the developmental differences of the brain between children and young adults is shown in Figure 2.

Data Collection and Preprocessing
The Philadelphia Neurodevelopmental Cohort (PNC) is a large-scale collaborative study between the Brain Behavior Laboratory at the University of Pennsylvania and the Children's Hospital of Philadelphia, which contains resting state fMRI data for subjects aged 8 to 22 [45]. Standard brain imaging preprocessing steps were applied using Statistical Parametric Mapping 12 (SPM12), including motion correction, spatial normalization to standard Montreal Neurological Institute (MNI) space (spatial resolution of 2 × 2 × 2 mm) and spatial smoothing with a 3 mm Full Width Half Maximum (FWHM) Gaussian kernel. The influence of motion (six parameters) was further addressed using a regression procedure, and the functional time series were band-pass filtered using a 0.01 Hz to 0.1 Hz frequency range. Finally, the dimension of the data was reduced using the standard 264 ROI atlas as defined by Power et al. with a sphere radius parameter of 5 mm [29]. The time series within each ROI are then averaged, so that the data are reduced to a 264 × T matrix for every subject, where T = 124 denotes the number of time points of each rs-fMRI scan and the sampling interval is 3 s.

In this study, the children (class I) are 103 to 144 months old, and the young adults (class II) are 216 to 271 months old. The details of the subjects are listed in Table 1.

Dynamic Functional Network Connectivity Analysis
After the PNC data were reduced to a 264 × 124 matrix for each subject, the FC can be obtained by calculating the Pearson correlation coefficient (PCC) for each pair of ROIs. As discussed in the Introduction, the traditional method of calculating FC is based on the assumption that the functional network of the brain is static over the entire scanning session. However, many recent studies have shown that the functional network changes dynamically over time, and more meaningful information on brain activation patterns can be mined by considering the presence of temporal variability [46,47].
In this work, the dynamic functional connectivity (dFC) is estimated by using the sliding window approach along the time series. In this method, a window is moved along the resting-state fMRI time series and the correlation between two ROIs is calculated for each window. Assume that the time series has T time points; then there are $N = \lfloor (T - l)/s \rfloor + 1$ sub-sequences generated by the sliding time window with window length l and step size s. It is critical to choose an appropriate window length, since longer windows are closer to the whole signal while shorter windows are more sensitive to transient changes and may introduce more noise into the correlation analysis [48]. To effectively capture dynamic information, a window length between 30 s and 60 s is suggested, which can reflect the cognition of an individual [48,49].
For the PNC data, we choose l = 20 and s = 1. With a sampling interval of 3 s, each time window contains 20 time points and thus spans 60 s, and consecutive windows are shifted by one time point (3 s). Therefore, for each subject, as the sliding window moves, the 124 time points of the whole BOLD signal are divided into ⌊(124 − 20)/1⌋ + 1 = 105 sub-sequences. Based on this, we calculate the PCC between each pair of ROIs within every sliding window.
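The window arithmetic can be checked with a few lines of Python. This is only an illustrative sketch; the helper names `n_windows` and `window_bounds` are hypothetical, and windows are expressed as half-open index ranges over the time points.

```python
def n_windows(T, l, s):
    """Number of sliding windows: floor((T - l) / s) + 1."""
    return (T - l) // s + 1


def window_bounds(T, l, s):
    """Half-open (start, end) index pairs of every window along the time series."""
    return [(start, start + l) for start in range(0, T - l + 1, s)]


# Settings used for the PNC data: 124 time points, window of 20 points, step of 1 point.
pnc_windows = window_bounds(124, 20, 1)
window_seconds = 20 * 3   # 20 time points at a 3 s sampling interval
```

Here `n_windows(124, 20, 1)` and `len(pnc_windows)` both evaluate to 105, and the last window covers time points 104 to 123.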
For each subject, if we denote the time series of ROI $i$ ($i \in \{1, 2, \cdots, 264\}$) in the $n$-th time window as $\{x^{n}_{ik}\}_{k=1}^{l}$, then the PCC between the $s$-th ROI and the $t$-th ROI in the $n$-th time window is calculated as

$$r^{n}_{st} = \frac{\sum_{k=1}^{l}\left(x^{n}_{sk} - \bar{x}^{n}_{s}\right)\left(x^{n}_{tk} - \bar{x}^{n}_{t}\right)}{\sqrt{\sum_{k=1}^{l}\left(x^{n}_{sk} - \bar{x}^{n}_{s}\right)^{2}}\,\sqrt{\sum_{k=1}^{l}\left(x^{n}_{tk} - \bar{x}^{n}_{t}\right)^{2}}}, \tag{7}$$

where $\bar{x}^{n}_{i} = \frac{1}{l}\sum_{k=1}^{l} x^{n}_{ik}$ is the mean of the time series of ROI $i$ in the $n$-th window. Thus, considering the symmetry of the PCC for two ROIs, the dFC feature matrix of each subject is $D \in \mathbb{R}^{N \times C_{264}^{2}}$, with $N = 105$ being the number of sliding windows and $C_{264}^{2} = 264 \times (264 - 1)/2 = 34{,}716$. For each subject, the average connectivity information is further extracted from the matrix $D$ by calculating the mean of each column. The obtained feature vector $V \in \mathbb{R}^{34716}$ for each subject describes the average dynamic functional connectivity between each pair of ROIs. The calculation of the feature vector for each subject is shown in Figure 3.

In order to capture the differences of ICNs during development, the proposed two-step feature selection method is applied to remove the redundant features of $V$ and retain only the discriminative ones. The feature selection reveals meaningful differences of functional connectivity between children and young adults, which are further supported by existing results on brain development, topological analysis of brain networks, and the classification performance.
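The construction of the per-subject feature vector can be sketched in plain Python on a toy input. The sketch assumes that no ROI time series is constant within a window (otherwise the correlation is undefined); the function names `pearson` and `mean_dfc_vector` are hypothetical, and a real implementation on 264 ROIs would use a numerical library for speed.

```python
from itertools import combinations
from math import comb, sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long, non-constant sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def mean_dfc_vector(ts, l, s):
    """Average windowed correlation for every ROI pair.

    ts: list of ROI time series (one list of T values per ROI).
    Returns one value per unordered ROI pair, in the order (0,1), (0,2), ...
    """
    T = len(ts[0])
    starts = range(0, T - l + 1, s)                  # start index of each window
    features = []
    for a, b in combinations(range(len(ts)), 2):     # each unordered pair once
        window_r = [pearson(ts[a][t0:t0 + l], ts[b][t0:t0 + l]) for t0 in starts]
        features.append(sum(window_r) / len(window_r))   # column mean of the dFC matrix
    return features


toy = [[1, 2, 3, 4, 5, 7], [2, 4, 6, 8, 10, 14], [7, 5, 4, 3, 2, 1]]
v = mean_dfc_vector(toy, l=4, s=1)
```

For 264 ROIs the vector has `comb(264, 2)` entries, i.e., 34,716, matching the dimension stated above.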

The Identification of Essential Differences of ICNs during Development
We randomly divide the samples into two parts, 70% training data for the two-step feature selection and 30% test data for testing the classification performance of the selected features, and repeat the process $M_1$ times. Within each repetition, the training samples are randomly divided $M_2$ times into two parts, i.e., 3/4 for feature selection and 1/4 for validating the selected features. Thus, by combining the SVMRFE-CA and the FSCS methods for the current training samples, we obtain the optimal feature subset, which carries non-redundant information. Then, the frequency of each feature in all the optimal subsets obtained over the $M_1$ repetitions is counted. We found that when $M_1$ or $M_2$ is over 20, the selected features are nearly the same. Therefore, we choose $M_1 = M_2 = 20$. Finally, among the total of 34,716 functional connections, 20,719 are never selected, 1859 are selected more than 10 times, and 134 are selected in all 20 experiments. The 134 connections can be considered as the functional connections with the most significant differences during development. Additionally, in order to facilitate the analysis of the 264 ROIs identified by Power et al. [29], functional roles are assigned to the brain regions following the work of Smith et al. [50]. Based on these roles, different resting state networks (RSNs) are defined [51], which are shown in Figure 4.
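The frequency counting over repeated experiments can be sketched with the standard library. The function name `selection_summary` and the toy runs are hypothetical; the thresholds mirror the three categories reported above (never selected, selected in more than half of the runs, selected in every run).

```python
from collections import Counter


def selection_summary(runs, n_features):
    """Summarise how often each feature index is selected across repeated runs.

    runs: list of selected-feature collections, one per experiment.
    Returns the number of never-selected features, the number selected in more
    than half of the runs, and the sorted list of features selected in every run.
    """
    counts = Counter(f for run in runs for f in set(run))   # one vote per run
    n_runs = len(runs)
    return {
        "absent": n_features - len(counts),
        "majority": sum(1 for c in counts.values() if c > n_runs / 2),
        "always": sorted(f for f, c in counts.items() if c == n_runs),
    }


summary = selection_summary([[0, 1, 2], [0, 2, 3], [0, 2]], n_features=6)
```

In this toy case two of the six features are never selected, and features 0 and 2 appear in all three runs.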
In Figure 4, the 13 RSNs are mainly concerned with the perception of movement, memory, language and several other functions of the brain. Twelve of them have known functional roles: the sensory/somatomotor network (SSN), cingulo-opercular task control network (COTCN), auditory network (AN), default mode network (DMN), memory retrieval network (MRN), visual network (VN), fronto-parietal task control network (FPTCN), salience network (SN), subcortical network (SCN), ventral attention network (VAN), dorsal attention network (DAN), and cerebellar network (CN). Among them, SSN contains the precuneus, cingulate gyrus, precentral and postcentral gyrus, superior frontal gyrus, and some other brain regions. COTCN contains the middle frontal gyrus, insula, superior temporal gyrus, etc. In DMN, the main brain regions include the posterior cingulate, medial frontal gyrus, parahippocampal gyrus, and bilateral lateral temporal lobe. VN mainly contains the middle and inferior occipital gyrus, lingual gyrus and cuneus. FPTCN includes the superior parietal lobule and frontal gyrus. The main regions in SCN are the extra-nuclear, lentiform nucleus and thalamus. VAN is composed of the superior and middle temporal gyrus, inferior frontal gyrus and inferior parietal lobule. Among the 264 ROIs, there are 28 ROIs whose functional roles are uncertain, and they are merged into the uncertain RSN, i.e., the 13th RSN [35].
For the selected 134 functional connections, which are considered to be significantly associated with the developmental difference, there are 104 within or between the 12 certain RSNs defined above, and 30 related to the uncertain RSN, i.e., the 13th RSN. Among the 134 functional connections, 45 are significantly strengthened during development, and 89 are notably weakened with age.

In addition, the distributions of the 104 functional connections among the 12 certain RSNs are illustrated in Figure 6, where the thickness of a line represents the number of differing functional connections, i.e., the thicker the line is, the more functional connections differ between the two RSNs. Furthermore, the enhanced and weakened functional connections are shown in Figure 7 by the BrainNet Viewer toolbox [52].

The Validation of the Selected Different ICNs by Graph Theory
Topological properties of a brain network can be used to explore the coordination and cooperation between different brain regions [53,54]. For example, the clustering coefficient quantifies functional segregation of the brain network, which reflects the ability of a specialized task to occur within certain densely interconnected groups of brain regions. The characteristic path length quantifies the functional integration of the brain network, which reflects the ability to combine specialized information from distributed brain regions. Both global and local network efficiencies quantify the transmission capability of the brain network, which reflects the ability to transmit information between different brain regions. By topological analysis, the significance of the selected ICNs related to development can be verified. Based on the two brain networks constructed separately from the selected ICNs of the children and the young adults, we explore whether there are discriminating differences in the topological features of these two networks. We chose clustering coefficient ($C_c$), characteristic path length ($L_p$), local network efficiency ($E_{loc}$), and global network efficiency ($E_g$) as the evaluation criteria. Their definitions are shown in Table 2. These four parameters of the two brain networks are calculated and tested by the two-sample t-test. The p-values of $C_c$, $L_p$, $E_{loc}$, and $E_g$ are $1.55 \times 10^{-12}$, $1.29 \times 10^{-12}$, $1.07 \times 10^{-12}$, and $2.28 \times 10^{-12}$, respectively. The analysis of topological features thus confirms that there are significant differences in brain connectivity between children and young adults.
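Three of these measures can be sketched for a binary undirected graph using their commonly used definitions (clustering as $2t_i / (k_i(k_i - 1))$ averaged over nodes, path length and global efficiency as averages of $d_{ij}$ and $1/d_{ij}$ over node pairs). These definitions, the assumption of a connected binary graph, and the function names are illustrative assumptions; how the study thresholds or weights its networks is not specified here, and local efficiency is omitted for brevity.

```python
from collections import deque
from itertools import combinations


def shortest_paths(adj, source):
    """Breadth-first search distances from source in an unweighted graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def clustering_coefficient(adj):
    """Mean over nodes of 2*t_i / (k_i*(k_i-1)); nodes with k_i < 2 contribute 0."""
    total = 0.0
    for nbrs in adj.values():
        k = len(nbrs)
        if k < 2:
            continue
        t = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])  # triangles
        total += 2 * t / (k * (k - 1))
    return total / len(adj)


def path_length_and_global_efficiency(adj):
    """Characteristic path length and global efficiency (assumes a connected graph)."""
    nodes = list(adj)
    dist_sum, inv_sum = 0.0, 0.0
    for i in nodes:
        d = shortest_paths(adj, i)
        for j in nodes:
            if j != i and j in d:
                dist_sum += d[j]
                inv_sum += 1.0 / d[j]
    pairs = len(nodes) * (len(nodes) - 1)            # ordered node pairs
    return dist_sum / pairs, inv_sum / pairs


# Toy graph: a triangle (0, 1, 2) with a pendant node 3 attached to node 2.
toy_graph = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
cc = clustering_coefficient(toy_graph)
lp, eg = path_length_and_global_efficiency(toy_graph)
```

In the study, such measures would be computed per subject and compared between the two groups with a two-sample t-test.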

Comparison with Other Feature Selection Methods
In order to show the advantages of the proposed two-step feature selection method, we compare it with SVM-RFE, FSCS and several other typical feature selection methods, i.e., maximum relevance minimum redundancy with mutual information difference (mRMR-MID), maximum relevance minimum redundancy with mutual information quotient (mRMR-MIQ), feature selection with random forest by Gini index (RFFS-GINI), and feature selection with random forest by out of bag data (RFFS-OOB) [55,56]. Each feature selection method was repeated 20 times with random 70% training and 30% testing data. The classification accuracy rate (CAR) of the different feature selection methods is shown in Figure 8. The two-step feature selection method reaches its optimal classification accuracy faster and more stably than the other methods. It is worth noting that the other methods need 600 or even more features (i.e., differences of ICNs) to reach their best performance, while fewer than 200 features are sufficient for the proposed method to identify the changes of the brain's connectivity during development. This is consistent with the feature size selected in Section 3.3.

Table 2. Different measuring parameters of the topological properties for brain network.
Measures: $C_c$ (clustering coefficient), $L_p$ (characteristic path length), $E_{loc}$ (local network efficiency), $E_g$ (global network efficiency).
Notation: $t_i$ is the number of triangles around node $i$, $k_i$ is the number of links connected directly to node $i$, and $d_{ij}$ is the shortest path length between node $i$ and node $j$.
Additionally, the redundancy degree of a feature subset $D$ can be calculated by Equation (8), where $c(i, j)$ represents the Pearson correlation coefficient of features $i$ and $j$, and $|D|$ denotes the number of features in $D$. By (8), the average redundancy degrees of the two-step feature selection method and of SVM-RFE (with best performance) are 49.8423 and 138.6289, respectively, which supports the ability of the two-step feature selection method to effectively remove redundancy. In addition to CAR, the selected differences of ICNs can be further quantified by sensitivity (SS), specificity (SC), positive predictive value (PPV) and negative predictive value (NPV), which are defined as follows:

$$SS = \frac{Corr_p}{Corr_p + InCorr_n}, \quad SC = \frac{Corr_n}{Corr_n + InCorr_p}, \quad PPV = \frac{Corr_p}{Corr_p + InCorr_p}, \quad NPV = \frac{Corr_n}{Corr_n + InCorr_n}, \quad CAR = \frac{Corr_p + Corr_n}{Corr_p + Corr_n + InCorr_p + InCorr_n}, \tag{9}$$

where $Corr_p$, $InCorr_p$, $Corr_n$ and $InCorr_n$ denote the numbers of true positive, false positive, true negative and false negative samples, respectively. CAR, SS, SC, PPV and NPV provide five different assessments of the classifiers. CAR is the proportion of all samples that are correctly predicted. SS is the proportion of actual positive samples (here, the positive samples refer to the children) that are correctly predicted, and SC is the proportion of actual negative samples (i.e., the young adults) that are correctly predicted. PPV is the proportion of predicted positive samples that are truly positive, and NPV is the proportion of predicted negative samples that are truly negative. We calculate the average classification performance when the CAR curve of each method stably achieves its maximum, and the results are shown in Table 3.
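The five measures follow directly from the four confusion counts, as the short sketch below shows. The function name `classification_metrics` and the example counts are hypothetical and are not results from the study.

```python
def classification_metrics(corr_p, incorr_p, corr_n, incorr_n):
    """Compute SS, SC, PPV, NPV and CAR from confusion counts.

    corr_p: true positives, incorr_p: false positives,
    corr_n: true negatives, incorr_n: false negatives.
    """
    total = corr_p + corr_n + incorr_p + incorr_n
    return {
        "SS": corr_p / (corr_p + incorr_n),     # sensitivity
        "SC": corr_n / (corr_n + incorr_p),     # specificity
        "PPV": corr_p / (corr_p + incorr_p),    # positive predictive value
        "NPV": corr_n / (corr_n + incorr_n),    # negative predictive value
        "CAR": (corr_p + corr_n) / total,       # classification accuracy rate
    }


# Made-up counts for illustration only.
example = classification_metrics(corr_p=40, incorr_p=5, corr_n=45, incorr_n=10)
```

With these counts, 40 of the 50 actual positives are recovered (SS = 0.8), 45 of the 50 actual negatives are recovered (SC = 0.9), and 85 of the 100 samples are classified correctly (CAR = 0.85).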
From Figure 8 and Table 3, we can see that the proposed two-step feature selection method best identifies the differences of ICNs during development. Compared with mRMR-MIQ, mRMR-MID, FSCS, RFFS-GINI, RFFS-OOB and SVM-RFE, the average classification accuracy increases by 12.46%, 10.66%, 9.81%, 4.84%, 3.51%, and 1.58%, respectively. The sensitivity, specificity, positive predictive value and negative predictive value of the proposed two-step feature selection method are all better than those of the other methods.

Discussion
The two-step feature selection method proposed here is utilized to identify the essential differences of ICNs as an individual grows. With the first step, i.e., SVMRFE-CA, noise and irrelevant features can be efficiently reduced. Because the data set is small but of high dimension, a feature eliminated during training by the original SVM-RFE method is never considered again in the subsequent process, even though it may have some discriminative ability. In order to retain these latent significant features, we introduced a reconsideration mechanism, and multiple subsampling strategies are also utilized to avoid mistakenly removing such features. In the second step, the F-score is combined with the correlation score to remove the highly correlated, redundant features selected in the SVMRFE-CA process. With the two-step feature selection method, only the features with the most discriminative ability are retained.
The analysis based on graph theory further confirms the significance of the revealed difference of ICNs between children and young adults. Compared with other feature selection methods, the two-step feature selection method is also competitive: it needs fewer than a third as many features to identify the developmental differences. Meanwhile, both the redundancy degree and the classification performance demonstrate the efficiency of the proposed method.
The obtained results reveal that, compared with children, young adults show many remarkable differences in the functional connections among SSN, COTCN, VN, VAN, DMN, FPTCN, and SCN, where the connectivity among SSN, COTCN, VN, and VAN is significantly enhanced. The results reflect specific functional differences during brain development, since these RSNs are strongly associated with cognition, information processing, attention, alertness, and other functions. In detail, it is reported that SSN is mainly related to cognitive activities [57]. Some brain regions in COTCN play a particularly important role in the coordination of information transfer between networks and are very active in performing many complex cognitive tasks [58]. VN is mainly involved in visual information processing. VAN is responsible for non-spatial attention, which enables bottom-up attention selection by a stimulus-driven process that includes awareness of significant events, reorientation of attention, and alertness [59]. Additionally, our findings are quite consistent with prior reports. For example, it has been observed that the connectivity between the SSN and certain other RSNs is relatively low in childhood [26]. The study by Jolles et al. showed that the connectivity of young adults' VN with other RSNs is enhanced, that is, young adults' visual information processing ability is higher than that of children [33]. In [35], it was observed that some ICNs related to SSN and VN are quite complex in young adults' brains. Meanwhile, we found that, compared with children, some functional connections are weakened in young adults' brains, e.g., those involving DMN, SCN, and COTCN. These discoveries have also been reported in prior studies. Hutchison et al. revealed that children have stronger connectivity between DMN and the cognitive control system than young adults [34]. In [36], it was discovered that there is more consistent DMN connectivity in children's brains.
In [37], it was shown that with increasing age during adolescence and early adulthood, a clear pattern emerges, that is, the connectivity between DMN and COTCN decreases. In [38], it was shown that the cortical gray matter reaches a peak in childhood and is usually reduced in adolescence, which may be one reason for the weakening of the connection between DMN and SCN. We discovered several decreases in connectivity between FPTCN and SSN for young adults, and it was also found in [26] that children have stronger connectivity between these two RSNs. Additionally, our research revealed that some differences of connectivity exist between the uncertain RSN and other RSNs, e.g., SSN, DMN, and SN, and they are located mostly in the fusiform gyrus, lingual gyrus, frontal lobe, and temporal lobe. Moreover, we found that for young adults, the functional connections within many RSNs, e.g., DMN and SSN, are weakened compared with children. At the same time, the functional connections among different RSNs (e.g., the connections among SSN, VN, VAN and COTCN) are markedly enhanced. The above discoveries are consistent with the understanding that the connectivity of the brain moves from segregation to integration as one grows. That is, with increasing age, most connections among different RSNs become remarkably enhanced, and those within RSNs generally decline.

Conclusions
In this study, we presented a two-step feature selection method to identify the essential differences of brain functional connectivity as one grows. For the resting state fMRI dataset focused on normative brain development, 134 significant differences of dynamic functional connectivity between children and young adults have been discovered. Based on this, we revealed that there exist many remarkable differences in the functional modules among the sensory/somatomotor network, cingulo-opercular task control network, visual network, ventral attention network, default mode network, fronto-parietal task control network, and subcortical network. In particular, compared with children, young adults have strengthened functional connectivity between the sensory/somatomotor network and other functional networks. Weakened functional connectivity was also found in young adults, e.g., in the connections among the default mode network, subcortical network and cerebellar network. Meanwhile, our findings are consistent with the fact that the connectivity of the brain changes from segregation to integration during development. The topological analysis further verified the significance of the identified brain functional connectivity. Additionally, a comparison with several feature selection methods supports the performance of the two-step feature selection method.

Conflicts of Interest:
The authors declare no conflict of interest.