Machine Learning and rs-fMRI to Identify Potential Brain Regions Associated with Autism Severity

Autism Spectrum Disorder (ASD) is a neurodevelopmental disorder characterized primarily by social impairments that manifest in different severity levels. In recent years, many studies have explored the use of machine learning (ML) and resting-state functional magnetic resonance imaging (rs-fMRI) to investigate the disorder. These approaches evaluate brain oxygen levels to indirectly measure brain activity and compare typically developing subjects with ASD subjects. However, none of these works have tried to classify the subjects into severity groups using ML exclusively applied to rs-fMRI data. Information on ASD severity is frequently available since some tools used to support ASD diagnosis also include a severity measurement in their outcomes. This is the case for the Autism Diagnostic Observation Schedule (ADOS), which splits the diagnosis into three groups: ‘autism’, ‘autism spectrum’, and ‘non-ASD’. Therefore, this paper aims to use ML and fMRI to identify potential brain regions as biomarkers of ASD severity. We used the ADOS score as a severity measurement standard. The experiment used fMRI data of 202 subjects with an ASD diagnosis and their ADOS scores available at the ABIDE I consortium to determine the correct ASD sub-class for each one. Our results suggest a functional difference between the ASD sub-classes by reaching 73.8% accuracy on cingulum regions. This result shows the feasibility of classifying and characterizing ASD using rs-fMRI data, indicating potential areas that could lead to severity biomarkers in further research. However, we highlight the need for more studies to confirm our findings.


Introduction
Autism Spectrum Disorder (ASD) is a neurodevelopmental disorder characterized mainly by social impairments, commonly followed by communication challenges or restricted and repetitive patterns of behavior [1]. ASD is a substantially heterogeneous disorder in which two diagnosed subjects may have a completely different set of symptoms. Some researchers estimated that approximately one in 44 children aged eight years are in the spectrum [2]. Despite a possible gender bias regarding diagnosis, ASD seems to be a sex-related disorder, with a male-to-female ratio close to 3-4:1 [2][3][4]. Current research points to ASD as a primarily hereditary disorder. Approximately 80-83% of ASD cases are due to genetic inheritance. Close to 17-20% are due to environmental risk factors, including problems during the gestation period and the parents' age [5][6][7].
Children and adolescents with an ASD diagnosis have medical expenses up to 6.2 times greater than those with typical development (TD), with general costs from 8.4 to 9.5 times greater than the average [8]. In addition to medical expenses, intensive behavioral interventions needed for ASD treatment have costs from USD 40,000 to USD 60,000 per child.
Meanwhile, on the fMRI side, several universities have worked together to create the Autism Brain Imaging Data Exchange (ABIDE) [33], an initiative that makes more than 2000 brain fMRI scans available for research purposes. In addition, all fMRI subjects gave consent to the use of their images. This initiative facilitates autism investigation by providing access to a database that otherwise would not be easily acquired. Moreover, the preprocessed data available on ABIDE I PREPROCESSED also contribute in this sense.
Therefore, we take into account the following premises: (1) early diagnosis and interventions lead to better outcomes for autism treatment, as well as long-term cost reduction; (2) ADOS scores allow a rating of the ASD severity; (3) ML techniques have shown promising results in classifying ASD vs. neurotypical subjects through the use of rs-fMRI; and (4) the ADOS scores and ASD rs-fMRI data are available at ABIDE. This work aims to investigate the functional differences between autism spectrum and autistic individuals, looking for potential brain regions that may be associated with autism severity. To identify these regions, we used ML applied to brain segments from rs-fMRI data to classify individuals from the two groups, selecting the regions with the greatest differences as potential biomarkers that should be more deeply investigated in future works.
The remainder of this paper is structured as follows: Section 2 presents the methodology employed. Sections 3 and 4 present and discuss our results, while Section 5 concludes this work.

Methodology
This section presents this work's methodology. It starts by describing the materials used in Section 2.1, followed by a presentation of the ADOS sub-classes for ASD classification in Section 2.2 and the region selection process in Section 2.3. Then, we explain both the ML used to classify the samples in Section 2.4 and the validation process in Section 2.5. Finally, we present the final data source in Section 2.6 and the accuracy, sensitivity, and specificity cut-off points in Section 2.7.

Materials
In this work, we used the rs-fMRI data provided by ABIDE [33]. The ABIDE I consortium currently offers 1100 rs-fMRI scans from subjects with and without ASD diagnosis. Since our work was not an ASD vs. TD classification, all rs-fMRI data of neurotypical subjects were discarded, leaving 505 preprocessed fMRI scans from subjects with ASD diagnosis. From these ASD data, only 202 had information concerning ADOS scores for communication, social interaction, and repetitive behavior, which are essential data in our classification approach. Thus, the final data comprised 202 ASD subjects.
The original data from fMRI are 3D images over time. Therefore, applying an atlas and a preprocessing pipeline is necessary to transform the 3D images into matrices representing the brain regions (columns) and their respective activities over time (rows). The preprocessing pipeline also removes noise and other undesirable artifacts, which allows better results.

Automated Anatomical Labeling (AAL)
An atlas is a brain mapping that allows us to evaluate brain activity through its regions. We used the AAL atlas [34] available at ABIDE, as it is the most used atlas in the literature for ASD classification using fMRI and ML [21], reaching meaningful outcomes in [18,20,35-37].
In its third version, AAL segments the human brain into 116 ROIs. A detailed explanation of these regions can be seen in [34]. Table 1 presents the AAL's labels.
The available preprocessing pipelines have different methods and sequences to manage fMRI data, removing noise such as head motion, skull, and magnetic interference. We only used the DPARSF pipeline in this work [26,38,39]. The criteria used for choosing DPARSF were analogous to those employed in the atlas definition process. Except for works where the authors create their own preprocessing pipeline, DPARSF is the most frequently used pipeline [21], reaching meaningful outcomes in ASD classification using rs-fMRI and ML [37,40-42].
The DPARSF final product is a matrix of size (X, Y), where X is the number of columns and Y is the number of rows. Each column represents one ROI, according to the chosen atlas, and each row represents one time point of the scan. The number of rows (Y) can differ between fMRI scans, even when using the same atlas. However, the X value must be the same for all fMRI scans using the same atlas. For example, in a DPARSF matrix, the value at (X_i, Y_j) represents the oxygen level of ROI i at time j.
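The layout described above can be illustrated with a minimal sketch. It assumes a whitespace-separated text table whose first line holds the ROI labels; the function name parse_roi_timeseries, the labels, and the values are hypothetical and do not come from ABIDE.

```python
def parse_roi_timeseries(text):
    """Parse a whitespace-separated table.

    The first non-empty line holds the ROI labels (columns); every
    following line holds one time point (row) with one value per ROI.
    """
    lines = [ln for ln in text.strip().splitlines() if ln.strip()]
    labels = lines[0].split()
    rows = [[float(v) for v in ln.split()] for ln in lines[1:]]
    for row in rows:
        if len(row) != len(labels):
            raise ValueError("every time point needs one value per ROI")
    return labels, rows


# Hypothetical content: 3 ROIs (columns) observed at 2 time points (rows).
example = """
roi_1 roi_2 roi_3
0.5 1.0 -0.2
0.7 0.9 -0.1
"""

labels, rows = parse_roi_timeseries(example)
# Value of the third ROI (column index 2) at the second time point (row index 1).
value = rows[1][2]
```

Keeping the label row separate from the numeric rows also makes it straightforward to drop the labels before the data are passed to a classifier.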

ADOS Classification
We used the ADOS standard division for ASD diagnosis to investigate any functional differences in the severity of ASD. The ADOS standard division has previously defined cut-off points to classify subjects as autistic, ASD, or non-ASD. Table 2 shows the maximum scores and the ASD and autism cut-off points for each module (ASD score groups according to the individual's age) and domain areas. For each ADOS module, the first line indicates the maximum value; the second line shows the ASD cut-off point, and the third line indicates the autism cut-off point, according to the domain area. We adopted the cut-off points from [15] to determine into which class a given subject should be classified, based on their scores available on ABIDE. This way, if a subject scored in at least one domain above the "autism cut-off", they were classified as Class 2 (autism). If the subject did not score above the "autism cut-off" but had at least one domain scoring above the "ASD cut-off", they were classified as Class 1 (ASD). We classified the remaining subjects as non-ASD, discarding them. Tables 3 and 4 show the ABIDE subjects' distribution according to the ADOS class; the complete phenotypes of each subject are available in [33]. Tables 5-7 present the phenotype information of the selected subjects.
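The class assignment rule described above can be sketched as follows. The cut-off values, domain names, and function name in this example are placeholders chosen for illustration only; the actual cut-offs depend on the ADOS module and are those of Table 2. The strict comparison follows the wording "above the cut-off" and is itself an assumption.

```python
# Placeholder cut-offs for illustration only (NOT the values of Table 2).
# A real implementation would need one such table per ADOS module.
HYPOTHETICAL_CUTOFFS = {
    "communication": {"asd": 2, "autism": 4},
    "social": {"asd": 4, "autism": 7},
}


def ados_class(scores, cutoffs=HYPOTHETICAL_CUTOFFS):
    """Return 2 (autism), 1 (ASD), or 0 (non-ASD, discarded).

    scores maps each domain name to the subject's score in that domain.
    """
    # Class 2: at least one domain above the autism cut-off.
    if any(scores[d] > cutoffs[d]["autism"] for d in cutoffs):
        return 2
    # Class 1: no domain above the autism cut-off, at least one above the ASD cut-off.
    if any(scores[d] > cutoffs[d]["asd"] for d in cutoffs):
        return 1
    # Remaining subjects are treated as non-ASD.
    return 0


example_class = ados_class({"communication": 3, "social": 4})
```

With the placeholder values, the example subject exceeds only the ASD cut-off in the communication domain and is therefore assigned to Class 1.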

Region Selection
We grouped the ROIs from AAL by macro regions, considering the region name. The result was a set of regions (SoRs) (e.g., precentral left and right as one SoRs, angular left and right as one SoRs). This process resulted in 35 SoRs containing the ROIs grouped by brain region. We also included one SoRs with all the ROIs. Table 8 presents the resulting SoRs, where the set ID is the SoRs' identification, and the ROI IDs match the ROIs used in Table 1. [X, ..., Y] is a one-to-one incremental sequence where X is the lower limit and Y the upper limit (e.g., [1, ..., 4] is the same as [1,2,3,4]).
This approach aimed: (1) to simplify the SVC classification; and (2) to give a more generic location of the functional differences between ASD classes in a manner that would allow better comparison between existing studies that use different atlases.
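A minimal sketch of such a name-based grouping is given below. It assumes ROI labels in which the macro region is the first underscore-separated token; the labels, IDs, and function name are hypothetical, and the actual grouping used in this work is the one listed in Table 8.

```python
def group_rois_by_macro_region(roi_labels):
    """Group ROI IDs by the first token of their label.

    roi_labels maps an ROI ID to its label. An extra set, "All",
    containing every ROI is appended, mirroring the additional SoRs
    with all the ROIs.
    """
    groups = {}
    for roi_id in sorted(roi_labels):
        macro_region = roi_labels[roi_id].split("_")[0]
        groups.setdefault(macro_region, []).append(roi_id)
    groups["All"] = sorted(roi_labels)
    return groups


# Hypothetical IDs and labels, for illustration only.
example_labels = {
    1: "Precentral_L",
    2: "Precentral_R",
    3: "Cingulum_Ant_L",
    4: "Cingulum_Ant_R",
    5: "Cingulum_Post_L",
    6: "Cingulum_Post_R",
}

sors = group_rois_by_macro_region(example_labels)
```

Grouping on the first token merges left and right hemispheres as well as sub-parts (anterior, posterior) into one set, which is one possible reading of "considering the region name".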

SVC Classifying Algorithm
We used a supervised learning method, support vector machine (SVM), specifically the C-Support Vector Classification (SVC), to check the differences between ASD sub-classes. This method has three steps: training, validation, and test [43,44].
Based on an in-depth systematic review and meta-analysis available in [21], we selected SVM as our ML method. SVM was the most used AI tool for solving ASD classification problems, showing some reliable results when applied in similar situations [18,20,37,45,46]. The second most used method was the artificial neural network (ANN) [21]. Both approaches have similar results in the literature, with SVM slightly better in terms of sensitivity [21]. As our goal was to find potential regions of a biomarker, and due to the complexity of the problem, we decided to adopt SVM given its more direct comparison, facilitating the interpretability of the results. We used the SVM from the scikit-learn library available at [47].
SVM creates a multidimensional space, where each object (in our case, each subject) is positioned according to the values of the selected features. First, the sample part used for training determines a curve to split this space, as shown in Figure 1, where each area corresponds to one class. Then, the validation sample part verifies the accuracy of the curve, and this process is repeated until the SVM reaches the best fit given the features, training sample, and validation sample. After this, the test sample is used to measure the SVM generalization. We hypothesized that higher accuracy would reflect the existence of an interpretable way to distinguish the classes. In other words, SoRs with higher accuracy potentially contain the regions where classes are more distinct regarding the features used. These findings can highlight the areas to consider for further investigations on functional brain activity and ASD severity.
As the main goal was to find regions where there is a functional brain difference in the ASD severity level, and there is a lack of data about SVM setups in previous works on fMRI related to ASD investigations, as observed in [21], we chose a few educated-guess setups in our experiment. The setup was related to the variables gamma, coef0, kernel, class_weight, degree, and max_iter.
The gamma variable determines how closely the final classification should follow the training sample, with larger values giving more rigid solutions and lower values giving more flexible ones.
The coef0 is an independent value related to the scale of the sample. Meanwhile, the kernel is the mathematical equation used to solve the problem, and the ones available from [47] are linear, poly, rbf, sigmoid.
The class_weight option considers the size of each class in the training step, adjusting the weight accordingly. For example, regarding training, if Class 1 has three subjects and Class 2 has nine subjects, Class 1 will weigh three while Class 2 will weigh one. This process is meant to avoid the algorithm taking into account only the dominant class from training, which can jeopardize the SVM's generalization capacity.
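The weighting in the example above can be reproduced with a short sketch. It assumes that the weights follow the "balanced" heuristic documented by scikit-learn, n_samples / (n_classes * class_count); whether this exact setting was used in the experiment is an assumption, and the function name is hypothetical.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class inversely to its frequency.

    Uses n_samples / (n_classes * class_count), so that smaller
    classes receive proportionally larger weights.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Training sample from the example: three subjects in Class 1, nine in Class 2.
training_labels = [1] * 3 + [2] * 9
weights = balanced_class_weights(training_labels)

# Relative weight of Class 1 with respect to Class 2 (approximately 3).
ratio = weights[1] / weights[2]
```

Only the ratio between the weights matters for the comparison with the text: Class 1 weighs about three times as much as Class 2.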
The degree defines the degree of the equation that splits the SVM classification space. Finally, max_iter is the maximum number of training iterations the algorithm is allowed to use; training stops when this value is reached, regardless of the gain.
Here, we used the following values for each variable: •

Validation Process
We used k-fold cross-validation to validate our process [48][49][50]. We selected k = 10, which is recommended for samples larger than 200 objects. The SVM automatically split the sample into training and test; in this case, we used the standard 70% for training and 30% for test. Therefore, nine folds were sent to the SVM and split 7/3 into training and test, and the result was then applied to the 10th fold for validation; the process was repeated until each of the 10 folds had been used as the validation sample.
We adopted the following division criteria to avoid bias:
• Number of subjects of a specific ADOS subclass in each fold, avoiding any fold having only subjects of the same subclass. For example, a fold without autistic subjects could bias the SVC always to answer ASD due to the lack of autistic subjects on training or validation.
We first divided our sample into two groups, ASD and autistic, one for each ADOS subclass. Then, we ordered them by subject ID, and for each group, we designated one subject at a time for each fold: {Subject 1 to Fold 1, Subject 2 to Fold 2, ..., Subject n to Fold (n mod 10)}, where a remainder of 0 denotes Fold 10.
Thus, each fold had a balanced subclass distribution at the end of this process. Given our sample's limitations, this process aimed to produce the most adaptive learning for our SVC.
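The fold assignment described above can be sketched as a round-robin distribution per subclass. The subject IDs below are synthetic, the function names are hypothetical, and folds are indexed from 0 for convenience; only the group sizes (36 ASD and 166 autistic subjects) follow the sample used in this work.

```python
def assign_folds(subject_ids_by_class, k=10):
    """Distribute subjects to k folds, one at a time, separately per class."""
    folds = [[] for _ in range(k)]
    for class_label, subject_ids in subject_ids_by_class.items():
        # Order by subject ID, then deal subjects to folds in turn.
        for position, subject_id in enumerate(sorted(subject_ids)):
            folds[position % k].append((subject_id, class_label))
    return folds


def cross_validation_splits(folds):
    """Yield (remaining_subjects, held_out_fold) with each fold held out once."""
    for held_out_index, held_out in enumerate(folds):
        remaining = [
            subject
            for index, fold in enumerate(folds)
            if index != held_out_index
            for subject in fold
        ]
        yield remaining, held_out


# Synthetic subject IDs; only the group sizes mirror the study sample.
groups = {
    "ASD": list(range(1000, 1036)),      # 36 subjects
    "autism": list(range(2000, 2166)),   # 166 subjects
}

folds = assign_folds(groups, k=10)
splits = list(cross_validation_splits(folds))
```

Because each subclass is dealt out independently, the per-fold counts of a subclass differ by at most one, so no fold can end up with subjects of a single subclass as long as each subclass has at least k members.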

Final Data Source
The resultant data were composed of two files for each subject. The first file contained a matrix where each column represented one of the 116 ROIs from the AAL atlas, and each row represented a picture of the brain over time. The second file was a vector with the subject's phenotype data, including the ADOS score. Since the first row of each fMRI file contained the ROI labels, we removed it from the file sent to the SVM.
SVM only accepts vectors as its input. Therefore, we converted the resulting matrix from DPARSF into a vector. We considered two conversion options: (1) construct a vector from the matrix where the matrix position (X_i, Y_j) is placed on the vector position Z_(i + X*j), with X being the number of columns; and (2) acquire the maximum, minimum, median, and average values for each ROI from each SoRs and create a vector (Z_a^max, Z_a^min, Z_a^med, Z_a^avg, ..., Z_b^max, Z_b^min, Z_b^med, Z_b^avg), where a and b are, respectively, the first and the last ROI ID of a SoRs.
Both conversion options have advantages and drawbacks. The first option has the simplest preprocessing but requires more computing power for the SVC to process all data. On the other hand, the second option requires an additional preprocessing step, which transforms the data from each subject into the four values mentioned above, with loss of information due to the transformation. However, due to the size reduction, the SVC requires less computing power to analyze all the data from all subjects. Thus, aiming for better scalability and facilitating human understanding of the results, we chose the second option for this paper.
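The second conversion option can be sketched as follows, assuming the time series is held as a list of rows (time points) with one value per ROI. The function name, the zero-based column indices, and the numeric values are hypothetical.

```python
from statistics import mean, median


def sor_feature_vector(time_series, roi_columns):
    """Summarize each ROI of one SoRs by its max, min, median, and mean.

    time_series: list of rows (time points), each a list with one value per ROI.
    roi_columns: zero-based column indices of the ROIs belonging to the SoRs.
    Returns a flat vector with four values per ROI, in ROI order.
    """
    features = []
    for column in roi_columns:
        signal = [row[column] for row in time_series]
        features.extend([max(signal), min(signal), median(signal), mean(signal)])
    return features


# Hypothetical scan with three time points and two ROIs.
scan = [
    [1.0, 10.0],
    [3.0, 20.0],
    [2.0, 60.0],
]

vector = sor_feature_vector(scan, roi_columns=[0, 1])
```

The resulting vector has a fixed length of four values per ROI regardless of the number of time points, which is what makes subjects with scans of different durations comparable.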

Accuracy, Sensitivity, and Specificity Restrictions, and Post-Hoc Tests
We imposed restrictions on the minimum accuracy, sensitivity, and specificity required to consider a functional difference between the two ASD sub-classes. The cut-off point was 60%, based on values achieved by other ASD vs. non-ASD classification studies [22][23][24][51][52][53]. Thus, we discarded results with accuracy (ACC), specificity (SPC), or sensitivity (SNS) less than 60%.
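The restriction above can be expressed as a short sketch. It assumes the autistic class is treated as the positive class (so SNS refers to autism and SPC to ASD); the function names and the confusion-matrix counts are hypothetical.

```python
def classification_rates(tp, fn, tn, fp):
    """Return (ACC, SNS, SPC) from confusion-matrix counts.

    tp/fn: positive-class subjects classified correctly/incorrectly.
    tn/fp: negative-class subjects classified correctly/incorrectly.
    """
    acc = (tp + tn) / (tp + fn + tn + fp)
    sns = tp / (tp + fn)
    spc = tn / (tn + fp)
    return acc, sns, spc


def passes_cutoff(acc, sns, spc, cutoff=0.60):
    """A result is kept only if ACC, SNS, and SPC all reach the cut-off."""
    return min(acc, sns, spc) >= cutoff


# Hypothetical counts: high accuracy and sensitivity, but low specificity.
acc, sns, spc = classification_rates(tp=80, fn=20, tn=11, fp=9)
kept = passes_cutoff(acc, sns, spc)
```

In this hypothetical case the result is discarded even though ACC is about 76%, because SPC is only 55%. Requiring all three rates guards against a classifier that simply favors the dominant class.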
Finally, we applied three post-hoc tests on the features from the SoRs that achieved the cut-off: addition of phenotype data, t-test, and p-value. The addition of phenotype data aimed to investigate the effect of sex, age, and FIQ on SVM accuracy for each SoRs, while the t-test and p-value aimed to investigate the separability of the sample used, that is, how the features differed between the two groups.

Results
This section presents the results of our ASD vs. autism classification experiments. All SoRs can be seen in Table 8 and each ROI used by these sets can be seen in Table 1. In this paper, we used specificity (SPC) related to the ASD classification and sensitivity (SNS) associated with the autistic classification.
Our experiments worked with a total of 202 subjects, which comprised 36 with ASD and 166 with autism, according to the ADOS scores. Table 9 shows the SoRs with the ACC, SNS, and SPC greater than or equal to 60%. This shows the existence of a non-random separation when considering five brain regions.
The t-test of each feature allows us to understand the difference between the ASD and autistic groups. The t-test result quantifies the statistical difference between two given groups: positive values mean that the group 1 average is larger than the group 2 average, while negative values mean the opposite. Table 10 shows the t-test result for each feature on each SoRs for which SVM had above-threshold results; there, positive values mean that the ASD group average is larger than the autistic group average for that feature, while negative values mean that the autistic group average is larger.
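The sign convention described above can be illustrated with a minimal sketch of a two-sample t statistic. The variant shown is Welch's unequal-variance form, which is an assumption, since the specific t-test variant is not stated here; the function name and the feature values are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def welch_t_statistic(group1, group2):
    """Two-sample t statistic (Welch's form, assumed for illustration).

    Positive when the group 1 mean is larger than the group 2 mean,
    negative when the group 2 mean is larger.
    """
    n1, n2 = len(group1), len(group2)
    standard_error = sqrt(variance(group1) / n1 + variance(group2) / n2)
    return (mean(group1) - mean(group2)) / standard_error


# Hypothetical feature values for the two groups.
asd_values = [2.0, 3.0, 4.0]
autism_values = [1.0, 2.0, 3.0]

t_value = welch_t_statistic(asd_values, autism_values)
```

Swapping the two groups flips the sign of the statistic while leaving its magnitude unchanged, which is why the group order must be fixed when reading a table of t values.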
Furthermore, reinforcing the t-test result, the p-value (scale [0,1]) of each feature from SoRs above the required threshold is listed in Table 11. The highest p-value was 0.96 for the mean on ROI 4 (Frontal Sup. Orb. Left), the third ROI from SoRs 1, with high values indicating a risk of not being able to distinguish the two groups from each other. On the other hand, lower values indicate a high possibility of discerning the two groups using the feature. The lowest p-value was 0.02 from the min on ROI 72 (Putamen Left), the first ROI from SoRs 27. The SoRs 1 has a mean p-value of 0.45 (0.43 STD), while SoRs 11 has a mean p-value of 0.32 (0.14 STD); for SoRs 23, 27, and 30, the mean p-value is 0.53 (0.51 STD), 0.30 (0.24 STD), and 0.30 (0.24 STD), respectively. Therefore, SoRs 11 has the lowest p-value STD and one of the lowest p-value means, which indicates a high probability of containing the largest set of features to classify ASD severity. It is worth noting that these values reflect only our sample and should not be used as a diagnostic tool, as further research is needed to either confirm or deny our findings.
Moreover, we performed other trials adding phenotype information (age, sex, and full IQ). We used the same features and added the phenotype data in the vector sent to the ML algorithm. We executed the test for the three phenotypes together, one at a time, and all combinations of two phenotypes. We used the same process as in the main experiment; the results that reached the threshold defined in Section 2.7, as well as the ACC gain from using the phenotype data for each SoRs, are shown in Table 12. However, as also reported in [21], these features did not show a significant improvement, if any, in the sample. The missing combinations did not reach the cut-offs in at least one of ACC, SNS, or SPC.
Finally, we show the mean result for each of the features with high ACC both for ASD and autistic in Tables 13 and 14, respectively.

Discussion
This paper assessed brain functional differences between ASD and autism using rs-fMRI and SVM classification (SVC). The measure used to distinguish ASD from autism was the ADOS score and cut-off points, as seen in Table 2.
Our results highlight some brain regions that potentially can distinguish functional differences between both groups (ASD vs. autism). The main finding in distinguishing the two ASD sub-classes reached up to 73.8% accuracy (SoRs 11). These results need to be taken with caution due to the limitations mentioned and given its Matthews Correlation Coefficient of 0.31 (scale [−1,1]), which is better than a random selection but still not ideal. However, our results show a promising path to investigate the functional difference between both ASD sub-classes.
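For reference, the Matthews Correlation Coefficient can be computed from confusion-matrix counts as sketched below. The counts used here are hypothetical: they were chosen to be roughly consistent with the rates reported for SoRs 11 on a sample of 166 autistic and 36 ASD subjects, and are not the confusion matrix of our experiment. The function name is also hypothetical.

```python
from math import sqrt


def mcc_from_counts(tp, fn, tn, fp):
    """Matthews Correlation Coefficient, in the range [-1, 1].

    Returns 0.0 when the denominator is zero, a common convention
    for degenerate confusion matrices.
    """
    denominator = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0
    return (tp * tn - fp * fn) / denominator


# Hypothetical counts (autism as the positive class): 166 positives, 36 negatives.
mcc = mcc_from_counts(tp=127, fn=39, tn=22, fp=14)  # about 0.31
```

Unlike ACC, the coefficient accounts for all four cells of the confusion matrix, so it remains informative when one class is much larger than the other, as is the case in our sample.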
The best ACC was reached for SoRs 11, which consists of the cingulate gyrus (cingulum), covering its anterior, median, and posterior parts on both the left and right sides of the brain. We can conjecture that brain regions such as the cingulum (73.8% ACC, 76.5% SNS, 60.8% SPC) and angular (SoRs 23) (66.3% ACC, 67.4% SNS, 60.8% SPC) have the potential to differentiate the severity of ASD subjects, taking into consideration the ACC reached in this experiment. These SoRs, applied together with methods such as ADOS, may in the future allow professionals to classify individuals. The frontal lobe (SoRs 1) (64.9% ACC, 65.7% SNS, 63.3% SPC) should also be considered for further investigations, as it shows reasonable ACC.
Since these brain regions are commonly pointed to as an ASD vs. TD differential, we can also suppose, based on our results, that such regions have the potential to describe areas where functional activity may be a biomarker for ASD severity, supporting previous investigations [64]. Therefore, we can presume the potential functional difference between subjects from the ASD group and the autism group using these ROIs.

Conclusions
Firstly, and most importantly, the field lacks sample data to strengthen the recent outcomes. We believe that all published studies have insufficient samples to ensure definitive conclusions on ML applied to fMRI for ASD diagnoses. For example, the ADOS used hundreds of thousands of subjects to validate its algorithm, while the sum of all subjects from all published papers regarding ML applied to fMRI (discounting the subjects duplicated for multiple studies) is not even close to this value. Therefore, any claim to solve the issue tends to be premature. Nevertheless, it is mandatory to research possible biomarkers while waiting for more available data to validate the findings.
We investigated the functional brain activity difference between ADOS ASD subclasses (autism and ASD) using fMRI data from subjects previously diagnosed and available at ABIDE. The differences between each ASD sub-class were the ADOS score and cut-off points. We applied these data to train an ML classification algorithm (SVC) to classify the disorder severity, investigating the existence of functional brain differences across regions between both ASD sub-classes.
Our main contribution was the identification of five SoRs that potentially have discriminating patterns for ASD severity. Additionally, the suggested use of SoRs can help to improve investigations by allowing more clarity in interpreting and comparing the results, aiming to enable physicians to look up the same markers found by the ML. In this same aspect, opting to explore approaches using features more easily observed by human analyses, such as the maximum, minimum, mean, and standard deviation from each ROI, is also another contribution. These contributions can improve further research to give tools for physicians to utilize these signals when evaluating a subject, more than simply finding an ML to aid the ASD evaluation.
Our findings are consistent with previous studies on autism and brain development, bringing a promising approach to evaluating ASD subtypes. A computational aid system could improve medical diagnosis by delivering more tools for physicians' evaluation, reducing analysis ambiguity. Further research, applied to a younger sample, can allow a computational system to assess individuals early, before the most severe symptoms begin. Distinguishing the severity of a subject can help in intervention selection, and earlier diagnosis can help set proper interventions to improve the individual's quality of life.
Our study limitations lie mainly in the reduced sample size, which may not generalize our outcomes for all populations. However, we can speculate about these functional differences between the ASD subtypes.
Another limitation of the study was the mean age of the subjects (~16 years old), which does not correspond to early diagnosis. Therefore, an additional experiment with younger subjects will be required to improve the results' reliability.
For further works, an increase in the available subjects, including younger ones, would help to raise the accuracy, as it would help to clarify how much of our results can be generalized to all populations. In addition, the research community would benefit from more available fMRI data with the respective phenotype data (such as ADOS score, age at scan, sex, FIQ), allowing more accurate investigations.
Institutional Review Board Statement: Ethical review and approval were waived for this study due to the use of publicly available, previously published data.