A CNN Deep Local and Global ASD Classification Approach with Continuous Wavelet Transform Using Task-Based FMRI

Autism spectrum disorder (ASD) is a neurodevelopmental disorder characterized by language and social impairments. The autism diagnostic observation schedule is the current gold standard for ASD diagnosis. Developing objective computer-aided technologies for ASD diagnosis using brain imaging modalities and machine learning is one of the main tracks in current studies to understand autism. Task-based fMRI demonstrates functional activation in the brain by measuring blood oxygen level-dependent (BOLD) variations in response to certain tasks, and it is believed to hold discriminant features for autism. A novel computer-aided diagnosis (CAD) framework is proposed to classify 50 ASD and 50 typically developed toddlers using deep CNNs. The CAD system includes both local and global diagnosis based on a response-to-speech task. Spatial dimensionality reduction with region of interest selection and clustering is applied. In addition, the proposed framework performs discriminant feature extraction with the continuous wavelet transform. Local diagnosis on the cingulate gyri, superior temporal gyrus, primary auditory cortex and angular gyrus achieves accuracies between 71% and 80% with four-fold cross-validation. The fused global diagnosis achieves an accuracy of 86% with 82% sensitivity and 92% specificity. A brain map indicating the ASD severity level of each brain area is created, which contributes to personalized diagnosis and treatment plans.


Introduction
Autism spectrum disorder (ASD) is a neurodevelopmental disorder that affects social communication ability. ASD also causes language impairment and repetitive behaviors [1]. Individuals with ASD show different severity levels for each symptom [2]. The common ASD diagnostic standard combines the developmental history and expert clinical judgment with the behavioral modules of the autism diagnostic observation schedule (ADOS) [3,4]. Autism is typically diagnosed once noticeable symptoms emerge, at around three to five years of age [5]. Early diagnosis and intervention are crucial for better assessment and treatment.
ASD can be diagnosed as early as 12 months of age, especially with the emergence of diagnostic tools that employ brain imaging modalities such as structural (sMRI), functional (fMRI), and diffusion (DTI) magnetic resonance imaging [6]. Combining these scans to view the structure of the brain together with its functional activity during rest and during the performance of certain tasks provides an early biomarker for ASD [7].
Resting-state and task-based fMRI are two types of fMRI scans used to capture functional activity. Task-based fMRI measures evoked blood oxygen level-dependent (BOLD) signals during the performance of different tasks [8], such as auditory, language, visual processing, motor, and social tasks [9].
Several studies have investigated the abnormal functional response to speech in the autistic brain compared with typically developed (TD) peers [10]. Studies in [11][12][13] played an audio recording of a simple bedtime story and examined the fMRI response during sleep. These studies included 40 autistic toddlers and 40 TD toddlers aged 12 to 48 months. Autistic toddlers showed abnormal laterality and hypoactivation in the left anterior portion of the superior temporal cortex (aSTG), whereas TD toddlers exhibited the normal dominant activation of the left-hemisphere aSTG. The authors also recommended early intervention and treatment, as they showed that the lateralization abnormality increases with age.
Studies published up to 2013 and reviewed in [14] linked atypical lateralization to language impairment. Individuals with ASD exhibited attenuated left-hemisphere activation. Anomalous lateralization was also present in the functional areas responsible for prelinguistic and language processing, specifically the fronto-temporal regions. One of the reviewed studies [15] revealed that atypical lateralization starts at an early age: lower lateralization was present in infants at high risk of ASD, while higher lateralization was present in low-risk peers. A review in [16] reached similar conclusions.
A meta-analysis of fMRI studies up to 2013 was presented in [10]. It revealed increased activation in the right precentral gyrus and decreased left-hemisphere activation in ASD individuals performing language and auditory tasks, in contrast to the typical activation pattern of TD individuals. Moreover, TD individuals showed higher activation in the bilateral superior temporal gyri (STG) and left cingulate gyrus than their ASD peers.
Literature on task-based fMRI analysis for ASD reports fundamental differences in activation between ASD and TD individuals. These findings support the use of task-based fMRI for early ASD diagnosis [17]. Machine learning (ML) has made it possible to develop intelligent and automated systems for many pattern recognition applications. The emergence of noninvasive or minimally invasive medical screening devices has produced large, informative datasets that allow ML to be exploited for automated diagnosis. A study in [18] proposed a pipeline based on task fMRI scans for predicting treatment outcome as measured by the social responsiveness scale. They applied the general linear model (GLM) for brain feature extraction, followed by feature selection, and employed a random forest (RF) classifier. Twenty ASD children (5.90 ± 1.07 years) were included in the study. A recent study in [19] performed both local and global diagnosis for ASD toddlers. Brain areas parcellated with the Brainnetome atlas (BNT) were analyzed with a stacked nonnegativity-constrained auto-encoder. The study included 30 ASD and 30 TD subjects and classified the two groups with an accuracy of 75.8%. Another recent study graded the severity of autism into three groups [20,21]. GLM analysis was applied at two levels: a lower (individual) level to extract features, and a higher (group) level to infer statistical differences between groups and for validation. They used different approaches to extract features from GLM-analyzed whole-brain areas. Among the classifier architectures they tested, random forest performed best, with 78% accuracy. In [22], they extended their framework with a two-stage classifier, more data (92 mild, 32 moderate, and 33 severely autistic subjects) and additional validation techniques, achieving accuracies between 70% and 83%.
ML and deep learning (DL), a subset of ML that uses deep networks, have played a very important role in many neuroscience applications. The convolutional neural network (CNN) is one of the most powerful DL architectures. CNNs are widely adopted in brain-computer interfaces (BCI) as well as in the classification of EEG signals [23][24][25].
Recently, CNNs have been widely used for ASD diagnosis and analysis with fMRI [26]. Jinlong Hu et al. [27] adopted a multi-channel 2D CNN model to classify an fMRI dataset of 995 subjects in a motor experiment. They showed that CNNs perform well on high-dimensional data compared with other classifiers, especially when the dataset is large, as in their case. A study in [28] investigated the use of spatial and temporal features of task-based fMRI. To capture spatial information, they developed a 3D convolutional neural network on two-channel images of the mean and standard deviation computed over a sliding window, which captures the temporal statistics. This framework achieved an 8.5% increase in the mean F-score.
fMRI scans constitute 4D data: a 3D brain volume in which each voxel carries a 1D time-dependent BOLD signal. Several signal processing techniques can be used to analyze these BOLD signals. The wavelet transform is considered an efficient technique for analyzing time series. Its applications include compression, high-resolution time-frequency analysis, and denoising [29]. It has also been used for fMRI analysis as an alternative to conventional GLMs. PS Lessa et al. [30] concluded that wavelet correlation analysis achieves higher statistical power than GLMs. Moreover, wavelet transforms applied to fMRI feature processing have contributed to efficient diagnosis of brain disorders such as ADHD, autism and Alzheimer's disease. In an approach to diagnose ADHD, García et al. [31] applied the continuous wavelet transform (CWT) to create scalograms of BOLD signals.
Most previous fMRI experiments were performed on adults [32,33]; in contrast, our study includes toddlers and infants from 12 to 40 months old. The aim of our study is to develop a computer-aided tool for early local and global autism detection. Spatial dimensionality reduction with region of interest (ROI) selection and clustering has been performed to reduce the 4D fMRI data to a small number of BOLD signals. To provide a detailed frequency and scale representation, we have applied CWT to the selected BOLD signals. CWT creates scalogram images that are used as inputs to a multi-channel 2D-CNN for each area. Finally, brain maps that indicate the level of ASD severity for each ROI are provided for each subject. The proposed framework works towards determining the neuro-circuits with abnormalities and creating personalized diagnosis and treatment plans that address the specific case of each individual. Moreover, CWT achieved better results than other feature extraction and generation techniques.

fMRI Data Collection
This study includes subjects from the "Biomarkers of Autism at 12 Months: From Brain Overgrowth to Genes" dataset. This dataset was collected between August 2007 and June 2014 and is provided by the National Database for Autism Research (NDAR: http://ndar.nih.gov (accessed on 22 May 2019)) [11,34,35]. The dataset includes 639 subjects who were followed roughly every 12 months, starting at 12 months of age and continuing until 40 months.
We applied strict criteria in selecting subjects for our study: included subjects must have the ADOS toddler module, sMRI (T1 and T2), and response-to-speech task fMRI (T2*). Each report and scan was intensively validated. All sMRI scans were visually inspected to exclude inaccurate or corrupted ones. fMRI scans were checked to have 154 volumes and visually inspected to confirm that they have no clear artifacts. One hundred subjects (50 ASD, 50 TD) aged 12 to 40 months are included in this study. Information about each subject, such as IDs and final diagnosis, as well as the extracted BOLD signals of this dataset, are available in Supplementary Materials 1 and 2, respectively.

Response to Speech Experiment
The task performed during task-based fMRI acquisition is a response-to-speech experiment. An audio recording of a narrator telling a story was played during natural sleep. The audio consists of three types of recordings: simple forward speech, complex forward speech, and backward speech. These recordings alternate with silence periods and are repeated over a span of 6 min 20 s.

Methods
In this study, local and global ASD diagnosis have been developed. Figure 1 illustrates the adopted framework. First, fMRI scans are preprocessed using the FMRI Expert Analysis Tool (FEAT) [36] of the FMRIB Software Library (FSL) [37]. Brain parcellation is based on the Harvard-Oxford probabilistic atlas (https://identifiers.org/neurovault.collection:262, accessed on 11 April 2019). A detailed explanation of the preprocessing steps is provided in [20].

Spatial Dimensionality Reduction
Applying neural networks to raw data without feature engineering is feasible when the raw data are easily separable. However, identifying autism biomarkers in task fMRI is a complex problem, because autism spans a wide spectrum and the classes are not easily separable. Moreover, raw fMRI data are high-dimensional (4D). CNN performance decreases when the input dimensionality is high and the number of samples is small, as is typical in medical applications. Hence, it is crucial to reduce dimensionality. A comparison of fMRI feature extraction and reduction approaches presented in [38] showed that such approaches yield higher ASD classification results. The following feature-reduction steps are proposed:
• ROI selection: Literature on the response-to-speech experiment in toddlers shows that specific brain areas related to language circuits are activated. These areas include the cingulate gyri (CG), superior temporal gyrus (STG), primary auditory cortex (PAC) and angular gyrus (AG) in both hemispheres. In this study, the most significantly activated brain areas are selected.
• Clustering: Each brain area contains many commonly activated voxels whose signals are largely redundant. Therefore, grouping similar BOLD signals in each area and extracting a single signal for each group is efficient and can substantially improve classification performance. Hence, the BOLD signals of each brain area are clustered with k-means. Different numbers of clusters have been tested to achieve higher validation accuracies. Two methods of representing the signals of each cluster have been tested: averaging the BOLD signals, or extracting the BOLD signal closest to the center of that cluster.
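The clustering step can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation: k-means (Lloyd's algorithm) is written out by hand with a farthest-point initialisation, which is our assumption because the initialisation is not reported, and the helper names `kmeans` and `cluster_representatives` are hypothetical. Both representation methods from the text are included; the input is one ROI's voxel signals arranged as an (n_voxels × n_timepoints) array.

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Lloyd's k-means on the rows of X (one row per voxel BOLD signal).

    Centers start from a farthest-point (maximin) selection seeded by one
    random voxel; this initialisation is an assumption, not from the paper.
    """
    rng = np.random.default_rng(seed)
    centers = [X[rng.integers(len(X))]]
    for _ in range(1, k):
        # squared distance from every voxel to its nearest chosen center
        d = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[d.argmax()])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                        for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    # final assignment against the final centers
    labels = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)
    return labels, centers

def cluster_representatives(bold, k=3, method="mean", seed=0):
    """Reduce one ROI's voxel signals (n_voxels x n_timepoints) to k signals."""
    labels, centers = kmeans(bold, k, seed=seed)
    reps = []
    for j in range(k):
        members = bold[labels == j]
        if len(members) == 0:        # empty cluster: fall back to its center
            reps.append(centers[j])
        elif method == "mean":       # average BOLD signal of the cluster
            reps.append(members.mean(axis=0))
        else:                        # member signal closest to the cluster center
            reps.append(members[((members - centers[j]) ** 2).sum(axis=1).argmin()])
    return np.stack(reps)
```

For an ROI with 154 volumes, `cluster_representatives(bold, k=3)` returns a 3 × 154 array, one representative signal per cluster; k = 3 is the value the experimental results report as best. With many thousands of voxels, a library implementation such as scikit-learn's `KMeans` would be the more practical choice.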
The advantage of these reduction approaches is that the brain structure is preserved: each brain area is represented by a number of features. This allows local analysis and the construction of brain maps.

Continuous Wavelet Transform
CWT represents a signal by convolving it with wavelets that vary continuously in translation and scale. The result is a time-scale power representation (scalogram) of the signal, as shown in Figure 2. The CWT of a signal x(t) at scale a (a > 0) and translation b is calculated by:
$$W_x(a,b) = \frac{1}{\sqrt{a}} \int_{-\infty}^{\infty} x(t)\, \psi^{*}\!\left(\frac{t-b}{a}\right) dt,$$
where ψ is the mother wavelet, a continuous function in both the time domain and the frequency domain, and * denotes the complex conjugate. The mother wavelet generates the daughter wavelets, which are translated and scaled versions of it. After the BOLD signals are extracted from the clusters, the CWT is applied to produce scalograms that provide a detailed representation of these BOLD signals. The scalogram images are then rescaled to 64 × 64 and fed to a multichannel 2D-CNN for each area. In task-based fMRI experiments, quantifying the change in the BOLD signal across time is essential. CWT scalograms encode both frequency and time information in a single image and therefore satisfy this requirement. During training, the 2D CNN filters learn weights that extract discriminative patterns from these images; during testing, the trained network applies these filters to unseen scalograms to classify them.
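A minimal NumPy sketch of the scalogram step follows. It discretises the CWT integral directly (unit sampling interval, one translation b per sample) and assumes a complex Morlet mother wavelet with ω0 = 6, integer scales 1 to 64 and separable linear interpolation for the 64 × 64 rescaling; these specifics and the function names `morlet`, `cwt_power` and `resize_linear` are our assumptions, not the authors' code. A library routine such as PyWavelets' `pywt.cwt(data, scales, wavelet)` would be a common alternative.

```python
import numpy as np

def morlet(t, w0=6.0):
    """Complex Morlet mother wavelet; w0 = 6 is a common default (assumed)."""
    return np.pi ** -0.25 * np.exp(1j * w0 * t) * np.exp(-0.5 * t ** 2)

def cwt_power(x, scales, w0=6.0):
    """Scalogram |W(a, b)|^2 by direct discretisation of the CWT integral."""
    x = np.asarray(x, dtype=float)
    t = np.arange(len(x), dtype=float)
    power = np.empty((len(scales), len(x)))
    for i, a in enumerate(scales):
        # u[b, t] = (t - b) / a for every translation b (rows) and time t (columns)
        u = (t[None, :] - t[:, None]) / a
        coef = (x[None, :] * np.conj(morlet(u, w0))).sum(axis=1) / np.sqrt(a)
        power[i] = np.abs(coef) ** 2
    return power

def resize_linear(img, shape=(64, 64)):
    """Separable linear interpolation of a 2D array to a fixed size."""
    h, w = img.shape
    rows = np.linspace(0, h - 1, shape[0])
    cols = np.linspace(0, w - 1, shape[1])
    tmp = np.array([np.interp(cols, np.arange(w), r) for r in img])       # (h, W)
    return np.array([np.interp(rows, np.arange(h), c) for c in tmp.T]).T  # (H, W)

# Usage on a synthetic 154-sample signal with 64 scales (a = 1..64)
bold = np.sin(2 * np.pi * np.arange(154) / 16)
scalogram = resize_linear(cwt_power(bold, np.arange(1, 65)))  # 64 x 64 CNN input
```

Direct summation costs O(n²) operations per scale, which is negligible for 154-sample signals; an FFT-based convolution would scale better for longer series. For a sinusoid with a 16-sample period, the power concentrates around scales 15 to 16, since the Morlet scale a maps to a period of roughly 2πa/ω0.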

2D CNN Classification
The CNN is a deep learning architecture of growing prominence in image analysis, including the analysis of medical image data. CNNs may be characterized by the dimensionality of their convolutional kernels, which in practice is typically between one and four, inclusive. Higher kernel dimensions create a computational bottleneck, especially when paired with large input sizes, e.g., a 4D CNN that processes fMRI volumetric time series. We have therefore developed a more tractable 2D CNN model for our framework. As a deep neural network, the CNN comprises several layers, including convolutional layers based on the aforementioned kernels, pooling layers that reduce the size of the activation maps, and fully connected (FC) layers for higher-order feature representations.
We have extensively tested several model hyper-parameters, as explained in detail in the experimental results. Our CNN model performs three successive passes of convolution and size reduction, as shown in Table 1 (generated with the model summary method of the Keras library). These are followed by FC (Dense) layers, the final (output) layer having a softmax activation function for classification. As explained earlier, each brain area is represented by CWT power spectrum images. A separate CNN classifier is developed and tuned for each brain area. Global classification is obtained by majority voting across all areas, as shown in Figure 3.
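To make the three convolution and size-reduction passes concrete, the following NumPy sketch runs one forward pass of a small multichannel 2D CNN on a stack of 64 × 64 scalograms. It is illustrative only and is not the model of Table 1: the filter counts (16, 32, 64), 3 × 3 kernels, zero "same" padding, ReLU, 2 × 2 max pooling, six input channels and random untrained weights are all assumptions, and a single softmax layer stands in for the FC stack. The functions `conv2d_same`, `max_pool2` and `forward` are hypothetical helpers.

```python
import numpy as np

def conv2d_same(x, kernels, bias):
    """Multichannel 2D convolution as used in CNN layers (cross-correlation),
    stride 1, zero 'same' padding. x: (H, W, C_in); kernels: (k, k, C_in, C_out)."""
    k = kernels.shape[0]
    p = k // 2
    xp = np.pad(x, ((p, p), (p, p), (0, 0)))
    H, W, _ = x.shape
    out = np.empty((H, W, kernels.shape[3]))
    for i in range(H):
        for j in range(W):
            # sum over the k x k x C_in patch for every output channel at once
            out[i, j] = np.tensordot(xp[i:i + k, j:j + k, :], kernels, axes=3) + bias
    return out

def max_pool2(x):
    """2 x 2 max pooling with stride 2 (halves height and width)."""
    H, W, C = x.shape
    x = x[:H - H % 2, :W - W % 2]
    return x.reshape(H // 2, 2, W // 2, 2, C).max(axis=(1, 3))

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def forward(x, conv_layers, dense_w, dense_b):
    """Three conv -> ReLU -> max-pool passes, then a softmax output layer."""
    for kernels, bias in conv_layers:
        x = max_pool2(np.maximum(conv2d_same(x, kernels, bias), 0.0))
    return softmax(x.ravel() @ dense_w + dense_b)

# Hypothetical configuration: 6 scalogram channels, 16/32/64 filters of 3 x 3,
# random (untrained) weights. Spatial size goes 64 -> 32 -> 16 -> 8.
rng = np.random.default_rng(0)
n_channels, filters = 6, [16, 32, 64]
conv_layers, c_in = [], n_channels
for c_out in filters:
    conv_layers.append((rng.normal(0.0, 0.1, (3, 3, c_in, c_out)), np.zeros(c_out)))
    c_in = c_out
dense_w = rng.normal(0.0, 0.01, (8 * 8 * 64, 2))
dense_b = np.zeros(2)
x = rng.random((64, 64, n_channels))               # stand-in for one subject's scalograms
probs = forward(x, conv_layers, dense_w, dense_b)  # two class probabilities
```

In practice such a model would be built and trained with Keras layers (`Conv2D`, `MaxPooling2D`, `Dense`); the hand-written version only shows how each pass halves the spatial size while the channel depth grows.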

Experimental Results
The dataset includes 100 subjects (50 ASD and 50 TD). Performance evaluation has been conducted for each local CNN model. The whole framework is implemented in Python, and the CNN classification model is implemented with the Keras library. Several parameters at each step of the proposed spatial dimensionality reduction and classification pipeline are evaluated. The score to be optimized is the average 4-fold cross-validation accuracy with random shuffling. For clustering, 3 clusters provide discriminant average BOLD signals for each area. In the CWT stage, 32, 64 and 128 scales have been evaluated, and the best performance is obtained with 64 scales. Several mother wavelets have been tested, such as the Mexican hat, Gaussian derivative and Morlet wavelets; the best results are obtained with the Morlet wavelet.
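The evaluation protocol (average accuracy over four folds after a random shuffle) can be sketched as below. The `fit_predict` callback is a hypothetical stand-in for training a local CNN on the training indices and predicting the test indices; the folds are plain (unstratified) because the text does not say whether stratification was used. With 100 subjects, each test fold holds 25 subjects.

```python
import numpy as np

def shuffled_kfold(n_samples, k=4, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold CV after a random shuffle."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n_samples), k)
    for i in range(k):
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train, folds[i]

def cv_accuracy(labels, fit_predict, k=4, seed=0):
    """Average test accuracy over k folds; fit_predict(train, test) -> predictions."""
    accs = []
    for train, test in shuffled_kfold(len(labels), k, seed):
        pred = fit_predict(train, test)
        accs.append(np.mean(pred == labels[test]))
    return float(np.mean(accs))

# Usage with a trivial placeholder predictor that always answers TD (0)
labels = np.array([1] * 50 + [0] * 50)  # 50 ASD (1), 50 TD (0)
baseline = cv_accuracy(labels, lambda train, test: np.zeros(len(test), dtype=int))
```

Because the four folds have equal size here, the always-TD baseline scores exactly 0.5, which is a useful sanity floor for the reported local accuracies.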

Local Classification
Each local CNN classifier is fed with CWT scalogram images extracted from both hemispheres and from the inferior and posterior divisions, where present. Hence, each classifier has a different number of input signals. Table 3 reports the classification accuracy, sensitivity, specificity, and area under the curve (AUC) for the STG, CG, AG, and PAC areas. The AUC summarizes the trade-off between sensitivity and specificity across all decision thresholds and is therefore a useful measure of the inherent validity of the proposed system. A higher AUC means that the system is better at differentiating ASD from TD subjects; an AUC of 1 would mean that some threshold yields 100% sensitivity and 100% specificity, i.e., no false positives or false negatives.
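The reported metrics can be computed from per-subject ASD probabilities as in the following plain-Python sketch. It assumes ASD is the positive class (label 1) and a 0.5 decision threshold; the function name `binary_metrics` is hypothetical. The AUC is computed with the rank (Mann-Whitney) formulation, which equals the area under the empirical ROC curve.

```python
def binary_metrics(y_true, y_prob, threshold=0.5):
    """Accuracy, sensitivity, specificity and ROC AUC (ASD = 1 is the positive class)."""
    y_pred = [1 if p > threshold else 0 for p in y_prob]
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    pos = [s for t, s in zip(y_true, y_prob) if t == 1]
    neg = [s for t, s in zip(y_true, y_prob) if t == 0]
    # AUC = probability that a random ASD subject scores higher than a random
    # TD subject (ties count one half)
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),   # true positive rate
        "specificity": tn / (tn + fp),   # true negative rate
        "auc": wins / (len(pos) * len(neg)),
        "confusion": [[tn, fp], [fn, tp]],  # rows: true TD, true ASD
    }

# Usage on six made-up subjects (three ASD, three TD)
m = binary_metrics([1, 1, 1, 0, 0, 0], [0.9, 0.8, 0.4, 0.3, 0.6, 0.1])
```

Here one ASD subject is missed and one TD subject is flagged, so accuracy, sensitivity and specificity are all 2/3, while the AUC is 8/9 because eight of the nine ASD-TD pairs are ranked correctly.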
The confusion matrix of each area is shown in Figure 4. The diagonal entries (true positives and true negatives) dominate each matrix and are close to the corresponding overall accuracy, indicating that the errors are balanced between the two classes. Receiver operating characteristic (ROC) curves are plotted in Figure 5. After the local 2D-CNN models are developed, a brain map is created for each subject to represent the level of autism severity in each brain area.

Global Classification
The global classification is obtained by fusing the decisions of the local classifiers with majority voting. The achieved accuracy is 86% (sensitivity 82%, specificity 92%). The confusion matrix is shown in Figure 6; as with the local matrices, high values are concentrated on the diagonal. We have also tested a global 2D-CNN classifier trained on the scalogram images of all areas at once. This serves as a validation step and highlights the advantage of classification based on local classifiers. The obtained accuracy is 82%. Figure 7 plots the ROC of this classifier.
This accuracy is close to the 86% global accuracy obtained by majority voting, which indicates that the system is stable. The lower accuracy is likely due to the higher number of input features: in this validation model, the increased number of input channels increases the number of network parameters, which leads to lower performance. The proposed framework achieves higher accuracies than previous work on task-based fMRI scans of the same experiment, as presented in Table 4. A direct comparison with studies of other tasks would not be objective, as they use different datasets and task-based fMRI experiments. Compared with our previous approaches in [19,20,38], the accuracy of the proposed classification based on local classifiers is higher. We believe the reason is that the local CNNs, which have fewer parameters, learn better. Majority voting also has the advantage of basing the decision on the most affected brain areas rather than on all included areas.
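The majority-voting fusion can be sketched in a few lines of plain Python. Each area votes ASD when its local probability exceeds 0.5 and TD otherwise. With four areas a 2-2 tie is possible and the text does not say how ties are resolved; falling back to the mean probability is our assumption, and the name `majority_vote` is hypothetical.

```python
def majority_vote(area_probs, threshold=0.5):
    """Fuse per-area ASD probabilities (area name -> p) into one subject label."""
    n_asd = sum(1 for p in area_probs.values() if p > threshold)
    n_td = len(area_probs) - n_asd
    if n_asd != n_td:
        return "ASD" if n_asd > n_td else "TD"
    # Tie (e.g. 2 vs 2 with four areas): the tie-break rule is not given in the
    # text; using the mean probability here is an assumption.
    mean_p = sum(area_probs.values()) / len(area_probs)
    return "ASD" if mean_p > threshold else "TD"

# Usage: three of four areas above 0.5, so the subject is labelled ASD
label = majority_vote({"STG": 0.43, "AG": 0.8, "CG": 0.61, "PAC": 0.99})
```

Keeping the per-area probabilities, rather than only the fused label, is what makes the individual brain maps possible.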

Brain Maps
According to the literature, not every brain area is affected to the same degree in each individual. Therefore, we obtain individual brain maps that describe the level of autism for each area. After the local classifiers are implemented and trained, each subject's data for each brain area is tested with the corresponding trained network. The resulting probabilities are represented in a brain map, as shown in Figure 8. For example, the probabilities obtained for the first individual are (STG: 0.037, AG: 0.36, CG: 0.31, PAC: 0.072). All four areas have low probabilities for autism (p ≤ 0.5); hence, by majority voting, this individual is classified as TD. For the other individual, the probabilities are (STG: 0.77, AG: 0.97, CG: 0.61, PAC: 0.99). All four areas have high probabilities for autism (p > 0.5); hence, by majority voting, this individual is classified as ASD. Some individuals may have both autistic and non-autistic areas, as mentioned before, for example (STG: 0.43, AG: 0.8, CG: 0.61, PAC: 0.99). Three areas are autistic (p > 0.5) and one area is non-autistic (p ≤ 0.5); therefore, this subject is classified as autistic. Figure 8 also shows a 3D view, rendered with FSLeyes from FSL. As can be seen, the ASD subject has higher autism grades (red colors) that vary across areas, whereas the TD subject has lower grades (yellow colors) that also vary across areas.
Figure 8. Coronal, sagittal and axial 2D views and a 3D view of an ASD and a TD example. Brain areas of the ASD individual show more severe grades (red highlights) than those of the TD peer (predominantly yellow highlights).
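A sketch of how per-area probabilities could be painted into a volume for viewing follows. It assumes an integer-labelled (maximum-probability) version of the atlas, whereas the Harvard-Oxford atlas used here is probabilistic, and all label values below are made up for illustration; the function name `probability_map` is hypothetical. Writing the result to a NIfTI file for FSLeyes (for example with nibabel) is not shown.

```python
import numpy as np

def probability_map(atlas, area_labels, area_probs):
    """Paint each ROI of an integer-labelled atlas volume with its ASD probability.

    atlas:       3D integer array, one (hypothetical) label value per region
    area_labels: area name -> atlas label values belonging to that area
    area_probs:  area name -> ASD probability from that area's local classifier
    Voxels outside the listed areas stay at 0.
    """
    out = np.zeros(atlas.shape, dtype=float)
    for area, labels in area_labels.items():
        out[np.isin(atlas, labels)] = area_probs[area]
    return out

# Toy 4 x 4 x 4 "atlas" with made-up label values; STG uses two labels
# (e.g. left and right hemispheres), label 0 is background.
atlas = np.zeros((4, 4, 4), dtype=int)
atlas[0] = 1
atlas[0, :, 2:] = 5
atlas[1] = 2
atlas[2] = 3
atlas[3, :2] = 4
area_labels = {"STG": [1, 5], "AG": [2], "CG": [3], "PAC": [4]}
probs = {"STG": 0.77, "AG": 0.97, "CG": 0.61, "PAC": 0.99}
brain_map = probability_map(atlas, area_labels, probs)
```

A colormap running from yellow (low) to red (high) over this volume gives the kind of per-area severity rendering described for Figure 8.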

Conclusions and Future Work
In this paper, a novel CNN-based deep learning system for local and global ASD diagnosis is introduced. The proposed system uses task-based fMRI to achieve this goal. In the response-to-speech experiment, ASD toddlers exhibit hypoactivation of the bilateral superior temporal gyrus, bilateral primary auditory cortex, cingulate gyrus and angular gyrus, whereas TD peers exhibit typical lateralized activation. Based on these results, local spatial and temporal features are extracted from each ROI separately. CWT is applied to the BOLD signals extracted from the spatially reduced clusters to obtain scalogram images that capture their frequency content. A local CNN classifier is used for each area. Experimental results are reported for all activated brain areas, with accuracies ranging between 71% and 80%. Global classification is obtained from the local results, achieving an accuracy of 86% (with 82% sensitivity and 92% specificity). Finally, individual brain maps indicating the level of ASD severity are created for each subject.
Future work will include applying the same approaches to resting-state fMRI of the same dataset. This will provide a detailed report for each subject covering connected brain networks at rest and activated brain areas during task activities. The global decision is then expected to be more accurate, as it will consider all functional aspects of the brain. Researchers are encouraged to collect more data from different geographical sites. A protocol for generic experimental design is recommended to enable researchers to validate their work on other datasets. More validation steps will be performed, leading to a robust ASD diagnosis system. In addition, our future work will include genomic data (which is available in the dataset used in this paper) to correlate affected brain areas with specific genome sequences to help in early ASD detection. Finally, local classification results will be investigated to identify malfunctioning neuro-circuits involved in ASD.
Informed Consent Statement: Written informed consent has been obtained from the patient(s) to use this dataset for research purposes and publishing this paper.

Data Availability Statement:
The dataset adopted in this research is provided by the national database for autism research NDAR: http://ndar.nih.gov (accessed on 22 June 2021).