Machine Learning-Based Classification of Autism Spectrum Disorder across Age Groups †

Presented at the 2nd Computing Congress 2023


Introduction
Autism Spectrum Disorder (ASD) is a neurodevelopmental disorder characterized by a variety of behavioral and developmental abnormalities. A person with ASD experiences lifelong effects on their ability to interact and communicate with others [1]. Although its symptoms frequently appear in the first two years of life, autism is considered a "behavioral disease" and can be diagnosed at any age. Experts claim that ASD begins in childhood and lasts through adolescence and old age. ASD also has an impact on how the human brain develops. Typically, a person with ASD has difficulty interacting socially or holding a conversation with others.
The effects of ASD typically last for the rest of a person's life. The condition may arise from both hereditary and environmental causes. Its symptoms can appear by about three years of age and may continue throughout life. Although a patient with this condition cannot be completely cured, the effects can be temporarily reduced if the signs are caught early. Researchers believe that ASD may be linked to human genetics, though they have not definitively identified the precise underlying factors.
The major goal of this research is to improve the diagnosis of autism by developing a machine learning system that uses several machine learning algorithms to build a predictive model for autism with the highest possible accuracy. The model should predict whether an individual (child, adolescent, or adult) has ASD or not. The approach is to take a standard method for diagnosing autism and convert it into a machine learning model that can use medical data to generate predictions and observations, leading to better solutions for identifying ASD as early as possible in the future.

Literature Review
This section presents some of the studies that apply machine learning to ASD. In [2], the authors used the Autism Spectrum Quotient (AQ) to create models for classifying ASD. They employed the Least Absolute Shrinkage and Selection Operator (LASSO) and the Chi-square test to identify the most relevant features in the AQ dataset. Subsequently, they applied three supervised machine learning algorithms, Logistic Regression (LR), Random Forest, and K-Nearest Neighbors, with K-fold cross-validation for robust evaluation. Logistic Regression achieved the highest accuracy, reaching 97.541%, using 13 essential features selected by the Chi-square method.
Deshpande et al. [3] used functional MRI (fMRI) to examine how individuals with autism and typically developing controls differ in terms of the causal influence of one brain area on another (effective connectivity) during Theory-of-Mind (ToM) tasks. The participants included 15 high-functioning people with autism and 15 typically developing people who served as controls. The SVM classifier distinguished between people with autism and typically developing controls with a maximum accuracy of 95.9%.
Duda et al. [4] investigated the potential of machine learning to differentiate accurately and swiftly between Attention Deficit Hyperactivity Disorder (ADHD) and ASD using data from the Social Responsiveness Scale. The study used 65 behavioral features and reached a maximum accuracy of 96.5%. A feature selection wrapper that uses swarm intelligence to perform ASD diagnosis on data from the UCI ML repository is presented in [5]. The study is based on the hypothesis that an ML model can achieve superior classification accuracy with a minimal subset of features. The results support this idea, showing that only 10 of the 21 features in the ASD dataset are necessary to distinguish between patients with ASD and those without it. Notably, using these feature subsets, the technique produces an average accuracy in the range of 92.12% to 97.95%.
In [6], the early signs of ASD in children are identified. The experiment was conducted on UCI data of children using different classifiers, and Logistic Regression achieved the highest accuracy among the models, offering a promising approach to aid the early detection of ASD. Convolutional Neural Network (CNN)-based prediction models were applied to UCI data in [7]. After the missing data were addressed and several machine learning models were applied, the CNN-based models performed best, achieving accuracy rates of 98.30%, 96.88%, and 99.53% for ASD screening in children, adolescents, and adults, respectively. In [8], federated learning is applied to achieve 98% and 81% accuracy on the ASD child and adult datasets, respectively. A detailed review of ASD is presented in [9,10].

Proposed Methodology
The proposed method is shown in Figure 1 and includes data preprocessing, feature reduction, model evaluation, and ASD prediction.

Preprocessing
The autism dataset is first preprocessed to handle missing values and encode categorical attributes. The dataset contains missing values in some individual features, especially gender, country, and ethnicity, and it mixes different types of attributes. Binary label encoding is used for four features in the dataset. For example, the attribute gender is either male or female; this is converted to the numeric value 0 for female and 1 for male. The dataset includes data collected from 89 countries. The countries are numbered from 1 to 89 in alphabetical order, and a missing country is represented as 90. The dataset includes a total of 14 ethnicities, which are numbered from 1 to 14 in alphabetical order, and a missing ethnicity is represented as 15. The preprocessing rules applied to the dataset are shown in Table 1, and the data before and after preprocessing are shown in Table 2.
Missing values are represented by "?" in the database.
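The encoding rules described above can be sketched as follows. This is a minimal illustration in Python rather than the MATLAB implementation used in the study; the helper names `encode_binary` and `build_alphabetic_codes` and the toy records are assumptions introduced here, not part of the original work.

```python
MISSING = "?"  # missing-value marker used in the database

def encode_binary(value, positive):
    """Binary label encoding: 1 for the `positive` category, 0 otherwise."""
    return 1 if value == positive else 0

def build_alphabetic_codes(values):
    """Number the distinct non-missing categories 1..N in alphabetical order
    and give the missing marker the code N + 1."""
    categories = sorted({v for v in values if v != MISSING})
    codes = {cat: i for i, cat in enumerate(categories, start=1)}
    codes[MISSING] = len(categories) + 1
    return codes

# Toy records invented for illustration (not taken from the UCI data).
records = [
    {"gender": "m", "country": "India", "ethnicity": "Asian"},
    {"gender": "f", "country": "?", "ethnicity": "White-European"},
    {"gender": "f", "country": "Brazil", "ethnicity": "?"},
]

country_codes = build_alphabetic_codes(r["country"] for r in records)
ethnicity_codes = build_alphabetic_codes(r["ethnicity"] for r in records)

encoded = [
    {
        "gender": encode_binary(r["gender"], positive="m"),  # female -> 0, male -> 1
        "country": country_codes[r["country"]],
        "ethnicity": ethnicity_codes[r["ethnicity"]],
    }
    for r in records
]

for row in encoded:
    print(row)
```

With 89 distinct countries the same helper assigns 90 to a missing country, and with 14 ethnicities it assigns 15 to a missing ethnicity, which matches the rule stated in the text.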

Cuckoo Search Algorithm (CSA)
The CSA is used for feature reduction. Using the CSA for feature selection in ASD research is a promising way to improve the accuracy and efficiency of data analysis. The cuckoo search procedure is given in Algorithm 1. The parameters used for cuckoo search are population = 20, stopping criterion = 100, probability of abandoning a nest = 0.25, and scale factor for Lévy flight = 0.6.
Algorithm 1
3. Evaluate fitness of each nest
4. Choose a cuckoo randomly from the autism dataset
5. Generate a new solution (features) by modifying the cuckoo's solution
6. Evaluate the fitness of the new combination of features
7. Implement the CSA to search for optimal feature subsets
8. while (stopping criterion not met), repeat the following steps:
   i. Lévy flight generation: use Lévy flights to generate new solutions
   ii. Evaluate new solutions: assess the fitness of the new solutions
   iii. Replace solutions: replace less fit solutions with better ones
   iv. Abandon solutions: occasionally replace some solutions with new random solutions (exploration)
   v. Evaluate fitness of nests
return best solution found
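A cuckoo search loop of this kind can be sketched in Python as shown below. This is an illustrative sketch under several assumptions, not the authors' MATLAB implementation: the stopping criterion is read as 100 iterations, Lévy steps are drawn with Mantegna's algorithm, a real-valued nest is turned into a feature mask by thresholding at zero, and `toy_fitness` is an invented stand-in for whatever fitness the study used (for example, classifier accuracy on the selected features).

```python
import math
import random

def levy_step(rng, beta=1.5):
    """Draw one Lévy-distributed step length using Mantegna's algorithm."""
    sigma = (
        math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
        / (math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))
    ) ** (1 / beta)
    u = rng.gauss(0, sigma)
    v = rng.gauss(0, 1)
    while v == 0.0:  # guard against division by zero
        v = rng.gauss(0, 1)
    return u / abs(v) ** (1 / beta)

def to_mask(position):
    """Assumed binarisation: feature i is selected when position[i] > 0."""
    return [1 if x > 0 else 0 for x in position]

def cuckoo_search(fitness, n_features, n_nests=20, n_iter=100,
                  pa=0.25, alpha=0.6, seed=0):
    """Maximise `fitness(mask)` over binary feature masks."""
    rng = random.Random(seed)

    def random_nest():
        return [rng.uniform(-1, 1) for _ in range(n_features)]

    nests = [random_nest() for _ in range(n_nests)]
    scores = [fitness(to_mask(n)) for n in nests]
    for _ in range(n_iter):
        # Lévy flight from a randomly chosen cuckoo, scaled by alpha
        i = rng.randrange(n_nests)
        candidate = [x + alpha * levy_step(rng) for x in nests[i]]
        cand_score = fitness(to_mask(candidate))
        # Replace a randomly chosen nest if the new solution is fitter
        j = rng.randrange(n_nests)
        if cand_score > scores[j]:
            nests[j], scores[j] = candidate, cand_score
        # Abandon a fraction pa of the worst nests and rebuild them at random
        worst_first = sorted(range(n_nests), key=scores.__getitem__)
        for k in worst_first[: int(pa * n_nests)]:
            nests[k] = random_nest()
            scores[k] = fitness(to_mask(nests[k]))
    best = max(range(n_nests), key=scores.__getitem__)
    return to_mask(nests[best]), scores[best]

# Hypothetical fitness: reward three "informative" features, penalise subset size.
INFORMATIVE = {0, 2, 5}

def toy_fitness(mask):
    return sum(mask[i] for i in INFORMATIVE) - 0.1 * sum(mask)

best_mask, best_score = cuckoo_search(toy_fitness, n_features=10)
print(best_mask, best_score)
```

The defaults mirror the parameters reported in the text (20 nests, 100 iterations, abandonment probability 0.25, scale factor 0.6). Because only the worst nests are abandoned, the best score found so far never decreases from one iteration to the next.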
After feature reduction, the output label (ASD or non-ASD) is predicted using different classification methods. Each classifier's accuracy is evaluated and compared.

Experimental Results and Discussion
The performance of the proposed approach was evaluated on ASD datasets from the UCI database and implemented in MATLAB.

Dataset Description
In this research, three publicly accessible ASD datasets from the UCI database were utilized, which are relevant for the clinical diagnosis of ASD at various ages. The dataset description is shown in Table 3. Children (aged 4 to 11 years), adolescents (aged 12 to 17 years), and adults (aged 18 years and above) are the three age groups represented in the datasets. The dataset includes a total of 21 features with 10 behavioral features and 10 individual features. The individual features relate to personal information, which includes age, ethnicity, gender, born with jaundice, country, etc., and the behavioral features relate to the screening questions. The data were collected through a survey conducted in countries across the world using a mobile application called ASD Tests.

Classification Methods
Three classifiers, Logistic Regression (LR), K-Nearest Neighbors (KNN), and Support Vector Machine (SVM), are used for classification.

Logistic Regression
Logistic Regression is one of the most popular machine learning algorithms and is used primarily for binary classification tasks. It uses a logistic function to find the optimal curve to fit the data points.
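As a rough illustration of how Logistic Regression fits a logistic curve, the sketch below trains a one-feature model by batch gradient descent in plain Python. The training routine, the learning rate, the number of epochs, and the toy data (fraction of "yes" answers per respondent) are all assumptions made for this example; the study does not describe its solver or settings.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(X, y, lr=1.0, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict(w, b, x):
    """Class 1 when the predicted probability is at least 0.5."""
    return 1 if sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5 else 0

# Toy data invented for illustration: one feature, the fraction of "yes" answers.
X = [[0.1], [0.2], [0.3], [0.7], [0.8], [0.9]]
y = [0, 0, 0, 1, 1, 1]

w, b = fit_logistic(X, y)
print([predict(w, b, x) for x in X])
```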

K-Nearest Neighbors (KNN)
This algorithm is a straightforward and intuitive machine learning method employed for both classification and regression. It is a non-parametric, instance-based approach that makes predictions based on how closely data points in a dataset resemble one another. The experiment was conducted with different values of K, and the maximum accuracy was obtained with K = 10.
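The majority-vote idea behind KNN can be sketched in a few lines of Python. Euclidean distance and the toy answer vectors are assumptions made for illustration, since the distance metric used in the study is not stated; the default `k=10` mirrors the best value reported above, while the toy call uses `k=3` because the example has only six training points.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k=10):
    """Label `query` by majority vote among its k nearest training points."""
    neighbours = sorted(
        zip(train_X, train_y),
        key=lambda pair: math.dist(pair[0], query),  # Euclidean distance (assumed)
    )[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

# Toy binary answer vectors invented for illustration.
train_X = [[0, 0, 1], [0, 1, 0], [1, 0, 0], [1, 1, 1], [1, 1, 0], [0, 1, 1]]
train_y = [0, 0, 0, 1, 1, 1]

print(knn_predict(train_X, train_y, [1, 1, 1], k=3))
print(knn_predict(train_X, train_y, [0, 0, 0], k=3))
```

With two classes, an even K such as 10 can produce tied votes, so a tie-breaking rule is needed in practice; this sketch simply takes whichever label `Counter.most_common` returns first.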

Support Vector Machines (SVM)
Support Vector Machines (SVM) are used for both binary and multiclass classification tasks. Their core objective is to identify an optimal decision boundary that separates data points into distinct classes while maximizing the margin between these classes.
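For reference, the margin-maximization objective can be written in its standard hard-margin linear form. The notation is introduced here for illustration: $\mathbf{x}_i$ is a feature vector, $y_i \in \{-1, +1\}$ its class label, and $\mathbf{w}$, $b$ the parameters of the separating hyperplane. The study does not state which kernel or regularization setting was used, so this is the textbook formulation rather than the exact model that was trained.

```latex
\begin{aligned}
\min_{\mathbf{w},\, b} \quad & \tfrac{1}{2}\,\lVert \mathbf{w} \rVert^{2} \\
\text{subject to} \quad & y_i \left( \mathbf{w}^{\top} \mathbf{x}_i + b \right) \ge 1,
\qquad i = 1, \dots, n
\end{aligned}
```

The distance between the two margin hyperplanes is $2 / \lVert \mathbf{w} \rVert$, so minimizing $\lVert \mathbf{w} \rVert$ maximizes the separation margin. Soft-margin variants add slack variables to tolerate data that are not perfectly separable.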

Result and Discussion
We applied three ML models for evaluation. Accuracy is calculated for all the models using the following equation:
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
where TP, TN, FP, and FN denote the numbers of true positives, true negatives, false positives, and false negatives, respectively. The accuracy of the ML models on the ASD datasets is shown in Table 4 and Figure 2. According to the results, Logistic Regression has the highest accuracy among the models evaluated on the available datasets. Since the authors of the prior studies used different methods and datasets, the results in Table 5 are not directly comparable. Since the authors of [7] used CNN, their achieved accuracy is very high. This research can be enhanced by the use of deep learning techniques, more datasets, and more features.
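Accuracy here is taken to be the usual fraction of correct predictions, (TP + TN) / (TP + TN + FP + FN). The short Python sketch below computes it from predicted and true labels; the function names and the toy labels are assumptions made for illustration and are not results from the study.

```python
def confusion_counts(y_true, y_pred):
    """Return (TP, TN, FP, FN) for binary labels, where 1 = ASD and 0 = non-ASD."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn

def accuracy(y_true, y_pred):
    """Accuracy = (TP + TN) / (TP + TN + FP + FN)."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + tn + fp + fn)

# Toy labels invented for illustration.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]

print(confusion_counts(y_true, y_pred))  # (3, 3, 1, 1)
print(accuracy(y_true, y_pred))          # 0.75
```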

Conclusions
In this study, three publicly available ASD screening datasets from the UCI machine learning repository were used to detect ASD using several ML models. The study evaluated different machine learning models for the accurate and robust classification of ASD across age groups, from early childhood to adulthood. The findings and insights from this research contribute to a deeper understanding of ASD diagnosis, offering potential benefits to clinicians, researchers, and individuals on the autism spectrum. To increase the system's robustness and overall performance, future research should concentrate on larger datasets, improved feature selection methods, and deep learning strategies that combine CNNs with classification.

Figure 1. Proposed system for autism prediction.

Table 1. Rules applied to handle missing values and encode categorical values.

Table 2. Data before and after preprocessing.

Table 5 provides a comparative analysis with prior research on ASD.

Table 5. Comparison with the existing methods.

Table 4. Accuracy of different age groups of ASD dataset.
