Autism Screening in Toddlers and Adults Using Deep Learning and Fair AI Techniques

Abstract: Autism spectrum disorder (ASD) has been associated with conditions like depression, anxiety, and epilepsy due to its impact on an individual's educational, social, and employment outcomes. Since diagnosis is challenging and there is no cure, the goal is to maximize an individual's abilities by reducing the symptoms, and early diagnosis plays a role in improving behavior and language development. In this paper, an autism screening analysis for toddlers and adults has been performed using fair AI (feature engineering, SMOTE, optimizations, etc.) and deep learning methods. The analysis considers traditional deep learning methods like Multilayer Perceptron (MLP), Artificial Neural Networks (ANN), Convolutional Neural Networks (CNN), and Long Short-Term Memory (LSTM), and also proposes two hybrid deep learning models, i.e., CNN-LSTM with Particle Swarm Optimization (PSO), and a CNN model combined with Gated Recurrent Units (GRU-CNN). The models have been validated using multiple performance metrics, and the analysis confirms that the proposed models perform better than the traditional models.


Introduction
Autism spectrum disorder (ASD) has been termed a developmental disorder because its symptoms appear in the first two years of life. ASD may impact an individual's interaction, communication, learning, and behavior. Since there is a wide variation in the severity and types of symptoms, it is referred to as a spectrum disorder. Although the disorder is lifelong, timely treatment can improve symptoms and functioning. Autistic individuals may make inconsistent eye contact, have difficulties with conversations, show facial expressions that are delayed or do not match what is said, have trouble understanding others, and may be unable to adjust to social situations. While the primary cause of ASD is unknown, clinical studies suggest that the cause may be tied to a person's genes and environment [1]. It has also been found that the amount of brain tissue in the cerebellum is significantly lower in autistic people. Diagnosing ASD in young children is usually a two-stage process. In the first stage, children are screened for developmental delays at regular intervals. Additional diagnostic evaluation may be conducted if symptoms of ASD are seen in the initial screening. This evaluation may include a neurological examination, an assessment of cognitive abilities, observation of the child's behavior and language abilities, and hearing tests. The diagnosis may be slightly more challenging in adults, who are assessed for communication challenges, repetitive behaviors, and sensory issues. Early treatment is essential for ASD as it can assist the individual in dealing with aggression, hyperactivity, attention problems, depression, and anxiety. Proper medication and treatment programs can help these individuals learn communication, social, and language skills and reduce behaviors that interfere with their daily lives. Studies suggest that autism rates among children tripled from 2000 to 2016 in the New York metropolitan area [2]. In 2018, 1 in 44 children was diagnosed with ASD [3].

In the past, several studies have been carried out either to analyze autism in individuals or to provide a potential solution for combating the disorder. Eye-tracking movements can provide visual preference patterns to identify autism in individuals [4]. Medical and sociodemographic features can also be investigated for diagnosing patients with autism [5]. Neurological studies on white matter microstructural disintegrity have also contributed to understanding the disorder [6]. Along the same lines, several clinical studies and machine-learning techniques have been deployed to study autism in individuals [7,8]. While previous studies have analyzed individuals using several methods, they lack extensive analysis for toddlers and adults. This paper presents two deep-learning architectures for identifying potential signs of autism in toddlers and adults. The first architecture incorporates a combination of Convolutional Neural Networks (CNN) and Long Short-Term Memory (LSTM) integrated with Particle Swarm Optimization (PSO). The second architecture integrates CNNs with Gated Recurrent Units (GRU). This study also incorporates data visualization for both toddlers and adults, along with baseline models such as K-Nearest Neighbors (KNN), Logistic Regression (LR), Random Forests (RF), Decision Trees (DT), Support Vector Classifier (SVC), Multilayer Perceptron (MLP), Artificial Neural Networks (ANN), CNN, LSTM, GRU, and CNN-LSTM. The performance of all these models has been evaluated on both datasets, i.e., for toddlers and adults, using accuracy, precision, recall, and F-1 score. To the best of the author's knowledge, this is the first study to introduce a CNN-LSTM-PSO-based architecture. The bio-inspired algorithm has been incorporated into the analysis to find the optimal solution. Because bias affects ethical decision-making, this study deploys multiple Artificial Intelligence (AI) techniques to ensure the accurate performance of the models. Extensive feature engineering and techniques like SMOTE (Synthetic Minority Oversampling Technique) for handling imbalanced data are also part of this study.

The rest of the paper is organized as follows. The Materials and Methods section covers the related works and methodology; it also discusses the architecture of the proposed deep learning methods, followed by the AI fairness techniques. The Experimental Analysis section describes the datasets used and the metrics considered for evaluation. The results and observations are then discussed, including a comparative analysis of the proposed work with others. Finally, the study is concluded.

Kohli et al. (2022) [9] conducted a study on using machine-learning techniques along with social visual attention for assessing autism in children. The study incorporates biomarkers such as eye movements toward social stimuli to facilitate early diagnosis. Other body movements, neural correlates, electrodermal activity, and genomes may also be used as biomarkers. Statistical approaches like explanatory and predictive strategies may be used effectively to analyze autism in individuals. The study asserts that Support Vector Machines (SVM) and neural networks are excellent machine-learning tools for ASD classification; however, SVM performs better in terms of performance and cost-effectiveness. Similarly, Lau et al. (2022) [10] suggested eye-tracking-based diagnosis using machine-learning models for detecting ASD. The study scans the eyes' path to extract projection points for analyzing behavior among children. The analysis uses feedforward networks (FFN), ANNs, and pre-trained CNNs (GoogleNet and ResNet-18). The study also introduces hybrid models combining GoogleNet with SVMs and ResNet-18 with SVMs. The overall accuracy achieved by these hybrid models is 95.5% and 94.5%, respectively. Liao et al.
(2022) [11] presented another study using voice markers with machine learning to detect autism in individuals. The analysis incorporates novel cross-linguistic datasets along with a pipeline for minimizing overfitting. Multiple analyses are conducted on the participants in terms of tasks and languages. ML models are observed to be efficient in identifying autism from voices. The study's main aim is to assess the generalizability of the different models, and it is observed that the models do not generalize well to different tasks and new languages. Maenner et al. (2021) [12] performed an extensive analysis of ASD using Applied Behavior Analysis (ABA). This gold-standard treatment has been personalized using 29 participants for the clinical study. The patient similarity method and collaborative filtering techniques have been applied to the data. The study observed an average accuracy of 81-84% and a normalized discounted cumulative gain of 79-81%. The efficiency of the models has been validated using the percentage of recommended goals, and the treatment recommendation is generalizable to other methods. Lau et al. (2022) [10] analyzed ASD using a supervised machine-learning analytic approach. Prosodic differences in ASD are prevalently seen in multiple languages due to cross-linguistic variability. The analysis considers acoustic features and intonational aspects of prosody in English and Cantonese. The ML models achieve efficient ASD classification with rhythm-relative features. However, the relevant intonational features did not show efficient classification for Cantonese. The study asserts that rhythm is a key prosodic feature, and intonation can lead to variation in other prosodic properties.

Mellema et al. (2022) [13] proposed machine-learning techniques for identifying autism in children using physiological and behavioral data. Using feature extraction techniques on electroencephalography (EEG), eye fixation, and facial data, ASD can be detected in children with improved efficiency and reduced costs. The study uses a weighted naive Bayes classification technique for multimodal data fusion and achieves an accuracy of 87.50%. The study also incorporates confusion matrices and graphs to depict how eye movements, facial expressions, and EEG may have varying discriminative powers for ASD detection. Minissi et al. (2022) [14] suggested the application of ML algorithms to neuroimaging features from structural and functional magnetic resonance imaging (MRI). This could be advantageous in comprehending brain alterations and characteristics of ASD. The study investigated twelve machine-learning algorithms separately trained on combinations of different MRI features, followed by optimization. The model achieves 80% area under the precision-recall curve on the IMaging-PsychiAtry Challenge (IMPAC) dataset and 86% and 79% under the precision-recall curve on other datasets. The models were also successful in identifying biomarkers for ASD diagnosis. Peketi and Dhok (2023) [15] studied detecting autism in young children using magnetoencephalography signals and machine-learning techniques. The study, which aims to find biomarkers for ASD detection, performed a clinical study on thirty children watching cartoons. The features considered for ML modeling are neural oscillations and phase, inferred from the power spectral density (PSD) and preferred phase angle (PPA). The classification accuracy was 88% for PPA features and 82% for PSD features. The analysis also combines PSD and PPA features to achieve 94% (feature level) and 98% (score level) accuracy, respectively. Rabbi et al.
(2023) [16] considered two different datasets based on genetic and personal characteristics for diagnosing autism. The study was conducted using optimized feature selection methods for handling high-dimensional data. The classification process has been enhanced by deploying sixteen models and optimization techniques. The study also incorporates four bio-inspired ML optimization algorithms: Artificial Bee Colony, Grey Wolf Optimization, Bat Algorithm, and Flower Pollination Algorithm. The evaluation metrics considered for the study are precision, accuracy, recall, F-1 score, and the area under the curve (AUC). The experimental analysis reports an accuracy of 99.96% using the grey wolf optimizer with SVM. Rybner et al. (2022) [17] suggested machine-learning techniques for early-stage detection of ASD at multiple age levels. The study incorporates data for toddlers, children, adolescents, and adults, uses feature selection techniques, and evaluates the performance using accuracy, kappa score, and F-1 measure. It was observed that SVMs performed better than other classifiers, with an accuracy of 97.82% for toddlers, 95.87% for adolescents, and 96.82% for adults, respectively. The Shapley Additive Explanations (SHAP) method has also been applied in multiple studies for performing feature-based analysis [4,18].

Deng et al. (2022) [19] introduced a spatial-temporal transformer for identifying ASD using time-series functional magnetic resonance imaging (fMRI). The study details the use of a linear spatial-temporal multi-headed attention architecture for obtaining a spatial and temporal representation of the data. A Gaussian method has been deployed for tackling the imbalance problem. The study validates the robustness of the model on two independent datasets; however, the overall accuracy achieved is much lower than that of state-of-the-art methods. Moreover, transformers suffer from high computational demand, complexity, and long training times. Cao et al. (2023) [20] suggested using a Vision Transformer for analyzing pediatric ASD. The study uses a large facial dataset for model structure transferability and introduces a Gaussian Process layer for enhancing the robustness of the learning. The model reaches a highest accuracy of 94.50% but suffers from model complexity and long training times.

Methodology
Convolutional Neural Networks (CNNs) are used to recognize patterns in data. The CNN architecture incorporates multiple layers, such as the convolutional layer, the max-pooling layer, and the fully connected layer, each of which performs a specific task. The convolutional layer includes filters and generates feature maps from the input features by performing the convolution operation, $f = V(x * w_f + y_f)$, where $f$ represents the feature maps, $w_f$ the kernel weight vectors, $x$ the input features, and $y_f$ the bias. $V$ denotes the activation function, and $*$ denotes the convolution operation.

Long Short-Term Memory (LSTM) was introduced to solve the problem of vanishing and exploding gradients commonly seen in recurrent neural networks. LSTM includes the input, output, and forget gates, together with the hidden states of the previous and current timestamps. The gates regulate what information must be added or removed using pointwise multiplication and the sigmoid function.

The proposed architecture (Figure 1) combines CNN and LSTM components with Particle Swarm Optimization to accurately detect whether an individual has ASD. While CNN and LSTM are deployed to capture complex features and patterns from the provided data, PSO is used for optimizing the results to achieve better model performance. Dropout has been introduced during training to mitigate overfitting: once the CNN and LSTM layers are deployed, dropout is applied to enhance the model's generalization ability. The fully connected layer provides the overall output using the softmax activation function. PSO can determine the architecture and parameters of the hybrid deep learning network. It can simultaneously optimize the number of CNN and LSTM layers along with the number of units and epochs. In the PSO algorithm, every particle in the swarm carries five variables representing the network's hyperparameters. The swarm represents the set of candidate solutions, and each variable in a particle is identified by its position in the search space of the corresponding hyperparameter. The search space is initialized with random numbers for the network layers, filters, and epochs, and the search ranges for the hyperparameters are chosen with computational time and efficiency in mind. The fitness function, the Root Mean Square Error (RMSE), is calculated for the network built from a particle's hyperparameters. The velocities and positions of the particles are updated iteratively, and the algorithm determines the hyperparameter values that minimize the fitness function. All three algorithms therefore work together for the overall network to be efficient: CNNs capture complex and non-linear relationships in the data, LSTMs learn the long-term dependencies among features, and PSO deploys an efficient global search to find the global minimum of the fitness function to ensure better model performance (Figure 2). Table 1 depicts the hyperparameter setting for the CNN-LSTM-PSO network.

The proposed GRU-CNN hybrid neural network architecture combines GRU and CNN networks. Neural networks are prone to issues like short-term memory, vanishing gradients, and exploding gradients. During back-propagation, the weights are updated; however, if the gradient value becomes extremely small, it contributes little to learning. GRU addresses this issue by using reset gate and update gate operations. The reset gate typically works as a barrier, while the update gate concerns keeping or discarding data. The reset gate also decides the amount of information to be retained. The architecture comprises several blocks, i.e., five convolutional blocks, one GRU block, and one fully connected block. Each convolutional block incorporates two convolutional layers, with a max-pooling layer following the last convolutional layer, and a flatten operation. After the max-pooling operation, the data is fed to the flatten layer. The number of parameters and filters varies for every block, and the output from the last layer is fed to the GRU. The GRU captures long-term dependencies and can learn from previous data through the memory cell. The output from this layer is forwarded to the fully connected block, which incorporates convolutional layers with a softmax function. This is used for performing the classification (Figure 3).
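The reset and update gates described above can be illustrated with a single GRU time step. The paper does not state the gate equations, so the sketch below assumes the commonly used GRU formulation; the function names `gru_step` and `init_params`, the layer sizes, and the random weights are hypothetical and are not taken from the proposed GRU-CNN model.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gru_step(x, h_prev, params):
    """One GRU time step using the commonly used formulation (an assumption)."""
    Wz, Uz, bz, Wr, Ur, br, Wh, Uh, bh = params
    z = sigmoid(Wz @ x + Uz @ h_prev + bz)               # update gate: keep vs. overwrite
    r = sigmoid(Wr @ x + Ur @ h_prev + br)               # reset gate: how much past state to use
    h_cand = np.tanh(Wh @ x + Uh @ (r * h_prev) + bh)    # candidate hidden state
    return (1.0 - z) * h_prev + z * h_cand               # blend old state and candidate

def init_params(n_in, n_hidden, rng, scale=0.1):
    """Small random weights for the update gate, reset gate, and candidate state."""
    params = []
    for _ in range(3):
        params += [scale * rng.normal(size=(n_hidden, n_in)),
                   scale * rng.normal(size=(n_hidden, n_hidden)),
                   np.zeros(n_hidden)]
    return params

# Run a short sequence of 5 feature vectors (4 features each) through the cell.
rng = np.random.default_rng(0)
params = init_params(n_in=4, n_hidden=3, rng=rng)
h = np.zeros(3)
for x in rng.normal(size=(5, 4)):
    h = gru_step(x, h, params)
```

Some libraries swap the roles of `z` and `1 - z` in the final blend; the two conventions are equivalent up to relabeling the gate.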

AI Fairness Techniques
Algorithmic bias is a prevalent problem, and biased algorithms may lead to unfair and discriminatory decision-making. While bias cannot be completely eliminated, some methods can reduce it significantly. Since people create the models, there is always a risk of amplifying biases. Several AI fairness techniques may be applied to monitor outliers and apply statistical methods to the data to minimize bias. Fairness in AI systems can be encouraged at three stages: preprocessing the data, optimizing the models during training, and post-processing the algorithm's results. In this study, several techniques have been deployed for mitigating bias; some of the fair AI techniques used in classification are listed as follows:

• Optimization using PSO: Finding the optimal solution in a high-dimensional space primarily involves minimizing the cost function to reduce error. The heuristic model works as follows:
Step 1: Particles adjust their traveling velocity based on their own past data and that of the other particles in the group.
Step 2: Each particle considers the best result locally and globally.
Step 3: Each particle updates its position with respect to its current position, its current velocity, the distance between its current position and its local best position, and the distance between its current position and the global best position.
PSO is used extensively for concurrent processing and requires very few algorithm parameters. It is an efficient global search algorithm that is free of derivatives.
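The three steps above can be sketched as a small PSO loop. This is a minimal illustration under stated assumptions, not the configuration used in the study: the function name `pso_minimize`, the inertia and acceleration coefficients, the swarm size, and the toy quadratic cost are all hypothetical. In the paper's setting, the cost would instead be the RMSE of a CNN-LSTM network trained with the hyperparameters encoded by a particle.

```python
import numpy as np

def pso_minimize(cost, bounds, n_particles=30, n_iters=200,
                 inertia=0.7, c_local=1.5, c_global=1.5, seed=0):
    """Minimize `cost` over a box; coefficient values are illustrative assumptions."""
    rng = np.random.default_rng(seed)
    lo, hi = np.asarray(bounds, dtype=float).T
    dim = lo.size
    pos = rng.uniform(lo, hi, size=(n_particles, dim))   # random initial positions
    vel = np.zeros_like(pos)
    best_pos = pos.copy()                                 # each particle's local best
    best_cost = np.array([cost(p) for p in pos])
    g = best_pos[np.argmin(best_cost)].copy()             # global best position

    for _ in range(n_iters):
        r1 = rng.random((n_particles, dim))
        r2 = rng.random((n_particles, dim))
        # Steps 1-2: velocity is pulled toward the local best and the global best.
        vel = inertia * vel + c_local * r1 * (best_pos - pos) + c_global * r2 * (g - pos)
        # Step 3: move each particle and keep it inside the search space.
        pos = np.clip(pos + vel, lo, hi)
        costs = np.array([cost(p) for p in pos])
        improved = costs < best_cost
        best_pos[improved] = pos[improved]
        best_cost[improved] = costs[improved]
        g = best_pos[np.argmin(best_cost)].copy()
    return g, float(best_cost.min())

# Toy stand-in for the fitness function: squared distance to a known optimum.
target = np.array([1.0, -2.0])
best, best_value = pso_minimize(lambda p: float(np.sum((p - target) ** 2)),
                                bounds=[(-5.0, 5.0), (-5.0, 5.0)])
```

No gradient of the cost is ever evaluated, which is what makes the method derivative-free and usable when the cost is the validation error of a trained network.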

• Optimization using Adam: The Adam optimizer improves on Stochastic Gradient Descent for updating network weights iteratively during the learning process. It is easy to implement and computationally efficient. It requires little tuning of the hyperparameters and does not need a lot of memory to run. Adam combines ideas from the Adaptive Gradient Algorithm (AdaGrad) and Root Mean Square Propagation (RMSProp). AdaGrad maintains a per-parameter learning rate to reduce error and improve overall performance. In contrast, RMSProp adapts the learning rates based on an average of the recent magnitudes of the gradients for each weight. Adam calculates exponential moving averages of the gradient and the squared gradient, i.e., estimates of its first and second moments. It also takes into account the parameters that control the decay rates of these averages (beta1 and beta2). Adam has several parameters:
Alpha: the step size or learning rate. Larger values lead to faster initial learning before the rate is updated, and smaller values lead to slower learning.
Beta1: the exponential decay rate for the first-moment estimates.
Beta2: the exponential decay rate for the second-moment estimates.
Epsilon: a very small number that prevents division by zero.

Apart from the techniques mentioned above that have been used in the analysis, some other fair AI techniques are as follows:

• Focal Loss: Focal loss, an extension of cross-entropy loss, is used for handling class imbalance problems by applying a modulating term to the cross-entropy loss. More weight is assigned to hard or easily misclassified examples, which could be images with noisy data, partial images, or background images. As the loss contribution from easy examples is reduced, training concentrates on correcting the misclassified images. It is a method of dynamically scaling the cross-entropy loss such that the scaling factor decays to zero as confidence in the correct class increases. The scaling factor hence down-weights the impact of easy examples while training, ensuring that the model is focused on hard examples;
• Focal Tversky Loss: Neural networks often deploy Tversky loss for image segmentation problems. It is a loss function that is used to handle an imbalance in images, specifically when the number of positive pixels is orders of magnitude smaller than the number of negative pixels. The function compares the predicted output of the neural network with the true output. For image segmentation problems, the segmented image acts as the predicted output, whereas the ground truth segmentation of the same image is the true output. Unlike other loss functions like binary cross-entropy loss and the Dice coefficient, the Tversky loss allows more control over the relationship between precision and recall. By adjusting the hyperparameters, it is possible to penalize false negatives more heavily than false positives, and vice versa. Focal Tversky loss is a generalized Tversky loss that applies the concept of focal loss to focus on hard cases with low probabilities.
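The two losses described above can be sketched for binary labels as follows. This is a minimal NumPy illustration, assuming the commonly used binary focal loss and a Tversky index weighted by `alpha` (false negatives) and `beta` (false positives); the function names and default values are hypothetical. Note that published variants of the focal Tversky loss differ in whether the exponent is written as `gamma` or `1/gamma`; the form below applies `gamma` directly.

```python
import numpy as np

def binary_focal_loss(y_true, p_pred, gamma=2.0, alpha=0.25, eps=1e-7):
    """Cross-entropy scaled by (1 - p_t)**gamma so that easy examples count less."""
    y = np.asarray(y_true, dtype=float)
    p = np.clip(np.asarray(p_pred, dtype=float), eps, 1.0 - eps)
    p_t = np.where(y == 1, p, 1.0 - p)               # probability given to the true class
    alpha_t = np.where(y == 1, alpha, 1.0 - alpha)   # class-balancing weight
    return float(np.mean(-alpha_t * (1.0 - p_t) ** gamma * np.log(p_t)))

def focal_tversky_loss(y_true, p_pred, alpha=0.7, beta=0.3, gamma=0.75, eps=1e-7):
    """Tversky index with separate penalties for false negatives and false positives."""
    y = np.asarray(y_true, dtype=float)
    p = np.asarray(p_pred, dtype=float)
    tp = np.sum(y * p)
    fn = np.sum(y * (1.0 - p))
    fp = np.sum((1.0 - y) * p)
    tversky = (tp + eps) / (tp + alpha * fn + beta * fp + eps)
    return float((1.0 - tversky) ** gamma)

# A confidently correct prediction contributes far less than an uncertain one.
easy = binary_focal_loss([1], [0.95])
hard = binary_focal_loss([1], [0.30])
```

With `alpha > beta`, a missed positive lowers the Tversky index more than a spurious positive does, which corresponds to the precision-recall control mentioned above.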

Experimental Analysis
In this section, the datasets and the evaluation metrics used in this study have been discussed in detail.

Datasets
This study aims at detecting ASD in toddlers and adults using deep learning techniques. For the analysis, two individual datasets have been considered, i.e., Autism Screen Data for toddlers and Autism Screening in adults. Both datasets have been taken from Kaggle. The Autism Screen data for toddlers incorporates influential features for determining autistic traits. Ten behavioral features and other characteristics make up the dimensions of the dataset. The behavioral attributes are labeled A1 to A10, and possible answers to the questions, such as 'Always,' 'Usually,' 'Sometimes,' and 'Never,' are mapped to the values 1 and 0. A user obtaining more than 3 points exhibits ASD traits. The Autism Screen data for toddlers incorporates 1054 rows and 19 columns, while the Autism Screening data for adults incorporates 704 rows and 21 columns, and the features include the scores (A1-A10) followed by age, gender, ethnicity, jaundice, country of residence, etc. Some common features of both datasets are age, gender, ethnicity, jaundice, etc. The baseline and the proposed deep neural networks are applied to both datasets to analyze ASD in toddlers and adults. Following data preprocessing, the datasets have been split 80-20 into training and test sets.

For analyzing the datasets, Python 3.8 has been used along with TensorFlow-GPU (Graphics Processing Unit). Jupyter Notebook has been used with multiple Python libraries such as Pandas, NumPy, and Scikit-Learn. The processor used for the analysis is the AMD (Advanced Micro Devices) Ryzen 7 4800H with Radeon Graphics (2.90 Gigahertz). The installed RAM (Random Access Memory) is 64 gigabytes. The models have been compiled using the Adam optimizer. The learning rate was set to 0.0001, and binary cross-entropy was used as the loss function. The network has been trained with a batch size of 10. Figure 4 depicts the features associated with both datasets after the duplicates were dropped and missing values were removed.
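To make the optimizer setting above concrete, a single Adam update can be sketched as follows. The learning rate of 0.0001 is the value reported in this section; `beta1`, `beta2`, and `eps` are not stated in the paper, so the commonly used defaults are assumed here, and the function name `adam_step` is hypothetical. In practice the update is performed internally by the deep learning framework.

```python
import numpy as np

def adam_step(w, grad, m, v, t, alpha=1e-4, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update; beta1, beta2, and eps are assumed default values."""
    m = beta1 * m + (1.0 - beta1) * grad          # moving average of the gradient
    v = beta2 * v + (1.0 - beta2) * grad ** 2     # moving average of the squared gradient
    m_hat = m / (1.0 - beta1 ** t)                # bias correction for early steps
    v_hat = v / (1.0 - beta2 ** t)
    w = w - alpha * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v

# First update (t = 1) on two weights with gradients of different magnitude.
w0 = np.array([1.0, -1.0])
g0 = np.array([2.0, -0.5])
w1, m1, v1 = adam_step(w0, g0, m=np.zeros(2), v=np.zeros(2), t=1)
```

On the first step the bias-corrected ratio reduces to roughly the sign of the gradient, so each weight moves by about `alpha` regardless of the gradient's scale. This illustrates the per-parameter step-size adaptation.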

Evaluation Metrics
The following evaluation metrics were used for this study.
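The metrics named in the Introduction (accuracy, precision, recall, and F-1 score) can be computed from the confusion-matrix counts of a binary classifier. The sketch below uses their standard definitions; the function name `classification_metrics` and the example labels are hypothetical and are not results from this study.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F-1 score for binary labels (1 = ASD traits)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)   # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)   # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)   # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)   # false negatives

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Illustrative labels only.
scores = classification_metrics([1, 1, 1, 0, 0, 0, 1, 0],
                                [1, 1, 0, 0, 0, 1, 1, 0])
```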

Results and Observations
This section incorporates three subsections. The first subsection discusses data visualization with respect to the datasets, while the second subsection discusses the results based on the machine-learning models deployed for this study. The third subsection provides a comparative analysis of the proposed work with some related works conducted in the past.

• Data distribution for ethnicity
The data distribution for both toddlers and adults regarding ethnicity has been analyzed using data visualization (Figure 5). Among toddlers, the highest number of cases were White Europeans, followed by Asians and Middle Easterners. The fewest cases belonged to Native Indians, Pacifica, and Mixed Races. Among adults, the maximum number of cases were White Europeans, and the minimum number of cases were Turkish and others.

• Data distribution showing ethnicity vs. ASD traits
Regarding the number of cases depicting ASD traits, in toddlers, the count was highest for White Europeans and Asians (Figure 6). In the case of adults, there is a significant number of White Europeans, Asians, and Middle Easterners who do not show ASD traits.

• Data distribution showing ethnicity vs. gender
It was also observed that among toddlers, more males than females showed ASD traits (Figure 7). In most ethnicity distributions, the number of males is higher. Among adults, more females are observed among White Europeans, Black people, and South Asians. Other ethnicities, like Asians and Middle Easterners, have more males.

• ASD cases with jaundice
This study also analyzed ASD cases with respect to jaundice (Figure 8). Among toddlers who had jaundice, a higher number of cases exhibited ASD traits. Similarly, among toddlers who did not have jaundice, a significant number of individuals displayed ASD traits. Among adults, most individuals who did not test positive for jaundice do not show ASD traits, and most of those who tested positive for jaundice also do not display ASD traits.

• Age distribution
Another important factor considered for this study is age. It was observed that, in the case of toddlers, most cases are above the age of 36 months, i.e., 3 years, while in adults, most of the cases fall between 20 and 30 years of age (Figure 9). Further, the number of cases decreases significantly as age increases. This suggests that, for toddlers, the signs are exhibited around 3 years, while for adults, strategies, therapies, and treatments may help individuals improve with age.

Figure 11 depicts the gender-based distribution for ASD cases in adults. We observe that the number of males showing characteristics of ASD is slightly higher than the number of females considered for this study. The distribution shows 367 males and 337 females possessing ASD characteristics.

Results
In this section, the results are presented based on the experimental analysis and model training for both datasets. Table 3 exhibits the model performance for ASD screening in toddlers, while Table 4 exhibits the model performance for ASD screening in adults.

Based on the performance evaluations for the first dataset (toddlers), it was observed that the proposed architecture CNN-LSTM-PSO performs relatively better than the other models. The next best performance is exhibited by CNN, equipped with ReLU and Adam as the activation function and optimizer, respectively. CNN-LSTM and GRU-CNN have similar performances. The models with the lowest accuracies are KNN, LR, and SVC. Although the proposed models achieve high accuracy, the execution time indicates that they take more time to train. Figure 12 depicts the comparison of the models visually.

Based on the performance evaluations for the second dataset (adults, Table 4), it was observed that the proposed architecture CNN-LSTM-PSO again performs relatively better than the other models. The next best performance is exhibited by CNN, equipped with ReLU and Adam as the activation function and optimizer, respectively. CNN-LSTM and GRU have similar performances. The models with the lowest accuracies are KNN, DT, and SVC. As with the first dataset, although the proposed models achieve high accuracy, the execution time indicates that they take more time to train. Figure 13 depicts the comparison of the models visually.

Comparative Analysis
In this section, the proposed work is compared with some previous related works. Table 5 summarizes the overall comparison with results.

Based on the overall study, several observations can be made. The overall contributions of the conducted study are as follows:
a. In both datasets, i.e., toddlers and adults, the maximum number of cases belonged to the White European ethnicity;
b. In the case of toddlers, there are more male cases of ASD than female cases. In the case of adults, there are more female cases;
c. Jaundice does not appear to have any significant impact on ASD, as the results from the two datasets are not consistent;
d. Most ASD cases in toddlers are seen around three years, i.e., thirty-six months. Likewise, most ASD cases in adults fall between 20 and 30 years;
e. This study proposed two deep neural network architectures, i.e., CNN-LSTM-PSO and GRU-CNN, for ASD detection in toddlers and adults (multiple datasets). While both models display satisfactory results, CNN-LSTM-PSO performs relatively better than the other machine-learning algorithms;
f. This study considers thirteen algorithms, including traditional ML algorithms, ensembles, and neural networks, performing extensive analysis on ASD;
g. The study evaluated all the algorithms using multiple evaluation parameters, i.e., Accuracy, Precision, Recall, and F-1 Score;
h. The optimization techniques considered for the study are Particle Swarm Optimization (combined with CNN-LSTM) and Adam (neural networks). The techniques are useful in minimizing the cost function, thereby improving model accuracy;
i. The study deploys several AI fairness methods to eliminate bias. The bias elimination techniques proposed in this study are feature engineering techniques like handling outliers, transformations, and scaling. Apart from that, the SMOTE technique has been included to solve the class imbalance problem. Finally, we have deployed optimization methods for achieving efficient performance. The data obtained from two independent datasets has been thoroughly analyzed for bias elimination using feature engineering methods.
• Suspected outliers can be dropped if they are caused by human error or data processing errors;
• Similarly, properly formatted and validated data leads to improved data quality. It is also essential for protecting applications from inconsistencies like null values, duplicates, incorrect indexing, and incompatible formats, thereby eliminating bias through transformation;
• Moreover, scaling the features normalizes the range of values of the independent variables. Standardizing data with different scales can eliminate inconsistencies and bias in measuring the same data characteristics;
• SMOTE has been deployed to handle the data imbalance problem for both the datasets considered for this study;
• Finally, we see that optimization yields better accuracy and overall performance of the suggested models.
As the performance of the models improves with the suggested techniques, the observed data points lie closer to the predicted values, thereby reducing the error and improving the overall accuracy. Hence, it is reasonable to assume that the bias elimination techniques proposed in this study reduce the deviation, resulting in more robust and efficient learning of the proposed deep neural networks.
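The preprocessing steps listed above (outlier capping, scaling, and SMOTE-style oversampling) can be illustrated with a minimal NumPy sketch. This is not the implementation used in the study: the function names, the 1.5 × IQR capping rule, the neighbor count, and the sample values are assumptions chosen only for illustration.

```python
import numpy as np

def cap_outliers(x, k=1.5):
    """Cap values outside [Q1 - k*IQR, Q3 + k*IQR] (one common capping rule; assumed here)."""
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1
    return np.clip(x, q1 - k * iqr, q3 + k * iqr)

def min_max_scale(x):
    """Rescale values to the range [0, 1]."""
    span = x.max() - x.min()
    return (x - x.min()) / span if span > 0 else np.zeros_like(x, dtype=float)

def standardize(x):
    """Shift and scale values to mean 0 and variance 1."""
    std = x.std()
    return (x - x.mean()) / std if std > 0 else np.zeros_like(x, dtype=float)

def smote_like_oversample(X_min, n_new, k=3, seed=0):
    """Create n_new synthetic minority samples by interpolating between a
    minority sample and one of its k nearest minority-class neighbors.
    Requires at least two minority samples."""
    rng = np.random.default_rng(seed)
    k = min(k, len(X_min) - 1)
    # Pairwise Euclidean distances within the minority class.
    dist = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)  # a sample is not its own neighbor
    neighbors = np.argsort(dist, axis=1)[:, :k]
    synthetic = np.empty((n_new, X_min.shape[1]))
    for i in range(n_new):
        j = rng.integers(len(X_min))            # pick a minority sample
        nb = X_min[rng.choice(neighbors[j])]    # pick one of its neighbors
        gap = rng.random()                      # interpolation factor in [0, 1)
        synthetic[i] = X_min[j] + gap * (nb - X_min[j])
    return synthetic

# Hypothetical values for illustration only (not taken from the study's datasets).
ages = np.array([12.0, 18.0, 24.0, 30.0, 36.0, 400.0])  # 400 mimics a data-entry error
capped = cap_outliers(ages)
scaled = min_max_scale(capped)
z = standardize(capped)

X_minority = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
X_new = smote_like_oversample(X_minority, n_new=6)
```

Because each synthetic point lies on the segment between two existing minority samples, the oversampled data stays within the region already occupied by the minority class rather than repeating existing rows.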

Conclusions
In this paper, machine-learning methods have been deployed to detect ASD in toddlers and adults. The analysis comprises data visualization to identify patterns in the data, followed by an extensive study using ML models. Thirteen machine-learning models have been analyzed over two datasets, and the model performance has been evaluated using multiple statistical parameters. This study introduces two novel deep learning architectures, i.e., the CNN-LSTM-PSO model and the GRU-CNN model. It is observed that optimization techniques lead to an improvement in model accuracy. Moreover, several AI fairness techniques have been introduced to eliminate bias from the overall system, including handling outliers, transformations, and SMOTE. Based on the experimental and comparative analyses, it is observed that the proposed study improves on existing studies.

Some limitations of the conducted research pertain to data availability, model complexity, and training time. Most of the datasets available for ASD research incorporate clinical data for adults and toddlers only. While we find enough data points to perform the analysis, the data might not represent the actual number of people diagnosed with the disorder. Hence, the study can be improved in the future by collecting more data. The models proposed for ASD identification in toddlers and adults are based on deep learning architectures incorporating multiple algorithms, layers, and techniques. Such complex models are prone to overfitting, which regularization techniques may mitigate; eliminating some layers may also reduce the model's size and further improve its performance. Due to the involvement of hybrid architectures, multiple layers, and optimization techniques, the training time is comparatively high for deep learning networks. This may be avoided by deploying transfer learning models or other hybrid deep learning models in the future.

Moreover, future work may introduce more hybrid deep neural networks and other optimization techniques like the Artificial Bee Colony, Grey Wolf Optimizer, and Bat algorithm. Studies show that autism is not confined to humans and may also be seen in animals like dogs and monkeys. Identifying the disorder using ML techniques may lead to a breakthrough in multiple fields.

Future Internet 2023, 15, x FOR PEER REVIEW

Figure 4. Datasets depicting features for Autism Screening in toddlers and adults.
a. Accuracy: Accuracy is the ratio of the sum of true positives and true negatives to all prediction outcomes.
Accuracy = (TP + TN)/(TP + TN + FP + FN)
b. Precision: The precision score is the ratio of correct positive predictions to all positive predictions. This positive predictive value measures quality and can indicate the success of prediction in the case of imbalanced classes.
Precision = TP/(FP + TP)
c. Recall: Recall is the ratio of correctly predicted positives to actual positives. It indicates how good the model is at identifying the actual positives in a dataset. It is also referred to as the true positive rate or sensitivity.
Recall = TP/(FN + TP)
d. F-1 Score: The F-1 score is a function of precision and recall, as it considers both for measuring the model performance. It is used when model optimization is the primary concern.
F-1 Score = (2 * Precision * Recall)/(Precision + Recall)
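As a worked illustration of the four definitions above, the following short Python sketch computes them from confusion-matrix counts. The function name `screening_metrics` and the counts are hypothetical and are not results from this study; the guards against division by zero are an added assumption.

```python
def screening_metrics(tp, tn, fp, fn):
    """Return accuracy, precision, recall, and F-1 score from confusion-matrix counts."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Hypothetical counts, for illustration only.
m = screening_metrics(tp=40, tn=45, fp=5, fn=10)
```

With these counts, accuracy is 85/100, precision is 40/45, recall is 40/50, and the F-1 score reduces to 2TP/(2TP + FP + FN) = 80/95.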

Figure 5. ASD Data distribution for toddlers and adults.

• Data distribution showing ethnicity vs. ASD traits.


Figure 7. Ethnicity vs. gender for toddlers and adults.

Figure 9. ASD cases with jaundice.

• Gender-based distribution
Figure 10 depicts the gender-based distribution for ASD cases in toddlers. We observe that the number of males showing characteristics of ASD is significantly higher than the number of females considered for this study. The distribution shows 735 males and 319 females possessing ASD characteristics.

Figure 10. Gender-based distribution for ASD cases in toddlers.

Figure 11. Gender-based distribution for ASD cases in adults.

Figure 12. Performance evaluation of ML models for ASD (toddlers).

Figure 13. Performance evaluation of ML models for ASD (adults).

Table 2 depicts the hyperparameter setting for the GRU-CNN network.

Table 2. Hyperparameter settings for the proposed GRU-CNN model.

• Handling outliers: Outliers are significantly high or low values in the dataset that can affect the classification problem. The main techniques for handling outliers are removal, replacing the values, capping, and discretization. Outliers may be removed from a distribution; however, when outliers occur across multiple variables, removal may discard a large portion of the data. Often, outliers can be treated as missing values and imputed using appropriate methods. Extreme values can also be capped, i.e., replaced with chosen maximum and minimum values;
• Transformations: Transformation techniques are applied to skewed data that do not exhibit a normal distribution. Popular techniques include logarithmic, square root, and Box-Cox transformations. The logarithmic transformation compresses large values and spreads out small ones;
• Scaling and normalization: Machine-learning algorithms are often sensitive to the scale of input values. Min-max scaling and standardization (variance scaling) are popular normalization methods. Min-max scaling rescales values to between 0 and 1, while standardization ensures that the distribution has a mean of 0 and a variance of 1. This is particularly useful when variables are on different scales and would, therefore, be treated differently. Normalization and standardization techniques ensure that the scale is modified to eliminate bias while retaining meaningful information;
• SMOTE: An imbalanced dataset can lead to flawed classification. The Synthetic Minority Oversampling Technique (SMOTE) oversamples minority classes to handle the class imbalance problem. Rather than duplicating the data points in the minority class, which adds no new information to the dataset, SMOTE synthesizes new data from existing data. The algorithm takes samples of the feature space for each target class and its nearest neighbors, and combines the features of the target class with the features of the neighbors to generate new data. The technique generates additional data and makes the samples more general by increasing the percentage of minority samples only;
• Optimization using PSO: This study uses Particle Swarm Optimization (PSO) as the primary optimization technique and as a component of the proposed hybrid neural network architecture. PSO is a stochastic optimization technique that mimics the social interaction and behavior of swarms. Each particle in the swarm is represented by positional coordinates, which are updated according to the best solution found.
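The particle update described in the last item can be sketched as follows. This is a generic, minimal PSO on a toy cost function, not the CNN-LSTM-PSO configuration of the study; the function name `pso_minimize`, the inertia and acceleration coefficients (`w`, `c1`, `c2`), the bounds, and the sphere cost function are all assumptions made for illustration.

```python
import numpy as np

def pso_minimize(cost, dim, n_particles=20, n_iters=100,
                 w=0.7, c1=1.5, c2=1.5, bounds=(-5.0, 5.0), seed=0):
    """Minimize `cost` with a basic global-best particle swarm."""
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    pos = rng.uniform(lo, hi, size=(n_particles, dim))  # particle positions
    vel = np.zeros((n_particles, dim))                  # particle velocities
    pbest = pos.copy()                                  # best position seen by each particle
    pbest_cost = np.array([cost(p) for p in pos])
    best = int(np.argmin(pbest_cost))
    gbest, gbest_cost = pbest[best].copy(), pbest_cost[best]  # best position seen by the swarm

    for _ in range(n_iters):
        r1 = rng.random((n_particles, dim))
        r2 = rng.random((n_particles, dim))
        # Velocity blends inertia, pull toward personal best, and pull toward swarm best.
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos = np.clip(pos + vel, lo, hi)
        costs = np.array([cost(p) for p in pos])
        improved = costs < pbest_cost
        pbest[improved] = pos[improved]
        pbest_cost[improved] = costs[improved]
        best = int(np.argmin(pbest_cost))
        if pbest_cost[best] < gbest_cost:
            gbest, gbest_cost = pbest[best].copy(), pbest_cost[best]
    return gbest, gbest_cost

# Toy cost function (sum of squares) standing in for a model's validation loss.
best_pos, best_cost = pso_minimize(lambda p: float(np.sum(p ** 2)), dim=2)
```

In a hyperparameter-tuning setting, each particle's coordinates would encode candidate settings and `cost` would train and evaluate a model, which is one reason PSO-based training takes longer than a single training run.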

Table 3. Performance evaluation of ML models (ASD in toddlers).

Table 4. Performance evaluation of ML models (ASD in adults).

Table 5. Comparative analysis of the proposed work with previous works.