Article

Improved Transfer-Learning-Based Facial Recognition Framework to Detect Autistic Children at an Early Stage

Tania Akter, Mohammad Hanif Ali, Md. Imran Khan, Md. Shahriare Satu, Md. Jamal Uddin, Salem A. Alyami, Sarwar Ali, AKM Azad and Mohammad Ali Moni

1 Department of Computer Science and Engineering, Jahangirnagar University, Savar, Dhaka 1342, Bangladesh
2 Department of Computer Science and Engineering, Gono Bishwabidyalay, Savar, Dhaka 1344, Bangladesh
3 Department of Management Information Systems, Noakhali Science and Technology University, Sonapur, Noakhali 3814, Bangladesh
4 Department of Computer Science and Engineering, Bangabandhu Sheikh Mujibur Rahman Science and Technology University, Gopalganj Town Road, Gopalgonj 8100, Bangladesh
5 Department of Mathematics and Statistics, Imam Mohammad Ibn Saud Islamic University, Riyadh 13318, Saudi Arabia
6 Department of Electrical and Electronics Engineering, University of Rajshahi, Rajshahi 6205, Bangladesh
7 School of Biotechnology and Biomolecular Sciences, University of New South Wales, Sydney, NSW 2052, Australia
8 WHO Collaborating Centre on eHealth, UNSW Digital Health, Faculty of Medicine, University of New South Wales, Sydney, NSW 2052, Australia
9 Healthy Aging Theme, Garvan Institute of Medical Research, Darlinghurst, NSW 2010, Australia
* Author to whom correspondence should be addressed.
Brain Sci. 2021, 11(6), 734; https://doi.org/10.3390/brainsci11060734
Submission received: 2 April 2021 / Revised: 24 May 2021 / Accepted: 24 May 2021 / Published: 31 May 2021
(This article belongs to the Special Issue Understanding Autism Spectrum Disorder)

Abstract

Autism spectrum disorder (ASD) is a complex neurodevelopmental disorder that affects social skills, language, speech and communication. Early detection of individuals with ASD, especially children, could help devise the right therapeutic plan at the right time. Human faces encode important markers that can be used to identify ASD by analyzing facial features, eye contact, and so on. In this work, an improved transfer-learning-based autism face recognition framework is proposed to identify children with ASD at an early stage more precisely. To this end, we collected face images of children with and without ASD from the Kaggle data repository and applied various machine learning and deep learning classifiers as well as transfer-learning-based pre-trained models. We observed that our improved MobileNet-V1 model achieves the best accuracy of 90.67%, with the lowest fall-out and miss rates (both 9.33%), compared to the other classifiers and pre-trained models. Furthermore, this classifier was used to identify different ASD groups by investigating only the autism image data with the k-means clustering technique. In this setting, the improved MobileNet-V1 model showed the highest accuracy (92.10%) for k = 2 autism sub-types. We hope this model will help physicians detect autistic children more reliably at an early stage.

1. Introduction

Autism is a complex developmental disability that affects the brain's ability to process information. It is a neurological disorder characterized by weak social interaction, lack of eye contact and restricted, repetitive and stereotyped behaviors [1]. Although no treatment has yet been found that cures autism, it can be managed through early intervention and continued therapies [2]. Every autistic individual therefore needs to receive proper care, support and attention, which makes early detection particularly important. An early diagnosis may help to mitigate these symptoms and to prevent disruption of the child's development and other psychological illnesses. Nevertheless, it is very difficult to detect and diagnose ASD given its complex nature. Numerous studies have explored significant features of autism in various ways, such as feature extraction [3], eye tracking [4], facial recognition [5,6,7], medical image analysis [8], app development [9], voice recognition [10] and so on. Among them, facial recognition plays an important role in recognizing a person's identity or emotional state and can be used to detect autism effectively. It is a popular way to analyze human faces and extract distinguishing features between typical and atypical faces, including mining relevant information to reveal behavior patterns [11,12].
Given advances in the predictive analytics of facial pattern recognition, some in-depth efforts are currently underway in the space of autistic children's data analysis to detect ASD earlier. To automate the detection of facial expressions associated with various neurological disorders, Yolcu et al. [13] proposed a convolutional neural network (CNN) approach that integrated part-based and holistic information, where the first CNN was trained to segment important facial components and the second one was used to recognize facial expressions. They later improved their system with four CNN models [7], where the first three CNNs partitioned the facial components, i.e., eyebrow-segmented, eye-segmented and mouth-segmented images, and the fourth CNN processed the final iconized images to recognize facial expressions of various neurological disorders. In 2018, Haque and Valles [5] modified Kaggle's Facial Expression Recognition 2013 (FER2013) dataset of young autistic children with lighting effects (darker or lighter shades of contrast) and recognized human facial expressions using a deep CNN. Rudovic et al. [6] proposed a deep learning model called CultureNet that investigated video data of 30 children of different cultural backgrounds in the context of automated engagement estimation using child-independent and child-dependent settings. Joseph et al. [14] presented a real-time emotion recognition method that predicted the emotions of autistic children undergoing Robot-Assisted Therapy (RAT) using a Raspberry Pi 3 and deep learning techniques.
In this study, we propose a transfer-learning-based face recognition framework to detect autistic children more precisely. To do so, we collected a range of facial images of typically developing and autistic children and analyzed them with different machine learning, deep learning and improved pre-trained models. The performance of these classifiers was then evaluated with various measures, i.e., accuracy, AUC, f-measure, g-mean, sensitivity, specificity, fall-out and miss rate. We then improved MobileNet-V1, which shows the highest accuracy in this predictive analysis. In summary, our study is fundamentally an image classification task, where a well-trained (transfer-learning-based) classification model can detect autism given an input image of a child. With the advent of high-spec mobile devices, this type of model could readily provide a preliminary screen for putative autistic traits from an image taken with the device camera. The contributions of this work are summarized as follows:
  • We propose an improved transfer-learning-based facial recognition framework that has the potential of yielding high accuracy to identify autistic children.
  • We explore an improved MobileNet model that shows the best performance among various standard machine learning, deep learning and related pre-trained models.
  • We focus on a less explored area of image processing and recognition, i.e., identifying ASD-affected children from facial images of normal and autistic children.
  • We implement and validate a range of different machine learning and deep learning models.
  • We identify relevant clusters of autistic children.
This paper is organized as follows: Section 2 describes the improved transfer-learning-based autism facial recognition framework, providing insight into how the images of typically developing and autistic children are investigated; it then outlines the primary facial image dataset and the proposed framework along with the baseline classifiers, the improved MobileNet-V1 and the k-means clustering algorithm. Section 3 provides the experimental findings of the different classifiers including the improved MobileNet-V1, and Section 4 discusses the performance validation of the proposed framework and its implications. Finally, Section 5 concludes and offers future research directions.

2. Materials and Methods

In this study, we propose a transfer-learning-based autism facial recognition framework to investigate ASD at an early stage. To that end, we employed machine learning and deep learning models, along with their pre-trained counterparts, that can automatically perform robust feature extraction, capturing features so subtle that they are near-impossible to detect by mere observation, and we performed classification with them. Moreover, computational analysis of facial scanning features of autistic and control faces served as the detector. Hence, the improved transfer-learning-based autism face recognition framework and the related machine learning methods are described in this section.

2.1. Dataset

We collected 2936 facial images of typically developing and autistic children from the Kaggle data repository, comprising 1468 images of autistic and 1468 of non-autistic children. This dataset was originally curated by Piosenka [15], and most of the images were downloaded from websites and Facebook pages related to autism. Note that, because the images were gathered from the web, many of them are not of the best quality or consistency with respect to facial alignment, perspective or image size. A Python script was therefore developed that automatically crops the images so that the input dimension of each facial image is 224 × 224 × 3. The collected dataset was then split into three groups, i.e., training, validation and test sets. The training set was used for model training and contained a total of 2536 (86.38%) images, with 1268 images each of autistic and non-autistic children. The validation set comprised 3.41% of the data, with 50 images each of autistic and non-autistic children. The test set was kept separate with 300 (10.22%) images, with 150 images each of autistic and non-autistic children. The primary facial dataset was inspected, and no major biases/obstacles (to the best of our knowledge and efforts) to further analysis were found.
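As a minimal illustration of this preprocessing step, the sketch below resizes web-sourced face photos to the 224 × 224 × 3 input size with Pillow; the directory names are hypothetical, and the original script's exact cropping logic (e.g., any face-detection step) is not described in the paper, so plain resizing is an assumption.

```python
# Hedged preprocessing sketch: force every image to RGB and 224 x 224 pixels.
# Directory names (raw_faces/, processed/) are illustrative, not from the paper.
from pathlib import Path
from PIL import Image

SRC, DST, SIZE = Path("raw_faces"), Path("processed"), (224, 224)

for img_path in SRC.rglob("*.jpg"):
    label = img_path.parent.name              # e.g. "autistic" or "non_autistic"
    out_dir = DST / label
    out_dir.mkdir(parents=True, exist_ok=True)
    img = Image.open(img_path).convert("RGB") # guarantee 3 channels
    img = img.resize(SIZE, Image.BILINEAR)    # 224 x 224 x 3 input for the models
    img.save(out_dir / img_path.name)
```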

2.2. Improved Transfer-Learning-Based Autism Facial Recognition Framework

The objective of this framework was to identify autism cases by analyzing facial patterns of children employing various machine learning methods. Figure 1 illustrates the schematic workflow diagram of this framework, which is described briefly as follows.
  • Data Acquisition: First, the primary facial image dataset was gathered and cleaned. It contains the training, validation and test sets used for further analysis.
  • Training Stage: The facial training dataset was then used to train the CNN, the improved pre-trained CNN models and the other machine learning classifiers. In this context, these classifiers are called baseline classifiers.
  • Performance on the Validation and Test Sets: During training, the various classifiers were applied to the validation set to assess the progress of the training process. The baseline classifiers were then applied to the test set to evaluate their performance. Notably, we considered the distribution of face scanning coordinates as discriminating features for autism classification.
  • Evaluation Metrics: In this evaluation process, different metrics such as accuracy, area under the curve (AUC), f-measure, g-mean, sensitivity, specificity, fall-out and miss rate were used to verify the results of the various classifiers.
  • Comparison and Evaluation: After evaluating the performance of the individual classifiers on the validation and test sets, the improved MobileNet-V1 showed the best results among all classifiers.
  • Clustering-based Autism Sub-typing: In previous studies [16,17,18], several sub-types were generated from patients with autism for further medical and neurological analysis. The motivation for clustering the autistic faces only was to explore different autism sub-types through quantitative analysis, helping to extract and investigate distinguishable, significant features of autism. This approach reduces data dimensionality and concentrates strong, significant features that can appear in multiple clusters [19]. Therefore, this procedure can lessen false-positive and false-negative results for autism in subsequent supervised learning tasks. To generate the clusters/groups from the working dataset, the k-means algorithm (see details in Section 2.5), which is widely employed in different fields of machine learning [16,20,21,22], was used. In this work, the algorithm was applied only to the autistic facial images (i.e., all autism images gathered from the training, validation and test sets) and generated autism sub-types in each iteration by changing the value of k from 2 to 10. These sub-types were then treated as individual class labels to further demonstrate the predictability of our selected improved MobileNet-V1 (i.e., the best performing model in classifying autistic/normal faces in our first experiment). Note that the problem then becomes a multi-class classification problem (depending on the value of k; e.g., k = 3 yields a 3-class classification and so on) rather than a simple binary classification problem (i.e., autistic/normal faces). Note also that this multi-class classification task only uses the autistic faces (i.e., no controls were used) for training, validating and testing the best performing improved MobileNet-V1. In addition, this classification followed 10-fold cross-validation, and the best sub-type configuration was selected based on the accuracy of the classifier.
To understand the overall structure of the proposed framework, it is necessary to know the details of its working methods. These details help to explain how the framework can detect autism more precisely and achieve the highest results with the improved model. Therefore, we outline the associated machine learning methods of this work in detail below.

2.3. Baseline Classifiers

In this work, many widely used classification algorithms were applied to the primary facial dataset. The proposed framework uses 17 classifiers, 10 of which are machine learning models [23,24]; the rest are deep and pre-trained transfer learning models. The machine learning classifiers used are AdaBoost [25], Decision Tree (DT) [26,27], Gradient Boosting (GB) [28], K-Nearest Neighbour (KNN) [29], Logistic Regression (LR) [30], Multi-layer Perceptron (MLP) [31], Naïve Bayes (NB) [32], Random Forest (RF) [33], Support Vector Machine (SVM) [34] and Extreme Gradient Boosting (XGB) [35]; the deep models are a Convolutional Neural Network (CNN) and the pre-trained CNNs DenseNet121 [36], ResNet50 [33], VGG16 [37], VGG19 [38], MobileNet-V1 [39] and MobileNet-V2 [40]. However, the results of the default pre-trained models were not promising; hence, we appended several additional layers to each of these models. In each transfer learning model, three batch normalization (BN) and two fully connected (FC) layers are appended one after another before the output layer to classify a facial image into the autism/normal group. The BN layers keep their default settings; the first FC layer has 128 neurons (dense) and the second FC layer has 16 neurons (dense). All of the classifiers used in this work are referred to as baseline classifiers.
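Because the paper does not state how the images were vectorized for the non-deep classifiers, the following scikit-learn sketch simply flattens each 224 × 224 × 3 array into a pixel vector before fitting a few of the listed baselines; the feature representation, hyper-parameters and helper names are assumptions for illustration only.

```python
# Hedged sketch of a few classical baselines; flattened pixels are an assumed
# feature representation, not necessarily the one used in the paper.
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

def flatten(images):
    # images: NumPy array of shape (n, 224, 224, 3), scaled to [0, 1]
    return images.reshape(len(images), -1)

baselines = {
    "AdaBoost": AdaBoostClassifier(random_state=0),
    "RF": RandomForestClassifier(n_estimators=200, random_state=0),
    "LR": LogisticRegression(max_iter=1000),
    "SVM": SVC(probability=True, random_state=0),
}

def evaluate_baselines(X_train, y_train, X_test, y_test):
    for name, clf in baselines.items():
        clf.fit(flatten(X_train), y_train)
        acc = accuracy_score(y_test, clf.predict(flatten(X_test)))
        print(f"{name}: accuracy = {acc:.4f}")
```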

2.4. Improved MobileNet-V1

The improved MobileNet-V1 model is an enhanced version of the traditional MobileNet-V1 (see Appendix A.1) with some additional layers, illustrated in detail in Figure 2. We augmented MobileNet-V1 with several additional layers to increase its performance. A batch normalization layer is used to normalize the output of the global average pooling layer by re-centering and re-scaling the input values. As in the other transfer learning models, three batch normalization (BN) and two fully connected (FC) layers are appended one after another before the output layer (see Figure 2). Regarding the dimensions, the primary facial images have an input dimension of 224 × 224 × 3 each, with an input filter dimension of 3 × 3 × 3 × 32 (for details, please see Howard et al. [39]). When the improved MobileNet-V1 is applied, the dimensions of the input images are reduced according to the regular MobileNet-V1 conversion based on the depthwise separable convolution operation (see Figure 2). Finally, it generates a one-dimensional output for the given input image, whose length depends on the number of classes to be predicted (i.e., binary or multi-class classification).
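A minimal Keras sketch of this architecture is given below. The interleaving of the three BN and two FC layers, the ReLU activations and the optimizer are our reading of the description and Figure 2, i.e., assumptions rather than the authors' exact configuration.

```python
# Hedged sketch of the improved MobileNet-V1: ImageNet-pre-trained base plus a small
# BN/dense head. Layer ordering, activations and optimizer are assumptions.
import tensorflow as tf
from tensorflow.keras import layers, models

def build_improved_mobilenet(num_classes=2, input_shape=(224, 224, 3)):
    base = tf.keras.applications.MobileNet(
        input_shape=input_shape, include_top=False, weights="imagenet")
    x = layers.GlobalAveragePooling2D()(base.output)
    x = layers.BatchNormalization()(x)           # BN 1: re-centre/re-scale the GAP output
    x = layers.Dense(128, activation="relu")(x)  # FC 1: 128 neurons
    x = layers.BatchNormalization()(x)           # BN 2
    x = layers.Dense(16, activation="relu")(x)   # FC 2: 16 neurons
    x = layers.BatchNormalization()(x)           # BN 3
    out = layers.Dense(num_classes, activation="softmax")(x)
    return models.Model(base.input, out)

model = build_improved_mobilenet()
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
```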

2.5. K-Means Clustering

K-means clustering is an unsupervised learning method that partitions an unlabeled dataset into a number of clusters. It is a centroid-based technique in which each cluster is associated with a centroid, and the algorithm minimizes the sum of the distances between the data points and their corresponding cluster centroids. First, the number of clusters and the initial centroids are specified. Each data point is then assigned to the closest centroid, the centroids are recomputed from the assigned points, and this process is repeated until the assignments no longer change.
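The sketch below shows how such sub-type labels can be generated from the autistic images; clustering the flattened pixel vectors is an assumption, since the paper does not specify which feature representation was fed to k-means.

```python
# Hedged sketch of sub-type generation: k-means on the autistic-face vectors for
# k = 2..10, reusing the cluster indices as multi-class labels.
from sklearn.cluster import KMeans

def make_subtype_labels(autistic_images, k_values=range(2, 11)):
    # autistic_images: NumPy array of shape (n, 224, 224, 3); flattening is an assumption
    X = autistic_images.reshape(len(autistic_images), -1)
    labels_per_k = {}
    for k in k_values:
        km = KMeans(n_clusters=k, n_init=10, random_state=0)
        labels_per_k[k] = km.fit_predict(X)   # cluster index = sub-type label
    return labels_per_k
```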

3. Experimental Results

To evaluate the performance of the baseline classifiers in the proposed framework, we implemented the machine learning classifiers (AdaBoost, DT, GB, KNN, LR, MLP, NB, RF, SVM and XGB) using scikit-learn, and the CNN and improved pre-trained CNN models (DenseNet121, ResNet50, VGG16, VGG19, MobileNet-V1 and MobileNet-V2) using the Keras library in Python [23,41,42]. All computations were carried out on Google Colaboratory [43]. The training set was used to train the individual classification models, and the validation set along with the test set was used to evaluate the classifiers' performance. Several evaluation metrics, i.e., accuracy, AUC, f-measure, g-mean, sensitivity, specificity, fall-out and miss rate, were calculated to measure the performance of these classifiers (see details of the metrics in Appendix A.2). The detailed experimental results for the validation and test sets are provided in Table 1 and Table 2, respectively.
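A hedged sketch of the training and evaluation loop follows; the directory names, batch size and epoch count are illustrative assumptions (they are not reported in the paper), and `model` refers to the improved MobileNet-V1 sketch from Section 2.4.

```python
# Hedged training/evaluation sketch; hyper-parameters and paths are illustrative only.
from tensorflow.keras.preprocessing.image import ImageDataGenerator

gen = ImageDataGenerator(rescale=1.0 / 255)
train = gen.flow_from_directory("processed/train", target_size=(224, 224), batch_size=32)
valid = gen.flow_from_directory("processed/valid", target_size=(224, 224), batch_size=32)
test = gen.flow_from_directory("processed/test", target_size=(224, 224),
                               batch_size=32, shuffle=False)

model.fit(train, validation_data=valid, epochs=20)   # model from the Section 2.4 sketch
print(model.evaluate(test, return_dict=True))        # accuracy on the held-out test set
```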

3.1. Evaluation of the Validation Set Results

On the validation set, the improved MobileNet-V1 model provides the highest accuracy, AUC, f-measure, g-mean, sensitivity and specificity, as shown in Table 1. It attains 83% on accuracy, AUC, f-measure, g-mean, sensitivity and specificity, with 17% fall-out and miss rates. ResNet50 shows the second-highest value (80%) on accuracy, AUC, f-measure, g-mean, sensitivity and specificity, whereas its fall-out and miss rates are 20%. The other classifiers produce results below 80% and above 60% on all evaluation metrics except fall-out and miss rate, whose values lie between 20% and 40%. Figure 3b shows the ROC curves of the individual classifiers for the validation set; in these curves, the improved MobileNet-V1 model outperforms the other classifiers. Altogether, this model can be recommended for classifying autistic and typically developing children from their faces more efficiently.

3.2. Evaluation of the Test Set Results

Table 1 and Table 2 show the results of the individual classifiers for the test set. Again, the improved MobileNet-V1 generates the best result (90.67%) on all evaluation metrics. In addition, this model provides the lowest fall-out and miss rates and outperforms the other classifiers. Among the remaining models, DenseNet121 demonstrates the second-best prediction, with 83.67% accuracy, AUC, f-measure, g-mean, sensitivity and specificity, and fall-out and miss rates of 16.33%. ResNet50 shows roughly 80% accuracy, AUC, f-measure, g-mean, sensitivity and specificity, with fall-out and miss rates below 20%. The other classifiers perform less well, with results below 80% and above 65%, along with error rates between 23% and 34%. Figure 3a shows the ROC curves of the individual classifiers for the test set, where the improved MobileNet-V1 model again achieves the highest result.
Therefore, the improved MobileNet-V1 shows the best outcomes among all general classifiers, the CNN and the improved pre-trained models. This can also be put in context using the accuracy of the individual base (traditional) pre-trained models on ImageNet [44] (see results in Table 3). Here, we list the top-1 and top-5 accuracy of the pre-trained models on ImageNet and the accuracy of the improved pre-trained models on the validation and test sets of the facial autism dataset. DenseNet121 shows the best accuracy on ImageNet, but it does not show the best results on the autism dataset. On the other hand, the improved MobileNet-V1 [39] achieves the best outcomes on the autism dataset. ImageNet is an image database organized according to the WordNet hierarchy that contains 1,281,167 training images and 50,000 validation images, organized into 1000 classes [44]. However, our autism facial dataset is quite small compared to the ImageNet database. Hence, transfer learning models that perform better on large-scale datasets do not show the same performance in this work. Instead, the improved MobileNet-V1 model is able to handle a small amount of data more precisely than the other pre-trained models; in particular, it provides more suitable results than the improved MobileNet-V2 in this work.
We also compared the results of the improved MobileNet-V1 with its base (traditional) MobileNet-V1 on the validation and test sets. Again, the improved MobileNet-V1 outperforms its base version on both sets (see Table 4).
The reason we performed the clustering on the autistic faces only was to identify different autism sub-types depending on the value of k in the k-means algorithm. These sub-types were then treated as individual "class labels" to further demonstrate the predictability (training and testing) of our selected improved MobileNet-V1 (i.e., the best performing model in classifying autistic/normal faces in our first experiment). In that light, the accuracy we refer to (for example, the best accuracy of 92.10% for k = 2) comes from the multi-class classification tasks we formulated on the improved MobileNet-V1 (applying 10-fold cross-validation), using each of the clusters (i.e., autism sub-types) as an individual class label. Note that this multi-class classification task only uses the autistic faces (i.e., no controls were used) for training, validating and testing the best-performing improved MobileNet-V1. Figure 4 shows the accuracy of the multi-class classification applying the improved MobileNet-V1 to only the autism sub-types for k = 2, 3, ..., 10.

4. Discussion

Previously, researchers have worked with various facial image datasets to recognize discrete characteristics using machine and deep learning. Yolcu et al. [7,13] inspected the Radboud Faces Database (RaFD), amalgamating four channels using a CNN cascade, and obtained 94.44% and 93.43% accuracy, respectively. This has been useful for monitoring and diagnosing various types of neurological disorders that influence facial expressions. However, RaFD is a set of pictures of controls, and how a neurological disorder can be analyzed from facial images of controls is not properly explained. In addition, autism is a special kind of neurological disorder that cannot be detected by applying the general estimation processes of other disorders. Haque and Valles [5] applied a deep CNN to the FER2013 dataset from Kaggle for recognizing facial emotion and achieved 63.11% accuracy. Jain et al. [45] utilized the FER2013 dataset from Kaggle with their suggested CNN-RNN+ReLU algorithm and obtained 94.46% accuracy. Joseph et al. [14] implemented the SqueezeNet algorithm on the Cohn–Kanade and Japanese Female Facial Expression datasets and obtained 75% accuracy; they tried to recognize emotions using a social robot and predict the behavior of ASD children. Nevertheless, all of these works used control facial images, which are not appropriate for detecting autism. Moreover, early detection of autistic and non-autistic children is not normally performed using their emotional activities, as far as our current study is concerned.
Although many research works have been conducted to detect ASD, there is still room for identifying and characterizing this condition more effectively [46]. In this work, we proposed a framework that investigates facial images (i.e., autism/normal images) of children employing various machine learning, deep learning and pre-trained transfer learning models. The improved MobileNet-V1 achieves the best predictive performance for the training, validation and test images on all evaluation metrics. After that, several autism sub-types were extracted with k-means clustering, and multi-class classification was performed using each of the clusters (i.e., autism sub-types) as an individual class label. Generally speaking, it is an extremely difficult task for any classification model to differentiate a true autistic child's face from an intentionally funny face. However, during training, convolutional neural network (ConvNet/CNN) models with substantially deep architectures (i.e., a large number of hidden layers) are able to scan the images and extract subtle features from the facial image grid, via a series of convolution, pooling, dropout and (optional) batch normalization operations, before finally flattening them with a fully connected layer followed by the output (prediction) layer. In this process, a deep ConvNet can learn dominant autistic traits (image features), some of which are difficult (or impossible) to mimic even with intentionally funny poses, including wide-set eyes, the groove below the nose and above the top lip, a wide forehead and a smaller mid-face [47], as single features or in combination. More specifically, we assume that the likelihood of these types of extracted features being present in combination within the false positives is much lower. However, we cannot completely rule out selection bias in the images used for training in the first place, although deep learning models are generally robust enough to address this issue if a huge amount of data (big data) is used for training. We therefore plan to augment our data with further high-quality datasets in future studies to improve our model's generalization capability. MobileNet is a CNN architecture for image classification and mobile vision that fits well with restricted resources such as mobile devices, embedded systems, computers without a GPU or with low computational capacity, and web browsers with limited computation, graphics processing and storage. In addition, it is much faster than regular convolution with approximately the same result. Therefore, the improved MobileNet-V1 provides more accurate performance than the other methods.
We argue that, like other successful applications of transfer learning models, our study can also facilitate highly accurate prediction of autistic traits from images of children, which is manifested in our results (i.e., the improved MobileNet-V1 model showed the highest accuracy (92.10%) for k = 2 autism sub-types). In that experiment, the multi-class classification task only uses the autistic faces (i.e., no controls were used) for training, validating and testing the best-performing improved MobileNet-V1, and the clustering was performed on the autistic faces only in order to identify different autism sub-types (i.e., depending on the value of k in the k-means algorithm).
However, this study is not fundamentally designed for emotional/behavioral face detection; it detects autism/non-autism from static images. Moreover, we have employed deep learning models that can automatically perform robust feature extraction, to an extent that is near-impossible to achieve by mere observation due to the subtlety of the features, and performed classification with them. We focused on categorizing autism by investigating static images; hence, video sequence analysis was not included in the experiment.
It is very important to detect autism and ensure that proper steps are taken at an early stage, and our proposed framework can play an important role in the medical sector by estimating such neurological disorders more quickly than existing systems. Moreover, transfer learning models are well suited to handheld devices such as smartphones and tablets. Hence, they can easily be incorporated into a mobile app, which would give further assistance to health workers or physicians identifying autism at an autism resource center. In addition, parents could use such an app to recognize autism quickly and more precisely at home. In this process, facial images of children could be taken with a smartphone camera and processed to identify cases more quickly than with any other procedure. Moreover, this model is simple, cost-effective and needs few resources to deploy on devices.
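As a hedged illustration of such on-device deployment, a trained Keras model can be exported to TensorFlow Lite as sketched below; the output file name is hypothetical and the quantization step is optional.

```python
# Hedged deployment sketch: convert the trained Keras model to TensorFlow Lite.
import tensorflow as tf

converter = tf.lite.TFLiteConverter.from_keras_model(model)  # model from Section 2.4 sketch
converter.optimizations = [tf.lite.Optimize.DEFAULT]         # optional weight quantization
with open("autism_screen.tflite", "wb") as f:                # hypothetical file name
    f.write(converter.convert())
```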

5. Conclusions and Future Works

In this work, we proposed a well-assembled transfer-learning-based autism face recognition framework, in which the improved MobileNet-V1 model shows the best result (83% accuracy on the validation set and 91% on the test set) among a range of state-of-the-art machine learning and deep learning models for recognizing control and autistic children from heterogeneous sources more accurately. The k-means clustering method was then applied to the autistic faces to fabricate various sub-types (i.e., depending on the value of k in the k-means algorithm), and the improved MobileNet-V1 achieved high accuracy (92.10%) for binary sub-types (i.e., k = 2) in this system. The proposed framework can play a significant role in early autism detection and can serve as a useful tool for physicians and health workers. It could also be used to detect autism with little or no training in a domestic environment. Some limitations of the proposed framework should be noted; for instance, only a limited number of facial images were used, and their quality is not always high. Moreover, this work does not incorporate activity recognition, video sequencing or 3D image analysis, which might yield more auspicious results. In addition, the improved MobileNet-V1 did not provide stable predictive performance for larger numbers of autistic sub-types. In the future, we will address these shortcomings by enhancing this model with more standard facial images and by adding dynamic recognition techniques (e.g., activity/motion picture/video sequence identification) to perceive autism more precisely. Moreover, we will further develop our improved MobileNet-V1 to achieve more stable predictive performance for a greater number of autistic sub-types (i.e., larger values of k in the k-means algorithm); i.e., we will strengthen the multi-class classification tasks to obtain more accurate predictive performance for defining autism sub-types.

Author Contributions

Conceptualization, T.A., M.H.A. and M.S.S.; methodology and software, M.I.K.; validation and formal analysis, M.H.A., M.S.S. and M.J.U.; resources, M.S.S.; data curation, M.I.K.; writing—original draft preparation, T.A. and M.S.S., writing—review and editing, A.A., S.A., S.A.A. and M.A.M.; visualization, M.I.K.; supervision, M.A.M.; funding acquisition, S.A.A. All authors have read and agreed to the published version of the manuscript.

Funding

No external funding has been received.

Institutional Review Board Statement

Not Applicable.

Informed Consent Statement

Not Applicable.

Data Availability Statement

The data used in this paper are available via the reference cited in Section 2.1.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

Appendix A.1. Traditional MobileNet-V1

Traditional MobileNet-V1 contains a core layer that performs depth-wise separable convolutions, used to reduce model size and complexity, followed by a point-wise convolution. A standard convolution layer takes a $D_F \times D_F \times M$ feature map $F$ as input and produces a $D_G \times D_G \times N$ feature map $G$; it is parameterized by a kernel $K$ of size $D_K \times D_K \times M \times N$. For a standard convolution with stride one and padding, the output feature map and its computational cost $Q$ can be calculated as:

$$G_{k,l,n} = \sum_{i,j,m} K_{i,j,m,n} \cdot F_{k+i-1,\, l+j-1,\, m}$$

$$Q = D_K \times D_K \times M \times N \times D_F \times D_F$$

where $D_F$, $D_K$ and $D_G$ are the spatial lengths (width and height) of the square input feature map, kernel and output feature map, respectively. In addition, $M$ is the number of input channels, $N$ is the number of output channels, $D_K \times D_K$ is the kernel size and $D_F \times D_F$ is the feature map size. A depth-wise separable convolution uses one filter per input channel, which can be defined as:

$$\tilde{G}_{k,l,m} = \sum_{i,j} \tilde{K}_{i,j,m} \cdot F_{k+i-1,\, l+j-1,\, m}$$

Here, $\tilde{K}$ is the depth-wise convolution kernel of size $D_K \times D_K \times M$: the $m$-th filter in $\tilde{K}$ is applied to the $m$-th channel of $F$ to produce the $m$-th channel of the filtered feature map $\tilde{G}$. A point-wise convolution, represented by a $1 \times 1$ convolution, then recombines these channels. Hence, the total computational cost $P$ of a depth-wise separable convolution is:

$$P = D_K \times D_K \times M \times D_F \times D_F + M \times N \times D_F \times D_F$$

MobileNet uses $3 \times 3$ depth-wise separable convolutions, which require 8 to 9 times less computation than standard convolutions. The reduction in cost is given by:

$$\frac{P}{Q} = \frac{D_K \times D_K \times M \times D_F \times D_F + M \times N \times D_F \times D_F}{D_K \times D_K \times M \times N \times D_F \times D_F} = \frac{1}{N} + \frac{1}{D_K^2}$$
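As a quick sanity check of this ratio, the short calculation below plugs in illustrative layer sizes (a 3 × 3 kernel with M = 32 input and N = 64 output channels on a 112 × 112 feature map); the numbers are examples, not values from the paper.

```python
# Worked check of P / Q = 1/N + 1/D_K^2 for a 3 x 3 depth-wise separable convolution.
D_K, D_F, M, N = 3, 112, 32, 64   # illustrative layer sizes

Q = D_K * D_K * M * N * D_F * D_F                   # standard convolution cost
P = D_K * D_K * M * D_F * D_F + M * N * D_F * D_F   # depth-wise separable cost
print(P / Q, 1 / N + 1 / D_K ** 2)                  # both ~0.1267, i.e. roughly 8x cheaper
```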
The MobileNet topology is shown in Figure A1, where batch normalization (BN) and a rectified linear unit (ReLU) are applied sequentially after each convolution layer except the final fully connected linear layer.
Figure A1. A standard convolution (left panel) and a depth-wise separable convolution (right panel).
This structure uses two model-shrinking hyper-parameters, namely the width multiplier $\alpha$ and the resolution multiplier $\rho$. The role of $\alpha$ is to thin the network uniformly at each layer. The cost of a depth-wise separable convolution with $\alpha$ is:

$$D_K \times D_K \times \alpha M \times D_F \times D_F + \alpha M \times \alpha N \times D_F \times D_F$$

where $\alpha \in (0, 1]$ with typical settings of 1, 0.75, 0.50 and 0.25. Here, $\alpha = 1$ is used for the baseline MobileNet and $\alpha < 1$ for reduced MobileNets. The second hyper-parameter, the resolution multiplier $\rho$, reduces the resource requirements of the network. Hence, the computational cost of the network with $\alpha$ and $\rho$ can be represented as:

$$D_K \times D_K \times \alpha M \times \rho D_F \times \rho D_F + \alpha M \times \alpha N \times \rho D_F \times \rho D_F$$

where $\rho \in (0, 1]$ is usually set implicitly so that the input resolution of the network is 224, 192, 160 or 128. Here, $\rho = 1$ is used for the baseline MobileNet and $\rho < 1$ for cost-optimized MobileNets. Using the multipliers tremendously reduces the multiply-adds and parameters, at the cost of slightly lower accuracy in the traditional MobileNet-V1.

Appendix A.2. Performance Evaluation

In this work, we observed and justified corresponding results considering some evaluation metrics such as accuracy, area under the curve (AUC), f-measure, g-mean, sensitivity, specificity, fall-out and miss rate. These metrics are calculated using true positive (TP), true negative (TN), false positive (FP) and false negative (FN), which are defined as follows:
  • Accuracy measures the rate of correct classification. It is computed as the proportion of the sum of TP and TN over the total population:

    $$\mathrm{Accuracy} = \frac{TP + TN}{TP + FN + FP + TN}$$

  • Area Under the Curve (AUC) measures how well the positive class is separated from the negative class:

    $$\mathrm{AUC} = \int_{0}^{1} TPR\left(FPR^{-1}(x)\right) dx$$

  • F-Measure is the harmonic mean of precision and recall:

    $$\mathrm{F\text{-}Measure} = \frac{2 \times \mathrm{precision} \times \mathrm{recall}}{\mathrm{precision} + \mathrm{recall}} = \frac{TP}{TP + \frac{1}{2}(FP + FN)}$$

  • Geometric Mean (G-mean) is computed as the square root of the product of the sensitivity and specificity:

    $$\mathrm{G\text{-}mean} = \sqrt{\frac{TP}{TP + FN} \times \frac{TN}{TN + FP}}$$

  • Sensitivity is the proportion of actual positives that are correctly predicted positive:

    $$\mathrm{Sensitivity} = \frac{TP}{TP + FN}$$

  • Specificity is the proportion of actual negatives that are correctly predicted negative:

    $$\mathrm{Specificity} = \frac{TN}{TN + FP}$$

  • Fall-Out (false-positive rate) is the ratio of FP to the sum of FP and TN:

    $$\mathrm{Fall\text{-}Out} = \frac{FP}{FP + TN}$$

  • Miss Rate (false-negative rate) is the ratio of FN to the sum of FN and TP:

    $$\mathrm{Miss\ Rate} = \frac{FN}{FN + TP}$$
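As a small worked example, the helper below computes these metrics directly from the confusion counts; the counts used in the call are one combination consistent with the reported 90.67% test-set accuracy on 300 balanced images, shown purely for illustration.

```python
# Hedged helper: evaluation metrics from confusion-matrix counts (TP, TN, FP, FN).
from math import sqrt

def metrics(tp, tn, fp, fn):
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "accuracy":    (tp + tn) / (tp + tn + fp + fn),
        "f_measure":   tp / (tp + 0.5 * (fp + fn)),
        "g_mean":      sqrt(sensitivity * specificity),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "fall_out":    fp / (fp + tn),
        "miss_rate":   fn / (fn + tp),
    }

# Example counts consistent with 90.67% accuracy on 300 balanced test images.
print(metrics(tp=136, tn=136, fp=14, fn=14))
```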

References

  1. Goh, K.L.; Morris, S.; Rosalie, S.; Foster, C.; Falkmer, T.; Tan, T. Typically developed adults and adults with autism spectrum disorder classification using centre of pressure measurements. In Proceedings of the 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Shanghai, China, 20–25 March 2016; pp. 844–848, ISSN 2379-190X. [Google Scholar] [CrossRef]
  2. Speaks, A. What is autism. Retrieved Novemb. 2011, 17, 2011. [Google Scholar]
  3. Satu, M.S.; Farida Sathi, F.; Arifen, M.S.; Hanif Ali, M.; Moni, M.A. Early Detection of Autism by Extracting Features: A Case Study in Bangladesh. In Proceedings of the 2019 International Conference on Robotics, Electrical and Signal Processing Techniques (ICREST), Dhaka, Bangladesh, 10–12 January 2019; pp. 400–405. [Google Scholar] [CrossRef]
  4. Guillon, Q.; Hadjikhani, N.; Baduel, S.; Rogé, B. Visual social attention in autism spectrum disorder: Insights from eye tracking studies. Neurosci. Biobehav. Rev. 2014, 42, 279–297. [Google Scholar] [CrossRef]
  5. Haque, M.I.U.; Valles, D. A Facial Expression Recognition Approach Using DCNN for Autistic Children to Identify Emotions. In Proceedings of the 2018 IEEE 9th Annual Information Technology, Electronics and Mobile Communication Conference (IEMCON), Vancouver, BC, Canada, 1–3 November 2018; pp. 546–551. [Google Scholar] [CrossRef]
  6. Rudovic, O.; Utsumi, Y.; Lee, J.; Hernandez, J.; Ferrer, E.C.; Schuller, B.; Picard, R.W. CultureNet: A Deep Learning Approach for Engagement Intensity Estimation from Face Images of Children with Autism. In Proceedings of the 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Madrid, Spain, 1–5 October 2018; pp. 339–346, ISSN 2153-0866. [Google Scholar] [CrossRef]
  7. Yolcu, G.; Oztel, I.; Kazan, S.; Oz, C.; Palaniappan, K.; Lever, T.E.; Bunyak, F. Facial expression recognition for monitoring neurological disorders based on convolutional neural network. Multimed. Tools Appl. 2019, 78, 31581–31603. [Google Scholar] [CrossRef]
  8. Akter, T.; Satu, M.S.; Barua, L.; Sathi, F.F.; Ali, M.H. Statistical Analysis of the Activation Area of Fusiform Gyrus of Human Brain to Explore Autism. Int. J. Comput. Sci. Inf. Secur. (IJCSIS) 2017, 15, 331–337. [Google Scholar]
  9. Satu, M.S.; Azad, M.S.; Haque, M.F.; Imtiaz, S.K.; Akter, T.; Barua, L.; Rashid, M.; Soron, T.R.; Al Mamun, K.A. Prottoy: A Smart Phone Based Mobile Application to Detect Autism of Children in Bangladesh. In Proceedings of the 2019 4th International Conference on Electrical Information and Communication Technology (EICT), Khulna, Bangladesh, 20–22 December 2019; pp. 1–6. [Google Scholar] [CrossRef]
  10. Schelinski, S.; Borowiak, K.; von Kriegstein, K. Temporal voice areas exist in autism spectrum disorder but are dysfunctional for voice identity recognition. Soc. Cogn. Affect. Neurosci. 2016, 11, 1812–1822. [Google Scholar] [CrossRef] [Green Version]
  11. Jiang, X.; Chen, Y.F. Facial Image Processing. In Applied Pattern Recognition; Bunke, H., Kandel, A., Last, M., Eds.; Studies in Computational Intelligence; Springer: Berlin/Heidelberg, Germany, 2008; pp. 29–48. [Google Scholar] [CrossRef]
  12. Garcia, C.; Ostermann, J.; Cootes, T. Facial Image Processing. Eurasip J. Image Video Process. 2008, 2007, 1–2. [Google Scholar] [CrossRef] [Green Version]
  13. Yolcu, G.; Oztel, I.; Kazan, S.; Oz, C.; Palaniappan, K.; Lever, T.E.; Bunyak, F. Deep learning-based facial expression recognition for monitoring neurological disorders. In Proceedings of the 2017 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), Kansas City, MO, USA, 13–16 November 2017; pp. 1652–1657. [Google Scholar] [CrossRef]
  14. Joseph, L.; Pramod, S.; Nair, L.S. Emotion recognition in a social robot for robot-assisted therapy to autistic treatment using deep learning. In Proceedings of the 2017 International Conference on Technological Advancements in Power and Energy (TAP Energy), Kollam, India, 21–23 December 2017; pp. 1–6. [Google Scholar] [CrossRef]
  15. Piosenka, G. Detect Autism from a Facial Image. Available online: https://cutt.ly/ibIXt5a (accessed on 26 December 2020).
  16. Hu, V.W.; Steinberg, M.E. Novel clustering of items from the Autism Diagnostic Interview-Revised to define phenotypes within autism spectrum disorders. Autism Res. Off. J. Int. Soc. Autism Res. 2009, 2, 67–77. [Google Scholar] [CrossRef] [Green Version]
  17. Ellegood, J.; Anagnostou, E.; Babineau, B.; Crawley, J.; Lin, L.; Genestine, M.; DiCicco-Bloom, E.; Lai, J.; Foster, J.A.; Peñagarikano, O.; et al. Clustering autism-using neuroanatomical differences in 26 mouse models to gain insight into the heterogeneity. Mol. Psychiatry 2015, 20, 118–125. [Google Scholar] [CrossRef]
  18. Vargason, T.; Frye, R.E.; McGuinness, D.L.; Hahn, J. Clustering of co-occurring conditions in autism spectrum disorder during early childhood: A retrospective analysis of medical claims data. Autism Res. 2019, 12, 1272–1285. [Google Scholar] [CrossRef]
  19. Baadel, S.; Thabtah, F.; Lu, J. A clustering approach for autistic trait classification. Inform. Health Soc. Care 2020, 45, 309–326. [Google Scholar] [CrossRef]
  20. Satu, M.S.; Khan, M.I.; Mahmud, M.; Uddin, S.; Summers, M.A.; Quinn, J.M.W.; Moni, M.A. TClustVID: A novel machine learning classification model to investigate topics and sentiment in COVID-19 tweets. Knowl. Based Syst. 2021, 226, 107126. [Google Scholar] [CrossRef]
  21. Ruzich, E.; Allison, C.; Smith, P.; Watson, P.; Auyeung, B.; Ring, H.; Baron-Cohen, S. Subgrouping siblings of people with autism: Identifying the broader autism phenotype. Autism Res. 2016, 9, 658–665. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  22. Stevens, E.; Atchison, A.; Stevens, L.; Hong, E.; Granpeesheh, D.; Dixon, D.; Linstead, E. A Cluster Analysis of Challenging Behaviors in Autism Spectrum Disorder. In Proceedings of the 2017 16th IEEE International Conference on Machine Learning and Applications (ICMLA), Cancun, Mexico, 18–21 December 2017; pp. 661–666. [Google Scholar] [CrossRef] [Green Version]
  23. Akter, T.; Khan, M.I.; Ali, M.H.; Satu, M.S.; Uddin, M.J.; Moni, M.A. Improved Machine Learning based Classification Model for Early Autism Detection. In Proceedings of the 2021 2nd International Conference on Robotics, Electrical and Signal Processing Techniques (ICREST), Online, 5–7 January 2021; pp. 742–747. [Google Scholar] [CrossRef]
  24. Akter, T.; Ali, M.H.; Khan, M.I.; Satu, M.S.; Moni, M.A. Machine Learning Model To Predict Autism Investigating Eye-Tracking Dataset. In Proceedings of the 2021 2nd International Conference on Robotics, Electrical and Signal Processing Techniques (ICREST), Online, 5–7 January 2021; pp. 383–387. [Google Scholar] [CrossRef]
  25. Gudipati, V.K.; Barman, O.R.; Gaffoor, M.; Abuzneid, A. Efficient facial expression recognition using adaboost and haar cascade classifiers. In Proceedings of the 2016 Annual Connecticut Conference on Industrial Electronics, Technology Automation (CT-IETA), Bridgeport, CT, USA, 14–15 October 2016; pp. 1–4. [Google Scholar] [CrossRef]
  26. Salmam, F.Z.; Madani, A.; Kissi, M. Facial Expression Recognition Using Decision Trees. In Proceedings of the 2016 13th International Conference on Computer Graphics, Imaging and Visualization (CGiV), Beni Mellal, Morocco, 29 March–1 April 2016; pp. 125–130. [Google Scholar] [CrossRef]
  27. Howlader, K.C.; Satu, M.S.; Barua, A.; Moni, M.A. Mining Significant Features of Diabetes Mellitus Applying Decision Trees: A Case Study In Bangladesh. bioRxiv 2018, 481994. [Google Scholar] [CrossRef]
  28. Huynh, X.P.; Park, S.M.; Kim, Y.G. Detection of Driver Drowsiness Using 3D Deep Neural Network and Semi-Supervised Gradient Boosting Machine. In Computer Vision–ACCV 2016 Workshops; Lecture Notes in Computer Science; Chen, C.S., Lu, J., Ma, K.K., Eds.; Springer: Cham, Switzerland, 2017; pp. 134–145. [Google Scholar] [CrossRef]
  29. Zennifa, F.; Ageno, S.; Hatano, S.; Iramina, K. Hybrid System for Engagement Recognition During Cognitive Tasks Using a CFS + KNN Algorithm. Sensors 2018, 18, 3691. [Google Scholar] [CrossRef] [Green Version]
  30. Liu, T.L.; Wang, P.W.; Yang, Y.H.C.; Shyi, G.C.W.; Yen, C.F. Association between Facial Emotion Recognition and Bullying Involvement among Adolescents with High-Functioning Autism Spectrum Disorder. Int. J. Environ. Res. Public Health 2019, 16, 5125. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  31. Thepade, S.D.; Abin, D. Face Gender Recognition Using Multi Layer Perceptron with OTSU Segmentation. In Proceedings of the 2018 Fourth International Conference on Computing Communication Control and Automation (ICCUBEA), Pune, India, 16–18 August 2018; pp. 1–5. [Google Scholar] [CrossRef]
  32. Chen, Y.P.; Liu, C.H.; Chou, K.Y.; Wang, S.Y. Real-time and low-memory multi-face detection system design based on naive Bayes classifier using FPGA. In Proceedings of the 2016 International Automatic Control Conference (CACS), Taichung, Taiwan, 9–11 November 2016; pp. 7–12. [Google Scholar] [CrossRef]
  33. Jarraya, S.K.; Masmoudi, M.; Hammami, M. A comparative study of Autistic Children Emotion recognition based on Spatio-Temporal and Deep analysis of facial expressions features during a Meltdown Crisis. Multimed. Tools Appl. 2021, 80, 83–125. [Google Scholar] [CrossRef]
  34. Mostafa, S.; Yin, W.; Wu, F.X. Autoencoder Based Methods for Diagnosis of Autism Spectrum Disorder. In Computational Advances in Bio and Medical Sciences; Lecture Notes in Computer Science; Măndoiu, I., Murali, T.M., Narasimhan, G., Rajasekaran, S., Skums, P., Zelikovsky, A., Eds.; Springer: Cham, Switzerland, 2020; pp. 39–51. [Google Scholar] [CrossRef]
  35. Vinay, A.; Gupta, A.; Kamath, V.R.; Bharadwaj, A.; Srinivas, A.; Murthy, K.N.B.; Natarajan, S. Facial Analysis Using Jacobians and Gradient Boosting. In Mathematical Modelling and Scientific Computing with Applications; Springer Proceedings in Mathematics & Statistics; Manna, S., Datta, B.N., Ahmad, S.S., Eds.; Springer: Singapore, 2020; pp. 393–404. [Google Scholar] [CrossRef]
  36. Tong, X.; Sun, S.; Fu, M. Data Augmentation and Second-Order Pooling for Facial Expression Recognition. IEEE Access 2019, 7, 86821–86828. [Google Scholar] [CrossRef]
  37. Yang, B.; Cao, J.; Ni, R.; Zhang, Y. Facial Expression Recognition Using Weighted Mixture Deep Neural Network Based on Double-Channel Facial Images. IEEE Access 2018, 6, 4630–4640. [Google Scholar] [CrossRef]
  38. Raghavendra, R.; Raja, K.B.; Venkatesh, S.; Busch, C. Transferable Deep-CNN Features for Detecting Digital and Print-Scanned Morphed Face Images. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Honolulu, HI, USA, 21–26 July 2017; pp. 1822–1830, ISSN 2160-7516. [Google Scholar] [CrossRef]
  39. Howard, A.G.; Zhu, M.; Chen, B.; Kalenichenko, D.; Wang, W.; Weyand, T.; Andreetto, M.; Adam, H. MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications. arXiv 2017, arXiv:1704.04861. [Google Scholar]
  40. Sandler, M.; Howard, A.; Zhu, M.; Zhmoginov, A.; Chen, L.C. MobileNetV2: Inverted Residuals and Linear Bottlenecks. arXiv 2019, arXiv:1801.04381. [Google Scholar]
  41. Chollet, F. Keras: The Python Deep Learning Library; Astrophysics Source Code Library, 2018; p. ascl:1806.022. Available online: https://en.wikipedia.org/wiki/Astrophysics_Source_Code_Library (accessed on 1 April 2021).
  42. Satu, M.S.; Rahman, S.; Khan, M.I.; Abedin, M.Z.; Kaiser, M.S.; Mahmud, M. Towards Improved Detection of Cognitive Performance Using Bidirectional Multilayer Long-Short Term Memory Neural Network. In Brain Informatics; Lecture Notes in Computer Science; Mahmud, M., Vassanelli, S., Kaiser, M.S., Zhong, N., Eds.; Springer: Cham, Switzerland, 2020; pp. 297–306. [Google Scholar] [CrossRef]
  43. Bisong, E. Google Colaboratory. In Building Machine Learning and Deep Learning Models on Google Cloud Platform: A Comprehensive Guide for Beginners; Bisong, E., Ed.; Apress: Berkeley, CA, USA, 2019; pp. 59–64. [Google Scholar] [CrossRef]
  44. Deng, J.; Dong, W.; Socher, R.; Li, L.J.; Li, K.; Fei-Fei, L. ImageNet: A large-scale hierarchical image database. In Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA, 20–25 June 2009; pp. 248–255, ISSN 1063-6919. [Google Scholar] [CrossRef] [Green Version]
  45. Jain, N.; Kumar, S.; Kumar, A.; Shamsolmoali, P.; Zareapoor, M. Hybrid deep neural networks for face emotion recognition. Pattern Recognit. Lett. 2018, 115, 101–106. [Google Scholar] [CrossRef]
  46. Akter, T.; Shahriare Satu, M.; Khan, M.I.; Ali, M.H.; Uddin, S.; Lió, P.; Quinn, J.M.W.; Moni, M.A. Machine Learning-Based Models for Early Stage Detection of Autism Spectrum Disorders. IEEE Access 2019, 7, 166509–166527. [Google Scholar] [CrossRef]
  47. Aldridge, K.; George, I.D.; Cole, K.K.; Austin, J.R.; Takahashi, T.N.; Duan, Y.; Miles, J.H. Facial phenotypes in subgroups of prepubertal boys with autism spectrum disorders are correlated with clinical phenotypes. Mol. Autism. 2011, 2, 15. [Google Scholar] [CrossRef] [Green Version]
Figure 1. The schematic diagram of our proposed transfer-learning-based facial recognition framework. (A) Data pre-processing step: organization of raw images for further activities; (B) Evaluation of Baseline Classifiers: performance analysis of improved models with state-of-the-art classifiers; (C) Identification of autism clusters: investigate individual clustering groups and explore the best group using machine learning model.
Figure 2. Improved MobileNet-V1 Transfer Learning Model.
Figure 3. Comparison of ROC curves obtained for (a) validation set (b) test set using improved MobileNet-V1 along with other classifiers.
Figure 4. Accuracy of improved MobileNet-V1 for different clustered group, where the x-axis labels indicate different values of K = 2, 3, ..., 10, and the y-axis label shows the accuracy.
Table 1. Performance Analysis of the Validation Set using Machine Learning Classifiers and Improved Pre-trained Models.

| Classifier | Accuracy | AUC | F-Measure | G-Mean | Sensitivity | Specificity | Fall-Out | Miss Rate |
|---|---|---|---|---|---|---|---|---|
| AdaBoost | 0.6200 | 0.6200 | 0.6198 | 0.6200 | 0.6200 | 0.6200 | 0.3800 | 0.3800 |
| DT | 0.6000 | 0.6000 | 0.5998 | 0.6000 | 0.6000 | 0.6000 | 0.4000 | 0.4000 |
| GB | 0.7100 | 0.7100 | 0.7097 | 0.7100 | 0.7100 | 0.7100 | 0.2900 | 0.2900 |
| KNN | 0.6200 | 0.6200 | 0.5824 | 0.6200 | 0.6200 | 0.6200 | 0.3800 | 0.3800 |
| LR | 0.7000 | 0.7000 | 0.6981 | 0.7000 | 0.7000 | 0.7000 | 0.3000 | 0.3000 |
| MLP | 0.6400 | 0.6400 | 0.6279 | 0.6400 | 0.6400 | 0.6400 | 0.3600 | 0.3600 |
| NB | 0.6600 | 0.6600 | 0.6578 | 0.6600 | 0.6600 | 0.6600 | 0.3400 | 0.3400 |
| RF | 0.7600 | 0.7600 | 0.7600 | 0.7600 | 0.7600 | 0.7600 | 0.2400 | 0.2400 |
| SVM | 0.6700 | 0.6700 | 0.6692 | 0.6700 | 0.6700 | 0.6700 | 0.3300 | 0.3300 |
| XGB | 0.7300 | 0.7300 | 0.7300 | 0.7300 | 0.7300 | 0.7300 | 0.2700 | 0.2700 |
| CNN | 0.7200 | 0.7200 | 0.7190 | 0.7200 | 0.7200 | 0.7200 | 0.2800 | 0.2800 |
| DenseNet121 | 0.7800 | 0.7800 | 0.7786 | 0.7800 | 0.7800 | 0.7800 | 0.2200 | 0.2200 |
| ResNet50 | 0.8000 | 0.8000 | 0.8000 | 0.8000 | 0.8000 | 0.8000 | 0.2000 | 0.2000 |
| VGG16 | 0.7100 | 0.7100 | 0.7014 | 0.7100 | 0.7100 | 0.7100 | 0.2900 | 0.2900 |
| VGG19 | 0.7600 | 0.7600 | 0.7478 | 0.7600 | 0.7600 | 0.7600 | 0.2400 | 0.2400 |
| MobileNet-V1 | 0.8300 | 0.8300 | 0.8296 | 0.8300 | 0.8300 | 0.8300 | 0.1700 | 0.1700 |
| MobileNet-V2 | 0.6200 | 0.6200 | 0.6176 | 0.6200 | 0.6200 | 0.6200 | 0.3800 | 0.3800 |
Table 2. Performance Analysis of the Test Set using Machine Learning Classifiers and Improved Pre-trained Models.

| Classifier | Accuracy | AUC | F-Measure | G-Mean | Sensitivity | Specificity | Fall-Out | Miss Rate |
|---|---|---|---|---|---|---|---|---|
| AdaBoost | 0.6633 | 0.6633 | 0.6625 | 0.6633 | 0.6633 | 0.6633 | 0.3367 | 0.3367 |
| DT | 0.6633 | 0.6633 | 0.6631 | 0.6633 | 0.6633 | 0.6633 | 0.3367 | 0.3367 |
| GB | 0.7333 | 0.7333 | 0.7331 | 0.7333 | 0.7333 | 0.7333 | 0.2667 | 0.2667 |
| KNN | 0.6867 | 0.6867 | 0.6627 | 0.6867 | 0.6867 | 0.6867 | 0.3133 | 0.3133 |
| LR | 0.6933 | 0.6933 | 0.6920 | 0.6933 | 0.6933 | 0.6933 | 0.3067 | 0.3067 |
| MLP | 0.6767 | 0.6767 | 0.6646 | 0.6767 | 0.6767 | 0.6767 | 0.3233 | 0.3233 |
| NB | 0.6833 | 0.6833 | 0.6825 | 0.6833 | 0.6833 | 0.6833 | 0.3167 | 0.3167 |
| RF | 0.7600 | 0.7600 | 0.7599 | 0.7600 | 0.7600 | 0.7600 | 0.2400 | 0.2400 |
| SVM | 0.7400 | 0.7400 | 0.7399 | 0.7400 | 0.7400 | 0.7400 | 0.2600 | 0.2600 |
| XGB | 0.7400 | 0.7400 | 0.7400 | 0.7400 | 0.7400 | 0.7400 | 0.2600 | 0.2600 |
| CNN | 0.7000 | 0.7000 | 0.6998 | 0.7000 | 0.7000 | 0.7000 | 0.3000 | 0.3000 |
| DenseNet121 | 0.8367 | 0.8367 | 0.8365 | 0.8367 | 0.8367 | 0.8367 | 0.1633 | 0.1633 |
| ResNet50 | 0.8100 | 0.8100 | 0.8082 | 0.8100 | 0.8100 | 0.8100 | 0.1900 | 0.1900 |
| VGG16 | 0.7667 | 0.7667 | 0.7615 | 0.7667 | 0.7667 | 0.7667 | 0.2333 | 0.2333 |
| VGG19 | 0.7133 | 0.7133 | 0.6948 | 0.7133 | 0.7133 | 0.7133 | 0.2867 | 0.2867 |
| MobileNet-V1 | 0.9067 | 0.9067 | 0.9067 | 0.9067 | 0.9067 | 0.9067 | 0.0933 | 0.0933 |
| MobileNet-V2 | 0.6467 | 0.6467 | 0.6463 | 0.6467 | 0.6467 | 0.6467 | 0.3533 | 0.3533 |
Table 3. The Accuracy of the Individual Pre-trained Models on ImageNet (Base Models) and the Autism Facial Dataset (Improved Models).

| Pre-Trained Model (Base/Improved) | Top-1 Accuracy (ImageNet, Base) | Top-5 Accuracy (ImageNet, Base) | Validation Set (Autism Dataset, Improved) | Test Set (Autism Dataset, Improved) |
|---|---|---|---|---|
| DenseNet121 | 0.7500 | 0.9230 | 0.7800 | 0.8367 |
| ResNet50 | 0.7490 | 0.9210 | 0.8000 | 0.8100 |
| VGG16 | 0.7130 | 0.9010 | 0.7100 | 0.7667 |
| VGG19 | 0.7130 | 0.9000 | 0.7600 | 0.7133 |
| MobileNet-V1 | 0.7040 | 0.8950 | 0.8300 | 0.9067 |
| MobileNet-V2 | 0.7130 | 0.9010 | 0.6200 | 0.6467 |
Table 4. Comparison of the Results between the Base and Improved MobileNet-V1 for the Validation and Test Sets.

Validation Set

| Classifier | Accuracy | AUC | F-Measure | G-Mean | Sensitivity | Specificity | Fall-Out | Miss Rate |
|---|---|---|---|---|---|---|---|---|
| Base MobileNet-V1 | 0.7800 | 0.7800 | 0.7778 | 0.7800 | 0.7800 | 0.7800 | 0.2200 | 0.2200 |
| Improved MobileNet-V1 | 0.8300 | 0.8300 | 0.8296 | 0.8300 | 0.8300 | 0.8300 | 0.1700 | 0.1700 |

Test Set

| Classifier | Accuracy | AUC | F-Measure | G-Mean | Sensitivity | Specificity | Fall-Out | Miss Rate |
|---|---|---|---|---|---|---|---|---|
| Base MobileNet-V1 | 0.8300 | 0.8300 | 0.8298 | 0.8300 | 0.8300 | 0.8300 | 0.1700 | 0.1700 |
| Improved MobileNet-V1 | 0.9067 | 0.9067 | 0.9067 | 0.9067 | 0.9067 | 0.9067 | 0.0933 | 0.0933 |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

