1. Introduction
Neurodevelopmental disorders (NDs) are a group of disorders that typically appear in childhood and are characterized by impairments in neurological development that affect communication, learning, social interaction, behavior, cognition, and emotional functioning [1,2,3,4,5,6]. NDs include Autism Spectrum Disorders (ASD), Attention Deficit Hyperactivity Disorder (ADHD), Intellectual Disability (ID), Specific Learning Disorder (SLD), and Communication Disorders (CD) [1]. The DSM-5 [1] defines the profiles of these disorders by the following characteristics [1,2,3,4,5,6]: (i) ASD involves persistent difficulties with social interaction and communication, together with restricted, repetitive patterns of behavior, interests, or activities, resulting in clinically significant functional deficits; (ii) ADHD is characterized by inattention, impulsivity, and hyperactivity that interfere with day-to-day functioning; (iii) ID comprises impairments of general mental abilities, including verbal abilities, learning aptitude, the capacity for logical reasoning, and practical intelligence (problem-solving), that impact adaptive functioning; (iv) SLD presents as significantly poor performance in at least one of the following areas: oral expression, listening comprehension, basic reading and/or writing abilities, and mathematical calculation and/or problem-solving; and (v) CD refers to a group of disorders (speech sound disorder, language disorder, childhood-onset fluency disorder, social (pragmatic) communication disorder, and unspecified communication disorder) characterized by persistent difficulties in the acquisition, comprehension, and/or use of spoken or written language, which interfere with effective communication.
These disorders commonly have their onset in childhood, anywhere from early infancy to adolescence. For instance, ASD can be diagnosed between 2 and 4 years of age, ADHD before 12 years of age, and ID before 18 years of age, although any of the NDs may go undetected until adulthood [4]. The severity of ND symptoms varies, and they affect the quality of life of individuals and their families, creating major care needs that require extensive community resources [7,8]. Early screening and evaluation are vital for identifying children at risk of NDs and/or communication deficiencies. Although the current literature reports a high prevalence of NDs, they are still underdiagnosed in many children, who therefore miss out on effective interventions that would have more impact if administered early [7,8].
Effective communication is essential for social interaction and is an indicator of continued development from childhood through adult life [4,9]. Delayed speech and language development is often an early indicator of many NDs [4,10]. During evaluation procedures, clinicians employ various assessment instruments, tests, observations of the child’s perceived behaviors, and parent/caregiver interviews [7]. Although all of these are meant to be applied with clinical discretion, their use raises several concerns [11,12,13]: (i) clinical symptoms are shared among neurodevelopmental disorders; (ii) severe specifier values may result in a positive diagnostic decision because most indications are expressed quantitatively; (iii) in the absence of biomarkers, false positives cannot be distinguished from closely related conditions; (iv) interpreting instrument values near the threshold may be challenging; (v) diagnostic instruments do not offer a differential diagnosis and are unhelpful for negative diagnoses; and (vi) diagnostic instruments do not establish individual Functional Communication Profiles that highlight the deficits and strengths valuable for intervention. As such, they may occasionally result in subjective evaluations [11,12,13], which shows that clinical assessment relies on multiparametric, non-standardized, and subjective diagnostic procedures that remain challenging and require a high level of expertise [14]. Moreover, early detection of developmental disabilities in children is crucial for improving the prognosis of NDs across an individual’s developmental stages [12]. Therefore, additional support is needed to reduce the over- and under-diagnosis of NDs in children [11,12,14,15].
Speech and language therapy and special education can benefit from advances in biosignal processing techniques. Wearable biosensors have made the real-time collection and analysis of biosignals feasible, enabling new possibilities for healthcare monitoring and management. Biosignals are time-varying measures of human body processes that can provide important information about the functioning of the human body [16,17]. There are two main categories of biosignals: (i) physical signals, which are directly related to physical properties of the body, such as movement, force, and pressure (e.g., accelerometry, eye movements, blinks, respiration, facial expressions, voice); and (ii) physiological signals, which reflect the activity of the body’s organs and systems, such as the heart, the lungs, and the brain (e.g., electrocardiography, electroencephalography) [16]. As a result, less invasive devices (e.g., eye-trackers) are available, which serve the child-computer interaction community by showing how children engage with digital technologies and by offering novel insights into their visual and cognitive processing [18]. Eye tracking is a method for identifying diagnostic biomarkers, with evidence in children with ASD [19,20,21], ADHD [22,23,24,25], ID [26,27,28], SLD [29,30], and CD [24,27]. The role of the autonomic nervous system has also received attention in relation to many neurophysiological features of NDs, such as ASD [31,32,33,34]. Many characteristics can be studied through heart rate measurements; a very common one is heart rate variability (HRV), which has been found to be directly related to health [35], mental stress [36], cognitive functions [37], and psychosomatic state [38]. Autonomic dysregulation is a biomarker for ASD and ADHD. Specifically, HRV-based assessment can distinguish sensory reactivity in children with ASD from that found in typically developing children [31,39]. Furthermore, HRV can be used to assess the deficits in sustained attention and in emotional and behavioral regulation seen in ADHD, and it may help to define the pathophysiology of the disorder [40,41].
Machine learning (ML) is a subset of artificial intelligence (AI) and a rapidly evolving field of study that aims to establish high-quality prediction models using search strategies, deep learning, and computational analysis, enabling machines to learn to make autonomous decisions and improve their performance at specific tasks [42]. ML has several uses in health and healthcare [12,43,44,45,46,47,48]. It may change the way we approach disease/disorder screening, diagnosis, and treatment; for example, ML algorithms can examine patient data to spot trends and forecast the course of diseases/disorders. Supervised ML for classification trains a model to predict a categorical output variable. Metrics such as accuracy, error rate, precision, and recall can be used to evaluate a classification model’s performance [39,49]. A good classification model should have high accuracy, precision, and recall, but the optimal values may depend on the specific problem being addressed. For instance, type 2 diabetes and its complications have been detected early from electronically collected data using ML and deep learning techniques [50,51]. Further, towards individualized treatment plans, ML algorithms can examine patient data, including genetic data and medical history, to improve treatment results [52,53]. ML algorithms can also analyze wearable technology and sensor data to track patient health and spot early disease symptoms [54,55].
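As a minimal illustration of the evaluation metrics named above, the following Python sketch computes accuracy, error rate, precision, and recall from the true and predicted labels of a binary problem. The function name `classification_metrics` and the toy labels are assumptions made purely for illustration; they are not taken from any of the cited studies.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Return accuracy, error rate, precision, and recall for one positive class.

    Hypothetical helper for illustration; labels are compared with `positive`.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equally long")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn

    return {
        "accuracy": (tp + tn) / len(pairs),
        "error_rate": (fp + fn) / len(pairs),
        # Precision: share of predicted positives that are truly positive.
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        # Recall: share of true positives that the model recovered.
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Toy labels (assumed): 1 = positive class, 0 = negative class.
toy_true = [1, 1, 1, 0, 0, 0, 0, 1]
toy_pred = [1, 0, 1, 0, 1, 0, 0, 1]
toy_metrics = classification_metrics(toy_true, toy_pred)
print(toy_metrics)
```

Accuracy and error rate always sum to one, whereas precision and recall can diverge when the classes are imbalanced, which is one reason all of them are usually reported together.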
In relation to this, a soft computing approach based on predictive fuzzy cognitive maps has been employed successfully to represent human reasoning and to derive conclusions and decisions in a human-like way for a Medical Decision Support System [48]. This system was intended for medical education, employing a scenario-based learning approach to safely explore extensive “what-if” scenarios in case studies and prepare for dealing with critical adversity [48]. Additionally, a sub-band morphological operation method has been used successfully to detect cerebral aneurysms [56], and convolutional neural networks have been employed for the classification of leukocyte categories and leukemia prediction [57]. Furthermore, wearable electroencephalogram (EEG) recorders and Brain Computer Interface software have been proposed to aid in the assessment of alcohol-related brain waves [58]. More specifically, calculated spectral and statistical properties were used for classification, and Grammatical Evolution (GE) was applied. The suggested approach reported high accuracy (89.95%) and was thus considered suitable for direct evaluation of drivers’ mental state for road safety and accident avoidance in a future in-vehicle smart system. Further, for classifying the type of hemiplegia among patients and healthy individuals, an automatic feature selection and construction method based on GE for radial basis function (RBF) networks was presented [59]. Using an accelerometer sensor dataset, this approach was tested with four different classification techniques: an RBF network, a multi-layer perceptron (MLP) trained using the Broyden–Fletcher–Goldfarb–Shanno (BFGS) algorithm, a support vector machine (SVM), and GenClass, a GE-based parallel tool for data classification. The test results showed that the suggested solution had the best classification accuracy (90.07%) [59]. Various neural network and deep neural network approaches have been used for the classification of speech quality and voice disorders with very promising results [43,44,45,46,47,60,61].
AI algorithms and automated instruments for measurement, decision-making, and classification present new prospects for assisting clinical decision-making on communication deficiencies and NDs in the research setting [11,12,13,14,15,62]. Traditional ML approaches use separate feature extraction and classification procedures, whereas Deep Learning performs the two jointly [42]. For the diagnosis of ASD in young children from 5 to 10 years old, an intelligent model has been presented that applies convolutional neural networks (CNNs) to resting-state functional magnetic resonance imaging data from the global Autism Brain Imaging Data Exchange I and II datasets [63]. The best results were obtained with the Adamax optimization technique. A review of ML research on MRI-based ASD identification found that the accuracy of studies with a large number of participants is generally lower than that of studies with fewer participants, implying a further need for large-scale studies [64]. Regarding participants’ age, the accuracy of automated ASD diagnosis was shown to be higher for younger individuals [64]. Another thorough review of deep learning approaches examines the prognosis of neurological and neuropsychiatric disorders, reporting more potential for diagnosing stroke, cerebral palsy, and migraines using various deep learning models [65].
A deep neural network model employed in the early screening of ASD, which assessed the applicability of children’s eye tracking data, reported outcomes strongly indicating that it can help clinicians reach a quick and reliable evaluation [15]. A review article on ML methods of feature selection and classification for ASD indicates an improvement in diagnostic accuracy, time, and quality without added complexity [66]. In a study on the analysis and detection of ASD that applied various ML techniques and handled missing values, the results strongly suggest that convolutional neural network (CNN)-based prediction models work better on the studied datasets, with significantly higher accuracy for ASD screening in child, adolescent, and adult data [67]. A CNN has been employed for the classification of ADHD, trained with EEG spectrograms of 20 patients and 20 healthy participants. The model achieved an accuracy of 88% ± 1.12%, outperforming a Recurrent Neural Network and a Shallow Neural Network, with the advantage of avoiding manually engineered EEG spectral or channel features [68]. Furthermore, a CNN was used to identify ADHD in a dataset of children (ADHD: 50, Healthy: 57), with the power spectral density of the EEGs as the network input. The accuracy obtained was 90.29% ± 0.58% [69].
Additionally, serious games that embed fine motor activities captured on a mobile device, combined with deep learning convolutional neural networks (CNNs), have been proposed as novel digital biomarkers for the classification of developmental disorders [12]. A pilot study of an integrated system that includes a serious game and a mobile app, and utilizes ML models that measure ADHD behaviors, suggests significant potential in the domain of ADHD prediction [14]. Moreover, a gamified online test was designed with a Random Forest predictive model, which correctly detected over 80% of the participants with dyslexia, indicating that dyslexia can be screened using an ML approach [62].
Consequently, more in-depth research is needed that utilizes automatic classification techniques to assist clinicians’ decision-making. The aim of the current study is to examine automatic classification, based on biometric data gathered from children, as support for evaluation procedures in speech and language skills, distinguishing children with a Disorder (NDs) from those without (no-NDs). In more detail, we also examine five types of NDs: ASD, ADHD, ID, SLD, and CD. Overall, we therefore study six binary classification problems. The methods utilized to classify the data are a Radial Basis Function (RBF) neural network, a Deep Neural Network (DNN), and a Grammatical Evolution variant named GenClass [70].
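The six binary problems can be pictured as six target vectors derived from one set of diagnostic labels. The sketch below is only one possible encoding, under two assumptions that this section does not confirm: each child's record carries a (possibly empty) set of ND labels, and for each ND-specific problem all remaining children form the negative class. The names `ND_TYPES` and `build_binary_targets` are hypothetical.

```python
# Hypothetical encoding of the six binary problems (Disorder plus five ND types).
ND_TYPES = ("ASD", "ADHD", "ID", "SLD", "CD")


def build_binary_targets(diagnoses):
    """Map per-child sets of ND labels to six 0/1 target vectors.

    Assumption: an empty set means no ND; a child may carry several labels.
    """
    # Aggregate problem: any ND at all versus none.
    targets = {"Disorder": [1 if labels else 0 for labels in diagnoses]}
    # One problem per ND type: this ND versus every other child (assumed).
    for nd in ND_TYPES:
        targets[nd] = [1 if nd in labels else 0 for labels in diagnoses]
    return targets


# Toy records (assumed), one set of labels per child.
toy_diagnoses = [{"ASD"}, set(), {"ADHD", "SLD"}, {"CD"}]
toy_targets = build_binary_targets(toy_diagnoses)
for name, vector in toy_targets.items():
    print(name, vector)
```

Each vector can then be paired with the same feature matrix, so that every classifier is trained and evaluated six times, once per problem.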
4. Discussion
This study aimed to use ML to explore innovative automated solutions for the early identification of NDs in children with communication deficiencies, drawing on technology-based data-gathering techniques such as motion tracking, heart rate metrics, and eye tracking from the new SmartSpeech dataset developed in Greek. Ten-fold cross-validation was chosen for evaluating model efficacy since it produces high variability in testing and training data, decreases bias, and delivers consistent findings across all runs, parameters, and models. The results of this research provide a direct comparison of the machine learning methods employed on this dataset, namely RBF, DNN, and GenClass.
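The ten-fold cross-validation procedure can be sketched as follows. This is a minimal pure-Python illustration, not the evaluation code of this study: the helper names (`k_fold_splits`, `cross_validated_error_rate`) and the toy data are assumptions, and the nearest-centroid classifier is only a placeholder standing in for RBF, DNN, or GenClass.

```python
import random


def k_fold_splits(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) for k shuffled, disjoint folds."""
    if n_samples < k:
        raise ValueError("need at least k samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, folds[i]


def nearest_centroid_fit_predict(X_train, y_train, X_test):
    """Placeholder classifier (assumed): assign each point to the closest class mean."""
    sums, counts = {}, {}
    for x, label in zip(X_train, y_train):
        acc = sums.setdefault(label, [0.0] * len(x))
        for d, value in enumerate(x):
            acc[d] += value
        counts[label] = counts.get(label, 0) + 1
    centroids = {c: [s / counts[c] for s in acc] for c, acc in sums.items()}

    def predict(x):
        return min(centroids,
                   key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])))

    return [predict(x) for x in X_test]


def cross_validated_error_rate(X, y, fit_predict, k=10, seed=0):
    """Average test error rate (%) over k folds."""
    fold_errors = []
    for train, test in k_fold_splits(len(X), k, seed):
        preds = fit_predict([X[i] for i in train], [y[i] for i in train],
                            [X[i] for i in test])
        wrong = sum(1 for p, i in zip(preds, test) if p != y[i])
        fold_errors.append(100.0 * wrong / len(test))
    return sum(fold_errors) / len(fold_errors)


# Toy data (assumed): two well-separated groups of 20 samples each.
X = [[0.1 * i, 0.0] for i in range(20)] + [[10.0 + 0.1 * i, 0.0] for i in range(20)]
y = [0] * 20 + [1] * 20
error = cross_validated_error_rate(X, y, nearest_centroid_fit_predict, k=10)
print(f"Mean ten-fold error rate: {error:.2f}%")
```

Every sample is used for testing exactly once and for training nine times, so the reported error rate is an average over ten held-out folds rather than a single train/test split.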
The reported results of this study (Table 3, Table 4 and Table 5) compare all the methods using the error rate (%) as the performance metric; thus, a smaller value implies better performance. Precision and recall are also displayed for the class Disorder (Table 6). Finally, the classification methods with the highest accuracy are reported for each class and dataset (Table 7). In particular, Table 7 clearly illustrates the tendency of specific methods to dominate in each dataset and class; more specifically:
For the eye tracking measurements, the GenClass and the DNN-4 have proven to be the best choices, with an accuracy of at least 86.33% for the ASD population. GenClass is superior for the classes Disorder, ID, SLD, and CD, whereas DNN-4 is better for ASD and ADHD. For the aggregate class Disorder, GenClass has the highest observed accuracy of 92.83%. This finding may be utilized for automated screening to discriminate whether an individual has NDs.
The RBF method is the most accurate in the heart rate dataset, with an accuracy of at least 80.05%. It is notable that it achieves the best performance for all the classes under study.
As for the game-based dataset, the GenClass method has the highest accuracy for the classes Disorder, ASD, ID, and CD. The classes ADHD and SLD are better identified using the RBF algorithm.
However, in most other cases GenClass and DNN-4 outperform the rest. It is worth noting that GenClass is expected to have longer execution times since it is based on genetic algorithms. Nevertheless, in this study we employed the parallelization feature of the GenClass software [91] to speed up the process.
Similar research attempts to identify NDs have been reported in the literature. For example, one such study evaluated whether drag-and-drop data could be used to classify children with developmental disabilities [12]. Data were collected from 223 children with typical development and 147 children with developmental disabilities via a mobile application (DoBrain). A deep learning CNN algorithm was developed for the classification and achieved an area under the curve (AUC) of 0.817. Furthermore, in line with our study, a binary classifier has been trained using paralinguistic features extracted from typically developing children and children with Speech Sound Disorders (SSD), reporting 87% accuracy [60]. In the same direction as our study, HRV has also been used as a biomarker to distinguish autistic from typically developing children by applying several machine learning algorithms, namely Logistic Regression, Linear Discriminant Analysis, and Cubic Support Vector Machine [39]. Logistic Regression proved to be the best classifier for a color stimulus test in that study, whereas Linear Discriminant Analysis was better in the baseline test. Moreover, similar to our research, another study focused on eye tracking data as an important biomarker for detecting ASD [15]. To find the best method for predicting autism from eye tracking scan path images, that study compared a DNN classifier with traditional machine learning approaches such as Boosted Decision Tree, Deep Support Vector Machine, and Decision Jungle. The DNN model outperformed the other machine learning techniques with an AUC of 97%, sensitivity of 93.28%, specificity of 91.38%, negative predictive value (NPV) of 94.46%, and positive predictive value (PPV) of 90.06% [15]. Moreover, RBF also yielded reliable results in a study that attempted to identify children with ID using two feature extraction methods for speech samples, namely Linear Predictive Coding-based cepstral parameters and Mel-frequency cepstral coefficients, along with four classifiers: k-nearest neighbor, support vector machine, linear discriminant analysis, and RBF neural network [92]. The RBF classification model was the best technique for classifying disordered speech, giving higher accuracy (>90%) than the rest of the classifiers.
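Furthermore, this study’s sample size is comparable to that of other research [12,15,93], owing to the high cost of collecting data from human subjects and the ongoing development of tasks and experimental techniques that can discriminate between various situations to the greatest extent possible. As in prior studies [93], collecting a single multi-dimensional data sample in this study may take 1.5 to 4 h of a participant’s time (including setup, testing, and teardown) and 2 to 6 h when travel time is included. Furthermore, reaching out to people and encouraging participation is complex, which makes recruiting many participants with NDs difficult. As a result, the resources available for early-stage studies do not allow for gathering samples from thousands of people. Although this study’s sample size is not very large, its results form one of the first attempts at employing ML on data from digital gameplay and sensors to automatically assist the clinician’s decision, reducing the inherent uncertainty of clinical diagnosis regarding speech and language activities and their manifestations. This study contributes to the automatic classification of NDs based on new datasets derived from responses during software interactions, primarily designed and implemented for the Greek language. Future research may focus on enriching the dataset and considering recent advances in classification to enhance accuracy.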
93], in this study, experimenting while collecting a single multi-dimensional data sample may take 1.5 to 4 h of participant’s time (such as setting up, testing, and setting down) and 2 to 6 h of participant time (which encompasses travel time). Furthermore, reaching out to people and encouraging participation is complex, making recruiting many participants with NDs difficult. As a result, the resources available for early-stage studies do not allow for gathering samples from thousands of people. Although this study’s sample size is not very large, its results form one of the first attempts at employing ML on data from digital gameplay and sensors to automatically assist the clinician’s decision, reducing the inherent uncertainty of clinical diagnosis regarding speech and language activities and their manifestations. This study contributes to the automatic classification of NDs based on new datasets initiated from responses during software interactions, primarily designed and implemented for the Greek language. Future research may focus on enriching the dataset and considering recent advances in classification to enhance accuracy.