Predicting Children with ADHD Using Behavioral Activity: A Machine Learning Analysis

Abstract: Attention deficit hyperactivity disorder (ADHD) is one of childhood’s most frequent neurobehavioral disorders. The purpose of this study is to: (i) extract the most prominent risk factors for children with ADHD; and (ii) propose a machine learning (ML)-based approach to classify children as either having ADHD or healthy. We extracted the data of 45,779 children aged 3–17 years from the 2018–2019 National Survey of Children’s Health (NSCH, 2018–2019). Of these, 5218 (11.4%) children had ADHD, and the rest were healthy. Since the class label is highly imbalanced, we adopted a combination of oversampling and undersampling approaches to balance the class label. We adopted logistic regression (LR) to extract the significant factors for children with ADHD based on p-values (<0.05). Eight ML-based classifiers, namely random forest (RF), Naïve Bayes (NB), decision tree (DT), XGBoost, k-nearest neighbor (KNN), multilayer perceptron (MLP), support vector machine (SVM), and 1-dimensional convolutional neural network (1D CNN), were adopted for the prediction of children with ADHD. The average age of the children with ADHD was 12.4 ± 3.4 years. Our findings showed that the RF-based classifier provided the highest classification accuracy of 85.5%, sensitivity of 84.4%, specificity of 86.4%, and an AUC of 0.94. This study illustrated that LR combined with an RF-based system could provide excellent accuracy for classifying and predicting children with ADHD. This system will be helpful for early detection and diagnosis of ADHD.


Introduction
Attention deficit hyperactivity disorder (ADHD) is one of the most frequent neurodevelopmental behavioral disorders in childhood [1]. Children with ADHD have the following symptoms: hyperactivity, inattention, and impulsivity [1]. According to the Centers for Disease Control and Prevention (CDC), the number of children in the USA who have been diagnosed with ADHD has fluctuated over time as follows: about 4.4 million children between the ages of 2 and 17 years were diagnosed with ADHD in 2003, 5.4 million children in 2007, 6.4 million children in 2011, and 6.1 million children in 2016 [2]. About 12.9% of male children and 5.6% of female children were diagnosed with ADHD [2,3]. Globally, the prevalence of adults with ADHD was 2.8% in 2016 [4] and 0.96% in 2019; among children, 7.8% were diagnosed with ADHD in 2003, 9.5% in 2007, and 11% in 2011 [5]. About 62% of children with ADHD had taken medication for ADHD, and 46.7% of those children had also received behavioral treatment [2]. Overall, the number of children with ADHD has increased over this period. Therefore, it is necessary to propose a model for the identification of the risk factors for ADHD.
Researchers are trying to determine the risk factors to reduce the number of children with ADHD. A study showed that genetic factors played a significant role and were linked with ADHD [6]. Genetic factors are responsible for almost 75% of the risk of ADHD in younger children [7]. Besides the genetic factors, there were several risk factors for ADHD such as brain injury, alcohol/tobacco use during pregnancy, and premature delivery [6].
Previous studies showed that age, sex, asthma, race, anxiety, depression, obesity, cigarette smoking, and socio-economic status were also associated with ADHD in children [5,8-15]. These studies were conducted only to identify the risk factors for children with ADHD. It is also necessary to propose a prediction model. In this regard, machine learning (ML)-based models may be used for prediction as an alternative to classical approaches. ML-based models have also been used for identification and prediction in the fields of medical imaging [16-18], healthcare [19-21], and mental health [22,23].
Several ML-based classifiers have been applied to predict children with ADHD [24-29]. Uluyagmur-Ozturk et al. [30] conducted a study on the emotional status of children in Turkey and classified them as ASD, ADHD, or control based on their diagnosis. They extracted the data of 61 children from Marmara University Medical Hospital. There were 18 children with ASD, 30 children with ADHD, and 13 healthy children. The average ages of the children in the respective groups were 10.50, 9.46, and 9.22 years. They utilized ReliefF to determine the most significant features of ASD and ADHD. They also utilized five ML-based algorithms, namely decision tree (DT), random forest (RF), support vector machine (SVM), k-nearest neighbor (KNN), and AdaBoost (AB), to classify children as ASD, ADHD, or healthy. They showed that AB provided an 80% accuracy rate in differentiating children with ASD, children with ADHD, and healthy children.
Slobodin et al. [31] also diagnosed children with ADHD based on a continuous performance test (CPT). They selected 458 children aged 6-12 years. The selected children had an average age of 8.7 ± 1.8 years; 59.0% of the children were boys, and 46.51% of the children had ADHD. They found no significant age difference between the ADHD and non-ADHD groups (p-value = 0.94). They partitioned the dataset into a training set and a holdout set. They applied several ML-based classifiers, including RF, MOXO, and neural network (NN), for the prediction of ADHD. The ML-based classifiers were trained on 60% of the dataset, and the remaining 40% was used as the test set for the evaluation of the ML-based classifiers. They showed that their proposed ML-based classifier (MOXO) provided the highest accuracy of 87.0%, a sensitivity of 89.0%, and a specificity of 84.0%.
Morrow et al. [32] also conducted a study on children who received treatment for ADHD. They extracted the data of 6630 children with ADHD (age: 3-17 years) from the National Survey of Children's Health (NSCH), 2016-2017. The average age of the children with ADHD was 12.4 years. Four ML-based classifiers, namely classification and regression tree (CART), logistic regression (LR), ensemble decision forest (EDR), and deep multi-layer neural network (DeepNet), were employed to determine the factors associated with receiving treatment for ADHD. They showed that the DeepNet-based classifier gave the highest AUC of 0.72 compared to CART, EDR, and LR.
Despite the rapid development of ML-based classifiers, their application to ADHD diagnosis remains a difficult task. Yet, various ML-based classifiers have been utilized to predict children with ADHD in different countries using different ADHD datasets. However, the models' performance has to be improved. The current study had the following objectives: (i) to extract the risk factors of children with ADHD; and (ii) to propose an ML-based classifier to classify and predict children as either having ADHD or healthy.
The overall layout of this study is as follows: Section 2 presents the materials and methods, including descriptions of the dataset, predictor and outcome variables, statistical analysis, imbalance management methods, feature selection method, machine learning techniques, and performance evaluation criteria. Results are presented in Section 3. Section 4 presents a detailed discussion, and finally, the conclusion is presented in Section 5.

Dataset
The data utilized for this study were extracted from the 2018-2019 NSCH [33], which is a nationally representative survey of child health and well-being. Participants were 59,963 youths aged 0 to 17 years from the NSCH, 2018-2019. We enrolled 56,006 participants aged 3-17 years for our study purpose. The dataset contained some missing and unusual observations. After excluding these, 45,779 participants were considered for our final analysis. Among them, 5218 children had ADHD and the rest were healthy.

Outcome Variable
In this study, the outcome variable was derived from the following question asked of the parents: "Has a doctor or health professional ever told you that the selected child (S.C.) has attention deficit disorder or attention deficit hyperactive disorder, that is, ADD or ADHD?" [34,44]. We coded this outcome variable as "1" if the response was "Yes" and "0" if the response was "No".

Statistical Analysis
We used Stata version 14.10 for descriptive analysis, and Python version 3.9 with Scikit-learn version 1.0.2 for ML-based analysis. First, data are presented as mean ± standard deviation (SD) for continuous variables and frequency (%) for categorical variables. Second, an independent t-test for continuous variables and Chi-square tests for categorical variables were used to compare the differences in variables between ADHD and healthy children. Third, all tests were two-tailed, and factors with p-values less than 0.05 were considered statistically significant.

Imbalanced Management Method
A dataset is called imbalanced when one class contains many more samples than the other. When trained on imbalanced data, an ML-based algorithm tends to be biased toward the majority class. To solve this problem, we adopted two types of data sampling methods: (i) oversampling and (ii) undersampling. Oversampling is a sampling technique that randomly selects samples with replacement from the minority class and adds them to the training dataset. As a result, the performance of ML-based classifiers can be improved [45,46]. Undersampling is a sampling technique that randomly selects samples without replacement from the majority class until the class labels are balanced [47].

Feature Selection Method
Feature selection (FS) is also known as variable selection in statistics and machine learning (ML). FS is a process for selecting the most informative features to improve the performance of ML-based algorithms. FS is needed for the following reasons: (i) to simplify models to make them easy to interpret by readers [48]; (ii) to reduce overfitting and the complexity of the model [49]; (iii) to reduce the training time and cost [50]; (iv) to avoid the curse of dimensionality [51]; and (v) to improve the accuracy of ML-based models [52]. In this study, we used LR as an FS method [53,54] to extract the most significant risk factors of the children with ADHD. LR is used as a supervised learning method in the ML community. In statistics, LR is also used to extract the most informative features [36,38,41,54,55]. The LR-based feature extraction procedure is described as follows: LR is used when the output variable is binary (1/0) and the input variables may be discrete or continuous. LR evaluates the relationship between the output and one or more input variables by modeling the probability of the output through the logit function. The logit of the probability of the output variable (Y) (here, ADHD) is a linear combination of the input variables (X), which can be represented as follows:

$$\log\left(\frac{P_j}{1 - P_j}\right) = B_0 + B_1 X_{1j} + B_2 X_{2j} + \cdots + B_r X_{rj}$$

where $P_j$ is the probability that child $j$ has ADHD (Y = 1), and $1 - P_j$ is the probability that the child is healthy (Y = 0). $B_i$ (i = 0, 1, ..., r) are the unknown parameters, known as regression coefficients, that need to be estimated, and r represents the total number of input variables. The steps of the LR-based FS method are as follows: (i) Write down the likelihood function; (ii) Estimate the regression coefficients by the maximum likelihood estimator (MLE); the odds ratios (ORs) are then obtained by taking the exponent of the regression coefficients (ORs = exp(B)); and (iii) Test the regression coefficients using a normal/z-test and calculate the p-values.
We select the features that correspond to regression coefficients with p-values less than 0.05 [53-55].
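The three steps above can be sketched in a few lines of standard-library Python. This is an illustrative sketch only, not the authors' implementation: the Newton-Raphson solver, the helper names (`solve`, `score_and_information`, `lr_feature_selection`), and the synthetic data with its "true" coefficients are all assumptions introduced here. It fits the coefficients by maximum likelihood, derives Wald z-statistics and two-sided p-values, and flags features with p < 0.05.

```python
import math
import random

def solve(A, b):
    """Solve the linear system A x = b by Gauss-Jordan elimination."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[pivot] = M[pivot], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [a - f * p for a, p in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]

def score_and_information(rows, y, B):
    """Gradient of the log-likelihood and the information matrix at B."""
    k = len(B)
    grad = [0.0] * k
    info = [[0.0] * k for _ in range(k)]
    for x, yi in zip(rows, y):
        p = 1.0 / (1.0 + math.exp(-sum(b * v for b, v in zip(B, x))))
        for a in range(k):
            grad[a] += (yi - p) * x[a]
            for c in range(k):
                info[a][c] += p * (1.0 - p) * x[a] * x[c]
    return grad, info

def lr_feature_selection(X, y, alpha=0.05, n_iter=15):
    """Fit LR by Newton-Raphson (MLE) and keep features with p < alpha."""
    rows = [[1.0] + list(x) for x in X]            # prepend the intercept term
    k = len(rows[0])
    B = [0.0] * k
    for _ in range(n_iter):                        # step (ii): estimate B
        grad, info = score_and_information(rows, y, B)
        B = [b + s for b, s in zip(B, solve(info, grad))]
    _, info = score_and_information(rows, y, B)
    results = []
    for i in range(1, k):                          # skip the intercept
        unit = [1.0 if j == i else 0.0 for j in range(k)]
        se = math.sqrt(solve(info, unit)[i])       # sqrt of (info^-1)[i][i]
        z = B[i] / se                              # step (iii): Wald z-test
        p_value = math.erfc(abs(z) / math.sqrt(2.0))   # two-sided p-value
        results.append({"feature": i - 1, "odds_ratio": math.exp(B[i]),
                        "p_value": p_value, "selected": p_value < alpha})
    return results

# Synthetic data (hypothetical): feature 1 has no true effect on the outcome.
rng = random.Random(0)
true_B = [-0.5, 1.0, 0.0, -0.8]
X, y = [], []
for _ in range(1000):
    x = [rng.gauss(0.0, 1.0) for _ in range(3)]
    logit = true_B[0] + sum(b * v for b, v in zip(true_B[1:], x))
    X.append(x)
    y.append(1 if rng.random() < 1.0 / (1.0 + math.exp(-logit)) else 0)

results = lr_feature_selection(X, y)
for r in results:
    print(r)
```

In practice a statistics package would normally be used for this step; the hand-written solver is only meant to make the likelihood, estimation, and testing steps explicit.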

Machine Learning Techniques
This study aimed to predict children with ADHD using eight ML-based classifiers and to select the classifier with the best performance scores. We divided the dataset into two sets: a training set and a test set. We took 90% of the dataset as the training set and treated the rest as the test set. We fitted each of the eight ML-based classifiers on the training set: random forest (RF) [56], Naïve Bayes (NB) [57], decision tree (DT) [58], XGBoost [59], k-nearest neighbor (KNN) [60], multilayer perceptron (MLP) [61], support vector machine (SVM) [62], and 1-dimensional convolution network (1D CNN) [63]. Five of the eight classifiers (RF, DT, KNN, MLP, and SVM) had additional parameters, called hyperparameters. We optimized the hyperparameters using the grid search function. The grid search function takes as input arrays of candidate hyperparameter values for each classifier and uses a cross-validation (CV) protocol on the training set to extract the optimal values of the hyperparameters. In this study, we used 10-fold CV and selected the set of hyperparameter values with the highest classification accuracy. Then, we fitted the ML-based classifiers using the optimal values of the hyperparameters. The hyperparameters of the different classifiers are presented in Table 2. We used the sigmoid function and the Adam optimizer for 1D CNN. Finally, we predicted ADHD status for the children in the test set and computed the performance scores of each ML-based classifier.

Table 2. Optimized hyperparameters of different classifiers using the grid search method.
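The tuning protocol described above (90/10 split, grid search scored by 10-fold CV accuracy on the training set, then one evaluation on the test set) can be sketched as follows. This is a hand-rolled, standard-library illustration, not the Scikit-learn pipeline used in the study: the tiny KNN classifier, the candidate grid for k, the function names, and the synthetic two-class data are all assumptions made for the example.

```python
import random
from collections import Counter

def knn_predict(train_X, train_y, x, k):
    """Majority vote among the k nearest training points (squared Euclidean)."""
    order = sorted(range(len(train_X)),
                   key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], x)))
    return Counter(train_y[i] for i in order[:k]).most_common(1)[0][0]

def cv_accuracy(X, y, k, n_folds=10):
    """Mean accuracy of k-NN over n_folds cross-validation folds."""
    scores = []
    for f in range(n_folds):
        held_out = list(range(f, len(X), n_folds))      # validation fold
        held = set(held_out)
        fit_idx = [i for i in range(len(X)) if i not in held]
        fit_X = [X[i] for i in fit_idx]
        fit_y = [y[i] for i in fit_idx]
        hits = sum(knn_predict(fit_X, fit_y, X[i], k) == y[i] for i in held_out)
        scores.append(hits / len(held_out))
    return sum(scores) / n_folds

def grid_search(X, y, grid, n_folds=10):
    """Return the grid value with the highest mean CV accuracy, plus all scores."""
    scored = {k: cv_accuracy(X, y, k, n_folds) for k in grid}
    return max(scored, key=scored.get), scored

# Synthetic two-class data (hypothetical), shuffled before splitting.
rng = random.Random(0)
data = [([rng.gauss(3.0 * label, 1.0), rng.gauss(3.0 * label, 1.0)], label)
        for label in (0, 1) for _ in range(100)]
rng.shuffle(data)
X = [x for x, _ in data]
y = [label for _, label in data]

split = int(0.9 * len(X))                               # 90% training, 10% test
train_X, train_y = X[:split], y[:split]
test_X, test_y = X[split:], y[split:]

grid = [1, 3, 5, 7, 9]                                  # candidate values of k
best_k, cv_scores = grid_search(train_X, train_y, grid)

# The test set is used once, after the hyperparameter has been chosen.
test_acc = sum(knn_predict(train_X, train_y, x, best_k) == t
               for x, t in zip(test_X, test_y)) / len(test_y)
print(best_k, round(cv_scores[best_k], 3), round(test_acc, 3))
```

Keeping the test set out of the grid search is the design point: the hyperparameters are chosen from the training folds alone, so the test accuracy is not used for model selection.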

Performance Evaluation Criteria
Accuracy, sensitivity (SE), and specificity (SP) were used to evaluate the performance of all ML-based classifiers. They are computed from the numbers of true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN), and are defined as follows:

Accuracy
Accuracy is the ratio between the total number of correctly classified samples and the total number of samples, and is mathematically defined as:

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$

Sensitivity
Sensitivity (SE) is the ratio between the total number of correctly classified positive samples and the total number of positive samples, and is mathematically defined as:

$$\text{SE} = \frac{TP}{TP + FN}$$

Specificity
Specificity (SP) is the ratio between the total number of correctly classified negative samples and the total number of negative samples, and is mathematically defined as:

$$\text{SP} = \frac{TN}{TN + FP}$$
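As a minimal sketch of these three criteria, the function below counts TP, TN, FP, and FN from true and predicted labels (1 = ADHD, 0 = healthy) and applies the standard ratio definitions. The function name and the ten example labels are hypothetical and chosen only for illustration.

```python
def evaluate(y_true, y_pred):
    """Accuracy, sensitivity, and specificity from binary labels (1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),     # share of true positives detected
        "specificity": tn / (tn + fp),     # share of true negatives detected
    }

# Hypothetical labels: 4 positives (3 detected) and 6 negatives (4 detected).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
scores = evaluate(y_true, y_pred)
print(scores)
```

Here TP = 3, FN = 1, TN = 4, and FP = 2, so accuracy is 7/10, sensitivity is 3/4, and specificity is 4/6.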

Results
In this study, we adopted a feature selection method and eight ML-based classifiers for the prediction. We performed four experiments: (i) baseline and demographic characteristics of children with ADHD; (ii) balanced dataset formation; (iii) selection of the prominent significant risk factors of children with ADHD using LR; and (iv) comparison of the performance of ML-based classifiers for the prediction of children with ADHD. The results of these four experiments are discussed in Sections 3.1-3.4, respectively.

Baseline and Demographic Characteristics of Children with ADHD
The baseline and demographic characteristics of children with ADHD aged 3-17 years are shown in Table 3. Before balancing the class label, the overall prevalence of ADHD was 11.4%. The children included in our analysis were aged 3-17 years, with an average age of 10.6 ± 4.4 years overall and 12.4 ± 3.4 years among children with ADHD. In this study, 52.2% were male; 79.2% were white, 6.29% were black, and 14.6% were of other race. About 15.1% of male children had ADHD. Our results showed that 37.6% and 41.9% of children with ADHD suffered from anxiety and depression problems, respectively. All factors were statistically significantly associated with ADHD (p < 0.05).

Balanced Dataset Formation
The main aim of this section is to balance the class label (ADHD vs. healthy) using a combination of oversampling and undersampling methods. The dataset utilized in this study comprised 5218 (11.4%) children with ADHD and 40,561 (88.6%) healthy children. Here, the ratio between children with ADHD and healthy children was approximately 1:8. In order to reduce the difference in the number of samples per class, we oversampled the positive class (ADHD) to three times its size (3 × 5218 = 15,654 children with ADHD) and undersampled the healthy class from 40,561 to 15,654 children.
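The balancing arithmetic above can be sketched with the standard `random` module. This is an assumption-laden illustration rather than the authors' procedure: the function name `balance_classes` is hypothetical, integer IDs stand in for the survey records, and it is assumed that oversampling keeps every original ADHD record and adds extra draws with replacement, which the text does not state explicitly.

```python
import random

def balance_classes(minority, majority, factor=3, seed=0):
    """Oversample the minority class to factor x its size and undersample
    the majority class down to the same size."""
    rng = random.Random(seed)
    target = factor * len(minority)
    # Oversampling: keep the originals, add random draws WITH replacement.
    extra = rng.choices(minority, k=target - len(minority))
    oversampled = list(minority) + extra
    # Undersampling: random draws WITHOUT replacement from the majority class.
    undersampled = rng.sample(majority, target)
    return oversampled, undersampled

# Integer IDs stand in for the 5218 ADHD and 40,561 healthy records.
adhd_ids = list(range(5218))
healthy_ids = list(range(5218, 45779))

adhd_balanced, healthy_balanced = balance_classes(adhd_ids, healthy_ids)
print(len(adhd_balanced), len(healthy_balanced))   # 15654 15654
```

Because oversampling duplicates records, copies of the same child can end up on both sides of a later train/test split unless the split is made before resampling; the text does not say which order was used.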

Prominent Risk Factors of Children with ADHD Using LR
One of the objectives of this study was to select the high-risk factors for children with ADHD. After balancing the class label, LR was adopted for feature selection. Before applying LR, we checked the associations between the different factors and ADHD. Only the factors that were statistically significantly associated with ADHD were entered into the LR model. Table 4 summarizes the risk factors for children with ADHD identified using LR, together with the odds ratios (ORs), their 95% confidence intervals (CIs), standard errors (SE), and p-values. The following factors were associated with a higher likelihood of being diagnosed with ADHD: child's age  012-1.178). A child had a significantly lower chance of being diagnosed with ADHD if she/he lived in a two-parent family (OR: 0.833; 95% CI: 0.781-0.887), and mother's age (OR: 0.971; 95% CI: 0.967-0.975) was also associated with a lower risk. At the 5% level of significance, child's age, child's sex, mother's age, allergies, asthma, anxiety, depression, alcohol, insurance, race, family structure, very LBW, premature child, and poverty were statistically significant risk factors for ADHD (see Table 4).

Table 3. Baseline and demographic characteristics of children with ADHD, 3-17 years.
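For readers interpreting the ORs and CIs quoted from Table 4, the relation below shows how such values are commonly obtained from an estimated coefficient and its standard error. It assumes the usual Wald interval on the log-odds scale; the text does not state how the intervals were computed, and $\hat{B}_i$ (the estimated coefficient) is notation introduced here.

$$\mathrm{OR}_i = \exp(\hat{B}_i), \qquad 95\%\ \mathrm{CI} = \left[\exp\!\left(\hat{B}_i - 1.96\,\mathrm{SE}(\hat{B}_i)\right),\ \exp\!\left(\hat{B}_i + 1.96\,\mathrm{SE}(\hat{B}_i)\right)\right]$$

Under this assumption the interval is symmetric around the coefficient on the log scale. The two-parent family result is consistent with that: the natural logarithms of 0.781, 0.833, and 0.887 are approximately −0.247, −0.183, and −0.120, which are nearly equally spaced. An OR below 1 with a CI that excludes 1 indicates a significantly lower odds of ADHD.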

Comparisons of Performances of Machine Learning Techniques
The main objective of this section was to predict children with ADHD using eight ML-based classifiers. The comparison of the performances of the ML-based classifiers for the prediction of children with ADHD is shown in Table 5. The RF-based classifier gave the highest classification accuracy of 85.5%, sensitivity of 84.4%, and specificity of 86.4%, whereas NB provided the lowest classification accuracy of 69.8%, sensitivity of 77.3%, and specificity of 65.3%. DT provided 84.6% accuracy, 83.4% sensitivity, and 86.0% specificity, whereas KNN provided 84.0% accuracy, 82.6% sensitivity, and 85.6% specificity. The RF-based classifier also achieved the highest AUC of 0.94 compared to the other classifiers. The corresponding ROC curves of the eight ML-based classifiers are depicted in Figure 1. Therefore, the RF-based classifier achieved the best performance scores for the prediction of children with ADHD.
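The AUC reported above is not defined in the methods. One standard reading is the probability that a randomly chosen child with ADHD receives a higher predicted score than a randomly chosen healthy child, with ties counted as one half. The sketch below computes that pairwise quantity directly; the function name and the four example scores are hypothetical, and the study presumably used a library routine rather than this loop.

```python
def auc(y_true, scores):
    """Area under the ROC curve as the share of (positive, negative) pairs
    in which the positive sample has the higher score (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical predicted probabilities for two healthy and two ADHD children.
labels = [0, 0, 1, 1]
probabilities = [0.1, 0.4, 0.35, 0.8]
print(auc(labels, probabilities))   # 3 of the 4 pairs are ordered correctly: 0.75
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation. The pairwise loop grows with the product of the two class sizes, so a rank-based formula is preferable for large datasets.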

Discussion
Our current study was conducted based on the latest nationally representative survey of NSCH, 2018-2019, with children aged 3-17 years. The aims of the study were: (i) to investigate the risk factors of the children with ADHD; and (ii) to predict the children with ADHD. The current diagnostic process for ADHD is time-consuming and complicated by behavioral symptom overlaps. Since the incidence rate of ADHD is high, it is necessary to provide a tool that can swiftly and correctly predict the risk of ADHD. Several previous studies used ML-based approaches to detect and predict ADHD [28,29,64-66] and to study children with ADHD who received treatment [32]. Our current study expands these previous works by implementing an LR-based method for risk factor extraction and eight ML-based classifiers for the prediction of the children with ADHD. LR results illustrated that several factors (child's age, child's sex, mother's age, allergies, asthma, anxiety, depression, alcohol, insurance, race, family structure, very LBW, premature child, and poverty) were identified as the high-risk predictors of the children who had ADHD. The eight ML-based classifiers adopted for the prediction of children with ADHD gave accuracies ranging from 69.8% to 85.5% and AUCs ranging from 0.78 to 0.94. The RF-based classifier predicted the children with ADHD with an excellent accuracy of 85.5% and an excellent AUC of 0.94.

Conclusions
This study presented a comprehensive investigation into the risk factors of the children with ADHD. This study illustrated that LR with an RF-based classifier could provide excellent accuracy in correctly classifying and predicting children with ADHD. This study will assist physicians in detecting and treating children with ADHD at an early stage.

Institutional Review Board Statement: Not applicable.

Informed Consent Statement: Not applicable.
Data Availability Statement: This study was based on an analysis of existing public domain survey datasets that are freely available online with all identifier information removed. The dataset can be obtained from the following link: https://www.census.gov/programs-surveys/nsch/data/datasets.html.