Mood State Detection in Handwritten Tasks Using PCA–mFCBF and Automated Machine Learning

In this research, we analyse data obtained from sensors when a user handwrites or draws on a tablet to detect whether the user is in a specific mood state. First, we calculated the features based on the temporal, kinematic, statistical, spectral and cepstral domains for the tablet pressure, the horizontal and vertical pen displacements and the azimuth of the pen’s position. Next, we selected features using a principal component analysis (PCA) pipeline, followed by modified fast correlation–based filtering (mFCBF). PCA was used to calculate the orthogonal transformation of the features, and mFCBF was used to select the best PCA features. The EMOTHAW database was used for depression, anxiety and stress scale (DASS) assessment. The process involved the augmentation of the training data by first augmenting the mood states such that all the data were the same size. Then, 80% of the training data was randomly selected, and a small random Gaussian noise was added to the extracted features. Automated machine learning was employed to train and test more than ten plain and ensembled classifiers. For all three moods, we obtained 100% accuracy results when detecting two possible grades of mood severities using this architecture. The results obtained were superior to the results obtained by using state-of-the-art methods, which enabled us to define the three mood states and provide precise information to the clinical psychologist. The accuracy results obtained when detecting these three possible mood states using this architecture were 82.5%, 72.8% and 74.56% for depression, anxiety and stress, respectively.


Introduction
Morphological biometrics, based on quantitative measures of the human body [1,2], as well as behavioural biometrics, based on the patterns of actions performed by a subject, have proved to be helpful for e-security and e-health [3]. This research focuses on behavioural biometrics. We analyse the online activities performed during certain specific drawing and handwriting tasks performed by the subjects [4]. For monitoring health conditions, behavioural biometrics, especially online handwriting/drawing, has proved to be more useful in indicating states of mental disorders and diseases, such as dementia, than other popular morphological biometrics traits, such as fingerprint and iris recognition [3,4]. Also, behavioural biometrics is a minimally invasive methodology because it is based on tasks that are part of routine functional activities. Figure 1 represents the tablet application that captures the sensor data of the tablet and the pen when the user handwrites or draws on the tablet. Treatment of mental illnesses is a health priority because they significantly impact human well-being and are among the major causes of inabilities in populations worldwide. Indeed, depression, stress and anxiety are the most prevalent negative moods in the world, and stress is often present as a comorbidity. It is estimated that the number of people affected by depression (resp. anxiety) is 4.4% (resp. 3.6%) of the global population [5], and these numbers are rapidly increasing because of the global spread of the coronavirus disease 2019. The manifestation of such disorders is commonly accompanied by the deterioration of social behaviour mainly because of the inability to express one's emotions and decode others' moods [6]. Unfortunately, these diseases do not have a definitive treatment, and treatments can last for the entire lifetime with a consistent impact on the quality of life of patients and on the health costs of public administrations. Therefore, it is imperative to detect early indicators of mental disorders to provide timely treatment so that they do not become chronic and difficult to treat. Identifying early indicators hence enables early implementation and effective interventions, which reduces public health care costs [7]. These indicators include changes in a person's voice, facial expressions and body postures as well as changes in behaviours and functional abilities, such as handwriting and drawing [8].
We focus on negative moods (depression, stress and anxiety) because they last for long periods and a negative state of mind (mood) deteriorates the quality of life of patients. Depression is a mood disorder that causes energy and interest to disappear, and instils a persistent feeling of sadness, which results in high energy consumption by the brain; this can lead to various emotional and physical problems. Clinical depression can be serious; in fact, when depression is left untreated, it can lead to suicide [9].
Anxiety is a cognitive-affective response characterised by feelings of tension and worry regarding a potentially negative outcome that the individual perceives as highly probable and imminent [10]. Anxiety signs and symptoms include nervous sweating, increased heart rate and hyperventilation [9]. It impacts the activities of the patients because this state causes tiredness [11].
Stress is a natural reaction to the pressure that the body undergoes when faced with complicated or dangerous life situations [12]. In general, stress is a normal human response and is part of life, but it becomes a mood disorder when it is experienced frequently and interferes with the ability to perform daily activities. Moreover, when facing stressful situations, the body releases large amounts of several hormones, which can damage the body (causing diabetes and cardiovascular diseases) and cognitive processes [13].
The use of behavioural biometrics, especially, in particular, the online analysis of the activity of a subject performing a handwriting or drawing task enables the characterisation of mood states, especially, depression, anxiety and stress [14]. The use of technological tools (e.g., smartphones, tablets and touch screens) and the multiple interactions among subjects on social media, public administration or health platforms provide access to a large amount of data; this is helpful for discovering or evaluating important features of the subject's condition.
The characterisation of mood detection through behavioural biometrics, in particular, by the online analysis of handwriting and drawings is a novel and promising research field. Unfortunately, to the best of our knowledge, there are few studies and very few datasets that can be used as a benchmark for potential applications.
This research is based on a study published by Likforman-Sulem et al. [15]; they proposed a methodology to use online handwriting/drawing data to discriminate depressed, stressed and anxious patients from a healthy control group. Their work shows the use of various features to discriminate among negative moods (depression, anxiety and stress) with significant accuracy, sensitivity and specificity by using random forest classification. These features are based on several factors: the duration for which the pen is used on the sheet or near it (in the air), the total time to complete a specific handwriting/drawing task and other features based on the number of strokes performed during a task and/or the pressure applied by the pen on the paper, In this research, we have improved upon the classification accuracies as compared with our previous research [15] by using principal component analysis (PCA) and modified fast correlation-based filtering (mFCBF) strategies. However, we had to compromise on the explainability of the results. It was not possible to translate the principal components (PCs) into specific sets of kinematic, temporal and pressure variables for any given handwriting task. Clinicians who tried to apply these findings to their clinical settings could not perform a manual handwriting analysis even though we had provided a list of explainable features in Table II of [15]. This can be done easily for an automatic machine system that classifies moods by using online handwritten tasks.
In this research, we test our system using the EMOTHAW database. This database uses the same software and hardware that we had already used when detecting Parkinson's disease. The main difference is that the EMOTHAW database uses other handwriting/drawing patterns to detect mood states. Therefore, we can use the same features to characterise the user's data.
In our work on Parkinson's disease detection [16], we found it useful to add kinematic and statistical features; therefore, the first contribution of this work is to add these features to the user's features that we used in [17]. The second contribution of this work is the use of a PCA-mFCBF pipeline; in fact, it is probably the most important contribution. In this paradigm, the features are orthogonalised using PCA. Then, the PCs are selected using mFCBF [17]. The third contribution of this work is the use of the automated machine learning (AutoML) approach. We proposed to use AutoML because we had successfully used it in our research on Parkinson's disease. These three contributions allow for achieving a level of accuracy in the results that highly outperformed state of the art results in mood detection [16].
The last contribution of this research is assessing the detection of three mood states and allowing a high level of precision to the clinical psychologists. This was possible because we achieved 100% accuracy when detecting two mood states. Section 2 reviews the theory of PCA, which is one of the key concepts used in this work. Section 3 describes the EMOTHAW dataset, which is used in this research to test our methodology. This section also contains information about the distribution of scores for two and three mood states and the explanation of the overlapping of mood states. Section 4 describes the data captured from the tablet and pen. Section 5 describes the temporal, kinematic, statistical, spectral-domain and cepstral-domain features. This section also includes the augmentation method used in this work to synthetically increase the training dataset. Section 6 describes the feature selection (FS) methodology, which includes the PCA [18][19][20][21], mFCBF [17] and the new proposed PCA-mFCBF pipeline. Section 7 describes the hyperparameters of the front end. Section 8 defines the machine learning (ML) modelling to maximise the accuracy detection task. Section 9 reviews the AutoML concepts and AutoML H2O platform used in this work. Section 10 describes the experiments conducted and their results. Finally, in Section 11, we state our remarks and conclusion.

Principal Component Analysis (PCA)
PCA is useful when multi-colinear vectors are present. PCA can be used to reduce the dimensions and variances of the vectors and to denoise them.
Given the set of possible correlated feature vectors FV, PCA applies an orthogonal transformation to obtain a set of linearly uncorrelated observations: the PCs. This is achieved by projecting the original features into a reduced PCA space using the eigenvectors, which are the PCs of the covariance matrix. The number of principal components obtained by applying PCA is less than or equal to the minimum values between the number of observations O i and the number of features [18][19][20][21]. The resulting projected features are a linear combination of the original features, which capture most of the feature variances. In this transformation, the first component explains the maximum variance in the features, and each subsequent PC explains less of the variance. Most of the useful PCs are dictated by the rank of the matrix.
The variance-covariance matrix is defined as follows: X is real symmetric matrix; therefore, the above expression can be decomposed as where U represents the PCs and is an orthogonal matrix whose columns are eigenvectors of X, and Λ is a diagonal matrix whose entries are the eigenvalues of X [22]. Then, the projected features can be expressed as follows: The PCA transformation corresponds to multiplying the original features X by the transformation matrix U that represents the PCs. In other words, Y can be viewed as a linear regression, that is, each element in Y can be predicted with a linear combination of the original feature vector X weighted for a vector in the matrix U. C Y is a diagonal matrix that is defined as follows: Substituting C X = UΛU T , we obtain which is a diagonal matrix; this implies that all PCs are uncorrelated with one another.
For example, Figure 2 shows the sepal length and sepal width features before and after applying PCA for the well-known Iris dataset [23]. We can see that the first PC variance PC1 is greatly reduced after applying PCA.

EMOTHAW Databases
In [15], the authors defined a database named EMOTHAW, which they abbreviated from the phrase 'emotion recognition from handwriting and drawing'; this database has 129 participants whose mood states have been assessed using the depression, anxiety and stress scale (DASS) questionnaire. For each subject, this database includes raw data recorded through a digitising tablet. DASS has approximately seven handwriting/drawing tasks based on a set of well-assessed tests in the medical domain, namely, clock-drawing test, mini-mental state examination test, house-tree-person test and four other simple tasks [24,25].

The DASS Scale
DASS is a 42-item self-report questionnaire developed in [26]. The Italian version (I-DASS-42) was assessed by Severino [27]. DASS consists of three scales that measure the three related negative mood states of depression, anxiety and stress through a 14-point questionnaire. The score ranges are given in Table 1. The DASS scores help establish a bridge between the tasks and moods because the scores indicate whether stress, depression and anxiety were of a normal, mild, moderate, severe or extremely severe degree. The range values and usual interpretations (labels) for the three mood states are shown in Table 1. The database contained only 129 subjects; therefore, the detailed classification (described in Table 1) would have generated outliers for most of the labels. In [15], the authors adopted a binary classification of mood states for the DASS scores; Table 1 shows the two mood states: normal and above normal. A person who scores higher than 9 on the depression scale is considered to be depressed. A person who scores higher than 7 on the anxiety scale is considered to be anxious. A person who scores higher than 14 on the stress scale is considered to be stressed.
In this paper, we define trinary classification of mood states for the DASS scores; Table 1 shows these three mood states: normal, mild and moderate. A user scoring a value less than 10 on the depression DASS scale is considered to be normal. A person who scores between 10 and 13 is considered to be mildly depressed. A person who scores more than or equal to 14 is considered to be moderately depressed. A user scoring a value of less than 8 on the anxiety DASS scale is considered to be normal; a score of 8 or 9 is considered to be mild depression; and a score higher than or equal to 10 is considered to be above mild depression. A user scoring a value less than 15 in the stress DASS scale is considered to be normal; a score between 15 and 18 is considered to be mild stress; and a score greater than or equal to 19 is considered to be moderate stress.

Subjects
The EMOTHAW database consists of data obtained from 129 participants (71 females and 58 males) aged between 21 and 32 years with a similar demographic background, to ensure controlled experimentation. All the subjects were right-handed and were all students of Università degli Studi della Campania L. Vanvitelli in Italy. The data acquisition protocol consisted of filling in the DASS questionnaire (Italian version), followed by the execution of seven handwriting/drawing tasks described below.

Tasks
The recorded tasks performed by each subject are shown in Table 2. Table 2. Task performed for each user.

Tasks
(1) Drawing a copy of two overlapping pentagons  Figure 3a shows the pentagon-drawing task. Figure 3b shows the house-drawing task. Figure 3c shows an example of the four Italian words in uppercase letters (BIODEGRAD-ABILE (biodegradable), FLIPSTRIM (flipstrim), SMINUZZAVANO (to crumble), CHI-UNQUE (anyone) [15]. Figure 3d,e show examples of the left-and right-hand loop drawings, respectively. Hand loop drawings are the simplest tasks but still provide good information. For example, we can observe users' motor activities in their muscles. Figure 3f shows an example of the cursive writing of a sentence. Figure 3g shows an example of the clock-drawing task.
For data acquisition, the authors of [15] tested the differences between the online and offline data collection methods and concluded that collecting online data had more impact for the research because information, such as the pressure, pen position and time stamp, could be obtained in this way. In their article, they proved the relevance of these characteristics.
The analyses in [15] were based on four signals: time in the air (pen status = 0), time on paper (pen status = 1), the total duration of the task and the number of strokes. Then, the authors selected five tasks and extracted the four abovementioned features for each task; there were a total of 20 features. They compared the contributions of handwriting-based tasks (two tasks = eight features), drawing-based tasks (three tasks = twelve features) and the combination of the two tasks (20 features). The experimentation exploits a random forest model for mood state detection using the leave-percentage-out (LPO) approach that repeats the experiments ten times for each type of task.
In [15], the results showed that the writing-based tasks were less effective than the other tasks in all the considered mood states, particularly on stress. For depression, the drawing-based tasks were the most effective, whereas a combination of drawing and writing tasks was the most effective solution for characterising stress and anxiety.

Distribution of Scores for Two and Three Mood States
The distributions of DASS scores in the EMOTHAW database are shown in Figure 4. For binary labelling, the bars in dark blue are the normal scores, and the bars in yellow and red show the above-normal scores. For binary labelling, the proportions of depression, anxiety and stress were approximately 67%, 58% and 57%, respectively. For trinary labelling, the bars in dark blue are the normal scores, the bars in yellow are the mild scores and the bars in red are the moderate scores. Moreover, these classes are highly unbalanced.

Overlapping of Mood States
The consistency and temporal stability of the DASS scales have been assessed in several studies [15]. In this study, participants performed the writing or drawing tasks as soon as they completed the DASS questionnaire. Therefore, we are confident that the participants performed the tasks under the measured states.
The cross-tables in Figure 5 show that depression, anxiety and stress can be observed separately or simultaneously. The scores have been dichotomised. From the matrices in the second row of Figure 5, we observe that for approximately 20% of the participants (19.4% = 2.3% + 7.8% + 9.3%), a single negative state is observed (such as anxious/non-stressed/non-depressed). For approximately the same percentage of participants (21.8% = 3.1% + 4.7% + 14.0%), two negative mood states were observed simultaneously. However, due to the construction of the scales, each of these mood states can be predicted separately. A Pearson's χ2 test conducted on the anxious/stressed, stressed/depressed and anxious/depressed cross tables showed that the qualitative variables anxiety-stress, stress-depression and depression-anxiety were linked (p-values below 0.01). The strongest link is between anxiety and stress. The next strongest link is between anxiety and depression. The weakest of the three links is that between stress and depression.
We did not analyse the comorbid mood states because the number of available samples in EMOTHAW for certain combinations was too small.  All tasks have a duration of T seconds (N samples).

Data Augmentation
In the EMOTHAW database, the classes for binary and trinary labelling are highly unbalanced. Therefore, to improve accuracy, the training data needed to be augmented. First, we augmented the data such that all the mood states had the same number of observations. Then, we augmented all the mood states using the following steps:

1.
Identify the mood states having few observations, 2.
Calculate the number of samples required to make all the mood state observations of the same size, 3.
Randomly select observations from the original data and 4.
For each selected sample, calculate the new feature vector by adding the Gaussian random noise to the original features: FV u * a = FV u * + α * GV, where FV u * is the feature vector of a random user, and α is a scale value lower than 0.2. GV is a vector with the same dimension as FV u * with the Gaussian random values having mean and variance values of 0 and 1, respectively.

Feature Extraction
The system front-end used in this research is shown in Figure 7. There are two main differences between this research and our previous study on mood state detection [16]. The first difference is the addition of kinematic and statistical features in this study. We added these features because they proved to be effective in increasing the accuracy of the results obtained in our study on Parkinson's disease detection [17]. The second and most important difference is the inclusion of PCA, which is used to orthogonalise all features. The inclusion of PCA as orthogonalisation before selection is novel because, instead of selecting features by using the first PCA's coefficients, we selected features by applying mFCBF on the PC. In other words, we pipelined PCA and mFCBF.

User's Features
In this paper, we will use the following set of feature vectors for different feature sets: where FV u T is the feature vector for the temporal features [16]; FV u T_SD_CD is the feature vector of the temporal, spectral-and cepstral-domain features; FV u a is the feature vector obtained by concatenating the kinematic and statistical features to FV u T_SD_CD ; TF u is the raw vector of the temporal features of the user u as defined in Table 3; KF u is the raw vector of the kinematic features of the user u as defined in Table 4; SF u is the raw vector of the statistical features of the user u, as defined in Table 5; SDF u is the raw vector of the log energy filterbank features of the user u in the spectral domain; and CF u is the raw vector of the cepstral features calculated as the Fourier transform of the log energy of the filterbanks of the user u. All these features are calculated for all the tasks, as follows: T = {spiral, letter l, syllable le, trigramm les, word1, word2, word3, sentence} Velocity of signals in w τ,u (n) a τ,u w (n) Acceleration of signals in w τ,u (n) j τ,u w (n) Jerk of signal in w τ,u (n) Range of signals in g τ,u (n) g τ,u Median of signals in g τ,u (n) ..

g τ,u
Mode of signals in g τ,u (n) ... g τ,u Standard deviation of signals in g τ,u (n) Outlier robust range (99th percentile -1st percentile), applied to all signals in g τ,u (n) Range of signals in g τ,u (n) g τ,u Arithmetic mean of signals in g τ,u (n) = g τ,u Geometric mean of signals in g τ,u (n) k τ,u Kurtosis of signals in g τ,u (n)

Detection Task
The detection task involves identifying the mood state of the user. For binary labelling, we define the mood state for each user as follows: Therefore, we can relate the feature vector F with the corresponding mood state as follows: where F = {T, T_SD_CD, T_K_S_SD_CD}. A dataframe is defined as the union of all users U in FVS u F . Hence, for each feature F, this operation can be expressed using relational algebraic notation as follows: FVS F = ∪ U u=1 FVS u F . In these data frames, the rows represent the number of users, and the columns represent the features and the disease state of the users.
For trinary labelling, we define the mood state for each user as follows: Mild 2 above Mild f or all u = 1 . . . U,

Features for Moods
The EMOTHAW database includes three moods; therefore, the feature vector for each mood is defined as follows: FVS M F , where M = {Depression, Anxiety, Stress}. In our detection task, we detected only the state of one mood at a time.

Feature Selection
It is important to select the right features because data models also learn irrelevant information, which degrades their performance. There are multiple methods to select features. In this research, we test the accuracy performance measures of PCA, mFCBF and the PCA-mFCBF pipeline.

Principal Component Analysis (PCA)
PCA is an orthogonal transformation of the features; it rotates the dimensional axis to maximise variability. PCA returns coefficients in order of importance-the first coefficient has the highest variance representation in the entire set. The higher the selected PC, the higher the representability of the variance of the set of features.
Let us represent the PCA function on the feature vector F and mood M as follows: The maximum number of PCs calculated by PCA is equal to the number of observations (in this case, it is equal to the number of users). The selected PCs P of the entire features F is represented as follows: As stated earlier, although PCA helps minimise features to improve model accuracy, it has a negative effect on the model's explainability, which is a drawback.

Modified Fast Correlation-Based Filtering (mFCBF)
The system front-end calculates many features from a limited number of time signals captured from the tablet's sensor. Beside this correlation, certain features contribute more towards creating an accurate model; mFCBF selects the features that, even when they are correlated, mostly contribute to the increase in model accuracy.
FCBF selection is based on two steps [16]. In the first step, the selected features are those whose correlation with the output are higher than the correlation with the threshold value. The second step takes the features of the first step and selects the features with a correlation less than the threshold value. Algorithm 1 shows the pseudocode of our modified version of the function mFCBF [17]. This modified version differs from the original version in step 5, where the selected feature has high correlation with the output. The mFCBF algorithm receives a data frame and the thresholds oTh and iTh as inputs.
Here, oTh is used to set the lower correlation threshold of each of the selected features and the output; the practical value of this parameter should be greater than 0.2. Also, iTh is used to set the higher correlation value between the features; the practical value of this parameter should be less than 0.2. By sweeping oTh and iTh for a range of values, we can find the right features that maximise the performance of the ML method. This operation for the features F and mood M is expressed as follows: Note that these equations represent 2D arrays, where one dimension represents the number of users, and the other dimension represents the number of selected features. Algorithm 1. The mFCBF algorithm receives the users' feature matrix (O), minimum correlation threshold (oTh) and the maximum correlation threshold (iTh) and returns the selected set of features.

PCA-mFCBF Pipeline
In Sections 6.1 and 6.2, we selected features using PCA or mFCBF, respectively. In this section, we propose to pipeline them by first applying PCA and then applying mFCBF. The PCA step returns all the PCA coefficients. Then, the selection step is performed using mFCBF. The intra-feature variability is already minimised; therefore, this step selects the PC which has a higher correlation with the output.
This PCA-mFCBF pipeline for the features F and mood M is defined as follows: In Section 10, we will prove that orthogonalising before selection with mFCBF is a good strategy to greatly increase the accuracy.
Again, the purpose of mFCBF in this pipeline is to select the PCs' that contribute the most to increasing the model accuracy. Although it is beneficial to orthogonalise features using PCA to improve the model's accuracy, a potential drawback of PCA is its adverse effect on the model's explainability.

Front-End Hyperparameters
Spectral-domain features (SDF) and cepstral-domain features (CDF) are functions of parameters, such as filterbank bandwidth ( f bbw), bandwidth of the filters in the filterbank ( f bw), filterbank initial frequency (i f ) and overlap of the filters in the filterbank (ov). However, FS depends on the feature-output-correlation threshold (oTh) and the intra-feature correlation threshold (iTh). All these parameters depend on the next set of parameters: f bbw, f bw, i f , ov, oTh, iTh. Therefore, we must tune these parameters to optimise the model's accuracy.
The range of values for each of the parameters are defined as follows: iTh range = [0. When mFCBF is used as the FS method, this optimisation can be represented as follows: When the PCA-mFCBF pipeline is used as an FS method, this optimisation can be represented as follows: where M = {Depression, Anxiety, Stress} is the mood; MLm = {SV M, autoML} is the ML model; and F = {φ, T, T_SD_CD, T_K_S_SD_CD} is the feature set used. The empty set notation φ is used when no FS method is used. For the randomness of the augmentation method, we trained and tested the model for different user sets and random sequences, and we selected the maximum accuracy.

AutoML
AutoML, also known as augmented ML, is a methodology that aims to automate the data science pipeline for classification and regression. The AutoML pipeline includes data pre-processing (cleaning, imputing and quality checking), feature engineering (transformation and selection), model selection, evaluation and hyper-parameter optimisation.
H2O includes automatic training and tuning of many models within a user-specified time limit. Stacked ensembles are automatically trained on collections of individual models to produce highly predictive ensemble models.
PyCaret is an end-to-end ML and model management tool that speeds up the experiment cycle exponentially and increases productivity.
The auto-sklearn architecture is an AutoML toolkit and a drop-in replacement for the scikit-learn estimator; it includes algorithm selection and hyper-parameter tuning. It uses Bayesian optimisation, meta-learning and ensemble construction.
TPOT uses genetic programming to determine the best performing ML pipelines, and it is built on top of scikit-learn [38]. It supports feature pre-processing, feature construction and selection, model selection and hyper-parameter optimisation.
MLBox supports fast reading, distributed data processing, FS, leak detection, cleaning, formatting, accurate hyper-parameter optimisation in high-dimensional space and ML algorithms, such as deep learning, stacking, light gradient boosting machine and XGBoost. An important feature is that it includes prediction with the interpretation of models.
AutoML packages are not perfect, and there are packages with different levels of automation. For example, feature engineering is a task in this pipeline for which AutoML has some features, but a lot more work is required to develop and automate this task, especially for sensor data analysis.

AutoML H2O
For data modelling, we used AutoML H2O [28,29], which is a platform that can be used for automating the ML workflow. The automation includes the training and tuning of many models within a user-specified time limit. It also includes hyper-parameter optimisation, which uses a Bayesian approach. Table 6 shows a few of these models, such as variations of trees, random forests, Naïve Bayes, linear models, additive models, deep learning and support vector machines (SVMs). In addition, H2O also includes a model that is an ensemble of all models and is a model for each family of methods. The ensembled models are mostly the ones with better accuracy results.

Experiments and Results
H2O [28,29] is an AutoML package intended to automate the ML pipeline starting from data manipulation to ML parameter optimisation. Table 7 shows the configuration setting that we used for H2O. In this configuration, we limited the maximum expected training time to 200 s to avoid a long processing time. The number of models was set to 15 as a trade-off between the accuracy performance and processing time. We excluded the gradient boosting machine (GBM) modelling because its database did not converge with our database. The number of folds was set to two because when this was increased, the accuracy of the results did not improve, but the process time did. Finally, the stop metric was set to log loss because this was a classification task. LPO was used for testing. PLO is a variation of leave-one-out; however, instead of leaving one element out, the data model in LPO was tested with a percentage of the registers in the database, and we trained with the rest. In our experiments, we repeated this training-testing cycle until we circulated all the possibilities, and we averaged the accuracy values of all tests. In our experiments, we always left out 10%.
Augmentation was controlled by the percentage of augmentation and the amplitude of the Gaussian random noise applied to the original signal. In this research, we first augmented the data such that all the mood states had the same number of observations. Then, we augmented all the mood states by 80% using the Gaussian noise with 0 and 1 as the mean and variance values, respectively. The Gaussian random noise amplitude was multiplied by 0.2. Table 8 shows the accuracy results for the binary detection moods for the temporal feature F = TF, for the ML models MLm = {SV M, aML} and the moods M = {Depression, Anxiety, Stress}. In this table, we can observe that the accuracy results with AutoML are much higher than the results obtained using SVM [16]. This can be explained because AutoML simultaneously evaluates more than 15 classification algorithms.  Table 9 shows the accuracy results for the features F = T_SD_CD for the ML models MLm = {SV M, aML} and for the moods M = {Depression, Anxiety, Stress}. The second and third columns show the accuracy results obtained when PCA is used as the FS method. We can see that accuracy results with AutoML are much higher than the results obtained with SVM. The fourth and fifth columns of Table 9 show the accuracy results obtained when mFCBF is used as the FS method. Clearly, the accuracy results with AutoML are much higher than the results obtained with SVM. Also, the accuracy results obtained when using mFCBF as the FS method (in the fourth and fifth columns) are much higher than the accuracy results obtained with PCA (in the second and third columns).
Finally, sixth and seventh columns of Table 9 show the accuracy results when the PCA-mFCBF pipeline is applied as the FS method. The accuracy results with AutoML are much higher than the results obtained with SVM. Also, the accuracy results when using the PCA-mFCBF pipeline as the FS method (in the sixth and seventh columns) are much higher than the accuracy results obtained using PCA (in the second and third columns) or mFCBF (in the fourth and fifth columns). Figure 8 shows the selected feature for each task after applying the PCA-mFCBF pipeline. These features are a mixture of the original PCs and are not necessarily the first ones, which are normally selected when PCA is used for FS. This mixture of PCs is because the first step in mFCBF selects the PCs which have a high correlation with the output. Then, the PCs with low correlation are selected.   The fourth and fifth columns of Table 10 show the accuracy results when mFCBF is used as the FS method. The accuracy results with AutoML, when concatenating the kinematic and statistical features, improved 4.38%, 7.02% and 14.04% for depression, anxiety and stress, respectively. Again, the accuracy of the results is higher when the kinematic and statistical features are concatenated. The accuracy of the results when using mFCBF as the FS method (in the fourth and fifth columns) is much higher than the accuracy of the results that are obtained with PCA (in the second and third columns).
The sixth and seventh columns of Table 10 show the accuracy of the results when the PCA-mFCBF pipeline is used as the FS method. The accuracy results with AutoML, when concatenating the kinematic and statistical features, improved 3.5%, 7.89% and 11.41% for depression, anxiety and stress, respectively.
Clearly, the accuracy of the results when using the PCA-mFCBF pipeline as the FS method (in the sixth and seventh columns) is much higher than the accuracy of the results obtained when using PCA (in the second and third columns) or mFCBF (in the fourth and fifth columns). Table 11 shows the accuracy of the results for the trinary classification for the features F = T_K_S_SD_CD for the ML model MLa = autoML and for the moods M = {Depression, Anxiety, Stress}. The second columns is the accuracy of the results when PCA is used as the FS method. The accuracy of the results is higher for the PCA-mFCBF pipeline; the second-best accuracy is obtained with mFCBF; and the third-best accuracy is obtained when using PCA. The accuracy is the worst when no selection feature is used. Therefore, the behavioural biometrics with the trinary classification is identical to the behavioural biometrics with the binary classification.

Conclusions
In this study, we propose the merging of the temporal, kinematic, statistical, spectraland cepstral-domain features to detect the mood state. We found that adding the kinematic and statistical features improved the results.
We also proposed to use PCA-mFCBF for FS. PCA is used for orthogonalising features before applying mFCBF to select the features. When using the PCA-mFCBF pipeline, we found that the experimental results were substantially superior to the results obtained when only PCA or mFCBF was used.
The best performance was obtained when adding the kinematic and statistical features pipelined with PCA-mFCBF. The second-best performance was that of the AutoML H2O platform for data modelling. This makes sense because AutoML H2O [28,29] includes the Bayesian hyper-parameter optimisation and model assessment.
The experiment results proved that this pipeline strategy for FS and PCA-mFCBF substantially increases the accuracy results and even reaches 100% in the binary classification of our task.
We also classified data into three categories and developed a few experiments using all the described features. Then, we used PCA-FCBF as the FS method and modelled using the AutoML H2O platform. The accuracy results for trinary detection were 82.45%, 72.8% and 74.56% for depression, anxiety and stress, respectively. Also, we found that the results for the trinary detection were not as impressive as the results obtained for binary detection.