Hybrid System for Engagement Recognition During Cognitive Tasks Using a CFS + KNN Algorithm

Engagement is described as a state in which an individual involved in an activity can ignore other influences. The level of engagement is important for good performance, especially under study conditions. Numerous methods using electroencephalograph (EEG), electrocardiograph (ECG), and near-infrared spectroscopy (NIRS) have been proposed for the recognition of engagement. However, the results were either unsatisfactory or required many channels. In this study, we introduce a low-density hybrid system for engagement recognition. We used a two-electrode wireless EEG, a wireless ECG, and a two-channel wireless NIRS to recognize engagement during cognitive tasks. We used electrooculograms (EOG) and eye tracking to record eye movements for data labeling. We calculated the recognition accuracy using a combination of correlation-based feature selection and a k-nearest neighbor algorithm. We then compared the hybrid system with stand-alone systems. The results show that the hybrid system had an acceptable accuracy for practical use (71.65 ± 0.16%). In comparison, the accuracy was 65.73 ± 0.17% for a pure EEG system, 67.44 ± 0.19% for pure ECG, and 66.83 ± 0.17% for pure NIRS. Overall, our results demonstrate that the proposed method can be used to improve performance in engagement recognition.


Introduction
Engagement is defined as a cognitive process involving decision making, information gathering, visual scanning, and selectively sustaining attention on a specific event while ignoring other external influences [1]. Knowing a person's level of engagement is important in order to gain good performance on a specific task [2]. Billeci et al. [3] adapted a wearable integrated electroencephalograph (EEG) and electrocardiograph (ECG) for measuring the change of neurophysiological and autonomic activity between engagement and disengagement states, in subjects exhibiting autism spectrum disorder. By extracting quantitative EEG (QEEG) features from an EEG signal, as well as heart rate and heart rate variability (HRV) from an ECG, they found evidence of differing activity in the engagement and disengagement states, in both the EEG and ECG. Bierre et al. [4] applied near-infrared spectroscopy (NIRS) to assess anterior frontal hemodynamic responses to engagement during three cognitive tasks. In their study, they presented evidence of age-related anterior frontal hemodynamic changes with cognitive demands. The classification of engagement based on self-reporting and behavior-based information tends to be delayed, sporadic, and intrusive. Performance-based information can be misleading since multiple degrees of tasks could be grouped together with the same level of performance [5]. Conversely, physiological measures do not require overt behavior, can be arranged to have little or no interference with task execution, and can supply information continuously without significant delay. Hussain et al. [6] investigated the activity of physiological signals and facial responses to cognitive load under an emotional stimulus and collected participant ratings from a self-assessment manikin to find the normative ratings in the collection. They investigated the correlation between physiological data and the level of stimulation. 
They also compared the accuracy of cognitive load detection using face video features, physiological features, and participant rating features with that obtained using fusion features. They concluded that classification with fusion features (i.e., not based on self-report alone) was more accurate. Furthermore, the measurement of human mental states based on physiological activity has also been investigated by Stikic et al. [1]. By integrating EEG and ECG features, they applied an unsupervised method for cognitive state recognition. However, unsupervised learning requires large amounts of data to find an appropriate pattern. In our study, data labeling relied on physiological activity. We used electrooculograms (EOG) and eye tracking to record eye movements, such as the blink rate and pupil size of participants. Several studies demonstrate that blink rate and pupillometry are correlated with engagement components [7][8][9][10][11][12][13]. The information from the EOG and eye tracker was used to label the data for the model. We classified the data into a high engagement level and a low engagement level.
We implemented a hybrid system that can be used to recognize engagement levels during cognitive tasks. Several studies on hybrid systems have reported promising characteristics. Ahn et al. [14] have suggested computational integration methods. In their review, they discussed multimodal EEG and NIRS systems, which are high-density types. Hong et al. [15] focused on the utility of integrating EEG and NIRS for locked-in syndrome patients. They mentioned that the proper selection of features will improve classification accuracy. In our study, we investigate the best features and the most common features that can be used in engagement recognition; the difference lies in the approach of the study. Ahn et al. [16] combined EEG, ECG, and NIRS by using 68 electrodes for EEG, ECG, and EOG and 8 NIRS channels in simulated driving. In our study, we use a two-electrode EEG, an ECG, and a two-channel NIRS. All of these sensors are wireless. Our previous work [17,18] used this system for monitoring the cognitive state of children with developmental disorders during a 7-year training period. In the present study, we apply this low-density hybrid system to engagement recognition.
We investigated nine types of linear and nonlinear features from EEG, ECG, and NIRS to find the most common features that can be used in engagement recognition. Linear and nonlinear features have previously been studied for mental state recognition, but only in stand-alone systems, such as EEG or ECG alone [19][20][21]. In our study, we adapted these features to the hybrid system. This step was improved by combining a feature selector with a classifier. We used the correlation-based feature selection (CFS) introduced by Hall [22] as the feature selector and k-nearest neighbor (KNN) as the classifier, following several comparisons with other classifiers. Although a combined CFS and KNN (CFS + KNN) algorithm with two search methods (i.e., best first search and greedy stepwise search) has been used by Hu et al. [23], our study applied a CFS + KNN algorithm to a low-density hybrid system and used only one search method. We chose best first search after demonstrating that it had the highest performance. To our knowledge, this is the first study to employ a hybrid system (EEG, ECG, and NIRS) with fewer than 10 channels and to apply a KNN classifier with a CFS feature selector for engagement recognition.

Participants
There were 18 participants in our experiment. All participants were Kyushu University students, with ages ranging from 21 to 28 (24.3 ± 2.3). All participants had normal visual function and were free of disability; 16 were right handed, and two participants were left handed. Participants were instructed not to consume any caffeine 2 h before the experiment because it could affect the HRV [24,25]. The study was conducted in accordance with the ethical principles of Kyushu University and the Declaration of Helsinki. Written informed consent was obtained from each participant before the experiment.

Engagement Tasks
The experiment was conducted between 10:30 a.m. and 1:30 p.m. Testing took place in a dimly lit room. We also recorded the participant's behavior using a webcam (Logicool C270, Logitech, Lausanne, Switzerland), which was located in front of the participant's face. Three types of engagement task were used: backward digit span (BDS) [2], forward digit span (FDS) [2], and arithmetic. Each task consisted of three levels: level one comprised 30 sets of four digits, level two comprised 30 sets of five digits, and level three used six digits. Most of the questions in this experiment were relatively simple and did not require any prerequisite knowledge or specific skills. However, a good level of attention and alertness was required to avoid making easy mistakes because the response time was limited to 20 s. Each trial started with the presentation of a central, white fixation dot on a dark background, which remained until the participant's eyes were detected by the eye tracker. Next, the cognitive question (i.e., the encoding session) appeared for 10 s, and the participant was instructed to respond within 20 s. All cognitive tasks were counterbalanced. Recording started after the practice session had finished.

BDS (Backward Digit Span)
In this task, digits were displayed for 10 s and participants were asked to type the digits in reverse order. The task can be seen in Figure 1.

FDS (Forward Digit Span)
In forward digit span, digits were displayed for 10 s and participants were asked to type the digits in forward order. The task can be seen in Figure 2.

Arithmetic
For the arithmetic task, a number was displayed for 10 s together with a letter. The participants were asked to perform the calculation using only the number. The question then appeared together with blank answer fields, and the participants were asked to type the answer within 20 s. The task can be seen in Figure 3.

Software and Apparatus
Stimuli were presented on a 17-inch CRT monitor (1024 × 768). Testing took place in a dimly lit room. Stimuli presentation was done using OpenSesame [26], using the legacy back-end for the display control and the PyGaze toolbox [27] for the eye tracker.

Eye Tracking
Before the start of each task, participants were positioned in front of an eye tracker (The EyeTribe tracker version 1, Copenhagen, Denmark). The distance from the participants' eyes to the eye tracker was estimated to be ~57 cm. The participants were asked to rest their heads on a chin rest. Eight participants were successfully calibrated in the 60 Hz mode, and three participants were successfully calibrated in the 30 Hz mode. We calibrated and validated the eye tracking system for each participant using a nine-point dot matrix. After validation, the eye tracker, which was integrated with the OpenSesame software, labeled each calibration point with the error in degrees of visual angle between the calibrated and validated measures. If the error at the calibration points did not exceed 1° and the greatest single-point error did not exceed 1°, the process continued. Before each trial, a one-point eye tracker recalibration was performed.

Electrophysiology
In this study, EEG, EOG, and ECG signals (Polymate Mini AP 108, Miyuki Giken Co., Ltd., Kasugai-city, Japan) were sent by Bluetooth to a computer. The sampling frequency was 500 Hz. To evaluate engagement recognition during a cognitive task, we recorded EEG at Fz and Pz, referenced to A1. These sites are strongly associated with cognitive activity [28][29][30]. The ECG was recorded on the chest (2-lead placement) [1,17,18]. We chose this position for the ECG to reduce the effect of movement artifacts when the participant responded to the tasks. We also placed two electrodes for a vertical EOG. Figure 4 shows the electrode placements.

Near-Infrared Spectroscopy
A spatially resolved continuous wave NIRS system (PocketNIRS; DynaSense Inc., Hamamatsu, Japan) was placed symmetrically to measure hemodynamic activity from the prefrontal region (Fp1 and Fp2). A black tensor bandage was wrapped around the subject's head to prevent ambient light from entering the sensors. The NIRS signal was sent via Bluetooth to the computer. The system used wavelengths of 735, 810, and 850 nm. The sampling frequency was 10.2 Hz. The NIRS position is shown in Figure 4.

Analysis for Engagement Recognition
During the answer session, participants shifted their gaze to the keyboard, which also caused movement artifacts. Therefore, to ensure high-quality data, we analyzed only the encoding session. The details of our analysis design are as follows.

Pupillometry
Pupillometry is concerned with changes in pupil size. Pupil diameter has long been known as a marker of cognitive load and attentional performance. A study by Van Den et al. [7] mentioned that pupil size could be used to track the focus of attention. In this study, we analyzed pupillometry using a custom program written in MATLAB 2017b. We applied a bandpass filter to suppress high-frequency noise and calculated the mean value and Z score, as shown in Equation (1):
$$Z_{\text{pupillometry}} = \frac{\mu_{\text{baseline}} - \mu_{\text{ptask}}}{sd_{\text{ptask}}} \tag{1}$$
where $Z_{\text{pupillometry}}$ is the Z score of pupillometry, $\mu_{\text{baseline}}$ is the mean baseline, $\mu_{\text{ptask}}$ is the mean pupil size during the encoding time, and $sd_{\text{ptask}}$ is the standard deviation of pupil size while executing the tasks.
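As a rough illustration of Equation (1), the Z score can be computed from a baseline window and an encoding window as follows. This is a minimal Python sketch rather than the MATLAB program used in the study; the function name `z_score`, the use of the sample standard deviation, and the example values are assumptions.

```python
from statistics import mean, stdev


def z_score(baseline, task):
    """Z score in the form of Equation (1): (mean baseline - mean task) / SD of task."""
    sd_task = stdev(task)  # sample SD; the text does not say which SD estimator was used
    if sd_task == 0:
        raise ValueError("task window has zero variance")
    return (mean(baseline) - mean(task)) / sd_task


# Hypothetical pupil sizes (arbitrary units) for one trial
baseline_window = [5.0, 5.0, 5.0, 5.0]
encoding_window = [1.0, 3.0]
print(z_score(baseline_window, encoding_window))
```

The same function could be reused for any measure that is compared with its baseline in this way.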

Blinking Rates
Blinking has been correlated with cognitive activity [15]. In this study, eye blinks were detected with the vertical EOG. To analyze the EOG signal, we used MATLAB 2017b. We first removed baseline drift. The EOG signal is characterized by a frequency range of 0.1 to 20 Hz and an amplitude between 25 and 3500 µV, so we applied a bandpass filter from 0.1 to 20 Hz. Peaks exceeding 50 µV were taken as eye blinks [12]. After that, we calculated the Z score of the data, as shown in Equation (2):
$$Z_{\text{blinkrate}} = \frac{\mu_{\text{baseline}} - \mu_{\text{blinkratetask}}}{sd_{\text{blinkrate}}} \tag{2}$$
where $Z_{\text{blinkrate}}$ is the Z score of the blink rate, $\mu_{\text{blinkratetask}}$ is the mean blink rate during the encoding time, $\mu_{\text{baseline}}$ is the mean baseline, and $sd_{\text{blinkrate}}$ is the standard deviation of the blink rate while executing the tasks.
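The blink criterion can be sketched as a threshold applied to an EOG trace that has already been drift-corrected and bandpass filtered. The Python snippet below is only an illustration: the function names, the rule of counting one blink per upward crossing of 50 µV, and the example trace are assumptions, not the MATLAB code used in the study.

```python
def count_blinks(eog_uv, threshold_uv=50.0):
    """Count upward crossings of the threshold in a filtered vertical EOG trace (microvolts)."""
    blinks = 0
    above = False
    for sample in eog_uv:
        if sample > threshold_uv and not above:
            blinks += 1  # a new excursion above the threshold starts here
        above = sample > threshold_uv
    return blinks


def blink_rate_per_min(eog_uv, fs_hz=500.0, threshold_uv=50.0):
    """Blink rate in blinks per minute for a trace sampled at fs_hz."""
    duration_s = len(eog_uv) / fs_hz
    return 60.0 * count_blinks(eog_uv, threshold_uv) / duration_s


# Hypothetical 10 s trace at 500 Hz with two short excursions above 50 µV
trace = [0.0] * 5000
trace[1000:1050] = [120.0] * 50
trace[3000:3050] = [200.0] * 50
print(count_blinks(trace), blink_rate_per_min(trace))
```

Counting crossings instead of individual samples keeps one long excursion from being counted as several blinks.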

Engagement Recognition
After the eye tracker had been calibrated, a 90 s baseline was recorded for each participant [13]. During this time, the participants were asked to relax, so that the baseline reflected a neutral situation. After the data were collected, we performed offline analysis. In our study, each datum was labeled as one of two classes: high engagement or low engagement. We used supervised learning for the data mining.

Data Labeling for Training Data
In our study, we adopted a supervised learning method, which made data labeling a crucial step. Data labeling was based on eye blinks and pupil size. Blink rates were obtained from the EOG, and pupil size was recorded by the eye tracker. We used the Z scores from Equations (1) and (2). Afterwards, we assigned a point to every datum based on the criteria in Table 1.
Table 1. Scoring criteria for the engagement index based on the Z score (columns: Point, Pupil, Blinking).
Following the scoring, we divided the data into high engagement and low engagement. If the sum of the pupil and blinking points was greater than 0, we classified the datum as high engagement, and if it was less than 0, we classified it as low engagement.
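The final labeling rule can be sketched as below. The points themselves come from Table 1 and are simply passed in as inputs here; the function name and the handling of a total of exactly 0, which the text does not specify, are assumptions.

```python
def label_engagement(pupil_point, blink_point):
    """Label one datum from its pupil and blinking points (see Table 1)."""
    total = pupil_point + blink_point
    if total > 0:
        return "high"
    if total < 0:
        return "low"
    return None  # a total of exactly 0 is not covered by the rule in the text


print(label_engagement(1, 1), label_engagement(-1, 0), label_engagement(1, -1))
```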

Feature Extraction
Our feature set combined the results of linear and nonlinear analyses. In total, there were 59 features in this study (i.e., 34 features from EEG, 7 features from ECG, and 18 features from NIRS). The features were extracted using a custom program written in MATLAB 2017b. Table 2 shows the types of features that we used in this study. From the linear parameters, we used the Hjorth parameters to characterize each signal y(t) (EEG, ECG, NIRS) by its activity (Equation (3)), mobility (Equation (4)), and complexity (Equation (5)). The Hjorth parameters have been used in several EEG studies [19,31,32], mostly because they have low computational complexity and can be calculated in real time. In our study, we used them for the EEG, ECG, and NIRS:
$$\text{activity} = \operatorname{var}(y(t)) \tag{3}$$
$$\text{mobility} = \sqrt{\frac{\operatorname{var}\left(\frac{dy(t)}{dt}\right)}{\operatorname{var}(y(t))}} \tag{4}$$
$$\text{complexity} = \frac{\text{mobility}\left(\frac{dy(t)}{dt}\right)}{\text{mobility}(y(t))} \tag{5}$$
The Kolmogorov complexity is an effective way to calculate signal complexity [32]. We used this parameter on the EEG and ECG signals.
The EEG signal was analyzed in 10 s windows. A third-order Butterworth bandpass filter with cutoff frequencies of 0.5 and 65 Hz was used to filter the data. After filtering, a Daubechies-8 wavelet was applied to obtain the power spectral density. After obtaining the power of the EEG oscillations, the relative power of each band, i.e., theta (θ) (Equation (6)), alpha (α) (Equation (7)), beta (β) (Equation (8)), and gamma (γ) (Equation (9)), was computed for each electrode:
$$\text{Relative } \theta = \frac{\text{power}_\theta}{\text{power}_\theta + \text{power}_\alpha + \text{power}_\beta + \text{power}_\gamma} \tag{6}$$
$$\text{Relative } \alpha = \frac{\text{power}_\alpha}{\text{power}_\theta + \text{power}_\alpha + \text{power}_\beta + \text{power}_\gamma} \tag{7}$$
$$\text{Relative } \beta = \frac{\text{power}_\beta}{\text{power}_\theta + \text{power}_\alpha + \text{power}_\beta + \text{power}_\gamma} \tag{8}$$
$$\text{Relative } \gamma = \frac{\text{power}_\gamma}{\text{power}_\theta + \text{power}_\alpha + \text{power}_\beta + \text{power}_\gamma} \tag{9}$$
We calculated the change in the RR interval of the ECG for every 10 s window. Then, we calculated the HRV activity by using a fast Fourier transform.
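The Hjorth parameters and the relative band powers can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the MATLAB implementation used in the study: the derivative is approximated by the first difference of the samples, the population variance is used, the band powers are taken as already computed, and all function names and example values are hypothetical.

```python
from math import pi, sin, sqrt
from statistics import pvariance


def diff(y):
    """First difference, used here as a discrete stand-in for dy/dt."""
    return [b - a for a, b in zip(y, y[1:])]


def mobility(y):
    """Square root of var(dy/dt) / var(y), as in Equation (4)."""
    return sqrt(pvariance(diff(y)) / pvariance(y))


def hjorth(y):
    """Return (activity, mobility, complexity) in the sense of Equations (3) to (5)."""
    activity = pvariance(y)
    mob = mobility(y)
    complexity = mobility(diff(y)) / mob
    return activity, mob, complexity


def relative_power(band_power):
    """Relative power of each band, as in Equations (6) to (9)."""
    total = sum(band_power.values())
    return {band: power / total for band, power in band_power.items()}


# Hypothetical test signal: ten full cycles of a unit sine wave over 1000 samples
wave = [sin(2 * pi * 10 * n / 1000) for n in range(1000)]
activity, mob, comp = hjorth(wave)
rel = relative_power({"theta": 4.0, "alpha": 3.0, "beta": 2.0, "gamma": 1.0})
print(activity, mob, comp, rel)
```

For a pure sine wave the complexity is close to 1, which makes it a convenient sanity check for an implementation of this kind.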
In this study, we calculated the high frequency (HF) component of the HRV to measure the level of parasympathetic nerve activity in the autonomic nervous system. The HF component lies between 0.15 and 0.4 Hz. The maximum power spectral density and the power density integral were calculated after obtaining the power spectral density. Heart rate (HR) was also calculated, using Equation (10):
$$\text{HR} = \frac{60}{\operatorname{median}(\text{RR interval}) / \text{sampling frequency}} \tag{10}$$
We also calculated the spectral entropy of the EEG and ECG data as a nonlinear feature. Spectral entropy quantifies a system through its rate of information loss or generation and characterizes its randomness, regularity, and predictability.
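Equation (10) and the spectral entropy can be illustrated with the short Python sketch below. Two assumptions are made: the RR intervals are expressed in samples, so that dividing by the sampling frequency gives seconds, and the spectral entropy is taken to be the Shannon entropy of the normalized power spectrum, which is the usual definition but is not spelled out in the text. The function names are hypothetical.

```python
from math import log2
from statistics import median


def heart_rate_bpm(rr_samples, fs_hz=500.0):
    """Equation (10): the median RR interval in samples, converted to seconds, then to beats per minute."""
    return 60.0 / (median(rr_samples) / fs_hz)


def spectral_entropy(psd, normalize=True):
    """Shannon entropy of a power spectrum scaled to sum to 1 (assumed definition)."""
    total = sum(psd)
    probs = [p / total for p in psd if p > 0]
    entropy = -sum(p * log2(p) for p in probs)
    if normalize and len(psd) > 1:
        entropy /= log2(len(psd))  # scale to the range 0 to 1
    return entropy


# Hypothetical inputs: RR intervals of 500 samples at 500 Hz, a flat and a single-peak spectrum
print(heart_rate_bpm([500, 500, 500]))
print(spectral_entropy([1.0, 1.0, 1.0, 1.0]), spectral_entropy([0.0, 5.0, 0.0, 0.0]))
```

A flat spectrum gives the maximum normalized entropy and a single spectral peak gives the minimum, which matches the reading of entropy as a measure of irregularity.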

Predictive Modeling
We used the Weka 3.8 [32] data mining software for machine learning. We divided each participant's dataset into two parts, using 70% as the training set and a contiguous 30% portion as the testing set. In our study, each participant contributed 261 samples, of which approximately 182 were used as training data. The total number of training samples exceeded 2000. As a first step, we investigated the best validation method by comparing cross-validation with several numbers of folds; we also investigated "leave one participant out" cross-validation (LOSOXV) and hold-out validation. Afterwards, we compared the machine learning algorithms with and without a feature selector, and examined the feature selector in combination with several classifiers such as KNN and support vector machine (SVM). After we found the best performing algorithm, we applied it in the comparison between the hybrid and stand-alone systems for engagement recognition. An overview of the analysis system is shown in Figure 5.
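The per-participant split and the leave-one-participant-out scheme can be sketched as follows. This Python sketch is illustrative only (the study used Weka); it assumes that the contiguous 30% test portion is the trailing part of each participant's data, which the text does not state, and the function names are hypothetical.

```python
def contiguous_split(samples, train_fraction=0.7):
    """Split one participant's samples into a leading training part and a trailing test part."""
    n_train = int(len(samples) * train_fraction)
    return samples[:n_train], samples[n_train:]


def leave_one_participant_out(participant_ids):
    """Yield (training participants, held-out participant) pairs."""
    for held_out in participant_ids:
        train_ids = [p for p in participant_ids if p != held_out]
        yield train_ids, held_out


one_participant = list(range(261))  # 261 samples per participant, as in the text
train, test = contiguous_split(one_participant)
print(len(train), len(test))
folds = list(leave_one_participant_out(["p1", "p2", "p3"]))
print(folds[0])
```

With 261 samples, the 70% cut gives 182 training samples per participant, in line with the figure quoted above.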
The feature selector is important for reducing the time needed to find the best features to be used in a study case [5,22], and it also can increase the accuracy of classification. The CFS is a simple filter algorithm that ranks feature subsets according to a correlation-based heuristic evaluation function [5]. This feature selector calculates features that are highly correlated with the class and uncorrelated with each other. Irrelevant features would be ignored because they have low correlation with the class. Redundant features would be screened out, as they are highly correlated The feature selector is important for reducing the time needed to find the best features to be used in a study case [5,22], and it also can increase the accuracy of classification. The CFS is a simple filter algorithm that ranks feature subsets according to a correlation-based heuristic evaluation function [5]. This feature selector calculates features that are highly correlated with the class and uncorrelated with each other. Irrelevant features would be ignored because they have low correlation with the class. Redundant features would be screened out, as they are highly correlated with one or more of the remaining feature. The acceptance of a feature will depend on the extent to which it predicts classes in an area if the instance space was not already predicted by other features. equations (11) through (13) show the calculation process: M s = (k(r cf ))/ √ (k + k(k − 1)(r ff )) (11) Sensors 2018, 18, x 9 of 16 with one or more of the remaining feature. The acceptance of a feature will depend on the extent to which it predicts classes in an area if the instance space was not already predicted by other features. 
equations (11) through (13) show the calculation process: Ms = (k(rcf))/√(k + k(k − 1)(rff)) (11) CFS = max¦s [(r(cf1) + r(cf2) +⋯+ r(cfk))/√(〖k + 2(r(f1f2) + ⋯+ r(fifj) +⋯+ r(fkf1)))] This study used a KNN algorithm, which is an approach for data classification that estimates the probability that a data point belongs to one group or another, depending on the group membership of the data points nearest to it. The general details of the predictive learning in our study can be seen in Figure 6. In this study, the accuracy of performance was calculated on the basis of several variables, such as the true positive (TP) rate. The TP rate was calculated as the proportion of cases that were correctly classified as class high or low among all cases that are truly of the same corresponding class, i.e., the extent to which part of the classes was captured. The TP rate value is also equivalent to the recall. We also calculated the false positive (FP) rate. The FP rate is the proportion of cases that were classified as class high or low but belong to a different class, among all cases that are not of class high or low. The precision is the proportion of the cases that truly have a class low or high among all those that were classified as class high or low. The recall (i.e., sensitivity) is the fraction of relevant instances that have been retrieved over the total amount of relevant instances. The F-measure is simply twice time value of the precision and recall divided by the sum value of precision and recall. We also investigated the area under a receiver operating curve (ROC) for our performance evaluation.

Classification Algorithm
In correlation-based feature selection (CFS), a feature is treated as redundant if it is highly correlated with one or more of the remaining features. Whether a feature is accepted depends on the extent to which it predicts classes in areas of the instance space not already predicted by other features. Equations (11) through (13) show the calculation process:
$$M_s = \frac{k\, r_{cf}}{\sqrt{k + k(k-1)\, r_{ff}}} \tag{11}$$
$$\mathrm{CFS} = \max_{S}\left[\frac{r_{cf_1} + r_{cf_2} + \cdots + r_{cf_k}}{\sqrt{k + 2\left(r_{f_1 f_2} + \cdots + r_{f_i f_j} + \cdots + r_{f_k f_1}\right)}}\right]$$
where $M_s$ is the heuristic "merit" of a feature subset $S$ containing $k$ features, $r_{cf}$ is the mean feature-class correlation, $r_{ff}$ is the average feature-feature intercorrelation, and $k$ is the number of features.
This study used a KNN algorithm, an approach to data classification that estimates the probability that a data point belongs to one group or another on the basis of the group membership of the data points nearest to it. The general details of the predictive learning in our study can be seen in Figure 6.
Performance was evaluated using several measures. The true positive (TP) rate is the proportion of cases correctly classified as a given class (high or low) among all cases that truly belong to that class, i.e., the extent to which the class was captured; it is equivalent to the recall. The false positive (FP) rate is the proportion of cases classified as a given class that actually belong to the other class, among all cases that are not of that class. The precision is the proportion of cases that truly belong to a given class among all cases classified as that class. The recall (i.e., sensitivity) is the fraction of relevant instances retrieved over the total number of relevant instances. The F-measure is twice the product of the precision and recall divided by the sum of the precision and recall. We also examined the area under the receiver operating characteristic (ROC) curve for our performance evaluation.
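As an illustration of these measures, the following Python sketch computes them from the confusion counts of a single target class. It is not the Weka implementation used in this study; the function name and the example counts are hypothetical, and all denominators are assumed to be non-zero.

```python
def class_metrics(tp, fp, fn, tn):
    """Per-class measures from confusion counts (hypothetical helper).

    tp: cases of the target class classified as the target class
    fp: cases of the other class classified as the target class
    fn: cases of the target class classified as the other class
    tn: cases of the other class classified as the other class
    """
    tp_rate = tp / (tp + fn)      # identical to recall (sensitivity)
    fp_rate = fp / (fp + tn)
    precision = tp / (tp + fp)
    # F-measure: twice the product of precision and recall over their sum
    f_measure = 2 * precision * tp_rate / (precision + tp_rate)
    return {
        "tp_rate": tp_rate,
        "fp_rate": fp_rate,
        "precision": precision,
        "recall": tp_rate,
        "f_measure": f_measure,
    }


# Made-up counts for one class, for illustration only
metrics = class_metrics(tp=70, fp=20, fn=30, tn=80)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

In a two-class problem such as high versus low engagement, the function would be called once per class, swapping which class is treated as the target.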

Classification Algorithm
We adopted several classifiers and investigated the best validation strategy for evaluating recognition performance. We also investigated which feature selector most reduces the time needed to search feature combinations for the best features. Feature selection was conducted on the training set, and performance was then evaluated on the test set. This procedure was iterated until each participant's data had been tested, which reduces the risk of overfitting. Due to technical problems and participant health conditions, data from 11 participants were analyzed in this study.
The sample statistics for each engagement level can be seen in Table 3. The high engagement class contains more samples than the low engagement class: 62% of the samples are high engagement and 38% are low engagement. Because the data were imbalanced, we used the class balancing filter in Weka 3.8 [33] to balance the sample data. This filter reweights the instances so that each class has the same total weight, while the total sum of weights across all instances is maintained. Only the weights in the first batch of data received by the filter are changed. The result can be seen in Table 4. The filter increased the instance weights in the low class and decreased them in the high class, so that each class accounts for 50% of the total weight.
To check the effect of data balancing, we calculated the accuracy when all 59 features were used and when only the selected features were used, and compared the results (Table 5). The combination of the balancing filter with CFS and KNN gave the highest accuracy among the methods compared. We further examined how the CFS feature selector and its search method perform with other classification algorithms. Table 6 shows the comparison between the CFS and classifiers.
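The reweighting performed by the balancing filter can be sketched as follows. This is a minimal Python illustration of the idea, not Weka's code; it assumes every instance starts with a weight of 1, and the function name and the 38/62 label list are hypothetical stand-ins for the class proportions reported above.

```python
from collections import Counter


def class_balance_weights(labels):
    """Return one weight per instance (hypothetical helper).

    Every class receives the same total weight, and the sum of all
    weights equals the number of instances (unit initial weights assumed).
    """
    counts = Counter(labels)
    per_class_total = len(labels) / len(counts)
    return [per_class_total / counts[label] for label in labels]


# Illustrative sample with a 38% / 62% class split
labels = ["low"] * 38 + ["high"] * 62
weights = class_balance_weights(labels)

low_total = sum(w for w, y in zip(weights, labels) if y == "low")
high_total = sum(w for w, y in zip(weights, labels) if y == "high")
print(f"low: {low_total:.2f}, high: {high_total:.2f}, all: {sum(weights):.2f}")
```

Each low-class instance ends up with a weight above 1 and each high-class instance with a weight below 1, which is the shift described for Table 4.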
In this step, we compared the combination of CFS and a support vector machine (SVM) with CFS + KNN at several values of k. From the results, we conclude that the combination of CFS and KNN, with a k value of 9, performed the best.
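The merit heuristic of Equation (11), which drives the CFS step in these combinations, can be illustrated with a short Python sketch. It assumes the correlations are supplied as non-negative magnitudes; the function name and the numerical values are hypothetical and are not taken from our data.

```python
import math


def cfs_merit(feature_class_corr, feature_feature_corr):
    """Merit of a feature subset: k * r_cf / sqrt(k + k(k-1) * r_ff).

    feature_class_corr: one correlation with the class per feature
    feature_feature_corr: one correlation per pair of features
                          (empty when the subset has a single feature)
    """
    k = len(feature_class_corr)
    r_cf = sum(feature_class_corr) / k
    if feature_feature_corr:
        r_ff = sum(feature_feature_corr) / len(feature_feature_corr)
    else:
        r_ff = 0.0
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)


# One feature alone, then the same feature plus a second one that is
# either weakly or strongly correlated with it (made-up values)
single = cfs_merit([0.6], [])
with_complementary = cfs_merit([0.6, 0.5], [0.1])
with_redundant = cfs_merit([0.6, 0.5], [0.9])
print(single, with_complementary, with_redundant)
```

With these values, adding a weakly intercorrelated feature raises the merit above that of the single feature, whereas adding a strongly intercorrelated one lowers it, which is the behavior that leads CFS to reject redundant features.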

Comparison of Stand-Alone and Hybrid Systems
After determining the best algorithm, we adopted it in the hybrid system and calculated the recognition performance for each class. As shown in Table 7, the precision for each class (low = 0.735, high = 0.771) was highest in the hybrid system during initial training. Other performance measures in initial training, such as the ROC area, were also highest for the hybrid system compared with the stand-alone systems. We also evaluated the accuracy of each stand-alone system using the CFS + KNN algorithm. As shown in Table 8, the hybrid system achieved the highest accuracy (71.65 ± 0.16%). We further compared the results of the SVM and CFS + KNN classifiers, which supported our final choice for engagement recognition with the hybrid system. The standard deviation among participants was higher for the SVM (SD = 0.2) than for the KNN method (SD = 0.16). The details can be seen in Table 9.

Discussion
In this study, we explored a novel way of combining EEG, ECG, and NIRS with a low density of electrodes/channels for engagement recognition. The combination of these three different approaches is commonly termed a hybrid system. The integration of NIRS and EEG is complementary, because they enable simultaneous analysis of the neuronal and hemodynamic components of brain activity and do not interfere with each other [34,35]. Hybrid systems, especially low-density hybrid systems, could be effective for engagement recognition under study conditions. They are also practical, especially under naturalistic conditions. In a previous study, we implemented a hybrid system to study children with intellectual disability during cognitive training [17,18].
In this study, we sought to implement a low-density hybrid system for engagement recognition during cognitive tasks. To reach this goal, we had to solve several issues, for example, the problem of data imbalance during data labeling. Data imbalance can cause overfitting, and we addressed it by using the class balancer in the Weka 3.8 data mining software. This method was chosen because other methods, such as undersampling, would prevent the classifier from learning the characteristics of the excluded data instances [29]. We then investigated the best validation method to adopt in this study, in the hope of minimizing overfitting. We chose cross-validation after testing 10-fold, 3-fold, and 5-fold cross-validation, leave-one-participant-out cross-validation (LOSOXV), and hold-out methods; the results showed that the accuracy with 10-fold cross-validation was higher than with the other validation methods. Cross-validation also reduces the validation time. Hu et al. [23] also reported that cross-validation is the best method for validation, and they used 3-fold cross-validation. In our study, we chose 10-fold rather than 3-fold cross-validation because it allows the training set to contain 90% of the data instances while the validation set contains the other 10%. With cross-validation, our trained model did not overfit to a specific training subset but rather had the ability to learn from each data fold.
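The 90%/10% partition produced by 10-fold cross-validation can be illustrated with the Python sketch below. It is a simplified illustration rather than the Weka procedure used in this study: the function name and the sample count of 200 are hypothetical, and the folds are taken in order without the shuffling or stratification that a practical implementation would normally apply.

```python
def k_fold_indices(n_samples, n_folds):
    """Yield (train, test) index lists; fold sizes differ by at most one."""
    indices = list(range(n_samples))
    base, remainder = divmod(n_samples, n_folds)
    start = 0
    for fold in range(n_folds):
        size = base + (1 if fold < remainder else 0)
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


# Hypothetical data set of 200 instances
ten_fold = list(k_fold_indices(200, 10))
three_fold = list(k_fold_indices(200, 3))

train, test = ten_fold[0]
print(len(train), len(test))                     # training / validation sizes, 10-fold
print([len(te) for _, te in three_fold])         # validation sizes, 3-fold
```

Every instance appears in exactly one validation fold, so each instance is used for both training and validation across the full procedure.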
The other issue we tried to solve was the generalizability of features across participants. Li et al. [19] offered a way to find EEG features for cross-participant emotion recognition by exploring 18 kinds of linear and nonlinear EEG features. They examined the effectiveness of these 18 features on the Dataset for Emotion Analysis using Physiological Signals (DEAP) and the SJTU Emotion EEG Dataset (SEED). Their results showed that the Hjorth parameter was suitable for analyzing EEG signals. In their evaluation, they found that the Hjorth parameter in the beta rhythm led to the best mean recognition accuracy in cross-subject emotion recognition. In our study, we used nine types of linear and nonlinear features to find the features most commonly used in engagement recognition. As shown in Table 10, the Hjorth parameters were the features most frequently selected by CFS + KNN. Oh et al. [33] applied the Hjorth parameter for extracting EEG features. They found that the Hjorth parameter increased their EEG classification accuracy by 4.4%, on average. On this basis, we suggest that the Hjorth parameter could be a useful feature for EEG, ECG, and NIRS in engagement recognition. The results of the feature selection also showed that nonlinear features (e.g., spectral entropy) were not chosen for engagement recognition in either the stand-alone systems or the hybrid system.
Following feature extraction, we built a predictive model by using several classifiers. The KNN with k = 9 was selected for this system after we compared it with other classifiers and kernels. For example, we investigated an SVM with a polynomial kernel, but its accuracy (70.78 ± 0.19%) was lower than that of the KNN (71.65 ± 0.16%). In the study by Hu et al. [23], the KNN classifier had the highest accuracy among the classifiers tested, especially when combined with a CFS feature selector. In their study, the k value obtained for their system was 3. In our study, we investigated the KNN classifier with several k values (k = 1, k = 3, k = 9). The KNN with k = 1 has an accuracy of 65.64 ± 0.14%; with k = 3, an accuracy of 70.34 ± 0.16%; and with k = 9, an accuracy of 71.65 ± 0.16%. With k = 1, the predictions were strongly affected by noise, because the larger the k value, the more smoothing takes place. From this result, we decided to implement the KNN method with k = 9. Palaniappan et al. [36] compared the performance of SVM and KNN for diagnosing respiratory pathologies. Their result also showed that KNN has a better accuracy than SVM. Although the combination of a CFS and KNN algorithm has been used by Hu et al. [23] for EEG attention recognition, their study used two types of search methods (i.e., best first search and greedy stepwise search). In our study, we applied this algorithm to a low-density hybrid system for engagement recognition during cognitive tasks and used only one search method (i.e., best first search), after we found that the best first search performed better than the greedy stepwise search.
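The smoothing effect of a larger k can be illustrated with a minimal KNN sketch in Python. It is a toy example rather than the Weka classifier used in this study: the Euclidean distance, the one-dimensional feature values, and the single mislabeled training point are all assumptions made for illustration.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k):
    """Majority vote among the k training points nearest to the query."""
    neighbors = sorted(
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in neighbors[:k])
    return votes.most_common(1)[0][0]


# Toy one-dimensional data: a "low" cluster near 0 and a "high" cluster
# near 1 that contains one noisy "low" label at 1.0
train_X = [(0.0,), (0.1,), (0.2,), (0.8,), (0.9,), (1.0,), (1.1,), (1.2,)]
train_y = ["low", "low", "low", "high", "high", "low", "high", "high"]
query = (1.01,)

for k in (1, 3, 5):
    print(k, knn_predict(train_X, train_y, query, k))
# k = 1 follows the noisy neighbor ("low"); k = 3 and k = 5 give "high"
```

Odd values of k, such as those compared in our study, avoid tied votes in a two-class problem.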

Features Selected
Hybrid: activity (Hjorth parameter) (Fz), power density integral gamma (Fz), relative alpha (Fz), relative beta (Fz), activity (Pz), power density integral beta (Pz), relative power gamma (Pz), complexity ECG (Hjorth parameter), mobility ECG (Hjorth parameter), activity ECG (Hjorth parameter), mobility total (Fp1), activity total (Fp1), complexity deoxy (Fp1), mobility deoxy (Fp1), activity deoxy (Fp1), mobility total (Fp2), complexity deoxy (Fp2), mobility deoxy (Fp2)
EEG: activity (Fz), alpha density (Fz), theta density (Fz), gamma density (Fz), Kolmogorov complexity (Fz), relative power, relative alpha (Fz), relative beta (Fz), relative gamma (Fz), activity (Pz), alpha density (Pz), beta density (Pz), gamma density (Pz), relative alpha (Pz), relative beta (Pz), relative gamma (Pz)
ECG: activity, mobility, complexity (Hjorth parameters)
NIRS: activity [total (Fp1), deoxy (Fp1), deoxy (Fp2)], mobility [total (Fp1), deoxy (Fp1), total (Fp2), deoxy (Fp2)], complexity [total (Fp1), deoxy (Fp1), deoxy (Fp2)] (Hjorth parameters)

The performance of our algorithm was evaluated using the Matthews correlation coefficient (MCC). According to Chicco [37], an MCC value closer to +1 means that the tested algorithm performs better, whereas a value closer to −1 means that it performs worse. By this criterion, the proposed hybrid system had the highest MCC value among all systems. The precision-recall curve (PRC) value for the proposed hybrid system was also the highest. The details can be seen in Table 11.
The performance of participants in the high engagement and low engagement experiments is shown in Figure 7. Based on a t-test across the 11 participants, there were no significant differences between high engagement and low engagement in reaction time (p > 0.05) or response accuracy (p > 0.05). On the other hand, blinking rate (p < 0.05) and pupillometry (p < 0.05) showed significant differences between the two states.
This may have happened because our task contained multiple levels. Level one consisted of a series of 30 sets of four digits, level two of 30 sets of five digits, and level three of sets of six digits. Aghajani et al. [5] noted that using performance-based information for data labeling can be misleading, since multiple levels of tasks could be grouped together with the same level of performance. This could explain the reliability of physiological data in our recognition.
However, there are still some limitations in this study. We have not studied the effect of fatigue on brain activity, and we will investigate it in future studies. Due to the small participant pool, sampling could also be a limitation of this study. Even though the small number of participants used for training (n = 11) may limit our conclusions, the preliminary results demonstrated the capability of a low-density hybrid EEG, ECG, and NIRS system for use in engagement recognition.

Conclusions
This study sought to investigate the usability of a low-density hybrid system in engagement recognition. We used 10-fold cross-validation to validate this hybrid system, after comparing it with other validation methods. We adopted CFS + KNN for engagement recognition because it showed the highest accuracy (71.65 ± 0.16%) among the selector and classifier combinations compared. When the hybrid and stand-alone systems were compared, the hybrid system showed the highest accuracy. We also found that the Hjorth parameter was useful for engagement recognition. From this, we conclude that the low-density hybrid system is suitable for engagement recognition.