A Deep Neural Network for Working Memory Load Prediction from EEG Ensemble Empirical Mode Decomposition

Mild Cognitive Impairment (MCI) and Alzheimer’s Disease (AD) are frequently associated with working memory (WM) dysfunction, which is also observed in various neuropsychiatric disorders, including depression, schizophrenia, and ADHD. Early detection of WM dysfunction is essential to predict the onset of MCI and AD. Artificial Intelligence (AI)-based algorithms are increasingly used to identify biomarkers for detecting subtle changes in loaded WM. This paper presents an approach using electroencephalograms (EEG), time-frequency signal processing


Introduction
Neurodegenerative diseases, including Mild Cognitive Impairment (MCI) and Alzheimer's Disease (AD), as well as neuropsychiatric disorders such as depression, schizophrenia, and ADHD, have been consistently linked with dysfunction in the cognitive system known as Working Memory (WM) [1]. Working memory load refers to the amount of information temporarily stored in WM at a given instant [2]. The role of WM is critical, particularly during the initial phases of MCI. The WM decline is initially mild and unnoticeable and later develops into AD. The use of AI-based biomarkers to detect subtle changes in memory at an early stage is becoming increasingly important in identifying the onset of MCI and AD [3,4]. The decline in working memory performance, particularly at lower cognitive loads, has been found to predict the onset of MCI, which is an early stage of AD [5,6]. Early identification of AD or dementia is critical for timely interventions and treatments. Providing subjects with an application tool for assessing WM can aid in identifying the initial stages of these conditions. However, existing EEG-based methods for predicting WM load in normal subjects and subjects with MCI have been limited by low accuracies [7]. Therefore, developing more accurate methods for predicting WM load in these subjects remains an important challenge. EEG signals provide valuable information about the time-varying dynamics of neural activity [8]. Time-frequency decomposition methods, such as Ensemble Empirical Mode Decomposition (EEMD), are frequently employed for extracting time-frequency features from EEG signals. EEMD is an adaptive signal-processing technique that separates the EEG signal into a set of Intrinsic Mode Functions (IMFs), each representing a different time scale of activity in the signal [9,10]. By analyzing the IMFs, it is possible to identify neural activity patterns linked to specific cognitive processes, such as WM.
Non-invasive EEG is a low-cost method for acquiring brain signals from a subject's scalp, making it suitable for assessing WM [7,11,12]. EEG recordings can provide valuable insights into the neural processes underlying WM dysfunction. Numerous studies have been conducted on EEG-based biomarkers for WM, but few have developed a patient-specific application tool that accurately predicts the WM load using a reduced number of EEG electrodes [13,14]. Independent Component Analysis (ICA) is a well-established technique for transforming EEG data [15,16]. Addressing these gaps is of utmost importance for the development of accurate and efficient tools for identifying early stages of cognitive impairments.
The present study has two aims. First, we aim to develop a Deep Neural Network (DNN) architecture trained on time-frequency features computed from EEG responses elicited while subjects engage in visuospatial WM tasks, in order to accurately classify the WM load in normal subjects and those with MCI. Second, we seek to employ the ICA method to select a reduced number of independent EEG electrode channels from the original set of 16 channels. By training the DNN on time-frequency features, we aim to overcome the low accuracies associated with current EEG-based WM load prediction methods. By reducing the number of channels used in the analysis, the study aims to streamline the analysis process, reduce complexity, and enhance the overall accuracy of the WM load classification. The following contributions emphasize the accuracy of the WM load prediction:
(1) The reduction in the number of EEG channels used for time-frequency feature extraction.
(2) A novel workflow for WM load prediction that combines two powerful tools: the time-frequency analysis method of EEMD and the AI method of DNN.
(3) A robust approach utilizing a subset of scalp EEG electrodes for predicting WM load.
We organize the remaining sections of this work as follows. Section 2 provides related work on working memory. In Section 3, we present the materials and methodology. Section 4 provides results, and Section 5 the discussion. Finally, conclusions are given in Section 6.

Literature Review
The concept of memory was first defined by Hermann Ebbinghaus in 1885 [17] and later classified as primary and secondary memory. There are two categories of memory: long-term memory (LTM) and short-term memory (STM) [18]. WM is similar to STM and involves cognitive processes such as attention, reasoning, decision-making, and language comprehension. Figure 1 illustrates a model for information processing in the brain [19], showing how WM acts as an intermediary between sensory and long-term memory. The WM cognitive system acts as a channel for storing and transferring information to long-term memory. However, only the information relevant to the task at hand is held in WM, while irrelevant information is discarded. For instance, while performing an arithmetic operation, distracting information such as background noise or body temperature is ignored, and only the numbers used for the operation are stored in WM. This information is not stored permanently and is forgotten after some time. Sensory memory filters out distracting information, and incoming information is either stored in long-term memory or discarded through WM. This process is crucial for efficient cognitive functioning [1].

Figure 1. Model for information processing in the brain.
Furthermore, the brain's resources for storing information are limited during cognitive tasks. The memory load that the WM experiences while performing a cognitive task is referred to as cognitive load or WM load [20]. Cognitive Load Theory (CLT) helps to understand what happens when subjects perform cognitive functions with increasing WM load [21]. According to CLT, reducing cognitive overload in an individual is crucial for optimal learning. The three types of cognitive loads are intrinsic load, germane load, and extraneous load [22]. Intrinsic load encapsulates the intricacy of new information, germane load links new data with the current knowledge, and extraneous load is the distracting information while performing a cognitive task.
A recent study by Puszta et al. [23] analyzed EEG datasets to understand the effect of stimulus modality on EEG patterns during associative WM load. The support vector machine algorithm was used to predict WM load and stimulus modality using power, phase connectivity, and cross-frequency coupling values. Under a low WM load, alpha and theta frontal-parietal connectivity were highest, and the prediction accuracy for WM load was ≥75%. The study validated findings from earlier studies and highlighted the importance of frontal-parietal connectivity in the WM load. In an fMRI experiment, Collins et al. [24] studied how working memory and reinforcement learning processes interact. They found that learning is best explained as a mixture of a fast, delay-sensitive WM process and slower reinforcement learning. The neural interaction between the two systems was related to individual differences in using WM for learning. Eryilmaz et al. [25] identified connectivity fingerprints that differentiate task states representing different levels of WM load. The most critical connectivity pairs for classifying high and low WM loads were discovered using support vector machines. Network connectivity profiles derived from tasks with high WM load revealed that ventral attention and default-mode networks were the most predictive of load-related increases in response times.
The author in [26] proposed a new approach to predict WM tasks from EEG signals using data-driven functional linear regression. The proposed method uses a B-spline approximation of functional principal components and LASSO feature selection to extract critical features from frontal electrodes. The results show a strong linear association between actual observations and out-of-sample predictions with an R-square of 0.72. This study represents the first attempt to predict WM proficiency from N-back task tests using EEG signals. Ben-Artzi et al. [27] studied how WM capacity and load manipulation affect the assignment of credit to irrelevant features of the environment. Individual differences in visual WM capacity were found to predict outcome-irrelevant learning. However, it is noted that further research is needed in different modules to determine whether these differences are applicable in other contexts. Mirjalili et al. [28] proposed a comprehensive process to address challenges in single-trial EEG classification, including temporal overlap, feature selection stability, and classification accuracy significance. The model identified 4 to 10 brain regions and oscillations where ERD and ERS predict an individual's performance. The mean prediction accuracy for 50 participants was 69.51%, with a standard deviation of 8.41. Accuracy was significantly above chance in 34 participants.

Materials and Methods
The EEG signal processing and DNN-based WM load prediction model comprises five stages. Figure 2 illustrates the steps in processing the EEG signal to predict WM load. During the N-back and block-tapping task sessions, subjects were instructed to wear an EEG cap connected to a 16-channel OpenBCI EEG data collection board, as shown in Figure 3a. The N-back task is a versatile cognitive assessment that requires information retention, continual updating, and interference resolution. Because of its comprehensive nature, the N-back task has found extensive application in WM evaluation in human subjects [29]. It keeps the participant's WM system continuously engaged at its limit, thereby stimulating an increase in WM function, hence its widespread use for WM load prediction [30]. In addition to the N-back task, we created a block-tapping task in which the subject was required to remember the sequence in which blocks appeared in a spatial grid. Together, these two visuospatial WM tasks formed the task set used in this work for WM load prediction. The EEG cap was adjusted to fit the subject's head and ensure proper electrode placement. The block-tapping task consisted of tapping blocks in response to visual cues displayed on a computer screen. During the task, EEG signals were recorded and saved for offline analysis. Figure 3b shows a subject wearing the EEG cap during the task, and Figure 3c shows the subject performing the block-tapping task.
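To make the N-back rule concrete, the following minimal sketch marks which items in a stimulus sequence are targets (an item is a target when it matches the item shown n steps earlier) and scores a set of responses. It is only an illustration of the task logic, not the software used in the study; the function names, the integer encoding of grid positions, and the example sequence are hypothetical.

```python
def nback_targets(sequence, n):
    """Return a list of booleans: True where an item matches the item n steps back."""
    return [i >= n and sequence[i] == sequence[i - n] for i in range(len(sequence))]


def score_responses(sequence, responses, n):
    """Fraction of trials on which the response (True = 'match') was correct."""
    targets = nback_targets(sequence, n)
    correct = sum(t == r for t, r in zip(targets, responses))
    return correct / len(sequence)


# Hypothetical stimulus stream: integers stand for positions in a spatial grid.
stimuli = [3, 5, 3, 3, 1, 3]
one_back = nback_targets(stimuli, 1)
two_back = nback_targets(stimuli, 2)
perfect_score = score_responses(stimuli, two_back, 2)
```

Raising n increases how many items must be held and updated at once, which is why the task lends itself to low versus high WM load conditions.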
The raw EEG data were bandpass-filtered, and ICA was used to select the most significant electrode channels. By applying ICA to the EEG data, we efficiently reduced the number of electrodes while preserving signal quality. The methodology involves formulating the problem as X = AS, whitening the mixed EEG signals (X) to obtain X_w, estimating the unmixing matrix (W), obtaining independent components (S_est = W X_w), selecting a relevant subset of ICs, and reconstructing reduced EEG signals (X_red = A_red S_red).
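The ICA steps listed above can be sketched with NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses a symmetric FastICA update with a tanh nonlinearity (one common fixed-point scheme), two synthetic sources in place of real EEG channels, a hypothetical mixing matrix, and an arbitrary choice of which component to retain. The criterion used in the study to select ICs and electrodes is not specified in the text.

```python
import numpy as np


def whiten(x):
    """Centre and whiten channels-by-samples data X; returns (X_w, V)."""
    xc = x - x.mean(axis=1, keepdims=True)
    cov = xc @ xc.T / xc.shape[1]
    eigvals, eigvecs = np.linalg.eigh(cov)       # cov = E D E^T
    v = np.diag(eigvals ** -0.5) @ eigvecs.T     # V = D^(-1/2) E^T
    return v @ xc, v


def sym_orthogonalize(w):
    """Return (W W^T)^(-1/2) W so that the rows of W are orthonormal."""
    s, u = np.linalg.eigh(w @ w.T)
    return u @ np.diag(s ** -0.5) @ u.T @ w


def fast_ica(x_w, n_iter=200, tol=1e-8, seed=0):
    """Estimate the unmixing matrix W from whitened data (tanh fixed-point update)."""
    n_comp, n_samples = x_w.shape
    rng = np.random.default_rng(seed)
    w = sym_orthogonalize(rng.normal(size=(n_comp, n_comp)))
    for _ in range(n_iter):
        g = np.tanh(w @ x_w)
        g_prime_mean = (1.0 - g ** 2).mean(axis=1)
        w_new = sym_orthogonalize(g @ x_w.T / n_samples - g_prime_mean[:, None] * w)
        # Converged when every row of W is unchanged up to sign.
        converged = np.max(np.abs(np.abs(np.sum(w_new * w, axis=1)) - 1.0)) < tol
        w = w_new
        if converged:
            break
    return w


# Toy demo of X = A S with two synthetic sources standing in for EEG channels.
t = np.linspace(0.0, 8.0, 2000)
s = np.vstack([np.sin(2 * np.pi * 1.0 * t), np.sign(np.sin(2 * np.pi * 0.3 * t))])
a = np.array([[1.0, 0.5], [0.3, 1.0]])   # hypothetical mixing matrix
x = a @ s
x_w, v = whiten(x)                       # whitened signals X_w
w = fast_ica(x_w)                        # unmixing matrix W
s_est = w @ x_w                          # S_est = W X_w
a_est = np.linalg.inv(w @ v)             # estimated mixing matrix
keep = [0]                               # hypothetical subset of retained ICs
x_red = a_est[:, keep] @ s_est[keep, :]  # X_red = A_red S_red (mean-centred)
```

Keeping all components reproduces the centred input exactly, so any loss in X_red comes only from the components that are dropped.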
From the reduced-electrode EEG data, we extracted IMFs to identify relevant features for predicting WM load. Figure 4 shows an example of the raw EEG data (in red) and the corresponding IMFs (in green). The IMFs were obtained using EEMD, a time-frequency technique that decomposes nonstationary signals into a finite number of IMFs and a residual component. The IMFs represent the underlying oscillatory modes in the EEG signal and provide valuable information about the dynamic changes in brain activity related to cognitive processes. The EEMD methodology includes calculating the residual, r(t) = x(t) − ∑ C_j(t), and determining the stopping criteria.
In this study, a DNN architecture was developed to predict WM load using EEG signals, with the IMFs used as input features. Figure 5 shows the architecture of the DNN model. The model consisted of six hidden layers, a ReLU activation input layer, and a Sigmoid activation output layer. The model was trained using the backpropagation algorithm with the Adam optimizer and a categorical cross-entropy loss function. The model's performance was evaluated using Overall Accuracy (OA), specificity, sensitivity, F1 score, and Kappa metrics.
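As a rough illustration of the network shape described above, the sketch below runs a forward pass through a fully connected network with six ReLU hidden layers and a sigmoid output layer. The text does not give the layer widths, the input feature dimension, or the number of output units, so the sizes used here are assumptions, and the weights are random rather than trained. Training with Adam and the cross-entropy loss is omitted.

```python
import numpy as np


def relu(z):
    return np.maximum(z, 0.0)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def init_params(layer_sizes, seed=0):
    """He-initialised weights and zero biases for each pair of consecutive layers."""
    rng = np.random.default_rng(seed)
    params = []
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        weights = rng.normal(0.0, np.sqrt(2.0 / n_in), size=(n_in, n_out))
        params.append((weights, np.zeros(n_out)))
    return params


def forward(params, x):
    """ReLU on every hidden layer, sigmoid on the output layer."""
    h = x
    for weights, bias in params[:-1]:
        h = relu(h @ weights + bias)
    w_out, b_out = params[-1]
    return sigmoid(h @ w_out + b_out)


# Assumed sizes: 30 IMF-derived input features, six hidden layers, 2 output units.
LAYER_SIZES = [30, 128, 64, 64, 32, 32, 16, 2]
params = init_params(LAYER_SIZES)
features = np.random.default_rng(1).normal(size=(5, 30))  # 5 dummy feature vectors
scores = forward(params, features)
predicted_class = scores.argmax(axis=1)  # 0 = low WM load, 1 = high WM load
```

In practice a framework with automatic differentiation would be used for training; the point here is only the layer structure.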
A total of 18 subjects participated in the experiment. Each subject filled out the participant consent form before the commencement of the session. Each subject's responses on the block-tapping and N-back WM tasks were recorded to calculate low and high WM load for the three trials, while the 16-channel EEG data were also recorded. The EEG data were preprocessed as per the workflow in Figure 2. The IMF features were divided into training, testing, and validation sets before training the DNN. For each trial, a confusion matrix was computed to discern between two classes: low WM load and high WM load. The format of the confusion matrix is shown in Figure 6. The performance metrics were calculated using the formulas given in Table 1, where TH is True High WM load, TL is True Low WM load, FH is False High WM load, and FL is False Low WM load.
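The metrics named in this section can be computed from the four confusion-matrix counts as in the sketch below. It uses the standard textbook definitions, treating high WM load as the positive class. This is an assumption, since the exact formulas of Table 1 are not reproduced here; the function name and the example counts are hypothetical.

```python
def wm_load_metrics(th, tl, fh, fl):
    """Standard binary metrics from counts TH, TL, FH, FL (high load = positive)."""
    total = th + tl + fh + fl
    sensitivity = th / (th + fl)   # true-high rate (recall)
    specificity = tl / (tl + fh)   # true-low rate
    precision = th / (th + fh)     # share of predicted-high trials that are truly high
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    oa = (th + tl) / total         # overall accuracy, equal to Po
    # Pe: agreement expected by chance from the row and column marginals.
    pe = ((th + fl) / total) * ((th + fh) / total) \
        + ((tl + fh) / total) * ((tl + fl) / total)
    kappa = (oa - pe) / (1 - pe)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
        "oa": oa,
        "kappa": kappa,
    }


# Hypothetical counts for one trial.
metrics = wm_load_metrics(th=45, tl=40, fh=10, fl=5)
```

The sketch does not guard against empty classes; a zero denominator would need separate handling.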

Table 1. Performance metrics and formulas: Sensitivity, Specificity, F1-Score, Overall Accuracy (OA).
The Kappa score tests the inter-reliability of the results, i.e., how much of the accuracy is obtained by chance. Po is the proportion of observed agreement, and Pe is the proportion of agreement expected by chance. The specificity score measures the proportion

Discussion
Working memory is the most important function affected in neurodegenerative diseases such as MCI/AD and dementia, and in psychiatric disorders such as schizophrenia and ADHD. Time-frequency analysis and deep learning methods are powerful tools for developing technology that clinicians and physicians can use to predict WM load. The IMFs computed using EEMD from EEG brain responses and a DNN were utilized to predict low and high WM load. This methodology was employed in a pilot study comprising healthy subjects of varying age and subjects with MCI to determine its ability to distinguish between instances of low and high WM loads. The performance metrics of OA, Kappa, sensitivity or recall, specificity or precision, and F1-score were used to evaluate the method. The effectiveness of the time-frequency IMFs in discriminating between low and high WM loads in healthy subjects, as well as in subjects with MCI, was evaluated. ICA was useful for selecting a smaller set of the best EEG electrodes for each subject without compromising the prediction accuracy of the DNN.

Table 1 lists the OA, Kappa, sensitivity, specificity, and F1-score calculated for each of the 18 subjects. The OA was between 90.52% and 100% for normal subjects and between 93.61% and 99.68% for subjects with MCI, showing good prediction performance of the DNN. The method obtained 97.52% overall accuracy across all subjects, compared to 83.94%, the highest accuracy among the other methods. The Kappa score for the age group of 20 to 40 years was between 87.14% and 100%, which shows that the results were in almost perfect agreement with the datasets. For the age group of 40 to 60 years, the Kappa score varied between 83.5% and 100%, indicating strong agreement of the results with the data. Subject S10 had an 80.85% Kappa score. This subject had difficulty using the computer mouse in the block-tapping and 1-back tasks.
For the age group above 60 years, the Kappa scores were between 96.01% and 99.61%, indicating almost perfect agreement of the prediction results with the datasets. For the subjects with MCI, the Kappa score ranged between 87.12% and 99.32%, indicating strong agreement of the prediction results with the data. One subject with MCI had a Kappa score of 87.12%, which is understandable, as the subject had recently recovered from eye surgery and showed difficulty locating the blocks. The other subject with MCI, who had a Kappa score of 87.14%, needed more practice performing the WM tasks. Based on the confusion matrices for each subject, the sensitivity and specificity rates were calculated. The sensitivity scores were above 90% for all subjects except one, who scored 87.24%, indicating a high prediction accuracy for the true labels of high and low WM load. The specificity scores were above 98.87% for all subjects, meaning that the DNN made few false predictions, which affirms its reliability in WM load prediction. The F1-scores are presented in the last column of Table 1, with an average F1-score of 97.34%. This underscores the good levels of sensitivity and precision and indicates that the prototype was robust and performed well in WM load prediction. Figure 7 summarizes the WM load prediction performance in a bar chart. Table 3 provides the average and standard deviation of the OA for each age group tested. For the age group of 20 to 40 years, the average was 98.47%; for 40 to 60 years, 95.35%; for above 60 years, 98.53%; and for subjects with MCI, 97.51%. The standard deviation was between 2.4 and 4.84, which shows the reliability of the prototype for prediction of WM load in all the groups tested. Table 4 gives the results of the ANOVA conducted on the overall prediction accuracy for each tested group.
For each age group, the p-value was <0.05, so the null hypothesis of the results being random was rejected. The ANOVA also shows that the F-values were greater than the F-critical values, indicating that the results are significant. The implemented prototype performed much better than the current state-of-the-art methods in WM load prediction, as can be seen in Tables 2 and 5.

Analysis of IMFs of Subjects above 60 Years and Subjects with MCI during Rest State
The EEMD was used for extracting time-frequency features called IMFs from the EEG signals. The efficacy of the IMFs in distinguishing between low WM load and high WM load was analyzed by calculating the average of the Power Spectral Density (PSD) of the IMF3, IMF4, and IMF5 for the frontal Fp2 electrode EEG signal. The Fp2 was selected as this was one of the best channels selected by the ICA method for 17 of the 18 subjects tested. The brain processes memory information in the frontal region of the brain. The average PSD of the IMFs 3, 4, and 5 during the rest state (the subject was seated with eyes closed) for subjects above 60 years and subjects with MCI are depicted in Figure 8, and the coefficient of variance in Figure 9, respectively. According to Figure 8, the AVG PSD of IMF is high for subjects S16 and S17 with MCI. In the case of subjects of age above 60 years, only the AVG PSD of IMF5 has higher values which are the lower frequencies of the ensemble empirical mode decomposition, while the MCI subjects have higher values of the AVG PSD for IMF4 that are of higher frequencies of beta waves. Figure 9 depicts a graphical representation of the coefficient of variance, showing a high value for subject S12 for IMF4, indicating a high oscillatory behavior. Figures 10 and 12 summarize the AVG PSD of IMFs for low WM and high WM loads in normal and subjects with MCI, respectively. Figures 11 and 13 present the coefficient of variance for AVG PSD of IMFs for low WM and high WM loads, respectively. According to the plots for the AVG PSD of IMFs in Figure 10, the subjects with MCI had higher values for IMF4 and IMF5 beta and alpha oscillations during low WM load activity. Subjects S09, S11, and S12, who are above 60 years of age, showed higher values of the IMF5. According to Figure 11, the coefficient of variance was higher for IMF4 for most of the subjects, indicating variability in the values of IMF4. 
Similarly, the AVG PSD of the IMFs for high WM load in Figure 12 shows that the subjects with MCI have high values for IMF5, which corresponds to alpha waves. Figure 13 shows higher coefficients of variance for IMF3 and IMF4, reflecting the variability in these IMFs. In general, IMFs 4 and 5, corresponding to beta and alpha waves, are predominantly present in the frontal region (prefrontal cortex) of subjects with MCI while they perform WM tasks. EEMD extracted the beta (IMF1, IMF2, IMF3, and IMF4), alpha (IMF5, IMF6, and IMF7), and theta (IMF8, IMF9, and IMF10) bands. The beta-band IMF3 and IMF4 and the alpha-band IMF5 were the most informative features used to train the DNN.
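To illustrate the quantities discussed above, the following minimal Python sketch computes an average PSD for a single IMF and a coefficient of variance across several epochs. It is not the authors' implementation. The IMFs are assumed to have been extracted beforehand by an EEMD routine, and the PSD is estimated with a plain periodogram because the source does not state which estimator was used. The coefficient of variance is assumed to be the standard deviation divided by the mean of the per-epoch AVG PSD values. The sampling rate, epoch length, synthetic signal, and function names (`periodogram`, `average_psd`, `coefficient_of_variation`) are all hypothetical.

```python
import cmath
import math
from statistics import mean, pstdev


def periodogram(x, fs):
    """One-sided periodogram PSD estimate (power per Hz) via a direct DFT."""
    n = len(x)
    mu = mean(x)
    centered = [v - mu for v in x]  # remove the DC offset before the DFT
    freqs, psd = [], []
    for k in range(n // 2 + 1):
        coeff = sum(v * cmath.exp(-2j * math.pi * k * t / n)
                    for t, v in enumerate(centered))
        # Double every bin except DC and (for even n) Nyquist,
        # so that the one-sided spectrum keeps the total power.
        is_edge = k == 0 or (n % 2 == 0 and k == n // 2)
        scale = 1.0 if is_edge else 2.0
        freqs.append(k * fs / n)
        psd.append(scale * abs(coeff) ** 2 / (fs * n))
    return freqs, psd


def average_psd(x, fs):
    """Mean of the PSD estimate over all frequency bins (assumed AVG PSD)."""
    _, psd = periodogram(x, fs)
    return mean(psd)


def coefficient_of_variation(values):
    """Population standard deviation divided by the mean."""
    return pstdev(values) / mean(values)


# Hypothetical stand-in for one IMF: three 1 s epochs of a 10 Hz
# oscillation sampled at 128 Hz, with different amplitudes per epoch.
FS = 128
epochs = [
    [amp * math.sin(2 * math.pi * 10 * t / FS) for t in range(FS)]
    for amp in (1.0, 1.2, 0.8)
]
avg_psd_per_epoch = [average_psd(epoch, FS) for epoch in epochs]
cv = coefficient_of_variation(avg_psd_per_epoch)
print("AVG PSD per epoch:", avg_psd_per_epoch)
print("Coefficient of variance:", cv)
```

In practice, a library estimator such as `scipy.signal.welch` would typically replace the hand-written periodogram. The direct DFT is used here only to keep the sketch free of dependencies.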

Analysis of IMFs of Normal Subjects and Subjects with MCI for Low and High WM Load
The ANOVA results in Table 6 compare the AVG PSD of the IMFs for low WM load and high WM load. For normal subjects, the p-value was 0.0469 and the F-value (4.14) exceeded the F-critical value (4.03), confirming that the IMFs are reliable features for discriminating between low and high WM load in normal subjects. For the group with MCI, the p-value was 0.0145 and the F-value (9.65) exceeded the F-critical value (5.32), indicating that the IMFs are robust features for discriminating between low and high WM load in subjects with MCI. The analysis was conducted across various age groups to account for the decline in WM function with age. The tested groups contained equal numbers of males and females, and the proposed ICA + EEMD + DNN method predicts WM load equally well in male and female subjects.
Table 6. Results of the analysis of variance (ANOVA) conducted on the average power spectral density of the intrinsic mode functions for low and high WM load in normal subjects and in subjects with MCI.
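The decision rule used above (an F-value exceeding the F-critical value) can be sketched as a one-way ANOVA in a few lines of Python. This is a minimal illustration, not the authors' analysis code. The function name `one_way_anova_f` and the AVG PSD values are hypothetical, and the critical value is supplied as an input rather than computed. The example uses two groups of five values, giving (1, 8) degrees of freedom, for which the tabulated 5% critical value is about 5.32.

```python
from statistics import mean


def one_way_anova_f(*groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [mean(g) for g in groups]

    # Between-group sum of squares: spread of the group means
    # around the grand mean, weighted by group size.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    # Within-group sum of squares: spread of the values around
    # their own group mean.
    ss_within = sum((v - m) ** 2
                    for g, m in zip(groups, group_means) for v in g)

    df_between = k - 1
    df_within = n_total - k
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within


# Hypothetical AVG PSD values (arbitrary units) for two load conditions.
low_load = [1.0, 1.2, 0.9, 1.1, 1.0]
high_load = [1.6, 1.4, 1.7, 1.5, 1.8]

f_value, df_b, df_w = one_way_anova_f(low_load, high_load)
F_CRITICAL = 5.32  # assumed 5% critical value for (1, 8) degrees of freedom
print(f"F({df_b}, {df_w}) = {f_value:.2f}")
print("Groups differ at the 5% level:", f_value > F_CRITICAL)
```

For a p-value as well as the F statistic, a library routine such as `scipy.stats.f_oneway` could be used instead.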

Efficacy of ICA in Selecting Fewer Scalp Electrodes Specific to Each Subject
ICA was used to identify the most independent EEG channel electrodes. The algorithm applies an orthogonal rotation to the prewhitened 16-channel EEG data through a fixed-point iteration scheme that maximizes a measure of non-Gaussianity of the rotated components. Between 2 and 10 of the 16 channels were identified for each subject. Figure 14 gives the names of the electrodes selected for each subject and the brain regions to which they pertain. The frontal region, where the channel electrodes Fp1 and Fp2 are located, is the most involved in WM tasks for all the subjects. In two normal subjects, channels F3, F4, F7, and F8 in the frontal region were also selected. After the frontal electrodes, the electrodes C3 and C4 in the cerebral cortex region and the temporal electrodes T5 and T6 were the most frequently selected. Hence, the frontal, cerebral cortex, and temporal electrodes were selected for all subjects. In addition, the parietal electrodes P3 and P4 were selected for some normal subjects, while the occipital electrodes O1 and O2 were selected instead of the parietal electrodes in three subjects with MCI. Figures 15 and 16 show the brain topographic plots for a subject from each tested group during a rest state and during a WM task state. As seen in the left panel of Figure 16, the subjects with MCI exhibited mostly zero values in the brain topoplot during the rest state.
Table 6 presents methods used for WM load prediction based on wavelet features [28], Lasso [23], power ratio features [29], and EMD features [30]. These methods rely on redundant feature-extraction techniques that do not yield optimal features for training a classifier model. The classification models they use, such as the SVM, cannot be used for individualized WM load prediction. A DNN classifier is used in [29], but the lack of robust features results in poorer WM load prediction accuracies. Conversely, in other methods such as [23,35], where theta and alpha phase connectivity were computed, the use of a less powerful classifier resulted in unreliable WM load prediction accuracies. The features used in the state-of-the-art (SOA) methods are either spatial or spatiotemporal and do not exploit the time-frequency content of the EEG signals. EEMD, a more recent advancement, is an enhanced time-frequency feature-extraction technique capable of decomposing an EEG signal into time windows of varying frequencies. As a result, the proposed ICA + EEMD + DNN method outperforms the alternative methods in WM load prediction.

Limitations
One limitation of this study is its focus on a single cognitive ability, WM, whereas individual differences may also arise in other cognitive abilities such as learning and attention. For future work, we will design experiments to evaluate a set of related cognitive abilities for each neurodegenerative or psychiatric disorder. The other limitation concerns the EEG data, which are inherently non-stationary and noisy and require setup time for applying the electrodes to the subject's scalp to ensure noise-free, good-quality signals. The current DNN model was trained separately for each subject. To generalize the system, the training data should encompass a wide range of variability. Also, the DNN requires large datasets for training: currently, 67% of the data are used for training. For future work, we propose to use Model-Agnostic Meta-Learning (MAML) networks, which require much less training data and hold promise for more generalizable performance.

Conclusions
The deep learning method demonstrated proficiency in predicting WM load both in normal subjects from different age groups and in subjects with MCI, using ICA-based selection of the best electrodes and EEMD time-frequency analysis of EEG signals recorded while the subject performed WM tasks. The average overall accuracy obtained using the ICA + EEMD + DNN method was 97.62%, better than that of EMD- and SVM- or CNN-based methods. The method selects the best independent electrodes using the fast ICA algorithm and uses EEMD-based IMF features, which perform better than EMD features and yield the best WM load prediction accuracies. The method is subject- or patient-specific and selects the best electrodes for the subject's working memory function capability. Because it is low cost and non-invasive, the ICA + EEMD + DNN WM load prediction method can also be used by individual subjects to monitor their WM function. The analysis of the IMFs using the average PSD shows that the IMFs calculated for the subjects with MCI had higher values of beta and alpha oscillations (IMF3, IMF4, and IMF5) than those of normal subjects, and that these oscillations were present predominantly in the frontal region of the brain during the performance of WM tasks. All of the subjects used the frontal and cerebral cortex regions for performing WM tasks. Subjects with MCI also used the occipital region, while some normal subjects used the parietal region.
We focused on evaluating our WM load prediction method on subjects with MCI because they had significant WM dysfunction, which may lead to AD, the most common neurodegenerative disorder, affecting a large population. For future work, we will extend our WM load prediction approach to other neural psychiatric disorders, which will require refining and modifying the current approach as well as testing other ML paradigms such as MAML deep learning networks.
WM plays a crucial role in many cognitive tasks. The proposed method has potential use in Brain-Computer Interfaces (BCIs), where it can improve adaptability and responsiveness for patients with movement disorders; in personalized learning experiences adapted to the cognitive load of patients and students; and in monitoring the safety of workers, where the system could determine whether workers are overloaded so that breaks and shifts can be provided accordingly. Other potential uses are neurofeedback, cognitive rehabilitation for patients recovering from brain injury, and the adaptation of Virtual Reality (VR) and Augmented Reality (AR) to the user's cognitive load to make VR/AR interactions more intuitive and user-friendly.
Data Availability Statement: The EEG data collected in this project are available on request.