Models for Classifying Cognitive Load Using Physiological Data in Healthcare Context: A Scoping Review

Kim, Hyeongjo; Kim, Minji; Han, Yejin

doi:10.3390/app15169155

Open Access Review

by Hyeongjo Kim ¹,†, Minji Kim ²,† and Yejin Han ³,*

¹ Department of Education, University of Illinois Urbana-Champaign, Champaign, IL 61801, USA
² Center for Innovative Digital Education, SNU College, Seoul National University, Seoul 08826, Republic of Korea
³ Department of Medical Education & Humanities, College of Medicine, Yeungnam University, Daegu 42415, Republic of Korea
* Author to whom correspondence should be addressed.
† These authors contributed equally to this work.

Appl. Sci. 2025, 15(16), 9155; https://doi.org/10.3390/app15169155

Submission received: 2 July 2025 / Revised: 11 August 2025 / Accepted: 12 August 2025 / Published: 20 August 2025

(This article belongs to the Special Issue Interaction Design Technologies for Education: Advancements, Challenges, and Impacts)

Abstract

Background: In healthcare education, understanding and managing cognitive load is crucial for enhancing learning outcomes for students, healthcare professionals, patients, and the general public. Despite numerous studies developing cognitive load classification models, there is a lack of comprehensive guidelines on how to effectively utilize these models in healthcare education. This study reviews cognitive load classification models using physiological data to provide insights and guidelines for their development in healthcare contexts. Methods: A scoping review was conducted on studies published between 2015 and 2024, identified through databases including Scopus, Web of Science, PubMed, EMBASE, and PsycINFO. The search terms included “cognitive load,” “physiology,” “data,” and “classification.” Inclusion criteria were peer-reviewed journal articles in English, focused on the healthcare context, utilizing experimental physiological data, and developing classification models. After screening 351 articles, ten studies met the inclusion criteria and were analyzed in detail. Results: Task design predominantly focused on measuring intrinsic cognitive load by adjusting task difficulty. Data collection mainly utilized EEG (electroencephalogram) and body movement data. SVM (support vector machine) algorithms were the most frequently used for model development, with cross-validation and feature selection employed to prevent overfitting. The findings highlight the importance of clearly defining cognitive load types, designing appropriate tasks, establishing reliable ground truths with multiple indicators, and selecting contextually relevant data. Conclusions: This study provides a comprehensive analysis of cognitive load classification models using physiological data in healthcare education, offering valuable guidelines for their development. 
Despite the study’s limitations, including a small number of analyzed papers and limited diversity in educational contexts, it offers critical insights for using and developing cognitive load classification in healthcare education. Future research should explore the applicability of these models across diverse educational settings and populations, aiming to enhance the effectiveness of healthcare education and ultimately improve learning and healthcare outcomes.

Keywords:

cognitive load (CL); classification model; physiological data; healthcare education; scoping review

1. Introduction

To effectively achieve the goal of education in the healthcare field, it is essential to provide educational materials tailored to the learners’ levels. Given that learners’ comprehension and learning abilities vary, it is important to consider cognitive load related to these differences [1,2]. Cognitive load refers to the mental effort required to process information and is closely linked to the learner’s level of understanding [3,4]. For instance, novice learners must focus on grasping basic concepts, thus maintaining a lower cognitive load is essential. Conversely, advanced learners benefit from engaging with complex problem-solving tasks that appropriately challenge their cognitive capacities. Therefore, while appropriate cognitive load can enhance learning outcomes, excessive cognitive load can impede learning [5,6].

In this context, efforts have been made to accurately categorize cognitive load and provide tailored education accordingly. Personalized education is designed with the learner’s cognitive load level in mind, thereby improving learning efficiency and increasing learner satisfaction [7,8]. Various models for classifying cognitive load have been actively researched and developed in the educational field to provide such personalized education [5,9]. The importance of personalized education lies in its ability to meet the individual needs and capabilities of learners, facilitating more effective acquisition and application of knowledge. This approach enhances the overall learning experience and contributes to better long-term learning outcomes [10,11].

Recently, research on developing models to measure and classify cognitive load in healthcare contexts has been increasing. The objective of this research is to accurately measure and classify cognitive load during physicians’ surgical procedures or patients’ rehabilitation activities, thereby providing more effective education and training [6]. Particularly, the use of physiological data to measure cognitive load has gained significant attention. Utilizing physiological data allows for more objective and precise measurement of cognitive load, greatly aiding in the provision of personalized education. For example, by monitoring real-time physiological indicators such as heart rate, brainwaves, and skin conductance, immediate feedback can be provided based on the learner’s cognitive load status, helping them maintain optimal learning conditions [12,13].

However, existing research primarily focuses on the development of cognitive load classification models. That is, most studies emphasize the process and performance evaluation of developing these models, but there is a lack of concrete guidelines on how to effectively develop these models. A scoping review of cognitive load classification models for healthcare education, based on substantial evidence, is needed to provide comprehensive guidance on model development. This approach is critical in devising educational methodologies tailored to the specific requirements and contexts of the healthcare sector, thereby maximizing educational effectiveness.

The aim of this study is to analyze the characteristics of cognitive load classification models developed in healthcare contexts. By doing so, we seek to provide guidelines for effectively developing these models in educational programs for medical students, professionals, patients, and the general public. This study aims to answer the following research questions: Firstly, how has the task design been conducted to develop cognitive load classification models in healthcare contexts? Secondly, what types of data have been collected to develop cognitive load classification models in healthcare contexts? Thirdly, how has the development of cognitive load classification models been carried out in healthcare contexts?

2. Method

This scoping review was structured and reported following the standards set by the updated PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) 2020 statement [14].

2.1. Search Strategy

We conducted comprehensive searches in five databases: Scopus (2004~), Web of Science (1990~), PubMed (National Library of Medicine; 1781~), EMBASE (1947~), and PsycINFO (1887~), chosen for their relevance to our study. The searches covered literature published over the past ten years, from May 2014 to May 2024. Our search strategy utilized a combination of key terms across four categories: cognitive load, data, classification, and physiology. Detailed search queries for each database are documented in Table 1. Additionally, we conducted a hand search using Google Scholar. A professional research librarian advised on the development and implementation of our search strategies.

2.2. Inclusion and Exclusion Criteria

Our inclusion criteria encompassed original research articles published in peer-reviewed journals, written in English, that utilized physiological data to develop models for classifying cognitive load within a healthcare context. We included studies that did not rely on pre-existing data but instead used physiological data collected from experiments to develop classification models. Only full-text research papers were included. We excluded commentaries, editorials, book chapters, and letters. Our inclusion criteria are detailed in Table 2. After removing duplicates, three authors independently assessed the titles, abstracts, and full texts of all articles. Any disagreements between the authors were resolved by reaching a consensus through discussion.

2.3. Data Extraction

Data were extracted from all eligible studies. Our focus was on methodological aspects: how cognitive load tasks were designed, what data were collected, and how classification models were developed. We developed a framework of classification model development (Figure 1) and a coding scheme (Table 3) through investigation of the supporting literature and iterative testing and revision.

Regarding cognitive load task design, we examined what cognitive load type each study selected. The cognitive load type was categorized into intrinsic load, extraneous load, and overall load. Cognitive load is generally defined as the load imposed on working memory when performing a task. Among these, intrinsic and extraneous load are derived from Sweller’s cognitive load theory. Intrinsic cognitive load is related to the complexity of the task and is determined by the number of elements in the task and their interactions [15,16,17]. Extraneous cognitive load refers to the cognitive load that arises from processing information not directly related to the task, often due to the way the task is presented or the use of unnecessary cognitive resources [18,19,20]. The remaining element, germane load, was not included in the classification scheme. Unlike intrinsic and extraneous load, germane cognitive load does not result from the task itself but refers to the cognitive resources used to handle these loads [18,19,20].

Among these three cognitive loads, only intrinsic and extraneous cognitive load are included in our framework due to both conceptual ambiguity and practical challenges in operationalizing germane load within the reviewed studies. Conceptually, there has been ongoing debate over whether germane load should even be considered a distinct type of cognitive load. First, it is difficult to distinguish intrinsic cognitive load (arising from the inherent complexity of learning material) from germane load (arising from the cognitive effort used to organize, abstract, and integrate that material). As [21,22] note, the processing complexity involved in learning is conceptually intertwined with content complexity, making it difficult to isolate germane load from intrinsic load in empirical studies. Additionally, the operational definitions of germane load found in some studies are controversial. For instance, in [23], germane load was inferred post hoc through transfer test performance—i.e., higher test scores were interpreted as reflecting higher germane load, based on the assumption that successful learners must have allocated more working memory resources to deep processing. However, this post hoc logic stands in contrast to more direct, task-based manipulations used to define other cognitive load types [24]. For these reasons, while theoretical exploration of germane cognitive load remains meaningful, and its role in learning processes is supported conceptually, its use in the engineering of classification models based on physiological data is currently less practical.

Therefore, this study classified cognitive load into intrinsic load, extraneous load, and overall load, which measures the total load caused by the task without distinguishing between intrinsic and extraneous load. Cognitive load validation serves as the standard for classifying cognitive load. Methods for setting the ground truth for cognitive load classification models include using task design, subjective measurement, or a combination of both. We also examined how each ground truth was validated. In relation to tasks, we extracted information that could aid in cognitive load task design. Task manipulation is important because it separates the levels of cognitive load through the task. Therefore, we categorized task manipulation into controlling task difficulty and using secondary tasks. Additionally, we checked whether the tasks were authentic or experimental, which tasks were used, and how tasks were categorized according to Bloom’s Taxonomy.

To provide insights into how many participants are needed for data collection and what type of data are appropriate to collect, we gathered information on participants (healthcare professionals/patients/general public), number of participants (total participants, usable participants after exclusion, percentage of usable data), type of collected cognitive load data (physiological/subjective/performance), subcategories of physiological data (e.g., physiological—brain—EEG), and data preprocessing (instance unit, instance number, feature selection). Among the schemes, the type of cognitive load data can be categorized into physiological, subjective, and performance data [25]. Physiological data capture information from our bodies to assess cognitive load. Subjective data measure cognitive load based on individual perception, and performance data are based on the expectation that the task performance level represents cognitive load level [25]. Among these, physiological data are more objective than subjective data and more directly connected to body status in real-time [26]. Therefore, this study focused on physiological data. Data related to the brain, body, heart, eyes, and skin are utilized to assess cognitive load. In this study, physiological data were primarily classified based on these body parts, and their subcategories were identified (e.g., brain—electroencephalogram (EEG), functional near-infrared spectroscopy, fNIRS). Feature selection methods were categorized into filter, wrapper, and embedded methods [27,28]. The filter method involves identifying highly relevant features based on statistical methods such as correlation analysis. Wrapper methods search for the most efficient combinations within subsets of features, while embedded methods, like penalized regression analysis, filter out irrelevant features within the classification model itself.
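The filter method described above can be illustrated with a minimal sketch. It assumes a binary low/high load label and uses the absolute Pearson correlation as the relevance statistic; the feature names, toy values, and the 0.5 threshold are hypothetical and are not taken from any reviewed study.

```python
import math

def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0  # a constant column carries no linear information
    return cov / (sx * sy)

def filter_select(features, labels, threshold=0.5):
    """Keep the names of features whose |r| with the label meets the threshold."""
    return [name for name, values in features.items()
            if abs(pearson(values, labels)) >= threshold]

# Hypothetical toy data: six instances, label 0 = low load, 1 = high load.
labels = [0, 0, 0, 1, 1, 1]
features = {
    "theta_power": [1.0, 1.2, 0.9, 2.1, 2.3, 1.9],        # tracks the label
    "skin_temp": [33.1, 33.0, 33.2, 33.1, 33.0, 33.2],    # unrelated to it
}
selected = filter_select(features, labels)
print(selected)
```

Because the filter step scores each feature independently of any classifier, it is cheap to run, but it cannot detect features that are only informative in combination, which is the gap wrapper methods aim to cover.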

Regarding model development, to identify information necessary for creating robust models, we extracted data on classification models (specific model, feature fusion techniques, overfitting prevention methods), and model performance (performance metrics reported, accuracy, performance compared to baseline). Among the schemes, methods to prevent overfitting include cross-validation, data augmentation, feature selection, and hyperparameter tuning. Cross-validation involves splitting the entire dataset into multiple subsets, using one subset as the validation set while training the model on the remaining subsets, and repeating this process until each subset has served once as the validation set. Data augmentation refers to creating new datasets in various ways to increase the amount of training data available. Feature selection excludes unnecessary features to prevent the model from overfitting. Hyperparameter tuning adjusts hyperparameters deliberately to simplify the model (for example, pruning in decision trees).
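The cross-validation procedure described above can be sketched as follows. This is an illustrative k-fold split over instance indices only; the function name is hypothetical, and the sketch applies no shuffling or grouping by participant, choices that the reviewed studies may have made differently.

```python
def k_fold_indices(n_instances, k):
    """Yield (train, validation) index lists for k-fold cross-validation."""
    indices = list(range(n_instances))
    # Spread the remainder so that fold sizes differ by at most one.
    sizes = [n_instances // k + (1 if i < n_instances % k else 0)
             for i in range(k)]
    start = 0
    for size in sizes:
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, validation
        start += size

# Ten instances split into three folds.
folds = list(k_fold_indices(10, 3))
for train, validation in folds:
    print("validate on", validation, "train on", train)
```

Each instance is used for validation exactly once and for training in every other round, so the averaged validation score reflects performance on data the model did not see during fitting.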

2.4. Assessment of Study Quality

To assess the quality of the selected studies, the QualSyst Standard Quality Assessment Criteria [15,16,17] were employed (Appendix A). This tool was chosen for its applicability to the evaluation of quantitative studies, including non-randomized controlled trials. The criteria for evaluating quantitative studies are as follows: (1) clear objective, (2) appropriate study design, (3) proper subject, (4) detailed subject characteristics, (5) random allocation in interventional studies, (6) investigator blinding, (7) subject blinding in interventional studies, (8) robust outcome and exposure measures, (9) appropriate sample size, (10) adequate analytic methods, (11) variance estimates for results, (12) control of confounding, (13) detailed results reporting, and (14) evidence-supported conclusions. Each of the 14 criteria is scored on a scale from 0 (lowest quality) to 2 (highest quality). The quality of each study was defined using a final mean score, with a value below 0.50 indicating inadequate quality, a value between 0.50 and 0.70 indicating adequate quality, a value between 0.71 and 0.80 indicating good quality, and a value above 0.80 indicating strong quality [18,19,20]. A cutoff point of 0.71 was selected as a conservative threshold for inclusion. The final quality scores ranged from 0.91 to 1.00, indicating that all included studies were of strong quality.
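The scoring and banding described above can be sketched in a few lines. The sketch assumes that the final score is the sum of the item scores divided by the maximum attainable over applicable items (two points each), so that it falls between 0 and 1; the function names and the example ratings are hypothetical.

```python
def qualsyst_summary(item_scores):
    """Summary score in [0, 1]; None marks a criterion rated not applicable."""
    applicable = [s for s in item_scores if s is not None]
    return sum(applicable) / (2 * len(applicable))

def quality_label(score):
    """Map a summary score to the quality bands quoted in the text."""
    if score < 0.50:
        return "inadequate"
    if score <= 0.70:
        return "adequate"
    if score <= 0.80:
        return "good"
    return "strong"

# Hypothetical study: the three intervention-specific criteria (5 to 7) are
# not applicable, ten criteria are fully met, and one is partially met.
ratings = [2, 2, 2, 2, None, None, None, 2, 1, 2, 2, 2, 2, 2]
score = qualsyst_summary(ratings)
label = quality_label(score)
print(round(score, 2), label)
```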

3. Results

3.1. Trial Flow and Features of Reviewed Studies

A total of 389 articles were identified in the five academic databases, with none found in gray literature. After removing 181 duplicates, 208 studies remained for title screening. Title review eliminated 106 studies, leaving 102 for abstract screening. Abstract review excluded 88 more, resulting in 14 studies for full-text assessment. Four studies were excluded during full-text review, leaving ten studies for final inclusion in this scoping review. Figure 2 details the exclusion process and reasons at each stage.
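The screening counts reported above can be checked with simple arithmetic. The following sketch uses only the numbers stated in this paragraph; the variable names are illustrative.

```python
identified = 389
stages = [
    ("duplicates removed", 181),
    ("excluded at title screening", 106),
    ("excluded at abstract screening", 88),
    ("excluded at full-text review", 4),
]

remaining = identified
trail = [remaining]  # records remaining after each stage
for stage_name, removed in stages:
    remaining -= removed
    trail.append(remaining)
    print(f"{stage_name}: -{removed} -> {remaining} remaining")
```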

The ten studies included in the review were published between 2017 and 2023, encompassing a total of 311 participants. All of these studies were quantitative. Of the total studies, three engaged patients, another three involved healthcare professionals, and the remaining four had participants from the general public. The studies investigated the classification of cognitive load across various groups and conditions. They explored how cognitive load can be categorized in patients with stroke, autism, and cognitive impairments. Additionally, research examined the classification of cognitive demands during surgical operations and drug classification tasks for healthcare personnel. Finally, studies also assessed how cognitive load in the general public can be classified under stress conditions and in sedentary work environments. Detailed information on each study’s participant characteristics and context can be found in Appendix B.

3.2. How Has the Task Design Been Conducted to Develop Cognitive Load Classification Models in Healthcare Contexts? (RQ1)

(1): Cognitive load of interest and task design

Among the ten studies reviewed, seven measured intrinsic load, two measured extraneous load, and one measured overall load as shown in Table 4 and Figure 3. Studies measuring intrinsic load manipulated the task difficulty by altering the components of the task to distinguish between high and low cognitive load. In contrast, studies employing extraneous load aimed primarily to induce fatigue or cognitive pathology by disrupting cognitive activities. These studies used secondary tasks unrelated to the primary task to induce extraneous load. Participants were required to perform commonly used cognitive load tasks such as the Operation Span Task (OSPAN) or elementary arithmetic tasks alongside the primary task to create irrelevant load. The study focusing on overall load did not manipulate the task difficulty or add additional tasks; instead, it aimed to measure the total cognitive load experienced by learners performing the task. It used subjective measurements to assess the learners’ overall effort and mental demands throughout the task, rather than manipulating tasks.

(2): Validation of cognitive load

Of the ten studies, three used only task design to establish cognitive load, while the others validated cognitive load with additional methods. Five used subjective measurement and one used a mixed method, as shown in Table 5 and Figure 4. Of the five studies that used subjective measurement to validate the measured cognitive load, four used the NASA Task Load Index (NASA-TLX). Subjective measurement was used instead of task design out of concern that the cognitive load imposed by the same task could vary greatly depending on the individual and situation [5,29]. Lastly, one study used a mixed method based on perceived task difficulty, task performance, and task design.

(3): Task type

The tasks used in the reviewed studies can be divided into authentic tasks relevant to the context in which the classification model is applied and experimental tasks used solely in experimental contexts. Out of the ten studies, five utilized authentic tasks, while the remaining studies used experimental tasks. Authentic tasks included walking, pill sorting, surgery, and driving, depending on the research purpose as shown in Table 6 and Figure 5. Experimental tasks included commonly used cognitive load tasks such as the OSPAN, multiple object tracking (MOT) task, N-back task, and elementary arithmetic tasks.

Additionally, the tasks used in the studies were categorized based on Bloom’s Taxonomy of cognitive learning [5,9], as shown in Table 6 and Figure 6. According to Bloom’s Taxonomy, tasks are divided into six levels: remembering, understanding, applying, analyzing, evaluating, and creating. Among the tasks, seven fell under the “applying” level, two under the “remembering” level, and one under the “analyzing” level. The prevalence of tasks at the “applying” level is due to the inclusion of arithmetic tasks, which fall into this category.

3.3. What Types of Data Have Been Collected to Develop Cognitive Load Classification Models in Healthcare Contexts? (RQ2)

(1): Participant

The number of participants in the studies ranged from a minimum of 8 to a maximum of 154. However, excluding the study that collected data from 154 participants, all other studies collected data from fewer than 35 participants. The criteria for recruiting participants were set according to the purpose and conditions of the tasks. For instance, studies aimed at assisting patients or healthcare professionals involved participants who had the relevant medical conditions or were actual healthcare professionals. Additionally, tasks involving physical activity considered factors such as age and physical condition [1,2]. Even after excluding data with noise, at least approximately 85% of participant data were utilized for the model.

(2): Data collection

Cognitive load data can be categorized into physiological data, subjective measurement data, and behavioral data (Table 7). The collected physiological data were organized by related body parts, with each type counted separately when multiple types were employed in a single study. Among these, brain-related data were the most frequently utilized, appearing in seven studies. These included electroencephalography (EEG), functional magnetic resonance imaging (fMRI), and functional near-infrared spectroscopy (fNIRS). Body-related data, used in five studies, involved measurements from various body parts such as the hand, hip, head, leg, and muscle. Eye-related data, reported in two studies, encompassed blink frequency (BL), eye movement (EM), and pupil dilation (PU). Heart-related data, also found in two studies, included electrocardiogram (ECG), heart rate variability (HRV), respiration rate (RSP), and photoplethysmogram (PPG). Skin-related data, appearing in two studies, comprised galvanic skin response (GSR; also referred to as electrodermal activity, EDA) and skin temperature (ST).

Other than physiological data, the data utilized for developing classification models are subjective data and performance data. Subjective data were obtained from cognitive load questionnaires after the task, while performance data were derived from the process or results of tasks. Subjective data were employed as criteria for classifying cognitive load in five studies. NASA-TLX was used in four studies, and perceived task difficulty was used in the other study. Subjective data were considered as classification criteria instead of task design when significant individual differences in task response were anticipated or when task differentiation by difficulty level was challenging. Lastly, performance data were utilized in two studies.

3.4. How Has the Development of Cognitive Load Classification Models Been Carried out in Healthcare Contexts? (RQ3)

(1): Data preprocessing and feature selection

An instance refers to a vector of features, and the unit at which instances are created can be classified as participant, task, event, or time. Among all studies, eight generated instances at the task level, while two created instances at the time level, as shown in Table 8 and Figure 7. The studies that created instances at the time level all utilized brain data. Excluding the two studies in which the number of instances was not reported, the total number of instances varied widely, from 16 to 16,291. The study with the most instances employed a walking task in which each step served as an instance, yielding the highest instance count despite having only 10 participants [1,2].
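As an illustration of time-level instance creation, the sketch below cuts a one-dimensional signal into non-overlapping windows and turns each window into a small feature vector. The window length, sampling rate, and the two features (mean and variance) are assumptions made for the example and do not reproduce the preprocessing of any reviewed study.

```python
from statistics import mean, pvariance

def time_level_instances(signal, sampling_rate_hz, window_s):
    """Return one [mean, variance] feature vector per non-overlapping window."""
    step = int(sampling_rate_hz * window_s)
    instances = []
    # Trailing samples that do not fill a whole window are dropped.
    for start in range(0, len(signal) - step + 1, step):
        window = signal[start:start + step]
        instances.append([mean(window), pvariance(window)])
    return instances

# Hypothetical signal: 20 samples at 4 Hz, cut into 1-second windows.
signal = [float(i % 4) for i in range(20)]
instances = time_level_instances(signal, sampling_rate_hz=4, window_s=1)
print(len(instances), instances[0])
```

Windowing in this way explains why time-level designs can yield many instances from few participants: the instance count grows with recording length rather than with sample size.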

In terms of feature selection, five utilized only the filter method, three used both filter and wrapper methods, and the remaining two did not use any feature selection method, as shown in Table 8 and Figure 8. Specific filter methods employed included correlation analysis, Mann–Whitney-U test, principal component analysis (PCA), and independent component analysis (ICA). Wrapper methods used included recursive feature elimination (RFE) and forward feature selection (FFS).

(2): Classification model

Among all studies, five employed SVM, while the remaining studies used random forest (RF), k-nearest neighbors (KNN), adaptive boosting (AdaBoost), linear discriminant analysis (LDA), and distribution-based classifier (Parzen Window) as shown in Table 9. To combine features for model utilization, methods include feature-level fusion, which combines all features together, decision-level fusion, which builds models for individual features and integrates their results, and hybrid fusion, which uses both approaches [30]. All 10 studies in this research employed feature-level fusion.
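The difference between feature-level and decision-level fusion can be sketched as follows. The modality names and values are hypothetical, and majority voting is only one possible rule for integrating per-model decisions; it is assumed here for illustration.

```python
from collections import Counter

def feature_level_fusion(modality_features):
    """Concatenate per-modality feature vectors into one vector for one model."""
    fused = []
    for name in sorted(modality_features):  # fixed order keeps columns stable
        fused.extend(modality_features[name])
    return fused

def decision_level_fusion(modality_predictions):
    """Combine labels predicted by separate models through a majority vote."""
    votes = Counter(modality_predictions.values())
    return votes.most_common(1)[0][0]

# Feature-level: a single classifier would be trained on the fused vector.
fused = feature_level_fusion({"eeg": [0.8, 1.4], "ecg": [72.0], "gsr": [0.3]})

# Decision-level: each modality has its own classifier; outputs are merged.
decision = decision_level_fusion({"eeg": "high", "ecg": "high", "gsr": "low"})
print(fused, decision)
```

A hybrid scheme would combine the two, for example by voting over one model trained on the fused vector and several trained on single modalities.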

(3): Overfitting prevention

When developing classification models, one critical consideration is to avoid overfitting to the training data. Methods to prevent overfitting include cross-validation, data augmentation, feature selection, and hyperparameter tuning. As shown in Table 10, all ten studies utilized cross-validation. Additionally, data augmentation was used in one study, and feature selection was employed in eight studies.

(4): Model Performance

Classification models are evaluated using various metrics. While accuracy is a prominent metric, it is essential to consider multiple metrics such as precision, recall, F1 score, and the area under the receiver operating characteristic curve (AUC-ROC). Among all studies, five focused solely on accuracy, while the others reported a combination of metrics including F1 score and AUC-ROC. One study reported accuracy as a range (0.93–1), as shown in Table 11.
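To show why accuracy alone can be misleading, the following sketch computes accuracy, precision, recall, and F1 score for a binary high/low load classifier from its predictions. The labels and predictions are hypothetical toy data.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }

# Hypothetical imbalanced test set: 4 high-load (1) and 6 low-load (0) instances.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 0, 0, 1]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

In this toy case the accuracy of 0.70 looks acceptable, yet only half of the high-load instances are detected, which the recall and F1 values make visible.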

All studies demonstrated performance above baseline accuracy. Because the number of classification categories varied across studies, comparing model performance relative to baseline accuracy provides insight into their effectiveness. Excluding the study that reported accuracy as a range, the highest performance improvement among the studies was a 251.2% increase, while the lowest was 39.6%.
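One way to compute such a relative improvement is sketched below. It assumes that the baseline is the chance level for balanced classes (one divided by the number of categories); the text does not state how the baseline was defined in each study, so this definition and the example accuracies are illustrative assumptions only.

```python
def relative_improvement(accuracy, n_classes):
    """Percentage gain of accuracy over a chance-level baseline of 1 / n_classes."""
    baseline = 1.0 / n_classes  # assumption: balanced classes, chance baseline
    return (accuracy - baseline) / baseline * 100.0

# The same hypothetical accuracy of 0.90 means different things
# depending on how many load levels the model must separate.
two_class = relative_improvement(0.90, n_classes=2)
four_class = relative_improvement(0.90, n_classes=4)
print(round(two_class, 1), round(four_class, 1))
```

Under this assumption, a fixed accuracy represents a larger gain as the number of categories grows, which is why improvement over baseline is more comparable across studies than raw accuracy.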

4. Discussion

This study analyzed the characteristics of ten cognitive load classification models developed within the healthcare context using a scoping review methodology. The results in this study can be discussed in terms of different forms of validity relevant to utilizing or developing cognitive load classification models in healthcare education.

First, to develop a usable and valid cognitive load classification model, it is imperative to clearly define the specific type of cognitive load the model quantifies. This ensures the model’s applicability for its intended purpose and guides appropriate utilization. However, the majority of the analyzed studies did not state an exact definition of the cognitive load they measured. Only the study of [27] explicitly classified its cognitive load as intrinsic. For the remaining studies, the target cognitive load type could only be inferred by examining the task and classifying it according to cognitive load theory. To prevent inappropriate use, the target cognitive load type should be explicitly stated in reference to established theory. Distinguishing the type of cognitive load also enables researchers and developers of cognitive load classification models to align their measurement objectives with the appropriate cognitive load type. This can increase content validity, the extent to which a measurement covers all the important aspects of the concept it is intended to measure.

Clearly defining the type of cognitive load also informs task design and data selection, which in turn enhances construct validity, the extent to which a measurement actually captures the theoretical concept it claims to measure. It does so by ensuring that task manipulation distinguishes between types of cognitive load, or between high and low levels of cognitive load, and that the chosen data properly reflect the target cognitive load. Considering the definition of cognitive load and the alignment among the target cognitive load, task, and data enables all stakeholders to assess the content and construct validity of a cognitive load classification model.

Secondly, the cognitive load classification model should be developed using real tasks rather than experimental contexts. This is because the actual situation in which the cognitive load classification model is used is not in a laboratory but in a real context. The model that is valid only in a laboratory context may have low validity in a real situation. Therefore, in order to ensure the consistency of measurement of cognitive load in diverse real-world contexts, it is imperative to employ authentic tasks, a point that has been underscored by [2]. This can enhance ecological validity, the extent to which the findings (or the measurement process) reflect real-world settings, tasks, and conditions. The utilization of authentic tasks enables models to capture the complexity of real-world contexts and increases their applicability in diverse healthcare settings.

Thirdly, a multimodal approach should be considered when measuring cognitive load with physiological data. The objective of the cognitive load classification model is to objectively measure cognitive load [12,13]. To this end, recent attempts have been made to develop a cognitive load classification model using objective and time-based physiological data instead of self-report questionnaires. While physiological data are sensitive to moment-to-moment changes in cognitive load, they can also capture other states unrelated to cognitive load. To validate the use of such data, comparisons should be made with established instruments such as cognitive load questionnaires, which—despite their retrospective nature—are supported by rigorous validation processes. References [6,27,31,32] examined cognitive load occurrence through both self-report and physiological measures. This approach is consistent with criterion validity, which assesses whether results from one measure relate to those from an accepted criterion. Collecting comparative data from validated tools ensures that physiological measures validly capture cognitive load.

Fourthly, the data collected should be closely aligned with the target cognitive load type and the task. For example, in the study of [2], body movement data from the legs, not the hands, were collected to measure the cognitive load that occurs in a walking task. Conversely, the study by [4] used data from the hands to assess the cognitive load in a drug classification task. While both studies measured extraneous cognitive load, the nature of the collected data varied according to the specific task. Ensuring such alignment strengthens validity by excluding irrelevant data and focusing on variables directly related to the task context. In short, what is measured (the data collected) must match the core characteristics of the task and the type of cognitive load it induces (here, extraneous cognitive load).

In considering the practical implementation of cognitive load classification models in real-time healthcare settings, several challenges must be addressed. Latency in processing high-frequency physiological data (e.g., EEG, ECG) has the potential to impede real-time feedback, thus necessitating the development of efficient algorithms and the use of edge computing solutions. The collection of sensitive physiological data gives rise to concerns regarding privacy and data security, particularly in patient-facing environments, thus emphasizing the significance of ethical data governance and robust security infrastructure. The integration of these systems into clinical workflows is hindered by hardware limitations, including the necessity for non-intrusive, wearable, and cost-effective sensors. Addressing these challenges is critical for the successful deployment and adoption of cognitive load models in healthcare education and training environments.
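The latency concern can be illustrated with a sliding-window feature extractor: each window of samples must be processed before the next window is due. The sketch below is an assumption-laden illustration, not a method from the reviewed studies. The class name `SlidingWindow`, the sampling rate, the window and step sizes, and the two summary features are all hypothetical choices.

```python
from collections import deque
from statistics import fmean, pstdev
import time


class SlidingWindow:
    """Buffers a signal and emits simple features every `step` samples."""

    def __init__(self, size, step):
        self.buffer = deque(maxlen=size)  # oldest samples drop out automatically
        self.size = size
        self.step = step
        self._since_last = 0

    def push(self, sample):
        """Add one sample; return a feature dict when a window is due, else None."""
        self.buffer.append(sample)
        self._since_last += 1
        if len(self.buffer) == self.size and self._since_last >= self.step:
            self._since_last = 0
            window = list(self.buffer)
            return {"mean": fmean(window), "std": pstdev(window)}
        return None


SAMPLING_HZ = 256             # hypothetical sampling rate
WINDOW = 2 * SAMPLING_HZ      # 2 s analysis window
STEP = SAMPLING_HZ // 2       # a new feature vector every 0.5 s
budget_s = STEP / SAMPLING_HZ  # time available before the next window is due

extractor = SlidingWindow(WINDOW, STEP)
emitted = 0
worst = 0.0
for n in range(10 * SAMPLING_HZ):      # 10 s of synthetic signal
    sample = (n % 32) / 32.0           # placeholder waveform, not real data
    start = time.perf_counter()
    features = extractor.push(sample)
    elapsed = time.perf_counter() - start
    if features is not None:
        emitted += 1
        worst = max(worst, elapsed)

print(f"{emitted} windows; worst-case {worst * 1000:.3f} ms, budget {budget_s * 1000:.0f} ms")
```

In a deployed system the per-window work would also include artifact removal and classifier inference, so the measured time here is only a lower bound on real processing cost.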

5. Conclusions

This study conducted a scoping review of related studies published from 2015 to 2024 to analyze cognitive load classification models using physiological data in the healthcare field. The results showed that, in terms of task design, tasks that adjusted difficulty to measure intrinsic load were most commonly used, while EEG and body movement data were the most frequently utilized for data collection. In terms of model development, SVM algorithms were most frequently employed, with cross-validation and feature selection being the primary methods to prevent model overfitting. Based on these findings, guidelines for the development of cognitive load classification models were proposed, emphasizing the importance of clear definitions of cognitive load, appropriate task design, reliable ground truth establishment, and contextually relevant data collection.

The primary contribution of this study lies in addressing a critical gap in the existing literature: while numerous studies focus on developing cognitive load classification models, there is a lack of comprehensive, evidence-based guidelines on their development within healthcare education. Unlike previous reviews that primarily summarize model performance, our scoping review provides actionable insights for educators, researchers, and developers. By systematically analyzing the methodologies behind task design, data collection, and model development in healthcare contexts, we offer a unique framework for understanding and optimizing the practical utility of these models. This includes emphasizing the necessity of clearly defining cognitive load types, designing contextually relevant tasks, establishing reliable ground truths, and selecting appropriate physiological data. This practical, guideline-oriented approach significantly advances the field by bridging the gap between model development and real-world application in healthcare education.

Ultimately, the insights derived from this review are fundamental for advancing personalized education within the healthcare domain. By accurately classifying cognitive load through physiological data, educators can adapt instructional strategies in real-time, ensuring that learning materials and activities are optimally challenging—neither overwhelming novice learners nor underutilizing the capacities of advanced students. This adaptive approach aligns with core principles of cognitive load theory [33,34], promoting deeper understanding and retention. For instance, in clinical training, a classification model could identify when a medical student is experiencing excessive extraneous cognitive load during a complex procedure, allowing for immediate intervention and adjustment of training methods. Similarly, for patients and the general public learning about health management, tailoring information delivery based on their cognitive state can significantly improve comprehension and adherence. Thus, the effective use of these models is poised to enhance the effectiveness of healthcare education, leading directly to better learning outcomes for students and professionals, and ultimately, improved health literacy and clinical practice across all populations.

Despite these contributions, this study has limitations. Most notably, the small number of included papers (n = 10) restricts the generalizability of our findings. This is primarily because our stringent inclusion criteria admitted only studies that both collected physiological data directly and developed classification models. Doing both is complex and resource-intensive, so few studies satisfy both conditions. This scarcity highlights the emerging nature of research at the intersection of cognitive load classification and healthcare education and makes broader generalizations challenging. Future research should expand the scope of reviews to include a larger and more diverse body of literature, for example by incorporating additional fields (e.g., cognitive science, human–computer interaction) and literature databases beyond those used in this study. This broader approach is expected to enhance the effectiveness of healthcare education, leading to better learning outcomes and improved healthcare services.

Author Contributions

H.K. conceptualized and designed the study. M.K., H.K. and Y.H. conducted the review of literature and prepared the first draft. M.K., H.K. and Y.H. contributed to the review and revision of the first draft and approved the final version. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the 2025 Yeungnam University Research Grant.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no competing interests.

Abbreviations

AdaBoost	adaptive boosting
AUC-ROC	area under the curve-receiver operating characteristic
BL	blink frequency
CL	cognitive load
ECG	electrocardiogram
EDA	electrodermal activity
EEG	electroencephalogram
EM	eye movement
FFS	forward feature selection
fMRI	functional magnetic resonance imaging
fNIRS	functional near-infrared spectroscopy
GSR	galvanic skin response
HRV	heart rate variability
ICA	independent component analysis
KNN	k-nearest neighbors
LDA	linear discriminant analysis
MOT	multiple object tracking
NASA-TLX	NASA Task Load Index
OSPAN	Operation Span Task
PCA	principal component analysis
PPG	photoplethysmogram
PRISMA	Preferred Reporting Items for Systematic Reviews and Meta-Analyses
PU	pupil dilation
RFE	recursive feature elimination
RSP	respiration rate
ST	skin temperature
SVM	support vector machine

Appendix A. Details of Reviewed Studies

Q1. Cognitive Load Task Design

Study	CL Type	CL Validation	Task Manipulation	Task Type	Task Name	Bloom Taxonomy
Dasgupta et al. (2018) [2]	Extraneous	N	Using Secondary Tasks	Authentic	Walking + Elementary Arithmetic	Applying
Dorum et al. (2020) [13]	Intrinsic	N	Controlling Difficulty	Experimental	Multiple Object Tracking (MOT) Task	Remembering
Kohout et al. (2019) [4]	Extraneous	Task Performance	Using Secondary Tasks	Authentic	Pill Sorting + OSPAN	Applying
Keles et al. (2021) [6]	Overall	Subjective Measurement (NASA-TLX)	N	Authentic	Laparoscopic Surgery Tasks	Applying
Zhang et al. (2017) [29]	Intrinsic	Subjective Measurement (Perceived Task Difficulty), Task Performance	Controlling Difficulty	Authentic	Driving	Applying
Zhou et al. (2020) [27]	Intrinsic	Subjective Measurement (NASA-TLX)	Controlling Difficulty	Authentic	Surgery	Applying
Beiramvand et al. (2023) [8]	Intrinsic	N	Controlling Difficulty	Experimental	N-Back Task	Remembering
Chen & Epps (2019) [35]	Intrinsic	Subjective Measurement (Unspecified)	Controlling Difficulty	Experimental	Elementary Arithmetic	Applying
Gogna et al. (2024) [31]	Intrinsic	Subjective Measurement (NASA-TLX)	Controlling Difficulty	Experimental	Game (spotting the differences in similar-looking pictures)	Analyzing
Yu et al. (2024) [32]	Intrinsic	Subjective Measurement (NASA-TLX)	Controlling Difficulty	Experimental	Elementary Arithmetic	Applying

Q2. Data Collection

Study	Participant Type	Total	Usable	% of Usable	Data Type	Data Sub-type	Instance Unit	Instance Num	Feature Selection
Dasgupta et al. (2018) [2]	Patient	10	10	100	Physiological Data	Body (Leg)	Task	16,291	Filter (Correlation), Wrapper (Recursive Feature Elimination (RFE))
Dorum et al. (2020) [13]	Patient	154	144	93.5	Physiological Data	Brain (fMRI)	Task	432	Unused
Kohout et al. (2019) [4]	Healthcare Professional	8	8	100	Physiological Data	Body (Hand, Hip)	Task	16	Filter (Mann–Whitney U Test), Wrapper (Forward Feature Selection (FFS))
Keles et al. (2021) [6]	Healthcare Professional	33	28	84.9	Physiological, Subjective Data	Brain (fNIRS)	Time	Unreported	Filter (Pearson Correlation)
Zhang et al. (2017) [29]	Patient	20	20	100	Physiological, Subjective, Performance Data	Brain (EEG), Eye (BL, EC, EM, PU), Heart (ECG, RSP, PPG), Skin (GSR, ST), Body (Muscle)	Task	286	Filter (Principal Component Analysis (PCA))
Zhou et al. (2020) [27]	Healthcare Professional	12	12	100	Physiological, Subjective Data	Brain (EEG), Heart (HRV), Body (Hand, Muscle), Skin (GSR)	Task	119	Filter (Independent Component Analysis (ICA))
Beiramvand et al. (2023) [8]	General Public	15	Unreported	Unreported	Physiological Data	Brain (EEG)	Time	Unreported	Unused
Chen & Epps (2019) [35]	General Public	24	24	100	Physiological Data	Body (Head)	Task	336	Filter
Gogna et al. (2024) [31]	General Public	15	15	100	Physiological Data	Brain (EEG)	Task	45	Wrapper (Recursive Feature Elimination (RFE))
Yu et al. (2024) [32]	General Public	20	20	100	Physiological, Subjective Data	Brain (fMRI), Eye (BL, EM, PU)	Task	300	Filter

Q3. Model Development

Study	Specific Model	Feature Fusion Techniques	Overfitting Prevention Methods	Performance Metrics Reported	Accuracy	Performance Compared to Baseline
Dasgupta et al. (2018) [2]	RF, SVM	Feature-Level	Cross-Validation, Feature Selection	Reporting Another Index	Range of 0.93–1	N
Dorum et al. (2020) [13]	LDA	Feature-Level	Cross-Validation	Reporting Only Accuracy	95.8	187.8
Kohout et al. (2019) [4]	SVM	Feature-Level	Cross-Validation, Feature Selection	Reporting Another Index	90.0	80
Keles et al. (2021) [6]	SVM	Feature-Level	Cross-Validation, Feature Selection	Reporting Only Accuracy	90.0	80
Zhang et al. (2017) [29]	KNN	Feature-Level	Cross-Validation, Feature Selection	Reporting Only Accuracy	84.4	68.9
Zhou et al. (2020) [27]	SVM	Feature-Level	Cross-Validation, Feature Selection	Reporting Another Index	83.2	66.4
Beiramvand et al. (2023) [8]	AdaBoost	Feature-Level	Cross-Validation, Data Augmentation	Reporting Another Index	80.9	61.7
Chen & Epps (2019) [35]	Parzen Window	Feature-Level	Cross-Validation, Feature Selection	Reporting Only Accuracy	69.8	39.6
Gogna et al. (2024) [31]	SVM	Feature-Level	Cross-Validation, Feature Selection, Hyperparameter Tuning	Reporting Another Index	91.2	173.9
Yu et al. (2024) [32]	RF	Feature-Level	Cross-Validation, Feature Selection	Reporting Only Accuracy	87.8	251.2
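
Appendix A does not state how the "Performance Compared to Baseline" column was computed. Several rows are numerically consistent with the relative gain over chance-level accuracy, (accuracy − chance) / chance × 100. This reading is an inference from the reported values, not a definition given in the appendix. The sketch below shows that calculation. The function name and the number of classes assumed for each example are hypothetical.

```python
def relative_gain_over_chance(accuracy_pct, n_classes):
    """Percentage improvement of accuracy over chance level (100 / n_classes)."""
    chance = 100.0 / n_classes
    return (accuracy_pct - chance) / chance * 100.0


# Assuming a two-class problem, 90.0% accuracy corresponds to a gain of 80,
# which matches the value listed for the rows reporting 90.0% accuracy.
print(round(relative_gain_over_chance(90.0, 2), 1))

# Assuming a two-class problem, 69.8% accuracy corresponds to 39.6.
print(round(relative_gain_over_chance(69.8, 2), 1))
```

Rows whose listed value exceeds 100 would, under this reading, correspond to problems with more than two classes, where chance level is lower.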

Appendix B. Distribution of Quality Assessment Scores (QualSyst) Across Included Studies

Element	Chen & Epps (2019) [35]	Dasgupta et al. (2018) [2]	Dorum et al. (2020) [13]	Keles et al. (2021) [6]	Kohout et al. (2019) [4]	Zhang et al. (2017) [29]	Zhou et al. (2020) [27]	Beiramvand et al. (2023) [8]	Gogna et al. (2024) [31]	Yu et al. (2024) [32]
1. Objective	2	2	2	2	2	2	2	2	2	2
2. Study design	2	2	2	2	2	2	2	2	2	2
3. Group selection	2	2	2	2	2	2	2	2	2	2
4. Subject characteristics	2	2	2	2	2	2	2	2	2	2
5. Random allocation	.	.	.	.	.	.	.	.	.	.
6. Investigator blinding	.	.	.	.	.	.	.	.	.	.
7. Subject blinding	.	.	.	.	.	.	.	.	.	.
8. Outcome/exposure measures	2	2	2	2	2	2	2	2	2	2
9. Sample size	.	.	.	1	.	.	.	.	.	.
10. Analytical methods	2	2	2	2	2	2	2	2	2	2
11. Variance estimates	2	2	2	2	2	2	2	2	2	2
12. Confounding control	1	1	1	1	1	2	1	1	1	1
13. Results detail	2	2	2	2	2	2	2	2	2	2
14. Conclusions	2	2	2	2	2	2	2	2	2	2
Total	19	19	19	20	19	20	19	19	19	19
Average	0.95	0.95	0.95	0.91	0.95	1.00	0.95	0.95	0.95	0.95
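
The Total and Average rows can be reproduced under two assumptions: that "." marks an item scored as not applicable, and that the Average is the total divided by the maximum possible score (2 points per applicable item), as in QualSyst scoring [16]. The following is a minimal sketch of that calculation. The function name is hypothetical, and the item lists are transcribed from the table above.

```python
def qualsyst_summary(scores):
    """Return (total, summary score) for item scores of 0, 1, 2, or None (not applicable)."""
    applicable = [s for s in scores if s is not None]
    total = sum(applicable)
    max_possible = 2 * len(applicable)
    return total, total / max_possible


NA = None  # assumed meaning of "." in the table

# Items 1-14 in table order.
chen_epps = [2, 2, 2, 2, NA, NA, NA, 2, NA, 2, 2, 1, 2, 2]
keles = [2, 2, 2, 2, NA, NA, NA, 2, 1, 2, 2, 1, 2, 2]
zhang = [2, 2, 2, 2, NA, NA, NA, 2, NA, 2, 2, 2, 2, 2]

for name, items in [("Chen & Epps", chen_epps), ("Keles et al.", keles), ("Zhang et al.", zhang)]:
    total, score = qualsyst_summary(items)
    print(f"{name}: total = {total}, summary score = {score:.2f}")
```

Under these assumptions, a study with one more applicable item (item 9 for Keles et al.) can have a higher total but a lower summary score, which is the pattern seen in the table.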

References

1. Brüggemann, T.; Ludewig, U.; Lorenz, R.; McElvany, N. Effects of mode and medium in reading comprehension tests on cognitive load. Comput. Educ. 2023, 192, 104649.
2. Dasgupta, P.; VanSwearingen, J.; Sejdic, E. “You can tell by the way I use my walk.” Predicting the presence of cognitive load with gait measurements. Biomed. Eng. Online 2018, 17, 122.
3. Sweller, J. Cognitive load theory. In International Encyclopedia of Education, 4th ed.; Elsevier: Amsterdam, The Netherlands, 2023; pp. 127–134.
4. Kohout, L.; Butz, M.; Stork, W. Using Acceleration Data for Detecting Temporary Cognitive Overload in Health Care Exemplified Shown in a Pill Sorting Task. In Proceedings of the 32nd IEEE International Symposium on Computer-Based Medical Systems (IEEE CBMS), Cordoba, Spain, 5–7 June 2019; pp. 20–25.
5. Skulmowski, A.; Xu, K.M. Understanding Cognitive Load in Digital and Online Learning: A New Perspective on Extraneous Cognitive Load. Educ. Psychol. Rev. 2021, 34, 171–196.
6. Keles, H.O.; Cengiz, C.; Demiral, I.; Ozmen, M.M.; Omurtag, A.; Sakakibara, M. High density optical neuroimaging predicts surgeons’s subjective experience and skill levels. PLoS ONE 2021, 16, e0247117.
7. Kucirkova, N.; Gerard, L.; Linn, M.C. Designing personalised instruction: A research and design framework. Br. J. Educ. Technol. 2021, 52, 1839–1861.
8. Beiramvand, M.; Lipping, T.; Karttunen, N.; Koivula, R. Mental Workload Assessment Using Low-Channel Pre-Frontal EEG Signals. In Proceedings of the 2023 IEEE International Symposium on Medical Measurements and Applications, Jeju, Republic of Korea, 14–16 June 2023; pp. 1–5.
9. Anderson, L.W.; Krathwohl, D.R. A Taxonomy for Learning, Teaching, and Assessing: A Revision of Bloom’s Taxonomy of Educational Objectives; Complete edition; Addison Wesley Longman: New York, NY, USA, 2021.
10. Reddy, L.V. Personalized recommendation framework in technology enhanced learning. J. Emerg. Technol. Innov. Res. 2018, 5, 746–750.
11. Mu, S.; Cui, M.; Huang, X. Multimodal Data Fusion in Learning Analytics: A Systematic Review. Sensors 2020, 20, 6856.
12. Wu, C.; Liu, Y.; Guo, X.; Zhu, T.; Bao, Z. Enhancing the feasibility of cognitive load recognition in remote learning using physiological measures and an adaptive feature recalibration convolutional neural network. Med. Biol. Eng. Comput. 2022, 60, 3447–3460.
13. Dørum, E.S.; Kaufmann, T.; Alnæs, D.; Richard, G.; Kolskår, K.K.; Engvig, A.; Sanders, A.-M.; Ulrichsen, K.; Ihle-Hansen, H.; Nordvik, J.E.; et al. Functional brain network modeling in sub-acute stroke patients and healthy controls during rest and continuous attentive tracking. Heliyon 2020, 6, 11.
14. Page, M.J.; McKenzie, J.; Bossuyt, P.; Boutron, I.; Hoffmann, T.; Mulrow, C.; Moher, D. The PRISMA 2020 statement: An updated guideline for reporting systematic reviews. arXiv 2020.
15. Dong, H.; Lio, J.; Sherer, R.; Jiang, I. Some Learning Theories for Medical Educators. Med. Sci. Educ. 2021, 31, 1157–1172.
16. Kmet, L.M.; Lee, R.C.; Cook, L.S. Standard Quality Assessment Criteria for Evaluating Primary Research Papers From a Variety of Fields. (HTA Initiative #13); Alberta Heritage Foundation for Medical Research: Edmonton, AB, Canada, 2004.
17. Sweller, J. Element interactivity and intrinsic, extraneous, and germane cognitive load. Educ. Psychol. Rev. 2010, 22, 123–138.
18. Sarkhani, N.; Beykmirza, R. Patient Education Room: A New Perspective to Promote Effective Education. Asia Pac. J. Public. Heal. 2022, 34, 881–882.
19. Lee, L.; Packer, T.L.; Tang, S.H.; Girdler, S. Self-management education programs for age-related macular degeneration: A systematic review. Australas. J. Ageing 2008, 27, 170–176.
20. Sweller, J.; Ayres, P.; Kalyuga, S. Cognitive Load Theory; Springer: Berlin/Heidelberg, Germany, 2011.
21. de Jong, T. Cognitive load theory, educational research, and instructional design: Some food for thought. Instr. Sci. 2010, 38, 105–134.
22. Sweller, J.; Ayres, P.; Kalyuga, S. The Expertise Reversal Effect. In Cognitive Load Theory, Explorations in the Learning Sciences; Instructional Systems and Performance Technologies; Sweller, J., Ayres, P., Kalyuga, S., Eds.; Springer: New York, NY, USA, 2011; pp. 155–170.
23. Zu, T.; Hutson, J.; Loschky, L.C.; Rebello, N.S. Using eye movements to measure intrinsic, extraneous, and germane load in a multimedia learning environment. J. Educ. Psychol. 2020, 112, 1338.
24. Paas, F.; Renkl, A.; Sweller, J. Cognitive Load Theory: Instructional Implications of the Interaction between Information Structures and Cognitive Architecture. Instr. Sci. 2004, 32, 1–8.
25. Haapalainen, E.; Kim, S.; Forlizzi, J.F.; Dey, A.K. Psycho-Physiological Measures for Assessing Cognitive Load. In Proceedings of the 12th ACM International Conference on Ubiquitous Computing, Copenhagen, Denmark, 26–29 September 2010; pp. 301–310.
26. Guo, J.; Dai, Y.; Wang, C.; Wu, H.; Xu, T.; Lin, K. A physiological data-driven model for learners’ cognitive load detection using HRV-PRV feature fusion and optimized XGBoost classification. Softw. Pr. Exp. 2020, 50, 2046–2064.
27. Zhou, T.; Cha, J.S.; Gonzalez, G.; Wachs, J.P.; Sundaram, C.P.; Yu, D. Multimodal Physiological Signals for Workload Prediction in Robot-assisted Surgery. ACM Trans. Hum. Robot. Interact. 2020, 9, 12.
28. Kumar, V.; Minz, S. Feature selection. SmartCR 2014, 4, 211–229.
29. Zhang, L.; Wade, J.; Bian, D.; Fan, J.; Swanson, A.; Weitlauf, A.; Warren, Z.; Sarkar, N. Cognitive Load Measurement in a Virtual Reality-Based Driving System for Autism Intervention. IEEE Trans. Affect. Comput. 2017, 8, 176–189.
30. Atrey, P.K.; Hossain, M.A.; El Saddik, A.; Kankanhalli, M.S. Multimodal fusion for multimedia analysis: A survey. Multimedia Syst. 2010, 16, 345–379.
31. Gogna, Y.; Tiwari, S.; Singla, R. Evaluating the performance of the cognitive workload model with subjective endorsement in addition to EEG. Med. Biol. Eng. Comput. 2024, 62, 2019–2036.
32. Yu, R.; Chan, A. Effects of player–video game interaction on the mental effort of older adults with the use of electroencephalography and NASA-TLX. Arch. Gerontol. Geriatr. 2024, 124, 105442.
33. Kalyuga, S. Cognitive Load in Adaptive Multimedia Learning. In New Perspectives on Affect and Learning Technologies; Springer: New York, NY, USA, 2011; pp. 203–215.
34. Mihalca, L.; Salden, R.J.; Corbalan, G.; Paas, F.; Miclea, M. Effectiveness of cognitive-load based adaptive instruction in genetics education. Comput. Hum. Behav. 2011, 27, 82–88.
35. Chen, S.; Epps, J. Atomic Head Movement Analysis for Wearable Four-Dimensional Task Load Recognition. IEEE J. Biomed. Health Inform. 2019, 23, 2464–2474.

Figure 1. Framework of classification model development.

Figure 2. Trial flow.

Figure 3. Types of cognitive load.

Figure 4. Validation of cognitive load.

Figure 5. Task type of cognitive load.

Figure 6. Categorization of cognitive load tasks based on Bloom’s Taxonomy.

Figure 7. Instance type.

Figure 8. Feature selection method.

Table 1. Search queries.

Database	Query
Scopus	TITLE-ABS-KEY ((“cognitive load”) AND (physiolog* OR psychophysiolog* OR bio*) AND (data) AND (classif*))
Web of Science	ALL=((“cognitive load”) AND (physiolog* OR psychophysiolog* OR bio*) AND (data) AND (classif*))
PubMed	(“cognitive load”) AND (physiolog* OR psychophysiolog* OR bio*) AND (data) AND (classif*)
EMBASE	((“cognitive load”) AND (physiolog* OR psychophysiolog* OR bio*) AND (data) AND (classification))/br
PsycINFO	(“cognitive load”) AND (physiolog* OR psychophysiolog* OR bio*) AND (data) AND (classification)

Table 2. Inclusion and exclusion criteria.

Criteria	Inclusion	Exclusion
Time period	Published from May, 2014 to 2024
Language	English	Not English
Type of article	Peer-reviewed journal publications	Non-peer-reviewed publications
Type of study	Full-text research papers	Commentaries, editorials, book chapters, letters
Context	Healthcare context	Not related to healthcare context (e.g., data analysis, machine learning, studying, manufacturing)
Study focus	Studies using physiological data collected from experiments to develop classification models of cognitive load	Not collecting data; not using physiological data

Table 3. Coding scheme.

Research Question	Component	Category	Subcategory
1. Cognitive Load Tasks Design	Cognitive load	Type	Extraneous, Intrinsic, Overall
	Cognitive load	Validation	Task Performance, Subjective Measurement
	Tasks	Manipulation	Using Secondary Tasks, Controlling Difficulty
		Type	Authentic, Experimental
		Bloom Taxonomy	Remembering, Understanding, Applying, Analyzing, Evaluating, Creating
2. Data Collection	Participants	Type	Patient, Healthcare Professional, General Public
	Data	Type	Physiological Data, Subjective Data, Performance Data
	Data Preprocessing	Instance Unit	Task, Time
	Data Preprocessing	Feature Selection	Filter, Wrapper
3. Classification Models Development	Classification Model	Specific Model	RF, SVM, LDA, KNN, AdaBoost, Parzen Window
	Classification Model	Overfitting Prevention Methods	Cross-validation, Feature Selection, Hyperparameter Tuning, Dyadic Interaction
	Performance of Model	Performance Metrics Reported	Reporting Another Index, Reporting Only Accuracy

Table 4. Type of cognitive load and task manipulation.

Type of CL	Task Manipulation	Number of Studies
Intrinsic Load	Controlling task difficulty	7
Extraneous Load	Using secondary task	2
Overall Load	Not used	1

Table 5. Validation of cognitive load.

CL Validation		Number of Studies
Only Task Design		3
Task Performance		1
Subjective Measurement	NASA-TLX	4
Subjective Measurement	Unspecified	1
Mixed	Perceived Task Difficulty and Task Performance	1

Table 6. Task type of cognitive load.

Category	Subcategory	Number of Studies
Task Type	Authentic	5
Task Type	Experimental	5
Task Type according to Bloom Taxonomy	Remembering	2
	Understanding	0
	Applying	7
	Analyzing	1
	Evaluating	0
	Creating	0

Table 7. Collected cognitive load data.

Data Category	Data Sub-Category		Number of Studies
Physiological Data	Brain	EEG, fMRI, fNIRS	7
	Body	Hand, Hip, Head, Leg, Muscle	5
	Eye	BL, EM, PU	2
	Heart	ECG, HRV, RSP, PPG	2
	Skin	GSR (EDA), ST	2
Subjective Data		NASA-TLX	4
Subjective Data		Perceived task difficulty	1
Performance Data	Process	Driving	1
Performance Data	Result	Driving	1

Table 8. Instance type and feature selection method.

Category	Sub-Category	Number of Studies
Instance Type	Task level	8
Instance Type	Time level	2
Feature Selection Method	Filter method	5
	Both filter and wrapper methods	3
	Not used	2

Table 9. Classification model.

Category	Number of Studies
Support Vector Machine (SVM)	4
Support Vector Machine (SVM), Random Forest (RF)	1
Random Forest (RF)	1
K-Nearest Neighbors (KNN)	1
Adaptive Boosting (AdaBoost)	1
Linear Discriminant Analysis (LDA)	1
Distribution-Based Classifier (Parzen Window)	1

Table 10. Overfitting prevention method.

Category	Number of Studies
Cross-Validation, Feature Selection	7
Cross-Validation, Feature Selection, Hyperparameter Tuning	1
Cross-Validation, Data Augmentation	1
Cross-Validation	1
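
When cross-validation and feature selection are combined, as in most rows of Table 10, the selection step needs to be fitted on the training folds only. Otherwise information from the held-out data leaks into the model and the accuracy estimate becomes optimistic. The sketch below illustrates this ordering with the standard library. It uses a nearest-centroid classifier as a simple stand-in, not the SVM used in the reviewed studies, and a mean-difference filter as the selection rule. All function names and the synthetic data are hypothetical.

```python
import random
from statistics import fmean


def select_top_k(X, y, k):
    """Filter method: rank features by |mean(class 1) - mean(class 0)|; labels are 0/1."""
    scores = []
    for j in range(len(X[0])):
        m0 = fmean(row[j] for row, label in zip(X, y) if label == 0)
        m1 = fmean(row[j] for row, label in zip(X, y) if label == 1)
        scores.append((abs(m1 - m0), j))
    return [j for _, j in sorted(scores, reverse=True)[:k]]


def fit_centroids(X, y, cols):
    """Compute one centroid per class over the selected feature columns."""
    centroids = {}
    for label in set(y):
        rows = [row for row, lab in zip(X, y) if lab == label]
        centroids[label] = [fmean(r[j] for r in rows) for j in cols]
    return centroids


def predict(centroids, row, cols):
    """Assign the class whose centroid is closest in squared Euclidean distance."""
    point = [row[j] for j in cols]
    return min(
        centroids,
        key=lambda lab: sum((p - c) ** 2 for p, c in zip(point, centroids[lab])),
    )


def cross_validate(X, y, n_folds=5, k=2, seed=0):
    """k-fold cross-validation with feature selection refitted inside every fold."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::n_folds] for i in range(n_folds)]
    accuracies = []
    for test_idx in folds:
        held_out = set(test_idx)
        train_idx = [i for i in idx if i not in held_out]
        X_train = [X[i] for i in train_idx]
        y_train = [y[i] for i in train_idx]
        cols = select_top_k(X_train, y_train, k)   # sees training data only
        model = fit_centroids(X_train, y_train, cols)
        correct = sum(predict(model, X[i], cols) == y[i] for i in test_idx)
        accuracies.append(correct / len(test_idx))
    return fmean(accuracies)


# Synthetic data: feature 0 separates the classes, features 1-2 are uninformative.
y = [i % 2 for i in range(40)]
X = [[y[i] * 4 + (i % 5) * 0.1, (i * 7 % 11) / 11, (i * 3 % 13) / 13] for i in range(40)]

mean_accuracy = cross_validate(X, y)
print(f"mean cross-validated accuracy = {mean_accuracy:.2f}")
```

The same ordering applies to any preprocessing that is fitted to data, such as scaling or dimensionality reduction: it belongs inside the loop over folds.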

Table 11. Model performance.

Category	Number of Studies
Reporting Only Accuracy	5
Reporting Accuracy with Another Index	5

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

