Multimodal Hidden Markov Models for Real-Time Human Proficiency Assessment in Industry 5.0: Integrating Physiological, Behavioral, and Subjective Metrics

Alsanousi, Mowffq M.; Prabhu, Vittaldas V.

doi:10.3390/app15147739

Open Access Article

Mowffq M. Alsanousi ¹,² and Vittaldas V. Prabhu ¹,*

¹ Industrial and Manufacturing Engineering Department, Pennsylvania State University, State College, PA 16802, USA
² Department of Architectural Engineering and Construction Management, King Fahd University of Petroleum and Minerals, Dhahran 31261, Saudi Arabia
* Author to whom correspondence should be addressed.

Appl. Sci. 2025, 15(14), 7739; https://doi.org/10.3390/app15147739

Submission received: 20 June 2025 / Revised: 8 July 2025 / Accepted: 9 July 2025 / Published: 10 July 2025

(This article belongs to the Special Issue Applications of Artificial Intelligence in Industrial Engineering)

Abstract

This paper presents a Multimodal Hidden Markov Model (MHMM) framework designed for real-time human proficiency assessment. It integrates physiological (Heart Rate Variability (HRV)), behavioral (Task Completion Time (TCT)), and subjective (NASA Task Load Index (NASA-TLX)) data streams to infer latent human proficiency states in industrial settings. In a comprehensive simulation study based on published empirical data from the surgical training literature, the MHMM (Trained) achieved 92.5% classification accuracy, significantly outperforming unimodal Hidden Markov Model (HMM) variants (61–63.9%) and performing competitively with advanced models such as Long Short-Term Memory (LSTM) networks (90%) and Conditional Random Field (CRF) models (88.5%). The framework remained robust across stress-test scenarios, including sensor noise, missing data, and imbalanced class distributions. A key advantage of the MHMM over black-box approaches is its interpretability: it provides quantifiable transition probabilities that reveal learning rates, forgetting patterns, and contextual influences on proficiency dynamics. Through dynamic transition matrices, the model captures context-dependent effects, including task complexity and cumulative fatigue. Demonstrated here through simulation, the framework establishes a foundation for developing adaptive operator–AI collaboration systems in Industry 5.0 environments. Its combination of high accuracy, robustness, and interpretability makes the MHMM a promising candidate for future empirical validation in real-world industrial, healthcare, and training applications in which understanding and supporting human proficiency development is critical.

Keywords:

Industry 5.0; human proficiency assessment; Multimodal Hidden Markov Model; real-time monitoring; human–AI collaboration; temporal modeling; cognitive workload

1. Introduction

1.1. From Automation to Augmentation: The Rise of Industry 5.0

The rise of Industry 5.0 marks a shift from automation-driven systems to a more human-centric industrial paradigm. While Industry 4.0 emphasized digitalization, the Internet of Things (IoT), and cyber-physical systems to boost efficiency, it often resulted in increased cognitive and operational demands on human workers [1]. In contrast, Industry 5.0 emphasizes collaboration between humans and intelligent machines, aiming for enhanced adaptability, resilience, and sustainability [2]. It envisions empowering workers through systems that are not only precise but also empathetic and responsive to human needs [2].

A compelling example from Zakeri et al. (2023) illustrates this need: in a smart factory, an operator collaborating with a collaborative robot (cobot) on a pick-and-place task that incorporated a Stroop-style secondary task experienced a marked increase in missed-beep errors due to high task complexity and a cobot speed of 1 m/s, an overload that NASA Task Load Index (NASA-TLX) scores registered only partially and less sensitively than neural measures [3]. A multimodal approach using electroencephalography (EEG) and functional near-infrared spectroscopy (fNIRS) data predicted missed-beep events with 77.8% accuracy, enabling the timely detection of cognitive overload and pointing to opportunities for adaptive support. This highlights the critical demand for dynamic, real-time assessment tools [3] and motivates the development of the proposed Multimodal Hidden Markov Model (MHMM). Operators in this evolving context must interpret complex real-time data, interact with cobots, and manage unexpected disruptions, tasks that demand continuous cognitive, physiological, and behavioral engagement [4]. These dynamics call for AI systems that are human-aware, capable of supporting operators in real time, and fully aligned with the adaptive and inclusive vision of Industry 5.0 [5].

1.2. Limitations of Static and Unimodal Proficiency Assessments

Traditional proficiency assessments, such as retrospective self-reports (e.g., NASA-TLX), unimodal physiological indicators (e.g., heart rate variability [HRV]), or basic behavioral metrics (e.g., task completion time [TCT]), are inadequate for capturing the dynamic and complex demands of modern performance environments [6,7,8]. These methods exhibit several critical limitations:

Temporal Disconnect: Retrospective tools, such as NASA-TLX, fail to capture moment-to-moment fluctuations in workload or proficiency, limiting their ability to reflect real-time cognitive demands [6].
Subjective Variability: Self-reported data are susceptible to recall bias, fatigue, and individual differences in perception, reducing reliability [6,9].
Incomplete Insight: Unimodal metrics, such as HRV or TCT, often overlook contextual or cognitive nuances essential for comprehensive performance assessment [10,11,12,13].

Even advanced tools such as the Surgery Task Load Index (SURG-TLX) inherit many of these limitations, including subjective bias and a lack of real-time granularity [14]. Similarly, unimodal models struggle to generalize across diverse settings due to noise and individual physiological or behavioral variability [12,15,16]. To address these shortcomings, a new assessment paradigm is required: one that integrates context-sensitive, multimodal data (e.g., combining physiological, behavioral, and contextual inputs) and operates in real time to provide a more accurate and holistic evaluation of proficiency [12,17].

The dynamic nature of human proficiency in Industry 5.0 environments, influenced by factors such as learning, fatigue, and task complexity, necessitates assessment tools capable of capturing these temporal fluctuations. Unlike traditional static evaluations that offer only a snapshot in time, a robust assessment framework must track skill acquisition (e.g., the transition from a ‘Novice’ to an ‘Intermediate’ state [18,19]) and identify periods of skill decay or fatigue-induced regression [12]. This real-time understanding of an operator’s evolving competence is crucial for developing human-centric AI systems that can provide adaptive support, personalize training interventions, and optimize human–AI collaboration [17,20].

1.3. Multimodal Integration for Comprehensive Proficiency Assessment

Proficiency is a multidimensional construct influenced by diverse data streams:

Physiological Metrics: HRV serves as a reliable indicator of stress and cognitive workload [21].
Behavioral Metrics: TCT measures task efficiency and motor fluency, reflecting operational performance [22].
Subjective Metrics: NASA-TLX quantifies the perceived workload and effort, capturing the operator’s subjective experience [23].

Because each stream captures a distinct facet of proficiency, integrating them offers a robust basis for context-aware, real-time proficiency assessment in high-stakes industrial environments.

1.4. The Multimodal Hidden Markov Model (MHMM): A Dynamic Solution

To overcome the limitations of static and unimodal assessments, this research proposes a Multimodal Hidden Markov Model (MHMM). The MHMM is parameterized using published empirical data from surgical training studies (Shrivastava et al., 2025) [24], so that its parameters reflect real-world physiological, behavioral, and subjective metrics. It dynamically infers latent proficiency states (Novice, Intermediate, and Expert) using three complementary observation streams:

HRV (Physiological): Indicates stress and cognitive load.
TCT (Behavioral): Reflects task fluency.
NASA-TLX (Subjective): Captures perceived task demand.

By integrating these streams, the MHMM models temporal dynamics within a probabilistic framework, capturing transitions such as learning progression, fatigue onset, or proficiency decay. Grounded in classic learning theories (Dreyfus’s five-stage model and Fitts & Posner’s three-phase framework), the MHMM is parameterized with empirical benchmarks drawn from surgical simulation studies and smart-factory task-analysis datasets [24,25,26,27,28]. In contrast to black-box architectures such as deep neural networks, the MHMM provides interpretable transition matrices and state-specific emission probabilities, enabling real-time insights into operator development and risk.
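The fusion and temporal updating described above can be sketched as a normalized forward (filtering) recursion. This is a minimal illustration under stated assumptions, not the authors' implementation: each modality is given a univariate Gaussian emission per state, HRV, TCT, and NASA-TLX are treated as conditionally independent given the hidden state (so their likelihoods multiply), and every number in `INITIAL`, `TRANSITION`, and `EMISSIONS` is a hypothetical placeholder rather than a value estimated from [24].

```python
import math

STATES = ["Novice", "Intermediate", "Expert"]

# Hypothetical parameters for illustration only (not estimated from [24]).
INITIAL = [0.80, 0.15, 0.05]
TRANSITION = [
    [0.85, 0.14, 0.01],  # from Novice
    [0.05, 0.85, 0.10],  # from Intermediate
    [0.01, 0.09, 0.90],  # from Expert
]
# Per-state (mean, std): HRV as RMSSD in ms, TCT in s, NASA-TLX on 0-100.
EMISSIONS = {
    "hrv": [(25.0, 5.0), (35.0, 5.0), (45.0, 5.0)],
    "tct": [(300.0, 40.0), (220.0, 30.0), (160.0, 20.0)],
    "tlx": [(70.0, 10.0), (50.0, 10.0), (30.0, 10.0)],
}


def gaussian_pdf(x, mean, std):
    """Density of a univariate normal distribution at x."""
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2 * math.pi))


def emission_likelihood(state, obs):
    """Joint likelihood of one multimodal observation, assuming the modalities
    are conditionally independent given the hidden state."""
    likelihood = 1.0
    for modality, value in obs.items():
        mean, std = EMISSIONS[modality][state]
        likelihood *= gaussian_pdf(value, mean, std)
    return likelihood


def forward_filter(observations):
    """Normalized forward algorithm: P(state_t | obs_1..t) for each time step t."""
    n = len(STATES)
    beliefs = []
    for t, obs in enumerate(observations):
        if t == 0:
            prior = INITIAL
        else:  # one-step prediction through the transition matrix
            prev = beliefs[-1]
            prior = [sum(prev[i] * TRANSITION[i][j] for i in range(n)) for j in range(n)]
        unnormalized = [prior[j] * emission_likelihood(j, obs) for j in range(n)]
        total = sum(unnormalized)
        beliefs.append([u / total for u in unnormalized])
    return beliefs


trials = [
    {"hrv": 24.0, "tct": 310.0, "tlx": 72.0},
    {"hrv": 33.0, "tct": 230.0, "tlx": 52.0},
    {"hrv": 46.0, "tct": 165.0, "tlx": 28.0},
]
for t, belief in enumerate(forward_filter(trials), start=1):
    best = max(range(len(STATES)), key=belief.__getitem__)
    print(t, STATES[best], [round(p, 3) for p in belief])
```

Each printed line gives the most probable state and the full belief after a trial. Normalizing at every step keeps the values well scaled; for long sequences a log-space implementation would be the safer choice.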

Aligned with the core tenets of Industry 5.0, the proposed MHMM advances the following:

Human-centric AI by tailoring assistance to operator states.
Operational resilience through the early detection of overload or regression.
Sustainable workforce development by proactively mitigating burnout.

The MHMM offers a scalable and interpretable framework for context-aware, real-time proficiency monitoring, supporting adaptive human–AI collaboration in safety-critical and cognitively demanding environments.

2. Literature Review

2.1. Evolution from Industry 4.0 to Industry 5.0: Cognitive Burden and Human–AI Collaboration

The industrial environment has undergone significant transformation due to the digitalization and automation associated with Industry 4.0. This revolution is marked by the integration of cyber-physical systems, the IoT, and advanced data analytics, which have enhanced productivity, operational efficiency, and connectivity across sectors, as noted by Arcelay et al. [29,30]. However, these advancements have introduced challenges for human operators, including heightened cognitive burdens and mental stress caused by complex human–machine interfaces and continuous monitoring requirements [31,32]. Research indicates that while collaborative robots (cobots) are designed to assist human workers, their unpredictable behaviors and lack of contextual awareness can increase the cognitive workload and stress for operators [31,32].

The term Operator 4.0 refers to the modern worker who must manage, troubleshoot, and adapt to increasingly complex systems, a role whose skill demands can adversely affect mental health [30]. In response, Industry 5.0 emerges as a corrective approach that emphasizes human-centricity, sustainability, and resilience, highlighting the essential role of human workers in production processes [33]. This new paradigm envisions a future in which technology not only enhances human creativity and critical thinking but also fosters complex problem-solving skills rather than supplanting them [2].

Central to Industry 5.0 is the idea of Operator 5.0, where humans and machines collaborate synergistically, leveraging AI’s precision alongside human traits such as judgment and adaptability [33]. This shift underscores the importance of integrating advanced human–machine interfaces (HMIs), digital twins, and AI-driven co-creation tools, all aimed at enhancing and supporting human capabilities in real time [34]. Furthermore, Industry 5.0 promotes the development of personalized, sustainable, and resilient industrial ecosystems, prioritizing inclusivity, workers’ well-being, and ongoing innovation [35].

2.2. Limitations of Traditional Proficiency Assessment Models

Traditional models for assessing human proficiency face substantial challenges when applied to the dynamic contexts of Industry 5.0. These models often rely on static and retrospective methodologies that can limit their effectiveness in rapidly changing industrial environments characterized by increased collaboration between humans and machines. For instance, the NASA-TLX is a conventional assessment tool that evaluates workload and performance after task completion. However, this approach does not adequately reflect the temporal dynamics of human performance related to fluctuating factors, such as learning, fatigue, and stress [36].

Moreover, the NASA-TLX has methodological limitations of its own, such as its forced pairwise ranking procedure for weighting subscales, which some authors argue may limit the reliability of workload assessments in fast-paced industrial settings [6,37]. Unimodal approaches, whether they focus on physiological, behavioral, or subjective metrics, fail to capture the complex, context-dependent nature of real-world operator performance, leaving significant gaps in understanding human proficiency [17].
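To make the pairwise procedure concrete, the following sketch computes the standard weighted NASA-TLX score: each of the six subscales is rated on a 0–100 scale, each subscale's weight is the number of times it is chosen across the 15 pairwise comparisons (so the weights sum to 15), and the overall workload is the weighted sum of ratings divided by 15. The commonly used unweighted "Raw TLX" variant is included for comparison. The function names, the dictionary input format, and the respondent data are illustrative assumptions.

```python
from itertools import combinations

SUBSCALES = ["mental", "physical", "temporal", "performance", "effort", "frustration"]
PAIRS = [frozenset(pair) for pair in combinations(SUBSCALES, 2)]  # the 15 comparisons


def tlx_weights(choices):
    """Weights from the forced pairwise comparisons.

    `choices` maps each pair (a frozenset of two subscale names) to the subscale the
    respondent judged more relevant; each weight is a tally from 0 to 5, summing to 15.
    """
    if set(choices) != set(PAIRS):
        raise ValueError("choices must cover exactly the 15 subscale pairs")
    weights = {s: 0 for s in SUBSCALES}
    for pair, winner in choices.items():
        if winner not in pair:
            raise ValueError(f"{winner!r} is not part of the pair {sorted(pair)}")
        weights[winner] += 1
    return weights


def weighted_tlx(ratings, weights):
    """Overall workload: sum of rating x weight divided by 15 (ratings on 0-100)."""
    return sum(ratings[s] * weights[s] for s in SUBSCALES) / 15


def raw_tlx(ratings):
    """'Raw TLX' variant: unweighted mean of the six ratings, skipping the pairwise step."""
    return sum(ratings[s] for s in SUBSCALES) / len(SUBSCALES)


# Hypothetical respondent who always prefers the subscale listed first in SUBSCALES.
choices = {frozenset((a, b)): a for a, b in combinations(SUBSCALES, 2)}
ratings = {"mental": 80, "physical": 20, "temporal": 60,
           "performance": 40, "effort": 70, "frustration": 30}
weights = tlx_weights(choices)
print(weights)
print(weighted_tlx(ratings, weights), raw_tlx(ratings))  # -> 54.0 50.0
```

The gap between the two scores shows how the pairwise weights can shift the overall workload estimate, which is one reason the weighting step is debated.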

The lack of temporal and contextual resolution in these traditional models is a critical impediment to achieving the adaptive and collaborative aspirations of Industry 5.0. Real-time assessments of proficiency must evolve to account for myriad factors, including task complexity and individual operator characteristics [12]. Thus, assessment systems must develop capabilities for the real-time monitoring and interpretation of these variables, providing timely insights that facilitate adaptive behaviors and effective human–AI collaboration [17].

2.3. The Multifaceted Nature of Human Proficiency

The multifaceted nature of human proficiency underscores the intricate interplay between cognitive, physiological, and behavioral factors affecting an individual’s ability to perform tasks effectively and adapt to varying demands. Traditional assessments often fall short of capturing this complexity, necessitating a nuanced understanding of various dimensions of human performance.

Cognitive factors, including attention, perception, memory, and decision making, play a significant role in proficiency. Chang et al. [38] report that cognitive performance can improve after physical activity, which highlights the dynamic nature of cognition and its susceptibility to various influencing factors. Furthermore, fatigue and stress affect cognitive functions that directly correlate with performance outcomes, underscoring the need for the continuous monitoring of these factors [6].

Physiological factors can also be informative, with indicators such as HRV and electrodermal activity serving as objective metrics of stress and cognitive load [39]. Studies illustrate that the integration of these physiological signals can enhance our understanding of overall readiness and performance capabilities, affirming that physiological responses provide valuable insights into cognitive stress and the behavior of the autonomic nervous system [39].

Behavioral factors, which manifest as observable metrics such as TCT, accuracy, and performance consistency, reflect underlying cognitive and physiological states [13]. For instance, an operator may exhibit stable behavioral performance while experiencing elevated cognitive load; such discrepancies between subjective and objective metrics could lead to misclassifications of proficiency if assessed by unimodal methods alone [15].

2.4. Rationale for Multimodal Assessment

The rationale for multimodal assessment stems from the limitations of unimodal approaches; research consistently shows that integrating various data sources yields superior accuracy and reliability in evaluating human proficiency. Multimodal assessments have demonstrated performance improvements, reflecting a more comprehensive view of human proficiency in dynamic environments [3]. The synthesis of physiological indicators, behavioral data, and subjective evaluations fosters a rich understanding of performance, which is critical for the adaptive and collaborative goals inherent in Industry 5.0 [17].

A comprehensive assessment of human proficiency must consider cognitive, physiological, and behavioral dimensions concurrently. Recognizing the interdependence of these factors and adopting a multimodal approach will provide a more accurate and actionable understanding of operator performance in complex, dynamic settings.

2.5. Three-Level Proficiency Classification Framework

The Three-Level Proficiency Classification Framework (Novice, Intermediate, and Expert) aligns with the established cognitive science literature, particularly models of skill acquisition, such as the Dreyfus Model and Cognitive Load Theory. This framework emphasizes the developmental stages of expertise as follows:

Novice: Individuals at this level demonstrate rule-based behavior that requires explicit instruction and guidance. They frequently experience substantial cognitive load during task execution. Research indicates that Cognitive Load Theory significantly influences learning outcomes, particularly in problem-solving contexts [40]. For example, studies suggest that appropriate cognitive load management can improve problem-solving skills by creating conducive learning environments for novices [40].
Intermediate: At this proficiency level, individuals can perform tasks independently in familiar contexts, having developed procedural skills. However, they may still struggle with new or complex situations. Evidence suggests that intermediate learners exhibit varying levels of cognitive workload that affect their efficiency in managing routine tasks. Monitoring cognitive workload via physiological metrics, such as HRV, has been examined in settings where cognitive task demands vary, indicating that these fluctuations can lead to performance inconsistencies in unfamiliar scenarios [21,41].
Expert: Experts demonstrate highly flexible and effective performance, marked by a comprehensive understanding of their tasks and systems. They can foresee potential issues and make informed decisions, relying on a reduced perceived cognitive load for the routine components of tasks [42]. Research suggests that the assessment of cognitive workload is crucial for understanding expert performance, indicating that experts can maintain high performance despite dynamic cognitive demands [41,42]. Moreover, effective feedback is critical during the skill acquisition process for experts, enhancing their learning outcomes [43].

2.6. Rationale for Selected Proficiency Indicators: HRV, TCT, and NASA-TLX

The rationale for selecting HRV, TCT, and the NASA-TLX as primary indicators of proficiency is supported by empirical literature that highlights the relevance of these metrics for capturing the complexity of human performance in dynamic settings.

HRV is a well-established physiological metric that reflects the autonomic nervous system’s activity, making it an important indicator of an individual’s mental workload, stress, and overall arousal levels. Research indicates that HRV can be effectively used to assess cognitive load and emotional states, as it demonstrates sensitivity to variations in psychological stress and cognitive effort [21]. Studies have shown that HRV is associated with attentional control and emotional regulation, which are crucial for maintaining performance under pressure [44]. Time-domain and frequency-domain metrics, such as Standard Deviation of Normal-to-Normal intervals (SDNN) and Root Mean Square of Successive Differences (RMSSD) for time-domain analysis, and Low-Frequency (LF) and High-Frequency (HF) power for frequency-domain analysis, provide insights into autonomic balance and are relevant for understanding how HRV responds to cognitive demands [44].
TCT serves as a direct and objective metric of performance efficiency. It is commonly utilized in studies related to skill acquisition. Research indicates that reductions in TCT over successive trials typically signify improvements in proficiency and learning, while increases or variability in TCT can indicate challenges, such as task difficulty or operator fatigue [23]. Studies in simulation environments confirm TCT’s effectiveness as an objective tool for measuring skills and strategies, reinforcing its relevance to assessing proficiency in real-time systems [23].
NASA-TLX is a validated measure of perceived mental workload, providing insights into the subjective experience of task demands through its multi-dimensional framework. Although it is not designed for continuous real-time monitoring, its inclusion in a multimodal assessment framework enhances the understanding of how operators perceive and manage workloads. Research illustrates that perceived workload can diverge from objective performance indicators, highlighting the importance of subjective assessments in human-centered evaluation approaches [45]. Moreover, empirical evidence shows that higher NASA-TLX scores are significantly associated with lower self-reported job satisfaction, indicating that the instrument can also serve as an indirect proxy for job-satisfaction levels when task demands are high [46].
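The time-domain HRV metrics named above follow directly from their definitions. The sketch below assumes a list of artifact-free normal-to-normal (NN) intervals in milliseconds (ectopic-beat correction is outside its scope), uses the sample standard deviation for SDNN, and adds a hypothetical sliding-window helper of the kind a real-time pipeline might use. Frequency-domain LF/HF power is omitted because it additionally requires resampling and spectral estimation.

```python
import math
import statistics


def sdnn(nn_ms):
    """SDNN: sample standard deviation of normal-to-normal (NN) intervals, in ms."""
    return statistics.stdev(nn_ms)


def rmssd(nn_ms):
    """RMSSD: root mean square of successive differences between adjacent NN intervals, in ms."""
    diffs = [later - earlier for earlier, later in zip(nn_ms, nn_ms[1:])]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


def sliding(nn_ms, window, metric):
    """Hypothetical helper: apply an HRV metric to every run of `window`
    consecutive intervals (advancing one beat at a time)."""
    if window < 2:
        raise ValueError("window must contain at least two intervals")
    return [metric(nn_ms[i:i + window]) for i in range(len(nn_ms) - window + 1)]


nn = [800, 810, 790, 805, 795]  # hypothetical, artifact-free NN intervals (ms)
print(round(sdnn(nn), 2), round(rmssd(nn), 2))  # -> 7.91 14.36
print([round(v, 2) for v in sliding(nn, 3, rmssd)])
```

RMSSD reflects beat-to-beat (largely vagal) variation, whereas SDNN reflects overall variability, which is why the two can diverge on the same recording.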

The combination of HRV, TCT, and NASA-TLX creates a balanced framework for assessing proficiency. HRV provides physiological insights into an operator’s internal state, TCT offers a measure of task performance that is likely to link to higher-level productivity metrics, and NASA-TLX captures subjective perceptions of workload. This triad enables a comprehensive understanding of human proficiency, ensuring that critical aspects of performance are not overlooked, which is often the case in unimodal assessments [23,44,47].

Although HRV, TCT, and NASA-TLX were selected because they are widely validated and practical across many domains, the framework is deliberately metric-agnostic: any appropriate trio of physiological (e.g., HRV, EEG, fNIRS), behavioral (e.g., TCT, error-rate, eye-movement latency), and subjective (e.g., NASA-TLX, SURG-TLX) indicators can be substituted to suit the constraints of a specific application, provided that all three information channels remain represented [48,49].

2.7. Sensor Types for Multimodal Data Acquisition

In the emerging Industry 5.0 paradigm, the ability to fuse physiological, behavioral, and subjective data into a multimodal proficiency-assessment pipeline is increasingly viewed as a foundational pillar of truly human-centric production ecosystems [3]. Table 1 summarizes key sensor modalities, types, and devices for capturing HRV, TCT, and NASA-TLX; combining these modalities enhances reliability over unimodal methods [50].

HRV, a sensitive indicator of cognitive workload, is measured using high-fidelity ECG sensors (e.g., Polar H10) or wearable PPG sensors (e.g., Apple Watch, Oura Ring) [3]. TCT is captured via software event logs, motion sensors (e.g., accelerometers), or RFID tools, providing precise behavioral data [51]. NASA-TLX, the standard for subjective workload, uses digital or paper questionnaires, with mobile apps enabling real-time collection [50].
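As an illustration of deriving TCT from software event logs, the sketch below pairs start and end events per trial. The record layout (ISO-8601 timestamp, trial identifier, and a `start`/`end` event label) is a hypothetical schema; logs from real production software, motion sensors, or RFID readers will differ and usually need more careful handling of repeated or out-of-order events.

```python
from datetime import datetime


def task_completion_times(events):
    """TCT in seconds per trial from (ISO-8601 timestamp, trial_id, event) records.

    Trials lacking a 'start' or an 'end' event (e.g., abandoned trials) are omitted.
    """
    starts, ends = {}, {}
    for timestamp, trial, kind in events:
        when = datetime.fromisoformat(timestamp)
        if kind == "start":
            starts[trial] = when
        elif kind == "end":
            ends[trial] = when
    return {trial: (ends[trial] - start).total_seconds()
            for trial, start in starts.items() if trial in ends}


log = [
    ("2025-07-01T09:00:00", "T1", "start"),
    ("2025-07-01T09:04:10", "T1", "end"),
    ("2025-07-01T09:05:00", "T2", "start"),
    ("2025-07-01T09:08:30", "T2", "end"),
    ("2025-07-01T09:10:00", "T3", "start"),  # no matching 'end'
]
print(task_completion_times(log))  # -> {'T1': 250.0, 'T2': 210.0}
```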

2.8. Modeling Approaches: A Comparative Analysis of Temporal Models

To develop a robust comparative analysis of the MHMM against other prominent temporal modeling techniques, it is essential to draw on existing literature that covers the strengths and weaknesses of each model, as well as their applicability to dynamic assessment scenarios. Below is a summarized comparative analysis with references that support the characteristics of each modeling approach discussed.

Long Short-Term Memory (LSTM) Networks: LSTM networks are advanced neural networks capable of capturing long-range dependencies in sequential data, making them highly effective for various tasks, including language modeling and time-series prediction. They achieve superior accuracy, but at the cost of interpretability. The complexity of LSTMs can lead to a “black-box” effect, where understanding the rationale behind predictions is challenging [52]. Recent studies have shown that LSTMs outperform traditional models in various contexts due to their flexibility and power [53].
Conditional Random Fields (CRFs): CRFs serve as a powerful discriminative model that incorporates the entire context of sequences for prediction. They effectively avoid the independence assumptions typical of generative models, thereby improving the accuracy of predictions. However, training CRFs can be computationally intensive, and they might fall short of capturing very long-range dependencies compared to LSTMs [54,55]. Their integration into applications such as image labeling and natural language processing showcases their versatility [56].
Markov Chains: Markov chains are well studied and highly interpretable because they assume the system’s state is fully observable at every time step, so the transition matrix can be inspected directly [57]. This transparency is valuable when clear state-to-state dynamics must be reported. However, a plain Markov chain has two key limitations for multimodal proficiency assessment:
o It cannot represent hidden or latent proficiency states, because there is no observation model mapping unobserved states to measured signals; problems that involve unobservable constructs therefore require more expressive models such as Hidden Markov Models [57].
o When multiple data streams (e.g., physiological, behavioral, subjective) are concatenated into a composite state, the number of distinct states grows exponentially with the number of streams, producing the classic curse of dimensionality and making the approach impractical for rich multimodal data [58].
Unimodal Hidden Markov Model (HMM): Unimodal HMMs are proficient at modeling temporal sequences and are well documented for their effectiveness in applications such as speech recognition and time series analysis. Their strength lies in their ability to capture temporal dependencies clearly, according to Liao et al. [59]. However, they are inherently limited by their inability to incorporate multiple data streams, leading to a potentially incomplete assessment of human proficiency [57].
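The dimensionality argument against a plain Markov chain can be quantified with a short calculation. Assuming, for illustration, that each of k streams is discretized into m levels, a composite observable state has m^k possible values and a full transition matrix has m^k(m^k − 1) free parameters. By contrast, a hidden-state model with N states and one Gaussian (mean and variance) per state and stream needs (N − 1) + N(N − 1) + 2Nk parameters. The discretization and the Gaussian emission form are assumptions made only for this count.

```python
def composite_chain_size(levels, streams):
    """States and free transition parameters of a fully observed Markov chain whose
    state concatenates `streams` signals, each discretized into `levels` bins."""
    n_states = levels ** streams
    return n_states, n_states * (n_states - 1)


def gaussian_hmm_size(hidden_states, streams):
    """Free parameters of an HMM with one Gaussian (mean, variance) per state and stream."""
    initial = hidden_states - 1
    transition = hidden_states * (hidden_states - 1)
    emission = 2 * hidden_states * streams
    return initial + transition + emission


for m in (3, 5, 10):
    n_states, n_params = composite_chain_size(m, streams=3)
    print(f"{m} levels per stream: {n_states} composite states, {n_params} transition parameters")
print("3-state Gaussian HMM over 3 streams:", gaussian_hmm_size(3, 3), "parameters")
# 3 levels per stream: 27 composite states, 702 transition parameters
# 5 levels per stream: 125 composite states, 15500 transition parameters
# 10 levels per stream: 1000 composite states, 999000 transition parameters
# 3-state Gaussian HMM over 3 streams: 26 parameters
```

With three streams, moving from 3 to 10 levels per stream inflates the composite chain from 702 to 999,000 transition parameters, while the three-state hidden model stays at 26.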

Table 2 compares these temporal modeling approaches.

The choice of modeling approach hinges on the trade-off between predictive accuracy and interpretability. Advanced models, such as LSTMs, tend to deliver high accuracy but may lack interpretability. In contrast, models such as CRFs and HMMs, while providing more interpretability, may not always reach the same level of predictive performance, especially in dynamic and multimodal scenarios. The generative framework of the MHMM offers a clearer and more interpretable structure that can directly connect to real-world performance concepts, which is essential for applications focused on enhancing human operator understanding and support.

2.9. The Need for Temporal Models and the Markov Chain Foundation

Human proficiency is best understood as a dynamic process rather than a static trait, necessitating a model that can interpret and process temporal data. Operators experience fluctuations in skill levels due to factors such as learning, fatigue, and task complexity. Static models that capture a single point in time do not align with this understanding, emphasizing the need for temporal models that accommodate changing states over time.

Markov chains provide a foundational framework for these temporal models. A Markov chain is a statistical model in which the probability of transitioning to the next state depends solely on the current state, which allows transitions between proficiency states, such as from Novice to Intermediate, to be assessed. These transition probabilities can yield insights into learning rates and the stability of proficiency levels [18,59]. However, traditional Markov chain models assume that these states are directly observable, whereas true proficiency is inherently latent and must be inferred from observable behavior rather than measured directly [57].

The limitation of observing only behavior emphasizes the necessity of advanced models that can infer latent constructs. As pointed out in the literature, combining objective metrics with hidden states can enrich the understanding of underlying proficiency mechanisms [19]. Therefore, leveraging temporal models such as the Markov chain, while addressing their inability to represent latent states, can enhance our ability to understand human proficiency dynamics in real-time settings.
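When proficiency labels are treated as directly observed, as a plain Markov chain requires, the maximum-likelihood transition matrix is obtained by counting consecutive state pairs and normalizing each row. The sketch below uses a hypothetical labeled sequence; the optional pseudo-count `alpha` (additive smoothing) is an illustrative addition not specified in the source. The need for such ground-truth labels is precisely the limitation discussed above.

```python
STATES = ["Novice", "Intermediate", "Expert"]


def estimate_transitions(sequence, states=STATES, alpha=0.0):
    """Row-normalized transition counts (the MLE when alpha == 0).

    alpha > 0 adds a pseudo-count to every cell; with alpha == 0, a state that
    never appears as a predecessor yields an all-zero row.
    """
    index = {state: k for k, state in enumerate(states)}
    n = len(states)
    counts = [[alpha] * n for _ in range(n)]
    for prev, nxt in zip(sequence, sequence[1:]):
        counts[index[prev]][index[nxt]] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        matrix.append([c / total if total > 0 else 0.0 for c in row])
    return matrix


observed = ["Novice", "Novice", "Intermediate", "Intermediate", "Intermediate",
            "Expert", "Intermediate", "Expert", "Expert"]
for state, row in zip(STATES, estimate_transitions(observed)):
    print(state, row)
# Novice [0.5, 0.5, 0.0]
# Intermediate [0.0, 0.5, 0.5]
# Expert [0.0, 0.5, 0.5]
```

In this toy sequence, the Intermediate-to-Expert entry can be read as a crude learning rate and the Expert-to-Intermediate entry as a regression rate, which is the kind of interpretation a transition matrix affords.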

2.10. The HMM as the Superior Choice

The HMM is particularly suited for this research due to its ability to address the limitations inherent in standard Markov Chains. Unlike a conventional Markov Chain, which only considers observable states, the HMM incorporates a layer of hidden states—such as Novice, Intermediate, and Expert—which evolve over time according to defined transition probabilities. These states are not directly measurable but are inferred from observable data streams that represent an individual’s proficiency.
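In standard HMM notation, with hidden states s_t ∈ {Novice, Intermediate, Expert}, initial probabilities π_i = P(s_1 = i), transition probabilities a_ij = P(s_t = j | s_{t−1} = i), and a multimodal observation o_t = (o_t^(HRV), o_t^(TCT), o_t^(TLX)), the model factorizes as shown below. The per-modality product in the emission term b_j holds only if the three streams are assumed conditionally independent given the state, an assumption made here for illustration; the last line is the forward recursion that yields the real-time state estimate.

```latex
P(s_{1:T}, \mathbf{o}_{1:T}) = \pi_{s_1}\, b_{s_1}(\mathbf{o}_1) \prod_{t=2}^{T} a_{s_{t-1} s_t}\, b_{s_t}(\mathbf{o}_t)
b_j(\mathbf{o}_t) = \prod_{m \in \{\mathrm{HRV},\, \mathrm{TCT},\, \mathrm{TLX}\}} b_j^{(m)}\big(o_t^{(m)}\big)
\alpha_1(j) = \pi_j\, b_j(\mathbf{o}_1), \qquad
\alpha_t(j) = b_j(\mathbf{o}_t) \sum_{i} \alpha_{t-1}(i)\, a_{ij}, \qquad
P(s_t = j \mid \mathbf{o}_{1:t}) = \frac{\alpha_t(j)}{\sum_{k} \alpha_t(k)}
```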

Probabilistic Linking: The HMM probabilistically connects these hidden proficiency states to observable data, such as physiological metrics (e.g., HRV), behavioral metrics (e.g., TCT), and subjective feedback (e.g., responses on the NASA-TLX). This linkage allows a comprehensive understanding of proficiency that is informed by real-world indicators [59].
Latent Proficiency Assessment: The structure of the HMM aligns with the challenge of proficiency assessment by acknowledging that true proficiency is a latent construct that cannot be measured directly. Instead, it infers proficiency through related, observable behaviors and states, thus enriching the assessment process. This capability to capture temporal learning rates and state transitions can be instrumental for understanding dynamic human performance [19].
Temporal Dynamics: The HMM retains the ability to model temporal dynamics and state changes effectively, providing valuable insights into how an operator’s skills develop over time. For instance, using established algorithms such as the Viterbi algorithm, an HMM can analyze sequences of multimodal observations and identify the most likely underlying sequence of proficiency states, making it a promising approach for this type of analysis [60].
Real-World Applications: HMMs have been successfully applied in diverse fields, such as speech recognition, bioinformatics, and human–computer interaction, demonstrating their versatility in modeling sequences in which hidden states play a crucial role [60]. Their adaptability and robustness make HMMs particularly well suited to capturing the fluctuating nature of human proficiency, whether driven by learning, fatigue, or task complexity.
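The Viterbi decoding step mentioned above can be sketched in a few lines of dynamic programming. The sketch works in log space and takes per-trial emission log-likelihoods as input, which in a multimodal setting could be the sum of per-modality log-likelihoods; the three-state parameters and the emission values for the five trials are hypothetical.

```python
import math

STATES = ["Novice", "Intermediate", "Expert"]


def viterbi(log_init, log_trans, log_emit):
    """Most likely hidden-state path, computed in log space to avoid underflow.

    log_init[j]     -- log P(s_1 = j)
    log_trans[i][j] -- log P(s_t = j | s_{t-1} = i)
    log_emit[t][j]  -- log P(o_t | s_t = j), e.g. summed over HRV, TCT, and NASA-TLX
    """
    n = len(log_init)
    score = [log_init[j] + log_emit[0][j] for j in range(n)]
    back_pointers = []
    for emit_t in log_emit[1:]:
        pointers, new_score = [], []
        for j in range(n):
            # Best predecessor for state j at this step.
            best_prev = max(range(n), key=lambda i: score[i] + log_trans[i][j])
            pointers.append(best_prev)
            new_score.append(score[best_prev] + log_trans[best_prev][j] + emit_t[j])
        back_pointers.append(pointers)
        score = new_score
    # Trace back from the best final state.
    path = [max(range(n), key=score.__getitem__)]
    for pointers in reversed(back_pointers):
        path.append(pointers[path[-1]])
    return path[::-1]


def to_log(rows):
    return [[math.log(p) for p in row] for row in rows]


# Hypothetical parameters and per-trial emission likelihoods (illustration only).
init = [0.80, 0.15, 0.05]
trans = [[0.85, 0.14, 0.01], [0.05, 0.85, 0.10], [0.01, 0.09, 0.90]]
emit = [[0.70, 0.20, 0.10],
        [0.20, 0.60, 0.20],
        [0.45, 0.40, 0.15],  # ambiguous trial: slightly favors Novice on its own
        [0.10, 0.60, 0.30],
        [0.02, 0.08, 0.90]]
path = viterbi([math.log(p) for p in init], to_log(trans), to_log(emit))
print([STATES[j] for j in path])
# -> ['Novice', 'Intermediate', 'Intermediate', 'Intermediate', 'Expert']
```

Trial 3, taken alone, slightly favors Novice, yet the decoded path keeps the operator at Intermediate: under the assumed transition matrix, a one-trial regression followed by recovery is far less probable than remaining in place. A per-trial classifier cannot provide this kind of smoothing over noisy observations.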

The HMM framework not only links observable behaviors to unobservable proficiency states and models the transitions among those states, but also potentially provides the interpretability needed for practical applications. By capturing hidden states and facilitating a nuanced understanding of operator performance, HMMs offer the prospect of empowering researchers and practitioners to assess and support human operators more effectively.

2.11. Comparative Review of HMM Implementations

Although the benefits of MHMMs seem promising, it is important to situate this methodology within the framework of current empirical studies. HMMs and associated temporal modeling methodologies have been utilized in diverse fields to evaluate human conditions such as awareness, cognitive load, and task effort.

This comparative review (Table 3) compiles key studies employing HMMs and related models for human state evaluation, assessing each according to its data modalities, principal contributions, and distinct limitations, and underscores the need for a more comprehensive framework.

2.12. Synthesis and Rationale for the Proposed MHMM Framework

This section synthesizes the literature review’s findings, identifying critical gaps in existing proficiency assessment models and providing a rationale for the proposed Multimodal Hidden Markov Model (MHMM) as a dynamic, multimodal solution for Industry 5.0.

The examination of existing empirical work on human proficiency assessment reveals a consistent pattern: No current model successfully integrates physiological, behavioral, and subjective data within a dynamic temporal framework specifically designed for the real-time classification of human proficiency. To address this gap, we propose the Multimodal Hidden Markov Model (MHMM) as a framework with distinct advantages uniquely suited to address this challenge:

Superior Temporal Modeling: Unlike static classifiers, such as Artificial Neural Networks (ANNs) and k-Nearest Neighbors (KNN), which consider each data point independently, the MHMM is built to model sequences of proficiency states through its transition matrix. This allows it to track the evolution of proficiency, capturing learning curves and fatigue onset. Research on vigilance and driving fatigue underscores the importance of such capabilities for accurate assessments of human performance, showing that dynamic modeling is key in tracking changes in performance over time rather than relying on static assessments [27].
Increased Robustness: By relying on multiple data streams, the MHMM provides state estimation that is more resilient to noise or artifacts in any single channel. For example, if HRV data are temporarily unreliable, the model can still utilize TCT and subjective data from validated measures, such as the NASA-TLX, to maintain stable and accurate proficiency assessments. This resilience to data quality fluctuations is supported by literature that recognizes the importance of integrating multimodal data to enhance the robustness of assessments in complex environments.
Predictive and Diagnostic Power: Beyond classification, the transition probabilities inherent in the MHMM enable the prediction of proficiency changes. If an operator’s probability of transitioning from Intermediate to Novice rises under stress, this could trigger proactive support within the operational system. Such predictive capabilities enable targeted training interventions that static models typically cannot provide.
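To make the predictive use of transition probabilities concrete, the following minimal Python sketch propagates a belief over proficiency states one step forward ($p_{t+1} = p_{t} A$, with $p_{t}$ a row vector) and flags when the probability of regressing to a lower state exceeds a threshold. Both matrices, the assumed "stress" effect, and the 0.10 alert threshold are hypothetical values chosen for illustration, not parameters from this study.

```python
STATES = ("Novice", "Intermediate", "Expert")

# Hypothetical transition matrices (rows: current state, columns: next state).
A_NORMAL = [
    [0.85, 0.15, 0.00],
    [0.05, 0.85, 0.10],
    [0.00, 0.05, 0.95],
]
A_STRESS = [  # assumed effect of stress: less progression, more regression
    [0.92, 0.08, 0.00],
    [0.15, 0.80, 0.05],
    [0.00, 0.12, 0.88],
]

ALERT_THRESHOLD = 0.10  # hypothetical trigger for proactive support


def predict_next(belief, A):
    """One-step-ahead state distribution: p_next[j] = sum_i belief[i] * A[i][j]."""
    n = len(A)
    return [sum(belief[i] * A[i][j] for i in range(n)) for j in range(n)]


def regression_risk(belief, A):
    """Probability that the next state is strictly lower than the current one."""
    n = len(A)
    return sum(belief[i] * A[i][j] for i in range(n) for j in range(i))


belief = [0.0, 1.0, 0.0]  # operator currently inferred to be Intermediate
for label, A in (("normal", A_NORMAL), ("stress", A_STRESS)):
    risk = regression_risk(belief, A)
    next_dist = dict(zip(STATES, (round(p, 3) for p in predict_next(belief, A))))
    action = "offer support" if risk > ALERT_THRESHOLD else "no action"
    print(label, next_dist, round(risk, 3), action)
```

In practice, `belief` would come from the model's filtered state posterior rather than being fixed by hand.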

The transition to Industry 5.0 demands a paradigm shift from static, unimodal assessments toward dynamic, human-centric models of human proficiency. The literature confirms that proficiency is a multidimensional construct best understood through the integration of physiological, behavioral, and subjective data. While various modeling techniques have been explored, significant limitations persist due to the reliance on single data modalities or a focus on constructs unrelated to proficiency.

3. Methodology

This section presents the theoretical foundations of HMMs and introduces the proposed MHMM for real-time human proficiency assessment. The rationale behind selecting HMMs to model latent proficiency states is discussed along with a detailed design of the MHMM, focusing on its assumptions and the statistical distributions appropriate for multimodal observation streams.

3.1. HMM Foundations for Proficiency Assessment

HMMs are extensively utilized for inferring latent states from observable data sequences [57]. In proficiency assessment, human proficiency is an unobservable construct that must be inferred indirectly from performance indicators such as task completion time and physiological metrics [19]. The HMM framework is particularly well suited to such time-series data: it captures the dynamics of these latent constructs and reveals how proficiency evolves over time.

3.1.1. The Hidden Markov Model: A Framework for Latent Proficiency States

An HMM consists of several integral components: a set of hidden states representing unobservable conditions, an observation space of measurable signals, and various probability distributions detailing transitions and emissions. For instance, in our study, the hidden states may reflect proficiency levels such as Novice, Intermediate, and Expert. The observable outputs include multimodal data consisting of TCT, HRV, and the NASA-TLX. Each of these data types has unique statistical properties that influence their representation in the HMM.

An HMM is formally defined by the following components:

Set of Hidden States ($S = \{s_{1}, s_{2}, \dots, s_{N}\}$): These represent $N$ distinct, unobservable conditions of the underlying system. In the present study, these states correspond to discrete levels of proficiency, such as Novice, Intermediate, and Expert. The state at time $t$ is denoted as $q_{t} \in S$.
Observation Space ($Y = \{y_{1}, y_{2}, \dots, y_{M}\}$): These are the $M$ distinct, measurable outputs or signals that can be observed at each time step. In our research, observations are vectors comprising multimodal data: TCT, HRV, and NASA-TLX scores. The observation at time $t$ is denoted $O_{t} \in Y$. For continuous observations, $Y$ represents the space from which observations are drawn, and probability density functions are used instead of discrete probabilities.
Initial State Distribution ($\pi = \{\pi_{i}\}$): The probability of each hidden state at $t = 1$:

$\pi_{i} = P(q_{1} = s_{i}), \quad i = 1, \dots, N, \quad \sum_{i=1}^{N} \pi_{i} = 1.$

(1)

For example, $\pi_{\text{Novice}}$ in Equation (1) represents the probability that an individual begins in the Novice proficiency state.

State Transition Probability Matrix ($A = \{a_{ij}\}$): This $N \times N$ matrix defines the probability of transitioning from hidden state $s_{i}$ to $s_{j}$:

$a_{ij} = P(q_{t+1} = s_{j} \mid q_{t} = s_{i}), \quad i, j = 1, \dots, N, \quad \sum_{j=1}^{N} a_{ij} = 1 \text{ for all } i.$

(2)

For example, $a_{\text{Novice} \to \text{Intermediate}}$ in Equation (2) denotes the probability of transitioning from Novice to Intermediate.

Emission Probability Distribution ($B = \{b_{j}(O_{t})\}$): The probability (or, for continuous data, the probability density) of observing $O_{t}$ when in hidden state $s_{j}$:

$b_{j}(O_{t}) = P(O_{t} \mid q_{t} = s_{j}).$

(3)

For example, $b_{\text{Novice}}(\text{long TCT}, \text{low HRV}, \text{high TLX})$ in Equation (3) represents the likelihood of observing this specific combination of performance metrics if the individual is currently in the Novice proficiency state.

The definition of “time” t within the HMM is a critical design parameter. It can represent discrete task trials, segments within a continuous task, or fixed time windows. This choice significantly influences the interpretation of transition probabilities A; for instance, short time steps might reveal rapid intra-task fluctuations, while longer steps or trial-based steps might reflect slower learning curves or inter-task proficiency shifts.
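Because $A$ is defined per time step, its entries only have meaning relative to the chosen step. Under a time-homogeneous first-order model, the transition probabilities over $k$ elementary steps are given by the matrix power $A^{k}$, and the expected number of consecutive steps spent in state $s_{i}$ is $1/(1 - a_{ii})$ (a geometric sojourn time). The sketch below uses a hypothetical per-window matrix to show how the same dynamics look at coarser granularities.

```python
def matmul(X, Y):
    """Multiply two square matrices given as lists of lists."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def k_step_matrix(A, k):
    """Return A**k for k >= 1: transition probabilities over k elementary steps."""
    result = A
    for _ in range(k - 1):
        result = matmul(result, A)
    return result


def expected_dwell(a_self):
    """Expected consecutive steps in a state with self-transition a_self (geometric)."""
    return 1.0 / (1.0 - a_self)


# Hypothetical per-time-window transition matrix (Novice, Intermediate, Expert).
A_window = [
    [0.97, 0.03, 0.00],
    [0.01, 0.97, 0.02],
    [0.00, 0.01, 0.99],
]

# e.g. 1 window, 5 windows per trial, 20 windows per session (assumed groupings)
for k in (1, 5, 20):
    Ak = k_step_matrix(A_window, k)
    print(f"k={k:2d}  P(Novice->Novice)={Ak[0][0]:.3f}  P(Novice->Intermediate)={Ak[0][1]:.3f}")

print(round(expected_dwell(A_window[0][0]), 1))  # 33.3 windows spent in Novice on average
```

A self-transition of 0.97 per window and one of roughly 0.86 per five-window trial can describe the same learner, so transition probabilities should only be compared across studies that use the same definition of $t$.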

The practical application of HMMs often relies on two key assumptions to maintain mathematical tractability and computational feasibility:

First-Order Markov Assumption: The current hidden state $q_{t}$ depends only on the previous state $q_{t-1}$:

$P(q_{t} \mid q_{t-1}, q_{t-2}, \dots, q_{1}) = P(q_{t} \mid q_{t-1}).$

Output Independence Assumption: The current observation $O_{t}$ is conditionally independent of all other observations and past hidden states, given the current hidden state $q_{t}$:

$P(O_{t} \mid q_{t}, q_{t-1}, \dots, q_{1}, O_{t-1}, O_{t-2}, \dots, O_{1}) = P(O_{t} \mid q_{t}).$
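Taken together, the two assumptions let the joint probability of a hidden-state sequence $Q = (q_{1}, \dots, q_{T})$ and an observation sequence $O = (O_{1}, \dots, O_{T})$ be written entirely in terms of Equations (1)–(3); this is the quantity that the algorithms in Section 3.1.2 sum over (Forward) or maximize over (Viterbi). Here $a_{q_{t-1} q_{t}}$ denotes $a_{ij}$ with $s_{i} = q_{t-1}$ and $s_{j} = q_{t}$.

```latex
P(O, Q \mid \lambda)
  = \underbrace{\pi_{q_1}}_{\text{Eq. (1)}}
    \prod_{t=2}^{T} \underbrace{a_{q_{t-1} q_t}}_{\text{Eq. (2)}}
    \prod_{t=1}^{T} \underbrace{b_{q_t}(O_t)}_{\text{Eq. (3)}},
\qquad
P(O \mid \lambda) = \sum_{Q} P(O, Q \mid \lambda)

% Example: the two-step path Novice -> Novice contributes
\pi_{\text{Novice}}\, b_{\text{Novice}}(O_1)\, a_{\text{Novice} \to \text{Novice}}\, b_{\text{Novice}}(O_2)
```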

3.1.2. Algorithmic Solutions for HMM Problems in Proficiency Modeling

HMMs provide a framework for addressing three fundamental problems, each of which has direct relevance to proficiency assessments:

Likelihood Evaluation: Given an HMM defined by $\lambda = (A, B, \pi)$ (Equations (1)–(3)) and an observation sequence $O = (O_{1}, O_{2}, \dots, O_{T})$, determine the probability of observing this sequence, $P(O \mid \lambda)$. This is crucial for comparing how well different proficiency models (e.g., an HMM trained on expert data versus one trained on novice data) explain a new operator’s observed performance sequence. The Forward Algorithm, a dynamic programming approach, efficiently solves this problem [59].
Decoding: Given an HMM $\lambda = (A, B, \pi)$ (Equations (1)–(3)) and an observation sequence $O = (O_{1}, O_{2}, \dots, O_{T})$, find the most likely sequence of hidden states $Q = (q_{1}, q_{2}, \dots, q_{T})$ that could have generated these observations. This is arguably the key problem for real-time proficiency assessment, as it allows the inference of an operator’s proficiency trajectory (e.g., Novice → Novice → Intermediate) over the duration of a task or series of tasks. The Viterbi Algorithm is the standard method for finding this optimal state sequence (in everyday terms, the Viterbi Algorithm acts like a GPS for hidden states: it traces the single most likely “route” an operator’s proficiency could have taken, step by step, to produce the recorded observations). Importantly, the output of the Viterbi Algorithm is an inference of the most probable state sequence, not an absolute ground truth; its accuracy is contingent upon the quality and validity of the learned HMM parameters [60].
Learning (or Training): Given an observation sequence $O$ (and potentially the number of hidden states $N$), estimate the HMM parameters $\lambda = (A, B, \pi)$ that maximize the probability of the observed data, $P(O \mid \lambda)$. This is how the proficiency models are constructed from the training data. The Baum-Welch Algorithm (in everyday terms, Baum-Welch is like tuning a radio: you keep nudging the dial until the signal comes in clearest, iteratively adjusting the model’s probabilities so they best explain the data), a specific instance of the Expectation-Maximization (EM) Algorithm, is commonly used for this unsupervised learning task. Because Baum-Welch converges only to a local optimum, the nature and quality of the training data are paramount [64]: distinct proficiency models (e.g., Novice, Expert) require observation sequences from clearly categorized individuals to ensure model validity [65].
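The Forward and Viterbi recursions can be sketched compactly in log space. To stay short, this illustration assumes discrete observation symbols (a hypothetical coding of TCT into slow, typical, and fast) and made-up parameters; the MHMM itself uses the continuous emission densities of Section 3.2.2, which would replace the lookup `B[j][o]` with a density evaluation. Baum-Welch is omitted: it combines this forward pass with an analogous backward pass to re-estimate $\lambda$.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


def safe_log(p):
    return math.log(p) if p > 0 else -math.inf


def forward_log_likelihood(obs, pi, A, B):
    """Forward Algorithm: log P(O | lambda) for a discrete-emission HMM."""
    n = len(pi)
    alpha = [safe_log(pi[i]) + safe_log(B[i][obs[0]]) for i in range(n)]
    for o in obs[1:]:
        alpha = [logsumexp([alpha[i] + safe_log(A[i][j]) for i in range(n)]) + safe_log(B[j][o])
                 for j in range(n)]
    return logsumexp(alpha)


def viterbi(obs, pi, A, B):
    """Viterbi Algorithm: most probable hidden-state path and its log probability."""
    n = len(pi)
    delta = [safe_log(pi[i]) + safe_log(B[i][obs[0]]) for i in range(n)]
    backpointers = []
    for o in obs[1:]:
        step_ptr, new_delta = [], []
        for j in range(n):
            best_i = max(range(n), key=lambda i: delta[i] + safe_log(A[i][j]))
            step_ptr.append(best_i)
            new_delta.append(delta[best_i] + safe_log(A[best_i][j]) + safe_log(B[j][o]))
        backpointers.append(step_ptr)
        delta = new_delta
    last = max(range(n), key=lambda i: delta[i])
    path = [last]
    for step_ptr in reversed(backpointers):
        path.append(step_ptr[path[-1]])
    return list(reversed(path)), delta[last]


# Hypothetical toy model; states 0/1/2 = Novice/Intermediate/Expert.
pi = [0.80, 0.15, 0.05]
A = [[0.85, 0.15, 0.00], [0.05, 0.85, 0.10], [0.00, 0.05, 0.95]]
B = [[0.7, 0.2, 0.1], [0.2, 0.6, 0.2], [0.1, 0.2, 0.7]]  # symbols: 0 slow, 1 typical, 2 fast
obs = [0, 0, 1, 1, 2, 2]

path, log_p_path = viterbi(obs, pi, A, B)
print([("Novice", "Intermediate", "Expert")[s] for s in path])
# ['Novice', 'Novice', 'Intermediate', 'Intermediate', 'Expert', 'Expert']
print(round(forward_log_likelihood(obs, pi, A, B), 3))
```

The Viterbi path probability can never exceed the Forward likelihood, since the latter sums over all paths, including the best one.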

3.2. The Proposed Multimodal Hidden Markov Model (MHMM)

Building on the foundational principles of HMMs, this research proposes an MHMM specifically designed for the dynamic assessment of human proficiency.

The conceptual architecture of the proposed MHMM is illustrated in Figure 1. Its key feature is a two-layer structure.

The Hidden Layer contains the unobservable proficiency states (Novice, Intermediate, and Expert), which must be inferred. The Observation Layer consists of directly measurable data streams: Heart Rate Variability (HRV), Task Completion Time (TCT), and NASA Task Load Index (NASA-TLX). Arrows within the hidden layer represent Transition Probabilities (A) (Equation (2)), modeling how proficiency evolves. Dashed arrows from the hidden to the observation layer represent Emission Probabilities (B) (Equation (3)), linking proficiency states to the data they are likely to produce.

Figure 2 outlines the MHMM framework for real-time human proficiency assessment as a flowchart. It details the steps from initial multimodal data collection (HRV, TCT, NASA-TLX) and preprocessing (normalization, handling missing values) through the parameterization of the MHMM (initial state, transition, and emission probabilities). It highlights the decision point between an expert-driven model that incorporates contextual factors (task complexity, fatigue) and a trained model, and ends with the inference of latent proficiency states using the Viterbi Algorithm and the output of interpretable transition probabilities. This systematic approach underpins a dynamic and comprehensive understanding of operator proficiency, which is crucial for human-centric systems in Industry 5.0.

3.2.1. The Conditional Independence Assumption in MHMM

A key simplifying assumption in the proposed MHMM, common to many multimodal HMM applications, is the conditional independence of the observation streams given the hidden proficiency state. Formally, the joint probability of observing a particular set of HRV, TCT, and NASA-TLX values at time $t$, given that the system is in proficiency state $s_{j}$, is factored as shown in Equation (4):

$P(\mathrm{HRV}_{t}, \mathrm{TCT}_{t}, \mathrm{TLX}_{t} \mid q_{t} = s_{j}) = P(\mathrm{HRV}_{t} \mid q_{t} = s_{j}) \cdot P(\mathrm{TCT}_{t} \mid q_{t} = s_{j}) \cdot P(\mathrm{TLX}_{t} \mid q_{t} = s_{j})$

(4)

This assumption means that, for any given level of proficiency, the HRV value is considered independent from the TCT and NASA-TLX score at that moment. The main reason for using this assumption is to make the model simpler and easier to work with. It reduces the number of parameters the model needs, which makes training faster and running the model more efficient.
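Taking logarithms turns Equation (4) into a sum, which is the form typically evaluated in practice because it avoids numerical underflow. The parameter saving can be made concrete under two illustrative assumptions: each univariate emission has two parameters (a location and a scale), and the point of comparison is a hypothetical alternative that models (log TCT, HRV, TLX) jointly with a full-covariance Gaussian per state. With $N = 3$ states:

```latex
\log b_j(O_t) = \log P(\mathrm{HRV}_t \mid q_t = s_j) + \log P(\mathrm{TCT}_t \mid q_t = s_j) + \log P(\mathrm{TLX}_t \mid q_t = s_j)

\underbrace{N \cdot (2 + 2 + 2)}_{\text{factorized}} = 3 \cdot 6 = 18
\qquad \text{vs.} \qquad
\underbrace{N \cdot \left(3 + \tfrac{3 \cdot 4}{2}\right)}_{\text{full covariance}} = 3 \cdot 9 = 27
```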

Of course, in real life, HRV, TCT, and perceived effort often change together, even when someone’s skill level stays the same. For example, frustration might lower HRV, lengthen task completion time, and increase the sense of effort simultaneously. In practice, however, HMMs often tolerate modest violations of this independence assumption. The different types of data (physiological, behavioral, and subjective) often show sufficiently clear patterns for each skill level (Novice, Intermediate, Expert) that the model can still distinguish between them, even if some overlap exists. In this approach, each type of data acts somewhat like an independent “signal” about the person’s skill level. As long as these signals are mostly reliable and distinct for each level, the model can be expected to work reasonably well; the main cost of ignoring correlations is that correlated evidence is counted more than once, which can make the state posteriors overconfident.

3.2.2. Statistical Modeling of Emission Probabilities

A critical design choice in an MHMM for continuous data is the selection of appropriate probability distributions for the emission probabilities $b_{j}(O_{t})$. For this exposition, we use surgical proficiency as the use case and draw on statistical properties and benchmarks from the surgical literature, particularly the work of Shrivastava et al. (2025) [24]. On this basis, the following distributions were chosen:

Task Completion Time (TCT)–Lognormal Distribution: TCT data are positive, skewed right, and bounded at zero, with a long tail due to errors or hesitation. The lognormal distribution is suitable for such data [66].
Heart Rate Variability (HRV)–Normal (Gaussian) Distribution: HRV, as a physiological measure, clusters symmetrically around a mean for a given cognitive state, making the normal distribution appropriate [67].
NASA Task Load Index (TLX)–Normal (Gaussian) Distribution: Subjective TLX scores are assumed to follow a normal distribution around a mean characteristic of a proficiency level [37].

These distributions are intended to capture the statistical properties of each data stream, thereby improving the fidelity of the emission probability calculations and the accuracy of proficiency state inference, and they create distinct multimodal signatures for each proficiency state in the surgical use case. More generally, the appropriate distributions and their parameters will depend on the application: the tasks, the multimodal measurements, and the human operators and their proficiency levels.
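A minimal sketch of how these three distributions combine into $\log b_{j}(O_{t})$ under Equation (4) follows. The per-state parameter values are hypothetical placeholders chosen only to give three separable signatures; they are not the benchmarks from [24].

```python
import math


def normal_logpdf(x, mean, sd):
    """Log density of a Normal(mean, sd) distribution at x."""
    z = (x - mean) / sd
    return -0.5 * z * z - math.log(sd * math.sqrt(2.0 * math.pi))


def lognormal_logpdf(x, mu, sigma):
    """Log density of a lognormal whose logarithm is Normal(mu, sigma); zero density for x <= 0."""
    if x <= 0:
        return -math.inf
    return normal_logpdf(math.log(x), mu, sigma) - math.log(x)


# Hypothetical emission parameters per state (NOT the values from [24]).
# TCT: (mu, sigma) of log-seconds; HRV and TLX: (mean, sd).
EMISSIONS = {
    "Novice":       {"TCT": (math.log(600), 0.35), "HRV": (30.0, 8.0), "TLX": (70.0, 10.0)},
    "Intermediate": {"TCT": (math.log(420), 0.25), "HRV": (42.0, 8.0), "TLX": (55.0, 10.0)},
    "Expert":       {"TCT": (math.log(300), 0.15), "HRV": (55.0, 8.0), "TLX": (35.0, 10.0)},
}


def emission_log_prob(state, tct, hrv, tlx):
    """log b_j(O_t) under Equation (4): the sum of per-modality log densities."""
    p = EMISSIONS[state]
    return (lognormal_logpdf(tct, *p["TCT"])
            + normal_logpdf(hrv, *p["HRV"])
            + normal_logpdf(tlx, *p["TLX"]))


obs = (580.0, 31.0, 68.0)  # slow task, low HRV, high workload
scores = {s: emission_log_prob(s, *obs) for s in EMISSIONS}
print(max(scores, key=scores.get))  # Novice
```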

4. Dataset Generation and Simulation Environment

This section details the simulation-based approach employed to rigorously validate the MHMM. Our methodology grounds a high-fidelity synthetic dataset in the empirical literature so that it mirrors real-world human proficiency dynamics. This controlled environment provides an unambiguous ground truth, enabling precise, quantitative evaluation of the MHMM’s ability to recover latent proficiency states from multimodal data streams under various conditions. Furthermore, this section outlines a sensitivity analysis of the MHMM’s parameter settings and describes an expanded comparative framework, including advanced neural models, with an emphasis on methodological transparency.

4.1. Rationale for a Simulation-Based Approach

While real-world field data are the ultimate benchmark, a simulation-based approach offers several indispensable advantages for the initial, rigorous validation of a complex framework such as the MHMM:

Establishing an Unambiguous Ground Truth: The core task of the MHMM is to infer latent (hidden) proficiency states (Novice, Intermediate, Expert). In real-world settings, an operator’s true, moment-to-moment proficiency level is unobservable and cannot be labeled with perfect accuracy. Our simulation provides an unambiguous ground truth where the underlying state sequence is known by design, allowing for precise, quantitative evaluation of model performance that is impossible with field data alone.
Enabling Controlled, Systematic Stress Testing: Real-world data are often noisy and incomplete. A simulation allows for controlled experimentation in which specific challenges can be isolated and systematically tested. As detailed in Section 4.4, we can create scenarios to evaluate a model’s robustness to sensor noise, its resilience to missing data, and its performance on imbalanced class distributions. This targeted stress testing is important for understanding a model’s limitations before real-world deployment.
Feasibility and Scalability: Collecting large-scale, longitudinal multimodal data in high-stakes environments (e.g., operating rooms, industrial settings) is logistically complex, prohibitively expensive, and fraught with ethical and privacy challenges. Simulation provides a practical method for generating the rich datasets required to train and validate sophisticated temporal models.
Ensuring Reproducibility: By generating data from a defined parameterized model, this study is fully reproducible. Other researchers can use the same generative process to benchmark their models against this study’s results, fostering transparency and advancing the field.

Therefore, this illustrative study uses simulation not as a substitute for real-world validation, but as a necessary first step to rigorously demonstrate the MHMM’s methodological soundness and comparative advantages under a range of controlled conditions.

4.2. Simulation Design and Ground-Truth Generation

The evaluation’s foundation is a synthetic dataset generated by the MHMM class configured in “Expert Mode,” which acts as a ground-truth generator. This dataset comprises longitudinal data for 100 simulated participants, each performing 200 distinct procedural tasks, for a total of 20,000 individual data points. Each data point consists of multimodal observations (HRV, TCT, and NASA-TLX) linked to a known underlying proficiency state (Novice, Intermediate, or Expert).

To ensure ecological validity, the MHMM’s generative parameters were meticulously initialized using empirical benchmarks from the surgical training literature (Shrivastava et al., 2025 [24]). The initial state distribution for participants was set to favor Novice states ($\pi_{\text{Novice}} = 0.80$, $\pi_{\text{Intermediate}} = 0.15$, $\pi_{\text{Expert}} = 0.05$). The base transition probability matrix ($A_{\text{base}}$) (Equation (2)) was designed to reflect typical learning curves, with high self-transition probabilities and smaller probabilities for progression (e.g., Novice to Intermediate) or regression (e.g., Intermediate to Novice). For emission parameters, the means and standard deviations for each modality (TCT as Lognormal, HRV and NASA-TLX as Normal distributions) were directly matched to empirical findings from Shrivastava et al. (2025) [24], creating distinct multimodal signatures for each proficiency state.
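A simplified, hypothetical version of such a ground-truth generator is sketched below. Only the initial distribution $\pi = (0.80, 0.15, 0.05)$ and the 100 × 200 design are taken from the text; the $A_{\text{base}}$ entries and emission parameters are illustrative placeholders, and the contextual modulation of transitions is omitted here.

```python
import math
import random

STATES = ["Novice", "Intermediate", "Expert"]
PI = [0.80, 0.15, 0.05]  # initial distribution reported in the text

# Hypothetical placeholders: A_BASE and EMISSIONS are illustrative only.
A_BASE = [
    [0.90, 0.09, 0.01],
    [0.04, 0.90, 0.06],
    [0.01, 0.04, 0.95],
]
EMISSIONS = {  # state -> (TCT log-mu, TCT log-sigma, HRV mean, HRV sd, TLX mean, TLX sd)
    0: (math.log(600), 0.35, 30.0, 8.0, 70.0, 10.0),
    1: (math.log(420), 0.25, 42.0, 8.0, 55.0, 10.0),
    2: (math.log(300), 0.15, 55.0, 8.0, 35.0, 10.0),
}


def sample_index(probs, rng):
    """Draw an index with the given probabilities (inverse-CDF sampling)."""
    u, cumulative = rng.random(), 0.0
    for i, p in enumerate(probs):
        cumulative += p
        if u < cumulative:
            return i
    return len(probs) - 1  # guard against floating-point round-off


def simulate(n_participants=100, n_tasks=200, seed=0):
    """Generate (participant, task, true_state, TCT, HRV, TLX) rows from a known HMM."""
    rng = random.Random(seed)
    rows = []
    for pid in range(n_participants):
        state = sample_index(PI, rng)
        for task in range(n_tasks):
            if task > 0:
                state = sample_index(A_BASE[state], rng)
            mu, sigma, hrv_m, hrv_s, tlx_m, tlx_s = EMISSIONS[state]
            rows.append((pid, task, STATES[state],
                         rng.lognormvariate(mu, sigma),
                         rng.gauss(hrv_m, hrv_s),
                         rng.gauss(tlx_m, tlx_s)))
    return rows


data = simulate()
print(len(data))  # 20000 (100 participants x 200 tasks)
```

Because the generator is seeded, the same dataset can be regenerated exactly, which is the reproducibility property described in Section 4.1.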

A key innovation of this simulation is its ability to model how contextual factors dynamically influence proficiency transitions. The compute_dynamic_transition_matrix_expert function modulates the base transition probabilities at each procedural step based on simulated workplace challenges. These include task complexity (impeding skill acquisition), skill decay and forgetting (increasing regression probabilities based on time gaps between procedures), cumulative fatigue and stress (increasing regression probabilities based on physiological strain and perceived workload), and collaboration intensity (adjusting learning and regression based on interaction with AI). The data generation process is an iterative, step-by-step procedure in Python 3.11.13, where at each step, the next hidden state is probabilistically sampled based on dynamic transitions, and corresponding multimodal observations are generated from their defined emission distributions, with all contextual metadata recorded. This ensures a rich, longitudinal dataset with known ground-truth trajectories for rigorous model evaluation.
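The exact form of compute_dynamic_transition_matrix_expert is not reproduced in the text. The following is a hypothetical sketch of one way such context modulation could work: progression probabilities are damped by task complexity, regression probabilities are amplified by forgetting (modeled here, as an assumption, as $1 - e^{-\text{gap}/\tau}$) and by fatigue, and each row is re-closed by assigning the leftover mass to the self-transition. All scaling rules and constants are assumptions.

```python
import math


def dynamic_transition_matrix(A_base, complexity=0.0, gap_days=0.0, fatigue=0.0,
                              tau_days=30.0, max_boost=3.0):
    """Hypothetical context modulation of a base transition matrix.

    complexity and fatigue lie in [0, 1]; gap_days >= 0 is the time since the
    last procedure. Progression (j > i) is damped by complexity; regression
    (j < i) is amplified by forgetting and fatigue (capped at max_boost).
    """
    forgetting = 1.0 - math.exp(-gap_days / tau_days)
    progress_scale = 1.0 - 0.5 * complexity
    regress_scale = min(1.0 + forgetting + fatigue, max_boost)
    A = []
    for i, row in enumerate(A_base):
        new_row = []
        for j, p in enumerate(row):
            if j > i:
                new_row.append(p * progress_scale)
            elif j < i:
                new_row.append(p * regress_scale)
            else:
                new_row.append(0.0)  # self-transition filled in below
        off_diag = sum(new_row)
        if off_diag > 1.0:  # keep the row a valid probability distribution
            new_row = [p / off_diag for p in new_row]
            off_diag = 1.0
        new_row[i] = 1.0 - off_diag
        A.append(new_row)
    return A


A_base = [[0.90, 0.09, 0.01], [0.04, 0.90, 0.06], [0.01, 0.04, 0.95]]
A_ctx = dynamic_transition_matrix(A_base, complexity=0.8, gap_days=60, fatigue=0.5)
for row in A_ctx:
    print([round(p, 3) for p in row])
```

With no contextual load (all factors zero) the function returns `A_base` unchanged, so the base matrix remains the reference point for interpreting every modulated step.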

4.3. Comparative Benchmarking Framework

To objectively assess the MHMM’s performance, its two variants (Expert System and Trained) were benchmarked against a suite of established temporal models, chosen to represent different classes of sequence modeling approaches. This comprehensive comparison evaluates the benefits of multimodal fusion and the MHMM’s interpretability against state-of-the-art black-box and discriminative models.

MHMM (Expert System): This variant of our Multimodal Hidden Markov Model uses pre-defined, expert-informed parameters. It represents an ideal scenario where domain knowledge is perfectly integrated, and it uniquely leverages contextual metadata (task complexity, fatigue events) to dynamically adjust its predictions, showcasing the full potential of a context-aware system.
MHMM (Trained): This is a standard MHMM that learns its parameters (initial state probabilities, transition probabilities, and emission distributions) directly from the training data using the Baum-Welch algorithm. Its inclusion evaluates the MHMM’s ability to discover proficiency dynamics from raw observation sequences without prior expert knowledge, providing a more realistic benchmark for data-driven applications.
Unimodal HMMs (TCT, HRV, or TLX): Three separate Hidden Markov Models were implemented, each trained exclusively on a single data modality (Task Completion Time, Heart Rate Variability, or NASA Task Load Index). These serve as crucial baselines to empirically demonstrate the advantage of multimodal data fusion over relying on any single physiological, behavioral, or subjective indicator.
Long Short-Term Memory (LSTM) Network: As a type of recurrent neural network, LSTM models are state-of-the-art in sequence learning, capable of capturing complex long-range dependencies in data. They serve as a powerful “black-box” performance benchmark, often achieving high accuracy but lacking the inherent interpretability of probabilistic models like HMMs.
Transformer Model: This advanced neural architecture, known for its use of self-attention mechanisms, excels in capturing global dependencies within sequences, making it highly effective for sequence-to-sequence tasks. Its inclusion provides a contemporary benchmark from the deep learning domain to assess how advanced neural architectures perform in modeling temporal proficiency dynamics.
Hybrid HMM-LSTM Model: This model combines the strengths of both HMMs and LSTMs. It uses the MHMM’s emission probabilities (representing the likelihood of observations given each hidden state) as additional features for an LSTM network. This hybrid approach explores whether integrating probabilistic insights from HMMs can enhance the predictive power of neural sequence processing.
Conditional Random Field (CRF): Unlike generative models such as HMMs, CRFs are discriminative models that focus on directly modeling the conditional probability of a sequence of labels given a sequence of observations. They are powerful for sequence labeling tasks and avoid some of the strong independence assumptions of HMMs, offering an alternative probabilistic modeling approach for comparison.

4.4. Experimental Procedure and Stress-Testing Scenarios

All models were evaluated using a standardized procedure with a participant-level 70/30 train-test split to ensure generalizability. Model performance was assessed not only on clean data but also across four stress-test scenarios designed to simulate realistic data challenges: a Baseline Scenario (unaltered data), a Noise Robustness Scenario (15% of data points corrupted by a multiplicative noise factor), a Missing Data Scenario (15% of data points randomly set to NaN), and an Imbalanced Data Scenario (highly skewed proficiency distribution, e.g., 80% Novice). Performance was quantified using Accuracy and Weighted F1-Score, with the find_state_mapping function ensuring correct alignment of unsupervised model labels with true proficiency states.
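Two parts of this procedure can be sketched as follows. find_state_mapping is the study's own function and its implementation is not given; the hypothetical best_label_mapping below illustrates one common way to align unsupervised labels, by searching all label permutations for the one that maximizes agreement with the ground truth (feasible for three states). The hypothetical participant_split shows a participant-level 70/30 split, which keeps all of a participant's sequences on one side.

```python
import itertools
import random


def participant_split(participant_ids, train_frac=0.7, seed=0):
    """Split by participant (not by row) so no person appears in both sets."""
    ids = sorted(set(participant_ids))
    random.Random(seed).shuffle(ids)
    cut = int(round(train_frac * len(ids)))
    return set(ids[:cut]), set(ids[cut:])


def best_label_mapping(true_states, predicted_labels, n_states=3):
    """Map unsupervised labels to true states by maximizing agreement.

    Exhaustive search over permutations is fine for 3 states; for larger N the
    Hungarian algorithm is the usual choice.
    """
    best_perm, best_hits = None, -1
    for perm in itertools.permutations(range(n_states)):
        hits = sum(1 for t, p in zip(true_states, predicted_labels) if perm[p] == t)
        if hits > best_hits:
            best_perm, best_hits = perm, hits
    mapping = {label: state for label, state in enumerate(best_perm)}
    return mapping, best_hits / len(true_states)


train_ids, test_ids = participant_split(range(100))
mapping, accuracy = best_label_mapping([0, 0, 1, 1, 2, 2], [2, 2, 0, 0, 1, 1])
print(len(train_ids), len(test_ids), mapping, accuracy)
```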

4.5. Robustness to Noisy and Missing Sensor Data

The simulation study evaluates the MHMM’s performance under realistic data challenges, including noisy and missing sensor data, as outlined in the stress-testing scenarios. In real-world industrial settings, such as smart factories or surgical training, sensor data like HRV, TCT, and NASA-TLX are often subject to noise from environmental interference or missing due to sensor failures, such as wearable device dropouts. The MHMM is designed to handle these challenges robustly, ensuring reliable proficiency assessment in Industry 5.0’s human-centric, technology-augmented environments. This subsection outlines the MHMM’s mechanisms for managing noisy and missing data, as tested in the simulation, enabling continuous operator support without workflow disruption.

For missing data, the MHMM handles incomplete observations natively in its _emission_log_probability function, which includes only the non-missing modalities when computing the log probability of an observation vector for a given proficiency state (Novice, Intermediate, Expert). For instance, if HRV data are unavailable (NaN), the calculation omits HRV’s contribution and uses only TCT and NASA-TLX, so partial observations remain usable without imputation. This approach was tested in the Missing Data scenario, which simulates the intermittent sensor failures common in industrial training systems.
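The missing-data rule can be sketched as below. Under Equation (4), skipping a missing modality is not an approximation: each omitted factor integrates to 1 over its own variable, so dropping it yields exactly the marginal likelihood of the observed modalities. The function name and parameter values are hypothetical and are not the study's _emission_log_probability.

```python
import math


def normal_logpdf(x, mean, sd):
    """Log density of Normal(mean, sd) at x."""
    z = (x - mean) / sd
    return -0.5 * z * z - math.log(sd * math.sqrt(2.0 * math.pi))


def lognormal_logpdf(x, mu, sigma):
    """Log density of a lognormal whose logarithm is Normal(mu, sigma); requires x > 0."""
    return normal_logpdf(math.log(x), mu, sigma) - math.log(x)


def emission_log_prob_partial(obs, state_params):
    """Sum per-modality log densities, skipping modalities that are NaN.

    obs: modality name -> value (math.nan when the sensor dropped out).
    state_params: modality name -> (logpdf function, parameter tuple) for one state.
    """
    total = 0.0
    for name, value in obs.items():
        if math.isnan(value):
            continue  # missing modality: its factor marginalizes to 1
        logpdf, args = state_params[name]
        total += logpdf(value, *args)
    return total


# Hypothetical Novice-state parameters (illustrative only).
novice = {
    "TCT": (lognormal_logpdf, (math.log(600.0), 0.35)),
    "HRV": (normal_logpdf, (30.0, 8.0)),
    "TLX": (normal_logpdf, (70.0, 10.0)),
}
full = emission_log_prob_partial({"TCT": 580.0, "HRV": 31.0, "TLX": 68.0}, novice)
no_hrv = emission_log_prob_partial({"TCT": 580.0, "HRV": math.nan, "TLX": 68.0}, novice)
print(round(full, 3), round(no_hrv, 3))
```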

For noisy data, the MHMM leverages continuous probability distributions parameterized by empirically grounded means and standard deviations from Shrivastava et al. (2025) [24]. These distributions, detailed above, capture the expected variability in each modality, allowing the model to absorb moderate noise perturbations as part of the natural spread of the data rather than interpreting them as significant changes. Numerical stability is ensured through log-space computations and probability clipping in the Baum-Welch and Viterbi algorithms, preventing underflow or overflow issues when processing noisy data. The Trained MHMM’s parameter initialization, using K-Means clustering, promotes robust convergence on noisy training data by identifying stable initial emission parameters, as evaluated in the Noise Robustness scenario.
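The need for log-space computation can be shown in a few lines: multiplying many small per-step likelihoods underflows to zero in double precision, whereas summing their logarithms stays finite. The clipping floor `eps` is an assumed value for illustration, not the study's setting.

```python
import math


def clip_prob(p, eps=1e-12):
    """Clip a probability into [eps, 1 - eps] so log() never sees 0 (eps is assumed)."""
    return min(max(p, eps), 1.0 - eps)


# Hypothetical per-step likelihoods along a 400-step sequence.
likelihoods = [1e-3] * 400

naive = 1.0
for p in likelihoods:
    naive *= p  # true value is 1e-1200, far below the smallest positive double

log_total = sum(math.log(clip_prob(p)) for p in likelihoods)

print(naive)                     # 0.0 (underflow)
print(round(log_total, 2))       # -2763.1, i.e. 400 * ln(1e-3)
print(math.log(clip_prob(0.0)))  # finite, whereas math.log(0.0) raises ValueError
```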

4.6. Parameter Sensitivity Analysis

To further assess the robustness of the MHMM’s trained parameters, a sensitivity analysis was conducted. This involved systematically perturbing key emission parameters (means and standard deviations of HRV, TCT, and NASA-TLX for specific states) by ±10% or ±15%. For each perturbation, the MHMM (Trained) was re-evaluated on the test dataset, and its performance (accuracy and macro-averaged F1-Score) was compared against the unperturbed model. This analysis quantifies the model’s stability under variations in empirical estimates, providing insight into its reliability in practical applications where precise parameter values may vary.
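The perturbation loop can be sketched as below. The nested parameter layout and the `evaluate` callback (which would decode the test set and return accuracy and macro-F1) are hypothetical; the default factors mirror the ±10% and ±15% perturbations described above.

```python
import copy


def perturb(params, state, modality, field, factor):
    """Return a copy of params with one emission parameter scaled by (1 + factor).

    params[state][modality] is a dict such as {"mean": 42.0, "sd": 8.0}.
    """
    new_params = copy.deepcopy(params)
    new_params[state][modality][field] *= 1.0 + factor
    return new_params


def sensitivity_analysis(params, evaluate, factors=(-0.15, -0.10, 0.10, 0.15)):
    """Re-evaluate the model for every single-parameter perturbation.

    evaluate(params) -> (accuracy, f1_macro) is supplied by the caller.
    Returns rows (state, modality, field, factor, d_accuracy, d_f1) relative
    to the unperturbed baseline.
    """
    base_acc, base_f1 = evaluate(params)
    rows = []
    for state, modalities in params.items():
        for modality, fields in modalities.items():
            for field in fields:
                for factor in factors:
                    acc, f1 = evaluate(perturb(params, state, modality, field, factor))
                    rows.append((state, modality, field, factor, acc - base_acc, f1 - base_f1))
    return rows
```

Rows whose absolute changes fall below a chosen tolerance (0.002 in Section 5.3) can then be filtered out before tabulation.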

5. Results and Discussion

This section presents and analyzes the findings from the simulation study. The results validate the MHMM’s superior performance in accuracy, robustness, and temporal tracking compared to other state-of-the-art models. Furthermore, the discussion highlights the MHMM’s unique interpretability, exploring its implications for developing human-centric AI systems in Industry 5.0.

5.1. Data Distribution and Model Grounding

The fidelity of the simulation is critical for the validity of the evaluation. Figure 3 illustrates the distributions of the generated multimodal data (TCT, HRV, and TLX) for each proficiency state. The distinct but overlapping distributions for Novice, Intermediate, and Expert closely mirror the empirical benchmarks established in the surgical literature (Shrivastava et al., 2025) [24]. For instance, TCT shows a clear rightward shift and increased variance for novices, while HRV shifts rightward (indicating lower stress) with increased expertise. This demonstrates that the synthetic data capture the nuanced statistical signatures of real-world proficiency, providing a valid foundation for benchmarking the models.

To further elaborate on the characteristics of the multimodal data, Figure 4 provides a detailed view of the density distribution of each physiological (HRV), behavioral (TCT), and subjective (NASA-TLX) metric within each proficiency state (Novice, Intermediate, Expert). These violin plots, unlike simple box plots, illustrate the full shape of the data distribution, including peaks and spread. This allows for a more nuanced understanding of how these multimodal inputs differentiate (or exhibit overlap between) proficiency levels, thereby confirming the statistical properties and distinctiveness of the generated data for each state.

5.2. Comparative Model Performance Across Scenarios

The performance of all models is evaluated across four scenarios designed to test their accuracy and robustness. Table 4 summarizes the key performance metrics (Accuracy and Weighted F1-Score) for each model across these scenarios.

Several key findings emerge from these results:

Superiority of MHMM: In all four scenarios, the MHMM (Expert System) and MHMM (Trained) variants consistently outperformed all other models in both accuracy and F1-score. The F1-score measures how well a model balances precision (the fraction of predicted labels that are correct) and recall (the fraction of actual cases that are found). It is calculated as:

$\text{F1-score} = 2 \cdot \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$

Because it penalizes poor performance on either measure, the F1-score is useful when some classes are less frequent than others. The consistent outperformance of the MHMM variants demonstrates the inherent strength of the HMM framework for this task, especially when it can leverage either expert knowledge or learn directly from multimodal data.
Clear Advantage of Multimodal Fusion: Unimodal HMMs performed significantly worse than all multimodal approaches, with accuracies hovering between 61% and 65% in the baseline scenario. This provides strong evidence for the central hypothesis of this study: Fusing physiological, behavioral, and subjective data streams provides a far more robust and reliable assessment of proficiency than any single indicator alone.

Robustness to Noise and Missing Data: The MHMM variants showed remarkable resilience in the stress test scenarios. In the Missing Data scenario, the MHMM’s performance barely degraded, highlighting the advantage of its native ability to handle missing observations without relying on imputation. In contrast, CRF’s performance dropped from 0.885 to 0.801 accuracy, indicating its vulnerability to incomplete data.
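As a sketch of how the reported metrics relate, the function below computes per-class F1 from the formula above and then averages it either equally across classes (macro) or by class support (weighted). The toy example, an assumed imbalanced case in which a model always predicts Novice, shows why the two averages can diverge.

```python
from collections import Counter


def f1_scores(y_true, y_pred):
    """Return (macro_f1, weighted_f1) for multiclass labels.

    Per class: F1 = 2PR / (P + R), defined as 0 when P + R = 0.
    Weighted F1 averages per-class F1 by each class's support in y_true.
    """
    labels = sorted(set(y_true) | set(y_pred))
    support = Counter(y_true)
    per_class = {}
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        per_class[c] = (2 * precision * recall / (precision + recall)
                        if precision + recall else 0.0)
    macro = sum(per_class.values()) / len(labels)
    weighted = sum(per_class[c] * support[c] for c in labels) / len(y_true)
    return macro, weighted


# Hypothetical imbalanced test set: 8 Novice, 1 Intermediate, 1 Expert; model always says Novice.
macro, weighted = f1_scores(["N"] * 8 + ["I", "E"], ["N"] * 10)
print(round(macro, 3), round(weighted, 3))  # 0.296 0.711
```

Here accuracy is 0.8 and weighted F1 stays high, while macro F1 exposes that the minority states are never detected.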

5.3. Parameter Sensitivity Analysis of the Trained MHMM

To further evaluate the robustness and reliability of the trained MHMM, a sensitivity analysis was performed on its key emission parameters. This analysis involved systematically perturbing the learned mean and standard deviation values for HRV, TCT, and NASA-TLX distributions across the different proficiency states (0: Novice, 1: Intermediate, 2: Expert). The impact of these perturbations on the model’s overall classification accuracy and F1-Macro score was then observed.

As presented in Table 5, the trained MHMM is resilient to variations in its emission parameters. Even with substantial perturbations, such as a ±10% change in the HRV mean for Novice (state 0) or Intermediate (state 1), the model’s accuracy and F1-Macro scores remained close to the baseline performance (0.925 accuracy, 0.912 F1-Macro). Similarly small degradation was observed when the standard deviation of TCT for Novice (state 0) was varied by ±15%, or when the mean and standard deviation of TLX for Intermediate (state 1) and Expert (state 2) were perturbed by ±10%. For brevity, Table 5 includes only the most impactful results; the remaining perturbations changed the metrics by less than 0.002, which is negligible and visually uninformative. This consistent performance across perturbed parameters underscores the stability of the MHMM: its ability to infer proficiency states is robust to minor inaccuracies in parameter estimates and to natural variability in the underlying data distributions. This finding is important for practical applications, where precise parameter estimation may be challenging.
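The one-factor-at-a-time protocol described above can be sketched as follows. This is a hypothetical toy reproduction, not the authors' code, and its numbers are not comparable to Table 5. The assumptions are:

- the model parameters are illustrative;
- emissions are per-modality Gaussians;
- each step is labeled by the argmax of the forward-filtered posterior;
- only the Novice HRV mean is perturbed by ±10%.

The sketch reports accuracy and macro F1, computed as the unweighted mean of per-class F1 using the identity F1 = 2TP / (2TP + FP + FN).

```python
import math
import random

# Hypothetical 3-state model (0: Novice, 1: Intermediate, 2: Expert); all values illustrative.
A = [[0.85, 0.15, 0.00], [0.05, 0.85, 0.10], [0.00, 0.01, 0.99]]
PI = [0.8, 0.15, 0.05]
MEANS = {"hrv": [30.0, 40.0, 50.0], "tct": [600.0, 450.0, 300.0], "tlx": [70.0, 50.0, 30.0]}
SDS = {"hrv": [6.0, 6.0, 6.0], "tct": [60.0, 60.0, 60.0], "tlx": [10.0, 10.0, 10.0]}


def sample_sequence(n, rng):
    """Draw a ground-truth state path and Gaussian observations from the model."""
    states, obs = [], []
    s = rng.choices(range(3), weights=PI)[0]
    for _ in range(n):
        states.append(s)
        obs.append({m: rng.gauss(MEANS[m][s], SDS[m][s]) for m in MEANS})
        s = rng.choices(range(3), weights=A[s])[0]
    return states, obs


def decode(obs, means, sds):
    """Label each step with the most probable state under forward filtering."""
    preds, belief = [], PI[:]
    for t, o in enumerate(obs):
        if t > 0:
            belief = [sum(belief[i] * A[i][j] for i in range(3)) for j in range(3)]
        # Log-likelihood up to a state-independent constant, summed over modalities.
        logs = [sum(-0.5 * ((o[m] - means[m][s]) / sds[m][s]) ** 2 - math.log(sds[m][s])
                    for m in means) for s in range(3)]
        top = max(logs)
        unnorm = [belief[s] * math.exp(logs[s] - top) for s in range(3)]
        z = sum(unnorm)
        belief = [u / z for u in unnorm] if z > 0 else [1 / 3] * 3
        preds.append(max(range(3), key=lambda s: belief[s]))
    return preds


def scores(truth, pred):
    """Accuracy and macro F1 (unweighted mean of per-class F1)."""
    acc = sum(t == p for t, p in zip(truth, pred)) / len(truth)
    f1s = []
    for c in range(3):
        tp = sum(t == c and p == c for t, p in zip(truth, pred))
        fp = sum(t != c and p == c for t, p in zip(truth, pred))
        fn = sum(t == c and p != c for t, p in zip(truth, pred))
        f1s.append(2 * tp / (2 * tp + fp + fn) if tp + fp + fn else 0.0)
    return acc, sum(f1s) / 3


def perturbed(table, modality, state, rel_change):
    """Copy a parameter table and scale one entry by (1 + rel_change)."""
    new = {m: list(v) for m, v in table.items()}
    new[modality][state] *= 1.0 + rel_change
    return new


rng = random.Random(7)
truth, obs = sample_sequence(2000, rng)
base_acc, base_f1 = scores(truth, decode(obs, MEANS, SDS))
deltas = {}
for change in (-0.10, 0.10):
    acc, f1 = scores(truth, decode(obs, perturbed(MEANS, "hrv", 0, change), SDS))
    deltas[change] = (acc - base_acc, f1 - base_f1)
print(f"baseline acc={base_acc:.3f} F1-macro={base_f1:.3f}")
for change, (d_acc, d_f1) in deltas.items():
    print(f"Novice HRV mean {change:+.0%}: d_acc={d_acc:+.4f}, d_f1={d_f1:+.4f}")
```

Holding the simulated data fixed and changing only the decoder's parameters isolates the effect of parameter misspecification from sampling noise.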

5.4. Real-World Applicability and Implementation Scenarios

The two MHMM variants evaluated in this study, Expert MHMM and Trained MHMM, demonstrate the framework’s adaptability to diverse real-world needs in Industry 5.0, where human-centric and resilient systems are foundational. The choice between a pre-defined and a trainable model depends on the application context, data availability, and the balance between standardized assessment and context-specific adaptation. In either case, sensitive multimodal data (e.g., HRV, TCT, NASA-TLX) must be handled securely.

Expert MHMM: This model is suited for standardized, high-stakes settings with established proficiency benchmarks, such as surgical training programs. Its pre-defined parameters, derived from empirical literature (e.g., Prajapati and Shrivastava, 2025) [24], enable a computationally efficient “plug-and-play” system, ideal when large, context-specific datasets are unavailable. For example, in surgical simulation, the Expert MHMM could use securely collected HRV data to assess trainee proficiency, ensuring data privacy through encrypted transmission. Its interpretable transition probabilities (e.g., a 15% Novice-to-Intermediate transition probability) support trusted decision-making in safety-critical applications. In our simulation, the Expert MHMM achieved 92.7% accuracy (Table 4), highlighting its potential for standardized assessments.
Trained MHMM: This variant is ideal for dynamic environments where local factors influence performance, such as smart factories with unique workflows. By learning its parameters via the Baum–Welch algorithm [68], it adapts to context-specific variations, enhancing accuracy within particular settings. For instance, a manufacturing facility could train the MHMM on encrypted TCT data to monitor human proficiency with cobots while keeping the data secure. The Trained MHMM’s 92.5% accuracy (Table 4) demonstrates its robustness in data-driven scenarios. This adaptability supports Industry 5.0’s vision of personalized operator support [68].
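The training code is not shown in the paper. The following is a hypothetical minimal sketch of single-sequence Baum–Welch (expectation–maximization) for an HMM whose emission density is a product of per-modality Gaussians. This is one plausible reading of the conditional independence assumption discussed in Section 5.9. The data, initial values, iteration count, and variance floor are all illustrative assumptions. A practical system would pool many sequences and work in log space. Tested implementations exist, for example hmmlearn's GaussianHMM with diagonal covariances.

```python
import math
import random


def gauss(x, mean, var):
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2 * math.pi * var)


def baum_welch(obs, pi, A, means, variances, n_iter=15, var_floor=1e-3):
    """Single-sequence Baum-Welch (EM) for an HMM whose emission density is a product of
    per-modality Gaussians. Uses per-step scaling; returns new parameters and the
    log-likelihood recorded at the start of each iteration."""
    K, T, D = len(pi), len(obs), len(obs[0])
    pi, A = list(pi), [list(r) for r in A]
    means, variances = [list(m) for m in means], [list(v) for v in variances]
    history = []
    for _ in range(n_iter):
        B = [[math.prod(gauss(obs[t][d], means[k][d], variances[k][d]) for d in range(D))
              for k in range(K)] for t in range(T)]
        # Scaled forward pass: alpha[t] sums to 1, c[t] is the scaling factor.
        alpha, c = [], []
        for t in range(T):
            if t == 0:
                a = [pi[k] * B[0][k] for k in range(K)]
            else:
                a = [sum(alpha[t - 1][i] * A[i][k] for i in range(K)) * B[t][k] for k in range(K)]
            c.append(sum(a))
            alpha.append([x / c[t] for x in a])
        history.append(sum(math.log(ct) for ct in c))
        # Scaled backward pass.
        beta = [[1.0] * K for _ in range(T)]
        for t in range(T - 2, -1, -1):
            beta[t] = [sum(A[i][j] * B[t + 1][j] * beta[t + 1][j] for j in range(K)) / c[t + 1]
                       for i in range(K)]
        gamma = [[alpha[t][k] * beta[t][k] for k in range(K)] for t in range(T)]
        xi = [[sum(alpha[t][i] * A[i][j] * B[t + 1][j] * beta[t + 1][j] / c[t + 1]
                   for t in range(T - 1)) for j in range(K)] for i in range(K)]
        # M-step: re-estimate initial, transition, and emission parameters.
        pi = gamma[0][:]
        A = [[x / sum(row) for x in row] if sum(row) > 0 else A[i] for i, row in enumerate(xi)]
        for k in range(K):
            w = sum(g[k] for g in gamma)
            means[k] = [sum(gamma[t][k] * obs[t][d] for t in range(T)) / w for d in range(D)]
            variances[k] = [max(var_floor, sum(gamma[t][k] * (obs[t][d] - means[k][d]) ** 2
                                                for t in range(T)) / w) for d in range(D)]
    return pi, A, means, variances, history


# Synthetic data from an assumed "true" model (illustrative values: HRV, TCT, TLX).
rng = random.Random(0)
true_A = [[0.85, 0.15, 0.0], [0.05, 0.85, 0.10], [0.0, 0.01, 0.99]]
true_means = [[30.0, 600.0, 70.0], [40.0, 450.0, 50.0], [50.0, 300.0, 30.0]]
true_sds = [6.0, 60.0, 10.0]
state, obs = 0, []
for _ in range(300):
    obs.append([rng.gauss(m, s) for m, s in zip(true_means[state], true_sds)])
    state = rng.choices(range(3), weights=true_A[state])[0]

init_A = [[0.8, 0.1, 0.1], [0.1, 0.8, 0.1], [0.1, 0.1, 0.8]]
init_means = [[28.0, 620.0, 65.0], [42.0, 430.0, 52.0], [48.0, 320.0, 35.0]]
init_vars = [[100.0, 5000.0, 150.0] for _ in range(3)]
pi_hat, A_hat, means_hat, vars_hat, hist = baum_welch(obs, [1 / 3] * 3, init_A, init_means, init_vars)
print([round(h, 1) for h in hist[:3]], [round(p, 2) for p in A_hat[0]])
```

EM guarantees a non-decreasing likelihood, so the recorded history is a quick sanity check for an implementation like this one.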

The flexibility of the MHMM framework to accommodate both standardized and customized applications underscores its value for human–AI collaboration in Industry 5.0. While simulation results validate its efficacy, future empirical validation in real-world settings (e.g., manufacturing or healthcare) is needed to confirm its practical impact.

5.5. Analysis of Proficiency Trajectory Tracking

As shown in Figure 5, both MHMMs track the ground truth with exceptional accuracy. They successfully capture not only the stable periods within a single proficiency state but also the precise moments of transition, such as learning progressions (Novice → Intermediate) and temporary regressions. This ability to accurately model the temporal dynamics of skill acquisition and decay is a key advantage of the MHMM framework.
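The decoder behind the trajectories in Figure 5 is not specified in this section. Viterbi decoding is the textbook way to recover the single most probable state path from an HMM, and a minimal log-space sketch is shown below. The transition matrix, initial distribution, Gaussian emission parameters, and observation values are all hypothetical.

```python
import math

NEG_INF = float("-inf")


def safe_log(p):
    return math.log(p) if p > 0 else NEG_INF


def viterbi(log_pi, log_A, log_B):
    """Most probable state path. log_B[t][k] = log p(o_t | state k)."""
    T, K = len(log_B), len(log_pi)
    delta = [log_pi[k] + log_B[0][k] for k in range(K)]
    back = []
    for t in range(1, T):
        ptr, new = [], []
        for j in range(K):
            best_i = max(range(K), key=lambda i: delta[i] + log_A[i][j])
            ptr.append(best_i)
            new.append(delta[best_i] + log_A[best_i][j] + log_B[t][j])
        back.append(ptr)
        delta = new
    # Backtrack from the best final state.
    path = [max(range(K), key=lambda k: delta[k])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1]


# Hypothetical model: 0 Novice, 1 Intermediate, 2 Expert; observations are (HRV, TCT, TLX).
A = [[0.85, 0.15, 0.00], [0.05, 0.85, 0.10], [0.00, 0.01, 0.99]]
PI = [0.8, 0.15, 0.05]
MEANS = [(30.0, 600.0, 70.0), (40.0, 450.0, 50.0), (50.0, 300.0, 30.0)]
SDS = (6.0, 60.0, 10.0)


def log_gauss(x, m, sd):
    return -0.5 * ((x - m) / sd) ** 2 - math.log(sd * math.sqrt(2 * math.pi))


obs = [(31, 590, 71), (29, 615, 68), (41, 455, 49), (39, 440, 52), (51, 310, 29)]
log_B = [[sum(log_gauss(x, m, sd) for x, m, sd in zip(o, MEANS[k], SDS)) for k in range(3)]
         for o in obs]
path = viterbi([safe_log(p) for p in PI], [[safe_log(p) for p in row] for row in A], log_B)
print(path)  # [0, 0, 1, 1, 2]
```

Because Viterbi scores whole paths, it will not output transitions that have zero probability (e.g., Novice directly to Expert here). Step-by-step filtering offers no such guarantee.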

Furthermore, Figure 6 displays the step-wise prediction accuracy of both MHMM variants (Expert System and Trained) over the duration of the procedure for a sample participant. The shaded regions around each accuracy line represent simulated confidence intervals, providing an indication of the reliability and variability of the predictions at each point in time. This offers a statistical perspective on the models’ consistent performance and prediction uncertainty throughout the task.
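The text does not state how the simulated confidence intervals in Figure 6 were computed. One simple possibility is to replicate the simulation several times and compute per-step accuracy across replicates with a normal-approximation (Wald) interval. The sketch below does exactly that. The replicate data and the choice of interval are hypothetical.

```python
import math
from typing import List, Tuple


def stepwise_accuracy_ci(truth_runs: List[List[int]], pred_runs: List[List[int]],
                         z: float = 1.96) -> List[Tuple[float, float, float]]:
    """Per-step accuracy across replicated runs, with a Wald interval clipped to [0, 1]."""
    n_runs = len(truth_runs)
    T = len(truth_runs[0])
    out = []
    for t in range(T):
        hits = sum(truth_runs[r][t] == pred_runs[r][t] for r in range(n_runs))
        p = hits / n_runs
        half = z * math.sqrt(p * (1 - p) / n_runs)
        out.append((p, max(0.0, p - half), min(1.0, p + half)))
    return out


# Hypothetical replicates: four runs of a four-step procedure (0 Novice, 1 Intermediate, 2 Expert).
truth = [[0, 0, 1, 1], [0, 1, 1, 2], [0, 0, 0, 1], [0, 0, 1, 1]]
pred = [[0, 0, 1, 1], [0, 0, 1, 2], [0, 0, 1, 1], [0, 0, 1, 2]]
for step, (acc, lo, hi) in enumerate(stepwise_accuracy_ci(truth, pred)):
    print(step, round(acc, 2), round(lo, 2), round(hi, 2))
```

Wald intervals collapse to zero width when accuracy is exactly 0 or 1, as at the first step here. A Wilson interval would be a more defensible choice for small numbers of replicates.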

5.6. Interpretability: Unpacking Proficiency Dynamics

A core advantage of the MHMM over black-box models such as LSTMs is its interpretability. The model’s learned parameters are not abstract weights but tangible probabilities that offer direct insights into proficiency dynamics. Figure 7 visualizes the dynamic transition matrices under different simulated contexts.

The heatmaps show how the probabilities of transitioning between proficiency states change from the Baseline condition under two contexts: high task complexity combined with collaboration intensity (Collab), and fatigue combined with skill decay.

These matrices provide quantifiable, actionable metrics:

Baseline Learning Rates: Under standard conditions, the probability of a Novice transitioning to Intermediate, as defined by the transition matrix in Equation (2), is 15%, implying an expected dwell time of approximately six to seven procedures in that state.
Contextual Adjustments: The model quantifies how context impacts proficiency. High task complexity reduces the Novice-to-Intermediate transition probability from 15% to 9%. The onset of fatigue and skill decay increases the probability of an Expert regressing to Intermediate from a mere 1% to 12%.
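The quantities above can be made concrete with a small sketch. The time spent in a state is geometric, so the expected dwell time in state i is 1 / (1 − a_ii). The "six to seven procedures" figure follows if the Novice self-transition is 0.85, which is an assumption about Equation (2). The contextual rule used here also involves assumptions: the changed probability mass is absorbed by the self-transition so each row still sums to 1. Only the 15% → 9% and 1% → 12% values come from the text. The remaining matrix entries and the adjustment rule are hypothetical.

```python
from typing import List

Matrix = List[List[float]]

# Hypothetical baseline (rows: from Novice, Intermediate, Expert). Only N->I = 0.15 and
# E->I = 0.01 come from the text; the other entries are illustrative assumptions.
BASELINE: Matrix = [
    [0.85, 0.15, 0.00],
    [0.05, 0.85, 0.10],
    [0.00, 0.01, 0.99],
]


def set_transition(A: Matrix, i: int, j: int, p: float) -> Matrix:
    """Copy A, set A[i][j] = p, and absorb the difference in the self-transition A[i][i]
    so the row still sums to 1 (a simple, assumed adjustment rule)."""
    if i == j:
        raise ValueError("use a different target state than the source state")
    B = [row[:] for row in A]
    B[i][i] += B[i][j] - p
    B[i][j] = p
    if B[i][i] < 0:
        raise ValueError("adjustment would make the self-transition negative")
    return B


def expected_dwell(A: Matrix, i: int) -> float:
    """Expected consecutive steps in state i: geometric with mean 1 / (1 - a_ii)."""
    return 1.0 / (1.0 - A[i][i])


high_complexity = set_transition(BASELINE, 0, 1, 0.09)  # N->I: 15% -> 9%
fatigue = set_transition(BASELINE, 2, 1, 0.12)          # E->I: 1% -> 12%

print(round(expected_dwell(BASELINE, 0), 2))         # 6.67
print(round(expected_dwell(high_complexity, 0), 2))  # 11.11
print(round(expected_dwell(BASELINE, 2), 2))         # 100.0
print(round(expected_dwell(fatigue, 2), 2))          # 8.33
```

Under these assumptions, high complexity stretches the expected Novice stay from about 6.7 to about 11.1 procedures. Fatigue shortens the expected uninterrupted Expert run from about 100 to about 8 procedures.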

Additionally, Figure 8 shows the most commonly observed state-to-state transitions for a sample participant. This bar chart highlights the most frequent pathways of proficiency change (e.g., “Novice → Novice” or “Intermediate → Expert”), offering an intuitive overview of the predominant flow of proficiency states within the simulated trajectory.
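A bar chart like Figure 8 can be produced by counting consecutive pairs in a decoded state sequence. The minimal sketch below shows the counting step only; the decoded sequence is hypothetical.

```python
from collections import Counter
from typing import Sequence

NAMES = ("Novice", "Intermediate", "Expert")


def transition_counts(states: Sequence[int]) -> Counter:
    """Count consecutive (from -> to) pairs in a decoded state sequence."""
    return Counter(f"{NAMES[a]} -> {NAMES[b]}" for a, b in zip(states, states[1:]))


decoded = [0, 0, 0, 1, 1, 0, 1, 2, 2, 2]  # hypothetical decoded trajectory
for pair, n in transition_counts(decoded).most_common(3):
    print(pair, n)
```

Dividing each count by the number of times its source state appears (excluding the final step) gives the empirical transition matrix for that participant.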

This level of transparency allows trainers and system designers to understand why a model makes a certain prediction. It enables data-driven interventions (e.g., scheduling rest breaks when the regression risk is high) and fosters trust in the AI system, a critical factor for adoption in high-stakes environments.
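As an illustration of such a data-driven intervention, the one-step regression risk can be computed as follows: combine the current filtered state distribution b with the transition matrix A, giving the sum over states i of b(i) multiplied by the total probability of moving from i to any lower state. The sketch below compares that risk with a threshold. Both matrices beyond the quoted 1% and 12% Expert-to-Intermediate values, the belief vector, and the 0.10 threshold are hypothetical.

```python
from typing import List, Sequence


def regression_risk(belief: Sequence[float], A: List[List[float]]) -> float:
    """P(next state is lower than current) = sum_i b(i) * sum_{j<i} A[i][j],
    where b is the current filtered state distribution (0: Novice ... 2: Expert)."""
    return sum(b * sum(A[i][:i]) for i, b in enumerate(belief))


def should_schedule_break(belief: Sequence[float], A: List[List[float]],
                          threshold: float = 0.10) -> bool:
    """Hypothetical rule: suggest a rest break when one-step regression risk exceeds threshold."""
    return regression_risk(belief, A) > threshold


BASELINE_A = [[0.85, 0.15, 0.00], [0.05, 0.85, 0.10], [0.00, 0.01, 0.99]]
FATIGUE_A = [[0.85, 0.15, 0.00], [0.05, 0.85, 0.10], [0.00, 0.12, 0.88]]
belief = [0.05, 0.15, 0.80]  # operator most likely Expert

print(round(regression_risk(belief, BASELINE_A), 4))  # 0.0155
print(round(regression_risk(belief, FATIGUE_A), 4))   # 0.1035
print(should_schedule_break(belief, FATIGUE_A))       # True
```

The same belief state yields very different risks under the baseline and fatigue matrices. This is the kind of context sensitivity the dynamic transition matrices are meant to expose.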

5.7. Recent Temporal Modeling Approaches (2020–2025)

To further contextualize the performance of the proposed MHMM, a detailed comparative summary of recent temporal modeling approaches applied to human state evaluation is presented in Table 6. This table highlights their data modalities, methodologies, reported accuracies, and main limitations, providing a broader perspective on the current landscape of research.

While recent temporal modeling approaches demonstrate promising results across diverse domains, from gaming and driving to pilot monitoring, they often rely on unimodal data, lack longitudinal proficiency tracking, or omit subjective context. As summarized in Table 6, many models either prioritize short-term states (e.g., fatigue, vigilance) or are validated on narrow, domain-specific tasks. In contrast, the proposed MHMM integrates physiological, behavioral, and subjective modalities within a three-state interpretable framework, achieving high accuracy (92.5%) and aligning with cognitive science theories. Despite the current limitation of relying on simulation-based validation, the MHMM offers strong potential for real-time, generalizable proficiency tracking in complex industrial environments. Its modular design and multimodal integration position it as a promising foundation for future adaptive and human-centered AI systems in Industry 5.0.

5.8. Discussion and Implications for Industry 5.0

The results of this study position the MHMM as a powerful and practical tool for realizing the human-centric vision of Industry 5.0. Its high accuracy, robustness to imperfect data, and unique interpretability make it highly suitable for real-world deployment. By providing a continuous, context-aware understanding of human proficiency, the MHMM can power a new generation of adaptive systems:

Personalized Training: The system can tailor training curricula in real time, providing more support to a Novice and offering more challenging scenarios to an Expert.
Adaptive Human–AI Collaboration: A cobot could adjust its level of assistance based on the operator’s inferred state, offering more explicit guidance during moments of fatigue or simplifying tasks when the operator is struggling.
Proactive Well-being Management: By detecting early signs of fatigue-induced skill regression, the system can prompt interventions to prevent burnout and ensure workforce sustainability.
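One way the cobot behavior described above could consume the MHMM's output is sketched below. The policy maps the filtered proficiency distribution and a fatigue or regression risk score to an assistance level. The assistance levels, the expected-level summary, and both thresholds are hypothetical design choices, not part of the framework as described.

```python
from enum import Enum
from typing import Sequence


class Assistance(Enum):
    DETAILED_GUIDANCE = "step-by-step guidance and proactive physical assistance"
    STANDARD = "standard assistance"
    SUPERVISORY = "minimal assistance; operator given supervisory tasks"


def choose_assistance(belief: Sequence[float], fatigue_risk: float,
                      risk_threshold: float = 0.10) -> Assistance:
    """Map the filtered proficiency distribution (Novice, Intermediate, Expert) to an
    assistance level. Elevated fatigue/regression risk always triggers more support."""
    if fatigue_risk > risk_threshold:
        return Assistance.DETAILED_GUIDANCE
    expected_level = sum(i * p for i, p in enumerate(belief))  # 0 (Novice) .. 2 (Expert)
    if expected_level < 0.75:
        return Assistance.DETAILED_GUIDANCE
    if expected_level < 1.5:
        return Assistance.STANDARD
    return Assistance.SUPERVISORY


print(choose_assistance([0.1, 0.2, 0.7], fatigue_risk=0.02).name)  # SUPERVISORY
print(choose_assistance([0.1, 0.2, 0.7], fatigue_risk=0.20).name)  # DETAILED_GUIDANCE
```

Summarizing the posterior by its expected level treats the ordinal states as evenly spaced. A more conservative design would act on the probability of the Novice state directly.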

The model’s strong performance on simulated data generated from empirically derived parameters underscores its potential to bridge the gap between human operators and intelligent systems, creating safer, more efficient, and more supportive work environments.

5.9. Limitations and Future Directions

Despite its robust design and strong performance, the MHMM framework has limitations that guide future research:

Simulation-Based Validation: The current reliance on synthetic data is essential for controlled methodological validation but limits claims regarding real-world performance. Operational field validation remains a crucial next step.
Future Empirical Validation: Empirical validation studies are planned within real-world settings, such as advanced manufacturing environments and surgical training centers. These studies will involve operators performing authentic tasks while monitored through wearable physiological devices, detailed behavioral data, and continuous subjective assessments via adaptive interfaces. Expert-validated proficiency evaluations will be employed to benchmark the MHMM’s predictions, ensuring robust real-world applicability and facilitating the refinement of parameters.
Conditional Independence Assumption: While the conditional independence assumption greatly simplifies computational modeling, real-world data may show interdependencies across physiological, behavioral, and subjective metrics. Future studies could explore more sophisticated coupled HMMs or Bayesian network models to explicitly capture these inter-modal correlations, potentially enhancing accuracy.
Ethical and Practical Constraints: Multimodal monitoring introduces privacy concerns and necessitates transparent data-handling protocols aligned with regulatory compliance (e.g., General Data Protection Regulation (GDPR) or Health Insurance Portability and Accountability Act (HIPAA)). Ensuring equitable access across diverse industries and populations also remains essential. Future research must prioritize ethical considerations, consent processes, and non-intrusive sensor technologies.
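To make the conditional independence limitation concrete, the two emission models below can be compared. The first equation is the factorized emission likelihood implied by conditional independence, for the observation vector o_t = (HRV_t, TCT_t, TLX_t) given hidden state s_t = k. The second is one simple alternative that keeps inter-modal correlations: a state-specific Gaussian with a full covariance matrix. The notation is ours, and the Gaussian form is an assumption. With a diagonal covariance matrix, the second form reduces to the first.

```latex
p(\mathbf{o}_t \mid s_t = k)
  = p(\mathrm{HRV}_t \mid s_t = k)\,
    p(\mathrm{TCT}_t \mid s_t = k)\,
    p(\mathrm{TLX}_t \mid s_t = k)
  \qquad \text{(conditional independence)}

p(\mathbf{o}_t \mid s_t = k)
  = \mathcal{N}\!\left(\mathbf{o}_t ;\, \boldsymbol{\mu}_k,\, \boldsymbol{\Sigma}_k\right),
  \qquad \boldsymbol{\Sigma}_k \in \mathbb{R}^{3 \times 3} \text{ full}
  \qquad \text{(correlated alternative)}
```

The full-covariance form adds three covariance parameters per state, so it needs more training data than the factorized form.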

Moreover, extending the MHMM framework to incorporate bidirectional interactions between operators and AI systems, particularly cognitive feedback loops adjusting assistance levels based on real-time inferred workload, would further advance its alignment with the collaborative ethos of Industry 5.0.

6. Conclusions

The transition to Industry 5.0 demands dynamic, human-centric tools that transcend the limitations of static, unimodal proficiency assessments. This paper introduced the Multimodal Hidden Markov Model (MHMM), a novel framework that integrates physiological (HRV), behavioral (TCT), and subjective (NASA-TLX) data to infer latent proficiency states in real time. Through a comprehensive simulation study, the MHMM (trained) achieved a remarkable 92.5% accuracy, outperforming unimodal HMMs and rivaling advanced models, such as LSTMs and CRFs. Its ability to model temporal dynamics and adapt to contextual factors—such as task complexity, fatigue, and skill decay—ensures robust and context-aware proficiency tracking.

Although we operationalized the MHMM with three hidden states, the framework is state-agnostic. In domains where proficiency unfolds in finer, task-specific increments (e.g., surgery, aviation maintenance, or advanced pilot training), researchers have successfully deployed HMM variants with five or more states to reflect detailed sub-phases of expertise [26,69]. Accordingly, the number of latent states in our MHMM can be scaled up or down to match the granularity required by a given application without altering the underlying probabilistic machinery.
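To illustrate the state-agnostic point, note that standard HMM inference routines operate on a K × K transition matrix for any K. The hypothetical helper below builds such a matrix for an arbitrary number of proficiency levels. It assumes a birth–death structure: from each level the operator can only advance one level with probability `learn`, regress one level with probability `forget`, or stay. The 0.15 and 0.01 values echo figures quoted in this paper. The structure itself, and the use of the same rates at every level, are simplifying assumptions.

```python
from typing import List


def progression_matrix(k: int, learn: float, forget: float) -> List[List[float]]:
    """Build a hypothetical K-state birth-death transition matrix: from level i the operator
    advances to i+1 with probability `learn`, regresses to i-1 with probability `forget`,
    and otherwise stays. Boundary states keep the unused mass as self-transition."""
    if k < 2 or learn < 0 or forget < 0 or learn + forget > 1:
        raise ValueError("need k >= 2 and non-negative probabilities with learn + forget <= 1")
    A = [[0.0] * k for _ in range(k)]
    for i in range(k):
        up = learn if i < k - 1 else 0.0
        down = forget if i > 0 else 0.0
        if i < k - 1:
            A[i][i + 1] = up
        if i > 0:
            A[i][i - 1] = down
        A[i][i] = 1.0 - up - down
    return A


three = progression_matrix(3, learn=0.15, forget=0.01)  # e.g. Novice/Intermediate/Expert
five = progression_matrix(5, learn=0.15, forget=0.01)   # finer-grained sub-phases
for row in five:
    print([round(p, 2) for p in row])
```

Nothing downstream of the matrix (filtering, decoding, or training) needs to change when K changes. Only the number of emission parameter sets grows with K.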

The MHMM’s interpretable transition probabilities provide granular insights into learning and forgetting rates, enabling targeted interventions and adaptive human–AI collaboration. By aligning with Industry 5.0’s pillars of human-centricity, resilience, and sustainability, the MHMM supports safer, more efficient, and equitable industrial ecosystems. While challenges such as real-world validation and ethical data collection remain, the framework’s empirical foundation and generalizable methodology position it as a transformative tool for high-stakes domains. The MHMM not only advances proficiency assessments but also paves the way for empathetic, collaborative AI systems that empower human potential in Industry 5.0 and beyond.

7. Potential Applications and Broader Impact

By integrating physiological, behavioral, and subjective data streams, the MHMM framework enables real-time, context-sensitive human proficiency assessment. This approach is intrinsically aligned with the human-centric, resilient, and sustainable vision of Industry 5.0, where AI is designed to augment, rather than replace, human capabilities. The goal is to foster highly synergistic human–machine collaboration and adaptive decision making, particularly in high-stakes, safety-critical domains [70,71]. The potential applications and broader impacts of this framework are extensive and transformative.

7.1. Domain-Specific Applications

Healthcare and Surgical Training: In healthcare, MHMMs hold significant promise for revolutionizing surgical training and intraoperative support. By continuously monitoring metrics such as HRV, TCT, and subjective workload indices from trainees, the system can provide adaptive feedback and personalized training curricula [72]. Wearable devices enable real-time physiological data collection, supporting the development of data-driven training systems [72]. For instance, the real-time detection of cognitive overload or excessive fatigue can trigger automated assistance from a robotic surgical system, thereby reducing error rates and enhancing patient safety [41]. Recent work has demonstrated the value of using physiological and kinematic data to assess surgical skills, underscoring the relevance of multimodal models for creating data-driven, objective, and personalized medical education [73].

Aviation and Aerospace: In aviation, MHMMs can be embedded within next-generation flight simulators and operational cockpit systems to personalize pilot training and augment in-flight safety. By classifying mental states, such as high workload, distraction, or fatigue, the system can dynamically adjust scenario complexity or modify cockpit interfaces to mitigate the risks associated with cognitive overload, especially during critical flight phases or long-haul operations [74]. The aerospace industry is increasingly focused on leveraging AI-driven workload assessment to address persistent challenges in pilot performance and safety, with MHMMs offering scalable solutions for both adaptive training and real-time risk mitigation [75].

Smart Manufacturing and Industry 5.0: Within smart factories, the MHMM framework empowers cobots to adapt their behaviors based on a human operator’s real-time proficiency and cognitive state. This aligns directly with the Operator 5.0 paradigm, where technology acts as a supportive partner [71]. For example, a novice operator exhibiting signs of high cognitive load might receive more detailed, stepwise guidance and proactive physical assistance from a cobot. Conversely, an expert would be entrusted with more strategic, supervisory tasks, optimizing workflow efficiency and team resilience [76]. This dynamic, human-aware adaptation is a cornerstone of the Industry 5.0 vision, enhancing productivity, safety, and the overall well-being of the workforce [70,71].

Education and Adaptive Learning: Beyond industrial settings, MHMMs can enhance educational platforms by monitoring student engagement, proficiency, and affective states (e.g., frustration and confusion) using wearable sensors and behavioral analytics. The real-time classification of mental workload enables personalized learning pathways, where content difficulty, resource recommendations, and task assignments are dynamically adjusted to the individual learner’s needs. This fosters more inclusive, engaging, and effective learning environments, particularly in complex domains such as STEM education [77].

7.2. Broader Societal and Economic Impact

The widespread adoption of MHMM-based systems promises significant societal and economic benefits. By enabling early detection of fatigue, distraction, and skill degradation, MHMMs can support timely interventions that reduce burnout and work-related stress. Notably, research highlights the prevalence of mental health challenges faced by healthcare professionals, exacerbated by factors such as excessive workloads and the COVID-19 pandemic [78,79]. The findings presented in [78] illustrate the critical need for support mechanisms for mental health professionals, suggesting that actionable measures such as MHMM-based monitoring could help mitigate issues such as compassion fatigue. Likewise, Frías et al. (2025) discuss strategies to bolster the mental health of healthcare workers, underscoring the urgency of protecting this workforce [78,79].

Furthermore, the proactive risk-mitigation capabilities of the MHMM, particularly in medical settings, aviation, and industrial operations, could prevent costly errors and improve overall safety and operational efficiency. Such operational improvements and potential cost savings would align with findings from studies advocating enhanced protocols in sectors such as healthcare [79,80].

Beyond traditional applications, the MHMM’s scope could extend into fields such as renewable energy and autonomous logistics. Such adaptations are crucial as organizations strive toward Industry 5.0’s human-centric vision, which prioritizes workers’ well-being amid technological advancements. Studies highlight the need for scalable AI solutions to ensure resilience within these emerging applications [81], underscoring the relevance of approaches such as the MHMM to the increasing complexity of today’s operational environments.

The transparent nature of the MHMM framework supports trust and accountability in human–AI interactions, addressing ethical imperatives crucial for responsible AI deployment.

Moreover, the importance of human–AI synergy for workforce resilience, particularly in the context of Industry 5.0, highlights the potential societal benefits of MHMM implementation. Studies suggest that engaging with AI positively impacts employee mental health and promotes a supportive work environment, essential for sustaining a healthy workforce [79,81].

The potential benefits of MHMM-based systems encapsulate a wider societal and economic impact that includes enhanced mental health support for workers, improved operational safety across sectors, and a commitment to ethical AI deployment. As industries adopt these systems, the integrated approach of MHMM could facilitate transformative changes that foster resilience within the workforce while addressing the challenges presented by modern work environments.

7.3. Methodological Innovation and Implementation Challenges

The MHMM framework serves as a catalyst for interdisciplinary research, merging cognitive science, human factors engineering, and machine learning. Its application advances the frontiers of human–computer interaction, wearable technology, and affective computing.

Despite its promise, the practical deployment of MHMMs must overcome significant challenges. Ethical considerations are paramount, requiring robust frameworks for data governance, users’ privacy, and informed consent in compliance with regulations such as GDPR and HIPAA [68]. The development of unobtrusive sensing technologies that can capture high-fidelity data without disrupting natural workflows is a critical engineering challenge. Finally, ensuring the fairness and equity of these systems requires rigorous validation across diverse demographic groups to identify and mitigate potential biases in the underlying models [82]. Future work must focus on addressing these challenges to unlock the full transformative potential of this technology.

Author Contributions

Conceptualization, M.M.A. and V.V.P.; methodology, M.M.A.; software, M.M.A.; validation, M.M.A. and V.V.P.; formal analysis, M.M.A.; investigation, M.M.A.; resources, V.V.P.; data curation, M.M.A.; writing—original draft preparation, M.M.A.; review and editing, V.V.P.; visualization, M.M.A.; supervision, V.V.P.; project administration, V.V.P. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data used in this study were synthetically generated based on parameters from Prajapati and Shrivastava (2025) [24]. The synthetic dataset is available upon reasonable request from the corresponding author, subject to institutional approval.

Acknowledgments

We would like to thank one of the anonymous reviewers for their constructive comments and suggestions, which helped improve the clarity and presentation of this work.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

MHMM	Multimodal Hidden Markov Model
HRV	Heart Rate Variability
TCT	Task Completion Time
NASA-TLX	NASA Task Load Index
HMM	Hidden Markov Model
LSTM	Long Short-Term Memory
CRF	Conditional Random Field
IoT	Internet of Things
EEG	Electroencephalography
fNIRS	Functional Near-Infrared Spectroscopy
ECG	Electrocardiography
EDA	Electrodermal Activity
SURG-TLX	Surgery Task Load Index
HMI	Human–Machine Interface
SDNN	Standard Deviation of NN Intervals
RMSSD	Root Mean Square of Successive Differences
LF	Low-Frequency
HF	High-Frequency
EMHMM	Eye-Movement Hidden Markov Model
GDPR	General Data Protection Regulation
HIPAA	Health Insurance Portability and Accountability Act
ANN	Artificial Neural Network
KNN	k-Nearest Neighbors

References

1. Sun, S.; Zheng, X.; Gong, B.; Paredes, J.G.; Ordieres-Meré, J. Healthy Operator 4.0: A Human Cyber–Physical System Architecture for Smart Workplaces. Sensors 2020, 20, 2011.
2. Yitmen, İ.; Almusaed, A.; Alizadehsalehi, S. Investigating the Causal Relationships Among Enablers of the Construction 5.0 Paradigm: Integration of Operator 5.0 and Society 5.0 with Human-Centricity, Sustainability, and Resilience. Sustainability 2023, 15, 9105.
3. Zakeri, Z.; Arif, A.; Omurtag, A.; Breedon, P.; Khalid, A. Multimodal assessment of cognitive workload using neural, subjective and behavioural measures in smart factory settings. Sensors 2023, 23, 8926.
4. Kakade, S.; Patle, B.K.; Umbarkar, A. Applications of Collaborative Robots in Agile Manufacturing: A Review. Robot. Syst. Appl. 2023, 3, 59–83.
5. Bakator, M.; Nikolić, M.; Ćoćkalo, D.; Stanisavljev, S. Transition to Industry 5.0 With Ai and Digilitalization of Production Systems. J. Eng. Manag. 2024, 2, 8–12.
6. Young, M.S.; Brookhuis, K.A.; Wickens, C.D.; Hancock, P.A. State of science: Mental workload in ergonomics. Ergonomics 2015, 58, 1–17.
7. Malik, A.A.; Masood, T.; Bilberg, A. Virtual reality in manufacturing: Immersive and collaborative artificial-reality in design of human-robot workspace. Int. J. Comput. Integr. Manuf. 2020, 33, 22–37.
8. Wickens, C.D.; Helton, W.S.; Hollands, J.G.; Banbury, S. Engineering Psychology and Human Performance; Routledge: Oxfordshire, UK, 2021.
9. Annett, J. Subjective rating scales: Science or art? Ergonomics 2002, 45, 966–987.
10. Berntson, G.G.; Quigley, K.S.; Lozano, D. Cardiovascular psychophysiology. Handb. Psychophysiol. 2007, 3, 182–210.
11. Wickens, C.D. Multiple resources and mental workload. Hum. Factors 2008, 50, 449–455.
12. Charles, R.L.; Nixon, J. Measuring mental workload using physiological measures: A systematic review. Appl. Ergon. 2019, 74, 221–232.
13. Borghini, G.; Astolfi, L.; Vecchiato, G.; Mattia, D.; Babiloni, F. Measuring neurophysiological signals in aircraft pilots and car drivers for the assessment of mental workload, fatigue and drowsiness. Neurosci. Biobehav. Rev. 2014, 44, 58–75.
14. Wilson, M.R.; Poolton, J.M.; Malhotra, N.; Ngo, K.; Bright, E.; Masters, R.S. Development and validation of a surgical workload measure: The surgery task load index (SURG-TLX). World J. Surg. 2011, 35, 1961–1969.
15. Matthews, G.; Reinerman-Jones, L.E.; Barber, D.J.; Abich, J., IV. The psychometrics of mental workload: Multiple measures are sensitive but divergent. Hum. Factors 2015, 57, 125–143.
16. Parasuraman, R.; Sheridan, T.B.; Wickens, C.D. Situation awareness, mental workload, and trust in automation: Viable, empirically supported cognitive engineering constructs. J. Cogn. Eng. Decis. Mak. 2008, 2, 140–160.
17. Debie, E.; Rojas, R.F.; Fidock, J.; Barlow, M.; Kasmarik, K.; Anavatti, S.; Garratt, M.; Abbass, H.A. Multimodal fusion for objective assessment of cognitive workload: A review. IEEE Trans. Cybern. 2019, 51, 1542–1555.
18. Abbakumov, D.; Desmet, P.; Noortgate, W.V.D. Measuring Growth in Students’ Proficiency in MOOCs: Two Component Dynamic Extensions for the Rasch Model. Behav. Res. Methods 2018, 51, 332–341.
19. Sok, P.; Xiao, T.; Azeze, Y.; Jayaraman, A.; Albert, M.V. Activity Recognition for Incomplete Spinal Cord Injury Subjects Using Hidden Markov Models. IEEE Sens. J. 2018, 18, 6369–6374.
20. French, A.; Cummings, M.L.; Zhu, H.; Pajic, M. Determining Novice and Expert Status in Human-Automation Interaction Through Hidden Markov Models. Appl. Artif. Intell. 2024, 38, 2402174.
21. Sancinelli, S. Heart Rate Variability as an Indicator of Stress in Students’ Athletes. Open J. Med. Psychol. 2023, 12, 141–149.
22. Préfontaine, Y.; Kormos, J. The Relationship Between Task Difficulty and Second Language Fluency in French: A Mixed Methods Approach. Mod. Lang. J. 2015, 99, 96–112.
23. Desai, M.; Mishra, S.; Ganpule, A.; Kurien, A.A.; Muthu, V. Task Completion Time: Objective Tool for Assessment of Technical Skills in Laparoscopic Simulator for Urology Trainees. Indian J. Urol. 2008, 24, 35.
24. Prajapati, S.; Shrivastava, S. Physiological correlates of cognitive load in laparoscopic surgery. Int. J. Life Sci. Biotechnol. Pharma Res. 2025, 14, 404–410.
25. Dreyfus, S.E. The five-stage model of adult skill acquisition. Bull. Sci. Technol. Soc. 2004, 24, 177–181.
26. Blum, T.; Padoy, N.; Feußner, H.; Navab, N. (Eds.) Modeling and online recognition of surgical phases using hidden markov models. In Proceedings of the Medical Image Computing and Computer-Assisted Intervention—MICCAI 2008: 11th International Conference, New York, NY, USA, 6–10 September 2008; Proceedings, Part II 11; Springer: Berlin/Heidelberg, Germany, 2008.
27. Wang, H.; Chen, D.; Huang, Y.; Zhang, Y.; Qiao, Y.; Xiao, J.; Xie, N.; Fan, H. Assessment of vigilance level during work: Fitting a hidden Markov model to heart rate variability. Brain Sci. 2023, 13, 638.
28. Fitts, P.M.; Posner, M.I. Human Performance; Brooks/Cole: Pacific Grove, CA, USA, 1967.
29. Arcelay, I.; Goti, A.; Oyarbide-Zubillaga, A.; Akyazi, T.; Celaya, E.A.; Bringas, P.G. Definition of the Future Skills Needs of Job Profiles in the Renewable Energy Sector. Energies 2021, 14, 2609.
30. Santhi, A.R.; Muthuswamy, P. Industry 5.0 or Industry 4.0S? Introduction to Industry 4.0 and a Peek Into the Prospective Industry 5.0 Technologies. Int. J. Interact. Des. Manuf. 2023, 17, 947–979.
31. Fournier, É.; Kilgus, D.; Landry, A.; Hmedan, B.; Pellier, D.; Fiorino, H.; Jeoffrion, C. The Impacts of Human-Cobot Collaboration on Perceived Cognitive Load and Usability During an Industrial Task: An Exploratory Experiment. IISE Trans. Occup. Ergon. Hum. Factors 2022, 10, 83–90.
32. Paulíková, A.; Babeľová, Z.G.; Ubárová, M. Analysis of the Impact of Human-Cobot Collaborative Manufacturing Implementation on the Occupational Health and Safety and the Quality Requirements. Int. J. Environ. Res. Public Health 2021, 18, 1927.
33. Dcruz, D.; Thomas, A.; Murphy, L. Identifying the Fit-for-Purpose Leadership for the Manufacturing Organizations in the Era of Industry 5.0—A Literature Review. In Advances in Manufacturing Technology XXXVI; IOS Press: Amsterdam, The Netherlands, 2023.
34. Kumar, S. Introductory Chapter: Welding in the Era of Industry 5.0. In Welding-Materials, Fabrication Processes, and Industry 5.0; IntechOpen: London, UK, 2024.
35. Lepenioti, K.; Pertselakis, M.; Bousdekis, A.; Louca, A.; Lampathaki, F.; Apostolou, D.; Mentzas, G.; Anastasiou, S. Machine Learning for Predictive and Prescriptive Analytics of Operational Data in Smart Manufacturing. In Advanced Information Systems Engineering Workshops, Proceedings of the CAiSE 2020 International Workshops, Grenoble, France, 8–12 June 2020; Proceedings 32; Springer International Publishing: Cham, Switzerland, 2020; pp. 5–16.
36. Truong, N.C.D.; Wang, X.; Liu, H. Temporal and Spectral Analyses of EEG Microstate Reveals Neural Effects of Transcranial Photobiomodulation on the Resting Brain. Front. Neurosci. 2023, 17, 1247290.
37. Rubio, S.; Díaz, E.; Martín, J.; Puente, J.M. Evaluation of subjective mental workload: A comparison of SWAT, NASA-TLX, and workload profile methods. Appl. Psychol. 2004, 53, 61–86.
38. Chang, Y.K.; Labban, J.D.; Gapin, J.I.; Etnier, J.L. The Effects of Acute Exercise on Cognitive Performance: A Meta-Analysis. Brain Res. 2012, 1453, 87–101.
39. Bartsch, R.P.; Liu, K.K.L.; Bashan, A.; Ivanov, P.C. Network Physiology: How Organ Systems Dynamically Interact. PLoS ONE 2015, 10, e0142143.
40. Zahara, M.N.; Hendrayana, A.; Pamungkas, A.S. The Effect of Problem-Based Learning Model Modified by Cognitive Load Theory on Mathematical Problem Solving Skills. Hipotenusa J. Math. Soc. 2020, 2, 41–55.
41. Dias, R.D.; Ngo-Howard, M.C.; Boskovski, M.T.; Zenati, M.A.; Yule, S. Systematic Review of Measurement Tools to Assess Surgeons’ Intraoperative Cognitive Workload. Br. J. Surg. 2018, 105, 491–501.
42. Runswick, O.R.; Roca, A.; Williams, A.M.; Bezodis, N.E.; McRobert, A.P.; North, J.S. The Impact of Contextual Information and a Secondary Task on Anticipation Performance: An Interpretation Using Cognitive Load Theory. Appl. Cogn. Psychol. 2018, 32, 141–149.
43. Vieira, L.M.N.; Ibiapina, C.D.C.; Camargos, P.A.M.; Brand, P.L. Simulation-based Bronchoscopy Training: Randomized Trial Comparing Worked Example to Video Introduction. Pediatr. Pulmonol. 2023, 58, 3227–3234.
44. Spinhoven, P.; Van der Veen, D.C.; Oude Voshaar, R.C.; Comijs, H.C. Worry and Cognitive Control Predict Course Trajectories of Anxiety in Older Adults With Late-Life Depression. Eur. Psychiatry 2017, 44, 134–140.
45. Hart, S.G.; Staveland, L.E. Development of NASA-TLX (Task Load Index): Results of empirical and theoretical research. In Advances in Psychology; Elsevier: Amsterdam, The Netherlands, 1988; Volume 52, pp. 139–183.
46. Rostami, F.; Babaei-Pouya, A.; Teimori-Boghsani, G.; Jahangirimehr, A.; Mehri, Z.; Feiz-Arefi, M. Mental workload and job satisfaction in healthcare workers: The moderating role of job control. Front. Public Health 2021, 9, 683388.
47. Storch, N. Collaborative Writing in L2 Contexts: Processes, Outcomes, and Future Directions. Annu. Rev. Appl. Linguist. 2011, 31, 275–288.
48. Boumann, H.; Hamann, A.; Biella, M.; Carstengerdes, N.; Sammito, S. (Eds.) Suitability of physiological, self-report and behavioral measures for assessing mental workload in pilots. In International Conference on Human-Computer Interaction; Springer: Cham, Switzerland, 2023.
49. Adamolekun, A.; Logah, F.X.; Alabi, C.; Baanye, J.; Wilson, M.; Ganiu, O.; Seong, Y.; Yi, S. Toward a Unified Framework for Multimodal Cognitive Workload Assessment in Human-machine Interaction Systems. 2025. Available online: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=5235980 (accessed on 25 June 2025).
50. Sriranga, A.K.; Lu, Q.; Birrell, S. A systematic review of in-vehicle physiological indices and sensor technology for driver mental workload monitoring. Sensors 2023, 23, 2214.
51. Huang, J.; Zhang, Q.; Zhang, T.; Wang, T.; Tao, D. Assessment of drivers’ mental workload by multimodal measures during auditory-based dual-task driving scenarios. Sensors 2024, 24, 1041.
Zen, H.; Sak, H. Unidirectional Long Short-Term Memory Recurrent Neural Network With Recurrent Output Layer for Low-Latency Speech Synthesis. In Proceedings of the 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), South Brisbane, QLD, Australia, 19–24 April 2015; pp. 4470–4474. [Google Scholar]
Hochreiter, S.; Schmidhuber, J. Long short-term memory. Neural Comput. 1997, 9, 1735–1780. [Google Scholar] [CrossRef] [PubMed]
Liu, T.; Huang, X.; Ma, J. Conditional Random Fields for Image Labeling. Math. Probl. Eng. 2016, 2016, 15. [Google Scholar] [CrossRef]
Wu, Y.; Yu, C.H.; Cai, B.B.; Qin, S.J.; Gao, F.; Wen, Q. Quantum Conditional Random Field. arXiv 2019, arXiv:1901.01027. [Google Scholar]
Rudovic, O.; Pavlović, V.; Pantić, M. Kernel Conditional Ordinal Random Fields for Temporal Segmentation of Facial Action Units. In European Conference on Computer Vision; Springer: Berlin/Heidelberg, Germany, 2012; pp. 260–269. [Google Scholar]
Ghojogh, B.; Karray, F.; Crowley, M. Hidden Markov Model: Tutorial. 2024. Available online: https://www.researchgate.net/publication/382134823_Hidden_Markov_Model_Tutorial (accessed on 25 June 2025).
Michel, F.; Siegle, M. Formal error bounds for the state space reduction of Markov chains. Perform. Eval. 2025, 167, 102464. [Google Scholar] [CrossRef]
Rabiner, L.R. A tutorial on hidden Markov models and selected applications in speech recognition. Proc. IEEE 2002, 77, 257–286. [Google Scholar] [CrossRef]
Viterbi, A. Error bounds for convolutional codes and an asymptotically optimum decoding algorithm. IEEE Trans. Inf. Theory 2003, 13, 260–269. [Google Scholar] [CrossRef]
Sharma, K.; Papamitsiou, Z.; Olsen, J.K.; Giannakos, M. (Eds.) Predicting learners’ effortful behaviour in adaptive assessment using multimodal data. In Proceedings of the Tenth International Conference on Learning Analytics & Knowledge, Frankfurt, Germany, 23–27 March 2020. [Google Scholar]
Zheng, Y.; Que, Y.; Hu, X.; Hsiao, J.H. (Eds.) Predicting reading performance based on eye movement analysis with hidden Markov models. In Proceedings of the 2022 International Conference on Advanced Learning Technologies (ICALT), Bucharest, Romania, 1–4 July 2022; IEEE: Piscataway, NJ, USA, 2022. [Google Scholar]
Wu, Z.; Zhang, M.; Xiao, L.; Lv, X. A Driving Fatigue Identification Method Based on HMM. 2020. Available online: https://www.sae.org/publications/technical-papers/content/2020-01-0112/ (accessed on 25 June 2025).
Dempster, A.P.; Laird, N.M.; Rubin, D.B. Maximum likelihood from incomplete data via the EM algorithm. J. R. Stat. Soc. Ser. B 1977, 39, 1–22. [Google Scholar] [CrossRef]
Juang, B.H.; Rabiner, L.R. A probabilistic distance measure for hidden Markov models. AT T Tech. J. 1985, 64, 391–408. [Google Scholar] [CrossRef]
Trietsch, D.; Mazmanyan, L.; Gevorgyan, L.; Baker, K.R. Modeling activity times by the Parkinson distribution with a lognormal core: Theory and validation. Eur. J. Oper. Res. 2012, 216, 386–396. [Google Scholar] [CrossRef]
Malik, M. Heart rate variability: Standards of measurement, physiological interpretation, and clinical use. Circulation 1996, 93, 1043–1065. [Google Scholar] [CrossRef]
Mittelstadt, B.D.; Allo, P.; Taddeo, M.; Wachter, S.; Floridi, L. The ethics of algorithms: Mapping the debate. Big Data Soc. 2016, 3, 2053951716679679. [Google Scholar] [CrossRef]
Gao, L.; Wang, C.; Wu, G. Hidden Semi-Markov Models-Based Visual Perceptual State Recognition for Pilots. Sensors 2023, 23, 6418. [Google Scholar] [CrossRef] [PubMed]
Breque, M.; De Nul, L.; Petridis, A. Industry 5.0: Towards a Sustainable, Human-Centric and Resilient European Industry; European Commission, Directorate-General for Research and Innovation: Luxembourg, 2021; Volume 46. [Google Scholar]
Romero, D.; Stahre, J. Towards the resilient operator 5.0: The future of work in smart resilient manufacturing systems. Procedia CIRP 2021, 104, 1089–1094. [Google Scholar] [CrossRef]
Weenk, M.; van Goor, H.; Frietman, B.; Engelen, L.J.; van Laarhoven, C.J.; Smit, J.; Bredie, S.J.; van de Belt, T.H. Continuous monitoring of vital signs using wearable devices on the general ward: Pilot study. JMIR Mhealth Uhealth 2017, 5, e7208. [Google Scholar] [CrossRef]
Vedula, S.S.; Ishii, M.; Hager, G.D. Objective assessment of surgical technical skill and competency in the operating room. Annu. Rev. Biomed. Eng. 2017, 19, 301–325. [Google Scholar] [CrossRef]
Hernández-Sabaté, A.; Yauri, J.; Folch, P.; Piera, M.À.; Gil, D. Recognition of the mental workloads of pilots in the cockpit using EEG signals. Appl. Sci. 2022, 12, 2298. [Google Scholar] [CrossRef]
Masi, G.; Amprimo, G.; Ferraris, C.; Priano, L. Stress and workload assessment in aviation—A narrative review. Sensors 2023, 23, 3556. [Google Scholar] [CrossRef]
Wang, L.; Gao, R.; Váncza, J.; Krüger, J.; Wang, X.V.; Makris, S.; Chryssolouris, G. Symbiotic human-robot collaborative assembly. CIRP Ann. 2019, 68, 701–726. [Google Scholar] [CrossRef]
D’Mello, S.; Dieterle, E.; Duckworth, A. Advanced, analytic, automated (AAA) measurement of engagement during learning. Educ. Psychol. 2017, 52, 104–123. [Google Scholar] [CrossRef] [PubMed]
Kercher, A.; Gossage, L. Identifying Risk Factors for Compassion Fatigue in Psychologists in Aotearoa, New Zealand, During the COVID-19 Pandemic. Prof. Psychol. Res. Pract. 2024, 55, 28–38. [Google Scholar] [CrossRef]
Frías, C.E.; Samarasinghe, N.; Cuzco, C.; Koorankot, J.; de Juan, A.; Ali Rudwan, H.M.; Rahim, H.F.A.; Zabalegui, A.; Tulley, I.; Al-Harahsheh, S.T.; et al. Strategies to Support the Mental Health and Well-Being of Health and Care Workforce: A Rapid Review of Reviews. Front. Med. 2025, 12, 1530287. [Google Scholar] [CrossRef] [PubMed]
O’Connor, E.; Prebble, K.; Waterworth, S. Organizational Factors to Optimize Mental Health Nurses’ Wellbeing in the Workplace: An Integrative Literature Review. Int. J. Ment. Health Nurs. 2023, 33, 5–17. [Google Scholar] [CrossRef]
King, O.; Buccheri, A.; Isaacs, A.N.; Bishop, J.; Alston, L.; Versace, V.; Wong Shee, A.; Sourlos, N.; Imran, D.; Jacobs, J.; et al. The Community, the Workplace, and Public Health Measures: A Qualitative Study of Factors That Impacted the Wellbeing of Rural Health Service Staff in Victoria, Australia, During the COVID-19 Pandemic. Health Soc. Care Community 2023, 2023, 5556980. [Google Scholar] [CrossRef]
Mehrabi, N.; Morstatter, F.; Saxena, N.; Lerman, K.; Galstyan, A. A survey on bias and fairness in machine learning. ACM Comput. Surv. (CSUR) 2021, 54, 1–35. [Google Scholar] [CrossRef]

Figure 1. Conceptual architecture of the proposed Multimodal Hidden Markov Model (MHMM).

Figure 2. Overall methodology for the proposed MHMM.

Figure 3. Distributions of the generated multimodal data for each proficiency state.

Figure 4. Distribution of multimodal data by proficiency state. Each violin plot displays the data distribution, with dashed lines indicating the median (central line), 25th percentile (bottom line), and 75th percentile (top line) of the data.

Figure 5. Moment-by-moment proficiency state predictions for a sample participant.

Figure 6. Prediction accuracy over procedure steps with simulated confidence intervals.

Figure 7. Dynamic transition matrices of the MHMM (Expert System) under different contexts.

Figure 8. Most frequent state transitions (sample participant).
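Figure 8 summarizes the most frequent transitions in one participant's decoded state sequence. The following minimal sketch shows how such counts, and a row-normalized empirical transition matrix, can be derived from a decoded path. The state labels and the sequence are hypothetical, and counting self-transitions (remaining in the same state) is an assumption of this sketch, since the caption does not say whether they are included.

```python
from collections import Counter


def transition_counts(path):
    """Count consecutive (from_state, to_state) pairs; self-transitions are included."""
    return Counter(zip(path, path[1:]))


def empirical_transition_matrix(path, states):
    """Row-normalize the counts so each row estimates P(next state | current state).

    A state that never appears before the final step gets an all-zero row.
    """
    counts = transition_counts(path)
    matrix = {}
    for a in states:
        row_total = sum(counts[(a, b)] for b in states)
        matrix[a] = {b: (counts[(a, b)] / row_total if row_total else 0.0) for b in states}
    return matrix


STATES = ["Novice", "Intermediate", "Expert"]
# Hypothetical decoded (e.g., Viterbi) output for one participant.
decoded = ["Novice", "Novice", "Intermediate", "Intermediate", "Expert",
           "Intermediate", "Expert", "Expert"]

for (src, dst), n in transition_counts(decoded).most_common(3):
    print(f"{src} -> {dst}: {n}")
print(empirical_transition_matrix(decoded, STATES)["Intermediate"])
```

Dropping pairs where `src == dst` before calling `most_common` would instead rank only the changes of state, which may be closer to what a "learning" or "forgetting" plot is meant to highlight.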

Table 1. Sensor types for multimodal data streams.

Data Modality	Example Metric	Typical Sensor Types	Example Devices/Methods
Physiological	Heart Rate Variability (HRV)	ECG, Photoplethysmography (PPG) sensors	Polar H10 (ECG), Apple Watch (PPG), Oura Ring
Behavioral	Task Completion Time (TCT)	Computer input logs, motion sensors, RFID	Event logs, accelerometers, RFID tools
Subjective	NASA Task Load Index (NASA-TLX)	Digital/paper questionnaires, mobile apps	NASA-TLX app, web survey, paper forms
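Table 1 lists HRV as the physiological metric obtained from ECG or PPG sensors, but this table does not specify which HRV index is used. As one illustration, the sketch below computes RMSSD (root mean square of successive differences), a standard time-domain HRV index described in the HRV measurement standards (Malik, 1996, in the reference list), from the RR intervals that a chest-strap ECG such as the Polar H10 can export. Treating RMSSD as the HRV feature passed to the model is an assumption of this sketch, and the RR values are hypothetical.

```python
import math


def rmssd(rr_ms):
    """RMSSD in milliseconds from a sequence of RR intervals given in milliseconds."""
    if len(rr_ms) < 2:
        raise ValueError("need at least two RR intervals")
    # Successive differences between adjacent beats.
    diffs = [b - a for a, b in zip(rr_ms, rr_ms[1:])]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


rr = [800, 810, 790, 805]  # hypothetical RR intervals (ms)
print(round(rmssd(rr), 2))
```

In practice the RR series would first be cleaned of ectopic beats and artifacts, and RMSSD would be computed over a fixed window (for example, per task step) to yield one HRV observation per time step.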

Table 2. Comparative analysis of advanced temporal modeling approaches.

Method	Strengths	Weaknesses	Temporal Data	Multimodal Data	Interpretability
Unimodal Hidden Markov Model (HMM)	Captures temporal dependencies; probabilistic; interpretable states	Limited to a single observation stream, providing an incomplete view of proficiency	Excellent	Poor	Moderate to High
Long Short-Term Memory (LSTM) Networks	Excellent at capturing long-range dependencies; highly flexible; state-of-the-art accuracy on many sequence tasks	Black-box nature makes interpretation difficult; requires large datasets and is computationally expensive	Excellent	Excellent	Low
Conditional Random Fields (CRF)	Discriminative model that considers the entire sequence context (global normalization); avoids observation independence assumption	Training can be computationally intensive; may not capture long-range dependencies as well as LSTMs	Excellent	Good	Moderate
Markov Chain	Simple structure; highly interpretable transitions between observable states	Cannot model latent/hidden states; usually limited to a single observation stream	Good	Poor	High

Table 3. Comparative analysis of HMM-based proficiency assessment models.

Study	Model	Data Modalities	Approach	Key Contributions	Limitations/Gaps
[20]	HMM	Behavioral (real-time strategy game logs: actions, APM, MMR)	Unimodal	Differentiated novice/expert status in human-automation tasks by modeling interaction patterns	Lacks physiological and subjective data, limiting cognitive load insights
[61]	HMM	Multimodal (Logs, eye tracking, EEG, wristband, facial expressions)	Multimodal	Classified learners’ effort in adaptive assessments to predict future effort levels	Focuses on classifying learner effort rather than directly assessing proficiency
[27]	HMM	Physiological (HRV)	Unimodal	Assessed vigilance levels during work tasks using HRV data	Relies on single modality (HRV), missing broader complexity of user state
[62]	EMHMM	Behavioral (Eye tracking)	Unimodal	Predicted reading performance by analyzing eye movement patterns	Limited to eye tracking, missing physiological/subjective metrics
[63]	HMM	Behavioral (facial video/key-point features)	Unimodal	Identified driving fatigue	Lacks physiological (e.g., EEG, HRV) and subjective workload metrics, so cognitive-state coverage is incomplete.

Table 4. Summary of model performance across evaluation scenarios.

Scenario	Model	Accuracy	Weighted F1-Score
Baseline	MHMM (Expert System)	0.927	0.925
	MHMM (Trained)	0.925	0.924
	Hybrid HMM-LSTM	0.917	0.913
	LSTM	0.900	0.898
	CRF	0.885	0.884
	Transformer	0.793	0.793
	Unimodal HMM (TCT)	0.639	0.612
	Unimodal HMM (TLX)	0.635	0.627
	Unimodal HMM (HRV)	0.610	0.603
Stress Test (15% Noise)	MHMM (Expert System)	0.889	0.888
	LSTM	0.887	0.880
	MHMM (Trained)	0.885	0.884
	Hybrid HMM-LSTM	0.881	0.881
	CRF	0.856	0.852
	Transformer	0.779	0.779
Missing Data (15%)	MHMM (Expert System)	0.923	0.922
	MHMM (Trained)	0.921	0.920
	LSTM	0.921	0.898
	Transformer	0.805	0.805
	CRF	0.801	0.796
	Hybrid HMM-LSTM	0.422	0.422
Imbalanced States	MHMM (Expert System)	0.934	0.933
	MHMM (Trained)	0.933	0.932
	Hybrid HMM-LSTM	0.919	0.919
	LSTM	0.934	0.909
	CRF	0.897	0.897
	Transformer	0.824	0.824
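Table 4 reports both accuracy and weighted F1, and the two can diverge (for example, the LSTM rows in the missing-data and imbalanced scenarios). The sketch below shows how both are computed for a three-state labeling, with 0, 1, and 2 standing for the proficiency levels (an assumed encoding). Weighted F1 here follows the usual support-weighted convention, as in scikit-learn's `average="weighted"`: per-class F1 scores are averaged using each class's count in the ground truth as its weight. Whether the authors used exactly this implementation is not stated in this section, and the label sequences are hypothetical.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of time steps whose predicted state matches the ground truth."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def weighted_f1(y_true, y_pred):
    """Per-class F1 averaged with weights equal to each class's support in y_true."""
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for label, n_true in support.items():
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        n_pred = sum(p == label for p in y_pred)
        precision = tp / n_pred if n_pred else 0.0
        recall = tp / n_true
        f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
        score += f1 * n_true / total
    return score


# Hypothetical ground-truth and predicted state sequences.
y_true = [0, 0, 0, 1, 1, 2, 2, 2]
y_pred = [0, 0, 0, 0, 1, 2, 2, 2]
print(accuracy(y_true, y_pred))              # 0.875
print(round(weighted_f1(y_true, y_pred), 4))  # 0.8631
```

A single Intermediate step misread as Novice costs one of eight steps in accuracy, but it also lowers the Novice precision and halves the Intermediate recall, so the support-weighted F1 falls slightly further.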

Table 5. Selected parameter sensitivity results for trained MHMM.

Parameter (State)	Perturbation	Accuracy	F1-Macro
	Baseline	0.925	0.912
HRV mean (0)	±10%	0.919–0.917	0.904–0.902
HRV mean (1)	±10%	0.926–0.920	0.913–0.907
TCT std (0)	±15%	0.926–0.925	0.914–0.912
TLX mean (1)	±10%	0.919–0.920	0.904–0.908
TLX std (2)	±10%	0.925–0.924	0.912–0.911
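Table 5 perturbs one emission parameter at a time by a fixed fraction and measures performance again. The following is a minimal sketch of that one-at-a-time procedure. The parameter values and the five labeled observations are hypothetical. For brevity, `evaluate` classifies each observation independently by maximum Gaussian likelihood over HRV and NASA-TLX rather than running the full trained MHMM with Viterbi decoding; in a real analysis it would be replaced by the actual model's evaluation.

```python
import copy
import math

# Hypothetical emission parameters: params[modality][state] = {"mean": ..., "std": ...}
BASE_PARAMS = {
    "HRV": [{"mean": 30.0, "std": 6.0}, {"mean": 45.0, "std": 6.0}, {"mean": 60.0, "std": 6.0}],
    "TLX": [{"mean": 70.0, "std": 10.0}, {"mean": 50.0, "std": 10.0}, {"mean": 30.0, "std": 10.0}],
}

# Hypothetical labeled observations: (observation, true state index).
LABELED = [
    ({"HRV": 31, "TLX": 68}, 0),
    ({"HRV": 40, "TLX": 55}, 1),
    ({"HRV": 47, "TLX": 48}, 1),
    ({"HRV": 58, "TLX": 33}, 2),
    ({"HRV": 52, "TLX": 40}, 2),
]


def perturb(params, modality, state, field, fraction):
    """Return a deep copy with one parameter scaled by (1 + fraction); the input is untouched."""
    new = copy.deepcopy(params)
    new[modality][state][field] *= 1.0 + fraction
    return new


def loglik(obs, params, state):
    """Sum of per-modality Gaussian log-densities, dropping the shared -0.5*log(2*pi) term."""
    total = 0.0
    for m, x in obs.items():
        mu, sd = params[m][state]["mean"], params[m][state]["std"]
        total += -math.log(sd) - (x - mu) ** 2 / (2 * sd * sd)
    return total


def evaluate(params):
    """Stand-in evaluation: frame-wise maximum-likelihood accuracy on LABELED."""
    correct = 0
    for obs, label in LABELED:
        pred = max(range(3), key=lambda s: loglik(obs, params, s))
        correct += pred == label
    return correct / len(LABELED)


def sensitivity(params, evaluate_fn, modality, state, field, fraction):
    """Evaluate at -fraction and +fraction for a single parameter."""
    return {
        f"-{fraction:.0%}": evaluate_fn(perturb(params, modality, state, field, -fraction)),
        f"+{fraction:.0%}": evaluate_fn(perturb(params, modality, state, field, +fraction)),
    }


print("baseline:", evaluate(BASE_PARAMS))
print("HRV mean (0):", sensitivity(BASE_PARAMS, evaluate, "HRV", 0, "mean", 0.10))
print("TLX std (2):", sensitivity(BASE_PARAMS, evaluate, "TLX", 2, "std", 0.10))
```

Deep-copying before each perturbation keeps the runs independent, so every row of a table like Table 5 reflects a change to exactly one parameter relative to the baseline.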

Table 6. Comparison of temporal modeling approaches (2020–2025).

Study (Author-Year—Short Label)	Data Modalities	Methodology	Reported Accuracy	Main Limitation
Proposed MHMM	HRV + TCT + NASA-TLX	3-state Multimodal HMM (Baum-Welch + Viterbi)	MHMM (Trained) = 92.5% accuracy	Simulation-based validation only; field testing pending
French et al. (2024)—HMM in HAI [20]	StarCraft II gameplay actions, APM, MMR	Unsupervised HMMs; BIC; rare-state filtering	No classification accuracy reported; paper reports EM convergence metric KL-divergence = 0.0383 between successive iterations	Gaming-only data; lacks physiological/subjective context; no accuracy metric provided
Sharma et al. (2020)—Predicting effort in adaptive assessment [61]	Click-logs, eye-tracking, EEG, HR/EDA, facial video	k-means → 2-state HMM + Viterbi	Weighted F1 = 0.90 (Precision = 0.89, Recall = 0.84)	Focuses on momentary effort, not longitudinal proficiency evolution
Wang et al. (2023)—Vigilance assessment using HRV [27]	ECG-derived HRV indices	3-state HMM (Baum-Welch + Viterbi)	Training = 92.67%; Prediction = 87.78%; MAE = 0.12	Unimodal (physiological only); lacks behavioral/subjective interpretation
Wu et al. (2020)—Driving fatigue identification [63]	Facial key-points (video) ± EEG	Fuzzy C-Means → 5-state HMM	Overall HMM accuracy ≈ 84% on combined facial key-point features	Binary fatigue only; lacks subjective input and longitudinal proficiency tracking
Gao et al. (2023)—HSMM Visual Perceptual State Recognition for Pilots [69]	Eye-tracking (AOI sequences)	Hidden Semi-Markov Model (duration modeling)	Accuracy = 93.55%, Recall = 91.58% (HMM = 80%)	Gaze-only input; lacks physiological or subjective workload context
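Table 6 describes the proposed model as a 3-state multimodal HMM trained with Baum-Welch and decoded with Viterbi. The minimal sketch below covers only the decoding step. Under the conditional independence assumption, each state's emission likelihood is the product of per-modality densities, which becomes a sum in log space. Every parameter value is a hypothetical placeholder rather than one of the paper's fitted values. Gaussian emissions for all three modalities, and skipping a missing reading (`None`) by leaving its term out of the sum, are assumptions of this sketch.

```python
import math

# Hypothetical parameters for illustration only (not the paper's fitted values).
STATES = ["Novice", "Intermediate", "Expert"]
START = [0.6, 0.3, 0.1]
TRANS = [
    [0.80, 0.18, 0.02],
    [0.05, 0.80, 0.15],
    [0.01, 0.09, 0.90],
]
# Per-state (mean, std) for each modality.
EMIT = {
    "HRV": [(30.0, 6.0), (45.0, 6.0), (60.0, 6.0)],
    "TCT": [(120.0, 20.0), (90.0, 15.0), (60.0, 10.0)],
    "TLX": [(70.0, 10.0), (50.0, 10.0), (30.0, 10.0)],
}


def gaussian_logpdf(x, mean, std):
    return -0.5 * math.log(2 * math.pi * std * std) - (x - mean) ** 2 / (2 * std * std)


def emission_logprob(obs, state):
    """Sum per-modality log-densities (conditional independence given the state).
    Missing readings (None) contribute nothing, i.e., they are marginalized out."""
    total = 0.0
    for modality, value in obs.items():
        if value is None:
            continue
        mean, std = EMIT[modality][state]
        total += gaussian_logpdf(value, mean, std)
    return total


def viterbi(observations):
    """Most likely hidden state sequence, computed in log space."""
    n = len(STATES)
    # delta[s]: best log-probability of any path ending in state s at the current step.
    delta = [math.log(START[s]) + emission_logprob(observations[0], s) for s in range(n)]
    backptrs = []
    for obs in observations[1:]:
        new_delta, ptr = [], []
        for s in range(n):
            best_prev = max(range(n), key=lambda p: delta[p] + math.log(TRANS[p][s]))
            ptr.append(best_prev)
            new_delta.append(delta[best_prev] + math.log(TRANS[best_prev][s])
                             + emission_logprob(obs, s))
        delta = new_delta
        backptrs.append(ptr)
    # Backtrack from the best final state.
    state = max(range(n), key=lambda s: delta[s])
    path = [state]
    for ptr in reversed(backptrs):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return [STATES[s] for s in path]


# Hypothetical observation sequence; the second step has no NASA-TLX reading.
OBS = [
    {"HRV": 31, "TCT": 118, "TLX": 72},
    {"HRV": 44, "TCT": 92, "TLX": None},
    {"HRV": 61, "TCT": 58, "TLX": 29},
]
print(viterbi(OBS))
```

Working in log space avoids numerical underflow on long sequences. Dropping a missing modality's term is one simple way to cope with sensor gaps under conditional independence; it is shown here for illustration, and this section does not describe how the authors handled the missing-data scenario.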

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Alsanousi, M.M.; Prabhu, V.V. Multimodal Hidden Markov Models for Real-Time Human Proficiency Assessment in Industry 5.0: Integrating Physiological, Behavioral, and Subjective Metrics. Appl. Sci. 2025, 15, 7739. https://doi.org/10.3390/app15147739

AMA Style

Alsanousi MM, Prabhu VV. Multimodal Hidden Markov Models for Real-Time Human Proficiency Assessment in Industry 5.0: Integrating Physiological, Behavioral, and Subjective Metrics. Applied Sciences. 2025; 15(14):7739. https://doi.org/10.3390/app15147739

Chicago/Turabian Style

Alsanousi, Mowffq M., and Vittaldas V. Prabhu. 2025. "Multimodal Hidden Markov Models for Real-Time Human Proficiency Assessment in Industry 5.0: Integrating Physiological, Behavioral, and Subjective Metrics" Applied Sciences 15, no. 14: 7739. https://doi.org/10.3390/app15147739

APA Style

Alsanousi, M. M., & Prabhu, V. V. (2025). Multimodal Hidden Markov Models for Real-Time Human Proficiency Assessment in Industry 5.0: Integrating Physiological, Behavioral, and Subjective Metrics. Applied Sciences, 15(14), 7739. https://doi.org/10.3390/app15147739

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
