Article

NAMI: A Neuro-Adaptive Multimodal Architecture for Wearable Human–Computer Interaction

Department of Informatics and Computer Engineering, University of West Attica, 12243 Egaleo, Greece
* Author to whom correspondence should be addressed.
Multimodal Technol. Interact. 2025, 9(10), 108; https://doi.org/10.3390/mti9100108
Submission received: 7 July 2025 / Revised: 15 October 2025 / Accepted: 16 October 2025 / Published: 18 October 2025

Abstract

The increasing ubiquity of wearable computing and multimodal interaction technologies has created unprecedented opportunities for natural and seamless human–computer interaction. However, most existing systems adapt only to external user actions such as speech, gesture, or gaze, without considering internal cognitive or affective states. This limits their ability to provide intelligent and empathetic adaptations. This paper addresses this critical gap by proposing the Neuro-Adaptive Multimodal Architecture (NAMI), a principled, modular, and reproducible framework designed to integrate behavioral and neurophysiological signals in real time. NAMI combines multimodal behavioral inputs with lightweight EEG and peripheral physiological measurements to infer cognitive load and engagement and adapt the interface dynamically to optimize user experience. The architecture is formally specified as a three-layer pipeline encompassing sensing and acquisition, cognitive–affective state estimation, and adaptive interaction control, with clear data flows, mathematical formalization, and real-time performance on wearable platforms. A prototype implementation of NAMI was deployed in an augmented reality Java programming tutor for postgraduate informatics students, where it dynamically adjusted task difficulty, feedback modality, and assistance frequency based on inferred user state. Empirical evaluation with 100 participants demonstrated significant improvements in task performance, reduced subjective workload, and increased engagement and satisfaction, confirming the effectiveness of the neuro-adaptive approach.

1. Introduction

The evolution of human–computer interaction (HCI) has been marked by progressively sophisticated modalities aimed at mimicking the complexity and dynamism of natural human communication [1]. Advances in technology, from graphical user interfaces to speech recognition, gesture-based input, and haptic output, illustrate that modern interaction systems have moved beyond the confines of single modality input/output channels and are now essentially multimodal [2]. Furthermore, the advent of wearable computing and ubiquitous systems has extended the interaction field, enabling technology to be seamlessly embedded in everyday human contexts [3]. However, despite these developments, most multimodal systems continue to rely on perceptible behavioral cues to deduce user intentions and, consequently, fail to take into consideration the internal cognitive and affective states of users in real time [2]. This limitation inhibits such systems from reacting adaptively and empathetically to the needs of users.
Recent developments in the fields of neurotechnology and affective computing offer a feasible approach to overcoming this limitation [4,5,6]. The development of inexpensive, non-invasive devices such as wearable EEG headsets and functional near-infrared spectroscopy (fNIRS) sensors has enabled the monitoring of brain activity in naturalistic environments. When these technologies are combined with physiological sensors that track variables such as heart rate variability, skin conductance, and pupil dilation, they enable a new generation of neuro-adaptive systems. These systems aim to estimate users’ cognitive workload, attentional engagement, and emotional states in real time, and then adapt the interactive environment in a corresponding way. Nevertheless, the integration of such internal state information into a holistic, reproducible, and modular architecture for real-time multimodal wearable interaction remains a largely under-explored area of research [7,8,9,10,11].
The goal of the current study is to close the identified gap by proposing a neuro-adaptive multimodal architecture that integrates real-time electroencephalogram (EEG) and physiological signals into the wearable HCI interface control loop. Specifically, we describe and deploy an architectural framework made up of three interacting modules: sensing and data acquisition, cognitive-affective state estimation, and adaptive interaction control. We hypothesize that integrating neurophysiological and behavioral signals enables real-time adaptation that improves task performance and user experience; Section 5 reports the test of this hypothesis. We demonstrate the feasibility and benefits of this architecture through the development of a prototype system where users engage in an augmented reality (AR) task environment while wearing a lightweight EEG headband, a photoplethysmography (PPG) sensor, and a galvanic skin response (GSR) device. Multimodal inputs such as voice, head orientation, and hand gestures are combined with neurophysiological data to form a dynamic user profile. The system adjusts visual and auditory output channels, adapts task difficulty, and increases assistance features based on the calculated levels of cognitive load and emotional engagement.
Building on prior work in affective computing and multimodal HCI, the present study is grounded on two premises that define its conceptual background. First, behavioral and neurophysiological signals can be jointly integrated within a wearable multimodal architecture to infer users’ cognitive and affective states in real time. Second, adaptive control policies can translate these inferred states into effective adjustments of interaction parameters such as task difficulty, feedback modality, and assistance frequency. These considerations motivate the development of NAMI and lead to the central research question of this study:
RQ1: To what extent does a neuro-adaptive interface improve task performance, workload management, and user experience in a real educational setting compared with a baseline non-adaptive system?
To investigate this question comprehensively, three subordinate research questions are formulated:
RQ1.1: How can a wearable multimodal framework be formally specified to integrate behavioral (voice, gesture, gaze) and neurophysiological (EEG, GSR, HRV) inputs in real time?
RQ1.2: How can cognitive–affective state estimation be operationalized through lightweight, transparent modeling suitable for wearable systems?
RQ1.3: How do adaptive control mechanisms based on these estimates affect task performance, workload, engagement, and satisfaction relative to a non-adaptive baseline?
To address these research questions, this paper makes the following contributions:
  • Formal architecture: We propose a principled, three-layer wearable framework that integrates multimodal behavioral inputs (voice, gesture, gaze) with neurophysiological data (EEG, GSR, HRV). These inputs are processed by a state estimation component (implemented as a ridge regression model, later referred to as CASEM) and an adaptive control component (implemented as a transparent rule-based controller, later referred to as AICM).
  • Transparent control policy: We specify individualized percentile-based thresholds and the associated action vector, enabling faithful reproduction of the adaptation process.
  • Real-time wearable implementation: We demonstrate closed-loop operation on AR hardware with end-to-end latency under 85 ms, confirming feasibility within interactive HCI limits.
Based on these contributions, we formulate the following hypothesis that is tested in this study:
H1. 
A wearable neuro-adaptive multimodal interface significantly improves task performance, reduces subjective workload, and enhances engagement and satisfaction compared with a non-adaptive baseline condition.
This hypothesis is empirically evaluated in Section 5, where results from the large-scale experiment with 100 postgraduate learners confirm its validity.
The significance of this work lies not only in its interdisciplinary integration of neuroscience, wearable technology, and HCI but also in its potential for real-world, practical uses beyond the laboratory environment. The ability to create systems that are emotionally and cognitively aware is a critical step in the development of more humane and context-aware technology, including personalized learning and training, mental wellbeing monitoring, and cognitive assistance in the workplace.

2. Related Work

For the past few decades, multimodal interaction has been recognized as a defining feature of next-generation human–computer interfaces, offering an avenue toward more natural and efficient communication through the combination of complementary input and output modalities. Early studies of multimodal systems centered on the combination of different modalities, including speech and gesture [12], haptic and visual feedback [13], and gaze and voice commands [14]. These efforts have given rise to robust multimodal systems that increase recognition accuracy and improve user experience across a wide range of applications, from desktop computing to virtual reality. However, most of these systems treat the user as a black box, reacting to outward behavior without any understanding of the user’s internal cognitive or affective state.
Research in affective computing [4,15] attempts to overcome this limitation by giving systems the ability to perceive and respond to the emotions of their users. Strategies used to infer affective states and modify system behavior accordingly range from facial expression analysis [16,17] and speech prosody analysis [18,19] to physiological measurement [20,21]. However, these methods are often limited by their indirect and inherently ambiguous nature, since they rely on outward expressions that do not necessarily reflect the underlying cognitive or affective processes.
In parallel, the discipline of neuroergonomics [22] has developed through the application of brain–computer interfaces (BCIs) and neurophysiological monitoring in human–machine interfaces. Studies have shown that electroencephalography (EEG) and functional near-infrared spectroscopy (fNIRS) are useful measures for assessing mental workload, attention, and interest within controlled settings [23,24,25]. For example, in [26], the authors used EEG-based workload evaluations to dynamically adjust automation levels in the course of a flight simulation task, while in [27], the authors explored real-time fNIRS applications for enhancing speech recognition technology. These studies document the potential of neurophysiological signals to serve more sophisticated interaction paradigms by offering direct information about the internal state of users.
The development of wearable neurotechnology has enabled the transfer of such capabilities from highly controlled laboratory settings to more ecologically valid ones. Commercially available EEG headsets and wearable fNIRS devices now allow continuous monitoring of brain activity during everyday activities [28,29,30]. This has encouraged research into neuroadaptive user interfaces, including real-time adjustment of system parameters in accordance with detected mental states [31,32,33]. However, the majority of current neuroadaptive systems focus on a single modality (e.g., EEG) or a particular application area (e.g., clinical rehabilitation or entertainment) [34], often without regard to the feasibility of integration into general multimodal interaction architectures.
Apart from neural signals recorded from the brain, physiological measures such as heart rate variability (HRV), galvanic skin response (GSR), and pupillometry have been widely used for the detection of user states [35,36,37]. The fusion of such peripheral measures with EEG has been shown to improve the accuracy of mental state classification; for example, the authors of [38] demonstrated better engagement detection by using EEG and GSR signals synergistically. This highlights the need for a multimodal framework for internal-state estimation that incorporates both central and peripheral signals related to cognitive and affective processes.
Despite these promising developments, a major shortfall remains in cohesive and formalized architectures that combine neurophysiological sensing and multimodal behavioral interaction in a wearable, real-time adaptive system. Current research often dissociates the development of multimodal interaction from its neuro-adaptive aspects, thus losing the potential value that can be achieved when these components are embedded in a reproducible and modular system. In addition, a substantial proportion of the existing literature reports findings from small-scale, highly controlled lab experiments, thus limiting the ecological soundness and generalizability of the findings.
This work seeks to advance the field by presenting a neuro-adaptive multimodal system that combines EEG, physiological signals, and traditional behavioral input modalities—namely voice, gesture, and gaze—into a unified, wearable, and real-time system. Unlike the existing body of work that mostly focuses on a single modality or scenario, our architecture is developed as a generalizable and extensible framework, which is demonstrated here in the context of augmented reality applications. This framework presents an important leap in overcoming the current divide in multimodal HCI research by proving that real-time measurements of internal states can be used systematically to enhance the control logic in pervasive computing environments and thus enable more adaptive and context-aware interaction experiences.

3. System Architecture

NAMI is a principled and modular framework, implemented as a working prototype, that endows wearable human–computer interfaces with the ability to respond adaptively to users’ affective and cognitive states in real time. The framework describes how multimodal behavioral and neurophysiological signals are transformed into control parameters that guide the adaptive interface. It emphasizes reproducibility, scalability, and computational efficiency, making it suitable even for wearable devices with limited resources. The subsequent sections present the system architecture with sufficient technical detail to ensure that the approach can be accurately reproduced and rigorously evaluated.
The architecture defines a deterministic function F that maps the multimodal user state vector at time step t, Xt ∈ R^(m+n), to the control output vector ut ∈ R^3. The control output vector ut contains elements corresponding to task difficulty, feedback modality, and assistance frequency. The formal definition of this mapping is given below:
$$F : X_t \mapsto u_t$$
Mapping F is achieved by the successive application of three clearly specified modules, with each module performing a specific transformation:
$$X_t \;\xrightarrow{M_1}\; Z_t \;\xrightarrow{M_2}\; \hat{y}_t \;\xrightarrow{M_3}\; u_t$$
We use early feature-level fusion, concatenating modality-specific features into Zt = [zvoice, zhead, zhand, zgaze, zEEG, zGSR, zHRV]. Early fusion was chosen over late fusion because it enables a unified feature space and reduces end-to-end latency, which is critical for real-time adaptation in AR. After per-participant baseline normalization, a ridge-regression model maps Zt to continuous workload and engagement: [ŵ(t), ê(t)] = ZtW + b, with L2 regularization to prevent overfitting and maintain interpretability. Outputs are clipped to the range [0, 1] to facilitate integration with the rule-based controller, and smoothed over a two-second sliding window to attenuate sensor noise and transient fluctuations. If a modality is temporarily unavailable, a zero-mean imputed value (derived from baseline statistics) is used to keep Zt stable, ensuring robust operation despite occasional signal loss.
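To make this step concrete, the following Python sketch shows one way the early fusion, ridge prediction, clipping, two-second smoothing, and missing-modality imputation described above could be wired together. The class name, the dictionary-based interface, the fixed modality ordering, and the window length of 20 frames (roughly two seconds at a 10 Hz feature rate) are illustrative assumptions rather than details taken from the NAMI implementation.

```python
import numpy as np
from collections import deque

class RidgeStateEstimator:
    """Minimal sketch of the early-fusion + ridge-regression step (CASEM).

    W (2 x d) and b (2,) are assumed to have been fitted offline on
    calibration data; feature order and dimensions are illustrative.
    """

    def __init__(self, W, b, window_len=20):          # 20 frames at 10 Hz ~ 2 s
        self.W, self.b = np.asarray(W), np.asarray(b)
        self.history = deque(maxlen=window_len)       # sliding window for smoothing

    def fuse(self, features, baseline_means):
        """Early (feature-level) fusion: concatenate per-modality vectors.

        `features` maps modality name -> 1-D array, or None if the modality
        dropped out; missing modalities are replaced by their baseline means
        (zero-mean after normalization), keeping Z_t dimensionally stable.
        """
        parts = []
        for name in ("voice", "head", "hand", "gaze", "eeg", "gsr", "hrv"):
            x = features.get(name)
            parts.append(baseline_means[name] if x is None else np.asarray(x))
        return np.concatenate(parts)                  # Z_t

    def estimate(self, z_t):
        """Return smoothed [workload, engagement], clipped to [0, 1]."""
        y = np.clip(self.W @ z_t + self.b, 0.0, 1.0)  # ridge prediction
        self.history.append(y)
        return np.mean(self.history, axis=0)          # ~2 s moving average
```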
The sensing architecture combines two synchronized subsystems: (i) a neurophysiological stack (4-channel frontal/temporal EEG sampled at 256 Hz; wrist-worn GSR and PPG sensors for HR/HRV) and (ii) a behavioral stack (automatic speech recognition for voice, depth-camera hand pose tracking, and 30 Hz eye-gaze capture). This design reflects prior evidence that neurophysiological signals capture workload and arousal while behavioral signals provide complementary information about attention and task engagement, thus justifying their joint inclusion in the fusion process. The sensors within the AR headset enable behavioral sensing by recording verbal commands using an automatic speech recognition (ASR) engine, recognizing hand gestures using a depth camera in conjunction with a pose estimation algorithm, and tracking gaze direction at a sampling rate of 30 Hz. These behavioral signals are then converted into a feature vector:
$$x_b(t) = \left[\, s_1(t),\; s_2(t),\; \ldots,\; s_m(t) \,\right]^{\top}$$
where each si(t) is a scalar feature normalized to the unit interval [0, 1]. In parallel, neurophysiological sensing is performed using a lightweight EEG headband, providing four frontal and temporal channels at 256 Hz, and a wrist-mounted device providing GSR and HRV. EEG signals are bandpass-filtered between 1–40 Hz and decomposed into standard frequency bands, with power spectral density Pk(t) computed for each band. Peripheral signals include normalized SCL, SCR rate, and HRV measures. These are combined into the neurophysiological feature vector:
$$x_n(t) = \left[\, e_1(t),\; e_2(t),\; \ldots,\; e_n(t) \,\right]^{\top}$$
Then, the complete normalized feature vector is created as explained below:
$$Z_t = \left[\, x_b(t)^{\top},\; x_n(t)^{\top} \,\right]^{\top}$$
Signals are synchronized to a common clock, normalized via z-score transformation, and passed through adaptive noise-cancellation methods to mitigate the effects of motion and ambient noise. All of these design choices were tested empirically to ensure that the computational latency of this step remains low (less than 5 ms per cycle) and that feature variance stays within reasonable limits (coefficient of variation < 10% over 10 s windows).
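As an illustration of the EEG feature path described above (1–40 Hz bandpass filtering followed by per-band power spectral density), the following sketch uses SciPy's Butterworth filter and Welch estimator. The band boundaries, filter order, and two-second analysis window are conventional choices assumed here, not values reported in the paper.

```python
import numpy as np
from scipy.signal import butter, filtfilt, welch

FS = 256                      # EEG sampling rate reported above
BANDS = {"theta": (4, 8), "alpha": (8, 13), "beta": (13, 30)}  # illustrative subset

def bandpass(x, lo=1.0, hi=40.0, fs=FS, order=4):
    """1-40 Hz bandpass, matching the preprocessing described above."""
    b, a = butter(order, [lo / (fs / 2), hi / (fs / 2)], btype="band")
    return filtfilt(b, a, x)

def band_powers(eeg_window, fs=FS):
    """Mean power spectral density P_k(t) per band, averaged over channels.

    `eeg_window` is (channels, samples), e.g. a 2 s window of 4-channel EEG.
    """
    filtered = np.apply_along_axis(bandpass, 1, eeg_window)
    freqs, psd = welch(filtered, fs=fs, nperseg=min(fs, filtered.shape[1]))
    return {name: psd[:, (freqs >= lo) & (freqs < hi)].mean()
            for name, (lo, hi) in BANDS.items()}
```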
CASEM is a supervised machine learning estimator that is trained offline using ridge regression on labeled workload and engagement data. Its goal is to map the multimodal feature vector Zt to continuous estimates of workload Ĉ(t) and engagement Ê(t), which can be written as
$$\hat{y}_t = \left[\, \hat{C}(t),\; \hat{E}(t) \,\right]^{\top}$$
where Ĉ(t) is cognitive load and Ê(t) is affective engagement, both continuous values in the range [0, 1]. These are estimated as:
$$\hat{y}_t = W Z_t + b$$
Here, W ∈ R^{2×(m+n)} is the weight matrix and b ∈ R^2 is the bias vector, both of which are learned offline by applying ridge regression to a calibration dataset with realistic workload and engagement labels. To cope with high-frequency noise, the estimates are smoothed over a sliding window Tw:
$$\bar{\hat{y}}_t = \frac{1}{T_w} \sum_{\tau = t - T_w + 1}^{t} \hat{y}_\tau$$
Realistic workload is defined as the combination of (a) objective task-complexity features of the Java exercises (e.g., number of classes and bugs, branching depth) and (b) the NASA-TLX mental demand rating collected immediately post-trial. Both components are z-scored and averaged to form the workload target. Engagement is measured using the SAM arousal and pleasure indices [39].
The choice of a regularized linear model over more complex nonlinear alternatives was driven by requirements for interpretability, reliability, and low computational latency, with the average inference time being under 1 millisecond.
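For completeness, the snippet below sketches how the offline calibration step could be realized with scikit-learn: workload targets built from z-scored task complexity and NASA-TLX mental demand, engagement targets from the SAM arousal and pleasure ratings, and a multi-output ridge fit yielding W and b. The use of a single composite complexity score per trial, the equal weighting of the SAM indices, the rescaling to [0, 1], and the regularization strength are assumptions made for illustration.

```python
import numpy as np
from scipy.stats import zscore
from sklearn.linear_model import Ridge

def build_targets(complexity, tlx_mental_demand, sam_arousal, sam_pleasure):
    """Construct workload/engagement targets along the lines described above.

    `complexity` is assumed to be one composite task-complexity score per
    trial. Workload = mean of z-scored complexity and z-scored NASA-TLX
    mental demand, rescaled to [0, 1]; engagement = mean of the SAM arousal
    and pleasure ratings rescaled from the 1-9 scale to [0, 1]
    (equal weighting is an assumption).
    """
    w = (zscore(np.asarray(complexity, float)) +
         zscore(np.asarray(tlx_mental_demand, float))) / 2.0
    w = (w - w.min()) / (w.max() - w.min() + 1e-8)            # rescale to [0, 1]
    e = ((np.asarray(sam_arousal, float) - 1) / 8 +
         (np.asarray(sam_pleasure, float) - 1) / 8) / 2.0
    return np.column_stack([w, e])                            # shape (n_trials, 2)

def fit_casem(Z, Y, alpha=1.0):
    """Fit the multi-output ridge model; returns W (2 x d) and b (2,)."""
    model = Ridge(alpha=alpha).fit(Z, Y)                      # L2-regularized fit
    return model.coef_, model.intercept_
```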
AICM is the final stage, which links ŷt to the control parameters:
$$u_t = \left[\, D(t),\; M(t),\; A(t) \,\right]^{\top}$$
The adaptation rules reflect established human-factors principles concerning optimal arousal, mental workload management, and modality appropriateness [40,41,42,43,44,45,46,47]. When ŵ is high, the system reduces information density and increases scaffolding; when ê dips, it shifts to multimodal prompts to regain attention.
Here, D(t) denotes the task difficulty level, M(t) the feedback modality (visual, auditory, or multimodal), and A(t) the assistance frequency (either low or high). These parameters are updated according to transparent, deterministic rules derived from human factors research. Specifically, task difficulty is updated as:
$$D(t+1) = \begin{cases} D(t) - \delta_D, & \text{if } \hat{C}(t) > \theta_C^{\text{high}} \\ D(t) + \delta_D, & \text{if } \hat{C}(t) < \theta_C^{\text{low}} \\ D(t), & \text{otherwise} \end{cases}$$
Thresholds were individualized from each participant’s Baseline (BL) calibration using percentiles: θwL = P30(ŵ), θwH = P70(ŵ); θeL = P40(ê), θeH = P60(ê). When ŵ(t) > θwH, task difficulty D decreased by one level; when it dropped below θwL, difficulty increased. Likewise, if ê(t) < θeL, the system switched to multimodal prompts and higher assistance frequency, reverting to default when ê(t) > θeH. Percentile-based thresholds are commonly used in neuroadaptive HCI to normalize for inter-individual variability [31,43]. In the experimental setup, these thresholds were established individually for each participant during a brief baseline calibration phase. Participants first completed a standardized block of programming tasks without adaptive support, while concurrent workload and engagement measures (EEG, GSR, HRV, and behavioral markers) were recorded. The distributions of the estimated workload ŵ and engagement ê values obtained during this phase were used to compute participant-specific percentile thresholds (e.g., 30th/70th percentile for workload, 40th/60th percentile for engagement). This individualized approach ensured that thresholds reflected personal differences in physiology and interaction style rather than relying on fixed global cut-offs.
While the percentile-based calibration strategy is broadly applicable, the thresholds themselves are not entirely scenario-independent. In highly different application domains (e.g., safety-critical tasks, entertainment systems), recalibration would be necessary to account for differences in task demands, sensor noise, and baseline cognitive states. However, the calibration procedure is lightweight (5–7 min) and can be repeated with minimal burden, making it feasible to deploy across diverse real-world contexts.
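The calibration and control logic can be summarized in a few lines; the sketch below computes the participant-specific percentile thresholds from the baseline block and applies the rule-based updates described above. The three-level difficulty bound and the default modality and assistance values used on reversion are assumptions introduced for the example.

```python
import numpy as np

def calibrate_thresholds(w_baseline, e_baseline):
    """Participant-specific percentile thresholds from the baseline block."""
    return {
        "w_lo": np.percentile(w_baseline, 30), "w_hi": np.percentile(w_baseline, 70),
        "e_lo": np.percentile(e_baseline, 40), "e_hi": np.percentile(e_baseline, 60),
    }

def aicm_step(w, e, state, th):
    """One rule-based update of (difficulty D, modality M, assistance A).

    `state` is a dict {"D": int, "M": str, "A": str}; the 1-3 level bounds and
    the exact reversion defaults are assumptions, the rules follow Section 3.
    """
    if w > th["w_hi"]:
        state["D"] = max(1, state["D"] - 1)             # high workload -> easier task
    elif w < th["w_lo"]:
        state["D"] = min(3, state["D"] + 1)             # low workload -> harder task
    if e < th["e_lo"]:
        state["M"], state["A"] = "multimodal", "high"   # re-engage the learner
    elif e > th["e_hi"]:
        state["M"], state["A"] = "visual", "low"        # revert to defaults
    return state
```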
The end-to-end latency, measured from the acquisition of Xt to the rendering of ut, was consistently below 85 milliseconds across all trials, remaining well under the perceptual threshold for delays noticeable to users in interactive systems.
Figure 1 depicts an overall view of the architecture in the form of a logical data flow diagram that outlines the three-tier pipeline, inter-module interactions, and main data transformations. The design’s modularity allows for easy integration of more advanced machine learning models (e.g., recurrent neural networks for temporal processing) or alternative control mechanisms (e.g., reinforcement learning) without sacrificing the system’s essential operational integrity.
The architectural design has been carefully developed to eliminate unnecessary complexity while providing sufficient expressiveness required to cover the variety witnessed in real multimodal HCI systems. Every design choice, ranging from using linear regression for estimation to adopting rule-based policies for control, was driven by the twin mandates of real-time responsiveness and transparency, both of which are crucial in user-centric adaptive systems. Consequently, the architecture is not merely theoretically sound and mathematically correct but also empirically tested to be effective on wearable systems in realistic settings.
NAMI is a robust and flexible architecture that systematically maps perceivable user signals into responsive interface behavior. Its strengths in real-time performance, modularity, and technical soundness make it a platform for ongoing research and development in the area of context-aware, wearable human–computer interaction. We use transparent heuristics to ensure real-time reliability, while Section 6 discusses possible extensions to learned policies. In the current design, multimodal inputs are processed through CASEM to estimate workload and engagement [ŵ, ê], which are then passed to AICM to produce the control vector [D, M, A]T. This loop closes the cycle from sensing to adaptation, enabling real-time adjustments with consistently low latency.

4. Experimental Setup and Procedure

4.1. Participants

To illustrate the application of the Neuro-Adaptive Multimodal Architecture (NAMI) in a real-world environment, consider a graduate student taking a Java Programming class as part of an Informatics Master’s program. The student is immersed in an AR learning system that overlays instructional material and interactive programming exercises in their field of view. In addition, this system has been equipped with NAMI so that it can adapt dynamically to the student’s cognitive state and activities and improve the overall learning experience.

4.2. Apparatus and Data Acquisition

At the start of the session (labeled as time t0), the student interacts with the AR headset, coupled with the neuro-adaptive system. This interaction triggers the Sensing and Data Acquisition Module (SAPM), starting the collection of multimodal streams of data. The behavioral indicators include the student’s voice commands (e.g., “next question” or “display hint”), head and hand orientation while handling virtual code blocks, and fixation points of attention on important aspects of the interface. At the same time, neurophysiological sensors record electroencephalogram (EEG) signals via a four-channel headband, thus capturing brain activity that reflects mental workload, along with GSR and HRV indices recorded from a wrist-worn device, reflecting levels of physiological engagement and arousal. These are then synchronized, normalized, and assembled into the multimodal feature vector Zt0.
The system extracts multimodal features from all input channels and applies uniform preprocessing. Voice input contributes confidence measures and speaking rate, head and hand movements provide pose and gesture information, gaze data reflect fixation and saccade patterns, while EEG and GSR/HRV capture neural and physiological activity. All signals are resampled to a common 10 Hz rate, filtered for artifacts, and normalized per participant relative to the baseline.
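A minimal sketch of this preprocessing stage is given below, assuming linear interpolation onto a shared 10 Hz grid and z-scoring against baseline statistics; the interpolation method and the session duration used in the example are assumptions.

```python
import numpy as np
from scipy.interpolate import interp1d

def resample_to_10hz(timestamps, values, t_grid):
    """Linearly interpolate an irregularly sampled stream onto the 10 Hz grid."""
    f = interp1d(timestamps, values, axis=0,
                 bounds_error=False, fill_value="extrapolate")
    return f(t_grid)

def baseline_normalize(x, baseline_mean, baseline_std):
    """Per-participant z-scoring against the baseline-calibration statistics."""
    return (x - baseline_mean) / (baseline_std + 1e-8)

# Example: a shared 10 Hz timeline for a 20 min block (duration is illustrative).
t_grid = np.arange(0.0, 20 * 60, 0.1)
```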

4.3. Interaction Design and Gesture Integration

Within the AR interface, the code canvas, task panel, and feedback pane respond to these inputs: voice initiates high-level commands, gaze determines the active region, and hand gestures confirm or dismiss actions. Gesture probabilities from the depth camera are included in the multimodal feature vector, with actions triggered once a confidence threshold is exceeded. Gesture recognition was tightly integrated into the interaction flow as a primary input modality complementing voice and gaze. The AR system employed a depth camera in combination with a real-time pose estimation algorithm to track hand joints and classify dynamic gestures (e.g., swipe, pinch, rotate). Recognized gestures were mapped to high-level interaction commands, such as confirming a code insertion, dismissing a hint, or rotating a virtual code object for inspection. These commands were directly injected into the multimodal feature vector Zt alongside voice and gaze features, ensuring synchronized processing across modalities. To reduce noise and false positives, gesture posteriors were accumulated over a 300 ms temporal window, and an action was executed only when the posterior probability exceeded a fixed confidence threshold. This design allowed gestures to operate seamlessly within the neuro-adaptive loop, where CASEM interpreted them as part of the behavioral state while AICM determined whether to reinforce or attenuate gesture-based commands depending on the inferred cognitive load and engagement levels.
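The gesture-triggering logic can be illustrated as follows: per-frame gesture posteriors are accumulated over roughly 300 ms, and an action fires only when the averaged posterior clears a confidence threshold. The 30 fps frame rate, the 0.8 threshold, and the buffer-clearing behaviour are illustrative assumptions; only the 300 ms accumulation window comes from the description above.

```python
from collections import deque

class GestureTrigger:
    """Accumulate per-frame gesture posteriors over ~300 ms and fire an action
    only when the mean posterior exceeds a confidence threshold."""

    def __init__(self, window_frames=9, threshold=0.8):   # ~300 ms at 30 fps
        self.buffers = {}
        self.window_frames = window_frames
        self.threshold = threshold

    def update(self, gesture_posteriors):
        """`gesture_posteriors` maps gesture name -> posterior for this frame.
        Returns the gesture to execute, or None."""
        fired = None
        for name, p in gesture_posteriors.items():
            buf = self.buffers.setdefault(name, deque(maxlen=self.window_frames))
            buf.append(p)
            if len(buf) == buf.maxlen and sum(buf) / len(buf) > self.threshold:
                fired = name
                buf.clear()                # avoid refiring on the same window
        return fired
```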

4.4. System Overview

Figure 2 illustrates the overall interaction process of the NAMI-based system from a user-centered perspective. The learner, equipped with lightweight wearables (EEG headband, wrist sensor), interacts with the augmented reality interface through multimodal channels such as voice, gesture, and gaze. These behavioral inputs, together with neurophysiological signals (EEG, GSR, HRV), are continuously collected and processed. The Cognitive/Affective State Estimation Module (CASEM) integrates these inputs to infer workload and engagement levels, while the Adaptive Interaction Control Module (AICM) uses these estimates to regulate task difficulty, feedback modality, and assistance frequency. The resulting adaptations are delivered back to the AR interface, dynamically modifying the Task Panel, Code Canvas, and Feedback Panel. In this way, the figure complements the architectural pipeline (Figure 1) by emphasizing the user-facing interaction cycle and clarifying how the neuro-adaptive loop is experienced during real-time programming tasks.

4.5. Procedure and Adaptive Loop

During the course of an exercise that requires the use of a Java class hierarchy, SAPM updates the feature vector Zt every 100 milliseconds. This vector contains several elements, such as confidence measures for speech recognition, confidence levels for gesture recognition, gaze fixation durations on each code element, EEG power in the theta and beta frequency bands, and normalized GSR levels. Prior affective-computing studies have established that EEG band power and electrodermal activity are reliable indicators of workload and arousal, and that multimodal fusion can improve the robustness of state estimation. Building on these findings, NAMI adopts an early-fusion strategy combined with a regularized linear estimator to balance interpretability with real-time performance. Unlike prior neuro-adaptive systems that (i) rely on a single sensing modality or domain-specific pipelines, (ii) target small, controlled samples, and (iii) omit a reproducible control policy, NAMI integrates feature-level fusion with a regularized linear estimator for low-latency inference, coupled with a transparent rule-based controller operating on individualized thresholds. The prototype achieves <85 ms closed-loop latency on wearable AR hardware and is evaluated with 100 learners, with full specification of the features, the mapping F: Xt ↦ ut, and the adaptation policy to ensure reproducibility. This feature vector is then used as the input to CASEM, which applies the learned linear ridge regression model to infer the latent cognitive load Ĉ(t) and engagement Ê(t). To make these estimates reliable, they are smoothed over a two-second sliding window.
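The closed adaptation loop itself reduces to a simple 100 ms scheduler; the sketch below shows only the timing structure, with the sensing, estimation, control, and rendering steps passed in as caller-supplied callables rather than the actual NAMI modules.

```python
import time

def nami_loop(read_features, estimate_state, update_controls, apply_controls,
              period_s=0.1, duration_s=60.0):
    """Minimal sketch of the closed adaptation loop: every 100 ms the feature
    vector is refreshed, the (already smoothed) workload/engagement estimate
    is updated, and the controller output is applied to the AR interface.

    All four arguments are caller-supplied callables; this scheduler only
    illustrates the loop structure, not the actual NAMI implementation.
    """
    t_end = time.monotonic() + duration_s
    while time.monotonic() < t_end:
        t0 = time.monotonic()
        z_t = read_features()                     # SAPM: fused feature vector Z_t
        w_hat, e_hat = estimate_state(z_t)        # CASEM: workload / engagement
        controls = update_controls(w_hat, e_hat)  # AICM: (D, M, A)
        apply_controls(controls)                  # render the adaptation in the AR UI
        # Sleep off the remainder of the 100 ms budget.
        time.sleep(max(0.0, period_s - (time.monotonic() - t0)))
```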
Around 10 min into the session (time t1), CASEM detects a rising cognitive load: Ĉ(t1) = 0.82, exceeding the high-workload threshold θ_C^high = 0.75. At the same time, engagement Ê(t1) = 0.48 indicates moderate disengagement, as the student spends longer gazing at irrelevant parts of the interface and shows an elevated GSR response. These internal state estimates are fed into AICM.
The AICM applies the predefined control rules specified within the framework. Since Ĉ(t1) > θ_C^high, the system reduces the task difficulty D(t1) by one level, thus supporting the current activity through increased scaffolding and hiding unnecessary information. In response to the moderate disengagement (Ê(t1) < θ_E^low), the feedback modality M(t1) shifts from a text-only format to a multimodal one, with verbal prompts and animated directional cues, while the assistance frequency A(t1) is increased, so that hints are offered more often. In practice, three feedback modalities were supported: (i) visual feedback, consisting of on-screen textual hints, highlighted code segments, and animated arrows pointing to relevant UI elements; (ii) auditory feedback, consisting of short synthesized voice prompts that summarized the next step or emphasized errors; and (iii) multimodal feedback, which combined both visual and auditory cues to reinforce critical information. The choice of modality was governed by the AICM, which switched from unimodal to multimodal cues when engagement dropped below the individualized threshold. Assistance adaptation operated on two levels: the frequency of hints and the richness of their content. At low assistance levels, the system provided minimal guidance (e.g., highlighting only the location of an error), whereas at high assistance levels, it delivered step-by-step scaffolding, combining visual highlights with explanatory voice prompts. This adaptive mechanism ensured that support increased when workload rose or engagement declined, but scaled back once the learner returned to an optimal cognitive state, thereby preventing over-reliance on assistance and maintaining user autonomy.
Therefore, the learner feels less intimidated by the assignment and starts to refocus. Soon after, CASEM detects an optimal cognitive load of Ĉ(t2) = 0.61 and an increased level of engagement at Ê(t2) = 0.71. Here, AICM stabilizes the adaptation and stops making further modifications.

4.6. Summary

Throughout the entire session, NAMI operates in a closed-loop manner: SAPM streams updated feature vectors Zt, CASEM computes
$$\hat{y}_t = \left[\, \hat{C}(t),\; \hat{E}(t) \,\right]^{\top},$$
and AICM updates
$$u_t = \left[\, D(t),\; M(t),\; A(t) \,\right]^{\top}$$
accordingly. The modular, asynchronous design ensures that the system remains responsive, with total latency under 85 ms, imperceptible to the student.
This example illustrates how NAMI enables a context-aware and adaptive learning process in a wearable augmented reality environment, aligning task difficulty and feedback with the cognitive and affective state of postgraduate students learning Java programming. By enabling dynamic adjustment of task difficulty and levels of support for engagement, the architecture helps learners maintain an optimal learning zone, thus maximizing understanding and retention of complex programming principles. Notably, this architectural scheme demonstrates flexibility since it can be applied in any educational software systems in which dynamic adjustments are valuable, thus testifying to its universality and applicability in this context.

5. Evaluation and Results

In an effort to evaluate the effectiveness of NAMI, an extensive empirical study was carried out within a real educational environment aimed at simulating the expected use of the system. This study included 100 postgraduate students (52 male, 48 female; mean age = 27.3 years, SD = 2.9) who were enrolled in the Postgraduate Program in Informatics and Applications, specifically in the Java Programming course. All participants reported intermediate to advanced programming skills and previous experience with conventional AR-based interfaces. The main goal of this study was to quantify the effect of NAMI on several dimensions, including task performance, cognitive workload, engagement, satisfaction, and the system’s reliability in a wearable AR learning environment.
This study took place in a human–computer interaction laboratory equipped with workstations that supported AR. All participants wore a Microsoft HoloLens 2 headset together with lightweight neurophysiological sensors, namely a four-channel EEG headband and a wrist device capturing GSR and HRV. The AR setting offered a range of interactive Java programming exercises, requiring the participants to perform activities such as object-oriented design, syntax debugging, and algorithm implementation using multimodal interaction, including voice, gaze, and gesture. The tasks were carefully designed to elicit measurable cognitive load while also keeping participants engaged for 40 min. The exercises were grouped into three categories: object-oriented (OO) design, syntax/debugging, and algorithmic implementation. More specifically, the OO design tasks required learners to create and extend class hierarchies, correctly applying principles of inheritance, polymorphism, and encapsulation. Syntax/debugging exercises presented code snippets containing seeded errors (e.g., missing semicolons, type mismatches, uninitialized variables), and the students were asked to identify and correct them in real time within the AR environment. Algorithmic implementation exercises focused on writing and refining core algorithms, such as sorting routines (bubble sort, quicksort) and simple recursive methods (factorial, Fibonacci), with performance constraints on time complexity. Each exercise type was embedded in the AR interface through interactive code blocks and multimodal input (voice, gesture, and gaze), ensuring that tasks elicited both conceptual reasoning and procedural coding skills. This variety of tasks was deliberately chosen to balance conceptual design, low-level debugging, and algorithmic thinking, providing a comprehensive assessment of programming proficiency. Difficulty levels were pre-authored and validated in pilot runs: D1 (easy): fewer classes/branches and 1–2 seeded bugs; D2 (moderate): more classes/branches and 3–4 bugs; D3 (hard): deeper nesting and 5–6 bugs. Under the baseline (BL) condition, the level remained fixed per block; under the neuro-adaptive (NA) condition, D(t) adapted online via the rules in Section 3 (e.g., hiding noncritical panels, adding hints, or stepping to a simpler variant when ŵ exceeded θwH).
To rigorously compare the baseline and adaptive versions of the system, we designed a within-subjects experiment in which each participant experienced both conditions in counterbalanced order (AB/BA) to control for learning and fatigue effects. Under the BL condition, task difficulty, feedback modality, and assistance frequency were predefined and held constant throughout the session, regardless of the participant’s state. In contrast, under the NA condition, NAMI continuously monitored cognitive load and engagement and adapted these parameters in real time, as described in Section 3. An “always-help” condition was not included, as prior studies in neuroergonomics and adaptive interface design suggest that excessive or continuous assistance may increase cognitive load and visual clutter rather than improving performance. Empirical research has shown that constant on-screen guidance can distract users, fragment attention, and hinder self-regulation when not contextually moderated [22,42,43,46]. Therefore, in the present study, assistance was provided adaptively, based on the estimated workload and engagement values [ŵ, ê], to balance supportive feedback with user autonomy. Each condition lasted 20 min and was followed by a 10 min interstitial break to minimize fatigue.
Objective performance measures included the time spent on each task in combination with the number of errors made. Subjective measures were collected through standardized instruments immediately after each condition: NASA-TLX workload scores (1–100 scale), Self-Assessment Manikin (SAM) engagement ratings (1–9 scale), and user satisfaction ratings (5-point Likert scale). System-level metrics, including adaptation latency and stability, were also automatically recorded by the platform. Statistical analysis made use of paired-sample t-tests with Bonferroni correction, as well as effect size calculation.
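For reference, the paired analysis described here can be reproduced with a few lines of SciPy; the sketch below computes the paired t-test, a Bonferroni-corrected significance decision, and Cohen's d for paired samples. The number of corrected comparisons (one per outcome measure) and the specific d variant (mean difference divided by the standard deviation of the differences) are assumptions, since the paper does not state them.

```python
import numpy as np
from scipy import stats

def paired_analysis(baseline, adaptive, n_comparisons=5, alpha=0.05):
    """Paired t-test on NA-BL difference scores with Bonferroni correction,
    plus Cohen's d for paired samples (d = mean(diff) / sd(diff))."""
    diff = np.asarray(adaptive, float) - np.asarray(baseline, float)
    t, p = stats.ttest_rel(adaptive, baseline)         # paired-sample t-test
    d = diff.mean() / diff.std(ddof=1)                 # one common d convention
    return {"t": t, "p": p,
            "significant": p < alpha / n_comparisons,  # Bonferroni-corrected
            "cohens_d": d}
```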
The results are summarized in Table 1. Participants showed a significant improvement in task performance speed under the Neuro-Adaptive condition (mean = 267.8 s, SD = 38.7) compared to the BL condition (mean = 312.4 s, SD = 42.1), representing a mean improvement of about 14% (mean difference = 44.6 s, Cohen’s d = 1.10, p < 0.001). In line with prior works on adaptive learning systems [31,32,43], we also considered a practical significance threshold of 10% improvement to ensure that observed differences reflect meaningful gains rather than minor fluctuations. The observed reduction exceeded this threshold, thereby supporting both the statistical and the practical relevance of the effect. Error percentage reduced from 7.9% (SD = 2.3) to 4.8% (SD = 1.7), reflecting a reduction of almost 40% (mean difference = 3.1%, d = 1.53, p < 0.001). Subjective workload showed a significant reduction from a mean NASA-TLX score of 64.2 (SD = 9.4) to 49.7 (SD = 8.1), corresponding to an average drop of 14.5 points (d = 1.65, p < 0.001). Engagement and satisfaction measures also showed significant improvement (all p < 0.001). To account for the within-subjects design, statistics were performed on paired difference scores (NA–BL). Figure 3 visualizes these differences per participant, showing the overall positive trend while also highlighting individual variability, including a few cases with negligible or negative improvement. All reported effects reflected large effect sizes, confirming the robustness of the results.
Table 1 clearly illustrates that the neuro-adaptive changes brought about by NAMI ended up producing improvements that were both subjective and objective. Participants performed tasks more efficiently and accurately, while at the same time feeling a reduced workload with higher levels of engagement and satisfaction.
The results are also illustrated in Figure 4 and Figure 5. Figure 4 shows task completion time and error rate under the two conditions and highlights a significant reduction in both metrics when the NAMI system was active. Error bars represent standard deviations and show both the improvements and the reduced variability under the neuro-adaptive condition.
Figure 5 shows the subjective measures, namely workload, engagement, and satisfaction. Workload scores decreased significantly under NAMI, while engagement and satisfaction scores improved, indicating that users perceived the system adaptations positively rather than finding them distracting or intrusive.
Along with the main results, a system-level performance evaluation was conducted to determine both reliability and robustness. The average adaptation latency, from sensor input to the visible change in the interface, was 82 milliseconds (SD = 6.3 ms), well within the real-time constraints relevant for interactive systems. Across all sessions, the system remained stable, with no failures or measurable degradation in performance.
Qualitative feedback collected directly from the participants was consistent with the quantitative findings. Over 90% of the participants preferred the neuro-adaptive system, describing it as “more natural,” “more in sync with my rhythm,” and “less stressful.” These comments align with the engagement and satisfaction measures and highlight the acceptability of the architecture in an actual learning environment.

6. Discussion

The findings of this study highlight the potential of neuro-adaptive multimodal interaction to address long-standing challenges in technology-enhanced learning. Beyond statistical improvements in performance and workload reduction, the results suggest that integrating behavioral and neurophysiological cues can create a more balanced learning experience, where cognitive demands are dynamically aligned with individual learner needs. This reflects a broader shift from static instructional design toward adaptive, learner-centered systems that respond to both observable behavior and latent mental states.
These findings also clarify why the chosen architectural trade-offs were effective in practice. The use of a simple linear estimator and rule-based adaptation policy proved sufficient to capture fluctuations in workload and engagement, as evidenced by the observed reduction in NASA-TLX scores by 22.6% and the 37.3% increase in engagement. This suggests that more complex models are not strictly necessary to obtain robust improvements, although future research may explore probabilistic or reinforcement learning–based controllers to determine whether further personalization can be achieved without sacrificing explainability or latency.
At the same time, the results highlight certain boundary conditions of the current evaluation. This study involved postgraduate informatics students working on programming tasks, where both workload and satisfaction improved consistently across participants. It remains to be tested whether the same architecture would generalize to different learner populations (e.g., novices, younger students, or professionals), to other types of cognitive tasks beyond programming, or to longer-term deployments where habituation or shifts in trust may occur. Such extensions follow naturally from the present findings, which demonstrate clear short-term benefits but do not yet establish how these effects evolve over time or across diverse contexts.
Placing NAMI in the broader context of related research, it is clear that this architecture extends prior work by unifying neurophysiology and multimodal interaction in a wearable, real-time architecture with explicit policies and a large-sample evaluation. Today’s multimodal interaction systems, such as those investigated in [48], have shown the benefits of combining speech, gesture, and gaze in joint augmented reality scenarios; however, they often treat the user as a black box, disregarding internal emotional or cognitive processes. Advances in affective computing, as found in studies such as [4,49], have shown that external markers, such as facial expressions, prosody, and posture, can drive adaptive behavior; however, these methods typically rely on indirect and imprecise measurements of mental states. While neuroergonomics studies, such as those outlined in [22,50], have confirmed the validity of EEG and fNIRS technologies as means for assessing workload and attention allocation, these systems remain largely limited to controlled laboratory settings and single-modality input streams. In contrast, NAMI combines behavioral and neurophysiological cues into a reproducible, modular, and real-time adaptive system, specifically tailored for wearable devices. This combination represents an innovative and pragmatic contribution that bridges theoretical models with practical human-centered interaction paradigms and thus enables the emergence of more humane and context-aware computing.
It should also be acknowledged that, despite the counterbalanced design, potential order effects may still have influenced the results. For example, the response time in a participant’s first session might differ from the second due to practice, fatigue, or adaptation to the AR environment, and similar considerations could apply to error rates, workload, and engagement. While the present study did not explicitly analyze first-versus-second session differences, future work could address such factors to further strengthen the validity of neuro-adaptive evaluations.

7. Conclusions and Future Work

This study presented the design, implementation, and evaluation of the Neuro-Adaptive Multimodal Architecture (NAMI) in a realistic programming education scenario. In a study with N = 100 postgraduate learners, the neuro-adaptive condition significantly reduced task completion time (−14%), errors (−39%), and subjective workload (−22%) while increasing engagement (+37%) and satisfaction (+30%) compared to Baseline. These results demonstrate that a simple linear estimator and rule-based controller can provide robust, low-latency adaptations that meaningfully improve both performance and subjective experience in augmented reality programming tasks.
Beyond these empirical findings, this study contributes to the state of the art by showing how neuro-adaptive methods can be applied to programming education, where maintaining optimal cognitive load is critical for learning complex abstractions. Unlike prior affective-computing approaches that rely on indirect behavioral proxies, NAMI integrates multimodal behavioral and neurophysiological signals in a transparent, reproducible pipeline, bridging theory and practice in wearable HCI.
While promising, the results should be interpreted within the boundaries of the present design. This study focused on postgraduate informatics students and short-term programming tasks; future research should investigate generalization to different learner groups, tasks, and longitudinal settings. Further work is also needed to compare the rule-based policy against always-help conditions, to incorporate learning gain measures, and to explore advanced control models such as reinforcement learning. Addressing ethical considerations of privacy, consent, and transparency will be essential for broader deployment.

Author Contributions

Conceptualization, C.P., C.T. and A.K.; methodology, C.P., C.T. and A.K.; software, C.P., C.T. and A.K.; validation, C.P., C.T. and A.K.; formal analysis, C.P., C.T. and A.K.; investigation, C.P., C.T. and A.K.; resources, C.P., C.T. and A.K.; data curation, C.P., C.T. and A.K.; writing—original draft preparation, C.P., C.T. and A.K.; writing—review and editing, C.P., C.T. and A.K.; visualization, C.P., C.T. and A.K.; supervision, C.T.; project administration, C.T., A.K. and C.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Ethical review and approval were not required for this study, as it exclusively involves the analysis of properly anonymized datasets obtained from past research studies through voluntary participation. This research does not pose a risk of harm to the subjects. All data are handled with the utmost confidentiality and in compliance with ethical standards.

Informed Consent Statement

Informed consent was obtained from all subjects at the time of original data collection.

Data Availability Statement

The data supporting the findings of this study are available upon request from the authors.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
AICM: Adaptive Interaction Control Module
AR: Augmented Reality
ASR: Automatic Speech Recognition
BL: Baseline
CASEM: Cognitive/Affective State Estimation Module
EEG: Electroencephalography
fNIRS: Functional Near-Infrared Spectroscopy
GSR: Galvanic Skin Response
HCI: Human–Computer Interaction
HR: Heart Rate
HRV: Heart Rate Variability
NA: Neuro-Adaptive
NAMI: Neuro-Adaptive Multimodal Architecture
NASA-TLX: NASA Task Load Index
PPG: Photoplethysmography
PSD: Power Spectral Density
SAM: Self-Assessment Manikin
SCL: Skin Conductance Level
SCR: Skin Conductance Response

References

  1. Jain, N. The Evolution of Human-Computer Interaction in the AI Era. Int. J. Res. Comput. Appl. Inf. Technol. 2025, 8, 144–151. [Google Scholar] [CrossRef]
  2. Dritsas, E.; Trigka, M.; Troussas, C.; Mylonas, P. Multimodal Interaction, Interfaces, and Communication: A Survey. Multimodal Technol. Interact. 2025, 9, 6. [Google Scholar] [CrossRef]
  3. Ali, S.M.; Noghanian, S.; Khan, Z.U.; Alzahrani, S.; Alharbi, S.; Alhartomi, M.; Alsulami, R. Wearable and Flexible Sensor Devices: Recent Advances in Designs, Fabrication Methods, and Applications. Sensors 2025, 25, 1377. [Google Scholar] [CrossRef]
  4. Pei, G.; Li, H.; Lu, Y.; Wang, Y.; Hua, S.; Li, T. Affective Computing: Recent Advances, Challenges, and Future Trends. Intell. Comput. 2024, 3, 0076. [Google Scholar] [CrossRef]
  5. D’Amelio, T.A.; Galán, L.A.; Maldonado, E.A.; Díaz Barquinero, A.A.; Rodriguez Cuello, J.; Bruno, N.M.; Tagliazucchi, E.; Engemann, D.A. Emotion Recognition Systems with Electrodermal Activity: From Affective Science to Affective Computing. Neurocomputing 2025, 651, 130831. [Google Scholar] [CrossRef]
  6. Liu, X.-Y.; Wang, W.-L.; Liu, M.; Chen, M.-Y.; Pereira, T.; Doda, D.Y.; Ke, Y.-F.; Wang, S.-Y.; Wen, D.; Tong, X.-G.; et al. Recent Applications of EEG-Based Brain-Computer Interface in the Medical Field. Mil. Med. Res. 2025, 12, 14. [Google Scholar] [CrossRef]
  7. Guerrero-Sosa, J.D.T.; Romero, F.P.; Menéndez-Domínguez, V.H.; Serrano-Guerrero, J.; Montoro-Montarroso, A.; Olivas, J.A. A Comprehensive Review of Multimodal Analysis in Education. Appl. Sci. 2025, 15, 5896. [Google Scholar] [CrossRef]
  8. Cheng, S.; Yang, C.; Wang, Q.; Canumalla, A.; Li, J. Becoming a Foodie in Virtual Environments: Simulating and Enhancing the Eating Experience with Wearable Electronics for the Next-Generation VR/AR. Mater. Horiz. 2025, 18, 7160–7191. [Google Scholar] [CrossRef] [PubMed] [PubMed Central]
  9. Acosta, J.N.; Falcone, G.J.; Rajpurkar, P.; Topol, E.J. Multimodal Biomedical AI. Nat. Med. 2022, 28, 1773–1784. [Google Scholar] [CrossRef]
  10. Martin, D.; Malpica, S.; Gutierrez, D.; Masia, B.; Serrano, A. Multimodality in VR: A Survey. ACM Comput. Surv. 2022, 54, 216. [Google Scholar] [CrossRef]
  11. Cosoli, G.; Poli, A.; Scalise, L.; Spinsante, S. Measurement of Multimodal Physiological Signals for Stimulation Detection by Wearable Devices. Measurement 2021, 184, 109966. [Google Scholar] [CrossRef]
Figure 1. NAMI pipeline.
Figure 2. User-centered overview of NAMI, showing multimodal inputs and neurophysiological signals processed by CASEM and AICM to adapt the AR interface.
Figure 3. Per-participant differences between the neuro-adaptive (NA) and baseline (BL) conditions.
Figure 4. Task completion time and error rate under the baseline and neuro-adaptive conditions.
Figure 5. Subjective workload, engagement, and satisfaction ratings under the baseline and neuro-adaptive conditions.
Table 1. Comparison of Baseline and Neuro-Adaptive Conditions.

Measure                      Baseline (Mean ± SD)    Neuro-Adaptive (Mean ± SD)    Difference (%)
Task Completion Time (s)     312.4 ± 42.1            267.8 ± 38.7                  −14.3%
Error Rate (%)               7.9 ± 2.3               4.8 ± 1.7                     −39.2%
NASA-TLX Workload            64.2 ± 9.4              49.7 ± 8.1                    −22.6%
SAM Engagement (1–9)         5.1 ± 1.0               7.0 ± 0.8                     +37.3%
Satisfaction (1–5)           3.6 ± 0.6               4.7 ± 0.4                     +30.6%
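For transparency, the Difference (%) column in Table 1 can be reproduced directly from the reported condition means as the percent change of the neuro-adaptive mean relative to the baseline mean. The short Python sketch below illustrates this calculation; the variable names and the rounding to one decimal place are illustrative assumptions rather than part of the published analysis pipeline.

# Recompute the "Difference (%)" column of Table 1 from the reported condition means.
# The dictionary layout and the one-decimal rounding are illustrative assumptions.
TABLE_1 = {
    # measure: (baseline mean, neuro-adaptive mean)
    "Task Completion Time (s)": (312.4, 267.8),
    "Error Rate (%)": (7.9, 4.8),
    "NASA-TLX Workload": (64.2, 49.7),
    "SAM Engagement (1-9)": (5.1, 7.0),
    "Satisfaction (1-5)": (3.6, 4.7),
}

def percent_change(baseline: float, adaptive: float) -> float:
    """Percent change of the neuro-adaptive mean relative to the baseline mean."""
    return (adaptive - baseline) / baseline * 100.0

for measure, (baseline, adaptive) in TABLE_1.items():
    print(f"{measure}: {percent_change(baseline, adaptive):+.1f}%")
# Prints −14.3, −39.2, −22.6, +37.3, and +30.6, matching the Difference (%) column of Table 1.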