Article

IPA 2.0: Validation of an Interpretable Emotion-Attention Index for Neuro-Adaptive Learning with AI

by Javier Arranz-Romero *, Rosabel Roig-Vila and Miguel Cazorla
Faculty of Education, University of Alicante, P.O. Box 99, 03690 Alicante, Spain
* Author to whom correspondence should be addressed.
Appl. Sci. 2026, 16(5), 2515; https://doi.org/10.3390/app16052515
Submission received: 6 February 2026 / Revised: 27 February 2026 / Accepted: 3 March 2026 / Published: 5 March 2026
(This article belongs to the Special Issue Multimodal Emotion Recognition and Affective Computing)

Abstract

Adaptive learning systems increasingly rely on multimodal affective computing, yet many pipelines remain difficult to audit and pedagogically justify. We introduce NAILF (Neuro-Adaptive Artificial Intelligent Learning Flow) and formalise IPA 2.0 as an interpretable continuous index integrating affective valence/intensity with attentional activation into a traceable intermediate signal for neuro-adaptive decision-making. Validation follows a two-level strategy. Study A performs a structured simulation over the full emotion–attention space (108 configurations), demonstrating numerical stability and coherent monotonic behaviour under controlled parameterisation. Study B evaluates external validity on the DIPSEER in-the-wild classroom dataset using subject-wise temporal calibration (lag/windowing/smoothing), hold-out evaluation, and explicit anti-leakage auditing. Across evaluable subjects (n = 172), Fisher-z aggregation shows a small but significant association between IPA 2.0 and an external engagement criterion (r_global = 0.166, 95% CI [0.017, 0.308]). A heterogeneous strong-signal subset (n = 25, r_eval ≥ 0.50) supports personalised calibration as a core design principle. We discuss practical implications: IPA 2.0 is not a sole predictor, but an auditable signal that can gate, rank, and explain adaptive interventions under real-world noise and label–signal asynchrony.

1. Introduction

Effective personalisation of digital learning requires robust and continuous inference of students’ affective and attentional states while the activity is taking place. These states are not static: they fluctuate over short time scales and depend on both the task and the instructional context. In the educational field, there is ample evidence that academic emotions directly influence motivation, cognitive effort and self-regulation, all of which are processes closely linked to executive functions and the affective modulation of cognition [1,2]. Engagement, in turn, has been established as an integrative construct that reflects sustained attention, involvement, and persistence in the task [3].

In parallel, multimodal affective computing has made significant advances in the automatic inference of emotions and attentional states from video, audio, text and physiological signals. However, recent reviews agree on persistent limitations when these systems are transferred to real educational contexts: difficulties in generalisation “in the wild”, dependence on heterogeneous protocols, poor comparability between studies and, in particular, limited traceability between automatic inference and the educational decisions that supposedly use it [4,5,6]. Consequently, high classification performance alone does not guarantee pedagogical validity or applied legitimacy.

Within educational artificial intelligence (AI), the growth of adaptive learning systems, intelligent tutors and learning analytics has intensified this tension between prediction and decision-making [7,8]. Although numerous approaches model emotion and attention as relevant variables, they often do so separately, and the transition from these inferences to concrete pedagogical actions remains implicit, unaudited or dependent on opaque heuristics. This disconnect hinders both scientific replication and the ethical and educational evaluation of neuro-adaptive systems.

The present work addresses this gap by proposing an integrated and interpretable approach that connects affective and attentional inference with traceable pedagogical decisions. Specifically, it introduces the Neuro-Adaptive Artificial Intelligent Learning Flow (NAILF) framework and formalises the Learning Improvement Index (IPA 2.0) as a continuous and interpretable metric that integrates emotional valence and intensity with attentional activation, explicitly incorporating terms for discordance and neuro-alignment. Unlike purely predictive approaches, IPA 2.0 is designed as an auditable intermediate signal that allows instructional adaptations to be justified through an explicit decision loop (state → rule → action).

A central methodological challenge in this type of proposal is to avoid fragile inferences based exclusively on noisy empirical data or spurious correlations. To address this, the article adopts a validation strategy on two complementary levels. First, structural validation is performed using biologically informed simulation (Study A), which orthogonally scans the emotion–attention state space to assess the internal consistency, numerical stability, and boundary behaviour of the index. Second, external empirical validation is carried out (Study B) using the DIPSEER multimodal classroom dataset, collected under real ecological conditions, incorporating temporal calibration per subject and explicit information leakage controls [9].
The aim of this work is not to present a final educational product or demonstrate a causal impact on academic performance, but rather to provide the scientific–operational core that makes a rigorous neuro-educational adaptation viable: (i) an integrated and interpretable metric of emotion and attention, (ii) a reproducible multimodal pipeline with methodological guarantees, and (iii) initial empirical evidence under real classroom conditions. The rest of the article is organised as follows. Section 2 reviews the theoretical framework and related work, identifying the design requirements derived from the state of the art. Section 3 describes the materials and methods, including the formalisation of IPA 2.0, the NAILF framework and the validation protocols. Section 4 presents the results of the simulation (Study A) and empirical validation with DIPSEER (Study B). Section 5 discusses the implications, limitations and future directions, while Section 6 concludes the work.

2. Theoretical Framework and Related Work

2.1. Academic Emotions and Learning: From Theory to Operationalisation

Academic emotions are a key determinant of self-regulated learning. Control-value theory states that students’ emotional experience emerges from their cognitive evaluation of control and value over the task, directly influencing motivation, strategy use and performance [2]. Psychometric instruments such as the Achievement Emotions Questionnaire (AEQ) and its abbreviated versions have made it possible to operationalise these emotions in formal educational contexts, providing robust evidence of their relationship with cognitive engagement and achievement [10,11]. However, most of these measurements rely on retrospective self-reports or discrete scales, which limits their temporal granularity and direct applicability to adaptive systems in near real time. This limitation has motivated interest in continuous and computationally tractable approaches to emotion, especially in the field of affective computing, where emotional intensity is modelled as a dynamic variable inferred from observable signals [4]. In this context, there remains a need to connect well-established theoretical constructs with interpretable numerical representations that can be integrated into automated pedagogical decisions without losing conceptual validity.

2.2. Attention, Self-Regulation and Engagement in Technology-Mediated Learning Environments

Attention is a functional system that supports the selection, maintenance and control of cognitive focus during activity and is closely linked to executive and self-regulation processes [1,12]. In education, the construct of engagement has emerged as an operational approach that integrates attentional, behavioural and, in some models, emotional components [3]. In digital environments, attention and engagement are increasingly inferred from behavioural and multimodal signals, such as interaction patterns, posture, gaze, facial expressions and physiological indicators. Previous studies hold that attention cannot be modelled as a binary state, but rather as a dynamic process that fluctuates over time [4]. Consequently, the analysis of these states requires continuous scales and consistent time windows, a capability that modern AI techniques have effectively begun to address [4,13]. However, many approaches treat attention as an isolated output, without explicitly integrating it with emotional state or pedagogical rules that explain how such inference translates into educational action. At the same time, the capture of attentional signals through wearable devices and educational IoT instrumentation, together with eye tracking as a complementary modality, has been explored as a promising avenue for personalisation and adaptive feedback in real time [14,15]. Furthermore, the literature on mind-wandering underlines that attention fluctuates naturally and should not be treated as a dichotomous state, but rather as a context-sensitive continuum [16].

2.3. Multimodal Recognition of Emotion and Attention: Recent Advances and Persistent Limitations

Multimodal emotion and attention recognition has made remarkable progress thanks to deep models and heterogeneous signal fusion strategies. Recent reviews document substantial improvements in accuracy and robustness when combining video, audio, text and physiological data, as well as the growth of specialised datasets for training and evaluation [4,5,6]. A recent example of cross-modal platforms in specialised educational settings [17] reports a cross-modal approach for personalised art education, reinforcing the relevance of traceable integration when translating affective signals into educational decisions. Despite these advances, critical challenges remain when systems are evaluated in real classroom conditions. Among the most important are: (i) degradation of performance “in the wild” due to lighting variability, occlusions, and movement; (ii) asynchrony between signals and labelling; (iii) inter-individual heterogeneity; and (iv) difficulty in comparing results across studies due to the use of non-standardised protocols and metrics. In response, multiple studies recommend prioritising ecological validations, explicit information leakage controls, and interpretable components that facilitate system auditing [4,5]. In a similar vein, semantic fusion approaches and computer vision pipelines have been explored to infer affective states in educational contexts, combining facial expressions, contextual cues, and textual or interaction data [18,19]. However, in many cases, inference remains disconnected from explicit pedagogical logic, limiting its usefulness beyond recognition per se.

2.4. Educational AI and Neuro-Adaptation: From Prediction to Traceable Decision

Educational AI has rapidly evolved towards adaptive learning systems, intelligent tutors and AI-assisted assessment, with promises of large-scale personalisation [7,8]. However, a persistent criticism is that most of these systems emphasise prediction (e.g., classification of states or performance) without explaining how these predictions inform specific pedagogical decisions or under what criteria these decisions can be considered justified or auditable. In contexts where potentially sensitive multimodal data is processed, this lack of traceability becomes a methodological and ethical problem. Consequently, recent research highlights the need for frameworks that explicitly separate the inference of the student’s state from the adaptation logic, allowing the step (state → rule → action) to be documented and its educational validity to be evaluated [20,21]. From this perspective, interpretability ceases to be an optional attribute of the model and becomes a design requirement. This umbrella encompasses, for example, virtual educational systems aimed at improving efficiency in school contexts [22], reviews of strategies for improving engagement in AI-supported hybrid education [23], empirical evidence on AI-enhanced online courses and their relationship with student perception and participation [24], and recent reviews on the use of AI to support assessment processes in computer science education [25]. Additionally, an association has been observed between engagement and consumption of educational audiovisual content, which is relevant for video-mediated learning environments [26].

2.5. Critical Synthesis and Design Requirements Derived from the State of the Art

The preceding review highlights a clear gap between advances in multimodal recognition and their effective integration into adaptive educational systems. In summary, three design requirements are identified that are not fully covered by the state of the art:
  • R1. Interpretable integration: emotion and attention must be combined in a common metric that preserves psychological meaning and allows for pedagogical interpretation, avoiding fragmented or purely classificatory outputs.
  • R2. Robust pipeline: multimodal inference must be based on reproducible protocols, with explicit controls for temporal asynchrony and information leakage.
  • R3. Two-level validation: it is necessary to distinguish between the structural coherence of the model and ecological empirical evidence, avoiding undue extrapolations from only one of these levels.

2.6. From State of the Art to System: NAILF as an Operational Framework

Requirements R1–R3 are addressed in this work not as an abstract theoretical proposal, but as a criterion of applied educational engineering. In this sense, the Neuro-Adaptive Artificial Intelligent Learning Flow (NAILF) framework is conceived as an operational architecture that organises multimodal acquisition, state inference and pedagogical decision-making in an explicit and traceable manner. In NAILF, the Learning Improvement Index (IPA 2.0) acts as an integrated signal that synthesises emotion and attention in time windows, allowing parameterisable adaptive policies to be activated and each intervention to be audited. The aim of this article is not to exhaustively describe a complete technological ecosystem, but to provide the formalisation of the index, the associated inference pipeline, and a replicable validation strategy that can serve as a scientific basis for subsequent neuro-adaptive systems. Figure 1 provides an overview of the PAIDEIA ecosystem, placing NAILF within a broader educational AI architecture that connects multimodal data acquisition, integrated state inference, and adaptive decision layers.
Within NAILF, the Learning Flow Processor (LFP) implements the transition from state estimation to pedagogical action, as illustrated in Figure 2.

3. Materials and Methods

3.1. Study Objectives and Research Questions

The overall objective of this work is to formalise and validate an integrated and interpretable approach for inferring emotional and attentional states in educational contexts, and to evaluate its usefulness as an intermediate signal for traceable neuro-adaptive decisions. In particular, the Learning Improvement Index (IPA 2.0) is proposed as a continuous metric that synthesises both dimensions within the NAILF framework. Based on this objective, the following research questions are posed:
  • RQ1. How can multimodal emotional and attentional indicators be integrated into a single, interpretable signal that is operational in near real time?
  • RQ2. Does the IPA 2.0 exhibit internal consistency and numerical stability when exploring the complete space of emotion–attention states under plausible constraints (Study A)?
  • RQ3. What evidence of convergent validity does IPA 2.0 show against an external criterion of engagement in an ecological empirical validation with human subjects (Study B)?
  • RQ4. What methodological decisions (windows, temporal calibration, leakage controls) are necessary for rigorous validation in “in the wild” contexts?
These questions explicitly delimit the scope of the study: the structural and convergent validity of the index is evaluated, not the causal impact of educational interventions.

3.2. Study Design and Validation Logic

This work adopts an applied methodological design with two complementary levels of evidence:
  • Study A (pre-empirical): structural validation through biologically informed simulation, aimed at examining the internal consistency, numerical stability and boundary behaviour of the IPA 2.0.
  • Study B (empirical): external validation with human subjects using the DIPSEER dataset, collected in real classroom conditions, aimed at estimating convergent validity under a robust protocol.
This separation avoids a frequent but methodologically weak inference: assuming that an index is valid solely because it correlates with a noisy empirical target, or, conversely, extrapolating educational utility from mathematical consistency. Study A is interpreted as a theoretical ceiling/floor analysis, while Study B provides applied evidence under ecological constraints.

3.3. Formalisation of the Learning Improvement Index (IPA 2.0)

3.3.1. General Definition

The IPA 2.0 is defined as a continuous metric that integrates emotional and attentional information into a single, interpretable scale designed to guide adaptive pedagogical decisions within NAILF. The index combines four main components: (1) weighted emotional valence; (2) emotional intensity; (3) weighted attentional activation; and (4) terms of adjustment for discordance and neuro-alignment. These components are described in the following sections, and their general formulation is as follows:
IPA = (w_e · I_e) + (w_a · A_a) − δ + F_NP
In NAILF, IPA 2.0 functions as a decision variable within a traceable loop (state → rule → action), enabling auditable adaptive interventions (Figure 3). Figure 4 illustrates the broader multi-agent architecture envisaged in NAILF, including governance and validation components. These elements provide an operational context but are not subject to empirical validation in the present study.
In particular, the “inclusive” dimension aligns with the Universal Design for Learning framework as a reference for accessibility and pedagogical flexibility in digital environments [27].

3.3.2. Emotional Component

The term w_e represents the weight assigned to the detected emotional category, based on control-value theory and neuro-educational evidence on the differential impact of academic emotions on learning [2,20,28]. Five aggregate categories with symmetrical and explicit weights are used. Table 1 shows the values used in the experiments. Emotional intensity I_e is modelled on a continuous scale from 1 to 10. Although instruments such as the AEQ use shorter Likert scales [10,11], the extended scale allows for greater computational granularity and is common in affective computing when intensity is inferred from multimodal signals [4]. This decision does not redefine the psychological construct, but rather provides an operational numerical representation of it. In addition, Appendix A includes material on the strong-signal subset (r_eval ≥ 0.50) in Study B (Phase 0C calibrated).

3.3.3. Attention Component

The term w_a corresponds to the weight assigned to the type of attention, based on an operational adaptation of classical models of attention [1,12]. Categories are defined with ordered weights that reflect their relative contribution to task focus (e.g., focused/active, constructive, sustained, selective, external attention). Table 2 shows the weights used in the experiment.
Attentional activation A_a is represented on a continuous scale from 1 to 5, where higher values indicate greater stability and effective duration of attentional focus. This representation is consistent with recent work modelling attention as a dynamic and continuous process in learning analytics [4,29].

3.3.4. Adjustment Terms: Discordance and Neuro-Alignment

The term δ models affect–attentional discordance, understood as potential interference between emotional processing and task focus (e.g., high activation with intense negative emotion). It is implemented as an explicit penalty in the equation, and in the studies presented, it is set to a constant value to standardise comparisons and subject the index to a conservative regime. The PAIDEIA Neuro-alignment Factor (FNP) is a discrete term that captures synergy or functional friction between emotion and attention (+0.5 or −0.3). In Studies A and B, it is set at its maximum positive value to preserve scale stability; its conditional activation is reserved for sensitivity analysis and future deployments. Table 3 shows the activation values as a function of attention type.
Intuitively, δ acts as a bounded “discordance brake”: when the affective context suggests friction, δ explicitly discounts the contribution of attentional activation so that high activation is not automatically interpreted as beneficial engagement. In our implementation, δ is bounded to δ ∈ [0, 0.30], and we set δ = 0.30 in Studies A and B as an upper-bound stress-test regime (i.e., a conservative setting chosen to standardise comparisons rather than an empirically tuned optimum). This keeps the adjustment fully traceable (a direct δ term) while reducing degrees of freedom and avoiding post hoc tuning. To address sensitivity and “necessity” concerns, Appendix A, Table A10 reports both a δ = 0 counterfactual and a rule-based conditional activation of δ/FNP; under these checks, the induced ordering over the 108-state grid remains essentially unchanged (Spearman ρ = 0.99994 for conditional activation; ρ = 1.000 for δ = 0), indicating that the reported structural behaviour is not driven by fine-tuned δ choices while preserving δ as an auditable safeguard for future deployments.
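To make the formalisation concrete, the following minimal Python sketch reproduces the IPA 2.0 computation from the published weights (Tables 1–3, as mirrored in Appendix A). The dictionary and function names are illustrative conveniences, not part of the NAILF codebase.

EMOTION_WEIGHTS = {            # w_e: five aggregate categories (see Table A12)
    "joy": 1.0, "hope": 1.0,
    "pride": 0.7, "relief": 0.7,
    "anxiety": -0.5, "anger": -0.5,
    "shame": -0.8, "sadness": -0.8,
    "boredom": -1.0,
}
ATTENTION_WEIGHTS = {          # w_a: twelve attention types (Table 2 / Appendix A)
    "focused": 2.0, "active": 2.0, "constructive": 1.8, "sustained": 1.6,
    "selective": 1.4, "alternating": 1.2, "interactive": 1.2, "divided": 0.8,
    "reactive": 0.4, "shared": 0.0, "internal": -0.5, "external": -1.0,
}

def ipa(emotion, intensity, attention, activation, delta=0.30, fnp=0.50):
    """IPA = (w_e * I_e) + (w_a * A_a) - delta + F_NP, with I_e in [1, 10],
    A_a in [1, 5], and delta/FNP fixed as in Studies A and B."""
    return (EMOTION_WEIGHTS[emotion] * intensity
            + ATTENTION_WEIGHTS[attention] * activation
            - delta + fnp)

print(ipa("hope", 10, "focused", 5))   # 20.2, matching Table A2 at ceiling values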

3.4. Study A: Structural Simulation of the Emotion–Attention Space

Study A consists of a computational simulation that runs through the Cartesian product of 9 academic emotional states and 12 types of attention, generating 108 cognitive-affective scenarios. For each scenario, the IPA 2.0 is calculated using maximum values of emotional intensity and attentional activation, together with the defined weights. The objective is not to emulate human behaviour, but to examine the structural response of the index: ranges, symmetries, numerical stability and relative contribution of each dimension. The results are interpreted as theoretical limits and expected patterns, not as empirical evidence.
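Under the same assumptions, the Study A sweep can be sketched as the Cartesian product of the nine emotion categories and twelve attention types, evaluated at ceiling values (I_e = 10, A_a = 5). This sketch reuses the illustrative ipa helper and weight dictionaries from Section 3.3.

import itertools

# Study A structural sweep: 9 emotions x 12 attention types = 108 scenarios,
# each evaluated at maximum intensity and activation.
grid = [(emo, att, ipa(emo, 10, att, 5))
        for emo, att in itertools.product(EMOTION_WEIGHTS, ATTENTION_WEIGHTS)]
assert len(grid) == 108

ipa_values = [v for _, _, v in grid]
print(min(ipa_values), max(ipa_values))   # theoretical floor/ceiling of the index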

3.5. Study B: External Empirical Validation with the DIPSEER Dataset

3.5.1. Dataset and Ecological Context

For empirical validation, we used the public DIPSEER dataset, collected in a real university classroom and published in Science Data Bank [9]. The dataset integrates multi-angle RGB video and smartwatch signals per student, along with emotion and engagement/attention time tags generated through self-reporting and expert consensus. DIPSEER was chosen based on three criteria: (i) sufficient multimodality for integrated inference, (ii) ecological conditions “in the wild” and (iii) dual emotion–attention labelling that allows for the study of their co-dependence.

3.5.2. Operationalisation and Mapping to IPA 2.0

For reproducibility, DIPSEER annotations are operationalised via deterministic, declarative rules: (i) self-reported attention (1–5) is used as A_a ∈ [1, 5]; (ii) the external criterion T_1 is defined as the median of the four expert labellers (split-source: T_1 is not used in IPA computation); and (iii) academic emotion labels are grouped into IPA categories and mapped to w_e as in Table 1, and emotional intensity I_e is taken on a 1–10 scale when available or linearly rescaled, with the original range explicitly reported. In Study B, validation does not re-derive IPA within the calibration script; it uses {subject}_ipa.csv as a reproducible preprocessing artifact (IPA already computed) and evaluates it against T_1. The attention taxonomy and its weights (Table 2) and the attention–activation mapping (Table 3) are used as part of the index design and structural sweep (Study A), not as native DIPSEER labels. Full mapping rules are provided in Appendix A (Table A12 and Table A13).
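A minimal sketch of these declarative rules follows, assuming illustrative column names rather than the dataset’s actual schema, and reusing the EMOTION_WEIGHTS dictionary from Section 3.3; the split-source constraint is that target_t1 never feeds the IPA inputs.

import numpy as np

def map_row(row):
    """Declarative DIPSEER -> IPA 2.0 input mapping (Tables A12-A13)."""
    a_a = float(row["attention_selfreport"])              # already on 1-5
    i_e = rescale_to_1_10(row["emotion_intensity"],       # linear rescale when
                          row["intensity_range"])         # the source range differs
    w_e = EMOTION_WEIGHTS[row["emotion_label"].lower()]   # grouping of Table A12
    return {"A_a": a_a, "I_e": i_e, "w_e": w_e}

def rescale_to_1_10(x, source_range):
    lo, hi = source_range
    return 1.0 if hi == lo else 1.0 + 9.0 * (x - lo) / (hi - lo)

def target_t1(expert_labels):
    """External criterion T_1: median of the four expert labellers.
    Split-source: used only for validation, never inside IPA."""
    return float(np.median(expert_labels))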

3.5.3. Time Windows, Calibration and Leakage Prevention

Given the natural asynchrony between sensors and labelling, a representation by time windows (1 s and 5 s) and a time calibration procedure per subject with strict time division are adopted: 30% of data for calibration and 70% for evaluation. The calibration grid (lag, window, smoothing) is fixed and predefined. The evaluation is based exclusively on the hold-out section. Eligibility filters are applied to avoid undefined correlations on almost constant series. To prevent information leakage, the external engagement criterion is kept completely separate from the IPA inputs. Programmatic verification ensures there is no overlap between target columns and input variables, guaranteeing a split-source design.
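The calibration-and-hold-out logic can be sketched as follows, assuming per-sample pandas series for a single subject; the grid values match Section 3.5.3 and Table A11, while the rolling-window units (samples rather than seconds) are an illustrative simplification.

import numpy as np
import pandas as pd

LAGS = (-2, -1, 0, 1, 2)        # candidate lags (see Table A11)
WINDOWS = (1, 5)                # window lengths in seconds
SMOOTHING = (None, 3)           # rolling-median span, or no smoothing

def transform(ipa_series, lag, win, smooth):
    x = ipa_series.shift(lag).rolling(win, min_periods=1).mean()
    return x.rolling(smooth, min_periods=1).median() if smooth else x

def calibrate_and_evaluate(ipa_series: pd.Series, target: pd.Series) -> float:
    """30% calibration / 70% hold-out with a strict temporal split;
    r_eval is computed on the hold-out segment only."""
    split = int(len(ipa_series) * 0.30)
    def r_cal(params):
        r = transform(ipa_series, *params).iloc[:split].corr(target.iloc[:split])
        return r if np.isfinite(r) else -np.inf
    best = max(((lag, w, s) for lag in LAGS for w in WINDOWS for s in SMOOTHING),
               key=r_cal)
    return transform(ipa_series, *best).iloc[split:].corr(target.iloc[split:])

def assert_no_leakage(input_cols, target_cols):
    """Programmatic split-source audit: targets must not appear among inputs."""
    assert set(input_cols).isdisjoint(target_cols), "target leaked into inputs"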

3.6. Metrics and Statistical Analysis

The convergent validity of IPA 2.0 is assessed using Pearson correlations per subject and aggregation using the Fisher-z transformation. 95% confidence intervals are reported, and cases excluded for statistical reasons (e.g., zero variance) are explicitly documented. Complementary analyses of functional utility (e.g., incorporation of the IPA as an additional variable in predictive models) are interpreted as secondary evidence, not as primary proof of validity.
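A sketch of the Fisher-z aggregation is shown below. It uses the standard z = arctanh(r) transform with a normal-approximation confidence interval on the z scale, which is one common variant; the manuscript does not spell out the exact weighting scheme, so this should be read as an assumption.

import numpy as np
from scipy import stats

def fisher_z_aggregate(r_per_subject, alpha=0.05):
    """Aggregate subject-wise Pearson correlations via Fisher-z;
    returns (r_global, (ci_low, ci_high)) back-transformed with tanh."""
    rs = np.asarray(r_per_subject, dtype=float)
    rs = rs[np.isfinite(rs) & (np.abs(rs) < 1.0)]   # drop degenerate cases
    z = np.arctanh(rs)
    se = z.std(ddof=1) / np.sqrt(len(z))
    half = stats.norm.ppf(1 - alpha / 2) * se
    return np.tanh(z.mean()), (np.tanh(z.mean() - half), np.tanh(z.mean() + half))

Applied to the per-subject correlations of the 172 evaluable subjects, a procedure of this shape yields the r_global and 95% CI reported in Section 4.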

3.7. Ethical and Data Governance Considerations

Study B is based exclusively on secondary analysis of a public dataset with prior ethical approval. No new data are collected and no participants are identified. The analysis is conducted under the principles of minimisation, pseudonymisation and regulatory compliance (GDPR), and the traceability of the pipeline is considered an integral part of the method.

4. Results

4.1. Study A: Structural Validation Through Biologically Informed Simulation

Study A evaluates the internal behaviour of IPA 2.0 through a systematic simulation of the emotion–attention state space. Given its pre-empirical nature, the results are interpreted as structural properties of the index, not as evidence of educational effectiveness.

4.1.1. Coverage of the State Space and Range of the Index

The simulation runs through the complete orthogonal design defined by 9 academic emotional states and 12 types of attention, generating 108 cognitive-affective scenarios. For each scenario, the IPA 2.0 is calculated under maximum values of emotional intensity and attentional activation, with fixed adjustment terms to standardise comparisons. Table 4 shows a representative subset corresponding to the emotional category Joy (w_e = 1.0), illustrating the monotonic response of the index along the attentional axis.
The complete values for the remaining emotional categories are provided as Appendix A (see Appendix A.1) to preserve the readability of the main manuscript. Figure 5 shows the heatmap.

4.1.2. Numerical Stability and Relative Contribution of Emotion and Attention

Table 5 presents aggregated descriptive statistics of the IPA 2.0 by emotional category. Given the orthogonal design, the distribution of the attentional component is identical across all groups, confirming the methodological cleanliness of the simulation.
Full 108-state simulated grids for the remaining emotion categories are reported in Appendix A (Tables A2–A9). The standard deviation of 4.95 indicates that the attentional component, scaled by activation (1–5), shifts the index substantially from the base emotional term, validating the vector equilibrium of the IPA 2.0. Meanwhile, Table 6 shows the Pearson correlation matrix between the structural components of the index.
This pattern confirms that IPA 2.0 integrates both dimensions without collapse: emotion establishes the overall direction of the state, while attention modulates its operational intensity. To calculate the Pearson coefficients presented in this matrix, vector dimensionality reduction was performed. Given that the adaptation of the Goldberg–Posner model includes semantically distinct terminology with identical weighting (e.g., Focused Attention and Active Attention both with w_a = +2.0), the statistical analysis was performed by collapsing the redundant categories to work on the space of unique weights. This procedure avoids the artificial inflation of explained variance caused by duplicate labels, resulting in an effective matrix of 9 × 8 independent vectors. For comparison, the calculation on the complete extended matrix of 108 scenarios (retaining semantic redundancies) yields marginally different values (r_emo = 0.860; r_atn = 0.510), confirming the structural stability of the model regardless of the selected vector grouping.
We add a robustness/sensitivity block (Appendix A, Table A10) over the 108 configurations: ±20% weight perturbation (Monte Carlo, n = 10,000): median Spearman ρ = 0.9912 (P5 = 0.9863; P95 = 0.9950; min = 0.9781); near-ceiling I_e/A_a noise (n = 5000): median ρ = 0.9967; mid-range stress-test noise (n = 5000): median ρ = 0.9411. Additionally, δ = 0 preserves the ranking (ρ = 1.000), and rule-based conditional δ/FNP activation preserves the ordering (ρ = 0.99994).
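The ±20% weight-perturbation check (Sx-2 in Table A10) can be sketched as below, reusing the illustrative ipa helper and weight dictionaries from Section 3.3: each Monte Carlo draw perturbs every weight with an independent factor U[0.8, 1.2], recomputes the 108-state grid, and compares the induced ranking against the baseline via Spearman’s ρ.

import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(0)
baseline = [ipa(e, 10, a, 5) for e in EMOTION_WEIGHTS for a in ATTENTION_WEIGHTS]

def perturbed_once():
    fe = {k: v * rng.uniform(0.8, 1.2) for k, v in EMOTION_WEIGHTS.items()}
    fa = {k: v * rng.uniform(0.8, 1.2) for k, v in ATTENTION_WEIGHTS.items()}
    return [fe[e] * 10 + fa[a] * 5 - 0.30 + 0.50
            for e in fe for a in fa]

rhos = [spearmanr(baseline, perturbed_once()).correlation for _ in range(10_000)]
print(np.median(rhos))   # Table A10 reports a median of 0.9912 for this check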

4.2. Study B: External Empirical Validation with DIPSEER

4.2.1. Cohort, Eligibility, and Anti-Leakage Controls

Of the 405 subjects processed with an available target, 179 met the eligibility criteria based on minimum variance of the external criterion. Of these, 172 subjects produced an assessable correlation and were included in the global aggregation. Table 7 presents these values.
The split-source design ensured that there was no overlap between the IPA input variables and the external criterion.

4.2.2. Convergent Validity of the IPA 2.0

We report two complementary estimators: (a) the arithmetic mean of subject-wise correlations (mean r_eval = 0.131) as a descriptive statistic and (b) the primary aggregated effect using Fisher-z (r_global = 0.166; 95% CI [0.017, 0.308]). Given in-the-wild classroom noise and inherent label–signal asynchrony, we interpret r_global as convergent validity evidence for an auditable intermediate signal rather than a sole decision variable.
The distribution of individual correlations has a mean of 0.131 and a standard deviation of 0.342, reflecting high inter-individual heterogeneity. Aggregation using Fisher-z transformation produces the overall effect shown in Table 8.
The confidence interval excludes zero, indicating a statistically significant positive association under ecological conditions.

4.2.3. Temporal Calibration and Parameter Stability

Temporal calibration most frequently selected 5 s windows and smoothing, consistent with the episodic nature of engagement. These values are shown in Table 9.
We add a descriptive characterisation using calibration parameters (lag/window/smoothing) and subject-wise variability patterns, consistent with temporal alignment and signal quality heterogeneity, without making causal claims (Appendix A, Table A11).
We include Sy as a functional baseline already computed in Phase 0C: sensor-only vs. sensor + IPA as an interpretable feature (n = 139 TF subjects). Statistics: baseline accuracy mean = 0.4981 (median = 0.5238); sensor + IPA mean = 0.4905 (median = 0.4706); Δaccuracy mean = −0.0076 (median = 0.0000); improved in 56/139. This is reported as a minimal comparative reference, not as a predictive-superiority claim (see Table A14).

5. Discussion

5.1. Summary of Main Findings

This paper introduces the Learning Improvement Index (IPA 2.0) as an integrated and interpretable metric that combines emotion and attention, and places it within the NAILF operational framework as an intermediate signal for traceable neuro-adaptive decisions. The evidence presented is articulated at two complementary levels: pre-empirical structural validation through simulation (Study A) and external empirical validation in real ecological conditions with the DIPSEER dataset (Study B). The results of Study A show that IPA 2.0 exhibits internal consistency, numerical stability, and orderly behaviour when traversing the entire space of emotion–attention states. In particular, the simulation confirms that the index integrates both dimensions without collapse: emotion establishes the overall direction of the state, while attention modulates its operational intensity. This analysis provides a necessary theoretical ceiling for interpreting subsequent empirical results and avoids inferences based on numerically unstable configurations. In Study B, the IPA 2.0 shows an overall positive association with an external criterion of engagement/attention under ecological conditions (r_global = 0.166, 95% CI [0.017, 0.308]), using a protocol that includes subject-wise temporal calibration, hold-out evaluation, and explicit anti-leakage auditing. Although the effect size is modest, the confidence interval excludes zero and a strong signal subset is observed (25 subjects with r_eval ≥ 0.50), supporting the convergent validity of the index in scenarios where measurement is more stable.

5.2. Interpretation of Effect Size in “In the Wild” Contexts

A central aspect of the interpretation of Study B is the observed effect size. In datasets collected in real classroom conditions, with discrete targets, interindividual heterogeneity, and sensor-label asynchrony, high correlations are not expected to be systematically obtained. Engagement is a multifactorial construct, influenced by variables not observed in this study (instructional design, content, social dynamics, and student expectations), which introduces structural noise into any external criterion. From this perspective, a positive and statistically significant aggregate effect should be interpreted as evidence of a signal, not as an upper limit of potential performance. The coexistence of a moderate rglobal with a strong signal subset suggests a pattern of moderation: when the alignment between emotion, attention, and external labelling is greater, the convergence of IPA 2.0 emerges more clearly. This pattern is consistent with recent literature on engagement and learning analytics in real digital contexts, where interindividual variability is the norm rather than the exception.

5.3. Methodological Robustness and Interpretation of Temporal Calibration

One of the methodological contributions of this work is the explicit nature of decisions designed to avoid effect inflation. The use of variance eligibility filters, temporal calibration per subject with strict temporal division, and split-source auditing reduce the risk of spurious correlations and circularity between index inputs and external criteria. The fact that the calibration frequently selects 5 s windows and temporal smoothing provides relevant information about the dynamics of classroom engagement: the attentional state assessed by experts manifests itself as an episodic, not instantaneous, process. This finding has direct implications for real neuro-adaptive deployments, where decisions that are overly reactive to temporal noise can be counterproductive. Likewise, setting the terms of discordance (δ) and neuro-alignment (FNP) as constants in the main analyses acts as a conservative mechanism: these terms cannot induce spurious associations with the external criterion, and their function is to stabilise the index scale under multimodal uncertainty. Their conditional activation is reserved for sensitivity analyses and future deployments with auditable rules.

5.4. Implications for the Design of Neuro-Adaptive Systems

Beyond the validity of the index itself, this work makes a conceptual contribution to the design of educational AI systems: the need to explicitly separate inference, integration, and pedagogical decision-making. In NAILF, IPA 2.0 does not act as a final classifier, but as an integrated signal that feeds a traceable loop (state → rule → action). This separation allows decisions to be audited, adaptations to be justified, and their impact to be evaluated without confusing predictive performance with educational validity. From this perspective, the interpretability of IPA 2.0 is not an accessory attribute, but a functional requirement. The ability to explain why a state is considered optimal, improvable or risky, in terms of emotion and attention, is essential both for pedagogical evaluation and for the ethical governance of systems that process potentially sensitive multimodal data.

5.5. Summary of Results

The results show that IPA 2.0:
1. Exhibits structural consistency and numerical stability (Study A).
2. Exhibits positive convergent validity under ecological conditions (Study B).
3. Shows interindividual heterogeneity with a strong signal subset.

5.6. Response to Research Questions

The results allow us to answer the questions posed in Section 3:
  • RQ1. The IPA 2.0 demonstrates that it is possible to integrate emotion and attention into a continuous and interpretable signal that is operational in time windows and suitable for traceable decisions.
  • RQ2. Structural simulation (Study A) confirms the internal consistency and numerical stability of the index when traversing the entire space of emotion–attention states.
  • RQ3. Empirical validation (Study B) provides evidence of positive convergent validity against an external criterion of engagement in ecological conditions, under a robust protocol.
  • RQ4. The results underscore the importance of temporal calibration, source separation, and hold-out evaluation as necessary conditions for credible validation “in the wild.”

5.7. Limitations and Future Directions

This work has intentional limitations that define a clear programme for future research. First, although evidence of the convergent validity of IPA 2.0 is provided, the causal impact of pedagogical interventions activated by the index is not evaluated. This impact should be analysed using controlled or quasi-experimental designs in prospective deployments. Second, the present study does not systematically compare multimodal recognition architectures or fusion strategies, as the focus is on validating the integrated index, not on the end-to-end performance of the pipeline. This comparison is a priority for future research, along with ablation analysis, robustness in the absence of modalities, and bias and fairness audits. We did not include emotion-only (IPA_E) or attention-only (IPA_A) ablations; such decompositions address component attribution rather than the validity claims pursued here.
Finally, generalisation across contexts (face-to-face classroom captured by DIPSEER and online environments) requires evaluating domain transfer, temporal stability and data governance in multi-centre and multi-device scenarios.

6. Conclusions

This paper presents the Learning Improvement Index (IPA 2.0) as an integrated and interpretable metric that combines emotion and attention, and places it within the NAILF framework as an intermediate signal for traceable neuro-adaptive decisions in educational contexts. In contrast to approaches focused exclusively on prediction, the article proposes and validates an alternative that is oriented towards interpretability, auditability, and applied validity. The contribution is twofold: (i) structural simulation supports the internal coherence and numerical stability of IPA 2.0 across the emotion–attention space; and (ii) DIPSEER in-the-wild validation supports convergent validity under subject-wise temporal calibration, hold-out evaluation, and explicit anti-leakage controls. IPA 2.0 is positioned as an interpretable, calibratable signal within NAILF for traceable neuro-adaptive decisions under real-world constraints.
The empirical results show an overall positive association between IPA 2.0 and an external criterion of engagement/attention, along with marked interindividual heterogeneity and a strong signal subset. This pattern is consistent with the multifactorial nature of engagement and the limitations inherent in data collected “in the wild”. It reinforces the interpretation of IPA 2.0 as an integrated, calibratable, and contextual signal, rather than an isolated predictor or a comprehensive measure of learning. Beyond the metric itself, this work highlights the importance of explicitly separating inference, integration, and pedagogical decision-making in educational AI systems. By formalising IPA 2.0 as an intermediate variable within a traceable loop (state → rule → action), this work provides a methodological basis for neuro-adaptive systems that can be evaluated, audited and justified from an educational and ethical perspective. Taken together, the findings support IPA 2.0 as a viable component for multimodal affective computing pipelines designed for educational adaptation, provided that it is interpreted under explicit measurement conditions and with awareness of interindividual heterogeneity. Future work includes completing the end-to-end evidence of multimodal recognition, prospectively evaluating the impact of index-activated interventions, and analysing their generalisation across diverse contexts and platforms.

Author Contributions

Investigation and conceptualisation, J.A.-R., R.R.-V., and M.C.; methodology, J.A.-R., with supervision from R.R.-V. and M.C.; formal analysis, J.A.-R.; investigation, J.A.-R.; data curation, J.A.-R.; visualisation, J.A.-R.; writing—original draft preparation, J.A.-R.; writing—review and editing, J.A.-R., R.R.-V., and M.C.; supervision, R.R.-V. and M.C.; and project administration, J.A.-R. All authors have read and agreed to the published version of the manuscript.

Funding

This work has been carried out under the framework of the grant PID2022-138453OB-I00 funded by MICIU/AEI/10.13039/501100011033 and by “ERDF A way of making Europe”.

Institutional Review Board Statement

The study was conducted in accordance with the Declaration of Helsinki, and approved by the Ethics Committee of the University of Alicante (protocol code UA-2023-07-31, approved on 31 July 2023).

Informed Consent Statement

Informed consent was obtained from all subjects involved in the study.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A. Appendix Tables

Table A1. Strong-signal subset (r_eval ≥ 0.50) in Study B (Phase 0C calibrated).
Subject ID | r_eval | Best Lag | Best Window (s) | Smoothing | n_total
group_03_exp_05_subject_05 | 0.969 | −1 | 5 | 3 | 93,924
group_02_experiment_07_subject_04 | 0.941 | 0 | 1 | 3 | 94,179
group_02_experiment_07_subject_21 | 0.917 | 0 | 1 | 3 | 94,778
group_02_experiment_01_subject_11 | 0.881 | 0 | 5 | 3 | 93,659
group_02_experiment_08_subject_08 | 0.860 | 0 | 5 | None | 93,469
group_03_exp_08_subject_08 | 0.804 | +1 | 5 | 3 | 93,634
group_02_experiment_03_subject_03 | 0.773 | 0 | 5 | 3 | 93,963
group_02_experiment_09_subject_09 | 0.771 | 0 | 5 | 3 | 93,866
group_01_experiment_07_subject_14 | 0.763 | −1 | 5 | 3 | 92,137
group_03_exp_06_subject_06 | 0.745 | −2 | 5 | 3 | 93,651
group_03_exp_07_subject_04 | 0.740 | +2 | 5 | 3 | 93,837
group_02_experiment_05_subject_18 | 0.720 | 0 | 1 | None | 74,696
group_02_experiment_07_subject_08 | 0.710 | 0 | 1 | None | 93,936
group_02_experiment_03_subject_12 | 0.686 | +1 | 5 | 3 | 94,060
group_01_experiment_07_subject_01 | 0.649 | 0 | 1 | 3 | 55,858
group_02_experiment_02_subject_13 | 0.608 | 0 | 5 | None | 93,888
group_1_experiment_2_subject_13 | 0.608 | 0 | 5 | None | 93,888
group_02_experiment_07_subject_01 | 0.599 | +1 | 5 | 3 | 2399
group_1_experiment_6_subject_13 | 0.590 | −2 | 5 | 3 | 93,444
group_02_experiment_07_subject_02 | 0.571 | 0 | 5 | None | 93,825
group_03_exp_02_subject_11 | 0.558 | −1 | 5 | 3 | 94,764
group_01_experiment_08_subject_20 | 0.532 | −2 | 5 | None | 2364
group_02_experiment_08_subject_01 | 0.530 | 0 | 1 | 3 | 2429
group_02_experiment_02_subject_17 | 0.526 | 0 | 1 | None | 93,819
group_1_experiment_2_subject_17 | 0.526 | 0 | 1 | None | 93,819
Table A2. Simulated IPA values for the emotion category “Hope” (w_e as defined in IPA 2.0).
Attention Type | w_a | Discordance (δ) | FNP | IPA
Focused | 2.0 | 0.30 | 0.50 | 20.20
Sustained | 1.6 | 0.30 | 0.50 | 18.20
Selective | 1.4 | 0.30 | 0.50 | 17.20
Alternating | 1.2 | 0.30 | 0.50 | 16.20
Divided | 0.8 | 0.30 | 0.50 | 14.20
Reactive | 0.4 | 0.30 | 0.50 | 12.20
Active | 2.0 | 0.30 | 0.50 | 20.20
Constructive | 1.8 | 0.30 | 0.50 | 19.20
Interactive | 1.2 | 0.30 | 0.50 | 16.20
External | −1.0 | 0.30 | 0.50 | 5.20
Internal | −0.5 | 0.30 | 0.50 | 7.70
Shared | 0.0 | 0.30 | 0.50 | 10.20
Table A3. Simulated IPA values for the emotion category “Pride” (w_e as defined in IPA 2.0).
Attention Type | w_a | Discordance (δ) | FNP | IPA
Focused | 2.0 | 0.30 | 0.50 | 17.20
Sustained | 1.6 | 0.30 | 0.50 | 15.20
Selective | 1.4 | 0.30 | 0.50 | 14.20
Alternating | 1.2 | 0.30 | 0.50 | 13.20
Divided | 0.8 | 0.30 | 0.50 | 11.20
Reactive | 0.4 | 0.30 | 0.50 | 9.20
Active | 2.0 | 0.30 | 0.50 | 17.20
Constructive | 1.8 | 0.30 | 0.50 | 16.20
Interactive | 1.2 | 0.30 | 0.50 | 13.20
External | −1.0 | 0.30 | 0.50 | 2.20
Internal | −0.5 | 0.30 | 0.50 | 4.70
Shared | 0.0 | 0.30 | 0.50 | 7.20
Table A4. Simulated IPA values for the emotion category “Relief” (w_e as defined in IPA 2.0).
Attention Type | w_a | Discordance (δ) | FNP | IPA
Focused | 2.0 | 0.30 | 0.50 | 17.20
Sustained | 1.6 | 0.30 | 0.50 | 15.20
Selective | 1.4 | 0.30 | 0.50 | 14.20
Alternating | 1.2 | 0.30 | 0.50 | 13.20
Divided | 0.8 | 0.30 | 0.50 | 11.20
Reactive | 0.4 | 0.30 | 0.50 | 9.20
Active | 2.0 | 0.30 | 0.50 | 17.20
Constructive | 1.8 | 0.30 | 0.50 | 16.20
Interactive | 1.2 | 0.30 | 0.50 | 13.20
External | −1.0 | 0.30 | 0.50 | 2.20
Internal | −0.5 | 0.30 | 0.50 | 4.70
Shared | 0.0 | 0.30 | 0.50 | 7.20
Table A5. Simulated IPA values for the emotion category “Anxiety” (w_e as defined in IPA 2.0).
Attention Type | w_a | Discordance (δ) | FNP | IPA
Focused | 2.0 | 0.30 | 0.50 | 5.20
Sustained | 1.6 | 0.30 | 0.50 | 3.20
Selective | 1.4 | 0.30 | 0.50 | 2.20
Alternating | 1.2 | 0.30 | 0.50 | 1.20
Divided | 0.8 | 0.30 | 0.50 | −0.80
Reactive | 0.4 | 0.30 | 0.50 | −2.80
Active | 2.0 | 0.30 | 0.50 | 5.20
Constructive | 1.8 | 0.30 | 0.50 | 4.20
Interactive | 1.2 | 0.30 | 0.50 | 1.20
External | −1.0 | 0.30 | 0.50 | −9.80
Internal | −0.5 | 0.30 | 0.50 | −7.30
Shared | 0.0 | 0.30 | 0.50 | −4.80
Table A6. Simulated IPA values for the emotion category “Anger” (w_e as defined in IPA 2.0).
Attention Type | w_a | Discordance (δ) | FNP | IPA
Focused | 2.0 | 0.30 | 0.50 | 5.20
Sustained | 1.6 | 0.30 | 0.50 | 3.20
Selective | 1.4 | 0.30 | 0.50 | 2.20
Alternating | 1.2 | 0.30 | 0.50 | 1.20
Divided | 0.8 | 0.30 | 0.50 | −0.80
Reactive | 0.4 | 0.30 | 0.50 | −2.80
Active | 2.0 | 0.30 | 0.50 | 5.20
Constructive | 1.8 | 0.30 | 0.50 | 4.20
Interactive | 1.2 | 0.30 | 0.50 | 1.20
External | −1.0 | 0.30 | 0.50 | −9.80
Internal | −0.5 | 0.30 | 0.50 | −7.30
Shared | 0.0 | 0.30 | 0.50 | −4.80
Table A7. Simulated IPA values for the emotion category “Shame” (w_e as defined in IPA 2.0).
Attention Type | w_a | Discordance (δ) | FNP | IPA
Focused | 2.0 | 0.30 | 0.50 | 2.20
Sustained | 1.6 | 0.30 | 0.50 | 0.20
Selective | 1.4 | 0.30 | 0.50 | −0.80
Alternating | 1.2 | 0.30 | 0.50 | −1.80
Divided | 0.8 | 0.30 | 0.50 | −3.80
Reactive | 0.4 | 0.30 | 0.50 | −5.80
Active | 2.0 | 0.30 | 0.50 | 2.20
Constructive | 1.8 | 0.30 | 0.50 | 1.20
Interactive | 1.2 | 0.30 | 0.50 | −1.80
External | −1.0 | 0.30 | 0.50 | −12.80
Internal | −0.5 | 0.30 | 0.50 | −10.30
Shared | 0.0 | 0.30 | 0.50 | −7.80
Table A8. Simulated IPA values for the emotion category “Sadness” (w_e as defined in IPA 2.0).
Attention Type | w_a | Discordance (δ) | FNP | IPA
Focused | 2.0 | 0.30 | 0.50 | 2.20
Sustained | 1.6 | 0.30 | 0.50 | 0.20
Selective | 1.4 | 0.30 | 0.50 | −0.80
Alternating | 1.2 | 0.30 | 0.50 | −1.80
Divided | 0.8 | 0.30 | 0.50 | −3.80
Reactive | 0.4 | 0.30 | 0.50 | −5.80
Active | 2.0 | 0.30 | 0.50 | 2.20
Constructive | 1.8 | 0.30 | 0.50 | 1.20
Interactive | 1.2 | 0.30 | 0.50 | −1.80
External | −1.0 | 0.30 | 0.50 | −12.80
Internal | −0.5 | 0.30 | 0.50 | −10.30
Shared | 0.0 | 0.30 | 0.50 | −7.80
Table A9. Simulated IPA values for the emotion category “Boredom” (w_e as defined in IPA 2.0).
Attention Type | w_a | Discordance (δ) | FNP | IPA
Focused | 2.0 | 0.30 | 0.50 | 0.20
Sustained | 1.6 | 0.30 | 0.50 | −1.80
Selective | 1.4 | 0.30 | 0.50 | −2.80
Alternating | 1.2 | 0.30 | 0.50 | −3.80
Divided | 0.8 | 0.30 | 0.50 | −5.80
Reactive | 0.4 | 0.30 | 0.50 | −7.80
Active | 2.0 | 0.30 | 0.50 | 0.20
Constructive | 1.8 | 0.30 | 0.50 | −0.80
Interactive | 1.2 | 0.30 | 0.50 | −3.80
External | −1.0 | 0.30 | 0.50 | −14.80
Internal | −0.5 | 0.30 | 0.50 | −12.30
Shared | 0.0 | 0.30 | 0.50 | −9.80

Appendix A.1. Study A Robustness and Sensitivity (Monte Carlo)

Fixed δ and FNP: In Studies A and B, we intentionally keep δ and FNP fixed to reduce degrees of freedom and preserve comparability. To address sensitivity, we report robustness checks (Table A10) with (i) a rule-based conditional activation of δ/FNP and (ii) an explicit δ = 0 counterfactual. Across the 108-state grid, rank ordering is preserved under conditional activation (Spearman ρ = 0.99994), and setting δ = 0 yields an identical ranking (ρ = 1.000).
Table A10. Study A robustness and sensitivity checks over the 108-state grid (Monte Carlo).
Check | What It Validates | Configuration | Metric | Result
Sx-1 Axioms | Structural coherence (not psychological construct validity) | 108-grid; weights from Tables 1–3; δ = 0.30; FNP = +0.50 | Constraint compliance | 100% (108/108)
Sx-2 Weight perturbation ±20% | Robustness to parameterisation | Monte Carlo, n = 10,000; factors U[0.8, 1.2] applied per weight | Spearman’s ρ (median; P5–P95; min) | 0.9912; 0.9863–0.9950; min = 0.9781
Sx-3 Noise (near-ceiling regime) | Sensitivity to measurement noise | Monte Carlo, n = 5000; Gaussian noise per state; clipping to valid ranges | Spearman’s ρ (median; P5–P95; min) | 0.9967; 0.9951–0.9978; min = 0.9928
Sx-3b Noise (mid-range regime) | Sensitivity under non-ceiling scenarios | Monte Carlo, n = 5000; I_e ≈ 6, A_a ≈ 3; higher variance; clipping | Spearman’s ρ (median; P5–P95; min) | 0.9411; 0.9247–0.9553; min = 0.9011
Sx-4 Conditional δ/FNP | Sensitivity of adjustment terms | Auditable sign-alignment rule (see script) | Spearman’s ρ vs. baseline | 0.99994
Sx-4b δ = 0 | Impact of removing the penalty term | δ = 0 constant; FNP = +0.50 constant | Spearman’s ρ vs. baseline | 1.0000 (identical ranking)

Appendix A.2. Strong-Signal Subset Descriptive Breakdown (Study B)

Strong-signal subset: We add a descriptive characterisation using calibration parameters (lag/window/smoothing) and subject-wise variability patterns, consistent with temporal alignment and signal quality heterogeneity, without making causal claims.
Table A11. Descriptive breakdown of the strong-signal subset by calibration parameters and variability (Study B).
Parameter | Value | n (Evaluable, n = 172) | n (Strong Subset, n = 25)
Window | 1 s | 46 | 8
Window | 5 s | 126 | 17
Smoothing | None | 57 | 9
Smoothing | 3 | 115 | 16
Lag | −2 | 36 | 3
Lag | −1 | 21 | 3
Lag | 0 | 45 | 15
Lag | +1 | 26 | 3
Lag | +2 | 44 | 1
std_target (T_1 variability) | median [P25, P75] | 0.470 [0.399, 0.542] | 0.468 [0.414, 0.575]
unique_levels | median [P25, P75] | 3 [3, 5] | 4 [3, 5]
n_samples_eval | median [P25, P75] | 43 [42, 198] | 43 [42, 206]

Appendix A.3. Declarative Mapping DIPSEER A→B IPA 2.0 (Reproducibility)

Mapping specification (Section 3.5.2): We add declarative mapping tables (Table A12 and Table A13) and operationalisation text clarifying (i) the external criterion T_1 (median of 4 labellers), (ii) the A_a input (self-report, 1–5), and (iii) the reproducible artifact {subject}_ipa.csv, where IPA is materialised prior to calibration.
Table A12. Declarative mapping from DIPSEER academic emotion labels to IPA categories and weights (w_e).
Emotion Label (DIPSEER/AEQ Schema) | IPA Category (5-Group) | Valence | w_e | Traceability Note
Joy | Joy/Hope | + | +1.0 | Operative mapping used to compute emotion_weight; see Table 1.
Hope | Joy/Hope | + | +1.0
Pride | Pride/Relief | + | +0.7
Relief | Pride/Relief | + | +0.7
Anxiety | Anxiety/Anger | − | −0.5
Anger | Anxiety/Anger | − | −0.5
Shame | Shame/Sadness | − | −0.8
Sadness | Shame/Sadness | − | −0.8
Boredom | Boredom | − | −1.0 | Anchors the negative-deactivating extreme.
Table A13. IPA 2.0 inputs, transformations, and traceability to DIPSEER/preprocessing artifacts.
Variable in DIPSEER/Artifact | IPA 2.0 Input | Transformation | Traceability Comment
Emotional intensity (self-report/annotation) | I_e (1–10) | Identity if already 1–10; otherwise linear rescaling | Original range explicitly reported.
Attention/engagement (self-report) | A_a (1–5) | Identity (1–5) | Subject-wise temporal calibration (lag/window/smoothing); strong-signal subset (Phase 0C).
External engagement criterion (4 labellers) | Target T_1 | Median of labellers | Split-source: T_1 is not used in IPA computation; only for validation.

Appendix A.4. Phase 0C TensorFlow Baseline (Sensor-Only vs. Sensor + IPA)

Comparisons to inference/fusion strategies: Since predictive superiority is not the objective, we include a functional baseline already computed in Phase 0C (Sy): sensor-only vs. the same model augmented with IPA as an interpretable feature. The results are heterogeneous ( n = 139 TF subjects; improved in 56/139; median Δaccuracy = 0.000).
Table A14. Phase 0C TensorFlow baseline summary (sensor-only vs. sensor + IPA).
Metric | Value
TF subjects with metrics (n) | 139
Accuracy, baseline (mean; median) | 0.4981; 0.5238
Accuracy, sensor + IPA (mean; median) | 0.4905; 0.4706
Δaccuracy (mean; median) | −0.0076; 0.0000
Δaccuracy (Q1; Q3) | −0.1820; 0.1693
Improves/stays the same/worsens | 56/17/66
Range of Δaccuracy (min; max) | −0.7188; 0.7188

References

  1. Goldberg, E. The Executive Brain: Frontal Lobes and the Civilized Mind; Oxford University Press: New York, NY, USA, 2001.
  2. Pekrun, R. The control-value theory of achievement emotions: Assumptions, corollaries, and implications for educational research and practice. Educ. Psychol. Rev. 2006, 18, 315–341.
  3. Fredricks, J.A.; Blumenfeld, P.C.; Paris, A.H. School engagement: Potential of the concept, state of the evidence. Rev. Educ. Res. 2004, 74, 59–109.
  4. D'Mello, S.K.; Kory, J. A review and meta-analysis of multimodal affect detection systems. ACM Comput. Surv. 2015, 47, 43.
  5. Lian, H.; Lu, C.; Li, S.; Zhao, Y.; Tang, C.; Zong, Y. A survey of deep learning-based multimodal emotion recognition: Speech, text, and face. Entropy 2023, 25, 1440.
  6. Ramaswamy, M.P.A.; Palaniswamy, S. Multimodal emotion recognition: A comprehensive review, trends, and challenges. Wiley Interdiscip. Rev. Data Min. Knowl. Discov. 2024, 14, e1563.
  7. Kabudi, T.; Pappas, I.; Olsen, D.H. AI-enabled adaptive learning systems: A systematic mapping of the literature. Comput. Educ. Artif. Intell. 2021, 2, 100017.
  8. Alam, A. Harnessing the power of AI to create intelligent tutoring systems for enhanced classroom experience and improved learning outcomes. In Intelligent Communication Technologies and Virtual Mobile Networks; Springer Nature: Singapore, 2023.
  9. Marquez-Carpintero, L.; Suescun-Ferrandiz, S.; Álvarez, C.L.; Fernandez-Herrero, J.; Viejo, D.; Roig-Vila, R.; Cazorla, M. DIPSER: A Dataset for In-Person Student Engagement Recognition in the Wild. arXiv 2025, arXiv:2502.20209.
  10. Pekrun, R.; Goetz, T.; Frenzel, A.C.; Barchfeld, P.; Perry, R.P. Measuring emotions in students' learning and performance: The Achievement Emotions Questionnaire (AEQ). Contemp. Educ. Psychol. 2011, 36, 36–48.
  11. Bieleke, M.; Gogol, K.; Goetz, T.; Daniels, L.; Pekrun, R. The AEQ-S: A short version of the Achievement Emotions Questionnaire. Contemp. Educ. Psychol. 2021, 65, 101940.
  12. Posner, M.I.; Petersen, S.E. The attention system of the human brain. Annu. Rev. Neurosci. 1990, 13, 25–42.
  13. Zhai, X.; Chu, X.; Chai, C.S.; Jong, M.S.Y.; Istenic, A.; Spector, M.; Liu, J.B.; Yuan, J.; Li, Y. A review of artificial intelligence (AI) in education from 2010 to 2020. Complexity 2021, 2021, 8812542.
  14. Camacho, V.L.; de la Guía, E.; Olivares, T.; Flores, M.J.; Orozco-Barbosa, L. Data capture and multimodal learning analytics focused on engagement with a new wearable IoT approach. IEEE Trans. Learn. Technol. 2020, 13, 704–717.
  15. da Silva Soares, R., Jr.; Oku, A.Y.A.; Barreto, C.d.S.F.; Sato, J.R. Exploring the potential of eye tracking on personalized learning and real-time feedback in modern education. Prog. Brain Res. 2023, 282, 49–70.
  16. Smallwood, J.; Schooler, J.W. The restless mind. Psychol. Bull. 2006, 132, 946–958.
  17. Ding, Y.; Li, Z.; Zou, Y.; Dong, X. A DeepSeek cross-modal platform for personalized art education in Autism Spectrum Disorder. Sci. Rep. 2025, 15, 44800.
  18. Cárdenas-López, H.M.; Zatarain-Cabada, R.; Barrón-Estrada, M.L.; Mitre-Hernández, H. Semantic fusion of facial expressions and textual opinions from different datasets for learning-centered emotion recognition. Soft Comput. 2023, 27, 17357–17367.
  19. Goel, A.; Karim, R.; Singh, U.; Kumar, R. A review on emotion identification using YOLO and DeepSORT. In Proceedings of the 2024 IEEE International Conference on Computing, Power and Communication Technologies (IC2PCT); IEEE: New York, NY, USA, 2024; Volume 5, pp. 430–434.
  20. Tyng, C.M.; Amin, H.U.; Saad, M.N.; Malik, A.S. The influences of emotion on learning and memory. Front. Psychol. 2017, 8, 235933.
  21. Zaibout, N.; Madrane, M.; Khamlichi, L. Towards advanced digital assessments: Artificial intelligence, gamification, and learning analytics. Int. J. Tech. Phys. Probl. Eng. (IJTPE) 2024, 16, 93–105.
  22. Alonso-Secades, V.; López-Rivero, A.J.; Martín-Merino-Acera, M.; Ruiz-García, M.J.; Arranz-García, O. Designing an intelligent virtual educational system to improve the efficiency of primary education in developing countries. Electronics 2022, 11, 1487.
  23. Almusaed, A.; Almssad, A.; Yitmen, I.; Homod, R.Z. Enhancing student engagement: Harnessing "AIED"'s power in hybrid education—A review analysis. Educ. Sci. 2023, 13, 632.
  24. Sadegh-Zadeh, S.A.; Movahhedi, T.; Hajiyavand, A.M.; Dearn, K.D. Exploring undergraduates' perceptions of and engagement in an AI-enhanced online course. Front. Educ. 2023, 8, 1252543.
  25. dos Santos, S.C.; Junior, G.A. Opportunities and challenges of AI to support student assessment in computing education: A systematic literature review. CSEDU 2024, 2, 15–26.
  26. Amashi, R.; Koppikar, U.; Vijayalakshmi, M. Investigating the association between student engagement with video content and their learnings. IEEE Trans. Educ. 2023, 66, 479–486.
  27. Rose, D.H.; Meyer, A. Teaching Every Student in the Digital Age: Universal Design for Learning; ERIC: Alexandria, VA, USA, 2002.
  28. Immordino-Yang, M.H.; Damasio, A. We feel, therefore we learn: The relevance of affective and social neuroscience to education. Mind Brain Educ. 2007, 1, 3–10.
  29. Dewan, M.; Murshed, M.; Lin, F. Engagement detection in online learning: A review. Smart Learn. Environ. 2019, 6, 1.
Figure 1. PAIDEIA ecosystem overview: data acquisition, multimodal inference, IPA 2.0 integration, and adaptive decision layer.
Figure 2. Learning Flow Processor (LFP): orchestration pipeline from state estimation to traceable pedagogical actions.
Figure 3. NAILF decision loop: state → rule → action with logging and auditability (colors are used only to visually distinguish elements; no quantitative meaning is implied).
Figure 4. NAILF multi-agent layer: physiological/emotional/educational/inclusive/evaluation agents and validator (XAI governance) (colors are used only to visually distinguish elements; no quantitative meaning is implied).
Figure 5. Heatmap visualisation of the simulated IPA 2.0 index across 108 combined emotional and attentional states based on the formal equation.
Table 1. Emotion category weights (w_e) used in IPA 2.0.
Emotional Category | Emotion Weight (w_e)
Joy/Hope | +1.0
Pride/Relief | +0.7
Anxiety/Anger | −0.5
Shame/Sadness | −0.8
Boredom | −1.0
Table 2. Attention type weights (w_a) used in IPA 2.0. Weights are ordered to reflect their relative contribution to task-focused cognitive engagement, based on classical models of attention and previous literature on educational neuroscience.
Attention Type | Weight (w_a)
Focused/Active | +2.0
Constructive | +1.8
Executive/Sustained | +1.6
Selective | +1.4
Alternating/Interactive | +1.2
Fragmented/Divided | +0.8
Reactive | +0.4
Shared | 0.0
Internal | −0.5
External | −1.0
Table 3. Attention activation mapping (1–5) used for numerical stability.
Type of Attention | Attention Activation (1–5)
Focused attention | 5.00
Sustained attention | 4.47
Selective attention | 4.20
Alternating attention | 3.93
Divided attention | 3.40
Reactive attention | 2.87
Active attention | 5.00
Constructive attention | 4.73
Interactive attention | 3.93
Shared attention | 2.33
Internal attention | 1.67
External attention | 1.00
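The activation values in Table 3 are consistent with a linear rescaling of the Table 2 weights from [−1.0, +2.0] onto [1, 5], i.e., activation = 1 + 4(w_a + 1)/3. This closed form is inferred here from the tabulated values, not quoted from the text; the quick check below reproduces every row to two decimals:

```python
# Check that activation = 1 + 4 * (w_a + 1) / 3 reproduces Table 3 to two
# decimals. The mapping itself is inferred, not stated in the paper.
table2_weights = {
    "Focused": 2.0, "Active": 2.0, "Constructive": 1.8, "Sustained": 1.6,
    "Selective": 1.4, "Alternating": 1.2, "Interactive": 1.2,
    "Divided": 0.8, "Reactive": 0.4, "Shared": 0.0,
    "Internal": -0.5, "External": -1.0,
}
for name, w_a in table2_weights.items():
    activation = 1.0 + 4.0 * (w_a + 1.0) / 3.0   # affine map [-1, 2] -> [1, 5]
    print(f"{name:<12} w_a = {w_a:+.1f} -> activation = {activation:.2f}")
```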
Table 4. Simulated IPA 2.0 values for the emotion "Joy" (w_e = 1.0).
Attention Type | w_a | Discordance (δ) | FNP | IPA
Focused/Active | 2.0 | 0.30 | 0.50 | 20.20
Sustained | 1.6 | 0.30 | 0.50 | 18.20
Selective | 1.4 | 0.30 | 0.50 | 17.20
Alternating | 1.2 | 0.30 | 0.50 | 16.20
Divided | 0.8 | 0.30 | 0.50 | 14.20
Reactive | 0.4 | 0.30 | 0.50 | 12.20
Constructive | 1.8 | 0.30 | 0.50 | 19.20
Interactive | 1.2 | 0.30 | 0.50 | 16.20
Shared | 0.0 | 0.30 | 0.50 | 10.20
Internal | −0.5 | 0.30 | 0.50 | 7.70
External | −1.0 | 0.30 | 0.50 | 5.20
Table 5. Descriptive statistics of the IPA 2.0 in the simulation (Study A).
Emotional Category (w_e) | Variable | Mean | SD
Joy/Hope (+1.0) | IPA | 14.74 | 4.95
Pride/Relief (+0.7) | IPA | 11.74 | 4.95
Anxiety/Anger (−0.5) | IPA | −0.26 | 4.95
Shame/Sadness (−0.8) | IPA | −3.26 | 4.95
Boredom (−1.0) | IPA | −5.26 | 4.95
Table 6. Pearson correlation matrix (Study A).
Variable | Emotional Weight | Attention Value | IPA
Emotional weight | 1.000 | 0.000 | 0.857
Attention value | 0.000 | 1.000 | 0.515 **
IPA | 0.857 | 0.515 | 1.000
** p < 0.01 (two-tailed).
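At the fixed settings of Table 4 (δ = 0.30, FNP = 0.50), the tabulated values appear consistent with the affine surrogate IPA ≈ 10·w_e + 5·w_a + 0.20. The sketch below rebuilds the 9 × 12 = 108-state grid under that surrogate (a reconstruction for illustration, not the full IPA 2.0 equation with its discordance and neuro-alignment terms) and recovers the Table 5 descriptives and, approximately, the Table 6 correlations:

```python
# Rebuild the 9 x 12 = 108-state grid under the affine surrogate
# IPA = 10*w_e + 5*w_a + 0.20 (inferred from Table 4 at delta = 0.30,
# FNP = 0.50; the real index also carries discordance/alignment terms).
import numpy as np

emotion_weights = [1.0, 1.0, 0.7, 0.7, -0.5, -0.5, -0.8, -0.8, -1.0]  # 9 emotions
attention_weights = [2.0, 2.0, 1.8, 1.6, 1.4, 1.2, 1.2,
                     0.8, 0.4, 0.0, -0.5, -1.0]                        # 12 types

E = np.repeat(emotion_weights, len(attention_weights))  # emotion weight per state
A = np.tile(attention_weights, len(emotion_weights))    # attention weight per state
ipa = 10 * E + 5 * A + 0.20

joy = ipa[:len(attention_weights)]                      # the Joy slice (w_e = 1.0)
print("Joy mean, SD:", joy.mean(), joy.std(ddof=1))     # ~14.74, ~4.95 (Table 5)
print("r(w_e, IPA):", np.corrcoef(E, ipa)[0, 1])        # ~0.86 (Table 6: 0.857)
print("r(w_a, IPA):", np.corrcoef(A, ipa)[0, 1])        # ~0.51 (Table 6: 0.515)
```

The small residual gap between the surrogate's correlations and the published 0.857/0.515 is expected, since the full index is not exactly affine.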
Table 7. Cohort and eligibility summary (Study B).
Metric | Value
N_total processed | 405
N_valid (target available) | 405
N_eligible (variance filter) | 179
N_effective (Fisher aggregation) | 172
Strong-signal subset (r_eval ≥ 0.50) | 25
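The drop from N_valid = 405 to N_eligible = 179 reflects the variance filter: Pearson correlation is undefined when the external criterion is (near-)constant. A minimal sketch of such a gate (the threshold and names are illustrative assumptions):

```python
# A minimal variance gate of the kind implied by Table 7: subjects whose
# external criterion barely varies cannot yield a defined Pearson r.
import numpy as np

def is_eligible(target: np.ndarray, min_std: float = 1e-6) -> bool:
    return np.nanstd(target) > min_std   # exclude (near-)constant label series
```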
Table 8. Summary of overall correlation (Study B, evaluation).
Statistic | Value
Mean r_eval | 0.131
Median r_eval | 0.101
SD r_eval | 0.342
Min r_eval | −0.605
Max r_eval | 0.969
r_global (Fisher-z) | 0.166
95% CI | [0.017, 0.308]
n (effective) | 172
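The pooled r_global and its interval can be obtained by Fisher-z aggregation of the per-subject correlations. A minimal sketch assuming an unweighted mean in z-space with a normal-approximation CI; the paper's exact weighting and CI construction may differ:

```python
# Fisher-z aggregation of per-subject correlations (Table 8). Sketch only.
import numpy as np

def fisher_aggregate(r_values, z_crit: float = 1.96):
    z = np.arctanh(np.asarray(r_values, dtype=float))  # r -> z
    z_mean = z.mean()
    se = z.std(ddof=1) / np.sqrt(len(z))               # SE of the mean in z-space
    ci = np.tanh([z_mean - z_crit * se, z_mean + z_crit * se])
    return np.tanh(z_mean), tuple(ci)                  # back-transform to r

# Usage: r_global, ci95 = fisher_aggregate(per_subject_r)   # n = 172 in Study B
```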
Table 9. Distribution of selected calibration parameters (Study B).
Parameter | Value | n
Window | 1 s | 46
Window | 5 s | 126
Smoothing | None | 57
Smoothing | 3 | 115
Lag | −2 | 36
Lag | −1 | 21
Lag | 0 | 45
Lag | +1 | 26
Lag | +2 | 44
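Table 9 summarises, per subject, which lag, window, and smoothing setting was selected. A minimal sketch of the kind of subject-wise grid search this implies (the split policy, sample-based windows, and rolling-mean smoothing are assumptions; the paper reports windows in seconds):

```python
# Sketch of the subject-wise calibration implied by Table 9: grid-search
# lag/window/smoothing on a calibration split, then score the chosen
# setting on held-out data.
import numpy as np
import pandas as pd

def calibrate_subject(ipa: pd.Series, target: pd.Series,
                      lags=(-2, -1, 0, 1, 2), windows=(1, 5),
                      smooths=(None, 3)):
    best_r, best_params = -np.inf, None
    for lag in lags:
        for win in windows:
            for sm in smooths:
                x = ipa.rolling(win, min_periods=1).mean()   # windowed average
                if sm:
                    x = x.rolling(sm, min_periods=1).mean()  # optional smoothing
                r = x.shift(lag).corr(target)                # Pearson on overlap
                if np.isfinite(r) and r > best_r:
                    best_r, best_params = r, (lag, win, sm)
    return best_r, best_params   # apply best_params to the evaluation split
```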