Adaptive Neuro-Affective Engagement via Bayesian Feedback Learning in Serious Games for Neurodivergent Children

Faria, Diego Resende; da Silva Ayrosa, Pedro Paulo

doi:10.3390/app15137532

Open AccessArticle

Adaptive Neuro-Affective Engagement via Bayesian Feedback Learning in Serious Games for Neurodivergent Children

by

Diego Resende Faria

^1,*

and

Pedro Paulo da Silva Ayrosa

²

¹

School of Science, Loughborough University, Loughborough LE11 3TU, UK

²

Educational Technology Laboratory (LABTED) & Computer Science Deapartment, State University of Londrina, Londrina 86057-970, Brazil

^*

Author to whom correspondence should be addressed.

Appl. Sci. 2025, 15(13), 7532; https://doi.org/10.3390/app15137532

Submission received: 7 May 2025 / Revised: 12 June 2025 / Accepted: 2 July 2025 / Published: 4 July 2025

Download

Browse Figures

Versions Notes

Abstract

Neuro-Affective Intelligence (NAI) integrates neuroscience, psychology, and artificial intelligence to support neurodivergent children through personalized Child–Machine Interaction (CMI). This paper presents an adaptive neuro-affective system designed to enhance engagement in children with neurodevelopmental disorders through serious games. The proposed framework incorporates real-time biophysical signals—including EEG-based concentration, facial expressions, and in-game performance—to compute a personalized engagement score. We introduce a novel mechanism, Bayesian Immediate Feedback Learning (BIFL), which dynamically selects visual, auditory, or textual stimuli based on real-time neuro-affective feedback. A multimodal CNN-based classifier detects mental states, while a probabilistic ensemble merges affective state classifications derived from facial expressions. A multimodal weighted engagement function continuously updates stimulus–response expectations. The system adapts in real time by selecting the most appropriate cue to support the child’s cognitive and emotional state. Experimental validation with 40 children (ages 6–10) diagnosed with Autism Spectrum Disorder (ASD) and Attention Deficit Hyperactivity Disorder (ADHD) demonstrates the system’s effectiveness in sustaining attention, improving emotional regulation, and increasing overall game engagement. The proposed framework—combining neuro-affective state recognition, multimodal engagement scoring, and BIFL—significantly improved cognitive and emotional outcomes: concentration increased by 22.4%, emotional engagement by 24.8%, and game performance by 32.1%. Statistical analysis confirmed the significance of these improvements (

p < 0.001

, Cohen’s

d > 1.4

). These findings demonstrate the feasibility and impact of probabilistic, multimodal, and neuro-adaptive AI systems in therapeutic and educational applications.

Keywords:

educational AI; serious games; EEG; emotion recognition; Bayesian feedback learning

1. Introduction

Neurodiversity presents significant challenges in both healthcare and education. Globally, approximately 1 in 100 children (ranging from 1 in 36 in countries such as the USA, UK, and Brazil) is diagnosed with ASD. Additionally, ADHD affects around 5–7% of children and 2.5% of adults worldwide, making it one of the most prevalent neurodevelopmental conditions. Beyond these, 1 in 8 individuals (around 970 million people) globally live with a mental disorder, with anxiety and depressive disorders being the most common, according to the World Health Organization (WHO) [1].

Despite the increasing demand for scalable and adaptive interventions to support cognitive and emotional well-being, traditional Human–Computer Interaction (HCI) systems predominantly rely on unimodal approaches that lack real-time adaptation. NAI offers an innovative solution by enabling adaptive systems that respond dynamically to cognitive and emotional states. Unlike conventional affective computing, NAI integrates affective and cognitive processes, allowing AI to modulate interactions based on neuro-affective regulation, which is critical for decision-making, attention modulation, and emotional self-regulation. To address these limitations, we propose an NAI system for adaptive Child–Game Interaction (CGI), as illustrated in Figure 1, integrating real-time EEG signals, facial emotion analysis, and performance metrics to monitor and respond to engagement levels. A central innovation of this work is the Bayesian Immediate Feedback Learning (BIFL) framework, which optimizes stimulus delivery by continuously updating its belief model using Bayesian inference and multi-armed bandit theory. The system dynamically selects and delivers the most effective modality (visual, auditory, or textual) based on individual neuro-affective feedback during gameplay. Unlike prior systems, our approach enables personalized real-time adaptation without requiring long training or predefined rules. The engagement score is calculated through a dynamic weighting model, assigning importance to each modality based on reliability, measured by an uncertainty measure. This ensures that the multimodal engagement score merges the multisensory information (e.g., concentration, emotion, and game score) based on the most stable and informative input source over time.

The key contributions of this work are as follows:

A novel Bayesian Immediate Feedback Learning (BIFL) framework that adaptively selects the most appropriate stimulus modality (visual, auditory, or text) throughout gameplay, based on real-time multimodal feedback from EEG concentration, facial expressions, and game performance. The selected stimulus is that which has improved engagement and supported emotional regulation in neurodivergent children by analyzing their biophysical responses.
A robust multimodal engagement computation model with dynamic weighting based on signal stability.
Experimental validation with 40 neurodivergent children over four weeks, showing significant improvements in concentration, emotional engagement, and task performance.
Statistical evaluation demonstrating high effect sizes and significant improvements, validating the system’s effectiveness.

This study is part of the NeuroEngage Project (2020–2024), supported by UKRI, CONFAP, and IEEE RAS-SIGHT, and it aims to advance scalable AI solutions for therapeutic cognitive training in neurodiverse populations.

The remainder of this paper is structured as follows: Section 2 reviews related work, Section 3 details the methods, Section 4 describes the experimental design, Section 5 presents the results, and Section 6 concludes with future directions.

2. Related Works

Serious games have emerged as a promising tool across a range of domains, including therapy, education, and cognitive training. They have shown particular promise in supporting individuals with neurodevelopmental and mental health disorders [2]. Among the most studied populations are children with ASD and ADHD, for whom interactive, game-based methods can support cognitive, emotional, and behavioral interventions. For individuals with ASD, serious games have shown effectiveness in teaching social interactions and emotional understanding [3]. These games facilitate skill acquisition in diverse contexts but are frequently tailored to high-functioning individuals, limiting their accessibility [4]. Moreover, their clinical validation often falls short of rigorous medical standards [3], and game design practices are inconsistently reported, hindering replication and broader applicability [5]. Despite these limitations, the incorporation of structured game design elements and emotional training mechanics continues to expand [6]. In the ADHD domain, computer-assisted learning and gamified interventions have been used to enhance executive function [7]. Brain–computer interfaces (BCIs), combining EEG-based attention monitoring and feedback mechanisms, have been used both for treatment and as interaction tools in serious games [8,9]. Neurofeedback integrated into game-based training has shown efficacy across most behavioral symptoms, although attention deficits persist as a challenge [10]. VR-based emotion-recognition systems have improved emotion regulation in autistic children, yet scalability for differing severity levels remains an issue [11].

Recent advances in HCI and AI-driven game design further demonstrate the potential of neuro-adaptive interventions. These include real-time EEG-based concentration monitoring for personalized game adaptation [12], as well as hybrid approaches that fuse eye-tracking and EEG to improve attention classification in BCIs [13]. Serious games have also been explored in low-resource settings through tangible interfaces [14], and machine learning applied to EEG data has enabled high-accuracy attention lapse prediction, albeit with limited interpretability [15]. Motion-based gaming interventions have proven to be effective in training impulse control in children with ADHD, although they may exclude those with motor impairments [16]. A recent systematic review identified common reward-based patterns in over 40 serious games but also emphasized the need for better personalization and longitudinal validation [17].

Beyond the State-of-the-Art

While these studies show significant progress in leveraging AI, neurofeedback, and multimodal HCI for therapeutic game-based interventions, they are often limited in terms of clinical scalability, individual adaptability, and interpretability of their models. In particular, the lack of explainability in AI decisions, the one-size-fits-all design approach, and the poor integration of multimodal data streams hinder real-world deployment. This work addresses some of these gaps by integrating facial expression analysis and EEG-based cognition assessment to adaptively select personalized stimuli for neurodivergent children. Our system contributes to the state-of-the-art by advancing real-time, AI-driven, context-aware interaction that is sensitive to each child’s level of engagement.

3. Methods

3.1. EEG-Based Concentration Level Detection

For preprocessing, EEG data collected via Muse headbands (4 channels: AF7, AF8, TP9, TP10, sampled at 256Hz) underwent Discrete Wavelet Transform (DWT) using the Daubechies-4 (db4) wavelet basis. A 5-level decomposition was applied to isolate five canonical EEG bands:

δ

,

θ

,

α

,

β

, and

γ

. Each band was processed separately, resulting in five distinct matrices. The frequency bands corresponded to cognitive states, where

α

related to relaxation,

β

to active thinking,

θ

to drowsiness,

γ

to cognitive load, and

δ

to deep sleep. Feature extraction followed a sliding-window approach (1 s windows with 0.5 s overlap). Within each window, we extracted the following features:

Mean, skewness, and kurtosis;
Minimum and maximum values;
Variance and covariance matrix;
Eigenvalues and matrix logarithms of the covariance matrix;
FFT magnitude and top 10 most energetic frequencies.

From each window, a total of 989 features were computed. To reduce dimensionality and improve generalization, we applied Information Gain for feature selection, retaining the top 256 features per wave type. Each resulting feature matrix was reshaped into a 16 × 16 grayscale image. For classification, a multi-CNN architecture was utilized, employing a 5→1 CNN fusion model (CNN(5→1)) [18], where five parallel CNNs were trained (one per wave type) and their features were concatenated in the flattened layer. The CNN architecture (per branch) consisted of the following:

Conv2D (32 filters, 3 × 3, ReLU);
Conv2D (64 filters, 3 × 3, ReLU);
MaxPooling2D (2 × 2);
Dropout (0.25);
Flatten;
Dense (512 units, ReLU);
Dropout (0.5);
Output: Dense (3 classes, Softmax).

The concatenated output (shape: 11520) was passed through a final dense layer (512 ReLU) and a softmax layer for classification into three mental states: relaxed, neutral, and concentrated. The final model (CNN(5→1)) was trained for 400 epochs. Algorithm 1 presents the steps for EEG classification.

Due to the limited availability of large-scale child EEG datasets, the initial EEG model was pre-trained using adult data and calibrated to each child through a 20 s personalized adaptation phase. This brief calibration allowed real-time mapping of dominant EEG patterns without needing long training sessions, which are difficult to conduct with children.

Algorithm 1 EEG Feature Extraction and Classification

1:: Input: Raw EEG signals S from channels ${T P 9, T P 10, A F 7, A F 8}$
2:: Output: Concentration level $C = {concentrated, neutral, relaxed}$
3:: Step 1: Apply Discrete Wavelet Transform (DWT)
4:: Decompose S into five frequency bands: $α, β, θ, γ, δ$
5:: Using DWT Type II:
6:: $X_{j, k} = \sum_{n} x [n] g_{j, k} [n]$
7:: $Y_{j, k} = \sum_{n} x [n] h_{j, k} [n]$
8:: where $X_{j, k}$ and $Y_{j, k}$ are the approximation and detail coefficients at scale j, and $g_{j, k} [n]$ and $h_{j, k} [n]$ are the wavelet and scaling functions, respectively.
9:: Step 2: Compute Statistical Features
10:: Mean: ${\bar{y}}_{k} = \frac{1}{N} \sum_{i = 1}^{N} y_{k i}$
11:: Skewness: $g_{1, k} = \frac{\sum_{i = 1}^{N} {(y_{k i} - {\bar{y}}_{k})}^{3}}{N s_{k}^{3}}$
12:: Kurtosis: $g_{2, k} = \frac{\sum_{i = 1}^{N} {(y_{k i} - {\bar{y}}_{k})}^{4}}{N s_{k}^{4}} - 3$
13:: Step 3: Compute Covariance-Based Features
14:: Covariance matrix: $s_{k ℓ} = \frac{1}{N - 1} \sum_{i = 1}^{N} (y_{k i} - {\bar{y}}_{k}) (y_{ℓ i} - {\bar{y}}_{ℓ})$
15:: Eigenvalues: $\det (S - λ I_{K}) = 0$
16:: Logarithm of the covariance matrix:
17:: $e^{B} = I_{K} + \sum_{n = 1}^{\infty} \frac{S^{n}}{n!}$
18:: where $B = logm (S)$
19:: Step 4: Compute Spectral Features
20:: Fast Fourier Transform (FFT):
21:: $FFT (w_{t}) = \sum_{n = 0}^{N - 1} y_{n} e^{- j 2 π k n / N}$
22:: Extract the top 10 most energetic frequency components:
23:: $getFFT (w_{t}, 10) = arg {max}_{f} | FFT (w_{t}) |$
24:: Step 5: Construct Final Feature Vector: $F = {\bar{y}, g_{1}, g_{2}, s_{k ℓ}, λ, logm (S), FFT (w_{t})}$
25:: Step 6: EEG Classification using CNN each EEG frequency band $b \in {α, β, θ, γ, δ}$
26:: Apply CNN with layers:
27:: Convolution: $Z^{(l)} = ReLU (W^{(l)} X + b^{(l)})$
28:: Pooling: $P^{(l)} = max (Z^{(l)})$
29:: Flattening: $F^{*} = Flatten (P^{(l)})$
30:: Fully Connected Dense Layer: $D^{(l)} = ReLU (W_{d} F^{*} + b_{d})$
31:: Dropout: $D^{(l)} = Dropout (0.5)$
32:: Concatenate CNN outputs into a feature vector $F_{CNN}$
33:: Apply final SoftMax classifier:
34:: $C = arg max (SoftMax (Z))$
35:: Return C

EEG Data Augmentation Using WGAN-GP

Acquiring large-scale EEG datasets is inherently challenging due to privacy concerns, ethical considerations, and the complexity of long experimental setups. These limitations often result in small and unbalanced datasets, making it difficult to train deep models effectively. To address this challenge, we employed a Wasserstein Generative Adversarial Network with Gradient Penalty (WGAN-GP) [19] to generate realistic synthetic EEG data, mitigating data scarcity while enhancing classification. The WGAN-GP framework comprises a generator

G (z)

that learns the underlying distribution of real EEG samples and a discriminator

D (x)

that distinguishes between real and synthetic signals. By optimizing the Wasserstein distance with gradient penalty, a WGAN-GP ensures improved convergence stability and better quality of synthetic EEG signals. Algorithm 2 presents all the steps of a WGAN-GP.

Algorithm 2 WGAN-GP Training Algorithm

1:: Input: Training data distribution $P_{d a t a}$ , noise distribution $P_{z}$
2:: Output: Trained generator $G_{θ}$
3:: Step 1: Define Generator and Discriminator Models
4:: Generator:
5:: $G_{θ} (z) = ReLU (W_{4} \cdot ReLU (W_{3} \cdot ReLU (W_{2} \cdot ReLU (W_{1} \cdot z)))))$
6:: Discriminator:
7:: $D_{ω} (x) = LeakyReLU (W_{4} \cdot Dropout (LeakyReLU (W_{3} \cdot Dropout (LeakyReLU (W_{2} \cdot Dropout (LeakyReLU (W_{1} \cdot x)))))))$
8:: Step 2: Initialize Models and Optimizers
9:: Initialize generator $G_{θ}$ and discriminator $D_{ω}$
10:: Initialize Adam optimizers with learning rate $l r$ and $β$
11:: Step 3: Training Loop epoch = $1, \dots, n u m_e p o c h s$
12:: Randomly shuffle training data indices $i = 1, \dots, m$
13:: Sample a batch of m real data: $x^{(i)} \sim P_{d a t a}$
14:: Sample a batch of m noise vectors: $z^{(i)} \sim P_{z}$
15:: Generate fake data: $\tilde{x} = G_{θ} (z)$
16:: Interpolate data: $\hat{x} = ϵ x + (1 - ϵ) \tilde{x}$ , where $ϵ \sim U (0, 1)$
17:: Compute gradient penalty:
18:: $\nabla_{\hat{x}} D_{ω} (\hat{x})$ ,
19:: $L_{G P} = λ {(∥ \nabla_{\hat{x}} D_{ω} (\hat{x}) ∥_{2} - 1)}^{2}$
20:: Compute discriminator loss:
21:: $L_{D} = D_{ω} (\tilde{x}) - D_{ω} (x) + L_{G P}$
22:: Update discriminator:
23:: $ω \leftarrow ω - η \nabla_{ω} L_{D}$
24:: Every k iterations, update the generator: $i mod (5 \times m) = = 0$
25:: Generate fake data: $\tilde{x} = G_{θ} (z)$
26:: Compute generator loss: $L_{G} = - D_{ω} (\tilde{x})$
27:: Update generator:
28:: $θ \leftarrow θ - η \nabla_{θ} L_{G}$
29:: Return: Trained generator $G_{θ}$

3.2. Facial Expression Recognition

Feature extraction for facial expression recognition incorporates geometric, temporal, and deep learning-based features to enhance representation. Geometric features capture facial structure through landmark-based distances and angles, while temporal features track variations in expressions over time. Additionally, deep learning features are extracted using pre-trained convolutional neural networks (VGG16 model) to encode high-level spatial patterns. For classification, Support Vector Machines (SVMs) and Logistic Regression are applied independently to generate initial predictions. These outputs are then fused using a Dynamic Bayesian Mixture Model (DBMM), which dynamically adjusts classifier contributions based on confidence scores and temporal dependencies, ultimately improving classification accuracy and robustness.

The facial expression model was trained on the KDEF dataset [20], which consists of adult facial images. While children may exhibit expressions that differ in intensity or geometry, we addressed this limitation by extracting geometric features (e.g., distances between eyes, eyebrows, and mouth corners) that are less dependent on texture. An ensemble classifier combining these geometric features with EEG-based concentration detection helped reduce bias and improve emotional state recognition. Algorithm 3 presents all the steps for facial expression recognition.

Algorithm 3 Facial Expression Recognition

1:: Input: Face image of size $224 \times 224$
2:: Output: Predicted facial expression class
3:: Step 1: Preprocessing
4:: Detect facial landmarks using DLib or MediaPipe.
5:: Normalize and resize the image to $224 \times 224$ .
6:: Step 2: Feature Extraction
7:: (a) Geometric Features:
8:: Compute pairwise distances among facial landmarks:
9:: $d_{i j} = \sqrt{{(x_{i} - x_{j})}^{2} + {(y_{i} - y_{j})}^{2}}$
10:: Compute landmark triangulation (Delaunay).
11:: Compute angles between facial landmark triangles:
12:: $θ_{i j k} = {cos}^{- 1} (\frac{d_{i j}^{2} + d_{j k}^{2} - d_{i k}^{2}}{2 d_{i j} d_{j k}})$
13:: (b) Covariance-Based Features:
14:: Compute covariance matrix (given the landmarks):
15:: $s_{k ℓ} = \frac{1}{N - 1} \sum_{i = 1}^{N} (y_{k i} - {\bar{y}}_{k}) (y_{ℓ i} - {\bar{y}}_{ℓ})$
16:: Compute logm of covariance matrix:
17:: $e^{B} = I_{K} + \sum_{n = 1}^{\infty} \frac{S^{n}}{n!}, B = logm (S)$
18:: (c) Deep Learning Features (given the face images):
19:: Extract deep features from VGG16 fully connected layers.
20:: (d) Other image-based features (given the face images):
21:: Extract Histogram of Gradients and Linear Binary Patterns.
22:: (6) Temporal Features:
23:: Compute temporal difference:
24:: $F_{T} = F_{t} - F_{t - 1}$ (difference between current and previous frame feature vector)
25:: Step 3: Feature Fusion
26:: Concatenate all extracted features:
27:: $F = {F_{geo}, F_{HoG}, F_{VGG 16}, F_{T}}$
28:: Step 4: Feature Selection
29:: Select features using Information Gain:
30:: $I G (F_{i}) = H (C) - H (C | F_{i})$
31:: Step 5: Train SVM and Logistic Regression
32:: Train Support Vector Machine (SVM):
33:: $\min_{w, b} \frac{1}{2} {∥ w ∥}^{2}$
34:: s.t. $y_{i} (w \cdot x_{i} + b) \geq 1 \forall i$
35:: Train Logistic Regression:
36:: $P (y | X) = \frac{1}{1 + e^{- (w X + b)}}$
37:: Step 6: Individual Predictions
38:: Predict with SVM: ${\hat{y}}_{SVM} = SVM (F_{selected})$
39:: Predict with Logistic Regression: ${\hat{y}}_{LR} = LR (F_{selected})$
40:: Step 7: Ensemble Prediction with DBMM
41:: Compute posterior using DBMM:
42:: $P (C^{t} | A^{t}) = \frac{\prod_{k = t - T}^{t} P (C^{k} | C^{k - 1}) \sum_{i} w_{i}^{t} P_{i} (A^{t} | C^{t})}{\sum_{j} \prod_{k = t - T}^{t} P_{i, j} (C^{k} | C^{k - 1}) \sum_{i} w_{i}^{t} P_{i, j} (A^{t} | C^{t})}$
43:: Compute classifier weights using inverse entropy:
44:: $w_{i} = 1 - \frac{- \sum_{k} P_{i, k} log P_{i, k}}{\sum_{i} (- \sum_{k} P_{i, k} log P_{i, k})}$
45:: Compute final classification using argmax:
46:: $C^{*} = arg max P (C^{t} | A^{t})$
47:: Return: Predicted facial expression class $C^{*}$

3.3. Theoretical Foundations of Proposed Bayesian Immediate Feedback Learning (BIFL)

Bayesian Immediate Feedback Learning (BIFL) is a probabilistic framework designed to dynamically optimize real-time stimulus selection based on immediate feedback from cognitive and affective states. BIFL combines Bayesian inference and Multi-Armed Bandit (MAB) theory to learn and adjust stimulus-response strategies for each child in an adaptive educational environment. BIFL is particularly useful in scenarios where the acquisition of extensive data is challenging due to ethical, privacy, or experimental constraints. Unlike conventional Reinforcement Learning methods that require extensive training and a predefined reward function, BIFL updates its decision-making strategy dynamically, relying on Bayesian posterior updates of observed responses. BIFL is structured around a Bayesian MAB model, where each possible stimulus is treated as an “arm” with an uncertain reward distribution. The algorithm balances exploration (testing new stimuli) and exploitation (employing previously effective stimuli) to maximize engagement and cognitive improvement. In other words, BIFL continuously identifies and delivers the most suitable stimulus modality (visual, auditory, or text) during gameplay by analyzing real-time multimodal feedback—including EEG concentration, facial expressions, and game performance. This enables the system to promote engagement and emotional regulation in neurodivergent children by dynamically adapting to their biophysical responses.

3.3.1. Problem Definition

Let there be a set of stimuli

= {s_{1}, s_{2}, . . ., s_{N}}

, where each

s_{i}

represents a distinct stimulus (e.g., auditory, visual, or multimodal cues). The goal of BIFL is to sequentially select stimuli to maximize the engagement and cognitive performance of a child interacting with the system. At each time step t, the system selects a stimulus

s_{i}

and observes an immediate feedback response

x_{i, t}

, which represents an increase (or decrease) in the child’s cognitive engagement or regulated emotion (from negative to neutral or positive). This feedback is obtained via EEG-based concentration levels or facial expression-based emotional responses. The reward function

R (s_{i})

is modeled as a random variable representing the effectiveness of stimulus

s_{i}

in improving engagement. The true reward distribution of each stimulus is unknown but can be estimated dynamically using Bayesian inference. Each stimulus

s_{i}

follows a Beta prior distribution,

θ_{i} \sim B e t a (α_{i}, β_{i})

, where

α_{i}

and

β_{i}

are the parameters representing the number of observed successes and failures, respectively.

3.3.2. Bayesian Convergence of BIFL

Theorem 1.

As

t \to \infty

, the posterior mean estimate

{\hat{θ}}_{i}

converges to the true reward probability

μ_{i}

.

Using Bayesian inference, the posterior probability density is given by

P (θ_{i} | x_{i, 1}, \dots, x_{i, n_{i}}) = \frac{θ_{i}^{α_{i}^{'} - 1} {(1 - θ_{i})}^{β_{i}^{'} - 1}}{B (α_{i}^{'}, β_{i}^{'})} .

(1)

We apply the Law of Large Numbers,

{\hat{θ}}_{i} = \frac{α_{i}^{'}}{α_{i}^{'} + β_{i}^{'}} \to μ_{i},

(2)

where

μ_{i}

is the true probability of stimulus

s_{i}

improving engagement. This result confirms that the posterior estimate is an unbiased estimator of the true probability and, thus, the BIFL algorithm adapts optimally over time.

3.3.3. Exploration–Exploitation Trade-Off in BIFL

To balance exploration and exploitation, BIFL employs the Upper Confidence Bound (UCB) approach, which selects the stimulus with the highest expected reward adjusted by an exploration term. The expected stimulus reward

s_{i}

is given by the posterior mean:

{\hat{θ}}_{i} = \frac{α_{i}^{'}}{α_{i}^{'} + β_{i}^{'}} .

(3)

Lemma 1.

The probability of selecting a suboptimal stimulus decreases over time as

P (θ_{i} > θ_{j}) \approx 1 - O (\frac{1}{t}) .

(4)

Proof.

Since the posterior distribution follows

θ_{i} \sim B e t a (α_{i}, β_{i}),

the probability of selecting

s_{i}

over

s_{j}

follows Thompson Sampling:

P (θ_{i} > θ_{j}) = \int_{0}^{1} P (θ_{i} > θ_{j} | x) P (x) d x .

(5)

As

t \to \infty

,

P (θ_{i} > θ_{j})

approaches 1, ensuring that the probability of selecting an inferior stimulus diminishes over time. Thus, BIFL dynamically adapts the stimulus selection process, optimizing engagement through feedback learning. □

3.3.4. Stimulus Selection and Decision-Making

At each iteration, BIFL samples the posterior distribution for each stimulus and selects the one with the highest estimated reward:

s^{*} = arg {max}_{s_{i}} θ_{i}

. To further balance between exploration and exploitation, the system applies a UCB strategy:

s^{*} = arg max_{s_{i}} ({\hat{θ}}_{i} + \sqrt{\frac{β_{0} + α_{0}}{β_{i}^{'} + α_{i}^{'}}}),

(6)

where

β_{0} + α_{0}

represents the prior sample size, influencing the trade-off between exploring new stimuli and exploiting known effective stimuli.

3.3.5. Incremental Learning in BIFL

Theorem 2.

As new observations are collected, BIFL updates its prior beliefs using Bayesian updating, refining stimulus selection over time.

Proof.

Given

n_{i}

observations of a stimulus

s_{i}

, where

x_{i, 1}, . . ., x_{i, n_{i}}

are the observed engagement improvements, the posterior parameters update as

α_{i}^{'} = α_{i} + \sum_{j = 1}^{n_{i}} x_{i, j}, β_{i}^{'} = β_{i} + n_{i} - \sum_{j = 1}^{n_{i}} x_{i, j} .

(7)

Since the Beta distribution acts as a conjugate prior for the Bernoulli likelihood function, the posterior maintains a Beta form:

θ_{i} | x_{i, 1}, . . ., x_{i, n_{i}} \sim B e t a (α_{i}^{'}, β_{i}^{'})

. Thus, BIFL continuously refines its belief about the effectiveness of each stimulus, ensuring optimal adaptation. □

Overall Workflow of BIFL

BIFL follows these steps:

Prior beliefs: Assume a Beta prior for each stimulus.
Select stimulus $s^{*}$ using Thompson Sampling or UCB.
Feedback: Record the cognitive response $x_{i, t}$ .
Beliefs: Adjust $α_{i}$ and $β_{i}$ using Bayesian updating.
Repeat: Continuously improve stimulus selection.

In short, BIFL provides an adaptive mechanism for selecting optimal stimuli in real time based on immediate feedback. The integration of Bayesian inference and MAB theory ensures continuous learning while maintaining an optimal balance between exploration and exploitation. This approach enables highly personalized cognitive training, optimizing engagement for each individual user. Moreover, BIFL supports both UCB and Thompson Sampling for exploration–exploitation trade-off. In this work, UCB was used in real-time decision-making, while Thompson Sampling was used for theoretical convergence analysis.

It is important to mention that traditional Reinforcement Learning (RL) approaches, such as Q-learning, require extensive offline training over multiple episodes and a well-defined reward function. However, in real-time interactions with neurodivergent children, cognitive states, emotions, and brainwave dynamics are unpredictable and non-stationary. Offline RL training within this context is not feasible, due to time constraints and the impossibility of replicating spontaneous emotional–cognitive fluctuations. Our BIFL approach processes real-time signals over a 10 s temporal window, allowing adaptive feedback based on immediate multimodal input. Although RL-based comparisons will be explored in future simulations, BIFL was more appropriate for on-the-fly decision-making and showed good real-time performance.

3.4. Multimodal Engagement Score Computation

To quantitatively estimate dynamic child engagement during gameplay, we developed a multimodal engagement scoring model based on three sources: (i) game score, (ii) EEG concentration, and (iii) facial emotion. Each modality is normalized to a discrete score: 1 (Low), 3 (Medium), or 5 (High), with the following mappings:

EEG concentration:
−
concentration > 70% → Score 5 (High)
−
50–70% → Score 3 (Medium)
−
<50% → Score 1 (Low)
Facial expression:
−
positive emotion (happy/surprised) → Score 5
−
neutral expression → Score 3
−
negative emotion (anger, disgust, sadness, fear) → Score 1
Game performance:
−
High points → Score 5
−
Medium points → Score 3
−
Low points → Score 1

Then, the engagement score at each window (every 10 s) is computed as

E (t) = w_{E E G} (t) \times s c o r e_{E E G} (t) + w_{F a c i a l} (t) \times s c o r e_{F a c i a l} (t) + w_{G a m e} (t) \times s c o r e_{G a m e} (t) .

(8)

Values closer to 5 indicate higher engagement. To personalize the engagement score and account for variability between children, especially when observing neurodiverse profiles, weights are dynamically assigned based on the confidence of each modality. The weights

w_{E E G}

,

w_{F a c i a l}

, and

w_{G a m e}

are recalculated every 10 s using the inverse Jensen–Shannon divergence, which quantifies the temporal consistency of each modality’s signals. A more stable signal (i.e., less fluctuation over time) receives a higher weight, ensuring that the engagement score emphasizes the most reliable input at each moment. The weights are computed as follows:

w_{i} = \frac{1 - J S (P_{i} ‖ U)}{\sum_{j = 1}^{n} (1 - J S (P_{j} ‖ U))} .

(9)

Given the empirical distribution

P_{i}

of recent values for each factor and a uniform reference distribution U, the JS divergence is computed as

J (P_{i} ‖ U) = \frac{1}{2} K L (P_{i} ‖ M) + \frac{1}{2} K L (U ‖ M),

(10)

where

M = \frac{1}{2} (P_{i} + U)

, and where

K L (P ‖ Q)

is the Kullback–Leibler divergence

K L (P ‖ Q) = \sum_{x} P (x) log \frac{P (x)}{Q (x)}

. In this formulation, lower JS divergence (i.e., more consistent or reliable behavior) leads to a higher weight, while factors with more variability receive less influence in the engagement computation. This adaptive weighting ensures that the engagement score reflects the most trustworthy and stable signals over time. The advantages of adopting dynamic weights are as follows:

Adaptiveness: No fixed modality importance; adapts to each child.
Explainability: Weight evolution interprets engagement focus (e.g., child relying more on cognitive or affective cues).
Robustness: Avoids biases toward modalities insensitive to neurodivergent behaviors.

3.5. Game Interface Design Using Unity Engine for Neuro-Affective Training

Based on the proposed Neuro-Affective Intelligence (NAI) framework and Bayesian Immediate Feedback Learning (BIFL) described in this study, a set of serious games was developed using the Unity game engine. The primary goal of these games is to support the neurocognitive development of neurodivergent children by fostering memory, reasoning, and social skills. Inspired by the state-of-the-art in therapeutic serious games for neurodiverse populations, three game categories were implemented:

Memory game: Designed to improve concentration and short-term memory. Children match pairs of animals or objects within a limited time.
Reasoning game: Focused on logic and basic problem-solving using animal-based math puzzles and mazes.
Social skills game: Developed to enhance emotion recognition and conversational response. Children choose the appropriate facial expression label or social response in daily scenarios.

Each game interface includes score tracking, level progression, and time constraints to provide a structured and motivating experience. The gameplay is adapted in real time by the BIFL module, which monitors engagement through EEG (concentration), facial emotion recognition, and game performance. Based on these inputs, the system logs multimodal responses and dynamically selects the effective stimulus (visual, auditory, or textual) to guide the child. Stimuli may include visual highlights, spoken hints, or on-screen instructions.

Stimulus repetition is applied when sustained low concentration or negative emotional states are detected, aiding self-regulation and helping the child refocus on the task. Engagement is quantified via a multimodal score, dynamically weighted according to the temporal consistency and informativeness of each input modality, following the computation model described in Equation (8).

Figure 2, Figure 3 and Figure 4 illustrate the Unity-based interfaces designed for each task category. These interfaces were used in experimental sessions and adapted based on real-time BIFL logs to maximize engagement and emotional regulation.

These Unity-based games were integrated with real-time EEG and facial expression sensors, forming the core of the Child–Game Interaction framework. The combination of gamified interaction and neuro-affective feedback promotes an engaging and personalized learning experience for neurodivergent users.

4. Experimental Design

This study evaluated neurocognitive training in a CGI scenario (Figure 5) within the NeuroEngage project. The CGI setup employed multimodal AI to assess neuro-affective states and dynamically adapt responses.

The independent variables included interaction modality (game-based neurocognitive training), engagement strategies (adaptive visual, auditory, and textual feedback), and multimodal inputs (EEG-based mental states and facial expressions). The dependent variables measured were engagement levels assessed via BIFL, emotional state classification (positive: happy and surprised; neutral; negative: angry, disgusted, fearful, sad), mental states (concentration, neutral, relaxation), and task performance in memory, reasoning, and social interaction tasks. The study tested three hypotheses: (H1) adaptive multimodal interventions enhance engagement over static systems, (H2) emotion and cognitive-aware adaptations improve task performance, and (H3) neurophysiological signals add predictive value for engagement.

The experiment spanned four weeks, involving 40 children (6–10 years) diagnosed with ADHD or ASD, recruited from five primary schools in Brazil in partnership with the Cambé City Council (state of Paraná). Each child participated in a total of eight training sessions (i.e., twice per week over four weeks). Among them, 62.5% (25) had ADHD, while 37.5% (15) had ASD.

Each participant completed a ten-minute trial with three tasks: a memory/concentration task (matching and recall exercises), a reasoning task (logical problem-solving challenges), and a social interaction task (collaborative teamwork). During gameplay, EEG signals classified real-time brainwave activity, while facial expressions analyzed emotional responses. The game dynamically adjusted visual, text, and auditory stimuli to optimize engagement based on cognitive and affective states.

Ethics and Data Privacy

The study was conducted in direct collaboration with the Municipal Department of Education of Cambé in the state of Paraná, Brazil, and it was conducted in accordance with the ethical guidelines for AI-based interventions involving children. Informed consent was obtained from all parents and data privacy protocols were strictly followed. To ensure emotional well-being, real-time monitoring allowed facilitator intervention if distress was observed, and school representatives, such as teachers and pedagogical staff, remained present throughout the sessions. Only children aged 6 to 10 participated, in line with previous child–machine interaction studies, to ensure cognitive and social appropriateness.

5. Results

5.1. Brainwave Processing and Classification

For the EEG-based concentration training, we utilized the Mental States EEG dataset [21], which consists of recordings representing three distinct mental states—concentration, relaxation, and neutral. The dataset includes EEG recordings from four adult participants (two male, two female), with each state recorded for 60 s per individual. A Muse EEG headband was used to capture signals from TP9, AF7, AF8, and TP10 electrode positions. EEG classification was performed using our 5-CNN architecture, trained on a specialized feature engineering pipeline, as previously detailed. The classification results (with an 80–20% train–test split) are shown in Figure 6.

The classification performance using the 5-CNN model along with improvements following the integration of synthetic EEG samples via the WGAN-GP demonstrate the benefits of synthetic data generation. The WGAN-GP model was trained on adult EEG data to augment limited real samples, improving the classifier accuracy from 92.0% to 98.48% when 50% synthetic data were used. Figure 7 demonstrates the similarities between real and synthetic EEG data generated by the WGAN-GP. However, excessive augmentation beyond 75% synthetic data did not yield higher improved returns, suggesting that 50% synthetic data augmentation provides the optimal balance between performance enhancement and dataset realism. These findings validate that a WGAN-GP generates EEG signals with statistical properties that closely match real data, improving classification accuracy.

Figure 8 presents the effectiveness of Transfer Learning (TL) in EEG classification. Adult-trained models applied to child data achieved an initial accuracy of approximately 85% in the first epoch, demonstrating strong generalization. The results also compared adult training, child training, and bidirectional TL (adult-to-children and children-to-adult) to evaluate adaptability. Even with limited calibration data—acquiring 20 s of data from the children before the experiments to fine-tune the models—classification remained effective, indicating that TL enhances adaptation. Although the WGAN-GP was trained on adult EEG data, Transfer Learning and a brief per-child calibration phase allowed effective adaptation to child data.

Real-Time Testing and Model Generalization

During live experiments with children, our system achieved an average classifier confidence of 89% in detecting mental states in real time. This performance is largely attributable to (i) data augmentation, given the training data, and (ii) an initial calibration phase. Before gameplay, each child provided 20 s of concentration and relaxation EEG samples, allowing for fine-tuning the pre-trained classification model. This personalized calibration significantly enhanced classification accuracy, ensuring robust recognition of neurocognitive states. This step is essential for an accurate real-time classification.

5.2. Facial Expression Recognition

Facial expressions were analyzed using the KDEF dataset [20] in conjunction with our approach. Given our features engineering, the DBMM ensemble combined predictions from SVM and LR to enhance classification accuracy. Incorporation of temporal features significantly improved performance, increasing the accuracy from 88% to 95%, demonstrating the importance of capturing temporal dependencies in facial expressions. While the KDEF dataset consists of deliberately posed expressions, it remains effective for real-time emotion classification, albeit with slightly reduced accuracy when applied to spontaneous emotions.

To further enhance generalization, the classification models were also trained using the Real Emotion dataset, which consists of six adult participants who express genuine emotions while watching emotional video clips [22]. The dataset includes recordings of anger, fear, disgust, happiness, neutrality, sadness, and surprise, spanning 17 min per participant. When trained solely on this dataset, DBMM (SVM +RF) achieved 85% accuracy without temporal features, which increased to 90% when incorporating them. Cross-dataset evaluation demonstrated that training on KDEF and testing on Real Emotion yielded 75% accuracy, while the inverse setup resulted in 78% accuracy. By merging both datasets, the overall performance improved to 97%, making this the final model selected for real-time facial expression recognition within CGI.

Real-Time Performance and Generalization

During the on-the-fly tests with children, the facial expression classification system achieved an average classifier confidence of 87% in recognizing emotional states in real-time. This level of accuracy demonstrates the effectiveness of our approach in integrating multiple feature representations, allowing for robust classification even in dynamic, spontaneous scenarios.

Figure 9 presents representative real-time samples of emotion classification during Child–Game Interactions, integrating facial expression recognition with EEG-based brain activity analysis. The left side of each panel shows the detected facial expression and the model’s confidence level, while the right side illustrates the corresponding EEG signals and their decomposition across the five canonical frequency bands:

δ

(delta),

θ

(theta),

α

(alpha),

β

(beta), and

γ

(gamma). In the top panel, the system identifies a happy facial expression with high confidence (99%). The associated EEG activity reveals elevated power in the

α

and

β

bands, particularly

β

, which is commonly linked to focused attention and cognitive engagement. This pattern suggests an emotionally positive and concentrated mental state during the task. In contrast, the bottom panel shows a neutral expression with 90% confidence. The EEG signals display dominant

δ

and

θ

activity with comparatively lower

α

and

β

band powers. Such a pattern is typically associated with a resting or low-attention state, indicating reduced cognitive engagement at that moment.

This figure illustrates the system’s ability to interpret multimodal signals in real time, effectively mapping emotional and attentional states based on both facial and neurophysiological cues.

5.3. Experimental Results on Game Adaptation: BIFL

The following subsections present representative examples of short-term interactions involving two neurodivergent children playing a memory game where engagement levels and emotional responses were dynamically adjusted.

5.3.1. Short-Term Individual BIFL Experiment Example

The memory game consists of a grid of face-down cards in which players attempt to match pairs by flipping two cards at a time. Cognitive engagement is monitored using EEG signals, while emotional responses are tracked through facial expression analysis.

Stimuli adjustments are triggered adaptively, including the following:

Visual cues (highlighting matching cards);
Auditory cues (verbal hints);
Textual cues (written hints).

Every 10 s, concentration levels and facial expressions are assessed. If concentration decreases by more than 10% or if negative emotions persist then a stimulus is triggered. The BIFL framework continuously adapts by learning which stimulus best enhances user engagement.

Child 1 (ASD)

To illustrate the application of BIFL, we present a detailed computation for a child diagnosed with ASD during a short-term evaluation in a memory task game. This child exhibited moderate concentration fluctuations and emotional variability. Table 1 summarizes the recorded engagement parameters:

At $t = 20 s$ , EEG concentration dropped by 15%, leading to a visual cue trigger.
At $t = 30 s$ , facial emotion detected anger, prompting an auditory cue.
At $t = 40 s$ , concentration dropped again, triggering a textual cue.
By $t = 50 s$ , engagement stabilized.

We began with a uniform Beta prior for each stimulus:

θ_{i} \sim Beta (α_{0} = 1, β_{0} = 1)

. After each stimulus application, the system evaluated whether the engagement improved (reward = 1), and it updated the posterior using the Beta-Bernoulli model. The reward updates were

Visual (at $t = 20$ s): $α_{V}^{'} = 2$ , $β_{V}^{'} = 1$
Auditory (at $t = 30$ s): $α_{A}^{'} = 2$ , $β_{A}^{'} = 1$
Textual (at $t = 40$ s): $α_{T}^{'} = 2$ , $β_{T}^{'} = 1$

The posterior mean reward for each cue was calculated as

{\hat{θ}}_{V} = {\hat{θ}}_{A} = {\hat{θ}}_{T} = \frac{2}{2 + 1} = 0.6667

. Applying the Upper Confidence Bound (UCB) rule,

U C B_{i} = {\hat{θ}}_{i} + \sqrt{\frac{α_{0} + β_{0}}{α_{i}^{'} + β_{i}^{'}}}

, we had

U C B_{V} = U C B_{A} = U C B_{T} = 0.6667 + \sqrt{\frac{2}{3}} \approx 1.483

.

Since all the stimuli achieved the same UCB score, a predefined heuristic (e.g., selecting the most recently successful stimulus) would determine the next action if necessary.

Child 2 (ADHD)

Child 2, diagnosed with ADHD, showed sharp concentration dips and more reactive emotional responses. From Table 1:

At $t = 10 s$ , EEG concentration dropped, triggering a textual cue.
At $t = 20 s$ , a negative emotion (sadness) was detected, prompting a visual cue.
From $t = 30 s$ onward, EEG concentration and emotional state improved, and no further stimuli were needed.

Assuming both visual and textual cues were effective (reward = 1), we updated their Beta parameters as

Textual: $α_{T}^{'} = 2$ , $β_{T}^{'} = 1$ ;
Visual: $α_{V}^{'} = 2$ , $β_{V}^{'} = 1$ ;
Auditory: unused $\Rightarrow α_{A}^{'} = 1$ , $β_{A}^{'} = 1$ .

Posterior means:

{\hat{θ}}_{T} = {\hat{θ}}_{V} = \frac{2}{3} = 0.6667

, and

{\hat{θ}}_{A} = \frac{1}{2} = 0.5

.

UCB values:

U C B_{T} = U C B_{V} = 0.6667 + \sqrt{2 / 3} \approx 1.483

,

U C B_{A} = 0.5 + \sqrt{2 / 2} = 1.5

.

Although auditory cue had a higher UCB (due to fewer trials and greater uncertainty), the system favored proven stimuli unless high uncertainty justified exploration. As visual feedback successfully stabilized engagement at

t = 20

s, the system maintained it as the best cue for further use.

This validates the convergence behavior of BIFL, where effective cues gain higher confidence and become the preferred stimuli. Table 1 presents the short-term evaluation for both children and game adaptation using BIFL.

5.3.2. Group-Level Stimulus Distribution Analysis

While the previous case studies demonstrate individual responses to BIFL-driven stimulus adaptation, we also analyzed the stimulus distribution across the full cohort. Visual feedback emerged as the most frequently reinforced modality, selected in 62.5% of the sessions—especially during moments of low concentration. Auditory stimuli were dominant in 22.5% of the sessions, often triggered when signs of emotional disengagement were detected. Multimodal cues (combining visual and auditory feedback) accounted for the remaining 15%, typically used to maintain engagement during complex or socially oriented tasks. This distribution underscores the importance of personalized stimulus adaptation, as BIFL dynamically selects cues that align with each child’s engagement profile in real time.

5.3.3. Overall Evaluation of BIFL with 40 Children

Figure 10 summarizes the impact of BIFL-driven adaptation over four weeks of interaction with 40 neurodivergent children. The results reveal notable improvements in concentration, emotional engagement, and game performance.

Among the 25 children with ADHD, auditory-based stimuli (e.g., verbal hints and feedback) were particularly effective, leading to a 25.3% increase in concentration and a 22.3% boost in emotional engagement. Tasks involving memory and reasoning showed the highest concentration gains (30% and 28%, respectively).

For the 15 children with ASD, visual and structured multimodal cues yielded the best outcomes. These participants exhibited a 16.7% improvement in concentration and a 27.3% increase in emotional engagement. Visual cues were especially impactful during memory tasks (emotional gain of 35%), while structured textual and visual prompts proved most beneficial in social interaction tasks.

In terms of objective performance, ADHD participants demonstrated consistent gains across all game types, particularly in social interaction games, with improvements of up to 62%. ASD participants also improved, especially in social tasks where visual feedback and dynamic guidance helped maintain focus and reduce disengagement.

Overall, BIFL adaptation led to a 22.4% increase in concentration, a 24.8% boost in emotional engagement, and an average 32.1% improvement in game scores, validating the effectiveness of real-time, personalized feedback mechanisms for enhancing cognitive and emotional outcomes.

We investigated which type of stimulus most effectively improved engagement within each population. As shown in Figure 11, visual stimuli produced the highest engagement success in the ASD group (50%), consistent with the literature, suggesting improved visual processing in ASD children. In contrast, ADHD children responded more positively to auditory stimuli (45%), possibly due to their greater sensitivity to auditory cues. These results reveal the following:

ADHD children responded best to auditory stimuli, likely due to their increased auditory attention and responsiveness to external cues.
ASD children achieved higher engagement with visual stimuli, supporting the literature that visual processing is often stronger in individuals with ASD.
BIFL effectively learned to adapt to individual responses, optimizing the mix of modality and stimulus timing through dynamic weighting.

5.4. Engagement Scores Analysis

Figure 12 presents the average normalized modality weights over time, aggregated across 40 neurodivergent children during three different types of games, each with a 3 min duration. The results indicate that the game score achieved by the children during the gameplay maintained the highest weight throughout most of the session, closely followed by the EEG concentration, with a maximum difference typically under 0.1. This close proximity suggests a strong joint contribution of both modalities in estimating engagement, where the game score reflects direct performance feedback and EEG captures underlying cognitive attention levels.

Interestingly, peaks in EEG concentration often preceded increases in game score by a few seconds, indicating that higher concentration may anticipate better gameplay outcomes. This subtle lag reinforces the temporal dynamics between cognitive state and performance, aligning with expected neurocognitive responses in adaptive learning contexts. Facial expression consistently held the lowest weight, although it showed small increases that followed game score peaks with a slight delay. This behavior likely reflects the reduced expressiveness common in neurodivergent populations such as ASD and ADHD, where emotional cues are less explicit but still present and reactive under certain conditions. The lower discriminative power of facial expressions is thus mitigated by assigning them less influence in the final engagement score while still preserving their contribution in the multimodal model. Overall, this averaged modality profile illustrates a coherent and realistic weighting strategy: performance (game score) and cognitive attention (EEG) dominate engagement estimation, while facial emotion recognition supports additional context with lower but non-negligible importance. This insight reinforces the value of dynamically adjusting the modality weights based on observed temporal responses, rather than relying on fixed assumptions.

To understand how the engagement evolved throughout the gameplay, Figure 13 presents the average multimodal engagement scores calculated from the ASD and ADHD groups over a 3 min interaction involving three different games. The engagement score, ranging from 1 to 5, was dynamically computed using a weighted fusion of EEG concentration, facial expressions, and game performance metrics.

The ADHD group consistently demonstrated higher levels of engagement throughout most of the session, particularly in the early phase (0–60 s) and again between 85–130 s, with pronounced peaks around 20–30 s and 90–100 s. These peaks align with BIFL’s adaptive use of auditory and high-frequency visual stimuli, which appear particularly effective for individuals with ADHD—likely due to their increased responsiveness to dynamic sensory input. In contrast, the ASD group exhibited a more gradual engagement trajectory, with a modest local peak between 70–80 s where their engagement briefly approached that of the ADHD group. This suggests that, for many participants with ASD, engagement may require longer exposure to structured stimuli before meaningful responses are observed. Toward the end of the session (150–180 s), the engagement scores for both groups converged, indicating stable attention once consistent and individualized stimuli were provided.

It is also important to consider the difference in sample size between groups (ADHD: 25 participants, ASD: 15 participants), which may introduce variability in engagement metrics. Naturally, larger sample sizes introduce greater variability and offer more robust insights.

Nevertheless, our experimental results demonstrate that the proposed BIFL framework, combined with multimodal engagement scoring, performs effectively in real-time, real-world scenarios. These promising findings suggest that the approach is both scalable and adaptable to larger and more diverse populations. Overall, the average engagement score across the 3 min session with three games was approximately 3.9 for ADHD and 3.7 for ASD. This small difference highlights the effectiveness of the BIFL framework in supporting engagement for both groups, helping to reduce disparities through adaptive and personalized interaction strategies. The dynamic fluctuations and recovery patterns in engagement scores observed in both cohorts underscore the value of real-time stimulus adaptation in maintaining attention and promoting emotional regulation in neurodivergent individuals.

5.5. Impact Assessment and Statistical Evaluation

To evaluate the impact of BIFL-based adaptations, we conducted a statistical analysis using the following:

Paired t-tests ( $t (39)$ ) to assess whether the mean difference between pre- and post-intervention conditions was statistically significant.
We used p-values to estimate the probability that observed differences could have occurred by chance, with $p < 0.05$ considered statistically significant.
Cohen’s d to quantify the effect size, indicating the magnitude of the difference. The effect sizes were interpreted as follows: around 0.2 (small), 0.5 (medium), and above 0.8 (large).

Our results are summarized in Figure 14.

The findings demonstrate significant improvements across cognitive, emotional, and task performance metrics:

Ease of play: Improved from 4.2 to 4.6 on a 5-point scale (self-reported questionnaire) ( $t (39) = 8.94$ , $p < 0.001$ , $d = 1.41$ ).
Task completion time: Decreased by 30%, reflecting improved efficiency.
Negative emotions: Decreased by 20% compared to the baseline in the first week ( $t (39) = 8.94$ , $p < 0.001$ , $d = 1.41$ ).
Positive emotional responses: Increased by 25% ( $t (39) = 10.54$ , $p < 0.001$ , $d = 1.67$ ).
Memory task performance:
−
Improved by 18% for ASD participants ( $t (39) = 8.94$ , $p < 0.001$ , $d = 1.41$ ).
−
Improved by 30% for ADHD participants ( $t (39) = 8.94$ , $p < 0.001$ , $d = 1.41$ ).
Reasoning task performance:
−
Improved by 22% for ASD participants ( $t (39) = 8.94$ , $p < 0.001$ , $d = 1.41$ ).
−
Improved by 28% for ADHD participants ( $t (39) = 8.94$ , $p < 0.001$ , $d = 1.41$ ).
Speed of memory task completion: Increased by 22% ( $t (39) = 8.94$ , $p < 0.001$ , $d = 1.41$ ).

These findings confirm that BIFL enhances neuro-affective engagement by enabling real-time, personalized stimulus selection tailored to user performance and emotional responses.

6. Conclusions and Future Work

This paper introduces a neuro-adaptive framework, Bayesian Immediate Feedback Learning (BIFL), designed to improve cognitive and emotional engagement in neurodivergent children through multimodal, real-time feedback. By integrating EEG-based concentration assessment, facial emotion recognition, and game performance, the system dynamically selects and delivers customized stimuli—visual, auditory, or textual—during interactive gameplay. Our experimental results involving 40 children (25 with ADHD and 15 with ASD) over a four-week intervention revealed significant improvements: concentration increased by 22.4%, emotional engagement by 24.8%, and task performance by 32.1%. Statistical analysis confirmed the robustness of these gains (

p < 0.001

, Cohen’s

d > 1.4

).

Importantly, the ADHD group responded most positively to auditory stimuli, while the ASD group showed increased engagement with structured visual cues. The BIFL framework successfully adapted its stimulus strategy over time, supported by uncertainty-aware weighting of input modalities and real-time engagement scoring. These findings demonstrate that personalized neuro-affective interaction can support cognitive training in diverse developmental profiles and holds promise for scalable therapeutic applications.

Limitations and Future Directions

This study presents a promising neuro-adaptive framework using BIFL to enhance engagement and cognitive performance in children with neurodevelopmental disorders. However, some limitations must be acknowledged.

Firstly, the participant group was limited to children aged 6–10 years. Emotional needs and attention regulation can vary significantly across developmental stages, and future studies should assess generalizability to younger and older populations.

Secondly, EEG signals were acquired using the Muse headband, which, while portable and child-friendly, offers only four dry electrodes and is prone to motion artifacts—particularly relevant for children with ADHD. Although we applied preprocessing and artifact-removal techniques, future work should explore more robust denoising pipelines for mobile EEG.

Thirdly, the facial expression recognition model was initially trained on adult facial datasets (e.g., KDEF, Emotional dataset), which may not fully reflect the emotional expression patterns of children. Since children often display more exaggerated or subtler expressions, this mismatch may introduce bias. While we mitigated this through geometric feature extraction and EEG-assisted disambiguation, future work should incorporate child-specific facial datasets for improved accuracy.

Fourthly, the EEG model was pre-trained on adult data and adapted using a short (20 s) per-child calibration phase. Although this enabled real-time personalization, it may limit the model’s ability to capture finer individual variability. Future research could benefit from integrating larger child-specific EEG datasets and employing domain adaptation, data augmentation (e.g., noise injection, waveform synthesis), or adversarial learning to improve generalization.

Moreover, the intervention lasted four weeks, so long-term outcomes such as sustained attention gains, user fatigue, and habituation were not explored. Longitudinal studies are needed to assess the durability of cognitive and emotional improvements over time.

All participants were from Brazilian schools, which may limit cultural generalizability. Future cross-cultural validation can help ensure broader applicability of the BIFL system.

Finally, although BIFL was effective in real-time adaptation, it would be valuable to benchmark its performance against Reinforcement Learning (RL) models and Large Language Model (LLM)-based personalization strategies. While real-time clinical contexts pose challenges for RL, due to the need for large-scale offline training and consistent reward structures, simulated environments with synthetic behavioral trajectories could enable safe and meaningful comparison in future studies.

Author Contributions

Methodology, D.R.F.; Software, D.R.F.; Validation, D.R.F.; Investigation, D.R.F. and P.P.d.S.A.; Writing—original draft, D.R.F.; Writing—review and editing, D.R.F. and P.P.d.S.A.; Supervision, D.R.F. and P.P.d.S.A. All authors have read and agreed to the published version of the manuscript.

Funding

This research did not receive direct external funding. However, the first author was supported by mobility grants for conducting pilot studies in Brazil, funded by the Newton Fund (UK), CONFAP (Brazil), and IEEE RAS-SIGHT.

Institutional Review Board Statement

The study was conducted in direct collaboration with the Municipal Department of Education of Cambé, in the state of Paraná. The process did not involve a local university; therefore, there was no internal university ethics committee involved.

Informed Consent Statement

Informed consent was obtained from all subjects involved in the study.

Data Availability Statement

Due to the sensitive nature of the data involving children, and in compliance with GDPR and institutional ethical guidelines, the dataset is not publicly available. Access to data may be considered upon reasonable request and subject to additional ethical approval.

Acknowledgments

The authors gratefully acknowledge the Municipal Department of Education of Cambé, Paraná, Brazil, for their collaboration and permission to conduct research in primary schools in the region. Their support and authorization were essential for carrying out the experiments.

Conflicts of Interest

The authors declare no conflicts of interest.

References

World Health Organization. Autism Spectrum Disorders and Other Developmental Disorders: Fact Sheet; WHO: Geneva, Switzerland, 2023; Available online: https://www.who.int/news-room/fact-sheets/detail/autism-spectrum-disorders (accessed on 6 April 2025).
Francillette, Y.; Boucher, E.; Bouchard, B.; Bouchard, K.; Gaboury, S. Serious games for people with mental disorders: State of the art of practices to maintain engagement and accessibility. Entertain. Comput. 2021, 37, 100396. [Google Scholar] [CrossRef]
Grossard, C.; Grynspan, O.; Serret, S.; Jouen, A.-L.; Bailly, K.; Cohen, D. Serious games to teach social interactions and emotions to individuals with autism spectrum disorders. Comput. Educ. 2017, 113, 195–211. [Google Scholar] [CrossRef]
Hassan, A.; Pinkwart, N.; Shafi, M. Serious games to improve social and emotional intelligence in children with autism. Entertain. Comput. 2021, 38, 100417. [Google Scholar] [CrossRef]
Saleme, P.; Pang, B.; Parkinson, J. Design of a Digital Game Intervention to Promote Socio-Emotional Skills and Prosocial Behavior in Children. Multimodal Technol. Interact. 2021, 5, 58. [Google Scholar] [CrossRef]
Moreno, G.; Moreira, F.; Collazos, C.; Fardoun, H. Recommendations for the design of inclusive apps for the treatment of autism: An approach to design focused on inclusive users. In Proceedings of the Iberian Conference on Information Systems and Technologies, Seville, Spain, 24–27 June 2020. [Google Scholar]
Alabdulakareem, E.; Jamjoom, M. Computer-assisted learning for improving ADHD individuals’ executive functions through gamified interventions: A review. Entertain. Comput. 2020, 33, 100341. [Google Scholar] [CrossRef]
Friedrich, E.V.C.; Suttie, N.; Sivanathan, A.; Lim, T.; Louchart, S.; Pineda, J. Brain–computer interface game applications for combined neurofeedback and biofeedback treatment for children on the autism spectrum. Front. Neuroeng. 2014, 3, 21. [Google Scholar]
Liarokapis, F.; Debattista, K.; Vourvopoulos, A.; Panagiotis, P.; Ene, A. Comparing interaction techniques for serious games through brain–computer interfaces: A user perception evaluation study. Entertain. Comput. 2014, 5, 391–399. [Google Scholar] [CrossRef]
Rajabi, S.; Pakize, A.; Moradi, N. Effect of combined neurofeedback and game-based cognitive training on the treatment of ADHD: A randomized controlled study. Appl. Neuropsychol. 2019, 9, 193–205. [Google Scholar] [CrossRef] [PubMed]
Bonab, H.S.; Sani, S.E.; Behzadnia, B. The Impact of VR Intervention on Emotion Regulation and Executive Functions in Autistic Children. Games Health J. 2023, 14, 146–158. [Google Scholar] [CrossRef] [PubMed]
Faria, D.R.; Bird, J.; Daquana, C.; Kobylarz, J.; Ayrosa, P.P.S. Towards AI-based Interactive Game Intervention to Monitor Concentration Levels in Children with Attention Deficit. Int. J. Inf. Educ. Technol. 2020, 10, 641–648. [Google Scholar] [CrossRef]
Vortmann, L.-M.; Ceh, S.; Putze, F. Multimodal EEG and Eye Tracking Feature Fusion Approaches for Attention Classification in Hybrid BCIs. Front. Comp. Sci. 2022, 4, 780580. [Google Scholar] [CrossRef]
Soysa, A.; Mahmud, A. Tangible Play and Children with ASD in Low-Resource Countries: A Case Study. In Proceedings of the Fourteenth International Conference on Tangible, Embedded, and Embodied Interaction (TEI ’20), Sydney, Australia, 9–12 February 2020. [Google Scholar]
Ahire, N.; Awale, R.N.; Wagh, A. Electroencephalogram (EEG) based prediction of attention deficit hyperactivity disorder (ADHD) using machine learning. Cogn. Neurodyn. 2023, 32, 966–977. [Google Scholar] [CrossRef] [PubMed]
Capelo, D.C.; Sánchez, M.E.; Hurtado, J.S.; Chicaiza, D.B. Multisensory Virtual Game with Use of the Device Leap Motion to Improve the Lack of Attention in Children of 7–12 Years with ADHD. In Proceedings of the International Conference on Information Technology & Systems (ICITS 2018); Springer International Publishing: Berlin/Heidelberg, Germany, 2018. [Google Scholar]
Timaná, L.C.R.; García, J.F.C.; Filho, T.B.; González, A.A.O.; Monsalve, N.R.H.; Jimenez, N.J.V. Use of Serious Games in Interventions of Executive Functions in Neurodiverse Children: Systematic Review. JMIR Serious Games 2024, 12, e59053. [Google Scholar] [CrossRef] [PubMed]
Manoharan, G.; Faria, D.R. Enhanced Mental State Classification using EEG-based Brain-Computer Interface through Deep Learning. In Proceedings of the IntelliSys’24: 10th Intelligent Systems Conference 2024, Amsterdam, The Netherlands, 5–6 September 2024. [Google Scholar]
Venugopal, A.; Faria, D.R. Boosting EEG and ECG Classification with Synthetic Biophysical Data Generated via Generative Adversarial Networks. Appl. Sci. 2024, 14, 10818. [Google Scholar] [CrossRef]
Lundqvist, D.; Flykt, A.; Öhman, A. The Karolinska Directed Emotional Faces—KDEF [online]; Karolinska Institutet. Available online: https://kdef.se/home/aboutKDEF (accessed on 8 October 2024).
Bird, J.J.; Manso, L.; Ribeiro, E.P.; Ekart, A.; Faria, D.R. A Study on Mental State Classification using EEG-based Brain-Machine Interface. In Proceedings of the 2018 International Conference on Intelligent Systems (IS), Funchal, Portugal, 25–27 September 2018. [Google Scholar]
Faria, D.R.; Vieira, M.; Faria, F.C.C.; Premebida, C. Affective Facial Expressions Recognition for Human-Robot Interaction. In Proceedings of the IEEE RO-MAN’17: IEEE International Symposium on Robot and Human Interactive Communication, Lisbon, Portugal, 28 August–1 September 2017. [Google Scholar]

Figure 1. The proposed system integrates multisensory biophysical data to assess neuro-affective states in real time. Adaptive interventions are delivered through a game interface, generating multiple stimuli to improve engagement.

Figure 2. Memory task interface: children must find matching pairs among animals, objects, or geometrical shapes. Score, time, and engagement stimuli are dynamically adjusted via BIFL.

Figure 3. Reasoning task interface: includes logic puzzles and mazes with incremental difficulty. Feedback is tailored to the child’s affective and cognitive states.

Figure 4. Social skill task interface: players choose the appropriate response or emotional label. Scenarios include greetings and emotion identification tasks.

Figure 5. Experimental setup: Children interacted with games on a laptop while wearing an EEG headband to monitor brain activity. Facial expressions were recorded using the laptop’s built-in camera for real-time affective analysis.

Figure 6. Training and test results on the Mental States EEG dataset, without data augmentation. The classifier achieved 92% accuracy on the test set.

Figure 7. Comparison of real and synthetic EEG signals generated by the WGAN-GP, showing waveform and spectral similarities. Top left image: synthetic signal of concentration. Top right: real concentration. Bottom left and right: synthetic and real wave form of concentration, respectively.

Figure 8. EEG Transfer Learning: performance of models from adult on child data and vice versa. Classification using only 20 s of calibrations shows reliable classification.

Figure 9. Real-time facial expression recognition during CGI tasks, alongside corresponding EEG-based brain activity. These examples illustrate on-the-fly emotion and concentration classification, with our models achieving an average accuracy of 89% across the collected samples. Top row: Detected happy facial expression with elevated beta activity, indicating a concentrated and engaged cognitive state. Bottom row: Detected neutral expression with dominant delta/theta bands, reflecting a low concentration or resting state.

Figure 10. Overall results of BIFL-based adaptation for 40 children. Improvement in concentration, emotional response, and game performance for ADHD and ASD groups across memory, reasoning, and social interaction tasks.

Figure 11. Stimulus engagement success rates for ADHD and ASD. Visual stimuli proved most effective for ASD children, while auditory stimuli were most impactful for ADHD children.

Figure 12. Average modality weights over time (normalized). The BIFL system dynamically adjusted importance based on modality consistency. Game scores were consistently most informative, followed closely by EEG, with facial expressions weighted least.

Figure 13. Average engagement scores over time for ASD and ADHD groups. ADHD group maintained slightly higher engagement throughout. Peaks suggest optimal alignment of stimulus type and concentration periods.

Figure 14. Statistical significance of neuro-affective improvements across children. Bars indicate changes for different measures, with positive improvements in blue and reductions in red. Each bar includes p-value and Cohen’s d effect size, showing enhancements across tasks.

Table 1. Cognitive and emotional states and engagement scores for children with ADHD and ASD.

Time (s)	Group	EEG State	Emotion	Game Score	Eng. Score	Change	Stimulus
0	ASD	Concentrated	Neutral	Medium	4.3	-	None
0	ADHD	Neutral	Neutral	Medium	4.0	-	None
10	ASD	Concentrated	Happy	High	4.7	-	None
10	ADHD	Relaxed	Neutral	Low	3.1	EEG/Game drop	T *
20	ASD	Concentration (−15%)	Neutral	Low	3.3	EEG/Game drop	V *
20	ADHD	Neutral	Sad	Low	2.7	Negative emotion	V *
30	ASD	Neutral	Angry	Low	2.9	Negative emotion	A *
30	ADHD	Concentrated	Neutral	Medium	3.8	Improved	None
40	ASD	Relaxed	Neutral	Medium	3.2	EEG drop	T *
40	ADHD	Concentrated	Happy	High	4.8	Improved	None
50	ASD	Concentrated	Neutral	Medium	4.3	Improved	None
50	ADHD	Concentrated	Happy	High	4.6	Stable	None

* A = Auditory stimulus (e.g., verbal hints, feedback); T = Textual stimulus (e.g., written hints); V = Visual stimulus (e.g., color cues).

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Faria, D.R.; da Silva Ayrosa, P.P. Adaptive Neuro-Affective Engagement via Bayesian Feedback Learning in Serious Games for Neurodivergent Children. Appl. Sci. 2025, 15, 7532. https://doi.org/10.3390/app15137532

AMA Style

Faria DR, da Silva Ayrosa PP. Adaptive Neuro-Affective Engagement via Bayesian Feedback Learning in Serious Games for Neurodivergent Children. Applied Sciences. 2025; 15(13):7532. https://doi.org/10.3390/app15137532

Chicago/Turabian Style

Faria, Diego Resende, and Pedro Paulo da Silva Ayrosa. 2025. "Adaptive Neuro-Affective Engagement via Bayesian Feedback Learning in Serious Games for Neurodivergent Children" Applied Sciences 15, no. 13: 7532. https://doi.org/10.3390/app15137532

APA Style

Faria, D. R., & da Silva Ayrosa, P. P. (2025). Adaptive Neuro-Affective Engagement via Bayesian Feedback Learning in Serious Games for Neurodivergent Children. Applied Sciences, 15(13), 7532. https://doi.org/10.3390/app15137532

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

Adaptive Neuro-Affective Engagement via Bayesian Feedback Learning in Serious Games for Neurodivergent Children

Abstract

1. Introduction

2. Related Works

Beyond the State-of-the-Art

3. Methods

3.1. EEG-Based Concentration Level Detection

EEG Data Augmentation Using WGAN-GP

3.2. Facial Expression Recognition

3.3. Theoretical Foundations of Proposed Bayesian Immediate Feedback Learning (BIFL)

3.3.1. Problem Definition

3.3.2. Bayesian Convergence of BIFL

3.3.3. Exploration–Exploitation Trade-Off in BIFL

3.3.4. Stimulus Selection and Decision-Making

3.3.5. Incremental Learning in BIFL

3.4. Multimodal Engagement Score Computation

3.5. Game Interface Design Using Unity Engine for Neuro-Affective Training

4. Experimental Design

Ethics and Data Privacy

5. Results

5.1. Brainwave Processing and Classification

Real-Time Testing and Model Generalization

5.2. Facial Expression Recognition

Real-Time Performance and Generalization

5.3. Experimental Results on Game Adaptation: BIFL

5.3.1. Short-Term Individual BIFL Experiment Example

Child 1 (ASD)

Child 2 (ADHD)

5.3.2. Group-Level Stimulus Distribution Analysis

5.3.3. Overall Evaluation of BIFL with 40 Children

5.4. Engagement Scores Analysis

5.5. Impact Assessment and Statistical Evaluation

6. Conclusions and Future Work

Limitations and Future Directions

Author Contributions

Funding

Institutional Review Board Statement

Informed Consent Statement

Data Availability Statement

Acknowledgments

Conflicts of Interest

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI