Article

Multi-Agent Reinforcement Learning Model Simulation for Attention-Deficit Hyperactivity Disorder Children

1 Multidisciplinary Laboratory of Research and Innovation, Moroccan School of Engineering Sciences, Casablanca 20250, Morocco
2 Research, Development, and Innovation Laboratory, Mundiapolis University, Casablanca 20180, Morocco
3 Laboratory of Nutrition, Health and Environment, Ibn Tofail University, Kenitra 14000, Morocco
* Author to whom correspondence should be addressed.
Appl. Sci. 2026, 16(4), 2158; https://doi.org/10.3390/app16042158
Submission received: 18 December 2025 / Revised: 29 January 2026 / Accepted: 31 January 2026 / Published: 23 February 2026
(This article belongs to the Section Computing and Artificial Intelligence)

Abstract

Background: Children with Attention-Deficit Hyperactivity Disorder (ADHD) face two core symptom domains: inattention and hyperactivity/impulsivity. These symptoms make daily life more challenging than it is for non-ADHD individuals. One step toward a better quality of life therefore involves cooperation and contact with the environment to better address this condition. Thanks to Artificial Intelligence (AI), doctors, caregivers, and parents are increasingly able to understand the hardships these children face; one such AI technique is Reinforcement Learning (RL). Methods: We propose an RL model simulation with 44 child agents with or without ADHD, using the Independent Deep Q Network (IDQN), Value Decomposition Network (VDN), and QMIX algorithms. Results: Across the three algorithms, the agents simulating children with ADHD found it more difficult to reach the maximum rewards than the agents simulating neurotypical children (peaks of 395 at episode 300 for non-ADHD versus 340 at episode 120 for ADHD with IDQN; 82 for non-ADHD versus 69 from episode 90 for ADHD with VDN; and 31 at episode 110 for non-ADHD versus 28 at episode 110 for ADHD with QMIX). Conclusions: The simulated ADHD agents struggle to reach the maximum rewards as consistently as the simulated neurotypical agents.

1. Introduction

Attention-Deficit Hyperactivity Disorder, or ADHD, is a neurodevelopmental disorder that affects children more frequently than adults [1]. Two main symptoms characterize this condition: inattention and hyperactivity/impulsivity. People with this disorder have difficulty staying focused on a task for as long as typically developing (TD) individuals, are easily distracted, quickly lose track of the main idea, move in all directions, have trouble waiting their turn, and exhibit many other behaviors [1]. ADHD affects boys more frequently than girls (15% versus 8%) [2].
Researchers have also confirmed that this disorder can be associated with other conditions. For example, in China, 17% of children with ADHD suffer from anxiety disorders [3]. Moreover, since COVID-19, people affected by this disorder have felt more isolated than those without ADHD [4]. A meta-analysis has confirmed that children with ADHD are 17.3 times more likely to be depressed [5]. According to [6], 70% of children with ADHD suffer from sleep disorders, and 40–50% of children with ADHD have speech and/or language impairments [7]. According to a systematic review, approximately 30% of children with ADHD are epileptic [8]. A study evaluating dysgraphia in Arabic found that 50% of children with ADHD have writing problems [9], and 20% of children with ADHD also have eating disorders [10]. Another study showed that 17 to 48% of children with ADHD injure themselves and fall more often than those without ADHD [11].
Several researchers have documented the difficulties these children experience. Ref. [12] noted that children with ADHD have a lower quality of life, lower levels of happiness, and higher levels of sibling bullying compared to children without ADHD. Ref. [13] found that this group is more prone to academic failure, social rejection, low self-esteem, emotional dysregulation, risk of delinquency in adulthood, and many other challenges. Ref. [14] demonstrated, in a two-year follow-up, that one of the reasons children and adolescents with ADHD have a more complicated school life is an alteration of executive function (EF). On this basis, numerous Artificial Intelligence (AI) techniques have been employed to support these children, including Machine Learning (ML), Deep Learning (DL), and Reinforcement Learning (RL). In this work, we propose a Multi-Agent Reinforcement Learning simulation in an ADHD context, adapting and comparing three value-based algorithms in an ADHD-inspired, non-clinical environment. The remainder of the paper is organized as follows: Section 2 reviews related work applying AI to ADHD research. Section 3 presents the principles of Reinforcement Learning and its extension to multiple agents (MARL). Section 4 describes our Reinforcement Learning methodology for ADHD. Section 5 presents the results and discussion, and Section 6 concludes the paper.

2. Related Works

Several Artificial Intelligence (AI) techniques have contributed to research on ADHD.
Monitoring and understanding cognitive and behavioral characteristics: Ref. [15] proposed an extended reality solution to understand and support children with ADHD through 3D graphics, immersive games, and various other tools, utilizing multifaceted data such as eye movements and voice. Ref. [16] proposed a simple fixation test using a tablet and the GraphPad Prism software to measure attention and gaze toward and away from a target. They found that, during simple fixation, children with ADHD produced more saccades, meaning their gaze moved more abruptly and more frequently than that of children without ADHD, and that, during anti-saccade tests, they required more correction time to orient toward their target than non-ADHD children. The authors of [17] employed Machine Learning (ML) and Deep Learning (DL) techniques, including Random Forest, a Temporal Convolutional Network (TCN), and Adaptive Control of Thought-Rational (ACT-R), to better understand the cognitive processes of individuals with ADHD, ultimately aiming to improve diagnosis and treatment. Their model achieved 98.21% accuracy and 93.86% recall. Other researchers have focused on why creativity in children with ADHD is higher than in typically developing children. For example, Ref. [18] conducted a pilot test using four stimulating environments, varying colors (from lighter to darker) and sounds (from instrumental to background). The results show that changes in these two factors significantly affect the children's creativity.
Other researchers have focused more on Reinforcement Learning (RL) techniques in the context of ADHD.
Assisted treatment and intervention with Reinforcement Learning: Ref. [19] proposed a framework that first applied the principles of Universal Design for Learning (UDL) and the RL technique Q-Learning to teach skills to a student with ADHD, and then recommended learning tools and stimuli based on the student's development. To achieve this, they treated the student's behavior as the environment, the state as the schema presented to the student, and the action as the agent's response to the environment, with a Q-table to store rewards. They then combined RL with UDL, which provides multiple means of representation, engagement, and expression. Representation is divided into four levels: reading and writing, visual, auditory, and kinesthetic. Expression, linked to representation, includes two stages, evaluation and interaction, each with four sub-stages. For evaluation, there is a final reading and writing exam, a quiz-style test for the visual and auditory parts, and embedded questions and quizzes for the kinesthetic part. For interaction, there are links in the reading–writing and visual sections, forums and debates for the auditory section, and debates and teacher–student interactions for the kinesthetic section. Engagement has two levels: the number of course units and their navigation, which comprises basic chapters and courses, horizontal sections, horizontal and vertical subsections, short clips, and customized programs. The results show improved student performance when combining RL with UDL. Ref. [20] developed an application and a toy using Q-Learning to better regulate the emotions of children with ADHD. Using a heart-rate bracelet, several assessments were conducted, and data were collected to evaluate the impact on stress reduction and user engagement. The methodology was as follows: first, to better understand ADHD, the authors conducted qualitative and quantitative research. Next, they developed a framework comprising three fundamental components of ADHD emotional regulation: regulation methods, children, and ADHD itself. They then integrated Q-Learning, storing rewards in a Q-table alongside the heart-rate data collected by the monitor. They also designed a small toy with sensory sensors. Finally, they developed an application that provides access to heart-rate data, meditation sessions, and other features. The objective of [21] was to improve health interventions for individuals with ADHD by using the RL technique SARSA and computational persuasion (CP) to develop a tool for adaptive, personalized computerized cognitive training (CCT). They set three goals: to verify the effectiveness of RL-based cognitive training, to explore advanced RL and CP techniques, and to conduct a clinical trial to estimate the impact of AI-driven CCT on cognitive functions. Ref. [22] aimed to create a multimodal framework with Deep Learning (DL) and RL to improve accuracy and decision-making. To achieve this, they used a multimodal approach integrating data types such as images, text, and sound; applied Transfer Learning to enhance object recognition performance; and employed RL techniques such as DQN, PPO, and A3C to enhance decision-making. They concluded that PPO and A3C achieved 90.2% accuracy. Ref. [23] used value-based Reinforcement Learning techniques to estimate the internal values and decision policies of 54 participants (28 with ADHD and 26 without) who were presented with a challenge comprising three tasks. In one task, for example, participants had to find an image of a puppy: there are two houses, each house contains two rooms, and each room contains two chests, with the dog image in one of them. The researchers observed that individuals with ADHD had more difficulty choosing the correct option and took longer to reflect than neurotypical individuals during the initial task.
Personalization and decision-making with Machine Learning and Reinforcement Learning: Ref. [24] proposed a combination of Machine Learning and Reinforcement Learning to better differentiate between Autism Spectrum Disorder (ASD), combined ASD and ADHD, and neurotypical individuals, as well as to enhance and optimize feature extraction in the model. For the Machine Learning part, they used a multi-task convolutional neural network (CNN) that takes as input a dataset built from two sources: (1) auxiliary tasks distinguishing ASD from non-ASD and (2) the target task differentiating ASD from ADHD. The features from these two steps are extracted separately, some shared and some private, and then combined according to the CNN model's requirements. For the Reinforcement Learning component, Q-Learning was used to optimize the model: an agent selects an action, namely a layer of the network, by consulting the Q-table for the current state or by choosing a random action, and adds that layer to the model. The model is then trained and validated based on the number of layers, and the reward is fed back to update the Q-table. When the number of layers reaches its limit, the search restarts. The results showed that combining these two techniques (ML and RL) increased performance by 11.07%.

3. Background

Reinforcement Learning (RL) is a subfield of Machine Learning (ML) [25] that, unlike supervised learning, learns from sequences of actions and rewards or punishments in an environment [26]. The field was largely established by Richard Sutton and Andrew Barto [27]. The essential components of RL are the agent, the entity that learns and acts; the environment, the setting in which the agent operates; the states, which describe the agent's situation; the actions, the tasks the agent can perform in its environment; the rewards or penalties, which the agent receives as feedback for positive or negative actions; and the policy, which guides the agent toward the actions that maximize its rewards [28,29].
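To make these components concrete, the following minimal Python sketch wires them together in a single training loop; the toy environment, its dynamics, and the random policy are illustrative assumptions, not the simulator used later in this paper.
import random

# Minimal agent-environment loop: states, actions, rewards, and a
# (trivial, random) policy. The dynamics are illustrative assumptions.
class ToyEnv:
    def reset(self):
        self.t = 0                 # the state here is simply the time step
        return self.t

    def step(self, action):
        self.t += 1
        reward = 1.0 if action == self.t % 2 else -1.0  # feedback signal
        done = self.t >= 10                             # episode ends after 10 steps
        return self.t, reward, done

env = ToyEnv()
state, done, episode_return = env.reset(), False, 0.0
while not done:
    action = random.choice([0, 1])          # the policy selects an action
    state, reward, done = env.step(action)
    episode_return += reward                # accumulate the return
print(f"episode return: {episode_return}")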
Single-agent Reinforcement Learning (RL) can be subdivided into K-Armed Bandit methods [30] and Markov Decision Processes (MDPs). MDP is a category of RL divided into two types: Model-Free (e.g., Monte Carlo methods or Temporal Difference methods) and Model-Based (e.g., Dynamic Programming) [28,30]. The Model-Free category itself is subdivided into two subcategories: value-based (Q-Learning, SARSA, or DQN [26,28,31,32,33,34,35]) and policy-based (REINFORCE [36,37] and actor–critic [38,39,40,41]). The Multi-Agent approach (MARL), on the other hand, involves multiple agents that act on environmental states and receive individual or collective rewards or penalties. MARL requires more complex models [42]. Figure 1 below shows a diagram related to MARL.
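Before moving to the multi-agent setting, the tabular Q-Learning update at the heart of the value-based, model-free family above can be sketched as follows; the hyperparameter values are illustrative assumptions.
import random
from collections import defaultdict

# Tabular Q-Learning: Q[(state, action)] estimates the expected return.
alpha, gamma, eps = 0.1, 0.99, 0.1   # learning rate, discount, exploration
actions = [0, 1]
Q = defaultdict(float)

def choose_action(state):
    # epsilon-greedy: explore with probability eps, otherwise exploit
    if random.random() < eps:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])

def q_update(s, a, r, s_next):
    # Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
    best_next = max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])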
MARL models can be classified as normal-form or stochastic games. The normal-form game, like the K-Armed Bandit, involves multiple agents but only one state; it is classified by reward type as zero-sum, common-goal, or general-sum. The stochastic game, by contrast, uses an environment composed of multiple states that evolve according to the agents' actions and transition functions [42]. However, MARL algorithms face several obstacles: non-stationarity, in which the environment changes with each agent's action; scalability, in which the joint action space grows with the number of agents; and credit assignment, the difficulty of distributing rewards fairly and equitably [43,44], especially when one agent contributes the most or the least. The figure below lists most of the challenges encountered in MARL.
There are several ways to categorize MARL techniques, such as by interaction mode or training mode. There are three modes of interaction: fully cooperative (Team-Q-Learning [45], OPTQTRAN [45], QTRAN or QTRAN++ [46], or Comm-MAPPO [45]), fully competitive (minimax-Q or DP-MA2C [45]), and mixed [45]. A fundamental concept in this category is the Nash equilibrium: an agent considers its strategy or policy optimal and will not change it as long as the other agents do not alter theirs [45,47,48,49]. Regarding the training mode, there are three categories: Decentralized Training and Decentralized Execution (DTDE) [50,51] (such as IPPO and IQL [49]), Centralized Training and Centralized Execution (CTCE) [50,51] (such as CommNet or BicNet [49]), and the hybrid category, Centralized Training and Decentralized Execution (CTDE) [50,51] (VDN, MAPPO, QMIX, and so on [49]). It is also possible to categorize MARL algorithms as value-based (G2Anet [52], Primary–Secondary Multi-Agent Policy Gradient [53]), policy-based (G2Anet [52], Primary–Secondary Multi-Agent Policy Gradient [53]), or both (MADDPG-M [54], MAGIC [55], DCSS [56]); by reward mode (VDN [57], QMIX [58], COMA [59]); or by communication (DIAL [60], ATOC [61], IC3Net [62], IMAC [63], LSC [64], NeurComm [65], ETCNet [66], IP [67], IS [68]), and so forth.

4. Materials and Methods

As mentioned in Section 3, algorithms can be categorized as value-based, policy-based, or both. In our case, individuals with ADHD tend to explore various options and hesitate frequently; in other words, if they are not well guided, they may make poor decisions, which can affect their future [69,70,71,72,73]. Therefore, we adopt a value-based, deterministic approach with full observability. Among the algorithms that meet these criteria are the Independent Deep Q Network (IDQN), the Value Decomposition Network (VDN), and QMIX. On this basis, we model our multi-agent Markov Decision Process (MDP) as the tuple (S, A, P, R, γ). However, handling an individual with ADHD requires a sequence of actions rather than a single action, so we use a deterministic Markov game, as shown in Figure 2; a minimal sketch of this tuple follows.
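The field names and the default discount factor in this sketch are our own illustrative choices.
from dataclasses import dataclass
from typing import Callable

# The (S, A, P, R, gamma) tuple of the deterministic Markov game.
@dataclass
class MarkovGame:
    states: list            # S: the four ADHD categories (Section 4.5)
    actions: list           # A: the available actions (Section 4.4)
    transition: Callable    # P(s, a) -> s', deterministic here
    reward: Callable        # R(s, a) -> float (Section 4.6)
    gamma: float = 0.99     # discount factor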

4.1. Objective Function

For a child or adolescent with ADHD, it is tough to address both symptoms at the same time; it is better to focus on one symptom at a time, which is the goal of our objective function, written below as Equation (1). The weighting factor α represents the relative importance of the two symptoms. The threshold is 0.5 [74,75,76] to better balance hyperactivity and inattention, promote stable learning, and avoid bias: if α is less than or equal to 0.5, we focus on inattention; otherwise, on hyperactivity. $R_{Inattention}$ and $R_{Hyperactivity}$ represent the penalties for inattention and hyperactivity, respectively, computed as 1 minus the attention reward and 1 minus the activity reward (the attention and activity rewards are defined in Section 4.6).
$$J = \min\left(\alpha R_{Inattention},\ (1-\alpha) R_{Hyperactivity}\right) \quad (1)$$
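A direct Python reading of Equation (1) is sketched below; the penalty values passed in are illustrative.
def objective(alpha, r_inattention, r_hyperactivity):
    # J = min(alpha * R_Inattention, (1 - alpha) * R_Hyperactivity)
    return min(alpha * r_inattention, (1 - alpha) * r_hyperactivity)

# alpha <= 0.5 focuses the objective on inattention (Section 4.1)
J = objective(alpha=0.4, r_inattention=0.7, r_hyperactivity=0.3)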

4.2. Environment

The environment is designed to capture multiple behavioral and contextual factors related to ADHD. For clarity and interpretability, these factors are illustrated as four conceptual feature groups: the first covers sweet foods, the second healthy foods, the third physical activity, and the fourth writing. This grouping is introduced solely as an explanatory abstraction reflecting domain knowledge and does not correspond to separate vector structures in the implementation. The sweet and healthy foods in the first and second groups are Boolean variables.
These variables are set to 1 if one or more agents consumed the food and 0 otherwise. In the third group, the physical-activity frequency field is 0 if the individual has performed no physical activity, 1 for 1 to 3 days of activity, 2 for 4 to 6 days, and 3 for daily activity; the physical-pain field is a Boolean variable equal to 0 if the individual has not been injured and 1 otherwise. In the fourth group, the writing-tilt field takes the value 0 if the tilt is straight, 1 if ascending, 2 if wavy, and 3 if descending. The letter-deviation field is 0 if the letters are written with a proper oblique slant, 1 if vertical, 2 if variably oblique, and 3 if left-oblique. The font-size field is 0 if the letter size is between 3 and 4 mm, 1 if it exceeds 5 mm, 2 if it is less than 2 mm, and 3 if the size is variable. The values of these variables change with the agents' actions. In practice, all features are combined into a unified representation used internally by the environment (illustrated in the sketch after this paragraph). To train the environment's prediction model, we use the XGBoost machine learning algorithm with a seed of 42 and an 80–20% train–test split, given the number of samples and the number of individuals per ADHD category. We then trained the agents using three MARL algorithms and established a benchmark in Section 4.8.
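The following sketch shows how one such unified feature representation could be assembled; the field names are hypothetical, since only the value encodings above are specified in the text.
def encode(sample):
    # Flatten the four conceptual groups into one feature vector.
    return [
        int(sample["ate_sweet_food"]),    # Boolean: sweet food consumed
        int(sample["ate_healthy_food"]),  # Boolean: healthy food consumed
        sample["activity_frequency"],     # ordinal 0-3: days of activity
        int(sample["physical_pain"]),     # Boolean: injured or not
        sample["writing_tilt"],           # ordinal 0-3: straight..descending
        sample["letter_deviation"],       # ordinal 0-3: oblique..left-oblique
        sample["font_size_class"],        # ordinal 0-3: 3-4 mm..variable
    ]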

4.3. Agents

In our MARL model, we have 90 agents, some with and some without ADHD. Among them, 44 are children, and 46 are adolescents. Among these children, 14 have both symptoms of ADHD, 8 have hyperactive ADHD, 5 have inattentive ADHD, and 17 do not have ADHD. Among these 46 adolescents, 7 are both hyperactive and inattentive, 9 are hyperactive, 6 are inattentive, and 24 do not have ADHD. In our study, we focus only on children.

4.4. Actions

In our modeling, we chose actions related to food, sports, or writing that can be classified as hyperactivity, activity, inattention, or attention. For example, the first, second, fifth, and sixth actions are hyperactivity actions; the third, fourth, and seventh relate to activity; the eighth and ninth are inattention actions; and the eleventh and twelfth relate to attention. Table 1 below describes and references these actions, and an illustrative index-to-category mapping is sketched below.
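For illustration, the mapping enumerated above can be written as a lookup table; the tenth action's category is given only in Table 1 and is therefore omitted here rather than guessed.
# Action index -> behavior category, as enumerated in the text (Table 1).
ACTION_CATEGORY = {
    1: "hyperactivity", 2: "hyperactivity", 5: "hyperactivity", 6: "hyperactivity",
    3: "activity", 4: "activity", 7: "activity",
    8: "inattention", 9: "inattention",
    11: "attention", 12: "attention",
}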

4.5. States

Since we are working with a deterministic Markovian game model, we represent the states as scalars ranging from 0 to 3. Four states are possible: a scalar value of 0 for ADHD agents with both symptoms, a value of 1 for hyperactive ADHD agents, a value of 2 for inattentive ADHD agents, or a value of 3 for agents without ADHD:
$$S_{state} \in \{S_0, S_1, S_2, S_3\} \quad (2)$$
$$\begin{cases} \text{If } S_{state} = S_0, \text{ then the agents are ADHD-C} \\ \text{If } S_{state} = S_1, \text{ then the agents are ADHD-H} \\ \text{If } S_{state} = S_2, \text{ then the agents are ADHD-I} \\ \text{If } S_{state} = S_3, \text{ then the agents are not ADHD} \end{cases} \quad (3)$$
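Equivalently, as a small lookup (category labels as in Section 4.3):
# Scalar state codes and their reverse mapping.
STATE = {"ADHD-C": 0, "ADHD-H": 1, "ADHD-I": 2, "non-ADHD": 3}
LABEL = {v: k for k, v in STATE.items()}  # scalar -> category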

4.6. Rewards

Rewards are subdivided into two sub-rewards: one for attention and the other for activity. Table 2 below and Equation (4) [85] detail the variables of these two sub-rewards.
$$\begin{cases} R(Att) = DFT + \dfrac{WSpe \times LS}{WSpa + 1} \\[6pt] R(Act) = \dfrac{Pro \times Kcal/100}{WC + 1} \end{cases} \quad (4)$$
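A Python sketch of Equation (4) follows; the variable names are taken from Table 2, and the grouping of terms reflects our reconstruction of the equation, so it should be checked against [85].
def r_attention(dft, w_spe, ls, w_spa):
    # R(Att) = DFT + (WSpe x LS) / (WSpa + 1)
    return dft + (w_spe * ls) / (w_spa + 1)

def r_activity(pro, kcal, wc):
    # R(Act) = (Pro x Kcal / 100) / (WC + 1)
    return (pro * kcal / 100) / (wc + 1)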

4.7. Reward Aggregation Function

$$G = \alpha R_{Attention} + (1-\alpha) R_{Activity} \quad (5)$$
The aggregation function in Equation (5) accumulates all rewards collected from the initial state to the final state. In our case, as the ADHD agents move from a non-final state to the final state, their rewards are accumulated, as sketched below.
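A minimal sketch of this accumulation, applying Equation (5) at every step; the step values are illustrative.
def aggregate(steps, alpha=0.5):
    # G = sum over steps of alpha * R_Attention + (1 - alpha) * R_Activity
    G = 0.0
    for r_att, r_act in steps:
        G += alpha * r_att + (1 - alpha) * r_act
    return G

G = aggregate([(0.8, 0.2), (0.5, 0.5), (0.9, 0.1)])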

4.8. Algorithms

There are numerous MARL algorithms, some of which are listed in Table 3 below. In our modeling, we aim to guide our agents (children) as effectively as possible to prevent confusion during decision-making. We also aim to distribute rewards fairly among the agents. These are the reasons why we chose the IDQN, VDN, and QMIX algorithms.
IDQN: Independent Deep Q Network is a MARL algorithm in which each agent learns its own Q-function independently and treats other agents as part of the environment [86]. The pseudocode of Algorithm 1 is described below.
Algorithm 1: Independent Deep Q Network (IDQN) [42]
Inputs: value networks with parameters $\theta_i$, target networks with parameters $\bar{\theta}_i$, replay buffers $D_i$, time steps T, number of agents n, observations o, actions a, rewards r, next observations $o^{t+1}$, mini-batch size B, state s, loss L.
Outputs: updated value and target network parameters $\theta_i$ and $\bar{\theta}_i$, targets $y$.
Initialize n value networks with random parameters $\theta_1, \theta_2, \ldots, \theta_n$;
Initialize n target networks with parameters $\bar{\theta}_1 = \theta_1, \ldots, \bar{\theta}_n = \theta_n$;
Initialize a replay buffer for each agent $D_1, D_2, \ldots, D_n$.
For time step t = 0, 1, 2, …, T do
     Collect current observations $o_1^t, o_2^t, \ldots, o_n^t$.
     For agent i = 1 to n do
          Choose an action (e.g., $\epsilon$-greedy).
     End For
     Apply actions $(a_1^t, a_2^t, \ldots, a_n^t)$; get rewards $r_1^t, r_2^t, \ldots, r_n^t$ and next observations $o_1^{t+1}, o_2^{t+1}, \ldots, o_n^{t+1}$.
     For agent i = 1 to n do
          Store transition $(h_i^t, a_i^t, r_i^t, h_i^{t+1})$ in replay buffer $D_i$;
          Sample a random mini-batch of B transitions $(h_i^k, a_i^k, r_i^k, h_i^{k+1})$ from $D_i$.
          If $s^{k+1}$ is terminal, then
               $y_i^k \leftarrow r_i^k$
          else
               $y_i^k \leftarrow r_i^k + \gamma \max_{a_i \in A_i} Q(h_i^{k+1}, a_i; \bar{\theta}_i)$.
          End If
          $L(\theta_i) \leftarrow \frac{1}{B} \sum_{k=1}^{B} \left( y_i^k - Q(h_i^k, a_i^k; \theta_i) \right)^2$
          Update parameters $\theta_i$ by minimizing the loss $L(\theta_i)$;
          Update target network parameters $\bar{\theta}_i$.
     End For
End For
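For concreteness, the sketch below implements one IDQN learning step for a single agent in PyTorch; the network sizes and the batch layout are illustrative assumptions consistent with Algorithm 1, not the paper's exact implementation.
import torch
import torch.nn as nn

class QNet(nn.Module):
    # Per-agent value network Q(h_i, .; theta_i)
    def __init__(self, obs_dim, n_actions, hidden=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, n_actions))

    def forward(self, obs):
        return self.net(obs)

def idqn_step(q, q_target, optimizer, batch, gamma=0.99):
    # batch: float tensors (obs, act, rew, next_obs, done) sampled from D_i;
    # act is int64, done is 0/1 to zero the bootstrap at terminal states.
    obs, act, rew, next_obs, done = batch
    with torch.no_grad():
        # y = r + gamma * max_a' Q(h', a'; theta_bar)
        y = rew + gamma * (1 - done) * q_target(next_obs).max(dim=1).values
    pred = q(obs).gather(1, act.unsqueeze(1)).squeeze(1)
    loss = nn.functional.mse_loss(pred, y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()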
VDN: The Value Decomposition Network (VDN) is a MARL algorithm that, unlike IQL, is decentralized only in execution. Here, the joint value function is decomposed into one value function per agent, as shown in Equation (6) and Figure 3 [42,86,88].
$$Q_{tot}(h, a) = \sum_{i=1}^{n} Q_i(h_i, a_i; \theta_i) \quad (6)$$
The pseudocode of Algorithm 2 is shown below.
Algorithm 2: Value Decomposition Network (VDN) [42]
Inputs: value networks with parameters $\theta_i$, target networks with parameters $\bar{\theta}_i$, shared replay buffer D, time steps T, number of agents n, observations o, actions a, shared reward r, next observations $o^{t+1}$, mini-batch size B, state s, loss L.
Outputs: updated value and target network parameters $\theta_i$ and $\bar{\theta}_i$, targets $y$.
Initialize n value networks with random parameters $\theta_1, \theta_2, \ldots, \theta_n$;
Initialize n target networks with parameters $\bar{\theta}_1 = \theta_1, \ldots, \bar{\theta}_n = \theta_n$;
Initialize a shared replay buffer D for all the agents.
For time step t = 0, 1, 2, …, T do
     Collect current observations $o_1^t, o_2^t, \ldots, o_n^t$.
     For agent i = 1 to n do
          Choose an action (e.g., $\epsilon$-greedy).
     End For
     Apply actions $(a_1^t, a_2^t, \ldots, a_n^t)$; get the shared reward $r^t$ and next observations $o_1^{t+1}, o_2^{t+1}, \ldots, o_n^{t+1}$.
     For agent i = 1 to n do
          Store transition $(h^t, a^t, r^t, h^{t+1})$ in the shared replay buffer D;
          Sample a random mini-batch of B transitions $(h^k, a^k, r^k, h^{k+1})$ from D.
          If $s^{k+1}$ is terminal, then
               $y^k \leftarrow r^k$
          else
               $y^k \leftarrow r^k + \gamma \sum_{i \in I} \max_{a_i \in A_i} Q(h_i^{k+1}, a_i; \bar{\theta}_i)$
          End If
          $L(\theta) \leftarrow \frac{1}{B} \sum_{k=1}^{B} \left( y^k - \sum_{i \in I} Q(h_i^k, a_i^k; \theta_i) \right)^2$
          Update parameters $\theta_i$ by minimizing the loss $L(\theta)$;
          Update target network parameters $\bar{\theta}_i$ for each agent i.
     End For
End For
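Building on the QNet sketch above (and its imports), the following sketch shows the VDN target and loss from Algorithm 2: per-agent utilities are summed into $Q_{tot}$ and regressed against a single shared reward. The batch layout is an illustrative assumption.
import torch

def vdn_step(q_nets, target_nets, optimizer, batch, gamma=0.99):
    # q_nets / target_nets: one QNet per agent;
    # obs, act, next_obs: per-agent lists of tensors; rew, done: shared tensors.
    obs, act, rew, next_obs, done = batch
    # Q_tot(h, a) = sum_i Q_i(h_i, a_i; theta_i)
    q_tot = sum(q(o).gather(1, a.unsqueeze(1)).squeeze(1)
                for q, o, a in zip(q_nets, obs, act))
    with torch.no_grad():
        next_tot = sum(t(no).max(dim=1).values
                       for t, no in zip(target_nets, next_obs))
        y = rew + gamma * (1 - done) * next_tot   # shared target
    loss = torch.nn.functional.mse_loss(q_tot, y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()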
QMIX is a MARL algorithm that, like VDN, uses a shared reward, and only its execution is decentralized. One of its improvements is that it does not need to decompose the joint value function additively; it only requires that $Q_{tot}$ and the individual utilities $Q_a$ reach their maximum simultaneously. The value function $Q_{tot}$ is therefore not fixed to a particular form and can be complex; it suffices to verify the monotonicity relation between $Q_{tot}$ and $Q_a$, given in Equation (7). Like VDN, QMIX represents each agent's utility function with a deep Q-network (DQN). To capture the monotonicity of the centralized action-value function with respect to the individual value functions, QMIX uses a mixing function $f_{mix}$, a feedforward neural network that combines the individual value functions to approximate the centralized action-value function, as in Equation (8). Equations (7) and (8) and Figure 4 are given below [42,89].
$$\frac{\partial Q_{tot}}{\partial Q_a} \ge 0, \quad \forall a \in A \quad (7)$$
$$Q_{tot}(h, a) = f_{mix}\left(Q(h_1, a_1; \theta_1), Q(h_2, a_2; \theta_2), \ldots, Q(h_n, a_n; \theta_n); \theta_{mix}\right) \quad (8)$$
The pseudocode of Algorithm 3 is as follows.
Algorithm 3: QMIX [42]
Inputs: value networks with parameters $\theta_i$, target networks with parameters $\bar{\theta}_i$, hypernetwork with parameters $\theta_{hyper}$, shared replay buffer D, time steps T, number of agents n, observations o, centralized information z, actions a, shared reward r, next observations $o^{t+1}$, mini-batch size B, state s, loss L.
Outputs: updated value and target network parameters $\theta_i$ and $\bar{\theta}_i$, targets $y$.
Initialize n value networks with random parameters $\theta_1, \theta_2, \ldots, \theta_n$;
Initialize n target networks with parameters $\bar{\theta}_1 = \theta_1, \ldots, \bar{\theta}_n = \theta_n$;
Initialize a shared replay buffer D for all the agents;
Initialize the hypernetwork with random parameters $\theta_{hyper}$.
For time step t = 0, 1, 2, …, T do
     Collect current observations $o_1^t, o_2^t, \ldots, o_n^t$.
     For agent i = 1 to n do
          Choose an action (e.g., $\epsilon$-greedy).
     End For
     Apply actions $(a_1^t, a_2^t, \ldots, a_n^t)$; get the shared reward $r^t$, next observations $o_1^{t+1}, o_2^{t+1}, \ldots, o_n^{t+1}$, and next centralized information $z^{t+1}$.
     For agent i = 1 to n do
          Store transition $(h^t, z^t, a^t, r^t, h^{t+1}, z^{t+1})$ in the shared replay buffer D;
          Sample a random mini-batch of B transitions $(h^k, z^k, a^k, r^k, h^{k+1}, z^{k+1})$ from D.
          If $s^{k+1}$ is terminal, then
               $y^k \leftarrow r^k$
          else
               Compute mixing parameters $\theta_{mix}^{k+1} \leftarrow f_{hyper}(z^{k+1}; \theta_{hyper})$;
               $y^k \leftarrow r^k + \gamma f_{mix}\left( \max_{a_1} Q(h_1^{k+1}, a_1; \bar{\theta}_1), \ldots, \max_{a_n} Q(h_n^{k+1}, a_n; \bar{\theta}_n); \theta_{mix}^{k+1} \right)$.
          End If
          Compute mixing parameters $\theta_{mix}^k \leftarrow f_{hyper}(z^k; \theta_{hyper})$;
          Value estimate $Q_{tot}(h^k, z^k, a^k; \theta) = f_{mix}\left( Q(h_1^k, a_1^k; \theta_1), Q(h_2^k, a_2^k; \theta_2), \ldots, Q(h_n^k, a_n^k; \theta_n); \theta_{mix}^k \right)$.
          $L(\theta) \leftarrow \frac{1}{B} \sum_{k=1}^{B} \left( y^k - Q_{tot}(h^k, z^k, a^k; \theta) \right)^2$
          Update parameters $\theta_i$ by minimizing the loss $L(\theta)$;
          Update target network parameters $\bar{\theta}_i$ for each agent i.
     End For
End For
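The sketch below shows a QMIX-style mixing network in PyTorch: a hypernetwork conditioned on the global state produces the mixing weights, and taking their absolute value enforces the monotonicity constraint of Equation (7). The embedding size and layer shapes are illustrative assumptions.
import torch
import torch.nn as nn

class QMixer(nn.Module):
    def __init__(self, n_agents, state_dim, embed=32):
        super().__init__()
        self.n_agents, self.embed = n_agents, embed
        # Hypernetworks: the global state generates the mixing weights/biases.
        self.w1 = nn.Linear(state_dim, n_agents * embed)
        self.b1 = nn.Linear(state_dim, embed)
        self.w2 = nn.Linear(state_dim, embed)
        self.b2 = nn.Linear(state_dim, 1)

    def forward(self, agent_qs, state):
        # agent_qs: (batch, n_agents); state: (batch, state_dim)
        w1 = torch.abs(self.w1(state)).view(-1, self.n_agents, self.embed)
        b1 = self.b1(state).unsqueeze(1)                   # (batch, 1, embed)
        hidden = torch.relu(torch.bmm(agent_qs.unsqueeze(1), w1) + b1)
        w2 = torch.abs(self.w2(state)).view(-1, self.embed, 1)
        q_tot = torch.bmm(hidden, w2) + self.b2(state).unsqueeze(1)
        return q_tot.view(-1)                              # Q_tot: (batch,)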

5. Discussion and Results

ADHD is a neurodevelopmental disorder that affects a patient's behavior from an early age and can worsen if it is not diagnosed and treated as early as possible. Fortunately, many AI techniques have helped doctors and caregivers better manage this disorder and support patients. One of these methods is Reinforcement Learning (RL): with this technique, it is possible to model a patient's behavior and thus better analyze their symptoms. However, in life, a child never learns alone; they need contact and communication with their environment, including friends, parents, teachers, doctors, or other children who understand them or are in the same situation. The same applies to the RL mechanism: a single agent cannot learn in isolation, and without cooperation, coordination, or communication it would face significant difficulties adapting to a constantly changing environment. This is one of the reasons why it is beneficial to simulate multiple ADHD agents, to better generalize their behavior and thus inform better treatments.
The innovation of this article lies in formulating a Multi-Agent Reinforcement Learning problem and applying practical algorithms in the context of Attention-Deficit Hyperactivity Disorder.
In our modeling, we began by developing a questionnaire on the daily habits of 106 individuals, some with ADHD and some without, and distributed it to the TDAH Maroc Enfants association and local schools. Among the respondents, 44 were children and 46 were adolescents; since this paper addresses only children, we focused on them. We stored their responses in a dataset and, to enrich it with attributes not directly captured by the questionnaire, incorporated two additional datasets: one provides kilocalorie and protein information for each food item, and the other contains features related to dysgraphia [90,91]. After integrating the three datasets, we observed that the first dataset has more rows than the other two; to address this, we trained a Generative Adversarial Network (GAN) on the second and third datasets. For data preprocessing, we employed the Synthetic Minority Over-sampling Technique (SMOTE) to balance the ADHD and non-ADHD classes. Then, with a seed of 42 to ensure deterministic, reproducible learning trajectories and to facilitate controlled analysis of the agents, we split the dataset into a training set of 80% and a test set of 20%, and used the XGBoost algorithm to predict the states, next states, and rewards (a sketch of this pipeline is given below). As for the MARL hyperparameters, we chose the number of states as the input dimension, the number of actions as the output dimension, a hidden dimension of either 64 or 128, a discount factor γ of 0.99, an initial ε of 1, an ε-decay factor of 0.995, a final ε of 0.05, 300 episodes, a maximum of 50 steps per episode, and a Boolean termination variable called done; the loop ends once the 50 steps of every episode are completed. We then represented an office environment as our setting through four vectors: one for sweet foods, one for healthy foods, one for physical activity, and one for writing, and identified specific actions that increase or decrease the symptoms of this disorder. When the agents perform these actions, they modify the fields of these vectors. The states fall into four scalar categories: 0 for ADHD-C, 1 for ADHD-H, 2 for ADHD-I, and 3 for non-ADHD.
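A hedged sketch of this preprocessing and prediction pipeline follows; the synthetic X and y merely stand in for the merged dataset's features and labels, which are not reproduced here.
from imblearn.over_sampling import SMOTE
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

# Placeholder for the merged questionnaire + food + dysgraphia dataset.
X, y = make_classification(n_samples=106, n_features=7,
                           weights=[0.4, 0.6], random_state=42)

# Balance the ADHD / non-ADHD classes, split 80-20 with seed 42, fit XGBoost.
X_bal, y_bal = SMOTE(random_state=42).fit_resample(X, y)
X_train, X_test, y_train, y_test = train_test_split(
    X_bal, y_bal, test_size=0.2, random_state=42)
model = XGBClassifier(random_state=42)
model.fit(X_train, y_train)
print("test accuracy:", model.score(X_test, y_test))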
Regarding rewards, we divided them into two sub-rewards: one for activity and the other for attention. We then selected three value-based MARL algorithms: IDQN, VDN, and QMIX. However, we limited the simulation to 44 agents, as increasing the number of agents would degrade training stability. Although DIAL is a value-based MARL technique, it was excluded due to computational and memory constraints, since it supports multiple communication channels, which increase resource consumption. The results of these three algorithms showed that children with ADHD reach lower peak rewards than children without ADHD.
Figure 5, Figure 6, Figure 7, Figure 8 and Figure 9 show that ADHD children consume pastries, soft drinks, and other sugary foods, resulting in an unbalanced diet. Additionally, ADHD children are more dysgraphic, may have an unbalanced speed of writing, and tend to be more aggressive than neurotypical children. Because the VDN and QMIX algorithms both employ centralized learning, they mitigate non-stationarity, thereby improving agent stability. The figures below show the rest of the results.
Figure 5 shows the percentages of sweet foods consumed by children with ADHD and children without ADHD. For example, 2.3% of inattentive children with ADHD eat ice cream, and 4.6% of children with ADHD with both predominant symptoms eat biscuits.
Figure 6, on the other hand, shows the percentages of consumption of certain healthy foods by children with ADHD. For example, 4.5% of children with ADHD with both predominant symptoms eat fish, and 1.4% of inattentive children with ADHD consume dried fruits.
Figure 7 highlights the percentages of dysgraphia among children with ADHD compared to those without ADHD. We can infer from this pie chart that 7.8% of children with ADHD have both symptoms, as well as dysgraphia.
Figure 8 demonstrates the percentage of children with or without ADHD by walking speed. For example, 13.3% of children with ADHD and the two prominent symptoms walk and/or run quickly, whereas only 6.7% of hyperactive children with ADHD walk and/or run more slowly.
Figure 9 shows the categories of aggressiveness according to the type of ADHD and non-ADHD children. We can conclude that 7.8% of ADHD children with both symptoms are very aggressive, while 4.4% of inattentive ADHD children are not.
Figure 10 shows the average rewards for children with ADHD and non-ADHD using the IDQN, where those of non-ADHD children surpass those of children with ADHD (the peak for non-ADHD is 395 at episode 300, compared to 340 at episode 120 for ADHD). However, the results also show some instability. The reason for this is that when children act alone, especially children with ADHD, they are not necessarily oriented towards the maximum rewards.
Figure 11 displays the average rewards for children with and without ADHD using VDN, showing that non-ADHD children have higher average rewards than those with ADHD (the ADHD peak is 69 starting from episode 90, compared to 82 for non-ADHD). Additionally, the results show some stability. One reason is that when children coordinate, synchronize, and share a reward, their performance is more stable than when they act independently.
Figure 12 presents the average rewards of children with and without ADHD using the QMIX algorithm. As in the previous figure, the average rewards of children without ADHD exceed those of children with ADHD (the peak for non-ADHD is 31 at episode 110, versus 28 at episode 110 for ADHD), but here the results show some instability. ADHD agents tend to be more explorative, so their behavior is more stochastic, and the use of QMIX, which employs a nonlinear, monotonic mixing function, may explain this variance.

6. Conclusions and Future Work

In conclusion, this paper listed some of the difficulties faced by children with ADHD, reviewed work that uses Artificial Intelligence (AI) techniques to help these children, and introduced the concept of Reinforcement Learning (RL) for a single agent and subsequently for multiple agents (MARL). We then proposed a methodology applying three algorithms from this discipline, with our agents being children with or without ADHD. We can infer that using Multi-Agent Reinforcement Learning (MARL) algorithms in a cooperative framework can improve agents' performance, particularly for agents with ADHD. Moreover, this suggests that ADHD agents can understand one another and be better supported when grouped. Other Artificial Intelligence tools, such as large language models (LLMs), may be used in the future to better support children with this neurodevelopmental disorder.

Author Contributions

Conceptualization: Z.N.; Methodology: Z.N., Z.H. and M.T.; Writing—original draft preparation: Z.N.; Writing—review and editing: M.T. and Z.H.; Resources: M.E.; Supervision: S.M. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

At the time of data collection, no formal institutional review board (IRB) or national ethics committee was available at the local level. However, the study protocol was reviewed and approved by the Scientific Council of the Health and Environment Laboratory, Faculty of Sciences, Ibn Tofaïl University, which acted as the local ethics body. The study was carried out in accordance with the principles of the Declaration of Helsinki (2013) and national ethical standards for research involving minors.

Informed Consent Statement

The study involved two populations: children aged 4–10 years, recruited through the TDAH Maroc Enfants association, and adolescents aged 12–16 years, recruited from local schools. Questionnaires were distributed in Arabic to ensure comprehension and accessibility. For children, the survey was administered individually or in small groups within the association, under the supervision of a qualified researcher who clarified instructions when needed, while parents were present in the environment without intervening. For adolescents, the questionnaires were completed individually in school settings under the supervision of a researcher or school psychologist. Parents were informed via a dedicated WhatsApp group created by Dr. Mounia Elhaddadi, which also allowed them to ask questions or refuse participation. Written informed consent was obtained from all parents/legal guardians, and verbal assent was obtained from children and adolescents prior to participation. Participation was voluntary, anonymity and confidentiality were guaranteed, and no personally identifiable information was collected.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Acknowledgments

The authors acknowledge the children from the TDAH Maroc Enfants association, the adolescents from the local schools, and their parents for their participation in this work.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
ADHD: Attention-Deficit Hyperactivity Disorder
EF: Executive Function
TD: Typically Developing
AI: Artificial Intelligence
ML: Machine Learning
DL: Deep Learning
RL: Reinforcement Learning
MARL: Multi-Agent Reinforcement Learning
IDQN: Independent Deep Q Network
VDN: Value Decomposition Network

References

  1. Desseilles, M.; Perroud, N.; Weibel, S. Manuel de L’hyperactivité et Du Déficit de L’attention: Le TDAH Chez L’adulte; Eyrolles: Paris, France, 2020. [Google Scholar]
  2. CDC. Data and Statistics on ADHD. Available online: https://www.cdc.gov/adhd/data/index.html (accessed on 15 December 2025).
  3. Liu, J.; Jiang, Z.; Li, F.; Zheng, Y.; Cui, Y.; Xu, H.; Li, Y. Prevalence and Comorbidity of Attention Deficit Hyperactivity Disorder in Chinese School-Attending Students Aged 6–16: A National Survey. Ann. Gen. Psychiatry 2025, 24, 23. [Google Scholar] [CrossRef] [PubMed]
  4. Laslo-Roth, R.; George-Levi, S.; Rosenstreich, E. Protecting Children with ADHD against Loneliness: Familial and Individual Factors Predicting Perceived Child’s Loneliness. Personal. Individ. Differ. 2021, 180, 110971. [Google Scholar] [CrossRef]
  5. Ingeborgrud, C.B.; Oerbeck, B.; Friis, S.; Zeiner, P.; Pripp, A.H.; Aase, H.; Biele, G.; Dalsgaard, S.; Overgaard, K.R. Anxiety and Depression from Age 3 to 8 Years in Children with and Without ADHD Symptoms. Sci. Rep. 2023, 13, 15376. [Google Scholar] [CrossRef] [PubMed]
  6. Nguyen-Thi-Phuong, M.; Nguyen-Thi-Thanh, M.; Goldberg, R.J.; Nguyen, H.L.; Dao-Thi-Minh, A.; Duong-Quy, S. Obstructive Sleep Apnea and Sleep Disorders in Children with Attention Deficit Hyperactivity Disorder. Pulm. Ther. 2025, 11, 423–441. [Google Scholar] [CrossRef]
  7. Parks, K.M.; Hannah, K.E.; Moreau, C.N.; Brainin, L.; Joanisse, M.F. Language Abilities in Children and Adolescents with DLD and ADHD: A Scoping Review. J. Commun. Disord. 2023, 106, 106381. [Google Scholar] [CrossRef]
  8. He, Z.; Yang, X.; Li, Y.; Zhao, X.; Li, J.; Li, B. Attention-deficit/Hyperactivity Disorder in Children with Epilepsy: A Systematic Review and Meta-Analysis of Prevalence and Risk Factors. Epilepsia Open 2024, 9, 1148–1165. [Google Scholar] [CrossRef]
  9. Lotfy, A.S.; Darwish, M.E.S.; Ramadan, E.S.; Sidhom, R.M. The Incidence of Dysgraphia in Arabic Language in Children with Attention-Deficit Hyperactivity Disorder. Egypt J. Otolaryngol. 2021, 37, 115. [Google Scholar] [CrossRef]
  10. Villa, F.M.; Crippa, A.; Rosi, E.; Nobile, M.; Brambilla, P.; Delvecchio, G. ADHD and Eating Disorders in Childhood and Adolescence: An Updated Minireview. J. Affect. Disord. 2023, 321, 265–271. [Google Scholar] [CrossRef]
  11. Alqarni, M.M.; Shati, A.A.; Alassiry, M.Z.; Asiri, W.M.; Alqahtani, S.S.; ALZomia, A.S.; Mahnashi, N.A.; Alqahtani, M.S.; Alamri, F.S.; Alqarni, M.M. Patterns of Injuries Among Children Diagnosed with Attention Deficit Hyperactivity Disorder in Aseer Region, Southwestern Saudi Arabia. Cureus 2021, 13, e17396. [Google Scholar] [CrossRef]
  12. French, B.; Nalbant, G.; Wright, H.; Sayal, K.; Daley, D.; Groom, M.J.; Cassidy, S.; Hall, C.L. The Impacts Associated with Having ADHD: An Umbrella Review. Front. Psychiatry 2024, 15, 1343314. [Google Scholar] [CrossRef]
  13. Banaschewski, T.; Häge, A.; Hohmann, S.; Mechler, K. Perspectives on ADHD in Children and Adolescents as a Social Construct amidst Rising Prevalence of Diagnosis and Medication Use. Front. Psychiatry 2024, 14, 1289157. [Google Scholar] [CrossRef] [PubMed]
  14. Jensen, V.H.; Orm, S.; Øie, M.G.; Andersen, P.N.; Hovik, K.T.; Skogli, E.W. Executive Functions and ADHD Symptoms Predict Educational Functioning in Children with ADHD: A Two-Year Longitudinal Study. Appl. Neuropsychol. Child 2025, 14, 225–235. [Google Scholar] [CrossRef] [PubMed]
  15. Byun, J.; Joung, C.; Lee, Y.; Lee, S.; Won, W. Le Petit Care: A Child-Attuned Design for Personalized ADHD Symptom Management Through AI-Powered Extended Reality. In Proceedings of the Extended Abstracts of the CHI Conference on Human Factors in Computing Systems; Association for Computing Machinery: New York, NY, USA, 2025; pp. 1–7. [Google Scholar]
  16. Chen, X.; Wang, S.; Yang, X.; Yu, C.; Ni, F.; Yang, J.; Tian, Y.; Ye, J.; Liu, H.; Luo, R. Utilizing Artificial Intelligence-Based Eye Tracking Technology for Screening ADHD Symptoms in Children. Front. Psychiatry 2023, 14, 1260031. [Google Scholar] [CrossRef] [PubMed]
  17. Yu, D.; Fang, J. hui Using Artificial Intelligence Methods to Study the Effectiveness of Exercise in Patients with ADHD. Front. Neurosci. 2024, 18, 1380886. [Google Scholar] [CrossRef]
  18. Busby, A.; Wijetunge, M.N.R.; Jayadas, A. Promoting Creativity Among the Students with ADHD in Universities in USA: A Controlled Experiment on the Effects of Environmental Stimulus. A B E-J. 2025, 1, 1–26. [Google Scholar]
  19. Dahan, A.; Roth, N.; Pelosi, A.D.; Reiner, M. A Reinforcement Learning Framework for Personalized Adaptive E-Learning. In Advanced Technologies and the University of the Future; Vendrell Vidal, E., Cukierman, U.R., Auer, M.E., Eds.; Lecture Notes in Networks and Systems; Springer Nature: Cham, Switzerland, 2025; Volume 1140, pp. 141–162. ISBN 978-3-031-71529-7. [Google Scholar]
  20. Tejasvi, P.; Kumar, T. A Smart System Facilitating Emotional Regulation in Neurodivergent Children. Procedia Comput. Sci. 2024, 235, 3257–3270. [Google Scholar] [CrossRef]
  21. Boschello, F.; Conca, A.; Donadello, I.; Giupponi, G.; Holzer, S.; Zini, F. Towards AI-Based Cognitive Training for Adult ADHD Patients. In Proceedings of the First International Conference on AI in Medicine and Healthcare (AiMH’ 2025), Innsbruck, Austria, 8–10 April 2025. [Google Scholar]
  22. Bansal, D.; Verma, A.; Sharma, A.; Kapoor, I.; Reddy, V.; Patel, A. Improving Object Recognition and Diagnostics with Advanced Learning Techniques. 2024. Available online: https://www.researchgate.net/publication/384844639_Improving_Object_Recognition_and_Diagnostics_with_Advanced_Learning_Techniques (accessed on 18 October 2025).
  23. Katabi, G.; Shahar, N. Exploring the Steps of Learning: Computational Modeling of Initiatory-Actions among Individuals with Attention-Deficit/Hyperactivity Disorder. Transl. Psychiatry 2024, 14, 10. [Google Scholar] [CrossRef]
  24. Dong, H.; Chen, D.; Chen, Y.; Tang, Y.; Yin, D.; Li, X. A Multi-Task Learning Model with Reinforcement Optimization for ASD Comorbidity Discrimination. Comput. Methods Programs Biomed. 2024, 243, 107865. [Google Scholar] [CrossRef]
  25. Zhang, S.; Song, H.; Wang, Q.; Pei, Y. Fuzzy Logic Guided Reward Function Variation: An Oracle for Testing Reinforcement Learning Programs. arXiv 2024, arXiv:2406.19812. [Google Scholar] [CrossRef]
  26. Gupta, A.; Badr, Y.; Negahban, A.; Qiu, R.G. Energy-Efficient Heating Control for Smart Buildings with Deep Reinforcement Learning. J. Build. Eng. 2021, 34, 101739. [Google Scholar] [CrossRef]
  27. Sutton, R.S.; Barto, A.G. Reinforcement Learning: An Introduction; MIT Press: Cambridge, MA, USA, 2018. [Google Scholar]
  28. Naeem, M.; Rizvi, S.T.H.; Coronato, A. A Gentle Introduction to Reinforcement Learning and Its Application in Different Fields. IEEE Access 2020, 8, 209320–209344. [Google Scholar] [CrossRef]
  29. Tedeschi, T.; Ciangottini, D.; Baioletti, M.; Poggioni, V.; Spiga, D.; Storchi, L.; Tracolli, M. Smart Caching in a Data Lake for High Energy Physics Analysis. arXiv 2022, arXiv:2208.06437. [Google Scholar] [CrossRef]
  30. Moussaoui, H.; El Akkad, N.; Benslimane, M. Reinforcement Learning: A Review. IJCDS 2023, 13, 1465–1483. [Google Scholar] [CrossRef] [PubMed]
  31. Malibari, N.; Katib, I.; Mehmood, R. Systematic Review on Reinforcement Learning in the Field of Fintech. arXiv 2023, arXiv:2305.07466. [Google Scholar] [CrossRef]
  32. Zhong, L. Comparison of Q-Learning and SARSA Reinforcement Learning Models on Cliff Walking Problem; Atlantis Press: Dordrecht, The Netherlands, 2024; pp. 207–213. [Google Scholar]
  33. Liu, Y.; Yang, J.; Chen, L.; Guo, T.; Jiang, Y. Overview of Reinforcement Learning Based on Value and Policy. In Proceedings of the 2020 Chinese Control And Decision Conference (CCDC), Hefei, China, 22–24 August 2020; IEEE: New York, NY, USA, 2020; pp. 598–603. [Google Scholar]
  34. Van Hasselt, H.; Guez, A.; Silver, D. Deep Reinforcement Learning with Double Q-Learning. In Proceedings of the AAAI Conference on Artificial Intelligence, Phoenix, AZ, USA, 12–17 February 2016; Association for the Advancement of Artificial Intelligence: Washington, DC, USA, 2016; Volume 30. [Google Scholar]
  35. Zhou, S.; Liu, X.; Xu, Y.; Guo, J. A Deep Q-Network (DQN) Based Path Planning Method for Mobile Robots. In Proceedings of the 2018 IEEE International Conference on Information and Automation (ICIA), Wuyishan, China, 11–13 August 2018; IEEE: New York, NY, USA, 2018; p. 371. [Google Scholar]
  36. Tournaire, T. Model-Based Reinforcement Learning for Dynamic Resource Allocation in Cloud Environments. Ph.D. Thesis, Institut Polytechnique de Paris, Paris, France, 2022. [Google Scholar]
  37. Sivamayil, K.; Rajasekar, E.; Aljafari, B.; Nikolovski, S.; Vairavasundaram, S.; Vairavasundaram, I. A Systematic Study on Reinforcement Learning Based Applications. Energies 2023, 16, 1512. [Google Scholar] [CrossRef]
  38. Siboo, S.; Bhattacharyya, A.; Raj, R.N.; Ashwin, S.H. An Empirical Study of DDPG and PPO-Based Reinforcement Learning Algorithms for Autonomous Driving. IEEE Access 2023, 11, 125094–125108. [Google Scholar] [CrossRef]
  39. Lillicrap, T.P.; Hunt, J.J.; Pritzel, A.; Heess, N.M.O.; Erez, T.; Tassa, Y.; Silver, D.; Wierstra, D.P. Continuous Control with Deep Reinforcement Learning. U.S. Patent No 10,776,692, 15 September 2020. [Google Scholar]
  40. Bick, D. Towards Delivering a Coherent Self-Contained Explanation of Proximal Policy Optimization. Ph.D. Thesis, University of Groningen, Groningen, The Netherlands, 2021. [Google Scholar]
  41. Schulman, J.; Wolski, F.; Dhariwal, P.; Radford, A.; Klimov, O. Proximal Policy Optimization Algorithms. arXiv 2017, arXiv:1707.06347. [Google Scholar] [CrossRef]
  42. Albrecht, S.V.; Christianos, F.; Schäfer, L. Multi-Agent Reinforcement Learning: Foundations and Modern Approaches; MIT Press: Cambridge, MA, USA, 2024. [Google Scholar]
  43. Yuan, L.; Zhang, Z.; Li, L.; Guan, C.; Yu, Y. A Survey of Progress on Cooperative Multi-Agent Reinforcement Learning in Open Environment. arXiv 2023, arXiv:2312.01058. [Google Scholar] [CrossRef]
  44. Ning, Z.; Xie, L. A Survey on Multi-Agent Reinforcement Learning and Its Application. J. Autom. Intell. 2024, 3, 73–91. [Google Scholar] [CrossRef]
  45. Liang, J.; Miao, H.; Li, K.; Tan, J.; Wang, X.; Luo, R.; Jiang, Y. A Review of Multi-Agent Reinforcement Learning Algorithms. Electronics 2025, 14, 820. [Google Scholar] [CrossRef]
  46. Son, K.; Kim, D.; Kang, W.J.; Hostallero, D.E.; Yi, Y. Qtran: Learning to Factorize with Transformation for Cooperative Multi-Agent Reinforcement Learning. In Proceedings of the International Conference on Machine Learning; PMLR: Cambridge MA, USA, 2019; pp. 5887–5896. [Google Scholar]
  47. Ryu, H.; Shin, H.; Park, J. Cooperative and Competitive Biases for Multi-Agent Reinforcement Learning. arXiv 2021, arXiv:2101.06890. [Google Scholar] [CrossRef]
  48. Maisonhaute, T.; Michel, F.; Soulie, J.-C. État de l’art Sur Les Approches En Apprentissage Par Renforcement Multi-Agent. In Proceedings of the JFSMA 2024; Cépaduès: Toulouse, France, 2024; pp. 99–108. [Google Scholar]
  49. Zhang, Y. DQN for Coordinating Multi-Agent Cooking. Highlights Sci. Eng. Technol. 2023, 39, 1228–1238. [Google Scholar] [CrossRef]
  50. Wang, Y.; Wang, Y.; Tian, F.; Ma, J.; Jin, Q. Intelligent Games Meeting with Multi-Agent Deep Reinforcement Learning: A Comprehensive Review. Artif. Intell. Rev. 2025, 58, 165. [Google Scholar] [CrossRef]
  51. Li, Z.; Chen, X.; Fu, J.; Xie, N.; Zhao, T. Reducing Q-Value Estimation Bias via Mutual Estimation and Softmax Operation in MADRL. Algorithms 2024, 17, 36. [Google Scholar] [CrossRef]
  52. Liu, Y.; Wang, W.; Hu, Y.; Hao, J.; Chen, X.; Gao, Y. Multi-Agent Game Abstraction via Graph Attention Neural Network. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; AAAI Press: Palo Alto, CA, USA, 2020; Volume 34, pp. 7211–7218. [Google Scholar]
  53. Kong, X.; Xin, B.; Liu, F.; Wang, Y. Revisiting the Master-Slave Architecture in Multi-Agent Deep Reinforcement Learning. arXiv 2017, arXiv:1712.07305. [Google Scholar]
  54. Kilinc, O.; Montana, G. Multi-Agent Deep Reinforcement Learning with Extremely Noisy Observations. arXiv 2018, arXiv:1812.00922. [Google Scholar] [CrossRef]
  55. Niu, Y.; Paleja, R.R.; Gombolay, M.C. Multi-Agent Graph-Attention Communication and Teaming. In Proceedings of the AAMAS, Online, 3–7 May 2021; Volume 21, p. 20. [Google Scholar]
  56. Tucker, M.; Li, H.; Agrawal, S.; Hughes, D.; Sycara, K.; Lewis, M.; Shah, J.A. Emergent Discrete Communication in Semantic Spaces. In Proceedings of the Advances in Neural Information Processing Systems; Curran Associates, Inc.: Red Hook, NY, USA, 2021; Volume 34, pp. 10574–10586. [Google Scholar]
  57. Gohari, P.; Hale, M.; Topcu, U. Privacy-Engineered Value Decomposition Networks for Cooperative Multi-Agent Reinforcement Learning. In Proceedings of the 2023 62nd IEEE Conference on Decision and Control (CDC), Singapore, 13–15 December 2023; IEEE: New York, NY, USA, 2023; pp. 8038–8044. [Google Scholar]
  58. Hu, J.; Jiang, S.; Harding, S.A.; Wu, H.; Liao, S. Rethinking the Implementation Tricks and Monotonicity Constraint in Cooperative Multi-Agent Reinforcement Learning. arXiv 2023, arXiv:2102.03479. [Google Scholar] [CrossRef]
  59. Canese, L.; Cardarilli, G.C.; Di Nunzio, L.; Fazzolari, R.; Giardino, D.; Re, M.; Spanò, S. Multi-Agent Reinforcement Learning: A Review of Challenges and Applications. Appl. Sci. 2021, 11, 4948. [Google Scholar] [CrossRef]
  60. Foerster, J.; Assael, I.A.; de Freitas, N.; Whiteson, S. Learning to Communicate with Deep Multi-Agent Reinforcement Learning. In Proceedings of the Advances in Neural Information Processing Systems; Curran Associates, Inc.: Red Hook, NY, USA, 2016; Volume 29. [Google Scholar]
  61. Jiang, J.; Lu, Z. Learning Attentional Communication for Multi-Agent Cooperation. In Advances in Neural Information Processing Systems; Curran Associates: Sydney, NSW, Australia, 2018. [Google Scholar]
  62. Singh, A.; Jain, T.; Sukhbaatar, S. Learning When to Communicate at Scale in Multiagent Cooperative and Competitive Tasks. arXiv 2018, arXiv:1812.09755. [Google Scholar] [CrossRef]
  63. Wang, R.; He, X.; Yu, R.; Qiu, W.; An, B.; Rabinovich, Z. Learning Efficient Multi-Agent Communication: An Information Bottleneck Approach. In Proceedings of the 37th International Conference on Machine Learning; PMLR: Cambridge MA, USA, 2020; pp. 9908–9918. [Google Scholar]
  64. Sheng, J.; Wang, X.; Jin, B.; Yan, J.; Li, W.; Chang, T.-H.; Wang, J.; Zha, H. Learning Structured Communication for Multi-Agent Reinforcement Learning. Auton. Agents Multi-Agent Syst. 2022, 36, 50. [Google Scholar] [CrossRef]
  65. Chu, T.; Chinchali, S.; Katti, S. Multi-Agent Reinforcement Learning for Networked System Control. arXiv 2020, arXiv:2004.01339. [Google Scholar] [CrossRef]
  66. Hu, G.; Zhu, Y.; Zhao, D.; Zhao, M.; Hao, J. Event-Triggered Multi-Agent Reinforcement Learning with Communication under Limited-Bandwidth Constraint. arXiv 2020, arXiv:2010.04978. [Google Scholar]
  67. Qu, C.; Li, H.; Liu, C.; Xiong, J.; Chu, W.; Wang, W.; Qi, Y.; Song, L. Intention propagation for multi-agent reinforcement learning. arXiv 2020, arXiv:2002.07085. [Google Scholar]
  68. Kim, W.; Park, J.; Sung, Y. Communication in multi-agent reinforcement learning: Intention sharing. In Proceedings of the International Conference on Learning Representations (ICLR), Addis Ababa, Ethiopia, 26–30 April 2020. [Google Scholar]
  69. Yusriyyah, Q.N.; Aziz, A.R.H.; Setiawati, Y.; Dianasari, D.; Pradanita, V.N.; Ardani, I.G.A.I. Learning Disorder in Attention Deficit Hyperactivity Disorder (ADHD) Children: A Literature Review. Int. J. Sci. Adv. 2023, 4, 15–18. [Google Scholar] [CrossRef]
  70. Ballard, R.; Sadhu, J. Anticipatory Guidance for Children and Adolescents with Attention-Deficit/Hyperactivity Disorder. Pediatr. Ann. 2025, 54, e34–e39. [Google Scholar] [CrossRef] [PubMed]
  71. Vaughn, M. Confidently Parenting ADHD Children: Practical Tips and Guidance; Independently Published (Amazon KDP): Seattle, WA, USA, 2024. [Google Scholar]
  72. Cuevas, B.P.G.; Carreño, Y.A.M.; Gamboa, M.R. Integrating Scaffolding Techniques into Listening Comprehension Activities for English Language Learning in Students with ADHD. REGARD 2024, 8, 40. [Google Scholar]
  73. Wojciechowska, K.; Turek, M.; Jaroń, A.; Jastrzębska, K.; Witkowska, M.; Skotnicka, J.; Błaszczak, K.; Borkowski, A.; Sawicki, M. ADHD-Treatment Options and Consequences of Neglect. Qual. Sport 2024, 17, 53428. [Google Scholar] [CrossRef]
  74. Lyu, J.; Ishwaran, H. Commentary: To Classify Means to Choose a Threshold. J. Thorac. Cardiovasc. Surg. 2023, 165, 1443–1445. [Google Scholar] [CrossRef]
  75. Van den Goorbergh, R.; van Smeden, M.; Timmerman, D.; Van Calster, B. The Harm of Class Imbalance Corrections for Risk Prediction Models: Illustration and Simulation Using Logistic Regression. J. Am. Med. Inform. Assoc. 2022, 29, 1525–1534. [Google Scholar] [CrossRef]
  76. Rajaraman, S.; Ganesan, P.; Antani, S. Deep Learning Model Calibration for Improving Performance in Class-Imbalanced Medical Image Classification Tasks. PLoS ONE 2022, 17, e0262838. [Google Scholar] [CrossRef]
  77. Salvat, H.; Mohammadi, M.N.; Molavi, P.; Mostafavi, S.A.; Rostami, R.; Salehinejad, M.A. Nutrient Intake, Dietary Patterns, and Anthropometric Variables of Children with ADHD in Comparison to Healthy Controls: A Case-Control Study. BMC Pediatr. 2022, 22, 70. [Google Scholar] [CrossRef]
  78. Ahn, J.; Shin, J.; Park, H.; Ha, J.-W. Increased Risk of Injury and Adult Attention Deficit Hyperactivity Disorder and Effects of Pharmacotherapy: A Nationwide Longitudinal Cohort Study in South Korea. Front. Psychiatry 2024, 15, 1453100. [Google Scholar] [CrossRef]
  79. Hyde, C.; Fuelscher, I.; Rosch, K.S.; Seymour, K.E.; Crocetti, D.; Silk, T.; Singh, M.; Mostofsky, S.H. Subtle Motor Signs in Children with ADHD and Their White Matter Correlates. Hum. Brain Mapp. 2024, 45, e70002. [Google Scholar] [CrossRef]
  80. Meachon, E.J.; Klupp, S.; Grob, A. Gait in Children with and without ADHD: A Systematic Literature Review. Gait Posture 2023, 104, 31–42. [Google Scholar] [CrossRef] [PubMed]
  81. Downing, C.; Caravolas, M. Handwriting Legibility and Fluency and Their Patterns of Concurrent Relations with Spelling, Graphomotor, and Selective Attention Skills. J. Exp. Child Psychol. 2023, 236, 105756. [Google Scholar] [CrossRef] [PubMed]
  82. Katsarou, D.V.; Efthymiou, E.; Kougioumtzis, G.A.; Sofologi, M.; Theodoratou, M. Identifying Language Development in Children with ADHD: Differential Challenges, Interventions, and Collaborative Strategies. Children 2024, 11, 841. [Google Scholar] [CrossRef] [PubMed]
  83. Kaplan Kılıç, B.; Bumin, G.; Öğütlü, H. Effect of Telerehabilitation on Handwriting Performance in Children With Attention Deficit Hyperactivity Disorder: Randomized Controlled Trial. Child Care Health Dev. 2025, 51, e70055. [Google Scholar] [CrossRef]
  84. Santos, W.M.D.; de Albuquerque, A.R. Effect of Words Highlighting in School Tasks upon Typical ADHD Behaviors. Psicol. Teor. Pesqui. 2021, 37, e37302. [Google Scholar] [CrossRef]
  85. Namasse, Z.; Hidila, Z.; Tabaa, M.; Elhaddadi, M.; Mouchawrab, S. Cure-Free: A Free-Model Reinforcement Learning Approach for the ADHD Children. In Emerging Technologies for Developing Countries; 2026, in press.
  86. Amato, C. A First Introduction to Cooperative Multi-Agent Reinforcement Learning. 2024, in press.
  87. Zhu, C.; Dastani, M.; Wang, S. A Survey of Multi-Agent Deep Reinforcement Learning with Communication. Auton. Agents Multi-Agent Syst. 2024, 38, 4. [Google Scholar] [CrossRef]
  88. Amato, C. An Introduction to Centralized Training for Decentralized Execution in Cooperative Multi-Agent Reinforcement Learning. arXiv 2024, arXiv:2409.03052. [Google Scholar] [CrossRef]
  89. Fang, X.; Cui, P.; Wang, Q. Multiple Agents Cooperative Control Based on QMIX Algorithm in SC2LE Environment. In Proceedings of the 2020 7th International Conference on Information, Cybernetics, and Computational Social Systems (ICCSS), Guangzhou, China, 13–15 November 2020; IEEE: New York, NY, USA, 2020; pp. 435–439. [Google Scholar]
  90. U.S. Department of Agriculture. SuperTracker: Source Code and Foods Database. Available online: https://catalog.data.gov/dataset/supertracker-source-code-and-foods-database (accessed on 30 January 2026).
  91. Cohen, R.; Cohen-Kroitoru, B.; Halevy, A.; Aharoni, S.; Aizenberg, I.; Shuper, A. Handwriting in Children with Attention Deficient Hyperactive Disorder: Role of Graphology. BMC Pediatr. 2019, 19, 484. [Google Scholar] [CrossRef]
Figure 1. Multi-Agent Reinforcement Learning (MARL) Components [43].
Figure 2. Schema.
Figure 3. Value Decomposition Network (VDN) [42].
Figure 4. QMIX [42].
Figure 5. Attention-Deficit Hyperactivity Disorder (ADHD) Feedback and Sugary Food.
Figure 6. ADHD Feedback and Healthy Food.
Figure 7. ADHD Feedback and Dysgraphia.
Figure 8. ADHD Feedback and Walking Speed.
Figure 9. ADHD Feedback and Aggressiveness.
Figure 10. Average rewards of children with ADHD and without ADHD according to the Independent Deep Q Network (IDQN) algorithm.
Figure 11. Average rewards of children with ADHD and without ADHD according to the VDN algorithm.
Figure 12. Average rewards of children with ADHD and without ADHD according to the QMIX algorithm.
Table 1. Agents’ actions.

Index | Action | Explanation
1 | Eat sugary food | According to [77], children with ADHD eat more sugary food than children without ADHD.
2 | Eat less protein | [77] found that children with ADHD consume less protein-based food.
3 | Eat more protein | [77] concluded that increased protein intake reduces ADHD symptoms.
4 | Eat less sugary food | In agreement with [77], a decrease in sugar consumption could alleviate ADHD symptoms.
5 | Fall | A systematic review cited by [78] indicates that children with ADHD sustain more fractures than those without ADHD.
6 | Move clumsily | According to a study cited by [79], children with ADHD show more widespread subtle motor signs and greater motor overflow than children without ADHD.
7 | Walk more slowly | Studies cited by [80] indicate that a slower walking pace could alleviate ADHD symptoms.
8 | Make drawing mistakes | According to [81], children with ADHD make more errors in drawing than neurotypical children.
9 | Write an incoherent text | [82] mention that children with ADHD write more incoherent texts than children without ADHD.
10 | Write more often | The findings of [83] highlight that more frequent writing practice may increase attention in individuals with ADHD.
11 | Highlight the words | A study by [84] suggests that highlighting words can improve attention in individuals with ADHD.
12 | Summarize the paragraphs | [82] propose paragraph brainstorming as an intervention to reduce inattention in individuals with ADHD.
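To make the action set concrete, the sketch below encodes the twelve actions of Table 1 as a discrete action space. This is a minimal illustration in Python: the enum names and the grouping into symptom-driven versus symptom-alleviating behaviours are illustrative labels, not the simulation’s actual implementation.

```python
# Minimal sketch (not the simulation's actual code): Table 1's twelve
# actions expressed as a discrete action space. Names are illustrative.
from enum import IntEnum

class ChildAction(IntEnum):
    EAT_SUGARY_FOOD = 1
    EAT_LESS_PROTEIN = 2
    EAT_MORE_PROTEIN = 3
    EAT_LESS_SUGARY_FOOD = 4
    FALL = 5
    MOVE_CLUMSILY = 6
    WALK_MORE_SLOWLY = 7
    MAKE_DRAWING_MISTAKES = 8
    WRITE_INCOHERENT_TEXT = 9
    WRITE_MORE_OFTEN = 10
    HIGHLIGHT_WORDS = 11
    SUMMARIZE_PARAGRAPHS = 12

# Actions 3, 4, 7, 10, 11, and 12 correspond to behaviours the cited
# studies associate with alleviated symptoms; the remaining actions
# correspond to symptom-driven behaviours.
```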
Table 2. Reward Variables.

Variable | Meaning | Coding
DFT | Duration of focus during a task | 0: less than 10 min; 1: 10–15 min; 2: 15–20 min; 3: 20–30 min
WSpe | Writing speed | 0: normal; 1: high; 2: low
LS | Letter spacing | 0: normal (1–2 mm); 1: large; 2: narrow; 3: irregularly spaced
WSpa | Word spacing | 0: normal; 1: large; 2: narrow; 3: irregularly spaced
Pro | Proteins | 0: 0–20; 1: 21–40; 2: 41–60
Kcal | Kilocalories | 0: 0–200; 1: 201–400; 2: 401–600
WC | Walking cadence | 0: very slow; 1: slow; 2: normal; 3: fast; 4: very fast
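One plausible way to collapse Table 2’s coded variables into a scalar reward is sketched below. The unit weights, signs, and the helper name table2_reward are assumptions made for illustration; the simulation’s exact reward function may weight these variables differently.

```python
# Hedged sketch of a reward built from Table 2's coded variables.
# Unit weights and signs are illustrative assumptions.
def table2_reward(dft: int, wspe: int, ls: int, wspa: int,
                  pro: int, kcal: int, wc: int) -> float:
    r = 0.0
    r += dft               # longer focus (codes 0-3) is rewarded
    r -= int(wspe != 0)    # abnormal writing speed is penalised
    r -= int(ls != 0)      # non-normal letter spacing is penalised
    r -= int(wspa != 0)    # non-normal word spacing is penalised
    r += pro               # higher protein bracket (codes 0-2) is rewarded
    r -= kcal              # higher kilocalorie bracket is penalised
    r -= abs(wc - 2)       # distance from normal walking cadence (code 2)
    return r

# Under these weights, a fully "typical" profile scores the maximum:
# table2_reward(dft=3, wspe=0, ls=0, wspa=0, pro=2, kcal=0, wc=2) == 5.0
```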
Table 3. Benchmarking of a few Multi-Agent Reinforcement Learning (MARL) Algorithms.

Algorithm | Value-Based or Policy-Based | Credit Assignment
MAPPO [50] | Policy-based |
MADDPG [50] | Policy-based |
IDQN [86] | Value-based |
CommNet [87] | Policy-based |
BicNet [87] | Policy-based/value-based |
DIAL [50] | Value-based |
COMA [50] | Policy-based |
VDN [50] | Value-based |
QMIX [50] | Value-based |
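The practical difference between the last two value-based entries in Table 3 lies in how they mix per-agent Q-values into a joint value. A minimal PyTorch sketch follows, based on the standard VDN and QMIX formulations from the literature [50] rather than our exact code; the hypernetwork sizes are illustrative.

```python
# Sketch of the two value-mixing schemes (standard formulations, not the
# paper's exact code). VDN sums per-agent Q-values; QMIX mixes them with
# state-conditioned, non-negative weights. Layer sizes are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

def vdn_mix(agent_qs: torch.Tensor) -> torch.Tensor:
    """agent_qs: (batch, n_agents) chosen Q-values -> joint Q (batch, 1)."""
    return agent_qs.sum(dim=1, keepdim=True)

class QMixer(nn.Module):
    def __init__(self, n_agents: int, state_dim: int, embed_dim: int = 32):
        super().__init__()
        self.n_agents, self.embed_dim = n_agents, embed_dim
        # Hypernetworks produce the mixer's weights from the global state.
        self.hyper_w1 = nn.Linear(state_dim, n_agents * embed_dim)
        self.hyper_b1 = nn.Linear(state_dim, embed_dim)
        self.hyper_w2 = nn.Linear(state_dim, embed_dim)
        self.hyper_b2 = nn.Linear(state_dim, 1)

    def forward(self, agent_qs: torch.Tensor, state: torch.Tensor) -> torch.Tensor:
        b = agent_qs.size(0)
        # abs() keeps mixing weights non-negative: the monotonicity constraint.
        w1 = torch.abs(self.hyper_w1(state)).view(b, self.n_agents, self.embed_dim)
        b1 = self.hyper_b1(state).view(b, 1, self.embed_dim)
        hidden = F.elu(agent_qs.view(b, 1, self.n_agents) @ w1 + b1)
        w2 = torch.abs(self.hyper_w2(state)).view(b, self.embed_dim, 1)
        b2 = self.hyper_b2(state).view(b, 1, 1)
        return (hidden @ w2 + b2).view(b, 1)  # joint Q_tot: (batch, 1)
```

Because the hypernetwork outputs pass through abs(), the joint value is monotonically increasing in every agent’s Q-value, so each agent can act greedily on its own Q-function at execution time while the mixer is only needed during centralized training.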
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
