Search Results (218)

Search Parameters:
Keywords = Reward Shaping

20 pages, 1088 KB  
Article
Users’ Perspectives of Bidirectional Charging in Public Environments
by Érika Martins Silva Ramos, Thomas Lindgren, Jonas Andersson and Jens Hagman
World Electr. Veh. J. 2026, 17(4), 176; https://doi.org/10.3390/wevj17040176 - 26 Mar 2026
Viewed by 194
Abstract
Technological advances such as Vehicle-to-Grid (V2G) have the potential to support renewable energy integration and grid stability, but large-scale deployment depends on users’ willingness to participate, particularly in public charging environments. While prior research has examined V2G from technical feasibility and system-level perspectives, everyday public settings remain unexplored. This study investigates electric vehicle (EV) users’ willingness to engage in V2G services in public spaces, with a focus on incentives, expectations, and how participation aligns with existing routines and parking conditions. A mixed-method approach was applied, combining a survey of 544 car users with two waves of user-centered interviews. The survey data were analyzed using factor analysis and linear regression models, while the interview data were thematically analyzed. The results show that users’ evaluations of V2G are shaped by sustainability expectations, perceived efficiency, and uncertainties, and preferences for public V2G participation are strongly influenced by convenience, clarity of the offer, and perceived control. Home charging practices emerged as a key reference point shaping expectations of public V2G services. Across both methods, simple and transparent incentives, such as reduced charging or parking costs, were consistently preferred over more complex reward models, including point-based systems or dynamic energy trading. Concerns related to control over trips, battery degradation, trust in service providers, and added complexity remain important barriers to participation. The findings highlight the need for user-centered and socio-technical design of public V2G services that align with users’ everyday routines, parking conditions, and expectations to support broader adoption beyond the home context. Full article

34 pages, 63807 KB  
Article
Research on Path Planning Methods and Characteristics of Urban Unmanned Aerial Vehicles Under Noise Constraints
by Yaqing Chen, Yunfei Jin, Xin He and Yumei Zhang
Drones 2026, 10(3), 227; https://doi.org/10.3390/drones10030227 - 23 Mar 2026
Viewed by 211
Abstract
This study proposes TNAP-DDQN, a deep reinforcement learning method for urban low-altitude UAV path planning under residential noise threshold constraints. With time cost and safety risk as the optimization objectives, operational constraints such as collision risk and maximum AGL altitude are incorporated to achieve coordinated optimization of noise compliance, operational safety, and efficiency. To mitigate action space contraction and training instability induced by multiple constraints, a Noise-Degradation-Mask-based Action Bias Network (NDM-ABN) is introduced at the action selection layer. A three-tier degradation scheme prevents empty candidate sets, while bias-based decision making is applied to approximately tied actions to stabilize the policy. Moreover, multi-step prioritized experience replay (PER) improves sample efficiency and long-horizon return modeling, and potential-based reward shaping (PBRS) transforms sparse constraint signals into auxiliary rewards. Simulation results indicate that: (1) NDM-ABN is the key module for stabilizing the noise-exposure process by suppressing high-noise actions; (2) the required AGL is related to the UAV source noise level and local noise limits, implying the need for differentiated AGL altitude classes; and (3) the maximum admissible UAV source noise level increases as the threshold is relaxed. The proposed method provides quantitative guidance for noise-entry and AGL altitude regulation, while future work will incorporate additional metrics (e.g., A-weighted equivalent sound level) to better capture noise fluctuations and short-term peaks. Full article
(This article belongs to the Section Innovative Urban Mobility)
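The potential-based reward shaping (PBRS) mentioned in this abstract follows the standard formulation F(s, s') = γΦ(s') − Φ(s), which provably preserves the optimal policy. A minimal sketch; the goal-distance potential is an illustrative assumption, not the paper's noise-aware design:

```python
# Potential-based reward shaping (PBRS): add F(s, s') = gamma * phi(s') - phi(s)
# to the environment reward. For any potential phi over states, the optimal
# policy of the shaped MDP matches that of the original (Ng, Harada & Russell, 1999).
def shaped_reward(env_reward, phi_s, phi_s_next, gamma=0.99):
    """PBRS-augmented reward for one transition s -> s'."""
    return env_reward + gamma * phi_s_next - phi_s

# Illustrative potential (an assumption, not the paper's): negative Euclidean
# distance to a goal waypoint, so progress toward the goal is rewarded.
def phi(position, goal=(10.0, 10.0)):
    return -((position[0] - goal[0]) ** 2 + (position[1] - goal[1]) ** 2) ** 0.5

r = shaped_reward(0.0, phi((0.0, 0.0)), phi((1.0, 1.0)))  # positive: moved toward goal
```

In the paper's setting, the sparse noise-constraint signal would be encoded through the potential so that compliance yields dense auxiliary rewards without altering the underlying objective.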

48 pages, 4538 KB  
Review
Beyond Sensory Properties: Molecular Interactions of Antioxidant Flavour-Active Polyphenols Across the Food-Oral-Gut Axis
by Inês M. Ferreira, Sara A. Martins, Leonor Gonçalves, Mónica Jesus, Elsa Brandão and Susana Soares
Antioxidants 2026, 15(3), 397; https://doi.org/10.3390/antiox15030397 - 21 Mar 2026
Viewed by 463
Abstract
Dietary antioxidants are widely valued for their potential health benefits, but incorporating them into functional foods is not straightforward. Polyphenols are among the most abundant and important antioxidants in foods, and this review focuses on them because the same structural features linked to their health-promoting effects can also cause pronounced bitterness and astringency, ultimately limiting consumer acceptance. This review examines how these challenges are interconnected across three levels: food matrix interactions, bioavailability, and consumer psychobiology. We describe how non-covalent interactions between polyphenols, proteins, and polysaccharides can have both positive and negative effects. While these interactions may alter oral lubrication and flavour release, they also protect highly reactive bioactive compounds from gastric degradation. Furthermore, we broaden the concept of bioavailability by exploring the microbiota-mediated “colonic rescue” of polyphenols that are not released during earlier digestion. We also highlight the role of extraoral bitter taste receptors (TAS2Rs) along the gastrointestinal (GI) tract. Activation of these receptors during digestion can trigger relevant metabolic and endocrine responses, indicating that systemic absorption is not the only pathway to bioactivity. Finally, we connect these mechanisms to individual differences in food acceptance, showing that genetic factors (e.g., TAS2R38 and the salivary proteome) and psychological traits (such as neophobia and reward sensitivity) can shape rejection or flavour-nutrient learning. Overall, the successful development of functional foods will require a “sensory-by-design” approach. This strategy utilises matrix interactions strategically to improve both consumer acceptance and physiological efficacy. Full article
(This article belongs to the Section Natural and Synthetic Antioxidants)

26 pages, 21346 KB  
Article
A Load-Balancing-Aware Learning Framework for Collaborative UAV-MEC Computation Offloading
by Huafeng Li, Yuxuan Wang, Hengming Liu, Jiaxuan Li, Xu Wang, Qun Lei, Ke Xiao and Hongliang Zhu
Sensors 2026, 26(6), 1920; https://doi.org/10.3390/s26061920 - 18 Mar 2026
Viewed by 239
Abstract
Unmanned Aerial Vehicle (UAV) computing clusters face severe operational constraints due to limited computing capabilities and battery capacities, which complicate the simultaneous optimization of low offloading latency, long task endurance, and high cluster efficiency. To address these challenges, this paper proposes a Multi-Objective Reinforcement Learning framework based on Latency and Power Balance (MORL-LAPB). Instead of broad situational awareness descriptions, our framework directly combines a reward-shaping reinforcement learning algorithm with an evolutionary mechanism to construct a closed-loop optimization paradigm. Crucially, in this context, ‘balancing’ extends beyond traditional computational workload distribution; it represents a joint optimization that balances task allocation to ensure short service delays while simultaneously equating the energy depletion rates across UAV nodes to maximize overall cluster efficiency and operational duration. By efficiently identifying Pareto optimal trade-offs, MORL-LAPB dynamically regulates UAV energy allocation and computational resource scheduling. Experimental results demonstrate that, compared to RSO, NSO, and DRLSO baselines, the proposed MORL-LAPB significantly reduces offloading latency, extends effective task execution duration, and improves cluster energy efficiency. The framework offers flexible adaptability and long-term sustainability for diverse operational scenarios under strict multi-objective constraints. Full article
(This article belongs to the Special Issue Communications and Networking Based on Artificial Intelligence)
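The Pareto-optimal trade-off identification described above reduces, at its core, to filtering dominated solutions. A brute-force sketch over hypothetical (latency, energy) pairs, both minimized; this is not the paper's evolutionary mechanism:

```python
# Pareto front over (latency, energy) pairs, both minimized: keep points
# for which no other point is <= in every objective. Objective values
# below are hypothetical.
def pareto_front(points):
    """Return the non-dominated subset of points, preserving input order."""
    return [p for p in points
            if not any(q != p and all(qi <= pi for qi, pi in zip(q, p))
                       for q in points)]

front = pareto_front([(1.0, 5.0), (2.0, 3.0), (3.0, 4.0), (4.0, 1.0)])
# (3.0, 4.0) is dominated by (2.0, 3.0) and drops out.
```

A scheduler can then pick one point from the front according to the current latency/energy priority rather than committing to a single fixed weighting.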

24 pages, 4975 KB  
Article
Disturbance Observer-Based Actor–Critic Reinforcement Learning with Adaptive Reward for Energy-Efficient Control of Robotic Manipulators
by Le Thi Minh Tam, Nguyen Viet Ngu, Duc Hung Pham and V. T. Mai
Actuators 2026, 15(3), 167; https://doi.org/10.3390/act15030167 - 16 Mar 2026
Viewed by 314
Abstract
Reinforcement learning controllers for robot manipulators depend strongly on reward tuning, and fixed weights may yield poor trade-offs under uncertainty and disturbances. This paper proposes a disturbance observer-based actor–critic RL (DOB–ACRL) with adaptive multi-objective reward shaping for a torque-saturated 2-DOF manipulator, where the reward weights are updated online using normalized indicators of tracking error, control energy, and effort. A Lyapunov analysis guarantees the uniform ultimate boundedness of closed-loop signals. The simulations show improved learning and performance over a static reward actor–critic baseline, reducing the RMS tracking error by up to 22.8%, the control energy by ~4.6%, the control effort by 1.9%, and the settling time by up to 29.2%. Full article
(This article belongs to the Section Actuators for Robotics)
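The adaptive multi-objective reward shaping described above updates reward weights online from normalized performance indicators. A minimal sketch; the multiplicative rule, learning rate, and indicator values are illustrative assumptions, not the paper's update law:

```python
# Online multi-objective reward weighting: each weight grows with its
# normalized indicator (tracking error, control energy, control effort,
# each in [0, 1]), then the weights are renormalized to sum to one.
def update_weights(weights, indicators, lr=0.1):
    """Shift weight toward the objectives whose normalized indicator is worst."""
    raw = [w * (1.0 + lr * ind) for w, ind in zip(weights, indicators)]
    total = sum(raw)
    return [x / total for x in raw]

# Tracking error dominates here, so its weight grows after the update.
w = update_weights([1 / 3, 1 / 3, 1 / 3], [0.9, 0.1, 0.2])
```

Renormalization keeps the shaped reward on a fixed scale, so the online adaptation reweights the trade-off without inflating the total reward magnitude.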

17 pages, 566 KB  
Article
Analyst-of-Record: A Proof-of-Concept for Influence-Based Analyst Credit Assignment in Human-Feedback Decision Support
by Devon L. Brown and Danda B. Rawat
Electronics 2026, 15(6), 1210; https://doi.org/10.3390/electronics15061210 - 13 Mar 2026
Viewed by 293
Abstract
The purpose of this study is to examine whether analyst-level credit can be assigned quantitatively in a lightweight human-feedback decision-support pipeline. In intelligence and national security workflows, analysts often provide edits, comments, and evaluative feedback during the production of analytic products, yet these intermediate contributions are usually discarded, leaving no auditable record of how individual feedback shaped the final output. To address this problem, this study proposes a proof-of-concept Analyst-of-Record framework that combines synthetic analyst feedback, a linear ridge reward model, first-order influence functions, and additive Shapley aggregation to estimate both feedback-item and analyst-level contribution scores. The research design uses the Fact Extraction and VERification (FEVER) fact-verification dataset under controlled experimental settings. The pipeline retrieves evidence with Best Matching 25 (BM25), generates a grounded template-based response, derives three synthetic analyst feedback channels from FEVER annotations, trains a reward model on simple claim–answer and analyst-identity features, and aggregates per-feedback influence scores into an Analyst Contribution Index (ACI). The main experiments are conducted on a 500-claim subset across five random seeds, with additional ablation and bootstrap analyses used to assess sensitivity and stability. The findings show that the reward model achieves a mean validation R² of 0.801 ± 0.037, indicating that the synthetic feedback signals are learnable under the selected featurization. The analyst-level contribution scores remain stable across random seeds, with approximately half of the total influence magnitude attributed to the explanation-quality channel and the remainder split across the other two channels.
Ablation results further show that removing the explanation-quality channel collapses validation fit, while bootstrap resampling demonstrates tight concentration of absolute ACI magnitudes. Theoretically, this study extends attribution research beyond document-only grounding by showing how analyst feedback itself can be modeled as an object of contribution analysis. It also demonstrates that influence functions and Shapley-style aggregation can be adapted into a tractable framework for estimating interpretable analyst-level credit in a reproducible experimental setting. Practically, the proposed framework offers an initial foundation for more traceable and accountable decision-support workflows in which intermediate analyst contributions can be preserved rather than lost. The results also provide a feasible implementation path for future systems that incorporate stronger generators, richer evidence representations, and real analyst annotations. Full article
(This article belongs to the Section Computer Science & Engineering)
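Because first-order influence scores are additive, the Shapley-style aggregation into an Analyst Contribution Index can be sketched as a per-author sum. The item names, scores, and analyst identities below are hypothetical:

```python
# Additive Shapley-style aggregation: with additive per-item influence
# scores, an analyst's contribution index is the sum of the scores of
# the feedback items they authored.
from collections import defaultdict

def analyst_contribution_index(influences, authors):
    """Map analyst id -> summed influence of that analyst's feedback items."""
    aci = defaultdict(float)
    for item, score in influences.items():
        aci[authors[item]] += score
    return dict(aci)

aci = analyst_contribution_index(
    {"f1": 0.40, "f2": -0.10, "f3": 0.25},
    {"f1": "analyst_a", "f2": "analyst_a", "f3": "analyst_b"},
)
```

Additivity is what makes the exact Shapley value tractable here; for non-additive value functions the aggregation would require sampling over coalitions instead.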

25 pages, 2560 KB  
Article
Statistical Reward Shaping for Reinforcement Learning in Bipedal Locomotion
by Shuhan Yan, Chuan Chen, Xinliang Zhou and Jiaping Xiao
Electronics 2026, 15(6), 1203; https://doi.org/10.3390/electronics15061203 - 13 Mar 2026
Viewed by 384
Abstract
Achieving stable bipedal locomotion for humanoid robots remains a central challenge in reinforcement learning (RL), in which the design of reward functions is pivotal but non-trivial. This paper proposes a three-tier statistical reward shaping framework to optimize bipedal gait learning. First, training outcomes are diagnostically monitored using forward distance, fall rate, and posture score. Pearson correlation and regression analyses are then employed to identify trade-offs and isolate the direct effects of reward components. Finally, targeted parameter sweeps enable directionally guided optimization, substantially reducing heuristic parameter tuning while refining a reward function for the H1 robot in Isaac Lab. Experimental results demonstrate clear improvements over the baseline. The optimized policy reduces convergence time by 14% and increases forward distance by 186%. Stability is markedly enhanced, with fall rate decreasing from 75% to 2% and active locomotion efficiency nearly doubling (0.339 to 0.678). These results validate a reproducible, data-driven framework for reward design, highlighting the importance of principled statistical analysis in complex RL-based humanoid locomotion. Full article
(This article belongs to the Special Issue Advances in Intelligent Computing and Systems Design)
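The diagnostic core of the framework, correlating reward-component settings with training outcomes, can be illustrated with the standard Pearson coefficient. All data values below are fabricated for illustration; the paper's pipeline additionally uses regression analysis and targeted parameter sweeps:

```python
# Pearson correlation between a reward-component weight and a training
# outcome across runs; a minimal stand-in for the paper's statistical
# diagnostics.
def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Hypothetical sweep: heavier posture-score weight vs. forward distance.
posture_weight = [0.1, 0.2, 0.3, 0.4, 0.5]
forward_distance = [9.0, 8.1, 7.4, 6.2, 5.0]
r = pearson(posture_weight, forward_distance)
# A strongly negative r flags a trade-off and directs the next parameter sweep.
```

A correlation near −1 in such a sweep would indicate that the posture term is suppressing forward progress, the kind of trade-off the paper's three-tier analysis is designed to isolate.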

33 pages, 1249 KB  
Article
Degradation-Aware Learning-Based Control for Residential PV–Battery Systems
by Ahmed Chiheb Ammari
Energies 2026, 19(6), 1434; https://doi.org/10.3390/en19061434 - 12 Mar 2026
Viewed by 280
Abstract
Residential photovoltaic (PV)–battery systems are increasingly deployed to reduce electricity costs under time-of-use and demand-charge tariffs, yet their economic value depends critically on how storage is operated over time. Effective control must simultaneously address short-term energy costs, peak-demand exposure, and long-term battery degradation, all under substantial uncertainty in load and PV generation. While optimization-based approaches can achieve strong performance with accurate forecasts, they are sensitive to forecast errors, whereas learning-based methods often neglect degradation effects or deplete the battery prematurely, leading to suboptimal peak-shaving behavior. This paper proposes a forecast-free, degradation-aware reinforcement learning (RL) framework for residential PV–battery energy management that jointly addresses demand-charge mitigation and battery aging. The proposed controller internalizes both calendar aging and rainflow-based cycling degradation within its objective and incorporates demand-aware reward shaping with time-varying penalties on on-peak grid imports. In addition, a complementary state-of-charge reserve mechanism discourages premature battery depletion and improves responsiveness to late on-peak demand surges, despite the absence of explicit load or PV forecasts. Physical feasibility is guaranteed through an execution-time safety layer that enforces all device and operational constraints by construction. The proposed framework is evaluated on high-resolution residential datasets and compared against optimization-based baselines, including a day-ahead scheduler with perfect foresight and a receding-horizon MPC controller using short-horizon forecasts. 
Overall, the results show that the proposed RL controller substantially reduces demand charges and total electricity costs relative to forecast-based MPC while maintaining degradation-aware operation, demonstrating the potential of forecast-free reinforcement learning as a practical control strategy for residential PV–battery systems under demand-charge tariffs. Full article
(This article belongs to the Section A: Sustainable Energy)
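The demand-aware reward shaping with a state-of-charge reserve described above can be sketched as a penalty term added to the RL reward. The tariff window, coefficients, and reserve threshold are illustrative assumptions, not the paper's tuned reward:

```python
# Demand-aware shaping penalty: a time-varying charge on on-peak grid
# imports plus a state-of-charge (SoC) reserve term that discourages
# premature battery depletion before late on-peak demand surges.
def shaping_penalty(grid_import_kw, soc, hour,
                    on_peak=(16, 21), peak_coef=0.5,
                    soc_reserve=0.3, reserve_coef=1.0):
    """Negative shaping term added to the RL reward at each step."""
    penalty = 0.0
    if on_peak[0] <= hour < on_peak[1]:
        penalty += peak_coef * max(grid_import_kw, 0.0)   # on-peak import penalty
    if soc < soc_reserve:
        penalty += reserve_coef * (soc_reserve - soc)     # keep a late-peak reserve
    return -penalty

p = shaping_penalty(2.0, 0.5, 18)  # on-peak import with healthy SoC
```

In the full framework this shaping acts alongside the degradation terms and the execution-time safety layer; only the reward signal is sketched here.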

39 pages, 67440 KB  
Article
LLM-TOC: LLM-Driven Theory-of-Mind Adversarial Curriculum for Multi-Agent Generalization
by Chenxu Wang, Jiang Yuan, Tianqi Yu, Xinyue Jiang, Liuyu Xiang, Junge Zhang and Zhaofeng He
Mathematics 2026, 14(5), 915; https://doi.org/10.3390/math14050915 - 8 Mar 2026
Viewed by 377
Abstract
Zero-shot generalization to out-of-distribution (OOD) teammates and opponents in multi-agent systems (MASs) remains a fundamental challenge for general-purpose AI, especially in open-ended interaction scenarios. Existing multi-agent reinforcement learning (MARL) paradigms, such as self-play and population-based training, often collapse to a limited subset of Nash equilibria, leaving agents brittle when faced with semantically diverse, unseen behaviors. Recent approaches that invoke Large Language Models (LLMs) at run time can improve adaptability but introduce substantial latency and can become less reliable as task horizons grow; in contrast, LLM-assisted reward-shaping methods remain constrained by the inefficiency of the inner reinforcement-learning loop. To address these limitations, we propose LLM-TOC (LLM-Driven Theory-of-Mind Adversarial Curriculum), which casts generalization as a bi-level Stackelberg game: in the inner loop, a MARL agent (the follower) minimizes regret against a fixed population, while in the outer loop, an LLM serves as a semantic oracle that generates executable adversarial or cooperative strategies in a Turing-complete code space to maximize the agent’s regret. To cope with the absence of gradients in discrete code generation, we introduce Gradient Saliency Feedback, which transforms pixel-level value fluctuations into semantically meaningful causal cues to steer the LLM toward targeted strategy synthesis. We further provide motivating theoretical analysis via the PAC-Bayes framework, showing that LLM-TOC converges at rate O(1/K) and yields a tighter generalization error bound than parameter-space exploration under reasonable preconditions. 
Experiments on the Melting Pot benchmark demonstrate that, with expected cumulative collective return as the core zero-shot generalization metric, LLM-TOC consistently outperforms self-play baselines (IPPO and MAPPO) and the LLM-inference method Hypothetical Minds across all held-out test scenarios, reaching 75% to 85% of the upper-bound performance of Oracle PPO. Meanwhile, with the number of RL environment interaction steps to reach the target relative performance as the core efficiency metric, our framework reduces the total training computational cost by more than 60% compared with mainstream baselines. Full article
(This article belongs to the Special Issue Applications of Intelligent Game and Reinforcement Learning)

24 pages, 525 KB  
Systematic Review
Gender Diversity and Psychosocial Work Risks from a Non-Binary Perspective: A Systematic Review
by Abel Perez-Gonzalez, Ferdinando Tuscani, Raul Pelagaggi and Mohamed Nasser
Merits 2026, 6(1), 6; https://doi.org/10.3390/merits6010006 - 27 Feb 2026
Viewed by 422
Abstract
This systematic review examines how gender shapes exposure to and experiences of psychosocial risks in the workplace. Drawing on 89 empirical studies published between 2010 and 2024, the review synthesizes evidence from occupational health psychology, gender studies, and organizational research. Searches were conducted in PubMed, Web of Science, Scopus, CINAHL, and PsycINFO, and included empirical studies published in English and Spanish. Following PRISMA guidelines, a qualitative thematic synthesis was conducted to integrate findings across diverse sectors, populations, and methodological approaches. The evidence reveals persistent gendered patterns in psychosocial risk exposure and outcomes: women are more frequently exposed to emotionally demanding and relational forms of work and report poorer mental health outcomes; men experience performance-driven strain linked to workload, competition, and reward insecurity more often; and transgender and non-binary workers face additional psychosocial burdens associated with stigma, discrimination, and minority stress. Across the literature, structural and cultural determinants—such as occupational segregation, unequal recognition, and gendered organizational norms—emerge as central mechanisms underlying these disparities. Theoretical frameworks including effort–reward imbalance, demand–control, work–family conflict, organizational climate, and minority stress collectively contribute to explaining how gendered psychosocial risks are produced and sustained. Overall, the review underscores the need to move beyond individualistic and binary models of psychosocial risk toward gender-responsive approaches that account for structural, relational, and identity-based dimensions of work, thereby informing research and organizational strategies aimed at promoting equitable and sustainable well-being at work. Full article

38 pages, 16228 KB  
Article
Deep Q-Network Agents for Game Playing: Systematic Evaluation Across Eight Benchmark and Custom Environments
by Časlav Livada, Marko Duka, Tomislav Keser and Krešimir Nenadić
Electronics 2026, 15(5), 958; https://doi.org/10.3390/electronics15050958 - 26 Feb 2026
Viewed by 422
Abstract
Deep Q-Networks (DQNs) have achieved strong performance across a range of benchmark tasks; however, their reliability under varying reward structures and planning horizons remains insufficiently characterized. This study presents a systematic cross-environment analysis of DQN agents evaluated across eight environments spanning simple control, arcade, and strategic domains. Rather than pursuing state-of-the-art performance, the objective is to investigate structural conditions under which standard value-based reinforcement learning succeeds, degrades, or fails. Across controlled experiments with consistent training budgets and statistical validation, three recurring failure patterns are identified: (i) sparse-reward exploration failure, (ii) reward exploitation without functional task competence, and (iii) strategic planning limitations in long-horizon or adversarial environments. Within-environment ablation studies further demonstrate that moderate network scaling (2–4× parameter increases) does not significantly alter learning outcomes when reward functions remain unchanged, suggesting that reward alignment and task horizon dominate architectural capacity as determinants of performance. The results provide a structured diagnostic perspective on DQN reliability, clarify the limits of reward shaping in complex environments, and offer practical guidance for identifying when standard value-based methods are likely to become unstable or insufficient. Full article
(This article belongs to the Special Issue Machine/Deep Learning Applications and Intelligent Systems)

19 pages, 3606 KB  
Article
Autonomous Navigation of an Unmanned Underwater Vehicle via Safe Reinforcement Learning and Active Disturbance Rejection Control
by Qinze Chen, Yun Cheng, Yinlong Yuan and Liang Hua
J. Mar. Sci. Eng. 2026, 14(5), 425; https://doi.org/10.3390/jmse14050425 - 25 Feb 2026
Viewed by 329
Abstract
A two-layer control framework for unmanned underwater vehicle (UUV) navigation is proposed, combining a lower-layer active disturbance rejection controller (ADRC) with an upper-layer safe reinforcement learning (RL) policy for obstacle-avoidance navigation. The lower layer, utilizing ADRC, ensures high tracking accuracy and effective disturbance rejection, while the upper layer integrates the twin delayed deep deterministic policy gradient (TD3) algorithm, combined with a control barrier function (CBF)-based quadratic programming (QP) safety filter and safety-inspired reward shaping (SR). The method is evaluated in two simulation studies: (i) velocity and attitude control to assess tracking and disturbance rejection, and (ii) obstacle-avoidance navigation to assess learning efficiency, trajectory smoothness, and safety-related metrics. Simulation results show that ADRC achieves faster tracking and stronger disturbance rejection than a conventional proportional–integral–derivative (PID) controller. Moreover, the proposed TD3 + QP + SR scheme exhibits faster learning, smoother trajectories, and improved safety performance compared with RL baselines. These results indicate that the proposed framework enables efficient and safe UUV navigation in simulation scenarios with obstacles and disturbances. Full article
(This article belongs to the Section Ocean Engineering)
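In a scalar toy case with single-integrator dynamics, the CBF-based QP safety filter named above reduces to a closed-form clamp on the RL action. The dynamics, margin, and gain here are illustrative assumptions, not the paper's UUV model:

```python
# Scalar control barrier function (CBF) filter: with dynamics d_dot = u
# and barrier h(d) = d - d_min, the QP constraint h_dot >= -alpha * h
# reduces to a one-sided clamp (the minimal intervention on u_rl).
def cbf_filter(u_rl, d, d_min=1.0, alpha=2.0):
    """Project the RL action onto the safe set with minimal change."""
    h = d - d_min            # distance margin to the obstacle boundary
    u_min = -alpha * h       # most negative action still satisfying the CBF
    return max(u_rl, u_min)

u_safe = cbf_filter(-5.0, 1.5)  # aggressive approach is clipped to -1.0
```

With multi-dimensional dynamics and actuator limits the projection no longer has a closed form, which is why the paper solves it as a quadratic program at each step.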

12 pages, 223 KB  
Article
Motivating Teachers in Curriculum Enrichment Programmes Through Rewards and Recognition in Practice
by Ntandokamenzi Penelope Dlamini
Educ. Sci. 2026, 16(2), 343; https://doi.org/10.3390/educsci16020343 - 21 Feb 2026
Viewed by 281
Abstract
Teacher motivation plays a critical role in the successful implementation of curriculum enrichment programmes, yet it remains underexplored in many educational initiatives. The study contributes insights into teacher motivation in early childhood education and offers practical guidance for strengthening the sustainability of enrichment programmes through integrated recognition, support, and incentive structures. This study investigates the impact of rewards and recognition on teachers’ engagement in the Tsogo Sun Moves for Life chess programme in early childhood education classrooms within the King Cetshwayo District, South Africa. A qualitative case study design was used, with data collected through semi-structured interviews, observations, and document analysis, and analysed using thematic analysis. The findings indicate that while teachers valued teaching resources, coordinator support, and certificates of appreciation, these forms of recognition were insufficient to sustain long-term engagement. Teachers emphasised the need for meaningful acknowledgment and tangible incentives to justify the additional workload associated with programme implementation. Drawing on Self-Determination Theory and Herzberg’s Two-Factor Theory of Motivation, the study highlights the interaction between intrinsic and extrinsic motivation in shaping teachers’ commitment. Full article
15 pages, 720 KB  
Article
Sex and Age Differences in Decision-Making Under Risk by Wild Balinese Long-Tailed Macaques (Macaca fascicularis fascicularis): A Field Experimental Study
by Caleb Bunselmeyer, Noëlle Gunst, I Nengah Wandia, Robert J. Williams, Elsa Addessi and Jean-Baptiste Leca
Animals 2026, 16(4), 617; https://doi.org/10.3390/ani16040617 - 15 Feb 2026
Viewed by 603
Abstract
This study examines risky decision-making in a free-ranging population of Balinese long-tailed macaques (Macaca fascicularis fascicularis), addressing gaps in research that have largely focused on captive primates and have rarely considered individual differences by age and sex. Thirty-three macaques of different age–sex classes were tested using a choice task contrasting a guaranteed small reward with a probabilistic larger reward. At the group level, macaques showed no preference for safe or risky options. However, substantial individual variation emerged: some individuals were risk-prone, others risk-averse, and many indifferent. Notably, age and sex interacted in shaping risk preferences. Among males, adults and juveniles were more risk-prone than younger adults, whereas among females, adults were more risk-prone than juveniles. Juveniles also displayed outcome-dependent flexibility, choosing the risky option more often after a previous successful risky choice, consistent with a win–stay strategy. As in rodents, this pattern may reflect adaptive learning during developmental transitions. Importantly, the observed behavioral differences were not due to misunderstanding of the task, as macaques reliably chose the larger option when outcomes were visible. This pronounced individual variability in primate risk preferences underscores the importance of considering demographic factors when characterizing species-typical risk preferences. Full article
(This article belongs to the Section Human-Animal Interactions, Animal Behaviour and Emotion)
29 pages, 11326 KB  
Article
Constrained Soft Actor–Critic for Joint Computation Offloading and Resource Allocation in UAV-Assisted Edge Computing
by Nawazish Muhammad Alvi, Waqas Muhammad Alvi, Xiaolong Zhou, Jun Li and Yifei Wei
Sensors 2026, 26(4), 1149; https://doi.org/10.3390/s26041149 - 10 Feb 2026
Viewed by 552
Abstract
Unmanned Aerial Vehicle (UAV)-assisted edge computing supports latency-sensitive applications by offloading computational tasks to ground-based servers. However, determining optimal resource allocation under strict latency constraints and stochastic channel conditions remains challenging. This paper addresses the joint computation partitioning and power allocation problem for UAV-assisted edge computing systems. We formulate the problem as a Constrained Markov Decision Process (CMDP) that explicitly models latency constraints, rather than relying on implicit reward shaping. To solve this CMDP, we propose Constrained Soft Actor–Critic (C-SAC), a deep reinforcement learning algorithm that combines maximum-entropy policy optimization with Lagrangian dual methods. C-SAC employs a dedicated constraint critic network to estimate long-term constraint violations and an adaptive Lagrange multiplier that automatically balances energy efficiency against latency satisfaction without manual tuning. Extensive experiments demonstrate that C-SAC achieves an 18.9% constraint violation rate, a 60.6-percentage-point improvement over unconstrained Soft Actor–Critic (79.5%) and a 22.4-percentage-point improvement over deterministic TD3-Lagrangian (41.3%). The learned policies exhibit strong channel-adaptive behavior, with a correlation coefficient of 0.894 between the local computation ratio and channel quality, despite the absence of explicit channel modeling in the reward function. Ablation studies confirm that both adaptive mechanisms are essential, while sensitivity analyses show that C-SAC maintains robust performance, with violation rates varying by less than 2 percentage points even as channel variability triples. These results establish constrained reinforcement learning as an effective approach for reliable UAV edge computing under stringent quality-of-service requirements. Full article
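The adaptive Lagrange multiplier described in the abstract can be sketched in a few lines. The following is a minimal illustration of the standard projected dual-ascent update used in Lagrangian constrained RL, not the authors' actual C-SAC implementation; the names (`update_lambda`, `eta`, `budget`) and the toy learner are illustrative assumptions.

```python
# Projected gradient ascent on a Lagrange multiplier: the multiplier grows
# when the estimated constraint cost (e.g. a latency-violation rate) exceeds
# the budget, tightening the penalty on the policy; it shrinks, but never
# below zero, once the constraint is satisfied.

def update_lambda(lmbda, avg_constraint_cost, budget, eta=0.01):
    """One dual-ascent step, projected onto lambda >= 0."""
    return max(0.0, lmbda + eta * (avg_constraint_cost - budget))


# Toy loop standing in for policy training: the "policy" cost decays in
# proportion to the multiplier, mimicking a learner that trades reward
# for constraint satisfaction until the cost meets the budget.
lmbda, cost, budget = 0.0, 1.0, 0.2
for _ in range(500):
    lmbda = update_lambda(lmbda, cost, budget, eta=0.05)
    cost = max(budget, cost - 0.01 * lmbda)  # policy responds to penalty
```

In this sketch the cost settles at the budget and the multiplier then stops growing, which is the "automatic balancing without manual tuning" behavior the abstract attributes to the adaptive multiplier.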
(This article belongs to the Special Issue Communications and Networking Based on Artificial Intelligence)
(This article belongs to the Special Issue Communications and Networking Based on Artificial Intelligence)