Reinforcement Learning: Sample Efficiency, Generalisation, and AI Applications

A special issue of Electronics (ISSN 2079-9292). This special issue belongs to the section "Artificial Intelligence".

Deadline for manuscript submissions: 15 September 2026 | Viewed by 8130

Editors

Dr. Abdulrahman Altahhan

Guest Editor

School of Computer Science, University of Leeds, Leeds LS2 9BW, UK
Interests: deep reinforcement learning; deep learning; AI; machine learning; intelligent agents; robotics applications

Prof. Dr. Vasile Palade

Guest Editor

Centre for Computational Science and Mathematical Modelling, Coventry University, Coventry CV1 2TL, UK
Interests: deep learning; machine learning; AI and data science

Special Issue Information

Dear Colleagues,

Reinforcement learning (RL) continues to reshape the landscape of artificial intelligence, providing powerful tools for solving complex, sequential decision-making problems across a wide spectrum of domains. From fine-tuning large language models (LLMs) to enabling autonomous systems and managing critical infrastructure, RL has proven its versatility and transformative potential.

This Special Issue seeks to highlight recent advances, novel applications, and underexplored dimensions of RL that are shaping the future of intelligent systems. We particularly welcome contributions that introduce innovations in experience replay, algorithm design, sample efficiency, generalisation, safety, interpretability, and real-world deployment.

We invite researchers and practitioners from diverse disciplines to contribute high-quality work—ranging from theoretical developments and methodological insights to applied research and interdisciplinary case studies. This is a timely opportunity to exchange ideas, inspire new directions, and spotlight impactful use cases of RL.

Topics of interest include, but are not limited to, the following:

RL for robotics, dexterous manipulation, and swarm intelligence;
RL in autonomous driving, drone navigation, and transport systems;
Sample-efficient, generalisable, and robust RL algorithms;
New paradigms in experience replay and memory architectures;
RL in control of nuclear plants, water systems, and renewable energy grids;
RL for training or fine-tuning large language models (LLMs);
Human-in-the-loop RL and preference-based learning;
RL for summarisation, dialogue systems, and alignment with human intent;
RL for environmental forecasting and climate resilience;
Offline, safe, interpretable, and explainable RL;
Multi-agent reinforcement learning (MARL) and coordination strategies;
RL applications in healthcare, finance, logistics, and smart infrastructure;
Benchmarks, reproducibility, and open-source RL frameworks.

We aim to make this Special Issue both inclusive and impactful, welcoming contributions that expand the boundaries of RL from both the academic and industrial communities. Whether your work addresses foundational challenges or introduces creative applications, we would be delighted to consider your submission.

Please feel free to contact us with any queries or to discuss the suitability of your work.

We look forward to receiving your contribution and showcasing the latest innovations in reinforcement learning.

Best wishes in your research,

Dr. Abdulrahman Altahhan
Prof. Dr. Vasile Palade
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 250 words) can be sent to the Editorial Office for assessment.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-anonymized peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Electronics is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

reinforcement learning
sample efficiency
over-estimation
generalisation
on-policy
off-policy
offline
online
policy gradient
experience replay
full experience replay
LLMs
human-in-the-loop
robotics
autonomous driverless cars
multi-agent RL
control

Benefits of Publishing in a Special Issue

Ease of navigation: Grouping papers by topic helps scholars navigate broad-scope journals more efficiently.
Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
Reprint: MDPI Books provides the opportunity to republish successful Special Issues in book format, both online and in print.

Further information is available on MDPI's Special Issue policies page.

Published Papers (5 papers)

Research

25 pages, 5582 KB

Open Access Article

AoI- and DS-Enhanced Cooperative Search for Multi-UAV Systems Under Spatially Structured Communication Constraints

by Lingtao Xue, Xuewen Dong, Xinyu Hu, Lingxiao Yang and Gang Xiao

Electronics 2026, 15(9), 1875; https://doi.org/10.3390/electronics15091875 - 29 Apr 2026

Viewed by 396

Abstract

Multi-UAV cooperative search is important for applications such as target reconnaissance, environmental monitoring, and emergency response. In practice, communication is often spatially heterogeneous due to terrain occlusion and environmental interference, which may delay information sharing and weaken coordination efficiency when UAVs traverse communication-blocked areas. To address this issue, we propose an Age of Information (AoI)- and Dempster–Shafer (DS)-enhanced cooperative search framework for multi-UAV systems under spatially structured communication constraints. Specifically, a DS belief map is introduced to fuse uncertain observations, while AoI is used to characterize the freshness of delayed information. An AoI-aware update mechanism further integrates buffered observations into the global belief map after communication recovery. The search process is then formulated as a communication-aware multi-agent sequential decision-making problem and solved using reinforcement learning. To demonstrate the generality of the proposed framework, we instantiate it with Proximal Policy Optimization (PPO), Multi-Agent Proximal Policy Optimization (MAPPO), and Q-value Mixing Network (QMIX). Experimental results show that the proposed framework consistently outperforms the baseline methods under heterogeneous environments and different communication conditions. Among all variants, AoI-DS-MAPPO achieves the best overall performance, improving average reward, success rate, and the number of detected targets by 26.13%, 24.32%, and 3.65%, respectively, while reducing episode length by 31.96% relative to the strongest baseline.

(This article belongs to the Special Issue Reinforcement Learning: Sample Efficiency, Generalisation, and AI Applications)

25 pages, 1702 KB

Open Access Article

Reinforcement Learning for Enhancing Bitcoin Risk-Aware Trading with Predictive Signals

by Simona-Vasilica Oprea and Adela Bâra

Electronics 2026, 15(4), 793; https://doi.org/10.3390/electronics15040793 - 12 Feb 2026

Viewed by 2439

Abstract

This paper proposes an AI-based trading framework that integrates supervised price forecasting with reinforcement learning (RL)-based decision-making. The objective is to enhance both profitability and risk management in cryptocurrency trading by equipping RL agents with forward-looking market information and risk-aware incentives. The proposed methodology follows a two-stage design. First, a univariate long short-term memory (LSTM) model generates 72 Bitcoin price forecasts. These predictions are used to compute future technical indicators, which are combined with current market indicators to construct an enriched, forward-looking state representation. Second, an RL agent is trained in this environment using a novel long-term reward function that incorporates transaction costs, drawdown penalties, volatility penalties, and delayed rewards to promote stable and sustainable trading behavior. Four state-of-the-art RL algorithms (PPO, SAC, TD3, and A2C) are systematically evaluated over randomized 180-day episodes using hourly Bitcoin data. The results demonstrate that the proposed agent consistently outperforms conventional buy-and-hold and moving average crossover strategies, achieving an average profit ratio of 32% and a Sharpe ratio of 1.34. These findings highlight the novelty and effectiveness of combining mid-term price forecasts, enriched technical states, and risk-aware RL training for robust cryptocurrency trading.
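A per-step reward that combines the penalty terms named in the abstract above might look like the following sketch. The abstract does not state the formula, so the function name `risk_aware_reward`, the weights, and the exact form of each penalty are hypothetical, and the delayed-reward component is left out.

```python
from statistics import pstdev
from typing import Sequence


def risk_aware_reward(
    equity_curve: Sequence[float],
    traded_value: float,
    cost_rate: float = 0.001,        # hypothetical proportional transaction cost
    drawdown_weight: float = 0.5,    # hypothetical penalty weight
    volatility_weight: float = 0.1,  # hypothetical penalty weight
) -> float:
    """Return a per-step reward from the portfolio equity history.

    equity_curve holds positive portfolio values up to and including the
    current step (at least two entries); traded_value is the notional
    amount traded at this step.
    """
    if len(equity_curve) < 2:
        raise ValueError("need at least two equity values")

    # Simple returns between consecutive equity values.
    returns = [b / a - 1.0 for a, b in zip(equity_curve, equity_curve[1:])]
    step_return = returns[-1]

    # Transaction cost expressed as a fraction of the previous equity.
    cost = cost_rate * abs(traded_value) / equity_curve[-2]

    # Drawdown: relative distance of current equity below its running peak.
    peak = max(equity_curve)
    drawdown = (peak - equity_curve[-1]) / peak

    # Volatility: population standard deviation of the returns seen so far.
    volatility = pstdev(returns)

    return (
        step_return
        - cost
        - drawdown_weight * drawdown
        - volatility_weight * volatility
    )
```

With these assumed terms, a step that loses value after a peak is penalised beyond its raw return, which is one way such a reward could discourage unstable trading behavior.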

19 pages, 4754 KB

Open Access Article

Enhancing Adversarial Policy Learning via Value-Based Reward Shaping

by Bo Hou, Guangyu Pan and Yao Chen

Electronics 2026, 15(2), 463; https://doi.org/10.3390/electronics15020463 - 21 Jan 2026

Viewed by 618

Abstract

In adversarial reinforcement learning, designing dense reward functions is a traditional approach to address the sparsity of adversarial objectives. However, conventional reward design often relies on high-quality domain knowledge and may fail in practice, thereby inducing objective misalignment, that is, a discrepancy between optimizing the designed reward and achieving the true adversarial utility. To reduce this discrepancy, a Value-Based Reward Shaping (VBRS) framework is proposed. VBRS integrates an intrinsic state-value estimate, which is a dynamic predictor of long-term utility, into the immediate reward function. As a result, exploration can be encouraged toward states predicted to be strategically advantageous, potentially avoiding some local optima in practice. Experiments demonstrate that VBRS outperforms a baseline that relies solely on the original reward function. The results confirm that the proposed method enhances adversarial performance and helps bridge the gap between designed reward guidance and the adversarial objective.
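The idea of folding a state-value estimate into the immediate reward can be illustrated with a minimal tabular sketch. This is not the paper's formulation: the additive form, the weight `beta`, the TD(0) update, and the choice to train the value table on the original reward are all assumptions made here for illustration, and the names `shaped_reward` and `td0_update` are hypothetical.

```python
from typing import Callable, Dict, Hashable

State = Hashable


def shaped_reward(
    env_reward: float,
    next_state: State,
    value_fn: Callable[[State], float],
    beta: float = 0.1,  # hypothetical weight on the value bonus
    done: bool = False,
) -> float:
    """Add a bonus proportional to the estimated value of the next state."""
    bonus = 0.0 if done else beta * value_fn(next_state)
    return env_reward + bonus


def td0_update(
    values: Dict[State, float],
    state: State,
    reward: float,
    next_state: State,
    alpha: float = 0.1,
    gamma: float = 0.99,
    done: bool = False,
) -> None:
    """One tabular TD(0) step on the value table, in place."""
    bootstrap = 0.0 if done else gamma * values.get(next_state, 0.0)
    old = values.get(state, 0.0)
    values[state] = old + alpha * (reward + bootstrap - old)


# Usage: the policy would be trained on `r_shaped`, while the value table
# is updated with the original reward (an assumption of this sketch).
values: Dict[State, float] = {"s1": 2.0}
r_shaped = shaped_reward(1.0, "s1", lambda s: values.get(s, 0.0), beta=0.5)
td0_update(values, "s0", 1.0, "s1", alpha=0.5, gamma=1.0)
```

Unlike potential-based shaping, an additive bonus of this kind can change which policy is optimal, so the weight on the bonus would need care in practice.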

16 pages, 899 KB

Open Access Article

MoE-World: A Mixture-of-Experts Architecture for Multi-Task World Models

by Cong Tang, Yuang Liu, Yueling Wu, Wence Han, Qian Yin, Xin Zheng, Wenyi Zeng and Qiuli Zhang

Electronics 2025, 14(24), 4884; https://doi.org/10.3390/electronics14244884 - 11 Dec 2025

Cited by 1 | Viewed by 2161

Abstract

World models are currently a mainstream approach in model-based deep reinforcement learning. Transformers, widely used in sequence modeling, have provided substantial support for world models. However, world models often face the challenge of the seesaw phenomenon during training, as predicting transitions, rewards, and terminations is fundamentally a form of multi-task learning. To address this issue, we propose a Mixture-of-Experts-based world model (MoE-World), a novel architecture designed for multi-task learning in world models. The framework integrates Transformer blocks organized as mixture-of-experts (MoE) layers, with gating mechanisms implemented using multilayer perceptrons. Experiments on standard benchmarks demonstrate that it can significantly mitigate the seesaw phenomenon and achieve competitive performance on the world model’s reward metrics. Further analysis confirms that the proposed architecture enhances both the accuracy and efficiency of multi-task learning.
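As a rough illustration of MLP-based gating over experts, the sketch below computes gate weights with a one-hidden-layer perceptron and mixes expert outputs by those weights. It is a generic dense mixture in plain Python, not the MoE-World architecture: the experts here are just precomputed output vectors rather than Transformer blocks, and the names `mlp_gate` and `moe_combine`, the ReLU hidden layer, and the softmax gate are assumptions.

```python
import math
from typing import List, Sequence

Vector = List[float]
Matrix = Sequence[Sequence[float]]


def softmax(logits: Sequence[float]) -> Vector:
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mlp_gate(
    x: Sequence[float],
    w1: Matrix,
    b1: Sequence[float],
    w2: Matrix,
    b2: Sequence[float],
) -> Vector:
    """Gate weights from a one-hidden-layer MLP: softmax(W2 relu(W1 x + b1) + b2)."""
    hidden = [
        max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(w1, b1)
    ]
    logits = [
        sum(w * h for w, h in zip(row, hidden)) + b
        for row, b in zip(w2, b2)
    ]
    return softmax(logits)


def moe_combine(gate_weights: Sequence[float], expert_outputs: Matrix) -> Vector:
    """Weighted sum of expert output vectors, one weight per expert."""
    dim = len(expert_outputs[0])
    return [
        sum(g * out[i] for g, out in zip(gate_weights, expert_outputs))
        for i in range(dim)
    ]


# Usage: two experts, a gate whose output layer is zero, so both experts
# receive equal weight.
gate = mlp_gate([1.0], w1=[[1.0]], b1=[0.0], w2=[[0.0], [0.0]], b2=[0.0, 0.0])
mixed = moe_combine(gate, [[1.0, 2.0], [3.0, 4.0]])
```

In a multi-task setting, a separate gate per prediction head (transitions, rewards, terminations) would let each task weight the experts differently, which is one common way to reduce interference between tasks.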

18 pages, 1910 KB

Open Access Article

Hierarchical Learning for Closed-Loop Robotic Manipulation in Cluttered Scenes via Depth Vision, Reinforcement Learning, and Behaviour Cloning

by Hoi Fai Yu and Abdulrahman Altahhan

Electronics 2025, 14(15), 3074; https://doi.org/10.3390/electronics14153074 - 31 Jul 2025

Cited by 1 | Viewed by 1941

Abstract

Despite rapid advances in robot learning, the coordination of closed-loop manipulation in cluttered environments remains a challenging and relatively underexplored problem. We present a novel two-level hierarchical architecture for a depth vision-equipped robotic arm that integrates pushing, grasping, and high-level decision making. Central to our approach is a prioritised action-selection mechanism that facilitates efficient early-stage learning via behaviour cloning (BC), while enabling scalable exploration through reinforcement learning (RL). A high-level decision neural network (DNN) selects between grasping and pushing actions, and two low-level action neural networks (ANNs) execute the selected primitive. The DNN is trained with RL, while the ANNs follow a hybrid learning scheme combining BC and RL. Notably, we introduce an automated demonstration generator based on oriented bounding boxes, eliminating the need for manual data collection and enabling precise, reproducible BC training signals. We evaluate our method on a challenging manipulation task involving five closely packed cubic objects. Our system achieves a completion rate (CR) of 100%, an average grasping success (AGS) of 93.1% per completion, and only 7.8 average decisions taken for completion (DTC). Comparative analysis against three baselines (a grasping-only policy, a fixed grasp-then-push sequence, and a cloned demonstration policy) highlights the necessity of dynamic decision making and the efficiency of our hierarchical design. In particular, the baselines yield lower AGS (86.6%) and higher DTC (10.6 and 11.4) scores, underscoring the advantages of content-aware, closed-loop control. These results demonstrate that our architecture supports robust, adaptive manipulation and scalable learning, offering a promising direction for autonomous skill coordination in complex environments.
