This is an early access version; the complete PDF, HTML, and XML versions will be available soon.
Open Access Article
A Hybrid Type-2 Fuzzy Double DQN with Adaptive Reward Shaping for Stable Reinforcement Learning
by Hadi Mohammadian KhalafAnsar 1, Jaime Rohten 2,* and Jafar Keighobadi 1
1 Faculty of Mechanical Engineering, University of Tabriz, Tabriz 51666-16471, Iran
2 Department of Electrical and Electronic Engineering, Universidad del Bío-Bío, Concepción 4051381, Chile
* Author to whom correspondence should be addressed.
AI 2025, 6(12), 319; https://doi.org/10.3390/ai6120319
Submission received: 11 November 2025 / Revised: 28 November 2025 / Accepted: 3 December 2025 / Published: 6 December 2025
Abstract
Objectives: This paper presents an innovative control framework for the classical Cart–Pole problem. Methods: The proposed framework combines Interval Type-2 Fuzzy Logic, the Dueling Double DQN deep reinforcement learning algorithm, and adaptive reward shaping. Specifically, the fuzzy logic component acts as an a priori knowledge layer that accounts for measurement uncertainty in both the pole angle and angular velocity, allowing the controller to generate adaptive actions dynamically, while the deep Q-network learns the optimal policy. To ensure stability, the Double DQN mechanism alleviates the overestimation bias commonly observed in value-based reinforcement learning, and accelerated convergence is achieved through a multi-component reward shaping function that prioritizes angle stability and survival. Results: The training results show that the method stabilizes rapidly: it achieves a 100% success rate by episode 20 and maintains consistently high rewards (650–700) throughout training. While standard DQN and other baselines take 100+ episodes to become reliable, the proposed method converges in about 20 episodes (4–5 times faster) and outperforms advanced baselines such as C51 and PER by about 15–20% in final performance. We also found that, surprisingly, PPO and QR-DQN struggle on this task, underscoring the need for stability mechanisms. Conclusions: The proposed approach provides a practical solution that balances exploration with safety through the integration of fuzzy logic and deep reinforcement learning. Its rapid convergence is particularly important for real-world applications where data collection is expensive: the method achieves stable performance much faster than existing approaches without requiring complex theoretical guarantees.
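Two of the mechanisms named in the abstract lend themselves to a short illustration: the Double DQN target, which decouples greedy action selection (online network) from action evaluation (target network) to reduce overestimation bias, and a multi-component shaped reward that prioritizes pole-angle stability and survival. The sketch below is not the authors' code; the function names, weights, thresholds, and PyTorch framing are illustrative assumptions.

```python
import torch

GAMMA = 0.99  # discount factor (assumed value, not from the paper)

def double_dqn_target(online_net, target_net, reward, next_state, done):
    """Double DQN bootstrap target: the online network selects the greedy
    action, the target network evaluates it, which mitigates the
    overestimation bias of standard DQN."""
    with torch.no_grad():
        next_action = online_net(next_state).argmax(dim=1, keepdim=True)
        next_q = target_net(next_state).gather(1, next_action).squeeze(1)
    return reward + GAMMA * (1.0 - done) * next_q

def shaped_reward(theta, theta_dot, x, survived,
                  w_angle=1.0, w_vel=0.1, w_pos=0.05):
    """Multi-component Cart-Pole reward: penalize pole angle, angular
    velocity, and cart displacement, and add a survival bonus per step.
    The weights are assumptions for illustration only."""
    r = -w_angle * abs(theta) - w_vel * abs(theta_dot) - w_pos * abs(x)
    return r + (1.0 if survived else 0.0)
```

In this framing, the shaped reward would replace the environment's raw +1-per-step signal during training, while the Double DQN target is used when computing the temporal-difference loss for the online network.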