Article

A Hybrid Type-2 Fuzzy Double DQN with Adaptive Reward Shaping for Stable Reinforcement Learning

by Hadi Mohammadian KhalafAnsar 1, Jaime Rohten 2,* and Jafar Keighobadi 1
1 Faculty of Mechanical Engineering, University of Tabriz, Tabriz 51666-16471, Iran
2 Department of Electrical and Electronic Engineering, Universidad del Bío-Bío, Concepción 4051381, Chile
* Author to whom correspondence should be addressed.
AI 2025, 6(12), 319; https://doi.org/10.3390/ai6120319
Submission received: 11 November 2025 / Revised: 28 November 2025 / Accepted: 3 December 2025 / Published: 6 December 2025

Abstract

Objectives: This paper presents an innovative control framework for the classical Cart–Pole problem. Methods: The proposed framework combines Interval Type-2 Fuzzy Logic, the Dueling Double DQN deep reinforcement learning algorithm, and adaptive reward shaping. Specifically, the fuzzy logic acts as an a priori knowledge layer that incorporates measurement uncertainty in both the pole angle and the angular velocity, allowing the controller to generate adaptive actions dynamically, while the deep Q-network learns the optimal policy. To ensure stability, the Double DQN mechanism alleviates the overestimation bias commonly observed in value-based reinforcement learning. Convergence is accelerated through a multi-component reward shaping function that prioritizes angle stability and survival. Results: The training results show that the method stabilizes rapidly: it achieves a 100% success rate by episode 20 and maintains consistently high rewards (650–700) throughout training. While Standard DQN and other baselines require more than 100 episodes to become reliable, the proposed method converges in about 20 episodes (4–5 times faster). Compared with advanced baselines such as C51 and PER, the proposed method achieves roughly 15–20% higher final performance. We also found that PPO and QR-DQN surprisingly struggle on this task, highlighting the need for stability mechanisms. Conclusions: The proposed approach provides a practical solution that balances exploration with safety through the integration of fuzzy logic and deep reinforcement learning. Its rapid convergence is particularly important for real-world applications where data collection is expensive: it achieves stable performance much faster than existing methods without requiring complex theoretical guarantees.
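The Methods paragraph above names two concrete mechanisms: a multi-component shaped reward that prioritizes angle stability and survival, and the Double DQN update that mitigates overestimation bias. The sketch below illustrates both in Python; it assumes the gymnasium CartPole-v1 state layout [x, x_dot, theta, theta_dot], and the reward weights, function names, and network interfaces are illustrative placeholders rather than the authors' exact formulation.

```python
import torch
import torch.nn as nn


def shaped_reward(state, alive: bool) -> float:
    """Hypothetical multi-component reward for CartPole-v1.

    state = [x, x_dot, theta, theta_dot] (gymnasium convention).
    The weights below are placeholders, not the paper's values.
    """
    x, _, theta, theta_dot = state
    survival_bonus = 1.0 if alive else -10.0    # reward staying upright, punish falling
    angle_penalty = 2.0 * abs(theta)            # prioritize a small pole angle
    velocity_penalty = 0.1 * abs(theta_dot)     # damp angular oscillations
    position_penalty = 0.05 * abs(x)            # keep the cart near the track center
    return survival_bonus - angle_penalty - velocity_penalty - position_penalty


def double_dqn_target(online_net: nn.Module, target_net: nn.Module,
                      rewards: torch.Tensor, next_states: torch.Tensor,
                      dones: torch.Tensor, gamma: float = 0.99) -> torch.Tensor:
    """Double DQN bootstrap target: the online network selects the next action,
    the target network evaluates it, which reduces overestimation bias."""
    with torch.no_grad():
        next_actions = online_net(next_states).argmax(dim=1, keepdim=True)
        next_q = target_net(next_states).gather(1, next_actions).squeeze(1)
        return rewards + gamma * (1.0 - dones) * next_q
```

In a full training loop this target would feed a Huber or MSE loss on the online network's Q-values, while the Interval Type-2 fuzzy layer would modulate action selection under angle and angular-velocity measurement uncertainty; those components are not shown here.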
Keywords: reinforcement learning (RL); deep Q-Network (DQN); dueling DQN; reward shaping; fuzzy logic; type-2 fuzzy systems
