An Adaptive QAPF Framework with a Discrete CBF-Inspired Safety Filter and Adaptive Reward Shaping for Safe Mobile Robot Navigation
Abstract
1. Introduction
- Adaptive QAPF Framework with Potential-Based Shaping. An adaptive shaping coefficient is introduced around the potential-based shaping form of Ng, Harada and Russell [14]. The state potential is evaluated at the continuous/grid position represented by the state, and a normalization constant scales the potential differences used for reward shaping. Under the standard assumptions, the unclipped fixed-weight form preserves the optimal policy of the underlying MDP. The implemented clipped and decayed form is therefore presented as an empirically stable approximation rather than as an unconditional policy-invariance guarantee; finite-training performance still depends on exploration, discretization and hyperparameter choices.
- CBF-Inspired Safety Filter with Visit Memory and Empirical Unreachable-Goal Handling. A novel discrete CBF-inspired filter [15,16] is introduced that augments the barrier-function safety test with a per-episode visit memory to eliminate oscillation loops. It is paired with an empirical timeout/stagnation detector that converts blocked-goal, sealed-corridor and dead-end concave cases into labeled Timeout-Unreachable and Stagnation-Unreachable outcomes. Relative to the internal QAPF-only ablation, the filter reduces the held-out collision rate by approximately 20×, while task completion is maintained and persistent no-path behavior is reported as a safe-failure mode rather than as indefinite oscillation. The prior QAPF formulation [12] does not report a comparable held-out collision metric, so the safety contrast is internal rather than external.
- Energy-Aware Velocity Modulation. An explicit velocity-modulation law, driven by the normalized gradient magnitude, is described. The law is constructed so that the speed stays at its ceiling in free space and smoothly approaches its floor in high-gradient obstacle-proximate regions. Gradient magnitudes are normalized by a dedicated scale that is distinct from the potential-difference scale used by the reward-shaping term. The law is paired with a two-term kinetic and jerk-energy metric.
- Dynamic and Multi-Robot Cooperative Extension. Beyond the static single-robot setting, dynamic obstacles, narrow-passage scenarios and multi-robot cooperative navigation are evaluated. For the cooperative case, an inter-robot virtual repulsion is paired with the per-robot CBF-inspired filter. The centralized scheme is accompanied by an explicit scaling-cost analysis with three concrete decentralization strategies (k-nearest, communication graph, hierarchical-cluster) [17,18] for fleets beyond the small-N regime.
- Comprehensive Safety-Centric Evaluation Protocol. A held-out evaluation protocol is introduced that jointly reports (i) main safety/efficiency metrics (success rate, collision rate, minimum clearance, Pareto frontier), (ii) robustness under three independent noise channels (observation noise, actuator slip and external drift) with an accompanying conditional/high-probability inflated-barrier safety statement, (iii) per-decision inference latency physically measured on an Intel i7 reference with conservative projections to Jetson/DGX-class platforms, and (iv) multi-axis generalization across held-out maps, obstacle-density shifts, grid-size shifts and 1000-episode long-horizon stability.
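
The shaping contribution listed first can be illustrated with a minimal sketch. It assumes the standard potential-based form of Ng, Harada and Russell [14], F(s, s') = γΦ(s') − Φ(s), together with a hypothetical exponential decay of the shaping coefficient from its ceiling toward its floor. The function names, the decay form and the clip range are assumptions made for illustration and are not the implementation evaluated in this paper; the values 0.95, 0.5, 5.0 and 0.005 are the discount factor and the floor, ceiling and decay rate listed in the parameter table.

```python
import math

GAMMA = 0.95  # discount factor from the parameter table


def shaping_increment(phi_s, phi_next, gamma=GAMMA):
    """Potential-based shaping term F(s, s') = gamma * Phi(s') - Phi(s)."""
    return gamma * phi_next - phi_s


def adaptive_coefficient(episode, floor=0.5, ceiling=5.0, decay=0.005):
    """Hypothetical schedule: exponential decay from ceiling toward floor."""
    return floor + (ceiling - floor) * math.exp(-decay * episode)


def shaped_reward(base_reward, phi_s, phi_next, episode, clip=1.0):
    """Base reward plus the clipped, coefficient-weighted shaping term.

    Clipping and decay break the exact policy-invariance argument,
    which holds only for the unclipped fixed-weight form.
    """
    f = shaping_increment(phi_s, phi_next)
    f = max(-clip, min(clip, f))  # assumed symmetric clip range
    return base_reward + adaptive_coefficient(episode) * f
```

In this sketch the coefficient starts at the ceiling and approaches the floor as training proceeds, so the potential field guides early exploration strongly and later leaves more weight on the learned values.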
2. Literature Review and Background
2.1. Literature Review
2.1.1. Enhanced Artificial Potential Field Methods
2.1.2. Deep Reinforcement Learning for Autonomous Navigation
2.1.3. Collision Avoidance with Dynamic Obstacles
2.1.4. Control Barrier Functions
2.1.5. Quantitative Headline Comparison Against the Closest Prior QAPF Work
2.2. Theoretical Background
2.2.1. Reinforcement Learning and Policy-Invariant Reward Shaping
2.2.2. Control Barrier Functions
3. Problem and System Description
3.1. System Description
3.2. MDP Formulation
3.3. Potential Field Visualization
4. Methodology
4.1. Q-Learning
4.2. Potential Field Force Approach
Algorithm 1. STEP(a): potential-field evaluation
4.3. QAPF Learning Algorithm
Algorithm 2. QAPF learning algorithm
4.4. Hybrid Q+APF Action Scoring
4.5. CBF-Inspired Action Filter with Visit Memory
Algorithm 3. CBF-inspired action filter with visit memory
Algorithm 4. QAPF+CBF evaluation wrapper
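
As an illustration of the kind of filter described in this section, the following minimal sketch combines a barrier test with a per-episode visit cap. Several parts are assumptions rather than the paper's formulation: the barrier is taken to be the nearest-obstacle distance minus the inflated radius, admissibility is reduced to a non-negative barrier value at the successor cell, and the fallback for an empty safe mask picks the least-unsafe successor. The names `barrier`, `successor` and `filter_action` are hypothetical; the collision radius, safety margin and visit cap are the values from the parameter table.

```python
import math
from collections import defaultdict

ACTIONS = {"up": (0, 1), "down": (0, -1), "left": (-1, 0), "right": (1, 0)}
R_C = 1.5      # collision radius in cells (parameter table)
MARGIN = 0.3   # safety margin in cells (parameter table)
VISIT_CAP = 3  # visit cap M per cell-action pair (parameter table)


def barrier(pos, obstacles):
    """Assumed barrier: nearest-obstacle distance minus the inflated radius."""
    d_min = min((math.dist(pos, o) for o in obstacles), default=math.inf)
    return d_min - (R_C + MARGIN)


def successor(pos, action):
    """Cell reached by a one-cell move in the given direction."""
    dx, dy = ACTIONS[action]
    return (pos[0] + dx, pos[1] + dy)


def filter_action(pos, nominal, scores, obstacles, visits):
    """Return the nominal action if admissible, else the best admissible one."""
    safe = [a for a in ACTIONS
            if barrier(successor(pos, a), obstacles) >= 0.0
            and visits[(pos, a)] < VISIT_CAP]
    if not safe:
        # Assumed fallback: least-unsafe successor when the mask is empty.
        choice = max(ACTIONS, key=lambda a: barrier(successor(pos, a), obstacles))
    elif nominal in safe:
        choice = nominal
    else:
        choice = max(safe, key=lambda a: scores[a])
    visits[(pos, choice)] += 1  # per-episode memory; reset at episode start
    return choice
```

A usage example: with `visits = defaultdict(int)`, an obstacle at `(5, 7)` and the robot at `(5, 5)`, the nominal action `"up"` is rejected because the successor cell lies inside the inflated radius, and the highest-scoring admissible action is returned instead. Once that cell-action pair has been used three times in the episode, the visit cap removes it from the mask, which is what breaks a repeated loop. Grid-boundary checks are omitted for brevity.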
4.6. Energy-Aware Velocity Modulation
4.7. Empirical Unreachable-Goal Detection
4.8. Multi-Robot Cooperative Extension
4.9. Three-Dimensional Workspace Extension
4.10. Implementation Specifications
5. Experiment and Analysis
5.1. Experimental Setup and Evaluated Methods
5.2. Assumptions
1. Nominal state observation: The robot has access to its position and to the positions of all obstacles located within the influence distance. Robustness to bounded observation noise and actuator uncertainty is analyzed in Section 5.5.
2. Deterministic nominal transitions: Under nominal conditions, executing a discrete action moves the robot deterministically by one grid cell in the selected direction. Robustness to bounded actuator slip and external disturbance is analyzed in Section 5.5.
3. Known obstacle geometry: Obstacles are represented as point objects with a fixed collision radius of 1.5 cells.
4. Dynamic obstacle model: Dynamic obstacles are assumed to move with constant velocity along fixed linear trajectories and to follow reflective boundary conditions at the workspace limits.
5. Discrete-action space: The robot selects its motion command from the four cardinal actions.
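
The known-geometry and dynamic-obstacle assumptions can be made concrete with a short sketch. It assumes a square workspace spanning `[0, grid_size]` on each axis and a per-step displacement smaller than the workspace, so that a single reflection per step suffices. The function names and the `grid_size` argument are hypothetical; the collision radius of 1.5 cells is the value listed in the parameter table.

```python
import math


def step_obstacle(pos, vel, grid_size):
    """Advance one obstacle by one step, reflecting at 0 and grid_size."""
    new_pos, new_vel = [], []
    for p, v in zip(pos, vel):
        q = p + v
        if q < 0.0:
            q, v = -q, -v                      # bounce off the lower wall
        elif q > grid_size:
            q, v = 2.0 * grid_size - q, -v     # bounce off the upper wall
        new_pos.append(q)
        new_vel.append(v)
    return tuple(new_pos), tuple(new_vel)


def collides(robot, obstacle, r_c=1.5):
    """Point obstacle with a fixed collision radius, in cells."""
    return math.dist(robot, obstacle) <= r_c
```

Between reflections the velocity is unchanged, which matches the constant-velocity linear-trajectory assumption; only the sign of the affected component flips at a wall.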
5.3. Main Results and Multi-Scenario Evaluation
5.3.1. Multi-Scenario Evaluation
5.3.2. Representative Trajectories, Safety Analysis and Reward Stability
5.4. Empirical Unreachable-Goal and Stagnation Detection
5.5. Robustness Analysis Under Noise and Disturbances
5.5.1. Noise Model
5.5.2. Theoretical Analysis
5.5.3. Experimental Results
5.6. i7 Inference Timing and Embedded-Device Latency Projection
5.6.1. Measurement Protocol
5.6.2. Device Extrapolation
5.6.3. Results
5.7. Ablation, Sensitivity and Extended Studies
5.7.1. Ablation Study
5.7.2. Sensitivity Analysis
5.7.3. Energy Modulation Study
5.7.4. Multi-Robot Study
Scalability Discussion
5.7.5. Curriculum Learning Study
Curriculum Schedule Details
5.7.6. Multi-Axis Generalization Study
5.8. Three-Dimensional Workspace Evaluation
5.9. Discussion, Limitations and Threats to Validity
6. Conclusions and Future Work
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Appendix A. Algorithm Capabilities, Limitations and Comparisons
Appendix A.1. Obstacle Shape and Concave Geometry
Appendix A.2. Empirical Handling of Impossible-Goal Cases
| Scenario | Unreachable Detection (%) | False-Unreachable (%) | Collision Before Label (%) | Mean Detection Time (Steps) |
|---|---|---|---|---|
| Reachable held-out maps | − | − | ||
| Blocked-goal maps | − | |||
| Sealed-corridor maps | − | |||
| Dead-end concave maps | − |
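
A stagnation detector of the kind evaluated in this appendix can be sketched as follows. The sketch assumes non-overlapping windows of goal distances, flags a window as stagnant when the spread of goal distance inside it stays below a threshold, and fires after a given number of consecutive stagnant windows. The class name and the default values for `window`, `threshold` and `patience` are hypothetical placeholders and are not the settings used in the reported experiments.

```python
from collections import deque


class StagnationDetector:
    """Flags persistent lack of goal progress over consecutive windows."""

    def __init__(self, window=50, threshold=1.0, patience=3):
        self.buffer = deque(maxlen=window)  # recent goal distances (cells)
        self.threshold = threshold          # max spread counted as stagnant
        self.patience = patience            # consecutive windows required
        self.count = 0                      # persistent-stagnation counter

    def update(self, goal_distance):
        """Feed one goal distance; return True once the label should fire."""
        self.buffer.append(goal_distance)
        if len(self.buffer) == self.buffer.maxlen:
            spread = max(self.buffer) - min(self.buffer)
            self.count = self.count + 1 if spread < self.threshold else 0
            self.buffer.clear()  # windows are non-overlapping in this sketch
        return self.count >= self.patience
```

A window in which the robot makes real progress resets the counter, so the label is reserved for sustained stagnation; a separate step-limit check would cover the Timeout-Unreachable outcome.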
Appendix A.3. Comparison Against RRT* and A*
| Property | A* | RRT* | QAPF+CBF (Ours) |
|---|---|---|---|
| requires explicit map/model? | yes (occupancy grid) | yes (configuration space) | no global map required; local features used |
| optimality guarantee | shortest discrete path | asymptotic optimality | no (learned heuristic) |
| handles dynamic obstacles? | via repeated replanning | via repeated replanning | yes, as a reactive policy (Section 5.3) |
| handles sensor noise? | requires estimator/map filtering | requires estimator/map filtering | evaluated directly under noisy local features (Section 5.5) |
| per decision time | for fresh plan | for query, but tree growth is offline | , real time |
| empirical SR (this paper) | not benchmarked | not benchmarked | |
| collision-free condition | yes, if the occupancy map and collision checking are exact | yes, if samples and edges are collision-checked in a valid configuration space | empirical CR |
| operating regime | static, fully-observed | static, fully-observed | dynamic, partially-observed, learnable |
Appendix A.4. Scalability to 3D Workspaces
Appendix A.5. Kinematic and Geometric Constraints, and Pose Uncertainty
Appendix B. Symbols and Abbreviations
Appendix B.1. Symbol Table
| Symbol | Domain/Units | Description |
|---|---|---|
| State and action | ||
| , | (cells) | Robot position; t indexes the decision step. |
| Goal position. | ||
| Successor position when action a is executed. | ||
| , | Discrete state and successor state in the MDP. | |
| Finite | Discrete state space, . | |
| Binned position coordinates (5 bins per axis). | ||
| Discretized bearing to goal and to nearest obstacle. | ||
| Binned distance to nearest obstacle. | ||
| Discretized approach rate (closing, stationary, receding). | ||
| One-step predicted distance bin. | ||
| a, | Action; . | |
| Set | Obstacle set, . | |
| Number of obstacles in the scene. | ||
| Artificial Potential Field | ||
| Scalar | Total potential at . | |
| Scalar | Attractive potential generated by the goal. | |
| Scalar | Repulsive potential generated by obstacles. | |
| Scalar | Attractive-gain coefficient (). | |
| Scalar | Repulsive-gain coefficient (). | |
| Cells | Distance to nearest obstacle: . | |
| Cells | Influence distance of the repulsive potential (). | |
| Vector | Gradient vector of the total potential at position . | |
| Scalar | Auto-calibrated potential-difference normalization constant for reward shaping, computed from . | |
| Scalar | Auto-calibrated gradient-magnitude normalization constant for energy-aware velocity modulation, computed from . | |
| Scalar | Zero-centered and range-normalized potential at successor . | |
| T, , | Scalar | Softmax temperature; initial value; floor. |
| Scalar | Mean successor potential, , used in Equation (10). | |
| Scalar | Minimum successor potential subtracted in the softmax of Equation (9) for numerical stability. | |
| Scalar | Numerical-stability constant in Equation (10); . | |
| G | Integer | Grid side length in cells (default ). |
| Vector | Discrete-action displacement vector applied by Algorithm 1. | |
| Reward and Q-learning | ||
| Scalar | Step reward function. | |
| Scalar | Terminal outcome bonus or penalty (+100, −50 or 0). | |
| Scalar | Step, proximity and progress reward weights. | |
| Cells | Euclidean distance to the goal. | |
| Scalar | Action-value function. | |
| Scalar | Optimistic Q-table initializer (). | |
| Q-learning rate. | ||
| Discount factor (). | ||
| Exploration probability for -greedy. | ||
| Scalar | Potential-based shaping increment. | |
| Scalar | State potential for reward shaping, . | |
| Scalar | Adaptive shaping coefficient at episode e. | |
| Scalar | Floor, ceiling and decay rate of . | |
| Scalar | APF guidance weights at training and evaluation. | |
| Buffer | Stuck-history buffer of recent goal distances used by the anti-deadlock monitor. | |
| Integer | Maximum number of training episodes. | |
| Integer | Maximum episode horizon in steps. | |
| Integer | Curriculum ramp length in episodes. | |
| , | Integer | Initial and target obstacle densities for the curriculum schedule. |
| CBF safety filter | ||
| Scalar | Discrete barrier function, . | |
| Cells | Collision radius (). | |
| Cells | Inflated safety margin (). | |
| Set | Nominal safe set, . | |
| Set | High-probability inflated safe set. | |
| Integer | Per episode visit counter. | |
| M | Integer | Visit cap per cell–action pair (). |
| Subsets of | Safe-mask and forbidden-mask sets. | |
| Action | QAPF nominal action prior to filtering. | |
| Integer | Number of consecutive stagnation windows that triggers Stagnation-Unreachable. | |
| Integer | Stagnation-window length in steps (). | |
| Cells | Goal-distance variation threshold defining a stagnation window (). | |
| Integer | Persistent-stagnation counter used by Algorithm 2. | |
| Multi-robot and energy | ||
| , | Position of robots k and j. | |
| Scalar | Inter-robot repulsive potential. | |
| Scalar | Multi-robot repulsion gain and influence distance. | |
| Scalar | Per step kinetic plus jerk energy. | |
| Scalar | Total episode energy. | |
| Scalar | Kinetic and jerk-energy weights. | |
| Cells/step | Modulated speed at position . | |
| Cells/step | Speed floor and ceiling. | |
| Scalar | Velocity-modulation aggressiveness coefficient. | |
| Scalar | Normalized gradient magnitude, . | |
| Function | Logistic sigmoid, . | |
| Noise model and robustness | ||
| Vector | Observation noise at step t. | |
| Cells | Observation-noise standard deviation. | |
| Actuator-slip probability. | ||
| Vector | External positional drift at step t. | |
| Cells | External-drift standard deviation. | |
| Cells | Radius of the high-probability error ball at level p. | |
| Matrix | identity matrix used in noise covariance. | |
| Training schedule | ||
| e | Episode index. | |
| t | Decision-step index within an episode. | |
Appendix B.2. Abbreviations
| Abbreviation | Meaning |
|---|---|
| Domain and concept | |
| APF | Artificial potential field. |
| CBF | Control barrier function. |
| MDP | Markov decision process. |
| QP | Quadratic program. |
| RL | Reinforcement learning. |
| DRL | Deep reinforcement learning. |
| RSS | Responsibility-sensitive safety (contract). |
| SLAM | Simultaneous localization and mapping. |
| GP | Gaussian process. |
| GC | Garbage collection (Python runtime). |
| Methods (proposed and baselines) | |
| QAPF | Q-learning with artificial potential field (proposed). |
| QAPF+CBF | QAPF augmented with the discrete CBF-inspired action filter (proposed). |
| Std. QL | Standard tabular Q-learning. |
| EQL | Efficient Q-learning baseline using optimistic initialization and slow decay. |
| CQL | Conservative Q-learning. |
| DQN | Deep Q-network. |
| PPO | Proximal policy optimization. |
| SAC | Soft actor–critic. |
| TD3 | Twin delayed deep deterministic policy gradient. |
| RRT* | Optimal rapidly-exploring random tree. |
| A* | A-star graph-search planner. |
| iADA*-RL | Incremental anytime dynamic A* with reinforcement learning. |
| Metrics, evaluation and protocols | |
| SR | Success Rate. |
| CR | Collision Rate. |
| pp | Percentage point(s). |
| Hz | Hertz (decisions per second). |
| Hardware and platforms | |
| i7 | Intel Core i7 reference laptop CPU. |
| DGX | NVIDIA DGX-class server platform. |
| AGX Orin | NVIDIA Jetson AGX Orin embedded module. |
| Orin NX | NVIDIA Jetson Orin NX embedded module. |
| Nano | NVIDIA Jetson Nano embedded module. |
| GPU | Graphics processing unit. |
| CPU | Central processing unit. |
| Statistics | |
| i.i.d. | Independent and identically distributed. |
| p | p-value (statistical significance). |
| 95th and 99th percentile of a distribution. | |
References
1. Grigorescu, S.; Trasnea, B.; Cocias, T.; Macesanu, G. A survey of deep learning techniques for autonomous driving. J. Field Robot. 2020, 37, 362–386.
2. Chen, L.; Wu, P.; Chitta, K.; Jaeger, B.; Geiger, A.; Li, H. End-to-End Autonomous Driving: Challenges and Frontiers. IEEE Trans. Pattern Anal. Mach. Intell. 2024, 46, 10164–10183.
3. Yao, Q.; Zheng, Z.; Qi, L.; Yuan, H.; Guo, X.; Zhao, M.; Liu, Z.; Yang, T. Path Planning Method with Improved Artificial Potential Field: A Reinforcement Learning Perspective. IEEE Access 2020, 8, 135513–135523.
4. Aradi, S. Survey of deep reinforcement learning for motion planning of autonomous vehicles. IEEE Trans. Intell. Transp. Syst. 2022, 23, 740–759.
5. Kiran, B.R.; Sobh, I.; Talpaert, V.; Mannion, P.; Sallab, A.A.; Yogamani, S.; Pérez, P. Deep reinforcement learning for autonomous driving: A survey. IEEE Trans. Intell. Transp. Syst. 2022, 23, 4909–4926.
6. Khatib, O. Real-time obstacle avoidance for manipulators and mobile robots. Int. J. Robot. Res. 1986, 5, 90–98.
7. Herrera Ortiz, J.A.; Rodríguez-Vázquez, K.; Padilla Castañeda, M.A.; Arámbula Cosío, F. Autonomous robot navigation based on the evolutionary multi-objective optimization of potential fields. Eng. Optim. 2013, 45, 19–43.
8. Watkins, C.J.C.H.; Dayan, P. Q-learning. Mach. Learn. 1992, 8, 279–292.
9. Mnih, V.; Kavukcuoglu, K.; Silver, D.; Rusu, A.A.; Veness, J.; Bellemare, M.G.; Graves, A.; Riedmiller, M.; Fidjeland, A.K.; Ostrovski, G.; et al. Human-level control through deep reinforcement learning. Nature 2015, 518, 529–533.
10. Schulman, J.; Wolski, F.; Dhariwal, P.; Radford, A.; Klimov, O. Proximal Policy Optimization Algorithms. arXiv 2017, arXiv:1707.06347.
11. Haarnoja, T.; Zhou, A.; Abbeel, P.; Levine, S. Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor. In Proceedings of the 35th International Conference on Machine Learning; Proceedings of Machine Learning Research; PMLR: Stockholm, Sweden, 2018; Volume 80, pp. 1861–1870.
12. Orozco-Rosas, U.; Picos, K.; Pantrigo, J.J.; Montemayor, A.S.; Cuesta-Infante, A. Mobile robot path planning using a QAPF learning algorithm for known and unknown environments. IEEE Access 2022, 10, 84648–84663.
13. Maoudj, A.; Hentout, A. Optimal path planning approach based on Q-learning algorithm for mobile robots. Appl. Soft Comput. 2020, 97, 106796.
14. Ng, A.Y.; Harada, D.; Russell, S. Policy Invariance Under Reward Transformations: Theory and Application to Reward Shaping. In Proceedings of the Sixteenth International Conference on Machine Learning; Morgan Kaufmann: San Francisco, CA, USA, 1999; pp. 278–287.
15. Ames, A.D.; Xu, X.; Grizzle, J.W.; Tabuada, P. Control barrier function based quadratic programs for safety critical systems. IEEE Trans. Autom. Control 2017, 62, 3861–3876.
16. Cheng, R.; Orosz, G.; Murray, R.M.; Burdick, J.W. End-to-end safe reinforcement learning through barrier functions for safety-critical continuous control tasks. In Proceedings of the AAAI Conference on Artificial Intelligence; AAAI Press: Washington, DC, USA, 2019; Volume 33, pp. 3387–3395.
17. Shalev-Shwartz, S.; Shammah, S.; Shashua, A. Safe, Multi-Agent, Reinforcement Learning for Autonomous Driving. arXiv 2016, arXiv:1610.03295.
18. Dinneweth, J.; Boubezoul, A.; Mandiau, R.; Espié, S. Multi-agent reinforcement learning for autonomous vehicles: A survey. Auton. Intell. Syst. 2022, 2, 27.
19. Montiel, O.; Sepúlveda, R.; Orozco-Rosas, U. Optimal Path Planning Generation for Mobile Robots using Parallel Evolutionary Artificial Potential Field. J. Intell. Robot. Syst. 2015, 79, 237–257.
20. Low, E.S.; Ong, P.; Cheah, K.C. Solving the optimal path planning of a mobile robot using improved Q-learning. Robot. Auton. Syst. 2019, 115, 143–161.
21. Schaul, T.; Quan, J.; Antonoglou, I.; Silver, D. Prioritized Experience Replay. In Proceedings of the International Conference on Learning Representations. arXiv 2016, arXiv:1511.05952.
22. Fujimoto, S.; van Hoof, H.; Meger, D. Addressing Function Approximation Error in Actor-Critic Methods. In Proceedings of the 35th International Conference on Machine Learning; Proceedings of Machine Learning Research; PMLR: Stockholm, Sweden, 2018; Volume 80, pp. 1587–1596.
23. Chen, L.; Lu, K.; Rajeswaran, A.; Lee, K.; Grover, A.; Laskin, M.; Abbeel, P.; Srinivas, A.; Mordatch, I. Decision Transformer: Reinforcement Learning via Sequence Modeling. In Proceedings of the Advances in Neural Information Processing Systems; Curran Associates, Inc.: New York, NY, USA, 2021; Volume 34, pp. 15084–15097.
24. Janner, M.; Li, Q.; Levine, S. Offline Reinforcement Learning as One Big Sequence Modeling Problem. In Proceedings of the Advances in Neural Information Processing Systems; Curran Associates, Inc.: New York, NY, USA, 2021; Volume 34, pp. 1273–1286.
25. Muhammad, K.; Ullah, A.; Lloret, J.; Del Ser, J.; de Albuquerque, V.H.C. Deep Learning for Safe Autonomous Driving: Current Challenges and Future Directions. IEEE Trans. Intell. Transp. Syst. 2021, 22, 4316–4336.
26. You, C.; Lu, J.; Filev, D.; Tsiotras, P. Advanced planning for autonomous vehicles using reinforcement learning and deep inverse reinforcement learning. Robot. Auton. Syst. 2019, 114, 1–18.
27. Maw, A.A.; Tyan, M.; Nguyen, T.A.; Lee, J.W. iADA*-RL: Anytime Graph-Based Path Planning with Deep Reinforcement Learning for an Autonomous UAV. Appl. Sci. 2021, 11, 3948.
28. Ames, A.D.; Coogan, S.; Egerstedt, M.; Notomista, G.; Sreenath, K.; Tabuada, P. Control barrier functions: Theory and applications. In Proceedings of the 2019 18th European Control Conference (ECC); IEEE: New York, NY, USA, 2019; pp. 3420–3431.
29. Shalev-Shwartz, S.; Shammah, S.; Shashua, A. On a Formal Model of Safe and Scalable Self-Driving Cars. arXiv 2017, arXiv:1708.06374.
30. Zhu, M.; Wang, Y.; Pu, Z.; Hu, J.; Wang, X.; Ke, R. Safe, efficient, and comfortable velocity control based on reinforcement learning for autonomous driving. Transp. Res. Part C Emerg. Technol. 2020, 117, 102662.
31. Sutton, R.S.; Barto, A.G. Reinforcement Learning: An Introduction, 2nd ed.; MIT Press: Cambridge, MA, USA, 2018.
32. Hart, P.E.; Nilsson, N.J.; Raphael, B. A Formal Basis for the Heuristic Determination of Minimum Cost Paths. IEEE Trans. Syst. Sci. Cybern. 1968, 4, 100–107.
33. Kumar, A.; Zhou, A.; Tucker, G.; Levine, S. Conservative Q-Learning for Offline Reinforcement Learning. In Proceedings of the Advances in Neural Information Processing Systems; Curran Associates, Inc.: New York, NY, USA, 2020; Volume 33, pp. 1179–1191.
34. Dosovitskiy, A.; Ros, G.; Codevilla, F.; Lopez, A.; Koltun, V. CARLA: An Open Urban Driving Simulator. In Proceedings of the 1st Annual Conference on Robot Learning; Proceedings of Machine Learning Research; PMLR: Mountain View, CA, USA, 2017; Volume 78, pp. 1–16.
35. Tobin, J.; Fong, R.; Ray, A.; Schneider, J.; Zaremba, W.; Abbeel, P. Domain Randomization for Transferring Deep Neural Networks from Simulation to the Real World. In Proceedings of the 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS); IEEE: New York, NY, USA, 2017; pp. 23–30.
36. Zhao, W.; Queralta, J.P.; Westerlund, T. Sim-to-real transfer in deep reinforcement learning for robotics: A survey. In Proceedings of the 2020 IEEE Symposium Series on Computational Intelligence (SSCI); IEEE: New York, NY, USA, 2020; pp. 737–744.
37. Huang, X.; Kwiatkowska, M.; Wang, S.; Wu, M. Safety Verification of Deep Neural Networks. In Proceedings of the Computer Aided Verification; Lecture Notes in Computer Science; Springer: Berlin/Heidelberg, Germany, 2017; Volume 10426, pp. 3–29.
38. Karaman, S.; Frazzoli, E. Sampling-Based Algorithms for Optimal Motion Planning. Int. J. Robot. Res. 2011, 30, 846–894.













| Method | Map-Free Nav. | Dynamic Obs. | Multi-Robot | Safety Filtering | Adaptive Learning | Local-Minima Mitigation | Real-Time Capable |
|---|---|---|---|---|---|---|---|
| Classical APF [6] | ✔ | ✗ | ✗ | ✗ | ✗ | ✗ | ✔ |
| Black-hole APF+RL [3] | ✔ | ✔ | ✗ | ✗ | ∼ | ✔ | ✔ |
| Evolutionary APF [19] | ✗ | ✗ | ✗ | ✗ | ✗ | ∼ | ✗ |
| Evo. Multi-obj. APF [7] | ✔ | ✔ | ✔ | ✗ | ✗ | ✗ | ∼ |
| QAPF (Orozco-Rosas) [12] | ✔ | ✗ | ✗ | ✗ | ✔ | ∼ | ✔ |
| Q-Learning [8] | ✔ | ✗ | ✗ | ✗ | ✔ | ✗ | ✔ |
| DQN [9] | ✔ | ∼ | ✗ | ✗ | ✔ | ✔ | ∼ |
| PPO [10] | ✔ | ✔ | ✗ | ✗ | ✔ | ✔ | ∼ |
| SAC [11] | ✔ | ✔ | ✗ | ✗ | ✔ | ✔ | ∼ |
| CBF-QP [15] | ✗ | ∼ | ✗ | ✔ | ✗ | ✗ | ✔ |
| CBF+RL [16] | ✔ | ✔ | ✗ | ✔ | ✔ | ✔ | ∼ |
| QAPF (Ours) | ✔ | ✔ | ✔ | ✗ | ✔ | ∼ | ✔ |
| QAPF+CBF (Ours) | ✔ | ✔ | ✔ | ✔ | ✔ | ∼ | ✔ |
| Capability Dimension | Prior QAPF [12] | This Work | Improvement |
|---|---|---|---|
| Convergence horizon | episodes | ∼205 ep (QAPF)/∼230 ep (QAPF+CBF) | Protocol-dependent reported-horizon reduction |
| Collision rate (held-out) | − (not reported) | (QAPF)/ (QAPF+CBF) | ∼20× internal reduction vs. QAPF-only ablation |
| Success rate (static) | Path-quality metric only | / | Qualitative jump |
| Dynamic obstacles | − | SR (QAPF+CBF) | New capability |
| Narrow passages | − | SR (QAPF+CBF) | New capability |
| Multi-robot cooperation | − | joint SR | New capability |
| Robustness eval. (noise) | − | 6 primary regimes (theory + exp.) | New capability |
| Unreachable-goal handling | − | Timeout/stagnation detector with empirical no-path investigation | New safe-failure capability |
| Embedded inference timing | − | ∼4.9 kHz mean-call throughput on i7; projected embedded latencies | New capability |
| Parameter | Value/Description |
|---|---|
| Grid size | cells |
| Max steps per episode | 1000 |
| Goal threshold ()/Collision radius () | 0.5/1.5 cells |
| Outcome rewards (, , , , ) | , , 1, 1, |
| Learning rate | 0.15 (QAPF/QAPF+CBF); 0.1 (baselines) |
| Discount factor | 0.95 |
| Exploration (, , ) | 0.3, 0.01, 0.995 (QAPF); 0.9, 0.01, 0.995 (baselines) |
| APF gains (, , ) | 1.0, 100.0, 3.0 cells |
| Softmax temperature (, , ) | 2.0, 0.3, 0.995/episode |
| Optimistic Q-init () | 5.0 |
| Adaptive shaping (, , ) | 0.5, 5.0, 0.005 |
| Hybrid guidance weights (, ) | 1.2, 2.0 |
| calibration | 95th percentile of over 2000 random steps; used only for reward-shaping normalization |
| Gradient scale | 95th percentile of over 500 random steps; used only for energy-aware velocity modulation |
| CBF safety margin ()/visit cap (M) | 0.3 cells/3 revisits per cell–action pair |
| Empirical unreachable detection | steps; steps; cell; consecutive stagnation windows |
| Max episodes/evaluation logging/convergence extraction | 1500/every 50 episodes/convergence episode estimated by interpolation of the smoothed held-out SR curve |
| Final held-out evaluation episodes | 100 |
| Independent seeds | 30 |
| Energy-aware module | |
| (Constant/APF-mod/APF-agg) | /0.5/1.5 |
| , , , | 1.0, 0.05, 1.0, 0.5 |
| Multi-robot module | |
| , | 20.0, 3.0 cells |
| Curriculum schedule (Section 5.7.5) | |
| Initial density /target density | 5/15 |
| Ramp episodes /schedule shape | 400/linear, monotonic seed |
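
The schedule entries in the table above can be sketched in a few lines. The sketch assumes that the curriculum ramps the obstacle count linearly from 5 to 15 over the first 400 episodes and then holds it, and that the exploration rate and softmax temperature decay multiplicatively per episode down to a floor. The function names are hypothetical, and reading the three tabulated values as (initial, floor, decay factor) is an assumption about the column order.

```python
def curriculum_density(episode, n_init=5, n_target=15, ramp=400):
    """Linear ramp of the obstacle count, held at the target afterwards."""
    frac = min(max(episode / ramp, 0.0), 1.0)
    return round(n_init + frac * (n_target - n_init))


def decayed(initial, floor, rate, episode):
    """Multiplicative per-episode decay with a lower bound."""
    return max(floor, initial * rate ** episode)


def schedule(episode):
    """Bundle the three schedules for one training episode."""
    return {
        "obstacles": curriculum_density(episode),
        "epsilon": decayed(0.3, 0.01, 0.995, episode),     # QAPF exploration
        "temperature": decayed(2.0, 0.3, 0.995, episode),  # softmax temperature
    }
```

Calling `schedule(0)` returns the starting values, and any episode past the ramp returns the target obstacle count with both decayed quantities at or approaching their floors.
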
| Category | Parameter | Value/Description |
|---|---|---|
| Environment | Grid size | cells |
| | Obstacle count | 15 (default); variable in scenarios |
| | Obstacle types | Static, dynamic, narrow-passage |
| Hardware | Processor | Intel Core i7 (8 cores, ) |
| | Memory | 16 GB DDR4 RAM |
| | Parallelism | Multiprocessing, 16 CPU threads |
| Software | Language | Python 3.8 |
| | Numerical | NumPy 1.21.0 |
| | Visualization | Matplotlib 3.4.0 |
| Evaluation | Independent runs | 30 for main comparative, scenario, ablation and generalization experiments |
| | Protocol-specific exceptions | Robustness: 5 training seeds × 30 noisy episodes per regime; timing: repeated per-decision calls |
| | Training episodes | Up to 1500 (main); 1000 (multi-robot) |
| | Evaluation | Held-out suites (disjoint seed pools) |
| Method | Success Rate (%) | Collision Rate (%) | Min. Clearance (Cells) | Conv. Episode |
|---|---|---|---|---|
| APF-Only | N/A ‡ | |||
| Std. QL | ||||
| EQL | ||||
| CQL | ||||
| DQN | ||||
| QAPF | ||||
| QAPF+CBF |
| Scenario | Method | Success Rate (%) | Collision Rate (%) | Min. Clearance (Cells) |
|---|---|---|---|---|
| Static | Std. QL | 78.3 | 22.4 | 3.52 |
| | EQL | 86.2 | 14.3 | 3.10 |
| | CQL | 82.5 | 18.5 | 3.35 |
| | DQN | 84.7 | 16.8 | 3.21 |
| | QAPF | | 6.2 | 2.78 |
| | QAPF+CBF | 93.8 | | 3.15 |
| Dynamic | Std. QL | 61.5 | 31.8 | 2.85 |
| | EQL | 71.8 | 24.7 | 2.62 |
| | CQL | 67.2 | 28.3 | 2.74 |
| | DQN | 74.1 | 22.4 | 2.55 |
| | QAPF | 89.3 | 9.8 | 2.18 |
| | QAPF+CBF | | | 2.82 |
| Narrow | Std. QL | 55.8 | 37.6 | 3.95 |
| | EQL | 64.3 | 29.2 | 3.65 |
| | CQL | 60.1 | 34.8 | 3.78 |
| | DQN | 69.4 | 12.5 | 3.42 |
| | QAPF | | 9.8 | 2.38 |
| | QAPF+CBF | 85.5 | | 3.08 |
| Regime | Observation-Noise Std (Cells) | Actuator-Slip Probability | External-Drift Std (Cells) | Physical Interpretation |
|---|---|---|---|---|
| clean | 0.0 | 0.00 | 0.00 | ideal conditions |
| obs_low | 0.3 | 0.00 | 0.00 | light sensor noise |
| obs_high | 0.8 | 0.00 | 0.00 | heavy sensor noise |
| act_low | 0.0 | 0.05 | 0.00 | mild actuator slip |
| act_high | 0.0 | 0.15 | 0.00 | severe actuator slip |
| combined | 0.3 | 0.05 | 0.10 | realistic mixed disturbance |
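
The three noise channels in the regime table can be simulated with a minimal sketch. It assumes zero-mean isotropic Gaussian observation noise and drift, and it assumes that a slip event cancels the commanded move so that the robot stays in place apart from drift; the slip behavior and all function names are assumptions for illustration. The radius helper uses the standard Rayleigh quantile for a two-dimensional isotropic Gaussian, which may differ from the bound used in the theoretical analysis.

```python
import math
import random

# (observation std in cells, slip probability, drift std in cells)
REGIMES = {
    "clean": (0.0, 0.00, 0.00), "obs_low": (0.3, 0.00, 0.00),
    "obs_high": (0.8, 0.00, 0.00), "act_low": (0.0, 0.05, 0.00),
    "act_high": (0.0, 0.15, 0.00), "combined": (0.3, 0.05, 0.10),
}


def noisy_observation(pos, sigma_obs, rng):
    """Perturb the observed position with isotropic Gaussian noise."""
    return tuple(p + rng.gauss(0.0, sigma_obs) for p in pos)


def noisy_transition(pos, move, p_slip, sigma_drift, rng):
    """Assumed slip model: with probability p_slip the commanded move is lost."""
    dx, dy = (0, 0) if rng.random() < p_slip else move
    return (pos[0] + dx + rng.gauss(0.0, sigma_drift),
            pos[1] + dy + rng.gauss(0.0, sigma_drift))


def error_ball_radius(sigma, p):
    """Radius containing a 2D isotropic Gaussian error with probability p."""
    return sigma * math.sqrt(-2.0 * math.log(1.0 - p))
```

For example, `error_ball_radius(0.3, 0.95)` gives the radius within which the `obs_low` observation error falls 95% of the time under these assumptions, which is the kind of quantity an inflated barrier would need to absorb.
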
| Regime | APF-Only | QAPF | QAPF+CBF |
|---|---|---|---|
| clean | |||
| obs_low | |||
| obs_high | |||
| act_low | |||
| act_high | |||
| combined |
| Method | i7 Median (µs) | i7 Mean (µs) | Nano | Orin NX | AGX Orin | DGX | Hz @ i7 (Mean) | 20 Hz? |
|---|---|---|---|---|---|---|---|---|
| APF-Only | 158.6 | 228.2 | 380.6 | 174.5 | 142.7 | 111.0 | 4382 | ✔ † |
| Std. QL | 9.0 | 14.7 | 21.6 | 9.9 | 8.1 | 6.3 | 68,000 | ✔ |
| DQN | 13.4 | 17.5 | 32.2 | 14.7 | 12.1 | 9.4 | 57,000 | ✔ |
| QAPF | 149.4 | 173.6 | 358.5 | 164.3 | 134.4 | 104.6 | 5761 | ✔ |
| QAPF+CBF | 173.7 | 204.1 | 417.0 | 191.1 | 156.4 | 121.6 | 4899 | ✔ |
| Platform | Scaling Factor | Rationale |
|---|---|---|
| Intel i7 (reference) | Baseline (laptop-class, 8 cores, ) | |
| Jetson Nano | ARM Cortex-A57 @ , no AVX | |
| Jetson Orin NX | ARM Cortex-A78AE @ | |
| Jetson AGX Orin | ARM Cortex-A78AE @ , wider memory | |
| DGX-level GPU (proxy) | CPU-side overhead only; numpy tables do not saturate GPU |
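
The arithmetic behind the timing table can be reproduced with a short sketch. The throughput column is the reciprocal of the mean per-decision latency, and the projected device columns appear to scale the i7 median by a fixed per-platform factor. The ratios below are back-computed from the tabulated QAPF+CBF row and are not quoted from the scaling-factor table; the function names are hypothetical.

```python
I7_MEAN_US = {"APF-Only": 228.2, "Std. QL": 14.7, "DQN": 17.5,
              "QAPF": 173.6, "QAPF+CBF": 204.1}


def throughput_hz(mean_us):
    """Decisions per second implied by a mean latency in microseconds."""
    return 1e6 / mean_us


def meets_rate(mean_us, rate_hz=20.0):
    """True when the implied throughput satisfies a control-rate target."""
    return throughput_hz(mean_us) >= rate_hz


def back_computed_ratios(i7_median_us, projected_us):
    """Ratio of each projected latency to the i7 median latency."""
    return {name: us / i7_median_us for name, us in projected_us.items()}


# QAPF+CBF row of the timing table (microseconds).
QAPF_CBF_PROJECTED = {"Nano": 417.0, "Orin NX": 191.1,
                      "AGX Orin": 156.4, "DGX": 121.6}
RATIOS = back_computed_ratios(173.7, QAPF_CBF_PROJECTED)
```

The implied throughputs agree with the tabulated Hz values to within rounding of the reported means, and every method clears the 20 Hz target by more than two orders of magnitude on the i7 reference.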
| Variant | Success (%) | Collision (%) | Clearance | Conv. Ep |
|---|---|---|---|---|
| RL-Only | ||||
| APF-Only | N/A | |||
| QAPF-Fixed- | ||||
| QAPF-Full | ||||
| QAPF+CBF |
| | | Success Rate (%) | Conv. Episode |
|---|---|---|---|
| 5 | 0.005 | 90.8 | 225 |
| 5 | 0.01 | 91.7 | 218 |
| 5 | 0.02 | 89.1 | 210 |
| 10 | 0.005 | 92.1 | 245 |
| 10 | 0.01 | 94.5 | 198 |
| 10 | 0.02 | 89.8 | 176 |
| 20 | 0.005 | 88.2 | 258 |
| 20 | 0.01 | 90.3 | 231 |
| 20 | 0.02 | 87.4 | 219 |
| Regime | Aggressiveness Coefficient | SR (%) | E | E Relative to Constant | Nav. Time |
|---|---|---|---|---|---|
| Constant | | 94.5 | 60.05 | 1.00 | 14.2 |
| APF-mod | 0.5 | 94.5 | 49.25 | 0.82 | 15.4 |
| APF-agg | 1.5 | 94.5 | 42.04 | 0.70 | 20.8 |
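
One velocity law that satisfies the stated design properties (full speed at zero gradient, a smooth approach to the floor at high gradient) is sketched below, together with the relative-energy arithmetic for the table above. The specific sigmoid blend is an assumption and is not claimed to be the law used in the experiments; the speed bounds of 0.05 and 1.0 cells/step are one reading of the parameter table, and the function names are hypothetical.

```python
import math


def sigmoid(x):
    """Logistic sigmoid."""
    return 1.0 / (1.0 + math.exp(-x))


def modulated_speed(g_norm, kappa, v_min=0.05, v_max=1.0):
    """Assumed law: blend from v_max at zero gradient toward v_min."""
    # 2 * (1 - sigmoid(kappa * g)) equals 1 at g = 0 and decays toward 0.
    blend = 2.0 * (1.0 - sigmoid(kappa * g_norm))
    return v_min + (v_max - v_min) * blend


def relative_energy(energy, baseline):
    """Episode energy expressed as a fraction of the constant-speed baseline."""
    return energy / baseline


MODERATE = relative_energy(49.25, 60.05)    # APF-mod row
AGGRESSIVE = relative_energy(42.04, 60.05)  # APF-agg row
```

A larger aggressiveness coefficient slows the robot sooner as the gradient grows, which is consistent with the trade-off visible in the table: lower energy at the cost of a longer navigation time.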
| Configuration | Joint Success Rate (%) | Inter-Robot Collisions |
|---|---|---|
| Independent | ||
| Cooperative | ||
| Coop+CBF |
| Cost Dimension | Scaling | Example | Main Bottleneck |
|---|---|---|---|
| Per-decision pair evaluations | | 90 directed/45 unique pairs | Runtime at large N |
| Communication bandwidth | | state messages per cycle | Wireless contention |
| Fleet-level joint success | | | Joint SR drops rapidly |
| Single-point-of-failure risk | Centralized coordinator | One coordinator | Resilience requirement |
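
The pair-evaluation example in the table corresponds to a fleet of ten robots, since N(N − 1) = 90 directed pairs and half as many unique pairs. The sketch below reproduces that count and shows a brute-force k-nearest neighbor selection as one way to bound the per-robot work; it is an illustration of the k-nearest strategy named in the contributions, not the implementation used in the multi-robot study, and the function names are hypothetical.

```python
import math


def pair_counts(n):
    """Directed and unique robot pairs evaluated by an all-pairs scheme."""
    directed = n * (n - 1)
    return directed, directed // 2


def k_nearest(positions, k):
    """Indices of the k nearest other robots for each robot (brute force)."""
    result = {}
    for i, p in enumerate(positions):
        others = sorted((j for j in range(len(positions)) if j != i),
                        key=lambda j: math.dist(p, positions[j]))
        result[i] = others[:k]
    return result
```

With a fixed k, each robot evaluates repulsion against at most k neighbors, so the number of repulsion terms grows linearly in the fleet size rather than quadratically, although this brute-force neighbor search itself still scans all robots.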
| Training Mode | Success Rate (%) | Convergence Episode |
|---|---|---|
| Fixed | ||
| Curriculum |
| Generalization Axis | Cell | APF-Only | DQN | QAPF | QAPF+CBF |
|---|---|---|---|---|---|
| Axis A. Held-out maps under the training distribution | |||||
| A1 Seen (training pool) | |||||
| A2 Unseen (disjoint seeds) | |||||
| Axis B. Out-of-distribution obstacle density (trained on ) | |||||
| B1 (sparse) | |||||
| B2 | |||||
| B3 (in-dist.) | |||||
| B4 | |||||
| B5 (dense) | |||||
| Axis C. Out-of-distribution grid size (trained on , density held fixed) | |||||
| C1 (smaller) | |||||
| C2 (in-dist.) | |||||
| C3 (larger) | |||||
| Decile | D1 | D2 | D3 | D4 | D5 | D6 | D7 | D8 | D9 | D10 |
|---|---|---|---|---|---|---|---|---|---|---|
| SR (%) |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Share and Cite
Isaac, E.; George, A.J.; Ioannou, I.; Abraham, J.P.; Kallam, S.; Ghantasala, G.S.P.; Vidyullatha, P.; Vassiliou, V. An Adaptive QAPF Framework with a Discrete CBF-Inspired Safety Filter and Adaptive Reward Shaping for Safe Mobile Robot Navigation. Electronics 2026, 15, 1945. https://doi.org/10.3390/electronics15091945

